Note: This page contains sample records for the topic genome sequencing centers from Science.gov.
While these samples are representative of the content of Science.gov,
they are not comprehensive nor are they the most current set.
We encourage you to perform a real-time search of Science.gov
to obtain the most current and comprehensive results.
Last update: August 15, 2014.
1

Genome Sequencing Centers  

Cancer.gov

The Cancer Genome Atlas (TCGA) Genome Sequencing Centers (GSCs) perform large-scale DNA sequencing using the latest sequencing technologies. Supported by the National Human Genome Research Institute (NHGRI) large-scale sequencing program, the GSCs generate the enormous volume of data required by TCGA, while continually improving existing technologies and methods to expand the frontier of what can be achieved in cancer genome sequencing.

2

The Genome Sequencing Center at NCGR  

SciTech Connect

Faye Schilkey from the National Center for Genome Resources discusses NCGR's research, sequencing and analysis experience on June 2, 2010 at the "Sequencing, Finishing, Analysis in the Future" meeting in Santa Fe, NM.

Schilkey, Faye [National Center for Genome Resources]

2010-06-02

3

Genome Sequencing Center Tour Videos and Classroom Activities  

NSDL National Science Digital Library

A video tour of the Washington University Genome Sequencing Center, supplemented by additional films and classroom activities, can help advanced high school students and college undergraduates understand the classical techniques of genome sequencing.

Sarah Elgin (Washington University)

2010-05-28

4

Genomic sequencing.  

PubMed Central

Unique DNA sequences can be determined directly from mouse genomic DNA. A denaturing gel separates by size mixtures of unlabeled DNA fragments from complete restriction and partial chemical cleavages of the entire genome. These lanes of DNA are transferred and UV-crosslinked to nylon membranes. Hybridization with a short 32P-labeled single-stranded probe produces the image of a DNA sequence "ladder" extending from the 3' or 5' end of one restriction site in the genome. Numerous different sequences can be obtained from a single membrane by reprobing. Each band in these sequences represents 3 fg of DNA complementary to the probe. Sequence data from mouse immunoglobulin heavy chain genes from several cell types are presented. The genomic sequencing procedures are applicable to the analysis of genetic polymorphisms, DNA methylation at deoxycytidines, and nucleic acid-protein interactions at single nucleotide resolution.

Church, G M; Gilbert, W

1984-01-01

5

Introducing National Center for Genome Resources (NCGR) Informatics (Seventh Annual Sequencing, Finishing, Analysis in the Future (SFAF) Meeting 2012)  

SciTech Connect

John Crow from the National Center for Genome Resources discusses his organization's informatics at the 7th Annual Sequencing, Finishing, Analysis in the Future (SFAF) Meeting held in June, 2012 in Santa Fe, NM.

Crow, John [National Center for Genome Resources]

2012-06-01

6

Introducing National Center for Genome Resources (NCGR) Informatics (Seventh Annual Sequencing, Finishing, Analysis in the Future (SFAF) Meeting 2012)  

ScienceCinema

John Crow from the National Center for Genome Resources discusses his organization's informatics at the 7th Annual Sequencing, Finishing, Analysis in the Future (SFAF) Meeting held in June, 2012 in Santa Fe, NM.

Crow, John [National Center for Genome Resources]

2013-01-25

7

Genome Characterization Centers  

Cancer.gov

Genomics is a fast-moving field with novel technologies and platforms that help characterize the genome being made available to the research community on a continual basis. The Cancer Genome Atlas (TCGA) Genome Characterization Centers (GCCs) are responsible for characterizing all of the genomic changes found in the tumors studied as part of the TCGA program.

8

Genome Data Analysis Centers  

Cancer.gov

The use of novel technologies, the need to integrate different data types and the immense quantity of data generated by The Cancer Genome Atlas (TCGA) Research Network have led to an expansion of the TCGA Research Network to include new centers devoted to data analysis. The Genome Data Analysis Centers (GDACs) work hand-in-hand with the Genome Characterization Centers (GCCs) to develop state-of-the-art tools that assist researchers with processing and integrating data analyses across the entire genome.

9

Sequencing technologies and genome sequencing  

Microsoft Academic Search

The high-throughput next generation sequencing (HT-NGS) technologies are currently the hottest topic in the field of human and animal genomics research, and can produce over 100 times more data than the most sophisticated capillary sequencers based on the Sanger method. With the ongoing developments of high throughput sequencing machines and advancement of modern bioinformatics tools at unprecedented pace,

Chandra Shekhar Pareek; Rafal Smoczynski; Andrzej Tretyn

10

Whole Genome Sequencing  

MedlinePLUS

... If you do choose to have your whole genome sequenced, it is very important and helpful to review your results with a trained professional. Also, you should make sure the lab is CLIA certified. What do the test results mean? Whole genome sequencing is not your average diagnostic test. A ...

11

The Genome Center at Washington University  

SciTech Connect

Bob Fulton of Washington University discusses the sequencing platforms in use at this large scale genome center on June 2, 2010 at the "Sequencing, Finishing, Analysis in the Future" meeting in Santa Fe, NM.

Fulton, Bob [Washington University]

2010-06-02

12

Malaria Genome Sequencing Project.  

National Technical Information Service (NTIS)

The objectives of this 5-year Cooperative Agreement between TIGR and the Malaria Program, NMRC, were to: Specific Aim 1, sequence 3.5 Mb of P. falciparum genomic DNA; Specific Aim 2, annotate the sequence; Specific Aim 3, release the information to the sc...

M. J. Gardner

2002-01-01

13

Malaria Genome Sequencing Project.  

National Technical Information Service (NTIS)

The objectives of this Cooperative Agreement were: Specific Aim 1, sequence 3.5 Mb of P. falciparum genomic DNA; Specific Aim 2, annotate the sequence; Specific Aim 3, release the information to the scientific community. Two Specific Aims were added to th...

M. J. Gardner

2004-01-01

14

Malaria Genome Sequencing Project.  

National Technical Information Service (NTIS)

The objectives of this 5-year Cooperative Agreement between TIGR and the Malaria Program, NMRC, were to: Specific Aim 1, sequence 3.5 Mb of P. falciparum genomic DNA; Specific Aim 2, annotate the sequence; Specific Aim 3, release the information to the sc...

M. J. Gardner

2000-01-01

15

Malaria Genome Sequencing Project.  

National Technical Information Service (NTIS)

The objectives of this 5-year Cooperative Agreement between TIGR and the Malaria Program, NMRC, were to: (Specific Aim 1) sequence 3.5 Mb of P. falciparum genomic DNA; (Specific Aim 2) annotate the sequence; (Specific Aim 3) release the information to the...

M. J. Gardner

2003-01-01

16

Malaria Genome Sequencing Project.  

National Technical Information Service (NTIS)

The objectives of this 5-year Cooperative Agreement between TIGR and the Malaria Program, NMRC, were to: Specific Aim 1, sequence 3.5 Mb of P. falciparum genomic DNA; Specific Aim 2, annotate the sequence; Specific Aim 3, release the information to the sc...

M. J. Gardner

2001-01-01

17

Bacterial genome sequencing.  

PubMed

For over 30 yr, the Sanger method has been the standard for DNA sequencing. Instruments have been developed and improved over time to increase throughput, but they always relied on the same technology. Today, we are facing a revolution in DNA sequencing with many drastically different platforms that have become or will soon become available on the market. We review a number of sequencing technologies and provide examples of applications. We also discuss the impact genomics and new DNA sequencing approaches have had on various fields of biological research. PMID:19521879

Tettelin, Hervé; Feldblyum, Tamara

2009-01-01

18

Prenatal Whole Genome Sequencing  

PubMed Central

With whole genome sequencing set to become the preferred method of prenatal screening, we need to pay more attention to the massive amount of information it will deliver to parents—and the fact that we don't yet understand what most of it means.

Donley, Greer; Hull, Sara Chandros; Berkman, Benjamin E.

2014-01-01

19

NCI Center for Cancer Genomics  

Cancer.gov

NCI’s Center for Cancer Genomics applies genome science to better diagnose and treat cancer patients. The Center supports research to identify the genetic drivers of cancer and to advance the adoption of precise tumor diagnosis and treatment.

20

Genome Sequence Databases (Overview): Sequencing and Assembly  

SciTech Connect

From the date its role in heredity was discovered, DNA has been generating interest among scientists from different fields of knowledge: physicists have studied the three dimensional structure of the DNA molecule, biologists tried to decode the secrets of life hidden within these long molecules, and technologists invent and improve methods of DNA analysis. The analysis of the nucleotide sequence of DNA occupies a special place among the methods developed. Thanks to the variety of sequencing technologies available, the process of decoding the sequence of genomic DNA (or whole genome sequencing) has become robust and inexpensive. Meanwhile the assembly of whole genome sequences remains a challenging task. In addition to the need to assemble millions of DNA fragments of different length (from 35 bp (Solexa) to 800 bp (Sanger)), great interest in analysis of microbial communities (metagenomes) of different complexities raises new problems and pushes some new requirements for sequence assembly tools to the forefront. The genome assembly process can be divided into two steps: draft assembly and assembly improvement (finishing). Despite the fact that automatically performed assembly (or draft assembly) is capable of covering up to 98% of the genome, in most cases, it still contains incorrectly assembled reads. The error rate of the consensus sequence produced at this stage is about 1/2000 bp. A finished genome represents the genome assembly of much higher accuracy (with no gaps or incorrectly assembled areas) and quality (approximately 1 error/10,000 bp), validated through a number of computer and laboratory experiments.

Lapidus, Alla L.

2009-01-01
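
The draft and finished error rates quoted in the abstract above can be put on a common scale. The sketch below is illustrative only: it converts each rate to a Phred-style quality value, Q = -10 log10(p), and estimates the expected number of consensus errors for a genome of a given size. The Phred scale is a standard convention that the abstract itself does not mention, and the 5 Mb genome size is a hypothetical example value, not a figure from the record.

```python
import math


def phred_quality(error_rate: float) -> float:
    """Convert a per-base error probability to a Phred-style quality value."""
    return -10.0 * math.log10(error_rate)


def expected_errors(genome_size_bp: int, error_rate: float) -> float:
    """Expected number of erroneous bases in a consensus of the given size."""
    return genome_size_bp * error_rate


# Error rates quoted in the abstract.
DRAFT_ERROR_RATE = 1 / 2000
FINISHED_ERROR_RATE = 1 / 10_000

# Hypothetical genome size, chosen only for illustration.
GENOME_SIZE_BP = 5_000_000

for label, rate in (("draft", DRAFT_ERROR_RATE), ("finished", FINISHED_ERROR_RATE)):
    q = phred_quality(rate)
    errors = expected_errors(GENOME_SIZE_BP, rate)
    print(f"{label}: Q{q:.1f}, about {errors:.0f} expected errors in {GENOME_SIZE_BP} bp")
```

Under these assumptions, finishing reduces the expected error count fivefold, which is the ratio of the two quoted rates.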

21

Fungal Genome Sequencing and Bioenergy  

SciTech Connect

To date, the number of ongoing filamentous fungal genome sequencing projects is almost tenfold fewer than those of bacterial and archaeal genome projects. The fungi chosen for sequencing represent narrow kingdom diversity; most are pathogens or models. We advocate an ambitious, forward-looking phylogenetic-based genome sequencing program, designed to capture metabolic diversity within the fungal kingdom, thereby enhancing research into alternative bioenergy sources, bioremediation, and fungal-environment interactions.

Schadt, Christopher Warren [ORNL]; Baker, Scott [Pacific Northwest National Laboratory (PNNL)]; Thykaer, Jette [Pacific Northwest National Laboratory (PNNL)]; Adney, William S [National Renewable Energy Laboratory (NREL)]; Brettin, Tom [Los Alamos National Laboratory (LANL)]; Brockman, Fred [Pacific Northwest National Laboratory (PNNL)]; Dhaeseleer, Patrick [Lawrence Livermore National Laboratory (LLNL)]; Martinez, A Diego [Los Alamos National Laboratory (LANL)]; Miller, R Michael [Argonne National Laboratory (ANL)]; Rokhsar, Daniel [U.S. Department of Energy, Joint Genome Institute]; Torok, Tamas [U.S. Department of Energy, Joint Genome Institute]; Tuskan, Gerald A [ORNL]; Bennett, Joan [Rutgers University]; Berka, Randy [Novozymes, Inc]; Briggs, Steven [University of California, San Diego]; Heitman, Joseph [Duke University]; Rizvi, L [Royal Ontario Museum]; Taylor, John [University of California, Berkeley]; Turgeon, Gillian [Cornell University]; Werner-Washburne, Maggie [University of New Mexico, Albuquerque]; Himmel, Michael [ORNL]

2008-01-01

22

Fungal Genome Sequencing and Bioenergy  

SciTech Connect

To date, the number of ongoing filamentous fungal genome sequencing projects is almost tenfold fewer than those of bacterial and archaeal genome projects. The fungi chosen for sequencing represent narrow kingdom diversity; most are pathogens or models. We advocate an ambitious, forward-looking phylogenetic-based genome sequencing program, designed to capture metabolic diversity within the fungal kingdom, thereby enhancing research into alternative bioenergy sources, bioremediation, and fungal-environment interactions. Published by Elsevier Ltd on behalf of The British Mycological Society.

Baker, Scott [Pacific Northwest National Laboratory (PNNL)]; Thykaer, Jette [Pacific Northwest National Laboratory (PNNL)]; Adney, William S [National Renewable Energy Laboratory (NREL)]; Brettin, Tom [Los Alamos National Laboratory (LANL)]; Brockman, Fred [Pacific Northwest National Laboratory (PNNL)]; Dhaeseleer, Patrick [Lawrence Livermore National Laboratory (LLNL)]; Martinez, A Diego [Los Alamos National Laboratory (LANL)]; Miller, R Michael [Argonne National Laboratory (ANL)]; Rokhsar, Daniel [U.S. Department of Energy, Joint Genome Institute]; Schadt, Christopher Warren [ORNL]; Torok, Tamas [U.S. Department of Energy, Joint Genome Institute]; Tuskan, Gerald A [ORNL]; Bennett, Joan [Rutgers University]; Berka, Randy [Novozymes, Inc]; Briggs, Steven [University of California, San Diego]; Heitman, Joseph [Duke University]; Taylor, John [University of California, Berkeley]; Turgeon, Gillian [Cornell University]; Werner-Washburne, Maggie [University of New Mexico, Albuquerque]; Himmel, Michael E [National Renewable Energy Laboratory (NREL)]

2008-01-01

23

Whole-exome/genome sequencing and genomics.  

PubMed

As medical genetics has progressed from a descriptive entity to one focused on the functional relationship between genes and clinical disorders, emphasis has been placed on genomics. Genomics, a subelement of genetics, is the study of the genome, the sum total of all the genes of an organism. The human genome, which is contained in the 23 pairs of nuclear chromosomes and in the mitochondrial DNA of each cell, comprises >6 billion nucleotides of genetic code. There are some 23,000 protein-coding genes, a surprisingly small fraction of the total genetic material, with the remainder composed of noncoding DNA, regulatory sequences, and introns. The Human Genome Project, launched in 1990, produced a draft of the genome in 2001 and then a finished sequence in 2003, on the 50th anniversary of the initial publication of Watson and Crick's paper on the double-helical structure of DNA. Since then, this mass of genetic information has been translated at an ever-increasing pace into useable knowledge applicable to clinical medicine. The recent advent of massively parallel DNA sequencing (also known as shotgun, high-throughput, and next-generation sequencing) has brought whole-genome analysis into the clinic for the first time, and most of the current applications are directed at children with congenital conditions that are undiagnosable by using standard genetic tests for single-gene disorders. Thus, pediatricians must become familiar with this technology, what it can and cannot offer, and its technical and ethical challenges. Here, we address the concepts of human genomic analysis and its clinical applicability for primary care providers. PMID:24298129

Grody, Wayne W; Thompson, Barry H; Hudgins, Louanne

2013-12-01

24

MIPS: a database for protein sequences and complete genomes  

Microsoft Academic Search

The MIPS group (Munich Information Center for Protein Sequences of the German National Center for Environment and Health (GSF)) at the Max-Planck-Institute for Biochemistry, Martinsried near Munich, Germany, is involved in a number of data collection activities, including a comprehensive database of the yeast genome, a database reflecting the progress in sequencing the Arabidopsis thaliana genome, the systematic analysis

Hans-Werner Mewes; Jean Hani; Friedhelm Pfeiffer; Dmitrij Frishman

1998-01-01

25

MIPS: a database for genomes and protein sequences  

Microsoft Academic Search

The Munich Information Center for Protein Sequences (MIPS-GSF), Martinsried, near Munich, Germany, continues its longstanding tradition to develop and maintain high quality curated genome databases. In addition, efforts have been intensified to cover the wealth of complete genome sequences in a systematic, comprehensive form. Bioinformatics, supporting national as well as European sequencing and functional analysis projects, has resulted in several

Hans-Werner Mewes; Dmitrij Frishman; Christian Gruber; Birgitta Geier; Dirk Haase; Andreas Kaps; Kai Lemcke; Gertrud Mannhaupt; Friedhelm Pfeiffer; Christine M. Schüller; S. Stocker; B. Weil

2000-01-01

26

Sequencing Complex Genomic Regions  

SciTech Connect

Evan Eichler, Howard Hughes Medical Investigator at the University of Washington, gives the May 28, 2009 keynote speech at the "Sequencing, Finishing, Analysis in the Future" meeting in Santa Fe, NM. Part 1 of 2

Eichler, Evan [University of Washington]

2009-05-28

27

Sequencing Complex Genomic Regions  

SciTech Connect

Evan Eichler, Howard Hughes Medical Investigator at the University of Washington, gives the May 28, 2009 keynote speech at the "Sequencing, Finishing, Analysis in the Future" meeting in Santa Fe, NM. Part 2 of 2

Eichler, Evan [University of Washington]

2009-05-28

28

Genome sequences and great expectations.  

PubMed

To assess how automatic function assignment will contribute to genome annotation in the next five years, we have performed an analysis of 31 available genome sequences. An emerging pattern is that function can be predicted for almost two-thirds of the 73,500 genes that were analyzed. Despite progress in computational biology, there will always be a great need for large-scale experimental determination of protein function. PMID:11178275

Iliopoulos, I; Tsoka, S; Andrade, M A; Janssen, P; Audit, B; Tramontano, A; Valencia, A; Leroy, C; Sander, C; Ouzounis, C A

2001-01-01

29

Evidence from genome-wide simple sequence repeat markers for a polyphyletic origin and secondary centers of genetic diversity of Brassica juncea in China and India.  

PubMed

The oilseed Brassica juncea is an important crop with a long history of cultivation in India and China. Previous studies have suggested a polyphyletic origin of B. juncea and more than one migration from the primary to secondary centers of diversity. We investigated molecular genetic diversity based on 99 simple sequence repeat markers in 119 oilseed B. juncea varieties from China, India, Europe, and Australia to test whether molecular differentiation follows Vavilov's proposal of secondary centers of diversity in India and China. Two distinct groups were identified by markers in the A genome, and the same two groups were confirmed by markers in the B genome. Group 1 included accessions from central and western India, in addition to those from eastern China. Group 2 included accessions from central and western China, as well as those from northern and eastern India. European and Australian accessions were found only in Group 2. Chinese accessions had higher allelic diversity per accession (Group 1) and more private alleles per accession (Groups 1 and 2) than those from India. The marker data and geographic distribution of Groups 1 and 2 were consistent with two independent migrations of B. juncea from its center of origin in the Middle East and neighboring regions along trade routes to western China and northern India, followed by regional adaptation. Group 1 migrated further south and west in India, and further east in China, than Group 2. Group 2 showed diverse agroecological adaptation, with yellow-seeded spring-sown types in central and western China and brown-seeded autumn-sown types in India. PMID:23519868

Chen, Sheng; Wan, Zhenjie; Nelson, Matthew N; Chauhan, Jitendra S; Redden, Robert; Burton, Wayne A; Lin, Ping; Salisbury, Phillip A; Fu, Tingdong; Cowling, Wallace A

2013-01-01

30

Development in Rice Genome Research Based on Accurate Genome Sequence  

PubMed Central

Rice is one of the most important crops in the world. Although genetic improvement is a key technology for the acceleration of rice breeding, a lack of genome information had restricted efforts in molecular-based breeding until the completion of the high-quality rice genome sequence, which opened new opportunities for research in various areas of genomics. The syntenic relationship of the rice genome to other cereal genomes makes the rice genome invaluable for understanding how cereal genomes function. Producing an accurate genome sequence is not an easy task, and it is becoming more important as sequence deviations among, and even within, species highlight functional or evolutionary implications for comparative genomics.

Matsumoto, Takashi; Wu, Jianzhong; Antonio, Baltazar A.; Sasaki, Takuji

2008-01-01

31

The diploid genome sequence of Candida albicans  

Microsoft Academic Search

We present the diploid genome sequence of the fungal pathogen Candida albicans. Because C. albicans has no known haploid or homozygous form, sequencing was performed as a whole-genome shotgun of the heterozygous diploid genome in strain SC5314, a clinical isolate that is the parent of strains widely used for molecular analysis. We developed computational methods to assemble a diploid genome

Ted Jones; Nancy A. Federspiel; Hiroji Chibana; Jan Dungan; Sue Kalman; B. B. Magee; George Newport; Yvonne R. Thorstenson; Nina Agabian; P. T. Magee; Ronald W. Davis; Stewart Scherer

2004-01-01

32

The Sequence of the Human Genome  

Microsoft Academic Search

A 2.91-billion base pair (bp) consensus sequence of the euchromatic portion of the human genome was generated by the whole-genome shotgun sequencing method. The 14.8-billion bp DNA sequence was generated over 9 months from 27,271,853 high-quality sequence reads (5.11-fold coverage of the genome) from both ends of plasmid clones made from the DNA of five individuals. Two assembly strategies—a whole-genome

J. Craig Venter; Mark D. Adams; Eugene W. Myers; Peter W. Li; Richard J. Mural; Granger G. Sutton; Hamilton O. Smith; Mark Yandell; Cheryl A. Evans; Robert A. Holt; Jeannine D. Gocayne; Peter Amanatides; Richard M. Ballew; Daniel H. Huson; Jennifer R. Wortman; Qing Zhang; Chinnappa D. Kodira; Xiangqun H. Zheng; Lin Chen; Marian Skupski; Gangadharan Subramanian; Paul D. Thomas; Jinghui Zhang; George L. Gabor Miklos; Catherine Nelson; Samuel Broder; Andrew G. Clark; Joe Nadeau; Victor A. McKusick; Norton Zinder; Arnold J. Levine; Mel Simon; Carolyn Slayman; Michael Hunkapiller; Randall Bolanos; Arthur Delcher; Ian Dew; Daniel Fasulo; Michael Flanigan; Liliana Florea; Aaron Halpern; Sridhar Hannenhalli; Saul Kravitz; Samuel Levy; Clark Mobarry; Knut Reinert; Karin Remington; Jane Abu-Threideh; Ellen Beasley; Kendra Biddick; Vivien Bonazzi; Rhonda Brandon; Michele Cargill; Ishwar Chandramouliswaran; Rosane Charlab; Kabir Chaturvedi; Zuoming Deng; Valentina Di Francesco; Patrick Dunn; Karen Eilbeck; Carlos Evangelista; Andrei E. Gabrielian; Weiniu Gan; Wangmao Ge; Fangcheng Gong; Zhiping Gu; Ping Guan; Thomas J. Heiman; Maureen E. Higgins; Rui-Ru Ji; Zhaoxi Ke; Karen A. Ketchum; Zhongwu Lai; Yiding Lei; Zhenya Li; Jiayin Li; Yong Liang; Xiaoying Lin; Fu Lu; Gennady V. Merkulov; Natalia Milshina; Helen M. Moore; Ashwinikumar K Naik; Vaibhav A. Narayan; Beena Neelam; Deborah Nusskern; Douglas B. Rusch; Steven Salzberg; Wei Shao; Bixiong Shue; Jingtao Sun; Zhen Yuan Wang; Aihui Wang; Xin Wang; Jian Wang; Ming-Hui Wei; Ron Wides; Chunlin Xiao; Chunhua Yan; Alison Yao; Jane Ye; Ming Zhan; Weiqing Zhang; Hongyu Zhang; Qi Zhao; Liansheng Zheng; Fei Zhong; Wenyan Zhong; Shiaoping C. Zhu; Shaying Zhao; Dennis Gilbert; Suzanna Baumhueter; Gene Spier; Christine Carter; Anibal Cravchik; Trevor Woodage; Feroze Ali; Huijin An; Aderonke Awe; Danita Baldwin; Holly Baden; Mary Barnstead; Ian Barrow; Karen Beeson; Dana Busam; Amy Carver; Ming Lai Cheng; Liz Curry; Steve Danaher; Lionel Davenport; Raymond Desilets; Susanne Dietz; Kristina Dodson; Lisa Doup; Steven Ferriera; Neha Garg; Andres Gluecksmann; Brit Hart; Jason Haynes; Charles Haynes; Cheryl Heiner; Suzanne Hladun; Damon Hostin; Jarrett Houck; Timothy Howland; Chinyere Ibegwam; Jeffery Johnson; Francis Kalush; Lesley Kline; Shashi Koduru; Amy Love; Felecia Mann; David May; Steven McCawley; Tina McIntosh; Ivy McMullen; Mee Moy; Linda Moy; Brian Murphy; Keith Nelson; Cynthia Pfannkoch; Eric Pratts; Vinita Puri; Hina Qureshi; Matthew Reardon; Robert Rodriguez; Yu-Hui Rogers; Deanna Romblad; Bob Ruhfel; Richard Scott; Cynthia Sitter; Michelle Smallwood; Erin Stewart; Renee Strong; Ellen Suh; Reginald Thomas; Ni Ni Tint; Sukyee Tse; Claire Vech; Gary Wang; Jeremy Wetter; Sherita Williams; Monica Williams; Sandra Windsor; Emily Winn-Deen; Keriellen Wolfe; Jayshree Zaveri; Karena Zaveri; Josep F. Abril; Roderic Guigo; Michael J. Campbell; Kimmen V. Sjolander; Brian Karlak; Anish Kejariwal; Huaiyu Mi; Betty Lazareva; Thomas Hatton; Apurva Narechania; Karen Diemer; Anushya Muruganujan; Nan Guo; Shinji Sato; Vineet Bafna; Sorin Istrail; Ross Lippert; Russell Schwartz; Brian Walenz; Shibu Yooseph; David Allen; Anand Basu; James Baxendale; Louis Blick; Marcelo Caminha; John Carnes-Stine; Parris Caulk; Yen-Hui Chiang; Carl Dahlke; Anne Deslattes Mays; Maria Dombroski; Michael Donnelly; Dale Ely; Shiva Esparham; Carl Fosler; Harold Gire; Stephen Glanowski; Kenneth Glasser; Anna Glodek; Mark Gorokhov; Ken Graham; Barry Gropman; Michael Harris; Jeremy Heil; Scott Henderson; Jeffrey Hoover; Donald Jennings; John Kasha; Leonid Kagan; Cheryl Kraft; Alexander Levitsky; Mark Lewis; Xiangjun Liu; John Lopez; Daniel Ma; William Majoros; Joe McDaniel; Sean Murphy; Matthew Newman; Trung Nguyen; Ngoc Nguyen; Marc Nodell; Sue Pan; Jim Peck; Marshall Peterson; William Rowe; Robert Sanders; John Scott; Michael Simpson; Thomas Smith; Arlan Sprague; Timothy Stockwell; Russell Turner; Eli Venter; Mei Wang; Meiyuan Wen; David Wu; Mitchell Wu; Ashley Xia; Ali Zandieh; Xiaohong Zhu

2001-01-01
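
The read and coverage figures in the abstract above are related by simple arithmetic. The sketch below is illustrative only: it recomputes fold coverage as total sequenced bases divided by the size of the consensus sequence, and the mean read length as total bases divided by the number of reads. The constants are the rounded values quoted in the abstract; the function names are hypothetical.

```python
# Rounded figures quoted in the abstract.
TOTAL_SEQUENCED_BP = 14.8e9   # total DNA sequence generated
NUM_READS = 27_271_853        # high-quality sequence reads
CONSENSUS_BP = 2.91e9         # euchromatic consensus sequence


def fold_coverage(total_bp: float, genome_bp: float) -> float:
    """Average number of times each genome position is covered by a read."""
    return total_bp / genome_bp


def mean_read_length(total_bp: float, num_reads: int) -> float:
    """Average number of bases per read."""
    return total_bp / num_reads


coverage = fold_coverage(TOTAL_SEQUENCED_BP, CONSENSUS_BP)
read_len = mean_read_length(TOTAL_SEQUENCED_BP, NUM_READS)
print(f"fold coverage: {coverage:.2f}x")
print(f"mean read length: {read_len:.0f} bp")
```

With these rounded inputs the computed coverage comes out near 5.09-fold, close to the 5.11-fold reported. The small difference presumably reflects rounding of the quoted totals or a slightly different genome-size denominator; the abstract does not say which.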

33

Personal genome sequencing: current approaches and challenges  

PubMed Central

The revolution in DNA sequencing technologies has now made it feasible to determine the genome sequences of many individuals; i.e., “personal genomes.” Genome sequences of cells and tissues from both normal and disease states have been determined. Using current approaches, whole human genome sequences are not typically assembled and determined de novo, but, instead, variations relative to a reference sequence are identified. We discuss the current state of personal genome sequencing, the main steps involved in determining a genome sequence (i.e., identifying single-nucleotide polymorphisms [SNPs] and structural variations [SVs], assembling new sequences, and phasing haplotypes), and the challenges and performance metrics for evaluating the accuracy of the reconstruction. Finally, we consider the possible individual and societal benefits of personal genome sequences.

Snyder, Michael; Du, Jiang; Gerstein, Mark

2010-01-01

34

Plant genome sequencing - applications for crop improvement.  

PubMed

It is over 10 years since the genome sequence of the first crop was published. Since then, the number of crop genomes sequenced each year has increased steadily. The amazing pace at which genome sequences are becoming available is largely due to the improvement in sequencing technologies both in terms of cost and speed. Modern sequencing technologies allow the sequencing of multiple cultivars of smaller crop genomes at a reasonable cost. Though many of the published genomes are considered incomplete, they nevertheless have proved a valuable tool to understand important crop traits such as fruit ripening, grain traits and flowering time adaptation. PMID:24679255

Bolger, Marie E; Weisshaar, Bernd; Scholz, Uwe; Stein, Nils; Usadel, Björn; Mayer, Klaus F X

2014-04-01

35

Genome sequencing of lymphoid malignancies.  

PubMed

Our understanding of the pathogenesis of lymphoid malignancies has been transformed by next-generation sequencing. The studies in this review have used whole-genome, exome, and transcriptome sequencing to identify recurring structural genetic alterations and sequence mutations that target key cellular pathways in acute lymphoblastic leukemia (ALL) and the lymphomas. Although each tumor type is characterized by a unique genomic landscape, several cellular pathways are mutated in multiple tumor types-transcriptional regulation of differentiation, antigen receptor signaling, tyrosine kinase and Ras signaling, and epigenetic modifications-and individual genes are mutated in multiple tumors, notably TCF3, NOTCH1, MYD88, and BRAF. In addition to providing fundamental insights into tumorigenesis, these studies have also identified potential new markers for diagnosis, risk stratification, and therapeutic intervention. Several genetic alterations are intuitively "druggable" with existing agents, for example, kinase-activating lesions in high-risk B-cell ALL, NOTCH1 in both leukemia and lymphoma, and BRAF in hairy cell leukemia. Future sequencing efforts are required to comprehensively define the genetic basis of all lymphoid malignancies, examine the relative roles of germline and somatic variation, dissect the genetic basis of clonal heterogeneity, and chart a course for clinical sequencing and translation to improved therapeutic outcomes. PMID:24041576

Mullighan, Charles G

2013-12-01

36

MIPS: a database for genomes and protein sequences  

Microsoft Academic Search

The Munich Information Center for Protein Sequences (MIPS-GSF), Martinsried near Munich, Germany, develops and maintains genome oriented databases. It is commonplace that the amount of sequence data available increases rapidly, but not the capacity of qualified manual annotation at the sequence databases. Therefore, our strategy aims to cope with the data stream by the comprehensive application of

Hans-Werner Mewes; Klaus Heumann; Andreas Kaps; Klaus F. X. Mayer; Friedhelm Pfeiffer; S. Stocker; Dmitrij Frishman

1999-01-01

37

MIPS: a database for genomes and protein sequences  

Microsoft Academic Search

The Munich Information Center for Protein Sequences (MIPS-GSF, Neuherberg, Germany) continues to provide genome-related information in a systematic way. MIPS supports both national and European sequencing and functional analysis projects, develops and maintains automatically generated and manually annotated genome-specific databases, develops systematic classification schemes for the functional annotation of protein sequences, and provides tools for the comprehensive analysis of protein

Hans-Werner Mewes; Dmitrij Frishman; Ulrich Güldener; Gertrud Mannhaupt; Klaus F. X. Mayer; Martin Mokrejs; Burkhard Morgenstern; Martin Münsterkötter; Stephen Rudd; B. Weil

2002-01-01

38

Complementary DNA Sequencing: Expressed Sequence Tags and Human Genome Project  

Microsoft Academic Search

Automated partial DNA sequencing was conducted on more than 600 randomly selected human brain complementary DNA (cDNA) clones to generate expressed sequence tags (ESTs). ESTs have applications in the discovery of new human genes, mapping of the human genome, and identification of coding regions in genomic sequences. Of the sequences generated, 337 represent new genes, including 48 with significant similarity

Mark D. Adams; Jenny M. Kelley; Jeannine D. Gocayne; Mark Dubnick; Mihael H. Polymeropoulos; Hong Xiao; Carl R. Merril; Andrew Wu; Bjorn Olde; Ruben F. Moreno; Anthony R. Kerlavage; W. Richard McCombie; J. Craig Venter

1991-01-01

39

Genome sequencing and functional genomics approaches in tomato  

Microsoft Academic Search

Tomato genome sequencing has been taking place through an international, 10-year initiative entitled the “International Solanaceae Genome Project” (SOL). The strategy proposed by the SOL consortium is to sequence the approximately 220 Mb of euchromatin that contains the majority of genes, rather than the entire tomato genome. Tomato and other Solanaceae plants have unique developmental aspects, such as the formation of

Daisuke Shibata

2005-01-01

40

The diploid genome sequence of Candida albicans  

PubMed Central

We present the diploid genome sequence of the fungal pathogen Candida albicans. Because C. albicans has no known haploid or homozygous form, sequencing was performed as a whole-genome shotgun of the heterozygous diploid genome in strain SC5314, a clinical isolate that is the parent of strains widely used for molecular analysis. We developed computational methods to assemble a diploid genome sequence in good agreement with available physical mapping data. We provide a whole-genome description of heterozygosity in the organism. Comparative genomic analyses provide important clues about the evolution of the species and its mechanisms of pathogenesis.

Jones, Ted; Federspiel, Nancy A.; Chibana, Hiroji; Dungan, Jan; Kalman, Sue; Magee, B. B.; Newport, George; Thorstenson, Yvonne R.; Agabian, Nina; Magee, P. T.; Davis, Ronald W.; Scherer, Stewart

2004-01-01

41

Sequencing Intractable DNA to Close Microbial Genomes  

SciTech Connect

Advancement in high throughput DNA sequencing technologies has supported a rapid proliferation of microbial genome sequencing projects, providing the genetic blueprint for in-depth studies. Oftentimes, difficult-to-sequence regions in microbial genomes are ruled intractable, resulting in a growing number of genomes with sequence gaps deposited in databases. A procedure was developed to sequence such difficult regions in the non-contiguous finished Desulfovibrio desulfuricans ND132 genome (6 intractable gaps) and the Desulfovibrio africanus genome (1 intractable gap). The polynucleotides surrounding each gap formed GC-rich secondary structures, making the regions refractory to amplification and sequencing. Strand-displacing DNA polymerases used in concert with a novel ramped PCR extension cycle supported amplification and closure of all gap regions in both genomes. These developed procedures support accurate gene annotation, and provide a step-wise method that reduces the effort required for genome finishing.

Hurt, Jr., Richard Ashley [ORNL]; Brown, Steven D [ORNL]; Podar, Mircea [ORNL]; Palumbo, Anthony Vito [ORNL]; Elias, Dwayne A [ORNL]

2012-01-01

42

Draft Genome Sequence of Lactobacillus rhamnosus 2166  

PubMed Central

In this report, we present a draft sequence of the genome of Lactobacillus rhamnosus strain 2166, a potential novel probiotic. Genome annotation and read mapping onto a reference genome of L. rhamnosus strain GG allowed for the identification of the differences and similarities in the genomic contents and gene arrangements of these strains.

Melnikov, Vyacheslav G.; Kosarev, Igor V.; Abramov, Vyacheslav M.

2014-01-01

43

Value of a newly sequenced bacterial genome  

PubMed Central

Next-generation sequencing (NGS) technologies have made high-throughput sequencing available to medium- and small-size laboratories, culminating in a tidal wave of genomic information. The quantity of sequenced bacterial genomes has not only brought excitement to the field of genomics but also heightened expectations that NGS would boost antibacterial discovery and vaccine development. Although many possible drug and vaccine targets have been discovered, the success rate of genome-based analysis has remained below expectations. Furthermore, NGS has had consequences for genome quality, resulting in an exponential increase in draft (partial data) genome deposits in public databases. If no further interests are expressed for a particular bacterial genome, it is more likely that the sequencing of its genome will be limited to a draft stage, and the painstaking tasks of completing the sequencing of its genome and annotation will not be undertaken. It is important to know what is lost when we settle for a draft genome and to determine the “scientific value” of a newly sequenced genome. This review addresses the expected impact of newly sequenced genomes on antibacterial discovery and vaccinology. Also, it discusses the factors that could be leading to the increase in the number of draft deposits and the consequent loss of relevant biological information.

Barbosa, Eudes GV; Aburjaile, Flavia F; Ramos, Rommel TJ; Carneiro, Adriana R; Le Loir, Yves; Baumbach, Jan; Miyoshi, Anderson; Silva, Artur; Azevedo, Vasco

2014-01-01

44

Genome Sequence of Pseudomonas mandelii PD30  

PubMed Central

The genome sequence of Pseudomonas mandelii PD30 is reported in this announcement. The genes for the reduction of nitrate to dinitrogen were identified in the genome assembly and subsequently used in gene expression research.

Formusa, Philip A.; Hsiang, Tom; Habash, Marc B.; Lee, Hung

2014-01-01

45

Parking Strategies for Genome Sequencing  

PubMed Central

The parking strategy is an iterative approach to DNA sequencing. Each iteration consists of sequencing a novel portion of target DNA that does not overlap any previously sequenced region. Subject to the constraint of no overlap, each new region is chosen randomly. A parking strategy is often ideal in the early stages of a project for rapidly generating unique data. As a project progresses, parking becomes progressively more expensive and eventually prohibitive. We present a mathematical model with a generalization to allow for overlaps. This model predicts multiple parameters, including progress, costs, and the distribution of gap sizes left by a parking strategy. The highly fragmented nature of the gaps left after an initial parking strategy may make it difficult to finish a project efficiently. Therefore, in addition to our parking model, we model gap closing by walking. Our gap-closing model is generalizable to many other strategies. Our discussion includes modified parking strategies and hybrids with other strategies. A hybrid parking strategy has been employed for portions of the Human Genome Project.

Roach, Jared C.; Thorsson, Vesteinn; Siegel, Andrew F.

2000-01-01
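
The parking strategy described in the record above can be illustrated with a small Monte Carlo simulation. This is only a minimal sketch and not the authors' mathematical model: the discrete target, the fixed read length, and the names `simulate_parking` and `gap_sizes` are assumptions made for illustration. Candidate regions are drawn at random and accepted only if they do not overlap anything already sequenced, so rejected draws stand in for the rising cost of parking late in a project.

```python
import random

def simulate_parking(target_len, read_len, max_attempts, seed=0):
    """Return sorted start positions of non-overlapping reads placed at random.

    Each attempt draws a random start; the read is kept only if none of its
    bases has been sequenced before (the "no overlap" constraint).
    """
    rng = random.Random(seed)
    covered = [False] * target_len
    starts = []
    for _ in range(max_attempts):
        s = rng.randrange(0, target_len - read_len + 1)
        if not any(covered[s:s + read_len]):
            for i in range(s, s + read_len):
                covered[i] = True
            starts.append(s)
    return sorted(starts)

def gap_sizes(starts, target_len, read_len):
    """Return the lengths of the unsequenced gaps left between parked reads."""
    gaps = []
    prev_end = 0
    for s in starts:
        if s > prev_end:
            gaps.append(s - prev_end)
        prev_end = s + read_len
    if prev_end < target_len:
        gaps.append(target_len - prev_end)
    return gaps

if __name__ == "__main__":
    starts = simulate_parking(target_len=1000, read_len=10, max_attempts=5000)
    gaps = gap_sizes(starts, 1000, 10)
    print(len(starts), "reads parked;", len(gaps), "gaps remain")
```

Plotting the number of accepted reads against the number of attempts shows acceptance slowing as the target fills, and the list from `gap_sizes` gives the fragmented gaps that a walking step would then have to close.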

46

Genomic strategies to identify mammalian regulatory sequences  

Microsoft Academic Search

With the continuing accomplishments of the human genome project, high-throughput strategies to identify DNA sequences that are important in mammalian gene regulation are becoming increasingly feasible. In contrast to the historic, labour-intensive, wet-laboratory methods for identifying regulatory sequences, many modern approaches are heavily focused on the computational analysis of large genomic data sets. Data from inter-species genomic sequence comparisons and

Len A. Pennacchio; Edward M. Rubin

2001-01-01

47

BSMAP: whole genome bisulfite sequence MAPping program  

Microsoft Academic Search

BACKGROUND: Bisulfite sequencing is a powerful technique to study DNA cytosine methylation. Bisulfite treatment followed by PCR amplification specifically converts unmethylated cytosines to thymine. Coupled with next generation sequencing technology, it is able to detect the methylation status of every cytosine in the genome. However, mapping high-throughput bisulfite reads to the reference genome remains a great challenge due to the

Yuanxin Xi; Wei Li

2009-01-01
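
The mapping difficulty mentioned in the record above comes from the asymmetry that bisulfite conversion introduces: a T in a read may correspond to either a T or an unmethylated C in the reference. The sketch below illustrates that asymmetry on the forward strand only. It is not BSMAP's algorithm (the abstract is truncated before the method is described), and the function names are hypothetical.

```python
def bisulfite_convert(seq, methylated=frozenset()):
    """Simulate bisulfite treatment plus PCR on the forward strand.

    Every C whose 0-based position is not in `methylated` is read as T.
    """
    return "".join(
        "T" if base == "C" and i not in methylated else base
        for i, base in enumerate(seq)
    )

def bisulfite_match(ref, read):
    """Return True if `read` could derive from `ref` after conversion.

    A read T may align to a reference C or T; other bases must match exactly.
    """
    if len(ref) != len(read):
        return False
    return all(r == q or (r == "C" and q == "T") for r, q in zip(ref, read))

def call_methylation(ref, read):
    """Map each reference C position to True (C retained) or False (read as T)."""
    return {
        i: q == "C"
        for i, (r, q) in enumerate(zip(ref, read))
        if r == "C" and q in "CT"
    }

if __name__ == "__main__":
    ref = "ACGTCCGA"
    read = bisulfite_convert(ref, methylated={1})
    print(read, bisulfite_match(ref, read), call_methylation(ref, read))
```

A real mapper must also handle the reverse strand, mismatches, and indels, all of which this sketch leaves out.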

48

The sequence of the human genome.  

PubMed

A 2.91-billion base pair (bp) consensus sequence of the euchromatic portion of the human genome was generated by the whole-genome shotgun sequencing method. The 14.8-billion bp DNA sequence was generated over 9 months from 27,271,853 high-quality sequence reads (5.11-fold coverage of the genome) from both ends of plasmid clones made from the DNA of five individuals. Two assembly strategies (a whole-genome assembly and a regional chromosome assembly) were used, each combining sequence data from Celera and the publicly funded genome effort. The public data were shredded into 550-bp segments to create a 2.9-fold coverage of those genome regions that had been sequenced, without including biases inherent in the cloning and assembly procedure used by the publicly funded group. This brought the effective coverage in the assemblies to eightfold, reducing the number and size of gaps in the final assembly over what would be obtained with 5.11-fold coverage. The two assembly strategies yielded very similar results that largely agree with independent mapping data. The assemblies effectively cover the euchromatic regions of the human chromosomes. More than 90% of the genome is in scaffold assemblies of 100,000 bp or more, and 25% of the genome is in scaffolds of 10 million bp or larger. Analysis of the genome sequence revealed 26,588 protein-encoding transcripts for which there was strong corroborating evidence and an additional approximately 12,000 computationally derived genes with mouse matches or other weak supporting evidence. Although gene-dense clusters are obvious, almost half the genes are dispersed in low G+C sequence separated by large tracts of apparently noncoding sequence. Only 1.1% of the genome is spanned by exons, whereas 24% is in introns, with 75% of the genome being intergenic DNA. Duplications of segmental blocks, ranging in size up to chromosomal lengths, are abundant throughout the genome and reveal a complex evolutionary history. Comparative genomic analysis indicates vertebrate expansions of genes associated with neuronal function, with tissue-specific developmental regulation, and with the hemostasis and immune systems. DNA sequence comparisons between the consensus sequence and publicly funded genome data provided locations of 2.1 million single-nucleotide polymorphisms (SNPs). A random pair of human haploid genomes differed at a rate of 1 bp per 1250 on average, but there was marked heterogeneity in the level of polymorphism across the genome. Less than 1% of all SNPs resulted in variation in proteins, but the task of determining which SNPs have functional consequences remains an open challenge. PMID:11181995

Venter, J C; Adams, M D; Myers, E W; Li, P W; Mural, R J; Sutton, G G; Smith, H O; Yandell, M; Evans, C A; Holt, R A; Gocayne, J D; Amanatides, P; Ballew, R M; Huson, D H; Wortman, J R; Zhang, Q; Kodira, C D; Zheng, X H; Chen, L; Skupski, M; Subramanian, G; Thomas, P D; Zhang, J; Gabor Miklos, G L; Nelson, C; Broder, S; Clark, A G; Nadeau, J; McKusick, V A; Zinder, N; Levine, A J; Roberts, R J; Simon, M; Slayman, C; Hunkapiller, M; Bolanos, R; Delcher, A; Dew, I; Fasulo, D; Flanigan, M; Florea, L; Halpern, A; Hannenhalli, S; Kravitz, S; Levy, S; Mobarry, C; Reinert, K; Remington, K; Abu-Threideh, J; Beasley, E; Biddick, K; Bonazzi, V; Brandon, R; Cargill, M; Chandramouliswaran, I; Charlab, R; Chaturvedi, K; Deng, Z; Di Francesco, V; Dunn, P; Eilbeck, K; Evangelista, C; Gabrielian, A E; Gan, W; Ge, W; Gong, F; Gu, Z; Guan, P; Heiman, T J; Higgins, M E; Ji, R R; Ke, Z; Ketchum, K A; Lai, Z; Lei, Y; Li, Z; Li, J; Liang, Y; Lin, X; Lu, F; Merkulov, G V; Milshina, N; Moore, H M; Naik, A K; Narayan, V A; Neelam, B; Nusskern, D; Rusch, D B; Salzberg, S; Shao, W; Shue, B; Sun, J; Wang, Z; Wang, A; Wang, X; Wang, J; Wei, M; Wides, R; Xiao, C; Yan, C; Yao, A; Ye, J; Zhan, M; Zhang, W; Zhang, H; Zhao, Q; Zheng, L; Zhong, F; Zhong, W; Zhu, S; Zhao, S; Gilbert, D; Baumhueter, S; Spier, G; Carter, C; Cravchik, A; Woodage, T; Ali, F; An, H; Awe, A; Baldwin, D; Baden, H; Barnstead, M; Barrow, I; Beeson, K; Busam, D; Carver, A; Center, A; Cheng, M L; Curry, L; Danaher, S; Davenport, L; Desilets, R; Dietz, S; Dodson, K; Doup, L; Ferriera, S; Garg, N; Gluecksmann, A; Hart, B; Haynes, J; Haynes, C; Heiner, C; Hladun, S; Hostin, D; Houck, J; Howland, T; Ibegwam, C; Johnson, J; Kalush, F; Kline, L; Koduru, S; Love, A; Mann, F; May, D; McCawley, S; McIntosh, T; McMullen, I; Moy, M; Moy, L; Murphy, B; Nelson, K; Pfannkoch, C; Pratts, E; Puri, V; Qureshi, H; Reardon, M; Rodriguez, R; Rogers, Y H; Romblad, D; Ruhfel, B; Scott, R; Sitter, C; Smallwood, M; Stewart, E; Strong, R; Suh, E; 
Thomas, R; Tint, N N; Tse, S; Vech, C; Wang, G; Wetter, J; Williams, S; Williams, M; Windsor, S; Winn-Deen, E; Wolfe, K; Zaveri, J; Zaveri, K; Abril, J F; Guigó, R; Campbell, M J; Sjolander, K V; Karlak, B; Kejariwal, A; Mi, H; Lazareva, B; Hatton, T; Narechania, A; Diemer, K; Muruganujan, A; Guo, N; Sato, S; Bafna, V; Istrail, S; Lippert, R; Schwartz, R; Walenz, B; Yooseph, S; Allen, D; Basu, A; Baxendale, J; Blick, L; Caminha, M; Carnes-Stine, J; Caulk, P; Chiang, Y H; Coyne, M; Dahlke, C; Mays, A; Dombroski, M; Donnelly, M; Ely, D; Esparham, S; Fosler, C; Gire, H; Glanowski, S; Glasser, K; Glodek, A; Gorokhov, M; Graham, K; Gropman, B; Harris, M; Heil, J; Henderson, S; Hoover, J; Jennings, D; Jordan, C; Jordan, J; Kasha, J; Kagan, L; Kraft, C; Levitsky, A; Lewis, M; Liu, X; Lopez, J; Ma, D; Majoros, W; McDaniel, J; Murphy, S; Newman, M; Nguyen, T; Nguyen, N; Nodell, M; Pan, S; Peck, J; Peterson, M; Rowe, W; Sanders, R; Scott, J; Simpson, M; Smith, T; Sprague, A; Stockwell, T; Turner, R; Venter, E; Wang, M; Wen, M; Wu, D; Wu, M; Xia, A; Zandieh, A; Zhu, X

2001-02-16
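
The coverage figures quoted in the record above can be checked with simple arithmetic. The sketch below uses only numbers stated in the abstract; the variable and function names are illustrative. Dividing by the 2.91-billion bp consensus length gives a value slightly below the quoted 5.11-fold, which presumably reflects a slightly different genome-size denominator in the original calculation (that explanation is a guess, not something the abstract states).

```python
def fold_coverage(total_bases, genome_size):
    """Average number of times each genome position is sequenced."""
    return total_bases / genome_size

# Figures taken from the abstract.
total_bases = 14.8e9        # bp of raw sequence generated
read_count = 27_271_853     # high-quality sequence reads
consensus_len = 2.91e9      # bp in the euchromatic consensus

mean_read_len = total_bases / read_count          # about 543 bp per read
cov = fold_coverage(total_bases, consensus_len)   # about 5.09-fold

# One difference per 1250 bp between two random haploid genomes implies
# about 2.3 million differing positions over the consensus length.
pairwise_differences = consensus_len / 1250

if __name__ == "__main__":
    print(f"mean read length: {mean_read_len:.0f} bp")
    print(f"coverage: {cov:.2f}-fold")
    print(f"expected pairwise differences: {pairwise_differences:,.0f}")
```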

49

Human Genome Sequencing in Health and Disease  

PubMed Central

Following the “finished,” euchromatic, haploid human reference genome sequence, the rapid development of novel, faster, and cheaper sequencing technologies is making possible the era of personalized human genomics. Personal diploid human genome sequences have been generated, and each has contributed to our better understanding of variation in the human genome. We have consequently begun to appreciate the vastness of individual genetic variation from single nucleotide to structural variants. Translation of genome-scale variation into medically useful information is, however, in its infancy. This review summarizes the initial steps undertaken in clinical implementation of personal genome information, and describes the application of whole-genome and exome sequencing to identify the cause of genetic diseases and to suggest adjuvant therapies. Better analysis tools and a deeper understanding of the biology of our genome are necessary in order to decipher, interpret, and optimize clinical utility of what the variation in the human genome can teach us. Personal genome sequencing may eventually become an instrument of common medical practice, providing information that assists in the formulation of a differential diagnosis. We outline herein some of the remaining challenges.

Gonzaga-Jauregui, Claudia; Lupski, James R.; Gibbs, Richard A.

2013-01-01

50

Clinical relevance of cancer genome sequencing  

PubMed Central

The arrival of both high-throughput and bench-top next-generation sequencing technologies and sequence enrichment methods has revolutionized our approach to dissecting the genetic basis of cancer. These technologies have been almost invariably employed in whole-genome sequencing (WGS) and whole-exome sequencing (WES) studies. Both WGS and WES approaches have been widely applied to interrogate the somatic mutational landscape of sporadic cancers and identify novel germline mutations underlying familial cancer syndromes. The clinical implications of cancer genome sequencing have become increasingly clear, for example in diagnostics. In this editorial, we present these advances in the context of research discovery and discuss both the clinical relevance of cancer genome sequencing and the challenges associated with the adoption of these genomic technologies in a clinical setting.

Ku, Chee Seng; Cooper, David N; Roukos, Dimitrios H

2013-01-01

51

A framework for sequencing the rice genome.  

PubMed

Rice is an important food crop and a model plant for other cereal genomes. The Clemson University Genomics Institute framework project, begun two years ago in anticipation of the now ongoing international effort to sequence the rice genome, is nearing completion. Two bacterial artificial chromosome (BAC) libraries have been constructed from the Oryza sativa cultivar Nipponbare. Over 100,000 BAC end sequences have been generated from these libraries and, at a current total of 28 Mbp, represent 6.5% of the total rice genome sequence. This sequence information has allowed us to draw first conclusions about unique and redundant rice genomic sequences. In addition, more than 60,000 clones (19 genome equivalents) have been successfully fingerprinted and assembled into contigs using FPC software. Many of these contigs have been anchored to the rice chromosomes using a variety of techniques. Hybridization experiments have shown these contigs to be very robust. Contig assembly and hybridization experiments have revealed some surprising insights into the organization of the rice genome, which will have significant repercussions for the sequencing effort. Integration of BAC end sequence data with anchored contig information has provided unexpected revelations on sequence organization at the chromosomal level. PMID:11387975

Presting, G G; Budiman, M A; Wood, T; Yu, Y; Kim, H R; Goicoechea, J L; Fang, E; Blackman, B; Jiang, J; Woo, S S; Dean, R A; Frisch, D; Wing, R A

2001-01-01

52

The genome sequence of Drosophila melanogaster.  

SciTech Connect

The fly Drosophila melanogaster is one of the most intensively studied organisms in biology and serves as a model system for the investigation of many developmental and cellular processes common to higher eukaryotes, including humans. We have determined the nucleotide sequence of nearly all of the approximately 120-megabase euchromatic portion of the Drosophila genome using a whole-genome shotgun sequencing strategy supported by extensive clone-based sequence and a high-quality bacterial artificial chromosome physical map. Efforts are under way to close the remaining gaps; however, the sequence is of sufficient accuracy and contiguity to be declared substantially complete and to support an initial analysis of genome structure and preliminary gene annotation and interpretation. The genome encodes approximately 13,600 genes, somewhat fewer than the smaller Caenorhabditis elegans genome, but with comparable functional diversity.

NONE

2000-03-24

53

Genome Walking by Next Generation Sequencing Approaches  

PubMed Central

Genome Walking (GW) comprises a number of PCR-based methods for the identification of nucleotide sequences flanking known regions. The different methods have been used for several purposes: from de novo sequencing, useful for the identification of unknown regions, to the characterization of insertion sites for viruses and transposons. In the latter cases Genome Walking methods have been recently boosted by coupling to Next Generation Sequencing technologies. This review will focus on the development of several protocols for the application of Next Generation Sequencing (NGS) technologies to GW, which have been developed in the course of analysis of insertional libraries. These analyses find broad application in protocols for functional genomics and gene therapy. Thanks to the application of NGS technologies, the original vision of GW as a procedure for walking along an unknown genome is now changing into the possibility of observing the parallel marching of hundreds of thousands of primers across the borders of inserted DNA molecules in host genomes.

Volpicella, Mariateresa; Leoni, Claudia; Costanza, Alessandra; Fanizza, Immacolata; Placido, Antonio; Ceci, Luigi R.

2012-01-01

54

Poultry genome sequences: progress and outstanding challenges.  

PubMed

The first build of the chicken genome sequence appeared in March, 2004 - the first genome sequence of any animal agriculture species. That sequence was done primarily by whole genome shotgun Sanger sequencing, along with the use of an extensive BAC contig-based physical map to assemble the sequence contigs and scaffolds and align them to the known chicken chromosomes and linkage groups. Subsequent sequencing and mapping efforts have improved upon that first build, and efforts continue in search of missing and/or unassembled sequence, primarily on the smaller microchromosomes and the sex chromosomes. In the past year, a draft turkey genome sequence of similar quality has been obtained at a much lower cost primarily due to the development of 'next-generation' sequencing techniques. However, assembly and alignment of the sequence contigs and scaffolds still depended on a detailed BAC contig map of the turkey genome that also utilized comparison to the existing chicken sequence. These 2 land fowl (Galliformes) genomes show a remarkable level of similarity, despite an estimated 30-40 million years of separate evolution since their last common ancestor. Among the advantages offered by these sequences are routine re-sequencing of commercial and research lines to identify the genetic correlates of phenotypic change (for example, selective sweeps), a much improved understanding of poultry diversity and linkage disequilibrium, and access to high-density SNP typing and association analysis, detailed transcriptomic and proteomic studies, and the use of genome-wide marker-assisted selection to enhance genetic gain in commercial stocks. PMID:21335957

Dodgson, J B; Delany, M E; Cheng, H H

2011-01-01

55

Next-generation sequencing: applications beyond genomes  

Microsoft Academic Search

The development of DNA sequencing more than 30 years ago has profoundly impacted biological research. In the last couple of years, remarkable technological innovations have emerged that allow the direct and cost-effective sequencing of complex samples at unprecedented scale and speed. These next-generation technologies make it feasible to sequence not only static genomes, but also entire transcriptomes expressed under different

Samuel Marguerat; Jürg Bähler

2008-01-01

56

Multifractal Analysis of Genomic Sequences CGR Images  

Microsoft Academic Search

To describe the fractal feature of chaos game representation (CGR) images of genomic sequences, a multifractal theory is presented in the analysis. With the probability set of CGR images, the general dimension spectrum and the multifractal spectrum are calculated and compared between two sample groups of gene thick sequences and gene black sequences. The experimental result shows that the probability

Weijuan Fu; Yuanyuan Wang; Daru Lu

2005-01-01
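
For readers unfamiliar with the chaos game representation (CGR) named in the record above, the sketch below shows the standard construction and a grid-based probability set of the kind a multifractal analysis would start from. It is not the authors' code: the corner assignment (A, C, G, T at the four corners of the unit square) is one common convention, and the function names and the grid parameter `k` are assumptions.

```python
# Assumed corner convention for the unit square.
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}

def cgr_points(seq):
    """Return the CGR point for each base in `seq`.

    Starting from the centre of the square, each new point lies halfway
    between the previous point and the corner of the current base.
    Characters other than A, C, G, T are skipped.
    """
    x, y = 0.5, 0.5
    points = []
    for base in seq.upper():
        if base not in CORNERS:
            continue
        cx, cy = CORNERS[base]
        x, y = (x + cx) / 2, (y + cy) / 2
        points.append((x, y))
    return points

def cgr_probabilities(seq, k=3):
    """Return a 2**k by 2**k grid holding the fraction of CGR points per cell."""
    n = 2 ** k
    grid = [[0.0] * n for _ in range(n)]
    points = cgr_points(seq)
    if not points:
        return grid
    for x, y in points:
        col = min(int(x * n), n - 1)
        row = min(int(y * n), n - 1)
        grid[row][col] += 1
    return [[count / len(points) for count in row] for row in grid]

if __name__ == "__main__":
    for row in cgr_probabilities("ACGTTGCAACGGTA", k=2):
        print(row)
```

The cell probabilities returned by `cgr_probabilities` are the kind of measure from which generalized dimensions and a multifractal spectrum are usually computed.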

57

Sequencing the genome of the Atlantic salmon (Salmo salar)  

PubMed Central

The International Collaboration to Sequence the Atlantic Salmon Genome (ICSASG) will produce a genome sequence that identifies and physically maps all genes in the Atlantic salmon genome and acts as a reference sequence for other salmonids.

2010-01-01

58

MapToGenome: A Comparative Genomic Tool that Aligns Transcript Maps to Sequenced Genomes  

PubMed Central

Efforts to generate whole genome assemblies and dense genetic maps have provided a wealth of gene positional information for several vertebrate species. Comparing the relative location of orthologous genes among these genomes provides perspective on genome evolution and can aid in translating genetic information between distantly related organisms. However, large-scale comparisons between genetic maps and genome assemblies can prove challenging because genetic markers are commonly derived from transcribed sequences that are incompletely and variably annotated. We developed the program MapToGenome as a tool for comparing transcript maps and genome assemblies. MapToGenome processes sequence alignments between mapped transcripts and whole genome sequence while accounting for the presence of intronic sequences, and assigns orthology based on user-defined parameters. To illustrate the utility of this program, we used MapToGenome to process alignments between vertebrate genetic maps and genome assemblies 1) self/self alignments for maps and assemblies of the rat and zebrafish genome; 2) alignments between vertebrate transcript maps (rat, salamander, zebrafish, and medaka) and the chicken genome; and 3) alignments of the medaka and zebrafish maps to the pufferfish (Tetraodon nigroviridis) genome. Our results show that map-genome alignments can be improved by combining alignments across presumptive intron breaks and ignoring alignments for simple sequence length polymorphism (SSLP) marker sequences. Comparisons between vertebrate maps and genomes reveal broad patterns of conservation among vertebrate genomes and the differential effects of genome rearrangement over time and across lineages.

Putta, Srikrishna; Smith, Jeramiah J.; Staben, Chuck; Voss, S. Randal

2007-01-01

59

Cancer Genome Sequencing - An Interim Analysis  

PubMed Central

With the publishing of the first complete, whole genome of a human cancer and its paired normal, we have passed a key milestone in the cancer genome sequencing strategy. The generation of such data will, thanks to technical advances, soon become commonplace. As a significant number of proof-of-concept studies have been published, it is important to analyze now the likely implications of this data and how it might frame cancer research in the near future. The diversity of genes mutated within individual tumor-types, the most striking feature of all studies reported to date, challenges gene-centric models of tumorigenesis. While cancer genome sequencing will revolutionize certain aspects of personalized care, the value of these studies in facilitating the development of new therapies, their primary goal, appears less promising. Most significantly, however, the cancer genome sequencing strategy, as currently applied, fails to characterize the most relevant genomic features of cancer – the mutational heterogeneity within individual tumors.

Fox, Edward J.; Salk, Jesse J.; Loeb, Lawrence A.

2009-01-01

60

Streptococcal taxonomy based on genome sequence analyses.  

PubMed

The identification of the clinically relevant viridans streptococci group, at species level, is still problematic. The aim of this study was to extract taxonomic information from the complete genome sequences of 67 streptococci, comprising 19 species, by means of genomic analyses, multilocus sequence analysis (MLSA), average amino acid identity (AAI), genomic signatures, genome-to-genome distances (GGD) and codon usage bias. We then attempted to determine the usefulness of these genomic tools for species identification in streptococci. Our results showed that MLSA, AAI and GGD analyses are robust markers to identify streptococci at the species level, for instance, S. pneumoniae, S. mitis, and S. oralis. A Streptococcus species can be defined as a group of strains that share ≥ 95% DNA similarity in MLSA and AAI, and > 70% DNA identity in GGD. This approach allows an advanced understanding of bacterial diversity. PMID:24358875

Thompson, Cristiane C; Emmel, Vanessa E; Fonseca, Erica L; Marin, Michel A; Vicente, Ana Carolina P

2013-01-01

61

Streptococcal taxonomy based on genome sequence analyses  

PubMed Central

The identification of the clinically relevant viridans streptococci group, at species level, is still problematic. The aim of this study was to extract taxonomic information from the complete genome sequences of 67 streptococci, comprising 19 species, by means of genomic analyses, multilocus sequence analysis (MLSA), average amino acid identity (AAI), genomic signatures, genome-to-genome distances (GGD) and codon usage bias. We then attempted to determine the usefulness of these genomic tools for species identification in streptococci. Our results showed that MLSA, AAI and GGD analyses are robust markers to identify streptococci at the species level, for instance, S. pneumoniae, S. mitis, and S. oralis. A Streptococcus species can be defined as a group of strains that share ≥ 95% DNA similarity in MLSA and AAI, and > 70% DNA identity in GGD. This approach allows an advanced understanding of bacterial diversity.

2013-01-01

62

Genome Sequence of Serratia plymuthica V4.  

PubMed

Serratia spp. are gammaproteobacteria and members of the family Enterobacteriaceae. Here, we announce the genome sequence of Serratia plymuthica strain V4, which produces the siderophore serratiochelin and antimicrobial compounds. PMID:24831138

Cleto, S; Van der Auwera, G; Almeida, C; Vieira, M J; Vlamakis, H; Kolter, R

2014-01-01

63

Genome Sequence of Serratia plymuthica V4  

PubMed Central

Serratia spp. are gammaproteobacteria and members of the family Enterobacteriaceae. Here, we announce the genome sequence of Serratia plymuthica strain V4, which produces the siderophore serratiochelin and antimicrobial compounds.

Cleto, S.; Van der Auwera, G.; Almeida, C.; Vieira, M. J.; Vlamakis, H.

2014-01-01

64

DNA sequencing, automation, and the human genome  

SciTech Connect

DNA sequencing is one of the key analytical operations of modern molecular biology and a crucial element of biotechnology. The principles of DNA sequencing and details of the technologies of both manual, radioisotope-based and automated, fluorescence-based approaches are described. The goals and rationale of the Human Genome Initiative are discussed along with implications for future sequencing technologies. Finally, a glimpse of emerging DNA sequencing technologies is offered.

Trainor, G.L. (E. I. du Pont de Nemours and Co., Inc., Wilmington, DE (USA))

1990-03-01

65

Complete genome sequences of nine mycobacteriophages.  

PubMed

Genome analyses of a large number of mycobacteriophages, bacterial viruses that infect members of the genus Mycobacterium, yielded novel enzymes and tools for the genetic manipulation of mycobacteria. We report here the complete genome sequences of nine mycobacteriophages, including a new singleton, isolated using Mycobacterium smegmatis mc²155 as a host strain. PMID:24874666

Franceschelli, Jorgelina Judith; Suarez, Cristian Alejandro; Terán, Lucrecia; Raya, Raúl Ricardo; Morbidoni, Héctor Ricardo

2014-01-01

66

Draft Genome Sequence of Lactobacillus crispatus 2029.  

PubMed

This report describes a draft genome sequence of Lactobacillus crispatus 2029. The reads generated by the Ion Torrent PGM were assembled into contigs with a total size of 2.2 Mb. The data were annotated using the NCBI GenBank and RAST servers. A comparison with the reference strain revealed specific features of the genome. PMID:24558253

Karlyshev, Andrey V; Melnikov, Vyacheslav G; Khlebnikov, Valentin C; Abramov, Vyacheslav M

2014-01-01

67

Complete Genome Sequences of Nine Mycobacteriophages  

PubMed Central

Genome analyses of a large number of mycobacteriophages, bacterial viruses that infect members of the genus Mycobacterium, yielded novel enzymes and tools for the genetic manipulation of mycobacteria. We report here the complete genome sequences of nine mycobacteriophages, including a new singleton, isolated using Mycobacterium smegmatis mc²155 as a host strain.

Franceschelli, Jorgelina Judith; Suarez, Cristian Alejandro; Teran, Lucrecia; Raya, Raul Ricardo

2014-01-01

68

Complementary DNA sequencing: Expressed sequence tags and human genome project  

SciTech Connect

Automated partial DNA sequencing was conducted on more than 600 randomly selected human brain complementary DNA (cDNA) clones to generate expressed sequence tags (ESTs). ESTs have applications in the discovery of new human genes, mapping of the human genome, and identification of coding regions in genomic sequences. Of the sequences generated, 337 represent new genes, including 48 with significant similarity to genes from other organisms, such as a yeast RNA polymerase II subunit; Drosophila kinesin, Notch, and Enhancer of split; and a murine tyrosine kinase receptor. Forty-six ESTs were mapped to chromosomes after amplification by the polymerase chain reaction. This fast approach to cDNA characterization will facilitate the tagging of most human genes in a few years at a fraction of the cost of complete genomic sequencing, provide new genetic markers, and serve as a resource in diverse biological research fields.

Adams, M.D.; Kelley, J.M.; Gocayne, J.D.; Dubnick, M.; Wu, A.; Olde, B.; Moreno, R.F.; Kerlavage, A.R.; McCombie, W.R.; Venter, J.C. (National Institutes of Health, Bethesda, MD (United States)); Polymeropoulos, M.H.; Hong Xiao; Merril, C.R. (National Inst. of Mental Health, Washington, DC (United States))

1991-06-21

69

Single-molecule sequencing of an individual human genome  

PubMed Central

Recent advances in high-throughput DNA sequencing technologies have enabled order-of-magnitude improvements in both cost and throughput. Here we report the use of single-molecule methods to sequence an individual human genome. We aligned billions of 24- to 70-bp reads (32 bp average) to ~90% of the National Center for Biotechnology Information (NCBI) reference genome, with 28× average coverage. Our results were obtained on one sequencing instrument by a single operator with four data collection runs. Single-molecule sequencing enabled analysis of human genomic information without the need for cloning, amplification or ligation. We determined ~2.8 million single nucleotide polymorphisms (SNPs) with a false-positive rate of less than 1% as validated by Sanger sequencing and 99.8% concordance with SNP genotyping arrays. We identified 752 regions of copy number variation by analyzing coverage depth alone and validated 27 of these using digital PCR. This milestone should allow widespread application of genome sequencing to many aspects of genetics and human health, including personal genomics.

Pushkarev, Dmitry; Neff, Norma F; Quake, Stephen R

2014-01-01

70

Library Preparation and Data Analysis Packages for Rapid Genome Sequencing  

PubMed Central

High-throughput sequencing (HTS) has quickly become a valuable tool for comparative genetics and genomics and is now regularly carried out in laboratories that are not connected to large sequencing centers. Here we describe an updated version of our protocol for constructing single- and paired-end Illumina sequencing libraries, beginning with purified genomic DNA. The present protocol can also be used for “multiplexing,” i.e. the analysis of several samples in a single flowcell lane by generating “barcoded” or “indexed” Illumina sequencing libraries in a way that is independent from Illumina-supported methods. To analyze sequencing results, we suggest several independent approaches but end users should be aware that this is a quickly evolving field and that currently many alignment (or “mapping”) and counting algorithms are being developed and tested.

Pomraning, Kyle R.; Smith, Kristina M.; Bredeweg, Erin L.; Connolly, Lanelle R.; Phatale, Pallavi A.; Freitag, Michael

2013-01-01

71

Sorghum Genome Sequencing by Methylation Filtration  

Microsoft Academic Search

Sorghum bicolor is a close relative of maize and is a staple crop in Africa and much of the developing world because of its superior tolerance of arid growth conditions. We have generated sequence from the hypomethylated portion of the sorghum genome by applying methylation filtration (MF) technology. The evidence suggests that 96% of the genes have been sequence tagged,

Joseph A. Bedell; Muhammad A. Budiman; Andrew Nunberg; Robert W. Citek; Dan Robbins; Joshua Jones; Elizabeth Flick; Theresa Rohlfing; Jason Fries; Kourtney Bradford; Jennifer McMenamy; Michael Smith; Heather Holeman; Bruce A. Roe; Graham Wiley; Ian F. Korf; Pablo D. Rabinowicz; Nathan Lakey; W. Richard McCombie; Jeffrey A. Jeddeloh; Robert A. Martienssen

2005-01-01

72

Thirteen Camellia chloroplast genome sequences determined by high-throughput sequencing: genome structure and phylogenetic relationships  

PubMed Central

Background Camellia is an economically and phylogenetically important genus in the family Theaceae. Owing to extensive hybridization and polyploidization, it is ranked taxonomically and phylogenetically as one of the most challenging taxa in plants. Sequence comparisons of chloroplast (cp) genomes are of great interest because they provide robust evidence for taxonomic studies, species identification and understanding the mechanisms that underlie the evolution of the Camellia species. Results Eight complete cp genomes and five draft cp genome sequences of Camellia species were determined using Illumina sequencing technology via a combined strategy of de novo and reference-guided assembly. The Camellia cp genomes exhibited a typical circular structure that was rather conserved in genomic structure and the synteny of gene order. Differences in repeat sequences, simple sequence repeats, indels and substitutions were further examined among five complete cp genomes, representing a wide phylogenetic diversity in the genus. A total of fifteen molecular markers were identified with more than 1.5% sequence divergence that may be useful for further phylogenetic analysis and species identification of Camellia. Our results showed that, rather than functional constraints, it is the regional constraints that strongly affect sequence evolution of the cp genomes. In a substantial improvement over prior studies, evolutionary relationships of the section Thea were determined on the basis of phylogenomic analyses of cp genome sequences. Conclusions Despite a high degree of conservation between the Camellia cp genomes, sequence variation among species could still be detected, representing a wide phylogenetic diversity in the genus. Furthermore, phylogenomic analysis was conducted using 18 complete cp genomes and 5 draft cp genome sequences of Camellia species. Our results support Chang’s taxonomical treatment that C. pubicosta may be classified into sect. Thea, and indicate that the taxonomical value of the number of ovaries should be reconsidered when classifying the Camellia species. The availability of these cp genomes provides valuable genetic information for accurately identifying species, clarifying taxonomy and reconstructing the phylogeny of the genus Camellia.

2014-01-01

73

Mauve: Multiple Alignment of Conserved Genomic Sequence With Rearrangements  

Microsoft Academic Search

As genomes evolve, they undergo large-scale evolutionary processes that present a challenge to sequence comparison not posed by short sequences. Recombination causes frequent genome rearrangements, horizontal transfer introduces new sequences into bacterial chromosomes, and deletions remove segments of the genome. Consequently, each genome is a mosaic of unique lineage-specific segments, regions shared with a subset of other genomes and segments

Aaron C. E. Darling; Bob Mau; Frederick R. Blattner; Nicole T. Perna

2004-01-01

74

Whole Genome Sequence of a Turkish Individual  

PubMed Central

Although whole human genome sequencing can be done with readily available technical and financial resources, the need for detailed analyses of genomes of certain populations still exists. Here we present, for the first time, sequencing and analysis of a Turkish human genome. We have performed 35x coverage using paired-end sequencing, where over 95% of sequencing reads are mapped to the reference genome covering more than 99% of the bases. The assembly of unmapped reads rendered 11,654 contigs, 2,168 of which did not reveal any homology to known sequences, resulting in ~1 Mbp of unmapped sequence. Single nucleotide polymorphism (SNP) discovery resulted in 3,537,794 SNP calls with 29,184 SNPs identified in coding regions, where 106 were nonsense and 259 were categorized as having a high-impact effect. The homo/hetero zygosity (1,415,123:2,122,671 or 1:1.5) and transition/transversion ratios (2,383,204:1,154,590 or 2.06:1) were within expected limits. Of the identified SNPs, 480,396 were potentially novel with 2,925 in coding regions, including 48 nonsense and 95 high-impact SNPs. Functional analysis of novel high-impact SNPs revealed various interaction networks, notably involving hereditary and neurological disorders or diseases. Assembly results indicated 713,640 indels (1:1.09 insertion/deletion ratio), ranging from -52 bp to 34 bp in length and causing about 180 codon insertion/deletions and 246 frame shifts. Using paired-end- and read-depth-based methods, we discovered 9,109 structural variants and compared our variant findings with other populations. Our results suggest that whole genome sequencing is a valuable tool for understanding variations in the human genome across different populations. Detailed analyses of genomes of diverse origins greatly benefits research in genetics and medicine and should be conducted on a larger scale.

Dogan, Haluk; Can, Handan; Otu, Hasan H.

2014-01-01
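
The two ratios quoted in the Turkish genome abstract above can be rechecked from the counts it reports. The following is a minimal sketch, not part of the original record: the variable names are illustrative, and the only inputs are the figures given in the abstract.

```python
# Counts copied from the abstract of "Whole Genome Sequence of a Turkish Individual".
total_snps = 3_537_794
homozygous, heterozygous = 1_415_123, 2_122_671
transitions, transversions = 2_383_204, 1_154_590

# Both splits should account for every SNP call.
assert homozygous + heterozygous == total_snps
assert transitions + transversions == total_snps

# Express each ratio the way the abstract does (1:x and x:1).
het_per_hom = heterozygous / homozygous
ts_tv = transitions / transversions

print(f"homo:hetero = 1:{het_per_hom:.2f}")  # 1:1.50
print(f"Ts/Tv       = {ts_tv:.2f}:1")        # 2.06:1
```

Both partitions sum to the 3,537,794 SNP calls, which is consistent with the colon-separated reading of the ratios in the abstract.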

75

Next-generation sequencing applied to rare diseases genomics.  

PubMed

Genomics has revolutionized the study of rare diseases. In this review, we survey the latest technological developments, rare disease discoveries, implementation obstacles and bioethical challenges. First, we discuss the technology of genome and exome sequencing, including the different next-generation platforms and exome enrichment technologies. Second, we survey the pioneering centers and discoveries for rare diseases, including a few of the research institutions that have contributed to the field, as well as an overview of the different types of rare diseases in which next-generation sequencing has led to new discoveries. Third, we discuss the obstacles and challenges to clinical implementation, including return of results, informed consent and privacy. Last, we discuss the outlook as clinical genomics gains wider adoption and third-generation sequencing comes onto the horizon, along with some needs in informatics and software to further advance the field. PMID:24702023

Danielsson, Krissi; Mun, Liew Jun; Lordemann, Amanda; Mao, Jimmy; Lin, Cheng-Ho Jimmy

2014-05-01

76

Finishing the euchromatic sequence of the human genome  

Microsoft Academic Search

The sequence of the human genome encodes the genetic instructions for human physiology, as well as rich information about human evolution. In 2001, the International Human Genome Sequencing Consortium reported a draft sequence of the euchromatic portion of the human genome. Since then, the international collaboration has worked to convert this draft into a genome sequence with high accuracy and

2004-01-01

77

Standardized Metadata for Human Pathogen/Vector Genomic Sequences  

PubMed Central

High throughput sequencing has accelerated the determination of genome sequences for thousands of human infectious disease pathogens and dozens of their vectors. The scale and scope of these data are enabling genotype-phenotype association studies to identify genetic determinants of pathogen virulence and drug/insecticide resistance, and phylogenetic studies to track the origin and spread of disease outbreaks. To maximize the utility of genomic sequences for these purposes, it is essential that metadata about the pathogen/vector isolate characteristics be collected and made available in organized, clear, and consistent formats. Here we report the development of the GSCID/BRC Project and Sample Application Standard, developed by representatives of the Genome Sequencing Centers for Infectious Diseases (GSCIDs), the Bioinformatics Resource Centers (BRCs) for Infectious Diseases, and the U.S. National Institute of Allergy and Infectious Diseases (NIAID), part of the National Institutes of Health (NIH), informed by interactions with numerous collaborating scientists. It includes mapping to terms from other data standards initiatives, including the Genomic Standards Consortium’s minimal information (MIxS) and NCBI’s BioSample/BioProjects checklists and the Ontology for Biomedical Investigations (OBI). The standard includes data fields about characteristics of the organism or environmental source of the specimen, spatial-temporal information about the specimen isolation event, phenotypic characteristics of the pathogen/vector isolated, and project leadership and support. By modeling metadata fields into an ontology-based semantic framework and reusing existing ontologies and minimum information checklists, the application standard can be extended to support additional project-specific data fields and integrated with other data represented with comparable standards. 
The use of this metadata standard by all ongoing and future GSCID sequencing projects will provide a consistent representation of these data in the BRC resources and other repositories that leverage these data, allowing investigators to identify relevant genomic sequences and perform comparative genomics analyses that are both statistically meaningful and biologically relevant.

Dugan, Vivien G.; Emrich, Scott J.; Giraldo-Calderon, Gloria I.; Harb, Omar S.; Newman, Ruchi M.; Pickett, Brett E.; Schriml, Lynn M.; Stockwell, Timothy B.; Stoeckert, Christian J.; Sullivan, Dan E.; Singh, Indresh; Ward, Doyle V.; Yao, Alison; Zheng, Jie; Barrett, Tanya; Birren, Bruce; Brinkac, Lauren; Bruno, Vincent M.; Caler, Elizabet; Chapman, Sinead; Collins, Frank H.; Cuomo, Christina A.; Di Francesco, Valentina; Durkin, Scott; Eppinger, Mark; Feldgarden, Michael; Fraser, Claire; Fricke, W. Florian; Giovanni, Maria; Henn, Matthew R.; Hine, Erin; Hotopp, Julie Dunning; Karsch-Mizrachi, Ilene; Kissinger, Jessica C.; Lee, Eun Mi; Mathur, Punam; Mongodin, Emmanuel F.; Murphy, Cheryl I.; Myers, Garry; Neafsey, Daniel E.; Nelson, Karen E.; Nierman, William C.; Puzak, Julia; Rasko, David; Roos, David S.; Sadzewicz, Lisa; Silva, Joana C.; Sobral, Bruno; Squires, R. Burke; Stevens, Rick L.; Tallon, Luke; Tettelin, Herve; Wentworth, David; White, Owen; Will, Rebecca; Wortman, Jennifer; Zhang, Yun; Scheuermann, Richard H.

2014-01-01

78

Standardized metadata for human pathogen/vector genomic sequences.  

PubMed

High throughput sequencing has accelerated the determination of genome sequences for thousands of human infectious disease pathogens and dozens of their vectors. The scale and scope of these data are enabling genotype-phenotype association studies to identify genetic determinants of pathogen virulence and drug/insecticide resistance, and phylogenetic studies to track the origin and spread of disease outbreaks. To maximize the utility of genomic sequences for these purposes, it is essential that metadata about the pathogen/vector isolate characteristics be collected and made available in organized, clear, and consistent formats. Here we report the development of the GSCID/BRC Project and Sample Application Standard, developed by representatives of the Genome Sequencing Centers for Infectious Diseases (GSCIDs), the Bioinformatics Resource Centers (BRCs) for Infectious Diseases, and the U.S. National Institute of Allergy and Infectious Diseases (NIAID), part of the National Institutes of Health (NIH), informed by interactions with numerous collaborating scientists. It includes mapping to terms from other data standards initiatives, including the Genomic Standards Consortium's minimal information (MIxS) and NCBI's BioSample/BioProjects checklists and the Ontology for Biomedical Investigations (OBI). The standard includes data fields about characteristics of the organism or environmental source of the specimen, spatial-temporal information about the specimen isolation event, phenotypic characteristics of the pathogen/vector isolated, and project leadership and support. By modeling metadata fields into an ontology-based semantic framework and reusing existing ontologies and minimum information checklists, the application standard can be extended to support additional project-specific data fields and integrated with other data represented with comparable standards. 
The use of this metadata standard by all ongoing and future GSCID sequencing projects will provide a consistent representation of these data in the BRC resources and other repositories that leverage these data, allowing investigators to identify relevant genomic sequences and perform comparative genomics analyses that are both statistically meaningful and biologically relevant. PMID:24936976

Dugan, Vivien G; Emrich, Scott J; Giraldo-Calderón, Gloria I; Harb, Omar S; Newman, Ruchi M; Pickett, Brett E; Schriml, Lynn M; Stockwell, Timothy B; Stoeckert, Christian J; Sullivan, Dan E; Singh, Indresh; Ward, Doyle V; Yao, Alison; Zheng, Jie; Barrett, Tanya; Birren, Bruce; Brinkac, Lauren; Bruno, Vincent M; Caler, Elizabet; Chapman, Sinéad; Collins, Frank H; Cuomo, Christina A; Di Francesco, Valentina; Durkin, Scott; Eppinger, Mark; Feldgarden, Michael; Fraser, Claire; Fricke, W Florian; Giovanni, Maria; Henn, Matthew R; Hine, Erin; Hotopp, Julie Dunning; Karsch-Mizrachi, Ilene; Kissinger, Jessica C; Lee, Eun Mi; Mathur, Punam; Mongodin, Emmanuel F; Murphy, Cheryl I; Myers, Garry; Neafsey, Daniel E; Nelson, Karen E; Nierman, William C; Puzak, Julia; Rasko, David; Roos, David S; Sadzewicz, Lisa; Silva, Joana C; Sobral, Bruno; Squires, R Burke; Stevens, Rick L; Tallon, Luke; Tettelin, Herve; Wentworth, David; White, Owen; Will, Rebecca; Wortman, Jennifer; Zhang, Yun; Scheuermann, Richard H

2014-01-01

79

Automated design of bacterial genome sequences  

PubMed Central

Background Organisms have evolved ways of regulating transcription to better adapt to varying environments. Could the current functional genomics data and models support the possibility of engineering a genome with completely rearranged gene organization while the cell maintains its behavior under environmental challenges? How would we proceed to design a full nucleotide sequence for such genomes? Results As a first step towards answering such questions, recent work showed that it is possible to design alternative transcriptomic models showing the same behavior under environmental variations as the wild-type model. A second step would require providing evidence that it is possible to provide a nucleotide sequence for a genome encoding such a transcriptional model. We used computational design techniques to design a rewired global transcriptional regulation of Escherichia coli that nevertheless shows a transcriptomic response similar to that of the wild-type. Afterwards, we “compiled” the transcriptional networks into nucleotide sequences to obtain the final genome sequence. Our computational evolution procedure ensures that we can maintain the genotype-phenotype mapping during the rewiring of the regulatory network. We found that it is theoretically possible to reorganize the E. coli genome into 86% fewer regulated operons. Such refactored genomes are constituted by operons that contain sets of genes sharing around 60% of their biological functions and, if evolved under highly variable environmental conditions, have regulatory networks that respond more than 20% faster to multiple external perturbations. Conclusions This work provides the first algorithm for producing a genome sequence encoding a rewired transcriptional regulation with wild-type behavior under alternative environments.

2013-01-01

80

International Rice Genome Sequencing Project: the effort to completely sequence the rice genome.  

PubMed

The International Rice Genome Sequencing Project (IRGSP) involves researchers from ten countries who are working to completely and accurately sequence the rice genome within a short period. Sequencing uses a map-based clone-by-clone shotgun strategy; shared bacterial artificial chromosome/P1-derived artificial chromosome libraries have been constructed from Oryza sativa ssp. japonica variety 'Nipponbare'. End-sequencing, fingerprinting and marker-aided PCR screening are being used to make sequence-ready contigs. Annotated sequences are immediately released for public use and are made available with supplemental information at each IRGSP member's website. The IRGSP works to promote the development of rice and cereal genomics in addition to producing genome sequence data. PMID:10712951

Sasaki, T; Burr, B

2000-04-01

81

Chicken genomics charts a path to the genome sequence.  

PubMed

In this paper, the current status of chicken genomics is reviewed. This is timely given the current intense activity centred on sequencing the complete genome of this model species. The genome project is based on a decade of map building by genetic linkage and cytogenetic methods, which are now being replaced by high-resolution radiation hybrid and bacterial artificial chromosome (BAC) contig maps. Markers for map building have generally depended on labour-intensive screening procedures, but in recent years this has changed with the availability of almost 500,000 chicken expressed sequence tags (ESTs). These resources and tools will be critical in the coming months when the chicken genome sequence is being assembled (eg cross-checked with other maps) and annotated (eg gene structures based on ESTs). The future for chicken genome and biological research is an exciting one, through the integration of these resources. For example, through the proposed chicken Ensembl database, it will be possible to solve challenging scientific questions by exploiting the power of a chicken model. One area of interest is the study of developmental mechanisms and the discovery of regulatory networks throughout the genome. Another is the study of the molecular nature of quantitative genetic variation. No other animal species have been phenotyped and selected so intensively as agricultural animals and thus there is much to be learned in basic and medical biology from this research. PMID:15163360

Burt, David W

2004-04-01

82

The Diploid Genome Sequence of an Individual Human  

PubMed Central

Presented here is a genome sequence of an individual human. It was produced from ~32 million random DNA fragments, sequenced by Sanger dideoxy technology and assembled into 4,528 scaffolds, comprising 2,810 million bases (Mb) of contiguous sequence with approximately 7.5-fold coverage for any given region. We developed a modified version of the Celera assembler to facilitate the identification and comparison of alternate alleles within this individual diploid genome. Comparison of this genome and the National Center for Biotechnology Information human reference assembly revealed more than 4.1 million DNA variants, encompassing 12.3 Mb. These variants (of which 1,288,319 were novel) included 3,213,401 single nucleotide polymorphisms (SNPs), 53,823 block substitutions (2–206 bp), 292,102 heterozygous insertion/deletion events (indels)(1–571 bp), 559,473 homozygous indels (1–82,711 bp), 90 inversions, as well as numerous segmental duplications and copy number variation regions. Non-SNP DNA variation accounts for 22% of all events identified in the donor, however they involve 74% of all variant bases. This suggests an important role for non-SNP genetic alterations in defining the diploid genome structure. Moreover, 44% of genes were heterozygous for one or more variants. Using a novel haplotype assembly strategy, we were able to span 1.5 Gb of genome sequence in segments >200 kb, providing further precision to the diploid nature of the genome. These data depict a definitive molecular portrait of a diploid human genome that provides a starting point for future genome comparisons and enables an era of individualized genomic information.

Levy, Samuel; Sutton, Granger; Ng, Pauline C; Feuk, Lars; Halpern, Aaron L; Walenz, Brian P; Axelrod, Nelson; Huang, Jiaqi; Kirkness, Ewen F; Denisov, Gennady; Lin, Yuan; MacDonald, Jeffrey R; Pang, Andy Wing Chun; Shago, Mary; Stockwell, Timothy B; Tsiamouri, Alexia; Bafna, Vineet; Bansal, Vikas; Kravitz, Saul A; Busam, Dana A; Beeson, Karen Y; McIntosh, Tina C; Remington, Karin A; Abril, Josep F; Gill, John; Borman, Jon; Rogers, Yu-Hui; Frazier, Marvin E; Scherer, Stephen W; Strausberg, Robert L; Venter, J. Craig

2007-01-01
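
The "22% of all events" figure in the diploid genome abstract above can be approximated from the variant counts it itemizes. This is a rough sketch under one stated assumption: only the variant classes with explicit counts are included, so the segmental duplications and copy number variation regions, which the abstract describes only as "numerous", are left out. The dictionary keys are illustrative labels, not terms from a specific tool.

```python
# Itemized counts from the abstract of "The Diploid Genome Sequence of an Individual Human".
snps = 3_213_401
non_snp_events = {
    "block substitutions": 53_823,
    "heterozygous indels": 292_102,
    "homozygous indels": 559_473,
    "inversions": 90,
}

# Assumption: variant classes without an explicit count are ignored.
non_snp_total = sum(non_snp_events.values())
all_events = snps + non_snp_total
non_snp_share = non_snp_total / all_events

print(f"itemized variants: {all_events:,}")       # 4,118,889
print(f"non-SNP share:     {non_snp_share:.1%}")  # 22.0%
```

The itemized total of 4,118,889 is consistent with the "more than 4.1 million DNA variants" stated in the abstract.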

83

Genome sequence of Haemophilus parasuis strain 29755  

PubMed Central

Haemophilus parasuis is a member of the family Pasteurellaceae and is the etiologic agent of Glässer’s disease in pigs, a systemic syndrome associated with only a subset of isolates. The genetic basis for virulence and systemic spread of particular H. parasuis isolates is currently unknown. Strain 29755 is an invasive isolate that has long been used in the study of Glässer’s disease. Accordingly, the genome sequence of strain 29755 is of considerable importance to investigators endeavoring to understand the molecular pathogenesis of H. parasuis. Here we describe the features of the 2,224,137 bp draft genome sequence of strain 29755 generated from 454-FLX pyrosequencing. These data comprise the first publicly available genome sequence for this bacterium.

Mullins, Michael A.; Bayles, Darrell O.; Dyer, David W.; Kuehn, Joanna S.; Phillips, Gregory J.

2011-01-01

84

Exploring Cancer through Genomic Sequence Comparisons: A National Cancer Institute-National Human Genome Research Institute Workshop. Held in Bethesda, Maryland on April 14-15, 2004.  

National Technical Information Service (NTIS)

On April 14-15, 2004, the National Cancer Institute (NCI) and the National Human Genome Research Institute (NHGRI) convened a workshop, 'Exploring Cancer through Genomic Sequence Comparisons.' Participants included leaders from the Nation's cancer centers...

2004-01-01

85

Genome walking by next generation sequencing approaches.  

PubMed

Genome Walking (GW) comprises a number of PCR-based methods for the identification of nucleotide sequences flanking known regions. The different methods have been used for several purposes: from de novo sequencing, useful for the identification of unknown regions, to the characterization of insertion sites for viruses and transposons. In the latter cases Genome Walking methods have been recently boosted by coupling to Next Generation Sequencing technologies. This review will focus on the development of several protocols for the application of Next Generation Sequencing (NGS) technologies to GW, which have been developed in the course of analysis of insertional libraries. These analyses find broad application in protocols for functional genomics and gene therapy. Thanks to the application of NGS technologies, the original vision of GW as a procedure for walking along an unknown genome is now changing into the possibility of observing the parallel marching of hundreds of thousands of primers across the borders of inserted DNA molecules in host genomes. PMID:24832505

Volpicella, Mariateresa; Leoni, Claudia; Costanza, Alessandra; Fanizza, Immacolata; Placido, Antonio; Ceci, Luigi R

2012-01-01

86

Sequencing and comparative analysis of the gorilla MHC genomic sequence  

PubMed Central

Major histocompatibility complex (MHC) genes play a critical role in vertebrate immune response and because the MHC is linked to a significant number of auto-immune and other diseases it is of great medical interest. Here we describe the clone-based sequencing and subsequent annotation of the MHC region of the gorilla genome. Because the MHC is subject to extensive variation, both structural and sequence-wise, it is not readily amenable to study in whole genome shotgun sequence such as the recently published gorilla genome. The variation of the MHC also makes it of evolutionary interest and therefore we analyse the sequence in the context of human and chimpanzee. In our comparisons with human and re-annotated chimpanzee MHC sequence we find that gorilla has a trimodular RCCX cluster, versus the reference human bimodular cluster, and additional copies of Class I (pseudo)genes between Gogo-K and Gogo-A (the orthologues of HLA-K and -A). We also find that Gogo-H (and Patr-H) is coding versus the HLA-H pseudogene and, conversely, there is a Gogo-DQB2 pseudogene versus the HLA-DQB2 coding gene. Our analysis, which is freely available through the VEGA genome browser, provides the research community with a comprehensive dataset for comparative and evolutionary research of the MHC.

Wilming, Laurens G.; Hart, Elizabeth A.; Coggill, Penny C.; Horton, Roger; Gilbert, James G. R.; Clee, Chris; Jones, Matt; Lloyd, Christine; Palmer, Sophie; Sims, Sarah; Whitehead, Siobhan; Wiley, David; Beck, Stephan; Harrow, Jennifer L.

2013-01-01

87

The Trichomonas vaginalis Genome Sequencing Project  

NSDL National Science Digital Library

The Institute for Genomic Research (TIGR) in 2003 released the first draft assembly of the Trichomonas vaginalis genome, available through this website to the academic and not-for-profit research community for noncommercial use only. TIGR will release more data at regular intervals during the sequencing project, which should help researchers better understand this widespread parasite and its role in HIV infection, neo-natal disorders, predisposition to cervical cancer, and of course, vaginitis. The website also includes background information on T. vaginalis, as well as a link to TIGR's sequencing project for Entamoeba histolytica -- a closely related organism.

88

The Center for Eukaryotic Structural Genomics  

Microsoft Academic Search

The Center for Eukaryotic Structural Genomics (CESG) is a “specialized” or “technology development” center supported by the Protein Structure Initiative (PSI). CESG’s mission is to develop improved methods for the high-throughput solution of structures from eukaryotic proteins, with a very strong weighting toward human proteins of biomedical relevance. During the first three years of PSI-2, CESG selected targets representing 601

John L. Markley; David J. Aceti; Craig A. Bingman; Brian G. Fox; Ronnie O. Frederick; Shin-ichi Makino; Karl W. Nichols; George N. Phillips Jr; John G. Primm; Sarata C. Sahu; Frank C. Vojtik; Brian F. Volkman; Russell L. Wrobel; Zsolt Zolnai

2009-01-01

89

Sequencing Your Genome: What Does It Mean?  

PubMed Central

The human genome contains approximately 3.2 billion nucleotides and about 23,500 genes. Each gene has protein-coding regions that are referred to as exons. The human genome contains about 180,000 exons, which are collectively called an exome. An exome comprises about 1% of the human genome and hence is about 30 million nucleotides in size. Today’s technologies afford the opportunity to sequence all nucleotides in the human exome and even in the human genome. Given that more than three-quarters of the known disease-causing variants are located in the exome, and considering the cost and technical challenges in analyzing the whole genome sequence data, the focus of present research is primarily on whole exome sequencing (WES). While WES at the medical sequencing level is still expensive, it is becoming more affordable. Cost will not likely be a major barrier in the near future, and the data analysis is becoming less tedious. The most difficult challenge at the heart of medical sequencing is interpreting the findings. Each exome contains about 13,500 single nucleotide variants (SNVs) that affect the amino acid sequence, and a large number are expected to be functional variants. The daunting task is to distinguish the variants that are pathogenic from those that have minimal or no discernible clinical effects. While various algorithms exist, none are sufficiently robust. Thus, in-depth knowledge in genetics and medicine is essential for the proper interpretation of the WES findings. This review will discuss the potential applications of the WES data in the practice of cardiovascular medicine.

2014-01-01
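
The size estimates in the exome abstract above follow from simple arithmetic on the figures it quotes. The sketch below is illustrative only: the per-gene and per-exon averages are derived here from those rounded figures and are not stated in the abstract.

```python
# Approximate figures quoted in the abstract of "Sequencing Your Genome: What Does It Mean?".
genome_nt = 3_200_000_000  # nucleotides in the human genome
genes = 23_500
exons = 180_000

# "About 1%" of the genome; integer division keeps the result exact.
exome_nt = genome_nt // 100

# Derived averages (assumption: exons are spread evenly across genes).
exons_per_gene = exons / genes
mean_exon_nt = exome_nt / exons

print(f"exome size:     {exome_nt:,} nt")        # 32,000,000 nt
print(f"exons per gene: {exons_per_gene:.1f}")   # 7.7
print(f"mean exon size: {mean_exon_nt:.0f} nt")  # 178 nt
```

The computed 32 million nucleotides agrees with the abstract's "about 30 million" once the rounding of both inputs is taken into account.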

90

Mapping and sequencing the human genome  

SciTech Connect

Numerous meetings have been held and a debate has developed in the biological community over the merits of mapping and sequencing the human genome. In response, a committee was formed to examine the desirability and feasibility of mapping and sequencing the human genome and to suggest options for implementing the project. The committee asked many questions. Should the analysis of the human genome be left entirely to the traditionally uncoordinated, but highly successful, support systems that fund the vast majority of biomedical research? Or should a more focused and coordinated additional support system be developed that is limited to encouraging and facilitating the mapping and eventual sequencing of the human genome? If so, how can this be done without distorting the broader goals of biological research that are crucial for any understanding of the data generated in such a human genome project? As the committee became better informed on the many relevant issues, the opinions of its members coalesced, producing a shared consensus of what should be done. This report reflects that consensus.

1988-01-01

91

The complete genome sequence of Mycobacterium bovis  

PubMed Central

Mycobacterium bovis is the causative agent of tuberculosis in a range of animal species and man, with worldwide annual losses to agriculture of $3 billion. The human burden of tuberculosis caused by the bovine tubercle bacillus is still largely unknown. M. bovis was also the progenitor for the M. bovis bacillus Calmette–Guérin vaccine strain, the most widely used human vaccine. Here we describe the 4,345,492-bp genome sequence of M. bovis AF2122/97 and its comparison with the genomes of Mycobacterium tuberculosis and Mycobacterium leprae. Strikingly, the genome sequence of M. bovis is >99.95% identical to that of M. tuberculosis, but deletion of genetic information has led to a reduced genome size. Comparison with M. leprae reveals a number of common gene losses, suggesting the removal of functional redundancy. Cell wall components and secreted proteins show the greatest variation, indicating their potential role in host–bacillus interactions or immune evasion. Furthermore, there are no genes unique to M. bovis, implying that differential gene expression may be the key to the host tropisms of human and bovine bacilli. The genome sequence therefore offers major insight on the evolution, host preference, and pathobiology of M. bovis.

Garnier, Thierry; Eiglmeier, Karin; Camus, Jean-Christophe; Medina, Nadine; Mansoor, Huma; Pryor, Melinda; Duthoy, Stephanie; Grondin, Sophie; Lacroix, Celine; Monsempe, Christel; Simon, Sylvie; Harris, Barbara; Atkin, Rebecca; Doggett, Jon; Mayes, Rebecca; Keating, Lisa; Wheeler, Paul R.; Parkhill, Julian; Barrell, Bart G.; Cole, Stewart T.; Gordon, Stephen V.; Hewinson, R. Glyn

2003-01-01

92

Multilocus Sequence Typing of Total-Genome-Sequenced Bacteria  

PubMed Central

Accurate strain identification is essential for anyone working with bacteria. For many species, multilocus sequence typing (MLST) is considered the “gold standard” of typing, but it is traditionally performed in an expensive and time-consuming manner. As the costs of whole-genome sequencing (WGS) continue to decline, it becomes increasingly available to scientists and routine diagnostic laboratories. Currently, the cost is below that of traditional MLST. The new challenges will be how to extract the relevant information from the large amount of data so as to allow for comparison over time and between laboratories. Ideally, this information should also allow for comparison to historical data. We developed a Web-based method for MLST of 66 bacterial species based on WGS data. As input, the method uses short sequence reads from four sequencing platforms or preassembled genomes. Updates from the MLST databases are downloaded monthly, and the best-matching MLST alleles of the specified MLST scheme are found using a BLAST-based ranking method. The sequence type is then determined by the combination of alleles identified. The method was tested on preassembled genomes from 336 isolates covering 56 MLST schemes, on short sequence reads from 387 isolates covering 10 schemes, and on a small test set of short sequence reads from 29 isolates for which the sequence type had been determined by traditional methods. The method presented here enables investigators to determine the sequence types of their isolates on the basis of WGS data. This method is publicly available at www.cbs.dtu.dk/services/MLST.
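The final step described above, determining the sequence type from the combination of alleles, amounts to a table lookup. The following is a minimal sketch of that step only, not the authors' Web-based method: the locus names, allele numbers, and profile table are hypothetical, and the BLAST-based allele ranking that precedes the lookup is not shown.

```python
from typing import Dict, Optional, Tuple

# Hypothetical three-locus scheme; real MLST schemes typically use more loci.
LOCI: Tuple[str, ...] = ("locusA", "locusB", "locusC")

# Hypothetical profile table: allele numbers (in LOCI order) -> sequence type.
PROFILES: Dict[Tuple[int, ...], int] = {
    (1, 1, 1): 1,
    (1, 2, 1): 2,
    (3, 2, 4): 7,
}


def sequence_type(alleles: Dict[str, int]) -> Optional[int]:
    """Return the sequence type for a set of allele calls, or None if the
    combination is not in the profile table (a possibly novel type)."""
    key = tuple(alleles[locus] for locus in LOCI)
    return PROFILES.get(key)


if __name__ == "__main__":
    calls = {"locusA": 3, "locusB": 2, "locusC": 4}
    print(sequence_type(calls))
```

Returning None for an unknown combination keeps the lookup separate from any decision about how novel profiles should be reported.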

Cosentino, Salvatore; Rasmussen, Simon; Friis, Carsten; Hasman, Henrik; Marvig, Rasmus Lykke; Jelsbak, Lars; Sicheritz-Ponten, Thomas; Ussery, David W.; Aarestrup, Frank M.; Lund, Ole

2012-01-01

93

Genome Sequence of Lactobacillus versmoldensis KCTC 3814  

PubMed Central

Lactobacillus versmoldensis KCTC 3814 was isolated from raw fermented poultry salami. The species was present in high numbers and frequently dominated the lactic acid bacteria (LAB) populations of the products. Here, we announce the draft genome sequence of Lactobacillus versmoldensis KCTC 3814, isolated from poultry salami, and describe major findings from its annotation.

Kim, Dae-Soo; Choi, Sang-Haeng; Kim, Dong-Wook; Kim, Ryong Nam; Nam, Seong-Hyeuk; Kang, Aram; Kim, Aeri; Park, Hong-Seog

2011-01-01

94

Genome Sequencing and Analysis Conference IV.  

National Technical Information Service (NTIS)

J. Craig Venter and C. Thomas Caskey co-chaired Genome Sequencing and Analysis Conference IV held at Hilton Head, South Carolina from September 26--30, 1992. Venter opened the conference by noting that approximately 400 researchers from 16 nations were pr...

1993-01-01

95

Draft genome sequence of Bacillus oceanisediminis 2691.  

PubMed

Bacillus oceanisediminis 2691 is an aerobic, Gram-positive, spore-forming, and moderately halophilic bacterium that was isolated from marine sediment of the Yellow Sea coast of South Korea. Here, we report the draft genome sequence of B. oceanisediminis 2691 that may have an important role in the bioremediation of marine sediment. PMID:23105082

Lee, Yong-Jik; Lee, Sang-Jae; Jeong, Haeyoung; Kim, Hyun Ju; Ryu, Naeun; Kim, Byoung-Chan; Lee, Han-Seung; Lee, Dong-Woo; Lee, Sang Jun

2012-11-01

96

Draft Genome Sequence of Bacillus oceanisediminis 2691  

PubMed Central

Bacillus oceanisediminis 2691 is an aerobic, Gram-positive, spore-forming, and moderately halophilic bacterium that was isolated from marine sediment of the Yellow Sea coast of South Korea. Here, we report the draft genome sequence of B. oceanisediminis 2691 that may have an important role in the bioremediation of marine sediment.

Lee, Yong-Jik; Lee, Sang-Jae; Jeong, Haeyoung; Kim, Hyun Ju; Ryu, Naeun; Kim, Byoung-Chan; Lee, Han-Seung

2012-01-01

97

Draft Genome Sequence of Streptomyces iranensis  

PubMed Central

Streptomyces iranensis HM 35 has been shown to exhibit 72.7% DNA-DNA similarity to the important drug rapamycin (sirolimus)-producing Streptomyces rapamycinicus NRRL5491. Here, we report the genome sequence of HM 35, which represents a partially overlapping repertoire of secondary metabolite gene clusters with S. rapamycinicus, including the gene cluster for rapamycin biosynthesis.

Horn, Fabian; Netzker, Tina; Guthke, Reinhard; Brakhage, Axel A.

2014-01-01

98

The first Korean genome sequence and analysis: Full genome sequencing for a socio-ethnic group  

PubMed Central

We present the first Korean individual genome sequence (SJK) and analysis results. The diploid genome of a Korean male was sequenced to 28.95-fold redundancy using the Illumina paired-end sequencing method. SJK covered 99.9% of the NCBI human reference genome. We identified 420,083 novel single nucleotide polymorphisms (SNPs) that are not in the dbSNP database. Despite a close similarity, significant differences were observed between the Chinese genome (YH), the only other Asian genome available, and SJK: (1) 39.87% (1,371,239 out of 3,439,107) SNPs were SJK-specific (49.51% against Venter's, 46.94% against Watson's, and 44.17% against the Yoruba genomes); (2) 99.5% (22,495 out of 22,605) of short indels (< 4 bp) discovered on the same loci had the same size and type as YH; and (3) 11.3% (331 out of 2920) deletion structural variants were SJK-specific. Even after attempting to map unmapped reads of SJK to unanchored NCBI scaffolds, HGSV, and available personal genomes, there were still 5.77% SJK reads that could not be mapped. All these findings indicate that the overall genetic differences among individuals from closely related ethnic groups may be significant. Hence, constructing reference genomes for minor socio-ethnic groups will be useful for massive individual genome sequencing.
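The percentages reported in this abstract follow directly from the counts given in parentheses. A short sketch that recomputes them is shown below; the counts are taken from the abstract, while the helper name is illustrative.

```python
def percent(part: int, whole: int) -> float:
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


# Counts quoted in the abstract.
SJK_SPECIFIC_SNPS, TOTAL_SNPS = 1_371_239, 3_439_107
MATCHING_INDELS, SHARED_INDELS = 22_495, 22_605
SJK_SPECIFIC_DELETIONS, TOTAL_DELETIONS = 331, 2_920

if __name__ == "__main__":
    print(f"SJK-specific SNPs:      {percent(SJK_SPECIFIC_SNPS, TOTAL_SNPS):.2f}%")
    print(f"matching short indels:  {percent(MATCHING_INDELS, SHARED_INDELS):.1f}%")
    print(f"SJK-specific deletions: {percent(SJK_SPECIFIC_DELETIONS, TOTAL_DELETIONS):.1f}%")
```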

Ahn, Sung-Min; Kim, Tae-Hyung; Lee, Sunghoon; Kim, Deokhoon; Ghang, Ho; Kim, Dae-Soo; Kim, Byoung-Chul; Kim, Sang-Yoon; Kim, Woo-Yeon; Kim, Chulhong; Park, Daeui; Lee, Yong Seok; Kim, Sangsoo; Reja, Rohit; Jho, Sungwoong; Kim, Chang Geun; Cha, Ji-Young; Kim, Kyung-Hee; Lee, Bonghee; Bhak, Jong; Kim, Seong-Jin

2009-01-01

99

The Genome Sequence of Drosophila melanogaster  

NSDL National Science Digital Library

On Thursday, March 23, 2000, a historic milestone was marked as researchers announced that they had completed mapping the genome of the fruit fly, Drosophila melanogaster. The achievement, which was announced in a special issue of the journal Science, culminates close to 100 years of research. Drosophila melanogaster is the most complex animal thus far to have its genetic sequence deciphered. The findings have important implications for human medical research and for completing a map of the human genome. Mapping the fruit fly genome has been a broad collaborative effort between academia and industry in several countries. While a foundation was laid by the US (Berkeley), European, and Canadian Drosophila Genome Projects, Celera Genomics finished the job over the last year by employing supercomputers and state-of-the-art gene-sequencing machines. The techniques learned and used in this last phase of mapping may now be applied to more rapidly decode genes of other organisms, including humans. This week's In The News takes a closer look at this important landmark.

Ramanujan, Krishna.

100

Comparative Analysis of Genome Sequences with VISTA  

DOE Data Explorer

VISTA is a comprehensive suite of programs and databases developed by and hosted at the Genomics Division of Lawrence Berkeley National Laboratory. They provide information and tools designed to facilitate comparative analysis of genomic sequences. Users have two ways to interact with the suite of applications at the VISTA portal. They can submit their own sequences and alignments for analysis (VISTA servers) or examine pre-computed whole-genome alignments of different species. A key menu option is the Enhancer Browser and Database at http://enhancer.lbl.gov/. The VISTA Enhancer Browser is a central resource for experimentally validated human noncoding fragments with gene enhancer activity as assessed in transgenic mice. Most of these noncoding elements were selected for testing based on their extreme conservation with other vertebrates. The results of this enhancer screen are provided through this publicly available website. The browser also features relevant results by external contributors and a large collection of additional genome-wide conserved noncoding elements which are candidate enhancer sequences. The LBL developers invite external groups to submit computational predictions of developmental enhancers. As of 10/19/2009 the database contains information on 1109 in vivo tested elements - 508 elements with enhancer activity.

Dubchak, Inna

101

Genome, Epigenome and RNA sequences of Monozygotic Twins Discordant for Multiple Sclerosis  

SciTech Connect

Neil Miller, Deputy Director of Software Engineering at the National Center for Genome Resources, discusses a monozygotic twin study on June 2, 2010 at the "Sequencing, Finishing, Analysis in the Future" meeting in Santa Fe, NM

Miller, Neil [National Center for Genome Resources

2010-06-02

102

Comparison of Sample Sequences of the Salmonella typhi Genome to the Sequence of the Complete Escherichia coli K-12 Genome  

Microsoft Academic Search

Raw sequence data representing the majority of a bacterial genome can be obtained at a tiny fraction of the cost of a completed sequence. To demonstrate the utility of such a resource, 870 single-stranded M13 clones were sequenced from a shotgun library of the Salmonella typhi Ty2 genome. The sequence reads averaged over 400 bases and sampled the genome with

Michael McClelland; Richard K. Wilson

1998-01-01

103

Sequencing of Seven Haloarchaeal Genomes Reveals Patterns of Genomic Flux  

PubMed Central

We report the sequencing of seven genomes from two haloarchaeal genera, Haloferax and Haloarcula. Ease of cultivation and the existence of well-developed genetic and biochemical tools for several diverse haloarchaeal species make haloarchaea a model group for the study of archaeal biology. The unique physiological properties of these organisms also make them good candidates for novel enzyme discovery for biotechnological applications. Seven genomes were sequenced to ~20× coverage and assembled to an average of 50 contigs (range 5 scaffolds - 168 contigs). Comparisons of protein-coding gene complements revealed large-scale differences in COG functional group enrichment between these genera. Analysis of genes encoding machinery for DNA metabolism reveals genera-specific expansions of the general transcription factor TATA binding protein as well as a history of extensive duplication and horizontal transfer of the proliferating cell nuclear antigen. Insights gained from this study emphasize the importance of haloarchaea for investigation of archaeal biology.
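Sequencing coverage of the kind quoted in this abstract is conventionally estimated as total sequenced bases divided by genome size. The sketch below illustrates that relationship under stated assumptions: the read count, read length, and genome size in the usage example are hypothetical round numbers, not values from the study.

```python
import math


def mean_depth(read_count: int, mean_read_length: float, genome_size: int) -> float:
    """Expected average coverage: total sequenced bases / genome size."""
    return read_count * mean_read_length / genome_size


def reads_needed(target_depth: float, mean_read_length: float, genome_size: int) -> int:
    """Number of reads required to reach a target average coverage."""
    return math.ceil(target_depth * genome_size / mean_read_length)


if __name__ == "__main__":
    # Hypothetical example: a 4 Mb genome sequenced with 400-base reads.
    print(mean_depth(200_000, 400, 4_000_000))
    print(reads_needed(20, 400, 4_000_000))
```

This is an average only; actual per-position depth varies with library and sequence composition.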

Lynch, Erin A.; Langille, Morgan G. I.; Darling, Aaron; Wilbanks, Elizabeth G.; Haltiner, Caitlin; Shao, Katie S. Y.; Starr, Michael O.; Teiling, Clotilde; Harkins, Timothy T.; Edwards, Robert A.; Eisen, Jonathan A.; Facciotti, Marc T.

2012-01-01

104

A Computer Program for Aligning a cDNA Sequence with a Genomic DNA Sequence  

Microsoft Academic Search

We address the problem of efficiently aligning a transcribed and spliced DNA sequence with a genomic sequence containing that gene, allowing for introns in the genomic sequence and a relatively small number of sequencing errors. A freely available computer program, described herein, solves the problem for a 100-kb genomic sequence in a few seconds on a workstation. With large amounts
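The problem stated above can be illustrated with a toy example. The sketch below is not the program described in the abstract: it is a greedy, exact-match approach that assumes no sequencing errors, a same-strand cDNA, and unambiguous exon boundaries. The function name, the `min_exon` parameter, and the test sequences are all hypothetical.

```python
from typing import List, Tuple

Block = Tuple[int, int, int]  # (cDNA start, genomic start, length)


def spliced_align(cdna: str, genomic: str, min_exon: int = 4) -> List[Block]:
    """Greedily place a cDNA on a genomic sequence as ordered exact-match blocks.

    At each step the longest prefix of the remaining cDNA that occurs in the
    genomic sequence downstream of the previous block is taken as an exon;
    the genomic gaps between consecutive blocks correspond to introns.
    """
    blocks: List[Block] = []
    c = 0  # current position in the cDNA
    g = 0  # leftmost genomic position still available
    while c < len(cdna):
        for length in range(len(cdna) - c, min_exon - 1, -1):
            pos = genomic.find(cdna[c:c + length], g)
            if pos != -1:
                blocks.append((c, pos, length))
                c += length
                g = pos + length
                break
        else:
            # No prefix of at least min_exon bases matched downstream.
            raise ValueError(f"no exact match for cDNA position {c}")
    return blocks


if __name__ == "__main__":
    exon1, intron, exon2 = "ATGGCCAAG", "GTAAGTCTCTCTAG", "CTTCCTGA"
    genomic = "TTT" + exon1 + intron + exon2 + "CCC"
    for cdna_start, genomic_start, length in spliced_align(exon1 + exon2, genomic):
        print(cdna_start, genomic_start, length)
```

A practical tool would additionally tolerate mismatches and use splice-site signals to resolve ambiguous exon boundaries, which this sketch does not attempt.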

Liliana Florea; George Hartzell; Gerald M. Rubin; Webb Miller

1998-01-01

105

Defining Genome Project Standards in a New Era of Sequencing  

SciTech Connect

Patrick Chain of the DOE Joint Genome Institute gives a talk on behalf of the International Genome Sequencing Standards Consortium on the need for intermediate genome classifications between "draft" and "finished"

Chain, Patrick [DOE-JGI

2009-05-27

106

Whole-genome sequencing in bacteriology: state of the art  

PubMed Central

Over the last ten years, genome sequencing capabilities have expanded exponentially. There have been tremendous advances in sequencing technology, DNA sample preparation, genome assembly, and data analysis. This has led to advances in a number of facets of bacterial genomics, including metagenomics, clinical medicine, bacterial archaeology, and bacterial evolution. This review examines the strengths and weaknesses of techniques in bacterial genome sequencing, upcoming technologies, and assembly techniques, and highlights recent studies that demonstrate new applications for bacterial genomics.

Dark, Michael J

2013-01-01

107

Draft Genome Sequence of Actinomyces massiliensis Strain 4401292T  

PubMed Central

A draft genome sequence of Actinomyces massiliensis, an anaerobic bacterium isolated from a patient's blood culture, is described here. CRISPR-associated proteins, insertion sequences, and toxin-antitoxin loci were found on the genome.

Robert, Catherine; Gimenez, Gregory; Gharbi, Reem; Raoult, Didier

2012-01-01

108

Complete Genome Sequence of Lactobacillus helveticus H10  

PubMed Central

Lactobacillus helveticus strain H10 was isolated from traditional fermented milk in Tibet, China. We sequenced the whole genome of strain H10 and compared it to the published genome sequence of Lactobacillus helveticus DPC4571.

Zhao, Wenjing; Chen, Yongfu; Sun, Zhihong; Wang, Jicheng; Zhou, Zhemin; Sun, Tiansong; Wang, Lei; Chen, Wei; Zhang, Heping

2011-01-01

109

The Norway spruce genome sequence and conifer genome evolution.  

PubMed

Conifers have dominated forests for more than 200 million years and are of huge ecological and economic importance. Here we present the draft assembly of the 20-gigabase genome of Norway spruce (Picea abies), the first available for any gymnosperm. The number of well-supported genes (28,354) is similar to the >100 times smaller genome of Arabidopsis thaliana, and there is no evidence of a recent whole-genome duplication in the gymnosperm lineage. Instead, the large genome size seems to result from the slow and steady accumulation of a diverse set of long-terminal repeat transposable elements, possibly owing to the lack of an efficient elimination mechanism. Comparative sequencing of Pinus sylvestris, Abies sibirica, Juniperus communis, Taxus baccata and Gnetum gnemon reveals that the transposable element diversity is shared among extant conifers. Expression of 24-nucleotide small RNAs, previously implicated in transposable element silencing, is tissue-specific and much lower than in other plants. We further identify numerous long (>10,000 base pairs) introns, gene-like fragments, uncharacterized long non-coding RNAs and short RNAs. This opens up new genomic avenues for conifer forestry and breeding. PMID:23698360

Nystedt, Björn; Street, Nathaniel R; Wetterbom, Anna; Zuccolo, Andrea; Lin, Yao-Cheng; Scofield, Douglas G; Vezzi, Francesco; Delhomme, Nicolas; Giacomello, Stefania; Alexeyenko, Andrey; Vicedomini, Riccardo; Sahlin, Kristoffer; Sherwood, Ellen; Elfstrand, Malin; Gramzow, Lydia; Holmberg, Kristina; Hällman, Jimmie; Keech, Olivier; Klasson, Lisa; Koriabine, Maxim; Kucukoglu, Melis; Käller, Max; Luthman, Johannes; Lysholm, Fredrik; Niittylä, Totte; Olson, Ake; Rilakovic, Nemanja; Ritland, Carol; Rosselló, Josep A; Sena, Juliana; Svensson, Thomas; Talavera-López, Carlos; Theißen, Günter; Tuominen, Hannele; Vanneste, Kevin; Wu, Zhi-Qiang; Zhang, Bo; Zerbe, Philipp; Arvestad, Lars; Bhalerao, Rishikesh; Bohlmann, Joerg; Bousquet, Jean; Garcia Gil, Rosario; Hvidsten, Torgeir R; de Jong, Pieter; MacKay, John; Morgante, Michele; Ritland, Kermit; Sundberg, Björn; Thompson, Stacey Lee; Van de Peer, Yves; Andersson, Björn; Nilsson, Ove; Ingvarsson, Pär K; Lundeberg, Joakim; Jansson, Stefan

2013-05-30

110

Initial sequencing and comparative analysis of the mouse genome  

Microsoft Academic Search

The sequence of the mouse genome is a key informational tool for understanding the contents of the human genome and a key experimental tool for biomedical research. Here, we report the results of an international collaboration to produce a high-quality draft sequence of the mouse genome. We also present an initial comparative analysis of the mouse and human genomes, describing

Robert H. Waterston; Kerstin Lindblad-Toh; Ewan Birney; Jane Rogers; Josep F. Abril; Pankaj Agarwal; Richa Agarwala; Rachel Ainscough; Marina Alexandersson; Peter An; Stylianos E. Antonarakis; John Attwood; Robert Baertsch; Jonathon Bailey; Karen Barlow; Stephan Beck; Eric Berry; Bruce Birren; Toby Bloom; Peer Bork; Marc Botcherby; Nicolas Bray; Michael R. Brent; Daniel G. Brown; Stephen D. Brown; Carol Bult; John Burton; Jonathan Butler; Robert D. Campbell; Piero Carninci; Simon Cawley; Francesca Chiaromonte; Asif T. Chinwalla; Deanna M. Church; Michele Clamp; Christopher Clee; Francis S. Collins; Lisa L. Cook; Richard R. Copley; Alan Coulson; Olivier Couronne; James Cuff; Val Curwen; Tim Cutts; Mark Daly; Robert David; Joy Davies; Kimberly D. Delehaunty; Justin Deri; Emmanouil T. Dermitzakis; Colin Dewey; Nicholas J. Dickens; Mark Diekhans; Sheila Dodge; Inna Dubchak; Diane M. Dunn; Sean R. Eddy; Laura Elnitski; Richard D. Emes; Pallavi Eswara; Eduardo Eyras; Adam Felsenfeld; Ginger A. Fewell; Paul Flicek; Karen Foley; Wayne N. Frankel; Lucinda A. Fulton; Robert S. Fulton; Terrence S. Furey; Diane Gage; Richard A. Gibbs; Gustavo Glusman; Sante Gnerre; Nick Goldman; Leo Goodstadt; Darren Grafham; Tina A. Graves; Eric D. Green; Simon Gregory; Roderic Guigó; Mark Guyer; Ross C. Hardison; David Haussler; Yoshihide Hayashizaki; LaDeana W. Hillier; Angela Hinrichs; Wratko Hlavina; Timothy Holzer; Fan Hsu; Axin Hua; Tim Hubbard; Adrienne Hunt; Ian Jackson; David B. Jaffe; L. Steven Johnson; Matthew Jones; Thomas A. Jones; Ann Joy; Michael Kamal; Elinor K. Karlsson; Donna Karolchik; Arkadiusz Kasprzyk; Jun Kawai; Evan Keibler; Cristyn Kells; W. James Kent; Andrew Kirby; Diana L. Kolbe; Ian Korf; Raju S. Kucherlapati; Edward J. Kulbokas; David Kulp; Tom Landers; J. P. Leger; Steven Leonard; Ivica Letunic; Rosie Levine; Jia Li; Ming Li; Christine Lloyd; Susan Lucas; Bin Ma; Donna R. Maglott; Elaine R. Mardis; Lucy Matthews; Evan Mauceli; John H. Mayer; Megan McCarthy; W. Richard McCombie; Stuart McLaren; Kirsten McLay; John D. McPherson; Jim Meldrim; Beverley Meredith; Jill P. Mesirov; Webb Miller; Tracie L. Miner; Emmanuel Mongin; Kate T. Montgomery; Michael Morgan; Richard Mott; James C. Mullikin; Donna M. Muzny; William E. Nash; Joanne O. Nelson; Michael N. Nhan; Robert Nicol; Zemin Ning; Chad Nusbaum; Michael J. O'Connor; Yasushi Okazaki; Karen Oliver; Emma Overton-Larty; Lior Pachter; Genís Parra; Kymberlie H. Pepin; Jane Peterson; Pavel Pevzner; Robert Plumb; Craig S. Pohl; Alex Poliakov; Tracy C. Ponce; Simon Potter; Michael Quail; Alexandre Reymond; Bruce A. Roe; Krishna M. Roskin; Edward M. Rubin; Alistair G. Rust; Victor Sapojnikov; Brian Schultz; Jörg Schultz; Scott Schwartz; Carol Scott; Steven Seaman; Steve Searle; Ted Sharpe; Andrew Sheridan; Ratna Shownkeen; Sarah Sims; Jonathan B. Singer; Guy Slater; Arian Smit; Douglas R. Smith; Brian Spencer; Arne Stabenau; Nicole Stange-Thomann; Charles Sugnet; Mikita Suyama; Glenn Tesler; Johanna Thompson; David Torrents; Evanne Trevaskis; John Tromp; Catherine Ucla; Abel Ureta-Vidal; Jade P. Vinson; Andrew C. von Niederhausern; Claire M. Wade; Melanie Wall; Ryan J. Weber; Robert B. Weiss; Michael C. Wendl; Anthony P. West; Kris Wetterstrand; Raymond Wheeler; Simon Whelan; Jamey Wierzbowski; David Willey; Sophie Williams; Richard K. Wilson; Eitan Winter; Kim C. Worley; Dudley Wyman; Shan Yang; Shiaw-Pyng Yang; Evgeny M. Zdobnov; Michael C. Zody; Eric S. Lander; Chris P. Ponting; Matthias S. Schwartz

2002-01-01

111

Using inversion signatures to generate draft genome sequence scaffolds  

Microsoft Academic Search

We present a linear-time algorithm that can generate a contig scaffold for a draft genome sequence represented in contigs given a reference genome. The algorithm is aimed at prokaryotic genomes and relies on the presence of matching sequence patterns between the query and reference genomes that can be interpreted as the result of large-scale inversions; we call these patterns inversion

Zanoni Dias; Ulisses Dias; João C. Setubal

2011-01-01

112

Complete genome sequence of Pyrobaculum oguniense.  

PubMed

Pyrobaculum oguniense TE7 is an aerobic hyperthermophilic crenarchaeon isolated from a hot spring in Japan. Here we describe its main chromosome of 2,436,033 bp, with three large-scale inversions and an extra-chromosomal element of 16,887 bp. We have annotated 2,800 protein-coding genes and 145 RNA genes in this genome, including nine H/ACA-like small RNA, 83 predicted C/D box small RNA, and 47 transfer RNA genes. Comparative analyses with the closest known relative, the anaerobe Pyrobaculum arsenaticum from Italy, reveal unexpectedly high synteny and nucleotide identity between these two geographically distant species. Deep sequencing of a mixture of genomic DNA from multiple cells has illuminated some of the genome dynamics potentially shared with other species in this genus. PMID:23407329

Bernick, David L; Karplus, Kevin; Lui, Lauren M; Coker, Joanna K C; Murphy, Julie N; Chan, Patricia P; Cozen, Aaron E; Lowe, Todd M

2012-07-30

113

Genome BLAST distance phylogenies inferred from whole plastid and whole mitochondrion genome sequences  

Microsoft Academic Search

BACKGROUND: Phylogenetic methods which do not rely on multiple sequence alignments are important tools in inferring trees directly from completely sequenced genomes. Here, we extend the recently described Genome BLAST Distance Phylogeny (GBDP) strategy to compute phylogenetic trees from all completely sequenced plastid genomes currently available and from a selection of mitochondrial genomes representing the major eukaryotic lineages. BLASTN, TBLASTX,

Alexander F. Auch; Stefan R. Henz; Barbara R. Holland; Markus Göker

2006-01-01

114

Cactus: Algorithms for genome multiple sequence alignment  

PubMed Central

Much attention has been given to the problem of creating reliable multiple sequence alignments in a model incorporating substitutions, insertions, and deletions. Far less attention has been paid to the problem of optimizing alignments in the presence of more general rearrangement and copy number variation. Using Cactus graphs, recently introduced for representing sequence alignments, we describe two complementary algorithms for creating genomic alignments. We have implemented these algorithms in the new “Cactus” alignment program. We test Cactus using the Evolver genome evolution simulator, a comprehensive new tool for simulation, and show using these and existing simulations that Cactus significantly outperforms all of its peers. Finally, we make an empirical assessment of Cactus's ability to properly align genes and find interesting cases of intra-gene duplication within the primates.

Paten, Benedict; Earl, Dent; Nguyen, Ngan; Diekhans, Mark; Zerbino, Daniel; Haussler, David

2011-01-01

115

[Prediction of transcription and genomic sequences].  

PubMed

Technological developments have enhanced DNA sequencing at genomic scale. On the basis of the resulting sequences, computational biologists now attempt to localise the most important functional regions, starting with genes, but also importantly the regulatory motifs and conditions controlling their expression. In a recent paper published in Cell, M.A. Beer and S. Tavazoie report the results obtained by combining statistical classifications (clustering) of transcriptome data (DNA chips), software for the discovery of cis-regulatory patterns, together with a probabilistic learning method to infer regulatory rules tentatively accounting for the observed transcriptional profiles. PMID:15525501

Martin, David; Ghattas, Badih; Thieffry, Denis

2004-11-01

116

Complete genome sequence of Candidatus Ruthia magnifica  

PubMed Central

The hydrothermal vent clam Calyptogena magnifica (Bivalvia: Mollusca) is a member of the Vesicomyidae. Species within this family form symbioses with chemosynthetic Gammaproteobacteria. They exist in environments such as hydrothermal vents and cold seeps and have a rudimentary gut and feeding groove, indicating a large dependence on their endosymbionts for nutrition. The C. magnifica symbiont, Candidatus Ruthia magnifica, was the first intracellular sulfur-oxidizing endosymbiont to have its genome sequenced (Newton et al. 2007). Here we expand upon the original report and provide additional details complying with the emerging MIGS/MIMS standards. The complete genome exposed the genetic blueprint of the metabolic capabilities of the symbiont. Genes predicted to encode the proteins required for all the metabolic pathways typical of free-living chemoautotrophs were detected in the symbiont genome. These include major pathways such as carbon fixation, sulfur oxidation, and nitrogen assimilation, as well as amino acid and cofactor/vitamin biosynthesis. This genome sequence is invaluable in the study of these enigmatic associations and provides insights into the origin and evolution of autotrophic endosymbiosis.

Roeselers, Guus; Newton, Irene L. G.; Woyke, Tanja; Auchtung, Thomas A.; Dilly, Geoffrey F.; Dutton, Rachel J.; Fisher, Meredith C.; Fontanez, Kristina M.; Lau, Evan; Stewart, Frank J.; Richardson, Paul M.; Barry, Kerrie W.; Saunders, Elizabeth; Detter, John C.; Wu, Dongying; Eisen, Jonathan A.; Cavanaugh, Colleen M.

2010-01-01

117

The genome sequence of Schizosaccharomyces pombe  

Microsoft Academic Search

We have sequenced and annotated the genome of fission yeast (Schizosaccharomyces pombe), which contains the smallest number of protein-coding genes yet recorded for a eukaryote: 4,824. The centromeres are between 35 and 110 kilobases (kb) and contain related repeats including a highly conserved 1.8-kb element. Regions upstream of genes are longer than in budding yeast (Saccharomyces cerevisiae), possibly reflecting more-extended

R. Gwilliam; M.-A. Rajandream; M. Lyne; R. Lyne; A. Stewart; J. Sgouros; N. Peat; J. Hayles; S. Baker; D. Basham; S. Bowman; K. Brooks; D. Brown; S. Brown; T. Chillingworth; C. Churcher; M. Collins; R. Connor; A. Cronin; P. Davis; T. Feltwell; A. Fraser; S. Gentles; A. Goble; N. Hamlin; D. Harris; J. Hidalgo; G. Hodgson; S. Holroyd; T. Hornsby; S. Howarth; E. J. Huckle; S. Hunt; K. Jagels; K. James; L. Jones; M. Jones; S. Leather; S. McDonald; J. McLean; P. Mooney; S. Moule; K. Mungall; L. Murphy; D. Niblett; C. Odell; K. Oliver; S. O'Neil; D. Pearson; M. A. Quail; E. Rabbinowitsch; K. Rutherford; S. Rutter; D. Saunders; K. Seeger; S. Sharp; J. Skelton; M. Simmonds; R. Squares; S. Squares; K. Stevens; K. Taylor; R. G. Taylor; A. Tivey; S. Walsh; T. Warren; S. Whitehead; J. Woodward; G. Volckaert; R. Aert; J. Robben; B. Grymonprez; I. Weltjens; E. Vanstreels; M. Rieger; M. Schäfer; S. Müller-Auer; C. Gabel; M. Fuchs; C. Fritzc; E. Holzer; D. Moestl; H. Hilbert; K. Borzym; I. Langer; A. Beck; H. Lehrach; R. Reinhardt; T. M. Pohl; P. Eger; W. Zimmermann; H. Wedler; R. Wambutt; B. Purnelle; A. Goffeau; E. Cadieu; S. Dréano; S. Gloux; V. Lelaure; S. Mottier; F. Galibert; S. J. Aves; Z. Xiang; C. Hunt; K. Moore; S. M. Hurst; M. Lucas; M. Rochet; C. Gaillardin; V. A. Tallada; A. Garzon; G. Thode; R. R. Daga; L. Cruzado; J. Jimenez; M. Sánchez; F. del Rey; J. Benito; A. Domínguez; J. L. Revuelta; S. Moreno; J. Armstrong; S. L. Forsburg; L. Cerrutti; T. Lowe; W. R. McCombie; I. Paulsen; J. Potashkin; G. V. Shpakovski; D. Ussery; B. G. Barrell; P. Nurse

2002-01-01

118

Draft Genome Sequence of Rubrivivax gelatinosus CBS  

PubMed Central

Rubrivivax gelatinosus CBS, a purple nonsulfur photosynthetic bacterium, can grow photosynthetically using CO and N2 as the sole carbon and nitrogen nutrients, respectively. R. gelatinosus CBS is of particular interest due to its ability to metabolize CO and yield H2. We present the 5-Mb draft genome sequence of R. gelatinosus CBS with the goal of providing genetic insight into the metabolic properties of this bacterium.

Hu, Pingsha; Lang, Juan; Wawrousek, Karen; Yu, Jianping; Maness, Pin-Ching

2012-01-01

119

Clinical applications of sequencing take center stage  

PubMed Central

A report on the Advances in Genome Biology and Technology (AGBT) meeting, Marco Island, Florida, USA, February 20-23, 2013. This year's Advances in Genome Biology and Technology (AGBT) meeting reflected the current state of 'next generation' sequencing (NGS) technologies: significantly reduced competition and innovation, and a strong focus on standardization and application. Announcements of technological breakthroughs - a hallmark of previous AGBT meetings - were markedly absent, but existing technologies continued to improve following the now expected exponential curve. Although applications ranged widely, there was a strong emphasis on clinical diagnosis.

2013-01-01

120

Genome Sequences of Pseudomonas spp. Isolated from Cereal Crops  

PubMed Central

Compared to those of dicot-infecting bacteria, the available genome sequences of bacteria that infect wheat and barley are limited. Herein, we report the draft genome sequences of four pseudomonads originally isolated from these cereals. These genome sequences provide a useful resource for comparative analyses within the genus and for cross-kingdom analyses of plant pathogenesis.

Stiller, Jiri; Covarelli, Lorenzo; Lindeberg, Magdalen; Shivas, Roger G.; Manners, John M.

2013-01-01

121

Plastid DNA sequence homologies in the tobacco nuclear genome  

Microsoft Academic Search

The tobacco (Nicotiana tabacum) nuclear genome contains long tracts of DNA (i.e. in excess of 18 kb) with high sequence homology to the tobacco plastid genome. Five lambda clones containing these nuclear DNA sequences encompass more than one-third of the tobacco plastid genome. The absolute size of these five integrants is unknown but potentially includes uninterrupted sequences that are as

Michael A. Ayliffe; Jeremy N. Timmis

1992-01-01

122

Draft Genome Sequence of Pseudomonas putida Strain MTCC5279  

PubMed Central

Here we report the genome sequence of a plant-growth-promoting rhizobacterium, Pseudomonas putida strain MTCC5279. The length of the draft genome sequence is approximately 5.2 Mb, with a GC content of 62.5%. The draft genome sequence reveals a number of genes whose products are possibly involved in plant growth promotion and abiotic stress tolerance.

Chaudhry, Vasvi; Asif, Mehar H.; Bag, Sumit; Goel, Ridhi; Mantri, Shrikant S.; Singh, Sunil K.; Chauhan, Puneet S.; Sawant, Samir V.

2013-01-01

123

Draft Genome Sequence of Pseudomonas putida Strain MTCC5279.  

PubMed

Here we report the genome sequence of a plant-growth-promoting rhizobacterium, Pseudomonas putida strain MTCC5279. The length of the draft genome sequence is approximately 5.2 Mb, with a GC content of 62.5%. The draft genome sequence reveals a number of genes whose products are possibly involved in plant growth promotion and abiotic stress tolerance. PMID:23908291

Chaudhry, Vasvi; Asif, Mehar H; Bag, Sumit; Goel, Ridhi; Mantri, Shrikant S; Singh, Sunil K; Chauhan, Puneet S; Sawant, Samir V; Nautiyal, Chandra Shekhar

2013-01-01

124

Mitochondrial Genome Sequence Evolution in Chlamydomonas  

PubMed Central

The mitochondrial genomes of the Chlorophyta exhibit significant diversity with respect to gene content and genome compactness; however, quantitative data on the rates of nucleotide substitution in mitochondrial DNA, which might help explain the origin of this diversity, are lacking. To gain insight into the evolutionary forces responsible for mitochondrial genome diversification, we sequenced to near completion the mitochondrial genome of the chlorophyte Chlamydomonas incerta, estimated the evolutionary divergence between Chlamydomonas reinhardtii and C. incerta mitochondrial protein-coding genes and rRNA-coding regions, and compared the relative evolutionary rates in mitochondrial and nuclear genes. Synonymous and nonsynonymous substitution rates do not differ significantly between the mitochondrial and nuclear protein-coding genes. The mitochondrial rRNA-coding regions, however, are evolving much faster than their nuclear counterparts, and this difference might be explained by relaxed functional constraints on the mitochondrial translational apparatus due to the small number of proteins synthesized in Chlamydomonas mitochondria. Substitution rates at synonymous sites in a nonstandard mitochondrial gene (rtl) and at intronic and synonymous sites in nuclear genes expressed at low levels suggest that the mutation rate is similar in these two genetic compartments. Potential evolutionary forces shaping mitochondrial genome evolution in Chlamydomonas are discussed.

Popescu, Cristina E.; Lee, Robert W.

2007-01-01

125

Whole genome sequencing for lung cancer  

PubMed Central

Lung cancer is a leading cause of cancer related morbidity and mortality globally, and carries a dismal prognosis. Improved understanding of the biology of cancer is required to improve patient outcomes. Next-generation sequencing (NGS) is a powerful tool for whole genome characterisation, enabling comprehensive examination of somatic mutations that drive oncogenesis. Most NGS methods are based on polymerase chain reaction (PCR) amplification of platform-specific DNA fragment libraries, which are then sequenced. These techniques are well suited to high-throughput sequencing and are able to detect the full spectrum of genomic changes present in cancer. However, they require considerable investments in time, laboratory infrastructure, computational analysis and bioinformatic support. Next-generation sequencing has been applied to studies of the whole genome, exome, transcriptome and epigenome, and is changing the paradigm of lung cancer research and patient care. The results of this new technology will transform current knowledge of oncogenic pathways and provide molecular targets of use in the diagnosis and treatment of cancer. Somatic mutations in lung cancer have already been identified by NGS, and large scale genomic studies are underway. Personalised treatment strategies will improve care for those likely to benefit from available therapies, while sparing others the expense and morbidity of futile intervention. Organisational, computational and bioinformatic challenges of NGS are driving technological advances as well as raising ethical issues relating to informed consent and data release. Differentiation between driver and passenger mutations requires careful interpretation of sequencing data. Challenges in the interpretation of results arise from the types of specimens used for DNA extraction, sample processing techniques and tumour content. Tumour heterogeneity can reduce power to detect mutations implicated in oncogenesis. 
Next-generation sequencing will facilitate investigation of the biological and clinical implications of such variation. These techniques can now be applied to single cells and free circulating DNA, and possibly in the future to DNA obtained from body fluids and from subpopulations of tumour. As costs reduce, and speed and processing accuracy increase, NGS technology will become increasingly accessible to researchers and clinicians, with the ultimate goal of improving the care of patients with lung cancer.

Goh, Felicia; Wright, Casey M; Sriram, Krishna B; Relan, Vandana; Clarke, Belinda E; Duhig, Edwina E; Bowman, Rayleen V; Yang, Ian A; Fong, Kwun M

2012-01-01

126

Functional genomics of tomato in a post-genome-sequencing phase.  

PubMed

Completion of the tomato genome sequencing project has broad impacts on genetic and genomic studies of tomato and Solanaceae plants. The reference genome sequence derived from Solanum lycopersicum cv 'Heinz 1706' serves as the firm basis for sequencing-based approaches to tomato genomics. In this article, we first present a brief summary of the genome sequencing project and a summary of the reference genome sequence. We then focus on recent progress in transcriptome sequencing and small RNA sequencing and show how the reference genome sequence makes these analyses more comprehensive than before. We discuss the potential of in-depth analysis that is based on DNA methylome sequencing and transcription start-site detection. Finally, we describe the current status of efforts to resequence S. lycopersicum cultivars to demonstrate how resequencing can allow the use of intraspecific genomic diversity for detailed phenotyping and breeding. PMID:23641177

Aoki, Koh; Ogata, Yoshiyuki; Igarashi, Kaori; Yano, Kentaro; Nagasaki, Hideki; Kaminuma, Eli; Toyoda, Atsushi

2013-03-01

127

Functional genomics of tomato in a post-genome-sequencing phase  

PubMed Central

Completion of the tomato genome sequencing project has broad impacts on genetic and genomic studies of tomato and Solanaceae plants. The reference genome sequence derived from Solanum lycopersicum cv ‘Heinz 1706’ serves as the firm basis for sequencing-based approaches to tomato genomics. In this article, we first present a brief summary of the genome sequencing project and a summary of the reference genome sequence. We then focus on recent progress in transcriptome sequencing and small RNA sequencing and show how the reference genome sequence makes these analyses more comprehensive than before. We discuss the potential of in-depth analysis that is based on DNA methylome sequencing and transcription start-site detection. Finally, we describe the current status of efforts to resequence S. lycopersicum cultivars to demonstrate how resequencing can allow the use of intraspecific genomic diversity for detailed phenotyping and breeding.

Aoki, Koh; Ogata, Yoshiyuki; Igarashi, Kaori; Yano, Kentaro; Nagasaki, Hideki; Kaminuma, Eli; Toyoda, Atsushi

2013-01-01

128

Why Assembling Plant Genome Sequences Is So Challenging  

PubMed Central

In spite of the biological and economic importance of plants, relatively few plant species have been sequenced. Only the genome sequence of plants with relatively small genomes, most of them angiosperms, in particular eudicots, has been determined. The arrival of next-generation sequencing technologies has allowed the rapid and efficient development of new genomic resources for non-model or orphan plant species. But the sequencing pace of plants is far from that of animals and microorganisms. This review focuses on the typical challenges of plant genomes that can explain why plant genomics is less developed than animal genomics. Explanations about the impact of some confounding factors emerging from the nature of plant genomes are given. As a result of these challenges and confounding factors, the correct assembly and annotation of plant genomes is hindered, genome drafts are produced, and advances in plant genomics are delayed.

Claros, Manuel Gonzalo; Bautista, Rocio; Guerrero-Fernandez, Dario; Benzerki, Hicham; Seoane, Pedro; Fernandez-Pozo, Noe

2012-01-01

129

Advances in understanding cancer genomes through second-generation sequencing  

Microsoft Academic Search

Cancers are caused by the accumulation of genomic alterations. Therefore, analyses of cancer genome sequences and structures provide insights for understanding cancer biology, diagnosis and therapy. The application of second-generation DNA sequencing technologies (also known as next-generation sequencing) — through whole-genome, whole-exome and whole-transcriptome approaches — is allowing substantial advances in cancer genomics. These methods are facilitating an increase in

Stacey Gabriel; Gad Getz; Matthew Meyerson

2010-01-01

130

Annotation-based genome-wide SNP discovery in the large and complex Aegilops tauschii genome using next-generation sequencing without a reference genome sequence  

Microsoft Academic Search

BACKGROUND: Many plants have large and complex genomes with an abundance of repeated sequences. Many plants are also polyploid. Both of these attributes typify the genome architecture in the tribe Triticeae, whose members include economically important wheat, rye and barley. Large genome sizes, an abundance of repeated sequences, and polyploidy present challenges to genome-wide SNP discovery using next-generation sequencing (NGS)

Frank M You; Naxin Huo; Karin R Deal; Yong Q Gu; Ming-Cheng Luo; Patrick E McGuire; Jan Dvorak; Olin D Anderson

2011-01-01

131

Complete mitochondrial genome sequence of Nectogale elegans.  

PubMed

The elegant water shrew (Nectogale elegans) belongs to the family Soricidae and is distributed in northern South Asia, central and southern China and northern Southeast Asia. In this study, the complete mitochondrial genome of N. elegans was sequenced. It was determined to be 17,460 bases, and included 13 protein-coding genes (PCGs), 22 tRNA genes, 2 ribosomal RNA genes and one non-coding region, which is similar to other mammalian mitochondrial genomes. Bayesian inference and maximum likelihood methods were used to construct phylogenetic trees based on 12 heavy-strand concatenated PCGs. Phylogenetic analyses further confirmed that Crocidurinae diverged prior to Soricinae, and Sorex unguiculatus differentiated earlier than N. elegans. PMID:23795853

Huang, Ting; Yan, Chaochao; Tan, Zheng; Tu, Feiyun; Yue, Bisong; Zhang, Xiuyue

2014-08-01

132

BorreliaBase: a phylogeny-centered browser of Borrelia genomes  

PubMed Central

Background The bacterial genus Borrelia (phylum Spirochaetes) consists of two groups of pathogens represented respectively by B. burgdorferi, the agent of Lyme borreliosis, and B. hermsii, the agent of tick-borne relapsing fever. The number of publicly available Borrelia genomic sequences is growing rapidly with the discovery and sequencing of Borrelia strains worldwide. There is however a lack of dedicated online databases to facilitate comparative analyses of Borrelia genomes. Description We have developed BorreliaBase, an online database for comparative browsing of Borrelia genomes. The database is currently populated with sequences from 35 genomes of eight Lyme-borreliosis (LB) group Borrelia species and 7 Relapsing-fever (RF) group Borrelia species. Distinct from genome repositories and aggregator databases, BorreliaBase serves manually curated comparative-genomic data including genome-based phylogeny, genome synteny, and sequence alignments of orthologous genes and intergenic spacers. Conclusions With a genome phylogeny at its center, BorreliaBase allows online identification of hypervariable lipoprotein genes, potential regulatory elements, and recombination footprints by providing evolution-based expectations of sequence variability at each genomic locus. The phylo-centric design of BorreliaBase (http://borreliabase.org) is a novel model for interactive browsing and comparative analysis of bacterial genomes online.

2014-01-01

133

The International Rice Genome Sequencing Project: progress and prospects  

Microsoft Academic Search

The rice genome sequencing project has been pursued as a national project in Japan since 1998. At the same time, a desire to accelerate the sequencing of the entire rice genome led to the formation of the International Rice Genome Sequencing Project (IRGSP), initially comprising five countries. The sequencing strategy is the conventional clone-by-clone shotgun method using P1-derived

T. Sasaki; T. Matsumoto; T. Baba; K. Yamamoto; J. Wu; Y. Katayose; K. Sakata

134

The Genome Sequence of Caenorhabditis briggsae: A Platform for Comparative Genomics  

PubMed Central

The soil nematodes Caenorhabditis briggsae and Caenorhabditis elegans diverged from a common ancestor roughly 100 million years ago and yet are almost indistinguishable by eye. They have the same chromosome number and genome sizes, and they occupy the same ecological niche. To explore the basis for this striking conservation of structure and function, we have sequenced the C. briggsae genome to a high-quality draft stage and compared it to the finished C. elegans sequence. We predict approximately 19,500 protein-coding genes in the C. briggsae genome, roughly the same as in C. elegans. Of these, 12,200 have clear C. elegans orthologs, a further 6,500 have one or more clearly detectable C. elegans homologs, and approximately 800 C. briggsae genes have no detectable matches in C. elegans. Almost all of the noncoding RNAs (ncRNAs) known are shared between the two species. The two genomes exhibit extensive colinearity, and the rate of divergence appears to be higher in the chromosomal arms than in the centers. Operons, a distinctive feature of C. elegans, are highly conserved in C. briggsae, with the arrangement of genes being preserved in 96% of cases. The difference in size between the C. briggsae (estimated at approximately 104 Mbp) and C. elegans (100.3 Mbp) genomes is almost entirely due to repetitive sequence, which accounts for 22.4% of the C. briggsae genome in contrast to 16.5% of the C. elegans genome. Few, if any, repeat families are shared, suggesting that most were acquired after the two species diverged or are undergoing rapid evolution. Coclustering the C. elegans and C. briggsae proteins reveals 2,169 protein families of two or more members. Most of these are shared between the two species, but some appear to be expanding or contracting, and there seem to be as many as several hundred novel C. briggsae gene families. The C. briggsae draft sequence will greatly improve the annotation of the C. elegans genome. Based on similarity to C. briggsae, we found strong evidence for 1,300 new C. elegans genes. In addition, comparisons of the two genomes will help to understand the evolutionary forces that mold nematode genomes.

2003-01-01

135

The Jackson Laboratory: The Mouse Genome Sequence Project  

NSDL National Science Digital Library

Part of the Mouse Genome Informatics program (last reported on in the NSDL Scout Report for the Life Sciences on March 19, 2004) at the Jackson Laboratory, this website presents The Mouse Genome Sequence (MGS) project. MGS is designed "to integrate emerging mouse genomic sequence data with the genetic and biological data available in MGD and GXD." The site links to Eukaryotic Genome Annotation Projects, as well as Sequence Analysis Tools including MouseBlast and Genome Analysis. The site also offers basic background information about the Mouse Genome Sequencing Initiative, and provides site users with access to groups involved in mouse genome sequencing, the BAC clone library, request forms for targeted sequencing, and more.

136

Draft Genome Sequence of Lactobacillus rossiae DSM 15814T  

PubMed Central

The draft genome sequence of Lactobacillus rossiae DSM 15814T (CS1, ATCC BAA-88) was determined by a whole-genome shotgun approach. Reads were assembled to a 2.9-Mb draft version. RAST genome annotation evidenced 2,723 predicted coding sequences. Many carbohydrate, amino acid, and amino acid derivative subsystem features were found.

Di Cagno, Raffaella; Cattonaro, Federica; Gobbetti, Marco

2012-01-01

137

Complete Genome Sequence of Staphylococcus aureus Siphovirus Phage JS01  

PubMed Central

Staphylococcus aureus is the most prevalent and economically significant pathogen causing bovine mastitis. We isolated and characterized one staphylophage from the milk of mastitis-affected cattle and sequenced its genome. Transmission electron microscopy (TEM) observation shows that it belongs to the family Siphovirus. We announce here its complete genome sequence and report major findings from the genomic analysis.

Jia, Hongying; Bai, Qinqin; Yang, Yongchun

2013-01-01

138

The Center for Eukaryotic Structural Genomics.  

PubMed

The Center for Eukaryotic Structural Genomics (CESG) is a "specialized" or "technology development" center supported by the Protein Structure Initiative (PSI). CESG's mission is to develop improved methods for the high-throughput solution of structures from eukaryotic proteins, with a very strong weighting toward human proteins of biomedical relevance. During the first three years of PSI-2, CESG selected targets representing 601 proteins from Homo sapiens, 33 from mouse, 10 from rat, 139 from Galdieria sulphuraria, 35 from Arabidopsis thaliana, 96 from Cyanidioschyzon merolae, 80 from Plasmodium falciparum, 24 from yeast, and about 25 from other eukaryotes. Notably, 30% of all structures of human proteins solved by the PSI Centers were determined at CESG. Whereas eukaryotic proteins generally are considered to be much more challenging targets than prokaryotic proteins, the technology now in place at CESG yields success rates that are comparable to those of the large production centers that work primarily on prokaryotic proteins. We describe here the technological innovations that underlie CESG's platforms for bioinformatics and laboratory information management, target selection, protein production, and structure determination by X-ray crystallography or NMR spectroscopy. PMID:19130299

Markley, John L; Aceti, David J; Bingman, Craig A; Fox, Brian G; Frederick, Ronnie O; Makino, Shin-ichi; Nichols, Karl W; Phillips, George N; Primm, John G; Sahu, Sarata C; Vojtik, Frank C; Volkman, Brian F; Wrobel, Russell L; Zolnai, Zsolt

2009-04-01

139

Ethical issues raised by whole genome sequencing.  

PubMed

While there is ongoing discussion about the details of implementation of whole genome sequencing (WGS) and whole exome sequencing (WES), there appears to be a consensus amongst geneticists that the widespread use of these approaches is not only inevitable, but will also be beneficial [1]. However, at the present time, we are unable to anticipate the full range of uses, consequences and impact of implementing WGS and WES. Nevertheless, the already known ethical issues, both in research and in clinical practice are diverse and complex and should be addressed properly presently. Herein, we discuss the ethical aspects of WGS and WES by particularly focussing on three overlapping themes: (1) informed consent, (2) data handling, and (3) the return of results. PMID:24810188

Pinxten, Wim; Howard, Heidi Carmen

2014-04-01

140

Complete genome sequence of the alkaliphilic bacterium Bacillus halodurans and genomic sequence comparison with Bacillus subtilis  

Microsoft Academic Search

The 4 202 353 bp genome of the alkaliphilic bacterium Bacillus halodurans C-125 contains 4066 predicted protein coding sequences (CDSs), 2141 (52.7%) of which have functional assignments, 1182 (29%) of which are conserved CDSs with unknown function and 743 (18.3%) of which have no match to any protein database. Among the total CDSs, 8.8% match sequences of proteins found only

Hideto Takami; Kaoru Nakasone; Yoshihiro Takaki; Go Maeno; Rumie Sasaki; Noriaki Masui; Fumie Fuji; Chie Hirama; Yuka Nakamura; Naotake Ogasawara; Satoru Kuhara; Koki Horikoshi

2000-01-01
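The percentages quoted in the Bacillus halodurans abstract above can be rechecked from its own counts. The following is a minimal Python sketch; the category labels and the helper name `percentage` are illustrative shorthand, not terms from the paper.

```python
# Counts taken from the abstract above; labels are illustrative shorthand.
TOTAL_CDS = 4066
categories = {
    "functional assignment": 2141,
    "conserved, unknown function": 1182,
    "no database match": 743,
}


def percentage(count, total):
    """Return count as a percentage of total, rounded to one decimal place."""
    return round(100.0 * count / total, 1)


# The three categories should account for every predicted CDS.
assert sum(categories.values()) == TOTAL_CDS

shares = {name: percentage(n, TOTAL_CDS) for name, n in categories.items()}
for name, share in shares.items():
    print(f"{name}: {share}%")
```

The three counts sum to the stated total of 4066, and the computed shares agree with the abstract's figures to within rounding (the abstract gives the second share as 29%).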

141

Evaluation of Genome Sequencing Quality in Selected Plant Species Using Expressed Sequence Tags  

PubMed Central

Background With the completion of genome sequencing projects for more than 30 plant species, large volumes of genome sequences have been produced and stored in online databases. Advancements in sequencing technologies have reduced the cost and time of whole genome sequencing enabling more and more plants to be subjected to genome sequencing. Despite this, genome sequence qualities of multiple plants have not been evaluated. Methodology/Principal Finding Integrity and accuracy were calculated to evaluate the genome sequence quality of 32 plants. The integrity of a genome sequence is presented by the ratio of chromosome size and genome size (or between scaffold size and genome size), which ranged from 55.31% to nearly 100%. The accuracy of genome sequence was presented by the ratio between matched EST and selected ESTs where 52.93% to 98.28% and 89.02% to 98.85% of the randomly selected clean ESTs could be mapped to chromosome and scaffold sequences, respectively. According to the integrity, accuracy and other analysis of each plant species, thirteen plant species were divided into four levels. Arabidopsis thaliana, Oryza sativa and Zea mays had the highest quality, followed by Brachypodium distachyon, Populus trichocarpa, Vitis vinifera and Glycine max, Sorghum bicolor, Solanum lycopersicum and Fragaria vesca, and Lotus japonicus, Medicago truncatula and Malus × domestica in that order. Assembling the scaffold sequences into chromosome sequences should be the primary task for the remaining nineteen species. Low GC content and repeat DNA influences genome sequence assembly. Conclusion The quality of plant genome sequences was found to be lower than envisaged and thus the rapid development of genome sequencing projects as well as research on bioinformatics tools and the algorithms of genome sequence assembly should provide increased processing and correction of genome sequences that have already been published.

Shangguan, Lingfei; Han, Jian; Kayesh, Emrul; Sun, Xin; Zhang, Changqing; Pervaiz, Tariq; Wen, Xicheng; Fang, Jinggui

2013-01-01
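The two quality measures described in the abstract above are simple ratios: integrity is assembled (chromosome or scaffold) size over genome size, and accuracy is matched ESTs over selected ESTs. A minimal Python sketch follows, assuming both are reported as percentages; the function names and the example numbers are hypothetical and are not taken from the paper.

```python
def integrity(assembled_bp, genome_bp):
    """Assembled (chromosome or scaffold) size as a percentage of genome size."""
    if genome_bp <= 0:
        raise ValueError("genome_bp must be positive")
    return 100.0 * assembled_bp / genome_bp


def accuracy(matched_ests, selected_ests):
    """Percentage of the selected ESTs that could be mapped to the assembly."""
    if selected_ests <= 0:
        raise ValueError("selected_ests must be positive")
    return 100.0 * matched_ests / selected_ests


# Hypothetical inputs, for illustration only (not values from the study).
example_integrity = integrity(assembled_bp=119_000_000, genome_bp=125_000_000)
example_accuracy = accuracy(matched_ests=9_650, selected_ests=10_000)
print(f"integrity: {example_integrity:.2f}%")
print(f"accuracy:  {example_accuracy:.2f}%")
```

How the ESTs are cleaned, sampled, and mapped determines the accuracy figure in practice; the sketch covers only the final ratio.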

142

Genomic Sequence Comparisons, 1987-2003 Final Report  

SciTech Connect

This project was to develop new DNA sequencing and RNA and protein quantitation methods and related genome annotation tools. The project began in 1987 with the development of multiplex sequencing (published in Science in 1988), and one of the first automated sequencing methods. This led to the first commercial genome sequence in 1994 and to the establishment of the main commercial participants (GTC then Agencourt) in the public DOE/NIH genome project. In collaboration with GTC we contributed to one of the first complete DOE genome sequences, in 1997, that of Methanobacterium thermoautotrophicum, a species of great relevance to energy-rich gas production.

George M. Church

2004-07-29

143

Complete genome sequence of Liberibacter crescens BT-1.  

PubMed

Liberibacter crescens BT-1, a Gram-negative, rod-shaped bacterial isolate, was previously recovered from mountain papaya to gain insight on Huanglongbing (HLB) and Zebra Chip (ZC) diseases. The genome of BT-1 was sequenced at the Interdisciplinary Center for Biotechnology Research (ICBR) at the University of Florida. A finished assembly and annotation yielded one chromosome with a length of 1,504,659 bp and a G+C content of 35.4%. Compared to other species in the Liberibacter genus, L. crescens has many more genes in thiamine and essential amino acid biosynthesis. This likely explains why L. crescens BT-1 is culturable while the known Liberibacter strains have not yet been cultured. Similar to Candidatus L. asiaticus psy62, the L. crescens BT-1 genome contains two prophage regions. PMID:23408754

Leonard, Michael T; Fagen, Jennie R; Davis-Richardson, Austin G; Davis, Michael J; Triplett, Eric W

2012-12-19

144

Draft Genome Sequence of Bacillus amyloliquefaciens B-1895.  

PubMed

In this report, we present a draft genome sequence of Bacillus amyloliquefaciens strain B-1895. Comparison with the genome of a reference strain demonstrated similar overall organization, as well as differences involving large gene clusters. PMID:24948774

Karlyshev, Andrey V; Melnikov, Vyacheslav G; Chistyakov, Vladimir A

2014-01-01

145

Draft Genome Sequence of Bacillus amyloliquefaciens B-1895  

PubMed Central

In this report, we present a draft genome sequence of Bacillus amyloliquefaciens strain B-1895. Comparison with the genome of a reference strain demonstrated similar overall organization, as well as differences involving large gene clusters.

Melnikov, Vyacheslav G.; Chistyakov, Vladimir A.

2014-01-01

146

Integrated Program in Microbial Genome Sequencing and Analysis.  

National Technical Information Service (NTIS)

The final progress report for this project contains information on nine microbial genome sequencing projects and two functional genomics projects that have been underway since the last report was submitted. The work funded under this award has resulted in...

2005-01-01

147

Validation of rice genome sequence by optical mapping  

Microsoft Academic Search

BACKGROUND: Rice feeds much of the world, and possesses the simplest genome analyzed to date within the grass family, making it an economically relevant model system for other cereal crops. Although the rice genome is sequenced, validation and gap closing efforts require purely independent means for accurate finishing of sequence build data. RESULTS: To facilitate ongoing sequencing finishing and validation

Shiguo Zhou; Michael C Bechner; Chris P Churas; Louise Pape; Sally A Leong; Rod Runnheim; Dan K Forrest; Steve Goldstein; Miron Livny; David C Schwartz

2007-01-01

148

Next Generation Sequencing at the University of Chicago Genomics Core  

ScienceCinema

The University of Chicago Genomics Core provides University of Chicago investigators (and external clients) access to State-of-the-Art genomics capabilities: next generation sequencing, Sanger sequencing / genotyping and micro-arrays (gene expression, genotyping, and methylation). The current presentation will highlight our capabilities in the area of ultra-high throughput sequencing analysis.

149

Next Generation Sequencing at the University of Chicago Genomics Core  

SciTech Connect

The University of Chicago Genomics Core provides University of Chicago investigators (and external clients) access to State-of-the-Art genomics capabilities: next generation sequencing, Sanger sequencing / genotyping and micro-arrays (gene expression, genotyping, and methylation). The current presentation will highlight our capabilities in the area of ultra-high throughput sequencing analysis.

Faber, Pieter [University of Chicago

2013-04-24

150

Complete genome sequence of Arcanobacterium haemolyticum type strain (11018T)  

SciTech Connect

Vulcanisaeta distributa Itoh et al. 2002 belongs to the family Thermoproteaceae in the phylum Crenarchaeota. The genus Vulcanisaeta is characterized by a global distribution in hot and acidic springs. This is the first genome sequence from a member of the genus Vulcanisaeta and seventh genome sequence in the family Thermoproteaceae. The 2,374,137 bp long genome with its 2,544 protein-coding and 49 RNA genes is a part of the Genomic Encyclopedia of Bacteria and Archaea project.

Yasawong, Montri [HZI - Helmholtz Centre for Infection Research, Braunschweig, Germany; Teshima, Hazuki [Los Alamos National Laboratory (LANL); Lapidus, Alla L. [U.S. Department of Energy, Joint Genome Institute; Nolan, Matt [U.S. Department of Energy, Joint Genome Institute; Lucas, Susan [U.S. Department of Energy, Joint Genome Institute; Glavina Del Rio, Tijana [U.S. Department of Energy, Joint Genome Institute; Tice, Hope [U.S. Department of Energy, Joint Genome Institute; Cheng, Jan-Fang [U.S. Department of Energy, Joint Genome Institute; Bruce, David [Los Alamos National Laboratory (LANL); Detter, J. Chris [U.S. Department of Energy, Joint Genome Institute; Tapia, Roxanne [Los Alamos National Laboratory (LANL); Han, Cliff [Los Alamos National Laboratory (LANL); Goodwin, Lynne A. [Los Alamos National Laboratory (LANL); Pitluck, Sam [U.S. Department of Energy, Joint Genome Institute; Liolios, Konstantinos [U.S. Department of Energy, Joint Genome Institute; Ivanova, N [U.S. Department of Energy, Joint Genome Institute; Mavromatis, K [U.S. Department of Energy, Joint Genome Institute; Mikhailova, Natalia [U.S. Department of Energy, Joint Genome Institute; Pati, Amrita [U.S. Department of Energy, Joint Genome Institute; Chen, Amy [U.S. Department of Energy, Joint Genome Institute; Palaniappan, Krishna [U.S. Department of Energy, Joint Genome Institute; Land, Miriam L [ORNL; Hauser, Loren John [ORNL; Chang, Yun-Juan [ORNL; Jeffries, Cynthia [Oak Ridge National Laboratory (ORNL); Rohde, Manfred [HZI - Helmholtz Centre for Infection Research, Braunschweig, Germany; Sikorski, Johannes [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany; Pukall, Rudiger [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany; Goker, Markus [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany; Woyke, Tanja [U.S. Department of Energy, Joint Genome Institute; Bristow, James [U.S. 
Department of Energy, Joint Genome Institute; Eisen, Jonathan [U.S. Department of Energy, Joint Genome Institute; Markowitz, Victor [U.S. Department of Energy, Joint Genome Institute; Hugenholtz, Philip [U.S. Department of Energy, Joint Genome Institute; Kyrpides, Nikos C [U.S. Department of Energy, Joint Genome Institute; Klenk, Hans-Peter [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany

2010-01-01

151

Reconstructing cancer genomes from paired-end sequencing data  

PubMed Central

Background A cancer genome is derived from the germline genome through a series of somatic mutations. Somatic structural variants - including duplications, deletions, inversions, translocations, and other rearrangements - result in a cancer genome that is a scrambling of intervals, or "blocks" of the germline genome sequence. We present an efficient algorithm for reconstructing the block organization of a cancer genome from paired-end DNA sequencing data. Results By aligning paired reads from a cancer genome - and a matched germline genome, if available - to the human reference genome, we derive: (i) a partition of the reference genome into intervals; (ii) adjacencies between these intervals in the cancer genome; (iii) an estimated copy number for each interval. We formulate the Copy Number and Adjacency Genome Reconstruction Problem of determining the cancer genome as a sequence of the derived intervals that is consistent with the measured adjacencies and copy numbers. We design an efficient algorithm, called Paired-end Reconstruction of Genome Organization (PREGO), to solve this problem by reducing it to an optimization problem on an interval-adjacency graph constructed from the data. The solution to the optimization problem results in an Eulerian graph, containing an alternating Eulerian tour that corresponds to a cancer genome that is consistent with the sequencing data. We apply our algorithm to five ovarian cancer genomes that were sequenced as part of The Cancer Genome Atlas. We identify numerous rearrangements, or structural variants, in these genomes, analyze reciprocal vs. non-reciprocal rearrangements, and identify rearrangements consistent with known mechanisms of duplication such as tandem duplications and breakage/fusion/bridge (B/F/B) cycles. Conclusions We demonstrate that PREGO efficiently identifies complex and biologically relevant rearrangements in cancer genome sequencing data. 
An implementation of the PREGO algorithm is available at http://compbio.cs.brown.edu/software/.
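
The consistency requirement described in the abstract can be illustrated with a small sketch. This is not the PREGO implementation; it is a simplified check, under assumptions made here for illustration, of the degree condition that an alternating Eulerian tour needs: at each interval end, the interval's copy number should match the total multiplicity of adjacencies touching that end, except at chromosome ends (telomeres). The function name, the data layout, and the toy intervals A, B, C are all hypothetical.

```python
from collections import defaultdict


def is_balanced(copy_number, adjacencies, telomeres=()):
    """Check a simplified balance condition on an interval-adjacency graph.

    copy_number: {interval_name: int}; each interval has two ends,
                 (name, "L") and (name, "R").
    adjacencies: {((name, end), (name, end)): multiplicity}.
    telomeres:   interval ends where one copy may end a chromosome,
                 so one adjacency may be missing there.
    """
    adj_degree = defaultdict(int)
    for (u, v), mult in adjacencies.items():
        # A self-adjacency (u == v) counts twice, like a loop in a graph.
        adj_degree[u] += mult
        adj_degree[v] += mult

    telomeres = set(telomeres)
    for name, copies in copy_number.items():
        for end in ("L", "R"):
            vertex = (name, end)
            slack = 1 if vertex in telomeres else 0
            if not (copies - slack <= adj_degree[vertex] <= copies):
                return False
    return True


# Toy example: reference order A B C with a tandem duplication of B,
# giving the rearranged sequence A B B C.
copy_number = {"A": 1, "B": 2, "C": 1}
adjacencies = {
    (("A", "R"), ("B", "L")): 1,  # reference adjacency A-B
    (("B", "R"), ("B", "L")): 1,  # novel adjacency from the duplication
    (("B", "R"), ("C", "L")): 1,  # reference adjacency B-C
}
telomeres = [("A", "L"), ("C", "R")]

print(is_balanced(copy_number, adjacencies, telomeres))
```

In this toy case the copy numbers and adjacencies agree, so the check passes; raising the copy number of B to 3 without adding an adjacency would make it fail. The real algorithm solves an optimization problem to reconcile noisy measurements rather than merely testing the condition.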

2012-01-01

152

Next-generation sequencing strategies for characterizing the turkey genome.  

PubMed

The turkey genome sequencing project was initiated in 2008 and has relied primarily on next-generation sequencing (NGS) technologies. Our first efforts used a synergistic combination of 2 NGS platforms (Roche/454 and Illumina GAII), detailed bacterial artificial chromosome (BAC) maps, and unique assembly tools to sequence and assemble the genome of the domesticated turkey, Meleagris gallopavo. Since the first release in 2010, efforts to improve the genome assembly, gene annotation, and genomic analyses continue. The initial assembly build (2.01) represented about 89% of the genome sequence with 17X coverage depth (931 Mb). Sequence contigs were assigned to 30 of the 40 chromosomes with approximately 10% of the assembled sequence corresponding to unassigned chromosomes (ChrUn). The sequence has been refined through both genome-wide and area-focused sequencing, including shotgun and paired-end sequencing, and targeted sequencing of chromosomal regions with low or incomplete coverage. These additional efforts have improved the sequence assembly resulting in 2 subsequent genome builds of higher genome coverage (25X/Build3.0 and 30X/Build4.0) with a current sequence totaling 1,010 Mb. Further, BAC with end sequences assigned to the Z/W and MG18 (MHC) chromosomes, ChrUn, or not placed in the previous build were isolated, deeply sequenced (Hi-Seq), and incorporated into the latest build (5.0). To aid in the annotation and to generate a gene expression atlas of major tissues, a comprehensive set of RNA samples was collected at various developmental stages of female and male turkeys. Transcriptome sequencing data (using Illumina Hi-Seq) will provide information to enhance the final assembly and ultimately improve sequence annotation. The most current sequence covers more than 95% of the turkey genome and should yield a much improved gene level of annotation, making it a valuable resource for studying genetic variations underlying economically important traits in poultry. 
PMID:24570472

Dalloul, Rami A; Zimin, Aleksey V; Settlage, Robert E; Kim, Sungwon; Reed, Kent M

2014-02-01

153

Applications of next-generation sequencing technologies in functional genomics  

Microsoft Academic Search

A new generation of sequencing technologies, from Illumina/Solexa, ABI/SOLiD, 454/Roche, and Helicos, has provided unprecedented opportunities for high-throughput functional genomic research. To date, these technologies have been applied in a variety of contexts, including whole-genome sequencing, targeted resequencing, discovery of transcription factor binding sites, and noncoding RNA expression profiling. This review discusses applications of next-generation sequencing technologies in functional genomics

Olena Morozova; Marco A. Marra

2008-01-01

154

Synergy between sequence and size in large-scale genomics  

Microsoft Academic Search

Until recently the study of individual DNA sequences and of total DNA content (the C-value) sat at opposite ends of the spectrum in genome biology. For gene sequencers, the vast stretches of non-coding DNA found in eukaryotic genomes were largely considered to be an annoyance, whereas genome-size researchers attributed little relevance to specific nucleotide sequences. However, the dawn of comprehensive

T. Ryan Gregory

2005-01-01

155

Multiplex sequencing of plant chloroplast genomes using Solexa sequencing-by-synthesis technology  

Microsoft Academic Search

Organellar DNA sequences are widely used in evolutionary and population genetic studies; however, the conservative nature of chloroplast gene and genome evolution often limits phylogenetic resolution and statistical power. To gain maximal access to the historical record contained within chloroplast genomes, we have adapted multiplex sequencing-by-synthesis (MSBS) to simultaneously sequence multiple genomes using the Illumina Genome Analyzer. We PCR-amplified

Richard Cronn; Aaron Liston; Matthew Parks; David S. Gernandt; Rongkun Shen; Todd Mockler

2008-01-01

156

Sequencing and assembly of the 22-Gb loblolly pine genome.  

PubMed

Conifers are the predominant gymnosperm. The size and complexity of their genomes has presented formidable technical challenges for whole-genome shotgun sequencing and assembly. We employed novel strategies that allowed us to determine the loblolly pine (Pinus taeda) reference genome sequence, the largest genome assembled to date. Most of the sequence data were derived from whole-genome shotgun sequencing of a single megagametophyte, the haploid tissue of a single pine seed. Although that constrained the quantity of available DNA, the resulting haploid sequence data were well-suited for assembly. The haploid sequence was augmented with multiple linking long-fragment mate pair libraries from the parental diploid DNA. For the longest fragments, we used novel fosmid DiTag libraries. Sequences from the linking libraries that did not match the megagametophyte were identified and removed. Assembly of the sequence data was aided by condensing the enormous number of paired-end reads into a much smaller set of longer "super-reads," rendering subsequent assembly with an overlap-based assembly algorithm computationally feasible. To further improve the contiguity and biological utility of the genome sequence, additional scaffolding methods utilizing independent genome and transcriptome assemblies were implemented. The combination of these strategies resulted in a draft genome sequence of 20.15 billion bases, with an N50 scaffold size of 66.9 kbp. PMID:24653210
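
The N50 scaffold size quoted above is a standard contiguity statistic: the length L such that scaffolds of length L or longer together contain at least half of the assembled bases. A minimal sketch of the usual definition follows; the function name and the toy scaffold lengths are illustrative and are not taken from the loblolly pine assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold (or contig) lengths.

    Scaffolds are visited from longest to shortest; the N50 is the length
    of the scaffold at which the running total first reaches half of the
    total assembled length.
    """
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("n50() requires at least one scaffold length")


# Toy lengths in kbp: total is 20, and 6 + 5 = 11 covers at least half.
print(n50([2, 3, 4, 5, 6]))  # 5
```

Because N50 depends only on the length distribution, it says nothing about correctness of the assembly; it is usually reported alongside total assembled size, as in the abstract.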

Zimin, Aleksey; Stevens, Kristian A; Crepeau, Marc W; Holtz-Morris, Ann; Koriabine, Maxim; Marçais, Guillaume; Puiu, Daniela; Roberts, Michael; Wegrzyn, Jill L; de Jong, Pieter J; Neale, David B; Salzberg, Steven L; Yorke, James A; Langley, Charles H

2014-03-01

157

Exome sequencing: the sweet spot before whole genomes.  

PubMed

The development of massively parallel sequencing technologies, coupled with new massively parallel DNA enrichment technologies (genomic capture), has allowed the sequencing of targeted regions of the human genome in rapidly increasing numbers of samples. Genomic capture can target specific areas in the genome, including genes of interest and linkage regions, but this limits the study to what is already known. Exome capture allows an unbiased investigation of the complete protein-coding regions in the genome. Researchers can use exome capture to focus on a critical part of the human genome, allowing larger numbers of samples than are currently practical with whole-genome sequencing. In this review, we briefly describe some of the methodologies currently used for genomic and exome capture and highlight recent applications of this technology. PMID:20705737

Teer, Jamie K; Mullikin, James C

2010-10-15

158

Genome Sequence of Lactobacillus plantarum Strain UCMA 3037  

PubMed Central

Nucleic acid of the strain Lactobacillus plantarum UCMA 3037, isolated from raw milk camembert cheese in our laboratory, was sequenced. We present its draft genome sequence with the aim of studying its functional properties and relationship to the cheese ecosystem.

Naz, Saima; Tareb, Raouf; Bernardeau, Marion; Vaisse, Melissa; Lucchetti-Miganeh, Celine; Rechenmann, Mathias

2013-01-01

159

Assessing the impact of comparative genomic sequence data on the functional annotation of the Drosophila genome  

Microsoft Academic Search

Background: It is widely accepted that comparative sequence data can aid the functional annotation of genome sequences; however, the most informative species and features of genome evolution for comparison remain to be determined. Results: We analyzed conservation in eight genomic regions (apterous, even-skipped, fushi tarazu, twist, and Rhodopsins 1, 2, 3 and 4) from four Drosophila species (D. erecta, D.

Casey M Bergman; Barret D Pfeiffer; Diego E Rincón-Limas; Roger A Hoskins; Andreas Gnirke; Chris J Mungall; Adrienne M Wang; Brent Kronmiller; Joanne Pacleb; Soo Park; Mark Stapleton; Kenneth Wan; Reed A George; Pieter J de Jong; Juan Botas; Gerald M Rubin; Susan E Celniker

2002-01-01

160

The $1000 Genome: Ethical and Legal Issues in Whole Genome Sequencing of Individuals  

Microsoft Academic Search

Progress in gene sequencing could make rapid whole genome sequencing of individuals affordable to millions of persons and useful for many purposes in a future era of genomic medicine. Using the idea of the $1000 genome as a focus, this article reviews the main technical, ethical, and legal issues that must be resolved to make mass genotyping of individuals cost-effective and

John A. Robertson

2003-01-01

161

De Novo Next Generation Sequencing of Plant Genomes  

Microsoft Academic Search

The genome sequencing of all major food and bioenergy crops is of critical importance in the race to improve crop production to meet the future food and energy security needs of the world. Next generation sequencing technologies have brought about great improvements in sequencing throughput and cost, but do not yet allow for de novo sequencing of large repetitive genomes

Steve Rounsley; Pradeep Reddy Marri; Yeisoo Yu; Ruifeng He; Nick Sisneros; Jose Luis Goicoechea; So Jeong Lee; Angelina Angelova; Dave Kudrna; Meizhong Luo; Jason Affourtit; Brian Desany; James Knight; Faheem Niazi; Michael Egholm; Rod A. Wing

2009-01-01

162

Finishing The Euchromatic Sequence Of The Human Genome  

SciTech Connect

The sequence of the human genome encodes the genetic instructions for human physiology, as well as rich information about human evolution. In 2001, the International Human Genome Sequencing Consortium reported a draft sequence of the euchromatic portion of the human genome. Since then, the international collaboration has worked to convert this draft into a genome sequence with high accuracy and nearly complete coverage. Here, we report the result of this finishing process. The current genome sequence (Build 35) contains 2.85 billion nucleotides interrupted by only 341 gaps. It covers ~99% of the euchromatic genome and is accurate to an error rate of ~1 event per 100,000 bases. Many of the remaining euchromatic gaps are associated with segmental duplications and will require focused work with new methods. The near-complete sequence, the first for a vertebrate, greatly improves the precision of biological analyses of the human genome including studies of gene number, birth and death. Notably, the human genome seems to encode only 20,000-25,000 protein-coding genes. The genome sequence reported here should serve as a firm foundation for biomedical research in the decades ahead.
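
To put the quoted accuracy in absolute terms, the two approximate figures from the abstract (2.85 billion nucleotides, about 1 error event per 100,000 bases) can be combined in a one-line calculation. The sketch below is only this arithmetic; the function name is illustrative, and the result is an order-of-magnitude estimate rather than a figure reported in the abstract.

```python
def expected_error_events(total_bases, bases_per_error):
    """Estimate the number of error events implied by an average error rate."""
    return total_bases / bases_per_error


# Approximate figures quoted in the abstract for Build 35.
total_bases = 2_850_000_000   # 2.85 billion nucleotides
bases_per_error = 100_000     # about 1 event per 100,000 bases

print(expected_error_events(total_bases, bases_per_error))  # 28500.0
```

So an error rate of this size still corresponds to roughly tens of thousands of error events across the whole finished sequence.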

Rubin, Edward M.; Lucas, Susan; Richardson, Paul; Rokhsar, Daniel; Pennacchio, Len

2004-09-07

163

Identification of Candidate Drosophila Olfactory Receptors from Genomic DNA Sequence  

Microsoft Academic Search

We have taken advantage of the availability of a large amount of Drosophila genomic DNA sequence in the Berkeley Drosophila Genome Project database (~1/5 of the genome) to identify a family of novel seven transmembrane domain encoding genes that are putative Drosophila olfactory receptors. Members of the family are expressed in distinct subsets of olfactory neurons, and certain family members

Qian Gao; Andrew Chess

1999-01-01

164

Complete Genome Sequence of the Mesoplasma florum W37 Strain  

PubMed Central

Mesoplasma florum is a small-genome fast-growing mollicute that is an attractive model for systems and synthetic genomics studies. We report the complete 825,824-bp genome sequence of a second representative of this species, M. florum strain W37, which contains 733 predicted open reading frames and 35 stable RNAs.

Baby, Vincent; Matteau, Dominick; Knight, Thomas F.

2013-01-01

165

Complete Genome Sequences of Helicobacter pylori Clarithromycin-Resistant Strains  

PubMed Central

We report the complete genome sequences of two Helicobacter pylori clarithromycin-resistant strains. Clarithromycin (CLR)-resistant strains were obtained by exposing H. pylori strain 26695 to low clarithromycin concentrations on agar plates. The genome data provide insights into the genomic changes of H. pylori under selection by clarithromycin in vitro.

Binh, Tran Thanh; Suzuki, Rumiko; Shiota, Seiji; Kwon, Dong Hyeon

2013-01-01

166

Complete Genome Sequence of Mycoplasma wenyonii Strain Massachusetts  

PubMed Central

Mycoplasma wenyonii is a hemotrophic mycoplasma that causes acute and chronic infections in cattle. Here, we announce the first complete genome sequence of this organism. The genome is a single circular chromosome with 650,228 bp and G+C% of 33.9. Analyses of the M. wenyonii genome will provide insights into its biology.
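
The G+C% reported above is the percentage of bases in the sequence that are guanine or cytosine. A minimal sketch of that calculation follows; the function name is illustrative, the example strings are toy data rather than M. wenyonii sequence, and ignoring ambiguous bases such as N is an assumption made here, since conventions differ between tools.

```python
def gc_percent(sequence):
    """Return the G+C content of a DNA sequence as a percentage.

    Only unambiguous bases (A, C, G, T) are counted in the denominator;
    other symbols such as N are ignored (an assumption of this sketch).
    """
    seq = sequence.upper()
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        raise ValueError("sequence contains no A, C, G or T bases")
    gc = seq.count("G") + seq.count("C")
    return 100.0 * gc / counted


print(gc_percent("ATGC"))    # 50.0
print(gc_percent("aattgn"))  # 20.0
```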

Guimaraes, Ana M. S.; do Nascimento, Naila C.; SanMiguel, Phillip J.

2012-01-01

167

Genome Sequence of the Rice Pathogen Pseudomonas fuscovaginae CB98818  

PubMed Central

Pseudomonas fuscovaginae is a phytopathogenic bacterium causing bacterial sheath brown rot of cereal crops. Here, we present the draft genome sequence of P. fuscovaginae CB98818, originally isolated from a diseased rice plant in China. The draft genome will aid in epidemiological studies, comparative genomics, and quarantine of this broad-host-range pathogen.

Xie, Guanlin; Cui, Zhouqi; Tao, Zhongyun; Qiu, Hui; Liu, He; Zhu, Bo; Jin, Gulei; Sun, Guochang; Almoneafy, Abdulwareth

2012-01-01

168

Genome Sequence of Aedes aegypti, a Major Arbovirus Vector  

Microsoft Academic Search

We present a draft sequence of the genome of Aedes aegypti, the primary vector for yellow fever and dengue fever, which at ~1376 million base pairs is about 5 times the size of the genome of the malaria vector Anopheles gambiae. Nearly 50% of the Ae. aegypti genome consists of transposable elements. These contribute to a factor of ~4 to

Vishvanath Nene; Jennifer R. Wortman; Daniel Lawson; Brian Haas; Chinnappa Kodira; Z. Tu; Brendan Loftus; Zhiyong Xi; Karyn Megy; Manfred Grabherr; Quinghu Ren; E. M. Zdobnov; N. F. Lobo; K. S. Campbell; S. E. Brown; M. F. Bonaldo; Jingsong Zhu; S. P. Sinkins; D. G. Hogenkamp; Paolo Amedeo; Peter Arensburger; P. W. Atkinson; Shelby Bidwell; Jim Biedler; Ewan Birney; Robert V. Bruggner; Javier Costas; M. R. Coy; Jonathan Crabtree; Matt Crawford; Becky deBruyn; David DeCaprio; Karin Eiglmeier; Eric Eisenstadt; Hamza El-Dorry; W. M. Gelbart; S. L. Gomes; Martin Hammond; Linda I. Hannick; M. H. Holmes; J. R. Hogan; David Jaffe; J. S. Johnston; R. C. Kennedy; Hean Koo; Saul Kravitz; Evgenia V. Kriventseva; David Kulp; Kurt LaButti; Eduardo Lee; Song Li; Diane D. Lovin; Chunhong Mao; Evan Mauceli; C. F. M. Menck; J. R. Miller; Philip Montgomery; Akio Mori; A. L. Nascimento; H. F. Naveira; Chad Nusbaum; S. O'Leary; Joshua Orvis; Mihaela Pertea; Hadi Quesneville; K. R. Reidenbach; Yu-Hui Rogers; C. W. Roth; J. R. Schneider; Michael Schatz; Martin Shumway; Mario Stanke; E. O. Stinson; J. M. C. Tubio; J. P. VanZee; Sergio Verjovski-Almeida; Doreen Werner; Owen White; Stefan Wyder; Qiandong Zeng; Qi Zhao; Yongmei Zhao; C. A. Hill; A. S. Raikhel; M. B. Soares; D. L. Knudson; N. H. Lee; James Galagan; S. L. Salzberg; I. T. Paulsen; George Dimopoulos; F. H. Collins; Bruce Birren; C. M. Fraser-Liggett; D. W. Severson

2007-01-01

169

Complete genome sequence of Mycoplasma wenyonii strain Massachusetts.  

PubMed

Mycoplasma wenyonii is a hemotrophic mycoplasma that causes acute and chronic infections in cattle. Here, we announce the first complete genome sequence of this organism. The genome is a single circular chromosome with 650,228 bp and G+C% of 33.9. Analyses of the M. wenyonii genome will provide insights into its biology. PMID:22965086

dos Santos, Andrea P; Guimaraes, Ana M S; do Nascimento, Naíla C; SanMiguel, Phillip J; Messick, Joanne B

2012-10-01

170

Mapping the human reference genome's missing sequence by three-way admixture in Latino genomes.  

PubMed

A principal obstacle to completing maps and analyses of the human genome involves the genome's "inaccessible" regions: sequences (often euchromatic and containing genes) that are isolated from the rest of the euchromatic genome by heterochromatin and other repeat-rich sequence. We describe a way to localize these sequences by using ancestry linkage disequilibrium in populations that derive ancestry from at least three continents, as is the case for Latinos. We used this approach to map the genomic locations of almost 20 megabases of sequence unlocalized or missing from the current human genome reference (NCBI Genome GRCh37), a substantial fraction of the human genome's remaining unmapped sequence. We show that the genomic locations of most sequences that originated from fosmids and larger clones can be admixture mapped in this way, by using publicly available whole-genome sequence data. Genome assembly efforts and future builds of the human genome reference will be strongly informed by this localization of genes and other euchromatic sequences that are embedded within highly repetitive pericentromeric regions. PMID:23932108

Genovese, Giulio; Handsaker, Robert E; Li, Heng; Kenny, Eimear E; McCarroll, Steven A

2013-09-01

171

Gene discovery in Plasmodium chabaudi by genome survey sequencing  

Microsoft Academic Search

The first genome survey sequencing of the rodent malaria parasite Plasmodium chabaudi is presented here. In 766 sequences, 131 putative gene sequences have been identified by sequence similarity database searches. Further, 7 potential gene families, four of which have not previously been described, were discovered. These genes may be important in understanding the biology of malaria, as well as offering

Christoph S. Janssen; Michael P. Barrett; Daniel Lawson; Michael A. Quail; David Harris; Sharen Bowman; R. Stephen Phillips; C. Michael R. Turner

2001-01-01

172

Accurate whole human genome sequencing using reversible terminator chemistry  

Microsoft Academic Search

DNA sequence information underpins genetic research, enabling discoveries of important biological or medical benefit. Sequencing projects have traditionally used long (400-800 base pair) reads, but the existence of reference sequences for the human and many other genomes makes it possible to develop new, fast approaches to re-sequencing, whereby shorter reads are compared to a reference to identify intraspecies genetic variation.

David R. Bentley; Shankar Balasubramanian; Harold P. Swerdlow; Geoffrey P. Smith; John Milton; Clive G. Brown; Kevin P. Hall; Dirk J. Evers; Colin L. Barnes; Helen R. Bignell; Jonathan M. Boutell; Jason Bryant; Richard J. Carter; R. Keira Cheetham; Anthony J. Cox; Darren J. Ellis; Michael R. Flatbush; Niall A. Gormley; Sean J. Humphray; Leslie J. Irving; Mirian S. Karbelashvili; Scott M. Kirk; Heng Li; Xiaohai Liu; Klaus S. Maisinger; Lisa J. Murray; Bojan Obradovic; Tobias Ost; Michael L. Parkinson; Mark R. Pratt; Isabelle M. J. Rasolonjatovo; Mark T. Reed; Roberto Rigatti; Chiara Rodighiero; Mark T. Ross; Andrea Sabot; Subramanian V. Sankar; Aylwyn Scally; Gary P. Schroth; Mark E. Smith; Vincent P. Smith; Anastassia Spiridou; Peta E. Torrance; Svilen S. Tzonev; Eric H. Vermaas; Klaudia Walter; Xiaolin Wu; Lu Zhang; Mohammed D. Alam; Carole Anastasi; Ify C. Aniebo; David M. D. Bailey; Iain R. Bancarz; Saibal Banerjee; Selena G. Barbour; Primo A. Baybayan; Vincent A. Benoit; Kevin F. Benson; Claire Bevis; Phillip J. Black; Asha Boodhun; Joe S. Brennan; John A. Bridgham; Rob C. Brown; Andrew A. Brown; Dale H. Buermann; Abass A. Bundu; James C. Burrows; Nigel P. Carter; Nestor Castillo; Maria Chiara E. Catenazzi; Simon Chang; R. Neil Cooley; Natasha R. Crake; Olubunmi O. Dada; Konstantinos D. Diakoumakos; Belen Dominguez-Fernandez; David J. Earnshaw; Ugonna C. Egbujor; David W. Elmore; Sergey S. Etchin; Mark R. Ewan; Milan Fedurco; Louise J. Fraser; Karin V. Fuentes Fajardo; W. Scott Furey; David George; Kimberley J. Gietzen; Colin P. Goddard; George S. Golda; Philip A. Granieri; David L. Gustafson; Nancy F. Hansen; Kevin Harnish; Christian D. Haudenschild; Narinder I. Heyer; Matthew M. Hims; Johnny T. Ho; Adrian M. Horgan; Katya Hoschler; Steve Hurwitz; Denis V. Ivanov; Maria Q. Johnson; Terena James; T. A. Huw Jones; Gyoung-Dong Kang; Tzvetana H. Kerelska; Alan D. Kersey; Irina Khrebtukova; Alex P. Kindwall; Zoya Kingsbury; Paula I. 
Kokko-Gonzales; Anil Kumar; Marc A. Laurent; Cynthia T. Lawley; Sarah E. Lee; Xavier Lee; Arnold K. Liao; Jennifer A. Loch; Mitch Lok; Shujun Luo; Radhika M. Mammen; John W. Martin; Patrick G. McCauley; Paul McNitt; Parul Mehta; Keith W. Moon; Joe W. Mullens; Taksina Newington; Zemin Ning; Bee Ling Ng; Sonia M. Novo; Mark A. Osborne; Andrew Osnowski; Omead Ostadan; Lambros L. Paraschos; Lea Pickering; Andrew C. Pike; D. Chris Pinkard; Daniel P. Pliskin; Joe Podhasky; Victor J. Quijano; Come Raczy; Vicki H. Rae; Stephen R. Rawlings; Ana Chiva Rodriguez; Phyllida M. Roe; John Rogers; Maria C. Rogert Bacigalupo; Nikolai Romanov; Anthony Romieu; Rithy K. Roth; Natalie J. Rourke; Silke T. Ruediger; Eli Rusman; Raquel M. Sanches-Kuiper; Martin R. Schenker; Josefina M. Seoane; Richard J. Shaw; Mitch K. Shiver; Steven W. Short; Ning L. Sizto; Johannes P. Sluis; Melanie A. Smith; Jean Ernest Sohna Sohna; Eric J. Spence; Kim Stevens; Neil Sutton; Lukasz Szajkowski; Carolyn L. Tregidgo; Gerardo Turcatti; Stephanie vandeVondele; Yuli Verhovsky; Selene M. Virk; Suzanne Wakelin; Gregory C. Walcott; Jingwen Wang; Graham J. Worsley; Juying Yan; Ling Yau; Mike Zuerlein; Jane Rogers; James C. Mullikin; Matthew E. Hurles; Nick J. McCooke; John S. West; Frank L. Oaks; Peter L. Lundberg; David Klenerman; Richard Durbin; Anthony J. Smith

2008-01-01

173

Genome Sequence of the Nonpathogenic Pseudomonas aeruginosa Strain ATCC 15442.  

PubMed

Pseudomonas aeruginosa ATCC 15442 is an environmental strain of the Pseudomonas genus. Here, we present a 6.77-Mb assembly of its genome sequence. Besides giving insights into characteristics associated with the pathogenicity of P. aeruginosa, such as virulence, drug resistance, and biofilm formation, the genome sequence may provide some information related to biotechnological utilization of the strain. PMID:24786961

Wang, Yujiao; Li, Chao; Gao, Chao; Ma, Cuiqing; Xu, Ping

2014-01-01

174

Initial sequencing and analysis of the human genome  

Microsoft Academic Search

The human genome holds an extraordinary trove of information about human development, physiology, medicine and evolution. Here we report the results of an international collaboration to produce and make freely available a draft sequence of the human genome. We also present an initial analysis of the data, describing some of the insights that can be gleaned from the sequence.

Eric S. Lander; Lauren M. Linton; Bruce Birren; Chad Nusbaum; Michael C. Zody; Jennifer Baldwin; Keri Devon; Ken Dewar; Michael Doyle; William FitzHugh; Roel Funke; Diane Gage; Katrina Harris; Andrew Heaford; John Howland; Lisa Kann; Jessica Lehoczky; Rosie LeVine; Paul McEwan; Kevin McKernan; James Meldrim; Jill P. Mesirov; Cher Miranda; William Morris; Jerome Naylor; Christina Raymond; Mark Rosetti; Ralph Santos; Andrew Sheridan; Carrie Sougnez; Nicole Stange-Thomann; Nikola Stojanovic; Aravind Subramanian; Dudley Wyman; Jane Rogers; John Sulston; Rachael Ainscough; Stephan Beck; David Bentley; John Burton; Christopher Clee; Nigel Carter; Alan Coulson; Rebecca Deadman; Panos Deloukas; Andrew Dunham; Ian Dunham; Richard Durbin; Lisa French; Darren Grafham; Simon Gregory; Tim Hubbard; Sean Humphray; Adrienne Hunt; Matthew Jones; Christine Lloyd; Amanda McMurray; Lucy Matthews; Simon Mercer; Sarah Milne; James C. Mullikin; Andrew Mungall; Robert Plumb; Mark Ross; Ratna Shownkeen; Sarah Sims; Robert H. Waterston; Richard K. Wilson; LaDeana W. Hillier; John D. McPherson; Marco A. Marra; Elaine R. Mardis; Lucinda A. Fulton; Asif T. Chinwalla; Kymberlie H. Pepin; Warren R. Gish; Stephanie L. Chissoe; Michael C. Wendl; Kim D. Delehaunty; Tracie L. Miner; Andrew Delehaunty; Jason B. Kramer; Lisa L. Cook; Robert S. Fulton; Douglas L. Johnson; Patrick J. Minx; Sandra W. Clifton; Trevor Hawkins; Elbert Branscomb; Paul Predki; Paul Richardson; Sarah Wenning; Tom Slezak; Norman Doggett; Jan-Fang Cheng; Anne Olsen; Susan Lucas; Christopher Elkin; Edward Uberbacher; Marvin Frazier; Richard A. Gibbs; Donna M. Muzny; Steven E. Scherer; John B. Bouck; Erica J. Sodergren; Kim C. Worley; Catherine M. Rives; James H. Gorrell; Michael L. Metzker; Susan L. Naylor; Raju S. Kucherlapati; David L. Nelson; George M. 
Weinstock; Yoshiyuki Sakaki; Asao Fujiyama; Masahira Hattori; Tetsushi Yada; Atsushi Toyoda; Takehiko Itoh; Chiharu Kawagoe; Hidemi Watanabe; Yasushi Totoki; Todd Taylor; Jean Weissenbach; Roland Heilig; William Saurin; Francois Artiguenave; Philippe Brottier; Thomas Bruls; Eric Pelletier; Catherine Robert; Patrick Wincker; Douglas R. Smith; Lynn Doucette-Stamm; Marc Rubenfield; Keith Weinstock; Hong Mei Lee; JoAnn Dubois; André Rosenthal; Matthias Platzer; Gerald Nyakatura; Stefan Taudien; Andreas Rump; Huanming Yang; Jun Yu; Jian Wang; Guyang Huang; Jun Gu; Leroy Hood; Lee Rowen; Anup Madan; Shizen Qin; Ronald W. Davis; Nancy A. Federspiel; A. Pia Abola; Michael J. Proctor; Richard M. Myers; Jeremy Schmutz; Mark Dickson; Jane Grimwood; David R. Cox; Maynard V. Olson; Rajinder Kaul; Christopher Raymond; Nobuyoshi Shimizu; Kazuhiko Kawasaki; Shinsei Minoshima; Glen A. Evans; Maria Athanasiou; Roger Schultz; Bruce A. Roe; Feng Chen; Huaqin Pan; Juliane Ramser; Hans Lehrach; Richard Reinhardt; W. Richard McCombie; Melissa de la Bastide; Neilay Dedhia; Helmut Blöcker; Klaus Hornischer; Gabriele Nordsiek; Richa Agarwala; L. Aravind; Jeffrey A. Bailey; Serafim Batzoglou; Ewan Birney; Peer Bork; Daniel G. Brown; Christopher B. Burge; Lorenzo Cerutti; Hsiu-Chuan Chen; Deanna Church; Michele Clamp; Richard R. Copley; Tobias Doerks; Sean R. Eddy; Evan E. Eichler; Terrence S. Furey; James Galagan; James G. R. Gilbert; Cyrus Harmon; Yoshihide Hayashizaki; David Haussler; Henning Hermjakob; Karsten Hokamp; Wonhee Jang; L. Steven Johnson; Thomas A. Jones; Simon Kasif; Arek Kaspryzk; Scot Kennedy; W. James Kent; Paul Kitts; Eugene V. Koonin; Ian Korf; David Kulp; Doron Lancet; Todd M. Lowe; Aoife McLysaght; Tarjei Mikkelsen; John V. Moran; Nicola Mulder; Victor J. Pollara; Chris P. Ponting; Greg Schuler; Jörg Schultz; Guy Slater; Arian F. A. 
Smit; Elia Stupka; Joseph Szustakowki; Danielle Thierry-Mieg; Jean Thierry-Mieg; Lukas Wagner; John Wallis; Raymond Wheeler; Alan Williams; Yuri I. Wolf; Kenneth H. Wolfe; Shiaw-Pyng Yang; Ru-Fang Yeh; Francis Collins; Mark S. Guyer; Jane Peterson; Adam Felsenfeld; Kris A. Wetterstrand; Aristides Patrinos; Michael J. Morgan

2001-01-01

175

Draft Genome Sequence of the Fish Pathogen Piscirickettsia salmonis.  

PubMed

Piscirickettsia salmonis is a Gram-negative intracellular fish pathogen that has a significant impact on the salmon industry. Here, we report the genome sequence of P. salmonis strain LF-89. This is the first draft genome sequence of P. salmonis, and it reveals interesting attributes, including flagellar genes, despite this bacterium being considered nonmotile. PMID:24201203

Eppinger, Mark; McNair, Katelyn; Zogaj, Xhavit; Dinsdale, Elizabeth A; Edwards, Robert A; Klose, Karl E

2013-01-01

176

Genome Sequence of the Nonpathogenic Pseudomonas aeruginosa Strain ATCC 15442  

PubMed Central

Pseudomonas aeruginosa ATCC 15442 is an environmental strain of the Pseudomonas genus. Here, we present a 6.77-Mb assembly of its genome sequence. Besides giving insights into characteristics associated with the pathogenicity of P. aeruginosa, such as virulence, drug resistance, and biofilm formation, the genome sequence may provide some information related to biotechnological utilization of the strain.

Wang, Yujiao; Li, Chao; Ma, Cuiqing; Xu, Ping

2014-01-01

177

Complete Genome Sequence of Aeromonas veronii Strain B565  

PubMed Central

Aeromonas veronii strain B565 was isolated from aquaculture pond sediment in China. We present here the complete genome sequence of B565 and compare it with 2 published genome sequences of pathogenic strains in the Aeromonas genus. The result represents an independent stepwise acquisition of virulence factors of pathogenic strains in this genus.

Li, Yanxia; Liu, Yuchun; Zhou, Zhemin; Huang, Huoqing; Ren, Yan; Zhang, Yuting; Li, Guannan; Zhou, Zhigang; Wang, Lei

2011-01-01

178

Initial sequencing and analysis of the human genome.  

PubMed

The human genome holds an extraordinary trove of information about human development, physiology, medicine and evolution. Here we report the results of an international collaboration to produce and make freely available a draft sequence of the human genome. We also present an initial analysis of the data, describing some of the insights that can be gleaned from the sequence. PMID:11237011

Lander, E S; Linton, L M; Birren, B; Nusbaum, C; Zody, M C; Baldwin, J; Devon, K; Dewar, K; Doyle, M; FitzHugh, W; Funke, R; Gage, D; Harris, K; Heaford, A; Howland, J; Kann, L; Lehoczky, J; LeVine, R; McEwan, P; McKernan, K; Meldrim, J; Mesirov, J P; Miranda, C; Morris, W; Naylor, J; Raymond, C; Rosetti, M; Santos, R; Sheridan, A; Sougnez, C; Stange-Thomann, N; Stojanovic, N; Subramanian, A; Wyman, D; Rogers, J; Sulston, J; Ainscough, R; Beck, S; Bentley, D; Burton, J; Clee, C; Carter, N; Coulson, A; Deadman, R; Deloukas, P; Dunham, A; Dunham, I; Durbin, R; French, L; Grafham, D; Gregory, S; Hubbard, T; Humphray, S; Hunt, A; Jones, M; Lloyd, C; McMurray, A; Matthews, L; Mercer, S; Milne, S; Mullikin, J C; Mungall, A; Plumb, R; Ross, M; Shownkeen, R; Sims, S; Waterston, R H; Wilson, R K; Hillier, L W; McPherson, J D; Marra, M A; Mardis, E R; Fulton, L A; Chinwalla, A T; Pepin, K H; Gish, W R; Chissoe, S L; Wendl, M C; Delehaunty, K D; Miner, T L; Delehaunty, A; Kramer, J B; Cook, L L; Fulton, R S; Johnson, D L; Minx, P J; Clifton, S W; Hawkins, T; Branscomb, E; Predki, P; Richardson, P; Wenning, S; Slezak, T; Doggett, N; Cheng, J F; Olsen, A; Lucas, S; Elkin, C; Uberbacher, E; Frazier, M; Gibbs, R A; Muzny, D M; Scherer, S E; Bouck, J B; Sodergren, E J; Worley, K C; Rives, C M; Gorrell, J H; Metzker, M L; Naylor, S L; Kucherlapati, R S; Nelson, D L; Weinstock, G M; Sakaki, Y; Fujiyama, A; Hattori, M; Yada, T; Toyoda, A; Itoh, T; Kawagoe, C; Watanabe, H; Totoki, Y; Taylor, T; Weissenbach, J; Heilig, R; Saurin, W; Artiguenave, F; Brottier, P; Bruls, T; Pelletier, E; Robert, C; Wincker, P; Smith, D R; Doucette-Stamm, L; Rubenfield, M; Weinstock, K; Lee, H M; Dubois, J; Rosenthal, A; Platzer, M; Nyakatura, G; Taudien, S; Rump, A; Yang, H; Yu, J; Wang, J; Huang, G; Gu, J; Hood, L; Rowen, L; Madan, A; Qin, S; Davis, R W; Federspiel, N A; Abola, A P; Proctor, M J; Myers, R M; Schmutz, J; Dickson, M; Grimwood, J; Cox, D R; Olson, M V; Kaul, R; Raymond, C; Shimizu, N; 
Kawasaki, K; Minoshima, S; Evans, G A; Athanasiou, M; Schultz, R; Roe, B A; Chen, F; Pan, H; Ramser, J; Lehrach, H; Reinhardt, R; McCombie, W R; de la Bastide, M; Dedhia, N; Blöcker, H; Hornischer, K; Nordsiek, G; Agarwala, R; Aravind, L; Bailey, J A; Bateman, A; Batzoglou, S; Birney, E; Bork, P; Brown, D G; Burge, C B; Cerutti, L; Chen, H C; Church, D; Clamp, M; Copley, R R; Doerks, T; Eddy, S R; Eichler, E E; Furey, T S; Galagan, J; Gilbert, J G; Harmon, C; Hayashizaki, Y; Haussler, D; Hermjakob, H; Hokamp, K; Jang, W; Johnson, L S; Jones, T A; Kasif, S; Kaspryzk, A; Kennedy, S; Kent, W J; Kitts, P; Koonin, E V; Korf, I; Kulp, D; Lancet, D; Lowe, T M; McLysaght, A; Mikkelsen, T; Moran, J V; Mulder, N; Pollara, V J; Ponting, C P; Schuler, G; Schultz, J; Slater, G; Smit, A F; Stupka, E; Szustakowski, J; Thierry-Mieg, D; Thierry-Mieg, J; Wagner, L; Wallis, J; Wheeler, R; Williams, A; Wolf, Y I; Wolfe, K H; Yang, S P; Yeh, R F; Collins, F; Guyer, M S; Peterson, J; Felsenfeld, A; Wetterstrand, K A; Patrinos, A; Morgan, M J; de Jong, P; Catanese, J J; Osoegawa, K; Shizuya, H; Choi, S; Chen, Y J; Szustakowki, J

2001-02-15

179

Full genome sequence of a bovine enterovirus isolated in China.  

PubMed

We report the full genome sequence of an isolate of bovine enterovirus type B from China. The virus (BEV-BJ001) was isolated from Beijing, China, from fecal swabs of cattle suffering from severe diarrhea. This genome sequence will give useful insight for future molecular epidemiological studies in China. PMID:24970832

Peng, Xiao-Wei; Dong, Hao; Wu, Qing-Min; Lu, Yan-Li

2014-01-01

180

Draft Genome Sequence of Enterococcus mundtii CRL1656  

PubMed Central

We report the draft genome sequence of Enterococcus mundtii CRL1656, which was isolated from the stripping milk of a clinically healthy adult Holstein dairy cow from a dairy farm of the northwestern region of Tucumán (Argentina). The 3.10-Mb genome sequence consists of 450 large contigs and contains 2,741 predicted protein-coding genes.

Magni, Christian; Espeche, Carolina; Repizo, Guillermo D.; Saavedra, Lucila; Suarez, Cristian A.; Blancato, Victor S.; Espariz, Martin; Esteban, Luis; Raya, Raul R.; Font de Valdez, Graciela; Vignolo, Graciela; Mozzi, Fernanda; Taranto, Maria Pia; Hebert, Elvira M.; Nader-Macias, Maria Elena

2012-01-01

181

Draft genome sequence of Enterococcus mundtii CRL1656.  

PubMed

We report the draft genome sequence of Enterococcus mundtii CRL1656, which was isolated from the stripping milk of a clinically healthy adult Holstein dairy cow from a dairy farm of the northwestern region of Tucumán (Argentina). The 3.10-Mb genome sequence consists of 450 large contigs and contains 2,741 predicted protein-coding genes. PMID:22207752

Magni, Christian; Espeche, Carolina; Repizo, Guillermo D; Saavedra, Lucila; Suárez, Cristian A; Blancato, Víctor S; Espariz, Martín; Esteban, Luis; Raya, Raúl R; Font de Valdez, Graciela; Vignolo, Graciela; Mozzi, Fernanda; Taranto, María Pía; Hebert, Elvira M; Nader-Macías, María Elena; Sesma, Fernando

2012-01-01

182

Full Genome Sequence of Giant Panda Rotavirus Strain CH-1  

PubMed Central

We report here the complete genomic sequence of the giant panda rotavirus strain CH-1. This work is the first to document the complete genomic sequence (segments 1 to 11) of the CH-1 strain, which offers an effective platform for providing authentic research experiences to novice scientists.

Guo, Ling; Yang, Shaolin; Wang, Chengdong; Chen, Shijie; Yang, Xiaonong; Hou, Rong; Quan, Zifang; Hao, Zhongxiang

2013-01-01

183

Full Genome Sequence of a Bovine Enterovirus Isolated in China  

PubMed Central

We report the full genome sequence of an isolate of bovine enterovirus type B from China. The virus (BEV-BJ001) was isolated from Beijing, China, from fecal swabs of cattle suffering from severe diarrhea. This genome sequence will give useful insight for future molecular epidemiological studies in China.

Peng, Xiao-wei; Dong, Hao; Wu, Qing-min

2014-01-01

184

Genome Sequence of the Pathogenic Bacterium Vibrio vulnificus Biotype 3  

PubMed Central

We report the first genome sequence of the pathogenic Vibrio vulnificus biotype 3. This draft genome sequence of the environmental strain VVyb1(BT3), isolated in Israel, provides a representation of this newly emerged clonal group, which reveals higher similarity to the clinical strains of biotype 1 than to the environmental ones.

Danin-Poleg, Yael; Elgavish, Sharona; Raz, Nili; Efimov, Vera

2013-01-01

185

Assessing the Drosophila melanogaster and Anopheles gambiae Genome Annotations Using Genome-Wide Sequence Comparisons  

PubMed Central

We performed genome-wide sequence comparisons at the protein coding level between the genome sequences of Drosophila melanogaster and Anopheles gambiae. Such comparisons detect evolutionarily conserved regions (ecores) that can be used for a qualitative and quantitative evaluation of the available annotations of both genomes. They also provide novel candidate features for annotation. The percentage of ecores mapping outside annotations in the A. gambiae genome is about fourfold higher than in D. melanogaster. The A. gambiae genome assembly also contains a high proportion of duplicated ecores, possibly resulting from artefactual sequence duplications in the genome assembly. The occurrence of 4063 ecores in the D. melanogaster genome outside annotations suggests that some genes are not yet or only partially annotated. The present work illustrates the power of comparative genomics approaches towards an exhaustive and accurate establishment of gene models and gene catalogues in insect genomes.

Jaillon, Olivier; Dossat, Carole; Eckenberg, Ralph; Eiglmeier, Karin; Segurens, Beatrice; Aury, Jean-Marc; Roth, Charles W.; Scarpelli, Claude; Brey, Paul T.; Weissenbach, Jean; Wincker, Patrick

2003-01-01

186

EcoGene: a genome sequence database for Escherichia coli K-12  

Microsoft Academic Search

The EcoGene database provides a set of gene and protein sequences derived from the genome sequence of Escherichia coli K-12. EcoGene is a source of re-annotated sequences for the SWISS-PROT and Colibri databases. EcoGene is used for genetic and physical map compilations in collaboration with the Coli Genetic Stock Center. The EcoGene12 release includes 4293 genes. EcoGene12 differs from the

Kenneth E. Rudd

2000-01-01

187

Low-pass sequencing for microbial comparative genomics  

Microsoft Academic Search

BACKGROUND: We studied four extremely halophilic archaea by low-pass shotgun sequencing: (1) the metabolically versatile Haloarcula marismortui; (2) the non-pigmented Natrialba asiatica; (3) the psychrophile Halorubrum lacusprofundi and (4) the Dead Sea isolate Halobaculum gomorrense. Approximately one thousand single pass genomic sequences per genome were obtained. The data were analyzed by comparative genomic analyses using the completed Halobacterium sp. NRC-1

Young Ah Goo; Jared Roach; Gustavo Glusman; Nitin S Baliga; Kerry Deutsch; Min Pan; Sean Kennedy; Shiladitya DasSarma; Wailap Victor Ng; Leroy Hood

2004-01-01

188

Genomic sequencing and analysis of Clostera anachoreta granulovirus  

Microsoft Academic Search

The complete genome of Clostera anachoreta granulovirus (ClanGV) from an important pest of poplar, Clostera anachoreta (Lepidoptera: Notodontidae), was sequenced and analyzed. The circular double-stranded genome is 101,487 bp in size with a C+G content of 44.4%. It is predicted to contain 123 open reading frames (ORFs), covering 93% of the whole genome sequence. One hundred eleven ClanGV ORFs are homologues

Zhenpu Liang; Xiaoxia Zhang; Xinming Yin; Sumei Cao; Feng Xu

2011-01-01

189

A physical map of the papaya genome with integrated genetic map and genome sequence  

PubMed Central

Background Papaya is a major fruit crop in tropical and subtropical regions worldwide and has primitive sex chromosomes controlling sex determination in this trioecious species. The papaya genome was recently sequenced because of its agricultural importance, unique biological features, and successful application of transgenic papaya for resistance to papaya ringspot virus. As a part of the genome sequencing project, we constructed a BAC-based physical map using a high information-content fingerprinting approach to assist whole genome shotgun sequence assembly. Results The physical map consists of 963 contigs, representing 9.4× genome equivalents, and was integrated with the genetic map and genome sequence using BAC end sequences and a sequence-tagged high-density genetic map. The estimated genome coverage of the physical map is about 95.8%, while 72.4% of the genome was aligned to the genetic map. A total of 1,181 high quality overgo (overlapping oligonucleotide) probes representing conserved sequences in Arabidopsis and genetically mapped loci in Brassica were anchored on the physical map, which provides a foundation for comparative genomics in the Brassicales. The integrated genetic and physical map aligned with the genome sequence revealed recombination hotspots as well as regions suppressed for recombination across the genome, particularly on the recently evolved sex chromosomes. Suppression of recombination spread to the adjacent region of the male specific region of the Y chromosome (MSY), and recombination rates were recovered gradually and then exceeded the genome average. Recombination hotspots were observed at about 10 Mb away on both sides of the MSY, showing 7-fold increase compared with the genome wide average, demonstrating the dynamics of recombination of the sex chromosomes. Conclusion A BAC-based physical map of papaya was constructed and integrated with the genetic map and genome sequence. 
The integrated map facilitated the draft genome assembly, and is a valuable resource for comparative genomics and map-based cloning of agronomically and economically important genes and for sex chromosome research.

2009-01-01

190

Using BLAT to Find Sequence Similarity in Closely Related Genomes  

PubMed Central

The BLAST-Like Alignment Tool (BLAT) is used to find genomic sequences that match a protein or DNA sequence submitted by the user. BLAT is typically used for searching similar sequences within the same or closely related species. It was developed to align millions of expressed sequence tags and mouse whole-genome random reads to the human genome at a faster speed (Kent, 2002). It is freely available either on the web or as a downloadable stand-alone program. BLAT search results provide a link for visualization in the University of California, Santa Cruz (UCSC) genome browser where associated biological information may be obtained. Three example protocols are given: using an mRNA sequence to identify the exon-intron locations and associated gene in the genomic sequence of the same species, using a protein sequence to identify the coding regions in a genomic sequence and to search for gene family members in the same species, and using a protein sequence to find homologs in another species. A support protocol is given to visualize multiple nearby matches obtained in a search in one view of the UCSC Genome Browser. Discussion of the technical aspects of BLAT is also provided.

Bhagwat, Medha; Young, Lynn; Robison, Rex R.

2014-01-01

191

Progress in Understanding and Sequencing the Genome of Brassica rapa  

PubMed Central

Brassica rapa, which is closely related to Arabidopsis thaliana, is an important crop and a model plant for studying genome evolution via polyploidization. We report the current understanding of the genome structure of B. rapa and efforts for the whole-genome sequencing of the species. The tribe Brassicaceae, which comprises ca. 240 species, descended from a common hexaploid ancestor with a basic genome similar to that of Arabidopsis. Chromosome rearrangements, including fusions and/or fissions, resulted in the present-day “diploid” Brassica species with variation in chromosome number and phenotype. Triplicated genomic segments of B. rapa are collinear to those of A. thaliana with InDels. The genome triplication has led to an approximately 1.7-fold increase in the B. rapa gene number compared to that of A. thaliana. Repetitive DNA of B. rapa has also been extensively amplified and has diverged from that of A. thaliana. For its whole-genome sequencing, the Brassica rapa Genome Sequencing Project (BrGSP) consortium has developed suitable genomic resources and constructed genetic and physical maps. Ten chromosomes of B. rapa are being allocated to BrGSP consortium participants, and each chromosome will be sequenced by a BAC-by-BAC approach. Genome sequencing of B. rapa will offer a new perspective for plant biology and evolution in the context of polyploidization.

Hong, Chang Pyo; Kwon, Soo-Jin; Kim, Jung Sun; Yang, Tae-Jin; Park, Beom-Seok; Lim, Yong Pyo

2008-01-01

192

Complete Genome Sequence of Probiotic Strain Lactobacillus acidophilus La-14.  

PubMed

We present the 1,991,830-bp complete genome sequence of Lactobacillus acidophilus strain La-14 (SD-5212). Comparative genomic analysis revealed 99.98% similarity overall to the L. acidophilus NCFM genome. Globally, 111 single nucleotide polymorphisms (SNPs) (95 SNPs, 16 indels) were observed throughout the genome. Also, a 416-bp deletion in the LA14_1146 sugar ABC transporter was identified. PMID:23788546

Stahl, Buffy; Barrangou, Rodolphe

2013-01-01

193

Complete Genome Sequence of Probiotic Strain Lactobacillus acidophilus La-14  

PubMed Central

We present the 1,991,830-bp complete genome sequence of Lactobacillus acidophilus strain La-14 (SD-5212). Comparative genomic analysis revealed 99.98% similarity overall to the L. acidophilus NCFM genome. Globally, 111 single nucleotide polymorphisms (SNPs) (95 SNPs, 16 indels) were observed throughout the genome. Also, a 416-bp deletion in the LA14_1146 sugar ABC transporter was identified.

Stahl, Buffy

2013-01-01

194

Genome Sequence of the Lager Brewing Yeast, an Interspecies Hybrid  

Microsoft Academic Search

This work presents the genome sequencing of the lager brewing yeast (Saccharomyces pastorianus) Weihenstephan 34/70, a strain widely used in lager beer brewing. The 25 Mb genome comprises two nuclear sub-genomes originating from Saccharomyces cerevisiae and Saccharomyces bayanus and one circular mitochondrial genome originating from S. bayanus. Thirty-six different types of chromosomes were found including eight chromosomes with translocations

Yoshihiro Nakao; Takeshi Kanamori; Takehiko Itoh; Y. Kodama; S. Rainieri; N. Nakamura; T. Shimonaga; M. Hattori; T. Ashikari

2009-01-01

195

Complete Chloroplast Genome Sequence of Glycine max and Comparative Analyses with other Legume Genomes  

Microsoft Academic Search

Lack of complete chloroplast genome sequences is still one of the major limitations to extending chloroplast genetic engineering technology to useful crops. Therefore, we sequenced the soybean chloroplast genome and compared it to the other completely sequenced legumes, Lotus and Medicago. The chloroplast genome of Glycine is 152,218 basepairs (bp) in length, including a pair of inverted repeats of 25,574 bp

Christopher Saski; Seung-Bum Lee; Henry Daniell; Todd C. Wood; Jeffrey Tomkins; Hyi-Gyung Kim; Robert K. Jansen

2005-01-01

196

Standards for Sequencing Viral Genomes in the Era of High-Throughput Sequencing  

PubMed Central

ABSTRACT Thanks to high-throughput sequencing technologies, genome sequencing has become a common component in nearly all aspects of viral research; thus, we are experiencing an explosion in both the number of available genome sequences and the number of institutions producing such data. However, there are currently no common standards used to convey the quality, and therefore utility, of these various genome sequences. Here, we propose five “standard” categories that encompass all stages of viral genome finishing, and we define them using simple criteria that are agnostic to the technology used for sequencing. We also provide genome finishing recommendations for various downstream applications, keeping in mind the cost-benefit trade-offs associated with different levels of finishing. Our goal is to define a common vocabulary that will allow comparison of genome quality across different research groups, sequencing platforms, and assembly techniques.

Beitzel, Brett; Chain, Patrick S. G.; Davenport, Matthew G.; Donaldson, Eric; Frieman, Matthew; Kugelman, Jeffrey; Kuhn, Jens H.; O'Rear, Jules; Sabeti, Pardis C.; Wentworth, David E.; Wiley, Michael R.; Yu, Guo-Yun; Sozhamannan, Shanmuga; Bradburne, Christopher

2014-01-01

197

Standards for sequencing viral genomes in the era of high-throughput sequencing.  

PubMed

Thanks to high-throughput sequencing technologies, genome sequencing has become a common component in nearly all aspects of viral research; thus, we are experiencing an explosion in both the number of available genome sequences and the number of institutions producing such data. However, there are currently no common standards used to convey the quality, and therefore utility, of these various genome sequences. Here, we propose five "standard" categories that encompass all stages of viral genome finishing, and we define them using simple criteria that are agnostic to the technology used for sequencing. We also provide genome finishing recommendations for various downstream applications, keeping in mind the cost-benefit trade-offs associated with different levels of finishing. Our goal is to define a common vocabulary that will allow comparison of genome quality across different research groups, sequencing platforms, and assembly techniques. PMID:24939889

Ladner, Jason T; Beitzel, Brett; Chain, Patrick S G; Davenport, Matthew G; Donaldson, Eric F; Frieman, Matthew; Kugelman, Jeffrey R; Kuhn, Jens H; O'Rear, Jules; Sabeti, Pardis C; Wentworth, David E; Wiley, Michael R; Yu, Guo-Yun; Sozhamannan, Shanmuga; Bradburne, Christopher; Palacios, Gustavo

2014-01-01

198

Community-wide analysis of microbial genome sequence signatures  

PubMed Central

Background Analyses of DNA sequences from cultivated microorganisms have revealed genome-wide, taxa-specific nucleotide compositional characteristics, referred to as genome signatures. These signatures have far-reaching implications for understanding genome evolution and potential application in classification of metagenomic sequence fragments. However, little is known regarding the distribution of genome signatures in natural microbial communities or the extent to which environmental factors shape them. Results We analyzed metagenomic sequence data from two acidophilic biofilm communities, including composite genomes reconstructed for nine archaea, three bacteria, and numerous associated viruses, as well as thousands of unassigned fragments from strain variants and low-abundance organisms. Genome signatures, in the form of tetranucleotide frequencies analyzed by emergent self-organizing maps, segregated sequences from all known populations sharing < 50 to 60% average amino acid identity and revealed previously unknown genomic clusters corresponding to low-abundance organisms and a putative plasmid. Signatures were pervasive genome-wide. Clusters were resolved because intra-genome differences resulting from translational selection or protein adaptation to the intracellular (pH ~5) versus extracellular (pH ~1) environment were small relative to inter-genome differences. We found that these genome signatures stem from multiple influences but are primarily manifested through codon composition, which we propose is the result of genome-specific mutational biases. Conclusions An important conclusion is that shared environmental pressures and interactions among coevolving organisms do not obscure genome signatures in acid mine drainage communities. Thus, genome signatures can be used to assign sequence fragments to populations, an essential prerequisite if metagenomics is to provide ecological and biochemical insights into the functioning of microbial communities.

Dick, Gregory J; Andersson, Anders F; Baker, Brett J; Simmons, Sheri L; Thomas, Brian C; Yelton, A Pepper; Banfield, Jillian F

2009-01-01
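
The genome signatures in the record above are described as tetranucleotide frequencies. A minimal sketch of that first step is given below, assuming plain DNA strings as input. The function name `tetranucleotide_frequencies` is hypothetical, and the sketch does not merge reverse complements or perform the emergent self-organizing map clustering that the abstract mentions.

```python
from collections import Counter
from itertools import product


def tetranucleotide_frequencies(sequence: str) -> dict[str, float]:
    """Return the relative frequency of each of the 256 possible 4-mers.

    Windows containing characters other than A, C, G, T (for example N)
    are skipped. This is an illustrative simplification, not the
    procedure used by the authors.
    """
    seq = sequence.upper()
    alphabet = set("ACGT")
    counts = Counter()
    # Slide a window of width 4 over the sequence, one base at a time.
    for i in range(len(seq) - 3):
        kmer = seq[i:i + 4]
        if set(kmer) <= alphabet:
            counts[kmer] += 1
    total = sum(counts.values())
    all_kmers = ["".join(p) for p in product("ACGT", repeat=4)]
    return {k: (counts[k] / total if total else 0.0) for k in all_kmers}


freqs = tetranucleotide_frequencies("ACGTACGT")
print(len(freqs), freqs["ACGT"])
```

Each sequence fragment then becomes a 256-dimensional vector, which is the kind of input a clustering method can compare across fragments.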

199

Complete genomic sequence of bluetongue virus serotype 16 from China.  

PubMed

We report here the complete genomic sequence of the Chinese bluetongue virus serotype 16 (BTV16) strain BN96/16. This work is the first to document the complete genomic sequence (segments 1 to 10) of a BTV16 strain. The sequence information provided herein will help determine the geographic origin of BTV16 and define the phylogenetic relationship of BTV16 to other BTV strains. PMID:22106384

Yang, Tao; Liu, Nihong; Xu, Qingyuan; Sun, Encheng; Qin, Yongli; Zhao, Jin; Wu, Donglai

2011-12-01

200

Sequencing genomes from single cells by polymerase cloning  

Microsoft Academic Search

Genome sequencing currently requires DNA from pools of numerous nearly identical cells (clones), leaving the genome sequences of many difficult-to-culture microorganisms unattainable. We report a sequencing strategy that eliminates culturing of microorganisms by using real-time isothermal amplification to form polymerase clones (plones) from the DNA of single cells. Two Escherichia coli plones, analyzed by Affymetrix chip hybridization, demonstrate that plonal

Adam C Martiny; Nikos B Reppas; Kerrie W Barry; Joel Malek; Sallie W Chisholm; Kun Zhang; George M Church

2006-01-01

201

Genome Sequence of the Trichosporon asahii Environmental Strain CBS 8904  

PubMed Central

This is the first report of the genome sequence of Trichosporon asahii environmental strain CBS 8904, which was isolated from maize cobs. Comparison of the genome sequence with that of clinical strain CBS 2479 revealed that they have >99% chromosomal and mitochondrial sequence identity, yet CBS 8904 has 368 specific genes. Analysis of clusters of orthologous groups predicted that 3,307 genes belong to 23 functional categories and 703 genes were predicted to have a general function.

Li, Hai Tao; Zhu, He; Zhou, Guang Peng; Wang, Meng; Wang, Lei

2012-01-01

202

Complete Genome Sequence of Mycoplasma haemofelis, a Hemotropic Mycoplasma  

PubMed Central

Here, we present the genome sequence of Mycoplasma haemofelis strain Langford 1, representing the first hemotropic mycoplasma (hemoplasma) species to be completely sequenced and annotated. Originally isolated from a cat with hemolytic anemia, this strain induces severe hemolytic anemia when inoculated into specific-pathogen-free-derived cats. The genome sequence has provided insights into the biology of this uncultivatable hemoplasma and has identified potential molecular mechanisms underlying its pathogenicity.

Barker, Emily N.; Helps, Chris R.; Peters, Iain R.; Darby, Alistair C.; Radford, Alan D.; Tasker, Severine

2011-01-01

203

Development of genomic resources in support of sequencing, assembly, and annotation of the catfish genome.  

PubMed

Major progress has been made in catfish genomics including construction of high-density genetic linkage maps, BAC-based physical maps, and integration of genetic linkage and physical maps. Large numbers of ESTs have been generated from both channel catfish and blue catfish. Microarray platforms have been developed for the analysis of genome expression. Genome repeat structures are studied, laying grounds for whole genome sequencing. USDA recently approved funding of the whole genome sequencing project of catfish using the next generation sequencing technologies. Generation of the whole genome sequence is a historical landmark of catfish research as it opens the real first step of the long march toward genetic enhancement. The research community needs to be focused on aquaculture performance and production traits, take advantage of the unprecedented genome information and technology, and make real progress toward genetic improvements of aquaculture brood stocks. PMID:20430707

Liu, Zhanjiang

2011-03-01

204

Data structures and compression algorithms for genomic sequence data  

PubMed Central

Motivation: The continuing exponential accumulation of full genome data, including full diploid human genomes, creates new challenges not only for understanding genomic structure, function and evolution, but also for the storage, navigation and privacy of genomic data. Here, we develop data structures and algorithms for the efficient storage of genomic and other sequence data that may also facilitate querying and protecting the data. Results: The general idea is to encode only the differences between a genome sequence and a reference sequence, using absolute or relative coordinates for the location of the differences. These locations and the corresponding differential variants can be encoded into binary strings using various entropy coding methods, from fixed codes such as Golomb and Elias codes, to variable codes, such as Huffman codes. We demonstrate the approach and various tradeoffs using highly variable human mitochondrial genome sequences as a testbed. With only a partial level of optimization, 3615 genome sequences occupying 56 MB in GenBank are compressed down to only 167 KB, achieving a 345-fold compression rate, using the revised Cambridge Reference Sequence as the reference sequence. Using the consensus sequence as the reference sequence, the data can be stored using only 133 KB, corresponding to a 433-fold level of compression, roughly a 23% improvement. Extensions to nuclear genomes and high-throughput sequencing data are discussed. Availability: Data are publicly available from GenBank, the HapMap web site, and the MITOMAP database. Supplementary materials with additional results, statistics, and software implementations are available from http://mammag.web.uci.edu/bin/view/Mitowiki/ProjectDNACompression. Contact: pfbaldi@ics.uci.edu

Brandon, Marty C.; Wallace, Douglas C.; Baldi, Pierre

2009-01-01
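
The general idea in the record above (store only the differences from a reference, with relative coordinates written in an Elias code) can be illustrated with a small sketch. It is not the authors' implementation. It assumes substitutions only (no insertions or deletions), uses the Elias gamma code for the gaps between variant positions, and spends a fixed 2 bits per base. All function names are hypothetical.

```python
BASE_BITS = {"A": "00", "C": "01", "G": "10", "T": "11"}
BITS_BASE = {bits: base for base, bits in BASE_BITS.items()}


def elias_gamma(n: int) -> str:
    """Elias gamma code of a positive integer, as a string of '0'/'1'."""
    if n < 1:
        raise ValueError("Elias gamma is defined only for n >= 1")
    binary = bin(n)[2:]  # binary digits without the '0b' prefix
    return "0" * (len(binary) - 1) + binary


def encode(reference: str, sample: str) -> str:
    """Encode the substitutions in `sample` relative to `reference`."""
    if len(reference) != len(sample):
        raise ValueError("this sketch handles equal-length sequences only")
    bits = []
    previous = -1
    for pos, (ref_base, base) in enumerate(zip(reference, sample)):
        if ref_base != base:
            # Relative coordinate: distance from the previous variant (>= 1).
            bits.append(elias_gamma(pos - previous) + BASE_BITS[base])
            previous = pos
    return "".join(bits)


def decode(reference: str, bits: str) -> str:
    """Rebuild the sample from the reference and the encoded differences."""
    seq = list(reference)
    i, pos = 0, -1
    while i < len(bits):
        zeros = 0
        while bits[i] == "0":  # unary length prefix of the gamma code
            zeros += 1
            i += 1
        pos += int(bits[i:i + zeros + 1], 2)
        i += zeros + 1
        seq[pos] = BITS_BASE[bits[i:i + 2]]
        i += 2
    return "".join(seq)


encoded = encode("ACGTACGT", "ACGAACGC")
print(encoded, decode("ACGTACGT", encoded))
```

Here two substitutions in an 8-base sequence take 14 bits. The saving grows with sequence length when variants are sparse, which is the situation the abstract describes for mitochondrial genomes.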

205

Savant: genome browser for high-throughput sequencing data  

PubMed Central

Motivation: The advent of high-throughput sequencing (HTS) technologies has made it affordable to sequence many individuals' genomes. Simultaneously the computational analysis of the large volumes of data generated by the new sequencing machines remains a challenge. While a plethora of tools are available to map the resulting reads to a reference genome, and to conduct primary analysis of the mappings, it is often necessary to visually examine the results and underlying data to confirm predictions and understand the functional effects, especially in the context of other datasets. Results: We introduce Savant, the Sequence Annotation, Visualization and ANalysis Tool, a desktop visualization and analysis browser for genomic data. Savant was developed for visualizing and analyzing HTS data, with special care taken to enable dynamic visualization in the presence of gigabases of genomic reads and references the size of the human genome. Savant supports the visualization of genome-based sequence, point, interval and continuous datasets, and multiple visualization modes that enable easy identification of genomic variants (including single nucleotide polymorphisms, structural and copy number variants), and functional genomic information (e.g. peaks in ChIP-seq data) in the context of genomic annotations. Availability: Savant is freely available at http://compbio.cs.toronto.edu/savant Contact: savant@cs.toronto.edu

Fiume, Marc; Williams, Vanessa; Brook, Andrew; Brudno, Michael

2010-01-01

206

Identification of optimum sequencing depth especially for de novo genome assembly of small genomes using next generation sequencing data.  

PubMed

Next Generation Sequencing (NGS) is a disruptive technology that has found widespread acceptance in the life sciences research community. The high throughput and low cost of sequencing has encouraged researchers to undertake ambitious genomic projects, especially in de novo genome sequencing. Currently, NGS systems generate sequence data as short reads and de novo genome assembly using these short reads is computationally very intensive. Due to lower cost of sequencing and higher throughput, NGS systems now provide the ability to sequence genomes at high depth. However, currently no report is available highlighting the impact of high sequence depth on genome assembly using real data sets and multiple assembly algorithms. Recently, some studies have evaluated the impact of sequence coverage, error rate and average read length on genome assembly using multiple assembly algorithms, however, these evaluations were performed using simulated datasets. One limitation of using simulated datasets is that variables such as error rates, read length and coverage which are known to impact genome assembly are carefully controlled. Hence, this study was undertaken to identify the minimum depth of sequencing required for de novo assembly for different sized genomes using graph based assembly algorithms and real datasets. Illumina reads for E.coli (4.6 MB) S.kudriavzevii (11.18 MB) and C.elegans (100 MB) were assembled using SOAPdenovo, Velvet, ABySS, Meraculous and IDBA-UD. Our analysis shows that 50X is the optimum read depth for assembling these genomes using all assemblers except Meraculous which requires 100X read depth. Moreover, our analysis shows that de novo assembly from 50X read data requires only 6-40 GB RAM depending on the genome size and assembly algorithm used. We believe that this information can be extremely valuable for researchers in designing experiments and multiplexing which will enable optimum utilization of sequencing as well as analysis resources. 
PMID:23593174

Desai, Aarti; Marwah, Veer Singh; Yadav, Akshay; Jha, Vineet; Dhaygude, Kishor; Bangar, Ujwala; Kulkarni, Vivek; Jere, Abhay

2013-01-01

207

Identification of Optimum Sequencing Depth Especially for De Novo Genome Assembly of Small Genomes Using Next Generation Sequencing Data  

PubMed Central

Next Generation Sequencing (NGS) is a disruptive technology that has found widespread acceptance in the life sciences research community. The high throughput and low cost of sequencing has encouraged researchers to undertake ambitious genomic projects, especially in de novo genome sequencing. Currently, NGS systems generate sequence data as short reads and de novo genome assembly using these short reads is computationally very intensive. Due to lower cost of sequencing and higher throughput, NGS systems now provide the ability to sequence genomes at high depth. However, currently no report is available highlighting the impact of high sequence depth on genome assembly using real data sets and multiple assembly algorithms. Recently, some studies have evaluated the impact of sequence coverage, error rate and average read length on genome assembly using multiple assembly algorithms, however, these evaluations were performed using simulated datasets. One limitation of using simulated datasets is that variables such as error rates, read length and coverage which are known to impact genome assembly are carefully controlled. Hence, this study was undertaken to identify the minimum depth of sequencing required for de novo assembly for different sized genomes using graph based assembly algorithms and real datasets. Illumina reads for E.coli (4.6 MB) S.kudriavzevii (11.18 MB) and C.elegans (100 MB) were assembled using SOAPdenovo, Velvet, ABySS, Meraculous and IDBA-UD. Our analysis shows that 50X is the optimum read depth for assembling these genomes using all assemblers except Meraculous which requires 100X read depth. Moreover, our analysis shows that de novo assembly from 50X read data requires only 6–40 GB RAM depending on the genome size and assembly algorithm used. We believe that this information can be extremely valuable for researchers in designing experiments and multiplexing which will enable optimum utilization of sequencing as well as analysis resources.

Desai, Aarti; Marwah, Veer Singh; Yadav, Akshay; Jha, Vineet; Dhaygude, Kishor; Bangar, Ujwala; Kulkarni, Vivek; Jere, Abhay

2013-01-01
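
The read depths quoted in the record above (50X, 100X) relate to read count through the standard coverage relation: depth = number of reads × read length / genome size. The sketch below applies it to the genome sizes given in the abstract. The 100 bp read length is an assumption chosen for illustration and is not stated in the source, and the function names are hypothetical.

```python
import math


def reads_required(genome_size_bp: int, depth: float, read_length_bp: int) -> int:
    """Number of reads needed to reach `depth` average coverage."""
    return math.ceil(depth * genome_size_bp / read_length_bp)


def depth_achieved(num_reads: int, read_length_bp: int, genome_size_bp: int) -> float:
    """Average coverage obtained from `num_reads` reads of a given length."""
    return num_reads * read_length_bp / genome_size_bp


# Genome sizes as given in the abstract; read length is an assumed value.
genomes = {
    "E. coli": 4_600_000,
    "S. kudriavzevii": 11_180_000,
    "C. elegans": 100_000_000,
}
for name, size in genomes.items():
    print(name, reads_required(size, depth=50, read_length_bp=100))
```

This gives only the average depth. Real coverage is uneven across a genome, so the figures are a planning estimate rather than a guarantee for any given region.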

208

Whole-exome targeted sequencing of the uncharacterized pine genome.  

PubMed

The large genome size of many species hinders the development and application of genomic tools to study them. For instance, loblolly pine (Pinus taeda L.), an ecologically and economically important conifer, has a large and yet uncharacterized genome of 21.7 Gbp. To characterize the pine genome, we performed exome capture and sequencing of 14 729 genes derived from an assembly of expressed sequence tags. Efficiency of sequence capture was evaluated and shown to be similar across samples with increasing levels of complexity, including haploid cDNA, haploid genomic DNA and diploid genomic DNA. However, this efficiency was severely reduced for probes that overlapped multiple exons, presumably because intron sequences hindered probe:exon hybridizations. Such regions could not be entirely avoided during probe design, because of the lack of a reference sequence. To improve the throughput and reduce the cost of sequence capture, a method to multiplex the analysis of up to eight samples was developed. Sequence data showed that multiplexed capture was reproducible among 24 haploid samples, and can be applied for high-throughput analysis of targeted genes in large populations. Captured sequences were de novo assembled, resulting in 11 396 expanded and annotated gene models, significantly improving the knowledge about the pine gene space. Interspecific capture was also evaluated with over 98% of all probes designed from P. taeda that were efficient in sequence capture, were also suitable for analysis of the related species Pinus elliottii Engelm. PMID:23551702

Neves, Leandro G; Davis, John M; Barbazuk, William B; Kirst, Matias

2013-07-01

209

Marsupial genome sequences: providing insight into evolution and disease.  

PubMed

Marsupials (metatherians), with their position in vertebrate phylogeny and their unique biological features, have been studied for many years by a dedicated group of researchers, but it has only been since the sequencing of the first marsupial genome that their value has been more widely recognised. We now have genome sequences for three distantly related marsupial species (the grey short-tailed opossum, the tammar wallaby, and Tasmanian devil), with the promise of many more genomes to be sequenced in the near future, making this a particularly exciting time in marsupial genomics. The emergence of a transmissible cancer, which is obliterating the Tasmanian devil population, has increased the importance of obtaining and analysing marsupial genome sequence for understanding such diseases as well as for conservation efforts. In addition, these genome sequences have facilitated studies aimed at answering questions regarding gene and genome evolution and provided insight into the evolution of epigenetic mechanisms. Here I highlight the major advances in our understanding of evolution and disease, facilitated by marsupial genome projects, and speculate on the future contributions to be made by such sequences. PMID:24278712

Deakin, Janine E

2012-01-01

210

Nucleotide sequence and genome organization of carnation mottle virus RNA.  

PubMed Central

The complete nucleotide sequence of carnation mottle genomic RNA (4003 nucleotides) is presented. The sequence was determined for cloned cDNA copies of viral RNA containing over 99% of the sequence and was completed by direct sequence analysis of RNA and cDNA transcripts. The sequence contains two long open reading frames which together can account for observed translation products. One translation product would arise by suppression of an amber termination codon and the sequence raises the possibility that a second suppression event could also occur. Sequence homology exists between a portion of the carnation mottle virus sequence and that of putative RNA polymerases from other RNA viruses.

Guilley, H; Carrington, J C; Balazs, E; Jonard, G; Richards, K; Morris, T J

1985-01-01

211

Complete genome sequence of Spirosoma linguale type strain (1T)  

PubMed Central

Spirosoma linguale Migula 1894 is the type species of the genus. S. linguale is a free-living and non-pathogenic organism, known for its peculiar ringlike and horseshoe-shaped cell morphology. Here we describe the features of this organism, together with the complete genome sequence and annotation. This is only the third completed genome sequence of a member of the family Cytophagaceae. The 8,491,258 bp long genome with its eight plasmids, 7,069 protein-coding and 60 RNA genes is part of the Genomic Encyclopedia of Bacteria and Archaea project.

Lail, Kathleen; Sikorski, Johannes; Saunders, Elizabeth; Lapidus, Alla; Glavina Del Rio, Tijana; Copeland, Alex; Tice, Hope; Cheng, Jan-Fang; Lucas, Susan; Nolan, Matt; Bruce, David; Goodwin, Lynne; Pitluck, Sam; Ivanova, Natalia; Mavromatis, Konstantinos; Ovchinnikova, Galina; Pati, Amrita; Chen, Amy; Palaniappan, Krishna; Land, Miriam; Hauser, Loren; Chang, Yun-Juan; Jeffries, Cynthia D.; Chain, Patrick; Brettin, Thomas; Detter, John C.; Schutze, Andrea; Rohde, Manfred; Tindall, Brian J.; Goker, Markus; Bristow, Jim; Eisen, Jonathan A.; Markowitz, Victor; Hugenholtz, Philip; Kyrpides, Nikos C.; Klenk, Hans-Peter; Chen, Feng

2010-01-01

212

Genome sequence of the Nocardia bacteriophage NBR1.  

PubMed

We here characterize a novel bacteriophage (NBR1) that is lytic for Nocardia otitidiscaviarum and N. brasiliensis. NBR1 is a member of the family Siphoviridae and appears to have a structurally more complex tail than previously reported Siphoviridae phages. NBR1 has a linear genome of 46,140 bp and a sequence that appears novel when compared to those of other phage sequences in GenBank. Annotation of the genome reveals 68 putative open reading frames. The phage genome organization appears to be similar to other Siphoviridae phage genomes in that it has a modular arrangement. PMID:23913189

Petrovski, Steve; Seviour, Robert J; Tillett, Daniel

2014-01-01

213

Complete genome sequence of Acidimicrobium ferrooxidans type strain (ICPT)  

PubMed Central

Acidimicrobium ferrooxidans (Clark and Norris 1996) is the sole and type species of the genus, which until recently was the only genus within the actinobacterial family Acidimicrobiaceae and in the order Acidimicrobiales. Rapid oxidation of iron pyrite during autotrophic growth in the absence of an enhanced CO2 concentration is characteristic for A. ferrooxidans. Here we describe the features of this organism, together with the complete genome sequence, and annotation. This is the first complete genome sequence of the order Acidimicrobiales, and the 2,158,157 bp long single replicon genome with its 2038 protein coding and 54 RNA genes is part of the Genomic Encyclopedia of Bacteria and Archaea project.

Clum, Alicia; Nolan, Matt; Lang, Elke; Glavina Del Rio, Tijana; Tice, Hope; Copeland, Alex; Cheng, Jan-Fang; Lucas, Susan; Chen, Feng; Bruce, David; Goodwin, Lynne; Pitluck, Sam; Ivanova, Natalia; Mavrommatis, Konstantinos; Mikhailova, Natalia; Pati, Amrita; Chen, Amy; Palaniappan, Krishna; Goker, Markus; Spring, Stefan; Land, Miriam; Hauser, Loren; Chang, Yun-Juan; Jeffries, Cynthia C.; Chain, Patrick; Bristow, Jim; Eisen, Jonathan A.; Markowitz, Victor; Hugenholtz, Philip; Kyrpides, Nikos C.; Klenk, Hans-Peter; Lapidus, Alla

2009-01-01

214

Complete genome sequence of Sulfurospirillum deleyianum type strain (5175T)  

SciTech Connect

Sulfurospirillum deleyianum Schumacher et al. 1993 is the type species of the genus Sulfurospirillum. S. deleyianum is a model organism for studying sulfur reduction and dissimilatory nitrate reduction as energy source for growth. Also, it is a prominent model organism for studying the structural and functional characteristics of the cytochrome c nitrite reductase. Here we describe the features of this organism, together with the complete genome sequence and annotation. This is the first completed genome sequence of the genus Sulfurospirillum. The 2,306,351 bp long genome with its 2291 protein-coding and 52 RNA genes is part of the Genomic Encyclopedia of Bacteria and Archaea project.

Sikorski, Johannes [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany; Lapidus, Alla L. [U.S. Department of Energy, Joint Genome Institute; Copeland, A [U.S. Department of Energy, Joint Genome Institute; Glavina Del Rio, Tijana [U.S. Department of Energy, Joint Genome Institute; Nolan, Matt [U.S. Department of Energy, Joint Genome Institute; Lucas, Susan [U.S. Department of Energy, Joint Genome Institute; Chen, Feng [U.S. Department of Energy, Joint Genome Institute; Tice, Hope [U.S. Department of Energy, Joint Genome Institute; Cheng, Jan-Fang [U.S. Department of Energy, Joint Genome Institute; Saunders, Elizabeth H [Los Alamos National Laboratory (LANL); Bruce, David [Los Alamos National Laboratory (LANL); Goodwin, Lynne A. [Los Alamos National Laboratory (LANL); Pitluck, Sam [U.S. Department of Energy, Joint Genome Institute; Ovchinnikova, Galina [U.S. Department of Energy, Joint Genome Institute; Pati, Amrita [U.S. Department of Energy, Joint Genome Institute; Ivanova, N [U.S. Department of Energy, Joint Genome Institute; Mavromatis, K [U.S. Department of Energy, Joint Genome Institute; Chen, Amy [U.S. Department of Energy, Joint Genome Institute; Palaniappan, Krishna [U.S. Department of Energy, Joint Genome Institute; Chain, Patrick S. G. [Lawrence Livermore National Laboratory (LLNL); Land, Miriam L [ORNL; Hauser, Loren John [ORNL; Chang, Yun-Juan [ORNL; Jeffries, Cynthia [Oak Ridge National Laboratory (ORNL); Detter, J. Chris [U.S. Department of Energy, Joint Genome Institute; Han, Cliff [Los Alamos National Laboratory (LANL); Rohde, Manfred [HZI - Helmholtz Centre for Infection Research, Braunschweig, Germany; Lang, Elke [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany; Spring, Stefan [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany; Goker, Markus [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany; Bristow, James [U.S. Department of Energy, Joint Genome Institute; Eisen, Jonathan [U.S. Department of Energy, Joint Genome Institute; Markowitz, Victor [U.S. Department of Energy, Joint Genome Institute; Hugenholtz, Philip [U.S. Department of Energy, Joint Genome Institute; Kyrpides, Nikos C [U.S. Department of Energy, Joint Genome Institute; Klenk, Hans-Peter [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany

2010-01-01

215

Complete genome sequence of Thermomonospora curvata type strain (B9)  

SciTech Connect

Thermomonospora curvata Henssen 1957 is the type species of the genus Thermomonospora. This genus is of interest because members of this clade are sources of new antibiotics, enzymes, and products with pharmacological activity. In addition, members of this genus participate in the active degradation of cellulose. This is the first complete genome sequence of a member of the family Thermomonosporaceae. Here we describe the features of this organism, together with the complete genome sequence and annotation. The 5,639,016 bp long genome with its 4,985 protein-coding and 76 RNA genes is a part of the Genomic Encyclopedia of Bacteria and Archaea project.

Chertkov, Olga [Los Alamos National Laboratory (LANL); Sikorski, Johannes [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany; Nolan, Matt [Joint Genome Institute, Walnut Creek, California; Lapidus, Alla L. [Joint Genome Institute, Walnut Creek, California; Lucas, Susan [Joint Genome Institute, Walnut Creek, California; Glavina Del Rio, Tijana [Joint Genome Institute, Walnut Creek, California; Tice, Hope [Joint Genome Institute, Walnut Creek, California; Cheng, Jan-Fang [Joint Genome Institute, Walnut Creek, California; Goodwin, Lynne A. [Los Alamos National Laboratory (LANL); Pitluck, Sam [Joint Genome Institute, Walnut Creek, California; Liolios, Konstantinos [Joint Genome Institute, Walnut Creek, California; Ivanova, N [U.S. Department of Energy, Joint Genome Institute; Mavromatis, K [U.S. Department of Energy, Joint Genome Institute; Mikhailova, Natalia [U.S. Department of Energy, Joint Genome Institute; Ovchinnikova, Galina [U.S. Department of Energy, Joint Genome Institute; Pati, Amrita [U.S. Department of Energy, Joint Genome Institute; Chen, Amy [Joint Genome Institute, Walnut Creek, California; Palaniappan, Krishna [Joint Genome Institute, Walnut Creek, California; Ngatchou, Olivier Duplex [HZI - Helmholtz Centre for Infection Research, Braunschweig, Germany; Land, Miriam L [ORNL; Hauser, Loren John [ORNL; Chang, Yun-Juan [ORNL; Jeffries, Cynthia [Oak Ridge National Laboratory (ORNL); Brettin, Thomas S [ORNL; Han, Cliff [Los Alamos National Laboratory (LANL); Detter, J. Chris [Joint Genome Institute, Walnut Creek, California; Rohde, Manfred [HZI - Helmholtz Centre for Infection Research, Braunschweig, Germany; Goker, Markus [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany; Woyke, Tanja [Joint Genome Institute, Walnut Creek, California; Bristow, James [Joint Genome Institute, Walnut Creek, California; Eisen, Jonathan [Joint Genome Institute, Walnut Creek, California; Markowitz, Victor [Joint Genome Institute, Walnut Creek, California; Hugenholtz, Philip [U.S. Department of Energy, Joint Genome Institute; Klenk, Hans-Peter [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany; Kyrpides, Nikos C [Joint Genome Institute, Walnut Creek, California

2011-01-01

216

Complete genome sequence of Spirosoma linguale type strain (1T)  

SciTech Connect

Spirosoma linguale Migula 1894 is the type species of the genus. S. linguale is a free-living and non-pathogenic organism, known for its peculiar ringlike and horseshoe-shaped cell morphology. Here we describe the features of this organism, together with the complete genome sequence and annotation. This is only the third completed genome sequence of a member of the family Cytophagaceae. The 8,491,258 bp long genome with its eight plasmids, 7,069 protein-coding and 60 RNA genes is part of the Genomic Encyclopedia of Bacteria and Archaea project.

Lail, Kathleen [U.S. Department of Energy, Joint Genome Institute; Sikorski, Johannes [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany; Saunders, Elizabeth H [Los Alamos National Laboratory (LANL); Lapidus, Alla L. [U.S. Department of Energy, Joint Genome Institute; Glavina Del Rio, Tijana [U.S. Department of Energy, Joint Genome Institute; Copeland, A [U.S. Department of Energy, Joint Genome Institute; Tice, Hope [U.S. Department of Energy, Joint Genome Institute; Cheng, Jan-Fang [U.S. Department of Energy, Joint Genome Institute; Lucas, Susan [U.S. Department of Energy, Joint Genome Institute; Nolan, Matt [U.S. Department of Energy, Joint Genome Institute; Bruce, David [Los Alamos National Laboratory (LANL); Goodwin, Lynne A. [Los Alamos National Laboratory (LANL); Pitluck, Sam [U.S. Department of Energy, Joint Genome Institute; Ivanova, N [U.S. Department of Energy, Joint Genome Institute; Mavromatis, K [U.S. Department of Energy, Joint Genome Institute; Ovchinnikova, Galina [U.S. Department of Energy, Joint Genome Institute; Pati, Amrita [U.S. Department of Energy, Joint Genome Institute; Chen, Amy [U.S. Department of Energy, Joint Genome Institute; Palaniappan, Krishna [U.S. Department of Energy, Joint Genome Institute; Land, Miriam L [ORNL; Hauser, Loren John [ORNL; Chang, Yun-Juan [ORNL; Jeffries, Cynthia [Oak Ridge National Laboratory (ORNL); Chain, Patrick S. G. [Lawrence Livermore National Laboratory (LLNL); Detter, J. Chris [U.S. Department of Energy, Joint Genome Institute; Schutze, Andrea [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany; Rohde, Manfred [HZI - Helmholtz Centre for Infection Research, Braunschweig, Germany; Tindall, Brian [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany; Goker, Markus [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany; Bristow, James [U.S. Department of Energy, Joint Genome Institute; Eisen, Jonathan [U.S. Department of Energy, Joint Genome Institute; Markowitz, Victor [U.S. Department of Energy, Joint Genome Institute; Hugenholtz, Philip [U.S. Department of Energy, Joint Genome Institute; Kyrpides, Nikos C [U.S. Department of Energy, Joint Genome Institute; Klenk, Hans-Peter [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany; Chen, Feng [U.S. Department of Energy, Joint Genome Institute

2010-01-01

217

Comparative Analysis of Rice Genome Sequence to Understand the Molecular Basis of Genome Evolution  

Microsoft Academic Search

Accurate sequencing of the rice genome has ignited a passion for elucidating mechanism for sequence diversity among rice varieties and species, both in protein-coding regions and in genomic regions that are important for chromosome functions. Here, we have shown examples of sequence diversity in genic and non-genic regions. Sequence analysis of chromosome ends has revealed that there is diversity in

Jianzhong Wu; Hiroshi Mizuno; Takuji Sasaki; Takashi Matsumoto

2008-01-01

218

Complete Genome Sequence of a Polyomavirus Isolated from Horses  

PubMed Central

A polyomavirus was isolated from the eyes of horses, and the sequence was determined. A nearly identical VP1 sequence was amplified from the kidney of another animal. We report the complete genome sequence of the first polyomavirus to be isolated from a horse. Analysis shows it to be most closely related overall to human and nonhuman primate polyomaviruses.

Wise, Annabel G.; Maes, Roger K.

2012-01-01

219

Genome sequencing: a systematic review of health economic evidence  

PubMed Central

Recently the sequencing of the human genome has become a major biological and clinical research field. However, the public health impact of this new technology with focus on the financial effect is not yet to be foreseen. To provide an overview of the current health economic evidence for genome sequencing, we conducted a thorough systematic review of the literature from 17 databases. In addition, we conducted a hand search. Starting with 5 520 records we ultimately included five full-text publications and one internet source, all focused on cost calculations. The results were very heterogeneous and, therefore, difficult to compare. Furthermore, because the methodology of the publications was quite poor, the reliability and validity of the results were questionable. The real costs for the whole sequencing workflow, including data management and analysis, remain unknown. Overall, our review indicates that the current health economic evidence for genome sequencing is quite poor. Therefore, we listed aspects that needed to be considered when conducting health economic analyses of genome sequencing. Thereby, specifics regarding the overall aim, technology, population, indication, comparator, alternatives after sequencing, outcomes, probabilities, and costs with respect to genome sequencing are discussed. For further research, at the outset, a comprehensive cost calculation of genome sequencing is needed, because all further health economic studies rely on valid cost data. The results will serve as an input parameter for budget-impact analyses or cost-effectiveness analyses.

2013-01-01

220

Exploring Microbial Genome Sequences to Identify Protein Families on the Grid.  

National Technical Information Service (NTIS)

The analysis of microbial genome sequences can identify protein families that provide potential drug targets for new antibiotics. With the rapid accumulation of newly sequenced genomes, the analysis of complete genome sequences has become a computationall...

Y. Sun; A. Wipat; M. Pocock; P. Lee; K. Flanagan; J. Worthington

2005-01-01

221

Complete genome sequence of Staphylothermus hellenicus P8T  

SciTech Connect

Staphylothermus hellenicus belongs to the order Desulfurococcales within the archaeal phylum Crenarchaeota. Strain P8T is the type strain of the species and was isolated from a shallow hydrothermal vent system at Palaeochori Bay, Milos, Greece. It is a hyperthermophilic, anaerobic heterotroph. Here we describe the features of this organism together with the complete genome sequence and annotation. The 1,580,347 bp genome with its 1,668 protein-coding and 48 RNA genes was sequenced as part of a DOE Joint Genome Institute (JGI) Laboratory Sequencing Program (LSP) project.

Anderson, Iain [U.S. Department of Energy, Joint Genome Institute; Wirth, Reinhard [Universitat Regensburg, Regensburg, Germany; Lucas, Susan [U.S. Department of Energy, Joint Genome Institute; Copeland, A [U.S. Department of Energy, Joint Genome Institute; Lapidus, Alla L. [U.S. Department of Energy, Joint Genome Institute; Cheng, Jan-Fang [U.S. Department of Energy, Joint Genome Institute; Goodwin, Lynne A. [Los Alamos National Laboratory (LANL); Pitluck, Sam [U.S. Department of Energy, Joint Genome Institute; Davenport, Karen W. [Los Alamos National Laboratory (LANL); Detter, J. Chris [U.S. Department of Energy, Joint Genome Institute; Han, Cliff [Los Alamos National Laboratory (LANL); Tapia, Roxanne [Los Alamos National Laboratory (LANL); Land, Miriam L [ORNL; Hauser, Loren John [ORNL; Pati, Amrita [U.S. Department of Energy, Joint Genome Institute; Mikhailova, Natalia [U.S. Department of Energy, Joint Genome Institute; Woyke, Tanja [U.S. Department of Energy, Joint Genome Institute; Klenk, Hans-Peter [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany; Kyrpides, Nikos C [U.S. Department of Energy, Joint Genome Institute; Ivanova, N [U.S. Department of Energy, Joint Genome Institute

2011-01-01

222

Complete genome sequence of Staphylothermus hellenicus P8.  

PubMed

Staphylothermus hellenicus belongs to the order Desulfurococcales within the archaeal phylum Crenarchaeota. Strain P8(T) is the type strain of the species and was isolated from a shallow hydrothermal vent system at Palaeochori Bay, Milos, Greece. It is a hyperthermophilic, anaerobic heterotroph. Here we describe the features of this organism together with the complete genome sequence and annotation. The 1,580,347 bp genome with its 1,668 protein-coding and 48 RNA genes was sequenced as part of a DOE Joint Genome Institute (JGI) Laboratory Sequencing Program (LSP) project. PMID:22180806

Anderson, Iain; Wirth, Reinhard; Lucas, Susan; Copeland, Alex; Lapidus, Alla; Cheng, Jan-Fang; Goodwin, Lynne; Pitluck, Samuel; Davenport, Karen; Detter, John C; Han, Cliff; Tapia, Roxanne; Land, Miriam; Hauser, Loren; Pati, Amrita; Mikhailova, Natalia; Woyke, Tanja; Klenk, Hans-Peter; Kyrpides, Nikos; Ivanova, Natalia

2011-10-15

223

Management of Incidental Findings in Clinical Genomic Sequencing  

PubMed Central

Genomic sequencing is becoming accurate, fast, and inexpensive, and is rapidly being incorporated into clinical practice. Incidental findings, which result in large numbers from genomic sequencing, are a potential barrier to the utility of this new technology due to their high prevalence and the lack of evidence or guidelines available to guide their clinical interpretation. This unit reviews the definition, classification, and management of incidental findings from genomic sequencing. The unit focuses on the clinical aspects of handling incidental findings, with an emphasis on the key role of clinical context in defining incidental findings and determining their clinical relevance and utility.

Krier, Joel B.; Green, Robert C.

2013-01-01

224

Genomic Treasure Troves: Complete Genome Sequencing of Herbarium and Insect Museum Specimens  

PubMed Central

Unlocking the vast genomic diversity stored in natural history collections would create unprecedented opportunities for genome-scale evolutionary, phylogenetic, domestication and population genomic studies. Many researchers have been discouraged from using historical specimens in molecular studies because of both generally limited success of DNA extraction and the challenges associated with PCR-amplifying highly degraded DNA. In today's next-generation sequencing (NGS) world, opportunities and prospects for historical DNA have changed dramatically, as most NGS methods are actually designed for taking short fragmented DNA molecules as templates. Here we show that using a standard multiplex and paired-end Illumina sequencing approach, genome-scale sequence data can be generated reliably from dry-preserved plant, fungal and insect specimens collected up to 115 years ago, and with minimal destructive sampling. Using a reference-based assembly approach, we were able to produce the entire nuclear genome of a 43-year-old Arabidopsis thaliana (Brassicaceae) herbarium specimen with high and uniform sequence coverage. Nuclear genome sequences of three fungal specimens of 22–82 years of age (Agaricus bisporus, Laccaria bicolor, Pleurotus ostreatus) were generated with 81.4–97.9% exome coverage. Complete organellar genome sequences were assembled for all specimens. Using de novo assembly we retrieved between 16.2–71.0% of coding sequence regions, and hence remain somewhat cautious about prospects for de novo genome assembly from historical specimens. Non-target sequence contaminations were observed in 2 of our insect museum specimens. We anticipate that future museum genomics projects will perhaps not generate entire genome sequences in all cases (our specimens contained relatively small and low-complexity genomes), but will at least generate vital comparative genomic data for testing (phylo)genetic, demographic and genetic hypotheses, that become increasingly more horizontal. Furthermore, NGS of historical DNA enables recovering crucial genetic information from old type specimens that to date have remained mostly unutilized and, thus, opens up a new frontier for taxonomic research as well.

Staats, Martijn; Erkens, Roy H. J.; van de Vossenberg, Bart; Wieringa, Jan J.; Kraaijeveld, Ken; Stielow, Benjamin; Geml, Jozsef; Richardson, James E.; Bakker, Freek T.

2013-01-01

225

GIST: Genomic island suite of tools for predicting genomic islands in genomic sequences  

PubMed Central

Genomic Islands (GIs) are genomic regions that originate from other organisms through a process known as Horizontal Gene Transfer (HGT). Detection of GIs plays a significant role in biomedical research since such alien genomic regions usually contain important features, such as pathogenic genes. We have developed a user-friendly graphical user interface, Genomic Island Suite of Tools (GIST), which is a platform for scientific users to predict GIs. This software package includes five commonly used tools: AlienHunter, IslandPath, Colombo SIGI-HMM, INDeGenIUS and Pai-Ida. It also includes an optimization program, EGID, that ensembles the results of existing tools for more accurate prediction. The tools in GIST can be used either separately or sequentially. GIST also includes a download feature that facilitates collecting the input genomes automatically from the FTP server of the National Center for Biotechnology Information (NCBI). GIST was implemented in Java, and was compiled and executed on Linux/Unix operating systems. Availability: The database is available for free at http://www5.esu.edu/cpsc/bioinfo/software/GIST

Hasan, Mohammad Shabbir; Liu, Qi; Wang, Han; Fazekas, John; Chen, Bernard; Che, Dongsheng

2012-01-01
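
The abstract above mentions combining the output of several genomic island predictors into one ensemble prediction. One simple way to do that is to keep only the regions that a minimum number of tools agree on. The Python sketch below illustrates such a voting scheme. It is not EGID's algorithm, which the abstract does not describe, and the tool names, coordinates and function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) genome coordinates


def _merge(intervals: List[Interval]) -> List[Interval]:
    """Merge overlapping or touching intervals so one tool votes once per base."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def consensus_regions(predictions: Dict[str, List[Interval]], min_votes: int) -> List[Interval]:
    """Return regions predicted by at least min_votes tools (endpoint sweep)."""
    delta: Dict[int, int] = defaultdict(int)
    for intervals in predictions.values():
        for start, end in _merge(intervals):
            delta[start] += 1  # a tool's prediction begins here
            delta[end] -= 1    # and stops here
    regions: List[Interval] = []
    votes = 0
    region_start = None
    for pos in sorted(delta):
        votes += delta[pos]
        if votes >= min_votes and region_start is None:
            region_start = pos
        elif votes < min_votes and region_start is not None:
            regions.append((region_start, pos))
            region_start = None
    return regions


if __name__ == "__main__":
    # Hypothetical predictions from three unnamed tools.
    example = {
        "tool_a": [(100, 500), (900, 1200)],
        "tool_b": [(300, 700)],
        "tool_c": [(400, 1000)],
    }
    print(consensus_regions(example, min_votes=2))
```

Raising `min_votes` trades sensitivity for agreement between tools; a real ensemble method may weight tools differently or use scores rather than plain votes.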

226

Sequencing, Assembling, and Correcting Draft Genomes Using Recombinant Populations  

PubMed Central

Current de novo whole-genome sequencing approaches often are inadequate for organisms lacking substantial preexisting genetic data. Problems with these methods are manifest as: large numbers of scaffolds that are not ordered within chromosomes or assigned to individual chromosomes, misassembly of allelic sequences as separate loci when the individual(s) being sequenced are heterozygous, and the collapse of recently duplicated sequences into a single locus, regardless of levels of heterozygosity. Here we propose a new approach for producing de novo whole-genome sequences—which we call recombinant population genome construction—that solves many of the problems encountered in standard genome assembly and that can be applied in model and nonmodel organisms. Our approach takes advantage of next-generation sequencing technologies to simultaneously barcode and sequence a large number of individuals from a recombinant population. The sequences of all recombinants can be combined to create an initial de novo assembly, followed by the use of individual recombinant genotypes to correct assembly splitting/collapsing and to order and orient scaffolds within linkage groups. Recombinant population genome construction can rapidly accelerate the transformation of nonmodel species into genome-enabled systems by simultaneously producing a high-quality genome assembly and providing genomic tools (e.g., high-confidence single-nucleotide polymorphisms) for immediate applications. In populations segregating for important functional traits, this approach also enables simultaneous mapping of quantitative trait loci. We demonstrate our method using simulated Illumina data from a recombinant population of Caenorhabditis elegans and show that the method can produce a high-fidelity, high-quality genome assembly for both parents of the cross.

Hahn, Matthew W.; Zhang, Simo V.; Moyle, Leonie C.

2014-01-01

227

The Arabidopsis lyrata genome sequence and the basis of rapid genome size change  

SciTech Connect

In our manuscript, we present a high-quality genome sequence of the Arabidopsis thaliana relative, Arabidopsis lyrata, produced by dideoxy sequencing. We have performed the usual types of genome analysis (gene annotation, dN/dS studies, etc.), but this is relegated to the Supporting Information. Instead, we focus on what was a major motivation for sequencing this genome, namely to understand how A. thaliana lost half its genome in a few million years and lived to tell the tale. The rather surprising conclusion is that there is not a single genomic feature that accounts for the reduced genome, but that every aspect (centromeres, intergenic regions, transposable elements, gene family number) is affected through hundreds of thousands of cuts. This strongly suggests that overall genome size in itself is what has been under selection, a suggestion that is strongly supported by our demonstration (using population genetics data from A. thaliana) that new deletions seem to be driven to fixation.

Hu, Tina T.; Pattyn, Pedro; Bakker, Erica G.; Cao, Jun; Cheng, Jan-Fang; Clark, Richard M.; Fahlgren, Noah; Fawcett, Jeffrey A.; Grimwood, Jane; Gundlach, Heidrun; Haberer, Georg; Hollister, Jesse D.; Ossowski, Stephan; Ottilar, Robert P.; Salamov, Asaf A.; Schneeberger, Korbinian; Spannagl, Manuel; Wang, Xi; Yang, Liang; Nasrallah, Mikhail E.; Bergelson, Joy; Carrington, James C.; Gaut, Brandon S.; Schmutz, Jeremy; Mayer, Klaus F. X.; Van de Peer, Yves; Grigoriev, Igor V.; Nordborg, Magnus; Weigel, Detlef; Guo, Ya-Long

2011-04-29

228

A Genome-Wide Analysis of FRT-Like Sequences in the Human Genome  

PubMed Central

Efficient and precise genome manipulations can be achieved by the Flp/FRT system of site-specific DNA recombination. Applications of this system are limited, however, to cases when target sites for Flp recombinase, FRT sites, are pre-introduced into a genome locale of interest. To expand use of the Flp/FRT system in genome engineering, variants of Flp recombinase can be evolved to recognize pre-existing genomic sequences that resemble FRT and thus can serve as recombination sites. To understand the distribution and sequence properties of genomic FRT-like sites, we performed a genome-wide analysis of FRT-like sites in the human genome using the experimentally-derived parameters. Out of 642,151 identified FRT-like sequences, 581,157 sequences were unique and 12,452 sequences had at least one exact duplicate. Duplicated FRT-like sequences are located mostly within LINE1, but also within LTRs of endogenous retroviruses, Alu repeats and other repetitive DNA sequences. The unique FRT-like sequences were classified based on the number of matches to FRT within the first four proximal base pairs of the Flp binding elements of FRT and the nature of mismatched base pairs in the same region. The data obtained will be useful for the emerging field of genome engineering.

Shultz, Jeffry L.; Voziyanova, Eugenia; Konieczka, Jay H.; Voziyanov, Yuri

2011-01-01
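
The analysis described in the abstract above amounts to scanning a genome for windows that resemble a target site and then separating unique hits from exact duplicates. The following is a minimal Python sketch of that idea using a plain Hamming-distance threshold. The toy motif, the threshold and the function names are hypothetical; they are not the FRT sequence or the experimentally derived parameters used in the study.

```python
from collections import Counter
from typing import List, Tuple


def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def find_like_sites(genome: str, motif: str, max_mismatches: int) -> List[Tuple[int, str]]:
    """Return (position, sequence) for each window within max_mismatches of motif."""
    k = len(motif)
    hits = []
    for i in range(len(genome) - k + 1):
        window = genome[i:i + k]
        if hamming(window, motif) <= max_mismatches:
            hits.append((i, window))
    return hits


def split_unique_and_duplicated(hits: List[Tuple[int, str]]) -> Tuple[List[str], List[str]]:
    """Separate hit sequences seen once from those with at least one exact duplicate."""
    counts = Counter(seq for _, seq in hits)
    unique = sorted(s for s, c in counts.items() if c == 1)
    duplicated = sorted(s for s, c in counts.items() if c > 1)
    return unique, duplicated


if __name__ == "__main__":
    toy_genome = "ACGTTGCAACGTTGCATTACGATGCA"  # made-up sequence
    toy_motif = "ACGTTGCA"                     # made-up target site
    hits = find_like_sites(toy_genome, toy_motif, max_mismatches=1)
    print(hits)
    print(split_unique_and_duplicated(hits))
```

A genome-scale scan would also need to search the reverse strand and use an indexed or vectorized search rather than this quadratic loop; the sketch only shows the classification logic.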

229

Genome Sequence of Pseudomonas brassicacearum DF41  

PubMed Central

Pseudomonas brassicacearum DF41, a Gram-negative soil bacterium, is able to suppress the fungal pathogen Sclerotinia sclerotiorum through a process known as biological control. Here, we present a 6.8-Mb assembly of its genome, which is the second fully assembled genome of a P. brassicacearum strain.

Loewen, Peter C.; Switala, Jack; Fernando, W. G. Dilantha

2014-01-01

230

Assembly of large genomes using second-generation sequencing  

PubMed Central

Second-generation sequencing technology can now be used to sequence an entire human genome in a matter of days and at low cost. Sequence read lengths, initially very short, have rapidly increased since the technology first appeared, and we now are seeing a growing number of efforts to sequence large genomes de novo from these short reads. In this Perspective, we describe the issues associated with short-read assembly, the different types of data produced by second-gen sequencers, and the latest assembly algorithms designed for these data. We also review the genomes that have been assembled recently from short reads and make recommendations for sequencing strategies that will yield a high-quality assembly.

Schatz, Michael C.; Delcher, Arthur L.; Salzberg, Steven L.

2010-01-01

231

Genome sequencing and analysis of the model grass Brachypodium distachyon  

SciTech Connect

Three subfamilies of grasses, the Ehrhartoideae, Panicoideae and Pooideae, provide the bulk of human nutrition and are poised to become major sources of renewable energy. Here we describe the genome sequence of the wild grass Brachypodium distachyon (Brachypodium), which is, to our knowledge, the first member of the Pooideae subfamily to be sequenced. Comparison of the Brachypodium, rice and sorghum genomes shows a precise history of genome evolution across a broad diversity of the grasses, and establishes a template for analysis of the large genomes of economically important pooid grasses such as wheat. The high-quality genome sequence, coupled with ease of cultivation and transformation, small size and rapid life cycle, will help Brachypodium reach its potential as an important model system for developing new energy and food crops.

Yang, Xiaohan [ORNL; Kalluri, Udaya C [ORNL; Tuskan, Gerald A [ORNL

2010-01-01

232

Genome sequencing and analysis of the model grass Brachypodium distachyon.  

PubMed

Three subfamilies of grasses, the Ehrhartoideae, Panicoideae and Pooideae, provide the bulk of human nutrition and are poised to become major sources of renewable energy. Here we describe the genome sequence of the wild grass Brachypodium distachyon (Brachypodium), which is, to our knowledge, the first member of the Pooideae subfamily to be sequenced. Comparison of the Brachypodium, rice and sorghum genomes shows a precise history of genome evolution across a broad diversity of the grasses, and establishes a template for analysis of the large genomes of economically important pooid grasses such as wheat. The high-quality genome sequence, coupled with ease of cultivation and transformation, small size and rapid life cycle, will help Brachypodium reach its potential as an important model system for developing new energy and food crops. PMID:20148030

2010-02-11

233

Complete genome sequence of Cellulomonas flavigena type strain (134T)  

SciTech Connect

Cellulomonas flavigena (Kellerman and McBeth 1912) Bergey et al. 1923 is the type species of the genus Cellulomonas of the actinobacterial family Cellulomonadaceae. Members of the genus Cellulomonas are of special interest for their ability to degrade cellulose and hemicellulose, particularly with regard to the use of biomass as an alternative energy source. Here we describe the features of this organism, together with the complete genome sequence, and annotation. This is the first complete genome sequence of a member of the genus Cellulomonas, and next to the human pathogen Tropheryma whipplei the second complete genome sequence within the actinobacterial family Cellulomonadaceae. The 4,123,179 bp long single replicon genome with its 3,735 protein-coding and 53 RNA genes is part of the Genomic Encyclopedia of Bacteria and Archaea project.

Abt, Birte [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany; Foster, Brian [U.S. Department of Energy, Joint Genome Institute; Lapidus, Alla L. [U.S. Department of Energy, Joint Genome Institute; Clum, Alicia [U.S. Department of Energy, Joint Genome Institute; Sun, Hui [U.S. Department of Energy, Joint Genome Institute; Pukall, Rudiger [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany; Lucas, Susan [U.S. Department of Energy, Joint Genome Institute; Glavina Del Rio, Tijana [U.S. Department of Energy, Joint Genome Institute; Nolan, Matt [U.S. Department of Energy, Joint Genome Institute; Tice, Hope [U.S. Department of Energy, Joint Genome Institute; Cheng, Jan-Fang [U.S. Department of Energy, Joint Genome Institute; Pitluck, Sam [U.S. Department of Energy, Joint Genome Institute; Liolios, Konstantinos [U.S. Department of Energy, Joint Genome Institute; Ivanova, N [U.S. Department of Energy, Joint Genome Institute; Mavromatis, K [U.S. Department of Energy, Joint Genome Institute; Ovchinnikova, Galina [U.S. Department of Energy, Joint Genome Institute; Pati, Amrita [U.S. Department of Energy, Joint Genome Institute; Goodwin, Lynne A. [Los Alamos National Laboratory (LANL); Chen, Amy [U.S. Department of Energy, Joint Genome Institute; Palaniappan, Krishna [U.S. Department of Energy, Joint Genome Institute; Land, Miriam L [ORNL; Hauser, Loren John [ORNL; Chang, Yun-Juan [ORNL; Jeffries, Cynthia [Oak Ridge National Laboratory (ORNL); Rohde, Manfred [HZI - Helmholtz Centre for Infection Research, Braunschweig, Germany; Goker, Markus [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany; Woyke, Tanja [U.S. Department of Energy, Joint Genome Institute; Bristow, James [U.S. Department of Energy, Joint Genome Institute; Eisen, Jonathan [U.S. Department of Energy, Joint Genome Institute; Markowitz, Victor [U.S. Department of Energy, Joint Genome Institute; Hugenholtz, Philip [U.S. Department of Energy, Joint Genome Institute; Kyrpides, Nikos C [U.S. Department of Energy, Joint Genome Institute; Klenk, Hans-Peter [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany

2010-01-01

234

Perspectives of integrative cancer genomics in next generation sequencing era.  

PubMed

The explosive development of genomics technologies including microarrays and next generation sequencing (NGS) has provided comprehensive maps of cancer genomes, including the expression of mRNAs and microRNAs, DNA copy numbers, sequence variations, and epigenetic changes. These genome-wide profiles of genetic aberrations could reveal candidates for diagnostic and/or prognostic biomarkers as well as mechanistic insights into tumor development and progression. Recent efforts to establish a large cancer genome compendium and integrative omics analyses, so-called "integromics", have extended our understanding of the cancer genome, showing its daunting complexity and heterogeneity. However, the challenges of structured integration, sharing, and interpretation of big omics data remain to be resolved. Here, we review several issues raised in cancer omics data analysis, including NGS, focusing particularly on study design and analysis strategies. This may help readers understand the current trends and strategies of rapidly evolving cancer genomics research. PMID:23105932

Kwon, So Mee; Cho, Hyunwoo; Choi, Ji Hye; Jee, Byul A; Jo, Yuna; Woo, Hyun Goo

2012-06-01

235

Genome Sequence of a Novel Iflavirus from mRNA Sequencing of the Butterfly Heliconius erato.  

PubMed

Here, we report the genome sequence of a novel iflavirus strain recovered from the neotropical butterfly Heliconius erato. The coding DNA sequence (CDS) of the iflavirus genome was 8,895 nucleotides in length, encoding a polyprotein that was 2,965 amino acids long. PMID:24831145

Smith, Gilbert; Macias-Muñoz, Aide; Briscoe, Adriana D

2014-01-01

236

Genome Sequence of a Novel Iflavirus from mRNA Sequencing of the Butterfly Heliconius erato  

PubMed Central

Here, we report the genome sequence of a novel iflavirus strain recovered from the neotropical butterfly Heliconius erato. The coding DNA sequence (CDS) of the iflavirus genome was 8,895 nucleotides in length, encoding a polyprotein that was 2,965 amino acids long.

Macias-Munoz, Aide; Briscoe, Adriana D.

2014-01-01

237

Complete Genome Sequences of Six Strains of the Genus Methylobacterium  

SciTech Connect

The complete and assembled genome sequences were determined for six strains of the alphaproteobacterial genus Methylobacterium, chosen for their key adaptations to different plant-associated niches and environmental constraints.

Marx, Christopher J [Harvard University; Bringel, Francoise O. [University of Strasbourg; Chistoserdova, Ludmila [University of Washington, Seattle; Moulin, Lionel [UMR, France; Ul Haque, Muhammad Farhan [University of Strasbourg; Fleischman, Darrell E. [Wright State University, Dayton, OH; Gruffaz, Christelle [CNRS, Strasbourg, France; Jourand, Philippe [UMR, France; Knief, Claudia [ETH Zurich, Switzerland; Lee, Ming-Chun [Harvard University; Muller, Emilie E. L. [CNRS, Strasbourg, France; Nadalig, Thierry [CNRS, Strasbourg, France; Peyraud, Remi [ETH Zurich, Switzerland; Roselli, Sandro [CNRS, Strasbourg, France; Russ, Lina [ETH Zurich, Switzerland; Goodwin, Lynne A. [Los Alamos National Laboratory (LANL); Ivanov, Pavel S. [University of Wyoming, Laramie; Ivanova, N [U.S. Department of Energy, Joint Genome Institute; Kyrpides, Nikos C [U.S. Department of Energy, Joint Genome Institute; Lajus, Aurelie [Genoscope/Centre National de la Recherche Scientifique-Unite Mixte de Recherche; Land, Miriam L [ORNL; Medigue, Claudine [Genoscope/Centre National de la Recherche Scientifique-Unite Mixte de Recherche; Mikhailova, Natalia [U.S. Department of Energy, Joint Genome Institute; Nolan, Matt [U.S. Department of Energy, Joint Genome Institute; Woyke, Tanja [U.S. Department of Energy, Joint Genome Institute; Stolyar, Sergey [University of Washington; Vorholt, Julia A. [ETH Zurich, Switzerland; Vuilleumier, Stephane [University of Strasbourg

2012-01-01

238

Draft Genome Sequence of Lactobacillus animalis 381-IL-28.  

PubMed

Lactobacillus animalis 381-IL-28 is an integral component of a multistrain commercial culture with food biopreservative and pathogen biocontrol functionality. A draft sequence of the L. animalis 381-IL-28 genome is described in this paper. PMID:24874675

Sturino, Joseph M; Rajendran, Mahitha; Altermann, Eric

2014-01-01

239

Science Originals: Sequencing Cancer Genomes: Targeted Cancer Therapies  

NSDL National Science Digital Library

Applying DNA sequencing to cancer genomes is providing insights that have allowed researchers to turn some cancers into chronic diseases rather than deadly ones. Still, the ultimate goal is to kill the cancer.

Robert Frederick (AAAS;)

2011-03-25

240

Draft Genome Sequence of Staphylococcus massiliensis Strain 5402776T  

PubMed Central

A draft genome sequence of Staphylococcus massiliensis, Gram-positive cocci isolated from a human brain abscess sample, is described here. One clustered regularly interspaced short palindromic repeat, three transposases, six putative transposases, and one potential provirus were characterized.

Robert, Catherine; Gimenez, Gregory; Raoult, Didier

2012-01-01

241

Genome Sequence of the Immunomodulatory Strain Bifidobacterium bifidum LMG 13195  

PubMed Central

In this work, we report the genome sequence of Bifidobacterium bifidum strain LMG 13195. Results from our research group show that this strain is able to interact with human immune cells, generating functional regulatory T cells.

Gueimonde, Miguel; Ventura, Marco; Margolles, Abelardo

2012-01-01

242

Complete genome sequences of six strains of the genus Methylobacterium.  

PubMed

The complete and assembled genome sequences were determined for six strains of the alphaproteobacterial genus Methylobacterium, chosen for their key adaptations to different plant-associated niches and environmental constraints. PMID:22887658

Marx, Christopher J; Bringel, Françoise; Chistoserdova, Ludmila; Moulin, Lionel; Farhan Ul Haque, Muhammad; Fleischman, Darrell E; Gruffaz, Christelle; Jourand, Philippe; Knief, Claudia; Lee, Ming-Chun; Muller, Emilie E L; Nadalig, Thierry; Peyraud, Rémi; Roselli, Sandro; Russ, Lina; Goodwin, Lynne A; Ivanova, Natalia; Kyrpides, Nikos; Lajus, Aurélie; Land, Miriam L; Médigue, Claudine; Mikhailova, Natalia; Nolan, Matt; Woyke, Tanja; Stolyar, Sergey; Vorholt, Julia A; Vuilleumier, Stéphane

2012-09-01

243

Significant Sequences: Genomics Activities for Advanced Biology Students  

NSDL National Science Digital Library

Significant Sequences, developed by Washington University's Science Outreach Program and written by faculty and high school teachers, is a publication that focuses on the importance of genomic data and how the data are discovered and used.

Kathryn Gail Miller (Washington University;)

2010-06-17

244

Fulfilling the Promise of a Sequenced Human Genome – Part II  

SciTech Connect

Eric Green, scientific director of the National Human Genome Research Institute (NHGRI), gives the opening keynote speech at the "Sequencing, Finishing, Analysis in the Future" meeting in Santa Fe, NM on May 27, 2009. Part 2 of 2

Green, Eric [National Human Genome Research Institute

2009-05-27

245

Fulfilling the Promise of a Sequenced Human Genome – Part I  

SciTech Connect

Eric Green, scientific director of the National Human Genome Research Institute (NHGRI), gives the opening keynote speech at the "Sequencing, Finishing, Analysis in the Future" meeting in Santa Fe, NM on May 27, 2009. Part 1 of 2

Green, Eric [National Human Genome Research Institute

2009-05-27

246

Draft Genome Sequence of Lactobacillus animalis 381-IL-28  

PubMed Central

Lactobacillus animalis 381-IL-28 is an integral component of a multistrain commercial culture with food biopreservative and pathogen biocontrol functionality. A draft sequence of the L. animalis 381-IL-28 genome is described in this paper.

Rajendran, Mahitha; Altermann, Eric

2014-01-01

247

Complete Genome Sequence of Porcine Encephalomyocarditis Virus Strain BD2  

PubMed Central

Encephalomyocarditis virus (EMCV) causes acute myocarditis in young pigs or reproductive failure in sows, and it is divided into two main groups. Here, we report the complete genome sequence of EMCV strain BD2, which belongs to group I.

Yuan, Wanzhe; Zhang, Xiuyuan

2013-01-01

248

Cancer Genome Sequencing and Its Implications for Personalized Cancer Vaccines  

PubMed Central

New DNA sequencing platforms have revolutionized human genome sequencing. The dramatic advances in genome sequencing technologies predict that the $1,000 genome will become a reality within the next few years. Applied to cancer, the availability of cancer genome sequences permits real-time decision-making with the potential to affect diagnosis, prognosis, and treatment, and has opened the door towards personalized medicine. A promising strategy is the identification of mutated tumor antigens, and the design of personalized cancer vaccines. Supporting this notion are preliminary analyses of the epitope landscape in breast cancer suggesting that individual tumors express significant numbers of novel antigens to the immune system that can be specifically targeted through cancer vaccines.

Li, Lijin; Goedegebuure, Peter; Mardis, Elaine R.; Ellis, Matthew J.C.; Zhang, Xiuli; Herndon, John M.; Fleming, Timothy P.; Carreno, Beatriz M.; Hansen, Ted H.; Gillanders, William E.

2011-01-01

249

Genome Sequence of Bacillus thuringiensis subsp. kurstaki Strain HD-1  

PubMed Central

We report here the complete genome sequence of Bacillus thuringiensis subsp. kurstaki strain HD-1, which serves as the primary U.S. reference standard for all commercial insecticidal formulations of B. thuringiensis manufactured around the world.

Day, Michael; Ibrahim, Mohamed; Dyer, David

2014-01-01

250

Complete Genome Sequence of Pseudomonas denitrificans ATCC 13867.  

PubMed

Pseudomonas denitrificans ATCC 13867, a Gram-negative facultative anaerobic bacterium, is known to produce vitamin B12 under aerobic conditions. This paper reports the annotated whole-genome sequence of the circular chromosome of this organism. PMID:23723394

Ainala, Satish Kumar; Somasundar, Ashok; Park, Sunghoon

2013-01-01

251

Complete Genome Sequence of Methanomassiliicoccus luminyensis, the Largest Genome of a Human-Associated Archaea Species  

PubMed Central

The present study describes the complete and annotated genome sequence of Methanomassiliicoccus luminyensis strain B10 (DSM 24529T, CSUR P135), which was isolated from human feces. The 2.6-Mb genome represents the largest genome of a methanogenic euryarchaeon isolated from humans. The genome data of M. luminyensis reveal unique features and horizontal gene transfer events, which might have occurred during its adaptation and/or evolution in the human ecosystem.

Gorlas, Aurore; Robert, Catherine; Gimenez, Gregory; Drancourt, Michel

2012-01-01

252

Genome Sequence of Fusarium graminearum Isolate CS3005.  

PubMed

Fusarium graminearum is one of the most important fungal pathogens of wheat, barley, and maize worldwide. This announcement reports the genome sequence of a highly virulent Australian isolate of this species to supplement the existing genome of the North American F. graminearum isolate Ph1. PMID:24744326

Gardiner, Donald M; Stiller, Jiri; Kazan, Kemal

2014-01-01

253

Complete Genome Sequence of Marinobacter sp. BSs20148.  

PubMed

Marinobacter sp. BSs20148 was isolated from marine sediment collected from the Arctic Ocean at a water depth of 3,800 m. Here we report the complete genome sequence of Marinobacter sp. BSs20148. This genomic information will facilitate the study of the physiological metabolism, ecological roles, and evolution of the Marinobacter species. PMID:23682144

Song, Lai; Ren, Lufeng; Li, Xingang; Yu, Dan; Yu, Yong; Wang, Xumin; Liu, Guiming

2013-01-01

254

Genome Sequence of a Salinibacterium sp. Isolated from Antarctic Soil  

PubMed Central

The draft genome of Salinibacterium sp. PAMC 21357, isolated from permafrost soil of Antarctica, was determined. Here we present a 3.1-Mb draft genome sequence of Salinibacterium sp. that could provide further insight into the genetic determination of its cold-adaptive properties.

Shin, Seung Chul; Kim, Su Jin; Ahn, Do Hwan; Lee, Jong Kyu; Lee, Hyoungseok; Lee, Jungeun; Hong, Soon Gyu; Lee, Yung Mi

2012-01-01

255

Draft Genome Sequence of Mycobacterium mageritense DSM 44476T  

PubMed Central

We report the draft genome sequence of Mycobacterium mageritense strain DSM 44476T (CIP 104973), a nontuberculosis species responsible for various infections. The genome described here is composed of 7,966,608 bp, with a G+C content of 66.95%, and contains 7,675 protein-coding genes and 120 predicted RNA genes.

Croce, Olivier; Robert, Catherine; Raoult, Didier

2014-01-01
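
Several records in this listing report a G+C content alongside the genome size. As a minimal sketch of how such a percentage is typically computed from a sequence string (the function name and the decision to ignore ambiguous bases such as N are assumptions for illustration, not details taken from any of the papers):

```python
def gc_content(seq):
    """Return G+C content as a percentage of unambiguous (A, C, G, T) bases."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    acgt = gc + seq.count("A") + seq.count("T")
    if acgt == 0:
        raise ValueError("sequence contains no A, C, G or T bases")
    return 100.0 * gc / acgt

# Toy sequence only; real use would read contigs from a FASTA file.
example = gc_content("GGCCAT")
```

For a draft assembly split into contigs, the same calculation would be applied to the concatenated contig sequences so that longer contigs carry proportionally more weight.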

256

Complete Genome Sequence of Lactococcus lactis subsp. cremoris A76  

PubMed Central

We report the complete genome sequence of Lactococcus lactis subsp. cremoris A76, a dairy strain isolated from a cheese production outfit. Genome analysis detected two contiguous islands fitting to the L. lactis subsp. lactis rather than to the L. lactis subsp. cremoris lineage. This indicates the existence of genetic exchange between the diverse subspecies, presumably related to the technological process.

Quinquis, Benoit; Ehrlich, Stanislas Dusko; Sorokin, Alexei

2012-01-01

257

The genome sequence of the rice blast fungus Magnaporthe grisea  

Microsoft Academic Search

Magnaporthe grisea is the most destructive pathogen of rice worldwide and the principal model organism for elucidating the molecular basis of fungal disease of plants. Here, we report the draft sequence of the M. grisea genome. Analysis of the gene set provides an insight into the adaptations required by a fungus to cause disease. The genome encodes a large and

Ralph A. Dean; Nicholas J. Talbot; Daniel J. Ebbole; Mark L. Farman; Thomas K. Mitchell; Marc J. Orbach; Michael Thon; Resham Kulkarni; Jin-Rong Xu; Huaqin Pan; Nick D. Read; Yong-Hwan Lee; Ignazio Carbone; Doug Brown; Yeon Yee Oh; Nicole Donofrio; Jun Seop Jeong; Darren M. Soanes; Slavica Djonovic; Elena Kolomiets; Cathryn Rehmeyer; Weixi Li; Michael Harding; Soonok Kim; Marc-Henri Lebrun; Heidi Bohnert; Sean Coughlan; Jonathan Butler; Sarah Calvo; Li-Jun Ma; Robert Nicol; Seth Purcell; Chad Nusbaum; James E. Galagan; Bruce W. Birren

2005-01-01

258

Draft Genome Sequence of Avibacterium paragallinarum Strain 221  

PubMed Central

Avibacterium paragallinarum is the causative agent of infectious coryza. Here we report the draft genome sequence of reference strain 221 of A. paragallinarum serovar A. The genome is composed of 135 contigs for 2,685,568 bp with a 41% G+C content.

Xu, Fuzhou; Miao, Deyuan; Du, Yu; Chen, Xiaoling; Zhang, Peijun

2013-01-01

259

Draft Genome Sequence of Enterobacter cloacae Strain JD6301.  

PubMed

Enterobacter cloacae strain JD6301 was isolated from a mixed culture with wastewater collected from a municipal treatment facility and oleaginous microorganisms. A draft genome sequence of this organism indicates that it has a genome size of 4,772,910 bp, an average G+C content of 53%, and 4,509 protein-coding genes. PMID:24874669

Wilson, Jessica G; French, William T; Lipzen, Anna; Martin, Joel; Schackwitz, Wendy; Woyke, Tanja; Shapiro, Nicole; Bullard, James W; Champlin, Franklin R; Donaldson, Janet R

2014-01-01

260

Complete Genome Sequence of the Soil Actinomycete Kocuria rhizophila  

Microsoft Academic Search

The soil actinomycete Kocuria rhizophila belongs to the suborder Micrococcineae, a divergent bacterial group for which only a limited amount of genomic information is currently available. K. rhizophila is also important in industrial applications; e.g., it is commonly used as a standard quality control strain for antimicrobial susceptibility testing. Sequencing and annotation of the genome of K. rhizophila DC2201 (NBRC

Hiromi Takarada; Mitsuo Sekine; Hiroki Kosugi; Yasunori Matsuo; Takatomo Fujisawa; Seiha Omata; Emi Kishi; Ai Shimizu; Naofumi Tsukatani; Satoshi Tanikawa; Nobuyuki Fujita; Shigeaki Harayama

2008-01-01

261

Complete Genome Sequence of Pediococcus pentosaceus Strain SL4.  

PubMed

Pediococcus pentosaceus SL4 was isolated from a Korean fermented vegetable product, kimchi. We report here the whole-genome sequence (WGS) of P. pentosaceus SL4. The genome consists of a 1.79-Mb circular chromosome (G+C content of 37.3%) and seven distinct plasmids ranging in size from 4 kb to 50 kb. PMID:24371205

Dantoft, Shruti Harnal; Bielak, Eliza Maria; Seo, Jae-Gu; Chung, Myung-Jun; Jensen, Peter Ruhdal

2013-01-01

262

Draft Genome Sequences of Two Clinical Isolates of Streptococcus mutans.  

PubMed

We report the draft genome sequences of PKUSS-HG01 and PKUSS-LG01, two clinical isolates of Streptococcus mutans from human dental plaque. The genomics information will facilitate the study of the mechanisms of pathogenicity and evolution of S. mutans. PMID:24926045

Zheng, Hui; Guo, Lihong; Du, Ning; Lin, Jiuxiang; Song, Lai; Liu, Guiming; Chen, Feng

2014-01-01

263

Second Generation Sequencing of the Mesothelioma Tumor Genome  

Microsoft Academic Search

The current paradigm for elucidating the molecular etiology of cancers relies on the interrogation of small numbers of genes, which limits the scope of investigation. Emerging second-generation massively parallel DNA sequencing technologies have enabled more precise definition of the cancer genome on a global scale. We examined the genome of a human primary malignant pleural mesothelioma (MPM) tumor and matched

Raphael Bueno; Assunta de Rienzo; Lingsheng Dong; Gavin J. Gordon; Colin F. Hercus; William G. Richards; Roderick V. Jensen; Arif Anwar; Gautam Maulik; Lucian R. Chirieac; Kim-Fong Ho; Bruce E. Taillon; Cynthia L. Turcotte; Robert G. Hercus; Steven R. Gullans; David J. Sugarbaker; Anita Brandstaetter

2010-01-01

264

Complete genome sequence of Riemerella anatipestifer reference strain.  

PubMed

Riemerella anatipestifer is an infectious pathogen causing serositis in ducks. We had the genome of the R. anatipestifer reference strain ATCC 11845 sequenced. The completed draft genome consists of one circular chromosome with 2,164,087 bp. There are 2,101 genes in the draft, and its GC content is 35.01%. PMID:22628503

Wang, Xiaojia; Zhu, DeKang; Wang, MingShu; Cheng, AnChun; Jia, RenYong; Zhou, Yi; Chen, Zhengli; Luo, QiHui; Liu, Fei; Wang, Yin; Chen, Xiao Yue

2012-06-01

265

Complete Genome Sequence of Riemerella anatipestifer Reference Strain  

PubMed Central

Riemerella anatipestifer is an infectious pathogen causing serositis in ducks. We had the genome of the R. anatipestifer reference strain ATCC 11845 sequenced. The completed draft genome consists of one circular chromosome with 2,164,087 bp. There are 2,101 genes in the draft, and its GC content is 35.01%.

Wang, Xiaojia; Zhu, DeKang; Wang, MingShu; Jia, RenYong; Zhou, Yi; Chen, Zhengli; Luo, QiHui; Liu, Fei; Wang, Yin; Chen, Xiao Yue

2012-01-01

266

Complete Genome Sequence of Antarctic Bacterium Psychrobacter sp. Strain G.  

PubMed

Here, we report the complete genome sequence of Psychrobacter sp. strain G, isolated from King George Island, Antarctica, which can produce lipolytic enzymes at low temperatures. The genomics information of this strain will facilitate the study of the physiology, cold adaptation properties, and evolution of this genus. PMID:24051316

Che, Shuai; Song, Lai; Song, Weizhi; Yang, Meng; Liu, Guiming; Lin, Xuezheng

2013-01-01

267

Draft Genome Sequence of Enterobacter cloacae Strain JD6301  

PubMed Central

Enterobacter cloacae strain JD6301 was isolated from a mixed culture with wastewater collected from a municipal treatment facility and oleaginous microorganisms. A draft genome sequence of this organism indicates that it has a genome size of 4,772,910 bp, an average G+C content of 53%, and 4,509 protein-coding genes.

Wilson, Jessica G.; French, William T.; Lipzen, Anna; Martin, Joel; Schackwitz, Wendy; Woyke, Tanja; Shapiro, Nicole; Bullard, James W.; Champlin, Franklin R.

2014-01-01

268

Genome Sequences of Five B1 Subcluster Mycobacteriophages  

PubMed Central

Mycobacteriophages infect members of the Mycobacterium genus in the phylum Actinobacteria and exhibit remarkable diversity. Genome analysis groups the thousands of known mycobacteriophages into clusters, of which the B1 subcluster is currently the third most populous. We report the complete genome sequences of five additional members of the B1 subcluster.

Barrus, E. Zane; Benedict, Alex B.; Brighton, Alicia K.; Fisher, Joshua N. B.; Gardner, Adam V.; Kartchner, Brittany J.; Ladle, Kara C.; Lunt, Bryce L.; Merrill, Bryan D.; Morrell, John D.; Burnett, Sandra H.

2013-01-01

269

Complete genome sequence of pronghorn virus, a pestivirus.  

PubMed

The complete genome sequence of pronghorn virus, a member of the Pestivirus genus of the family Flaviviridae, was determined here. The virus, originally isolated from a pronghorn antelope, has a genome of 12,273 nucleotides, with a single open reading frame of 11,694 bases encoding 3,897 amino acids. PMID:24926058

Neill, John D; Ridpath, Julia F; Fischer, Nicole; Grundhoff, Adam; Postel, Alexander; Becher, Paul

2014-01-01
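
The relationship between the open reading frame length and the protein length quoted in the pronghorn virus record can be checked with simple arithmetic. The sketch below assumes the stated ORF length includes the terminal stop codon; that assumption, and the function name, are illustrative and not stated in the abstract.

```python
def amino_acids_from_orf(orf_length_nt, includes_stop=True):
    """Number of encoded amino acids for an ORF of the given nucleotide length."""
    if orf_length_nt % 3 != 0:
        raise ValueError("ORF length must be a multiple of 3")
    codons = orf_length_nt // 3
    # The stop codon occupies one codon but encodes no amino acid.
    return codons - 1 if includes_stop else codons

pronghorn = amino_acids_from_orf(11694)  # 3,898 codons, one of them the stop
```

With these inputs the result is 3,897, matching the record. The iflavirus records elsewhere in this listing (8,895 nucleotides, 2,965 amino acids) would match the same function if the stated length excludes the stop codon, that is, with `includes_stop=False`.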

270

Complete genome sequence of Lactococcus lactis subsp. cremoris A76.  

PubMed

We report the complete genome sequence of Lactococcus lactis subsp. cremoris A76, a dairy strain isolated from a cheese production outfit. Genome analysis detected two contiguous islands fitting to the L. lactis subsp. lactis rather than to the L. lactis subsp. cremoris lineage. This indicates the existence of genetic exchange between the diverse subspecies, presumably related to the technological process. PMID:22328746

Bolotin, Alexander; Quinquis, Benoit; Ehrlich, Stanislas Dusko; Sorokin, Alexei

2012-03-01

271

Draft Genome Sequence of Mycobacterium vulneris DSM 45247T  

PubMed Central

We report the draft genome sequence of Mycobacterium vulneris DSM 45247T strain, an emerging, opportunistic pathogen of the Mycobacterium avium complex. The genome described here is composed of 6,981,439 bp (with a G+C content of 67.14%) and has 6,653 protein-coding genes and 84 predicted RNA genes.

Croce, Olivier; Robert, Catherine; Raoult, Didier

2014-01-01

272

Draft Genome Sequence of the Sexually Transmitted Pathogen Trichomonas vaginalis  

Microsoft Academic Search

We describe the genome sequence of the protist Trichomonas vaginalis, a sexually transmitted human pathogen. Repeats and transposable elements comprise about two-thirds of the ~160-megabase genome, reflecting a recent massive expansion of genetic material. This expansion, in conjunction with the shaping of metabolic pathways that likely transpired through lateral gene transfer from bacteria, and amplification of specific gene families implicated

J. M. Carlton; R. P. Hirt; J. C. Silva; A. L. Delcher; Michael Schatz; Qi Zhao; J. R. Wortman; S. L. Bidwell; U. C. M. Alsmark; Sébastien Besteiro; Thomas Sicheritz-Ponten; C. J. Noel; J. B. Dacks; P. G. Foster; Cedric Simillion; Y. Van de Peer; Diego Miranda-Saavedra; G. J. Barton; G. D. Westrop; S. Muller; Daniele Dessi; P. L. Fiori; Qinghu Ren; Ian Paulsen; Hanbang Zhang; F. D. Bastida-Corcuera; Augusto Simoes-Barbosa; M. T. Brown; R. D. Hayes; Mandira Mukherjee; C. Y. Okumura; Rachel Schneider; A. J. Smith; Stepanka Vanacova; Maria Villalvazo; B. J. Haas; Mihaela Pertea; Tamara V. Feldblyum; T. R. Utterback; Chung-Li Shu; Kazutoyo Osoegawa; P. J. de Jong; Ivan Hrdy; Lenka Horvathova; Zuzana Zubacova; Pavel Dolezal; Shehre-Banoo Malik; J. M. Logsdon; Katrin Henze; Arti Gupta; Ching C. Wang; R. L. Dunne; J. A. Upcroft; Peter Upcroft; Owen White; S. L. Salzberg; Petrus Tang; Cheng-Hsun Chiu; Ying-Shiung Lee; T. M. Embley; G. H. Coombs; J. C. Mottram; Jan Tachezy; C. M. Fraser-Liggett; P. J. Johnson

2007-01-01

273

Draft Genome Sequence of Mycobacterium triplex DSM 44626  

PubMed Central

We announce the draft genome sequence of Mycobacterium triplex strain DSM 44626, a nontuberculosis species responsible for opportunistic infections. The genome described here is composed of 6,382,840 bp, with a G+C content of 66.57%, and contains 5,988 protein-coding genes and 81 RNA genes.

Sassi, Mohamed; Croce, Olivier; Robert, Catherine; Raoult, Didier

2014-01-01

274

Sequence Analysis of the Genome of the Neodiprion sertifer Nucleopolyhedrovirus  

Microsoft Academic Search

The genome of the Neodiprion sertifer nucleopolyhedrovirus (NeseNPV), which infects the European pine sawfly, N. sertifer (Hymenoptera: Diprionidae), was sequenced and analyzed. The genome was 86,462 bp in size. The CG content of 34% was lower than that of the majority of baculoviruses. A total of 90 methionine-initiated open reading frames (ORFs) with more than 50 amino acids and

Alejandra Garcia-Maruniak; James E. Maruniak; Paolo M. A. Zanotto; Aissa E. Doumbouya; Jaw-Ching Liu; Thomas M. Merritt; Jennifer S. Lanoie

2004-01-01

275

Draft Genome Sequence of Entomopathogenic Serratia liquefaciens Strain FK01  

PubMed Central

In the present study, we determined the draft genome sequence of the entomopathogenic bacterium Serratia liquefaciens FK01, which is highly virulent to the silkworm. The draft genome is ~5.28 Mb in size, and the G+C content is 55.8%.

Taira, Erika; Mon, Hiroaki; Mori, Kazuki; Akasaka, Taiki; Tashiro, Kousuke; Yasunaga-Aoki, Chisa; Lee, Jae Man; Kusakabe, Takahiro

2014-01-01

276

Draft Genome Sequences of Two Clinical Isolates of Streptococcus mutans  

PubMed Central

We report the draft genome sequences of PKUSS-HG01 and PKUSS-LG01, two clinical isolates of Streptococcus mutans from human dental plaque. The genomics information will facilitate the study of the mechanisms of pathogenicity and evolution of S. mutans.

Zheng, Hui; Guo, Lihong; Du, Ning; Lin, Jiuxiang; Song, Lai

2014-01-01

277

Complete Genome Sequence of Pronghorn Virus, a Pestivirus  

PubMed Central

The complete genome sequence of pronghorn virus, a member of the Pestivirus genus of the family Flaviviridae, was determined here. The virus, originally isolated from a pronghorn antelope, has a genome of 12,273 nucleotides, with a single open reading frame of 11,694 bases encoding 3,897 amino acids.

Ridpath, Julia F.; Fischer, Nicole; Grundhoff, Adam; Postel, Alexander; Becher, Paul

2014-01-01

278

Complete Genome Sequence of Cronobacter sakazakii Strain CMCC 45402.  

PubMed

Cronobacter sakazakii is considered to be an important pathogen involved in life-threatening neonatal infections. Here, we report the annotated complete genome sequence of C. sakazakii strain CMCC 45402, obtained from a milk sample in China. The major findings from the genomic analysis provide a better understanding of the isolates from China. PMID:24435860

Zhao, Zhijing; Wang, Lei; Wang, Bin; Liang, Haoyu; Ye, Qiang; Zeng, Ming

2014-01-01

279

Draft Genome Sequence of Mycobacterium triplex DSM 44626.  

PubMed

We announce the draft genome sequence of Mycobacterium triplex strain DSM 44626, a nontuberculosis species responsible for opportunistic infections. The genome described here is composed of 6,382,840 bp, with a G+C content of 66.57%, and contains 5,988 protein-coding genes and 81 RNA genes. PMID:24874681

Sassi, Mohamed; Croce, Olivier; Robert, Catherine; Raoult, Didier; Drancourt, Michel

2014-01-01
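
Several records in this listing report a G+C content (for example 66.57% for Mycobacterium triplex DSM 44626). As a minimal sketch of how such a percentage is commonly computed from a sequence string, the function below counts G and C among unambiguous bases only. Ignoring ambiguous bases such as N is an assumption of this sketch, not something stated in the abstracts, and the input string is a made-up example.

```python
def gc_content(sequence: str) -> float:
    """Return the percentage of G and C among unambiguous A/C/G/T bases."""
    seq = sequence.upper()
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        raise ValueError("sequence contains no A, C, G or T bases")
    gc = seq.count("G") + seq.count("C")
    return 100.0 * gc / counted


# Hypothetical toy sequence; the two N bases are excluded from the denominator.
print(gc_content("ATGCGCGNNTTA"))
```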

280

Simple sequence repeats in different genome sequences of Shigella and comparison with high GC and AT-rich genomes.  

PubMed

Simple sequence repeats (SSRs) are omnipresent in prokaryotes and eukaryotes and are found throughout the genome, in both protein-coding and noncoding regions. In the present study, the whole genome sequences of seven chromosomes (Shigella flexneri 2a str301 and 2457T, Shigella sonnei, Escherichia coli K12, Mycobacterium tuberculosis, Mycobacterium leprae and Staphylococcus saprophyticus) were downloaded from the GenBank database to identify the abundance, distribution and composition of SSRs, and to determine the difference between the tandem repeats in the real genomes and in randomized genomes (generated with a sequence shuffling tool) of the organisms included in this study. The data obtained in the present study show that: (i) tandem repeats are widely distributed throughout the genomes; (ii) SSRs are differentially distributed among coding and noncoding regions in the investigated Shigella genomes; (iii) the total frequency of SSRs is higher in noncoding regions than in coding regions; (iv) in all investigated chromosomes, the ratio of trinucleotide SSRs is much higher in real genomes than in randomized genomes, while that of dinucleotide SSRs is lower; (v) the ratio of total and mononucleotide SSRs is higher in the real genome than in the randomized genome in E. coli K12, S. flexneri str 301 and S. saprophyticus, lower in S. flexneri str 2457T, S. sonnei and M. tuberculosis, and approximately the same in M. leprae; (vi) the frequency of codon repetitions varies considerably depending on the type of encoded amino acid. PMID:18464038

Hosseini, Ashraf; Ranade, Suvidya H; Ghosh, Indira; Khandekar, Pramod

2008-06-01
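
The comparison described in the record above, counting tandem repeats in a real sequence and in a shuffled counterpart, can be sketched as follows. This is only an illustration under stated assumptions: it detects perfect repeats with a regular expression, the thresholds (motifs of 1 to 3 bases, at least 3 copies) are arbitrary choices, and the function names and the test sequence are hypothetical. It is not the tool used by the authors.

```python
import random
import re
from collections import Counter


def is_primitive(motif: str) -> bool:
    """True if the motif is not itself a repetition of a shorter unit."""
    k = len(motif)
    return not any(
        k % d == 0 and motif == motif[:d] * (k // d) for d in range(1, k)
    )


def count_ssrs(sequence: str, max_motif: int = 3, min_repeats: int = 3) -> Counter:
    """Count perfect tandem repeats, keyed by motif length (1=mono, 2=di, 3=tri)."""
    counts = Counter()
    seq = sequence.upper()
    for k in range(1, max_motif + 1):
        # A motif of k bases followed by at least (min_repeats - 1) more copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_repeats - 1))
        for match in pattern.finditer(seq):
            # Skip e.g. "AA" repeats, which are already counted as mononucleotide runs.
            if is_primitive(match.group(1)):
                counts[k] += 1
    return counts


def shuffled(sequence: str, seed: int = 0) -> str:
    """Return a base-composition-preserving random permutation of the sequence."""
    bases = list(sequence)
    random.Random(seed).shuffle(bases)
    return "".join(bases)


# Hypothetical toy sequence: one tri-, one di- and one mononucleotide repeat.
real = "ATGATGATGATGCACACACATTTTTGCTAGCTA"
print("real:    ", dict(count_ssrs(real)))
print("shuffled:", dict(count_ssrs(shuffled(real))))
```

Shuffling preserves base composition while destroying order, so any excess of repeats in the real sequence over the shuffled one is not explained by composition alone. A real analysis would average over many shuffles rather than a single one.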

281

Genome sequence of the biocontrol strain Pseudomonas fluorescens F113.  

PubMed

Pseudomonas fluorescens F113 is a plant growth-promoting rhizobacterium (PGPR) that has biocontrol activity against fungal plant pathogens and is a model for rhizosphere colonization. Here, we present its complete genome sequence, which shows that besides a core genome very similar to those of other strains sequenced within this species, F113 possesses a wide array of genes encoding specialized functions for thriving in the rhizosphere and interacting with eukaryotic organisms. PMID:22328765

Redondo-Nieto, Miguel; Barret, Matthieu; Morrisey, John P; Germaine, Kieran; Martínez-Granero, Francisco; Barahona, Emma; Navazo, Ana; Sánchez-Contreras, María; Moynihan, Jennifer A; Giddens, Stephen R; Coppoolse, Eric R; Muriel, Candela; Stiekema, Willem J; Rainey, Paul B; Dowling, David; O'Gara, Fergal; Martín, Marta; Rivilla, Rafael

2012-03-01

282

Complete genome sequence of Bacillus cereus bacteriophage PBC1.  

PubMed

Bacillus cereus is a ubiquitous, spore-forming bacterium associated with food poisoning cases. To develop an efficient biocontrol agent against B. cereus, we isolated lytic phage PBC1 and sequenced its genome. PBC1 showed a very low degree of homology to previously reported phages, implying that it is novel. Here we report the complete genome sequence of PBC1 and describe major findings from our analysis. PMID:22570248

Kong, Minsuk; Kim, Minsik; Ryu, Sangryeol

2012-06-01

283

Complete Genome Sequence of the Methanogenic Archaeon, Methanococcus jannaschii  

Microsoft Academic Search

The complete 1.66-megabase pair genome sequence of an autotrophic archaeon, Methanococcus jannaschii, and its 58- and 16-kilobase pair extrachromosomal elements have been determined by whole-genome random sequencing. A total of 1738 predicted protein-coding genes were identified; however, only a minority of these (38 percent) could be assigned a putative cellular role with high confidence. Although the majority of genes related

Carol J. Bult; Owen White; Gary J. Olsen; Lixin Zhou; Robert D. Fleischmann; Granger G. Sutton; Judith A. Blake; Lisa M. Fitzgerald; Rebecca A. Clayton; Jeannine D. Gocayne; Anthony R. Kerlavage; Brian A. Dougherty; Jean-Francois Tomb; Mark D. Adams; Claudia I. Reich; Ross Overbeek; Ewen F. Kirkness; Keith G. Weinstock; Joseph M. Merrick; Anna Glodek; John L. Scott; Neil S. M. Geoghagen; Janice F. Weidman; Joyce L. Fuhrmann; Dave Nguyen; Teresa R. Utterback; Jenny M. Kelley; Jeremy D. Peterson; Paul W. Sadow; Michael C. Hanna; Matthew D. Cotton; Kevin M. Roberts; Margaret A. Hurst; Brian P. Kaine; Mark Borodovsky; Hans-Peter Klenk; Claire M. Fraser; Hamilton O. Smith; Carl R. Woese; J. Craig Venter

1996-01-01

284

Complete Sequence and Genomic Analysis of Murine Gammaherpesvirus 68  

Microsoft Academic Search

Murine gammaherpesvirus 68 (gHV68) infects mice, thus providing a tractable small-animal model for analysis of the acute and chronic pathogenesis of gammaherpesviruses. To facilitate molecular analysis of gHV68 pathogenesis, we have sequenced the gHV68 genome. The genome contains 118,237 bp of unique sequence flanked by multiple copies of a 1,213-bp terminal repeat. The GC content of the unique portion of

HERBERT W. VIRGIN; PHILIP LATREILLE; PAMELA WAMSLEY; KYMBERLIE HALLSWORTH; KAREN E. WECK; ALBERT J. DAL CANTO; SAMUEL H. SPECK

1997-01-01

285

Genome Sequence of the Biocontrol Strain Pseudomonas fluorescens F113  

PubMed Central

Pseudomonas fluorescens F113 is a plant growth-promoting rhizobacterium (PGPR) that has biocontrol activity against fungal plant pathogens and is a model for rhizosphere colonization. Here, we present its complete genome sequence, which shows that besides a core genome very similar to those of other strains sequenced within this species, F113 possesses a wide array of genes encoding specialized functions for thriving in the rhizosphere and interacting with eukaryotic organisms.

Redondo-Nieto, Miguel; Barret, Matthieu; Morrisey, John P.; Germaine, Kieran; Martinez-Granero, Francisco; Barahona, Emma; Navazo, Ana; Sanchez-Contreras, Maria; Moynihan, Jennifer A.; Giddens, Stephen R.; Coppoolse, Eric R.; Muriel, Candela; Stiekema, Willem J.; Rainey, Paul B.; Dowling, David; O'Gara, Fergal; Martin, Marta

2012-01-01

286

Non-invasive whole genome sequencing of a human fetus  

PubMed Central

Analysis of cell-free fetal DNA in maternal plasma holds great promise for the development of non-invasive prenatal genetic diagnostics. However, previous studies have been restricted to detection of fetal trisomies (1, 2) or specific, paternally inherited mutations (3), or to genotyping common polymorphisms using invasively sampled material (4). Here, we combine genome sequencing of two parents, genome-wide maternal haplotyping (5), and deep sequencing of maternal plasma to non-invasively determine the genome sequence of a human fetus at 18.5 weeks gestation. Inheritance was predicted at 2.8×10^6 parentally heterozygous sites with 98.1% accuracy. Furthermore, 39 of 44 de novo point mutations in the fetal genome were detected, albeit with limited specificity. Subsampling these data and analyzing a second family trio by the same approach indicate that ~300 kilobase parental haplotype blocks combined with shallow sequencing of maternal plasma are sufficient to substantially determine the inherited complement of a fetal genome. However, ultra-deep sequencing of maternal plasma is necessary for the practical detection of fetal de novo mutations genome-wide. Although technical and analytical challenges remain, we anticipate that non-invasive analysis of inherited variation and de novo mutations in fetal genomes will facilitate the comprehensive prenatal diagnosis of both recessive and dominant Mendelian disorders.

Kitzman, Jacob O.; Snyder, Matthew W.; Ventura, Mario; Lewis, Alexandra P.; Qiu, Ruolan; Simmons, LaVone E.; Gammill, Hilary S.; Rubens, Craig E.; Santillan, Donna A.; Murray, Jeffrey C.; Tabor, Holly K.; Bamshad, Michael J.; Eichler, Evan E.; Shendure, Jay

2012-01-01

287

Use of Whole Genome Sequence Data To Infer Baculovirus Phylogeny  

PubMed Central

Several phylogenetic methods based on whole genome sequence data were evaluated using data from nine complete baculovirus genomes. The utility of three independent character sets was assessed. The first data set comprised the sequences of the 63 genes common to these viruses. The second set of characters was based on gene order, and phylogenies were inferred using both breakpoint distance analysis and a novel method developed here, termed neighbor pair analysis. The third set recorded gene content by scoring gene presence or absence in each genome. All three data sets yielded phylogenies supporting the separation of the Nucleopolyhedrovirus (NPV) and Granulovirus (GV) genera, the division of the NPVs into groups I and II, and species relationships within group I NPVs. Generation of phylogenies based on the combined sequences of all 63 shared genes proved to be the most effective approach to resolving the relationships among the group II NPVs and the GVs. The history of gene acquisitions and losses that have accompanied baculovirus diversification was visualized by mapping the gene content data onto the phylogenetic tree. This analysis highlighted the fluid nature of baculovirus genomes, with evidence of frequent genome rearrangements and multiple gene content changes during their evolution. Of more than 416 genes identified in the genomes analyzed, only 63 are present in all nine genomes, and 200 genes are found only in a single genome. Despite this fluidity, the whole genome-based methods we describe are sufficiently powerful to recover the underlying phylogeny of the viruses.

Herniou, Elisabeth A.; Luque, Teresa; Chen, Xinwen; Vlak, Just M.; Winstanley, Doreen; Cory, Jennifer S.; O'Reilly, David R.

2001-01-01
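
The gene content character set described in the record above (scoring each gene as present or absent in each genome, then asking which genes are shared by all genomes and which occur in only one) can be illustrated with plain set operations. The sketch below uses hypothetical genome and gene labels and is not the authors' implementation.

```python
from collections import Counter


def presence_absence(genomes: dict[str, set[str]]) -> tuple[list[str], dict[str, list[int]]]:
    """Return the sorted gene list and a 0/1 row per genome over that list."""
    all_genes = sorted(set().union(*genomes.values()))
    matrix = {
        name: [1 if gene in genes else 0 for gene in all_genes]
        for name, genes in genomes.items()
    }
    return all_genes, matrix


def core_and_unique(genomes: dict[str, set[str]]) -> tuple[set[str], set[str]]:
    """Return (genes present in every genome, genes present in exactly one)."""
    core = set.intersection(*genomes.values())
    occurrences = Counter(gene for genes in genomes.values() for gene in genes)
    unique = {gene for gene, n in occurrences.items() if n == 1}
    return core, unique


# Hypothetical toy data: three genomes and five genes.
toy = {
    "genome_A": {"gene1", "gene2", "gene3", "gene4"},
    "genome_B": {"gene1", "gene2", "gene3"},
    "genome_C": {"gene1", "gene2", "gene5"},
}
genes, matrix = presence_absence(toy)
core, unique = core_and_unique(toy)
print(genes)
print(matrix)
print(sorted(core), sorted(unique))
```

In the study summarized above, the analogous counts were 63 genes present in all nine genomes and 200 genes found in only a single genome.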

288

Sequencing viral genomes from a single isolated plaque  

PubMed Central

Background: Whole genome sequencing of viruses and bacteriophages is often hindered because of the need for large quantities of genomic material. A method is described that combines single plaque sequencing with an optimization of Sequence Independent Single Primer Amplification (SISPA). This method can be used for de novo whole genome next-generation sequencing of any cultivable virus without the need for large-scale production of viral stocks or viral purification using centrifugal techniques. Methods: A single viral plaque of a variant of the 2009 pandemic H1N1 human Influenza A virus was isolated and amplified using the optimized SISPA protocol. The sensitivity of the SISPA protocol presented here was tested with bacteriophage F_HA0480sp/Pa1651 DNA. The amplified products were sequenced with 454 and Illumina HiSeq platforms. Mapping and de novo assemblies were performed to analyze the quality of data produced from this optimized method. Results: Analysis of the sequence data demonstrated that from a single viral plaque of Influenza A, a mapping assembly with 3590-fold average coverage representing 100% of the genome could be produced. The de novo assembled data produced contigs with 30-fold average sequence coverage, representing 96.5% of the genome. Using only 10 pg of starting DNA from bacteriophage F_HA0480sp/Pa1651 in the SISPA protocol resulted in sequencing data that gave a mapping assembly with 3488-fold average sequence coverage, representing 99.9% of the reference and a de novo assembly with 45-fold average sequence coverage, representing 98.1% of the genome. Conclusions: The optimized SISPA protocol presented here produces amplified product that when sequenced will give high quality data that can be used for de novo assembly. The protocol requires only a single viral plaque or as little as 10 pg of DNA template, which will facilitate rapid identification of viruses during an outbreak and viruses that are difficult to propagate.

2013-01-01

289

Large-Scale Sequencing: The Future of Genomic Sciences Colloquium  

SciTech Connect

Genetic sequencing and the various molecular techniques it has enabled have revolutionized the field of microbiology. Examining and comparing the genetic sequences borne by microbes - including bacteria, archaea, viruses, and microbial eukaryotes - provides researchers insights into the processes microbes carry out, their pathogenic traits, and new ways to use microorganisms in medicine and manufacturing. Until recently, sequencing entire microbial genomes has been laborious and expensive, and the decision to sequence the genome of an organism was made on a case-by-case basis by individual researchers and funding agencies. Now, thanks to new technologies, the cost and effort of sequencing is within reach for even the smallest facilities, and the ability to sequence the genomes of a significant fraction of microbial life may be possible. The availability of numerous microbial genomes will enable unprecedented insights into microbial evolution, function, and physiology. However, the current ad hoc approach to gathering sequence data has resulted in an unbalanced and highly biased sampling of microbial diversity. A well-coordinated, large-scale effort to target the breadth and depth of microbial diversity would result in the greatest impact. The American Academy of Microbiology convened a colloquium to discuss the scientific benefits of engaging in a large-scale, taxonomically-based sequencing project. A group of individuals with expertise in microbiology, genomics, informatics, ecology, and evolution deliberated on the issues inherent in such an effort and generated a set of specific recommendations for how best to proceed. The vast majority of microbes are presently uncultured and, thus, pose significant challenges to such a taxonomically-based approach to sampling genome diversity. However, we have yet to even scratch the surface of the genomic diversity among cultured microbes. 
A coordinated sequencing effort of cultured organisms is an appropriate place to begin, since not only are their genomes available, but they are also accompanied by data on environment and physiology that can be used to understand the resulting data. As single cell isolation methods improve, there should be a shift toward incorporating uncultured organisms and communities into this effort. Efforts to sequence cultivated isolates should target characterized isolates from culture collections for which biochemical data are available, as well as other cultures of lasting value from personal collections. The genomes of type strains should be among the first targets for sequencing, but creative culture methods, novel cell isolation, and sorting methods would all be helpful in obtaining organisms we have not yet been able to cultivate for sequencing. The data that should be provided for strains targeted for sequencing will depend on the phylogenetic context of the organism and the amount of information available about its nearest relatives. Annotation is an important part of transforming genome sequences into useful resources, but it represents the most significant bottleneck to the field of comparative genomics right now and must be addressed. Furthermore, there is a need for more consistency in both annotation and achieving annotation data. As new annotation tools become available over time, re-annotation of genomes should be implemented, taking advantage of advancements in annotation techniques in order to capitalize on the genome sequences and increase both the societal and scientific benefit of genomics work. Given the proper resources, the knowledge and ability exist to be able to select model systems, some simple, some less so, and dissect them so that we may understand the processes and interactions at work in them. 
Colloquium participants suggest a five-pronged, coordinated initiative to exhaustively describe six different microbial ecosystems, designed to describe all the gene diversity, across genomes. In this effort, sequencing should be complemented by other experimental data, particularly transcriptomics and metabolomics data, all of which

Margaret Riley; Merry Buckley

2009-01-01

290

Genome sequence of the date palm Phoenix dactylifera L  

PubMed Central

Date palm (Phoenix dactylifera L.) is a cultivated woody plant species with agricultural and economic importance. Here we report a genome assembly for an elite variety (Khalas), which is 605.4 Mb in size and covers >90% of the genome (~671 Mb) and >96% of its genes (~41,660 genes). Genomic sequence analysis demonstrates that P. dactylifera experienced a clear genome-wide duplication after either ancient whole genome duplications or massive segmental duplications. Genetic diversity analysis indicates that its stress resistance and sugar metabolism-related genes tend to be enriched in the chromosomal regions where the density of single-nucleotide polymorphisms is relatively low. Using transcriptomic data, we also illustrate the date palm’s unique sugar metabolism that underlies fruit development and ripening. Our large-scale genomic and transcriptomic data pave the way for further genomic studies not only on P. dactylifera but also other Arecaceae plants.

Al-Mssallem, Ibrahim S.; Hu, Songnian; Zhang, Xiaowei; Lin, Qiang; Liu, Wanfei; Tan, Jun; Yu, Xiaoguang; Liu, Jiucheng; Pan, Linlin; Zhang, Tongwu; Yin, Yuxin; Xin, Chengqi; Wu, Hao; Zhang, Guangyu; Ba Abdullah, Mohammed M.; Huang, Dawei; Fang, Yongjun; Alnakhli, Yasser O.; Jia, Shangang; Yin, An; Alhuzimi, Eman M.; Alsaihati, Burair A.; Al-Owayyed, Saad A.; Zhao, Duojun; Zhang, Sun; Al-Otaibi, Noha A.; Sun, Gaoyuan; Majrashi, Majed A.; Li, Fusen; Tala; Wang, Jixiang; Yun, Quanzheng; Alnassar, Nafla A.; Wang, Lei; Yang, Meng; Al-Jelaify, Rasha F.; Liu, Kan; Gao, Shenghan; Chen, Kaifu; Alkhaldi, Samiyah R.; Liu, Guiming; Zhang, Meng; Guo, Haiyan; Yu, Jun

2013-01-01

291

Genome sequence of the date palm Phoenix dactylifera L.  

PubMed

Date palm (Phoenix dactylifera L.) is a cultivated woody plant species with agricultural and economic importance. Here we report a genome assembly for an elite variety (Khalas), which is 605.4 Mb in size and covers >90% of the genome (~671 Mb) and >96% of its genes (~41,660 genes). Genomic sequence analysis demonstrates that P. dactylifera experienced a clear genome-wide duplication after either ancient whole genome duplications or massive segmental duplications. Genetic diversity analysis indicates that its stress resistance and sugar metabolism-related genes tend to be enriched in the chromosomal regions where the density of single-nucleotide polymorphisms is relatively low. Using transcriptomic data, we also illustrate the date palm's unique sugar metabolism that underlies fruit development and ripening. Our large-scale genomic and transcriptomic data pave the way for further genomic studies not only on P. dactylifera but also other Arecaceae plants. PMID:23917264

Al-Mssallem, Ibrahim S; Hu, Songnian; Zhang, Xiaowei; Lin, Qiang; Liu, Wanfei; Tan, Jun; Yu, Xiaoguang; Liu, Jiucheng; Pan, Linlin; Zhang, Tongwu; Yin, Yuxin; Xin, Chengqi; Wu, Hao; Zhang, Guangyu; Ba Abdullah, Mohammed M; Huang, Dawei; Fang, Yongjun; Alnakhli, Yasser O; Jia, Shangang; Yin, An; Alhuzimi, Eman M; Alsaihati, Burair A; Al-Owayyed, Saad A; Zhao, Duojun; Zhang, Sun; Al-Otaibi, Noha A; Sun, Gaoyuan; Majrashi, Majed A; Li, Fusen; Tala; Wang, Jixiang; Yun, Quanzheng; Alnassar, Nafla A; Wang, Lei; Yang, Meng; Al-Jelaify, Rasha F; Liu, Kan; Gao, Shenghan; Chen, Kaifu; Alkhaldi, Samiyah R; Liu, Guiming; Zhang, Meng; Guo, Haiyan; Yu, Jun

2013-01-01

292

Complete chloroplast genome sequences of Solanum bulbocastanum, Solanum lycopersicum and comparative analyses with other Solanaceae genomes  

Microsoft Academic Search

Despite the agricultural importance of both potato and tomato, very little is known about their chloroplast genomes. Analysis of the complete sequences of tomato, potato, tobacco, and Atropa chloroplast genomes reveals significant insertions and deletions within certain coding regions or regulatory sequences (e.g., deletion of repeated sequences within 16S rRNA, ycf2 or ribosomal binding sites in ycf2). RNA, photosynthesis, and

Henry Daniell; Seung-Bum Lee; Justin Grevich; Christopher Saski; Tania Quesada-Vargas; Chittibabu Guda; Jeffrey Tomkins; Robert K. Jansen

2006-01-01

293

Decoding the genome beyond sequencing: the new phase of genomic research.  

PubMed

While our understanding of gene-based biology has greatly improved, it is clear that the function of the genome and most diseases cannot be fully explained by genes and other regulatory elements. Genes and the genome represent distinct levels of genetic organization with their own coding systems; genes code parts like protein and RNA, but the genome codes the structure of genetic networks, which are defined by the whole set of genes, chromosomes and their topological interactions within a cell. Accordingly, the genetic code of DNA offers limited understanding of genome functions. In this perspective, we introduce the genome theory, which calls for a departure from gene-centric genomic research. To make this transition for the next phase of genomic research, it is essential to acknowledge the importance of new genome-based biological concepts and to establish new technology platforms to decode the genome beyond sequencing. PMID:21640814

Heng, Henry H Q; Liu, Guo; Stevens, Joshua B; Bremer, Steven W; Ye, Karen J; Abdallah, Batoul Y; Horne, Steven D; Ye, Christine J

2011-10-01

294

Characterizing the walnut genome through analyses of BAC end sequences.  

PubMed

Persian walnut (Juglans regia L.) is an economically important tree for its nut crop and timber. To gain insight into the structure and evolution of the walnut genome, we constructed two bacterial artificial chromosome (BAC) libraries, containing a total of 129,024 clones, from in vitro-grown shoots of J. regia cv. Chandler using the HindIII and MboI cloning sites. A total of 48,218 high-quality BAC end sequences (BESs) were generated, with an accumulated sequence length of 31.2 Mb, representing approximately 5.1% of the walnut genome. Analysis of repeat DNA content in BESs revealed that approximately 15.42% of the genome consists of known repetitive DNA, while walnut-unique repetitive DNA identified in this study constitutes 13.5% of the genome. Among the walnut-unique repetitive DNA, Julia SINE and JrTRIM elements represent the first identified walnut short interspersed element (SINE) and terminal-repeat retrotransposon in miniature (TRIM) element, respectively; both types of elements are abundant in the genome. As in other species, these SINEs and TRIM elements could be exploited for developing repeat DNA-based molecular markers in walnut. Simple sequence repeats (SSR) from BESs were analyzed and found to be more abundant in BESs than in expressed sequence tags. The density of SSR in the walnut genome analyzed was also slightly higher than that in poplar and papaya. Sequence analysis of BESs indicated that approximately 11.5% of the walnut genome represents a coding sequence. This study is an initial characterization of the walnut genome and provides the largest genomic resource currently available; as such, it will be a valuable tool in studies aimed at genetically improving walnut. PMID:22101470

Wu, Jiajie; Gu, Yong Q; Hu, Yuqin; You, Frank M; Dandekar, Abhaya M; Leslie, Charles A; Aradhya, Mallikarjuna; Dvorak, Jan; Luo, Ming-Cheng

2012-01-01
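
Two back-of-envelope quantities follow from the numbers in the walnut record above: the mean length of a BAC end sequence and the genome size implied by the stated 5.1% representation. The sketch below treats the 31.2 Mb of accumulated sequence as exactly 5.1% of the genome, which is an approximation (the abstract says "approximately"); the function names are illustrative.

```python
def mean_read_length(total_bases: float, n_reads: int) -> float:
    """Average read length in bases."""
    return total_bases / n_reads


def implied_genome_size(sampled_bases: float, fraction_of_genome: float) -> float:
    """Genome size in bases implied by a sample covering the given fraction."""
    return sampled_bases / fraction_of_genome


total_bes_bases = 31.2e6   # accumulated BES length reported in the abstract
n_bes = 48_218             # number of high-quality BESs reported

# Roughly 647 bases per BES and a genome of roughly 612 Mb under this approximation.
print(round(mean_read_length(total_bes_bases, n_bes)))
print(round(implied_genome_size(total_bes_bases, 0.051) / 1e6))
```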

295

Draft Genome Sequence of Campylobacter ureolyticus Strain CIT007, the First Whole-Genome Sequence of a Clinical Isolate.  

PubMed

Herein, we present the draft genome sequence of Campylobacter ureolyticus. Strain CIT007 was isolated from a stool sample from an elderly female presenting with diarrheal illness and end-stage chronic renal disease. PMID:24723712

Lucid, Alan; Bullman, Susan; Koziel, Monika; Corcoran, Gerard D; Cotter, Paul D; Sleator, Roy D; Lucey, Brigid

2014-01-01

296

Mosaic Organization of Orthologous Sequences in Grass Genomes  

PubMed Central

Although comparative genetic mapping studies show extensive genome conservation among grasses, recent data provide many exceptions to gene collinearity at the DNA sequence level. Rice, sorghum, and maize are closely related grass species, once sharing a common ancestor. Because they diverged at different times during evolution, they provide an excellent model to investigate sequence divergence. We isolated, sequenced, and compared orthologous regions from two rice subspecies, sorghum, and maize to investigate the nature of their sequence differences. This study represents the most extensive sequence comparison among grasses, including the largest contiguous genomic sequences from sorghum (425 kb) and maize (435 kb) to date. Our results reveal a mosaic organization of the orthologous regions, with conserved sequences interspersed with nonconserved sequences. Gene amplification, gene movement, and retrotransposition account for the majority of the nonconserved sequences. Our analysis also shows that gene amplification is frequently linked with gene movement. Analyzing an additional 2.9 Mb of genomic sequence from rice not only corroborates our observations, but also suggests that a significant portion of grass genomes may consist of paralogous sequences derived from gene amplification. We propose that sequence divergence started from hotspots along chromosomes and expanded by accumulating small-scale genomic changes during evolution. [GenBank Accession Numbers: Rice (Oryza sativa L. ssp. japonica) php200725 region: AF119222; rice (Oryza sativa L. ssp. indica) php200725 region: AF128457; sorghum (Sorghum bicolor) php200725 region: AF114171, AF527807, AF727808, AF527809; maize (Zea mays) php200725 region: AF090447, AF528565; rice chromosome 10 region (2.9 Mb): AC073391, AC087549, AC027657, AC087547, AC027658, AC087546, AC027659, AC087550, AC025905, AC087545, AF229187, AC027660, AC087544, AC027661, AC027662, AC087543, AC073392, AC087542, AC025906, AC073393, AC025907.]

Song, Rentao; Llaca, Victor; Messing, Joachim

2002-01-01

297

Choosing a Benchtop Sequencing Machine to Characterise Helicobacter pylori Genomes  

PubMed Central

The fully annotated genome sequence of the European strain 26695 was first published in 1997 and, in 1999, it was directly compared to the USA isolate J99, promoting two standard laboratory isolates for Helicobacter pylori (H. pylori) research. With the genomic scaffolds available from these important genomes and the advent of benchtop high-throughput sequencing technology, a bacterial genome can now be sequenced within a few days. We sequenced and analysed strains J99 and 26695 using the benchtop-sequencing machines Ion Torrent PGM and the Illumina MiSeq Nextera and Nextera XT methodologies. Using publicly available algorithms, we analysed the raw data and interrogated both genomes by mapping the data and by de novo assembly. We compared the accuracy of the coding sequence assemblies to the originally published sequences. With the Ion Torrent PGM, we found an inherently high error rate in the raw sequence data. Using the Illumina MiSeq, we found significantly more non-covered nucleotides when using the less expensive Illumina Nextera XT compared with the Illumina Nextera library creation method. We found the most accurate de novo assemblies using the Nextera technology; however, extracting an accurate multi-locus sequence type was inconsistent compared to the Ion Torrent PGM. We found the cagPAI failed to assemble onto a single contig in all technologies but was more accurate using the Nextera. Our results indicate the Illumina MiSeq Nextera method is the most accurate for de novo whole genome sequencing of H. pylori.

Perkins, Timothy T.; Tay, Chin Yen; Thirriot, Fanny; Marshall, Barry

2013-01-01

298

Whole genome sequencing: a considered approach to clinical implementation.  

PubMed

The recent entry of "whole" exome/"whole" genome sequencing into limited clinical practice has led to a progression of the availability of genome-scale testing beyond deletion/duplication copy number arrays. This unit provides a considered approach to the implementation of such testing in routine clinical practice. Specifically, we will highlight the challenges in patient selection and consent, and the technical issues surrounding test interpretation and reporting. The unit will then provide practical solutions that allow for genome-wide sequencing to be implemented in current clinical practice. PMID:23595600

Dimmock, David

2013-01-01

299

Complete genome sequence of Ferroglobus placidus AEDII12DO  

SciTech Connect

Ferroglobus placidus belongs to the order Archaeoglobales within the archaeal phylum Euryarchaeota. Strain AEDII12DO is the type strain of the species and was isolated from a shallow marine hydrothermal system at Vulcano, Italy. It is a hyperthermophilic, anaerobic chemolithoautotroph, but it can also use a variety of aromatic compounds as electron donors. Here we describe the features of this organism together with the complete genome sequence and annotation. The 2,196,266 bp genome with its 2,567 protein-coding and 55 RNA genes was sequenced as part of a DOE Joint Genome Institute Laboratory Sequencing Program (LSP) project.

Anderson, Iain [U.S. Department of Energy, Joint Genome Institute; Risso, Carla [University of Massachusetts, Amherst; Holmes, Dawn [University of Massachusetts, Amherst; Lucas, Susan [U.S. Department of Energy, Joint Genome Institute; Copeland, A [U.S. Department of Energy, Joint Genome Institute; Lapidus, Alla L. [U.S. Department of Energy, Joint Genome Institute; Cheng, Jan-Fang [U.S. Department of Energy, Joint Genome Institute; Bruce, David [Los Alamos National Laboratory (LANL); Goodwin, Lynne A. [Los Alamos National Laboratory (LANL); Pitluck, Sam [U.S. Department of Energy, Joint Genome Institute; Saunders, Elizabeth H [Los Alamos National Laboratory (LANL); Brettin, Thomas S [ORNL; Detter, J. Chris [U.S. Department of Energy, Joint Genome Institute; Han, Cliff [Los Alamos National Laboratory (LANL); Tapia, Roxanne [Los Alamos National Laboratory (LANL); Larimer, Frank W [ORNL; Land, Miriam L [ORNL; Hauser, Loren John [ORNL; Woyke, Tanja [U.S. Department of Energy, Joint Genome Institute; Lovley, Derek [University of Massachusetts, Amherst; Kyrpides, Nikos C [U.S. Department of Energy, Joint Genome Institute; Ivanova, N [U.S. Department of Energy, Joint Genome Institute

2011-01-01

300

Complete genome sequence of Serratia plymuthica strain AS12.  

PubMed

A plant-associated member of the family Enterobacteriaceae, Serratia plymuthica strain AS12 was isolated from rapeseed roots. It is of scientific interest because it promotes plant growth and inhibits plant pathogens. The genome of S. plymuthica AS12 comprises a 5,443,009 bp long circular chromosome, which consists of 4,952 protein-coding genes, 87 tRNA genes and 7 rRNA operons. This genome was sequenced within the 2010 DOE-JGI Community Sequencing Program (CSP2010) as part of the project entitled "Genomics of four rapeseed plant growth promoting bacteria with antagonistic effect on plant pathogens". PMID:22768360

Neupane, Saraswoti; Finlay, Roger D; Alström, Sadhna; Goodwin, Lynne; Kyrpides, Nikos C; Lucas, Susan; Lapidus, Alla; Bruce, David; Pitluck, Sam; Peters, Lin; Ovchinnikova, Galina; Chertkov, Olga; Han, James; Han, Cliff; Tapia, Roxanne; Detter, John C; Land, Miriam; Hauser, Loren; Cheng, Jan-Fang; Ivanova, Natalia; Pagani, Ioanna; Klenk, Hans-Peter; Woyke, Tanja; Högberg, Nils

2012-05-25

301

Comparison of sample sequences of the Salmonella typhi genome to the sequence of the complete Escherichia coli K-12 genome.  

PubMed

Raw sequence data representing the majority of a bacterial genome can be obtained at a tiny fraction of the cost of a completed sequence. To demonstrate the utility of such a resource, 870 single-stranded M13 clones were sequenced from a shotgun library of the Salmonella typhi Ty2 genome. The sequence reads averaged over 400 bases and sampled the genome with an average spacing of once every 5,000 bases. A total of 339,243 bases of unique sequence was generated (approximately 7% representation). The sample of 870 sequences was compared to the complete Escherichia coli K-12 genome and to the rest of the GenBank database, which can also be considered a collection of sampled sequences. Despite the incomplete S. typhi data set, interesting categories could easily be discerned. Sixteen percent of the sequences determined from S. typhi had close homologs among known Salmonella sequences (P < 1e-40 in BlastX or BlastN), reflecting the proportion of these genomes that have been sequenced previously; 277 sequences (32%) had no apparent orthologs in the complete E. coli K-12 genome (P > 1e-20), of which 155 sequences (18%) had no close similarities to any sequence in the database (P > 1e-5). Eight of the 277 sequences had similarities to genes in other strains of E. coli or plasmids, and six sequences showed evidence of novel phage lysogens or sequence remnants of phage integrations, including a member of the lambda family (P < 1e-15). Twenty-three sample sequences had a significantly closer similarity to a sequence in the database from organisms other than the E. coli/Salmonella clade (which includes Shigella and Citrobacter). These sequences are new candidate lateral transfer events to the S. typhi lineage or deletions on the E. coli K-12 lineage. Eleven putative junctions of insertion/deletion events greater than 100 bp were observed in the sample, indicating that well over 150 such events may distinguish S. typhi from E. coli K-12. The need for automatic methods to more effectively exploit sample sequences is discussed. PMID:9712782

McClelland, M; Wilson, R K

1998-09-01
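
As a rough check on the extrapolation in the record above (McClelland and Wilson), the sketch below scales the 11 observed insertion/deletion junctions by the approximately 7% genome representation. It assumes events are spread evenly along the genome and are detected in proportion to the sequence sampled, which is a simplification and not necessarily the authors' own calculation; the function name is hypothetical.

```python
def extrapolate_events(observed: int, fraction_sampled: float) -> float:
    """Scale a count seen in a partial sample up to the whole genome.

    Assumes (for illustration only) that events are uniformly distributed.
    """
    if not 0 < fraction_sampled <= 1:
        raise ValueError("fraction_sampled must be in (0, 1]")
    return observed / fraction_sampled


# 11 junctions observed in a sample representing about 7% of the genome.
estimate = extrapolate_events(11, 0.07)
print(round(estimate))  # 157
```

Under these assumptions the estimate comes out at about 157 events, in line with the abstract's figure of more than 150.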

302

Sequence analysis and organization of the Neodiprion abietis nucleopolyhedrovirus genome.  

PubMed

Of 30 baculovirus genomes that have been sequenced to date, the only nonlepidopteran baculoviruses include the dipteran Culex nigripalpus nucleopolyhedrovirus and two hymenopteran nucleopolyhedroviruses that infect the sawflies Neodiprion lecontei (NeleNPV) and Neodiprion sertifer (NeseNPV). This study provides a complete sequence and genome analysis of the nucleopolyhedrovirus that infects the balsam fir sawfly Neodiprion abietis (Hymenoptera, Symphyta, Diprionidae). The N. abietis nucleopolyhedrovirus (NeabNPV) is 84,264 bp in size, with a G+C content of 33.5%, and contains 93 predicted open reading frames (ORFs). Eleven predicted ORFs are unique to this baculovirus, 10 ORFs have a putative sequence homologue in the NeleNPV genome but not the NeseNPV genome, and 1 ORF (neab53) has a putative sequence homologue in the NeseNPV genome but not the NeleNPV genome. Specific repeat sequences are coincident with major genome rearrangements that distinguish NeabNPV and NeleNPV. Genes associated with these repeat regions encode a common amino acid motif, suggesting that they are a family of repeated contiguous gene clusters. Lepidopteran baculoviruses, similarly, have a family of repeated genes called the bro gene family. However, there is no significant sequence similarity between the NeabNPV and bro genes. Homologues of early-expressed genes such as ie-1 and lef-3 were absent in NeabNPV, as they are in the previously sequenced hymenopteran baculoviruses. Analyses of ORF upstream sequences identified potential temporally distinct genes on the basis of putative promoter elements. PMID:16809301

Duffy, Simon P; Young, Aaron M; Morin, Benoit; Lucarotti, Christopher J; Koop, Ben F; Levin, David B

2006-07-01
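
The G+C content quoted in the record above (33.5% for NeabNPV) is the share of G and C bases among all bases. A minimal sketch of that calculation follows; the function name and the toy sequence are illustrative assumptions, and ambiguous bases such as N are simply ignored here.

```python
def gc_content(seq: str) -> float:
    """Return G+C content as a percentage of unambiguous (A, C, G, T) bases."""
    seq = seq.upper()
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return 100.0 * (seq.count("G") + seq.count("C")) / counted


# Toy sequence (not from the NeabNPV genome): 2 of 10 bases are G or C.
print(gc_content("ATGCATTTAA"))  # 20.0
```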

303

Genome sequence of the cultivated cotton Gossypium arboreum.  

PubMed

The complex allotetraploid nature of the cotton genome (AADD; 2n = 52) makes genetic, genomic and functional analyses extremely challenging. Here we sequenced and assembled the Gossypium arboreum (AA; 2n = 26) genome, a putative contributor of the A subgenome. A total of 193.6 Gb of clean sequence covering the genome by 112.6-fold was obtained by paired-end sequencing. We further anchored and oriented 90.4% of the assembly on 13 pseudochromosomes and found that 68.5% of the genome is occupied by repetitive DNA sequences. We predicted 41,330 protein-coding genes in G. arboreum. Two whole-genome duplications were shared by G. arboreum and Gossypium raimondii before speciation. Insertions of long terminal repeats in the past 5 million years are responsible for the twofold difference in the sizes of these genomes. Comparative transcriptome studies showed the key role of the nucleotide binding site (NBS)-encoding gene family in resistance to Verticillium dahliae and the involvement of ethylene in the development of cotton fiber cells. PMID:24836287

Li, Fuguang; Fan, Guangyi; Wang, Kunbo; Sun, Fengming; Yuan, Youlu; Song, Guoli; Li, Qin; Ma, Zhiying; Lu, Cairui; Zou, Changsong; Chen, Wenbin; Liang, Xinming; Shang, Haihong; Liu, Weiqing; Shi, Chengcheng; Xiao, Guanghui; Gou, Caiyun; Ye, Wuwei; Xu, Xun; Zhang, Xueyan; Wei, Hengling; Li, Zhifang; Zhang, Guiyin; Wang, Junyi; Liu, Kun; Kohel, Russell J; Percy, Richard G; Yu, John Z; Zhu, Yu-Xian; Wang, Jun; Yu, Shuxun

2014-06-01
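
The two sequencing figures in the record above (193.6 Gb of clean sequence at 112.6-fold coverage) imply a genome size, since fold coverage is total sequenced bases divided by genome size. The sketch below only rearranges that relationship; the function name is hypothetical and the result is an implied value, not a figure stated in the abstract.

```python
def implied_genome_size_gb(total_sequence_gb: float, fold_coverage: float) -> float:
    """Genome size implied by coverage = total sequence / genome size."""
    if fold_coverage <= 0:
        raise ValueError("fold_coverage must be positive")
    return total_sequence_gb / fold_coverage


size = implied_genome_size_gb(193.6, 112.6)
print(f"{size:.2f} Gb")  # 1.72 Gb
```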

304

Complete Genome Sequence of Equine Herpesvirus Type 9  

PubMed Central

Equine herpesvirus type 9 (EHV-9), which we isolated from a case of epizootic encephalitis in a herd of Thomson's gazelles (Gazella thomsoni) in 1993, has been known to cause fatal encephalitis in Thomson's gazelle, giraffe, and polar bear in natural infections. Our previous report indicated that EHV-9 was similar to the equine pathogen equine herpesvirus type 1 (EHV-1), which mainly causes abortion, respiratory infection, and equine herpesvirus myeloencephalopathy. We determined the genome sequence of EHV-9. The genome is 148,371 bp long and contains all 80 of the open reading frames (ORFs) found in the genome of EHV-1. The nucleotide sequences of the ORFs in EHV-9 were 86 to 95% identical to those in EHV-1. The whole genome sequence should help to reveal the neuropathogenicity of EHV-9.

Yamaguchi, Tsuyoshi; Yamada, Souichi

2012-01-01

305

Genome sequence and comparative genome analysis of Pseudomonas syringae pv. syringae type strain ATCC 19310.  

PubMed

Pseudomonas syringae pv. syringae (Psy) is a major bacterial pathogen of many economically important plant species. Despite the severity of its impact, the genome sequence of the type strain has not been reported. Here, we present the draft genome sequence of Psy ATCC 19310. Comparative genomic analysis revealed that Psy ATCC 19310 is closely related to Psy B728a. However, only a few type III effectors, which are key virulence factors, are shared by the two strains, indicating the possibility of host-pathogen specificity and genome dynamics, even under the pathovar level. PMID:24444998

Park, Yong-Soon; Jeong, Haeyoung; Sim, Young Mi; Yi, Hwe-Su; Ryu, Choong-Min

2014-04-01

306

Corruption of genomic databases with anomalous sequence.  

PubMed Central

We describe evidence that DNA sequences from vectors used for cloning and sequencing have been incorporated accidentally into eukaryotic entries in the GenBank database. These incorporations were not restricted to one type of vector or to a single mechanism. Many minor instances may have been the result of simple editing errors, but some entries contained large blocks of vector sequence that had been incorporated by contamination or other accidents during cloning. Some cases involved unusual rearrangements and areas of vector distant from the normal insertion sites. Matches to vector were found in 0.23% of 20,000 sequences analyzed in GenBank Release 63. Although the possibility of anomalous sequence incorporation has been recognized since the inception of GenBank and should be easy to avoid, recent evidence suggests that this problem is increasing more quickly than the database itself. The presence of anomalous sequence may have serious consequences for the interpretation and use of database entries, and will have an impact on issues of database management. The incorporated vector fragments described here may also be useful for a crude estimate of the fidelity of sequence information in the database. In alignments with well-defined ends, the matching sequences showed 96.8% identity to vector; when poorer matches with arbitrary limits were included, the aggregate identity to vector sequence was 94.8%.

Lamperti, E D; Kittelberger, J M; Smith, T F; Villa-Komaroff, L

1992-01-01

307

Analysis of the Thermotoga maritima genome combining a variety of sequence similarity and genome context tools.  

PubMed

The proliferation of genome sequence data has led to the development of a number of tools and strategies that facilitate computational analysis. These methods include the identification of motif patterns, membership of the query sequences in family databases, metabolic pathway involvement and gene proximity. We re-examined the completely sequenced genome of Thermotoga maritima by employing the combined use of the above methods. By analyzing all 1877 proteins encoded in this genome, we identified 193 cases of conflicting annotations (10%), of which 164 are new function predictions and 29 are amendments of previously proposed assignments. These results suggest that the combined use of existing computational tools can resolve inconclusive sequence similarities and significantly improve the prediction of protein function from genome sequence. PMID:11071948

Kyrpides, N C; Ouzounis, C A; Iliopoulos, I; Vonstein, V; Overbeek, R

2000-11-15

308

Computational comparison of two draft sequences of the human genome  

Microsoft Academic Search

We are in the enviable position of having two distinct drafts of the human genome sequence. Although gaps, errors, redundancy and incomplete annotation mean that individually each falls short of the ideal, many of these problems can be assessed by comparison. Here we present some comparative analyses of these drafts. We look at a number of features of the sequences,

John Aach; Martha L. Bulyk; George M. Church; Jason Comander; Adnan Derti; Jay Shendure

2001-01-01

309

Complete Genome Sequence of a Newly Emerging Newcastle Disease Virus  

PubMed Central

The complete genome sequence of a newly emerging Newcastle disease virus, isolated in China, was determined. A phylogenetic analysis based on the F gene revealed that the isolate is phylogenetically related to Newcastle disease virus genotype VIId. Sequence analysis indicated that amino acid residue substitutions occur at neutralizing epitopes on the hemagglutinin-neuraminidase (HN) protein.

Wang, Jing-Yu; Liu, Wan-Hua; Ren, Juan-Juan; Tang, Pan; Wu, Ning

2013-01-01

310

Environmental Genome Shotgun Sequencing of the Sargasso Sea  

Microsoft Academic Search

We have applied "whole-genome shotgun sequencing" to microbial populations collected en masse on tangential flow and impact filters from seawater samples collected from the Sargasso Sea near Bermuda. A total of 1.045 billion base pairs of nonredundant sequence was generated, annotated, and analyzed to elucidate the gene content, diversity, and relative abundance of the organisms within these environmental samples. These

J. Craig Venter; Karin Remington; John F. Heidelberg; Aaron L. Halpern; Doug Rusch; Dongying Wu; Ian Paulsen; Karen E. Nelson; William Nelson; Derrick E. Fouts; Samuel Levy; Anthony H. Knap; Michael W. Lomas; Ken Nealson; Owen White; Jeremy Peterson; Jeff Hoffman; Rachel Parsons; Holly Baden-Tillson; Cynthia Pfannkoch; Yu-Hui Rogers; Hamilton O. Smith

2004-01-01

311

A Tool for Analyzing and Annotating Genomic Sequences  

Microsoft Academic Search

We describe a tool for analyzing and annotating large genomic sequences containing introns. The analysis and annotation tool (AAT) includes two sets of programs, one for comparing the query sequence with a protein database and the other for comparing the query with a cDNA database. Each set contains a fast database search program and a rigorous alignment program. The database

Xiaoqiu Huang; Mark D. Adams; Hao Zhou; Anthony R. Kerlavage

1997-01-01

312

The impact of next-generation sequencing on genomics  

Microsoft Academic Search

This article reviews basic concepts, general applications, and the potential impact of next-generation sequencing (NGS) technologies on genomics, with particular reference to currently available and possible future platforms and bioinformatics. NGS technologies have demonstrated the capacity to sequence DNA at unprecedented speed, thereby enabling previously unimaginable scientific achievements and novel biological applications. But, the massive data produced by NGS also

Jun Zhang; Rod Chiodini; Ahmed Badr; Genfa Zhang

2011-01-01

313

Complete Genome Sequence of Bacillus subtilis Phage φ105  

PubMed Central

A complete 39,318-bp genome sequence containing 52 coding sequences has been determined for the Bacillus subtilis temperate phage φ105. In a lysogen, B. subtilis strain 1L32, the φ105 prophage interrupts the radC locus, a part of the competence-induced ComK regulon.

2013-01-01

314

Genome Sequencing and Bioinformatics Analyses of Higher Plants Chloroplasts  

Microsoft Academic Search

Chloroplast DNA in higher plants exists as closed circular molecules of about 150 kb (±30), usually presenting inverted repeat sequences separating two single copy regions (1). The complete chloroplast genomes of around 13 higher plant species are available in the gene bank. Our group has completely sequenced the sugarcane chloroplast DNA, which is 141182 nucleotides in size. We

Helaine Carrer

315

DNA sequence organization in the genome of Cycas revoluta  

Microsoft Academic Search

The pattern of DNA sequence organization in the genome of Cycas revoluta was analyzed by DNA/DNA reassociation. Reassociation of 400 base pair (bp) fragments to various C0t values indicates the presence of at least four kinetic classes: the foldback plus very highly repetitive sequences (15%), the fast repeats (24%), the slow repeats (44%), and the single copy (17%). The latter

Buran Kurdi-Haidar; Victoria Shalhoub; Sulayman Dib-Hajj; Samir Deeb

1983-01-01

316

Revised Genome Sequence of Staphylococcus aureus Bacteriophage K  

PubMed Central

Bacteriophage K is a member of the virulent Twort-like group of myophages infecting Staphylococcus aureus. The revised sequence presented here includes 12,436 bp of additional sequence not present in the previously available phage K genome (GenBank accession no. NC_005880) and updated annotations, and has been reopened at the predicted terminal repeat boundary.

2014-01-01

317

Genome Sequence of Lactobacillus plantarum Strain UCMA 3037.  

PubMed

Nucleic acid of the strain Lactobacillus plantarum UCMA 3037, isolated from raw milk camembert cheese in our laboratory, was sequenced. We present its draft genome sequence with the aim of studying its functional properties and relationship to the cheese ecosystem. PMID:23704179

Naz, Saima; Tareb, Raouf; Bernardeau, Marion; Vaisse, Melissa; Lucchetti-Miganeh, Celine; Rechenmann, Mathias; Vernoux, Jean-Paul

2013-01-01

318

Allopolyploidization-accommodated Genomic Sequence Changes in Triticale  

PubMed Central

Background: Allopolyploidization is one of the major evolutionary modes of plant speciation. Recent interest in studying allopolyploids has provided significant novel insights into the mechanisms of allopolyploid formation. Compelling evidence indicates that genetic and/or epigenetic changes have played significant roles in shaping allopolyploids, but rates and modes of the changes can be very different among various species. Triticale (× Triticosecale) is an artificial species that has been used to study the evolutionary course of complex allopolyploids due to its recent origin and availability of a highly diversified germplasm pool. Scope: This review summarizes recent genomics studies implemented in hexaploid and octoploid triticales and discusses the mechanisms of the changes and compares the major differences between genomic changes in triticale and other allopolyploid species. Conclusions: Molecular studies have indicated extensive non-additive sequence changes or modifications in triticale, and the degree of variation appears to be higher than in other allopolyploid species. The data indicate that at least some sequence changes are non-random, and appear to be a function of genome relations, ploidy levels and sequence types. Specifically, the rye parental genome demonstrated a higher level of changes than the wheat genome. The frequency of lost parental bands was much higher than the frequency of gained novel bands, suggesting that sequence modification and/or elimination might be a major force causing genome variation in triticale. It was also shown that 68% of the total changes occurred immediately following wide hybridization, but before chromosome doubling. Genome evolution following chromosome doubling occurred more slowly at a very low rate and the changes were mainly observed in the first five or so generations. The data suggest that cytoplasm and relationships between parental genomes are key factors in determining the direction, amount, timing and rate of genomic sequence variation that occurred during inter-generic allopolyploidization in this system.

Ma, Xue-Feng; Gustafson, J. Perry

2008-01-01

319

Complete genome sequencing and variant analysis of a Pakistani individual.  

PubMed

We sequenced the genome of a Pakistani male at 25.5x coverage using massively parallel sequencing technology. More than 90% of the sequence reads were mapped to the human reference genome. In subsequent analysis, we identified 3,224,311 single-nucleotide polymorphisms (SNPs), of which 388,532 (12% of the total SNPs) had not been previously recorded in the single nucleotide polymorphism database (dbSNP) or the 1000 Genomes Project database. The 5991 non-synonymous coding variants were screened for deleterious or disease-associated SNPs. Analysis of genes with deleterious SNPs identified 'retinoic acid signaling' and 'regulation of transcription' as the enriched Gene Ontology terms. Scanning of non-synonymous SNPs against OMIM revealed several disease- and phenotype-associated variants in the Pakistani genome. Comparative analysis with an Indian genome sequence revealed >1.8 million shared SNPs, 32% of which were annotated in ~14,000 genes. Gene Ontology (GO) term analysis of these genes identified 'response to jasmonic acid stimulus', 'aminoglycoside antibiotic metabolic process' and 'glycoside metabolic process' with considerable enrichment. A total of 59,558 small indels (1-5 bp) and 16,063 large structural variations were found, 54% of which were novel. The substantial number of novel structural variations discovered in the Pakistani genome reinforced previous inferences that (a) structural variations are a major type of variation in the genome and (b) compared with SNPs, they putatively exhibit equivalent or superior functional roles. This genome sequence information will be an important reference for population-wide genomics studies of the ethnically diverse South Asian subcontinent. PMID:23842039

Azim, Muhammad Kamran; Yang, Chuanchun; Yan, Zhixiang; Choudhary, Muhammad Iqbal; Khan, Asifullah; Sun, Xiao; Li, Ran; Asif, Huma; Sharif, Sana; Zhang, Yong

2013-09-01

320

Cloning, sequencing, and characterization of genomic subtracted sequences from Listeria monocytogenes.  

PubMed

Individual sequences of a genomic subtracted, PCR-amplified, mixed-sequence probe (GS probe) were cloned and sequenced. The GS probe differentiated restriction fragment length polymorphism patterns for Listeria monocytogenes but did not hybridize with members of other bacterial genera. Sequence analysis identified several L. monocytogenes sequences already present in the GenBank database; the putative identities of other sequences were inferred from homology data, and still other sequences did not exhibit significant levels of homology with any GenBank sequences. PMID:10583999

Wu, F M; Muriana, P M

1999-12-01

321

Cloning, Sequencing, and Characterization of Genomic Subtracted Sequences from Listeria monocytogenes  

PubMed Central

Individual sequences of a genomic subtracted, PCR-amplified, mixed-sequence probe (GS probe) were cloned and sequenced. The GS probe differentiated restriction fragment length polymorphism patterns for Listeria monocytogenes but did not hybridize with members of other bacterial genera. Sequence analysis identified several L. monocytogenes sequences already present in the GenBank database; the putative identities of other sequences were inferred from homology data, and still other sequences did not exhibit significant levels of homology with any GenBank sequences.

Wu, Fone-Mao; Muriana, Peter M.

1999-01-01

322

Sequence Analysis of the Genome of Carnation (Dianthus caryophyllus L.).  

PubMed

The whole-genome sequence of carnation (Dianthus caryophyllus L.) cv. 'Francesco' was determined using a combination of different new-generation multiplex sequencing platforms. The total length of the non-redundant sequences was 568 887 315 bp, consisting of 45 088 scaffolds, which covered 91% of the 622 Mb carnation genome estimated by k-mer analysis. The N50 values of contigs and scaffolds were 16 644 bp and 60 737 bp, respectively, and the longest scaffold was 1 287 144 bp. The average GC content of the contig sequences was 36%. A total of 1050, 13, 92 and 143 genes for tRNAs, rRNAs, snoRNA and miRNA, respectively, were identified in the assembled genomic sequences. For protein-encoding genes, 43 266 complete and partial gene structures excluding those in transposable elements were deduced. Gene coverage was ~98%, as deduced from the coverage of the core eukaryotic genes. Intensive characterization of the assigned carnation genes and comparison with those of other plant species revealed characteristic features of the carnation genome. The results of this study will serve as a valuable resource for fundamental and applied research of carnation, especially for breeding new carnation varieties. Further information on the genomic sequences is available at http://carnation.kazusa.or.jp. PMID:24344172

Yagi, Masafumi; Kosugi, Shunichi; Hirakawa, Hideki; Ohmiya, Akemi; Tanase, Koji; Harada, Taro; Kishimoto, Kyutaro; Nakayama, Masayoshi; Ichimura, Kazuo; Onozaki, Takashi; Yamaguchi, Hiroyasu; Sasaki, Nobuhiro; Miyahara, Taira; Nishizaki, Yuzo; Ozeki, Yoshihiro; Nakamura, Noriko; Suzuki, Takamasa; Tanaka, Yoshikazu; Sato, Shusei; Shirasawa, Kenta; Isobe, Sachiko; Miyamura, Yoshinori; Watanabe, Akiko; Nakayama, Shinobu; Kishida, Yoshie; Kohara, Mitsuyo; Tabata, Satoshi

2014-06-01
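
The N50 values quoted in the record above summarize assembly contiguity: N50 is the length L such that sequences of length L or longer together contain at least half of the assembled bases. The sketch below computes it for a list of lengths; the function name and the toy lengths are illustrative assumptions, not data from the carnation assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths."""
    if not lengths:
        raise ValueError("lengths must not be empty")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down until half the bases are covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Toy assembly: total 290 bases; 80 + 70 = 150 covers at least half.
print(n50([80, 70, 50, 40, 30, 20]))  # 70
```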

323

Sequence Analysis of the Genome of Carnation (Dianthus caryophyllus L.)  

PubMed Central

The whole-genome sequence of carnation (Dianthus caryophyllus L.) cv. ‘Francesco’ was determined using a combination of different new-generation multiplex sequencing platforms. The total length of the non-redundant sequences was 568 887 315 bp, consisting of 45 088 scaffolds, which covered 91% of the 622 Mb carnation genome estimated by k-mer analysis. The N50 values of contigs and scaffolds were 16 644 bp and 60 737 bp, respectively, and the longest scaffold was 1 287 144 bp. The average GC content of the contig sequences was 36%. A total of 1050, 13, 92 and 143 genes for tRNAs, rRNAs, snoRNA and miRNA, respectively, were identified in the assembled genomic sequences. For protein-encoding genes, 43 266 complete and partial gene structures excluding those in transposable elements were deduced. Gene coverage was ~98%, as deduced from the coverage of the core eukaryotic genes. Intensive characterization of the assigned carnation genes and comparison with those of other plant species revealed characteristic features of the carnation genome. The results of this study will serve as a valuable resource for fundamental and applied research of carnation, especially for breeding new carnation varieties. Further information on the genomic sequences is available at http://carnation.kazusa.or.jp.

Yagi, Masafumi; Kosugi, Shunichi; Hirakawa, Hideki; Ohmiya, Akemi; Tanase, Koji; Harada, Taro; Kishimoto, Kyutaro; Nakayama, Masayoshi; Ichimura, Kazuo; Onozaki, Takashi; Yamaguchi, Hiroyasu; Sasaki, Nobuhiro; Miyahara, Taira; Nishizaki, Yuzo; Ozeki, Yoshihiro; Nakamura, Noriko; Suzuki, Takamasa; Tanaka, Yoshikazu; Sato, Shusei; Shirasawa, Kenta; Isobe, Sachiko; Miyamura, Yoshinori; Watanabe, Akiko; Nakayama, Shinobu; Kishida, Yoshie; Kohara, Mitsuyo; Tabata, Satoshi

2014-01-01

324

Sequence Surveyor: leveraging overview for scalable genomic alignment visualization.  

PubMed

In this paper, we introduce overview visualization tools for large-scale multiple genome alignment data. Genome alignment visualization and, more generally, sequence alignment visualization are an important tool for understanding genomic sequence data. As sequencing techniques improve and more data become available, greater demand is being placed on visualization tools to scale to the size of these new datasets. When viewing such large data, we necessarily cannot convey details, rather we specifically design overview tools to help elucidate large-scale patterns. Perceptual science, signal processing theory, and generality provide a framework for the design of such visualizations that can scale well beyond current approaches. We present Sequence Surveyor, a prototype that embodies these ideas for scalable multiple whole-genome alignment overview visualization. Sequence Surveyor visualizes sequences in parallel, displaying data using variable color, position, and aggregation encodings. We demonstrate how perceptual science can inform the design of visualization techniques that remain visually manageable at scale and how signal processing concepts can inform aggregation schemes that highlight global trends, outliers, and overall data distributions as the problem scales. These techniques allow us to visualize alignments with over 100 whole bacterial-sized genomes. PMID:22034360

Albers, Danielle; Dewey, Colin; Gleicher, Michael

2011-12-01

325

Insights in metabolism and toxin production from the complete genome sequence of Clostridium tetani  

Microsoft Academic Search

The decryption of prokaryotic genome sequences progresses rapidly and provides the scientific community with an enormous amount of information. Clostridial genome sequencing projects have been finished only recently, starting with the genome of the solvent-producing Clostridium acetobutylicum in 2001. A lot of attention has been devoted to the genomes of pathogenic clostridia. In 2002, the genome sequence of C. perfringens,

Holger Brüggemann; Gerhard Gottschalk

2004-01-01

326

Insights in metabolism and toxin production from the complete genome sequence of Clostridium tetani  

Microsoft Academic Search

The decryption of prokaryotic genome sequences progresses rapidly and provides the scientific community with an enormous amount of information. Clostridial genome sequencing projects have been finished only recently, starting with the genome of the solvent-producing Clostridium acetobutylicum in 2001. A lot of attention has been devoted to the genomes of pathogenic clostridia. In 2002, the genome sequence of C. perfringens,

Holger Brüggemann; Gerhard Gottschalk

327

Baylor College of Medicine: The Dictyostelium Genome Sequencing Project  

NSDL National Science Digital Library

This website presents the Baylor College of Medicine Dictyostelium Genome Sequencing Project. The site provides general background information about the project and the Dictyostelium genome; however, researchers will likely find the Tools for Scientists section most useful. This section includes several links devoted to Mapping Information, Data Release, and a Preliminary Directory of Dictyostelium Genes version 3 (UCSD). Additionally, researchers can link to an overview of Extrachromosomal Elements as well as perform a database search with SDSC Dicty BLAST. This site also provides several links to Other Sequencing Resources and External Sequence Analysis sites.

328

Genome Sequence of Luminous Piezophile Photobacterium phosphoreum ANT-2200.  

PubMed

Bacteria of the genus Photobacterium thrive worldwide in oceans and show substantially varied lifestyles, including free-living, commensal, pathogenic, symbiotic, and piezophilic. Here, we present the genome sequence of a luminous, piezophilic Photobacterium phosphoreum strain, ANT-2200, isolated from a water column at 2,200 m depth in the Mediterranean Sea. It is the first genomic sequence of the P. phosphoreum group. An analysis of the sequence provides insight into the adaptation of bacteria to the deep-sea habitat. PMID:24744322

Zhang, Sheng-Da; Barbe, Valérie; Garel, Marc; Zhang, Wei-Jia; Chen, Haitao; Santini, Claire-Lise; Murat, Dorothée; Jing, Hongmei; Zhao, Yuan; Lajus, Aurélie; Martini, Séverine; Pradel, Nathalie; Tamburini, Christian; Wu, Long-Fei

2014-01-01

329

Structure-based inference of molecular functions of proteins of unknown function from Berkeley Structural Genomics Center  

SciTech Connect

Advances in sequence genomics have resulted in an accumulation of a huge number of protein sequences derived from genome sequences. However, the functions of a large portion of them cannot be inferred based on the current methods of sequence homology detection to proteins of known functions. Three-dimensional structure can have an important impact in providing inference of molecular function (physical and chemical function) of a protein of unknown function. Structural genomics centers worldwide have been determining many 3-D structures of the proteins of unknown functions, and possible molecular functions of them have been inferred based on their structures. Combined with bioinformatics and enzymatic assay tools, the successful acceleration of the process of protein structure determination through high throughput pipelines enables the rapid functional annotation of a large fraction of hypothetical proteins. We present a brief summary of the process we used at the Berkeley Structural Genomics Center to infer molecular functions of proteins of unknown function.

Kim, Sung-Hou; Shin, Dong Hae; Hou, Jingtong; Chandonia, John-Marc; Das, Debanu; Choi, In-Geol; Kim, Rosalind; Kim, Sung-Hou

2007-09-02

330

Genomic multiple sequence alignments: refinement using a genetic algorithm  

PubMed Central

Background: Genomic sequence data cannot be fully appreciated in isolation. Comparative genomics – the practice of comparing genomic sequences from different species – plays an increasingly important role in understanding the genotypic differences between species that result in phenotypic differences as well as in revealing patterns of evolutionary relationships. One of the major challenges in comparative genomics is producing a high-quality alignment between two or more related genomic sequences. In recent years, a number of tools have been developed for aligning large genomic sequences. Most utilize heuristic strategies to identify a series of strong sequence similarities, which are then used as anchors to align the regions between the anchor points. The resulting alignment is globally correct, but in many cases is suboptimal locally. We describe a new program, GenAlignRefine, which improves the overall quality of global multiple alignments by using a genetic algorithm to improve local regions of alignment. Regions of low quality are identified, realigned using the program T-Coffee, and then refined using a genetic algorithm. Because a better COFFEE (Consistency based Objective Function For alignmEnt Evaluation) score generally reflects greater alignment quality, the algorithm searches for an alignment that yields a better COFFEE score. To improve the intrinsic slowness of the genetic algorithm, GenAlignRefine was implemented as a parallel, cluster-based program. Results: We tested the GenAlignRefine algorithm by running it on a Linux cluster to refine sequences from a simulation, as well as refine a multiple alignment of 15 Orthopoxvirus genomic sequences approximately 260,000 nucleotides in length that initially had been aligned by Multi-LAGAN. It took approximately 150 minutes for a 40-processor Linux cluster to optimize some 200 fuzzy (poorly aligned) regions of the orthopoxvirus alignment. Overall sequence identity increased only slightly; but significantly, this occurred at the same time that the overall alignment length decreased – through the removal of gaps – by approximately 200 gapped regions representing roughly 1,300 gaps. Conclusion: We have implemented a genetic algorithm in parallel mode to optimize multiple genomic sequence alignments initially generated by various alignment tools. Benchmarking experiments showed that the refinement algorithm improved genomic sequence alignments within a reasonable period of time.

Wang, Chunlin; Lefkowitz, Elliot J

2005-01-01

331

LLNL Genomic Assessment: Viral and Bacterial Sequencing Needs for TMTI, Task 1.4.2 Report  

SciTech Connect

Good progress has been made on both bacterial and viral sequencing by the TMTI centers. While access to appropriate samples is a limiting factor to throughput, excellent progress has been made with respect to getting agreements in place with key sources of relevant materials. Sharing of sequenced genomes funded by TMTI has been extremely limited to date. The April 2010 exercise should force a resolution to this, but additional managerial pressures may be needed to ensure that rapid sharing of TMTI-funded sequencing occurs, regardless of collaborator constraints concerning ultimate publication(s). Policies to permit TMTI-internal rapid sharing of sequenced genomes should be written into all TMTI agreements with collaborators now being negotiated. TMTI needs to establish a Web-based system for tracking samples destined for sequencing. This includes metadata on sample origins and contributor, information on sample shipment/receipt, prioritization by TMTI, assignment to one or more sequencing centers (including possible TMTI-sponsored sequencing at a contributor site), and status history of the sample sequencing effort. While this system could be a component of the AFRL system, it is not part of any current development effort. Policy and standardized procedures are needed to ensure appropriate verification of all TMTI samples prior to the investment in sequencing. PCR, arrays, and classical biochemical tests are examples of potential verification methods. Verification is needed to detect mislabeled, degraded, mixed or contaminated samples. Regular QC exercises are needed to ensure that the TMTI-funded centers are meeting all standards for producing quality genomic sequence data.

Slezak, T; Borucki, M; Lam, M; Lenhoff, R; Vitalis, E

2010-01-26

332

The Diploid Genome Sequence of an Individual Human  

Microsoft Academic Search

Presented here is a genome sequence of an individual human. It was produced from ~32 million random DNA fragments, sequenced by Sanger dideoxy technology and assembled into 4,528 scaffolds, comprising 2,810 million bases (Mb) of contiguous sequence with approximately 7.5-fold coverage for any given region. We developed a modified version of the Celera assembler to facilitate the identification and comparison

Samuel Levy; Granger Sutton; Pauline C. Ng; Lars Feuk; Aaron L. Halpern; Brian P. Walenz; Nelson Axelrod; Jiaqi Huang; Ewen F. Kirkness; Gennady Denisov; Yuan Lin; Jeffrey R. MacDonald; Andy Wing Chun Pang; Mary Shago; Timothy B. Stockwell; Alexia Tsiamouri; Vineet Bafna; Vikas Bansal; Saul A. Kravitz; Dana A. Busam; Karen Y. Beeson; Tina C. McIntosh; Karin A. Remington; Josep F. Abril; John Gill; Jon Borman; Yu-Hui Rogers; Marvin E. Frazier; Stephen W. Scherer; Robert L. Strausberg; J. Craig Venter

2007-01-01

333

Sequence-Based Mapping of the Polyploid Wheat Genome  

PubMed Central

The emergence of new sequencing technologies has provided fast and cost-efficient strategies for high-resolution mapping of complex genomes. Although these approaches hold great promise to accelerate genome analysis, their application in studying genetic variation in wheat has been hindered by the complexity of its polyploid genome. Here, we applied the next-generation sequencing of a wheat doubled-haploid mapping population for high-resolution gene mapping and tested its utility for ordering shotgun sequence contigs of a flow-sorted wheat chromosome. A bioinformatical pipeline was developed for reliable variant analysis of sequence data generated for polyploid wheat mapping populations. The results of variant mapping were consistent with the results obtained using the wheat 9000 SNP iSelect assay. A reference map of the wheat genome integrating 2740 gene-associated single-nucleotide polymorphisms from the wheat iSelect assay, 1351 diversity array technology, 118 simple sequence repeat/sequence-tagged sites, and 416,856 genotyping-by-sequencing markers was developed. By analyzing the sequenced megabase-size regions of the wheat genome we showed that mapped markers are located within 40-100 kb from genes providing a possibility for high-resolution mapping at the level of a single gene. In our population, gene loci controlling a seed color phenotype cosegregated with 2459 markers including one that was located within the red seed color gene. We demonstrate that the high-density reference map presented here is a useful resource for gene mapping and linking physical and genetic maps of the wheat genome.

Saintenac, Cyrille; Jiang, Dayou; Wang, Shichen; Akhunov, Eduard

2013-01-01

334

Sequence-based mapping of the polyploid wheat genome.  

PubMed

The emergence of new sequencing technologies has provided fast and cost-efficient strategies for high-resolution mapping of complex genomes. Although these approaches hold great promise to accelerate genome analysis, their application in studying genetic variation in wheat has been hindered by the complexity of its polyploid genome. Here, we applied the next-generation sequencing of a wheat doubled-haploid mapping population for high-resolution gene mapping and tested its utility for ordering shotgun sequence contigs of a flow-sorted wheat chromosome. A bioinformatical pipeline was developed for reliable variant analysis of sequence data generated for polyploid wheat mapping populations. The results of variant mapping were consistent with the results obtained using the wheat 9000 SNP iSelect assay. A reference map of the wheat genome integrating 2740 gene-associated single-nucleotide polymorphisms from the wheat iSelect assay, 1351 diversity array technology, 118 simple sequence repeat/sequence-tagged sites, and 416,856 genotyping-by-sequencing markers was developed. By analyzing the sequenced megabase-size regions of the wheat genome we showed that mapped markers are located within 40-100 kb from genes providing a possibility for high-resolution mapping at the level of a single gene. In our population, gene loci controlling a seed color phenotype cosegregated with 2459 markers including one that was located within the red seed color gene. We demonstrate that the high-density reference map presented here is a useful resource for gene mapping and linking physical and genetic maps of the wheat genome. PMID:23665877

Saintenac, Cyrille; Jiang, Dayou; Wang, Shichen; Akhunov, Eduard

2013-07-01

335

Draft Genome Sequences of Two Virulent Serotypes of Avian Pasteurella multocida  

PubMed Central

Here we report the draft genome sequences of two virulent avian strains of Pasteurella multocida. Comparative analyses of these genomes were done with the published genome sequence of avirulent P. multocida strain Pm70.

Abrahante, Juan E.; Johnson, Timothy J.; Hunter, Samuel S.; Maheswaran, Samuel K.; Hauglund, Melissa J.; Bayles, Darrell O.; Tatum, Fred M.

2013-01-01

336

Performance comparison of whole-genome sequencing platforms  

PubMed Central

Whole-genome sequencing is becoming commonplace, but the accuracy and completeness of variant calling by the most widely used platforms from Illumina and Complete Genomics have not been reported. Here we sequenced the genome of an individual with both technologies to a high average coverage of ~76×, and compared their performance with respect to sequence coverage and calling of single-nucleotide variants (SNVs), insertions and deletions (indels). Although 88.1% of the ~3.7 million unique SNVs were concordant between platforms, there were tens of thousands of platform-specific calls located in genes and other genomic regions. In contrast, 26.5% of indels were concordant between platforms. Target enrichment validated 92.7% of the concordant SNVs, whereas validation by genotyping array revealed a sensitivity of 99.3%. The validation experiments also suggested that >60% of the platform-specific variants were indeed present in the genome. Our results have important implications for understanding the accuracy and completeness of the genome sequencing platforms.

Lam, Hugo Y K; Clark, Michael J; Chen, Rui; Chen, Rong; Natsoulis, Georges; O'Huallachain, Maeve; Dewey, Frederick E; Habegger, Lukas; Ashley, Euan A; Gerstein, Mark B; Butte, Atul J; Ji, Hanlee P; Snyder, Michael

2014-01-01
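
The platform concordance figures in the record above can be made concrete with a small sketch. It assumes that "concordant" means the fraction of all unique variant calls (the union of both call sets) that both platforms report, which is one plausible reading of the abstract rather than the authors' documented procedure. The function name `concordance` and the toy variant tuples are hypothetical.

```python
def concordance(calls_a, calls_b):
    """Compare two variant call sets keyed by (chrom, pos, ref, alt).

    Assumption: concordance = shared calls / all unique calls (the union).
    """
    a, b = set(calls_a), set(calls_b)
    union = a | b
    if not union:
        raise ValueError("both call sets are empty")
    shared = a & b
    return {
        "concordant_fraction": len(shared) / len(union),
        "only_a": len(a - b),  # platform-specific calls from the first set
        "only_b": len(b - a),  # platform-specific calls from the second set
    }


# Toy data, invented for illustration only.
platform_a = [("chr1", 100, "A", "G"), ("chr1", 200, "C", "T"), ("chr2", 50, "G", "A")]
platform_b = [("chr1", 100, "A", "G"), ("chr1", 200, "C", "T"), ("chr3", 7, "T", "C")]
result = concordance(platform_a, platform_b)
print(result)
```

Keying each call on position and alleles means that two platforms reporting different alternate alleles at the same site count as discordant; a looser position-only key would raise the apparent concordance.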

337

Genome Sequence of the Pea Aphid Acyrthosiphon pisum  

PubMed Central

Aphids are important agricultural pests and also biological models for studies of insect-plant interactions, symbiosis, virus vectoring, and the developmental causes of extreme phenotypic plasticity. Here we present the 464 Mb draft genome assembly of the pea aphid Acyrthosiphon pisum. This first published whole genome sequence of a basal hemimetabolous insect provides an outgroup to the multiple published genomes of holometabolous insects. Pea aphids are host-plant specialists, they can reproduce both sexually and asexually, and they have coevolved with an obligate bacterial symbiont. Here we highlight findings from whole genome analysis that may be related to these unusual biological features. These findings include discovery of extensive gene duplication in more than 2000 gene families as well as loss of evolutionarily conserved genes. Gene family expansions relative to other published genomes include genes involved in chromatin modification, miRNA synthesis, and sugar transport. Gene losses include genes central to the IMD immune pathway, selenoprotein utilization, purine salvage, and the entire urea cycle. The pea aphid genome reveals that only a limited number of genes have been acquired from bacteria; thus the reduced gene count of Buchnera does not reflect gene transfer to the host genome. The inventory of metabolic genes in the pea aphid genome suggests that there is extensive metabolite exchange between the aphid and Buchnera, including sharing of amino acid biosynthesis between the aphid and Buchnera. The pea aphid genome provides a foundation for post-genomic studies of fundamental biological questions and applied agricultural problems.

2010-01-01

338

Genome sequence of the pea aphid Acyrthosiphon pisum.  

PubMed

Aphids are important agricultural pests and also biological models for studies of insect-plant interactions, symbiosis, virus vectoring, and the developmental causes of extreme phenotypic plasticity. Here we present the 464 Mb draft genome assembly of the pea aphid Acyrthosiphon pisum. This first published whole genome sequence of a basal hemimetabolous insect provides an outgroup to the multiple published genomes of holometabolous insects. Pea aphids are host-plant specialists, they can reproduce both sexually and asexually, and they have coevolved with an obligate bacterial symbiont. Here we highlight findings from whole genome analysis that may be related to these unusual biological features. These findings include discovery of extensive gene duplication in more than 2000 gene families as well as loss of evolutionarily conserved genes. Gene family expansions relative to other published genomes include genes involved in chromatin modification, miRNA synthesis, and sugar transport. Gene losses include genes central to the IMD immune pathway, selenoprotein utilization, purine salvage, and the entire urea cycle. The pea aphid genome reveals that only a limited number of genes have been acquired from bacteria; thus the reduced gene count of Buchnera does not reflect gene transfer to the host genome. The inventory of metabolic genes in the pea aphid genome suggests that there is extensive metabolite exchange between the aphid and Buchnera, including sharing of amino acid biosynthesis between the aphid and Buchnera. The pea aphid genome provides a foundation for post-genomic studies of fundamental biological questions and applied agricultural problems. PMID:20186266

2010-02-01

339

Adaptive seeds tame genomic sequence comparison  

PubMed Central

The main way of analyzing biological sequences is by comparing and aligning them to each other. It remains difficult, however, to compare modern multi-billion-base DNA data sets. The difficulty is caused by the nonuniform (oligo)nucleotide composition of these sequences, rather than their size per se. To solve this problem, we modified the standard seed-and-extend approach (e.g., BLAST) to use adaptive seeds. Adaptive seeds are matches that are chosen based on their rareness, instead of using fixed-length matches. This method guarantees that the number of matches, and thus the running time, increases linearly, instead of quadratically, with sequence length. LAST, our open source implementation of adaptive seeds, enables fast and sensitive comparison of large sequences with arbitrarily nonuniform composition.

Kielbasa, Szymon M.; Wan, Raymond; Sato, Kengo; Horton, Paul; Frith, Martin C.

2011-01-01
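
The adaptive-seed idea described in the record above can be illustrated with a minimal sketch. This is not the LAST implementation: it assumes a naive substring count in place of an indexed search, and the names `count_occurrences`, `adaptive_seed`, and `max_hits` are hypothetical. Starting at a query position, the seed is lengthened until it occurs at most `max_hits` times in the reference, so common words yield longer seeds and rare words yield shorter ones.

```python
def count_occurrences(reference: str, pattern: str) -> int:
    """Count occurrences of pattern in reference, including overlapping ones."""
    count = 0
    start = reference.find(pattern)
    while start != -1:
        count += 1
        start = reference.find(pattern, start + 1)
    return count


def adaptive_seed(reference: str, query: str, start: int, max_hits: int = 1):
    """Return the shortest prefix of query[start:] that is rare in reference.

    "Rare" means the prefix occurs between 1 and max_hits times.
    Returns None if the prefix stops matching, or if the query ends
    before the match becomes rare enough.
    """
    for end in range(start + 1, len(query) + 1):
        seed = query[start:end]
        hits = count_occurrences(reference, seed)
        if hits == 0:
            return None  # no match at all, so no seed starts here
        if hits <= max_hits:
            return seed  # rare enough: stop extending
    return None


reference = "ACACACGTAC"
print(adaptive_seed(reference, "ACGT", 0, max_hits=1))
```

In the toy reference, "A" and "AC" each occur four times, so the seed grows to "ACG", which occurs once. A fixed-length seed of two bases would instead have produced four candidate matches for the same query position.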

340

Comprehensive genome sequence analysis of a breast cancer amplicon.  

PubMed

Gene amplification occurs in most solid tumors and is associated with poor prognosis. Amplification of 20q13.2 is common to several tumor types including breast cancer. The 1 Mb of sequence spanning the 20q13.2 breast cancer amplicon is one of the most exhaustively studied segments of the human genome. These studies have included amplicon mapping by comparative genomic hybridization (CGH), fluorescent in-situ hybridization (FISH), array-CGH, quantitative microsatellite analysis (QUMA), and functional genomic studies. Together these studies revealed a complex amplicon structure suggesting the presence of at least two driver genes in some tumors. One of these, ZNF217, is capable of immortalizing human mammary epithelial cells (HMEC) when overexpressed. In addition, we now report the sequencing of this region in human and mouse, and quantitative expression studies in tumors. Amplicon localization now is straightforward and the availability of human and mouse genomic sequence facilitates their functional analysis. However, comprehensive annotation of megabase-scale regions requires integration of vast amounts of information. We present a system for integrative analysis and demonstrate its utility on 1.2 Mb of sequence spanning the 20q13.2 breast cancer amplicon and 865 kb of syntenic murine sequence. We integrate tumor genome copy number measurements with exhaustive genome landscape mapping, showing that amplicon boundaries are associated with maxima in repetitive element density and a region of evolutionary instability. This integration of comprehensive sequence annotation, quantitative expression analysis, and tumor amplicon boundaries provides evidence for an additional driver gene, prefoldin 4 (PFDN4), as well as coregulated genes and conserved noncoding regions, and associates repetitive elements with regions of genomic instability at this locus. PMID:11381030

Collins, C; Volik, S; Kowbel, D; Ginzinger, D; Ylstra, B; Cloutier, T; Hawkins, T; Predki, P; Martin, C; Wernick, M; Kuo, W L; Alberts, A; Gray, J W

2001-06-01

341

Draft genome sequence of the Tibetan antelope  

PubMed Central

The Tibetan antelope (Pantholops hodgsonii) is endemic to the extremely inhospitable high-altitude environment of the Qinghai-Tibetan Plateau, a region that has a low partial pressure of oxygen and high ultraviolet radiation. Here we generate a draft genome of this artiodactyl and use it to detect the potential genetic bases of highland adaptation. Compared with other plain-dwelling mammals, the genome of the Tibetan antelope shows signals of adaptive evolution and gene-family expansion in genes associated with energy metabolism and oxygen transmission. Both the highland American pika and the Tibetan antelope have signals of positive selection for genes involved in DNA repair and the production of ATPase. Genes associated with hypoxia seem to have experienced convergent evolution. Thus, our study suggests that common genetic mechanisms might have been utilized to enable high-altitude adaptation.

Ge, Ri-Li; Cai, Qingle; Shen, Yong-Yi; San, A; Ma, Lan; Zhang, Yong; Yi, Xin; Chen, Yan; Yang, Lingfeng; Huang, Ying; He, Rongjun; Hui, Yuanyuan; Hao, Meirong; Li, Yue; Wang, Bo; Ou, Xiaohua; Xu, Jiaohui; Zhang, Yongfen; Wu, Kui; Geng, Chunyu; Zhou, Weiping; Zhou, Taicheng; Irwin, David M.; Yang, Yingzhong; Ying, Liu; Bao, Haihua; Kim, Jaebum; Larkin, Denis M.; Ma, Jian; Lewin, Harris A.; Xing, Jinchuan; Platt, Roy N.; Ray, David A.; Auvil, Loretta; Capitanu, Boris; Zhang, Xiufeng; Zhang, Guojie; Murphy, Robert W.; Wang, Jun; Zhang, Ya-Ping; Wang, Jian

2013-01-01

342

Terminally repeated sequences in the avian sarcoma virus RNA genome.  

PubMed Central

The initiation of DNA synthesis in vitro by RNA-directed DNA polymerase (deoxynucleosidetriphosphate: DNA deoxynucleotidyltransferase, EC 2.7.7.7) of avian oncornaviruses requires a tRNAtrp primer molecule located close to the 5' end of the viral RNA genome. DNA transcripts, 100 nucleotides in length, initiated on the tRNAtrp primer molecule contain nucleotide sequences complementary to a large (25 nucleotides) RNase T1 oligonucleotide, T-13, located at the 5' terminus of the avian sarcoma virus RNA genome. tRNAtrp-initiated DNA transcripts with a length of about 70 nucleotides contain substantially fewer nucleotide sequences complementary to this 5'-terminal oligonucleotide, suggesting that the tRNAtrp primer associated with the avian sarcoma virus RNA is located approximately 100 nucleotides from the 5' end of the RNA. In addition, we present evidence to demonstrate that DNA transcribed from avian sarcoma virus RNA sequences located at the 3' end, immediately adjacent to the poly(A), contains nucleotide sequences that are complementary to the 5'-terminal T1 oligonucleotide T-13. These data indicate that the 5' end of the viral genome contains nucleotide sequences that are repeated at the 3' end of the genome. We conclude that the avian oncornavirus RNA genome is terminally redundant.

Collett, M S; Dierks, P; Cahill, J F; Faras, A J; Parsons, J T

1977-01-01

343

A rapid whole genome sequencing and analysis system supporting genomic epidemiology (7th Annual SFAF Meeting, 2012)  

SciTech Connect

Michael FitzGerald on "A rapid whole genome sequencing and analysis system supporting genomic epidemiology" at the 2012 Sequencing, Finishing, Analysis in the Future Meeting held June 5-7, 2012 in Santa Fe, New Mexico.

FitzGerald, Michael [Broad Institute]

2012-06-01

344

A rapid whole genome sequencing and analysis system supporting genomic epidemiology (7th Annual SFAF Meeting, 2012)  

ScienceCinema

Michael FitzGerald on "A rapid whole genome sequencing and analysis system supporting genomic epidemiology" at the 2012 Sequencing, Finishing, Analysis in the Future Meeting held June 5-7, 2012 in Santa Fe, New Mexico.

FitzGerald, Michael [Broad Institute]

2013-02-12

345

Whole-genome haplotyping by dilution, amplification, and sequencing  

PubMed Central

Standard whole-genome genotyping technologies are unable to determine haplotypes. Here we describe a method for rapid and cost-effective long-range haplotyping. Genomic DNA is diluted and distributed into multiple aliquots such that each aliquot receives a fraction of a haploid copy. The DNA template in each aliquot is amplified by multiple displacement amplification, converted into barcoded sequencing libraries using Nextera technology, and sequenced in multiplexed pools. To assess the performance of our method, we combined two male genomic DNA samples at equal ratios, resulting in a sample with diploid X chromosomes with known haplotypes. Pools of the multiplexed sequencing libraries were subjected to targeted pull-down of a 1-Mb contiguous region of the X-chromosome Duchenne muscular dystrophy gene. We were able to phase the Duchenne muscular dystrophy region into two contiguous haplotype blocks with a mean length of 494 kb. The haplotypes showed 99% agreement with the consensus base calls made by sequencing the individual DNAs. We subsequently used the strategy to haplotype two human genomes. Standard genomic sequencing to identify all heterozygous SNPs in the sample was combined with dilution-amplification–based sequencing data to resolve the phase of identified heterozygous SNPs. Using this procedure, we were able to phase >95% of the heterozygous SNPs from the diploid sequence data. The N50 for a Yoruba male DNA was 702 kb whereas the N50 for a European female DNA was 358 kb. Therefore, the strategy described here is suitable for haplotyping of a set of targeted regions as well as of the entire genome.

Kaper, Fiona; Swamy, Sajani; Klotzle, Brandy; Munchel, Sarah; Cottrell, Joseph; Bibikova, Marina; Chuang, Han-Yu; Kruglyak, Semyon; Ronaghi, Mostafa; Eberle, Michael A.; Fan, Jian-Bing

2013-01-01
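
The N50 values quoted in the record above summarize haplotype block lengths. As a minimal sketch, assuming the conventional definition (the block length at which blocks of that size or larger cover at least half of the total phased length), N50 can be computed as follows. The function name `n50` and the example block lengths are hypothetical and are not taken from the study.

```python
def n50(lengths):
    """Return the N50 of a collection of block lengths.

    Blocks are visited from longest to shortest; the N50 is the length
    of the block at which the running total first reaches half of the
    overall total.
    """
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("lengths must contain at least one block")


# Invented block lengths in kb, for illustration only.
blocks_kb = [100, 200, 300, 400]
print(n50(blocks_kb))
```

Because N50 weights blocks by length, a few long blocks dominate it. Two samples with the same mean block length can therefore have quite different N50 values.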

346

Complete genome sequence of Cellulophaga lytica type strain (LIM-21).  

PubMed

Cellulophaga lytica (Lewin 1969) Johansen et al. 1999 is the type species of the genus Cellulophaga, which belongs to the family Flavobacteriaceae within the phylum 'Bacteroidetes' and was isolated from marine beach mud in Limon, Costa Rica. The species is of biotechnological interest because its members produce a wide range of extracellular enzymes capable of degrading proteins and polysaccharides. After the genome sequence of Cellulophaga algicola this is the second completed genome sequence of a member of the genus Cellulophaga. The 3,765,936 bp long genome with its 3,303 protein-coding and 55 RNA genes consists of one circular chromosome and is a part of the Genomic Encyclopedia of Bacteria and Archaea project. PMID:21677859

Pati, Amrita; Abt, Birte; Teshima, Hazuki; Nolan, Matt; Lapidus, Alla; Lucas, Susan; Hammon, Nancy; Deshpande, Shweta; Cheng, Jan-Fang; Tapia, Roxane; Han, Cliff; Goodwin, Lynne; Pitluck, Sam; Liolios, Konstantinos; Pagani, Ioanna; Mavromatis, Konstantinos; Ovchinikova, Galina; Chen, Amy; Palaniappan, Krishna; Land, Miriam; Hauser, Loren; Jeffries, Cynthia D; Detter, John C; Brambilla, Evelyne-Marie; Kannan, K Palani; Rohde, Manfred; Spring, Stefan; Göker, Markus; Woyke, Tanja; Bristow, James; Eisen, Jonathan A; Markowitz, Victor; Hugenholtz, Philip; Kyrpides, Nikos C; Klenk, Hans-Peter; Ivanova, Natalia

2011-04-29

347

Complete genome sequence of Arthrobacter sp. strain FB24  

PubMed Central

Arthrobacter sp. strain FB24 is a species in the genus Arthrobacter Conn and Dimmick 1947, in the family Micrococcaceae and class Actinobacteria. A number of Arthrobacter genome sequences have been completed because of their important role in soil, especially bioremediation. This isolate is of special interest because it is tolerant to multiple metals and it is extremely resistant to elevated concentrations of chromate. The genome consists of a 4,698,945 bp circular chromosome and three plasmids (96,488, 115,507, and 159,536 bp, a total of 5,070,478 bp), coding 4,536 proteins of which 1,257 are without known function. This genome was sequenced as part of the DOE Joint Genome Institute Program.

Nakatsu, Cindy H.; Barabote, Ravi; Thompson, Sue; Bruce, David; Detter, Chris; Brettin, Thomas; Han, Cliff; Beasley, Federico; Chen, Weimin; Konopka, Allan; Xie, Gary

2013-01-01

348

Genome rearrangements caused by interstitial telomeric sequences in yeast.  

PubMed

Interstitial telomeric sequences (ITSs) are present in many eukaryotic genomes and are linked to genome instabilities and disease in humans. The mechanisms responsible for ITS-mediated genome instability are not understood in molecular detail. Here, we use a model Saccharomyces cerevisiae system to characterize genome instability mediated by yeast telomeric (Ytel) repeats embedded within an intron of a reporter gene inside a yeast chromosome. We observed a very high rate of small insertions and deletions within the repeats. We also found frequent gross chromosome rearrangements, including deletions, duplications, inversions, translocations, and formation of acentric minichromosomes. The inversions are a unique class of chromosome rearrangement involving an interaction between the ITS and the true telomere of the chromosome. Because we previously found that Ytel repeats cause strong replication fork stalling, we suggest that formation of double-stranded DNA breaks within the Ytel sequences might be responsible for these gross chromosome rearrangements. PMID:24191060

Aksenova, Anna Y; Greenwell, Patricia W; Dominska, Margaret; Shishkin, Alexander A; Kim, Jane C; Petes, Thomas D; Mirkin, Sergei M

2013-12-01

349

Complete genome sequence of Desulfotomaculum acetoxidans type strain (5575T)  

SciTech Connect

Desulfotomaculum acetoxidans Widdel and Pfennig 1977 was one of the first sulfate-reducing bacteria known to grow with acetate as sole energy and carbon source. It is able to oxidize substrates completely to carbon dioxide with sulfate as the electron acceptor, which is reduced to hydrogen sulfide. All available data about this species are based on strain 5575T, isolated from piggery waste in Germany. Here we describe the features of this organism, together with the complete genome sequence and annotation. This is the first completed genome sequence of a Desulfotomaculum species with validly published name. The 4,545,624 bp long single replicon genome with its 4370 protein-coding and 100 RNA genes is a part of the Genomic Encyclopedia of Bacteria and Archaea project.

Spring, Stefan [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany]; Lapidus, Alla L. [U.S. Department of Energy, Joint Genome Institute]; Schroder, Maren [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany]; Gleim, Dorothea [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany]; Sims, David [Los Alamos National Laboratory (LANL)]; Meincke, Linda [Los Alamos National Laboratory (LANL)]; Glavina Del Rio, Tijana [U.S. Department of Energy, Joint Genome Institute]; Tice, Hope [U.S. Department of Energy, Joint Genome Institute]; Copeland, A [U.S. Department of Energy, Joint Genome Institute]; Cheng, Jan-Fang [U.S. Department of Energy, Joint Genome Institute]; Chen, Feng [U.S. Department of Energy, Joint Genome Institute]; Lucas, Susan [U.S. Department of Energy, Joint Genome Institute]; Nolan, Matt [U.S. Department of Energy, Joint Genome Institute]; Bruce, David [Los Alamos National Laboratory (LANL)]; Goodwin, Lynne A. [Los Alamos National Laboratory (LANL)]; Pitluck, Sam [U.S. Department of Energy, Joint Genome Institute]; Ivanova, N [U.S. Department of Energy, Joint Genome Institute]; Mavromatis, K [U.S. Department of Energy, Joint Genome Institute]; Mikhailova, Natalia [U.S. Department of Energy, Joint Genome Institute]; Pati, Amrita [U.S. Department of Energy, Joint Genome Institute]; Chen, Amy [U.S. Department of Energy, Joint Genome Institute]; Palaniappan, Krishna [U.S. Department of Energy, Joint Genome Institute]; Land, Miriam L [ORNL]; Hauser, Loren John [ORNL]; Chang, Yun-Juan [ORNL]; Jeffries, Cynthia [Oak Ridge National Laboratory (ORNL)]; Chain, Patrick S. G. [Lawrence Livermore National Laboratory (LLNL)]; Saunders, Elizabeth H [Los Alamos National Laboratory (LANL)]; Brettin, Tom [Los Alamos National Laboratory (LANL)]; Detter, J. Chris [U.S. Department of Energy, Joint Genome Institute]; Goker, Markus [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany]; Bristow, James [U.S. Department of Energy, Joint Genome Institute]; Eisen, Jonathan [U.S. Department of Energy, Joint Genome Institute]; Markowitz, Victor [U.S. Department of Energy, Joint Genome Institute]; Hugenholtz, Philip [U.S. Department of Energy, Joint Genome Institute]; Kyrpides, Nikos C [U.S. Department of Energy, Joint Genome Institute]; Klenk, Hans-Peter [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany]; Han, Cliff [Los Alamos National Laboratory (LANL)]

2009-01-01

350

Complete genome sequence of Acidimicrobium ferrooxidans type strain (ICPT)  

SciTech Connect

Acidimicrobium ferrooxidans (Clark and Norris 1996) is the sole and type species of the genus, which until recently was the only genus within the actinobacterial family Acidimicrobiaceae and in the order Acidomicrobiales. Rapid oxidation of iron pyrite during autotrophic growth in the absence of an enhanced CO2 concentration is characteristic for A. ferrooxidans. Here we describe the features of this organism, together with the complete genome sequence, and annotation. This is the first complete genome sequence of the order Acidomicrobiales, and the 2,158,157 bp long single replicon genome with its 2038 protein coding and 54 RNA genes is part of the Genomic Encyclopedia of Bacteria and Archaea project.

Clum, Alicia [U.S. Department of Energy, Joint Genome Institute]; Nolan, Matt [U.S. Department of Energy, Joint Genome Institute]; Lang, Elke [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany]; Glavina Del Rio, Tijana [U.S. Department of Energy, Joint Genome Institute]; Tice, Hope [U.S. Department of Energy, Joint Genome Institute]; Copeland, A [U.S. Department of Energy, Joint Genome Institute]; Cheng, Jan-Fang [U.S. Department of Energy, Joint Genome Institute]; Lucas, Susan [U.S. Department of Energy, Joint Genome Institute]; Chen, Feng [U.S. Department of Energy, Joint Genome Institute]; Bruce, David [U.S. Department of Energy, Joint Genome Institute]; Goodwin, Lynne A. [Los Alamos National Laboratory (LANL)]; Pitluck, Sam [U.S. Department of Energy, Joint Genome Institute]; Ivanova, N [U.S. Department of Energy, Joint Genome Institute]; Mavromatis, K [U.S. Department of Energy, Joint Genome Institute]; Mikhailova, Natalia [U.S. Department of Energy, Joint Genome Institute]; Pati, Amrita [U.S. Department of Energy, Joint Genome Institute]; Chen, Amy [U.S. Department of Energy, Joint Genome Institute]; Palaniappan, Krishna [U.S. Department of Energy, Joint Genome Institute]; Goker, Markus [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany]; Spring, Stefan [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany]; Land, Miriam L [ORNL]; Hauser, Loren John [ORNL]; Chang, Yun-Juan [ORNL]; Jeffries, Cynthia [Oak Ridge National Laboratory (ORNL)]; Chain, Patrick S. G. [Lawrence Livermore National Laboratory (LLNL)]; Bristow, James [U.S. Department of Energy, Joint Genome Institute]; Eisen, Jonathan [U.S. Department of Energy, Joint Genome Institute]; Markowitz, Victor [U.S. Department of Energy, Joint Genome Institute]; Hugenholtz, Philip [U.S. Department of Energy, Joint Genome Institute]; Kyrpides, Nikos C [U.S. Department of Energy, Joint Genome Institute]; Klenk, Hans-Peter [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany]; Lapidus, Alla L. [U.S. Department of Energy, Joint Genome Institute]

2009-01-01

351

Draft genome sequence of the rubber tree Hevea brasiliensis  

PubMed Central

Background Hevea brasiliensis, a member of the Euphorbiaceae family, is the major commercial source of natural rubber (NR). NR is a latex polymer with high elasticity, flexibility, and resilience that has played a critical role in the world economy since 1876. Results Here, we report the draft genome sequence of H. brasiliensis. The assembly spans ~1.1 Gb of the estimated 2.15 Gb haploid genome. Overall, ~78% of the genome was identified as repetitive DNA. Gene prediction shows 68,955 gene models, of which 12.7% are unique to Hevea. Most of the key genes associated with rubber biosynthesis, rubberwood formation, disease resistance, and allergenicity have been identified. Conclusions The knowledge gained from this genome sequence will aid in the future development of high-yielding clones to keep up with the ever increasing need for natural rubber.

2013-01-01

352

Complete genome sequence of Truepera radiovictrix type strain (RQ-24).  

PubMed

Truepera radiovictrix Albuquerque et al. 2005 is the type species of the genus Truepera within the phylum "Deinococcus/Thermus". T. radiovictrix is of special interest not only because of its isolated phylogenetic location in the order Deinococcales, but also because of its ability to grow under multiple extreme conditions in alkaline, moderately saline, and high temperature habitats. Of particular interest is the fact that T. radiovictrix is also remarkably resistant to ionizing radiation, a feature it shares with members of the genus Deinococcus. This is the first completed genome sequence of a member of the family Trueperaceae and the fourth type strain genome sequence from a member of the order Deinococcales. The 3,260,398 bp long genome with its 2,994 protein-coding and 52 RNA genes consists of one circular chromosome and is a part of the Genomic Encyclopedia of Bacteria and Archaea project. PMID:21475591

Ivanova, Natalia; Rohde, Christine; Munk, Christine; Nolan, Matt; Lucas, Susan; Del Rio, Tijana Glavina; Tice, Hope; Deshpande, Shweta; Cheng, Jan-Fang; Tapia, Roxane; Han, Cliff; Goodwin, Lynne; Pitluck, Sam; Liolios, Konstantinos; Mavromatis, Konstantinos; Mikhailova, Natalia; Pati, Amrita; Chen, Amy; Palaniappan, Krishna; Land, Miriam; Hauser, Loren; Chang, Yun-Juan; Jeffries, Cynthia D; Brambilla, Evelyne; Rohde, Manfred; Göker, Markus; Tindall, Brian J; Woyke, Tanja; Bristow, James; Eisen, Jonathan A; Markowitz, Victor; Hugenholtz, Philip; Kyrpides, Nikos C; Klenk, Hans-Peter; Lapidus, Alla

2011-01-01

353

Complete genome sequence of Arthrobacter sp. strain FB24  

SciTech Connect

Arthrobacter sp. strain FB24 is a species in the genus Arthrobacter Conn and Dimmick 1947, in the family Micrococcaceae and class Actinobacteria. A number of Arthrobacter genome sequences have been completed because of their important role in soil, especially bioremediation. This isolate is of special interest because it is tolerant to multiple metals and it is extremely resistant to elevated concentrations of chromate. The genome consists of a 4,698,945 bp circular chromosome and three plasmids (96,488, 115,507, and 159,536 bp, a total of 5,070,478 bp), coding 4,536 proteins of which 1,257 are without known function. This genome was sequenced as part of the DOE Joint Genome Institute Program.

Nakatsu, C. H.; Barabote, Ravi; Thompson, Sue; Bruce, David; Detter, Chris; Brettin, T.; Han, Cliff F.; Beasley, Federico; Chen, Weimin; Konopka, Allan; Xie, Gary

2013-09-30

354

Complete genome sequence of Meiothermus ruber type strain (21T)  

SciTech Connect

Meiothermus ruber (Loginova et al. 1984) Nobre et al. 1996 is the type species of the genus Meiothermus. This thermophilic genus is of special interest, as its members can be affiliated to either low-temperature or high-temperature groups. The temperature related split is in accordance with the chemotaxonomic feature of the polar lipids. M. ruber is a representative of the low-temperature group. This is the first completed genome sequence of the genus Meiothermus and only the third genome sequence to be published from a member of the family Thermaceae. The 3,097,457 bp long genome with its 3,052 protein-coding and 53 RNA genes is a part of the Genomic Encyclopedia of Bacteria and Archaea project.

Tindall, Brian [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany; Sikorski, Johannes [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany; Lucas, Susan [U.S. Department of Energy, Joint Genome Institute; Goltsman, Eugene [U.S. Department of Energy, Joint Genome Institute; Copeland, A [U.S. Department of Energy, Joint Genome Institute; Glavina Del Rio, Tijana [U.S. Department of Energy, Joint Genome Institute; Nolan, Matt [U.S. Department of Energy, Joint Genome Institute; Tice, Hope [U.S. Department of Energy, Joint Genome Institute; Cheng, Jan-Fang [U.S. Department of Energy, Joint Genome Institute; Han, Cliff [Los Alamos National Laboratory (LANL); Pitluck, Sam [U.S. Department of Energy, Joint Genome Institute; Liolios, Konstantinos [U.S. Department of Energy, Joint Genome Institute; Ivanova, N [U.S. Department of Energy, Joint Genome Institute; Mavromatis, K [U.S. Department of Energy, Joint Genome Institute; Ovchinnikova, Galina [U.S. Department of Energy, Joint Genome Institute; Pati, Amrita [U.S. Department of Energy, Joint Genome Institute; Fahnrich, Regine [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany; Goodwin, Lynne A. [Los Alamos National Laboratory (LANL); Chen, Amy [U.S. Department of Energy, Joint Genome Institute; Palaniappan, Krishna [U.S. Department of Energy, Joint Genome Institute; Land, Miriam L [ORNL; Hauser, Loren John [ORNL; Chang, Yun-Juan [ORNL; Jeffries, Cynthia [Oak Ridge National Laboratory (ORNL); Rohde, Manfred [HZI - Helmholtz Centre for Infection Research, Braunschweig, Germany; Goker, Markus [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany; Woyke, Tanja [U.S. Department of Energy, Joint Genome Institute; Bristow, James [U.S. Department of Energy, Joint Genome Institute; Eisen, Jonathan [U.S. Department of Energy, Joint Genome Institute; Markowitz, Victor [U.S. 
Department of Energy, Joint Genome Institute; Hugenholtz, Philip [U.S. Department of Energy, Joint Genome Institute; Kyrpides, Nikos C [U.S. Department of Energy, Joint Genome Institute; Klenk, Hans-Peter [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany; Lapidus, Alla L. [U.S. Department of Energy, Joint Genome Institute

2010-01-01

355

Complete genome sequence of Arcobacter nitrofigilis type strain (CIT)  

SciTech Connect

Arcobacter nitrofigilis (McClung et al. 1983) Vandamme et al. 1991 is the type species of the genus Arcobacter in the epsilonproteobacterial family Campylobacteraceae. The species was first described in 1983 as Campylobacter nitrofigilis [1] after its detection as a free-living, nitrogen-fixing Campylobacter species associated with Spartina alterniflora Loisel. roots [2]. It is of phylogenetic interest because of its lifestyle as a symbiotic organism in a marine environment, in contrast to many other Arcobacter species, which are associated with warm-blooded animals and tend to be pathogenic. Here we describe the features of this organism, together with the complete genome sequence and annotation. This is the first complete genome sequence of a type strain of the genus Arcobacter. The 3,192,235 bp genome with its 3,154 protein-coding and 70 RNA genes is part of the Genomic Encyclopedia of Bacteria and Archaea project.

Pati, Amrita [U.S. Department of Energy, Joint Genome Institute; Gronow, Sabine [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany; Lapidus, Alla L. [U.S. Department of Energy, Joint Genome Institute; Copeland, A [U.S. Department of Energy, Joint Genome Institute; Glavina Del Rio, Tijana [U.S. Department of Energy, Joint Genome Institute; Nolan, Matt [U.S. Department of Energy, Joint Genome Institute; Lucas, Susan [U.S. Department of Energy, Joint Genome Institute; Tice, Hope [U.S. Department of Energy, Joint Genome Institute; Cheng, Jan-Fang [U.S. Department of Energy, Joint Genome Institute; Han, Cliff [Los Alamos National Laboratory (LANL); Chertkov, Olga [Los Alamos National Laboratory (LANL); Bruce, David [Los Alamos National Laboratory (LANL); Tapia, Roxanne [Los Alamos National Laboratory (LANL); Goodwin, Lynne A. [Los Alamos National Laboratory (LANL); Pitluck, Sam [U.S. Department of Energy, Joint Genome Institute; Liolios, Konstantinos [U.S. Department of Energy, Joint Genome Institute; Ivanova, N [U.S. Department of Energy, Joint Genome Institute; Mavromatis, K [U.S. Department of Energy, Joint Genome Institute; Chen, Amy [U.S. Department of Energy, Joint Genome Institute; Palaniappan, Krishna [U.S. Department of Energy, Joint Genome Institute; Land, Miriam L [ORNL; Hauser, Loren John [ORNL; Jeffries, Cynthia [Oak Ridge National Laboratory (ORNL); Detter, J. Chris [U.S. Department of Energy, Joint Genome Institute; Rohde, Manfred [HZI - Helmholtz Centre for Infection Research, Braunschweig, Germany; Goker, Markus [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany; Bristow, James [U.S. Department of Energy, Joint Genome Institute; Eisen, Jonathan [U.S. Department of Energy, Joint Genome Institute; Markowitz, Victor [U.S. Department of Energy, Joint Genome Institute; Hugenholtz, Philip [U.S. 
Department of Energy, Joint Genome Institute; Klenk, Hans-Peter [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany; Kyrpides, Nikos C [U.S. Department of Energy, Joint Genome Institute

2010-01-01

356

959 Nematode Genomes: a semantic wiki for coordinating sequencing projects.  

PubMed

Genome sequencing has been democratized by second-generation technologies, and even small labs can sequence metazoan genomes now. In this article, we describe '959 Nematode Genomes'--a community-curated semantic wiki to coordinate the sequencing efforts of individual labs to collectively sequence 959 genomes spanning the phylum Nematoda. The main goal of the wiki is to track sequencing projects that have been proposed, are in progress, or have been completed. Wiki pages for species and strains are linked to pages for people and organizations, using machine- and human-readable metadata that users can query to see the status of their favourite worm. The site is based on the same platform that runs Wikipedia, with semantic extensions that allow the underlying taxonomy and data storage models to be maintained and updated with ease compared with a conventional database-driven web site. The wiki also provides a way to track and share preliminary data if those data are not polished enough to be submitted to the official sequence repositories. In just over a year, this wiki has already fostered new international collaborations and attracted newcomers to the enthusiastic community of nematode genomicists. www.nematodegenomes.org. PMID:22058131

Kumar, Sujai; Schiffer, Philipp H; Blaxter, Mark

2012-01-01

357

Whole genome shotgun sequencing guided by bioinformatics pipelines--an optimized approach for an established technique.  

PubMed

While the sequencing of bacterial genomes has become a routine procedure at major sequencing centers, there are still a number of genome projects at small- or medium-size facilities. For these facilities a maximum of control over sequencing, assembling and finishing is essential. At the same time, facilities have to be able to co-operate at minimum costs for the overall project. We have established a pipeline for the distributed sequencing of Alcanivorax borkumensis SK2, Azoarcus sp. BH72, Clavibacter michiganensis subsp. michiganensis NCPPB382, Sorangium cellulosum So ce56 and Xanthomonas campestris pv. vesicatoria 85-10. Our pipeline relies on standard tools (e.g. PHRED/PHRAP, CAP3 and Consed/Autofinish) wherever possible, supplementing them with new tools (BioMake and BACCardI) to achieve the aims described above. PMID:14651855

Kaiser, Olaf; Bartels, Daniela; Bekel, Thomas; Goesmann, Alexander; Kespohl, Sebastian; Pühler, Alfred; Meyer, Folker

2003-12-19

358

Complete genome sequence of the fish pathogen Flavobacterium psychrophilum  

Microsoft Academic Search

We report here the complete genome sequence of the virulent strain JIP02/86 (ATCC 49511) of Flavobacterium psychrophilum, a widely distributed pathogen of wild and cultured salmonid fish. The genome consists of a 2,861,988–base pair (bp) circular chromosome with 2,432 predicted protein-coding genes. Among these predicted proteins, stress response mediators, gliding motility proteins, adhesins and many putative secreted proteases are probably

Mekki Boussaha; Valentin Loux; Jean-François Bernardet; Christian Michel; Brigitte Kerouault; Stanislas Mondot; Pierre Nicolas; Robert Bossy; Christophe Caron; Philippe Bessières; Jean-François Gibrat; Stéphane Claverol; Fabien Dumetz; Michel Le Hénaff; Abdenour Benmansour; Eric Duchaud

2007-01-01

359

Complete genome sequence of Bacillus cereus bacteriophage BCP78.  

PubMed

Bacillus cereus is generally found in soil habitats, and it contaminates a wide variety of foods, causing food poisoning with symptoms such as vomiting and diarrhea. To develop a novel biocontrol agent to inhibit this pathogen, bacteriophage BCP78 belonging to the Siphoviridae family was isolated from a fermented food sample. Here we announce the complete genome sequence of BCP78, which may be useful for understanding its inhibition mechanism against B. cereus, and describe major findings from the genome annotation. PMID:22158847

Lee, Ju-Hoon; Shin, Hakdong; Son, Bokyung; Ryu, Sangryeol

2012-01-01

360

The complete genomic sequence of Nocardia farcinica IFM 10152  

Microsoft Academic Search

We determined the genomic sequence of Nocardia farcinica IFM 10152, a clinical isolate, and revealed the molecular basis of its versatility. The genome consists of a single circular chromosome of 6,021,225 bp with an average G+C content of 70.8% and two plasmids of 184,027 (pNF1) and 87,093 (pNF2) bp with average G+C contents of 67.2% and 68.4%, respectively. The chromosome

Jun Ishikawa; Atsushi Yamashita; Yuzuru Mikami; Yasutaka Hoshino; Haruyo Kurita; Kunimoto Hotta; Tadayoshi Shiba; Masahira Hattori

2004-01-01
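
The replicon sizes and G+C contents quoted in the Nocardia farcinica record above can be combined into a single length-weighted G+C figure for the whole genome. The following is a minimal sketch of that arithmetic; the names REPLICONS, total_length and weighted_gc are illustrative assumptions, and the combined value is derived here rather than reported in the source.

```python
# Replicon sizes (bp) and average G+C percentages as quoted in the record above.
REPLICONS = {
    "chromosome": (6_021_225, 70.8),
    "pNF1": (184_027, 67.2),
    "pNF2": (87_093, 68.4),
}


def total_length(replicons):
    """Sum the lengths (bp) of all replicons."""
    return sum(length for length, _ in replicons.values())


def weighted_gc(replicons):
    """Length-weighted average G+C percentage across all replicons."""
    gc_bp = sum(length * gc / 100.0 for length, gc in replicons.values())
    return 100.0 * gc_bp / total_length(replicons)


if __name__ == "__main__":
    print(f"total length: {total_length(REPLICONS):,} bp")
    print(f"length-weighted G+C: {weighted_gc(REPLICONS):.2f}%")
```

Because the chromosome accounts for most of the sequence, the weighted figure stays close to the chromosomal G+C content; the plasmids shift it only slightly.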

361

Genome sequence of the plant pathogen Ralstonia solanacearum  

Microsoft Academic Search

Ralstonia solanacearum is a devastating, soil-borne plant pathogen with a global distribution and an unusually wide host range. It is a model system for the dissection of molecular determinants governing pathogenicity. We present here the complete genome sequence and its analysis of strain GMI1000. The 5.8-megabase (Mb) genome is organized into two replicons: a 3.7-Mb chromosome and a 2.1-Mb megaplasmid.

M. Salanoubat; S. Genin; F. Artiguenave; J. Gouzy; S. Mangenot; M. Arlat; A. Billault; P. Brottier; J. C. Camus; L. Cattolico; M. Chandler; N. Choisne; C. Claudel-Renard; S. Cunnac; N. Demange; C. Gaspin; M. Lavie; A. Moisan; C. Robert; W. Saurin; T. Schiex; P. Siguier; P. Thébault; M. Whalen; P. Wincker; M. Levy; J. Weissenbach; C. A. Boucher

2002-01-01

362

The genome sequence of the filamentous fungus Neurospora crassa  

Microsoft Academic Search

Neurospora crassa is a central organism in the history of twentieth-century genetics, biochemistry and molecular biology. Here, we report a high-quality draft sequence of the N. crassa genome. The approximately 40-megabase genome encodes about 10,000 protein-coding genes-more than twice as many as in the fission yeast Schizosaccharomyces pombe and only about 25% fewer than in the fruitfly Drosophila melanogaster. Analysis

James E. Galagan; Sarah E. Calvo; Katherine A. Borkovich; Eric U. Selker; Nick D. Read; David Jaffe; William FitzHugh; Li-Jun Ma; Serge Smirnov; Seth Purcell; Bushra Rehman; Timothy Elkins; Reinhard Engels; Shunguang Wang; Cydney B. Nielsen; Jonathan Butler; Matthew Endrizzi; Dayong Qui; Peter Ianakiev; Deborah Bell-Pedersen; Mary Anne Nelson; Margaret Werner-Washburne; Claude P. Selitrennikoff; John A. Kinsey; Edward L. Braun; Alex Zelter; Ulrich Schulte; Gregory O. Kothe; Gregory Jedd; Werner Mewes; Chuck Staben; Edward Marcotte; David Greenberg; Alice Roy; Karen Foley; Jerome Naylor; Nicole Stange-Thomann; Robert Barrett; Sante Gnerre; Michael Kamal; Manolis Kamvysselis; Evan Mauceli; Cord Bielke; Stephen Rudd; Dmitrij Frishman; Svetlana Krystofova; Carolyn Rasmussen; Robert L. Metzenberg; David D. Perkins; Scott Kroken; Carlo Cogoni; Giuseppe Macino; David Catcheside; Weixi Li; Robert J. Pratt; Stephen A. Osmani; Colin P. C. DeSouza; Louise Glass; Marc J. Orbach; J. Andrew Berglund; Rodger Voelker; Oded Yarden; Michael Plamann; Stephan Seiler; Jay Dunlap; Alan Radford; Rodolfo Aramayo; Donald O. Natvig; Lisa A. Alex; Gertrud Mannhaupt; Daniel J. Ebbole; Michael Freitag; Ian Paulsen; Matthew S. Sachs; Eric S. Lander; Chad Nusbaum; Bruce Birren

2003-01-01

363

Contribution to Sequencing of the Deinococcus radiodurans Genome  

SciTech Connect

The stated goal of this project was to supply The Institute for Genomic Research (TIGR) with pure DNA from the bacterium Deinococcus radiodurans R1 for complete genomic sequencing by TIGR. We subsequently expanded the project to include a second goal: the development of a NotI chromosomal map of D. radiodurans R1 using Pulsed Field Gel Electrophoresis (PFGE).

Minton, K.W.

1999-03-11

364

Draft genome sequences of Actinobacillus pleuropneumoniae serotypes 2 and 6.  

PubMed

Actinobacillus pleuropneumoniae is a bacterial pathogen that causes highly contagious respiratory infection in pigs and has a serious impact on the production economy and animal welfare. As clear differences in virulence between serotypes have been observed, the genetic basis should be investigated at the genomic level. Here, we present the draft genome sequences of the A. pleuropneumoniae serotypes 2 (strain 4226) and 6 (strain Femo). PMID:20802047

Zhan, Bujie; Angen, Øystein; Hedegaard, Jakob; Bendixen, Christian; Panitz, Frank

2010-11-01

365

Draft Genome Sequences of Actinobacillus pleuropneumoniae Serotypes 2 and 6  

PubMed Central

Actinobacillus pleuropneumoniae is a bacterial pathogen that causes highly contagious respiratory infection in pigs and has a serious impact on the production economy and animal welfare. As clear differences in virulence between serotypes have been observed, the genetic basis should be investigated at the genomic level. Here, we present the draft genome sequences of the A. pleuropneumoniae serotypes 2 (strain 4226) and 6 (strain Femo).

Zhan, Bujie; Angen, Øystein; Hedegaard, Jakob; Bendixen, Christian; Panitz, Frank

2010-01-01

366

Genome sequence of the model medicinal mushroom Ganoderma lucidum.  

PubMed

Ganoderma lucidum is a widely used medicinal macrofungus in traditional Chinese medicine that creates a diverse set of bioactive compounds. Here we report its 43.3-Mb genome, encoding 16,113 predicted genes, obtained using next-generation sequencing and optical mapping approaches. The sequence analysis reveals an impressive array of genes encoding cytochrome P450s (CYPs), transporters and regulatory proteins that cooperate in secondary metabolism. The genome also encodes one of the richest sets of wood degradation enzymes among all of the sequenced basidiomycetes. In all, 24 physical CYP gene clusters are identified. Moreover, 78 CYP genes are coexpressed with lanosterol synthase, and 16 of these show high similarity to fungal CYPs that specifically hydroxylate testosterone, suggesting their possible roles in triterpenoid biosynthesis. The elucidation of the G. lucidum genome makes this organism a potential model system for the study of secondary metabolic pathways and their regulation in medicinal fungi. PMID:22735441

Chen, Shilin; Xu, Jiang; Liu, Chang; Zhu, Yingjie; Nelson, David R; Zhou, Shiguo; Li, Chunfang; Wang, Lizhi; Guo, Xu; Sun, Yongzhen; Luo, Hongmei; Li, Ying; Song, Jingyuan; Henrissat, Bernard; Levasseur, Anthony; Qian, Jun; Li, Jianqin; Luo, Xiang; Shi, Linchun; He, Liu; Xiang, Li; Xu, Xiaolan; Niu, Yunyun; Li, Qiushi; Han, Mira V; Yan, Haixia; Zhang, Jin; Chen, Haimei; Lv, Aiping; Wang, Zhen; Liu, Mingzhu; Schwartz, David C; Sun, Chao

2012-01-01

367

Genome sequence of the model medicinal mushroom Ganoderma lucidum  

PubMed Central

Ganoderma lucidum is a widely used medicinal macrofungus in traditional Chinese medicine that creates a diverse set of bioactive compounds. Here we report its 43.3-Mb genome, encoding 16,113 predicted genes, obtained using next-generation sequencing and optical mapping approaches. The sequence analysis reveals an impressive array of genes encoding cytochrome P450s (CYPs), transporters and regulatory proteins that cooperate in secondary metabolism. The genome also encodes one of the richest sets of wood degradation enzymes among all of the sequenced basidiomycetes. In all, 24 physical CYP gene clusters are identified. Moreover, 78 CYP genes are coexpressed with lanosterol synthase, and 16 of these show high similarity to fungal CYPs that specifically hydroxylate testosterone, suggesting their possible roles in triterpenoid biosynthesis. The elucidation of the G. lucidum genome makes this organism a potential model system for the study of secondary metabolic pathways and their regulation in medicinal fungi.

Chen, Shilin; Xu, Jiang; Liu, Chang; Zhu, Yingjie; Nelson, David R.; Zhou, Shiguo; Li, Chunfang; Wang, Lizhi; Guo, Xu; Sun, Yongzhen; Luo, Hongmei; Li, Ying; Song, Jingyuan; Henrissat, Bernard; Levasseur, Anthony; Qian, Jun; Li, Jianqin; Luo, Xiang; Shi, Linchun; He, Liu; Xiang, Li; Xu, Xiaolan; Niu, Yunyun; Li, Qiushi; Han, Mira V.; Yan, Haixia; Zhang, Jin; Chen, Haimei; Lv, Aiping; Wang, Zhen; Liu, Mingzhu; Schwartz, David C.; Sun, Chao

2012-01-01

368

Whole-genome sequencing for optimized patient management.  

PubMed

Whole-genome sequencing of patient DNA can facilitate diagnosis of a disease, but its potential for guiding treatment has been under-realized. We interrogated the complete genome sequences of a 14-year-old fraternal twin pair diagnosed with dopa (3,4-dihydroxyphenylalanine)-responsive dystonia (DRD; Mendelian Inheritance in Man #128230). DRD is a genetically heterogeneous and clinically complex movement disorder that is usually treated with l-dopa, a precursor of the neurotransmitter dopamine. Whole-genome sequencing identified compound heterozygous mutations in the SPR gene encoding sepiapterin reductase. Disruption of SPR causes a decrease in tetrahydrobiopterin, a cofactor required for the hydroxylase enzymes that synthesize the neurotransmitters dopamine and serotonin. Supplementation of l-dopa therapy with 5-hydroxytryptophan, a serotonin precursor, resulted in clinical improvements in both twins. PMID:21677200

Bainbridge, Matthew N; Wiszniewski, Wojciech; Murdock, David R; Friedman, Jennifer; Gonzaga-Jauregui, Claudia; Newsham, Irene; Reid, Jeffrey G; Fink, John K; Morgan, Margaret B; Gingras, Marie-Claude; Muzny, Donna M; Hoang, Linh D; Yousaf, Shahed; Lupski, James R; Gibbs, Richard A

2011-06-15

369

Analysis of Complete Genome Sequences of Human Rhinovirus  

PubMed Central

Human Rhinovirus (HRV) infection is the cause of about one-half of asthma and COPD exacerbations. With >100 serotypes in the HRV reference set, an effort was undertaken to sequence their complete genomes so as to understand diversity, structural variation, and evolution of the virus. Analysis revealed conserved motifs, hypervariable regions, a potential fourth HRV species, within-serotype variation in field isolates, a non-scanning internal ribosome entry site, and evidence for HRV recombination. Techniques have now been developed using next generation sequencing to generate complete genomes from patient isolates with high throughput, deep coverage, and low costs. Thus relationships can now be sought between obstructive lung phenotypes and variation in HRV genomes in infected patients, and potential novel therapeutic strategies can be developed based on HRV sequence.

Palmenberg, Ann C.; Rathe, Jennifer A.; Liggett, Stephen B.

2010-01-01

370

Sequence complexity profiles of prokaryotic genomic sequences: A fast algorithm for calculating linguistic complexity  

Microsoft Academic Search

Motivation: One of the major features of genomic DNA sequences, distinguishing them from texts in most spoken or artificial languages, is their high repetitiveness. Variation in the repetitiveness of genomic texts reflects the presence and density of different biologically important messages. Thus, deviation from an expected number of repeats in both directions indicates a possible presence of a biological signal.

Olga G. Troyanskaya; Ora Arbell; Yair Koren; Gad M. Landau; Alexander Bolshoy

2002-01-01
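
The linguistic complexity measure named in the title of the record above can be illustrated with a short sketch. It assumes one commonly used definition, the ratio of the number of distinct substrings present in a sequence to the maximum number possible for a sequence of that length and alphabet size. This is a naive quadratic illustration for short windows, not the fast algorithm the authors describe, and the function name and parameters are hypothetical.

```python
def linguistic_complexity(seq: str, alphabet_size: int = 4) -> float:
    """Naive linguistic complexity of seq (assumed definition, see lead-in).

    For every substring length k, count the distinct substrings observed and
    compare with the maximum possible: min(alphabet_size**k, n - k + 1).
    """
    n = len(seq)
    if n == 0:
        raise ValueError("sequence must not be empty")
    observed = 0
    maximum = 0
    for k in range(1, n + 1):
        distinct = {seq[i:i + k] for i in range(n - k + 1)}
        observed += len(distinct)
        maximum += min(alphabet_size ** k, n - k + 1)
    return observed / maximum


if __name__ == "__main__":
    for window in ("ACGT", "ACGTACGT", "AAAA"):
        print(window, round(linguistic_complexity(window), 3))
```

Highly repetitive windows score lower than windows with no repeated substrings, which is the property the abstract relies on when it treats repetitiveness as a possible signal.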

371

Complete Genome Sequence of Rickettsia typhi and Comparison with Sequences of Other Rickettsiae  

Microsoft Academic Search

Rickettsia typhi, the causative agent of murine typhus, is an obligate intracellular bacterium with a life cycle involving both vertebrate and invertebrate hosts. Here we present the complete genome sequence of R. typhi (1,111,496 bp) and compare it to the two published rickettsial genome sequences: R. prowazekii and R. conorii. We identified 877 genes in R. typhi encoding 3 rRNAs,

Michael P. McLeod; Xiang Qin; Sandor E. Karpathy; Jason Gioia; Sarah K. Highlander; George E. Fox; Thomas Z. McNeill; Huaiyang Jiang; Donna Muzny; Leni S. Jacob; Alicia C. Hawes; Erica Sodergren; Rachel Gill; Jennifer Hume; Maggie Morgan; Guangwei Fan; Anita G. Amin; Richard A. Gibbs; Chao Hong; Xue-jie Yu; David H. Walker; George M. Weinstock

372

The genome sequence of the colonial chordate, Botryllus schlosseri  

PubMed Central

Botryllus schlosseri is a colonial urochordate that follows the chordate plan of development following sexual reproduction, but invokes a stem cell-mediated budding program during subsequent rounds of asexual reproduction. As urochordates are considered to be the closest living invertebrate relatives of vertebrates, they are ideal subjects for whole genome sequence analyses. Using a novel method for high-throughput sequencing of eukaryotic genomes, we sequenced and assembled 580 Mbp of the B. schlosseri genome. The genome assembly is comprised of nearly 14,000 intron-containing predicted genes, and 13,500 intron-less predicted genes, 40% of which could be confidently parceled into 13 (of 16 haploid) chromosomes. A comparison of homologous genes between B. schlosseri and other diverse taxonomic groups revealed genomic events underlying the evolution of vertebrates and lymphoid-mediated immunity. The B. schlosseri genome is a community resource for studying alternative modes of reproduction, natural transplantation reactions, and stem cell-mediated regeneration. DOI: http://dx.doi.org/10.7554/eLife.00569.001

Voskoboynik, Ayelet; Neff, Norma F; Sahoo, Debashis; Newman, Aaron M; Pushkarev, Dmitry; Koh, Winston; Passarelli, Benedetto; Fan, H Christina; Mantalas, Gary L; Palmeri, Karla J; Ishizuka, Katherine J; Gissi, Carmela; Griggio, Francesca; Ben-Shlomo, Rachel; Corey, Daniel M; Penland, Lolita; White, Richard A; Weissman, Irving L; Quake, Stephen R

2013-01-01

373

Sequence analysis of the genome of Bombyx mori nucleopolyhedrovirus.  

PubMed

The genome of the nucleopolyhedrovirus (NPV) (T3 strain) pathogenic for Bombyx mori (Bm) was sequenced and analysed. The BmNPV genome was 128,413 nucleotides long with a G+C content of 40% and contained 136 open reading frames (ORFs) encoding predicted proteins of over 60 amino acids. Although phenotypically different, the genome organizations of BmNPV and Autographa californica multinucleocapsid NPV (AcMNPV) were closely related. The BmNPV genome was over 90% identical to about three-quarters of the genome of AcMNPV. The relatedness of predicted amino acid sequences of corresponding ORFs between BmNPV and AcMNPV was about 90%. However, the BmNPV genome lacked homologues of the following AcMNPV ORFs: Ac3 (conotoxin), Ac7 (orf603), Ac48 (etm), Ac49 (pcna), Ac70 (hcf-1), Ac86 (pnk/pnl) and Ac134 (p94). In addition, BmNPV contained five ORFs related to Ac2. A high frequency of multiple 3 bp insertions was also found within BmNPV and AcMNPV coding sequences. PMID:10355780

Gomi, S; Majima, K; Maeda, S

1999-05-01
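
The record above counts open reading frames (ORFs) encoding predicted proteins of over 60 amino acids. A minimal sketch of that kind of length-filtered ORF scan follows. It makes several simplifying assumptions that are not taken from the source: only the forward strand is scanned, an ORF runs from the first ATG in a frame to the next in-frame stop codon of the standard genetic code, and the function name and return format are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code stop codons


def find_orfs(seq: str, min_codons: int = 61):
    """Return (start, end, frame) tuples for forward-strand ORFs.

    An ORF starts at the first ATG seen in a frame and ends at the next
    in-frame stop codon. Only ORFs encoding at least min_codons amino acids
    (stop codon excluded) are kept; the default of 61 means "over 60".
    Coordinates are 0-based, and end is exclusive and includes the stop codon.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                n_codons = (i - start) // 3  # amino acids encoded
                if n_codons >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None
    return orfs


if __name__ == "__main__":
    toy = "ATG" + "GCT" * 60 + "TAA"  # toy sequence encoding 61 amino acids
    print(find_orfs(toy))
```

A real annotation pipeline would also scan the reverse complement and apply further evidence before calling a gene; this sketch only shows how a minimum-length threshold filters candidate ORFs.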

374

Complete genome sequence of Haliscomenobacter hydrossis type strain (OT)  

SciTech Connect

Haliscomenobacter hydrossis van Veen et al. 1973 is the type species of the genus Haliscomenobacter, which belongs to order 'Sphingobacteriales'. The species is of interest because of its isolated phylogenetic location in the tree of life, especially the so far genomically uncharted part of it, and because the organism grows in a thin, hardly visible hyaline sheath. Members of the species were isolated from fresh water of lakes and from ditch water. The genome of H. hydrossis is the first completed genome sequence reported from a member of the family 'Saprospiraceae'. The 8,771,651 bp long genome with its three plasmids of 92 kbp, 144 kbp and 164 kbp length contains 6,848 protein-coding and 60 RNA genes, and is a part of the Genomic Encyclopedia of Bacteria and Archaea project.

Daligault, Hajnalka E. [Los Alamos National Laboratory (LANL); Lapidus, Alla L. [U.S. Department of Energy, Joint Genome Institute; Zeytun, Ahmet [Los Alamos National Laboratory (LANL); Nolan, Matt [U.S. Department of Energy, Joint Genome Institute; Lucas, Susan [U.S. Department of Energy, Joint Genome Institute; Glavina Del Rio, Tijana [U.S. Department of Energy, Joint Genome Institute; Tice, Hope [U.S. Department of Energy, Joint Genome Institute; Cheng, Jan-Fang [U.S. Department of Energy, Joint Genome Institute; Tapia, Roxanne [Los Alamos National Laboratory (LANL); Han, Cliff [Los Alamos National Laboratory (LANL); Goodwin, Lynne A. [Los Alamos National Laboratory (LANL); Pitluck, Sam [U.S. Department of Energy, Joint Genome Institute; Liolios, Konstantinos [U.S. Department of Energy, Joint Genome Institute; Pagani, Ioanna [U.S. Department of Energy, Joint Genome Institute; Ivanova, N [U.S. Department of Energy, Joint Genome Institute; Huntemann, Marcel [U.S. Department of Energy, Joint Genome Institute; Mavromatis, K [U.S. Department of Energy, Joint Genome Institute; Mikhailova, Natalia [U.S. Department of Energy, Joint Genome Institute; Pati, Amrita [U.S. Department of Energy, Joint Genome Institute; Chen, Amy [U.S. Department of Energy, Joint Genome Institute; Palaniappan, Krishna [U.S. Department of Energy, Joint Genome Institute; Land, Miriam L [ORNL; Hauser, Loren John [ORNL; Brambilla, Evelyne-Marie [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany; Rohde, Manfred [HZI - Helmholtz Centre for Infection Research, Braunschweig, Germany; Verbarg, Susanne [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany; Goker, Markus [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany; Bristow, James [U.S. Department of Energy, Joint Genome Institute; Eisen, Jonathan [U.S. Department of Energy, Joint Genome Institute; Markowitz, Victor [U.S. 
Department of Energy, Joint Genome Institute; Hugenholtz, Philip [U.S. Department of Energy, Joint Genome Institute; Kyrpides, Nikos C [U.S. Department of Energy, Joint Genome Institute; Klenk, Hans-Peter [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany; Woyke, Tanja [U.S. Department of Energy, Joint Genome Institute

2011-01-01

375

Complete genome sequence of Haliscomenobacter hydrossis type strain (OT)  

PubMed Central

Haliscomenobacter hydrossis van Veen et al. 1973 is the type species of the genus Haliscomenobacter, which belongs to order "Sphingobacteriales". The species is of interest because of its isolated phylogenetic location in the tree of life, especially the so far genomically uncharted part of it, and because the organism grows in a thin, hardly visible hyaline sheath. Members of the species were isolated from fresh water of lakes and from ditch water. The genome of H. hydrossis is the first completed genome sequence reported from a member of the family "Saprospiraceae". The 8,771,651 bp long genome with its three plasmids of 92 kbp, 144 kbp and 164 kbp length contains 6,848 protein-coding and 60 RNA genes, and is a part of the Genomic Encyclopedia of Bacteria and Archaea project.

Daligault, Hajnalka; Lapidus, Alla; Zeytun, Ahmet; Nolan, Matt; Lucas, Susan; Del Rio, Tijana Glavina; Tice, Hope; Cheng, Jan-Fang; Tapia, Roxanne; Han, Cliff; Goodwin, Lynne; Pitluck, Sam; Liolios, Konstantinos; Pagani, Ioanna; Ivanova, Natalia; Huntemann, Marcel; Mavromatis, Konstantinos; Mikhailova, Natalia; Pati, Amrita; Chen, Amy; Palaniappan, Krishna; Land, Miriam; Hauser, Loren; Brambilla, Evelyne-Marie; Rohde, Manfred; Verbarg, Susanne; Goker, Markus; Bristow, James; Eisen, Jonathan A.; Markowitz, Victor; Hugenholtz, Philip; Kyrpides, Nikos C.; Klenk, Hans-Peter; Woyke, Tanja

2011-01-01

376

Low-pass sequencing for microbial comparative genomics  

PubMed Central

Background We studied four extremely halophilic archaea by low-pass shotgun sequencing: (1) the metabolically versatile Haloarcula marismortui; (2) the non-pigmented Natrialba asiatica; (3) the psychrophile Halorubrum lacusprofundi and (4) the Dead Sea isolate Halobaculum gomorrense. Approximately one thousand single pass genomic sequences per genome were obtained. The data were analyzed by comparative genomic analyses using the completed Halobacterium sp. NRC-1 genome as a reference. Low-pass shotgun sequencing is a simple, inexpensive, and rapid approach that can readily be performed on any cultured microbe. Results As expected, the four archaeal halophiles analyzed exhibit both bacterial and eukaryotic characteristics as well as uniquely archaeal traits. All five halophiles exhibit greater than sixty percent GC content and low isoelectric points (pI) for their predicted proteins. Multiple insertion sequence (IS) elements, often involved in genome rearrangements, were identified in H. lacusprofundi and H. marismortui. The core biological functions that govern cellular and genetic mechanisms of H. sp. NRC-1 appear to be conserved in these four other halophiles. Multiple TATA box binding protein (TBP) and transcription factor IIB (TFB) homologs were identified from most of the four shotgunned halophiles. The reconstructed molecular tree of all five halophiles shows a large divergence between these species, but with the closest relationship being between H. sp. NRC-1 and H. lacusprofundi. Conclusion Despite the diverse habitats of these species, all five halophiles share (1) high GC content and (2) low protein isoelectric points, which are characteristics associated with environmental exposure to UV radiation and hypersalinity, respectively. Identification of multiple IS elements in the genome of H. lacusprofundi and H. marismortui suggests that genome structure and dynamic genome reorganization might be similar to that previously observed in the IS-element rich genome of H. sp. NRC-1. Identification of multiple TBP and TFB homologs in these four halophiles is consistent with the hypothesis that different types of complex transcriptional regulation may occur through multiple TBP-TFB combinations in response to rapidly changing environmental conditions. Low-pass shotgun sequence analyses of genomes permit extensive and diverse analyses, and should be generally useful for comparative microbial genomics.

Goo, Young Ah; Roach, Jared; Glusman, Gustavo; Baliga, Nitin S; Deutsch, Kerry; Pan, Min; Kennedy, Sean; DasSarma, Shiladitya; Victor Ng, Wailap; Hood, Leroy

2004-01-01

377

Sequencing and Analysis of Neanderthal Genomic DNA  

Microsoft Academic Search

Our knowledge of Neanderthals is based on a limited number of remains and artifacts from which we must make inferences about their biology, behavior, and relationship to ourselves. Here, we describe the characterization of these extinct hominids from a new perspective, based on the development of a Neanderthal metagenomic library and its high-throughput sequencing and analysis. Several lines of evidence

Noonan James P; Coop Graham; Kudaravalli Sridhar; Smith Doug; Krause Johannes; Alessi Joe; Chen Feng; Platt Darren; Pääbo Svante; Pritchard Jonathan K; Edward M. Rubin

2006-01-01

378

Draft genome sequence of Bacillus endophyticus 2102.  

PubMed

Bacillus endophyticus 2102 is an endospore-forming, plant growth-promoting rhizobacterium isolated from a hypersaline pond in South Korea. Here we present the draft sequence of B. endophyticus 2102, which is of interest because of its potential use in the industrial production of algaecides and bioplastics and for the treatment of industrial textile effluents. PMID:23012284

Lee, Yong-Jik; Lee, Sang-Jae; Kim, Sun Hong; Lee, Sang Jun; Kim, Byoung-Chan; Lee, Han-Seung; Jeong, Haeyoung; Lee, Dong-Woo

2012-10-01

379

Draft Genome Sequence of Bacillus endophyticus 2102  

PubMed Central

Bacillus endophyticus 2102 is an endospore-forming, plant growth-promoting rhizobacterium isolated from a hypersaline pond in South Korea. Here we present the draft sequence of B. endophyticus 2102, which is of interest because of its potential use in the industrial production of algaecides and bioplastics and for the treatment of industrial textile effluents.

Lee, Yong-Jik; Lee, Sang-Jae; Kim, Sun Hong; Lee, Sang Jun; Kim, Byoung-Chan; Lee, Han-Seung

2012-01-01

380

Complete genome sequence of Riemerella anatipestifer type strain (ATCC 11845).  

PubMed

Riemerella anatipestifer (Hendrickson and Hilbert 1932) Segers et al. 1993 is the type species of the genus Riemerella, which belongs to the family Flavobacteriaceae. The species is of interest because of the position of the genus in the phylogenetic tree and because of its role as a pathogen of commercially important avian species worldwide. This is the first completed genome sequence of a member of the genus Riemerella. The 2,155,121 bp long genome with its 2,001 protein-coding and 51 RNA genes consists of one circular chromosome and is a part of the Genomic Encyclopedia of Bacteria and Archaea project. PMID:21677851

Mavromatis, Konstantinos; Lu, Megan; Misra, Monica; Lapidus, Alla; Nolan, Matt; Lucas, Susan; Hammon, Nancy; Deshpande, Shweta; Cheng, Jan-Fang; Tapia, Roxane; Han, Cliff; Goodwin, Lynne; Pitluck, Sam; Liolios, Konstantinos; Pagani, Ioanna; Ivanova, Natalia; Mikhailova, Natalia; Pati, Amrita; Chen, Amy; Palaniappan, Krishna; Land, Miriam; Hauser, Loren; Jeffries, Cynthia D; Detter, John C; Brambilla, Evelyne-Marie; Rohde, Manfred; Göker, Markus; Gronow, Sabine; Woyke, Tanja; Bristow, James; Eisen, Jonathan A; Markowitz, Victor; Hugenholtz, Philip; Klenk, Hans-Peter; Kyrpides, Nikos C

2011-04-29

381

Complete genome sequence of Cronobacter sakazakii bacteriophage CR3.  

PubMed

Due to the high risk of Cronobacter sakazakii infection in infants fed powdered milk formula and the emergence of antibiotic-resistant strains, an alternative biocontrol agent using bacteriophage is needed to control this pathogen. To further the development of such an agent, the C. sakazakii-targeting bacteriophage CR3 was isolated and its genome was completely sequenced. Here, we announce the genomic analysis results of the largest C. sakazakii phage known to date and report the major findings from the genome annotation. PMID:22570242

Shin, Hakdong; Lee, Ju-Hoon; Kim, Yeran; Ryu, Sangryeol

2012-06-01

382

Genome sequence and description of Aeromicrobium massiliense sp. nov.  

PubMed Central

Aeromicrobium massiliense strain JC14T sp. nov. is the type strain of Aeromicrobium massiliense sp. nov., a new species within the genus Aeromicrobium. This strain, whose genome is described here, was isolated from the fecal microbiota of an asymptomatic patient. Aeromicrobium massiliense is an aerobic rod-shaped gram-positive bacterium. Here we describe the features of this organism, together with the complete genome sequence and annotation. The 3,322,119 bp long genome contains 3,296 protein-coding and 51 RNA genes.

Ramasamy, Dhamodharan; Kokcha, Sahare; Lagier, Jean-Christophe; Nguyen, Thi-Thien; Raoult, Didier

2012-01-01

383

Complete genome sequence of Pyrolobus fumarii type strain (1AT)  

SciTech Connect

Pyrolobus fumarii Blöchl et al. 1997 is the type species of the genus Pyrolobus, which belongs to the crenarchaeal family Pyrodictiaceae. The species is a facultatively microaerophilic non-motile crenarchaeon. It is of interest because of its isolated phylogenetic location in the tree of life and because it is a hyperthermophilic chemolithoautotroph known as the primary producer of organic matter at deep-sea hydrothermal vents. P. fumarii exhibits currently the highest optimal growth temperature of all life forms on earth (106 °C). This is the first completed genome sequence of a member of the genus Pyrolobus to be published and only the second genome sequence from a member of the family Pyrodictiaceae. Although Diversa Corporation announced the completion of sequencing of the P. fumarii genome on September 25, 2001, this sequence was never released to the public. The 1,843,267 bp long genome with its 1,986 protein-coding and 52 RNA genes is a part of the Genomic Encyclopedia of Bacteria and Archaea project.

Anderson, Iain [U.S. Department of Energy, Joint Genome Institute]; Goker, Markus [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany]; Nolan, Matt [U.S. Department of Energy, Joint Genome Institute]; Lucas, Susan [U.S. Department of Energy, Joint Genome Institute]; Hammon, Nancy [U.S. Department of Energy, Joint Genome Institute]; Deshpande, Shweta [U.S. Department of Energy, Joint Genome Institute]; Cheng, Jan-Fang [U.S. Department of Energy, Joint Genome Institute]; Tapia, Roxanne [Los Alamos National Laboratory (LANL)]; Han, Cliff [Los Alamos National Laboratory (LANL)]; Goodwin, Lynne A. [Los Alamos National Laboratory (LANL)]; Pitluck, Sam [U.S. Department of Energy, Joint Genome Institute]; Huntemann, Marcel [U.S. Department of Energy, Joint Genome Institute]; Liolios, Konstantinos [U.S. Department of Energy, Joint Genome Institute]; Ivanova, N [U.S. Department of Energy, Joint Genome Institute]; Pagani, Ioanna [U.S. Department of Energy, Joint Genome Institute]; Mavromatis, K [U.S. Department of Energy, Joint Genome Institute]; Ovchinnikova, Galina [U.S. Department of Energy, Joint Genome Institute]; Pati, Amrita [U.S. Department of Energy, Joint Genome Institute]; Chen, Amy [U.S. Department of Energy, Joint Genome Institute]; Palaniappan, Krishna [U.S. Department of Energy, Joint Genome Institute]; Land, Miriam L [ORNL]; Hauser, Loren John [ORNL]; Brambilla, Evelyne-Marie [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany]; Huber, Harald [Universitat Regensburg, Regensburg, Germany]; Yasawong, Montri [HZI - Helmholtz Centre for Infection Research, Braunschweig, Germany]; Rohde, Manfred [HZI - Helmholtz Centre for Infection Research, Braunschweig, Germany]; Spring, Stefan [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany]; Abt, Birte [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany]; Sikorski, Johannes [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany]; Wirth, Reinhard [Universitat Regensburg, Regensburg, Germany]; Detter, J. Chris [U.S. Department of Energy, Joint Genome Institute]; Woyke, Tanja [U.S. Department of Energy, Joint Genome Institute]; Bristow, James [U.S. Department of Energy, Joint Genome Institute]; Eisen, Jonathan [U.S. Department of Energy, Joint Genome Institute]; Markowitz, Victor [U.S. Department of Energy, Joint Genome Institute]; Hugenholtz, Philip [U.S. Department of Energy, Joint Genome Institute]; Kyrpides, Nikos C [U.S. Department of Energy, Joint Genome Institute]; Klenk, Hans-Peter [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany]; Lapidus, Alla L. [U.S. Department of Energy, Joint Genome Institute]

2011-01-01

384

Sequence analysis of the Xestia c-nigrum granulovirus genome.  

PubMed

The nucleotide sequence of the Xestia c-nigrum granulovirus (XcGV) genome was determined and found to comprise 178,733 bases with a G+C content of 40.7%. It contained 181 putative genes of 150 nucleotides or greater that showed minimal overlap. Eighty-four of these putative genes, which collectively accounted for 43% of the genome, are homologs of genes previously identified in the Autographa californica multinucleocapsid nucleopolyhedrovirus (AcMNPV) genome. These homologs showed on average 33% amino acid sequence identity to those from AcMNPV. Several genes reported to have major roles in AcMNPV biology including ie-2, gp64, and egt were not found in the XcGV genome. However, open reading frames with homology to DNA ligase, two DNA helicases (one similar to a yeast mitochondrial helicase and the other to a putative AcMNPV helicase), and four enhancins (virus enhancing factors) were found. In addition, several ORFs are repeated; there are 7 genes related to AcMNPV orf2, 4 genes related to AcMNPV orf145/150, and a number of repeated genes unique to XcGV. Eight major repeated sequences (XcGV hrs) that are similar to sequences found in the Trichoplusia ni GV genome (TnGV) were found. PMID:10502508

Hayakawa, T; Ko, R; Okano, K; Seong, S I; Goto, C; Maeda, S

1999-09-30
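
The G+C content quoted in the abstract above (40.7% for XcGV) is simply the fraction of bases that are G or C. A minimal sketch of that calculation follows; the function name `gc_content` and the toy sequence are illustrative assumptions, not material from the record.

```python
def gc_content(sequence: str) -> float:
    """Return the G+C fraction of a nucleotide sequence (case-insensitive)."""
    seq = sequence.upper()
    # Count only unambiguous bases so that N and other IUPAC codes are ignored.
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        raise ValueError("sequence contains no A, C, G, or T")
    return (seq.count("G") + seq.count("C")) / counted


toy = "ATGCGCATTA"  # invented 10-base example, not XcGV sequence
print(f"{gc_content(toy):.1%}")  # prints 40.0%
```

Applied to a full genome read from a FASTA file, the same function would give the whole-genome figure; ambiguous bases are excluded from the denominator here, which is one of several possible conventions.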

385

Castor Bean Organelle Genome Sequencing and Worldwide Genetic Diversity Analysis  

PubMed Central

Castor bean is an important oil-producing plant in the Euphorbiaceae family. Its high-quality oil contains up to 90% of the unusual fatty acid ricinoleate, which has many industrial and medical applications. Castor bean seeds also contain ricin, a highly toxic Type 2 ribosome-inactivating protein, which has gained relevance in recent years due to biosafety concerns. In order to gain knowledge on global genetic diversity in castor bean and to ultimately help the development of breeding and forensic tools, we carried out an extensive chloroplast sequence diversity analysis. Taking advantage of the recently published genome sequence of castor bean, we assembled the chloroplast and mitochondrion genomes extracting selected reads from the available whole genome shotgun reads. Using the chloroplast reference genome we used the methylation filtration technique to readily obtain draft genome sequences of 7 geographically and genetically diverse castor bean accessions. These sequence data were used to identify single nucleotide polymorphism markers and phylogenetic analysis resulted in the identification of two major clades that were not apparent in previous population genetic studies using genetic markers derived from nuclear DNA. Two distinct sub-clades could be defined within each major clade and large-scale genotyping of castor bean populations worldwide confirmed previously observed low levels of genetic diversity and showed a broad geographic distribution of each sub-clade.

Chan, Agnes P.; Williams, Amber L.; Rice, Danny W.; Liu, Xinyue; Melake-Berhan, Admasu; Huot Creasy, Heather; Puiu, Daniela; Rosovitz, M. J.; Khouri, Hoda M.; Beckstrom-Sternberg, Stephen M.; Allan, Gerard J.; Keim, Paul; Ravel, Jacques; Rabinowicz, Pablo D.

2011-01-01

386

Building a model: developing genomic resources for common milkweed (Asclepias syriaca) with low coverage genome sequencing  

PubMed Central

Background Milkweeds (Asclepias L.) have been extensively investigated in diverse areas of evolutionary biology and ecology; however, there are few genetic resources available to facilitate and complement these studies. This study explored how low coverage genome sequencing of the common milkweed (Asclepias syriaca L.) could be useful in characterizing the genome of a plant without prior genomic information and for development of genomic resources as a step toward further developing A. syriaca as a model in ecology and evolution. Results A 0.5× genome of A. syriaca was produced using Illumina sequencing. A virtually complete chloroplast genome of 158,598 bp was assembled, revealing few repeats and loss of three genes: accD, clpP, and ycf1. A nearly complete rDNA cistron (18S-5.8S-26S; 7,541 bp) and 5S rDNA (120 bp) sequence were obtained. Assessment of polymorphism revealed that the rDNA cistron and 5S rDNA had 0.3% and 26.7% polymorphic sites, respectively. A partial mitochondrial genome sequence (130,764 bp), with identical gene content to tobacco, was also assembled. An initial characterization of repeat content indicated that Ty1/copia-like retroelements are the most common repeat type in the milkweed genome. At least one A. syriaca microread hit 88% of Catharanthus roseus (Apocynaceae) unigenes (median coverage of 0.29×) and 66% of single copy orthologs (COSII) in asterids (median coverage of 0.14×). From this partial characterization of the A. syriaca genome, markers for population genetics (microsatellites) and phylogenetics (low-copy nuclear genes) studies were developed. Conclusions The results highlight the promise of next generation sequencing for development of genomic resources for any organism. Low coverage genome sequencing allows characterization of the high copy fraction of the genome and exploration of the low copy fraction of the genome, which facilitate the development of molecular tools for further study of a target species and its relatives. 
This study represents a first step in the development of a community resource for further study of plant-insect co-evolution, anti-herbivore defense, floral developmental genetics, reproductive biology, chemical evolution, population genetics, and comparative genomics using milkweeds, and A. syriaca in particular, as ecological and evolutionary models.

2011-01-01
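
To make the "0.5×" figure in the record above concrete: mean sequencing coverage is conventionally computed as (number of reads × read length) / genome size, and under the usual Poisson (Lander-Waterman) assumption the expected fraction of positions covered at least once is 1 − e^(−c). The sketch below uses invented read counts and an invented genome size, since the abstract does not state them.

```python
import math


def mean_coverage(num_reads: int, read_length: int, genome_size: int) -> float:
    """Mean depth of coverage: total sequenced bases divided by genome size."""
    return num_reads * read_length / genome_size


def expected_fraction_covered(coverage: float) -> float:
    """Expected fraction of positions hit by at least one read (Poisson model)."""
    return 1.0 - math.exp(-coverage)


# Hypothetical numbers chosen only so that the result is 0.5x.
c = mean_coverage(num_reads=2_500_000, read_length=80, genome_size=400_000_000)
print(c)                                        # 0.5
print(f"{expected_fraction_covered(c):.3f}")    # 0.393
```

Under this model roughly 39% of single-copy positions would be touched at 0.5×, whereas sequences present in many copies per cell, such as the chloroplast genome, receive proportionally deeper coverage, which is consistent with the abstract's point about characterizing the high copy fraction.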

387

Assessing the impact of comparative genomic sequence data on the functional annotation of the Drosophila genome  

Microsoft Academic Search

Background It is widely accepted that comparative sequence data can aid the functional annotation of genome sequences; however, the most informative species and features of genome evolution for comparison remain to be determined. Results We analyzed conservation in eight genomic regions (apterous, even-skipped, fushi tarazu, twist, and Rhodopsins 1, 2, 3 and 4) from four Drosophila species (D. erecta, D. pseudoobscura, D.

Casey M Bergman; Barret D Pfeiffer; Diego E Rincón-Limas; Roger A Hoskins; Andreas Gnirke; Chris J Mungall; Adrienne M Wang; Brent Kronmiller; Joanne Pacleb; Soo Park; Mark Stapleton; Kenneth Wan; Reed A George; Pieter J de Jong; Juan Botas; Gerald M Rubin; Susan E Celniker

2002-01-01

388

Complete Mitochondrial Genome Sequence of Lichtheimia ramosa (syn. Lichtheimia hongkongensis)  

PubMed Central

We report the complete mitochondrial genome sequence of Lichtheimia ramosa (syn. Lichtheimia hongkongensis), the first complete mitochondrial DNA sequence of the genus Lichtheimia. This 31.8-kb mitochondrial genome encodes 11 subunits of respiratory chain complexes, 3 ATP synthase subunits, 25 tRNAs, and small and large rRNAs, with the gene order atp9-cox2-atp6-cox3-cox1-nad2-nad3-cob-nad1-nad6-nad5-nad4l-nad4-atp8.

Leung, Shui-Yee; Huang, Yi

2014-01-01

389

Deep Whole-Genome Sequencing of 100 Southeast Asian Malays  

PubMed Central

Whole-genome sequencing across multiple samples in a population provides an unprecedented opportunity for comprehensively characterizing the polymorphic variants in the population. Although the 1000 Genomes Project (1KGP) has offered brief insights into the value of population-level sequencing, the low coverage has compromised the ability to confidently detect rare and low-frequency variants. In addition, the composition of populations in the 1KGP is not complete, despite the fact that the study design has been extended to more than 2,500 samples from more than 20 population groups. The Malays are one of the Austronesian groups predominantly present in Southeast Asia and Oceania, and the Singapore Sequencing Malay Project (SSMP) aims to perform deep whole-genome sequencing of 100 healthy Malays. By sequencing at a minimum of 30× coverage, we have illustrated the higher sensitivity at detecting low-frequency and rare variants and the ability to investigate the presence of hotspots of functional mutations. Compared to the low-pass sequencing in the 1KGP, the deeper coverage allows more functional variants to be identified for each person. A comparison of the fidelity of genotype imputation of Malays indicated that a population-specific reference panel, such as the SSMP, outperforms a cosmopolitan panel with larger number of individuals for common SNPs. For lower-frequency (<5%) markers, a larger number of individuals might have to be whole-genome sequenced so that the accuracy currently afforded by the 1KGP can be achieved. The SSMP data are expected to be the benchmark for evaluating the value of deep population-level sequencing versus low-pass sequencing, especially in populations that are poorly represented in population-genetics studies.

Wong, Lai-Ping; Ong, Rick Twee-Hee; Poh, Wan-Ting; Liu, Xuanyao; Chen, Peng; Li, Ruoying; Lam, Kevin Koi-Yau; Pillai, Nisha Esakimuthu; Sim, Kar-Seng; Xu, Haiyan; Sim, Ngak-Leng; Teo, Shu-Mei; Foo, Jia-Nee; Tan, Linda Wei-Lin; Lim, Yenly; Koo, Seok-Hwee; Gan, Linda Seo-Hwee; Cheng, Ching-Yu; Wee, Sharon; Yap, Eric Peng-Huat; Ng, Pauline Crystal; Lim, Wei-Yen; Soong, Richie; Wenk, Markus Rene; Aung, Tin; Wong, Tien-Yin; Khor, Chiea-Chuen; Little, Peter; Chia, Kee-Seng; Teo, Yik-Ying

2013-01-01

390

Simple sequences are ubiquitous repetitive components of eukaryotic genomes.  

PubMed Central

Simple sequences are stretches of DNA which consist of only one, or a few tandemly repeated nucleotides, for example poly (dA) X poly (dT) or poly (dG-dT) X poly (dC-dA). These two types of simple sequence have been shown to be repetitive and interspersed in many eukaryotic genomes. Several other types have been found by sequencing eukaryotic DNA. In this report we have undertaken a systematic survey for simple sequences. We hybridized synthetic simple sequence DNA to genome blots of phylogenetically different organisms. We found that many, probably even all possible types of simple sequence are repetitive components of eukaryotic genomes. We propose therefore that they arise by common mechanisms, namely slippage replication and unequal crossover, and that they might have no general function with regard to gene expression. This latter inference is supported by the fact that we have detected simple sequences only in the metabolically inactive micronucleus of the protozoan Stylonychia, but not in the metabolically active macronucleus which is derived from the micronucleus by chromosome diminution.

Tautz, D; Renz, M

1984-01-01
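
The record above surveyed simple sequences by hybridization. With sequence data in hand, the same kind of element (a one- or two-nucleotide unit repeated in tandem) can be located computationally. The following is a sketch of that computational analogue, not the authors' method; the function name, the repeat threshold, and the test string are all assumptions made for illustration.

```python
import re


def find_simple_sequences(dna: str, max_unit: int = 2, min_repeats: int = 6):
    """Return (start, unit, repeat_count) for tandem repeats of short units.

    A unit of 1..max_unit bases must occur at least min_repeats times in a row.
    """
    # Lazy quantifier on the unit so a homopolymer is reported as a 1-base unit.
    pattern = re.compile(r"([ACGT]{1,%d}?)\1{%d,}" % (max_unit, min_repeats - 1))
    hits = []
    for match in pattern.finditer(dna.upper()):
        unit = match.group(1)
        hits.append((match.start(), unit, len(match.group(0)) // len(unit)))
    return hits


# Invented test string: (GT)x6 followed by (A)x8, flanked by non-repeat bases.
print(find_simple_sequences("CCGTGTGTGTGTGTAAAAAAAACC"))
# [(2, 'GT', 6), (14, 'A', 8)]
```

Only one strand is scanned here, so poly (dC-dA) on the complementary strand would appear as a GT repeat; a fuller survey would also handle longer units and imperfect repeats.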

391

The genome sequence of the model ascomycete fungus Podospora anserina  

PubMed Central

Background The dung-inhabiting ascomycete fungus Podospora anserina is a model used to study various aspects of eukaryotic and fungal biology, such as ageing, prions and sexual development. Results We present a 10X draft sequence of P. anserina genome, linked to the sequences of a large expressed sequence tag collection. Similar to higher eukaryotes, the P. anserina transcription/splicing machinery generates numerous non-conventional transcripts. Comparison of the P. anserina genome and orthologous gene set with the one of its close relatives, Neurospora crassa, shows that synteny is poorly conserved, the main result of evolution being gene shuffling in the same chromosome. The P. anserina genome contains fewer repeated sequences and has evolved new genes by duplication since its separation from N. crassa, despite the presence of the repeat induced point mutation mechanism that mutates duplicated sequences. We also provide evidence that frequent gene loss took place in the lineages leading to P. anserina and N. crassa. P. anserina contains a large and highly specialized set of genes involved in utilization of natural carbon sources commonly found in its natural biotope. It includes genes potentially involved in lignin degradation and efficient cellulose breakdown. Conclusion The features of the P. anserina genome indicate a highly dynamic evolution since the divergence of P. anserina and N. crassa, leading to the ability of the former to use specific complex carbon sources that match its needs in its natural biotope.

Espagne, Eric; Lespinet, Olivier; Malagnac, Fabienne; Da Silva, Corinne; Jaillon, Olivier; Porcel, Betina M; Couloux, Arnaud; Aury, Jean-Marc; Segurens, Beatrice; Poulain, Julie; Anthouard, Veronique; Grossetete, Sandrine; Khalili, Hamid; Coppin, Evelyne; Dequard-Chablat, Michelle; Picard, Marguerite; Contamine, Veronique; Arnaise, Sylvie; Bourdais, Anne; Berteaux-Lecellier, Veronique; Gautheret, Daniel; de Vries, Ronald P; Battaglia, Evy; Coutinho, Pedro M; Danchin, Etienne GJ; Henrissat, Bernard; Khoury, Riyad EL; Sainsard-Chanet, Annie; Boivin, Antoine; Pinan-Lucarre, Berangere; Sellem, Carole H; Debuchy, Robert; Wincker, Patrick; Weissenbach, Jean; Silar, Philippe

2008-01-01

392

Sequence modelling and an extensible data model for genomic database  

SciTech Connect

The Human Genome Project (HGP) plans to sequence the human genome by the beginning of the next century. It will generate DNA sequences of more than 10 billion bases and complex marker sequences (maps) of more than 100 million markers. All of this information will be stored in database management systems (DBMSs). However, existing data models do not have the abstraction mechanism for modelling sequences and existing DBMSs do not have operations for complex sequences. This work addresses the problem of sequence modelling in the context of the HGP and the more general problem of an extensible object data model that can incorporate the sequence model as well as existing and future data constructs and operators. First, we proposed a general sequence model that is application and implementation independent. This model is used to capture the sequence information found in the HGP at the conceptual level. In addition, abstract and biological sequence operators are defined for manipulating the modelled sequences. Second, we combined many features of semantic and object oriented data models into an extensible framework, which we called the "Extensible Object Model", to address the need of a modelling framework for incorporating the sequence data model with other types of data constructs and operators. This framework is based on the conceptual separation between constructors and constraints. We then used this modelling framework to integrate the constructs for the conceptual sequence model. The Extensible Object Model is also defined with a graphical representation, which is useful as a tool for database designers. Finally, we defined a query language to support this model and implemented the query processor to demonstrate the feasibility of the extensible framework and the usefulness of the conceptual sequence model.

Li, Peter Wei-Der [California Univ., San Francisco, CA (United States); Lawrence Berkeley Lab., CA (United States)]

1992-01-01

393

Sequence modelling and an extensible data model for genomic database  

SciTech Connect

The Human Genome Project (HGP) plans to sequence the human genome by the beginning of the next century. It will generate DNA sequences of more than 10 billion bases and complex marker sequences (maps) of more than 100 million markers. All of this information will be stored in database management systems (DBMSs). However, existing data models do not have the abstraction mechanism for modelling sequences and existing DBMSs do not have operations for complex sequences. This work addresses the problem of sequence modelling in the context of the HGP and the more general problem of an extensible object data model that can incorporate the sequence model as well as existing and future data constructs and operators. First, we proposed a general sequence model that is application and implementation independent. This model is used to capture the sequence information found in the HGP at the conceptual level. In addition, abstract and biological sequence operators are defined for manipulating the modelled sequences. Second, we combined many features of semantic and object oriented data models into an extensible framework, which we called the "Extensible Object Model", to address the need of a modelling framework for incorporating the sequence data model with other types of data constructs and operators. This framework is based on the conceptual separation between constructors and constraints. We then used this modelling framework to integrate the constructs for the conceptual sequence model. The Extensible Object Model is also defined with a graphical representation, which is useful as a tool for database designers. Finally, we defined a query language to support this model and implemented the query processor to demonstrate the feasibility of the extensible framework and the usefulness of the conceptual sequence model.

Li, Peter Wei-Der [California Univ., San Francisco, CA (United States); Lawrence Berkeley Lab., CA (United States)]

1992-01-01

394

Genomic insight into the common carp (Cyprinus carpio) genome by sequencing analysis of BAC-end sequences  

Microsoft Academic Search

Background Common carp is one of the most important aquaculture teleost fish in the world. Common carp and other closely related Cyprinidae species provide over 30% aquaculture production in the world. However, common carp genomic resources are still relatively underdeveloped. BAC end sequences (BES) are important resources for genome research on BAC-anchored genetic marker development, linkage map and physical map integration,

Peng Xu; Jiongtang Li; Yan Li; Runzi Cui; Jintu Wang; Jian Wang; Yan Zhang; Zixia Zhao; Xiaowen Sun

2011-01-01

395

Recurring Mutations Found by Sequencing an Acute Myeloid Leukemia Genome  

PubMed Central

BACKGROUND The full complement of DNA mutations that are responsible for the pathogenesis of acute myeloid leukemia (AML) is not yet known. METHODS We used massively parallel DNA sequencing to obtain a very high level of coverage (approximately 98%) of a primary, cytogenetically normal, de novo genome for AML with minimal maturation (AML-M1) and a matched normal skin genome. RESULTS We identified 12 acquired (somatic) mutations within the coding sequences of genes and 52 somatic point mutations in conserved or regulatory portions of the genome. All mutations appeared to be heterozygous and present in nearly all cells in the tumor sample. Four of the 64 mutations occurred in at least 1 additional AML sample in 188 samples that were tested. Mutations in NRAS and NPM1 had been identified previously in patients with AML, but two other mutations had not been identified. One of these mutations, in the IDH1 gene, was present in 15 of 187 additional AML genomes tested and was strongly associated with normal cytogenetic status; it was present in 13 of 80 cytogenetically normal samples (16%). The other was a nongenic mutation in a genomic region with regulatory potential and conservation in higher mammals; we detected it in one additional AML tumor. The AML genome that we sequenced contains approximately 750 point mutations, of which only a small fraction are likely to be relevant to pathogenesis. CONCLUSIONS By comparing the sequences of tumor and skin genomes of a patient with AML-M1, we have identified recurring mutations that may be relevant for pathogenesis.

Mardis, Elaine R.; Ding, Li; Dooling, David J.; Larson, David E.; McLellan, Michael D.; Chen, Ken; Koboldt, Daniel C.; Fulton, Robert S.; Delehaunty, Kim D.; McGrath, Sean D.; Fulton, Lucinda A.; Locke, Devin P.; Magrini, Vincent J.; Abbott, Rachel M.; Vickery, Tammi L.; Reed, Jerry S.; Robinson, Jody S.; Wylie, Todd; Smith, Scott M.; Carmichael, Lynn; Eldred, James M.; Harris, Christopher C.; Walker, Jason; Peck, Joshua B.; Du, Feiyu; Dukes, Adam F.; Sanderson, Gabriel E.; Brummett, Anthony M.; Clark, Eric; McMichael, Joshua F.; Meyer, Rick J.; Schindler, Jonathan K.; Pohl, Craig S.; Wallis, John W.; Shi, Xiaoqi; Lin, Ling; Schmidt, Heather; Tang, Yuzhu; Haipek, Carrie; Wiechert, Madeline E.; Ivy, Jolynda V.; Kalicki, Joelle; Elliott, Glendoria; Ries, Rhonda E.; Payton, Jacqueline E.; Westervelt, Peter; Tomasson, Michael H.; Watson, Mark A.; Baty, Jack; Heath, Sharon; Shannon, William D.; Nagarajan, Rakesh; Link, Daniel C.; Walter, Matthew J.; Graubert, Timothy A.; DiPersio, John F.; Wilson, Richard K.; Ley, Timothy J.

2011-01-01

396

Overview of PSB track on gene structure identification in large-scale genomic sequence  

SciTech Connect

The recent funding of more than a dozen major genome centers to begin community-wide high-throughput sequencing of the human genome has created a significant new challenge for the computational analysis of DNA sequence and the prediction of gene structure and function. It has been estimated that on average from 1996 to 2003, approximately 2 million bases of newly finished DNA sequence will be produced every day and be made available on the Internet and in central databases. The finished (fully assembled) sequence generated each day will represent approximately 75 new genes (and their respective proteins), and many times this number will be represented in partially completed sequences. The information contained in these is of immeasurable value to medical research, biotechnology, the pharmaceutical industry and researchers in a host of fields ranging from microorganism metabolism, to structural biology, to bioremediation. Sequencing of microorganisms and other model organisms is also ramping up at a very rapid rate. The genomes for yeast and several microorganisms such as H. influenzae have recently been fully sequenced, although the significance of many genes remains to be determined.

Uberbacher, E.C.; Xu, Y.

1998-12-31

397

Complete genome sequence of Thauera aminoaromatica strain MZ1T  

PubMed Central

Thauera aminoaromatica strain MZ1T, an isolate belonging to the genus Thauera, of the family Rhodocyclaceae and the class Betaproteobacteria, has been characterized for its ability to produce abundant exopolysaccharide and degrade various aromatic compounds with nitrate as an electron acceptor. These properties, if fully understood at the genome-sequence level, can aid in environmental processing of organic matter in anaerobic cycles by short-circuiting a central anaerobic metabolite, acetate, from microbiological conversion to methane, a critical greenhouse gas. Strain MZ1T is the first strain from the genus Thauera with a completely sequenced genome. The 4,496,212 bp chromosome and 78,374 bp plasmid contain 4,071 protein-coding and 71 RNA genes, and were sequenced as part of the DOE Community Sequencing Program CSP_776774.

Jiang, Ke; Sanseverino, John; Chauhan, Archana; Lucas, Susan; Copeland, Alex; Lapidus, Alla; Del Rio, Tijana Glavina; Dalin, Eileen; Tice, Hope; Bruce, David; Goodwin, Lynne; Pitluck, Sam; Sims, David; Brettin, Thomas; Detter, John C.; Han, Cliff; Chang, Y.J.; Larimer, Frank; Land, Miriam; Hauser, Loren; Kyrpides, Nikos C.; Mikhailova, Natalia; Moser, Scott; Jegier, Patricia; Close, Dan; DeBruyn, Jennifer M.; Wang, Ying; Layton, Alice C.; Allen, Michael S.; Sayler, Gary S.

2012-01-01

398

Complete chloroplast genome sequences of Solanum bulbocastanum, Solanum lycopersicum and comparative analyses with other Solanaceae genomes.  

PubMed

Despite the agricultural importance of both potato and tomato, very little is known about their chloroplast genomes. Analysis of the complete sequences of tomato, potato, tobacco, and Atropa chloroplast genomes reveals significant insertions and deletions within certain coding regions or regulatory sequences (e.g., deletion of repeated sequences within 16S rRNA, ycf2 or ribosomal binding sites in ycf2). RNA, photosynthesis, and atp synthase genes are the least divergent, and the most divergent genes are clpP, cemA, ccsA, and matK. Repeat analyses identified 33-45 direct and inverted repeats ≥30 bp with a sequence identity of at least 90%; all but five of the repeats shared by all four Solanaceae genomes are located in the same genes or intergenic regions, suggesting a functional role. A comprehensive genome-wide analysis of all coding sequences and intergenic spacer regions was done for the first time in chloroplast genomes. Only four spacer regions are fully conserved (100% sequence identity) among all genomes; deletions or insertions within some intergenic spacer regions result in less than 25% sequence identity, underscoring the importance of choosing appropriate intergenic spacers for plastid transformation and providing valuable new information for phylogenetic utility of the chloroplast intergenic spacer regions. Comparison of coding sequences with expressed sequence tags showed a considerable amount of variation, resulting in amino acid changes; none of the C-to-U conversions observed in potato and tomato were conserved in tobacco and Atropa. It is possible that there has been a loss of conserved editing sites in potato and tomato. PMID:16575560

Daniell, Henry; Lee, Seung-Bum; Grevich, Justin; Saski, Christopher; Quesada-Vargas, Tania; Guda, Chittibabu; Tomkins, Jeffrey; Jansen, Robert K

2006-05-01

399

Human-specific nonsense mutations identified by genome sequence comparisons  

Microsoft Academic Search

The comparative study of the human and chimpanzee genomes may shed light on the genetic ingredients for the evolution of the unique traits of humans. Here, we present a simple procedure to identify human-specific nonsense mutations that might have arisen since the human–chimpanzee divergence. The procedure involves collecting orthologous sequences in which a stop codon of the human sequence is

Yoonsoo Hahn; Byungkook Lee

2006-01-01

400

The Complete Genome Sequence of Escherichia coli K-12  

Microsoft Academic Search

The 4,639,221-base pair sequence of Escherichia coli K-12 is presented. Of 4288 protein-coding genes annotated, 38 percent have no attributed function. Comparison with five other sequenced microbes reveals ubiquitous as well as narrowly distributed gene families; many families of similar genes within E. coli are also evident. The largest family of paralogous proteins contains 80 ABC transporters. The genome

Frederick R. Blattner; Guy Plunkett III; Craig A. Bloch; Nicole T. Perna; Valerie Burland; Monica Riley; Julio Collado-Vides; Jeremy D. Glasner; Christopher K. Rode; George F. Mayhew; Jason Gregor; Nelson Wayne Davis; Heather A. Kirkpatrick; Michael A. Goeden; Debra J. Rose; Bob Mau; Ying Shao

2007-01-01

401

Hybrid selection for sequencing pathogen genomes from clinical samples  

PubMed Central

We have adapted a solution hybrid selection protocol to enrich pathogen DNA in clinical samples dominated by human genetic material. Using mock mixtures of human and Plasmodium falciparum malaria parasite DNA as well as clinical samples from infected patients, we demonstrate an average of approximately 40-fold enrichment of parasite DNA after hybrid selection. This approach will enable efficient genome sequencing of pathogens from clinical samples, as well as sequencing of endosymbiotic organisms such as Wolbachia that live inside diverse metazoan phyla.

2011-01-01
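The fold enrichment reported in the record above can be read as the ratio of the pathogen read fraction after hybrid selection to the fraction before it. The sketch below assumes that read-fraction definition, since the abstract does not spell out its formula; the function name and the read counts are hypothetical and chosen only to illustrate a roughly 40-fold result.

```python
def fold_enrichment(pathogen_before, total_before, pathogen_after, total_after):
    """Ratio of pathogen read fractions after vs. before selection.

    Assumed definition (not taken from the abstract):
    (pathogen_after / total_after) / (pathogen_before / total_before)
    """
    if total_before <= 0 or total_after <= 0 or pathogen_before <= 0:
        raise ValueError("read counts must be positive")
    frac_before = pathogen_before / total_before
    frac_after = pathogen_after / total_after
    return frac_after / frac_before


# Hypothetical counts: 1% parasite reads before selection, 40% after.
print(fold_enrichment(10_000, 1_000_000, 400_000, 1_000_000))
```

Working with fractions rather than raw counts keeps the comparison valid when the two libraries are sequenced to different depths.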

402

High-throughput bisulfite sequencing in mammalian genomes  

Microsoft Academic Search

DNA methylation is a critical epigenetic mark that is essential for mammalian development and aberrant in many diseases including cancer. Over the past decade multiple methods have been developed and applied to characterize its genome-wide distribution. Of these, reduced representation bisulfite sequencing (RRBS) generates nucleotide resolution DNA methylation bisulfite sequencing libraries that enrich for CpG-dense regions by methylation-insensitive restriction digestion.

Zachary D. Smith; Hongcang Gu; Christoph Bock; Andreas Gnirke; Alexander Meissner

2009-01-01
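The restriction-digestion step in the RRBS record above can be illustrated with an in silico digest. This is a minimal sketch, not the authors' pipeline: the abstract does not name the enzyme, so MspI (recognition site CCGG, cutting after the first C), which is commonly used for RRBS, is assumed here. The function names, the toy sequence, and the size window are hypothetical.

```python
import re


def digest(seq, site="CCGG", cut_offset=1):
    """Cut seq at every occurrence of site, cut_offset bases into the site.

    Defaults model MspI (C^CGG); this enzyme choice is an assumption.
    """
    seq = seq.upper()
    # A lookahead is used so that overlapping sites would also be found.
    cuts = [m.start() + cut_offset for m in re.finditer(f"(?={site})", seq)]
    bounds = [0] + cuts + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:]) if b > a]


def size_select(fragments, min_len, max_len):
    """Keep fragments whose length falls inside the chosen window."""
    return [f for f in fragments if min_len <= len(f) <= max_len]


fragments = digest("AAACCGGTTTTCCGGAA")
print(fragments)
print(size_select(fragments, 6, 10))
```

Because every internal fragment begins with CGG and ends with C, size selection of short fragments favors regions where CCGG sites lie close together, which is how the library comes to be enriched for CpG-dense sequence.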

403

Sequence and organization of the human mitochondrial genome  

Microsoft Academic Search

The complete sequence of the 16,569-base pair human mitochondrial genome is presented. The genes for the 12S and 16S rRNAs, 22 tRNAs, cytochrome c oxidase subunits I, II and III, ATPase subunit 6, cytochrome b and eight other predicted protein coding genes have been located. The sequence shows extreme economy in that the genes have none or only a few

S. Anderson; A. T. Bankier; B. G. Barrell; M. H. L. de Bruijn; A. R. Coulson; J. Drouin; I. C. Eperon; D. P. Nierlich; B. A. Roe; F. Sanger; P. H. Schreier; A. J. H. Smith; R. Staden; I. G. Young

1981-01-01

404

The complete genome sequence of the gastric pathogen Helicobacter pylori  

Microsoft Academic Search

Helicobacter pylori, strain 26695, has a circular genome of 1,667,867 base pairs and 1,590 predicted coding sequences. Sequence analysis indicates that H. pylori has well-developed systems for motility, for scavenging iron, and for DNA restriction and modification. Many putative adhesins, lipoproteins and other outer membrane proteins were identified, underscoring the potential complexity of host-pathogen interaction. Based on the large number

Jean-F. Tomb; Owen White; Anthony R. Kerlavage; Rebecca A. Clayton; Granger G. Sutton; Robert D. Fleischmann; Karen A. Ketchum; Hans Peter Klenk; Steven Gill; Brian A. Dougherty; Karen Nelson; John Quackenbush; Lixin Zhou; Ewen F. Kirkness; Scott Peterson; Brendan Loftus; Delwood Richardson; Robert Dodson; Hanif G. Khalak; Anna Glodek; Keith McKenney; Lisa M. Fitzgerald; Norman Lee; Mark D. Adams; Erin K. Hickey; Douglas E. Berg; Jeanine D. Gocayne; Teresa R. Utterback; Jeremy D. Peterson; Jenny M. Kelley; Matthew D. Cotton; Janice M. Weidman; Claire Fujii; Cheryl Bowman; Larry Watthey; Erik Wallin; William S. Hayes; Mark Borodovsky; Peter D. Karp; Hamilton O. Smith; Claire M. Fraser; J. Craig Venter

1997-01-01

405

Evolutionary insights from suffix array-based genome sequence analysis  

Microsoft Academic Search

Gene and protein sequence analyses, central components of studies in modern biology, are easily amenable to string matching and pattern recognition algorithms. The growing need of analysing whole genome sequences more efficiently and thoroughly, has led to the emergence of new computational methods. Suffix trees and suffix arrays are data structures, well known in many other areas and are highly

Anindya Poddar; Nagasuma Chandra; Madhavi Ganapathiraju; K. Sekar; Judith Klein-Seetharaman; Raj Reddy; N. Balakrishnan

2007-01-01
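The suffix array mentioned in the record above can be sketched in a few lines. This is an illustrative toy, not the method of the cited paper: it builds the array by naive sorting (far slower than the construction algorithms used on whole genomes) and then locates a pattern by binary search. The function names and the example string are hypothetical.

```python
def build_suffix_array(text):
    """Start positions of all suffixes of text, in lexicographic order.

    Naive construction by sorting suffixes; fine for short strings only.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text, sa, pattern):
    """Return sorted start positions of pattern in text using binary search."""
    m = len(pattern)
    lo, hi = 0, len(sa)
    while lo < hi:  # first suffix whose m-character prefix is >= pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo
    hi = len(sa)
    while lo < hi:  # first suffix whose m-character prefix is > pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return sorted(sa[start:lo])


genome = "GATTACAGATTACA"
sa = build_suffix_array(genome)
print(find_occurrences(genome, sa, "ATTA"))
```

All suffixes that begin with the pattern sit next to each other in the sorted order, so two binary searches bound the whole block of matches without scanning the text.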

406

Alastrim Smallpox Variola Minor Virus Genome DNA Sequences  

Microsoft Academic Search

Alastrim variola minor virus, which causes mild smallpox, was first recognized in Florida and South America in the late 19th century. Genome linear double-stranded DNA sequences (186,986 bp) of the alastrim virus Garcia-1966, a laboratory reference strain from an outbreak associated with 0.8% case fatalities in Brazil in 1966, were determined except for a 530-bp fragment of hairpin-loop sequences at

Sergei N. Shchelkunov; Alexei V. Totmenin; Vladimir N. Loparev; Pavel F. Safronov; Valery V. Gutorov; Vladimir E. Chizhikov; Janice C. Knight; Joseph M. Parsons; Robert F. Massung; Joseph J. Esposito

2000-01-01

407

Whole-genome sequences of Bacillus subtilis and close relatives.  

PubMed

We sequenced four strains of Bacillus subtilis and the type strains for two closely related species, Bacillus vallismortis and Bacillus mojavensis. We report the high-quality Sanger genome sequences of B. subtilis subspecies subtilis RO-NN-1 and AUSI98, B. subtilis subspecies spizizenii TU-B-10(T) and DV1-B-1, Bacillus mojavensis RO-H-1(T), and Bacillus vallismortis DV1-F-3(T). PMID:22493193

Earl, Ashlee M; Eppinger, Mark; Fricke, W Florian; Rosovitz, M J; Rasko, David A; Daugherty, Sean; Losick, Richard; Kolter, Roberto; Ravel, Jacques

2012-05-01

408

Complete Genome Sequence of a Street Rabies Virus from Mexico  

PubMed Central

A canine rabies virus (RABV) has been used as a street rabies virus in laboratory investigations. Its entire genome was sequenced and found to be closely related to that of canine RABV circulating in Mexico. Sequence comparison indicates that the virus is closely related to those in the “cosmopolitan” group, with high homology (89 to 93%) to clade I of rabies viruses. The virus is now termed dog rabies virus-Mexico (DRV-Mexico).

Zhang, Guoqing

2012-01-01

409

Mutator System Derivatives Isolated from Sugarcane Genome Sequence.  

PubMed

Mutator-like transposase is the most represented transposon transcript in the sugarcane transcriptome. Phylogenetic reconstructions derived from sequenced transcripts provided evidence that at least four distinct classes exist (I-IV) and that diversification among these classes occurred early in Angiosperms, prior to the divergence of Monocots/Eudicots. The four previously described classes served as probes to select and further sequence six BAC clones from a genomic library of cultivar R570. A total of 579,352 sugarcane base pairs were produced from these "Mutator system" BAC-containing regions for further characterization. The analyzed genomic regions confirmed that the predicted structure and organization of the Mutator system in sugarcane is composed of two true transposon lineages, each containing a specific terminal inverted repeat, and two transposase lineages considered to be domesticated. Each Mutator transposase class displayed a particular molecular structure supporting lineage-specific evolution. MUSTANG, previously described domesticated genes, are located in syntenic regions across Sacharineae and, as expected for a host functional gene, possess the same gene structure as in other Poaceae. Two sequenced BACs correspond to hom(eo)logous locus with specific retrotransposon insertions that discriminate sugarcane haplotypes. The comparative studies presented add information to the Mutator systems previously identified in the maize and rice genomes by describing lineage-specific molecular structure and genomic distribution pattern in the sugarcane genome. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1007/s12042-012-9104-y) contains supplementary material, which is available to authorized users. PMID:22905278

Manetti, M E; Rossi, M; Cruz, G M Q; Saccaro, N L; Nakabashi, M; Altebarmakian, V; Rodier-Goud, M; Domingues, D; D'Hont, A; Van Sluys, M A

2012-09-01

410

Genome Sequence and Characterization of the Tsukamurella Bacteriophage TPA2  

PubMed Central

The formation of stable foam in activated sludge plants is a global problem for which control is difficult. These foams are often stabilized by hydrophobic mycolic acid-synthesizing Actinobacteria, among which are Tsukamurella spp. This paper describes the isolation from activated sludge of the novel double-stranded DNA phage TPA2. This polyvalent Siphoviridae family phage is lytic for most Tsukamurella species. Whole-genome sequencing reveals that the TPA2 genome is circularly permuted (61,440 bp) and that 70% of its sequence is novel. We have identified 78 putative open reading frames, 95 pairs of inverted repeats, and 6 palindromes. The TPA2 genome has a modular gene structure that shares some similarity to those of Mycobacterium phages. A number of the genes display a mosaic architecture, suggesting that the TPA2 genome has evolved at least in part from genetic recombination events. The genome sequence reveals many novel genes that should inform any future discussion on Tsukamurella phage evolution.

Petrovski, Steve; Seviour, Robert J.; Tillett, Daniel

2011-01-01
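The palindromes counted in the record above are, in the DNA sense, stretches that read the same as their own reverse complement. The abstract does not say how they were detected or what length thresholds were used, so the following is only a minimal sketch under assumed settings; the function names, the length window, and the example sequence are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an uppercase or lowercase ACGT string."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_palindromes(seq, min_len=6, max_len=12):
    """List (start, subsequence) for windows equal to their reverse complement.

    min_len and max_len are assumed thresholds, not values from the paper.
    """
    seq = seq.upper()
    hits = []
    # A perfect reverse-complement palindrome always has even length,
    # so start at the first even length and step by two.
    first = min_len + (min_len % 2)
    for length in range(first, max_len + 1, 2):
        for start in range(len(seq) - length + 1):
            window = seq[start:start + length]
            if window == reverse_complement(window):
                hits.append((start, window))
    return hits


print(find_palindromes("ACGGAATTCTGA", min_len=6, max_len=6))
```

Inverted repeat pairs could be searched in a similar way by comparing each window with the reverse complement of a second window elsewhere in the sequence, at a higher computational cost.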

411

Genome Sequence of the Lager Brewing Yeast, an Interspecies Hybrid  

PubMed Central

This work presents the genome sequencing of the lager brewing yeast (Saccharomyces pastorianus) Weihenstephan 34/70, a strain widely used in lager beer brewing. The 25 Mb genome comprises two nuclear sub-genomes originating from Saccharomyces cerevisiae and Saccharomyces bayanus and one circular mitochondrial genome originating from S. bayanus. Thirty-six different types of chromosomes were found including eight chromosomes with translocations between the two sub-genomes, whose breakpoints are within the orthologous open reading frames. Several gene loci responsible for typical lager brewing yeast characteristics such as maltotriose uptake and sulfite production have been increased in number by chromosomal rearrangements. Despite an overall high degree of conservation of the synteny with S. cerevisiae and S. bayanus, the syntenies were not well conserved in the sub-telomeric regions that contain lager brewing yeast characteristic and specific genes. Deletion of larger chromosomal regions, a massive unilateral decrease of the ribosomal DNA cluster and bilateral truncations of over 60 genes reflect a post-hybridization evolution process. Truncations and deletions of less efficient maltose and maltotriose uptake genes may indicate the result of adaptation to brewing. The genome sequence of this interspecies hybrid yeast provides a new tool for better understanding of lager brewing yeast behavior in industrial beer production.

Nakao, Yoshihiro; Kanamori, Takeshi; Itoh, Takehiko; Kodama, Yukiko; Rainieri, Sandra; Nakamura, Norihisa; Shimonaga, Tomoko; Hattori, Masahira; Ashikari, Toshihiko

2009-01-01

412

Characterizing the citrus cultivar Carrizo genome through 454 shotgun sequencing.  

PubMed

The citrus cultivar Carrizo is the single most important rootstock to the US citrus industry and has resistance or tolerance to a number of major citrus diseases, including citrus tristeza virus, foot rot, and Huanglongbing (HLB, citrus greening). A Carrizo genomic sequence database providing approximately 3.5× genome coverage (haploid genome size approximately 367 Mb) was populated through 454 GS FLX shotgun sequencing. Analysis of the repetitive DNA fraction indicated a total interspersed repeat fraction of 36.5%. Assembly and characterization of abundant citrus Ty3/gypsy elements revealed a novel type of element containing open reading frames encoding a viral RNA-silencing suppressor protein (RNA binding protein, rbp) and a plant cytokinin riboside 5′-monophosphate phosphoribohydrolase-related protein (LONELY GUY, log). Similar gypsy elements were identified in the Populus trichocarpa genome. Gene-coding region analysis indicated that 24.4% of the nonrepetitive reads contained genic regions. The depth of