Note: This page contains sample records for the topic cancer genome sequences from Science.gov.
While these samples are representative of the content of Science.gov,
they are not comprehensive nor are they the most current set.
We encourage you to perform a real-time search of Science.gov
to obtain the most current and comprehensive results.
Last update: November 12, 2013.
1

Cancer Genome Sequencing—An Interim Analysis  

Microsoft Academic Search

With the publishing of the first complete whole genome of a human cancer and its paired normal, we have passed a key milestone in the cancer genome sequencing strategy. The generation of such data will, thanks to technical advances, soon become commonplace. As a significant number of proof-of-concept studies have been published, it is important to analyze now the likely implications of these data and how this information might frame cancer research in the near future. The diversity of genes

Edward J. Fox; Jesse J. Salk; Lawrence A. Loeb

2

Cancer Genome Sequencing and Its Implications for Personalized Cancer Vaccines  

PubMed Central

New DNA sequencing platforms have revolutionized human genome sequencing. The dramatic advances in genome sequencing technologies predict that the $1,000 genome will become a reality within the next few years. Applied to cancer, the availability of cancer genome sequences permits real-time decision-making with the potential to affect diagnosis, prognosis, and treatment, and has opened the door towards personalized medicine. A promising strategy is the identification of mutated tumor antigens, and the design of personalized cancer vaccines. Supporting this notion are preliminary analyses of the epitope landscape in breast cancer suggesting that individual tumors express significant numbers of novel antigens to the immune system that can be specifically targeted through cancer vaccines.

Li, Lijin; Goedegebuure, Peter; Mardis, Elaine R.; Ellis, Matthew J.C.; Zhang, Xiuli; Herndon, John M.; Fleming, Timothy P.; Carreno, Beatriz M.; Hansen, Ted H.; Gillanders, William E.

2011-01-01

3

Advances in understanding cancer genomes through second-generation sequencing  

Microsoft Academic Search

Cancers are caused by the accumulation of genomic alterations. Therefore, analyses of cancer genome sequences and structures provide insights for understanding cancer biology, diagnosis and therapy. The application of second-generation DNA sequencing technologies (also known as next-generation sequencing) — through whole-genome, whole-exome and whole-transcriptome approaches — is allowing substantial advances in cancer genomics. These methods are facilitating an increase in

Stacey Gabriel; Gad Getz; Matthew Meyerson

2010-01-01

4

Reconstructing cancer genomes from paired-end sequencing data  

PubMed Central

Background: A cancer genome is derived from the germline genome through a series of somatic mutations. Somatic structural variants (including duplications, deletions, inversions, translocations, and other rearrangements) result in a cancer genome that is a scrambling of intervals, or "blocks", of the germline genome sequence. We present an efficient algorithm for reconstructing the block organization of a cancer genome from paired-end DNA sequencing data. Results: By aligning paired reads from a cancer genome (and a matched germline genome, if available) to the human reference genome, we derive: (i) a partition of the reference genome into intervals; (ii) adjacencies between these intervals in the cancer genome; (iii) an estimated copy number for each interval. We formulate the Copy Number and Adjacency Genome Reconstruction Problem of determining the cancer genome as a sequence of the derived intervals that is consistent with the measured adjacencies and copy numbers. We design an efficient algorithm, called Paired-end Reconstruction of Genome Organization (PREGO), to solve this problem by reducing it to an optimization problem on an interval-adjacency graph constructed from the data. The solution to the optimization problem results in an Eulerian graph, containing an alternating Eulerian tour that corresponds to a cancer genome that is consistent with the sequencing data. We apply our algorithm to five ovarian cancer genomes that were sequenced as part of The Cancer Genome Atlas. We identify numerous rearrangements, or structural variants, in these genomes, analyze reciprocal vs. non-reciprocal rearrangements, and identify rearrangements consistent with known mechanisms of duplication such as tandem duplications and breakage/fusion/bridge (B/F/B) cycles. Conclusions: We demonstrate that PREGO efficiently identifies complex and biologically relevant rearrangements in cancer genome sequencing data.
An implementation of the PREGO algorithm is available at http://compbio.cs.brown.edu/software/.

2012-01-01
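The consistency requirement in the abstract above can be illustrated with a small sketch. This is not the PREGO implementation or its optimization step; it only shows, under simplifying assumptions, how a candidate genome written as a sequence of signed intervals determines copy numbers and adjacencies that can be compared with measured values. The function names, the signed-integer block encoding, and the "t"/"h" labels for interval ends are all hypothetical choices made for illustration.

```python
from collections import Counter


def partition(start, end, breakpoints):
    """Split the reference region [start, end) into intervals at the breakpoints."""
    cuts = sorted({b for b in breakpoints if start < b < end})
    edges = [start, *cuts, end]
    return list(zip(edges[:-1], edges[1:]))


def extremities(block):
    """Return (entry end, exit end) of a signed block.

    +k means interval k is traversed tail -> head, -k means head -> tail.
    """
    k = abs(block)
    return ((k, "t"), (k, "h")) if block > 0 else ((k, "h"), (k, "t"))


def summarize(genome):
    """Derive copy numbers and adjacency counts implied by a candidate genome."""
    copy_number = Counter(abs(b) for b in genome)
    adjacencies = Counter()
    for left, right in zip(genome, genome[1:]):
        exit_end = extremities(left)[1]
        entry_end = extremities(right)[0]
        # Adjacencies are unordered pairs of interval ends.
        adjacencies[tuple(sorted((exit_end, entry_end)))] += 1
    return copy_number, adjacencies


def is_consistent(genome, measured_copy_number, measured_adjacencies):
    """Check a candidate genome against measured copy numbers and adjacencies."""
    copy_number, adjacencies = summarize(genome)
    return (copy_number == Counter(measured_copy_number)
            and adjacencies == Counter(measured_adjacencies))


# Hypothetical example: reference blocks 1 2 3 with a tandem duplication of block 2.
candidate = [1, 2, 2, 3]
copy_number, adjacencies = summarize(candidate)
```

In this toy example the duplicated block has copy number 2, and the adjacency joining the head of block 2 to its own tail is the signature of the tandem duplication.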

5

Prostate cancer genomics by high-throughput technologies: genome-wide association study and sequencing analysis.  

PubMed

Prostate cancer (PC) is the most common malignancy in males. It is evident that genetic factors at both germline and somatic levels play critical roles in prostate carcinogenesis. Recently, genome-wide association studies (GWAS) by high-throughput genotyping technology have identified more than 70 germline variants of various genes or chromosome loci that are significantly associated with PC susceptibility. They include multiple 8q24 loci, prostate-specific genes, and metabolism-related genes. Somatic alterations in PC genomes have been explored by high-throughput sequencing technologies such as whole-genome sequencing and RNA sequencing, which have identified a variety of androgen-responsive events and fusion transcripts represented by E26 transformation-specific (ETS) gene fusions. Recent innovations in high-throughput genomic technologies have enabled us to analyze PC genomics more comprehensively, more precisely, and on a larger scale in multiple ethnic groups to increase our understanding of PC genomics and biology in germline and somatic studies, which can ultimately lead to personalized medicine for PC diagnosis, prevention, and therapy. However, these data indicate that the PC genome is more complex and heterogeneous than we expected from GWAS and sequencing analyses. PMID:23625613

Nakagawa, Hidewaki

2013-06-24

6

Genome sequence analysis of Helicobacter pylori strains associated with gastric ulceration and gastric cancer  

Microsoft Academic Search

BACKGROUND: Persistent colonization of the human stomach by Helicobacter pylori is associated with asymptomatic gastric inflammation (gastritis) and an increased risk of duodenal ulceration, gastric ulceration, and non-cardia gastric cancer. In previous studies, the genome sequences of H. pylori strains from patients with gastritis or duodenal ulcer disease have been analyzed. In this study, we analyzed the genome sequences of

Mark S McClain; Carrie L Shaffer; Dawn A Israel; Richard M Peek

2009-01-01

7

U87MG Decoded: The Genomic Sequence of a Cytogenetically Aberrant Human Cancer Cell Line  

Microsoft Academic Search

U87MG is a commonly studied grade IV glioma cell line that has been analyzed in at least 1,700 publications over four decades. In order to comprehensively characterize the genome of this cell line and to serve as a model of broad cancer genome sequencing, we have generated greater than 30× genomic sequence coverage using a novel 50-base mate paired strategy

Michael James Clark; Nils Homer; Brian D. O'Connor; Zugen Chen; Ascia Eskin; Hane Lee; Barry Merriman; Stanley F. Nelson

2010-01-01

8

Assessing telomeric DNA content in pediatric cancers using whole-genome sequencing data  

PubMed Central

Background: Telomeres are the protective arrays of tandem TTAGGG sequence and associated proteins at the termini of chromosomes. Telomeres shorten at each cell division due to the end-replication problem and are maintained above a critical threshold in malignant cancer cells to prevent cellular senescence or apoptosis. With the recent advances in massively parallel sequencing, assessing telomere content in the context of other cancer genomic aberrations becomes an attractive possibility. We present the first comprehensive analysis of telomeric DNA content change in tumors using whole-genome sequencing data from 235 pediatric cancers. Results: To measure telomeric DNA content, we counted telomeric reads containing TTAGGGx4 or CCCTAAx4 and normalized to the average genomic coverage. Changes in telomeric DNA content in tumor genomes were clustered using a Bayesian Information Criterion to determine loss, no change, or gain. Using this approach, we found that the pattern of telomeric DNA alteration varies dramatically across the landscape of pediatric malignancies: telomere gain was found in 32% of solid tumors, 4% of brain tumors and 0% of hematopoietic malignancies. The results were validated by three independent experimental approaches and reveal significant association of telomere gain with the frequency of somatic sequence mutations and structural variations. Conclusions: Telomere DNA content measurement using whole-genome sequencing data is a reliable approach that can generate useful insights into the landscape of the cancer genome. Measuring the change in telomeric DNA during malignant progression is likely to be a useful metric when considering telomeres in the context of the whole genome.

2012-01-01
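The read-counting step described in the abstract above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes reads are available as plain strings, and the function names and the log2 tumor/normal ratio used to express change are assumptions made here. The Bayesian Information Criterion clustering step is not shown.

```python
import math
import re

# Four consecutive telomeric repeats on either strand (TTAGGGx4 or CCCTAAx4).
TELOMERE_PATTERN = re.compile(r"(?:TTAGGG){4}|(?:CCCTAA){4}")


def is_telomeric(read):
    """Return True if the read contains four consecutive telomeric repeats."""
    return TELOMERE_PATTERN.search(read.upper()) is not None


def telomeric_content(reads, average_coverage):
    """Count telomeric reads and normalize by the average genomic coverage."""
    if average_coverage <= 0:
        raise ValueError("average_coverage must be positive")
    telomeric_reads = sum(1 for read in reads if is_telomeric(read))
    return telomeric_reads / average_coverage


def log2_change(tumor_content, normal_content):
    """Express tumor versus matched-normal content as a log2 ratio (an assumed metric)."""
    return math.log2(tumor_content / normal_content)


# Hypothetical reads: two of the four contain the required run of repeats.
reads = ["TTAGGG" * 4, "ACGT" * 6, "ccctaa" * 4 + "GG", "TTAGGG" * 3]
content = telomeric_content(reads, average_coverage=30.0)
```

Normalizing by coverage is what makes samples sequenced to different depths comparable; a read with only three repeats is deliberately not counted.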

9

Genome interrupted: sequencing of prostate cancer reveals the importance of chromosomal rearrangements  

Microsoft Academic Search

A recent study involving whole genome sequencing of seven prostate cancers has provided the first comprehensive assessment of genomic changes that underlie this common malignancy. Point mutations were found to be infrequent but changes in chromosome structure were common. Rearrangements were linked to chromatin organization and associated with regions involved in transcription factor binding. Novel candidate prostate cancer genes were

Akash Kumar; Jay Shendure; Peter S Nelson

2011-01-01

10

Detection and Mapping of Amplified DNA Sequences in Breast Cancer by Comparative Genomic Hybridization  

Microsoft Academic Search

Comparative genomic hybridization was applied to 5 breast cancer cell lines and 33 primary tumors to discover and map regions of the genome with increased DNA-sequence copy-number. Two-thirds of primary tumors and almost all cell lines showed increased DNA-sequence copy-number affecting a total of 26 chromosomal subregions. Most of these loci were distinct from those of currently known amplified genes

Anne Kallioniemi; Olli-Pekka Kallioniemi; Jim Piper; Minna Tanner; Trond Stokke; Ling Chen; Helene S. Smith; Dan Pinkel; Joe W. Gray; Frederic M. Waldman

1994-01-01

11

Identification of somatic mutations in cancer through Bayesian-based analysis of sequenced genome pairs  

PubMed Central

Background: The field of cancer genomics has rapidly adopted next-generation sequencing (NGS) in order to study and characterize malignant tumors with unprecedented resolution. In particular for cancer, one is often trying to identify somatic mutations – changes specific to a tumor and not within an individual’s germline. However, false positive and false negative detections often result from lack of sufficient variant evidence, contamination of the biopsy by stromal tissue, sequencing errors, and the erroneous classification of germline variation as tumor-specific. Results: We have developed a generalized Bayesian analysis framework for matched tumor/normal samples with the purpose of identifying tumor-specific alterations such as single nucleotide mutations, small insertions/deletions, and structural variation. We describe our methodology, and discuss its application to other types of paired-tissue analysis such as the detection of loss of heterozygosity as well as allelic imbalance. We also demonstrate the high level of sensitivity and specificity in discovering simulated somatic mutations, for various combinations of a) genomic coverage and b) emulated heterogeneity. Conclusion: We present a Java-based implementation of our methods named Seurat, which is made available for free academic use. We have demonstrated and reported on the discovery of different types of somatic change by applying Seurat to an experimentally-derived cancer dataset using our methods; and have discussed considerations and practices regarding the accurate detection of somatic events in cancer genomes. Seurat is available at https://sites.google.com/site/seuratsomatic.

2013-01-01
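To make the idea of Bayesian analysis of a matched tumor/normal pair concrete, here is a deliberately simple sketch. It is not Seurat's model, which the abstract does not specify. It assumes a three-hypothesis comparison (reference, germline heterozygous, somatic) with binomial read-count likelihoods, and the error rate, expected tumor allele fraction, and priors are all illustrative assumptions.

```python
from math import comb


def binomial_pmf(k, n, p):
    """Probability of k variant-supporting reads out of n at allele fraction p."""
    return comb(n, k) * p**k * (1.0 - p) ** (n - k)


def genotype_posteriors(tumor_alt, tumor_depth, normal_alt, normal_depth,
                        error_rate=0.01, tumor_fraction=0.25, priors=None):
    """Posterior probability of each hypothesis for one site (toy model).

    error_rate, tumor_fraction, and the default priors are assumptions
    chosen for illustration, not values taken from the paper.
    """
    if priors is None:
        priors = {"reference": 0.998, "germline": 0.001, "somatic": 0.001}
    # Expected variant allele fraction in (tumor, normal) under each hypothesis.
    expected = {
        "reference": (error_rate, error_rate),
        "germline": (0.5, 0.5),
        "somatic": (tumor_fraction, error_rate),
    }
    joint = {
        name: priors[name]
        * binomial_pmf(tumor_alt, tumor_depth, p_tumor)
        * binomial_pmf(normal_alt, normal_depth, p_normal)
        for name, (p_tumor, p_normal) in expected.items()
    }
    total = sum(joint.values())
    return {name: value / total for name, value in joint.items()}


# Hypothetical site: 10 of 40 tumor reads and 0 of 40 normal reads carry the variant.
posteriors = genotype_posteriors(10, 40, 0, 40)
```

Lowering `tumor_fraction` is one way to emulate stromal contamination or heterogeneity, which the abstract names as a source of false negatives.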

12

nFuse: discovery of complex genomic rearrangements in cancer using high-throughput sequencing.  

PubMed

Complex genomic rearrangements (CGRs) are emerging as a new feature of cancer genomes. CGRs are characterized by multiple genomic breakpoints and thus have the potential to simultaneously affect multiple genes, fusing some genes and interrupting other genes. Analysis of high-throughput whole-genome shotgun sequencing (WGSS) is beginning to facilitate the discovery and characterization of CGRs, but further development of computational methods is required. We have developed an algorithmic method for identifying CGRs in WGSS data based on shortest alternating paths in breakpoint graphs. Aiming for a method with the highest possible sensitivity, we use breakpoint graphs built from all WGSS data, including sequences with ambiguous genomic origin. Since the majority of cell function is encoded by the transcriptome, we target our search to find CGRs that underlie fusion transcripts predicted from matched high-throughput cDNA sequencing (RNA-seq). We have applied our method, nFuse, to the discovery of CGRs in publicly available data from the well-studied breast cancer cell line HCC1954 and primary prostate tumor sample 963. We first establish the sensitivity and specificity of the nFuse breakpoint prediction and scoring method using breakpoints previously discovered in HCC1954. We then validate five out of six CGRs in HCC1954 and two out of two CGRs in 963. We show examples of gene fusions that would be difficult to discover using methods that do not account for the existence of CGRs, including one important event that was missed in a previous study of the HCC1954 genome. Finally, we illustrate how CGRs may be used to infer the gene expression history of a tumor. PMID:22745232

McPherson, Andrew; Wu, Chunxiao; Wyatt, Alexander W; Shah, Sohrab; Collins, Colin; Sahinalp, S Cenk

2012-06-28

13

Genome and transcriptome sequencing in prospective metastatic triple-negative breast cancer uncovers therapeutic vulnerabilities.  

PubMed

Triple-negative breast cancer (TNBC) is characterized by the absence of expression of estrogen receptor, progesterone receptor, and HER-2. Thirty percent of patients recur after first-line treatment, and metastatic TNBC (mTNBC) has a poor prognosis with median survival of one year. Here, we present initial analyses of whole genome and transcriptome sequencing data from 14 prospective mTNBC. We have cataloged the collection of somatic genomic alterations in these advanced tumors, particularly those that may inform targeted therapies. Genes mutated in multiple tumors included TP53, LRP1B, HERC1, CDH5, RB1, and NF1. Notable genes involved in focal structural events were CTNNA1, PTEN, FBXW7, BRCA2, WT1, FGFR1, KRAS, HRAS, ARAF, BRAF, and PGCP. Homozygous deletion of CTNNA1 was detected in 2 of 6 African Americans. RNA sequencing revealed consistent overexpression of the FOXM1 gene when tumor gene expression was compared with nonmalignant breast samples. Using an outlier analysis of gene expression comparing one cancer with all the others, we detected expression patterns unique to each patient's tumor. Integrative DNA/RNA analysis provided evidence for deregulation of mutated genes, including the monoallelic expression of TP53 mutations. Finally, molecular alterations in several cancers supported targeted therapeutic intervention on clinical trials with known inhibitors, particularly for alterations in the RAS/RAF/MEK/ERK and PI3K/AKT/mTOR pathways. In conclusion, whole genome and transcriptome profiling of mTNBC have provided insights into somatic events occurring in this difficult to treat cancer. These genomic data have guided patients to investigational treatment trials and provide hypotheses for future trials in this irremediable cancer. PMID:23171949

Craig, David W; O'Shaughnessy, Joyce A; Kiefer, Jeffrey A; Aldrich, Jessica; Sinari, Shripad; Moses, Tracy M; Wong, Shukmei; Dinh, Jennifer; Christoforides, Alexis; Blum, Joanne L; Aitelli, Cristi L; Osborne, Cynthia R; Izatt, Tyler; Kurdoglu, Ahmet; Baker, Angela; Koeman, Julie; Barbacioru, Catalin; Sakarya, Onur; De La Vega, Francisco M; Siddiqui, Asim; Hoang, Linh; Billings, Paul R; Salhia, Bodour; Tolcher, Anthony W; Trent, Jeffrey M; Mousses, Spyro; Von Hoff, Daniel; Carpten, John D

2012-11-19

14

Detection of chromosomal alterations in the circulation of cancer patients with whole-genome sequencing.  

PubMed

Clinical management of cancer patients could be improved through the development of noninvasive approaches for the detection of incipient, residual, and recurrent tumors. We describe an approach to directly identify tumor-derived chromosomal alterations through analysis of circulating cell-free DNA from cancer patients. Whole-genome analyses of DNA from the plasma of 10 colorectal and breast cancer patients and 10 healthy individuals with massively parallel sequencing identified, in all patients, structural alterations that were not present in plasma DNA from healthy subjects. Detected alterations comprised chromosomal copy number changes and rearrangements, including amplification of cancer driver genes such as ERBB2 and CDK6. The level of circulating tumor DNA in the cancer patients ranged from 1.4 to 47.9%. The sensitivity and specificity of this approach are dependent on the amount of sequence data obtained and are derived from the fact that most cancers harbor multiple chromosomal alterations, each of which is unlikely to be present in normal cells. Given that chromosomal abnormalities are present in nearly all human cancers, this approach represents a useful method for the noninvasive detection of human tumors that is not dependent on the availability of tumor biopsies. PMID:23197571

Leary, Rebecca J; Sausen, Mark; Kinde, Isaac; Papadopoulos, Nickolas; Carpten, John D; Craig, David; O'Shaughnessy, Joyce; Kinzler, Kenneth W; Parmigiani, Giovanni; Vogelstein, Bert; Diaz, Luis A; Velculescu, Victor E

2012-11-28

15

Integrated genome and transcriptome sequencing identifies a novel form of hybrid and aggressive prostate cancer  

PubMed Central

Next-generation sequencing is making sequence-based molecular pathology and personalized oncology viable. We selected an individual initially diagnosed with conventional but aggressive prostate adenocarcinoma and sequenced the genome and transcriptome from primary and metastatic tissues collected prior to hormone therapy. The histology-pathology and copy number profiles were remarkably homogeneous, yet it was possible to propose the quadrant of the prostate tumour that likely seeded the metastatic diaspora. Despite a homogeneous cell type, our transcriptome analysis revealed signatures of both luminal and neuroendocrine cell types. Remarkably, the repertoire of expressed but apparently private gene fusions, including C15orf21:MYC, recapitulated this biology. We hypothesize that the amplification and over-expression of the stem cell gene MSI2 may have contributed to the stable hybrid cellular identity. This hybrid luminal-neuroendocrine tumour appears to represent a novel and highly aggressive case of prostate cancer with unique biological features and, conceivably, a propensity for rapid progression to castrate-resistance. Overall, this work highlights the importance of integrated analyses of genome, exome and transcriptome sequences for basic tumour biology, sequence-based molecular pathology and personalized oncology.

Wu, Chunxiao; Wyatt, Alexander W; Lapuk, Anna V; McPherson, Andrew; McConeghy, Brian J; Bell, Robert H; Anderson, Shawn; Haegert, Anne; Brahmbhatt, Sonal; Shukin, Robert; Mo, Fan; Li, Estelle; Fazli, Ladan; Hurtado-Coll, Antonio; Jones, Edward C; Butterfield, Yaron S; Hach, Faraz; Hormozdiari, Fereydoun; Hajirasouliha, Iman; Boutros, Paul C; Bristow, Robert G; Jones, Steven JM; Hirst, Martin; Marra, Marco A; Maher, Christopher A; Chinnaiyan, Arul M; Sahinalp, S Cenk; Gleave, Martin E; Volik, Stanislav V; Collins, Colin C

2013-01-01

16

Integrated genome and transcriptome sequencing identifies a novel form of hybrid and aggressive prostate cancer.  

PubMed

Next-generation sequencing is making sequence-based molecular pathology and personalized oncology viable. We selected an individual initially diagnosed with conventional but aggressive prostate adenocarcinoma and sequenced the genome and transcriptome from primary and metastatic tissues collected prior to hormone therapy. The histology-pathology and copy number profiles were remarkably homogeneous, yet it was possible to propose the quadrant of the prostate tumour that likely seeded the metastatic diaspora. Despite a homogeneous cell type, our transcriptome analysis revealed signatures of both luminal and neuroendocrine cell types. Remarkably, the repertoire of expressed but apparently private gene fusions, including C15orf21:MYC, recapitulated this biology. We hypothesize that the amplification and over-expression of the stem cell gene MSI2 may have contributed to the stable hybrid cellular identity. This hybrid luminal-neuroendocrine tumour appears to represent a novel and highly aggressive case of prostate cancer with unique biological features and, conceivably, a propensity for rapid progression to castrate-resistance. Overall, this work highlights the importance of integrated analyses of genome, exome and transcriptome sequences for basic tumour biology, sequence-based molecular pathology and personalized oncology. PMID:22294438

Wu, Chunxiao; Wyatt, Alexander W; Lapuk, Anna V; McPherson, Andrew; McConeghy, Brian J; Bell, Robert H; Anderson, Shawn; Haegert, Anne; Brahmbhatt, Sonal; Shukin, Robert; Mo, Fan; Li, Estelle; Fazli, Ladan; Hurtado-Coll, Antonio; Jones, Edward C; Butterfield, Yaron S; Hach, Faraz; Hormozdiari, Fereydoun; Hajirasouliha, Iman; Boutros, Paul C; Bristow, Robert G; Jones, Steven JM; Hirst, Martin; Marra, Marco A; Maher, Christopher A; Chinnaiyan, Arul M; Sahinalp, S Cenk; Gleave, Martin E; Volik, Stanislav V; Collins, Colin C

2012-03-21

17

Draft Genome Sequences of Helicobacter pylori Strains Isolated from Regions of Low and High Gastric Cancer Risk in Colombia.  

PubMed

The draft genome sequences of six Colombian Helicobacter pylori strains are presented. These strains were isolated from patients from regions of high and low gastric cancer risk in Colombia and were characterized by multilocus sequence typing. The data provide insights into differences between H. pylori strains of different phylogeographic origins. PMID:24051318

Sheh, Alexander; Piazuelo, M Blanca; Wilson, Keith T; Correa, Pelayo; Fox, James G

2013-09-19

18

Sequencing technologies and genome sequencing  

Microsoft Academic Search

High-throughput next-generation sequencing (HT-NGS) technologies are currently the hottest topic in the field of human and animal genomics research, and can produce over 100 times more data than the most sophisticated capillary sequencers based on the Sanger method. With the ongoing developments of high throughput sequencing machines and advancement of modern bioinformatics tools at unprecedented pace,

Chandra Shekhar Pareek; Rafal Smoczynski; Andrzej Tretyn

19

Cancer genome analysis informatics.  

PubMed

The analysis of cancer genomes has benefited from the advances in technology that enable data to be generated on an unprecedented scale, describing a tumour genome's sequence and composition at increasingly high resolution and reducing cost. This progress is likely to increase further over the coming years as next-generation sequencing approaches are applied to the study of cancer genomes, in tandem with large-scale efforts such as the Cancer Genome Atlas and recently announced International Cancer Genome Consortium efforts to complement those already established such as the Sanger Institute Cancer Genome Project. This presents challenges for the cancer researcher and the research community in general, in terms of analysing the data generated in one's own projects and also in coordinating and interrogating data that are publicly available. This review aims to provide a brief overview of some of the main informatics resources currently available and their use, and some of the informatics approaches that may be applied in the study of cancer genomes. PMID:20238077

Barrett, Ian P

2010-01-01

20

Multiplexed Fragaria Chloroplast Genome Sequencing  

Technology Transfer Automated Retrieval System (TEKTRAN)

A method to sequence multiple chloroplast genomes that uses the sequencing depth of ultra high throughput sequencing technologies was recently described. Sequencing complete chloroplast genomes can resolve phylogenetic relationships at low taxonomic levels and identify point mutations and indels tha...

21

Identification of high-confidence somatic mutations in whole genome sequence of formalin-fixed breast cancer specimens  

PubMed Central

The utilization of archived, formalin-fixed paraffin-embedded (FFPE) tumor samples for massive parallel sequencing has been challenging due to DNA damage and contamination with normal stroma. Here, we perform whole genome sequencing of DNA isolated from two triple-negative breast cancer tumors archived for >11 years as 5 µm FFPE sections and matched germline DNA. The tumor samples show differing amounts of FFPE damaged DNA sequencing reads revealed as relatively high alignment mismatch rates enriched for C·G > T·A substitutions compared to germline samples. This increase in mismatch rate is observable with as few as one million reads, allowing for an upfront evaluation of the sample integrity before whole genome sequencing. By applying innovative quality filters incorporating global nucleotide mismatch rates and local mismatch rates, we present a method to identify high-confidence somatic mutations even in the presence of FFPE induced DNA damage. This results in a breast cancer mutational profile consistent with previous studies and revealing potentially important functional mutations. Our study demonstrates the feasibility of performing genome-wide deep sequencing analysis of FFPE archived tumors of limited sample size such as residual cancer after treatment or metastatic biopsies.

Yost, Shawn E.; Smith, Erin N.; Schwab, Richard B.; Bao, Lei; Jung, HyunChul; Wang, Xiaoyun; Voest, Emile; Pierce, John P.; Messer, Karen; Parker, Barbara A.; Harismendy, Olivier; Frazer, Kelly A.

2012-01-01

22

Identification of high-confidence somatic mutations in whole genome sequence of formalin-fixed breast cancer specimens.  

PubMed

The utilization of archived, formalin-fixed paraffin-embedded (FFPE) tumor samples for massive parallel sequencing has been challenging due to DNA damage and contamination with normal stroma. Here, we perform whole genome sequencing of DNA isolated from two triple-negative breast cancer tumors archived for >11 years as 5 µm FFPE sections and matched germline DNA. The tumor samples show differing amounts of FFPE damaged DNA sequencing reads revealed as relatively high alignment mismatch rates enriched for C · G > T · A substitutions compared to germline samples. This increase in mismatch rate is observable with as few as one million reads, allowing for an upfront evaluation of the sample integrity before whole genome sequencing. By applying innovative quality filters incorporating global nucleotide mismatch rates and local mismatch rates, we present a method to identify high-confidence somatic mutations even in the presence of FFPE induced DNA damage. This results in a breast cancer mutational profile consistent with previous studies and revealing potentially important functional mutations. Our study demonstrates the feasibility of performing genome-wide deep sequencing analysis of FFPE archived tumors of limited sample size such as residual cancer after treatment or metastatic biopsies. PMID:22492626

Yost, Shawn E; Smith, Erin N; Schwab, Richard B; Bao, Lei; Jung, HyunChul; Wang, Xiaoyun; Voest, Emile; Pierce, John P; Messer, Karen; Parker, Barbara A; Harismendy, Olivier; Frazer, Kelly A

2012-04-06
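The upfront sample-integrity check mentioned in the abstract above rests on two numbers: the global alignment mismatch rate and the share of mismatches that are C·G > T·A substitutions. The sketch below computes both from already-aligned (reference base, read base) pairs. It is an illustration only, not the authors' quality filters; the input format and function name are assumptions, and no thresholds are suggested because the abstract gives none.

```python
from collections import Counter

BASES = {"A", "C", "G", "T"}
# C>T on one strand appears as G>A on the other, so both count as C·G > T·A.
FFPE_SIGNATURE = {("C", "T"), ("G", "A")}


def mismatch_profile(aligned_pairs):
    """Summarize mismatches from (reference_base, read_base) pairs.

    Returns the global mismatch rate and the fraction of mismatches
    that are C·G > T·A substitutions.
    """
    compared = 0
    mismatches = Counter()
    for ref_base, read_base in aligned_pairs:
        ref_base, read_base = ref_base.upper(), read_base.upper()
        if ref_base not in BASES or read_base not in BASES:
            continue  # skip N calls and any other ambiguous symbols
        compared += 1
        if ref_base != read_base:
            mismatches[(ref_base, read_base)] += 1
    total_mismatches = sum(mismatches.values())
    signature = sum(mismatches[pair] for pair in FFPE_SIGNATURE)
    return {
        "mismatch_rate": total_mismatches / compared if compared else 0.0,
        "cg_to_ta_fraction": signature / total_mismatches if total_mismatches else 0.0,
    }


# Hypothetical aligned bases: two mismatches in eight positions, both C·G > T·A.
profile = mismatch_profile(zip("CCGGAATT", "CTGAAATT"))
```

Comparing these two values between a tumor sample and its matched germline sample mirrors the comparison described in the abstract.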

23

Malaria Genome Sequencing Project.  

National Technical Information Service (NTIS)

The objectives of this 5-year Cooperative Agreement between TIGR and the Malaria Program, NMRC, were to: (Specific Aim 1) sequence 3.5 Mb of P. falciparum genomic DNA; (Specific Aim 2) annotate the sequence; (Specific Aim 3) release the information to the...

M. J. Gardner

2003-01-01

24

Malaria Genome Sequencing Project.  

National Technical Information Service (NTIS)

The objectives of this 5-year Cooperative Agreement between TIGR and the Malaria Program, NMRC, were to: Specific Aim 1, sequence 3.5 Mb of P. falciparum genomic DNA; Specific Aim 2, annotate the sequence; Specific Aim 3, release the information to the sc...

M. J. Gardner

2001-01-01

25

Porcine Genomic Sequencing Initiative  

Microsoft Academic Search

A. Specific biological rationales for the utility of the porcine sequence information Rationale and Objectives. Completion of the human genome sequence provides the starting point for understanding the genetic complexity of humans and how genetic variation contributes to diverse phenotypes and disease. It is clear that model organisms have played an invaluable role in the synthesis of this understanding. It

Gary Rohrer; Jonathan E. Beever; Max F. Rothschild; Lawrence Schook; Richard Gibbs; George Weinstock; W. Gregory

26

A study based on whole-genome sequencing yields a rare variant at 8q24 associated with prostate cancer  

PubMed Central

In Western countries, prostate cancer is the most prevalent cancer of men, and one of the leading causes of cancer-related death in men. Several genome-wide association studies have yielded numerous common variants conferring risk of prostate cancer. In the present study we analyzed 32.5 million variants discovered by whole-genome sequencing 1,795 Icelanders. One variant was found to be associated with prostate cancer in European populations: rs188140481[A] (OR = 2.90, Pcomb = 6.2×10^-34) located on 8q24, with an average risk allele control frequency of 0.54%. This variant is only very weakly correlated (r2 ≤ 0.06) with previously reported risk variants on 8q24, and remains significant after adjustment for all of them. Carriers of rs188140481[A] were diagnosed with prostate cancer 1.26 years younger than non-carriers (P = 0.0059). We also report results for the previously described HOXB13 mutation (rs138213197[T]), confirming it as a prostate cancer risk variant in populations from all over Europe.

Gudmundsson, Julius; Sulem, Patrick; Gudbjartsson, Daniel F.; Masson, Gisli; Agnarsson, Bjarni A.; Benediktsdottir, Kristrun R.; Sigurdsson, Asgeir; Magnusson, Olafur Th.; Gudjonsson, Sigurjon A.; Magnusdottir, Droplaug N.; Johannsdottir, Hrefna; Helgadottir, Hafdis Th.; Stacey, Simon N.; Jonasdottir, Adalbjorg; Olafsdottir, Stefania B.; Thorleifsson, Gudmar; Jonasson, Jon G.; Tryggvadottir, Laufey; Navarrete, Sebastian; Fuertes, Fernando; Helfand, Brian T.; Hu, Qiaoyan; Csiki, Irma E.; Mates, Ioan N.; Jinga, Viorel; Aben, Katja K. H.; van Oort, Inge M.; Vermeulen, Sita H.; Donovan, Jenny L.; Hamdy, Freddy C.; Ng, Chi-Fai; Chiu, Peter K.F.; Lau, Kin-Mang; Ng, Maggie C.Y.; Gulcher, Jeffrey R.; Kong, Augustine; Catalona, William J.; Mayordomo, Jose I.; Einarsson, Gudmundur V.; Barkardottir, Rosa B.; Jonsson, Eirikur; Mates, Dana; Neal, David E.; Kiemeney, Lambertus A.; Thorsteinsdottir, Unnur; Rafnar, Thorunn; Stefansson, Kari

2013-01-01

27

A study based on whole-genome sequencing yields a rare variant at 8q24 associated with prostate cancer.  

PubMed

In Western countries, prostate cancer is the most prevalent cancer of men and one of the leading causes of cancer-related death in men. Several genome-wide association studies have yielded numerous common variants conferring risk of prostate cancer. Here, we analyzed 32.5 million variants discovered by whole-genome sequencing 1,795 Icelanders. We identified a new low-frequency variant at 8q24 associated with prostate cancer in European populations, rs188140481[A] (odds ratio (OR) = 2.90; P(combined) = 6.2 × 10(-34)), with an average risk allele frequency in controls of 0.54%. This variant is only very weakly correlated (r(2) ≤ 0.06) with previously reported risk variants at 8q24, and its association remains significant after adjustment for all known risk-associated variants. Carriers of rs188140481[A] were diagnosed with prostate cancer 1.26 years younger than non-carriers (P = 0.0059). We also report results for a previously described HOXB13 variant (rs138213197[T]), confirming it as a prostate cancer risk variant in populations from across Europe. PMID:23104005
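As a rough illustration of what an odds ratio such as the one reported above measures, the sketch below computes an allelic odds ratio and a Woolf-type 95% confidence interval from a 2x2 table of allele counts. This is a textbook calculation, not the authors' analysis (the abstract does not describe their statistical model), and the counts are hypothetical, chosen only so that the control allele frequency is 0.54%.

```python
import math


def odds_ratio(case_risk, case_other, control_risk, control_other):
    """Return (OR, lower, upper) for a 2x2 table of allele counts.

    The interval is the standard Woolf (log-based) 95% confidence interval.
    All four counts must be positive.
    """
    or_ = (case_risk * control_other) / (case_other * control_risk)
    # Standard error of log(OR) under the Woolf approximation
    se = math.sqrt(
        1 / case_risk + 1 / case_other + 1 / control_risk + 1 / control_other
    )
    lower = math.exp(math.log(or_) - 1.96 * se)
    upper = math.exp(math.log(or_) + 1.96 * se)
    return or_, lower, upper


# Hypothetical allele counts (NOT from the study):
# controls carry 54 risk alleles out of 10,000, i.e. a frequency of 0.54%.
or_, lower, upper = odds_ratio(30, 1970, 54, 9946)
print(f"OR = {or_:.2f}, 95% CI = ({lower:.2f}, {upper:.2f})")
```

With a rare allele like this one, the interval is driven mostly by the small carrier counts, which is one reason large combined sample sets are needed to reach genome-wide significance.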

Gudmundsson, Julius; Sulem, Patrick; Gudbjartsson, Daniel F; Masson, Gisli; Agnarsson, Bjarni A; Benediktsdottir, Kristrun R; Sigurdsson, Asgeir; Magnusson, Olafur Th; Gudjonsson, Sigurjon A; Magnusdottir, Droplaug N; Johannsdottir, Hrefna; Helgadottir, Hafdis Th; Stacey, Simon N; Jonasdottir, Adalbjorg; Olafsdottir, Stefania B; Thorleifsson, Gudmar; Jonasson, Jon G; Tryggvadottir, Laufey; Navarrete, Sebastian; Fuertes, Fernando; Helfand, Brian T; Hu, Qiaoyan; Csiki, Irma E; Mates, Ioan N; Jinga, Viorel; Aben, Katja K H; van Oort, Inge M; Vermeulen, Sita H; Donovan, Jenny L; Hamdy, Freddy C; Ng, Chi-Fai; Chiu, Peter K F; Lau, Kin-Mang; Ng, Maggie C Y; Gulcher, Jeffrey R; Kong, Augustine; Catalona, William J; Mayordomo, Jose I; Einarsson, Gudmundur V; Barkardottir, Rosa B; Jonsson, Eirikur; Mates, Dana; Neal, David E; Kiemeney, Lambertus A; Thorsteinsdottir, Unnur; Rafnar, Thorunn; Stefansson, Kari

2012-10-28

28

A whole-genome massively parallel sequencing analysis of BRCA1 mutant oestrogen receptor-negative and -positive breast cancers.  

PubMed

BRCA1 encodes a tumour suppressor protein that plays pivotal roles in homologous recombination (HR) DNA repair, cell-cycle checkpoints, and transcriptional regulation. BRCA1 germline mutations confer a high risk of early-onset breast and ovarian cancer. In more than 80% of cases, tumours arising in BRCA1 germline mutation carriers are oestrogen receptor (ER)-negative; however, up to 15% are ER-positive. It has been suggested that BRCA1 ER-positive breast cancers constitute sporadic cancers arising in the context of a BRCA1 germline mutation rather than being causally related to BRCA1 loss-of-function. Whole-genome massively parallel sequencing of ER-positive and ER-negative BRCA1 breast cancers, and their respective germline DNAs, was used to characterize the genetic landscape of BRCA1 cancers at base-pair resolution. Only BRCA1 germline mutations, somatic loss of the wild-type allele, and TP53 somatic mutations were recurrently found in the index cases. BRCA1 breast cancers displayed a mutational signature consistent with that caused by lack of HR DNA repair in both ER-positive and ER-negative cases. Sequencing analysis of independent cohorts of hereditary BRCA1 and sporadic non-BRCA1 breast cancers for the presence of recurrent pathogenic mutations and/or homozygous deletions found in the index cases revealed that DAPK3, TMEM135, KIAA1797, PDE4D, and GATA4 are potential additional drivers of breast cancers. This study demonstrates that BRCA1 pathogenic germline mutations coupled with somatic loss of the wild-type allele are not sufficient for hereditary breast cancers to display an ER-negative phenotype, and has led to the identification of three potential novel breast cancer genes (i.e., DAPK3, TMEM135, and GATA4). PMID:22362584

Natrajan, Rachael; Mackay, Alan; Lambros, Maryou B; Weigelt, Britta; Wilkerson, Paul M; Manie, Elodie; Grigoriadis, Anita; A'hern, Roger; van der Groep, Petra; Kozarewa, Iwanka; Popova, Tatiana; Mariani, Odette; Turajlic, Samra; Furney, Simon J; Marais, Richard; Rodruigues, Daniel-Nava; Flora, Adriana C; Wai, Patty; Pawar, Vidya; McDade, Simon; Carroll, Jason; Stoppa-Lyonnet, Dominique; Green, Andrew R; Ellis, Ian O; Swanton, Charles; van Diest, Paul; Delattre, Olivier; Lord, Christopher J; Foulkes, William D; Vincent-Salomon, Anne; Ashworth, Alan; Henri Stern, Marc; Reis-Filho, Jorge S

2012-02-23

29

Cancer systems biology in the genome sequencing era: part 2, evolutionary dynamics of tumor clonal networks and drug resistance.  

PubMed

A tumor often consists of multiple cell subpopulations (clones). Current chemo-treatments often target one clone of a tumor. Although the drug kills that clone, other clones overtake it and the tumor recurs. Genome sequencing and computational analysis allow computational dissection of clones from tumors, while single-cell genome sequencing, including RNA-Seq, allows profiling of these clones. This opens a new window for treating a tumor as a system in which clones are evolving. Future cancer systems biology studies should consider a tumor as an evolving system with multiple clones. Therefore, topics discussed in Part 2 of this review include evolutionary dynamics of clonal networks, early-warning signals (e.g., genome duplication events) for formation of fast-growing clones, dissecting tumor heterogeneity, and modeling of clone-clone-stroma interactions for drug resistance. The ultimate goal of future systems biology analysis is to obtain a 'whole-system' understanding of a tumor and thereby provide more efficient and personalized management strategies for cancer patients. PMID:23792107

Wang, Edwin; Zou, Jinfeng; Zaman, Naif; Beitel, Lenore K; Trifiro, Mark; Paliouras, Miltiadis

2013-06-18

30

The Cancer Genome Atlas completes detailed ovarian cancer analysis  

Cancer.gov

An analysis of genomic changes in ovarian cancer has provided the most comprehensive and integrated view of cancer genes for any cancer type to date. Ovarian serous adenocarcinoma tumors from 500 patients were examined by The Cancer Genome Atlas (TCGA) Research Network. TCGA researchers completed whole-exome sequencing, which examines the protein-coding regions of the genome, on an unprecedented 316 tumors.

31

Whole-genome sequencing of liver cancers identifies etiological influences on mutation patterns and recurrent mutations in chromatin regulators.  

PubMed

Hepatocellular carcinoma (HCC) is the third leading cause of cancer-related death worldwide. We sequenced and analyzed the whole genomes of 27 HCCs, 25 of which were associated with hepatitis B or C virus infections, including two sets of multicentric tumors. Although no common somatic mutations were identified in the multicentric tumor pairs, their whole-genome substitution patterns were similar, suggesting that these tumors developed from independent mutations, although their shared etiological backgrounds may have strongly influenced their somatic mutation patterns. Statistical and functional analyses yielded a list of recurrently mutated genes. Multiple chromatin regulators, including ARID1A, ARID1B, ARID2, MLL and MLL3, were mutated in ~50% of the tumors. Hepatitis B virus genome integration in the TERT locus was frequently observed in a high clonal proportion. Our whole-genome sequencing analysis of HCCs identified the influence of etiological background on somatic mutation patterns and subsequent carcinogenesis, as well as recurrent mutations in chromatin regulators in HCCs. PMID:22634756

Fujimoto, Akihiro; Totoki, Yasushi; Abe, Tetsuo; Boroevich, Keith A; Hosoda, Fumie; Nguyen, Ha Hai; Aoki, Masayuki; Hosono, Naoya; Kubo, Michiaki; Miya, Fuyuki; Arai, Yasuhito; Takahashi, Hiroyuki; Shirakihara, Takuya; Nagasaki, Masao; Shibuya, Tetsuo; Nakano, Kaoru; Watanabe-Makino, Kumiko; Tanaka, Hiroko; Nakamura, Hiromi; Kusuda, Jun; Ojima, Hidenori; Shimada, Kazuaki; Okusaka, Takuji; Ueno, Masaki; Shigekawa, Yoshinobu; Kawakami, Yoshiiku; Arihiro, Koji; Ohdan, Hideki; Gotoh, Kunihito; Ishikawa, Osamu; Ariizumi, Shun-Ichi; Yamamoto, Masakazu; Yamada, Terumasa; Chayama, Kazuaki; Kosuge, Tomoo; Yamaue, Hiroki; Kamatani, Naoyuki; Miyano, Satoru; Nakagama, Hitoshi; Nakamura, Yusuke; Tsunoda, Tatsuhiko; Shibata, Tatsuhiro; Nakagawa, Hidewaki

2012-05-27

32

The Pediatric Cancer Genome Project  

PubMed Central

The St. Jude Children’s Research Hospital–Washington University Pediatric Cancer Genome Project (PCGP) is participating in the international effort to identify somatic mutations that drive cancer. These cancer genome sequencing efforts will not only yield an unparalleled view of the altered signaling pathways in cancer but should also identify new targets against which novel therapeutics can be developed. Although these projects are still deep in the phase of generating primary DNA sequence data, important results are emerging and valuable community resources are being generated that should catalyze future cancer research. We describe here the rationale for conducting the PCGP, present some of the early results of this project and discuss the major lessons learned and how these will affect the application of genomic sequencing in the clinic.

Downing, James R; Wilson, Richard K; Zhang, Jinghui; Mardis, Elaine R; Pui, Ching-Hon; Ding, Li; Ley, Timothy J; Evans, William E

2013-01-01

33

Evolution of the cancer genome  

PubMed Central

The advent of massively parallel sequencing technologies has allowed the characterization of cancer genomes at an unprecedented resolution. Investigation of the mutational landscape of tumours is providing new insights into cancer genome evolution, laying bare the interplay of somatic mutation, adaptation of clones to their environment and natural selection. These studies have demonstrated the extent of the heterogeneity of cancer genomes, have allowed inferences to be made about the forces that act on nascent cancer clones as they evolve and have shown insight into the mutational processes that generate genetic variation. Here we review our emerging understanding of the dynamic evolution of the cancer genome and of the implications for basic cancer biology and the development of antitumour therapy.

Yates, Lucy R.; Campbell, Peter J.

2013-01-01

34

The identification of a novel TP53 cancer susceptibility mutation through whole genome sequencing of a patient with therapy-related AML  

PubMed Central

Context: The identification of patients with inherited cancer susceptibility syndromes facilitates early diagnosis, prevention, and treatment. However, in many cases of suspected cancer susceptibility, the family history is unclear and genetic testing of common cancer susceptibility genes is unrevealing. Objective: To apply whole-genome sequencing to a patient with suspected cancer susceptibility (and lacking a clear family history of cancer and no BRCA1 and BRCA2 mutations) to identify rare or novel germline variants in cancer susceptibility genes. Design, Setting, and Participant: Skin (normal) and bone marrow (leukemia) DNA were obtained from a patient with early-onset breast and ovarian cancer and therapy-related acute myeloid leukemia (t-AML), and analyzed with: 1) whole genome sequencing using paired end reads; 2) SNP genotyping; 3) RNA expression profiling; and 4) spectral karyotyping. Main Outcome Measures: Structural variants, copy number alterations, single nucleotide variants and small insertions and deletions (indels) were detected and validated using the above platforms. Results: Whole genome sequencing revealed a novel, heterozygous 3 Kb deletion removing exons 7-9 of TP53 in the patient’s normal skin DNA, which was homozygous in the leukemia DNA as a result of uniparental disomy. In addition, a total of 28 validated somatic single nucleotide variations or indels in coding genes, 8 somatic structural variants, and 12 somatic copy number alterations were detected in the patient’s leukemia genome. Conclusions: Whole genome sequencing can identify novel, cryptic variants in cancer susceptibility genes in addition to providing unbiased information on the spectrum of mutations in a cancer genome.

Link, Daniel C.; Schuettpelz, Laura G.; Shen, Dong; Wang, Jinling; Walter, Matthew J.; Kulkarni, Shashikant; Payton, Jacqueline E.; Ivanovich, Jennifer; Goodfellow, Paul J.; Le Beau, Michelle; Koboldt, Daniel C.; Dooling, David J.; Fulton, Robert S.; Bender, R. Hugh F.; Fulton, Lucinda L.; Delehaunty, Kimberly D.; Fronick, Catrina C.; Appelbaum, Elizabeth L.; Schmidt, Heather; Abbott, Rachel; O'Laughlin, Michelle; Chen, Ken; McLellan, Michael D.; Varghese, Nobish; Nagarajan, Rakesh; Heath, Sharon; Graubert, Timothy A.; Ding, Li; Ley, Timothy J.; Zambetti, Gerard P.; Wilson, Richard K.; Mardis, Elaine R.

2011-01-01

35

Genome Sequence Databases (Overview): Sequencing and Assembly  

SciTech Connect

From the date its role in heredity was discovered, DNA has been generating interest among scientists from different fields of knowledge: physicists have studied the three-dimensional structure of the DNA molecule, biologists have tried to decode the secrets of life hidden within these long molecules, and technologists have invented and improved methods of DNA analysis. The analysis of the nucleotide sequence of DNA occupies a special place among the methods developed. Thanks to the variety of sequencing technologies available, the process of decoding the sequence of genomic DNA (or whole genome sequencing) has become robust and inexpensive. Meanwhile, the assembly of whole genome sequences remains a challenging task. In addition to the need to assemble millions of DNA fragments of different length (from 35 bp (Solexa) to 800 bp (Sanger)), great interest in analysis of microbial communities (metagenomes) of different complexities raises new problems and pushes some new requirements for sequence assembly tools to the forefront. The genome assembly process can be divided into two steps: draft assembly and assembly improvement (finishing). Despite the fact that automatically performed assembly (or draft assembly) is capable of covering up to 98% of the genome, in most cases it still contains incorrectly assembled reads. The error rate of the consensus sequence produced at this stage is about 1/2000 bp. A finished genome represents a genome assembly of much higher accuracy (with no gaps or incorrectly assembled areas) and quality (~1 error/10,000 bp), validated through a number of computer and laboratory experiments.
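To put the two error rates quoted above in perspective, the short sketch below converts them into an expected number of consensus errors for a genome of a given size. The 5 Mb genome size is a hypothetical example, not a figure from the abstract, and the calculation simply assumes errors are spread evenly along the sequence.

```python
def expected_errors(genome_size_bp, bp_per_error):
    """Expected number of consensus errors at a rate of 1 error per bp_per_error bases."""
    return genome_size_bp / bp_per_error


GENOME_SIZE_BP = 5_000_000      # hypothetical microbial genome, for illustration only
DRAFT_BP_PER_ERROR = 2_000      # draft assembly: about 1 error per 2,000 bp
FINISHED_BP_PER_ERROR = 10_000  # finished assembly: about 1 error per 10,000 bp

draft = expected_errors(GENOME_SIZE_BP, DRAFT_BP_PER_ERROR)
finished = expected_errors(GENOME_SIZE_BP, FINISHED_BP_PER_ERROR)
print(f"draft: {draft:.0f} errors, finished: {finished:.0f} errors")
```

Under these assumptions, finishing reduces the expected error count fivefold, in addition to closing gaps and correcting misassembled regions.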

Lapidus, Alla L.

2009-01-01

36

Fungal Genome Sequencing and Bioenergy  

SciTech Connect

To date, the number of ongoing filamentous fungal genome sequencing projects is almost tenfold fewer than those of bacterial and archaeal genome projects. The fungi chosen for sequencing represent narrow kingdom diversity; most are pathogens or models. We advocate an ambitious, forward-looking phylogenetic-based genome sequencing program, designed to capture metabolic diversity within the fungal kingdom, thereby enhancing research into alternative bioenergy sources, bioremediation, and fungal-environment interactions.

Schadt, Christopher Warren [ORNL]; Baker, Scott [Pacific Northwest National Laboratory (PNNL)]; Thykaer, Jette [Pacific Northwest National Laboratory (PNNL)]; Adney, William S [National Renewable Energy Laboratory (NREL)]; Brettin, Tom [Los Alamos National Laboratory (LANL)]; Brockman, Fred [Pacific Northwest National Laboratory (PNNL)]; Dhaeseleer, Patrick [Lawrence Livermore National Laboratory (LLNL)]; Martinez, A diego [Los Alamos National Laboratory (LANL)]; Miller, R michael [Argonne National Laboratory (ANL)]; Rokhsar, Daniel [U.S. Department of Energy, Joint Genome Institute]; Torok, Tamas [U.S. Department of Energy, Joint Genome Institute]; Tuskan, Gerald A [ORNL]; Bennett, Joan [Rutgers University]; Berka, Randy [Novozymes, Inc]; Briggs, Steven [University of California, San Diego]; Heitman, Joseph [Duke University]; Rizvi, L [Royal Ontario Museum]; Taylor, John [University of California, Berkeley]; Turgeon, Gillian [Cornell University]; Werner-Washburne, Maggie [University of New Mexico, Albuquerque]; Himmel, Michael [ORNL]

2008-01-01

37

Fungal Genome Sequencing and Bioenergy  

SciTech Connect

To date, the number of ongoing filamentous fungal genome sequencing projects is almost tenfold fewer than those of bacterial and archaeal genome projects. The fungi chosen for sequencing represent narrow kingdom diversity; most are pathogens or models. We advocate an ambitious, forward-looking phylogenetic-based genome sequencing program, designed to capture metabolic diversity within the fungal kingdom, thereby enhancing research into alternative bioenergy sources, bioremediation, and fungal-environment interactions.

Baker, Scott E.; Thykaer, Jette; Adney, William S.; Brettin, T.; Brockman, Fred J.; D'haeseleer, Patrik; Martinez, Antonio D.; Miller, R. M.; Rokhsar, Daniel S.; Schadt, Christopher W.; Torok, Tamas; Tuskan, Gerald; Bennett, Joan W.; Berka, Randy; Briggs, Steve; Heitman, Joseph; Taylor, John; Turgeon, Barbara G.; Werner-Washburne, Maggie; Himmel, Michael E.

2008-09-30

38

Genome-wide analysis of aberrant methylation in human breast cancer cells using methyl-DNA immunoprecipitation combined with high-throughput sequencing  

Microsoft Academic Search

BACKGROUND: Cancer cells undergo massive alterations to their DNA methylation patterns that result in aberrant gene expression and malignant phenotypes. However, the mechanisms that underlie methylome changes are not well understood nor is the genomic distribution of DNA methylation changes well characterized. RESULTS: Here, we performed methylated DNA immunoprecipitation combined with high-throughput sequencing (MeDIP-seq) to obtain whole-genome DNA methylation profiles

Yoshinao Ruike; Yukako Imanaka; Fumiaki Sato; Kazuharu Shimizu; Gozoh Tsujimoto

2010-01-01

39

Mutation discovery in regions of segmental cancer genome amplifications with CoNAn-SNV: a mixture model for next generation sequencing of tumors.  

PubMed

Next generation sequencing has now enabled a cost-effective enumeration of the full mutational complement of a tumor genome, in particular single nucleotide variants (SNVs). Most current computational and statistical models for analyzing next generation sequencing data, however, do not account for cancer-specific biological properties, including somatic segmental copy number alterations (CNAs), which require special treatment of the data. Here we present CoNAn-SNV (Copy Number Annotated SNV): a novel algorithm for the inference of single nucleotide variants (SNVs) that overlap copy number alterations. The method is based on modelling the notion that genomic regions of segmental duplication and amplification induce an extended genotype space where a subset of genotypes will exhibit heavily skewed allelic distributions in SNVs (and therefore render them undetectable by methods that assume diploidy). We introduce the concept of modelling allelic counts from sequencing data using a panel of Binomial mixture models where the number of mixtures for a given locus in the genome is informed by a discrete copy number state given as input. We applied CoNAn-SNV to a previously published whole genome shotgun data set obtained from a lobular breast cancer and show that it is able to discover 21 experimentally revalidated somatic non-synonymous mutations in a lobular breast cancer genome that were not detected using copy number insensitive SNV detection algorithms. Importantly, ROC analysis shows that the increased sensitivity of CoNAn-SNV does not result in disproportionate loss of specificity. This was also supported by analysis of a recently published lymphoma genome with a relatively quiescent karyotype, where CoNAn-SNV showed similar results to other callers except in regions of copy number gain where increased sensitivity was conferred. Our results indicate that in genomically unstable tumors, copy number annotation for SNV detection will be critical to fully characterize the mutational landscape of cancer genomes. PMID:22916110
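The idea of a copy-number-aware Binomial mixture can be sketched in a few lines. The example below is not the CoNAn-SNV implementation; it is a minimal illustration that assumes a uniform prior over genotypes, a fixed sequencing error rate, and one mixture component per possible number of variant copies at the locus. The function name and parameters are hypothetical.

```python
from math import comb


def genotype_posterior(variant_reads, depth, copy_number, error_rate=0.01):
    """Posterior over the number of variant copies (0..copy_number) at one locus.

    Each genotype k implies an expected variant allele fraction of k / copy_number,
    clipped to [error_rate, 1 - error_rate] so homozygous states tolerate read errors.
    A uniform prior over genotypes is assumed (a simplification).
    """
    likelihoods = []
    for k in range(copy_number + 1):
        p = min(max(k / copy_number, error_rate), 1 - error_rate)
        # Binomial likelihood of seeing `variant_reads` variant reads out of `depth`
        likelihoods.append(
            comb(depth, variant_reads)
            * p ** variant_reads
            * (1 - p) ** (depth - variant_reads)
        )
    total = sum(likelihoods)
    return [value / total for value in likelihoods]


# Hypothetical locus: 10 variant reads out of 40, in a region amplified to 4 copies.
posterior = genotype_posterior(variant_reads=10, depth=40, copy_number=4)
best = max(range(len(posterior)), key=posterior.__getitem__)
print(f"most probable number of variant copies: {best}")
```

Setting `copy_number=2` recovers the usual three-genotype diploid model, so the same function shows how the genotype space grows (three components at two copies, five at four copies) as copy number increases.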

Crisan, Anamaria; Goya, Rodrigo; Ha, Gavin; Ding, Jiarui; Prentice, Leah M; Oloumi, Arusha; Senz, Janine; Zeng, Thomas; Tse, Kane; Delaney, Allen; Marra, Marco A; Huntsman, David G; Hirst, Martin; Aparicio, Sam; Shah, Sohrab

2012-08-16

40

Mutation Discovery in Regions of Segmental Cancer Genome Amplifications with CoNAn-SNV: A Mixture Model for Next Generation Sequencing of Tumors  

PubMed Central

Next generation sequencing has now enabled a cost-effective enumeration of the full mutational complement of a tumor genome—in particular single nucleotide variants (SNVs). Most current computational and statistical models for analyzing next generation sequencing data, however, do not account for cancer-specific biological properties, including somatic segmental copy number alterations (CNAs)—which require special treatment of the data. Here we present CoNAn-SNV (Copy Number Annotated SNV): a novel algorithm for the inference of single nucleotide variants (SNVs) that overlap copy number alterations. The method is based on modelling the notion that genomic regions of segmental duplication and amplification induce an extended genotype space where a subset of genotypes will exhibit heavily skewed allelic distributions in SNVs (and therefore render them undetectable by methods that assume diploidy). We introduce the concept of modelling allelic counts from sequencing data using a panel of Binomial mixture models where the number of mixtures for a given locus in the genome is informed by a discrete copy number state given as input. We applied CoNAn-SNV to a previously published whole genome shotgun data set obtained from a lobular breast cancer and show that it is able to discover 21 experimentally revalidated somatic non-synonymous mutations in a lobular breast cancer genome that were not detected using copy number insensitive SNV detection algorithms. Importantly, ROC analysis shows that the increased sensitivity of CoNAn-SNV does not result in disproportionate loss of specificity. This was also supported by analysis of a recently published lymphoma genome with a relatively quiescent karyotype, where CoNAn-SNV showed similar results to other callers except in regions of copy number gain where increased sensitivity was conferred. 
Our results indicate that in genomically unstable tumors, copy number annotation for SNV detection will be critical to fully characterize the mutational landscape of cancer genomes.

Crisan, Anamaria; Goya, Rodrigo; Ha, Gavin; Ding, Jiarui; Prentice, Leah M.; Oloumi, Arusha; Senz, Janine; Zeng, Thomas; Tse, Kane; Delaney, Allen; Marra, Marco A.; Huntsman, David G.; Hirst, Martin; Aparicio, Sam; Shah, Sohrab

2012-01-01

41

Whole Genome Sequencing Program (WGS)  

Center for Food Safety and Applied Nutrition (CFSAN)

... Read FDA's article in The New England Journal of Medicine (March 2011) about how genome sequencing helped resolve a salmonellosis outbreak ... More results from www.fda.gov/food/foodscienceresearch/wholegenomesequencingprogramwgs

42

The Trichomonas vaginalis Genome Sequencing Project  

NSDL National Science Digital Library

The Institute for Genomic Research (TIGR) in 2003 released the first draft assembly of the Trichomonas vaginalis genome, available through this website to the academic and not-for-profit research community for noncommercial use only. TIGR will release more data at regular intervals during the sequencing project, which should help researchers better understand this widespread parasite and its role in HIV infection, neo-natal disorders, predisposition to cervical cancer, and of course, vaginitis. The website also includes background information on T. vaginalis, as well as a link to TIGR's sequencing project for Entamoeba histolytica -- a closely related organism.

43

Mapping and Genome Sequence Analysis of Chromosome 5 Regions Involved in Bladder Cancer Progression  

Microsoft Academic Search

We studied the evolution of allelic losses on chromosome 5 by whole-organ histologic and genetic mapping in 234 mucosal DNA samples of 5 cystectomy specimens with invasive bladder cancer and preneoplastic changes in adjacent urothelium. The frequency of alterations in individual loci was verified on 32 tumors and 29 voided urine samples from patients with bladder cancer. Finally, deleted regions

Andrzej Kram; Li Li; Ruo Dan Zhang; Dong Sup Yoon; Jay Y Ro; Dennis Johnston; Herbert Barton Grossman; Steven Scherer; Bogdan Czerniak

2001-01-01

44

Whole-genome sequences of DA and F344 rats with different susceptibilities to arthritis, autoimmunity, inflammation and cancer.  

PubMed

DA (D-blood group of Palm and Agouti, also known as Dark Agouti) and F344 (Fischer) are two inbred rat strains with differences in several phenotypes, including susceptibility to autoimmune disease models and inflammatory responses. While these strains have been extensively studied, little information is available about the DA and F344 genomes, as only the Brown Norway (BN) and spontaneously hypertensive rat strains have been sequenced to date. Here we report the sequencing of the DA and F344 genomes using next-generation Illumina paired-end read technology and the first de novo assembly of a rat genome. DA and F344 were sequenced with an average depth of 32-fold, covered 98.9% of the BN reference genome, and included 97.97% of known rat ESTs. New sequences could be assigned to 59 million positions with previously unknown data in the BN reference genome. Differences between DA, F344, and BN included 19 million positions in novel scaffolds, 4.09 million single nucleotide polymorphisms (SNPs) (including 1.37 million new SNPs), 458,224 short insertions and deletions, and 58,174 structural variants. Genetic differences between DA, F344, and BN, including high-impact SNPs and short insertions and deletions affecting >2500 genes, are likely to account for most of the phenotypic variation between these strains. The new DA and F344 genome sequencing data should facilitate gene discovery efforts in rat models of human disease. PMID:23695301

Guo, Xiaosen; Brenner, Max; Zhang, Xuemei; Laragione, Teresina; Tai, Shuaishuai; Li, Yanhong; Bu, Junjie; Yin, Ye; Shah, Anish A; Kwan, Kevin; Li, Yingrui; Jun, Wang; Gulko, Pércio S

2013-05-20

45

Integrating sequence, evolution and functional genomics in regulatory genomics  

PubMed Central

With genome analysis expanding from the study of genes to the study of gene regulation, 'regulatory genomics' utilizes sequence information, evolution and functional genomics measurements to unravel how regulatory information is encoded in the genome.

Vingron, Martin; Brazma, Alvis; Coulson, Richard; van Helden, Jacques; Manke, Thomas; Palin, Kimmo; Sand, Olivier; Ukkonen, Esko

2009-01-01

46

A sequence-level map of chromosomal breakpoints in the MCF-7 breast cancer cell line yields insights into the evolution of a cancer genome.  

PubMed

By applying a method that combines end-sequence profiling and massively parallel sequencing, we obtained a sequence-level map of chromosomal aberrations in the genome of the MCF-7 breast cancer cell line. A total of 157 distinct somatic breakpoints of two distinct types, dispersed and clustered, were identified. A total of 89 breakpoints are evenly dispersed across the genome. A majority of dispersed breakpoints are in regions of low copy repeats (LCRs), indicating a possible role for LCRs in chromosome breakage. The remaining 68 breakpoints form four distinct clusters of closely spaced breakpoints that coincide with the four highly amplified regions in MCF-7 detected by array CGH located in the 1p13.1-p21.1, 3p14.1-p14.2, 17q22-q24.3, and 20q12-q13.33 chromosomal cytobands. The clustered breakpoints are not significantly associated with LCRs. Sequences flanking most (95%) breakpoint junctions are consistent with double-stranded DNA break repair by nonhomologous end-joining or template switching. A total of 79 known or predicted genes are involved in rearrangement events, including 10 fusions of coding exons from different genes and 77 other rearrangements. Four fusions result in novel expressed chimeric mRNA transcripts. One of the four expressed fusion products (RAD51C-ATXN7) and one gene truncation (BRIP1 or BACH1) involve genes coding for members of protein complexes responsible for homology-driven repair of double-stranded DNA breaks. Another one of the four expressed fusion products (ARFGEF2-SULF2) involves SULF2, a regulator of cell growth and angiogenesis. We show that knock-down of SULF2 in cell lines causes tumorigenic phenotypes, including increased proliferation, enhanced survival, and increased anchorage-independent growth. PMID:19056696

Hampton, Oliver A; Den Hollander, Petra; Miller, Christopher A; Delgado, David A; Li, Jian; Coarfa, Cristian; Harris, Ronald A; Richards, Stephen; Scherer, Steven E; Muzny, Donna M; Gibbs, Richard A; Lee, Adrian V; Milosavljevic, Aleksandar

2008-12-03

47

Second Generation Sequencing of the Mesothelioma Tumor Genome  

Microsoft Academic Search

The current paradigm for elucidating the molecular etiology of cancers relies on the interrogation of small numbers of genes, which limits the scope of investigation. Emerging second-generation massively parallel DNA sequencing technologies have enabled more precise definition of the cancer genome on a global scale. We examined the genome of a human primary malignant pleural mesothelioma (MPM) tumor and matched

Raphael Bueno; Assunta de Rienzo; Lingsheng Dong; Gavin J. Gordon; Colin F. Hercus; William G. Richards; Roderick V. Jensen; Arif Anwar; Gautam Maulik; Lucian R. Chirieac; Kim-Fong Ho; Bruce E. Taillon; Cynthia L. Turcotte; Robert G. Hercus; Steven R. Gullans; David J. Sugarbaker; Anita Brandstaetter

2010-01-01

48

The Fungal Genome Initiative and Lessons Learned from Genome Sequencing  

Microsoft Academic Search

The sequence of Saccharomyces cerevisiae enabled systematic genome-wide experimental approaches, demonstrating the power of having the complete genome of an organism. The rapid impact of these methods on research in yeast mobilized an effort to expand genomic resources for other fungi. The “fungal genome initiative” represents an organized genome sequencing effort to promote comparative and evolutionary studies across the fungal

Christina A. Cuomo; Bruce W. Birren

2010-01-01

49

Genome-wide significant association between a sequence variant at 15q15.2 and lung cancer risk.  

PubMed

Genome-wide association studies (GWAS) have identified 3 genomic regions, at 15q24-25.1, 5p15.33, and 6p21.33, which associate with the risk of lung cancer. Large meta-analyses of GWA data have failed to find additional associations of genome-wide significance. In this study, we sought to confirm 7 variants with suggestive association to lung cancer (P < 10(-5)) in a recently published meta-analysis. In a GWA dataset of 1,447 lung cancer cases and 36,256 controls in Iceland, 3 correlated variants on 15q15.2 (rs504417, rs11853991, and rs748404) showed a significant association with lung cancer, whereas rs4254535 on 2p14, rs1530057 on 3p24.1, rs6438347 on 3q13.31, and rs1926203 on 10q23.31 did not. The most significant variant, rs748404, was genotyped in an additional 1,299 lung cancer cases and 4,102 controls from the Netherlands, Spain, and the United States and the results combined with published GWAS data. In this analysis, the T allele of rs748404 reached genome-wide significance (OR = 1.15, P = 1.1 × 10(-9)). Another variant at the same locus, rs12050604, showed association with lung cancer (OR = 1.09, P = 3.6 × 10(-6)) and remained significant after adjustment for rs748404 and vice versa. rs748404 is located 140 kb centromeric of the TP53BP1 gene that has been implicated in lung cancer risk. Two fully correlated, nonsynonymous coding variants in TP53BP1, rs2602141 (Q1136K) and rs560191 (E353D) showed association with lung cancer in our sample set; however, this association did not remain significant after adjustment for rs748404. Our data show that 1 or more lung cancer risk variants of genome-wide significance and distinct from the coding variants in TP53BP1 are located at 15q15.2. PMID:21303977

Rafnar, Thorunn; Sulem, Patrick; Besenbacher, Soren; Gudbjartsson, Daniel F; Zanon, Carlo; Gudmundsson, Julius; Stacey, Simon N; Kostic, Jelena P; Thorgeirsson, Thorgeir E; Thorleifsson, Gudmar; Bjarnason, Hjordis; Skuladottir, Halla; Gudbjartsson, Tomas; Isaksson, Helgi J; Isla, Dolores; Murillo, Laura; García-Prats, Maria D; Panadero, Angeles; Aben, Katja K H; Vermeulen, Sita H; van der Heijden, Henricus F M; Feser, William J; Miller, York E; Bunn, Paul A; Kong, Augustine; Wolf, Holly J; Franklin, Wilbur A; Mayordomo, Jose I; Kiemeney, Lambertus A; Jonsson, Steinn; Thorsteinsdottir, Unnur; Stefansson, Kari

2011-02-08
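
The odds ratios and P values quoted in the record above can be read against a small worked sketch. The allele counts below are hypothetical (the abstract does not report them), and the 5 × 10^-8 cutoff is the conventional genome-wide significance threshold, which is an assumption here because the abstract does not state the threshold it used.

```python
import math

# Conventional genome-wide significance threshold (assumed; not stated in the abstract).
GENOME_WIDE_ALPHA = 5e-8


def odds_ratio(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Allelic odds ratio from a 2x2 table of allele counts."""
    return (case_alt * ctrl_ref) / (case_ref * ctrl_alt)


def woolf_confidence_interval(case_alt, case_ref, ctrl_alt, ctrl_ref, z=1.96):
    """Approximate 95% confidence interval for the odds ratio (Woolf's log method)."""
    log_or = math.log(odds_ratio(case_alt, case_ref, ctrl_alt, ctrl_ref))
    se = math.sqrt(1 / case_alt + 1 / case_ref + 1 / ctrl_alt + 1 / ctrl_ref)
    return math.exp(log_or - z * se), math.exp(log_or + z * se)


def is_genome_wide_significant(p_value, alpha=GENOME_WIDE_ALPHA):
    """True when the P value falls below the genome-wide threshold."""
    return p_value < alpha


if __name__ == "__main__":
    # Hypothetical allele counts chosen only to give an odds ratio of 1.15.
    counts = (230, 200, 200, 200)
    print("OR:", odds_ratio(*counts))
    print("95% CI:", woolf_confidence_interval(*counts))
    # P values as reported in the abstract for rs748404 and rs12050604.
    print("rs748404 genome-wide significant:", is_genome_wide_significant(1.1e-9))
    print("rs12050604 genome-wide significant:", is_genome_wide_significant(3.6e-6))
```

Under this threshold, only the rs748404 result would count as genome-wide significant, which is consistent with how the abstract words the two findings.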

50

Genome sequencing for healthy individuals.  

PubMed

Genome sequencing of healthy individuals has the potential to lead to improved well-being and disease prevention, but numerous challenges remain that must be addressed to realize these benefits and, importantly, these benefits must be equitable across society. PMID:24035073

Sanderson, Saskia C

2013-09-11

51

Profiling the cancer genome.  

PubMed

Cancer profiling studies have had a profound impact on our understanding of the biology of cancers in a number of ways, including providing insights into the biological heterogeneity of specific cancer types, identification of novel oncogenes and tumor suppressors, and defining pathways that interact to drive the growth of individual cancers. Several large-scale genomic studies are underway that aim to catalog all biologically significant mutational events in each cancer type, and these findings will allow researchers to understand how mutational networks function within individual tumors. The identification of molecular predictive and prognostic tools to facilitate treatment decisions is an important step for individualized patient therapy and, ultimately, in improving patient outcomes. Whereas there are still significant challenges to implementing genomic testing and targeted therapy into routine clinical practice, rapid technological advancements provide hope for overcoming these obstacles. PMID:20590430

Cowin, Prue A; Anglesio, Michael; Etemadmoghadam, Dariush; Bowtell, David D L

2010-09-22

52

Endometrial and acute myeloid leukemia cancer genomes characterized  

Cancer.gov

The characterizations of acute myeloid leukemia and endometrial cancer are the latest results of The Cancer Genome Atlas Research Network’s efforts to sequence the genomes of 20 major cancers.

53

Genome Sequence of Burkholderia pseudomallei NCTC 13392.  

PubMed

Here, we describe the draft genome sequence of Burkholderia pseudomallei NCTC 13392. This isolate has been distributed as K96243, but distinct genomic differences have been identified. The genomic sequence of this isolate will provide the genomic context for previously conducted functional studies. PMID:23704173

Sahl, Jason W; Stone, Joshua K; Gelhaus, H Carl; Warren, Richard L; Cruttwell, Caroline J; Funnell, Simon G; Keim, Paul; Tuanyok, Apichai

2013-05-23

54

Genome Sequence of Burkholderia pseudomallei NCTC 13392  

PubMed Central

Here, we describe the draft genome sequence of Burkholderia pseudomallei NCTC 13392. This isolate has been distributed as K96243, but distinct genomic differences have been identified. The genomic sequence of this isolate will provide the genomic context for previously conducted functional studies.

Sahl, Jason W.; Stone, Joshua K.; Gelhaus, H. Carl; Warren, Richard L.; Cruttwell, Caroline J.; Funnell, Simon G.; Keim, Paul

2013-01-01

55

Sequence and analysis of the Arabidopsis genome  

Microsoft Academic Search

The comprehensive analysis of the genome sequence of the plant Arabidopsis thaliana has been completed recently. The genome sequence and associated analyses provide the foundations for rapid progress in many fields of plant research, such as the exploitation of genetic variation in Arabidopsis ecotypes, the assessment of the transcriptome and proteome, and the association of genome changes at the sequence

Michael Bevan; Klaus Mayer; Owen White; Jonathan A Eisen; Daphne Preuss; Thomas Bureau; Steven L Salzberg; Hans-Werner Mewes

2001-01-01

56

GENOME SEQUENCING AND ANALYSIS OF ASPERGILLUS ORYZAE  

Technology Transfer Automated Retrieval System (TEKTRAN)

The genome of Aspergillus oryzae, an important industrial fungus used in the production of oriental fermented foods, such as soy sauce, miso, and sake, has been sequenced. The genome sequence reveals a wealth of genes encoding secreted enzymes. A comparison with the genome sequences of A. nidulans...

57

Chapter 27 -- Breast Cancer Genomics, Section VI, Pathology and Biological Markers of Invasive Breast Cancer  

Microsoft Academic Search

Breast cancer is predominantly a disease of the genome with cancers arising and progressing through accumulation of aberrations that alter the genome - by changing DNA sequence, copy number, and structure in ways that contribute to diverse aspects of cancer pathophysiology. Classic examples of genomic events that contribute to breast cancer pathophysiology include inherited mutations in BRCA1, BRCA2, TP53,

Paul T. Spellman; Laura Heiser; Joe W. Gray

2009-01-01

58

Understanding Cancer Series: Cancer Genomics  

Cancer.gov

Understanding Cancer Genomics These PowerPoint slides are not locked files. You can mix and match slides from different tutorials as you prepare your own lectures. In the Notes section, you will find explanations of the graphics. The art in this tutorial is copyrighted and may not be reused for commercial gain. Please do not remove the NCI logo or the copyright mark from any slide. These tutorials may be copied only if they are distributed free of charge for educational purposes.

59

Complementary DNA Sequencing: Expressed Sequence Tags and Human Genome Project  

Microsoft Academic Search

Automated partial DNA sequencing was conducted on more than 600 randomly selected human brain complementary DNA (cDNA) clones to generate expressed sequence tags (ESTs). ESTs have applications in the discovery of new human genes, mapping of the human genome, and identification of coding regions in genomic sequences. Of the sequences generated, 337 represent new genes, including 48 with significant similarity

Mark D. Adams; Jenny M. Kelley; Jeannine D. Gocayne; Mark Dubnick; Mihael H. Polymeropoulos; Hong Xiao; Carl R. Merril; Andrew Wu; Bjorn Olde; Ruben F. Moreno; Anthony R. Kerlavage; W. Richard McCombie; J. Craig Venter

1991-01-01

60

Genome-wide Approaches for Cancer Gene Discovery  

PubMed Central

One of the central aims of cancer research is to identify and characterize cancer-causing alterations in cancer genomes. In recent years, unprecedented advances in genome-wide sequencing, functional genomics technologies of RNA interference screens and methods to evaluate three-dimensional chromatin organization in vivo have resulted in important discoveries regarding human cancer. The cancer causing genes identified from these new genome-wide technologies have also provided opportunities for effective and personalized cancer therapy. In this review, we describe some of the most recent technologies for cancer gene discovery. We also provide specific examples where these technologies have proven remarkably successful in uncovering important cancer-causing alterations.

Lizardi, Paul M.; Forloni, Matteo; Wajapeyee, Narendra

2011-01-01

61

Functional genomics and cancer drug target discovery.  

PubMed

The recent development of technologies for whole-genome sequencing, copy number analysis and expression profiling enables the generation of comprehensive descriptions of cancer genomes. However, although the structural analysis and expression profiling of tumors and cancer cell lines can allow the identification of candidate molecules that are altered in the malignant state, functional analyses are necessary to confirm such genes as oncogenes or tumor suppressors. Moreover, recent research suggests that tumor cells also depend on synthetic lethal targets, which are not mutated or amplified in cancer genomes; functional genomics screening can facilitate the discovery of such targets. This review provides an overview of the tools available for the study of functional genomics, and discusses recent research involving the use of these tools to identify potential novel drug targets in cancer. PMID:20521217

Moody, Susan E; Boehm, Jesse S; Barbie, David A; Hahn, William C

2010-06-01

62

Genome sequencing and functional genomics approaches in tomato  

Microsoft Academic Search

Tomato genome sequencing has been taking place through an international, 10-year initiative entitled the “International Solanaceae Genome Project” (SOL). The strategy proposed by the SOL consortium is to sequence the approximately 220 Mb of euchromatin that contains the majority of genes, rather than the entire tomato genome. Tomato and other Solanaceae plants have unique developmental aspects, such as the formation of

Daisuke Shibata

2005-01-01

63

The UCSC Cancer Genomics Browser: update 2013  

PubMed Central

The UCSC Cancer Genomics Browser (https://genome-cancer.ucsc.edu/) is a set of web-based tools to display, investigate and analyse cancer genomics data and its associated clinical information. The browser provides whole-genome to base-pair level views of several different types of genomics data, including some next-generation sequencing platforms. The ability to view multiple datasets together allows users to make comparisons across different data and cancer types. Biological pathways, collections of genes, genomic or clinical information can be used to sort, aggregate and zoom into a group of samples. We currently display an expanding set of data from various sources, including 201 datasets from 22 TCGA (The Cancer Genome Atlas) cancers as well as data from Cancer Cell Line Encyclopedia and Stand Up To Cancer. New features include a completely redesigned user interface with an interactive tutorial and updated documentation. We have also added data downloads, additional clinical heatmap features, and an updated Tumor Image Browser based on Google Maps. New security features allow authenticated users access to private datasets hosted by several different consortia through the public website.

Goldman, Mary; Craft, Brian; Swatloski, Teresa; Ellrott, Kyle; Cline, Melissa; Diekhans, Mark; Ma, Singer; Wilks, Chris; Stuart, Josh; Haussler, David; Zhu, Jingchun

2013-01-01

64

The UCSC Cancer Genomics Browser: update 2013.  

PubMed

The UCSC Cancer Genomics Browser (https://genome-cancer.ucsc.edu/) is a set of web-based tools to display, investigate and analyse cancer genomics data and its associated clinical information. The browser provides whole-genome to base-pair level views of several different types of genomics data, including some next-generation sequencing platforms. The ability to view multiple datasets together allows users to make comparisons across different data and cancer types. Biological pathways, collections of genes, genomic or clinical information can be used to sort, aggregate and zoom into a group of samples. We currently display an expanding set of data from various sources, including 201 datasets from 22 TCGA (The Cancer Genome Atlas) cancers as well as data from Cancer Cell Line Encyclopedia and Stand Up To Cancer. New features include a completely redesigned user interface with an interactive tutorial and updated documentation. We have also added data downloads, additional clinical heatmap features, and an updated Tumor Image Browser based on Google Maps. New security features allow authenticated users access to private datasets hosted by several different consortia through the public website. PMID:23109555

Goldman, Mary; Craft, Brian; Swatloski, Teresa; Ellrott, Kyle; Cline, Melissa; Diekhans, Mark; Ma, Singer; Wilks, Chris; Stuart, Josh; Haussler, David; Zhu, Jingchun

2012-10-29

65

Sequencing Intractable DNA to Close Microbial Genomes  

SciTech Connect

Advancement in high-throughput DNA sequencing technologies has supported a rapid proliferation of microbial genome sequencing projects, providing the genetic blueprint for in-depth studies. Oftentimes, difficult-to-sequence regions in microbial genomes are ruled intractable, resulting in a growing number of genomes with sequence gaps deposited in databases. A procedure was developed to sequence such difficult regions in the non-contiguous finished Desulfovibrio desulfuricans ND132 genome (6 intractable gaps) and the Desulfovibrio africanus genome (1 intractable gap). The polynucleotides surrounding each gap formed GC-rich secondary structures, making the regions refractory to amplification and sequencing. Strand-displacing DNA polymerases used in concert with a novel ramped PCR extension cycle supported amplification and closure of all gap regions in both genomes. These developed procedures support accurate gene annotation, and provide a step-wise method that reduces the effort required for genome finishing.

Hurt, Jr., Richard Ashley [ORNL; Brown, Steven D [ORNL; Podar, Mircea [ORNL; Palumbo, Anthony Vito [ORNL; Elias, Dwayne A [ORNL

2012-01-01

66

SP8 Sequencing Extinct Genomes  

PubMed Central

Nucleic acids, which hold clues to the evolution of various animal and hominid taxa, are fragile compared with other cellular debris, and thus evolutionary biologists are in essence time-trapped. Fortunately, DNA and protein fragments do exist in fossil remains beyond what theoretical experimentation would suggest. DNA molecules sequestered in humic or Maillard-like complexes likely represent a rich source of DNA from the past that has yet to be tapped. These molecules were previously impossible to acquire because of the selective nature of the polymerase chain reaction. Recently, however, rapid parallel pyrosequencing techniques, such as those used in metagenomics-based research, have in theory made it possible to identify all short nucleotide sequences in a sample in a non-selective way, and thus represent the way forward for ancient DNA. In theory, this new technology will allow the completion of genomes of extinct animals, plants, and microbes. I will discuss the benefits and pitfalls of this metagenomics approach to ancient DNA, highlighting our ongoing efforts to sequence the woolly mammoth genome as well as other fossil remains.

Poinar, H.

2007-01-01

67

Next-generation sequencing for lung cancer.  

PubMed

Lung cancer is biologically aggressive and is the leading cause of cancer-related deaths. The development of lung cancer is unique in each patient according to clinical characterizations, prognosis, response and tolerance to treatment. Traditional capillary-based single-gene sequencing by a first-generation technique (known as Sanger sequencing) has been replaced by next-generation sequencing (NGS) since it allows massive parallel sequencing with lower cost and higher throughput. The NGS approach has made remarkable advances compared with traditional methods. We expect these methodologies to comprehensively interpret the global landscape of cancer and provide more information to fulfill the needs of personalized medicine. This review covers a brief introduction and summary on various NGS technologies, applications and important findings by NGS in lung cancer advances, including further discoveries in previously known target genes (EGFR, ALK and KRAS), the identification of additional lung cancer mutations and the global coordination of cancer genome studies. PMID:23980680

Wu, Kehua; Huang, R Stephanie; House, Larry; Cho, William Chi

2013-09-01

68

NIH Launches Comprehensive Effort to Explore Cancer Genomics  

Cancer.gov

The National Cancer Institute (NCI) and the National Human Genome Research Institute (NHGRI), both part of the National Institutes of Health (NIH), today launched a comprehensive effort to accelerate our understanding of the molecular basis of cancer through the application of genome analysis technologies, especially large-scale genome sequencing.

69

An Intelligent System for Searching Genomic Sequences  

Microsoft Academic Search

In this paper, we have developed an intelligent system for searching comparative genomic sequences which departs from the traditional sequence alignment methods of nucleic residues or alphabets. Instead, we use the composition vector method that exploits pattern structures in sequences and indexing techniques for building a genomic database of prokaryotic organisms and their phylogenetic relationships. For the structural analysis of

Vandana Gummuluru; Su-shing Chen

2007-01-01
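
As a rough illustration of alignment-free comparison in the spirit of the record above, the sketch below builds a normalized k-mer frequency vector for each sequence and compares vectors by cosine distance. This is a simplified stand-in, not the authors' system: the published composition vector method is not described in the truncated abstract, and the function names, the choice of k, and the distance measure are all assumptions made for illustration.

```python
import math
from collections import Counter
from itertools import product


def kmer_vector(seq, k=3):
    """Normalized frequency of every DNA k-mer, in a fixed lexicographic order."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts[m] for m in kmers)  # ignores k-mers with non-ACGT letters
    return [counts[m] / total if total else 0.0 for m in kmers]


def cosine_distance(u, v):
    """1 - cosine similarity; returns 1.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 1.0
    return 1.0 - dot / (norm_u * norm_v)


if __name__ == "__main__":
    # Toy sequences, invented for this example.
    a = "ACGTACGTACGTAAGT"
    b = "ACGTACGAACGTAAGT"
    c = "GGGGCCCCGGGGCCCC"
    print("a vs b:", cosine_distance(kmer_vector(a), kmer_vector(b)))
    print("a vs c:", cosine_distance(kmer_vector(a), kmer_vector(c)))
```

Because the vectors have a fixed length of 4^k regardless of sequence length, they can be stored and indexed without aligning the sequences first.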

70

Genome-wide analysis and modeling of DNA methylation susceptibility in 30 breast cancer cell lines by using CpG flanking sequences.  

PubMed

DNA methylation is an epigenetic modification of DNA that adds a methyl group to cytosine. Aberrant DNA methylation in the CpG context is frequently observed in cancer cells and it is known that aberrant DNA methylation silences tumor suppressor genes. However, the mechanism of DNA methylation is not well understood. A widely accepted hypothesis is that DNA methylation does not randomly occur and may be controlled by some instructive mechanisms. In this paper, we conducted an extensive study on this important question by using proprietary sequencing data from methyl-binding domain protein (MBD)-Cap ChIP sequencing experiments for 30 breast cancer cell lines. The goal of our study is to investigate differences in nucleotide composition around CpG sites, where high levels of methylation are observed, and use the information for modeling DNA methylation susceptibility. First, we observed that DNA methylation is not uniform in the whole-genome region and also that the character composition of CpG flanking sequences is significantly different between hyper- and hypo-methylated groups. In an in-depth study, we used information theoretic approaches such as entropy and relative entropy to delineate character composition features and found enrichment of A (Adenine) and T (Thymine) in specific positions around hyper-methylated sites. As the methylation level is increased, A, T proportions in specific positions around hypermethylated sites are increased while A, T proportions in other positions around hypermethylated sites are decreased. Second, we built predictive models for methylation susceptibility by using characters flanking CpG sites as features and hyper-/hypo-methylation status as class. Third, we constructed predictive models using a log odds score of two profiles from DNA sequences surrounding CpG sites of hyper- and hypo-methylated groups. This analysis showed that the distribution of profile scores of hyper-/hypo-methylated site sequences is quite distinct. 
Our genome-wide CpG methylation study shows that nucleotides around CpG sites carry information for cytosine methylation. This is consistent with the seminal work on the instructive evidence of DNA methylation by Keshet et al. (Nature Genetics, 38(2), 149-153, 2006). Our study is on the full genome scale and uses sequencing data, and thus differs significantly from the study by Keshet et al. in the resolution of the data and the analysis methods used. PMID:23796180

An, Jaehyun; Kim, Kwangsoo; Rhee, Sung-Min; Chae, Heejoon; Nephew, Kenneth P; Kim, Sun

2013-06-01
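
The information-theoretic quantities named in the record above (per-position entropy, relative entropy, and a log odds score between two position profiles) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the toy flanking sequences, the pseudocount, and all function names are hypothetical.

```python
import math
from collections import Counter

ALPHABET = "ACGT"


def position_profiles(seqs, pseudocount=1.0):
    """Per-position nucleotide frequencies for equal-length sequences."""
    length = len(seqs[0])
    total = len(seqs) + pseudocount * len(ALPHABET)
    profiles = []
    for i in range(length):
        counts = Counter(s[i] for s in seqs)
        profiles.append({b: (counts.get(b, 0) + pseudocount) / total for b in ALPHABET})
    return profiles


def entropy(dist):
    """Shannon entropy in bits of one position's distribution."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)


def relative_entropy(p, q):
    """Kullback-Leibler divergence D(p || q) in bits."""
    return sum(p[b] * math.log2(p[b] / q[b]) for b in ALPHABET if p[b] > 0)


def log_odds_score(seq, hyper_profile, hypo_profile):
    """Sum over positions of log2(P_hyper(base) / P_hypo(base))."""
    return sum(
        math.log2(hyper_profile[i][b] / hypo_profile[i][b]) for i, b in enumerate(seq)
    )


if __name__ == "__main__":
    # Toy flanking sequences, invented for this example.
    hyper = position_profiles(["ATAT", "ATAA", "TTAT"])
    hypo = position_profiles(["GCGC", "GCCC", "CCGC"])
    for i, (p, q) in enumerate(zip(hyper, hypo)):
        print(i, round(entropy(p), 3), round(relative_entropy(p, q), 3))
    print("score ATAT:", log_odds_score("ATAT", hyper, hypo))
    print("score GCGC:", log_odds_score("GCGC", hyper, hypo))
```

A positive score means the flanking sequence looks more like the hyper-methylated profile, and a negative score means it looks more like the hypo-methylated one. The pseudocount keeps every ratio finite when a base is absent from one group.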

71

NIH Launches Comprehensive Effort to Explore Cancer Genomics: The Cancer Genome Atlas Begins With Three-Year, $100 Million Pilot  

Cancer.gov

The National Cancer Institute and the National Human Genome Research Institute today launched a comprehensive effort to accelerate our understanding of the molecular basis of cancer through the application of genome analysis technologies, especially large-scale genome sequencing. Questions and Answers

72

Accurate and comprehensive sequencing of personal genomes.  

PubMed

As whole-genome sequencing becomes commoditized and we begin to sequence and analyze personal genomes for clinical and diagnostic purposes, it is necessary to understand what constitutes a complete sequencing experiment for determining genotypes and detecting single-nucleotide variants. Here, we show that the current recommendation of ~30× coverage is not adequate to produce genotype calls across a large fraction of the genome with acceptably low error rates. Our results are based on analyses of a clinical sample sequenced on two related Illumina platforms, GAII(x) and HiSeq 2000, to a very high depth (126×). We used these data to establish genotype-calling filters that dramatically increase accuracy. We also empirically determined how the callable portion of the genome varies as a function of the amount of sequence data used. These results help provide a "sequencing guide" for future whole-genome sequencing decisions and metrics by which coverage statistics should be reported. PMID:21771779

Ajay, Subramanian S; Parker, Stephen C J; Abaan, Hatice Ozel; Fajardo, Karin V Fuentes; Margulies, Elliott H

2011-07-19
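
One simple way to picture "callable portion as a function of depth" from the record above is to count the fraction of positions whose read depth meets a minimum threshold. The sketch below assumes a plain list of per-base depths and a depth-only definition of callable. Both are illustrative assumptions, since the abstract does not describe the actual genotype-calling filters.

```python
def mean_coverage(depths):
    """Average read depth across all positions."""
    if not depths:
        raise ValueError("depths must not be empty")
    return sum(depths) / len(depths)


def callable_fraction(depths, min_depth):
    """Fraction of positions with depth >= min_depth (a depth-only proxy)."""
    if not depths:
        raise ValueError("depths must not be empty")
    return sum(1 for d in depths if d >= min_depth) / len(depths)


if __name__ == "__main__":
    # Hypothetical per-base depths for eight positions.
    depths = [0, 5, 12, 30, 31, 45, 60, 126]
    print("mean coverage:", mean_coverage(depths))
    for threshold in (1, 10, 30, 50):
        print(threshold, callable_fraction(depths, threshold))
```

Mean coverage and callable fraction can diverge sharply when depth is uneven, which is one reason a single average figure may not describe how much of a genome can be genotyped.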

73

Evolution of the cancer genome  

PubMed Central

Human tumors result from an evolutionary process operating on somatic cells within tissues, whereby natural selection operates on the phenotypic variability generated by the accumulation of genetic, genomic and epigenetic alterations. This somatic evolution leads to adaptations such as increased proliferative, angiogenic, and invasive phenotypes. In this review we outline how cancer genomes are beginning to be investigated from an evolutionary perspective. We describe recent progress in the cataloging of somatic genetic and genomic alterations, and investigate the contributions of germline as well as epigenetic factors to cancer genome evolution. Finally, we outline the challenges facing researchers who investigate the processes driving the evolution of the cancer genome.

Podlaha, Ondrej; Riester, Markus; De, Subhajyoti; Michor, Franziska

2013-01-01

74

The Cancer Genome Atlas : Genomic Characteristics of Ovarian Cancer  

Cancer.gov

The Cancer Genome Atlas : Genomic Characteristics of Ovarian Cancer (video).

75

A comparative study of detection of p53 mutations in human breast cancer by flow cytometry, single-strand conformation polymorphism and genomic sequencing.  

PubMed Central

The accuracy of immunodetection by dual parameter flow cytometry (FCM), polymerase chain reaction-mediated single strand conformation polymorphism (PCR-SSCP) and genomic sequencing to detect p53 mutations was compared. Analysis by the last two techniques was restricted to exons 5-8. Initially, 110 breast tumours were screened for p53 expression by FCM. Seventy (64%) of tumours were immunopositive. Fifteen highly immunopositive and 15 completely immunonegative tumours were selected for further analysis by PCR-SSCP and genomic sequencing. Eleven out of 15 immunopositive tumours were found to have mutation by PCR-SSCP. Genomic sequencing confirmed the presence of mutation in 10 of these 11 immunopositive tumours. Therefore, four immunopositive tumours failed to show mutation by SSCP and five by genomic sequencing. Of the 15 immunonegative tumours, one showed mutation by both PCR-SSCP and genomic sequencing and one tumour has undergone deletion of the p53 gene. Overall, immunoreactivity correlated with both PCR-SSCP and genomic sequencing in 80% of cases (24/30), and there was 96.5% (28/29) concordance between PCR-SSCP and genomic sequencing. We conclude that there is good concordance between mutations detected by PCR-SSCP and genomic sequencing, but immunochemical detection of p53 overexpression is not an absolute indicator of p53 gene mutation.

Chakravarty, G.; Redkar, A.; Mittra, I.

1996-01-01

76

A comparative study of detection of p53 mutations in human breast cancer by flow cytometry, single-strand conformation polymorphism and genomic sequencing.  

PubMed

The accuracy of immunodetection by dual parameter flow cytometry (FCM), polymerase chain reaction-mediated single strand conformation polymorphism (PCR-SSCP) and genomic sequencing to detect p53 mutations was compared. Analysis by the last two techniques was restricted to exons 5-8. Initially, 110 breast tumours were screened for p53 expression by FCM. Seventy (64%) of tumours were immunopositive. Fifteen highly immunopositive and 15 completely immunonegative tumours were selected for further analysis by PCR-SSCP and genomic sequencing. Eleven out of 15 immunopositive tumours were found to have mutation by PCR-SSCP. Genomic sequencing confirmed the presence of mutation in 10 of these 11 immunopositive tumours. Therefore, four immunopositive tumours failed to show mutation by SSCP and five by genomic sequencing. Of the 15 immunonegative tumours, one showed mutation by both PCR-SSCP and genomic sequencing and one tumour has undergone deletion of the p53 gene. Overall, immunoreactivity correlated with both PCR-SSCP and genomic sequencing in 80% of cases (24/30), and there was 96.5% (28/29) concordance between PCR-SSCP and genomic sequencing. We conclude that there is good concordance between mutations detected by PCR-SSCP and genomic sequencing, but immunochemical detection of p53 overexpression is not an absolute indicator of p53 gene mutation. PMID:8883402

Chakravarty, G; Redkar, A; Mittra, I

1996-10-01
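
The concordance figures in the record above are simple percent agreement, which can be checked with a few lines. The helper name is hypothetical, and the counts are the ones quoted in the abstract.

```python
def percent_agreement(agreeing, total):
    """Percentage of cases on which two methods give the same call."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * agreeing / total


if __name__ == "__main__":
    # Counts as reported in the abstract.
    print("immunoreactivity vs molecular methods:", percent_agreement(24, 30))
    print("PCR-SSCP vs genomic sequencing:", percent_agreement(28, 29))
```

The second ratio works out to about 96.55%, which the abstract reports as 96.5%.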

77

The Characterization of Twenty Sequenced Human Genomes  

Microsoft Academic Search

We present the analysis of twenty human genomes to evaluate the prospects for identifying rare functional variants that contribute to a phenotype of interest. We sequenced at high coverage ten “case” genomes from individuals with severe hemophilia A and ten “control” genomes. We summarize the number of genetic variants emerging from a study of this magnitude, and provide a proof

Kimberly Pelak; Kevin V. Shianna; Dongliang Ge; Jessica M. Maia; Mingfu Zhu; Jason P. Smith; Elizabeth T. Cirulli; Jacques Fellay; Samuel P. Dickson; Curtis E. Gumbs; Erin L. Heinzen; Anna C. Need; Elizabeth K. Ruzzo; Abanish Singh; C. Ryan Campbell; Linda K. Hong; Katharina A. Lornsen; Alexander M. McKenzie; Nara L. M. Sobreira; Julie E. Hoover-Fong; Joshua D. Milner; Ruth Ottman; Barton F. Haynes; James J. Goedert; David B. Goldstein

2010-01-01

78

Complete Genome Sequence of Mycobacterium massiliense  

PubMed Central

Mycobacterium massiliense is a rapidly growing bacterium associated with opportunistic infections. The genome of a representative isolate (strain GO 06) recovered from wound samples from patients who underwent arthroscopic or laparoscopic surgery was sequenced. To the best of our knowledge, this is the first announcement of the complete genome sequence of an M. massiliense strain.

Raiol, Taina; Ribeiro, Guilherme Menegoi; Maranhao, Andrea Queiroz; Bocca, Anamelia Lorenzetti; Silva-Pereira, Ildinete; Junqueira-Kipnis, Ana Paula; Brigido, Marcelo de Macedo

2012-01-01

79

The sequence of the human genome.  

PubMed

A 2.91-billion base pair (bp) consensus sequence of the euchromatic portion of the human genome was generated by the whole-genome shotgun sequencing method. The 14.8-billion bp DNA sequence was generated over 9 months from 27,271,853 high-quality sequence reads (5.11-fold coverage of the genome) from both ends of plasmid clones made from the DNA of five individuals. Two assembly strategies-a whole-genome assembly and a regional chromosome assembly-were used, each combining sequence data from Celera and the publicly funded genome effort. The public data were shredded into 550-bp segments to create a 2.9-fold coverage of those genome regions that had been sequenced, without including biases inherent in the cloning and assembly procedure used by the publicly funded group. This brought the effective coverage in the assemblies to eightfold, reducing the number and size of gaps in the final assembly over what would be obtained with 5.11-fold coverage. The two assembly strategies yielded very similar results that largely agree with independent mapping data. The assemblies effectively cover the euchromatic regions of the human chromosomes. More than 90% of the genome is in scaffold assemblies of 100,000 bp or more, and 25% of the genome is in scaffolds of 10 million bp or larger. Analysis of the genome sequence revealed 26,588 protein-encoding transcripts for which there was strong corroborating evidence and an additional approximately 12,000 computationally derived genes with mouse matches or other weak supporting evidence. Although gene-dense clusters are obvious, almost half the genes are dispersed in low G+C sequence separated by large tracts of apparently noncoding sequence. Only 1.1% of the genome is spanned by exons, whereas 24% is in introns, with 75% of the genome being intergenic DNA. Duplications of segmental blocks, ranging in size up to chromosomal lengths, are abundant throughout the genome and reveal a complex evolutionary history. 
Comparative genomic analysis indicates vertebrate expansions of genes associated with neuronal function, with tissue-specific developmental regulation, and with the hemostasis and immune systems. DNA sequence comparisons between the consensus sequence and publicly funded genome data provided locations of 2.1 million single-nucleotide polymorphisms (SNPs). A random pair of human haploid genomes differed at a rate of 1 bp per 1250 on average, but there was marked heterogeneity in the level of polymorphism across the genome. Less than 1% of all SNPs resulted in variation in proteins, but the task of determining which SNPs have functional consequences remains an open challenge. PMID:11181995

Venter, J C; Adams, M D; Myers, E W; Li, P W; Mural, R J; Sutton, G G; Smith, H O; Yandell, M; Evans, C A; Holt, R A; Gocayne, J D; Amanatides, P; Ballew, R M; Huson, D H; Wortman, J R; Zhang, Q; Kodira, C D; Zheng, X H; Chen, L; Skupski, M; Subramanian, G; Thomas, P D; Zhang, J; Gabor Miklos, G L; Nelson, C; Broder, S; Clark, A G; Nadeau, J; McKusick, V A; Zinder, N; Levine, A J; Roberts, R J; Simon, M; Slayman, C; Hunkapiller, M; Bolanos, R; Delcher, A; Dew, I; Fasulo, D; Flanigan, M; Florea, L; Halpern, A; Hannenhalli, S; Kravitz, S; Levy, S; Mobarry, C; Reinert, K; Remington, K; Abu-Threideh, J; Beasley, E; Biddick, K; Bonazzi, V; Brandon, R; Cargill, M; Chandramouliswaran, I; Charlab, R; Chaturvedi, K; Deng, Z; Di Francesco, V; Dunn, P; Eilbeck, K; Evangelista, C; Gabrielian, A E; Gan, W; Ge, W; Gong, F; Gu, Z; Guan, P; Heiman, T J; Higgins, M E; Ji, R R; Ke, Z; Ketchum, K A; Lai, Z; Lei, Y; Li, Z; Li, J; Liang, Y; Lin, X; Lu, F; Merkulov, G V; Milshina, N; Moore, H M; Naik, A K; Narayan, V A; Neelam, B; Nusskern, D; Rusch, D B; Salzberg, S; Shao, W; Shue, B; Sun, J; Wang, Z; Wang, A; Wang, X; Wang, J; Wei, M; Wides, R; Xiao, C; Yan, C; Yao, A; Ye, J; Zhan, M; Zhang, W; Zhang, H; Zhao, Q; Zheng, L; Zhong, F; Zhong, W; Zhu, S; Zhao, S; Gilbert, D; Baumhueter, S; Spier, G; Carter, C; Cravchik, A; Woodage, T; Ali, F; An, H; Awe, A; Baldwin, D; Baden, H; Barnstead, M; Barrow, I; Beeson, K; Busam, D; Carver, A; Center, A; Cheng, M L; Curry, L; Danaher, S; Davenport, L; Desilets, R; Dietz, S; Dodson, K; Doup, L; Ferriera, S; Garg, N; Gluecksmann, A; Hart, B; Haynes, J; Haynes, C; Heiner, C; Hladun, S; Hostin, D; Houck, J; Howland, T; Ibegwam, C; Johnson, J; Kalush, F; Kline, L; Koduru, S; Love, A; Mann, F; May, D; McCawley, S; McIntosh, T; McMullen, I; Moy, M; Moy, L; Murphy, B; Nelson, K; Pfannkoch, C; Pratts, E; Puri, V; Qureshi, H; Reardon, M; Rodriguez, R; Rogers, Y H; Romblad, D; Ruhfel, B; Scott, R; Sitter, C; Smallwood, M; Stewart, E; Strong, R; Suh, E; 
Thomas, R; Tint, N N; Tse, S; Vech, C; Wang, G; Wetter, J; Williams, S; Williams, M; Windsor, S; Winn-Deen, E; Wolfe, K; Zaveri, J; Zaveri, K; Abril, J F; Guigó, R; Campbell, M J; Sjolander, K V; Karlak, B; Kejariwal, A; Mi, H; Lazareva, B; Hatton, T; Narechania, A; Diemer, K; Muruganujan, A; Guo, N; Sato, S; Bafna, V; Istrail, S; Lippert, R; Schwartz, R; Walenz, B; Yooseph, S; Allen, D; Basu, A; Baxendale, J; Blick, L; Caminha, M; Carnes-Stine, J; Caulk, P; Chiang, Y H; Coyne, M; Dahlke, C; Mays, A; Dombroski, M; Donnelly, M; Ely, D; Esparham, S; Fosler, C; Gire, H; Glanowski, S; Glasser, K; Glodek, A; Gorokhov, M; Graham, K; Gropman, B; Harris, M; Heil, J; Henderson, S; Hoover, J; Jennings, D; Jordan, C; Jordan, J; Kasha, J; Kagan, L; Kraft, C; Levitsky, A; Lewis, M; Liu, X; Lopez, J; Ma, D; Majoros, W; McDaniel, J; Murphy, S; Newman, M; Nguyen, T; Nguyen, N; Nodell, M; Pan, S; Peck, J; Peterson, M; Rowe, W; Sanders, R; Scott, J; Simpson, M; Smith, T; Sprague, A; Stockwell, T; Turner, R; Venter, E; Wang, M; Wen, M; Wu, D; Wu, M; Xia, A; Zandieh, A; Zhu, X

2001-02-16

80

Human genome sequencing in health and disease.  

PubMed

Following the "finished," euchromatic, haploid human reference genome sequence, the rapid development of novel, faster, and cheaper sequencing technologies is making possible the era of personalized human genomics. Personal diploid human genome sequences have been generated, and each has contributed to our better understanding of variation in the human genome. We have consequently begun to appreciate the vastness of individual genetic variation from single nucleotide to structural variants. Translation of genome-scale variation into medically useful information is, however, in its infancy. This review summarizes the initial steps undertaken in clinical implementation of personal genome information, and describes the application of whole-genome and exome sequencing to identify the cause of genetic diseases and to suggest adjuvant therapies. Better analysis tools and a deeper understanding of the biology of our genome are necessary in order to decipher, interpret, and optimize clinical utility of what the variation in the human genome can teach us. Personal genome sequencing may eventually become an instrument of common medical practice, providing information that assists in the formulation of a differential diagnosis. We outline herein some of the remaining challenges. PMID:22248320

Gonzaga-Jauregui, Claudia; Lupski, James R; Gibbs, Richard A

2012-01-01

81

Human Genome Sequencing in Health and Disease  

PubMed Central

Following the “finished,” euchromatic, haploid human reference genome sequence, the rapid development of novel, faster, and cheaper sequencing technologies is making possible the era of personalized human genomics. Personal diploid human genome sequences have been generated, and each has contributed to our better understanding of variation in the human genome. We have consequently begun to appreciate the vastness of individual genetic variation from single nucleotide to structural variants. Translation of genome-scale variation into medically useful information is, however, in its infancy. This review summarizes the initial steps undertaken in clinical implementation of personal genome information, and describes the application of whole-genome and exome sequencing to identify the cause of genetic diseases and to suggest adjuvant therapies. Better analysis tools and a deeper understanding of the biology of our genome are necessary in order to decipher, interpret, and optimize clinical utility of what the variation in the human genome can teach us. Personal genome sequencing may eventually become an instrument of common medical practice, providing information that assists in the formulation of a differential diagnosis. We outline herein some of the remaining challenges.

Gonzaga-Jauregui, Claudia; Lupski, James R.; Gibbs, Richard A.

2013-01-01

82

Genomic approaches to research in lung cancer  

Microsoft Academic Search

The medical research community is experiencing a marked increase in the amount of information available on genomic sequences and genes expressed by humans and other organisms. This information offers great opportunities for improving our understanding of complex diseases such as lung cancer. In particular, we should expect to witness a rapid increase in the rate of discovery of genes involved

Edward Gabrielson

2000-01-01

83

Genome Sequencing and Analysis Conference IV  

SciTech Connect

J. Craig Venter and C. Thomas Caskey co-chaired Genome Sequencing and Analysis Conference IV, held at Hilton Head, South Carolina, from September 26-30, 1992. Venter opened the conference by noting that approximately 400 researchers from 16 nations were present, four times as many participants as at Genome Sequencing Conference I in 1989. Venter also introduced the Data Fair, a new component of the conference allowing exchange and on-site computer analysis of unpublished sequence data.

Not Available

1993-12-31

84

Human genetics and genomics a decade after the release of the draft sequence of the human genome  

PubMed Central

Substantial progress has been made in human genetics and genomics research over the past ten years since the publication of the draft sequence of the human genome in 2001. Findings emanating directly from the Human Genome Project, together with those from follow-on studies, have had an enormous impact on our understanding of the architecture and function of the human genome. Major developments have been made in cataloguing genetic variation, the International HapMap Project, and with respect to advances in genotyping technologies. These developments are vital for the emergence of genome-wide association studies in the investigation of complex diseases and traits. In parallel, the advent of high-throughput sequencing technologies has ushered in the 'personal genome sequencing' era for both normal and cancer genomes, and made possible large-scale genome sequencing studies such as the 1000 Genomes Project and the International Cancer Genome Consortium. The high-throughput sequencing and sequence-capture technologies are also providing new opportunities to study Mendelian disorders through exome sequencing and whole-genome sequencing. This paper reviews these major developments in human genetics and genomics over the past decade.

2011-01-01

85

Plantagora: Modeling Whole Genome Sequencing and Assembly of Plant Genomes  

PubMed Central

Background Genomics studies are being revolutionized by the next generation sequencing technologies, which have made whole genome sequencing much more accessible to the average researcher. Whole genome sequencing with the new technologies is a developing art that, despite the large volumes of data that can be produced, may still fail to provide a clear and thorough map of a genome. The Plantagora project was conceived to address specifically the gap between having the technical tools for genome sequencing and knowing precisely the best way to use them. Methodology/Principal Findings For Plantagora, a platform was created for generating simulated reads from several different plant genomes of different sizes. The resulting read files mimicked either 454 or Illumina reads, with varying paired end spacing. Thousands of datasets of reads were created, most derived from our primary model genome, rice chromosome one. All reads were assembled with different software assemblers, including Newbler, Abyss, and SOAPdenovo, and the resulting assemblies were evaluated by an extensive battery of metrics chosen for these studies. The metrics included both statistics of the assembly sequences and fidelity-related measures derived by alignment of the assemblies to the original genome source for the reads. The results were presented in a website, which includes a data graphing tool, all created to help the user compare rapidly the feasibility and effectiveness of different sequencing and assembly strategies prior to testing an approach in the lab. Some of our own conclusions regarding the different strategies were also recorded on the website. Conclusions/Significance Plantagora provides a substantial body of information for comparing different approaches to sequencing a plant genome, and some conclusions regarding some of the specific approaches. Plantagora also provides a platform of metrics and tools for studying the process of sequencing and assembly further.

Barthelson, Roger; McFarlin, Adam J.; Rounsley, Steven D.; Young, Sarah

2011-01-01
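
The read-simulation step described in the Plantagora abstract can be illustrated with a minimal sketch. This is not the Plantagora platform itself: the function name `simulate_paired_reads`, its parameters, and the error-free, fixed-insert-size model are all assumptions made for illustration.

```python
import random

def simulate_paired_reads(genome, n_pairs, read_len, insert_size, seed=0):
    """Sample error-free read pairs from `genome` (hypothetical helper).

    Each pair comes from a fragment of length `insert_size`; the second
    read is the reverse complement of the fragment's last `read_len` bases.
    """
    comp = str.maketrans("ACGT", "TGCA")
    rng = random.Random(seed)
    pairs = []
    for _ in range(n_pairs):
        start = rng.randrange(0, len(genome) - insert_size + 1)
        fragment = genome[start:start + insert_size]
        forward = fragment[:read_len]
        reverse = fragment[-read_len:].translate(comp)[::-1]
        pairs.append((start, forward, reverse))
    return pairs
```

Varying `insert_size` in such a sketch corresponds loosely to the "varying paired end spacing" mentioned in the abstract; real 454 or Illumina simulators would also model sequencing errors and insert-size variation.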

86

Next-generation sequencing: applications beyond genomes  

Microsoft Academic Search

The development of DNA sequencing more than 30 years ago has profoundly impacted biological research. In the last couple of years, remarkable technological innovations have emerged that allow the direct and cost-effective sequencing of complex samples at unprecedented scale and speed. These next-generation technologies make it feasible to sequence not only static genomes, but also entire transcriptomes expressed under different

Samuel Marguerat; Jürg Bähler

2008-01-01

87

Progress in Arabidopsis genome sequencing and functional genomics.  

PubMed

Arabidopsis thaliana has a relatively small genome of approximately 130 Mb containing about 10% repetitive DNA. Genome sequencing studies reveal a gene-rich genome, predicted to contain approximately 25000 genes spaced on average every 4.5 kb. Between 10 and 20% of the predicted genes occur as clusters of related genes, indicating that local sequence duplication and subsequent divergence generates a significant proportion of gene families. In addition to gene families, repetitive sequences comprise individual and small clusters of two to three retroelements and other classes of smaller repeats. The clustering of highly repetitive elements is a striking feature of the A. thaliana genome emerging from sequence and other analyses. PMID:10751689

Wambutt, R; Murphy, G; Volckaert, G; Pohl, T; Düsterhöft, A; Stiekema, W; Entian, K D; Terryn, N; Harris, B; Ansroge, W; Brandt, P; Grivell, L; Rieger, M; Weichselgartner, M; de Simone, V; Obermaier, B; Mache, R; Müller, M; Kreis, M; Delseny, M; Puigdomenech, P; Watson, M; Schmidtheini, T; Reichert, B; Portatelle, D; Perez-Alonso, M; Bountry, M; Bancroft, I; Vos, P; Hoheisel, J; Zimmermann, W; Wedler, H; Ridley, P; Langham, S A; McCullagh, B; Bilham, L; Robben, J; Van der Schueren, J; Grymonprez, B; Chuang, Y J; Vandenbussche, F; Braeken, M; Weltjens, I; Voet, M; Bastiens, I; Aert, R; Defoor, E; Weitzenegger, T; Bothe, G; Rose, M

2000-03-31

88

Coupled amplification and sequencing of genomic DNA  

SciTech Connect

Addition of dideoxyribonucleotides during the exponential phase of the PCR should result in the synthesis of two complementary sequence ladders. The authors have explored this hypothesis to develop coupled amplification and sequencing of genomic DNA. Coupled amplification and sequencing is a biphasic method for sequencing both strands of template as they are amplified. Stage I selects and amplifies a single target from the genomic DNA sample. Stage II accomplishes the sequencing as well as additional amplification of the target using aliquots from the stage I reaction mixed with end-labeled primer and dideoxynucleotides. They have successfully applied coupled amplification and sequencing to a 300-base-pair fragment 4 kilobases upstream from HOX2B directly from human whole genomic DNA.

Ruano, G.; Kidd, K.K. (Yale Univ. School of Medicine, New Haven, CT (United States))

1991-04-01

89

Genome Sequence of Lactobacillus crispatus ST1  

PubMed Central

Lactobacillus crispatus is a common member of the beneficial microbiota present in the vertebrate gastrointestinal and human genitourinary tracts. Here, we report the genome sequence of L. crispatus ST1, a chicken isolate displaying strong adherence to vaginal epithelial cells.

Ojala, Teija; Kuparinen, Veera; Koskinen, J. Patrik; Alatalo, Edward; Holm, Liisa; Auvinen, Petri; Edelman, Sanna; Westerlund-Wikstrom, Benita; Korhonen, Timo K.; Paulin, Lars; Kankainen, Matti

2010-01-01

90

Virtually sequenced: The next genomic generation  

SciTech Connect

The announcement of "virtual genomics" requires evaluation of the efficiency and accuracy of computer-generated sequencing efforts. "Digital Northerns", or Northern blot electrophoresis done in the realm of computer data, have been developed by Incyte Pharmaceuticals (Palo Alto, CA) and Human Genome Sciences (Rockville, MD). 12 refs., 2 figs.

Bains, W. [PA Consulting Group, Melbourn (United Kingdom)]

1996-06-01

91

Finding approximate tandem repeats in genomic sequences  

Microsoft Academic Search

An efficient algorithm is presented for detecting approximate tandem repeats in genomic sequences. The algorithm is based on a flexible statistical model which allows a wide range of definitions of approximate tandem repeats. The ideas and methods underlying the algorithm are described and examined and its effectiveness on genomic data is demonstrated.

Ydo Wexler; Zohar Yakhini; Yechezkel Kashi; Dan Geiger

2004-01-01
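
To make the notion of an approximate tandem repeat concrete, here is a naive sketch. It is not the statistical model of the abstract above; it simply assumes that two adjacent copies of a fixed `period` count as a repeat when they differ at no more than `max_mismatches` positions. The function name and both parameters are hypothetical.

```python
def approx_tandem_repeats(seq, period, max_mismatches=1):
    """Return start positions i where seq[i:i+period] and the following
    `period` bases differ at no more than `max_mismatches` positions."""
    hits = []
    for i in range(len(seq) - 2 * period + 1):
        first = seq[i:i + period]
        second = seq[i + period:i + 2 * period]
        mismatches = sum(a != b for a, b in zip(first, second))
        if mismatches <= max_mismatches:
            hits.append(i)
    return hits
```

This brute-force scan costs time proportional to sequence length times period and ignores insertions and deletions, which a practical detector would need to handle.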

92

Genomics of Lung Cancer  

Microsoft Academic Search

Lung cancer is the leading cause of cancer death in both men and women in the United States, despite its incidence being less than that of prostate cancer in men and breast cancer in women. With 166,000 deaths expected in 2008, the sum total of lung cancer deaths exceeds those of prostate, breast, and colon cancer combined (1). Prostate,

Alain C. Borczuk; Rebecca L. Toonkel; Charles A. Powell

2009-01-01

93

Genomic sequencing of Pleistocene cave bears.  

PubMed

Despite the greater information content of genomic DNA, ancient DNA studies have largely been limited to the amplification of mitochondrial sequences. Here we describe metagenomic libraries constructed with unamplified DNA extracted from skeletal remains of two 40,000-year-old extinct cave bears. Analysis of approximately 1 megabase of sequence from each library showed that despite significant microbial contamination, 5.8 and 1.1% of clones contained cave bear inserts, yielding 26,861 base pairs of cave bear genome sequence. Comparison of cave bear and modern bear sequences revealed the evolutionary relationship of these lineages. The metagenomic approach used here establishes the feasibility of ancient DNA genome sequencing programs. PMID:15933159

Noonan, James P; Hofreiter, Michael; Smith, Doug; Priest, James R; Rohland, Nadin; Rabeder, Gernot; Krause, Johannes; Detter, J Chris; Pääbo, Svante; Rubin, Edward M

2005-06-02

94

Genome evolution during progression to breast cancer.  

PubMed

Cancer evolution involves cycles of genomic damage, epigenetic deregulation, and increased cellular proliferation that eventually culminate in the carcinoma phenotype. Early neoplasias, which are often found concurrently with carcinomas and are histologically distinguishable from normal breast tissue, are less advanced in phenotype than carcinomas and are thought to represent precursor stages. To elucidate their role in cancer evolution we performed comparative whole-genome sequencing of early neoplasias, matched normal tissue, and carcinomas from six patients, for a total of 31 samples. By using somatic mutations as lineage markers we built trees that relate the tissue samples within each patient. On the basis of these lineage trees we inferred the order, timing, and rates of genomic events. In four out of six cases, an early neoplasia and the carcinoma share a mutated common ancestor with recurring aneuploidies, and in all six cases evolution accelerated in the carcinoma lineage. Transition spectra of somatic mutations are stable and consistent across cases, suggesting that accumulation of somatic mutations is a result of increased ancestral cell division rather than specific mutational mechanisms. In contrast to highly advanced tumors that are the focus of much of the current cancer genome sequencing, neither the early neoplasia genomes nor the carcinomas are enriched with potentially functional somatic point mutations. Aneuploidies that occur in common ancestors of neoplastic and tumor cells are the earliest events that affect a large number of genes and may predispose breast tissue to eventual development of invasive carcinoma. PMID:23568837

Newburger, Daniel E; Kashef-Haghighi, Dorna; Weng, Ziming; Salari, Raheleh; Sweeney, Robert T; Brunner, Alayne L; Zhu, Shirley X; Guo, Xiangqian; Varma, Sushama; Troxell, Megan L; West, Robert B; Batzoglou, Serafim; Sidow, Arend

2013-04-08

95

Intraspecies sequence comparisons for annotating genomes.  

PubMed

Analysis of sequence variation among members of a single species offers a potential approach to identify functional DNA elements responsible for biological features unique to that species. Due to its high rate of allelic polymorphism and ease of genetic manipulability, we chose the sea squirt, Ciona intestinalis, to explore intraspecies sequence comparisons for genome annotation. A large number of C. intestinalis specimens were collected from four continents, and a set of genomic intervals were amplified, resequenced, and analyzed to determine the mutation rates at each nucleotide in the sequence. We found that regions with low mutation rates efficiently demarcated functionally constrained sequences: these include a set of noncoding elements, which we showed in C. intestinalis transgenic assays to act as tissue-specific enhancers, as well as the location of coding sequences. This illustrates that comparisons of multiple members of a species can be used for genome annotation, suggesting a path for the annotation of the sequenced genomes of organisms occupying uncharacterized phylogenetic branches of the animal kingdom. It also raises the possibility that the resequencing of a large number of Homo sapiens individuals might be used to annotate the human genome and identify sequences defining traits unique to our species. PMID:15545499

Boffelli, Dario; Weer, Claire V; Weng, Li; Lewis, Keith D; Shoukry, Malak I; Pachter, Lior; Keys, David N; Rubin, Edward M

2004-11-15
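
The idea of locating constrained sequence from within-species variation can be sketched as follows. This is an illustrative simplification, not the authors' analysis: it assumes pre-aligned, equal-length sequences, uses the fraction of non-majority bases per column as a crude stand-in for a mutation rate, and the names `per_site_variation`, `constrained_windows`, `window`, and `threshold` are hypothetical.

```python
from collections import Counter

def per_site_variation(aligned):
    """Fraction of sequences differing from the majority base at each column."""
    n = len(aligned)
    rates = []
    for column in zip(*aligned):
        majority_count = Counter(column).most_common(1)[0][1]
        rates.append((n - majority_count) / n)
    return rates

def constrained_windows(rates, window, threshold):
    """Start indices of windows whose mean variation is at most `threshold`."""
    return [i for i in range(len(rates) - window + 1)
            if sum(rates[i:i + window]) / window <= threshold]
```

Windows returned by `constrained_windows` would be candidates for functional constraint under these assumptions; the study itself validated such regions experimentally in transgenic assays.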

96

Complete genome sequence of arracacha mottle virus.  

PubMed

Arracacha mottle virus (AMoV) is the only potyvirus reported to infect arracacha (Arracacia xanthorrhiza) in Brazil. Here, the complete genome sequence of an isolate of AMoV was determined to be 9,630 nucleotides in length, excluding the 3' poly-A tail, and encoding a polyprotein of 3,135 amino acids and a putative P3N-PIPO protein. Its genomic organization is typical of a member of the genus Potyvirus, containing all conserved motifs. Its full genome sequence shared 56.2% nucleotide identity with sunflower chlorotic mottle virus and verbena virus Y, the most closely related viruses. PMID:23001696

Orílio, Anelise F; Lucinda, Natalia; Dusi, André N; Nagata, Tatsuya; Inoue-Nagata, Alice K

2012-09-22

97

Sorghum Genome Sequencing by Methylation Filtration  

Microsoft Academic Search

Sorghum bicolor is a close relative of maize and is a staple crop in Africa and much of the developing world because of its superior tolerance of arid growth conditions. We have generated sequence from the hypomethylated portion of the sorghum genome by applying methylation filtration (MF) technology. The evidence suggests that 96% of the genes have been sequence tagged,

Joseph A Bedell; Muhammad A Budiman; Andrew Nunberg; Robert W Citek; Dan Robbins; Joshua Jones; Elizabeth Flick; Theresa Rohlfing; Jason Fries; Kourtney Bradford; Jennifer McMenamy; Michael Smith; Heather Holeman; Bruce A Roe; Graham Wiley; Ian F Korf; Pablo D Rabinowicz; Nathan Lakey; W. Richard McCombie; Jeffrey A Jeddeloh; Robert A Martienssen

2005-01-01

98

Computational Genomics: From Genome Sequence To Global Gene Regulation  

NASA Astrophysics Data System (ADS)

As various genome projects are shifting to the post-sequencing phase, it becomes a big challenge to analyze the sequence data and extract biological information using computational tools. In the past, computational genomics has mainly focused on finding new genes and mapping out their biological functions. With the rapid accumulation of experimental data on genome-wide gene activities, it is now possible to understand how genes are regulated on a genomic scale. A major mechanism for gene regulation is to control the level of transcription, which is achieved by regulatory proteins that bind to short DNA sequences - the regulatory elements. We have developed a new approach to identifying regulatory elements in genomes. The approach formalizes how one would proceed to decipher a "text" consisting of a long string of letters written in an unknown language that did not delineate words. The algorithm is based on a statistical mechanics model in which the sequence is segmented probabilistically into "words" and a "dictionary" of "words" is built concurrently. For the control regions in the yeast genome, we built a "dictionary" of about one thousand words which includes many known as well as putative regulatory elements. I will discuss how we can use this dictionary to search for genes that are likely to be regulated in a similar fashion and to analyze gene expression data generated from DNA micro-array experiments.

Li, Hao

2000-03-01
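
The segmentation half of the "dictionary" idea can be illustrated with a small sketch. It covers only one piece of what the abstract describes: it assumes the dictionary and its word probabilities are already given (the abstract's method builds them concurrently) and finds the single most probable split by dynamic programming rather than summing over all segmentations. The function `segment`, the `word_probs` mapping, and `max_len` are hypothetical names.

```python
import math

def segment(text, word_probs, max_len=8):
    """Most probable split of `text` into dictionary words (Viterbi-style)."""
    n = len(text)
    best = [-math.inf] * (n + 1)   # best log-probability of text[:i]
    back = [0] * (n + 1)           # start index of the last word in that split
    best[0] = 0.0
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            word = text[start:end]
            if word in word_probs and best[start] > -math.inf:
                score = best[start] + math.log(word_probs[word])
                if score > best[end]:
                    best[end], back[end] = score, start
    if best[n] == -math.inf:
        return None  # no segmentation exists with this dictionary
    words, end = [], n
    while end > 0:
        words.append(text[back[end]:end])
        end = back[end]
    return words[::-1]
```

With single letters and a few longer motifs in `word_probs`, a motif is chosen over its letter-by-letter spelling whenever its probability exceeds the product of the single-letter probabilities.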

99

Pervasive sequence patents cover the entire human genome.  

PubMed

The scope and eligibility of patents for genetic sequences have been debated for decades, but a critical case regarding gene patents (Association of Molecular Pathologists v. Myriad Genetics) is now reaching the US Supreme Court. Recent court rulings have supported the assertion that such patents can provide intellectual property rights on sequences as small as 15 nucleotides (15mers), but an analysis of all current US patent claims and the human genome presented here shows that 15mer sequences from all human genes match at least one other gene. The average gene matches 364 other genes as 15mers; the breast-cancer-associated gene BRCA1 has 15mers matching at least 689 other genes. Longer sequences (1,000 bp) still showed extensive cross-gene matches. Furthermore, 15mer-length claims from bovine and other animal patents could also claim as much as 84% of the genes in the human genome. In addition, when we expanded our analysis to full-length patent claims on DNA from all US patents to date, we found that 41% of the genes in the human genome have been claimed. Thus, current patents for both short and long nucleotide sequences are extraordinarily non-specific and create an uncertain, problematic liability for genomic medicine, especially in regard to targeted re-sequencing and other sequence diagnostic assays. PMID:23522065

Rosenfeld, Jeffrey; Mason, Christopher E

2013-03-25
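
The kind of cross-gene 15mer matching described above can be sketched with a k-mer index. This is a simplified illustration, not the authors' pipeline: it assumes exact matches on one strand only, and the names `kmers` and `genes_sharing_kmers` and the input format (a mapping from gene name to sequence) are hypothetical.

```python
def kmers(seq, k=15):
    """Set of all length-k substrings of `seq`."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def genes_sharing_kmers(genes, k=15):
    """Map each gene name to the set of other genes sharing at least one k-mer."""
    index = {}
    for name, seq in genes.items():
        for kmer in kmers(seq, k):
            index.setdefault(kmer, set()).add(name)
    shared = {name: set() for name in genes}
    for names in index.values():
        for name in names:
            shared[name] |= names - {name}
    return shared
```

The size of `shared[name]` is the per-gene count of matching genes, analogous in spirit to the figures quoted in the abstract; a fuller analysis would presumably also consider the reverse-complement strand.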

100

Transcriptional consequences of genomic structural aberrations in breast cancer  

Microsoft Academic Search

Using a long-span, paired-end deep sequencing strategy, we have comprehensively identified cancer genome rearrangements in eight breast cancer genomes. Herein, we show that 40%-54% of these structural genomic rearrangements result in different forms of fusion transcripts and that 44% are potentially translated. We find that single segmental tandem duplication spanning several genes is a major source of the fusion gene

Koichiro Inaki; Axel M Hillmer; Leena Ukil; Fei Yao; Xing Yi Woo; Leah A Vardy; Kelson Folkvard Braaten Zawack; Charlie Wah Heng Lee; Pramila Nuwantha Ariyaratne; Yang Sun Chan; Kartiki Vasant Desai; Jonas Bergh; Per Hall; Thomas Choudary Putti; Wai Loon Ong; Atif Shahab; Valere Cacheux-Rataboul; Radha Krishna Murthy Karuturi; Wing-Kin Sung; Xiaoan Ruan; Guillaume Bourque; Yijun Ruan; Edison T Liu

2011-01-01

101

Latent Periodicities in Genome Sequences  

Microsoft Academic Search

A novel approach is presented for the detection of periodicities in DNA sequences. A DNA sequence can be modelled as a nonstationary stochastic process that exhibits various statistical periodicities over different regions. The coding part of the DNA, for instance, exhibits statistical periodicity with period three. Such regions in DNA are modelled as generated from a collection of information sources

Raman Arora; William A. Sethares; James A. Bucklew

2008-01-01
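
The period-three signal mentioned in the abstract can be illustrated with a classical spectral measure. This is a swapped-in, simpler technique, not the authors' information-source model: it sums the squared Fourier magnitude of each base's indicator sequence at frequency 1/period. The function name `periodicity_power` and the normalization by sequence length are assumptions.

```python
import cmath

def periodicity_power(seq, period):
    """Sum over bases of |DFT|^2 of the base-indicator sequence at
    frequency 1/period, divided by the sequence length."""
    n = len(seq)
    total = 0.0
    for base in "ACGT":
        coeff = sum(cmath.exp(-2j * cmath.pi * i / period)
                    for i, b in enumerate(seq) if b == base)
        total += abs(coeff) ** 2
    return total / n
```

Applied in a sliding window, a measure like this tends to be larger at period 3 in coding regions than in non-coding ones, which is the statistical periodicity the abstract refers to.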

102

International Rice Genome Sequencing Project: the effort to completely sequence the rice genome  

Microsoft Academic Search

The International Rice Genome Sequencing Project (IRGSP) involves researchers from ten countries who are working to completely and accurately sequence the rice genome within a short period. Sequencing uses a map-based clone-by-clone shotgun strategy; shared bacterial artificial chromosome/P1-derived artificial chromosome libraries have been constructed from Oryza sativa ssp. japonica variety ‘Nipponbare’. End-sequencing, fingerprinting and marker-aided PCR screening are being

Takuji Sasaki; Benjamin Burr

2000-01-01

103

DNA sequencing of a cytogenetically normal acute myeloid leukemia genome  

PubMed Central

Lay Summary Acute myeloid leukemia is a highly malignant hematopoietic tumor that affects about 13,000 adults yearly in the United States. The treatment of this disease has changed little in the past two decades, since most of the genetic events that initiate the disease remain undiscovered. Whole genome sequencing is now possible at a reasonable cost and timeframe to utilize this approach for unbiased discovery of tumor-specific somatic mutations that alter the protein-coding genes. Here we show the results obtained by sequencing a typical acute myeloid leukemia genome and its matched normal counterpart, obtained from the patient’s skin. We discovered 10 genes with acquired mutations; two were previously described mutations thought to contribute to tumor progression, and 8 were novel mutations present in virtually all tumor cells at presentation and relapse, whose function is not yet known. Our study establishes whole genome sequencing as an unbiased method for discovering initiating mutations in cancer genomes, and for identifying novel genes that may respond to targeted therapies. We used massively parallel sequencing technology to sequence the genomic DNA of tumor and normal skin cells obtained from a patient with a typical presentation of FAB M1 Acute Myeloid Leukemia (AML) with normal cytogenetics. 32.7-fold ‘haploid’ coverage (98 billion bases) was obtained for the tumor genome, and 13.9-fold coverage (41.8 billion bases) was obtained for the normal sample. Of 2,647,695 well-supported Single Nucleotide Variants (SNVs) found in the tumor genome, 2,588,486 (97.7%) also were detected in the patient’s skin genome, limiting the number of variants that required further study. For the purposes of this initial study, we restricted our downstream analysis to the coding sequences of annotated genes: we found only eight heterozygous, non-synonymous somatic SNVs in the entire genome. 
All were novel, including mutations in protocadherin/cadherin family members (CDH24 and PCLKC), G-protein coupled receptors (GPR123 and EBI2), a protein phosphatase (PTPRT), a potential guanine nucleotide exchange factor (KNDC1), a peptide/drug transporter (SLC15A1), and a glutamate receptor gene (GRINL1B). We also detected previously described, recurrent somatic insertions in the FLT3 and NPM1 genes. Based on deep readcount data, we determined that all of these mutations (except FLT3) were present in nearly all tumor cells at presentation, and again at relapse 11 months later, suggesting that the patient had a single dominant clone containing all of the mutations. These results demonstrate the power of whole genome sequencing to discover novel cancer-associated mutations.

Ley, Timothy J; Mardis, Elaine R; Ding, Li; Fulton, Bob; McLellan, Michael D; Chen, Ken; Dooling, David; Dunford-Shore, Brian H; McGrath, Sean; Hickenbotham, Matthew; Cook, Lisa; Abbott, Rachel; Larson, David E; Koboldt, Dan C; Pohl, Craig; Smith, Scott; Hawkins, Amy; Abbott, Scott; Locke, Devin; Hillier, LaDeana W; Miner, Tracie; Fulton, Lucinda; Magrini, Vincent; Wylie, Todd; Glasscock, Jarret; Conyers, Joshua; Sander, Nathan; Shi, Xiaoqi; Osborne, John R; Minx, Patrick; Gordon, David; Chinwalla, Asif; Zhao, Yu; Ries, Rhonda E; Payton, Jacqueline E; Westervelt, Peter; Tomasson, Michael H; Watson, Mark; Baty, Jack; Ivanovich, Jennifer; Heath, Sharon; Shannon, William D; Nagarajan, Rakesh; Walter, Matthew J; Link, Daniel C; Graubert, Timothy A; DiPersio, John F; Wilson, Richard K

2008-01-01
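
The figures quoted in the AML abstract can be cross-checked with a few lines of arithmetic. The variable names are illustrative; the input numbers are taken directly from the abstract, and the genome-size values are only what the stated coverage and base counts imply.

```python
# Numbers quoted in the abstract
tumor_snvs = 2_647_695        # well-supported SNVs in the tumor genome
shared_with_skin = 2_588_486  # of those, also detected in the skin genome

# Variants left for further study after removing those shared with skin
tumor_only = tumor_snvs - shared_with_skin

# Fraction of tumor SNVs also seen in skin (the abstract reports 97.7%)
shared_fraction = shared_with_skin / tumor_snvs

# Haploid genome size implied by bases sequenced / fold coverage, in Gb
tumor_genome_gb = 98 / 32.7
normal_genome_gb = 41.8 / 13.9

print(tumor_only, round(shared_fraction, 4))
print(round(tumor_genome_gb, 2), round(normal_genome_gb, 2))
```

Both coverage figures imply a haploid genome of roughly 3 Gb, and the subtraction leaves 59,209 tumor-only SNVs, of which the abstract reports eight as heterozygous, non-synonymous somatic SNVs in coding sequence.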

104

Genome Sequencing, Assembly and Gene Prediction in Fungi  

Microsoft Academic Search

Genome sequencing and the science of genomics is now being applied to the study of fungi. Although resources have been slow in coming, a number of fungi are now being sequenced and an increasingly diverse array of these organisms are being considered as candidates for whole genome sequencing. Currently there are only two complete fungal genome sequences available, those of

Brendan Loftus

2003-01-01

105

Finishing the euchromatic sequence of the human genome  

Microsoft Academic Search

The sequence of the human genome encodes the genetic instructions for human physiology, as well as rich information about human evolution. In 2001, the International Human Genome Sequencing Consortium reported a draft sequence of the euchromatic portion of the human genome. Since then, the international collaboration has worked to convert this draft into a genome sequence with high accuracy and

2004-01-01

106

Genome Sequence of Yersinia pestis KIM†  

PubMed Central

We present the complete genome sequence of Yersinia pestis KIM, the etiologic agent of bubonic and pneumonic plague. The strain KIM, biovar Mediaevalis, is associated with the second pandemic, including the Black Death. The 4.6-Mb genome encodes 4,198 open reading frames (ORFs). The origin, terminus, and most genes encoding DNA replication proteins are similar to those of Escherichia coli K-12. The KIM genome sequence was compared with that of Y. pestis CO92, biovar Orientalis, revealing homologous sequences but a remarkable amount of genome rearrangement for strains so closely related. The differences appear to result from multiple inversions of genome segments at insertion sequences, in a manner consistent with present knowledge of replication and recombination. There are few differences attributable to horizontal transfer. The KIM and E. coli K-12 genome proteins were also compared, exposing surprising amounts of locally colinear “backbone,” or synteny, that is not discernible at the nucleotide level. Nearly 54% of KIM ORFs are significantly similar to K-12 proteins, with conserved housekeeping functions. However, a number of E. coli pathways and transport systems and at least one global regulator were not found, reflecting differences in lifestyle between them. In KIM-specific islands, new genes encode candidate pathogenicity proteins, including iron transport systems, putative adhesins, toxins, and fimbriae.

Deng, Wen; Burland, Valerie; Plunkett III, Guy; Boutin, Adam; Mayhew, George F.; Liss, Paul; Perna, Nicole T.; Rose, Debra J.; Mau, Bob; Zhou, Shiguo; Schwartz, David C.; Fetherston, Jaqueline D.; Lindler, Luther E.; Brubaker, Robert R.; Plano, Gregory V.; Straley, Susan C.; McDonough, Kathleen A.; Nilles, Matthew L.; Matson, Jyl S.; Blattner, Frederick R.; Perry, Robert D.

2002-01-01

107

Using comparative genomics to reorder the human genome sequence into a virtual sheep genome  

Microsoft Academic Search

BACKGROUND: Is it possible to construct an accurate and detailed subgene-level map of a genome using bacterial artificial chromosome (BAC) end sequences, a sparse marker map, and the sequences of other genomes? RESULTS: A sheep BAC library, CHORI-243, was constructed and the BAC end sequences were determined and mapped with high sensitivity and low specificity onto the frameworks of the

Brian P Dalrymple; Ewen F Kirkness; Mikhail Nefedov; Sean McWilliam; Abhirami Ratnakumar; Wes Barris; Shaying Zhao; Jyoti Shetty; Jillian F Maddox; Margaret O'Grady; Frank Nicholas; Allan M Crawford; Tim Smith; Pieter J de Jong; John McEwan; V Hutton Oddy; Noelle E Cockett

2007-01-01

108

Genetic variation in the genome-wide predicted estrogen response element-related sequences is associated with breast cancer development  

Microsoft Academic Search

Introduction: Estrogen forms a complex with the estrogen receptor (ER) that binds to estrogen response elements (EREs) in the promoter region of estrogen-responsive genes, regulates their transcription, and consequently mediates physiological or tumorigenic effects. Thus, sequence variants in EREs have the potential to affect the estrogen-ER-ERE interaction. In this study, we examined the hypothesis that genetic variations of EREs are associated

Jyh-Cherng Yu; Chia-Ni Hsiung; Huan-Ming Hsu; Bo-Ying Bao; Shou-Tung Chen; Giu-Cheng Hsu; Wen-Cheng Chou; Ling-Yueh Hu; Shian-Ling Ding; Chun-Wen Cheng; Pei-Ei Wu; Chen-Yang Shen

2011-01-01

109

Accelerating Genome Sequencing 100X with FPGAs  

SciTech Connect

The performance of two Cray XD1 systems with Virtex-II Pro 50 and Virtex-4 LX160 FPGAs was evaluated using the FASTA computational biology program for human genome (DNA and protein) sequence comparisons. FPGA speedups of 50X (Virtex-II Pro 50) and 100X (Virtex-4 LX160) over a 2.2 GHz Opteron were obtained. FPGA coding issues for human genome data are described.

Storaasli, Olaf O [ORNL]; Strenski, Dave [Cray, Inc.]

2007-01-01

110

The genome sequence of Schizosaccharomyces pombe.  

PubMed

We have sequenced and annotated the genome of fission yeast (Schizosaccharomyces pombe), which contains the smallest number of protein-coding genes yet recorded for a eukaryote: 4,824. The centromeres are between 35 and 110 kilobases (kb) and contain related repeats including a highly conserved 1.8-kb element. Regions upstream of genes are longer than in budding yeast (Saccharomyces cerevisiae), possibly reflecting more-extended control regions. Some 43% of the genes contain introns, of which there are 4,730. Fifty genes have significant similarity with human disease genes; half of these are cancer related. We identify highly conserved genes important for eukaryotic cell organization including those required for the cytoskeleton, compartmentation, cell-cycle control, proteolysis, protein phosphorylation and RNA splicing. These genes may have originated with the appearance of eukaryotic life. Few similarly conserved genes that are important for multicellular organization were identified, suggesting that the transition from prokaryotes to eukaryotes required more new genes than did the transition from unicellular to multicellular organization. PMID:11859360

Wood, V; Gwilliam, R; Rajandream, M-A; Lyne, M; Lyne, R; Stewart, A; Sgouros, J; Peat, N; Hayles, J; Baker, S; Basham, D; Bowman, S; Brooks, K; Brown, D; Brown, S; Chillingworth, T; Churcher, C; Collins, M; Connor, R; Cronin, A; Davis, P; Feltwell, T; Fraser, A; Gentles, S; Goble, A; Hamlin, N; Harris, D; Hidalgo, J; Hodgson, G; Holroyd, S; Hornsby, T; Howarth, S; Huckle, E J; Hunt, S; Jagels, K; James, K; Jones, L; Jones, M; Leather, S; McDonald, S; McLean, J; Mooney, P; Moule, S; Mungall, K; Murphy, L; Niblett, D; Odell, C; Oliver, K; O'Neil, S; Pearson, D; Quail, M A; Rabbinowitsch, E; Rutherford, K; Rutter, S; Saunders, D; Seeger, K; Sharp, S; Skelton, J; Simmonds, M; Squares, R; Squares, S; Stevens, K; Taylor, K; Taylor, R G; Tivey, A; Walsh, S; Warren, T; Whitehead, S; Woodward, J; Volckaert, G; Aert, R; Robben, J; Grymonprez, B; Weltjens, I; Vanstreels, E; Rieger, M; Schäfer, M; Müller-Auer, S; Gabel, C; Fuchs, M; Düsterhöft, A; Fritzc, C; Holzer, E; Moestl, D; Hilbert, H; Borzym, K; Langer, I; Beck, A; Lehrach, H; Reinhardt, R; Pohl, T M; Eger, P; Zimmermann, W; Wedler, H; Wambutt, R; Purnelle, B; Goffeau, A; Cadieu, E; Dréano, S; Gloux, S; Lelaure, V; Mottier, S; Galibert, F; Aves, S J; Xiang, Z; Hunt, C; Moore, K; Hurst, S M; Lucas, M; Rochet, M; Gaillardin, C; Tallada, V A; Garzon, A; Thode, G; Daga, R R; Cruzado, L; Jimenez, J; Sánchez, M; del Rey, F; Benito, J; Domínguez, A; Revuelta, J L; Moreno, S; Armstrong, J; Forsburg, S L; Cerutti, L; Lowe, T; McCombie, W R; Paulsen, I; Potashkin, J; Shpakovski, G V; Ussery, D; Barrell, B G; Nurse, P; Cerrutti, L

2002-02-21

111

Genome sequence of Haemophilus parasuis strain 29755.  

PubMed

Haemophilus parasuis is a member of the family Pasteurellaceae and is the etiologic agent of Glässer's disease in pigs, a systemic syndrome associated with only a subset of isolates. The genetic basis for virulence and systemic spread of particular H. parasuis isolates is currently unknown. Strain 29755 is an invasive isolate that has long been used in the study of Glässer's disease. Accordingly, the genome sequence of strain 29755 is of considerable importance to investigators endeavoring to understand the molecular pathogenesis of H. parasuis. Here we describe the features of the 2,224,137 bp draft genome sequence of strain 29755 generated from 454-FLX pyrosequencing. These data comprise the first publicly available genome sequence for this bacterium. PMID:22180811

Mullins, Michael A; Register, Karen B; Bayles, Darrell O; Dyer, David W; Kuehn, Joanna S; Phillips, Gregory J

2011-09-23

112

Genome sequence of Haemophilus parasuis strain 29755  

PubMed Central

Haemophilus parasuis is a member of the family Pasteurellaceae and is the etiologic agent of Glässer’s disease in pigs, a systemic syndrome associated with only a subset of isolates. The genetic basis for virulence and systemic spread of particular H. parasuis isolates is currently unknown. Strain 29755 is an invasive isolate that has long been used in the study of Glässer’s disease. Accordingly, the genome sequence of strain 29755 is of considerable importance to investigators endeavoring to understand the molecular pathogenesis of H. parasuis. Here we describe the features of the 2,224,137 bp draft genome sequence of strain 29755 generated from 454-FLX pyrosequencing. These data comprise the first publicly available genome sequence for this bacterium.

Mullins, Michael A.; Bayles, Darrell O.; Dyer, David W.; Kuehn, Joanna S.; Phillips, Gregory J.

2011-01-01

113

Sequencing and analysis of a genomic fragment provide an insight into the Dunaliella viridis genomic sequence.  

PubMed

Dunaliella is a genus of wall-less unicellular eukaryotic green algae. Its exceptional resistance to salt and various other stresses has made it an ideal model for stress tolerance studies. However, very little is known about its genome and genomic sequences. In this study, we sequenced and analyzed a 29,268 bp genomic fragment from Dunaliella viridis. The fragment showed low sequence homology to the GenBank database. At the nucleotide level, only a segment with significant sequence homology to 18S rRNA was found. The fragment contained six putative genes, but only one gene showed significant homology at the protein level to the GenBank database. The average GC content of this sequence was 51.1%, which was much lower than that of the closely related green alga Chlamydomonas (65.7%). Significant segmental duplications were found within this fragment. The duplicated sequences accounted for about 35.7% of the entire region. Large amounts of simple sequence repeats (microsatellites) were found, with a strong bias towards the (AC)(n) type (76%). Analysis of other Dunaliella genomic sequences in the GenBank database (total 25,749 bp) was in agreement with these findings. These sequence features made it difficult to sequence Dunaliella genomic sequences. Further investigation should be made to reveal the biological significance of these unique sequence features. PMID:17091199

Sun, Xiao-Ming; Tang, Yuan-Ping; Meng, Xiang-Zong; Zhang, Wen-Wen; Li, Shan; Deng, Zhi-Rui; Xu, Zheng-Kai; Song, Ren-Tao

2006-11-01

114

Scoring Pairwise Genomic Sequence Alignments  

Microsoft Academic Search

Introduction: Most sequence alignment programs employ an explicit scheme for assigning a score to every possible alignment. This provides the criterion to prefer one alignment over another. Alignment scores typically involve a score for each possible aligned pair of symbols, together with a penalty for each gap in the alignment. For protein alignments, the scores for all possible aligned pairs constitute a 20-by-20 substitution matrix. Amino

F. Chiaromonte; V. B. Yap; W. Miller

2002-01-01
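The scoring scheme described in the abstract above (a score for each aligned pair of symbols plus a penalty for each gap) can be illustrated with a minimal sketch. The function name `score_alignment` and the match, mismatch, and gap values are illustrative assumptions, not the parameters used by the authors, and every gap column is penalized equally here (a simple linear gap model). For proteins, the match/mismatch rule would be replaced by a lookup in a 20-by-20 substitution matrix.

```python
def score_alignment(row_a, row_b, match=1, mismatch=-1, gap=-2):
    """Sum per-column scores for two equal-length aligned rows ('-' marks a gap).

    The default scores are hypothetical values chosen for illustration only.
    """
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    total = 0
    for x, y in zip(row_a, row_b):
        if x == "-" or y == "-":
            total += gap        # one penalty per gap column (assumed linear model)
        elif x == y:
            total += match      # identical aligned pair
        else:
            total += mismatch   # substituted aligned pair
    return total


# Compare two candidate alignments of the same pair of sequences.
print(score_alignment("ACGT-A", "ACCTGA"))
print(score_alignment("ACGTA-", "ACCTGA"))
```

The alignment with the higher total is the one such a scheme would prefer.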

115

Cancer genomics: technology, discovery, and translation.  

PubMed

In recent years, the increasing awareness that somatic mutations and other genetic aberrations drive human malignancies has led us within reach of personalized cancer medicine (PCM). The implementation of PCM is based on the following premises: genetic aberrations exist in human malignancies; a subset of these aberrations drive oncogenesis and tumor biology; these aberrations are actionable (defined as having the potential to affect management recommendations based on diagnostic, prognostic, and/or predictive implications); and there are highly specific anticancer agents available that effectively modulate these targets. This article highlights the technology underlying cancer genomics and examines the early results of genome sequencing and the challenges met in the discovery of new genetic aberrations. Finally, drawing from experiences gained in a feasibility study of somatic mutation genotyping and targeted exome sequencing led by Princess Margaret Hospital-University Health Network and the Ontario Institute for Cancer Research, the processes, challenges, and issues involved in the translation of cancer genomics to the clinic are discussed. PMID:22271477

Tran, Ben; Dancey, Janet E; Kamel-Reid, Suzanne; McPherson, John D; Bedard, Philippe L; Brown, Andrew M K; Zhang, Tong; Shaw, Patricia; Onetto, Nicole; Stein, Lincoln; Hudson, Thomas J; Neel, Benjamin G; Siu, Lillian L

2012-01-23

116

Melanoma genome sequencing reveals frequent PREX2 mutations.  

PubMed

Melanoma is notable for its metastatic propensity, lethality in the advanced setting and association with ultraviolet exposure early in life. To obtain a comprehensive genomic view of melanoma in humans, we sequenced the genomes of 25 metastatic melanomas and matched germline DNA. A wide range of point mutation rates was observed: lowest in melanomas whose primaries arose on non-ultraviolet-exposed hairless skin of the extremities (3 and 14 per megabase (Mb) of genome), intermediate in those originating from hair-bearing skin of the trunk (5-55 per Mb), and highest in a patient with a documented history of chronic sun exposure (111 per Mb). Analysis of whole-genome sequence data identified PREX2 (phosphatidylinositol-3,4,5-trisphosphate-dependent Rac exchange factor 2)--a PTEN-interacting protein and negative regulator of PTEN in breast cancer--as a significantly mutated gene with a mutation frequency of approximately 14% in an independent extension cohort of 107 human melanomas. PREX2 mutations are biologically relevant, as ectopic expression of mutant PREX2 accelerated tumour formation of immortalized human melanocytes in vivo. Thus, whole-genome sequencing of human melanoma tumours revealed genomic evidence of ultraviolet pathogenesis and discovered a new recurrently mutated gene in melanoma. PMID:22622578

Berger, Michael F; Hodis, Eran; Heffernan, Timothy P; Deribe, Yonathan Lissanu; Lawrence, Michael S; Protopopov, Alexei; Ivanova, Elena; Watson, Ian R; Nickerson, Elizabeth; Ghosh, Papia; Zhang, Hailei; Zeid, Rhamy; Ren, Xiaojia; Cibulskis, Kristian; Sivachenko, Andrey Y; Wagle, Nikhil; Sucker, Antje; Sougnez, Carrie; Onofrio, Robert; Ambrogio, Lauren; Auclair, Daniel; Fennell, Timothy; Carter, Scott L; Drier, Yotam; Stojanov, Petar; Singer, Meredith A; Voet, Douglas; Jing, Rui; Saksena, Gordon; Barretina, Jordi; Ramos, Alex H; Pugh, Trevor J; Stransky, Nicolas; Parkin, Melissa; Winckler, Wendy; Mahan, Scott; Ardlie, Kristin; Baldwin, Jennifer; Wargo, Jennifer; Schadendorf, Dirk; Meyerson, Matthew; Gabriel, Stacey B; Golub, Todd R; Wagner, Stephan N; Lander, Eric S; Getz, Gad; Chin, Lynda; Garraway, Levi A

2012-05-09

117

Complete Genome Sequence of Ikoma Lyssavirus  

PubMed Central

Lyssaviruses (family Rhabdoviridae) constitute one of the most important groups of viral zoonoses globally. All lyssaviruses cause the disease rabies, an acute progressive encephalitis for which, once symptoms occur, there is no effective cure. Currently available vaccines are highly protective against the predominantly circulating lyssavirus species. Using next-generation sequencing technologies, we have obtained the whole-genome sequence for a novel lyssavirus, Ikoma lyssavirus (IKOV), isolated from an African civet in Tanzania displaying clinical signs of rabies. Genetically, this virus is the most divergent within the genus Lyssavirus. Characterization of the genome will help to improve our understanding of lyssavirus diversity and enable investigation into vaccine-induced immunity and protection.

Marston, Denise A.; Ellis, Richard J.; Horton, Daniel L.; Kuzmin, Ivan V.; Wise, Emma L.; McElhinney, Lorraine M.; Banyard, Ashley C.; Ngeleja, Chanasa; Keyyu, Julius; Cleaveland, Sarah; Lembo, Tiziana; Rupprecht, Charles E.

2012-01-01

118

Chapter 14: Cancer Genome Analysis  

PubMed Central

Although there is great promise in the benefits to be obtained by analyzing cancer genomes, numerous challenges hinder different stages of the process, from the problem of sample preparation and the validation of the experimental techniques, to the interpretation of the results. This chapter specifically focuses on the technical issues associated with the bioinformatics analysis of cancer genome data. The main issues addressed are the use of database and software resources, the use of analysis workflows and the presentation of clinically relevant action items. We attempt to aid new developers in the field by describing the different stages of analysis and discussing current approaches, as well as by providing practical advice on how to access and use resources, and how to implement recommendations. Real cases from cancer genome projects are used as examples.

Vazquez, Miguel; de la Torre, Victor; Valencia, Alfonso

2012-01-01

119

Initial genome sequencing and analysis of multiple myeloma.  

PubMed

Multiple myeloma is an incurable malignancy of plasma cells, and its pathogenesis is poorly understood. Here we report the massively parallel sequencing of 38 tumour genomes and their comparison to matched normal DNAs. Several new and unexpected oncogenic mechanisms were suggested by the pattern of somatic mutation across the data set. These include the mutation of genes involved in protein translation (seen in nearly half of the patients), genes involved in histone methylation, and genes involved in blood coagulation. In addition, a broader than anticipated role of NF-κB signalling was indicated by mutations in 11 members of the NF-κB pathway. Of potential immediate clinical relevance, activating mutations of the kinase BRAF were observed in 4% of patients, suggesting the evaluation of BRAF inhibitors in multiple myeloma clinical trials. These results indicate that cancer genome sequencing of large collections of samples will yield new insights into cancer not anticipated by existing knowledge. PMID:21430775

Chapman, Michael A; Lawrence, Michael S; Keats, Jonathan J; Cibulskis, Kristian; Sougnez, Carrie; Schinzel, Anna C; Harview, Christina L; Brunet, Jean-Philippe; Ahmann, Gregory J; Adli, Mazhar; Anderson, Kenneth C; Ardlie, Kristin G; Auclair, Daniel; Baker, Angela; Bergsagel, P Leif; Bernstein, Bradley E; Drier, Yotam; Fonseca, Rafael; Gabriel, Stacey B; Hofmeister, Craig C; Jagannath, Sundar; Jakubowiak, Andrzej J; Krishnan, Amrita; Levy, Joan; Liefeld, Ted; Lonial, Sagar; Mahan, Scott; Mfuko, Bunmi; Monti, Stefano; Perkins, Louise M; Onofrio, Robb; Pugh, Trevor J; Rajkumar, S Vincent; Ramos, Alex H; Siegel, David S; Sivachenko, Andrey; Stewart, A Keith; Trudel, Suzanne; Vij, Ravi; Voet, Douglas; Winckler, Wendy; Zimmerman, Todd; Carpten, John; Trent, Jeff; Hahn, William C; Garraway, Levi A; Meyerson, Matthew; Lander, Eric S; Getz, Gad; Golub, Todd R

2011-03-24

120

Complete genome sequence of Caulobacter crescentus.  

PubMed

The complete genome sequence of Caulobacter crescentus was determined to be 4,016,942 base pairs in a single circular chromosome encoding 3,767 genes. This organism, which grows in a dilute aquatic environment, coordinates the cell division cycle and multiple cell differentiation events. With the annotated genome sequence, a full description of the genetic network that controls bacterial differentiation, cell growth, and cell cycle progression is within reach. Two-component signal transduction proteins are known to play a significant role in cell cycle progression. Genome analysis revealed that the C. crescentus genome encodes a significantly higher number of these signaling proteins (105) than any bacterial genome sequenced thus far. Another regulatory mechanism involved in cell cycle progression is DNA methylation. The occurrence of the recognition sequence for an essential DNA methylating enzyme that is required for cell cycle regulation is severely limited and shows a bias to intergenic regions. The genome contains multiple clusters of genes encoding proteins essential for survival in a nutrient poor habitat. Included are those involved in chemotaxis, outer membrane channel function, degradation of aromatic ring compounds, and the breakdown of plant-derived carbon sources, in addition to many extracytoplasmic function sigma factors, providing the organism with the ability to respond to a wide range of environmental fluctuations. C. crescentus is, to our knowledge, the first free-living alpha-class proteobacterium to be sequenced and will serve as a foundation for exploring the biology of this group of bacteria, which includes the obligate endosymbiont and human pathogen Rickettsia prowazekii, the plant pathogen Agrobacterium tumefaciens, and the bovine and human pathogen Brucella abortus. PMID:11259647

Nierman, W C; Feldblyum, T V; Laub, M T; Paulsen, I T; Nelson, K E; Eisen, J A; Heidelberg, J F; Alley, M R; Ohta, N; Maddock, J R; Potocka, I; Nelson, W C; Newton, A; Stephens, C; Phadke, N D; Ely, B; DeBoy, R T; Dodson, R J; Durkin, A S; Gwinn, M L; Haft, D H; Kolonay, J F; Smit, J; Craven, M B; Khouri, H; Shetty, J; Berry, K; Utterback, T; Tran, K; Wolf, A; Vamathevan, J; Ermolaeva, M; White, O; Salzberg, S L; Venter, J C; Shapiro, L; Fraser, C M; Eisen, J

2001-03-20

121

Complete genome sequence of Caulobacter crescentus  

PubMed Central

The complete genome sequence of Caulobacter crescentus was determined to be 4,016,942 base pairs in a single circular chromosome encoding 3,767 genes. This organism, which grows in a dilute aquatic environment, coordinates the cell division cycle and multiple cell differentiation events. With the annotated genome sequence, a full description of the genetic network that controls bacterial differentiation, cell growth, and cell cycle progression is within reach. Two-component signal transduction proteins are known to play a significant role in cell cycle progression. Genome analysis revealed that the C. crescentus genome encodes a significantly higher number of these signaling proteins (105) than any bacterial genome sequenced thus far. Another regulatory mechanism involved in cell cycle progression is DNA methylation. The occurrence of the recognition sequence for an essential DNA methylating enzyme that is required for cell cycle regulation is severely limited and shows a bias to intergenic regions. The genome contains multiple clusters of genes encoding proteins essential for survival in a nutrient poor habitat. Included are those involved in chemotaxis, outer membrane channel function, degradation of aromatic ring compounds, and the breakdown of plant-derived carbon sources, in addition to many extracytoplasmic function sigma factors, providing the organism with the ability to respond to a wide range of environmental fluctuations. C. crescentus is, to our knowledge, the first free-living α-class proteobacterium to be sequenced and will serve as a foundation for exploring the biology of this group of bacteria, which includes the obligate endosymbiont and human pathogen Rickettsia prowazekii, the plant pathogen Agrobacterium tumefaciens, and the bovine and human pathogen Brucella abortus.

Nierman, William C.; Feldblyum, Tamara V.; Laub, Michael T.; Paulsen, Ian T.; Nelson, Karen E.; Eisen, Jonathan; Heidelberg, John F.; Alley, M. R. K.; Ohta, Noriko; Maddock, Janine R.; Potocka, Isabel; Nelson, William C.; Newton, Austin; Stephens, Craig; Phadke, Nikhil D.; Ely, Bert; DeBoy, Robert T.; Dodson, Robert J.; Durkin, A. Scott; Gwinn, Michelle L.; Haft, Daniel H.; Kolonay, James F.; Smit, John; Craven, M. B.; Khouri, Hoda; Shetty, Jyoti; Berry, Kristi; Utterback, Teresa; Tran, Kevin; Wolf, Alex; Vamathevan, Jessica; Ermolaeva, Maria; White, Owen; Salzberg, Steven L.; Venter, J. Craig; Shapiro, Lucy; Fraser, Claire M.

2001-01-01

122

Genomic Sequencing of Pleistocene Cave Bears  

Microsoft Academic Search

Despite the greater information content of genomic DNA, ancient DNA studies have largely been limited to the amplification of mitochondrial sequences. Here we describe metagenomic libraries constructed with unamplified DNA extracted from skeletal remains of two 40,000-year-old extinct cave bears. Analysis of ~1 megabase of sequence from each library showed that despite significant microbial contamination, 5.8 and 1.1% of clones

James P. Noonan; Michael Hofreiter; Doug Smith; James R. Priest; Nadin Rohland; Gernot Rabeder; Johannes Krause; J. Chris Detter; Svante Pääbo; Edward M. Rubin

2005-01-01

123

Future medical applications of single-cell sequencing in cancer.  

PubMed

Advances in whole genome amplification and next-generation sequencing methods have enabled genomic analyses of single cells, and these techniques are now beginning to be used to detect genomic lesions in individual cancer cells. Previous approaches have been unable to resolve genomic differences in complex mixtures of cells, such as heterogeneous tumors, despite the importance of characterizing such tumors for cancer treatment. Sequencing of single cells is likely to improve several aspects of medicine, including the early detection of rare tumor cells, monitoring of circulating tumor cells (CTCs), measuring intratumor heterogeneity, and guiding chemotherapy. In this review we discuss the challenges and technical aspects of single-cell sequencing, with a strong focus on genomic copy number, and discuss how this information can be used to diagnose and treat cancer patients. PMID:21631906

Navin, Nicholas; Hicks, James

2011-05-31

124

Genome Sequence Assembly Using Trace Signals and Additional Sequence Information  

Microsoft Academic Search

Motivation: This article presents a method for assembling shotgun sequences which primarily uses high confidence regions whilst taking advantage of additional available information such as low confidence regions, quality values or repetitive region tags. Conflict situations are resolved with routines for analysing trace signals. Results: Initial tests with different human and mouse genome projects showed promising results but

Bastien Chevreux; Thomas Wetter; Sándor Suhai

1999-01-01

125

An emerging place for lung cancer genomics in 2013  

PubMed Central

Lung cancer is a disease with a dismal prognosis and is the biggest cause of cancer deaths in many countries. Nonetheless, rapid technological developments in genome science promise more effective prevention and treatment strategies. Since the Human Genome Project, scientific advances have revolutionized the diagnosis and treatment of human cancers, including thoracic cancers. The latest, massively parallel, next generation sequencing (NGS) technologies offer much greater sequencing capacity than traditional, capillary-based Sanger sequencing. These modern but costly technologies have been applied to whole genome-, and whole exome sequencing (WGS and WES) for the discovery of mutations and polymorphisms, transcriptome sequencing for quantification of gene expression, small ribonucleic acid (RNA) sequencing for microRNA profiling, large scale analysis of deoxyribonucleic acid (DNA) methylation and chromatin immunoprecipitation mapping of DNA-protein interaction. With the rise of personalized cancer care, based on the premise of precision medicine, sequencing technologies are constantly changing. To date, the genomic landscape of lung cancer has been captured in several WGS projects. Such work has not only contributed to our understanding of cancer biology, but has also provided impetus for technical advances that may improve our ability to accurately capture the cancer genome. Issues such as short read lengths contribute to sequenced libraries that contain challenging gaps in the aligned genome. Emerging platforms promise longer reads as well as the ability to capture a range of epigenomic signals. In addition, ongoing optimization of bioinformatics strategies for data analysis and interpretation are critical, especially for the differentiation between driver and passenger mutations. 
Moreover, broader deployment of these and future generations of platforms, coupled with an increasing bioinformatics workforce with access to highly sophisticated technologies, could see many of these discoveries translated to the clinic at a rapid pace. We look forward to these advances making a difference for the many patients we treat in the Asia-Pacific region and around the world.

Bowman, Rayleen V.; Yang, Ian A.; Govindan, Ramaswamy; Fong, Kwun M.

2013-01-01

126

Mapping and sequencing the human genome  

SciTech Connect

Numerous meetings have been held and a debate has developed in the biological community over the merits of mapping and sequencing the human genome. In response, a committee to examine the desirability and feasibility of mapping and sequencing the human genome was formed to suggest options for implementing the project. The committee asked many questions. Should the analysis of the human genome be left entirely to the traditionally uncoordinated, but highly successful, support systems that fund the vast majority of biomedical research? Or should a more focused and coordinated additional support system be developed that is limited to encouraging and facilitating the mapping and eventual sequencing of the human genome? If so, how can this be done without distorting the broader goals of biological research that are crucial for any understanding of the data generated in such a human genome project? As the committee became better informed on the many relevant issues, the opinions of its members coalesced, producing a shared consensus of what should be done. This report reflects that consensus.

none,

1988-01-01

127

Multilocus Sequence Typing of Total-Genome-Sequenced Bacteria  

PubMed Central

Accurate strain identification is essential for anyone working with bacteria. For many species, multilocus sequence typing (MLST) is considered the “gold standard” of typing, but it is traditionally performed in an expensive and time-consuming manner. As the costs of whole-genome sequencing (WGS) continue to decline, it becomes increasingly available to scientists and routine diagnostic laboratories. Currently, the cost is below that of traditional MLST. The new challenges will be how to extract the relevant information from the large amount of data so as to allow for comparison over time and between laboratories. Ideally, this information should also allow for comparison to historical data. We developed a Web-based method for MLST of 66 bacterial species based on WGS data. As input, the method uses short sequence reads from four sequencing platforms or preassembled genomes. Updates from the MLST databases are downloaded monthly, and the best-matching MLST alleles of the specified MLST scheme are found using a BLAST-based ranking method. The sequence type is then determined by the combination of alleles identified. The method was tested on preassembled genomes from 336 isolates covering 56 MLST schemes, on short sequence reads from 387 isolates covering 10 schemes, and on a small test set of short sequence reads from 29 isolates for which the sequence type had been determined by traditional methods. The method presented here enables investigators to determine the sequence types of their isolates on the basis of WGS data. This method is publicly available at www.cbs.dtu.dk/services/MLST.

Cosentino, Salvatore; Rasmussen, Simon; Friis, Carsten; Hasman, Henrik; Marvig, Rasmus Lykke; Jelsbak, Lars; Sicheritz-Ponten, Thomas; Ussery, David W.; Aarestrup, Frank M.; Lund, Ole

2012-01-01
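The final step described in the abstract above, where the sequence type is determined by the combination of alleles identified, amounts to a lookup of an allele profile in a scheme's profile table. The sketch below illustrates only that lookup; it is not the implementation of the Web-based method, the BLAST-based allele matching is omitted, and the locus names, allele numbers, and profile table are hypothetical.

```python
def sequence_type(alleles, loci, profiles):
    """Return the sequence type for an allele profile, or None if the profile is unknown.

    alleles  -- dict mapping locus name to the best-matching allele number
    loci     -- ordered tuple of locus names that define the scheme
    profiles -- dict mapping allele-number tuples to sequence type numbers
    """
    profile = tuple(alleles[locus] for locus in loci)
    return profiles.get(profile)


# Hypothetical three-locus scheme, for illustration only.
LOCI = ("locusA", "locusB", "locusC")
PROFILES = {(1, 1, 1): 1, (1, 2, 1): 2, (3, 2, 4): 3}

calls = {"locusA": 1, "locusB": 2, "locusC": 1}
print(sequence_type(calls, LOCI, PROFILES))
```

Returning `None` for an unlisted combination keeps novel profiles distinguishable from known sequence types.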

128

The genome sequence DataBase.  

PubMed

The Genome Sequence DataBase (GSDB) is a database of publicly available nucleotide sequences and their associated biological and bibliographic information. Several notable changes have occurred in the past year: GSDB stopped accepting data submissions from researchers; ownership of data submitted to GSDB was transferred to GenBank; sequence analysis capabilities were expanded to include Smith-Waterman and Frame Search; and Sequence Viewer became available to Mac users. The content of GSDB remains up-to-date because publicly available data is acquired from the International Nucleotide Sequence Database Collaboration databases (IC) on a nightly basis. This allows GSDB to continue providing researchers with the ability to analyze, query and retrieve nucleotide sequences in the database. GSDB and its related tools are freely accessible from the URL: http://www.ncgr.org PMID:10592174

Harger, C; Chen, G; Farmer, A; Huang, W; Inman, J; Kiphart, D; Schilkey, F; Skupski, M P; Weller, J

2000-01-01

129

The Genome Sequence DataBase  

PubMed Central

The Genome Sequence DataBase (GSDB) is a database of publicly available nucleotide sequences and their associated biological and bibliographic information. Several notable changes have occurred in the past year: GSDB stopped accepting data submissions from researchers; ownership of data submitted to GSDB was transferred to GenBank; sequence analysis capabilities were expanded to include Smith–Waterman and Frame Search; and Sequence Viewer became available to Mac users. The content of GSDB remains up-to-date because publicly available data is acquired from the International Nucleotide Sequence Database Collaboration databases (IC) on a nightly basis. This allows GSDB to continue providing researchers with the ability to analyze, query and retrieve nucleotide sequences in the database. GSDB and its related tools are freely accessible from the URL: http://www.ncgr.org

Harger, C.; Chen, G.; Farmer, A.; Huang, W.; Inman, J.; Kiphart, D.; Schilkey, F.; Skupski, M. P.; Weller, J.

2000-01-01

130

Genome sequence of Lactobacillus versmoldensis KCTC 3814.  

PubMed

Lactobacillus versmoldensis KCTC 3814 was isolated from raw fermented poultry salami. The species was present in high numbers and frequently dominated the lactic acid bacteria (LAB) populations of the products. Here, we announce the draft genome sequence of Lactobacillus versmoldensis KCTC 3814, isolated from poultry salami, and describe major findings from its annotation. PMID:21914893

Kim, Dae-Soo; Choi, Sang-Haeng; Kim, Dong-Wook; Kim, Ryong Nam; Nam, Seong-Hyeuk; Kang, Aram; Kim, Aeri; Park, Hong-Seog

2011-10-01

131

VIRAL SEQUENCES INTEGRATED INTO PLANT GENOMES  

Microsoft Academic Search

Abstract Sequences of various DNA plant viruses have been found integrated into the host genome. There are two forms of integrant, those that can form episomal viral infections and those that cannot. Integrants of three pararetroviruses, Banana streak virus (BSV), Tobacco vein clearing virus (TVCV), and Petunia vein clearing virus (PVCV), can generate episomal infections in certain hybrid plant hosts in

Glyn Harper; Roger Hull; Ben Lockhart; Neil Olszewski

2002-01-01

132

International network of cancer genome projects  

Microsoft Academic Search

The International Cancer Genome Consortium (ICGC) was launched to coordinate large-scale cancer genome studies in tumours from 50 different cancer types and/or subtypes that are of clinical and societal importance across the globe. Systematic studies of more than 25,000 cancer genomes at the genomic, epigenomic and transcriptomic levels will reveal the repertoire of oncogenic mutations, uncover traces of the mutagenic

Thomas J. Hudson; Warwick Anderson; Axel Aretz; Anna D. Barker; Cindy Bell; Rosa R. Bernabé; M. K. Bhan; Iiro Eerola; Daniela S. Gerhard; Alan Guttmacher; Mark Guyer; Fiona M. Hemsley; Jennifer L. Jennings; David Kerr; Peter Klatt; Patrik Kolar; Jun Kusuda; Frank Laplace; Youyong Lu; Gerd Nettekoven; Brad Ozenberger; Jane Peterson; T. S. Rao; Jacques Remacle; Alan J. Schafer; Tatsuhiro Shibata; Michael R. Stratton; Joseph G. Vockley; Koichi Watanabe; Huanming Yang; Martin Bobrow; Anne Cambon-Thomsen; Lynn G. Dressler; Stephanie O. M. Dyke; Yann Joly; Kazuto Kato; Karen L. Kennedy; Pilar Nicolás; Michael J. Parker; Emmanuelle Rial-Sebbag; Carlos M. Romeo-Casabona; Kenna M. Shaw; Susan Wallace; Georgia L. Wiesner; Andrew V. Biankin; Christian Chabannon; Lynda Chin; Bruno Clément; Enrique de Alava; Françoise Degos; Martin L. Ferguson; Peter Geary; D. Neil Hayes; Amber L. Johns; Arek Kasprzyk; Hidewaki Nakagawa; Robert Penny; Miguel A. Piris; Rajiv Sarin; Aldo Scarpa; Hiroyuki Aburatani; Mónica Bayés; David D. L. Bowtell; Peter J. Campbell; Xavier Estivill; Ivo Gut; Martin Hirst; Carlos López-Otín; Partha Majumder; Marco Marra; John D. McPherson; Zemin Ning; Xose S. Puente; Yijun Ruan; Hendrik G. Stunnenberg; Harold Swerdlow; Victor E. Velculescu; Richard K. Wilson; Hong H. Xue; Paul T. Spellman; Gary D. Bader; Paul C. Boutros; Paul Flicek; Gad Getz; Roderic Guigó; Guangwu Guo; David Haussler; Simon Heath; Tim J. Hubbard; Tao Jiang; Steven M. Jones; Qibin Li; Nuria López-Bigas; Ruibang Luo; Lakshmi Muthuswamy; B. F. Francis Ouellette; John V. Pearson; Victor Quesada; Benjamin J. Raphael; Chris Sander; Terence P. Speed; Joshua M. Stuart; Jon W. Teague; Yasushi Totoki; Tatsuhiko Tsunoda; Alfonso Valencia; David A. Wheeler; Honglong Wu; Shancen Zhao; Mark Lathrop; Gilles Thomas; Myles Axton; Chris Gunter; Linda J. Miller; Junjun Zhang; Syed A. Haider; Jianxin Wang; Christina K. Yung; Anthony Cross; Yong Liang; Saravanamuttu Gnaneshan; Jonathan Guberman; Don R. C. 
Chalmers; Karl W. Hasel; Terry S. H. Kaan; William W. Lowrance; Tohru Masui; Laura Lyman Rodriguez; Catherine Vergely; Nicole Cloonan; Anna Defazio; James R. Eshleman; Dariush Etemadmoghadam; Brooke A. Gardiner; James G. Kench; Robert L. Sutherland; Margaret A. Tempero; Nicola J. Waddell; Steve Gallinger; Ming-Sound Tsao; Patricia A. Shaw; Gloria M. Petersen; Debabrata Mukhopadhyay; Ronald A. Depinho; Sarah Thayer; Kamran Shazand; Timothy Beck; Michelle Sam; Lee Timms; Jiafu Ji; Xiuqing Zhang; Feng Chen; Xueda Hu; Guangyu Zhou; Qi Yang; Geng Tian; Lianhai Zhang; Xiaofang Xing; Xianghong Li; Zhenggang Zhu; Yingyan Yu; Jun Yu; Jörg Tost; Paul Brennan; Ivana Holcatova; David Zaridze; Alvis Brazma; Lars Egevad; Egor Prokhortchouk; Rosamonde Elizabeth Banks; Mathias Uhlén; Juris Viksna; Fredrik Ponten; Ewan Birney; Ake Borg; Anne-Lise Børresen-Dale; Carlos Caldas; John A. Foekens; Sancha Martin; Jorge S. Reis-Filho; Andrea L. Richardson; Christos Sotiriou; Marc van de Vijver; Daniel Birnbaum; Hélène Blanche; Pascal Boucher; Sandrine Boyault; Jocelyne D. Masson-Jacquemier; Iris Pauporté; Xavier Pivot; Anne Vincent-Salomon; Eric Tabone; Charles Theillet; Paulette Bioulac-Sage; Thomas Decaens; Dominique Franco; Marta Gut; Didier Samuel; Benedikt Brors; Jan O. Korbel; Andrey Korshunov; Pablo Landgraf; Hans Lehrach; Stefan Pfister; Bernhard Radlwimmer; Guido Reifenberger; Michael D. Taylor; Paolo Pederzoli; Rita T. Lawlor; Massimo Delledonne; Alberto Bardelli; Thomas Gress; David Klimstra; Yusuke Nakamura; Satoru Miyano; Akihiro Fujimoto; Silvia de Sanjosé; Emili Montserrat; Marcos González-Díaz; Pedro Jares; Heinz Himmelbaue; Samuel Aparicio; Laura van't Veer; Douglas F. Easton; Francis S. Collins; Carolyn C. Compton; Eric S. Lander; Wylie Burke; Anthony R. Green; Olli P. Kallioniemi; Timothy J. Ley; Edison T. Liu; Brandon J. Wainwright

2010-01-01

133

Hardware accelerator for genomic sequence alignment.  

PubMed

To infer homology and subsequently gene function, the Smith-Waterman algorithm is used to find the optimal local alignment between two sequences. When searching sequence databases that may contain billions of sequences, this algorithm becomes computationally expensive. Consequently, in this paper, we focused on accelerating the Smith-Waterman algorithm by implementing its computationally repeated portion as FPGA hardware custom instructions. These simple modifications accelerated the algorithm runtime by an average of 287% compared to the pure software implementation. Therefore, further design of FPGA-accelerated hardware offers a promising direction for improving the runtime of genomic database searching. PMID:17946720

Chiang, Jason; Studniberg, Michael; Shaw, Jack; Seto, Shaw; Truong, Kevin

2006-01-01
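
As a rough illustration of the software baseline that the record above refers to, the following is a minimal Python sketch of Smith-Waterman local alignment with a linear gap penalty. The function name and the scoring values (match=2, mismatch=-1, gap=-1) are assumptions chosen for illustration and are not taken from the paper; the FPGA custom instructions themselves are not modeled here.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    """Return (best_score, aligned_a, aligned_b) for one optimal local alignment.

    Scoring values are illustrative assumptions, not the paper's parameters.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # H[i][j] holds the best score of a local alignment ending at a[i-1], b[j-1].
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    # Fill step: every cell depends on its upper, left and diagonal neighbours.
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Traceback step: walk back from the best cell until a zero score is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
        if H[i][j] == diag:
            out_a.append(a[i - 1]); out_b.append(b[j - 1])
            i -= 1; j -= 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-")
            i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    print(smith_waterman("XXACGTXX", "YYACGTYY"))
```

The nested fill loop does work proportional to the product of the two sequence lengths, which is why the cost grows quickly when a query is compared against a large database.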

134

Comparison of Sample Sequences of the Salmonella typhi Genome to the Sequence of the Complete Escherichia coli K-12 Genome  

Microsoft Academic Search

Raw sequence data representing the majority of a bacterial genome can be obtained at a tiny fraction of the cost of a completed sequence. To demonstrate the utility of such a resource, 870 single-stranded M13 clones were sequenced from a shotgun library of the Salmonella typhi Ty2 genome. The sequence reads averaged over 400 bases and sampled the genome with

MICHAEL MCCLELLAND; RICHARD K. WILSON

1998-01-01
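
To make the sampling arithmetic in the record above concrete, here is a small Python sketch that estimates sequencing depth and the expected fraction of a genome touched by at least one read, using the standard Poisson approximation (fraction = 1 - e^(-coverage)). The read count (870) and read length (about 400 bases) come from the abstract; the 4.8 Mb genome size is an assumed round figure for illustration only, and the function name is hypothetical.

```python
import math


def sampling_coverage(num_reads, mean_read_len, genome_size):
    """Return (coverage, expected_fraction_sampled).

    coverage is total sequenced bases divided by genome size; the fraction
    uses a Poisson approximation and ignores cloning bias and read overlap
    structure.
    """
    coverage = num_reads * mean_read_len / genome_size
    fraction = 1.0 - math.exp(-coverage)
    return coverage, fraction


if __name__ == "__main__":
    # 870 reads of ~400 bases are from the abstract; 4_800_000 is an assumption.
    c, f = sampling_coverage(870, 400, 4_800_000)
    print(f"coverage = {c:.4f}x, expected fraction sampled = {f:.3f}")
```

Under these assumptions the sample covers only a small percentage of the genome, which is consistent with the abstract's framing of sample sequencing as a low-cost resource rather than a finished sequence.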

135

NIH researchers complete whole-exome sequencing of skin cancer;  

Cancer.gov

A team led by researchers at NIH is the first to systematically survey the landscape of the melanoma genome, the DNA code of the deadliest form of skin cancer. The researchers have made surprising new discoveries using whole-exome sequencing, an approach that decodes the 1-2 percent of the genome that contains protein-coding genes.

136

The breast cancer genome - a key for better oncology  

PubMed Central

Molecular classification has added important knowledge to breast cancer biology, but has yet to be implemented as a clinical standard. Full sequencing of breast cancer genomes could potentially refine classification and give a more complete picture of the mutational profile of cancer and thus aid therapy decisions. Future treatment guidelines must be based on the knowledge derived from histopathological sub-classification of tumors, but with added information from genomic signatures when properly clinically validated. The objective of this article is to give some background on molecular classification, the potential of next generation sequencing, and to outline how this information could be implemented in the clinic.

2011-01-01

137

Sequencing of Seven Haloarchaeal Genomes Reveals Patterns of Genomic Flux  

PubMed Central

We report the sequencing of seven genomes from two haloarchaeal genera, Haloferax and Haloarcula. Ease of cultivation and the existence of well-developed genetic and biochemical tools for several diverse haloarchaeal species make haloarchaea a model group for the study of archaeal biology. The unique physiological properties of these organisms also make them good candidates for novel enzyme discovery for biotechnological applications. Seven genomes were sequenced to ~20× coverage and assembled to an average of 50 contigs (range 5 scaffolds - 168 contigs). Comparisons of protein-coding gene complements revealed large-scale differences in COG functional group enrichment between these genera. Analysis of genes encoding machinery for DNA metabolism reveals genera-specific expansions of the general transcription factor TATA binding protein as well as a history of extensive duplication and horizontal transfer of the proliferating cell nuclear antigen. Insights gained from this study emphasize the importance of haloarchaea for investigation of archaeal biology.

Lynch, Erin A.; Langille, Morgan G. I.; Darling, Aaron; Wilbanks, Elizabeth G.; Haltiner, Caitlin; Shao, Katie S. Y.; Starr, Michael O.; Teiling, Clotilde; Harkins, Timothy T.; Edwards, Robert A.; Eisen, Jonathan A.; Facciotti, Marc T.

2012-01-01

138

Mutational hotspots in the mitochondrial genome of lung cancer  

Microsoft Academic Search

We determined the somatic mutations in the mitochondrial genomes of 70 lung cancer patients by pair-wise comparative analyses of the normal- and tumor-genome sequences acquired using Affymetrix Mitochondrial Resequencing Array 2.0. The overall mutation rates in lung cancers were approximately 100-fold higher than those in normal cells, with significant statistical correlation with smoking (p=0.00088). Total of 532 somatic mutations

So-Jung Choi; Sung-Hyun Kim; Ho Y. Kang; Jinseon Lee; Jong H. Bhak; Insuk Sohn; Sin-Ho Jung; Yong Soo Choi; Hong Kwan Kim; Jungho Han; Nam Huh; Gyusang Lee; Byung C. Kim; Jhingook Kim

2011-01-01
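
The pair-wise normal-versus-tumor comparison described in the record above can be sketched in a few lines of Python. This is only an illustrative assumption of how per-position base calls might be compared: the function name, the use of "N" for no-calls, and the 1-based positions are hypothetical, and array signal processing, heteroplasmy and quality filtering are not modeled.

```python
def somatic_differences(normal, tumor, no_call="N"):
    """List (position, normal_base, tumor_base) where the two calls differ.

    Both inputs are equal-length strings of base calls for the same
    reference coordinates; positions with a no-call in either sample are
    skipped rather than reported.
    """
    if len(normal) != len(tumor):
        raise ValueError("sequences must cover the same positions")
    diffs = []
    for pos, (n, t) in enumerate(zip(normal, tumor), start=1):
        if n == no_call or t == no_call:
            continue  # cannot decide without a call in both samples
        if n != t:
            diffs.append((pos, n, t))
    return diffs


if __name__ == "__main__":
    print(somatic_differences("ACGTN", "ATGTA"))
```

Comparing each tumor against the matched normal from the same patient, rather than against a generic reference, keeps inherited variants from being counted as somatic changes.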

139

Clinical applications of next-generation sequencing in colorectal cancers  

PubMed Central

Like other solid tumors, colorectal cancer (CRC) is a genomic disorder in which various types of genomic alterations, such as point mutations, genomic rearrangements, gene fusions, or chromosomal copy number alterations, can contribute to the initiation and progression of the disease. The advent of a new DNA sequencing technology known as next-generation sequencing (NGS) has revolutionized the speed and throughput of cataloguing such cancer-related genomic alterations. Now the challenge is how to exploit this advanced technology to better understand the underlying molecular mechanism of colorectal carcinogenesis and to identify clinically relevant genetic biomarkers for diagnosis and personalized therapeutics. In this review, we will introduce NGS-based cancer genomics studies focusing on those of CRC, including a recent large-scale report from the Cancer Genome Atlas. We will mainly discuss how NGS-based exome-, whole genome- and methylome-sequencing have extended our understanding of colorectal carcinogenesis. We will also introduce the unique genomic features of CRC discovered by NGS technologies, such as the relationship with bacterial pathogens and the massive genomic rearrangements of chromothripsis. Finally, we will discuss the necessary steps prior to development of a clinical application of NGS-related findings for the advanced management of patients with CRC.

Kim, Tae-Min; Lee, Sug-Hyung; Chung, Yeun-Jun

2013-01-01

140

Whole-genome sequencing in bacteriology: state of the art  

PubMed Central

Over the last ten years, genome sequencing capabilities have expanded exponentially. There have been tremendous advances in sequencing technology, DNA sample preparation, genome assembly, and data analysis. This has led to advances in a number of facets of bacterial genomics, including metagenomics, clinical medicine, bacterial archaeology, and bacterial evolution. This review examines the strengths and weaknesses of techniques in bacterial genome sequencing, upcoming technologies, and assembly techniques, and highlights recent studies that demonstrate new applications for bacterial genomics.

Dark, Michael J

2013-01-01

141

The Norway spruce genome sequence and conifer genome evolution.  

PubMed

Conifers have dominated forests for more than 200 million years and are of huge ecological and economic importance. Here we present the draft assembly of the 20-gigabase genome of Norway spruce (Picea abies), the first available for any gymnosperm. The number of well-supported genes (28,354) is similar to the >100 times smaller genome of Arabidopsis thaliana, and there is no evidence of a recent whole-genome duplication in the gymnosperm lineage. Instead, the large genome size seems to result from the slow and steady accumulation of a diverse set of long-terminal repeat transposable elements, possibly owing to the lack of an efficient elimination mechanism. Comparative sequencing of Pinus sylvestris, Abies sibirica, Juniperus communis, Taxus baccata and Gnetum gnemon reveals that the transposable element diversity is shared among extant conifers. Expression of 24-nucleotide small RNAs, previously implicated in transposable element silencing, is tissue-specific and much lower than in other plants. We further identify numerous long (>10,000 base pairs) introns, gene-like fragments, uncharacterized long non-coding RNAs and short RNAs. This opens up new genomic avenues for conifer forestry and breeding. PMID:23698360

Nystedt, Björn; Street, Nathaniel R; Wetterbom, Anna; Zuccolo, Andrea; Lin, Yao-Cheng; Scofield, Douglas G; Vezzi, Francesco; Delhomme, Nicolas; Giacomello, Stefania; Alexeyenko, Andrey; Vicedomini, Riccardo; Sahlin, Kristoffer; Sherwood, Ellen; Elfstrand, Malin; Gramzow, Lydia; Holmberg, Kristina; Hällman, Jimmie; Keech, Olivier; Klasson, Lisa; Koriabine, Maxim; Kucukoglu, Melis; Käller, Max; Luthman, Johannes; Lysholm, Fredrik; Niittylä, Totte; Olson, Ake; Rilakovic, Nemanja; Ritland, Carol; Rosselló, Josep A; Sena, Juliana; Svensson, Thomas; Talavera-López, Carlos; Theißen, Günter; Tuominen, Hannele; Vanneste, Kevin; Wu, Zhi-Qiang; Zhang, Bo; Zerbe, Philipp; Arvestad, Lars; Bhalerao, Rishikesh; Bohlmann, Joerg; Bousquet, Jean; Garcia Gil, Rosario; Hvidsten, Torgeir R; de Jong, Pieter; MacKay, John; Morgante, Michele; Ritland, Kermit; Sundberg, Björn; Thompson, Stacey Lee; Van de Peer, Yves; Andersson, Björn; Nilsson, Ove; Ingvarsson, Pär K; Lundeberg, Joakim; Jansson, Stefan

2013-05-22

142

Perspective beyond cancer genomics: bioenergetics of cancer stem cells.  

PubMed

Although the notion that cancer is a disease caused by genetic and epigenetic alterations is now widely accepted, perhaps more emphasis has been given to the fact that cancer is a genetic disease. It should be noted that in the post-genome sequencing project period of the 21st century, the underlined phenomenon nevertheless could not be discarded towards the complete control of cancer disaster as the whole strategy, and in-depth investigation of the factors associated with tumorigenesis is required for achieving it. Otto Warburg won a Nobel Prize in 1931 for the discovery of tumor bioenergetics, which is now commonly used as the basis of positron emission tomography (PET), a highly sensitive noninvasive technique used in cancer diagnosis. Furthermore, the importance of the cancer stem cell (CSC) hypothesis in therapy-related resistance and metastasis has been recognized during the past 2 decades. Accumulating evidence suggests that tumor bioenergetics plays a critical role in CSC regulation; this finding has opened up a new era of cancer medicine, which goes beyond cancer genomics. PMID:20635433

Ishii, Hideshi; Doki, Yuichiro; Mori, Masaki

2010-09-01

143

Complete genome sequence of Pyrobaculum oguniense  

PubMed Central

Pyrobaculum oguniense TE7 is an aerobic hyperthermophilic crenarchaeon isolated from a hot spring in Japan. Here we describe its main chromosome of 2,436,033 bp, with three large-scale inversions and an extra-chromosomal element of 16,887 bp. We have annotated 2,800 protein-coding genes and 145 RNA genes in this genome, including nine H/ACA-like small RNA, 83 predicted C/D box small RNA, and 47 transfer RNA genes. Comparative analyses with the closest known relative, the anaerobe Pyrobaculum arsenaticum from Italy, reveal unexpectedly high synteny and nucleotide identity between these two geographically distant species. Deep sequencing of a mixture of genomic DNA from multiple cells has illuminated some of the genome dynamics potentially shared with other species in this genus.

Bernick, David L.; Karplus, Kevin; Lui, Lauren M.; Coker, Joanna K. C.; Murphy, Julie N.; Chan, Patricia P.; Cozen, Aaron E.

2012-01-01

144

International Cancer Genome Consortium Data Portal--a one-stop shop for cancer genomics data  

PubMed Central

The International Cancer Genome Consortium (ICGC) is a collaborative effort to characterize genomic abnormalities in 50 different cancer types. To make this data available, the ICGC has created the ICGC Data Portal. Powered by the BioMart software, the Data Portal allows each ICGC member institution to manage and maintain its own databases locally, while seamlessly presenting all the data in a single access point for users. The Data Portal currently contains data from 24 cancer projects, including ICGC, The Cancer Genome Atlas (TCGA), Johns Hopkins University, and the Tumor Sequencing Project. It consists of 3478 genomes and 13 cancer types and subtypes. Available open access data types include simple somatic mutations, copy number alterations, structural rearrangements, gene expression, microRNAs, DNA methylation and exon junctions. Additionally, simple germline variations are available as controlled access data. The Data Portal uses a web-based graphical user interface (GUI) to offer researchers multiple ways to quickly and easily search and analyze the available data. The web interface can assist in constructing complicated queries across multiple data sets. Several application programming interfaces are also available for programmatic access. Here we describe the organization, functionality, and capabilities of the ICGC Data Portal. Database URL: http://dcc.icgc.org

Zhang, Junjun; Baran, Joachim; Cros, A.; Guberman, Jonathan M.; Haider, Syed; Hsu, Jack; Liang, Yong; Rivkin, Elena; Wang, Jianxin; Whitty, Brett; Wong-Erasmus, Marie; Yao, Long; Kasprzyk, Arek

2011-01-01

145

International Cancer Genome Consortium Data Portal--a one-stop shop for cancer genomics data.  

PubMed

The International Cancer Genome Consortium (ICGC) is a collaborative effort to characterize genomic abnormalities in 50 different cancer types. To make this data available, the ICGC has created the ICGC Data Portal. Powered by the BioMart software, the Data Portal allows each ICGC member institution to manage and maintain its own databases locally, while seamlessly presenting all the data in a single access point for users. The Data Portal currently contains data from 24 cancer projects, including ICGC, The Cancer Genome Atlas (TCGA), Johns Hopkins University, and the Tumor Sequencing Project. It consists of 3478 genomes and 13 cancer types and subtypes. Available open access data types include simple somatic mutations, copy number alterations, structural rearrangements, gene expression, microRNAs, DNA methylation and exon junctions. Additionally, simple germline variations are available as controlled access data. The Data Portal uses a web-based graphical user interface (GUI) to offer researchers multiple ways to quickly and easily search and analyze the available data. The web interface can assist in constructing complicated queries across multiple data sets. Several application programming interfaces are also available for programmatic access. Here we describe the organization, functionality, and capabilities of the ICGC Data Portal. PMID:21930502

Zhang, Junjun; Baran, Joachim; Cros, A; Guberman, Jonathan M; Haider, Syed; Hsu, Jack; Liang, Yong; Rivkin, Elena; Wang, Jianxin; Whitty, Brett; Wong-Erasmus, Marie; Yao, Long; Kasprzyk, Arek

2011-09-19

146

Genomic multiple sequence alignments: refinement using a genetic algorithm  

Microsoft Academic Search

BACKGROUND: Genomic sequence data cannot be fully appreciated in isolation. Comparative genomics – the practice of comparing genomic sequences from different species – plays an increasingly important role in understanding the genotypic differences between species that result in phenotypic differences as well as in revealing patterns of evolutionary relationships. One of the major challenges in comparative genomics is producing a

Chunlin Wang; Elliot J. Lefkowitz

2005-01-01

147

Initial sequencing and comparative analysis of the mouse genome  

Microsoft Academic Search

The sequence of the mouse genome is a key informational tool for understanding the contents of the human genome and a key experimental tool for biomedical research. Here, we report the results of an international collaboration to produce a high-quality draft sequence of the mouse genome. We also present an initial comparative analysis of the mouse and human genomes, describing

Robert H. Waterston; Kerstin Lindblad-Toh; Ewan Birney; Jane Rogers; Josep F. Abril; Pankaj Agarwal; Richa Agarwala; Rachel Ainscough; Marina Alexandersson; Peter An; Stylianos E. Antonarakis; John Attwood; Robert Baertsch; Jonathon Bailey; Karen Barlow; Stephan Beck; Eric Berry; Bruce Birren; Toby Bloom; Peer Bork; Marc Botcherby; Nicolas Bray; Michael R. Brent; Daniel G. Brown; Stephen D. Brown; Carol Bult; John Burton; Jonathan Butler; Robert D. Campbell; Piero Carninci; Simon Cawley; Francesca Chiaromonte; Asif T. Chinwalla; Deanna M. Church; Michele Clamp; Christopher Clee; Francis S. Collins; Lisa L. Cook; Richard R. Copley; Alan Coulson; Olivier Couronne; James Cuff; Val Curwen; Tim Cutts; Mark Daly; Robert David; Joy Davies; Kimberly D. Delehaunty; Justin Deri; Emmanouil T. Dermitzakis; Colin Dewey; Nicholas J. Dickens; Mark Diekhans; Sheila Dodge; Inna Dubchak; Diane M. Dunn; Sean R. Eddy; Laura Elnitski; Richard D. Emes; Pallavi Eswara; Eduardo Eyras; Adam Felsenfeld; Ginger A. Fewell; Paul Flicek; Karen Foley; Wayne N. Frankel; Lucinda A. Fulton; Robert S. Fulton; Terrence S. Furey; Diane Gage; Richard A. Gibbs; Gustavo Glusman; Sante Gnerre; Nick Goldman; Leo Goodstadt; Darren Grafham; Tina A. Graves; Eric D. Green; Simon Gregory; Roderic Guigó; Mark Guyer; Ross C. Hardison; David Haussler; Yoshihide Hayashizaki; LaDeana W. Hillier; Angela Hinrichs; Wratko Hlavina; Timothy Holzer; Fan Hsu; Axin Hua; Tim Hubbard; Adrienne Hunt; Ian Jackson; David B. Jaffe; L. Steven Johnson; Matthew Jones; Thomas A. Jones; Ann Joy; Michael Kamal; Elinor K. Karlsson; Donna Karolchik; Arkadiusz Kasprzyk; Jun Kawai; Evan Keibler; Cristyn Kells; W. James Kent; Andrew Kirby; Diana L. Kolbe; Ian Korf; Raju S. Kucherlapati; Edward J. Kulbokas; David Kulp; Tom Landers; J. P. Leger; Steven Leonard; Ivica Letunic; Rosie Levine; Jia Li; Ming Li; Christine Lloyd; Susan Lucas; Bin Ma; Donna R. Maglott; Elaine R. Mardis; Lucy Matthews; Evan Mauceli; John H. Mayer; Megan McCarthy; W. 
Richard McCombie; Stuart McLaren; Kirsten McLay; John D. McPherson; Jim Meldrim; Beverley Meredith; Jill P. Mesirov; Webb Miller; Tracie L. Miner; Emmanuel Mongin; Kate T. Montgomery; Michael Morgan; Richard Mott; James C. Mullikin; Donna M. Muzny; William E. Nash; Joanne O. Nelson; Michael N. Nhan; Robert Nicol; Zemin Ning; Chad Nusbaum; Michael J. O'Connor; Yasushi Okazaki; Karen Oliver; Emma Overton-Larty; Lior Pachter; Genís Parra; Kymberlie H. Pepin; Jane Peterson; Pavel Pevzner; Robert Plumb; Craig S. Pohl; Alex Poliakov; Tracy C. Ponce; Simon Potter; Michael Quail; Alexandre Reymond; Bruce A. Roe; Krishna M. Roskin; Edward M. Rubin; Alistair G. Rust; Victor Sapojnikov; Brian Schultz; Jörg Schultz; Scott Schwartz; Carol Scott; Steven Seaman; Steve Searle; Ted Sharpe; Andrew Sheridan; Ratna Shownkeen; Sarah Sims; Jonathan B. Singer; Guy Slater; Arian Smit; Douglas R. Smith; Brian Spencer; Arne Stabenau; Nicole Stange-Thomann; Charles Sugnet; Mikita Suyama; Glenn Tesler; Johanna Thompson; David Torrents; Evanne Trevaskis; John Tromp; Catherine Ucla; Abel Ureta-Vidal; Jade P. Vinson; Andrew C. von Niederhausern; Claire M. Wade; Melanie Wall; Ryan J. Weber; Robert B. Weiss; Michael C. Wendl; Anthony P. West; Kris Wetterstrand; Raymond Wheeler; Simon Whelan; Jamey Wierzbowski; David Willey; Sophie Williams; Richard K. Wilson; Eitan Winter; Kim C. Worley; Dudley Wyman; Shan Yang; Shiaw-Pyng Yang; Evgeny M. Zdobnov; Michael C. Zody; Eric S. Lander; Chris P. Ponting; Matthias S. Schwartz

2002-01-01

148

Building the sequence map of the human pan-genome  

Microsoft Academic Search

Here we integrate the de novo assembly of an Asian and an African genome with the NCBI reference human genome, as a step toward constructing the human pan-genome. We identified ~5 Mb of novel sequences not present in the reference genome in each of these assemblies. Most novel sequences are individual or population specific, as revealed by their comparison to

Ruiqiang Li; Yingrui Li; Hancheng Zheng; Ruibang Luo; Hongmei Zhu; Qibin Li; Wubin Qian; Yuanyuan Ren; Geng Tian; Jinxiang Li; Guangyu Zhou; Xuan Zhu; Honglong Wu; Junjie Qin; Xin Jin; Dongfang Li; Hongzhi Cao; Xueda Hu; Hélène Blanche; Howard Cann; Xiuqing Zhang; Songgang Li; Lars Bolund; Karsten Kristiansen; Huanming Yang; Jun Wang; Jian Wang

2009-01-01

149

Cactus: Algorithms for genome multiple sequence alignment  

PubMed Central

Much attention has been given to the problem of creating reliable multiple sequence alignments in a model incorporating substitutions, insertions, and deletions. Far less attention has been paid to the problem of optimizing alignments in the presence of more general rearrangement and copy number variation. Using Cactus graphs, recently introduced for representing sequence alignments, we describe two complementary algorithms for creating genomic alignments. We have implemented these algorithms in the new “Cactus” alignment program. We test Cactus using the Evolver genome evolution simulator, a comprehensive new tool for simulation, and show using these and existing simulations that Cactus significantly outperforms all of its peers. Finally, we make an empirical assessment of Cactus's ability to properly align genes and find interesting cases of intra-gene duplication within the primates.

Paten, Benedict; Earl, Dent; Nguyen, Ngan; Diekhans, Mark; Zerbino, Daniel; Haussler, David

2011-01-01

150

Finishing a whole-genome shotgun: Release 3 of the Drosophila melanogaster euchromatic genome sequence  

Microsoft Academic Search

BACKGROUND: The Drosophila melanogaster genome was the first metazoan genome to have been sequenced by the whole-genome shotgun (WGS) method. Two issues relating to this achievement were widely debated in the genomics community: how correct is the sequence with respect to base-pair (bp) accuracy and frequency of assembly errors? And, how difficult is it to bring a WGS sequence to

Susan E Celniker; David A Wheeler; Brent Kronmiller; Joseph W Carlson; Aaron Halpern; Sandeep Patel; Mark Adams; Mark Champe; Shannon P Dugan; Erwin Frise; Ann Hodgson; Reed A George; Roger A Hoskins; Todd Laverty; Donna M Muzny; Catherine R Nelson; Joanne M Pacleb; Soo Park; Barret D Pfeiffer; Stephen Richards; Erica J Sodergren; Robert Svirskas; Paul E Tabor; Kenneth Wan; Mark Stapleton; Granger G Sutton; Craig Venter; George Weinstock; Steven E Scherer; Eugene W Myers; Richard A Gibbs; Gerald M Rubin

2002-01-01

151

Draft Genome Sequence of Rubrivivax gelatinosus CBS  

SciTech Connect

Rubrivivax gelatinosus CBS, a purple nonsulfur photosynthetic bacterium, can grow photosynthetically using CO and N2 as the sole carbon and nitrogen nutrients, respectively. R. gelatinosus CBS is of particular interest due to its ability to metabolize CO and yield H2. We present the 5-Mb draft genome sequence of R. gelatinosus CBS with the goal of providing genetic insight into the metabolic properties of this bacterium.

Hu, P. S.; Lang, J.; Wawrousek, K.; Yu, J. P.; Maness, P. C.; Chen, J.

2012-06-01

152

Draft Genome Sequence of Rubrivivax gelatinosus CBS  

PubMed Central

Rubrivivax gelatinosus CBS, a purple nonsulfur photosynthetic bacterium, can grow photosynthetically using CO and N2 as the sole carbon and nitrogen nutrients, respectively. R. gelatinosus CBS is of particular interest due to its ability to metabolize CO and yield H2. We present the 5-Mb draft genome sequence of R. gelatinosus CBS with the goal of providing genetic insight into the metabolic properties of this bacterium.

Hu, Pingsha; Lang, Juan; Wawrousek, Karen; Yu, Jianping; Maness, Pin-Ching

2012-01-01

153

Sequence of the Oxytricha trifallax macronuclear genome  

Microsoft Academic Search

We propose complete sequencing of the macronuclear genome of the ciliated protozoan Oxytricha trifallax (Alveolate; class Spirotrichea). Ciliates have been important experimental organisms for over 100 years, contributing to the discovery and understanding of many essential cellular processes (including self-splicing RNA, telomere biochemistry, and transcriptional regulation by histone modification), with Oxytricha representing the lineage (the spirotrichs) with the very surprising discoveries of gene-sized

Thomas G. Doak; Glenn Herrick; Laura F. Landweber; Robert B. Weiss

154

The Predictive Capacity of Personal Genome Sequencing  

PubMed Central

New DNA sequencing methods will soon make it possible to identify all germline variants in any individual at a reasonable cost. However, the ability of whole-genome sequencing to predict predisposition to common diseases in the general population is unknown. To estimate this predictive capacity, we use the concept of a “genometype”. A specific genometype represents the genomes in the population conferring a specific level of genetic risk for a specified disease. Using this concept, we estimated the capacity of whole-genome sequencing to identify individuals at clinically significant risk for 24 different diseases. Our estimates were derived from the analysis of large numbers of monozygotic twin pairs; twins of a pair share the same genometype and therefore identical genetic risk factors. Our analyses indicate that: (i) for 23 of the 24 diseases, the majority of individuals will receive negative test results, (ii) these negative test results will, in general, not be very informative, as the risk of developing 19 of the 24 diseases in those who test negative will still be, at minimum, 50 - 80% of that in the general population, and (iii) on the positive side, in the best-case scenario more than 90% of tested individuals might be alerted to a clinically significant predisposition to at least one disease. These results have important implications for the valuation of genetic testing by industry, health insurance companies, public policy makers and consumers.

Roberts, Nicholas J.; Vogelstein, Joshua T.; Parmigiani, Giovanni; Kinzler, Kenneth W.; Vogelstein, Bert; Velculescu, Victor E.

2013-01-01

155

Global Identification of Human Transcribed Sequences with Genome Tiling Arrays  

Microsoft Academic Search

Elucidating the transcribed regions of the genome constitutes a fundamental aspect of human biology, yet this remains an outstanding problem. To comprehensively identify coding sequences, we constructed a series of high-density oligonucleotide tiling arrays representing sense and antisense strands of the entire nonrepetitive sequence of the human genome. Transcribed sequences were located across the genome via hybridization to complementary DNA

Paul Bertone; Viktor Stolc; Thomas E. Royce; Joel S. Rozowsky; Alexander E. Urban; Xiaowei Zhu; John L. Rinn; Waraporn Tongprasit; Manoj Samanta; Sherman Weissman; Mark Gerstein; Michael Snyder

2004-01-01

156

Genome Sequences of Pseudomonas spp. Isolated from Cereal Crops  

PubMed Central

Compared to those of dicot-infecting bacteria, the available genome sequences of bacteria that infect wheat and barley are limited. Herein, we report the draft genome sequences of four pseudomonads originally isolated from these cereals. These genome sequences provide a useful resource for comparative analyses within the genus and for cross-kingdom analyses of plant pathogenesis.

Stiller, Jiri; Covarelli, Lorenzo; Lindeberg, Magdalen; Shivas, Roger G.; Manners, John M.

2013-01-01

157

Genome Sequences of Pseudomonas spp. Isolated from Cereal Crops.  

PubMed

Compared to those of dicot-infecting bacteria, the available genome sequences of bacteria that infect wheat and barley are limited. Herein, we report the draft genome sequences of four pseudomonads originally isolated from these cereals. These genome sequences provide a useful resource for comparative analyses within the genus and for cross-kingdom analyses of plant pathogenesis. PMID:23661484

Gardiner, Donald M; Stiller, Jiri; Covarelli, Lorenzo; Lindeberg, Magdalen; Shivas, Roger G; Manners, John M

2013-05-09

158

Genome-wide epigenetic modifications in cancer  

PubMed Central

Epigenetic alterations in cancer include changes in DNA methylation and associated histone modifications that influence chromatin states and affect gene expression patterns. Owing to recent technological advances, the scientific community is now obtaining a better picture of the genome-wide epigenetic changes that occur in a cancer genome. These epigenetic alterations are associated with chromosomal instability and changes in transcriptional control, which influence the overall gene expression differences seen in many human malignancies. In this review, we will briefly summarize our current knowledge of the epigenetic patterns and mechanisms of gene regulation in healthy tissues and relate this to what is known for cancer genomes. Our focus will be on DNA methylation. We will review the current standing of technologies that have been developed over recent years. This field is experiencing a revolution in the strategies used to measure epigenetic alterations, which includes the incorporation of next generation sequencing tools. We will also review strategies that utilize epigenetic information for translational purposes, with a special emphasis on the potential use of DNA methylation marks for early disease detection and prognosis. The review will close with an outlook on challenges that this field is facing.

Park, Yoon Jung; Claus, Rainer; Weichenhan, Dieter; Plass, Christoph

2011-01-01

159

Identification of ancient remains through genomic sequencing.  

PubMed

Studies of ancient DNA have been hindered by the preciousness of remains, the small quantities of undamaged DNA accessible, and the limitations associated with conventional PCR amplification. In these studies, we developed and applied a genomewide adapter-mediated emulsion PCR amplification protocol for ancient mammalian samples estimated to be between 45,000 and 69,000 yr old. Using 454 Life Sciences (Roche) and Illumina sequencing (formerly Solexa sequencing) technologies, we examined over 100 megabases of DNA from amplified extracts, revealing unbiased sequence coverage with substantial amounts of nonredundant nuclear sequences from the sample sources and negligible levels of human contamination. We consistently recorded over 500-fold increases, such that nanogram quantities of starting material could be amplified to microgram quantities. Application of our protocol to a 50,000-yr-old uncharacterized bone sample that was unsuccessful in mitochondrial PCR provided sufficient nuclear sequences for comparison with extant mammals and subsequent phylogenetic classification of the remains. The combined use of emulsion PCR amplification and high-throughput sequencing allows for the generation of large quantities of DNA sequence data from ancient remains. Using such techniques, even small amounts of ancient remains with low levels of endogenous DNA preservation may yield substantial quantities of nuclear DNA, enabling novel applications of ancient DNA genomics to the investigation of extinct phyla. PMID:18426903

Blow, Matthew J; Zhang, Tao; Woyke, Tanja; Speller, Camilla F; Krivoshapkin, Andrei; Yang, Dongya Y; Derevianko, Anatoly; Rubin, Edward M

2008-04-21

160

Data structures and compression algorithms for genomic sequence data  

Microsoft Academic Search

Motivation: The continuing exponential accumulation of full genome data, including full diploid human genomes, creates new challenges not only for understanding genomic structure, function, and evolution, but also for the storage, navigation, and privacy of genomic data. Here we develop data structures and algorithms for the efficient storage of genomic and other sequence data that may also facilitate querying and

Marty C. Brandon; Douglas C. Wallace; Pierre Baldi

2009-01-01
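
As a rough illustration of the storage problem raised in the record above, the sketch below packs an A/C/G/T string into two bits per base. This is a generic baseline and is not the authors' method, which the truncated abstract does not describe; the function names `pack` and `unpack` are hypothetical, and the sketch assumes input containing only the four unambiguous bases.

```python
# Generic 2-bit packing of DNA; an illustrative baseline only,
# not the scheme developed in the cited paper.
_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
_BASE = "ACGT"


def pack(seq: str) -> bytes:
    """Pack an A/C/G/T string into bytes, four bases per byte."""
    out = bytearray()
    for i in range(0, len(seq), 4):
        chunk = seq[i:i + 4]
        byte = 0
        for base in chunk:
            byte = (byte << 2) | _CODE[base]
        # Left-align a short final chunk so unpacking stays uniform.
        byte <<= 2 * (4 - len(chunk))
        out.append(byte)
    return bytes(out)


def unpack(data: bytes, length: int) -> str:
    """Recover the first `length` bases from packed bytes."""
    bases = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            bases.append(_BASE[(byte >> shift) & 3])
    return "".join(bases[:length])
```

The original length has to be stored alongside the packed bytes, because the final byte may contain padding.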

161

Ovarian cancer genome.  

PubMed

Ovarian cancer (OC) is a relatively frequent malignant disease with a lifetime risk of approximately 1 in 70. As many as 15-25% of OC cases arise due to known heterozygous germ-line mutations in DNA repair genes, such as BRCA1, BRCA2, RAD51C, NBN (NBS1), BRIP, and PALB2. Sporadic ovarian cancers often phenocopy the features of BRCA1-related hereditary disease (so-called BRCAness), i.e., show biallelic somatic inactivation of the BRCA1 gene. Tumor-specific BRCA1 deficiency renders transformed cells selectively sensitive to platinating compounds and several other anticancer drugs, which explains the high response rates of OC to systemic therapies. High-throughput molecular profiling of OC is instrumental for further progress in the identification of novel OC diagnostic markers as well as for the development of new OC-specific treatments. However, interpretation of the huge bulk of incoming data may present a challenge. There is a critical need for the development of bioinformatic tools capable of integrating the multiplicity of available data sets into biologically and medically meaningful pieces of knowledge. PMID:23913204

Imyanitov, Evgeny N

2013-01-01

162

International network of cancer genome projects  

PubMed Central

The International Cancer Genome Consortium (ICGC) was launched to coordinate large-scale cancer genome studies in tumors from 50 different cancer types and/or subtypes that are of clinical and societal importance across the globe. Systematic studies of over 25,000 cancer genomes at the genomic, epigenomic, and transcriptomic levels will reveal the repertoire of oncogenic mutations, uncover traces of the mutagenic influences, define clinically-relevant subtypes for prognosis and therapeutic management, and enable the development of new cancer therapies.

2010-01-01

163

International network of cancer genome projects.  

PubMed

The International Cancer Genome Consortium (ICGC) was launched to coordinate large-scale cancer genome studies in tumours from 50 different cancer types and/or subtypes that are of clinical and societal importance across the globe. Systematic studies of more than 25,000 cancer genomes at the genomic, epigenomic and transcriptomic levels will reveal the repertoire of oncogenic mutations, uncover traces of the mutagenic influences, define clinically relevant subtypes for prognosis and therapeutic management, and enable the development of new cancer therapies. PMID:20393554

Hudson, Thomas J; Anderson, Warwick; Artez, Axel; Barker, Anna D; Bell, Cindy; Bernabé, Rosa R; Bhan, M K; Calvo, Fabien; Eerola, Iiro; Gerhard, Daniela S; Guttmacher, Alan; Guyer, Mark; Hemsley, Fiona M; Jennings, Jennifer L; Kerr, David; Klatt, Peter; Kolar, Patrik; Kusada, Jun; Lane, David P; Laplace, Frank; Youyong, Lu; Nettekoven, Gerd; Ozenberger, Brad; Peterson, Jane; Rao, T S; Remacle, Jacques; Schafer, Alan J; Shibata, Tatsuhiro; Stratton, Michael R; Vockley, Joseph G; Watanabe, Koichi; Yang, Huanming; Yuen, Matthew M F; Knoppers, Bartha M; Bobrow, Martin; Cambon-Thomsen, Anne; Dressler, Lynn G; Dyke, Stephanie O M; Joly, Yann; Kato, Kazuto; Kennedy, Karen L; Nicolás, Pilar; Parker, Michael J; Rial-Sebbag, Emmanuelle; Romeo-Casabona, Carlos M; Shaw, Kenna M; Wallace, Susan; Wiesner, Georgia L; Zeps, Nikolajs; Lichter, Peter; Biankin, Andrew V; Chabannon, Christian; Chin, Lynda; Clément, Bruno; de Alava, Enrique; Degos, Françoise; Ferguson, Martin L; Geary, Peter; Hayes, D Neil; Hudson, Thomas J; Johns, Amber L; Kasprzyk, Arek; Nakagawa, Hidewaki; Penny, Robert; Piris, Miguel A; Sarin, Rajiv; Scarpa, Aldo; Shibata, Tatsuhiro; van de Vijver, Marc; Futreal, P Andrew; Aburatani, Hiroyuki; Bayés, Mónica; Botwell, David D L; Campbell, Peter J; Estivill, Xavier; Gerhard, Daniela S; Grimmond, Sean M; Gut, Ivo; Hirst, Martin; López-Otín, Carlos; Majumder, Partha; Marra, Marco; McPherson, John D; Nakagawa, Hidewaki; Ning, Zemin; Puente, Xose S; Ruan, Yijun; Shibata, Tatsuhiro; Stratton, Michael R; Stunnenberg, Hendrik G; Swerdlow, Harold; Velculescu, Victor E; Wilson, Richard K; Xue, Hong H; Yang, Liu; Spellman, Paul T; Bader, Gary D; Boutros, Paul C; Campbell, Peter J; Flicek, Paul; Getz, Gad; Guigó, Roderic; Guo, Guangwu; Haussler, David; Heath, Simon; Hubbard, Tim J; Jiang, Tao; Jones, Steven M; Li, Qibin; López-Bigas, Nuria; Luo, Ruibang; Muthuswamy, Lakshmi; Ouellette, B F Francis; Pearson, John V; Puente, Xose S; Quesada, Victor; Raphael, Benjamin J; Sander, 
Chris; Shibata, Tatsuhiro; Speed, Terence P; Stein, Lincoln D; Stuart, Joshua M; Teague, Jon W; Totoki, Yasushi; Tsunoda, Tatsuhiko; Valencia, Alfonso; Wheeler, David A; Wu, Honglong; Zhao, Shancen; Zhou, Guangyu; Stein, Lincoln D; Guigó, Roderic; Hubbard, Tim J; Joly, Yann; Jones, Steven M; Kasprzyk, Arek; Lathrop, Mark; López-Bigas, Nuria; Ouellette, B F Francis; Spellman, Paul T; Teague, Jon W; Thomas, Gilles; Valencia, Alfonso; Yoshida, Teruhiko; Kennedy, Karen L; Axton, Myles; Dyke, Stephanie O M; Futreal, P Andrew; Gerhard, Daniela S; Gunter, Chris; Guyer, Mark; Hudson, Thomas J; McPherson, John D; Miller, Linda J; Ozenberger, Brad; Shaw, Kenna M; Kasprzyk, Arek; Stein, Lincoln D; Zhang, Junjun; Haider, Syed A; Wang, Jianxin; Yung, Christina K; Cros, Anthony; Cross, Anthony; Liang, Yong; Gnaneshan, Saravanamuttu; Guberman, Jonathan; Hsu, Jack; Bobrow, Martin; Chalmers, Don R C; Hasel, Karl W; Joly, Yann; Kaan, Terry S H; Kennedy, Karen L; Knoppers, Bartha M; Lowrance, William W; Masui, Tohru; Nicolás, Pilar; Rial-Sebbag, Emmanuelle; Rodriguez, Laura Lyman; Vergely, Catherine; Yoshida, Teruhiko; Grimmond, Sean M; Biankin, Andrew V; Bowtell, David D L; Cloonan, Nicole; deFazio, Anna; Eshleman, James R; Etemadmoghadam, Dariush; Gardiner, Brooke B; Gardiner, Brooke A; Kench, James G; Scarpa, Aldo; Sutherland, Robert L; Tempero, Margaret A; Waddell, Nicola J; Wilson, Peter J; McPherson, John D; Gallinger, Steve; Tsao, Ming-Sound; Shaw, Patricia A; Petersen, Gloria M; Mukhopadhyay, Debabrata; Chin, Lynda; DePinho, Ronald A; Thayer, Sarah; Muthuswamy, Lakshmi; Shazand, Kamran; Beck, Timothy; Sam, Michelle; Timms, Lee; Ballin, Vanessa; Lu, Youyong; Ji, Jiafu; Zhang, Xiuqing; Chen, Feng; Hu, Xueda; Zhou, Guangyu; Yang, Qi; Tian, Geng; Zhang, Lianhai; Xing, Xiaofang; Li, Xianghong; Zhu, Zhenggang; Yu, Yingyan; Yu, Jun; Yang, Huanming; Lathrop, Mark; Tost, Jörg; Brennan, Paul; Holcatova, Ivana; Zaridze, David; Brazma, Alvis; Egevard, Lars; Prokhortchouk, Egor; Banks, 
Rosamonde Elizabeth; Uhlén, Mathias; Cambon-Thomsen, Anne; Viksna, Juris; Ponten, Fredrik; Skryabin, Konstantin; Stratton, Michael R; Futreal, P Andrew; Birney, Ewan; Borg, Ake; Børresen-Dale, Anne-Lise; Caldas, Carlos; Foekens, John A; Martin, Sancha; Reis-Filho, Jorge S; Richardson, Andrea L; Sotiriou, Christos; Stunnenberg, Hendrik G; Thoms, Giles; van de Vijver, Marc; van't Veer, Laura; Calvo, Fabien; Birnbaum, Daniel; Blanche, Hélène; Boucher, Pascal; Boyault, Sandrine; Chabannon, Christian; Gut, Ivo; Masson-Jacquemier, Jocelyne D; Lathrop, Mark; Pauporté, Iris; Pivot, Xavier; Vincent-Salomon, Anne

2010-04-15

164

Whole genome sequencing of matched primary and metastatic acral melanomas  

PubMed Central

Next generation sequencing has enabled systematic discovery of mutational spectra in cancer samples. Here, we used whole genome sequencing to characterize somatic mutations and structural variation in a primary acral melanoma and its lymph node metastasis. Our data show that the somatic mutational rates in this acral melanoma sample pair were more comparable to the rates reported in cancer genomes not associated with mutagenic exposure than in the genome of a melanoma cell line or the transcriptome of melanoma short-term cultures. Despite the perception that acral skin is sun-protected, the dominant mutational signature in these samples is compatible with damage due to ultraviolet light exposure. A nonsense mutation in ERCC5 discovered in both the primary and metastatic tumors could also have contributed to the mutational signature through accumulation of unrepaired dipyrimidine lesions. However, evidence of transcription-coupled repair was suggested by the lower mutational rate in the transcribed regions and expressed genes. The primary and the metastasis are highly similar at the level of global gene copy number alterations, loss of heterozygosity and single nucleotide variation (SNV). Furthermore, the majority of the SNVs in the primary tumor were propagated in the metastasis and one nonsynonymous coding SNV and one splice site mutation appeared to arise de novo in the metastatic lesion.

Turajlic, Samra; Furney, Simon J.; Lambros, Maryou B.; Mitsopoulos, Costas; Kozarewa, Iwanka; Geyer, Felipe C.; MacKay, Alan; Hakas, Jarle; Zvelebil, Marketa; Lord, Christopher J.; Ashworth, Alan; Thomas, Meirion; Stamp, Gordon; Larkin, James; Reis-Filho, Jorge S.; Marais, Richard

2012-01-01

165

Annotation-based genome-wide SNP discovery in the large and complex Aegilops tauschii genome using next-generation sequencing without a reference genome sequence  

Microsoft Academic Search

BACKGROUND: Many plants have large and complex genomes with an abundance of repeated sequences. Many plants are also polyploid. Both of these attributes typify the genome architecture in the tribe Triticeae, whose members include economically important wheat, rye and barley. Large genome sizes, an abundance of repeated sequences, and polyploidy present challenges to genome-wide SNP discovery using next-generation sequencing (NGS)

Frank M You; Naxin Huo; Karin R Deal; Yong Q Gu; Ming-Cheng Luo; Patrick E McGuire; Jan Dvorak; Olin D Anderson

2011-01-01

166

The Consensus Coding Sequences of Human Breast and Colorectal Cancers  

Microsoft Academic Search

The elucidation of the human genome sequence has made it possible to identify genetic alterations in cancers in unprecedented detail. To begin a systematic analysis of such alterations, we determined the sequence of well-annotated human protein-coding genes in two common tumor types. Analysis of 13,023 genes in 11 breast and 11 colorectal cancers revealed that individual tumors accumulate an average

Tobias Sjöblom; Siân Jones; Laura D. Wood; D. Williams Parsons; Jimmy Lin; Thomas D. Barber; Diana Mandelker; Rebecca J. Leary; Janine Ptak; Natalie Silliman; Steve Szabo; Phillip Buckhaults; Christopher Farrell; Paul Meeh; Sanford D. Markowitz; Joseph Willis; Dawn Dawson; James K. V. Willson; Adi F. Gazdar; James Hartigan; Leo Wu; Changsheng Liu; Giovanni Parmigiani; Ben Ho Park; Kurtis E. Bachman; Nickolas Papadopoulos; Bert Vogelstein; Kenneth W. Kinzler; Victor E. Velculescu

2006-01-01

167

GAIA: framework annotation of genomic sequence.  

PubMed

As increasing amounts of genomic sequence from many organisms become available, and as DNA sequences become a primary reagent in biologic investigations, the role of annotation as a prospective guide for laboratory experiments will expand rapidly. Here we describe a process of high-throughput, reliable annotation, called framework annotation, which is designed to provide a foundation for initial biologic characterization of previously unexamined sequence. To examine this concept in practice, we have constructed Genome Annotation and Information Analysis (GAIA), a prototype software architecture that implements several elements important for framework annotation. The center of GAIA consists of an annotation database and the associated data management subsystem that forms the software bus along which other components communicate. The schema for this database defines three principal concepts: (1) Entries, consisting of sequence and associated historical data; (2) Features, comprising information of biologic interest; and (3) Experiments, describing the evidence that supports Features. The database permits tracking of annotation results over time, as well as assessment of the reliability of particular results. New framework annotation is produced by CARTA, a set of autonomous sensors that perform automatic analyses and assert results into the annotation database. These results are available via a Web-based query interface that uses graphical Java applets as well as text-based HTML pages to display data at different levels of resolution and permit interactive exploration of annotation. We present results for initial application of framework annotation to a set of test sequences, demonstrating its effectiveness in providing a starting point for biologic investigation, and discuss ways in which the current prototype can be improved. The prototype is available for public use and comment at http://www.cbil.upenn.edu/gaia. PMID:9521927

Bailey, L C; Fischer, S; Schug, J; Crabtree, J; Gibson, M; Overton, G C

1998-03-01
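
The three schema concepts named in the GAIA record above (Entries, Features, Experiments) could be modelled roughly as follows. This is a minimal sketch under stated assumptions: the abstract names only the three concepts, so every field name, the numeric `score`, and the `supported_features` helper are hypothetical and are not taken from the GAIA database itself.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Experiment:
    """Evidence supporting a Feature (fields are hypothetical)."""
    method: str   # name of the analysis that produced the evidence
    score: float  # illustrative reliability score, assumed for this sketch


@dataclass
class Feature:
    """Information of biologic interest located on an Entry."""
    kind: str
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive
    experiments: List[Experiment] = field(default_factory=list)


@dataclass
class Entry:
    """A sequence plus its associated historical data."""
    accession: str
    sequence: str
    history: List[str] = field(default_factory=list)
    features: List[Feature] = field(default_factory=list)

    def supported_features(self, min_score: float) -> List[Feature]:
        """Return features with at least one experiment scoring >= min_score."""
        return [
            f for f in self.features
            if any(e.score >= min_score for e in f.experiments)
        ]
```

Keeping the evidence (Experiments) separate from the assertion (Features) is what would let a database of this shape assess the reliability of individual results, as the abstract describes.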

168

Second Generation Sequencing of the Mesothelioma Tumor Genome  

PubMed Central

The current paradigm for elucidating the molecular etiology of cancers relies on the interrogation of small numbers of genes, which limits the scope of investigation. Emerging second-generation massively parallel DNA sequencing technologies have enabled more precise definition of the cancer genome on a global scale. We examined the genome of a human primary malignant pleural mesothelioma (MPM) tumor and matched normal tissue by using a combination of sequencing-by-synthesis and pyrosequencing methodologies to a 9.6X depth of coverage. Read density analysis uncovered significant aneuploidy and numerous rearrangements. Method-dependent informatics rules, which combined the results of different sequencing platforms, were developed to identify and validate candidate mutations of multiple types. Many more tumor-specific rearrangements than point mutations were uncovered at this depth of sequencing, resulting in novel, large-scale, inter- and intra-chromosomal deletions, inversions, and translocations. Nearly all candidate point mutations appeared to be previously unknown SNPs. Thirty tumor-specific fusions/translocations were independently validated with PCR and Sanger sequencing. Of these, 15 represented disrupted gene-encoding regions, including kinases, transcription factors, and growth factors. One large deletion in DPP10 resulted in altered transcription and expression of DPP10 transcripts in a set of 53 additional MPM tumors correlated with survival. Additionally, three point mutations were observed in the coding regions of NKX6-2, a transcription regulator, and NFRKB, a DNA-binding protein involved in modulating NFKB1. Several regions containing genes such as PCBD2 and DHFR, which are involved in growth factor signaling and nucleotide synthesis, respectively, were selectively amplified in the tumor. Second-generation sequencing uncovered all types of mutations in this MPM tumor, with DNA rearrangements representing the dominant type.

Bueno, Raphael; De Rienzo, Assunta; Dong, Lingsheng; Gordon, Gavin J.; Hercus, Colin F.; Richards, William G.; Jensen, Roderick V.; Anwar, Arif; Maulik, Gautam; Chirieac, Lucian R.; Ho, Kim-Fong; Taillon, Bruce E.; Turcotte, Cynthia L.; Hercus, Robert G.; Gullans, Steven R.; Sugarbaker, David J.

2010-01-01

169

Aligning Two Genomic Sequences That Contain Duplications  

NASA Astrophysics Data System (ADS)

It is difficult to properly align genomic sequences that contain intra-species duplications. With this goal in mind, we have developed a tool, called TOAST (two-way orthologous alignment selection tool), for predicting whether two aligned regions from different species are orthologous, i.e., separated by a speciation event, as opposed to a duplication event. The advantage of restricting alignment to orthologous pairs is that they constitute the aligning regions that are most likely to share the same biological function, and most easily analyzed for evidence of selection. We evaluate TOAST on 12 human/mouse gene clusters.

Hou, Minmei; Riemer, Cathy; Berman, Piotr; Hardison, Ross C.; Miller, Webb

170

Ten years of bacterial genome sequencing: comparative-genomics-based discoveries  

Microsoft Academic Search

It has been more than 10 years since the first bacterial genome sequence was published. Hundreds of bacterial genome sequences are now available for comparative genomics, and searching a given protein against more than a thousand genomes will soon be possible. The subject of this review will address a relatively straightforward question: “What have we learned from this vast amount of

Tim T. Binnewies; Yair Motro; Peter F. Hallin; Ole Lund; David Dunn; Tom La; David J. Hampson; Matthew Bellgard; Trudy M. Wassenaar; David W. Ussery

2006-01-01

171

Initial sequencing and comparative analysis of the mouse genome.  

PubMed

The sequence of the mouse genome is a key informational tool for understanding the contents of the human genome and a key experimental tool for biomedical research. Here, we report the results of an international collaboration to produce a high-quality draft sequence of the mouse genome. We also present an initial comparative analysis of the mouse and human genomes, describing some of the insights that can be gleaned from the two sequences. We discuss topics including the analysis of the evolutionary forces shaping the size, structure and sequence of the genomes; the conservation of large-scale synteny across most of the genomes; the much lower extent of sequence orthology covering less than half of the genomes; the proportions of the genomes under selection; the number of protein-coding genes; the expansion of gene families related to reproduction and immunity; the evolution of proteins; and the identification of intraspecies polymorphism. PMID:12466850

Waterston, Robert H; Lindblad-Toh, Kerstin; Birney, Ewan; Rogers, Jane; Abril, Josep F; Agarwal, Pankaj; Agarwala, Richa; Ainscough, Rachel; Alexandersson, Marina; An, Peter; Antonarakis, Stylianos E; Attwood, John; Baertsch, Robert; Bailey, Jonathon; Barlow, Karen; Beck, Stephan; Berry, Eric; Birren, Bruce; Bloom, Toby; Bork, Peer; Botcherby, Marc; Bray, Nicolas; Brent, Michael R; Brown, Daniel G; Brown, Stephen D; Bult, Carol; Burton, John; Butler, Jonathan; Campbell, Robert D; Carninci, Piero; Cawley, Simon; Chiaromonte, Francesca; Chinwalla, Asif T; Church, Deanna M; Clamp, Michele; Clee, Christopher; Collins, Francis S; Cook, Lisa L; Copley, Richard R; Coulson, Alan; Couronne, Olivier; Cuff, James; Curwen, Val; Cutts, Tim; Daly, Mark; David, Robert; Davies, Joy; Delehaunty, Kimberly D; Deri, Justin; Dermitzakis, Emmanouil T; Dewey, Colin; Dickens, Nicholas J; Diekhans, Mark; Dodge, Sheila; Dubchak, Inna; Dunn, Diane M; Eddy, Sean R; Elnitski, Laura; Emes, Richard D; Eswara, Pallavi; Eyras, Eduardo; Felsenfeld, Adam; Fewell, Ginger A; Flicek, Paul; Foley, Karen; Frankel, Wayne N; Fulton, Lucinda A; Fulton, Robert S; Furey, Terrence S; Gage, Diane; Gibbs, Richard A; Glusman, Gustavo; Gnerre, Sante; Goldman, Nick; Goodstadt, Leo; Grafham, Darren; Graves, Tina A; Green, Eric D; Gregory, Simon; Guigó, Roderic; Guyer, Mark; Hardison, Ross C; Haussler, David; Hayashizaki, Yoshihide; Hillier, LaDeana W; Hinrichs, Angela; Hlavina, Wratko; Holzer, Timothy; Hsu, Fan; Hua, Axin; Hubbard, Tim; Hunt, Adrienne; Jackson, Ian; Jaffe, David B; Johnson, L Steven; Jones, Matthew; Jones, Thomas A; Joy, Ann; Kamal, Michael; Karlsson, Elinor K; Karolchik, Donna; Kasprzyk, Arkadiusz; Kawai, Jun; Keibler, Evan; Kells, Cristyn; Kent, W James; Kirby, Andrew; Kolbe, Diana L; Korf, Ian; Kucherlapati, Raju S; Kulbokas, Edward J; Kulp, David; Landers, Tom; Leger, J P; Leonard, Steven; Letunic, Ivica; Levine, Rosie; Li, Jia; Li, Ming; Lloyd, Christine; Lucas, Susan; Ma, Bin; Maglott, Donna 
R; Mardis, Elaine R; Matthews, Lucy; Mauceli, Evan; Mayer, John H; McCarthy, Megan; McCombie, W Richard; McLaren, Stuart; McLay, Kirsten; McPherson, John D; Meldrim, Jim; Meredith, Beverley; Mesirov, Jill P; Miller, Webb; Miner, Tracie L; Mongin, Emmanuel; Montgomery, Kate T; Morgan, Michael; Mott, Richard; Mullikin, James C; Muzny, Donna M; Nash, William E; Nelson, Joanne O; Nhan, Michael N; Nicol, Robert; Ning, Zemin; Nusbaum, Chad; O'Connor, Michael J; Okazaki, Yasushi; Oliver, Karen; Overton-Larty, Emma; Pachter, Lior; Parra, Genís; Pepin, Kymberlie H; Peterson, Jane; Pevzner, Pavel; Plumb, Robert; Pohl, Craig S; Poliakov, Alex; Ponce, Tracy C; Ponting, Chris P; Potter, Simon; Quail, Michael; Reymond, Alexandre; Roe, Bruce A; Roskin, Krishna M; Rubin, Edward M; Rust, Alistair G; Santos, Ralph; Sapojnikov, Victor; Schultz, Brian; Schultz, Jörg; Schwartz, Matthias S; Schwartz, Scott; Scott, Carol; Seaman, Steven; Searle, Steve; Sharpe, Ted; Sheridan, Andrew; Shownkeen, Ratna; Sims, Sarah; Singer, Jonathan B; Slater, Guy; Smit, Arian; Smith, Douglas R; Spencer, Brian; Stabenau, Arne; Stange-Thomann, Nicole; Sugnet, Charles; Suyama, Mikita; Tesler, Glenn; Thompson, Johanna; Torrents, David; Trevaskis, Evanne; Tromp, John; Ucla, Catherine; Ureta-Vidal, Abel; Vinson, Jade P; Von Niederhausern, Andrew C; Wade, Claire M; Wall, Melanie; Weber, Ryan J; Weiss, Robert B; Wendl, Michael C; West, Anthony P; Wetterstrand, Kris; Wheeler, Raymond; Whelan, Simon; Wierzbowski, Jamey; Willey, David; Williams, Sophie; Wilson, Richard K; Winter, Eitan; Worley, Kim C; Wyman, Dudley; Yang, Shan; Yang, Shiaw-Pyng; Zdobnov, Evgeny M; Zody, Michael C; Lander, Eric S

2002-12-01

172

Initial sequencing and comparative analysis of the mouse genome  

SciTech Connect

The sequence of the mouse genome is a key informational tool for understanding the contents of the human genome and a key experimental tool for biomedical research. Here, we report the results of an international collaboration to produce a high-quality draft sequence of the mouse genome. We also present an initial comparative analysis of the mouse and human genomes, describing some of the insights that can be gleaned from the two sequences. We discuss topics including the analysis of the evolutionary forces shaping the size, structure and sequence of the genomes; the conservation of large-scale synteny across most of the genomes; the much lower extent of sequence orthology covering less than half of the genomes; the proportions of the genomes under selection; the number of protein-coding genes; the expansion of gene families related to reproduction and immunity; the evolution of proteins; and the identification of intraspecies polymorphism.

Waterston, Robert H.; Lindblad-Toh, Kerstin; Birney, Ewan; Rogers, Jane; Abril, Josep F.; Agarwal, Pankaj; Agarwala, Richa; Ainscough, Rachel; Alexandersson, Marina; An, Peter; Antonarakis, Stylianos E.; Attwood, John; Baertsch, Robert; Bailey, Jonathon; Barlow, Karen; Beck, Stephan; Berry, Eric; Birren, Bruce; Bloom, Toby; Bork, Peer; Botcherby, Marc; Bray, Nicolas; Brent, Michael R.; Brown, Daniel G.; Brown, Stephen D.; Bult, Carol; Burton, John; Butler, Jonathan; Campbell, Robert D.; Carninci, Piero; Cawley, Simon; Chiaromonte, Francesca; Chinwalla, Asif T.; Church, Deanna M.; Clamp, Michele; Clee, Christopher; Collins, Francis S.; Cook, Lisa L.; Copley, Richard R.; Coulson, Alan; Couronne, Olivier; Cuff, James; Curwen, Val; Cutts, Tim; Daly, Mark; David, Robert; Davies, Joy; Delehaunty, Kimberly D.; Deri, Justin; Dermitzakis, Emmanouil T.; Dewey, Colin; Dickens, Nicholas J.; Diekhans, Mark; Dodge, Sheila; Dubchak, Inna; Dunn, Diane M.; Eddy, Sean R.; Elnitski, Laura; Emes, Richard D.; Eswara, Pallavi; Eyras, Eduardo; Felsenfeld, Adam; Fewell, Ginger A.; Flicek, Paul; Foley, Karen; Frankel, Wayne N.; Fulton, Lucinda A.; Fulton, Robert S.; Furey, Terrence S.; Gage, Diane; Gibbs, Richard A.; Glusman, Gustavo; Gnerre, Sante; Goldman, Nick; Goodstadt, Leo; Grafham, Darren; Graves, Tina A.; Green, Eric D.; Gregory, Simon; Guigo, Roderic; Guyer, Mark; Hardison, Ross C.; Haussler, David; Hayashizaki, Yoshihide; Hillier, LaDeana W.; Hinrichs, Angela; Hlavina, Wratko; Holzer, Timothy; Hsu, Fan; Hua, Axin; Hubbard, Tim; Hunt, Adrienne; Jackson, Ian; Jaffe, David B.; Johnson, L. Steven; Jones, Matthew; Jones, Thomas A.; Joy, Ann; Kamal, Michael; Karlsson, Elinor K.; Karolchik, Donna; Kasprzyk, Arkadiusz; Kawai, Jun; Keibler, Evan; Kells, Cristyn; Kent, W. James; Kirby, Andrew; Kolbe, Diana L.; Korf, Ian; Kucherlapati, Raju S.; Kulbokas III, Edward J.; Kulp, David; Landers, Tom; Leger, J.P.; Leonard, Steven; Letunic, Ivica; Levine, Rosie; et al.

2002-12-15

173

Research ethics and the challenge of whole-genome sequencing  

PubMed Central

The recent completion of the first two individual whole-genome sequences is a research milestone. As personal genome research advances, investigators and international research bodies must ensure ethical research conduct. We identify three major ethical considerations that have been implicated in whole-genome research: the return of research results to participants; the obligations, if any, that are owed to participants’ relatives; and the future use of samples and data taken for whole-genome sequencing. Although the issues are not new, we discuss their implications for personal genomics and provide recommendations for appropriate management in the context of research involving individual whole-genome sequencing.

McGuire, Amy L.; Caulfield, Timothy; Cho, Mildred K.

2008-01-01

174

Complete Genome Sequence of Serratia plymuthica Bacteriophage ϕMAM1  

PubMed Central

A virulent bacteriophage (ϕMAM1) that infects Serratia plymuthica was isolated from the natural environment and characterized. Genomic sequence analysis revealed a circular double-stranded DNA sequence of 157,834 bp, encoding 198 proteins and 3 tRNAs. The ϕMAM1 genome shows high homology to previously reported ViI-like enterobacterial bacteriophage genomes.

Matilla, Miguel A.

2012-01-01

175

A snapshot of the emerging tomato genome sequence  

Microsoft Academic Search

The genome of tomato (Solanum lycopersicum L.) is being sequenced by an international consortium of 10 countries (Korea, China, the United Kingdom, India, the Netherlands, France, Japan, Spain, Italy, and the United States) as part of the larger “International Solanaceae Genome Project (SOL): Systems Approach to Diversity and Adaptation” initiative. The tomato genome sequencing project uses an ordered bacterial artificial

L. A. Mueller; R. M. Klein Lankhorst; S. D. Tanksley; R. M. Peters; Staveren van M. J; E. Datema; M. W. E. J. Fiers; Ham van R. C. H. J; D. Szinay; Jong de J. H. S. G. M; N. Menda; I. Y. Tecle; A. Bombarely; S. Stack; S. M. Royer; S.-B. Chang; L. A. Shearer; B. D. Kim; S.-H. Jo; C.-G. Hur; D. Choi; C.-B. Li; J. Zhao; H. Jiang; Y. Geng; Y. Dai; H. Fan; J. Chen; F. Lu; J. Shi; S. Sun; X. Yang; C. Lu; M. Chen; Z. Cheng; H. Ling; Y. Xue; Y. Wang; G. B. Seymour; G. J. Bishop; G. Bryan; J. Rogers; S. Sims; S. Butcher; D. Buchan; J. Abbott; H. Beasley; C. Nicholson; C. Riddle; S. Humphray; K. McLaren; S. Mathur; S. Vyas; A. U. Solanke; R. Kumar; V. Gupta; A. K. Sharma; P. Khurana; J. P. Khurana; A. Tyagi; Sarita; P. Chowdhury; S. Shridhar; D. Chattopadhyay; A. Pandit; P. Singh; A. Kumar; R. Dixit; A. Singh; S. Praveen; V. Dalal; M. Yadav; I. A. Ghazi; K. Gaikwad; T. R. Sharma; T. Mohapatra; N. K. Singh; H. de Jong; S. Peters; M. van Staveren; R. C. H. J. van Ham; P. Lindhout; M. Philippot; P. Frasse; F. Regad; M. Zouine; M. Bouzayen; E. Asamizu; S. Sato; H. Fukuoka; S. Tabata; D. Shibata; M. A. Botella; M. Perez-Alonso; V. Fernandez-Pedrosa; S. Osorio; A. Mico; A. Granell; Z. Zhang; J. He; S. Huang; Y. Du; D. Qu; L. Liu; D. Liu; J. Wang; Z. Ye; W. Yang; G. Wang; A. Vezzi; S. Todesco; G. Valle; G. Falcone; M. Pietrella; G. Giuliano; S. Grandillo; A. Traini; N. D'Agostino; M. L. Chiusano; M. Ercolano; A. Barone; L. Frusciante; H. Schoof; A. Jocker; R. Bruggmann; M. Spannagl; K. X. F. Mayer; R. Guigo; F. Camara; S. Rombauts; J. A. Fawcett; Y. Van de Peer; S. Knapp; D. Zamir; W. Stiekema

2009-01-01

176

Genome sequence of Pediococcus pentosaceus strain IE-3.  

PubMed

We report the 1.8-Mb genome sequence of Pediococcus pentosaceus strain IE-3, isolated from a dairy effluent sample. The whole-genome sequence of this strain will aid in comparative genomics of Pediococcus pentosaceus strains of diverse ecological origins and their biotechnological applications. PMID:22843596

Midha, Samriti; Ranjan, Manish; Sharma, Vikas; Kumari, Annu; Singh, Pradip Kumar; Korpole, Suresh; Patil, Prabhu B

2012-08-01

177

Complete genome sequence of the alkaliphilic bacterium Bacillus halodurans and genomic sequence comparison with Bacillus subtilis  

Microsoft Academic Search

The 4 202 353 bp genome of the alkaliphilic bacterium Bacillus halodurans C-125 contains 4066 predicted protein coding sequences (CDSs), 2141 (52.7%) of which have functional assignments, 1182 (29%) of which are conserved CDSs with unknown function and 743 (18.3%) of which have no match to any protein database. Among the total CDSs, 8.8% match sequences of proteins found only

Hideto Takami; Kaoru Nakasone; Yoshihiro Takaki; Go Maeno; Rumie Sasaki; Noriaki Masui; Fumie Fuji; Chie Hirama; Yuka Nakamura; Naotake Ogasawara; Satoru Kuhara; Koki Horikoshi

2000-01-01
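
The CDS breakdown quoted in the Bacillus halodurans record above can be checked with a few lines of arithmetic. The sketch below is only an illustration: the counts come from the abstract, while the dictionary keys and the function name `category_percentages` are hypothetical labels introduced here.

```python
# Counts quoted in the Bacillus halodurans C-125 abstract above.
TOTAL_CDS = 4066

# Category labels are illustrative names, not terms from any database.
cds_categories = {
    "functional assignment": 2141,
    "conserved, unknown function": 1182,
    "no database match": 743,
}


def category_percentages(counts, total):
    """Return each category's share of `total` as a percentage (one decimal)."""
    return {name: round(100 * n / total, 1) for name, n in counts.items()}


# The three categories should partition the full CDS set.
assert sum(cds_categories.values()) == TOTAL_CDS

percentages = category_percentages(cds_categories, TOTAL_CDS)
for name, pct in percentages.items():
    print(f"{name}: {pct}%")
```

The three counts sum to the reported total of 4066, and the computed shares agree with the abstract's figures to within rounding.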

178

Evaluation of Genome Sequencing Quality in Selected Plant Species Using Expressed Sequence Tags  

PubMed Central

Background: With the completion of genome sequencing projects for more than 30 plant species, large volumes of genome sequences have been produced and stored in online databases. Advancements in sequencing technologies have reduced the cost and time of whole genome sequencing, enabling more and more plants to be subjected to genome sequencing. Despite this, genome sequence qualities of multiple plants have not been evaluated. Methodology/Principal Finding: Integrity and accuracy were calculated to evaluate the genome sequence quality of 32 plants. The integrity of a genome sequence is represented by the ratio of chromosome size to genome size (or of scaffold size to genome size), which ranged from 55.31% to nearly 100%. The accuracy of a genome sequence is represented by the ratio of matched ESTs to selected ESTs; 52.93% to 98.28% and 89.02% to 98.85% of the randomly selected clean ESTs could be mapped to chromosome and scaffold sequences, respectively. According to the integrity, accuracy and other analysis of each plant species, thirteen plant species were divided into four levels. Arabidopsis thaliana, Oryza sativa and Zea mays had the highest quality, followed by Brachypodium distachyon, Populus trichocarpa, Vitis vinifera and Glycine max, Sorghum bicolor, Solanum lycopersicum and Fragaria vesca, and Lotus japonicus, Medicago truncatula and Malus × domestica in that order. Assembling the scaffold sequences into chromosome sequences should be the primary task for the remaining nineteen species. Low GC content and repeat DNA influence genome sequence assembly. Conclusion: The quality of plant genome sequences was found to be lower than envisaged and thus the rapid development of genome sequencing projects as well as research on bioinformatics tools and the algorithms of genome sequence assembly should provide increased processing and correction of genome sequences that have already been published.

Shangguan, Lingfei; Han, Jian; Kayesh, Emrul; Sun, Xin; Zhang, Changqing; Pervaiz, Tariq; Wen, Xicheng; Fang, Jinggui

2013-01-01
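
The two quality measures described in the record above are simple ratios, and a minimal sketch may make them concrete. The function names `integrity` and `accuracy` and all numeric inputs below are hypothetical; they are not values reported by the study.

```python
def integrity(assembled_bp: int, genome_bp: int) -> float:
    """Assembled (chromosome or scaffold) size as a percentage of genome size."""
    if genome_bp <= 0:
        raise ValueError("genome_bp must be positive")
    return 100.0 * assembled_bp / genome_bp


def accuracy(matched_ests: int, selected_ests: int) -> float:
    """Percentage of the selected ESTs that could be mapped to the assembly."""
    if selected_ests <= 0:
        raise ValueError("selected_ests must be positive")
    return 100.0 * matched_ests / selected_ests


# Hypothetical example: a 125 Mb genome with 119 Mb assembled into chromosomes,
# and 9,500 of 10,000 randomly selected ESTs mapped to it.
example_integrity = integrity(119_000_000, 125_000_000)
example_accuracy = accuracy(9_500, 10_000)

print(f"integrity: {example_integrity:.2f}%")
print(f"accuracy:  {example_accuracy:.2f}%")
```

Both measures are percentages, so an assembly can score well on one and poorly on the other; the abstract reports them separately for chromosome-level and scaffold-level sequences.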

179

Signatures of mutation and selection in the cancer genome  

PubMed Central

The cancer genome is moulded by the dual processes of somatic mutation and selection. Homozygous deletions in cancer genomes occur over recessive cancer genes, where they can confer selective growth advantage, and over fragile sites, where they are thought to reflect an increased local rate of DNA breakage. However, most homozygous deletions in cancer genomes are unexplained. Here we identified 2,428 somatic homozygous deletions in 746 cancer cell lines. These overlie 11% of protein-coding genes that, therefore, are not mandatory for survival of human cells. We derived structural signatures that distinguish between homozygous deletions over recessive cancer genes and fragile sites. Application to clusters of unexplained homozygous deletions suggests that many are in regions of inherent fragility, whereas a small subset overlies recessive cancer genes. The results illustrate how structural signatures can be used to distinguish between the influences of mutation and selection in cancer genomes. The extensive copy number, genotyping, sequence and expression data available for this large series of publicly available cancer cell lines renders them informative reagents for future studies of cancer biology and drug discovery.

Bignell, Graham R.; Greenman, Chris D.; Davies, Helen; Butler, Adam P.; Edkins, Sarah; Andrews, Jenny M.; Buck, Gemma; Chen, Lina; Beare, David; Latimer, Calli; Widaa, Sara; Hinton, Jonathon; Fahey, Ciara; Fu, Beiyuan; Swamy, Sajani; Dalgliesh, Gillian L.; Teh, Bin T.; Deloukas, Panos; Yang, Fengtang; Campbell, Peter J.; Futreal, P. Andrew; Stratton, Michael R.

2011-01-01

180

[Progress on whole genome sequencing in woody plants].  

PubMed

In recent years, the amount of plant whole-genome sequencing data has increased rapidly, and whole-genome sequencing has also been widely performed in woody plants. However, whole-genome sequencing in woody plants faces a set of obstacles, including larger genomes, complex genome structure, limitations of assembly, annotation, and functional analysis, and restricted research funding. Therefore, to promote the efficiency of whole-genome sequencing in woody plants, the developments and shortcomings of this field should be analyzed. Three generations of sequencing technologies (i.e., Sanger sequencing, sequencing by synthesis, and single-molecule sequencing) were compared in our study. The progress reviewed focuses mainly on whole-genome sequencing in four woody plants (Populus, grapevine, papaya, and apple), and the application of sequencing results is also analyzed. The future of whole-genome sequencing research in woody plants, comprising material selection, establishment of genetic and physical maps, selection of sequencing technology, bioinformatic analysis, and application of sequencing results, is discussed. PMID:22382056

Shi, Ji-Sen; Wang, Zhan-Jun; Chen, Jin-Hui

2012-02-01

181

Detecting long tandem duplications in genomic sequences  

PubMed Central

Background: Detecting duplicated segments within completely sequenced genomes provides valuable information to address genome evolution and in particular the important question of the emergence of novel functions. The usual approach to gene duplication detection, based on all-pairs protein gene comparisons, provides only a restricted view of duplication. Results: In this paper, we introduce ReD Tandem, software that uses a flow-based chaining algorithm targeted at detecting tandem duplication arrays of moderate to longer length regions, with possibly locally weak similarities, directly at the DNA level. On the A. thaliana genome, using a reference set of tandem duplicated genes built using TAIR, we show that ReD Tandem is able to predict a large fraction of recently duplicated genes (dS?

2012-01-01

182

Methods for Obtaining and Analyzing Whole Chloroplast Genome Sequences  

Microsoft Academic Search

During the past decade, there has been a rapid increase in our understanding of plastid genome organization and evolution due to the availability of many new completely sequenced genomes. There are 45 complete genomes published and ongoing projects are likely to increase this sampling to nearly 200 genomes during the next 5 years. Several groups of researchers including ours have

Robert K. Jansen; Linda A. Raubeson; Jeffrey L. Boore; Claude W. dePamphilis; Timothy W. Chumley; Rosemarie C. Haberle; Stacia K. Wyman; Andrew J. Alverson; Rhiannon Peery; Sallie J. Herman; H. Matthew Fourcade; Jennifer V. Kuehl; Joel R. McNeal; James Leebens-Mack; Liying Cui

2005-01-01

183

Genomic Sequence Comparisons, 1987-2003 Final Report  

SciTech Connect

This project was to develop new DNA sequencing and RNA and protein quantitation methods and related genome annotation tools. The project began in 1987 with the development of multiplex sequencing (published in Science in 1988), and one of the first automated sequencing methods. This led to the first commercial genome sequence in 1994 and to the establishment of the main commercial participants (GTC then Agencourt) in the public DOE/NIH genome project. In collaboration with GTC we contributed to one of the first complete DOE genome sequences, in 1997, that of Methanobacterium thermoautotrophicum, a species of great relevance to energy-rich gas production.

George M. Church

2004-07-29

184

Whole-Genome Sequence Analysis of Human Papillomavirus Type 18 from Infected Thai Women  

Microsoft Academic Search

Objective: The aim of this study was to attain molecular knowledge of human papillomavirus type 18 (HPV18) by sequencing the whole genome of HPV18 isolated from Thai women at various clinical stages of disease progression. Method: Our group analyzed 9 samples of whole-genome HPV18 in infected women ranging from normal to cervical cancer by PCR, a sequencing method and bioinformatics

Woradee Lurchachaiwong; Pairoj Junyangdikul; Wichai Termrungruanglert; Sunchai Payungporn; Pichet Sampatanukul; Damrong Tresukosol; Somchai Niruthisard; Prasert Trivijitsilp; Anant Karalak; Sukumarn Swangvaree; Yong Poovorawan

2010-01-01

185

Whole-Genome Shotgun Sequencing of a Colonizing Multilocus Sequence Type 17 Streptococcus agalactiae Strain  

PubMed Central

This report highlights the whole-genome shotgun draft sequence for a Streptococcus agalactiae strain representing multilocus sequence type (ST) 17, isolated from a colonized woman at 8 weeks postpartum. This sequence represents an important addition to the published genomes and will promote comparative genomic studies of S. agalactiae recovered from diverse sources.

Singh, Pallavi; Springman, A. Cody; Davies, H. Dele

2012-01-01

186

Whole Genome Sequencing and Evolutionary Analysis of Human Papillomavirus Type 16 in Central China  

Microsoft Academic Search

Human papillomavirus type 16 plays a critical role in the neoplastic transformation of cervical cancers. Molecular variants of HPV16 existing in different ethnic groups have shown substantial phenotypic differences in pathogenicity, immunogenicity and tumorigenicity. In this study, we sequenced the entire HPV16 genome of 76 isolates originated from Anyang, central China. Phylogenetic analysis of these sequences identified two major variants

Min Sun; Lei Gao; Ying Liu; Yiqiang Zhao; Xueqian Wang; Yaqi Pan; Tao Ning; Hong Cai; Haijun Yang; Weiwei Zhai; Yang Ke

2012-01-01

187

Two genome sequences of the same bacterial strain, Gluconacetobacter diazotrophicus PAl 5, suggest a new standard in genome sequence submission  

PubMed Central

Gluconacetobacter diazotrophicus PAl 5 is of agricultural significance due to its ability to provide fixed nitrogen to plants. Consequently, its genome sequence has been eagerly anticipated to enhance understanding of endophytic nitrogen fixation. Two groups have sequenced the PAl 5 genome from the same source (ATCC 49037), though the resulting sequences contain a surprisingly high number of differences. Therefore, an optical map of PAl 5 was constructed in order to determine which genome assembly more closely resembles the chromosomal DNA by aligning each sequence against a physical map of the genome. While one sequence aligned very well, over 98% of the second sequence contained numerous rearrangements. The many differences observed between these two genome sequences could be owing to either assembly errors or rapid evolutionary divergence. The extent of the differences derived from sequence assembly errors could be assessed if the raw sequencing reads were provided by both genome centers at the time of genome sequence submission. Hence, a new genome sequence standard is proposed whereby the investigator supplies the raw reads along with the closed sequence so that the community can make more accurate judgments on whether differences observed in a single strain may be of biological origin or are simply caused by differences in genome assembly procedures.

Giongo, Adriana; Tyler, Heather L.; Zipperer, Ursula N.; Triplett, Eric W.

2010-01-01

188

The genome sequence of Podospora anserina, a classic model fungus  

PubMed Central

The completed genome sequence of the coprophilous fungus Podospora anserina increases the sampling of fungal genomes. In line with its habitat of herbivore dung, this ascomycete has an exceptionally rich gene set devoted to the catabolism of complex carbohydrates.

Paoletti, Mathieu; Saupe, Sven J

2008-01-01

189

Next Generation Sequencing at the University of Chicago Genomics Core  

ScienceCinema

The University of Chicago Genomics Core provides University of Chicago investigators (and external clients) access to State-of-the-Art genomics capabilities: next generation sequencing, Sanger sequencing / genotyping and micro-arrays (gene expression, genotyping, and methylation). The current presentation will highlight our capabilities in the area of ultra-high throughput sequencing analysis.

190

MIPS: a database for genomes and protein sequences  

Microsoft Academic Search

The Munich Information Center for Protein Sequences (MIPS-GSF), Martinsried, near Munich, Germany, continues its longstanding tradition to develop and maintain high quality curated genome databases. In addition, efforts have been intensified to cover the wealth of complete genome sequences in a systematic, comprehensive form. Bioinformatics, supporting national as well as European sequencing and functional analysis projects, has resulted in several

Hans-werner Mewes; Dmitrij Frishman; Christian Gruber; Birgitta Geier; Dirk Haase; Andreas Kaps; Kai Lemcke; Gertrud Mannhaupt; Friedhelm Pfeiffer; Christine M. Schüller; S. Stocker; B. Weil

2000-01-01

191

Rapid genome sequencing with short universal tiling probes  

Microsoft Academic Search

The increasing availability of high-quality reference genomic sequences has created a demand for ways to survey the sequence differences present in individual genomes. Here we describe a DNA sequencing method based on hybridization of a universal panel of tiling probes. Millions of shotgun fragments are amplified in situ and subjected to sequential hybridization with short fluorescent probes. Long fragments of

Arno Pihlak; Göran Baurén; Ellef Hersoug; Peter Lönnerberg; Ats Metsis; Sten Linnarsson

2008-01-01

192

An update and lessons from whole-genome sequencing projects  

Microsoft Academic Search

A number of prokaryotic and eukaryotic genomes are currently being sequenced. Already, the nucleotide sequences of four yeast chromosomes and of 2.2 Mb from Caenorhabditis elegans have been reported. Human genomic sequences have also been used in comparative studies with both mouse and Fugu rubripes.

Steven JM Jones

1995-01-01

193

Next-generation sequencing and potential applications in fungal genomics.  

PubMed

Since the first fungal genome was sequenced in 1996, sequencing technologies have advanced dramatically. In recent years, it has become possible to cost-effectively generate vast amounts of DNA sequence data using a number of cell- and electrophoresis-free sequencing technologies, commonly known as "next" or "second" generation. In this chapter, we present a brief overview of next-generation sequencers that are commercially available now. Their potential applications in fungal genomics studies are discussed. PMID:21590412

Sanmiguel, Phillip

2011-01-01

194

Complete genome sequence of Arcanobacterium haemolyticum type strain (11018T)  

SciTech Connect

Vulcanisaeta distributa Itoh et al. 2002 belongs to the family Thermoproteaceae in the phylum Crenarchaeota. The genus Vulcanisaeta is characterized by a global distribution in hot and acidic springs. This is the first genome sequence from a member of the genus Vulcanisaeta and seventh genome sequence in the family Thermoproteaceae. The 2,374,137 bp long genome with its 2,544 protein-coding and 49 RNA genes is a part of the Genomic Encyclopedia of Bacteria and Archaea project.

Yasawong, Montri [HZI - Helmholtz Centre for Infection Research, Braunschweig, Germany; Teshima, Hazuki [Los Alamos National Laboratory (LANL); Lapidus, Alla L. [U.S. Department of Energy, Joint Genome Institute; Nolan, Matt [U.S. Department of Energy, Joint Genome Institute; Lucas, Susan [U.S. Department of Energy, Joint Genome Institute; Glavina Del Rio, Tijana [U.S. Department of Energy, Joint Genome Institute; Tice, Hope [U.S. Department of Energy, Joint Genome Institute; Cheng, Jan-Fang [U.S. Department of Energy, Joint Genome Institute; Bruce, David [Los Alamos National Laboratory (LANL); Detter, J. Chris [U.S. Department of Energy, Joint Genome Institute; Tapia, Roxanne [Los Alamos National Laboratory (LANL); Han, Cliff [Los Alamos National Laboratory (LANL); Goodwin, Lynne A. [Los Alamos National Laboratory (LANL); Pitluck, Sam [U.S. Department of Energy, Joint Genome Institute; Liolios, Konstantinos [U.S. Department of Energy, Joint Genome Institute; Ivanova, N [U.S. Department of Energy, Joint Genome Institute; Mavromatis, K [U.S. Department of Energy, Joint Genome Institute; Mikhailova, Natalia [U.S. Department of Energy, Joint Genome Institute; Pati, Amrita [U.S. Department of Energy, Joint Genome Institute; Chen, Amy [U.S. Department of Energy, Joint Genome Institute; Palaniappan, Krishna [U.S. Department of Energy, Joint Genome Institute; Land, Miriam L [ORNL; Hauser, Loren John [ORNL; Chang, Yun-Juan [ORNL; Jeffries, Cynthia [Oak Ridge National Laboratory (ORNL); Rohde, Manfred [HZI - Helmholtz Centre for Infection Research, Braunschweig, Germany; Sikorski, Johannes [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany; Pukall, Rudiger [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany; Goker, Markus [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany; Woyke, Tanja [U.S. Department of Energy, Joint Genome Institute; Bristow, James [U.S. 
Department of Energy, Joint Genome Institute; Eisen, Jonathan [U.S. Department of Energy, Joint Genome Institute; Markowitz, Victor [U.S. Department of Energy, Joint Genome Institute; Hugenholtz, Philip [U.S. Department of Energy, Joint Genome Institute; Kyrpides, Nikos C [U.S. Department of Energy, Joint Genome Institute; Klenk, Hans-Peter [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany]

2010-01-01

195

Complete genome sequence of Streptosporangium roseum type strain (NI 9100).  

PubMed

Streptosporangium roseum Crauch 1955 is the type strain of the species which is the type species of the genus Streptosporangium. The 'pinkish coiled Streptomyces-like organism with a spore case' was isolated from vegetable garden soil in 1955. Here we describe the features of this organism, together with the complete genome sequence and annotation. This is the first completed genome sequence of a member of the family Streptosporangiaceae, and the second largest microbial genome sequence ever deciphered. The 10,369,518 bp long genome with its 9421 protein-coding and 80 RNA genes is a part of the Genomic Encyclopedia of Bacteria and Archaea project. PMID:21304675

Nolan, Matt; Sikorski, Johannes; Jando, Marlen; Lucas, Susan; Lapidus, Alla; Glavina Del Rio, Tijana; Chen, Feng; Tice, Hope; Pitluck, Sam; Cheng, Jan-Fang; Chertkov, Olga; Sims, David; Meincke, Linda; Brettin, Thomas; Han, Cliff; Detter, John C; Bruce, David; Goodwin, Lynne; Land, Miriam; Hauser, Loren; Chang, Yun-Juan; Jeffries, Cynthia D; Ivanova, Natalia; Mavromatis, Konstantinos; Mikhailova, Natalia; Chen, Amy; Palaniappan, Krishna; Chain, Patrick; Rohde, Manfred; Göker, Markus; Bristow, Jim; Eisen, Jonathan A; Markowitz, Victor; Hugenholtz, Philip; Kyrpides, Nikos C; Klenk, Hans-Peter

2010-01-28

196

Applications of next-generation sequencing technologies in functional genomics  

Microsoft Academic Search

A new generation of sequencing technologies, from Illumina/Solexa, ABI/SOLiD, 454/Roche, and Helicos, has provided unprecedented opportunities for high-throughput functional genomic research. To date, these technologies have been applied in a variety of contexts, including whole-genome sequencing, targeted resequencing, discovery of transcription factor binding sites, and noncoding RNA expression profiling. This review discusses applications of next-generation sequencing technologies in functional genomics

Olena Morozova; Marco A. Marra

2008-01-01

197

Genome-Scale Validation of Deep-Sequencing Libraries  

Microsoft Academic Search

Chromatin immunoprecipitation followed by high-throughput (HTP) sequencing (ChIP-seq) is a powerful tool to establish protein-DNA interactions genome-wide. The primary limitation of its broad application at present is the often-limited access to sequencers. Here we report a protocol, Mab-seq, that generates genome-scale quality evaluations for nucleic acid libraries intended for deep-sequencing. We show how commercially available genomic microarrays can be used

Dominic Schmidt; Rory Stark; Michael D. Wilson; Gordon D. Brown; Duncan T. Odom; Jürg Bähler

2008-01-01

198

Ancient human genome sequence of an extinct Palaeo-Eskimo  

Microsoft Academic Search

We report here the genome sequence of an ancient human. Obtained from ~4,000-year-old permafrost-preserved hair, the genome represents a male individual from the first known culture to settle in Greenland. Sequenced to an average depth of 20×, we recover 79% of the diploid genome, an amount close to the practical limit of current sequencing technologies. We identify 353,151 high-confidence single-nucleotide

Morten Rasmussen; Yingrui Li; Stinus Lindgreen; Jakob Skou Pedersen; Anders Albrechtsen; Ida Moltke; Mait Metspalu; Ene Metspalu; Toomas Kivisild; Ramneek Gupta; Marcelo Bertalan; Kasper Nielsen; M. Thomas; P. Gilbert; Yong Wang; Maanasa Raghavan; Paula F. Campos; Hanne Munkholm Kamp; Andrew S. Wilson; Andrew Gledhill; Silvana Tridico; Michael Bunce; Eline D. Lorenzen; Jonas Binladen; Xiaosen Guo; Jing Zhao; Xiuqing Zhang; Hao Zhang; Tracey L. Pierre; Morten Meldgaard; Sardana A. Fedorova; Ludmila P. Osipova; Thomas F. G. Higham; Christopher Bronk; Finn C. Nielsen; Michael H. Crawford; Søren Brunak; Thomas Sicheritz-Ponten; Richard Villems; Rasmus Nielsen; Anders Krogh; Jun Wang; Eske Willerslev

2010-01-01

199

Synergy between sequence and size in Large-scale genomics  

Microsoft Academic Search

Until recently the study of individual DNA sequences and of total DNA content (the C-value) sat at opposite ends of the spectrum in genome biology. For gene sequencers, the vast stretches of non-coding DNA found in eukaryotic genomes were largely considered to be an annoyance, whereas genome-size researchers attributed little relevance to specific nucleotide sequences. However, the dawn of comprehensive

T. Ryan Gregory

2005-01-01

200

Genome sequencing and analysis of the biomass-degrading fungus ...  

Treesearch

Title: Genome sequencing and analysis of the biomass-degrading fungus Trichoderma reesei (syn. ... Keywords: Cellulase, microbial metabolism, regulation, enzymes, industrial applications, biotechnology, gene expression, chemical ...

201

Complete genome sequence of Allochromatium vinosum DSM 180T  

PubMed Central

Allochromatium vinosum formerly Chromatium vinosum is a mesophilic purple sulfur bacterium belonging to the family Chromatiaceae in the bacterial class Gammaproteobacteria. The genus Allochromatium contains currently five species. All members were isolated from freshwater, brackish water or marine habitats and are predominately obligate phototrophs. Here we describe the features of the organism, together with the complete genome sequence and annotation. This is the first completed genome sequence of a member of the Chromatiaceae within the purple sulfur bacteria thriving in globally occurring habitats. The 3,669,074 bp genome with its 3,302 protein-coding and 64 RNA genes was sequenced within the Joint Genome Institute Community Sequencing Program.

Weissgerber, Thomas; Zigann, Renate; Bruce, David; Chang, Yun-juan; Detter, John C.; Han, Cliff; Hauser, Loren; Jeffries, Cynthia D.; Land, Miriam; Munk, A. Christine; Tapia, Roxanne; Dahl, Christiane

2011-01-01

202

A sequence-based survey of the complex structural organization of tumor genomes  

SciTech Connect

The genomes of many epithelial tumors exhibit extensive chromosomal rearrangements. All classes of genome rearrangements can be identified using End Sequencing Profiling (ESP), which relies on paired-end sequencing of cloned tumor genomes. In this study, brain, breast, ovary and prostate tumors along with three breast cancer cell lines were surveyed with ESP yielding the largest available collection of sequence-ready tumor genome breakpoints and providing evidence that some rearrangements may be recurrent. Sequencing and fluorescence in situ hybridization (FISH) confirmed translocations and complex tumor genome structures that include coamplification and packaging of disparate genomic loci with associated molecular heterogeneity. Comparison of the tumor genomes suggests recurrent rearrangements. Some are likely to be novel structural polymorphisms, whereas others may be bona fide somatic rearrangements. A recurrent fusion transcript in breast tumors and a constitutional fusion transcript resulting from a segmental duplication were identified. Analysis of end sequences for single nucleotide polymorphisms (SNPs) revealed candidate somatic mutations and an elevated rate of novel SNPs in an ovarian tumor. These results suggest that the genomes of many epithelial tumors may be far more dynamic and complex than previously appreciated and that genomic fusions including fusion transcripts and proteins may be common, possibly yielding tumor-specific biomarkers and therapeutic targets.

Collins, Colin; Raphael, Benjamin J.; Volik, Stanislav; Yu, Peng; Wu, Chunxiao; Huang, Guiqing; Linardopoulou, Elena V.; Trask, Barbara J.; Waldman, Frederic; Costello, Joseph; Pienta, Kenneth J.; Mills, Gordon B.; Bajsarowicz, Krystyna; Kobayashi, Yasuko; Sridharan, Shivaranjani; Paris, Pamela; Tao, Quanzhou; Aerni, Sarah J.; Brown, Raymond P.; Bashir, Ali; Gray, Joe W.; Cheng, Jan-Fang; de Jong, Pieter; Nefedov, Mikhail; Ried, Thomas; Padilla-Nash, Hesed M.; Collins, Colin C.

2008-04-03
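
The record above notes that End Sequencing Profiling relies on paired-end sequencing to identify rearrangements. A rough sketch of the general idea behind paired-end analysis follows: an end pair whose two ends map to different chromosomes, in an unexpected orientation, or at an unexpected distance hints at a structural difference from the reference. The classification rule, the `EndPair` type, the `classify` function, and the insert-size bounds are all assumptions made for illustration; they are not taken from the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EndPair:
    """Mapped positions of the two end sequences of one clone (hypothetical type)."""
    chrom_a: str
    pos_a: int
    strand_a: str  # "+" or "-"
    chrom_b: str
    pos_b: int
    strand_b: str  # "+" or "-"


# Hypothetical bounds on the distance between ends for an unrearranged clone.
MIN_INSERT = 50_000
MAX_INSERT = 250_000


def classify(pair: EndPair, min_insert: int = MIN_INSERT,
             max_insert: int = MAX_INSERT) -> str:
    """Label an end pair as concordant or name the way it is discordant."""
    if pair.chrom_a != pair.chrom_b:
        return "interchromosomal"
    # Order the two ends by coordinate so the leftmost end comes first.
    (left_pos, left_strand), (right_pos, right_strand) = sorted(
        [(pair.pos_a, pair.strand_a), (pair.pos_b, pair.strand_b)]
    )
    # Assumed convention: concordant ends point toward each other.
    if (left_strand, right_strand) != ("+", "-"):
        return "orientation"
    span = right_pos - left_pos
    if span < min_insert:
        return "too_close"
    if span > max_insert:
        return "too_far"
    return "concordant"


pairs = [
    EndPair("chr1", 1_000_000, "+", "chr1", 1_150_000, "-"),
    EndPair("chr1", 1_000_000, "+", "chr17", 5_000_000, "-"),
    EndPair("chr1", 1_000_000, "+", "chr1", 9_000_000, "-"),
]
for p in pairs:
    print(classify(p))
```

In practice, clusters of discordant pairs supporting the same breakpoint would be needed before calling a rearrangement; a single discordant pair may simply be a mapping or cloning artifact.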

203

Insight into the heterogeneity of breast cancer through next-generation sequencing.  

PubMed

Rapid and sophisticated improvements in molecular analysis have allowed us to sequence whole human genomes as well as cancer genomes, and the findings suggest that we may be approaching the ability to individualize the diagnosis and treatment of cancer. This paradigmatic shift in approach will require clinicians and researchers to overcome several challenges including the huge spectrum of tumor types within a given cancer, as well as the cell-to-cell variations observed within tumors. This review discusses how next-generation sequencing of breast cancer genomes already reveals insight into tumor heterogeneity and how it can contribute to future breast cancer classification and management. PMID:21965338

Russnes, Hege G; Navin, Nicholas; Hicks, James; Borresen-Dale, Anne-Lise

2011-10-03

204

Insight into the heterogeneity of breast cancer through next-generation sequencing  

PubMed Central

Rapid and sophisticated improvements in molecular analysis have allowed us to sequence whole human genomes as well as cancer genomes, and the findings suggest that we may be approaching the ability to individualize the diagnosis and treatment of cancer. This paradigmatic shift in approach will require clinicians and researchers to overcome several challenges including the huge spectrum of tumor types within a given cancer, as well as the cell-to-cell variations observed within tumors. This review discusses how next-generation sequencing of breast cancer genomes already reveals insight into tumor heterogeneity and how it can contribute to future breast cancer classification and management.

Russnes, Hege G.; Navin, Nicholas; Hicks, James; Borresen-Dale, Anne-Lise

2011-01-01

205

Genome Sequence of Lactobacillus plantarum Strain UCMA 3037  

PubMed Central

Nucleic acid of the strain Lactobacillus plantarum UCMA 3037, isolated from raw milk camembert cheese in our laboratory, was sequenced. We present its draft genome sequence with the aim of studying its functional properties and relationship to the cheese ecosystem.

Naz, Saima; Tareb, Raouf; Bernardeau, Marion; Vaisse, Melissa; Lucchetti-Miganeh, Celine; Rechenmann, Mathias

2013-01-01

206

Ultra-high Throughput Sequencing and Genomics in CDRH  

Center for Biologics Evaluation and Research (CBER)

Text VersionPage 1. Ultra-high Throughput Sequencing and Genomics in CDRH ... Page 3. Ultra-high Throughput Sequencing* • Informal scientific meetings with ... More results from www.fda.gov/downloads/advisorycommittees/committeesmeetingmaterials

207

Toward a Comprehensive Genomic Analysis of Cancer  

Cancer.gov

The National Cancer Institute (NCI) and National Human Genome Research Institute (NHGRI) convened a "Toward a Comprehensive Genomic Analysis of Cancer" workshop in Washington, D.C. This workshop brought together physicians, basic scientists and other members of the U.S. and international cancer communities to assist in outlining the most effective strategies for the development of a successful project. Information about this workshop is reported in the Executive Summary.

208

The $1000 Genome: Ethical and Legal Issues in Whole Genome Sequencing of Individuals  

Microsoft Academic Search

Progress in gene sequencing could make rapid whole genome sequencing of individuals affordable to millions of persons and useful for many purposes in a future era of genomic medicine. Using the idea of $1000 genome as a focus, this article reviews the main technical, ethical, and legal issues that must be resolved to make mass genotyping of individuals cost-effective and

John A. Robertson

2003-01-01

209

Assessing the impact of comparative genomic sequence data on the functional annotation of the Drosophila genome  

Microsoft Academic Search

Background: It is widely accepted that comparative sequence data can aid the functional annotation of genome sequences; however, the most informative species and features of genome evolution for comparison remain to be determined. Results: We analyzed conservation in eight genomic regions (apterous, even-skipped, fushi tarazu, twist, and Rhodopsins 1, 2, 3 and 4) from four Drosophila species (D. erecta, D.

Casey M Bergman; Barret D Pfeiffer; Diego E Rincón-Limas; Roger A Hoskins; Andreas Gnirke; Chris J Mungall; Adrienne M Wang; Brent Kronmiller; Joanne Pacleb; Soo Park; Mark Stapleton; Kenneth Wan; Reed A George; Pieter J de Jong; Juan Botas; Gerald M Rubin; Susan E Celniker

2002-01-01

210

The generation and utilization of a cancer-oriented representation of the human transcriptome by using expressed sequence tags  

Microsoft Academic Search

Whereas genome sequencing defines the genetic potential of an organism, transcript sequencing defines the utilization of this potential and links the genome with most areas of biology. To exploit the information within the human genome in the fight against cancer, we have deposited some two million expressed sequence tags (ESTs) from human tumors and their corresponding normal tissues in the

Helena Brentani; Otávia L. Caballero; Anamaria A. Camargo; Aline M. da Silva; Wilson Araújo da Silva Jr.; Emmanuel Dias Neto; Marco Grivet; Arthur Gruber; Pedro Edson Moreira Guimaraes; Winston Hide; Christian Iseli; C. Victor Jongeneel; Janet Kelso; Maria Aparecida Nagai; Elida Paula Benquique Ojopi; Elisson C. Osorio; Eduardo M. R. Reis; Gregory J. Riggins; Andrew John George Simpson; Sandro de Souza; Brian J. Stevenson; Robert L. Strausberg; Eloiza H. Tajara; Sergio Verjovski-Almeida; Marcio Luis Acencio; Mário Henrique Bengtson; Fabiana Bettoni; Walter F. Bodmer; Marcelo R. S. Briones; Luiz Paulo Camargo; Webster Cavenee; Janete M. Cerutti; Luís Eduardo Coelho Andrade; Paulo César Costa Dos Santos; Maria Cristina Ramos Costa; Israel Tojal da Silva; Marcos Roberto H. Estécio; Karine Sa Ferreira; Frank B. Furnari; Milton Faria Jr.; Pedro A. F. Galante; Gustavo S. Guimaraes; Adriano Jesus Holanda; Edna Teruko Kimura; Maarten R. Leerkes; Xin Lu; Rui M. B. Maciel; Elizabeth A. L. Martins; Katlin Brauer Massirer; Analy S. A. Melo; Carlos Alberto Mestriner; Elisabete Cristina Miracca; Leandro Lorenco Miranda; Francisco G. Nobrega; Paulo S. Oliveira; Apuã C. M. Paquola; José Rodrigo C. Pandolfi; Maria Inês de Moura Campos Pardini; Fabio Passetti; John Quackenbush; Beatriz Schnabel; Mari Cleide Sogayar; Jorge E. Souza; Sandro R. Valentini; Andre C. Zaiats; Elisabete Jorge Amaral; Liliane A. T. Arnaldi; Amélia Goes de Araújo; Simone Aparecida de Bessa; David C. Bicknell; Maria Eugenia Ribeiro de Camaro; Dirce Maria Carraro; Helaine Carrer; Alex F. Carvalho; Christian Colin; Fernando Costa; Cyntia Curcio; Ismael Dale Cotrim Guerreiro da Silva; Neusa Pereira da Silva; Márcia Dellamano; Hamza El-Dorry; Enilza Maria Espreafico; Ari José Scattone Ferreira; Cristiane Ayres Ferreira; Maria Angela H. Z. Fortes; Angelita Habr Gama; Daniel Giannella-Neto; Maria Lúcia C. C. Giannella; Ricardo R. Giorgi; Gustavo Henrique Goldman; Maria Helena S. 
Goldman; Christine Hackel; Paulo Lee Ho; Elza Myiuki Kimura; Luiz Paulo Kowalski; Jose E. Krieger; Luciana C. C. Leite; Ademar Lopes; Ana Mercedes S. C. Luna; Alan Mackay; Suely Kazue Nagahashi Mari; Adriana Aparecida Marques; Waleska K. Martins; André Montagnini; Mario Mourão Neto; Ana Lucia T. O. Nascimento; A. Munro Neville; Marina P. Nobrega; Mike J. O'Hare; Audrey Yumi Otsuka; Anna Izabel Ruas de Melo; Maria Luisa Paçó-Larson; Gonçalo Guimarães Pereira; João Bosco Pesquero; Juliana Gilbert Pessoa; Paula Rahal; Claudia Aparecida Rainho; Vanderlei Rodrigues; Silvia Regina Rogatto; Camila Malta Romano; Janaína Gusmão Romeiro; Benedito Mauro Rossi; Monica Rusticci; Renata Guerra de Sá; Simone Cristina Sant' Anna; Míriam L. Sarmazo; Teresa Cristina De Lima E. Silva; Fernando Augusto Soares; Maria de Fátima Sonati; Josane de Freitas Sousa; Diana Queiroz; Valéria Valente; André Luiz Vettore; Fabiola Elizabeth Villanova; Marco Antonio Zago; Heloisa Zalcberg

2003-01-01

211

De Novo Next Generation Sequencing of Plant Genomes  

Microsoft Academic Search

The genome sequencing of all major food and bioenergy crops is of critical importance in the race to improve crop production to meet the future food and energy security needs of the world. Next generation sequencing technologies have brought about great improvements in sequencing throughput and cost, but do not yet allow for de novo sequencing of large repetitive genomes

Steve Rounsley; Pradeep Reddy Marri; Yeisoo Yu; Ruifeng He; Nick Sisneros; Jose Luis Goicoechea; So Jeong Lee; Angelina Angelova; Dave Kudrna; Meizhong Luo; Jason Affourtit; Brian Desany; James Knight; Faheem Niazi; Michael Egholm; Rod A. Wing

2009-01-01

212

Genome Project Standards in a New Era of Sequencing  

SciTech Connect

For over a decade, genome sequences have adhered to only two standards that are relied on for purposes of sequence analysis by interested third parties (1, 2). However, ongoing developments in revolutionary sequencing technologies have resulted in a redefinition of traditional whole genome sequencing that requires a careful reevaluation of such standards. With commercially available 454 pyrosequencing (followed by Illumina, SOLiD, and now Helicos), there has been an explosion of genomes sequenced under the moniker 'draft'; however, these can be very poor quality genomes (due to inherent errors in the sequencing technologies, and the inability of assembly programs to fully address these errors). Further, one can only infer that such draft genomes may be of poor quality by navigating through the databases to find the number and type of reads deposited in sequence trace repositories (and not all genomes have this available), or to identify the number of contigs or genome fragments deposited to the database. The difficulty in assessing the quality of such deposited genomes has created some havoc for genome analysis pipelines and contributed to many wasted hours of (mis)interpretation. These same novel sequencing technologies have also brought an exponential leap in raw sequencing capability, and at greatly reduced prices that have further skewed the time- and cost-ratios of draft data generation versus the painstaking process of improving and finishing a genome. The resulting effect is an ever-widening gap between drafted and finished genomes that only promises to continue (Figure 1); hence there is an urgent need to distinguish good and poor datasets. The sequencing institutes in the authorship, along with the NIH's Human Microbiome Project Jumpstart Consortium (3), strongly believe that a new set of standards is required for genome sequences. 
The following represents a set of six community-defined categories of genome sequence standards that better reflect the quality of the genome sequence, based on our collective understanding of the different technologies, available assemblers, and the varied efforts to improve upon drafted genomes. Due to the increasingly rapid pace of genomics we avoided the use of rigid numerical thresholds in our definitions to take into account the types of products achieved by any combination of technology, chemistry, assembler, or improvement/finishing process.

GSC Consortia; HMP Jumpstart Consortia; Chain, P. S. G.; Grafham, D. V.; Fulton, R. S.; FitzGerald, M. G.; Hostetler, J.; Muzny, D.; Detter, J. C.; Ali, J.; Birren, B.; Bruce, D. C.; Buhay, C.; Cole, J. R.; Ding, Y.; Dugan, S.; Field, D.; Garrity, G. M.; Gibbs, R.; Graves, T.; Han, C. S.; Harrison, S. H.; Highlander, S.; Hugenholtz, P.; Khouri, H. M.; Kodira, C. D.; Kolker, E.; Kyrpides, N. C.; Lang, D.; Lapidus, A.; Malfatti, S. A.; Markowitz, V.; Metha, T.; Nelson, K. E.; Parkhill, J.; Pitluck, S.; Qin, X.; Read, T. D.; Schmutz, J.; Sozhamannan, S.; Strausberg, R.; Sutton, G.; Thomson, N. R.; Tiedje, J. M.; Weinstock, G.; Wollam, A.

2009-06-01

213

Finishing The Euchromatic Sequence Of The Human Genome  

SciTech Connect

The sequence of the human genome encodes the genetic instructions for human physiology, as well as rich information about human evolution. In 2001, the International Human Genome Sequencing Consortium reported a draft sequence of the euchromatic portion of the human genome. Since then, the international collaboration has worked to convert this draft into a genome sequence with high accuracy and nearly complete coverage. Here, we report the result of this finishing process. The current genome sequence (Build 35) contains 2.85 billion nucleotides interrupted by only 341 gaps. It covers ~99% of the euchromatic genome and is accurate to an error rate of ~1 event per 100,000 bases. Many of the remaining euchromatic gaps are associated with segmental duplications and will require focused work with new methods. The near-complete sequence, the first for a vertebrate, greatly improves the precision of biological analyses of the human genome including studies of gene number, birth and death. Notably, the human genome seems to encode only 20,000-25,000 protein-coding genes. The genome sequence reported here should serve as a firm foundation for biomedical research in the decades ahead.

Rubin, Edward M.; Lucas, Susan; Richardson, Paul; Rokhsar, Daniel; Pennacchio, Len

2004-09-07
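
The figures quoted in the record above lend themselves to a quick back-of-the-envelope check. The sketch below is only illustrative arithmetic on the numbers stated in the abstract (2.85 billion nucleotides, 341 gaps, roughly 1 error per 100,000 bases); the function names are hypothetical and are not part of the cited work.

```python
# Illustrative arithmetic on figures quoted in the abstract (Build 35).
GENOME_BASES = 2_850_000_000   # 2.85 billion nucleotides
BASES_PER_ERROR = 100_000      # ~1 error event per 100,000 bases
GAP_COUNT = 341                # remaining gaps in the sequence


def expected_error_events(bases: int, bases_per_error: int) -> int:
    """Approximate number of error events implied by the stated error rate."""
    return bases // bases_per_error


def mean_bases_per_gap(bases: int, gaps: int) -> float:
    """Average amount of sequence per remaining gap."""
    return bases / gaps


if __name__ == "__main__":
    print(expected_error_events(GENOME_BASES, BASES_PER_ERROR))  # 28500
    print(round(mean_bases_per_gap(GENOME_BASES, GAP_COUNT) / 1e6, 2), "Mb per gap")
```

Under these stated figures, the quoted accuracy still implies on the order of tens of thousands of residual error events genome-wide, and about 8.4 Mb of sequence per remaining gap on average.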

214

Whole-genome sequencing and variant discovery in C. elegans  

Microsoft Academic Search

Massively parallel sequencing instruments enable rapid and inexpensive DNA sequence data production. Because these instruments are new, their data require characterization with respect to accuracy and utility. To address this, we sequenced a Caenorhabditis elegans N2 Bristol strain isolate using the Solexa Sequence Analyzer, and compared the reads to the reference genome to characterize the data and to evaluate coverage

LaDeana W Hillier; Gabor T Marth; Aaron R Quinlan; David Dooling; Ginger Fewell; Derek Barnett; Paul Fox; Jarret I Glasscock; Matthew Hickenbotham; Weichun Huang; Vincent J Magrini; Ryan J Richt; Sacha N Sander; Donald A Stewart; Michael Stromberg; Eric F Tsung; Todd Wylie; Tim Schedl; Richard K Wilson; Elaine R Mardis

2008-01-01

215

Validation of rice genome sequence by optical mapping  

PubMed Central

Background Rice feeds much of the world, and possesses the simplest genome analyzed to date within the grass family, making it an economically relevant model system for other cereal crops. Although the rice genome is sequenced, validation and gap closing efforts require purely independent means for accurate finishing of sequence build data. Results To facilitate ongoing sequencing finishing and validation efforts, we have constructed a whole-genome SwaI optical restriction map of the rice genome. The physical map consists of 14 contigs, covering 12 chromosomes, with a total genome size of 382.17 Mb; this value is about 11% smaller than original estimates. Nine of the 14 optical map contigs are without gaps, covering chromosomes 1, 2, 3, 4, 5, 7, 8, 10, and 12 in their entirety – including centromeres and telomeres. Alignments between optical and in silico restriction maps constructed from IRGSP (International Rice Genome Sequencing Project) and TIGR (The Institute for Genomic Research) genome sequence sources are comprehensive and informative, evidenced by map coverage across virtually all published gaps, discovery of new ones, and characterization of sequence misassemblies; all totalling ~14 Mb. Furthermore, since optical maps are ordered restriction maps, identified discordances are pinpointed on a reliable physical scaffold providing an independent resource for closure of gaps and rectification of misassemblies. Conclusion Analysis of sequence and optical mapping data effectively validates genome sequence assemblies constructed from large, repeat-rich genomes. Given this conclusion we envision new applications of such single molecule analysis that will merge advantages offered by high-resolution optical maps with inexpensive, but short sequence reads generated by emerging sequencing platforms. 
Lastly, the map construction techniques presented here point the way to new types of comparative genome analysis that would focus on discernment of structural differences revealed by optical maps constructed from a broad range of rice subspecies and varieties.

Zhou, Shiguo; Bechner, Michael C; Place, Michael; Churas, Chris P; Pape, Louise; Leong, Sally A; Runnheim, Rod; Forrest, Dan K; Goldstein, Steve; Livny, Miron; Schwartz, David C

2007-01-01
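
The record above compares optical maps with in silico restriction maps derived from sequence. As a minimal sketch of what an in silico restriction map is, the Python below lists the ordered fragment lengths produced by cutting a sequence at every SwaI site (recognition sequence ATTTAAAT, blunt cut in the middle). It assumes a linear molecule and exact site matching; the function name and the toy sequence are hypothetical, and this is not the authors' pipeline.

```python
SWAI_SITE = "ATTTAAAT"  # SwaI recognition sequence (palindromic)
SWAI_CUT_OFFSET = 4     # blunt cut between ATTT and AAAT


def in_silico_restriction_map(sequence, site=SWAI_SITE, cut_offset=SWAI_CUT_OFFSET):
    """Return ordered fragment lengths after cutting a linear sequence at each site."""
    seq = sequence.upper()
    cut_positions = []
    start = seq.find(site)
    while start != -1:
        cut_positions.append(start + cut_offset)
        # Advance by one base so overlapping occurrences are also found.
        start = seq.find(site, start + 1)
    boundaries = [0] + cut_positions + [len(seq)]
    return [right - left for left, right in zip(boundaries, boundaries[1:])]


if __name__ == "__main__":
    # Toy sequence (made up for illustration) containing two SwaI sites.
    toy = "GG" + "ATTTAAAT" + "CCCC" + "ATTTAAAT" + "G"
    print(in_silico_restriction_map(toy))  # [6, 12, 5]
```

Because the site is palindromic, scanning one strand is sufficient. An ordered list of fragment lengths like this is the kind of object that can be aligned against an optical map to flag gaps or misassemblies.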

216

Mapping and sequencing complex genomes: let's get physical!  

Microsoft Academic Search

Physical maps provide an essential framework for ordering and joining sequence data, genetically mapped markers and large-insert clones in eukaryotic genome projects. A good physical map is also an important resource for cloning specific genes of interest, comparing genomes, and understanding the size and complexity of a genome. Although physical maps are usually taken at face value, a good deal

Blake C. Meyers; Simone Scalabrin; Michele Morgante

2004-01-01

217

Beyond the Sequence: Cellular Organization of Genome Function  

Microsoft Academic Search

Genomes are more than linear sequences. In vivo they exist as elaborate physical structures, and their functional properties are strongly determined by their cellular organization. I discuss here the functional relevance of spatial and temporal genome organization at three hierarchical levels: the organization of nuclear processes, the higher-order organization of the chromatin fiber, and the spatial arrangement of genomes

Tom Misteli

2007-01-01

218

Mapping copy number variation by population-scale genome sequencing  

Microsoft Academic Search

Genomic structural variants (SVs) are abundant in humans, differing from other forms of variation in extent, origin and functional impact. Despite progress in SV characterization, the nucleotide resolution architecture of most SVs remains unknown. We constructed a map of unbalanced SVs (that is, copy number variants) based on whole genome DNA sequencing data from 185 human genomes, integrating evidence from

Ryan E. Mills; Klaudia Walter; Chip Stewart; Robert E. Handsaker; Ken Chen; Can Alkan; Alexej Abyzov; Seungtai Chris Yoon; Kai Ye; R. Keira Cheetham; Asif Chinwalla; Donald F. Conrad; Yutao Fu; Fabian Grubert; Iman Hajirasouliha; Fereydoun Hormozdiari; Lilia M. Iakoucheva; Zamin Iqbal; Shuli Kang; Jeffrey M. Kidd; Miriam K. Konkel; Joshua Korn; Ekta Khurana; Deniz Kural; Hugo Y. K. Lam; Jing Leng; Ruiqiang Li; Yingrui Li; Chang-Yun Lin; Ruibang Luo; Xinmeng Jasmine Mu; James Nemesh; Heather E. Peckham; Tobias Rausch; Aylwyn Scally; Xinghua Shi; Michael P. Stromberg; Adrian M. Stütz; Alexander Eckehart Urban; Jerilyn A. Walker; Jiantao Wu; Yujun Zhang; Zhengdong D. Zhang; Mark A. Batzer; Li Ding; Gabor T. Marth; Gil McVean; Jonathan Sebat; Michael Snyder; Jun Wang; Kenny Ye; Evan E. Eichler; Mark B. Gerstein; Matthew E. Hurles; Charles Lee; Steven A. McCarroll; Jan O. Korbel

2011-01-01

219

Genome Sequence of Enterohemorrhagic Escherichia coli NCCP15658  

PubMed Central

Enterohemorrhagic Escherichia coli causes severe food-borne disease in the guts of humans and animals. Here, we report the high-quality draft genome sequence of E. coli NCCP15658 isolated from a patient in the Republic of Korea. Its genome size was determined to be 5.46 Mb, and its genomic features, including genes encoding virulence factors, were analyzed.

Song, Ju Yeon; Yoo, Ran Hee; Jang, Song Yee; Seong, Won-Keun; Kim, Seon-Young; Jeong, Haeyoung; Kang, Sung Gyun; Kim, Byung Kwon; Kwon, Soon-Kyeong; Lee, Choong Hoon; Yu, Dong Su; Park, Mi-Sun

2012-01-01

220

Genome sequence of Brevibacillus laterosporus strain GI-9.  

PubMed

We report the 5.18-Mb genome sequence of Brevibacillus laterosporus strain GI-9, isolated from a subsurface soil sample during a screen for novel strains producing antimicrobial compounds. The draft genome of this strain will aid in biotechnological exploitation and comparative genomics of Brevibacillus laterosporus strains. PMID:22328768

Sharma, Vikas; Singh, Pradip K; Midha, Samriti; Ranjan, Manish; Korpole, Suresh; Patil, Prabhu B

2012-03-01

221

Genome Sequence of Brevibacillus laterosporus Strain GI-9  

PubMed Central

We report the 5.18-Mb genome sequence of Brevibacillus laterosporus strain GI-9, isolated from a subsurface soil sample during a screen for novel strains producing antimicrobial compounds. The draft genome of this strain will aid in biotechnological exploitation and comparative genomics of Brevibacillus laterosporus strains.

Sharma, Vikas; Singh, Pradip K.; Midha, Samriti; Ranjan, Manish

2012-01-01

222

Identification of Candidate Drosophila Olfactory Receptors from Genomic DNA Sequence  

Microsoft Academic Search

We have taken advantage of the availability of a large amount of Drosophila genomic DNA sequence in the Berkeley Drosophila Genome Project database (~1/5 of the genome) to identify a family of novel seven transmembrane domain encoding genes that are putative Drosophila olfactory receptors. Members of the family are expressed in distinct subsets of olfactory neurons, and certain family members

Qian Gao; Andrew Chess

1999-01-01

223

Genome Sequence of the Rice Pathogen Pseudomonas fuscovaginae CB98818  

PubMed Central

Pseudomonas fuscovaginae is a phytopathogenic bacterium causing bacterial sheath brown rot of cereal crops. Here, we present the draft genome sequence of P. fuscovaginae CB98818, originally isolated from a diseased rice plant in China. The draft genome will aid in epidemiological studies, comparative genomics, and quarantine of this broad-host-range pathogen.

Xie, Guanlin; Cui, Zhouqi; Tao, Zhongyun; Qiu, Hui; Liu, He; Zhu, Bo; Jin, Gulei; Sun, Guochang; Almoneafy, Abdulwareth

2012-01-01

224

Research ethics and the challenge of whole-genome sequencing  

Microsoft Academic Search

The recent completion of the first two individual whole-genome sequences is a research milestone. As personal genome research advances, investigators and international research bodies must ensure ethical research conduct. We identify three major ethical considerations that have been implicated in whole-genome research: the return of research results to participants; the obligations, if any, that are owed to participants' relatives; and

Amy L. McGuire; Mildred K. Cho; Timothy Caulfield

2007-01-01

225

Genome Sequence of Aedes aegypti, a Major Arbovirus Vector  

Microsoft Academic Search

We present a draft sequence of the genome of Aedes aegypti, the primary vector for yellow fever and dengue fever, which at ~1376 million base pairs is about 5 times the size of the genome of the malaria vector Anopheles gambiae. Nearly 50% of the Ae. aegypti genome consists of transposable elements. These contribute to a factor of ~4 to

Vishvanath Nene; Jennifer R. Wortman; Daniel Lawson; Brian Haas; Chinnappa Kodira; Z. Tu; Brendan Loftus; Zhiyong Xi; Karyn Megy; Manfred Grabherr; Quinghu Ren; E. M. Zdobnov; N. F. Lobo; K. S. Campbell; S. E. Brown; M. F. Bonaldo; Jingsong Zhu; S. P. Sinkins; D. G. Hogenkamp; Paolo Amedeo; Peter Arensburger; P. W. Atkinson; Shelby Bidwell; Jim Biedler; Ewan Birney; Robert V. Bruggner; Javier Costas; M. R. Coy; Jonathan Crabtree; Matt Crawford; Becky deBruyn; David DeCaprio; Karin Eiglmeier; Eric Eisenstadt; Hamza El-Dorry; W. M. Gelbart; S. L. Gomes; Martin Hammond; Linda I. Hannick; M. H. Holmes; J. R. Hogan; David Jaffe; J. S. Johnston; R. C. Kennedy; Hean Koo; Saul Kravitz; Evgenia V. Kriventseva; David Kulp; Kurt LaButti; Eduardo Lee; Song Li; Diane D. Lovin; Chunhong Mao; Evan Mauceli; C. F. M. Menck; J. R. Miller; Philip Montgomery; Akio Mori; A. L. Nascimento; H. F. Naveira; Chad Nusbaum; S. O'Leary; Joshua Orvis; Mihaela Pertea; Hadi Quesneville; K. R. Reidenbach; Yu-Hui Rogers; C. W. Roth; J. R. Schneider; Michael Schatz; Martin Shumway; Mario Stanke; E. O. Stinson; J. M. C. Tubio; J. P. VanZee; Sergio Verjovski-Almeida; Doreen Werner; Owen White; Stefan Wyder; Qiandong Zeng; Qi Zhao; Yongmei Zhao; C. A. Hill; A. S. Raikhel; M. B. Soares; D. L. Knudson; N. H. Lee; James Galagan; S. L. Salzberg; I. T. Paulsen; George Dimopoulos; F. H. Collins; Bruce Birren; C. M. Fraser-Liggett; D. W. Severson

2007-01-01

226

Draft genome sequence of the coccolithovirus Emiliania huxleyi virus 202.  

PubMed

Emiliania huxleyi virus 202 (EhV-202) is a member of the Coccolithoviridae, a group of viruses that infect the marine coccolithophorid Emiliania huxleyi. EhV-202 has a 160- to 180-nm-diameter icosahedral structure and a genome of approximately 407 kbp, consisting of 485 coding sequences (CDSs). Here we describe the genomic features of EhV-202, together with a draft genome sequence and its annotation, highlighting the homology and heterogeneity of this genome in comparison with the EhV-86 reference genome. PMID:22282334

Nissimov, Jozef I; Worthy, Charlotte A; Rooks, Paul; Napier, Johnathan A; Kimmance, Susan A; Henn, Matthew R; Ogata, Hiroyuki; Allen, Michael J

2012-02-01

227

Draft genome sequence of the Coccolithovirus Emiliania huxleyi virus 203.  

PubMed

The Coccolithoviridae are a recently discovered group of viruses that infect the marine coccolithophorid Emiliania huxleyi. Emiliania huxleyi virus 203 (EhV-203) has a 160- to 180-nm-diameter icosahedral structure and a genome of approximately 400 kbp, consisting of 464 coding sequences (CDSs). Here we describe the genomic features of EhV-203 together with a draft genome sequence and its annotation, highlighting the homology and heterogeneity of this genome in comparison with the EhV-86 reference genome. PMID:22106382

Nissimov, Jozef I; Worthy, Charlotte A; Rooks, Paul; Napier, Johnathan A; Kimmance, Susan A; Henn, Matthew R; Ogata, Hiroyuki; Allen, Michael J

2011-12-01

228

Complete genome sequence of Enterobacter aerogenes KCTC 2190.  

PubMed

This is the first complete genome sequence of the Enterobacter aerogenes species. Here we present the genome sequence of E. aerogenes KCTC 2190, which contains 5,280,350 bp with a G + C content of 54.8 mol%, 4,912 protein-coding genes, and 109 structural RNAs. PMID:22493190

Shin, Sang Heum; Kim, Sewhan; Kim, Jae Young; Lee, Soojin; Um, Youngsoon; Oh, Min-Kyu; Kim, Young-Rok; Lee, Jinwon; Yang, Kap-Seok

2012-05-01

229

Limitations of next-generation genome sequence assembly  

Microsoft Academic Search

High-throughput sequencing technologies promise to transform the fields of genetics and comparative biology by delivering tens of thousands of genomes in the near future. Although it is feasible to construct de novo genome assemblies in a few months, there has been relatively little attention to what is lost by sole application of short sequence reads. We compared the recent de

Can Alkan; Saba Sajjadian; Evan E Eichler

2010-01-01

230

WHOLE GENOME SEQUENCE OF FUSARIUM GRAMINEARUM, LINEAGE 7  

Technology Transfer Automated Retrieval System (TEKTRAN)

We have generated a draft sequence assembly of the F. graminearum genome that is available on the web for download and query. The sequence is of high quality with the entire 36 Mb assembly consisting of just 511 contigs (> 2 kb) contained within 28 supercontigs (scaffolds). The second genome release...

231

Use of Whole Genome Sequence Data To Infer Baculovirus Phylogeny  

Microsoft Academic Search

Several phylogenetic methods based on whole genome sequence data were evaluated using data from nine complete baculovirus genomes. The utility of three independent character sets was assessed. The first data set comprised the sequences of the 63 genes common to these viruses. The second set of characters was based on gene order, and phylogenies were inferred using both breakpoint distance

ELISABETH A. HERNIOU; TERESA LUQUE; XINWEN CHEN; JUST M. VLAK; DOREEN WINSTANLEY; JENNIFER S. CORY; D. R. O'Reilly

2001-01-01

232

MIPS: a database for protein sequences and complete genomes  

Microsoft Academic Search

The MIPS group (Munich Information Center for Protein Sequences of the German National Center for Environment and Health (GSF)) at the Max-Planck-Institute for Biochemistry, Martinsried near Munich, Germany, is involved in a number of data collection activities, including a comprehensive database of the yeast genome, a database reflecting the progress in sequencing the Arabidopsis thaliana genome, the systematic analysis

Hans-werner Mewes; Jean Hani; Friedhelm Pfeiffer; Dmitrij Frishman

1998-01-01

233

Draft Genome Sequence of Aspergillus oryzae Strain 3.042  

PubMed Central

Aspergillus oryzae is the most important fungus for the traditional fermentation in China and is particularly important in soy sauce fermentation. We report the 36,547,279-bp draft genome sequence of A. oryzae 3.042 and compared it to the published genome sequence of A. oryzae RIB40.

Zhao, Guozhong; Yao, Yunping; Qi, Wei; Wang, Chunling; Hou, Lihua; Zeng, Bin

2012-01-01

234

The mitochondrial genome sequence of the Tasmanian tiger (Thylacinus cynocephalus)  

Microsoft Academic Search

We report the first two complete mitochondrial genome sequences of the thylacine (Thylacinus cynocephalus), or so-called Tasmanian tiger, extinct since 1936. The thylacine's phylogenetic position within australidelphian marsupials has long been debated, and here we provide strong support for the thylacine's basal position in Dasyuromorphia, aided by mitochondrial genome sequence that we generated from the extant numbat (Myrmecobius fasciatus).

Webb Miller; Daniela I. Drautz; Jan E. Janecka; Arthur M. Lesk; Aakrosh Ratan; Lynn P. Tomsho; Mike Packard; Yeting Zhang; Lindsay R. McClellan; Ji Qi; Fangqing Zhao; M. Thomas; P. Gilbert; Juan Luis Arsuaga; Daniel H. Huson; Kristofer M. Helgen; William J. Murphy; Anders Gotherstrom; Stephan C. Schuster

2009-01-01

235

Sequence Surveyor: Leveraging Overview for Scalable Genomic Alignment Visualization  

Microsoft Academic Search

Fig. 1. Sequence Surveyor visualizing 100 synthetic genomes generated by an evolution simulation. Each genome is mapped to a row and genes are ordered by position. Color encodes the position of the gene within the chosen reference sequence (top row, indicated by the green box). Genes are aggregated, with each block's texture reflecting the overall distribution of colors in that

Danielle Albers; Colin Dewey; Michael Gleicher

2011-01-01

236

Complete Genome Sequence of Enterobacter aerogenes KCTC 2190  

PubMed Central

This is the first complete genome sequence of the Enterobacter aerogenes species. Here we present the genome sequence of E. aerogenes KCTC 2190, which contains 5,280,350 bp with a G + C content of 54.8 mol%, 4,912 protein-coding genes, and 109 structural RNAs.

Shin, Sang Heum; Kim, Sewhan; Kim, Jae Young; Lee, Soojin; Um, Youngsoon; Oh, Min-Kyu; Kim, Young-Rok; Lee, Jinwon

2012-01-01

237

Genome Sequence of Alcaligenes sp. Strain HPC1271  

PubMed Central

We report a draft genome sequence of Alcaligenes sp. strain HPC1271, which demonstrates antimicrobial activity against multidrug-resistant bacteria. Antibiotic production by Alcaligenes has not been frequently reported, and hence, the availability of the genome sequence should enable us to explore new antibiotic-producing gene clusters.

Sagarkar, Sneha; Tanksale, Himgouri; Sharma, Nandita; Qureshi, Asifa; Khardenavis, Anshuman; Purohit, Hemant J.

2013-01-01

238

The first Irish genome and ways of improving sequence accuracy  

PubMed Central

Whole-genome sequencing of an Irish person reveals hundreds of thousands of novel genomic variants. Imputation using previous known information improves the accuracy of low-read-depth sequencing. See research article: http://genomebiology.com/2010/11/9/R91

2010-01-01

239

Distribution and intensity of constraint in mammalian genomic sequence  

Microsoft Academic Search

Comparisons of orthologous genomic DNA sequences can be used to characterize regions that have been subject to purifying selection and are enriched for functional elements. We here present the results of such an analysis on an alignment of sequences from 29 mammalian species. The alignment captures ~3.9 neutral substitutions per site and spans ~1.9 Mbp of the human genome. We

Gregory M. Cooper; Eric A. Stone; Eric D. Green; Serafim Batzoglou; Arend Sidow

2005-01-01

240

Initial sequencing and analysis of the human genome  

Microsoft Academic Search

The human genome holds an extraordinary trove of information about human development, physiology, medicine and evolution. Here we report the results of an international collaboration to produce and make freely available a draft sequence of the human genome. We also present an initial analysis of the data, describing some of the insights that can be gleaned from the sequence.

Eric S. Lander; Lauren M. Linton; Bruce Birren; Chad Nusbaum; Michael C. Zody; Jennifer Baldwin; Keri Devon; Ken Dewar; Michael Doyle; William FitzHugh; Roel Funke; Diane Gage; Katrina Harris; Andrew Heaford; John Howland; Lisa Kann; Jessica Lehoczky; Rosie LeVine; Paul McEwan; Kevin McKernan; James Meldrim; Jill P. Mesirov; Cher Miranda; William Morris; Jerome Naylor; Christina Raymond; Mark Rosetti; Ralph Santos; Andrew Sheridan; Carrie Sougnez; Nicole Stange-Thomann; Nikola Stojanovic; Aravind Subramanian; Dudley Wyman; Jane Rogers; John Sulston; Rachael Ainscough; Stephan Beck; David Bentley; John Burton; Christopher Clee; Nigel Carter; Alan Coulson; Rebecca Deadman; Panos Deloukas; Andrew Dunham; Ian Dunham; Richard Durbin; Lisa French; Darren Grafham; Simon Gregory; Tim Hubbard; Sean Humphray; Adrienne Hunt; Matthew Jones; Christine Lloyd; Amanda McMurray; Lucy Matthews; Simon Mercer; Sarah Milne; James C. Mullikin; Andrew Mungall; Robert Plumb; Mark Ross; Ratna Shownkeen; Sarah Sims; Robert H. Waterston; Richard K. Wilson; LaDeana W. Hillier; John D. McPherson; Marco A. Marra; Elaine R. Mardis; Lucinda A. Fulton; Asif T. Chinwalla; Kymberlie H. Pepin; Warren R. Gish; Stephanie L. Chissoe; Michael C. Wendl; Kim D. Delehaunty; Tracie L. Miner; Andrew Delehaunty; Jason B. Kramer; Lisa L. Cook; Robert S. Fulton; Douglas L. Johnson; Patrick J. Minx; Sandra W. Clifton; Trevor Hawkins; Elbert Branscomb; Paul Predki; Paul Richardson; Sarah Wenning; Tom Slezak; Norman Doggett; Jan-Fang Cheng; Anne Olsen; Susan Lucas; Christopher Elkin; Edward Uberbacher; Marvin Frazier; Richard A. Gibbs; Donna M. Muzny; Steven E. Scherer; John B. Bouck; Erica J. Sodergren; Kim C. Worley; Catherine M. Rives; James H. Gorrell; Michael L. Metzker; Susan L. Naylor; Raju S. Kucherlapati; David L. Nelson; George M. 
Weinstock; Yoshiyuki Sakaki; Asao Fujiyama; Masahira Hattori; Tetsushi Yada; Atsushi Toyoda; Takehiko Itoh; Chiharu Kawagoe; Hidemi Watanabe; Yasushi Totoki; Todd Taylor; Jean Weissenbach; Roland Heilig; William Saurin; Francois Artiguenave; Philippe Brottier; Thomas Bruls; Eric Pelletier; Catherine Robert; Patrick Wincker; Douglas R. Smith; Lynn Doucette-Stamm; Marc Rubenfield; Keith Weinstock; Hong Mei Lee; JoAnn Dubois; André Rosenthal; Matthias Platzer; Gerald Nyakatura; Stefan Taudien; Andreas Rump; Huanming Yang; Jun Yu; Jian Wang; Guyang Huang; Jun Gu; Leroy Hood; Lee Rowen; Anup Madan; Shizen Qin; Ronald W. Davis; Nancy A. Federspiel; A. Pia Abola; Michael J. Proctor; Richard M. Myers; Jeremy Schmutz; Mark Dickson; Jane Grimwood; David R. Cox; Maynard V. Olson; Rajinder Kaul; Christopher Raymond; Nobuyoshi Shimizu; Kazuhiko Kawasaki; Shinsei Minoshima; Glen A. Evans; Maria Athanasiou; Roger Schultz; Bruce A. Roe; Feng Chen; Huaqin Pan; Juliane Ramser; Hans Lehrach; Richard Reinhardt; W. Richard McCombie; Melissa de la Bastide; Neilay Dedhia; Helmut Blöcker; Klaus Hornischer; Gabriele Nordsiek; Richa Agarwala; L. Aravind; Jeffrey A. Bailey; Serafim Batzoglou; Ewan Birney; Peer Bork; Daniel G. Brown; Christopher B. Burge; Lorenzo Cerutti; Hsiu-Chuan Chen; Deanna Church; Michele Clamp; Richard R. Copley; Tobias Doerks; Sean R. Eddy; Evan E. Eichler; Terrence S. Furey; James Galagan; James G. R. Gilbert; Cyrus Harmon; Yoshihide Hayashizaki; David Haussler; Henning Hermjakob; Karsten Hokamp; Wonhee Jang; L. Steven Johnson; Thomas A. Jones; Simon Kasif; Arek Kaspryzk; Scot Kennedy; W. James Kent; Paul Kitts; Eugene V. Koonin; Ian Korf; David Kulp; Doron Lancet; Todd M. Lowe; Aoife McLysaght; Tarjei Mikkelsen; John V. Moran; Nicola Mulder; Victor J. Pollara; Chris P. Ponting; Greg Schuler; Jörg Schultz; Guy Slater; Arian F. A. 
Smit; Elia Stupka; Joseph Szustakowki; Danielle Thierry-Mieg; Jean Thierry-Mieg; Lukas Wagner; John Wallis; Raymond Wheeler; Alan Williams; Yuri I. Wolf; Kenneth H. Wolfe; Shiaw-Pyng Yang; Ru-Fang Yeh; Francis Collins; Mark S. Guyer; Jane Peterson; Adam Felsenfeld; Kris A. Wetterstrand; Aristides Patrinos; Michael J. Morgan

2001-01-01

241

Genome Sequence of the Pathogenic Bacterium Vibrio vulnificus Biotype 3.  

PubMed

We report the first genome sequence of the pathogenic Vibrio vulnificus biotype 3. This draft genome sequence of the environmental strain VVyb1(BT3), isolated in Israel, provides a representation of this newly emerged clonal group, which reveals higher similarity to the clinical strains of biotype 1 than to the environmental ones. PMID:23599289

Danin-Poleg, Yael; Elgavish, Sharona; Raz, Nili; Efimov, Vera; Kashi, Yechezkel

2013-04-18

242

Genome Sequence of the Pathogenic Bacterium Vibrio vulnificus Biotype 3  

PubMed Central

We report the first genome sequence of the pathogenic Vibrio vulnificus biotype 3. This draft genome sequence of the environmental strain VVyb1(BT3), isolated in Israel, provides a representation of this newly emerged clonal group, which reveals higher similarity to the clinical strains of biotype 1 than to the environmental ones.

Danin-Poleg, Yael; Elgavish, Sharona; Raz, Nili; Efimov, Vera

2013-01-01

243

Draft Genome Sequence of the Wolbachia Endosymbiont of Drosophila suzukii  

PubMed Central

Wolbachia is one of the most successful and abundant symbiotic bacteria in nature, infecting more than 40% of the terrestrial arthropod species. Here we report the draft genome sequence of a novel Wolbachia strain named “wSuzi” that was retrieved from the genome sequencing of its host, the invasive pest Drosophila suzukii.

Cestaro, Alessandro; Kaur, Rupinder; Pertot, Ilaria; Rota-Stabelli, Omar; Anfora, Gianfranco

2013-01-01

244

High-quality genome sequence of Pichia pastoris CBS7435  

Microsoft Academic Search

The methylotrophic yeast Pichia pastoris (Komagataella phaffii) CBS7435 is the parental strain of commonly used P. pastoris recombinant protein production hosts making it well suited for improving the understanding of associated genomic features. Here, we present a 9.35Mbp high-quality genome sequence of P. pastoris CBS7435 established by a combination of 454 and Illumina sequencing. An automatic annotation of the genome

Andreas Küberl; Jessica Schneider; Gerhard G. Thallinger; Ingund Anderl; Daniel Wibberg; Tanja Hajek; Sebastian Jaenicke; Karina Brinkrolf; Alexander Goesmann; Rafael Szczepanowski; Alfred Pühler; Helmut Schwab; Anton Glieder; Harald Pichler

2011-01-01

245

A Complete Sequence of the T. tengcongensis Genome  

Microsoft Academic Search

Thermoanaerobacter tengcongensis is a rod-shaped, gram-negative, anaerobic eubacterium that was isolated from a freshwater hot spring in Tengchong, China. Using a whole-genome-shotgun method, we sequenced its 2,689,445-bp genome from an isolate, MB4 T (Genbank accession no. AE008691). The genome encodes 2588 predicted coding sequences (CDS). Among them, 1764 (68.2%) are classified according to homology to other documented proteins, and the

Qiyu Bao; Yuqing Tian; Wei Li; Zuyuan Xu; Zhenyu Xuan; Songnian Hu; Wei Dong; Jian Yang; Yanjiong Chen; Yanfen Xue; Yi Xu; Xiaoqin Lai; Li Huang; Xiuzhu Dong; Yanhe Ma; Lunjiang Ling; Huarong Tan; Runsheng Chen; Jian Wang; Jun Yu; Huanming Yang

2002-01-01

246

Low-pass sequencing for microbial comparative genomics  

Microsoft Academic Search

BACKGROUND: We studied four extremely halophilic archaea by low-pass shotgun sequencing: (1) the metabolically versatile Haloarcula marismortui; (2) the non-pigmented Natrialba asiatica; (3) the psychrophile Halorubrum lacusprofundi and (4) the Dead Sea isolate Halobaculum gomorrense. Approximately one thousand single pass genomic sequences per genome were obtained. The data were analyzed by comparative genomic analyses using the completed Halobacterium sp. NRC-1

Young Ah Goo; Jared Roach; Gustavo Glusman; Nitin S Baliga; Kerry Deutsch; Min Pan; Sean Kennedy; Shiladitya DasSarma; Wailap Victor Ng; Leroy Hood

2004-01-01

247

Genome sequence of the human malaria parasite Plasmodium falciparum  

Microsoft Academic Search

The parasite Plasmodium falciparum is responsible for hundreds of millions of cases of malaria, and kills more than one million African children annually. Here we report an analysis of the genome sequence of P. falciparum clone 3D7. The 23-megabase nuclear genome consists of 14 chromosomes, encodes about 5,300 genes, and is the most (A + T)-rich genome sequenced to date.

Malcolm J. Gardner; Neil Hall; Eula Fung; Owen White; Matthew Berriman; Richard W. Hyman; Jane M. Carlton; Arnab Pain; Sharen Bowman; Ian T. Paulsen; Keith James; Kim Rutherford; Steven L. Salzberg; Alister Craig; Sue Kyes; Man-Suen Chan; Vishvanath Nene; Shamira J. Shallom; Bernard Suh; Jeremy Peterson; Sam Angiuoli; Mihaela Pertea; Jonathan Allen; Jeremy Selengut; Daniel Haft; Michael W. Mather; Akhil B. Vaidya; Alan H. Fairlamb; Martin J. Fraunholz; David S. Roos; Stuart A. Ralph; Geoffrey I. McFadden; Leda M. Cummings; G. Mani Subramanian; Chris Mungall; J. Craig Venter; Daniel J. Carucci; Stephen L. Hoffman; Chris Newbold; Ronald W. Davis; Claire M. Fraser; Bart Barrell

2002-01-01

248

Sequence analysis of mutations and translocations across breast cancer subtypes.  

PubMed

Breast carcinoma is the leading cause of cancer-related mortality in women worldwide, with an estimated 1.38 million new cases and 458,000 deaths in 2008 alone. This malignancy represents a heterogeneous group of tumours with characteristic molecular features, prognosis and responses to available therapy. Recurrent somatic alterations in breast cancer have been described, including mutations and copy number alterations, notably ERBB2 amplifications, the first successful therapy target defined by a genomic aberration. Previous DNA sequencing studies of breast cancer genomes have revealed additional candidate mutations and gene rearrangements. Here we report the whole-exome sequences of DNA from 103 human breast cancers of diverse subtypes from patients in Mexico and Vietnam compared to matched-normal DNA, together with whole-genome sequences of 22 breast cancer/normal pairs. Beyond confirming recurrent somatic mutations in PIK3CA, TP53, AKT1, GATA3 and MAP3K1, we discovered recurrent mutations in the CBFB transcription factor gene and deletions of its partner RUNX1. Furthermore, we have identified a recurrent MAGI3-AKT3 fusion enriched in triple-negative breast cancer lacking oestrogen and progesterone receptors and ERBB2 expression. The MAGI3-AKT3 fusion leads to constitutive activation of AKT kinase, which is abolished by treatment with an ATP-competitive AKT small-molecule inhibitor. PMID:22722202

Banerji, Shantanu; Cibulskis, Kristian; Rangel-Escareno, Claudia; Brown, Kristin K; Carter, Scott L; Frederick, Abbie M; Lawrence, Michael S; Sivachenko, Andrey Y; Sougnez, Carrie; Zou, Lihua; Cortes, Maria L; Fernandez-Lopez, Juan C; Peng, Shouyong; Ardlie, Kristin G; Auclair, Daniel; Bautista-Piña, Veronica; Duke, Fujiko; Francis, Joshua; Jung, Joonil; Maffuz-Aziz, Antonio; Onofrio, Robert C; Parkin, Melissa; Pho, Nam H; Quintanar-Jurado, Valeria; Ramos, Alex H; Rebollar-Vega, Rosa; Rodriguez-Cuevas, Sergio; Romero-Cordoba, Sandra L; Schumacher, Steven E; Stransky, Nicolas; Thompson, Kristin M; Uribe-Figueroa, Laura; Baselga, Jose; Beroukhim, Rameen; Polyak, Kornelia; Sgroi, Dennis C; Richardson, Andrea L; Jimenez-Sanchez, Gerardo; Lander, Eric S; Gabriel, Stacey B; Garraway, Levi A; Golub, Todd R; Melendez-Zajgla, Jorge; Toker, Alex; Getz, Gad; Hidalgo-Miranda, Alfredo; Meyerson, Matthew

2012-06-20

249

A remarkably simple genome underlies highly malignant pediatric rhabdoid cancers  

PubMed Central

Cancer is principally considered a genetic disease, and numerous mutations are thought essential to drive its growth. However, the existence of genomically stable cancers and the emergence of mutations in genes that encode chromatin remodelers raise the possibility that perturbation of chromatin structure and epigenetic regulation are capable of driving cancer formation. Here we sequenced the exomes of 35 rhabdoid tumors, highly aggressive cancers of early childhood characterized by biallelic loss of SMARCB1, a subunit of the SWI/SNF chromatin remodeling complex. We identified an extremely low rate of mutation, with loss of SMARCB1 being essentially the sole recurrent event. Indeed, in 2 of the cancers there were no other identified mutations. Our results demonstrate that high mutation rates are dispensable for the genesis of cancers driven by mutation of a chromatin remodeling complex. Consequently, cancer can be a remarkably genetically simple disease.

Lee, Ryan S.; Stewart, Chip; Carter, Scott L.; Ambrogio, Lauren; Cibulskis, Kristian; Sougnez, Carrie; Lawrence, Michael S.; Auclair, Daniel; Mora, Jaume; Golub, Todd R.; Biegel, Jaclyn A.; Getz, Gad; Roberts, Charles W.M.

2012-01-01

250

Complete Genome Sequence of Probiotic Strain Lactobacillus acidophilus La-14.  

PubMed

We present the 1,991,830-bp complete genome sequence of Lactobacillus acidophilus strain La-14 (SD-5212). Comparative genomic analysis revealed 99.98% similarity overall to the L. acidophilus NCFM genome. Globally, 111 single nucleotide polymorphisms (SNPs) (95 SNPs, 16 indels) were observed throughout the genome. Also, a 416-bp deletion in the LA14_1146 sugar ABC transporter was identified. PMID:23788546

Stahl, Buffy; Barrangou, Rodolphe

2013-06-20

251

Next-Generation Sequencing for Cancer Diagnostics: a Practical Perspective  

PubMed Central

Next-generation sequencing (NGS) is arguably one of the most significant technological advances in the biological sciences of the last 30 years. The second generation sequencing platforms have advanced rapidly to the point that several genomes can now be sequenced simultaneously in a single instrument run in under two weeks. Targeted DNA enrichment methods allow even higher genome throughput at a reduced cost per sample. Medical research has embraced the technology and the cancer field is at the forefront of these efforts given the genetic aspects of the disease. World-wide efforts to catalogue mutations in multiple cancer types are underway and this is likely to lead to new discoveries that will be translated to new diagnostic, prognostic and therapeutic targets. NGS is now maturing to the point where it is being considered by many laboratories for routine diagnostic use. The sensitivity, speed and reduced cost per sample make it a highly attractive platform compared to other sequencing modalities. Moreover, as we identify more genetic determinants of cancer there is a greater need to adopt multi-gene assays that can quickly and reliably sequence complete genes from individual patient samples. Whilst widespread and routine use of whole genome sequencing is likely to be a few years away, there are immediate opportunities to implement NGS for clinical use. Here we review the technology, methods and applications that can be immediately considered and some of the challenges that lie ahead.

Meldrum, Cliff; Doyle, Maria A; Tothill, Richard W

2011-01-01

252

Complete Chloroplast Genome Sequence of Glycine max and Comparative Analyses with other Legume Genomes  

Microsoft Academic Search

Lack of complete chloroplast genome sequences is still one of the major limitations to extending chloroplast genetic engineering technology to useful crops. Therefore, we sequenced the soybean chloroplast genome and compared it to the other completely sequenced legumes, Lotus and Medicago. The chloroplast genome of Glycine is 152,218 basepairs (bp) in length, including a pair of inverted repeats of 25,574 bp

Christopher Saski; Seung-Bum Lee; Henry Daniell; Todd C. Wood; Jeffrey Tomkins; Hyi-Gyung Kim; Robert K. Jansen

2005-01-01

253

Sequences Associated with Centromere Competency in the Human Genome  

PubMed Central

Centromeres, the sites of spindle attachment during mitosis and meiosis, are located in specific positions in the human genome, normally coincident with diverse subsets of alpha satellite DNA. While there is strong evidence supporting the association of some subfamilies of alpha satellite with centromere function, the basis for establishing whether a given alpha satellite sequence is or is not designated a functional centromere is unknown, and attempts to understand the role of particular sequence features in establishing centromere identity have been limited by the near identity and repetitive nature of satellite sequences. Utilizing a broadly applicable experimental approach to test sequence competency for centromere specification, we have carried out a genomic and epigenetic functional analysis of endogenous human centromere sequences available in the current human genome assembly. The data support a model in which functionally competent sequences confer an opportunity for centromere specification, integrating genomic and epigenetic signals and promoting the concept of context-dependent centromere inheritance.

Hayden, Karen E.; Strome, Erin D.; Merrett, Stephanie L.; Lee, Hye-Ran; Rudd, M. Katharine

2013-01-01

254

Looking to future of genome mapping, sequencing  

SciTech Connect

The human genome mapping and sequencing project is perhaps the prime example of an international project in medicine today. The project director, Nobelist James D. Watson, PhD, noted at the bicentennial conference that it may be possible to bring the cost down to as low as 50 cents a base pair without any enormous technological breakthroughs in the 10-nation effort. Another speaker, George Poste, PhD, DVM, DSc, head of research and development, Smith Kline & French Laboratories, Philadelphia, PA, predicted that completion of the genetic dictionary will lead to compilation of a protein dictionary for each cell type for use against disease. Anti-trust legislation, he said, is overtly ignored all the time in the defense industry because it is deemed to be in the national interest. However, Poste went on, the legislative bodies of the world do not yet understand the implications of the directions in which we are going in terms of Big Biology and the requirements for companies to be able to work together.

Kangilaski, J.

1989-07-21

255

Community-wide analysis of microbial genome sequence signatures  

PubMed Central

Background Analyses of DNA sequences from cultivated microorganisms have revealed genome-wide, taxa-specific nucleotide compositional characteristics, referred to as genome signatures. These signatures have far-reaching implications for understanding genome evolution and potential application in classification of metagenomic sequence fragments. However, little is known regarding the distribution of genome signatures in natural microbial communities or the extent to which environmental factors shape them. Results We analyzed metagenomic sequence data from two acidophilic biofilm communities, including composite genomes reconstructed for nine archaea, three bacteria, and numerous associated viruses, as well as thousands of unassigned fragments from strain variants and low-abundance organisms. Genome signatures, in the form of tetranucleotide frequencies analyzed by emergent self-organizing maps, segregated sequences from all known populations sharing < 50 to 60% average amino acid identity and revealed previously unknown genomic clusters corresponding to low-abundance organisms and a putative plasmid. Signatures were pervasive genome-wide. Clusters were resolved because intra-genome differences resulting from translational selection or protein adaptation to the intracellular (pH ~5) versus extracellular (pH ~1) environment were small relative to inter-genome differences. We found that these genome signatures stem from multiple influences but are primarily manifested through codon composition, which we propose is the result of genome-specific mutational biases. Conclusions An important conclusion is that shared environmental pressures and interactions among coevolving organisms do not obscure genome signatures in acid mine drainage communities. Thus, genome signatures can be used to assign sequence fragments to populations, an essential prerequisite if metagenomics is to provide ecological and biochemical insights into the functioning of microbial communities.

Dick, Gregory J; Andersson, Anders F; Baker, Brett J; Simmons, Sheri L; Thomas, Brian C; Yelton, A Pepper; Banfield, Jillian F

2009-01-01
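
The abstract of record 255 describes genome signatures as tetranucleotide frequencies. As a rough illustration only, the sketch below computes such a 256-value frequency vector for a single sequence. It is not the authors' pipeline: the function name is hypothetical, reverse complements are not merged, and the emergent self-organizing map clustering mentioned in the abstract is not shown.

```python
from collections import Counter
from itertools import product


def tetranucleotide_frequencies(seq: str) -> dict[str, float]:
    """Return the relative frequency of each of the 256 DNA 4-mers.

    Hypothetical helper for illustration; windows containing any
    character other than A, C, G or T (for example N) are skipped.
    """
    seq = seq.upper()
    valid = set("ACGT")
    # Slide a window of width 4 over the sequence, one base at a time.
    counts = Counter(
        seq[i:i + 4]
        for i in range(len(seq) - 3)
        if set(seq[i:i + 4]) <= valid
    )
    total = sum(counts.values())
    all_kmers = ["".join(p) for p in product("ACGT", repeat=4)]
    # Report every possible 4-mer so vectors from different
    # fragments have the same length and ordering.
    return {k: (counts[k] / total if total else 0.0) for k in all_kmers}


freqs = tetranucleotide_frequencies("ACGTACGT")
print(len(freqs), freqs["ACGT"])
```

Vectors built this way for many fragments could then be passed to a clustering method of the reader's choice; which method suits a given dataset is outside the scope of this sketch.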

256

MIPS: a database for genomes and protein sequences  

Microsoft Academic Search

The Munich Information Center for Protein Sequences (MIPS-GSF, Neuherberg, Germany) continues to provide genome-related information in a systematic way. MIPS supports both national and European sequencing and functional analysis projects, develops and maintains automatically generated and manually annotated genome-specific databases, develops systematic classification schemes for the functional annotation of protein sequences, and provides tools for the comprehensive analysis of protein

Hans-werner Mewes; Dmitrij Frishman; Ulrich Güldener; Gertrud Mannhaupt; Klaus F. X. Mayer; Martin Mokrejs; Burkhard Morgenstern; Martin Münsterkötter; Stephen Rudd; B. Weil

2002-01-01

257

Genome Sequence of the Trichosporon asahii Environmental Strain CBS 8904  

PubMed Central

This is the first report of the genome sequence of Trichosporon asahii environmental strain CBS 8904, which was isolated from maize cobs. Comparison of the genome sequence with that of clinical strain CBS 2479 revealed that they have >99% chromosomal and mitochondrial sequence identity, yet CBS 8904 has 368 specific genes. Analysis of clusters of orthologous groups predicted that 3,307 genes belong to 23 functional categories and 703 genes were predicted to have a general function.

Li, Hai Tao; Zhu, He; Zhou, Guang Peng; Wang, Meng; Wang, Lei

2012-01-01

258

Nucleotide sequence of the genomic RNA of bamboo mosaic potexvirus  

Microsoft Academic Search

The complete nucleotide sequence of the genomic RNA of bamboo mosaic virus (BaMV) was determined by sequencing a set of overlapping cDNA clones and by direct sequencing of the viral RNA. The RNA genome of BaMV is 6366 nucleotides long (excluding 3' poly(A) tail) and contains six open reading frames (ORFs 1 to 6) coding for polypeptides with Mr values

Na-Sheng Lin; Biing-Yuan Lin; Neng-Wen Lo; Chung-Chi Hu; Teh-Yuan Chow; Yau-Heiu Hsu

1994-01-01

259

Multiple alignment of genomic sequences using CHAOS, DIALIGN and ABC  

Microsoft Academic Search

Comparative analysis of genomic sequences is a powerful approach to discover functional sites in these sequences. Herein, we present a WWW-based software system for multiple alignment of genomic sequences. We use the local alignment tool CHAOS to rapidly identify chains of pairwise similarities. These similarities are used as anchor points to speed up the DIALIGN multiple-alignment program. Finally, the visualization tool ABC is used for interactive graphical

Dirk Pöhler; Nadine Werner; Rasmus Steinkamp; Burkhard Morgenstern

2005-01-01

260

Bioinformatics interpretation of exome sequencing: blood cancer.  

PubMed

We analyzed 10 exome sequencing datasets and single nucleotide polymorphism chips for blood cancer provided by the PGM21 (The National Project for Personalized Genomic Medicine) Award program. We removed sample G06 because the pair was not correct and G10 because of possible contamination. In-house software, somatic copy-number and heterozygosity alteration estimation (SCHALE), was used to detect one loss of heterozygosity region in G05. We discovered 27 functionally important mutations. Network and pathway analyses gave us clues that NPM1, GATA2, and CEBPA were major driver genes. By comparing with previous somatic mutation profiles, we concluded that the provided data originated from acute myeloid leukemia. Protein structure modeling showed that somatic mutations in IDH2, RASGEF1B, and MSH4 can affect protein structures. PMID:23613679

Kim, Jiwoong; Lee, Yun-Gyeong; Kim, Namshin

2013-03-31

261

Bioinformatics Interpretation of Exome Sequencing: Blood Cancer  

PubMed Central

We analyzed 10 exome sequencing datasets and single nucleotide polymorphism chips for blood cancer provided by the PGM21 (The National Project for Personalized Genomic Medicine) Award program. We removed sample G06 because the pair was not correct and G10 because of possible contamination. In-house software, somatic copy-number and heterozygosity alteration estimation (SCHALE), was used to detect one loss of heterozygosity region in G05. We discovered 27 functionally important mutations. Network and pathway analyses gave us clues that NPM1, GATA2, and CEBPA were major driver genes. By comparing with previous somatic mutation profiles, we concluded that the provided data originated from acute myeloid leukemia. Protein structure modeling showed that somatic mutations in IDH2, RASGEF1B, and MSH4 can affect protein structures.

Kim, Jiwoong; Lee, Yun-Gyeong

2013-01-01

262

Automated sequencing of complete mitochondrial genomes from laser-capture microdissected samples.  

PubMed

Mitochondrial DNA mutations have been related to both aging and a variety of diseases such as cancer. Due to the relatively small size of the genome (16 kb) and with the use of automated DNA sequencing, the entire genome can be sequenced from clinical specimens in days. We present a reliable approach to complete mitochondrial genome sequencing from laser-capture microdissected human clinical cancer specimens that overcomes the inherent limitations of relatively small tissue samples and partial DNA degradation, which are unavoidable when laser-capture microdissection is used to attain pure populations of cells from heterogeneous tissues obtained from surgical procedures. The acquisition of sufficient template combined with a standard set of 18 pairs of PCR primers allows for the efficient amplification of the genome. Subsequent single-stranded amplification is performed using 36 sequencing primers, and samples are run on an ABI PRISM 3100 Genetic Analyzer. The use of this procedure should allow even investigators with little experience sequencing from clinical specimens success in complete mitochondrial genome sequencing. PMID:14513566

Aldridge, Beau A; Lim, So Dug; Baumann, Amanda K; Hosseini, Seyed; Buck, Whitney; Almekinder, Tara L; Sun, Carrie Q; Petros, John A

2003-09-01

263

Data structures and compression algorithms for genomic sequence data  

PubMed Central

Motivation: The continuing exponential accumulation of full genome data, including full diploid human genomes, creates new challenges not only for understanding genomic structure, function and evolution, but also for the storage, navigation and privacy of genomic data. Here, we develop data structures and algorithms for the efficient storage of genomic and other sequence data that may also facilitate querying and protecting the data. Results: The general idea is to encode only the differences between a genome sequence and a reference sequence, using absolute or relative coordinates for the location of the differences. These locations and the corresponding differential variants can be encoded into binary strings using various entropy coding methods, from fixed codes, such as Golomb and Elias codes, to variable codes, such as Huffman codes. We demonstrate the approach and various tradeoffs using highly variable human mitochondrial genome sequences as a testbed. With only a partial level of optimization, 3615 genome sequences occupying 56 MB in GenBank are compressed down to only 167 KB, achieving a 345-fold compression rate, using the revised Cambridge Reference Sequence as the reference sequence. Using the consensus sequence as the reference sequence, the data can be stored using only 133 KB, corresponding to a 433-fold level of compression, roughly a 23% improvement. Extensions to nuclear genomes and high-throughput sequencing data are discussed. Availability: Data are publicly available from GenBank, the HapMap web site, and the MITOMAP database. Supplementary materials with additional results, statistics, and software implementations are available from http://mammag.web.uci.edu/bin/view/Mitowiki/ProjectDNACompression. Contact: pfbaldi@ics.uci.edu

Brandon, Marty C.; Wallace, Douglas C.; Baldi, Pierre

2009-01-01
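
The abstract of record 263 outlines reference-based compression: store only the differences from a reference sequence, with their locations given as relative coordinates and encoded with a code such as Elias. The sketch below is a toy version of that idea under strong assumptions, not the authors' software: it handles substitutions only (equal-length sequences, no indels), uses Elias gamma coding for the gaps between variant positions, and packs each replacement base into two bits. All function names are hypothetical.

```python
BASE_TO_BITS = {"A": "00", "C": "01", "G": "10", "T": "11"}
BITS_TO_BASE = {bits: base for base, bits in BASE_TO_BITS.items()}


def elias_gamma(n: int) -> str:
    """Elias gamma code of a positive integer, as a string of 0/1."""
    if n < 1:
        raise ValueError("Elias gamma is defined for n >= 1")
    binary = bin(n)[2:]
    # (len - 1) zeros announce the length, then the binary digits follow.
    return "0" * (len(binary) - 1) + binary


def encode(reference: str, sample: str) -> str:
    """Encode the substitutions in `sample` relative to `reference`."""
    if len(reference) != len(sample):
        raise ValueError("this sketch handles substitutions only")
    bits, previous = [], -1
    for position, (ref_base, base) in enumerate(zip(reference, sample)):
        if ref_base != base:
            # Relative coordinate: distance from the previous variant (>= 1).
            bits.append(elias_gamma(position - previous))
            bits.append(BASE_TO_BITS[base])
            previous = position
    return "".join(bits)


def decode(reference: str, bits: str) -> str:
    """Rebuild the sample from the reference and the encoded bit string."""
    sequence, i, position = list(reference), 0, -1
    while i < len(bits):
        zeros = 0
        while bits[i] == "0":  # count the unary length prefix
            zeros += 1
            i += 1
        position += int(bits[i:i + zeros + 1], 2)
        i += zeros + 1
        sequence[position] = BITS_TO_BASE[bits[i:i + 2]]
        i += 2
    return "".join(sequence)


reference = "ACGTACGTAC"
sample = "ACGTTCGTAA"
encoded = encode(reference, sample)
print(encoded, decode(reference, encoded) == sample)
```

Relative gaps tend to be small numbers when variants are dense, which is where a code like Elias gamma spends few bits; a real implementation would also need to represent insertions, deletions and ambiguous bases.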

264

Characterizing the cancer genome in lung adenocarcinoma  

Microsoft Academic Search

Somatic alterations in cellular DNA underlie almost all human cancers. The prospect of targeted therapies and the development of high-resolution, genome-wide approaches are now spurring systematic efforts to characterize cancer genomes. Here we report a large-scale project to characterize copy-number alterations in primary lung adenocarcinomas. By analysis of a large collection of tumours (n = 371) using dense single nucleotide polymorphism arrays, we identify a total of 57

Barbara A. Weir; Michele S. Woo; Gad Getz; Sven Perner; Li Ding; Rameen Beroukhim; William M. Lin; Michael A. Province; Aldi Kraja; Laura A. Johnson; Kinjal Shah; Mitsuo Sato; Roman K. Thomas; Justine A. Barletta; Ingrid B. Borecki; Stephen Broderick; Andrew C. Chang; Derek Y. Chiang; Lucian R. Chirieac; Jeonghee Cho; Yoshitaka Fujii; Adi F. Gazdar; Thomas Giordano; Heidi Greulich; Megan Hanna; Bruce E. Johnson; Mark G. Kris; Alex Lash; Ling Lin; Neal Lindeman; Elaine R. Mardis; John D. McPherson; John D. Minna; Margaret B. Morgan; Mark Nadel; Mark B. Orringer; John R. Osborne; Brad Ozenberger; Alex H. Ramos; James Robinson; Jack A. Roth; Valerie Rusch; Hidefumi Sasaki; Frances Shepherd; Carrie Sougnez; Margaret R. Spitz; Ming-Sound Tsao; David Twomey; Roel G. W. Verhaak; George M. Weinstock; David A. Wheeler; Wendy Winckler; Akihiko Yoshizawa; Soyoung Yu; Maureen F. Zakowski; Qunyuan Zhang; David G. Beer; Ignacio I. Wistuba; Mark A. Watson; Levi A. Garraway; Marc Ladanyi; William D. Travis; William Pao; Mark A. Rubin; Stacey B. Gabriel; Richard A. Gibbs; Harold E. Varmus; Richard K. Wilson; Eric S. Lander; Matthew Meyerson

2007-01-01

265

Genome-Wide Epigenetic Modifications in Cancer  

Microsoft Academic Search

Epigenetic alterations in cancer include changes in DNA methylation and associated histone modifications that influence the chromatin states and impact gene expression patterns. Due to recent technological advantages, the scientific community is now obtaining a better picture of the genome-wide epigenetic changes that occur in a cancer genome. These epigenetic alterations are associated with chromosomal instability and changes in transcriptional

Yoon Jung Park; Rainer Claus; Dieter Weichenhan; Christoph Plass

266

Characterizing the cancer genome in lung adenocarcinoma  

Microsoft Academic Search

Somatic alterations in cellular DNA underlie almost all human cancers. The prospect of targeted therapies and the development of high-resolution, genome-wide approaches are now spurring systematic efforts to characterize cancer genomes. Here we report a large-scale project to characterize copy-number alterations in primary lung adenocarcinomas. By analysis of a large collection of tumours (n = 371) using dense single nucleotide

Barbara A. Weir; Michele S. Woo; Gad Getz; Sven Perner; Li Ding; Rameen Beroukhim; William M. Lin; Michael A. Province; Aldi Kraja; Laura A. Johnson; Kinjal Shah; Mitsuo Sato; Roman K. Thomas; Justine A. Barletta; Ingrid B. Borecki; Stephen Broderick; Andrew C. Chang; Derek Y. Chiang; Lucian R. Chirieac; Jeonghee Cho; Yoshitaka Fujii; Adi F. Gazdar; Thomas Giordano; Heidi Greulich; Megan Hanna; Bruce E. Johnson; Mark G. Kris; Alex Lash; Ling Lin; Neal Lindeman; Elaine R. Mardis; John D. McPherson; John D. Minna; Margaret B. Morgan; Mark Nadel; Mark B. Orringer; John R. Osborne; Brad Ozenberger; Alex H. Ramos; James Robinson; Jack A. Roth; Valerie Rusch; Hidefumi Sasaki; Frances Shepherd; Carrie Sougnez; Margaret R. Spitz; Ming-Sound Tsao; David Twomey; Roel G. W. Verhaak; George M. Weinstock; David A. Wheeler; Wendy Winckler; Akihiko Yoshizawa; Soyoung Yu; Maureen F. Zakowski; Qunyuan Zhang; David G. Beer; Ignacio I. Wistuba; Mark A. Watson; Levi A. Garraway; Marc Ladanyi; William D. Travis; William Pao; Mark A. Rubin; Stacey B. Gabriel; Richard A. Gibbs; Harold E. Varmus; Richard K. Wilson; Eric S. Lander; Matthew Meyerson

2007-01-01

267

Whole-exome targeted sequencing of the uncharacterized pine genome.  

PubMed

The large genome size of many species hinders the development and application of genomic tools to study them. For instance, loblolly pine (Pinus taeda L.), an ecologically and economically important conifer, has a large and yet uncharacterized genome of 21.7 Gbp. To characterize the pine genome, we performed exome capture and sequencing of 14 729 genes derived from an assembly of expressed sequence tags. Efficiency of sequence capture was evaluated and shown to be similar across samples with increasing levels of complexity, including haploid cDNA, haploid genomic DNA and diploid genomic DNA. However, this efficiency was severely reduced for probes that overlapped multiple exons, presumably because intron sequences hindered probe:exon hybridizations. Such regions could not be entirely avoided during probe design, because of the lack of a reference sequence. To improve the throughput and reduce the cost of sequence capture, a method to multiplex the analysis of up to eight samples was developed. Sequence data showed that multiplexed capture was reproducible among 24 haploid samples, and can be applied for high-throughput analysis of targeted genes in large populations. Captured sequences were de novo assembled, resulting in 11 396 expanded and annotated gene models, significantly improving the knowledge about the pine gene space. Interspecific capture was also evaluated with over 98% of all probes designed from P. taeda that were efficient in sequence capture, were also suitable for analysis of the related species Pinus elliottii Engelm. PMID:23551702

Neves, Leandro G; Davis, John M; Barbazuk, William B; Kirst, Matias

2013-05-07

268

Identification of Optimum Sequencing Depth Especially for De Novo Genome Assembly of Small Genomes Using Next Generation Sequencing Data  

PubMed Central

Next Generation Sequencing (NGS) is a disruptive technology that has found widespread acceptance in the life sciences research community. The high throughput and low cost of sequencing have encouraged researchers to undertake ambitious genomic projects, especially in de novo genome sequencing. Currently, NGS systems generate sequence data as short reads, and de novo genome assembly using these short reads is computationally very intensive. Due to lower cost of sequencing and higher throughput, NGS systems now provide the ability to sequence genomes at high depth. However, currently no report is available highlighting the impact of high sequence depth on genome assembly using real data sets and multiple assembly algorithms. Recently, some studies have evaluated the impact of sequence coverage, error rate and average read length on genome assembly using multiple assembly algorithms; however, these evaluations were performed using simulated datasets. One limitation of using simulated datasets is that variables such as error rates, read length and coverage, which are known to impact genome assembly, are carefully controlled. Hence, this study was undertaken to identify the minimum depth of sequencing required for de novo assembly for different sized genomes using graph based assembly algorithms and real datasets. Illumina reads for E. coli (4.6 Mb), S. kudriavzevii (11.18 Mb) and C. elegans (100 Mb) were assembled using SOAPdenovo, Velvet, ABySS, Meraculous and IDBA-UD. Our analysis shows that 50X is the optimum read depth for assembling these genomes using all assemblers except Meraculous, which requires 100X read depth. Moreover, our analysis shows that de novo assembly from 50X read data requires only 6–40 GB RAM depending on the genome size and assembly algorithm used. We believe that this information can be extremely valuable for researchers in designing experiments and multiplexing, which will enable optimum utilization of sequencing as well as analysis resources.

Desai, Aarti; Marwah, Veer Singh; Yadav, Akshay; Jha, Vineet; Dhaygude, Kishor; Bangar, Ujwala; Kulkarni, Vivek; Jere, Abhay

2013-01-01
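
Record 268 reports read depths (50X, 100X) for genomes of different sizes. Read depth is commonly defined as the number of reads times the read length divided by the genome size, and the sketch below applies that relationship to the three genome sizes quoted in the abstract, reading its size units as millions of bases. The 100 bp read length is an assumption made purely for illustration; the abstract does not state one. Function names are hypothetical.

```python
import math


def depth_from_reads(n_reads: int, read_length_bp: int, genome_size_bp: int) -> float:
    """Average depth = total sequenced bases / genome size."""
    return n_reads * read_length_bp / genome_size_bp


def reads_for_depth(genome_size_bp: int, read_length_bp: int, depth: float) -> int:
    """Smallest number of reads that reaches the requested average depth."""
    return math.ceil(depth * genome_size_bp / read_length_bp)


READ_LENGTH_BP = 100  # assumed read length, not taken from the abstract

# Genome sizes as quoted in the abstract, in base pairs.
genomes = {
    "E. coli": 4_600_000,
    "S. kudriavzevii": 11_180_000,
    "C. elegans": 100_000_000,
}

for name, size in genomes.items():
    n = reads_for_depth(size, READ_LENGTH_BP, 50)
    print(f"{name}: {n:,} reads of {READ_LENGTH_BP} bp for 50X")
```

This is an average over the whole genome; actual per-base depth varies with sequence content and library preparation, so the figures are a planning estimate rather than a guarantee.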

269

Savant: genome browser for high-throughput sequencing data  

PubMed Central

Motivation: The advent of high-throughput sequencing (HTS) technologies has made it affordable to sequence many individuals' genomes. Simultaneously the computational analysis of the large volumes of data generated by the new sequencing machines remains a challenge. While a plethora of tools are available to map the resulting reads to a reference genome, and to conduct primary analysis of the mappings, it is often necessary to visually examine the results and underlying data to confirm predictions and understand the functional effects, especially in the context of other datasets. Results: We introduce Savant, the Sequence Annotation, Visualization and ANalysis Tool, a desktop visualization and analysis browser for genomic data. Savant was developed for visualizing and analyzing HTS data, with special care taken to enable dynamic visualization in the presence of gigabases of genomic reads and references the size of the human genome. Savant supports the visualization of genome-based sequence, point, interval and continuous datasets, and multiple visualization modes that enable easy identification of genomic variants (including single nucleotide polymorphisms, structural and copy number variants), and functional genomic information (e.g. peaks in ChIP-seq data) in the context of genomic annotations. Availability: Savant is freely available at http://compbio.cs.toronto.edu/savant Contact: savant@cs.toronto.edu

Fiume, Marc; Williams, Vanessa; Brook, Andrew; Brudno, Michael

2010-01-01

270

Towards systematic functional characterization of cancer genomes  

Microsoft Academic Search

Whole-genome approaches to identify genetic and epigenetic alterations in cancer genomes have begun to provide new insights into the range of molecular events that occurs in human tumours. Although in some cases this knowledge immediately illuminates a path towards diagnostic or therapeutic implementation, the bewildering lists of mutations in each tumour make it clear that systematic functional approaches are also

Jesse S. Boehm; William C. Hahn

2011-01-01

271

Genome Science: A Video Tour of the Washington University Genome Sequencing Center for High School and Undergraduate Students  

ERIC Educational Resources Information Center

Sequencing of the human genome has ushered in a new era of biology. The technologies developed to facilitate the sequencing of the human genome are now being applied to the sequencing of other genomes. In 2004, a partnership was formed between Washington University School of Medicine Genome Sequencing Center's Outreach Program and Washington…

Flowers, Susan K.; Easter, Carla; Holmes, Andrea; Cohen, Brian; Bednarski, April E.; Mardis, Elaine R.; Wilson, Richard K.; Elgin, Sarah C. R.

2005-01-01

272

The genomic complexity of primary human prostate cancer  

PubMed Central

Prostate cancer is the second most common cause of male cancer deaths in the United States. Here we present the complete sequence of seven primary prostate cancers and their paired normal counterparts. Several tumors contained complex chains of balanced rearrangements that occurred within or adjacent to known cancer genes. Rearrangement breakpoints were enriched near open chromatin, androgen receptor and ERG DNA binding sites in the setting of the ETS gene fusion TMPRSS2-ERG, but inversely correlated with these regions in tumors lacking ETS fusions. This observation suggests a link between chromatin or transcriptional regulation and the genesis of genomic aberrations. Three tumors contained rearrangements that disrupted CADM2, and four harbored events disrupting either PTEN (unbalanced events), a prostate tumor suppressor, or MAGI2 (balanced events), a PTEN interacting protein not previously implicated in prostate tumorigenesis. Thus, genomic rearrangements may arise from transcriptional or chromatin aberrancies to engage prostate tumorigenic mechanisms.

Berger, Michael F.; Lawrence, Michael S.; Demichelis, Francesca; Drier, Yotam; Cibulskis, Kristian; Sivachenko, Andrey Y.; Sboner, Andrea; Esgueva, Raquel; Pflueger, Dorothee; Sougnez, Carrie; Onofrio, Robert; Carter, Scott L.; Park, Kyung; Habegger, Lukas; Ambrogio, Lauren; Fennell, Timothy; Parkin, Melissa; Saksena, Gordon; Voet, Douglas; Ramos, Alex H.; Pugh, Trevor J.; Wilkinson, Jane; Fisher, Sheila; Winckler, Wendy; Mahan, Scott; Ardlie, Kristin; Baldwin, Jennifer; Simons, Jonathan W.; Kitabayashi, Naoki; MacDonald, Theresa Y.; Kantoff, Philip W.; Chin, Lynda; Gabriel, Stacey B.; Gerstein, Mark B.; Golub, Todd R.; Meyerson, Matthew; Tewari, Ashutosh; Lander, Eric S.; Getz, Gad; Rubin, Mark A.; Garraway, Levi A.

2010-01-01

273

Sequencing Initiative at the Norris Cotton Cancer Center  

PubMed Central

The Dartmouth Genomics Shared Resource recently purchased the Ion Torrent Personal Genome Machine (PGM) and the Ion Proton with contributions from the Norris Cotton Cancer Center (NCCC), Geisel School of Medicine and the Institute for Quantitative Biomedical Sciences. The transition to Ion Torrent deep sequencing was relatively smooth and the workflows easily established. In collaboration with the NCCC, we are offering NCCC investigators an initiative to encourage deep sequencing and translational research. Investigators can choose one of two cancer panels: the Ion Torrent hotspot cancer panel (50 genes), and a custom-designed cancer gene panel (541 genes). The 541-cancer gene panel includes the desired genes from every NCCC investigator, which covers a broad spectrum of cancers and signaling pathways. The 541-cancer gene panel was designed using the Haloplex system (Agilent, Santa Clara, CA). We have validated extraction of DNA from both formalin-fixed paraffin-embedded (FFPE) and fresh frozen tissues to offer clinicians and researchers options for sample collection. Data are presented from the hotspot cancer gene panel using DNA obtained from FFPE and frozen breast cancer tissues.

Shipman, S.; Trask, H.; Lytle, C.; Taylor, W.; Moore, J.; Tomlinson, C.; Kerley-Hamilton, Joanna

2013-01-01

274

First draft genome sequence of the Japanese eel, Anguilla japonica.  

PubMed

The Japanese eel is a much appreciated research object and very important for Asian aquaculture; however, its genomic resources are still limited. We have used a streamlined bioinformatics pipeline for the de novo assembly of the genome sequence of the Japanese eel from raw Illumina sequence reads. The total assembled genome has a size of 1.15 Gbp, which is divided over 323,776 scaffolds with an N50 of 52,849 bp, a minimum scaffold size of 200 bp and a maximum scaffold size of 1.14 Mbp. Direct comparison of a representative set of scaffolds revealed that all the Hox genes and their intergenic distances are almost perfectly conserved between the European and the Japanese eel. The first draft genome sequence of an organism strongly catalyzes research progress in multiple fields. Therefore, the Japanese eel genome sequence will provide a rich resource of data for all scientists working on this important fish species. PMID:23026207
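The N50 quoted in the abstract above is a standard contiguity statistic: the length of the scaffold at which, going from longest to shortest, the running total first reaches half of the assembly size. A minimal sketch of that calculation follows. The function name `n50` and the toy lengths are illustrative assumptions, not taken from the eel assembly or its pipeline.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold lengths (in bp).

    N50 is the length L such that scaffolds of length >= L together
    contain at least half of all assembled bases.
    """
    if not lengths:
        raise ValueError("need at least one scaffold length")
    ordered = sorted(lengths, reverse=True)  # longest scaffold first
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical toy lengths for illustration only (not real eel scaffolds).
toy_scaffolds = [100, 200, 300, 400, 500]
print(n50(toy_scaffolds))  # 500 + 400 = 900 >= 750, so this prints 400
```

Some tools apply a minimum-length cutoff before computing the statistic (the abstract reports a minimum scaffold size of 200 bp), so values are only comparable when the same cutoff is used.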

Henkel, Christiaan V; Dirks, Ron P; de Wijze, Daniëlle L; Minegishi, Yuki; Aoyama, Jun; Jansen, Hans J; Turner, Ben; Knudsen, Bjarne; Bundgaard, Martin; Hvam, Kenneth Lyneborg; Boetzer, Marten; Pirovano, Walter; Weltzien, Finn-Arne; Dufour, Sylvie; Tsukamoto, Katsumi; Spaink, Herman P; van den Thillart, Guido E E J M

2012-09-29

275

Reference genome sequence of the model plant Setaria  

SciTech Connect

We generated a high-quality reference genome sequence for foxtail millet (Setaria italica). The ~400-Mb assembly covers ~80% of the genome and >95% of the gene space. The assembly was anchored to a 992-locus genetic map and was annotated by comparison with >1.3 million expressed sequence tag reads. We produced more than 580 million RNA-Seq reads to facilitate expression analyses. We also sequenced Setaria viridis, the ancestral wild relative of S. italica, and identified regions of differential single-nucleotide polymorphism density, distribution of transposable elements, small RNA content, chromosomal rearrangement and segregation distortion. The genus Setaria includes natural and cultivated species that demonstrate a wide capacity for adaptation. The genetic basis of this adaptation was investigated by comparing five sequenced grass genomes. We also used the diploid Setaria genome to evaluate the ongoing genome assembly of a related polyploid, switchgrass (Panicum virgatum).

Bennetzen, Jeffrey L [ORNL; Yang, Xiaohan [ORNL; Ye, Chuyu [ORNL; Tuskan, Gerald A [ORNL

2012-01-01

276

Reference genome sequence of the model plant Setaria  

SciTech Connect

We generated a high-quality reference genome sequence for foxtail millet (Setaria italica). The ~400-Mb assembly covers ~80% of the genome and >95% of the gene space. The assembly was anchored to a 992-locus genetic map and was annotated by comparison with >1.3 million expressed sequence tag reads. We produced more than 580 million RNA-Seq reads to facilitate expression analyses. We also sequenced Setaria viridis, the ancestral wild relative of S. italica, and identified regions of differential single-nucleotide polymorphism density, distribution of transposable elements, small RNA content, chromosomal rearrangement and segregation distortion. The genus Setaria includes natural and cultivated species that demonstrate a wide capacity for adaptation. The genetic basis of this adaptation was investigated by comparing five sequenced grass genomes. We also used the diploid Setaria genome to evaluate the ongoing genome assembly of a related polyploid, switchgrass (Panicum virgatum).

Bennetzen, Jeffrey L [ORNL; Schmutz, Jeremy [Hudson Alpha Institute of Biotechnology; Wang, Hao [University of Georgia, Athens, GA; Percifield, Ryan [University of Georgia, Athens, GA; Hawkins, Jennifer [University of Georgia, Athens, GA; Pontaroli, Ana C. [University of Georgia, Athens, GA; Estep, Matt [University of Georgia, Athens, GA; Feng, Liang [University of Georgia, Athens, GA; Vaughn, Justin N [ORNL; Grimwood, Jane [Hudson Alpha Institute of Biotechnology; Jenkins, Jerry [Hudson Alpha Institute of Biotechnology; Barry, Kerrie [U.S. Department of Energy, Joint Genome Institute; Lindquist, Erika [U.S. Department of Energy, Joint Genome Institute; Hellsten, Uffe [U.S. Department of Energy, Joint Genome Institute; Deshpande, Shweta [U.S. Department of Energy, Joint Genome Institute; Wang, Xuewen [University of Georgia, Athens, GA; Wu, Xiaomei [University of Georgia, Athens, GA; Mitros, Therese [University of California, Berkeley; Triplett, Jimmy [University of Missouri, St. Louis; Yang, Xiaohan [ORNL; Ye, Chuyu [ORNL; Mauro-Herrera, Margarita [Oklahoma State University; Wang, Lin [Cornell University; Li, Pinghua [Cornell University; Sharma, Manoj [University of California, Davis; Sharma, Rita [University of California, Davis; Ronald, Pamela [University of California, Davis; Panaud, Olivier [Universite de Perpignan, Perpignan, France; Kellogg, Elizabeth A. [University of Missouri, St. Louis; Brutnell, Thomas P. [Cornell University; Doust, Andrew N. [Oklahoma State University; Tuskan, Gerald A [ORNL; Rokhsar, Daniel [U.S. Department of Energy, Joint Genome Institute; Devos, Katrien M [ORNL

2012-01-01

277

Fuzzy Genome Sequence Assembly for Single and Environmental Genomes  

Microsoft Academic Search

Summary. Traditional methods obtain a microorganism's DNA by culturing it individually. Recent advances in genomics have led to the procurement of DNA of more than one organism from its natural habitat. Indeed, natural microbial communities are often very complex with tens and hundreds of species. Assembling these genomes is a crucial step irrespective of the method of obtaining

Sara Nasser; Adrienne Breland; Frederick C. Harris Jr.; Monica N. Nicolescu; Gregory L. Vert

2009-01-01

278

Comparative Analysis of Rice Genome Sequence to Understand the Molecular Basis of Genome Evolution  

Microsoft Academic Search

Accurate sequencing of the rice genome has ignited a passion for elucidating mechanism for sequence diversity among rice varieties and species, both in protein-coding regions and in genomic regions that are important for chromosome functions. Here, we have shown examples of sequence diversity in genic and non-genic regions. Sequence analysis of chromosome ends has revealed that there is diversity in

Jianzhong Wu; Hiroshi Mizuno; Takuji Sasaki; Takashi Matsumoto

2008-01-01

279

Complete genome sequence of Sulfurospirillum deleyianum type strain (5175T)  

SciTech Connect

Sulfurospirillum deleyianum Schumacher et al. 1993 is the type species of the genus Sulfurospirillum. S. deleyianum is a model organism for studying sulfur reduction and dissimilatory nitrate reduction as energy source for growth. Also, it is a prominent model organism for studying the structural and functional characteristics of the cytochrome c nitrite reductase. Here we describe the features of this organism, together with the complete genome sequence and annotation. This is the first completed genome sequence of the genus Sulfurospirillum. The 2,306,351 bp long genome with its 2291 protein-coding and 52 RNA genes is part of the Genomic Encyclopedia of Bacteria and Archaea project.

Sikorski, Johannes [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany; Lapidus, Alla L. [U.S. Department of Energy, Joint Genome Institute; Copeland, A [U.S. Department of Energy, Joint Genome Institute; Glavina Del Rio, Tijana [U.S. Department of Energy, Joint Genome Institute; Nolan, Matt [U.S. Department of Energy, Joint Genome Institute; Lucas, Susan [U.S. Department of Energy, Joint Genome Institute; Chen, Feng [U.S. Department of Energy, Joint Genome Institute; Tice, Hope [U.S. Department of Energy, Joint Genome Institute; Cheng, Jan-Fang [U.S. Department of Energy, Joint Genome Institute; Saunders, Elizabeth H [Los Alamos National Laboratory (LANL); Bruce, David [Los Alamos National Laboratory (LANL); Goodwin, Lynne A. [Los Alamos National Laboratory (LANL); Pitluck, Sam [U.S. Department of Energy, Joint Genome Institute; Ovchinnikova, Galina [U.S. Department of Energy, Joint Genome Institute; Pati, Amrita [U.S. Department of Energy, Joint Genome Institute; Ivanova, N [U.S. Department of Energy, Joint Genome Institute; Mavromatis, K [U.S. Department of Energy, Joint Genome Institute; Chen, Amy [U.S. Department of Energy, Joint Genome Institute; Palaniappan, Krishna [U.S. Department of Energy, Joint Genome Institute; Chain, Patrick S. G. [Lawrence Livermore National Laboratory (LLNL); Land, Miriam L [ORNL; Hauser, Loren John [ORNL; Chang, Yun-Juan [ORNL; Jeffries, Cynthia [Oak Ridge National Laboratory (ORNL); Detter, J. Chris [U.S. Department of Energy, Joint Genome Institute; Han, Cliff [Los Alamos National Laboratory (LANL); Rohde, Manfred [HZI - Helmholtz Centre for Infection Research, Braunschweig, Germany; Lang, Elke [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany; Spring, Stefan [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany; Goker, Markus [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany; Bristow, James [U.S. Department of Energy, Joint Genome Institute; Eisen, Jonathan [U.S. Department of Energy, Joint Genome Institute; Markowitz, Victor [U.S. Department of Energy, Joint Genome Institute; Hugenholtz, Philip [U.S. Department of Energy, Joint Genome Institute; Kyrpides, Nikos C [U.S. Department of Energy, Joint Genome Institute; Klenk, Hans-Peter [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany

2010-01-01

280

Complete genome sequence of Thermomonospora curvata type strain (B9)  

SciTech Connect

Thermomonospora curvata Henssen 1957 is the type species of the genus Thermomonospora. This genus is of interest because members of this clade are sources of new antibiotics, enzymes, and products with pharmacological activity. In addition, members of this genus participate in the active degradation of cellulose. This is the first complete genome sequence of a member of the family Thermomonosporaceae. Here we describe the features of this organism, together with the complete genome sequence and annotation. The 5,639,016 bp long genome with its 4,985 protein-coding and 76 RNA genes is a part of the Genomic Encyclopedia of Bacteria and Archaea project.

Chertkov, Olga [Los Alamos National Laboratory (LANL); Sikorski, Johannes [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany; Nolan, Matt [Joint Genome Institute, Walnut Creek, California; Lapidus, Alla L. [Joint Genome Institute, Walnut Creek, California; Lucas, Susan [Joint Genome Institute, Walnut Creek, California; Glavina Del Rio, Tijana [Joint Genome Institute, Walnut Creek, California; Tice, Hope [Joint Genome Institute, Walnut Creek, California; Cheng, Jan-Fang [Joint Genome Institute, Walnut Creek, California; Goodwin, Lynne A. [Los Alamos National Laboratory (LANL); Pitluck, Sam [Joint Genome Institute, Walnut Creek, California; Liolios, Konstantinos [Joint Genome Institute, Walnut Creek, California; Ivanova, N [U.S. Department of Energy, Joint Genome Institute; Mavromatis, K [U.S. Department of Energy, Joint Genome Institute; Mikhailova, Natalia [U.S. Department of Energy, Joint Genome Institute; Ovchinnikova, Galina [U.S. Department of Energy, Joint Genome Institute; Pati, Amrita [U.S. Department of Energy, Joint Genome Institute; Chen, Amy [Joint Genome Institute, Walnut Creek, California; Palaniappan, Krishna [Joint Genome Institute, Walnut Creek, California; Ngatchou, Olivier Duplex [HZI - Helmholtz Centre for Infection Research, Braunschweig, Germany; Land, Miriam L [ORNL; Hauser, Loren John [ORNL; Chang, Yun-Juan [ORNL; Jeffries, Cynthia [Oak Ridge National Laboratory (ORNL); Brettin, Thomas S [ORNL; Han, Cliff [Los Alamos National Laboratory (LANL); Detter, J. Chris [Joint Genome Institute, Walnut Creek, California; Rohde, Manfred [HZI - Helmholtz Centre for Infection Research, Braunschweig, Germany; Goker, Markus [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany; Woyke, Tanja [Joint Genome Institute, Walnut Creek, California; Bristow, James [Joint Genome Institute, Walnut Creek, California; Eisen, Jonathan [Joint Genome Institute, Walnut Creek, California; Markowitz, Victor [Joint Genome Institute, Walnut Creek, California; Hugenholtz, Philip [U.S. Department of Energy, Joint Genome Institute; Klenk, Hans-Peter [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany; Kyrpides, Nikos C [Joint Genome Institute, Walnut Creek, California

2011-01-01

281

Complete genome sequence of Spirosoma linguale type strain (1T)  

SciTech Connect

Spirosoma linguale Migula 1894 is the type species of the genus. S. linguale is a free-living and non-pathogenic organism, known for its peculiar ringlike and horseshoe-shaped cell morphology. Here we describe the features of this organism, together with the complete genome sequence and annotation. This is only the third completed genome sequence of a member of the family Cytophagaceae. The 8,491,258 bp long genome with its eight plasmids, 7,069 protein-coding and 60 RNA genes is part of the Genomic Encyclopedia of Bacteria and Archaea project.

Lail, Kathleen [U.S. Department of Energy, Joint Genome Institute; Sikorski, Johannes [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany; Saunders, Elizabeth H [Los Alamos National Laboratory (LANL); Lapidus, Alla L. [U.S. Department of Energy, Joint Genome Institute; Glavina Del Rio, Tijana [U.S. Department of Energy, Joint Genome Institute; Copeland, A [U.S. Department of Energy, Joint Genome Institute; Tice, Hope [U.S. Department of Energy, Joint Genome Institute; Cheng, Jan-Fang [U.S. Department of Energy, Joint Genome Institute; Lucas, Susan [U.S. Department of Energy, Joint Genome Institute; Nolan, Matt [U.S. Department of Energy, Joint Genome Institute; Bruce, David [Los Alamos National Laboratory (LANL); Goodwin, Lynne A. [Los Alamos National Laboratory (LANL); Pitluck, Sam [U.S. Department of Energy, Joint Genome Institute; Ivanova, N [U.S. Department of Energy, Joint Genome Institute; Mavromatis, K [U.S. Department of Energy, Joint Genome Institute; Ovchinnikova, Galina [U.S. Department of Energy, Joint Genome Institute; Pati, Amrita [U.S. Department of Energy, Joint Genome Institute; Chen, Amy [U.S. Department of Energy, Joint Genome Institute; Palaniappan, Krishna [U.S. Department of Energy, Joint Genome Institute; Land, Miriam L [ORNL; Hauser, Loren John [ORNL; Chang, Yun-Juan [ORNL; Jeffries, Cynthia [Oak Ridge National Laboratory (ORNL); Chain, Patrick S. G. [Lawrence Livermore National Laboratory (LLNL); Detter, J. Chris [U.S. Department of Energy, Joint Genome Institute; Schutze, Andrea [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany; Rohde, Manfred [HZI - Helmholtz Centre for Infection Research, Braunschweig, Germany; Tindall, Brian [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany; Goker, Markus [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany; Bristow, James [U.S. Department of Energy, Joint Genome Institute; Eisen, Jonathan [U.S. Department of Energy, Joint Genome Institute; Markowitz, Victor [U.S. Department of Energy, Joint Genome Institute; Hugenholtz, Philip [U.S. Department of Energy, Joint Genome Institute; Kyrpides, Nikos C [U.S. Department of Energy, Joint Genome Institute; Klenk, Hans-Peter [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany; Chen, Feng [U.S. Department of Energy, Joint Genome Institute

2010-01-01

282

Complete genome sequence of Gordonia bronchialis type strain (3410T)  

PubMed Central

Gordonia bronchialis Tsukamura 1971 is the type species of the genus. G. bronchialis is a human-pathogenic organism that has been isolated from a large variety of human tissues. Here we describe the features of this organism, together with the complete genome sequence and annotation. This is the first completed genome sequence of the family Gordoniaceae. The 5,290,012 bp long genome with its 4,944 protein-coding and 55 RNA genes is part of the Genomic Encyclopedia of Bacteria and Archaea project.

Ivanova, Natalia; Sikorski, Johannes; Jando, Marlen; Lapidus, Alla; Nolan, Matt; Lucas, Susan; Del Rio, Tijana Glavina; Tice, Hope; Copeland, Alex; Cheng, Jan-Fang; Chen, Feng; Bruce, David; Goodwin, Lynne; Pitluck, Sam; Mavromatis, Konstantinos; Ovchinnikova, Galina; Pati, Amrita; Chen, Amy; Palaniappan, Krishna; Land, Miriam; Hauser, Loren; Chang, Yun-Juan; Jeffries, Cynthia D.; Chain, Patrick; Saunders, Elizabeth; Han, Cliff; Detter, John C.; Brettin, Thomas; Rohde, Manfred; Goker, Markus; Bristow, Jim; Eisen, Jonathan A.; Markowitz, Victor; Hugenholtz, Philip; Klenk, Hans-Peter; Kyrpides, Nikos C.

2010-01-01

283

Complete genome sequence of Acidimicrobium ferrooxidans type strain (ICPT)  

SciTech Connect

Acidimicrobium ferrooxidans (Clark and Norris 1996) is the sole and type species of the genus, which until recently was the only genus within the actinobacterial family Acidimicrobiaceae and in the order Acidimicrobiales. Rapid oxidation of iron pyrite during autotrophic growth in the absence of an enhanced CO2 concentration is characteristic for A. ferrooxidans. Here we describe the features of this organism, together with the complete genome sequence and annotation. This is the first complete genome sequence of the order Acidimicrobiales, and the 2,158,157 bp long single replicon genome with its 2038 protein coding and 54 RNA genes is part of the Genomic Encyclopedia of Bacteria and Archaea project.

Clum, Alicia; Nolan, Matt; Lang, Elke; Glavina Del Rio, Tijana; Tice, Hope; Copeland, Alex; Cheng, Jan-Fang; Lucas, Susan; Chen, Feng; Bruce, David; Goodwin, Lynne; Pitluck, Sam; Ivanova, Natalia; Mavrommatis, Konstantinos; Mikhailova, Natalia; Pati, Amrita; Chen, Amy; Palaniappan, Krishna; Goker, Markus; Spring, Stefan; Land, Miriam; Hauser, Loren; Chang, Yun-Juan; Jefferies, Cynthia C.; Chain, Patrick; Bristow, James; Eisen, Jonathan A.; Markowitz, Victor; Hugenholtz, Philip; Kyrpides, Nikos C.; Klenk, Hans-Peter; Lapidus, Alla

2009-05-20

284

Isolation and bioinformatics analysis of differentially methylated genomic fragments in human gastric cancer  

Microsoft Academic Search

AIM: To isolate and analyze the DNA sequences which are methylated differentially between gastric cancer and normal gastric mucosa. METHODS: The differentially methylated DNA sequences between gastric cancer and normal gastric mucosa were isolated by methylation-sensitive representational difference analysis (MS-RDA). Similarities between the separated fragments and the human genomic DNA were analyzed with Basic Local Alignment Search Tool (BLAST). RESULTS:

Ai-Jun Liao; Qi Su; Xun Wang; Bin Zeng; Wei Shi

285

Curated list of prokaryote viruses with fully sequenced genomes  

Microsoft Academic Search

Genome sequencing is of enormous importance for classification of prokaryote viruses and for understanding the evolution of these viruses. This survey covers 284 sequenced viruses for which a full description has been published and for which the morphology is known. This corresponds to 219 (4%) of tailed and 75 (36%) of tailless viruses of prokaryotes. The number of sequenced tailless

Hans-W. Ackermann; Andrew M. Kropinski

2007-01-01

286

Cataloging Coding Sequence Variations in Human Genome Databases  

Microsoft Academic Search

Background: With the recent growth of information on sequence variations in the human genome, predictions regarding the functional effects and relevance to disease phenotypes of coding sequence variations are becoming increasingly important. The aims of this study were to catalog protein-coding sequence variations (CVs) occurring in genetic variation databases and to use bioinformatic programs to analyze CVs. In addition, we aim

Hong-Hee Won; Hee-Jin Kim; Kyung-A. Lee; Jong-Won Kim; Cecile Fairhead

2008-01-01

287

Indexing Huge Genome Sequences for Solving Various Problems  

Microsoft Academic Search

As genome sequence databases grow, indexing the sequences for fast queries becomes more important. Suffix trees and suffix arrays are used for simple queries. However, they are not suitable for complicated queries over huge amounts of sequence data because the indices are stored on disk, which has slow access speed. We propose storing the

Kunihiko Sadakane; Tetsuo Shibuya

2001-01-01

288

Complete genome sequence of Thioalkalivibrio sp. K90mix.  

PubMed

Thioalkalivibrio sp. K90mix is an obligately chemolithoautotrophic, natronophilic sulfur-oxidizing bacterium (SOxB) belonging to the family Ectothiorhodospiraceae within the Gammaproteobacteria. The strain was isolated from a mixture of sediment samples obtained from different soda lakes located in the Kulunda Steppe (Altai, Russia) based on its extreme potassium carbonate tolerance as an enrichment method. Here we report the complete genome sequence of strain K90mix and its annotation. The genome was sequenced within the Joint Genome Institute Community Sequencing Program, because of its relevance to the sustainable removal of sulfide from wastewater and gas streams. PMID:22675584

Muyzer, Gerard; Sorokin, Dimitry Y; Mavromatis, Konstantinos; Lapidus, Alla; Foster, Brian; Sun, Hui; Ivanova, Natalia; Pati, Amrita; D'haeseleer, Patrik; Woyke, Tanja; Kyrpides, Nikos C

2011-12-23

289

Management of incidental findings in clinical genomic sequencing.  

PubMed

Genomic sequencing is becoming accurate, fast, and inexpensive, and is rapidly being incorporated into clinical practice. Incidental findings, which result in large numbers from genomic sequencing, are a potential barrier to the utility of this new technology due to their high prevalence and the lack of evidence or guidelines available to guide their clinical interpretation. This unit reviews the definition, classification, and management of incidental findings from genomic sequencing. The unit focuses on the clinical aspects of handling incidental findings, with an emphasis on the key role of clinical context in defining incidental findings and determining their clinical relevance and utility. PMID:23595601

Krier, Joel B; Green, Robert C

2013-01-01

290

Complete genome sequence of Staphylothermus hellenicus P8T  

SciTech Connect

Staphylothermus hellenicus belongs to the order Desulfurococcales within the archaeal phylum Crenarchaeota. Strain P8T is the type strain of the species and was isolated from a shallow hydrothermal vent system at Palaeochori Bay, Milos, Greece. It is a hyperthermophilic, anaerobic heterotroph. Here we describe the features of this organism together with the complete genome sequence and annotation. The 1,580,347 bp genome with its 1,668 protein-coding and 48 RNA genes was sequenced as part of a DOE Joint Genome Institute (JGI) Laboratory Sequencing Program (LSP) project.

Anderson, Iain [U.S. Department of Energy, Joint Genome Institute; Wirth, Reinhard [Universitat Regensburg, Regensburg, Germany; Lucas, Susan [U.S. Department of Energy, Joint Genome Institute; Copeland, A [U.S. Department of Energy, Joint Genome Institute; Lapidus, Alla L. [U.S. Department of Energy, Joint Genome Institute; Cheng, Jan-Fang [U.S. Department of Energy, Joint Genome Institute; Goodwin, Lynne A. [Los Alamos National Laboratory (LANL); Pitluck, Sam [U.S. Department of Energy, Joint Genome Institute; Davenport, Karen W. [Los Alamos National Laboratory (LANL); Detter, J. Chris [U.S. Department of Energy, Joint Genome Institute; Han, Cliff [Los Alamos National Laboratory (LANL); Tapia, Roxanne [Los Alamos National Laboratory (LANL); Land, Miriam L [ORNL; Hauser, Loren John [ORNL; Pati, Amrita [U.S. Department of Energy, Joint Genome Institute; Mikhailova, Natalia [U.S. Department of Energy, Joint Genome Institute; Woyke, Tanja [U.S. Department of Energy, Joint Genome Institute; Klenk, Hans-Peter [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany; Kyrpides, Nikos C [U.S. Department of Energy, Joint Genome Institute; Ivanova, N [U.S. Department of Energy, Joint Genome Institute

2011-01-01

291

Complete genome sequence of Staphylothermus hellenicus P8.  

PubMed

Staphylothermus hellenicus belongs to the order Desulfurococcales within the archaeal phylum Crenarchaeota. Strain P8(T) is the type strain of the species and was isolated from a shallow hydrothermal vent system at Palaeochori Bay, Milos, Greece. It is a hyperthermophilic, anaerobic heterotroph. Here we describe the features of this organism together with the complete genome sequence and annotation. The 1,580,347 bp genome with its 1,668 protein-coding and 48 RNA genes was sequenced as part of a DOE Joint Genome Institute (JGI) Laboratory Sequencing Program (LSP) project. PMID:22180806

Anderson, Iain; Wirth, Reinhard; Lucas, Susan; Copeland, Alex; Lapidus, Alla; Cheng, Jan-Fang; Goodwin, Lynne; Pitluck, Samuel; Davenport, Karen; Detter, John C; Han, Cliff; Tapia, Roxanne; Land, Miriam; Hauser, Loren; Pati, Amrita; Mikhailova, Natalia; Woyke, Tanja; Klenk, Hans-Peter; Kyrpides, Nikos; Ivanova, Natalia

2011-09-23

292

Complete genome sequence of Thioalkalivibrio sp. K90mix  

PubMed Central

Thioalkalivibrio sp. K90mix is an obligately chemolithoautotrophic, natronophilic sulfur-oxidizing bacterium (SOxB) belonging to the family Ectothiorhodospiraceae within the Gammaproteobacteria. The strain was isolated from a mixture of sediment samples obtained from different soda lakes located in the Kulunda Steppe (Altai, Russia) based on its extreme potassium carbonate tolerance as an enrichment method. Here we report the complete genome sequence of strain K90mix and its annotation. The genome was sequenced within the Joint Genome Institute Community Sequencing Program, because of its relevance to the sustainable removal of sulfide from wastewater and gas streams.

Muyzer, Gerard; Sorokin, Dimitry Y.; Mavromatis, Konstantinos; Lapidus, Alla; Foster, Brian; Sun, Hui; Ivanova, Natalia; Pati, Amrita; D'haeseleer, Patrik; Woyke, Tanja; Kyrpides, Nikos C.

2011-01-01

293

Exploring Microbial Genome Sequences to Identify Protein Families on the Grid.  

National Technical Information Service (NTIS)

The analysis of microbial genome sequences can identify protein families that provide potential drug targets for new antibiotics. With the rapid accumulation of newly sequenced genomes, the analysis of complete genome sequences has become a computationall...

Y. Sun; A. Wipat; M. Pocock; P. Lee; K. Flanagan; J. Worthington

2005-01-01

294

Next-generation sequencing reveals the secrets of the chronic lymphocytic leukemia genome.  

PubMed

The study of the detailed molecular history of cancer development is one of the most promising techniques to understand and fight this diverse and prevalent disease. Unfortunately, this history is as diverse as cancer itself. Therefore, even with next-generation sequencing techniques, it is not easy to distinguish significant (driver) from random (passenger) events. The International Cancer Genome Consortium (ICGC) was formed to solve this fundamental issue by coordinating the sequencing of samples from 50 different cancer types and/or sub-types that are of clinical and societal importance. The contribution of Spain in this consortium has been focused on chronic lymphocytic leukemia (CLL). This approach has unveiled new and unexpected events in the development of CLL. In this review, we introduce the approaches utilized by the consortium for the study of the CLL genome and discuss the recent results and future perspectives of this work. PMID:22911550

Ramsay, Andrew J; Martínez-Trillos, Alejandra; Jares, Pedro; Rodríguez, David; Kwarciak, Agnieszka; Quesada, Víctor

2012-08-22

295

Exon discovery by genomic sequence alignment  

Microsoft Academic Search

Motivation: During evolution, functional regions in genomic sequences tend to be more highly conserved than randomly mutating 'junk DNA', so local sequence similarity often indicates biological functionality. This fact can be used to identify functional elements in large eukaryotic DNA sequences by cross-species sequence comparison. In recent years, several gene-prediction methods have been proposed that work by comparing anonymous

Burkhard Morgenstern; Oliver Rinner; Saïd Abdeddaïm; Dirk Haase; Klaus F. X. Mayer; Andreas W. M. Dress; Hans-werner Mewes

2002-01-01
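
The idea in the abstract above, that local similarity between species hints at function, can be illustrated with a minimal sketch. This is not the authors' method: it assumes a pairwise alignment is already available as two equal-length strings with `-` for gaps, and the function name `conserved_windows`, the window size, and the identity threshold are all hypothetical choices made for illustration.

```python
def conserved_windows(aligned_a, aligned_b, window=10, min_identity=0.8):
    """Return (start, identity) for every window whose identity meets the threshold.

    aligned_a, aligned_b: rows of a pairwise alignment (equal length, '-' = gap).
    A column counts as a match only if both residues are equal and not a gap.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")

    # 1 for a matching, non-gap column; 0 otherwise.
    matches = [int(a == b and a != "-") for a, b in zip(aligned_a, aligned_b)]

    hits = []
    for start in range(len(matches) - window + 1):
        identity = sum(matches[start:start + window]) / window
        if identity >= min_identity:
            hits.append((start, identity))
    return hits


if __name__ == "__main__":
    seq_a = "ACGTACGTAC" + "T" * 10   # toy data, not from the paper
    seq_b = "ACGTACGTAC" + "G" * 10
    for start, identity in conserved_windows(seq_a, seq_b):
        print(start, identity)
```

Windows that pass the threshold are only candidates for functional elements; a real gene-prediction pipeline would add further evidence such as splice signals and reading-frame consistency.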

296

Genome-wide detection of segmental duplications and potential assembly errors in the human genome sequence  

Microsoft Academic Search

BACKGROUND: Previous studies have suggested that recent segmental duplications, which are often involved in chromosome rearrangements underlying genomic disease, account for some 5% of the human genome. We have developed rapid computational heuristics based on BLAST analysis to detect segmental duplications, as well as regions containing potential sequence misassignments in the human genome assemblies. RESULTS: Our analysis of the June

Joseph Cheung; Xavier Estivill; Razi Khaja; Jeffrey R MacDonald; Ken Lau; Lap-Chee Tsui; Stephen W Scherer

2003-01-01

297

Rapid Genome Evolution Revealed by Comparative Sequence Analysis of Orthologous Regions from Four Triticeae Genomes  

Microsoft Academic Search

Bread wheat (Triticum aestivum) is an allohexaploid species, consisting of three subgenomes (A, B, and D). To study the molecular evolution of these closely related genomes, we compared the sequence of a 307-kb physical contig covering the high molecular weight (HMW)-glutenin locus from the A genome of durum wheat (Triticum turgidum, AABB) with the orthologous regions from the B genome

Yong Qiang Gu; Devin Coleman-Derr; Xiuying Kong; Olin D. Anderson

2004-01-01

298

Characterizing and interpreting genetic variation from personal genome sequencing.  

PubMed

Since the completion of the human genome project, there has been enormous progress in the development of novel technologies for DNA sequencing. The advent of next-generation sequencing technologies now makes it possible to sequence an entire human genome in one or a few experiments. As a consequence, several individual human genomes have now been fully sequenced, using different experimental strategies. Although the protocols differ between the various sequencing technologies, the challenges of analyzing the data, calling variation, and interpreting the results are similar for all platforms. Here, we give an overview of the human genome sequencing projects completed to date. The strategies for aligning sequence reads and extracting information about different types of genetic variation from the sequence data are discussed. Identification of structural variation, such as copy number variation and insertion-deletion variants, can be complex, and there are a plethora of algorithms and analysis tools available. We also give an overview of the challenge of interpreting the whole-genome sequence data both from a technical and clinical perspective. PMID:22228021

Johansson, Anna C V; Feuk, Lars

2012-01-01

299

Visualizing associations between genome sequences and gene expression data using genome-mean expression profiles  

Microsoft Academic Search

The combination of genome-wide expression patterns and full genome sequences offers a great opportunity to further our understanding of the mechanisms and logic of transcriptional regulation. Many methods have been described that identify sequence motifs enriched in transcription control regions of genes that share similar gene expression patterns. Here we present an alternative approach that evaluates the transcriptional information contained

Derek Y. Chiang; Patrick O. Brown; Michael B. Eisen

2001-01-01

300

The Arabidopsis lyrata genome sequence and the basis of rapid genome size change  

SciTech Connect

In our manuscript, we present a high-quality genome sequence of the Arabidopsis thaliana relative, Arabidopsis lyrata, produced by dideoxy sequencing. We have performed the usual types of genome analysis (gene annotation, dN/dS studies, etc.), but this is relegated to the Supporting Information. Instead, we focus on what was a major motivation for sequencing this genome, namely to understand how A. thaliana lost half its genome in a few million years and lived to tell the tale. The rather surprising conclusion is that there is not a single genomic feature that accounts for the reduced genome, but that every aspect (centromeres, intergenic regions, transposable elements, gene family number) is affected through hundreds of thousands of cuts. This strongly suggests that overall genome size in itself is what has been under selection, a suggestion that is strongly supported by our demonstration (using population genetics data from A. thaliana) that new deletions seem to be driven to fixation.

Hu, Tina T.; Pattyn, Pedro; Bakker, Erica G.; Cao, Jun; Cheng, Jan-Fang; Clark, Richard M.; Fahlgren, Noah; Fawcett, Jeffrey A.; Grimwood, Jane; Gundlach, Heidrun; Haberer, Georg; Hollister, Jesse D.; Ossowski, Stephan; Ottilar, Robert P.; Salamov, Asaf A.; Schneeberger, Korbinian; Spannagl, Manuel; Wang, Xi; Yang, Liang; Nasrallah, Mikhail E.; Bergelson, Joy; Carrington, James C.; Gaut, Brandon S.; Schmutz, Jeremy; Mayer, Klaus F. X.; Van de Peer, Yves; Grigoriev, Igor V.; Nordborg, Magnus; Weigel, Detlef; Guo, Ya-Long

2011-04-29

301

Assembly of large genomes using second-generation sequencing  

PubMed Central

Second-generation sequencing technology can now be used to sequence an entire human genome in a matter of days and at low cost. Sequence read lengths, initially very short, have rapidly increased since the technology first appeared, and we now are seeing a growing number of efforts to sequence large genomes de novo from these short reads. In this Perspective, we describe the issues associated with short-read assembly, the different types of data produced by second-gen sequencers, and the latest assembly algorithms designed for these data. We also review the genomes that have been assembled recently from short reads and make recommendations for sequencing strategies that will yield a high-quality assembly.

Schatz, Michael C.; Delcher, Arthur L.; Salzberg, Steven L.

2010-01-01

302

Recurrent insertion and duplication generate networks of transposable element sequences in the Drosophila melanogaster genome  

Microsoft Academic Search

BACKGROUND: The recent availability of genome sequences has provided unparalleled insights into the broad-scale patterns of transposable element (TE) sequences in eukaryotic genomes. Nevertheless, the difficulties that TEs pose for genome assembly and annotation have prevented detailed, quantitative inferences about the contribution of TEs to genome sequences. RESULTS: Using a high-resolution annotation of TEs in Release 4 genome sequence, we

Casey M Bergman; Hadi Quesneville; Dominique Anxolabéhère; Michael Ashburner

2006-01-01

303

Coverage bias and sensitivity of variant calling for four whole-genome sequencing technologies.  

PubMed

The emergence of high-throughput, next-generation sequencing technologies has dramatically altered the way we assess genomes in population genetics and in cancer genomics. Currently, there are four commonly used whole-genome sequencing platforms on the market: Illumina's HiSeq2000, Life Technologies' SOLiD 4 and its completely redesigned 5500xl SOLiD, and Complete Genomics' technology. A number of earlier studies have compared a subset of those sequencing platforms or compared those platforms with Sanger sequencing, which is prohibitively expensive for whole genome studies. Here we present a detailed comparison of the performance of all currently available whole genome sequencing platforms, especially regarding their ability to call SNVs and to evenly cover the genome and specific genomic regions. Unlike earlier studies, we base our comparison on four different samples, allowing us to assess the between-sample variation of the platforms. We find a pronounced GC bias in GC-rich regions for Life Technologies' platforms, with Complete Genomics performing best here, while we see the least bias in GC-poor regions for HiSeq2000 and 5500xl. HiSeq2000 gives the most uniform coverage and displays the least sample-to-sample variation. In contrast, Complete Genomics exhibits by far the smallest fraction of bases not covered, while the SOLiD platforms reveal remarkable shortcomings, especially in covering CpG islands. When comparing the performance of the four platforms for calling SNPs, HiSeq2000 and Complete Genomics achieve the highest sensitivity, while the SOLiD platforms show the lowest false positive rate. Finally, we find that integrating sequencing data from different platforms offers the potential to combine the strengths of different technologies. In summary, our results detail the strengths and weaknesses of all four whole-genome sequencing platforms. They indicate application areas that call for a specific sequencing platform and disallow other platforms. This helps to identify the proper sequencing platform for whole genome studies with different application scopes. PMID:23776689

Rieber, Nora; Zapatka, Marc; Lasitschka, Bärbel; Jones, David; Northcott, Paul; Hutter, Barbara; Jäger, Natalie; Kool, Marcel; Taylor, Michael; Lichter, Peter; Pfister, Stefan; Wolf, Stephan; Brors, Benedikt; Eils, Roland

2013-06-11
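
The sensitivity and false-positive comparisons described in the abstract above rest on simple set arithmetic once a trusted reference call set exists. The sketch below is only an illustration under stated assumptions: the abstract does not say how the truth set or the rates were defined, so the function name `snv_call_metrics`, the `(chromosome, position, alternate allele)` key, and the choice to report false positives as a share of all calls are hypothetical.

```python
def snv_call_metrics(called, truth):
    """Compare a platform's SNV calls with a reference ("truth") call set.

    called, truth: sets of (chromosome, position, alt_allele) tuples.
    Returns sensitivity (share of true variants recovered) and the share
    of calls that are not in the truth set.
    """
    true_pos = len(called & truth)    # called and real
    false_pos = len(called - truth)   # called but not real
    false_neg = len(truth - called)   # real but missed

    sensitivity = true_pos / (true_pos + false_neg) if truth else float("nan")
    false_call_fraction = false_pos / len(called) if called else float("nan")
    return {
        "true_positives": true_pos,
        "false_positives": false_pos,
        "false_negatives": false_neg,
        "sensitivity": sensitivity,
        "false_call_fraction": false_call_fraction,
    }


if __name__ == "__main__":
    # Toy data, not taken from the study.
    truth = {("1", 100, "A"), ("1", 200, "T"), ("2", 50, "G"), ("2", 75, "C")}
    called = {("1", 100, "A"), ("1", 200, "T"), ("2", 60, "G")}
    print(snv_call_metrics(called, truth))
```

Running the same function once per platform on the same sample gives directly comparable numbers, which is the kind of side-by-side view the abstract describes.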

304

Prostate Cancer Genomics: Toward a New Understanding  

PubMed Central

Recent genetics and genomics studies of prostate cancer help clarify the genetic basis of this common but complex disease. Genome-wide studies have detected numerous variants associated with disease as well as common gene fusions and expression ‘signatures’ in prostate tumors. Based on these results, some advocate gene-based individualized screening for prostate cancer, although such testing may only be worthwhile to distinguish disease aggressiveness. Lessons learned here provide strategies for further deciphering the genetic causes of prostate cancer and other diseases.

Witte, John S.

2009-01-01

305

Cancer genomics: from discovery science to personalized medicine  

Microsoft Academic Search

Recent advances in genome technologies and the ensuing outpouring of genomic information related to cancer have accelerated the convergence of discovery science and clinical medicine. Successful examples of translating cancer genomics into therapeutics and diagnostics reinforce its potential to make possible personalized cancer medicine. However, the bottlenecks along the path of converting a genome discovery into a tangible clinical endpoint

Jannik N Andersen; P Andrew Futreal; Lynda Chin

2011-01-01

306

[Genomic basis for breast cancer: advances in personalized medicine].  

PubMed

Genomic analysis of breast cancer has allowed the development of new tools for the prediction of recurrence and the response to treatment of this disease. Gene expression profiles allow better tumor classification, identifying tumor subgroups with particular clinical outcomes. New potential molecular targets involved in breast carcinogenesis have also been identified through the analysis of DNA copy number aberrations and microRNA expression patterns. Whole genome association studies have identified genetic variants associated with a higher risk to develop this tumor, providing more information for public health decisions. Progress in DNA sequencing methods will also allow for the analysis of all the genetic alterations present in a tumor. In this review, we describe the current state of genomic research in breast cancer as well as how these findings are being translated into clinical practice, contributing to development of personalized medicine. PMID:19967275

Hidalgo-Miranda, Alfredo; Jiménez-Sánchez, Gerardo

2009-01-01

307

Complete genome sequence of Cellulomonas flavigena type strain (134T)  

SciTech Connect

Cellulomonas flavigena (Kellerman and McBeth 1912) Bergey et al. 1923 is the type species of the genus Cellulomonas of the actinobacterial family Cellulomonadaceae. Members of the genus Cellulomonas are of special interest for their ability to degrade cellulose and hemicellulose, particularly with regard to the use of biomass as an alternative energy source. Here we describe the features of this organism, together with the complete genome sequence, and annotation. This is the first complete genome sequence of a member of the genus Cellulomonas, and next to the human pathogen Tropheryma whipplei the second complete genome sequence within the actinobacterial family Cellulomonadaceae. The 4,123,179 bp long single replicon genome with its 3,735 protein-coding and 53 RNA genes is part of the Genomic Encyclopedia of Bacteria and Archaea project.

Abt, Birte [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany]; Foster, Brian [U.S. Department of Energy, Joint Genome Institute]; Lapidus, Alla L. [U.S. Department of Energy, Joint Genome Institute]; Clum, Alicia [U.S. Department of Energy, Joint Genome Institute]; Sun, Hui [U.S. Department of Energy, Joint Genome Institute]; Pukall, Rudiger [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany]; Lucas, Susan [U.S. Department of Energy, Joint Genome Institute]; Glavina Del Rio, Tijana [U.S. Department of Energy, Joint Genome Institute]; Nolan, Matt [U.S. Department of Energy, Joint Genome Institute]; Tice, Hope [U.S. Department of Energy, Joint Genome Institute]; Cheng, Jan-Fang [U.S. Department of Energy, Joint Genome Institute]; Pitluck, Sam [U.S. Department of Energy, Joint Genome Institute]; Liolios, Konstantinos [U.S. Department of Energy, Joint Genome Institute]; Ivanova, N [U.S. Department of Energy, Joint Genome Institute]; Mavromatis, K [U.S. Department of Energy, Joint Genome Institute]; Ovchinnikova, Galina [U.S. Department of Energy, Joint Genome Institute]; Pati, Amrita [U.S. Department of Energy, Joint Genome Institute]; Goodwin, Lynne A. [Los Alamos National Laboratory (LANL)]; Chen, Amy [U.S. Department of Energy, Joint Genome Institute]; Palaniappan, Krishna [U.S. Department of Energy, Joint Genome Institute]; Land, Miriam L [ORNL]; Hauser, Loren John [ORNL]; Chang, Yun-Juan [ORNL]; Jeffries, Cynthia [Oak Ridge National Laboratory (ORNL)]; Rohde, Manfred [HZI - Helmholtz Centre for Infection Research, Braunschweig, Germany]; Goker, Markus [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany]; Woyke, Tanja [U.S. Department of Energy, Joint Genome Institute]; Bristow, James [U.S. Department of Energy, Joint Genome Institute]; Eisen, Jonathan [U.S. Department of Energy, Joint Genome Institute]; Markowitz, Victor [U.S. Department of Energy, Joint Genome Institute]; Hugenholtz, Philip [U.S. Department of Energy, Joint Genome Institute]; Kyrpides, Nikos C [U.S. Department of Energy, Joint Genome Institute]; Klenk, Hans-Peter [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany]

2010-01-01

308

Complete genome sequence of Haloterrigena turkmenica type strain (4k).  

PubMed

Haloterrigena turkmenica (Zvyagintseva and Tarasov 1987) Ventosa et al. 1999, comb. nov. is the type species of the genus Haloterrigena in the euryarchaeal family Halobacteriaceae. It is of phylogenetic interest because of the yet unclear position of the genera Haloterrigena and Natrinema within the Halobacteriaceae, which created some taxonomic problems historically. H. turkmenica, which was isolated from sulfate saline soil in Turkmenistan, is a relatively fast growing, chemoorganotrophic, carotenoid-containing, extreme halophile, requiring at least 2 M NaCl for growth. Here we describe the features of this organism, together with the complete genome sequence, and annotation. This is the first complete genome sequence of the genus Haloterrigena, but the eighth genome sequence from a member of the family Halobacteriaceae. The 5,440,782 bp genome (including six plasmids) with its 5,287 protein-coding and 63 RNA genes is part of the Genomic Encyclopedia of Bacteria and Archaea project. PMID:21304683

Saunders, Elisabeth; Tindall, Brian J; Fähnrich, Regine; Lapidus, Alla; Copeland, Alex; Del Rio, Tijana Glavina; Lucas, Susan; Chen, Feng; Tice, Hope; Cheng, Jan-Fang; Han, Cliff; Detter, John C; Bruce, David; Goodwin, Lynne; Chain, Patrick; Pitluck, Sam; Pati, Amrita; Ivanova, Natalia; Mavromatis, Konstantinos; Chen, Amy; Palaniappan, Krishna; Land, Miriam; Hauser, Loren; Chang, Yun-Juan; Jeffries, Cynthia D; Brettin, Thomas; Rohde, Manfred; Göker, Markus; Bristow, James; Eisen, Jonathan A; Markowitz, Victor; Hugenholtz, Philip; Klenk, Hans-Peter; Kyrpides, Nikos C

2010-02-28

309

Genome sequencing and analysis of the model grass Brachypodium distachyon  

SciTech Connect

Three subfamilies of grasses, the Ehrhartoideae, Panicoideae and Pooideae, provide the bulk of human nutrition and are poised to become major sources of renewable energy. Here we describe the genome sequence of the wild grass Brachypodium distachyon (Brachypodium), which is, to our knowledge, the first member of the Pooideae subfamily to be sequenced. Comparison of the Brachypodium, rice and sorghum genomes shows a precise history of genome evolution across a broad diversity of the grasses, and establishes a template for analysis of the large genomes of economically important pooid grasses such as wheat. The high-quality genome sequence, coupled with ease of cultivation and transformation, small size and rapid life cycle, will help Brachypodium reach its potential as an important model system for developing new energy and food crops.

Yang, Xiaohan [ORNL]; Kalluri, Udaya C [ORNL]; Tuskan, Gerald A [ORNL]

2010-01-01

310

A comprehensive catalogue of somatic mutations from a human cancer genome  

Microsoft Academic Search

All cancers carry somatic mutations. A subset of these somatic alterations, termed driver mutations, confer selective growth advantage and are implicated in cancer development, whereas the remainder are passengers. Here we have sequenced the genomes of a malignant melanoma and a lymphoblastoid cell line from the same person, providing the first comprehensive catalogue of somatic mutations from an individual cancer.

Erin D. Pleasance; R. Keira Cheetham; Philip J. Stephens; David J. McBride; Sean J. Humphray; Chris D. Greenman; Ignacio Varela; Meng-Lay Lin; Gonzalo R. Ordóñez; Graham R. Bignell; Kai Ye; Julie Alipaz; Markus J. Bauer; David Beare; Adam Butler; Richard J. Carter; Lina Chen; Anthony J. Cox; Sarah Edkins; Paula I. Kokko-Gonzales; Niall A. Gormley; Russell J. Grocock; Christian D. Haudenschild; Matthew M. Hims; Terena James; Mingming Jia; Zoya Kingsbury; Catherine Leroy; John Marshall; Andrew Menzies; Laura J. Mudie; Zemin Ning; Tom Royce; Ole B. Schulz-Trieglaff; Anastassia Spiridou; Lucy A. Stebbings; Lukasz Szajkowski; Jon Teague; David Williamson; Lynda Chin; Mark T. Ross; Peter J. Campbell; David R. Bentley; P. Andrew Futreal; Michael R. Stratton

2010-01-01

311

Genome analysis: A new approach for visualization of sequence organization in genomes  

Microsoft Academic Search

In this article we describe and demonstrate the versatility of a computer program, GENOME MAPPING, that uses interactive graphics and runs on an IRIS workstation. The program helps to visualize as well as analyse global and local patterns of genomic DNA sequences. It was developed keeping in mind the requirements of the human genome sequencing programme, which requires rapid analysis

Pradeep Kumar Burma; Alok Raj; Jayant K. Deb; Samir K. Brahmachari

1992-01-01

312

Mitochondrial genome sequences and comparative genomics of Phytophthora ramorum and P. sojae  

Microsoft Academic Search

The sequences of the mitochondrial genomes of the oomycetes Phytophthora ramorum and P. sojae were determined during the course of complete nuclear genome sequencing (Tyler et al., Science, 313:1261, 2006). Both mitochondrial genomes are circular mapping, with sizes of 39,314 bp for P. ramorum and 42,977 bp for P. sojae. Each contains a total of 37 recognizable protein-encoding genes, 26 or 25 tRNAs (P.

Frank N. Martin; Douda Bensasson; Brett M. Tyler; Jeffrey L. Boore

2007-01-01

313

Complete genome sequence and comparative genomic analysis of an emerging human pathogen, serotype V Streptococcus agalactiae  

Microsoft Academic Search

The 2,160,267 bp genome sequence of Streptococcus agalactiae, the leading cause of bacterial sepsis, pneumonia, and meningitis in neonates in the U.S. and Europe, is predicted to encode 2,175 genes. Genome comparisons among S. agalactiae, Streptococcus pneumoniae, Streptococcus pyogenes, and the other completely sequenced genomes identified genes specific to the streptococci and to S. agalactiae. These in silico analyses, combined

Hervé Tettelin; Vega Masignani; Michael J. Cieslewicz; Jonathan A. Eisen; Scott Peterson; Michael R. Wessels; Ian T. Paulsen; Karen E. Nelson; Immaculada Margarit; Timothy D. Read; Lawrence C. Madoff; Alex M. Wolf; Maureen J. Beanan; Lauren M. Brinkac; Sean C. Daugherty; Robert T. Deboy; A. Scott Durkin; James F. Kolonay; Ramana Madupu; Matthew R. Lewis; Diana Radune; Nadezhda B. Fedorova; David Scanlan; Hoda Khouri; Stephanie Mulligan; Heather A. Carty; Robin T. Cline; Susan E. van Aken; John Gill; Maria Scarselli; Marirosa Mora; Emilia T. Iacobini; Cecilia Brettoni; Giuliano Galli; Massimo Mariani; Filippo Vegni; Domenico Maione; Daniela Rinaudo; Rino Rappuoli; John L. Telford; Dennis L. Kasper; Guido Grandi; Claire M. Fraser

2002-01-01

314

Genome Science and Personalized Cancer Treatment  

SciTech Connect

Summer Lecture Series 2009: Results from the Human Genome Project are enabling scientists to understand how individual cancers form and progress. This information, when combined with newly developed drugs, can optimize the treatment of individual cancers. Joe Gray, director of Berkeley Lab's Life Sciences Division and Associate Laboratory Director for Life and Environmental Sciences, will focus on this approach, its promise, and its current roadblocks, particularly with regard to breast cancer.

Gray, Joe

2009-08-04

315

Next-Generation Sequence Analysis of Cancer Xenograft Models  

PubMed Central

Next-generation sequencing (NGS) studies in cancer are limited by the amount, quality and purity of tissue samples. In this situation, primary xenografts have proven useful preclinical models. However, the presence of mouse-derived stromal cells represents a technical challenge to their use in NGS studies. We examined this problem in an established primary xenograft model of small cell lung cancer (SCLC), a malignancy often diagnosed from small biopsy or needle aspirate samples. Using an in silico strategy that assigns reads according to species-of-origin, we prospectively compared NGS data from primary xenograft models with matched cell lines and with published datasets. We show here that low-coverage whole-genome analysis demonstrated remarkable concordance between published genome data and internal controls, despite the presence of mouse genomic DNA. Exome capture sequencing revealed that this enrichment procedure was highly species-specific, with less than 4% of reads aligning to the mouse genome. Human-specific expression profiling with RNA-Seq replicated array-based gene expression experiments, whereas mouse-specific transcript profiles correlated with published datasets from human cancer stroma. We conclude that primary xenografts represent a useful platform for complex NGS analysis in cancer research for tumours with limited sample resources, or those with prominent stromal cell populations.

Rossello, Fernando J.; Tothill, Richard W.; Britt, Kara; Marini, Kieren D.; Falzon, Jeanette; Thomas, David M.; Peacock, Craig D.; Marchionni, Luigi; Li, Jason; Bennett, Samara; Tantoso, Erwin; Brown, Tracey; Chan, Philip; Martelotto, Luciano G.; Watkins, D. Neil

2013-01-01
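
One simple way to assign reads by species of origin, as the abstract above describes in general terms, is to align each read to both reference genomes and keep the better-scoring assignment. The sketch below assumes that approach; the abstract does not give the authors' actual rule, and the names `assign_species` and `summarize`, the score convention (higher is better, `None` means no alignment), and the `margin` parameter are hypothetical.

```python
from collections import Counter


def assign_species(human_score, mouse_score, margin=0):
    """Label one read from its best alignment score against each genome.

    Scores are "higher is better"; None means the read did not align.
    A read is "ambiguous" when neither score beats the other by more
    than `margin`.
    """
    if human_score is None and mouse_score is None:
        return "unaligned"
    if mouse_score is None:
        return "human"
    if human_score is None:
        return "mouse"
    if human_score - mouse_score > margin:
        return "human"
    if mouse_score - human_score > margin:
        return "mouse"
    return "ambiguous"


def summarize(reads, margin=0):
    """Count labels over (read_id, human_score, mouse_score) records."""
    return Counter(assign_species(h, m, margin) for _, h, m in reads)


if __name__ == "__main__":
    # Toy scores, not data from the study.
    reads = [
        ("r1", 60, 20),
        ("r2", 10, 55),
        ("r3", 40, 40),
        ("r4", None, None),
        ("r5", 30, None),
    ]
    print(summarize(reads))
```

The share of reads labelled "mouse" gives a quick estimate of stromal contamination, comparable in spirit to the "less than 4% of reads aligning to the mouse genome" figure quoted for exome capture.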

316

High-quality genome sequence of Pichia pastoris CBS7435.  

PubMed

The methylotrophic yeast Pichia pastoris (Komagataella phaffii) CBS7435 is the parental strain of commonly used P. pastoris recombinant protein production hosts making it well suited for improving the understanding of associated genomic features. Here, we present a 9.35 Mbp high-quality genome sequence of P. pastoris CBS7435 established by a combination of 454 and Illumina sequencing. An automatic annotation of the genome sequence yielded 5007 protein-coding genes, 124 tRNAs and 29 rRNAs. Moreover, we report the complete DNA sequence of the first mitochondrial genome of a methylotrophic yeast. Fifteen genes encoding proteins, 2 rRNA and 25 tRNA loci were identified on the 35.7 kbp circular, mitochondrial DNA. Furthermore, the architecture of the putative alpha mating factor protein of P. pastoris CBS7435 turned out to be more complex than the corresponding protein of Saccharomyces cerevisiae. PMID:21575661

Küberl, Andreas; Schneider, Jessica; Thallinger, Gerhard G; Anderl, Ingund; Wibberg, Daniel; Hajek, Tanja; Jaenicke, Sebastian; Brinkrolf, Karina; Goesmann, Alexander; Szczepanowski, Rafael; Pühler, Alfred; Schwab, Helmut; Glieder, Anton; Pichler, Harald

2011-05-06

317

Genome Sequence of the Fish Pathogen Flavobacterium columnare ATCC 49512  

PubMed Central

Flavobacterium columnare is a Gram-negative, rod-shaped, motile, and highly prevalent fish pathogen causing columnaris disease in freshwater fish worldwide. Here, we present the complete genome sequence of F. columnare strain ATCC 49512.

Tekedar, Hasan C.; Karsi, Attila; Gillaspy, Allison F.; Dyer, David W.; Benton, Nicole R.; Zaitshik, Jeremy; Vamenta, Stefanie; Banes, Michelle M.; Gulsoy, Nagihan; Aboko-Cole, Mary; Waldbieser, Geoffrey C.

2012-01-01

318

Genome Sequence of the Halophilic Archaeon Halococcus hamelinensis  

PubMed Central

Halococcus hamelinensis was isolated from hypersaline stromatolites in Shark Bay, Australia. Here we report the genome sequence (3,133,046 bp) of H. hamelinensis, which provides insights into the ecology, evolution, and adaptation of this novel microorganism.

Gudhka, Reema K.; Neilan, Brett A.

2012-01-01

319

Complete Genome Sequence of Pseudomonas denitrificans ATCC 13867  

PubMed Central

Pseudomonas denitrificans ATCC 13867, a Gram-negative facultative anaerobic bacterium, is known to produce vitamin B12 under aerobic conditions. This paper reports the annotated whole-genome sequence of the circular chromosome of this organism.

Ainala, Satish Kumar; Somasundar, Ashok

2013-01-01

320

Complete Genome Sequences of Six Strains of the Genus Methylobacterium  

SciTech Connect

The complete and assembled genome sequences were determined for six strains of the alphaproteobacterial genus Methylobacterium, chosen for their key adaptations to different plant-associated niches and environmental constraints.

Marx, Christopher J [Harvard University]; Bringel, Francoise O. [University of Strasbourg]; Christoserdova, Ludmila [University of Washington, Seattle]; Moulin, Lionel [UMR, France]; Ul Haque, Muhammad Farhan [University of Strasbourg]; Fleischman, Darrell E. [Wright State University, Dayton, OH]; Gruffaz, Christelle [CNRS, Strasbourg, France]; Jourand, Philippe [UMR, France]; Knief, Claudia [ETH Zurich, Switzerland]; Lee, Ming-Chun [Harvard University]; Muller, Emilie E. L. [CNRS, Strasbourg, France]; Nadalig, Thierry [CNRS, Strasbourg, France]; Peyraud, Remi [ETH Zurich, Switzerland]; Roselli, Sandro [CNRS, Strasbourg, France]; Russ, Lina [ETH Zurich, Switzerland]; Goodwin, Lynne A. [Los Alamos National Laboratory (LANL)]; Ivanov, Pavel S. [University of Wyoming, Laramie]; Ivanova, N [U.S. Department of Energy, Joint Genome Institute]; Kyrpides, Nikos C [U.S. Department of Energy, Joint Genome Institute]; Lajus, Aurelie [Genoscope/Centre National de la Recherche Scientifique-Unite Mixte de Recherche]; Land, Miriam L [ORNL]; Medigue, Claudine [Genoscope/Centre National de la Recherche Scientifique-Unite Mixte de Recherche]; Mikhailova, Natalia [U.S. Department of Energy, Joint Genome Institute]; Nolan, Matt [U.S. Department of Energy, Joint Genome Institute]; Woyke, Tanja [U.S. Department of Energy, Joint Genome Institute]; Stolyar, Sergey [University of Washington]; Vorholt, Julia A. [ETH Zurich, Switzerland]; Vuilleumier, Stephane [University of Strasbourg]

2012-01-01

321

Complete genome sequences of six strains of the genus methylobacterium  

SciTech Connect

The complete and assembled genome sequences were determined for six strains of the alphaproteobacterial genus Methylobacterium, chosen for their key adaptations to different plant-associated niches and environmental constraints.

Marx, Christopher J [Harvard University]; Bringel, Francoise O. [University of Strasbourg]; Christoserdova, Ludmila [University of Washington, Seattle]; Moulin, Lionel [UMR, France]; Farhan Ul Haque, Muhammad [CNRS, Strasbourg, France]; Fleischman, Darrell E. [Wright State University, Dayton, OH]; Gruffaz, Christelle [CNRS, Strasbourg, France]; Jourand, Philippe [UMR, France]; Knief, Claudia [ETH Zurich, Switzerland]; Lee, Ming-Chun [Harvard University]; Muller, Emilie E. L. [CNRS, Strasbourg, France]; Nadalig, Thierry [CNRS, Strasbourg, France]; Peyraud, Remi [ETH Zurich, Switzerland]; Roselli, Sandro [CNRS, Strasbourg, France]; Russ, Lina [ETH Zurich, Switzerland]; Aguero, Fernan [Universidad Nacional de General San Martin]; Goodwin, Lynne A. [Los Alamos National Laboratory (LANL)]; Ivanova, N [U.S. Department of Energy, Joint Genome Institute]; Kyrpides, Nikos C [U.S. Department of Energy, Joint Genome Institute]; Lajus, Aurelie [Genoscope/Centre National de la Recherche Scientifique-Unite Mixte de Recherche]; Land, Miriam L [ORNL]; Medigue, Claudine [Genoscope/Centre National de la Recherche Scientifique-Unite Mixte de Recherche]; Mikhailova, Natalia [U.S. Department of Energy, Joint Genome Institute]; Nolan, Matt [U.S. Department of Energy, Joint Genome Institute]; Woyke, Tanja [U.S. Department of Energy, Joint Genome Institute]; Stolyar, Sergey [University of Washington]; Vorholt, Julia A. [ETH Zurich, Switzerland]; Vuilleumier, Stephane [University of Strasbourg]

2012-01-01

322

Complete Genome Sequence of Rahnella aquatilis CIP 78.65  

PubMed Central

Rahnella aquatilis CIP 78.65 is a gammaproteobacterium isolated from a drinking water source in Lille, France. Here we report the complete genome sequence of Rahnella aquatilis CIP 78.65, the type strain of R. aquatilis.

Bruce, David; Detter, Chris; Goodwin, Lynne A.; Han, James; Han, Cliff S.; Held, Brittany; Land, Miriam L.; Mikhailova, Natalia; Nolan, Matt; Pennacchio, Len; Pitluck, Sam; Tapia, Roxanne; Woyke, Tanja; Sobecky, Patricia A.

2012-01-01

323

Genome Sequence of the Immunomodulatory Strain Bifidobacterium bifidum LMG 13195  

PubMed Central

In this work, we report the genome sequence of Bifidobacterium bifidum strain LMG 13195. Results from our research group show that this strain is able to interact with human immune cells, generating functional regulatory T cells.

Gueimonde, Miguel; Ventura, Marco; Margolles, Abelardo

2012-01-01

324

Draft Genome Sequence of Lactobacillus casei W56  

PubMed Central

We announce the draft genome sequence of Lactobacillus casei W56 in one contig. This strain shows immunomodulatory and probiotic properties. The strain is also an ingredient of commercially available probiotic products.

Hochwind, Kerstin; Weinmaier, Thomas; Schmid, Michael; van Hemert, Saskia; Hartmann, Anton; Rattei, Thomas

2012-01-01

325

Bacterial epidemiology and biology - lessons from genome sequencing  

PubMed Central

Next-generation sequencing has ushered in a new era of microbial genomics, enabling the detailed historical and geographical tracing of bacteria. This is helping to shape our understanding of bacterial evolution.

2011-01-01

326

Sequencing of Chloroplast Genome Using Whole Cellular DNA and Solexa Sequencing Technology  

PubMed Central

Sequencing of the chloroplast (cp) genome using traditional sequencing methods has been difficult because of its size (>120 kb) and the complicated procedures required to prepare templates. To explore the feasibility of sequencing the cp genome using DNA extracted from whole cells and Solexa sequencing technology, we sequenced whole cellular DNA isolated from leaves of three Brassica rapa accessions with one lane per accession. In total, 246, 362, and 361 Mb sequence data were generated for the three accessions Chiifu-401-42, Z16, and FT, respectively. Micro-reads were assembled by reference-guided assembly using the cpDNA sequences of B. rapa, Arabidopsis thaliana, and Nicotiana tabacum. We achieved coverage of more than 99.96% of the cp genome in the three tested accessions using the B. rapa sequence as the reference. When A. thaliana or N. tabacum sequences were used as references, 99.7–99.8 or 95.5–99.7% of the B. rapa cp genome was covered, respectively. These results demonstrated that sequencing of whole cellular DNA isolated from young leaves using the Illumina Genome Analyzer is an efficient method for high-throughput sequencing of cp genome.

Wu, Jian; Liu, Bo; Cheng, Feng; Ramchiary, Nirala; Choi, Su Ryun; Lim, Yong Pyo; Wang, Xiao-Wu

2012-01-01
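
The coverage percentages quoted in the record above describe what fraction of a reference sequence is spanned by at least one aligned read. As a rough illustration only, and not the authors' pipeline, the Python sketch below computes that fraction from a list of alignment intervals. The function name `covered_fraction` and the half-open `(start, end)` interval convention are assumptions made for this example.

```python
def covered_fraction(intervals, ref_length):
    """Return the fraction of a reference covered by at least one interval.

    intervals: iterable of (start, end) pairs, 0-based and half-open
               (hypothetical convention chosen for this sketch).
    ref_length: length of the reference sequence in bases.
    """
    if ref_length <= 0:
        raise ValueError("ref_length must be positive")
    covered = 0
    current_end = 0  # right edge of the region already counted
    for start, end in sorted(intervals):
        # Clip to the reference and skip any part already counted.
        start = max(start, current_end, 0)
        end = min(end, ref_length)
        if end > start:
            covered += end - start
            current_end = end
    return covered / ref_length


if __name__ == "__main__":
    # Three alignments on a 100-base toy reference; bases 80-89 are uncovered.
    print(covered_fraction([(0, 50), (40, 80), (90, 100)], 100))
```

Sorting the intervals first lets overlapping alignments be counted once, so the result is breadth of coverage rather than sequencing depth.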

327

Sequencing of chloroplast genome using whole cellular DNA and solexa sequencing technology.  

PubMed

Sequencing of the chloroplast (cp) genome using traditional sequencing methods has been difficult because of its size (>120 kb) and the complicated procedures required to prepare templates. To explore the feasibility of sequencing the cp genome using DNA extracted from whole cells and Solexa sequencing technology, we sequenced whole cellular DNA isolated from leaves of three Brassica rapa accessions with one lane per accession. In total, 246, 362, and 361 Mb sequence data were generated for the three accessions Chiifu-401-42, Z16, and FT, respectively. Micro-reads were assembled by reference-guided assembly using the cpDNA sequences of B. rapa, Arabidopsis thaliana, and Nicotiana tabacum. We achieved coverage of more than 99.96% of the cp genome in the three tested accessions using the B. rapa sequence as the reference. When A. thaliana or N. tabacum sequences were used as references, 99.7-99.8 or 95.5-99.7% of the B. rapa cp genome was covered, respectively. These results demonstrated that sequencing of whole cellular DNA isolated from young leaves using the Illumina Genome Analyzer is an efficient method for high-throughput sequencing of cp genome. PMID:23162558

Wu, Jian; Liu, Bo; Cheng, Feng; Ramchiary, Nirala; Choi, Su Ryun; Lim, Yong Pyo; Wang, Xiao-Wu

2012-11-08

328

Compressing Genomic Sequence Fragments Using SlimGene  

NASA Astrophysics Data System (ADS)

With the advent of next generation sequencing technologies, the cost of sequencing whole genomes is poised to go below $1000 per human individual in a few years. As more and more genomes are sequenced, analysis methods are undergoing rapid development, making it tempting to store sequencing data for long periods of time so that the data can be re-analyzed with the latest techniques. The challenging open research problems, huge influx of data, and rapidly improving analysis techniques have created the need to store and transfer very large volumes of data.

Kozanitis, Christos; Saunders, Chris; Kruglyak, Semyon; Bafna, Vineet; Varghese, George
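
The record above motivates compact storage of sequence fragments but does not describe SlimGene's method in this excerpt. As a generic illustration of why nucleotide data compresses well, and explicitly not SlimGene's algorithm, the Python sketch below packs each of the four bases into 2 bits, so four bases fit in one byte. The names `pack` and `unpack` are hypothetical, and the sketch assumes the input contains only A, C, G, and T.

```python
CODES = {"A": 0, "C": 1, "G": 2, "T": 3}
BASES = "ACGT"


def pack(seq):
    """Pack a string of A/C/G/T into bytes, 2 bits per base.

    Returns (length, data); the length is needed to undo the zero
    padding of the final byte.
    """
    out = bytearray()
    byte = 0
    for i, base in enumerate(seq):
        byte = (byte << 2) | CODES[base]
        if i % 4 == 3:  # four bases collected: emit a full byte
            out.append(byte)
            byte = 0
    remainder = len(seq) % 4
    if remainder:
        # Left-align the leftover bases in the last byte.
        out.append(byte << (2 * (4 - remainder)))
    return len(seq), bytes(out)


def unpack(length, data):
    """Inverse of pack: recover the original base string."""
    bases = []
    for i in range(length):
        shift = 2 * (3 - i % 4)
        bases.append(BASES[(data[i // 4] >> shift) & 3])
    return "".join(bases)


if __name__ == "__main__":
    n, blob = pack("ACGTAC")
    print(n, list(blob), unpack(n, blob))
```

Real read archives also carry quality values, read identifiers, and ambiguous bases such as N, which this fixed-width scheme does not handle.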

329

Complete genome sequence of Treponema pallidum strain DAL-1  

PubMed Central

Treponema pallidum strain DAL-1 is a human uncultivable pathogen causing the sexually transmitted disease syphilis. Strain DAL-1 was isolated from the amniotic fluid of a pregnant woman in the secondary stage of syphilis. Here we describe the 1,139,971 bp long genome of T. pallidum strain DAL-1 which was sequenced using two independent sequencing methods (454 pyrosequencing and Illumina). In rabbits, strain DAL-1 replicated better than the T. pallidum strain Nichols. The comparison of the complete DAL-1 genome sequence with the Nichols sequence revealed a list of genetic differences that are potentially responsible for the increased rabbit virulence of the DAL-1 strain.

Zobanikova, Marie; Mikolka, Pavol; Cejkova, Darina; Pospisilova, Petra; Chen, Lei; Strouhal, Michal; Qin, Xiang; Weinstock, George M.; Smajs, David

2012-01-01

330

An integrated semiconductor device enabling non-optical genome sequencing  

Microsoft Academic Search

The seminal importance of DNA sequencing to the life sciences, biotechnology and medicine has driven the search for more scalable and lower-cost solutions. Here we describe a DNA sequencing technology in which scalable, low-cost semiconductor manufacturing techniques are used to make an integrated circuit able to directly perform non-optical DNA sequencing of genomes. Sequence data are obtained by directly sensing

Wolfgang Hinz; Todd M. Rearick; Jonathan Schultz; William Mileski; Mel Davey; John H. Leamon; Kim Johnson; Mark J. Milgrew; Matthew Edwards; Jeremy Hoon; Jan F. Simons; David Marran; Jason W. Myers; John F. Davidson; Annika Branting; John R. Nobile; Bernard P. Puc; David Light; Travis A. Clark; Martin Huber; Jeffrey T. Branciforte; Isaac B. Stoner; Simon E. Cawley; Michael Lyons; Yutao Fu; Nils Homer; Marina Sedova; Xin Miao; Brian Reed; Jeffrey Sabina; Erika Feierstein; Michelle Schorn; Mohammad Alanjary; Eileen Dimalanta; Devin Dressman; Rachel Kasinskas; Tanya Sokolsky; Jacqueline A. Fidanza; Eugeni Namsaraev; Kevin J. McKernan; Alan Williams; G. Thomas Roth; James Bustillo; Jonathan M. Rothberg

2011-01-01

331

A non-radioactive multiprime sequencing method for HIV genomes  

Microsoft Academic Search

A manual non-radioactive DNA sequencing protocol was developed for rapid analysis of variable HIV-1 genomes. Sets of up to ten primers were used in one sequencing reaction. After polyacrylamide gel electrophoresis and blotting onto nylon membranes the individual sequences were detected by hybridization with digoxigenin-labelled oligonucleotides and chemiluminescence. The method is applicable to any sequencing project where numerous variants of

Jutta Huber; Wolfgang Hell; Hans Wolf

1995-01-01

332

Intra-species sequence comparisons for annotating genomes  

SciTech Connect

Analysis of sequence variation among members of a single species offers a potential approach to identify functional DNA elements responsible for biological features unique to that species. Due to its high rate of allelic polymorphism and ease of genetic manipulability, we chose the sea squirt, Ciona intestinalis, to explore intra-species sequence comparisons for genome annotation. A large number of C. intestinalis specimens were collected from four continents and a set of genomic intervals amplified, resequenced and analyzed to determine the mutation rates at each nucleotide in the sequence. We found that regions with low mutation rates efficiently demarcated functionally constrained sequences: these include a set of noncoding elements, which we showed in C. intestinalis transgenic assays to act as tissue-specific enhancers, as well as the location of coding sequences. This illustrates that comparisons of multiple members of a species can be used for genome annotation, suggesting a path for the annotation of the sequenced genomes of organisms occupying uncharacterized phylogenetic branches of the animal kingdom and raises the possibility that the resequencing of a large number of Homo sapiens individuals might be used to annotate the human genome and identify sequences defining traits unique to our species. The sequence data from this study has been submitted to GenBank under accession nos. AY667278-AY667407.

Boffelli, Dario; Weer, Claire V.; Weng, Li; Lewis, Keith D.; Shoukry, Malak I.; Pachter, Lior; Keys, David N.; Rubin, Edward M.

2004-07-15
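
The record above uses per-nucleotide variation across many individuals to flag constrained regions. The Python sketch below shows one simple proxy for that idea: for each column of a set of pre-aligned, equal-length sequences, it reports the fraction of sequences that differ from the most common base. This is an assumption-laden stand-in, not the statistical model used in the study, and the function name `site_variability` is hypothetical.

```python
from collections import Counter


def site_variability(aligned):
    """Per-column fraction of sequences that differ from the majority base.

    aligned: list of equal-length strings, assumed to be already aligned.
    Low values suggest conserved (possibly constrained) positions.
    """
    if not aligned:
        return []
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("all sequences must have the same length")
    rates = []
    for column in zip(*aligned):
        majority_count = Counter(column).most_common(1)[0][1]
        rates.append(1 - majority_count / len(column))
    return rates


if __name__ == "__main__":
    samples = ["ACGT", "ACGA", "ACGT", "ACCT"]
    print(site_variability(samples))
```

In practice such per-site values would be smoothed over windows before calling a region conserved, since a single column from a small sample is noisy.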

333

Complete Genome Sequence of Methanomassiliicoccus luminyensis, the Largest Genome of a Human-Associated Archaea Species  

PubMed Central

The present study describes the complete and annotated genome sequence of Methanomassiliicoccus luminyensis strain B10 (DSM 24529T, CSUR P135), which was isolated from human feces. The 2.6-Mb genome represents the largest genome of a methanogenic euryarchaeon isolated from humans. The genome data of M. luminyensis reveal unique features and horizontal gene transfer events, which might have occurred during its adaptation and/or evolution in the human ecosystem.

Gorlas, Aurore; Robert, Catherine; Gimenez, Gregory; Drancourt, Michel

2012-01-01

334

The Genomic HyperBrowser: inferential genomics at the sequence level  

PubMed Central

The immense increase in the generation of genomic scale data poses an unmet analytical challenge, due to a lack of established methodology with the required flexibility and power. We propose a first principled approach to statistical analysis of sequence-level genomic information. We provide a growing collection of generic biological investigations that query pairwise relations between tracks, represented as mathematical objects, along the genome. The Genomic HyperBrowser implements the approach and is available at http://hyperbrowser.uio.no.

2010-01-01

335

Microsatellite evolution inferred from human- chimpanzee genomic sequence alignments  

Microsoft Academic Search

Most studies of microsatellite evolution utilize long, highly mutable loci, which are unrepresentative of the majority of simple repeats in the human genome. Here we use an unbiased sample of 2,467 microsatellite loci derived from alignments of 5.1 Mb of genomic sequence from human and chimpanzee to investigate the mutation process of tandemly repetitive DNA. The results indicate that the

Matthew T. Webster; Nick G. C. Smith; Hans Ellegren

2002-01-01

336

Insights into hominid evolution from the gorilla genome sequence  

Microsoft Academic Search

Gorillas are humans’ closest living relatives after chimpanzees, and are of comparable importance for the study of human origins and evolution. Here we present the assembly and analysis of a genome sequence for the western lowland gorilla, and compare the whole genomes of all extant great ape genera. We propose a synthesis of genetic and fossil evidence consistent with placing

Aylwyn Scally; Julien Y. Dutheil; LaDeana W. Hillier; Gregory E. Jordan; Ian Goodhead; Javier Herrero; Asger Hobolth; Tuuli Lappalainen; Thomas Mailund; Tomas Marques-Bonet; Shane McCarthy; Stephen H. Montgomery; Petra C. Schwalie; Y. Amy Tang; Michelle C. Ward; Yali Xue; Bryndis Yngvadottir; Can Alkan; Lars N. Andersen; Qasim Ayub; Edward V. Ball; Kathryn Beal; Brenda J. Bradley; Yuan Chen; Chris M. Clee; Stephen Fitzgerald; Tina A. Graves; Yong Gu; Paul Heath; Andreas Heger; Emre Karakoc; Anja Kolb-Kokocinski; Gavin K. Laird; Gerton Lunter; Stephen Meader; Matthew Mort; James C. Mullikin; Kasper Munch; Timothy D. O’Connor; Andrew D. Phillips; Javier Prado-Martinez; Anthony S. Rogers; Saba Sajjadian; Dominic Schmidt; Katy Shaw; Jared T. Simpson; Peter D. Stenson; Daniel J. Turner; Linda Vigilant; Albert J. Vilella; Weldon Whitener; Baoli Zhu; David N. Cooper; Pieter de Jong; Emmanouil T. Dermitzakis; Evan E. Eichler; Paul Flicek; Nick Goldman; Nicholas I. Mundy; Zemin Ning; Duncan T. Odom; Chris P. Ponting; Michael A. Quail; Oliver A. Ryder; Stephen M. Searle; Wesley C. Warren; Richard K. Wilson; Mikkel H. Schierup; Jane Rogers; Chris Tyler-Smith; Richard Durbin

2012-01-01

337

Genome Sequence of Pectobacterium sp. Strain SCC3193  

PubMed Central

We report the complete and annotated genome sequence of the plant-pathogenic enterobacterium Pectobacterium sp. strain SCC3193, a model strain isolated from potato in Finland. The Pectobacterium sp. SCC3193 genome consists of a 516,411-bp chromosome, with no plasmids.

Koskinen, J. Patrik; Laine, Pia; Niemi, Outi; Nykyri, Johanna; Harjunpaa, Heidi; Auvinen, Petri; Paulin, Lars; Pirhonen, Minna; Palva, Tapio

2012-01-01

338

A snapshot of the emerging tomato genome sequence  

Technology Transfer Automated Retrieval System (TEKTRAN)

The genome of tomato (Solanum lycopersicum) is being sequenced by an international consortium of 10 countries (Korea, China, the United Kingdom, India, the Netherlands, France, Japan, Spain, Italy and the United States) as part of a larger initiative called the ‘International Solanaceae Genome Proje...

339

Draft genome sequence of Paenibacillus peoriae strain KCTC 3763T.  

PubMed

Paenibacillus peoriae is a potentially plant-beneficial soil bacterium and is a close relative to Paenibacillus polymyxa, the type species of the genus Paenibacillus. Herein, we present the 5.77-Mb draft genome sequence of the P. peoriae type strain with the aim of providing insight into the genomic basis of plant growth-promoting Paenibacillus species. PMID:22328743

Jeong, Haeyoung; Choi, Soo-Keun; Park, Soo-Young; Kim, Sun Hong; Park, Seung-Hwan

2012-03-01

340

The Genomic Sequence of the Accidental Pathogen Legionella pneumophila  

Microsoft Academic Search

We present the genomic sequence of Legionella pneumophila, the bacterial agent of Legionnaires' disease, a potentially fatal pneumonia acquired from aerosolized contaminated fresh water. The genome includes a 45-kilobase pair element that can exist in chromosomal and episomal forms, selective expansions of important gene families, genes for unexpected metabolic pathways, and previously unknown candidate virulence determinants. We highlight the genes

Minchen Chien; Irina Morozova; Shundi Shi; Huitao Sheng; Jing Chen; Shawn M. Gomez; Gifty Asamani; Kendra Hill; John Nuara; Marc Feder; Justin Rineer; Joseph J. Greenberg; Valeria Steshenko; Samantha H. Park; Baohui Zhao; Elita Teplitskaya; John R. Edwards; Sergey Pampou; Anthi Georghiou; I.-Chun Chou; William Iannuccilli; Michael E. Ulz; Dae H. Kim; Alex Geringer-Sameth; Curtis Goldsberry; Pavel Morozov; Stuart G. Fischer; Gil Segal; Xiaoyan Qu; Andrey Rzhetsky; Peisen Zhang; Eftihia Cayanis; Pieter J. De Jong; Jingyue Ju; Sergey Kalachikov; Howard A. Shuman; James J. Russo

2004-01-01

341

Draft Genome Sequence of Avibacterium paragallinarum Strain 221.  

PubMed

Avibacterium paragallinarum is the causative agent of infectious coryza. Here we report the draft genome sequence of reference strain 221 of A. paragallinarum serovar A. The genome is composed of 135 contigs for 2,685,568 bp with a 41% G+C content. PMID:23704189

Xu, Fuzhou; Miao, Deyuan; Du, Yu; Chen, Xiaoling; Zhang, Peijun; Sun, Huiling

2013-05-23
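
Draft genome announcements such as the one above typically summarize an assembly by contig count, total length, and G+C content. The Python sketch below shows how those three figures can be derived from a list of contig sequences. The function name `draft_stats` is hypothetical, and the sketch assumes G+C content is computed over A, C, G, and T only, ignoring ambiguous bases.

```python
def draft_stats(contigs):
    """Return (number of contigs, total length in bp, G+C fraction).

    contigs: list of DNA strings. Bases other than A/C/G/T (for example N)
    count toward total length but are ignored for the G+C fraction.
    """
    total_bp = sum(len(c) for c in contigs)
    gc = 0
    acgt = 0
    for contig in contigs:
        for base in contig.upper():
            if base in "GC":
                gc += 1
                acgt += 1
            elif base in "AT":
                acgt += 1
    gc_fraction = gc / acgt if acgt else 0.0
    return len(contigs), total_bp, gc_fraction


if __name__ == "__main__":
    print(draft_stats(["GGCC", "AATT"]))
```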

342

A Cryptographic Approach to Securely Share and Query Genomic Sequences  

Microsoft Academic Search

To support large-scale biomedical research projects, organizations need to share person-specific genomic sequences without violating the privacy of their data subjects. In the past, organizations protected subjects' identities by removing identifiers, such as name and social security number; however, recent investigations illustrate that deidentified genomic data can be "reidentified" to named individuals using simple automated methods. In this paper, we

Murat Kantarcioglu; Ying Liu; Bradley Malin

2008-01-01

343

Complete Genome Sequence of the Soil Actinomycete Kocuria rhizophila  

Microsoft Academic Search

The soil actinomycete Kocuria rhizophila belongs to the suborder Micrococcineae, a divergent bacterial group for which only a limited amount of genomic information is currently available. K. rhizophila is also important in industrial applications; e.g., it is commonly used as a standard quality control strain for antimicrobial susceptibility testing. Sequencing and annotation of the genome of K. rhizophila DC2201 (NBRC

Hiromi Takarada; Mitsuo Sekine; Hiroki Kosugi; Yasunori Matsuo; Takatomo Fujisawa; Seiha Omata; Emi Kishi; Ai Shimizu; Naofumi Tsukatani; Satoshi Tanikawa; Nobuyuki Fujita; Shigeaki Harayama

2008-01-01

344

Complete Genome Sequence of Cyanobacterial Siphovirus KBS2A.  

PubMed

We present the genome of a cyanosiphovirus (KBS2A) that infects a marine Synechococcus sp. (strain WH7803). Unique to this genome, relative to other sequenced cyanosiphoviruses, is the absence of elements associated with integration into the host chromosome, suggesting this virus may not be able to establish a lysogenic relationship. PMID:23969045

Ponsero, Alise J; Chen, Feng; Lennon, Jay T; Wilhelm, Steven W

2013-08-22

345

Complete Genome Sequence of Antarctic Bacterium Psychrobacter sp. Strain G.  

PubMed

Here, we report the complete genome sequence of Psychrobacter sp. strain G, isolated from King George Island, Antarctica, which can produce lipolytic enzymes at low temperatures. The genomics information of this strain will facilitate the study of the physiology, cold adaptation properties, and evolution of this genus. PMID:24051316

Che, Shuai; Song, Lai; Song, Weizhi; Yang, Meng; Liu, Guiming; Lin, Xuezheng

2013-09-19

346

Taxonomy becoming a driving force in genome sequencing projects.  

PubMed

We studied the possible impact of genomic projects by comparing the number of published articles before and after the completion of the project. We found that for most species, there is no significant change in the number of citations. Our study also highlights the growing importance of taxonomy as a main motivation for the sequencing of genomes. PMID:23453737

Tamames, Javier; Durante-Rodríguez, Gonzalo

2013-03-01

347

The genome sequence and structure of rice chromosome 1  

Microsoft Academic Search

The rice species Oryza sativa is considered to be a model plant because of its small genome size, extensive genetic map, relative ease of transformation and synteny with other cereal crops. Here we report the essentially complete sequence of chromosome 1, the longest chromosome in the rice genome. We summarize characteristics of the chromosome structure and the biological insight gained

Takuji Sasaki; Takashi Matsumoto; Kimiko Yamamoto; Katsumi Sakata; Tomoya Baba; Yuichi Katayose; Jianzhong Wu; Yoshihito Niimura; Zhukuan Cheng; Yoshiaki Nagamura; Baltazar A. Antonio; Hiroyuki Kanamori; Satomi Hosokawa; Masatoshi Masukawa; Koji Arikawa; Yoshino Chiden; Mika Hayashi; Masako Okamoto; Tsuyu Ando; Hiroyoshi Aoki; Kohei Arita; Masao Hamada; Chizuko Harada; Saori Hijishita; Mikiko Honda; Yoko Ichikawa; Atsuko Idonuma; Masumi Iijima; Michiko Ikeda; Maiko Ikeno; Sachie Ito; Tomoko Ito; Yuichi Ito; Yukiyo Ito; Aki Iwabuchi; Kozue Kamiya; Wataru Karasawa; Satoshi Katagiri; Ari Kikuta; Noriko Kobayashi; Izumi Kono; Kayo Machita; Tomoko Maehara; Hiroshi Mizuno; Tatsumi Mizubayashi; Yoshiyuki Mukai; Hideki Nagasaki; Marina Nakashima; Yuko Nakama; Yumi Nakamichi; Mari Nakamura; Nobukazu Namiki; Manami Negishi; Isamu Ohta; Nozomi Ono; Shoko Saji; Kumiko Sakai; Michie Shibata; Takanori Shimokawa; Ayahiko Shomura; Jianyu Song; Yuka Takazaki; Kimihiro Terasawa; Kumiko Tsuji; Kazunori Waki; Harumi Yamagata; Hiroko Yamane; Shoji Yoshiki; Rie Yoshihara; Kazuko Yukawa; Huisun Zhong; Hisakazu Iwama; Toshinori Endo; Hidetaka Ito; Jang Ho Hahn; Ho-Il Kim; Moo-Young Eun; Masahiro Yano; Jiming Jiang; Takashi Gojobori

2002-01-01

348

Sequence Analysis of the Genome of the Neodiprion sertifer Nucleopolyhedrovirus  

Microsoft Academic Search

The genome of the Neodiprion sertifer nucleopolyhedrovirus (NeseNPV), which infects the European pine sawfly, N. sertifer (Hymenoptera: Diprionidae), was sequenced and analyzed. The genome was 86,462 bp in size. The CG content of 34% was lower than that of the majority of baculoviruses. A total of 90 methionine- initiated open reading frames (ORFs) with more than 50 amino acids and

Alejandra Garcia-Maruniak; James E. Maruniak; Paolo M. A. Zanotto; Aissa E. Doumbouya; Jaw-Ching Liu; Thomas M. Merritt; Jennifer S. Lanoie

2004-01-01

349

Triticeae genomics: advances in sequence analysis of large genome cereal crops.  

PubMed

Whole genome sequencing provides direct access to all genes of an organism and represents an essential step towards a systematic understanding of (crop) plant biology. Wheat and barley, two of the most important crop species worldwide, have two- to five-fold larger genomes than human - too large to be completely sequenced at current costs. Nevertheless, significant progress has been made to unlock the gene contents of these species by sequencing expressed sequence tags (EST) for high-density mapping and as a basis for elucidating gene function on a large scale. Several megabases of genomic (BAC) sequences have been obtained providing a first insight into the complexity of these huge cereal genomes. However, to fully exploit the information of the wheat and barley genomes for crop improvement, sequence analysis of a significantly larger portion of the Triticeae genomes is needed. In this review an overview of the current status of Triticeae genome sequencing and a perspective concerning future developments in cereal structural genomics is provided. PMID:17295124

Stein, Nils

2007-01-01

350

Genome sequence of the biocontrol strain Pseudomonas fluorescens F113.  

PubMed

Pseudomonas fluorescens F113 is a plant growth-promoting rhizobacterium (PGPR) that has biocontrol activity against fungal plant pathogens and is a model for rhizosphere colonization. Here, we present its complete genome sequence, which shows that besides a core genome very similar to those of other strains sequenced within this species, F113 possesses a wide array of genes encoding specialized functions for thriving in the rhizosphere and interacting with eukaryotic organisms. PMID:22328765

Redondo-Nieto, Miguel; Barret, Matthieu; Morrisey, John P; Germaine, Kieran; Martínez-Granero, Francisco; Barahona, Emma; Navazo, Ana; Sánchez-Contreras, María; Moynihan, Jennifer A; Giddens, Stephen R; Coppoolse, Eric R; Muriel, Candela; Stiekema, Willem J; Rainey, Paul B; Dowling, David; O'Gara, Fergal; Martín, Marta; Rivilla, Rafael

2012-03-01

351

Complete Genome Sequences of Novel Rat Noroviruses in Hong Kong  

PubMed Central

We report two genome sequences of novel noroviruses isolated from fecal swab specimens of brown rats in Hong Kong. The complete genome is approximately 7.5 kb in length and consists of 3 overlapping open reading frames encoding ORF1 polyprotein, VP1, and VP2, respectively. Sequence analysis suggested that these noroviruses should be classified in genogroup V, but they are distinct from other known rodent noroviruses and represent a novel cluster within the genogroup.

Tse, Herman; Chan, Wan-Mui; Lam, Carol S. F.; Lau, Susanna K. P.; Woo, Patrick C. Y.

2012-01-01

352

Genome Sequence of the Biocontrol Strain Pseudomonas fluorescens F113  

PubMed Central

Pseudomonas fluorescens F113 is a plant growth-promoting rhizobacterium (PGPR) that has biocontrol activity against fungal plant pathogens and is a model for rhizosphere colonization. Here, we present its complete genome sequence, which shows that besides a core genome very similar to those of other strains sequenced within this species, F113 possesses a wide array of genes encoding specialized functions for thriving in the rhizosphere and interacting with eukaryotic organisms.

Redondo-Nieto, Miguel; Barret, Matthieu; Morrisey, John P.; Germaine, Kieran; Martinez-Granero, Francisco; Barahona, Emma; Navazo, Ana; Sanchez-Contreras, Maria; Moynihan, Jennifer A.; Giddens, Stephen R.; Coppoolse, Eric R.; Muriel, Candela; Stiekema, Willem J.; Rainey, Paul B.; Dowling, David; O'Gara, Fergal; Martin, Marta

2012-01-01

353

Complete Genome Sequence of Bacillus cereus Bacteriophage PBC1  

PubMed Central

Bacillus cereus is a ubiquitous, spore-forming bacterium associated with food poisoning cases. To develop an efficient biocontrol agent against B. cereus, we isolated lytic phage PBC1 and sequenced its genome. PBC1 showed a very low degree of homology to previously reported phages, implying that it is novel. Here we report the complete genome sequence of PBC1 and describe major findings from our analysis.

Kong, Minsuk; Kim, Minsik

2012-01-01

354

The Genome Sequence of the SARS-Associated Coronavirus  

Microsoft Academic Search

We sequenced the 29,751-base genome of the severe acute respiratory syndrome (SARS)-associated coronavirus known as the Tor2 isolate. The genome sequence reveals that this coronavirus is only moderately related to other known coronaviruses, including two human coronaviruses, HCoV-OC43 and HCoV-229E. Phylogenetic analysis of the predicted viral proteins indicates that the virus does not closely resemble any of the three previously

Marco A. Marra; Steven J. M. Jones; Caroline R. Astell; Robert A. Holt; Angela Brooks-Wilson; Yaron S. N. Butterfield; Jaswinder Khattra; Jennifer K. Asano; Sarah A. Barber; Susanna Y. Chan; Alison Cloutier; Shaun M. Coughlin; Doug Freeman; Noreen Girn; Obi L. Griffith; Stephen R. Leach; Michael Mayo; Helen McDonald; Stephen B. Montgomery; Pawan K. Pandoh; Anca S. Petrescu; A. Gordon Robertson; Jacqueline E. Schein; Asim Siddiqui; Duane E. Smailus; Jeff M. Stott; George S. Yang; Francis Plummer; Anton Andonov; Harvey Artsob; Nathalie Bastien; Kathy Bernard; Timothy F. Booth; Donnie Bowness; Michael Drebot; Lisa Fernando; Ramon Flick; Michael Garbutt; Allen Grolla; Heinz Feldmann; Adrienne Meyers; Amin Kabani; Yan Li; Susan Normand; Ute Stroher; Graham A. Tipples; Shaun Tyler; Robert Vogrig; Diane Ward; Robert C. Brunham; Mel Krajden; Martin Petric; Danuta M. Skowronski; Chris Upton; Rachel L. Roper

2003-01-01

355

Genome Sequence of Pantoea agglomerans Strain IG1  

PubMed Central

Pantoea agglomerans is a Gram-negative bacterium that grows symbiotically with various plants. Here we report the 4.8-Mb genome sequence of P. agglomerans strain IG1. The lipopolysaccharides derived from P. agglomerans IG1 have been shown to be effective in the prevention of various diseases, such as bacterial or viral infection and lifestyle-related diseases. This genome sequence represents a substantial step toward the elucidation of pathways for production of lipopolysaccharides.

Matsuzawa, Tomohiko; Mori, Kazuki; Kadowaki, Takeshi; Shimada, Misato; Tashiro, Kosuke; Kuhara, Satoru; Inagawa, Hiroyuki; Soma, Gen-ichiro

2012-01-01

356

Genome sequence of Pantoea agglomerans strain IG1.  

PubMed

Pantoea agglomerans is a gram-negative bacterium that grows symbiotically with various plants. Here we report the 4.8-Mb genome sequence of P. agglomerans strain IG1. The lipopolysaccharides derived from P. agglomerans IG1 have been shown to be effective in the prevention of various diseases, such as bacterial or viral infection and lifestyle-related diseases. This genome sequence represents a substantial step toward the elucidation of pathways for production of lipopolysaccharides. PMID:22328756

Matsuzawa, Tomohiko; Mori, Kazuki; Kadowaki, Takeshi; Shimada, Misato; Tashiro, Kosuke; Kuhara, Satoru; Inagawa, Hiroyuki; Soma, Gen-ichiro; Takegawa, Kaoru

2012-03-01

357

Complete Genome Sequence of Bifidobacterium bifidum S17  

PubMed Central

Here, we report on the first completely annotated genome sequence of a Bifidobacterium bifidum strain. B. bifidum S17, isolated from feces of a breast-fed infant, was shown to strongly adhere to intestinal epithelial cells and has potent anti-inflammatory activity in vitro and in vivo. The genome sequence will provide new insights into the biology of this potential probiotic organism and allow for the characterization of the molecular mechanisms underlying its beneficial properties.

Zhurina, Daria; Zomer, Aldert; Gleinser, Marita; Brancaccio, Vincenco Francesco; Auchter, Marc; Waidmann, Mark S.; Westermann, Christina; van Sinderen, Douwe; Riedel, Christian U.

2011-01-01

358

Complete genome sequence of Bifidobacterium bifidum S17.  

PubMed

Here, we report on the first completely annotated genome sequence of a Bifidobacterium bifidum strain. B. bifidum S17, isolated from feces of a breast-fed infant, was shown to strongly adhere to intestinal epithelial cells and has potent anti-inflammatory activity in vitro and in vivo. The genome sequence will provide new insights into the biology of this potential probiotic organism and allow for the characterization of the molecular mechanisms underlying its beneficial properties. PMID:21037011

Zhurina, Daria; Zomer, Aldert; Gleinser, Marita; Brancaccio, Vincenco Francesco; Auchter, Marc; Waidmann, Mark S; Westermann, Christina; van Sinderen, Douwe; Riedel, Christian U

2010-10-29

359

Sequencing viral genomes from a single isolated plaque  

PubMed Central

Background: Whole genome sequencing of viruses and bacteriophages is often hindered because of the need for large quantities of genomic material. A method is described that combines single plaque sequencing with an optimization of Sequence Independent Single Primer Amplification (SISPA). This method can be used for de novo whole genome next-generation sequencing of any cultivable virus without the need for large-scale production of viral stocks or viral purification using centrifugal techniques. Methods: A single viral plaque of a variant of the 2009 pandemic H1N1 human Influenza A virus was isolated and amplified using the optimized SISPA protocol. The sensitivity of the SISPA protocol presented here was tested with bacteriophage F_HA0480sp/Pa1651 DNA. The amplified products were sequenced with 454 and Illumina HiSeq platforms. Mapping and de novo assemblies were performed to analyze the quality of data produced from this optimized method. Results: Analysis of the sequence data demonstrated that from a single viral plaque of Influenza A, a mapping assembly with 3590-fold average coverage representing 100% of the genome could be produced. The de novo assembled data produced contigs with 30-fold average sequence coverage, representing 96.5% of the genome. Using only 10 pg of starting DNA from bacteriophage F_HA0480sp/Pa1651 in the SISPA protocol resulted in sequencing data that gave a mapping assembly with 3488-fold average sequence coverage, representing 99.9% of the reference and a de novo assembly with 45-fold average sequence coverage, representing 98.1% of the genome. Conclusions: The optimized SISPA protocol presented here produces amplified product that when sequenced will give high quality data that can be used for de novo assembly. The protocol requires only a single viral plaque or as little as 10 pg of DNA template, which will facilitate rapid identification of viruses during an outbreak and viruses that are difficult to propagate.

2013-01-01
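
The record above reports two numbers per assembly: average fold coverage (mean depth) and the percentage of the genome represented (breadth). The Python sketch below shows how both can be derived from read alignment intervals on a small genome. It is a toy illustration under stated assumptions, not the tooling used in the study: the names `depth_profile` and `coverage_summary` are hypothetical, and intervals are taken to be 0-based and half-open.

```python
def depth_profile(reads, genome_length):
    """Per-base depth from (start, end) read intervals, 0-based half-open."""
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")
    # Difference array: +1 where a read starts, -1 just past where it ends.
    diff = [0] * (genome_length + 1)
    for start, end in reads:
        start = max(start, 0)
        end = min(end, genome_length)
        if end > start:
            diff[start] += 1
            diff[end] -= 1
    depths = []
    running = 0
    for delta in diff[:genome_length]:
        running += delta
        depths.append(running)
    return depths


def coverage_summary(reads, genome_length):
    """Return (mean depth, fraction of bases with depth >= 1)."""
    depths = depth_profile(reads, genome_length)
    mean_depth = sum(depths) / genome_length
    breadth = sum(1 for d in depths if d > 0) / genome_length
    return mean_depth, breadth


if __name__ == "__main__":
    # Two reads on a 10-base toy genome; the second overlaps the first.
    print(coverage_summary([(0, 10), (5, 10)], 10))
```

The difference-array approach keeps the work proportional to the number of reads plus the genome length, which is adequate for viral genomes of a few kilobases to a few hundred kilobases.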

360

Large-Scale Sequencing: The Future of Genomic Sciences Colloquium  

SciTech Connect

Genetic sequencing and the various molecular techniques it has enabled have revolutionized the field of microbiology. Examining and comparing the genetic sequences borne by microbes - including bacteria, archaea, viruses, and microbial eukaryotes - provides researchers insights into the processes microbes carry out, their pathogenic traits, and new ways to use microorganisms in medicine and manufacturing. Until recently, sequencing entire microbial genomes has been laborious and expensive, and the decision to sequence the genome of an organism was made on a case-by-case basis by individual researchers and funding agencies. Now, thanks to new technologies, the cost and effort of sequencing is within reach for even the smallest facilities, and the ability to sequence the genomes of a significant fraction of microbial life may be possible. The availability of numerous microbial genomes will enable unprecedented insights into microbial evolution, function, and physiology. However, the current ad hoc approach to gathering sequence data has resulted in an unbalanced and highly biased sampling of microbial diversity. A well-coordinated, large-scale effort to target the breadth and depth of microbial diversity would result in the greatest impact. The American Academy of Microbiology convened a colloquium to discuss the scientific benefits of engaging in a large-scale, taxonomically-based sequencing project. A group of individuals with expertise in microbiology, genomics, informatics, ecology, and evolution deliberated on the issues inherent in such an effort and generated a set of specific recommendations for how best to proceed. The vast majority of microbes are presently uncultured and, thus, pose significant challenges to such a taxonomically-based approach to sampling genome diversity. However, we have yet to even scratch the surface of the genomic diversity among cultured microbes. 
A coordinated sequencing effort of cultured organisms is an appropriate place to begin, since not only are their genomes available, but they are also accompanied by data on environment and physiology that can be used to understand the resulting data. As single cell isolation methods improve, there should be a shift toward incorporating uncultured organisms and communities into this effort. Efforts to sequence cultivated isolates should target characterized isolates from culture collections for which biochemical data are available, as well as other cultures of lasting value from personal collections. The genomes of type strains should be among the first targets for sequencing, but creative culture methods, novel cell isolation, and sorting methods would all be helpful in obtaining organisms we have not yet been able to cultivate for sequencing. The data that should be provided for strains targeted for sequencing will depend on the phylogenetic context of the organism and the amount of information available about its nearest relatives. Annotation is an important part of transforming genome sequences into useful resources, but it represents the most significant bottleneck to the field of comparative genomics right now and must be addressed. Furthermore, there is a need for more consistency in both annotation and achieving annotation data. As new annotation tools become available over time, re-annotation of genomes should be implemented, taking advantage of advancements in annotation techniques in order to capitalize on the genome sequences and increase both the societal and scientific benefit of genomics work. Given the proper resources, the knowledge and ability exist to be able to select model systems, some simple, some less so, and dissect them so that we may understand the processes and interactions at work in them. 
Colloquium participants suggest a five-pronged, coordinated initiative to exhaustively describe six different microbial ecosystems, designed to describe all the gene diversity, across genomes. In this effort, sequencing should be complemented by other experimental data, particularly transcriptomics and metabolomics data, all of which

Margaret Riley; Merry Buckley

2009-01-01

361

GC tag-modified bisulfite genomic DNA sequencing for continuous methylation spectra  

US Patent & Trademark Office Database

The present invention relates to a tag-modified bisulfite genomic sequencing (tBGS) method developed for simplified evaluation of DNA methylation sites. The method employs direct cycle sequencing of PCR products at kilobase scale, without conventional DNA fragment cloning. The method entails subjecting bisulfite-modified genomic DNA to a second-round PCR amplification employing GC-tagged primers. The invention also relates to a method for identifying a patient at risk for lung cancer using the tBGS technique disclosed.

2010-12-14

362

Mitochondrial Genome Sequence of the Legume Vicia faba  

PubMed Central

The number of plant mitochondrial genomes sequenced exceeds two dozen. However, for a detailed comparative study of different phylogenetic branches more plant mitochondrial genomes should be sequenced. This article presents sequencing data and comparative analysis of mitochondrial DNA (mtDNA) of the legume Vicia faba. The size of the V. faba circular mitochondrial master chromosome of cultivar Broad Windsor was estimated as 588,000 bp with a genome complexity of 387,745 bp and 52 conservative mitochondrial genes; 32 of them encoding proteins, 3 rRNA, and 17 tRNA genes. Six tRNA genes were highly homologous to chloroplast genome sequences. In addition to the 52 conservative genes, 114 unique open reading frames (ORFs) were found: 36 without significant homology to any known proteins, 29 with homology to the Medicago truncatula nuclear genome and to other plant mitochondrial ORFs, and 49 that were not homologous to M. truncatula but possessed sequences with significant homology to other plant mitochondrial or nuclear ORFs. In general, the unique ORFs revealed very low homology to known closely related legumes, but several sequence homologies were found between V. faba, Beta vulgaris, Nicotiana tabacum, Vitis vinifera, and even the monocots Oryza sativa and Zea mays. Most likely these ORFs arose independently during angiosperm evolution (Kubo and Mikami, 2007; Kubo and Newton, 2008). Computational analysis revealed in total about 45% of V. faba mtDNA sequence being homologous to the Medicago truncatula nuclear genome (more than to any sequenced plant mitochondrial genome), and 35% of this homology ranging from a few dozen to 12,806 bp are located on chromosome 1. Apparently, mitochondrial rrn5, rrn18, rps10, ATP synthase subunit alpha, cox2, and tRNA sequences are part of transcribed nuclear mosaic ORFs.
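
The gene and ORF counts quoted in this abstract can be cross-checked with a few lines of arithmetic. The sketch below is only an illustration: the dictionary keys and variable names are made up for this example, and the figures are taken directly from the abstract text.

```python
# Tally check for the counts quoted in the V. faba abstract.
# Category labels are illustrative names, not terms from the paper.
conserved_genes = {"protein_coding": 32, "rRNA": 3, "tRNA": 17}
unique_orfs = {
    "no_known_homology": 36,
    "homologous_to_M_truncatula_and_mito_orfs": 29,
    "homologous_to_other_plant_orfs_only": 49,
}

total_conserved = sum(conserved_genes.values())
total_unique_orfs = sum(unique_orfs.values())

# Ratio of genome complexity to master chromosome size (both in bp).
master_chromosome_bp = 588_000
genome_complexity_bp = 387_745
complexity_fraction = genome_complexity_bp / master_chromosome_bp

print(total_conserved, total_unique_orfs, round(complexity_fraction, 3))
```

Both sums agree with the totals stated in the abstract (52 conservative genes and 114 unique ORFs).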

Negruk, Valentine

2013-01-01

363

Genome sequence of the date palm Phoenix dactylifera L.  

PubMed

Date palm (Phoenix dactylifera L.) is a cultivated woody plant species with agricultural and economic importance. Here we report a genome assembly for an elite variety (Khalas), which is 605.4 Mb in size and covers >90% of the genome (~671 Mb) and >96% of its genes (~41,660 genes). Genomic sequence analysis demonstrates that P. dactylifera experienced a clear genome-wide duplication after either ancient whole genome duplications or massive segmental duplications. Genetic diversity analysis indicates that its stress resistance and sugar metabolism-related genes tend to be enriched in the chromosomal regions where the density of single-nucleotide polymorphisms is relatively low. Using transcriptomic data, we also illustrate the date palm's unique sugar metabolism that underlies fruit development and ripening. Our large-scale genomic and transcriptomic data pave the way for further genomic studies not only on P. dactylifera but also other Arecaceae plants. PMID:23917264

Al-Mssallem, Ibrahim S; Hu, Songnian; Zhang, Xiaowei; Lin, Qiang; Liu, Wanfei; Tan, Jun; Yu, Xiaoguang; Liu, Jiucheng; Pan, Linlin; Zhang, Tongwu; Yin, Yuxin; Xin, Chengqi; Wu, Hao; Zhang, Guangyu; Ba Abdullah, Mohammed M; Huang, Dawei; Fang, Yongjun; Alnakhli, Yasser O; Jia, Shangang; Yin, An; Alhuzimi, Eman M; Alsaihati, Burair A; Al-Owayyed, Saad A; Zhao, Duojun; Zhang, Sun; Al-Otaibi, Noha A; Sun, Gaoyuan; Majrashi, Majed A; Li, Fusen; Tala; Wang, Jixiang; Yun, Quanzheng; Alnassar, Nafla A; Wang, Lei; Yang, Meng; Al-Jelaify, Rasha F; Liu, Kan; Gao, Shenghan; Chen, Kaifu; Alkhaldi, Samiyah R; Liu, Guiming; Zhang, Meng; Guo, Haiyan; Yu, Jun

2013-01-01

364

Complete chloroplast genome sequences of Solanum bulbocastanum , Solanum lycopersicum and comparative analyses with other Solanaceae genomes  

Microsoft Academic Search

Despite the agricultural importance of both potato and tomato, very little is known about their chloroplast genomes. Analysis of the complete sequences of tomato, potato, tobacco, and Atropa chloroplast genomes reveals significant insertions and deletions within certain coding regions or regulatory sequences (e.g., deletion of repeated sequences within 16S rRNA, ycf2 or ribosomal binding sites in ycf2). RNA, photosynthesis, and

Henry Daniell; Seung-Bum Lee; Justin Grevich; Christopher Saski; Tania Quesada-Vargas; Chittibabu Guda; Jeffrey Tomkins; Robert K. Jansen

2006-01-01

365

Choosing a Benchtop Sequencing Machine to Characterise Helicobacter pylori Genomes  

PubMed Central

The fully annotated genome sequence of the European strain 26695 was first published in 1997 and, in 1999, it was directly compared to the USA isolate J99, promoting two standard laboratory isolates for Helicobacter pylori (H. pylori) research. With the genomic scaffolds available from these important genomes and the advent of benchtop high-throughput sequencing technology, a bacterial genome can now be sequenced within a few days. We sequenced and analysed strains J99 and 26695 using the benchtop-sequencing machines Ion Torrent PGM and the Illumina MiSeq Nextera and Nextera XT methodologies. Using publicly available algorithms, we analysed the raw data and interrogated both genomes by mapping the data and by de novo assembly. We compared the accuracy of the coding sequence assemblies to the originally published sequences. With the Ion Torrent PGM, we found an inherently high error rate in the raw sequence data. Using the Illumina MiSeq, we found significantly more non-covered nucleotides when using the less expensive Illumina Nextera XT compared with the Illumina Nextera library creation method. We found the most accurate de novo assemblies using the Nextera technology; however, extracting an accurate multi-locus sequence type was inconsistent compared to the Ion Torrent PGM. We found the cagPAI failed to assemble onto a single contig in all technologies but was more accurate using the Nextera. Our results indicate the Illumina MiSeq Nextera method is the most accurate for de novo whole genome sequencing of H. pylori.

Perkins, Timothy T.; Tay, Chin Yen; Thirriot, Fanny; Marshall, Barry

2013-01-01

366

Choosing a benchtop sequencing machine to characterise Helicobacter pylori genomes.  

PubMed

The fully annotated genome sequence of the European strain 26695 was first published in 1997 and, in 1999, it was directly compared to the USA isolate J99, promoting two standard laboratory isolates for Helicobacter pylori (H. pylori) research. With the genomic scaffolds available from these important genomes and the advent of benchtop high-throughput sequencing technology, a bacterial genome can now be sequenced within a few days. We sequenced and analysed strains J99 and 26695 using the benchtop-sequencing machines Ion Torrent PGM and the Illumina MiSeq Nextera and Nextera XT methodologies. Using publicly available algorithms, we analysed the raw data and interrogated both genomes by mapping the data and by de novo assembly. We compared the accuracy of the coding sequence assemblies to the originally published sequences. With the Ion Torrent PGM, we found an inherently high error rate in the raw sequence data. Using the Illumina MiSeq, we found significantly more non-covered nucleotides when using the less expensive Illumina Nextera XT compared with the Illumina Nextera library creation method. We found the most accurate de novo assemblies using the Nextera technology; however, extracting an accurate multi-locus sequence type was inconsistent compared to the Ion Torrent PGM. We found the cagPAI failed to assemble onto a single contig in all technologies but was more accurate using the Nextera. Our results indicate the Illumina MiSeq Nextera method is the most accurate for de novo whole genome sequencing of H. pylori. PMID:23840736

Perkins, Timothy T; Tay, Chin Yen; Thirriot, Fanny; Marshall, Barry

2013-06-28

367

RESTseq--efficient benchtop population genomics with RESTriction Fragment SEQuencing.  

PubMed

We present RESTseq, an improved approach for a cost efficient, highly flexible and repeatable enrichment of DNA fragments from digested genomic DNA using Next Generation Sequencing platforms including small scale Personal Genome sequencers. Easy adjustments make it suitable for a wide range of studies requiring SNP detection or SNP genotyping from fine-scale linkage mapping to population genomics and population genetics also in non-model organisms. We demonstrate the validity of our approach by comparing two honeybee and several stingless bee samples. PMID:23691128

Stolle, Eckart; Moritz, Robin F A

2013-05-17

368

Complete genome sequence of Ferroglobus placidus AEDII12DO  

SciTech Connect

Ferroglobus placidus belongs to the order Archaeoglobales within the archaeal phylum Euryarchaeota. Strain AEDII12DO is the type strain of the species and was isolated from a shallow marine hydrothermal system at Vulcano, Italy. It is a hyperthermophilic, anaerobic chemolithoautotroph, but it can also use a variety of aromatic compounds as electron donors. Here we describe the features of this organism together with the complete genome sequence and annotation. The 2,196,266 bp genome with its 2,567 protein-coding and 55 RNA genes was sequenced as part of a DOE Joint Genome Institute Laboratory Sequencing Program (LSP) project.

Anderson, Iain [U.S. Department of Energy, Joint Genome Institute; Risso, Carla [University of Massachusetts, Amherst; Holmes, Dawn [University of Massachusetts, Amherst; Lucas, Susan [U.S. Department of Energy, Joint Genome Institute; Copeland, A [U.S. Department of Energy, Joint Genome Institute; Lapidus, Alla L. [U.S. Department of Energy, Joint Genome Institute; Cheng, Jan-Fang [U.S. Department of Energy, Joint Genome Institute; Bruce, David [Los Alamos National Laboratory (LANL); Goodwin, Lynne A. [Los Alamos National Laboratory (LANL); Pitluck, Sam [U.S. Department of Energy, Joint Genome Institute; Saunders, Elizabeth H [Los Alamos National Laboratory (LANL); Brettin, Thomas S [ORNL; Detter, J. Chris [U.S. Department of Energy, Joint Genome Institute; Han, Cliff [Los Alamos National Laboratory (LANL); Tapia, Roxanne [Los Alamos National Laboratory (LANL); Larimer, Frank W [ORNL; Land, Miriam L [ORNL; Hauser, Loren John [ORNL; Woyke, Tanja [U.S. Department of Energy, Joint Genome Institute; Lovley, Derek [University of Massachusetts, Amherst; Kyrpides, Nikos C [U.S. Department of Energy, Joint Genome Institute; Ivanova, N [U.S. Department of Energy, Joint Genome Institute

2011-01-01

369

Complete genome sequence of Serratia plymuthica strain AS12.  

PubMed

A plant-associated member of the family Enterobacteriaceae, Serratia plymuthica strain AS12 was isolated from rapeseed roots. It is of scientific interest because it promotes plant growth and inhibits plant pathogens. The genome of S. plymuthica AS12 comprises a 5,443,009 bp long circular chromosome, which consists of 4,952 protein-coding genes, 87 tRNA genes and 7 rRNA operons. This genome was sequenced within the 2010 DOE-JGI Community Sequencing Program (CSP2010) as part of the project entitled "Genomics of four rapeseed plant growth promoting bacteria with antagonistic effect on plant pathogens". PMID:22768360

Neupane, Saraswoti; Finlay, Roger D; Alström, Sadhna; Goodwin, Lynne; Kyrpides, Nikos C; Lucas, Susan; Lapidus, Alla; Bruce, David; Pitluck, Sam; Peters, Lin; Ovchinnikova, Galina; Chertkov, Olga; Han, James; Han, Cliff; Tapia, Roxanne; Detter, John C; Land, Miriam; Hauser, Loren; Cheng, Jan-Fang; Ivanova, Natalia; Pagani, Ioanna; Klenk, Hans-Peter; Woyke, Tanja; Högberg, Nils

2012-05-01

370

Complete genome sequence of Serratia plymuthica strain AS12  

PubMed Central

A plant-associated member of the family Enterobacteriaceae, Serratia plymuthica strain AS12 was isolated from rapeseed roots. It is of scientific interest because it promotes plant growth and inhibits plant pathogens. The genome of S. plymuthica AS12 comprises a 5,443,009 bp long circular chromosome, which consists of 4,952 protein-coding genes, 87 tRNA genes and 7 rRNA operons. This genome was sequenced within the 2010 DOE-JGI Community Sequencing Program (CSP2010) as part of the project entitled “Genomics of four rapeseed plant growth promoting bacteria with antagonistic effect on plant pathogens”.

Finlay, Roger D.; Alstrom, Sadhna; Goodwin, Lynne; Kyrpides, Nikos C.; Lucas, Susan; Lapidus, Alla; Bruce, David; Pitluck, Sam; Peters, Lin; Ovchinnikova, Galina; Chertkov, Olga; Han, James; Han, Cliff; Tapia, Roxanne; Detter, John C.; Land, Miriam; Hauser, Loren; Cheng, Jan-Fang; Ivanova, Natalia; Pagani, Ioanna; Klenk, Hans-Peter; Woyke, Tanja; Hogberg, Nils

2012-01-01

371

RESTseq - Efficient Benchtop Population Genomics with RESTriction Fragment SEQuencing  

PubMed Central

We present RESTseq, an improved approach for a cost efficient, highly flexible and repeatable enrichment of DNA fragments from digested genomic DNA using Next Generation Sequencing platforms including small scale Personal Genome sequencers. Easy adjustments make it suitable for a wide range of studies requiring SNP detection or SNP genotyping from fine-scale linkage mapping to population genomics and population genetics also in non-model organisms. We demonstrate the validity of our approach by comparing two honeybee and several stingless bee samples.

Stolle, Eckart; Moritz, Robin F. A.

2013-01-01

372

Comparison of Sample Sequences of the Salmonella typhi Genome to the Sequence of the Complete Escherichia coli K-12 Genome  

PubMed Central

Raw sequence data representing the majority of a bacterial genome can be obtained at a tiny fraction of the cost of a completed sequence. To demonstrate the utility of such a resource, 870 single-stranded M13 clones were sequenced from a shotgun library of the Salmonella typhi Ty2 genome. The sequence reads averaged over 400 bases and sampled the genome with an average spacing of once every 5,000 bases. A total of 339,243 bases of unique sequence was generated (approximately 7% representation). The sample of 870 sequences was compared to the complete Escherichia coli K-12 genome and to the rest of the GenBank database, which can also be considered a collection of sampled sequences. Despite the incomplete S. typhi data set, interesting categories could easily be discerned. Sixteen percent of the sequences determined from S. typhi had close homologs among known Salmonella sequences (P < 1e-40 in BlastX or BlastN), reflecting the proportion of these genomes that have been sequenced previously; 277 sequences (32%) had no apparent orthologs in the complete E. coli K-12 genome (P > 1e-20), of which 155 sequences (18%) had no close similarities to any sequence in the database (P > 1e-5). Eight of the 277 sequences had similarities to genes in other strains of E. coli or plasmids, and six sequences showed evidence of novel phage lysogens or sequence remnants of phage integrations, including a member of the lambda family (P < 1e-15). Twenty-three sample sequences had a significantly closer similarity to a sequence in the database from organisms other than the E. coli/Salmonella clade (which includes Shigella and Citrobacter). These sequences are new candidate lateral transfer events to the S. typhi lineage or deletions on the E. coli K-12 lineage. Eleven putative junctions of insertion/deletion events greater than 100 bp were observed in the sample, indicating that well over 150 such events may distinguish S. typhi from E. coli K-12. The need for automatic methods to more effectively exploit sample sequences is discussed.
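
The sampling figures in this abstract fit together arithmetically, and a short worked example may make that easier to see. The sketch below assumes a simple linear extrapolation from the roughly 7% sample to the whole genome; the abstract does not say how the authors arrived at "well over 150", so this is one plausible reading rather than their method. All variable names are illustrative.

```python
# Figures quoted in the abstract.
READS = 870                 # sequenced M13 clones
UNIQUE_BASES = 339_243      # unique sequence generated
REPRESENTATION = 0.07       # "approximately 7% representation"


def percent(part: int, whole: int) -> float:
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


# Genome size implied by 339,243 bases being about 7% of the genome.
implied_genome_size = UNIQUE_BASES / REPRESENTATION

# Shares of reads without E. coli K-12 orthologs / without any database match.
no_ecoli_ortholog_pct = percent(277, READS)
no_database_match_pct = percent(155, READS)

# Naive linear scale-up (an assumption): 11 junctions seen in a ~7% sample.
extrapolated_junctions = 11 / REPRESENTATION

print(f"implied genome size: {implied_genome_size:,.0f} bp")
print(f"no E. coli ortholog: {no_ecoli_ortholog_pct:.1f}%")
print(f"no database match:   {no_database_match_pct:.1f}%")
print(f"extrapolated junctions: {extrapolated_junctions:.0f}")
```

Under these assumptions the read shares round to the quoted 32% and 18%, and the scaled-up junction count lands above 150, in line with the abstract's wording.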

McClelland, Michael; Wilson, Richard K.

1998-01-01

373

Sequence analysis and organization of the Neodiprion abietis nucleopolyhedrovirus genome.  

PubMed

Of 30 baculovirus genomes that have been sequenced to date, the only nonlepidopteran baculoviruses include the dipteran Culex nigripalpus nucleopolyhedrovirus and two hymenopteran nucleopolyhedroviruses that infect the sawflies Neodiprion lecontei (NeleNPV) and Neodiprion sertifer (NeseNPV). This study provides a complete sequence and genome analysis of the nucleopolyhedrovirus that infects the balsam fir sawfly Neodiprion abietis (Hymenoptera, Symphyta, Diprionidae). The N. abietis nucleopolyhedrovirus (NeabNPV) is 84,264 bp in size, with a G+C content of 33.5%, and contains 93 predicted open reading frames (ORFs). Eleven predicted ORFs are unique to this baculovirus, 10 ORFs have a putative sequence homologue in the NeleNPV genome but not the NeseNPV genome, and 1 ORF (neab53) has a putative sequence homologue in the NeseNPV genome but not the NeleNPV genome. Specific repeat sequences are coincident with major genome rearrangements that distinguish NeabNPV and NeleNPV. Genes associated with these repeat regions encode a common amino acid motif, suggesting that they are a family of repeated contiguous gene clusters. Lepidopteran baculoviruses, similarly, have a family of repeated genes called the bro gene family. However, there is no significant sequence similarity between the NeabNPV and bro genes. Homologues of early-expressed genes such as ie-1 and lef-3 were absent in NeabNPV, as they are in the previously sequenced hymenopteran baculoviruses. Analyses of ORF upstream sequences identified potential temporally distinct genes on the basis of putative promoter elements. PMID:16809301
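
For readers unfamiliar with the G+C figure quoted above, the following is a minimal sketch of how G+C content is conventionally computed from a nucleotide sequence. The function name and the toy sequence are made up for illustration; the NeabNPV sequence itself is not reproduced here, and only the genome size and percentage come from the abstract.

```python
def gc_content(seq: str) -> float:
    """Percentage of G and C among the unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        raise ValueError("sequence contains no A, C, G or T bases")
    return 100.0 * (seq.count("G") + seq.count("C")) / counted


# Toy sequence for illustration only (not from the NeabNPV genome).
toy_gc = gc_content("ATGCATTAAT")

# Approximate number of G+C bases implied by 33.5% of 84,264 bp.
approx_gc_bases = round(84_264 * 0.335)

print(toy_gc, approx_gc_bases)
```

Ambiguous symbols such as N are ignored in the denominator here; other conventions exist, so results from different tools can differ slightly.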

Duffy, Simon P; Young, Aaron M; Morin, Benoit; Lucarotti, Christopher J; Koop, Ben F; Levin, David B

2006-07-01

374

Biased distribution of DNA uptake sequences towards genome maintenance genes  

PubMed Central

Repeated sequence signatures are characteristic features of all genomic DNA. We have made a rigorous search for repeat genomic sequences in the human pathogens Neisseria meningitidis, Neisseria gonorrhoeae and Haemophilus influenzae and found that by far the most frequent 9–10mers residing within coding regions are the DNA uptake sequences (DUS) required for natural genetic transformation. More importantly, we found a significantly higher density of DUS within genes involved in DNA repair, recombination, restriction-modification and replication than in any other annotated gene group in these organisms. Pasteurella multocida also displayed high frequencies of a putative DUS identical to that previously identified in H. influenzae and with a skewed distribution towards genome maintenance genes, indicating that this bacterium might be transformation competent under certain conditions. These results imply that the high frequency of DUS in genome maintenance genes is conserved among phylogenetically divergent species and thus is of significant biological importance. Increased DUS density is expected to enhance DNA uptake and the over-representation of DUS in genome maintenance genes might reflect facilitated recovery of genome preserving functions. For example, transient and beneficial increase in genome instability can be allowed during pathogenesis simply through loss of antimutator genes, since these DUS-containing sequences will be preferentially recovered. Furthermore, uptake of such genes could provide a mechanism for facilitated recovery from DNA damage after genotoxic stress.
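
The kind of search described above (finding the most frequent short oligomers and their density) can be sketched as a simple k-mer count. This is a generic illustration, not the authors' pipeline: the function names are hypothetical, the toy sequence is invented, and only one strand is scanned. The 10-mer GCCGTCTGAA used in the toy data is the well-known Neisseria DNA uptake sequence.

```python
from collections import Counter


def count_kmers(seq: str, k: int) -> Counter:
    """Count every overlapping k-mer on the given strand of seq."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def density_per_kb(occurrences: int, length_bp: int) -> float:
    """Occurrences of a motif per 1,000 bp of sequence."""
    return 1000.0 * occurrences / length_bp


# Toy "coding region" with three embedded copies of the Neisseria DUS.
DUS = "GCCGTCTGAA"
toy_seq = "ATG" + DUS + "TTTACG" + DUS + "CCATGA" + DUS + "TAA"

counts = count_kmers(toy_seq, len(DUS))
top_kmer, top_count = counts.most_common(1)[0]
dus_density = density_per_kb(counts[DUS], len(toy_seq))

print(top_kmer, top_count, dus_density)
```

A real analysis would also scan the reverse complement, restrict counting to annotated coding regions, and compare densities between gene groups; those steps are omitted to keep the sketch short.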

Davidsen, Tonje; Rødland, Einar A.; Lagesen, Karin; Seeberg, Erling; Rognes, Torbjørn; Tønjum, Tone

2004-01-01

375

Draft genome sequences of 21 Salmonella enterica serovar enteritidis strains.  

PubMed

Salmonella enterica subsp. enterica serovar Enteritidis is a common food-borne pathogen, often associated with shell eggs and poultry. Here, we report draft genomes of 21 S. Enteritidis strains associated with or related to the U.S.-wide 2010 shell egg recall. Eleven of these genomes were from environmental isolates associated with the egg outbreak, and 10 were reference isolates from previous years, unrelated to the outbreak. The whole-genome sequence data for these 21 human pathogen strains are being released in conjunction with the newly formed 100K Genome Project. PMID:23045502

Timme, Ruth E; Allard, Marc W; Luo, Yan; Strain, Errol; Pettengill, James; Wang, Charles; Li, Cong; Keys, Christine E; Zheng, Jie; Stones, Robert; Wilson, Mark R; Musser, Steven M; Brown, Eric W

2012-11-01

376

Draft Genome Sequences of 21 Salmonella enterica Serovar Enteritidis Strains  

PubMed Central

Salmonella enterica subsp. enterica serovar Enteritidis is a common food-borne pathogen, often associated with shell eggs and poultry. Here, we report draft genomes of 21 S. Enteritidis strains associated with or related to the U.S.-wide 2010 shell egg recall. Eleven of these genomes were from environmental isolates associated with the egg outbreak, and 10 were reference isolates from previous years, unrelated to the outbreak. The whole-genome sequence data for these 21 human pathogen strains are being released in conjunction with the newly formed 100K Genome Project.

Allard, Marc W.; Luo, Yan; Strain, Errol; Pettengill, James; Wang, Charles; Li, Cong; Keys, Christine E.; Zheng, Jie; Stones, Robert; Wilson, Mark R.; Musser, Steven M.; Brown, Eric W.

2012-01-01

377

Complete genome sequence of Atopobium parvulum type strain (IPP 1246).  

PubMed

Atopobium parvulum (Weinberg et al. 1937) Collins and Wallbanks 1993 comb. nov. is the type strain of the species and belongs to the genomically yet unstudied Atopobium/Olsenella branch of the family Coriobacteriaceae. The species A. parvulum is of interest because its members are frequently isolated from the human oral cavity and are found to be associated with halitosis (oral malodor) but not with periodontitis. Here we describe the features of this organism, together with the complete genome sequence, and annotation. This is the first complete genome sequence of the genus Atopobium, and the 1,543,805 bp long single replicon genome with its 1369 protein-coding and 49 RNA genes is part of the Genomic Encyclopedia of Bacteria and Archaea project. PMID:21304653

Copeland, Alex; Sikorski, Johannes; Lapidus, Alla; Nolan, Matt; Del Rio, Tijana Glavina; Lucas, Susan; Chen, Feng; Tice, Hope; Pitluck, Sam; Cheng, Jan-Fang; Pukall, Rüdiger; Chertkov, Olga; Brettin, Thomas; Han, Cliff; Detter, John C; Kuske, Cheryl; Bruce, David; Goodwin, Lynne; Ivanova, Natalia; Mavromatis, Konstantinos; Mikhailova, Natalia; Chen, Amy; Palaniappan, Krishna; Chain, Patrick; Rohde, Manfred; Göker, Markus; Bristow, Jim; Eisen, Jonathan A; Markowitz, Victor; Hugenholtz, Philip; Kyrpides, Nikos C; Klenk, Hans-Peter; Detter, John C

2009-09-23

378

Complete genome sequence of Sulfurimonas autotrophica type strain (OK10).  

PubMed

Sulfurimonas autotrophica Inagaki et al. 2003 is the type species of the genus Sulfurimonas. This genus is of interest because of its significant contribution to the global sulfur cycle as it oxidizes sulfur compounds to sulfate and by its apparent habitation of deep-sea hydrothermal and marine sulfidic environments as a potential ecological niche. Here we describe the features of this organism, together with the complete genome sequence and annotation. This is the second complete genome sequence of the genus Sulfurimonas and the 15th genome in the family Helicobacteraceae. The 2,153,198 bp long genome with its 2,165 protein-coding and 55 RNA genes is part of the Genomic Encyclopedia of Bacteria and Archaea project. PMID:21304749

Sikorski, Johannes; Munk, Christine; Lapidus, Alla; Ngatchou Djao, Olivier Duplex; Lucas, Susan; Glavina Del Rio, Tijana; Nolan, Matt; Tice, Hope; Han, Cliff; Cheng, Jan-Fang; Tapia, Roxanne; Goodwin, Lynne; Pitluck, Sam; Liolios, Konstantinos; Ivanova, Natalia; Mavromatis, Konstantinos; Mikhailova, Natalia; Pati, Amrita; Sims, David; Meincke, Linda; Brettin, Thomas; Detter, John C; Chen, Amy; Palaniappan, Krishna; Land, Miriam; Hauser, Loren; Chang, Yun-Juan; Jeffries, Cynthia D; Rohde, Manfred; Lang, Elke; Spring, Stefan; Göker, Markus; Woyke, Tanja; Bristow, James; Eisen, Jonathan A; Markowitz, Victor; Hugenholtz, Philip; Kyrpides, Nikos C; Klenk, Hans-Peter

2010-10-27

379

Complete genome sequence of Kribbella flavida type strain (IFO 14399).  

PubMed

The genus Kribbella consists of 15 species, with Kribbella flavida (Park et al. 1999) as the type species. The name Kribbella was formed from the acronym of the Korea Research Institute of Bioscience and Biotechnology, KRIBB. Strains of the various Kribbella species were originally isolated from soil, potato, alum slate mine, patinas of catacombs or from horse racecourses. Here we describe the features of K. flavida together with the complete genome sequence and annotation. In addition to the 5.3 Mbp genome of Nocardioides sp. JS614, this is only the second completed genome sequence of the family Nocardioidaceae. The 7,579,488 bp long genome with its 7,086 protein-coding and 60 RNA genes is part of the Genomic Encyclopedia of Bacteria and Archaea project. PMID:21304701

Pukall, Rüdiger; Lapidus, Alla; Glavina Del Rio, Tijana; Copeland, Alex; Tice, Hope; Cheng, Jan-Fang; Lucas, Susan; Chen, Feng; Nolan, Matt; Labutti, Kurt; Pati, Amrita; Ivanova, Natalia; Mavromatis, Konstantinos; Mikhailova, Natalia; Pitluck, Sam; Bruce, David; Goodwin, Lynne; Land, Miriam; Hauser, Loren; Chang, Yun-Juan; Jeffries, Cynthia D; Chen, Amy; Palaniappan, Krishna; Chain, Patrick; Rohde, Manfred; Göker, Markus; Bristow, Jim; Eisen, Jonathan A; Markowitz, Victor; Hugenholtz, Philip; Kyrpides, Nikos C; Klenk, Hans-Peter; Brettin, Thomas

2010-03-30

380

A survey of tools for variant analysis of next-generation genome sequencing data.  

PubMed

Recent advances in genome sequencing technologies provide unprecedented opportunities to characterize individual genomic landscapes and identify mutations relevant for diagnosis and therapy. Specifically, whole-exome sequencing using next-generation sequencing (NGS) technologies is gaining popularity in the human genetics community due to the moderate costs, manageable data amounts and straightforward interpretation of analysis results. While whole-exome and, in the near future, whole-genome sequencing are becoming commodities, data analysis still poses significant challenges and has led to the development of a plethora of tools supporting specific parts of the analysis workflow or providing a complete solution. Here, we surveyed 205 tools for whole-genome/whole-exome sequencing data analysis supporting five distinct analytical steps: quality assessment, alignment, variant identification, variant annotation and visualization. We report an overview of the functionality, features and specific requirements of the individual tools. We then selected 32 programs for variant identification, variant annotation and visualization, which were subjected to hands-on evaluation using four data sets: one set of exome data from two patients with a rare disease for testing identification of germline mutations, two cancer data sets for testing variant callers for somatic mutations, copy number variations and structural variations, and one semi-synthetic data set for testing identification of copy number variations. Our comprehensive survey and evaluation of NGS tools provides a valuable guideline for human geneticists working on Mendelian disorders, complex diseases and cancers. PMID:23341494

Pabinger, Stephan; Dander, Andreas; Fischer, Maria; Snajder, Rene; Sperk, Michael; Efremova, Mirjana; Krabichler, Birgit; Speicher, Michael R; Zschocke, Johannes; Trajanoski, Zlatko

2013-01-21

381

Complete Genome Sequence of Equine Herpesvirus Type 9  

PubMed Central

Equine herpesvirus type 9 (EHV-9), which we isolated from a case of epizootic encephalitis in a herd of Thomson's gazelles (Gazella thomsoni) in 1993, has been known to cause fatal encephalitis in Thomson's gazelle, giraffe, and polar bear in natural infections. Our previous report indicated that EHV-9 was similar to the equine pathogen equine herpesvirus type 1 (EHV-1), which mainly causes abortion, respiratory infection, and equine herpesvirus myeloencephalopathy. We determined the genome sequence of EHV-9. The genome has a length of 148,371 bp and all 80 of the open reading frames (ORFs) found in the genome of EHV-1. The nucleotide sequences of the ORFs in EHV-9 were 86 to 95% identical to those in EHV-1. The whole genome sequence should help to reveal the neuropathogenicity of EHV-9.

Yamaguchi, Tsuyoshi; Yamada, Souichi

2012-01-01

382

Complete genome sequence of Rhodothermus marinus type strain (R-10).  

PubMed

Rhodothermus marinus Alfredsson et al. 1995 is the type species of the genus and is of phylogenetic interest because the Rhodothermaceae represent the deepest lineage in the phylum Bacteroidetes. R. marinus R-10(T) is a Gram-negative, non-motile, non-spore-forming bacterium isolated from marine hot springs off the coast of Iceland. Strain R-10(T) is strictly aerobic and requires slightly halophilic conditions for growth. Here we describe the features of this organism, together with the complete genome sequence, and annotation. This is the first complete genome sequence of the genus Rhodothermus, and only the second sequence from members of the family Rhodothermaceae. The 3,386,737 bp genome (including a 125 kb plasmid) with its 2914 protein-coding and 48 RNA genes is part of the Genomic Encyclopedia of Bacteria and Archaea project. PMID:21304669

Nolan, Matt; Tindall, Brian J; Pomrenke, Helga; Lapidus, Alla; Copeland, Alex; Glavina Del Rio, Tijana; Lucas, Susan; Chen, Feng; Tice, Hope; Cheng, Jan-Fang; Saunders, Elizabeth; Han, Cliff; Bruce, David; Goodwin, Lynne; Chain, Patrick; Pitluck, Sam; Ovchinikova, Galina; Pati, Amrita; Ivanova, Natalia; Mavromatis, Konstantinos; Chen, Amy; Palaniappan, Krishna; Land, Miriam; Hauser, Loren; Chang, Yun-Juan; Jeffries, Cynthia D; Brettin, Thomas; Göker, Markus; Bristow, James; Eisen, Jonathan A; Markowitz, Victor; Hugenholtz, Philip; Kyrpides, Nikos C; Klenk, Hans-Peter; Detter, John C

2009-12-29

383

Complete genome sequence of Streptobacillus moniliformis type strain (9901T)  

SciTech Connect

Streptobacillus moniliformis Levaditi et al. 1925 is the sole and type species of the genus, and is of phylogenetic interest because of its isolated location in the sparsely populated and neither taxonomically nor genomically much accessed family 'Leptotrichiaceae' within the phylum 'Fusobacteria'. S. moniliformis, a Gram-negative, non-motile and pleomorphic bacterium, is the etiologic agent of rat bite fever and Haverhill fever. Strain 9901T, the type strain of the species, was isolated from a patient with rat bite fever. Here we describe the features of this organism, together with the complete genome sequence and annotation. This is only the second completed genome sequence of the order 'Fusobacteriales' and no more than the third sequence from the phylum 'Fusobacteria'. The 1,662,578 bp long chromosome and the 10,702 bp plasmid with a total of 1511 protein-coding and 55 RNA genes are part of the Genomic Encyclopedia of Bacteria and Archaea project.

Nolan, Matt [U.S. Department of Energy, Joint Genome Institute]; Gronow, Sabine [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany]; Lapidus, Alla L. [U.S. Department of Energy, Joint Genome Institute]; Ivanova, N [U.S. Department of Energy, Joint Genome Institute]; Copeland, A [U.S. Department of Energy, Joint Genome Institute]; Lucas, Susan [U.S. Department of Energy, Joint Genome Institute]; Glavina Del Rio, Tijana [U.S. Department of Energy, Joint Genome Institute]; Chen, Feng [U.S. Department of Energy, Joint Genome Institute]; Sims, David [Los Alamos National Laboratory (LANL)]; Meincke, Linda [Los Alamos National Laboratory (LANL)]; Bruce, David [Los Alamos National Laboratory (LANL)]; Goodwin, Lynne A. [Los Alamos National Laboratory (LANL)]; Han, Cliff [Los Alamos National Laboratory (LANL)]; Detter, J. Chris [U.S. Department of Energy, Joint Genome Institute]; Ovchinnikova, Galina [U.S. Department of Energy, Joint Genome Institute]; Pati, Amrita [U.S. Department of Energy, Joint Genome Institute]; Mavromatis, K [U.S. Department of Energy, Joint Genome Institute]; Mikhailova, Natalia [U.S. Department of Energy, Joint Genome Institute]; Chen, Amy [U.S. Department of Energy, Joint Genome Institute]; Palaniappan, Krishna [U.S. Department of Energy, Joint Genome Institute]; Land, Miriam L [ORNL]; Hauser, Loren John [ORNL]; Chang, Yun-Juan [ORNL]; Jeffries, Cynthia [Oak Ridge National Laboratory (ORNL)]; Rohde, Manfred [HZI - Helmholtz Centre for Infection Research, Braunschweig, Germany]; Sproer, Cathrin [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany]; Goker, Markus [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany]; Bristow, James [U.S. Department of Energy, Joint Genome Institute]; Eisen, Jonathan [U.S. Department of Energy, Joint Genome Institute]; Markowitz, Victor [U.S. Department of Energy, Joint Genome Institute]; Hugenholtz, Philip [U.S. Department of Energy, Joint Genome Institute]; Kyrpides, Nikos C [U.S. Department of Energy, Joint Genome Institute]; Klenk, Hans-Peter [DSMZ - German Collection of Microorganisms and Cell Cultures GmbH, Braunschweig, Germany]; Chain, Patrick S. G. [Lawrence Livermore National Laboratory (LLNL)]

2009-01-01

384

Complete genome sequence of Shewanella putrefaciens. Final report  

SciTech Connect

Seventy percent of the costs for sequencing the Shewanella putrefaciens (oneidensis) genome were requested. These funds were expected to allow completion of the low-pass (5-fold) random sequencing and complete closure and annotation of the 200 kbp plasmid. Because of cost reductions that occurred during the period of this grant, these goals have been far exceeded. Currently, the S. putrefaciens genome is very nearly completely closed, even though the genome was significantly larger than expected and extremely repetitive. The entire genome sequence has been made BLAST searchable on the TIGR web page, and an extensive effort has been made to make data and analyses available to all researchers working on S. putrefaciens (oneidensis).

Heidelberg, John F.

2001-04-01

385

Genomic insight into the common carp (Cyprinus carpio) genome by sequencing analysis of BAC-end sequences  

PubMed Central

Background: Common carp is one of the most important aquaculture teleost fish in the world. Common carp and other closely related Cyprinidae species provide over 30% of aquaculture production in the world. However, common carp genomic resources are still relatively underdeveloped. BAC end sequences (BES) are important resources for genome research on BAC-anchored genetic marker development, linkage map and physical map integration, and whole genome sequence assembling and scaffolding. Result: To develop such valuable resources in common carp (Cyprinus carpio), a total of 40,224 BAC clones were sequenced on both ends, generating 65,720 clean BES with an average read length of 647 bp after sequence processing, representing 42,522,168 bp or 2.5% of the common carp genome. The first survey of the common carp genome was conducted with various bioinformatics tools. The common carp genome contains over 17.3% of repetitive elements with GC content of 36.8% and 518 transposon ORFs. To identify and develop BAC-anchored microsatellite markers, a total of 13,581 microsatellites were detected from 10,355 BES. The coding regions of 7,127 genes were recognized from 9,443 BES on 7,453 BACs, with 1,990 BACs having genes on both ends. To evaluate the similarity to the genome of the closely related zebrafish, BES of common carp were aligned against the zebrafish genome. A total of 39,335 BES of common carp have conserved homologs in the zebrafish genome, demonstrating the high similarity between the zebrafish and common carp genomes and indicating the feasibility of comparative mapping between zebrafish and common carp once a physical map of common carp is available. Conclusion: BAC end sequences are great resources for the first genome-wide survey of common carp. The repetitive DNA was estimated to be approximately 28% of the common carp genome, indicating the higher complexity of the genome. Comparative analysis mapped around 40,000 BES to the zebrafish genome and established over 3,100 microsyntenies, covering over 50% of the zebrafish genome. BES of common carp are tremendous tools for comparative mapping between the two closely related species, zebrafish and common carp, which should facilitate both structural and functional genome analysis in common carp.

2011-01-01
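
The size figures in the common carp record above can be cross-checked with simple arithmetic. The Python below is a minimal sketch using only numbers quoted in the abstract. The function names are illustrative, and the implied genome size is only a rough estimate because it is derived from the rounded 2.5% figure.

```python
def total_bases(n_reads, mean_read_length_bp):
    """Approximate total sequence length from a read count and a mean length."""
    return n_reads * mean_read_length_bp


def implied_genome_size(sequenced_bp, fraction_of_genome):
    """Genome size implied by 'sequenced_bp is fraction_of_genome of the genome'."""
    return sequenced_bp / fraction_of_genome


# Figures quoted in the abstract.
N_BES = 65_720             # clean BAC end sequences
MEAN_LEN_BP = 647          # average read length (rounded in the abstract)
REPORTED_BP = 42_522_168   # total bp reported
GENOME_FRACTION = 0.025    # "2.5% of common carp genome" (rounded)

approx_bp = total_bases(N_BES, MEAN_LEN_BP)
genome_bp = implied_genome_size(REPORTED_BP, GENOME_FRACTION)

print(f"reads x mean length : {approx_bp:,} bp")
print(f"reported total      : {REPORTED_BP:,} bp")
print(f"implied genome size : {genome_bp / 1e9:.2f} Gb")
```

The product of read count and rounded mean length differs slightly from the reported total, which is expected when the mean has been rounded to a whole number of base pairs.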

386

Monitoring Genomic Sequences during SELEX Using High-Throughput Sequencing: Neutral SELEX  

PubMed Central

Background: SELEX is a well-established in vitro selection tool to analyze the structure of ligand-binding nucleic acid sequences called aptamers. Genomic SELEX transforms SELEX into a tool to discover novel, genomically encoded RNA or DNA sequences binding a ligand of interest, called genomic aptamers. Concerns have been raised regarding requirements imposed on RNA sequences undergoing SELEX selection. Methodology/Principal Findings: To evaluate SELEX and assess the extent of these effects, we designed and performed a Neutral SELEX experiment omitting the selection step, such that the sequences are under the sole selective pressure of SELEX's amplification steps. Using high-throughput sequencing, we obtained thousands of full-length sequences from the initial genomic library and the pools after each of the 10 rounds of Neutral SELEX. We compared these to sequences obtained from a Genomic SELEX experiment deriving from the same initial library, but screening for RNAs binding with high affinity to the E. coli regulator protein Hfq. With each round of Neutral SELEX, sequences became less stable and changed in nucleotide content, but no sequences were enriched. In contrast, we detected substantial enrichment in the Hfq-selected set with enriched sequences having structural stability similar to the neutral sequences but with significantly different nucleotide selection. Conclusions/Significance: Our data indicate that positive selection in SELEX acts independently of the neutral selective requirements imposed on the sequences. We conclude that Genomic SELEX, when combined with high-throughput sequencing of positively and neutrally selected pools, as well as the genomic library, is a powerful method to identify genomic aptamers.

Chen, Doris; Lorenz, Christina; Schroeder, Renee

2010-01-01

387

The impact of next-generation sequencing on genomics  

Microsoft Academic Search

This article reviews basic concepts, general applications, and the potential impact of next-generation sequencing (NGS) technologies on genomics, with particular reference to currently available and possible future platforms and bioinformatics. NGS technologies have demonstrated the capacity to sequence DNA at unprecedented speed, thereby enabling previously unimaginable scientific achievements and novel biological applications. But, the massive data produced by NGS also

Jun Zhang; Rod Chiodini; Ahmed Badr; Genfa Zhang

2011-01-01

388

A Comparison Study of Virus Classification by Genome Sequences  

Microsoft Academic Search

In this study, instead of traditional approaches to virus classification, we proposed a novel approach in the vector space model for virus classification via two types of genome sequences, DNA and CDS. For DNA sequences, the k-mer approach was adopted for pattern extraction, and the entropy of the pattern frequency distribution among classes was used for pattern weighting.

Jing-Doo Wang

2011-01-01
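
The k-mer extraction and entropy-based weighting mentioned in the record above can be illustrated with a short Python sketch. The abstract does not give the exact weighting formula, so the form used here (one minus the normalized Shannon entropy of a k-mer's frequency distribution across classes) is an assumption, as are the function names and the toy data in the usage note.

```python
import math
from collections import Counter


def kmer_counts(seq, k):
    """Count overlapping k-mers in a sequence string."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def entropy_weights(class_seqs, k):
    """Weight each k-mer by how unevenly it is spread across classes.

    class_seqs maps a class label to a list of sequences.
    Assumed weighting: 1 - H / log(C), where H is the Shannon entropy of the
    k-mer's frequency distribution over the C classes. A k-mer seen in only
    one class gets weight 1; one spread evenly over all classes gets 0.
    """
    per_class = {}
    for label, seqs in class_seqs.items():
        total = Counter()
        for s in seqs:
            total.update(kmer_counts(s, k))
        per_class[label] = total

    n_classes = len(per_class)
    all_kmers = set().union(*per_class.values())
    weights = {}
    for kmer in all_kmers:
        counts = [per_class[label][kmer] for label in per_class]
        total = sum(counts)
        probs = [c / total for c in counts if c > 0]
        entropy = -sum(p * math.log(p) for p in probs)
        if n_classes > 1:
            weights[kmer] = 1.0 - entropy / math.log(n_classes)
        else:
            weights[kmer] = 1.0
    return weights
```

With toy input such as `entropy_weights({"A": ["AAAA"], "B": ["AATT"]}, k=2)`, the 2-mer `AT` occurs only in class B and receives the maximum weight, while `AA` occurs in both classes and receives a lower one. The weighted k-mer counts could then serve as the components of a document-style vector in a vector space model.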

389

Analysis of Chimpanzee History Based on Genome Sequence Alignments  

Microsoft Academic Search

Population geneticists often study small numbers of carefully chosen loci, but it has become possible to obtain orders of magnitude more data from overlaps of genome sequences. Here, we generate tens of millions of base pairs of multiple sequence alignments from combinations of three western chimpanzees, three central chimpanzees, an eastern chimpanzee, a bonobo, a human, an orangutan, and

Jennifer L. Caswell; Swapan Mallick; Daniel J. Richter; Julie Neubauer; Christine Schirmer; Sante Gnerre; David Reich

2008-01-01

390

Genome Sequence of Fusobacterium nucleatum Subspecies Polymorphum — a Genetically Tractable  

Microsoft Academic Search

Fusobacterium nucleatum is a prominent member of the oral microbiota and is a common cause of human infection. F. nucleatum includes five subspecies: polymorphum, nucleatum, vincentii, fusiforme, and animalis. F. nucleatum subsp. polymorphum ATCC 10953 has been well characterized phenotypically and, in contrast to previously sequenced strains, is amenable to gene transfer. We sequenced and annotated the 2,429,698 bp genome

Sandor E. Karpathy; Xiang Qin; Jason Gioia; Huaiyang Jiang; Yamei Liu; Joseph F. Petrosino; Shailaja Yerrapragada; George E. Fox; Susan Kinder Haake; George M. Weinstock; Sarah K. Highlander

391

Brucella microti: the genome sequence of an emerging pathogen  

Microsoft Academic Search

BACKGROUND: Using a combination of pyrosequencing and conventional Sanger sequencing, the complete genome sequence of the recently described novel Brucella species, Brucella microti, was determined. B. microti is a member of the genus Brucella within the Alphaproteobacteria, which consists of medically important highly pathogenic facultative intracellular bacteria. In contrast to all other Brucella species, B. microti is a fast growing

Stéphane Audic; Magali Lescot; Jean-Michel Claverie; Holger C Scholz

2009-01-01

392

DNA sequence organization in the genomes of five marine invertebrates  

Microsoft Academic Search

The arrangement of repetitive and non-repetitive sequence was studied in the genomic DNA of the oyster (Crassostrea virginica), the surf clam (Spisula solidissima), the horseshoe crab (Limulus polyphemus), a nemertean worm (Cerebratulus lacteus) and a jellyfish (Aurelia aurita). Except for the jellyfish these animals belong to the protostomial branch of animal evolution, for which little information regarding DNA sequence organization

Robert B. Goldberg; William R. Crain; Joan V. Ruderman; Gordon P. Moore; Thomas R. Barnett; Ratchford C. Higgins; Robert A. Gelfand; Glenn A. Galau; Roy J. Britten; Eric H. Davidson

1975-01-01

393

GENOMIC SEQUENCE ANALYSIS OF LEPTOSPIRA BORGPETERSENII SEROVAR HARDJO  

Technology Transfer Automated Retrieval System (TEKTRAN)

A genomic library from Leptospira borgpetersenii serovar hardjo strain JB197 was prepared by mechanically shearing the DNA and inserting it into a positive selection vector. DNA was prepared from approximately 22,000 random clones and used as templates for automated sequencing. Sequence data was c...

394

PHYTOPHTHORA GENOME SEQUENCES UNCOVER EVOLUTIONARY ORIGINS AND MECHANISMS OF PATHOGENESIS  

Technology Transfer Automated Retrieval System (TEKTRAN)

Draft genome sequences of the soybean pathogen Phytophthora sojae and the sudden oak death pathogen Phytophthora ramorum have been determined. Oomycetes such as these Phytophthora species share the kingdom Stramenopiles with photosynthetic algae such as diatoms, and the Phytophthora sequences sugges...

395

Molecular Poltergeists: Mitochondrial DNA Copies (numts) in Sequenced Nuclear Genomes  

Microsoft Academic Search

The natural transfer of DNA from mitochondria to the nucleus generates nuclear copies of mitochondrial DNA (numts) and is an ongoing evolutionary process, as genome sequences attest. In humans, five different numts cause genetic disease and a dozen human loci are polymorphic for the presence of numts, underscoring the rapid rate at which mitochondrial sequences reach the nucleus over evolutionary

Einat Hazkani-Covo; Raymond M. Zeller; William Martin

2010-01-01

396

Genome Sequencing and Bioinformatics Analyses of Higher Plants Chloroplasts  

Microsoft Academic Search

Chloroplast DNA in higher plants exists as closed circular molecules of about 150 kb (±30), usually presenting inverted repeat sequences separating two single copy regions (1). The complete chloroplast genomes of around 13 higher plant species are available in the gene bank. Our group has completely sequenced the sugarcane chloroplast DNA, which is 141,182 nucleotides in size. We

Helaine Carrer

397

Genome sequencing and analysis of the model grass Brachypodium distachyon  

Microsoft Academic Search

Three subfamilies of grasses, the Ehrhartoideae, Panicoideae and Pooideae, provide the bulk of human nutrition and are poised to become major sources of renewable energy. Here we describe the genome sequence of the wild grass Brachypodium distachyon (Brachypodium), which is, to our knowledge, the first member of the Pooideae subfamily to be sequenced. Comparison of the Brachypodium, rice and sorghum

David F. Garvin; Todd C. Mockler; Jeremy Schmutz; Dan Rokhsar; Kerrie Barry; Susan Lucas; Miranda Harmon-Smith; Kathleen Lail; Hope Tice; Jane Grimwood; Neil McKenzie; Naxin Huo; Yong Q. Gu; Gerard R. Lazo; Olin D. Anderson; Frank M. You; Ming-Cheng Luo; Jan Dvorak; Jonathan Wright; Melanie Febrer; Michael W. Bevan; Dominika Idziak; Robert Hasterok; Erika Lindquist; Mei Wang; Samuel E. Fox; Henry D. Priest; Sergei A. Filichkin; Scott A. Givan; Douglas W. Bryant; Jeff H. Chang; Haiyan Wu; Wei Wu; An-Ping Hsia; Patrick S. Schnable; Anantharaman Kalyanaraman; Brad Barbazuk; Todd P. Michael; Samuel P. Hazen; Jennifer N. Bragg; Debbie Laudencia-Chingcuanco; Yiqun Weng; Georg Haberer; Manuel Spannagl; Klaus Mayer; Thomas Rattei; Therese Mitros; Sang-Jik Lee; Jocelyn K. C. Rose; Lukas A. Mueller; Jan P. Buchmann; Jaakko Tanskanen; Heidrun Gundlach; Antonio Costa de Oliveira; Luciano da C. Maia; William Belknap; Ning Jiang; Jinsheng Lai; Liucun Zhu; Jianxin Ma; Cheng Sun; Florent Murat; Michael Abrouk; Remy Bruggmann; Joachim Messing; Noah Fahlgren; Christopher M. Sullivan; James C. Carrington; Elisabeth J. Chapman; Greg D. May; Jixian Zhai; Matthias Ganssmann; Sai Guna Ranjan Gurazada; Marcelo German; Ludmila Tyler; Jiajie Wu; James Thomson; Shan Chen; Henrik V. Scheller; Jesper Harholt; Peter Ulvskov; Jeffrey A. Kimbrel; Laura E. Bartley; Peijian Cao; Ki-Hong Jung; Manoj K. Sharma; Miguel Vega-Sanchez; Pamela Ronald; Christopher D. Dardick; Stefanie de Bodt; Wim Verelst; Dirk Inzé; Maren Heese; Arp Schnittger; Xiaohan Yang; Udaya C. Kalluri; Gerald A. Tuskan; Zhihua Hua; Richard D. Vierstra; Yu Cui; Shuhong Ouyang; Qixin Sun; Zhiyong Liu; Alper Yilmaz; Erich Grotewold; Richard Sibout; Kian Hematy; Gregory Mouille; Herman Höfte; Jérome Pelloux; Devin O'Connor; James Schnable; Scott Rowe; Frank Harmon; Cynthia L. Cass; John C. Sedbrook; Mary E. Byrne; Sean Walsh; Janet Higgins; Pinghua Li; Thomas Brutnell; Turgay Unver; Hikmet Budak; Harry Belcram; Mathieu Charles; Boulos Chalhoub; Ivan Baxter

2010-01-01

398

Whole-genome sequencing of multiple Arabidopsis thaliana populations  

Microsoft Academic Search

The plant Arabidopsis thaliana occurs naturally in many different habitats throughout Eurasia. As a foundation for identifying genetic variation contributing to adaptation to diverse environments, a 1001 Genomes Project to sequence geographically diverse A. thaliana strains has been initiated. Here we present the first phase of this project, based on population-scale sequencing of 80 strains drawn from eight regions throughout

Jun Cao; Korbinian Schneeberger; Stephan Ossowski; Torsten Günther; Sebastian Bender; Joffrey Fitz; Daniel Koenig; Christa Lanz; Oliver Stegle; Christoph Lippert; Xi Wang; Felix Ott; Jonas Müller; Carlos Alonso-Blanco; Karsten Borgwardt; Karl J Schmid; Detlef Weigel

2011-01-01

399

Targeted enrichment of genomic DNA regions for next generation sequencing  

Microsoft Academic Search

In this review we discuss the latest targeted enrichment methods, and aspects of their utilization along with second generation sequencing for complex genome analysis. In doing so we provide an overview of issues involved in detecting genetic variation, for which targeted enrichment has become a powerful tool. We explain how targeted enrichment for next generation sequencing has made great progress

F. Mertens; A. El-Sharawy; S. Sauer; J. Van Helvoort; P. J. Van der Zaag; A. Franke; M. Nilsson; H. Lehrach; A. Brookes

2011-01-01

400

Genomic Sequencing of Single Microbial Cells from Environmental Samples  

SciTech Connect

Recently developed techniques allow genomic DNA sequencing from single microbial cells [Lasken RS: Single-cell genomic sequencing using multiple displacement amplification, Curr Opin Microbiol 2007, 10:510-516]. Here, we focus on research strategies for putting these methods into practice in the laboratory setting. An immediate consequence of single-cell sequencing is that it provides an alternative to culturing organisms as a prerequisite for genomic sequencing. The microgram amounts of DNA required as template are amplified from a single bacterium by a method called multiple displacement amplification (MDA) avoiding the need to grow cells. The ability to sequence DNA from individual cells will likely have an immense impact on microbiology considering the vast numbers of novel organisms, which have been inaccessible unless culture-independent methods could be used. However, special approaches have been necessary to work with amplified DNA. MDA may not recover the entire genome from the single copy present in most bacteria. Also, some sequence rearrangements can occur during the DNA amplification reaction. Over the past two years many research groups have begun to use MDA, and some practical approaches to single-cell sequencing have been developed. We review the consensus that is emerging on optimum methods, reliability of amplified template, and the proper interpretation of 'composite' genomes which result from the necessity of combining data from several single-cell MDA reactions in order to complete the assembly. Preferred laboratory methods are considered on the basis of experience at several large sequencing centers where >70% of genomes are now often recovered from single cells. Methods are reviewed for preparation of bacterial fractions from environmental samples, single-cell isolation, DNA amplification by MDA, and DNA sequencing.

Ishoey, Thomas; Woyke, Tanja; Stepanauskas, Ramunas; Novotny, Mark; Lasken, Roger S.

2008-02-01

401

Motivators for participation in a whole-genome sequencing study: implications for translational genomics research  

Microsoft Academic Search

The promise of personalized medicine depends on the ability to integrate genetic sequencing information into disease risk assessment for individuals. As genomic sequencing technology enters the realm of clinical care, its scale necessitates answers to key social and behavioral research questions about the complexities of understanding, communicating, and ultimately using sequence information to improve health. Our study captured the motivations

Flavia M Facio; Stephanie Brooks; Johanna Loewenstein; Susannah Green; Leslie G Biesecker; Barbara B Biesecker

2011-01-01

402

Complete genome sequencing and variant analysis of a Pakistani individual.  

PubMed

We sequenced the genome of a Pakistani male at 25.5x coverage using massively parallel sequencing technology. More than 90% of the sequence reads were mapped to the human reference genome. In subsequent analysis, we identified 3,224,311 single-nucleotide polymorphisms (SNPs), of which 388,532 (12% of the total SNPs) had not been previously recorded in single nucleotide polymorphism database (dbSNP) or the 1000 Genomes Project database. The 5991 non-synonymous coding variants were screened for deleterious or disease-associated SNPs. Analysis of genes with deleterious SNPs identified 'retinoic acid signaling' and 'regulation of transcription' as the enriched Gene Ontology terms. Scanning of non-synonymous SNPs against the OMIM revealed several disease and phenotype-associated variants in Pakistani genome. Comparative analysis with Indian genome sequence revealed >1.8 million shared SNPs; 32% of which were annotated in ~14,000 genes. Gene Ontology (GO) terms analysis of these genes identified 'response to jasmonic acid stimulus', 'aminoglycoside antibiotic metabolic process' and 'glycoside metabolic process' with considerable enrichment. A total of 59,558 of small indels (1-5 bp) and 16,063 large structural variations were found; 54% of which was novel. Substantial number of novel structural variations discovered in Pakistani genome enforced previous inferences that (a) structural variations are major type of variation in the genome and (b) compared with SNPs, they putatively exhibit equivalent or superior functional roles. This genome sequence information will be an important reference for population-wide genomics studies of ethnically diverse South Asian subcontinent. PMID:23842039

Azim, Muhammad Kamran; Yang, Chuanchun; Yan, Zhixiang; Choudhary, Muhammad Iqbal; Khan, Asifullah; Sun, Xiao; Li, Ran; Asif, Huma; Sharif, Sana; Zhang, Yong

2013-0