Science.gov

Sample records for cancer genome sequences

  1. Optimizing cancer genome sequencing and analysis

    PubMed Central

    Griffith, Malachi; Miller, Christopher A.; Griffith, Obi L.; Krysiak, Kilannin; Skidmore, Zachary L.; Ramu, Avinash; Walker, Jason R.; Dang, Ha X.; Trani, Lee; Larson, David E.; Demeter, Ryan T.; Wendl, Michael C.; McMichael, Joshua F.; Austin, Rachel E.; Magrini, Vincent; McGrath, Sean D.; Ly, Amy; Kulkarni, Shashikant; Cordes, Matthew G.; Fronick, Catrina C.; Fulton, Robert S.; Maher, Christopher A.; Ding, Li; Klco, Jeffery M.; Mardis, Elaine R.; Ley, Timothy J.; Wilson, Richard K.

    2015-01-01

    Summary Tumors are typically sequenced to depths of 75–100× (exome) or 30–50× (whole genome). We demonstrate that current sequencing paradigms are inadequate for tumors that are impure, aneuploid or clonally heterogeneous. To reassess optimal sequencing strategies, we performed ultra-deep (up to ~312×) whole genome sequencing (WGS) and exome capture (up to ~433×) of a primary acute myeloid leukemia, its subsequent relapse, and a matched normal skin sample. We tested multiple alignment and variant calling algorithms and validated ~200,000 putative SNVs by sequencing them to depths of ~1,000×. Additional targeted sequencing provided over 10,000× coverage and ddPCR assays provided up to ~250,000× sampling of selected sites. We evaluated the effects of different library generation approaches, depth of sequencing, and analysis strategies on the ability to effectively characterize a complex tumor. This dataset, representing the most comprehensively sequenced tumor described to date, will serve as an invaluable community resource (dbGaP accession id phs000159). PMID:26645048

  2. Identification of cancer-driver genes in focal genomic alterations from whole genome sequencing data

    PubMed Central

    Jang, Ho; Hur, Youngmi; Lee, Hyunju

    2016-01-01

    DNA copy number alterations (CNAs) are the main genomic events that occur during the initiation and development of cancer. Distinguishing driver aberrant regions from passenger regions, which might contain candidate target genes for cancer therapies, is an important issue. Several methods for identifying cancer-driver genes from multiple cancer patients have been developed for single nucleotide polymorphism (SNP) arrays. However, for NGS data, methods for the SNP array cannot be directly applied because of different characteristics of NGS such as higher resolutions of data without predefined probes and incorrectly mapped reads to reference genomes. In this study, we developed a wavelet-based method for identification of focal genomic alterations for sequencing data (WIFA-Seq). We applied WIFA-Seq to whole genome sequencing data from glioblastoma multiforme, ovarian serous cystadenocarcinoma and lung adenocarcinoma, and identified focal genomic alterations, which contain candidate cancer-related genes as well as previously known cancer-driver genes. PMID:27156852

  3. Targeted or whole genome sequencing of formalin fixed tissue samples: potential applications in cancer genomics

    PubMed Central

    Zhao, Yue; Cottrell, Joseph; Klotzle, Brandy; Godwin, Andrew K.; Koestler, Devin; Beyerlein, Peter; Fan, Jian-Bing; Bibikova, Marina; Chien, Jeremy

    2015-01-01

    Current genomic studies are limited by the poor availability of fresh-frozen tissue samples. Although formalin-fixed diagnostic samples are in abundance, they are seldom used in current genomic studies because of the concern of formalin-fixation artifacts. Better characterization of these artifacts will allow the use of archived clinical specimens in translational and clinical research studies. To provide a systematic analysis of formalin-fixation artifacts on Illumina sequencing, we generated 26 DNA sequencing data sets from 13 pairs of matched formalin-fixed paraffin-embedded (FFPE) and fresh-frozen (FF) tissue samples. The results indicate high rate of concordant calls between matched FF/FFPE pairs at reference and variant positions in three commonly used sequencing approaches (whole genome, whole exome, and targeted exon sequencing). Global mismatch rates and C·G > T·A substitutions were comparable between matched FF/FFPE samples, and discordant rates were low (<0.26%) in all samples. Finally, low-pass whole genome sequencing produces similar pattern of copy number alterations between FF/FFPE pairs. The results from our studies suggest the potential use of diagnostic FFPE samples for cancer genomic studies to characterize and catalog variations in cancer genomes. PMID:26305677

  4. Targeted or whole genome sequencing of formalin fixed tissue samples: potential applications in cancer genomics.

    PubMed

    Munchel, Sarah; Hoang, Yen; Zhao, Yue; Cottrell, Joseph; Klotzle, Brandy; Godwin, Andrew K; Koestler, Devin; Beyerlein, Peter; Fan, Jian-Bing; Bibikova, Marina; Chien, Jeremy

    2015-09-22

    Current genomic studies are limited by the poor availability of fresh-frozen tissue samples. Although formalin-fixed diagnostic samples are in abundance, they are seldom used in current genomic studies because of the concern of formalin-fixation artifacts. Better characterization of these artifacts will allow the use of archived clinical specimens in translational and clinical research studies. To provide a systematic analysis of formalin-fixation artifacts on Illumina sequencing, we generated 26 DNA sequencing data sets from 13 pairs of matched formalin-fixed paraffin-embedded (FFPE) and fresh-frozen (FF) tissue samples. The results indicate high rate of concordant calls between matched FF/FFPE pairs at reference and variant positions in three commonly used sequencing approaches (whole genome, whole exome, and targeted exon sequencing). Global mismatch rates and C · G > T · A substitutions were comparable between matched FF/FFPE samples, and discordant rates were low (<0.26%) in all samples. Finally, low-pass whole genome sequencing produces similar pattern of copy number alterations between FF/FFPE pairs. The results from our studies suggest the potential use of diagnostic FFPE samples for cancer genomic studies to characterize and catalog variations in cancer genomes.

  5. Targeted or whole genome sequencing of formalin fixed tissue samples: potential applications in cancer genomics.

    PubMed

    Munchel, Sarah; Hoang, Yen; Zhao, Yue; Cottrell, Joseph; Klotzle, Brandy; Godwin, Andrew K; Koestler, Devin; Beyerlein, Peter; Fan, Jian-Bing; Bibikova, Marina; Chien, Jeremy

    2015-09-22

    Current genomic studies are limited by the poor availability of fresh-frozen tissue samples. Although formalin-fixed diagnostic samples are in abundance, they are seldom used in current genomic studies because of the concern of formalin-fixation artifacts. Better characterization of these artifacts will allow the use of archived clinical specimens in translational and clinical research studies. To provide a systematic analysis of formalin-fixation artifacts on Illumina sequencing, we generated 26 DNA sequencing data sets from 13 pairs of matched formalin-fixed paraffin-embedded (FFPE) and fresh-frozen (FF) tissue samples. The results indicate high rate of concordant calls between matched FF/FFPE pairs at reference and variant positions in three commonly used sequencing approaches (whole genome, whole exome, and targeted exon sequencing). Global mismatch rates and C · G > T · A substitutions were comparable between matched FF/FFPE samples, and discordant rates were low (<0.26%) in all samples. Finally, low-pass whole genome sequencing produces similar pattern of copy number alterations between FF/FFPE pairs. The results from our studies suggest the potential use of diagnostic FFPE samples for cancer genomic studies to characterize and catalog variations in cancer genomes. PMID:26305677

  6. Predictive genomics: a cancer hallmark network framework for predicting tumor clinical phenotypes using genome sequencing data.

    PubMed

    Wang, Edwin; Zaman, Naif; Mcgee, Shauna; Milanese, Jean-Sébastien; Masoudi-Nejad, Ali; O'Connor-McCourt, Maureen

    2015-02-01

    Tumor genome sequencing leads to documenting thousands of DNA mutations and other genomic alterations. At present, these data cannot be analyzed adequately to aid in the understanding of tumorigenesis and its evolution. Moreover, we have little insight into how to use these data to predict clinical phenotypes and tumor progression to better design patient treatment. To meet these challenges, we discuss a cancer hallmark network framework for modeling genome sequencing data to predict cancer clonal evolution and associated clinical phenotypes. The framework includes: (1) cancer hallmarks that can be represented by a few molecular/signaling networks. 'Network operational signatures' which represent gene regulatory logics/strengths enable to quantify state transitions and measures of hallmark traits. Thus, sets of genomic alterations which are associated with network operational signatures could be linked to the state/measure of hallmark traits. The network operational signature transforms genotypic data (i.e., genomic alterations) to regulatory phenotypic profiles (i.e., regulatory logics/strengths), to cellular phenotypic profiles (i.e., hallmark traits) which lead to clinical phenotypic profiles (i.e., a collection of hallmark traits). Furthermore, the framework considers regulatory logics of the hallmark networks under tumor evolutionary dynamics and therefore also includes: (2) a self-promoting positive feedback loop that is dominated by a genomic instability network and a cell survival/proliferation network is the main driver of tumor clonal evolution. Surrounding tumor stroma and its host immune systems shape the evolutionary paths; (3) cell motility initiating metastasis is a byproduct of the above self-promoting loop activity during tumorigenesis; (4) an emerging hallmark network which triggers genome duplication dominates a feed-forward loop which in turn could act as a rate-limiting step for tumor formation; (5) mutations and other genomic alterations have

  7. A somatic reference standard for cancer genome sequencing.

    PubMed

    Craig, David W; Nasser, Sara; Corbett, Richard; Chan, Simon K; Murray, Lisa; Legendre, Christophe; Tembe, Waibhav; Adkins, Jonathan; Kim, Nancy; Wong, Shukmei; Baker, Angela; Enriquez, Daniel; Pond, Stephanie; Pleasance, Erin; Mungall, Andrew J; Moore, Richard A; McDaniel, Timothy; Ma, Yussanne; Jones, Steven J M; Marra, Marco A; Carpten, John D; Liang, Winnie S

    2016-04-20

    Large-scale multiplexed identification of somatic alterations in cancer has become feasible with next generation sequencing (NGS). However, calibration of NGS somatic analysis tools has been hampered by a lack of tumor/normal reference standards. We thus performed paired PCR-free whole genome sequencing of a matched metastatic melanoma cell line (COLO829) and normal across three lineages and across separate institutions, with independent library preparations, sequencing, and analysis. We generated mean mapped coverages of 99X for COLO829 and 103X for the paired normal across three institutions. Results were combined with previously generated data allowing for comparison to a fourth lineage on earlier NGS technology. Aggregate variant detection led to the identification of consensus variants, including key events that represent hallmark mutation types including amplified BRAF V600E, a CDK2NA small deletion, a 12 kb PTEN deletion, and a dinucleotide TERT promoter substitution. Overall, common events include >35,000 point mutations, 446 small insertion/deletions, and >6,000 genes affected by copy number changes. We present this reference to the community as an initial standard for enabling quantitative evaluation of somatic mutation pipelines across institutions.

  8. A somatic reference standard for cancer genome sequencing.

    PubMed

    Craig, David W; Nasser, Sara; Corbett, Richard; Chan, Simon K; Murray, Lisa; Legendre, Christophe; Tembe, Waibhav; Adkins, Jonathan; Kim, Nancy; Wong, Shukmei; Baker, Angela; Enriquez, Daniel; Pond, Stephanie; Pleasance, Erin; Mungall, Andrew J; Moore, Richard A; McDaniel, Timothy; Ma, Yussanne; Jones, Steven J M; Marra, Marco A; Carpten, John D; Liang, Winnie S

    2016-01-01

    Large-scale multiplexed identification of somatic alterations in cancer has become feasible with next generation sequencing (NGS). However, calibration of NGS somatic analysis tools has been hampered by a lack of tumor/normal reference standards. We thus performed paired PCR-free whole genome sequencing of a matched metastatic melanoma cell line (COLO829) and normal across three lineages and across separate institutions, with independent library preparations, sequencing, and analysis. We generated mean mapped coverages of 99X for COLO829 and 103X for the paired normal across three institutions. Results were combined with previously generated data allowing for comparison to a fourth lineage on earlier NGS technology. Aggregate variant detection led to the identification of consensus variants, including key events that represent hallmark mutation types including amplified BRAF V600E, a CDK2NA small deletion, a 12 kb PTEN deletion, and a dinucleotide TERT promoter substitution. Overall, common events include >35,000 point mutations, 446 small insertion/deletions, and >6,000 genes affected by copy number changes. We present this reference to the community as an initial standard for enabling quantitative evaluation of somatic mutation pipelines across institutions. PMID:27094764

  9. A somatic reference standard for cancer genome sequencing

    PubMed Central

    Craig, David W.; Nasser, Sara; Corbett, Richard; Chan, Simon K.; Murray, Lisa; Legendre, Christophe; Tembe, Waibhav; Adkins, Jonathan; Kim, Nancy; Wong, Shukmei; Baker, Angela; Enriquez, Daniel; Pond, Stephanie; Pleasance, Erin; Mungall, Andrew J.; Moore, Richard A.; McDaniel, Timothy; Ma, Yussanne; Jones, Steven J. M.; Marra, Marco A.; Carpten, John D.; Liang, Winnie S.

    2016-01-01

    Large-scale multiplexed identification of somatic alterations in cancer has become feasible with next generation sequencing (NGS). However, calibration of NGS somatic analysis tools has been hampered by a lack of tumor/normal reference standards. We thus performed paired PCR-free whole genome sequencing of a matched metastatic melanoma cell line (COLO829) and normal across three lineages and across separate institutions, with independent library preparations, sequencing, and analysis. We generated mean mapped coverages of 99X for COLO829 and 103X for the paired normal across three institutions. Results were combined with previously generated data allowing for comparison to a fourth lineage on earlier NGS technology. Aggregate variant detection led to the identification of consensus variants, including key events that represent hallmark mutation types including amplified BRAF V600E, a CDK2NA small deletion, a 12 kb PTEN deletion, and a dinucleotide TERT promoter substitution. Overall, common events include >35,000 point mutations, 446 small insertion/deletions, and >6,000 genes affected by copy number changes. We present this reference to the community as an initial standard for enabling quantitative evaluation of somatic mutation pipelines across institutions. PMID:27094764

  10. Quantification of read species behavior within whole genome sequencing of cancer genomes for the stratification and visualization of genomic variation.

    PubMed

    Hibsh, Dror; Buetow, Kenneth H; Yaari, Gur; Efroni, Sol

    2016-05-19

    The cancer genome is abnormal genome, and the ability to monitor its sequence had undergone a technological revolution. Yet prognosis and diagnosis remain an expert-based decision, with only limited abilities to provide machine-based decisions. We introduce a heterogeneity-based method for stratifying and visualizing whole-genome sequencing (WGS) reads. This method uses the heterogeneity within WGS reads to markedly reduce the dimensionality of next-generation sequencing data; it is available through the tool HiBS (Heterogeneity-Based Subclassification) that allows cancer sample classification. We validated HiBS using >200 WGS samples from nine different cancer types from The Cancer Genome Atlas (TCGA). With HiBS, we show progress with two WGS related issues: (i) differentiation between normal (NB) and tumor (TP) samples based solely on the information structure of their WGS data, and (ii) identification of specific regions of chromosomal amplification/deletion and their association with tumor stage. By comparing results to those obtained through available WGS analyses tools, we demonstrate some of the novelties obtained by the approach implemented in HiBS and also show nearly perfect normal/tumor classification, used to identify known and unknown chromosomal aberrations. Finally, the HiBS index has been associated with breast cancer tumor stage. PMID:26809676

  11. Quantification of read species behavior within whole genome sequencing of cancer genomes for the stratification and visualization of genomic variation.

    PubMed

    Hibsh, Dror; Buetow, Kenneth H; Yaari, Gur; Efroni, Sol

    2016-05-19

    The cancer genome is abnormal genome, and the ability to monitor its sequence had undergone a technological revolution. Yet prognosis and diagnosis remain an expert-based decision, with only limited abilities to provide machine-based decisions. We introduce a heterogeneity-based method for stratifying and visualizing whole-genome sequencing (WGS) reads. This method uses the heterogeneity within WGS reads to markedly reduce the dimensionality of next-generation sequencing data; it is available through the tool HiBS (Heterogeneity-Based Subclassification) that allows cancer sample classification. We validated HiBS using >200 WGS samples from nine different cancer types from The Cancer Genome Atlas (TCGA). With HiBS, we show progress with two WGS related issues: (i) differentiation between normal (NB) and tumor (TP) samples based solely on the information structure of their WGS data, and (ii) identification of specific regions of chromosomal amplification/deletion and their association with tumor stage. By comparing results to those obtained through available WGS analyses tools, we demonstrate some of the novelties obtained by the approach implemented in HiBS and also show nearly perfect normal/tumor classification, used to identify known and unknown chromosomal aberrations. Finally, the HiBS index has been associated with breast cancer tumor stage.

  12. Genome-wide sequencing to identify the cause of hereditary cancer syndromes: with examples from familial pancreatic cancer.

    PubMed

    Roberts, Nicholas J; Klein, Alison P

    2013-11-01

    Advances in our understanding of the human genome and next-generation technologies have facilitated the use of genome-wide sequencing to decipher the genetic basis of Mendelian disease and hereditary cancer syndromes. However, the application of genome-wide sequencing in hereditary cancer syndromes has had mixed success, in part, due to complex nature of the underlying genetic architecture. In this review we discuss the use of genome-wide sequencing in both Mendelian diseases and hereditary cancer syndromes, highlighting the potential and challenges of this approach using familial pancreatic cancer as an example. PMID:23196058

  13. Analyzing Somatic Genome Rearrangements in Human Cancers by Using Whole-Exome Sequencing | Office of Cancer Genomics

    Cancer.gov

    Although exome sequencing data are generated primarily to detect single-nucleotide variants and indels, they can also be used to identify a subset of genomic rearrangements whose breakpoints are located in or near exons. Using >4,600 tumor and normal pairs across 15 cancer types, we identified over 9,000 high confidence somatic rearrangements, including a large number of gene fusions.

  14. Integrated Analysis of Whole Genome and Transcriptome Sequencing Reveals Diverse Transcriptomic Aberrations Driven by Somatic Genomic Changes in Liver Cancers

    PubMed Central

    Shiraishi, Yuichi; Fujimoto, Akihiro; Furuta, Mayuko; Tanaka, Hiroko; Chiba, Ken-ichi; Boroevich, Keith A.; Abe, Tetsuo; Kawakami, Yoshiiku; Ueno, Masaki; Gotoh, Kunihito; Ariizumi, Shun-ichi; Shibuya, Tetsuo; Nakano, Kaoru; Sasaki, Aya; Maejima, Kazuhiro; Kitada, Rina; Hayami, Shinya; Shigekawa, Yoshinobu; Marubashi, Shigeru; Yamada, Terumasa; Kubo, Michiaki; Ishikawa, Osamu; Aikata, Hiroshi; Arihiro, Koji; Ohdan, Hideki; Yamamoto, Masakazu; Yamaue, Hiroki; Chayama, Kazuaki; Tsunoda, Tatsuhiko; Miyano, Satoru; Nakagawa, Hidewaki

    2014-01-01

    Recent studies applying high-throughput sequencing technologies have identified several recurrently mutated genes and pathways in multiple cancer genomes. However, transcriptional consequences from these genomic alterations in cancer genome remain unclear. In this study, we performed integrated and comparative analyses of whole genomes and transcriptomes of 22 hepatitis B virus (HBV)-related hepatocellular carcinomas (HCCs) and their matched controls. Comparison of whole genome sequence (WGS) and RNA-Seq revealed much evidence that various types of genomic mutations triggered diverse transcriptional changes. Not only splice-site mutations, but also silent mutations in coding regions, deep intronic mutations and structural changes caused splicing aberrations. HBV integrations generated diverse patterns of virus-human fusion transcripts depending on affected gene, such as TERT, CDK15, FN1 and MLL4. Structural variations could drive over-expression of genes such as WNT ligands, with/without creating gene fusions. Furthermore, by taking account of genomic mutations causing transcriptional aberrations, we could improve the sensitivity of deleterious mutation detection in known cancer driver genes (TP53, AXIN1, ARID2, RPS6KA3), and identified recurrent disruptions in putative cancer driver genes such as HNF4A, CPS1, TSC1 and THRAP3 in HCCs. These findings indicate genomic alterations in cancer genome have diverse transcriptomic effects, and integrated analysis of WGS and RNA-Seq can facilitate the interpretation of a large number of genomic alterations detected in cancer genome. PMID:25526364

  15. Whole-genome sequencing and cancer therapy: is too much ever enough?

    PubMed

    Garraway, Levi A; Baselga, José

    2012-09-01

    This issue of Cancer Discovery features an article that describes the use of whole-genome sequencing to discover an actionable genetic alteration that was not detected using a lower resolution diagnostic approach. This finding highlights the growing debate surrounding the optimal deployment of powerful new genomics technologies in the clinical oncology arena.

  16. Comprehensive characterization of complex structural variations in cancer by directly comparing genome sequence reads.

    PubMed

    Moncunill, Valentí; Gonzalez, Santi; Beà, Sílvia; Andrieux, Lise O; Salaverria, Itziar; Royo, Cristina; Martinez, Laura; Puiggròs, Montserrat; Segura-Wang, Maia; Stütz, Adrian M; Navarro, Alba; Royo, Romina; Gelpí, Josep L; Gut, Ivo G; López-Otín, Carlos; Orozco, Modesto; Korbel, Jan O; Campo, Elias; Puente, Xose S; Torrents, David

    2014-11-01

    The development of high-throughput sequencing technologies has advanced our understanding of cancer. However, characterizing somatic structural variants in tumor genomes is still challenging because current strategies depend on the initial alignment of reads to a reference genome. Here, we describe SMUFIN (somatic mutation finder), a single program that directly compares sequence reads from normal and tumor genomes to accurately identify and characterize a range of somatic sequence variation, from single-nucleotide variants (SNV) to large structural variants at base pair resolution. Performance tests on modeled tumor genomes showed average sensitivity of 92% and 74% for SNVs and structural variants, with specificities of 95% and 91%, respectively. Analyses of aggressive forms of solid and hematological tumors revealed that SMUFIN identifies breakpoints associated with chromothripsis and chromoplexy with high specificity. SMUFIN provides an integrated solution for the accurate, fast and comprehensive characterization of somatic sequence variation in cancer. PMID:25344728

  17. A comprehensive assessment of somatic mutation detection in cancer using whole-genome sequencing

    PubMed Central

    Alioto, Tyler S.; Buchhalter, Ivo; Derdak, Sophia; Hutter, Barbara; Eldridge, Matthew D.; Hovig, Eivind; Heisler, Lawrence E.; Beck, Timothy A.; Simpson, Jared T.; Tonon, Laurie; Sertier, Anne-Sophie; Patch, Ann-Marie; Jäger, Natalie; Ginsbach, Philip; Drews, Ruben; Paramasivam, Nagarajan; Kabbe, Rolf; Chotewutmontri, Sasithorn; Diessl, Nicolle; Previti, Christopher; Schmidt, Sabine; Brors, Benedikt; Feuerbach, Lars; Heinold, Michael; Gröbner, Susanne; Korshunov, Andrey; Tarpey, Patrick S.; Butler, Adam P.; Hinton, Jonathan; Jones, David; Menzies, Andrew; Raine, Keiran; Shepherd, Rebecca; Stebbings, Lucy; Teague, Jon W.; Ribeca, Paolo; Giner, Francesc Castro; Beltran, Sergi; Raineri, Emanuele; Dabad, Marc; Heath, Simon C.; Gut, Marta; Denroche, Robert E.; Harding, Nicholas J.; Yamaguchi, Takafumi N.; Fujimoto, Akihiro; Nakagawa, Hidewaki; Quesada, Víctor; Valdés-Mas, Rafael; Nakken, Sigve; Vodák, Daniel; Bower, Lawrence; Lynch, Andrew G.; Anderson, Charlotte L.; Waddell, Nicola; Pearson, John V.; Grimmond, Sean M.; Peto, Myron; Spellman, Paul; He, Minghui; Kandoth, Cyriac; Lee, Semin; Zhang, John; Létourneau, Louis; Ma, Singer; Seth, Sahil; Torrents, David; Xi, Liu; Wheeler, David A.; López-Otín, Carlos; Campo, Elías; Campbell, Peter J.; Boutros, Paul C.; Puente, Xose S.; Gerhard, Daniela S.; Pfister, Stefan M.; McPherson, John D.; Hudson, Thomas J.; Schlesner, Matthias; Lichter, Peter; Eils, Roland; Jones, David T. W.; Gut, Ivo G.

    2015-01-01

    As whole-genome sequencing for cancer genome analysis becomes a clinical tool, a full understanding of the variables affecting sequencing analysis output is required. Here using tumour-normal sample pairs from two different types of cancer, chronic lymphocytic leukaemia and medulloblastoma, we conduct a benchmarking exercise within the context of the International Cancer Genome Consortium. We compare sequencing methods, analysis pipelines and validation methods. We show that using PCR-free methods and increasing sequencing depth to ∼100 × shows benefits, as long as the tumour:control coverage ratio remains balanced. We observe widely varying mutation call rates and low concordance among analysis pipelines, reflecting the artefact-prone nature of the raw data and lack of standards for dealing with the artefacts. However, we show that, using the benchmark mutation set we have created, many issues are in fact easy to remedy and have an immediate positive impact on mutation detection accuracy. PMID:26647970

  18. A comprehensive assessment of somatic mutation detection in cancer using whole-genome sequencing.

    PubMed

    Alioto, Tyler S; Buchhalter, Ivo; Derdak, Sophia; Hutter, Barbara; Eldridge, Matthew D; Hovig, Eivind; Heisler, Lawrence E; Beck, Timothy A; Simpson, Jared T; Tonon, Laurie; Sertier, Anne-Sophie; Patch, Ann-Marie; Jäger, Natalie; Ginsbach, Philip; Drews, Ruben; Paramasivam, Nagarajan; Kabbe, Rolf; Chotewutmontri, Sasithorn; Diessl, Nicolle; Previti, Christopher; Schmidt, Sabine; Brors, Benedikt; Feuerbach, Lars; Heinold, Michael; Gröbner, Susanne; Korshunov, Andrey; Tarpey, Patrick S; Butler, Adam P; Hinton, Jonathan; Jones, David; Menzies, Andrew; Raine, Keiran; Shepherd, Rebecca; Stebbings, Lucy; Teague, Jon W; Ribeca, Paolo; Giner, Francesc Castro; Beltran, Sergi; Raineri, Emanuele; Dabad, Marc; Heath, Simon C; Gut, Marta; Denroche, Robert E; Harding, Nicholas J; Yamaguchi, Takafumi N; Fujimoto, Akihiro; Nakagawa, Hidewaki; Quesada, Víctor; Valdés-Mas, Rafael; Nakken, Sigve; Vodák, Daniel; Bower, Lawrence; Lynch, Andrew G; Anderson, Charlotte L; Waddell, Nicola; Pearson, John V; Grimmond, Sean M; Peto, Myron; Spellman, Paul; He, Minghui; Kandoth, Cyriac; Lee, Semin; Zhang, John; Létourneau, Louis; Ma, Singer; Seth, Sahil; Torrents, David; Xi, Liu; Wheeler, David A; López-Otín, Carlos; Campo, Elías; Campbell, Peter J; Boutros, Paul C; Puente, Xose S; Gerhard, Daniela S; Pfister, Stefan M; McPherson, John D; Hudson, Thomas J; Schlesner, Matthias; Lichter, Peter; Eils, Roland; Jones, David T W; Gut, Ivo G

    2015-01-01

    As whole-genome sequencing for cancer genome analysis becomes a clinical tool, a full understanding of the variables affecting sequencing analysis output is required. Here using tumour-normal sample pairs from two different types of cancer, chronic lymphocytic leukaemia and medulloblastoma, we conduct a benchmarking exercise within the context of the International Cancer Genome Consortium. We compare sequencing methods, analysis pipelines and validation methods. We show that using PCR-free methods and increasing sequencing depth to ∼ 100 × shows benefits, as long as the tumour:control coverage ratio remains balanced. We observe widely varying mutation call rates and low concordance among analysis pipelines, reflecting the artefact-prone nature of the raw data and lack of standards for dealing with the artefacts. However, we show that, using the benchmark mutation set we have created, many issues are in fact easy to remedy and have an immediate positive impact on mutation detection accuracy. PMID:26647970

  19. Whole genome sequencing defines the genetic heterogeneity of familial pancreatic cancer

    PubMed Central

    Roberts, Nicholas J.; Norris, Alexis L.; Petersen, Gloria M.; Bondy, Melissa L.; Brand, Randall; Gallinger, Steven; Kurtz, Robert C.; Olson, Sara H.; Rustgi, Anil K.; Schwartz, Ann G.; Stoffel, Elena; Syngal, Sapna; Zogopoulos, George; Ali, Syed Z.; Axilbund, Jennifer; Chaffee, Kari G.; Chen, Yun-Ching; Cote, Michele L.; Childs, Erica J.; Douville, Christopher; Goes, Fernando S.; Herman, Joseph M.; Iacobuzio-Donahue, Christine; Kramer, Melissa; Makohon-Moore, Alvin; McCombie, Richard W.; McMahon, K. Wyatt; Niknafs, Noushin; Parla, Jennifer; Pirooznia, Mehdi; Potash, James B.; Rhim, Andrew D.; Smith, Alyssa L.; Wang, Yuxuan; Wolfgang, Christopher L.; Wood, Laura D.; Zandi, Peter P.; Goggins, Michael; Karchin, Rachel; Eshleman, James R.; Papadopoulos, Nickolas; Kinzler, Kenneth W.; Vogelstein, Bert; Hruban, Ralph H.; Klein, Alison P.

    2015-01-01

    Pancreatic cancer is projected to become the second leading cause of cancer-related death in the United States by 2020. A familial aggregation of pancreatic cancer has been established, but the cause of this aggregation in most families is unknown. To determine the genetic basis of susceptibility in these families, we sequenced the germline genome of 638 familial pancreatic cancer patients. We also sequenced the exomes of 39 familial pancreatic adenocarcinomas. Our analyses support the role of previously identified familial pancreatic cancer susceptibility genes such as BRCA2, CDKN2A and ATM, and identify novel candidate genes harboring rare, deleterious germline variants for further characterization. We also show how somatic point mutations that occur during hematopoiesis can affect the interpretation of genome-wide studies of hereditary traits. Our observations have important implications for the etiology of pancreatic cancer and for the identification of susceptibility genes in other common cancer types. PMID:26658419

  20. Clinical applications of next generation sequencing in cancer: from panels, to exomes, to genomes

    PubMed Central

    Shen, Tony; Pajaro-Van de Stadt, Stefan Hans; Yeat, Nai Chien; Lin, Jimmy C.-H.

    2015-01-01

    This article will review recent impact of massively parallel next-generation sequencing (NGS) in our understanding and treatment of cancer. While whole exome sequencing (WES) remains popular and effective as a method of genetically profiling different cancers, advances in sequencing technology has enabled an increasing number of whole-genome based studies. Clinically, NGS has been used or is being developed for genetic screening, diagnostics, and clinical assessment. Though challenges remain, clinicians are in the early stages of using genetic data to make treatment decisions for cancer patients. As the integration of NGS in the study and treatment of cancer continues to mature, we believe that the field of cancer genomics will need to move toward more complete 100% genome sequencing. Current technologies and methods are largely limited to coding regions of the genome. A number of recent studies have demonstrated that mutations in non-coding regions may have direct tumorigenic effects or lead to genetic instability. Non-coding regions represent an important frontier in cancer genomics. PMID:26136771

  1. Return of Results from Genomic Sequencing: A Policy Discussion of Secondary Findings for Cancer Predisposition.

    PubMed

    Johnson, Kimberly J; Gehlert, Sarah

    2014-09-01

    Advances in DNA sequencing technology now allow for the rapid genome-wide identification of inherited and acquired genetic variants including those that have been identified as pathogenic alleles for a number of diseases including cancer. Whole genome and exome sequencing are increasingly becoming a part of both clinical practice and research studies. In 2013 the American College of Medical Genetics and Genomics (ACMG) recommended that results of pathogenic genetic variants in 56 genes, nearly half of which comprise cancer genes (including BRCA1, BRCA2, TP53, MLH1, MLH2, MSH6, PMS2, and APC),be returned to patients who have their genome sequenced independent of the purpose for the test. This recommendation has been highly controversial for several reasons, particularly the recommendation that individuals be returned secondary findings of disease causing variants for adult onset conditions regardless of age and without consideration of patient preferences. In addition, the policy regarding returning results of secondary findings from genomic sequencing studies in research settings is currently unclear. In response to these emerging ethical issues, the Washington University Brown School in St. Louis, MO, United Stateshosted a policy forum entitled "First do no harm: Genetic privacy in the age of genomic sequencing" on February 25(th), 2014. The forum included a panel of experts to discuss their views on ethical issues related to return of results in both the clinical and research settings. In this report, we highlight key issues related to return of results from genome sequencing tests that emerged during the forum.

  2. Whole-genome sequencing identifies genomic heterogeneity at a nucleotide and chromosomal level in bladder cancer

    PubMed Central

    Morrison, Carl D.; Liu, Pengyuan; Woloszynska-Read, Anna; Zhang, Jianmin; Luo, Wei; Qin, Maochun; Bshara, Wiam; Conroy, Jeffrey M.; Sabatini, Linda; Vedell, Peter; Xiong, Donghai; Liu, Song; Wang, Jianmin; Shen, He; Li, Yinwei; Omilian, Angela R.; Hill, Annette; Head, Karen; Guru, Khurshid; Kunnev, Dimiter; Leach, Robert; Eng, Kevin H.; Darlak, Christopher; Hoeflich, Christopher; Veeranki, Srividya; Glenn, Sean; You, Ming; Pruitt, Steven C.; Johnson, Candace S.; Trump, Donald L.

    2014-01-01

    Using complete genome analysis, we sequenced five bladder tumors accrued from patients with muscle-invasive transitional cell carcinoma of the urinary bladder (TCC-UB) and identified a spectrum of genomic aberrations. In three tumors, complex genotype changes were noted. All three had tumor protein p53 mutations and a relatively large number of single-nucleotide variants (SNVs; average of 11.2 per megabase), structural variants (SVs; average of 46), or both. This group was best characterized by chromothripsis and the presence of subclonal populations of neoplastic cells or intratumoral mutational heterogeneity. Here, we provide evidence that the process of chromothripsis in TCC-UB is mediated by nonhomologous end-joining using kilobase, rather than megabase, fragments of DNA, which we refer to as “stitchers,” to repair this process. We postulate that a potential unifying theme among tumors with the more complex genotype group is a defective replication–licensing complex. A second group (two bladder tumors) had no chromothripsis, and a simpler genotype, WT tumor protein p53, had relatively few SNVs (average of 5.9 per megabase) and only a single SV. There was no evidence of a subclonal population of neoplastic cells. In this group, we used a preclinical model of bladder carcinoma cell lines to study a unique SV (translocation and amplification) of the gene glutamate receptor ionotropic N-methyl D-aspertate as a potential new therapeutic target in bladder cancer. PMID:24469795

  3. Whole-genome sequencing identifies genomic heterogeneity at a nucleotide and chromosomal level in bladder cancer.

    PubMed

    Morrison, Carl D; Liu, Pengyuan; Woloszynska-Read, Anna; Zhang, Jianmin; Luo, Wei; Qin, Maochun; Bshara, Wiam; Conroy, Jeffrey M; Sabatini, Linda; Vedell, Peter; Xiong, Donghai; Liu, Song; Wang, Jianmin; Shen, He; Li, Yinwei; Omilian, Angela R; Hill, Annette; Head, Karen; Guru, Khurshid; Kunnev, Dimiter; Leach, Robert; Eng, Kevin H; Darlak, Christopher; Hoeflich, Christopher; Veeranki, Srividya; Glenn, Sean; You, Ming; Pruitt, Steven C; Johnson, Candace S; Trump, Donald L

    2014-02-11

    Using complete genome analysis, we sequenced five bladder tumors accrued from patients with muscle-invasive transitional cell carcinoma of the urinary bladder (TCC-UB) and identified a spectrum of genomic aberrations. In three tumors, complex genotype changes were noted. All three had tumor protein p53 mutations and a relatively large number of single-nucleotide variants (SNVs; average of 11.2 per megabase), structural variants (SVs; average of 46), or both. This group was best characterized by chromothripsis and the presence of subclonal populations of neoplastic cells or intratumoral mutational heterogeneity. Here, we provide evidence that the process of chromothripsis in TCC-UB is mediated by nonhomologous end-joining using kilobase, rather than megabase, fragments of DNA, which we refer to as "stitchers," to repair this process. We postulate that a potential unifying theme among tumors with the more complex genotype group is a defective replication-licensing complex. A second group (two bladder tumors) had no chromothripsis, and a simpler genotype, WT tumor protein p53, had relatively few SNVs (average of 5.9 per megabase) and only a single SV. There was no evidence of a subclonal population of neoplastic cells. In this group, we used a preclinical model of bladder carcinoma cell lines to study a unique SV (translocation and amplification) of the gene glutamate receptor ionotropic N-methyl D-aspertate as a potential new therapeutic target in bladder cancer.

  4. The current use and attitudes towards tumor genome sequencing in breast cancer

    PubMed Central

    Gingras, I.; Sonnenblick, A.; de Azambuja, E.; Paesmans, M.; Delaloge, S.; Aftimos, Philippe; Piccart, M. J.; Sotiriou, C.; Ignatiadis, M.; Azim, H. A.

    2016-01-01

    There is increasing availability of technologies that can interrogate the genomic landscape of an individual tumor; however, their impact on daily practice remains uncertain. We conducted a 28-item survey to investigate the current attitudes towards the integration of tumor genome sequencing in breast cancer management. A link to the survey was communicated via newsletters of several oncological societies, and dedicated mailing by academic research groups. Multivariable logistic regression modeling was carried out to determine the relationship between predictors and outcomes. 215 physicians participated to the survey. The majority were medical oncologists (88%), practicing in Europe (70%) and working in academic institutions (66%). Tumor genome sequencing was requested by 82 participants (38%), of whom 21% reported low confidence in their genomic knowledge, and 56% considered tumor genome sequencing to be poorly accessible. In multivariable analysis, having time allocated to research (OR 3.37, 95% CI 1.84–6.15, p < 0.0001), working in Asia (OR 5.76, 95% CI 1.57 – 21.15, p = 0.01) and having institutional guidelines for molecular sequencing (OR 2.09, 95% 0.99–4.42, p = 0.05) were associated with a higher probability of use. In conclusion, our survey indicates that tumor genome sequencing is sometimes used, albeit not widely, in guiding management of breast cancer patients. PMID:26931736

  5. Mutational and structural analysis of diffuse large B-cell lymphoma using whole genome sequencing | Office of Cancer Genomics

    Cancer.gov

    Abstract: Diffuse large B-cell lymphoma (DLBCL) is a genetically heterogeneous cancer comprising at least two molecular subtypes that differ in gene expression and distribution of mutations. Recently, application of genome/exome sequencing and RNA-seq to DLBCL has revealed numerous genes that are recurrent targets of somatic point mutation in this disease.

  6. Complete Genome Sequence of Helicobacter pylori Strain 29CaP Isolated from a Mexican Patient with Gastric Cancer

    PubMed Central

    Mucito-Varela, Eduardo; Castillo-Rojas, Gonzalo; Cevallos, Miguel A.; Lozano, Luis; Merino, Enrique; López-Leal, Gamaliel

    2016-01-01

    Helicobacter pylori infection is a risk factor for the development of gastric cancer and other gastroduodenal diseases. We report here the complete genome sequence of H. pylori strain 29CaP, isolated from a Mexican patient with gastric cancer. The genomic data analysis revealed a cag-negative H. pylori strain that contains a prophage sequence. PMID:26769924

  7. Detection of inherited mutations for breast and ovarian cancer using genomic capture and massively parallel sequencing

    PubMed Central

    Walsh, Tom; Lee, Ming K.; Casadei, Silvia; Thornton, Anne M.; Stray, Sunday M.; Pennil, Christopher; Nord, Alex S.; Mandell, Jessica B.; Swisher, Elizabeth M.; King, Mary-Claire

    2010-01-01

    Inherited loss-of-function mutations in the tumor suppressor genes BRCA1, BRCA2, and multiple other genes predispose to high risks of breast and/or ovarian cancer. Cancer-associated inherited mutations in these genes are collectively quite common, but individually rare or even private. Genetic testing for BRCA1 and BRCA2 mutations has become an integral part of clinical practice, but testing is generally limited to these two genes and to women with severe family histories of breast or ovarian cancer. To determine whether massively parallel, “next-generation” sequencing would enable accurate, thorough, and cost-effective identification of inherited mutations for breast and ovarian cancer, we developed a genomic assay to capture, sequence, and detect all mutations in 21 genes, including BRCA1 and BRCA2, with inherited mutations that predispose to breast or ovarian cancer. Constitutional genomic DNA from subjects with known inherited mutations, ranging in size from 1 to >100,000 bp, was hybridized to custom oligonucleotides and then sequenced using a genome analyzer. Analysis was carried out blind to the mutation in each sample. Average coverage was >1200 reads per base pair. After filtering sequences for quality and number of reads, all single-nucleotide substitutions, small insertion and deletion mutations, and large genomic duplications and deletions were detected. There were zero false-positive calls of nonsense mutations, frameshift mutations, or genomic rearrangements for any gene in any of the test samples. This approach enables widespread genetic testing and personalized risk assessment for breast and ovarian cancer. PMID:20616022

  8. The Tip of the Iceberg: Clinical Implications of Genomic Sequencing Projects in Head and Neck Cancer

    PubMed Central

    Birkeland, Andrew C.; Ludwig, Megan L.; Meraj, Taha S.; Brenner, J. Chad; Prince, Mark E.

    2015-01-01

    Recent genomic sequencing studies have provided valuable insight into genetic aberrations in head and neck squamous cell carcinoma. Despite these great advances, certain hurdles exist in translating genomic findings to clinical care. Further correlation of genetic findings to clinical outcomes, additional analyses of subgroups of head and neck cancers and follow-up investigation into genetic heterogeneity are needed. While the development of targeted therapy trials is of key importance, numerous challenges exist in establishing and optimizing such programs. This review discusses potential upcoming steps for further genetic evaluation of head and neck cancers and implementation of genetic findings into precision medicine trials. PMID:26506389

  9. Home - The Cancer Genome Atlas - Cancer Genome - TCGA

    Cancer.gov

    The Cancer Genome Atlas (TCGA) is a comprehensive and coordinated effort to accelerate our understanding of the molecular basis of cancer through the application of genome analysis technologies, including large-scale genome sequencing.

  10. Return of Results from Genomic Sequencing: A Policy Discussion of Secondary Findings for Cancer Predisposition

    PubMed Central

    Johnson, Kimberly J.; Gehlert, Sarah

    2014-01-01

    Advances in DNA sequencing technology now allow for the rapid genome-wide identification of inherited and acquired genetic variants including those that have been identified as pathogenic alleles for a number of diseases including cancer. Whole genome and exome sequencing are increasingly becoming a part of both clinical practice and research studies. In 2013 the American College of Medical Genetics and Genomics (ACMG) recommended that results of pathogenic genetic variants in 56 genes, nearly half of which comprise cancer genes (including BRCA1, BRCA2, TP53, MLH1, MLH2, MSH6, PMS2, and APC),be returned to patients who have their genome sequenced independent of the purpose for the test. This recommendation has been highly controversial for several reasons, particularly the recommendation that individuals be returned secondary findings of disease causing variants for adult onset conditions regardless of age and without consideration of patient preferences. In addition, the policy regarding returning results of secondary findings from genomic sequencing studies in research settings is currently unclear. In response to these emerging ethical issues, the Washington University Brown School in St. Louis, MO, United Stateshosted a policy forum entitled “First do no harm: Genetic privacy in the age of genomic sequencing” on February 25th, 2014. The forum included a panel of experts to discuss their views on ethical issues related to return of results in both the clinical and research settings. In this report, we highlight key issues related to return of results from genome sequencing tests that emerged during the forum. PMID:25229012

  11. Whole Genome Sequencing

    MedlinePlus

    ... you want to learn. Search form Search Whole Genome Sequencing You are here Home Testing & Services Testing ... the full story, click here . What is whole genome sequencing? Whole genome sequencing is the mapping out ...

  12. Clinical genomics information management software linking cancer genome sequence and clinical decisions.

    PubMed

    Watt, Stuart; Jiao, Wei; Brown, Andrew M K; Petrocelli, Teresa; Tran, Ben; Zhang, Tong; McPherson, John D; Kamel-Reid, Suzanne; Bedard, Philippe L; Onetto, Nicole; Hudson, Thomas J; Dancey, Janet; Siu, Lillian L; Stein, Lincoln; Ferretti, Vincent

    2013-09-01

    Using sequencing information to guide clinical decision-making requires coordination of a diverse set of people and activities. In clinical genomics, the process typically includes sample acquisition, template preparation, genome data generation, analysis to identify and confirm variant alleles, interpretation of clinical significance, and reporting to clinicians. We describe a software application developed within a clinical genomics study, to support this entire process. The software application tracks patients, samples, genomic results, decisions and reports across the cohort, monitors progress and sends reminders, and works alongside an electronic data capture system for the trial's clinical and genomic data. It incorporates systems to read, store, analyze and consolidate sequencing results from multiple technologies, and provides a curated knowledge base of tumor mutation frequency (from the COSMIC database) annotated with clinical significance and drug sensitivity to generate reports for clinicians. By supporting the entire process, the application provides deep support for clinical decision making, enabling the generation of relevant guidance in reports for verification by an expert panel prior to forwarding to the treating physician.

  13. Whole-genome plasma sequencing reveals focal amplifications as a driving force in metastatic prostate cancer

    PubMed Central

    Ulz, Peter; Belic, Jelena; Graf, Ricarda; Auer, Martina; Lafer, Ingrid; Fischereder, Katja; Webersinke, Gerald; Pummer, Karl; Augustin, Herbert; Pichler, Martin; Hoefler, Gerald; Bauernhofer, Thomas; Geigl, Jochen B.; Heitzer, Ellen; Speicher, Michael R.

    2016-01-01

    Genomic alterations in metastatic prostate cancer remain incompletely characterized. Here we analyse 493 prostate cancer cases from the TCGA database and perform whole-genome plasma sequencing on 95 plasma samples derived from 43 patients with metastatic prostate cancer. From these samples, we identify established driver aberrations in a cancer-related gene in nearly all cases (97.7%), including driver gene fusions (TMPRSS2:ERG), driver focal deletions (PTEN, RYBP and SHQ1) and driver amplifications (AR and MYC). In serial plasma analyses, we observe changes in focal amplifications in 40% of cases. The mean time interval between new amplifications was 26.4 weeks (range: 5–52 weeks), suggesting that they represent rapid adaptations to selection pressure. An increase in neuron-specific enolase is accompanied by clonal pattern changes in the tumour genome, most consistent with subclonal diversification of the tumour. Our findings suggest a high plasticity of prostate cancer genomes with newly occurring focal amplifications as a driving force in progression. PMID:27328849

  14. Center for Cancer Genomics | Office of Cancer Genomics

    Cancer.gov

    The Center for Cancer Genomics (CCG) was established to unify the National Cancer Institute's activities in cancer genomics, with the goal of advancing genomics research and translating findings into the clinic to improve the precise diagnosis and treatment of cancers. In addition to promoting genomic sequencing approach

  15. U87MG Decoded: The Genomic Sequence of a Cytogenetically Aberrant Human Cancer Cell Line

    PubMed Central

    Eskin, Ascia; Lee, Hane; Merriman, Barry; Nelson, Stanley F.

    2010-01-01

    U87MG is a commonly studied grade IV glioma cell line that has been analyzed in at least 1,700 publications over four decades. In order to comprehensively characterize the genome of this cell line and to serve as a model of broad cancer genome sequencing, we have generated greater than 30× genomic sequence coverage using a novel 50-base mate paired strategy with a 1.4kb mean insert library. A total of 1,014,984,286 mate-end and 120,691,623 single-end two-base encoded reads were generated from five slides. All data were aligned using a custom designed tool called BFAST, allowing optimal color space read alignment and accurate identification of DNA variants. The aligned sequence reads and mate-pair information identified 35 interchromosomal translocation events, 1,315 structural variations (>100 bp), 191,743 small (<21 bp) insertions and deletions (indels), and 2,384,470 single nucleotide variations (SNVs). Among these observations, the known homozygous mutation in PTEN was robustly identified, and genes involved in cell adhesion were overrepresented in the mutated gene list. Data were compared to 219,187 heterozygous single nucleotide polymorphisms assayed by Illumina 1M Duo genotyping array to assess accuracy: 93.83% of all SNPs were reliably detected at filtering thresholds that yield greater than 99.99% sequence accuracy. Protein coding sequences were disrupted predominantly in this cancer cell line due to small indels, large deletions, and translocations. In total, 512 genes were homozygously mutated, including 154 by SNVs, 178 by small indels, 145 by large microdeletions, and 35 by interchromosomal translocations to reveal a highly mutated cell line genome. Of the small homozygously mutated variants, 8 SNVs and 99 indels were novel events not present in dbSNP. These data demonstrate that routine generation of broad cancer genome sequence is possible outside of genome centers. The sequence analysis of U87MG provides an unparalleled level of mutational

  16. TCGA's Pan-Cancer Efforts and Expansion to Include Whole Genome Sequence - TCGA

    Cancer.gov

    Carolyn Hutter, Ph.D., Program Director of NHGRI's Division of Genomic Medicine, discusses the expansion of TCGA's Pan-Cancer efforts to include the Pan-Cancer Analysis of Whole Genomes (PAWG) project.

  17. A genome-wide view of microsatellite instability: old stories of cancer mutations revisited with new sequencing technologies

    PubMed Central

    Kim, Tae-Min; Park, Peter J

    2014-01-01

    Microsatellites are simple tandem repeats that are present at millions of loci in the human genome. Microsatellite instability (MSI) refers to DNA slippage events on microsatellites that occur frequently in cancer genomes when there is a defect in the DNA mismatch repair system. These somatic mutations can result in inactivation of tumor suppressor genes or disrupt other non-coding regulatory sequences, thereby playing a role in carcinogenesis. Here, we will discuss the ways in which high-throughput sequencing data can facilitate a genome- or exome-wide discovery and more detailed investigation of MSI events in microsatellite-unstable cancer genomes. We will address the methodological aspects of this approach and highlight insights from recent analyses of colorectal and endometrial cancer genomes from The Cancer Genome Atlas project. These include identification of novel MSI targets within and across tumor types and the relationship between the likelihood of MSI events to chromatin structure. Given the increasing popularity of exome and genome sequencing of cancer genomes, a comprehensive characterization of MSI may serve as a valuable marker of cancer evolution and aid in a search for therapeutic targets. PMID:25371413

  18. Integrated genome and transcriptome sequencing identifies a novel form of hybrid and aggressive prostate cancer.

    PubMed

    Wu, Chunxiao; Wyatt, Alexander W; Lapuk, Anna V; McPherson, Andrew; McConeghy, Brian J; Bell, Robert H; Anderson, Shawn; Haegert, Anne; Brahmbhatt, Sonal; Shukin, Robert; Mo, Fan; Li, Estelle; Fazli, Ladan; Hurtado-Coll, Antonio; Jones, Edward C; Butterfield, Yaron S; Hach, Faraz; Hormozdiari, Fereydoun; Hajirasouliha, Iman; Boutros, Paul C; Bristow, Robert G; Jones, Steven Jm; Hirst, Martin; Marra, Marco A; Maher, Christopher A; Chinnaiyan, Arul M; Sahinalp, S Cenk; Gleave, Martin E; Volik, Stanislav V; Collins, Colin C

    2012-05-01

    Next-generation sequencing is making sequence-based molecular pathology and personalized oncology viable. We selected an individual initially diagnosed with conventional but aggressive prostate adenocarcinoma and sequenced the genome and transcriptome from primary and metastatic tissues collected prior to hormone therapy. The histology-pathology and copy number profiles were remarkably homogeneous, yet it was possible to propose the quadrant of the prostate tumour that likely seeded the metastatic diaspora. Despite a homogeneous cell type, our transcriptome analysis revealed signatures of both luminal and neuroendocrine cell types. Remarkably, the repertoire of expressed but apparently private gene fusions, including C15orf21:MYC, recapitulated this biology. We hypothesize that the amplification and over-expression of the stem cell gene MSI2 may have contributed to the stable hybrid cellular identity. This hybrid luminal-neuroendocrine tumour appears to represent a novel and highly aggressive case of prostate cancer with unique biological features and, conceivably, a propensity for rapid progression to castrate-resistance. Overall, this work highlights the importance of integrated analyses of genome, exome and transcriptome sequences for basic tumour biology, sequence-based molecular pathology and personalized oncology.

  19. Landscape of somatic mutations in 560 breast cancer whole-genome sequences

    DOE PAGESBeta

    Nik-Zainal, Serena; Davies, Helen; Staaf, Johan; Ramakrishna, Manasa; Glodzik, Dominik; Zou, Xueqing; Martincorena, Inigo; Alexandrov, Ludmil B.; Martin, Sancha; Wedge, David C.; et al

    2016-05-02

    Here, we analysed whole-genome sequences of 560 breast cancers to advance understanding of the driver mutations conferring clonal advantage and the mutational processes generating somatic mutations. We found that 93 protein-coding cancer genes carried probable driver mutations. Some non-coding regions exhibited high mutation frequencies, but most have distinctive structural features probably causing elevated mutation rates and do not contain driver mutations. Mutational signature analysis was extended to genome rearrangements and revealed twelve base substitution and six rearrangement signatures. Three rearrangement signatures, characterized by tandem duplications or deletions, appear associated with defective homologous-recombination-based DNA repair: one with deficient BRCA1 function, anothermore » with deficient BRCA1 or BRCA2 function, the cause of the third is unknown. This analysis of all classes of somatic mutation across exons, introns and intergenic regions highlights the repertoire of cancer genes and mutational processes operating, and progresses towards a comprehensive account of the somatic genetic basis of breast cancer.« less

  20. The cancer genome

    PubMed Central

    Stratton, Michael R.; Campbell, Peter J.; Futreal, P. Andrew

    2010-01-01

    All cancers arise as a result of changes that have occurred in the DNA sequence of the genomes of cancer cells. Over the past quarter of a century much has been learnt about these mutations and the abnormal genes that operate in human cancers. We are now, however, moving into an era in which it will be possible to obtain the complete DNA sequence of large numbers of cancer genomes. These studies will provide us with a detailed and comprehensive perspective on how individual cancers have developed. PMID:19360079

  1. Translocation and deletion breakpoints in cancer genomes are associated with potential non-B DNA-forming sequences.

    PubMed

    Bacolla, Albino; Tainer, John A; Vasquez, Karen M; Cooper, David N

    2016-07-01

    Gross chromosomal rearrangements (including translocations, deletions, insertions and duplications) are a hallmark of cancer genomes and often create oncogenic fusion genes. An obligate step in the generation of such gross rearrangements is the formation of DNA double-strand breaks (DSBs). Since the genomic distribution of rearrangement breakpoints is non-random, intrinsic cellular factors may predispose certain genomic regions to breakage. Notably, certain DNA sequences with the potential to fold into secondary structures [potential non-B DNA structures (PONDS); e.g. triplexes, quadruplexes, hairpin/cruciforms, Z-DNA and single-stranded looped-out structures with implications in DNA replication and transcription] can stimulate the formation of DNA DSBs. Here, we tested the postulate that these DNA sequences might be found at, or in close proximity to, rearrangement breakpoints. By analyzing the distribution of PONDS-forming sequences within ±500 bases of 19 947 translocation and 46 365 sequence-characterized deletion breakpoints in cancer genomes, we find significant association between PONDS-forming repeats and cancer breakpoints. Specifically, (AT)n, (GAA)n and (GAAA)n constitute the most frequent repeats at translocation breakpoints, whereas A-tracts occur preferentially at deletion breakpoints. Translocation breakpoints near PONDS-forming repeats also recur in different individuals and patient tumor samples. Hence, PONDS-forming sequences represent an intrinsic risk factor for genomic rearrangements in cancer genomes. PMID:27084947

  2. Translocation and deletion breakpoints in cancer genomes are associated with potential non-B DNA-forming sequences

    PubMed Central

    Bacolla, Albino; Tainer, John A.; Vasquez, Karen M.; Cooper, David N.

    2016-01-01

    Gross chromosomal rearrangements (including translocations, deletions, insertions and duplications) are a hallmark of cancer genomes and often create oncogenic fusion genes. An obligate step in the generation of such gross rearrangements is the formation of DNA double-strand breaks (DSBs). Since the genomic distribution of rearrangement breakpoints is non-random, intrinsic cellular factors may predispose certain genomic regions to breakage. Notably, certain DNA sequences with the potential to fold into secondary structures [potential non-B DNA structures (PONDS); e.g. triplexes, quadruplexes, hairpin/cruciforms, Z-DNA and single-stranded looped-out structures with implications in DNA replication and transcription] can stimulate the formation of DNA DSBs. Here, we tested the postulate that these DNA sequences might be found at, or in close proximity to, rearrangement breakpoints. By analyzing the distribution of PONDS-forming sequences within ±500 bases of 19 947 translocation and 46 365 sequence-characterized deletion breakpoints in cancer genomes, we find significant association between PONDS-forming repeats and cancer breakpoints. Specifically, (AT)n, (GAA)n and (GAAA)n constitute the most frequent repeats at translocation breakpoints, whereas A-tracts occur preferentially at deletion breakpoints. Translocation breakpoints near PONDS-forming repeats also recur in different individuals and patient tumor samples. Hence, PONDS-forming sequences represent an intrinsic risk factor for genomic rearrangements in cancer genomes. PMID:27084947

  3. Structural variation discovery in the cancer genome using next generation sequencing: Computational solutions and perspectives

    PubMed Central

    Liu, Biao; Conroy, Jeffrey M.; Morrison, Carl D.; Odunsi, Adekunle O.; Qin, Maochun; Wei, Lei; Trump, Donald L.; Johnson, Candace S.; Liu, Song; Wang, Jianmin

    2015-01-01

    Somatic Structural Variations (SVs) are a complex collection of chromosomal mutations that could directly contribute to carcinogenesis. Next Generation Sequencing (NGS) technology has emerged as the primary means of interrogating the SVs of the cancer genome in recent investigations. Sophisticated computational methods are required to accurately identify the SV events and delineate their breakpoints from the massive amounts of reads generated by a NGS experiment. In this review, we provide an overview of current analytic tools used for SV detection in NGS-based cancer studies. We summarize the features of common SV groups and the primary types of NGS signatures that can be used in SV detection methods. We discuss the principles and key similarities and differences of existing computational programs and comment on unresolved issues related to this research field. The aim of this article is to provide a practical guide of relevant concepts, computational methods, software tools and important factors for analyzing and interpreting NGS data for the detection of SVs in the cancer genome. PMID:25849937

  4. Cancer of the ampulla of Vater: analysis of the whole genome sequence exposes a potential therapeutic vulnerability

    PubMed Central

    2012-01-01

    Background Recent advances in the treatment of cancer have focused on targeting genomic aberrations with selective therapeutic agents. In rare tumors, where large-scale clinical trials are daunting, this targeted genomic approach offers a new perspective and hope for improved treatments. Cancers of the ampulla of Vater are rare tumors that comprise only about 0.2% of gastrointestinal cancers. Consequently, they are often treated as either distal common bile duct or pancreatic cancers. Methods We analyzed DNA from a resected cancer of the ampulla of Vater and whole blood DNA from a 63 year-old man who underwent a pancreaticoduodenectomy by whole genome sequencing, achieving 37× and 40× coverage, respectively. We determined somatic mutations and structural alterations. Results We identified relevant aberrations, including deleterious mutations of KRAS and SMAD4 as well as a homozygous focal deletion of the PTEN tumor suppressor gene. These findings suggest that these tumors have a distinct oncogenesis from either common bile duct cancer or pancreatic cancer. Furthermore, this combination of genomic aberrations suggests a therapeutic context for dual mTOR/PI3K inhibition. Conclusions Whole genome sequencing can elucidate an oncogenic context and expose potential therapeutic vulnerabilities in rare cancers. PMID:22762308

  5. Colorectal Cancer and the Human Gut Microbiome: Reproducibility with Whole-Genome Shotgun Sequencing

    PubMed Central

    Hua, Xing; Zeller, Georg; Sunagawa, Shinichi; Voigt, Anita Y.; Hercog, Rajna; Goedert, James J.; Shi, Jianxin; Bork, Peer; Sinha, Rashmi

    2016-01-01

    Accumulating evidence indicates that the gut microbiota affects colorectal cancer development, but previous studies have varied in population, technical methods, and associations with cancer. Understanding these variations is needed for comparisons and for potential pooling across studies. Therefore, we performed whole-genome shotgun sequencing on fecal samples from 52 pre-treatment colorectal cancer cases and 52 matched controls from Washington, DC. We compared findings from a previously published 16S rRNA study to the metagenomics-derived taxonomy within the same population. In addition, metagenome-predicted genes, modules, and pathways in the Washington, DC cases and controls were compared to cases and controls recruited in France whose specimens were processed using the same platform. Associations between the presence of fecal Fusobacteria, Fusobacterium, and Porphyromonas with colorectal cancer detected by 16S rRNA were reproduced by metagenomics, whereas higher relative abundance of Clostridia in cancer cases based on 16S rRNA was merely borderline based on metagenomics. This demonstrated that within the same sample set, most, but not all taxonomic associations were seen with both methods. Considering significant cancer associations with the relative abundance of genes, modules, and pathways in a recently published French metagenomics dataset, statistically significant associations in the Washington, DC population were detected for four out of 10 genes, three out of nine modules, and seven out of 17 pathways. In total, colorectal cancer status in the Washington, DC study was associated with 39% of the metagenome-predicted genes, modules, and pathways identified in the French study. More within and between population comparisons are needed to identify sources of variation and disease associations that can be reproduced despite these variations. Future studies should have larger sample sizes or pool data across studies to have sufficient power to detect

  6. Flexible positions, managed hopes: the promissory bioeconomy of a whole genome sequencing cancer study.

    PubMed

    Haase, Rachel; Michie, Marsha; Skinner, Debra

    2015-04-01

    Genomic research has rapidly expanded its scope and ambition over the past decade, promoted by both public and private sectors as having the potential to revolutionize clinical medicine. This promissory bioeconomy of genomic research and technology is generated by, and in turn generates, the hopes and expectations shared by investors, researchers and clinicians, patients, and the general public alike. Examinations of such bioeconomies have often focused on the public discourse, media representations, and capital investments that fuel these "regimes of hope," but also crucial are the more intimate contexts of small-scale medical research, and the private hopes, dreams, and disappointments of those involved. Here we examine one local site of production in a university-based clinical research project that sought to identify novel cancer predisposition genes through whole genome sequencing in individuals at high risk for cancer. In-depth interviews with 24 adults who donated samples to the study revealed an ability to shift flexibly between positioning themselves as research participants on the one hand, and as patients or as family members of patients, on the other. Similarly, interviews with members of the research team highlighted the dual nature of their positions as researchers and as clinicians. For both parties, this dual positioning shaped their investment in the project and valuing of its possible outcomes. In their narratives, all parties shifted between these different relational positions as they managed hopes and expectations for the research project. We suggest that this flexibility facilitated study implementation and participation in the face of potential and probable disappointment on one or more fronts, and acted as a key element in the resilience of this local promissory bioeconomy. We conclude that these multiple dimensions of relationality and positionality are inherent and essential in the creation of any complex economy, "bio" or otherwise. PMID

  7. Flexible Positions, Managed Hopes: The Promissory Bioeconomy of a Whole Genome Sequencing Cancer Study

    PubMed Central

    Haase, Rachel; Michie, Marsha; Skinner, Debra

    2015-01-01

    Genomic research has rapidly expanded its scope and ambition over the past decade, promoted by both public and private sectors as having the potential to revolutionize clinical medicine. This promissory bioeconomy of genomic research and technology is generated by, and in turn generates, the hopes and expectations shared by investors, researchers and clinicians, patients, and the general public alike. Examinations of such bioeconomies have often focused on the public discourse, media representations, and capital investments that fuel these “regimes of hope,” but also crucial are the more intimate contexts of small-scale medical research, and the private hopes, dreams, and disappointments of those involved. Here we examine one local site of production in a university-based clinical research project that sought to identify novel cancer predisposition genes through whole genome sequencing in individuals at high risk for cancer. In-depth interviews with 24 adults who donated samples to the study revealed an ability to shift flexibly between positioning themselves as research participants on the one hand, and as patients or as family members of patients, on the other. Similarly, interviews with members of the research team highlighted the dual nature of their positions as researchers and as clinicians. For both parties, this dual positioning shaped their investment in the project and valuing of its possible outcomes. In their narratives, all parties shifted between these different relational positions as they managed hopes and expectations for the research project. We suggest that this flexibility facilitated study implementation and participation in the face of potential and probable disappointment on one or more fronts, and acted as a key element in the resilience of this local promissory bioeconomy. We conclude that these multiple dimensions of relationality and positionality are inherent and essential in the creation of any complex economy, “bio” or

  8. Genomic Datasets for Cancer Research

    Cancer.gov

    A variety of datasets from genome-wide association studies of cancer and other genotype-phenotype studies, including sequencing and molecular diagnostic assays, are available to approved investigators through the Extramural National Cancer Institute Data Access Committee.

  9. Identification of high-confidence somatic mutations in whole genome sequence of formalin-fixed breast cancer specimens

    PubMed Central

    Yost, Shawn E.; Smith, Erin N.; Schwab, Richard B.; Bao, Lei; Jung, HyunChul; Wang, Xiaoyun; Voest, Emile; Pierce, John P.; Messer, Karen; Parker, Barbara A.; Harismendy, Olivier; Frazer, Kelly A.

    2012-01-01

    The utilization of archived, formalin-fixed paraffin-embedded (FFPE) tumor samples for massive parallel sequencing has been challenging due to DNA damage and contamination with normal stroma. Here, we perform whole genome sequencing of DNA isolated from two triple-negative breast cancer tumors archived for >11 years as 5 µm FFPE sections and matched germline DNA. The tumor samples show differing amounts of FFPE damaged DNA sequencing reads revealed as relatively high alignment mismatch rates enriched for C·G > T·A substitutions compared to germline samples. This increase in mismatch rate is observable with as few as one million reads, allowing for an upfront evaluation of the sample integrity before whole genome sequencing. By applying innovative quality filters incorporating global nucleotide mismatch rates and local mismatch rates, we present a method to identify high-confidence somatic mutations even in the presence of FFPE induced DNA damage. This results in a breast cancer mutational profile consistent with previous studies and revealing potentially important functional mutations. Our study demonstrates the feasibility of performing genome-wide deep sequencing analysis of FFPE archived tumors of limited sample size such as residual cancer after treatment or metastatic biopsies. PMID:22492626

  10. Local sequence assembly reveals a high-resolution profile of somatic structural variations in 97 cancer genomes

    PubMed Central

    Zhuang, Jiali; Weng, Zhiping

    2015-01-01

    Genomic structural variations (SVs) are pervasive in many types of cancers. Characterizing their underlying mechanisms and potential molecular consequences is crucial for understanding the basic biology of tumorigenesis. Here, we engineered a local assembly-based algorithm (laSV) that detects SVs with high accuracy from paired-end high-throughput genomic sequencing data and pinpoints their breakpoints at single base-pair resolution. By applying laSV to 97 tumor-normal paired genomic sequencing datasets across six cancer types produced by The Cancer Genome Atlas Research Network, we discovered that non-allelic homologous recombination is the primary mechanism for generating somatic SVs in acute myeloid leukemia. This finding contrasts with results for the other five types of solid tumors, in which non-homologous end joining and microhomology end joining are the predominant mechanisms. We also found that the genes recursively mutated by single nucleotide alterations differed from the genes recursively mutated by SVs, suggesting that these two types of genetic alterations play different roles during cancer progression. We further characterized how the gene structures of the oncogene JAK1 and the tumor suppressors KDM6A and RB1 are affected by somatic SVs and discussed the potential functional implications of intergenic SVs. PMID:26283183

  11. SINGLE CELL GENOME SEQUENCING

    PubMed Central

    Yilmaz, Suzan; Singh, Anup K.

    2011-01-01

    Whole genome amplification and next-generation sequencing of single cells has become a powerful approach for studying uncultivated microorganisms that represent 90–99 % of all environmental microbes. Single cell sequencing enables not only the identification of microbes but also linking of functions to species, a feat not achievable by metagenomic techniques. Moreover, it allows the analysis of low abundance species that may be missed in community-based analyses. It has also proved very useful in complementing metagenomics in the assembly and binning of single genomes. With the advent of drastically cheaper and higher throughput sequencing technologies, it is expected that single cell sequencing will become a standard tool in studying the genome and transcriptome of microbial communities. PMID:22154471

  12. Are sites with multiple single nucleotide variants in cancer genomes a consequence of drivers, hypermutable sites or sequencing errors?

    PubMed Central

    Carr, Antony M.

    2016-01-01

    Across independent cancer genomes it has been observed that some sites have been recurrently hit by single nucleotide variants (SNVs). Such recurrently hit sites might be either (i) drivers of cancer that are postively selected during oncogenesis, (ii) due to mutation rate variation, or (iii) due to sequencing and assembly errors. We have investigated the cause of recurrently hit sites in a dataset of >3 million SNVs from 507 complete cancer genome sequences. We find evidence that many sites have been hit significantly more often than one would expect by chance, even taking into account the effect of the adjacent nucleotides on the rate of mutation. We find that the density of these recurrently hit sites is higher in non-coding than coding DNA and hence conclude that most of them are unlikely to be drivers. We also find that most of them are found in parts of the genome that are not uniquely mappable and hence are likely to be due to mapping errors. In support of the error hypothesis, we find that recurently hit sites are not randomly distributed across sequences from different laboratories. We fit a model to the data in which the rate of mutation is constant across sites but the rate of error varies. This model suggests that ∼4% of all SNVs are errors in this dataset, but that the rate of error varies by thousands-of-fold between sites. PMID:27688957

  13. Are sites with multiple single nucleotide variants in cancer genomes a consequence of drivers, hypermutable sites or sequencing errors?

    PubMed Central

    Carr, Antony M.

    2016-01-01

    Across independent cancer genomes it has been observed that some sites have been recurrently hit by single nucleotide variants (SNVs). Such recurrently hit sites might be either (i) drivers of cancer that are postively selected during oncogenesis, (ii) due to mutation rate variation, or (iii) due to sequencing and assembly errors. We have investigated the cause of recurrently hit sites in a dataset of >3 million SNVs from 507 complete cancer genome sequences. We find evidence that many sites have been hit significantly more often than one would expect by chance, even taking into account the effect of the adjacent nucleotides on the rate of mutation. We find that the density of these recurrently hit sites is higher in non-coding than coding DNA and hence conclude that most of them are unlikely to be drivers. We also find that most of them are found in parts of the genome that are not uniquely mappable and hence are likely to be due to mapping errors. In support of the error hypothesis, we find that recurently hit sites are not randomly distributed across sequences from different laboratories. We fit a model to the data in which the rate of mutation is constant across sites but the rate of error varies. This model suggests that ∼4% of all SNVs are errors in this dataset, but that the rate of error varies by thousands-of-fold between sites.

  14. A whole-genome sequence and transcriptome perspective on HER2-positive breast cancers

    PubMed Central

    Ferrari, Anthony; Vincent-Salomon, Anne; Pivot, Xavier; Sertier, Anne-Sophie; Thomas, Emilie; Tonon, Laurie; Boyault, Sandrine; Mulugeta, Eskeatnaf; Treilleux, Isabelle; MacGrogan, Gaëtan; Arnould, Laurent; Kielbassa, Janice; Le Texier, Vincent; Blanché, Hélène; Deleuze, Jean-François; Jacquemier, Jocelyne; Mathieu, Marie-Christine; Penault-Llorca, Frédérique; Bibeau, Frédéric; Mariani, Odette; Mannina, Cécile; Pierga, Jean-Yves; Trédan, Olivier; Bachelot, Thomas; Bonnefoi, Hervé; Romieu, Gilles; Fumoleau, Pierre; Delaloge, Suzette; Rios, Maria; Ferrero, Jean-Marc; Tarpin, Carole; Bouteille, Catherine; Calvo, Fabien; Gut, Ivo Glynne; Gut, Marta; Martin, Sancha; Nik-Zainal, Serena; Stratton, Michael R.; Pauporté, Iris; Saintigny, Pierre; Birnbaum, Daniel; Viari, Alain; Thomas, Gilles

    2016-01-01

    HER2-positive breast cancer has long proven to be a clinically distinct class of breast cancers for which several targeted therapies are now available. However, resistance to the treatment associated with specific gene expressions or mutations has been observed, revealing the underlying diversity of these cancers. Therefore, understanding the full extent of the HER2-positive disease heterogeneity still remains challenging. Here we carry out an in-depth genomic characterization of 64 HER2-positive breast tumour genomes that exhibit four subgroups, based on the expression data, with distinctive genomic features in terms of somatic mutations, copy-number changes or structural variations. The results suggest that, despite being clinically defined by a specific gene amplification, HER2-positive tumours melt into the whole luminal–basal breast cancer spectrum rather than standing apart. The results also lead to a refined ERBB2 amplicon of 106 kb and show that several cases of amplifications are compatible with a breakage–fusion–bridge mechanism. PMID:27406316

  15. A whole-genome sequence and transcriptome perspective on HER2-positive breast cancers.

    PubMed

    Ferrari, Anthony; Vincent-Salomon, Anne; Pivot, Xavier; Sertier, Anne-Sophie; Thomas, Emilie; Tonon, Laurie; Boyault, Sandrine; Mulugeta, Eskeatnaf; Treilleux, Isabelle; MacGrogan, Gaëtan; Arnould, Laurent; Kielbassa, Janice; Le Texier, Vincent; Blanché, Hélène; Deleuze, Jean-François; Jacquemier, Jocelyne; Mathieu, Marie-Christine; Penault-Llorca, Frédérique; Bibeau, Frédéric; Mariani, Odette; Mannina, Cécile; Pierga, Jean-Yves; Trédan, Olivier; Bachelot, Thomas; Bonnefoi, Hervé; Romieu, Gilles; Fumoleau, Pierre; Delaloge, Suzette; Rios, Maria; Ferrero, Jean-Marc; Tarpin, Carole; Bouteille, Catherine; Calvo, Fabien; Gut, Ivo Glynne; Gut, Marta; Martin, Sancha; Nik-Zainal, Serena; Stratton, Michael R; Pauporté, Iris; Saintigny, Pierre; Birnbaum, Daniel; Viari, Alain; Thomas, Gilles

    2016-07-13

    HER2-positive breast cancer has long proven to be a clinically distinct class of breast cancers for which several targeted therapies are now available. However, resistance to the treatment associated with specific gene expressions or mutations has been observed, revealing the underlying diversity of these cancers. Therefore, understanding the full extent of the HER2-positive disease heterogeneity still remains challenging. Here we carry out an in-depth genomic characterization of 64 HER2-positive breast tumour genomes that exhibit four subgroups, based on the expression data, with distinctive genomic features in terms of somatic mutations, copy-number changes or structural variations. The results suggest that, despite being clinically defined by a specific gene amplification, HER2-positive tumours melt into the whole luminal-basal breast cancer spectrum rather than standing apart. The results also lead to a refined ERBB2 amplicon of 106 kb and show that several cases of amplifications are compatible with a breakage-fusion-bridge mechanism.

  16. A whole-genome sequence and transcriptome perspective on HER2-positive breast cancers.

    PubMed

    Ferrari, Anthony; Vincent-Salomon, Anne; Pivot, Xavier; Sertier, Anne-Sophie; Thomas, Emilie; Tonon, Laurie; Boyault, Sandrine; Mulugeta, Eskeatnaf; Treilleux, Isabelle; MacGrogan, Gaëtan; Arnould, Laurent; Kielbassa, Janice; Le Texier, Vincent; Blanché, Hélène; Deleuze, Jean-François; Jacquemier, Jocelyne; Mathieu, Marie-Christine; Penault-Llorca, Frédérique; Bibeau, Frédéric; Mariani, Odette; Mannina, Cécile; Pierga, Jean-Yves; Trédan, Olivier; Bachelot, Thomas; Bonnefoi, Hervé; Romieu, Gilles; Fumoleau, Pierre; Delaloge, Suzette; Rios, Maria; Ferrero, Jean-Marc; Tarpin, Carole; Bouteille, Catherine; Calvo, Fabien; Gut, Ivo Glynne; Gut, Marta; Martin, Sancha; Nik-Zainal, Serena; Stratton, Michael R; Pauporté, Iris; Saintigny, Pierre; Birnbaum, Daniel; Viari, Alain; Thomas, Gilles

    2016-01-01

    HER2-positive breast cancer has long proven to be a clinically distinct class of breast cancers for which several targeted therapies are now available. However, resistance to the treatment associated with specific gene expressions or mutations has been observed, revealing the underlying diversity of these cancers. Therefore, understanding the full extent of the HER2-positive disease heterogeneity still remains challenging. Here we carry out an in-depth genomic characterization of 64 HER2-positive breast tumour genomes that exhibit four subgroups, based on the expression data, with distinctive genomic features in terms of somatic mutations, copy-number changes or structural variations. The results suggest that, despite being clinically defined by a specific gene amplification, HER2-positive tumours melt into the whole luminal-basal breast cancer spectrum rather than standing apart. The results also lead to a refined ERBB2 amplicon of 106 kb and show that several cases of amplifications are compatible with a breakage-fusion-bridge mechanism. PMID:27406316

  17. Pancreatic cancer genomics.

    PubMed

    Chang, David K; Grimmond, Sean M; Biankin, Andrew V

    2014-02-01

    Pancreatic cancer is one of the most lethal malignancies. The overall median survival even with treatment is only 6-9 months, with almost 90% succumbing to the disease within a year of diagnosis. It is characterised by an intense desmoplastic stroma that may contribute to therapeutic resistance, and poses significant challenges for genomic sequencing studies. It is recalcitrant to almost all therapies and consequently remains the fourth leading cause of cancer death in Western societies. Genomic studies are unveiling a vast heterogeneity of mutated genes, and this diversity may explain why conventional clinical trial designs have mostly failed to demonstrate efficacy in unselected patients. Those that are available offer only marginal benefits overall, but are associated with clinically significant responses in as yet undefined subgroups. This chapter describes our current understanding of the genomics of pancreatic cancer and the potential impact of these findings on our approaches to treatment.

  18. The Genome-Wide Analysis of Carcinoembryonic Antigen Signaling by Colorectal Cancer Cells Using RNA Sequencing.

    PubMed

    Bajenova, Olga; Gorbunova, Anna; Evsyukov, Igor; Rayko, Michael; Gapon, Svetlana; Bozhokina, Ekaterina; Shishkin, Alexander; O'Brien, Stephen J

    2016-01-01

    Сarcinoembryonic antigen (CEA, CEACAM5, CD66) is a promoter of metastasis in epithelial cancers that is widely used as a prognostic clinical marker of metastasis. The aim of this study is to identify the network of genes that are associated with CEA-induced colorectal cancer liver metastasis. We compared the genome-wide transcriptomic profiles of CEA positive (MIP101 clone 8) and CEA negative (MIP 101) colorectal cancer cell lines with different metastatic potential in vivo. The CEA-producing cells displayed quantitative changes in the level of expression for 100 genes (over-expressed or down-regulated). They were confirmed by quantitative RT-PCR. The KEGG pathway analysis identified 4 significantly enriched pathways: cytokine-cytokine receptor interaction, MAPK signaling pathway, TGF-beta signaling pathway and pyrimidine metabolism. Our results suggest that CEA production by colorectal cancer cells triggers colorectal cancer progression by inducing the epithelial- mesenchymal transition, increasing tumor cell invasiveness into the surrounding tissues and suppressing stress and apoptotic signaling. The novel gene expression distinctions establish the relationships between the existing cancer markers and implicate new potential biomarkers for colorectal cancer hepatic metastasis. PMID:27583792

  19. The Genome-Wide Analysis of Carcinoembryonic Antigen Signaling by Colorectal Cancer Cells Using RNA Sequencing

    PubMed Central

    Gorbunova, Anna; Evsyukov, Igor; Rayko, Michael; Gapon, Svetlana; Bozhokina, Ekaterina; Shishkin, Alexander; O’Brien, Stephen J.

    2016-01-01

    Сarcinoembryonic antigen (CEA, CEACAM5, CD66) is a promoter of metastasis in epithelial cancers that is widely used as a prognostic clinical marker of metastasis. The aim of this study is to identify the network of genes that are associated with CEA-induced colorectal cancer liver metastasis. We compared the genome-wide transcriptomic profiles of CEA positive (MIP101 clone 8) and CEA negative (MIP 101) colorectal cancer cell lines with different metastatic potential in vivo. The CEA-producing cells displayed quantitative changes in the level of expression for 100 genes (over-expressed or down-regulated). They were confirmed by quantitative RT-PCR. The KEGG pathway analysis identified 4 significantly enriched pathways: cytokine-cytokine receptor interaction, MAPK signaling pathway, TGF-beta signaling pathway and pyrimidine metabolism. Our results suggest that CEA production by colorectal cancer cells triggers colorectal cancer progression by inducing the epithelial- mesenchymal transition, increasing tumor cell invasiveness into the surrounding tissues and suppressing stress and apoptotic signaling. The novel gene expression distinctions establish the relationships between the existing cancer markers and implicate new potential biomarkers for colorectal cancer hepatic metastasis. PMID:27583792

  20. Genome sequencing conference II

    SciTech Connect

    Not Available

    1990-01-01

    Genome Sequencing Conference 2 was held September 30 to October 30, 1990. 26 speaker abstracts and 33 poster presentations were included in the program report. New and improved methods for DNA sequencing and genetic mapping were presented. Many of the papers were concerned with accuracy and speed of acquisition of data with computers and automation playing an increasing role. Individual papers have been processed separately for inclusion on the database.

  1. Computational methods for detecting copy number variations in cancer genome using next generation sequencing: principles and challenges

    PubMed Central

    Liu, Biao; Morrison, Carl D.; Johnson, Candace S.; Trump, Donald L.; Qin, Maochun; Conroy, Jeffrey C.; Wang, Jianmin; Liu, Song

    2013-01-01

    Accurate detection of somatic copy number variations (CNVs) is an essential part of cancer genome analysis, and plays an important role in oncotarget identifications. Next generation sequencing (NGS) holds the promise to revolutionize somatic CNV detection. In this review, we provide an overview of current analytic tools used for CNV detection in NGS-based cancer studies. We summarize the NGS data types used for CNV detection, decipher the principles for data preprocessing, segmentation, and interpretation, and discuss the challenges in somatic CNV detection. This review aims to provide a guide to the analytic tools used in NGS-based cancer CNV studies, and to discuss the important factors that researchers need to consider when analyzing NGS data for somatic CNV detections. PMID:24240121

  2. Computational methods for detecting copy number variations in cancer genome using next generation sequencing: principles and challenges.

    PubMed

    Liu, Biao; Morrison, Carl D; Johnson, Candace S; Trump, Donald L; Qin, Maochun; Conroy, Jeffrey C; Wang, Jianmin; Liu, Song

    2013-11-01

    Accurate detection of somatic copy number variations (CNVs) is an essential part of cancer genome analysis, and plays an important role in oncotarget identifications. Next generation sequencing (NGS) holds the promise to revolutionize somatic CNV detection. In this review, we provide an overview of current analytic tools used for CNV detection in NGS-based cancer studies. We summarize the NGS data types used for CNV detection, decipher the principles for data preprocessing, segmentation, and interpretation, and discuss the challenges in somatic CNV detection. This review aims to provide a guide to the analytic tools used in NGS-based cancer CNV studies, and to discuss the important factors that researchers need to consider when analyzing NGS data for somatic CNV detections.

  3. Identification of cancer predisposition variants in apparently healthy individuals using a next-generation sequencing-based family genomics approach.

    PubMed

    Karageorgos, Ioannis; Mizzi, Clint; Giannopoulou, Efstathia; Pavlidis, Cristiana; Peters, Brock A; Zagoriti, Zoi; Stenson, Peter D; Mitropoulos, Konstantinos; Borg, Joseph; Kalofonos, Haralabos P; Drmanac, Radoje; Stubbs, Andrew; van der Spek, Peter; Cooper, David N; Katsila, Theodora; Patrinos, George P

    2015-01-01

    Cancer, like many common disorders, has a complex etiology, often with a strong genetic component and with multiple environmental factors contributing to susceptibility. A considerable number of genomic variants have been previously reported to be causative of, or associated with, an increased risk for various types of cancer. Here, we adopted a next-generation sequencing approach in 11 members of two families of Greek descent to identify all genomic variants with the potential to predispose family members to cancer. Cross-comparison with data from the Human Gene Mutation Database identified a total of 571 variants, from which 47 % were disease-associated polymorphisms, 26 % disease-associated polymorphisms with additional supporting functional evidence, 19 % functional polymorphisms with in vitro/laboratory or in vivo supporting evidence but no known disease association, 4 % putative disease-causing mutations but with some residual doubt as to their pathological significance, and 3 % disease-causing mutations. Subsequent analysis, focused on the latter variant class most likely to be involved in cancer predisposition, revealed two variants of prime interest, namely MSH2 c.2732T>A (p.L911R) and BRCA1 c.2955delC, the first of which is novel. KMT2D c.13895delC and c.1940C>A variants are additionally reported as incidental findings. The next-generation sequencing-based family genomics approach described herein has the potential to be applied to other types of complex genetic disorder in order to identify variants of potential pathological significance. PMID:26092435

  4. Draft Genome Sequence of Erythromycin-Resistant Streptococcus gallolyticus subsp. gallolyticus NTS 31106099 Isolated from a Patient with Infective Endocarditis and Colorectal Cancer.

    PubMed

    Kambarev, Stanimir; Caté, Clément; Corvec, Stéphane; Pecorari, Frédéric

    2015-04-23

    Streptococcus gallolyticus subsp. gallolyticus is known for its close association with infective endocarditis and colorectal cancer in humans. Here, we report the draft genome sequence of highly erythromycin-resistant strain NTS 31106099 isolated from a patient with infective endocarditis and colorectal cancer.

  5. The Pediatric Cancer Genome Project

    PubMed Central

    Downing, James R; Wilson, Richard K; Zhang, Jinghui; Mardis, Elaine R; Pui, Ching-Hon; Ding, Li; Ley, Timothy J; Evans, William E

    2013-01-01

    The St. Jude Children’s Research Hospital–Washington University Pediatric Cancer Genome Project (PCGP) is participating in the international effort to identify somatic mutations that drive cancer. These cancer genome sequencing efforts will not only yield an unparalleled view of the altered signaling pathways in cancer but should also identify new targets against which novel therapeutics can be developed. Although these projects are still deep in the phase of generating primary DNA sequence data, important results are emerging and valuable community resources are being generated that should catalyze future cancer research. We describe here the rationale for conducting the PCGP, present some of the early results of this project and discuss the major lessons learned and how these will affect the application of genomic sequencing in the clinic. PMID:22641210

  6. Whole-genome sequencing analysis of phenotypic heterogeneity and anticipation in Li–Fraumeni cancer predisposition syndrome

    PubMed Central

    Ariffin, Hany; Hainaut, Pierre; Puzio-Kuter, Anna; Choong, Soo Sin; Chan, Adelyne Sue Li; Tolkunov, Denis; Rajagopal, Gunaretnam; Kang, Wenfeng; Lim, Leon Li Wen; Krishnan, Shekhar; Chen, Kok-Siong; Achatz, Maria Isabel; Karsa, Mawar; Shamsani, Jannah; Levine, Arnold J.; Chan, Chang S.

    2014-01-01

    The Li–Fraumeni syndrome (LFS) and its variant form (LFL) is a familial predisposition to multiple forms of childhood, adolescent, and adult cancers associated with germ-line mutation in the TP53 tumor suppressor gene. Individual disparities in tumor patterns are compounded by acceleration of cancer onset with successive generations. It has been suggested that this apparent anticipation pattern may result from germ-line genomic instability in TP53 mutation carriers, causing increased DNA copy-number variations (CNVs) with successive generations. To address the genetic basis of phenotypic disparities of LFS/LFL, we performed whole-genome sequencing (WGS) of 13 subjects from two generations of an LFS kindred. Neither de novo CNV nor significant difference in total CNV was detected in relation with successive generations or with age at cancer onset. These observations were consistent with an experimental mouse model system showing that trp53 deficiency in the germ line of father or mother did not increase CNV occurrence in the offspring. On the other hand, individual records on 1,771 TP53 mutation carriers from 294 pedigrees were compiled to assess genetic anticipation patterns (International Agency for Research on Cancer TP53 database). No strictly defined anticipation pattern was observed. Rather, in multigeneration families, cancer onset was delayed in older compared with recent generations. These observations support an alternative model for apparent anticipation in which rare variants from noncarrier parents may attenuate constitutive resistance to tumorigenesis in the offspring of TP53 mutation carriers with late cancer onset. PMID:25313051

  7. Whole-genome sequencing analysis of phenotypic heterogeneity and anticipation in Li-Fraumeni cancer predisposition syndrome.

    PubMed

    Ariffin, Hany; Hainaut, Pierre; Puzio-Kuter, Anna; Choong, Soo Sin; Chan, Adelyne Sue Li; Tolkunov, Denis; Rajagopal, Gunaretnam; Kang, Wenfeng; Lim, Leon Li Wen; Krishnan, Shekhar; Chen, Kok-Siong; Achatz, Maria Isabel; Karsa, Mawar; Shamsani, Jannah; Levine, Arnold J; Chan, Chang S

    2014-10-28

    The Li-Fraumeni syndrome (LFS) and its variant form (LFL) is a familial predisposition to multiple forms of childhood, adolescent, and adult cancers associated with germ-line mutation in the TP53 tumor suppressor gene. Individual disparities in tumor patterns are compounded by acceleration of cancer onset with successive generations. It has been suggested that this apparent anticipation pattern may result from germ-line genomic instability in TP53 mutation carriers, causing increased DNA copy-number variations (CNVs) with successive generations. To address the genetic basis of phenotypic disparities of LFS/LFL, we performed whole-genome sequencing (WGS) of 13 subjects from two generations of an LFS kindred. Neither de novo CNV nor significant difference in total CNV was detected in relation with successive generations or with age at cancer onset. These observations were consistent with an experimental mouse model system showing that trp53 deficiency in the germ line of father or mother did not increase CNV occurrence in the offspring. On the other hand, individual records on 1,771 TP53 mutation carriers from 294 pedigrees were compiled to assess genetic anticipation patterns (International Agency for Research on Cancer TP53 database). No strictly defined anticipation pattern was observed. Rather, in multigeneration families, cancer onset was delayed in older compared with recent generations. These observations support an alternative model for apparent anticipation in which rare variants from noncarrier parents may attenuate constitutive resistance to tumorigenesis in the offspring of TP53 mutation carriers with late cancer onset. PMID:25313051

  8. The Global Cancer Genomics Consortium: interfacing genomics and cancer medicine.

    PubMed

    2012-08-01

    The Global Cancer Genomics Consortium (GCGC) is an international collaborative platform that amalgamates cancer biologists, cutting-edge genomics, and high-throughput expertise with medical oncologists and surgical oncologists; they address the most important translational questions that are central to cancer research and treatment. The annual GCGC symposium was held at the Advanced Centre for Treatment Research and Education in Cancer, Mumbai, India, from November 9 to 11, 2011. The symposium showcased international next-generation sequencing efforts that explore cancer-specific transcriptomic changes, single-nucleotide polymorphism, and copy number variations in various types of cancers, as well as the structural genomics approach to develop new therapeutic targets and chemical probes. From the spectrum of studies presented at the symposium, it is evident that the translation of emerging cancer genomics knowledge into clinical applications can only be achieved through the integration of multidisciplinary expertise. In summary, the GCGC symposium provided practical knowledge on structural and cancer genomics approaches, as well as an exclusive platform for focused cancer genomics endeavors. PMID:22628426

  9. The Global Cancer Genomics Consortium: interfacing genomics and cancer medicine.

    PubMed

    2012-08-01

    The Global Cancer Genomics Consortium (GCGC) is an international collaborative platform that amalgamates cancer biologists, cutting-edge genomics, and high-throughput expertise with medical oncologists and surgical oncologists; they address the most important translational questions that are central to cancer research and treatment. The annual GCGC symposium was held at the Advanced Centre for Treatment Research and Education in Cancer, Mumbai, India, from November 9 to 11, 2011. The symposium showcased international next-generation sequencing efforts that explore cancer-specific transcriptomic changes, single-nucleotide polymorphism, and copy number variations in various types of cancers, as well as the structural genomics approach to develop new therapeutic targets and chemical probes. From the spectrum of studies presented at the symposium, it is evident that the translation of emerging cancer genomics knowledge into clinical applications can only be achieved through the integration of multidisciplinary expertise. In summary, the GCGC symposium provided practical knowledge on structural and cancer genomics approaches, as well as an exclusive platform for focused cancer genomics endeavors.

  10. Genome Sequence Databases (Overview): Sequencing and Assembly

    SciTech Connect

    Lapidus, Alla L.

    2009-01-01

    From the date its role in heredity was discovered, DNA has been generating interest among scientists from different fields of knowledge: physicists have studied the three dimensional structure of the DNA molecule, biologists tried to decode the secrets of life hidden within these long molecules, and technologists invent and improve methods of DNA analysis. The analysis of the nucleotide sequence of DNA occupies a special place among the methods developed. Thanks to the variety of sequencing technologies available, the process of decoding the sequence of genomic DNA (or whole genome sequencing) has become robust and inexpensive. Meanwhile the assembly of whole genome sequences remains a challenging task. In addition to the need to assemble millions of DNA fragments of different length (from 35 bp (Solexa) to 800 bp (Sanger)), great interest in analysis of microbial communities (metagenomes) of different complexities raises new problems and pushes some new requirements for sequence assembly tools to the forefront. The genome assembly process can be divided into two steps: draft assembly and assembly improvement (finishing). Despite the fact that automatically performed assembly (or draft assembly) is capable of covering up to 98% of the genome, in most cases, it still contains incorrectly assembled reads. The error rate of the consensus sequence produced at this stage is about 1/2000 bp. A finished genome represents the genome assembly of much higher accuracy (with no gaps or incorrectly assembled areas) and quality ({approx}1 error/10,000 bp), validated through a number of computer and laboratory experiments.

  11. Fungal Genome Sequencing and Bioenergy

    SciTech Connect

    Baker, Scott E.; Thykaer, Jette; Adney, William S.; Brettin, T.; Brockman, Fred J.; D'haeseleer, Patrik; Martinez, Antonio D.; Miller, R. M.; Rokhsar, Daniel S.; Schadt, Christopher W.; Torok, Tamas; Tuskan, Gerald; Bennett, Joan W.; Berka, Randy; Briggs, Steve; Heitman, Joseph; Taylor, John; Turgeon, Barbara G.; Werner-Washburne, Maggie; Himmel, Michael E.

    2008-09-30

    To date, the number of ongoing filamentous fungal genome sequencing projects is almost tenfold fewer than those of bacterial and archaeal genome projects. The fungi chosen for sequencing represent narrow kingdom diversity; most are pathogens or models. We advocate an ambitious, forward-looking phylogenetic-based genome sequencing program, designed to capture metabolic diversity within the fungal kingdom, thereby enhancing research into alternative bioenergy sources, bioremediation, and fungal-environment interactions.

  12. Fungal Genome Sequencing and Bioenergy

    SciTech Connect

    Baker, Scott; Thykaer, Jette; Adney, William S; Brettin, Tom; Brockman, Fred; Dhaeseleer, Patrick; Martinez, A diego; Miller, R michael; Rokhsar, Daniel; Schadt, Christopher Warren; Torok, Tamas; Tuskan, Gerald A; Bennett, Joan; Berka, Randy; Briggs, Steven; Heitman, Joseph; Taylor, John; Turgeon, Gillian; Werner-Washburne, Maggie; Himmel, Michael E

    2008-01-01

    To date, the number of ongoing filamentous fungal genome sequencing projects is almost tenfold fewer than those of bacterial and archaeal genome projects. The fungi chosen for sequencing represent narrow kingdom diversity; most are pathogens or models. We advocate an ambitious, forward-looking phylogenetic-based genome sequencing program, designed to capture metabolic diversity within the fungal kingdom, thereby enhancing research into alternative bioenergy sources, bioremediation, and fungal-environment interactions. Published by Elsevier Ltd on behalf of The British Mycological Society.

  13. Fungal Genome Sequencing and Bioenergy

    SciTech Connect

    Schadt, Christopher Warren; Baker, Scott; Thykaer, Jette; Adney, William S; Brettin, Tom; Brockman, Fred; Dhaeseleer, Patrick; Martinez, A diego; Miller, R michael; Rokhsar, Daniel; Torok, Tamas; Tuskan, Gerald A; Bennett, Joan; Berka, Randy; Briggs, Steven; Heitman, Joseph; Rizvi, L; Taylor, John; Turgeon, Gillian; Werner-Washburne, Maggie; Himmel, Michael

    2008-01-01

    To date, the number of ongoing filamentous fungal genome sequencing projects is almost tenfold fewer than those of bacterial and archaeal genome projects. The fungi chosen for sequencing represent narrow kingdom diversity; most are pathogens or models. We advocate an ambitious, forward-looking phylogenetic-based genome sequencing program, designed to capture metabolic diversity within the fungal kingdom, thereby enhancing research into alternative bioenergy sources, bioremediation, and fungal-environment interactions.

  14. Complete Genome Sequence of Bacilli bacterium Strain VT-13-104 Isolated from the Intestine of a Patient with Duodenal Cancer

    PubMed Central

    Tetz, Victor

    2015-01-01

    We report the complete genome sequence of Bacilli bacterium strain VT-13-104 isolated from the intestine of a patient with duodenal cancer. The genome is composed of 3,573,421 bp, with a G+C content of 35.7%. It possesses 3,254 predicted protein-coding genes encoding multidrug resistance transporters, resistance to antibiotics, and virulence factors. PMID:26139715

  15. Complete Genome Sequence of a Novel Bacillus sp. VT 712 Strain Isolated from the Duodenum of a Patient with Intestinal Cancer

    PubMed Central

    Tetz, Victor

    2016-01-01

    We report here the complete genome sequence of the spore-forming Bacillus sp. strain VT 712 isolated from the duodenum of a patient with intestinal cancer. The genome is 3,921,583 bp, with 37.9% G+C content. It contains 3,768 predicted protein-coding genes for multidrug resistance transporters, virulence factors, and daunorubicin resistance. PMID:27491975

  16. Targeted Sequencing of the Mitochondrial Genome of Women at High Risk of Breast Cancer without Detectable Mutations in BRCA1/2

    PubMed Central

    Blein, Sophie; Barjhoux, Laure; Damiola, Francesca; Dondon, Marie-Gabrielle; Eon-Marchais, Séverine; Marcou, Morgane; Caron, Olivier; Lortholary, Alain; Buecher, Bruno; Berthet, Pascaline; Noguès, Catherine; Lasset, Christine; Gauthier-Villars, Marion; Mazoyer, Sylvie; Stoppa-Lyonnet, Dominique; Andrieu, Nadine; Cox, David G.

    2015-01-01

    Breast Cancer is a complex multifactorial disease for which high-penetrance mutations have been identified. Approaches used to date have identified genomic features explaining about 50% of breast cancer heritability. A number of low- to medium penetrance alleles (per-allele odds ratio < 1.5 and 4.0, respectively) have been identified, suggesting that the remaining heritability is likely to be explained by the cumulative effect of such alleles and/or by rare high-penetrance alleles. Relatively few studies have specifically explored the mitochondrial genome for variants potentially implicated in breast cancer risk. For these reasons, we propose an exploration of the variability of the mitochondrial genome in individuals diagnosed with breast cancer, having a positive breast cancer family history but testing negative for BRCA1/2 pathogenic mutations. We sequenced the mitochondrial genome of 436 index breast cancer cases from the GENESIS study. As expected, no pathogenic genomic pattern common to the 436 women included in our study was observed. The mitochondrial genes MT-ATP6 and MT-CYB were observed to carry the highest number of variants in the study. The proteins encoded by these genes are involved in the structure of the mitochondrial respiration chain, and variants in these genes may impact reactive oxygen species production contributing to carcinogenesis. More functional and epidemiological studies are needed to further investigate to what extent variants identified may influence familial breast cancer risk. PMID:26406445

  17. Genomic DNA of MCF-7 breast cancer cells not an ideal choice as positive control for PCR amplification based detection of Mouse Mammary Tumor Virus-Like Sequences.

    PubMed

    Kulkarni, Bhushan B; Hiremath, Shivaprakash V; Kulkarni, Suyamindra S; Hallikeri, Umesh R; Patil, Basavaraj R; Gai, Pramod B

    2013-11-01

    The identification of the etiology of breast cancer is a crucial research issue for the development of an effective preventive and treatment strategies. Researchers are exploring the possible involvement of Mouse Mammary Tumor Virus (MMTV) in causing human breast cancer. Hence, it becomes very important to use a consistent positive control agent in PCR amplification based detection of MMTV-Like Sequence (MMTV-LS) in human breast cancer for accurate and reproducible results. This study was done to investigate the feasibility of using genomic DNA of MCF-7 breast cancer cells to detect MMTV-LS using PCR amplification based detection. MMTV env and SAG gene located at the 3' long terminal repeat (LTR) sequences were targeted for the PCR based detection. No amplification was observed in case of the genomic DNA of MCF-7 breast cancer cells. However, the 2.7 kb DNA fragment comprising MMTV env and SAG LTR sequences yielded the products of desired size. From these results it can be concluded that Genomic DNA of MCF-7 cell is not a suitable choice as positive control for PCR or RT-PCR based detection of MMTV-LS. It is also suggested that plasmids containing the cloned genes or sequences of MMTV be used as positive control for detection of MMTV-LS.

  18. A comparison of isolated circulating tumor cells and tissue biopsies using whole-genome sequencing in prostate cancer.

    PubMed

    Jiang, Runze; Lu, Yi-Tsung; Ho, Hao; Li, Bo; Chen, Jie-Fu; Lin, Millicent; Li, Fuqiang; Wu, Kui; Wu, Hanjie; Lichterman, Jake; Wan, Haolei; Lu, Chia-Lun; OuYang, William; Ni, Ming; Wang, Linlin; Li, Guibo; Lee, Tom; Zhang, Xiuqing; Yang, Jonathan; Rettig, Matthew; Chung, Leland W K; Yang, Huanming; Li, Ker-Chau; Hou, Yong; Tseng, Hsian-Rong; Hou, Shuang; Xu, Xun; Wang, Jun; Posadas, Edwin M

    2015-12-29

    Previous studies have demonstrated focal but limited molecular similarities between circulating tumor cells (CTCs) and biopsies using isolated genetic assays. We hypothesized that molecular similarity between CTCs and tissue exists at the single cell level when characterized by whole genome sequencing (WGS). By combining the NanoVelcro CTC Chip with laser capture microdissection (LCM), we developed a platform for single-CTC WGS. We performed this procedure on CTCs and tissue samples from a patient with advanced prostate cancer who had serial biopsies over the course of his clinical history. We achieved 30X depth and ≥ 95% coverage. Twenty-nine percent of the somatic single nucleotide variations (SSNVs) identified were founder mutations that were also identified in CTCs. In addition, 86% of the clonal mutations identified in CTCs could be traced back to either the primary or metastatic tumors. In this patient, we identified structural variations (SVs) including an intrachromosomal rearrangement in chr3 and an interchromosomal rearrangement between chr13 and chr15. These rearrangements were shared between tumor tissues and CTCs. At the same time, highly heterogeneous short structural variants were discovered in PTEN, RB1, and BRCA2 in all tumor and CTC samples. Using high-quality WGS on single-CTCs, we identified the shared genomic alterations between CTCs and tumor tissues. This approach yielded insight into the heterogeneity of the mutational landscape of SSNVs and SVs. It may be possible to use this approach to study heterogeneity and characterize the biological evolution of a cancer during the course of its natural history. PMID:26575023

  19. A comparison of isolated circulating tumor cells and tissue biopsies using whole-genome sequencing in prostate cancer.

    PubMed

    Jiang, Runze; Lu, Yi-Tsung; Ho, Hao; Li, Bo; Chen, Jie-Fu; Lin, Millicent; Li, Fuqiang; Wu, Kui; Wu, Hanjie; Lichterman, Jake; Wan, Haolei; Lu, Chia-Lun; OuYang, William; Ni, Ming; Wang, Linlin; Li, Guibo; Lee, Tom; Zhang, Xiuqing; Yang, Jonathan; Rettig, Matthew; Chung, Leland W K; Yang, Huanming; Li, Ker-Chau; Hou, Yong; Tseng, Hsian-Rong; Hou, Shuang; Xu, Xun; Wang, Jun; Posadas, Edwin M

    2015-12-29

    Previous studies have demonstrated focal but limited molecular similarities between circulating tumor cells (CTCs) and biopsies using isolated genetic assays. We hypothesized that molecular similarity between CTCs and tissue exists at the single cell level when characterized by whole genome sequencing (WGS). By combining the NanoVelcro CTC Chip with laser capture microdissection (LCM), we developed a platform for single-CTC WGS. We performed this procedure on CTCs and tissue samples from a patient with advanced prostate cancer who had serial biopsies over the course of his clinical history. We achieved 30X depth and ≥ 95% coverage. Twenty-nine percent of the somatic single nucleotide variations (SSNVs) identified were founder mutations that were also identified in CTCs. In addition, 86% of the clonal mutations identified in CTCs could be traced back to either the primary or metastatic tumors. In this patient, we identified structural variations (SVs) including an intrachromosomal rearrangement in chr3 and an interchromosomal rearrangement between chr13 and chr15. These rearrangements were shared between tumor tissues and CTCs. At the same time, highly heterogeneous short structural variants were discovered in PTEN, RB1, and BRCA2 in all tumor and CTC samples. Using high-quality WGS on single-CTCs, we identified the shared genomic alterations between CTCs and tumor tissues. This approach yielded insight into the heterogeneity of the mutational landscape of SSNVs and SVs. It may be possible to use this approach to study heterogeneity and characterize the biological evolution of a cancer during the course of its natural history.

  20. Genomic Sequencing in Determining Treatment in Patients With Metastatic Cancer or Cancer That Cannot Be Removed by Surgery

    ClinicalTrials.gov

    2016-07-26

    Metastatic Neoplasm; Recurrent Neoplasm; Recurrent Non-Small Cell Lung Carcinoma; Stage IIIA Non-Small Cell Lung Cancer; Stage IIIB Non-Small Cell Lung Cancer; Stage IV Non-Small Cell Lung Cancer; Unresectable Malignant Neoplasm

  1. Whole-exome/genome sequencing and genomics.

    PubMed

    Grody, Wayne W; Thompson, Barry H; Hudgins, Louanne

    2013-12-01

    As medical genetics has progressed from a descriptive entity to one focused on the functional relationship between genes and clinical disorders, emphasis has been placed on genomics. Genomics, a subelement of genetics, is the study of the genome, the sum total of all the genes of an organism. The human genome, which is contained in the 23 pairs of nuclear chromosomes and in the mitochondrial DNA of each cell, comprises >6 billion nucleotides of genetic code. There are some 23,000 protein-coding genes, a surprisingly small fraction of the total genetic material, with the remainder composed of noncoding DNA, regulatory sequences, and introns. The Human Genome Project, launched in 1990, produced a draft of the genome in 2001 and then a finished sequence in 2003, on the 50th anniversary of the initial publication of Watson and Crick's paper on the double-helical structure of DNA. Since then, this mass of genetic information has been translated at an ever-increasing pace into useable knowledge applicable to clinical medicine. The recent advent of massively parallel DNA sequencing (also known as shotgun, high-throughput, and next-generation sequencing) has brought whole-genome analysis into the clinic for the first time, and most of the current applications are directed at children with congenital conditions that are undiagnosable by using standard genetic tests for single-gene disorders. Thus, pediatricians must become familiar with this technology, what it can and cannot offer, and its technical and ethical challenges. Here, we address the concepts of human genomic analysis and its clinical applicability for primary care providers.

  2. Genomic determinants of cancer immunotherapy.

    PubMed

    Miao, Diana; Van Allen, Eliezer M

    2016-08-01

    Cancer immunotherapies - including therapeutic vaccines, adoptive cell transfer, oncolytic viruses, and immune checkpoint blockade - yield durable responses in many cancer types, but understanding of predictors of response is incomplete. Genomic characterization of human cancers has already contributed to the success of targeted therapies; in cancer immunotherapy, identification of tumor-specific antigens through whole-exome sequencing may be key to designing individualized, highly immunogenic therapeutic vaccines. Additionally, pre-treatment tumor mutational and gene expression signatures can predict which patients are most likely to benefit from cancer immunotherapy. Continued work in harnessing genomic, transcriptomic, and immunological data from clinical cohorts of immunotherapy-treated patients will bring the promises of precision medicine to immuno-oncology.

  3. Genomic profiling of breast cancers

    PubMed Central

    Curtis, Christina

    2015-01-01

    Purpose of review To describe recent advances in the application of advanced genomic technologies towards the identification of biomarkers of prognosis and treatment response in breast cancer. Recent findings Advances in high-throughput genomic profiling such as massively parallel sequencing have enabled researchers to catalogue the spectrum of somatic alterations in breast cancers. These tools also hold promise for precision medicine through accurate patient prognostication, stratification, and the dynamic monitoring of treatment response. For example, recent efforts have defined robust molecular subgroups of breast cancer and novel subtype-specific oncogenes. In addition, previously unappreciated activating mutations in human epidermal growth factor receptor 2 have been reported, suggesting new therapeutic opportunities. Genomic profiling of cell-free tumor DNA and circulating tumor cells has been used to monitor disease burden and the emergence of resistance, and such ‘liquid biopsy’ approaches may facilitate the early, noninvasive detection of aggressive disease. Finally, single-cell genomics is coming of age and will contribute to an understanding of breast cancer evolutionary dynamics. Summary Here, we highlight recent studies that employ high-throughput genomic technologies in an effort to elucidate breast cancer biology, discover new therapeutic targets, improve prognostication and stratification, and discuss the implications for precision cancer medicine. PMID:25502431

  4. Whole-genome sequencing of asian lung cancers: second-hand smoke unlikely to be responsible for higher incidence of lung cancer among Asian never-smokers.

    PubMed

    Krishnan, Vidhya G; Ebert, Philip J; Ting, Jason C; Lim, Elaine; Wong, Swee-Seong; Teo, Audrey S M; Yue, Yong G; Chua, Hui-Hoon; Ma, Xiwen; Loh, Gary S L; Lin, Yuhao; Tan, Joanna H J; Yu, Kun; Zhang, Shenli; Reinhard, Christoph; Tan, Daniel S W; Peters, Brock A; Lincoln, Stephen E; Ballinger, Dennis G; Laramie, Jason M; Nilsen, Geoffrey B; Barber, Thomas D; Tan, Patrick; Hillmer, Axel M; Ng, Pauline C

    2014-11-01

    Asian nonsmoking populations have a higher incidence of lung cancer compared with their European counterparts. There is a long-standing hypothesis that the increase of lung cancer in Asian never-smokers is due to environmental factors such as second-hand smoke. We analyzed whole-genome sequencing of 30 Asian lung cancers. Unsupervised clustering of mutational signatures separated the patients into two categories of either all the never-smokers or all the smokers or ex-smokers. In addition, nearly one third of the ex-smokers and smokers classified with the never-smoker-like cluster. The somatic variant profiles of Asian lung cancers were similar to that of European origin with G.C>T.A being predominant in smokers. We found EGFR and TP53 to be the most frequently mutated genes with mutations in 50% and 27% of individuals, respectively. Among the 16 never-smokers, 69% had an EGFR mutation compared with 29% of 14 smokers/ex-smokers. Asian never-smokers had lung cancer signatures distinct from the smoker signature and their mutation profiles were similar to European never-smokers. The profiles of Asian and European smokers are also similar. Taken together, these results suggested that the same mutational mechanisms underlie the etiology for both ethnic groups. Thus, the high incidence of lung cancer in Asian never-smokers seems unlikely to be due to second-hand smoke or other carcinogens that cause oxidative DNA damage, implying that routine EGFR testing is warranted in the Asian population regardless of smoking status. PMID:25189529

  5. Genomic Approaches to Zebrafish Cancer.

    PubMed

    White, Richard M

    2016-01-01

    The zebrafish has emerged as an important model for studying cancer biology. Identification of DNA, RNA and chromatin abnormalities can give profound insight into the mechanisms of tumorigenesis and the there are many techniques for analyzing the genomes of these tumors. Here, I present an overview of the available technologies for analyzing tumor genomes in the zebrafish, including array based methods as well as next-generation sequencing technologies. I also discuss the ways in which zebrafish tumor genomes can be compared to human genomes using cross-species oncogenomics, which act to filter genomic noise and ultimately uncover central drivers of malignancy. Finally, I discuss downstream analytic tools, including network analysis, that can help to organize the alterations into coherent biological frameworks that can then be investigated further. PMID:27165352

  6. Network Biomarkers of Bladder Cancer Based on a Genome-Wide Genetic and Epigenetic Network Derived from Next-Generation Sequencing Data

    PubMed Central

    Li, Cheng-Wei

    2016-01-01

    Epigenetic and microRNA (miRNA) regulation are associated with carcinogenesis and the development of cancer. By using the available omics data, including those from next-generation sequencing (NGS), genome-wide methylation profiling, candidate integrated genetic and epigenetic network (IGEN) analysis, and drug response genome-wide microarray analysis, we constructed an IGEN system based on three coupling regression models that characterize protein-protein interaction networks (PPINs), gene regulatory networks (GRNs), miRNA regulatory networks (MRNs), and epigenetic regulatory networks (ERNs). By applying system identification method and principal genome-wide network projection (PGNP) to IGEN analysis, we identified the core network biomarkers to investigate bladder carcinogenic mechanisms and design multiple drug combinations for treating bladder cancer with minimal side-effects. The progression of DNA repair and cell proliferation in stage 1 bladder cancer ultimately results not only in the derepression of miR-200a and miR-200b but also in the regulation of the TNF pathway to metastasis-related genes or proteins, cell proliferation, and DNA repair in stage 4 bladder cancer. We designed a multiple drug combination comprising gefitinib, estradiol, yohimbine, and fulvestrant for treating stage 1 bladder cancer with minimal side-effects, and another multiple drug combination comprising gefitinib, estradiol, chlorpromazine, and LY294002 for treating stage 4 bladder cancer with minimal side-effects. PMID:27034531

  7. Network Biomarkers of Bladder Cancer Based on a Genome-Wide Genetic and Epigenetic Network Derived from Next-Generation Sequencing Data.

    PubMed

    Li, Cheng-Wei; Chen, Bor-Sen

    2016-01-01

    Epigenetic and microRNA (miRNA) regulation are associated with carcinogenesis and the development of cancer. By using the available omics data, including those from next-generation sequencing (NGS), genome-wide methylation profiling, candidate integrated genetic and epigenetic network (IGEN) analysis, and drug response genome-wide microarray analysis, we constructed an IGEN system based on three coupling regression models that characterize protein-protein interaction networks (PPINs), gene regulatory networks (GRNs), miRNA regulatory networks (MRNs), and epigenetic regulatory networks (ERNs). By applying system identification method and principal genome-wide network projection (PGNP) to IGEN analysis, we identified the core network biomarkers to investigate bladder carcinogenic mechanisms and design multiple drug combinations for treating bladder cancer with minimal side-effects. The progression of DNA repair and cell proliferation in stage 1 bladder cancer ultimately results not only in the derepression of miR-200a and miR-200b but also in the regulation of the TNF pathway to metastasis-related genes or proteins, cell proliferation, and DNA repair in stage 4 bladder cancer. We designed a multiple drug combination comprising gefitinib, estradiol, yohimbine, and fulvestrant for treating stage 1 bladder cancer with minimal side-effects, and another multiple drug combination comprising gefitinib, estradiol, chlorpromazine, and LY294002 for treating stage 4 bladder cancer with minimal side-effects. PMID:27034531

  8. Sequencing Complex Genomic Regions

    SciTech Connect

    Eichler, Evan

    2009-05-28

    Evan Eichler, Howard Hughes Medical Investigator at the University of Washington, gives the May 28, 2009 keynote speech at the "Sequencing, Finishing, Analysis in the Future" meeting in Santa Fe, NM. Part 2 of 2

  9. Sequencing Complex Genomic Regions

    SciTech Connect

    Eichler, Evan

    2009-05-28

    Evan Eichler, Howard Hughes Medical Investigator at the University of Washington, gives the May 28, 2009 keynote speech at the "Sequencing, Finishing, Analysis in the Future" meeting in Santa Fe, NM. Part 1 of 2

  10. Genomic Instability and Cancer

    PubMed Central

    Yao, Yixin; Dai, Wei

    2014-01-01

    Genomic instability is a characteristic of most cancer cells. It is an increased tendency of genome alteration during cell division. Cancer frequently results from damage to multiple genes controlling cell division and tumor suppressors. It is known that genomic integrity is closely monitored by several surveillance mechanisms, DNA damage checkpoint, DNA repair machinery and mitotic checkpoint. A defect in the regulation of any of these mechanisms often results in genomic instability, which predisposes the cell to malignant transformation. Posttranslational modifications of the histone tails are closely associated with regulation of the cell cycle as well as chromatin structure. Nevertheless, DNA methylation status is also related to genomic integrity. We attempt to summarize recent developments in this field and discuss the debate of driving force of tumor initiation and progression. PMID:25541596

  11. Genomic Data Commons | Office of Cancer Genomics

    Cancer.gov

    The NCI’s Center for Cancer Genomics launches the Genomic Data Commons (GDC), a unified data sharing platform for the cancer research community. The mission of the GDC is to enable data sharing across the entire cancer research community, to ultimately support precision medicine in oncology.

  12. Genome Sequence of Canine Herpesvirus

    PubMed Central

    Papageorgiou, Konstantinos V.; Suárez, Nicolás M.; Wilkie, Gavin S.; McDonald, Michael; Graham, Elizabeth M.; Davison, Andrew J.

    2016-01-01

    Canine herpesvirus is a widespread alphaherpesvirus that causes a fatal haemorrhagic disease of neonatal puppies. We have used high-throughput methods to determine the genome sequences of three viral strains (0194, V777 and V1154) isolated in the United Kingdom between 1985 and 2000. The sequences are very closely related to each other. The canine herpesvirus genome is estimated to be 125 kbp in size and consists of a unique long sequence (97.5 kbp) and a unique short sequence (7.7 kbp) that are each flanked by terminal and internal inverted repeats (38 bp and 10.0 kbp, respectively). The overall nucleotide composition is 31.6% G+C, which is the lowest among the completely sequenced alphaherpesviruses. The genome contains 76 open reading frames predicted to encode functional proteins, all of which have counterparts in other alphaherpesviruses. The availability of the sequences will facilitate future research on the diagnosis and treatment of canine herpesvirus-associated disease. PMID:27213534

  13. Profiling the cancer genome.

    PubMed

    Cowin, Prue A; Anglesio, Michael; Etemadmoghadam, Dariush; Bowtell, David D L

    2010-01-01

    Cancer profiling studies have had a profound impact on our understanding of the biology of cancers in a number of ways, including providing insights into the biological heterogeneity of specific cancer types, identification of novel oncogenes and tumor suppressors, and defining pathways that interact to drive the growth of individual cancers. Several large-scale genomic studies are underway that aim to catalog all biologically significant mutational events in each cancer type, and these findings will allow researchers to understand how mutational networks function within individual tumors. The identification of molecular predictive and prognostic tools to facilitate treatment decisions is an important step for individualized patient therapy and, ultimately, in improving patient outcomes. Whereas there are still significant challenges to implementing genomic testing and targeted therapy into routine clinical practice, rapid technological advancements provide hope for overcoming these obstacles.

  14. Genome Sequence of Spizellomyces punctatus

    PubMed Central

    Russ, Carsten; Lang, B. Franz; Chen, Zehua; Gujja, Sharvari; Shea, Terrance; Zeng, Qiandong; Young, Sarah; Nusbaum, Chad

    2016-01-01

    Spizellomyces punctatus is a basally branching chytrid fungus that is found in the Chytridiomycota phylum. Spizellomyces species are common in soil and of importance in terrestrial ecosystems. Here, we report the genome sequence of S. punctatus, which will facilitate the study of this group of early diverging fungi. PMID:27540072

  15. Genome Sequence of Spizellomyces punctatus.

    PubMed

    Russ, Carsten; Lang, B Franz; Chen, Zehua; Gujja, Sharvari; Shea, Terrance; Zeng, Qiandong; Young, Sarah; Cuomo, Christina A; Nusbaum, Chad

    2016-01-01

    Spizellomyces punctatus is a basally branching chytrid fungus that is found in the Chytridiomycota phylum. Spizellomyces species are common in soil and of importance in terrestrial ecosystems. Here, we report the genome sequence of S. punctatus, which will facilitate the study of this group of early diverging fungi. PMID:27540072

  16. Genome sequences of eight morphologically diverse Alphaproteobacteria.

    PubMed

    Brown, Pamela J B; Kysela, David T; Buechlein, Aaron; Hemmerich, Chris; Brun, Yves V

    2011-09-01

    The Alphaproteobacteria comprise morphologically diverse bacteria, including many species of stalked bacteria. Here we announce the genome sequences of eight alphaproteobacteria, including the first genome sequences of species belonging to the genera Asticcacaulis, Hirschia, Hyphomicrobium, and Rhodomicrobium. PMID:21705585

  17. The Cancer Genome Atlas ovarian cancer analysis

    Cancer.gov

    An analysis of genomic changes in ovarian cancer has provided the most comprehensive and integrated view of cancer genes for any cancer type to date. Ovarian serous adenocarcinoma tumors from 500 patients were examined by The Cancer Genome Atlas (TCGA) Re

  18. Genome Sequence of Burkholderia pseudomallei NCTC 13392

    PubMed Central

    Sahl, Jason W.; Stone, Joshua K.; Gelhaus, H. Carl; Warren, Richard L.; Cruttwell, Caroline J.; Funnell, Simon G.; Keim, Paul

    2013-01-01

    Here, we describe the draft genome sequence of Burkholderia pseudomallei NCTC 13392. This isolate has been distributed as K96243, but distinct genomic differences have been identified. The genomic sequence of this isolate will provide the genomic context for previously conducted functional studies. PMID:23704173

  19. Whole genome and transcriptome sequencing of a B3 thymoma.

    PubMed

    Petrini, Iacopo; Rajan, Arun; Pham, Trung; Voeller, Donna; Davis, Sean; Gao, James; Wang, Yisong; Giaccone, Giuseppe

    2013-01-01

    Molecular pathology of thymomas is poorly understood. Genomic aberrations are frequently identified in tumors but no extensive sequencing has been reported in thymomas. Here we present the first comprehensive view of a B3 thymoma at whole genome and transcriptome levels. A 55-year-old Caucasian female underwent complete resection of a stage IVA B3 thymoma. RNA and DNA were extracted from a snap frozen tumor sample with a fraction of cancer cells over 80%. We performed array comparative genomic hybridization using Agilent platform, transcriptome sequencing using HiSeq 2000 (Illumina) and whole genome sequencing using Complete Genomics Inc platform. Whole genome sequencing determined, in tumor and normal, the sequence of both alleles in more than 95% of the reference genome (NCBI Build 37). Copy number (CN) aberrations were comparable with those previously described for B3 thymomas, with CN gain of chromosome 1q, 5, 7 and X and CN loss of 3p, 6, 11q42.2-qter and q13. One translocation t(11;X) was identified by whole genome sequencing and confirmed by PCR and Sanger sequencing. Ten single nucleotide variations (SNVs) and 2 insertion/deletions (INDELs) were identified; these mutations resulted in non-synonymous amino acid changes or affected splicing sites. The lack of common cancer-associated mutations in this patient suggests that thymomas may evolve through mechanisms distinctive from other tumor types, and supports the rationale for additional high-throughput sequencing screens to better understand the somatic genetic architecture of thymoma.

  20. Genome Sequence of Mycobacteriophage Momo.

    PubMed

    Pope, Welkin H; Bina, Elizabeth A; Brahme, Indraneel S; Hill, Amy B; Himmelstein, Philip H; Hunsicker, Sara M; Ish, Amanda R; Le, Tinh S; Martin, Mary M; Moscinski, Catherine N; Shetty, Sameer A; Swierzewski, Tomasz; Iyengar, Varun B; Kim, Hannah; Schafer, Claire E; Grubb, Sarah R; Warner, Marcie H; Bowman, Charles A; Russell, Daniel A; Hatfull, Graham F

    2015-06-18

    Momo is a newly discovered phage of Mycobacterium smegmatis mc(2)155. Momo has a double-stranded DNA genome 154,553 bp in length, with 233 predicted protein-encoding genes, 34 tRNA genes, and one transfer-messenger RNA (tmRNA) gene. Momo has a myoviral morphology and shares extensive nucleotide sequence similarity with subcluster C1 mycobacteriophages.

  1. Genome Sequence of Mycobacteriophage Momo

    PubMed Central

    Bina, Elizabeth A.; Brahme, Indraneel S.; Hill, Amy B.; Himmelstein, Philip H.; Hunsicker, Sara M.; Ish, Amanda R.; Le, Tinh S.; Martin, Mary M.; Moscinski, Catherine N.; Shetty, Sameer A.; Swierzewski, Tomasz; Iyengar, Varun B.; Kim, Hannah; Schafer, Claire E.; Grubb, Sarah R.; Warner, Marcie H.; Bowman, Charles A.; Russell, Daniel A.; Hatfull, Graham F.

    2015-01-01

    Momo is a newly discovered phage of Mycobacterium smegmatis mc2155. Momo has a double-stranded DNA genome 154,553 bp in length, with 233 predicted protein-encoding genes, 34 tRNA genes, and one transfer-messenger RNA (tmRNA) gene. Momo has a myoviral morphology and shares extensive nucleotide sequence similarity with subcluster C1 mycobacteriophages. PMID:26089415

  2. Sequencing crop genomes: approaches and applications

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Plant genome sequencing methodology parrallels the sequencing of the human genome. The first projects were slow and very expensive. BAC by BAC approaches were utilized first and whole-genome shotgun sequencing rapidly replaced that approach. So called 'next generation' technologies such as short rea...

  3. Translating genomics in cancer care.

    PubMed

    Bombard, Yvonne; Bach, Peter B; Offit, Kenneth

    2013-11-01

    There is increasing enthusiasm for genomics and its promise in advancing personalized medicine. Genomic information has been used to personalize health care for decades, spanning the fields of cardiovascular disease, infectious disease, endocrinology, metabolic medicine, and hematology. However, oncology has often been the first test bed for the clinical translation of genomics for diagnostic, prognostic, and therapeutic applications. Notable hereditary cancer examples include testing for mutations in BRCA1 or BRCA2 in unaffected women to identify those at significantly elevated risk for developing breast and ovarian cancers, and screening patients with newly diagnosed colorectal cancer for mutations in 4 mismatch repair genes to reduce morbidity and mortality in their relatives. Somatic genomic testing is also increasingly used in oncology, with gene expression profiling of breast tumors and EGFR testing to predict treatment response representing commonly used examples. Health technology assessment provides a rigorous means to inform clinical and policy decision-making through systematic assessment of the evidentiary base, along with precepts of clinical effectiveness, cost-effectiveness, and consideration of risks and benefits for health care delivery and society. Although this evaluation is a fundamental step in the translation of any new therapeutic, procedure, or diagnostic test into clinical care, emerging developments may threaten this standard. These include "direct to consumer" genomic risk assessment services and the challenges posed by incidental results generated from next-generation sequencing (NGS) technologies. This article presents a review of the evidentiary standards and knowledge base supporting the translation of key cancer genomic technologies along the continuum of validity, utility, cost-effectiveness, health service impacts, and ethical and societal issues, and offers future research considerations to guide the responsible introduction of

  4. Translating genomics in cancer care.

    PubMed

    Bombard, Yvonne; Bach, Peter B; Offit, Kenneth

    2013-11-01

    There is increasing enthusiasm for genomics and its promise in advancing personalized medicine. Genomic information has been used to personalize health care for decades, spanning the fields of cardiovascular disease, infectious disease, endocrinology, metabolic medicine, and hematology. However, oncology has often been the first test bed for the clinical translation of genomics for diagnostic, prognostic, and therapeutic applications. Notable hereditary cancer examples include testing for mutations in BRCA1 or BRCA2 in unaffected women to identify those at significantly elevated risk for developing breast and ovarian cancers, and screening patients with newly diagnosed colorectal cancer for mutations in 4 mismatch repair genes to reduce morbidity and mortality in their relatives. Somatic genomic testing is also increasingly used in oncology, with gene expression profiling of breast tumors and EGFR testing to predict treatment response representing commonly used examples. Health technology assessment provides a rigorous means to inform clinical and policy decision-making through systematic assessment of the evidentiary base, along with precepts of clinical effectiveness, cost-effectiveness, and consideration of risks and benefits for health care delivery and society. Although this evaluation is a fundamental step in the translation of any new therapeutic, procedure, or diagnostic test into clinical care, emerging developments may threaten this standard. These include "direct to consumer" genomic risk assessment services and the challenges posed by incidental results generated from next-generation sequencing (NGS) technologies. This article presents a review of the evidentiary standards and knowledge base supporting the translation of key cancer genomic technologies along the continuum of validity, utility, cost-effectiveness, health service impacts, and ethical and societal issues, and offers future research considerations to guide the responsible introduction of

  5. Colon cancer-derived oncogenic EGFR G724S mutant identified by whole genome sequence analysis is dependent on asymmetric dimerization and sensitive to cetuximab

    PubMed Central

    2014-01-01

    Background Inhibition of the activated epidermal growth factor receptor (EGFR) with either enzymatic kinase inhibitors or anti-EGFR antibodies such as cetuximab, is an effective modality of treatment for multiple human cancers. Enzymatic EGFR inhibitors are effective for lung adenocarcinomas with somatic kinase domain EGFR mutations while, paradoxically, anti-EGFR antibodies are more effective in colon and head and neck cancers where EGFR mutations occur less frequently. In colorectal cancer, anti-EGFR antibodies are routinely used as second-line therapy of KRAS wild-type tumors. However, detailed mechanisms and genomic predictors for pharmacological response to these antibodies in colon cancer remain unclear. Findings We describe a case of colorectal adenocarcinoma, which was found to harbor a kinase domain mutation, G724S, in EGFR through whole genome sequencing. We show that G724S mutant EGFR is oncogenic and that it differs from classic lung cancer derived EGFR mutants in that it is cetuximab responsive in vitro, yet relatively insensitive to small molecule kinase inhibitors. Through biochemical and cellular pharmacologic studies, we have determined that cells harboring the colon cancer-derived G719S and G724S mutants are responsive to cetuximab therapy in vitro and found that the requirement for asymmetric dimerization of these mutant EGFR to promote cellular transformation may explain their greater inhibition by cetuximab than small-molecule kinase inhibitors. Conclusion The colon-cancer derived G719S and G724S mutants are oncogenic and sensitive in vitro to cetuximab. These data suggest that patients with these mutations may benefit from the use of anti-EGFR antibodies as part of the first-line therapy. PMID:24894453

  6. Collaborators | Office of Cancer Genomics

    Cancer.gov

    The TARGET initiative is jointly managed within the National Cancer Institute (NCI) by the Office of Cancer Genomics (OCG)Opens in a New Tab and the Cancer Therapy Evaluation Program (CTEP)Opens in a New Tab.

  7. The UCSC Cancer Genomics Browser: update 2013.

    PubMed

    Goldman, Mary; Craft, Brian; Swatloski, Teresa; Ellrott, Kyle; Cline, Melissa; Diekhans, Mark; Ma, Singer; Wilks, Chris; Stuart, Josh; Haussler, David; Zhu, Jingchun

    2013-01-01

    The UCSC Cancer Genomics Browser (https://genome-cancer.ucsc.edu/) is a set of web-based tools to display, investigate and analyse cancer genomics data and its associated clinical information. The browser provides whole-genome to base-pair level views of several different types of genomics data, including some next-generation sequencing platforms. The ability to view multiple datasets together allows users to make comparisons across different data and cancer types. Biological pathways, collections of genes, genomic or clinical information can be used to sort, aggregate and zoom into a group of samples. We currently display an expanding set of data from various sources, including 201 datasets from 22 TCGA (The Cancer Genome Atlas) cancers as well as data from Cancer Cell Line Encyclopedia and Stand Up To Cancer. New features include a completely redesigned user interface with an interactive tutorial and updated documentation. We have also added data downloads, additional clinical heatmap features, and an updated Tumor Image Browser based on Google Maps. New security features allow authenticated users access to private datasets hosted by several different consortia through the public website.

  8. The UCSC Cancer Genomics Browser: update 2013.

    PubMed

    Goldman, Mary; Craft, Brian; Swatloski, Teresa; Ellrott, Kyle; Cline, Melissa; Diekhans, Mark; Ma, Singer; Wilks, Chris; Stuart, Josh; Haussler, David; Zhu, Jingchun

    2013-01-01

    The UCSC Cancer Genomics Browser (https://genome-cancer.ucsc.edu/) is a set of web-based tools to display, investigate and analyse cancer genomics data and its associated clinical information. The browser provides whole-genome to base-pair level views of several different types of genomics data, including some next-generation sequencing platforms. The ability to view multiple datasets together allows users to make comparisons across different data and cancer types. Biological pathways, collections of genes, genomic or clinical information can be used to sort, aggregate and zoom into a group of samples. We currently display an expanding set of data from various sources, including 201 datasets from 22 TCGA (The Cancer Genome Atlas) cancers as well as data from Cancer Cell Line Encyclopedia and Stand Up To Cancer. New features include a completely redesigned user interface with an interactive tutorial and updated documentation. We have also added data downloads, additional clinical heatmap features, and an updated Tumor Image Browser based on Google Maps. New security features allow authenticated users access to private datasets hosted by several different consortia through the public website. PMID:23109555

  9. Identifying Gene Disruptions in Novel Balanced de novo Constitutional Translocations in Childhood Cancer Patients by Whole Genome Sequencing

    PubMed Central

    Ritter, Deborah I.; Haines, Katherine; Cheung, Hannah; Davis, Caleb F.; Lau, Ching C.; Berg, Jonathan S.; Brown, Chester W.; Thompson, Patrick A.; Gibbs, Richard; Wheeler, David A.; Plon, Sharon E.

    2014-01-01

    Purpose We applied whole genome sequencing to children diagnosed with neoplasms and found to carry apparently balanced constitutional translocations, to discover novel genic disruptions. Methods We applied SV calling programs CREST, Break Dancer, SV-STAT and CGAP-CNV, and developed an annotative filtering strategy to achieve nucleotide resolution at the translocations. Results We identified the breakpoints for t(6;12) (p21.1;q24.31) disrupting HNF1A in a patient diagnosed with hepatic adenomas and Maturity Onset Diabetes of the Young (MODY). Translocation as the disruptive event of HNF1A, a gene known to be involved in MODY3, has not been previously reported. In a subject with Hodgkin’s lymphoma and subsequent low-grade glioma, we identified t(5;18) (q35.1;q21.2), disrupting both SLIT3 and DCC, genes previously implicated in both glioma and lymphoma. Conclusions These examples suggest that implementing clinical whole genome sequencing in the diagnostic work-up of patients with novel but apparently balanced translocations may reveal unanticipated disruption of disease-associated genes and aid in prediction of the clinical phenotype. PMID:25569436

  10. Characterizing genomic alterations in cancer by complementary functional associations | Office of Cancer Genomics

    Cancer.gov

    Systematic efforts to sequence the cancer genome have identified large numbers of mutations and copy number alterations in human cancers. However, elucidating the functional consequences of these variants, and their interactions to drive or maintain oncogenic states, remains a challenge in cancer research. We developed REVEALER, a computational method that identifies combinations of mutually exclusive genomic alterations correlated with functional phenotypes, such as the activation or gene dependency of oncogenic pathways or sensitivity to a drug treatment.

  11. Twenty years of bacterial genome sequencing.

    PubMed

    Loman, Nicholas J; Pallen, Mark J

    2015-12-01

    Twenty years ago, the publication of the first bacterial genome sequence, from Haemophilus influenzae, shook the world of bacteriology. In this Timeline, we review the first two decades of bacterial genome sequencing, which have been marked by three revolutions: whole-genome shotgun sequencing, high-throughput sequencing and single-molecule long-read sequencing. We summarize the social history of sequencing and its impact on our understanding of the biology, diversity and evolution of bacteria, while also highlighting spin-offs and translational impact in the clinic. We look forward to a 'sequencing singularity', where sequencing becomes the method of choice for as-yet unthinkable applications in bacteriology and beyond.

  12. Translational genomics for plant breeding with the genome sequence explosion.

    PubMed

    Kang, Yang Jae; Lee, Taeyoung; Lee, Jayern; Shim, Sangrea; Jeong, Haneul; Satyawan, Dani; Kim, Moon Young; Lee, Suk-Ha

    2016-04-01

    The use of next-generation sequencers and advanced genotyping technologies has propelled the field of plant genomics in model crops and plants and enhanced the discovery of hidden bridges between genotypes and phenotypes. The newly generated reference sequences of unstudied minor plants can be annotated by the knowledge of model plants via translational genomics approaches. Here, we reviewed the strategies of translational genomics and suggested perspectives on the current databases of genomic resources and the database structures of translated information on the new genome. As a draft picture of phenotypic annotation, translational genomics on newly sequenced plants will provide valuable assistance for breeders and researchers who are interested in genetic studies.

  13. Whole genome sequencing reveals potential targets for therapy in patients with refractory KRAS mutated metastatic colorectal cancer

    PubMed Central

    2014-01-01

    Background The outcome of patients with metastatic colorectal carcinoma (mCRC) following first line therapy is poor, with median survival of less than one year. The purpose of this study was to identify candidate therapeutically targetable somatic events in mCRC patient samples by whole genome sequencing (WGS), so as to obtain targeted treatment strategies for individual patients. Methods Four patients were recruited, all of whom had received > 2 prior therapy regimens. Percutaneous needle biopsies of metastases were performed with whole blood collection for the extraction of constitutional DNA. One tumor was not included in this study as the quality of tumor tissue was not sufficient for further analysis. WGS was performed using Illumina paired end chemistry on HiSeq2000 sequencing systems, which yielded coverage of greater than 30X for all samples. NGS data were processed and analyzed to detect somatic genomic alterations including point mutations, indels, copy number alterations, translocations and rearrangements. Results All 3 tumor samples had KRAS mutations, while 2 tumors contained mutations in the APC gene and the PIK3CA gene. Although we did not identify a TCF7L2-VTI1A translocation, we did detect a TCF7L2 mutation in one tumor. Among the other interesting mutated genes was INPPL1, an important gene involved in PI3 kinase signaling. Functional studies demonstrated that inhibition of INPPL1 reduced growth of CRC cells, suggesting that INPPL1 may promote growth in CRC. Conclusions Our study further supports potential molecularly defined therapeutic contexts that might provide insights into treatment strategies for refractory mCRC. New insights into the role of INPPL1 in colon tumor cell growth have also been identified. Continued development of appropriate targeted agents towards specific events may be warranted to help improve outcomes in CRC. PMID:24943349

  14. Sequencing Intractable DNA to Close Microbial Genomes

    SciTech Connect

    Hurt, Jr., Richard Ashley; Brown, Steven D; Podar, Mircea; Palumbo, Anthony Vito; Elias, Dwayne A

    2012-01-01

    Advancement in high throughput DNA sequencing technologies has supported a rapid proliferation of microbial genome sequencing projects, providing the genetic blueprint for for in-depth studies. Oftentimes, difficult to sequence regions in microbial genomes are ruled intractable resulting in a growing number of genomes with sequence gaps deposited in databases. A procedure was developed to sequence such difficult regions in the non-contiguous finished Desulfovibrio desulfuricans ND132 genome (6 intractable gaps) and the Desulfovibrio africanus genome (1 intractable gap). The polynucleotides surrounding each gap formed GC rich secondary structures making the regions refractory to amplification and sequencing. Strand-displacing DNA polymerases used in concert with a novel ramped PCR extension cycle supported amplification and closure of all gap regions in both genomes. These developed procedures support accurate gene annotation, and provide a step-wise method that reduces the effort required for genome finishing.

  15. Fungal genome sequencing: basic biology to biotechnology.

    PubMed

    Sharma, Krishna Kant

    2016-08-01

    The genome sequences provide a first glimpse into the genomic basis of the biological diversity of filamentous fungi and yeast. The genome sequence of the budding yeast, Saccharomyces cerevisiae, with a small genome size, unicellular growth, and rich history of genetic and molecular analyses was a milestone of early genomics in the 1990s. The subsequent completion of fission yeast, Schizosaccharomyces pombe and genetic model, Neurospora crassa initiated a revolution in the genomics of the fungal kingdom. In due course of time, a substantial number of fungal genomes have been sequenced and publicly released, representing the widest sampling of genomes from any eukaryotic kingdom. An ambitious genome-sequencing program provides a wealth of data on metabolic diversity within the fungal kingdom, thereby enhancing research into medical science, agriculture science, ecology, bioremediation, bioenergy, and the biotechnology industry. Fungal genomics have higher potential to positively affect human health, environmental health, and the planet's stored energy. With a significant increase in sequenced fungal genomes, the known diversity of genes encoding organic acids, antibiotics, enzymes, and their pathways has increased exponentially. Currently, over a hundred fungal genome sequences are publicly available; however, no inclusive review has been published. This review is an initiative to address the significance of the fungal genome-sequencing program and provides the road map for basic and applied research.

  16. SP8 Sequencing Extinct Genomes

    PubMed Central

    Poinar, H.

    2007-01-01

    Nucleic acids, which hold clues to the evolution of various animal and hominid taxa, are comparatively weak molecules from other cellular debris, and thus evolutionary biologists are in essence time trapped. Fortunately, DNA and protein fragments do exist in fossil remains beyond what theoretical experimentation would suggest. Sequestering of DNA molecules in humic or Maillard-like complexes likely represents a rich source of DNA molecules from the past, which have yet to be tapped. These molecules were impossible to acquire due to the selective nature of the polymerase chain reaction. Recently, however, rapid parallel pyrosequencing techniques, such as those used in metagenomics-based research, which, in theory, allow for the identification of all short nucleotide sequences in a sample in a non-selective approach, have the potential to allow the identification of all nucleic acids in a sample, and thus represent the way forward for ancient DNA. In theory, this new technology will allow the completion of genomes of extinct animals, plants, and microbes. I will discuss the benefits and pitfalls of this metagenomics approach to ancient DNA, highlighting our recent efforts underway to sequence the wooly mammoth genome as well as other fossil remains.

  17. Genome-wide analysis of aberrant methylation in human breast cancer cells using methyl-DNA immunoprecipitation combined with high-throughput sequencing

    PubMed Central

    2010-01-01

    Background Cancer cells undergo massive alterations to their DNA methylation patterns that result in aberrant gene expression and malignant phenotypes. However, the mechanisms that underlie methylome changes are not well understood nor is the genomic distribution of DNA methylation changes well characterized. Results Here, we performed methylated DNA immunoprecipitation combined with high-throughput sequencing (MeDIP-seq) to obtain whole-genome DNA methylation profiles for eight human breast cancer cell (BCC) lines and for normal human mammary epithelial cells (HMEC). The MeDIP-seq analysis generated non-biased DNA methylation maps by covering almost the entire genome with sufficient depth and resolution. The most prominent feature of the BCC lines compared to HMEC was a massively reduced methylation level particularly in CpG-poor regions. While hypomethylation did not appear to be associated with particular genomic features, hypermethylation preferentially occurred at CpG-rich gene-related regions independently of the distance from transcription start sites. We also investigated methylome alterations during epithelial-to-mesenchymal transition (EMT) in MCF7 cells. EMT induction was associated with specific alterations to the methylation patterns of gene-related CpG-rich regions, although overall methylation levels were not significantly altered. Moreover, approximately 40% of the epithelial cell-specific methylation patterns in gene-related regions were altered to those typical of mesenchymal cells, suggesting a cell-type specific regulation of DNA methylation. Conclusions This study provides the most comprehensive analysis to date of the methylome of human mammary cell lines and has produced novel insights into the mechanisms of methylome alteration during tumorigenesis and the interdependence between DNA methylome alterations and morphological changes. PMID:20181289

  18. Programs | Office of Cancer Genomics

    Cancer.gov

    OCG facilitates cancer genomics research through a series of highly-focused programs. These programs generate and disseminate genomic data for use by the cancer research community. OCG programs also promote advances in technology-based infrastructure and create valuable experimental reagents and tools. OCG programs encourage collaboration by interconnecting with other genomics and cancer projects in order to accelerate translation of findings into the clinic. Below are OCG’s current, completed, and initiated programs:

  19. Value of a newly sequenced bacterial genome

    PubMed Central

    Barbosa, Eudes GV; Aburjaile, Flavia F; Ramos, Rommel TJ; Carneiro, Adriana R; Le Loir, Yves; Baumbach, Jan; Miyoshi, Anderson; Silva, Artur; Azevedo, Vasco

    2014-01-01

    Next-generation sequencing (NGS) technologies have made high-throughput sequencing available to medium- and small-size laboratories, culminating in a tidal wave of genomic information. The quantity of sequenced bacterial genomes has not only brought excitement to the field of genomics but also heightened expectations that NGS would boost antibacterial discovery and vaccine development. Although many possible drug and vaccine targets have been discovered, the success rate of genome-based analysis has remained below expectations. Furthermore, NGS has had consequences for genome quality, resulting in an exponential increase in draft (partial data) genome deposits in public databases. If no further interests are expressed for a particular bacterial genome, it is more likely that the sequencing of its genome will be limited to a draft stage, and the painstaking tasks of completing the sequencing of its genome and annotation will not be undertaken. It is important to know what is lost when we settle for a draft genome and to determine the “scientific value” of a newly sequenced genome. This review addresses the expected impact of newly sequenced genomes on antibacterial discovery and vaccinology. Also, it discusses the factors that could be leading to the increase in the number of draft deposits and the consequent loss of relevant biological information. PMID:24921006

  20. Accurate and comprehensive sequencing of personal genomes.

    PubMed

    Ajay, Subramanian S; Parker, Stephen C J; Abaan, Hatice Ozel; Fajardo, Karin V Fuentes; Margulies, Elliott H

    2011-09-01

    As whole-genome sequencing becomes commoditized and we begin to sequence and analyze personal genomes for clinical and diagnostic purposes, it is necessary to understand what constitutes a complete sequencing experiment for determining genotypes and detecting single-nucleotide variants. Here, we show that the current recommendation of ∼30× coverage is not adequate to produce genotype calls across a large fraction of the genome with acceptably low error rates. Our results are based on analyses of a clinical sample sequenced on two related Illumina platforms, GAII(x) and HiSeq 2000, to a very high depth (126×). We used these data to establish genotype-calling filters that dramatically increase accuracy. We also empirically determined how the callable portion of the genome varies as a function of the amount of sequence data used. These results help provide a "sequencing guide" for future whole-genome sequencing decisions and metrics by which coverage statistics should be reported.

  1. The fungal genome initiative and lessons learned from genome sequencing.

    PubMed

    Cuomo, Christina A; Birren, Bruce W

    2010-01-01

    The sequence of Saccharomyces cerevisiae enabled systematic genome-wide experimental approaches, demonstrating the power of having the complete genome of an organism. The rapid impact of these methods on research in yeast mobilized an effort to expand genomic resources for other fungi. The "fungal genome initiative" represents an organized genome sequencing effort to promote comparative and evolutionary studies across the fungal kingdom. Through such an approach, scientists can not only better understand specific organisms but also illuminate the shared and unique aspects of fungal biology that underlie the importance of fungi in biomedical research, health, food production, and industry. To date, assembled genomes for over 100 fungi are available in public databases, and many more sequencing projects are underway. Here, we discuss both examples of findings from comparative analysis of fungal sequences, with a specific emphasis on yeast genomes, and on the analytical approaches taken to mine fungal genomes. New sequencing methods are accelerating comparative studies of fungi by reducing the cost and difficulty of sequencing. This has driven more common use of sequencing applications, such as to study genome-wide variation in populations or to deeply profile RNA transcripts. These and further technological innovations will continue to be piloted in yeasts and other fungi, and will expand the applications of sequencing to study fungal biology. PMID:20946837

  2. Marsupial Genome Sequences: Providing Insight into Evolution and Disease

    PubMed Central

    Deakin, Janine E.

    2012-01-01

    Marsupials (metatherians), with their position in vertebrate phylogeny and their unique biological features, have been studied for many years by a dedicated group of researchers, but it has only been since the sequencing of the first marsupial genome that their value has been more widely recognised. We now have genome sequences for three distantly related marsupial species (the grey short-tailed opossum, the tammar wallaby, and Tasmanian devil), with the promise of many more genomes to be sequenced in the near future, making this a particularly exciting time in marsupial genomics. The emergence of a transmissible cancer, which is obliterating the Tasmanian devil population, has increased the importance of obtaining and analysing marsupial genome sequence for understanding such diseases as well as for conservation efforts. In addition, these genome sequences have facilitated studies aimed at answering questions regarding gene and genome evolution and provided insight into the evolution of epigenetic mechanisms. Here I highlight the major advances in our understanding of evolution and disease, facilitated by marsupial genome projects, and speculate on the future contributions to be made by such sequences. PMID:24278712

  3. Dr. Marco Marra: Pioneer and Visionary in Cancer Genomics Research | Office of Cancer Genomics

    Cancer.gov

    Dr. Marco Marra is a highly distinguished genomics and bioinformatics researcher. He is the Director of Canada’s Michael Smith Genome Sciences Centre at the BC Cancer Agency and holds a faculty position at the University of British Columbia. The Centre is a state-of-the-art sequencing facility in Vancouver, Canada, with a major focus on the study of cancers.  Many of their research projects are undertaken in collaborations with other Canadian and international institutions.

  4. The Genome Sequencing Center at NCGR

    SciTech Connect

    Schilkey, Faye

    2010-06-02

    Faye Schilkey from the National Center for Genome Resources discusses NCGR's research, sequencing and analysis experience on June 2, 2010 at the "Sequencing, Finishing, Analysis in the Future" meeting in Santa Fe, NM

  5. Atypical regions in large genomic DNA sequences

    SciTech Connect

    Scherer, S. |; McPeek, M.S.; Speed, T.P.

    1994-07-19

    Large genomic DNA sequences contain regions with distinctive patterns of sequence organization. The authors describe a method using logarithms of probabilities based on seventh-order Markov chains to rapidly identify genomic sequences that do not resemble models of genome organization built from compilations of octanucleotide usage. Data bases have been constructed from Escherichia coli and Saccharomyces cerevisiae DNA sequences of >1000 nt and human sequences of >10,000 nt. Atypical genes and clusters of genes have been located in bacteriophage, yeast, and primate DNA sequences. The authors consider criteria for statistical significance of the results, offer possible explanations for the observed variation in genome organization, and give additional applications of these methods in DNA sequence analysis.

  6. Complete Genome Sequence of Mycobacterium massiliense

    PubMed Central

    Raiol, Tainá; Ribeiro, Guilherme Menegói; Maranhão, Andréa Queiroz; Bocca, Anamélia Lorenzetti; Silva-Pereira, Ildinete; Junqueira-Kipnis, Ana Paula; Brigido, Marcelo de Macedo

    2012-01-01

    Mycobacterium massiliense is a rapidly growing bacterium associated with opportunistic infections. The genome of a representative isolate (strain GO 06) recovered from wound samples from patients who underwent arthroscopic or laparoscopic surgery was sequenced. To the best of our knowledge, this is the first announcement of the complete genome sequence of an M. massiliense strain. PMID:22965084

  7. Towards a reference pecan genome sequence

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The cost of generating DNA sequence data has declined dramatically over the previous 15 years as a result of the Human Genome Project and the potential applications of genome sequencing for human medicine. This cost reduction has generated renewed interest among crop breeding scientists in applying...

  8. Complete Genome Sequence of Gordonia terrae 3612

    PubMed Central

    Russell, Daniel A.; Guerrero Bustamante, Carlos A.; Garlena, Rebecca A.

    2016-01-01

    Here, we report the complete genome sequence of Gordonia terrae 3612, also known by the strain designations ATCC 25594, NRRL B-16283, and NBRC 100016. The genome sequence reveals it to be free of prophage and clustered regularly interspaced short palindromic repeats (CRISPRs), and it is an effective host for the isolation and characterization of Gordonia bacteriophages. PMID:27688316

  9. Complete Genome Sequence of Gordonia terrae 3612.

    PubMed

    Russell, Daniel A; Guerrero Bustamante, Carlos A; Garlena, Rebecca A; Hatfull, Graham F

    2016-01-01

    Here, we report the complete genome sequence of Gordonia terrae 3612, also known by the strain designations ATCC 25594, NRRL B-16283, and NBRC 100016. The genome sequence reveals it to be free of prophage and clustered regularly interspaced short palindromic repeats (CRISPRs), and it is an effective host for the isolation and characterization of Gordonia bacteriophages. PMID:27688316

  10. Genome Sequence of Gordonia Phage Yvonnetastic

    PubMed Central

    Bandyopadhyay, Anshika; Carlton, Meghan L.; Kane, Meghan T.; Panchal, Niyati J.; Pham, Yvonne C.; Reynolds, Zachary J.; Sapienza, Michael S.; German, Brian A.; McDonnell, Jill E.; Schafer, Claire E.; Yu, Victor J.; Furbee, Emily C.; Grubb, Sarah R.; Warner, Marcie H.; Montgomery, Matthew T.; Garlena, Rebecca A.; Russell, Daniel A.; Jacobs-Sera, Deborah; Hatfull, Graham F.

    2016-01-01

    Gordonia bacteriophage Yvonnetastic was isolated from soil in Pittsburgh, PA, using Gordonia terrae 3612 as a host. Yvonnetastic has siphoviral morphology and a genome of 98,136 bp, with 198 predicted protein-coding genes and five tRNA genes. Yvonnetastic does not share substantial sequence similarity with other sequenced bacteriophage genomes. PMID:27389265

  11. Genome Sequence of Gordonia Phage Yvonnetastic.

    PubMed

    Pope, Welkin H; Bandyopadhyay, Anshika; Carlton, Meghan L; Kane, Meghan T; Panchal, Niyati J; Pham, Yvonne C; Reynolds, Zachary J; Sapienza, Michael S; German, Brian A; McDonnell, Jill E; Schafer, Claire E; Yu, Victor J; Furbee, Emily C; Grubb, Sarah R; Warner, Marcie H; Montgomery, Matthew T; Garlena, Rebecca A; Russell, Daniel A; Jacobs-Sera, Deborah; Hatfull, Graham F

    2016-01-01

    Gordonia bacteriophage Yvonnetastic was isolated from soil in Pittsburgh, PA, using Gordonia terrae 3612 as a host. Yvonnetastic has siphoviral morphology and a genome of 98,136 bp, with 198 predicted protein-coding genes and five tRNA genes. Yvonnetastic does not share substantial sequence similarity with other sequenced bacteriophage genomes. PMID:27389265

  12. Contact | Office of Cancer Genomics

    Cancer.gov

    For more information about the Office of Cancer Genomics, please contact: Office of Cancer Genomics National Cancer Institute 31 Center Drive, 10A07 Bethesda, Maryland 20892-2580 Phone: (301) 451-8027 Fax: (301) 480-4368 Email: ocg@mail.nih.gov *Please note that this site will not function properly in Internet Explorer unless you completely turn off the Compatibility View*

  13. Human genome sequencing in health and disease.

    PubMed

    Gonzaga-Jauregui, Claudia; Lupski, James R; Gibbs, Richard A

    2012-01-01

    Following the "finished," euchromatic, haploid human reference genome sequence, the rapid development of novel, faster, and cheaper sequencing technologies is making possible the era of personalized human genomics. Personal diploid human genome sequences have been generated, and each has contributed to our better understanding of variation in the human genome. We have consequently begun to appreciate the vastness of individual genetic variation from single nucleotide to structural variants. Translation of genome-scale variation into medically useful information is, however, in its infancy. This review summarizes the initial steps undertaken in clinical implementation of personal genome information, and describes the application of whole-genome and exome sequencing to identify the cause of genetic diseases and to suggest adjuvant therapies. Better analysis tools and a deeper understanding of the biology of our genome are necessary in order to decipher, interpret, and optimize clinical utility of what the variation in the human genome can teach us. Personal genome sequencing may eventually become an instrument of common medical practice, providing information that assists in the formulation of a differential diagnosis. We outline herein some of the remaining challenges.

  14. The genome sequence of parrot bornavirus 5.

    PubMed

    Guo, Jianhua; Tizard, Ian

    2015-12-01

    Although several new avian bornaviruses have recently been described, information on their evolution, virulence, and sequence are often limited. Here we report the complete genome sequence of parrot bornavirus 5 (PaBV-5) isolated from a case of proventricular dilatation disease in a Palm cockatoo (Probosciger aterrimus). The complete genome consists of 8842 nucleotides with distinct 5' and 3' end sequences. This virus shares nucleotide sequence identities of 69-74 % with other bornaviruses in the genomic regions excluding the 5' and 3' terminal sequences. Phylogenetic analysis based on the genomic regions demonstrated this new isolate is an isolated branch within the clade that includes the aquatic bird bornaviruses and the passerine bornaviruses. Based on phylogenetic analyses and its low nucleotide sequence identities with other bornavirus, we support the proposal that PaBV-5 be assigned to a new bornavirus species:- Psittaciform 2 bornavirus.

  15. The genome sequence of parrot bornavirus 5.

    PubMed

    Guo, Jianhua; Tizard, Ian

    2015-12-01

    Although several new avian bornaviruses have recently been described, information on their evolution, virulence, and sequence are often limited. Here we report the complete genome sequence of parrot bornavirus 5 (PaBV-5) isolated from a case of proventricular dilatation disease in a Palm cockatoo (Probosciger aterrimus). The complete genome consists of 8842 nucleotides with distinct 5' and 3' end sequences. This virus shares nucleotide sequence identities of 69-74 % with other bornaviruses in the genomic regions excluding the 5' and 3' terminal sequences. Phylogenetic analysis based on the genomic regions demonstrated this new isolate is an isolated branch within the clade that includes the aquatic bird bornaviruses and the passerine bornaviruses. Based on phylogenetic analyses and its low nucleotide sequence identities with other bornavirus, we support the proposal that PaBV-5 be assigned to a new bornavirus species:- Psittaciform 2 bornavirus. PMID:26403158

  16. Genome Sequencing and Analysis Conference IV

    SciTech Connect

    Not Available

    1993-12-31

    J. Craig Venter and C. Thomas Caskey co-chaired Genome Sequencing and Analysis Conference IV held at Hilton Head, South Carolina from September 26--30, 1992. Venter opened the conference by noting that approximately 400 researchers from 16 nations were present four times as many participants as at Genome Sequencing Conference I in 1989. Venter also introduced the Data Fair, a new component of the conference allowing exchange and on-site computer analysis of unpublished sequence data.

  17. Expanding the computational toolbox for mining cancer genomes

    PubMed Central

    Ding, Li; Wendl, Michael C.; McMichael, Joshua F.; Raphael, Benjamin J.

    2014-01-01

    High-throughput DNA sequencing has revolutionized cancer genomics with numerous discoveries relevant to cancer diagnosis and treatment. The latest sequencing and analysis methods have successfully identified somatic alterations including single nucleotide variants (SNVs), insertions and deletions (indels), structural aberrations, and gene fusions. Additional computational techniques have proved useful to define those mutations, genes, and molecular networks that drive diverse cancer phenotypes as well as determine clonal architectures in tumour samples. Collectively, these tools have advanced the study of genomic, transcriptomic, epigenomic alterations and their association to clinical properties. Here, we review cancer genomics software and the insights that have been gained from their application. PMID:25001846

  18. Genomic sequencing of Pleistocene cave bears

    SciTech Connect

    Noonan, James P.; Hofreiter, Michael; Smith, Doug; Priest, JamesR.; Rohland, Nadin; Rabeder, Gernot; Krause, Johannes; Detter, J. Chris; Paabo, Svante; Rubin, Edward M.

    2005-04-01

    Despite the information content of genomic DNA, ancient DNA studies to date have largely been limited to amplification of mitochondrial DNA due to technical hurdles such as contamination and degradation of ancient DNAs. In this study, we describe two metagenomic libraries constructed using unamplified DNA extracted from the bones of two 40,000-year-old extinct cave bears. Analysis of {approx}1 Mb of sequence from each library showed that, despite significant microbial contamination, 5.8 percent and 1.1 percent of clones in the libraries contain cave bear inserts, yielding 26,861 bp of cave bear genome sequence. Alignment of this sequence to the dog genome, the closest sequenced genome to cave bear in terms of evolutionary distance, revealed roughly the expected ratio of cave bear exons, repeats and conserved noncoding sequences. Only 0.04 percent of all clones sequenced were derived from contamination with modern human DNA. Comparison of cave bear with orthologous sequences from several modern bear species revealed the evolutionary relationship of these lineages. Using the metagenomic approach described here, we have recovered substantial quantities of mammalian genomic sequence more than twice as old as any previously reported, establishing the feasibility of ancient DNA genomic sequencing programs.

  19. Human genetics and genomics a decade after the release of the draft sequence of the human genome.

    PubMed

    Naidoo, Nasheen; Pawitan, Yudi; Soong, Richie; Cooper, David N; Ku, Chee-Seng

    2011-10-01

    Substantial progress has been made in human genetics and genomics research over the past ten years since the publication of the draft sequence of the human genome in 2001. Findings emanating directly from the Human Genome Project, together with those from follow-on studies, have had an enormous impact on our understanding of the architecture and function of the human genome. Major developments have been made in cataloguing genetic variation, the International HapMap Project, and with respect to advances in genotyping technologies. These developments are vital for the emergence of genome-wide association studies in the investigation of complex diseases and traits. In parallel, the advent of high-throughput sequencing technologies has ushered in the 'personal genome sequencing' era for both normal and cancer genomes, and made possible large-scale genome sequencing studies such as the 1000 Genomes Project and the International Cancer Genome Consortium. The high-throughput sequencing and sequence-capture technologies are also providing new opportunities to study Mendelian disorders through exome sequencing and whole-genome sequencing. This paper reviews these major developments in human genetics and genomics over the past decade.

  20. The genome sequence of Drosophila melanogaster.

    SciTech Connect

    2000-03-24

    The fly Drosophila melanogaster is one of the most intensively studied organisms in biology and serves as a model system for the investigation of many developmental and cellular processes common to higher eukaryotes, including humans. We have determined the nucleotide sequence of nearly all of the {approximately}120-megabase euchromatic portion of the Drosophila genome using a whole-genome shotgun sequencing strategy supported by extensive clone-based sequence and a high-quality bacterial artificial chromosome physical map. Efforts are under way to close the remaining gaps; however, the sequence is of sufficient accuracy and contiguity to be declared substantially complete and to support an initial analysis of genome structure and preliminary gene annotation and interpretation. The genome encodes {approximately}13,600 genes, somewhat fewer than the smaller Caenorhabditis elegans genome, but with comparable functional diversity.

  1. Sequencing the genome of the Atlantic salmon (Salmo salar)

    PubMed Central

    2010-01-01

    The International Collaboration to Sequence the Atlantic Salmon Genome (ICSASG) will produce a genome sequence that identifies and physically maps all genes in the Atlantic salmon genome and acts as a reference sequence for other salmonids. PMID:20887641

  2. Genomic Resources for Cancer Epidemiology

    Cancer.gov

    This page provides links to research resources, complied by the Epidemiology and Genomics Research Program, that may be of interest to genetic epidemiologists conducting cancer research, but is not exhaustive.

  3. Cancer Genome Anatomy Project | Office of Cancer Genomics

    Cancer.gov

    The National Cancer Institute (NCI) Cancer Genome Anatomy Project (CGAP) is an online resource designed to provide the research community access to biological tissue characterization data. Request a free copy of the CGAP Website Virtual Tour CD from ocg@mail.nih.gov.

  4. Sequencing and Analysis of Neanderthal Genomic DNA

    PubMed Central

    Noonan, James P.; Coop, Graham; Kudaravalli, Sridhar; Smith, Doug; Krause, Johannes; Alessi, Joe; Chen, Feng; Platt, Darren; Pääbo, Svante; Pritchard, Jonathan K.; Rubin, Edward M.

    2008-01-01

    Our knowledge of Neanderthals is based on a limited number of remains and artifacts from which we must make inferences about their biology, behavior, and relationship to ourselves. Here, we describe the characterization of these extinct hominids from a new perspective, based on the development of a Neanderthal metagenomic library and its high-throughput sequencing and analysis. Several lines of evidence indicate that the 65,250 base pairs of hominid sequence so far identified in the library are of Neanderthal origin, the strongest being the ascertainment of sequence identities between Neanderthal and chimpanzee at sites where the human genomic sequence is different. These results enabled us to calculate the human-Neanderthal divergence time based on multiple randomly distributed autosomal loci. Our analyses suggest that on average the Neanderthal genomic sequence we obtained and the reference human genome sequence share a most recent common ancestor ~706,000 years ago, and that the human and Neanderthal ancestral populations split ~370,000 years ago, before the emergence of anatomically modern humans. Our finding that the Neanderthal and human genomes are at least 99.5% identical led us to develop and successfully implement a targeted method for recovering specific ancient DNA sequences from metagenomic libraries. This initial analysis of the Neanderthal genome advances our understanding of the evolutionary relationship of Homo sapiens and Homo neanderthalensis and signifies the dawn of Neanderthal genomics. PMID:17110569

  5. Microbial species delineation using whole genome sequences

    SciTech Connect

    Kyrpides, Nikos; Mukherjee, Supratim; Ivanova, Natalia; Mavrommatics, Kostas; Pati, Amrita; Konstantinidis, Konstantinos

    2014-10-20

    Species assignments in prokaryotes use a manual, poly-phasic approach utilizing both phenotypic traits and sequence information of phylogenetic marker genes. With thousands of genomes being sequenced every year, an automated, uniform and scalable approach exploiting the rich genomic information in whole genome sequences is desired, at least for the initial assignment of species to an organism. We have evaluated pairwise genome-wide Average Nucleotide Identity (gANI) values and alignment fractions (AFs) for nearly 13,000 genomes using our fast implementation of the computation, identifying robust and widely applicable hard cut-offs for species assignments based on AF and gANI. Using these cutoffs, we generated stable species-level clusters of organisms, which enabled the identification of several species mis-assignments and facilitated the assignment of species for organisms without species definitions.

  6. Genome sequence of Coxiella burnetii strain Namibia

    PubMed Central

    2014-01-01

    We present the whole genome sequence and annotation of the Coxiella burnetii strain Namibia. This strain was isolated from an aborting goat in 1991 in Windhoek, Namibia. The plasmid type QpRS was confirmed in our work. Further genomic typing placed the strain into a unique genomic group. The genome sequence is 2,101,438 bp long and contains 1,979 protein-coding and 51 RNA genes, including one rRNA operon. To overcome the poor yield from cell culture systems, an additional DNA enrichment with whole genome amplification (WGA) methods was applied. We describe a bioinformatics pipeline for improved genome assembly including several filters with a special focus on WGA characteristics. PMID:25593636

  7. Genome Sequence of Serratia plymuthica V4

    PubMed Central

    Cleto, S.; Van der Auwera, G.; Almeida, C.; Vieira, M. J.; Vlamakis, H.

    2014-01-01

    Serratia spp. are gammaproteobacteria and members of the family Enterobacteriaceae. Here, we announce the genome sequence of Serratia plymuthica strain V4, which produces the siderophore serratiochelin and antimicrobial compounds. PMID:24831138

  8. Sequencing and comparing whole mitochondrial genomes ofanimals

    SciTech Connect

    Boore, Jeffrey L.; Macey, J. Robert; Medina, Monica

    2005-04-22

    Comparing complete animal mitochondrial genome sequences is becoming increasingly common for phylogenetic reconstruction and as a model for genome evolution. Not only are they much more informative than shorter sequences of individual genes for inferring evolutionary relatedness, but these data also provide sets of genome-level characters, such as the relative arrangements of genes, that can be especially powerful. We describe here the protocols commonly used for physically isolating mtDNA, for amplifying these by PCR or RCA, for cloning,sequencing, assembly, validation, and gene annotation, and for comparing both sequences and gene arrangements. On several topics, we offer general observations based on our experiences to date with determining and comparing complete mtDNA sequences.

  9. From sequence mapping to genome assemblies.

    PubMed

    Otto, Thomas D

    2015-01-01

    The development of "next-generation" high-throughput sequencing technologies has made it possible for many labs to undertake sequencing-based research projects that were unthinkable just a few years ago. Although the scientific applications are diverse, e.g., new genome projects, gene expression analysis, genome-wide functional screens, or epigenetics-the sequence data are usually processed in one of two ways: sequence reads are either mapped to an existing reference sequence, or they are built into a new sequence ("de novo assembly"). In this chapter, we first discuss some limitations of the mapping process and how these may be overcome through local sequence assembly. We then introduce the concept of de novo assembly and describe essential assembly improvement procedures such as scaffolding, contig ordering, gap closure, error evaluation, gene annotation transfer and ab initio gene annotation. The results are high-quality draft assemblies that will facilitate informative downstream analyses.

  10. Complete genome sequence of arracacha mottle virus.

    PubMed

    Orílio, Anelise F; Lucinda, Natalia; Dusi, André N; Nagata, Tatsuya; Inoue-Nagata, Alice K

    2013-01-01

    Arracacha mottle virus (AMoV) is the only potyvirus reported to infect arracacha (Arracacia xanthorrhiza) in Brazil. Here, the complete genome sequence of an isolate of AMoV was determined to be 9,630 nucleotides in length, excluding the 3' poly-A tail, and encoding a polyprotein of 3,135 amino acids and a putative P3N-PIPO protein. Its genomic organization is typical of a member of the genus Potyvirus, containing all conserved motifs. Its full genome sequence shared 56.2 % nucleotide identity with sunflower chlorotic mottle virus and verbena virus Y, the most closely related viruses.

  11. Precision medicine in breast cancer: genes, genomes, and the future of genomically driven treatments.

    PubMed

    Stover, Daniel G; Wagle, Nikhil

    2015-04-01

    Remarkable progress in sequencing technology over the past 20 years has made it possible to comprehensively profile tumors and identify clinically relevant genomic alterations. In breast cancer, the most common malignancy affecting women, we are now increasingly able to use this technology to help specify the use of therapies that target key molecular and genetic dependencies. Large sequencing studies have confirmed the role of well-known cancer-related genes and have also revealed numerous other genes that are recurrently mutated in breast cancer. This growing understanding of patient-to-patient variability at the genomic level in breast cancer is advancing our ability to direct the appropriate treatment to the appropriate patient at the appropriate time--a hallmark of "precision cancer medicine." This review focuses on the technological advances that have catalyzed these developments, the landscape of mutations in breast cancer, the clinical impact of genomic profiling, and the incorporation of genomic information into clinical care and clinical trials.

  12. Sequence resources at the Candida Genome Database.

    PubMed

    Arnaud, Martha B; Costanzo, Maria C; Skrzypek, Marek S; Shah, Prachi; Binkley, Gail; Lane, Christopher; Miyasato, Stuart R; Sherlock, Gavin

    2007-01-01

    The Candida Genome Database (CGD, http://www.candidagenome.org/) contains a curated collection of genomic information and community resources for researchers who are interested in the molecular biology of the opportunistic pathogen Candida albicans. With the recent release of a new assembly of the C.albicans genome, Assembly 20, C.albicans genomics has entered a new era. Although the C.albicans genome assembly continues to undergo refinement, multiple assemblies and gene nomenclatures will remain in widespread use by the research community. CGD has now taken on the responsibility of maintaining the most up-to-date version of the genome sequence by providing the data from this new assembly alongside the data from the previous assemblies, as well as any future corrections and refinements. In this database update, we describe the sequence information available for C.albicans, the sequence information contained in CGD, and the tools for sequence retrieval, analysis and comparison that CGD provides. CGD is freely accessible at http://www.candidagenome.org/ and CGD curators may be contacted by email at candida-curator@genome.stanford.edu.

  13. Whole-genome sequencing for comparative genomics and de novo genome assembly.

    PubMed

    Benjak, Andrej; Sala, Claudia; Hartkoorn, Ruben C

    2015-01-01

    Next-generation sequencing technologies for whole-genome sequencing of mycobacteria are rapidly becoming an attractive alternative to more traditional sequencing methods. In particular this technology is proving useful for genome-wide identification of mutations in mycobacteria (comparative genomics) as well as for de novo assembly of whole genomes. Next-generation sequencing however generates a vast quantity of data that can only be transformed into a usable and comprehensible form using bioinformatics. Here we describe the methodology one would use to prepare libraries for whole-genome sequencing, and the basic bioinformatics to identify mutations in a genome following Illumina HiSeq or MiSeq sequencing, as well as de novo genome assembly following sequencing using Pacific Biosciences (PacBio).

  14. Draft Genome Sequence of Tombunodavirus UC1

    PubMed Central

    DeRisi, Joseph L.

    2015-01-01

    We report here the draft genome sequence of tombunodavirus UC1 assembled from metagenomic sequencing of organisms in San Francisco wastewater. This virus shares hallmarks of members of the Tombusviridae and the nodavirus-like Plasmopara halstedii and Sclerophthora macrospora viruses. PMID:26139709

  15. Draft Genome Sequence of Tombunodavirus UC1.

    PubMed

    Greninger, Alexander L; DeRisi, Joseph L

    2015-01-01

    We report here the draft genome sequence of tombunodavirus UC1 assembled from metagenomic sequencing of organisms in San Francisco wastewater. This virus shares hallmarks of members of the Tombusviridae and the nodavirus-like Plasmopara halstedii and Sclerophthora macrospora viruses. PMID:26139709

  16. Draft Genome Sequence of Goose Dicistrovirus

    PubMed Central

    Jerome, Keith R.

    2016-01-01

    We report the draft genome sequence of goose dicistrovirus assembled from the filtered feces of a Canadian goose from South Lake Union in Seattle, Washington. The 9.1-kb dicistronic RNA virus falls within the family Dicistroviridae; however, it shares <33% translated amino acid sequence within the nonstructural open reading frame (ORF) from aparavirus or cripavirus. PMID:26941149

  17. Complete Genome Sequences of 63 Mycobacteriophages

    PubMed Central

    2013-01-01

    Mycobacteriophages are viruses that infect mycobacterial hosts. The current collection of sequenced mycobacteriophages—all isolated on a single host strain, Mycobacterium smegmatis mc2155, reveals substantial genetic diversity. The complete genome sequences of 63 newly isolated mycobacteriophages expand the resolution of our understanding of phage diversity. PMID:24285655

  18. Global Alignment System for Large Genomic Sequencing

    2002-03-01

    AVID is a global alignment system tailored for the alignment of large genomic sequences up to megabases in length. Features include the possibility of one sequence being in draft form, fast alignment, robustness and accuracy. The method is an anchor based alignment using maximal matches derived from suffix trees.

  19. Genome Sequence of Pseudomonas chlororaphis Strain 189

    PubMed Central

    Town, Jennifer; Audy, Patrice; Boyetchko, Susan M.

    2016-01-01

    Pseudomonas chlororaphis strain 189 is a potent inhibitor of the growth of the potato pathogen Phytophthora infestans. We determined the complete, finished sequence of the 6.8-Mbp genome of this strain, consisting of a single contiguous molecule. Strain 189 is closely related to previously sequenced strains of P. chlororaphis. PMID:27340063

  20. Sequencing error correction without a reference genome

    PubMed Central

    2013-01-01

    Background Next (second) generation sequencing is an increasingly important tool for many areas of molecular biology, however, care must be taken when interpreting its output. Even a low error rate can cause a large number of errors due to the high number of nucleotides being sequenced. Identifying sequencing errors from true biological variants is a challenging task. For organisms without a reference genome this difficulty is even more challenging. Results We have developed a method for the correction of sequencing errors in data from the Illumina Solexa sequencing platforms. It does not require a reference genome and is of relevance for microRNA studies, unsequenced genomes, variant detection in ultra-deep sequencing and even for RNA-Seq studies of organisms with sequenced genomes where RNA editing is being considered. Conclusions The derived error model is novel in that it allows different error probabilities for each position along the read, in conjunction with different error rates depending on the particular nucleotides involved in the substitution, and does not force these effects to behave in a multiplicative manner. The model provides error rates which capture the complex effects and interactions of the three main known causes of sequencing error associated with the Illumina platforms. PMID:24350580

  1. Computational Genomics: From Genome Sequence To Global Gene Regulation

    NASA Astrophysics Data System (ADS)

    Li, Hao

    2000-03-01

    As various genome projects are shifting to the post-sequencing phase, it becomes a big challenge to analyze the sequence data and extract biological information using computational tools. In the past, computational genomics has mainly focused on finding new genes and mapping out their biological functions. With the rapid accumulation of experimental data on genome-wide gene activities, it is now possible to understand how genes are regulated on a genomic scale. A major mechanism for gene regulation is to control the level of transcription, which is achieved by regulatory proteins that bind to short DNA sequences - the regulatory elements. We have developed a new approach to identifying regulatory elements in genomes. The approach formalizes how one would proceed to decipher a ``text'' consisting of a long string of letters written in an unknown language that did not delineate words. The algorithm is based on a statistical mechanics model in which the sequence is segmented probabilistically into ``words'' and a ``dictionary'' of ``words'' is built concurrently. For the control regions in the yeast genome, we built a ``dictionary'' of about one thousand words which includes many known as well as putative regulatory elements. I will discuss how we can use this dictionary to search for genes that are likely to be regulated in a similar fashion and to analyze gene expression data generated from DNA micro-array experiments.

  2. Genome Sequence of the Palaeopolyploid soybean

    SciTech Connect

    Schmutz, Jeremy; Cannon, Steven B.; Schlueter, Jessica; Ma, Jianxin; Mitros, Therese; Nelson, William; Hyten, David L.; Song, Qijian; Thelen, Jay J.; Cheng, Jianlin; Xu, Dong; Hellsten, Uffe; May, Gregory D.; Yu, Yeisoo; Sakura, Tetsuya; Umezawa, Taishi; Bhattacharyya, Madan K.; Sandhu, Devinder; Valliyodan, Babu; Lindquist, Erika; Peto, Myron; Grant, David; Shu, Shengqiang; Goodstein, David; Barry, Kerrie; Futrell-Griggs, Montona; Abernathy, Brian; Du, Jianchang; Tian, Zhixi; Zhu, Liucun; Gill, Navdeep; Joshi, Trupti; Libault, Marc; Sethuraman, Anand; Zhang, Xue-Cheng; Shinozaki, Kazuo; Nguyen, Henry T.; Wing, Rod A.; Cregan, Perry; Specht, James; Grimwood, Jane; Rokhsar, Dan; Stacey, Gary; Shoemaker, Randy C.; Jackson, Scott A.

    2009-08-03

    Soybean (Glycine max) is one of the most important crop plants for seed protein and oil content, and for its capacity to fix atmospheric nitrogen through symbioses with soil-borne microorganisms. We sequenced the 1.1-gigabase genome by a whole-genome shotgun approach and integrated it with physical and high-density genetic maps to create a chromosome-scale draft sequence assembly. We predict 46,430 protein-coding genes, 70percent more than Arabidopsis and similar to the poplar genome which, like soybean, is an ancient polyploid (palaeopolyploid). About 78percent of the predicted genes occur in chromosome ends, which comprise less than one-half of the genome but account for nearly all of the genetic recombination. Genome duplications occurred at approximately 59 and 13 million years ago, resulting in a highly duplicated genome with nearly 75percent of the genes present in multiple copies. The two duplication events were followed by gene diversification and loss, and numerous chromosome rearrangements. An accurate soybean genome sequence will facilitate the identification of the genetic basis of many soybean traits, and accelerate the creation of improved soybean varieties.

  3. Chapter 14: Cancer Genome Analysis

    PubMed Central

    Vazquez, Miguel; de la Torre, Victor; Valencia, Alfonso

    2012-01-01

    Although there is great promise in the benefits to be obtained by analyzing cancer genomes, numerous challenges hinder different stages of the process, from the problem of sample preparation and the validation of the experimental techniques, to the interpretation of the results. This chapter specifically focuses on the technical issues associated with the bioinformatics analysis of cancer genome data. The main issues addressed are the use of database and software resources, the use of analysis workflows and the presentation of clinically relevant action items. We attempt to aid new developers in the field by describing the different stages of analysis and discussing current approaches, as well as by providing practical advice on how to access and use resources, and how to implement recommendations. Real cases from cancer genome projects are used as examples. PMID:23300415

  4. Final progress report, Construction of a genome-wide highly characterized clone resource for genome sequencing

    SciTech Connect

    Nierman, William C.

    2000-02-14

    At TIGR, the human Bacterial Artificial Chromosome (BAC) end sequencing and trimming were with an overall sequencing success rate of 65%. CalTech human BAC libraries A, B, C and D as well as Roswell Park Cancer Institute's library RPCI-11 were used. To date, we have generated >300,000 end sequences from >186,000 human BAC clones with an average read length {approx}460 bp for a total of 141 Mb covering {approx}4.7% of the genome. Over sixty percent of the clones have BAC end sequences (BESs) from both ends representing over five-fold coverage of the genome by the paired-end clones. The average phred Q20 length is {approx}400 bp. This high accuracy makes our BESs match the human finished sequences with an average identity of 99% and a match length of 450 bp, and a frequency of one match per 12.8 kb contig sequence. Our sample tracking has ensured a clone tracking accuracy of >90%, which gives researchers a high confidence in (1) retrieving the right clone from the BA C libraries based on the sequence matches; and (2) building a minimum tiling path of sequence-ready clones across the genome and genome assembly scaffolds.

  5. NIH researchers complete whole-exome sequencing of skin cancer

    Cancer.gov

    A team led by researchers at NIH is the first to systematically survey the landscape of the melanoma genome, the DNA code of the deadliest form of skin cancer. The researchers have made surprising new discoveries using whole-exome sequencing, an approach that decodes the 1-2 percent of the genome that contains protein-coding genes.

  6. Genomic profiling of breast cancer.

    PubMed

    Pandey, Anjita; Singh, Alok Kumar; Maurya, Sanjeev Kumar; Rai, Rajani; Tewari, Mallika; Kumar, Mohan; Shukla, Hari S

    2009-05-01

    Genome study provides significant changes in the advancement of molecular diagnosis and treatment in Breast cancer. Several recent critical advances and high-throughput techniques identified the genomic trouble and dramatically accelerated the pace of research in preventing and curing this malignancy. Tumor-suppressor genes, proto-oncogenes, DNA-repair genes, carcinogen-metabolism genes are critically involved in progression of breast cancer. We reviewed imperative finding in breast genetics, ongoing work to segregate further susceptible genes, and preliminary studies on molecular profiling. PMID:19235775

  7. Genomic profiling of breast cancer.

    PubMed

    Pandey, Anjita; Singh, Alok Kumar; Maurya, Sanjeev Kumar; Rai, Rajani; Tewari, Mallika; Kumar, Mohan; Shukla, Hari S

    2009-05-01

    Genome study provides significant changes in the advancement of molecular diagnosis and treatment in Breast cancer. Several recent critical advances and high-throughput techniques identified the genomic trouble and dramatically accelerated the pace of research in preventing and curing this malignancy. Tumor-suppressor genes, proto-oncogenes, DNA-repair genes, carcinogen-metabolism genes are critically involved in progression of breast cancer. We reviewed imperative finding in breast genetics, ongoing work to segregate further susceptible genes, and preliminary studies on molecular profiling.

  8. Accelerating Genome Sequencing 100X with FPGAs

    SciTech Connect

    Storaasli, Olaf O; Strenski, Dave

    2007-01-01

    The performance of two Cray XD1 systems with Virtex-II Pro 50 and Virtex-4 LX160 FPGAs was evaluated using the FASTA computational biology program for human genome (DNA and protein) sequence comparisons. FPGA speedups of 50X (Virtex-II Pro 50) and 100X (Virtex-4 LX160) over a 2.2 GHz Opteron were obtained. FPGA coding issues for human genome data are described.

  9. Cancer Genome Anatomy Project (CGAP) | Office of Cancer Genomics

    Cancer.gov

    CGAP generated a wide range of genomics data on cancerous cells that are accessible through easy-to-use online tools. Researchers, educators, and students can find "in silico" answers to biological questions through the CGAP website. Request a free copy of the CGAP Website Virtual Tour CD from ocg@mail.nih.gov to learn how to navigate the website.

  10. Microbial species delineation using whole genome sequences

    PubMed Central

    Varghese, Neha J.; Mukherjee, Supratim; Ivanova, Natalia; Konstantinidis, Konstantinos T.; Mavrommatis, Kostas; Kyrpides, Nikos C.; Pati, Amrita

    2015-01-01

    Increased sequencing of microbial genomes has revealed that prevailing prokaryotic species assignments can be inconsistent with whole genome information for a significant number of species. The long-standing need for a systematic and scalable species assignment technique can be met by the genome-wide Average Nucleotide Identity (gANI) metric, which is widely acknowledged as a robust measure of genomic relatedness. In this work, we demonstrate that the combination of gANI and the alignment fraction (AF) between two genomes accurately reflects their genomic relatedness. We introduce an efficient implementation of AF,gANI and discuss its successful application to 86.5M genome pairs between 13,151 prokaryotic genomes assigned to 3032 species. Subsequently, by comparing the genome clusters obtained from complete linkage clustering of these pairs to existing taxonomy, we observed that nearly 18% of all prokaryotic species suffer from anomalies in species definition. Our results can be used to explore central questions such as whether microorganisms form a continuum of genetic diversity or distinct species represented by distinct genetic signatures. We propose that this precise and objective AF,gANI-based species definition: the MiSI (Microbial Species Identifier) method, be used to address previous inconsistencies in species classification and as the primary guide for new taxonomic species assignment, supplemented by the traditional polyphasic approach, as required. PMID:26150420

  11. Dana-Farber Cancer Institute | Office of Cancer Genomics

    Cancer.gov

    Functional Annotation of Cancer Genomes Principal Investigator: William C. Hahn, M.D., Ph.D. The comprehensive characterization of cancer genomes has and will continue to provide an increasingly complete catalog of genetic alterations in specific cancers. However, most epithelial cancers harbor hundreds of genetic alterations as a consequence of genomic instability. Therefore, the functional consequences of the majority of mutations remain unclear.

  12. Genome Walking by Next Generation Sequencing Approaches

    PubMed Central

    Volpicella, Mariateresa; Leoni, Claudia; Costanza, Alessandra; Fanizza, Immacolata; Placido, Antonio; Ceci, Luigi R.

    2012-01-01

    Genome Walking (GW) comprises a number of PCR-based methods for the identification of nucleotide sequences flanking known regions. The different methods have been used for several purposes: from de novo sequencing, useful for the identification of unknown regions, to the characterization of insertion sites for viruses and transposons. In the latter cases Genome Walking methods have been recently boosted by coupling to Next Generation Sequencing technologies. This review will focus on the development of several protocols for the application of Next Generation Sequencing (NGS) technologies to GW, which have been developed in the course of analysis of insertional libraries. These analyses find broad application in protocols for functional genomics and gene therapy. Thanks to the application of NGS technologies, the original vision of GW as a procedure for walking along an unknown genome is now changing into the possibility of observing the parallel marching of hundreds of thousands of primers across the borders of inserted DNA molecules in host genomes. PMID:24832505

  13. Sorghum genome sequencing by methylation filtration.

    PubMed

    Bedell, Joseph A; Budiman, Muhammad A; Nunberg, Andrew; Citek, Robert W; Robbins, Dan; Jones, Joshua; Flick, Elizabeth; Rholfing, Theresa; Fries, Jason; Bradford, Kourtney; McMenamy, Jennifer; Smith, Michael; Holeman, Heather; Roe, Bruce A; Wiley, Graham; Korf, Ian F; Rabinowicz, Pablo D; Lakey, Nathan; McCombie, W Richard; Jeddeloh, Jeffrey A; Martienssen, Robert A

    2005-01-01

    Sorghum bicolor is a close relative of maize and is a staple crop in Africa and much of the developing world because of its superior tolerance of arid growth conditions. We have generated sequence from the hypomethylated portion of the sorghum genome by applying methylation filtration (MF) technology. The evidence suggests that 96% of the genes have been sequence tagged, with an average coverage of 65% across their length. Remarkably, this level of gene discovery was accomplished after generating a raw coverage of less than 300 megabases of the 735-megabase genome. MF preferentially captures exons and introns, promoters, microRNAs, and simple sequence repeats, and minimizes interspersed repeats, thus providing a robust view of the functional parts of the genome. The sorghum MF sequence set is beneficial to research on sorghum and is also a powerful resource for comparative genomics among the grasses and across the entire plant kingdom. Thousands of hypothetical gene predictions in rice and Arabidopsis are supported by the sorghum dataset, and genomic similarities highlight evolutionarily conserved regions that will lead to a better understanding of rice and Arabidopsis.

  14. Sorghum Genome Sequencing by Methylation Filtration

    PubMed Central

    2005-01-01

    Sorghum bicolor is a close relative of maize and is a staple crop in Africa and much of the developing world because of its superior tolerance of arid growth conditions. We have generated sequence from the hypomethylated portion of the sorghum genome by applying methylation filtration (MF) technology. The evidence suggests that 96% of the genes have been sequence tagged, with an average coverage of 65% across their length. Remarkably, this level of gene discovery was accomplished after generating a raw coverage of less than 300 megabases of the 735-megabase genome. MF preferentially captures exons and introns, promoters, microRNAs, and simple sequence repeats, and minimizes interspersed repeats, thus providing a robust view of the functional parts of the genome. The sorghum MF sequence set is beneficial to research on sorghum and is also a powerful resource for comparative genomics among the grasses and across the entire plant kingdom. Thousands of hypothetical gene predictions in rice and Arabidopsis are supported by the sorghum dataset, and genomic similarities highlight evolutionarily conserved regions that will lead to a better understanding of rice and Arabidopsis. PMID:15660154

  15. Sorghum genome sequencing by methylation filtration.

    PubMed

    Bedell, Joseph A; Budiman, Muhammad A; Nunberg, Andrew; Citek, Robert W; Robbins, Dan; Jones, Joshua; Flick, Elizabeth; Rholfing, Theresa; Fries, Jason; Bradford, Kourtney; McMenamy, Jennifer; Smith, Michael; Holeman, Heather; Roe, Bruce A; Wiley, Graham; Korf, Ian F; Rabinowicz, Pablo D; Lakey, Nathan; McCombie, W Richard; Jeddeloh, Jeffrey A; Martienssen, Robert A

    2005-01-01

    Sorghum bicolor is a close relative of maize and is a staple crop in Africa and much of the developing world because of its superior tolerance of arid growth conditions. We have generated sequence from the hypomethylated portion of the sorghum genome by applying methylation filtration (MF) technology. The evidence suggests that 96% of the genes have been sequence tagged, with an average coverage of 65% across their length. Remarkably, this level of gene discovery was accomplished after generating a raw coverage of less than 300 megabases of the 735-megabase genome. MF preferentially captures exons and introns, promoters, microRNAs, and simple sequence repeats, and minimizes interspersed repeats, thus providing a robust view of the functional parts of the genome. The sorghum MF sequence set is beneficial to research on sorghum and is also a powerful resource for comparative genomics among the grasses and across the entire plant kingdom. Thousands of hypothetical gene predictions in rice and Arabidopsis are supported by the sorghum dataset, and genomic similarities highlight evolutionarily conserved regions that will lead to a better understanding of rice and Arabidopsis. PMID:15660154

  16. Using comparative genomics to reorder the human genome sequence into a virtual sheep genome

    PubMed Central

    Dalrymple, Brian P; Kirkness, Ewen F; Nefedov, Mikhail; McWilliam, Sean; Ratnakumar, Abhirami; Barris, Wes; Zhao, Shaying; Shetty, Jyoti; Maddox, Jillian F; O'Grady, Margaret; Nicholas, Frank; Crawford, Allan M; Smith, Tim; de Jong, Pieter J; McEwan, John; Oddy, V Hutton; Cockett, Noelle E

    2007-01-01

    Background Is it possible to construct an accurate and detailed subgene-level map of a genome using bacterial artificial chromosome (BAC) end sequences, a sparse marker map, and the sequences of other genomes? Results A sheep BAC library, CHORI-243, was constructed and the BAC end sequences were determined and mapped with high sensitivity and low specificity onto the frameworks of the human, dog, and cow genomes. To maximize genome coverage, the coordinates of all BAC end sequence hits to the cow and dog genomes were also converted to the equivalent human genome coordinates. The 84,624 sheep BACs (about 5.4-fold genome coverage) with paired ends in the correct orientation (tail-to-tail) and spacing, combined with information from sheep BAC comparative genome contigs (CGCs) built separately on the dog and cow genomes, were used to construct 1,172 sheep BAC-CGCs, covering 91.2% of the human genome. Clustered non-tail-to-tail and outsize BACs located close to the ends of many BAC-CGCs linked BAC-CGCs covering about 70% of the genome to at least one other BAC-CGC on the same chromosome. Using the BAC-CGCs, the intrachromosomal and interchromosomal BAC-CGC linkage information, human/cow and vertebrate synteny, and the sheep marker map, a virtual sheep genome was constructed. To identify BACs potentially located in gaps between BAC-CGCs, an additional set of 55,668 sheep BACs were positioned on the sheep genome with lower confidence. A coordinate conversion process allowed us to transfer human genes and other genome features to the virtual sheep genome to display on a sheep genome browser. Conclusion We demonstrate that limited sequencing of BACs combined with positioning on a well assembled genome and integrating locations from other less well assembled genomes can yield extensive, detailed subgene-level maps of mammalian genomes, for which genomic resources are currently limited. PMID:17663790

  17. Assessment of Whole Genome Amplification for Sequence Capture and Massively Parallel Sequencing

    PubMed Central

    Hasmats, Johanna; Gréen, Henrik; Orear, Cedric; Validire, Pierre; Huss, Mikael; Käller, Max; Lundeberg, Joakim

    2014-01-01

    Exome sequence capture and massively parallel sequencing can be combined to achieve inexpensive and rapid global analyses of the functional sections of the genome. The difficulties of working with relatively small quantities of genetic material, as may be necessary when sharing tumor biopsies between collaborators for instance, can be overcome using whole genome amplification. However, the potential drawbacks of using a whole genome amplification technology based on random primers in combination with sequence capture followed by massively parallel sequencing have not yet been examined in detail, especially in the context of mutation discovery in tumor material. In this work, we compare mutations detected in sequence data for unamplified DNA, whole genome amplified DNA, and RNA originating from the same tumor tissue samples from 16 patients diagnosed with non-small cell lung cancer. The results obtained provide a comprehensive overview of the merits of these techniques for mutation analysis. We evaluated the identified genetic variants, and found that most (74%) of them were observed in both the amplified and the unamplified sequence data. Eighty-nine percent of the variations found by WGA were shared with unamplified DNA. We demonstrate a strategy for avoiding allelic bias by including RNA-sequencing information. PMID:24409309

  18. Assessment of whole genome amplification for sequence capture and massively parallel sequencing.

    PubMed

    Hasmats, Johanna; Gréen, Henrik; Orear, Cedric; Validire, Pierre; Huss, Mikael; Käller, Max; Lundeberg, Joakim

    2014-01-01

    Exome sequence capture and massively parallel sequencing can be combined to achieve inexpensive and rapid global analyses of the functional sections of the genome. The difficulties of working with relatively small quantities of genetic material, as may be necessary when sharing tumor biopsies between collaborators for instance, can be overcome using whole genome amplification. However, the potential drawbacks of using a whole genome amplification technology based on random primers in combination with sequence capture followed by massively parallel sequencing have not yet been examined in detail, especially in the context of mutation discovery in tumor material. In this work, we compare mutations detected in sequence data for unamplified DNA, whole genome amplified DNA, and RNA originating from the same tumor tissue samples from 16 patients diagnosed with non-small cell lung cancer. The results obtained provide a comprehensive overview of the merits of these techniques for mutation analysis. We evaluated the identified genetic variants, and found that most (74%) of them were observed in both the amplified and the unamplified sequence data. Eighty-nine percent of the variations found by WGA were shared with unamplified DNA. We demonstrate a strategy for avoiding allelic bias by including RNA-sequencing information.

  19. Multilocus sequence typing of total-genome-sequenced bacteria.

    PubMed

    Larsen, Mette V; Cosentino, Salvatore; Rasmussen, Simon; Friis, Carsten; Hasman, Henrik; Marvig, Rasmus Lykke; Jelsbak, Lars; Sicheritz-Pontén, Thomas; Ussery, David W; Aarestrup, Frank M; Lund, Ole

    2012-04-01

    Accurate strain identification is essential for anyone working with bacteria. For many species, multilocus sequence typing (MLST) is considered the "gold standard" of typing, but it is traditionally performed in an expensive and time-consuming manner. As the costs of whole-genome sequencing (WGS) continue to decline, it becomes increasingly available to scientists and routine diagnostic laboratories. Currently, the cost is below that of traditional MLST. The new challenges will be how to extract the relevant information from the large amount of data so as to allow for comparison over time and between laboratories. Ideally, this information should also allow for comparison to historical data. We developed a Web-based method for MLST of 66 bacterial species based on WGS data. As input, the method uses short sequence reads from four sequencing platforms or preassembled genomes. Updates from the MLST databases are downloaded monthly, and the best-matching MLST alleles of the specified MLST scheme are found using a BLAST-based ranking method. The sequence type is then determined by the combination of alleles identified. The method was tested on preassembled genomes from 336 isolates covering 56 MLST schemes, on short sequence reads from 387 isolates covering 10 schemes, and on a small test set of short sequence reads from 29 isolates for which the sequence type had been determined by traditional methods. The method presented here enables investigators to determine the sequence types of their isolates on the basis of WGS data. This method is publicly available at www.cbs.dtu.dk/services/MLST. PMID:22238442

  20. Multilocus Sequence Typing of Total-Genome-Sequenced Bacteria

    PubMed Central

    Cosentino, Salvatore; Rasmussen, Simon; Friis, Carsten; Hasman, Henrik; Marvig, Rasmus Lykke; Jelsbak, Lars; Sicheritz-Pontén, Thomas; Ussery, David W.; Aarestrup, Frank M.; Lund, Ole

    2012-01-01

    Accurate strain identification is essential for anyone working with bacteria. For many species, multilocus sequence typing (MLST) is considered the “gold standard” of typing, but it is traditionally performed in an expensive and time-consuming manner. As the costs of whole-genome sequencing (WGS) continue to decline, it becomes increasingly available to scientists and routine diagnostic laboratories. Currently, the cost is below that of traditional MLST. The new challenges will be how to extract the relevant information from the large amount of data so as to allow for comparison over time and between laboratories. Ideally, this information should also allow for comparison to historical data. We developed a Web-based method for MLST of 66 bacterial species based on WGS data. As input, the method uses short sequence reads from four sequencing platforms or preassembled genomes. Updates from the MLST databases are downloaded monthly, and the best-matching MLST alleles of the specified MLST scheme are found using a BLAST-based ranking method. The sequence type is then determined by the combination of alleles identified. The method was tested on preassembled genomes from 336 isolates covering 56 MLST schemes, on short sequence reads from 387 isolates covering 10 schemes, and on a small test set of short sequence reads from 29 isolates for which the sequence type had been determined by traditional methods. The method presented here enables investigators to determine the sequence types of their isolates on the basis of WGS data. This method is publicly available at www.cbs.dtu.dk/services/MLST. PMID:22238442

  1. Genomics of Cancer and a New Era for Cancer Prevention

    PubMed Central

    Brennan, Paul; Wild, Christopher P.

    2015-01-01

    A primary justification for dedicating substantial amounts of research funding to large-scale cancer genomics projects of both somatic and germline DNA is that the biological insights will lead to new treatment targets and strategies for cancer therapy. While it is too early to judge the success of these projects in terms of clinical breakthroughs, an alternative rationale is that new genomics techniques can be used to reduce the overall burden of cancer by prevention of new cases occurring and also by detecting them earlier. In particular, it is now becoming apparent that studying the genomic profile of tumors can help to identify new carcinogens and may subsequently result in implementing strategies that limit exposure. In parallel, it may be feasible to utilize genomic biomarkers to identify cancers at an earlier and more treatable stage using screening or other early detection approaches based on prediagnostic biospecimens. While the potential for these techniques is large, their successful outcome will depend on international collaboration and planning similar to that of recent sequencing initiatives. PMID:26540230

  2. Mapping and sequencing the human genome

    SciTech Connect

    1988-01-01

    Numerous meetings have been held and a debate has developed in the biological community over the merits of mapping and sequencing the human genome. In response a committee to examine the desirability and feasibility of mapping and sequencing the human genome was formed to suggest options for implementing the project. The committee asked many questions. Should the analysis of the human genome be left entirely to the traditionally uncoordinated, but highly successful, support systems that fund the vast majority of biomedical research. Or should a more focused and coordinated additional support system be developed that is limited to encouraging and facilitating the mapping and eventual sequencing of the human genome. If so, how can this be done without distorting the broader goals of biological research that are crucial for any understanding of the data generated in such a human genome project. As the committee became better informed on the many relevant issues, the opinions of its members coalesced, producing a shared consensus of what should be done. This report reflects that consensus.

  3. Mapping and Sequencing the Human Genome

    DOE R&D Accomplishments Database

    1988-01-01

    Numerous meetings have been held and a debate has developed in the biological community over the merits of mapping and sequencing the human genome. In response a committee to examine the desirability and feasibility of mapping and sequencing the human genome was formed to suggest options for implementing the project. The committee asked many questions. Should the analysis of the human genome be left entirely to the traditionally uncoordinated, but highly successful, support systems that fund the vast majority of biomedical research. Or should a more focused and coordinated additional support system be developed that is limited to encouraging and facilitating the mapping and eventual sequencing of the human genome. If so, how can this be done without distorting the broader goals of biological research that are crucial for any understanding of the data generated in such a human genome project. As the committee became better informed on the many relevant issues, the opinions of its members coalesced, producing a shared consensus of what should be done. This report reflects that consensus.

  4. Genome Sequence of Gordonia Phage Emalyn

    PubMed Central

    Guido, Madeline J.; Iyengar, Pragnya; Nigra, Jonathan T.; Serbin, Matthew B.; Kasturiarachi, Naomi S.; Pressimone, Catherine A.; Schiebel, Johnathon G.; Furbee, Emily C.; Grubb, Sarah R.; Warner, Marcie H.; Montgomery, Matthew T.; Garlena, Rebecca A.; Russell, Daniel A.; Jacobs-Sera, Deborah; Hatfull, Graham F.

    2016-01-01

    Emalyn is a newly isolated bacteriophage of Gordonia terrae 3612 and has a double-stranded DNA genome 43,982 bp long with 67 predicted protein-encoding genes, 32 of which we can assign putative functions. Emalyn has a prolate capsid and has extensive nucleotide similarity with several previously sequenced phages. PMID:27516499

  5. Complete Genome Sequences of 61 Mycobacteriophages

    PubMed Central

    2016-01-01

    Mycobacteriophages—viruses of mycobacteria—provide insights into viral diversity and evolution as well as numerous tools for genetic dissection of Mycobacterium tuberculosis. Here we report the complete genome sequences of 61 mycobacteriophages newly isolated from environmental samples using Mycobacterium smegmatis mc2155 that expand our understanding of phage diversity. PMID:27389257

  6. Complete Genome Sequences of 61 Mycobacteriophages.

    PubMed

    Hatfull, Graham F

    2016-01-01

    Mycobacteriophages-viruses of mycobacteria-provide insights into viral diversity and evolution as well as numerous tools for genetic dissection of Mycobacterium tuberculosis Here we report the complete genome sequences of 61 mycobacteriophages newly isolated from environmental samples using Mycobacterium smegmatis mc(2)155 that expand our understanding of phage diversity. PMID:27389257

  7. Draft genome sequence of Virgibacillus halodenitrificans 1806.

    PubMed

    Lee, Sang-Jae; Lee, Yong-Jik; Jeong, Haeyoung; Lee, Sang Jun; Lee, Han-Seung; Pan, Jae-Gu; Kim, Byoung-Chan; Lee, Dong-Woo

    2012-11-01

    Virgibacillus halodenitrificans 1806 is an endospore-forming halophilic bacterium isolated from salterns in Korea. Here, we report the draft genome sequence of V. halodenitrificans 1806, which may reveal the molecular basis of osmoadaptation and insights into carbon and anaerobic metabolism in moderate halophiles. PMID:23105070

  8. Genome Sequence of Gordonia Phage Emalyn.

    PubMed

    Pope, Welkin H; Guido, Madeline J; Iyengar, Pragnya; Nigra, Jonathan T; Serbin, Matthew B; Kasturiarachi, Naomi S; Pressimone, Catherine A; Schiebel, Johnathon G; Furbee, Emily C; Grubb, Sarah R; Warner, Marcie H; Montgomery, Matthew T; Garlena, Rebecca A; Russell, Daniel A; Jacobs-Sera, Deborah; Hatfull, Graham F

    2016-01-01

    Emalyn is a newly isolated bacteriophage of Gordonia terrae 3612 and has a double-stranded DNA genome 43,982 bp long with 67 predicted protein-encoding genes, 32 of which we can assign putative functions. Emalyn has a prolate capsid and has extensive nucleotide similarity with several previously sequenced phages. PMID:27516499

  9. Applications of Genomic Sequencing in Pediatric CNS Tumors.

    PubMed

    Bavle, Abhishek A; Lin, Frank Y; Parsons, D Williams

    2016-05-01

    Recent advances in genome-scale sequencing methods have resulted in a significant increase in our understanding of the biology of human cancers. When applied to pediatric central nervous system (CNS) tumors, these remarkable technological breakthroughs have facilitated the molecular characterization of multiple tumor types, provided new insights into the genetic basis of these cancers, and prompted innovative strategies that are changing the management paradigm in pediatric neuro-oncology. Genomic tests have begun to affect medical decision making in a number of ways, from delineating histopathologically similar tumor types into distinct molecular subgroups that correlate with clinical characteristics, to guiding the addition of novel therapeutic agents for patients with high-risk or poor-prognosis tumors, or alternatively, reducing treatment intensity for those with a favorable prognosis. Genomic sequencing has also had a significant impact on translational research strategies in pediatric CNS tumors, resulting in wide-ranging applications that have the potential to direct the rational preclinical screening of novel therapeutic agents, shed light on tumor heterogeneity and evolution, and highlight differences (or similarities) between pediatric and adult CNS tumors. Finally, in addition to allowing the identification of somatic (tumor-specific) mutations, the analysis of patient-matched constitutional (germline) DNA has facilitated the detection of pathogenic germline alterations in cancer genes in patients with CNS tumors, with critical implications for genetic counseling and tumor surveillance strategies for children with familial predisposition syndromes. As our understanding of the molecular landscape of pediatric CNS tumors continues to advance, innovative applications of genomic sequencing hold significant promise for further improving the care of children with these cancers. PMID:27188671

  10. Gambling on a shortcut to genome sequencing

    SciTech Connect

    Roberts, L.

    1991-06-21

    Almost from the start of the Human Genome Project, a debate has been raging over whether to sequence the entire human genome, all 3 billion bases, or just the genes - a mere 2% or 3% of the genome, and by far the most interesting part. In England, Sydney Brenner convinced the Medical Research Council (MRC) to start with the expressed genes, or complementary DNAs. But the US stance has been that the entire sequence is essential if we are to understand the blueprint of man. Craig Venter of the National Institute of Neurological Disorders and Stroke says that focusing on the expressed genes may be even more useful than expected. His strategy involves randomly selecting clones from cDNA libraries which theoretically contain all the genes that are switched on at a particular time in a particular tissue. Then the researchers sequence just a short stretch of each clone, about 400 to 500 bases, to create can expressed sequence tag or EST. The sequences of these ESTs are then stored in a database. Using that information, other researchers can then recreate that EST by using polymerase chain reaction techniques.

  11. The topography of mutational processes in breast cancer genomes

    PubMed Central

    Morganella, Sandro; Alexandrov, Ludmil B.; Glodzik, Dominik; Zou, Xueqing; Davies, Helen; Staaf, Johan; Sieuwerts, Anieta M.; Brinkman, Arie B.; Martin, Sancha; Ramakrishna, Manasa; Butler, Adam; Kim, Hyung-Yong; Borg, Åke; Sotiriou, Christos; Futreal, P. Andrew; Campbell, Peter J.; Span, Paul N.; Van Laere, Steven; Lakhani, Sunil R.; Eyfjord, Jorunn E.; Thompson, Alastair M.; Stunnenberg, Hendrik G.; van de Vijver, Marc J.; Martens, John W. M.; Børresen-Dale, Anne-Lise; Richardson, Andrea L.; Kong, Gu; Thomas, Gilles; Sale, Julian; Rada, Cristina; Stratton, Michael R.; Birney, Ewan; Nik-Zainal, Serena

    2016-01-01

    Somatic mutations in human cancers show unevenness in genomic distribution that correlate with aspects of genome structure and function. These mutations are, however, generated by multiple mutational processes operating through the cellular lineage between the fertilized egg and the cancer cell, each composed of specific DNA damage and repair components and leaving its own characteristic mutational signature on the genome. Using somatic mutation catalogues from 560 breast cancer whole-genome sequences, here we show that each of 12 base substitution, 2 insertion/deletion (indel) and 6 rearrangement mutational signatures present in breast tissue, exhibit distinct relationships with genomic features relating to transcription, DNA replication and chromatin organization. This signature-based approach permits visualization of the genomic distribution of mutational processes associated with APOBEC enzymes, mismatch repair deficiency and homologous recombinational repair deficiency, as well as mutational processes of unknown aetiology. Furthermore, it highlights mechanistic insights including a putative replication-dependent mechanism of APOBEC-related mutagenesis. PMID:27136393

  12. The topography of mutational processes in breast cancer genomes

    SciTech Connect

    Morganella, Sandro; Alexandrov, Ludmil B.; Glodzik, Dominik; Zou, Xueqing; Davies, Helen; Staaf, Johan; Sieuwerts, Anieta M.; Brinkman, Arie B.; Martin, Sancha; Ramakrishna, Manasa; Butler, Adam; Kim, Hyung -Yong; Borg, Ake; Sotiriou, Christos; Futreal, P. Andrew; Campbell, Peter J.; Span, Paul N.; Van Laere, Steven; Lakhani, Sunil R.; Eyfjord, Jorunn E.; Thompson, Alastair M.; Stunnenberg, Hendrik G.; van de Vijver, Marc J.; Martens, John W. M.; Borresen-Dale, Anne -Lise; Richardson, Andrea L.; Kong, Gu; Thomas, Gilles; Sale, Julian; Rada, Cristina; Stratton, Michael R.; Birney, Ewan; Nik-Zainal, Serena

    2016-01-01

    Somatic mutations in human cancers show unevenness in genomic distribution that correlate with aspects of genome structure and function. These mutations are, however, generated by multiple mutational processes operating through the cellular lineage between the fertilized egg and the cancer cell, each composed of specific DNA damage and repair components and leaving its own characteristic mutational signature on the genome. Using somatic mutation catalogues from 560 breast cancer whole-genome sequences, here we show that each of 12 base substitution, 2 insertion/deletion (indel) and 6 rearrangement mutational signatures present in breast tissue, exhibit distinct relationships with genomic features relating to transcription, DNA replication and chromatin organization. This signature-based approach permits visualization of the genomic distribution of mutational processes associated with APOBEC enzymes, mismatch repair deficiency and homologous recombinational repair deficiency, as well as mutational processes of unknown aetiology. Lastly, it highlights mechanistic insights including a putative replication-dependent mechanism of APOBEC-related mutagenesis.

  13. Subclonal diversification of primary breast cancer revealed by multiregion sequencing

    PubMed Central

    Yates, Lucy R; Gerstung, Moritz; Knappskog, Stian; Desmedt, Christine; Gundem, Gunes; Loo, Peter Van; Aas, Turid; Alexandrov, Ludmil B; Larsimont, Denis; Davies, Helen; Li, Yilong; Ju, Young Seok; Ramakrishna, Manasa; Haugland, Hans Kristian; Lilleng, Peer Kaare; Nik-Zainal, Serena; McLaren, Stuart; Butler, Adam; Martin, Sancha; Glodzik, Dominic; Menzies, Andrew; Raine, Keiran; Hinton, Jonathan; Jones, David; Mudie, Laura J; Jiang, Bing; Vincent, Delphine; Greene-Colozzi, April; Adnet, Pierre-Yves; Fatima, Aquila; Maetens, Marion; Ignatiadis, Michail; Stratton, Michael R; Sotiriou, Christos; Richardson, Andrea L; Lønning, Per Eystein; Wedge, David C; Campbell, Peter J

    2015-01-01

    Sequencing cancer genomes may enable tailoring of therapeutics to the underlying biological abnormalities driving a particular patient’s tumor. However, sequencing-based strategies rely heavily on representative sampling of tumors. To understand the subclonal structure of primary breast cancer, we applied whole genome and targeted sequencing to multiple samples from each of 50 patients’ tumors (total 303). The extent of subclonal diversification varied among cases and followed spatial patterns. No strict temporal order was evident, with point mutations and rearrangements affecting the most common breast cancer genes, including PIK3CA, TP53, PTEN, BRCA2 and MYC, occurring early in some tumors and late in others. In 13/50 cancers, potentially targetable mutations were subclonal. Landmarks of disease progression, such as resisting chemotherapy and acquiring invasive or metastatic potential, arose within detectable subclones of antecedent lesions. These findings highlight the importance of including analyses of subclonal structure and tumor evolution in clinical trials of primary breast cancer. PMID:26099045

  14. Comparative Analysis of Genome Sequences with VISTA

    DOE Data Explorer

    Dubchak, Inna

    VISTA is a comprehensive suite of programs and databases developed by and hosted at the Genomics Division of Lawrence Berkeley National Laboratory. They provide information and tools designed to facilitate comparative analysis of genomic sequences. Users have two ways to interact with the suite of applications at the VISTA portal. They can submit their own sequences and alignments for analysis (VISTA servers) or examine pre-computed whole-genome alignments of different species. A key menu option is the Enhancer Browser and Database at http://enhancer.lbl.gov/. The VISTA Enhancer Browser is a central resource for experimentally validated human noncoding fragments with gene enhancer activity as assessed in transgenic mice. Most of these noncoding elements were selected for testing based on their extreme conservation with other vertebrates. The results of this enhancer screen are provided through this publicly available website. The browser also features relevant results by external contributors and a large collection of additional genome-wide conserved noncoding elements which are candidate enhancer sequences. The LBL developers invite external groups to submit computational predictions of developmental enhancers. As of 10/19/2009 the database contains information on 1109 in vivo tested elements - 508 elements with enhancer activity.

  15. Agaricus bisporus genome sequence: a commentary.

    PubMed

    Kerrigan, Richard W; Challen, Michael P; Burton, Kerry S

    2013-06-01

    The genomes of two isolates of Agaricus bisporus have been sequenced recently. This soil-inhabiting fungus has a wide geographical distribution in nature and it is also cultivated in an industrialized indoor process ($4.7bn annual worldwide value) to produce edible mushrooms. Previously this lignocellulosic fungus has resisted precise econutritional classification, i.e. into white- or brown-rot decomposers. The generation of the genome sequence and transcriptomic analyses has revealed a new classification, 'humicolous', for species adapted to grow in humic-rich, partially decomposed leaf material. The Agaricus biporus genomes contain a collection of polysaccharide and lignin-degrading genes and more interestingly an expanded number of genes (relative to other lignocellulosic fungi) that enhance degradation of lignin derivatives, i.e. heme-thiolate peroxidases and β-etherases. A motif that is hypothesized to be a promoter element in the humicolous adaptation suite is present in a large number of genes specifically up-regulated when the mycelium is grown on humic-rich substrate. The genome sequence of A. bisporus offers a platform to explore fungal biology in carbon-rich soil environments and terrestrial cycling of carbon, nitrogen, phosphorus and potassium.

  16. Genome-wide epigenetic modifications in cancer.

    PubMed

    Park, Yoon Jung; Claus, Rainer; Weichenhan, Dieter; Plass, Christoph

    2011-01-01

    Epigenetic alterations in cancer include changes in DNA methylation and associated histone modifications that influence the chromatin states and impact gene expression patterns. Due to recent technological advantages, the scientific community is now obtaining a better picture of the genome-wide epigenetic changes that occur in a cancer genome. These epigenetic alterations are associated with chromosomal instability and changes in transcriptional control which influence the overall gene expression differences seen in many human malignancies. In this review, we will briefly summarize our current knowledge of the epigenetic patterns and mechanisms of gene regulation in healthy tissues and relate this to what is known for cancer genomes. Our focus will be on DNA methylation. We will review the current standing of technologies that have been developed over recent years. This field is experiencing a revolution in the strategies used to measure epigenetic alterations, which includes the incorporation of next generation sequencing tools. We also will review strategies that utilize epigenetic information for translational purposes, with a special emphasis on the potential use of DNA methylation marks for early disease detection and prognosis. The review will close with an outlook on challenges that this field is facing.

  17. Defining Genome Project Standards in a New Era of Sequencing

    SciTech Connect

    Chain, Patrick

    2009-05-27

    Patrick Chain of the DOE Joint Genome Institute gives a talk on behalf of the International Genome Sequencing Standards Consortium on the need for intermediate genome classifications between "draft" and "finished"

  18. Whole-genome sequencing in bacteriology: state of the art

    PubMed Central

    Dark, Michael J

    2013-01-01

    Over the last ten years, genome sequencing capabilities have expanded exponentially. There have been tremendous advances in sequencing technology, DNA sample preparation, genome assembly, and data analysis. This has led to advances in a number of facets of bacterial genomics, including metagenomics, clinical medicine, bacterial archaeology, and bacterial evolution. This review examines the strengths and weaknesses of techniques in bacterial genome sequencing, upcoming technologies, and assembly techniques, as well as highlighting recent studies that highlight new applications for bacterial genomics. PMID:24143115

  19. Whole-genome sequencing in bacteriology: state of the art.

    PubMed

    Dark, Michael J

    2013-01-01

    Over the last ten years, genome sequencing capabilities have expanded exponentially. There have been tremendous advances in sequencing technology, DNA sample preparation, genome assembly, and data analysis. This has led to advances in a number of facets of bacterial genomics, including metagenomics, clinical medicine, bacterial archaeology, and bacterial evolution. This review examines the strengths and weaknesses of techniques in bacterial genome sequencing, upcoming technologies, and assembly techniques, as well as highlighting recent studies that highlight new applications for bacterial genomics.

  20. Draft genome sequence of Actinomyces massiliensis strain 4401292T.

    PubMed

    Roux, Véronique; Robert, Catherine; Gimenez, Grégory; Gharbi, Reem; Raoult, Didier

    2012-09-01

    A draft genome sequence of Actinomyces massiliensis, an anaerobic bacterium isolated from a patient's blood culture, is described here. CRISPR-associated proteins, insertion sequences, and toxin-antitoxin loci were found on the genome.

  1. Draft Genome Sequence of Mycobacterium brumae ATCC 51384

    PubMed Central

    D'Auria, Giuseppe

    2016-01-01

    Here, we report the draft genome sequence of Mycobacterium brumae type strain ATCC 51384. This is the first draft genome sequence of M. brumae, a nonpathogenic, rapidly growing, nonchromogenic mycobacterium, with immunotherapeutic capacities. PMID:27125480

  2. Whole Genome Sequencing: Cracking the Genetic Code for Foodborne Illness

    MedlinePlus

    ... For Consumers Home For Consumers Consumer Updates Whole Genome Sequencing: Cracking the Genetic Code for Foodborne Illness ... Bacteria that cause disease have millions of different genomes, or sequences of genetic code, each as unique ...

  3. Draft Genome Sequence of Mycobacterium brumae ATCC 51384.

    PubMed

    D'Auria, Giuseppe; Torrents, Eduard; Luquin, Marina; Comas, Iñaki; Julián, Esther

    2016-01-01

    Here, we report the draft genome sequence of Mycobacterium brumae type strain ATCC 51384. This is the first draft genome sequence of M. brumae, a nonpathogenic, rapidly growing, nonchromogenic mycobacterium, with immunotherapeutic capacities. PMID:27125480

  4. Genome Sequence of Psychrobacter cibarius Strain W1.

    PubMed

    Raghupathi, Prem K; Herschend, Jakob; Røder, Henriette L; Sørensen, Søren J; Burmølle, Mette

    2016-01-01

    Here, we report the draft genome sequence of Psychrobacter cibarius strain W1, which was isolated at a slaughterhouse in Denmark. The 3.63-Mb genome sequence was assembled into 241 contigs. PMID:27231353

  5. Genome instability, cancer and aging

    PubMed Central

    Maslov, Alexander Y.; Vijg, Jan

    2015-01-01

    DNA damage-driven genome instability underlies the diversity of life forms generated by the evolutionary process but is detrimental to the somatic cells of individual organisms. The cellular response to DNA damage can be roughly divided in two parts. First, when damage is severe, programmed cell death may occur or, alternatively, temporary or permanent cell cycle arrest. This protects against cancer but can have negative effects on the long term, e.g., by depleting stem cell reservoirs. Second, damage can be repaired through one or more of the many sophisticated genome maintenance pathways. However, erroneous DNA repair and incomplete restoration of chromatin after damage is resolved, produce mutations and epimutations, respectively, both of which have been shown to accumulate with age. An increased burden of mutations and/or epimutations in aged tissues increases cancer risk and adversely affects gene transcriptional regulation, leading to progressive decline in organ function. Cellular degeneration and uncontrolled cell proliferation are both major hallmarks of aging. Despite the fact that one seems to exclude the other, they both may be driven by a common mechanism. Here, we review age related changes in the mammalian genome and their possible functional consequences, with special emphasis on genome instability in stem/progenitor cells. PMID:19344750

  6. Whole genome sequence analysis of Mycobacterium suricattae.

    PubMed

    Dippenaar, Anzaan; Parsons, Sven David Charles; Sampson, Samantha Leigh; van der Merwe, Ruben Gerhard; Drewe, Julian Ashley; Abdallah, Abdallah Musa; Siame, Kabengele Keith; Gey van Pittius, Nicolaas Claudius; van Helden, Paul David; Pain, Arnab; Warren, Robin Mark

    2015-12-01

    Tuberculosis occurs in various mammalian hosts and is caused by a range of different lineages of the Mycobacterium tuberculosis complex (MTBC). A recently described member, Mycobacterium suricattae, causes tuberculosis in meerkats (Suricata suricatta) in Southern Africa and preliminary genetic analysis showed this organism to be closely related to an MTBC pathogen of rock hyraxes (Procavia capensis), the dassie bacillus. Here we make use of whole genome sequencing to describe the evolution of the genome of M. suricattae, including known and novel regions of difference, SNPs and IS6110 insertion sites. We used genome-wide phylogenetic analysis to show that M. suricattae clusters with the chimpanzee bacillus, previously isolated from a chimpanzee (Pan troglodytes) in West Africa. We propose an evolutionary scenario for the Mycobacterium africanum lineage 6 complex, showing the evolutionary relationship of M. africanum and chimpanzee bacillus, and the closely related members M. suricattae, dassie bacillus and Mycobacterium mungi.

  7. Whole genome sequence analysis of Mycobacterium suricattae.

    PubMed

    Dippenaar, Anzaan; Parsons, Sven David Charles; Sampson, Samantha Leigh; van der Merwe, Ruben Gerhard; Drewe, Julian Ashley; Abdallah, Abdallah Musa; Siame, Kabengele Keith; Gey van Pittius, Nicolaas Claudius; van Helden, Paul David; Pain, Arnab; Warren, Robin Mark

    2015-12-01

    Tuberculosis occurs in various mammalian hosts and is caused by a range of different lineages of the Mycobacterium tuberculosis complex (MTBC). A recently described member, Mycobacterium suricattae, causes tuberculosis in meerkats (Suricata suricatta) in Southern Africa and preliminary genetic analysis showed this organism to be closely related to an MTBC pathogen of rock hyraxes (Procavia capensis), the dassie bacillus. Here we make use of whole genome sequencing to describe the evolution of the genome of M. suricattae, including known and novel regions of difference, SNPs and IS6110 insertion sites. We used genome-wide phylogenetic analysis to show that M. suricattae clusters with the chimpanzee bacillus, previously isolated from a chimpanzee (Pan troglodytes) in West Africa. We propose an evolutionary scenario for the Mycobacterium africanum lineage 6 complex, showing the evolutionary relationship of M. africanum and chimpanzee bacillus, and the closely related members M. suricattae, dassie bacillus and Mycobacterium mungi. PMID:26542221

  8. International network of cancer genome projects

    PubMed Central

    2010-01-01

    The International Cancer Genome Consortium (ICGC) was launched to coordinate large-scale cancer genome studies in tumors from 50 different cancer types and/or subtypes that are of clinical and societal importance across the globe. Systematic studies of over 25,000 cancer genomes at the genomic, epigenomic, and transcriptomic levels will reveal the repertoire of oncogenic mutations, uncover traces of the mutagenic influences, define clinically-relevant subtypes for prognosis and therapeutic management, and enable the development of new cancer therapies. PMID:20393554

  9. Simple sequence repeats in prokaryotic genomes

    PubMed Central

    Mrázek, Jan; Guo, Xiangxue; Shah, Apurva

    2007-01-01

    Simple sequence repeats (SSRs) in DNA sequences are composed of tandem iterations of short oligonucleotides and may have functional and/or structural properties that distinguish them from general DNA sequences. They are variable in length because of slip-strand mutations and may also affect local structure of the DNA molecule or the encoded proteins. Long SSRs (LSSRs) are common in eukaryotes but rare in most prokaryotes. In pathogens, SSRs can enhance antigenic variance of the pathogen population in a strategy that counteracts the host immune response. We analyze representations of SSRs in >300 prokaryotic genomes and report significant differences among different prokaryotes as well as among different types of SSRs. LSSRs composed of short oligonucleotides (1–4 bp length, designated LSSR1–4) are often found in host-adapted pathogens with reduced genomes that are not known to readily survive in a natural environment outside the host. In contrast, LSSRs composed of longer oligonucleotides (5–11 bp length, designated LSSR5–11) are found mostly in nonpathogens and opportunistic pathogens with large genomes. Comparisons among SSRs of different lengths suggest that LSSR1–4 are likely maintained by selection. This is consistent with the established role of some LSSR1–4 in enhancing antigenic variance. By contrast, abundance of LSSR5–11 in some genomes may reflect the SSRs' general tendency to expand rather than their specific role in the organisms' physiology. Differences among genomes in terms of SSR representations and their possible interpretations are discussed. PMID:17485665

  10. Elucidating population histories using genomic DNA sequences.

    PubMed

    Vigilant, Linda

    2009-04-01

    In 1993, Cliff Jolly suggested that rather than debating species definitions and classifications, energy would be better spent investigating multidimensional patterns of variation and gene flow among populations. Until now, however, genetic studies of wild primate populations have been limited to very small portions of the genome. Access to complete genome sequences of humans, chimpanzees, macaques, and other primates makes it possible to design studies surveying substantial amounts of DNA sequence variation at multiple genetic loci in representatives of closely related but distinct wild primate populations. Such data can be analyzed with new approaches that estimate not only when populations diverged but also the relative amounts and directions of subsequent gene flow. These analyses will reemphasize the difficulty of achieving consistent species and subspecies definitions by revealing the extent of variation in the amount and duration of gene flow accompanying population divergences. PMID:19817223

  11. Complete genome sequence of Piry vesiculovirus.

    PubMed

    de Souza, William Marciel; Acrani, Gustavo Olszanski; Romeiro, Marilia Farignoli; Júnior, Osvaldo Reis; Tolardo, Aline Lavado; de Andrade, Amanda Araújo Serrão; da Silva Gonçalves Vianez Júnior, João Lídio; de Almeida Medeiros, Daniele Barbosa; Nunes, Márcio Roberto Teixeira; Figueiredo, Luiz Tadeu Moraes

    2016-08-01

    Piry virus (PIRYV) is a rhabdovirus (genus Vesiculovirus) and is described as a possible human pathogen, originally isolated from a Philander opossum trapped in Para State, Northern Brazil. This study describes the complete full coding sequence and the genetic characterization of PIRYV. The genome sequence reveals that PIRYV has a typical vesiculovirus-like organization, encoding the five genes typical of the genus. Phylogenetic analysis confirmed that PIRYV is most closely related to Perinet virus and clustered in the same clade as Chandipura and Isfahan vesiculoviruses. PMID:27216928

  12. Draft genome sequence of an aflatoxigenic Aspergillus species, A. bombycis

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The genome of the A. bombycis Type strain was sequenced using a Personal Genome Machine, followed by annotation of its predicted genes. The genome size for A. bombycis was found to be approximately 37 Mb and contained 12,266 genes. This announcement introduces a sequenced genome for an aflatoxigenic...

  13. The Norway spruce genome sequence and conifer genome evolution.

    PubMed

    Nystedt, Björn; Street, Nathaniel R; Wetterbom, Anna; Zuccolo, Andrea; Lin, Yao-Cheng; Scofield, Douglas G; Vezzi, Francesco; Delhomme, Nicolas; Giacomello, Stefania; Alexeyenko, Andrey; Vicedomini, Riccardo; Sahlin, Kristoffer; Sherwood, Ellen; Elfstrand, Malin; Gramzow, Lydia; Holmberg, Kristina; Hällman, Jimmie; Keech, Olivier; Klasson, Lisa; Koriabine, Maxim; Kucukoglu, Melis; Käller, Max; Luthman, Johannes; Lysholm, Fredrik; Niittylä, Totte; Olson, Ake; Rilakovic, Nemanja; Ritland, Carol; Rosselló, Josep A; Sena, Juliana; Svensson, Thomas; Talavera-López, Carlos; Theißen, Günter; Tuominen, Hannele; Vanneste, Kevin; Wu, Zhi-Qiang; Zhang, Bo; Zerbe, Philipp; Arvestad, Lars; Bhalerao, Rishikesh; Bohlmann, Joerg; Bousquet, Jean; Garcia Gil, Rosario; Hvidsten, Torgeir R; de Jong, Pieter; MacKay, John; Morgante, Michele; Ritland, Kermit; Sundberg, Björn; Thompson, Stacey Lee; Van de Peer, Yves; Andersson, Björn; Nilsson, Ove; Ingvarsson, Pär K; Lundeberg, Joakim; Jansson, Stefan

    2013-05-30

    Conifers have dominated forests for more than 200 million years and are of huge ecological and economic importance. Here we present the draft assembly of the 20-gigabase genome of Norway spruce (Picea abies), the first available for any gymnosperm. The number of well-supported genes (28,354) is similar to the >100 times smaller genome of Arabidopsis thaliana, and there is no evidence of a recent whole-genome duplication in the gymnosperm lineage. Instead, the large genome size seems to result from the slow and steady accumulation of a diverse set of long-terminal repeat transposable elements, possibly owing to the lack of an efficient elimination mechanism. Comparative sequencing of Pinus sylvestris, Abies sibirica, Juniperus communis, Taxus baccata and Gnetum gnemon reveals that the transposable element diversity is shared among extant conifers. Expression of 24-nucleotide small RNAs, previously implicated in transposable element silencing, is tissue-specific and much lower than in other plants. We further identify numerous long (>10,000 base pairs) introns, gene-like fragments, uncharacterized long non-coding RNAs and short RNAs. This opens up new genomic avenues for conifer forestry and breeding.

  14. Genome sequence of Leuconostoc pseudomesenteroides KCTC 3652.

    PubMed

    Kim, Dong-Wook; Choi, Sang-Haeng; Kang, Aram; Nam, Seong-Hyeuk; Kim, Ryong Nam; Kim, Aeri; Kim, Dae-Soo; Park, Hong-Seog

    2011-08-01

    We announce the genome sequence of one of the most prevalent lactic acid bacteria present during the manufacturing process of cane juice, the type strain Leuconostoc pseudomesenteroides KCTC 3652 (3,244,985 bp, with a G+C content of 38.3%), which consists of 1,160 large contigs (>100 bp in size). All of the contigs were assembled by the Newbler Assembler 2.3 software program (454 Life Sciences).

  15. Draft Genome Sequence of Rubrivivax gelatinosus CBS

    PubMed Central

    Hu, Pingsha; Lang, Juan; Wawrousek, Karen; Yu, Jianping; Maness, Pin-Ching

    2012-01-01

    Rubrivivax gelatinosus CBS, a purple nonsulfur photosynthetic bacterium, can grow photosynthetically using CO and N2 as the sole carbon and nitrogen nutrients, respectively. R. gelatinosus CBS is of particular interest due to its ability to metabolize CO and yield H2. We present the 5-Mb draft genome sequence of R. gelatinosus CBS with the goal of providing genetic insight into the metabolic properties of this bacterium. PMID:22628496

  16. Complete genome sequence of Candidatus Ruthia magnifica.

    PubMed

    Roeselers, Guus; Newton, Irene L G; Woyke, Tanja; Auchtung, Thomas A; Dilly, Geoffrey F; Dutton, Rachel J; Fisher, Meredith C; Fontanez, Kristina M; Lau, Evan; Stewart, Frank J; Richardson, Paul M; Barry, Kerrie W; Saunders, Elizabeth; Detter, John C; Wu, Dongying; Eisen, Jonathan A; Cavanaugh, Colleen M

    2010-01-01

    The hydrothermal vent clam Calyptogena magnifica (Bivalvia: Mollusca) is a member of the Vesicomyidae. Species within this family form symbioses with chemosynthetic Gammaproteobacteria. They exist in environments such as hydrothermal vents and cold seeps and have a rudimentary gut and feeding groove, indicating a large dependence on their endosymbionts for nutrition. The C. magnifica symbiont, Candidatus Ruthia magnifica, was the first intracellular sulfur-oxidizing endosymbiont to have its genome sequenced (Newton et al. 2007). Here we expand upon the original report and provide additional details complying with the emerging MIGS/MIMS standards. The complete genome exposed the genetic blueprint of the metabolic capabilities of the symbiont. Genes which were predicted to encode the proteins required for all the metabolic pathways typical of free-living chemoautotrophs were detected in the symbiont genome. These include major pathways including carbon fixation, sulfur oxidation, nitrogen assimilation, as well as amino acid and cofactor/vitamin biosynthesis. This genome sequence is invaluable in the study of these enigmatic associations and provides insights into the origin and evolution of autotrophic endosymbiosis.

  17. The predictive capacity of personal genome sequencing.

    PubMed

    Roberts, Nicholas J; Vogelstein, Joshua T; Parmigiani, Giovanni; Kinzler, Kenneth W; Vogelstein, Bert; Velculescu, Victor E

    2012-05-01

    New DNA sequencing methods will soon make it possible to identify all germline variants in any individual at a reasonable cost. However, the ability of whole-genome sequencing to predict predisposition to common diseases in the general population is unknown. To estimate this predictive capacity, we use the concept of a "genometype." A specific genometype represents the genomes in the population conferring a specific level of genetic risk for a specified disease. Using this concept, we estimated the maximum capacity of whole-genome sequencing to identify individuals at clinically significant risk for 24 different diseases. Our estimates were derived from the analysis of large numbers of monozygotic twin pairs; twins of a pair share the same genometype and therefore identical genetic risk factors. Our analyses indicate that (i) for 23 of the 24 diseases, most of the individuals will receive negative test results; (ii) these negative test results will, in general, not be very informative, because the risk of developing 19 of the 24 diseases in those who test negative will still be, at minimum, 50 to 80% of that in the general population; and (iii) on the positive side, in the best-case scenario, more than 90% of tested individuals might be alerted to a clinically significant predisposition to at least one disease. These results have important implications for the valuation of genetic testing by industry, health insurance companies, public policy-makers, and consumers. PMID:22472521

  18. Complete Genome Sequence of Mycobacterium abscessus subsp. bolletii

    PubMed Central

    Spilker, Theodore; LiPuma, John J.

    2016-01-01

    We report the complete genome sequence of a Mycobacterium abscessus subsp. bolletii isolate recovered from a sputum culture from an individual with cystic fibrosis. This sequence is the first completed whole-genome sequence of M. abscessus subsp. bolletii and adds value to studies of M. abscessus complex genomics. PMID:27284156

  19. Complete Genome Sequence of Mycobacterium abscessus subsp. bolletii.

    PubMed

    Caverly, Lindsay J; Spilker, Theodore; LiPuma, John J

    2016-01-01

    We report the complete genome sequence of a Mycobacterium abscessus subsp. bolletii isolate recovered from a sputum culture from an individual with cystic fibrosis. This sequence is the first completed whole-genome sequence of M. abscessus subsp. bolletii and adds value to studies of M. abscessus complex genomics. PMID:27284156

  20. Genome Sequence of the Zoonotic Pathogen Chlamydophila psittaci▿

    PubMed Central

    Seth-Smith, Helena M. B.; Harris, Simon R.; Rance, Richard; West, Anthony P.; Severin, Juliette A.; Ossewaarde, Jacobus M.; Cutcliffe, Lesley T.; Skilton, Rachel J.; Marsh, Pete; Parkhill, Julian; Clarke, Ian N.; Thomson, Nicholas R.

    2011-01-01

    We present the first genome sequence of Chlamydophila psittaci, an intracellular pathogen of birds and a human zoonotic pathogen. A comparison with previously sequenced Chlamydophila genomes shows that, as in other chlamydiae, most of the genome diversity is restricted to the plasticity zone. The C. psittaci plasmid was also sequenced. PMID:21183672

  1. Characterizing genomic alterations in cancer by complementary functional associations

    PubMed Central

    Kim, J. W.; Botvinnik, O. B.; Abudayyeh, O.; Birger, C.; Rosenbluh, J.; Shrestha, Y.; Abazeed, M. E.; Hammerman, P. S.; DiCara, D.; Konieczkowski, D. J.; Johannessen, C. M.; Liberzon, A.; Alizad-Rahvar, A. R.; Alexe, G.; Aguirre, A.; Ghandi, M.; Greulich, H.; Vazquez, F.; Weir, B. A.; Van Allen, E. M.; Tsherniak, A.; Shao, D. D.; Zack, T. I.; Noble, M.; Getz, G.; Beroukhim, R.; Garraway, L. A.; Ardakani, M.; Romualdi, C.; Sales, G.; Barbie, D. A.; Boehm, J. S.; Hahn, W. C.; Mesirov, J. P.; Tamayo, P.

    2016-01-01

    Systematic efforts to sequence the cancer genome have identified large numbers of relevant mutations and copy number alterations in human cancers; however, elucidating their functional consequences, and their interactions to drive or maintain oncogenic states, is still a significant challenge. Here we introduce REVEALER, a computational method that identifies combinations of mutually exclusive genomic alterations correlated with functional phenotypes, such as the activation or gene-dependency of oncogenic pathways or the sensitivity to a drug treatment. We use REVEALER to uncover complementary genomic alterations associated with the transcriptional activation of β-catenin and NRF2, MEK-inhibitor sensitivity, and KRAS dependency. REVEALER successfully identified both known and new associations demonstrating the power of combining functional profiles with extensive characterization of genomic alterations in cancer genomes. PMID:27088724

  2. Reconstruction of ancestral genomic sequences using likelihood.

    PubMed

    Elias, Isaac; Tuller, Tamir

    2007-03-01

    A challenging task in computational biology is the reconstruction of genomic sequences of extinct ancestors, given the phylogenetic tree and the sequences at the leafs. This task is best solved by calculating the most likely estimate of the ancestral sequences, along with the most likely edge lengths. We deal with this problem and also the variant in which the phylogenetic tree in addition to the ancestral sequences need to be estimated. The latter problem is known to be NP-hard, while the computational complexity of the former is unknown. Currently, all algorithms for solving these problems are heuristics without performance guarantees. The biological importance of these problems calls for developing better algorithms with guarantees of finding either optimal or approximate solutions. We develop approximation, fix parameter tractable (FPT), and fast heuristic algorithms for two variants of the problem; when the phylogenetic tree is known and when it is unknown. The approximation algorithm guarantees a solution with a log-likelihood ratio of 2 relative to the optimal solution. The FPT has a running time which is polynomial in the length of the sequences and exponential in the number of taxa. This makes it useful for calculating the optimal solution for small trees. Moreover, we combine the approximation algorithm and the FPT into an algorithm with arbitrary good approximation guarantee (PTAS). We tested our algorithms on both synthetic and biological data. In particular, we used the FPT for computing the most likely ancestral mitochondrial genomes of hominidae (the great apes), thereby answering an interesting biological question. Moreover, we show how the approximation algorithms find good solutions for reconstructing the ancestral genomes for a set of lentiviruses (relatives of HIV). Supplementary material of this work is available at www.nada.kth.se/~isaac/publications/aml/aml.html.

  3. Next generation sequencing in cancer: opportunities and challenges for precision cancer medicine.

    PubMed

    Paolillo, Carmela; Londin, Eric; Fortina, Paolo

    2016-01-01

    Over the past decade, testing the genes of patients and their specific cancer types has become standardized practice in medical oncology since somatic mutations, changes in gene expression and epigenetic modifications are all hallmarks of cancer. However, while cancer genetic assessment has been limited to single biomarkers to guide the use of therapies, improvements in nucleic acid sequencing technologies and implementation of different genome analysis tools have enabled clinicians to detect these genomic alterations and identify functional and disease-associated genomic variants. Next-generation sequencing (NGS) technologies have provided clues about therapeutic targets and genomic markers for novel clinical applications when standard therapy has failed. While Sanger sequencing, an accurate and sensitive approach, allows for the identification of potential novel variants, it is however limited by the single amplicon being interrogated. Similarly, quantitative and qualitative profiling of gene expression changes also represents a challenge for the cancer field. Both RT-PCR and microarrays are efficient approaches, but are limited to the genes present on the array or being assayed. This leaves vast swaths of the transcriptome, including non-coding RNAs and other features, unexplored. With the advent of the ability to collect and analyze genomic sequence data in a timely fashion and at an ever-decreasing cost, many of these limitations have been overcome and are being incorporated into cancer research and diagnostics giving patients and clinicians new hope for targeted and personalized treatment. Below we highlight the various applications of next-generation sequencing in precision cancer medicine. PMID:27542004

  4. Next generation sequencing in cancer: opportunities and challenges for precision cancer medicine.

    PubMed

    Paolillo, Carmela; Londin, Eric; Fortina, Paolo

    2016-01-01

    Over the past decade, testing the genes of patients and their specific cancer types has become standardized practice in medical oncology since somatic mutations, changes in gene expression and epigenetic modifications are all hallmarks of cancer. However, while cancer genetic assessment has been limited to single biomarkers to guide the use of therapies, improvements in nucleic acid sequencing technologies and implementation of different genome analysis tools have enabled clinicians to detect these genomic alterations and identify functional and disease-associated genomic variants. Next-generation sequencing (NGS) technologies have provided clues about therapeutic targets and genomic markers for novel clinical applications when standard therapy has failed. While Sanger sequencing, an accurate and sensitive approach, allows for the identification of potential novel variants, it is however limited by the single amplicon being interrogated. Similarly, quantitative and qualitative profiling of gene expression changes also represents a challenge for the cancer field. Both RT-PCR and microarrays are efficient approaches, but are limited to the genes present on the array or being assayed. This leaves vast swaths of the transcriptome, including non-coding RNAs and other features, unexplored. With the advent of the ability to collect and analyze genomic sequence data in a timely fashion and at an ever-decreasing cost, many of these limitations have been overcome and are being incorporated into cancer research and diagnostics giving patients and clinicians new hope for targeted and personalized treatment. Below we highlight the various applications of next-generation sequencing in precision cancer medicine.

  5. Analysis of gains and losses of DNA sequences along all human chromosomes by comparative genomic hybridization implicates 6q and several other chromosomal sites as putative tumor suppressor gene loci in prostate cancer

    SciTech Connect

    Visakorpi, T.; Karhu, R.; Kallioniemi, A.

    1994-09-01

    Genetic changes associated with the development of prostate cancer are poorly known. We sought to identify regions that contain important genes for the development of prostate cancer by using comparative genomic hybridization (CGH) for genome-wide screening of gains and losses of DNA sequences. In CGH, differentially labeled tumor and normal DNAs are co-hybridized to normal metaphase spreads to visualize chromosomal regions with losses and gains of DNA sequences. Analysis of 31 uncultured primary prostate cancers showed that deletions predominated over gains with a ratio of 5:1. The most commonly deleted regions were 8p; 32% (minimal common region p12-pter), 13q; 32% (q21-q31), 6q; 22% (cen-q21), 16q; 19% (cen-q23), 18q; 19% (q22-qter) and 9p; 16%(p23-pter). Gain of the entire long arm of chromosome 8 was found in 6% of cases but no high-level amplifications were found in any of the specimens. Of the aberrations found by CGH, 6q represents a previously unreported, major site for deletion in prostate cancer. Analysis of loss of heterozygosity (LOH) was used to confirm the presence of 6q and other deletions found by CGH. LOH and CGH data showed an about 75% concordance. The significance of genetic aberrations in prostate cancer are being evaluated by correlating CGH findings with clinical outcome as well as by comparing genetic changes observed in the primary tumor with those found in recurrent lesions and metastases of the same patient.

  6. Transforming clinical microbiology with bacterial genome sequencing

    PubMed Central

    2016-01-01

    Whole genome sequencing of bacteria has recently emerged as a cost-effective and convenient approach for addressing many microbiological questions. Here we review the current status of clinical microbiology and how it has already begun to be transformed by the use of next-generation sequencing. We focus on three essential tasks: identifying the species of an isolate, testing its properties such as resistance to antibiotics and virulence, and monitoring the emergence and spread of bacterial pathogens. The application of next-generation sequencing will soon be sufficiently fast, accurate and cheap to be used in routine clinical microbiology practice, where it could replace many complex current techniques with a single, more efficient workflow. PMID:22868263

  7. Transforming clinical microbiology with bacterial genome sequencing.

    PubMed

    Didelot, Xavier; Bowden, Rory; Wilson, Daniel J; Peto, Tim E A; Crook, Derrick W

    2012-09-01

    Whole-genome sequencing of bacteria has recently emerged as a cost-effective and convenient approach for addressing many microbiological questions. Here, we review the current status of clinical microbiology and how it has already begun to be transformed by using next-generation sequencing. We focus on three essential tasks: identifying the species of an isolate, testing its properties, such as resistance to antibiotics and virulence, and monitoring the emergence and spread of bacterial pathogens. We predict that the application of next-generation sequencing will soon be sufficiently fast, accurate and cheap to be used in routine clinical microbiology practice, where it could replace many complex current techniques with a single, more efficient workflow.

  8. Recurrent Somatic Mutations in Regulatory Regions of Human Cancer Genomes

    PubMed Central

    Melton, Collin; Reuter, Jason A.; Spacek, Damek V.; Snyder, Michael

    2015-01-01

    Aberrant regulation of gene expression in cancer can promote survival and proliferation of cancer cells. Here we integrate TCGA whole genome sequencing data of 436 patients from eight cancer subtypes with ENCODE and other regulatory annotations to identify point mutations in regulatory regions. We find evidence for positive selection of mutations in transcription factor binding sites, consistent with these sites regulating important cancer cell functions. Using a novel method that adjusts for sample- and genomic locus-specific mutation rate, we identify recurrently mutated sites across cancer patients. Mutated regulatory sites include known sites in the TERT promoter and many novel sites, including a subset in proximity to cancer genes. In reporter assays, two novel sites display decreased enhancer activity upon mutation. These data demonstrate that many regulatory regions contain mutations under selective pressure and suggest a larger role for regulatory mutations in cancer than previously appreciated. PMID:26053494

  9. Cancer Genomics: Diversity and Disparity Across Ethnicity and Geography.

    PubMed

    Tan, Daniel S W; Mok, Tony S K; Rebbeck, Timothy R

    2016-01-01

    Ethnic and geographic differences in cancer incidence, prognosis, and treatment outcomes can be attributed to diversity in the inherited (germline) and somatic genome. Although international large-scale sequencing efforts are beginning to unravel the genomic underpinnings of cancer traits, much remains to be known about the underlying mechanisms and determinants of genomic diversity. Carcinogenesis is a dynamic, complex phenomenon representing the interplay between genetic and environmental factors that results in divergent phenotypes across ethnicities and geography. For example, compared with whites, there is a higher incidence of prostate cancer among Africans and African Americans, and the disease is generally more aggressive and fatal. Genome-wide association studies have identified germline susceptibility loci that may account for differences between the African and non-African patients, but the lack of availability of appropriate cohorts for replication studies and the incomplete understanding of genomic architecture across populations pose major limitations. We further discuss the transformative potential of routine diagnostic evaluation for actionable somatic alterations, using lung cancer as an example, highlighting implications of population disparities, current hurdles in implementation, and the far-reaching potential of clinical genomics in enhancing cancer prevention, diagnosis, and treatment. As we enter the era of precision cancer medicine, a concerted multinational effort is key to addressing population and genomic diversity as well as overcoming barriers and geographical disparities in research and health care delivery.

  10. The UCSC Cancer Genomics Browser: update 2011.

    PubMed

    Sanborn, J Zachary; Benz, Stephen C; Craft, Brian; Szeto, Christopher; Kober, Kord M; Meyer, Laurence; Vaske, Charles J; Goldman, Mary; Smith, Kayla E; Kuhn, Robert M; Karolchik, Donna; Kent, W James; Stuart, Joshua M; Haussler, David; Zhu, Jingchun

    2011-01-01

    The UCSC Cancer Genomics Browser (https://genome-cancer.ucsc.edu) comprises a suite of web-based tools to integrate, visualize and analyze cancer genomics and clinical data. The browser displays whole-genome views of genome-wide experimental measurements for multiple samples alongside their associated clinical information. Multiple data sets can be viewed simultaneously as coordinated 'heatmap tracks' to compare across studies or different data modalities. Users can order, filter, aggregate, classify and display data interactively based on any given feature set including clinical features, annotated biological pathways and user-contributed collections of genes. Integrated standard statistical tools provide dynamic quantitative analysis within all available data sets. The browser hosts a growing body of publicly available cancer genomics data from a variety of cancer types, including data generated from the Cancer Genome Atlas project. Multiple consortiums use the browser on confidential prepublication data enabled by private installations. Many new features have been added, including the hgMicroscope tumor image viewer, hgSignature for real-time genomic signature evaluation on any browser track, and 'PARADIGM' pathway tracks to display integrative pathway activities. The browser is integrated with the UCSC Genome Browser; thus inheriting and integrating the Genome Browser's rich set of human biology and genetics data that enhances the interpretability of the cancer genomics data.

  11. Management of familial cancer: sequencing, surveillance and society.

    PubMed

    Samuel, Nardin; Villani, Anita; Fernandez, Conrad V; Malkin, David

    2014-12-01

    The clinical management of familial cancer begins with recognition of patterns of cancer occurrence suggestive of genetic susceptibility in a proband or pedigree, to enable subsequent investigation of the underlying DNA mutations. In this regard, next-generation sequencing of DNA continues to transform cancer diagnostics, by enabling screening for cancer-susceptibility genes in the context of known and emerging familial cancer syndromes. Increasingly, not only are candidate cancer genes sequenced, but also entire 'healthy' genomes are mapped in children with cancer and their family members. Although large-scale genomic analysis is considered intrinsic to the success of cancer research and discovery, a number of accompanying ethical and technical issues must be addressed before this approach can be adopted widely in personalized therapy. In this Perspectives article, we describe our views on how the emergence of new sequencing technologies and cancer surveillance strategies is altering the framework for the clinical management of hereditary cancer. Genetic counselling and disclosure issues are discussed, and strategies for approaching ethical dilemmas are proposed.

  12. Why Assembling Plant Genome Sequences Is So Challenging

    PubMed Central

    Claros, Manuel Gonzalo; Bautista, Rocío; Guerrero-Fernández, Darío; Benzerki, Hicham; Seoane, Pedro; Fernández-Pozo, Noé

    2012-01-01

    In spite of the biological and economic importance of plants, relatively few plant species have been sequenced. Only the genome sequence of plants with relatively small genomes, most of them angiosperms, in particular eudicots, has been determined. The arrival of next-generation sequencing technologies has allowed the rapid and efficient development of new genomic resources for non-model or orphan plant species. But the sequencing pace of plants is far from that of animals and microorganisms. This review focuses on the typical challenges of plant genomes that can explain why plant genomics is less developed than animal genomics. Explanations about the impact of some confounding factors emerging from the nature of plant genomes are given. As a result of these challenges and confounding factors, the correct assembly and annotation of plant genomes is hindered, genome drafts are produced, and advances in plant genomics are delayed. PMID:24832233

  13. Functional genomics of tomato in a post-genome-sequencing phase

    PubMed Central

    Aoki, Koh; Ogata, Yoshiyuki; Igarashi, Kaori; Yano, Kentaro; Nagasaki, Hideki; Kaminuma, Eli; Toyoda, Atsushi

    2013-01-01

    Completion of tomato genome sequencing project has broad impacts on genetic and genomic studies of tomato and Solanaceae plants. The reference genome sequence derived from Solanum lycopersicum cv ‘Heinz 1706’ serves as the firm basis for sequencing-based approaches to tomato genomics. In this article, we first present a brief summary of the genome sequencing project and a summary of the reference genome sequence. We then focus on recent progress in transcriptome sequencing and small RNA sequencing and show how the reference genome sequence makes these analyses more comprehensive than before. We discuss the potential of in-depth analysis that is based on DNA methylome sequencing and transcription start-site detection. Finally, we describe the current status of efforts to resequence S. lycopersicum cultivars to demonstrate how resequencing can allow the use of intraspecific genomic diversity for detailed phenotyping and breeding. PMID:23641177

  14. Revealing the Complexity of Breast Cancer by Next Generation Sequencing

    PubMed Central

    Verigos, John; Magklara, Angeliki

    2015-01-01

    Over the last few years the increasing usage of “-omic” platforms, supported by next-generation sequencing, in the analysis of breast cancer samples has tremendously advanced our understanding of the disease. New driver and passenger mutations, rare chromosomal rearrangements and other genomic aberrations identified by whole genome and exome sequencing are providing missing pieces of the genomic architecture of breast cancer. High resolution maps of breast cancer methylomes and sequencing of the miRNA microworld are beginning to paint the epigenomic landscape of the disease. Transcriptomic profiling is giving us a glimpse into the gene regulatory networks that govern the fate of the breast cancer cell. At the same time, integrative analysis of sequencing data confirms an extensive intertumor and intratumor heterogeneity and plasticity in breast cancer arguing for a new approach to the problem. In this review, we report on the latest findings on the molecular characterization of breast cancer using NGS technologies, and we discuss their potential implications for the improvement of existing therapies. PMID:26561834

  15. Simple sequence repeats in bryophyte mitochondrial genomes.

    PubMed

    Zhao, Chao-Xian; Zhu, Rui-Liang; Liu, Yang

    2016-01-01

    Simple sequence repeats (SSRs) are thought to be common in plant mitochondrial (mt) genomes, but have yet to be fully described for bryophytes. We screened the mt genomes of two liverworts (Marchantia polymorpha and Pleurozia purpurea), two mosses (Physcomitrella patens and Anomodon rugelii) and two hornworts (Phaeoceros laevis and Nothoceros aenigmaticus), and detected 475 SSRs. Some SSRs are found conserved during the evolution, among which except one exists in both liverworts and mosses, all others are shared only by the two liverworts, mosses or hornworts. SSRs are known as DNA tracts having high mutation rates; however, according to our observations, they still can evolve slowly. The conservativeness of these SSRs suggests that they are under strong selection and could play critical roles in maintaining the gene functions.

  16. Complete mitochondrial genome sequence of Nectogale elegans.

    PubMed

    Huang, Ting; Yan, Chaochao; Tan, Zheng; Tu, Feiyun; Yue, Bisong; Zhang, Xiuyue

    2014-08-01

    The elegant water shrew (Nectogale elegans) belongs to the family Soricidae, and distributes in northern South Asia, central and southern China and northern Southeast Asia. In this study, the complete mitochondrial genome of N. elegans was sequenced. It was determined to be 17,460 bases, and included 13 protein-coding genes (PCGs), 22 tRNA genes, 2 ribosomal RNA genes and one non-coding region, which is similar to other mammalian mitochondrial genomes. Bayesian inference and maximum likelihood methods were used to construct phylogenetic trees based on 12 heavy-strand concatenated PCGs. Phylogenetic analyses further confirmed that Crocidurinae diverged prior to Soricinae, and Sorex unguiculatus differentiated earlier than N. elegans.

  17. MBD-isolated Genome Sequencing provides a high-throughput and comprehensive survey of DNA methylation in the human genome.

    PubMed

    Serre, David; Lee, Byron H; Ting, Angela H

    2010-01-01

    DNA methylation is an epigenetic modification involved in both normal developmental processes and disease states through the modulation of gene expression and the maintenance of genomic organization. Conventional methods of DNA methylation analysis, such as bisulfite sequencing, methylation sensitive restriction enzyme digestion and array-based detection techniques, have major limitations that impede high-throughput genome-wide analysis. We describe a novel technique, MBD-isolated Genome Sequencing (MiGS), which combines precipitation of methylated DNA by recombinant methyl-CpG binding domain of MBD2 protein and sequencing of the isolated DNA by a massively parallel sequencer. We utilized MiGS to study three isogenic cancer cell lines with varying degrees of DNA methylation. We successfully detected previously known methylated regions in these cells and identified hundreds of novel methylated regions. This technique is highly specific and sensitive and can be applied to any biological settings to identify differentially methylated regions at the genomic scale.

  18. Investigating core genetic-and-epigenetic cell cycle networks for stemness and carcinogenic mechanisms, and cancer drug design using big database mining and genome-wide next-generation sequencing data

    PubMed Central

    Li, Cheng-Wei; Chen, Bor-Sen

    2016-01-01

    ABSTRACT Recent studies have demonstrated that cell cycle plays a central role in development and carcinogenesis. Thus, the use of big databases and genome-wide high-throughput data to unravel the genetic and epigenetic mechanisms underlying cell cycle progression in stem cells and cancer cells is a matter of considerable interest. Real genetic-and-epigenetic cell cycle networks (GECNs) of embryonic stem cells (ESCs) and HeLa cancer cells were constructed by applying system modeling, system identification, and big database mining to genome-wide next-generation sequencing data. Real GECNs were then reduced to core GECNs of HeLa cells and ESCs by applying principal genome-wide network projection. In this study, we investigated potential carcinogenic and stemness mechanisms for systems cancer drug design by identifying common core and specific GECNs between HeLa cells and ESCs. Integrating drug database information with the specific GECNs of HeLa cells could lead to identification of multiple drugs for cervical cancer treatment with minimal side-effects on the genes in the common core. We found that dysregulation of miR-29C, miR-34A, miR-98, and miR-215; and methylation of ANKRD1, ARID5B, CDCA2, PIF1, STAMBPL1, TROAP, ZNF165, and HIST1H2AJ in HeLa cells could result in cell proliferation and anti-apoptosis through NFκB, TGF-β, and PI3K pathways. We also identified 3 drugs, methotrexate, quercetin, and mimosine, which repressed the activated cell cycle genes, ARID5B, STK17B, and CCL2, in HeLa cells with minimal side-effects. PMID:27295129

  19. Identifying Human Genome-Wide CNV, LOH and UPD by Targeted Sequencing of Selected Regions.

    PubMed

    Wang, Yu; Li, Wei; Xia, Yingying; Wang, Chongzhi; Tang, Y Tom; Guo, Wenying; Li, Jinliang; Zhao, Xia; Sun, Yepeng; Hu, Juan; Zhen, Hefu; Zhang, Xiandong; Chen, Chao; Shi, Yujian; Li, Lin; Cao, Hongzhi; Du, Hongli; Li, Jian

    2014-01-01

    Copy-number variations (CNV), loss of heterozygosity (LOH), and uniparental disomy (UPD) are large genomic aberrations leading to many common inherited diseases, cancers, and other complex diseases. An integrated tool to identify these aberrations is essential in understanding diseases and in designing clinical interventions. Previous discovery methods based on whole-genome sequencing (WGS) require very high depth of coverage on the whole genome scale, and are cost-wise inefficient. Another approach, whole exome genome sequencing (WEGS), is limited to discovering variations within exons. Thus, we are lacking efficient methods to detect genomic aberrations on the whole genome scale using next-generation sequencing technology. Here we present a method to identify genome-wide CNV, LOH and UPD for the human genome via selectively sequencing a small portion of genome termed Selected Target Regions (SeTRs). In our experiments, the SeTRs are covered by 99.73%~99.95% with sufficient depth. Our developed bioinformatics pipeline calls genome-wide CNVs with high confidence, revealing 8 credible events of LOH and 3 UPD events larger than 5M from 15 individual samples. We demonstrate that genome-wide CNV, LOH and UPD can be detected using a cost-effective SeTRs sequencing approach, and that LOH and UPD can be identified using just a sample grouping technique, without using a matched sample or familial information. PMID:25919136

  20. Initial sequencing and comparative analysis of the mouse genome

    SciTech Connect

    Waterston, Robert H.; Lindblad-Toh, Kerstin; Birney, Ewan; Rogers, Jane; Abril, Josep F.; Agarwal, Pankaj; Agarwala, Richa; Ainscough, Rachel; Alexandersson, Marina; An, Peter; Antonarakis, Stylianos E.; Attwood, John; Baertsch, Robert; Bailey, Jonathon; Barlow, Karen; Beck, Stephan; Berry, Eric; Birren, Bruce; Bloom, Toby; Bork, Peer; Botcherby, Marc; Bray, Nicolas; Brent, Michael R.; Brown, Daniel G.; Brown, Stephen D.; Bult, Carol; Burton, John; Butler, Jonathan; Campbell, Robert D.; Carninci, Piero; Cawley, Simon; Chiaromonte, Francesca; Chinwalla, Asif T.; Church, Deanna M.; Clamp, Michele; Clee, Christopher; Collins, Francis S.; Cook, Lisa L.; Copley, Richard R.; Coulson, Alan; Couronne, Olivier; Cuff, James; Curwen, Val; Cutts, Tim; Daly, Mark; David, Robert; Davies, Joy; Delehaunty, Kimberly D.; Deri, Justin; Dermitzakis, Emmanouil T.; Dewey, Colin; Dickens, Nicholas J.; Diekhans, Mark; Dodge, Sheila; Dubchak, Inna; Dunn, Diane M.; Eddy, Sean R.; Elnitski, Laura; Emes, Richard D.; Eswara, Pallavi; Eyras, Eduardo; Felsenfeld, Adam; Fewell, Ginger A.; Flicek, Paul; Foley, Karen; Frankel, Wayne N.; Fulton, Lucinda A.; Fulton, Robert S.; Furey, Terrence S.; Gage, Diane; Gibbs, Richard A.; Glusman, Gustavo; Gnerre, Sante; Goldman, Nick; Goodstadt, Leo; Grafham, Darren; Graves, Tina A.; Green, Eric D.; Gregory, Simon; Guigo, Roderic; Guyer, Mark; Hardison, Ross C.; Haussler, David; Hayashizaki, Yoshihide; Hillier, LaDeana W.; Hinrichs, Angela; Hlavina, Wratko; Holzer, Timothy; Hsu, Fan; Hua, Axin; Hubbard, Tim; Hunt, Adrienne; Jackson, Ian; Jaffe, David B.; Johnson, L. Steven; Jones, Matthew; Jones, Thomas A.; Joy, Ann; Kamal, Michael; Karlsson, Elinor K.; Karolchik, Donna; Kasprzyk, Arkadiusz; Kawai, Jun; Keibler, Evan; Kells, Cristyn; Kent, W. James; Kirby, Andrew; Kolbe, Diana L.; Korf, Ian; Kucherlapati, Raju S.; Kulbokas III, Edward J.; Kulp, David; Landers, Tom; Leger, J.P.; Leonard, Steven; Letunic, Ivica; Levine, Rosie; et al.

    2002-12-15

    The sequence of the mouse genome is a key informational tool for understanding the contents of the human genome and a key experimental tool for biomedical research. Here, we report the results of an international collaboration to produce a high-quality draft sequence of the mouse genome. We also present an initial comparative analysis of the mouse and human genomes, describing some of the insights that can be gleaned from the two sequences. We discuss topics including the analysis of the evolutionary forces shaping the size, structure and sequence of the genomes; the conservation of large-scale synteny across most of the genomes; the much lower extent of sequence orthology covering less than half of the genomes; the proportions of the genomes under selection; the number of protein-coding genes; the expansion of gene families related to reproduction and immunity; the evolution of proteins; and the identification of intraspecies polymorphism.

  1. Next-Generation Sequencing: Role in Gynecologic Cancers.

    PubMed

    Evans, Tarra; Matulonis, Ursula

    2016-09-01

    Next-generation sequencing (NGS) has risen to the forefront of tumor analysis and has enabled unprecedented advances in the molecular profiling of solid tumors. Through massively parallel sequencing, previously unrecognized genomic alterations have been unveiled in many malignancies, including gynecologic cancers, thus expanding the potential repertoire for the use of targeted therapies. NGS has expanded the understanding of the genomic foundation of gynecologic malignancies and has allowed identification of germline and somatic mutations associated with cancer development, enabled tumor reclassification, and helped determine mechanisms of treatment resistance. NGS has also facilitated rationale therapeutic strategies based on actionable molecular aberrations. However, issues remain regarding cost and clinical utility. This review covers NGS analysis of and its impact thus far on gynecologic cancers, specifically ovarian, endometrial, cervical, and vulvar cancers. PMID:27587626

  2. Painting the rye genome with genome-specific sequences.

    PubMed

    González-García, Miriam; Cuacos, María; González-Sánchez, Mónica; Puertas, María J; Vega, Juan M

    2011-07-01

    We used rye-specific repetitive DNA sequences in fluorescence in situ hybridization (FISH) to paint the rye genome and to identify rye DNA in a wheat background. A 592 bp fragment from the rye-specific dispersed repetitive family R173 (named UCM600) was cloned and used as a FISH probe. UCM600 is dispersed over the seven rye chromosomes, being absent from the pericentromeric and subtelomeric regions. A similar pattern of distribution was also observed on the rye B chromosomes, but with weaker signals. The FISH hybridization patterns using UCM600 as probe were comparable with those obtained with the genomic in situ hybridization (GISH) procedure. There were, however, sharper signals and less background with FISH. UCM600 was combined with the rye-specific sequences Bilby and pSc200 to obtain a more complete painting. With these probes, the rye chromosomes were labeled with distinctive patterns; thus, allowing the rye cultivar 'Imperial' to be karyotyped. It was also possible to distinguish rye chromosomes in triticale and alien rye chromatin in wheat-rye addition and translocation lines. The distribution of UCM600 was similar in cultivated rye and in the wild Secale species Secale vavilovii Grossh., Secale sylvestre Host, and Secale africanum Stapf. Thus, UCM600 can be used to detect Secale DNA introgressed from wild species in a wheat background.

  3. First Draft Genome Sequence of Staphylococcus condimenti F-2T

    PubMed Central

    Zheng, Beiwen; Hu, Xinjun; Jiang, Xiawei; Li, Ang; Yao, Jian

    2016-01-01

    This report describes the draft genome sequence of S. condimenti strain F-2T (DSM 11674), a potential starter culture. The genome assembly comprised 2,616,174 bp with 34.6% GC content. To the best of our knowledge, this is the first documentation that reports the whole-genome sequence of S. condimenti. PMID:27257207

  4. Genome sequence of the Pea Aphid Acyrthosiphon pisum

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The International aphid genome consortium, IAGC, herein presents the 464 Mb draft genome assembly sequence of the pea aphid Acyrthosiphon pisum. This is the first published whole genome sequence from the diverse assemblage of hemimetabolous insects, providing an outgroup to the multiple published g...

  5. Draft Genome Sequence of the Fungus Trametes hirsuta 072

    PubMed Central

    Tyazhelova, Tatiana V.; Moiseenko, Konstantin V.; Vasina, Daria V.; Mosunova, Olga V.; Fedorova, Tatiana V.; Maloshenok, Lilya G.; Landesman, Elena O.; Bruskin, Sergei A.; Psurtseva, Nadezhda V.; Slesarev, Alexei I.; Kozyavkin, Sergei A.; Koroleva, Olga V.

    2015-01-01

    A standard draft genome sequence of the white rot saprotrophic fungus Trametes hirsuta 072 (Basidiomycota, Polyporales) is presented. The genome sequence contains about 33.6 Mb assembled in 141 scaffolds with a G+C content of ~57.6%. The draft genome annotation predicts 14,598 putative protein-coding open reading frames (ORFs). PMID:26586872

  6. CGCI Investigators Reveal Comprehensive Landscape of Diffuse Large B-Cell Lymphoma (DLBCL) Genomes | Office of Cancer Genomics

    Cancer.gov

    Researchers from British Columbia Cancer Agency used whole genome sequencing to analyze 40 DLBCL cases and 13 cell lines in order to fill in the gaps of the complex landscape of DLBCL genomes. Their analysis, “Mutational and structural analysis of diffuse large B-cell lymphoma using whole genome sequencing,” was published online in Blood on May 22. The authors are Ryan Morin, Marco Marra, and colleagues.  

  7. Detecting long tandem duplications in genomic sequences

    PubMed Central

    2012-01-01

    Background Detecting duplication segments within completely sequenced genomes provides valuable information to address genome evolution and in particular the important question of the emergence of novel functions. The usual approach to gene duplication detection, based on all-pairs protein gene comparisons, provides only a restricted view of duplication. Results In this paper, we introduce ReD Tandem, a software using a flow based chaining algorithm targeted at detecting tandem duplication arrays of moderate to longer length regions, with possibly locally weak similarities, directly at the DNA level. On the A. thaliana genome, using a reference set of tandem duplicated genes built using TAIR,a we show that ReD Tandem is able to predict a large fraction of recently duplicated genes (dS < 1) and that it is also able to predict tandem duplications involving non coding elements such as pseudo-genes or RNA genes. Conclusions ReD Tandem allows to identify large tandem duplications without any annotation, leading to agnostic identification of tandem duplications. This approach nicely complements the usual protein gene based which ignores duplications involving non coding regions. It is however inherently restricted to relatively recent duplications. By recovering otherwise ignored events, ReD Tandem gives a more comprehensive view of existing evolutionary processes and may also allow to improve existing annotations. PMID:22568762

  8. Genomic Sequence Comparisons, 1987-2003 Final Report

    SciTech Connect

    George M. Church

    2004-07-29

    This project was to develop new DNA sequencing and RNA and protein quantitation methods and related genome annotation tools. The project began in 1987 with the development of multiplex sequencing (published in Science in 1988), and one of the first automated sequencing methods. This lead to the first commercial genome sequence in 1994 and to the establishment of the main commercial participants (GTC then Agencourt) in the public DOE/NIH genome project. In collaboration with GTC we contributed to one of the first complete DOE genome sequences, in 1997, that of Methanobacterium thermoautotropicum, a species of great relevance to energy-rich gas production.

  9. Complete genome sequence of Methanocorpusculum labreanum type strain Z

    SciTech Connect

    Anderson, Iain; Sieprawska-Lupa, Magdalena; Goltsman, Eugene; Lapidus, Alla L.; Copeland, A; Glavina Del Rio, Tijana; Tice, Hope; Dalin, Eileen; Barry, Kerrie; Pitluck, Sam; Hauser, Loren John; Land, Miriam L; Lucas, Susan; Richardson, P M; Whitman, W. B.; Kyrpides, Nikos C

    2009-01-01

    Methanocorpusculum labreanum is a methanogen belonging to the order Methanomicrobiales within the archaeal phylum Euryarchaeota. The type strain Z was isolated from surface sediments of Tar Pit Lake in the La Brea Tar Pits in Los Angeles, California. M. labreanum is of phylogenetic interest because at the time the sequencing project began only one genome had previously been sequenced from the order Methanomicrobiales. We report here the complete genome sequence of M. labreanum type strain Z and its annotation. This is part of a 2006 Joint Genome Institute Community Sequencing Program project to sequence genomes of diverse Archaea.

  10. Complete genome sequence of Methanoculleus marisnigri type strain JR1

    SciTech Connect

    Anderson, Iain; Sieprawska-Lupa, Magdalena; Goltsman, Eugene; Lapidus, Alla L.; Copeland, A; Glavina Del Rio, Tijana; Tice, Hope; Dalin, Eileen; Barry, Kerrie; Saunders, Elizabeth H; Han, Cliff; Brettin, Tom; Detter, J. Chris; Bruce, David; Mikhailova, Natalia; Pitluck, Sam; Hauser, Loren John; Land, Miriam L; Lucas, Susan; Richardson, P M; Whitman, W. B.; Kyrpides, Nikos C

    2009-01-01

    Methanoculleus marisnigri Romesser et al. 1981 is a methanogen belonging to the order Methanomicrobiales within the archaeal phylum Euryarchaeota. The type strain, JR1, was isolated from anoxic sediments of the Black Sea. M. marisnigri is of phylogenetic interest because at the time the sequencing project began only one genome had previously been sequenced from the order Methanomicrobiales. We report here the complete genome sequence of M. marisnigri type strain JR1 and its annotation. This is part of a Joint Genome Institute 2006 Community Sequencing Program to sequence genomes of diverse Archaea.

  11. Rapid whole genome sequencing and precision neonatology.

    PubMed

    Petrikin, Joshua E; Willig, Laurel K; Smith, Laurie D; Kingsmore, Stephen F

    2015-12-01

    Traditionally, genetic testing has been too slow or perceived to be impractical to initial management of the critically ill neonate. Technological advances have led to the ability to sequence and interpret the entire genome of a neonate in as little as 26 h. As the cost and speed of testing decreases, the utility of whole genome sequencing (WGS) of neonates for acute and latent genetic illness increases. Analyzing the entire genome allows for concomitant evaluation of the currently identified 5588 single gene diseases. When applied to a select population of ill infants in a level IV neonatal intensive care unit, WGS yielded a diagnosis of a causative genetic disease in 57% of patients. These diagnoses may lead to clinical management changes ranging from transition to palliative care for uniformly lethal conditions for alteration or initiation of medical or surgical therapy to improve outcomes in others. Thus, institution of 2-day WGS at time of acute presentation opens the possibility of early implementation of precision medicine. This implementation may create opportunities for early interventional, frequently novel or off-label therapies that may alter disease trajectory in infants with what would otherwise be fatal disease. Widespread deployment of rapid WGS and precision medicine will raise ethical issues pertaining to interpretation of variants of unknown significance, discovery of incidental findings related to adult onset conditions and carrier status, and implementation of medical therapies for which little is known in terms of risks and benefits. Despite these challenges, precision neonatology has significant potential both to decrease infant mortality related to genetic diseases with onset in newborns and to facilitate parental decision making regarding transition to palliative care.

  12. Rapid whole genome sequencing and precision neonatology.

    PubMed

    Petrikin, Joshua E; Willig, Laurel K; Smith, Laurie D; Kingsmore, Stephen F

    2015-12-01

    Traditionally, genetic testing has been too slow or perceived to be impractical to initial management of the critically ill neonate. Technological advances have led to the ability to sequence and interpret the entire genome of a neonate in as little as 26 h. As the cost and speed of testing decreases, the utility of whole genome sequencing (WGS) of neonates for acute and latent genetic illness increases. Analyzing the entire genome allows for concomitant evaluation of the currently identified 5588 single gene diseases. When applied to a select population of ill infants in a level IV neonatal intensive care unit, WGS yielded a diagnosis of a causative genetic disease in 57% of patients. These diagnoses may lead to clinical management changes ranging from transition to palliative care for uniformly lethal conditions for alteration or initiation of medical or surgical therapy to improve outcomes in others. Thus, institution of 2-day WGS at time of acute presentation opens the possibility of early implementation of precision medicine. This implementation may create opportunities for early interventional, frequently novel or off-label therapies that may alter disease trajectory in infants with what would otherwise be fatal disease. Widespread deployment of rapid WGS and precision medicine will raise ethical issues pertaining to interpretation of variants of unknown significance, discovery of incidental findings related to adult onset conditions and carrier status, and implementation of medical therapies for which little is known in terms of risks and benefits. Despite these challenges, precision neonatology has significant potential both to decrease infant mortality related to genetic diseases with onset in newborns and to facilitate parental decision making regarding transition to palliative care. PMID:26521050

  13. Weighted gene co-expression network analysis of colorectal cancer liver metastasis genome sequencing data and screening of anti-metastasis drugs.

    PubMed

    Gao, Bo; Shao, Qin; Choudhry, Hani; Marcus, Victoria; Dong, Kung; Ragoussis, Jiannis; Gao, Zu-Hua

    2016-09-01

    Approximately 9% of cancer-related deaths are caused by colorectal cancer (CRC). CRC patients are prone to liver metastasis, which is the most important cause for the high CRC mortality rate. Understanding the molecular mechanism of CRC liver metastasis could help us to find novel targets for the effective treatment of this deadly disease. Using weighted gene co-expression network analysis on the sequencing data of CRC with and with metastasis, we identified 5 colorectal cancer liver metastasis related modules which were labeled as brown, blue, grey, yellow and turquoise. In the brown module, which represents the metastatic tumor in the liver, gene ontology (GO) analysis revealed functions including the G-protein coupled receptor protein signaling pathway, epithelial cell differentiation and cell surface receptor linked signal transduction. In the blue module, which represents the primary CRC that has metastasized, GO analysis showed that the genes were mainly enriched in GO terms including G-protein coupled receptor protein signaling pathway, cell surface receptor linked signal transduction, and negative regulation of cell differentiation. In the yellow and turquoise modules, which represent the primary non-metastatic CRC, 13 downregulated CRC liver metastasis-related candidate miRNAs were identified (e.g. hsa-miR-204, hsa-miR-455, etc.). Furthermore, analyzing the DrugBank database and mining the literature identified 25 and 12 candidate drugs that could potentially block the metastatic processes of the primary tumor and inhibit the progression of metastatic tumors in the liver, respectively. Data generated from this study not only furthers our understanding of the genetic alterations that drive the metastatic process, but also guides the development of molecular-targeted therapy of colorectal cancer liver metastasis. PMID:27571956

  14. Weighted gene co-expression network analysis of colorectal cancer liver metastasis genome sequencing data and screening of anti-metastasis drugs.

    PubMed

    Gao, Bo; Shao, Qin; Choudhry, Hani; Marcus, Victoria; Dong, Kung; Ragoussis, Jiannis; Gao, Zu-Hua

    2016-09-01

    Approximately 9% of cancer-related deaths are caused by colorectal cancer (CRC). CRC patients are prone to liver metastasis, which is the most important cause for the high CRC mortality rate. Understanding the molecular mechanism of CRC liver metastasis could help us to find novel targets for the effective treatment of this deadly disease. Using weighted gene co-expression network analysis on the sequencing data of CRC with and with metastasis, we identified 5 colorectal cancer liver metastasis related modules which were labeled as brown, blue, grey, yellow and turquoise. In the brown module, which represents the metastatic tumor in the liver, gene ontology (GO) analysis revealed functions including the G-protein coupled receptor protein signaling pathway, epithelial cell differentiation and cell surface receptor linked signal transduction. In the blue module, which represents the primary CRC that has metastasized, GO analysis showed that the genes were mainly enriched in GO terms including G-protein coupled receptor protein signaling pathway, cell surface receptor linked signal transduction, and negative regulation of cell differentiation. In the yellow and turquoise modules, which represent the primary non-metastatic CRC, 13 downregulated CRC liver metastasis-related candidate miRNAs were identified (e.g. hsa-miR-204, hsa-miR-455, etc.). Furthermore, analyzing the DrugBank database and mining the literature identified 25 and 12 candidate drugs that could potentially block the metastatic processes of the primary tumor and inhibit the progression of metastatic tumors in the liver, respectively. Data generated from this study not only furthers our understanding of the genetic alterations that drive the metastatic process, but also guides the development of molecular-targeted therapy of colorectal cancer liver metastasis.

  15. Sequence analysis of mutations and translocations across breast cancer subtypes

    PubMed Central

    Banerji, Shantanu; Cibulskis, Kristian; Rangel-Escareno, Claudia; Brown, Kristin K.; Carter, Scott L.; Frederick, Abbie M.; Lawrence, Michael S.; Sivachenko, Andrey Y.; Sougnez, Carrie; Zou, Lihua; Cortes, Maria L.; Fernandez-Lopez, Juan C.; Peng, Shouyong; Ardlie, Kristin G.; Auclair, Daniel; Bautista-Piña, Veronica; Duke, Fujiko; Francis, Joshua; Jung, Joonil; Maffuz-Aziz, Antonio; Onofrio, Robert C.; Parkin, Melissa; Pho, Nam H.; Quintanar-Jurado, Valeria; Ramos, Alex H.; Rebollar-Vega, Rosa; Rodriguez-Cuevas, Sergio; Romero-Cordoba, Sandra L.; Schumacher, Steven E.; Stransky, Nicolas; Thompson, Kristin M.; Uribe-Figueroa, Laura; Baselga, Jose; Beroukhim, Rameen; Polyak, Kornelia; Sgroi, Dennis C.; Richardson, Andrea L.; Jimenez-Sanchez, Gerardo; Lander, Eric S.; Gabriel, Stacey B.; Garraway, Levi A.; Golub, Todd R.; Melendez-Zajgla, Jorge; Toker, Alex; Getz, Gad; Hidalgo-Miranda, Alfredo; Meyerson, Matthew

    2014-01-01

    Breast carcinoma is the leading cause of cancer-related mortality in women worldwide with an estimated 1.38 million new cases and 458,000 deaths in 2008 alone1. This malignancy represents a heterogeneous group of tumours with characteristic molecular features, prognosis, and responses to available therapy2–4. Recurrent somatic alterations in breast cancer have been described including mutations and copy number alterations, notably ERBB2 amplifications, the first successful therapy target defined by a genomic aberration5. Prior DNA sequencing studies of breast cancer genomes have revealed additional candidate mutations and gene rearrangements 6–10. Here we report the whole-exome sequences of DNA from 103 human breast cancers of diverse subtypes from patients in Mexico and Vietnam compared to matched-normal DNA, together with whole-genome sequences of 22 breast cancer/normal pairs. Beyond confirming recurrent somatic mutations in PIK3CA11, TP536, AKT112, GATA313, and MAP3K110, we discovered recurrent mutations in the CBFB transcription factor gene and deletions of its partner RUNX1. Furthermore, we have identified a recurrent MAGI3-AKT3 fusion enriched in triple-negative breast cancer lacking estrogen and progesterone receptors and ERBB2 expression. The Magi3-Akt3 fusion leads to constitutive activation of Akt kinase, which is abolished by treatment with an ATP-competitive Akt small-molecule inhibitor. PMID:22722202

  16. Whole Genome Sequencing of Newly Established Pancreatic Cancer Lines Identifies Novel Somatic Mutation (c.2587G>A) in Axon Guidance Receptor Plexin A1 as Enhancer of Proliferation and Invasion

    PubMed Central

    Abisoye-Ogunniyan, Abisola; Waterfall, Joshua J.; Davis, Sean; Killian, J. Keith; Pineda, Marbin; Ray, Satyajit; McCord, Matt R.; Pflicke, Holger; Burkett, Sandra Sczerba; Meltzer, Paul S.; Rudloff, Udo

    2016-01-01

    The genetic profile of human pancreatic cancers harbors considerable heterogeneity, which suggests a possible explanation for the pronounced inefficacy of single therapies in this disease. This observation has led to a belief that custom therapies based on individual tumor profiles are necessary to more effectively treat pancreatic cancer. It has recently been discovered that axon guidance genes are affected by somatic structural variants in up to 25% of human pancreatic cancers. Thus far, however, some of these mutations have only been correlated to survival probability and no function has been assigned to these observed axon guidance gene mutations in pancreatic cancer. In this study we established three novel pancreatic cancer cell lines and performed whole genome sequencing to discover novel mutations in axon guidance genes that may contribute to the cancer phenotype of these cells. We discovered, among other novel somatic variants in axon guidance pathway genes, a novel mutation in the PLXNA1 receptor (c.2587G>A) in newly established cell line SB.06 that mediates oncogenic cues of increased invasion and proliferation in SB.06 cells and increased invasion in 293T cells upon stimulation with the receptor’s natural ligand semaphorin 3A compared to wild type PLXNA1 cells. Mutant PLXNA1 signaling was associated with increased Rho-GTPase and p42/p44 MAPK signaling activity and cytoskeletal expansion, but not changes in E-cadherin, vimentin, or metalloproteinase 9 expression levels. Pharmacologic inhibition of the Rho-GTPase family member CDC42 selectively abrogated PLXNA1 c.2587G>A-mediated increased invasion. These findings provide in-vitro confirmation that somatic mutations in axon guidance genes can provide oncogenic gain-of-function signals and may contribute to pancreatic cancer progression. PMID:26962861

  17. Whole Genome Sequencing of Newly Established Pancreatic Cancer Lines Identifies Novel Somatic Mutation (c.2587G>A) in Axon Guidance Receptor Plexin A1 as Enhancer of Proliferation and Invasion.

    PubMed

    Sorber, Rebecca; Teper, Yaroslav; Abisoye-Ogunniyan, Abisola; Waterfall, Joshua J; Davis, Sean; Killian, J Keith; Pineda, Marbin; Ray, Satyajit; McCord, Matt R; Pflicke, Holger; Burkett, Sandra Sczerba; Meltzer, Paul S; Rudloff, Udo

    2016-01-01

    The genetic profile of human pancreatic cancers harbors considerable heterogeneity, which suggests a possible explanation for the pronounced inefficacy of single therapies in this disease. This observation has led to a belief that custom therapies based on individual tumor profiles are necessary to more effectively treat pancreatic cancer. It has recently been discovered that axon guidance genes are affected by somatic structural variants in up to 25% of human pancreatic cancers. Thus far, however, some of these mutations have only been correlated to survival probability and no function has been assigned to these observed axon guidance gene mutations in pancreatic cancer. In this study we established three novel pancreatic cancer cell lines and performed whole genome sequencing to discover novel mutations in axon guidance genes that may contribute to the cancer phenotype of these cells. We discovered, among other novel somatic variants in axon guidance pathway genes, a novel mutation in the PLXNA1 receptor (c.2587G>A) in newly established cell line SB.06 that mediates oncogenic cues of increased invasion and proliferation in SB.06 cells and increased invasion in 293T cells upon stimulation with the receptor's natural ligand semaphorin 3A compared to wild type PLXNA1 cells. Mutant PLXNA1 signaling was associated with increased Rho-GTPase and p42/p44 MAPK signaling activity and cytoskeletal expansion, but not changes in E-cadherin, vimentin, or metalloproteinase 9 expression levels. Pharmacologic inhibition of the Rho-GTPase family member CDC42 selectively abrogated PLXNA1 c.2587G>A-mediated increased invasion. These findings provide in-vitro confirmation that somatic mutations in axon guidance genes can provide oncogenic gain-of-function signals and may contribute to pancreatic cancer progression. PMID:26962861

  18. Whole Genome Sequencing of Newly Established Pancreatic Cancer Lines Identifies Novel Somatic Mutation (c.2587G>A) in Axon Guidance Receptor Plexin A1 as Enhancer of Proliferation and Invasion.

    PubMed

    Sorber, Rebecca; Teper, Yaroslav; Abisoye-Ogunniyan, Abisola; Waterfall, Joshua J; Davis, Sean; Killian, J Keith; Pineda, Marbin; Ray, Satyajit; McCord, Matt R; Pflicke, Holger; Burkett, Sandra Sczerba; Meltzer, Paul S; Rudloff, Udo

    2016-01-01

    The genetic profile of human pancreatic cancers harbors considerable heterogeneity, which suggests a possible explanation for the pronounced inefficacy of single therapies in this disease. This observation has led to a belief that custom therapies based on individual tumor profiles are necessary to more effectively treat pancreatic cancer. It has recently been discovered that axon guidance genes are affected by somatic structural variants in up to 25% of human pancreatic cancers. Thus far, however, some of these mutations have only been correlated to survival probability and no function has been assigned to these observed axon guidance gene mutations in pancreatic cancer. In this study we established three novel pancreatic cancer cell lines and performed whole genome sequencing to discover novel mutations in axon guidance genes that may contribute to the cancer phenotype of these cells. We discovered, among other novel somatic variants in axon guidance pathway genes, a novel mutation in the PLXNA1 receptor (c.2587G>A) in newly established cell line SB.06 that mediates oncogenic cues of increased invasion and proliferation in SB.06 cells and increased invasion in 293T cells upon stimulation with the receptor's natural ligand semaphorin 3A compared to wild type PLXNA1 cells. Mutant PLXNA1 signaling was associated with increased Rho-GTPase and p42/p44 MAPK signaling activity and cytoskeletal expansion, but not changes in E-cadherin, vimentin, or metalloproteinase 9 expression levels. Pharmacologic inhibition of the Rho-GTPase family member CDC42 selectively abrogated PLXNA1 c.2587G>A-mediated increased invasion. These findings provide in-vitro confirmation that somatic mutations in axon guidance genes can provide oncogenic gain-of-function signals and may contribute to pancreatic cancer progression.

  19. Complete Genomic Sequence of Duck Flavivirus from China

    PubMed Central

    Liu, Ming; Liu, Chunguo; Li, Gang; Li, Xiaojun; Yin, Xiuchen; Chen, Yuhuan

    2012-01-01

    We report here the complete genomic sequence of the Chinese duck flavivirus TA strain. This work is the first to document the complete genomic sequence of this previously unknown duck flavivirus strain. The sequence will help further relevant epidemiological studies and extend our general knowledge of flaviviruses. PMID:22354941

  20. Complete Genome Sequence of Rift Valley Fever Virus Strain Lunyo.

    PubMed

    Lumley, Sarah; Horton, Daniel L; Marston, Denise A; Johnson, Nicholas; Ellis, Richard J; Fooks, Anthony R; Hewson, Roger

    2016-04-14

    Using next-generation sequencing technologies, the first complete genome sequence of Rift Valley fever virus strain Lunyo is reported here. Originally reported as an attenuated antigenic variant strain from Uganda, genomic sequence analysis shows that Lunyo clusters together with other Ugandan isolates.

  1. Complete Genome Sequence of Rift Valley Fever Virus Strain Lunyo

    PubMed Central

    Horton, Daniel L.; Marston, Denise A.; Johnson, Nicholas; Ellis, Richard J.; Fooks, Anthony R.; Hewson, Roger

    2016-01-01

    Using next-generation sequencing technologies, the first complete genome sequence of Rift Valley fever virus strain Lunyo is reported here. Originally reported as an attenuated antigenic variant strain from Uganda, genomic sequence analysis shows that Lunyo clusters together with other Ugandan isolates. PMID:27081121

  2. First Complete Genome Sequence of Cherry virus A

    PubMed Central

    Koinuma, Hiroaki; Nijo, Takamichi; Iwabuchi, Nozomu; Yoshida, Tetsuya; Keima, Takuya; Okano, Yukari; Maejima, Kensaku; Yamaji, Yasuyuki

    2016-01-01

    The 5′-terminal genomic sequence of Cherry virus A (CVA) has long been unknown. We determined the first complete genome sequence of an apricot isolate of CVA (7,434 nucleotides [nt]). The 5′-untranslated region was 107 nt in length, which was 53 nt longer than those of known CVA sequences. PMID:27284130

  3. Genome Sequence of Stachybotrys chartarum Strain 51-11.

    PubMed

    Betancourt, Doris A; Dean, Timothy R; Kim, Jean; Levy, Josh

    2015-01-01

    The Stachybotrys chartarum strain 51-11 genome was sequenced by shotgun sequencing utilizing Illumina HiSeq 2000 and PacBio technologies. Since S. chartarum has been implicated as having health impacts within water-damaged buildings, any information extracted from the genomic sequence data relating to toxins or the metabolism of the fungus might be useful.

  4. Next Generation Sequencing at the University of Chicago Genomics Core

    SciTech Connect

    Faber, Pieter

    2013-04-24

    The University of Chicago Genomics Core provides University of Chicago investigators (and external clients) access to State-of-the-Art genomics capabilities: next generation sequencing, Sanger sequencing / genotyping and micro-arrays (gene expression, genotyping, and methylation). The current presentation will highlight our capabilities in the area of ultra-high throughput sequencing analysis.

  5. First Complete Genome Sequence of Cherry virus A.

    PubMed

    Koinuma, Hiroaki; Nijo, Takamichi; Iwabuchi, Nozomu; Yoshida, Tetsuya; Keima, Takuya; Okano, Yukari; Maejima, Kensaku; Yamaji, Yasuyuki; Namba, Shigetou

    2016-01-01

    The 5'-terminal genomic sequence of Cherry virus A (CVA) has long been unknown. We determined the first complete genome sequence of an apricot isolate of CVA (7,434 nucleotides [nt]). The 5'-untranslated region was 107 nt in length, which was 53 nt longer than those of known CVA sequences. PMID:27284130

  6. Isolation of Human Genomic DNA Sequences with Expanded Nucleobase Selectivity.

    PubMed

    Rathi, Preeti; Maurer, Sara; Kubik, Grzegorz; Summerer, Daniel

    2016-08-10

    We report the direct isolation of user-defined DNA sequences from the human genome with programmable selectivity for both canonical and epigenetic nucleobases. This is enabled by the use of engineered transcription-activator-like effectors (TALEs) as DNA major groove-binding probes in affinity enrichment. The approach provides the direct quantification of 5-methylcytosine (5mC) levels at single genomic nucleotide positions in a strand-specific manner. We demonstrate the simple, multiplexed typing of a variety of epigenetic cancer biomarker 5mC with custom TALE mixes. Compared to antibodies as the most widely used affinity probes for 5mC analysis, i.e., employed in the methylated DNA immunoprecipitation (MeDIP) protocol, TALEs provide superior sensitivity, resolution and technical ease. We engineer a range of size-reduced TALE repeats and establish full selectivity profiles for their binding to all five human cytosine nucleobases. These provide insights into their nucleobase recognition mechanisms and reveal the ability of TALEs to isolate genomic target sequences with selectivity for single 5-hydroxymethylcytosine and, in combination with sodium borohydride reduction, single 5-formylcytosine nucleobases. PMID:27429302

  7. Current challenges in de novo plant genome sequencing and assembly

    PubMed Central

    2012-01-01

    Genome sequencing is now affordable, but assembling plant genomes de novo remains challenging. We assess the state of the art of assembly and review the best practices for the community. PMID:22546054

  8. Enhancing cancer clonality analysis with integrative genomics

    PubMed Central

    2015-01-01

    Introduction It is understood that cancer is a clonal disease initiated by a single cell, and that metastasis, which is the spread of cancer from the primary site, is also initiated by a single cell. The seemingly natural capability of cancer to adapt dynamically in a Darwinian manner is a primary reason for therapeutic failures. Survival advantages may be induced by cancer therapies and also occur as a result of inherent cell and microenvironmental factors. The selected "more fit" clones outmatch their competition and then become dominant in the tumor via propagation of progeny. This clonal expansion leads to relapse, therapeutic resistance and eventually death. The goal of this study is to develop and demonstrate a more detailed clonality approach by utilizing integrative genomics. Methods Patient tumor samples were profiled by Whole Exome Sequencing (WES) and RNA-seq on an Illumina HiSeq 2500 and methylation profiling was performed on the Illumina Infinium 450K array. STAR and the Haplotype Caller were used for RNA-seq processing. Custom approaches were used for the integration of the multi-omic datasets. Results Reported are major enhancements to CloneViz, which now provides capabilities enabling a formal tumor multi-dimensional clonality analysis by integrating: i) DNA mutations, ii) RNA expressed mutations, and iii) DNA methylation data. RNA and DNA methylation integration were not previously possible, by CloneViz (previous version) or any other clonality method to date. This new approach, named iCloneViz (integrated CloneViz) employs visualization and quantitative methods, revealing an integrative genomic mutational dissection and traceability (DNA, RNA, epigenetics) thru the different layers of molecular structures. Conclusion The iCloneViz approach can be used for analysis of clonal evolution and mutational dynamics of multi-omic data sets. Revealing tumor clonal complexity in an integrative and quantitative manner facilitates improved mutational

  9. Complete genome sequence of Arcanobacterium haemolyticum type strain (11018T)

    SciTech Connect

    Yasawong, Montri; Teshima, Hazuki; Lapidus, Alla L.; Nolan, Matt; Lucas, Susan; Glavina Del Rio, Tijana; Tice, Hope; Cheng, Jan-Fang; Bruce, David; Detter, J. Chris; Tapia, Roxanne; Han, Cliff; Goodwin, Lynne A.; Pitluck, Sam; Liolios, Konstantinos; Ivanova, N; Mavromatis, K; Mikhailova, Natalia; Pati, Amrita; Chen, Amy; Palaniappan, Krishna; Land, Miriam L; Hauser, Loren John; Chang, Yun-Juan; Jeffries, Cynthia; Rohde, Manfred; Sikorski, Johannes; Pukall, Rudiger; Goker, Markus; Woyke, Tanja; Bristow, James; Eisen, Jonathan; Markowitz, Victor; Hugenholtz, Philip; Kyrpides, Nikos C; Klenk, Hans-Peter

    2010-01-01

    Vulcanisaeta distributa Itoh et al. 2002 belongs to the family Thermoproteaceae in the phylum Crenarchaeota. The genus Vulcanisaeta is characterized by a global distribution in hot and acidic springs. This is the first genome sequence from a member of the genus Vulcanisaeta and seventh genome sequence in the family Thermoproteaceae. The 2,374,137 bp long genome with its 2,544 protein-coding and 49 RNA genes is a part of the Genomic Encyclopedia of Bacteria and Archaea project.

  10. Draft Genome Sequences of Klebsiella variicola Plant Isolates.

    PubMed

    Martínez-Romero, Esperanza; Silva-Sanchez, Jesús; Barrios, Humberto; Rodríguez-Medina, Nadia; Martínez-Barnetche, Jesús; Téllez-Sosa, Juan; Gómez-Barreto, Rosa Elena; Garza-Ramos, Ulises

    2015-01-01

    Three endophytic Klebsiella variicola isolates-T29A, 3, and 6A2, obtained from sugar cane stem, maize shoots, and banana leaves, respectively-were used for whole-genome sequencing. Here, we report the draft genome sequences of circular chromosomes and plasmids. The genomes contain plant colonization and cellulases genes. This study will help toward understanding the genomic basis of K. variicola interaction with plant hosts. PMID:26358599

  11. Next-generation sequencing strategies for characterizing the turkey genome.

    PubMed

    Dalloul, Rami A; Zimin, Aleksey V; Settlage, Robert E; Kim, Sungwon; Reed, Kent M

    2014-02-01

    The turkey genome sequencing project was initiated in 2008 and has relied primarily on next-generation sequencing (NGS) technologies. Our first efforts used a synergistic combination of 2 NGS platforms (Roche/454 and Illumina GAII), detailed bacterial artificial chromosome (BAC) maps, and unique assembly tools to sequence and assemble the genome of the domesticated turkey, Meleagris gallopavo. Since the first release in 2010, efforts to improve the genome assembly, gene annotation, and genomic analyses continue. The initial assembly build (2.01) represented about 89% of the genome sequence with 17X coverage depth (931 Mb). Sequence contigs were assigned to 30 of the 40 chromosomes with approximately 10% of the assembled sequence corresponding to unassigned chromosomes (ChrUn). The sequence has been refined through both genome-wide and area-focused sequencing, including shotgun and paired-end sequencing, and targeted sequencing of chromosomal regions with low or incomplete coverage. These additional efforts have improved the sequence assembly resulting in 2 subsequent genome builds of higher genome coverage (25X/Build3.0 and 30X/Build4.0) with a current sequence totaling 1,010 Mb. Further, BAC with end sequences assigned to the Z/W and MG18 (MHC) chromosomes, ChrUn, or not placed in the previous build were isolated, deeply sequenced (Hi-Seq), and incorporated into the latest build (5.0). To aid in the annotation and to generate a gene expression atlas of major tissues, a comprehensive set of RNA samples was collected at various developmental stages of female and male turkeys. Transcriptome sequencing data (using Illumina Hi-Seq) will provide information to enhance the final assembly and ultimately improve sequence annotation. The most current sequence covers more than 95% of the turkey genome and should yield a much improved gene level of annotation, making it a valuable resource for studying genetic variations underlying economically important traits in poultry.

  12. Integration of new alternative reference strain genome sequences into the Saccharomyces genome database

    PubMed Central

    Song, Giltae; Balakrishnan, Rama; Binkley, Gail; Costanzo, Maria C.; Dalusag, Kyla; Demeter, Janos; Engel, Stacia; Hellerstedt, Sage T.; Karra, Kalpana; Hitz, Benjamin C.; Nash, Robert S.; Paskov, Kelley; Sheppard, Travis; Skrzypek, Marek; Weng, Shuai; Wong, Edith; Michael Cherry, J.

    2016-01-01

    The Saccharomyces Genome Database (SGD; http://www.yeastgenome.org/) is the authoritative community resource for the Saccharomyces cerevisiae reference genome sequence and its annotation. To provide a wider scope of genetic and phenotypic variation in yeast, the genome sequences and their corresponding annotations from 11 alternative S. cerevisiae reference strains have been integrated into SGD. Genomic and protein sequence information for genes from these strains are now available on the Sequence and Protein tab of the corresponding Locus Summary pages. We illustrate how these genome sequences can be utilized to aid our understanding of strain-specific functional and phenotypic differences. Database URL: www.yeastgenome.org PMID:27252399

  13. Comparative DNA Sequence Analysis of Wheat and Rice Genomes

    PubMed Central

    Sorrells, Mark E.; La Rota, Mauricio; Bermudez-Kandianis, Catherine E.; Greene, Robert A.; Kantety, Ramesh; Munkvold, Jesse D.; Miftahudin; Mahmoud, Ahmed; Ma, Xuefeng; Gustafson, Perry J.; Qi, Lili L.; Echalier, Benjamin; Gill, Bikram S.; Matthews, David E.; Lazo, Gerard R.; Chao, Shiaoman; Anderson, Olin D.; Edwards, Hugh; Linkiewicz, Anna M.; Dubcovsky, Jorge; Akhunov, Eduard D.; Dvorak, Jan; Zhang, Deshui; Nguyen, Henry T.; Peng, Junhua; Lapitan, Nora L.V.; Gonzalez-Hernandez, Jose L.; Anderson, James A.; Hossain, Khwaja; Kalavacharla, Venu; Kianian, Shahryar F.; Choi, Dong-Woog; Close, Timothy J.; Dilbirligi, Muharrem; Gill, Kulvinder S.; Steber, Camille; Walker-Simmons, Mary K.; McGuire, Patrick E.; Qualset, Calvin O.

    2003-01-01

    The use of DNA sequence-based comparative genomics for evolutionary studies and for transferring information from model species to crop species has revolutionized molecular genetics and crop improvement strategies. This study compared 4485 expressed sequence tags (ESTs) that were physically mapped in wheat chromosome bins, to the public rice genome sequence data from 2251 ordered BAC/PAC clones using BLAST. A rice genome view of homologous wheat genome locations based on comparative sequence analysis revealed numerous chromosomal rearrangements that will significantly complicate the use of rice as a model for cross-species transfer of information in nonconserved regions. PMID:12902377

  14. Selection to sequence: opportunities in fungal genomics

    SciTech Connect

    Baker, Scott E.

    2009-12-01

    Selection is a biological force, causing genotypic and phenotypic change over time. Whether environmental or human induced, selective pressures shape the genotypes and the phenotypes of organisms both in nature and in the laboratory. In nature, selective pressure is highly dynamic and the sum of the environment and other organisms. In the laboratory, selection is used in genetic studies and industrial strain development programs to isolate mutants affecting biological processes of interest to researchers. Selective pressures are important considerations for fungal biology. In the laboratory a number of fungi are used as experimental systems to study a wide range of biological processes and in nature fungi are important pathogens of plants and animals and play key roles in carbon and nitrogen cycling. The continued development of high throughput sequencing technologies makes it possible to characterize at the genomic level, the effect of selective pressures both in the lab and in nature for filamentous fungi as well as other organisms.

  15. The UCSC Cancer Genomics Browser: update 2015.

    PubMed

    Goldman, Mary; Craft, Brian; Swatloski, Teresa; Cline, Melissa; Morozova, Olena; Diekhans, Mark; Haussler, David; Zhu, Jingchun

    2015-01-01

    The UCSC Cancer Genomics Browser (https://genome-cancer.ucsc.edu/) is a web-based application that integrates relevant data, analysis and visualization, allowing users to easily discover and share their research observations. Users can explore the relationship between genomic alterations and phenotypes by visualizing various -omic data alongside clinical and phenotypic features, such as age, subtype classifications and genomic biomarkers. The Cancer Genomics Browser currently hosts 575 public datasets from genome-wide analyses of over 227,000 samples, including datasets from TCGA, CCLE, Connectivity Map and TARGET. Users can download and upload clinical data, generate Kaplan-Meier plots dynamically, export data directly to Galaxy for analysis, plus generate URL bookmarks of specific views of the data to share with others.

  16. Insights into cancer biology through next-generation sequencing.

    PubMed

    Nik-Zainal, Serena

    2014-12-01

    Cancer is the ultimate disorder of the genome, characterised not by just one or two mutations, but by hundreds to thousands of acquired mutations that have been accrued through the development of a tumour. Thanks to the recent increase in the speed of sequencing offered by modern sequencing technologies, we are no longer restricted to exploring tiny fragments of protein-coding portions of the human genome. We can now read all the genetic material in human cells. Here, the framework of a next-generation sequencing experiment is explained, giving insight into the advances and difficulties posed by processing the enormous datasets generated through these methods. Some of the recent insights into tumour biology, that exploit the extraordinary surge in scale and the digital nature of next-generation sequencing, are highlighted, including cancer gene discovery, the detection of mutation signatures and cancer evolution. Technological and intellectual developments are starting to shape the personalized cancer genomic profiles of tomorrow. Let's train the next-generation of clinicians to be able to read them from today.

  17. Insights into cancer biology through next-generation sequencing.

    PubMed

    Nik-Zainal, Serena

    2014-12-01

    Cancer is the ultimate disorder of the genome, characterised not by just one or two mutations, but by hundreds to thousands of acquired mutations that have been accrued through the development of a tumour. Thanks to the recent increase in the speed of sequencing offered by modern sequencing technologies, we are no longer restricted to exploring tiny fragments of protein-coding portions of the human genome. We can now read all the genetic material in human cells. Here, the framework of a next-generation sequencing experiment is explained, giving insight into the advances and difficulties posed by processing the enormous datasets generated through these methods. Some of the recent insights into tumour biology, that exploit the extraordinary surge in scale and the digital nature of next-generation sequencing, are highlighted, including cancer gene discovery, the detection of mutation signatures and cancer evolution. Technological and intellectual developments are starting to shape the personalized cancer genomic profiles of tomorrow. Let's train the next-generation of clinicians to be able to read them from today. PMID:25468925

  18. A sequence-based survey of the complex structural organization of tumor genomes

    SciTech Connect

    Collins, Colin; Raphael, Benjamin J.; Volik, Stanislav; Yu, Peng; Wu, Chunxiao; Huang, Guiqing; Linardopoulou, Elena V.; Trask, Barbara J.; Waldman, Frederic; Costello, Joseph; Pienta, Kenneth J.; Mills, Gordon B.; Bajsarowicz, Krystyna; Kobayashi, Yasuko; Sridharan, Shivaranjani; Paris, Pamela; Tao, Quanzhou; Aerni, Sarah J.; Brown, Raymond P.; Bashir, Ali; Gray, Joe W.; Cheng, Jan-Fang; de Jong, Pieter; Nefedov, Mikhail; Ried, Thomas; Padilla-Nash, Hesed M.; Collins, Colin C.

    2008-04-03

    The genomes of many epithelial tumors exhibit extensive chromosomal rearrangements. All classes of genome rearrangements can be identified using End Sequencing Profiling (ESP), which relies on paired-end sequencing of cloned tumor genomes. In this study, brain, breast, ovary and prostate tumors along with three breast cancer cell lines were surveyed with ESP yielding the largest available collection of sequence-ready tumor genome breakpoints and providing evidence that some rearrangements may be recurrent. Sequencing and fluorescence in situ hybridization (FISH) confirmed translocations and complex tumor genome structures that include coamplification and packaging of disparate genomic loci with associated molecular heterogeneity. Comparison of the tumor genomes suggests recurrent rearrangements. Some are likely to be novel structural polymorphisms, whereas others may be bona fide somatic rearrangements. A recurrent fusion transcript in breast tumors and a constitutional fusion transcript resulting from a segmental duplication were identified. Analysis of end sequences for single nucleotide polymorphisms (SNPs) revealed candidate somatic mutations and an elevated rate of novel SNPs in an ovarian tumor. These results suggest that the genomes of many epithelial tumors may be far more dynamic and complex than previously appreciated and that genomic fusions including fusion transcripts and proteins may be common, possibly yielding tumor-specific biomarkers and therapeutic targets.

  19. CaPSID: A bioinformatics platform for computational pathogen sequence identification in human genomes and transcriptomes

    PubMed Central

    2012-01-01

    Background It is now well established that nearly 20% of human cancers are caused by infectious agents, and the list of human oncogenic pathogens will grow in the future for a variety of cancer types. Whole tumor transcriptome and genome sequencing by next-generation sequencing technologies presents an unparalleled opportunity for pathogen detection and discovery in human tissues but requires development of new genome-wide bioinformatics tools. Results Here we present CaPSID (Computational Pathogen Sequence IDentification), a comprehensive bioinformatics platform for identifying, querying and visualizing both exogenous and endogenous pathogen nucleotide sequences in tumor genomes and transcriptomes. CaPSID includes a scalable, high performance database for data storage and a web application that integrates the genome browser JBrowse. CaPSID also provides useful metrics for sequence analysis of pre-aligned BAM files, such as gene and genome coverage, and is optimized to run efficiently on multiprocessor computers with low memory usage. Conclusions To demonstrate the usefulness and efficiency of CaPSID, we carried out a comprehensive analysis of both a simulated dataset and transcriptome samples from ovarian cancer. CaPSID correctly identified all of the human and pathogen sequences in the simulated dataset, while in the ovarian dataset CaPSID’s predictions were successfully validated in vitro. PMID:22901030

  20. The reference genome sequence of Saccharomyces cerevisiae: then and now.

    PubMed

    Engel, Stacia R; Dietrich, Fred S; Fisk, Dianna G; Binkley, Gail; Balakrishnan, Rama; Costanzo, Maria C; Dwight, Selina S; Hitz, Benjamin C; Karra, Kalpana; Nash, Robert S; Weng, Shuai; Wong, Edith D; Lloyd, Paul; Skrzypek, Marek S; Miyasato, Stuart R; Simison, Matt; Cherry, J Michael

    2014-03-01

    The genome of the budding yeast Saccharomyces cerevisiae was the first completely sequenced from a eukaryote. It was released in 1996 as the work of a worldwide effort of hundreds of researchers. In the time since, the yeast genome has been intensively studied by geneticists, molecular biologists, and computational scientists all over the world. Maintenance and annotation of the genome sequence have long been provided by the Saccharomyces Genome Database, one of the original model organism databases. To deepen our understanding of the eukaryotic genome, the S. cerevisiae strain S288C reference genome sequence was updated recently in its first major update since 1996. The new version, called "S288C 2010," was determined from a single yeast colony using modern sequencing technologies and serves as the anchor for further innovations in yeast genomic science. PMID:24374639

  1. The Genome Sequencer FLX System--longer reads, more applications, straight forward bioinformatics and more complete data sets.

    PubMed

    Droege, Marcus; Hill, Brendon

    2008-08-31

    The Genome Sequencer FLX System (GS FLX), powered by 454 Sequencing, is a next-generation DNA sequencing technology featuring a unique mix of long reads, exceptional accuracy, and ultra-high throughput. It has been proven to be the most versatile of all currently available next-generation sequencing technologies, supporting many high-profile studies in over seven applications categories. GS FLX users have pursued innovative research in de novo sequencing, re-sequencing of whole genomes and target DNA regions, metagenomics, and RNA analysis. 454 Sequencing is a powerful tool for human genetics research, having recently re-sequenced the genome of an individual human, currently re-sequencing the complete human exome and targeted genomic regions using the NimbleGen sequence capture process, and detected low-frequency somatic mutations linked to cancer. PMID:18616967

  2. A taste of pineapple evolution through genome sequencing.

    PubMed

    Xu, Qing; Liu, Zhong-Jian

    2015-12-01

    The genome sequence assembly of the highly heterozygous Ananas comosus and its varieties is an impressive technical achievement. The sequence opens the door to a greater understanding of pineapple morphology and evolution. PMID:26620110

  3. A taste of pineapple evolution through genome sequencing.

    PubMed

    Xu, Qing; Liu, Zhong-Jian

    2015-12-01

    The genome sequence assembly of the highly heterozygous Ananas comosus and its varieties is an impressive technical achievement. The sequence opens the door to a greater understanding of pineapple morphology and evolution.

  4. Insights from twenty years of bacterial genome sequencing

    SciTech Connect

    Land, Miriam L; Hauser, Loren John; Jun, Se Ran; Nookaew, Intawat; Leuze, Michael Rex; Ahn, Tae-Hyuk; Karpinets, Tatiana V; Lund, Ole; Kora, Guruprasad H; Wassenaar, Trudy; Poudel, Suresh; Ussery, David W

    2015-01-01

    Since the first two complete bacterial genome sequences were published in 1995, the science of bacteria has dramatically changed. Using third-generation DNA sequencing, it is possible to completely sequence a bacterial genome in a few hours and identify some types of methylation sites along the genome as well. Sequencing of bacterial genome sequences is now a standard procedure, and the information from tens of thousands of bacterial genomes has had a major impact on our views of the bacterial world. In this review, we explore a series of questions to highlight some insights that comparative genomics has produced. To date, there are genome sequences available from 50 different bacterial phyla and 11 different archaeal phyla. However, the distribution is quite skewed towards a few phyla that contain model organisms. But the breadth is continuing to improve, with projects dedicated to filling in less characterized taxonomic groups. The clustered regularly interspaced short palindromic repeats (CRISPR)-Cas system provides bacteria with immunity against viruses, which outnumber bacteria by tenfold. How fast can we go? Second-generation sequencing has produced a large number of draft genomes (close to 90 % of bacterial genomes in GenBank are currently not complete); third-generation sequencing can potentially produce a finished genome in a few hours, and at the same time provide methlylation sites along the entire chromosome. The diversity of bacterial communities is extensive as is evident from the genome sequences available from 50 different bacterial phyla and 11 different archaeal phyla. Genome sequencing can help in classifying an organism, and in the case where multiple genomes of the same species are available, it is possible to calculate the pan- and core genomes; comparison of more than 2000 Escherichia coli genomes finds an E. coli core genome of about 3100 gene families and a total of about 89,000 different gene families. Why do we care about bacterial genome

  5. Wavelet analysis in current cancer genome research: a survey.

    PubMed

    Meng, Tao; Soliman, Ahmed T; Shyu, Mei-Ling; Yang, Yimin; Chen, Shu-Ching; Iyengar, S S; Yordy, John S; Iyengar, Puneeth

    2013-01-01

    With the rapid development of next generation sequencing technology, the amount of biological sequence data of the cancer genome increases exponentially, which calls for efficient and effective algorithms that may identify patterns hidden underneath the raw data that may distinguish cancer Achilles' heels. From a signal processing point of view, biological units of information, including DNA and protein sequences, have been viewed as one-dimensional signals. Therefore, researchers have been applying signal processing techniques to mine the potentially significant patterns within these sequences. More specifically, in recent years, wavelet transforms have become an important mathematical analysis tool, with a wide and ever increasing range of applications. The versatility of wavelet analytic techniques has forged new interdisciplinary bounds by offering common solutions to apparently diverse problems and providing a new unifying perspective on problems of cancer genome research. In this paper, we provide a survey of how wavelet analysis has been applied to cancer bioinformatics questions. Specifically, we discuss several approaches of representing the biological sequence data numerically and methods of using wavelet analysis on the numerical sequences. PMID:24407303

  6. Spatial genomic heterogeneity within localized, multifocal prostate cancer.

    PubMed

    Boutros, Paul C; Fraser, Michael; Harding, Nicholas J; de Borja, Richard; Trudel, Dominique; Lalonde, Emilie; Meng, Alice; Hennings-Yeomans, Pablo H; McPherson, Andrew; Sabelnykova, Veronica Y; Zia, Amin; Fox, Natalie S; Livingstone, Julie; Shiah, Yu-Jia; Wang, Jianxin; Beck, Timothy A; Have, Cherry L; Chong, Taryne; Sam, Michelle; Johns, Jeremy; Timms, Lee; Buchner, Nicholas; Wong, Ada; Watson, John D; Simmons, Trent T; P'ng, Christine; Zafarana, Gaetano; Nguyen, Francis; Luo, Xuemei; Chu, Kenneth C; Prokopec, Stephenie D; Sykes, Jenna; Dal Pra, Alan; Berlin, Alejandro; Brown, Andrew; Chan-Seng-Yue, Michelle A; Yousif, Fouad; Denroche, Robert E; Chong, Lauren C; Chen, Gregory M; Jung, Esther; Fung, Clement; Starmans, Maud H W; Chen, Hanbo; Govind, Shaylan K; Hawley, James; D'Costa, Alister; Pintilie, Melania; Waggott, Daryl; Hach, Faraz; Lambin, Philippe; Muthuswamy, Lakshmi B; Cooper, Colin; Eeles, Rosalind; Neal, David; Tetu, Bernard; Sahinalp, Cenk; Stein, Lincoln D; Fleshner, Neil; Shah, Sohrab P; Collins, Colin C; Hudson, Thomas J; McPherson, John D; van der Kwast, Theodorus; Bristow, Robert G

    2015-07-01

    Herein we provide a detailed molecular analysis of the spatial heterogeneity of clinically localized, multifocal prostate cancer to delineate new oncogenes or tumor suppressors. We initially determined the copy number aberration (CNA) profiles of 74 patients with index tumors of Gleason score 7. Of these, 5 patients were subjected to whole-genome sequencing using DNA quantities achievable in diagnostic biopsies, with detailed spatial sampling of 23 distinct tumor regions to assess intraprostatic heterogeneity in focal genomics. Multifocal tumors are highly heterogeneous for single-nucleotide variants (SNVs), CNAs and genomic rearrangements. We identified and validated a new recurrent amplification of MYCL, which is associated with TP53 deletion and unique profiles of DNA damage and transcriptional dysregulation. Moreover, we demonstrate divergent tumor evolution in multifocal cancer and, in some cases, tumors of independent clonal origin. These data represent the first systematic relation of intraprostatic genomic heterogeneity to predicted clinical outcome and inform the development of novel biomarkers that reflect individual prognosis.

  7. Whole-Genome Sequencing in Outbreak Analysis

    PubMed Central

    Turner, Stephen D.; Riley, Margaret F.; Petri, William A.; Hewlett, Erik L.

    2015-01-01

    SUMMARY In addition to the ever-present concern of medical professionals about epidemics of infectious diseases, the relative ease of access and low cost of obtaining, producing, and disseminating pathogenic organisms or biological toxins mean that bioterrorism activity should also be considered when facing a disease outbreak. Utilization of whole-genome sequencing (WGS) in outbreak analysis facilitates the rapid and accurate identification of virulence factors of the pathogen and can be used to identify the path of disease transmission within a population and provide information on the probable source. Molecular tools such as WGS are being refined and advanced at a rapid pace to provide robust and higher-resolution methods for identifying, comparing, and classifying pathogenic organisms. If these methods of pathogen characterization are properly applied, they will enable an improved public health response whether a disease outbreak was initiated by natural events or by accidental or deliberate human activity. The current application of next-generation sequencing (NGS) technology to microbial WGS and microbial forensics is reviewed. PMID:25876885

  8. Whole-genome sequencing in outbreak analysis.

    PubMed

    Gilchrist, Carol A; Turner, Stephen D; Riley, Margaret F; Petri, William A; Hewlett, Erik L

    2015-07-01

    In addition to the ever-present concern of medical professionals about epidemics of infectious diseases, the relative ease of access and low cost of obtaining, producing, and disseminating pathogenic organisms or biological toxins mean that bioterrorism activity should also be considered when facing a disease outbreak. Utilization of whole-genome sequencing (WGS) in outbreak analysis facilitates the rapid and accurate identification of virulence factors of the pathogen and can be used to identify the path of disease transmission within a population and provide information on the probable source. Molecular tools such as WGS are being refined and advanced at a rapid pace to provide robust and higher-resolution methods for identifying, comparing, and classifying pathogenic organisms. If these methods of pathogen characterization are properly applied, they will enable an improved public health response whether a disease outbreak was initiated by natural events or by accidental or deliberate human activity. The current application of next-generation sequencing (NGS) technology to microbial WGS and microbial forensics is reviewed. PMID:25876885

  9. The genomic landscapes of human breast and colorectal cancers.

    PubMed

    Wood, Laura D; Parsons, D Williams; Jones, Siân; Lin, Jimmy; Sjöblom, Tobias; Leary, Rebecca J; Shen, Dong; Boca, Simina M; Barber, Thomas; Ptak, Janine; Silliman, Natalie; Szabo, Steve; Dezso, Zoltan; Ustyanksky, Vadim; Nikolskaya, Tatiana; Nikolsky, Yuri; Karchin, Rachel; Wilson, Paul A; Kaminker, Joshua S; Zhang, Zemin; Croshaw, Randal; Willis, Joseph; Dawson, Dawn; Shipitsin, Michail; Willson, James K V; Sukumar, Saraswati; Polyak, Kornelia; Park, Ben Ho; Pethiyagoda, Charit L; Pant, P V Krishna; Ballinger, Dennis G; Sparks, Andrew B; Hartigan, James; Smith, Douglas R; Suh, Erick; Papadopoulos, Nickolas; Buckhaults, Phillip; Markowitz, Sanford D; Parmigiani, Giovanni; Kinzler, Kenneth W; Velculescu, Victor E; Vogelstein, Bert

    2007-11-16

    Human cancer is caused by the accumulation of mutations in oncogenes and tumor suppressor genes. To catalog the genetic changes that occur during tumorigenesis, we isolated DNA from 11 breast and 11 colorectal tumors and determined the sequences of the genes in the Reference Sequence database in these samples. Based on analysis of exons representing 20,857 transcripts from 18,191 genes, we conclude that the genomic landscapes of breast and colorectal cancers are composed of a handful of commonly mutated gene "mountains" and a much larger number of gene "hills" that are mutated at low frequency. We describe statistical and bioinformatic tools that may help identify mutations with a role in tumorigenesis. These results have implications for understanding the nature and heterogeneity of human cancers and for using personal genomics for tumor diagnosis and therapy.

  10. Genome Project Standards in a New Era of Sequencing

    SciTech Connect

    GSC Consortia; HMP Jumpstart Consortia; Chain, P. S. G.; Grafham, D. V.; Fulton, R. S.; FitzGerald, M. G.; Hostetler, J.; Muzny, D.; Detter, J. C.; Ali, J.; Birren, B.; Bruce, D. C.; Buhay, C.; Cole, J. R.; Ding, Y.; Dugan, S.; Field, D.; Garrity, G. M.; Gibbs, R.; Graves, T.; Han, C. S.; Harrison, S. H.; Highlander, S.; Hugenholtz, P.; Khouri, H. M.; Kodira, C. D.; Kolker, E.; Kyrpides, N. C.; Lang, D.; Lapidus, A.; Malfatti, S. A.; Markowitz, V.; Metha, T.; Nelson, K. E.; Parkhill, J.; Pitluck, S.; Qin, X.; Read, T. D.; Schmutz, J.; Sozhamannan, S.; Strausberg, R.; Sutton, G.; Thomson, N. R.; Tiedje, J. M.; Weinstock, G.; Wollam, A.

    2009-06-01

    For over a decade, genome 43 sequences have adhered to only two standards that are relied on for purposes of sequence analysis by interested third parties (1, 2). However, ongoing developments in revolutionary sequencing technologies have resulted in a redefinition of traditional whole genome sequencing that requires a careful reevaluation of such standards. With commercially available 454 pyrosequencing (followed by Illumina, SOLiD, and now Helicos), there has been an explosion of genomes sequenced under the moniker 'draft', however these can be very poor quality genomes (due to inherent errors in the sequencing technologies, and the inability of assembly programs to fully address these errors). Further, one can only infer that such draft genomes may be of poor quality by navigating through the databases to find the number and type of reads deposited in sequence trace repositories (and not all genomes have this available), or to identify the number of contigs or genome fragments deposited to the database. The difficulty in assessing the quality of such deposited genomes has created some havoc for genome analysis pipelines and contributed to many wasted hours of (mis)interpretation. These same novel sequencing technologies have also brought an exponential leap in raw sequencing capability, and at greatly reduced prices that have further skewed the time- and cost-ratios of draft data generation versus the painstaking process of improving and finishing a genome. The resulting effect is an ever-widening gap between drafted and finished genomes that only promises to continue (Figure 1), hence there is an urgent need to distinguish good and poor datasets. The sequencing institutes in the authorship, along with the NIH's Human Microbiome Project Jumpstart Consortium (3), strongly believe that a new set of standards is required for genome sequences. The following represents a set of six community-defined categories of genome sequence standards that better reflect the

  11. Chapter 27 -- Breast Cancer Genomics, Section VI, Pathology and Biological Markers of Invasive Breast Cancer

    SciTech Connect

    Spellman, Paul T.; Heiser, Laura; Gray, Joe W.

    2009-06-18

    Breast cancer is predominantly a disease of the genome with cancers arising and progressing through accumulation of aberrations that alter the genome - by changing DNA sequence, copy number, and structure in ways that that contribute to diverse aspects of cancer pathophysiology. Classic examples of genomic events that contribute to breast cancer pathophysiology include inherited mutations in BRCA1, BRCA2, TP53, and CHK2 that contribute to the initiation of breast cancer, amplification of ERBB2 (formerly HER2) and mutations of elements of the PI3-kinase pathway that activate aspects of epidermal growth factor receptor (EGFR) signaling and deletion of CDKN2A/B that contributes to cell cycle deregulation and genome instability. It is now apparent that accumulation of these aberrations is a time-dependent process that accelerates with age. Although American women living to an age of 85 have a 1 in 8 chance of developing breast cancer, the incidence of cancer in women younger than 30 years is uncommon. This is consistent with a multistep cancer progression model whereby mutation and selection drive the tumor's development, analogous to traditional Darwinian evolution. In the case of cancer, the driving events are changes in sequence, copy number, and structure of DNA and alterations in chromatin structure or other epigenetic marks. Our understanding of the genetic, genomic, and epigenomic events that influence the development and progression of breast cancer is increasing at a remarkable rate through application of powerful analysis tools that enable genome-wide analysis of DNA sequence and structure, copy number, allelic loss, and epigenomic modification. Application of these techniques to elucidation of the nature and timing of these events is enriching our understanding of mechanisms that increase breast cancer susceptibility, enable tumor initiation and progression to metastatic disease, and determine therapeutic response or resistance. These studies also reveal the

  12. Genome Wide Characterization of Simple Sequence Repeats in Cucumber

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The whole genome sequence of the cucumber cultivar Gy14 was recently sequenced at 15× coverage with the Roche 454 Titanium technology. The microsatellite DNA sequences (simple sequence repeats, SSRs) in the assembled scaffolds were computationally explored and characterized. A total of 112,073 SSRs ...

  13. Finishing The Euchromatic Sequence Of The Human Genome

    SciTech Connect

    Rubin, Edward M.; Lucas, Susan; Richardson, Paul; Rokhsar, Daniel; Pennacchio, Len

    2004-09-07

    The sequence of the human genome encodes the genetic instructions for human physiology, as well as rich information about human evolution. In 2001, the International Human Genome Sequencing Consortium reported a draft sequence of the euchromatic portion of the human genome. Since then, the international collaboration has worked to convert this draft into a genome sequence with high accuracy and nearly complete coverage. Here, we report the result of this finishing process.The current genome sequence (Build 35) contains 2.85 billion nucleotides interrupted by only 341 gaps. It covers {approx}99% of the euchromatic genome and is accurate to an error rate of {approx}1 event per 100,000 bases. Many of the remaining euchromatic gaps are associated with segmental duplications and will require focused work with new methods. The near-complete sequence, the first for a vertebrate, greatly improves the precision of biological analyses of the human genome including studies of gene number,birth and death. Notably, the human genome seems to encode only20,000-25,000 protein-coding genes. The genome sequence reported here should serve as a firm foundation for biomedical research in the decades ahead.

  14. Validation of rice genome sequence by optical mapping

    PubMed Central

    Zhou, Shiguo; Bechner, Michael C; Place, Michael; Churas, Chris P; Pape, Louise; Leong, Sally A; Runnheim, Rod; Forrest, Dan K; Goldstein, Steve; Livny, Miron; Schwartz, David C

    2007-01-01

    Background Rice feeds much of the world, and possesses the simplest genome analyzed to date within the grass family, making it an economically relevant model system for other cereal crops. Although the rice genome is sequenced, validation and gap closing efforts require purely independent means for accurate finishing of sequence build data. Results To facilitate ongoing sequencing finishing and validation efforts, we have constructed a whole-genome SwaI optical restriction map of the rice genome. The physical map consists of 14 contigs, covering 12 chromosomes, with a total genome size of 382.17 Mb; this value is about 11% smaller than original estimates. 9 of the 14 optical map contigs are without gaps, covering chromosomes 1, 2, 3, 4, 5, 7, 8 10, and 12 in their entirety – including centromeres and telomeres. Alignments between optical and in silico restriction maps constructed from IRGSP (International Rice Genome Sequencing Project) and TIGR (The Institute for Genomic Research) genome sequence sources are comprehensive and informative, evidenced by map coverage across virtually all published gaps, discovery of new ones, and characterization of sequence misassemblies; all totalling ~14 Mb. Furthermore, since optical maps are ordered restriction maps, identified discordances are pinpointed on a reliable physical scaffold providing an independent resource for closure of gaps and rectification of misassemblies. Conclusion Analysis of sequence and optical mapping data effectively validates genome sequence assemblies constructed from large, repeat-rich genomes. Given this conclusion we envision new applications of such single molecule analysis that will merge advantages offered by high-resolution optical maps with inexpensive, but short sequence reads generated by emerging sequencing platforms. Lastly, map construction techniques presented here points the way to new types of comparative genome analysis that would focus on discernment of structural differences

  15. Draft Genome Sequence of a Diarrheagenic Morganella morganii Isolate

    PubMed Central

    Singh, Pallavi; Mosci, Rebekah; Rudrik, James T.

    2015-01-01

    This is a report of the whole-genome draft sequence of a diarrheagenic Morganella morganii isolate from a patient in Michigan, USA. This genome represents an important addition to the limited number of pathogenic M. morganii genomes available. PMID:26450735

  16. Genome sequence of Xanthomonas axonopodis pv. punicae strain LMG 859.

    PubMed

    Sharma, Vikas; Midha, Samriti; Ranjan, Manish; Pinnaka, Anil Kumar; Patil, Prabhu B

    2012-05-01

    We report the 4.94-Mb genome sequence of Xanthomonas axonopodis pv. punicae strain LMG 859, the causal agent of bacterial leaf blight disease in pomegranate. The draft genome will aid in comparative genomics, epidemiological studies, and quarantine of this devastating phytopathogen. PMID:22493202

  17. Draft Genome Sequence of a Diarrheagenic Morganella morganii Isolate.

    PubMed

    Singh, Pallavi; Mosci, Rebekah; Rudrik, James T; Manning, Shannon D

    2015-10-08

    This is a report of the whole-genome draft sequence of a diarrheagenic Morganella morganii isolate from a patient in Michigan, USA. This genome represents an important addition to the limited number of pathogenic M. morganii genomes available.

  18. Water buffalo (Bubalus bubalis): complete nucleotide mitochondrial genome sequence.

    PubMed

    Parma, Pietro; Erra-Pujada, Marta; Feligini, Maria; Greppi, Gianfranco; Enne, Giuseppe

    2004-01-01

    In this work, we report the whole sequence of the water buffalo (Bubalus bubalis) mitochondrial genome. The water buffalo mt molecule is 16.355 base pair length and shows a genome organization similar to those reported for other mitochondrial genome. These new data provide an useful tool for many research area, i.e. evolutionary study and identification of food origin.

  19. Complete Genome Sequence of Bacillus megaterium Bacteriophage Eldridge

    PubMed Central

    Reveille, Alexandra M.; Eldridge, Kimberly A.

    2016-01-01

    In this study the complete genome sequence of the unique bacteriophage Eldridge, isolated from soil using Bacillus megaterium as the host organism, was determined. Eldridge is a myovirus with a genome consisting of 242 genes and is unique when compared to phage sequences in GenBank. PMID:27103735

  20. The carrot genome sequence brings colors out of the dark.

    PubMed

    Garcia-Mas, Jordi; Rodriguez-Concepcion, Manuel

    2016-05-27

    The genome sequence of carrot (Daucus carota L.) is the first completed for an Apiaceae species, furthering knowledge of the evolution of the important euasterid II clade. Analyzing the whole-genome sequence allowed for the identification of a gene that may regulate the accumulation of carotenoids in the root. PMID:27230684

  1. Genome sequencing and annotation of Cellulomonas sp. HZM.

    PubMed

    Chua, Patric; Har, Zi Mei; Austin, Christopher M; Yule, Catherine M; Dykes, Gary A; Lee, Sui Mae

    2015-09-01

    We report the draft genome sequence of Cellulomonas sp. HZM, isolated from a tropical peat swamp forest. The draft genome size is 3,559,280 bp with a G + C content of 73% and contains 3 rRNA sequences (single copies of 5S, 16S and 23S rRNA).

  2. Draft Genome Sequence of Raoultella planticola, Isolated from River Water.

    PubMed

    Jothikumar, Narayanan; Kahler, Amy; Strockbine, Nancy; Gladney, Lori; Hill, Vincent R

    2014-10-16

    We isolated Raoultella planticola from a river water sample, which was phenotypically indistinguishable from Escherichia coli on MI agar. The genome sequence of R. planticola was determined to gain information about its metabolic functions contributing to its false positive appearance of E. coli on MI agar. We report the first whole genome sequence of Raoultella planticola.

  3. Complete Genome Sequence of a Clinical Isolate of Enterobacter asburiae

    PubMed Central

    Liu, Feng; Yang, Jian; Xiao, Yan; Li, Li; Jin, Qi

    2016-01-01

    We report here the complete genome sequence of Enterobacter asburiae strain ENIPBJ-CG1, isolated from a bone marrow transplant patient. The size of the genome sequence is approximately 4.65 Mb, with a G+C content of 55.76%, and it is predicted to contain 4,790 protein-coding genes. PMID:27284137

  4. Nearly Complete Genome Sequence of Lactobacillus plantarum Strain NIZO2877

    PubMed Central

    Bayjanov, Jumamurat R.; Joncour, Pauline; Hughes, Sandrine; Gillet, Benjamin; Kleerebezem, Michiel; Siezen, Roland; van Hijum, Sacha A. F. T.

    2015-01-01

    Lactobacillus plantarum is a versatile bacterial species that is isolated mostly from foods. Here, we present the first genome sequence of L. plantarum strain NIZO2877 isolated from a hot dog in Vietnam. Its two contigs represent a nearly complete genome sequence. PMID:26607887

  5. Draft Genome Sequence of Neurospora crassa Strain FGSC 73

    DOE PAGESBeta

    Baker, Scott E.; Schackwitz, Wendy; Lipzen, Anna; Martin, Joel; Haridas, Sajeet; LaButti, Kurt; Grigoriev, Igor V.; Simmons, Blake A.; McCluskey, Kevin

    2015-04-02

    We report the elucidation of the complete genome of the Neurospora crassa (Shear and Dodge) strain FGSC 73, a mat-a, trp-3 mutant strain. The genome sequence around the idiotypic mating type locus represents the only publicly available sequence for a mat-a strain. 40.42 Megabases are assembled into 358 scaffolds carrying 11,978 gene models.

  6. De Novo Genome Sequence of Yersinia aleksiciae Y159T

    PubMed Central

    Neubauer, Heinrich

    2015-01-01

    We report here on the genome sequence of Yersinia aleksiciae Y159T, isolated in Finland in 1981. The genome has a size of 4 Mb, a G+C content of 49%, and is predicted to contain 3,423 coding sequences. PMID:26383649

  7. Draft Genome Sequence of the Fish Pathogen Piscirickettsia salmonis

    PubMed Central

    Eppinger, Mark; McNair, Katelyn; Zogaj, Xhavit; Dinsdale, Elizabeth A.; Edwards, Robert A.

    2013-01-01

    Piscirickettsia salmonis is a Gram-negative intracellular fish pathogen that has a significant impact on the salmon industry. Here, we report the genome sequence of P. salmonis strain LF-89. This is the first draft genome sequence of P. salmonis, and it reveals interesting attributes, including flagellar genes, despite this bacterium being considered nonmotile. PMID:24201203

  8. Draft Genome Sequence of Vibrio (Listonella) anguillarum ATCC 14181

    PubMed Central

    Grim, Christopher J.

    2016-01-01

    We report the draft genome sequence of Vibrio anguillarum ATCC 14181, a Gram-negative, hemolytic, O2 serotype marine bacterium that causes mortality in mariculture species. The availability of this genome sequence will add to our knowledge of diversity and virulence mechanisms of Vibrio anguillarum as well as other pathogenic Vibrio spp.

  9. Complete Genome Sequence of Mycobacterium ulcerans subsp. shinshuense

    PubMed Central

    Yoshida, Mitsunori; Nakanaga, Kazue; Ogura, Yoshitoshi; Toyoda, Atsushi; Ooka, Tadasuke; Kazumi, Yuko; Mitarai, Satoshi; Ishii, Norihisa; Hayashi, Tetsuya

    2016-01-01

    Mycobacterium ulcerans subsp. shinshuense produces mycolactone and causes Buruli ulcer. Here, we report the complete sequence of its genome, which comprises a 5.9-Mb chromosome and a 166-kb plasmid (pShT-P). The sequence will represent the essential data for future phylogenetic and comparative genome studies of mycolactone-producing mycobacteria. PMID:27688344

  10. Complete Genome Sequence of Mycobacterium ulcerans subsp. shinshuense.

    PubMed

    Yoshida, Mitsunori; Nakanaga, Kazue; Ogura, Yoshitoshi; Toyoda, Atsushi; Ooka, Tadasuke; Kazumi, Yuko; Mitarai, Satoshi; Ishii, Norihisa; Hayashi, Tetsuya; Hoshino, Yoshihiko

    2016-01-01

    Mycobacterium ulcerans subsp. shinshuense produces mycolactone and causes Buruli ulcer. Here, we report the complete sequence of its genome, which comprises a 5.9-Mb chromosome and a 166-kb plasmid (pShT-P). The sequence will represent the essential data for future phylogenetic and comparative genome studies of mycolactone-producing mycobacteria. PMID:27688344

  11. Initial sequencing and analysis of the human genome.

    PubMed

    Lander, E S; Linton, L M; Birren, B; Nusbaum, C; Zody, M C; Baldwin, J; Devon, K; Dewar, K; Doyle, M; FitzHugh, W; Funke, R; Gage, D; Harris, K; Heaford, A; Howland, J; Kann, L; Lehoczky, J; LeVine, R; McEwan, P; McKernan, K; Meldrim, J; Mesirov, J P; Miranda, C; Morris, W; Naylor, J; Raymond, C; Rosetti, M; Santos, R; Sheridan, A; Sougnez, C; Stange-Thomann, Y; Stojanovic, N; Subramanian, A; Wyman, D; Rogers, J; Sulston, J; Ainscough, R; Beck, S; Bentley, D; Burton, J; Clee, C; Carter, N; Coulson, A; Deadman, R; Deloukas, P; Dunham, A; Dunham, I; Durbin, R; French, L; Grafham, D; Gregory, S; Hubbard, T; Humphray, S; Hunt, A; Jones, M; Lloyd, C; McMurray, A; Matthews, L; Mercer, S; Milne, S; Mullikin, J C; Mungall, A; Plumb, R; Ross, M; Shownkeen, R; Sims, S; Waterston, R H; Wilson, R K; Hillier, L W; McPherson, J D; Marra, M A; Mardis, E R; Fulton, L A; Chinwalla, A T; Pepin, K H; Gish, W R; Chissoe, S L; Wendl, M C; Delehaunty, K D; Miner, T L; Delehaunty, A; Kramer, J B; Cook, L L; Fulton, R S; Johnson, D L; Minx, P J; Clifton, S W; Hawkins, T; Branscomb, E; Predki, P; Richardson, P; Wenning, S; Slezak, T; Doggett, N; Cheng, J F; Olsen, A; Lucas, S; Elkin, C; Uberbacher, E; Frazier, M; Gibbs, R A; Muzny, D M; Scherer, S E; Bouck, J B; Sodergren, E J; Worley, K C; Rives, C M; Gorrell, J H; Metzker, M L; Naylor, S L; Kucherlapati, R S; Nelson, D L; Weinstock, G M; Sakaki, Y; Fujiyama, A; Hattori, M; Yada, T; Toyoda, A; Itoh, T; Kawagoe, C; Watanabe, H; Totoki, Y; Taylor, T; Weissenbach, J; Heilig, R; Saurin, W; Artiguenave, F; Brottier, P; Bruls, T; Pelletier, E; Robert, C; Wincker, P; Smith, D R; Doucette-Stamm, L; Rubenfield, M; Weinstock, K; Lee, H M; Dubois, J; Rosenthal, A; Platzer, M; Nyakatura, G; Taudien, S; Rump, A; Yang, H; Yu, J; Wang, J; Huang, G; Gu, J; Hood, L; Rowen, L; Madan, A; Qin, S; Davis, R W; Federspiel, N A; Abola, A P; Proctor, M J; Myers, R M; Schmutz, J; Dickson, M; Grimwood, J; Cox, D R; Olson, M V; Kaul, R; Raymond, C; Shimizu, N; Kawasaki, K; Minoshima, S; Evans, G A; Athanasiou, M; Schultz, R; Roe, B A; Chen, F; Pan, H; Ramser, J; Lehrach, H; Reinhardt, R; McCombie, W R; de la Bastide, M; Dedhia, N; Blöcker, H; Hornischer, K; Nordsiek, G; Agarwala, R; Aravind, L; Bailey, J A; Bateman, A; Batzoglou, S; Birney, E; Bork, P; Brown, D G; Burge, C B; Cerutti, L; Chen, H C; Church, D; Clamp, M; Copley, R R; Doerks, T; Eddy, S R; Eichler, E E; Furey, T S; Galagan, J; Gilbert, J G; Harmon, C; Hayashizaki, Y; Haussler, D; Hermjakob, H; Hokamp, K; Jang, W; Johnson, L S; Jones, T A; Kasif, S; Kaspryzk, A; Kennedy, S; Kent, W J; Kitts, P; Koonin, E V; Korf, I; Kulp, D; Lancet, D; Lowe, T M; McLysaght, A; Mikkelsen, T; Moran, J V; Mulder, N; Pollara, V J; Ponting, C P; Schuler, G; Schultz, J; Slater, G; Smit, A F; Stupka, E; Szustakowki, J; Thierry-Mieg, D; Thierry-Mieg, J; Wagner, L; Wallis, J; Wheeler, R; Williams, A; Wolf, Y I; Wolfe, K H; Yang, S P; Yeh, R F; Collins, F; Guyer, M S; Peterson, J; Felsenfeld, A; Wetterstrand, K A; Patrinos, A; Morgan, M J; de Jong, P; Catanese, J J; Osoegawa, K; Shizuya, H; Choi, S; Chen, Y J; Szustakowki, J

    2001-02-15

    The human genome holds an extraordinary trove of information about human development, physiology, medicine and evolution. Here we report the results of an international collaboration to produce and make freely available a draft sequence of the human genome. We also present an initial analysis of the data, describing some of the insights that can be gleaned from the sequence.

  12. Draft Genome Sequence of “Cohnella kolymensis” B-2846

    PubMed Central

    Kudryashova, Ekaterina B.; Ariskina, Elena V.

    2016-01-01

    A draft genome sequence of “Cohnella kolymensis” strain B-2846 was derived using IonTorrent sequencing technology. The size of the assembly and G+C content were in agreement with those of other species of this genus. Characterization of the genome of a novel species of Cohnella will assist in bacterial systematics. PMID:26769947

  13. Initial sequencing and analysis of the human genome.

    PubMed

    Lander, E S; Linton, L M; Birren, B; Nusbaum, C; Zody, M C; Baldwin, J; Devon, K; Dewar, K; Doyle, M; FitzHugh, W; Funke, R; Gage, D; Harris, K; Heaford, A; Howland, J; Kann, L; Lehoczky, J; LeVine, R; McEwan, P; McKernan, K; Meldrim, J; Mesirov, J P; Miranda, C; Morris, W; Naylor, J; Raymond, C; Rosetti, M; Santos, R; Sheridan, A; Sougnez, C; Stange-Thomann, Y; Stojanovic, N; Subramanian, A; Wyman, D; Rogers, J; Sulston, J; Ainscough, R; Beck, S; Bentley, D; Burton, J; Clee, C; Carter, N; Coulson, A; Deadman, R; Deloukas, P; Dunham, A; Dunham, I; Durbin, R; French, L; Grafham, D; Gregory, S; Hubbard, T; Humphray, S; Hunt, A; Jones, M; Lloyd, C; McMurray, A; Matthews, L; Mercer, S; Milne, S; Mullikin, J C; Mungall, A; Plumb, R; Ross, M; Shownkeen, R; Sims, S; Waterston, R H; Wilson, R K; Hillier, L W; McPherson, J D; Marra, M A; Mardis, E R; Fulton, L A; Chinwalla, A T; Pepin, K H; Gish, W R; Chissoe, S L; Wendl, M C; Delehaunty, K D; Miner, T L; Delehaunty, A; Kramer, J B; Cook, L L; Fulton, R S; Johnson, D L; Minx, P J; Clifton, S W; Hawkins, T; Branscomb, E; Predki, P; Richardson, P; Wenning, S; Slezak, T; Doggett, N; Cheng, J F; Olsen, A; Lucas, S; Elkin, C; Uberbacher, E; Frazier, M; Gibbs, R A; Muzny, D M; Scherer, S E; Bouck, J B; Sodergren, E J; Worley, K C; Rives, C M; Gorrell, J H; Metzker, M L; Naylor, S L; Kucherlapati, R S; Nelson, D L; Weinstock, G M; Sakaki, Y; Fujiyama, A; Hattori, M; Yada, T; Toyoda, A; Itoh, T; Kawagoe, C; Watanabe, H; Totoki, Y; Taylor, T; Weissenbach, J; Heilig, R; Saurin, W; Artiguenave, F; Brottier, P; Bruls, T; Pelletier, E; Robert, C; Wincker, P; Smith, D R; Doucette-Stamm, L; Rubenfield, M; Weinstock, K; Lee, H M; Dubois, J; Rosenthal, A; Platzer, M; Nyakatura, G; Taudien, S; Rump, A; Yang, H; Yu, J; Wang, J; Huang, G; Gu, J; Hood, L; Rowen, L; Madan, A; Qin, S; Davis, R W; Federspiel, N A; Abola, A P; Proctor, M J; Myers, R M; Schmutz, J; Dickson, M; Grimwood, J; Cox, D R; Olson, M V; Kaul, R; Raymond, C; Shimizu, N; Kawasaki, K; Minoshima, S; Evans, G A; Athanasiou, M; Schultz, R; Roe, B A; Chen, F; Pan, H; Ramser, J; Lehrach, H; Reinhardt, R; McCombie, W R; de la Bastide, M; Dedhia, N; Blöcker, H; Hornischer, K; Nordsiek, G; Agarwala, R; Aravind, L; Bailey, J A; Bateman, A; Batzoglou, S; Birney, E; Bork, P; Brown, D G; Burge, C B; Cerutti, L; Chen, H C; Church, D; Clamp, M; Copley, R R; Doerks, T; Eddy, S R; Eichler, E E; Furey, T S; Galagan, J; Gilbert, J G; Harmon, C; Hayashizaki, Y; Haussler, D; Hermjakob, H; Hokamp, K; Jang, W; Johnson, L S; Jones, T A; Kasif, S; Kaspryzk, A; Kennedy, S; Kent, W J; Kitts, P; Koonin, E V; Korf, I; Kulp, D; Lancet, D; Lowe, T M; McLysaght, A; Mikkelsen, T; Moran, J V; Mulder, N; Pollara, V J; Ponting, C P; Schuler, G; Schultz, J; Slater, G; Smit, A F; Stupka, E; Szustakowki, J; Thierry-Mieg, D; Thierry-Mieg, J; Wagner, L; Wallis, J; Wheeler, R; Williams, A; Wolf, Y I; Wolfe, K H; Yang, S P; Yeh, R F; Collins, F; Guyer, M S; Peterson, J; Felsenfeld, A; Wetterstrand, K A; Patrinos, A; Morgan, M J; de Jong, P; Catanese, J J; Osoegawa, K; Shizuya, H; Choi, S; Chen, Y J; Szustakowki, J

    2001-02-15

    The human genome holds an extraordinary trove of information about human development, physiology, medicine and evolution. Here we report the results of an international collaboration to produce and make freely available a draft sequence of the human genome. We also present an initial analysis of the data, describing some of the insights that can be gleaned from the sequence. PMID:11237011

  14. Complete genome sequence of Enterobacter aerogenes KCTC 2190.

    PubMed

    Shin, Sang Heum; Kim, Sewhan; Kim, Jae Young; Lee, Soojin; Um, Youngsoon; Oh, Min-Kyu; Kim, Young-Rok; Lee, Jinwon; Yang, Kap-Seok

    2012-05-01

    This is the first complete genome sequence of the Enterobacter aerogenes species. Here we present the genome sequence of E. aerogenes KCTC 2190, which contains 5,280,350 bp with a G + C content of 54.8 mol%, 4,912 protein-coding genes, and 109 structural RNAs.

  15. Complete genome sequence of ‘Candidatus Liberibacter africanus’

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The complete genome sequence of ‘Candidatus Liberibacter africanus’ (Laf), strain ptsapsy, was obtained by an Illumina HiSeq 2000. The Laf genome comprises 1,192,232 nucleotides, 34.5% GC content, 1,141 predicted coding sequences, 44 tRNAs, 3 complete copies of ribosomal RNA genes (16S, 23S and 5S) ...

  16. Selection of sequence variants to improve dairy cattle genomic predictions

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Genomic prediction reliabilities improved when adding selected sequence variants from run 5 of the 1,000 bull genomes project. High density (HD) imputed genotypes for 26,970 progeny tested Holstein bulls were combined with sequence variants for 444 Holstein animals. The first test included 481,904 c...

  17. Complete Genome Sequence of Enterobacter aerogenes KCTC 2190

    PubMed Central

    Shin, Sang Heum; Kim, Sewhan; Kim, Jae Young; Lee, Soojin; Um, Youngsoon; Oh, Min-Kyu; Kim, Young-Rok; Lee, Jinwon

    2012-01-01

    This is the first complete genome sequence of the Enterobacter aerogenes species. Here we present the genome sequence of E. aerogenes KCTC 2190, which contains 5,280,350 bp with a G + C content of 54.8 mol%, 4,912 protein-coding genes, and 109 structural RNAs. PMID:22493190

  18. Genome sequencing and annotation of Cellulomonas sp. HZM

    PubMed Central

    Chua, Patric; Har, Zi Mei; Austin, Christopher M.; Yule, Catherine M.; Dykes, Gary A.; Lee, Sui Mae

    2015-01-01

    We report the draft genome sequence of Cellulomonas sp. HZM, isolated from a tropical peat swamp forest. The draft genome size is 3,559,280 bp with a G + C content of 73% and contains 3 rRNA sequences (single copies of 5S, 16S and 23S rRNA). PMID:26484221

  19. Unexpected cross-species contamination in genome sequencing projects

    PubMed Central

    Merchant, Samier; Wood, Derrick E.

    2014-01-01

    The raw data from a genome sequencing project sometimes contains DNA from contaminating organisms, which may be introduced during sample collection or sequence preparation. In some instances, these contaminants remain in the sequence even after assembly and deposition of the genome into public databases. As a result, searches of these databases may yield erroneous and confusing results. We used efficient microbiome analysis software to scan the draft assembly of domestic cow, Bos taurus, and identify 173 small contigs that appeared to derive from microbial contaminants. In the course of verifying these findings, we discovered that one genome, Neisseria gonorrhoeae TCDC-NG08107, although putatively a complete genome, contained multiple sequences that actually derived from the cow and sheep genomes. Our findings illustrate the need to carefully validate findings of anomalous DNA that rely on comparisons to either draft or finished genomes. PMID:25426337

  20. Ten years of bacterial genome sequencing: comparative-genomics-based discoveries.

    PubMed

    Binnewies, Tim T; Motro, Yair; Hallin, Peter F; Lund, Ole; Dunn, David; La, Tom; Hampson, David J; Bellgard, Matthew; Wassenaar, Trudy M; Ussery, David W

    2006-07-01

    It has been more than 10 years since the first bacterial genome sequence was published. Hundreds of bacterial genome sequences are now available for comparative genomics, and searching a given protein against more than a thousand genomes will soon be possible. The subject of this review will address a relatively straightforward question: "What have we learned from this vast amount of new genomic data?" Perhaps one of the most important lessons has been that genetic diversity, at the level of large-scale variation amongst even genomes of the same species, is far greater than was thought. The classical textbook view of evolution relying on the relatively slow accumulation of mutational events at the level of individual bases scattered throughout the genome has changed. One of the most obvious conclusions from examining the sequences from several hundred bacterial genomes is the enormous amount of diversity--even in different genomes from the same bacterial species. This diversity is generated by a variety of mechanisms, including mobile genetic elements and bacteriophages. An examination of the 20 Escherichia coli genomes sequenced so far dramatically illustrates this, with the genome size ranging from 4.6 to 5.5 Mbp; much of the variation appears to be of phage origin. This review also addresses mobile genetic elements, including pathogenicity islands and the structure of transposable elements. There are at least 20 different methods available to compare bacterial genomes. Metagenomics offers the chance to study genomic sequences found in ecosystems, including genomes of species that are difficult to culture. It has become clear that a genome sequence represents more than just a collection of gene sequences for an organism and that information concerning the environment and growth conditions for the organism are important for interpretation of the genomic data. The newly proposed Minimal Information about a Genome Sequence standard has been developed to obtain this

  1. Ten years of bacterial genome sequencing: comparative-genomics-based discoveries.

    PubMed

    Binnewies, Tim T; Motro, Yair; Hallin, Peter F; Lund, Ole; Dunn, David; La, Tom; Hampson, David J; Bellgard, Matthew; Wassenaar, Trudy M; Ussery, David W

    2006-07-01

    It has been more than 10 years since the first bacterial genome sequence was published. Hundreds of bacterial genome sequences are now available for comparative genomics, and searching a given protein against more than a thousand genomes will soon be possible. The subject of this review will address a relatively straightforward question: "What have we learned from this vast amount of new genomic data?" Perhaps one of the most important lessons has been that genetic diversity, at the level of large-scale variation amongst even genomes of the same species, is far greater than was thought. The classical textbook view of evolution relying on the relatively slow accumulation of mutational events at the level of individual bases scattered throughout the genome has changed. One of the most obvious conclusions from examining the sequences from several hundred bacterial genomes is the enormous amount of diversity--even in different genomes from the same bacterial species. This diversity is generated by a variety of mechanisms, including mobile genetic elements and bacteriophages. An examination of the 20 Escherichia coli genomes sequenced so far dramatically illustrates this, with the genome size ranging from 4.6 to 5.5 Mbp; much of the variation appears to be of phage origin. This review also addresses mobile genetic elements, including pathogenicity islands and the structure of transposable elements. There are at least 20 different methods available to compare bacterial genomes. Metagenomics offers the chance to study genomic sequences found in ecosystems, including genomes of species that are difficult to culture. It has become clear that a genome sequence represents more than just a collection of gene sequences for an organism and that information concerning the environment and growth conditions for the organism are important for interpretation of the genomic data. The newly proposed Minimal Information about a Genome Sequence standard has been developed to obtain this

  2. De novo assembly of a bell pepper endornavirus genome sequence using RNA sequencing data.

    PubMed

    Jo, Yeonhwa; Choi, Hoseng; Cho, Won Kyong

    2015-03-19

    The genus Endornavirus is a double-stranded RNA virus that infects a wide range of hosts. In this study, we report on the de novo assembly of a bell pepper endornavirus genome sequence by RNA sequencing (RNA-Seq). Our result demonstrates the successful application of RNA-Seq to obtain a complete viral genome sequence from the transcriptome data.

  3. De novo assembly of a bell pepper endornavirus genome sequence using RNA sequencing data.

    PubMed

    Jo, Yeonhwa; Choi, Hoseng; Cho, Won Kyong

    2015-01-01

    The genus Endornavirus is a double-stranded RNA virus that infects a wide range of hosts. In this study, we report on the de novo assembly of a bell pepper endornavirus genome sequence by RNA sequencing (RNA-Seq). Our result demonstrates the successful application of RNA-Seq to obtain a complete viral genome sequence from the transcriptome data. PMID:25792042

  4. Microbial genome sequencing using optical mapping and Illumina sequencing

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Introduction Optical mapping is a technique in which strands of genomic DNA are digested with one or more restriction enzymes, and a physical map of the genome constructed from the resulting image. In outline, genomic DNA is extracted from a pure culture, linearly arrayed on a specialized glass sli...

  5. Draft sequences of the radish (Raphanus sativus L.) genome.

    PubMed

    Kitashiba, Hiroyasu; Li, Feng; Hirakawa, Hideki; Kawanabe, Takahiro; Zou, Zhongwei; Hasegawa, Yoichi; Tonosaki, Kaoru; Shirasawa, Sachiko; Fukushima, Aki; Yokoi, Shuji; Takahata, Yoshihito; Kakizaki, Tomohiro; Ishida, Masahiko; Okamoto, Shunsuke; Sakamoto, Koji; Shirasawa, Kenta; Tabata, Satoshi; Nishio, Takeshi

    2014-10-01

    Radish (Raphanus sativus L., n = 9) is one of the major vegetables in Asia. Since the genomes of Brassica and related species including radish underwent genome rearrangement, it is quite difficult to perform functional analysis based on the reported genomic sequence of Brassica rapa. Therefore, we performed genome sequencing of radish. Short reads of genomic sequences of 191.1 Gb were obtained by next-generation sequencing (NGS) for a radish inbred line, and 76,592 scaffolds of ≥ 300 bp were constructed along with the bacterial artificial chromosome-end sequences. Finally, the whole draft genomic sequence of 402 Mb spanning 75.9% of the estimated genomic size and containing 61,572 predicted genes was obtained. Subsequently, 221 single nucleotide polymorphism markers and 768 PCR-RFLP markers were used together with the 746 markers produced in our previous study for the construction of a linkage map. The map was combined further with another radish linkage map constructed mainly with expressed sequence tag-simple sequence repeat markers into a high-density integrated map of 1,166 cM with 2,553 DNA markers. A total of 1,345 scaffolds were assigned to the linkage map, spanning 116.0 Mb. Bulked PCR products amplified by 2,880 primer pairs were sequenced by NGS, and SNPs in eight inbred lines were identified. PMID:24848699

  6. Draft sequences of the radish (Raphanus sativus L.) genome.

    PubMed

    Kitashiba, Hiroyasu; Li, Feng; Hirakawa, Hideki; Kawanabe, Takahiro; Zou, Zhongwei; Hasegawa, Yoichi; Tonosaki, Kaoru; Shirasawa, Sachiko; Fukushima, Aki; Yokoi, Shuji; Takahata, Yoshihito; Kakizaki, Tomohiro; Ishida, Masahiko; Okamoto, Shunsuke; Sakamoto, Koji; Shirasawa, Kenta; Tabata, Satoshi; Nishio, Takeshi

    2014-10-01

    Radish (Raphanus sativus L., n = 9) is one of the major vegetables in Asia. Since the genomes of Brassica and related species including radish underwent genome rearrangement, it is quite difficult to perform functional analysis based on the reported genomic sequence of Brassica rapa. Therefore, we performed genome sequencing of radish. Short reads of genomic sequences of 191.1 Gb were obtained by next-generation sequencing (NGS) for a radish inbred line, and 76,592 scaffolds of ≥ 300 bp were constructed along with the bacterial artificial chromosome-end sequences. Finally, the whole draft genomic sequence of 402 Mb spanning 75.9% of the estimated genomic size and containing 61,572 predicted genes was obtained. Subsequently, 221 single nucleotide polymorphism markers and 768 PCR-RFLP markers were used together with the 746 markers produced in our previous study for the construction of a linkage map. The map was combined further with another radish linkage map constructed mainly with expressed sequence tag-simple sequence repeat markers into a high-density integrated map of 1,166 cM with 2,553 DNA markers. A total of 1,345 scaffolds were assigned to the linkage map, spanning 116.0 Mb. Bulked PCR products amplified by 2,880 primer pairs were sequenced by NGS, and SNPs in eight inbred lines were identified.

  7. Cancer Genetic Markers of Susceptibility | Office of Cancer Genomics

    Cancer.gov

    The Cancer Genetic Markers of Susceptibility (CGEMS) project began in 2005 as a 3-year pilot study to identify inherited genetic susceptibility to prostate and breast cancer. CGEMS has developed into a successful research program of genome-wide association studies (GWAS) to identify genetic variants that affect a person’s risk of developing cancer.

  8. Endometrial and acute myeloid leukemia cancer genomes characterized

    Cancer.gov

    Two studies from The Cancer Genome Atlas (TCGA) program reveal details about the genomic landscapes of acute myeloid leukemia (AML) and endometrial cancer. Both provide new insights into the molecular underpinnings of these cancers with the potential to i

  9. The Brachypodium genome sequence: a resource for oat genomics research

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Oat (Avena sativa) is an important cereal crop used as both an animal feed and for human consumption. Genetic and genomic research on oat is hindered because it is hexaploid and possesses a large (13 Gb) genome. Diploid Avena relatives have been employed for genetic and genomic studies, but only mod...

  10. Standards for Sequencing Viral Genomes in the Era of High-Throughput Sequencing

    PubMed Central

    Beitzel, Brett; Chain, Patrick S. G.; Davenport, Matthew G.; Donaldson, Eric; Frieman, Matthew; Kugelman, Jeffrey; Kuhn, Jens H.; O’Rear, Jules; Sabeti, Pardis C.; Wentworth, David E.; Wiley, Michael R.; Yu, Guo-Yun; Sozhamannan, Shanmuga; Bradburne, Christopher

    2014-01-01

    ABSTRACT Thanks to high-throughput sequencing technologies, genome sequencing has become a common component in nearly all aspects of viral research; thus, we are experiencing an explosion in both the number of available genome sequences and the number of institutions producing such data. However, there are currently no common standards used to convey the quality, and therefore utility, of these various genome sequences. Here, we propose five “standard” categories that encompass all stages of viral genome finishing, and we define them using simple criteria that are agnostic to the technology used for sequencing. We also provide genome finishing recommendations for various downstream applications, keeping in mind the cost-benefit trade-offs associated with different levels of finishing. Our goal is to define a common vocabulary that will allow comparison of genome quality across different research groups, sequencing platforms, and assembly techniques. PMID:24939889

  11. Standards for sequencing viral genomes in the era of high-throughput sequencing.

    PubMed

    Ladner, Jason T; Beitzel, Brett; Chain, Patrick S G; Davenport, Matthew G; Donaldson, Eric F; Frieman, Matthew; Kugelman, Jeffrey R; Kuhn, Jens H; O'Rear, Jules; Sabeti, Pardis C; Wentworth, David E; Wiley, Michael R; Yu, Guo-Yun; Sozhamannan, Shanmuga; Bradburne, Christopher; Palacios, Gustavo

    2014-01-01

    Thanks to high-throughput sequencing technologies, genome sequencing has become a common component in nearly all aspects of viral research; thus, we are experiencing an explosion in both the number of available genome sequences and the number of institutions producing such data. However, there are currently no common standards used to convey the quality, and therefore utility, of these various genome sequences. Here, we propose five "standard" categories that encompass all stages of viral genome finishing, and we define them using simple criteria that are agnostic to the technology used for sequencing. We also provide genome finishing recommendations for various downstream applications, keeping in mind the cost-benefit trade-offs associated with different levels of finishing. Our goal is to define a common vocabulary that will allow comparison of genome quality across different research groups, sequencing platforms, and assembly techniques.

  12. Whole-Genome Sequences of Thirteen Isolates of Borrelia burgdorferi

    SciTech Connect

    Schutzer S. E.; Dunn J.; Fraser-Liggett, C. M.; Casjens, S. R.; Qiu, W.-G.; Mongodin, E. F.; Luft, B. J.

    2011-02-01

    Borrelia burgdorferi is a causative agent of Lyme disease in North America and Eurasia. The first complete genome sequence of B. burgdorferi strain 31, available for more than a decade, has assisted research on the pathogenesis of Lyme disease. Because a single genome sequence is not sufficient to understand the relationship between genotypic and geographic variation and disease phenotype, we determined the whole-genome sequences of 13 additional B. burgdorferi isolates that span the range of natural variation. These sequences should allow improved understanding of pathogenesis and provide a foundation for novel detection, diagnosis, and prevention strategies.

  13. Progress in understanding and sequencing the genome of Brassica rapa.

    PubMed

    Hong, Chang Pyo; Kwon, Soo-Jin; Kim, Jung Sun; Yang, Tae-Jin; Park, Beom-Seok; Lim, Yong Pyo

    2008-01-01

    Brassica rapa, which is closely related to Arabidopsis thaliana, is an important crop and a model plant for studying genome evolution via polyploidization. We report the current understanding of the genome structure of B. rapa and efforts for the whole-genome sequencing of the species. The tribe Brassicaceae, which comprises ca. 240 species, descended from a common hexaploid ancestor with a basic genome similar to that of Arabidopsis. Chromosome rearrangements, including fusions and/or fissions, resulted in the present-day "diploid" Brassica species with variation in chromosome number and phenotype. Triplicated genomic segments of B. rapa are collinear to those of A. thaliana with InDels. The genome triplication has led to an approximately 1.7-fold increase in the B. rapa gene number compared to that of A. thaliana. Repetitive DNA of B. rapa has also been extensively amplified and has diverged from that of A. thaliana. For its whole-genome sequencing, the Brassica rapa Genome Sequencing Project (BrGSP) consortium has developed suitable genomic resources and constructed genetic and physical maps. Ten chromosomes of B. rapa are being allocated to BrGSP consortium participants, and each chromosome will be sequenced by a BAC-by-BAC approach. Genome sequencing of B. rapa will offer a new perspective for plant biology and evolution in the context of polyploidization.

  14. Assessing the Drosophila melanogaster and Anopheles gambiae Genome Annotations Using Genome-Wide Sequence Comparisons

    PubMed Central

    Jaillon, Olivier; Dossat, Carole; Eckenberg, Ralph; Eiglmeier, Karin; Segurens, Béatrice; Aury, Jean-Marc; Roth, Charles W.; Scarpelli, Claude; Brey, Paul T.; Weissenbach, Jean; Wincker, Patrick

    2003-01-01

    We performed genome-wide sequence comparisons at the protein coding level between the genome sequences of Drosophila melanogaster and Anopheles gambiae. Such comparisons detect evolutionarily conserved regions (ecores) that can be used for a qualitative and quantitative evaluation of the available annotations of both genomes. They also provide novel candidate features for annotation. The percentage of ecores mapping outside annotations in the A. gambiae genome is about fourfold higher than in D. melanogaster. The A. gambiae genome assembly also contains a high proportion of duplicated ecores, possibly resulting from artefactual sequence duplications in the genome assembly. The occurrence of 4063 ecores in the D. melanogaster genome outside annotations suggests that some genes are not yet or only partially annotated. The present work illustrates the power of comparative genomics approaches towards an exhaustive and accurate establishment of gene models and gene catalogues in insect genomes. PMID:12840038

  15. Deep sequencing of 10,000 human genomes

    PubMed Central

    Pierce, Levi C. T.; Biggs, William H.; di Iulio, Julia; Wong, Emily H. M.; Fabani, Martin M.; Kirkness, Ewen F.; Moustafa, Ahmed; Shah, Naisha; Xie, Chao; Brewerton, Suzanne C.; Bulsara, Nadeem; Garner, Chad; Metzker, Gary; Sandoval, Efren; Perkins, Brad A.; Och, Franz J.; Turpaz, Yaron; Venter, J. Craig

    2016-01-01

    We report on the sequencing of 10,545 human genomes at 30×–40× coverage with an emphasis on quality metrics and novel variant and sequence discovery. We find that 84% of an individual human genome can be sequenced confidently. This high-confidence region includes 91.5% of exon sequence and 95.2% of known pathogenic variant positions. We present the distribution of over 150 million single-nucleotide variants in the coding and noncoding genome. Each newly sequenced genome contributes an average of 8,579 novel variants. In addition, each genome carries on average 0.7 Mb of sequence that is not found in the main build of the hg38 reference genome. The density of this catalog of variation allowed us to construct high-resolution profiles that define genomic sites that are highly intolerant of genetic variation. These results indicate that the data generated by deep genome sequencing is of the quality necessary for clinical use. PMID:27702888

  16. A microscopic landscape of the invasive breast cancer genome

    PubMed Central

    Ping, Zheng; Xia, Yuchao; Shen, Tiansheng; Parekh, Vishwas; Siegal, Gene P.; Eltoum, Isam-Eldin; He, Jianbo; Chen, Dongquan; Deng, Minghua; Xi, Ruibin; Shen, Dejun

    2016-01-01

    Histologic grade is one of the most important microscopic features used to predict the prognosis of invasive breast cancer and may serve as a marker for studying cancer driving genomic abnormalities in vivo. We analyzed whole genome sequencing data from 680 cases of TCGA invasive ductal carcinomas of the breast and correlated them to corresponding pathology information. Ten genetic abnormalities were found to be statistically associated with histologic grade, including three most prevalent cancer driver events, TP53 and PIK3CA mutations and MYC amplification. A distinct genetic interaction among these genomic abnormalities was revealed as measured by the histologic grading score. While TP53 mutation and MYC amplification were synergistic in promoting tumor progression, PIK3CA mutation was found to have alleviated the oncogenic effect of either the TP53 mutation or MYC amplification, and was associated with a significant reduction in mitotic activity in TP53 mutated and/or MYC amplified breast cancer. Furthermore, we discovered that different types of genetic abnormalities (mutation versus amplification) within the same cancer driver gene (PIK3CA or GATA3) were associated with opposite histologic changes in invasive breast cancer. In conclusion, our study suggests that histologic grade may serve as a biomarker to define cancer driving genetic events in vivo. PMID:27283966

  17. Using Partial Genomic Fosmid Libraries for Sequencing CompleteOrganellar Genomes

    SciTech Connect

    McNeal, Joel R.; Leebens-Mack, James H.; Arumuganathan, K.; Kuehl, Jennifer V.; Boore, Jeffrey L.; dePamphilis, Claude W.

    2005-08-26

    Organellar genome sequences provide numerous phylogenetic markers and yield insight into organellar function and molecular evolution. These genomes are much smaller in size than their nuclear counterparts; thus, their complete sequencing is much less expensive than total nuclear genome sequencing, making broader phylogenetic sampling feasible. However, for some organisms it is challenging to isolate plastid DNA for sequencing using standard methods. To overcome these difficulties, we constructed partial genomic libraries from total DNA preparations of two heterotrophic and two autotrophic angiosperm species using fosmid vectors. We then used macroarray screening to isolate clones containing large fragments of plastid DNA. A minimum tiling path of clones comprising the entire genome sequence of each plastid was selected, and these clones were shotgun-sequenced and assembled into complete genomes. Although this method worked well for both heterotrophic and autotrophic plants, nuclear genome size had a dramatic effect on the proportion of screened clones containing plastid DNA and, consequently, the overall number of clones that must be screened to ensure full plastid genome coverage. This technique makes it possible to determine complete plastid genome sequences for organisms that defy other available organellar genome sequencing methods, especially those for which limited amounts of tissue are available.

  18. Mechanisms of Base Substitution Mutagenesis in Cancer Genomes

    PubMed Central

    Bacolla, Albino; Cooper, David N.; Vasquez, Karen M.

    2014-01-01

    Cancer genome sequence data provide an invaluable resource for inferring the key mechanisms by which mutations arise in cancer cells, favoring their survival, proliferation and invasiveness. Here we examine recent advances in understanding the molecular mechanisms responsible for the predominant type of genetic alteration found in cancer cells, somatic single base substitutions (SBSs). Cytosine methylation, demethylation and deamination, charge transfer reactions in DNA, DNA replication timing, chromatin status and altered DNA proofreading activities are all now known to contribute to the mechanisms leading to base substitution mutagenesis. We review current hypotheses as to the major processes that give rise to SBSs and evaluate their relative relevance in the light of knowledge acquired from cancer genome sequencing projects and the study of base modifications, DNA repair and lesion bypass. Although gene expression data on APOBEC3B enzymes provide support for a role in cancer mutagenesis through U:G mismatch intermediates, the enzyme preference for single-stranded DNA may limit its activity genome-wide. For SBSs at both CG:CG and YC:GR sites, we outline evidence for a prominent role of damage by charge transfer reactions that follow interactions of the DNA with reactive oxygen species (ROS) and other endogenous or exogenous electron-abstracting molecules. PMID:24705290

  19. Subclonal diversification of primary breast cancer revealed by multiregion sequencing

    SciTech Connect

    Yates, Lucy R.; Gerstung, Moritz; Knappskog, Stian; Desmedt, Christine; Gundem, Gunes; Van Loo, Peter; Aas, Turid; Alexandrov, Ludmil B.; Larsimont, Denis; Davies, Helen; Li, Yilong; Ju, Young Seok; Ramakrishna, Manasa; Haugland, Hans Kristian; Lilleng, Peer Kaare; Nik-Zainal, Serena; McLaren, Stuart; Butler, Adam; Martin, Sancha; Glodzik, Dominic; Menzies, Andrew; Raine, Keiran; Hinton, Jonathan; Jones, David; Mudie, Laura J.; Jiang, Bing; Vincent, Delphine; Greene-Colozzi, April; Adnet, Pierre -Yves; Fatima, Aquila; Maetens, Marion; Ignatiadis, Michail; Stratton, Michael R.; Sotiriou, Christos; Richardson, Andrea L.; Lønning, Per Eystein; Wedge, David C.; Campbell, Peter J.

    2015-06-22

    Sequencing cancer genomes may enable tailoring of therapeutics to the underlying biological abnormalities driving a particular patient's tumor. However, sequencing-based strategies rely heavily on representative sampling of tumors. To understand the subclonal structure of primary breast cancer, we applied whole-genome and targeted sequencing to multiple samples from each of 50 patients' tumors (303 samples in total). The extent of subclonal diversification varied among cases and followed spatial patterns. No strict temporal order was evident, with point mutations and rearrangements affecting the most common breast cancer genes, including PIK3CA, TP53, PTEN, BRCA2 and MYC, occurring early in some tumors and late in others. In 13 out of 50 cancers, potentially targetable mutations were subclonal. Landmarks of disease progression, such as resistance to chemotherapy and the acquisition of invasive or metastatic potential, arose within detectable subclones of antecedent lesions. These findings highlight the importance of including analyses of subclonal structure and tumor evolution in clinical trials of primary breast cancer.

  20. Subclonal diversification of primary breast cancer revealed by multiregion sequencing

    DOE PAGESBeta

    Yates, Lucy R.; Gerstung, Moritz; Knappskog, Stian; Desmedt, Christine; Gundem, Gunes; Van Loo, Peter; Aas, Turid; Alexandrov, Ludmil B.; Larsimont, Denis; Davies, Helen; et al

    2015-06-22

    Sequencing cancer genomes may enable tailoring of therapeutics to the underlying biological abnormalities driving a particular patient's tumor. However, sequencing-based strategies rely heavily on representative sampling of tumors. To understand the subclonal structure of primary breast cancer, we applied whole-genome and targeted sequencing to multiple samples from each of 50 patients' tumors (303 samples in total). The extent of subclonal diversification varied among cases and followed spatial patterns. No strict temporal order was evident, with point mutations and rearrangements affecting the most common breast cancer genes, including PIK3CA, TP53, PTEN, BRCA2 and MYC, occurring early in some tumors and latemore » in others. In 13 out of 50 cancers, potentially targetable mutations were subclonal. Landmarks of disease progression, such as resistance to chemotherapy and the acquisition of invasive or metastatic potential, arose within detectable subclones of antecedent lesions. These findings highlight the importance of including analyses of subclonal structure and tumor evolution in clinical trials of primary breast cancer.« less

  1. Subclonal diversification of primary breast cancer revealed by multiregion sequencing.

    PubMed

    Yates, Lucy R; Gerstung, Moritz; Knappskog, Stian; Desmedt, Christine; Gundem, Gunes; Van Loo, Peter; Aas, Turid; Alexandrov, Ludmil B; Larsimont, Denis; Davies, Helen; Li, Yilong; Ju, Young Seok; Ramakrishna, Manasa; Haugland, Hans Kristian; Lilleng, Peer Kaare; Nik-Zainal, Serena; McLaren, Stuart; Butler, Adam; Martin, Sancha; Glodzik, Dominic; Menzies, Andrew; Raine, Keiran; Hinton, Jonathan; Jones, David; Mudie, Laura J; Jiang, Bing; Vincent, Delphine; Greene-Colozzi, April; Adnet, Pierre-Yves; Fatima, Aquila; Maetens, Marion; Ignatiadis, Michail; Stratton, Michael R; Sotiriou, Christos; Richardson, Andrea L; Lønning, Per Eystein; Wedge, David C; Campbell, Peter J

    2015-07-01

    The sequencing of cancer genomes may enable tailoring of therapeutics to the underlying biological abnormalities driving a particular patient's tumor. However, sequencing-based strategies rely heavily on representative sampling of tumors. To understand the subclonal structure of primary breast cancer, we applied whole-genome and targeted sequencing to multiple samples from each of 50 patients' tumors (303 samples in total). The extent of subclonal diversification varied among cases and followed spatial patterns. No strict temporal order was evident, with point mutations and rearrangements affecting the most common breast cancer genes, including PIK3CA, TP53, PTEN, BRCA2 and MYC, occurring early in some tumors and late in others. In 13 out of 50 cancers, potentially targetable mutations were subclonal. Landmarks of disease progression, such as resistance to chemotherapy and the acquisition of invasive or metastatic potential, arose within detectable subclones of antecedent lesions. These findings highlight the importance of including analyses of subclonal structure and tumor evolution in clinical trials of primary breast cancer.

  2. Emerging Role of Genomic Rearrangements in Breast Cancer: Applying Knowledge from Other Cancers.

    PubMed

    Paratala, Bhavna S; Dolfi, Sonia C; Khiabanian, Hossein; Rodriguez-Rodriguez, Lorna; Ganesan, Shridar; Hirshfield, Kim M

    2016-01-01

    Significant advances in our knowledge of cancer genomes are rapidly changing the way we think about tumor biology and the heterogeneity of cancer. Recent successes in genomically-guided treatment approaches accompanied by more sophisticated sequencing techniques have paved the way for deeper investigation into the landscape of genomic rearrangements in cancer. While considerable research on solid tumors has focused on point mutations that directly alter the coding sequence of key genes, far less is known about the role of somatic rearrangements. With many recurring alterations observed across tumor types, there is an obvious need for functional characterization of these genomic biomarkers in order to understand their relevance to tumor biology, therapy, and prognosis. As personalized therapy approaches are turning toward genomic alterations for answers, these biomarkers will become increasingly relevant to the practice of precision medicine. This review discusses the emerging role of genomic rearrangements in breast cancer, with a particular focus on fusion genes. In addition, it raises several key questions on the therapeutic value of such rearrangements and provides a framework to evaluate their significance as predictive and prognostic biomarkers. PMID:26917980

  3. Emerging Role of Genomic Rearrangements in Breast Cancer: Applying Knowledge from Other Cancers

    PubMed Central

    Paratala, Bhavna S.; Dolfi, Sonia C.; Khiabanian, Hossein; Rodriguez-Rodriguez, Lorna; Ganesan, Shridar; Hirshfield, Kim M.

    2016-01-01

    Significant advances in our knowledge of cancer genomes are rapidly changing the way we think about tumor biology and the heterogeneity of cancer. Recent successes in genomically-guided treatment approaches accompanied by more sophisticated sequencing techniques have paved the way for deeper investigation into the landscape of genomic rearrangements in cancer. While considerable research on solid tumors has focused on point mutations that directly alter the coding sequence of key genes, far less is known about the role of somatic rearrangements. With many recurring alterations observed across tumor types, there is an obvious need for functional characterization of these genomic biomarkers in order to understand their relevance to tumor biology, therapy, and prognosis. As personalized therapy approaches are turning toward genomic alterations for answers, these biomarkers will become increasingly relevant to the practice of precision medicine. This review discusses the emerging role of genomic rearrangements in breast cancer, with a particular focus on fusion genes. In addition, it raises several key questions on the therapeutic value of such rearrangements and provides a framework to evaluate their significance as predictive and prognostic biomarkers. PMID:26917980

  4. Genomic diversity of colorectal cancer: Changing landscape and emerging targets.

    PubMed

    Ahn, Daniel H; Ciombor, Kristen K; Mikhail, Sameh; Bekaii-Saab, Tanios

    2016-07-01

    Improvements in screening and preventive measures have led to an increased detection of early stage colorectal cancers (CRC) where patients undergo treatment with a curative intent. Despite these efforts, a high proportion of patients are diagnosed with advanced stage disease that is associated with poor outcomes, as CRC remains one of the leading causes of cancer-related deaths in the world. The development of next generation sequencing and collaborative multi-institutional efforts to characterize the cancer genome has afforded us with a comprehensive assessment of the genomic makeup present in CRC. This knowledge has translated into understanding the prognostic role of various tumor somatic variants in this disease. Additionally, the awareness of the genomic alterations present in CRC has resulted in an improvement in patient outcomes, largely due to better selection of personalized therapies based on an individual's tumor genomic makeup. The benefit of various treatments is often limited, where recent studies assessing the genomic diversity in CRC have identified the development of secondary tumor somatic variants that likely contribute to acquired treatment resistance. These studies have begun to alter the landscape of treatment for CRC that include investigating novel targeted therapies, assessing the role of immunotherapy and prospective, dynamic assessment of changes in tumor genomic alterations that occur during the treatment of CRC.

  5. Genomic diversity of colorectal cancer: Changing landscape and emerging targets.

    PubMed

    Ahn, Daniel H; Ciombor, Kristen K; Mikhail, Sameh; Bekaii-Saab, Tanios

    2016-07-01

    Improvements in screening and preventive measures have led to an increased detection of early stage colorectal cancers (CRC) where patients undergo treatment with a curative intent. Despite these efforts, a high proportion of patients are diagnosed with advanced stage disease that is associated with poor outcomes, as CRC remains one of the leading causes of cancer-related deaths in the world. The development of next generation sequencing and collaborative multi-institutional efforts to characterize the cancer genome has afforded us with a comprehensive assessment of the genomic makeup present in CRC. This knowledge has translated into understanding the prognostic role of various tumor somatic variants in this disease. Additionally, the awareness of the genomic alterations present in CRC has resulted in an improvement in patient outcomes, largely due to better selection of personalized therapies based on an individual's tumor genomic makeup. The benefit of various treatments is often limited, where recent studies assessing the genomic diversity in CRC have identified the development of secondary tumor somatic variants that likely contribute to acquired treatment resistance. These studies have begun to alter the landscape of treatment for CRC that include investigating novel targeted therapies, assessing the role of immunotherapy and prospective, dynamic assessment of changes in tumor genomic alterations that occur during the treatment of CRC. PMID:27433082

  6. Genomic diversity of colorectal cancer: Changing landscape and emerging targets

    PubMed Central

    Ahn, Daniel H; Ciombor, Kristen K; Mikhail, Sameh; Bekaii-Saab, Tanios

    2016-01-01

    Improvements in screening and preventive measures have led to an increased detection of early stage colorectal cancers (CRC) where patients undergo treatment with a curative intent. Despite these efforts, a high proportion of patients are diagnosed with advanced stage disease that is associated with poor outcomes, as CRC remains one of the leading causes of cancer-related deaths in the world. The development of next generation sequencing and collaborative multi-institutional efforts to characterize the cancer genome has afforded us with a comprehensive assessment of the genomic makeup present in CRC. This knowledge has translated into understanding the prognostic role of various tumor somatic variants in this disease. Additionally, the awareness of the genomic alterations present in CRC has resulted in an improvement in patient outcomes, largely due to better selection of personalized therapies based on an individual’s tumor genomic makeup. The benefit of various treatments is often limited, where recent studies assessing the genomic diversity in CRC have identified the development of secondary tumor somatic variants that likely contribute to acquired treatment resistance. These studies have begun to alter the landscape of treatment for CRC that include investigating novel targeted therapies, assessing the role of immunotherapy and prospective, dynamic assessment of changes in tumor genomic alterations that occur during the treatment of CRC. PMID:27433082

  7. The topography of mutational processes in breast cancer genomes

    DOE PAGESBeta

    Morganella, Sandro; Alexandrov, Ludmil B.; Glodzik, Dominik; Zou, Xueqing; Davies, Helen; Staaf, Johan; Sieuwerts, Anieta M.; Brinkman, Arie B.; Martin, Sancha; Ramakrishna, Manasa; et al

    2016-01-01

    Somatic mutations in human cancers show unevenness in genomic distribution that correlate with aspects of genome structure and function. These mutations are, however, generated by multiple mutational processes operating through the cellular lineage between the fertilized egg and the cancer cell, each composed of specific DNA damage and repair components and leaving its own characteristic mutational signature on the genome. Using somatic mutation catalogues from 560 breast cancer whole-genome sequences, here we show that each of 12 base substitution, 2 insertion/deletion (indel) and 6 rearrangement mutational signatures present in breast tissue, exhibit distinct relationships with genomic features relating to transcription,more » DNA replication and chromatin organization. This signature-based approach permits visualization of the genomic distribution of mutational processes associated with APOBEC enzymes, mismatch repair deficiency and homologous recombinational repair deficiency, as well as mutational processes of unknown aetiology. Lastly, it highlights mechanistic insights including a putative replication-dependent mechanism of APOBEC-related mutagenesis.« less

  8. Real-time, portable genome sequencing for Ebola surveillance.

    PubMed

    Quick, Joshua; Loman, Nicholas J; Duraffour, Sophie; Simpson, Jared T; Severi, Ettore; Cowley, Lauren; Bore, Joseph Akoi; Koundouno, Raymond; Dudas, Gytis; Mikhail, Amy; Ouédraogo, Nobila; Afrough, Babak; Bah, Amadou; Baum, Jonathan H J; Becker-Ziaja, Beate; Boettcher, Jan Peter; Cabeza-Cabrerizo, Mar; Camino-Sánchez, Álvaro; Carter, Lisa L; Doerrbecker, Juliane; Enkirch, Theresa; García-Dorival, Isabel; Hetzelt, Nicole; Hinzmann, Julia; Holm, Tobias; Kafetzopoulou, Liana Eleni; Koropogui, Michel; Kosgey, Abigael; Kuisma, Eeva; Logue, Christopher H; Mazzarelli, Antonio; Meisel, Sarah; Mertens, Marc; Michel, Janine; Ngabo, Didier; Nitzsche, Katja; Pallasch, Elisa; Patrono, Livia Victoria; Portmann, Jasmine; Repits, Johanna Gabriella; Rickett, Natasha Y; Sachse, Andreas; Singethan, Katrin; Vitoriano, Inês; Yemanaberhan, Rahel L; Zekeng, Elsa G; Racine, Trina; Bello, Alexander; Sall, Amadou Alpha; Faye, Ousmane; Faye, Oumar; Magassouba, N'Faly; Williams, Cecelia V; Amburgey, Victoria; Winona, Linda; Davis, Emily; Gerlach, Jon; Washington, Frank; Monteil, Vanessa; Jourdain, Marine; Bererd, Marion; Camara, Alimou; Somlare, Hermann; Camara, Abdoulaye; Gerard, Marianne; Bado, Guillaume; Baillet, Bernard; Delaune, Déborah; Nebie, Koumpingnin Yacouba; Diarra, Abdoulaye; Savane, Yacouba; Pallawo, Raymond Bernard; Gutierrez, Giovanna Jaramillo; Milhano, Natacha; Roger, Isabelle; Williams, Christopher J; Yattara, Facinet; Lewandowski, Kuiama; Taylor, James; Rachwal, Phillip; Turner, Daniel J; Pollakis, Georgios; Hiscox, Julian A; Matthews, David A; O'Shea, Matthew K; Johnston, Andrew McD; Wilson, Duncan; Hutley, Emma; Smit, Erasmus; Di Caro, Antonino; Wölfel, Roman; Stoecker, Kilian; Fleischmann, Erna; Gabriel, Martin; Weller, Simon A; Koivogui, Lamine; Diallo, Boubacar; Keïta, Sakoba; Rambaut, Andrew; Formenty, Pierre; Günther, Stephan; Carroll, Miles W

    2016-02-11

    The Ebola virus disease epidemic in West Africa is the largest on record, responsible for over 28,599 cases and more than 11,299 deaths. Genome sequencing in viral outbreaks is desirable to characterize the infectious agent and determine its evolutionary rate. Genome sequencing also allows the identification of signatures of host adaptation, identification and monitoring of diagnostic targets, and characterization of responses to vaccines and treatments. The Ebola virus (EBOV) genome substitution rate in the Makona strain has been estimated at between 0.87 × 10(-3) and 1.42 × 10(-3) mutations per site per year. This is equivalent to 16-27 mutations in each genome, meaning that sequences diverge rapidly enough to identify distinct sub-lineages during a prolonged epidemic. Genome sequencing provides a high-resolution view of pathogen evolution and is increasingly sought after for outbreak surveillance. Sequence data may be used to guide control measures, but only if the results are generated quickly enough to inform interventions. Genomic surveillance during the epidemic has been sporadic owing to a lack of local sequencing capacity coupled with practical difficulties transporting samples to remote sequencing facilities. To address this problem, here we devise a genomic surveillance system that utilizes a novel nanopore DNA sequencing instrument. In April 2015 this system was transported in standard airline luggage to Guinea and used for real-time genomic surveillance of the ongoing epidemic. We present sequence data and analysis of 142 EBOV samples collected during the period March to October 2015. We were able to generate results less than 24 h after receiving an Ebola-positive sample, with the sequencing process taking as little as 15-60 min. We show that real-time genomic surveillance is possible in resource-limited settings and can be established rapidly to monitor outbreaks. PMID:26840485

  9. Data structures and compression algorithms for genomic sequence data

    PubMed Central

    Brandon, Marty C.; Wallace, Douglas C.; Baldi, Pierre

    2009-01-01

    Motivation: The continuing exponential accumulation of full genome data, including full diploid human genomes, creates new challenges not only for understanding genomic structure, function and evolution, but also for the storage, navigation and privacy of genomic data. Here, we develop data structures and algorithms for the efficient storage of genomic and other sequence data that may also facilitate querying and protecting the data. Results: The general idea is to encode only the differences between a genome sequence and a reference sequence, using absolute or relative coordinates for the location of the differences. These locations and the corresponding differential variants can be encoded into binary strings using various entropy coding methods, from fixed codes such as Golomb and Elias codes, to variables codes, such as Huffman codes. We demonstrate the approach and various tradeoffs using highly variables human mitochondrial genome sequences as a testbed. With only a partial level of optimization, 3615 genome sequences occupying 56 MB in GenBank are compressed down to only 167 KB, achieving a 345-fold compression rate, using the revised Cambridge Reference Sequence as the reference sequence. Using the consensus sequence as the reference sequence, the data can be stored using only 133 KB, corresponding to a 433-fold level of compression, roughly a 23% improvement. Extensions to nuclear genomes and high-throughput sequencing data are discussed. Availability: Data are publicly available from GenBank, the HapMap web site, and the MITOMAP database. Supplementary materials with additional results, statistics, and software implementations are available from http://mammag.web.uci.edu/bin/view/Mitowiki/ProjectDNACompression. Contact: pfbaldi@ics.uci.edu PMID:19447783

  10. Sequencing and annotation of mitochondrial genomes from individual parasitic helminths.

    PubMed

    Jex, Aaron R; Littlewood, D Timothy; Gasser, Robin B

    2015-01-01

    Mitochondrial (mt) genomics has significant implications in a range of fundamental areas of parasitology, including evolution, systematics, and population genetics as well as explorations of mt biochemistry, physiology, and function. Mt genomes also provide a rich source of markers to aid molecular epidemiological and ecological studies of key parasites. However, there is still a paucity of information on mt genomes for many metazoan organisms, particularly parasitic helminths, which has often related to challenges linked to sequencing from tiny amounts of material. The advent of next-generation sequencing (NGS) technologies has paved the way for low cost, high-throughput mt genomic research, but there have been obstacles, particularly in relation to post-sequencing assembly and analyses of large datasets. In this chapter, we describe protocols for the efficient amplification and sequencing of mt genomes from small portions of individual helminths, and highlight the utility of NGS platforms to expedite mt genomics. In addition, we recommend approaches for manual or semi-automated bioinformatic annotation and analyses to overcome the bioinformatic "bottleneck" to research in this area. Taken together, these approaches have demonstrated applicability to a range of parasites and provide prospects for using complete mt genomic sequence datasets for large-scale molecular systematic and epidemiological studies. In addition, these methods have broader utility and might be readily adapted to a range of other medium-sized molecular regions (i.e., 10-100 kb), including large genomic operons, and other organellar (e.g., plastid) and viral genomes.

  11. Reference genome sequence of the model plant Setaria

    SciTech Connect

    Bennetzen, Jeffrey L; Yang, Xiaohan; Ye, Chuyu; Tuskan, Gerald A

    2012-01-01

    We generated a high-quality reference genome sequence for foxtail millet (Setaria italica). The {approx}400-Mb assembly covers {approx}80% of the genome and >95% of the gene space. The assembly was anchored to a 992-locus genetic map and was annotated by comparison with >1.3 million expressed sequence tag reads. We produced more than 580 million RNA-Seq reads to facilitate expression analyses. We also sequenced Setaria viridis, the ancestral wild relative of S. italica, and identified regions of differential single-nucleotide polymorphism density, distribution of transposable elements, small RNA content, chromosomal rearrangement and segregation distortion. The genus Setaria includes natural and cultivated species that demonstrate a wide capacity for adaptation. The genetic basis of this adaptation was investigated by comparing five sequenced grass genomes. We also used the diploid Setaria genome to evaluate the ongoing genome assembly of a related polyploid, switchgrass (Panicum virgatum).

  12. Reference genome sequence of the model plant Setaria

    SciTech Connect

    Bennetzen, Jeffrey L; Schmutz, Jeremy; Wang, Hao; Percifield, Ryan; Hawkins, Jennifer; Pontaroli, Ana C.; Estep, Matt; Feng, Liang; Vaughn, Justin N; Grimwood, Jane; Jenkins, Jerry; Barry, Kerrie; Lindquist, Erika; Hellsten, Uffe; Deshpande, Shweta; Wang, Xuewen; Wu, Xiaomei; Mitros, Therese; Triplett, Jimmy; Yang, Xiaohan; Ye, Chuyu; Mauro-Herrera, Margarita; Wang, Lin; Li, Pinghua; Sharma, Manoj; Sharma, Rita; Ronald, Pamela; Panaud, Olivier; Kellogg, Elizabeth A.; Brutnell, Thomas P.; Doust, Andrew N.; Tuskan, Gerald A; Rokhsar, Daniel; Devos, Katrien M

    2012-01-01

    We generated a high-quality reference genome sequence for foxtail millet (Setaria italica). The ~400-Mb assembly covers ~80% of the genome and >95% of the gene space. The assembly was anchored to a 992-locus genetic map and was annotated by comparison with >1.3 million expressed sequence tag reads. We produced more than 580 million RNA-Seq reads to facilitate expression analyses. We also sequenced Setaria viridis, the ancestral wild relative of S. italica, and identified regions of differential single-nucleotide polymorphism density, distribution of transposable elements, small RNA content, chromosomal rearrangement and segregation distortion. The genus Setaria includes natural and cultivated species that demonstrate a wide capacity for adaptation. The genetic basis of this adaptation was investigated by comparing five sequenced grass genomes. We also used the diploid Setaria genome to evaluate the ongoing genome assembly of a related polyploid, switchgrass (Panicum virgatum).

  13. Genome Science and Personalized Cancer Treatment

    SciTech Connect

    Gray, Joe

    2009-08-04

    Summer Lecture Series 2009: Results from the Human Genome Project are enabling scientists to understand how individual cancers form and progress. This information, when combined with newly developed drugs, can optimize the treatment of individual cancers. Joe Gray, director of Berkeley Labs Life Sciences Division and Associate Laboratory Director for Life and Environmental Sciences, will focus on this approach, its promise, and its current roadblocks — particularly with regard to breast cancer.

  14. Genome Science and Personalized Cancer Treatment

    SciTech Connect

    Gray, Joe

    2009-08-07

    August 4, 2009 Berkeley Lab lecture: Results from the Human Genome Project are enabling scientists to understand how individual cancers form and progress. This information, when combined with newly developed drugs, can optimize the treatment of individual cancers. Joe Gray, director of Berkeley Labs Life Sciences Division and Associate Laboratory Director for Life and Environmental Sciences, will focus on this approach, its promise, and its current roadblocks — particularly with regard to breast cancer.

  15. Genome Science and Personalized Cancer Treatment

    ScienceCinema

    Gray, Joe

    2016-07-12

    August 4, 2009 Berkeley Lab lecture: Results from the Human Genome Project are enabling scientists to understand how individual cancers form and progress. This information, when combined with newly developed drugs, can optimize the treatment of individual cancers. Joe Gray, director of Berkeley Labs Life Sciences Division and Associate Laboratory Director for Life and Environmental Sciences, will focus on this approach, its promise, and its current roadblocks — particularly with regard to breast cancer.

  16. CTD² Publication Guidelines | Office of Cancer Genomics

    Cancer.gov

    The Cancer Target Discovery and Development (CTD2) Network is a “community resource project” supported by the National Cancer Institute’s Office of Cancer Genomics. Members of the Network release data to the broader research community by depositing data into NCI-supported or public databases. Data deposition is NOT equivalent to publishing in a peer-reviewed journal. Unless there is a manuscript associated with a dataset, the Network considers data to be formally unpublished.

  17. Genome Science: A Video Tour of the Washington University Genome Sequencing Center for High School and Undergraduate Students

    ERIC Educational Resources Information Center

    Flowers, Susan K.; Easter, Carla; Holmes, Andrea; Cohen, Brian; Bednarski, April E.; Mardis, Elaine R.; Wilson, Richard K.; Elgin, Sarah C. R.

    2005-01-01

    Sequencing of the human genome has ushered in a new era of biology. The technologies developed to facilitate the sequencing of the human genome are now being applied to the sequencing of other genomes. In 2004, a partnership was formed between Washington University School of Medicine Genome Sequencing Center's Outreach Program and Washington…

  18. Genome-Wide Association Studies of Cancer

    PubMed Central

    Stadler, Zsofia K.; Thom, Peter; Robson, Mark E.; Weitzel, Jeffrey N.; Kauff, Noah D.; Hurley, Karen E.; Devlin, Vincent; Gold, Bert; Klein, Robert J.; Offit, Kenneth

    2010-01-01

    Knowledge of the inherited risk for cancer is an important component of preventive oncology. In addition to well-established syndromes of cancer predisposition, much remains to be discovered about the genetic variation underlying susceptibility to common malignancies. Increased knowledge about the human genome and advances in genotyping technology have made possible genome-wide association studies (GWAS) of human diseases. These studies have identified many important regions of genetic variation associated with an increased risk for human traits and diseases including cancer. Understanding the principles, major findings, and limitations of GWAS is becoming increasingly important for oncologists as dissemination of genomic risk tests directly to consumers is already occurring through commercial companies. GWAS have contributed to our understanding of the genetic basis of cancer and will shed light on biologic pathways and possible new strategies for targeted prevention. To date, however, the clinical utility of GWAS-derived risk markers remains limited. PMID:20585100

  19. Genome sequencing of the redbanded stink bug (Piezodorus guildinii)

    Technology Transfer Automated Retrieval System (TEKTRAN)

    We assembled a partial genome sequence from the redbanded stink bug, Piezodorus guildinii from Illumina MiSeq sequencing runs. The sequence has been submitted and published under NCBI GenBank Accession Number JTEQ01000000. The BioProject and BioSample Accession numbers are PRJNA263369 and SAMN030997...

  20. Complete genome sequence of Gordonia bronchialis type strain (3410T)

    SciTech Connect

    Ivanova, N; Sikorski, Johannes; Jando, Marlen; Lapidus, Alla L.; Nolan, Matt; Glavina Del Rio, Tijana; Tice, Hope; Copeland, A; Cheng, Jan-Fang; Chen, Feng; Bruce, David; Goodwin, Lynne A.; Pitluck, Sam; Mavromatis, K; Ovchinnikova, Galina; Pati, Amrita; Chen, Amy; Palaniappan, Krishna; Land, Miriam L; Hauser, Loren John; Chang, Yun-Juan; Jeffries, Cynthia; Chain, Patrick S. G.; Saunders, Elizabeth H; Han, Cliff; Detter, J C; Brettin, Thomas S; Rohde, Manfred; Goker, Markus; Bristow, James; Eisen, Jonathan; Markowitz, Victor; Hugenholtz, Philip; Klenk, Hans-Peter; Kyrpides, Nikos C

    2010-01-01

    Gordonia bronchialis Tsukamura 1971 is the type species of the genus. G. bronchialis is a human-pathogenic organism that has been isolated from a large variety of human tissues. Here we describe the features of this organism, together with the complete genome sequence and annotation. This is the first completed genome sequence of the family Gordoniaceae. The 5,290,012 bp long genome with its 4,944 protein-coding and 55 RNA genes is part of the Genomic Encyclopedia of Bacteria and Archaea project.

  1. Complete genome sequence of Acidimicrobium ferrooxidans type strain (ICPT)

    SciTech Connect

    Clum, Alicia; Nolan, Matt; Lang, Elke; Glavina Del Rio, Tijana; Tice, Hope; Copeland, Alex; Cheng, Jan-Fang; Lucas, Susan; Chen, Feng; Bruce, David; Goodwin, Lynne; Pitluck, Sam; Ivanova, Natalia; Mavrommatis, Konstantinos; Mikhailova, Natalia; Pati, Amrita; Chen, Amy; Palaniappan, Krishna; Goker, Markus; Spring, Stefan; Land, Miriam; Hauser, Loren; Chang, Yun-Juan; Jefferies, Cynthia C.; Chain, Patrick; Bristow, James; Eisen, Jonathan A.; Markowitz, Victor; Hugenholtz, Philip; Kyrpides, Nikos C.; Klenk, Hans-Peter; Lapidus, Alla

    2009-05-20

    Acidimicrobium ferrooxidans (Clark and Norris 1996) is the sole and type species of the genus, which until recently was the only genus within the actinobacterial family Acidimicrobiaceae and in the order Acidomicrobiales. Rapid oxidation of iron pyrite during autotrophic growth in the absence of an enhanced CO2 concentration is characteristic for A. ferrooxidans. Here we describe the features of this organism, together with the complete genome sequence, and annotation. This is the first complete genome sequence of the order Acidomicrobiales, and the 2,158,157 bp long single replicon genome with its 2038 protein coding and 54 RNA genes is part of the Genomic Encyclopedia of Bacteria and Archaea project.

  2. Complete genome sequence of Thermomonospora curvata type strain (B9)

    SciTech Connect

    Chertkov, Olga; Sikorski, Johannes; Nolan, Matt; Lapidus, Alla L.; Lucas, Susan; Glavina Del Rio, Tijana; Tice, Hope; Cheng, Jan-Fang; Goodwin, Lynne A.; Pitluck, Sam; Liolios, Konstantinos; Ivanova, N; Mavromatis, K; Mikhailova, Natalia; Ovchinnikova, Galina; Pati, Amrita; Chen, Amy; Palaniappan, Krishna; Ngatchou, Olivier Duplex; Land, Miriam L; Hauser, Loren John; Chang, Yun-Juan; Jeffries, Cynthia; Brettin, Thomas S; Han, Cliff; Detter, J. Chris; Rohde, Manfred; Goker, Markus; Woyke, Tanja; Bristow, James; Eisen, Jonathan; Markowitz, Victor; Hugenholtz, Philip; Klenk, Hans-Peter; Kyrpides, Nikos C

    2011-01-01

    Thermomonospora curvata Henssen 1957 is the type species of the genus Thermomonospora. This genus is of interest because members of this clade are sources of new antibiotics, enzymes, and products with pharmacological activity. In addition, members of this genus participate in the active degradation of cellulose. This is the first complete genome sequence of a member of the family Thermomonosporaceae. Here we describe the features of this organism, together with the complete genome sequence and annotation. The 5,639,016 bp long genome with its 4,985 protein-coding and 76 RNA genes is a part of the Genomic Encyclopedia of Bacteria and Archaea project.

  3. Complete genome sequence of Sulfurospirillum deleyianum type strain (5175T)

    SciTech Connect

    Sikorski, Johannes; Lapidus, Alla L.; Copeland, A; Glavina Del Rio, Tijana; Nolan, Matt; Lucas, Susan; Chen, Feng; Tice, Hope; Cheng, Jan-Fang; Saunders, Elizabeth H; Bruce, David; Goodwin, Lynne A.; Pitluck, Sam; Ovchinnikova, Galina; Pati, Amrita; Ivanova, N; Mavromatis, K; Chen, Amy; Palaniappan, Krishna; Chain, Patrick S. G.; Land, Miriam L; Hauser, Loren John; Chang, Yun-Juan; Jeffries, Cynthia; Detter, J. Chris; Han, Cliff; Rohde, Manfred; Lang, Elke; Spring, Stefan; Goker, Markus; Bristow, James; Eisen, Jonathan; Markowitz, Victor; Hugenholtz, Philip; Kyrpides, Nikos C; Klenk, Hans-Peter

    2010-01-01

    Sulfurospirillum deleyianum Schumacher et al. 1993 is the type species of the genus Sulfurospirillum. S. deleyianum is a model organism for studying sulfur reduction and dissimilatory nitrate reduction as energy source for growth. Also, it is a prominent model organism for studying the structural and functional characteristics of the cytochrome c nitrite reductase. Here we describe the features of this organism, together with the complete genome sequence and annotation. This is the first completed genome sequence of the genus Sulfurospirillum. The 2,306,351 bp long genome with its 2291 protein-coding and 52 RNA genes is part of the Genomic Encyclopedia of Bacteria and Archaea project.

  4. Complete genome sequence of Spirosoma linguale type strain (1T)

    SciTech Connect

    Lail, Kathleen; Sikorski, Johannes; Saunders, Elizabeth H; Lapidus, Alla L.; Glavina Del Rio, Tijana; Copeland, A; Tice, Hope; Cheng, Jan-Fang; Lucas, Susan; Nolan, Matt; Bruce, David; Goodwin, Lynne A.; Pitluck, Sam; Ivanova, N; Mavromatis, K; Ovchinnikova, Galina; Pati, Amrita; Chen, Amy; Palaniappan, Krishna; Land, Miriam L; Hauser, Loren John; Chang, Yun-Juan; Jeffries, Cynthia; Chain, Patrick S. G.; Detter, J. Chris; Schutze, Andrea; Rohde, Manfred; Tindall, Brian; Goker, Markus; Bristow, James; Eisen, Jonathan; Markowitz, Victor; Hugenholtz, Philip; Kyrpides, Nikos C; Klenk, Hans-Peter; Chen, Feng

    2010-01-01

    Spirosoma linguale Migula 1894 is the type species of the genus. S. linguale is a free-living and non-pathogenic organism, known for its peculiar ringlike and horseshoe-shaped cell morphology. Here we describe the features of this organism, together with the complete ge-nome sequence and annotation. This is only the third completed genome sequence of a member of the family Cytophagaceae. The 8,491,258 bp long genome with its eight plas-mids, 7,069 protein-coding and 60 RNA genes is part of the Genomic Encyclopedia of Bacte-ria and Archaea project.

  5. Draft genome sequence of Enterococcus faecium strain LMG 8148.

    PubMed

    Michiels, Joran E; Van den Bergh, Bram; Fauvart, Maarten; Michiels, Jan

    2016-01-01

    Enterococcus faecium, traditionally considered a harmless gut commensal, is emerging as an important nosocomial pathogen showing increasing rates of multidrug resistance. We report the draft genome sequence of E. faecium strain LMG 8148, isolated in 1968 from a human in Gothenburg, Sweden. The draft genome has a total length of 2,697,490 bp, a GC-content of 38.3 %, and 2,402 predicted protein-coding sequences. The isolation of this strain predates the emergence of E. faecium as a nosocomial pathogen. Consequently, its genome can be useful in comparative genomic studies investigating the evolution of E. faecium as a pathogen. PMID:27610213

  6. Genome sequencing: a systematic review of health economic evidence

    PubMed Central

    2013-01-01

    Recently the sequencing of the human genome has become a major biological and clinical research field. However, the public health impact of this new technology with focus on the financial effect is not yet to be foreseen. To provide an overview of the current health economic evidence for genome sequencing, we conducted a thorough systematic review of the literature from 17 databases. In addition, we conducted a hand search. Starting with 5 520 records we ultimately included five full-text publications and one internet source, all focused on cost calculations. The results were very heterogeneous and, therefore, difficult to compare. Furthermore, because the methodology of the publications was quite poor, the reliability and validity of the results were questionable. The real costs for the whole sequencing workflow, including data management and analysis, remain unknown. Overall, our review indicates that the current health economic evidence for genome sequencing is quite poor. Therefore, we listed aspects that needed to be considered when conducting health economic analyses of genome sequencing. Thereby, specifics regarding the overall aim, technology, population, indication, comparator, alternatives after sequencing, outcomes, probabilities, and costs with respect to genome sequencing are discussed. For further research, at the outset, a comprehensive cost calculation of genome sequencing is needed, because all further health economic studies rely on valid cost data. The results will serve as an input parameter for budget-impact analyses or cost-effectiveness analyses. PMID:24330507

  7. The Genomic Scrapheap Challenge; Extracting Relevant Data from Unmapped Whole Genome Sequencing Reads, Including Strain Specific Genomic Segments, in Rats

    PubMed Central

    van der Weide, Robin H.; Simonis, Marieke; Hermsen, Roel; Toonen, Pim; Cuppen, Edwin; de Ligt, Joep

    2016-01-01

    Unmapped next-generation sequencing reads are typically ignored while they contain biologically relevant information. We systematically analyzed unmapped reads from whole genome sequencing of 33 inbred rat strains. High quality reads were selected and enriched for biologically relevant sequences; similarity-based analysis revealed clustering similar to previously reported phylogenetic trees. Our results demonstrate that on average 20% of all unmapped reads harbor sequences that can be used to improve reference genomes and generate hypotheses on potential genotype-phenotype relationships. Analysis pipelines would benefit from incorporating the described methods and reference genomes would benefit from inclusion of the genomic segments obtained through these efforts. PMID:27501045

  8. Genome sequence and comparative virulence of raccoonpox virus: the first North American poxvirus sequence.

    PubMed

    Fleischauer, Clare; Upton, Chris; Victoria, Joseph; Jones, Gwendolyn J B; Roper, Rachel L

    2015-09-01

    We report here the complete genome sequence of raccoonpox virus (RCNV), a naturally occurring North American poxvirus. This is the first such North American sequence to the best of our knowledge, and the data showed that RCNV forms a new phylogenetic branch between orthopoxviruses and Yoka poxvirus. RCNV shared overall similarity in genome organization with orthopoxviruses, and the proteins in the central conserved region shared approximately 90  % amino acid identity with orthopoxviruses. RCNV proteins shared approximately 81  % amino acid identity with Yokapox virus proteins. RCNV is missing 10 genes normally conserved in orthopoxviruses, most of which are implicated in virulence. These gene deletions may explain the attenuated phenotype of RCNV in mammals. RCNV contained one unique genome region containing approximately 1 kb of DNA sequence that is not present in any reported poxvirus. It contained a unique ORF predicted to encode a protein with a transmembrane domain. RCNV replicates well in mammalian cells, is naturally attenuated and has been shown to be effective as a vaccine vector platform, so we further tested its safety. We showed here that RCNV is substantially more attenuated than even the highly attenuated VACV-A35Del mutant virus in pregnant, nude and severe combined immunodeficient (SCID) mouse models. RCNV was much safer in pregnant mice and was cleared rapidly from tissues, even in immunocompromised animals, whereas the VACV-A35Del mutant retains virulence and persists in tissues. Thus, RCNV is expected to be a superior vaccine vector for infectious diseases and cancer due to its excellent safety profile, reported vaccine efficacy and ability to replicate in mammalian cells.

  9. Genome sequencing and annotation of Aeromonas sp. HZM

    PubMed Central

    Chua, Patric; Har, Zi Mei; Austin, Christopher M.; Yule, Catherine M.; Dykes, Gary A.; Lee, Sui Mae

    2015-01-01

    We report the draft genome sequence of Aeromonas sp. strain HZM, isolated from tropical peat swamp forest soil. The draft genome size is 4,451,364 bp with a G + C content of 61.7% and contains 10 rRNA sequences (eight copies of 5S rRNA genes, single copy of 16S and 23S rRNA each). The genome sequence can be accessed at DDBJ/EMBL/GenBank under the accession no. JEMQ00000000. PMID:26484220

  10. Complete genome sequence of Ferroglobus placidus AEDII12DO.

    PubMed

    Anderson, Iain; Risso, Carla; Holmes, Dawn; Lucas, Susan; Copeland, Alex; Lapidus, Alla; Cheng, Jan-Fang; Bruce, David; Goodwin, Lynne; Pitluck, Samuel; Saunders, Elizabeth; Brettin, Thomas; Detter, John C; Han, Cliff; Tapia, Roxanne; Larimer, Frank; Land, Miriam; Hauser, Loren; Woyke, Tanja; Lovley, Derek; Kyrpides, Nikos; Ivanova, Natalia

    2011-10-15

    Ferroglobus placidus belongs to the order Archaeoglobales within the archaeal phylum Euryarchaeota. Strain AEDII12DO is the type strain of the species and was isolated from a shallow marine hydrothermal system at Vulcano, Italy. It is a hyperthermophilic, anaerobic chemolithoautotroph, but it can also use a variety of aromatic compounds as electron donors. Here we describe the features of this organism together with the complete genome sequence and annotation. The 2,196,266 bp genome with its 2,567 protein-coding and 55 RNA genes was sequenced as part of a DOE Joint Genome Institute Laboratory Sequencing Program (LSP) project. PMID:22180810

  11. Complete genome sequence of Staphylothermus hellenicus P8T

    SciTech Connect

    Anderson, Iain; Wirth, Reinhard; Lucas, Susan; Copeland, A; Lapidus, Alla L.; Cheng, Jan-Fang; Goodwin, Lynne A.; Pitluck, Sam; Davenport, Karen W.; Detter, J. Chris; Han, Cliff; Tapia, Roxanne; Land, Miriam L; Hauser, Loren John; Pati, Amrita; Mikhailova, Natalia; Woyke, Tanja; Klenk, Hans-Peter; Kyrpides, Nikos C; Ivanova, N

    2011-01-01

    Staphylothermus hellenicus belongs to the order Desulfurococcales within the archaeal phy- lum Crenarchaeota. Strain P8T is the type strain of the species and was isolated from a shal- low hydrothermal vent system at Palaeochori Bay, Milos, Greece. It is a hyperthermophilic, anaerobic heterotroph. Here we describe the features of this organism together with the com- plete genome sequence and annotation. The 1,580,347 bp genome with its 1,668 protein- coding and 48 RNA genes was sequenced as part of a DOE Joint Genome Institute (JGI) La- boratory Sequencing Program (LSP) project.

  12. [Cancer Genome Atlas Pan-cancer Analysis Project].

    PubMed

    Zhang, Kun; Wang, Hong

    2015-04-01

    Cancer can exhibit different forms depending on the site of origin, cell types, the different forms of genetic mutations which also affect cancer therapeutic effect. Although many genes have been demonstrated to change a direct result of the change in phenotype, however, many cancers lineage complex molecular mechanisms are still not fully elucidated. Therefore, The Cancer Genome Atlas (TCGA) Research Network analyzed a large human tumors, in order to find the molecular changes in DNA, RNA, protein and epigenetic level, The results contain a wealth of data provides us with an opportunity for common, personality and new ideas throughout the cancer lineages form a whole description. Pan-cancer genome program first compares the 12 kinds of cancer types. Analysis of different tumor molecular changes and their functions, will tell us how effective treatment method is applied to a similar phenotype of the tumor.

  13. Ovarian Cancer Biomarker Discovery Based on Genomic Approaches

    PubMed Central

    Lee, Jung-Yun; Kim, Hee Seung; Suh, Dong Hoon; Kim, Mi-Kyung; Chung, Hyun Hoon; Song, Yong-Sang

    2013-01-01

    Ovarian cancer presents at an advanced stage in more than 75% of patients. Early detection has great promise to improve clinical outcomes. Although the advancing proteomic technologies led to the discovery of numerous ovarian cancer biomarkers, no screening method has been recommended for early detection of ovarian cancer. Complexity and heterogeneity of ovarian carcinogenesis is a major obstacle to discover biomarkers. As cancer arises due to accumulation of genetic change, understanding the close connection between genetic changes and ovarian carcinogenesis would provide the opportunity to find novel gene-level ovarian cancer biomarkers. In this review, we summarize the various gene-based biomarkers by genomic technologies, including inherited gene mutations, epigenetic changes, and differential gene expression. In addition, we suggest the strategy to discover novel gene-based biomarkers with recently introduced next generation sequencing. PMID:25337559

  14. Accurate whole human genome sequencing using reversible terminator chemistry.

    PubMed

    Bentley, David R; Balasubramanian, Shankar; Swerdlow, Harold P; Smith, Geoffrey P; Milton, John; Brown, Clive G; Hall, Kevin P; Evers, Dirk J; Barnes, Colin L; Bignell, Helen R; Boutell, Jonathan M; Bryant, Jason; Carter, Richard J; Keira Cheetham, R; Cox, Anthony J; Ellis, Darren J; Flatbush, Michael R; Gormley, Niall A; Humphray, Sean J; Irving, Leslie J; Karbelashvili, Mirian S; Kirk, Scott M; Li, Heng; Liu, Xiaohai; Maisinger, Klaus S; Murray, Lisa J; Obradovic, Bojan; Ost, Tobias; Parkinson, Michael L; Pratt, Mark R; Rasolonjatovo, Isabelle M J; Reed, Mark T; Rigatti, Roberto; Rodighiero, Chiara; Ross, Mark T; Sabot, Andrea; Sankar, Subramanian V; Scally, Aylwyn; Schroth, Gary P; Smith, Mark E; Smith, Vincent P; Spiridou, Anastassia; Torrance, Peta E; Tzonev, Svilen S; Vermaas, Eric H; Walter, Klaudia; Wu, Xiaolin; Zhang, Lu; Alam, Mohammed D; Anastasi, Carole; Aniebo, Ify C; Bailey, David M D; Bancarz, Iain R; Banerjee, Saibal; Barbour, Selena G; Baybayan, Primo A; Benoit, Vincent A; Benson, Kevin F; Bevis, Claire; Black, Phillip J; Boodhun, Asha; Brennan, Joe S; Bridgham, John A; Brown, Rob C; Brown, Andrew A; Buermann, Dale H; Bundu, Abass A; Burrows, James C; Carter, Nigel P; Castillo, Nestor; Chiara E Catenazzi, Maria; Chang, Simon; Neil Cooley, R; Crake, Natasha R; Dada, Olubunmi O; Diakoumakos, Konstantinos D; Dominguez-Fernandez, Belen; Earnshaw, David J; Egbujor, Ugonna C; Elmore, David W; Etchin, Sergey S; Ewan, Mark R; Fedurco, Milan; Fraser, Louise J; Fuentes Fajardo, Karin V; Scott Furey, W; George, David; Gietzen, Kimberley J; Goddard, Colin P; Golda, George S; Granieri, Philip A; Green, David E; Gustafson, David L; Hansen, Nancy F; Harnish, Kevin; Haudenschild, Christian D; Heyer, Narinder I; Hims, Matthew M; Ho, Johnny T; Horgan, Adrian M; Hoschler, Katya; Hurwitz, Steve; Ivanov, Denis V; Johnson, Maria Q; James, Terena; Huw Jones, T A; Kang, Gyoung-Dong; Kerelska, Tzvetana H; Kersey, Alan D; Khrebtukova, Irina; Kindwall, Alex P; Kingsbury, Zoya; Kokko-Gonzales, Paula I; Kumar, Anil; Laurent, Marc A; Lawley, Cynthia T; Lee, Sarah E; Lee, Xavier; Liao, Arnold K; Loch, Jennifer A; Lok, Mitch; Luo, Shujun; Mammen, Radhika M; Martin, John W; McCauley, Patrick G; McNitt, Paul; Mehta, Parul; Moon, Keith W; Mullens, Joe W; Newington, Taksina; Ning, Zemin; Ling Ng, Bee; Novo, Sonia M; O'Neill, Michael J; Osborne, Mark A; Osnowski, Andrew; Ostadan, Omead; Paraschos, Lambros L; Pickering, Lea; Pike, Andrew C; Pike, Alger C; Chris Pinkard, D; Pliskin, Daniel P; Podhasky, Joe; Quijano, Victor J; Raczy, Come; Rae, Vicki H; Rawlings, Stephen R; Chiva Rodriguez, Ana; Roe, Phyllida M; Rogers, John; Rogert Bacigalupo, Maria C; Romanov, Nikolai; Romieu, Anthony; Roth, Rithy K; Rourke, Natalie J; Ruediger, Silke T; Rusman, Eli; Sanches-Kuiper, Raquel M; Schenker, Martin R; Seoane, Josefina M; Shaw, Richard J; Shiver, Mitch K; Short, Steven W; Sizto, Ning L; Sluis, Johannes P; Smith, Melanie A; Ernest Sohna Sohna, Jean; Spence, Eric J; Stevens, Kim; Sutton, Neil; Szajkowski, Lukasz; Tregidgo, Carolyn L; Turcatti, Gerardo; Vandevondele, Stephanie; Verhovsky, Yuli; Virk, Selene M; Wakelin, Suzanne; Walcott, Gregory C; Wang, Jingwen; Worsley, Graham J; Yan, Juying; Yau, Ling; Zuerlein, Mike; Rogers, Jane; Mullikin, James C; Hurles, Matthew E; McCooke, Nick J; West, John S; Oaks, Frank L; Lundberg, Peter L; Klenerman, David; Durbin, Richard; Smith, Anthony J

    2008-11-01

    DNA sequence information underpins genetic research, enabling discoveries of important biological or medical benefit. Sequencing projects have traditionally used long (400-800 base pair) reads, but the existence of reference sequences for the human and many other genomes makes it possible to develop new, fast approaches to re-sequencing, whereby shorter reads are compared to a reference to identify intraspecies genetic variation. Here we report an approach that generates several billion bases of accurate nucleotide sequence per experiment at low cost. Single molecules of DNA are attached to a flat surface, amplified in situ and used as templates for synthetic sequencing with fluorescent reversible terminator deoxyribonucleotides. Images of the surface are analysed to generate high-quality sequence. We demonstrate application of this approach to human genome sequencing on flow-sorted X chromosomes and then scale the approach to determine the genome sequence of a male Yoruba from Ibadan, Nigeria. We build an accurate consensus sequence from >30x average depth of paired 35-base reads. We characterize four million single-nucleotide polymorphisms and four hundred thousand structural variants, many of which were previously unknown. Our approach is effective for accurate, rapid and economical whole-genome re-sequencing and many other biomedical applications.

  15. ICDS database: interrupted CoDing sequences in prokaryotic genomes.

    PubMed

    Perrodou, Emmanuel; Deshayes, Caroline; Muller, Jean; Schaeffer, Christine; Van Dorsselaer, Alain; Ripp, Raymond; Poch, Olivier; Reyrat, Jean-Marc; Lecompte, Odile

    2006-01-01

    Unrecognized frameshifts, in-frame stop codons and sequencing errors lead to Interrupted CoDing Sequence (ICDS) that can seriously affect all subsequent steps of functional characterization, from in silico analysis to high-throughput proteomic projects. Here, we describe the Interrupted CoDing Sequence database containing ICDS detected by a similarity-based approach in 80 complete prokaryotic genomes. ICDS can be retrieved by species browsing or similarity searches via a web interface (http://www-bio3d-igbmc.u-strasbg.fr/ICDS/). The definition of each interrupted gene is provided as well as the ICDS genomic localization with the surrounding sequence. Furthermore, to facilitate the experimental characterization of ICDS, we propose optimized primers for re-sequencing purposes. The database will be regularly updated with additional data from ongoing sequenced genomes. Our strategy has been validated by three independent tests: (i) ICDS prediction on a benchmark of artificially created frameshifts, (ii) comparison of predicted ICDS and results obtained from the comparison of the two genomic sequences of Bacillus licheniformis strain ATCC 14580 and (iii) re-sequencing of 25 predicted ICDS of the recently sequenced genome of Mycobacterium smegmatis. This allows us to estimate the specificity and sensitivity (95 and 82%, respectively) of our program and the efficiency of primer determination.

  16. Genomic treasure troves: complete genome sequencing of herbarium and insect museum specimens.

    PubMed

    Staats, Martijn; Erkens, Roy H J; van de Vossenberg, Bart; Wieringa, Jan J; Kraaijeveld, Ken; Stielow, Benjamin; Geml, József; Richardson, James E; Bakker, Freek T

    2013-01-01

    Unlocking the vast genomic diversity stored in natural history collections would create unprecedented opportunities for genome-scale evolutionary, phylogenetic, domestication and population genomic studies. Many researchers have been discouraged from using historical specimens in molecular studies because of both generally limited success of DNA extraction and the challenges associated with PCR-amplifying highly degraded DNA. In today's next-generation sequencing (NGS) world, opportunities and prospects for historical DNA have changed dramatically, as most NGS methods are actually designed for taking short fragmented DNA molecules as templates. Here we show that using a standard multiplex and paired-end Illumina sequencing approach, genome-scale sequence data can be generated reliably from dry-preserved plant, fungal and insect specimens collected up to 115 years ago, and with minimal destructive sampling. Using a reference-based assembly approach, we were able to produce the entire nuclear genome of a 43-year-old Arabidopsis thaliana (Brassicaceae) herbarium specimen with high and uniform sequence coverage. Nuclear genome sequences of three fungal specimens of 22-82 years of age (Agaricus bisporus, Laccaria bicolor, Pleurotus ostreatus) were generated with 81.4-97.9% exome coverage. Complete organellar genome sequences were assembled for all specimens. Using de novo assembly we retrieved between 16.2-71.0% of coding sequence regions, and hence remain somewhat cautious about prospects for de novo genome assembly from historical specimens. Non-target sequence contaminations were observed in 2 of our insect museum specimens. We anticipate that future museum genomics projects will perhaps not generate entire genome sequences in all cases (our specimens contained relatively small and low-complexity genomes), but at least generating vital comparative genomic data for testing (phylo)genetic, demographic and genetic hypotheses, that become increasingly more horizontal

  17. BAC-pool 454-sequencing: A rapid and efficient approach to sequence complex tetraploid cotton genomes

    Technology Transfer Automated Retrieval System (TEKTRAN)

    New and emerging next generation sequencing technologies have been promising in reducing sequencing costs, but not significantly for complex polyploid plant genomes such as cotton. Large and highly repetitive genome of G. hirsutum (~2.5GB) is less amenable and cost-intensive with traditional BAC-by...

  18. Genome Sequence of a Novel Iflavirus from mRNA Sequencing of the Butterfly Heliconius erato

    PubMed Central

    Macias-Muñoz, Aide; Briscoe, Adriana D.

    2014-01-01

    Here, we report the genome sequence of a novel iflavirus strain recovered from the neotropical butterfly Heliconius erato. The coding DNA sequence (CDS) of the iflavirus genome was 8,895 nucleotides in length, encoding a polyprotein that was 2,965 amino acids long. PMID:24831145

  19. Real-time, portable genome sequencing for Ebola surveillance

    PubMed Central

    Bore, Joseph Akoi; Koundouno, Raymond; Dudas, Gytis; Mikhail, Amy; Ouédraogo, Nobila; Afrough, Babak; Bah, Amadou; Baum, Jonathan HJ; Becker-Ziaja, Beate; Boettcher, Jan-Peter; Cabeza-Cabrerizo, Mar; Camino-Sanchez, Alvaro; Carter, Lisa L.; Doerrbecker, Juiliane; Enkirch, Theresa; Dorival, Isabel Graciela García; Hetzelt, Nicole; Hinzmann, Julia; Holm, Tobias; Kafetzopoulou, Liana Eleni; Koropogui, Michel; Kosgey, Abigail; Kuisma, Eeva; Logue, Christopher H; Mazzarelli, Antonio; Meisel, Sarah; Mertens, Marc; Michel, Janine; Ngabo, Didier; Nitzsche, Katja; Pallash, Elisa; Patrono, Livia Victoria; Portmann, Jasmine; Repits, Johanna Gabriella; Rickett, Natasha Yasmin; Sachse, Andrea; Singethan, Katrin; Vitoriano, Inês; Yemanaberhan, Rahel L; Zekeng, Elsa G; Trina, Racine; Bello, Alexander; Sall, Amadou Alpha; Faye, Ousmane; Faye, Oumar; Magassouba, N’Faly; Williams, Cecelia V.; Amburgey, Victoria; Winona, Linda; Davis, Emily; Gerlach, Jon; Washington, Franck; Monteil, Vanessa; Jourdain, Marine; Bererd, Marion; Camara, Alimou; Somlare, Hermann; Camara, Abdoulaye; Gerard, Marianne; Bado, Guillaume; Baillet, Bernard; Delaune, Déborah; Nebie, Koumpingnin Yacouba; Diarra, Abdoulaye; Savane, Yacouba; Pallawo, Raymond Bernard; Gutierrez, Giovanna Jaramillo; Milhano, Natacha; Roger, Isabelle; Williams, Christopher J; Yattara, Facinet; Lewandowski, Kuiama; Taylor, Jamie; Rachwal, Philip; Turner, Daniel; Pollakis, Georgios; Hiscox, Julian A.; Matthews, David A.; O’Shea, Matthew K.; Johnston, Andrew McD; Wilson, Duncan; Hutley, Emma; Smit, Erasmus; Di Caro, Antonino; Woelfel, Roman; Stoecker, Kilian; Fleischmann, Erna; Gabriel, Martin; Weller, Simon A.; Koivogui, Lamine; Diallo, Boubacar; Keita, Sakoba; Rambaut, Andrew; Formenty, Pierre; Gunther, Stephan; Carroll, Miles W.

    2016-01-01

    The Ebola virus disease (EVD) epidemic in West Africa is the largest on record, responsible for >28,599 cases and >11,299 deaths 1. Genome sequencing in viral outbreaks is desirable in order to characterize the infectious agent to determine its evolutionary rate, signatures of host adaptation, identification and monitoring of diagnostic targets and responses to vaccines and treatments. The Ebola virus genome (EBOV) substitution rate in the Makona strain has been estimated at between 0.87 × 10−3 to 1.42 × 10−3 mutations per site per year. This is equivalent to 16 to 27 mutations in each genome, meaning that sequences diverge rapidly enough to identify distinct sub-lineages during a prolonged epidemic 2-7. Genome sequencing provides a high-resolution view of pathogen evolution and is increasingly sought-after for outbreak surveillance. Sequence data may be used to guide control measures, but only if the results are generated quickly enough to inform interventions 8. Genomic surveillance during the epidemic has been sporadic due to a lack of local sequencing capacity coupled with practical difficulties transporting samples to remote sequencing facilities 9. In order to address this problem, we devised a genomic surveillance system that utilizes a novel nanopore DNA sequencing instrument. In April 2015 this system was transported in standard airline luggage to Guinea and used for real-time genomic surveillance of the ongoing epidemic. Here we present sequence data and analysis of 142 Ebola virus (EBOV) samples collected during the period March to October 2015. We were able to generate results in less than 24 hours after receiving an Ebola positive sample, with the sequencing process taking as little as 15-60 minutes. We show that real-time genomic surveillance is possible in resource-limited settings and can be established rapidly to monitor outbreaks. PMID:26840485

  20. Sequencing of chloroplast genome using whole cellular DNA and solexa sequencing technology.

    PubMed

    Wu, Jian; Liu, Bo; Cheng, Feng; Ramchiary, Nirala; Choi, Su Ryun; Lim, Yong Pyo; Wang, Xiao-Wu

    2012-01-01

    Sequencing of the chloroplast (cp) genome using traditional sequencing methods has been difficult because of its size (>120 kb) and the complicated procedures required to prepare templates. To explore the feasibility of sequencing the cp genome using DNA extracted from whole cells and Solexa sequencing technology, we sequenced whole cellular DNA isolated from leaves of three Brassicarapa accessions with one lane per accession. In total, 246, 362, and 361 Mb sequence data were generated for the three accessions Chiifu-401-42, Z16, and FT, respectively. Micro-reads were assembled by reference-guided assembly using the cpDNA sequences of B. rapa, Arabidopsis thaliana, and Nicotiana tabacum. We achieved coverage of more than 99.96% of the cp genome in the three tested accessions using the B. rapa sequence as the reference. When A. thaliana or N. tabacum sequences were used as references, 99.7-99.8 or 95.5-99.7% of the B. rapa cp genome was covered, respectively. These results demonstrated that sequencing of whole cellular DNA isolated from young leaves using the Illumina Genome Analyzer is an efficient method for high-throughput sequencing of cp genome.

  1. Genome sequence of cultivated Upland cotton (Gossypium hirsutum TM-1) provides insights into genome evolution

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Genetic and genomic analyses of Upland cotton (Gossypium hirsutum) are difficult because it has a complex allotetraploid (AADD; 2n = 4x = 52) genome. Here we sequenced, assembled and analyzed the world's most important cultivated cotton genome with 246.2 gigabase (Gb) clean data obtained using whol...

  2. Complete genome sequence of Cellulomonas flavigena type strain (134T)

    SciTech Connect

    Abt, Birte; Foster, Brian; Lapidus, Alla L.; Clum, Alicia; Sun, Hui; Pukall, Rudiger; Lucas, Susan; Glavina Del Rio, Tijana; Nolan, Matt; Tice, Hope; Cheng, Jan-Fang; Pitluck, Sam; Liolios, Konstantinos; Ivanova, N; Mavromatis, K; Ovchinnikova, Galina; Pati, Amrita; Goodwin, Lynne A.; Chen, Amy; Palaniappan, Krishna; Land, Miriam L; Hauser, Loren John; Chang, Yun-Juan; Jeffries, Cynthia; Rohde, Manfred; Goker, Markus; Woyke, Tanja; Bristow, James; Eisen, Jonathan; Markowitz, Victor; Hugenholtz, Philip; Kyrpides, Nikos C; Klenk, Hans-Peter

    2010-01-01

    Cellulomonas flavigena (Kellerman and McBeth 1912) Bergey et al. 1923 is the type species of the genus Cellulomonas of the actinobacterial family Cellulomonadaceae. Members of the genus Cellulomonas are of special interest for their ability to degrade cellulose and hemicellulose, particularly with regard to the use of biomass as an alternative energy source. Here we describe the features of this organism, together with the complete genome sequence, and annotation. This is the first complete genome sequence of a member of the genus Cellulomonas, and next to the human pathogen Tropheryma whipplei the second complete genome sequence within the actinobacterial family Cellulomonadaceae. The 4,123,179 bp long single replicon genome with its 3,735 protein-coding and 53 RNA genes is part of the Genomic Encyclopedia of Bacteria and Archaea project.

  3. Genome sequencing and analysis of the model grass Brachypodium distachyon

    SciTech Connect

    Yang, Xiaohan; Kalluri, Udaya C; Tuskan, Gerald A

    2010-01-01

    Three subfamilies of grasses, the Ehrhartoideae, Panicoideae and Pooideae, provide the bulk of human nutrition and are poised to become major sources of renewable energy. Here we describe the genome sequence of the wild grass Brachypodium distachyon (Brachypodium), which is, to our knowledge, the first member of the Pooideae subfamily to be sequenced. Comparison of the Brachypodium, rice and sorghum genomes shows a precise history of genome evolution across a broad diversity of the grasses, and establishes a template for analysis of the large genomes of economically important pooid grasses such as wheat. The high-quality genome sequence, coupled with ease of cultivation and transformation, small size and rapid life cycle, will help Brachypodium reach its potential as an important model system for developing new energy and food crops.

  4. The complete chloroplast genome sequence of Zanthoxylum piperitum.

    PubMed

    Lee, Jonghoon; Lee, Hyeon Ju; Kim, Kyunghee; Lee, Sang-Choon; Sung, Sang Hyun; Yang, Tae-Jin

    2016-09-01

    The complete chloroplast genome sequence of Zanthoxylum piperitum, a plant species with useful aromatic oils in family Rutaceae, was generated in this study by de novo assembly with whole-genome sequence data. The chloroplast genome was 158 154 bp in length with a typical quadripartite structure containing a pair of inverted repeats of 27 644 bp, separated by large single copy and small single copy of 85 340 bp and 17 526 bp, respectively. The chloroplast genome harbored 112 genes consisting of 78 protein-coding genes 30 tRNA genes and 4 rRNA genes. Phylogenetic analysis of the complete chloroplast genome sequences with those of known relatives revealed that Z. piperitum is most closely related to the Citrus species. PMID:26260183

  5. Complete genome sequence of Cellulomonas flavigena type strain (134T)

    PubMed Central

    Abt, Birte; Foster, Brian; Lapidus, Alla; Clum, Alicia; Sun, Hui; Pukall, Rüdiger; Lucas, Susan; Glavina Del Rio, Tijana; Nolan, Matt; Tice, Hope; Cheng, Jan-Fang; Pitluck, Sam; Liolios, Konstantinos; Ivanova, Natalia; Mavromatis, Konstantinos; Ovchinnikova, Galina; Pati, Amrita; Goodwin, Lynne; Chen, Amy; Palaniappan, Krishna; Land, Miriam; Hauser, Loren; Chang, Yun-Juan; Jeffries, Cynthia D.; Rohde, Manfred; Göker, Markus; Woyke, Tanja; Bristow, James; Eisen, Jonathan A.; Markowitz, Victor; Hugenholtz, Philip; Kyrpides, Nikos C.; Klenk, Hans-Peter

    2010-01-01

    Cellulomonas flavigena (Kellerman and McBeth 1912) Bergey et al. 1923 is the type species of the genus Cellulomonas of the actinobacterial family Cellulomonadaceae. Members of the genus Cellulomonas are of special interest for their ability to degrade cellulose and hemicellulose, particularly with regard to the use of biomass as an alternative energy source. Here we describe the features of this organism, together with the complete genome sequence, and annotation. This is the first complete genome sequence of a member of the genus Cellulomonas, and next to the human pathogen Tropheryma whipplei the second complete genome sequence within the actinobacterial family Cellulomonadaceae. The 4,123,179 bp long single replicon genome with its 3,735 protein-coding and 53 RNA genes is part of the Genomic Encyclopedia of Bacteria and Archaea project. PMID:21304688

  6. Complete genome sequence of Cellulomonas flavigena type strain (134).

    PubMed

    Abt, Birte; Foster, Brian; Lapidus, Alla; Clum, Alicia; Sun, Hui; Pukall, Rüdiger; Lucas, Susan; Glavina Del Rio, Tijana; Nolan, Matt; Tice, Hope; Cheng, Jan-Fang; Pitluck, Sam; Liolios, Konstantinos; Ivanova, Natalia; Mavromatis, Konstantinos; Ovchinnikova, Galina; Pati, Amrita; Goodwin, Lynne; Chen, Amy; Palaniappan, Krishna; Land, Miriam; Hauser, Loren; Chang, Yun-Juan; Jeffries, Cynthia D; Rohde, Manfred; Göker, Markus; Woyke, Tanja; Bristow, James; Eisen, Jonathan A; Markowitz, Victor; Hugenholtz, Philip; Kyrpides, Nikos C; Klenk, Hans-Peter

    2010-07-29

    Cellulomonas flavigena (Kellerman and McBeth 1912) Bergey et al. 1923 is the type species of the genus Cellulomonas of the actinobacterial family Cellulomonadaceae. Members of the genus Cellulomonas are of special interest for their ability to degrade cellulose and hemicellulose, particularly with regard to the use of biomass as an alternative energy source. Here we describe the features of this organism, together with the complete genome sequence, and annotation. This is the first complete genome sequence of a member of the genus Cellulomonas, and next to the human pathogen Tropheryma whipplei the second complete genome sequence within the actinobacterial family Cellulomonadaceae. The 4,123,179 bp long single replicon genome with its 3,735 protein-coding and 53 RNA genes is part of the Genomic Encyclopedia of Bacteria and Archaea project.

  7. The Release 6 reference sequence of the Drosophila melanogaster genome

    DOE PAGESBeta

    Hoskins, Roger A.; Carlson, Joseph W.; Wan, Kenneth H.; Park, Soo; Mendez, Ivonne; Galle, Samuel E.; Booth, Benjamin W.; Pfeiffer, Barret D.; George, Reed A.; Svirskas, Robert; et al

    2015-01-14

    Drosophila melanogaster plays an important role in molecular, genetic, and genomic studies of heredity, development, metabolism, behavior, and human disease. The initial reference genome sequence reported more than a decade ago had a profound impact on progress in Drosophila research, and improving the accuracy and completeness of this sequence continues to be important to further progress. We previously described improvement of the 117-Mb sequence in the euchromatic portion of the genome and 21 Mb in the heterochromatic portion, using a whole-genome shotgun assembly, BAC physical mapping, and clone-based finishing. Here, we report an improved reference sequence of the single-copy andmore » middle-repetitive regions of the genome, produced using cytogenetic mapping to mitotic and polytene chromosomes, clone-based finishing and BAC fingerprint verification, ordering of scaffolds by alignment to cDNA sequences, incorporation of other map and sequence data, and validation by whole-genome optical restriction mapping. These data substantially improve the accuracy and completeness of the reference sequence and the order and orientation of sequence scaffolds into chromosome arm assemblies. Representation of the Y chromosome and other heterochromatic regions is particularly improved. The new 143.9-Mb reference sequence, designated Release 6, effectively exhausts clone-based technologies for mapping and sequencing. Highly repeat-rich regions, including large satellite blocks and functional elements such as the ribosomal RNA genes and the centromeres, are largely inaccessible to current sequencing and assembly methods and remain poorly represented. In conclusion, further significant improvements will require sequencing technologies that do not depend on molecular cloning and that produce very long reads.« less

  8. The Release 6 reference sequence of the Drosophila melanogaster genome.

    PubMed

    Hoskins, Roger A; Carlson, Joseph W; Wan, Kenneth H; Park, Soo; Mendez, Ivonne; Galle, Samuel E; Booth, Benjamin W; Pfeiffer, Barret D; George, Reed A; Svirskas, Robert; Krzywinski, Martin; Schein, Jacqueline; Accardo, Maria Carmela; Damia, Elisabetta; Messina, Giovanni; Méndez-Lago, María; de Pablos, Beatriz; Demakova, Olga V; Andreyeva, Evgeniya N; Boldyreva, Lidiya V; Marra, Marco; Carvalho, A Bernardo; Dimitri, Patrizio; Villasante, Alfredo; Zhimulev, Igor F; Rubin, Gerald M; Karpen, Gary H; Celniker, Susan E

    2015-03-01

    Drosophila melanogaster plays an important role in molecular, genetic, and genomic studies of heredity, development, metabolism, behavior, and human disease. The initial reference genome sequence reported more than a decade ago had a profound impact on progress in Drosophila research, and improving the accuracy and completeness of this sequence continues to be important to further progress. We previously described improvement of the 117-Mb sequence in the euchromatic portion of the genome and 21 Mb in the heterochromatic portion, using a whole-genome shotgun assembly, BAC physical mapping, and clone-based finishing. Here, we report an improved reference sequence of the single-copy and middle-repetitive regions of the genome, produced using cytogenetic mapping to mitotic and polytene chromosomes, clone-based finishing and BAC fingerprint verification, ordering of scaffolds by alignment to cDNA sequences, incorporation of other map and sequence data, and validation by whole-genome optical restriction mapping. These data substantially improve the accuracy and completeness of the reference sequence and the order and orientation of sequence scaffolds into chromosome arm assemblies. Representation of the Y chromosome and other heterochromatic regions is particularly improved. The new 143.9-Mb reference sequence, designated Release 6, effectively exhausts clone-based technologies for mapping and sequencing. Highly repeat-rich regions, including large satellite blocks and functional elements such as the ribosomal RNA genes and the centromeres, are largely inaccessible to current sequencing and assembly methods and remain poorly represented. Further significant improvements will require sequencing technologies that do not depend on molecular cloning and that produce very long reads.

  9. The Release 6 reference sequence of the Drosophila melanogaster genome

    PubMed Central

    Carlson, Joseph W.; Wan, Kenneth H.; Park, Soo; Mendez, Ivonne; Galle, Samuel E.; Booth, Benjamin W.; Pfeiffer, Barret D.; George, Reed A.; Svirskas, Robert; Krzywinski, Martin; Schein, Jacqueline; Accardo, Maria Carmela; Damia, Elisabetta; Messina, Giovanni; Méndez-Lago, María; de Pablos, Beatriz; Demakova, Olga V.; Andreyeva, Evgeniya N.; Boldyreva, Lidiya V.; Marra, Marco; Carvalho, A. Bernardo; Dimitri, Patrizio; Villasante, Alfredo; Zhimulev, Igor F.; Rubin, Gerald M.; Karpen, Gary H.

    2015-01-01

    Drosophila melanogaster plays an important role in molecular, genetic, and genomic studies of heredity, development, metabolism, behavior, and human disease. The initial reference genome sequence reported more than a decade ago had a profound impact on progress in Drosophila research, and improving the accuracy and completeness of this sequence continues to be important to further progress. We previously described improvement of the 117-Mb sequence in the euchromatic portion of the genome and 21 Mb in the heterochromatic portion, using a whole-genome shotgun assembly, BAC physical mapping, and clone-based finishing. Here, we report an improved reference sequence of the single-copy and middle-repetitive regions of the genome, produced using cytogenetic mapping to mitotic and polytene chromosomes, clone-based finishing and BAC fingerprint verification, ordering of scaffolds by alignment to cDNA sequences, incorporation of other map and sequence data, and validation by whole-genome optical restriction mapping. These data substantially improve the accuracy and completeness of the reference sequence and the order and orientation of sequence scaffolds into chromosome arm assemblies. Representation of the Y chromosome and other heterochromatic regions is particularly improved. The new 143.9-Mb reference sequence, designated Release 6, effectively exhausts clone-based technologies for mapping and sequencing. Highly repeat-rich regions, including large satellite blocks and functional elements such as the ribosomal RNA genes and the centromeres, are largely inaccessible to current sequencing and assembly methods and remain poorly represented. Further significant improvements will require sequencing technologies that do not depend on molecular cloning and that produce very long reads. PMID:25589440

  10. The Release 6 reference sequence of the Drosophila melanogaster genome.

    PubMed

    Hoskins, Roger A; Carlson, Joseph W; Wan, Kenneth H; Park, Soo; Mendez, Ivonne; Galle, Samuel E; Booth, Benjamin W; Pfeiffer, Barret D; George, Reed A; Svirskas, Robert; Krzywinski, Martin; Schein, Jacqueline; Accardo, Maria Carmela; Damia, Elisabetta; Messina, Giovanni; Méndez-Lago, María; de Pablos, Beatriz; Demakova, Olga V; Andreyeva, Evgeniya N; Boldyreva, Lidiya V; Marra, Marco; Carvalho, A Bernardo; Dimitri, Patrizio; Villasante, Alfredo; Zhimulev, Igor F; Rubin, Gerald M; Karpen, Gary H; Celniker, Susan E

    2015-03-01

    Drosophila melanogaster plays an important role in molecular, genetic, and genomic studies of heredity, development, metabolism, behavior, and human disease. The initial reference genome sequence reported more than a decade ago had a profound impact on progress in Drosophila research, and improving the accuracy and completeness of this sequence continues to be important to further progress. We previously described improvement of the 117-Mb sequence in the euchromatic portion of the genome and 21 Mb in the heterochromatic portion, using a whole-genome shotgun assembly, BAC physical mapping, and clone-based finishing. Here, we report an improved reference sequence of the single-copy and middle-repetitive regions of the genome, produced using cytogenetic mapping to mitotic and polytene chromosomes, clone-based finishing and BAC fingerprint verification, ordering of scaffolds by alignment to cDNA sequences, incorporation of other map and sequence data, and validation by whole-genome optical restriction mapping. These data substantially improve the accuracy and completeness of the reference sequence and the order and orientation of sequence scaffolds into chromosome arm assemblies. Representation of the Y chromosome and other heterochromatic regions is particularly improved. The new 143.9-Mb reference sequence, designated Release 6, effectively exhausts clone-based technologies for mapping and sequencing. Highly repeat-rich regions, including large satellite blocks and functional elements such as the ribosomal RNA genes and the centromeres, are largely inaccessible to current sequencing and assembly methods and remain poorly represented. Further significant improvements will require sequencing technologies that do not depend on molecular cloning and that produce very long reads. PMID:25589440

  11. Characterization of Three Mycobacterium spp. with Potential Use in Bioremediation by Genome Sequencing and Comparative Genomics

    PubMed Central

    Das, Sarbashis; Pettersson, B.M. Fredrik; Behra, Phani Rama Krishna; Ramesh, Malavika; Dasgupta, Santanu; Bhattacharya, Alok; Kirsebom, Leif A.

    2015-01-01

    We provide the genome sequences of the type strains of the polychlorophenol-degrading Mycobacterium chlorophenolicum (DSM43826), the degrader of chlorinated aliphatics Mycobacterium chubuense (DSM44219) and Mycobacterium obuense (DSM44075) that has been tested for use in cancer immunotherapy. The genome sizes of M. chlorophenolicum, M. chubuense, and M. obuense are 6.93, 5.95, and 5.58 Mb with GC-contents of 68.4%, 69.2%, and 67.9%, respectively. Comparative genomic analysis revealed that 3,254 genes are common and we predicted approximately 250 genes acquired through horizontal gene transfer from different sources including proteobacteria. The data also showed that the biodegrading Mycobacterium spp. NBB4, also referred to as M. chubuense NBB4, is distantly related to the M. chubuense type strain and should be considered as a separate species, we suggest it to be named Mycobacterium ethylenense NBB4. Among different categories we identified genes with potential roles in: biodegradation of aromatic compounds and copper homeostasis. These are the first nonpathogenic Mycobacterium spp. found harboring genes involved in copper homeostasis. These findings would therefore provide insight into the role of this group of Mycobacterium spp. in bioremediation as well as the evolution of copper homeostasis within the Mycobacterium genus. PMID:26079817

  12. Characterization of Three Mycobacterium spp. with Potential Use in Bioremediation by Genome Sequencing and Comparative Genomics.

    PubMed

    Das, Sarbashis; Pettersson, B M Fredrik; Behra, Phani Rama Krishna; Ramesh, Malavika; Dasgupta, Santanu; Bhattacharya, Alok; Kirsebom, Leif A

    2015-07-01

    We provide the genome sequences of the type strains of the polychlorophenol-degrading Mycobacterium chlorophenolicum (DSM43826), the degrader of chlorinated aliphatics Mycobacterium chubuense (DSM44219) and Mycobacterium obuense (DSM44075) that has been tested for use in cancer immunotherapy. The genome sizes of M. chlorophenolicum, M. chubuense, and M. obuense are 6.93, 5.95, and 5.58 Mb with GC-contents of 68.4%, 69.2%, and 67.9%, respectively. Comparative genomic analysis revealed that 3,254 genes are common and we predicted approximately 250 genes acquired through horizontal gene transfer from different sources including proteobacteria. The data also showed that the biodegrading Mycobacterium spp. NBB4, also referred to as M. chubuense NBB4, is distantly related to the M. chubuense type strain and should be considered as a separate species, we suggest it to be named Mycobacterium ethylenense NBB4. Among different categories we identified genes with potential roles in: biodegradation of aromatic compounds and copper homeostasis. These are the first nonpathogenic Mycobacterium spp. found harboring genes involved in copper homeostasis. These findings would therefore provide insight into the role of this group of Mycobacterium spp. in bioremediation as well as the evolution of copper homeostasis within the Mycobacterium genus.

  13. Genome sequence of Roseivirga sp. strain D-25 and its potential applications from the genomic aspect.

    PubMed

    Selvaratnam, Chitra; Thevarajoo, Suganthi; Ee, Robson; Chan, Kok-Gan; Bennett, Joseph P; Goh, Kian Mau; Chong, Chun Shiong

    2016-08-01

    Roseivirga sp. strain D-25 is an aerobic marine bacterium isolated from seawater collected from Desaru beach, Malaysia. To date, the genus Roseivirga consists of only four species with no genome sequence reported. Here, we present the genome sequence of Roseivirga sp. strain D-25 (=KCTC 42709=DSM 101709), with a genome size of approximately 4.08Mbp and G+C content of 39.18%. Genome sequence analysis of strain D-25 revealed the presence of genes related to petroleum hydrocarbon degradation, 2,4,6-trinitrotoluene detoxification, heavy metals bioremediation and production of carotenoids, which shed light on the potential application of this strain. PMID:27107724

  14. The Arabidopsis lyrata genome sequence and the basis of rapid genome size change

    SciTech Connect

    Hu, Tina T.; Pattyn, Pedro; Bakker, Erica G.; Cao, Jun; Cheng, Jan-Fang; Clark, Richard M.; Fahlgren, Noah; Fawcett, Jeffrey A.; Grimwood, Jane; Gundlach, Heidrun; Haberer, Georg; Hollister, Jesse D.; Ossowski, Stephan; Ottilar, Robert P.; Salamov, Asaf A.; Schneeberger, Korbinian; Spannagl, Manuel; Wang, Xi; Yang, Liang; Nasrallah, Mikhail E.; Bergelson, Joy; Carrington, James C.; Gaut, Brandon S.; Schmutz, Jeremy; Mayer, Klaus F. X.; Van de Peer, Yves; Grigoriev, Igor V.; Nordborg, Magnus; Weigel, Detlef; Guo, Ya-Long

    2011-04-29

    In our manuscript, we present a high-quality genome sequence of the Arabidopsis thaliana relative, Arabidopsis lyrata, produced by dideoxy sequencing. We have performed the usual types of genome analysis (gene annotation, dN/dS studies etc. etc.), but this is relegated to the Supporting Information. Instead, we focus on what was a major motivation for sequencing this genome, namely to understand how A. thaliana lost half its genome in a few million years and lived to tell the tale. The rather surprising conclusion is that there is not a single genomic feature that accounts for the reduced genome, but that every aspect centromeres, intergenic regions, transposable elements, gene family number is affected through hundreds of thousands of cuts. This strongly suggests that overall genome size in itself is what has been under selection, a suggestion that is strongly supported by our demonstration (using population genetics data from A. thaliana) that new deletions seem to be driven to fixation.

  15. Molecular evolution of herpesviruses: genomic and protein sequence comparisons.

    PubMed Central

    Karlin, S; Mocarski, E S; Schachtel, G A

    1994-01-01

    Phylogenetic reconstruction of herpesvirus evolution is generally founded on amino acid sequence comparisons of specific proteins. These are relevant to the evolution of the specific gene (or set of genes), but the resulting phylogeny may vary depending on the particular sequence chosen for analysis (or comparison). In the first part of this report, we compare 13 herpesvirus genomes by using a new multidimensional methodology based on distance measures and partial orderings of dinucleotide relative abundances. The sequences were analyzed with respect to (i) genomic compositional extremes; (ii) total distances within and between genomes; (iii) partial orderings among genomes relative to a set of sequence standards; (iv) concordance correlations of genome distances; and (v) consistency with the alpha-, beta-, gammaherpesvirus classification. Distance assessments within individual herpesvirus genomes show each to be quite homogeneous relative to the comparisons between genomes. The gammaherpesviruses, Epstein-Barr virus (EBV), herpesvirus saimiri, and bovine herpesvirus 4 are both diverse and separate from other herpesvirus classes, whereas alpha- and betaherpesviruses overlap. The analysis revealed that the most central genome (closest to a consensus herpesvirus genome and most individual herpesvirus sequences of different classes) is that of human herpesvirus 6, suggesting that this genome is closest to a progenitor herpesvirus. The shorter DNA distances among alphaherpesviruses supports the hypothesis that the alpha class is of relatively recent ancestry. In our collection, equine herpesvirus 1 (EHV1) stands out as the most central alphaherpesvirus, suggesting it may approximate an ancestral alphaherpesvirus. Among all herpesviruses, the EBV genome is closest to human sequences. In the DNA partial orderings, the chicken sequence collection is invariably as close as or closer to all herpesvirus sequences than the human sequence collection is, which may imply that

  16. Complete genome sequence of Allochromatium vinosum DSM 180T

    PubMed Central

    Weissgerber, Thomas; Zigann, Renate; Bruce, David; Chang, Yun-juan; Detter, John C.; Han, Cliff; Hauser, Loren; Jeffries, Cynthia D.; Land, Miriam; Munk, A. Christine; Tapia, Roxanne; Dahl, Christiane

    2011-01-01

    Allochromatium vinosum formerly Chromatium vinosum is a mesophilic purple sulfur bacterium belonging to the family Chromatiaceae in the bacterial class Gammaproteobacteria. The genus Allochromatium contains currently five species. All members were isolated from freshwater, brackish water or marine habitats and are predominately obligate phototrophs. Here we describe the features of the organism, together with the complete genome sequence and annotation. This is the first completed genome sequence of a member of the Chromatiaceae within the purple sulfur bacteria thriving in globally occurring habitats. The 3,669,074 bp genome with its 3,302 protein-coding and 64 RNA genes was sequenced within the Joint Genome Institute Community Sequencing Program. PMID:22675582

  17. Genome Sequence of Mycoplasma hyorhinis Isolated from Cell Cultures

    PubMed Central

    Cibulski, Samuel Paulo; Siqueira, Franciele Maboni; Teixeira, Thais Fumaco; Mayer, Fabiana Quoos; Almeida, Luiz Gonzaga

    2016-01-01

    Mycoplasmas are major contaminants of mammalian cell cultures. Here, the complete genome sequence of Mycoplasma hyorhinis recovered from Madin-Darby bovine kidney (MDBK) cells is reported. PMID:27738034

  18. Draft Genome Sequences of Three Mycobacterium chimaera Respiratory Isolates.

    PubMed

    Mac Aogáin, Micheál; Roycroft, Emma; Raftery, Philomena; Mok, Simone; Fitzgibbon, Margaret; Rogers, Thomas R

    2015-12-03

    Mycobacterium chimaera is an opportunistic human pathogen implicated in both pulmonary and cardiovascular infections. Here, we report the draft genome sequences of three strains isolated from human respiratory specimens.

  19. Draft Genome Sequences of Gammaproteobacterial Methanotrophs Isolated from Marine Ecosystems

    PubMed Central

    Flynn, James D.; Hirayama, Hisako; Sakai, Yasuyoshi; Dunfield, Peter F.; Knief, Claudia; Op den Camp, Huub J. M.; Jetten, Mike S. M.; Khmelenina, Valentina N.; Trotsenko, Yuri A.; Murrell, J. Colin; Semrau, Jeremy D.; Svenning, Mette M.; Stein, Lisa Y.; Kyrpides, Nikos; Shapiro, Nicole; Woyke, Tanja; Bringel, Françoise; Vuilleumier, Stéphane; DiSpirito, Alan A.

    2016-01-01

    The genome sequences of Methylobacter marinus A45, Methylobacter sp. strain BBA5.1, and Methylomarinum vadi IT-4 were obtained. These aerobic methanotrophs are typical members of coastal and hydrothermal vent marine ecosystems. PMID:26798114

  20. Draft Genome Sequence of Paecilomyces hepiali, Isolated from Cordyceps sinensis.

    PubMed

    Yu, Yi; Wang, Wenting; Wang, Linping; Pang, Fang; Guo, Lanping; Song, Lai; Liu, Guiming; Feng, Chengqiang

    2016-01-01

    Paecilomyces hepiali is an endoparasitic fungus that commonly exists in the natural Cordyceps sinensis Here, we report the draft genome sequence of P. hepiali, which will facilitate the exploitation of medicinal compounds produced by the fungus. PMID:27389266

  1. First Draft Genome Sequence of a Mycobacterium gordonae Clinical Isolate

    PubMed Central

    Smirnova, T.; Blagodatskikh, K.; Varlamov, D.; Sochivko, D.; Larionova, E.; Andreevskaya, S.; Andrievskaya, I.; Chernousova, L.

    2016-01-01

    Here, we report the first draft genome sequence of the clinically relevant species Mycobacterium gordonae. The clinical isolate Mycobacterium gordonae 14-8773 was obtained from the sputum of a patient with mycobacteriosis. PMID:27365356

  2. First Draft Genome Sequence of a Mycobacterium gordonae Clinical Isolate.

    PubMed

    Ustinova, V; Smirnova, T; Blagodatskikh, K; Varlamov, D; Sochivko, D; Larionova, E; Andreevskaya, S; Andrievskaya, I; Chernousova, L

    2016-01-01

    Here, we report the first draft genome sequence of the clinically relevant species Mycobacterium gordonae The clinical isolate Mycobacterium gordonae 14-8773 was obtained from the sputum of a patient with mycobacteriosis. PMID:27365356

  3. Genome sequence of the fish pathogen Flavobacterium columnare ATCC 49512

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Flavobacterium columnare is a Gram-negative, rod shaped, motile, and highly prevalent fish pathogen causing columnaris disease in freshwater fish worldwide. Here, we present the complete genome sequence of F. columnare strain ATCC 49512. ...

  4. Complete Genome Sequence of Rahnella aquatilis CIP 78.65

    PubMed Central

    Bruce, David; Detter, Chris; Goodwin, Lynne A.; Han, James; Han, Cliff S.; Held, Brittany; Land, Miriam L.; Mikhailova, Natalia; Nolan, Matt; Pennacchio, Len; Pitluck, Sam; Tapia, Roxanne; Woyke, Tanja; Sobecky, Patricia A.

    2012-01-01

    Rahnella aquatilis CIP 78.65 is a gammaproteobacterium isolated from a drinking water source in Lille, France. Here we report the complete genome sequence of Rahnella aquatilis CIP 78.65, the type strain of R. aquatilis. PMID:22582378

  5. Draft Genome Sequence of Paecilomyces hepiali, Isolated from Cordyceps sinensis

    PubMed Central

    Yu, Yi; Wang, Wenting; Wang, Linping; Pang, Fang; Guo, Lanping; Song, Lai

    2016-01-01

    Paecilomyces hepiali is an endoparasitic fungus that commonly exists in the natural Cordyceps sinensis. Here, we report the draft genome sequence of P. hepiali, which will facilitate the exploitation of medicinal compounds produced by the fungus. PMID:27389266

  6. Draft Genome Sequence of Lactobacillus plantarum Strain IPLA 88

    PubMed Central

    Ladero, Victor; Alvarez-Sieiro, Patricia; Redruello, Begoña; del Rio, Beatriz; Linares, Daniel M.; Martin, M. Cruz; Fernández, María

    2013-01-01

    Here, we report a 3.2-Mbp draft assembly for the genome of Lactobacillus plantarum IPLA 88. The sequence of this sourdough isolate provides insight into the adaptation of this versatile species to different environments. PMID:23887921

  7. Genome Sequence of the Immunomodulatory Strain Bifidobacterium bifidum LMG 13195

    PubMed Central

    Gueimonde, Miguel; Ventura, Marco; Margolles, Abelardo

    2012-01-01

    In this work, we report the genome sequences of Bifidobacterium bifidum strain LMG13195. Results from our research group show that this strain is able to interact with human immune cells, generating functional regulatory T cells. PMID:23209243

  8. Complete Genome Sequence of Rahnella aquatilis CIP 78.65

    SciTech Connect

    Martinez, Robert J; Bruce, David; Detter, J C; Goodwin, Lynne A.; Han, James; Han, Cliff; Held, Brittany; Land, Miriam L; Mikhailova, Natalia; Nolan, Matt; Pennacchio, Len; Pitluck, Sam; Tapia, Roxanne; Woyke, Tanja; Sobeckya, Patricia A.

    2012-01-01

    Rahnella aquatilis CIP 78.65 is a gammaproteobacterium isolated from a drinking water source in Lille, France. Here we report the complete genome sequence of Rahnella aquatilis CIP 78.65, the type strain of R. aquatilis.

  9. Draft Genome Sequence of Lactobacillus casei W56

    PubMed Central

    Hochwind, Kerstin; Weinmaier, Thomas; Schmid, Michael; van Hemert, Saskia; Hartmann, Anton; Rattei, Thomas

    2012-01-01

    We announce the draft genome sequence of Lactobacillus casei W56 in one contig. This strain shows immunomodulatory and probiotic properties. The strain is also an ingredient of commercially available probiotic products. PMID:23144392

  10. Draft genome sequence of Lactobacillus casei W56.

    PubMed

    Hochwind, Kerstin; Weinmaier, Thomas; Schmid, Michael; van Hemert, Saskia; Hartmann, Anton; Rattei, Thomas; Rothballer, Michael

    2012-12-01

    We announce the draft genome sequence of Lactobacillus casei W56 in one contig. This strain shows immunomodulatory and probiotic properties. The strain is also an ingredient of commercially available probiotic products. PMID:23144392

  11. Genome sequence of vanilla distortion mosaic virus infecting Coriandrum sativum.

    PubMed

    Adams, I P; Rai, S; Deka, M; Harju, V; Hodges, T; Hayward, G; Skelton, A; Fox, A; Boonham, N

    2014-12-01

    The 9573-nucleotide genome of a potyvirus was sequenced from a Coriandrum sativum plant from India with viral symptoms. On analysis, this virus was shown to have greater than 85 % nucleotide sequence identity to vanilla distortion mosaic virus (VDMV). Analysis of the putative coat protein sequence confirmed that this virus was in fact VDMV, with greater than 91 % amino acid sequence identity. The genome appears to encode a 3083-amino-acid polyprotein potentially cleaved into the 10 mature proteins expected in potyviruses. Phylogenetic analysis confirmed that VDMV is a distinct but ungrouped member of the genus Potyvirus. PMID:25252813

  12. Genome sequence of vanilla distortion mosaic virus infecting Coriandrum sativum.

    PubMed

    Adams, I P; Rai, S; Deka, M; Harju, V; Hodges, T; Hayward, G; Skelton, A; Fox, A; Boonham, N

    2014-12-01

    The 9573-nucleotide genome of a potyvirus was sequenced from a Coriandrum sativum plant from India with viral symptoms. On analysis, this virus was shown to have greater than 85 % nucleotide sequence identity to vanilla distortion mosaic virus (VDMV). Analysis of the putative coat protein sequence confirmed that this virus was in fact VDMV, with greater than 91 % amino acid sequence identity. The genome appears to encode a 3083-amino-acid polyprotein potentially cleaved into the 10 mature proteins expected in potyviruses. Phylogenetic analysis confirmed that VDMV is a distinct but ungrouped member of the genus Potyvirus.

  13. Large-scale profiling of microRNAs for The Cancer Genome Atlas

    PubMed Central

    Chu, Andy; Robertson, Gordon; Brooks, Denise; Mungall, Andrew J.; Birol, Inanc; Coope, Robin; Ma, Yussanne; Jones, Steven; Marra, Marco A.

    2016-01-01

    The comprehensive multiplatform genomics data generated by The Cancer Genome Atlas (TCGA) Research Network is an enabling resource for cancer research. It includes an unprecedented amount of microRNA sequence data: ∼11 000 libraries across 33 cancer types. Combined with initiatives like the National Cancer Institute Genomics Cloud Pilots, such data resources will make intensive analysis of large-scale cancer genomics data widely accessible. To support such initiatives, and to enable comparison of TCGA microRNA data to data from other projects, we describe the process that we developed and used to generate the microRNA sequence data, from library construction through to submission of data to repositories. In the context of this process, we describe the computational pipeline that we used to characterize microRNA expression across large patient cohorts. PMID:26271990

  14. Large-scale profiling of microRNAs for The Cancer Genome Atlas.

    PubMed

    Chu, Andy; Robertson, Gordon; Brooks, Denise; Mungall, Andrew J; Birol, Inanc; Coope, Robin; Ma, Yussanne; Jones, Steven; Marra, Marco A

    2016-01-01

    The comprehensive multiplatform genomics data generated by The Cancer Genome Atlas (TCGA) Research Network is an enabling resource for cancer research. It includes an unprecedented amount of microRNA sequence data: ~11 000 libraries across 33 cancer types. Combined with initiatives like the National Cancer Institute Genomics Cloud Pilots, such data resources will make intensive analysis of large-scale cancer genomics data widely accessible. To support such initiatives, and to enable comparison of TCGA microRNA data to data from other projects, we describe the process that we developed and used to generate the microRNA sequence data, from library construction through to submission of data to repositories. In the context of this process, we describe the computational pipeline that we used to characterize microRNA expression across large patient cohorts.

  15. Intra-species sequence comparisons for annotating genomes

    SciTech Connect

    Boffelli, Dario; Weer, Claire V.; Weng, Li; Lewis, Keith D.; Shoukry, Malak I.; Pachter, Lior; Keys, David N.; Rubin, Edward M.

    2004-07-15

    Analysis of sequence variation among members of a single species offers a potential approach to identify functional DNA elements responsible for biological features unique to that species. Due to its high rate of allelic polymorphism and ease of genetic manipulability, we chose the sea squirt, Ciona intestinalis, to explore intra-species sequence comparisons for genome annotation. A large number of C. intestinalis specimens were collected from four continents and a set of genomic intervals amplified, resequenced and analyzed to determine the mutation rates at each nucleotide in the sequence. We found that regions with low mutation rates efficiently demarcated functionally constrained sequences: these include a set of noncoding elements, which we showed in C intestinalis transgenic assays to act as tissue-specific enhancers, as well as the location of coding sequences. This illustrates that comparisons of multiple members of a species can be used for genome annotation, suggesting a path for the annotation of the sequenced genomes of organisms occupying uncharacterized phylogenetic branches of the animal kingdom and raises the possibility that the resequencing of a large number of Homo sapiens individuals might be used to annotate the human genome and identify sequences defining traits unique to our species. The sequence data from this study has been submitted to GenBank under accession nos. AY667278-AY667407.

  16. Complete genome sequencing and comparative genomic analysis of functionally diverse Lysinibacillus sphaericus III(3)7.

    PubMed

    Rey, Andrés; Silva-Quintero, Laura; Dussán, Jenny

    2016-09-01

    Lysinibacillus sphaericus III(3)7 is a native Colombian strain, the first one isolated from soil samples. This strain has shown high levels of pathogenic activity against Culex quinquefaciatus larvae in laboratory assays compared to other members of the same species. Using Pacific Biosciences sequencing technology we sequenced, annotated (de novo) and described the genome of strain III(3)7, achieving a complete genome sequence status. We then performed a comparative analysis between the newly sequenced genome and the ones previously reported for Colombian isolates L. sphaericus OT4b.31, CBAM5 and OT4b.25, with the inclusion of L. sphaericus C3-41 that has been used as a reference genome for most of previous genome sequencing projects. We concluded that L. sphaericus III(3)7 is highly similar with strain OT4b.25 and shares high levels of synteny with isolates CBAM5 and C3-41. PMID:27419068

  17. A Decision Support Framework for Genomically Informed Investigational Cancer Therapy

    PubMed Central

    Johnson, Amber; Holla, Vijaykumar; Bailey, Ann Marie; Brusco, Lauren; Chen, Ken; Routbort, Mark; Patel, Keyur P.; Zeng, Jia; Kopetz, Scott; Davies, Michael A.; Piha-Paul, Sarina A.; Hong, David S.; Eterovic, Agda Karina; Tsimberidou, Apostolia M.; Broaddus, Russell; Bernstam, Elmer V.; Shaw, Kenna R.; Mendelsohn, John; Mills, Gordon B.

    2015-01-01

    Rapidly improving understanding of molecular oncology, emerging novel therapeutics, and increasingly available and affordable next-generation sequencing have created an opportunity for delivering genomically informed personalized cancer therapy. However, to implement genomically informed therapy requires that a clinician interpret the patient’s molecular profile, including molecular characterization of the tumor and the patient’s germline DNA. In this Commentary, we review existing data and tools for precision oncology and present a framework for reviewing the available biomedical literature on therapeutic implications of genomic alterations. Genomic alterations, including mutations, insertions/deletions, fusions, and copy number changes, need to be curated in terms of the likelihood that they alter the function of a “cancer gene” at the level of a specific variant in order to discriminate so-called “drivers” from “passengers.” Alterations that are targetable either directly or indirectly with approved or investigational therapies are potentially “actionable.” At this time, evidence linking predictive biomarkers to therapies is strong for only a few genomic markers in the context of specific cancer types. For these genomic alterations in other diseases and for other genomic alterations, the clinical data are either absent or insufficient to support routine clinical implementation of biomarker-based therapy. However, there is great interest in optimally matching patients to early-phase clinical trials. Thus, we need accessible, comprehensive, and frequently updated knowledge bases that describe genomic changes and their clinical implications, as well as continued education of clinicians and patients. PMID:25863335

  18. Complete genome sequence of Bacillus cereus bacteriophage PBC1.

    PubMed

    Kong, Minsuk; Kim, Minsik; Ryu, Sangryeol

    2012-06-01

    Bacillus cereus is a ubiquitous, spore-forming bacterium associated with food poisoning cases. To develop an efficient biocontrol agent against B. cereus, we isolated lytic phage PBC1 and sequenced its genome. PBC1 showed a very low degree of homology to previously reported phages, implying that it is novel. Here we report the complete genome sequence of PBC1 and describe major findings from our analysis.

  19. Draft genome sequence of Therminicola potens strain JR

    SciTech Connect

    Byrne-Bailey, K.G.; Wrighton, K.C.; Melnyk, R.A.; Agbo, P.; Hazen, T.C.; Coates, J.D.

    2010-07-01

    'Thermincola potens' strain JR is one of the first Gram-positive dissimilatory metal-reducing bacteria (DMRB) for which there is a complete genome sequence. Consistent with the physiology of this organism, preliminary annotation revealed an abundance of multiheme c-type cytochromes that are putatively associated with the periplasm and cell surface in a Gram-positive bacterium. Here we report the complete genome sequence of strain JR.

  20. Genome Sequence of the Biocontrol Strain Pseudomonas fluorescens F113

    PubMed Central

    Redondo-Nieto, Miguel; Barret, Matthieu; Morrisey, John P.; Germaine, Kieran; Martínez-Granero, Francisco; Barahona, Emma; Navazo, Ana; Sánchez-Contreras, María; Moynihan, Jennifer A.; Giddens, Stephen R.; Coppoolse, Eric R.; Muriel, Candela; Stiekema, Willem J.; Rainey, Paul B.; Dowling, David; O'Gara, Fergal; Martín, Marta

    2012-01-01

    Pseudomonas fluorescens F113 is a plant growth-promoting rhizobacterium (PGPR) that has biocontrol activity against fungal plant pathogens and is a model for rhizosphere colonization. Here, we present its complete genome sequence, which shows that besides a core genome very similar to those of other strains sequenced within this species, F113 possesses a wide array of genes encoding specialized functions for thriving in the rhizosphere and interacting with eukaryotic organisms. PMID:22328765

  1. Complete Mitochondrial Genome Sequence of Sunflower (Helianthus annuus L.).

    PubMed

    Grassa, Christopher J; Ebert, Daniel P; Kane, Nolan C; Rieseberg, Loren H

    2016-01-01

    This is the first complete mitochondrial genome sequence for sunflower and the first complete mitochondrial genome for any member of Asteraceae, the largest plant family, which includes over 23,000 named species. The master circle is 300,945-bp long and includes 27 protein-coding sequences, 18 tRNAs, and the 26S, 5S, and 18S rRNAs. PMID:27635002

  2. Draft genome sequence of Gluconobacter thailandicus NBRC 3257

    PubMed Central

    Matsutani, Minenosuke; Yakushi, Toshiharu

    2014-01-01

    Gluconobacter thailandicus strain NBRC 3257, isolated from downy cherry (Prunus tomentosa), is a strict aerobic rod-shaped Gram-negative bacterium. Here, we report the features of this organism, together with the draft genome sequence and annotation. The draft genome sequence is composed of 107 contigs for 3,446,046 bp with 56.17% G+C content and contains 3,360 protein-coding genes and 54 RNA genes. PMID:25197448

  3. Complete Mitochondrial Genome Sequence of Sunflower (Helianthus annuus L.)

    PubMed Central

    Ebert, Daniel P.; Kane, Nolan C.; Rieseberg, Loren H.

    2016-01-01

    This is the first complete mitochondrial genome sequence for sunflower and the first complete mitochondrial genome for any member of Asteraceae, the largest plant family, which includes over 23,000 named species. The master circle is 300,945-bp long and includes 27 protein-coding sequences, 18 tRNAs, and the 26S, 5S, and 18S rRNAs. PMID:27635002

  4. A computational genomics pipeline for prokaryotic sequencing projects

    PubMed Central

    Kislyuk, Andrey O.; Katz, Lee S.; Agrawal, Sonia; Hagen, Matthew S.; Conley, Andrew B.; Jayaraman, Pushkala; Nelakuditi, Viswateja; Humphrey, Jay C.; Sammons, Scott A.; Govil, Dhwani; Mair, Raydel D.; Tatti, Kathleen M.; Tondella, Maria L.; Harcourt, Brian H.; Mayer, Leonard W.; Jordan, I. King

    2010-01-01

    Motivation: New sequencing technologies have accelerated research on prokaryotic genomes and have made genome sequencing operations outside major genome sequencing centers routine. However, no off-the-shelf solution exists for the combined assembly, gene prediction, genome annotation and data presentation necessary to interpret sequencing data. The resulting requirement to invest significant resources into custom informatics support for genome sequencing projects remains a major impediment to the accessibility of high-throughput sequence data. Results: We present a self-contained, automated high-throughput open source genome sequencing and computational genomics pipeline suitable for prokaryotic sequencing projects. The pipeline has been used at the Georgia Institute of Technology and the Centers for Disease Control and Prevention for the analysis of Neisseria meningitidis and Bordetella bronchiseptica genomes. The pipeline is capable of enhanced or manually assisted reference-based assembly using multiple assemblers and modes; gene predictor combining; and functional annotation of genes and gene products. Because every component of the pipeline is executed on a local machine with no need to access resources over the Internet, the pipeline is suitable for projects of a sensitive nature. Annotation of virulence-related features makes the pipeline particularly useful for projects working with pathogenic prokaryotes. Availability and implementation: The pipeline is licensed under the open-source GNU General Public License and available at the Georgia Tech Neisseria Base (http://nbase.biology.gatech.edu/). The pipeline is implemented with a combination of Perl, Bourne Shell and MySQL and is compatible with Linux and other Unix systems. Contact: king.jordan@biology.gatech.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:20519285

  5. Genomic heterogeneity of multiple synchronous lung cancer

    PubMed Central

    Liu, Yu; Zhang, Jianjun; Li, Lin; Yin, Guangliang; Zhang, Jianhua; Zheng, Shan; Cheung, Hannah; Wu, Ning; Lu, Ning; Mao, Xizeng; Yang, Longhai; Zhang, Jiexin; Zhang, Li; Seth, Sahil; Chen, Huang; Song, Xingzhi; Liu, Kan; Xie, Yongqiang; Zhou, Lina; Zhao, Chuanduo; Han, Naijun; Chen, Wenting; Zhang, Susu; Chen, Longyun; Cai, Wenjun; Li, Lin; Shen, Miaozhong; Xu, Ningzhi; Cheng, Shujun; Yang, Huanming; Lee, J. Jack; Correa, Arlene; Fujimoto, Junya; Behrens, Carmen; Chow, Chi-Wan; William, William N.; Heymach, John V.; Hong, Waun Ki; Swisher, Stephen; Wistuba, Ignacio I.; Wang, Jun; Lin, Dongmei; Liu, Xiangyang; Futreal, P. Andrew; Gao, Yanning

    2016-01-01

    Multiple synchronous lung cancers (MSLCs) present a clinical dilemma as to whether individual tumours represent intrapulmonary metastases or independent tumours. In this study we analyse genomic profiles of 15 lung adenocarcinomas and one regional lymph node metastasis from 6 patients with MSLC. All 15 lung tumours demonstrate distinct genomic profiles, suggesting all are independent primary tumours, which are consistent with comprehensive histopathological assessment in 5 of the 6 patients. Lung tumours of the same individuals are no more similar to each other than are lung adenocarcinomas of different patients from TCGA cohort matched for tumour size and smoking status. Several known cancer-associated genes have different mutations in different tumours from the same patients. These findings suggest that in the context of identical constitutional genetic background and environmental exposure, different lung cancers in the same individual may have distinct genomic profiles and can be driven by distinct molecular events. PMID:27767028

  6. Complete genome sequence of pronghorn virus, a pestivirus

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The complete genome sequence of Pronghorn virus, a member of the Pestivirus genus of the Flaviviridae, was determined. The virus, originally isolated from a pronghorn antelope, had a genome of 12,287 nucleotides with a single open reading frame of 11,694 bases encoding 3898 amino acids....

  7. Complete Genome Sequence of Pronghorn Virus, a Pestivirus

    PubMed Central

    Ridpath, Julia F.; Fischer, Nicole; Grundhoff, Adam; Postel, Alexander; Becher, Paul

    2014-01-01

    The complete genome sequence of pronghorn virus, a member of the Pestivirus genus of the family Flaviviridae, was determined here. The virus, originally isolated from a pronghorn antelope, has a genome of 12,273 nucleotides, with a single open reading frame of 11,694 bases encoding 3,897 amino acids. PMID:24926058

  8. Complete genome sequence of pronghorn virus, a pestivirus.

    PubMed

    Neill, John D; Ridpath, Julia F; Fischer, Nicole; Grundhoff, Adam; Postel, Alexander; Becher, Paul

    2014-01-01

    The complete genome sequence of pronghorn virus, a member of the Pestivirus genus of the family Flaviviridae, was determined here. The virus, originally isolated from a pronghorn antelope, has a genome of 12,273 nucleotides, with a single open reading frame of 11,694 bases encoding 3,897 amino acids. PMID:24926058

  9. Complete Genome Sequence of Bacillus thuringiensis Bacteriophage Smudge

    PubMed Central

    Cornell, Jessica L.; Breslin, Eileen; Schuhmacher, Zachary; Himelright, Madison; Berluti, Cassandra; Boyd, Charles; Carson, Rachel; Del Gallo, Elle; Giessler, Caris; Gilliam, Benjamin; Heatherly, Catherine; Nevin, Julius; Nguyen, Bryan; Nguyen, Justin; Parada, Jocelyn; Sutterfield, Blake; Tukruni, Muruj

    2016-01-01

    Smudge, a bacteriophage enriched from soil using Bacillus thuringiensis DSM-350 as the host, had its complete genome sequenced. Smudge is a myovirus with a genome consisting of 292 genes and was identified as belonging to the C1 cluster of Bacillus phages. PMID:27540049

  10. Complete Genome Sequence of Cyanobacterial Siphovirus KBS2A.

    PubMed

    Ponsero, Alise J; Chen, Feng; Lennon, Jay T; Wilhelm, Steven W

    2013-01-01

    We present the genome of a cyanosiphovirus (KBS2A) that infects a marine Synechococcus sp. (strain WH7803). Unique to this genome, relative to other sequenced cyanosiphoviruses, is the absence of elements associated with integration into the host chromosome, suggesting this virus may not be able to establish a lysogenic relationship. PMID:23969045

  11. Genome Sequence of Mycoplasma ovipneumoniae Strain SC01 ▿

    PubMed Central

    Yang, Falong; Tang, Cheng; Wang, Yong; Zhang, Huanrong; Yue, Hua

    2011-01-01

    Mycoplasma ovipneumoniae is associated with chronic nonprogressive pneumonia in both sheep and goats. Studies concerning its molecular pathogenesis, genetic analysis, and vaccine development have been hindered due to limited genomic information. Here, we announce the first complete genome sequence of this organism. PMID:21742877

  12. Draft Genome Sequence of Brucella abortus Virulent Strain 544

    PubMed Central

    Singh, D. K.; Kumar, Ashok; Tiwari, A. K.; Sankarasubramanian, Jagadesan; Vishnu, Udayakumar S.; Sridhar, Jayavel; Gunasekaran, Paramasamy

    2015-01-01

    Here, we present the draft genome sequence and annotation of Brucella abortus virulent strain 544. The genome of this strain is 3,289,405 bp long, with 57.2% G+C content. A total of 3,259 protein-coding genes and 60 RNA genes were predicted. PMID:25953161

  13. Draft Genome Sequence of Brucella abortus Virulent Strain 544.

    PubMed

    Singh, D K; Kumar, Ashok; Tiwari, A K; Sankarasubramanian, Jagadesan; Vishnu, Udayakumar S; Sridhar, Jayavel; Gunasekaran, Paramasamy; Rajendhran, Jeyaprakash

    2015-05-07

    Here, we present the draft genome sequence and annotation of Brucella abortus virulent strain 544. The genome of this strain is 3,289,405 bp long, with 57.2% G+C content. A total of 3,259 protein-coding genes and 60 RNA genes were predicted.

  14. Genome sequence of the cultivated cotton Gossypium arboreum

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Cotton is one of the most economically important natural fiber crops in the world, and the complex tetraploid nature of its genome (AADD, 2n = 52) makes genetic, genomic and functional analyses extremely challenging. Here we sequenced and assembled 98.3% of the 1.7-gigabase G. arboreum (AA, 2n = 26...

  15. Mitochondrial Genome Sequence of the Glass Sponge Oopsacas minuta

    PubMed Central

    Santini, Sébastien; Rocher, Caroline; Le Bivic, André

    2015-01-01

    We report the complete mitochondrial genome sequence of the Mediterranean glass sponge Oopsacas minuta. This 19-kb mitochondrial genome has 24 noncoding genes (22 tRNAs and 2 rRNAs) and 14 protein-encoding genes coding for 11 subunits of respiratory chain complexes and 3 ATP synthase subunits. PMID:26227597

  16. Complete Mitochondrial Genome Sequence of the Pezizomycete Pyronema confluens

    PubMed Central

    2016-01-01

    The complete mitochondrial genome of the ascomycete Pyronema confluens has been sequenced. The circular genome has a size of 191 kb and contains 48 protein-coding genes, 26 tRNA genes, and two rRNA genes. Of the protein-coding genes, 14 encode conserved mitochondrial proteins, and 31 encode predicted homing endonuclease genes. PMID:27174271

  17. Draft genome sequence of the silver pomfret fish, Pampus argenteus.

    PubMed

    AlMomin, Sabah; Kumar, Vinod; Al-Amad, Sami; Al-Hussaini, Mohsen; Dashti, Talal; Al-Enezi, Khaznah; Akbar, Abrar

    2016-01-01

    Silver pomfret, Pampus argenteus, is a fish species from coastal waters. Despite its high commercial value, this edible fish has not been sequenced. Hence, its genetic and genomic studies have been limited. We report the first draft genome sequence of the silver pomfret obtained using a Next Generation Sequencing (NGS) technology. We assembled 38.7 Gb of nucleotides into scaffolds of 350 Mb with N50 of about 1.5 kb, using high quality paired end reads. These scaffolds represent 63.7% of the estimated silver pomfret genome length. The newly sequenced and assembled genome has 11.06% repetitive DNA regions, and this percentage is comparable to that of the tilapia genome. The genome analysis predicted 16 322 genes. About 91% of these genes showed homology with known proteins. Many gene clusters were annotated to protein and fatty-acid metabolism pathways that may be important in the context of the meat texture and immune system developmental processes. The reference genome can pave the way for the identification of many other genomic features that could improve breeding and population-management strategies, and it can also help characterize the genetic diversity of P. argenteus.

  18. Genome Sequences of Five B1 Subcluster Mycobacteriophages

    PubMed Central

    Barrus, E. Zane; Benedict, Alex B.; Brighton, Alicia K.; Fisher, Joshua N. B.; Gardner, Adam V.; Kartchner, Brittany J.; Ladle, Kara C.; Lunt, Bryce L.; Merrill, Bryan D.; Morrell, John D.; Burnett, Sandra H.

    2013-01-01

    Mycobacteriophages infect members of the Mycobacterium genus in the phylum Actinobacteria and exhibit remarkable diversity. Genome analysis groups the thousands of known mycobacteriophages into clusters, of which the B1 subcluster is currently the third most populous. We report the complete genome sequences of five additional members of the B1 subcluster. PMID:24285667

  19. Genome sequences of five b1 subcluster mycobacteriophages.

    PubMed

    Breakwell, Donald P; Barrus, E Zane; Benedict, Alex B; Brighton, Alicia K; Fisher, Joshua N B; Gardner, Adam V; Kartchner, Brittany J; Ladle, Kara C; Lunt, Bryce L; Merrill, Bryan D; Morrell, John D; Burnett, Sandra H; Grose, Julianne H

    2013-11-27

    Mycobacteriophages infect members of the Mycobacterium genus in the phylum Actinobacteria and exhibit remarkable diversity. Genome analysis groups the thousands of known mycobacteriophages into clusters, of which the B1 subcluster is currently the third most populous. We report the complete genome sequences of five additional members of the B1 subcluster.

  20. Complete genome sequence of Campylobacter gracilis ATCC 33236T

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The human oral pathogen Campylobacter gracilis has been isolated from periodontal and endodontal infections, and also from non-oral head, neck or lung infections. This study describes the whole-genome sequence of the human periodontal isolate ATCC 33236T (=FDC 1084), which is the first closed genome...

  1. Complete Genome Sequence of Bacillus thuringiensis Bacteriophage Smudge.

    PubMed

    Cornell, Jessica L; Breslin, Eileen; Schuhmacher, Zachary; Himelright, Madison; Berluti, Cassandra; Boyd, Charles; Carson, Rachel; Del Gallo, Elle; Giessler, Caris; Gilliam, Benjamin; Heatherly, Catherine; Nevin, Julius; Nguyen, Bryan; Nguyen, Justin; Parada, Jocelyn; Sutterfield, Blake; Tukruni, Muruj; Temple, Louise

    2016-01-01

    Smudge, a bacteriophage enriched from soil using Bacillus thuringiensis DSM-350 as the host, had its complete genome sequenced. Smudge is a myovirus with a genome consisting of 292 genes and was identified as belonging to the C1 cluster of Bacillus phages. PMID:27540049

  2. Complete Genome Sequence of Pseudomonas aeruginosa Phage AAT-1

    PubMed Central

    Andrade-Domínguez, Andrés

    2016-01-01

    Aspects of the interaction between phages and animals are of interest and importance for medical applications. Here, we report the genome sequence of the lytic Pseudomonas phage AAT-1, isolated from mammalian serum. AAT-1 is a double-stranded DNA phage, with a genome of 57,599 bp, containing 76 predicted open reading frames. PMID:27563032

  3. First Complete Genome Sequence of Felis catus Gammaherpesvirus 1

    PubMed Central

    Lee, Justin S.; Vuyisich, Momchilo; Chain, Patrick; Lo, Chien-Chi; Kronmiller, Brent; Bracha, Shay; Avery, Anne C.; VandeWoude, Sue

    2015-01-01

    We sequenced the complete genome of Felis catus gammaherpesvirus 1 (FcaGHV1) from lymph node DNA of an infected cat. The genome includes a 121,556-nucleotide unique region with 87 predicted open reading frames (61 gammaherpesvirus conserved and 26 unique) flanked by multiple copies of a 966-nucleotide terminal repeat. PMID:26543105

  4. Complete genome sequence of Aeromonas hydrophila AL06-06

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Aeromonas hydrophila occurs in freshwater environments and infects fish and mammals. In this work, we report the complete genome sequence of Aeromonas hydrophila AL06-06, which was isolated from diseased goldfish and is being used for comparative genomic studies with A. hydrophila strains causing ba...

  5. Draft genome sequence of Phomopsis longicolla MSPL 10-6

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Phomopsis longicolla T.W. Hobbs is the primary cause of Phomopsis seed decay in soybean. We report the de novo assembled draft genome sequence of P. longicolla isolate MSPL10-6 with a 54.8-fold depth of coverage. The resulting draft genome was estimated to be approximately 64 Mb in size with an over...

  6. Complete Genome Sequence of Mycobacterium bovis Strain BCG-1 (Russia)

    PubMed Central

    Shitikov, Egor A.; Malakhova, Maja V.; Kostryukova, Elena S.; Ilina, Elena N.; Atrasheuskaya, Alena V.; Ignatyev, Georgy M.; Vinokurova, Nataliya V.; Gorbachyov, Vyacheslav Y.

    2016-01-01

    Mycobacterium bovis BCG (Bacille Calmette-Guérin) is a vaccine strain used for protection against tuberculosis. Here, we announce the complete genome sequence of M. bovis strain BCG-1 (Russia). Extensive use of this strain necessitates the study of its genome stability by comparative analysis. PMID:27034492

  7. Complete Genome Sequence of Pseudomonas aeruginosa Phage AAT-1.

    PubMed

    Andrade-Domínguez, Andrés; Kolter, Roberto

    2016-01-01

    Aspects of the interaction between phages and animals are of interest and importance for medical applications. Here, we report the genome sequence of the lytic Pseudomonas phage AAT-1, isolated from mammalian serum. AAT-1 is a double-stranded DNA phage, with a genome of 57,599 bp, containing 76 predicted open reading frames. PMID:27563032

  8. The tomato genome sequence provides insight into fleshy fruit evolution

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The genome of the inbred tomato cultivar ‘Heinz 1706’ was sequenced and assembled using a combination of Sanger and “next generation” technologies. The predicted genome size is ~900 Mb, consistent with prior estimates, of which 760 Mb were assembled in 91 scaffolds aligned to the 12 tomato chromosom...

  9. Complete Chloroplast Genome Sequence of Phagomixotrophic Green Alga Cymbomonas tetramitiformis

    PubMed Central

    Paasch, Amber E.; Graham, Linda E.; Kim, Eunsoo

    2016-01-01

    We report here the complete chloroplast genome sequence of Cymbomonas tetramitiformis strain PLY262, which is a prasinophycean green alga that retains a phagomixotrophic mode of nutrition. The genome is 84,524 bp in length, with a G+C content of 37%, and contains 3 rRNAs, 26 tRNAs, and 76 protein-coding genes. PMID:27313295

  10. Genomic sequence for the aflatoxigenic filamentous fungus Aspergillus nomius

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The genome of the A. nomius type strain was sequenced using a personal genome machine. Annotation of the genes was undertaken, followed by gene ontology and an investigation into the number of secondary metabolite clusters. Comparative studies with other Aspergillus species involved shared/unique ge...

  11. Coelacanth genome sequence reveals the evolutionary history of vertebrate genes.

    PubMed

    Noonan, James P; Grimwood, Jane; Danke, Joshua; Schmutz, Jeremy; Dickson, Mark; Amemiya, Chris T; Myers, Richard M

    2004-12-01

    The coelacanth is one of the nearest living relatives of tetrapods. However, a teleost species such as zebrafish or Fugu is typically used as the outgroup in current tetrapod comparative sequence analyses. Such studies are complicated by the fact that teleost genomes have undergone a whole-genome duplication event, as well as individual gene-duplication events. Here, we demonstrate the value of coelacanth genome sequence by complete sequencing and analysis of the protocadherin gene cluster of the Indonesian coelacanth, Latimeria menadoensis. We found that coelacanth has 49 protocadherin cluster genes organized in the same three ordered subclusters, alpha, beta, and gamma, as the 54 protocadherin cluster genes in human. In contrast, whole-genome and tandem duplications have generated two zebrafish protocadherin clusters comprised of at least 97 genes. Additionally, zebrafish protocadherins are far more prone to homogenizing gene conversion events than coelacanth protocadherins, suggesting that recombination- and duplication-driven plasticity may be a feature of teleost genomes. Our results indicate that coelacanth provides the ideal outgroup sequence against which tetrapod genomes can be measured. We therefore present L. menadoensis as a candidate for whole-genome sequencing.

  12. Salmonella serotype determination utilizing high-throughput genome sequencing data.

    PubMed

    Zhang, Shaokang; Yin, Yanlong; Jones, Marcus B; Zhang, Zhenzhen; Deatherage Kaiser, Brooke L; Dinsmore, Blake A; Fitzgerald, Collette; Fields, Patricia I; Deng, Xiangyu

    2015-05-01

    Serotyping forms the basis of national and international surveillance networks for Salmonella, one of the most prevalent foodborne pathogens worldwide (1-3). Public health microbiology is currently being transformed by whole-genome sequencing (WGS), which opens the door to serotype determination using WGS data. SeqSero (www.denglab.info/SeqSero) is a novel Web-based tool for determining Salmonella serotypes using high-throughput genome sequencing data. SeqSero is based on curated databases of Salmonella serotype determinants (rfb gene cluster, fliC and fljB alleles) and is predicted to determine serotype rapidly and accurately for nearly the full spectrum of Salmonella serotypes (more than 2,300 serotypes), from both raw sequencing reads and genome assemblies. The performance of SeqSero was evaluated by testing (i) raw reads from genomes of 308 Salmonella isolates of known serotype; (ii) raw reads from genomes of 3,306 Salmonella isolates sequenced and made publicly available by GenomeTrakr, a U.S. national monitoring network operated by the Food and Drug Administration; and (iii) 354 other publicly available draft or complete Salmonella genomes. We also demonstrated Salmonella serotype determination from raw sequencing reads of fecal metagenomes from mice orally infected with this pathogen. SeqSero can help to maintain the well-established utility of Salmonella serotyping when integrated into a platform of WGS-based pathogen subtyping and characterization.

  13. Genomic distribution of simple sequence repeats in Brassica rapa.

    PubMed

    Hong, Chang Pyo; Piao, Zhong Yun; Kang, Tae Wook; Batley, Jacqueline; Yang, Tae-Jin; Hur, Yoon-Kang; Bhak, Jong; Park, Beom-Seok; Edwards, David; Lim, Yong Pyo

    2007-06-30

    Simple Sequence Repeats (SSRs) represent short tandem duplications found within all eukaryotic organisms. To examine the distribution of SSRs in the genome of Brassica rapa ssp. pekinensis, SSRs from different genomic regions representing 17.7 Mb of genomic sequence were surveyed. SSRs appear more abundant in non-coding regions (86.6%) than in coding regions (13.4%). Comparison of SSR densities in different genomic regions demonstrated that SSR density was greatest within the 5'-flanking regions of the predicted genes. The proportion of different repeat motifs varied between genomic regions, with trinucleotide SSRs more prevalent in predicted coding regions, reflecting the codon structure in these regions. SSRs were also preferentially associated with gene-rich regions, with peri-centromeric heterochromatin SSRs mostly associated with retrotransposons. These results indicate that the distribution of SSRs in the genome is non-random. Comparison of SSR abundance between B. rapa and the closely related species Arabidopsis thaliana suggests a greater abundance of SSRs in B. rapa, which may be due to the proposed genome triplication. Our results provide a comprehensive view of SSR genomic distribution and evolution in Brassica for comparison with the sequenced genomes of A. thaliana and Oryza sativa.

  14. The Cancer Genome Atlas Pan-Cancer analysis project.

    PubMed

    Weinstein, John N; Collisson, Eric A; Mills, Gordon B; Shaw, Kenna R Mills; Ozenberger, Brad A; Ellrott, Kyle; Shmulevich, Ilya; Sander, Chris; Stuart, Joshua M

    2013-10-01

    The Cancer Genome Atlas (TCGA) Research Network has profiled and analyzed large numbers of human tumors to discover molecular aberrations at the DNA, RNA, protein and epigenetic levels. The resulting rich data provide a major opportunity to develop an integrated picture of commonalities, differences and emergent themes across tumor lineages. The Pan-Cancer initiative compares the first 12 tumor types profiled by TCGA. Analysis of the molecular aberrations and their functional roles across tumor types will teach us how to extend therapies effective in one cancer type to others with a similar genomic profile.

  15. The Cancer Genome Atlas Pan-Cancer analysis project.

    PubMed

    Weinstein, John N; Collisson, Eric A; Mills, Gordon B; Shaw, Kenna R Mills; Ozenberger, Brad A; Ellrott, Kyle; Shmulevich, Ilya; Sander, Chris; Stuart, Joshua M

    2013-10-01

    The Cancer Genome Atlas (TCGA) Research Network has profiled and analyzed large numbers of human tumors to discover molecular aberrations at the DNA, RNA, protein and epigenetic levels. The resulting rich data provide a major opportunity to develop an integrated picture of commonalities, differences and emergent themes across tumor lineages. The Pan-Cancer initiative compares the first 12 tumor types profiled by TCGA. Analysis of the molecular aberrations and their functional roles across tumor types will teach us how to extend therapies effective in one cancer type to others with a similar genomic profile. PMID:24071849

  16. Dissection of the octoploid strawberry genome by deep sequencing of the genomes of Fragaria species.

    PubMed

    Hirakawa, Hideki; Shirasawa, Kenta; Kosugi, Shunichi; Tashiro, Kosuke; Nakayama, Shinobu; Yamada, Manabu; Kohara, Mistuyo; Watanabe, Akiko; Kishida, Yoshie; Fujishiro, Tsunakazu; Tsuruoka, Hisano; Minami, Chiharu; Sasamoto, Shigemi; Kato, Midori; Nanri, Keiko; Komaki, Akiko; Yanagi, Tomohiro; Guoxin, Qin; Maeda, Fumi; Ishikawa, Masami; Kuhara, Satoru; Sato, Shusei; Tabata, Satoshi; Isobe, Sachiko N

    2014-01-01

    Cultivated strawberry (Fragaria x ananassa) is octoploid and shows allogamous behaviour. The present study aims at dissecting this octoploid genome through comparison with its wild relatives, F. iinumae, F. nipponica, F. nubicola, and F. orientalis by de novo whole-genome sequencing on an Illumina and Roche 454 platforms. The total length of the assembled Illumina genome sequences obtained was 698 Mb for F. x ananassa, and ∼200 Mb each for the four wild species. Subsequently, a virtual reference genome termed FANhybrid_r1.2 was constructed by integrating the sequences of the four homoeologous subgenomes of F. x ananassa, from which heterozygous regions in the Roche 454 and Illumina genome sequences were eliminated. The total length of FANhybrid_r1.2 thus created was 173.2 Mb with the N50 length of 5137 bp. The Illumina-assembled genome sequences of F. x ananassa and the four wild species were then mapped onto the reference genome, along with the previously published F. vesca genome sequence to establish the subgenomic structure of F. x ananassa. The strategy adopted in this study has turned out to be successful in dissecting the genome of octoploid F. x ananassa and appears promising when applied to the analysis of other polyploid plant species. PMID:24282021

  17. Relevant genomics of neurotensin receptor in cancer.

    PubMed

    Elek, J; Pinzon, W; Park, K H; Narayanan, R

    2000-01-01

    The expressed sequence tag (EST) databases are an attractive starting point for gene discovery for diseases like cancer. Validation of gene targets from these sequences (both known and novel) in cancers requires a comprehensive expression profiling. We identified from the Cancer Gene Anatomy Project database (CGAP), a hit called neurotensin receptor (NT-r) that was expressed in the pancreatic cancer cDNA libraries. Neurotensin (NT), a neuroendocrine peptide, exerts trophic effects in vivo and stimulates the growth of cancer-derived cell lines in vitro. High affinity neurotensin receptors (NT-r) are expressed in cancer-derived cell lines and in some primary tumors. To date, a comprehensive expression profile of the NT-r in diverse cancers and normal tissues has not been reported. A cancer-selective expression of NT-r, if demonstrable, may provide a basis for a diagnostic and potential therapeutic utility. We demonstrate that the NT-r is expressed in a variety of cancer-derived cell lines as well as primary tumors, but only in a select few normal tissues. The expression of NT, on the other hand, was detected in many normal tissues, but not in the cancer-derived cell lines. The NT expression however, was detected in the primary tumors. We further demonstrate that NT expression is stimulated by androgen deprivation in the prostate cancer models. These results demonstrate the usefulness of a panel of cDNA repository for rapid validation of potential cancer targets.

  18. BreCAN-DB: a repository cum browser of personalized DNA breakpoint profiles of cancer genomes.

    PubMed

    Narang, Pankaj; Dhapola, Parashar; Chowdhury, Shantanu

    2016-01-01

    BreCAN-DB (http://brecandb.igib.res.in) is a repository cum browser of whole genome somatic DNA breakpoint profiles of cancer genomes, mapped at single nucleotide resolution using deep sequencing data. These breakpoints are associated with deletions, insertions, inversions, tandem duplications, translocations and a combination of these structural genomic alterations. The current release of BreCAN-DB features breakpoint profiles from 99 cancer-normal pairs, comprising five cancer types. We identified DNA breakpoints across genomes using high-coverage next-generation sequencing data obtained from TCGA and dbGaP. Further, in these cancer genomes, we methodically identified breakpoint hotspots which were significantly enriched with somatic structural alterations. To visualize the breakpoint profiles, a next-generation genome browser was integrated with BreCAN-DB. Moreover, we also included previously reported breakpoint profiles from 138 cancer-normal pairs, spanning 10 cancer types into the browser. Additionally, BreCAN-DB allows one to identify breakpoint hotspots in user uploaded data set. We have also included a functionality to query overlap of any breakpoint profile with regions of user's interest. Users can download breakpoint profiles from the database or may submit their data to be integrated in BreCAN-DB. We believe that BreCAN-DB will be useful resource for genomics scientific community and is a step towards personalized cancer genomics. PMID:26586806

  19. BreCAN-DB: a repository cum browser of personalized DNA breakpoint profiles of cancer genomes

    PubMed Central

    Narang, Pankaj; Dhapola, Parashar; Chowdhury, Shantanu

    2016-01-01

    BreCAN-DB (http://brecandb.igib.res.in) is a repository cum browser of whole genome somatic DNA breakpoint profiles of cancer genomes, mapped at single nucleotide resolution using deep sequencing data. These breakpoints are associated with deletions, insertions, inversions, tandem duplications, translocations and a combination of these structural genomic alterations. The current release of BreCAN-DB features breakpoint profiles from 99 cancer-normal pairs, comprising five cancer types. We identified DNA breakpoints across genomes using high-coverage next-generation sequencing data obtained from TCGA and dbGaP. Further, in these cancer genomes, we methodically identified breakpoint hotspots which were significantly enriched with somatic structural alterations. To visualize the breakpoint profiles, a next-generation genome browser was integrated with BreCAN-DB. Moreover, we also included previously reported breakpoint profiles from 138 cancer-normal pairs, spanning 10 cancer types into the browser. Additionally, BreCAN-DB allows one to identify breakpoint hotspots in user uploaded data set. We have also included a functionality to query overlap of any breakpoint profile with regions of user's interest. Users can download breakpoint profiles from the database or may submit their data to be integrated in BreCAN-DB. We believe that BreCAN-DB will be useful resource for genomics scientific community and is a step towards personalized cancer genomics. PMID:26586806

  20. Systematic deciphering of cancer genome networks.

    PubMed

    Fendler, Bernard; Atwal, Gurinder

    2012-09-01

    When growth regulatory genes are damaged in a cell, it may become cancerous. Current technological advances in the last decade have allowed the characterization of the whole genome of these cells by directly or indirectly measuring DNA changes. Complementary analyses were developed to make sense of the massive amounts of data generated. A large majority of these analyses were developed to construct interaction networks between genes from, primarily, expression array data. We review the current technologies and analyses that have developed in the last decade. We further argue that as cancer genomics evolves from single gene validations to gene network inferences, new analyses must be developed for the different technological platforms.

  1. Interplay Between the Cancer Genome and Epigenome

    PubMed Central

    Shen, Hui; Laird, Peter W.

    2013-01-01

    Cancer arises as a consequence of cumulative disruptions to cellular growth control, with Darwinian selection for those heritable changes which provide the greatest clonal advantage. These traits can be acquired and stably maintained by either genetic or epigenetic means. Here we explore the ways in which alterations in the genome and epigenome influence each other and cooperate to promote oncogenic transformation. Disruption of epigenomic control is pervasive in malignancy, and can be classified as an enabling characteristic of cancer cells, akin to genome instability and mutation. PMID:23540689

  2. Open-Access Cancer Genomics - Office of Cancer Clinical Proteomics Research

    Cancer.gov

    The completion of the Human Genome Project sparked a revolution in high-throughput genomics applied towards deciphering genetically complex diseases, like cancer. Now, almost 10 years later, we have a mountain of genomics data on many different cancer type

  3. Childhood Cancer Genomics Gaps and Opportunities - Workshop Summary

    Cancer.gov

    NCI convened a workshop of representative research teams that have been leaders in defining the genomic landscape of childhood cancers to discuss the influence of genomic discoveries on the future of childhood cancer research.

  4. The Cancer Genome Atlas (TCGA): The next stage - TCGA

    Cancer.gov

    The Cancer Genome Atlas (TCGA), the NIH research program that has helped set the standards for characterizing the genomic underpinnings of dozens of cancers on a large scale, is moving to its next phase.

  5. Resequencing of the common marmoset genome improves genome assemblies and gene-coding sequence analysis

    PubMed Central

    Sato, Kengo; Kuroki, Yoko; Kumita, Wakako; Fujiyama, Asao; Toyoda, Atsushi; Kawai, Jun; Iriki, Atsushi; Sasaki, Erika; Okano, Hideyuki; Sakakibara, Yasubumi

    2015-01-01

    The first draft of the common marmoset (Callithrix jacchus) genome was published by the Marmoset Genome Sequencing and Analysis Consortium. The draft was based on whole-genome shotgun sequencing, and the current assembly version is Callithrix_jacches-3.2.1, but there still exist 187,214 undetermined gap regions and supercontigs and relatively short contigs that are unmapped to chromosomes in the draft genome. We performed resequencing and assembly of the genome of common marmoset by deep sequencing with high-throughput sequencing technology. Several different sequence runs using Illumina sequencing platforms were executed, and 181 Gbp of high-quality bases including mate-pairs with long insert lengths of 3, 8, 20, and 40 Kbp were obtained, that is, approximately 60× coverage. The resequencing significantly improved the MGSAC draft genome sequence. The N50 of the contigs, which is a statistical measure used to evaluate assembly quality, doubled. As a result, 51% of the contigs (total length: 299 Mbp) that were unmapped to chromosomes in the MGSAC draft were merged with chromosomal contigs, and the improved genome sequence helped to detect 5,288 new genes that are homologous to human cDNAs and the gaps in 5,187 transcripts of the Ensembl gene annotations were completely filled. PMID:26586576

  6. Resequencing of the common marmoset genome improves genome assemblies and gene-coding sequence analysis.

    PubMed

    Sato, Kengo; Kuroki, Yoko; Kumita, Wakako; Fujiyama, Asao; Toyoda, Atsushi; Kawai, Jun; Iriki, Atsushi; Sasaki, Erika; Okano, Hideyuki; Sakakibara, Yasubumi

    2015-11-20

    The first draft of the common marmoset (Callithrix jacchus) genome was published by the Marmoset Genome Sequencing and Analysis Consortium. The draft was based on whole-genome shotgun sequencing, and the current assembly version is Callithrix_jacches-3.2.1, but there still exist 187,214 undetermined gap regions and supercontigs and relatively short contigs that are unmapped to chromosomes in the draft genome. We performed resequencing and assembly of the genome of common marmoset by deep sequencing with high-throughput sequencing technology. Several different sequence runs using Illumina sequencing platforms were executed, and 181 Gbp of high-quality bases including mate-pairs with long insert lengths of 3, 8, 20, and 40 Kbp were obtained, that is, approximately 60× coverage. The resequencing significantly improved the MGSAC draft genome sequence. The N50 of the contigs, which is a statistical measure used to evaluate assembly quality, doubled. As a result, 51% of the contigs (total length: 299 Mbp) that were unmapped to chromosomes in the MGSAC draft were merged with chromosomal contigs, and the improved genome sequence helped to detect 5,288 new genes that are homologous to human cDNAs and the gaps in 5,187 transcripts of the Ensembl gene annotations were completely filled.

  7. Accessing complex crop genomes with next-generation sequencing.

    PubMed

    Edwards, David; Batley, Jacqueline; Snowdon, Rod J

    2013-01-01

    Many important crop species have genomes originating from ancestral or recent polyploidisation events. Multiple homoeologous gene copies, chromosomal rearrangements and amplification of repetitive DNA within large and complex crop genomes can considerably complicate genome analysis and gene discovery by conventional, forward genetics approaches. On the other hand, ongoing technological advances in molecular genetics and genomics today offer unprecedented opportunities to analyse and access even more recalcitrant genomes. In this review, we describe next-generation sequencing and data analysis techniques that vastly improve our ability to dissect and mine genomes for causal genes underlying key traits and allelic variation of interest to breeders. We focus primarily on wheat and oilseed rape, two leading examples of major polyploid crop genomes whose size or complexity present different, significant challenges. In both cases, the latest DNA sequencing technologies, applied using quite different approaches, have enabled considerable progress towards unravelling the respective genomes. Our ability to discover the extent and distribution of genetic diversity in crop gene pools, and its relationship to yield and quality-related traits, is swiftly gathering momentum as DNA sequencing and the bioinformatic tools to deal with growing quantities of genomic data continue to develop. In the coming decade, genomic and transcriptomic sequencing, discovery and high-throughput screening of single nucleotide polymorphisms, presence-absence variations and other structural chromosomal variants in diverse germplasm collections will give detailed insight into the origins, domestication and available trait-relevant variation of polyploid crops, in the process facilitating novel approaches and possibilities for genomics-assisted breeding.

  8. Clinical Sequencing Exploratory Research Consortium: Accelerating Evidence-Based Practice of Genomic Medicine.

    PubMed

    Green, Robert C; Goddard, Katrina A B; Jarvik, Gail P; Amendola, Laura M; Appelbaum, Paul S; Berg, Jonathan S; Bernhardt, Barbara A; Biesecker, Leslie G; Biswas, Sawona; Blout, Carrie L; Bowling, Kevin M; Brothers, Kyle B; Burke, Wylie; Caga-Anan, Charlisse F; Chinnaiyan, Arul M; Chung, Wendy K; Clayton, Ellen W; Cooper, Gregory M; East, Kelly; Evans, James P; Fullerton, Stephanie M; Garraway, Levi A; Garrett, Jeremy R; Gray, Stacy W; Henderson, Gail E; Hindorff, Lucia A; Holm, Ingrid A; Lewis, Michelle Huckaby; Hutter, Carolyn M; Janne, Pasi A; Joffe, Steven; Kaufman, David; Knoppers, Bartha M; Koenig, Barbara A; Krantz, Ian D; Manolio, Teri A; McCullough, Laurence; McEwen, Jean; McGuire, Amy; Muzny, Donna; Myers, Richard M; Nickerson, Deborah A; Ou, Jeffrey; Parsons, Donald W; Petersen, Gloria M; Plon, Sharon E; Rehm, Heidi L; Roberts, J Scott; Robinson, Dan; Salama, Joseph S; Scollon, Sarah; Sharp, Richard R; Shirts, Brian; Spinner, Nancy B; Tabor, Holly K; Tarczy-Hornoch, Peter; Veenstra, David L; Wagle, Nikhil; Weck, Karen; Wilfond, Benjamin S; Wilhelmsen, Kirk; Wolf, Susan M; Wynn, Julia; Yu, Joon-Ho

    2016-06-01

    Despite rapid technical progress and demonstrable effectiveness for some types of diagnosis and therapy, much remains to be learned about clinical genome and exome sequencing (CGES) and its role within the practice of medicine. The Clinical Sequencing Exploratory Research (CSER) consortium includes 18 extramural research projects, one National Human Genome Research Institute (NHGRI) intramural project, and a coordinating center funded by the NHGRI and National Cancer Institute. The consortium is exploring analytic and clinical validity and utility, as well as the ethical, legal, and social implications of sequencing via multidisciplinary approaches; it has thus far recruited 5,577 participants across a spectrum of symptomatic and healthy children and adults by utilizing both germline and cancer sequencing. The CSER consortium is analyzing data and creating publically available procedures and tools related to participant preferences and consent, variant classification, disclosure and management of primary and secondary findings, health outcomes, and integration with electronic health records. Future research directions will refine measures of clinical utility of CGES in both germline and somatic testing, evaluate the use of CGES for screening in healthy individuals, explore the penetrance of pathogenic variants through extensive phenotyping, reduce discordances in public databases of genes and variants, examine social and ethnic disparities in the provision of genomics services, explore regulatory issues, and estimate the value and downstream costs of sequencing. The CSER consortium has established a shared community of research sites by using diverse approaches to pursue the evidence-based development of best practices in genomic medicine.

  9. Clinical Sequencing Exploratory Research Consortium: Accelerating Evidence-Based Practice of Genomic Medicine.

    PubMed

    Green, Robert C; Goddard, Katrina A B; Jarvik, Gail P; Amendola, Laura M; Appelbaum, Paul S; Berg, Jonathan S; Bernhardt, Barbara A; Biesecker, Leslie G; Biswas, Sawona; Blout, Carrie L; Bowling, Kevin M; Brothers, Kyle B; Burke, Wylie; Caga-Anan, Charlisse F; Chinnaiyan, Arul M; Chung, Wendy K; Clayton, Ellen W; Cooper, Gregory M; East, Kelly; Evans, James P; Fullerton, Stephanie M; Garraway, Levi A; Garrett, Jeremy R; Gray, Stacy W; Henderson, Gail E; Hindorff, Lucia A; Holm, Ingrid A; Lewis, Michelle Huckaby; Hutter, Carolyn M; Janne, Pasi A; Joffe, Steven; Kaufman, David; Knoppers, Bartha M; Koenig, Barbara A; Krantz, Ian D; Manolio, Teri A; McCullough, Laurence; McEwen, Jean; McGuire, Amy; Muzny, Donna; Myers, Richard M; Nickerson, Deborah A; Ou, Jeffrey; Parsons, Donald W; Petersen, Gloria M; Plon, Sharon E; Rehm, Heidi L; Roberts, J Scott; Robinson, Dan; Salama, Joseph S; Scollon, Sarah; Sharp, Richard R; Shirts, Brian; Spinner, Nancy B; Tabor, Holly K; Tarczy-Hornoch, Peter; Veenstra, David L; Wagle, Nikhil; Weck, Karen; Wilfond, Benjamin S; Wilhelmsen, Kirk; Wolf, Susan M; Wynn, Julia; Yu, Joon-Ho

    2016-06-01

    Despite rapid technical progress and demonstrable effectiveness for some types of diagnosis and therapy, much remains to be learned about clinical genome and exome sequencing (CGES) and its role within the practice of medicine. The Clinical Sequencing Exploratory Research (CSER) consortium includes 18 extramural research projects, one National Human Genome Research Institute (NHGRI) intramural project, and a coordinating center funded by the NHGRI and National Cancer Institute. The consortium is exploring analytic and clinical validity and utility, as well as the ethical, legal, and social implications of sequencing via multidisciplinary approaches; it has thus far recruited 5,577 participants across a spectrum of symptomatic and healthy children and adults by utilizing both germline and cancer sequencing. The CSER consortium is analyzing data and creating publically available procedures and tools related to participant preferences and consent, variant classification, disclosure and management of primary and secondary findings, health outcomes, and integration with electronic health records. Future research directions will refine measures of clinical utility of CGES in both germline and somatic testing, evaluate the use of CGES for screening in healthy individuals, explore the penetrance of pathogenic variants through extensive phenotyping, reduce discordances in public databases of genes and variants, examine social and ethnic disparities in the provision of genomics services, explore regulatory issues, and estimate the value and downstream costs of sequencing. The CSER consortium has established a shared community of research sites by using diverse approaches to pursue the evidence-based development of best practices in genomic medicine. PMID:27181682

  10. A survey of tools for variant analysis of next-generation genome sequencing data

    PubMed Central

    Pabinger, Stephan; Dander, Andreas; Fischer, Maria; Snajder, Rene; Sperk, Michael; Efremova, Mirjana; Krabichler, Birgit; Speicher, Michael R.; Zschocke, Johannes

    2014-01-01

    Recent advances in genome sequencing technologies provide unprecedented opportunities to characterize individual genomic landscapes and identify mutations relevant for diagnosis and therapy. Specifically, whole-exome sequencing using next-generation sequencing (NGS) technologies is gaining popularity in the human genetics community due to the moderate costs, manageable data amounts and straightforward interpretation of analysis results. While whole-exome and, in the near future, whole-genome sequencing are becoming commodities, data analysis still poses significant challenges and led to the development of a plethora of tools supporting specific parts of the analysis workflow or providing a complete solution. Here, we surveyed 205 tools for whole-genome/whole-exome sequencing data analysis supporting five distinct analytical steps: quality assessment, alignment, variant identification, variant annotation and visualization. We report an overview of the functionality, features and specific requirements of the individual tools. We then selected 32 programs for variant identification, variant annotation and visualization, which were subjected to hands-on evaluation using four data sets: one set of exome data from two patients with a rare disease for testing identification of germline mutations, two cancer data sets for testing variant callers for somatic mutations, copy number variations and structural variations, and one semi-synthetic data set for testing identification of copy number variations. Our comprehensive survey and evaluation of NGS tools provides a valuable guideline for human geneticists working on Mendelian disorders, complex diseases and cancers. PMID:23341494

  11. Complete genome sequence of Serratia plymuthica strain AS12

    SciTech Connect

    Neupane, Saraswoti; Finlay, Roger D.; Alstrom, Sadhna; Goodwin, Lynne A.; Kyrpides, Nikos C; Lucas, Susan; Lapidus, Alla L.; Bruce, David; Pitluck, Sam; Peters, Lin; Ovchinnikova, Galina; Chertkov, Olga; Han, James; Han, Cliff; Tapia, Roxanne; Detter, J. Chris; Land, Miriam L; Hauser, Loren John; Cheng, Jan-Fang; Ivanova, N; Pagani, Ioanna; Klenk, Hans-Peter; Woyke, Tanja; Hogberg, Nils

    2012-01-01

    A plant associated member of the family Enterobacteriaceae, Serratia plymuthica strain AS12 was isolated from rapeseed roots. It is of scientific interest due to its plant growth promoting and plant pathogen inhibiting ability. The genome of S. plymuthica AS12 comprises a 5,443,009 bp long circular chromosome, which consists of 4,952 protein-coding genes, 87 tRNA genes and 7 rRNA operons. This genome was sequenced within the 2010 DOE-JGI Community Sequencing Program (CSP2010) as part of the project entitled 'Genomics of four rapeseed plant growth promoting bacteria with antagonistic effect on plant pathogens'.

  12. Complete mitochondrial genome sequence of Aoluguya reindeer (Rangifer tarandus).

    PubMed

    Ju, Yan; Liu, Huamiao; Rong, Min; Yang, Yifeng; Wei, Haijun; Shao, Yuanchen; Chen, Xiumin; Xing, Xiumei

    2016-05-01

    The complete mitochondria genome of the reindeer, Rangifer tarandus, was determined by accurate polymerase chain reaction. The entire genome is 16,357 bp in length and contains 13 protein-coding genes, 2 rRNA genes, 22 tRNA genes and a D-loop region, all of which are arranged in a typical vertebrate manner. The overall base composition of the reindeer's mitochondrial genome is 33.7% of A, 23.1% of C, 30.1% of T and 13.2%of G. A termination associated sequence and several conserved central sequence block domains were discovered within the control region.

  13. Complete genome sequence of Ferroglobus placidus AEDII12DO

    SciTech Connect

    Anderson, Iain; Risso, Carla; Holmes, Dawn; Lucas, Susan; Copeland, A; Lapidus, Alla L.; Cheng, Jan-Fang; Bruce, David; Goodwin, Lynne A.; Pitluck, Sam; Saunders, Elizabeth H; Brettin, Thomas S; Detter, J. Chris; Han, Cliff; Tapia, Roxanne; Larimer, Frank W; Land, Miriam L; Hauser, Loren John; Woyke, Tanja; Lovley, Derek; Kyrpides, Nikos C; Ivanova, N

    2011-01-01

    Ferroglobus placidus belongs to the order Archaeoglobales within the archaeal phylum Euryar- chaeota. Strain AEDII12DO is the type strain of the species and was isolated from a shallow marine hydrothermal system at Vulcano, Italy. It is a hyperthermophilic, anaerobic chemoli- thoautotroph, but it can also use a variety of aromatic compounds as electron donors. Here we describe the features of this organism together with the complete genome sequence and anno- tation. The 2,196,266 bp genome with its 2,567 protein-coding and 55 RNA genes was se- quenced as part of a DOE Joint Genome Institute Laboratory Sequencing Program (LSP) project.

  14. Complete genome sequence of Serratia plymuthica strain AS12

    PubMed Central

    Finlay, Roger D.; Alström, Sadhna; Goodwin, Lynne; Kyrpides, Nikos C.; Lucas, Susan; Lapidus, Alla; Bruce, David; Pitluck, Sam; Peters, Lin; Ovchinnikova, Galina; Chertkov, Olga; Han, James; Han, Cliff; Tapia, Roxanne; Detter, John C.; Land, Miriam; Hauser, Loren; Cheng, Jan-Fang; Ivanova, Natalia; Pagani, Ioanna; Klenk, Hans-Peter; Woyke, Tanja; Högberg, Nils

    2012-01-01

    A plant-associated member of the family Enterobacteriaceae, Serratia plymuthica strain AS12 was isolated from rapeseed roots. It is of scientific interest because it promotes plant growth and inhibits plant pathogens. The genome of S. plymuthica AS12 comprises a 5,443,009 bp long circular chromosome, which consists of 4,952 protein-coding genes, 87 tRNA genes and 7 rRNA operons. This genome was sequenced within the 2010 DOE-JGI Community Sequencing Program (CSP2010) as part of the project entitled “Genomics of four rapeseed plant growth promoting bacteria with antagonistic effect on plant pathogens”. PMID:22768360

  15. Complete mitochondrial genome sequence of Aoluguya reindeer (Rangifer tarandus).

    PubMed

    Ju, Yan; Liu, Huamiao; Rong, Min; Yang, Yifeng; Wei, Haijun; Shao, Yuanchen; Chen, Xiumin; Xing, Xiumei

    2016-05-01

    The complete mitochondria genome of the reindeer, Rangifer tarandus, was determined by accurate polymerase chain reaction. The entire genome is 16,357 bp in length and contains 13 protein-coding genes, 2 rRNA genes, 22 tRNA genes and a D-loop region, all of which are arranged in a typical vertebrate manner. The overall base composition of the reindeer's mitochondrial genome is 33.7% of A, 23.1% of C, 30.1% of T and 13.2%of G. A termination associated sequence and several conserved central sequence block domains were discovered within the control region. PMID:25469816

  16. Single-Cell Analysis in Cancer Genomics.

    PubMed

    Saadatpour, Assieh; Lai, Shujing; Guo, Guoji; Yuan, Guo-Cheng

    2015-10-01

    Genetic changes and environmental differences result in cellular heterogeneity among cancer cells within the same tumor, thereby complicating treatment outcomes. Recent advances in single-cell technologies have opened new avenues to characterize the intra-tumor cellular heterogeneity, identify rare cell types, measure mutation rates, and, ultimately, guide diagnosis and treatment. In this paper we review the recent single-cell technological and computational advances at the genomic, transcriptomic, and proteomic levels, and discuss their applications in cancer research. PMID:26450340

  17. Corruption of genomic databases with anomalous sequence.

    PubMed Central

    Lamperti, E D; Kittelberger, J M; Smith, T F; Villa-Komaroff, L

    1992-01-01

    We describe evidence that DNA sequences from vectors used for cloning and sequencing have been incorporated accidentally into eukaryotic entries in the GenBank database. These incorporations were not restricted to one type of vector or to a single mechanism. Many minor instances may have been the result of simple editing errors, but some entries contained large blocks of vector sequence that had been incorporated by contamination or other accidents during cloning. Some cases involved unusual rearrangements and areas of vector distant from the normal insertion sites. Matches to vector were found in 0.23% of 20,000 sequences analyzed in GenBank Release 63. Although the possibility of anomalous sequence incorporation has been recognized since the inception of GenBank and should be easy to avoid, recent evidence suggests that this problem is increasing more quickly than the database itself. The presence of anomalous sequence may have serious consequences for the interpretation and use of database entries, and will have an impact on issues of database management. The incorporated vector fragments described here may also be useful for a crude estimate of the fidelity of sequence information in the database. In alignments with well-defined ends, the matching sequences showed 96.8% identity to vector; when poorer matches with arbitrary limits were included, the aggregate identity to vector sequence was 94.8%. PMID:1614861

  18. Corruption of genomic databases with anomalous sequence.

    PubMed

    Lamperti, E D; Kittelberger, J M; Smith, T F; Villa-Komaroff, L

    1992-06-11

    We describe evidence that DNA sequences from vectors used for cloning and sequencing have been incorporated accidentally into eukaryotic entries in the GenBank database. These incorporations were not restricted to one type of vector or to a single mechanism. Many minor instances may have been the result of simple editing errors, but some entries contained large blocks of vector sequence that had been incorporated by contamination or other accidents during cloning. Some cases involved unusual rearrangements and areas of vector distant from the normal insertion sites. Matches to vector were found in 0.23% of 20,000 sequences analyzed in GenBank Release 63. Although the possibility of anomalous sequence incorporation has been recognized since the inception of GenBank and should be easy to avoid, recent evidence suggests that this problem is increasing more quickly than the database itself. The presence of anomalous sequence may have serious consequences for the interpretation and use of database entries, and will have an impact on issues of database management. The incorporated vector fragments described here may also be useful for a crude estimate of the fidelity of sequence information in the database. In alignments with well-defined ends, the matching sequences showed 96.8% identity to vector; when poorer matches with arbitrary limits were included, the aggregate identity to vector sequence was 94.8%.

  19. Overview | Office of Cancer Genomics

    Cancer.gov

    The Therapeutically Applicable Research to Generate Effective Treatments (TARGET) initiative uses comprehensive molecular characterization to determine the genetic changes that drive the initiation and progression of hard-to-treat childhood cancers. TARGET aims to identify therapeutic targets and prognostic markers so that new, more effective treatment strategies can be developed and applied. Novel pediatric cancer treatments are needed because:

  20. Biased distribution of DNA uptake sequences towards genome maintenance genes.

    PubMed

    Davidsen, Tonje; Rødland, Einar A; Lagesen, Karin; Seeberg, Erling; Rognes, Torbjørn; Tønjum, Tone

    2004-01-01

    Repeated sequence signatures are characteristic features of all genomic DNA. We have made a rigorous search for repeat genomic sequences in the human pathogens Neisseria meningitidis, Neisseria gonorrhoeae and Haemophilus influenzae and found that by far the most frequent 9-10mers residing within coding regions are the DNA uptake sequences (DUS) required for natural genetic transformation. More importantly, we found a significantly higher density of DUS within genes involved in DNA repair, recombination, restriction-modification and replication than in any other annotated gene group in these organisms. Pasteurella multocida also displayed high frequencies of a putative DUS identical to that previously identified in H.influenzae and with a skewed distribution towards genome maintenance genes, indicating that this bacterium might be transformation competent under certain conditions. These results imply that the high frequency of DUS in genome maintenance genes is conserved among phylogenetically divergent species and thus are of significant biological importance. Increased DUS density is expected to enhance DNA uptake and the over-representation of DUS in genome maintenance genes might reflect facilitated recovery of genome preserving functions. For example, transient and beneficial increase in genome instability can be allowed during pathogenesis simply through loss of antimutator genes, since these DUS-containing sequences will be preferentially recovered. Furthermore, uptake of such genes could provide a mechanism for facilitated recovery from DNA damage after genotoxic stress. PMID:14960717

  1. Emerging knowledge from genome sequencing of crop species.

    PubMed

    Barabaschi, Delfina; Guerra, Davide; Lacrima, Katia; Laino, Paolo; Michelotti, Vania; Urso, Simona; Valè, Giampiero; Cattivelli, Luigi

    2012-03-01

    Extensive insights into the genome composition, organization, and evolution have been gained from the plant genome sequencing and annotation ongoing projects. The analysis of crop genomes provided surprising evidences with important implications in plant origin and evolution: genome duplication, ancestral re-arrangements and unexpected polyploidization events opened new doors to address fundamental questions related to species proliferation, adaptation, and functional modulations. Detailed paleogenomic analysis led to many speculation on how chromosomes have been shaped over time in terms of gene content and order. The completion of the genome sequences of several major crops, prompted to a detailed identification and annotation of transposable elements: new hypothesis related to their composition, chromosomal distribution, insertion models, amplification rate, and evolution patterns are coming up. Availability of full genome sequence of several crop species as well as from many accessions within species is providing new keys for biodiversity exploitation and interpretation. Re-sequencing is enabling high-throughput genotyping to identify a wealth of SNP and afterward to produce haplotype maps necessary to accurately associate molecular variation to phenotype. Conservation genomics is emerging as a powerful tool to explain adaptation, genetic drift, natural selection, hybridization and to estimate genetic variation, fitness and population's viability. PMID:21822975

  2. Sequencing of a new target genome: the Pediculus humanus humanus (Phthiraptera: Pediculidae) genome project.

    PubMed

    Pittendrigh, B R; Clark, J M; Johnston, J S; Lee, S H; Romero-Severson, J; Dasch, G A

    2006-11-01

    The human body louse, Pediculus humanus humanus (L.), and the human head louse, Pediculus humanus capitis, belong to the hemimetabolous order Phthiraptera. The body louse is the primary vector that transmits the bacterial agents of louse-borne relapsing fever, trench fever, and epidemic typhus. The genomes of the bacterial causative agents of several of these aforementioned diseases have been sequenced. Thus, determining the body louse genome will enhance studies of host-vector-pathogen interactions. Although not important as a major disease vector, head lice are of major social concern. Resistance to traditional pesticides used to control head and body lice have developed. It is imperative that new molecular targets be discovered for the development of novel compounds to control these insects. No complete genome sequence exists for a hemimetabolous insect species primarily because hemimetabolous insects often have large (2000 Mb) to very large (up to 16,300 Mb) genomes. Fortuitously, we determined that the human body louse has one of the smallest genome sizes known in insects, suggesting it may be a suitable choice as a minimal hemimetabolous genome in which many genes have been eliminated during its adaptation to human parasitism. Because many louse species infest birds and mammals, the body louse genome-sequencing project will facilitate studies of their comparative genomics. A 6-8X coverage of the body louse genome, plus sequenced expressed sequence tags, should provide the entomological, evolutionary biology, medical, and public health communities with useful genetic information.

  3. Widespread mitovirus sequences in plant genomes

    PubMed Central

    Warner, Benjamin E.; Yerramsetty, Pradeep

    2015-01-01

    The exploration of the evolution of RNA viruses has been aided recently by the discovery of copies of fragments or complete genomes of non-retroviral RNA viruses (Non-retroviral Endogenous RNA Viral Elements, or NERVEs) in many eukaryotic nuclear genomes. Among the most prominent NERVEs are partial copies of the RNA dependent RNA polymerase (RdRP) of the mitoviruses in plant mitochondrial genomes. Mitoviruses are in the family Narnaviridae, which are the simplest viruses, encoding only a single protein (the RdRP) in their unencapsidated viral plus strand. Narnaviruses are known only in fungi, and the origin of plant mitochondrial mitovirus NERVEs appears to be horizontal transfer from plant pathogenic fungi. At least one mitochondrial mitovirus NERVE, but not its nuclear copy, is expressed. PMID:25870770

  4. Genome sequence analysis of the model grass Brachypodium distachyon: insights into grass genome evolution

    SciTech Connect

    Schulman, Al

    2009-08-09

    Three subfamilies of grasses, the Erhardtoideae (rice), the Panicoideae (maize, sorghum, sugar cane and millet), and the Pooideae (wheat, barley and cool season forage grasses) provide the basis of human nutrition and are poised to become major sources of renewable energy. Here we describe the complete genome sequence of the wild grass Brachypodium distachyon (Brachypodium), the first member of the Pooideae subfamily to be completely sequenced. Comparison of the Brachypodium, rice and sorghum genomes reveals a precise sequence- based history of genome evolution across a broad diversity of the grass family and identifies nested insertions of whole chromosomes into centromeric regions as a predominant mechanism driving chromosome evolution in the grasses. The relatively compact genome of Brachypodium is maintained by a balance of retroelement replication and loss. The complete genome sequence of Brachypodium, coupled to its exceptional promise as a model system for grass research, will support the development of new energy and food crops

  5. Sequencing and comparative analyses of the genomes of zoysiagrasses

    PubMed Central

    Tanaka, Hidenori; Hirakawa, Hideki; Kosugi, Shunichi; Nakayama, Shinobu; Ono, Akiko; Watanabe, Akiko; Hashiguchi, Masatsugu; Gondo, Takahiro; Ishigaki, Genki; Muguerza, Melody; Shimizu, Katsuya; Sawamura, Noriko; Inoue, Takayasu; Shigeki, Yuichi; Ohno, Naoki; Tabata, Satoshi; Akashi, Ryo; Sato, Shusei

    2016-01-01

    Zoysia is a warm-season turfgrass, which comprises 11 allotetraploid species (2n = 4x = 40), each possessing different morphological and physiological traits. To characterize the genetic systems of Zoysia plants and to analyse their structural and functional differences in individual species and accessions, we sequenced the genomes of Zoysia species using HiSeq and MiSeq platforms. As a reference sequence of Zoysia species, we generated a high-quality draft sequence of the genome of Z. japonica accession ‘Nagirizaki’ (334 Mb) in which 59,271 protein-coding genes were predicted. In parallel, draft genome sequences of Z. matrella ‘Wakaba’ and Z. pacifica ‘Zanpa’ were also generated for comparative analyses. To investigate the genetic diversity among the Zoysia species, genome sequence reads of three additional accessions, Z. japonica ‘Kyoto’, Z. japonica ‘Miyagi’ and Z. matrella ‘Chiba Fair Green’, were accumulated, and aligned against the reference genome of ‘Nagirizaki’ along with those from ‘Wakaba’ and ‘Zanpa’. As a result, we detected 7,424,163 single-nucleotide polymorphisms and 852,488 short indels among these species. The information obtained in this study will be valuable for basic studies on zoysiagrass evolution and genetics as well as for the breeding of zoysiagrasses, and is made available in the ‘Zoysia Genome Database’ at http://zoysia.kazusa.or.jp. PMID:26975196

  6. Sequencing and comparative analyses of the genomes of zoysiagrasses.

    PubMed

    Tanaka, Hidenori; Hirakawa, Hideki; Kosugi, Shunichi; Nakayama, Shinobu; Ono, Akiko; Watanabe, Akiko; Hashiguchi, Masatsugu; Gondo, Takahiro; Ishigaki, Genki; Muguerza, Melody; Shimizu, Katsuya; Sawamura, Noriko; Inoue, Takayasu; Shigeki, Yuichi; Ohno, Naoki; Tabata, Satoshi; Akashi, Ryo; Sato, Shusei

    2016-04-01

    Zoysiais a warm-season turfgrass, which comprises 11 allotetraploid species (2n= 4x= 40), each possessing different morphological and physiological traits. To characterize the genetic systems of Zoysia plants and to analyse their structural and functional differences in individual species and accessions, we sequenced the genomes of Zoysia species using HiSeq and MiSeq platforms. As a reference sequence of Zoysia species, we generated a high-quality draft sequence of the genome of Z. japonica accession 'Nagirizaki' (334 Mb) in which 59,271 protein-coding genes were predicted. In parallel, draft genome sequences of Z. matrella 'Wakaba' and Z. pacifica 'Zanpa' were also generated for comparative analyses. To investigate the genetic diversity among the Zoysia species, genome sequence reads of three additional accessions, Z. japonica'Kyoto', Z. japonica'Miyagi' and Z. matrella'Chiba Fair Green', were accumulated, and aligned against the reference genome of 'Nagirizaki' along with those from 'Wakaba' and 'Zanpa'. As a result, we detected 7,424,163 single-nucleotide polymorphisms and 852,488 short indels among these species. The information obtained in this study will be valuable for basic studies on zoysiagrass evolution and genetics as well as for the breeding of zoysiagrasses, and is made available in the 'Zoysia Genome Database' at http://zoysia.kazusa.or.jp.

  7. Complete genome sequence of equine herpesvirus type 9.

    PubMed

    Fukushi, Hideto; Yamaguchi, Tsuyoshi; Yamada, Souichi

    2012-12-01

    Equine herpesvirus type 9 (EHV-9), which we isolated from a case of epizootic encephalitis in a herd of Thomson's gazelles (Gazella thomsoni) in 1993, has been known to cause fatal encephalitis in Thomson's gazelle, giraffe, and polar bear in natural infections. Our previous report indicated that EHV-9 was similar to the equine pathogen equine herpesvirus type 1 (EHV-1), which mainly causes abortion, respiratory infection, and equine herpesvirus myeloencephalopathy. We determined the genome sequence of EHV-9. The genome has a length of 148,371 bp and all 80 of the open reading frames (ORFs) found in the genome of EHV-1. The nucleotide sequences of the ORFs in EHV-9 were 86 to 95% identical to those in EHV-1. The whole genome sequence should help to reveal the neuropathogenicity of EHV-9. PMID:23166237

  8. Complete genome sequence of Streptobacillus moniliformis type strain (9901T)

    SciTech Connect

    Nolan, Matt; Gronow, Sabine; Lapidus, Alla L.; Ivanova, N; Copeland, A; Lucas, Susan; Glavina Del Rio, Tijana; Chen, Feng; Sims, David; Meincke, Linda; Bruce, David; Goodwin, Lynne A.; Han, Cliff; Detter, J. Chris; Ovchinnikova, Galina; Pati, Amrita; Mavromatis, K; Mikhailova, Natalia; Chen, Amy; Palaniappan, Krishna; Land, Miriam L; Hauser, Loren John; Chang, Yun-Juan; Jeffries, Cynthia; Rohde, Manfred; Sproer, Cathrin; Goker, Markus; Bristow, James; Eisen, Jonathan; Markowitz, Victor; Hugenholtz, Philip; Kyrpides, Nikos C; Klenk, Hans-Peter; Chain, Patrick S. G.

    2009-01-01

    Streptobacillus moniliformis Levaditi et al. 1925 is the sole and type species of the genus, and is of phylogenetic interest because of its isolated location in the sparsely populated and neither taxonomically nor genomically much accessed family 'Leptotrichiaceae' within the phylum 'Fusobacteria'. S. moniliformis, a Gram-negative, non-motile and pleomorphic bacterium, is the etiologic agent of rat bite fever and Haverhill fever. Strain 9901T, the type strain of the species, was isolated from a patient with rat bite fever. Here we describe the features of this organism, together with the complete genome sequence and annotation. This is only the second completed genome sequence of the order 'Fusobacteriales' and no more than the third sequence from the phylum 'Fusobacteria'. The 1,662,578 bp long chromosome and the 10,702 bp plasmid with a total of 1511 protein-coding and 55 RNA genes are part of the Genomic Encyclopedia of Bacteria and Archaea project.

  9. Comparison of mitochondrial genome sequences of pangolins (Mammalia, Pholidota).

    PubMed

    Hassanin, Alexandre; Hugot, Jean-Pierre; van Vuuren, Bettine Jansen

    2015-04-01

    The complete mitochondrial genome was sequenced for three species of pangolins, Manis javanica, Phataginus tricuspis, and Smutsia temminckii, and comparisons were made with two other species, Manis pentadactyla and Phataginus tetradactyla. The genome of Manidae contains the 37 genes found in a typical mammalian genome, and the structure of the control region is highly conserved among species. In Manis, the overall base composition differs from that found in African genera. Phylogenetic analyses support the monophyly of the genera Manis, Phataginus, and Smutsia, as well as the basal division between Maninae and Smutsiinae. Comparisons with GenBank sequences reveal that the reference genomes of M. pentadactyla and P. tetradactyla (accession numbers NC_016008 and NC_004027) were sequenced from misidentified taxa, and that a new species of tree pangolin should be described in Gabon. PMID:25746396

  10. Comparison of mitochondrial genome sequences of pangolins (Mammalia, Pholidota).

    PubMed

    Hassanin, Alexandre; Hugot, Jean-Pierre; van Vuuren, Bettine Jansen

    2015-04-01

    The complete mitochondrial genome was sequenced for three species of pangolins, Manis javanica, Phataginus tricuspis, and Smutsia temminckii, and comparisons were made with two other species, Manis pentadactyla and Phataginus tetradactyla. The genome of Manidae contains the 37 genes found in a typical mammalian genome, and the structure of the control region is highly conserved among species. In Manis, the overall base composition differs from that found in African genera. Phylogenetic analyses support the monophyly of the genera Manis, Phataginus, and Smutsia, as well as the basal division between Maninae and Smutsiinae. Comparisons with GenBank sequences reveal that the reference genomes of M. pentadactyla and P. tetradactyla (accession numbers NC_016008 and NC_004027) were sequenced from misidentified taxa, and that a new species of tree pangolin should be described in Gabon.

  11. Draft genome sequence of adzuki bean, Vigna angularis.

    PubMed

    Kang, Yang Jae; Satyawan, Dani; Shim, Sangrea; Lee, Taeyoung; Lee, Jayern; Hwang, Won Joo; Kim, Sue K; Lestari, Puji; Laosatit, Kularb; Kim, Kil Hyun; Ha, Tae Joung; Chitikineni, Annapurna; Kim, Moon Young; Ko, Jong-Min; Gwag, Jae-Gyun; Moon, Jung-Kyung; Lee, Yeong-Ho; Park, Beom-Seok; Varshney, Rajeev K; Lee, Suk-Ha

    2015-01-28

    Adzuki bean (Vigna angularis var. angularis) is a dietary legume crop in East Asia. The presumed progenitor (Vigna angularis var. nipponensis) is widely found in East Asia, suggesting speciation and domestication in these temperate climate regions. Here, we report a draft genome sequence of adzuki bean. The genome assembly covers 75% of the estimated genome and was mapped to 11 pseudo-chromosomes. Gene prediction revealed 26,857 high confidence protein-coding genes evidenced by RNAseq of different tissues. Comparative gene expression analysis with V. radiata showed that the tissue specificity of orthologous genes was highly conserved. Additional re-sequencing of wild adzuki bean, V. angularis var. nipponensis, and V. nepalensis, was performed to analyze the variations between cultivated and wild adzuki bean. The determined divergence time of adzuki bean and the wild species predated archaeology-based domestication time. The present genome assembly will accelerate the genomics-assisted breeding of adzuki bean.

  12. Long-read sequence assembly of the gorilla genome.

    PubMed

    Gordon, David; Huddleston, John; Chaisson, Mark J P; Hill, Christopher M; Kronenberg, Zev N; Munson, Katherine M; Malig, Maika; Raja, Archana; Fiddes, Ian; Hillier, LaDeana W; Dunn, Christopher; Baker, Carl; Armstrong, Joel; Diekhans, Mark; Paten, Benedict; Shendure, Jay; Wilson, Richard K; Haussler, David; Chin, Chen-Shan; Eichler, Evan E

    2016-04-01

    Accurate sequence and assembly of genomes is a critical first step for studies of genetic variation. We generated a high-quality assembly of the gorilla genome using single-molecule, real-time sequence technology and a string graph de novo assembly algorithm. The new assembly improves contiguity by two to three orders of magnitude with respect to previously released assemblies, recovering 87% of missing reference exons and incomplete gene models. Although regions of large, high-identity segmental duplications remain largely unresolved, this comprehensive assembly provides new biological insight into genetic diversity, structural variation, gene loss, and representation of repeat structures within the gorilla genome. The approach provides a path forward for the routine assembly of mammalian genomes at a level approaching that of the current quality of the human genome.

  13. Draft genome sequence of adzuki bean, Vigna angularis.

    PubMed

    Kang, Yang Jae; Satyawan, Dani; Shim, Sangrea; Lee, Taeyoung; Lee, Jayern; Hwang, Won Joo; Kim, Sue K; Lestari, Puji; Laosatit, Kularb; Kim, Kil Hyun; Ha, Tae Joung; Chitikineni, Annapurna; Kim, Moon Young; Ko, Jong-Min; Gwag, Jae-Gyun; Moon, Jung-Kyung; Lee, Yeong-Ho; Park, Beom-Seok; Varshney, Rajeev K; Lee, Suk-Ha

    2015-01-01

    Adzuki bean (Vigna angularis var. angularis) is a dietary legume crop in East Asia. The presumed progenitor (Vigna angularis var. nipponensis) is widely found in East Asia, suggesting speciation and domestication in these temperate climate regions. Here, we report a draft genome sequence of adzuki bean. The genome assembly covers 75% of the estimated genome and was mapped to 11 pseudo-chromosomes. Gene prediction revealed 26,857 high confidence protein-coding genes evidenced by RNAseq of different tissues. Comparative gene expression analysis with V. radiata showed that the tissue specificity of orthologous genes was highly conserved. Additional re-sequencing of wild adzuki bean, V. angularis var. nipponensis, and V. nepalensis, was performed to analyze the variations between cultivated and wild adzuki bean. The determined divergence time of adzuki bean and the wild species predated archaeology-based domestication time. The present genome assembly will accelerate the genomics-assisted breeding of adzuki bean. PMID:25626881

  14. Long-read sequence assembly of the gorilla genome

    PubMed Central

    Gordon, David; Huddleston, John; Chaisson, Mark J. P.; Hill, Christopher M.; Kronenberg, Zev N.; Munson, Katherine M.; Malig, Maika; Raja, Archana; Fiddes, Ian; Hillier, LaDeana W.; Dunn, Christopher; Baker, Carl; Armstrong, Joel; Diekhans, Mark; Paten, Benedict; Shendure, Jay; Wilson, Richard K.; Haussler, David; Chin, Chen-Shan; Eichler, Evan E.

    2016-01-01

    Accurate sequence and assembly of genomes is a critical first step for studies of genetic variation. We generated a high-quality assembly of the gorilla genome using single-molecule, real-time sequence technology and a string graph de novo assembly algorithm. The new assembly improves contiguity by two to three orders of magnitude with respect to previously released assemblies, recovering 87% of missing reference exons and incomplete gene models. Although regions of large, high-identity segmental duplications remain largely unresolved, this comprehensive assembly provides new biological insight into genetic diversity, structural variation, gene loss, and representation of repeat structures within the gorilla genome. The approach provides a path forward for the routine assembly of mammalian genomes at a level approaching that of the current quality of the human genome. PMID:27034376

  15. Long-read sequence assembly of the gorilla genome.

    PubMed

    Gordon, David; Huddleston, John; Chaisson, Mark J P; Hill, Christopher M; Kronenberg, Zev N; Munson, Katherine M; Malig, Maika; Raja, Archana; Fiddes, Ian; Hillier, LaDeana W; Dunn, Christopher; Baker, Carl; Armstrong, Joel; Diekhans, Mark; Paten, Benedict; Shendure, Jay; Wilson, Richard K; Haussler, David; Chin, Chen-Shan; Eichler, Evan E

    2016-04-01

    Accurate sequence and assembly of genomes is a critical first step for studies of genetic variation. We generated a high-quality assembly of the gorilla genome using single-molecule, real-time sequence technology and a string graph de novo assembly algorithm. The new assembly improves contiguity by two to three orders of magnitude with respect to previously released assemblies, recovering 87% of missing reference exons and incomplete gene models. Although regions of large, high-identity segmental duplications remain largely unresolved, this comprehensive assembly provides new biological insight into genetic diversity, structural variation, gene loss, and representation of repeat structures within the gorilla genome. The approach provides a path forward for the routine assembly of mammalian genomes at a level approaching that of the current quality of the human genome. PMID:27034376

  16. Complete Genome Sequence of Zika Virus Isolated from Semen

    PubMed Central

    Graham, Victoria; Lewandowski, Kuiama; Dowall, Stuart D.; Pullan, Steven T.; Hewson, Roger

    2016-01-01

    Zika virus (ZIKV) is an emerging pathogenic flavivirus currently circulating in numerous countries in South America, the Caribbean, and the Western Pacific Region. Using an unbiased metagenomic sequencing approach, we report here the first complete genome sequence of ZIKV isolated from a clinical semen sample. PMID:27738033

  17. Draft Genome Sequence of Biocontrol Agent Bacillus cereus UW85

    PubMed Central

    Lozano, Gabriel L.; Holt, Jonathan; Rasko, David A.; Thomas, Michael G.

    2016-01-01

    Bacillus cereus UW85 was isolated from a root of a field-grown alfalfa plant from Arlington, WI, and identified for its ability to suppress damping off, a disease caused by Phytophthora megasperma f. sp. medicaginis on alfalfa. Here, we report the draft genome sequence of B. cereus UW85, obtained by a combination of Sanger and Illumina sequencing. PMID:27587823

  18. Complete Genome Sequence of the Alfalfa latent virus

    PubMed Central

    Shao, Jonathan; Postnikova, Olga A.

    2015-01-01

    The first complete genome sequence of the Alfalfa latent carlavirus (ALV) was obtained by primer walking and Illumina RNA sequencing. The virus differs substantially from the Czech ALV isolate and the Pea streak virus isolate from Wisconsin. The absence of a clear nucleic acid-binding protein indicates ALV divergence from other carlaviruses. PMID:25883281

  19. Complete Genomic Sequence of Issyk-Kul Virus

    PubMed Central

    Marston, Denise A.; Ellis, Richard J.; Fooks, Anthony R.; Hewson, Roger

    2015-01-01

    Issyk-Kul virus (ISKV) is an ungrouped virus tentatively assigned to the Bunyaviridae family and is associated with an acute febrile illness in several central Asian countries. Using next-generation sequencing technologies, we report here the full-genome sequence for this novel unclassified arboviral pathogen circulating in central Asia. PMID:26139711

  20. Complete Genomic Sequence of Issyk-Kul Virus.

    PubMed

    Atkinson, Barry; Marston, Denise A; Ellis, Richard J; Fooks, Anthony R; Hewson, Roger

    2015-07-02

    Issyk-Kul virus (ISKV) is an ungrouped virus tentatively assigned to the Bunyaviridae family and is associated with an acute febrile illness in several central Asian countries. Using next-generation sequencing technologies, we report here the full-genome sequence for this novel unclassified arboviral pathogen circulating in central Asia.