Science.gov

Sample records for multi-platform whole-genome microarray

  1. Microarray-based whole-genome hybridization as a tool for determining procaryotic species relatedness

    SciTech Connect

    Wu, L.; Liu, X.; Fields, M.W.; Thompson, D.K.; Bagwell, C.E.; Tiedje, J. M.; Hazen, T.C.; Zhou, J.

    2008-01-15

    The definition and delineation of microbial species are of great importance and challenge due to the extent of evolution and diversity. Whole-genome DNA-DNA hybridization is the cornerstone for defining procaryotic species relatedness, but obtaining pairwise DNA-DNA reassociation values for a comprehensive phylogenetic analysis of procaryotes is tedious and time consuming. A previously described microarray format containing whole-genomic DNA (the community genome array or CGA) was rigorously evaluated as a high-throughput alternative to the traditional DNA-DNA reassociation approach for delineating procaryotic species relationships. DNA similarities for multiple bacterial strains obtained with the CGA-based hybridization were comparable to those obtained with various traditional whole-genome hybridization methods (r=0.87, P<0.01). Significant linear relationships were also observed between the CGA-based genome similarities and those derived from small subunit (SSU) rRNA gene sequences (r=0.79, P<0.0001), gyrB sequences (r=0.95, P<0.0001) or REP- and BOX-PCR fingerprinting profiles (r=0.82, P<0.0001). The CGA hybridization-revealed species relationships in several representative genera, including Pseudomonas, Azoarcus and Shewanella, were largely congruent with previous classifications based on various conventional whole-genome DNA-DNA reassociation, SSU rRNA and/or gyrB analyses. These results suggest that CGA-based DNA-DNA hybridization could serve as a powerful, high-throughput format for determining species relatedness among microorganisms.

  2. Construction and Evaluation of a Clostridium thermocellum ATCC 27405 Whole-Genome Oligonucleotide Microarray

    NASA Astrophysics Data System (ADS)

    Brown, Steven D.; Raman, Babu; McKeown, Catherine K.; Kale, Shubha P.; He, Zhili; Mielenz, Jonathan R.

    Clostridium thermocellum is an anaerobic, thermophilic bacterium that can directly convert cellulosic substrates into ethanol. Microarray technology is a powerful tool to gain insights into cellular processes by examining gene expression under various physiological states. Oligonucleotide microarray probes were designed for 96.7% of the 3163 C. thermocellum ATCC 27405 candidate protein-encoding genes and then a partial-genome microarray containing 70 C. thermocellum specific probes was constructed and evaluated. We detected a signal-to-noise ratio of three with as little as 1.0 ng of genomic DNA and only low signals from negative control probes (nonclostridial DNA), indicating the probes were sensitive and specific. In order to further test the specificity of the array we amplified and hybridized 10 C. thermocellum polymerase chain reaction products that represented different genes and found gene specific hybridization in each case. We also constructed a whole-genome microarray and prepared total cellular RNA from the same point in early-logarithmic growth phase from two technical replicates during cellobiose fermentation. The reliability of the microarray data was assessed by cohybridization of labeled complementary DNA from the cellobiose fermentation samples and the pattern of hybridization revealed a linear correlation. These results taken together suggest that our oligonucleotide probe set can be used for sensitive and specific C. thermocellum transcriptomic studies in the future.

  3. Comprehensive Analysis of Prokaryotes in Environmental Water Using DNA Microarray Analysis and Whole Genome Amplification

    PubMed Central

    Akama, Takeshi; Kawashima, Akira; Tanigawa, Kazunari; Hayashi, Moyuru; Ishido, Yuko; Luo, Yuqian; Hata, Akihisa; Fujitani, Noboru; Ishii, Norihisa; Suzuki, Koichi

    2013-01-01

    The microflora in environmental water consists of a high density and diversity of bacterial species that form the foundation of the water ecosystem. Because the majority of these species cannot be cultured in vitro, a different approach is needed to identify prokaryotes in environmental water. A novel DNA microarray was developed as a simplified detection protocol. Multiple DNA probes were designed against each of the 97,927 sequences in the DNA Data Bank of Japan and mounted on a glass chip in duplicate. Evaluation of the microarray was performed using the DNA extracted from one liter of environmental water samples collected from seven sites in Japan. The extracted DNA was uniformly amplified using whole genome amplification (WGA), labeled with Cy3-conjugated 16S rRNA specific primers and hybridized to the microarray. The microarray successfully identified soil bacteria and environment-specific bacteria clusters. The DNA microarray described herein can be a useful tool in evaluating the diversity of prokaryotes and assessing environmental changes such as global warming. PMID:25437334

  4. Construction and evaluation of a Clostridium thermocellum ATCC 27405 whole-genome oligonucleotide microarray

    SciTech Connect

    Brown, Steven David; Raman, Babu; McKeown, Catherine K; Kale, Shubhangi P; He, Zhili; Mielenz, Jonathan R

    2007-04-01

    Clostridium thermocellum is an anaerobic, thermophilic bacterium that can directly convert cellulosic substrates into ethanol. Microarray technology is a powerful tool to gain insights into cellular processes by examining gene expression under various physiological states. Oligonucleotide microarray probes were designed for 96.7% of the 3163 C. thermocellum ATCC 27405 candidate protein-encoding genes and then a partial-genome microarray containing 70 C. thermocellum specific probes was constructed and evaluated. We detected a signal-to-noise ratio of three with as little as 1.0 ng of genomic DNA and only low signals from negative control probes (nonclostridial DNA), indicating the probes were sensitive and specific. In order to further test the specificity of the array we amplified and hybridized 10 C. thermocellum polymerase chain reaction products that represented different genes and found gene specific hybridization in each case. We also constructed a whole-genome microarray and prepared total cellular RNA from the same point in early-logarithmic growth phase from two technical replicates during cellobiose fermentation. The reliability of the microarray data was assessed by cohybridization of labeled complementary DNA from the cellobiose fermentation samples and the pattern of hybridization revealed a linear correlation. These results taken together suggest that our oligonucleotide probe set can be used for sensitive and specific C. thermocellum transcriptomic studies in the future.

  5. Construction of Whole Genome Microarrays, and Expression Analysis of Desulfovibrio vulgaris cells in Metal-Reducing Conditions (Uranium and Chromium)

    SciTech Connect

    Fields, Matthew W.

    2005-06-01

    One of the major goals of the project is to construct whole-genome microarrays for Desulfovibrio vulgaris. Previous whole-genome microarrays constructed at ORNL have been PCR-amplimer based, and we wanted to re-evaluate the type of microarrays being built because oligonucleotide probes have several advantages. Microarrays have been generally constructed with two types of probes, PCR-generated probes that typically range in size between 200 and 2000 bp, and oligonucleotide probes with typical size of 20-70 nt. Producing PCR product-based DNA arrays can be a time-consuming procedure that includes PCR primer design, amplification, size verification, product purification, and product quantification. Also, some ORFs are difficult to amplify and thus the construction of comprehensive arrays can be a challenge. Recently, to alleviate some of the problems associated with PCR product-based microarrays, oligonucleotide microarrays that contain probes longer than 40 nt have been evaluated and used for whole genome expression studies. These microarrays should have higher specificity and are easy to construct, and can thus provide an important alternative approach to monitor gene expression. However, due to the smaller probe size, it is expected that the detection sensitivity of oligonucleotide arrays will be lower than PCR product-based probes.

  6. Construction and Evaluation of Desulfovibrio vulgaris Whole-Genome Oligonucleotide Microarrays

    SciTech Connect

    Z. He; Q. He; L. Wu; M.E. Clark; J.D. Wall; Jizhong Zhou; Matthew W. Fields

    2004-03-17

    Desulfovibrio vulgaris Hildenborough has been the focus of biochemical and physiological studies in the laboratory, and the metabolic versatility of this organism has been largely recognized, particularly the reduction of sulfate, fumarate, iron, uranium and chromium. In addition, a Desulfovibrio sp. has been shown to utilize uranium as the sole electron acceptor. D. vulgaris is a δ-Proteobacterium with a genome size of 3.6 Mb and 3584 ORFs. The whole-genome microarrays of D. vulgaris have been constructed using 70mer oligonucleotides. All ORFs in the genome were represented with 3471 (97.1%) unique probes and 103 (2.9%) non-specific probes that may have cross-hybridization with other ORFs. In preparation for use of the experimental microarrays, artificial probes and targets were designed to assess specificity and sensitivity and identify optimal hybridization conditions for oligonucleotide microarrays. The results indicated that for 50mer and 70mer oligonucleotide arrays, hybridization at 45 °C to 50 °C, washing at 37 °C and a wash time of 2.5 to 5 minutes obtained specific and strong hybridization signals. In order to evaluate the performance of the experimental microarrays, growth conditions were selected that were expected to give significant hybridization differences for different sets of genes. The initial evaluations were performed using D. vulgaris cells grown at logarithmic and stationary phases. Transcriptional analysis of D. vulgaris cells sampled during logarithmic phase growth indicated that 25% of annotated ORFs were up-regulated and 3% of annotated ORFs were downregulated compared to stationary phase cells. The up-regulated genes included ORFs predicted to be involved with acyl chain biosynthesis, amino acid ABC transporter, translational initiation factors, and ribosomal proteins. In the stationary phase growth cells, the two most up-regulated ORFs (70-fold) were annotated as a carboxynorspermidine decarboxylase and a 2C-methyl-D-erythritol-2

  7. Whole genome microarray analysis in non-small cell lung cancer

    PubMed Central

    AL Zeyadi, Mohammad; Dimova, Ivanka; Ranchich, Vladislav; Rukova, Blaga; Nesheva, Desislava; Hamude, Zora; Georgiev, Sevdalin; Petrov, Danail; Toncheva, Draga

    2015-01-01

    Lung cancer is a serious health problem, since it is one of the leading causes of death worldwide. Molecular–cytogenetic studies could provide reliable data about genetic alterations which could be related to disease pathogenesis and be used for better prognosis and treatment strategies. We performed whole genome oligonucleotide microarray-based comparative genomic hybridization in 10 samples of non-small cell lung cancer. Trisomies were discovered for chromosomes 1, 13, 18 and 20. Chromosome arms 5p, 7p, 11q, 20q and Xq were affected by genetic gains, and 1p, 5q, 10q and 15q, by genetic losses. Microstructural (<5 Mbp) genomic aberrations were revealed: gains in regions 7p (containing the epidermal growth factor receptor gene) and 12p (containing KRAS) and losses in 3p26 and 4q34. Based on high amplitude of alterations and small overlapping regions, new potential oncogenes may be suggested: NBPF4 (1p13.3); ETV1, AGR3 and TSPAN13 (7p21.3-7p21.1); SOX5 and FGFR1OP2 (12p12.1-12p11.22); GPC6 (13q32.1). Significant genetic losses were assumed to contain potential tumour-suppressor genes: DPYD (1p21.3); CLDN22, CLDN24, ING2, CASP3, SORBS2 (4q34.2-q35.1); DEFB (8p23.1). Our results complement the picture of genomic characterization of non-small cell lung cancer. PMID:26019623

  8. Whole genome protein microarrays for serum profiling of immunodominant antigens of Bacillus anthracis.

    PubMed

    Kempsell, Karen E; Kidd, Stephen P; Lewandowski, Kuiama; Elmore, Michael J; Charlton, Sue; Yeates, Annemarie; Cuthbertson, Hannah; Hallis, Bassam; Altmann, Daniel M; Rogers, Mitch; Wattiau, Pierre; Ingram, Rebecca J; Brooks, Tim; Vipond, Richard

    2015-01-01

    A commercial Bacillus anthracis (Anthrax) whole genome protein microarray has been used to identify immunogenic Anthrax proteins (IAP) using sera from groups of donors with (a) confirmed B. anthracis naturally acquired cutaneous infection, (b) confirmed B. anthracis intravenous drug use-acquired infection, (c) occupational exposure in a wool-sorters factory, (d) humans and rabbits vaccinated with the UK Anthrax protein vaccine and compared to naïve unexposed controls. Anti-IAP responses were observed for both IgG and IgA in the challenged groups; however the anti-IAP IgG response was more evident in the vaccinated group and the anti-IAP IgA response more evident in the B. anthracis-infected groups. Infected individuals appeared somewhat suppressed for their general IgG response, compared with other challenged groups. Immunogenic protein antigens were identified in all groups, some of which were shared between groups whilst others were specific for individual groups. The toxin proteins were immunodominant in all vaccinated, infected or other challenged groups. However, a number of other chromosomally-located and plasmid encoded open reading frame proteins were also recognized by infected or exposed groups in comparison to controls. Some of these antigens e.g., BA4182 are not recognized by vaccinated individuals, suggesting that there are proteins more specifically expressed by live Anthrax spores in vivo that are not currently found in the UK licensed Anthrax Vaccine (AVP). These may perhaps be preferentially expressed during infection and represent expression of alternative pathways in the B. anthracis "infectome." These may make highly attractive candidates for diagnostic and vaccine biomarker development as they may be more specifically associated with the infectious phase of the pathogen. A number of B. anthracis small hypothetical protein targets have been synthesized, tested in mouse immunogenicity studies and validated in parallel using human sera from the

  9. Whole genome protein microarrays for serum profiling of immunodominant antigens of Bacillus anthracis

    PubMed Central

    Kempsell, Karen E.; Kidd, Stephen P.; Lewandowski, Kuiama; Elmore, Michael J.; Charlton, Sue; Yeates, Annemarie; Cuthbertson, Hannah; Hallis, Bassam; Altmann, Daniel M.; Rogers, Mitch; Wattiau, Pierre; Ingram, Rebecca J.; Brooks, Tim; Vipond, Richard

    2015-01-01

    A commercial Bacillus anthracis (Anthrax) whole genome protein microarray has been used to identify immunogenic Anthrax proteins (IAP) using sera from groups of donors with (a) confirmed B. anthracis naturally acquired cutaneous infection, (b) confirmed B. anthracis intravenous drug use-acquired infection, (c) occupational exposure in a wool-sorters factory, (d) humans and rabbits vaccinated with the UK Anthrax protein vaccine and compared to naïve unexposed controls. Anti-IAP responses were observed for both IgG and IgA in the challenged groups; however the anti-IAP IgG response was more evident in the vaccinated group and the anti-IAP IgA response more evident in the B. anthracis-infected groups. Infected individuals appeared somewhat suppressed for their general IgG response, compared with other challenged groups. Immunogenic protein antigens were identified in all groups, some of which were shared between groups whilst others were specific for individual groups. The toxin proteins were immunodominant in all vaccinated, infected or other challenged groups. However, a number of other chromosomally-located and plasmid encoded open reading frame proteins were also recognized by infected or exposed groups in comparison to controls. Some of these antigens e.g., BA4182 are not recognized by vaccinated individuals, suggesting that there are proteins more specifically expressed by live Anthrax spores in vivo that are not currently found in the UK licensed Anthrax Vaccine (AVP). These may perhaps be preferentially expressed during infection and represent expression of alternative pathways in the B. anthracis “infectome.” These may make highly attractive candidates for diagnostic and vaccine biomarker development as they may be more specifically associated with the infectious phase of the pathogen. A number of B. anthracis small hypothetical protein targets have been synthesized, tested in mouse immunogenicity studies and validated in parallel using human sera from

  10. Expression profiling of five different xenobiotics using a Caenorhabditis elegans whole genome microarray.

    PubMed

    Reichert, Kerstin; Menzel, Ralph

    2005-10-01

    The soil nematode Caenorhabditis elegans is frequently used in ecotoxicological studies due to its wide distribution in terrestrial habitats, its easy handling in the laboratory, and its sensitivity to different kinds of stress. Since its genome has been completely sequenced, more and more studies are investigating the functional relation of gene expression and phenotypic response. For these reasons C. elegans seems to be an attractive animal for the development of a new, genome based, ecotoxicological test system. In recent years, the DNA array technique has been established as a powerful tool to obtain distinct gene expression patterns in response to different experimental conditions. Using a C. elegans whole genome DNA microarray in this study, the effects of five different xenobiotics on the gene expression of the nematode were investigated. The exposure time for the following five applied compounds beta-NF (5 mg/l), Fla (0.5 mg/l), atrazine (25 mg/l), clofibrate (10 mg/l) and DES (0.5 mg/l) was 48+/-5 h. The analysis of the data showed a clear induction of 203 genes belonging to different families like the cytochromes P450, UDP-glucuronosyltransferases (UDPGT), glutathione S-transferases (GST), carboxylesterases, collagens, C-type lectins and others. Under the applied conditions, fluoranthene was able to induce most of the inducible genes, followed by clofibrate, atrazine, beta-naphthoflavone and diethylstilbestrol. A decreased expression could be shown for 153 genes with atrazine having the strongest effect followed by fluoranthene, diethylstilbestrol, beta-naphthoflavone and clofibrate. For upregulated genes a change ranging from approximately 2.1- to 42.3-fold and for downregulated genes from approximately 2.1- to 6.6-fold of gene expression could be effected through the applied xenobiotics. The results confirm the applicability of the gene expression for the development of an ecotoxicological test system. Compared to classical tests the main

  11. Detecting Staphylococcus aureus Virulence and Resistance Genes: a Comparison of Whole-Genome Sequencing and DNA Microarray Technology.

    PubMed

    Strauß, Lena; Ruffing, Ulla; Abdulla, Salim; Alabi, Abraham; Akulenko, Ruslan; Garrine, Marcelino; Germann, Anja; Grobusch, Martin Peter; Helms, Volkhard; Herrmann, Mathias; Kazimoto, Theckla; Kern, Winfried; Mandomando, Inácio; Peters, Georg; Schaumburg, Frieder; von Müller, Lutz; Mellmann, Alexander

    2016-04-01

    Staphylococcus aureus is a major bacterial pathogen causing a variety of diseases ranging from wound infections to severe bacteremia or intoxications. Besides host factors, the course and severity of disease is also widely dependent on the genotype of the bacterium. Whole-genome sequencing (WGS), followed by bioinformatic sequence analysis, is currently the most extensive genotyping method available. To identify clinically relevant staphylococcal virulence and resistance genes in WGS data, we developed an in silico typing scheme for the software SeqSphere+ (Ridom GmbH, Münster, Germany). The implemented target genes (n = 182) correspond to those queried by the Identibac S. aureus Genotyping DNA microarray (Alere Technologies, Jena, Germany). The in silico scheme was evaluated by comparing the typing results of microarray and of WGS for 154 human S. aureus isolates. A total of 96.8% (n = 27,119) of all typing results were equally identified with microarray and WGS (40.6% present and 56.2% absent). Discrepancies (3.2% in total) were caused by WGS errors (1.7%), microarray hybridization failures (1.3%), wrong prediction of ambiguous microarray results (0.1%), or unknown causes (0.1%). Superior to the microarray, WGS enabled the distinction of allelic variants, which may be essential for the prediction of bacterial virulence and resistance phenotypes. Multilocus sequence typing clonal complexes and staphylococcal cassette chromosome mec element types inferred from microarray hybridization patterns were equally determined by WGS. In conclusion, WGS may substitute array-based methods due to its universal methodology, open and expandable nature, and rapid parallel analysis capacity for different characteristics in once-generated sequences. PMID:26818676
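
    The concordance figure in the abstract above can be checked with a little arithmetic. The sketch below assumes that every one of the 154 isolates was typed for every one of the 182 targets; that assumption and the variable names are illustrative and are not taken from the paper.

```python
# Figures quoted in the abstract above.
isolates = 154          # human S. aureus isolates compared
targets = 182           # target genes in the typing scheme
concordant = 27_119     # typing results identical for microarray and WGS

# Assumption: one typing result per isolate/target pair.
total = isolates * targets
share = 100 * concordant / total

print(f"{total} typing results, {share:.1f}% concordant")
```

    Under that assumption the total is 28,028 typing results, and the concordant share rounds to the 96.8% reported in the abstract.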

  12. Differential Gene Expression Analysis of Placentas with Increased Vascular Resistance and Pre-Eclampsia Using Whole-Genome Microarrays

    PubMed Central

    Centlow, M.; Wingren, C.; Borrebaeck, C.; Brownstein, M. J.; Hansson, S. R.

    2011-01-01

    Pre-eclampsia is a pregnancy complication characterized by hypertension and proteinuria. There are several factors associated with an increased risk of developing pre-eclampsia, one of which is increased uterine artery resistance, referred to as “notching”. However, some women do not progress into pre-eclampsia whereas others may have a higher risk of doing so. The placenta, central in pre-eclampsia pathology, may express genes associated with either protection or progression into pre-eclampsia. In order to search for genes associated with protection or progression, whole-genome profiling was performed. Placental tissue from 15 controls, 10 pre-eclamptic, 5 pre-eclampsia with notching, and 5 with notching only were analyzed using microarray and antibody microarrays to study some of the same gene product and functionally related ones. The microarray showed 148 genes to be significantly altered between the four groups. In the preeclamptic group compared to notch only, there was increased expression of genes related to chemotaxis and the NF-kappa B pathway and decreased expression of genes related to antigen processing and presentation, such as human leukocyte antigen B. Our results indicate that progression of pre-eclampsia from notching may involve the development of inflammation. Increased expression of antigen-presenting genes, as seen in the notch-only placenta, may prevent this inflammatory response and, thereby, protect the patient from developing pre-eclampsia. PMID:21490790

  13. Construction and evaluation of a whole genome microarray of Chlamydomonas reinhardtii

    PubMed Central

    2011-01-01

    Background: Chlamydomonas reinhardtii is widely accepted as a model organism regarding photosynthesis, circadian rhythm, cell mobility, phototaxis, and biotechnology. The complete annotation of the genome allows transcriptomic studies, however a new microarray platform was needed. Based on the completed annotation of Chlamydomonas reinhardtii a new microarray on an Agilent platform was designed using an extended JGI 3.1 genome data set which included 15000 transcript models. Results: In total 44000 probes were determined (3 independent probes per transcript model) covering 93% of the transcriptome. Alignment studies with the recently published AUGUSTUS 10.2 annotation confirmed 11000 transcript models resulting in a very good coverage of 70% of the transcriptome (17000). Following the estimation of 10000 predicted genes in Chlamydomonas reinhardtii our new microarray, nevertheless, covers the expected genome by 90-95%. Conclusions: To demonstrate the capabilities of the new microarray, we analyzed transcript levels for cultures grown under nitrogen as well as sulfate limitation, and compared the results with recently published microarray and RNA-seq data. We could thereby confirm previous results derived from data on nutrient-starvation induced gene expression of a group of genes related to protein transport and adaptation of the metabolism as well as genes related to efficient light harvesting, light energy distribution and photosynthetic electron transport. PMID:22118351

  14. Comparative genomic analysis of Acidithiobacillus ferrooxidans strains using the A. ferrooxidans ATCC 23270 whole-genome oligonucleotide microarray.

    PubMed

    Luo, Hailang; Shen, Li; Yin, Huaqun; Li, Qian; Chen, Qijiong; Luo, Yanjie; Liao, Liqin; Qiu, Guanzhou; Liu, Xueduan

    2009-05-01

    Acidithiobacillus ferrooxidans is an important microorganism used in biomining operations for metal recovery. Whole-genomic diversity analysis based on the oligonucleotide microarray was used to analyze the gene content of 12 strains of A. ferrooxidans purified from various mining areas in China. Among the 3100 open reading frames (ORFs) on the slides, 1235 ORFs were absent in at least 1 strain of bacteria and 1385 ORFs were conserved in all strains. The hybridization results showed that these strains were highly diverse from a genomic perspective. The hybridization results of 4 major functional gene categories, namely electron transport, carbon metabolism, extracellular polysaccharides, and detoxification, were analyzed. Based on the hybridization signals obtained, a phylogenetic tree was built to analyze the evolution of the 12 tested strains, which indicated that the geographic distribution was the main factor influencing the strain diversity of these strains. Based on the hybridization signals of genes associated with bioleaching, another phylogenetic tree showed an evolutionary relationship from which the co-relation between the clustering of specific genes and geochemistry could be observed. The results revealed that the main factor was geochemistry, among which the following 6 factors were the most important: pH, Mg, Cu, S, Fe, and Al. PMID:19483787

  15. Epigenetic mapping and functional analysis in a breast cancer metastasis model using whole-genome promoter tiling microarrays

    PubMed Central

    Rodenhiser, David I; Andrews, Joseph; Kennette, Wendy; Sadikovic, Bekim; Mendlowitz, Ariel; Tuck, Alan B; Chambers, Ann F

    2008-01-01

    Introduction: Breast cancer metastasis is a complex, multi-step biological process. Genetic mutations along with epigenetic alterations in the form of DNA methylation patterns and histone modifications contribute to metastasis-related gene expression changes and genomic instability. So far, these epigenetic contributions to breast cancer metastasis have not been well characterized, and there is only a limited understanding of the functional mechanisms affected by such epigenetic alterations. Furthermore, no genome-wide assessments have been undertaken to identify altered DNA methylation patterns in the context of metastasis and their effects on specific functional pathways or gene networks. Methods: We have used a human gene promoter tiling microarray platform to analyze a cell line model of metastasis to lymph nodes composed of a poorly metastatic MDA-MB-468GFP human breast adenocarcinoma cell line and its highly metastatic variant (468LN). Gene networks and pathways associated with metastasis were identified, and target genes associated with epithelial–mesenchymal transition were validated with respect to DNA methylation effects on gene expression. Results: We integrated data from the tiling microarrays with targets identified by Ingenuity Pathways Analysis software and observed epigenetic variations in genes implicated in epithelial–mesenchymal transition and with tumor cell migration. We identified widespread genomic hypermethylation and hypomethylation events in these cells and we confirmed functional associations between methylation status and expression of the CDH1, CST6, EGFR, SNAI2 and ZEB2 genes by quantitative real-time PCR. Our data also suggest that the complex genomic reorganization present in cancer cells may be superimposed over promoter-specific methylation events that are responsible for gene-specific expression changes. Conclusion: This is the first whole-genome approach to identify genome-wide and gene-specific epigenetic alterations, and the

  16. Final Report Construction of Whole Genome Microarrays, and Expression Analysis of Desulfovibrio vulgaris cells in Metal-Reducing Conditions

    SciTech Connect

    M.W. Fields; J.D. Wall; J. Keasling; J. Zhou

    2008-05-15

    We continue to utilize the oligonucleotide microarrays that were constructed through funding with this project to characterize growth responses of Desulfovibrio vulgaris relevant to metal-reducing conditions. To effectively immobilize heavy metals and radionuclides via sulfate-reduction, it is important to understand the cellular responses to adverse factors observed at contaminated subsurface environments (e.g., nutrients, pH, contaminants, growth requirements and products). One of the major goals of the project is to construct whole-genome microarrays for Desulfovibrio vulgaris. First, in order to experimentally establish the criteria for designing gene-specific oligonucleotide probes, an oligonucleotide array was constructed that contained perfect match (PM) and mismatch (MM) probes (50mers and 70mers) based upon 4 genes. The effects of probe-target identity, continuous stretch, mismatch position, and hybridization free energy on specificity were examined. Little hybridization was observed at a probe-target identity of <85% for both 50mer and 70mer probes. 33 to 48% of the PM signal intensities were detected at a probe-target identity of 94% for 50mer oligonucleotides, and 43 to 55% for 70mer probes at a probe-target identity of 96%. When the effects of sequence identity and continuous stretch were considered independently, a stretch probe (>15 bases) contributed an additional 9% of the PM signal intensity compared to a non-stretch probe (<15 bases) at the same identity level. Cross-hybridization increased as the length of continuous stretch increased. A 35-base stretch for 50mer probes or a 50-base stretch for 70mer probes had approximately 55% of the PM signal. Mismatches should be as close to the middle position of an oligonucleotide probe as possible to minimize cross-hybridization. Little cross-hybridization was observed for probes with a minimal binding free energy greater than -30 kcal/mol for 50mer probes or -40 kcal/mol for 70mer probes. Based on the

  17. Analysis of Campylobacter jejuni whole-genome DNA microarrays: Significance of prophage and hypervariable regions for discriminating isolates

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Campylobacter is a leading cause of food borne illness in humans and improving our understanding of the epidemiology of this organism is essential. The objective of this study was to identify the genes that were most significant for discriminating isolates of C. jejuni by analyzing whole genome DNA ...

  18. Development and Assessment of Whole-Genome Oligonucleotide Microarrays to Analyze an Anaerobic Microbial Community and its Responses to Oxidative Stress

    SciTech Connect

    Scholten, Johannes C.; Culley, David E.; Nie, Lei; Munn, Kyle J.; Chow, Lely; Brockman, Fred J.; Zhang, Weiwen

    2007-06-29

    The application of DNA microarray technology to investigate multiple-species microbial community presents great challenges. In this study, we reported the design and quality assessment of four whole genome oligonucleotide microarrays for two syntroph bacteria, Desulfovibrio vulgaris and Syntrophobacter fumaroxidans, and two archaeal methanogens, Methanosarcina barkeri and Methanospirillum hungatei, and their application to analyze global gene expression of this four-species microbial community in response to oxidative stress. In order to minimize the possible cross-hybridization, cross-genome comparison was performed to assure all probes unique to each genome so that the microarrays could provide species-level resolution. Microarray quality was validated by the good reproducibility of experimental measurements of multiple biological and analytical replicates. Microarray analysis showed that S. fumaroxidans and M. hungatei responded to the stress with up-regulation of several genes known to be involved in ROS detoxification, such as catalase and rubrerythrin in S. fumaroxidans and thioredoxin and heat shock protein Hsp20 in M. hungatei. Consistent with previous study in pure culture, the microarray analysis showed that genes involved in methane production and energy metabolism were down-regulated by oxidative stress in M. barkeri. However, D. vulgaris seemed less sensitive to the oxidative stress when grown in a community, with almost no gene up-regulated. The study demonstrated the successful application of microarray technology to multiple-species microbial community, and our preliminary results indicated that the approach can provide novel insights on the metabolic and regulatory networks within microbial communities.

  19. EFFECTS OF TEMPERATURE ON GENE EXPRESSION PATTERNS IN LEPTOSPIRA INTERROGANS SEROVAR LAI AS ASSESSED BY WHOLE-GENOME MICROARRAYS

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The availability of genome sequences for two serovars of Leptospira interrogans, Lai and Copenhageni, has opened up opportunities to examine global transcription profiles using microarray technology. Temperature is a key environmental factor, which is known to affect leptospiral protein expression....

  20. Whole Genome Sequencing

    MedlinePlus

    What is whole genome sequencing? Whole genome sequencing is the mapping out ...

  1. Ropinirole alters gene expression profiles in SH-SY5Y cells: a whole genome microarray study

    PubMed Central

    Zhu, M.Z.; Le, W.D.; Jin, G.

    2016-01-01

    Ropinirole (ROP) is a dopamine agonist that has been used as therapy for Parkinson's disease. In the present study, we aimed to detect whether gene expression was modulated by ROP in SH-SY5Y cells. SH-SY5Y cell lines were treated with 10 µM ROP for 2 h, after which total RNA was extracted for whole genome analysis. Gene expression profiling revealed that 113 genes were differentially expressed after ROP treatment compared with control cells. Further pathway analysis revealed modulation of the phosphatidylinositol 3-kinase (PI3K) signaling pathway, with prominent upregulation of PIK3C2B. Moreover, batches of regulated genes, including PIK3C2B, were found to be located on chromosome 1. These findings were validated by quantitative RT-PCR and Western blot analysis. Our study, therefore, revealed that ROP altered gene expression in SH-SY5Y cells, and future investigation of PIK3C2B and other loci on chromosome 1 may provide long-term implications for identifying novel target genes of Parkinson's disease. PMID:26785691

  2. Shared clonal cytogenetic abnormalities in aberrant mast cells and leukemic myeloid blasts detected by single nucleotide polymorphism microarray-based whole-genome scanning.

    PubMed

    Frederiksen, John K; Shao, Lina; Bixby, Dale L; Ross, Charles W

    2016-04-01

    Systemic mastocytosis (SM) is characterized by a clonal proliferation of aberrant mast cells within extracutaneous sites. In a subset of SM cases, a second associated hematologic non-mast cell disease (AHNMD) is also present, usually of myeloid origin. Polymerase chain reaction and targeted fluorescence in situ hybridization studies have provided evidence that, in at least some cases, the aberrant mast cells are related clonally to the neoplastic cells of the AHNMD. In this work, a single nucleotide polymorphism microarray (SNP-A) was used to characterize the cytogenetics of the aberrant mast cells from a patient with acute myeloid leukemia and concomitant mast cell leukemia associated with a KIT D816A mutation. The results demonstrate the presence of shared cytogenetic abnormalities between the mast cells and myeloid blasts, as well as additional abnormalities within mast cells (copy-neutral loss of heterozygosity) not detectable by routine karyotypic analysis. To our knowledge, this work represents the first application of SNP-A whole-genome scanning to the detection of shared cytogenetic abnormalities between the two components of a case of SM-AHNMD. The findings provide additional evidence of a frequent clonal link between aberrant mast cells and cells of myeloid AHNMDs, and also highlight the importance of direct sequencing for identifying uncommon activating KIT mutations. PMID:26865278

  3. Case of 7p22.1 Microduplication Detected by Whole Genome Microarray (REVEAL) in Workup of Child Diagnosed with Autism

    PubMed Central

    Goitia, Veronica; Oquendo, Marcial; Stratton, Robert

    2015-01-01

    Introduction. More than 60 cases of 7p22 duplications and deletions have been reported with over 16 of them occurring without concomitant chromosomal abnormalities. Patient and Methods. We report a 29-month-old male diagnosed with autism. Whole genome chromosome SNP microarray (REVEAL) demonstrated a 1.3 Mb interstitial duplication of 7p22.1 ->p22.1 arr 7p22.1 (5,436,367–6,762,394), the second smallest interstitial 7p duplication reported to date. This interval included 14 OMIM annotated genes (FBXL18, ACTB, FSCN1, RNF216, OCM, EIF2AK1, AIMP2, PMS2, CYTH3, RAC1, DAGLB, KDELR2, GRID2IP, and ZNF12). Results. Our patient presented features similar to previously reported cases with 7p22 duplication, including brachycephaly, prominent ears, cryptorchidism, speech delay, poor eye contact, and outbursts of aggressive behavior with autism-like features. Among the genes located in the duplicated segment, the ACTB gene has been proposed as a candidate gene for the alteration of craniofacial development. Overexpression of RNF216L has been linked to autism. FSCN1 may play a role in neurodevelopmental disease. Conclusion. Characterization of a possible 7p22.1 Duplication Syndrome has yet to be made. Recognition of the clinical spectrum in patients with a smaller duplication of 7p should prove valuable for determining the minimal critical region, helping delineate a better prediction of outcome and genetic counseling. PMID:25893121

  4. A functional genomics tool for the Pacific bluefin tuna: Development of a 44K oligonucleotide microarray from whole-genome sequencing data for global transcriptome analysis.

    PubMed

    Yasuike, Motoshige; Fujiwara, Atushi; Nakamura, Yoji; Iwasaki, Yuki; Nishiki, Issei; Sugaya, Takuma; Shimizu, Akio; Sano, Motohiko; Kobayashi, Takanori; Ototake, Mitsuru

    2016-02-01

    Bluefin tunas are one of the most important fishery resources worldwide. Because of high market values, bluefin tuna farming has been rapidly growing during recent years. At present, the most common form of the tuna farming is based on the stocking of wild-caught fish. Therefore, concerns have been raised about the negative impact of the tuna farming on wild stocks. Recently, the Pacific bluefin tuna (PBT), Thunnus orientalis, has succeeded in completing the reproduction cycle under aquaculture conditions, but production bottlenecks remain to be solved because of very little biological information on bluefin tunas. Functional genomics approaches promise to rapidly increase our knowledge on biological processes in the bluefin tuna. Here, we describe the development of the first 44K PBT oligonucleotide microarray (oligo-array), based on whole-genome shotgun (WGS) sequencing and large-scale expressed sequence tags (ESTs) data. In addition, we also introduce an initial 44K PBT oligo-array experiment using in vitro grown peripheral blood leukocytes (PBLs) stimulated with immunostimulants such as lipopolysaccharide (LPS: a cell wall component of Gram-negative bacteria) or polyinosinic:polycytidylic acid (poly I:C: a synthetic mimic of viral infection). This pilot 44K PBT oligo-array analysis successfully addressed distinct immune processes between LPS- and poly I:C- stimulated PBLs. Thus, we expect that this oligo-array will provide an excellent opportunity to analyze global gene expression profiles for a better understanding of diseases and stress, as well as for reproduction, development and influence of nutrition on tuna aquaculture production. PMID:26477480

  5. Whole Genome Selection

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Whole genome selection (WGS) is an approach to using DNA markers that are distributed throughout the entire genome. Genes affecting most economically-important traits are distributed throughout the genome and there are relatively few that have large effects with many more genes with progressively sm...

  6. A Whole-Genome Microarray Study of Arabidopsis thaliana Semisolid Callus Cultures Exposed to Microgravity and Nonmicrogravity Related Spaceflight Conditions for 5 Days on Board of Shenzhou 8

    PubMed Central

    Neef, Maren; Ecke, Margret; Hampp, Rüdiger

    2015-01-01

    The Simbox mission was the first joint space project between Germany and China in November 2011. Eleven-day-old Arabidopsis thaliana wild type semisolid callus cultures were integrated into fully automated plant cultivation containers and exposed to spaceflight conditions within the Simbox hardware on board of the spacecraft Shenzhou 8. The related ground experiment was conducted under similar conditions. The use of an in-flight centrifuge provided a 1 g gravitational field in space. The cells were metabolically quenched after 5 days via RNAlater injection. The impact on the Arabidopsis transcriptome was investigated by means of whole-genome gene expression analysis. The results show a major impact of nonmicrogravity related spaceflight conditions. Genes that were significantly altered in transcript abundance are mainly involved in protein phosphorylation and MAPK cascade-related signaling processes, as well as in the cellular defense and stress responses. In contrast to short-term effects of microgravity (seconds, minutes), this mission identified only minor changes after 5 days of microgravity. These concerned genes coding for proteins involved in the plastid-associated translation machinery, mitochondrial electron transport, and energy production. PMID:25654111

  7. A whole-genome microarray study of Arabidopsis thaliana semisolid callus cultures exposed to microgravity and nonmicrogravity related spaceflight conditions for 5 days on board of Shenzhou 8.

    PubMed

    Fengler, Svenja; Spirer, Ina; Neef, Maren; Ecke, Margret; Nieselt, Kay; Hampp, Rüdiger

    2015-01-01

    The Simbox mission was the first joint space project between Germany and China in November 2011. Eleven-day-old Arabidopsis thaliana wild type semisolid callus cultures were integrated into fully automated plant cultivation containers and exposed to spaceflight conditions within the Simbox hardware on board of the spacecraft Shenzhou 8. The related ground experiment was conducted under similar conditions. The use of an in-flight centrifuge provided a 1 g gravitational field in space. The cells were metabolically quenched after 5 days via RNAlater injection. The impact on the Arabidopsis transcriptome was investigated by means of whole-genome gene expression analysis. The results show a major impact of nonmicrogravity related spaceflight conditions. Genes that were significantly altered in transcript abundance are mainly involved in protein phosphorylation and MAPK cascade-related signaling processes, as well as in the cellular defense and stress responses. In contrast to short-term effects of microgravity (seconds, minutes), this mission identified only minor changes after 5 days of microgravity. These concerned genes coding for proteins involved in the plastid-associated translation machinery, mitochondrial electron transport, and energy production. PMID:25654111

  8. Phylogenetic Analysis of Shewanella Strains by DNA Relatedness Derived from Whole Genome Microarray DNA-DNA Hybridization and Comparison with Other Methods

    SciTech Connect

    Wu, Liyou; Yi, T. Y.; Van Nostrand, Joy; Zhou, Jizhong

    2010-05-17

    Phylogenetic analyses were done for the Shewanella strains isolated from the Baltic Sea (38 strains), the US DOE Hanford Uranium bioremediation site [Hanford Reach of the Columbia River (HRCR), 11 strains], Pacific Ocean and Hawaiian sediments (8 strains), and strains from other resources (16 strains), with three outgroup strains, Rhodopseudomonas palustris, Clostridium cellulolyticum, and Thermoanaerobacter ethanolicus X514, using DNA relatedness derived from WCGA-based DNA-DNA hybridizations, sequence similarities of the 16S rRNA gene and gyrB gene, and sequence similarities of 6 loci of the Shewanella genome selected from a shared gene list of the Shewanella strains with whole genomes sequenced, based on their average nucleotide identity (ANI). The phylogenetic trees based on 16S rRNA and gyrB gene sequences, and on DNA relatedness derived from WCGA hybridizations of the tested Shewanella strains, share exactly the same sub-clusters with very few exceptions, in which the strains were basically grouped by species. However, the phylogenetic analysis based on DNA relatedness derived from WCGA hybridizations dramatically increased the differentiation resolution at the species and strain level within the Shewanella genus. When the tree based on DNA relatedness derived from WCGA hybridizations was compared to the tree based on the combined sequences of the selected functional genes (6 loci), we found that the resolutions of both methods are similar, but the clustering of the tree based on DNA relatedness derived from WCGA hybridizations was clearer. These results indicate that WCGA-based DNA-DNA hybridization is an ideal alternative to conventional DNA-DNA hybridization methods and that it is superior to phylogenetic methods based on sequence similarities of single genes. Detailed analysis is being performed for the re-classification of the strains examined.

  9. A Whole-Genome Microarray Study of Arabidopsis thaliana Cell Cultures Exposed to Real and Simulated Partial-G Forces: A Comparison of Parabolic Flight and Clinostat Data

    NASA Astrophysics Data System (ADS)

    Fengler, S.; Spirer, I.; Neef, M.; Ecke, M.; Hauslage, J.; Hampp, R.

    2015-09-01

    Cell cultures of the plant model organism Arabidopsis thaliana were exposed to partial-g forces during parabolic flight and clinostat experiments (0.38 g, 0.16 g and 0.5 g). To investigate gravity-dependent alterations in gene expression, samples were metabolically quenched and used for microarray analysis. An attempt to identify the potential threshold acceleration showed that the smaller the experienced g-force, the greater was the susceptibility of the cell cultures. Compared to short-term µg during a regular parabolic flight, the number of differentially expressed genes under partial-g was lower. In addition, the effect on the alteration of amounts of transcripts decreased during partial-g parabolic flight due to the sequence of the different parabolas (0.38 g, 0.16 g and µg). A time-dependent analysis under simulated 0.5 g indicates that adaptation occurs within minutes. Differentially expressed genes (at least 2-fold altered in expression) under real flight conditions were to some extent identical with those affected by clinorotation. The highest number of identical genes was detected within seconds of exposure to 0.38 g.

  10. Multi-Platform Avionics Simulator

    NASA Technical Reports Server (NTRS)

    Clark, Micah; Steinke, Robert; McMahon, Elihu

    2006-01-01

    Multi-Platform Avionics Simulator (MPAvSim) is a software library for development of simulations of avionic hardware. MPAvSim facilitates simulation of interactions between flight software and such avionic peripheral equipment as telecommunication devices, thrusters, pyrotechnic devices, motor controllers, and scientific instruments. MPAvSim focuses on the behavior of avionics as seen by flight software, rather than on performing high-fidelity simulations of dynamics. However, MPAvSim is easily integrable with other programs that do perform such simulations. MPAvSim makes it possible to do real-time partial hardware-in-the-loop simulations. An MPAvSim simulation consists of execution chains (see figure) represented by flow graphs of models, defined here as stateless procedures that do some work. During a simulation, MPAvSim walks the execution chain, running each model in turn. Using MPAvSim, flight software can be run against a spacecraft that is all simulation, all hardware, or part hardware and part simulation. With respect to a specific piece of hardware, either the hardware itself or its simulation can be plugged in without affecting the rest of the system. Thus, flight software can be tested before hardware is available, and as items of hardware become available, they can be substituted for their simulations, with minimal disruption.
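
    As a rough illustration of the execution-chain idea in this record, the Python sketch below walks a chain of stateless models in order. It is not MPAvSim code: the names `run_chain`, `thruster_model`, and `telemetry_model` are hypothetical, the signal names and the factor of 10 are made up, and a plain list stands in for the flow graph.

```python
from typing import Callable, Dict, List

# Hypothetical stand-in for a "model": a stateless procedure that reads the
# shared signal table and returns the values it wants to update.
Signals = Dict[str, float]
Model = Callable[[Signals], Signals]


def thruster_model(signals: Signals) -> Signals:
    """Simulated thruster: thrust proportional to the commanded throttle (made-up gain)."""
    return {"thrust_n": 10.0 * signals.get("throttle_cmd", 0.0)}


def telemetry_model(signals: Signals) -> Signals:
    """Simulated telemetry device: reports the current thrust back to flight software."""
    return {"telemetry_thrust_n": signals.get("thrust_n", 0.0)}


def run_chain(chain: List[Model], signals: Signals) -> Signals:
    """Walk the execution chain, running each model in turn."""
    current = dict(signals)  # copy so the caller's table is not mutated
    for model in chain:
        current.update(model(current))
    return current


if __name__ == "__main__":
    print(run_chain([thruster_model, telemetry_model], {"throttle_cmd": 0.5}))
```

    Because each model in this sketch only sees the signal table, a function that talks to real hardware could in principle replace `thruster_model` without touching the rest of the chain, which mirrors the plug-in substitution the abstract describes.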

  11. Whole genome linkage disequilibrium maps in cattle

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Bovine whole genome linkage disequilibrium maps were constructed for eight breeds of cattle. These data provide fundamental information concerning bovine genome organization which will allow the design of studies to associate genetic variation with economically important traits and also provides bac...

  12. Microarrays

    ERIC Educational Resources Information Center

    Plomin, Robert; Schalkwyk, Leonard C.

    2007-01-01

    Microarrays are revolutionizing genetics by making it possible to genotype hundreds of thousands of DNA markers and to assess the expression (RNA transcripts) of all of the genes in the genome. Microarrays are slides the size of a postage stamp that contain millions of DNA sequences to which single-stranded DNA or RNA can hybridize. This…

  13. Microbial species delineation using whole genome sequences

    SciTech Connect

    Kyrpides, Nikos; Mukherjee, Supratim; Ivanova, Natalia; Mavrommatics, Kostas; Pati, Amrita; Konstantinidis, Konstantinos

    2014-10-20

    Species assignments in prokaryotes use a manual, poly-phasic approach utilizing both phenotypic traits and sequence information of phylogenetic marker genes. With thousands of genomes being sequenced every year, an automated, uniform and scalable approach exploiting the rich genomic information in whole genome sequences is desired, at least for the initial assignment of species to an organism. We have evaluated pairwise genome-wide Average Nucleotide Identity (gANI) values and alignment fractions (AFs) for nearly 13,000 genomes using our fast implementation of the computation, identifying robust and widely applicable hard cut-offs for species assignments based on AF and gANI. Using these cutoffs, we generated stable species-level clusters of organisms, which enabled the identification of several species mis-assignments and facilitated the assignment of species for organisms without species definitions.

  14. Whole genome sequence analysis of Mycobacterium suricattae.

    PubMed

    Dippenaar, Anzaan; Parsons, Sven David Charles; Sampson, Samantha Leigh; van der Merwe, Ruben Gerhard; Drewe, Julian Ashley; Abdallah, Abdallah Musa; Siame, Kabengele Keith; Gey van Pittius, Nicolaas Claudius; van Helden, Paul David; Pain, Arnab; Warren, Robin Mark

    2015-12-01

    Tuberculosis occurs in various mammalian hosts and is caused by a range of different lineages of the Mycobacterium tuberculosis complex (MTBC). A recently described member, Mycobacterium suricattae, causes tuberculosis in meerkats (Suricata suricatta) in Southern Africa and preliminary genetic analysis showed this organism to be closely related to an MTBC pathogen of rock hyraxes (Procavia capensis), the dassie bacillus. Here we make use of whole genome sequencing to describe the evolution of the genome of M. suricattae, including known and novel regions of difference, SNPs and IS6110 insertion sites. We used genome-wide phylogenetic analysis to show that M. suricattae clusters with the chimpanzee bacillus, previously isolated from a chimpanzee (Pan troglodytes) in West Africa. We propose an evolutionary scenario for the Mycobacterium africanum lineage 6 complex, showing the evolutionary relationship of M. africanum and chimpanzee bacillus, and the closely related members M. suricattae, dassie bacillus and Mycobacterium mungi. PMID:26542221

  15. Strategies and tools for whole genome alignments

    SciTech Connect

    Couronne, Olivier; Poliakov, Alexander; Bray, Nicolas; Ishkhanov, Tigran; Ryaboy, Dmitriy; Rubin, Edward; Pachter, Lior; Dubchak, Inna

    2002-11-25

    The availability of the assembled mouse genome makes possible, for the first time, an alignment and comparison of two large vertebrate genomes. We have investigated different strategies of alignment for the subsequent analysis of conservation of genomes that are effective for different quality assemblies. These strategies were applied to the comparison of the working draft of the human genome with the Mouse Genome Sequencing Consortium assembly, as well as other intermediate mouse assemblies. Our methods are fast and the resulting alignments exhibit a high degree of sensitivity, covering more than 90 percent of known coding exons in the human genome. We have obtained such coverage while preserving specificity. With a view towards the end user, we have developed a suite of tools and websites for automatically aligning, and subsequently browsing and working with whole genome comparisons. We describe the use of these tools to identify conserved non-coding regions between the human and mouse genomes, some of which have not been identified by other methods.

  16. Small sample whole-genome amplification

    NASA Astrophysics Data System (ADS)

    Hara, Christine; Nguyen, Christine; Wheeler, Elizabeth; Sorensen, Karen; Arroyo, Erin; Vrankovich, Greg; Christian, Allen

    2005-11-01

    Many challenges arise when trying to amplify and analyze human samples collected in the field due to limitations in sample quantity, and contamination of the starting material. Tests such as DNA fingerprinting and mitochondrial typing require a certain sample size and are carried out in large volume reactions; in cases where insufficient sample is present whole genome amplification (WGA) can be used. WGA allows very small quantities of DNA to be amplified in a way that enables subsequent DNA-based tests to be performed. A limiting step to WGA is sample preparation. To minimize the necessary sample size, we have developed two modifications of WGA: the first allows for an increase in amplified product from small, nanoscale, purified samples with the use of carrier DNA while the second is a single-step method for cleaning and amplifying samples all in one column. Conventional DNA cleanup involves binding the DNA to silica, washing away impurities, and then releasing the DNA for subsequent testing. We have eliminated losses associated with incomplete sample release, thereby decreasing the required amount of starting template for DNA testing. Both techniques address the limitations of sample size by providing ample copies of genomic samples. Carrier DNA, included in our WGA reactions, can be used when amplifying samples with the standard purification method, or can be used in conjunction with our single-step DNA purification technique to potentially further decrease the amount of starting sample necessary for future forensic DNA-based assays.

  17. Microbial species delineation using whole genome sequences

    PubMed Central

    Varghese, Neha J.; Mukherjee, Supratim; Ivanova, Natalia; Konstantinidis, Konstantinos T.; Mavrommatis, Kostas; Kyrpides, Nikos C.; Pati, Amrita

    2015-01-01

    Increased sequencing of microbial genomes has revealed that prevailing prokaryotic species assignments can be inconsistent with whole genome information for a significant number of species. The long-standing need for a systematic and scalable species assignment technique can be met by the genome-wide Average Nucleotide Identity (gANI) metric, which is widely acknowledged as a robust measure of genomic relatedness. In this work, we demonstrate that the combination of gANI and the alignment fraction (AF) between two genomes accurately reflects their genomic relatedness. We introduce an efficient implementation of AF,gANI and discuss its successful application to 86.5M genome pairs between 13,151 prokaryotic genomes assigned to 3032 species. Subsequently, by comparing the genome clusters obtained from complete linkage clustering of these pairs to existing taxonomy, we observed that nearly 18% of all prokaryotic species suffer from anomalies in species definition. Our results can be used to explore central questions such as whether microorganisms form a continuum of genetic diversity or distinct species represented by distinct genetic signatures. We propose that this precise and objective AF,gANI-based species definition: the MiSI (Microbial Species Identifier) method, be used to address previous inconsistencies in species classification and as the primary guide for new taxonomic species assignment, supplemented by the traditional polyphasic approach, as required. PMID:26150420
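
    The clustering step mentioned in this record can be sketched in Python as follows. This is a small illustration of complete-linkage clustering under AF and gANI cutoffs, not the MiSI implementation: the cutoff values, function names, and toy genomes are assumptions, since the abstract does not state the thresholds.

```python
import math
from itertools import combinations
from typing import Dict, FrozenSet, List, Set, Tuple

# Placeholder cutoffs for illustration only; the abstract does not state them.
ANI_CUTOFF = 96.5  # percent identity (hypothetical)
AF_CUTOFF = 0.6    # alignment fraction (hypothetical)

# Maps an unordered genome pair to its (AF, gANI) values.
Metrics = Dict[FrozenSet[str], Tuple[float, float]]


def pair_distance(metrics: Metrics, a: str, b: str) -> float:
    """Distance is 100 - gANI; pairs failing either cutoff are infinitely far apart."""
    af, ani = metrics.get(frozenset((a, b)), (0.0, 0.0))
    if af < AF_CUTOFF or ani < ANI_CUTOFF:
        return math.inf
    return 100.0 - ani


def complete_linkage(genomes: List[str], metrics: Metrics) -> List[Set[str]]:
    """Merge the closest clusters until no two clusters pass the cutoffs for every member pair."""
    clusters: List[Set[str]] = [{g} for g in genomes]
    while True:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            # Complete linkage: cluster distance is the largest pairwise distance.
            d = max(pair_distance(metrics, a, b)
                    for a in clusters[i] for b in clusters[j])
            if d < math.inf and (best is None or d < best[0]):
                best = (d, i, j)
        if best is None:
            return clusters
        _, i, j = best
        clusters[i] |= clusters[j]
        del clusters[j]  # safe because i < j


if __name__ == "__main__":
    toy: Metrics = {
        frozenset(("A", "B")): (0.9, 99.0),
        frozenset(("B", "C")): (0.8, 97.0),
        frozenset(("A", "C")): (0.7, 95.0),  # fails the gANI cutoff
    }
    print(complete_linkage(["A", "B", "C", "D"], toy))
    # All unordered pairs among 13,151 genomes, about 86.5M as in the abstract.
    print(math.comb(13151, 2))
```

    In the toy data, C is close to B but not to A, so complete linkage keeps C out of the A/B cluster; single linkage would have joined all three. The cubic-time loop is only suitable for small inputs and says nothing about how the authors made the computation scale.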

  18. Whole-genome sequencing in outbreak analysis.

    PubMed

    Gilchrist, Carol A; Turner, Stephen D; Riley, Margaret F; Petri, William A; Hewlett, Erik L

    2015-07-01

    In addition to the ever-present concern of medical professionals about epidemics of infectious diseases, the relative ease of access and low cost of obtaining, producing, and disseminating pathogenic organisms or biological toxins mean that bioterrorism activity should also be considered when facing a disease outbreak. Utilization of whole-genome sequencing (WGS) in outbreak analysis facilitates the rapid and accurate identification of virulence factors of the pathogen and can be used to identify the path of disease transmission within a population and provide information on the probable source. Molecular tools such as WGS are being refined and advanced at a rapid pace to provide robust and higher-resolution methods for identifying, comparing, and classifying pathogenic organisms. If these methods of pathogen characterization are properly applied, they will enable an improved public health response whether a disease outbreak was initiated by natural events or by accidental or deliberate human activity. The current application of next-generation sequencing (NGS) technology to microbial WGS and microbial forensics is reviewed. PMID:25876885

  19. Whole-Genome Sequencing in Outbreak Analysis

    PubMed Central

    Turner, Stephen D.; Riley, Margaret F.; Petri, William A.; Hewlett, Erik L.

    2015-01-01

    SUMMARY In addition to the ever-present concern of medical professionals about epidemics of infectious diseases, the relative ease of access and low cost of obtaining, producing, and disseminating pathogenic organisms or biological toxins mean that bioterrorism activity should also be considered when facing a disease outbreak. Utilization of whole-genome sequencing (WGS) in outbreak analysis facilitates the rapid and accurate identification of virulence factors of the pathogen and can be used to identify the path of disease transmission within a population and provide information on the probable source. Molecular tools such as WGS are being refined and advanced at a rapid pace to provide robust and higher-resolution methods for identifying, comparing, and classifying pathogenic organisms. If these methods of pathogen characterization are properly applied, they will enable an improved public health response whether a disease outbreak was initiated by natural events or by accidental or deliberate human activity. The current application of next-generation sequencing (NGS) technology to microbial WGS and microbial forensics is reviewed. PMID:25876885

  20. Small Sample Whole-Genome Amplification

    SciTech Connect

    Hara, C A; Nguyen, C P; Wheeler, E K; Sorensen, K J; Arroyo, E S; Vrankovich, G P; Christian, A T

    2005-09-20

    Many challenges arise when trying to amplify and analyze human samples collected in the field due to limitations in sample quantity, and contamination of the starting material. Tests such as DNA fingerprinting and mitochondrial typing require a certain sample size and are carried out in large volume reactions; in cases where insufficient sample is present whole genome amplification (WGA) can be used. WGA allows very small quantities of DNA to be amplified in a way that enables subsequent DNA-based tests to be performed. A limiting step to WGA is sample preparation. To minimize the necessary sample size, we have developed two modifications of WGA: the first allows for an increase in amplified product from small, nanoscale, purified samples with the use of carrier DNA while the second is a single-step method for cleaning and amplifying samples all in one column. Conventional DNA cleanup involves binding the DNA to silica, washing away impurities, and then releasing the DNA for subsequent testing. We have eliminated losses associated with incomplete sample release, thereby decreasing the required amount of starting template for DNA testing. Both techniques address the limitations of sample size by providing ample copies of genomic samples. Carrier DNA, included in our WGA reactions, can be used when amplifying samples with the standard purification method, or can be used in conjunction with our single-step DNA purification technique to potentially further decrease the amount of starting sample necessary for future forensic DNA-based assays.

  1. Whole genome analysis of a Vietnamese trio.

    PubMed

    Hai, Dang Thanh; Thanh, Nguyen Dai; Trang, Pham Thi Minh; Quang, Le Si; Hang, Phan Thi Thu; Cuong, Dang Cao; Phuc, Hoang Kim; Duc, Nguyen Huu; Dong, Do Duc; Minh, Bui Quang; Son, Pham Bao; Vinh, Le Sy

    2015-03-01

    We here present the first whole genome analysis of an anonymous Kinh Vietnamese (KHV) trio whose genomes were deeply sequenced to 30-fold average coverage. The resulting short reads covered 99.91 percent of the human reference genome (GRCh37d5). We identified 4,719,412 SNPs and 827,385 short indels that satisfied the Mendelian inheritance law. Among them, 109,914 (2.3 percent) SNPs and 59,119 (7.1 percent) short indels were novel. We also detected 30,171 structural variants of which 27,604 (91.5 percent) were large indels. There were 6,681 large indels in the range 0.1-100 kbp occurring in the child genome that were also confirmed in either the father or mother genome. We compared these large indels against the DGV database and found that 1,499 (22.44 percent) were KHV specific. De novo assembly of high-quality unmapped reads yielded 789 contigs with the length greater than or equal to 300 bp. There were 235 contigs from the child genome of which 199 (84.7 percent) were significantly matched with at least one contig from the father or mother genome. Blasting these 199 contigs against other alternative human genomes revealed 4 novel contigs. The novel variants identified from our study demonstrated the necessity of conducting more genome-wide studies not only for Kinh but also for other ethnic groups in Vietnam. PMID:25740146
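
    The trio filter and the percentages quoted in this record can be illustrated with a short Python sketch. It assumes unphased autosomal genotypes and ignores de novo mutations and sex chromosomes; the function names are hypothetical and this is not the authors' pipeline.

```python
from itertools import product
from typing import Tuple

Genotype = Tuple[str, str]  # unphased pair of alleles, e.g. ("A", "G")


def is_mendelian_consistent(child: Genotype, father: Genotype, mother: Genotype) -> bool:
    """True if the child could have received one allele from each parent."""
    target = sorted(child)
    return any(sorted((f, m)) == target for f, m in product(father, mother))


def percent(part: int, whole: int, ndigits: int = 1) -> float:
    """Share of `part` in `whole`, as a percentage rounded to `ndigits` places."""
    return round(100.0 * part / whole, ndigits)


if __name__ == "__main__":
    # Child A/G is possible from A/A x G/G; child G/G is not possible from A/A x A/G.
    print(is_mendelian_consistent(("A", "G"), ("A", "A"), ("G", "G")))
    print(is_mendelian_consistent(("G", "G"), ("A", "A"), ("A", "G")))

    # Recompute the proportions from the counts given in the abstract.
    print(percent(109_914, 4_719_412))  # novel SNPs
    print(percent(59_119, 827_385))     # novel short indels
    print(percent(27_604, 30_171))      # large indels among structural variants
    print(percent(1_499, 6_681, 2))     # KHV-specific large indels
    print(percent(199, 235))            # child contigs matched in a parent
```

    Recomputing the ratios this way reproduces the percentages reported in the abstract (2.3, 7.1, 91.5, 22.44, and 84.7).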

  2. Use of whole genome expression analysis in the toxicity screening of nanoparticles

    SciTech Connect

    Fröhlich, Eleonore; Meindl, Claudia; Wagner, Karin; Leitinger, Gerd; Roblegg, Eva

    2014-10-15

    The use of nanoparticles (NPs) offers exciting new options in technical and medical applications provided they do not cause adverse cellular effects. Cellular effects of NPs depend on particle parameters and exposure conditions. In this study, whole genome expression arrays were employed to identify the influence of particle size, cytotoxicity, protein coating, and surface functionalization of polystyrene particles as model particles and for short carbon nanotubes (CNTs) as particles with potential interest in medical treatment. Another aim of the study was to find out whether screening by microarray would identify other or additional targets than commonly used cell-based assays for NP action. Whole genome expression analysis and assays for cell viability, interleukin secretion, oxidative stress, and apoptosis were employed. Similar to conventional assays, microarray data identified inflammation, oxidative stress, and apoptosis as affected by NP treatment. Application of lower particle doses and presence of protein decreased the total number of regulated genes but did not markedly influence the top regulated genes. Cellular effects of CNTs were small; only carboxyl-functionalized single-walled CNTs caused appreciable regulation of genes. It can be concluded that regulated functions correlated well with results in cell-based assays. Presence of protein mitigated cytotoxicity but did not cause a different pattern of regulated processes. - Highlights: • Regulated functions were screened using whole genome expression assays. • Polystyrene particles regulated more genes than short carbon nanotubes. • Protein coating of polystyrene particles did not change regulation pattern. • Functions regulated by microarray were confirmed by cell-based assay.

  3. Post-Fragmentation Whole Genome Amplification-Based Method

    NASA Technical Reports Server (NTRS)

    Benardini, James; LaDuc, Myron T.; Langmore, John

    2011-01-01

    This innovation is derived from a proprietary amplification scheme that is based upon random fragmentation of the genome into a series of short, overlapping templates. The resulting shorter DNA strands (<400 bp) constitute a library of DNA fragments with defined 3′ and 5′ termini. Specific primers to these termini are then used to isothermally amplify this library into potentially unlimited quantities that can be used immediately for multiple downstream applications including gel electrophoresis, quantitative polymerase chain reaction (QPCR), comparative genomic hybridization microarray, SNP analysis, and sequencing. The standard reaction can be performed with minimal hands-on time, and can produce amplified DNA in as little as three hours. Post-fragmentation whole genome amplification-based technology provides a robust and accurate method of amplifying femtogram levels of starting material into microgram yields with no detectable allele bias. The amplified DNA also facilitates the preservation of samples (spacecraft samples) by amplifying scarce amounts of template DNA into microgram concentrations in just a few hours. Based on further optimization of this technology, this could be a feasible technology to use in sample preservation for potential future sample return missions. The research and technology development described here can be pivotal in dealing with backward/forward biological contamination from planetary missions. Such efforts rely heavily on an increasing understanding of the burden and diversity of microorganisms present on spacecraft surfaces throughout assembly and testing. The development and implementation of these technologies could significantly improve the comprehensiveness and resolving power of spacecraft-associated microbial population censuses, and are important to the continued evolution and advancement of planetary protection capabilities.
Current molecular procedures for assaying spacecraft-associated microbial burden and diversity have

  4. Whole Genome Sequencing: Cracking the Genetic Code for Foodborne Illness

    MedlinePlus

    ... have millions of different genomes, or sequences of genetic code, each as unique as a fingerprint. Get ...

  5. Multiple Whole Genome Alignments Without a Reference Organism

    SciTech Connect

    Dubchak, Inna; Poliakov, Alexander; Kislyuk, Andrey; Brudno, Michael

    2009-01-16

    Multiple sequence alignments have become one of the most commonly used resources in genomics research. Most algorithms for multiple alignment of whole genomes rely either on a reference genome, against which all of the other sequences are laid out, or require a one-to-one mapping between the nucleotides of the genomes, preventing the alignment of recently duplicated regions. Both approaches have drawbacks for whole-genome comparisons. In this paper we present a novel symmetric alignment algorithm. The resulting alignments not only represent all of the genomes equally well, but also include all relevant duplications that occurred since the divergence from the last common ancestor. Our algorithm, implemented as a part of the VISTA Genome Pipeline (VGP), was used to align seven vertebrate and six Drosophila genomes. The resulting whole-genome alignments demonstrate a higher sensitivity and specificity than the pairwise alignments previously available through the VGP and have higher exon alignment accuracy than comparable public whole-genome alignments. Of the multiple alignment methods tested, ours performed the best at aligning genes from multigene families, perhaps the most challenging test for whole-genome alignments. Our whole-genome multiple alignments are available through the VISTA Browser at http://genome.lbl.gov/vista/index.shtml.

  6. Use of whole genome expression analysis in the toxicity screening of nanoparticles

    PubMed Central

    Fröhlich, Eleonore; Meindl, Claudia; Wagner, Karin; Leitinger, Gerd; Roblegg, Eva

    2014-01-01

    The use of nanoparticles (NPs) offers exciting new options in technical and medical applications provided they do not cause adverse cellular effects. Cellular effects of NPs depend on particle parameters and exposure conditions. In this study, whole genome expression arrays were employed to identify the influence of particle size, cytotoxicity, protein coating, and surface functionalization of polystyrene particles as model particles and for short carbon nanotubes (CNTs) as particles with potential interest in medical treatment. Another aim of the study was to find out whether screening by microarray would identify other or additional targets than commonly used cell-based assays for NP action. Whole genome expression analysis and assays for cell viability, interleukin secretion, oxidative stress, and apoptosis were employed. Similar to conventional assays, microarray data identified inflammation, oxidative stress, and apoptosis as affected by NP treatment. Application of lower particle doses and presence of protein decreased the total number of regulated genes but did not markedly influence the top regulated genes. Cellular effects of CNTs were small; only carboxyl-functionalized single-walled CNTs caused appreciable regulation of genes. It can be concluded that regulated functions correlated well with results in cell-based assays. Presence of protein mitigated cytotoxicity but did not cause a different pattern of regulated processes. PMID:25102311

  7. Prospects and pitfalls in whole genome association studies

    PubMed Central

    Lawrence, Robert W; Evans, David M; Cardon, Lon R

    2005-01-01

    Recent large-scale studies of common genetic variation throughout the human genome are making it feasible to conduct whole genome studies of genotype–phenotype associations. Such studies have the potential to uncover novel contributors to common complex traits and thus lead to insights into the aetiology of multifactorial phenotypes. Despite this promise, it is important to recognize that the availability of genetic markers and the ability to assay them at realistic cost does not guarantee success of this approach. There are a number of practical issues that require close attention, some forms of allelic architecture are not readily amenable to the association approach with even the most rigorous design, and doubtless new hurdles will emerge as the studies begin. Here we discuss the promise and current challenges of the whole genome approach, and raise some issues to consider in interpreting the results of the first whole genome studies. PMID:16096108

  8. ISPRS Benchmark for Multi-Platform Photogrammetry

    NASA Astrophysics Data System (ADS)

    Nex, F.; Gerke, M.; Remondino, F.; Przybilla, H.-J.; Bäumker, M.; Zurhorst, A.

    2015-03-01

    Airborne high resolution oblique imagery systems and RPAS/UAVs are very promising technologies that will keep influencing the development of geomatics in the coming years, closing the gap between terrestrial and classical aerial acquisitions. These two platforms are also a promising solution for National Mapping and Cartographic Agencies (NMCA) as they allow deriving complementary mapping information. Although the interest in the registration and integration of aerial and terrestrial data is constantly increasing, only limited work has been performed on this topic. Several investigations still need to be undertaken concerning the ability of algorithms to perform automatic co-registration, accurate point cloud generation and feature extraction from multi-platform image data. One of the biggest obstacles is the non-availability of reliable and free datasets to test and compare new algorithms and procedures. The Scientific Initiative "ISPRS benchmark for multi-platform photogrammetry", run in collaboration with EuroSDR, aims at collecting and sharing state-of-the-art multi-sensor data (oblique airborne, UAV-based and terrestrial images) over an urban area. These datasets are used to assess different algorithms and methodologies for image orientation and dense matching. As ground truth, Terrestrial Laser Scanning (TLS), Aerial Laser Scanning (ALS) as well as topographic networks and GNSS points were acquired to compare 3D coordinates on check points (CPs) and evaluate cross sections and residuals on generated point cloud surfaces. In this paper, the acquired data, the pre-processing steps, the evaluation procedures as well as some preliminary results achieved with commercial software will be presented.

  9. Whole-genome sequences of three symbiotic Endozoicomonas strains.

    PubMed

    Neave, Matthew J; Michell, Craig T; Apprill, Amy; Voolstra, Christian R

    2014-01-01

    Members of the genus Endozoicomonas associate with a wide range of marine organisms. Here, we report on the whole-genome sequencing, assembly, and annotation of three Endozoicomonas type strains. These data will assist in exploring interactions between Endozoicomonas organisms and their hosts, and it will aid in the assembly of genomes from uncultivated Endozoicomonas spp. PMID:25125646

  10. Whole-Genome Sequences of Three Symbiotic Endozoicomonas Bacteria

    PubMed Central

    Neave, Matthew J.; Michell, Craig T.

    2014-01-01

    Members of the genus Endozoicomonas associate with a wide range of marine organisms. Here, we report on the whole-genome sequencing, assembly, and annotation of three Endozoicomonas type strains. These data will assist in exploring interactions between Endozoicomonas organisms and their hosts, and it will aid in the assembly of genomes from uncultivated Endozoicomonas spp. PMID:25125646

  11. Whole-Genome Sequencing of Two Bartonella bacilliformis Strains.

    PubMed

    Guillen, Yolanda; Casadellà, Maria; García-de-la-Guarda, Ruth; Espinoza-Culupú, Abraham; Paredes, Roger; Ruiz, Joaquim; Noguera-Julian, Marc

    2016-01-01

    Bartonella bacilliformis is the causative agent of Carrion's disease, a highly endemic human bartonellosis in Peru. We performed a whole-genome assembly of two B. bacilliformis strains isolated from the blood of infected patients in the acute phase of Carrion's disease from the Cusco and Piura regions in Peru. PMID:27389274

  12. Whole-Genome Sequencing of Two Bartonella bacilliformis Strains

    PubMed Central

    Guillen, Yolanda; Casadellà, Maria; García-de-la-Guarda, Ruth; Espinoza-Culupú, Abraham; Paredes, Roger; Ruiz, Joaquim

    2016-01-01

    Bartonella bacilliformis is the causative agent of Carrion’s disease, a highly endemic human bartonellosis in Peru. We performed a whole-genome assembly of two B. bacilliformis strains isolated from the blood of infected patients in the acute phase of Carrion’s disease from the Cusco and Piura regions in Peru. PMID:27389274

  13. Whole-Genome Sequence of Staphylococcus epidermidis Tü3298

    PubMed Central

    Moran, Josephine C.

    2016-01-01

    Staphylococcus epidermidis Tü3298 is a frequently used laboratory strain, known for its production of epidermin and absence of the icaABCD operon. We report the whole-genome sequence of this strain, a 2.5-Mb genome containing 2,332 genes. PMID:26966218

  14. WHOLE GENOME COMPARISON OF ASPERGILLUS FLAVUS AND A. ORYZAE

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Aspergillus flavus is a plant and animal pathogen that also produces the potent carcinogen aflatoxin. Aspergillus oryzae is a closely related species that has been used for centuries in the food fermentation industry and is generally regarded as safe (GRAS). Whole genome sequences for these two fu...

  15. Whole genome amplification - Review of applications and advances

    SciTech Connect

    Hawkins, Trevor L.; Detter, J.C.; Richardson, Paul

    2001-11-15

    The concept of Whole Genome Amplification has arisen in the past few years as modifications to the polymerase chain reaction (PCR) have been adapted to replicate regions of genomes that are of biological interest. The applications are many: forensics, embryonic disease diagnosis, bioterrorism genome detection, "immortalization" of clinical samples, microbial diversity, and genotyping. The key question is whether DNA can be replicated a genome at a time without bias or nonrandom distribution of the target. Several papers published in the last year and currently in preparation may lead to the conclusion that whole genome amplification may indeed be possible and therefore open up a new avenue to molecular biology.

  16. Whole Genome and Transcriptome Sequencing of a B3 Thymoma

    PubMed Central

    Petrini, Iacopo; Rajan, Arun; Pham, Trung; Voeller, Donna; Davis, Sean; Gao, James; Wang, Yisong; Giaccone, Giuseppe

    2013-01-01

    Molecular pathology of thymomas is poorly understood. Genomic aberrations are frequently identified in tumors but no extensive sequencing has been reported in thymomas. Here we present the first comprehensive view of a B3 thymoma at whole genome and transcriptome levels. A 55-year-old Caucasian female underwent complete resection of a stage IVA B3 thymoma. RNA and DNA were extracted from a snap frozen tumor sample with a fraction of cancer cells over 80%. We performed array comparative genomic hybridization using Agilent platform, transcriptome sequencing using HiSeq 2000 (Illumina) and whole genome sequencing using Complete Genomics Inc platform. Whole genome sequencing determined, in tumor and normal, the sequence of both alleles in more than 95% of the reference genome (NCBI Build 37). Copy number (CN) aberrations were comparable with those previously described for B3 thymomas, with CN gain of chromosome 1q, 5, 7 and X and CN loss of 3p, 6, 11q42.2-qter and q13. One translocation t(11;X) was identified by whole genome sequencing and confirmed by PCR and Sanger sequencing. Ten single nucleotide variations (SNVs) and 2 insertion/deletions (INDELs) were identified; these mutations resulted in non-synonymous amino acid changes or affected splicing sites. The lack of common cancer-associated mutations in this patient suggests that thymomas may evolve through mechanisms distinctive from other tumor types, and supports the rationale for additional high-throughput sequencing screens to better understand the somatic genetic architecture of thymoma. PMID:23577124

  17. Whole genome sequencing of clinical isolates of Giardia lamblia.

    PubMed

    Hanevik, K; Bakken, R; Brattbakk, H R; Saghaug, C S; Langeland, N

    2015-02-01

    Clinical isolates from protozoan parasites such as Giardia lamblia are at present practically impossible to culture. By using simple cyst purification methods, we show that Giardia whole genome sequencing of clinical stool samples is possible. Immunomagnetic separation after sucrose gradient flotation gave superior results compared to sucrose gradient flotation alone. The method enables detailed analysis of a wide range of genes of interest for genotyping, virulence and drug resistance. PMID:25596782

  18. Comparative genomic hybridization with single cells after whole genome amplification

    SciTech Connect

    Haddad, B.R.; Baldini, A.; Hughes, M.R.

    1994-09-01

    Conventional karyotype analysis is the ideal way to diagnose chromosomal imbalances. However, it requires cell culture and chromosome preparation. There are instances where a very small number of cells are available for cytogenetic evaluation and chromosomes cannot be obtained. Comparative genomic hybridization (CGH) is a novel molecular cytogenetic technique that provides information about genetic imbalances affecting the genome. The power of this technique lies in its ability to detect genetic imbalances using total genomic DNA. We have previously demonstrated the feasibility of whole genome amplification from single cells for subsequent analysis of multiple genetic loci by PCR. In the present work, we combine whole genome amplification with CGH to detect chromosomal imbalances from small numbers of cells. Both cytogenetically normal and abnormal cells were individually picked by micromanipulation and subjected to whole genome amplification using random oligonucleotide primers. Amplified test and control DNA were differentially labeled by incorporation of digoxigenin or biotin, mixed together and hybridized to normal male metaphase spreads. Hybridization was detected with two fluorochromes, rhodamine-anti-digoxigenin and FITC-avidin. The ratio of intensities of the two fluorochromes along the target chromosomes was analyzed using locally developed computer imaging software. Using the combination of whole genome amplification and CGH, we were able to detect different chromosomal aneuploidies from 30, 20, and 10 cells. It can also be applied to the analysis of fetal cells sorted from maternal circulation, or to tumor cells obtained from needle biopsies or from different body fluids and effusions. Finally, its successful application to single cells will have a great impact on preimplantation diagnosis.
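
    The intensity-ratio step described in the abstract above can be sketched as follows. This is only an illustration of the idea, not the authors' imaging software: the gain and loss cut-offs (1.25 and 0.75), the function name, and the per-chromosome intensity values are all assumptions chosen for the example.

```python
def classify_ratio(test_intensity, control_intensity, gain=1.25, loss=0.75):
    """Compare test and control fluorescence for one chromosome region.

    Returns the test/control ratio and a label. The default cut-offs are
    illustrative assumptions, not values reported in the abstract.
    """
    ratio = test_intensity / control_intensity
    if ratio >= gain:
        return ratio, "gain"
    if ratio <= loss:
        return ratio, "loss"
    return ratio, "balanced"


# Hypothetical mean intensities (test, control) per chromosome.
intensities = {
    "chr1": (100.0, 100.0),
    "chr21": (150.0, 100.0),
    "chrX": (50.0, 100.0),
}

for name, (test, control) in intensities.items():
    ratio, label = classify_ratio(test, control)
    print(f"{name}: ratio={ratio:.2f} -> {label}")
```

    A ratio near 1 indicates equal representation of test and control DNA, while a sustained deviation along a chromosome suggests a copy-number imbalance.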

  19. Mapping Challenging Mutations by Whole-Genome Sequencing

    PubMed Central

    Smith, Harold E.; Fabritius, Amy S.; Jaramillo-Lambert, Aimee; Golden, Andy

    2016-01-01

    Whole-genome sequencing provides a rapid and powerful method for identifying mutations on a global scale, and has spurred a renewed enthusiasm for classical genetic screens in model organisms. The most commonly characterized category of mutation consists of monogenic, recessive traits, due to their genetic tractability. Therefore, most of the mapping methods for mutation identification by whole-genome sequencing are directed toward alleles that fulfill those criteria (i.e., single-gene, homozygous variants). However, such approaches are not entirely suitable for the characterization of a variety of more challenging mutations, such as dominant and semidominant alleles or multigenic traits. Therefore, we have developed strategies for the identification of those classes of mutations, using polymorphism mapping in Caenorhabditis elegans as our model for validation. We also report an alternative approach for mutation identification from traditional recombinant crosses, and a solution to the technical challenge of sequencing sterile or terminally arrested strains where population size is limiting. The methods described herein extend the applicability of whole-genome sequencing to a broader spectrum of mutations, including classes that are difficult to map by traditional means. PMID:26945029

  20. Whole-genome sequence-based analysis of thyroid function

    PubMed Central

    Taylor, Peter N.; Porcu, Eleonora; Chew, Shelby; Campbell, Purdey J.; Traglia, Michela; Brown, Suzanne J.; Mullin, Benjamin H.; Shihab, Hashem A.; Min, Josine; Walter, Klaudia; Memari, Yasin; Huang, Jie; Barnes, Michael R.; Beilby, John P.; Charoen, Pimphen; Danecek, Petr; Dudbridge, Frank; Forgetta, Vincenzo; Greenwood, Celia; Grundberg, Elin; Johnson, Andrew D.; Hui, Jennie; Lim, Ee M.; McCarthy, Shane; Muddyman, Dawn; Panicker, Vijay; Perry, John R.B.; Bell, Jordana T.; Yuan, Wei; Relton, Caroline; Gaunt, Tom; Schlessinger, David; Abecasis, Goncalo; Cucca, Francesco; Surdulescu, Gabriela L.; Woltersdorf, Wolfram; Zeggini, Eleftheria; Zheng, Hou-Feng; Toniolo, Daniela; Dayan, Colin M.; Naitza, Silvia; Walsh, John P.; Spector, Tim; Davey Smith, George; Durbin, Richard; Brent Richards, J.; Sanna, Serena; Soranzo, Nicole; Timpson, Nicholas J.; Wilson, Scott G.; Turki, Saeed Al; Anderson, Carl; Anney, Richard; Antony, Dinu; Artigas, Maria Soler; Ayub, Muhammad; Balasubramaniam, Senduran; Barrett, Jeffrey C.; Barroso, Inês; Beales, Phil; Bentham, Jamie; Bhattacharya, Shoumo; Birney, Ewan; Blackwood, Douglas; Bobrow, Martin; Bochukova, Elena; Bolton, Patrick; Bounds, Rebecca; Boustred, Chris; Breen, Gerome; Calissano, Mattia; Carss, Keren; Chatterjee, Krishna; Chen, Lu; Ciampi, Antonio; Cirak, Sebhattin; Clapham, Peter; Clement, Gail; Coates, Guy; Collier, David; Cosgrove, Catherine; Cox, Tony; Craddock, Nick; Crooks, Lucy; Curran, Sarah; Curtis, David; Daly, Allan; Day-Williams, Aaron; Day, Ian N.M.; Down, Thomas; Du, Yuanping; Dunham, Ian; Edkins, Sarah; Ellis, Peter; Evans, David; Faroogi, Sadaf; Fatemifar, Ghazaleh; Fitzpatrick, David R.; Flicek, Paul; Flyod, James; Foley, A. Reghan; Franklin, Christopher S.; Futema, Marta; Gallagher, Louise; Geihs, Matthias; Geschwind, Daniel; Griffin, Heather; Grozeva, Detelina; Guo, Xueqin; Guo, Xiaosen; Gurling, Hugh; Hart, Deborah; Hendricks, Audrey; Holmans, Peter; Howie, Bryan; Huang, Liren; Hubbard, Tim; Humphries, Steve E.; Hurles, Matthew E.; Hysi, Pirro; Jackson, David K.; Jamshidi, Yalda; Jing, Tian; Joyce, Chris; Kaye, Jane; Keane, Thomas; Keogh, Julia; Kemp, John; Kennedy, Karen; Kolb-Kokocinski, Anja; Lachance, Genevieve; Langford, Cordelia; Lawson, Daniel; Lee, Irene; Lek, Monkol; Liang, Jieqin; Lin, Hong; Li, Rui; Li, Yingrui; Liu, Ryan; Lönnqvist, Jouko; Lopes, Margarida; Lotchkova, Valentina; MacArthur, Daniel; Marchini, Jonathan; Maslen, John; Massimo, Mangino; Mathieson, Iain; Marenne, Gaëlle; McGuffin, Peter; McIntosh, Andrew; McKechanie, Andrew G.; McQuillin, Andrew; Metrustry, Sarah; Mitchison, Hannah; Moayyeri, Alireza; Morris, James; Muntoni, Francesco; Northstone, Kate; O'Donnovan, Michael; Onoufriadis, Alexandros; O'Rahilly, Stephen; Oualkacha, Karim; Owen, Michael J.; Palotie, Aarno; Panoutsopoulou, Kalliope; Parker, Victoria; Parr, Jeremy R.; Paternoster, Lavinia; Paunio, Tiina; Payne, Felicity; Pietilainen, Olli; Plagnol, Vincent; Quaye, Lydia; Quai, Michael A.; Raymond, Lucy; Rehnström, Karola; Richards, Brent; Ring, Susan; Ritchie, Graham R.S.; Roberts, Nicola; Savage, David B.; Scambler, Peter; Schiffels, Stephen; Schmidts, Miriam; Schoenmakers, Nadia; Semple, Robert K.; Serra, Eva; Sharp, Sally I.; Shin, So-Youn; Skuse, David; Small, Kerrin; Southam, Lorraine; Spasic-Boskovic, Olivera; Clair, David St; Stalker, Jim; Stevens, Elizabeth; Pourcian, Beate St; Sun, Jianping; Suvisaari, Jaana; Tachmazidou, Ionna; Tobin, Martin D.; Valdes, Ana; Kogelenberg, Margriet Van; Vijayarangakannan, Parthiban; Visscher, Peter M.; Wain, Louise V.; Walters, James T.R.; Wang, Guangbiao; Wang, Jun; Wang, Yu; Ward, Kirsten; Wheeler, Elanor; Whyte, Tamieka; Williams, Hywel; Williamson, Kathleen A.; Wilson, Crispian; Wong, Kim; Xu, ChangJiang; Yang, Jian; Zhang, Fend; Zhang, Pingbo

    2015-01-01

    Normal thyroid function is essential for health, but its genetic architecture remains poorly understood. Here, for the heritable thyroid traits thyrotropin (TSH) and free thyroxine (FT4), we analyse whole-genome sequence data from the UK10K project (N=2,287). Using additional whole-genome sequence and deeply imputed data sets, we report meta-analysis results for common variants (MAF≥1%) associated with TSH and FT4 (N=16,335). For TSH, we identify a novel variant in SYN2 (MAF=23.5%, P=6.15 × 10^-9) and a new independent variant in PDE8B (MAF=10.4%, P=5.94 × 10^-14). For FT4, we report a low-frequency variant near B4GALT6/SLC25A52 (MAF=3.2%, P=1.27 × 10^-9) tagging a rare TTR variant (MAF=0.4%, P=2.14 × 10^-11). All common variants explain ≥20% of the variance in TSH and FT4. Analysis of rare variants (MAF<1%) using sequence kernel association testing reveals a novel association with FT4 in NRG1. Our results demonstrate that increased coverage in whole-genome sequence association studies identifies novel variants associated with thyroid function. PMID:25743335

  1. Whole-genome shotgun optical mapping of Rhodospirillum rubrum

    SciTech Connect

    Reslewic, S.; Zhou, S.; Place, M.; Zhang, Y.; Briska, A.; Goldstein, S.; Churas, C.; Runnheim, R.; Forrest, D.; Lim, A.; Lapidus, A.; Han, C. S.; Roberts, G. P.; Schwartz, D. C.

    2005-09-01

    Rhodospirillum rubrum is a phototrophic purple nonsulfur bacterium known for its unique and well-studied nitrogen fixation and carbon monoxide oxidation systems and as a source of hydrogen and biodegradable plastic production. To better understand this organism and to facilitate assembly of its sequence, three whole-genome restriction endonuclease maps (XbaI, NheI, and HindIII) of R. rubrum strain ATCC 11170 were created by optical mapping. Optical mapping is a system for creating whole-genome ordered restriction endonuclease maps from randomly sheared genomic DNA molecules extracted from cells. During the sequence finishing process, all three optical maps confirmed a putative error in sequence assembly, while the HindIII map acted as a scaffold for high-resolution alignment with sequence contigs spanning the whole genome. In addition to highlighting optical mapping's role in the assembly and confirmation of genome sequence, this work underscores the unique niche in resolution occupied by the optical mapping system. With a resolution ranging from 6.5 kb (previously published) to 45 kb (reported here), optical mapping advances a "molecular cytogenetics" approach to solving problems in genomic analysis.
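
    An ordered restriction map of the kind described in the abstract above can be mimicked in silico from a known sequence, which is roughly what an optical map is compared against during assembly. The sketch below is a simplified illustration: it treats the molecule as linear, uses the standard recognition sites of XbaI (TCTAGA), NheI (GCTAGC), and HindIII (AAGCTT), each cutting after the first base, and runs on a made-up 30-base toy sequence rather than R. rubrum data. The function and variable names are hypothetical.

```python
# Recognition sites for the three enzymes named in the abstract.
SITES = {"XbaI": "TCTAGA", "NheI": "GCTAGC", "HindIII": "AAGCTT"}
CUT_OFFSET = 1  # all three enzymes cut after the first base of the site


def restriction_map(seq, enzyme):
    """Return the ordered fragment lengths from digesting a linear sequence."""
    site = SITES[enzyme]
    cuts = []
    pos = seq.find(site)
    while pos != -1:
        cuts.append(pos + CUT_OFFSET)
        pos = seq.find(site, pos + 1)
    # Fragment lengths are the gaps between consecutive cut positions.
    bounds = [0] + cuts + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]


# Toy sequence (an assumption for illustration): two HindIII sites, one XbaI site.
toy = "GGAAGCTTCCCCTCTAGAGGGGAAGCTTAA"

for enzyme in SITES:
    print(enzyme, restriction_map(toy, enzyme))
```

    Real optical maps report fragment sizes in kilobases with measurement error, so comparing them with an in-silico map requires tolerant matching rather than the exact lengths shown here.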

  2. Priors in Whole-Genome Regression: The Bayesian Alphabet Returns

    PubMed Central

    Gianola, Daniel

    2013-01-01

    Whole-genome enabled prediction of complex traits has received enormous attention in animal and plant breeding and is making inroads into human and even Drosophila genetics. The term “Bayesian alphabet” denotes a growing number of letters of the alphabet used to denote various Bayesian linear regressions that differ in the priors adopted, while sharing the same sampling model. We explore the role of the prior distribution in whole-genome regression models for dissecting complex traits in what is now a standard situation with genomic data where the number of unknown parameters (p) typically exceeds sample size (n). Members of the alphabet aim to confront this overparameterization in various manners, but it is shown here that the prior is always influential, unless n ≫ p. This happens because parameters are not likelihood identified, so Bayesian learning is imperfect. Since inferences are not devoid of the influence of the prior, claims about genetic architecture from these methods should be taken with caution. However, all such procedures may deliver reasonable predictions of complex traits, provided that some parameters (“tuning knobs”) are assessed via a properly conducted cross-validation. It is concluded that members of the alphabet have a room in whole-genome prediction of phenotypes, but have somewhat doubtful inferential value, at least when sample size is such that n ≪ p. PMID:23636739
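
    The point that the prior stays influential when parameters outnumber observations can be seen in a deliberately tiny case. The sketch below assumes a Gaussian prior on each marker effect (a ridge-type model, only one member of the "alphabet"), a single observation (n = 1), and two markers (p = 2); the function name and all numbers are hypothetical. For n = 1 the posterior mean of effect j reduces to v_j x_j y / (sum_k v_k x_k^2 + s2), where v_j is the prior variance and s2 the residual variance.

```python
def posterior_means(x, y, prior_vars, resid_var):
    """Posterior means of marker effects for ONE observation y with
    marker covariates x, independent zero-mean Gaussian priors with
    variances prior_vars, and Gaussian residual variance resid_var."""
    denom = sum(v * xi * xi for v, xi in zip(prior_vars, x)) + resid_var
    return [v * xi * y / denom for v, xi in zip(prior_vars, x)]


x = [1.0, 1.0]  # both markers carry identical information
y = 3.0         # the single phenotype record

# Same data, two different priors: the likelihood cannot separate the markers,
# so the split of the signal between them is decided by the prior alone.
print(posterior_means(x, y, prior_vars=[1.0, 1.0], resid_var=1.0))
print(posterior_means(x, y, prior_vars=[4.0, 1.0], resid_var=1.0))
```

    With equal prior variances the two effects are estimated as equal; with unequal prior variances the same record attributes most of the signal to the first marker, which is the kind of prior dependence the abstract cautions about.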

  3. Whole-genome sequence-based analysis of thyroid function.

    PubMed

    Taylor, Peter N; Porcu, Eleonora; Chew, Shelby; Campbell, Purdey J; Traglia, Michela; Brown, Suzanne J; Mullin, Benjamin H; Shihab, Hashem A; Min, Josine; Walter, Klaudia; Memari, Yasin; Huang, Jie; Barnes, Michael R; Beilby, John P; Charoen, Pimphen; Danecek, Petr; Dudbridge, Frank; Forgetta, Vincenzo; Greenwood, Celia; Grundberg, Elin; Johnson, Andrew D; Hui, Jennie; Lim, Ee M; McCarthy, Shane; Muddyman, Dawn; Panicker, Vijay; Perry, John R B; Bell, Jordana T; Yuan, Wei; Relton, Caroline; Gaunt, Tom; Schlessinger, David; Abecasis, Goncalo; Cucca, Francesco; Surdulescu, Gabriela L; Woltersdorf, Wolfram; Zeggini, Eleftheria; Zheng, Hou-Feng; Toniolo, Daniela; Dayan, Colin M; Naitza, Silvia; Walsh, John P; Spector, Tim; Davey Smith, George; Durbin, Richard; Richards, J Brent; Sanna, Serena; Soranzo, Nicole; Timpson, Nicholas J; Wilson, Scott G

    2015-01-01

    Normal thyroid function is essential for health, but its genetic architecture remains poorly understood. Here, for the heritable thyroid traits thyrotropin (TSH) and free thyroxine (FT4), we analyse whole-genome sequence data from the UK10K project (N=2,287). Using additional whole-genome sequence and deeply imputed data sets, we report meta-analysis results for common variants (MAF≥1%) associated with TSH and FT4 (N=16,335). For TSH, we identify a novel variant in SYN2 (MAF=23.5%, P=6.15 × 10^-9) and a new independent variant in PDE8B (MAF=10.4%, P=5.94 × 10^-14). For FT4, we report a low-frequency variant near B4GALT6/SLC25A52 (MAF=3.2%, P=1.27 × 10^-9) tagging a rare TTR variant (MAF=0.4%, P=2.14 × 10^-11). All common variants explain ≥20% of the variance in TSH and FT4. Analysis of rare variants (MAF<1%) using sequence kernel association testing reveals a novel association with FT4 in NRG1. Our results demonstrate that increased coverage in whole-genome sequence association studies identifies novel variants associated with thyroid function. PMID:25743335

  4. Mapping Challenging Mutations by Whole-Genome Sequencing.

    PubMed

    Smith, Harold E; Fabritius, Amy S; Jaramillo-Lambert, Aimee; Golden, Andy

    2016-01-01

    Whole-genome sequencing provides a rapid and powerful method for identifying mutations on a global scale, and has spurred a renewed enthusiasm for classical genetic screens in model organisms. The most commonly characterized category of mutation consists of monogenic, recessive traits, due to their genetic tractability. Therefore, most of the mapping methods for mutation identification by whole-genome sequencing are directed toward alleles that fulfill those criteria (i.e., single-gene, homozygous variants). However, such approaches are not entirely suitable for the characterization of a variety of more challenging mutations, such as dominant and semidominant alleles or multigenic traits. Therefore, we have developed strategies for the identification of those classes of mutations, using polymorphism mapping in Caenorhabditis elegans as our model for validation. We also report an alternative approach for mutation identification from traditional recombinant crosses, and a solution to the technical challenge of sequencing sterile or terminally arrested strains where population size is limiting. The methods described herein extend the applicability of whole-genome sequencing to a broader spectrum of mutations, including classes that are difficult to map by traditional means. PMID:26945029

  5. Whole-genome shotgun optical mapping of Rhodospirillum rubrum

    SciTech Connect

    Reslewic, Susan; Zhou, Shiguo; Place, Mike; Zhang, Yaoping; Briska, Adam; Goldstein, Steve; Churas, Chris; Runnheim, Rod; Forrest, Dan; Lim, Alex; Lapidus, Alla; Han, Cliff S.; Roberts, Gary P.; Schwartz, David C.

    2004-07-01

    Rhodospirillum rubrum is a phototrophic purple non-sulfur bacterium known for its unique and well-studied nitrogen fixation and carbon monoxide oxidation systems, and as a source of hydrogen and biodegradable plastics production. To better understand this organism and to facilitate assembly of its sequence, three whole-genome restriction maps (Xba I, Nhe I, and Hind III) of R. rubrum strain ATCC 11170 were created by optical mapping. Optical mapping is a system for creating whole-genome ordered restriction maps from randomly sheared genomic DNA molecules extracted directly from cells. During the sequence finishing process, all three optical maps confirmed a putative error in sequence assembly, while the Hind III map acted as a scaffold for high resolution alignment with sequence contigs spanning the whole genome. In addition to highlighting optical mapping's role in the assembly and validation of genome sequence, our work underscores the unique niche in resolution occupied by the optical mapping system. With a resolution ranging from 6.5 kb (previously published) to 45 kb (reported here), optical mapping advances a "molecular cytogenetics" approach to solving problems in genomic analysis.

  6. Whole-genome transcriptional analysis of heavy metal stresses in Caulobacter crescentus

    SciTech Connect

    Hu, Ping; Brodie, Eoin L.; Suzuki, Yohey; McAdams, Harley H.; Andersen, Gary L.

    2005-09-21

    The bacterium Caulobacter crescentus and related stalk bacterial species are known for their distinctive ability to live in low nutrient environments, a characteristic of most heavy metal contaminated sites. Caulobacter crescentus is a model organism for studying cell cycle regulation with well developed genetics. We have identified the pathways responding to heavy metal toxicity in C. crescentus to provide insights for possible application of Caulobacter to environmental restoration. We exposed C. crescentus cells to four heavy metals (chromium, cadmium, selenium and uranium) and analyzed genome wide transcriptional activities post exposure using an Affymetrix GeneChip microarray. C. crescentus showed surprisingly high tolerance to uranium, a possible mechanism for which may be formation of extracellular calcium-uranium-phosphate precipitates. The principal response to these metals was protection against oxidative stress (up-regulation of manganese-dependent superoxide dismutase, sodA). Glutathione S-transferase, thioredoxin, glutaredoxins and DNA repair enzymes responded most strongly to cadmium and chromate. The cadmium and chromium stress response also focused on reducing the intracellular metal concentration, with multiple efflux pumps employed to remove cadmium while a sulfate transporter was down-regulated to reduce non-specific uptake of chromium. Membrane proteins were also up-regulated in response to most of the metals tested. A two-component signal transduction system involved in the uranium response was identified. Several differentially regulated transcripts from regions previously not known to encode proteins were identified, demonstrating the advantage of evaluating the transcriptome using whole genome microarrays.

  7. Comparative Whole-Genome Hybridization Reveals Genomic Islands in Brucella Species

    PubMed Central

    Rajashekara, Gireesh; Glasner, Jeremy D.; Glover, David A.; Splitter, Gary A.

    2004-01-01

    Brucella species are responsible for brucellosis, a worldwide zoonotic disease causing abortion in domestic animals and Malta fever in humans. Based on host preference, the genus is divided into six species. Brucella abortus, B. melitensis, and B. suis are pathogenic to humans, whereas B. ovis and B. neotomae are nonpathogenic to humans and B. canis human infections are rare. Limited genome diversity exists among Brucella species. Comparison of Brucella species whole genomes is, therefore, likely to identify factors responsible for differences in host preference and virulence restriction. To facilitate such studies, we used the complete genome sequence of B. melitensis 16M, the species highly pathogenic to humans, to construct a genomic microarray. Hybridization of labeled genomic DNA from Brucella species to this microarray revealed a total of 217 open reading frames (ORFs) altered in five Brucella species analyzed. These ORFs are often found in clusters (islands) in the 16M genome. Examination of the genomic context of these islands suggests that many are horizontally acquired. Deletions of genetic content identified in Brucella species are conserved in multiple strains of the same species, and genomic islands missing in a given species are often restricted to that particular species. These findings suggest that, whereas the loss or gain of genetic material may be related to the host range and virulence restriction of certain Brucella species for humans, independent mechanisms involving gene inactivation or altered expression of virulence determinants may also contribute to these differences. PMID:15262941

  8. Whole genome comparison of donor and cloned dogs

    PubMed Central

    Kim, Hak-Min; Cho, Yun Sung; Kim, Hyunmin; Jho, Sungwoong; Son, Bongjun; Choi, Joung Yoon; Kim, Sangsoo; Lee, Byeong Chun; Bhak, Jong; Jang, Goo

    2013-01-01

    Cloning is a process that produces genetically identical organisms. However, the genomic degree of genetic resemblance in clones needs to be determined. In this report, the genomes of a cloned dog and its donor were compared. Compared with a human monozygotic twin, the genome of the cloned dog showed little difference from the genome of the nuclear donor dog in terms of single nucleotide variations, chromosomal instability, and telomere lengths. These findings suggest that cloning by somatic cell nuclear transfer produced an almost identical genome. The whole genome sequence data of donor and cloned dogs can provide a resource for further investigations on epigenetic contributions in phenotypic differences. PMID:24141358

  9. Whole-Genome Sequences of Thirteen Isolates of Borrelia burgdorferi

    SciTech Connect

    Schutzer S. E.; Dunn J.; Fraser-Liggett, C. M.; Casjens, S. R.; Qiu, W.-G.; Mongodin, E. F.; Luft, B. J.

    2011-02-01

    Borrelia burgdorferi is a causative agent of Lyme disease in North America and Eurasia. The first complete genome sequence of B. burgdorferi strain 31, available for more than a decade, has assisted research on the pathogenesis of Lyme disease. Because a single genome sequence is not sufficient to understand the relationship between genotypic and geographic variation and disease phenotype, we determined the whole-genome sequences of 13 additional B. burgdorferi isolates that span the range of natural variation. These sequences should allow improved understanding of pathogenesis and provide a foundation for novel detection, diagnosis, and prevention strategies.

  10. Comprehensive whole genome array CGH profiling of mantle cell lymphoma model genomes.

    PubMed

    de Leeuw, Ronald J; Davies, Jonathan J; Rosenwald, Andreas; Bebb, Gwyn; Gascoyne, Randy D; Dyer, Martin J S; Staudt, Louis M; Martinez-Climent, Jose A; Lam, Wan L

    2004-09-01

    Mantle cell lymphoma (MCL) is an aggressive non-Hodgkin's lymphoma with median patient survival times of approximately 3 years. Although the characteristic t(11;14)(q13;q32) is found in virtually all cases, experimental evidence suggests that this event alone is insufficient to result in lymphoma and secondary genomic alterations are required. Using a newly developed DNA microarray of 32 433 overlapping genomic segments spanning the entire human genome, we can for the first time move beyond marker based analysis and comprehensively search for secondary genomic alterations concomitant with the t(11;14) in eight commonly used cell models of MCL (Granta-519, HBL-2, NCEB-1, Rec-1, SP49, UPN-1, Z138C and JVM-2). Examining these genomes at tiling resolution identified an unexpected average of 35 genetic alterations per cell line, with equal numbers of amplifications and deletions. Recurrent high-level amplifications were identified at 18q21 containing BCL2, and at 13q31 containing GPC5. In addition, a recurrent homozygous deletion was identified at 9p21 containing p15 and p16. Alignment of these profiles revealed 14 recurrent losses and 21 recurrent gains as small as 130 kb. Remarkably, even the intra immunoglobulin gene deletions at 2p11 and 22q11 were detected, demonstrating the power of combining the detection sensitivity of array comparative genomic hybridization (CGH) with the resolution of an overlapping whole genome tiling-set. These alterations not only coincided with previously described aberrations in MCL, but also defined 13 novel regions. Further characterization of such minimally altered genomic regions identified using whole genome array CGH will define novel dominant oncogenes and tumor suppressor genes that play important roles in the pathogenesis of MCL. PMID:15229187

  11. Whole genome sequencing of Ethiopian highlanders reveals conserved hypoxia tolerance genes

    PubMed Central

    2014-01-01

    Background Although it has long been proposed that genetic factors contribute to adaptation to high altitude, such factors remain largely unverified. Recent advances in high-throughput sequencing have made it feasible to analyze genome-wide patterns of genetic variation in human populations. Since traditionally such studies surveyed only a small fraction of the genome, interpretation of the results was limited. Results We report here the results of the first whole genome resequencing-based analysis identifying genes that likely modulate high altitude adaptation in native Ethiopians residing at 3,500 m above sea level on Bale Plateau or Chennek field in Ethiopia. Using cross-population tests of selection, we identify regions with a significant loss of diversity, indicative of a selective sweep. We focus on a 208 kbp gene-rich region on chromosome 19, which is significant in both of the Ethiopian subpopulations sampled. This region contains eight protein-coding genes and spans 135 SNPs. To elucidate its potential role in hypoxia tolerance, we experimentally tested whether individual genes from the region affect hypoxia tolerance in Drosophila. Three genes significantly impact survival rates in low oxygen: cic, an ortholog of human CIC, Hsl, an ortholog of human LIPE, and Paf-AHα, an ortholog of human PAFAH1B3. Conclusions Our study reveals evolutionarily conserved genes that modulate hypoxia tolerance. In addition, we show that many of our results would likely be unattainable using data from exome sequencing or microarray studies. This highlights the importance of whole genome sequencing for investigating adaptation by natural selection. PMID:24555826

  12. Performance Evaluation of NIPT in Detection of Chromosomal Copy Number Variants Using Low-Coverage Whole-Genome Sequencing of Plasma DNA

    PubMed Central

    Lin, Linhua; Yin, Xuyang; Wang, Jun; Chen, Dayang; Chen, Fang; Jiang, Hui; Ren, Jinghui; Wang, Wei

    2016-01-01

    Objectives The aim of this study was to assess the performance of noninvasive prenatal testing (NIPT) for fetal copy number variants (CNVs) in clinical samples, using a whole-genome sequencing method. Method A total of 919 archived maternal plasma samples with karyotyping/microarray results, including 33 CNV samples and 886 normal samples from September 1, 2011 to May 31, 2013, were enrolled in this study. The samples were randomly rearranged and blindly sequenced by low-coverage (about 7M reads) whole-genome sequencing of plasma DNA. Fetal CNVs were detected by Fetal Copy-number Analysis through Maternal Plasma Sequencing (FCAPS) and compared with the karyotyping/microarray results. Sensitivity and specificity were evaluated. Results 33 samples with deletions/duplications ranging from 1 to 129 Mb were detected with CNV size and location consistent with the karyotyping/microarray results in the study. Ten false positive results and two false negative results were obtained. The sensitivity and specificity of detecting deletions/duplications were 84.21% and 98.42%, respectively. Conclusion Whole-genome sequencing-based NIPT has high performance in detecting genome-wide CNVs, in particular >10 Mb CNVs using the current FCAPS algorithm. It is possible to implement the current method in NIPT for prenatal screening of fetal CNVs. PMID:27415003

  13. Whole Genome Re-Sequencing of Three Domesticated Chicken Breeds.

    PubMed

    Oh, Dongyep; Son, Bongjun; Mun, Seyoung; Oh, Man Hwan; Oh, Sejong; Ha, Jaejung; Yi, Junkoo; Lee, Seunguk; Han, Kyudong

    2016-02-01

    Chicken is one of the most popular domesticated species worldwide, as it can serve an important role in agricultural as well as biomedical research fields. Because it inhabits almost every continent and presents diverse morphology and traits, the need for genetic markers for distinguishing each breed for various purposes has increased. The whole genome sequencing of three different breeds (White Leghorn, Korean domestic, and Araucana) that show similar coloring patterns, with the exception of the White Leghorn breed, has confirmed previously reported genomic alterations and identified many novel variants. Additionally, the Whole Genome Re-Sequencing (WGRS) approach identified an approximately 4 kb insert within SLCO1B3 responsible for blue egg shell color. Targeted investigation of pigment-related genes corroborated previously reported non-synonymous mutations, and provided deeper insight into chicken coloring, where not a single but a combination of non-synonymous mutations in the MC1R gene is likely to be responsible for altered feather coloring. PMID:26853871

  14. Whole-genome validation of high-information-content fingerprinting.

    PubMed

    Nelson, William M; Bharti, Arvind K; Butler, Ed; Wei, Fusheng; Fuks, Galina; Kim, Hyeran; Wing, Rod A; Messing, Joachim; Soderlund, Carol

    2005-09-01

    Fluorescent-based high-information-content fingerprinting (HICF) techniques have recently been developed for physical mapping. These techniques make use of automated capillary DNA sequencing instruments to enable both high-resolution and high-throughput fingerprinting. In this article, we report the construction of a whole-genome HICF FPC map for maize (Zea mays subsp. mays cv B73), using a variant of HICF in which a type IIS restriction enzyme is used to generate the fluorescently labeled fragments. The HICF maize map was constructed from the same three maize bacterial artificial chromosome libraries as previously used for the whole-genome agarose FPC map, providing a unique opportunity for direct comparison of the agarose and HICF methods; as a result, it was found that HICF has substantially greater sensitivity in forming contigs. An improved assembly procedure is also described that uses automatic end-merging of contigs to reduce the effects of contamination and repetitive bands. Several new features in FPC v7.2 are presented, including shared-memory multiprocessing, which allows dramatically faster assemblies, and automatic end-merging, which permits more accurate assemblies. It is further shown that sequenced clones may be digested in silico and located accurately on the HICF assembly, despite size deviations that prevent the precise prediction of experimental fingerprints. Finally, repetitive bands are isolated, and their effect on the assembly is studied. PMID:16166258

  15. Use of Whole Genome Sequence Data To Infer Baculovirus Phylogeny

    PubMed Central

    Herniou, Elisabeth A.; Luque, Teresa; Chen, Xinwen; Vlak, Just M.; Winstanley, Doreen; Cory, Jennifer S.; O'Reilly, David R.

    2001-01-01

    Several phylogenetic methods based on whole genome sequence data were evaluated using data from nine complete baculovirus genomes. The utility of three independent character sets was assessed. The first data set comprised the sequences of the 63 genes common to these viruses. The second set of characters was based on gene order, and phylogenies were inferred using both breakpoint distance analysis and a novel method developed here, termed neighbor pair analysis. The third set recorded gene content by scoring gene presence or absence in each genome. All three data sets yielded phylogenies supporting the separation of the Nucleopolyhedrovirus (NPV) and Granulovirus (GV) genera, the division of the NPVs into groups I and II, and species relationships within group I NPVs. Generation of phylogenies based on the combined sequences of all 63 shared genes proved to be the most effective approach to resolving the relationships among the group II NPVs and the GVs. The history of gene acquisitions and losses that have accompanied baculovirus diversification was visualized by mapping the gene content data onto the phylogenetic tree. This analysis highlighted the fluid nature of baculovirus genomes, with evidence of frequent genome rearrangements and multiple gene content changes during their evolution. Of more than 416 genes identified in the genomes analyzed, only 63 are present in all nine genomes, and 200 genes are found only in a single genome. Despite this fluidity, the whole genome-based methods we describe are sufficiently powerful to recover the underlying phylogeny of the viruses. PMID:11483757
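The gene content and gene order character sets described above can each be turned into a pairwise distance between genomes. The following is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not say which gene content distance was used, so a Jaccard distance is swapped in here, and the breakpoint count treats genomes as linear, unsigned gene orders over the same gene set. The function names and the gene labels `g1` to `g5` are hypothetical, and the novel neighbor pair analysis is not reproduced.

```python
def gene_content_distance(genes_a, genes_b):
    """Jaccard distance between two gene sets (0.0 means identical content)."""
    a, b = set(genes_a), set(genes_b)
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def breakpoint_distance(order_a, order_b):
    """Count adjacencies in order_a that do not occur in order_b.

    Assumes both orders contain the same genes, each once, on a linear
    map with gene orientation ignored (a simplification).
    """
    if set(order_a) != set(order_b):
        raise ValueError("both genomes must contain the same genes")
    adjacent_in_b = {frozenset(pair) for pair in zip(order_b, order_b[1:])}
    return sum(
        frozenset(pair) not in adjacent_in_b
        for pair in zip(order_a, order_a[1:])
    )


# Hypothetical toy genomes.
content_d = gene_content_distance(["g1", "g2", "g3", "g4"], ["g1", "g2", "g5"])
order_d = breakpoint_distance(
    ["g1", "g2", "g3", "g4", "g5"],
    ["g1", "g3", "g2", "g4", "g5"],
)
print(content_d, order_d)
```

A matrix of such pairwise distances could then be passed to any distance-based tree-building method. Restricting the breakpoint count to shared genes mirrors the idea of comparing only the genes common to all genomes.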

  16. Whole genome sequencing in clinical and public health microbiology

    PubMed Central

    Kwong, J. C.; McCallum, N.; Sintchenko, V.; Howden, B. P.

    2015-01-01

    Summary Genomics and whole genome sequencing (WGS) have the capacity to greatly enhance knowledge and understanding of infectious diseases and clinical microbiology. The growth and availability of bench-top WGS analysers has facilitated the feasibility of genomics in clinical and public health microbiology. Given current resource and infrastructure limitations, WGS is most applicable to use in public health laboratories, reference laboratories, and hospital infection control-affiliated laboratories. As WGS represents the pinnacle for strain characterisation and epidemiological analyses, it is likely to replace traditional typing methods, resistance gene detection and other sequence-based investigations (e.g., 16S rDNA PCR) in the near future. Although genomic technologies are rapidly evolving, widespread implementation in clinical and public health microbiology laboratories is limited by the need for effective semi-automated pipelines, standardised quality control and data interpretation, bioinformatics expertise, and infrastructure. PMID:25730631

  17. Whole genome SNP scanning of snow sheep (Ovis nivicola).

    PubMed

    Deniskova, T E; Okhlopkov, I M; Sermyagin, A A; Gladyr', E A; Bagirov, V A; Sölkner, J; Mamaev, N V; Brem, G; Zinov'eva, N A

    2016-07-01

    This is the first report of whole genome SNP scanning of snow sheep (Ovis nivicola). Samples of snow sheep (n = 18) were collected in six different regions of the Republic of Sakha (Yakutia) from 64° to 71° N. For SNP genotyping, we applied the Ovine 50K SNP BeadChip (Illumina, United States), designed for domestic sheep. The total number of genotyped SNPs (call rate 90%) was 47796 (88.1% of total SNPs), wherein 1006 SNPs were polymorphic (2.1%). Principal component analysis (PCA) showed the clear differentiation within the species O. nivicola: studied individuals were distributed among five distinct arrays corresponding to the geographical locations of sampling points. Our results demonstrate that the DNA chip designed for domestic sheep can be successfully used to study the allele pool and the genetic structure of snow sheep populations. PMID:27599514
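The percentages quoted above follow directly from the reported counts. The short check below recomputes them. The variable names are illustrative, and the implied chip size is back-calculated from the rounded 88.1% figure, so it is only approximate.

```python
# Counts reported in the abstract above.
genotyped_snps = 47796
polymorphic_snps = 1006
genotyped_fraction = 0.881  # "88.1% of total SNPs"

# Share of genotyped SNPs that were polymorphic in snow sheep.
polymorphic_pct = 100 * polymorphic_snps / genotyped_snps

# Total SNP count implied by the rounded genotyping fraction.
implied_total = genotyped_snps / genotyped_fraction

print(f"polymorphic: {polymorphic_pct:.1f}% of genotyped SNPs")
print(f"implied SNPs on the chip: about {implied_total:.0f}")
```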

  18. Genetic analysis of type 1 diabetes using whole genome approaches.

    PubMed Central

    Todd, J A

    1995-01-01

    Whole genome linkage analysis of type 1 diabetes using affected sib pair families and semi-automated genotyping and data capture procedures has shown how type 1 diabetes is inherited. A major proportion of clustering of the disease in families can be accounted for by sharing of alleles at susceptibility loci in the major histocompatibility complex on chromosome 6 (IDDM1) and at a minimum of 11 other loci on nine chromosomes. Primary etiological components of IDDM1, the HLA-DQB1 and -DRB1 class II immune response genes, and of IDDM2, the minisatellite repeat sequence in the 5' regulatory region of the insulin gene on chromosome 11p15, have been identified. Identification of the other loci will involve linkage disequilibrium mapping and sequencing of candidate genes in regions of linkage. PMID:7567975

  19. Whole-genome characterization of chemoresistant ovarian cancer.

    PubMed

    Patch, Ann-Marie; Christie, Elizabeth L; Etemadmoghadam, Dariush; Garsed, Dale W; George, Joshy; Fereday, Sian; Nones, Katia; Cowin, Prue; Alsop, Kathryn; Bailey, Peter J; Kassahn, Karin S; Newell, Felicity; Quinn, Michael C J; Kazakoff, Stephen; Quek, Kelly; Wilhelm-Benartzi, Charlotte; Curry, Ed; Leong, Huei San; Hamilton, Anne; Mileshkin, Linda; Au-Yeung, George; Kennedy, Catherine; Hung, Jillian; Chiew, Yoke-Eng; Harnett, Paul; Friedlander, Michael; Quinn, Michael; Pyman, Jan; Cordner, Stephen; O'Brien, Patricia; Leditschke, Jodie; Young, Greg; Strachan, Kate; Waring, Paul; Azar, Walid; Mitchell, Chris; Traficante, Nadia; Hendley, Joy; Thorne, Heather; Shackleton, Mark; Miller, David K; Arnau, Gisela Mir; Tothill, Richard W; Holloway, Timothy P; Semple, Timothy; Harliwong, Ivon; Nourse, Craig; Nourbakhsh, Ehsan; Manning, Suzanne; Idrisoglu, Senel; Bruxner, Timothy J C; Christ, Angelika N; Poudel, Barsha; Holmes, Oliver; Anderson, Matthew; Leonard, Conrad; Lonie, Andrew; Hall, Nathan; Wood, Scott; Taylor, Darrin F; Xu, Qinying; Fink, J Lynn; Waddell, Nick; Drapkin, Ronny; Stronach, Euan; Gabra, Hani; Brown, Robert; Jewell, Andrea; Nagaraj, Shivashankar H; Markham, Emma; Wilson, Peter J; Ellul, Jason; McNally, Orla; Doyle, Maria A; Vedururu, Ravikiran; Stewart, Collin; Lengyel, Ernst; Pearson, John V; Waddell, Nicola; deFazio, Anna; Grimmond, Sean M; Bowtell, David D L

    2015-05-28

    Patients with high-grade serous ovarian cancer (HGSC) have experienced little improvement in overall survival, and standard treatment has not advanced beyond platinum-based combination chemotherapy, during the past 30 years. To understand the drivers of clinical phenotypes better, here we use whole-genome sequencing of tumour and germline DNA samples from 92 patients with primary refractory, resistant, sensitive and matched acquired resistant disease. We show that gene breakage commonly inactivates the tumour suppressors RB1, NF1, RAD51B and PTEN in HGSC, and contributes to acquired chemotherapy resistance. CCNE1 amplification was common in primary resistant and refractory disease. We observed several molecular events associated with acquired resistance, including multiple independent reversions of germline BRCA1 or BRCA2 mutations in individual patients, loss of BRCA1 promoter methylation, an alteration in molecular subtype, and recurrent promoter fusion associated with overexpression of the drug efflux pump MDR1. PMID:26017449

  20. Patterns of tandem repetition in plant whole genome assemblies.

    PubMed

    Navajas-Pérez, Rafael; Paterson, Andrew H

    2009-06-01

    Tandem repeats often confound large genome assemblies. A survey of tandemly arrayed repetitive sequences was carried out in whole genome sequences of the green alga Chlamydomonas reinhardtii, the moss Physcomitrella patens, the monocots rice and sorghum, and the dicots Arabidopsis thaliana, poplar, grapevine, and papaya, in order to test how these assemblies deal with this fraction of DNA. Our results suggest that plant genome assemblies preferentially include tandem repeats composed of shorter monomeric units (especially dinucleotide and 9-30-bp repeats), while higher repetitive units pose more difficulties to assemble. Nevertheless, notwithstanding that currently available sequencing technologies struggle with higher arrays of repeated DNA, major well-known repetitive elements including centromeric and telomeric repeats as well as high copy-number genes, were found to be reasonably well represented. A database including all tandem repeat sequences characterized here was created to benefit future comparative genomic analyses. PMID:19242726

  1. Origin of the Yeast Whole-Genome Duplication.

    PubMed

    Wolfe, Kenneth H

    2015-08-01

    Whole-genome duplications (WGDs) are rare evolutionary events with profound consequences. They double an organism's genetic content, immediately creating a reproductive barrier between it and its ancestors and providing raw material for the divergence of gene functions between paralogs. Almost all eukaryotic genome sequences bear evidence of ancient WGDs, but the causes of these events and the timing of intermediate steps have been difficult to discern. One of the best-characterized WGDs occurred in the lineage leading to the baker's yeast Saccharomyces cerevisiae. Marcet-Houben and Gabaldón now show that, rather than simply doubling the DNA of a single ancestor, the yeast WGD likely involved mating between two different ancestral species followed by a doubling of the genome to restore fertility. PMID:26252643

  2. Whole-genome sequencing to control antimicrobial resistance

    PubMed Central

    Köser, Claudio U.; Ellington, Matthew J.; Peacock, Sharon J.

    2014-01-01

    Following recent improvements in sequencing technologies, whole-genome sequencing (WGS) is positioned to become an essential tool in the control of antibiotic resistance, a major threat in modern healthcare. WGS has already found numerous applications in this area, ranging from the development of novel antibiotics and diagnostic tests through to antibiotic stewardship of currently available drugs via surveillance and the elucidation of the factors that allow the emergence and persistence of resistance. Numerous proof-of-principle studies have also highlighted the value of WGS as a tool for day-to-day infection control and, for some pathogens, as a primary diagnostic tool to detect antibiotic resistance. However, appropriate data analysis platforms will need to be developed before routine WGS can be introduced on a large scale. PMID:25096945

  3. Whole-genome CNV analysis: advances in computational approaches

    PubMed Central

    Pirooznia, Mehdi; Goes, Fernando S.; Zandi, Peter P.

    2015-01-01

    Accumulating evidence indicates that DNA copy number variation (CNV) is likely to make a significant contribution to human diversity and also play an important role in disease susceptibility. Recent advances in genome sequencing technologies have enabled the characterization of a variety of genomic features, including CNVs. This has led to the development of several bioinformatics approaches to detect CNVs from next-generation sequencing data. Here, we review recent advances in CNV detection from whole genome sequencing. We discuss the informatics approaches and current computational tools that have been developed as well as their strengths and limitations. This review will assist researchers and analysts in choosing the most suitable tools for CNV analysis as well as provide suggestions for new directions in future development. PMID:25918519

  4. Origin of the Yeast Whole-Genome Duplication

    PubMed Central

    Wolfe, Kenneth H.

    2015-01-01

    Whole-genome duplications (WGDs) are rare evolutionary events with profound consequences. They double an organism’s genetic content, immediately creating a reproductive barrier between it and its ancestors and providing raw material for the divergence of gene functions between paralogs. Almost all eukaryotic genome sequences bear evidence of ancient WGDs, but the causes of these events and the timing of intermediate steps have been difficult to discern. One of the best-characterized WGDs occurred in the lineage leading to the baker’s yeast Saccharomyces cerevisiae. Marcet-Houben and Gabaldón now show that, rather than simply doubling the DNA of a single ancestor, the yeast WGD likely involved mating between two different ancestral species followed by a doubling of the genome to restore fertility. PMID:26252643

  5. Polyploidy in fungi: evolution after whole-genome duplication

    PubMed Central

    Albertin, Warren; Marullo, Philippe

    2012-01-01

    Polyploidy is a major evolutionary process in eukaryotes—particularly in plants and, to a lesser extent, in animals, wherein several past and recent whole-genome duplication events have been described. Surprisingly, the incidence of polyploidy in other eukaryote kingdoms, particularly within fungi, remained largely disregarded by the scientific community working on the evolutionary consequences of polyploidy. Recent studies have significantly increased our knowledge of the occurrence and evolutionary significance of fungal polyploidy. The ecological, structural and functional consequences of polyploidy in fungi are reviewed here and compared with the knowledge acquired with conventional plant and animal models. In particular, the genus Saccharomyces emerges as a relevant model for polyploid studies, in addition to plant and animal models. PMID:22492065

  6. Whole Genome Phylogeny of Bacillus by Feature Frequency Profiles (FFP)

    PubMed Central

    Wang, Aisuo; Ash, Gavin J.

    2015-01-01

    Fifty complete Bacillus genome sequences and associated plasmids were compared using the “feature frequency profile” (FFP) method. The resulting whole-genome phylogeny supports the placement of three Bacillus species (B. thuringiensis, B. anthracis and B. cereus) as a single clade. The monophyletic status of B. anthracis was strongly supported by the analysis. FFP proved to be more effective in inferring the phylogeny of Bacillus than methods based on single gene sequences [16S rRNA gene, GyrB (gyrase subunit B) and AroE (shikimate-5-dehydrogenase)] analyses. The findings of FFP analysis were verified using kSNP v2 (alignment-free sequence analysis method) and Harvest suite (core genome sequence alignment method).
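The idea behind an alignment-free feature frequency profile can be sketched as follows: count every k-mer (feature) in each genome, normalize the counts to frequencies, and compare the two profiles with a divergence measure. This is a simplified illustration and not the FFP software itself. The function names, the choice of k, and the toy sequences are assumptions, and steps such as reverse-complement handling and feature filtering are omitted. The Jensen-Shannon divergence is used here as the profile distance.

```python
from collections import Counter
from math import log2


def feature_frequency_profile(seq, k):
    """Return normalized k-mer frequencies for a sequence."""
    seq = seq.upper()
    if len(seq) < k:
        raise ValueError("sequence is shorter than k")
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    return {kmer: n / total for kmer, n in counts.items()}


def jensen_shannon_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two frequency profiles."""
    divergence = 0.0
    for key in set(p) | set(q):
        p_i = p.get(key, 0.0)
        q_i = q.get(key, 0.0)
        m_i = 0.5 * (p_i + q_i)
        # Terms with zero frequency contribute nothing to the sum.
        if p_i > 0:
            divergence += 0.5 * p_i * log2(p_i / m_i)
        if q_i > 0:
            divergence += 0.5 * q_i * log2(q_i / m_i)
    return divergence


# Hypothetical toy sequences; real use would involve whole genomes
# and a much larger k.
profile_a = feature_frequency_profile("AAAA", 2)
profile_b = feature_frequency_profile("CCCC", 2)
same = jensen_shannon_divergence(profile_a, profile_a)
different = jensen_shannon_divergence(profile_a, profile_b)
print(same, different)
```

With base-2 logarithms the divergence ranges from 0 for identical profiles to 1 for profiles that share no features, which makes the pairwise values convenient inputs for a distance-based tree.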

  7. Plantagora: Modeling Whole Genome Sequencing and Assembly of Plant Genomes

    PubMed Central

    Barthelson, Roger; McFarlin, Adam J.; Rounsley, Steven D.; Young, Sarah

    2011-01-01

    Background Genomics studies are being revolutionized by the next generation sequencing technologies, which have made whole genome sequencing much more accessible to the average researcher. Whole genome sequencing with the new technologies is a developing art that, despite the large volumes of data that can be produced, may still fail to provide a clear and thorough map of a genome. The Plantagora project was conceived to address specifically the gap between having the technical tools for genome sequencing and knowing precisely the best way to use them. Methodology/Principal Findings For Plantagora, a platform was created for generating simulated reads from several different plant genomes of different sizes. The resulting read files mimicked either 454 or Illumina reads, with varying paired end spacing. Thousands of datasets of reads were created, most derived from our primary model genome, rice chromosome one. All reads were assembled with different software assemblers, including Newbler, Abyss, and SOAPdenovo, and the resulting assemblies were evaluated by an extensive battery of metrics chosen for these studies. The metrics included both statistics of the assembly sequences and fidelity-related measures derived by alignment of the assemblies to the original genome source for the reads. The results were presented in a website, which includes a data graphing tool, all created to help the user compare rapidly the feasibility and effectiveness of different sequencing and assembly strategies prior to testing an approach in the lab. Some of our own conclusions regarding the different strategies were also recorded on the website. Conclusions/Significance Plantagora provides a substantial body of information for comparing different approaches to sequencing a plant genome, and some conclusions regarding some of the specific approaches. Plantagora also provides a platform of metrics and tools for studying the process of sequencing and assembly further. PMID:22174807

  8. Endothelial Cell Whole Genome Expression Analysis in a Mouse Model of Early-Onset Fuchs' Endothelial Corneal Dystrophy

    PubMed Central

    Matthaei, Mario; Hu, Jianfei; Meng, Huan; Lackner, Eva-Maria; Eberhart, Charles G.; Qian, Jiang; Hao, Haiping; Jun, Albert S.

    2013-01-01

    Purpose. To investigate the endothelial gene expression profile in a Col8a2 Q455K mutant knock-in mouse model of early-onset Fuchs' endothelial corneal dystrophy (FECD) and identify potential targets that can be correlated to human late-onset FECD. Methods. Diseased or normal endothelial phenotypes were verified in 12-month-old homozygous Col8a2Q455K/Q455K mutant and wild-type mice by clinical confocal microscopy. An endothelial whole genome expression profile was generated by microarray-based analysis. Result validation was performed by real-time PCR. Endothelial COX2 and JUN expression was further studied in human late-onset FECD compared to normal samples. Results. Microarray analysis demonstrated endothelial expression of 24,538 genes (162 up-regulated and 172 down-regulated targets) and identified affected gene ontology terms including Response to Stress, Protein Metabolic Process, Protein Folding, Regulation of Apoptosis, and Transporter Activity. Real-time PCR assessment confirmed increased Cox2 (P = 0.001) and Jun mRNA (P = 0.03) levels in Col8a2Q455K/Q455K mutant compared to wild-type mice. In human FECD samples, real-time PCR demonstrated a statistically significant increase in COX2 mRNA (P < 0.0001) and JUN mRNA (P = 0.002) and tissue microarray analysis showed increased endothelial COX2 (P = 0.02) and JUN protein (P = 0.04). Conclusions. The present study provides the first endothelial whole genome expression analysis in an animal model of FECD and represents a useful resource for future studies of the disease. In particular endothelial COX2 up-regulation warrants further investigation of its role in FECD. PMID:23449721

  9. Comparative whole-genome analysis of virulent and avirulent strains of Porphyromonas gingivalis.

    PubMed

    Chen, Tsute; Hosogi, Yumiko; Nishikawa, Kiyoshi; Abbey, Kevin; Fleischmann, Robert D; Walling, Jennifer; Duncan, Margaret J

    2004-08-01

    We used Porphyromonas gingivalis gene microarrays to compare the total gene contents of the virulent strain W83 and the avirulent type strain, ATCC 33277. Signal ratios and scatter plots indicated that the chromosomes were very similar, with approximately 93% of the predicted genes in common, while at least 7% of them showed very low or no signals in ATCC 33277. Verification of the array results by PCR indicated that several of the disparate genes were either absent from or variant in ATCC 33277. Divergent features included already reported insertion sequences and ragB, as well as additional hypothetical and functionally assigned genes. Several of the latter were organized in a putative operon in W83 and encoded enzymes involved in capsular polysaccharide synthesis. Another cluster was associated with two paralogous regions of the chromosome with a low G+C content, at 41%, compared to that of the whole genome, at 48%. These regions also contained conserved and species-specific hypothetical genes, transposons, insertion sequences, and integrases and were located adjacent to tRNA genes; thus, they had several characteristics of pathogenicity islands. While this global comparative analysis showed the close relationship between W83 and ATCC 33277, the clustering of genes that are present in W83 but divergent in or absent from ATCC 33277 is suggestive of chromosomal islands that may have been acquired by lateral gene transfer. PMID:15292149
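
The G+C comparison mentioned above (41% in the paralogous regions versus 48% genome wide) rests on a simple calculation, sketched below. The function names and the sliding-window approach are assumptions for illustration; the abstract does not say how the low-G+C regions were located.

```python
def gc_content(sequence):
    """Fraction of G and C among the unambiguous (A, C, G, T) bases of a sequence."""
    seq = sequence.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    if gc + at == 0:
        raise ValueError("sequence contains no A, C, G, or T bases")
    return gc / (gc + at)


def sliding_gc(sequence, window, step):
    """G+C fraction for each full window, as (window_start, fraction) pairs."""
    return [
        (start, gc_content(sequence[start:start + window]))
        for start in range(0, len(sequence) - window + 1, step)
    ]
```

Windows whose fraction falls well below the genome-wide value would be candidates for closer inspection, for example as possible laterally acquired islands.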

  10. Concurrent Whole-Genome Haplotyping and Copy-Number Profiling of Single Cells

    PubMed Central

    Zamani Esteki, Masoud; Dimitriadou, Eftychia; Mateiu, Ligia; Melotte, Cindy; Van der Aa, Niels; Kumar, Parveen; Das, Rakhi; Theunis, Koen; Cheng, Jiqiu; Legius, Eric; Moreau, Yves; Debrock, Sophie; D’Hooghe, Thomas; Verdyck, Pieter; De Rycke, Martine; Sermon, Karen; Vermeesch, Joris R.; Voet, Thierry

    2015-01-01

    Methods for haplotyping and DNA copy-number typing of single cells are paramount for studying genomic heterogeneity and enabling genetic diagnosis. Before analyzing the DNA of a single cell by microarray or next-generation sequencing, a whole-genome amplification (WGA) process is required, but it substantially distorts the frequency and composition of the cell’s alleles. As a consequence, haplotyping methods suffer from error-prone discrete SNP genotypes (AA, AB, BB) and DNA copy-number profiling remains difficult because true DNA copy-number aberrations have to be discriminated from WGA artifacts. Here, we developed a single-cell genome analysis method that reconstructs genome-wide haplotype architectures as well as the copy-number and segregational origin of those haplotypes by employing phased parental genotypes and deciphering WGA-distorted SNP B-allele fractions via a process we coin haplarithmisis. We demonstrate that the method can be applied as a generic method for preimplantation genetic diagnosis on single cells biopsied from human embryos, enabling diagnosis of disease alleles genome wide as well as numerical and structural chromosomal anomalies. Moreover, meiotic segregation errors can be distinguished from mitotic ones. PMID:25983246
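
To see why WGA distortion makes discrete genotypes error-prone, it can help to look at how a B-allele fraction is computed and then thresholded. The sketch below is not haplarithmisis; the function names and the 0.2/0.8 cut-offs are hypothetical values chosen only for illustration.

```python
def b_allele_fraction(a_signal, b_signal):
    """B-allele fraction: share of the total allele signal attributed to allele B."""
    total = a_signal + b_signal
    if total <= 0:
        raise ValueError("no signal for either allele")
    return b_signal / total


def naive_genotype(baf, low=0.2, high=0.8):
    """Discrete call from a BAF using fixed (assumed) thresholds."""
    if baf < low:
        return "AA"
    if baf > high:
        return "BB"
    return "AB"


# A truly heterozygous SNP amplified evenly is called correctly...
balanced = naive_genotype(b_allele_fraction(50, 50))
# ...but if amplification strongly favours allele A, the same SNP looks homozygous.
distorted = naive_genotype(b_allele_fraction(95, 5))
```

Here `balanced` is "AB" while `distorted` is "AA", which is the kind of miscall that motivates working with the continuous fractions and phased parental genotypes instead of thresholded calls.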

  11. DNA Microarrays

    NASA Astrophysics Data System (ADS)

    Nguyen, C.; Gidrol, X.

    Genomics has revolutionised biological and biomedical research. This revolution was predictable on the basis of its two driving forces: the ever increasing availability of genome sequences and the development of new technology able to exploit them. Up until now, technical limitations meant that molecular biology could only analyse one or two parameters per experiment, providing relatively little information compared with the great complexity of the systems under investigation. This gene by gene approach is inadequate to understand biological systems containing several thousand genes. It is essential to have an overall view of the DNA, RNA, and relevant proteins. A simple inventory of the genome is not sufficient to understand the functions of the genes, or indeed the way that cells and organisms work. For this purpose, functional studies based on whole genomes are needed. Among these new large-scale methods of molecular analysis, DNA microarrays provide a way of studying the genome and the transcriptome. The idea of integrating a large amount of data derived from a support with very small area has led biologists to call these chips, borrowing the term from the microelectronics industry. At the beginning of the 1990s, the development of DNA chips on nylon membranes [1, 2], then on glass [3] and silicon [4] supports, made it possible for the first time to carry out simultaneous measurements of the equilibrium concentration of all the messenger RNA (mRNA) or transcribed RNA in a cell. These microarrays offer a wide range of applications, in both fundamental and clinical research, providing a method for genome-wide characterisation of changes occurring within a cell or tissue, as for example in polymorphism studies, detection of mutations, and quantitative assays of gene copies. With regard to the transcriptome, it provides a way of characterising differentially expressed genes, profiling given biological states, and identifying regulatory channels.

  12. Application of Whole-Genome Sequencing to an Unusual Outbreak of Invasive Group A Streptococcal Disease

    PubMed Central

    Galloway-Peña, Jessica; Clement, Meredith E.; Sharma Kuinkel, Batu K.; Ruffin, Felicia; Flores, Anthony R.; Levinson, Howard; Shelburne, Samuel A.; Moore, Zack; Fowler, Vance G.

    2016-01-01

    Whole-genome analysis was applied to investigate atypical point-source transmission of 2 invasive group A streptococcal (GAS) infections. Isolates were serotype M4, ST39, and genetically indistinguishable. Comparison with MGAS10750 revealed nonsynonymous polymorphisms in ropB and increased speB transcription. This study demonstrates the usefulness of whole-genome analyses for GAS outbreaks. PMID:27006966

  13. Identification of Candidate Genes in Rice for Resistance to Sheath Blight Disease by Whole Genome Sequencing

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Recent advances in whole genome sequencing have allowed identification of genes for disease susceptibility in humans. The objective of our research was to exploit whole genome sequences of 13 rice (Oryza sativa L.) inbred lines to identify non-synonymous SNPs (nsSNPs) and candidate genes for resista...

  14. Whole genome sequencing of Saccharomyces cerevisiae: from genotype to phenotype for improved metabolic engineering applications

    PubMed Central

    2010-01-01

    Background Rapid and efficient microbial cell factory design and construction are possible through the enabling technology, metabolic engineering, which is now being facilitated by systems biology approaches. Metabolic engineering is often complemented by directed evolution, where selective pressure is applied to a partially genetically engineered strain to confer a desirable phenotype. The exact genetic modification or resulting genotype that leads to the improved phenotype is often not identified or understood well enough to enable further metabolic engineering. Results In this work, whole genome high-throughput sequencing and annotation were used to identify single nucleotide polymorphisms (SNPs) between Saccharomyces cerevisiae strains S288c and CEN.PK113-7D. The yeast strain S288c was the first eukaryote sequenced, serving as the reference genome for the Saccharomyces Genome Database, while CEN.PK113-7D is a preferred laboratory strain for industrial biotechnology research. A total of 13,787 high-quality SNPs were detected between both strains (reference strain: S288c). Considering only metabolic genes (782 of 5,596 annotated genes), a total of 219 metabolism specific SNPs are distributed across 158 metabolic genes, with 85 of the SNPs being nonsynonymous (e.g., encoding amino acid modifications). Amongst metabolic SNPs detected, there was pathway enrichment in the galactose uptake pathway (GAL1, GAL10) and ergosterol biosynthetic pathway (ERG8, ERG9). Physiological characterization confirmed a strong deficiency in galactose uptake and metabolism in S288c compared to CEN.PK113-7D, and similarly, ergosterol content in CEN.PK113-7D was significantly higher in both glucose and galactose supplemented cultivations compared to S288c. Furthermore, DNA microarray profiling of S288c and CEN.PK113-7D in both glucose and galactose batch cultures did not provide a clear hypothesis for major phenotypes observed, suggesting that genotype to phenotype

  15. Alignathon: a competitive assessment of whole-genome alignment methods.

    PubMed

    Earl, Dent; Nguyen, Ngan; Hickey, Glenn; Harris, Robert S; Fitzgerald, Stephen; Beal, Kathryn; Seledtsov, Igor; Molodtsov, Vladimir; Raney, Brian J; Clawson, Hiram; Kim, Jaebum; Kemena, Carsten; Chang, Jia-Ming; Erb, Ionas; Poliakov, Alexander; Hou, Minmei; Herrero, Javier; Kent, William James; Solovyev, Victor; Darling, Aaron E; Ma, Jian; Notredame, Cedric; Brudno, Michael; Dubchak, Inna; Haussler, David; Paten, Benedict

    2014-12-01

    Multiple sequence alignments (MSAs) are a prerequisite for a wide variety of evolutionary analyses. Published assessments and benchmark data sets for protein and, to a lesser extent, global nucleotide MSAs are available, but less effort has been made to establish benchmarks in the more general problem of whole-genome alignment (WGA). Using the same model as the successful Assemblathon competitions, we organized a competitive evaluation in which teams submitted their alignments and then assessments were performed collectively after all the submissions were received. Three data sets were used: Two were simulated and based on primate and mammalian phylogenies, and one was comprised of 20 real fly genomes. In total, 35 submissions were assessed, submitted by 10 teams using 12 different alignment pipelines. We found agreement between independent simulation-based and statistical assessments, indicating that there are substantial accuracy differences between contemporary alignment tools. We saw considerable differences in the alignment quality of differently annotated regions and found that few tools aligned the duplications analyzed. We found that many tools worked well at shorter evolutionary distances, but fewer performed competitively at longer distances. We provide all data sets, submissions, and assessment programs for further study and provide, as a resource for future benchmarking, a convenient repository of code and data for reproducing the simulation assessments. PMID:25273068

  16. Information recovery from low coverage whole-genome bisulfite sequencing

    PubMed Central

    Libertini, Emanuele; Heath, Simon C.; Hamoudi, Rifat A.; Gut, Marta; Ziller, Michael J.; Czyz, Agata; Ruotti, Victor; Stunnenberg, Hendrik G.; Frontini, Mattia; Ouwehand, Willem H.; Meissner, Alexander; Gut, Ivo G.; Beck, Stephan

    2016-01-01

    The cost of whole-genome bisulfite sequencing (WGBS) remains a bottleneck for many studies and it is therefore imperative to extract as much information as possible from a given dataset. This is particularly important because even at the recommended 30X coverage for reference methylomes, up to 50% of high-resolution features such as differentially methylated positions (DMPs) cannot be called with current methods as determined by saturation analysis. To address this limitation, we have developed a tool that dynamically segments WGBS methylomes into blocks of comethylation (COMETs) from which lost information can be recovered in the form of differentially methylated COMETs (DMCs). Using this tool, we demonstrate recovery of ∼30% of the lost DMP information content as DMCs even at very low (5X) coverage. This constitutes twice the amount that can be recovered using an existing method based on differentially methylated regions (DMRs). In addition, we explored the relationship between COMETs and haplotypes in lymphoblastoid cell lines of African and European origin. Using best fit analysis, we show COMETs to be correlated in a population-specific manner, suggesting that this type of dynamic segmentation may be useful for integrated (epi)genome-wide association studies in the future. PMID:27346250

  17. Whole-genome sequencing of nine esophageal adenocarcinoma cell lines.

    PubMed

    Contino, Gianmarco; Eldridge, Matthew D; Secrier, Maria; Bower, Lawrence; Fels Elliott, Rachael; Weaver, Jamie; Lynch, Andy G; Edwards, Paul A W; Fitzgerald, Rebecca C

    2016-01-01

    Esophageal adenocarcinoma (EAC) is highly mutated and molecularly heterogeneous. The number of cell lines available for study is limited and their genome has been only partially characterized. The availability of an accurate annotation of their mutational landscape is crucial for accurate experimental design and correct interpretation of genotype-phenotype findings. We performed high coverage, paired end whole genome sequencing on eight EAC cell lines (ESO26, ESO51, FLO-1, JH-EsoAd1, OACM5.1 C, OACP4 C, OE33, SK-GT-4), all verified against original patient material, and one esophageal high grade dysplasia cell line, CP-D. We have made available the aligned sequence data and report single nucleotide variants (SNVs), small insertions and deletions (indels), and copy number alterations, identified by comparison with the human reference genome and known single nucleotide polymorphisms (SNPs). We compare these putative mutations to mutations found in primary tissue EAC samples, to inform the use of these cell lines as a model of EAC. PMID:27594985

  18. Signatures of selection in tilapia revealed by whole genome resequencing

    PubMed Central

    Hong Xia, Jun; Bai, Zhiyi; Meng, Zining; Zhang, Yong; Wang, Le; Liu, Feng; Jing, Wu; Yi Wan, Zi; Li, Jiale; Lin, Haoran; Hua Yue, Gen

    2015-01-01

    Natural selection and selective breeding for genetic improvement have left detectable signatures within the genome of a species. Identification of selection signatures is important in evolutionary biology and for detecting genes that can help accelerate genetic improvement. However, selection signatures, including artificial selection and natural selection, have only been identified at the whole genome level in several genetically improved fish species. Tilapia is one of the most important genetically improved fish species in the world. Using next-generation sequencing, we sequenced the genomes of 47 tilapia individuals. We identified a total of 1.43 million high-quality SNPs and found that the LD block sizes ranged from 10–100 kb in tilapia. We detected over a hundred putative selective sweep regions in each line of tilapia. Most selection signatures were located in non-coding regions of the tilapia genome. The Wnt signaling, gonadotropin-releasing hormone receptor and integrin signaling pathways were under positive selection in all improved tilapia lines. Our study provides a genome-wide map of genetic variation and selection footprints in tilapia, which could be important for genetic studies and accelerating genetic improvement of tilapia. PMID:26373374

  19. Whole genome sequences of two octogenarians with sustained cognitive abilities

    PubMed Central

    Nickles, Dorothee; Madireddy, Lohith; Patel, Nihar; Isobe, Noriko; Miller, Bruce L.; Baranzini, Sergio E.; Kramer, Joel H.; Oksenberg, Jorge R.

    2014-01-01

    Although numerous genetic variants affecting aging and mortality have been identified, e.g. APOE ε4, the genetic component influencing cognitive aging has not been fully defined yet. A better knowledge of the genetics of aging will prove helpful in understanding the underlying biological processes. Here, we describe the whole genome sequences of two female octogenarians. We provide the repertoire of genomic variants that the two octogenarians have in common. We also describe the overlap with the previously reported genomes of two supercentenarians - individuals aged ≥ 110 years. We assessed the genetic disease propensities of the octogenarians and non-aged control genomes and could not find support for the hypothesis that long-lived healthy individuals might exhibit greater genetic fitness than the general population. Furthermore, there is no evidence for an accumulation of previously described variants promoting longevity in the two octogenarians. These findings suggest that genetic fitness, as currently defined, is not the sole factor enabling an increased lifespan. We identified a number of healthy-cognitive-aging candidate genetic loci awaiting confirmation in larger studies. PMID:25618617

  20. Whole-Genome Sequencing of Salivary Gland Adenoid Cystic Carcinoma.

    PubMed

    Rettig, Eleni M; Talbot, C Conover; Sausen, Mark; Jones, Sian; Bishop, Justin A; Wood, Laura D; Tokheim, Collin; Niknafs, Noushin; Karchin, Rachel; Fertig, Elana J; Wheelan, Sarah J; Marchionni, Luigi; Considine, Michael; Fakhry, Carole; Papadopoulos, Nickolas; Kinzler, Kenneth W; Vogelstein, Bert; Ha, Patrick K; Agrawal, Nishant

    2016-04-01

    Adenoid cystic carcinomas (ACC) of the salivary glands are challenging to understand, treat, and cure. To better understand the genetic alterations underlying the pathogenesis of these tumors, we performed comprehensive genome analyses of 25 fresh-frozen tumors, including whole-genome sequencing and expression and pathway analyses. In addition to the well-described MYB-NFIB fusion that was found in 11 tumors (44%), we observed five different rearrangements involving the NFIB transcription factor gene in seven tumors (28%). Taken together, NFIB translocations occurred in 15 of 25 samples (60%, 95% CI, 41%-77%). In addition, mRNA expression analysis of 17 tumors revealed overexpression of NFIB in ACC tumors compared with normal tissues (P = 0.002). There was no difference in NFIB mRNA expression in tumors with NFIB fusions compared with those without. We also report somatic mutations of genes involved in the axonal guidance and Rho family signaling pathways. Finally, we confirm previously described alterations in genes related to chromatin regulation and Notch signaling. Our findings suggest a separate role for NFIB in ACC oncogenesis and highlight important signaling pathways for future functional characterization and potential therapeutic targeting. Cancer Prev Res; 9(4); 265-74. ©2016 AACR. PMID:26862087
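
The reported proportion and interval (15 of 25, 60%, 95% CI 41%-77%) can be checked with a short calculation. The abstract does not state which interval method the authors used; the sketch below assumes a Wilson score interval, which happens to give matching rounded bounds. The function name `wilson_interval` is introduced here for illustration.

```python
import math


def wilson_interval(successes, n, z=1.959964):
    """Wilson score interval for a binomial proportion (z defaults to the 95% level)."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half_width, center + half_width


low, high = wilson_interval(15, 25)
print(f"{15 / 25:.0%} (95% CI {low:.0%}-{high:.0%})")  # prints: 60% (95% CI 41%-77%)
```

Other common choices, such as the exact Clopper-Pearson interval, give slightly wider bounds for the same counts, so agreement here is suggestive rather than proof of the method used.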

  1. Evolution After Whole-Genome Duplication: A Network Perspective

    PubMed Central

    Zhu, Yun; Lin, Zhenguo; Nakhleh, Luay

    2013-01-01

    Gene duplication plays an important role in the evolution of genomes and interactomes. Elucidating how evolution after gene duplication interplays at the sequence and network level is of great interest. In this work, we analyze a data set of gene pairs that arose through whole-genome duplication (WGD) in yeast. All these pairs have the same duplication time, making them ideal for evolutionary investigation. We investigated the interplay between evolution after WGD at the sequence and network levels and correlated these two levels of divergence with gene expression and fitness data. We find that molecular interactions involving WGD genes evolve at rates that are three orders of magnitude slower than the rates of evolution of the corresponding sequences. Furthermore, we find that divergence of WGD pairs correlates strongly with gene expression and fitness data. Because of the role of gene duplication in determining redundancy in biological systems and particularly at the network level, we investigated the role of interaction networks in elucidating the evolutionary fate of duplicated genes. We find that gene neighborhoods in interaction networks provide a mechanism for inferring these fates, and we developed an algorithm for achieving this task. Further epistasis analysis of WGD pairs categorized by their inferred evolutionary fates demonstrated the utility of these techniques. Finally, we find that WGD pairs and other pairs of paralogous genes of small-scale duplication origin share similar properties, giving good support for generalizing our results from WGD pairs to evolution after gene duplication in general. PMID:24048644

  2. MIPS: analysis and annotation of proteins from whole genomes.

    PubMed

    Mewes, H W; Amid, C; Arnold, R; Frishman, D; Güldener, U; Mannhaupt, G; Münsterkötter, M; Pagel, P; Strack, N; Stümpflen, V; Warfsmann, J; Ruepp, A

    2004-01-01

    The Munich Information Center for Protein Sequences (MIPS-GSF), Neuherberg, Germany, provides protein sequence-related information based on whole-genome analysis. The main focus of the work is directed toward the systematic organization of sequence-related attributes as gathered by a variety of algorithms, primary information from experimental data together with information compiled from the scientific literature. MIPS maintains automatically generated and manually annotated genome-specific databases, develops systematic classification schemes for the functional annotation of protein sequences and provides tools for the comprehensive analysis of protein sequences. This report updates the information on the yeast genome (CYGD), the Neurospora crassa genome (MNCDB), the database of complete cDNAs (German Human Genome Project, NGFN), the database of mammalian protein-protein interactions (MPPI), the database of FASTA homologies (SIMAP), and the interface for the fast retrieval of protein-associated information (QUIPOS). The Arabidopsis thaliana database, the rice database, the plant EST databases (MATDB, MOsDB, SPUTNIK), as well as the databases for the comprehensive set of genomes (PEDANT genomes) are described elsewhere in the 2003 and 2004 NAR database issues, respectively. All databases described, and the detailed descriptions of our projects can be accessed through the MIPS web server (http://mips.gsf.de). PMID:14681354

  3. Whole genomes redefine the mutational landscape of pancreatic cancer

    PubMed Central

    Waddell, Nicola; Pajic, Marina; Patch, Ann-Marie; Chang, David K.; Kassahn, Karin S.; Bailey, Peter; Johns, Amber L.; Miller, David; Nones, Katia; Quek, Kelly; Quinn, Michael C. J.; Robertson, Alan J.; Fadlullah, Muhammad Z. H.; Bruxner, Tim J. C.; Christ, Angelika N.; Harliwong, Ivon; Idrisoglu, Senel; Manning, Suzanne; Nourse, Craig; Nourbakhsh, Ehsan; Wani, Shivangi; Wilson, Peter J; Markham, Emma; Cloonan, Nicole; Anderson, Matthew J.; Fink, J. Lynn; Holmes, Oliver; Kazakoff, Stephen H.; Leonard, Conrad; Newell, Felicity; Poudel, Barsha; Song, Sarah; Taylor, Darrin; Waddell, Nick; Wood, Scott; Xu, Qinying; Wu, Jianmin; Pinese, Mark; Cowley, Mark J.; Lee, Hong C.; Jones, Marc D.; Nagrial, Adnan M.; Humphris, Jeremy; Chantrill, Lorraine A.; Chin, Venessa; Steinmann, Angela M.; Mawson, Amanda; Humphrey, Emily S.; Colvin, Emily K.; Chou, Angela; Scarlett, Christopher J.; Pinho, Andreia V.; Giry-Laterriere, Marc; Rooman, Ilse; Samra, Jaswinder S.; Kench, James G.; Pettitt, Jessica A.; Merrett, Neil D.; Toon, Christopher; Epari, Krishna; Nguyen, Nam Q.; Barbour, Andrew; Zeps, Nikolajs; Jamieson, Nigel B.; Graham, Janet S.; Niclou, Simone P.; Bjerkvig, Rolf; Grützmann, Robert; Aust, Daniela; Hruban, Ralph H.; Maitra, Anirban; Iacobuzio-Donahue, Christine A.; Wolfgang, Christopher L.; Morgan, Richard A.; Lawlor, Rita T.; Corbo, Vincenzo; Bassi, Claudio; Falconi, Massimo; Zamboni, Giuseppe; Tortora, Giampaolo; Tempero, Margaret A.; Gill, Anthony J.; Eshleman, James R.; Pilarsky, Christian; Scarpa, Aldo; Musgrove, Elizabeth A.; Pearson, John V.; Biankin, Andrew V.; Grimmond, Sean M.

    2015-01-01

    Pancreatic cancer remains one of the most lethal of malignancies and a major health burden. We performed whole-genome sequencing and copy number variation (CNV) analysis of 100 pancreatic ductal adenocarcinomas (PDACs). Chromosomal rearrangements leading to gene disruption were prevalent, affecting genes known to be important in pancreatic cancer (TP53, SMAD4, CDKN2A, ARID1A and ROBO2) and new candidate drivers of pancreatic carcinogenesis (KDM6A and PREX2). Patterns of structural variation (variation in chromosomal structure) classified PDACs into 4 subtypes with potential clinical utility: the subtypes were termed stable, locally rearranged, scattered and unstable. A significant proportion harboured focal amplifications, many of which contained druggable oncogenes (ERBB2, MET, FGFR1, CDK6, PIK3R3 and PIK3CA), but at low individual patient prevalence. Genomic instability co-segregated with inactivation of DNA maintenance genes (BRCA1, BRCA2 or PALB2) and a mutational signature of DNA damage repair deficiency. Of 8 patients who received platinum therapy, 4 of 5 individuals with these measures of defective DNA maintenance responded. PMID:25719666

  4. Whole genomes redefine the mutational landscape of pancreatic cancer.

    PubMed

    Waddell, Nicola; Pajic, Marina; Patch, Ann-Marie; Chang, David K; Kassahn, Karin S; Bailey, Peter; Johns, Amber L; Miller, David; Nones, Katia; Quek, Kelly; Quinn, Michael C J; Robertson, Alan J; Fadlullah, Muhammad Z H; Bruxner, Tim J C; Christ, Angelika N; Harliwong, Ivon; Idrisoglu, Senel; Manning, Suzanne; Nourse, Craig; Nourbakhsh, Ehsan; Wani, Shivangi; Wilson, Peter J; Markham, Emma; Cloonan, Nicole; Anderson, Matthew J; Fink, J Lynn; Holmes, Oliver; Kazakoff, Stephen H; Leonard, Conrad; Newell, Felicity; Poudel, Barsha; Song, Sarah; Taylor, Darrin; Waddell, Nick; Wood, Scott; Xu, Qinying; Wu, Jianmin; Pinese, Mark; Cowley, Mark J; Lee, Hong C; Jones, Marc D; Nagrial, Adnan M; Humphris, Jeremy; Chantrill, Lorraine A; Chin, Venessa; Steinmann, Angela M; Mawson, Amanda; Humphrey, Emily S; Colvin, Emily K; Chou, Angela; Scarlett, Christopher J; Pinho, Andreia V; Giry-Laterriere, Marc; Rooman, Ilse; Samra, Jaswinder S; Kench, James G; Pettitt, Jessica A; Merrett, Neil D; Toon, Christopher; Epari, Krishna; Nguyen, Nam Q; Barbour, Andrew; Zeps, Nikolajs; Jamieson, Nigel B; Graham, Janet S; Niclou, Simone P; Bjerkvig, Rolf; Grützmann, Robert; Aust, Daniela; Hruban, Ralph H; Maitra, Anirban; Iacobuzio-Donahue, Christine A; Wolfgang, Christopher L; Morgan, Richard A; Lawlor, Rita T; Corbo, Vincenzo; Bassi, Claudio; Falconi, Massimo; Zamboni, Giuseppe; Tortora, Giampaolo; Tempero, Margaret A; Gill, Anthony J; Eshleman, James R; Pilarsky, Christian; Scarpa, Aldo; Musgrove, Elizabeth A; Pearson, John V; Biankin, Andrew V; Grimmond, Sean M

    2015-02-26

    Pancreatic cancer remains one of the most lethal of malignancies and a major health burden. We performed whole-genome sequencing and copy number variation (CNV) analysis of 100 pancreatic ductal adenocarcinomas (PDACs). Chromosomal rearrangements leading to gene disruption were prevalent, affecting genes known to be important in pancreatic cancer (TP53, SMAD4, CDKN2A, ARID1A and ROBO2) and new candidate drivers of pancreatic carcinogenesis (KDM6A and PREX2). Patterns of structural variation (variation in chromosomal structure) classified PDACs into 4 subtypes with potential clinical utility: the subtypes were termed stable, locally rearranged, scattered and unstable. A significant proportion harboured focal amplifications, many of which contained druggable oncogenes (ERBB2, MET, FGFR1, CDK6, PIK3R3 and PIK3CA), but at low individual patient prevalence. Genomic instability co-segregated with inactivation of DNA maintenance genes (BRCA1, BRCA2 or PALB2) and a mutational signature of DNA damage repair deficiency. Of 8 patients who received platinum therapy, 4 of 5 individuals with these measures of defective DNA maintenance responded. PMID:25719666

  5. Whole-genome sequencing of nine esophageal adenocarcinoma cell lines

    PubMed Central

    Contino, Gianmarco; Eldridge, Matthew D.; Secrier, Maria; Bower, Lawrence; Fels Elliott, Rachael; Weaver, Jamie; Lynch, Andy G.; Edwards, Paul A.W.; Fitzgerald, Rebecca C.

    2016-01-01

    Esophageal adenocarcinoma (EAC) is highly mutated and molecularly heterogeneous. The number of cell lines available for study is limited and their genome has been only partially characterized. The availability of an accurate annotation of their mutational landscape is crucial for accurate experimental design and correct interpretation of genotype-phenotype findings. We performed high coverage, paired end whole genome sequencing on eight EAC cell lines—ESO26, ESO51, FLO-1, JH-EsoAd1, OACM5.1 C, OACP4 C, OE33, SK-GT-4—all verified against original patient material, and one esophageal high grade dysplasia cell line, CP-D. We have made available the aligned sequence data and report single nucleotide variants (SNVs), small insertions and deletions (indels), and copy number alterations, identified by comparison with the human reference genome and known single nucleotide polymorphisms (SNPs). We compare these putative mutations to mutations found in primary tissue EAC samples, to inform the use of these cell lines as a model of EAC.

  6. Information recovery from low coverage whole-genome bisulfite sequencing.

    PubMed

    Libertini, Emanuele; Heath, Simon C; Hamoudi, Rifat A; Gut, Marta; Ziller, Michael J; Czyz, Agata; Ruotti, Victor; Stunnenberg, Hendrik G; Frontini, Mattia; Ouwehand, Willem H; Meissner, Alexander; Gut, Ivo G; Beck, Stephan

    2016-01-01

    The cost of whole-genome bisulfite sequencing (WGBS) remains a bottleneck for many studies and it is therefore imperative to extract as much information as possible from a given dataset. This is particularly important because even at the recommended 30X coverage for reference methylomes, up to 50% of high-resolution features such as differentially methylated positions (DMPs) cannot be called with current methods as determined by saturation analysis. To address this limitation, we have developed a tool that dynamically segments WGBS methylomes into blocks of comethylation (COMETs) from which lost information can be recovered in the form of differentially methylated COMETs (DMCs). Using this tool, we demonstrate recovery of ∼30% of the lost DMP information content as DMCs even at very low (5X) coverage. This constitutes twice the amount that can be recovered using an existing method based on differentially methylated regions (DMRs). In addition, we explored the relationship between COMETs and haplotypes in lymphoblastoid cell lines of African and European origin. Using best fit analysis, we show COMETs to be correlated in a population-specific manner, suggesting that this type of dynamic segmentation may be useful for integrated (epi)genome-wide association studies in the future. PMID:27346250

  7. Current Developments in Prokaryotic Single Cell Whole Genome Amplification

    SciTech Connect

    Goudeau, Danielle; Nath, Nandita; Ciobanu, Doina; Cheng, Jan-Fang; Malmstrom, Rex

    2014-03-14

    Our approach to prokaryotic single-cell Whole Genome Amplification at the JGI continues to evolve. To increase both the quality and number of single-cell genomes produced, we explore all aspects of the process from cell sorting to sequencing. For example, we now utilize specialized reagents, acoustic liquid handling, and reduced reaction volumes to eliminate non-target DNA contamination in WGA reactions. More specifically, we use a cleaner commercial WGA kit from Qiagen that employs a UV decontamination procedure initially developed at the JGI, and we use the Labcyte Echo for tip-less liquid transfer to set up 2 µL reactions. Acoustic liquid handling also dramatically reduces reagent costs. In addition, we are exploring new cell lysis methods including treatment with Proteinase K, lysozyme, and other detergents, in order to complement standard alkaline lysis and allow for more efficient disruption of a wider range of cells. Incomplete lysis represents a major hurdle for WGA on some environmental samples, especially rhizosphere, peatland, and other soils. Finding effective lysis strategies that are also compatible with WGA is challenging, and we are currently assessing the impact of various strategies on genome recovery.

  8. Genetic anchoring of whole-genome shotgun assemblies

    PubMed Central

    Mascher, Martin; Stein, Nils

    2014-01-01

    The recent advances in sequencing throughput and genome assembly algorithms have established whole-genome shotgun (WGS) assemblies as the cornerstone of the genomic infrastructure for many species. WGS assemblies can be constructed with comparative ease and give a comprehensive representation of the gene space even of large and complex genomes. One major obstacle in utilizing WGS assemblies for important research applications such as gene isolation or comparative genomics has been the lack of chromosomal positioning and contextualization of short sequence contigs. Assigning chromosomal locations to sequence contigs required the construction and integration of genome-wide physical maps and dense genetic linkage maps as well as synteny to model species. Recently, methods to rapidly construct ultra-dense linkage maps encompassing millions of genetic markers from WGS sequencing data of segregating populations have made possible the direct assignment of genetic positions to short sequence contigs. Here, we review recent developments in the integration of WGS assemblies and sequence-based linkage maps, discuss challenges for further improvement of the methodology and outline possible applications building on genetically anchored WGS assemblies. PMID:25071835

  9. Use of metaphors about exome and whole genome sequencing.

    PubMed

    Nelson, Sarah C; Crouch, Julia M; Bamshad, Michael J; Tabor, Holly K; Yu, Joon-Ho

    2016-05-01

    Clinical and research uses of exome and whole genome sequencing (ES/WGS) are growing rapidly. An enhanced understanding of how individuals conceptualize and communicate about sequencing results is needed to ensure effective, mutual exchange of information between care providers and patients and between researchers and participants. Focus group and interview participants were recruited to discuss their attitudes and preferences for receiving hypothetical results from ES/WGS. African Americans were intentionally oversampled. We qualitatively analyzed participants' speech to identify unsolicited metaphorical language pertaining to genes and health, and grouped these occurrences into metaphorical concepts. Participants compared genetic information to physical objects including tools, weapons, contents of boxes, and formal documents or reports. These metaphorical concepts centered on several key themes, including locus of control; containment versus release of information; and desirability, usability, interpretability, and ownership of genetic results. Metaphorical language is often used intentionally or unintentionally in discussions about receiving results from ES/WGS in both clinical and research settings. Awareness of the use of metaphorical language and attention to its varied meanings facilitates effective communication about return of ES/WGS results. In turn, both should foster shared and informed decision-making and improve the translation of genetic information by clinicians and researchers. © 2016 Wiley Periodicals, Inc. PMID:26822973

  10. Alignathon: a competitive assessment of whole-genome alignment methods

    PubMed Central

    Earl, Dent; Nguyen, Ngan; Hickey, Glenn; Harris, Robert S.; Fitzgerald, Stephen; Beal, Kathryn; Seledtsov, Igor; Molodtsov, Vladimir; Raney, Brian J.; Clawson, Hiram; Kim, Jaebum; Kemena, Carsten; Chang, Jia-Ming; Erb, Ionas; Poliakov, Alexander; Hou, Minmei; Herrero, Javier; Kent, William James; Solovyev, Victor; Darling, Aaron E.; Ma, Jian; Notredame, Cedric; Brudno, Michael; Dubchak, Inna; Haussler, David; Paten, Benedict

    2014-01-01

    Multiple sequence alignments (MSAs) are a prerequisite for a wide variety of evolutionary analyses. Published assessments and benchmark data sets for protein and, to a lesser extent, global nucleotide MSAs are available, but less effort has been made to establish benchmarks in the more general problem of whole-genome alignment (WGA). Using the same model as the successful Assemblathon competitions, we organized a competitive evaluation in which teams submitted their alignments and then assessments were performed collectively after all the submissions were received. Three data sets were used: Two were simulated and based on primate and mammalian phylogenies, and one was comprised of 20 real fly genomes. In total, 35 submissions were assessed, submitted by 10 teams using 12 different alignment pipelines. We found agreement between independent simulation-based and statistical assessments, indicating that there are substantial accuracy differences between contemporary alignment tools. We saw considerable differences in the alignment quality of differently annotated regions and found that few tools aligned the duplications analyzed. We found that many tools worked well at shorter evolutionary distances, but fewer performed competitively at longer distances. We provide all data sets, submissions, and assessment programs for further study and provide, as a resource for future benchmarking, a convenient repository of code and data for reproducing the simulation assessments. PMID:25273068

  11. Clinical use of whole genome sequencing for Mycobacterium tuberculosis.

    PubMed

    Witney, Adam A; Cosgrove, Catherine A; Arnold, Amber; Hinds, Jason; Stoker, Neil G; Butcher, Philip D

    2016-01-01

    Drug-resistant tuberculosis (TB) remains a major challenge to global health and to healthcare in the UK. In 2014, a total of 6,520 cases of TB were recorded in England, of which 1.4 % were multidrug-resistant TB (MDR-TB). Extensively drug-resistant TB (XDR-TB) occurs at a much lower rate, but the impact on the patient and hospital is severe. Current diagnostic methods such as drug susceptibility testing and targeted molecular tests are slow to return or examine only a limited number of target regions, respectively. Faster, more comprehensive diagnostics will enable earlier use of the most appropriate drug regimen, thus improving patient outcomes and reducing overall healthcare costs. Whole genome sequencing (WGS) has been shown to provide a rapid and comprehensive view of the genotype of the organism, and thus enable reliable prediction of the drug susceptibility phenotype within a clinically relevant timeframe. In addition, it provides the highest resolution when investigating transmission events in possible outbreak scenarios. However, robust software and database tools need to be developed for the full potential to be realized in this specialized area of medicine. PMID:27004841

  12. Clinical value of whole-genome sequencing of Mycobacterium tuberculosis.

    PubMed

    Takiff, Howard E; Feo, Oscar

    2015-09-01

    Whole-genome sequencing (WGS) is now common as a result of new technologies that can rapidly sequence a complete bacterial genome for US$500 or less. Many studies have addressed questions about tuberculosis with WGS, and knowing the sequence of the entire genome, rather than only a few fragments, has greatly increased the precision of molecular epidemiology and contact tracing. Additionally, topics such as the mutation rate, drug resistance, the target of new drugs, and the phylogeny and evolution of the Mycobacterium tuberculosis complex bacteria have been elucidated by WGS. Nonetheless, WGS has not explained differences in transmissibility between strains, or why some strains are more virulent than others or more prone to development of multidrug resistance. With advances in technology, WGS of clinical specimens could become routine in high-income countries; however, its relevance will probably depend on easy to use software to efficiently process the sequences produced and accessible genomic databases that can be mined in future studies. PMID:26277037

  13. Whole genome shotgun assembly in theory and practice

    NASA Astrophysics Data System (ADS)

    Chapman, Jarrod Andrew

    The subject of this dissertation is the development of novel analytical and algorithmic approaches to the fragment assembly problem in the context of the Whole Genome Shotgun (WGS) DNA sequencing strategy. A collection of analyses and methods centered on the computational reconstruction of genomic DNA sequence from randomly sampled genome fragments, with particular focus on applications to large, polymorphic, and inhomogeneous datasets are presented. Several novel pre-assembly WGS data analyses are described including assessment of genome size, sequence uniformity, and repetitive element content with particular emphasis on the establishment of standardized quality assurance metrics for large WGS sequencing projects. A theoretical framework for understanding the statistical properties of WGS assemblies in the presence of paired-end sequence data is discussed and the algorithmic sub-problems of quality-based sequence trimming, global pairwise alignment detection, and consensus sequence generation are treated. Finally, as a novel application of these analyses and methods, the results of a collaboration to produce the first WGS sequence reconstruction of a community sample from a natural environment are presented.

  14. Whole-genome sequencing reveals oncogenic mutations in mycosis fungoides

    PubMed Central

    McGirt, Laura Y.; Jia, Peilin; Baerenwald, Devin A.; Duszynski, Robert J.; Dahlman, Kimberly B.; Zic, John A.; Zwerner, Jeffrey P.; Hucks, Donald; Dave, Utpal; Zhao, Zhongming

    2015-01-01

    The pathogenesis of mycosis fungoides (MF), the most common cutaneous T-cell lymphoma (CTCL), is unknown. Although genetic alterations have been identified, none are considered consistently causative in MF. To identify potential drivers of MF, we performed whole-genome sequencing of MF tumors and matched normal skin. Targeted ultra-deep sequencing of MF samples and exome sequencing of CTCL cell lines were also performed. Multiple mutations were identified that affected the same pathways, including epigenetic, cell-fate regulation, and cytokine signaling, in MF tumors and CTCL cell lines. Specifically, interleukin-2 signaling pathway mutations, including activating Janus kinase 3 (JAK3) mutations, were detected. Treatment with a JAK3 inhibitor significantly reduced CTCL cell survival. Additionally, the mutation data identified 2 other potential contributing factors to MF, ultraviolet light, and a polymorphism in the tumor suppressor p53 (TP53). Therefore, genetic alterations in specific pathways in MF were identified that may be viable, effective new targets for treatment. PMID:26082451

  15. Are physicians prepared for whole genome sequencing? a qualitative analysis.

    PubMed

    Christensen, K D; Vassy, J L; Jamal, L; Lehmann, L S; Slashinski, M J; Perry, D L; Robinson, J O; Blumenthal-Barby, J; Feuerman, L Z; Murray, M F; Green, R C; McGuire, A L

    2016-02-01

    Although the integration of whole genome sequencing (WGS) into standard medical practice is rapidly becoming feasible, physicians may be unprepared to use it. Primary care physicians (PCPs) and cardiologists enrolled in a randomized clinical trial of WGS received genomics education before completing semi-structured interviews. Themes about preparedness were identified in transcripts through team-based consensus-coding. Data from 11 PCPs and 9 cardiologists suggested that physicians enrolled in the trial primarily to prepare themselves for widespread use of WGS in the future. PCPs were concerned about their general genomic knowledge, while cardiologists were concerned about how to interpret specific types of results and secondary findings. Both cohorts anticipated preparing extensively before disclosing results to patients by using educational resources with which they were already familiar, and both cohorts anticipated making referrals to genetics specialists as needed. A lack of laboratory guidance, time pressures, and a lack of standards contributed to feeling unprepared. Physicians had specialty-specific concerns about their preparedness to use WGS. Findings identify specific policy changes that could help physicians feel more prepared, and highlight how providers of all types will need to become familiar with interpreting WGS results. PMID:26080898

  16. Whole Genome Analysis of Epidemiologically Closely Related Staphylococcus aureus Isolates

    PubMed Central

    Schijffelen, Maarten; Konstantinov, Sergey R.; Lina, Gérard; Spiliopoulou, Iris; van Duijkeren, Engeline; Brouwer, Ellen C.; Fluit, Ad C.

    2013-01-01

    The change of the bacteria from colonizers to pathogens is accompanied by a drastic change in expression profiles. These changes may be due to environmental signals or to mutational changes. We therefore compared the whole genome sequences of four sets of S. aureus isolates. Three sets were from the same patients. The isolates of each pair (S1800/S1805, S2396/S2395, S2398/S2397, an isolate from colonization and an isolate from infection, respectively) were obtained within <30 days of each other and the isolate from infection caused skin infections. The isolates were then compared for differences in gene content and SNPs. In addition, a set of isolates from a colonized pig and a farmer from the same farm at the same time (S0462 and S0460) were analyzed. The isolate pair S1800/S1805 showed a difference in a prophage, but these are easily lost or acquired. However, S1805 contained an integrative conjugative element not present in S1800. In addition, 92 SNPs were present in a variety of genes and the isolates S1800 and S1805 were not considered a pair. Between S2395/S2396 two SNPs were present: one was in an intergenic region and one was a synonymous mutation in a putative membrane protein. Between S2397/S2398 only one synonymous mutation in a putative lipoprotein was found. The two farm isolates were very similar and showed 12 SNPs in genes that belong to a number of different functional categories. However, we cannot pinpoint any gene that explains the change from carrier status to infection. The data indicate that differences between the isolate from infection and the colonizing isolate for S2395/S2396 and S2397/S2398 exist as well as between isolates from different hosts, but S1800/S1805 are not clonal. PMID:24205205

  17. INTEGRATE: gene fusion discovery using whole genome and transcriptome data

    PubMed Central

    Zhang, Jin; White, Nicole M.; Schmidt, Heather K.; Fulton, Robert S.; Tomlinson, Chad; Warren, Wesley C.; Wilson, Richard K.; Maher, Christopher A.

    2016-01-01

    While next-generation sequencing (NGS) has become the primary technology for discovering gene fusions, we are still faced with the challenge of ensuring that causative mutations are not missed while minimizing false positives. Currently, there are many computational tools that predict structural variations (SV) and gene fusions using whole genome (WGS) and transcriptome sequencing (RNA-seq) data separately. However, as both WGS and RNA-seq have their limitations when used independently, we hypothesize that the orthogonal validation from integrating both data could generate a sensitive and specific approach for detecting high-confidence gene fusion predictions. Fortunately, decreasing NGS costs have resulted in a growing quantity of patients with both data available. Therefore, we developed a gene fusion discovery tool, INTEGRATE, that leverages both RNA-seq and WGS data to reconstruct gene fusion junctions and genomic breakpoints by split-read mapping. To evaluate INTEGRATE, we compared it with eight additional gene fusion discovery tools using the well-characterized breast cell line HCC1395 and peripheral blood lymphocytes derived from the same patient (HCC1395BL). The predictions subsequently underwent a targeted validation leading to the discovery of 131 novel fusions in addition to the seven previously reported fusions. Overall, INTEGRATE only missed six out of the 138 validated fusions and had the highest accuracy of the nine tools evaluated. Additionally, we applied INTEGRATE to 62 breast cancer patients from The Cancer Genome Atlas (TCGA) and found multiple recurrent gene fusions including a subset involving estrogen receptor. Taken together, INTEGRATE is a highly sensitive and accurate tool that is freely available for academic use. PMID:26556708
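
    The counts reported above can be checked with a short calculation. The sketch below is an illustration only and is not part of INTEGRATE: the function name `sensitivity` is hypothetical, and it computes recall against the validated fusion set, which is not necessarily the accuracy measure used by the authors.

```python
def sensitivity(validated_total: int, missed: int) -> float:
    """Fraction of validated fusions that a tool recovered (recall)."""
    if validated_total <= 0 or not 0 <= missed <= validated_total:
        raise ValueError("counts must satisfy 0 <= missed <= validated_total, total > 0")
    return (validated_total - missed) / validated_total


# Figures taken from the abstract: 131 novel + 7 previously reported fusions,
# of which 6 were missed.
validated = 131 + 7
recall = sensitivity(validated, missed=6)
print(f"{validated - 6} of {validated} validated fusions detected ({recall:.1%})")
```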

  18. A whole genome association study on meat palatability in hanwoo.

    PubMed

    Hyeong, K-E; Lee, Y-M; Kim, Y-S; Nam, K C; Jo, C; Lee, K-H; Lee, J-E; Kim, J-J

    2014-09-01

    A whole genome association (WGA) study was carried out to find quantitative trait loci (QTL) for sensory evaluation traits in Hanwoo. Carcass samples of 250 Hanwoo steers were collected from National Agricultural Cooperative Livestock Research Institute, Ansung, Gyeonggi province, Korea, between 2011 and 2012 and genotyped with the Affymetrix Bovine Axiom Array 640K single nucleotide polymorphism (SNP) chip. Among the SNPs in the chip, a total of 322,160 SNPs were chosen after quality control tests. After adjusting for the effects of age, slaughter-year-season, and polygenic effects using a genome relationship matrix, the corrected phenotypes for the sensory evaluation measurements were regressed on each SNP using a simple linear regression additive based model. A total of 1,631 SNPs were detected for color, aroma, tenderness, juiciness and palatability at 0.1% comparison-wise level. Among the significant SNPs, the best set of 52 SNP markers was chosen using a forward regression procedure at 0.05 level, among which the sets of 8, 14, 11, 10, and 9 SNPs were determined for the respective sensory evaluation traits. The sets of significant SNPs explained 18% to 31% of phenotypic variance. Three SNPs were pleiotropic, i.e. AX-26703353 and AX-26742891 that were located at 101 and 110 Mb of BTA6, respectively, influencing tenderness, juiciness and palatability, while AX-18624743 at 3 Mb of BTA10 affected tenderness and palatability. Our results suggest that some QTL for sensory measures are segregating in a Hanwoo steer population. Additional WGA studies on fatty acid and nutritional components as well as the sensory panels are in process to characterize genetic architecture of meat quality and palatability in Hanwoo. PMID:25178363
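
    The single-SNP test described above regresses a corrected phenotype on one marker at a time. A minimal sketch follows, assuming genotypes are coded as allele counts (0, 1, 2) and that the phenotype has already been adjusted for fixed and polygenic effects; the function name and the toy data are hypothetical and are not taken from the study.

```python
def additive_snp_regression(genotypes, phenotypes):
    """Ordinary least squares of phenotype on allele count for one SNP.

    Returns (slope, intercept, r_squared); the slope is the estimated
    additive effect per copy of the counted allele.
    """
    n = len(genotypes)
    if n != len(phenotypes) or n < 3:
        raise ValueError("need at least 3 paired observations")
    mean_g = sum(genotypes) / n
    mean_y = sum(phenotypes) / n
    sxx = sum((g - mean_g) ** 2 for g in genotypes)
    if sxx == 0:
        raise ValueError("SNP is monomorphic in this sample")
    sxy = sum((g - mean_g) * (y - mean_y) for g, y in zip(genotypes, phenotypes))
    syy = sum((y - mean_y) ** 2 for y in phenotypes)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_g
    # Share of phenotypic variance explained by this single marker.
    r_squared = (sxy * sxy) / (sxx * syy) if syy > 0 else 0.0
    return slope, intercept, r_squared


# Toy data (hypothetical): six animals, phenotype rises one unit per allele copy.
slope, intercept, r2 = additive_snp_regression(
    [0, 1, 2, 0, 1, 2], [1.0, 2.0, 3.0, 1.0, 2.0, 3.0]
)
print(slope, intercept, r2)
```

    In a genome-wide scan this fit would be repeated for every SNP that passed quality control, with a significance threshold applied to each test.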

  19. Whole-genome haplotyping by dilution, amplification, and sequencing

    PubMed Central

    Kaper, Fiona; Swamy, Sajani; Klotzle, Brandy; Munchel, Sarah; Cottrell, Joseph; Bibikova, Marina; Chuang, Han-Yu; Kruglyak, Semyon; Ronaghi, Mostafa; Eberle, Michael A.; Fan, Jian-Bing

    2013-01-01

    Standard whole-genome genotyping technologies are unable to determine haplotypes. Here we describe a method for rapid and cost-effective long-range haplotyping. Genomic DNA is diluted and distributed into multiple aliquots such that each aliquot receives a fraction of a haploid copy. The DNA template in each aliquot is amplified by multiple displacement amplification, converted into barcoded sequencing libraries using Nextera technology, and sequenced in multiplexed pools. To assess the performance of our method, we combined two male genomic DNA samples at equal ratios, resulting in a sample with diploid X chromosomes with known haplotypes. Pools of the multiplexed sequencing libraries were subjected to targeted pull-down of a 1-Mb contiguous region of the X-chromosome Duchenne muscular dystrophy gene. We were able to phase the Duchenne muscular dystrophy region into two contiguous haplotype blocks with a mean length of 494 kb. The haplotypes showed 99% agreement with the consensus base calls made by sequencing the individual DNAs. We subsequently used the strategy to haplotype two human genomes. Standard genomic sequencing to identify all heterozygous SNPs in the sample was combined with dilution-amplification–based sequencing data to resolve the phase of identified heterozygous SNPs. Using this procedure, we were able to phase >95% of the heterozygous SNPs from the diploid sequence data. The N50 for a Yoruba male DNA was 702 kb whereas the N50 for a European female DNA was 358 kb. Therefore, the strategy described here is suitable for haplotyping of a set of targeted regions as well as of the entire genome. PMID:23509297
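
    The abstract reports haplotype block N50 values without defining the statistic. The sketch below assumes the common definition (the block length at which blocks of that length or longer cover at least half of the total phased length); the function name and the example lengths are hypothetical and are not data from the study.

```python
def n50(lengths):
    """Return the N50 of a collection of block lengths."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first block that pushes the running sum to half the total is the N50.
        if running >= half_total:
            return length
    raise ValueError("lengths must be non-empty")


# Hypothetical haplotype block lengths in kb.
blocks_kb = [900, 702, 400, 300, 100]
print(n50(blocks_kb))
```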

  20. Whole-genome haplotyping by dilution, amplification, and sequencing.

    PubMed

    Kaper, Fiona; Swamy, Sajani; Klotzle, Brandy; Munchel, Sarah; Cottrell, Joseph; Bibikova, Marina; Chuang, Han-Yu; Kruglyak, Semyon; Ronaghi, Mostafa; Eberle, Michael A; Fan, Jian-Bing

    2013-04-01

    Standard whole-genome genotyping technologies are unable to determine haplotypes. Here we describe a method for rapid and cost-effective long-range haplotyping. Genomic DNA is diluted and distributed into multiple aliquots such that each aliquot receives a fraction of a haploid copy. The DNA template in each aliquot is amplified by multiple displacement amplification, converted into barcoded sequencing libraries using Nextera technology, and sequenced in multiplexed pools. To assess the performance of our method, we combined two male genomic DNA samples at equal ratios, resulting in a sample with diploid X chromosomes with known haplotypes. Pools of the multiplexed sequencing libraries were subjected to targeted pull-down of a 1-Mb contiguous region of the X-chromosome Duchenne muscular dystrophy gene. We were able to phase the Duchenne muscular dystrophy region into two contiguous haplotype blocks with a mean length of 494 kb. The haplotypes showed 99% agreement with the consensus base calls made by sequencing the individual DNAs. We subsequently used the strategy to haplotype two human genomes. Standard genomic sequencing to identify all heterozygous SNPs in the sample was combined with dilution-amplification-based sequencing data to resolve the phase of identified heterozygous SNPs. Using this procedure, we were able to phase >95% of the heterozygous SNPs from the diploid sequence data. The N50 for a Yoruba male DNA was 702 kb whereas the N50 for a European female DNA was 358 kb. Therefore, the strategy described here is suitable for haplotyping of a set of targeted regions as well as of the entire genome. PMID:23509297

  1. New wheat microRNA using whole-genome sequence.

    PubMed

    Kurtoglu, Kuaybe Yucebilgili; Kantar, Melda; Budak, Hikmet

    2014-06-01

    MicroRNAs are post-transcriptional regulators of gene expression, taking roles in a variety of fundamental biological processes. Hence, their identification, annotation and characterization are of great significance, especially in bread wheat, one of the main food sources for humans. The recent availability of 5× coverage Triticum aestivum L. whole-genome sequence provided us with the opportunity to perform a systematic prediction of a complete catalogue of wheat microRNAs. Using an in silico homology-based approach, stem-loop coding regions were derived from two assemblies, constructed from wheat 454 reads. To avoid the presence of pseudo-microRNAs in the final data set, transposable element related stem-loops were eliminated by repeat analysis. Overall, 52 putative wheat microRNAs were predicted, including seven, which have not been previously published. Moreover, with distinct analysis of the two different assemblies, both variety and representation of putative microRNA-coding stem-loops were found to be predominant in the intergenic regions. By searching available expressed sequences and small RNA library databases, expression evidence for 39 (out of 52) putative wheat microRNAs was provided. Expression of three of the predicted microRNAs (miR166, miR396 and miR528) was also comparatively quantified with real-time quantitative reverse transcription PCR. This is the first report on in silico prediction of a whole repertoire of bread wheat microRNAs, supported by the wet-lab validation. PMID:24395439

  2. Challenges in Whole-Genome Annotation of Pyrosequenced Eukaryotic Genomes

    SciTech Connect

    Kuo, Alan; Grigoriev, Igor

    2009-04-17

    Pyrosequencing technologies such as 454/Roche and Solexa/Illumina vastly lower the cost of nucleotide sequencing compared to the traditional Sanger method, and thus promise to greatly expand the number of sequenced eukaryotic genomes. However, the new technologies also bring new challenges such as shorter reads and new kinds and higher rates of sequencing errors, which complicate genome assembly and gene prediction. At JGI we are deploying 454 technology for the sequencing and assembly of ever-larger eukaryotic genomes. Here we describe our first whole-genome annotation of a purely 454-sequenced fungal genome that is larger than a yeast (>30 Mbp). The pezizomycotine (filamentous ascomycote) Aspergillus carbonarius belongs to the Aspergillus section Nigri species complex, members of which are significant as platforms for bioenergy and bioindustrial technology, as members of soil microbial communities and players in the global carbon cycle, and as agricultural toxigens. Application of a modified version of the standard JGI Annotation Pipeline has so far predicted ~10k genes. ~12% of these preliminary annotations suffer a potential frameshift error, which is somewhat higher than the ~9% rate in the Sanger-sequenced and conventionally assembled and annotated genome of fellow Aspergillus section Nigri member A. niger. Also, >90% of A. niger genes have potential homologs in the A. carbonarius preliminary annotation. We conclude, and with further annotation and comparative analysis expect to confirm, that 454 sequencing strategies provide a promising substrate for annotation of modestly sized eukaryotic genomes. We will also present results of annotation of a number of other pyrosequenced fungal genomes of bioenergy interest.

  3. A Whole Genome Association Study on Meat Palatability in Hanwoo

    PubMed Central

    Hyeong, K.-E.; Lee, Y.-M.; Kim, Y.-S.; Nam, K. C.; Jo, C.; Lee, K.-H.; Lee, J.-E.; Kim, J.-J.

    2014-01-01

    A whole genome association (WGA) study was carried out to find quantitative trait loci (QTL) for sensory evaluation traits in Hanwoo. Carcass samples of 250 Hanwoo steers were collected from National Agricultural Cooperative Livestock Research Institute, Ansung, Gyeonggi province, Korea, between 2011 and 2012 and genotyped with the Affymetrix Bovine Axiom Array 640K single nucleotide polymorphism (SNP) chip. Among the SNPs in the chip, a total of 322,160 SNPs were chosen after quality control tests. After adjusting for the effects of age, slaughter-year-season, and polygenic effects using a genome relationship matrix, the corrected phenotypes for the sensory evaluation measurements were regressed on each SNP using a simple linear regression additive based model. A total of 1,631 SNPs were detected for color, aroma, tenderness, juiciness and palatability at 0.1% comparison-wise level. Among the significant SNPs, the best set of 52 SNP markers was chosen using a forward regression procedure at 0.05 level, among which the sets of 8, 14, 11, 10, and 9 SNPs were determined for the respective sensory evaluation traits. The sets of significant SNPs explained 18% to 31% of phenotypic variance. Three SNPs were pleiotropic, i.e. AX-26703353 and AX-26742891 that were located at 101 and 110 Mb of BTA6, respectively, influencing tenderness, juiciness and palatability, while AX-18624743 at 3 Mb of BTA10 affected tenderness and palatability. Our results suggest that some QTL for sensory measures are segregating in a Hanwoo steer population. Additional WGA studies on fatty acid and nutritional components as well as the sensory panels are in process to characterize genetic architecture of meat quality and palatability in Hanwoo. PMID:25178363

  4. Whole genome expression profiling in chewing-tobacco-associated oral cancers: a pilot study.

    PubMed

    Chakrabarti, Sanjukta; Multani, Shaleen; Dabholkar, Jyoti; Saranath, Dhananjaya

    2015-03-01

    The current study was undertaken to identify differential biomarkers in chewing-tobacco-associated oral cancer tissues in patients of Indian ethnicity. The gene expression profile was analyzed in oral cancer tissues as compared to clinically normal oral buccal mucosa. We examined 30 oral cancer tissues and 27 normal oral tissues with 16 paired samples from contralateral site of the patient and 14 unpaired samples from different oral cancer patients, for whole genome expression using high-throughput Illumina Sentrix Human Ref-8 v2 Expression BeadChip array. The cDNA microarray analysis identified 425 differentially expressed genes with >1.5-fold expression in the oral cancer tissues as compared to normal tissues in the oral cancer patients. Overexpression of 255 genes and downregulation of 170 genes (p < 0.01) were observed. Further, a minimum twofold overexpression was observed in 32 genes and downregulation in 12 genes, in 30-83% of oral cancer patients. Biological pathway analysis using Kyoto Encyclopedia of Genes and Genome Pathway database revealed that the differentially regulated genes were associated with critical biological functions. The biological functions and representative deregulated genes include cell proliferation (AIM2, FAP, TNFSF13B, TMPRSS11A); signal transduction (FOLR2, MME, HTR3B); invasion and metastasis (SPP1, TNFAIP6, EPHB6); differentiation (CLEC4A, ELF5); angiogenesis (CXCL1); apoptosis (GLIPR1, WISP1, DAPL1); immune responses (CD300A, IFIT2, TREM2); and metabolism (NNMT, ALDH3A1). Besides, several of the genes have been differentially expressed in human cancers including oral cancer. Our data indicated differentially expressed genes in oral cancer tissues and may identify prognostic and therapeutic biomarkers in oral cancers, after validation in larger numbers and varied population samples. PMID:25663065

  5. Whole-genome expression analysis reveals genes associated with treatment response to escitalopram in major depression.

    PubMed

    Pettai, Kristi; Milani, Lili; Tammiste, Anu; Võsa, Urmo; Kolde, Raivo; Eller, Triin; Nutt, David; Metspalu, Andres; Maron, Eduard

    2016-09-01

    The reasons for variability in treatment response in major depressive disorder (MDD) are not fully understood, but there is accumulating evidence suggesting that therapeutic outcomes of antidepressants can be influenced by genetic factors. In the present study we applied the Illumina microarray platform for whole genome expression profiling in depressive patients treated with escitalopram in order to identify genes underlying response to antidepressant treatment. The initial study sample consisted of 135 outpatients with major depressive disorder (mean age 31.1±11.6 years, 68% females) treated with escitalopram 10-20 mg/day for 12 weeks, of whom 87 patients (55 females) were included in the gene expression analysis. The gene expression profiles were measured in peripheral blood cells at baseline, at week 4 and at the end of treatment (week 12) using Illumina BeadChips. The fold change was used to demonstrate the rate of change in average gene expression between the studied groups. Statistical analyses were performed using the false discovery rate (FDR). The most interesting gene, which showed a predictive effect on treatment outcome by delineating low dose responders and treatment-resistant patients at the beginning of medication, was NLGN2, which belongs to a family of neuronal cell surface proteins and is involved in synapse formation. In addition, several gene clusters related to immune response, signal transduction and the neurotrophin pathway distinguished responders from non-responders at week 4 of treatment. After 4 weeks of escitalopram treatment (10 mg/day), the YWHAZ gene showed the highest transcriptional change in responders as compared with non-responders. Finally, at the end of the treatment we noticed that at least three genes (NR2C2, ZNF641, FKBP1A) were strongly associated with resistance to escitalopram. Thus the results of this study support that exploration of peripheral gene expression is a useful tool in the further

  6. Next-Generation Whole-Genome Sequencing of Eight Strains of Bacillus cereus, Isolated from Food

    PubMed Central

    Krawczyk, Antonina O.; de Jong, Anne; Eijlander, Robyn T.; Berendsen, Erwin M.; Holsappel, Siger; Wells-Bennik, Marjon H. J.

    2015-01-01

    Bacillus cereus can contaminate food and cause emetic and diarrheal foodborne illness. Here, we report whole-genome sequences of eight strains of B. cereus, isolated from different food sources. PMID:26679589

  7. New perspectives on microbial community distortion after whole-genome amplification

    EPA Science Inventory

    Whole-genome amplification (WGA) has become an important tool to explore the genomic information of microorganisms in an environmental sample with limited biomass; however, potential selective biases during the amplification processes are poorly understood. Here, we describe the e...

  8. TCGA's Pan-Cancer Efforts and Expansion to Include Whole Genome Sequence - TCGA

    Cancer.gov

    Carolyn Hutter, Ph.D., Program Director of NHGRI's Division of Genomic Medicine, discusses the expansion of TCGA's Pan-Cancer efforts to include the Pan-Cancer Analysis of Whole Genomes (PAWG) project.

  9. High-resolution Whole-Genome Analysis of Skull Base Chordomas Implicates FHIT Loss in Chordoma Pathogenesis

    PubMed Central

    Diaz, Roberto Jose; Guduk, Mustafa; Romagnuolo, Rocco; Smith, Christian A; Northcott, Paul; Shih, David; Berisha, Fitim; Flanagan, Adrienne; Munoz, David G; Cusimano, Michael D; Pamir, M Necmettin; Rutka, James T

    2012-01-01

    Chordoma is a rare tumor arising in the sacrum, clivus, or vertebrae. It is often not completely resectable and shows a high incidence of recurrence and progression with shortened patient survival and impaired quality of life. Chemotherapeutic options are limited to investigational therapies at present. Therefore, adjuvant therapy for control of tumor recurrence and progression is of great interest, especially in skull base lesions where complete tumor resection is often not possible because of the proximity of cranial nerves. To understand the extent of genetic instability and associated chromosomal and gene losses or gains in skull base chordoma, we undertook whole-genome single-nucleotide polymorphism microarray analysis of flash frozen surgical chordoma specimens, 21 from the clivus and 1 from C1 to C2 vertebrae. We confirm the presence of a deletion at 9p involving CDKN2A, CDKN2B, and MTAP but at a much lower rate (22%) than previously reported for sacral chordoma. At a similar frequency (21%), we found aneuploidy of chromosome 3. Tissue microarray immunohistochemistry demonstrated absent or reduced fragile histidine triad (FHIT) protein expression in 98% of sacral chordomas and 67% of skull base chordomas. Our data suggest that chromosome 3 aneuploidy and epigenetic regulation of FHIT contribute to loss of the FHIT tumor suppressor in chordoma. The finding that FHIT is lost in a majority of chordomas provides new insight into chordoma pathogenesis and points to a potential new therapeutic target for this challenging neoplasm. PMID:23019410

  10. High-resolution whole-genome analysis of skull base chordomas implicates FHIT loss in chordoma pathogenesis.

    PubMed

    Diaz, Roberto Jose; Guduk, Mustafa; Romagnuolo, Rocco; Smith, Christian A; Northcott, Paul; Shih, David; Berisha, Fitim; Flanagan, Adrienne; Munoz, David G; Cusimano, Michael D; Pamir, M Necmettin; Rutka, James T

    2012-09-01

    Chordoma is a rare tumor arising in the sacrum, clivus, or vertebrae. It is often not completely resectable and shows a high incidence of recurrence and progression with shortened patient survival and impaired quality of life. Chemotherapeutic options are limited to investigational therapies at present. Therefore, adjuvant therapy for control of tumor recurrence and progression is of great interest, especially in skull base lesions where complete tumor resection is often not possible because of the proximity of cranial nerves. To understand the extent of genetic instability and associated chromosomal and gene losses or gains in skull base chordoma, we undertook whole-genome single-nucleotide polymorphism microarray analysis of flash frozen surgical chordoma specimens, 21 from the clivus and 1 from C1 to C2 vertebrae. We confirm the presence of a deletion at 9p involving CDKN2A, CDKN2B, and MTAP but at a much lower rate (22%) than previously reported for sacral chordoma. At a similar frequency (21%), we found aneuploidy of chromosome 3. Tissue microarray immunohistochemistry demonstrated absent or reduced fragile histidine triad (FHIT) protein expression in 98% of sacral chordomas and 67% of skull base chordomas. Our data suggest that chromosome 3 aneuploidy and epigenetic regulation of FHIT contribute to loss of the FHIT tumor suppressor in chordoma. The finding that FHIT is lost in a majority of chordomas provides new insight into chordoma pathogenesis and points to a potential new therapeutic target for this challenging neoplasm. PMID:23019410

  11. Whole Genome Sequencing as a Genetic Test for Autism Spectrum Disorder: From Bench to Bedside and then Back Again

    PubMed Central

    Szego, Michael J.; Zawati, Ma’n H.

    2016-01-01

    Autism spectrum disorder (ASD) is characterized by repetitive patterns of behaviour and impairments in social interactions and communication abilities. Although ASD is a heterogeneous disorder, it is a highly genetic condition for which genetic testing is routinely performed. Microarray analysis is currently the standard-of-care genetic test for ASD; however, whole genome sequencing (WGS) offers several key advantages and will likely replace microarrays as a frontline genetic test in the near future. The 2nd Consultation on Translation of Genomic Advances into Health Applications took place in the spring of 2014 to broadly explore the current and potential impacts of genomic advances in supporting personalized and family-centered care for autism and related developmental conditions. In anticipation of WGS becoming a standard-of-care test, we examine the policy landscape and highlight the lack of consistency among guidelines regarding what genomic information should be returned to patients and their families. We also discuss the need to create the infrastructure to share clinical WGS data with researchers in a systematic and ethically defensible manner. PMID:27274747

  12. Whole-genome sequences of Chlamydia trachomatis directly from clinical samples without culture.

    PubMed

    Seth-Smith, Helena M B; Harris, Simon R; Skilton, Rachel J; Radebe, Frans M; Golparian, Daniel; Shipitsyna, Elena; Duy, Pham Thanh; Scott, Paul; Cutcliffe, Lesley T; O'Neill, Colette; Parmar, Surendra; Pitt, Rachel; Baker, Stephen; Ison, Catherine A; Marsh, Peter; Jalal, Hamid; Lewis, David A; Unemo, Magnus; Clarke, Ian N; Parkhill, Julian; Thomson, Nicholas R

    2013-05-01

    The use of whole-genome sequencing as a tool for the study of infectious bacteria is of growing clinical interest. Chlamydia trachomatis is responsible for sexually transmitted infections and the blinding disease trachoma, which affect hundreds of millions of people worldwide. Recombination is widespread within the genome of C. trachomatis, thus whole-genome sequencing is necessary to understand the evolution, diversity, and epidemiology of this pathogen. Culture of C. trachomatis has, until now, been a prerequisite to obtain DNA for whole-genome sequencing; however, as C. trachomatis is an obligate intracellular pathogen, this procedure is technically demanding and time consuming. Discarded clinical samples represent a large resource for sequencing the genomes of pathogens, yet clinical swabs frequently contain very low levels of C. trachomatis DNA and large amounts of contaminating microbial and human DNA. To determine whether it is possible to obtain whole-genome sequences from bacteria without the need for culture, we have devised an approach that combines immunomagnetic separation (IMS) for targeted bacterial enrichment with multiple displacement amplification (MDA) for whole-genome amplification. Using IMS-MDA in conjunction with high-throughput multiplexed Illumina sequencing, we have produced the first whole bacterial genome sequences direct from clinical samples. We also show that this method can be used to generate genome data from nonviable archived samples. This method will prove a useful tool in answering questions relating to the biology of many difficult-to-culture or fastidious bacteria of clinical concern. PMID:23525359

  13. Whole-genome sequences of Chlamydia trachomatis directly from clinical samples without culture

    PubMed Central

    Seth-Smith, Helena M.B.; Harris, Simon R.; Skilton, Rachel J.; Radebe, Frans M.; Golparian, Daniel; Shipitsyna, Elena; Duy, Pham Thanh; Scott, Paul; Cutcliffe, Lesley T.; O’Neill, Colette; Parmar, Surendra; Pitt, Rachel; Baker, Stephen; Ison, Catherine A.; Marsh, Peter; Jalal, Hamid; Lewis, David A.; Unemo, Magnus; Clarke, Ian N.; Parkhill, Julian; Thomson, Nicholas R.

    2013-01-01

    The use of whole-genome sequencing as a tool for the study of infectious bacteria is of growing clinical interest. Chlamydia trachomatis is responsible for sexually transmitted infections and the blinding disease trachoma, which affect hundreds of millions of people worldwide. Recombination is widespread within the genome of C. trachomatis, thus whole-genome sequencing is necessary to understand the evolution, diversity, and epidemiology of this pathogen. Culture of C. trachomatis has, until now, been a prerequisite to obtain DNA for whole-genome sequencing; however, as C. trachomatis is an obligate intracellular pathogen, this procedure is technically demanding and time consuming. Discarded clinical samples represent a large resource for sequencing the genomes of pathogens, yet clinical swabs frequently contain very low levels of C. trachomatis DNA and large amounts of contaminating microbial and human DNA. To determine whether it is possible to obtain whole-genome sequences from bacteria without the need for culture, we have devised an approach that combines immunomagnetic separation (IMS) for targeted bacterial enrichment with multiple displacement amplification (MDA) for whole-genome amplification. Using IMS-MDA in conjunction with high-throughput multiplexed Illumina sequencing, we have produced the first whole bacterial genome sequences direct from clinical samples. We also show that this method can be used to generate genome data from nonviable archived samples. This method will prove a useful tool in answering questions relating to the biology of many difficult-to-culture or fastidious bacteria of clinical concern. PMID:23525359

  14. Whole-genome shotgun optical mapping of Rhodobacter sphaeroides strain 2.4.1 and its use for whole-genome shotgun sequence assembly

    SciTech Connect

    Shou, S.; Kvikstad, E.; Kile, A.; Severin, J.; Forrest, D.; Runnheim, R.; Churas, C.; Hickman, J. W.; Mackenzie, C.; Choudhary, M.; Donohue, T.; Kaplan, S.; Schwartz, D. C.

    2003-09-01

    Rhodobacter sphaeroides 2.4.1 is a facultative photoheterotrophic bacterium with tremendous metabolic diversity, which has significantly contributed to our understanding of the molecular genetics of photosynthesis, photoheterotrophy, nitrogen fixation, hydrogen metabolism, carbon dioxide fixation, taxis, and tetrapyrrole biosynthesis. To further understand this remarkable bacterium, and to accelerate an ongoing sequencing project, two whole-genome restriction maps (EcoRI and HindIII) of R. sphaeroides strain 2.4.1 were constructed using shotgun optical mapping. The approach directly mapped genomic DNA by the random mapping of single molecules. The two maps were used to facilitate sequence assembly by providing an optical scaffold for high-resolution alignment and verification of sequence contigs. Our results show that such maps facilitated the closure of sequence gaps by the early detection of nascent sequence contigs during the course of the whole-genome shotgun sequencing process.

  15. What can whole genome expression data tell us about the ecology and evolution of personality?

    PubMed Central

    Bell, Alison M.; Aubin-Horth, Nadia

    2010-01-01

    Consistent individual differences in behaviour, aka personality, pose several evolutionary questions. For example, it is difficult to explain within-individual consistency in behaviour because behavioural plasticity is often advantageous. In addition, selection erodes heritable behavioural variation that is related to fitness, therefore we wish to know the mechanisms that can maintain between-individual variation in behaviour. In this paper, we argue that whole genome expression data can reveal new insights into the proximate mechanisms underlying personality, as well as its evolutionary consequences. After introducing the basics of whole genome expression analysis, we show how whole genome expression data can be used to understand whether behaviours in different contexts are affected by the same molecular mechanisms. We suggest strategies for using the power of genomics to understand what maintains behavioural variation, to study the evolution of behavioural correlations and to compare personality traits across diverse organisms. PMID:21078652

  16. Whole genome sequence analysis of unidentified genetically modified papaya for development of a specific detection method.

    PubMed

    Nakamura, Kosuke; Kondo, Kazunari; Akiyama, Hiroshi; Ishigaki, Takumi; Noguchi, Akio; Katsumata, Hiroshi; Takasaki, Kazuto; Futo, Satoshi; Sakata, Kozue; Fukuda, Nozomi; Mano, Junichi; Kitta, Kazumi; Tanaka, Hidenori; Akashi, Ryo; Nishimaki-Mogami, Tomoko

    2016-08-15

    Identification of transgenic sequences in an unknown genetically modified (GM) papaya (Carica papaya L.) by whole genome sequence analysis was demonstrated. Whole genome sequence data were generated for a GM-positive fresh papaya fruit commodity detected in monitoring using real-time polymerase chain reaction (PCR). The sequences obtained were mapped against an open database for papaya genome sequence. Transgenic construct- and event-specific sequences were identified as a GM papaya developed to resist infection from a Papaya ringspot virus. Based on the transgenic sequences, a specific real-time PCR detection method for GM papaya applicable to various food commodities was developed. Whole genome sequence analysis enabled identifying unknown transgenic construct- and event-specific sequences in GM papaya and development of a reliable method for detecting them in papaya food commodities. PMID:27006240

  17. Whole-Genome Sequences of Two Borrelia afzelii and Two Borrelia garinii Lyme Disease Agent Isolates

    PubMed Central

    Casjens, Sherwood R.; Mongodin, Emmanuel F.; Qiu, Wei-Gang; Dunn, John J.; Luft, Benjamin J.; Fraser-Liggett, Claire M.; Schutzer, Steve E.

    2011-01-01

    Human Lyme disease is commonly caused by several species of spirochetes in the Borrelia genus. In Eurasia these species are largely Borrelia afzelii, B. garinii, B. burgdorferi, and B. bavariensis sp. nov. Whole-genome sequencing is an excellent tool for investigating and understanding the influence of bacterial diversity on the pathogenesis and etiology of Lyme disease. We report here the whole-genome sequences of four isolates from two of the Borrelia species that cause human Lyme disease, B. afzelii isolates ACA-1 and PKo and B. garinii isolates PBr and Far04. PMID:22123755

  18. Whole-Genome Sequences of Two Borrelia afzelii and Two Borrelia garinii Lyme Disease Agent Isolates

    SciTech Connect

    Casjens, S.R.; Dunn, J.; Mongodin, E. F.; Qiu, W.-G.; Luft, B. J.; Fraser-Liggett, C. M.; Schutzer, S. E.

    2011-12-01

    Human Lyme disease is commonly caused by several species of spirochetes in the Borrelia genus. In Eurasia these species are largely Borrelia afzelii, B. garinii, B. burgdorferi, and B. bavariensis sp. nov. Whole-genome sequencing is an excellent tool for investigating and understanding the influence of bacterial diversity on the pathogenesis and etiology of Lyme disease. We report here the whole-genome sequences of four isolates from two of the Borrelia species that cause human Lyme disease, B. afzelii isolates ACA-1 and PKo and B. garinii isolates PBr and Far04.

  19. Whole-Genome Transcriptional Analysis of Chemolithoautotrophic Thiosulfate Oxidation by Thiobacillus denitrificans Under Aerobic vs. Denitrifying Conditions

    SciTech Connect

    Beller, H R; Letain, T E; Chakicherla, A; Kane, S R; Legler, T C; Coleman, M A

    2006-04-22

    Thiobacillus denitrificans is one of the few known obligate chemolithoautotrophic bacteria capable of energetically coupling thiosulfate oxidation to denitrification as well as aerobic respiration. As very little is known about the differential expression of genes associated with key chemolithoautotrophic functions (such as sulfur-compound oxidation and CO2 fixation) under aerobic versus denitrifying conditions, we conducted whole-genome, cDNA microarray studies to explore this topic systematically. The microarrays identified 277 genes (approximately ten percent of the genome) as differentially expressed using Robust Multi-array Average statistical analysis and a 2-fold cutoff. Genes upregulated (ca. 6- to 150-fold) under aerobic conditions included a cluster of genes associated with iron acquisition (e.g., siderophore-related genes), a cluster of cytochrome cbb3 oxidase genes, cbbL and cbbS (encoding the large and small subunits of form I ribulose 1,5-bisphosphate carboxylase/oxygenase, or RubisCO), and multiple molecular chaperone genes. Genes upregulated (ca. 4- to 95-fold) under denitrifying conditions included nar, nir, and nor genes (associated respectively with nitrate reductase, nitrite reductase, and nitric oxide reductase, which catalyze successive steps of denitrification), cbbM (encoding form II RubisCO), and genes involved with sulfur-compound oxidation (including two physically separated but highly similar copies of sulfide:quinone oxidoreductase and of dsrC, associated with dissimilatory sulfite reductase). Among genes associated with denitrification, relative expression levels (i.e., degree of upregulation with nitrate) tended to decrease in the order nar > nir > nor > nos. Reverse transcription, quantitative PCR analysis was used to validate these trends.

  20. Capsular Typing Method for Streptococcus agalactiae Using Whole-Genome Sequence Data

    PubMed Central

    Vaughan, Alison; Jones, Nicola; Turner, Paul; Turner, Claudia; Efstratiou, Androulla; Patel, Darshana; Walker, A. Sarah; Berkley, James A.; Crook, Derrick W.

    2016-01-01

    Group B streptococcus (GBS) capsular serotypes are major determinants of virulence and affect potential vaccine coverage. Here we report a whole-genome-sequencing-based method for GBS serotype assignment. This method shows strong agreement (kappa of 0.92) with conventional methods and increased serotype assignment (100%) to all 10 capsular types. PMID:26962081
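    The agreement statistic quoted in this record (kappa) can be illustrated with a minimal sketch of Cohen's kappa. The record does not say how its kappa was computed, so the function name `cohens_kappa` and the serotype lists below are hypothetical and made up purely for illustration; they are not data from the study.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two equal-length lists of categorical labels."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of isolates given the same label by both methods.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of the marginal label frequencies, summed over labels.
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    expected = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both methods always assign the same single label
    return (observed - expected) / (1.0 - expected)


# Hypothetical serotype calls for six isolates (illustrative only).
wgs_calls = ["Ia", "Ib", "II", "III", "III", "V"]
conventional_calls = ["Ia", "Ib", "II", "III", "V", "V"]
print(round(cohens_kappa(wgs_calls, conventional_calls), 3))
```

    Kappa corrects raw agreement for the agreement expected by chance, which is why it is usually reported instead of a simple percentage when two typing methods are compared.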

  1. CViT: “Chromosome Visualization Tool” – A whole-genome viewer

    Technology Transfer Automated Retrieval System (TEKTRAN)

    CViT (Chromosome Visualization Tool) is a Perl utility for quickly generating images of features on a whole genome at once. It reads GFF3-format data representing chromosomes (linkage groups or pseudomolecules), and features on those chromosomes. It can display features on any chromosomal unit syste...

  2. Draft Whole-Genome Sequence of the Type Strain Bacillus aquimaris TF12T

    PubMed Central

    Hernández-González, Ismael L.

    2016-01-01

    Bacillus aquimaris TF12 is a Gram-positive bacterium isolated from a tidal flat of the Yellow Sea in South Korea. We report the draft whole-genome sequence of Bacillus aquimaris TF12, the type strain of a set of bacteria typically associated with marine habitats and with a potentially high biotechnology value. PMID:27417832

  3. A whole-genome assembly of the domestic cow, Bos taurus

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Background: The genome of the domestic cow, Bos taurus, was sequenced using a mixture of hierarchical and whole-genome shotgun sequencing methods. Results: We have assembled the 35 million sequence reads and applied a variety of assembly improvement techniques, creating an assembly of 2.86 billion b...

  4. Draft Whole-Genome Sequence of Urease-Producing Sporosarcina koreensis

    PubMed Central

    Graw, Michael F.; Nguyen, Hanh

    2016-01-01

    Urease-producing microbes are of significance due to their potential application in biocement production. Sporosarcina koreensis Q1 is a urease-producing bacterium belonging to the phylum Firmicutes. Here, we present the draft whole-genome sequence of S. koreensis Q1, isolated from a barchan sand dune in Qatar. PMID:26988039

  5. Draft Whole-Genome Sequence of Urease-Producing Sporosarcina koreensis.

    PubMed

    Abdul Majid, Sara; Graw, Michael F; Nguyen, Hanh; Hay, Anthony G

    2016-01-01

    Urease-producing microbes are of significance due to their potential application in biocement production. Sporosarcina koreensis Q1 is a urease-producing bacterium belonging to the phylum Firmicutes. Here, we present the draft whole-genome sequence of S. koreensis Q1, isolated from a barchan sand dune in Qatar. PMID:26988039

  6. Draft Whole-Genome Sequence of the Type Strain Bacillus horikoshii DSM 8719.

    PubMed

    Hernández-González, Ismael L; Olmedo-Álvarez, Gabriela

    2016-01-01

    Members of the Bacillus genus have been extensively studied because of their ability to produce enzymes with high biotechnological value. Here, we report the draft of the whole-genome sequence of the type strain Bacillus horikoshii DSM 8719, an alkali-tolerant strain. PMID:27417833

  7. Draft Whole-Genome Sequence of the Type Strain Bacillus aquimaris TF12T.

    PubMed

    Hernández-González, Ismael L; Olmedo-Álvarez, Gabriela

    2016-01-01

    Bacillus aquimaris TF12 is a Gram-positive bacterium isolated from a tidal flat of the Yellow Sea in South Korea. We report the draft whole-genome sequence of Bacillus aquimaris TF12, the type strain of a set of bacteria typically associated with marine habitats and with a potentially high biotechnology value. PMID:27417832

  8. Whole Genome Selection Project Involving 2,000 Industry AI Sires

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Whole genome selection (WGS) uses markers spanning the genome to predict genetic merit for economically important traits. WGS may increase the rate of genetic progress through improved accuracy and reduced generation interval especially for traits that cannot be measured on breeding animals. In cont...

  9. Whole-genome resequencing: changing the paradigms of SNP detection, molecular mapping and gene discovery

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The next generation sequencing (NGS) technologies have opened a wealth of opportunities for plant breeding and genomics research, and changed the paradigms of marker detection, genotyping, and gene discovery. Abundant genomic resources have been generated using a whole genome resequencing (WGR) str...

  10. Whole-Genome Sequence and Classification of 11 Endophytic Bacteria from Poison Ivy (Toxicodendron radicans)

    PubMed Central

    Tran, Phuong N.; Tan, Nicholas E. H.; Lee, Yin Peng; Gan, Han Ming; Polter, Steven J.; Dailey, Lucas K.; Hudson, André O.

    2015-01-01

    Here, we report the whole-genome sequences and annotation of 11 endophytic bacteria from poison ivy (Toxicodendron radicans) vine tissue. Five bacteria belong to the genus Pseudomonas, and six single members from other genera were found present in interior vine tissue of poison ivy. PMID:26586879

  11. Whole-Genome Shotgun Sequencing of a Colonizing Multilocus Sequence Type 17 Streptococcus agalactiae Strain

    PubMed Central

    Singh, Pallavi; Springman, A. Cody; Davies, H. Dele

    2012-01-01

    This report highlights the whole-genome shotgun draft sequence for a Streptococcus agalactiae strain representing multilocus sequence type (ST) 17, isolated from a colonized woman at 8 weeks postpartum. This sequence represents an important addition to the published genomes and will promote comparative genomic studies of S. agalactiae recovered from diverse sources. PMID:23045509

  12. Whole-Genome Sequence of "Candidatus Liberibacter solanacearum" Strain R1 from California.

    PubMed

    Zheng, Z; Clark, N; Keremane, M; Lee, R; Wallis, C; Deng, X; Chen, J

    2014-01-01

    The draft whole-genome sequence of "Candidatus Liberibacter solanacearum" strain R1, isolated from and maintained in tomato plants in California, is reported. The R1 strain has a genome size of 1,204,257 bp, a G+C content of 35.3%, 1,101 predicted open reading frames, and 57 RNA genes. PMID:25540355
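    The G+C content reported in this record is the percentage of bases in the genome that are guanine or cytosine. A minimal sketch of that calculation follows; the function name `gc_percent`, the choice to ignore ambiguous bases such as N, and the toy sequence are assumptions for illustration and are not taken from the record.

```python
def gc_percent(sequence: str) -> float:
    """Return the G+C content of a DNA sequence as a percentage."""
    seq = sequence.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    # Assumption: ambiguous bases (for example N) are left out of the denominator.
    total = gc + at
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T bases")
    return 100.0 * gc / total


# Made-up toy sequence, not part of any genome in this listing.
toy_sequence = "ATGCGCNNAT"
print(round(gc_percent(toy_sequence), 1))
```

    For a real assembly the same count would be accumulated over every contig, for example while reading a FASTA file record by record.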

  13. Whole-Genome Sequence of “Candidatus Liberibacter solanacearum” Strain R1 from California

    PubMed Central

    Zheng, Z.; Clark, N.; Keremane, M.; Lee, R.; Wallis, C.

    2014-01-01

    The draft whole-genome sequence of “Candidatus Liberibacter solanacearum” strain R1, isolated from and maintained in tomato plants in California, is reported. The R1 strain has a genome size of 1,204,257 bp, a G+C content of 35.3%, 1,101 predicted open reading frames, and 57 RNA genes. PMID:25540355

  14. Whole-Genome Sequences of Nonencapsulated Haemophilus influenzae Strains Isolated in Italy

    PubMed Central

    Giufrè, Maria; De Chiara, Matteo; Censini, Stefano; Guidotti, Silvia; Torricelli, Giulia; De Angelis, Gabriella; Cardines, Rita; Pizza, Mariagrazia; Muzzi, Alessandro; Soriani, Marco

    2015-01-01

    Haemophilus influenzae is an important human pathogen involved in invasive disease. Here, we report the whole-genome sequences of 11 nonencapsulated H. influenzae (ncHi) strains isolated from both invasive disease and healthy carriers in Italy. This genomic information will enrich our understanding of the molecular basis of ncHi pathogenesis. PMID:25814593

  15. Whole-genome sequence of “Candidatus Liberibacter solanacearum” strain R1 from California

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The draft whole-genome sequence of “Candidatus Liberibacter solanacearum” strain R1, isolated from a tomato plant in California, United States, is reported. The R1 strain genome is 1,204,257 bp in size (G+C content of 35.3%), encoding 1,101 open reading frames and 57 RNA genes....

  16. WIDE-CROSS WHOLE-GENOME RADIATION HYBRID MAPPING OF THE COTTON (GOSSYPIUM BARBADENSE L.) GENOME

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Whole-genome radiation hybrid mapping has been applied extensively to human and certain animal species but little to plants. We recently demonstrated an alternative mapping approach in cotton (Gossypium hirsutum L.) based on segmentation by 5-krad gamma-irradiation and derivation of wide-cross whol...

  17. Animal selection for whole genome sequencing by quantifying the unique contribution of homozygous haplotypes sequenced

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Major whole genome sequencing projects promise to identify rare and causal variants within livestock species; however, the efficient selection of animals for sequencing remains a major problem within these surveys. The goal of this project was to develop a library of high accuracy genetic variants f...

  18. Whole-Genome Analysis of Quorum-Sensing Burkholderia sp. Strain A9

    PubMed Central

    Chen, Jian Woon; Tee, Kok Keng; Chang, Chien-Yi; Yin, Wai-Fong; Chan, Xin-Yue

    2015-01-01

    Burkholderia spp. rely on N-acyl homoserine lactone as quorum-sensing signal molecules which coordinate their phenotype at the population level. In this work, we present the whole genome of Burkholderia sp. strain A9, which enables the discovery of its N-acyl homoserine lactone synthase gene. PMID:25745000

  19. Draft Whole-Genome Sequence of the Type Strain Bacillus horikoshii DSM 8719

    PubMed Central

    Hernández-González, Ismael L.

    2016-01-01

    Members of the Bacillus genus have been extensively studied because of their ability to produce enzymes with high biotechnological value. Here, we report the draft of the whole-genome sequence of the type strain Bacillus horikoshii DSM 8719, an alkali-tolerant strain. PMID:27417833

  20. Whole-Genome Sequence of Aeromonas hydrophila Strain AH-1 (Serotype O11).

    PubMed

    Forn-Cuní, Gabriel; Tomás, Juan M; Merino, Susana

    2016-01-01

    Aeromonas hydrophila is an emerging pathogen of aquatic and terrestrial animals, including humans. Here, we report the whole-genome sequence of the septicemic A. hydrophila AH-1 strain, belonging to the serotype O11, and the first mesophilic Aeromonas with surface layer (S-layer) to be sequenced. PMID:27587829

  1. Whole-Genome Sequencing of 10 Pseudomonas syringae Strains Representing Different Host Range Spectra

    PubMed Central

    Bartoli, Claudia; Carrere, Sébastien; Lamichhane, Jay Ram; Varvaro, Leonardo

    2015-01-01

    Pseudomonas syringae is a ubiquitous bacterium that readily persists in environmental habitats as a saprophyte and also is responsible for numerous diseases of crops. Here, we report the whole-genome sequences of 10 strains isolated from both woody and herbaceous plants that will contribute to the elucidation of the determinants of their host ranges. PMID:25931602

  2. Whole genome analysis of Klebsiella pneumoniae T2-1-1 from human oral cavity.

    PubMed

    Chan, Kok-Gan; Yin, Wai-Fong; Chan, Xin-Yue

    2016-03-01

    Klebsiella pneumoniae T2-1-1 was isolated from human tongue debris, subjected to whole-genome sequencing on the HiSeq platform, and annotated with RAST. The nucleotide sequence of this genome was deposited in DDBJ/EMBL/GenBank under the accession JAQL00000000. PMID:26981378

  3. Laboratory-Acquired Infection with Salmonella enterica Serovar Typhimurium Exposed by Whole-Genome Sequencing

    PubMed Central

    Fitzgerald, Stephen F.; DePaulo, Rachel; Kitzul, Rosanne; Daku, Dawn; Levett, Paul N.; Cameron, Andrew D. S.

    2015-01-01

    Despite advances in laboratory design, professional training, and workplace biosafety guidelines, laboratory-acquired infections continue to occur. Effective tools are required to investigate cases and prevent future illness. Here, we demonstrate the value of whole-genome sequencing as a tool for the identification and source attribution of laboratory-acquired salmonellosis. PMID:26511736

  4. Capsular Typing Method for Streptococcus agalactiae Using Whole-Genome Sequence Data.

    PubMed

    Sheppard, Anna E; Vaughan, Alison; Jones, Nicola; Turner, Paul; Turner, Claudia; Efstratiou, Androulla; Patel, Darshana; Walker, A Sarah; Berkley, James A; Crook, Derrick W; Seale, Anna C

    2016-05-01

    Group B streptococcus (GBS) capsular serotypes are major determinants of virulence and affect potential vaccine coverage. Here we report a whole-genome-sequencing-based method for GBS serotype assignment. This method shows strong agreement (kappa of 0.92) with conventional methods and increased serotype assignment (100%) to all 10 capsular types. PMID:26962081

  5. Whole-Genome Sequencing Detection of Ongoing Listeria Contamination at a Restaurant, Rhode Island, USA, 2014

    PubMed Central

    Gosciminski, Michael; Miller, Adam

    2016-01-01

    In November 2014, the Rhode Island Department of Health investigated a cluster of 3 listeriosis cases. Using whole-genome sequencing to support epidemiologic, laboratory, and environmental investigations, the department identified 1 restaurant as the likely source of the outbreak and also linked the establishment to a listeriosis case that occurred in 2013. PMID:27434089

  6. Whole-Genome Sequencing of Salmonella enterica subsp. enterica Serovar Ouakam Isolated from Ground Turkey

    PubMed Central

    Marasini, Daya; Abo-Shama, Usama H.

    2016-01-01

    In this report, we announce the first whole-genome sequencing of Salmonella enterica subsp. enterica serovar Ouakam strain GNT-01, isolated from ground turkey retail meat. The strain has a chromosome 5,088,451 bp long, with a G+C content of 52.3%, and a plasmid of 109,715 bp. PMID:26798110

  7. Whole-Genome Sequencing Detection of Ongoing Listeria Contamination at a Restaurant, Rhode Island, USA, 2014.

    PubMed

    Barkley, Jonathan S; Gosciminski, Michael; Miller, Adam

    2016-08-01

    In November 2014, the Rhode Island Department of Health investigated a cluster of 3 listeriosis cases. Using whole-genome sequencing to support epidemiologic, laboratory, and environmental investigations, the department identified 1 restaurant as the likely source of the outbreak and also linked the establishment to a listeriosis case that occurred in 2013. PMID:27434089

  8. Spiked GBS: A unified, open platform for single marker genotyping and whole-genome profiling

    Technology Transfer Automated Retrieval System (TEKTRAN)

    In plant breeding, there are two primary applications for DNA markers in selection: 1) selection of known genes using a single marker assay (marker-assisted selection; MAS); and 2) whole-genome profiling and prediction (genomic selection; GS). Typically, marker platforms have addressed only one of t...

  9. Whole-Genome Sequence of Aeromonas hydrophila Strain AH-1 (Serotype O11)

    PubMed Central

    Forn-Cuní, Gabriel; Tomás, Juan M.

    2016-01-01

    Aeromonas hydrophila is an emerging pathogen of aquatic and terrestrial animals, including humans. Here, we report the whole-genome sequence of the septicemic A. hydrophila AH-1 strain, belonging to the serotype O11, and the first mesophilic Aeromonas with surface layer (S-layer) to be sequenced. PMID:27587829

  10. Software tool for the analysis and visualization of whole genome alignments

    Energy Science and Technology Software Center (ESTSC)

    2011-08-01

    GenomeVISTA is a tool that performs and displays pairwise and multiple whole genome DNA alignments. The tool provides a graphical user interface through which users can navigate alignments at multiple levels of resolution and get information about individual aligned regions. Users can load their own sequences into GenomeVISTA or view pre-computed alignments for genomes in the VISTA database.

  11. AMY-tree: an algorithm to use whole genome SNP calling for Y chromosomal phylogenetic applications

    PubMed Central

    2013-01-01

    Background Due to the rapid progress of next-generation sequencing (NGS) facilities, an explosion of human whole genome data will become available in the coming years. These data can be used to optimize and to increase the resolution of the phylogenetic Y chromosomal tree. Moreover, the exponential growth of known Y chromosomal lineages will require an automatic determination of the phylogenetic position of an individual based on whole genome SNP calling data and an up-to-date Y chromosomal tree. Results We present an automated approach, ‘AMY-tree’, which is able to determine the phylogenetic position of a Y chromosome using a whole genome SNP profile, independently of the NGS platform and SNP calling program, whereby mistakes in the SNP calling or phylogenetic Y chromosomal tree are taken into account. Moreover, AMY-tree indicates ambiguities within the present phylogenetic tree and points out new Y-SNPs which may be phylogenetically relevant. The AMY-tree software package was validated successfully on 118 whole genome SNP profiles of 109 males with different origins. Moreover, support was found for an unknown recurrent mutation, wrongly reported mutation conversions and a large number of new interesting Y-SNPs. Conclusions Therefore, AMY-tree is a useful tool to determine the Y lineage of a sample based on SNP calling, to identify Y-SNPs with yet unknown phylogenetic position and to optimize the Y chromosomal phylogenetic tree in the future. AMY-tree will not add lineages to the existing phylogenetic tree of the Y-chromosome but it is the first step to analyse whole genome SNP profiles in a phylogenetic framework. PMID:23405914

  12. A Whole-Genome Analysis Framework for Effective Identification of Pathogenic Regulatory Variants in Mendelian Disease.

    PubMed

    Smedley, Damian; Schubach, Max; Jacobsen, Julius O B; Köhler, Sebastian; Zemojtel, Tomasz; Spielmann, Malte; Jäger, Marten; Hochheiser, Harry; Washington, Nicole L; McMurry, Julie A; Haendel, Melissa A; Mungall, Christopher J; Lewis, Suzanna E; Groza, Tudor; Valentini, Giorgio; Robinson, Peter N

    2016-09-01

    The interpretation of non-coding variants still constitutes a major challenge in the application of whole-genome sequencing in Mendelian disease, especially for single-nucleotide and other small non-coding variants. Here we present Genomiser, an analysis framework that is able not only to score the relevance of variation in the non-coding genome, but also to associate regulatory variants to specific Mendelian diseases. Genomiser scores variants through either existing methods such as CADD or a bespoke machine learning method and combines these with allele frequency, regulatory sequences, chromosomal topological domains, and phenotypic relevance to discover variants associated to specific Mendelian disorders. Overall, Genomiser is able to identify causal regulatory variants as the top candidate in 77% of simulated whole genomes, allowing effective detection and discovery of regulatory variants in Mendelian disease. PMID:27569544

  13. HIV Whole-Genome Sequencing Now: Answering Still-Open Questions.

    PubMed

    Metzner, Karin J

    2016-04-01

    Diversity, evolution, and epidemiology of HIV are directly relevant to HIV transmission and pathogenesis; hence, they play a key role in antiretroviral treatment and vaccine design. Global HIV whole-genome sequencing would provide a treasure chest of data to answer many questions still open in these fields. An article by Berg et al. in this issue of the Journal of Clinical Microbiology describes a universal strategy to amplify and sequence heterogeneous HIV whole genomes (M. G. Berg, J. Yamaguchi, E. Alessandri-Gradt, R. W. Tell, J.-C. Plantier, and C. A. Brennan, J Clin Microbiol 54:868-882, 2016, http://dx.doi.org/10.1128/JCM.02479-15). PMID:26791367

  14. Downsizing genomic medicine: approaching the ethical complexity of whole-genome sequencing by starting small.

    PubMed

    Sharp, Richard R

    2011-03-01

    As we look to a time when whole-genome sequencing is integrated into patient care, it is possible to anticipate a number of ethical challenges that will need to be addressed. The most intractable of these concern informed consent and the responsible management of very large amounts of genetic information. Given the range of possible findings, it remains unclear to what extent it will be possible to obtain meaningful patient consent to genomic testing. Equally unclear is how clinicians will disseminate the enormous volume of genetic information produced by whole-genome sequencing. Toward developing practical strategies for managing these ethical challenges, we propose a research agenda that approaches multiplexed forms of clinical genetic testing as natural laboratories in which to develop best practices for managing the ethical complexities of genomic medicine. PMID:21311340

  15. Whole genome sequencing of Mycobacterium tuberculosis SB24 isolated from Sabah, Malaysia.

    PubMed

    Philip, Noraini; Rodrigues, Kenneth Francis; William, Timothy; John, Daisy Vanitha

    2016-09-01

    Mycobacterium tuberculosis (M. tuberculosis) is the causative agent of tuberculosis (TB), which causes millions of deaths every year. We have sequenced the genome of M. tuberculosis isolated from the cerebrospinal fluid (CSF) of a patient diagnosed with tuberculous meningitis (TBM). The isolated strain is referred to as M. tuberculosis SB24. Genomic DNA of M. tuberculosis SB24 was extracted and subjected to whole genome sequencing using the PacBio platform. The draft genome size of M. tuberculosis SB24 was determined to be 4,452,489 bp with a G + C content of 65.6%. The whole genome shotgun project has been deposited in NCBI SRA under the accession number SRP076503. PMID:27556011

  16. Whole genome multilocus sequence typing as an epidemiologic tool for Yersinia pestis.

    PubMed

    Kingry, Luke C; Rowe, Lori A; Respicio-Kingry, Laurel B; Beard, Charles B; Schriefer, Martin E; Petersen, Jeannine M

    2016-04-01

    Human plague is a severe and often fatal zoonotic disease caused by Yersinia pestis. For public health investigations of human cases, nonintensive whole genome molecular typing tools, capable of defining epidemiologic relationships, are advantageous. Whole genome multilocus sequence typing (wgMLST) is a recently developed methodology that simplifies genomic analyses by transforming millions of base pairs of sequence into character data for each gene. We sequenced 13 US Y. pestis isolates with known epidemiologic relationships. Sequences were assembled de novo, and multilocus sequence typing alleles were assigned by comparison against 3979 open reading frames from the reference strain CO92. Allele-based cluster analysis accurately grouped the 13 isolates, as well as 9 publicly available Y. pestis isolates, by their epidemiologic relationships. Our findings indicate wgMLST is a simplified, sensitive, and scalable tool for epidemiologic analysis of Y. pestis strains. PMID:26778487
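
    The allele-based comparison described above can be illustrated with a small sketch: each isolate is reduced to a profile of allele numbers per locus, and two isolates are compared by counting the loci at which their alleles differ. The isolate names, locus names and allele numbers below are hypothetical, and the handling of missing loci (ignored) is an assumption, not necessarily what the authors' pipeline does.

```python
def allele_distance(profile_a, profile_b):
    """Count loci with different alleles; loci missing from either profile are ignored."""
    shared = set(profile_a) & set(profile_b)
    return sum(1 for locus in shared if profile_a[locus] != profile_b[locus])


def distance_matrix(profiles):
    """Pairwise allele distances for a dict of {isolate_name: {locus: allele}}."""
    names = sorted(profiles)
    return {
        (a, b): allele_distance(profiles[a], profiles[b])
        for a in names
        for b in names
    }


# Hypothetical profiles: locus name -> allele number (illustrative only).
profiles = {
    "isolate_1": {"locus_0001": 1, "locus_0002": 4, "locus_0003": 2},
    "isolate_2": {"locus_0001": 1, "locus_0002": 4, "locus_0003": 3},
    "isolate_3": {"locus_0001": 2, "locus_0002": 5},
}
matrix = distance_matrix(profiles)
```

    A clustering method of choice could then be run on such a matrix to group isolates; the abstract does not specify which one was used.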

  17. A whole-genome, radiation hybrid mapping resource of hexaploid wheat.

    PubMed

    Tiwari, Vijay K; Heesacker, Adam; Riera-Lizarazu, Oscar; Gunn, Hilary; Wang, Shichen; Wang, Yi; Gu, Young Q; Paux, Etienne; Koo, Dal-Hoe; Kumar, Ajay; Luo, Ming-Cheng; Lazo, Gerard; Zemetra, Robert; Akhunov, Eduard; Friebe, Bernd; Poland, Jesse; Gill, Bikram S; Kianian, Shahryar; Leonard, Jeffrey M

    2016-04-01

    Generating a contiguous, ordered reference sequence of a complex genome such as hexaploid wheat (2n = 6x = 42; approximately 17 Gb) is a challenging task due to its large, highly repetitive, and allopolyploid genome. In wheat, ordering of whole-genome or hierarchical shotgun sequencing contigs is primarily based on recombination and comparative genomics-based approaches. However, comparative genomics approaches are limited to syntenic inference and recombination is suppressed within the pericentromeric regions of wheat chromosomes; thus, precise ordering of physical maps and sequenced contigs across the whole genome using these approaches is nearly impossible. We developed a whole-genome radiation hybrid (WGRH) resource and tested it by genotyping a set of 115 randomly selected lines on a high-density single nucleotide polymorphism (SNP) array. At the whole-genome level, 26 299 SNP markers were mapped on the RH panel and provided an average mapping resolution of approximately 248 Kb/cR1500 with a total map length of 6866 cR1500. The 7296 unique mapping bins provided a five- to eight-fold higher resolution than genetic maps used in similar studies. Most strikingly, the RH map had uniform bin resolution across the entire chromosome(s), including pericentromeric regions. Our research provides a valuable and low-cost resource for anchoring and ordering sequenced BAC and next generation sequencing (NGS) contigs. The WGRH developed for reference wheat line Chinese Spring (CS-WGRH) will be useful for anchoring and ordering sequenced BAC and NGS based contigs for assembling a high-quality, reference sequence of hexaploid wheat. Additionally, this study provides an excellent model for developing similar resources for other polyploid species. PMID:26945524

  18. Whole-Genome Sequence of Chlamydia gallinacea Type Strain 08-1274/3

    PubMed Central

    Hölzer, Martin; Laroucau, Karine; Creasy, Heather Huot; Ott, Sandra; Vorimore, Fabien; Bavoil, Patrik M.; Marz, Manja

    2016-01-01

    The recently introduced bacterial species Chlamydia gallinacea is known to occur in domestic poultry and other birds. Its potential as an avian pathogen and zoonotic agent is under investigation. The whole-genome sequence of its type strain, 08-1274/3, consists of a 1,059,583-bp chromosome with 914 protein-coding sequences (CDSs) and a plasmid (p1274) comprising 7,619 bp with 9 CDSs. PMID:27445388

  19. A review of methods for subtyping Yersinia pestis: From phenotypes to whole genome sequencing.

    PubMed

    Vogler, Amy J; Keim, Paul; Wagner, David M

    2016-01-01

    Numerous subtyping methods have been applied to Yersinia pestis with varying success. Here, we review the various subtyping methods that have been applied to Y. pestis and their capacity for answering questions regarding the population genetics, phylogeography, and molecular epidemiology of this important human pathogen. Methods are evaluated in terms of expense, difficulty, transferability among laboratories, discriminatory power, usefulness for different study questions, and current applicability in light of the advent of whole genome sequencing. PMID:26518910

  20. Comparison of Whole-Genome Sequencing and Molecular-Epidemiological Techniques for Clostridium difficile Strain Typing.

    PubMed

    Dominguez, Samuel R; Anderson, Lydia J; Kotter, Cassandra V; Littlehorn, Cynthia A; Arms, Lesley E; Dowell, Elaine; Todd, James K; Frank, Daniel N

    2016-09-01

    We analyzed in parallel 27 pediatric Clostridium difficile isolates by repetitive sequence-based polymerase chain reaction (RepPCR), pulsed-field gel electrophoresis (PFGE), and whole-genome next-generation sequencing. Next-generation sequencing distinguished 3 groups of isolates that were indistinguishable by RepPCR and 1 isolate that clustered in the same PFGE group as other isolates. PMID:26407257

  1. Detection and phylogenetic assessment of conserved synteny derived from whole genome duplications.

    PubMed

    Kuraku, Shigehiro; Meyer, Axel

    2012-01-01

    Identification of intragenomic conservation of gene compositions in multiple chromosomal segments led to evidence of whole genome duplications (WGDs). The process by which WGDs have been maintained and decayed provides us with clues for understanding how the genome evolves. In this chapter, we summarize current understanding of phylogenetic distribution and evolutionary impact of WGDs, introduce basic procedures to detect conserved synteny, and discuss typical pitfalls, as well as biological insights. PMID:22407717

  2. Whole-Genome Sequence of Rummeliibacillus stabekisii Strain PP9 Isolated from Antarctic Soil.

    PubMed

    da Mota, Fábio Faria; Vollú, Renata Estebanez; Jurelevicius, Diogo; Seldin, Lucy

    2016-01-01

    The whole genome of Rummeliibacillus stabekisii PP9, isolated from a soil sample from Antarctica, consists of a circular chromosome of 3,412,092 bp and a circular plasmid of 8,647 bp, with 3,244 protein-coding genes, 12 copies of the 16S-23S-5S rRNA operon, 101 tRNA genes, and 6 noncoding RNAs (ncRNAs). PMID:27231360

  3. Whole-Genome Sequence of Rummeliibacillus stabekisii Strain PP9 Isolated from Antarctic Soil

    PubMed Central

    da Mota, Fábio Faria; Vollú, Renata Estebanez; Jurelevicius, Diogo

    2016-01-01

    The whole genome of Rummeliibacillus stabekisii PP9, isolated from a soil sample from Antarctica, consists of a circular chromosome of 3,412,092 bp and a circular plasmid of 8,647 bp, with 3,244 protein-coding genes, 12 copies of the 16S-23S-5S rRNA operon, 101 tRNA genes, and 6 noncoding RNAs (ncRNAs). PMID:27231360

  4. Whole-Genome Sequence of Chlamydia gallinacea Type Strain 08-1274/3.

    PubMed

    Hölzer, Martin; Laroucau, Karine; Creasy, Heather Huot; Ott, Sandra; Vorimore, Fabien; Bavoil, Patrik M; Marz, Manja; Sachse, Konrad

    2016-01-01

    The recently introduced bacterial species Chlamydia gallinacea is known to occur in domestic poultry and other birds. Its potential as an avian pathogen and zoonotic agent is under investigation. The whole-genome sequence of its type strain, 08-1274/3, consists of a 1,059,583-bp chromosome with 914 protein-coding sequences (CDSs) and a plasmid (p1274) comprising 7,619 bp with 9 CDSs. PMID:27445388

  5. Inferring Demography from Runs of Homozygosity in Whole-Genome Sequence, with Correction for Sequence Errors

    PubMed Central

    MacLeod, Iona M.; Larkin, Denis M.; Lewin, Harris A.; Hayes, Ben J.; Goddard, Mike E.

    2013-01-01

    Whole-genome sequence is potentially the richest source of genetic data for inferring ancestral demography. However, full sequence also presents significant challenges to fully utilize such large data sets and to ensure that sequencing errors do not introduce bias into the inferred demography. Using whole-genome sequence data from two Holstein cattle, we demonstrate a new method to correct for bias caused by hidden errors and then infer stepwise changes in ancestral demography up to present. There was a strong upward bias in estimates of recent effective population size (Ne) if the correction method was not applied to the data, both for our method and the Li and Durbin (Inference of human population history from individual whole-genome sequences. Nature 475:493–496) pairwise sequentially Markovian coalescent method. To infer demography, we use an analytical predictor of multiloci linkage disequilibrium (LD) based on a simple coalescent model that allows for changes in Ne. The LD statistic summarizes the distribution of runs of homozygosity for any given demography. We infer a best fit demography as one that predicts a match with the observed distribution of runs of homozygosity in the corrected sequence data. We use multiloci LD because it potentially holds more information about ancestral demography than pairwise LD. The inferred demography indicates a strong reduction in the Ne around 170,000 years ago, possibly related to the divergence of African and European Bos taurus cattle. This is followed by a further reduction coinciding with the period of cattle domestication, with Ne of between 3,500 and 6,000. The most recent reduction of Ne to approximately 100 in the Holstein breed agrees well with estimates from pedigrees. Our approach can be applied to whole-genome sequence from any diploid species and can be scaled up to use sequence from multiple individuals. PMID:23842528
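
    The observed quantity that the method above fits is the distribution of runs of homozygosity. As a rough sketch of what that raw input looks like, the snippet below extracts run lengths from a per-site heterozygosity vector. It measures runs in numbers of sites rather than base pairs and includes no error correction, so it is a simplification under those assumptions and not the authors' method.

```python
from itertools import groupby


def runs_of_homozygosity(is_het):
    """Lengths (in sites) of consecutive homozygous stretches.

    is_het: iterable of booleans, True where the individual is heterozygous.
    """
    return [
        sum(1 for _ in group)
        for het, group in groupby(is_het)
        if not het
    ]


# Hypothetical genotype vector for nine consecutive sites (illustrative only).
sites = [False, False, True, False, True, True, False, False, False]
run_lengths = runs_of_homozygosity(sites)
```

    A sequencing error that creates a false heterozygous call would split one long run into two shorter ones, which is one way uncorrected errors could distort the run-length distribution.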

  6. Rapid Identification of Potential Drugs for Diabetic Nephropathy Using Whole-Genome Expression Profiles of Glomeruli

    PubMed Central

    Shi, Jingsong; Jiang, Song; Qiu, Dandan; Le, Weibo; Wang, Xiao; Lu, Yinhui; Liu, Zhihong

    2016-01-01

    Objective. To investigate potential drugs for diabetic nephropathy (DN) using whole-genome expression profiles and the Connectivity Map (CMAP). Methodology. Eighteen Chinese Han DN patients and six normal controls were included in this study. Whole-genome expression profiles of microdissected glomeruli were measured using the Affymetrix human U133 plus 2.0 chip. Differentially expressed genes (DEGs) between late stage and early stage DN samples and the CMAP database were used to identify potential drugs for DN using bioinformatics methods. Results. (1) A total of 1065 DEGs (FDR < 0.05 and fold change > 1.5) were found in late stage DN patients compared with early stage DN patients. (2) Piperlongumine, 15d-PGJ2 (15-delta prostaglandin J2), vorinostat, and trichostatin A were predicted to be the most promising potential drugs for DN, acting as NF-κB inhibitors, histone deacetylase inhibitors (HDACIs), PI3K pathway inhibitors, or PPARγ agonists, respectively. Conclusion. Using whole-genome expression profiles and the CMAP database, we rapidly predicted potential DN drugs, and therapeutic potential was confirmed by previously published studies. Animal experiments and clinical trials are needed to confirm both the safety and efficacy of these drugs in the treatment of DN. PMID:27069916
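
    The selection rule quoted in the results (FDR < 0.05 and fold change > 1.5) can be sketched as below. The abstract does not say how FDR was estimated or how down-regulated genes were handled; this sketch assumes the Benjamini-Hochberg adjustment and treats fold change symmetrically (a ratio below 1/1.5 also passes). Gene names and values are invented.

```python
def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def select_degs(genes, fdr_cutoff=0.05, fold_cutoff=1.5):
    """genes: list of (name, fold_change_ratio, p_value); ratio must be > 0."""
    adjusted = benjamini_hochberg([p for _, _, p in genes])
    selected = []
    for (name, fold, _), q in zip(genes, adjusted):
        magnitude = max(fold, 1.0 / fold)  # assumption: either direction counts
        if q < fdr_cutoff and magnitude > fold_cutoff:
            selected.append(name)
    return selected


# Hypothetical genes: (name, late/early expression ratio, raw p-value).
genes = [
    ("gene_a", 2.0, 0.001),
    ("gene_b", 1.2, 0.002),
    ("gene_c", 0.5, 0.01),
    ("gene_d", 3.0, 0.6),
]
degs = select_degs(genes)
```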

  7. Whole-Genome Sequencing Reveals Genetic Variation in the Asian House Rat.

    PubMed

    Teng, Huajing; Zhang, Yaohua; Shi, Chengmin; Mao, Fengbiao; Hou, Lingling; Guo, Hongling; Sun, Zhongsheng; Zhang, Jianxu

    2016-01-01

    Whole-genome sequencing of wild-derived rat species can provide novel genomic resources, which may help decipher the genetics underlying complex phenotypes. As a notorious pest, reservoir of human pathogens, and colonizer, the Asian house rat, Rattus tanezumi, is successfully adapted to its habitat. However, little is known regarding genetic variation in this species. In this study, we identified over 41,000,000 single-nucleotide polymorphisms, plus insertions and deletions, through whole-genome sequencing and bioinformatics analyses. Moreover, we identified over 12,000 structural variants, including 143 chromosomal inversions. Further functional analyses revealed several fixed nonsense mutations associated with infection and immunity-related adaptations, and a number of fixed missense mutations that may be related to anticoagulant resistance. A genome-wide scan for loci under selection identified various genes related to neural activity. Our whole-genome sequencing data provide a genomic resource for future genetic studies of the Asian house rat species and have the potential to facilitate understanding of the molecular adaptations of rats to their ecological niches. PMID:27172215

  8. Whole-Genome Sequencing in Microbial Forensic Analysis of Gamma-Irradiated Microbial Materials

    PubMed Central

    Broomall, Stacey M.; Ait Ichou, Mohamed; Krepps, Michael D.; Johnsky, Lauren A.; Karavis, Mark A.; Hubbard, Kyle S.; Insalaco, Joseph M.; Betters, Janet L.; Redmond, Brady W.; Rivers, Bryan A.; Liem, Alvin T.; Hill, Jessica M.; Fochler, Edward T.; Roth, Pierce A.; Rosenzweig, C. Nicole; Skowronski, Evan W.

    2015-01-01

    Effective microbial forensic analysis of materials used in a potential biological attack requires robust methods of morphological and genetic characterization of the attack materials in order to enable the attribution of the materials to potential sources and to exclude other potential sources. The genetic homogeneity and potential intersample variability of many of the category A to C bioterrorism agents offer a particular challenge to the generation of attributive signatures, potentially requiring whole-genome or proteomic approaches to be utilized. Currently, irradiation of mail is standard practice at several government facilities judged to be at particularly high risk. Thus, initial forensic signatures would need to be recovered from inactivated (nonviable) material. In the study described in this report, we determined the effects of high-dose gamma irradiation on forensic markers of bacterial biothreat agent surrogate organisms with a particular emphasis on the suitability of genomic DNA (gDNA) recovered from such sources as a template for whole-genome analysis. While irradiation of spores and vegetative cells affected the retention of Gram and spore stains and sheared gDNA into small fragments, we found that irradiated material could be utilized to generate accurate whole-genome sequence data on the Illumina and Roche 454 sequencing platforms. PMID:26567301

  9. Whole-genome sequencing and the clinician: a tale of two cities

    PubMed Central

    Foley, A Reghan; Pitceathly, Robert D S; He, Jie; Kim, Jihee; Pearson, Nathaniel M; Muntoni, Francesco; Hanna, Michael G

    2014-01-01

    Background Clinicians are faced with unprecedented opportunities to identify the genetic aetiologies of hitherto molecularly uncharacterised conditions via the use of high-throughput sequencing. Access to genomic technology and resultant data is no longer limited to clinicians, geneticists and bioinformaticians, however; ongoing commercialisation gives patients themselves ever greater access to sequencing services. We report an increasingly common medical scenario by describing two neuromuscular patients—a mother and adult son—whose consumer access to whole-genome sequencing affected their diagnostic journey. Results Whole-genome sequencing initiated by the patients—to predict their risk of common diseases—revealed that they share several variants potentially relevant to neuromuscular diseases, which initially sidetracked diagnostic efforts. Since eventual clinical reassessment, including muscle imaging, pointed towards Bethlem myopathy, a collagen VI-related myopathy, we pursued Sanger sequencing of COL6A1, COL6A2 and COL6A3. This targeted approach revealed a heterozygous causative variant in COL6A3 (c.6365G>T (p.Gly2122Val)), shared by both individuals, that was not flagged by the interpretation of the whole-genome sequencing data. Conclusions This report highlights the essential interplay of clinical and genomic expertise in realising the potential of high-throughput sequencing. In an era when patients themselves may bring their own data to the table, definitively identifying clinically significant genomic variants will require close collaboration among clinicians, geneticists and bioinformaticians. PMID:24706943

  10. FIGG: Simulating populations of whole genome sequences for heterogeneous data analyses

    PubMed Central

    2014-01-01

    Background High-throughput sequencing has become one of the primary tools for investigation of the molecular basis of disease. The increasing use of sequencing in investigations that aim to understand both individuals and populations is challenging our ability to develop analysis tools that scale with the data. This issue is of particular concern in studies that exhibit a wide degree of heterogeneity or deviation from the standard reference genome. The advent of population scale sequencing studies requires analysis tools that are developed and tested against matching quantities of heterogeneous data. Results We developed a large-scale whole genome simulation tool, FIGG, which generates large numbers of whole genomes with known sequence characteristics based on direct sampling of experimentally known or theorized variations. For normal variations we used publicly available data to determine the frequency of different mutation classes across the genome. FIGG then uses this information as a background to generate new sequences from a parent sequence with matching frequencies, but different actual mutations. The background can be normal variations, known disease variations, or a theoretical frequency distribution of variations. Conclusion In order to enable the creation of large numbers of genomes, FIGG generates simulated sequences from known genomic variation and iteratively mutates each genome separately. The result is multiple whole genome sequences with unique variations that can primarily be used to provide different reference genomes, model heterogeneous populations, and can offer a standard test environment for new analysis algorithms or bioinformatics tools. PMID:24885193
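
    The core idea described above, generating many genomes from one parent by sampling mutations at a given background frequency, can be sketched in a few lines. This toy version models only single-nucleotide substitutions at one uniform rate; FIGG itself works with several mutation classes and frequencies drawn from variation data, so the function names and parameters here are hypothetical and not FIGG's interface.

```python
import random


def mutate(parent, snv_rate, rng):
    """Return a copy of parent with each A/C/G/T base substituted with probability snv_rate."""
    bases = "ACGT"
    child = []
    for base in parent:
        if base in bases and rng.random() < snv_rate:
            # Substitute with one of the three other bases, chosen uniformly.
            child.append(rng.choice([b for b in bases if b != base]))
        else:
            child.append(base)
    return "".join(child)


def simulate_population(parent, n_genomes, snv_rate, seed=0):
    """Mutate the parent independently n_genomes times with a seeded generator."""
    rng = random.Random(seed)
    return [mutate(parent, snv_rate, rng) for _ in range(n_genomes)]


# Hypothetical 20-base parent sequence and rate (illustrative only).
parent = "ACGT" * 5
population = simulate_population(parent, n_genomes=3, snv_rate=0.1, seed=42)
```

    Because each genome is mutated separately from the same parent, the simulated genomes share the background rate but carry different actual mutations, which is the property the abstract emphasizes.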

  11. Whole-Genome Sequencing Reveals Genetic Variation in the Asian House Rat

    PubMed Central

    Teng, Huajing; Zhang, Yaohua; Shi, Chengmin; Mao, Fengbiao; Hou, Lingling; Guo, Hongling; Sun, Zhongsheng; Zhang, Jianxu

    2016-01-01

    Whole-genome sequencing of wild-derived rat species can provide novel genomic resources, which may help decipher the genetics underlying complex phenotypes. As a notorious pest, reservoir of human pathogens, and colonizer, the Asian house rat, Rattus tanezumi, is successfully adapted to its habitat. However, little is known regarding genetic variation in this species. In this study, we identified over 41,000,000 single-nucleotide polymorphisms, plus insertions and deletions, through whole-genome sequencing and bioinformatics analyses. Moreover, we identified over 12,000 structural variants, including 143 chromosomal inversions. Further functional analyses revealed several fixed nonsense mutations associated with infection and immunity-related adaptations, and a number of fixed missense mutations that may be related to anticoagulant resistance. A genome-wide scan for loci under selection identified various genes related to neural activity. Our whole-genome sequencing data provide a genomic resource for future genetic studies of the Asian house rat species and have the potential to facilitate understanding of the molecular adaptations of rats to their ecological niches. PMID:27172215

  12. Whole-Genome Sequencing in Microbial Forensic Analysis of Gamma-Irradiated Microbial Materials.

    PubMed

    Broomall, Stacey M; Ait Ichou, Mohamed; Krepps, Michael D; Johnsky, Lauren A; Karavis, Mark A; Hubbard, Kyle S; Insalaco, Joseph M; Betters, Janet L; Redmond, Brady W; Rivers, Bryan A; Liem, Alvin T; Hill, Jessica M; Fochler, Edward T; Roth, Pierce A; Rosenzweig, C Nicole; Skowronski, Evan W; Gibbons, Henry S

    2016-01-01

    Effective microbial forensic analysis of materials used in a potential biological attack requires robust methods of morphological and genetic characterization of the attack materials in order to enable the attribution of the materials to potential sources and to exclude other potential sources. The genetic homogeneity and potential intersample variability of many of the category A to C bioterrorism agents offer a particular challenge to the generation of attributive signatures, potentially requiring whole-genome or proteomic approaches to be utilized. Currently, irradiation of mail is standard practice at several government facilities judged to be at particularly high risk. Thus, initial forensic signatures would need to be recovered from inactivated (nonviable) material. In the study described in this report, we determined the effects of high-dose gamma irradiation on forensic markers of bacterial biothreat agent surrogate organisms with a particular emphasis on the suitability of genomic DNA (gDNA) recovered from such sources as a template for whole-genome analysis. While irradiation of spores and vegetative cells affected the retention of Gram and spore stains and sheared gDNA into small fragments, we found that irradiated material could be utilized to generate accurate whole-genome sequence data on the Illumina and Roche 454 sequencing platforms. PMID:26567301

  13. Whole Genome Mapping with Feature Sets from High-Throughput Sequencing Data.

    PubMed

    Pan, Yonglong; Wang, Xiaoming; Liu, Lin; Wang, Hao; Luo, Meizhong

    2016-01-01

    A good physical map is essential to guide sequence assembly in de novo whole genome sequencing, especially when sequences are produced by high-throughput sequencing such as next-generation sequencing (NGS) technology. We here present a novel method, Feature sets-based Genome Mapping (FGM). With FGM, a physical map and draft whole genome sequences can be generated, anchored and integrated using the same data set of NGS sequences, independent of restriction digestion. A method model was created and its parameters were inspected by simulations using the Arabidopsis genome sequence. In the simulations, when a ~4.8X genome BAC library including 4,096 clones was used to sequence the whole genome, ~90% of clones were successfully connected to physical contigs, and 91.58% of genome sequences were mapped and connected to chromosomes. This method was experimentally verified using the existing physical map and genome sequence of rice. Of 4,064 clones covering 115 Mb of sequence selected from ~3 tiles of 3 chromosomes of a rice draft physical map, 3,364 clones were reconstructed into physical contigs and 98 Mb of sequence was integrated into the 3 chromosomes. The physical map-integrated draft genome sequences can provide permanent frameworks for eventually obtaining high-quality reference sequences by targeted sequencing, gap filling and combining other sequences. PMID:27611682

  14. Added value of whole-genome sequencing for management of highly drug-resistant TB

    PubMed Central

    Outhred, Alexander C.; Jelfs, Peter; Suliman, Basel; Hill-Cawthorne, Grant A.; Crawford, Archibald B. H.; Marais, Ben J.; Sintchenko, Vitali

    2015-01-01

    Objectives Phenotypic drug susceptibility testing (DST) for Mycobacterium tuberculosis takes several weeks to complete and second-line DST is often poorly reproducible, potentially leading to compromised clinical decisions. Following a fatal case of XDR TB, we investigated the potential benefit of using whole-genome sequencing to generate an in silico drug susceptibility profile. Methods The clinical course of the patient was reviewed, assessing the times at which phenotypic DST data became available and changes made to the therapeutic regimen. Whole-genome sequencing was performed on the earliest available isolate and variants associated with drug resistance were identified. Results The final DST report, including second-line drugs, was issued 10 weeks after patient presentation and 8 weeks after initial growth of M. tuberculosis. In the interim, the patient may have received a compromised regimen that had the potential to select for further drug resistance. The in silico susceptibility profile, extrapolated from evolving evidence in the literature, provided comparable or superior data to the DST results for second-line drugs and could be generated in a much shorter timeframe. Conclusions We propose routine whole-genome sequencing of all MDR M. tuberculosis isolates in adequately resourced settings. This will improve individual patient care, monitor for transmission events and advance our understanding of resistance-associated mutations. PMID:25492392

  15. Insights into the Genetic Structure and Diversity of 38 South Asian Indians from Deep Whole-Genome Sequencing

    PubMed Central

    Saw, Woei-Yuh; Ong, Rick Twee-Hee; Cheng, Anthony Youzhi; Pillai, Nisha Esakimuthu; Liu, Xuanyao; Xu, Wenting; Chen, Peng; Foo, Jia-Nee; Tan, Linda Wei-Lin; Koo, Seok-Hwee; Soong, Richie; Wenk, Markus Rene; Lim, Wei-Yen; Khor, Chiea-Chuen; Little, Peter; Chia, Kee-Seng; Teo, Yik-Ying

    2014-01-01

    South Asia possesses a significant amount of genetic diversity due to considerable intergroup differences in culture and language. There have been numerous reports on the genetic structure of Asian Indians, although these have mostly relied on genotyping microarrays or targeted sequencing of the mitochondria and Y chromosomes. Asian Indians in Singapore are primarily descendants of immigrants from Dravidian-language–speaking states in south India, and 38 individuals from the general population underwent deep whole-genome sequencing with a target coverage of 30X as part of the Singapore Sequencing Indian Project (SSIP). The genetic structure and diversity of these samples were compared against samples from the Singapore Sequencing Malay Project and populations in Phase 1 of the 1,000 Genomes Project (1 KGP). SSIP samples exhibited greater intra-population genetic diversity and possessed higher heterozygous-to-homozygous genotype ratio than other Asian populations. When compared against a panel of well-defined Asian Indians, the genetic makeup of the SSIP samples was closely related to South Indians. However, even though the SSIP samples clustered distinctly from the Europeans in the global population structure analysis with autosomal SNPs, eight samples were assigned to mitochondrial haplogroups that were predominantly present in Europeans and possessed higher European admixture than the remaining samples. An analysis of the relative relatedness between SSIP with two archaic hominins (Denisovan, Neanderthal) identified higher ancient admixture in East Asian populations than in SSIP. The data resource for these samples is publicly available and is expected to serve as a valuable complement to the South Asian samples in Phase 3 of 1 KGP. PMID:24832686

  16. Predicting whole genome protein interaction networks from primary sequence data in model and non-model organisms using ENTS

    PubMed Central

    2013-01-01

    Background The large-scale identification of physical protein-protein interactions (PPIs) is an important step toward understanding how biological networks evolve and generate emergent phenotypes. However, experimental identification of PPIs is a laborious and error-prone process, and current methods of PPI prediction tend to be highly conservative or require large amounts of functional data that may not be available for newly-sequenced organisms. Results In this study we demonstrate a random-forest based technique, ENTS, for the computational prediction of protein-protein interactions based only on primary sequence data. Our approach is able to efficiently predict interactions on a whole-genome scale for any eukaryotic organism, using pairwise combinations of conserved domains and predicted subcellular localization of proteins as input features. We present the first predicted interactome for the forest tree Populus trichocarpa in addition to the predicted interactomes for Saccharomyces cerevisiae, Homo sapiens, Mus musculus, and Arabidopsis thaliana. Comparing our approach to other PPI predictors, we find that ENTS performs comparably to or better than a number of existing approaches, including several that utilize a variety of functional information for their predictions. We also find that the predicted interactions are biologically meaningful, as indicated by similarity in functional annotations and enrichment of co-expressed genes in public microarray datasets. Furthermore, we demonstrate some of the biological insights that can be gained from these predicted interaction networks. We show that the predicted interactions yield informative groupings of P. trichocarpa metabolic pathways, literature-supported associations among human disease states, and theory-supported insight into the evolutionary dynamics of duplicated genes in paleopolyploid plants. Conclusion We conclude that the ENTS classifier will be a valuable tool for the de novo annotation of genome

  17. Whole Genome Transcript Profiling of Drug Induced Steatosis in Rats Reveals a Gene Signature Predictive of Outcome

    PubMed Central

    Sahini, Nishika; Selvaraj, Saravanakumar; Borlak, Jürgen

    2014-01-01

    Drug induced steatosis (DIS) is characterised by excess triglyceride accumulation in the form of lipid droplets (LD) in liver cells. To explore mechanisms underlying DIS we interrogated the publicly available microarray data from the Japanese Toxicogenomics Project (TGP) to comprehensively study whole genome gene expression changes in the liver of treated rats. For this purpose a total of 17 and 12 drugs which are diverse in molecular structure and mode of action were considered based on their ability to cause either steatosis or phospholipidosis, respectively, while 7 drugs served as negative controls. In our efforts we focused on 200 genes which are considered to be mechanistically relevant in the process of lipid droplet biogenesis in hepatocytes as recently published (Sahini and Borlak, 2014). Based on mechanistic considerations we identified 19 genes which displayed dose dependent responses while 10 genes showed time dependency. Importantly, the present study defined 9 genes (ANGPTL4, FABP7, FADS1, FGF21, GOT1, LDLR, GK, STAT3, and PKLR) as signature genes to predict DIS. Moreover, cross tabulation revealed 9 genes to be regulated ≥10 times amongst the various conditions and included genes linked to glucose metabolism, lipid transport and lipogenesis as well as signalling events. Additionally, a comparison between drugs causing phospholipidosis and/or steatosis revealed 26 genes to be regulated in common including 4 signature genes to predict DIS (PKLR, GK, FABP7 and FADS1). Furthermore, a comparison between in vivo single dose (3, 6, 9 and 24 h) and findings from rat hepatocyte studies (2 h, 8 h, 24 h) identified 10 genes which are regulated in common and contained 2 DIS signature genes (FABP7, FGF21). Altogether, our studies provide comprehensive information on mechanistically linked gene expression changes of a range of drugs causing steatosis and phospholipidosis and encourage the screening of DIS signature genes at the preclinical stage. PMID:25470483

  18. A Study on Pedagogical Requirements for Multi-platform Learning Objects

    NASA Astrophysics Data System (ADS)

    Behar, Patricia Alejandra; Passerino, Liliana Maria; de Castro E Souza Frozi, Ana Paula Frozi; de Oliveira Dias, Cristiani; da Silva, Ketia Kellen Araújo

    This study presents the development of a proposal of pedagogical requirements for multi-platform learning objects (LO). It aims to prompt debate on the importance of such pedagogical requirements in the development and construction of LOs. It also presents an analysis of these requirements carried out on a learning object built to run on the Web, digital TV (DTV) and cell phones.

  19. Genetic Diversity of the Q Fever Agent, Coxiella burnetii, Assessed by Microarray-Based Whole-Genome Comparisons†

    PubMed Central

    Beare, Paul A.; Samuel, James E.; Howe, Dale; Virtaneva, Kimmo; Porcella, Stephen F.; Heinzen, Robert A.

    2006-01-01

    Coxiella burnetii, a gram-negative obligate intracellular bacterium, causes human Q fever and is considered a potential agent of bioterrorism. Distinct genomic groups of C. burnetii are revealed by restriction fragment-length polymorphisms (RFLP). Here we comprehensively define the genetic diversity of C. burnetii by hybridizing the genomes of 20 RFLP-grouped and four ungrouped isolates from disparate sources to a high-density custom Affymetrix GeneChip containing all open reading frames (ORFs) of the Nine Mile phase I (NMI) reference isolate. We confirmed the relatedness of RFLP-grouped isolates and showed that two ungrouped isolates represent distinct genomic groups. Isolates contained up to 20 genomic polymorphisms consisting of 1 to 18 ORFs each. These were mostly complete ORF deletions, although partial deletions, point mutations, and insertions were also identified. A total of 139 chromosomal and plasmid ORFs were polymorphic among all C. burnetii isolates, representing ca. 7% of the NMI coding capacity. Approximately 67% of all deleted ORFs were hypothetical, while 9% were annotated in NMI as nonfunctional (e.g., frameshifted). The remaining deleted ORFs were associated with diverse cellular functions. The only deletions associated with isogenic NMI variants of attenuated virulence were previously described large deletions containing genes involved in lipopolysaccharide (LPS) biosynthesis, suggesting that these polymorphisms alone are responsible for the lower virulence of these variants. Interestingly, a variant of the Australia QD isolate producing truncated LPS had no detectable deletions, indicating LPS truncation can occur via small genetic changes. Our results provide new insight into the genetic diversity and virulence potential of Coxiella species. PMID:16547017

  20. Analysis of Campylobacter jejuni Whole Genome DNA Microarrays to Identify Gene Differences for Use in Strain Subtyping

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Background: Campylobacter jejuni is a major cause of gastroenteritis in humans and is carried in many common food animals. In order to reduce human infections a better understanding of Campylobacter epidemiology is needed. One way to improve this is the identification of genes that allow for the det...

  1. Analysis of Campylobacter jejuni whole genome DNA microarrays: significance of prophage and hypervariable regions for discriminating isolates

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Introduction: Campylobacter jejuni is a major cause of gastroenteritis in humans and is carried in many common food animals. In order to reduce human infections a better understanding of Campylobacter epidemiology is needed. Identifying genes that enable discriminating between isolates is an importa...

  2. A rapid whole genome sequencing and analysis system supporting genomic epidemiology (7th Annual SFAF Meeting, 2012)

    ScienceCinema

    FitzGerald, Michael [Broad Institute]

    2013-02-12

    Michael FitzGerald on "A rapid whole genome sequencing and analysis system supporting genomic epidemiology" at the 2012 Sequencing, Finishing, Analysis in the Future Meeting held June 5-7, 2012 in Santa Fe, New Mexico.

  3. Whole-Genome Sequence of Enteractinococcus helveticum sp. nov. Strain UASWS1574 Isolated from Industrial Used Waters

    PubMed Central

    Crovadore, Julien; Calmin, Gautier; Chablais, Romain; Cochard, Bastien

    2016-01-01

    We report here the whole-genome shotgun sequences of the strain UASWS1574 of the undescribed Enteractinococcus helveticum sp. nov., isolated from used water. This is the first genome registered for the whole genus. PMID:27469945

  4. A rapid whole genome sequencing and analysis system supporting genomic epidemiology (7th Annual SFAF Meeting, 2012)

    SciTech Connect

    FitzGerald, Michael

    2012-06-01

    Michael FitzGerald on "A rapid whole genome sequencing and analysis system supporting genomic epidemiology" at the 2012 Sequencing, Finishing, Analysis in the Future Meeting held June 5-7, 2012 in Santa Fe, New Mexico.

  5. Using Whole Genome Analysis to Examine Recombination across Diverse Sequence Types of Staphylococcus aureus

    PubMed Central

    Driebe, Elizabeth M.; Sahl, Jason W.; Roe, Chandler; Bowers, Jolene R.; Schupp, James M.; Gillece, John D.; Kelley, Erin; Price, Lance B.; Pearson, Talima R.; Hepp, Crystal M.; Brzoska, Pius M.; Cummings, Craig A.; Furtado, Manohar R.; Andersen, Paal S.; Stegger, Marc; Engelthaler, David M.; Keim, Paul S.

    2015-01-01

    Staphylococcus aureus is an important clinical pathogen worldwide and understanding this organism's phylogeny and, in particular, the role of recombination, is important both to understand the overall spread of virulent lineages and to characterize outbreaks. To further elucidate the phylogeny of S. aureus, 35 diverse strains were sequenced using whole genome sequencing. In addition, 29 publicly available whole genome sequences were included to create a single nucleotide polymorphism (SNP)-based phylogenetic tree encompassing 11 distinct lineages. All strains of a particular sequence type fell into the same clade with clear groupings of the major clonal complexes of CC8, CC5, CC30, CC45 and CC1. Using a novel analysis method, we plotted the homoplasy density and SNP density across the whole genome and found evidence of recombination throughout the entire chromosome, but when we examined individual clonal lineages we found very little recombination. However, when we analyzed three branches of multiple lineages, we saw intermediate and differing levels of recombination between them. These data demonstrate that in S. aureus, recombination occurs across major lineages that subsequently expand in a clonal manner. Estimated mutation rates for the CC8 and CC5 lineages were different from each other. While the CC8 lineage rate was similar to previous studies, the CC5 lineage was 100-fold greater. Fifty known virulence genes were screened in all genomes in silico to determine their distribution across major clades. Thirty-three genes were present variably across clades, most of which were not constrained by ancestry, indicating horizontal gene transfer or gene loss. PMID:26161978
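
    The abstract describes plotting SNP density across the whole genome. As a rough illustration only, the sketch below counts SNP positions in fixed-size, non-overlapping windows. The function name, the 1-based coordinate convention and the window size are assumptions made for this example; the authors' own method (which also tracks homoplasy density) is not specified in the abstract and is not reproduced here.

```python
def snp_window_counts(positions, genome_length, window):
    """Count SNPs in consecutive non-overlapping windows along a genome.

    positions     -- iterable of 1-based SNP coordinates (assumed convention)
    genome_length -- total length of the chromosome in bases
    window        -- window size in bases (hypothetical parameter)
    """
    if window <= 0 or genome_length <= 0:
        raise ValueError("window and genome_length must be positive")
    n_windows = -(-genome_length // window)  # ceiling division
    counts = [0] * n_windows
    for pos in positions:
        if not 1 <= pos <= genome_length:
            raise ValueError(f"position {pos} lies outside the genome")
        counts[(pos - 1) // window] += 1
    return counts


# Toy example: a 25-base genome split into windows of 10 bases.
print(snp_window_counts([1, 5, 10, 11, 25], genome_length=25, window=10))
```

    Dividing each count by the window length would give a per-base density; the last window may be shorter than the others and would need its own denominator.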

  6. A generic assay for whole-genome amplification and deep sequencing of enterovirus A71.

    PubMed

    Tan, Le Van; Tuyen, Nguyen Thi Kim; Thanh, Tran Tan; Ngan, Tran Thuy; Van, Hoang Minh Tu; Sabanathan, Saraswathy; Van, Tran Thi My; Thanh, Le Thi My; Nguyet, Lam Anh; Geoghegan, Jemma L; Ong, Kien Chai; Perera, David; Hang, Vu Thi Ty; Ny, Nguyen Thi Han; Anh, Nguyen To; Ha, Do Quang; Qui, Phan Tu; Viet, Do Chau; Tuan, Ha Manh; Wong, Kum Thong; Holmes, Edward C; Chau, Nguyen Van Vinh; Thwaites, Guy; van Doorn, H Rogier

    2015-04-01

    Enterovirus A71 (EV-A71) has emerged as the most important cause of large outbreaks of severe and sometimes fatal hand, foot and mouth disease (HFMD) across the Asia-Pacific region. EV-A71 outbreaks have been associated with (sub)genogroup switches, sometimes accompanied by recombination events. Understanding EV-A71 population dynamics is therefore essential for understanding this emerging infection, and may provide pivotal information for vaccine development. Despite the public health burden of EV-A71, relatively few EV-A71 complete-genome sequences are available for analysis and from limited geographical localities. The availability of an efficient procedure for whole-genome sequencing would stimulate effort to generate more viral sequence data. Herein, we report for the first time the development of a next-generation sequencing based protocol for whole-genome sequencing of EV-A71 directly from clinical specimens. We were able to sequence viruses of subgenogroup C4 and B5, while RNA from culture materials of diverse EV-A71 subgenogroups belonging to both genogroup B and C was successfully amplified. The nature of intra-host genetic diversity was explored in 22 clinical samples, revealing 107 positions carrying minor variants (ranging from 0 to 15 variants per sample). Our analysis of EV-A71 strains sampled in 2013 showed that they all belonged to subgenogroup B5, representing the first report of this subgenogroup in Vietnam. In conclusion, we have successfully developed a high-throughput next-generation sequencing-based assay for whole-genome sequencing of EV-A71 from clinical samples. PMID:25704598

  7. Postzygotic single-nucleotide mosaicisms in whole-genome sequences of clinically unremarkable individuals

    PubMed Central

    Huang, August Y; Xu, Xiaojing; Ye, Adam Y; Wu, Qixi; Yan, Linlin; Zhao, Boxun; Yang, Xiaoxu; He, Yao; Wang, Sheng; Zhang, Zheng; Gu, Bowen; Zhao, Han-Qing; Wang, Meng; Gao, Hua; Gao, Ge; Zhang, Zhichao; Yang, Xiaoling; Wu, Xiru; Zhang, Yuehua; Wei, Liping

    2014-01-01

    Postzygotic single-nucleotide mutations (pSNMs) have been studied in cancer and a few other overgrowth human disorders at whole-genome scale and found to play critical roles. However, in clinically unremarkable individuals, pSNMs have never been identified at whole-genome scale largely due to technical difficulties and lack of matched control tissue samples, and thus the genome-wide characteristics of pSNMs remain unknown. We developed a new Bayesian-based mosaic genotyper and a series of effective error filters, using which we were able to identify 17 SNM sites from ∼80× whole-genome sequencing of peripheral blood DNAs from three clinically unremarkable adults. The pSNMs were thoroughly validated using pyrosequencing, Sanger sequencing of individual cloned fragments, and multiplex ligation-dependent probe amplification. The mutant allele fraction ranged from 5%-31%. We found that C→T and C→A were the predominant types of postzygotic mutations, similar to the somatic mutation profile in tumor tissues. Simulation data showed that the overall mutation rate was an order of magnitude lower than that in cancer. We detected varied allele fractions of the pSNMs among multiple samples obtained from the same individuals, including blood, saliva, hair follicle, buccal mucosa, urine, and semen samples, indicating that pSNMs could affect multiple sources of somatic cells as well as germ cells. Two of the adults have children who were diagnosed with Dravet syndrome. We identified two non-synonymous pSNMs in SCN1A, a causal gene for Dravet syndrome, from these two unrelated adults and found that the mutant alleles were transmitted to their children, highlighting the clinical importance of detecting pSNMs in genetic counseling. PMID:25312340
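
    The mutant allele fractions quoted above (5%-31%) are what separate a mosaic site from a germline heterozygote, where roughly half of the reads are expected to carry the variant. The sketch below shows that arithmetic: it computes an allele fraction from read counts and a binomial lower-tail probability under a 50% expectation. It is not the authors' Bayesian genotyper; the function names and the read counts are illustrative assumptions.

```python
from math import comb


def allele_fraction(alt_reads, ref_reads):
    """Fraction of reads at a site that carry the mutant allele."""
    depth = alt_reads + ref_reads
    if depth == 0:
        raise ValueError("no reads cover this site")
    return alt_reads / depth


def binom_lower_tail(k, n, p=0.5):
    """P(X <= k) for X ~ Binomial(n, p).

    With p = 0.5 this asks how surprising it would be to see k or fewer
    mutant reads out of n if the site were a germline heterozygote.
    """
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


# Hypothetical site at 80x depth with 12 mutant reads.
alt, ref = 12, 68
print(allele_fraction(alt, ref))
print(binom_lower_tail(alt, alt + ref))
```

    A very small tail probability argues against a simple heterozygous genotype, but a real caller also has to model sequencing error, mapping artefacts and copy-number changes, which this sketch ignores.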

  8. Using Whole Genome Analysis to Examine Recombination across Diverse Sequence Types of Staphylococcus aureus.

    PubMed

    Driebe, Elizabeth M; Sahl, Jason W; Roe, Chandler; Bowers, Jolene R; Schupp, James M; Gillece, John D; Kelley, Erin; Price, Lance B; Pearson, Talima R; Hepp, Crystal M; Brzoska, Pius M; Cummings, Craig A; Furtado, Manohar R; Andersen, Paal S; Stegger, Marc; Engelthaler, David M; Keim, Paul S

    2015-01-01

    Staphylococcus aureus is an important clinical pathogen worldwide and understanding this organism's phylogeny and, in particular, the role of recombination, is important both to understand the overall spread of virulent lineages and to characterize outbreaks. To further elucidate the phylogeny of S. aureus, 35 diverse strains were sequenced using whole genome sequencing. In addition, 29 publicly available whole genome sequences were included to create a single nucleotide polymorphism (SNP)-based phylogenetic tree encompassing 11 distinct lineages. All strains of a particular sequence type fell into the same clade with clear groupings of the major clonal complexes of CC8, CC5, CC30, CC45 and CC1. Using a novel analysis method, we plotted the homoplasy density and SNP density across the whole genome and found evidence of recombination throughout the entire chromosome, but when we examined individual clonal lineages we found very little recombination. However, when we analyzed three branches of multiple lineages, we saw intermediate and differing levels of recombination between them. These data demonstrate that in S. aureus, recombination occurs across major lineages that subsequently expand in a clonal manner. Estimated mutation rates for the CC8 and CC5 lineages were different from each other. While the CC8 lineage rate was similar to previous studies, the CC5 lineage was 100-fold greater. Fifty known virulence genes were screened in all genomes in silico to determine their distribution across major clades. Thirty-three genes were present variably across clades, most of which were not constrained by ancestry, indicating horizontal gene transfer or gene loss. PMID:26161978

  9. Targeted Analysis of Whole Genome Sequence Data to Diagnose Genetic Cardiomyopathy

    SciTech Connect

    Golbus, Jessica R.; Puckelwartz, Megan J.; Dellefave-Castillo, Lisa; Fahrenbach, John P.; Nelakuditi, Viswateja; Pesce, Lorenzo L.; Pytel, Peter; McNally, Elizabeth M.

    2014-09-01

    Background—Cardiomyopathy is highly heritable but genetically diverse. At present, genetic testing for cardiomyopathy uses targeted sequencing to simultaneously assess the coding regions of more than 50 genes. New genes are routinely added to panels to improve the diagnostic yield. With the anticipated $1000 genome, it is expected that genetic testing will shift towards comprehensive genome sequencing accompanied by targeted gene analysis. Therefore, we assessed the reliability of whole genome sequencing and targeted analysis to identify cardiomyopathy variants in 11 subjects with cardiomyopathy. Methods and Results—Whole genome sequencing with an average of 37× coverage was combined with targeted analysis focused on 204 genes linked to cardiomyopathy. Genetic variants were scored using multiple prediction algorithms combined with frequency data from public databases. This pipeline yielded 1-14 potentially pathogenic variants per individual. Variants were further analyzed using clinical criteria and/or segregation analysis. Three of three previously identified primary mutations were detected by this analysis. In six subjects for whom the primary mutation was previously unknown, we identified mutations that segregated with disease, had clinical correlates, and/or had additional pathological correlation to provide evidence for causality. For two subjects with previously known primary mutations, we identified additional variants that may act as modifiers of disease severity. In total, we identified the likely pathological mutation in 9 of 11 (82%) subjects. We conclude that these pilot data demonstrate that ~30-40× coverage whole genome sequencing combined with targeted analysis is feasible and sensitive to identify rare variants in cardiomyopathy-associated genes.

  10. Targeted Analysis of Whole Genome Sequence Data to Diagnose Genetic Cardiomyopathy

    DOE PAGESBeta

    Golbus, Jessica R.; Puckelwartz, Megan J.; Dellefave-Castillo, Lisa; Fahrenbach, John P.; Nelakuditi, Viswateja; Pesce, Lorenzo L.; Pytel, Peter; McNally, Elizabeth M.

    2014-09-01

    Background—Cardiomyopathy is highly heritable but genetically diverse. At present, genetic testing for cardiomyopathy uses targeted sequencing to simultaneously assess the coding regions of more than 50 genes. New genes are routinely added to panels to improve the diagnostic yield. With the anticipated $1000 genome, it is expected that genetic testing will shift towards comprehensive genome sequencing accompanied by targeted gene analysis. Therefore, we assessed the reliability of whole genome sequencing and targeted analysis to identify cardiomyopathy variants in 11 subjects with cardiomyopathy. Methods and Results—Whole genome sequencing with an average of 37× coverage was combined with targeted analysis focused on 204 genes linked to cardiomyopathy. Genetic variants were scored using multiple prediction algorithms combined with frequency data from public databases. This pipeline yielded 1-14 potentially pathogenic variants per individual. Variants were further analyzed using clinical criteria and/or segregation analysis. Three of three previously identified primary mutations were detected by this analysis. In six subjects for whom the primary mutation was previously unknown, we identified mutations that segregated with disease, had clinical correlates, and/or had additional pathological correlation to provide evidence for causality. For two subjects with previously known primary mutations, we identified additional variants that may act as modifiers of disease severity. In total, we identified the likely pathological mutation in 9 of 11 (82%) subjects. We conclude that these pilot data demonstrate that ~30-40× coverage whole genome sequencing combined with targeted analysis is feasible and sensitive to identify rare variants in cardiomyopathy-associated genes.

  11. Whole-genome mapping reveals a large chromosomal inversion on Iberian Brucella suis biovar 2 strains.

    PubMed

    Ferreira, Ana Cristina; Dias, Ricardo; de Sá, Maria Inácia Corrêa; Tenreiro, Rogério

    2016-08-30

    Optical mapping can quickly generate high-resolution ordered whole-genome restriction maps of bacteria and is a proven approach to search for diversity among bacterial isolates. In this work, optical whole-genome maps were used to compare closely related Brucella suis biovar 2 strains. This biovar is the only one isolated from domestic pigs and wild boars in Portugal and Spain, and most of the strains share specific molecular characteristics establishing an Iberian clonal lineage that can be differentiated from another lineage mainly isolated in several Central European countries. We generated BamHI whole-genome optical maps of five B. suis biovar 2 field strains, isolated from wild boars in Portugal and Spain (three from the Iberian lineage and two from the Central European one), as well as of the reference strain B. suis biovar 2 ATCC 23445 (Central European lineage, Denmark). Each strain showed a distinct, highly individual configuration of 228-231 BamHI fragments. Nevertheless, overall divergence was lower in chromosome II (1.6%) than in chromosome I (2.4%). Optical mapping also disclosed genomic events associated with B. suis strains in chromosome I, namely one indel (3.5 kb) and one large inversion (944 kb). Using targeted PCR on a set of 176 B. suis strains, including all biovars and haplotypes, the indel was found to be specific to the reference strain ATCC 23445 and the large inversion was shown to be an exclusive genomic marker of the Iberian clonal lineage of biovar 2. PMID:27527786

  12. An MCMC algorithm for haplotype assembly from whole-genome sequence data

    PubMed Central

    Bansal, Vikas; Halpern, Aaron L.; Axelrod, Nelson; Bafna, Vineet

    2008-01-01

    In comparison to genotypes, knowledge about haplotypes (the combination of alleles present on a single chromosome) is much more useful for whole-genome association studies and for making inferences about human evolutionary history. Haplotypes are typically inferred from population genotype data using computational methods. Whole-genome sequence data represent a promising resource for constructing haplotypes spanning hundreds of kilobases for an individual. In this article, we propose a Markov chain Monte Carlo (MCMC) algorithm, HASH (haplotype assembly for single human), for assembling haplotypes from sequenced DNA fragments that have been mapped to a reference genome assembly. The transitions of the Markov chain are generated using min-cut computations on graphs derived from the sequenced fragments. We have applied our method to infer haplotypes using whole-genome shotgun sequence data from a recently sequenced human individual. The high sequence coverage and presence of mate pairs result in fairly long haplotypes (N50 length ∼ 350 kb). Based on comparison of the sequenced fragments against the individual haplotypes, we demonstrate that the haplotypes for this individual inferred using HASH are significantly more accurate than the haplotypes estimated using a previously proposed greedy heuristic and a simple MCMC method. Using haplotypes from the HapMap project, we estimate the switch error rate of the haplotypes inferred using HASH to be quite low, ∼1.1%. Our Markov chain Monte Carlo algorithm represents a general framework for haplotype assembly that can be applied to sequence data generated by other sequencing technologies. The code implementing the methods and the phased individual haplotypes can be downloaded from http://www.cse.ucsd.edu/users/vibansal/HASH/. PMID:18676820
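
    Two summary statistics carry the results in this abstract: the N50 length of the assembled haplotypes and the switch error rate. The sketch below shows how each is commonly computed. It assumes the usual definitions (N50 as the block length at which the longest blocks cover at least half of the total, and a switch error as a change of phase between adjacent heterozygous sites relative to a reference haplotype); the function names and the 0/1 string encoding are illustrative and are not taken from the HASH code.

```python
def n50(lengths):
    """Largest length L such that blocks of length >= L cover at least
    half of the summed length of all blocks."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("lengths must be non-empty")


def switch_error_rate(inferred, truth):
    """Fraction of adjacent heterozygous-site pairs whose relative phase
    in the inferred haplotype disagrees with the reference haplotype.

    Both haplotypes are strings over '0'/'1', one character per
    heterozygous site (an assumed encoding for this example).
    """
    if len(inferred) != len(truth):
        raise ValueError("haplotypes must cover the same sites")
    if len(inferred) < 2:
        raise ValueError("need at least two sites to define a switch")
    # agree[i] is True where the inferred allele matches the reference.
    agree = [a == b for a, b in zip(inferred, truth)]
    # A switch occurs wherever agreement flips between neighbouring sites.
    switches = sum(1 for i in range(1, len(agree)) if agree[i] != agree[i - 1])
    return switches / (len(agree) - 1)


print(n50([100, 200, 300, 400]))
print(switch_error_rate("00110", "00000"))
```

    Because only flips between neighbouring sites are counted, an inferred haplotype that is the exact complement of the reference scores zero switch errors, which matches the fact that the two chromosome copies are interchangeable labels.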

  13. A generic assay for whole-genome amplification and deep sequencing of enterovirus A71

    PubMed Central

    Tan, Le Van; Tuyen, Nguyen Thi Kim; Thanh, Tran Tan; Ngan, Tran Thuy; Van, Hoang Minh Tu; Sabanathan, Saraswathy; Van, Tran Thi My; Thanh, Le Thi My; Nguyet, Lam Anh; Geoghegan, Jemma L.; Ong, Kien Chai; Perera, David; Hang, Vu Thi Ty; Ny, Nguyen Thi Han; Anh, Nguyen To; Ha, Do Quang; Qui, Phan Tu; Viet, Do Chau; Tuan, Ha Manh; Wong, Kum Thong; Holmes, Edward C.; Chau, Nguyen Van Vinh; Thwaites, Guy; van Doorn, H. Rogier

    2015-01-01

    Enterovirus A71 (EV-A71) has emerged as the most important cause of large outbreaks of severe and sometimes fatal hand, foot and mouth disease (HFMD) across the Asia-Pacific region. EV-A71 outbreaks have been associated with (sub)genogroup switches, sometimes accompanied by recombination events. Understanding EV-A71 population dynamics is therefore essential for understanding this emerging infection, and may provide pivotal information for vaccine development. Despite the public health burden of EV-A71, relatively few EV-A71 complete-genome sequences are available for analysis and from limited geographical localities. The availability of an efficient procedure for whole-genome sequencing would stimulate effort to generate more viral sequence data. Herein, we report for the first time the development of a next-generation sequencing based protocol for whole-genome sequencing of EV-A71 directly from clinical specimens. We were able to sequence viruses of subgenogroup C4 and B5, while RNA from culture materials of diverse EV-A71 subgenogroups belonging to both genogroup B and C was successfully amplified. The nature of intra-host genetic diversity was explored in 22 clinical samples, revealing 107 positions carrying minor variants (ranging from 0 to 15 variants per sample). Our analysis of EV-A71 strains sampled in 2013 showed that they all belonged to subgenogroup B5, representing the first report of this subgenogroup in Vietnam. In conclusion, we have successfully developed a high-throughput next-generation sequencing-based assay for whole-genome sequencing of EV-A71 from clinical samples. PMID:25704598

  14. Accuracy of genomic prediction using imputed whole-genome sequence data in white layers.

    PubMed

    Heidaritabar, M; Calus, M P L; Megens, H-J; Vereijken, A; Groenen, M A M; Bastiaansen, J W M

    2016-06-01

    There is an increasing interest in using whole-genome sequence data in genomic selection breeding programmes. Prediction of breeding values is expected to be more accurate when whole-genome sequence is used, because the causal mutations are assumed to be in the data. We performed genomic prediction for the number of eggs in white layers using imputed whole-genome resequence data including ~4.6 million SNPs. The prediction accuracies based on sequence data were compared with the accuracies from the 60 K SNP panel. Predictions were based on genomic best linear unbiased prediction (GBLUP) as well as a Bayesian variable selection model (BayesC). Moreover, the prediction accuracy from using different types of variants (synonymous, non-synonymous and non-coding SNPs) was evaluated. Genomic prediction using the 60 K SNP panel resulted in a prediction accuracy of 0.74 when GBLUP was applied. With sequence data, there was a small increase (~1%) in prediction accuracy over the 60 K genotypes. With both 60 K SNP panel and sequence data, GBLUP slightly outperformed BayesC in predicting the breeding values. Selection of SNPs more likely to affect the phenotype (i.e. non-synonymous SNPs) did not improve the accuracy of genomic prediction. The fact that sequence data were based on imputation from a small number of sequenced animals may have limited the potential to improve the prediction accuracy. A small reference population (n = 1004) and possible exclusion of many causal SNPs during quality control can be other possible reasons for limited benefit of sequence data. We expect, however, that the limited improvement is because the 60 K SNP panel was already sufficiently dense to accurately determine the relationships between animals in our data. PMID:26776363

  15. Whole genome comparisons of Fragaria, Prunus and Malus reveal different modes of evolution between Rosaceous subfamilies

    PubMed Central

    2012-01-01

    Background: Rosaceae include numerous economically important and morphologically diverse species. Comparative mapping between the member species in Rosaceae has indicated some level of synteny. Recently the whole genomes of three crop species, peach, apple and strawberry, which belong to different genera of the Rosaceae family, have been sequenced, allowing in-depth comparison of these genomes. Results: Our analysis using the whole genome sequences of peach, apple and strawberry identified 1399 orthologous regions between the three genomes, with a mean length of around 100 kb. Each peach chromosome showed major orthology mostly to one strawberry chromosome, but to more than two apple chromosomes, suggesting that the apple genome went through more chromosomal fissions in addition to the whole genome duplication after the divergence of the three genera. However, the distribution of contiguous ancestral regions, identified using the multiple genome rearrangements and ancestors (MGRA) algorithm, suggested that the Fragaria genome went through a greater number of small-scale rearrangements compared to the other genomes since they diverged from a common ancestor. Using the contiguous ancestral regions, we reconstructed a hypothetical ancestral genome for the Rosaceae 7 composed of nine chromosomes and propose the evolutionary steps from the ancestral genome to the extant Fragaria, Prunus and Malus genomes. Conclusion: Our analysis shows that different modes of evolution may have played major roles in different subfamilies of Rosaceae. The hypothetical ancestral genome of Rosaceae and the evolutionary steps that led to three different lineages of Rosaceae will facilitate our understanding of plant genome evolution as well as have a practical impact on knowledge transfer among member species of Rosaceae. PMID:22475018

  16. Whole-Genome Sequencing Reveals Diverse Models of Structural Variations in Esophageal Squamous Cell Carcinoma.

    PubMed

    Cheng, Caixia; Zhou, Yong; Li, Hongyi; Xiong, Teng; Li, Shuaicheng; Bi, Yanghui; Kong, Pengzhou; Wang, Fang; Cui, Heyang; Li, Yaoping; Fang, Xiaodong; Yan, Ting; Li, Yike; Wang, Juan; Yang, Bin; Zhang, Ling; Jia, Zhiwu; Song, Bin; Hu, Xiaoling; Yang, Jie; Qiu, Haile; Zhang, Gehong; Liu, Jing; Xu, Enwei; Shi, Ruyi; Zhang, Yanyan; Liu, Haiyan; He, Chanting; Zhao, Zhenxiang; Qian, Yu; Rong, Ruizhou; Han, Zhiwei; Zhang, Yanlin; Luo, Wen; Wang, Jiaqian; Peng, Shaoliang; Yang, Xukui; Li, Xiangchun; Li, Lin; Fang, Hu; Liu, Xingmin; Ma, Li; Chen, Yunqing; Guo, Shiping; Chen, Xing; Xi, Yanfeng; Li, Guodong; Liang, Jianfang; Yang, Xiaofeng; Guo, Jiansheng; Jia, JunMei; Li, Qingshan; Cheng, Xiaolong; Zhan, Qimin; Cui, Yongping

    2016-02-01

    Comprehensive identification of somatic structural variations (SVs) and understanding their mutational mechanisms in cancer might contribute to understanding biological differences and help to identify new therapeutic targets. Unfortunately, complex SVs across the whole genome and the mutational mechanisms underlying esophageal squamous cell carcinoma (ESCC) remain largely uncharacterized. To define a comprehensive catalog of somatic SVs, affected target genes, and their underlying mechanisms in ESCC, we re-analyzed whole-genome sequencing (WGS) data from 31 ESCCs using the Meerkat algorithm to predict somatic SVs and Patchwork to determine copy-number changes. We found deletions and translocations with NHEJ and alt-EJ signatures as the dominant SV types, and 16% of deletions were complex deletions. SVs frequently led to disruption of cancer-associated genes (e.g., CDKN2A and NOTCH1) with different mutational mechanisms. Moreover, chromothripsis, kataegis, and breakage-fusion-bridge (BFB) were identified as contributing to locally mis-arranged chromosomes that occurred in 55% of ESCCs. These genomic catastrophes led to amplification of oncogenes through chromothripsis-derived double-minute chromosome formation (e.g., FGFR1 and LETM2) or BFB-affected chromosomes (e.g., CCND1, EGFR, ERBB2, MMPs, and MYC), with approximately 30% of ESCCs harboring BFB-derived CCND1 amplification. Furthermore, analyses of copy-number alterations reveal a high frequency of whole-genome duplication (WGD) and recurrent focal amplification of CDCA7 that might act as a potential oncogene in ESCC. Our findings reveal molecular defects such as chromothripsis and BFB in malignant transformation of ESCCs and demonstrate diverse models of SV-derived target genes in ESCCs. These genome-wide SV profiles and their underlying mechanisms provide preventive, diagnostic, and therapeutic implications for ESCCs. PMID:26833333

  17. Whole-Genome Sequencing Reveals Diverse Models of Structural Variations in Esophageal Squamous Cell Carcinoma

    PubMed Central

    Cheng, Caixia; Zhou, Yong; Li, Hongyi; Xiong, Teng; Li, Shuaicheng; Bi, Yanghui; Kong, Pengzhou; Wang, Fang; Cui, Heyang; Li, Yaoping; Fang, Xiaodong; Yan, Ting; Li, Yike; Wang, Juan; Yang, Bin; Zhang, Ling; Jia, Zhiwu; Song, Bin; Hu, Xiaoling; Yang, Jie; Qiu, Haile; Zhang, Gehong; Liu, Jing; Xu, Enwei; Shi, Ruyi; Zhang, Yanyan; Liu, Haiyan; He, Chanting; Zhao, Zhenxiang; Qian, Yu; Rong, Ruizhou; Han, Zhiwei; Zhang, Yanlin; Luo, Wen; Wang, Jiaqian; Peng, Shaoliang; Yang, Xukui; Li, Xiangchun; Li, Lin; Fang, Hu; Liu, Xingmin; Ma, Li; Chen, Yunqing; Guo, Shiping; Chen, Xing; Xi, Yanfeng; Li, Guodong; Liang, Jianfang; Yang, Xiaofeng; Guo, Jiansheng; Jia, JunMei; Li, Qingshan; Cheng, Xiaolong; Zhan, Qimin; Cui, Yongping

    2016-01-01

    Comprehensive identification of somatic structural variations (SVs) and understanding their mutational mechanisms in cancer might contribute to understanding biological differences and help to identify new therapeutic targets. Unfortunately, complex SVs across the whole genome and the mutational mechanisms underlying esophageal squamous cell carcinoma (ESCC) remain largely uncharacterized. To define a comprehensive catalog of somatic SVs, affected target genes, and their underlying mechanisms in ESCC, we re-analyzed whole-genome sequencing (WGS) data from 31 ESCCs using the Meerkat algorithm to predict somatic SVs and Patchwork to determine copy-number changes. We found deletions and translocations with NHEJ and alt-EJ signatures as the dominant SV types, and 16% of deletions were complex deletions. SVs frequently led to disruption of cancer-associated genes (e.g., CDKN2A and NOTCH1) with different mutational mechanisms. Moreover, chromothripsis, kataegis, and breakage-fusion-bridge (BFB) were identified as contributing to locally mis-arranged chromosomes that occurred in 55% of ESCCs. These genomic catastrophes led to amplification of oncogenes through chromothripsis-derived double-minute chromosome formation (e.g., FGFR1 and LETM2) or BFB-affected chromosomes (e.g., CCND1, EGFR, ERBB2, MMPs, and MYC), with approximately 30% of ESCCs harboring BFB-derived CCND1 amplification. Furthermore, analyses of copy-number alterations reveal a high frequency of whole-genome duplication (WGD) and recurrent focal amplification of CDCA7 that might act as a potential oncogene in ESCC. Our findings reveal molecular defects such as chromothripsis and BFB in malignant transformation of ESCCs and demonstrate diverse models of SV-derived target genes in ESCCs. These genome-wide SV profiles and their underlying mechanisms provide preventive, diagnostic, and therapeutic implications for ESCCs. PMID:26833333

  18. Whole-Genome Sequencing for Detecting Antimicrobial Resistance in Nontyphoidal Salmonella.

    PubMed

    McDermott, Patrick F; Tyson, Gregory H; Kabera, Claudine; Chen, Yuansha; Li, Cong; Folster, Jason P; Ayers, Sherry L; Lam, Claudia; Tate, Heather P; Zhao, Shaohua

    2016-09-01

    Laboratory-based in vitro antimicrobial susceptibility testing is the foundation for guiding anti-infective therapy and monitoring antimicrobial resistance trends. We used whole-genome sequencing (WGS) technology to identify known antimicrobial resistance determinants among strains of nontyphoidal Salmonella and correlated these with susceptibility phenotypes to evaluate the utility of WGS for antimicrobial resistance surveillance. Six hundred forty Salmonella of 43 different serotypes were selected from among retail meat and human clinical isolates that were tested for susceptibility to 14 antimicrobials using broth microdilution. The MIC for each drug was used to categorize isolates as susceptible or resistant based on Clinical and Laboratory Standards Institute clinical breakpoints or National Antimicrobial Resistance Monitoring System (NARMS) consensus interpretive criteria. Each isolate was subjected to whole-genome shotgun sequencing, and resistance genes were identified from assembled sequences. A total of 65 unique resistance genes, plus mutations in two structural resistance loci, were identified. There were more unique resistance genes (n = 59) in the 104 human isolates than in the 536 retail meat isolates (n = 36). Overall, resistance genotypes and phenotypes correlated in 99.0% of cases. Correlations approached 100% for most classes of antibiotics but were lower for aminoglycosides and beta-lactams. We report the first finding of extended-spectrum β-lactamases (ESBLs) (blaCTX-M1 and blaSHV2a) in retail meat isolates of Salmonella in the United States. Whole-genome sequencing is an effective tool for predicting antibiotic resistance in nontyphoidal Salmonella, although the use of more appropriate surveillance breakpoints and increased knowledge of new resistance alleles will further improve correlations. PMID:27381390
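    The genotype-phenotype comparison described in the abstract above can be illustrated with a minimal sketch. It assumes a single drug, a hypothetical resistance breakpoint, and a made-up list of isolates; the function names and all values are illustrative only and are not taken from the study or from CLSI/NARMS tables.

```python
def categorize(mic: float, resistant_breakpoint: float) -> str:
    """Call an isolate resistant ("R") if its MIC reaches the breakpoint, else susceptible ("S")."""
    return "R" if mic >= resistant_breakpoint else "S"


def concordance(records, resistant_breakpoint: float) -> float:
    """Fraction of isolates whose MIC-based phenotype matches the genotype prediction.

    `records` is a list of (mic, has_resistance_gene) pairs (hypothetical layout).
    """
    agree = 0
    for mic, has_gene in records:
        phenotype = categorize(mic, resistant_breakpoint)
        genotype = "R" if has_gene else "S"
        agree += phenotype == genotype
    return agree / len(records)


# Hypothetical isolates: (MIC in ug/mL, resistance gene detected by WGS)
isolates = [(32, True), (2, False), (64, True), (1, True)]
rate = concordance(isolates, resistant_breakpoint=16)
```

    In this toy input the last isolate carries a gene but tests susceptible, so three of four calls agree.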

  19. The Future of Whole-Genome Sequencing for Public Health and the Clinic.

    PubMed

    Allard, Marc W

    2016-08-01

    An American Society for Microbiology (ASM) conference titled the Conference on Rapid Next-Generation Sequencing and Bioinformatic Pipelines for Enhanced Molecular Epidemiological Investigation of Pathogens provided a venue for discussing how technologies surrounding whole-genome sequencing (WGS) are advancing microbiology. Several applications in microbial taxonomy, microbial forensics, and genomics for public health pathogen surveillance were presented at the meeting and are reviewed. All of these studies document that WGS is revolutionizing applications in microbiology and that the impact of these technologies will be profound. ASM is providing support mechanisms to promote discussions of WGS techniques to foster applications and interpretations. PMID:27307454

  20. Whole genome sequences and annotation of Micrococcus luteus SUBG006, a novel phytopathogen of mango

    PubMed Central

    Rakhashiya, Purvi M.; Patel, Pooja P.; Thaker, Vrinda S.

    2015-01-01

    The actinobacterium Micrococcus luteus SUBG006 was isolated from infected leaves of Mangifera indica L. var. Nylon in Rajkot (22.30°N, 70.78°E), Gujarat, India. The genome size is 3.86 Mb with a G + C content of 69.80%, and the genome contains 112 rRNA sequences (5S, 16S and 23S). The whole genome sequence has been deposited in DDBJ/EMBL/GenBank under the accession number JOKP00000000. PMID:26697318

  1. Sequence Determination from Overlapping Fragments: A Simple Model of Whole-Genome Shotgun Sequencing

    NASA Astrophysics Data System (ADS)

    Derrida, Bernard; Fink, Thomas M.

    2002-02-01

    Assembling fragments randomly sampled from along a sequence is the basis of whole-genome shotgun sequencing, a technique used to map the DNA of the human and other genomes. We calculate the probability that a random sequence can be recovered from a collection of overlapping fragments. We provide an exact solution for an infinite alphabet and in the case of constant overlaps. For the general problem we apply two assembly strategies and give the probability that the assembly puzzle can be solved in the limit of infinitely many fragments.
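    As a rough illustration of what assembling overlapping fragments involves, here is a minimal greedy overlap-merge sketch. It is a generic textbook-style heuristic, not either of the assembly strategies analysed in the paper, and the function names, the `min_len` threshold and the example fragments are assumptions made for illustration.

```python
def overlap(a: str, b: str, min_len: int = 1) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b` (0 if shorter than min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(fragments, min_len: int = 3):
    """Repeatedly merge the pair of fragments with the largest overlap."""
    frags = list(dict.fromkeys(fragments))  # drop exact duplicates, keep order
    # Drop fragments fully contained in another fragment.
    frags = [f for f in frags if not any(f != g and f in g for g in frags)]
    while len(frags) > 1:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # no remaining overlaps: the sequence cannot be fully recovered
        merged = frags[best_i] + frags[best_j][best_k:]
        frags = [f for idx, f in enumerate(frags) if idx not in (best_i, best_j)] + [merged]
    return frags


# Hypothetical fragments sampled from the sequence ATGCGTACGTTAGC
contigs = greedy_assemble(["ATGCGTAC", "GTACGTTA", "GTTAGC"])
```

    If more than one contig remains, the fragments did not overlap enough to recover the sequence, which is the failure event whose probability the paper studies.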

  2. Identification of low abundance microbiome in clinical samples using whole genome sequencing.

    PubMed

    Zhang, Chao; Cleveland, Kyle; Schnoll-Sussman, Felice; McClure, Bridget; Bigg, Michelle; Thakkar, Prashant; Schultz, Nikolaus; Shah, Manish A; Betel, Doron

    2015-01-01

    Identifying the microbiome composition from primary tissues directly affords an opportunity to study the causative relationships between the host microbiome and disease. However, this is challenging due to the low abundance of microbial DNA relative to the host. We present a systematic evaluation of microbiome profiling directly from endoscopic biopsies by whole genome sequencing. We compared our methods with other approaches on datasets with previously identified microbial composition. We applied this approach to identify the microbiome from 27 stomach biopsies, and validated the presence of Helicobacter pylori by quantitative PCR. Finally, we profiled the microbial composition in The Cancer Genome Atlas gastric adenocarcinoma cohort. PMID:26614063

  3. Return of genetic testing results in the era of whole-genome sequencing.

    PubMed

    Knoppers, Bartha Maria; Zawati, Ma'n H; Sénécal, Karine

    2015-09-01

    Genetic testing based on whole-genome sequencing (WGS) often returns results that are not directly clinically actionable as well as raising the possibility of incidental (secondary) findings. In this article, we first survey the laws and policies guiding both researchers and clinicians in the return of results for WGS-based genetic testing. We then provide an overview of the landscape of international legislation and policies for return of these results, including considerations for return of incidental findings. Finally, we consider a range of approaches for the return of results. PMID:26239711

  4. Whole genome sequences and annotation of Micrococcus luteus SUBG006, a novel phytopathogen of mango.

    PubMed

    Rakhashiya, Purvi M; Patel, Pooja P; Thaker, Vrinda S

    2015-12-01

    The actinobacterium Micrococcus luteus SUBG006 was isolated from infected leaves of Mangifera indica L. var. Nylon in Rajkot (22.30°N, 70.78°E), Gujarat, India. The genome size is 3.86 Mb with a G + C content of 69.80%, and the genome contains 112 rRNA sequences (5S, 16S and 23S). The whole genome sequence has been deposited in DDBJ/EMBL/GenBank under the accession number JOKP00000000. PMID:26697318

  5. A green-cotyledon/stay-green mutant exemplifies the ancient whole-genome duplications in soybean.

    PubMed

    Nakano, Michiharu; Yamada, Tetsuya; Masuda, Yu; Sato, Yutaka; Kobayashi, Hideki; Ueda, Hiroaki; Morita, Ryouhei; Nishimura, Minoru; Kitamura, Keisuke; Kusaba, Makoto

    2014-10-01

    The recent whole-genome sequencing of soybean (Glycine max) revealed that soybean experienced whole-genome duplications 59 million and 13 million years ago, and it has an octoploid-like genome in spite of its diploid nature. We analyzed a natural green-cotyledon mutant line, Tenshin-daiseitou. The physiological analysis revealed that Tenshin-daiseitou shows a non-functional stay-green phenotype in senescent leaves, which is similar to that of the mutant of Mendel's green-cotyledon gene I, the ortholog of SGR in pea. The identification of gene mutations and genetic segregation analysis suggested that defects in GmSGR1 and GmSGR2 were responsible for the green-cotyledon/stay-green phenotype of Tenshin-daiseitou, which was confirmed by RNA interference (RNAi) transgenic soybean experiments using GmSGR genes. The characterized green-cotyledon double mutant d1d2 was found to have the same mutations, suggesting that GmSGR1 and GmSGR2 are D1 and D2. Among the examined d1d2 strains, the d1d2 strain K144a showed a lower Chl a/b ratio in mature seeds than other strains but not in senescent leaves, suggesting a seed-specific genetic factor of the Chl composition in K144a. Analysis of the soybean genome sequence revealed four genomic regions with microsynteny to the Arabidopsis SGR1 region, which included the GmSGR1 and GmSGR2 regions. The other two regions contained GmSGR3a/GmSGR3b and GmSGR4, respectively, which might be pseudogenes or genes with a function that is unrelated to Chl degradation during seed maturation and leaf senescence. These GmSGR genes were thought to be produced by the two whole-genome duplications, and they provide a good example of such whole-genome duplication events in the evolution of the soybean genome. PMID:25108243

  6. An integrated computational pipeline and database to support whole-genome sequence annotation

    PubMed Central

    Mungall, CJ; Misra, S; Berman, BP; Carlson, J; Frise, E; Harris, N; Marshall, B; Shu, S; Kaminker, JS; Prochnik, SE; Smith, CD; Smith, E; Tupy, JL; Wiel, C; Rubin, GM; Lewis, SE

    2002-01-01

    We describe here our experience in annotating the Drosophila melanogaster genome sequence, in the course of which we developed several new open-source software tools and a database schema to support large-scale genome annotation. We have developed these into an integrated and reusable software system for whole-genome annotation. The key contributions to overall annotation quality are the marshalling of high-quality sequences for alignments and the design of a system with an adaptable and expandable flexible architecture. PMID:12537570

  7. Identification of emergent blaCMY-2-carrying Proteus mirabilis lineages by whole-genome sequencing

    PubMed Central

    Mac Aogáin, M.; Rogers, T.R.; Crowley, B.

    2015-01-01

    Whole-genome sequencing of 24 Proteus mirabilis isolates revealed the clonal expansion of two cefoxitin-resistant strains among patients with community-onset infection. These strains harboured blaCMY-2 within a chromosomally located integrative and conjugative element and exhibited multidrug resistance phenotypes. A predominant strain, identified in 18 patients, also harboured the PGI-1 genomic island and associated resistance genes, accounting for its broader antibiotic resistance profile. The identification of these novel multidrug-resistant strains among community-onset infections suggests that they are endemic to this region and represent emergent P. mirabilis lineages of clinical significance. PMID:26865983

  8. When aging meets microgravity: whole genome promoters and enhancers transcription landscape in zebrafish onboard ISS

    NASA Astrophysics Data System (ADS)

    Arshanovskii, Kirill; Gusev, Oleg; Sychev, Vladimir; Poddubko, Svetlana; Deviatiiarov, Ruslan

    2016-07-01

    To gain new insights into changes in gene regulation under conditions of real spaceflight, we conducted a whole-genome analysis of the dynamics of promoter and enhancer transcriptional changes in zebrafish during prolonged exposure to real spaceflight. Within the framework of the Russia-Japan joint experiments "Aquatic Habitat"-"Aquarium", we conducted a Cap Analysis of Gene Expression (CAGE) assay of zebrafish over a range of 7 to 40 days of real spaceflight onboard the ISS. The analysis showed that both gene expression patterns and the architecture of promoter shapes and types are affected by the spaceflight environment.

  9. On-site manipulation of single whole-genome DNA molecules using optical tweezers

    NASA Astrophysics Data System (ADS)

    Oana, Hidehiro; Kubo, Koji; Yoshikawa, Kenichi; Atomi, Haruyuki; Imanaka, Tadayuki

    2004-11-01

    In this letter, we describe a noninvasive methodology for manipulating single Mb-size whole-genome DNA molecules. Cells were subjected to osmotic shock and the genome DNA released from the burst cells was transferred to a region of higher salt concentration using optical tweezers. The transferred genome DNA exhibits a conformational transition from a compact state into an elongated state, accompanied by the change in its environment. The applicability of optical tweezers to the on-site manipulation of giant genome DNA is suggested, i.e., lab-on-a-plate.

  10. Evolutionary insight from whole-genome sequencing of Pseudomonas aeruginosa from cystic fibrosis patients.

    PubMed

    Marvig, Rasmus Lykke; Sommer, Lea M; Jelsbak, Lars; Molin, Søren; Johansen, Helle Krogh

    2015-01-01

    The opportunistic pathogen Pseudomonas aeruginosa causes chronic airway infections in patients with cystic fibrosis (CF), and it is directly associated with the morbidity and mortality connected with this disease. The ability of P. aeruginosa to establish chronic infections in CF patients is suggested to be due to the large genetic repertoire of P. aeruginosa and its ability to genetically adapt to the host environment. Here, we review the recent work that has applied whole-genome sequencing to understand P. aeruginosa population genomics, within-host microevolution and diversity, mutational mechanisms, genetic adaptation and transmission events. Finally, we summarize the advances in relation to medical applications and laboratory evolution experiments. PMID:25865196

  11. Microarray analysis reveals strategies of Tribolium castaneum larvae to compensate for cysteine and serine protease inhibitors

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Microarrays containing Tribolium castaneum whole-genome sequences were developed to study the transcriptome response of T. castaneum larvae to dietary protease inhibitors. In larvae fed diets containing 0.1% of the cysteine protease inhibitor E-64 alone or in combination with 5.0% of the serine pro...

  12. Whole Genome Duplication Affects Evolvability of Flowering Time in an Autotetraploid Plant

    PubMed Central

    Martin, Sara L.; Husband, Brian C.

    2012-01-01

    Whole genome duplications have occurred recurrently throughout the evolutionary history of eukaryotes. The resulting genetic and phenotypic changes can influence physiological and ecological responses to the environment; however, the impact of genome copy number on evolvability has rarely been examined experimentally. Here, we evaluate the effect of genome duplication on the ability to respond to selection for early flowering time in lines drawn from naturally occurring diploid and autotetraploid populations of the plant Chamerion angustifolium (fireweed). We contrast this with the result of four generations of selection on synthesized neoautotetraploids, whose genic variability is similar to diploids but genome copy number is similar to autotetraploids. In addition, we examine correlated responses to selection in all three groups. Diploid and both extant tetraploid and neoautotetraploid lines responded to selection with significant reductions in time to flowering. Evolvability, measured as realized heritability, was significantly lower in extant tetraploids (realized heritability = 0.31) than diploids (0.40). Neotetraploids exhibited the highest evolutionary response (0.55). The rapid shift in flowering time in neotetraploids was associated with an increase in phenotypic variability across generations, but not with change in genome size or phenotypic correlations among traits. Our results suggest that whole genome duplications, without hybridization, may initially alter evolutionary rate, and that the dynamic nature of neoautopolyploids may contribute to the prevalence of polyploidy throughout eukaryotes. PMID:23028620

  13. Microfluidic screening and whole-genome sequencing identifies mutations associated with improved protein secretion by yeast

    PubMed Central

    Huang, Mingtao; Bai, Yunpeng; Sjostrom, Staffan L.; Hallström, Björn M.; Liu, Zihe; Petranovic, Dina; Uhlén, Mathias; Joensson, Haakan N.; Andersson-Svahn, Helene; Nielsen, Jens

    2015-01-01

    There is an increasing demand for biotech-based production of recombinant proteins for use as pharmaceuticals, in the food and feed industry, and in industrial applications. The yeast Saccharomyces cerevisiae is among the preferred cell factories for recombinant protein production, and there is increasing interest in improving its protein secretion capacity. Due to the complexity of the secretory machinery in eukaryotic cells, it is difficult to apply rational engineering for construction of improved strains. Here we used high-throughput microfluidics for the screening of yeast libraries generated by UV mutagenesis. Several screening and sorting rounds resulted in the selection of eight yeast clones with significantly improved secretion of recombinant α-amylase. Efficient secretion was genetically stable in the selected clones. We performed whole-genome sequencing of the eight clones and identified 330 mutations in total. Gene ontology analysis of mutated genes revealed many biological processes, including some that have not been identified before in the context of protein secretion. Mutated genes identified in this study can potentially be used for reverse metabolic engineering, with the objective of constructing efficient cell factories for protein secretion. The combined use of microfluidics screening and whole-genome sequencing to map the mutations associated with the improved phenotype can easily be adapted for other products and cell types to identify novel engineering targets, and this approach could broadly facilitate design of novel cell factories. PMID:26261321

  14. Kernel-based whole-genome prediction of complex traits: a review

    PubMed Central

    Morota, Gota; Gianola, Daniel

    2014-01-01

    Prediction of genetic values has been a focus of applied quantitative genetics since the beginning of the 20th century, with renewed interest following the advent of the era of whole genome-enabled prediction. Opportunities offered by the emergence of high-dimensional genomic data fueled by post-Sanger sequencing technologies, especially molecular markers, have driven researchers to extend Ronald Fisher and Sewall Wright's models to confront new challenges. In particular, kernel methods are gaining consideration as a regression method of choice for genome-enabled prediction. Complex traits are presumably influenced by many genomic regions working in concert with others (clearly so when considering pathways), thus generating interactions. Motivated by this view, a growing number of statistical approaches based on kernels attempt to capture non-additive effects, either parametrically or non-parametrically. This review centers on whole-genome regression using kernel methods applied to a wide range of quantitative traits of agricultural importance in animals and plants. We discuss various kernel-based approaches tailored to capturing total genetic variation, with the aim of arriving at an enhanced predictive performance in the light of available genome annotation information. Connections between prediction machines born in animal breeding, statistics, and machine learning are revisited, and their empirical prediction performance is discussed. Overall, while some encouraging results have been obtained with non-parametric kernels, recovering non-additive genetic variation in a validation dataset remains a challenge in quantitative genetics. PMID:25360145
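    To make the idea of a kernel over genotypes concrete, the following minimal sketch builds a Gaussian (radial basis function) kernel matrix from marker codes. It illustrates one common non-parametric kernel and is not a method taken from the review; the 0/1/2 allele-count coding, the `bandwidth` parameter and the example genotypes are assumptions.

```python
import math


def gaussian_kernel(x, y, bandwidth: float = 1.0) -> float:
    """Similarity between two genotype vectors: exp(-||x - y||^2 / (2 * bandwidth^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2 * bandwidth ** 2))


def kernel_matrix(genotypes, bandwidth: float = 1.0):
    """Pairwise kernel values for all individuals (a symmetric matrix with ones on the diagonal)."""
    return [[gaussian_kernel(g1, g2, bandwidth) for g2 in genotypes] for g1 in genotypes]


# Hypothetical genotypes: three individuals, three markers coded as allele counts
genotypes = [[0, 1, 2], [0, 1, 2], [2, 1, 0]]
K = kernel_matrix(genotypes)
```

    A matrix like `K` would then take the place of the genomic relationship matrix in a regression on phenotypes; smaller bandwidths make the kernel more local, so only very similar genotypes share information.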

  15. Analysis of the microbiome: Advantages of whole genome shotgun versus 16S amplicon sequencing.

    PubMed

    Ranjan, Ravi; Rani, Asha; Metwally, Ahmed; McGee, Halvor S; Perkins, David L

    2016-01-22

    The human microbiome has emerged as a major player in regulating human health and disease. Translational studies of the microbiome have the potential to indicate clinical applications such as fecal transplants and probiotics. However, one major issue is accurate identification of microbes constituting the microbiota. Studies of the microbiome have frequently utilized sequencing of the conserved 16S ribosomal RNA (rRNA) gene. We present a comparative study of an alternative approach using whole genome shotgun sequencing (WGS). In the present study, we analyzed the human fecal microbiome compiling a total of 194.1 × 10^6 reads from a single sample using multiple sequencing methods and platforms. Specifically, after establishing the reproducibility of our methods with extensive multiplexing, we compared: 1) the 16S rRNA amplicon versus the WGS method, 2) the Illumina HiSeq versus MiSeq platforms, 3) the analysis of reads versus de novo assembled contigs, and 4) the effect of shorter versus longer reads. Our study demonstrates that whole genome shotgun sequencing has multiple advantages compared with the 16S amplicon method including enhanced detection of bacterial species, increased detection of diversity and increased prediction of genes. In addition, increased length, either due to longer reads or the assembly of contigs, improved the accuracy of species detection. PMID:26718401

  16. Whole-genome duplication increases tumor cell sensitivity to MPS1 inhibition

    PubMed Central

    Jemaà, Mohamed; Manic, Gwenola; Lledo, Gwendaline; Lissa, Delphine; Reynes, Christelle; Morin, Nathalie; Chibon, Frédéric; Sistigu, Antonella; Castedo, Maria; Vitale, Ilio; Kroemer, Guido; Abrieu, Ariane

    2016-01-01

    Several lines of evidence indicate that whole-genome duplication resulting in tetraploidy facilitates carcinogenesis by providing an intermediate and metastable state more prone to generate oncogenic aneuploidy. Here, we report a novel strategy to preferentially kill tetraploid cells based on the abrogation of the spindle assembly checkpoint (SAC) via the targeting of TTK protein kinase (better known as monopolar spindle 1, MPS1). The pharmacological inhibition as well as the knockdown of MPS1 kills tetraploid cells more efficiently than their diploid counterparts. By using time-lapse videomicroscopy, we show that tetraploid cells do not survive the aborted mitosis due to SAC abrogation upon MPS1 depletion. In contrast, diploid cells are able to survive at least two more cell cycles upon the same treatment. This effect might reflect the enhanced difficulty of cells with whole-genome doubling to tolerate a further increase in ploidy and/or an elevated level of chromosome instability in the absence of SAC functions. We further show that MPS1-inhibited tetraploid cells promote mitotic catastrophe executed by the intrinsic pathway of apoptosis, as indicated by the loss of mitochondrial potential, the release of the pro-apoptotic cytochrome c from mitochondria, and the activation of caspases. Altogether, our results suggest that MPS1 inhibition could be used as a therapeutic strategy for targeting tetraploid cancer cells. PMID:26637805

  17. Rediscovery by Whole Genome Sequencing: Classical Mutations and Genome Polymorphisms in Neurospora crassa

    SciTech Connect

    McCluskey, Kevin; Wiest, Aric E.; Grigoriev, Igor V.; Lipzen, Anna; Martin, Joel; Schackwitz, Wendy; Baker, Scott E.

    2011-06-02

    Classical forward genetics has been foundational to modern biology, and has been the paradigm for characterizing the role of genes in shaping phenotypes for decades. In recent years, reverse genetics has been used to identify the functions of genes, via the intentional introduction of variation and subsequent evaluation in physiological, molecular, and even population contexts. These approaches are complementary and whole genome analysis serves as a bridge between the two. We report in this article the whole genome sequencing of eighteen classical mutant strains of Neurospora crassa and the putative identification of the mutations associated with corresponding mutant phenotypes. Although some strains carry multiple unique nonsynonymous, nonsense, or frameshift mutations, the combined power of limiting the scope of the search based on genetic markers and of using a comparative analysis among the eighteen genomes provides strong support for the association between mutation and phenotype. For ten of the mutants, the mutant phenotype is recapitulated in classical or gene deletion mutants in Neurospora or other filamentous fungi. From thirteen to 137 nonsense mutations are present in each strain and indel sizes are shown to be highly skewed in gene coding sequence. Significant additional genetic variation was found in the eighteen mutant strains, and this variability defines multiple alleles of many genes. These alleles may be useful in further genetic and molecular analysis of known and yet-to-be-discovered functions and they invite new interpretations of molecular and genetic interactions in classical mutant strains.

  18. Genetic Mapping of Millions of SNPs in Safflower (Carthamus tinctorius L.) via Whole-Genome Resequencing.

    PubMed

    Bowers, John E; Pearl, Stephanie A; Burke, John M

    2016-01-01

    Accurate assembly of complete genomes is facilitated by very high density genetic maps. We performed low-coverage, whole-genome shotgun sequencing on 96 F6 recombinant inbred lines (RILs) of a cross between safflower (Carthamus tinctorius L.) and its wild progenitor (C. palaestinus Eig). We also produced a draft genome assembly of C. tinctorius covering 866 million bp (∼two-thirds) of the expected 1.35 Gbp genome after sequencing a single, short insert library to ∼21 × depth. Sequence reads from the RILs were mapped to this genome assembly to facilitate SNP identification, and the resulting polymorphisms were used to construct a genetic map. The resulting map included 2,008,196 genetically located SNPs in 1178 unique positions. A total of 57,270 scaffolds, each containing five or more mapped SNPs, were anchored to the map. This resulted in the assignment of sequence covering 14% of the expected genome length to a genetic position. Comparison of this safflower map to genetic maps of sunflower and lettuce revealed numerous chromosomal rearrangements, and the resulting patterns were consistent with a whole-genome duplication event in the lineage leading to sunflower. This sequence-based genetic map provides a powerful tool for the assembly of a low-cost draft genome of safflower, and the same general approach is expected to work for other species. PMID:27226165
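
    The figures quoted in this abstract can be cross-checked with a few lines of arithmetic. The sketch below is only an illustration: the variable names are hypothetical, the inputs are the numbers stated in the abstract, and the per-position SNP average is a derived value that the abstract itself does not report.

```python
# Worked arithmetic on the numbers quoted in the safflower abstract.
# Variable names are illustrative; only the input values come from the abstract.
assembly_bp = 866e6            # draft assembly size (866 million bp)
expected_genome_bp = 1.35e9    # expected genome size (1.35 Gbp)
anchored_fraction = 0.14       # share of expected genome given a genetic position
mapped_snps = 2_008_196        # genetically located SNPs
unique_positions = 1178        # unique map positions

# Fraction of the expected genome covered by the draft assembly.
assembly_fraction = assembly_bp / expected_genome_bp

# Approximate amount of sequence anchored to the genetic map.
anchored_bp = anchored_fraction * expected_genome_bp

# Derived value (not stated in the abstract): mean SNPs per unique map position.
snps_per_position = mapped_snps / unique_positions

print(f"assembly covers {assembly_fraction:.1%} of the expected genome")
print(f"anchored sequence: about {anchored_bp / 1e6:.0f} Mbp")
print(f"mean SNPs per unique position: {snps_per_position:.0f}")
```

    The assembly fraction comes out a little under two-thirds, consistent with the "∼two-thirds" stated in the abstract.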

  19. Whole-Genome Sequencing Analysis of Sapovirus Detected in South Korea

    PubMed Central

    Choi, Hye Lim; Suh, Chang-Il; Park, Seung-Won; Jin, Ji-Young; Cho, Han-Gil; Paik, Soon-Young

    2015-01-01

    Sapovirus (SaV), a virus residing in the intestines, is one of the important causes of gastroenteritis in human beings. Human SaV genomes are classified into various genogroups and genotypes. Whole-genome analysis and phylogenetic analysis of ROK62, the SaV isolated in South Korea, were carried out. The ROK62 genome of 7429 nucleotides contains 3 open-reading frames (ORF). The genotype of ROK62 is SaV GI-1, and 94% of its nucleotide sequence is identical with other SaVs, namely Manchester and Mc114. Recently, SaV infection has been on the rise throughout the world, particularly in countries neighboring South Korea; however, very few academic studies have been done nationally. As the first whole-genome sequence analysis of SaV in South Korea, this research will help provide reference for the detection of recombination, tracking of epidemic spread, and development of diagnosis methods for SaV. PMID:26161646

  20. A comprehensive whole-genome integrated cytogenetic map for the alpaca (Lama pacos).

    PubMed

    Avila, Felipe; Baily, Malorie P; Perelman, Polina; Das, Pranab J; Pontius, Joan; Chowdhary, Renuka; Owens, Elaine; Johnson, Warren E; Merriwether, David A; Raudsepp, Terje

    2014-01-01

    Genome analysis of the alpaca (Lama pacos, LPA) has progressed slowly compared to other domestic species. Here, we report the development of the first comprehensive whole-genome integrated cytogenetic map for the alpaca using fluorescence in situ hybridization (FISH) and CHORI-246 BAC library clones. The map is comprised of 230 linearly ordered markers distributed among all 36 alpaca autosomes and the sex chromosomes. For the first time, markers were assigned to LPA14, 21, 22, 28, and 36. Additionally, 86 genes from 15 alpaca chromosomes were mapped in the dromedary camel (Camelus dromedarius, CDR), demonstrating exceptional synteny and linkage conservation between the 2 camelid genomes. Cytogenetic mapping of 191 protein-coding genes improved and refined the known Zoo-FISH homologies between camelids and humans: we discovered new homologous synteny blocks (HSBs) corresponding to HSA1-LPA/CDR11, HSA4-LPA/CDR31 and HSA7-LPA/CDR36, and revised the location of breakpoints for others. Overall, gene mapping was in good agreement with the Zoo-FISH and revealed remarkable evolutionary conservation of gene order within many human-camelid HSBs. Most importantly, 91 FISH-mapped markers effectively integrated the alpaca whole-genome sequence and the radiation hybrid maps with physical chromosomes, thus facilitating the improvement of the sequence assembly and the discovery of genes of biological importance. PMID:25662411

  1. Whole Genome Sequence Typing to Investigate the Apophysomyces Outbreak following a Tornado in Joplin, Missouri, 2011

    PubMed Central

    Etienne, Kizee A.; Gillece, John; Hilsabeck, Remy; Schupp, Jim M.; Colman, Rebecca; Lockhart, Shawn R.; Gade, Lalitha; Thompson, Elizabeth H.; Sutton, Deanna A.; Neblett-Fanfair, Robyn; Park, Benjamin J.; Turabelidze, George; Keim, Paul; Brandt, Mary E.; Deak, Eszter; Engelthaler, David M.

    2012-01-01

    Case reports of Apophysomyces spp. in immunocompetent hosts have been a result of traumatic deep implantation of Apophysomyces spp. spore-contaminated soil or debris. On May 22, 2011 a tornado occurred in Joplin, MO, leaving 13 tornado victims with Apophysomyces trapeziformis infections as a result of lacerations from airborne material. We used whole genome sequence typing (WGST) for high-resolution phylogenetic SNP analysis of 17 outbreak Apophysomyces isolates and five additional temporally and spatially diverse Apophysomyces control isolates (three A. trapeziformis and two A. variabilis isolates). Whole genome SNP phylogenetic analysis revealed three clusters of genotypically related or identical A. trapeziformis isolates and multiple distinct isolates among the Joplin group; this indicated multiple genotypes from a single or multiple sources. Though no linkage between genotype and location of exposure was observed, WGST analysis determined that the Joplin isolates were more closely related to each other than to the control isolates, suggesting local population structure. Additionally, species delineation based on WGST demonstrated the need to reassess currently accepted taxonomic classifications of phylogenetic species within the genus Apophysomyces. PMID:23209631

  2. Sequence to Medical Phenotypes: A Framework for Interpretation of Human Whole Genome DNA Sequence Data.

    PubMed

    Dewey, Frederick E; Grove, Megan E; Priest, James R; Waggott, Daryl; Batra, Prag; Miller, Clint L; Wheeler, Matthew; Zia, Amin; Pan, Cuiping; Karzcewski, Konrad J; Miyake, Christina; Whirl-Carrillo, Michelle; Klein, Teri E; Datta, Somalee; Altman, Russ B; Snyder, Michael; Quertermous, Thomas; Ashley, Euan A

    2015-10-01

    High throughput sequencing has facilitated a precipitous drop in the cost of genomic sequencing, prompting predictions of a revolution in medicine via genetic personalization of diagnostic and therapeutic strategies. There are significant barriers to realizing this goal that are related to the difficult task of interpreting personal genetic variation. A comprehensive, widely accessible application for interpretation of whole genome sequence data is needed. Here, we present a series of methods for identification of genetic variants and genotypes with clinical associations, phasing genetic data and using Mendelian inheritance for quality control, and providing predictive genetic information about risk for rare disease phenotypes and response to pharmacological therapy in single individuals and father-mother-child trios. We demonstrate application of these methods for disease and drug response prognostication in whole genome sequence data from twelve unrelated adults, and for disease gene discovery in one father-mother-child trio with apparently simplex congenital ventricular arrhythmia. In doing so we identify clinically actionable inherited disease risk and drug response genotypes in pre-symptomatic individuals. We also nominate a new candidate gene in congenital arrhythmia, ATP2B4, and provide experimental evidence of a regulatory role for variants discovered using this framework. PMID:26448358

  3. Epigenetic regulation of subgenome dominance following whole genome triplication in Brassica rapa.

    PubMed

    Cheng, Feng; Sun, Chao; Wu, Jian; Schnable, James; Woodhouse, Margaret R; Liang, Jianli; Cai, Chengcheng; Freeling, Michael; Wang, Xiaowu

    2016-07-01

    Subgenome dominance is an important phenomenon observed in allopolyploids after whole genome duplication, in which one subgenome retains more genes as well as contributes more to the higher expressing gene copy of paralogous genes. To dissect the mechanism of subgenome dominance, we systematically investigated the relationships of gene expression, transposable element (TE) distribution and small RNA targeting, relating to the multicopy paralogous genes generated from whole genome triplication in Brassica rapa. The subgenome dominance was found to be regulated by a relatively stable factor established previously, then inherited by and shared among B. rapa varieties. In addition, we found a biased distribution of TEs between flanking regions of paralogous genes. Furthermore, the 24-nt small RNAs target TEs and are negatively correlated to the dominant expression of individual paralogous gene pairs. The biased distribution of TEs among subgenomes and the targeting of 24-nt small RNAs together produce the dominant expression phenomenon at a subgenome scale. Based on these findings, we propose a bucket hypothesis to illustrate subgenome dominance and hybrid vigor. Our findings and hypothesis are valuable for the evolutionary study of polyploids, and may shed light on studies of hybrid vigor, which is common to most species. PMID:26871271

  4. Multiplex Degenerate Primer Design for Targeted Whole Genome Amplification of Many Viral Genomes

    DOE PAGESBeta

    Gardner, Shea N.; Jaing, Crystal J.; Elsheikh, Maher M.; Peña, José; Hysom, David A.; Borucki, Monica K.

    2014-01-01

    Background: Targeted enrichment improves coverage of highly mutable viruses at low concentration in complex samples. Degenerate primers that anneal to conserved regions can facilitate amplification of divergent, low concentration variants, even when the strain present is unknown. Results: A tool for designing multiplex sets of degenerate sequencing primers to tile overlapping amplicons across multiple whole genomes is described. The new script, run_tiled_primers, is part of the PriMux software. Primers were designed for each segment of South American hemorrhagic fever viruses, tick-borne encephalitis, Henipaviruses, Arenaviruses, Filoviruses, Crimean-Congo hemorrhagic fever virus, Rift Valley fever virus, and Japanese encephalitis virus. Each group is highly diverse with as little as 5% genome consensus. Primer sets were computationally checked for nontarget cross reactions against the NCBI nucleotide sequence database. Primers for murine hepatitis virus were demonstrated in the lab to specifically amplify selected genes from a laboratory cultured strain that had undergone extensive passage in vitro and in vivo. Conclusions: This software should help researchers design multiplex sets of primers for targeted whole genome enrichment prior to sequencing to obtain better coverage of low titer, divergent viruses. Applications include viral discovery from a complex background and improved sensitivity and coverage of rapidly evolving strains or variants in a gene family.

  5. Sequence to Medical Phenotypes: A Framework for Interpretation of Human Whole Genome DNA Sequence Data

    PubMed Central

    Dewey, Frederick E.; Grove, Megan E.; Priest, James R.; Waggott, Daryl; Batra, Prag; Miller, Clint L.; Wheeler, Matthew; Zia, Amin; Pan, Cuiping; Karzcewski, Konrad J.; Miyake, Christina; Whirl-Carrillo, Michelle; Klein, Teri E.; Datta, Somalee; Altman, Russ B.; Snyder, Michael; Quertermous, Thomas; Ashley, Euan A.

    2015-01-01

    High throughput sequencing has facilitated a precipitous drop in the cost of genomic sequencing, prompting predictions of a revolution in medicine via genetic personalization of diagnostic and therapeutic strategies. There are significant barriers to realizing this goal that are related to the difficult task of interpreting personal genetic variation. A comprehensive, widely accessible application for interpretation of whole genome sequence data is needed. Here, we present a series of methods for identification of genetic variants and genotypes with clinical associations, phasing genetic data and using Mendelian inheritance for quality control, and providing predictive genetic information about risk for rare disease phenotypes and response to pharmacological therapy in single individuals and father-mother-child trios. We demonstrate application of these methods for disease and drug response prognostication in whole genome sequence data from twelve unrelated adults, and for disease gene discovery in one father-mother-child trio with apparently simplex congenital ventricular arrhythmia. In doing so we identify clinically actionable inherited disease risk and drug response genotypes in pre-symptomatic individuals. We also nominate a new candidate gene in congenital arrhythmia, ATP2B4, and provide experimental evidence of a regulatory role for variants discovered using this framework. PMID:26448358

  6. Genetic Mapping of Millions of SNPs in Safflower (Carthamus tinctorius L.) via Whole-Genome Resequencing

    PubMed Central

    Bowers, John E.; Pearl, Stephanie A.; Burke, John M.

    2016-01-01

    Accurate assembly of complete genomes is facilitated by very high density genetic maps. We performed low-coverage, whole-genome shotgun sequencing on 96 F6 recombinant inbred lines (RILs) of a cross between safflower (Carthamus tinctorius L.) and its wild progenitor (C. palaestinus Eig). We also produced a draft genome assembly of C. tinctorius covering 866 million bp (∼two-thirds) of the expected 1.35 Gbp genome after sequencing a single, short insert library to ∼21 × depth. Sequence reads from the RILs were mapped to this genome assembly to facilitate SNP identification, and the resulting polymorphisms were used to construct a genetic map. The resulting map included 2,008,196 genetically located SNPs in 1178 unique positions. A total of 57,270 scaffolds, each containing five or more mapped SNPs, were anchored to the map. This resulted in the assignment of sequence covering 14% of the expected genome length to a genetic position. Comparison of this safflower map to genetic maps of sunflower and lettuce revealed numerous chromosomal rearrangements, and the resulting patterns were consistent with a whole-genome duplication event in the lineage leading to sunflower. This sequence-based genetic map provides a powerful tool for the assembly of a low-cost draft genome of safflower, and the same general approach is expected to work for other species. PMID:27226165

  7. Whole-genome sequencing of a malignant granular cell tumor with metabolic response to pazopanib.

    PubMed

    Wei, Lei; Liu, Song; Conroy, Jeffrey; Wang, Jianmin; Papanicolau-Sengos, Antonios; Glenn, Sean T; Murakami, Mitsuko; Liu, Lu; Hu, Qiang; Conroy, Jacob; Miles, Kiersten Marie; Nowak, David E; Liu, Biao; Qin, Maochun; Bshara, Wiam; Omilian, Angela R; Head, Karen; Bianchi, Michael; Burgher, Blake; Darlak, Christopher; Kane, John; Merzianu, Mihai; Cheney, Richard; Fabiano, Andrew; Salerno, Kilian; Talati, Chetasi; Khushalani, Nikhil I; Trump, Donald L; Johnson, Candace S; Morrison, Carl D

    2015-10-01

    Granular cell tumors are an uncommon soft tissue neoplasm. Malignant granular cell tumors comprise <2% of all granular cell tumors, are associated with aggressive behavior and poor clinical outcome, and are poorly understood in terms of tumor etiology and systematic treatment. Because of its rarity, the genetic basis of malignant granular cell tumor remains unknown. We performed whole-genome sequencing of one malignant granular cell tumor with metabolic response to pazopanib. This tumor exhibited a very low mutation rate and an overall stable genome with local complex rearrangements. The mutation signature was dominated by C>T transitions, particularly when immediately preceded by a 5' G. A loss-of-function mutation was detected in a newly recognized tumor suppressor candidate, BRD7. No mutations were found in known targets of pazopanib. However, we identified a receptor tyrosine kinase pathway mutation in GFRA2 that warrants further evaluation. To the best of our knowledge, this is only the second reported case of a malignant granular cell tumor exhibiting a response to pazopanib, and the first whole-genome sequencing of this uncommon tumor type. The findings provide insight into the genetic basis of malignant granular cell tumors and identify potential targets for further investigation. PMID:27148567

  8. Comparative whole genome sequence analysis of wild-type and cidofovir-resistant monkeypoxvirus

    PubMed Central

    2010-01-01

    We performed whole genome sequencing of a cidofovir {[(S)-1-(3-hydroxy-2-phosphonylmethoxy-propyl) cytosine] [HPMPC]}-resistant (CDV-R) strain of Monkeypoxvirus (MPV). Whole-genome comparison with the wild-type (WT) strain revealed 55 single-nucleotide polymorphisms (SNPs) and one tandem-repeat contraction. Over one-third of all identified SNPs were located within genes comprising the poxvirus replication complex, including the DNA polymerase, RNA polymerase, mRNA capping methyltransferase, DNA processivity factor, and poly-A polymerase. Four polymorphic sites were found within the DNA polymerase gene. DNA polymerase mutations observed at positions 314 and 684 in MPV were consistent with CDV-R loci previously identified in Vaccinia virus (VACV). These data suggest the mechanism of CDV resistance may be highly conserved across Orthopoxvirus (OPV) species. SNPs were also identified within virulence genes such as the A-type inclusion protein, serine protease inhibitor-like protein SPI-3, Schlafen ATPase and thymidylate kinase, among others. Aberrant chain extension induced by CDV may lead to diverse alterations in gene expression and viral replication that may result in both adaptive and attenuating mutations. Defining the potential contribution of substitutions in the replication complex and RNA processing machinery reported here may yield further insight into CDV resistance and may augment current therapeutic development strategies. PMID:20509894

  9. Inference of Homologous Recombination in Bacteria Using Whole-Genome Sequences

    PubMed Central

    Didelot, Xavier; Lawson, Daniel; Darling, Aaron; Falush, Daniel

    2010-01-01

    Bacteria and archaea reproduce clonally, but sporadically import DNA into their chromosomes from other organisms. In many of these events, the imported DNA replaces an homologous segment in the recipient genome. Here we present a new method to reconstruct the history of recombination events that affected a given sample of bacterial genomes. We introduce a mathematical model that represents both the donor and the recipient of each DNA import as an ancestor of the genomes in the sample. The model represents a simplification of the previously described coalescent with gene conversion. We implement a Monte Carlo Markov chain algorithm to perform inference under this model from sequence data alignments and show that inference is feasible for whole-genome alignments through parallelization. Using simulated data, we demonstrate accurate and reliable identification of individual recombination events and global recombination rate parameters. We applied our approach to an alignment of 13 whole genomes from the Bacillus cereus group. We find, as expected from laboratory experiments, that the recombination rate is higher between closely related organisms and also that the genome contains several broad regions of elevated levels of recombination. Application of the method to the genomic data sets that are becoming available should reveal the evolutionary history and private lives of populations of bacteria and archaea. The methods described in this article have been implemented in a computer software package, ClonalOrigin, which is freely available from http://code.google.com/p/clonalorigin/. PMID:20923983

  10. Whole-genome sequencing of matched primary and metastatic hepatocellular carcinomas

    PubMed Central

    2014-01-01

    Background: To gain biological insights into lung metastases from hepatocellular carcinoma (HCC), we compared the whole-genome sequencing profiles of primary HCC and paired lung metastases. Methods: We used whole-genome sequencing at 33X-43X coverage to profile somatic mutations in primary HCC (HBV+) and metachronous lung metastases (> 2 years interval). Results: In total, 5,027-13,961 and 5,275-12,624 somatic single-nucleotide variants (SNVs) were detected in primary HCC and lung metastases, respectively. Generally, 38.88-78.49% of SNVs detected in metastases were present in primary tumors. We identified 65–221 structural variations (SVs) in primary tumors and 60–232 SVs in metastases. Comparison of these SVs shows very similar and largely overlapped mutated segments between primary and metastatic tumors. Copy number alterations between primary and metastatic pairs were also found to be closely related. Together, these preservations in genomic profiles from liver primary tumors to metachronous lung metastases indicate that the genomic features during tumorigenesis may be retained during metastasis. Conclusions: We found very similar genomic alterations between primary and metastatic tumors, with a few mutations found specifically in lung metastases, which may explain the clinical observation that both primary and metastatic tumors are usually sensitive or resistant to the same systemic treatments. PMID:24405831

  11. Targeted or whole genome sequencing of formalin fixed tissue samples: potential applications in cancer genomics

    PubMed Central

    Zhao, Yue; Cottrell, Joseph; Klotzle, Brandy; Godwin, Andrew K.; Koestler, Devin; Beyerlein, Peter; Fan, Jian-Bing; Bibikova, Marina; Chien, Jeremy

    2015-01-01

    Current genomic studies are limited by the poor availability of fresh-frozen tissue samples. Although formalin-fixed diagnostic samples are in abundance, they are seldom used in current genomic studies because of the concern of formalin-fixation artifacts. Better characterization of these artifacts will allow the use of archived clinical specimens in translational and clinical research studies. To provide a systematic analysis of formalin-fixation artifacts on Illumina sequencing, we generated 26 DNA sequencing data sets from 13 pairs of matched formalin-fixed paraffin-embedded (FFPE) and fresh-frozen (FF) tissue samples. The results indicate high rate of concordant calls between matched FF/FFPE pairs at reference and variant positions in three commonly used sequencing approaches (whole genome, whole exome, and targeted exon sequencing). Global mismatch rates and C·G > T·A substitutions were comparable between matched FF/FFPE samples, and discordant rates were low (<0.26%) in all samples. Finally, low-pass whole genome sequencing produces similar pattern of copy number alterations between FF/FFPE pairs. The results from our studies suggest the potential use of diagnostic FFPE samples for cancer genomic studies to characterize and catalog variations in cancer genomes. PMID:26305677

  12. Whole genome sequence typing to investigate the Apophysomyces outbreak following a tornado in Joplin, Missouri, 2011.

    PubMed

    Etienne, Kizee A; Gillece, John; Hilsabeck, Remy; Schupp, Jim M; Colman, Rebecca; Lockhart, Shawn R; Gade, Lalitha; Thompson, Elizabeth H; Sutton, Deanna A; Neblett-Fanfair, Robyn; Park, Benjamin J; Turabelidze, George; Keim, Paul; Brandt, Mary E; Deak, Eszter; Engelthaler, David M

    2012-01-01

    Case reports of Apophysomyces spp. in immunocompetent hosts have been a result of traumatic deep implantation of Apophysomyces spp. spore-contaminated soil or debris. On May 22, 2011 a tornado occurred in Joplin, MO, leaving 13 tornado victims with Apophysomyces trapeziformis infections as a result of lacerations from airborne material. We used whole genome sequence typing (WGST) for high-resolution phylogenetic SNP analysis of 17 outbreak Apophysomyces isolates and five additional temporally and spatially diverse Apophysomyces control isolates (three A. trapeziformis and two A. variabilis isolates). Whole genome SNP phylogenetic analysis revealed three clusters of genotypically related or identical A. trapeziformis isolates and multiple distinct isolates among the Joplin group; this indicated multiple genotypes from a single or multiple sources. Though no linkage between genotype and location of exposure was observed, WGST analysis determined that the Joplin isolates were more closely related to each other than to the control isolates, suggesting local population structure. Additionally, species delineation based on WGST demonstrated the need to reassess currently accepted taxonomic classifications of phylogenetic species within the genus Apophysomyces. PMID:23209631

  13. Whole-genome sequencing of a malignant granular cell tumor with metabolic response to pazopanib

    PubMed Central

    Wei, Lei; Liu, Song; Conroy, Jeffrey; Wang, Jianmin; Papanicolau-Sengos, Antonios; Glenn, Sean T.; Murakami, Mitsuko; Liu, Lu; Hu, Qiang; Conroy, Jacob; Miles, Kiersten Marie; Nowak, David E.; Liu, Biao; Qin, Maochun; Bshara, Wiam; Omilian, Angela R.; Head, Karen; Bianchi, Michael; Burgher, Blake; Darlak, Christopher; Kane, John; Merzianu, Mihai; Cheney, Richard; Fabiano, Andrew; Salerno, Kilian; Talati, Chetasi; Khushalani, Nikhil I.; Trump, Donald L.; Johnson, Candace S.; Morrison, Carl D.

    2015-01-01

    Granular cell tumors are an uncommon soft tissue neoplasm. Malignant granular cell tumors comprise <2% of all granular cell tumors, are associated with aggressive behavior and poor clinical outcome, and are poorly understood in terms of tumor etiology and systematic treatment. Because of its rarity, the genetic basis of malignant granular cell tumor remains unknown. We performed whole-genome sequencing of one malignant granular cell tumor with metabolic response to pazopanib. This tumor exhibited a very low mutation rate and an overall stable genome with local complex rearrangements. The mutation signature was dominated by C>T transitions, particularly when immediately preceded by a 5′ G. A loss-of-function mutation was detected in a newly recognized tumor suppressor candidate, BRD7. No mutations were found in known targets of pazopanib. However, we identified a receptor tyrosine kinase pathway mutation in GFRA2 that warrants further evaluation. To the best of our knowledge, this is only the second reported case of a malignant granular cell tumor exhibiting a response to pazopanib, and the first whole-genome sequencing of this uncommon tumor type. The findings provide insight into the genetic basis of malignant granular cell tumors and identify potential targets for further investigation. PMID:27148567

  14. Phylogenetic discovery bias in Bacillus anthracis using single-nucleotide polymorphisms from whole-genome sequencing

    PubMed Central

    Pearson, Talima; Busch, Joseph D.; Ravel, Jacques; Read, Timothy D.; Rhoton, Shane D.; U'Ren, Jana M.; Simonson, Tatum S.; Kachur, Sergey M.; Leadem, Rebecca R.; Cardon, Michelle L.; Van Ert, Matthew N.; Huynh, Lynn Y.; Fraser, Claire M.; Keim, Paul

    2004-01-01

    Phylogenetic reconstruction using molecular data is often subject to homoplasy, leading to inaccurate conclusions about phylogenetic relationships among operational taxonomic units. Compared with other molecular markers, single-nucleotide polymorphisms (SNPs) exhibit extremely low mutation rates, making them rare in recently emerged pathogens, but they are less prone to homoplasy and thus extremely valuable for phylogenetic analyses. Despite their phylogenetic potential, ascertainment bias occurs when SNP characters are discovered through biased taxonomic sampling; by using whole-genome comparisons of five diverse strains of Bacillus anthracis to facilitate SNP discovery, we show that only polymorphisms lying along the evolutionary pathway between reference strains will be observed. We illustrate this in theoretical and simulated data sets in which complex phylogenetic topologies are reduced to linear evolutionary models. Using a set of 990 SNP markers, we also show how divergent branches in our topologies collapse to single points but provide accurate information on internodal distances and points of origin for ancestral clades. These data allowed us to determine the ancestral root of B. anthracis, showing that it lies closer to a newly described “C” branch than to either of two previously described “A” or “B” branches. In addition, subclade rooting of the C branch revealed unequal evolutionary rates that seem to be correlated with ecological parameters and strain attributes. Our use of nonhomoplastic whole-genome SNP characters allows branch points and clade membership to be estimated with great precision, providing greater insight into epidemiological, ecological, and forensic questions. PMID:15347815

  15. Comparison of microbial DNA enrichment tools for metagenomic whole genome sequencing.

    PubMed

    Thoendel, Matthew; Jeraldo, Patricio R; Greenwood-Quaintance, Kerryl E; Yao, Janet Z; Chia, Nicholas; Hanssen, Arlen D; Abdel, Matthew P; Patel, Robin

    2016-08-01

    Metagenomic whole genome sequencing for detection of pathogens in clinical samples is an exciting new area for discovery and clinical testing. A major barrier to this approach is the overwhelming ratio of human to pathogen DNA in samples with low pathogen abundance, which is typical of most clinical specimens. Microbial DNA enrichment methods offer the potential to relieve this limitation by improving this ratio. Two commercially available enrichment kits, the NEBNext Microbiome DNA Enrichment Kit and the Molzym MolYsis Basic kit, were tested for their ability to enrich for microbial DNA from resected arthroplasty component sonicate fluids from prosthetic joint infections or uninfected sonicate fluids spiked with Staphylococcus aureus. Using spiked uninfected sonicate fluid there was a 6-fold enrichment of bacterial DNA with the NEBNext kit and 76-fold enrichment with the MolYsis kit. Metagenomic whole genome sequencing of sonicate fluid revealed 13- to 85-fold enrichment of bacterial DNA using the NEBNext enrichment kit. The MolYsis approach achieved 481- to 9580-fold enrichment, resulting in 7 to 59% of sequencing reads being from the pathogens known to be present in the samples. These results demonstrate the usefulness of these tools when testing clinical samples with low microbial burden using next generation sequencing. PMID:27237775
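
    The fold-enrichment figures above are ratios, and one common way to express such a ratio from sequencing data is sketched below. This is an assumption for illustration only: the abstract does not state how enrichment was calculated, the function name `fold_enrichment` is hypothetical, and the read counts are invented rather than taken from the study.

```python
def fold_enrichment(microbial_before, total_before, microbial_after, total_after):
    """Ratio of the microbial read fraction after enrichment to that before.

    Assumed definition for illustration; the study may have used another
    measure (for example, a quantitative PCR-based ratio).
    """
    if min(microbial_before, total_before, total_after) <= 0:
        raise ValueError("read counts before enrichment and totals must be positive")
    fraction_before = microbial_before / total_before
    fraction_after = microbial_after / total_after
    return fraction_after / fraction_before


# Hypothetical counts: 0.1% microbial reads before, 7.6% after enrichment.
example = fold_enrichment(1_000, 1_000_000, 76_000, 1_000_000)
print(f"fold enrichment: {example:.1f}")
```

    Under this definition, an untreated sample and an enriched sample sequenced to different depths remain comparable, because each count is first normalized by its own total.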

  16. Whole genome DNA methylation analysis based on high throughput sequencing technology.

    PubMed

    Li, Ning; Ye, Mingzhi; Li, Yingrui; Yan, Zhixiang; Butcher, Lee M; Sun, Jihua; Han, Xu; Chen, Quan; Zhang, Xiuqing; Wang, Jun

    2010-11-01

    There are numerous approaches to decipher a whole genome DNA methylation profile ("methylome"), each varying in cost, throughput and resolution. The gold standard of these methods, whole genome bisulfite-sequencing (BS-seq), involves treatment of DNA with sodium bisulfite combined with subsequent high throughput sequencing. Using BS-seq, we generated a single-base-resolution methylome in human peripheral blood mononuclear cells (in press). This BS-seq map was then used as the reference methylome to compare two alternative sequencing-based methylome assays (performed on the same donor of PBMCs): methylated DNA immunoprecipitation (MeDIP-seq) and methyl-binding protein (MBD-seq). In our analysis, we found that MeDIP-seq and MBD-seq are complementary strategies, with MeDIP-seq more sensitive to highly methylated, high-CpG densities and MBD-seq more sensitive to highly methylated, moderate-CpG densities. Taking into account the size of a mammalian genome and the current expense of sequencing, we feel that 3 gigabases (Gbp) of uniquely mapped 45 bp paired-end MeDIP-seq or MBD-seq reads is the minimum requirement and a cost-effective strategy for methylome pattern analysis. PMID:20430099

  17. Independent Evolution of Winner Traits without Whole Genome Duplication in Dekkera Yeasts

    PubMed Central

    Dai, Shao-Xing; Li, Wen-Xing; Zheng, Jun-Juan; Li, Gong-Hua; Huang, Jing-Fei

    2016-01-01

    Dekkera yeasts have often been considered as alternative sources of ethanol production that could compete with S. cerevisiae. The two lineages of yeasts independently evolved traits that include high glucose and ethanol tolerance, aerobic fermentation, and a rapid ethanol fermentation rate. The Saccharomyces yeasts attained these traits mainly through whole genome duplication approximately 100 million years ago (Mya). However, the Dekkera yeasts, which were separated from S. cerevisiae approximately 200 Mya, did not undergo whole genome duplication (WGD) but still occupy a niche similar to S. cerevisiae. Upon analysis of two Dekkera yeasts and five closely related non-WGD yeasts, we found that a massive loss of cis-regulatory elements occurred in an ancestor of the Dekkera yeasts, which led to improved mitochondrial functions similar to the S. cerevisiae yeasts. The evolutionary analysis indicated that genes involved in the transcription and translation process exhibited faster evolution in the Dekkera yeasts. We detected 90 positively selected genes, suggesting that the Dekkera yeasts evolved an efficient translation system to facilitate adaptive evolution. Moreover, we identified 12 vacuolar H+-ATPase (V-ATPase) function genes under positive selection, which assists in developing tolerance to high alcohol and high sugar stress. We also revealed that the enzyme PGK1 is responsible for the increased rate of glycolysis in the Dekkera yeasts. These results provide important insights to understand the independent adaptive evolution of the Dekkera yeasts and provide tools for genetic modification promoting industrial usage. PMID:27152421

  18. Whole-genome mutational landscape and characterization of noncoding and structural mutations in liver cancer.

    PubMed

    Fujimoto, Akihiro; Furuta, Mayuko; Totoki, Yasushi; Tsunoda, Tatsuhiko; Kato, Mamoru; Shiraishi, Yuichi; Tanaka, Hiroko; Taniguchi, Hiroaki; Kawakami, Yoshiiku; Ueno, Masaki; Gotoh, Kunihito; Ariizumi, Shun-Ichi; Wardell, Christopher P; Hayami, Shinya; Nakamura, Toru; Aikata, Hiroshi; Arihiro, Koji; Boroevich, Keith A; Abe, Tetsuo; Nakano, Kaoru; Maejima, Kazuhiro; Sasaki-Oku, Aya; Ohsawa, Ayako; Shibuya, Tetsuo; Nakamura, Hiromi; Hama, Natsuko; Hosoda, Fumie; Arai, Yasuhito; Ohashi, Shoko; Urushidate, Tomoko; Nagae, Genta; Yamamoto, Shogo; Ueda, Hiroki; Tatsuno, Kenji; Ojima, Hidenori; Hiraoka, Nobuyoshi; Okusaka, Takuji; Kubo, Michiaki; Marubashi, Shigeru; Yamada, Terumasa; Hirano, Satoshi; Yamamoto, Masakazu; Ohdan, Hideki; Shimada, Kazuaki; Ishikawa, Osamu; Yamaue, Hiroki; Chayama, Kazuki; Miyano, Satoru; Aburatani, Hiroyuki; Shibata, Tatsuhiro; Nakagawa, Hidewaki

    2016-05-01

    Liver cancer, which is most often associated with virus infection, is prevalent worldwide, and its underlying etiology and genomic structure are heterogeneous. Here we provide a whole-genome landscape of somatic alterations in 300 liver cancers from Japanese individuals. Our comprehensive analysis identified point mutations, structural variations (STVs), and virus integrations, in noncoding and coding regions. We discovered mutational signatures related to liver carcinogenesis and recurrently mutated coding and noncoding regions, such as long intergenic noncoding RNA genes (NEAT1 and MALAT1), promoters, CTCF-binding sites, and regulatory regions. STV analysis found a significant association with replication timing and identified known (CDKN2A, CCND1, APC, and TERT) and new (ASH1L, NCOR1, and MACROD2) cancer-related genes that were recurrently affected by STVs, leading to altered expression. These results emphasize the value of whole-genome sequencing analysis in discovering cancer driver mutations and understanding comprehensive molecular profiles of liver cancer, especially with regard to STVs and noncoding mutations. PMID:27064257

  19. Parallel single cancer cell whole genome amplification using button-valve assisted mixing in nanoliter chambers.

    PubMed

    Yang, Yoonsun; Swennenhuis, Joost F; Rho, Hoon Suk; Le Gac, Séverine; Terstappen, Leon W M M

    2014-01-01

    The heterogeneity of tumor cells and their alteration during the course of the disease urges the need for real time characterization of individual tumor cells to improve the assessment of treatment options. New generations of therapies are frequently associated with specific genetic alterations driving the need to determine the genetic makeup of tumor cells. Here, we present a microfluidic device for parallel single cell whole genome amplification (pscWGA) to obtain enough copies of a single cell genome to probe for the presence of treatment targets and the frequency of its occurrence among the tumor cells. Individual cells were first captured and loaded into eight parallel amplification units. Next, cells were lysed on a chip and their DNA amplified through successive introduction of dedicated reagents while mixing actively with the help of integrated button-valves. The reaction chamber volume for scWGA was 23.85 nl, and starting from 6-7 pg DNA contained in a single cell, around 8 ng of DNA was obtained after WGA, representing over 1000-fold amplification. The amplified products from individual breast cancer cells were collected from the device to either directly investigate the amplification of specific genes by qPCR or for re-amplification of the DNA to obtain sufficient material for whole genome sequencing. Our pscWGA device provides sufficient DNA from individual cells for their genetic characterization, and will undoubtedly allow for automated sample preparation for single cancer cell genomic characterization. PMID:25233459
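
    The "over 1000-fold" figure in this abstract can be checked with simple arithmetic from the quoted masses (6-7 pg of input DNA per cell, around 8 ng of output). The sketch below is only an illustration of that calculation; the function name `fold_amplification` is hypothetical and does not come from the paper.

```python
# Fold amplification = output mass / input mass, with both in the same unit.
PG_PER_NG = 1000.0  # 1 ng = 1000 pg


def fold_amplification(input_pg: float, output_ng: float) -> float:
    """Return the output/input mass ratio after converting ng to pg."""
    return output_ng * PG_PER_NG / input_pg


# Values quoted in the abstract: 6-7 pg in, around 8 ng out.
low = fold_amplification(7.0, 8.0)   # larger input gives the smaller ratio
high = fold_amplification(6.0, 8.0)  # smaller input gives the larger ratio
print(f"{low:.0f}- to {high:.0f}-fold")  # prints "1143- to 1333-fold"
```

    Both ends of the range exceed 1000, which is consistent with the abstract's statement.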

  20. Parallel Single Cancer Cell Whole Genome Amplification Using Button-Valve Assisted Mixing in Nanoliter Chambers

    PubMed Central

    Yang, Yoonsun; Swennenhuis, Joost F.; Rho, Hoon Suk; Le Gac, Séverine; Terstappen, Leon W. M. M.

    2014-01-01

    The heterogeneity of tumor cells and their alteration during the course of the disease urges the need for real time characterization of individual tumor cells to improve the assessment of treatment options. New generations of therapies are frequently associated with specific genetic alterations driving the need to determine the genetic makeup of tumor cells. Here, we present a microfluidic device for parallel single cell whole genome amplification (pscWGA) to obtain enough copies of a single cell genome to probe for the presence of treatment targets and the frequency of its occurrence among the tumor cells. Individual cells were first captured and loaded into eight parallel amplification units. Next, cells were lysed on a chip and their DNA amplified through successive introduction of dedicated reagents while mixing actively with the help of integrated button-valves. The reaction chamber volume for scWGA was 23.85 nl, and starting from 6–7 pg DNA contained in a single cell, around 8 ng of DNA was obtained after WGA, representing over 1000-fold amplification. The amplified products from individual breast cancer cells were collected from the device to either directly investigate the amplification of specific genes by qPCR or for re-amplification of the DNA to obtain sufficient material for whole genome sequencing. Our pscWGA device provides sufficient DNA from individual cells for their genetic characterization, and will undoubtedly allow for automated sample preparation for single cancer cell genomic characterization. PMID:25233459

  1. Multiplex Degenerate Primer Design for Targeted Whole Genome Amplification of Many Viral Genomes

    PubMed Central

    Gardner, Shea N.; Jaing, Crystal J.; Elsheikh, Maher M.; Peña, José; Hysom, David A.; Borucki, Monica K.

    2014-01-01

    Background. Targeted enrichment improves coverage of highly mutable viruses at low concentration in complex samples. Degenerate primers that anneal to conserved regions can facilitate amplification of divergent, low concentration variants, even when the strain present is unknown. Results. A tool for designing multiplex sets of degenerate sequencing primers to tile overlapping amplicons across multiple whole genomes is described. The new script, run_tiled_primers, is part of the PriMux software. Primers were designed for each segment of South American hemorrhagic fever viruses, tick-borne encephalitis, Henipaviruses, Arenaviruses, Filoviruses, Crimean-Congo hemorrhagic fever virus, Rift Valley fever virus, and Japanese encephalitis virus. Each group is highly diverse with as little as 5% genome consensus. Primer sets were computationally checked for nontarget cross reactions against the NCBI nucleotide sequence database. Primers for murine hepatitis virus were demonstrated in the lab to specifically amplify selected genes from a laboratory cultured strain that had undergone extensive passage in vitro and in vivo. Conclusions. This software should help researchers design multiplex sets of primers for targeted whole genome enrichment prior to sequencing to obtain better coverage of low titer, divergent viruses. Applications include viral discovery from a complex background and improved sensitivity and coverage of rapidly evolving strains or variants in a gene family. PMID:25157264
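
    To make the idea of a degenerate primer concrete: each degenerate position uses an IUPAC ambiguity code that stands for a set of bases, so one primer string represents several concrete sequences. The following is a minimal sketch of that concept only. It is not the PriMux or run_tiled_primers implementation, and the helper names and the example primer `ACRTY` are hypothetical.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def degeneracy(primer: str) -> int:
    """Number of concrete sequences a degenerate primer represents."""
    count = 1
    for base in primer.upper():
        count *= len(IUPAC[base])
    return count


def expand(primer: str) -> list[str]:
    """List every concrete sequence encoded by a degenerate primer."""
    choices = (IUPAC[base] for base in primer.upper())
    return ["".join(combo) for combo in product(*choices)]


def matches(primer: str, site: str) -> bool:
    """True if the primer can pair position-by-position with the given site."""
    if len(primer) != len(site):
        return False
    return all(s in IUPAC[p] for p, s in zip(primer.upper(), site.upper()))


example = "ACRTY"  # hypothetical primer: R = A/G, Y = C/T
print(degeneracy(example))  # 4
print(expand(example))      # ['ACATC', 'ACATT', 'ACGTC', 'ACGTT']
```

    Degeneracy grows multiplicatively with each ambiguous position, which is one reason primer design tools generally try to place primers in conserved regions.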

  2. Whole-Genome Mapping as a Novel High-Resolution Typing Tool for Legionella pneumophila

    PubMed Central

    Euser, Sjoerd M.; Landman, Fabian; Bruin, Jacob P.; IJzerman, Ed P.; den Boer, Jeroen W.; Schouls, Leo M.

    2015-01-01

    Legionella is the causative agent for Legionnaires' disease (LD) and is responsible for several large outbreaks in the world. More than 90% of LD cases are caused by Legionella pneumophila, and studies on the origin and transmission routes of this pathogen rely on adequate molecular characterization of isolates. Current typing of L. pneumophila mainly depends on sequence-based typing (SBT). However, studies have shown that in some outbreak situations, SBT does not have sufficient discriminatory power to distinguish between related and nonrelated L. pneumophila isolates. In this study, we used a novel high-resolution typing technique, called whole-genome mapping (WGM), to differentiate between epidemiologically related and nonrelated L. pneumophila isolates. Assessment of the method by various validation experiments showed highly reproducible results, and WGM was able to confirm two well-documented Dutch L. pneumophila outbreaks. Comparison of whole-genome maps of the two outbreaks together with WGMs of epidemiologically nonrelated L. pneumophila isolates showed major differences between the maps, and WGM yielded a higher discriminatory power than SBT. In conclusion, WGM can be a valuable alternative to perform outbreak investigations of L. pneumophila in real time since the turnaround time from culture to comparison of the L. pneumophila maps is less than 24 h. PMID:26202110

  3. Targeted or whole genome sequencing of formalin fixed tissue samples: potential applications in cancer genomics.

    PubMed

    Munchel, Sarah; Hoang, Yen; Zhao, Yue; Cottrell, Joseph; Klotzle, Brandy; Godwin, Andrew K; Koestler, Devin; Beyerlein, Peter; Fan, Jian-Bing; Bibikova, Marina; Chien, Jeremy

    2015-09-22

    Current genomic studies are limited by the poor availability of fresh-frozen tissue samples. Although formalin-fixed diagnostic samples are in abundance, they are seldom used in current genomic studies because of the concern of formalin-fixation artifacts. Better characterization of these artifacts will allow the use of archived clinical specimens in translational and clinical research studies. To provide a systematic analysis of formalin-fixation artifacts on Illumina sequencing, we generated 26 DNA sequencing data sets from 13 pairs of matched formalin-fixed paraffin-embedded (FFPE) and fresh-frozen (FF) tissue samples. The results indicate a high rate of concordant calls between matched FF/FFPE pairs at reference and variant positions in three commonly used sequencing approaches (whole genome, whole exome, and targeted exon sequencing). Global mismatch rates and C · G > T · A substitutions were comparable between matched FF/FFPE samples, and discordant rates were low (<0.26%) in all samples. Finally, low-pass whole genome sequencing produces a similar pattern of copy number alterations between FF/FFPE pairs. The results from our studies suggest the potential use of diagnostic FFPE samples for cancer genomic studies to characterize and catalog variations in cancer genomes. PMID:26305677

  4. Computel: Computation of Mean Telomere Length from Whole-Genome Next-Generation Sequencing Data

    PubMed Central

    Nersisyan, Lilit; Arakelyan, Arsen

    2015-01-01

    Telomeres are the ends of eukaryotic chromosomes, consisting of consecutive short repeats that protect chromosome ends from degradation. Telomeres shorten with each cell division, leading to replicative cell senescence. Deregulation of telomere length homeostasis is associated with the development of various age-related diseases and cancers. A number of experimental techniques exist for telomere length measurement; however, until recently, the absence of tools for extracting telomere lengths from high-throughput sequencing data has significantly obscured the association of telomere length with molecular processes in normal and diseased conditions. We have developed Computel, a program in R for computing mean telomere length from whole-genome next-generation sequencing data. Computel is open source, and is freely available at https://github.com/lilit-nersisyan/computel. It utilizes a short-read alignment-based approach and integrates various popular tools for sequencing data analysis. We validated it with synthetic and experimental data, and compared its performance with the previously available software. The results have shown that Computel outperforms existing software in accuracy, independence of results from sequencing conditions, stability against inherent sequencing errors, and better ability to distinguish pure telomeric sequences from interstitial telomeric repeats. By providing a highly reliable methodology for determining telomere lengths from whole-genome sequencing data, Computel should help to elucidate the role of telomeres in cellular health and disease. PMID:25923330

  5. Whole-genome sequencing identifies recurrent mutations in chronic lymphocytic leukaemia

    PubMed Central

    Puente, Xose S.; Pinyol, Magda; Quesada, Víctor; Conde, Laura; Ordóñez, Gonzalo R.; Villamor, Neus; Escaramis, Georgia; Jares, Pedro; Beà, Sílvia; González-Díaz, Marcos; Bassaganyas, Laia; Baumann, Tycho; Juan, Manel; López-Guerra, Mónica; Colomer, Dolors; Tubío, José M. C.; López, Cristina; Navarro, Alba; Tornador, Cristian; Aymerich, Marta; Rozman, María; Hernández, Jesús M.; Puente, Diana A.; Freije, José M. P.; Velasco, Gloria; Gutiérrez-Fernández, Ana; Costa, Dolors; Carrió, Anna; Guijarro, Sara; Enjuanes, Anna; Hernández, Lluís; Yagüe, Jordi; Nicolás, Pilar; Romeo-Casabona, Carlos M.; Himmelbauer, Heinz; Castillo, Ester; Dohm, Juliane C.; de Sanjosé, Silvia; Piris, Miguel A.; de Alava, Enrique; Miguel, Jesús San; Royo, Romina; Gelpí, Josep L.; Torrents, David; Orozco, Modesto; Pisano, David G.; Valencia, Alfonso; Guigó, Roderic; Bayés, Mónica; Heath, Simon; Gut, Marta; Klatt, Peter; Marshall, John; Raine, Keiran; Stebbings, Lucy A.; Futreal, P. Andrew; Stratton, Michael R.; Campbell, Peter J.; Gut, Ivo; López-Guillermo, Armando; Estivill, Xavier; Montserrat, Emili; López-Otín, Carlos; Campo, Elías

    2012-01-01

    Chronic lymphocytic leukaemia (CLL), the most frequent leukaemia in adults in Western countries, is a heterogeneous disease with variable clinical presentation and evolution [1,2]. Two major molecular subtypes can be distinguished, characterized respectively by a high or low number of somatic hypermutations in the variable region of immunoglobulin genes [3,4]. The molecular changes leading to the pathogenesis of the disease are still poorly understood. Here we performed whole-genome sequencing of four cases of CLL and identified 46 somatic mutations that potentially affect gene function. Further analysis of these mutations in 363 patients with CLL identified four genes that are recurrently mutated: notch 1 (NOTCH1), exportin 1 (XPO1), myeloid differentiation primary response gene 88 (MYD88) and kelch-like 6 (KLHL6). Mutations in MYD88 and KLHL6 are predominant in cases of CLL with mutated immunoglobulin genes, whereas NOTCH1 and XPO1 mutations are mainly detected in patients with unmutated immunoglobulins. The patterns of somatic mutation, supported by functional and clinical analyses, strongly indicate that the recurrent NOTCH1, MYD88 and XPO1 mutations are oncogenic changes that contribute to the clinical evolution of the disease. To our knowledge, this is the first comprehensive analysis of CLL combining whole-genome sequencing with clinical characteristics and clinical outcomes. It highlights the usefulness of this approach for the identification of clinically relevant mutations in cancer. PMID:21642962

  6. A comparison of RNA-seq and exon arrays for whole genome transcription profiling of the L5 spinal nerve transection model of neuropathic pain in the rat

    PubMed Central

    2014-01-01

    Background. The past decade has seen an abundance of transcriptional profiling studies of preclinical models of persistent pain, predominantly employing microarray technology. In this study we directly compare exon microarrays to RNA-seq and investigate the ability of both platforms to detect differentially expressed genes following nerve injury using the L5 spinal nerve transection model of neuropathic pain. We also investigate the effects of increasing RNA-seq sequencing depth. Finally, we take advantage of the “agnostic” approach of RNA-seq to discover areas of expression outside of annotated exons that show marked changes in expression following nerve injury. Results. RNA-seq and microarrays largely agree in terms of the genes called as differentially expressed. However, RNA-seq is able to interrogate a much larger proportion of the genome. It can also detect a greater number of differentially expressed genes than microarrays, across a wider range of fold changes and is able to assign a larger range of expression values to the genes it measures. The number of differentially expressed genes detected increases with sequencing depth. RNA-seq also allows the discovery of a number of genes displaying unusual and interesting patterns of non-exonic expression following nerve injury, an effect that cannot be detected using microarrays. Conclusion. We recommend the use of RNA-seq for future high-throughput transcriptomic experiments in pain studies. RNA-seq allowed the identification of a larger number of putative candidate pain genes than microarrays and can also detect a wider range of expression values in a neuropathic pain model. In addition, RNA-seq can interrogate the whole genome regardless of prior annotations, being able to detect transcription from areas of the genome not currently annotated as exons. Some of these areas are differentially expressed following nerve injury, and may represent novel genes or isoforms. We also recommend the use of a high

  7. Integrative Data Analysis of Multi-Platform Cancer Data with a Multimodal Deep Learning Approach.

    PubMed

    Liang, Muxuan; Li, Zhizhong; Chen, Ting; Zeng, Jianyang

    2015-01-01

    Identification of cancer subtypes plays an important role in revealing useful insights into disease pathogenesis and advancing personalized therapy. The recent development of high-throughput sequencing technologies has enabled the rapid collection of multi-platform genomic data (e.g., gene expression, miRNA expression, and DNA methylation) for the same set of tumor samples. Although numerous integrative clustering approaches have been developed to analyze cancer data, few of them are particularly designed to exploit both deep intrinsic statistical properties of each input modality and complex cross-modality correlations among multi-platform input data. In this paper, we propose a new machine learning model, called multimodal deep belief network (DBN), to cluster cancer patients from multi-platform observation data. In our integrative clustering framework, relationships among inherent features of each single modality are first encoded into multiple layers of hidden variables, and then a joint latent model is employed to fuse common features derived from multiple input modalities. A practical learning algorithm, called contrastive divergence (CD), is applied to infer the parameters of our multimodal DBN model in an unsupervised manner. Tests on two available cancer datasets show that our integrative data analysis approach can effectively extract a unified representation of latent features to capture both intra- and cross-modality correlations, and identify meaningful disease subtypes from multi-platform cancer data. In addition, our approach can identify key genes and miRNAs that may play distinct roles in the pathogenesis of different cancer subtypes. Among those key miRNAs, we found that the expression level of miR-29a is highly correlated with survival time in ovarian cancer patients. These results indicate that our multimodal DBN based data analysis approach may have practical applications in cancer pathogenesis studies and provide useful guidelines for

  8. dbWGFP: a database and web server of human whole-genome single nucleotide variants and their functional predictions

    PubMed Central

    Wu, Jiaxin; Wu, Mengmeng; Li, Lianshuo; Liu, Zhuo; Zeng, Wanwen; Jiang, Rui

    2016-01-01

    The recent advancement of next generation sequencing technology has enabled the fast and low-cost detection of all genetic variants spread across the entire human genome, making the application of whole-genome sequencing a trend in the study of disease-causing genetic variants. Nevertheless, there is still no repository that collects predictions of functionally damaging effects of human genetic variants, though it has been well recognized that such predictions play a central role in the analysis of whole-genome sequencing data. To fill this gap, we developed a database named dbWGFP (a database and web server of human whole-genome single nucleotide variants and their functional predictions) that contains functional predictions and annotations of nearly 8.58 billion possible human whole-genome single nucleotide variants. Specifically, this database integrates 48 functional predictions calculated by 17 popular computational methods and 44 valuable annotations obtained from various data sources. Standalone software, user-friendly query services and free downloads of this database are available at http://bioinfo.au.tsinghua.edu.cn/dbwgfp. dbWGFP provides a valuable resource for the analysis of whole-genome sequencing, exome sequencing and SNP array data, thereby complementing existing data sources and computational resources in deciphering genetic bases of human inherited diseases. PMID:26989155

  9. dbWGFP: a database and web server of human whole-genome single nucleotide variants and their functional predictions.

    PubMed

    Wu, Jiaxin; Wu, Mengmeng; Li, Lianshuo; Liu, Zhuo; Zeng, Wanwen; Jiang, Rui

    2016-01-01

    The recent advancement of next generation sequencing technology has enabled the fast and low-cost detection of all genetic variants spread across the entire human genome, making the application of whole-genome sequencing a trend in the study of disease-causing genetic variants. Nevertheless, there is still no repository that collects predictions of functionally damaging effects of human genetic variants, though it has been well recognized that such predictions play a central role in the analysis of whole-genome sequencing data. To fill this gap, we developed a database named dbWGFP (a database and web server of human whole-genome single nucleotide variants and their functional predictions) that contains functional predictions and annotations of nearly 8.58 billion possible human whole-genome single nucleotide variants. Specifically, this database integrates 48 functional predictions calculated by 17 popular computational methods and 44 valuable annotations obtained from various data sources. Standalone software, user-friendly query services and free downloads of this database are available at http://bioinfo.au.tsinghua.edu.cn/dbwgfp. dbWGFP provides a valuable resource for the analysis of whole-genome sequencing, exome sequencing and SNP array data, thereby complementing existing data sources and computational resources in deciphering genetic bases of human inherited diseases. PMID:26989155

  10. SBMDb: first whole genome putative microsatellite DNA marker database of sugarbeet for bioenergy and industrial applications.

    PubMed

    Iquebal, Mir Asif; Jaiswal, Sarika; Angadi, U B; Sablok, Gaurav; Arora, Vasu; Kumar, Sunil; Rai, Anil; Kumar, Dinesh

    2015-01-01

    DNA markers play an important role as valuable tools to increase crop productivity by finding plausible answers to genetic variations and linking the Quantitative Trait Loci (QTL) of beneficial traits. Prior approaches to the development of Short Tandem Repeat (STR) markers were time consuming and inefficient. Recent methods for developing STR markers using whole genomic or transcriptomic data have gained wide importance, with immense potential in developing breeding and cultivar improvement approaches. Availability of whole genome sequences and in silico approaches has revolutionized bulk marker discovery. We report the world's first sugarbeet whole genome marker discovery, with 145 K markers along with 5 K functional domain markers unified in a common platform using MySQL, Apache and PHP in SBMDb. Embedded markers and corresponding location information can be selected for a desired chromosome or location/interval, and primers can be generated using the Primer3 core, integrated at the backend. Our analyses revealed an abundance of 'mono' repeats (76.82%) over 'di' repeats (13.68%). The highest density (671.05 markers/Mb) was found in chromosome 1 and the lowest density (341.27 markers/Mb) in chromosome 6. Current investigation of sugarbeet genome marker density has direct implications in increasing mapping marker density. This will enable present linkage map having marker distance of ∼2 cM, i.e. from 200 to 2.6 Kb, thus facilitating QTL/gene mapping. We also report e-PCR-based detection of 2027 polymorphic markers in a panel of five genotypes. These markers can be used for DUS test of variety identification and MAS/GAS in variety improvement program. The present database presents a wide source of potential markers for developing and implementing new approaches for molecular breeding required to accelerate industrial use of this crop, especially for sugar, health care products, medicines and color dye. Identified markers will also help in improvement of bioenergy trait of

  11. SBMDb: first whole genome putative microsatellite DNA marker database of sugarbeet for bioenergy and industrial applications

    PubMed Central

    Iquebal, Mir Asif; Jaiswal, Sarika; Angadi, U.B.; Sablok, Gaurav; Arora, Vasu; Kumar, Sunil; Rai, Anil; Kumar, Dinesh

    2015-01-01

    DNA markers play an important role as valuable tools to increase crop productivity by finding plausible answers to genetic variations and linking the Quantitative Trait Loci (QTL) of beneficial traits. Prior approaches to the development of Short Tandem Repeat (STR) markers were time consuming and inefficient. Recent methods for developing STR markers using whole genomic or transcriptomic data have gained wide importance, with immense potential in developing breeding and cultivar improvement approaches. Availability of whole genome sequences and in silico approaches has revolutionized bulk marker discovery. We report the world’s first sugarbeet whole genome marker discovery, with 145 K markers along with 5 K functional domain markers unified in a common platform using MySQL, Apache and PHP in SBMDb. Embedded markers and corresponding location information can be selected for a desired chromosome or location/interval, and primers can be generated using the Primer3 core, integrated at the backend. Our analyses revealed an abundance of ‘mono’ repeats (76.82%) over ‘di’ repeats (13.68%). The highest density (671.05 markers/Mb) was found in chromosome 1 and the lowest density (341.27 markers/Mb) in chromosome 6. Current investigation of sugarbeet genome marker density has direct implications in increasing mapping marker density. This will enable present linkage map having marker distance of ∼2 cM, i.e. from 200 to 2.6 Kb, thus facilitating QTL/gene mapping. We also report e-PCR-based detection of 2027 polymorphic markers in a panel of five genotypes. These markers can be used for DUS test of variety identification and MAS/GAS in variety improvement program. The present database presents a wide source of potential markers for developing and implementing new approaches for molecular breeding required to accelerate industrial use of this crop, especially for sugar, health care products, medicines and color dye. Identified markers will also help in improvement of bioenergy trait

  12. Use of Whole Genome Sequencing and Patient Interviews To Link a Case of Sporadic Listeriosis to Consumption of Prepackaged Lettuce.

    PubMed

    Jackson, K A; Stroika, S; Katz, L S; Beal, J; Brandt, E; Nadon, C; Reimer, A; Major, B; Conrad, A; Tarr, C; Jackson, B R; Mody, R K

    2016-05-01

    We report on a case of listeriosis in a patient who probably consumed a prepackaged romaine lettuce-containing product recalled for Listeria monocytogenes contamination. Although definitive epidemiological information demonstrating exposure to the specific recalled product was lacking, the patient reported consumption of a prepackaged romaine lettuce-containing product of either the recalled brand or a different brand. A multinational investigation found that patient and food isolates from the recalled product were indistinguishable by pulsed-field gel electrophoresis and were highly related by whole genome sequencing, differing by four alleles by whole genome multilocus sequence typing and by five high-quality single nucleotide polymorphisms, suggesting a common source. To our knowledge, this is the first time prepackaged lettuce has been identified as a likely source for listeriosis. This investigation highlights the power of whole genome sequencing, as well as the continued need for timely and thorough epidemiological exposure data to identify sources of foodborne infections. PMID:27296429

  13. Whole genome association studies in complex diseases: where do we stand?

    PubMed Central

    Need, Anna C.; Goldstein, David B.

    2010-01-01

    Hundreds of genome-wide association studies have been performed in recent years in order to try to identify common variants that associate with complex disease. These have met with varying success. Some of the strongest effects of common variants have been found in late-onset diseases and in drug response. The major histocompatibility complex has also shown very strong association with a variety of disorders. Although there have been some notable success stories in neuropsychiatric genetics, on the whole, common variation has explained little of the high heritability of these traits. In contrast, early studies of rare copy number variants have led rapidly to a number of genes and loci that strongly associate with neuropsychiatric disorders. It is likely that the use of whole-genome sequencing to extend the study of rare variation in neuropsychiatry will greatly advance our understanding of neuropsychiatric genetics. PMID:20373665

  14. Developmental timing of mutations revealed by whole-genome sequencing of twins with acute lymphoblastic leukemia.

    PubMed

    Ma, Yussanne; Dobbins, Sara E; Sherborne, Amy L; Chubb, Daniel; Galbiati, Marta; Cazzaniga, Giovanni; Micalizzi, Concetta; Tearle, Rick; Lloyd, Amy L; Hain, Richard; Greaves, Mel; Houlston, Richard S

    2013-04-30

    Acute lymphoblastic leukemia (ALL) is the major pediatric cancer. At diagnosis, the developmental timing of mutations contributing critically to clonal diversification and selection can be buried in the leukemia's covert natural history. Concordance of ALL in monozygotic, monochorionic twins is a consequence of intraplacental spread of an initiated preleukemic clone. Studying monozygotic twins with ALL provides a unique means of uncovering the timeline of mutations contributing to clonal evolution, pre- and postnatally. We sequenced the whole genomes of leukemic cells from two twin pairs with ALL to comprehensively characterize acquired somatic mutations in ALL, elucidating the developmental timing of all genetic lesions. Shared, prenatal, coding-region single-nucleotide variants were limited to the putative initiating lesions. All other nonsynonymous single-nucleotide variants were distinct between tumors and, therefore, secondary and postnatal. These changes occurred in a background of noncoding mutational changes that were almost entirely discordant in twin pairs and likely passenger mutations acquired during leukemic cell proliferation. PMID:23569245

  15. Whole-genome sequence comparisons reveal the evolution of Vibrio cholerae O1.

    PubMed

    Kim, Eun Jin; Lee, Chan Hee; Nair, G Balakrish; Kim, Dong Wook

    2015-08-01

    The analysis of the whole-genome sequences of Vibrio cholerae strains from previous and current cholera pandemics has demonstrated that genomic changes and alterations in phage CTX (particularly in the gene encoding the B subunit of cholera toxin) were major features in the evolution of V. cholerae. Recent studies have revealed the genetic mechanisms in these bacteria by which new variants of V. cholerae are generated from type-specific strains; these mechanisms suggest that certain strains are selected by environmental or human factors over time. By understanding the mechanisms and driving forces of historical and current changes in the V. cholerae population, it would be possible to predict the direction of such changes and the evolution of new variants; this has implications for the battle against cholera. PMID:25913612

  16. Integration of whole-genome sequencing into infection control practices: the potential and the hurdles.

    PubMed

    Robilotti, Elizabeth; Kamboj, Mini

    2015-04-01

    Microbial whole-genome sequencing (WGS) is poised to transform many of the currently used approaches in medical microbiology. Recent reports on the application of WGS to understand genetic evolution and reconstruct transmission pathways have provided valuable information that will influence infection control practices. While this technology holds great promise, obstacles to full implementation remain. Two articles in this issue of the Journal of Clinical Microbiology (S. Octavia, Q. Wang, M. M. Tanaka, S. Kaur, V. Sintchenko, and R. Lan, J Clin Microbiol 53:1063-1071, 2015, doi:10.1128/JCM.03235-14, and S. J. Salipante, D. J. SenGupta, L. A. Cummings, T. A. Land, D. R. Hoogestraat, and B. T. Cookson, J Clin Microbiol 53:1072-1079, 2015, doi:10.1128/JCM.03385-14) describe the breadth of application of WGS to the field of clinical epidemiology. PMID:25673795

  17. Whole-genome sequencing of giant pandas provides insights into demographic history and local adaptation.

    PubMed

    Zhao, Shancen; Zheng, Pingping; Dong, Shanshan; Zhan, Xiangjiang; Wu, Qi; Guo, Xiaosen; Hu, Yibo; He, Weiming; Zhang, Shanning; Fan, Wei; Zhu, Lifeng; Li, Dong; Zhang, Xuemei; Chen, Quan; Zhang, Hemin; Zhang, Zhihe; Jin, Xuelin; Zhang, Jinguo; Yang, Huanming; Wang, Jian; Wang, Jun; Wei, Fuwen

    2013-01-01

    The panda lineage dates back to the late Miocene and ultimately leads to only one extant species, the giant panda (Ailuropoda melanoleuca). Although global climate change and anthropogenic disturbances are recognized to shape animal population demography, their contribution to panda population dynamics remains largely unknown. We sequenced the whole genomes of 34 pandas at an average 4.7-fold coverage and used this data set together with the previously deep-sequenced panda genome to reconstruct a continuous demographic history of pandas from their origin to the present. We identify two population expansions, two bottlenecks and two divergences. Evidence indicated that, whereas global changes in climate were the primary drivers of population fluctuation for millions of years, human activities likely underlie recent population divergence and serious decline. We identified three distinct panda populations that show genetic adaptation to their environments. However, in all three populations, anthropogenic activities have negatively affected pandas for 3,000 years. PMID:23242367

  18. Developing insights into the mechanisms of evolution of bacterial pathogens from whole-genome sequences

    PubMed Central

    Bentley, Stephen D

    2014-01-01

    Evolution of bacterial pathogen populations has been detected in a variety of ways including phenotypic tests, such as metabolic activity, reaction to antisera and drug resistance and genotypic tests that measure variation in chromosome structure, repetitive loci and individual gene sequences. While informative, these methods only capture a small subset of the total variation and, therefore, have limited resolution. Advances in sequencing technologies have made it feasible to capture whole-genome sequence variation for each sample under study, providing the potential to detect all changes at all positions in the genome from single nucleotide changes to large-scale insertions and deletions. In this review, we focus on recent work that has applied this powerful new approach and summarize some of the advances that this has brought in our understanding of the details of how bacterial pathogens evolve. PMID:23075447

  19. Clostridium botulinum Group II Isolate Phylogenomic Profiling Using Whole-Genome Sequence Data.

    PubMed

    Weedmark, K A; Mabon, P; Hayden, K L; Lambert, D; Van Domselaar, G; Austin, J W; Corbett, C R

    2015-09-01

    Clostridium botulinum group II isolates (n = 163) from different geographic regions, outbreaks, and neurotoxin types and subtypes were characterized in silico using whole-genome sequence data. Two clusters representing a variety of botulinum neurotoxin (BoNT) types and subtypes were identified by multilocus sequence typing (MLST) and core single nucleotide polymorphism (SNP) analysis. While one cluster included BoNT/B4/F6/E9 and nontoxigenic members, the other comprised a wide variety of different BoNT/E subtype isolates and a nontoxigenic strain. In silico MLST and core SNP methods were consistent in terms of clade-level isolate classification; however, core SNP analysis showed higher resolution capability. Furthermore, core SNP analysis correctly distinguished isolates by outbreak and location. This study illustrated the utility of next-generation sequence-based typing approaches for isolate characterization and source attribution and identified discrete SNP loci and MLST alleles for isolate comparison. PMID:26116673

  20. Whole genome sequencing of emerging multidrug resistant Candida auris isolates in India demonstrates low genetic variation.

    PubMed

    Sharma, C; Kumar, N; Pandey, R; Meis, J F; Chowdhary, A

    2016-09-01

    Candida auris is an emerging multidrug resistant yeast that causes nosocomial fungaemia and deep-seated infections. Notably, the emergence of this yeast is alarming as it exhibits resistance to azoles, amphotericin B and caspofungin, which may lead to clinical failure in patients. The multigene phylogeny and amplified fragment length polymorphism typing methods report the C. auris population as clonal. Here, using whole genome sequencing analysis, we decipher for the first time that C. auris strains from four Indian hospitals were highly related, suggesting clonal transmission. Further, all C. auris isolates originated from cases of fungaemia and were resistant to fluconazole (MIC >64 mg/L). PMID:27617098

  1. Whole-Genome Sequencing to Determine Origin of Multinational Outbreak of Sarocladium kiliense Bloodstream Infections

    PubMed Central

    Roe, Chandler C.; Smith, Rachel M.; Vallabhaneni, Snigdha; Duarte, Carolina; Escandón, Patricia; Castañeda, Elizabeth; Gómez, Beatriz L.; de Bedout, Catalina; López, Luisa F.; Salas, Valentina; Hederra, Luz Maria; Fernández, Jorge; Pidal, Paola; Hormazabel, Juan Carlos; Otaíza-O’Ryan, Fernando; Vannberg, Fredrik O.; Gillece, John; Lemmer, Darrin; Driebe, Elizabeth M.; Engelthaler, David M.; Litvintseva, Anastasia P.

    2016-01-01

    We used whole-genome sequence typing (WGST) to investigate an outbreak of Sarocladium kiliense bloodstream infections (BSI) associated with receipt of contaminated antinausea medication among oncology patients in Colombia and Chile during 2013–2014. Twenty-five outbreak isolates (18 from patients and 7 from medication vials) and 11 control isolates unrelated to this outbreak were subjected to WGST to elucidate a source of infection. All outbreak isolates were nearly indistinguishable (<5 single-nucleotide polymorphisms), and >21,000 single-nucleotide polymorphisms were identified from unrelated control isolates, suggesting a point source for this outbreak. S. kiliense has been previously implicated in healthcare-related infections; however, the lack of available typing methods has precluded the ability to substantiate point sources. WGST for outbreak investigation caused by eukaryotic pathogens without reference genomes or existing genotyping methods enables accurate source identification to guide implementation of appropriate control and prevention measures. PMID:26891230

  2. Bioinformatics Workflow for Clinical Whole Genome Sequencing at Partners HealthCare Personalized Medicine

    PubMed Central

    Tsai, Ellen A.; Shakbatyan, Rimma; Evans, Jason; Rossetti, Peter; Graham, Chet; Sharma, Himanshu; Lin, Chiao-Feng; Lebo, Matthew S.

    2016-01-01

    Effective implementation of precision medicine will be enhanced by a thorough understanding of each patient’s genetic composition to better treat his or her presenting symptoms or mitigate the onset of disease. This ideally includes the sequence information of a complete genome for each individual. At Partners HealthCare Personalized Medicine, we have developed a clinical process for whole genome sequencing (WGS) with application in both healthy individuals and those with disease. In this manuscript, we will describe our bioinformatics strategy to efficiently process and deliver genomic data to geneticists for clinical interpretation. We describe the handling of data from FASTQ to the final variant list for clinical review for the final report. We will also discuss our methodology for validating this workflow and the cost implications of running WGS. PMID:26927186

  3. A comprehensive assessment of somatic mutation detection in cancer using whole-genome sequencing

    PubMed Central

    Alioto, Tyler S.; Buchhalter, Ivo; Derdak, Sophia; Hutter, Barbara; Eldridge, Matthew D.; Hovig, Eivind; Heisler, Lawrence E.; Beck, Timothy A.; Simpson, Jared T.; Tonon, Laurie; Sertier, Anne-Sophie; Patch, Ann-Marie; Jäger, Natalie; Ginsbach, Philip; Drews, Ruben; Paramasivam, Nagarajan; Kabbe, Rolf; Chotewutmontri, Sasithorn; Diessl, Nicolle; Previti, Christopher; Schmidt, Sabine; Brors, Benedikt; Feuerbach, Lars; Heinold, Michael; Gröbner, Susanne; Korshunov, Andrey; Tarpey, Patrick S.; Butler, Adam P.; Hinton, Jonathan; Jones, David; Menzies, Andrew; Raine, Keiran; Shepherd, Rebecca; Stebbings, Lucy; Teague, Jon W.; Ribeca, Paolo; Giner, Francesc Castro; Beltran, Sergi; Raineri, Emanuele; Dabad, Marc; Heath, Simon C.; Gut, Marta; Denroche, Robert E.; Harding, Nicholas J.; Yamaguchi, Takafumi N.; Fujimoto, Akihiro; Nakagawa, Hidewaki; Quesada, Víctor; Valdés-Mas, Rafael; Nakken, Sigve; Vodák, Daniel; Bower, Lawrence; Lynch, Andrew G.; Anderson, Charlotte L.; Waddell, Nicola; Pearson, John V.; Grimmond, Sean M.; Peto, Myron; Spellman, Paul; He, Minghui; Kandoth, Cyriac; Lee, Semin; Zhang, John; Létourneau, Louis; Ma, Singer; Seth, Sahil; Torrents, David; Xi, Liu; Wheeler, David A.; López-Otín, Carlos; Campo, Elías; Campbell, Peter J.; Boutros, Paul C.; Puente, Xose S.; Gerhard, Daniela S.; Pfister, Stefan M.; McPherson, John D.; Hudson, Thomas J.; Schlesner, Matthias; Lichter, Peter; Eils, Roland; Jones, David T. W.; Gut, Ivo G.

    2015-01-01

    As whole-genome sequencing for cancer genome analysis becomes a clinical tool, a full understanding of the variables affecting sequencing analysis output is required. Here using tumour-normal sample pairs from two different types of cancer, chronic lymphocytic leukaemia and medulloblastoma, we conduct a benchmarking exercise within the context of the International Cancer Genome Consortium. We compare sequencing methods, analysis pipelines and validation methods. We show that using PCR-free methods and increasing sequencing depth to ∼100 × shows benefits, as long as the tumour:control coverage ratio remains balanced. We observe widely varying mutation call rates and low concordance among analysis pipelines, reflecting the artefact-prone nature of the raw data and lack of standards for dealing with the artefacts. However, we show that, using the benchmark mutation set we have created, many issues are in fact easy to remedy and have an immediate positive impact on mutation detection accuracy. PMID:26647970

  4. Real time application of whole genome sequencing for outbreak investigation - What is an achievable turnaround time?

    PubMed

    McGann, Patrick; Bunin, Jessica L; Snesrud, Erik; Singh, Seema; Maybank, Rosslyn; Ong, Ana C; Kwak, Yoon I; Seronello, Scott; Clifford, Robert J; Hinkle, Mary; Yamada, Stephen; Barnhill, Jason; Lesho, Emil

    2016-07-01

    Whole genome sequencing (WGS) is increasingly employed in clinical settings, though few assessments of turnaround times (TAT) have been performed in real-time. In this study, WGS was used to investigate an unfolding outbreak of vancomycin resistant Enterococcus faecium (VRE) among 3 patients in the ICU of a tertiary care hospital. Including overnight culturing, a TAT of just 48.5 h for a comprehensive report was achievable using an Illumina MiSeq benchtop sequencer. WGS revealed that isolates from patients 2 and 3 differed from that of patient 1 by a single nucleotide polymorphism (SNP), indicating nosocomial transmission. However, the unparalleled resolution provided by WGS suggested that nosocomial transmission involved two separate events from patient 1 to patients 2 and 3, and not a linear transmission suspected by the timeline. Rapid TATs are achievable using WGS in the clinical setting and can provide an unprecedented level of resolution for outbreak investigations. PMID:27185645

  5. Whole genome sequencing reveals extensive community-level transmission of group A Streptococcus in remote communities.

    PubMed

    Bowen, A C; Harris, T; Holt, D C; Giffard, P M; Carapetis, J R; Campbell, P T; McVernon, J; Tong, S Y C

    2016-07-01

    Impetigo is common in remote Indigenous children of northern Australia, with the primary driver in this context being Streptococcus pyogenes [or group A Streptococcus (GAS)]. To reduce the high burden of impetigo, the transmission dynamics of GAS must be more clearly elucidated. We performed whole genome sequencing on 31 GAS isolates collected in a single community from children in 11 households with ⩾2 GAS-infected children. We aimed to determine whether transmission was occurring principally within households or across the community. The 31 isolates were represented by nine multilocus sequence types and isolates within each sequence type differed from one another by only 0-3 single nucleotide polymorphisms. There was evidence of extensive transmission both within households and across the community. Our findings suggest that strategies to reduce the burden of impetigo in this setting will need to extend beyond individual households, and incorporate multi-faceted, community-wide approaches. PMID:26833141
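
    The per-isolate comparison described in this record (isolates of one sequence type differing by 0-3 single nucleotide polymorphisms) can be illustrated with a minimal sketch. It assumes the isolates have already been aligned to equal length; the isolate names and sequences are invented for illustration and are not data from the study, and the function names are hypothetical.

```python
from itertools import combinations


def snp_distance(seq_a: str, seq_b: str) -> int:
    """Count differing positions between two aligned sequences.

    Positions where either sequence has a gap ('-') or an ambiguous
    base ('N') are skipped rather than counted as differences.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    skipped = {"N", "-"}
    return sum(
        1
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a != b and a not in skipped and b not in skipped
    )


def pairwise_snp_distances(isolates: dict[str, str]) -> dict[tuple[str, str], int]:
    """Return the SNP distance for every unordered pair of isolates."""
    return {
        (x, y): snp_distance(isolates[x], isolates[y])
        for x, y in combinations(sorted(isolates), 2)
    }


# Hypothetical toy alignment, not real isolate data.
isolates = {
    "isolate_A": "ACGTACGTAC",
    "isolate_B": "ACGTACGTAT",
    "isolate_C": "ACGAACGTAT",
}
distances = pairwise_snp_distances(isolates)
for pair, d in distances.items():
    print(pair, d)
```

    Real pipelines typically call SNPs against a reference genome and filter on quality before counting; this sketch only shows the final distance step.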

  6. Beyond race: towards a whole-genome perspective on human populations and genetic variation.

    PubMed

    Foster, Morris W; Sharp, Richard R

    2004-10-01

    The renewed emphasis on population-specific genetic variation, exemplified most prominently by the International HapMap Project, is complicated by a longstanding, uncritical reliance on existing population categories in genetic research. Race and other pre-existing population definitions (ethnicity, religion, language, nationality, culture and so on) tend to be contentious concepts that have polarized discussions about the ethics and science of research into population-specific human genetic variation. By contrast, a broader consideration of the multiple historical sources of genetic variation provides a whole-genome perspective on the ways in which existing population definitions do, and do not, account for how genetic variation is distributed among individuals. Although genetics will continue to rely on analytical tools that make use of particular population histories, it is important to interpret findings in a broader genomic context. PMID:15510170

  7. Accurate whole genome sequencing and haplotyping from 10-20 human cells

    PubMed Central

    Peters, Brock A.; Kermani, Bahram G.; Sparks, Andrew B.; Alferov, Oleg; Hong, Peter; Alexeev, Andrei; Jiang, Yuan; Dahl, Fredrik; Tang, Y. Tom; Haas, Juergen; Robasky, Kimberly; Zaranek, Alexander Wait; Lee, Je-Hyuk; Ball, Madeleine Price; Peterson, Joseph E.; Perazich, Helena; Yeung, George; Liu, Jia; Chen, Linsu; Kennemer, Michael I.; Pothuraju, Kaliprasad; Konvicka, Karel; Tsoupko-Sitnikov, Mike; Pant, Krishna P.; Ebert, Jessica C.; Nilsen, Geoffrey B.; Baccash, Jonathan; Halpern, Aaron L.; Church, George M.; Drmanac, Radoje

    2012-01-01

    Recent advances in whole genome sequencing have brought the vision of personal genomics and genomic medicine closer to reality. However, current methods lack clinical accuracy and the ability to describe the context (haplotypes) in which genome variants co-occur in a cost-effective manner. Here we describe a low-cost DNA sequencing and haplotyping process, Long Fragment Read (LFR) technology, similar to sequencing long single DNA molecules without cloning or separation of metaphase chromosomes. In this study, ten LFR libraries were made using only ~100 pg of human DNA per sample. Up to 97% of the heterozygous single nucleotide variants (SNVs) were assembled into long haplotype contigs. Removal of false positive SNVs not phased by multiple LFR haplotypes resulted in a final genome error rate of 1 in 10 Mb. Cost-effective and accurate genome sequencing and haplotyping from 10-20 human cells, as demonstrated here, will enable comprehensive genetic studies and diverse clinical applications. PMID:22785314

  8. Identification of cancer-driver genes in focal genomic alterations from whole genome sequencing data

    PubMed Central

    Jang, Ho; Hur, Youngmi; Lee, Hyunju

    2016-01-01

    DNA copy number alterations (CNAs) are the main genomic events that occur during the initiation and development of cancer. Distinguishing driver aberrant regions from passenger regions, which might contain candidate target genes for cancer therapies, is an important issue. Several methods for identifying cancer-driver genes from multiple cancer patients have been developed for single nucleotide polymorphism (SNP) arrays. However, for NGS data, methods for the SNP array cannot be directly applied because of different characteristics of NGS such as higher resolutions of data without predefined probes and incorrectly mapped reads to reference genomes. In this study, we developed a wavelet-based method for identification of focal genomic alterations for sequencing data (WIFA-Seq). We applied WIFA-Seq to whole genome sequencing data from glioblastoma multiforme, ovarian serous cystadenocarcinoma and lung adenocarcinoma, and identified focal genomic alterations, which contain candidate cancer-related genes as well as previously known cancer-driver genes. PMID:27156852

  9. Molecular etiology of an indolent lymphoproliferative disorder determined by whole-genome sequencing

    PubMed Central

    Parker, Jeremy D.K.; Shen, Yaoqing; Pleasance, Erin; Li, Yvonne; Schein, Jacqueline E.; Zhao, Yongjun; Moore, Richard; Wegrzyn-Woltosz, Joanna; Savage, Kerry J.; Weng, Andrew P.; Gascoyne, Randy D.; Jones, Steven; Marra, Marco; Laskin, Janessa; Karsan, Aly

    2016-01-01

    In an attempt to assess potential treatment options, whole-genome and transcriptome sequencing were performed on a patient with an unclassifiable small lymphoproliferative disorder. Variants from genome sequencing were prioritized using a combination of comparative variant distributions in a spectrum of lymphomas, and meta-analyses of gene expression profiling. In this patient, the molecular variants that we believe to be most relevant to the disease presentation most strongly resemble a diffuse large B-cell lymphoma (DLBCL), whereas the gene expression data are most consistent with a low-grade chronic lymphocytic leukemia (CLL). The variant of greatest interest was a predicted NOTCH2-truncating mutation, which has been recently reported in various lymphomas. PMID:27148583

  10. Genomic Epidemiology: Whole-Genome-Sequencing-Powered Surveillance and Outbreak Investigation of Foodborne Bacterial Pathogens.

    PubMed

    Deng, Xiangyu; den Bakker, Henk C; Hendriksen, Rene S

    2016-01-01

    As we are approaching the twentieth anniversary of PulseNet, a network of public health and regulatory laboratories that has changed the landscape of foodborne illness surveillance through molecular subtyping, public health microbiology is undergoing another transformation brought about by so-called next-generation sequencing (NGS) technologies that have made whole-genome sequencing (WGS) of foodborne bacterial pathogens a realistic and superior alternative to traditional subtyping methods. Routine, real-time, and widespread application of WGS in food safety and public health is on the horizon. Technological, operational, and policy challenges are still present and being addressed by an international and multidisciplinary community of researchers, public health practitioners, and other stakeholders. PMID:26772415

  11. Hepatitis C virus whole genome sequencing: Current methods/issues and future challenges.

    PubMed

    Trémeaux, Pauline; Caporossi, Alban; Thélu, Marie-Ange; Blum, Michael; Leroy, Vincent; Morand, Patrice; Larrat, Sylvie

    2016-10-01

    Therapy for hepatitis C is currently undergoing a revolution. The arrival of new antiviral agents targeting viral proteins reinforces the need for a better knowledge of the viral strains infecting each patient. Hepatitis C virus (HCV) whole genome sequencing provides essential information for precise typing, study of the viral natural history or identification of resistance-associated variants. HCV whole genome sequencing was first performed with Sanger sequencing; the arrival of next-generation sequencing (NGS) has simplified the technical process and provided more detailed data on the nature and evolution of viral quasi-species. We will review the different techniques used for HCV complete genome sequencing and their applications, both before and after the advent of NGS. The progress brought by new and future technologies will also be discussed, as well as the remaining difficulties, largely due to the genomic variability. PMID:27068766

  12. A strategic stakeholder approach for addressing further analysis requests in whole genome sequencing research.

    PubMed

    Thornock, Bradley Steven O

    2016-01-01

    Whole genome sequencing (WGS) can be a cost-effective and efficient means of diagnosis for some children, but it also raises a number of ethical concerns. One such concern is how researchers derive and communicate results from WGS, including future requests for further analysis of stored sequences. The purpose of this paper is to think about what is at stake, and for whom, in any solution that is developed to deal with such requests. To accomplish this task, this paper will utilize stakeholder theory, a common method used in business ethics. Several scenarios that connect stakeholder concerns and WGS will also be posited and analyzed. This paper concludes by developing criteria composed of a series of questions that researchers can answer in order to more effectively address requests for further analysis of stored sequences. PMID:27091475

  13. Advances in Understanding Bacterial Pathogenesis Gained from Whole-Genome Sequencing and Phylogenetics.

    PubMed

    Klemm, Elizabeth; Dougan, Gordon

    2016-05-11

    The development of next-generation sequencing as a cost-effective technology has facilitated the analysis of bacterial population structure at a whole-genome level and at scale. From these data, phylogenic trees have been constructed that define population structures at a local, national, and global level, providing a framework for genetic analysis. Although still at an early stage, these approaches have yielded progress in several areas, including pathogen transmission mapping, the genetics of niche colonization and host adaptation, as well as gene-to-phenotype association studies. Antibiotic resistance has proven to be a major challenge in the early 21st century, and phylogenetic analyses have uncovered the dramatic effect that the use of antibiotics has had on shaping bacterial population structures. An update on insights into bacterial evolution from comparative genomics is provided in this review. PMID:27173928

  14. CVTree: a Whole-Genome and Alignment-Free Approach to Microbial Phylogeny

    NASA Astrophysics Data System (ADS)

    Hao, Bailin

    The number of sequenced genomes of Archaea, Bacteria, and Fungi accumulates rapidly. Several thousand genomes of these unicellular organisms will be available in a few years. Due to the extremely large difference in genome size and gene content, it is difficult to use the traditional alignment-based method to infer phylogeny from the genomes. An alignment-free and whole-genome-based approach called CVTree has been developed and successfully applied to these organisms. As CVTree has been successfully applied to genomes of viruses, chloroplasts, Bacteria, Archaea and Fungi, in this brief review we will mainly touch on some mathematical problems related to the foundation of the new approach, including a few yet unsolved problems, such as the violation of the triangular inequalities of the dissimilarity measure used in the CVTree method.
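
    The triangle-inequality issue mentioned in this record can be illustrated with a deliberately simplified sketch. It assumes an alignment-free comparison built from plain k-mer frequency vectors and a cosine-based dissimilarity D = (1 - C) / 2; the published CVTree method is understood to work differently in detail (for example, by subtracting a statistical background from the k-mer counts), so this is not the CVTree implementation. The genome names, sequences and function names are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def kmer_vector(seq: str, k: int) -> dict[str, float]:
    """Return the relative frequency of every overlapping k-mer in seq."""
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence is shorter than k")
    return {kmer: n / total for kmer, n in counts.items()}


def dissimilarity(u: dict[str, float], v: dict[str, float]) -> float:
    """Cosine-based dissimilarity (1 - C) / 2 between two k-mer vectors."""
    dot = sum(value * v.get(kmer, 0.0) for kmer, value in u.items())
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    return (1.0 - dot / (norm_u * norm_v)) / 2.0


def triangle_violations(vectors: dict[str, dict[str, float]]) -> list[tuple[str, str, str]]:
    """List triples (a, b, c) where D(a, c) > D(a, b) + D(b, c)."""
    names = sorted(vectors)
    d = {
        frozenset(pair): dissimilarity(vectors[pair[0]], vectors[pair[1]])
        for pair in combinations(names, 2)
    }
    bad = []
    for a, c in combinations(names, 2):
        for b in names:
            if b in (a, c):
                continue
            detour = d[frozenset((a, b))] + d[frozenset((b, c))]
            if d[frozenset((a, c))] > detour + 1e-12:
                bad.append((a, b, c))
    return bad


# Hypothetical toy sequences chosen so that the violation is easy to see.
genomes = {"genome_x": "AAAAAA", "genome_y": "AAATTT", "genome_z": "TTTTTT"}
vectors = {name: kmer_vector(seq, k=2) for name, seq in genomes.items()}
violations = triangle_violations(vectors)
print(violations)
```

    In this toy case the direct dissimilarity between genome_x and genome_z exceeds the sum of the two dissimilarities via genome_y, so the measure is not a metric in the strict sense, even though it can still be used to build trees.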

  15. Clostridium botulinum Group II Isolate Phylogenomic Profiling Using Whole-Genome Sequence Data

    PubMed Central

    Weedmark, K. A.; Mabon, P.; Hayden, K. L.; Lambert, D.; Van Domselaar, G.; Austin, J. W.

    2015-01-01

    Clostridium botulinum group II isolates (n = 163) from different geographic regions, outbreaks, and neurotoxin types and subtypes were characterized in silico using whole-genome sequence data. Two clusters representing a variety of botulinum neurotoxin (BoNT) types and subtypes were identified by multilocus sequence typing (MLST) and core single nucleotide polymorphism (SNP) analysis. While one cluster included BoNT/B4/F6/E9 and nontoxigenic members, the other comprised a wide variety of different BoNT/E subtype isolates and a nontoxigenic strain. In silico MLST and core SNP methods were consistent in terms of clade-level isolate classification; however, core SNP analysis showed higher resolution capability. Furthermore, core SNP analysis correctly distinguished isolates by outbreak and location. This study illustrated the utility of next-generation sequence-based typing approaches for isolate characterization and source attribution and identified discrete SNP loci and MLST alleles for isolate comparison. PMID:26116673

  16. A whole-genome shotgun approach for assembling and anchoring the hexaploid bread wheat genome.

    PubMed

    Chapman, Jarrod A; Mascher, Martin; Buluç, Aydın; Barry, Kerrie; Georganas, Evangelos; Session, Adam; Strnadova, Veronika; Jenkins, Jerry; Sehgal, Sunish; Oliker, Leonid; Schmutz, Jeremy; Yelick, Katherine A; Scholz, Uwe; Waugh, Robbie; Poland, Jesse A; Muehlbauer, Gary J; Stein, Nils; Rokhsar, Daniel S

    2015-01-01

    Polyploid species have long been thought to be recalcitrant to whole-genome assembly. By combining high-throughput sequencing, recent developments in parallel computing, and genetic mapping, we derive, de novo, a sequence assembly representing 9.1 Gbp of the highly repetitive 16 Gbp genome of hexaploid wheat, Triticum aestivum, and assign 7.1 Gb of this assembly to chromosomal locations. The genome representation and accuracy of our assembly is comparable to or even exceeds that of a chromosome-by-chromosome shotgun assembly. Our assembly and mapping strategy uses only short read sequencing technology and is applicable to any species where it is possible to construct a mapping population. PMID:25637298

  17. Clinical decision support for whole genome sequence information leveraging a service-oriented architecture: a prototype.

    PubMed

    Welch, Brandon M; Rodriguez-Loya, Salvador; Eilbeck, Karen; Kawamoto, Kensaku

    2014-01-01

    Whole genome sequence (WGS) information could soon be routinely available to clinicians to support the personalized care of their patients. At such time, clinical decision support (CDS) integrated into the clinical workflow will likely be necessary to support genome-guided clinical care. Nevertheless, developing CDS capabilities for WGS information presents many unique challenges that need to be overcome for such approaches to be effective. In this manuscript, we describe the development of a prototype CDS system that is capable of providing genome-guided CDS at the point of care and within the clinical workflow. To demonstrate the functionality of this prototype, we implemented a clinical scenario of a hypothetical patient at high risk for Lynch Syndrome based on his genomic information. We demonstrate that this system can effectively use service-oriented architecture principles and standards-based components to deliver point of care CDS for WGS information in real-time. PMID:25954430

  18. Whole-genome sequencing in autism identifies hot spots for de novo germline mutation.

    PubMed

    Michaelson, Jacob J; Shi, Yujian; Gujral, Madhusudan; Zheng, Hancheng; Malhotra, Dheeraj; Jin, Xin; Jian, Minghan; Liu, Guangming; Greer, Douglas; Bhandari, Abhishek; Wu, Wenting; Corominas, Roser; Peoples, Aine; Koren, Amnon; Gore, Athurva; Kang, Shuli; Lin, Guan Ning; Estabillo, Jasper; Gadomski, Therese; Singh, Balvindar; Zhang, Kun; Akshoomoff, Natacha; Corsello, Christina; McCarroll, Steven; Iakoucheva, Lilia M; Li, Yingrui; Wang, Jun; Sebat, Jonathan

    2012-12-21

    De novo mutation plays an important role in autism spectrum disorders (ASDs). Notably, pathogenic copy number variants (CNVs) are characterized by high mutation rates. We hypothesize that hypermutability is a property of ASD genes and may also include nucleotide-substitution hot spots. We investigated global patterns of germline mutation by whole-genome sequencing of monozygotic twins concordant for ASD and their parents. Mutation rates varied widely throughout the genome (by 100-fold) and could be explained by intrinsic characteristics of DNA sequence and chromatin structure. Dense clusters of mutations within individual genomes were attributable to compound mutation or gene conversion. Hypermutability was a characteristic of genes involved in ASD and other diseases. In addition, genes impacted by mutations in this study were associated with ASD in independent exome-sequencing data sets. Our findings suggest that regional hypermutation is a significant factor shaping patterns of genetic variation and disease risk in humans. PMID:23260136

  19. Diversity through duplication: whole-genome sequencing reveals novel gene retrocopies in the human population.

    PubMed

    Richardson, Sandra R; Salvador-Palomeque, Carmen; Faulkner, Geoffrey J

    2014-05-01

    Gene retrocopies are generated by reverse transcription and genomic integration of mRNA. As such, retrocopies present an important exception to the central dogma of molecular biology, and have substantially impacted the functional landscape of the metazoan genome. While an estimated 8,000-17,000 retrocopies exist in the human genome reference sequence, the extent of variation between individuals in terms of retrocopy content has remained largely unexplored. Three recent studies by Abyzov et al., Ewing et al. and Schrider et al. have exploited 1,000 Genomes Project Consortium data, as well as other sources of whole-genome sequencing data, to uncover novel gene retrocopies. Here, we compare the methods and results of these three studies, highlight the impact of retrocopies in human diversity and genome evolution, and speculate on the potential for somatic gene retrocopies to impact cancer etiology and genetic diversity among individual neurons in the mammalian brain. PMID:24615986

  20. Whole genome analysis of diverse Chlamydia trachomatis strains identifies phylogenetic relationships masked by current clinical typing

    PubMed Central

    Harris, Simon R.; Clarke, Ian N.; Seth-Smith, Helena M. B.; Solomon, Anthony W.; Cutcliffe, Lesley T.; Marsh, Peter; Skilton, Rachel J.; Holland, Martin J.; Mabey, David; Peeling, Rosanna W.; Lewis, David A.; Spratt, Brian G.; Unemo, Magnus; Persson, Kenneth; Bjartling, Carina; Brunham, Robert; de Vries, Henry J.C.; Morré, Servaas A.; Speksnijder, Arjen; Bébéar, Cécile M.; Clerc, Maïté; de Barbeyrac, Bertille; Parkhill, Julian; Thomson, Nicholas R.

    2012-01-01

    Chlamydia trachomatis is responsible for both trachoma and sexually transmitted infections causing substantial morbidity and economic cost globally. Despite this, our knowledge of its population and evolutionary genetics is limited. Here we present a detailed whole genome phylogeny from representative strains of both trachoma and lymphogranuloma venereum (LGV) biovars from temporally and geographically diverse sources. Our analysis demonstrates that predicting phylogenetic structure using the ompA gene, traditionally used to classify Chlamydia, is misleading because extensive recombination in this region masks true relationships. We show that in many instances ompA is a chimera that can be exchanged in part or whole, both within and between biovars. We also provide evidence for exchange of, and recombination within, the cryptic plasmid, another important diagnostic target. We have used our phylogenetic framework to show how genetic exchange has manifested itself in ocular, urogenital and LGV C. trachomatis strains, including the epidemic LGV serotype L2b. PMID:22406642

  1. Why we should not use the Affordable Care Act to encourage widespread whole genome sequencing.

    PubMed

    Ossorio, Pilar N; Kelleher, J Paul

    2014-02-01

    Perry Payne argues that the health care system should encourage provision of whole genome sequencing (WGS) for most people in the near future. Payne's essay contains two distinct claims. One claim is that near-universal access to WGS would be beneficial both to individuals and to populations who, without it, could be on the losing end of widening health disparities. The second claim is that the preventive services provisions of the Patient Protection and Affordable Care Act (ACA) should be invoked to establish legal entitlements to WGS, without any patient cost sharing. We believe there are strong reasons to reject both of these claims. Indeed, the reasons that count against providing wide access to WGS are the very same reasons that undermine Payne's argument for providing WGS under the preventive services provisions of the ACA. PMID:24193611

  2. Microsatellite polymorphism among Chrysanthemum sp. polyploids: the influence of whole genome duplication

    PubMed Central

    Wang, Haibin; Qi, Xiangyu; Gao, Ri; Wang, Jingjing; Dong, Bin; Jiang, Jiafu; Chen, Sumei; Guan, Zhiyong; Fang, Weimin; Liao, Yuan; Chen, Fadi

    2014-01-01

    Polyploidy is common among flowering plants, including the Asteraceae, a relatively recent angiosperm group. EST-SSRs were used to characterize polymorphism among 29 Chrysanthemum and Ajania spp. accessions of various ploidy levels. Most EST-SSR loci were readily transferable between the species, and the 29 accessions were separated into three groups in terms of the number of fragments. It was inferred that the transitions from tetraploid to hexaploid and from octoploid to decaploid may be recent events, while the transition from diploid to tetraploid may be an ancient one in the Chrysanthemum lineage. EST-SSR polymorphism was found and some transcripts containing an SSR were transcribed differently in the de novo autotetraploid C. nankingense and C. lavandulifolium than in their progenitor diploid. EST-SSR could provide a potential molecular basis of adaptation during evolution, while whole genome duplication has a major effect on the mutational dynamics of EST-SSR loci, which could also affect gene regulation. PMID:25339092

  3. Whole-Genome Sequencing and Disability in the NICU: Exploring Practical and Ethical Challenges.

    PubMed

    Deem, Michael J

    2016-01-01

    Clinical whole-genome sequencing (WGS) promises to deliver faster diagnoses and lead to better management of care in the NICU. However, several disability rights advocates have expressed concern that clinical use of genetic technologies may reinforce and perpetuate stigmatization of and discrimination against disabled persons in medical and social contexts. There is growing need, then, for clinicians and bioethicists to consider how the clinical use of WGS in the newborn period might exacerbate such harms to persons with disabilities. This article explores ways to extend these concerns to clinical WGS in neonatal care. By considering these perspectives during the early phases of expanded use of WGS in the NICU, this article encourages clinicians and bioethicists to continue to reflect on ways to attend to the concerns of disability rights advocates, foster trust and cooperation between the medical and disability communities, and forestall some of the social harms clinical WGS might cause to persons with disabilities and their families. PMID:26729703

  4. Plant Genetic Archaeology: Whole-Genome Sequencing Reveals the Pedigree of a Classical Trisomic Line

    PubMed Central

    Salomé, Patrice A.; Weigel, Detlef

    2014-01-01

    The circadian oscillator is astonishingly robust to changes in the environment but also to genomic changes that alter the copy number of its components through genome duplication, gene duplication, and homeologous gene loss. While studying the potential effect of aneuploidy on the Arabidopsis thaliana circadian clock, we discovered that a line thought to be trisomic for chromosome 3 also bears the gi-1 mutation, resulting in a short period and late flowering. With the help of whole-genome sequencing, we uncovered the unexpected complexity of this trisomic stock’s history, as its genome shows evidence of past outcrossing with another A. thaliana accession. Our study indicates that although historical aneuploidy lines exist and are available, it might be safer to generate new individuals and confirm their genomes and karyotypes by sequencing. PMID:25524155

  5. Practical Value of Food Pathogen Traceability through Building a Whole-Genome Sequencing Network and Database.

    PubMed

    Allard, Marc W; Strain, Errol; Melka, David; Bunning, Kelly; Musser, Steven M; Brown, Eric W; Timme, Ruth

    2016-08-01

    The FDA has created a United States-based open-source whole-genome sequencing network of state, federal, international, and commercial partners. The GenomeTrakr network represents a first-of-its-kind distributed genomic food shield for characterizing and tracing foodborne outbreak pathogens back to their sources. The GenomeTrakr network is leading investigations of outbreaks of foodborne illnesses and compliance actions with more accurate and rapid recalls of contaminated foods as well as more effective monitoring of preventive controls for food manufacturing environments. An expanded network would serve to provide an international rapid surveillance system for pathogen traceback, which is critical to support an effective public health response to bacterial outbreaks. PMID:27008877

  6. Identification of cancer-driver genes in focal genomic alterations from whole genome sequencing data.

    PubMed

    Jang, Ho; Hur, Youngmi; Lee, Hyunju

    2016-01-01

    DNA copy number alterations (CNAs) are the main genomic events that occur during the initiation and development of cancer. Distinguishing driver aberrant regions from passenger regions, which might contain candidate target genes for cancer therapies, is an important issue. Several methods for identifying cancer-driver genes from multiple cancer patients have been developed for single nucleotide polymorphism (SNP) arrays. However, for NGS data, methods for the SNP array cannot be directly applied because of different characteristics of NGS such as higher resolutions of data without predefined probes and incorrectly mapped reads to reference genomes. In this study, we developed a wavelet-based method for identification of focal genomic alterations for sequencing data (WIFA-Seq). We applied WIFA-Seq to whole genome sequencing data from glioblastoma multiforme, ovarian serous cystadenocarcinoma and lung adenocarcinoma, and identified focal genomic alterations, which contain candidate cancer-related genes as well as previously known cancer-driver genes. PMID:27156852

  7. A comprehensive assessment of somatic mutation detection in cancer using whole-genome sequencing.

    PubMed

    Alioto, Tyler S; Buchhalter, Ivo; Derdak, Sophia; Hutter, Barbara; Eldridge, Matthew D; Hovig, Eivind; Heisler, Lawrence E; Beck, Timothy A; Simpson, Jared T; Tonon, Laurie; Sertier, Anne-Sophie; Patch, Ann-Marie; Jäger, Natalie; Ginsbach, Philip; Drews, Ruben; Paramasivam, Nagarajan; Kabbe, Rolf; Chotewutmontri, Sasithorn; Diessl, Nicolle; Previti, Christopher; Schmidt, Sabine; Brors, Benedikt; Feuerbach, Lars; Heinold, Michael; Gröbner, Susanne; Korshunov, Andrey; Tarpey, Patrick S; Butler, Adam P; Hinton, Jonathan; Jones, David; Menzies, Andrew; Raine, Keiran; Shepherd, Rebecca; Stebbings, Lucy; Teague, Jon W; Ribeca, Paolo; Giner, Francesc Castro; Beltran, Sergi; Raineri, Emanuele; Dabad, Marc; Heath, Simon C; Gut, Marta; Denroche, Robert E; Harding, Nicholas J; Yamaguchi, Takafumi N; Fujimoto, Akihiro; Nakagawa, Hidewaki; Quesada, Víctor; Valdés-Mas, Rafael; Nakken, Sigve; Vodák, Daniel; Bower, Lawrence; Lynch, Andrew G; Anderson, Charlotte L; Waddell, Nicola; Pearson, John V; Grimmond, Sean M; Peto, Myron; Spellman, Paul; He, Minghui; Kandoth, Cyriac; Lee, Semin; Zhang, John; Létourneau, Louis; Ma, Singer; Seth, Sahil; Torrents, David; Xi, Liu; Wheeler, David A; López-Otín, Carlos; Campo, Elías; Campbell, Peter J; Boutros, Paul C; Puente, Xose S; Gerhard, Daniela S; Pfister, Stefan M; McPherson, John D; Hudson, Thomas J; Schlesner, Matthias; Lichter, Peter; Eils, Roland; Jones, David T W; Gut, Ivo G

    2015-01-01

    As whole-genome sequencing for cancer genome analysis becomes a clinical tool, a full understanding of the variables affecting sequencing analysis output is required. Here using tumour-normal sample pairs from two different types of cancer, chronic lymphocytic leukaemia and medulloblastoma, we conduct a benchmarking exercise within the context of the International Cancer Genome Consortium. We compare sequencing methods, analysis pipelines and validation methods. We show that using PCR-free methods and increasing sequencing depth to ∼100× shows benefits, as long as the tumour:control coverage ratio remains balanced. We observe widely varying mutation call rates and low concordance among analysis pipelines, reflecting the artefact-prone nature of the raw data and lack of standards for dealing with the artefacts. However, we show that, using the benchmark mutation set we have created, many issues are in fact easy to remedy and have an immediate positive impact on mutation detection accuracy. PMID:26647970

  8. A Gene-By-Gene Approach to Bacterial Population Genomics: Whole Genome MLST of Campylobacter.

    PubMed

    Sheppard, Samuel K; Jolley, Keith A; Maiden, Martin C J

    2012-01-01

    Campylobacteriosis remains a major human public health problem world-wide. Genetic analyses of Campylobacter isolates, and particularly molecular epidemiology, have been central to the study of this disease, particularly the characterization of Campylobacter genotypes isolated from human infection, farm animals, and retail food. These studies have demonstrated that Campylobacter populations are highly structured, with distinct genotypes associated with particular wild or domestic animal sources, and that chicken meat is the most likely source of most human infection in countries such as the UK. The availability of multiple whole genome sequences from Campylobacter isolates presents the prospect of identifying those genes or allelic variants responsible for host-association and increased human disease risk, but the diversity of Campylobacter genomes presents challenges for such analyses. We present a gene-by-gene approach for investigating the genetic basis of phenotypes in diverse bacteria such as Campylobacter, implemented with the BIGSdb software on the pubMLST.org/campylobacter website. PMID:24704917

  9. A whole-genome shotgun approach for assembling and anchoring the hexaploid bread wheat genome

    SciTech Connect

    Chapman, Jarrod A.; Mascher, Martin; Buluc, Aydin; Barry, Kerrie; Georganas, Evangelos; Session, Adam; Strnadova, Veronika; Jenkins, Jerry; Sehgal, Sunish; Oliker, Leonid; Schmutz, Jeremy; Yelick, Katherine A.; Scholz, Uwe; Waugh, Robbie; Poland, Jesse A.; Muehlbauer, Gary J.; Stein, Nils; Rokhsar, Daniel S.

    2015-01-31

    We report that polyploid species have long been thought to be recalcitrant to whole-genome assembly. By combining high-throughput sequencing, recent developments in parallel computing, and genetic mapping, we derive, de novo, a sequence assembly representing 9.1 Gbp of the highly repetitive 16 Gbp genome of hexaploid wheat, Triticum aestivum, and assign 7.1 Gb of this assembly to chromosomal locations. The genome representation and accuracy of our assembly is comparable or even exceeds that of a chromosome-by-chromosome shotgun assembly. Our assembly and mapping strategy uses only short read sequencing technology and is applicable to any species where it is possible to construct a mapping population.

  10. A whole-genome shotgun approach for assembling and anchoring the hexaploid bread wheat genome

    DOE PAGES Beta

    Chapman, Jarrod A.; Mascher, Martin; Buluc, Aydin; Barry, Kerrie; Georganas, Evangelos; Session, Adam; Strnadova, Veronika; Jenkins, Jerry; Sehgal, Sunish; Oliker, Leonid; et al

    2015-01-31

    We report that polyploid species have long been thought to be recalcitrant to whole-genome assembly. By combining high-throughput sequencing, recent developments in parallel computing, and genetic mapping, we derive, de novo, a sequence assembly representing 9.1 Gbp of the highly repetitive 16 Gbp genome of hexaploid wheat, Triticum aestivum, and assign 7.1 Gb of this assembly to chromosomal locations. The genome representation and accuracy of our assembly is comparable or even exceeds that of a chromosome-by-chromosome shotgun assembly. Our assembly and mapping strategy uses only short read sequencing technology and is applicable to any species where it is possible to construct a mapping population.

  11. Whole genome sequencing provides an unambiguous link between Salmonella Dublin outbreak strain and a historical isolate.

    PubMed

    Mohammed, M; Delappe, N; O'Connor, J; McKeown, P; Garvey, P; Cormican, M

    2016-02-01

    Salmonella enterica subsp. enterica serovar Dublin is an uncommon cause of human salmonellosis; however, a relatively high proportion of cases are associated with invasive disease. The serotype is associated with cattle. A geographically diffuse outbreak of S. Dublin involving nine patients occurred in Ireland in 2013. The source of infection was not identified. Typing of outbreak associated isolates by pulsed-field gel electrophoresis (PFGE) was of limited value because PFGE has limited discriminatory power for S. Dublin. Whole genome sequencing (WGS) showed conclusively that the isolates were closely related to each other, to an apparently unrelated isolate from 2011 and distinct from other isolates that were not readily distinguishable by PFGE. PMID:26165314

  12. Integrating Crop Growth Models with Whole Genome Prediction through Approximate Bayesian Computation

    PubMed Central

    Technow, Frank; Messina, Carlos D.; Totir, L. Radu; Cooper, Mark

    2015-01-01

    Genomic selection, enabled by whole genome prediction (WGP) methods, is revolutionizing plant breeding. Existing WGP methods have been shown to deliver accurate predictions in the most common settings, such as prediction of across environment performance for traits with additive gene effects. However, prediction of traits with non-additive gene effects and prediction of genotype by environment interaction (G×E), continues to be challenging. Previous attempts to increase prediction accuracy for these particularly difficult tasks employed prediction methods that are purely statistical in nature. Augmenting the statistical methods with biological knowledge has been largely overlooked thus far. Crop growth models (CGMs) attempt to represent the impact of functional relationships between plant physiology and the environment in the formation of yield and similar output traits of interest. Thus, they can explain the impact of G×E and certain types of non-additive gene effects on the expressed phenotype. Approximate Bayesian computation (ABC), a novel and powerful computational procedure, allows the incorporation of CGMs directly into the estimation of whole genome marker effects in WGP. Here we provide a proof of concept study for this novel approach and demonstrate its use with synthetic data sets. We show that this novel approach can be considerably more accurate than the benchmark WGP method GBLUP in predicting performance in environments represented in the estimation set as well as in previously unobserved environments for traits determined by non-additive gene effects. We conclude that this proof of concept demonstrates that using ABC for incorporating biological knowledge in the form of CGMs into WGP is a very promising and novel approach to improving prediction accuracy for some of the most challenging scenarios in plant breeding and applied genetics. PMID:26121133

  13. Whole-Genome Sequencing for the Investigation of a Hospital Outbreak of MRSA in China

    PubMed Central

    Kong, Zhenzhen; Zhao, Peipei; Liu, Haibing; Yu, Xiang; Qin, Yanyan; Su, Zhaoliang; Wang, Shengjun; Xu, Huaxi; Chen, Jianguo

    2016-01-01

    Staphylococcus aureus is a globally disseminated drug-resistant bacterial species. It remains a leading cause of hospital-acquired infection, primarily among immunocompromised patients. In 2012, the Affiliated People’s Hospital of Jiangsu University experienced a putative outbreak of methicillin-resistant S. aureus (MRSA) that affected 12 patients in the Neurosurgery Department. In this study, whole-genome sequencing (WGS) was used to gain insight into the epidemiology of the outbreak caused by MRSA, and traditional bacterial genotyping approaches were also applied to provide supportive evidence for WGS. We sequenced the DNA from 6 isolates associated with the outbreak. Phylogenetic analysis was constructed by comparing single-nucleotide polymorphisms (SNPs) in the core genome of 6 isolates in the present study and another 3 referenced isolates from GenBank. Of the 6 MRSA sequences in the current study, 5 belonged to the same group, clustering with T0131, while the other one clustered closely with TW20. All of the isolates were identified as ST239-SCCmecIII clones. Whole-genome analysis revealed that four of the outbreak isolates were more tightly clustered into a group and SA13002 together with SA13009 were distinct from the outbreak strains, which were considered non-outbreak strains. Based on the sequencing results, the antibiotic-resistance gene status (present or absent) was almost perfectly concordant with the results of phenotypic susceptibility testing. Various toxin genes were also analyzed successfully. Our analysis demonstrates that using traditional molecular methods and WGS can facilitate the identification of outbreaks and help to control nosocomial transmission. PMID:26950298

  14. Multivariate whole genome average interval mapping: QTL analysis for multiple traits and/or environments.

    PubMed

    Verbyla, Arūnas P; Cullis, Brian R

    2012-09-01

    A major aim in some plant-based studies is the determination of quantitative trait loci (QTL) for multiple traits or across multiple environments. Understanding these QTL by trait or QTL by environment interactions can be of great value to the plant breeder. A whole genome approach for the analysis of QTL is presented for such multivariate applications. The approach is an extension of whole genome average interval mapping in which all intervals on a linkage map are included in the analysis simultaneously. A random effects working model is proposed for the multivariate (trait or environment) QTL effects for each interval, with a variance-covariance matrix linking the variates in a particular interval. The significance of the variance-covariance matrix for the QTL effects is tested and if significant, an outlier detection technique is used to select a putative QTL. This QTL by variate interaction is transferred to the fixed effects. The process is repeated until the variance-covariance matrix for QTL random effects is not significant; at this point all putative QTL have been selected. Unlinked markers can also be included in the analysis. A simulation study was conducted to examine the performance of the approach and demonstrated the multivariate approach results in increased power for detecting QTL in comparison to univariate methods. The approach is illustrated for data arising from experiments involving two doubled haploid populations. The first involves analysis of two wheat traits, α-amylase activity and height, while the second is concerned with a multi-environment trial for extensibility of flour dough. The method provides an approach for multi-trait and multi-environment QTL analysis in the presence of non-genetic sources of variation. PMID:22692445

  15. A Model for Carbohydrate Metabolism in the Diatom Phaeodactylum tricornutum Deduced from Comparative Whole Genome Analysis

    PubMed Central

    Kaplan, Aaron; Caron, Lise; Weber, Till; Maheswari, Uma; Armbrust, E. Virginia; Bowler, Chris

    2008-01-01

    Background: Diatoms are unicellular algae responsible for approximately 20% of global carbon fixation. Their evolution by secondary endocytobiosis resulted in a complex cellular structure and metabolism compared to algae with primary plastids. Methodology/Principal Findings: The whole genome sequence of the diatom Phaeodactylum tricornutum has recently been completed. We identified and annotated genes for enzymes involved in carbohydrate pathways based on extensive EST support and comparison to the whole genome sequence of a second diatom, Thalassiosira pseudonana. Protein localization to mitochondria was predicted based on identified similarities to mitochondrial localization motifs in other eukaryotes, whereas protein localization to plastids was based on the presence of signal peptide motifs in combination with plastid localization motifs previously shown to be required in diatoms. We identified genes potentially involved in a C4-like photosynthesis in P. tricornutum and, on the basis of sequence-based putative localization of relevant proteins, discuss possible differences in carbon concentrating mechanisms and CO2 fixation between the two diatoms. We also identified genes encoding enzymes involved in photorespiration with one interesting exception: glycerate kinase was not found in either P. tricornutum or T. pseudonana. Various Calvin cycle enzymes were found in up to five different isoforms, distributed between plastids, mitochondria and the cytosol. Diatoms store energy either as lipids or as chrysolaminaran (a β-1,3-glucan) outside of the plastids. We identified various β-glucanases and large membrane-bound glucan synthases. Interestingly most of the glucanases appear to contain C-terminal anchor domains that may attach the enzymes to membranes. Conclusions/Significance: Here we present a detailed synthesis of carbohydrate metabolism in diatoms based on the genome sequences of Thalassiosira pseudonana and Phaeodactylum tricornutum. This model provides novel

  16. Integrated clinical, whole-genome, and transcriptome analysis of multisampled lethal metastatic prostate cancer

    PubMed Central

    Bova, G. Steven; Kallio, Heini M.L.; Annala, Matti; Kivinummi, Kati; Högnäs, Gunilla; Häyrynen, Sergei; Rantapero, Tommi; Kivinen, Virpi; Isaacs, William B.; Tolonen, Teemu; Nykter, Matti; Visakorpi, Tapio

    2016-01-01

    We report the first combined analysis of whole-genome sequence, detailed clinical history, and transcriptome sequence of multiple prostate cancer metastases in a single patient (A21). Whole-genome and transcriptome sequence was obtained from nine anatomically separate metastases, and targeted DNA sequencing was performed in cancerous and noncancerous foci within the primary tumor specimen removed 5 yr before death. Transcriptome analysis revealed increased expression of androgen receptor (AR)-regulated genes in liver metastases that harbored an AR p.L702H mutation, suggesting a dominant effect by the mutation despite being present in only one of an estimated 16 copies per cell. The metastases harbored several alterations to the PI3K/AKT pathway, including a clonal truncal mutation in PIK3CG that was present in all metastatic sites studied. The list of truncal genomic alterations shared by all metastases included homozygous deletion of TP53, hemizygous deletion of RB1 and CHD1, and amplification of FGFR1. If the patient were treated today, given this knowledge, the use of second-generation androgen-directed therapies, cessation of glucocorticoid administration, and therapeutic inhibition of the PI3K/AKT pathway or FGFR1 receptor could provide personalized benefit. Three previously unreported truncal clonal missense mutations (ABCC4 p.R891L, ALDH9A1 p.W89R, and ASNA1 p.P75R) were expressed at the RNA level and assessed as druggable. The truncal status of mutations may be critical for effective actionability and merit further study. Our findings suggest that a large set of deeply analyzed cases could serve as a powerful guide to more effective prostate cancer basic science and personalized cancer medicine clinical trials. PMID:27148588

  17. Whole-Genome Sequencing for the Investigation of a Hospital Outbreak of MRSA in China.

    PubMed

    Kong, Zhenzhen; Zhao, Peipei; Liu, Haibing; Yu, Xiang; Qin, Yanyan; Su, Zhaoliang; Wang, Shengjun; Xu, Huaxi; Chen, Jianguo

    2016-01-01

    Staphylococcus aureus is a globally disseminated drug-resistant bacterial species. It remains a leading cause of hospital-acquired infection, primarily among immunocompromised patients. In 2012, the Affiliated People's Hospital of Jiangsu University experienced a putative outbreak of methicillin-resistant S. aureus (MRSA) that affected 12 patients in the Neurosurgery Department. In this study, whole-genome sequencing (WGS) was used to gain insight into the epidemiology of the outbreak caused by MRSA, and traditional bacterial genotyping approaches were also applied to provide supportive evidence for WGS. We sequenced the DNA from 6 isolates associated with the outbreak. Phylogenetic analysis was constructed by comparing single-nucleotide polymorphisms (SNPs) in the core genome of 6 isolates in the present study and another 3 referenced isolates from GenBank. Of the 6 MRSA sequences in the current study, 5 belonged to the same group, clustering with T0131, while the other one clustered closely with TW20. All of the isolates were identified as ST239-SCCmecIII clones. Whole-genome analysis revealed that four of the outbreak isolates were more tightly clustered into a group and SA13002 together with SA13009 were distinct from the outbreak strains, which were considered non-outbreak strains. Based on the sequencing results, the antibiotic-resistance gene status (present or absent) was almost perfectly concordant with the results of phenotypic susceptibility testing. Various toxin genes were also analyzed successfully. Our analysis demonstrates that using traditional molecular methods and WGS can facilitate the identification of outbreaks and help to control nosocomial transmission. PMID:26950298

  18. Whole genome survey of coding SNPs reveals a reproducible pathway determinant of Parkinson disease.

    PubMed

    Srinivasan, Balaji S; Doostzadeh, Jaleh; Absalan, Farnaz; Mohandessi, Sharareh; Jalili, Roxana; Bigdeli, Saharnaz; Wang, Justin; Mahadevan, Jaydev; Lee, Caroline L G; Davis, Ronald W; Langston, J William; Ronaghi, Mostafa

    2009-02-01

    It is quickly becoming apparent that situating human variation in a pathway context is crucial to understanding its phenotypic significance. Toward this end, we have developed a general method for finding pathways associated with traits that control for pathway size. We have applied this method to a new whole genome survey of coding SNP variation in 187 patients afflicted with Parkinson disease (PD) and 187 controls. We show that our dataset provides an independent replication of the axon guidance association recently reported by Lesnick et al. [PLoS Genet 2007;3:e98], and also indicates that variation in the ubiquitin-mediated proteolysis and T-cell receptor signaling pathways may predict PD susceptibility. Given this result, it is reasonable to hypothesize that pathway associations are more replicable than individual SNP associations in whole genome association studies. However, this hypothesis is complicated by a detailed comparison of our dataset to the second recent PD association study by Fung et al. [Lancet Neurol 2006;5:911-916]. Surprisingly, we find that the axon guidance pathway does not rank at the very top of the Fung dataset after controlling for pathway size. More generally, in comparing the studies, we find that SNP frequencies replicate well despite technologically different assays, but that both SNP and pathway associations are globally uncorrelated across studies. We thus have a situation in which an association between axon guidance pathway variation and PD has been found in 2 out of 3 studies. We conclude by relating this seeming inconsistency to the molecular heterogeneity of PD, and suggest future analyses that may resolve such discrepancies. PMID:18853455

  19. Whole genome survey of coding SNPs reveals a reproducible pathway determinant of Parkinson disease

    PubMed Central

    Srinivasan, Balaji S; Doostzadeh, Jaleh; Absalan, Farnaz; Mohandessi, Sharareh; Jalili, Roxana; Bigdeli, Saharnaz; Wang, Justin; Mahadevan, Jaydev; Lee, Caroline LG; Davis, Ronald W; Langston, J William; Ronaghi, Mostafa

    2009-01-01

    It is quickly becoming apparent that situating human variation in a pathway context is crucial to understanding its phenotypic significance. Toward this end, we have developed a general method for finding pathways associated with traits that control for pathway size. We have applied this method to a new whole genome survey of coding SNP variation in 187 patients afflicted with Parkinson disease (PD) and 187 controls. We show that our dataset provides an independent replication of the axon guidance association recently reported by Lesnick et al. [PLoS Genet 2007;3:e98], and also indicates that variation in the ubiquitin-mediated proteolysis and T-cell receptor signaling pathways may predict PD susceptibility. Given this result, it is reasonable to hypothesize that pathway associations are more replicable than individual SNP associations in whole genome association studies. However, this hypothesis is complicated by a detailed comparison of our dataset to the second recent PD association study by Fung et al. [Lancet Neurol 2006;5:911–916]. Surprisingly, we find that the axon guidance pathway does not rank at the very top of the Fung dataset after controlling for pathway size. More generally, in comparing the studies, we find that SNP frequencies replicate well despite technologically different assays, but that both SNP and pathway associations are globally uncorrelated across studies. We thus have a situation in which an association between axon guidance pathway variation and PD has been found in 2 out of 3 studies. We conclude by relating this seeming inconsistency to the molecular heterogeneity of PD, and suggest future analyses that may resolve such discrepancies. PMID:18853455

  20. Phylogenetics and Differentiation of Salmonella Newport Lineages by Whole Genome Sequencing

    PubMed Central

    Cao, Guojie; Meng, Jianghong; Strain, Errol; Stones, Robert; Pettengill, James; Zhao, Shaohua; McDermott, Patrick; Brown, Eric; Allard, Marc

    2013-01-01

    Salmonella Newport has ranked in the top three Salmonella serotypes associated with foodborne outbreaks from 1995 to 2011 in the United States. In the current study, we selected 26 S. Newport strains isolated from diverse sources and geographic locations and then conducted 454 shotgun pyrosequencing procedures to obtain 16–24 × coverage of high quality draft genomes for each strain. Comparative genomic analysis of 28 S. Newport strains (including 2 reference genomes) and 15 outgroup genomes identified more than 140,000 informative SNPs. A resulting phylogenetic tree consisted of four sublineages and indicated that S. Newport had a clear geographic structure. Strains from Asia were divergent from those from the Americas. Our findings demonstrated that analysis using whole genome sequencing data resulted in a more accurate picture of phylogeny compared to that using single genes or small sets of genes. We selected loci around the mutS gene of S. Newport to differentiate distinct lineages, including those between invH and mutS genes at the 3′ end of Salmonella Pathogenicity Island 1 (SPI-1), ste fimbrial operon, and Clustered, Regularly Interspaced, Short Palindromic Repeats (CRISPR) associated-proteins (cas). These genes in the outgroup genomes held high similarity with either S. Newport Lineage II or III at the same loci. S. Newport Lineages II and III have different evolutionary histories in this region and our data demonstrated genetic flow and homologous recombination events around mutS. The findings suggested that S. Newport Lineages II and III diverged early in the serotype evolution and have evolved largely independently. Moreover, we identified genes that could delineate sublineages within the phylogenetic tree and that could be used as potential biomarkers for trace-back investigations during outbreaks. Thus, whole genome sequencing data enabled us to better understand the genetic background of pathogenicity and evolutionary history of S. Newport and

  1. Rapid Whole-Genome Sequencing for Investigation of a Neonatal MRSA Outbreak

    PubMed Central

    Köser, Claudio U.; Holden, Matthew T.G.; Ellington, Matthew J.; Cartwright, Edward J.P.; Brown, Nicholas M.; Ogilvy-Stuart, Amanda L.; Hsu, Li Yang; Chewapreecha, Claire; Croucher, Nicholas J.; Harris, Simon R.; Sanders, Mandy; Enright, Mark C.; Dougan, Gordon; Bentley, Stephen D.; Parkhill, Julian; Fraser, Louise J.; Betley, Jason R.; Schulz-Trieglaff, Ole B.; Smith, Geoffrey P.; Peacock, Sharon J.

    2013-01-01

    Background Isolates of methicillin-resistant Staphylococcus aureus (MRSA) belonging to a single lineage are often indistinguishable by means of current typing techniques. Whole-genome sequencing may provide improved resolution to define transmission pathways and characterize outbreaks. Methods We investigated a putative MRSA outbreak in a neonatal intensive care unit. By using rapid high-throughput sequencing technology with a clinically relevant turnaround time, we retrospectively sequenced the DNA from seven isolates associated with the outbreak and another seven MRSA isolates associated with carriage of MRSA or bacteremia in the same hospital. Results We constructed a phylogenetic tree by comparing single-nucleotide polymorphisms (SNPs) in the core genome to a reference genome (an epidemic MRSA clone, EMRSA-15 [sequence type 22]). This revealed a distinct cluster of outbreak isolates and clear separation between these and the nonoutbreak isolates. A previously missed transmission event was detected between two patients with bacteremia who were not part of the outbreak. We created an artificial “resistome” of antibiotic-resistance genes and demonstrated concordance between it and the results of phenotypic susceptibility testing; we also created a “toxome” consisting of toxin genes. One outbreak isolate had a hypermutator phenotype with a higher number of SNPs than the other outbreak isolates, highlighting the difficulty of imposing a simple threshold for the number of SNPs between isolates to decide whether they are part of a recent transmission chain. Conclusions Whole-genome sequencing can provide clinically relevant data within a time frame that can influence patient care. The need for automated data interpretation and the provision of clinically meaningful reports represent hurdles to clinical implementation. (Funded by the U.K. Clinical Research Collaboration Translational Infection Research Initiative and others.) PMID:22693998
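
    As a rough illustration of the SNP comparison this abstract describes, the sketch below counts pairwise SNP differences between aligned core-genome sequences. It is not the authors' pipeline: the function names, the toy isolate labels, and the sequences are all hypothetical, and it assumes the sequences are already aligned to a common reference.

```python
from itertools import combinations


def snp_distance(a: str, b: str) -> int:
    """Count aligned positions where two sequences differ.

    Positions holding a gap ('-') or an ambiguous base ('N') in either
    sequence are skipped, which is an assumption of this sketch.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    ignore = {"-", "N"}
    return sum(
        1
        for x, y in zip(a.upper(), b.upper())
        if x != y and x not in ignore and y not in ignore
    )


def pairwise_snp_distances(alignment: dict[str, str]) -> dict[tuple[str, str], int]:
    """Return the SNP distance for every unordered pair of isolates."""
    return {
        (p, q): snp_distance(alignment[p], alignment[q])
        for p, q in combinations(sorted(alignment), 2)
    }


# Hypothetical toy alignment; real core genomes span millions of bases.
toy_alignment = {
    "outbreak_1": "ACGTACGTAC",
    "outbreak_2": "ACGTACGTAT",
    "hypermutator": "ACTTACGAAT",
    "unrelated": "TCGAACCTGC",
}

distances = pairwise_snp_distances(toy_alignment)
for pair, d in sorted(distances.items(), key=lambda item: item[1]):
    print(pair, d)
```

    In this toy data the "hypermutator" isolate sits further from "outbreak_1" than "outbreak_2" does, which mirrors the abstract's caution that a single fixed SNP threshold may misclassify isolates belonging to the same transmission chain.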

  2. Whole genome duplication events in plant evolution reconstructed and predicted using myosin motor proteins

    PubMed Central

    2013-01-01

    Background The evolution of land plants is characterized by whole genome duplications (WGD), which drove species diversification and evolutionary novelties. Detecting these events is especially difficult if they date back to the origin of the plant kingdom. Established methods for reconstructing WGDs include intra- and inter-genome comparisons, KS age distribution analyses, and phylogenetic tree constructions. Results By analysing 67 completely sequenced plant genomes 775 myosins were identified and manually assembled. Phylogenetic trees of the myosin motor domains revealed orthologous and paralogous relationships and were consistent with recent species trees. Based on the myosin inventories and the phylogenetic trees, we have identified duplications of the entire myosin motor protein family at timings consistent with 23 WGDs, that had been reported before. We also predict 6 WGDs based on further protein family duplications. Notably, the myosin data support the two recently reported WGDs in the common ancestor of all extant angiosperms. We predict single WGDs in the Manihot esculenta and Nicotiana benthamiana lineages, two WGDs for Linum usitatissimum and Phoenix dactylifera, and a triplication or two WGDs for Gossypium raimondii. Our data show another myosin duplication in the ancestor of the angiosperms that could be either the result of a single gene duplication or a remnant of a WGD. Conclusions We have shown that the myosin inventories in angiosperms retain evidence of numerous WGDs that happened throughout plant evolution. In contrast to other protein families, many myosins are still present in extant species. They are closely related and have similar domain architectures, and their phylogenetic grouping follows the genome duplications. Because of its broad taxonomic sampling the dataset provides the basis for reliable future identification of further whole genome duplications. PMID:24053117

  3. Navigating Microbiological Food Safety in the Era of Whole-Genome Sequencing.

    PubMed

    Ronholm, J; Nasheri, Neda; Petronella, Nicholas; Pagotto, Franco

    2016-10-01

    The epidemiological investigation of a foodborne outbreak, including identification of related cases, source attribution, and development of intervention strategies, relies heavily on the ability to subtype the etiological agent at a high enough resolution to differentiate related from nonrelated cases. Historically, several different molecular subtyping methods have been used for this purpose; however, emerging techniques, such as single nucleotide polymorphism (SNP)-based techniques, that use whole-genome sequencing (WGS) offer a resolution that was previously not possible. With WGS, unlike traditional subtyping methods that lack complete information, data can be used to elucidate phylogenetic relationships and disease-causing lineages can be tracked and monitored over time. The subtyping resolution and evolutionary context provided by WGS data allow investigators to connect related illnesses that would be missed by traditional techniques. The added advantage of data generated by WGS is that these data can also be used for secondary analyses, such as virulence gene detection, antibiotic resistance gene profiling, synteny comparisons, mobile genetic element identification, and geographic attribution. In addition, several software packages are now available to generate in silico results for traditional molecular subtyping methods from the whole-genome sequence, allowing for efficient comparison with historical databases. Metagenomic approaches using next-generation sequencing have also been successful in the detection of nonculturable foodborne pathogens. This review addresses state-of-the-art techniques in microbial WGS and analysis and then discusses how this technology can be used to help support food safety investigations. Retrospective outbreak investigations using WGS are presented to provide organism-specific examples of the benefits, and challenges, associated with WGS in comparison to traditional molecular subtyping techniques. PMID:27559074

  4. Comparison of whole genome sequences from human and non-human Escherichia coli O26 strains

    PubMed Central

    Norman, Keri N.; Clawson, Michael L.; Strockbine, Nancy A.; Mandrell, Robert E.; Johnson, Roger; Ziebell, Kim; Zhao, Shaohua; Fratamico, Pina M.; Stones, Robert; Allard, Marc W.; Bono, James L.

    2015-01-01

    Shiga toxin-producing Escherichia coli (STEC) O26 is the second leading E. coli serogroup responsible for human illness outbreaks behind E. coli O157:H7. Recent outbreaks have been linked to emerging pathogenic O26:H11 strains harboring stx2 only. Cattle have been recognized as an important reservoir of O26 strains harboring stx1; however the reservoir of these emerging stx2 strains is unknown. The objective of this study was to identify nucleotide polymorphisms in human and cattle-derived strains in order to compare differences in polymorphism derived genotypes and virulence gene profiles between the two host species. Whole genome sequencing was performed on 182 epidemiologically unrelated O26 strains, including 109 human-derived strains and 73 non-human-derived strains. A panel of 289 O26 strains (241 STEC and 48 non-STEC) was subsequently genotyped using a set of 283 polymorphisms identified by whole genome sequencing, resulting in 64 unique genotypes. Phylogenetic analyses identified seven clusters within the O26 strains. The seven clusters did not distinguish between isolates originating from humans or cattle; however, clusters did correspond with particular virulence gene profiles. Human and non-human-derived strains harboring stx1 clustered separately from strains harboring stx2, strains harboring eae, and non-STEC strains. Strains harboring stx2 were more closely related to non-STEC strains and strains harboring eae than to strains harboring stx1. The finding of human and cattle-derived strains with the same polymorphism derived genotypes and similar virulence gene profiles, provides evidence that similar strains are found in cattle and humans and transmission between the two species may occur. PMID:25815275

  5. Environmental whole-genome amplification to access microbial populations in contaminated sediments

    SciTech Connect

    Abulencia, Carl B; Wyborski, Denise L.; Garcia, Joseph A.; Podar, Mircea; Chen, Wenqiong; Chang, Sherman H.; Chang, Hwai W.; Watson, David B; Brodie, Eoin L.; Hazen, Terry; Keller, Martin

    2006-05-01

    Low-biomass samples from nitrate and heavy metal contaminated soils yield DNA amounts that have limited use for direct, native analysis and screening. Multiple displacement amplification (MDA) using φ29 DNA polymerase was used to amplify whole genomes from environmental, contaminated, subsurface sediments. By first amplifying the genomic DNA (gDNA), biodiversity analysis and gDNA library construction of microbes found in contaminated soils were made possible. The MDA method was validated by analyzing amplified genome coverage from approximately five Escherichia coli cells, resulting in 99.2% genome coverage. The method was further validated by confirming overall representative species coverage and also an amplification bias when amplifying from a mix of eight known bacterial strains. We extracted DNA from samples with extremely low cell densities from a U.S. Department of Energy contaminated site. After amplification, small-subunit rRNA analysis revealed relatively even distribution of species across several major phyla. Clone libraries were constructed from the amplified gDNA, and a small subset of clones was used for shotgun sequencing. BLAST analysis of the library clone sequences showed that 64.9% of the sequences had significant similarities to known proteins, and 'clusters of orthologous groups' (COG) analysis revealed that more than half of the sequences from each library contained sequence similarity to known proteins. The libraries can be readily screened for native genes or any target of interest. Whole-genome amplification of metagenomic DNA from very minute microbial sources, while introducing an amplification bias, will allow access to genomic information that was not previously accessible.

  6. Whole-genome DNA methylation in skin lesions from patients with psoriasis vulgaris.

    PubMed

    Zhang, Peng; Zhao, Ming; Liang, Gongping; Yin, Guangliang; Huang, Dan; Su, Fengxia; Zhai, Hanyue; Wang, Litao; Su, Yuwen; Lu, Qianjin

    2013-03-01

    Psoriasis, a chronic inflammatory skin disorder, is characterized by aberrant keratinocyte proliferation and differentiation in the epidermis. Although the pathogenesis of psoriasis is still incompletely understood, both genetic susceptibilities and environmental triggers are known to act as key players in its development. Several studies have suggested that DNA methylation is involved in the pathogenesis of psoriasis. However, the precise mechanisms underlying the regulation and maintenance of the methylome as well as their relationship with this disease remain poorly characterized. Herein, we used methylated DNA immunoprecipitation sequencing (MeDIP-Seq) to characterize whole-genome DNA methylation patterns in involved and uninvolved skin lesions from patients with psoriasis. The results of our MeDIP-Seq analyses identified differentially methylated regions (DMRs) covering almost the entire genome with sufficient depth and high resolution, showing that the number of hypermethylated DMRs was considerably higher than that of hypomethylated DMRs in involved psoriatic skin samples. Moreover, gene ontology analysis of MeDIP-Seq data showed that the aberrantly methylated genes belonged to several different ontological domains, such as the immune system, cell cycle and apoptosis. The results of the bisulfite-sequencing experiments for the genes PDCD5 and TIMP2 confirmed the methylation status identified by MeDIP-Seq, and the mRNA expression levels of these two genes were consistent with their DNA methylation profiles. To our knowledge, the present study constitutes the first report on MeDIP-Seq in psoriasis. The identification of whole-genome DNA methylation patterns associated with psoriasis provides new insight into the pathogenesis of this complex disease and represents a promising avenue through which to investigate novel therapeutic approaches. PMID:23369618

  7. Integrated clinical, whole-genome, and transcriptome analysis of multisampled lethal metastatic prostate cancer.

    PubMed

    Bova, G Steven; Kallio, Heini M L; Annala, Matti; Kivinummi, Kati; Högnäs, Gunilla; Häyrynen, Sergei; Rantapero, Tommi; Kivinen, Virpi; Isaacs, William B; Tolonen, Teemu; Nykter, Matti; Visakorpi, Tapio

    2016-05-01

    We report the first combined analysis of whole-genome sequence, detailed clinical history, and transcriptome sequence of multiple prostate cancer metastases in a single patient (A21). Whole-genome and transcriptome sequence was obtained from nine anatomically separate metastases, and targeted DNA sequencing was performed in cancerous and noncancerous foci within the primary tumor specimen removed 5 yr before death. Transcriptome analysis revealed increased expression of androgen receptor (AR)-regulated genes in liver metastases that harbored an AR p.L702H mutation, suggesting a dominant effect by the mutation despite being present in only one of an estimated 16 copies per cell. The metastases harbored several alterations to the PI3K/AKT pathway, including a clonal truncal mutation in PIK3CG and present in all metastatic sites studied. The list of truncal genomic alterations shared by all metastases included homozygous deletion of TP53, hemizygous deletion of RB1 and CHD1, and amplification of FGFR1. If the patient were treated today, given this knowledge, the use of second-generation androgen-directed therapies, cessation of glucocorticoid administration, and therapeutic inhibition of the PI3K/AKT pathway or FGFR1 receptor could provide personalized benefit. Three previously unreported truncal clonal missense mutations (ABCC4 p.R891L, ALDH9A1 p.W89R, and ASNA1 p.P75R) were expressed at the RNA level and assessed as druggable. The truncal status of mutations may be critical for effective actionability and merit further study. Our findings suggest that a large set of deeply analyzed cases could serve as a powerful guide to more effective prostate cancer basic science and personalized cancer medicine clinical trials. PMID:27148588

  8. Rapid Whole-Genome Sequencing of Mycobacterium tuberculosis Isolates Directly from Clinical Samples

    PubMed Central

    Brown, Amanda C.; Einer-Jensen, Katja; Holdstock, Jolyon; Houniet, Darren T.; Chan, Jacqueline Z. M.; Depledge, Daniel P.; Nikolayevskyy, Vladyslav; Broda, Agnieszka; Stone, Madeline J.; Christiansen, Mette T.; Williams, Rachel; McAndrew, Michael B.; Tutill, Helena; Brown, Julianne; Melzer, Mark; Rosmarin, Caryn; McHugh, Timothy D.; Shorten, Robert J.; Drobniewski, Francis; Speight, Graham; Breuer, Judith

    2015-01-01

    The rapid identification of antimicrobial resistance is essential for effective treatment of highly resistant Mycobacterium tuberculosis. Whole-genome sequencing provides comprehensive data on resistance mutations and strain typing for monitoring transmission, but unlike for conventional molecular tests, this has previously been achievable only from cultures of M. tuberculosis. Here we describe a method utilizing biotinylated RNA baits designed specifically for M. tuberculosis DNA to capture full M. tuberculosis genomes directly from infected sputum samples, allowing whole-genome sequencing without the requirement of culture. This was carried out on 24 smear-positive sputum samples, collected from the United Kingdom and Lithuania where a matched culture sample was available, and 2 samples that had failed to grow in culture. M. tuberculosis sequencing data were obtained directly from all 24 smear-positive culture-positive sputa, of which 20 were of high quality (>20× depth and >90% of the genome covered). Results were compared with those of conventional molecular and culture-based methods, and high levels of concordance between phenotypical resistance and predicted resistance based on genotype were observed. High-quality sequence data were obtained from one smear-positive culture-negative case. This study demonstrated for the first time the successful and accurate sequencing of M. tuberculosis genomes directly from uncultured sputa. Identification of known resistance mutations within a week of sample receipt offers the prospect for personalized rather than empirical treatment of drug-resistant tuberculosis, including the use of antimicrobial-sparing regimens, leading to improved outcomes. PMID:25972414
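
    The high-quality criterion quoted in this abstract (>20× depth and >90% of the genome covered) can be sketched as a simple check over per-base read depths. This is a minimal sketch, not the authors' code: the function names are hypothetical, and it assumes that "depth" means mean depth across the genome and that a base counts as covered when at least one read overlaps it.

```python
def coverage_summary(depths: list[int], min_depth: int = 1) -> tuple[float, float]:
    """Return (mean depth, fraction of positions with depth >= min_depth)."""
    n = len(depths)
    if n == 0:
        raise ValueError("depths must not be empty")
    mean_depth = sum(depths) / n
    breadth = sum(1 for d in depths if d >= min_depth) / n
    return mean_depth, breadth


def is_high_quality(
    depths: list[int],
    depth_cutoff: float = 20.0,   # ">20x depth" from the abstract
    breadth_cutoff: float = 0.90,  # ">90% of the genome covered"
) -> bool:
    """Apply both cutoffs; the mean-depth interpretation is an assumption."""
    mean_depth, breadth = coverage_summary(depths)
    return mean_depth > depth_cutoff and breadth > breadth_cutoff


# Hypothetical per-base depths for two tiny 100-base "genomes".
good_sample = [30] * 95 + [0] * 5   # mean 28.5x, 95% covered
poor_sample = [30] * 80 + [0] * 20  # mean 24x, only 80% covered

print(coverage_summary(good_sample), is_high_quality(good_sample))
print(coverage_summary(poor_sample), is_high_quality(poor_sample))
```

    The second sample passes the depth cutoff but fails on breadth, which shows why the two thresholds are applied together.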

  9. Whole Genome Sequencing for Genomics-Guided Investigations of Escherichia coli O157:H7 Outbreaks

    PubMed Central

    Rusconi, Brigida; Sanjar, Fatemeh; Koenig, Sara S. K.; Mammel, Mark K.; Tarr, Phillip I.; Eppinger, Mark

    2016-01-01

    Multi isolate whole genome sequencing (WGS) and typing for outbreak investigations has become a reality in the post-genomics era. We applied this technology to strains from Escherichia coli O157:H7 outbreaks. These include isolates from seven North America outbreaks, as well as multiple isolates from the same patient and from different infected individuals in the same household. Customized high-resolution bioinformatics sequence typing strategies were developed to assess the core genome and mobilome plasticity. Sequence typing was performed using an in-house single nucleotide polymorphism (SNP) discovery and validation pipeline. Discriminatory power becomes of particular importance for the investigation of isolates from outbreaks in which macrogenomic techniques such as pulse-field gel electrophoresis or multiple locus variable number tandem repeat analysis do not differentiate closely related organisms. We also characterized differences in the phage inventory, allowing us to identify plasticity among outbreak strains that is not detectable at the core genome level. Our comprehensive analysis of the mobilome identified multiple plasmids that have not previously been associated with this lineage. Applied phylogenomics approaches provide strong molecular evidence for exceptionally little heterogeneity of strains within outbreaks and demonstrate the value of intra-cluster comparisons, rather than basing the analysis on archetypal reference strains. Next generation sequencing and whole genome typing strategies provide the technological foundation for genomic epidemiology outbreak investigation utilizing its significantly higher sample throughput, cost efficiency, and phylogenetic relatedness accuracy. These phylogenomics approaches have major public health relevance in translating information from the sequence-based survey to support timely and informed countermeasures. Polymorphisms identified in this work offer robust phylogenetic signals that index both short- and

  10. Whole-Genome-Based Phylogeny and Divergence of the Genus Brucella

    PubMed Central

    Foster, Jeffrey T.; Beckstrom-Sternberg, Stephen M.; Pearson, Talima; Beckstrom-Sternberg, James S.; Chain, Patrick S. G.; Roberto, Francisco F.; Hnath, Jonathan; Brettin, Tom; Keim, Paul

    2009-01-01

    Brucellae are worldwide bacterial pathogens of livestock and wildlife, but phylogenetic reconstructions have been challenging due to limited genetic diversity. We assessed the taxonomic and evolutionary relationships of five Brucella species—Brucella abortus, B. melitensis, B. suis, B. canis, and B. ovis—using whole-genome comparisons. We developed a phylogeny using single nucleotide polymorphisms (SNPs) from 13 genomes and rooted the tree using the closely related soil bacterium and opportunistic human pathogen, Ochrobactrum anthropi. Whole-genome sequencing and a SNP-based approach provided the requisite level of genetic detail to resolve species in the highly conserved brucellae. Comparisons among the Brucella genomes revealed 20,154 orthologous SNPs that were shared in all genomes. Rooting with Ochrobactrum anthropi reveals that the B. ovis lineage is basal to the rest of the Brucella lineage. We found that B. suis is a highly divergent clade with extensive intraspecific genetic diversity. Furthermore, B. suis was determined to be paraphyletic in our analyses, only forming a monophyletic clade when the B. canis genome was included. Using a molecular clock with these data suggests that most Brucella species diverged from their common B. ovis ancestor in the past 86,000 to 296,000 years, which precedes the domestication of their livestock hosts. Detailed knowledge of the Brucella phylogeny will lead to an improved understanding of the ecology, evolutionary history, and host relationships for this genus and can be used for determining appropriate genotyping approaches for rapid detection and diagnostic assays for molecular epidemiological and clinical studies. PMID:19201792

  11. Environmental Whole-Genome Amplification to Access Microbial Diversity in Contaminated Sediments

    SciTech Connect

    Abulencia, C.B.; Wyborski, D.L.; Garcia, J.; Podar, M.; Chen, W.; Chang, S.H.; Chang, H.W.; Watson, D.; Brodie, E.I.; Hazen, T.C.; Keller, M.

    2005-12-10

    Low-biomass samples from nitrate and heavy metal contaminated soils yield DNA amounts that have limited use for direct, native analysis and screening. Multiple displacement amplification (MDA) using φ29 DNA polymerase was used to amplify whole genomes from environmental, contaminated, subsurface sediments. By first amplifying the genomic DNA (gDNA), biodiversity analysis and gDNA library construction of microbes found in contaminated soils were made possible. The MDA method was validated by analyzing amplified genome coverage from approximately five Escherichia coli cells, resulting in 99.2 percent genome coverage. The method was further validated by confirming overall representative species coverage and also an amplification bias when amplifying from a mix of eight known bacterial strains. We extracted DNA from samples with extremely low cell densities from a U.S. Department of Energy contaminated site. After amplification, small subunit rRNA analysis revealed relatively even distribution of species across several major phyla. Clone libraries were constructed from the amplified gDNA, and a small subset of clones was used for shotgun sequencing. BLAST analysis of the library clone sequences showed that 64.9 percent of the sequences had significant similarities to known proteins, and "clusters of orthologous groups" (COG) analysis revealed that more than half of the sequences from each library contained sequence similarity to known proteins. The libraries can be readily screened for native genes or any target of interest. Whole-genome amplification of metagenomic DNA from very minute microbial sources, while introducing an amplification bias, will allow access to genomic information that was not previously accessible.

  12. Whole-genome sequencing to understand the genetic architecture of common gene expression and biomarker phenotypes

    PubMed Central

    Wood, Andrew R.; Tuke, Marcus A.; Nalls, Mike; Hernandez, Dena; Gibbs, J. Raphael; Lin, Haoxiang; Xu, Christopher S.; Li, Qibin; Shen, Juan; Jun, Goo; Almeida, Marcio; Tanaka, Toshiko; Perry, John R. B.; Gaulton, Kyle; Rivas, Manny; Pearson, Richard; Curran, Joanne E.; Johnson, Matthew P.; Göring, Harald H. H.; Duggirala, Ravindranath; Blangero, John; Mccarthy, Mark I.; Bandinelli, Stefania; Murray, Anna; Weedon, Michael N.; Singleton, Andrew; Melzer, David; Ferrucci, Luigi; Frayling, Timothy M

    2015-01-01

    Initial results from sequencing studies suggest that there are relatively few low-frequency (<5%) variants associated with large effects on common phenotypes. We performed low-pass whole-genome sequencing in 680 individuals from the InCHIANTI study to test two primary hypotheses: (i) that sequencing would detect single low-frequency–large effect variants that explained similar amounts of phenotypic variance as single common variants, and (ii) that some common variant associations could be explained by low-frequency variants. We tested two sets of disease-related common phenotypes for which we had statistical power to detect large numbers of common variant–common phenotype associations—11 132 cis-gene expression traits in 450 individuals and 93 circulating biomarkers in all 680 individuals. From a total of 11 657 229 high-quality variants of which 6 129 221 and 5 528 008 were common and low frequency (<5%), respectively, low frequency–large effect associations comprised 7% of detectable cis-gene expression traits [89 of 1314 cis-eQTLs at P < 1 × 10^−6 (false discovery rate ∼5%)] and one of eight biomarker associations at P < 8 × 10^−10. Very few (30 of 1232; 2%) common variant associations were fully explained by low-frequency variants. Our data show that whole-genome sequencing can identify low-frequency variants undetected by genotyping based approaches when sample sizes are sufficiently large to detect substantial numbers of common variant associations, and that common variant associations are rarely explained by single low-frequency variants of large effect. PMID:25378555
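
    The counts and percentages reported in this abstract can be cross-checked with a few lines of arithmetic. The sketch below only re-derives the rounded figures from the numbers quoted above; the variable and function names are hypothetical and nothing here reproduces the study's analysis.

```python
def percent(part: int, whole: int) -> float:
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


# Variant counts quoted in the abstract.
common_variants = 6_129_221
low_frequency_variants = 5_528_008
total_variants = 11_657_229

# The two classes should add up to the reported total.
counts_consistent = common_variants + low_frequency_variants == total_variants

# 89 of 1314 cis-eQTLs were low-frequency, large-effect associations.
low_freq_eqtl_pct = percent(89, 1314)

# 30 of 1232 common variant associations were fully explained
# by low-frequency variants.
explained_pct = percent(30, 1232)

print(counts_consistent)
print(round(low_freq_eqtl_pct, 1))
print(round(explained_pct, 1))
```

    Rounded to whole numbers, the two percentages come out as 7 and 2, which agrees with the values stated in the abstract.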

  13. Joint assembly and genetic mapping of the Atlantic horseshoe crab genome reveals ancient whole genome duplication

    PubMed Central

    2014-01-01

    Background Horseshoe crabs are marine arthropods with a fossil record extending back approximately 450 million years. They exhibit remarkable morphological stability over their long evolutionary history, retaining a number of ancestral arthropod traits, and are often cited as examples of “living fossils.” As arthropods, they belong to the Ecdysozoa, an ancient super-phylum whose sequenced genomes (including insects and nematodes) have thus far shown more divergence from the ancestral pattern of eumetazoan genome organization than cnidarians, deuterostomes and lophotrochozoans. However, much of ecdysozoan diversity remains unrepresented in comparative genomic analyses. Results Here we apply a new strategy of combined de novo assembly and genetic mapping to examine the chromosome-scale genome organization of the Atlantic horseshoe crab, Limulus polyphemus. We constructed a genetic linkage map of this 2.7 Gbp genome by sequencing the nuclear DNA of 34 wild-collected, full-sibling embryos and their parents at a mean redundancy of 1.1x per sample. The map includes 84,307 sequence markers grouped into 1,876 distinct genetic intervals and 5,775 candidate conserved protein coding genes. Conclusions Comparison with other metazoan genomes shows that the L. polyphemus genome preserves ancestral bilaterian linkage groups, and that a common ancestor of modern horseshoe crabs underwent one or more ancient whole genome duplications 300 million years ago, followed by extensive chromosome fusion. These results provide a counter-example to the often noted correlation between whole genome duplication and evolutionary radiations. The new, low-cost genetic mapping method for obtaining a chromosome-scale view of non-model organism genomes that we demonstrate here does not require laboratory culture, and is potentially applicable to a broad range of other species. PMID:24987520

  14. Whole Genome Sequencing for Genomics-Guided Investigations of Escherichia coli O157:H7 Outbreaks.

    PubMed

    Rusconi, Brigida; Sanjar, Fatemeh; Koenig, Sara S K; Mammel, Mark K; Tarr, Phillip I; Eppinger, Mark

    2016-01-01

    Multi-isolate whole genome sequencing (WGS) and typing for outbreak investigations has become a reality in the post-genomics era. We applied this technology to strains from Escherichia coli O157:H7 outbreaks. These include isolates from seven North American outbreaks, as well as multiple isolates from the same patient and from different infected individuals in the same household. Customized high-resolution bioinformatics sequence typing strategies were developed to assess the core genome and mobilome plasticity. Sequence typing was performed using an in-house single nucleotide polymorphism (SNP) discovery and validation pipeline. Discriminatory power becomes of particular importance for the investigation of isolates from outbreaks in which macrogenomic techniques such as pulsed-field gel electrophoresis or multiple locus variable number tandem repeat analysis do not differentiate closely related organisms. We also characterized differences in the phage inventory, allowing us to identify plasticity among outbreak strains that is not detectable at the core genome level. Our comprehensive analysis of the mobilome identified multiple plasmids that have not previously been associated with this lineage. Applied phylogenomics approaches provide strong molecular evidence for exceptionally little heterogeneity of strains within outbreaks and demonstrate the value of intra-cluster comparisons, rather than basing the analysis on archetypal reference strains. Next generation sequencing and whole genome typing strategies provide the technological foundation for genomic epidemiology outbreak investigation utilizing its significantly higher sample throughput, cost efficiency, and phylogenetic relatedness accuracy. These phylogenomics approaches have major public health relevance in translating information from the sequence-based survey to support timely and informed countermeasures. Polymorphisms identified in this work offer robust phylogenetic signals that index both short- and

  15. Tumor Touch Imprints as Source for Whole Genome Analysis of Neuroblastoma Tumors

    PubMed Central

    Brunner, Clemens; Brunner-Herglotz, Bettina; Ziegler, Andrea; Frech, Christian; Amann, Gabriele; Ladenstein, Ruth; Ambros, Inge M.; Ambros, Peter F.

    2016-01-01

    Introduction: Tumor touch imprints (TTIs) are routinely used for the molecular diagnosis of neuroblastomas by interphase fluorescence in-situ hybridization (I-FISH). However, in order to facilitate a comprehensive, up-to-date molecular diagnosis of neuroblastomas and to identify new markers to refine risk and therapy stratification methods, whole genome approaches are needed. We examined the applicability of an ultra-high density SNP array platform that identifies copy number changes of varying sizes down to a few exons for the detection of genomic changes in tumor DNA extracted from TTIs. Material and Methods: DNAs were extracted from TTIs of 46 neuroblastoma and 4 other pediatric tumors. The DNAs were analyzed on the Cytoscan HD SNP array platform to evaluate numerical and structural genomic aberrations. The quality of the data obtained from TTIs was compared to that from randomly chosen fresh or fresh frozen solid tumors (n = 212) and I-FISH validation was performed. Results: SNP array profiles were obtained from 48 (out of 50) TTI DNAs of which 47 showed genomic aberrations. The high marker density allowed for single gene analysis, e.g. loss of nine exons in the ATRX gene and the visualization of chromothripsis. Data quality was comparable to fresh or fresh frozen tumor SNP profiles. SNP array results were confirmed by I-FISH. Conclusion: TTIs are an excellent source for SNP array processing with the advantage of simple handling, distribution and storage of tumor tissue on glass slides. The minimal amount of tumor tissue needed to analyze whole genomes makes TTIs an economic surrogate source in the molecular diagnostic workup of tumor samples. PMID:27560999

  16. Whole genome investigation of a divergent clade of the pathogen Streptococcus suis

    PubMed Central

    Baig, Abiyad; Weinert, Lucy A.; Peters, Sarah E.; Howell, Kate J.; Chaudhuri, Roy R.; Wang, Jinhong; Holden, Matthew T. G.; Parkhill, Julian; Langford, Paul R.; Rycroft, Andrew N.; Wren, Brendan W.; Tucker, Alexander W.; Maskell, Duncan J.

    2015-01-01

    Streptococcus suis is a major porcine and zoonotic pathogen responsible for significant economic losses in the pig industry and an increasing number of human cases. Multiple isolates of S. suis show marked genomic diversity. Here, we report the analysis of whole genome sequences of nine pig isolates that caused disease typical of S. suis and had phenotypic characteristics of S. suis, but whose genomes were divergent from those of many other S. suis isolates. Comparison of protein sequences predicted from divergent genomes with those from normal S. suis reduced the size of the core genome from 793 to only 397 genes. Divergence was clear if phylogenetic analysis was performed on reduced core genes and MLST alleles. Phylogenies based on certain other genes (16S rRNA, sodA, recN, and cpn60) did not show divergence for all isolates, suggesting recombination between some divergent isolates and normal S. suis for these genes. Indeed, there is evidence of recent recombination between the divergent and normal S. suis genomes for 249 of 397 core genes. In addition, phylogenetic analysis based on the 16S rRNA gene and 132 genes that were conserved between the divergent isolates and representatives of the broader Streptococcus genus showed that divergent isolates were more closely related to S. suis. Six out of nine divergent isolates possessed an S. suis-like capsule region with variation in capsular gene sequences, but the remaining three did not have a discrete capsule locus. The majority (40/70) of virulence-associated genes in normal S. suis were present in the divergent genomes. Overall, the divergent isolates extend the current diversity of the S. suis species, but the phenotypic similarities and the large amount of gene exchange with normal S. suis give insufficient evidence to assign these isolates to a new species or subspecies. Further sampling and whole genome analysis of more isolates are warranted to understand the diversity of the species. PMID:26583006

  17. Whole-Genome Sequencing Identifies Emergence of a Quinolone Resistance Mutation in a Case of Stenotrophomonas maltophilia Bacteremia

    PubMed Central

    Pak, Theodore R.; Altman, Deena R.; Attie, Oliver; Sebra, Robert; Hamula, Camille L.; Lewis, Martha; Deikus, Gintaras; Newman, Leah C.; Fang, Gang; Hand, Jonathan; Patel, Gopi; Wallach, Fran; Schadt, Eric E.; Huprikar, Shirish; van Bakel, Harm; Bashir, Ali

    2015-01-01

    Whole-genome sequences for Stenotrophomonas maltophilia serial isolates from a bacteremic patient before and after development of levofloxacin resistance were assembled de novo and differed by one single-nucleotide variant in smeT, a repressor for multidrug efflux operon smeDEF. Along with sequenced isolates from five contemporaneous cases, they displayed considerable diversity compared against all published complete genomes. Whole-genome sequencing and complete assembly can conclusively identify resistance mechanisms emerging in S. maltophilia strains during clinical therapy. PMID:26324280

  18. Whole-genome sequencing identifies emergence of a quinolone resistance mutation in a case of Stenotrophomonas maltophilia bacteremia.

    PubMed

    Pak, Theodore R; Altman, Deena R; Attie, Oliver; Sebra, Robert; Hamula, Camille L; Lewis, Martha; Deikus, Gintaras; Newman, Leah C; Fang, Gang; Hand, Jonathan; Patel, Gopi; Wallach, Fran; Schadt, Eric E; Huprikar, Shirish; van Bakel, Harm; Kasarskis, Andrew; Bashir, Ali

    2015-11-01

    Whole-genome sequences for Stenotrophomonas maltophilia serial isolates from a bacteremic patient before and after development of levofloxacin resistance were assembled de novo and differed by one single-nucleotide variant in smeT, a repressor for multidrug efflux operon smeDEF. Along with sequenced isolates from five contemporaneous cases, they displayed considerable diversity compared against all published complete genomes. Whole-genome sequencing and complete assembly can conclusively identify resistance mechanisms emerging in S. maltophilia strains during clinical therapy. PMID:26324280

  19. Understanding the Quorum-Sensing Bacterium Pantoea stewartii Strain M009 with Whole-Genome Sequencing Analysis

    PubMed Central

    Tan, Wen-Si; Chang, Chien-Yi; Yin, Wai-Fong

    2015-01-01

    Pantoea stewartii is known to be the causative agent of Stewart’s wilt, which usually affects sweet corn (Zea mays) with the corn flea beetle as the transmission vector. In this work, we present the whole-genome sequence of Pantoea stewartii strain M009, isolated from a Malaysian tropical rainforest waterfall. PMID:25635007

  20. Whole-Genome Sequence of Fish-Pathogenic Mycobacterium sp. Strain 012931, Isolated from Yellowtail (Seriola quinqueradiata).

    PubMed

    Kurokawa, Satoru; Kabayama, Jun; Nho, Seong Won; Hwang, Seong Don; Hikima, Jun-Ichi; Jung, Tae Sung; Kondo, Hidehiro; Hirono, Ikuo; Takeyama, Haruko; Aoki, Takashi

    2013-01-01

    The genus Mycobacterium comprises a large number of well-characterized species, several of which are human and animal pathogens. Here, we report the whole-genome sequence of Mycobacterium sp. strain 012931, a fish pathogen responsible for huge losses in aquaculture farms in Japan. The strain was isolated from a marine fish, yellowtail (Seriola quinqueradiata). PMID:23929466

  1. Whole genome amplification induced bias in the detection of KRAS-mutated cell populations during colorectal carcinoma tissue testing.

    PubMed

    Stranska, Jana; Jancik, Sylwia; Slavkovsky, Rastislav; Holinkova, Veronika; Rabcanova, Miroslava; Vojta, Petr; Hajduch, Marian; Drabek, Jiri

    2015-03-01

    Whole genome amplification replicates the entire DNA content of a sample and can thus help to circumvent material limitations when insufficient DNA is available for planned genetic analyses. However, there are conflicting data in the literature whether whole genome amplification introduces bias or reflects precisely the spectrum of starting DNA. We analyzed the origins of discrepancies in KRAS (Kirsten rat sarcoma viral oncogene homolog gene) mutation detection in six of ten samples amplified using the GenomePlex® Tissue Whole Genome Amplification kit 5 (WGA5; Sigma-Aldrich, St. Louis, MO, USA) and KRAS StripAssay® (KRAS SA; ViennaLab Diagnostics, Vienna, Austria). We undertook reextraction, reamplification, retyping, authentication, reanalysis, and reinterpretation to determine whether the discrepancies originated during the preanalytical, analytical, and/or interpretative phase of genotyping. We conclude that a combination of glass slide/sample heterogeneity and biased amplification due to stochastic effects in the early phases of whole genome amplification (WGA) may have adversely affected the results obtained. Our findings are relevant for both forensic genetics testing and massively parallel sequencing using preamplification. PMID:25655305

  2. Whole genome association mapping of grain shape variation among Oryza sativa L. germplasms based on elliptic Fourier analysis

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Although grain shape is an important cereal breeding target, it has been evaluated using simple measurements, e.g. the length : width ratio. We used elliptic Fourier analysis to evaluate grain shape variation and conducted whole genome association mapping of grain shape using a germplasm collectio...

  3. Whole-Genome Sequences of 15 Strains of Staphylococcus aureus subsp. aureus Isolated from Foodstuff and Human Clinical Samples.

    PubMed

    Crovadore, Julien; Calmin, Gautier; Tonacini, Jenna; Chablais, Romain; Baumgartner, Andreas; Schnyder, Bruno; Hodille, Elisabeth; Lefort, François

    2015-01-01

    The whole-genome sequences of 15 strains of Staphylococcus aureus (10 strains isolated from foodstuff samples in Switzerland and five from human clinical samples) were obtained by Illumina sequencing. Most strains fit within the known diversity for the species, but one (SA-120) possessed a higher G+C content and a higher number of genes than usual. PMID:26112789

  4. A Bacterial Analysis Platform: An Integrated System for Analysing Bacterial Whole Genome Sequencing Data for Clinical Diagnostics and Surveillance

    PubMed Central

    Ahrenfeldt, Johanne; Cisneros, Jose Luis Bellod; Jurtz, Vanessa; Larsen, Mette Voldby; Hasman, Henrik; Aarestrup, Frank Møller; Lund, Ole

    2016-01-01

    Recent advances in whole genome sequencing have made the technology available for routine use in microbiological laboratories. However, a major obstacle for using this technology is the availability of simple and automatic bioinformatics tools. Based on previously published and already available web-based tools we developed a single pipeline for batch uploading of whole genome sequencing data from multiple bacterial isolates. The pipeline will automatically identify the bacterial species and, if applicable, assemble the genome, identify the multilocus sequence type, plasmids, virulence genes and antimicrobial resistance genes. A short printable report for each sample will be provided and an Excel spreadsheet containing all the metadata and a summary of the results for all submitted samples can be downloaded. The pipeline was benchmarked using datasets previously used to test the individual services. The reported results enable a rapid overview of the major results, and comparing that to the previously found results showed that the platform is reliable and able to correctly predict the species and find most of the expected genes automatically. In conclusion, a combined bioinformatics platform was developed and made publicly available, providing easy-to-use automated analysis of bacterial whole genome sequencing data. The platform may be of immediate relevance as a guide for investigators using whole genome sequencing for clinical diagnostics and surveillance. The platform is freely available at: https://cge.cbs.dtu.dk/services/CGEpipeline-1.1 and it is the intention that it will continue to be expanded with new features as these become available. PMID:27327771

  5. Whole-genome sequence of Clostridium lituseburense L74, isolated from the larval gut of the rhinoceros beetle, Trypoxylus dichotomus

    PubMed Central

    Lee, Yookyung; Lim, Sooyeon; Rhee, Moon-Soo; Chang, Dong-Ho; Kim, Byoung-Chan

    2016-01-01

    Clostridium lituseburense L74 was isolated from the larval gut of the rhinoceros beetle, Trypoxylus dichotomus, collected in Yeong-dong, Chuncheongbuk-do, South Korea. It was subjected to whole genome sequencing on the HiSeq platform and annotated with RAST. The nucleotide sequence of this genome was deposited into DDBJ/EMBL/GenBank under the accession NZ_LITJ00000000. PMID:26981432

  6. A searchable, whole genome resource designed for protein variant analysis in diverse lineages of U.S. beef cattle

    Technology Transfer Automated Retrieval System (TEKTRAN)

    A key feature of a gene's function is the variety of protein isoforms it encodes in a population. However, the genetic diversity in bovine whole genome databases tends to be underrepresented because these databases contain an abundance of sequence from the most influential sires. Our first aim was ...

  7. Whole-Genome Sequence of Ralstonia solanacearum P673, a Strain Capable of Infecting Tomato Plants at Low Temperatures

    PubMed Central

    Huguet-Tapia, Jose C.

    2014-01-01

    Ralstonia solanacearum is the causal agent of bacterial wilt, one of the most destructive bacterial plant diseases. We present the whole-genome sequence of the strain P673 (phylotype IIB, sequevar 4). This strain is capable of producing disease in tomato plants at low temperatures. P673 has 311 unique genes. PMID:24558246

  8. GenoFrag: software to design primers optimized for whole genome scanning by long-range PCR amplification

    PubMed Central

    Ben Zakour, Nouri; Gautier, Michel; Andonov, Rumen; Lavenier, Dominique; Cochet, Marie-Françoise; Veber, Philippe; Sorokin, Alexei; Le Loir, Yves

    2004-01-01

    Genome sequence data can be used to analyze genome plasticity by whole genome PCR scanning. Small-sized chromosomes can indeed be fully amplified by long-range PCR with a set of primers designed using a reference strain and applied to several other strains. Analysis of the resulting patterns can reveal the genome plasticity. To facilitate such analysis, we have developed GenoFrag, a software package for the design of primers optimized for whole genome scanning by long-range PCR. GenoFrag was developed for the analysis of Staphylococcus aureus genome plasticity by whole genome amplification in ∼10 kb-long fragments. A set of primers was generated from the genome sequence of S. aureus N315, employed here as a reference strain. Two subsets of primers were successfully used to amplify two portions of the N315 chromosome. This experimental validation demonstrates that GenoFrag is a robust and reliable tool for primer design and that whole genome PCR scanning can be envisaged for the analysis of genome diversity in S. aureus, one of the major public health concerns worldwide. PMID:14704339

  9. Whole-Genome Sequencing Reveals a New Genospecies of Methylobacterium sp. GXS13, Isolated from Vitis vinifera L. Xylem Sap

    PubMed Central

    Lai, Wan Xin; Gan, Han Ming; Hudson, André O.

    2016-01-01

    The whole-genome sequence of a new genospecies of Methylobacterium sp., named GXS13 and isolated from grapevine xylem sap, is reported and demonstrates potential for methylotrophy, cytokinin synthesis, and cell wall modification. In addition, biosynthetic gene clusters were identified for cupriachelin, carotenoid, and acyl-homoserine lactone using the antiSMASH server. PMID:26847900

  10. A whole genome sequence of ‘Candidatus Liberibacter asiaticus’ from Guangdong, China, where HLB was first described

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Citrus Huanglongbing (HLB, yellow shoot disease) has been endemic in Guangdong Province, China, for >100 years. “Candidatus Liberibacter asiaticus” (CLas) is a putative pathogen of HLB and currently unculturable. Here, a draft whole genome sequence of CLas strain A4 from Guangdong is presented. Stra...

  11. Whole genome sequencing of Candidatus Liberibacter asiaticus strain A4 from Guangdong, China, and strain HHCA from California

    Technology Transfer Automated Retrieval System (TEKTRAN)

    “Candidatus Liberibacter asiaticus” is associated with citrus Huanglongbing (HLB) in both China and the United States. While HLB has been known for over a century in Guangdong, China, the disease was first discovered in California in 2012. To better study the “old” and “new” HLBs, whole genomes of “...

  12. Use of Whole-Genome Sequencing to Link Burkholderia pseudomallei from Air Sampling to Mediastinal Melioidosis, Australia

    PubMed Central

    Price, Erin P.; Mayo, Mark; Kaestli, Mirjam; Theobald, Vanessa; Harrington, Ian; Harrington, Glenda; Sarovich, Derek S.

    2015-01-01

    The frequency with which melioidosis results from inhalation rather than percutaneous inoculation or ingestion is unknown. We recovered Burkholderia pseudomallei from air samples at the residence of a patient with presumptive inhalational melioidosis and used whole-genome sequencing to link the environmental bacteria to B. pseudomallei recovered from the patient. PMID:26488732

  13. Investigating Salmonella Eko from Various Sources in Nigeria by Whole Genome Sequencing to Identify the Source of Human Infections.

    PubMed

    Leekitcharoenphon, Pimlapas; Raufu, Ibrahim; Nielsen, Mette T; Rosenqvist Lund, Birthe S; Ameh, James A; Ambali, Abdul G; Sørensen, Gitte; Le Hello, Simon; Aarestrup, Frank M; Hendriksen, Rene S

    2016-01-01

    Twenty-six Salmonella enterica serovar Eko isolates from various sources in Nigeria were investigated by whole genome sequencing to identify the source of human infections. Diversity among the isolates was observed, and camels and cattle were identified as the primary reservoirs and the most likely source of the human infections. PMID:27228329

  14. Detection and Whole-Genome Sequencing of Carbapenemase-Producing Aeromonas hydrophila Isolates from Routine Perirectal Surveillance Culture.

    PubMed

    Hughes, Heather Y; Conlan, Sean P; Lau, Anna F; Dekker, John P; Michelin, Angela V; Youn, Jung-Ho; Henderson, David K; Frank, Karen M; Segre, Julia A; Palmore, Tara N

    2016-04-01

    Perirectal surveillance cultures and a stool culture grew Aeromonas species from three patients over a 6-week period and were without epidemiological links. Detection of the blaKPC-2 gene in one isolate prompted inclusion of non-Enterobacteriaceae in our surveillance culture workup. Whole-genome sequencing confirmed that the isolates were unrelated and provided data for Aeromonas reference genomes. PMID:26888898

  15. Use of Whole-Genome Sequencing to Link Burkholderia pseudomallei from Air Sampling to Mediastinal Melioidosis, Australia.

    PubMed

    Currie, Bart J; Price, Erin P; Mayo, Mark; Kaestli, Mirjam; Theobald, Vanessa; Harrington, Ian; Harrington, Glenda; Sarovich, Derek S

    2015-11-01

    The frequency with which melioidosis results from inhalation rather than percutaneous inoculation or ingestion is unknown. We recovered Burkholderia pseudomallei from air samples at the residence of a patient with presumptive inhalational melioidosis and used whole-genome sequencing to link the environmental bacteria to B. pseudomallei recovered from the patient. PMID:26488732

  16. Whole-Genome Sequences of 15 Strains of Staphylococcus aureus subsp. aureus Isolated from Foodstuff and Human Clinical Samples

    PubMed Central

    Crovadore, Julien; Calmin, Gautier; Tonacini, Jenna; Chablais, Romain; Baumgartner, Andreas; Schnyder, Bruno; Hodille, Elisabeth

    2015-01-01

    The whole-genome sequences of 15 strains of Staphylococcus aureus (10 strains isolated from foodstuff samples in Switzerland and five from human clinical samples) were obtained by Illumina sequencing. Most strains fit within the known diversity for the species, but one (SA-120) possessed a higher G+C content and a higher number of genes than usual. PMID:26112789

  17. Whole-genome sequence of Clostridium lituseburense L74, isolated from the larval gut of the rhinoceros beetle, Trypoxylus dichotomus.

    PubMed

    Lee, Yookyung; Lim, Sooyeon; Rhee, Moon-Soo; Chang, Dong-Ho; Kim, Byoung-Chan

    2016-03-01

    Clostridium lituseburense L74 was isolated from the larval gut of the rhinoceros beetle, Trypoxylus dichotomus, collected in Yeong-dong, Chuncheongbuk-do, South Korea. It was subjected to whole genome sequencing on the HiSeq platform and annotated with RAST. The nucleotide sequence of this genome was deposited into DDBJ/EMBL/GenBank under the accession NZ_LITJ00000000. PMID:26981432

  18. Whole-Genome Draft Sequences of Six Commensal Fecal and Six Mastitis-Associated Escherichia coli Strains of Bovine Origin.

    PubMed

    Leimbach, Andreas; Poehlein, Anja; Witten, Anika; Wellnitz, Olga; Shpigel, Nahum; Petzl, Wolfram; Zerbe, Holm; Daniel, Rolf; Dobrindt, Ulrich

    2016-01-01

    The bovine gastrointestinal tract is a natural reservoir for commensal and pathogenic Escherichia coli strains with the ability to cause mastitis. Here, we report the whole-genome sequences of six E. coli isolates from acute mastitis cases and six E. coli isolates from the feces of udder-healthy cows. PMID:27469942

  19. Investigating Salmonella Eko from Various Sources in Nigeria by Whole Genome Sequencing to Identify the Source of Human Infections

    PubMed Central

    Leekitcharoenphon, Pimlapas; Raufu, Ibrahim; Nielsen, Mette T.; Rosenqvist Lund, Birthe S.; Ameh, James A.; Ambali, Abdul G.; Sørensen, Gitte; Le Hello, Simon; Aarestrup, Frank M.; Hendriksen, Rene S.

    2016-01-01

    Twenty-six Salmonella enterica serovar Eko isolates from various sources in Nigeria were investigated by whole genome sequencing to identify the source of human infections. Diversity among the isolates was observed, and camels and cattle were identified as the primary reservoirs and the most likely source of the human infections. PMID:27228329

  20. Whole-Genome Draft Sequences of Six Commensal Fecal and Six Mastitis-Associated Escherichia coli Strains of Bovine Origin

    PubMed Central

    Leimbach, Andreas; Witten, Anika; Wellnitz, Olga; Shpigel, Nahum; Petzl, Wolfram; Zerbe, Holm; Daniel, Rolf

    2016-01-01

    The bovine gastrointestinal tract is a natural reservoir for commensal and pathogenic Escherichia coli strains with the ability to cause mastitis. Here, we report the whole-genome sequences of six E. coli isolates from acute mastitis cases and six E. coli isolates from the feces of udder-healthy cows. PMID:27469942

  1. Whole-Genome Sequences of Two Campylobacter coli Isolates from the Antimicrobial Resistance Monitoring Program in Colombia

    PubMed Central

    Bernal, Johan F.; Donado-Godoy, Pilar; Valencia, María Fernanda; León, Maribel; Gómez, Yolanda; Rodríguez, Fernando; Agarwala, Richa; Landsman, David

    2016-01-01

    Campylobacter coli, along with Campylobacter jejuni, is a major agent of gastroenteritis and acute enterocolitis in humans. We report the whole-genome sequences of two multidrug-resistant C. coli strains isolated from the Colombian poultry chain. The isolates contain a variety of antimicrobial resistance genes for aminoglycosides, lincosamides, fluoroquinolones, and tetracycline. PMID:26988048

  2. Understanding the Quorum-Sensing Bacterium Pantoea stewartii Strain M009 with Whole-Genome Sequencing Analysis.

    PubMed

    Tan, Wen-Si; Chang, Chien-Yi; Yin, Wai-Fong; Chan, Kok-Gan

    2015-01-01

    Pantoea stewartii is known to be the causative agent of Stewart's wilt, which usually affects sweet corn (Zea mays) with the corn flea beetle as the transmission vector. In this work, we present the whole-genome sequence of Pantoea stewartii strain M009, isolated from a Malaysian tropical rainforest waterfall. PMID:25635007

  3. Whole-genome resequencing of Hanwoo (Korean cattle) and insight into regions of homozygosity

    PubMed Central

    2013-01-01

    Background: Hanwoo (Korean cattle), which originated from natural crossbreeding between taurine and zebu cattle, migrated to the Korean peninsula through North China. Hanwoo were raised as draft animals until the 1970s without the introduction of foreign germplasm. Since 1979, Hanwoo has been bred as beef cattle. Genetic variation was analyzed by whole-genome deep resequencing of a Hanwoo bull. The Hanwoo genome was compared to that of two other breeds, Black Angus and Holstein, and genes within regions of homozygosity were investigated to elucidate the genetic and genomic characteristics of Hanwoo. Results: The Hanwoo bull genome was sequenced to 45.6-fold coverage using the ABI SOLiD system. In total, 4.7 million single-nucleotide polymorphisms and 0.4 million small indels were identified by comparison with the Btau4.0 reference assembly. Of the total number of SNPs and indels, 58% and 87%, respectively, were novel. The overall genotype concordance between the SNPs and BovineSNP50 BeadChip data was 96.4%. Of 1.6 million genetic differences in Hanwoo, approximately 25,000 non-synonymous SNPs, splice-site variants, and coding indels (NS/SS/Is) were detected in 8,360 genes. Among 1,045 genes containing reliable specific NS/SS/Is in Hanwoo, 109 genes contained more than one novel damaging NS/SS/I. Of the genes containing NS/SS/Is, 610 genes were assigned as trait-associated genes. Moreover, 16, 78, and 51 regions of homozygosity (ROHs) were detected in Hanwoo, Black Angus, and Holstein, respectively. ‘Regulation of actin filament length’ was revealed as a significant gene ontology term and 25 trait-associated genes for meat quality and disease resistance were found in 753 genes that resided in the ROHs of Hanwoo. In Hanwoo, 43 genes were located in common ROHs between whole-genome resequencing and SNP chips in BTA2, 10, and 13 coincided with quantitative trait loci for meat fat traits. In addition, the common ROHs in BTA2 and 16 were in agreement between Hanwoo and

  4. Whole-genome sequencing to delineate Mycobacterium tuberculosis outbreaks: a retrospective observational study

    PubMed Central

    Walker, Timothy M; Ip, Camilla LC; Harrell, Ruth H; Evans, Jason T; Kapatai, Georgia; Dedicoat, Martin J; Eyre, David W; Wilson, Daniel J; Hawkey, Peter M; Crook, Derrick W; Parkhill, Julian; Harris, David; Walker, A Sarah; Bowden, Rory; Monk, Philip; Smith, E Grace; Peto, Tim EA

    2013-01-01

    Summary. Background: Tuberculosis incidence in the UK has risen in the past decade. Disease control depends on epidemiological data, which can be difficult to obtain. Whole-genome sequencing can detect microevolution within Mycobacterium tuberculosis strains. We aimed to estimate the genetic diversity of related M tuberculosis strains in the UK Midlands and to investigate how this measurement might be used to investigate community outbreaks. Methods: In a retrospective observational study, we used Illumina technology to sequence M tuberculosis genomes from an archive of frozen cultures. We characterised isolates into four groups: cross-sectional, longitudinal, household, and community. We measured pairwise nucleotide differences within hosts and between hosts in household outbreaks and estimated the rate of change in DNA sequences. We used the findings to interpret network diagrams constructed from 11 community clusters derived from mycobacterial interspersed repetitive-unit–variable-number tandem-repeat data. Findings: We sequenced 390 separate isolates from 254 patients, including representatives from all five major lineages of M tuberculosis. The estimated rate of change in DNA sequences was 0·5 single nucleotide polymorphisms (SNPs) per genome per year (95% CI 0·3–0·7) in longitudinal isolates from 30 individuals and 25 families. Divergence is rarely higher than five SNPs in 3 years. 109 (96%) of 114 paired isolates from individuals and households differed by five or fewer SNPs. More than five SNPs separated isolates from none of 69 epidemiologically linked patients, two (15%) of 13 possibly linked patients, and 13 (17%) of 75 epidemiologically unlinked patients (three-way comparison exact p<0·0001). Genetic trees and clinical and epidemiological data suggest that super-spreaders were present in two community clusters. Interpretation: Whole-genome sequencing can delineate outbreaks of tuberculosis and allows inference about direction of transmission between
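
    The pairwise comparison described in this abstract can be illustrated with a minimal sketch. It assumes isolates are available as equal-length, already-aligned nucleotide strings; the function names, the toy sequences, and the handling of ambiguous bases are hypothetical and are not taken from the study. Only the five-SNP cut-off comes from the abstract.

```python
from itertools import combinations

# "Five or fewer SNPs" is the cut-off reported in the abstract; everything
# else in this sketch (names, input format) is an illustrative assumption.
SNP_THRESHOLD = 5


def snp_distance(seq_a, seq_b):
    """Count positions where two aligned sequences carry different bases.

    Positions where either sequence has a non-ACGT character (e.g. N or a
    gap) are skipped, which is an assumption rather than the study's rule.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(
        1
        for a, b in zip(seq_a, seq_b)
        if a != b and a in "ACGT" and b in "ACGT"
    )


def plausible_links(isolates, threshold=SNP_THRESHOLD):
    """Return (name_a, name_b, distance) for pairs within the threshold."""
    links = []
    for (name_a, seq_a), (name_b, seq_b) in combinations(sorted(isolates.items()), 2):
        distance = snp_distance(seq_a, seq_b)
        if distance <= threshold:
            links.append((name_a, name_b, distance))
    return links


# Toy data: A and B differ at one site; C is distant from both.
toy_isolates = {
    "A": "ACGTACGTAC",
    "B": "ACGTACGTAT",
    "C": "TTTTTTTTTT",
}
links = plausible_links(toy_isolates)
```

    With real data the sequences would be genome-wide variant calls rather than ten-base strings, and a pair within the threshold is only compatible with recent transmission, not proof of it.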

  5. Whole-genome sequencing identifies EN1 as a determinant of bone density and fracture.

    PubMed

    Zheng, Hou-Feng; Forgetta, Vincenzo; Hsu, Yi-Hsiang; Estrada, Karol; Rosello-Diez, Alberto; Leo, Paul J; Dahia, Chitra L; Park-Min, Kyung Hyun; Tobias, Jonathan H; Kooperberg, Charles; Kleinman, Aaron; Styrkarsdottir, Unnur; Liu, Ching-Ti; Uggla, Charlotta; Evans, Daniel S; Nielson, Carrie M; Walter, Klaudia; Pettersson-Kymmer, Ulrika; McCarthy, Shane; Eriksson, Joel; Kwan, Tony; Jhamai, Mila; Trajanoska, Katerina; Memari, Yasin; Min, Josine; Huang, Jie; Danecek, Petr; Wilmot, Beth; Li, Rui; Chou, Wen-Chi; Mokry, Lauren E; Moayyeri, Alireza; Claussnitzer, Melina; Cheng, Chia-Ho; Cheung, Warren; Medina-Gómez, Carolina; Ge, Bing; Chen, Shu-Huang; Choi, Kwangbom; Oei, Ling; Fraser, James; Kraaij, Robert; Hibbs, Matthew A; Gregson, Celia L; Paquette, Denis; Hofman, Albert; Wibom, Carl; Tranah, Gregory J; Marshall, Mhairi; Gardiner, Brooke B; Cremin, Katie; Auer, Paul; Hsu, Li; Ring, Sue; Tung, Joyce Y; Thorleifsson, Gudmar; Enneman, Anke W; van Schoor, Natasja M; de Groot, Lisette C P G M; van der Velde, Nathalie; Melin, Beatrice; Kemp, John P; Christiansen, Claus; Sayers, Adrian; Zhou, Yanhua; Calderari, Sophie; van Rooij, Jeroen; Carlson, Chris; Peters, Ulrike; Berlivet, Soizik; Dostie, Josée; Uitterlinden, Andre G; Williams, Stephen R; Farber, Charles; Grinberg, Daniel; LaCroix, Andrea Z; Haessler, Jeff; Chasman, Daniel I; Giulianini, Franco; Rose, Lynda M; Ridker, Paul M; Eisman, John A; Nguyen, Tuan V; Center, Jacqueline R; Nogues, Xavier; Garcia-Giralt, Natalia; Launer, Lenore L; Gudnason, Vilmunder; Mellström, Dan; Vandenput, Liesbeth; Amin, Najaf; van Duijn, Cornelia M; Karlsson, Magnus K; Ljunggren, Östen; Svensson, Olle; Hallmans, Göran; Rousseau, François; Giroux, Sylvie; Bussière, Johanne; Arp, Pascal P; Koromani, Fjorda; Prince, Richard L; Lewis, Joshua R; Langdahl, Bente L; Hermann, A Pernille; Jensen, Jens-Erik B; Kaptoge, Stephen; Khaw, Kay-Tee; Reeve, Jonathan; Formosa, Melissa M; Xuereb-Anastasi, Angela; Åkesson, Kristina; McGuigan, Fiona E; Garg, Gaurav; Olmos, Jose M; Zarrabeitia, Maria T; Riancho, Jose A; Ralston, Stuart H; Alonso, Nerea; Jiang, Xi; Goltzman, David; Pastinen, Tomi; Grundberg, Elin; Gauguier, Dominique; Orwoll, Eric S; Karasik, David; Davey-Smith, George; Smith, Albert V; Siggeirsdottir, Kristin; Harris, Tamara B; Zillikens, M Carola; van Meurs, Joyce B J; Thorsteinsdottir, Unnur; Maurano, Matthew T; Timpson, Nicholas J; Soranzo, Nicole; Durbin, Richard; Wilson, Scott G; Ntzani, Evangelia E; Brown, Matthew A; Stefansson, Kari; Hinds, David A; Spector, Tim; Cupples, L Adrienne; Ohlsson, Claes; Greenwood, Celia M T; Jackson, Rebecca D; Rowe, David W; Loomis, Cynthia A; Evans, David M; Ackert-Bicknell, Cheryl L; Joyner, Alexandra L; Duncan, Emma L; Kiel, Douglas P; Rivadeneira, Fernando; Richards, J Brent

    2015-10-01

    The extent to which low-frequency (minor allele frequency (MAF) between 1-5%) and rare (MAF ≤ 1%) variants contribute to complex traits and disease in the general population is mainly unknown. Bone mineral density (BMD) is highly heritable, a major predictor of osteoporotic fractures, and has been previously associated with common genetic variants, as well as rare, population-specific, coding variants. Here we identify novel non-coding genetic variants with large effects on BMD (ntotal = 53,236) and fracture (ntotal = 508,253) in individuals of European ancestry from the general population. Associations for BMD were derived from whole-genome sequencing (n = 2,882 from UK10K (ref. 10); a population-based genome sequencing consortium), whole-exome sequencing (n = 3,549), deep imputation of genotyped samples using a combined UK10K/1000 Genomes reference panel (n = 26,534), and de novo replication genotyping (n = 20,271). We identified a low-frequency non-coding variant near a novel locus, EN1, with an effect size fourfold larger than the mean of previously reported common variants for lumbar spine BMD (rs11692564(T), MAF = 1.6%, replication effect size = +0.20 s.d., Pmeta = 2 × 10(-14)), which was also associated with a decreased risk of fracture (odds ratio = 0.85; P = 2 × 10(-11); ncases = 98,742 and ncontrols = 409,511). Using an En1(cre/flox) mouse model, we observed that conditional loss of En1 results in low bone mass, probably as a consequence of high bone turnover. We also identified a novel low-frequency non-coding variant with large effects on BMD near WNT16 (rs148771817(T), MAF = 1.2%, replication effect size = +0.41 s.d., Pmeta = 1 × 10(-11)). In general, there was an excess of association signals arising from deleterious coding and conserved non-coding variants. These findings provide evidence that low-frequency non-coding variants have large effects on BMD and fracture
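
    The frequency classes used in this abstract can be written down as a small helper. This is only a minimal sketch: the function name `classify_maf` is hypothetical, and treating 5% as an inclusive upper bound for "low-frequency" is an assumption, since the abstract only says "between 1-5%". The two MAF values are the ones quoted above for rs11692564 and rs148771817.

```python
def classify_maf(maf: float) -> str:
    """Bin a minor allele frequency (as a fraction) into the abstract's classes.

    Assumed boundaries: rare is MAF <= 1%, low-frequency is 1% < MAF <= 5%,
    and anything above 5% is treated as common.
    """
    if not 0.0 <= maf <= 0.5:
        raise ValueError("a minor allele frequency must lie in [0, 0.5]")
    if maf <= 0.01:
        return "rare"
    if maf <= 0.05:
        return "low-frequency"
    return "common"


# MAF values quoted in the abstract, expressed as fractions.
variants = {"rs11692564": 0.016, "rs148771817": 0.012}
for rsid, maf in variants.items():
    print(rsid, classify_maf(maf))
```

    Under these assumed boundaries both reported variants fall in the low-frequency class, which matches how the abstract describes them.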

  6. Whole-genome sequencing identifies EN1 as a determinant of bone density and fracture

    PubMed Central

    Zheng, Hou-Feng; Forgetta, Vincenzo; Hsu, Yi-Hsiang; Estrada, Karol; Rosello-Diez, Alberto; Leo, Paul J; Dahia, Chitra L; Park-Min, Kyung Hyun; Tobias, Jonathan H; Kooperberg, Charles; Kleinman, Aaron; Styrkarsdottir, Unnur; Liu, Ching-Ti; Uggla, Charlotta; Evans, Daniel S; Nielson, Carrie M; Walter, Klaudia; Pettersson-Kymmer, Ulrika; McCarthy, Shane; Eriksson, Joel; Kwan, Tony; Jhamai, Mila; Trajanoska, Katerina; Memari, Yasin; Min, Josine; Huang, Jie; Danecek, Petr; Wilmot, Beth; Li, Rui; Chou, Wen-Chi; Mokry, Lauren E; Moayyeri, Alireza; Claussnitzer, Melina; Cheng, Chia-Ho; Cheung, Warren; Medina-Gómez, Carolina; Ge, Bing; Chen, Shu-Huang; Choi, Kwangbom; Oei, Ling; Fraser, James; Kraaij, Robert; Hibbs, Matthew A; Gregson, Celia L; Paquette, Denis; Hofman, Albert; Wibom, Carl; Tranah, Gregory J; Marshall, Mhairi; Gardiner, Brooke B; Cremin, Katie; Auer, Paul; Hsu, Li; Ring, Sue; Tung, Joyce Y; Thorleifsson, Gudmar; Enneman, Anke W; van Schoor, Natasja M; de Groot, Lisette C.P.G.M.; van der Velde, Nathalie; Melin, Beatrice; Kemp, John P; Christiansen, Claus; Sayers, Adrian; Zhou, Yanhua; Calderari, Sophie; van Rooij, Jeroen; Carlson, Chris; Peters, Ulrike; Berlivet, Soizik; Dostie, Josée; Uitterlinden, Andre G; Williams, Stephen R.; Farber, Charles; Grinberg, Daniel; LaCroix, Andrea Z; Haessler, Jeff; Chasman, Daniel I; Giulianini, Franco; Rose, Lynda M; Ridker, Paul M; Eisman, John A; Nguyen, Tuan V; Center, Jacqueline R; Nogues, Xavier; Garcia-Giralt, Natalia; Launer, Lenore L; Gudnason, Vilmunder; Mellström, Dan; Vandenput, Liesbeth; Karlsson, Magnus K; Ljunggren, Östen; Svensson, Olle; Hallmans, Göran; Rousseau, François; Giroux, Sylvie; Bussière, Johanne; Arp, Pascal P; Koromani, Fjorda; Prince, Richard L; Lewis, Joshua R; Langdahl, Bente L; Hermann, A Pernille; Jensen, Jens-Erik B; Kaptoge, Stephen; Khaw, Kay-Tee; Reeve, Jonathan; Formosa, Melissa M; Xuereb-Anastasi, Angela; Åkesson, Kristina; McGuigan, Fiona E; Garg, Gaurav; Olmos, Jose M; 
Zarrabeitia, Maria T; Riancho, Jose A; Ralston, Stuart H; Alonso, Nerea; Jiang, Xi; Goltzman, David; Pastinen, Tomi; Grundberg, Elin; Gauguier, Dominique; Orwoll, Eric S; Karasik, David; Davey-Smith, George; Smith, Albert V; Siggeirsdottir, Kristin; Harris, Tamara B; Zillikens, M Carola; van Meurs, Joyce BJ; Thorsteinsdottir, Unnur; Maurano, Matthew T; Timpson, Nicholas J; Soranzo, Nicole; Durbin, Richard; Wilson, Scott G; Ntzani, Evangelia E; Brown, Matthew A; Stefansson, Kari; Hinds, David A; Spector, Tim; Cupples, L Adrienne; Ohlsson, Claes; Greenwood, Celia MT; Jackson, Rebecca D; Rowe, David W; Loomis, Cynthia A; Evans, David M; Ackert-Bicknell, Cheryl L; Joyner, Alexandra L; Duncan, Emma L; Kiel, Douglas P; Rivadeneira, Fernando; Richards, J Brent

    2016-01-01

    SUMMARY: The extent to which low-frequency (minor allele frequency [MAF] between 1–5%) and rare (MAF ≤ 1%) variants contribute to complex traits and disease in the general population is largely unknown. Bone mineral density (BMD) is highly heritable, is a major predictor of osteoporotic fractures and has been previously associated with common genetic variants (refs. 1–8), and rare, population-specific, coding variants (ref. 9). Here we identify novel non-coding genetic variants with large effects on BMD (ntotal = 53,236) and fracture (ntotal = 508,253) in individuals of European ancestry from the general population. Associations for BMD were derived from whole-genome sequencing (n = 2,882 from UK10K), whole-exome sequencing (n = 3,549), deep imputation of genotyped samples using a combined UK10K/1000 Genomes reference panel (n = 26,534), and de novo replication genotyping (n = 20,271). We identified a low-frequency non-coding variant near a novel locus, EN1, with an effect size 4-fold larger than the mean of previously reported common variants for lumbar spine BMD (ref. 8) (rs11692564[T], MAF = 1.7%, replication effect size = +0.20 standard deviations [SD], Pmeta = 2 × 10(-14)), which was also associated with a decreased risk of fracture (OR = 0.85; P = 2 × 10(-11); ncases = 98,742 and ncontrols = 409,511). Using an En1(Cre/flox) mouse model, we observed that conditional loss of En1 results in low bone mass, likely as a consequence of high bone turnover. We also identified a novel low-frequency non-coding variant with large effects on BMD near WNT16 (rs148771817[T], MAF = 1.1%, replication effect size = +0.39 SD, Pmeta = 1 × 10(-11)). In general, there was an excess of association signals arising from deleterious coding and conserved non-coding variants. These findings provide evidence that low-frequency non-coding variants have large effects on BMD and fracture, thereby providing rationale for whole-genome sequencing and improved imputation reference panels to study the genetic architecture of

  7. Ethical and legal implications of whole genome and whole exome sequencing in African populations

    PubMed Central

    2013-01-01

    Background: Rapid advances in high throughput genomic technologies and next generation sequencing are making medical genomic research more readily accessible and affordable, including the sequencing of patient and control whole genomes and exomes in order to elucidate genetic factors underlying disease. Over the next five years, the Human Heredity and Health in Africa (H3Africa) Initiative, funded by the Wellcome Trust (United Kingdom) and the National Institutes of Health (United States of America), will contribute greatly towards sequencing of numerous African samples for biomedical research. Discussion: Funding agencies and journals often require submission of genomic data from research participants to databases that allow open or controlled data access for all investigators. Access to such genotype-phenotype and pedigree data, however, needs careful control in order to prevent identification of individuals or families. This is particularly the case in Africa, where many researchers and their patients are inexperienced in the ethical issues accompanying whole genome and exome research; and where an historical unidirectional flow of samples and data out of Africa has created a sense of exploitation and distrust. In the current study, we analysed the implications of the anticipated surge of next generation sequencing data in Africa and the subsequent data sharing concepts on the protection of privacy of research subjects. We performed a retrospective analysis of the informed consent process for the continent and the rest of the world and examined relevant legislation, both current and proposed. We investigated the following issues: (i) informed consent, including guidelines for performing culturally-sensitive next generation sequencing research in Africa and availability of suitable informed consent documents; (ii) data security and subject privacy whilst practicing data sharing; (iii) conveying the implications of such concepts to research participants in resource

  8. Quantitative trait loci markers derived from whole genome sequence data increases the reliability of genomic prediction.

    PubMed

    Brøndum, R F; Su, G; Janss, L; Sahana, G; Guldbrandtsen, B; Boichard, D; Lund, M S

    2015-06-01

    This study investigated the effect on the reliability of genomic prediction when a small number of significant variants from single marker analysis based on whole genome sequence data were added to the regular 54k single nucleotide polymorphism (SNP) array data. The extra markers were selected with the aim of augmenting the custom low-density Illumina BovineLD SNP chip (San Diego, CA) used in the Nordic countries. The single-marker analysis was done breed-wise on all 16 index traits included in the breeding goals for Nordic Holstein, Danish Jersey, and Nordic Red cattle plus the total merit index itself. Depending on the trait's economic weight, 15, 10, or 5 quantitative trait loci (QTL) were selected per trait per breed and 3 to 5 markers were selected to tag each QTL. After removing duplicate markers (same marker selected for more than one trait or breed) and filtering for high pairwise linkage disequilibrium and assaying performance on the array, a total of 1,623 QTL markers were selected for inclusion on the custom chip. Genomic prediction analyses were performed for Nordic and French Holstein and Nordic Red animals using either a genomic BLUP or a Bayesian variable selection model. When using the genomic BLUP model including the QTL markers in the analysis, reliability was increased by up to 4 percentage points for production traits in Nordic Holstein animals, up to 3 percentage points for Nordic Reds, and up to 5 percentage points for French Holstein. Smaller gains of up to 1 percentage point were observed for mastitis, but only a 0.5 percentage point increase was seen for fertility. When using a Bayesian model, accuracies were generally higher with only 54k data compared with the genomic BLUP approach, but increases in reliability were relatively smaller when QTL markers were included. Results from this study indicate that the reliability of genomic prediction can be increased by including markers significant in genome-wide association studies on whole genome

  9. Effective normalization for copy number variation detection from whole genome sequencing

    PubMed Central

    2012-01-01

    Background Whole genome sequencing enables a high resolution view of the human genome and provides unique insights into genome structure at an unprecedented scale. There have been a number of tools to infer copy number variation in the genome. These tools, while validated, also include a number of parameters that are configurable to genome data being analyzed. These algorithms allow for normalization to account for individual and population-specific effects on individual genome CNV estimates but the impact of these changes on the estimated CNVs is not well characterized. We evaluate in detail the effect of normalization methodologies in two CNV algorithms FREEC and CNV-seq using whole genome sequencing data from 8 individuals spanning four populations. Methods We apply FREEC and CNV-seq to a sequencing data set consisting of 8 genomes. We use multiple configurations corresponding to different read-count normalization methodologies in FREEC, and statistically characterize the concordance of the CNV calls between FREEC configurations and the analogous output from CNV-seq. The normalization methodologies evaluated in FREEC are: GC content, mappability and control genome. We further stratify the concordance analysis within genic, non-genic, and a collection of validated variant regions. Results The GC content normalization methodology generates the highest number of altered copy number regions. Both mappability and control genome normalization reduce the total number and length of copy number regions. Mappability normalization yields Jaccard indices in the 0.07 - 0.3 range, whereas using a control genome normalization yields Jaccard index values around 0.4 with normalization based on GC content. The most critical impact of using mappability as a normalization factor is substantial reduction of deletion CNV calls. 
The output of another method based on control genome normalization, CNV-seq, resulted in comparable CNV call profiles, and substantial agreement in variable
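
    The Jaccard index quoted in this abstract compares two sets of CNV calls as the size of their intersection divided by the size of their union. The abstract does not say whether the overlap was measured in base pairs or in whole regions, so the sketch below assumes a base-pair measure on a single chromosome. The helper names and the example coordinates are hypothetical and are not FREEC or CNV-seq output.

```python
from typing import Iterable, List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) on one chromosome


def merge(intervals: Iterable[Interval]) -> List[Interval]:
    """Merge overlapping or touching intervals into a sorted, disjoint list."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def covered(intervals: Iterable[Interval]) -> int:
    """Number of base pairs covered by at least one interval."""
    return sum(end - start for start, end in merge(intervals))


def jaccard(calls_a: Iterable[Interval], calls_b: Iterable[Interval]) -> float:
    """Base-pair Jaccard index between two CNV call sets."""
    a, b = list(calls_a), list(calls_b)
    union = covered(a + b)
    if union == 0:
        return 0.0  # assumed convention: two empty call sets score 0
    # Inclusion-exclusion: |A and B| = |A| + |B| - |A or B|
    intersection = covered(a) + covered(b) - union
    return intersection / union


# Hypothetical calls from two normalization configurations.
gc_calls = [(100, 200), (300, 400)]
mappability_calls = [(150, 250), (300, 350)]
print(jaccard(gc_calls, mappability_calls))
```

    A region-level variant (counting calls that overlap at all) would give different numbers, so the choice of measure should be stated alongside any reported index.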

  10. Whole-genome sequencing reveals the effect of vaccination on the evolution of Bordetella pertussis

    PubMed Central

    Xu, Yinghua; Liu, Bin; Gröndahl-Yli-Hannuksila, Kirsi; Tan, Yajun; Feng, Lu; Kallonen, Teemu; Wang, Lichan; Peng, Ding; He, Qiushui; Wang, Lei; Zhang, Shumin

    2015-01-01

    Herd immunity can potentially induce a change of circulating viruses. However, how bacterial pathogens adapt to vaccination remains largely unknown. In this study, Bordetella pertussis, the causative agent of whooping cough, was selected as an example to explore the possible effect of vaccination on the bacterial pathogen. We sequenced and analysed the complete genomes of 40 B. pertussis strains from Finland and China, as well as 11 previously sequenced strains from the Netherlands, where different vaccination strategies have been used over the past 50 years. The results showed that the molecular clock moved at different rates in these countries and in distinct periods, which suggested that evolution of the B. pertussis population was closely associated with each country's vaccination coverage. Comparative whole-genome analyses indicated that evolution in this human-restricted pathogen was mainly characterised by ongoing genetic shift and gene loss. Furthermore, 116 SNPs were specifically detected in currently circulating ptxP3-containing strains. The finding might explain the successful emergence of this lineage and its spread worldwide. Collectively, our results suggest that the immune pressure of vaccination is one major driving force for the evolution of B. pertussis, which facilitates further exploration of the pathogenicity of B. pertussis. PMID:26283022

  11. Whole-genome sequencing reveals the effect of vaccination on the evolution of Bordetella pertussis.

    PubMed

    Xu, Yinghua; Liu, Bin; Gröndahl-Yli-Hannuksila, Kirsi; Tan, Yajun; Feng, Lu; Kallonen, Teemu; Wang, Lichan; Peng, Ding; He, Qiushui; Wang, Lei; Zhang, Shumin

    2015-01-01

    Herd immunity can potentially induce a change of circulating viruses. However, how bacterial pathogens adapt to vaccination remains largely unknown. In this study, Bordetella pertussis, the causative agent of whooping cough, was selected as an example to explore the possible effect of vaccination on the bacterial pathogen. We sequenced and analysed the complete genomes of 40 B. pertussis strains from Finland and China, as well as 11 previously sequenced strains from the Netherlands, where different vaccination strategies have been used over the past 50 years. The results showed that the molecular clock moved at different rates in these countries and in distinct periods, which suggested that evolution of the B. pertussis population was closely associated with each country's vaccination coverage. Comparative whole-genome analyses indicated that evolution in this human-restricted pathogen was mainly characterised by ongoing genetic shift and gene loss. Furthermore, 116 SNPs were specifically detected in currently circulating ptxP3-containing strains. The finding might explain the successful emergence of this lineage and its spread worldwide. Collectively, our results suggest that the immune pressure of vaccination is one major driving force for the evolution of B. pertussis, which facilitates further exploration of the pathogenicity of B. pertussis. PMID:26283022

  12. C. elegans whole-genome sequencing reveals mutational signatures related to carcinogens and DNA repair deficiency

    PubMed Central

    Meier, Bettina; Cooke, Susanna L.; Weiss, Joerg; Bailly, Aymeric P.; Alexandrov, Ludmil B.; Marshall, John; Raine, Keiran; Maddison, Mark; Anderson, Elizabeth; Stratton, Michael R.; Campbell, Peter J.

    2014-01-01

    Mutation is associated with developmental and hereditary disorders, aging, and cancer. While we understand some mutational processes operative in human disease, most remain mysterious. We used Caenorhabditis elegans whole-genome sequencing to model mutational signatures, analyzing 183 worm populations across 17 DNA repair-deficient backgrounds propagated for 20 generations or exposed to carcinogens. The baseline mutation rate in C. elegans was approximately one per genome per generation, not overtly altered across several DNA repair deficiencies over 20 generations. Telomere erosion led to complex chromosomal rearrangements initiated by breakage–fusion–bridge cycles and completed by simultaneously acquired, localized clusters of breakpoints. Aflatoxin B1 induced substitutions of guanines in a GpC context, as observed in aflatoxin-induced liver cancers. Mutational burden increased with impaired nucleotide excision repair. Cisplatin and mechlorethamine, DNA crosslinking agents, caused dose- and genotype-dependent signatures among indels, substitutions, and rearrangements. Strikingly, both agents induced clustered rearrangements resembling “chromoanasynthesis,” a replication-based mutational signature seen in constitutional genomic disorders, suggesting that interstrand crosslinks may play a pathogenic role in such events. Cisplatin mutagenicity was most pronounced in xpf-1 mutants, suggesting that this gene critically protects cells against platinum chemotherapy. Thus, experimental model systems combined with genome sequencing can recapture and mechanistically explain mutational signatures associated with human disease. PMID:25030888

  13. Prospective Whole-Genome Sequencing Enhances National Surveillance of Listeria monocytogenes.

    PubMed

    Kwong, Jason C; Mercoulia, Karolina; Tomita, Takehiro; Easton, Marion; Li, Hua Y; Bulach, Dieter M; Stinear, Timothy P; Seemann, Torsten; Howden, Benjamin P

    2016-02-01

    Whole-genome sequencing (WGS) has emerged as a powerful tool for comparing bacterial isolates in outbreak detection and investigation. Here we demonstrate that WGS performed prospectively for national epidemiologic surveillance of Listeria monocytogenes has the capacity to be superior to our current approaches using pulsed-field gel electrophoresis (PFGE), multilocus sequence typing (MLST), multilocus variable-number tandem-repeat analysis (MLVA), binary typing, and serotyping. Initially 423 L. monocytogenes isolates underwent WGS, and comparisons uncovered a diverse genetic population structure derived from three distinct lineages. MLST, binary typing, and serotyping results inferred in silico from the WGS data were highly concordant (>99%) with laboratory typing performed in parallel. However, WGS was able to identify distinct nested clusters within groups of isolates that were otherwise indistinguishable using our current typing methods. Routine WGS was then used for prospective epidemiologic surveillance on a further 97 L. monocytogenes isolates over a 12-month period, which provided a greater level of discrimination than that of conventional typing for inferring linkage to point source outbreaks. A risk-based alert system based on WGS similarity was used to inform epidemiologists required to act on the data. Our experience shows that WGS can be adopted for prospective L. monocytogenes surveillance and investigated for other pathogens relevant to public health. PMID:26607978

  14. Deep Whole-Genome Sequencing to Detect Mixed Infection of Mycobacterium tuberculosis

    PubMed Central

    Gan, Mingyu; Liu, Qingyun; Yang, Chongguang; Gao, Qian; Luo, Tao

    2016-01-01

    Mixed infection by multiple Mycobacterium tuberculosis (MTB) strains is associated with poor treatment outcome of tuberculosis (TB). Traditional genotyping methods have been used to detect mixed infections of MTB; however, their sensitivity and resolution are limited. Deep whole-genome sequencing (WGS) has proven highly sensitive and discriminative for studying population heterogeneity of MTB. Here, we developed a phylogenetic-based method to detect MTB mixed infections using WGS data. We collected published WGS data of 782 global MTB strains from public databases. We called homogeneous and heterogeneous single nucleotide variations (SNVs) of individual strains by mapping short reads to the ancestral MTB reference genome. We constructed a phylogenomic database based on 68,639 homogeneous SNVs of 652 MTB strains. Mixed infections were determined if multiple evolutionary paths were identified by mapping the SNVs of individual samples to the phylogenomic database. By simulation, our method could specifically detect mixed infections when the sequencing depth of minor strains was as low as 1× coverage, and when the genomic distance of two mixed strains was as small as 16 SNVs. By applying our methods to all 782 samples, we detected 47 mixed infections and 45 of them were caused by locally endemic strains. The results indicate that our method is highly sensitive and discriminative for identifying mixed infections from deep WGS data of MTB isolates. PMID:27391214

  15. Early vertebrate whole genome duplications were predated by a period of intense genome rearrangement

    PubMed Central

    Hufton, Andrew L.; Groth, Detlef; Vingron, Martin; Lehrach, Hans; Poustka, Albert J.; Panopoulou, Georgia

    2008-01-01

    Researchers, supported by data from polyploid plants, have suggested that whole genome duplication (WGD) may induce genomic instability and rearrangement, an idea which could have important implications for vertebrate evolution. Benefiting from the newly released amphioxus genome sequence (Branchiostoma floridae), an invertebrate that researchers have hoped is representative of the ancestral chordate genome, we have used gene proximity conservation to estimate rates of genome rearrangement throughout vertebrates and some of their invertebrate ancestors. We find that, while amphioxus remains the best single source of invertebrate information about the early chordate genome, its genome structure is not particularly well conserved and it cannot be considered a fossilization of the vertebrate preduplication genome. In agreement with previous reports, we identify two WGD events in early vertebrates and another in teleost fish. However, we find that the early vertebrate WGD events were not followed by increased rates of genome rearrangement. Indeed, we measure massive genome rearrangement prior to these WGD events. We propose that the vertebrate WGD events may have been symptoms of a preexisting predisposition toward genomic structural change. PMID:18625908

  16. Long-read, whole-genome shotgun sequence data for five model organisms

    PubMed Central

    Kim, Kristi E; Peluso, Paul; Babayan, Primo; Yeadon, P. Jane; Yu, Charles; Fisher, William W; Chin, Chen-Shan; Rapicavoli, Nicole A; Rank, David R; Li, Joachim; Catcheside, David E. A; Celniker, Susan E; Phillippy, Adam M; Bergman, Casey M; Landolin, Jane M

    2014-01-01

    Single molecule, real-time (SMRT) sequencing from Pacific Biosciences is increasingly used in many areas of biological research including de novo genome assembly, structural-variant identification, haplotype phasing, mRNA isoform discovery, and base-modification analyses. High-quality, public datasets of SMRT sequences can spur development of analytic tools that can accommodate unique characteristics of SMRT data (long read lengths, lack of GC or amplification bias, and a random error profile leading to high consensus accuracy). In this paper, we describe eight high-coverage SMRT sequence datasets from five organisms (Escherichia coli, Saccharomyces cerevisiae, Neurospora crassa, Arabidopsis thaliana, and Drosophila melanogaster) that have been publicly released to the general scientific community (NCBI Sequence Read Archive ID SRP040522). Data were generated using two sequencing chemistries (P4C2 and P5C3) on the PacBio RS II instrument. The datasets reported here can be used without restriction by the research community to generate whole-genome assemblies, test new algorithms, investigate genome structure and evolution, and identify base modifications in some of the most widely-studied model systems in biological research. PMID:25977796

  17. Application of Whole-Genome Sequencing for Bacterial Strain Typing in Molecular Epidemiology

    PubMed Central

    SenGupta, Dhruba J.; Cummings, Lisa A.; Land, Tyler A.; Hoogestraat, Daniel R.; Cookson, Brad T.

    2015-01-01

    Nosocomial infections pose a significant threat to patient health; however, the gold standard laboratory method for determining bacterial relatedness (pulsed-field gel electrophoresis [PFGE]) remains essentially unchanged 20 years after its introduction. Here, we explored bacterial whole-genome sequencing (WGS) as an alternative approach for molecular strain typing. We compared WGS to PFGE for investigating presumptive outbreaks involving three important pathogens: vancomycin-resistant Enterococcus faecium (n = 19), methicillin-resistant Staphylococcus aureus (n = 17), and Acinetobacter baumannii (n = 15). WGS was highly reproducible (average ≤ 0.39 differences between technical replicates), which enabled a functional, quantitative definition for determining clonality. Strain relatedness data determined by PFGE and WGS roughly correlated, but the resolution of WGS was superior (P = 5.6 × 10(-8) to 0.016). Several discordant results were noted between the methods. A total of 28.9% of isolates which were indistinguishable by PFGE were nonclonal by WGS. For A. baumannii, a species known to undergo rapid horizontal gene transfer, 16.2% of isolate pairs considered nonidentical by PFGE were clonal by WGS. Sequencing whole bacterial genomes with single-nucleotide resolution demonstrates that PFGE is prone to false-positive and false-negative results and suggests the need for a new gold standard approach for molecular epidemiological strain typing. PMID:25631811
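
    A quantitative definition of clonality of the kind described here can be illustrated by counting the sites at which two isolates differ and comparing that count with a cut-off. The abstract does not state the cut-off or the exact distance measure, so everything below is an assumption for illustration: the function names, the `max_differences` value of 2, and the toy variant calls are all hypothetical.

```python
from itertools import combinations
from typing import Dict, Tuple

# Variant calls for one isolate: genome position -> called base.
# A position missing from the dict is taken to match the reference.
Calls = Dict[int, str]


def snv_distance(a: Calls, b: Calls) -> int:
    """Count positions where the two isolates carry different calls."""
    return sum(1 for pos in a.keys() | b.keys() if a.get(pos) != b.get(pos))


def pairwise_clonality(
    isolates: Dict[str, Calls], max_differences: int
) -> Dict[Tuple[str, str], Tuple[int, bool]]:
    """Return (distance, is_clonal) for every pair of isolates.

    max_differences is a hypothetical threshold, not a value from the study.
    """
    result = {}
    for name_a, name_b in combinations(sorted(isolates), 2):
        distance = snv_distance(isolates[name_a], isolates[name_b])
        result[(name_a, name_b)] = (distance, distance <= max_differences)
    return result


# Toy data, invented for illustration only.
isolates = {
    "iso1": {1021: "A", 5530: "T"},
    "iso2": {1021: "A", 5530: "T"},
    "iso3": {1021: "A", 77810: "G", 90412: "C", 120033: "T"},
}
for pair, (distance, clonal) in pairwise_clonality(isolates, max_differences=2).items():
    print(pair, distance, "clonal" if clonal else "nonclonal")
```

    In practice the threshold would be set with reference to the measured technical-replicate noise, which the abstract reports as an average of at most 0.39 differences.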

  18. Application of whole-genome sequencing for bacterial strain typing in molecular epidemiology.

    PubMed

    Salipante, Stephen J; SenGupta, Dhruba J; Cummings, Lisa A; Land, Tyler A; Hoogestraat, Daniel R; Cookson, Brad T

    2015-04-01

    Nosocomial infections pose a significant threat to patient health; however, the gold standard laboratory method for determining bacterial relatedness (pulsed-field gel electrophoresis [PFGE]) remains essentially unchanged 20 years after its introduction. Here, we explored bacterial whole-genome sequencing (WGS) as an alternative approach for molecular strain typing. We compared WGS to PFGE for investigating presumptive outbreaks involving three important pathogens: vancomycin-resistant Enterococcus faecium (n=19), methicillin-resistant Staphylococcus aureus (n=17), and Acinetobacter baumannii (n=15). WGS was highly reproducible (average≤0.39 differences between technical replicates), which enabled a functional, quantitative definition for determining clonality. Strain relatedness data determined by PFGE and WGS roughly correlated, but the resolution of WGS was superior (P=5.6×10(-8) to 0.016). Several discordant results were noted between the methods. A total of 28.9% of isolates which were indistinguishable by PFGE were nonclonal by WGS. For A. baumannii, a species known to undergo rapid horizontal gene transfer, 16.2% of isolate pairs considered nonidentical by PFGE were clonal by WGS. Sequencing whole bacterial genomes with single-nucleotide resolution demonstrates that PFGE is prone to false-positive and false-negative results and suggests the need for a new gold standard approach for molecular epidemiological strain typing. PMID:25631811

  19. Use of whole genome shotgun metagenomics: a practical guide for the microbiome-minded physician scientist.

    PubMed

    Ma, Jun; Prince, Amanda; Aagaard, Kjersti M

    2014-01-01

    Whole genome shotgun sequencing (WGS) has been increasingly recognized as the most comprehensive and robust approach for metagenomics research. When compared with 16S-based metagenomics, it offers the advantage of identification of species level taxonomy and the estimation of metabolic pathway activities from human and environmental samples. Several large-scale metagenomic projects have been recently conducted or are currently underway utilizing WGS. With the generation of vast amounts of data, the bioinformatics and computational analysis of WGS results become vital for the success of a metagenomics study. However, each step in the WGS data analysis, including metagenome assembly, gene prediction, taxonomy identification, function annotation, and pathway analysis, is complicated by the sheer amount of data. Algorithms and tools have been developed specifically to handle WGS-generated metagenomics data with the hope of reducing the requirement on computational time and storage space. Here, we present an overview of the current state of metagenomics through WGS sequencing, challenges frequently encountered, and up-to-date solutions. Several applications that are uniquely applicable to microbiome studies in reproductive and perinatal medicine are also discussed. PMID:24390915

  20. The genomic landscape of fibrolamellar hepatocellular carcinoma: whole genome sequencing of ten patients

    PubMed Central

    Darcy, David G.; Chiaroni-Clarke, Rachel; Murphy, Jennifer M.; Honeyman, Joshua N.; Bhanot, Umesh; LaQuaglia, Michael P.; Simon, Sanford M.

    2015-01-01

    Fibrolamellar hepatocellular carcinoma is a rare, malignant liver tumor that often arises in the otherwise normal liver of adolescents and young adults. Previous studies have focused on biomarkers and comparisons to traditional hepatocellular carcinoma, and have yielded little data on the underlying pathophysiology. We performed whole genome sequencing on paired tumor and normal samples from 10 patients to identify recurrent mutations and structural variations that could predispose to oncogenesis. There are relatively few coding, somatic mutations in this cancer, putting it on the low end of the mutational spectrum. Aside from a previously described heterozygous deletion on chromosome 19 that encodes a functional, chimeric protein, there were no other recurrent structural variations that contribute to the tumor genotype. The lack of a second-hit mutation in the genomic landscape of fibrolamellar hepatocellular carcinoma makes the DNAJB1-PRKACA fusion protein the best target for diagnostic and therapeutic advancements. The mutations, altered pathways and structural variants that characterized fibrolamellar hepatocellular carcinoma were distinct from those in hepatocellular carcinoma, further defining it as a distinct carcinoma. PMID:25605237

  1. Whole-genome bisulfite DNA sequencing of a DNMT3B mutant patient

    PubMed Central

    Heyn, Holger; Vidal, Enrique; Sayols, Sergi; Sanchez-Mut, Jose V.; Moran, Sebastian; Medina, Ignacio; Sandoval, Juan; Simó-Riudalbas, Laia; Szczesna, Karolina; Huertas, Dori; Gatto, Sole; Matarazzo, Maria R.; Dopazo, Joaquin; Esteller, Manel

    2012-01-01

    The immunodeficiency, centromere instability and facial anomalies (ICF) syndrome is associated with mutations of the DNA methyl-transferase DNMT3B, resulting in a reduction of enzyme activity. Aberrant expression of immune system genes and hypomethylation of pericentromeric regions accompanied by chromosomal instability were determined as alterations driving the disease phenotype. However, so far only technologies capable of analyzing single loci have been applied to determine epigenetic alterations in ICF patients. In the current study, we performed whole-genome bisulphite sequencing to assess alterations in DNA methylation at base pair resolution. Genome-wide we detected a decrease of methylation level of 42%, with the most profound changes occurring in inactive heterochromatic regions, satellite repeats and transposons. Interestingly, transcriptionally active loci and ribosomal RNA repeats escaped global hypomethylation. Despite a genome-wide loss of DNA methylation the epigenetic landscape and crucial regulatory structures were conserved. Remarkably, we revealed a mislocated activity of mutant DNMT3B to H3K4me1 loci resulting in hypermethylation of active promoters. Functionally, we could associate alterations in promoter methylation with the ICF syndrome immunodeficient phenotype by detecting changes in genes related to the B-cell receptor mediated maturation pathway. PMID:22595875

  2. Whole genome analyses of marine fish pathogenic isolate, Mycobacterium sp. 012931.

    PubMed

    Kurokawa, Satoru; Kabayama, Jun; Hwang, Seong Don; Nho, Seong Won; Hikima, Jun-ichi; Jung, Tae Sung; Kondo, Hidehiro; Hirono, Ikuo; Takeyama, Haruko; Mori, Tetsushi; Aoki, Takashi

    2014-10-01

    Mycobacterium is a genus within the order Actinomycetales that comprises a large number of well-characterized species, several of which are pathogens known to cause serious disease in humans and animals. Here, we report the whole genome sequence of Mycobacterium sp. strain 012931 isolated from the marine fish, yellowtail (Seriola quinqueradiata). Mycobacterium sp. 012931 is a fish pathogen causing serious damage to aquaculture farms in Japan. DNA dot plot analysis showed that Mycobacterium sp. 012931 was more closely related to Mycobacterium marinum when compared across several Mycobacterium species. However, little conservation of the gene order was observed between the Mycobacterium sp. 012931 and M. marinum genomes. The annotated 5,464 genes of Mycobacterium sp. 012931 were classified into 26 subsystems. The insertion/deletion gene analysis shows Mycobacterium sp. 012931 had 643 unique genes that were not found in the M. marinum strains. In the virulence, disease, and defense subsystem, both insertion and deletion genes of Mycobacterium sp. 012931 were associated with the PPE gene cluster of Mycobacteria. Of seven plcB genes in Mycobacterium sp. 012931, plcB_2 and plcB_3 showed low identities with those of M. marinum strains. Therefore, Mycobacterium sp. 012931 differs genetically and in virulence from M. marinum and may induce different interaction mechanisms between host and pathogen. PMID:24879010

  3. Determinants of spontaneous mutation in the bacterium Escherichia coli as revealed by whole-genome sequencing

    PubMed Central

    Foster, Patricia L.; Lee, Heewook; Popodi, Ellen; Townes, Jesse P.; Tang, Haixu

    2015-01-01

    A complete understanding of evolutionary processes requires that factors determining spontaneous mutation rates and spectra be identified and characterized. Using mutation accumulation followed by whole-genome sequencing, we found that the mutation rates of three widely diverged commensal Escherichia coli strains differ only by about 50%, suggesting that a rate of 1–2 × 10⁻³ mutations per generation per genome is common for this bacterium. Four major forces are postulated to contribute to spontaneous mutations: intrinsic DNA polymerase errors, endogenously induced DNA damage, DNA damage caused by exogenous agents, and the activities of error-prone polymerases. To determine the relative importance of these factors, we studied 11 strains, each defective for a major DNA repair pathway. The striking result was that only loss of the ability to prevent or repair oxidative DNA damage significantly impacted mutation rates or spectra. These results suggest that, with the exception of oxidative damage, endogenously induced DNA damage does not perturb the overall accuracy of DNA replication in normally growing cells and that repair pathways may exist primarily to defend against exogenously induced DNA damage. The thousands of mutations caused by oxidative damage recovered across the entire genome revealed strong local-sequence biases of these mutations. Specifically, we found that the identity of the 3′ base can affect the mutability of a purine by oxidative damage by as much as eightfold. PMID:26460006

  4. Large-scale whole-genome sequencing of the Icelandic population.

    PubMed

    Gudbjartsson, Daniel F; Helgason, Hannes; Gudjonsson, Sigurjon A; Zink, Florian; Oddson, Asmundur; Gylfason, Arnaldur; Besenbacher, Soren; Magnusson, Gisli; Halldorsson, Bjarni V; Hjartarson, Eirikur; Sigurdsson, Gunnar Th; Stacey, Simon N; Frigge, Michael L; Holm, Hilma; Saemundsdottir, Jona; Helgadottir, Hafdis Th; Johannsdottir, Hrefna; Sigfusson, Gunnlaugur; Thorgeirsson, Gudmundur; Sverrisson, Jon Th; Gretarsdottir, Solveig; Walters, G Bragi; Rafnar, Thorunn; Thjodleifsson, Bjarni; Bjornsson, Einar S; Olafsson, Sigurdur; Thorarinsdottir, Hildur; Steingrimsdottir, Thora; Gudmundsdottir, Thora S; Theodors, Asgeir; Jonasson, Jon G; Sigurdsson, Asgeir; Bjornsdottir, Gyda; Jonsson, Jon J; Thorarensen, Olafur; Ludvigsson, Petur; Gudbjartsson, Hakon; Eyjolfsson, Gudmundur I; Sigurdardottir, Olof; Olafsson, Isleifur; Arnar, David O; Magnusson, Olafur Th; Kong, Augustine; Masson, Gisli; Thorsteinsdottir, Unnur; Helgason, Agnar; Sulem, Patrick; Stefansson, Kari

    2015-05-01

    Here we describe the insights gained from sequencing the whole genomes of 2,636 Icelanders to a median depth of 20×. We found 20 million SNPs and 1.5 million insertions-deletions (indels). We describe the density and frequency spectra of sequence variants in relation to their functional annotation, gene position, pathway and conservation score. We demonstrate an excess of homozygosity and rare protein-coding variants in Iceland. We imputed these variants into 104,220 individuals down to a minor allele frequency of 0.1% and found a recessive frameshift mutation in MYL4 that causes early-onset atrial fibrillation, several mutations in ABCB4 that increase risk of liver diseases and an intronic variant in GNAS associating with increased thyroid-stimulating hormone levels when maternally inherited. These data provide a study design that can be used to determine how variation in the sequence of the human genome gives rise to human diversity. PMID:25807286

  5. Whole-Genome Sequencing and Intraspecific Analysis of the Yeast Species Lachancea quebecensis

    PubMed Central

    Freel, Kelle C.; Friedrich, Anne; Sarilar, Véronique; Devillers, Hugo; Neuvéglise, Cécile; Schacherer, Joseph

    2016-01-01

    The gold standard in yeast population genomics has been the model organism Saccharomyces cerevisiae. However, the exploration of yeast species outside the Saccharomyces genus is essential to broaden the understanding of genome evolution. Here, we report the analyses of whole-genome sequences of nine isolates from the recently described yeast species Lachancea quebecensis. The genome of one isolate was assembled and annotated, and the intraspecific variability within L. quebecensis was surveyed by comparing the sequences from the eight other isolates to this reference sequence. Our study revealed that these strains harbor genomes with an average nucleotide diversity of π = 2 × 10⁻³, which is slightly lower than, although on the same order of magnitude as, that previously determined for S. cerevisiae (π = 4 × 10⁻³). Our results show that even though these isolates were all obtained from a relatively isolated geographic location, the same ecological source, and represent a smaller sample size than is available for S. cerevisiae, the levels of divergence are similar to those observed in this model species. This divergence is essentially linked to the presence of two distinct clusters delineated according to geographic location. However, even with relatively similar ranges of genome divergence, L. quebecensis has an extremely low global phenotypic variance of 0.062 compared with 0.59 previously determined in S. cerevisiae. PMID:26733577
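
    For readers unfamiliar with the π statistic quoted in this record, the sketch below shows one common way nucleotide diversity is computed: the average number of pairwise differences per site among aligned sequences. It is an illustration only, not the authors' pipeline; the function name `nucleotide_diversity` and the toy sequences are assumptions introduced here.

```python
from itertools import combinations


def nucleotide_diversity(seqs):
    """Average pairwise differences per site among equal-length aligned sequences."""
    if len(seqs) < 2:
        raise ValueError("need at least two sequences")
    length = len(seqs[0])
    if length == 0 or any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same non-zero length")
    pairs = list(combinations(seqs, 2))
    # Count mismatching positions for every pair of sequences.
    total_diffs = sum(sum(a != b for a, b in zip(s, t)) for s, t in pairs)
    return total_diffs / (len(pairs) * length)


# Hypothetical toy alignment: three sequences of 10 sites, one variable site.
toy_alignment = ["ACGTACGTAC", "ACGTACGTAC", "ACGTTCGTAC"]
pi = nucleotide_diversity(toy_alignment)
print(f"pi = {pi:.4f}")
```

    On real data the alignment would span the genome and missing or ambiguous sites would need explicit handling, which this sketch omits.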

  6. Use of bacterial whole-genome sequencing to investigate local persistence and spread in bovine tuberculosis

    PubMed Central

    Trewby, Hannah; Wright, David; Breadon, Eleanor L.; Lycett, Samantha J.; Mallon, Tom R.; McCormick, Carl; Johnson, Paul; Orton, Richard J.; Allen, Adrian R.; Galbraith, Julie; Herzyk, Pawel; Skuce, Robin A.; Biek, Roman; Kao, Rowland R.

    2016-01-01

    Mycobacterium bovis is the causal agent of bovine tuberculosis, one of the most important diseases currently facing the UK cattle industry. Here, we use high-density whole genome sequencing (WGS) in a defined sub-population of M. bovis in 145 cattle across 66 herd breakdowns to gain insights into local spread and persistence. We show that despite low divergence among isolates, WGS can in principle expose contributions of under-sampled host populations to M. bovis transmission. However, we demonstrate that in our data such a signal is due to molecular type switching, which had been previously undocumented for M. bovis. Isolates from farms with a known history of direct cattle movement between them did not show a statistical signal of higher genetic similarity. Despite an overall signal of genetic isolation by distance, genetic distances also showed no apparent relationship with spatial distance among affected farms over distances <5 km. Using simulations, we find that even over the brief evolutionary timescale covered by our data, Bayesian phylogeographic approaches are feasible. Applying such approaches showed that M. bovis dispersal in this system is heterogeneous but slow overall, averaging 2 km/year. These results confirm that widespread application of WGS to M. bovis will bring novel and important insights into the dynamics of M. bovis spread and persistence, but that the current questions most pertinent to control will be best addressed using approaches that more directly integrate WGS with additional epidemiological data. PMID:26972511

  7. Novel multi-sample scheme for inferring phylogenetic markers from whole genome tumor profiles

    PubMed Central

    Subramanian, Ayshwarya; Shackney, Stanley; Schwartz, Russell

    2013-01-01

    Computational cancer phylogenetics seeks to enumerate the temporal sequences of aberrations in tumor evolution, thereby delineating the evolution of possible tumor progression pathways, molecular subtypes and mechanisms of action. We previously developed a pipeline for constructing phylogenies describing evolution between major recurring cell types computationally inferred from whole-genome tumor profiles. The accuracy and detail of the phylogenies, however, depend on the identification of accurate, high-resolution molecular markers of progression, i.e., reproducible regions of aberration that robustly differentiate different subtypes and stages of progression. Here we present a novel hidden Markov model (HMM) scheme for the problem of inferring such phylogenetically significant markers through joint segmentation and calling of multi-sample tumor data. Our method classifies sets of genome-wide DNA copy number measurements into a partitioning of samples into normal (diploid) or amplified at each probe. It differs from other similar HMM methods in its design specifically for the needs of tumor phylogenetics, by seeking to identify robust markers of progression conserved across a set of copy number profiles. We show an analysis of our method in comparison to other methods on both synthetic and real tumor data, which confirms its effectiveness for tumor phylogeny inference and suggests avenues for future advances. PMID:24407301
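
    To make the idea of HMM-based calling concrete, here is a minimal sketch of a generic two-state HMM (normal vs. amplified) decoded with the Viterbi algorithm on a single copy-number profile. It is a deliberate simplification and not the multi-sample scheme described in this record; the Gaussian emission means, the standard deviation, the `stay` probability and the toy log2 ratios are all assumed values chosen for illustration.

```python
import math

STATES = ("normal", "amplified")


def gaussian_logpdf(x, mean, sd):
    """Log density of a normal distribution at x."""
    return -0.5 * math.log(2 * math.pi * sd * sd) - (x - mean) ** 2 / (2 * sd * sd)


def viterbi_calls(log2_ratios, means=(0.0, 0.58), sd=0.2, stay=0.95):
    """Most likely normal/amplified state per probe under a two-state HMM."""
    if not log2_ratios:
        return []
    log_stay, log_switch = math.log(stay), math.log(1 - stay)
    # Uniform prior over the two states at the first probe.
    scores = [math.log(0.5) + gaussian_logpdf(log2_ratios[0], means[s], sd)
              for s in range(2)]
    backpointers = []
    for x in log2_ratios[1:]:
        new_scores, pointers = [], []
        for s in range(2):
            # Best previous state for arriving in state s at this probe.
            candidates = [scores[p] + (log_stay if p == s else log_switch)
                          for p in range(2)]
            best = 0 if candidates[0] >= candidates[1] else 1
            pointers.append(best)
            new_scores.append(candidates[best] + gaussian_logpdf(x, means[s], sd))
        scores = new_scores
        backpointers.append(pointers)
    # Trace back from the best final state.
    state = 0 if scores[0] >= scores[1] else 1
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return [STATES[s] for s in path]


# Hypothetical profile: three probes near 0, three near log2(3/2), two near 0.
profile = [0.02, -0.05, 0.01, 0.61, 0.55, 0.60, 0.03, -0.02]
calls = viterbi_calls(profile)
print(calls)
```

    A joint multi-sample model would couple the hidden states across profiles so that shared breakpoints reinforce each other; this single-profile version only shows the segmentation-and-calling step in isolation.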

  8. Whole-Genome Screening of Newborns? The Constitutional Boundaries of State Newborn Screening Programs

    PubMed Central

    King, Jaime S.; Smith, Monica E.

    2016-01-01

    State newborn screening (NBS) programs routinely screen nearly all of the 4 million newborns in the United States each year for ~30 primary conditions and a number of secondary conditions. NBS could be on the cusp of an unprecedented expansion as a result of advances in whole-genome sequencing (WGS). As WGS becomes cheaper and easier and as our knowledge and understanding of human genetics expand, the question of whether WGS has a role to play in state NBS programs becomes increasingly relevant and complex. As geneticists and state public health officials begin to contemplate the technical and procedural details of whether WGS could benefit existing NBS programs, this is an opportune time to revisit the legal framework of state NBS programs. In this article, we examine the constitutional underpinnings of state-mandated NBS and explore the range of current state statutes and regulations that govern the programs. We consider the legal refinements that will be needed to keep state NBS programs within constitutional bounds, focusing on 2 areas of concern: consent procedures and the criteria used to select new conditions for NBS panels. We conclude by providing options for states to consider when contemplating the use of WGS for NBS. PMID:26729704

  9. Whole Genome Sequencing of Field Isolates Reveals Extensive Genetic Diversity in Plasmodium vivax from Colombia

    PubMed Central

    Winter, David J.; Pacheco, M. Andreína; Vallejo, Andres F.; Schwartz, Rachel S.; Arevalo-Herrera, Myriam; Herrera, Socrates

    2015-01-01

    Plasmodium vivax is the most prevalent malarial species in South America and exerts a substantial burden on the populations it affects. The control and eventual elimination of P. vivax are global health priorities. Genomic research contributes to this objective by improving our understanding of the biology of P. vivax and through the development of new genetic markers that can be used to monitor efforts to reduce malaria transmission. Here we analyze whole-genome data from eight field samples from a region in Córdoba, Colombia where malaria is endemic. We find considerable genetic diversity within this population, a result that contrasts with earlier studies suggesting that P. vivax had limited diversity in the Americas. We also identify a selective sweep around a substitution known to confer resistance to sulphadoxine-pyrimethamine (SP). This is the first observation of a selective sweep for SP resistance in this species. These results indicate that P. vivax has been exposed to SP pressure even when the drug is not in use as a first line treatment for patients afflicted by this parasite. We identify multiple non-synonymous substitutions in three other genes known to be involved with drug resistance in Plasmodium species. Finally, we found extensive microsatellite polymorphisms. Using this information we developed 18 polymorphic and easy to score microsatellite loci that can be used in epidemiological investigations in South America. PMID:26709695

  10. Genome management and mismanagement—cell-level opportunities and challenges of whole-genome duplication

    PubMed Central

    Yant, Levi; Bomblies, Kirsten

    2015-01-01

    Whole-genome duplication (WGD) doubles the DNA content in the nucleus and leads to polyploidy. In whole-organism polyploids, WGD has been implicated in adaptability and the evolution of increased genome complexity, but polyploidy can also arise in somatic cells of otherwise diploid plants and animals, where it plays important roles in development and likely environmental responses. As with whole organisms, WGD can also promote adaptability and diversity in proliferating cell lineages, although whether WGD is beneficial is clearly context-dependent. WGD is also sometimes associated with aging and disease and may be a facilitator of dangerous genetic and karyotypic diversity in tumorigenesis. Scaling changes can affect cell physiology, but problems associated with WGD in large part seem to arise from problems with chromosome segregation in polyploid cells. Here we discuss both the adaptive potential and problems associated with WGD, focusing primarily on cellular effects. We see value in recognizing polyploidy as a key player in generating diversity in development and cell lineage evolution, with intriguing parallels across kingdoms. PMID:26637526

  11. Whole-genome analyses resolve early branches in the tree of life of modern birds.

    PubMed

    Jarvis, Erich D; Mirarab, Siavash; Aberer, Andre J; Li, Bo; Houde, Peter; Li, Cai; Ho, Simon Y W; Faircloth, Brant C; Nabholz, Benoit; Howard, Jason T; Suh, Alexander; Weber, Claudia C; da Fonseca, Rute R; Li, Jianwen; Zhang, Fang; Li, Hui; Zhou, Long; Narula, Nitish; Liu, Liang; Ganapathy, Ganesh; Boussau, Bastien; Bayzid, Md Shamsuzzoha; Zavidovych, Volodymyr; Subramanian, Sankar; Gabaldón, Toni; Capella-Gutiérrez, Salvador; Huerta-Cepas, Jaime; Rekepalli, Bhanu; Munch, Kasper; Schierup, Mikkel; Lindow, Bent; Warren, Wesley C; Ray, David; Green, Richard E; Bruford, Michael W; Zhan, Xiangjiang; Dixon, Andrew; Li, Shengbin; Li, Ning; Huang, Yinhua; Derryberry, Elizabeth P; Bertelsen, Mads Frost; Sheldon, Frederick H; Brumfield, Robb T; Mello, Claudio V; Lovell, Peter V; Wirthlin, Morgan; Schneider, Maria Paula Cruz; Prosdocimi, Francisco; Samaniego, José Alfredo; Vargas Velazquez, Amhed Missael; Alfaro-Núñez, Alonzo; Campos, Paula F; Petersen, Bent; Sicheritz-Ponten, Thomas; Pas, An; Bailey, Tom; Scofield, Paul; Bunce, Michael; Lambert, David M; Zhou, Qi; Perelman, Polina; Driskell, Amy C; Shapiro, Beth; Xiong, Zijun; Zeng, Yongli; Liu, Shiping; Li, Zhenyu; Liu, Binghang; Wu, Kui; Xiao, Jin; Yinqi, Xiong; Zheng, Qiuemei; Zhang, Yong; Yang, Huanming; Wang, Jian; Smeds, Linnea; Rheindt, Frank E; Braun, Michael; Fjeldsa, Jon; Orlando, Ludovic; Barker, F Keith; Jønsson, Knud Andreas; Johnson, Warren; Koepfli, Klaus-Peter; O'Brien, Stephen; Haussler, David; Ryder, Oliver A; Rahbek, Carsten; Willerslev, Eske; Graves, Gary R; Glenn, Travis C; McCormack, John; Burt, Dave; Ellegren, Hans; Alström, Per; Edwards, Scott V; Stamatakis, Alexandros; Mindell, David P; Cracraft, Joel; Braun, Edward L; Warnow, Tandy; Jun, Wang; Gilbert, M Thomas P; Zhang, Guojie

    2014-12-12

    To better determine the history of modern birds, we performed a genome-scale phylogenetic analysis of 48 species representing all orders of Neoaves using phylogenomic methods created to handle genome-scale data. We recovered a highly resolved tree that confirms previously controversial sister or close relationships. We identified the first divergence in Neoaves, two groups we named Passerea and Columbea, representing independent lineages of diverse and convergently evolved land and water bird species. Among Passerea, we infer the common ancestor of core landbirds to have been an apex predator and confirm independent gains of vocal learning. Among Columbea, we identify pigeons and flamingoes as belonging to sister clades. Even with whole genomes, some of the earliest branches in Neoaves proved challenging to resolve, which was best explained by massive protein-coding sequence convergence and high levels of incomplete lineage sorting that occurred during a rapid radiation after the Cretaceous-Paleogene mass extinction event about 66 million years ago. PMID:25504713

  12. Whole genome and transcriptome sequencing of matched primary and peritoneal metastatic gastric carcinoma.

    PubMed

    Zhang, J; Huang, J Y; Chen, Y N; Yuan, F; Zhang, H; Yan, F H; Wang, M J; Wang, G; Su, M; Lu, G; Huang, Y; Dai, H; Ji, J; Zhang, J; Zhang, J N; Jiang, Y N; Chen, S J; Zhu, Z G; Yu, Y Y

    2015-01-01

    Gastric cancer is one of the most aggressive cancers and is the second leading cause of cancer death worldwide. Approximately 40% of global gastric cancer cases occur in China, with peritoneal metastasis being the prevalent form of recurrence and metastasis in advanced disease. Currently, there are limited clinical approaches for predicting and treating peritoneal metastasis, resulting in a 6-month average survival time. Comprehensive genome analysis may uncover the pathogenesis of peritoneal metastasis. Here we describe a comprehensive whole-genome and transcriptome sequencing analysis of one advanced gastric cancer case, including non-cancerous mucosa, primary cancer and matched peritoneal metastatic cancer. The peripheral blood is used as normal control. We identified 27 mutated genes, of which 19 genes are reported in the COSMIC database (ZNF208, CRNN, ATXN3, DCTN1, RP1L1, PRB4, PRB1, MUC4, HS6ST3, MUC17, JAM2, ITGAD, IREB2, IQUB, CORO1B, CCDC121, AKAP2, ACAN and ACADL), and eight genes have not previously been described in gastric cancer (CCDC178, ARMC4, TUBB6, PLIN4, PKLR, PDZD2, DMBT1 and DAB1). Additionally, GPX4 and MPND, in the 19q13.3-13.4 region, are characterized as a novel fusion gene. This study disclosed novel biological markers and tumorigenic pathways that may predict peritoneal metastasis in gastric cancer. PMID:26330360

  13. Allele-specific copy-number discovery from whole-genome and whole-exome sequencing

    PubMed Central

    Wang, WeiBo; Wang, Wei; Sun, Wei; Crowley, James J.; Szatkiewicz, Jin P.

    2015-01-01

    Copy-number variants (CNVs) are a major form of genetic variation and a risk factor for various human diseases, so it is crucial to accurately detect and characterize them. It is conceivable that allele-specific reads from high-throughput sequencing data could be leveraged to both enhance CNV detection and produce allele-specific copy number (ASCN) calls. Although statistical methods have been developed to detect CNVs using whole-genome sequence (WGS) and/or whole-exome sequence (WES) data, information from allele-specific read counts has not yet been adequately exploited. In this paper, we develop an integrated method, called AS-GENSENG, which incorporates allele-specific read counts in CNV detection and estimates ASCN using either WGS or WES data. To evaluate the performance of AS-GENSENG, we conducted extensive simulations, generated empirical data using existing WGS and WES data sets and validated predicted CNVs using an independent methodology. We conclude that AS-GENSENG not only predicts accurate ASCN calls but also improves the accuracy of total copy number calls, owing to its unique ability to exploit information from both total and allele-specific read counts while accounting for various experimental biases in sequence data. Our novel, user-friendly and computationally efficient method and a complete analytic protocol is freely available at https://sourceforge.net/projects/asgenseng/. PMID:25883151

  14. Genomic View of Bipolar Disorder Revealed by Whole Genome Sequencing in a Genetic Isolate

    PubMed Central

    Georgi, Benjamin; Craig, David; Kember, Rachel L.; Liu, Wencheng; Lindquist, Ingrid; Nasser, Sara; Brown, Christopher; Egeland, Janice A.; Paul, Steven M.; Bućan, Maja

    2014-01-01

    Bipolar disorder is a common, heritable mental illness characterized by recurrent episodes of mania and depression. Despite considerable effort to elucidate the genetic underpinnings of bipolar disorder, causative genetic risk factors remain elusive. We conducted a comprehensive genomic analysis of bipolar disorder in a large Old Order Amish pedigree. Microsatellite genotypes and high-density SNP-array genotypes of 388 family members were combined with whole genome sequence data for 50 of these subjects, comprising 18 parent-child trios. This study design permitted evaluation of candidate variants within the context of haplotype structure by resolving the phase in sequenced parent-child trios and by imputation of variants into multiple unsequenced siblings. Non-parametric and parametric linkage analysis of the entire pedigree as well as on smaller clusters of families identified several nominally significant linkage peaks, each of which included dozens of predicted deleterious variants. Close inspection of exonic and regulatory variants in genes under the linkage peaks using family-based association tests revealed additional credible candidate genes for functional studies and further replication in population-based cohorts. However, despite the in-depth genomic characterization of this unique, large and multigenerational pedigree from a genetic isolate, there was no convergence of evidence implicating a particular set of risk loci or common pathways. The striking haplotype and locus heterogeneity we observed has profound implications for the design of studies of bipolar and other related disorders. PMID:24625924

  15. The implications of whole-genome sequencing in the control of tuberculosis

    PubMed Central

    Lee, Robyn S.

    2015-01-01

    The availability of whole-genome sequencing (WGS) as a tool for the diagnosis and clinical management of tuberculosis (TB) offers considerable promise in the fight against this stubborn epidemic. However, like other new technologies, the best application of WGS remains to be determined, for both conceptual and technical reasons. In this review, we consider the potential value of WGS in the clinical laboratory for the detection of Mycobacterium tuberculosis and the prediction of antibiotic resistance. We also discuss issues pertaining to data generation, interpretation and dissemination, given that WGS has to date been generally performed in research labs where results are not necessarily packaged in a clinician-friendly format. Although WGS is far more accessible now than it was in the past, the transition from a research tool to study TB into a clinical test to manage this disease may require further fine-tuning. Improvements will likely come through iterative efforts that involve both the laboratories ready to move TB into the genomic era and the front-line clinical/public health staff who will be interpreting the results to inform management decisions. PMID:27034776

  16. Whole-genome sequencing identifies a recurrent functional synonymous mutation in melanoma

    PubMed Central

    Gartner, Jared J.; Parker, Stephen C. J.; Prickett, Todd D.; Dutton-Regester, Ken; Stitzel, Michael L.; Lin, Jimmy C.; Davis, Sean; Simhadri, Vijaya L.; Jha, Sujata; Katagiri, Nobuko; Gotea, Valer; Teer, Jamie K.; Morken, Mario A.; Bhanot, Umesh K.; Chen, Guo; Elnitski, Laura L.; Davies, Michael A.; Gershenwald, Jeffrey E.; Carter, Hannah; Karchin, Rachel; Robinson, William; Robinson, Steven; Rosenberg, Steven A.; Collins, Francis S.; Parmigiani, Giovanni; Komar, Anton A.; Kimchi-Sarfaty, Chava; Hayward, Nicholas K.; Margulies, Elliott H.; Samuels, Yardena

    2013-01-01

    Synonymous mutations, which do not alter the protein sequence, have been shown to affect protein function [Sauna ZE, Kimchi-Sarfaty C (2011) Nat Rev Genet 12(10):683–691]. However, synonymous mutations are rarely investigated in the cancer genomics field. We used whole-genome and -exome sequencing to identify somatic mutations in 29 melanoma samples. Validation of one synonymous somatic mutation in BCL2L12 in 285 samples identified 12 cases that harbored the recurrent F17F mutation. This mutation led to increased BCL2L12 mRNA and protein levels because of differential targeting of WT and mutant BCL2L12 by hsa-miR-671–5p. Protein made from mutant BCL2L12 transcript bound p53, inhibited UV-induced apoptosis more efficiently than WT BCL2L12, and reduced endogenous p53 target gene transcription. This report shows selection of a recurrent somatic synonymous mutation in cancer. Our data indicate that silent alterations have a role to play in human cancer, emphasizing the importance of their investigation in future cancer genome studies. PMID:23901115

  17. Impacts of Whole-Genome Triplication on MIRNA Evolution in Brassica rapa.

    PubMed

    Sun, Chao; Wu, Jian; Liang, Jianli; Schnable, James C; Yang, Wencai; Cheng, Feng; Wang, Xiaowu

    2015-11-01

    MicroRNAs (miRNAs) are a class of short non-coding, endogenous RNAs that play essential roles in eukaryotes. Although the influence of whole-genome triplication (WGT) on protein-coding genes has been well documented in Brassica rapa, little is known about its impacts on MIRNAs. In this study, through generating a comprehensive annotation of 680 MIRNAs for B. rapa, we analyzed the evolutionary characteristics of these MIRNAs from different aspects in B. rapa. First, while MIRNAs and genes show similar patterns of biased distribution among subgenomes of B. rapa, we found that MIRNAs are much more overretained than genes following fractionation after WGT. Second, multiple-copy MIRNAs show significantly higher sequence conservation than single-copy MIRNAs, which is opposite to the pattern seen for genes. This indicates that increased purifying selection is acting upon these highly retained multiple-copy MIRNAs and points to their functional importance over singleton MIRNAs. Furthermore, we found extensive divergence between pairs of miRNAs and their target genes following the WGT in B. rapa. In summary, our study provides a valuable resource for exploring MIRNA in B. rapa and highlights the impacts of WGT on the evolution of MIRNA. PMID:26527651

  18. Ancestral whole-genome duplication in the marine chelicerate horseshoe crabs.

    PubMed

    Kenny, N J; Chan, K W; Nong, W; Qu, Z; Maeso, I; Yip, H Y; Chan, T F; Kwan, H S; Holland, P W H; Chu, K H; Hui, J H L

    2016-02-01

    Whole-genome duplication (WGD) results in new genomic resources that can be exploited by evolution for rewiring genetic regulatory networks in organisms. In metazoans, WGD occurred before the last common ancestor of vertebrates, and has been postulated as a major evolutionary force that contributed to their speciation and diversification of morphological structures. Here, we have sequenced genomes from three of the four extant species of horseshoe crabs: Carcinoscorpius rotundicauda, Limulus polyphemus and Tachypleus tridentatus. Phylogenetic and sequence analyses of their Hox and other homeobox genes, which encode crucial transcription factors and have been used as indicators of WGD in animals, strongly suggest that WGD happened before the last common ancestor of these marine chelicerates >135 million years ago. Signatures of subfunctionalisation of paralogues of Hox genes are revealed in the appendages of two species of horseshoe crabs. Further, residual homeobox pseudogenes are observed in the three lineages. The existence of WGD in the horseshoe crabs, noted for relative morphological stasis over geological time, suggests that genomic diversity need not always be reflected phenotypically, in contrast to the suggested situation in vertebrates. This study provides evidence of ancient WGD in the ecdysozoan lineage, and reveals new opportunities for studying genomic and regulatory evolution after WGD in the Metazoa. PMID:26419336

  19. Whole-genome sequencing of Berkshire (European native pig) provides insights into its origin and domestication

    PubMed Central

    Li, Mingzhou; Tian, Shilin; Yeung, Carol K. L.; Meng, Xuehong; Tang, Qianzi; Niu, Lili; Wang, Xun; Jin, Long; Ma, Jideng; Long, Keren; Zhou, Chaowei; Cao, Yinchuan; Zhu, Li; Bai, Lin; Tang, Guoqing; Gu, Yiren; Jiang, An'an; Li, Xuewei; Li, Ruiqiang

    2014-01-01

    Domesticated organisms have experienced strong selective pressures directed at genes or genomic regions controlling traits of biological, agricultural or medical importance. The genomes of native and domesticated pigs provide a unique opportunity for tracing the history of domestication and identifying signatures of artificial selection. Here we used whole-genome sequencing to explore the genetic relationships among the European native pig Berkshire and breeds that are distributed worldwide, and to identify genomic footprints left by selection during the domestication of Berkshire. Numerous genes containing nonsynonymous SNPs fall into olfactory-related categories, which are part of a rapidly evolving superfamily in the mammalian genome. Phylogenetic analyses revealed a deep phylogenetic split between European and Asian pigs rather than between domestic and wild pigs. Admixture analysis showed a higher proportion of Chinese genetic material in the Berkshire pigs, which is consistent with the historical record regarding its origin. Selective sweep analyses revealed strong signatures of selection affecting genomic regions that harbor genes underlying economic traits such as disease resistance, pork yield, fertility, tameness and body length. These discoveries confirm the history of the origin of the Berkshire pig by genome-wide analysis and illustrate how domestication has shaped the patterns of genetic variation. PMID:24728479

  20. Landscape of somatic mutations in 560 breast cancer whole-genome sequences

    DOE PAGESBeta

    Nik-Zainal, Serena; Davies, Helen; Staaf, Johan; Ramakrishna, Manasa; Glodzik, Dominik; Zou, Xueqing; Martincorena, Inigo; Alexandrov, Ludmil B.; Martin, Sancha; Wedge, David C.; et al

    2016-06-02

    Here, we analysed whole-genome sequences of 560 breast cancers to advance understanding of the driver mutations conferring clonal advantage and the mutational processes generating somatic mutations. We found that 93 protein-coding cancer genes carried probable driver mutations. Some non-coding regions exhibited high mutation frequencies, but most have distinctive structural features probably causing elevated mutation rates and do not contain driver mutations. Mutational signature analysis was extended to genome rearrangements and revealed twelve base substitution and six rearrangement signatures. Three rearrangement signatures, characterized by tandem duplications or deletions, appear associated with defective homologous-recombination-based DNA repair: one with deficient BRCA1 function, another with deficient BRCA1 or BRCA2 function, the cause of the third is unknown. This analysis of all classes of somatic mutation across exons, introns and intergenic regions highlights the repertoire of cancer genes and mutational processes operating, and progresses towards a comprehensive account of the somatic genetic basis of breast cancer.

  1. Rare variant discovery by deep whole-genome sequencing of 1,070 Japanese individuals

    PubMed Central

    Nagasaki, Masao; Yasuda, Jun; Katsuoka, Fumiki; Nariai, Naoki; Kojima, Kaname; Kawai, Yosuke; Yamaguchi-Kabata, Yumi; Yokozawa, Junji; Danjoh, Inaho; Saito, Sakae; Sato, Yukuto; Mimori, Takahiro; Tsuda, Kaoru; Saito, Rumiko; Pan, Xiaoqing; Nishikawa, Satoshi; Ito, Shin; Kuroki, Yoko; Tanabe, Osamu; Fuse, Nobuo; Kuriyama, Shinichi; Kiyomoto, Hideyasu; Hozawa, Atsushi; Minegishi, Naoko; Douglas Engel, James; Kinoshita, Kengo; Kure, Shigeo; Yaegashi, Nobuo; Tsuboi, Akito; Nagami, Fuji; Kawame, Hiroshi; Tomita, Hiroaki; Tsuji, Ichiro; Nakaya, Jun; Sugawara, Junichi; Suzuki, Kichiya; Kikuya, Masahiro; Abe, Michiaki; Nakaya, Naoki; Osumi, Noriko; Yamashita, Riu; Ogishima, Soichi; Takai, Takako; Tominaga, Teiji; Taki, Yasuyuki; Suzuki, Yoichi; Yamamoto, Masayuki

    2015-01-01

    The Tohoku Medical Megabank Organization reports the whole-genome sequences of 1,070 healthy Japanese individuals and construction of a Japanese population reference panel (1KJPN). Here, through this high-coverage sequencing (32.4× on average), we identify 21.2 million single-nucleotide variants (SNVs), including 12 million novel ones, at an estimated false discovery rate of <1.0%. This detailed analysis detected signatures for purifying selection on regulatory elements as well as coding regions. We also catalogue structural variants, including 3.4 million insertions and deletions, and 25,923 genic copy-number variants. The 1KJPN was effective for imputing genotypes of the Japanese population genome wide. These data demonstrate the value of high-coverage sequencing for constructing population-specific variant panels, which cover 99.0% of SNVs with minor allele frequency ≥0.1%, and its value for identifying causal rare variants of complex human disease phenotypes in genetic association studies. PMID:26292667

  2. New perspectives on microbial community distortion after whole-genome amplification.

    PubMed

    Probst, Alexander J; Weinmaier, Thomas; DeSantis, Todd Z; Santo Domingo, Jorge W; Ashbolt, Nicholas

    2015-01-01

    Whole-genome amplification (WGA) has become an important tool to explore the genomic information of microorganisms in an environmental sample with limited biomass; however, potential selective biases during the amplification processes are poorly understood. Here, we describe the effects of WGA on 31 different microbial communities from five biotopes that also included low-biomass samples from drinking water and groundwater. Our findings provide evidence that microbiome segregation by biotope was possible despite WGA treatment. Nevertheless, samples from different biotopes revealed different levels of distortion, with genomic GC content significantly correlated with WGA perturbation. Certain phylogenetic clades revealed a homogeneous trend across various sample types; for instance, Alpha- and Betaproteobacteria showed a decrease in their abundance after WGA treatment. On the other hand, Enterobacteriaceae, an important biomarker group for fecal contamination in groundwater and drinking water, were strongly affected by WGA treatment without a predictable pattern. These novel results describe the impact of WGA on low-biomass samples and may highlight issues to be aware of when designing future metagenomic studies that necessitate preceding WGA treatment. PMID:26010362

  3. A Proposed Clinical Decision Support Architecture Capable of Supporting Whole Genome Sequence Information

    PubMed Central

    Welch, Brandon M.; Rodriguez Loya, Salvador; Eilbeck, Karen; Kawamoto, Kensaku

    2014-01-01

    Whole genome sequence (WGS) information may soon be widely available to help clinicians personalize the care and treatment of patients. However, considerable barriers exist, which may hinder the effective utilization of WGS information in a routine clinical care setting. Clinical decision support (CDS) offers a potential solution to overcome such barriers and to facilitate the effective use of WGS information in the clinic. However, genomic information is complex and will require significant considerations when developing CDS capabilities. As such, this manuscript lays out a conceptual framework for a CDS architecture designed to deliver WGS-guided CDS within the clinical workflow. To handle the complexity and breadth of WGS information, the proposed CDS framework leverages service-oriented capabilities and orchestrates the interaction of several independently-managed components. These independently-managed components include the genome variant knowledge base, the genome database, the CDS knowledge base, a CDS controller and the electronic health record (EHR). A key design feature is that genome data can be stored separately from the EHR. This paper describes in detail: (1) each component of the architecture; (2) the interaction of the components; and (3) how the architecture attempts to overcome the challenges associated with WGS information. We believe that service-oriented CDS capabilities will be essential to using WGS information for personalized medicine. PMID:25411644

  4. Mycobacterial DNA extraction for whole-genome sequencing from early positive liquid (MGIT) cultures.

    PubMed

    Votintseva, Antonina A; Pankhurst, Louise J; Anson, Luke W; Morgan, Marcus R; Gascoyne-Binzi, Deborah; Walker, Timothy M; Quan, T Phuong; Wyllie, David H; Del Ojo Elias, Carlos; Wilcox, Mark; Walker, A Sarah; Peto, Tim E A; Crook, Derrick W

    2015-04-01

    We developed a low-cost and reliable method of DNA extraction from as little as 1 ml of early positive mycobacterial growth indicator tube (MGIT) cultures that is suitable for whole-genome sequencing to identify mycobacterial species and predict antibiotic resistance in clinical samples. The DNA extraction method is based on ethanol precipitation supplemented by pretreatment steps with a MolYsis kit or saline wash for the removal of human DNA and a final DNA cleanup step with solid-phase reversible immobilization beads. The protocol yielded ≥0.2 ng/μl of DNA for 90% (MolYsis kit) and 83% (saline wash) of positive MGIT cultures. A total of 144 (94%) of the 154 samples sequenced on the MiSeq platform (Illumina) achieved the target of 1 million reads, with <5% of reads derived from human or nasopharyngeal flora for 88% and 91% of samples, respectively. A total of 59 (98%) of 60 samples that were identified by the national mycobacterial reference laboratory (NMRL) as Mycobacterium tuberculosis were successfully mapped to the H37Rv reference, with >90% coverage achieved. The DNA extraction protocol, therefore, will facilitate fast and accurate identification of mycobacterial species and resistance using a range of bioinformatics tools. PMID:25631807

  5. ecoPrimers: inference of new DNA barcode markers from whole genome sequence analysis

    PubMed Central

    Riaz, Tiayyba; Shehzad, Wasim; Viari, Alain; Pompanon, François; Taberlet, Pierre; Coissac, Eric

    2011-01-01

    Using non-conventional markers, DNA metabarcoding allows biodiversity assessment from complex substrates. In this article, we present ecoPrimers, a software tool for identifying new barcode markers and their associated PCR primers. ecoPrimers scans whole genomes to find such markers without a priori knowledge. ecoPrimers optimizes two quality indices measuring taxonomical range and discrimination to select the most efficient markers from a set of reference sequences, according to specific experimental constraints such as marker length or specifically targeted taxa. The key step of the algorithm is the identification of conserved regions among reference sequences for anchoring primers. We propose an efficient algorithm, based on data mining, that allows the analysis of huge sets of sequences. We evaluate the efficiency of ecoPrimers by running it on three different sequence sets: mitochondrial, chloroplast and bacterial genomes. Identified barcode markers correspond either to barcode regions already in use for plants or animals, or to new potential barcodes. Results from empirical experiments carried out on a promising new barcode for analyzing vertebrate diversity fully agree with expectations based on bioinformatics analysis. These tests demonstrate the efficiency of ecoPrimers for inferring new barcodes fitting with diverse experimental contexts. ecoPrimers is available as an open source project at: http://www.grenoble.prabi.fr/trac/ecoPrimers. PMID:21930509
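
    The idea of anchoring primers on conserved regions can be illustrated with a minimal Python sketch. It is not the ecoPrimers algorithm: it assumes exact-match k-mers shared by every reference sequence (no mismatches, no quality indices), and the function names, the k-mer length and the length limits are all hypothetical choices made for illustration.

```python
def conserved_kmers(sequences, k):
    """Return the set of k-mers that occur in every reference sequence."""
    shared = None
    for seq in sequences:
        kmers = {seq[i:i + k] for i in range(len(seq) - k + 1)}
        shared = kmers if shared is None else shared & kmers
    return shared or set()


def candidate_barcodes(sequence, kmers, k, min_len, max_len):
    """List (left_anchor, right_anchor, region) triples in one sequence.

    A candidate is a region of min_len..max_len bases flanked by two
    conserved k-mers, which stand in for primer anchoring sites.
    """
    positions = [i for i in range(len(sequence) - k + 1)
                 if sequence[i:i + k] in kmers]
    hits = []
    for start in positions:
        for end in positions:
            region_len = end - (start + k)
            if min_len <= region_len <= max_len:
                hits.append((start, end, sequence[start + k:end]))
    return hits


# Toy reference set (made-up sequences, not real genomes).
references = ["TTACGTAAAAGGCCTT", "GACGTCCCCGGCCAA", "ACGTGGGGGGCCT"]
anchors = conserved_kmers(references, k=4)
barcodes = candidate_barcodes(references[0], anchors, k=4, min_len=1, max_len=10)
```

    In this toy set the two shared 4-mers flank a variable region in each sequence, which is the property a barcode marker needs: conserved ends for the primers and a discriminating interior.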

  6. Whole-genome mutational burden analysis of three pluripotency induction methods

    PubMed Central

    Bhutani, Kunal; Nazor, Kristopher L.; Williams, Roy; Tran, Ha; Dai, Heng; Džakula, Željko; Cho, Edward H.; Pang, Andy W. C.; Rao, Mahendra; Cao, Han; Schork, Nicholas J.; Loring, Jeanne F.

    2016-01-01

    There is concern that the stresses of inducing pluripotency may lead to deleterious DNA mutations in induced pluripotent stem cell (iPSC) lines, which would compromise their use for cell therapies. Here we report comparative genomic analysis of nine isogenic iPSC lines generated using three reprogramming methods: integrating retroviral vectors, non-integrating Sendai virus and synthetic mRNAs. We used whole-genome sequencing and de novo genome mapping to identify single-nucleotide variants, insertions and deletions, and structural variants. Our results show a moderate number of variants in the iPSCs that were not evident in the parental fibroblasts, which may result from reprogramming. There were only small differences in the total numbers and types of variants among different reprogramming methods. Most importantly, a thorough genomic analysis showed that the variants were generally benign. We conclude that the process of reprogramming is unlikely to introduce variants that would make the cells inappropriate for therapy. PMID:26892726

  7. Whole-genome plasma sequencing reveals focal amplifications as a driving force in metastatic prostate cancer

    PubMed Central

    Ulz, Peter; Belic, Jelena; Graf, Ricarda; Auer, Martina; Lafer, Ingrid; Fischereder, Katja; Webersinke, Gerald; Pummer, Karl; Augustin, Herbert; Pichler, Martin; Hoefler, Gerald; Bauernhofer, Thomas; Geigl, Jochen B.; Heitzer, Ellen; Speicher, Michael R.

    2016-01-01

    Genomic alterations in metastatic prostate cancer remain incompletely characterized. Here we analyse 493 prostate cancer cases from the TCGA database and perform whole-genome plasma sequencing on 95 plasma samples derived from 43 patients with metastatic prostate cancer. From these samples, we identify established driver aberrations in a cancer-related gene in nearly all cases (97.7%), including driver gene fusions (TMPRSS2:ERG), driver focal deletions (PTEN, RYBP and SHQ1) and driver amplifications (AR and MYC). In serial plasma analyses, we observe changes in focal amplifications in 40% of cases. The mean time interval between new amplifications was 26.4 weeks (range: 5–52 weeks), suggesting that they represent rapid adaptations to selection pressure. An increase in neuron-specific enolase is accompanied by clonal pattern changes in the tumour genome, most consistent with subclonal diversification of the tumour. Our findings suggest a high plasticity of prostate cancer genomes with newly occurring focal amplifications as a driving force in progression. PMID:27328849

  8. Preliminary Genomic Characterization of Ten Hardwood Tree Species from Multiplexed Low Coverage Whole Genome Sequencing

    PubMed Central

    Staton, Margaret; Best, Teodora; Khodwekar, Sudhir; Owusu, Sandra; Xu, Tao; Xu, Yi; Jennings, Tara; Cronn, Richard; Arumuganathan, A. Kathiravetpilla; Coggeshall, Mark; Gailing, Oliver; Liang, Haiying; Romero-Severson, Jeanne; Schlarbaum, Scott; Carlson, John E.

    2015-01-01

    Forest health issues are on the rise in the United States, resulting from introduction of alien pests and diseases, coupled with abiotic stresses related to climate change. Increasingly, forest scientists are finding genetic/genomic resources valuable in addressing forest health issues. For a set of ten ecologically and economically important native hardwood tree species representing a broad phylogenetic spectrum, we used low coverage whole genome sequencing from multiplex Illumina paired ends to economically profile their genomic content. For six species, the genome content was further analyzed by flow cytometry in order to determine the nuclear genome size. Sequencing yielded a depth of 0.8X to 7.5X, from which in silico analysis yielded preliminary estimates of gene and repetitive sequence content in the genome for each species. Thousands of genomic SSRs were identified, with a clear predisposition toward dinucleotide repeats and AT-rich repeat motifs. Flanking primers were designed for SSR loci for all ten species, ranging from 891 loci in sugar maple to 18,167 in redbay. In summary, we have demonstrated that useful preliminary genome information including repeat content, gene content and useful SSR markers can be obtained at low cost and time input from a single lane of Illumina multiplex sequence. PMID:26698853

  9. Isolation and whole genome sequencing of a Ruminococcus-like bacterium, associated with irritable bowel syndrome.

    PubMed

    Hynönen, Ulla; Rasinkangas, Pia; Satokari, Reetta; Paulin, Lars; de Vos, Willem M; Pietilä, Taija E; Kant, Ravi; Palva, Airi

    2016-06-01

    In our previous studies on the intestinal microbiota in irritable bowel syndrome (IBS), we identified a bacterial phylotype with higher abundance in patients suffering from diarrhea than in healthy controls. In the present work, we have isolated in pure culture strain RT94, belonging to this phylotype, determined its whole genome sequence and performed an extensive genomic analysis and phenotypical testing. This revealed strain RT94 to be a strict anaerobe apparently belonging to a novel species with only 94% similarity in the 16S rRNA gene sequence to the closest relatives Ruminococcus torques and Ruminococcus lactaris. The G + C content of strain RT94 is 45.2 mol% and the major long-chain cellular fatty acids are C16:0, C18:0 and C14:0. The isolate is metabolically versatile but not a mucus or cellulose utilizer. It produces acetate, ethanol, succinate, lactate and formate, but very little butyrate, as end products of glucose metabolism. The mechanisms underlying the association of strain RT94 with diarrhea-type IBS are discussed. PMID:26946362

  10. Whole-genome analyses resolve early branches in the tree of life of modern birds

    PubMed Central

    Jarvis, Erich D.; Mirarab, Siavash; Aberer, Andre J.; Li, Bo; Houde, Peter; Li, Cai; Ho, Simon Y. W.; Faircloth, Brant C.; Nabholz, Benoit; Howard, Jason T.; Suh, Alexander; Weber, Claudia C.; da Fonseca, Rute R.; Li, Jianwen; Zhang, Fang; Li, Hui; Zhou, Long; Narula, Nitish; Liu, Liang; Ganapathy, Ganesh; Boussau, Bastien; Bayzid, Md. Shamsuzzoha; Zavidovych, Volodymyr; Subramanian, Sankar; Gabaldón, Toni; Capella-Gutiérrez, Salvador; Huerta-Cepas, Jaime; Rekepalli, Bhanu; Munch, Kasper; Schierup, Mikkel; Lindow, Bent; Warren, Wesley C.; Ray, David; Green, Richard E.; Bruford, Michael W.; Zhan, Xiangjiang; Dixon, Andrew; Li, Shengbin; Li, Ning; Huang, Yinhua; Derryberry, Elizabeth P.; Bertelsen, Mads Frost; Sheldon, Frederick H.; Brumfield, Robb T.; Mello, Claudio V.; Lovell, Peter V.; Wirthlin, Morgan; Schneider, Maria Paula Cruz; Prosdocimi, Francisco; Samaniego, José Alfredo; Velazquez, Amhed Missael Vargas; Alfaro-Núñez, Alonzo; Campos, Paula F.; Petersen, Bent; Sicheritz-Ponten, Thomas; Pas, An; Bailey, Tom; Scofield, Paul; Bunce, Michael; Lambert, David M.; Zhou, Qi; Perelman, Polina; Driskell, Amy C.; Shapiro, Beth; Xiong, Zijun; Zeng, Yongli; Liu, Shiping; Li, Zhenyu; Liu, Binghang; Wu, Kui; Xiao, Jin; Yinqi, Xiong; Zheng, Qiuemei; Zhang, Yong; Yang, Huanming; Wang, Jian; Smeds, Linnea; Rheindt, Frank E.; Braun, Michael; Fjeldsa, Jon; Orlando, Ludovic; Barker, F. Keith; Jønsson, Knud Andreas; Johnson, Warren; Koepfli, Klaus-Peter; O’Brien, Stephen; Haussler, David; Ryder, Oliver A.; Rahbek, Carsten; Willerslev, Eske; Graves, Gary R.; Glenn, Travis C.; McCormack, John; Burt, Dave; Ellegren, Hans; Alström, Per; Edwards, Scott V.; Stamatakis, Alexandros; Mindell, David P.; Cracraft, Joel; Braun, Edward L.; Warnow, Tandy; Jun, Wang; Gilbert, M. Thomas P.; Zhang, Guojie

    2015-01-01

    To better determine the history of modern birds, we performed a genome-scale phylogenetic analysis of 48 species representing all orders of Neoaves using phylogenomic methods created to handle genome-scale data. We recovered a highly resolved tree that confirms previously controversial sister or close relationships. We identified the first divergence in Neoaves, two groups we named Passerea and Columbea, representing independent lineages of diverse and convergently evolved land and water bird species. Among Passerea, we infer the common ancestor of core landbirds to have been an apex predator and confirm independent gains of vocal learning. Among Columbea, we identify pigeons and flamingoes as belonging to sister clades. Even with whole genomes, some of the earliest branches in Neoaves proved challenging to resolve, which was best explained by massive protein-coding sequence convergence and high levels of incomplete lineage sorting that occurred during a rapid radiation after the Cretaceous-Paleogene mass extinction event about 66 million years ago. PMID:25504713

  11. Whole-genome analyses of whole-brain data: working within an expanded search space

    PubMed Central

    Neale, Benjamin M; Thompson, Paul M

    2015-01-01

    Large-scale comparisons of patients and healthy controls have unearthed genetic risk factors associated with a range of neurological and psychiatric illnesses. Meanwhile, brain imaging studies are increasing in size and scope, revealing disease and genetic effects on brain structure and function, and implicating neural pathways and causal mechanisms. With the advent of global neuroimaging consortia, imaging studies are now well powered to discover genetic variants that reliably affect the brain. Genetic analyses of brain measures from tens of thousands of people are being extended to test genetic associations with signals at millions of locations in the brain. Connectome-wide, genome-wide scans can jointly screen brain circuits and genomes, presenting new statistical challenges. There is a growing need for the community to establish and enforce standards in this developing field to ensure robust findings. Here we discuss how neuroimagers and geneticists have formed alliances to discover how genetic factors affect the brain. The field is rapidly advancing with ultra-high-resolution imaging and whole-genome sequencing. We recommend a rigorous approach to neuroimaging genomics that capitalizes on its recent successes and ensures the reliability of future discoveries. PMID:24866045

  12. Whole Genome Sequencing of Field Isolates Reveals Extensive Genetic Diversity in Plasmodium vivax from Colombia.

    PubMed

    Winter, David J; Pacheco, M Andreína; Vallejo, Andres F; Schwartz, Rachel S; Arevalo-Herrera, Myriam; Herrera, Socrates; Cartwright, Reed A; Escalante, Ananias A

    2015-12-01

    Plasmodium vivax is the most prevalent malarial species in South America and exerts a substantial burden on the populations it affects. The control and eventual elimination of P. vivax are global health priorities. Genomic research contributes to this objective by improving our understanding of the biology of P. vivax and through the development of new genetic markers that can be used to monitor efforts to reduce malaria transmission. Here we analyze whole-genome data from eight field samples from a region in Córdoba, Colombia where malaria is endemic. We find considerable genetic diversity within this population, a result that contrasts with earlier studies suggesting that P. vivax had limited diversity in the Americas. We also identify a selective sweep around a substitution known to confer resistance to sulphadoxine-pyrimethamine (SP). This is the first observation of a selective sweep for SP resistance in this species. These results indicate that P. vivax has been exposed to SP pressure even when the drug is not in use as a first line treatment for patients afflicted by this parasite. We identify multiple non-synonymous substitutions in three other genes known to be involved with drug resistance in Plasmodium species. Finally, we found extensive microsatellite polymorphisms. Using this information we developed 18 polymorphic and easy to score microsatellite loci that can be used in epidemiological investigations in South America. PMID:26709695

  13. Exonic remnants of whole-genome duplication reveal cis-regulatory function of coding exons

    PubMed Central

    Dong, Xianjun; Navratilova, Pavla; Fredman, David; Drivenes, Øyvind; Becker, Thomas S.; Lenhard, Boris

    2010-01-01

    Using a comparative genomics approach to reconstruct the fate of genomic regulatory blocks (GRBs) and identify exonic remnants that have survived the disappearance of their host genes after whole-genome duplication (WGD) in teleosts, we discover a set of 38 candidate cis-regulatory coding exons (RCEs) with predicted target genes. These elements demonstrate evolutionary separation of overlapping protein-coding and regulatory information after WGD in teleosts. We present evidence that the corresponding mammalian exons are still under both coding and non-coding selection pressure, are more conserved than other protein coding exons in the host gene and several control sets, and share key characteristics with highly conserved non-coding elements in the same regions. Their dual function is corroborated by existing experimental data. Additionally, we show examples of human exon remnants stemming from the vertebrate 2R WGD. Our findings suggest that long-range cis-regulatory inputs for developmental genes are not limited to non-coding regions, but can also overlap the coding sequence of unrelated genes. Thus, exonic regulatory elements in GRBs might be functionally equivalent to those in non-coding regions, calling for a re-evaluation of the sequence space in which to look for long-range regulatory elements and experimentally test their activity. PMID:19969543

  14. Whole-genome copy number variation analysis in anophthalmia and microphthalmia

    PubMed Central

    Schilter, Kala F.; Reis, Linda M.; Schneider, Adele; Bardakjian, Tanya M.; Abdul-Rahman, Omar; Kozel, Beth A.; Zimmerman, Holly H.; Broeckel, Ulrich; Semina, Elena V.

    2014-01-01

    Anophthalmia and microphthalmia (A/M) represent severe developmental ocular malformations. Currently, mutations in known genes explain less than 40% of A/M cases. We performed whole genome copy number variation analysis in sixty patients affected with isolated or syndromic A/M. Pathogenic deletions of 3q26 (SOX2) were identified in four independent patients with syndromic microphthalmia. Other variants of interest included regions with a known role in human disease (likely pathogenic) as well as novel rearrangements (uncertain significance). A 2.2-Mb duplication of 3q29 in a patient with nonsyndromic anophthalmia and an 877-kb duplication of 11p13 (PAX6) and a 1.4-Mb deletion of 17q11.2 (NF1) in two independent probands with syndromic microphthalmia and other ocular defects were identified; while ocular anomalies have been previously associated with 3q29 duplications, PAX6 duplications, and NF1 mutations in some cases, the ocular phenotypes observed here are more severe than previously reported. Three novel regions of possible interest included a 2q14.2 duplication which cosegregated with microphthalmia/microcornea and congenital cataracts in one family, and 2q21 and 15q26 duplications in two additional cases; each of these regions contains genes that are active during vertebrate ocular development. Overall, this study identified causative copy number mutations and regions with a possible role in ocular disease in 17% of A/M cases. PMID:23701296

  15. Unique Features of a Japanese ‘Candidatus Liberibacter asiaticus’ Strain Revealed by Whole Genome Sequencing

    PubMed Central

    Katoh, Hiroshi; Miyata, Shin-ichi; Inoue, Hiromitsu; Iwanami, Toru

    2014-01-01

    Citrus greening (huanglongbing) is the most destructive disease of citrus worldwide. It is spread by citrus psyllids and is associated with phloem-limited bacteria of three species of α-Proteobacteria, namely, ‘Candidatus Liberibacter asiaticus’, ‘Ca. L. americanus’, and ‘Ca. L. africanus’. Recent findings suggested that some Japanese strains lack the bacteriophage-type DNA polymerase region (DNA pol), in contrast to the Floridian psy62 strain. The whole genome sequence of the pol-negative ‘Ca. L. asiaticus’ Japanese isolate Ishi-1 was determined by metagenomic analysis of DNA extracted from ‘Ca. L. asiaticus’-infected psyllids and leaf midribs. The 1.19-Mb genome has an average 36.32% GC content. Annotation revealed 13 operons encoding rRNA and 44 tRNA genes, but no typical bacterial pathogenesis-related genes were located within the genome, similar to the Floridian psy62 and Chinese gxpsy. In contrast to other ‘Ca. L. asiaticus’ strains, the genome of the Japanese Ishi-1 strain lacks a prophage-related region. PMID:25180586
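
    The GC content figure quoted above is the share of G and C bases among the unambiguous bases of a sequence. A minimal Python sketch of that calculation is shown here; the function name and the choice to ignore ambiguous bases such as N are assumptions for illustration, not details taken from the study.

```python
def gc_percent(sequence):
    """Return GC content as a percentage of the unambiguous (A/C/G/T) bases."""
    seq = sequence.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    if total == 0:
        return 0.0  # no unambiguous bases; avoid dividing by zero
    return 100.0 * gc / total


# Toy inputs, not real genome data.
balanced = gc_percent("ATGC")
with_gaps = gc_percent("ggccNN")
```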

  16. Whole genome nucleosome sequencing identifies novel types of forensic markers in degraded DNA samples

    PubMed Central

    Dong, Chun-nan; Yang, Ya-dong; Li, Shu-jin; Yang, Ya-ran; Zhang, Xiao-jing; Fang, Xiang-dong; Yan, Jiang-wei; Cong, Bin

    2016-01-01

    In the case of mass disasters, missing persons and forensic casework, highly degraded biological samples are often encountered. It can be a challenge to analyze and interpret the DNA profiles from these samples. Here we provide a new strategy to solve the problem by taking advantage of the intrinsic structural properties of DNA. We have assessed the in vivo positions of more than 35 million putative nucleosome cores in human leukocytes using high-throughput whole genome sequencing, and identified 2,462 single nucleotide variations (SNVs) and 128 insertion-deletion polymorphisms (indels). After comparing the sequence reads with 44 STR loci commonly used in forensics, five STRs (TH01, TPOX, D18S51, DYS391, and D10S1248) were matched. We compared these “nucleosome protected STRs” (NPSTRs) with five other non-NPSTRs using mini-STR primer design, real-time PCR, and capillary gel electrophoresis on artificially degraded DNA. Moreover, genotyping performance of the five NPSTRs and five non-NPSTRs was also tested with real casework samples. All results show that loci located in nucleosomes are more likely to be successfully genotyped in degraded samples. In conclusion, after further strict validation, these markers could be incorporated into future forensic and paleontology identification kits, resulting in higher discriminatory power for certain degraded sample types. PMID:27189082

  17. Whole genome sequencing provides insights into the genetic determinants of invasiveness in Salmonella Dublin.

    PubMed

    Mohammed, M; Cormican, M

    2016-08-01

    Salmonella enterica subsp. enterica serovar Dublin (S. Dublin) is one of the non-typhoidal Salmonella (NTS); however, a relatively high proportion of human infections are associated with invasive disease. We applied whole genome sequencing to representative invasive and non-invasive clinical isolates of S. Dublin to determine the genomic variations among them and to investigate the underlying genetic determinants associated with invasiveness in S. Dublin. Although no particular genomic variation was found to differentiate invasive from non-invasive isolates, four virulence factors were detected within the genomes of all isolates: two different type VI secretion systems (T6SS) encoded on two Salmonella pathogenicity islands (SPI), SPI-6 (T6SS_SPI-6) and SPI-19 (T6SS_SPI-19); an intact lambdoid prophage (Gifsy-2-like prophage) that contributes significantly to the virulence and pathogenesis of Salmonella serotypes; and a virulence plasmid. These four virulence factors may all contribute to the potential of S. Dublin to cause invasive disease in humans. PMID:26996313

  18. Molecular evolution of fever, thrombocytopenia and leukocytopenia virus (FTLSV) based on whole-genome sequences.

    PubMed

    Liu, Licheng; Chen, Weijun; Yang, Yinhui; Jiang, Yongqiang

    2016-04-01

    FTLSV is a novel bunyavirus that was discovered in 2007 in the Henan province of China and has reported case fatality rates of up to 30%. Despite the high case fatality rate, knowledge of the evolution and molecular epidemiology of FTLSV is limited. In this study, detailed phylogenetic analyses were performed on whole-genome sequences to examine the virus's evolutionary rates, estimate dates of common ancestry, and determine the population dynamics and selection pressure for FTLSV. The evolutionary rates of FTLSV were estimated to be 2.28×10^-4, 2.42×10^-4 and 1.19×10^-4 nucleotide substitutions/site/year for the S, M and L segments, respectively. The most recent common ancestor of the viruses existed approximately 182-294 years ago. Evidence of RNA segment reassortment was found in FTLSV. A Bayesian skyline plot showed that, after a period of genetic stability following high variability, the FTLSV population appeared to have contracted. Selection pressures were estimated and revealed an abundance of negatively selected sites and sparse positively selected sites. These data will be valuable in understanding the evolution and molecular epidemiology of FTLSV, eventually helping to determine mechanisms of emergence and pathogenicity and the level of the virus's threat to public health. PMID:26748010

  19. Whole-genome sequencing of multidrug-resistant Mycobacterium tuberculosis isolates from Myanmar.

    PubMed

    Aung, Htin Lin; Tun, Thanda; Moradigaravand, Danesh; Köser, Claudio U; Nyunt, Wint Wint; Aung, Si Thu; Lwin, Thandar; Thinn, Kyi Kyi; Crump, John A; Parkhill, Julian; Peacock, Sharon J; Cook, Gregory M; Hill, Philip C

    2016-09-01

    Drug-resistant tuberculosis (TB) is a major health threat in Myanmar. An initial study was conducted to explore the potential utility of whole-genome sequencing (WGS) for the diagnosis and management of drug-resistant TB in Myanmar. Fourteen multidrug-resistant Mycobacterium tuberculosis isolates were sequenced. Known resistance genes for a total of nine antibiotics commonly used in the treatment of drug-susceptible and multidrug-resistant TB (MDR-TB) in Myanmar were interrogated through WGS. All 14 isolates were MDR-TB, consistent with the results of phenotypic drug susceptibility testing (DST), and the Beijing lineage predominated. Based on the results of WGS, 9 of the 14 isolates were potentially resistant to at least one of the drugs used in the standard MDR-TB regimen but for which phenotypic DST is not conducted in Myanmar. This study highlights a need for the introduction of second-line DST as part of routine TB diagnosis in Myanmar as well as new classes of TB drugs to construct effective regimens. PMID:27530852

  20. Identification of Chiari Type I Malformation subtypes using whole genome expression profiles and cranial base morphometrics

    PubMed Central

    2014-01-01

    Background: Chiari Type I Malformation (CMI) is characterized by herniation of the cerebellar tonsils through the foramen magnum at the base of the skull, resulting in significant neurologic morbidity. As CMI patients display a high degree of clinical variability and multiple mechanisms have been proposed for tonsillar herniation, it is hypothesized that this heterogeneous disorder is due to multiple genetic and environmental factors. The purpose of the present study was to gain a better understanding of what factors contribute to this heterogeneity by using an unsupervised statistical approach to define disease subtypes within a case-only pediatric population. Methods: A collection of forty-four pediatric CMI patients were ascertained to identify disease subtypes using whole genome expression profiles generated from patient blood and dura mater tissue samples, and radiological data consisting of posterior fossa (PF) morphometrics. Sparse k-means clustering and an extension to accommodate multiple data sources were used to cluster patients into more homogeneous groups using biological and radiological data both individually and collectively. Results: All clustering analyses resulted in the significant identification of patient classes, with the pure biological classes derived from patient blood and dura mater samples demonstrating the strongest evidence. Those patient classes were further characterized by identifying enriched biological pathways, as well as correlated cranial base morphological and clinical traits. Conclusions: Our results implicate several strong biological candidates warranting further investigation from the dura expression analysis and also identified a blood gene expression profile corresponding to a global down-regulation in protein synthesis. PMID:24962150

  1. Automated whole-genome multiple alignment of rat, mouse, and human

    SciTech Connect

    Brudno, Michael; Poliakov, Alexander; Salamov, Asaf; Cooper, Gregory M.; Sidow, Arend; Rubin, Edward M.; Solovyev, Victor; Batzoglou, Serafim; Dubchak, Inna

    2004-07-04

    We have built a whole genome multiple alignment of the three currently available mammalian genomes using a fully automated pipeline which combines the local/global approach of the Berkeley Genome Pipeline and the LAGAN program. The strategy is based on progressive alignment, and consists of two main steps: (1) alignment of the mouse and rat genomes; and (2) alignment of human to either the mouse-rat alignments from step 1, or the remaining unaligned mouse and rat sequences. The resulting alignments demonstrate high sensitivity, with 87% of all human gene-coding areas aligned in both mouse and rat. The specificity is also high: <7% of the rat contigs are aligned to multiple places in human and 97% of all alignments with human sequence > 100kb agree with a three-way synteny map built independently using predicted exons in the three genomes. At the nucleotide level <1% of the rat nucleotides are mapped to multiple places in the human sequence in the alignment; and 96.5% of human nucleotides within all alignments agree with the synteny map. The alignments are publicly available online, with visualization through the novel Multi-VISTA browser that we also present.

  2. Whole genome nucleosome sequencing identifies novel types of forensic markers in degraded DNA samples.

    PubMed

    Dong, Chun-Nan; Yang, Ya-Dong; Li, Shu-Jin; Yang, Ya-Ran; Zhang, Xiao-Jing; Fang, Xiang-Dong; Yan, Jiang-Wei; Cong, Bin

    2016-01-01

    In the case of mass disasters, missing persons and forensic caseworks, highly degraded biological samples are often encountered. It can be a challenge to analyze and interpret the DNA profiles from these samples. Here we provide a new strategy to solve the problem by taking advantage of the intrinsic structural properties of DNA. We have assessed the in vivo positions of more than 35 million putative nucleosome cores in human leukocytes using high-throughput whole genome sequencing, and identified 2,462 single nucleotide variations (SNVs) and 128 insertion-deletion polymorphisms (indels). After comparing the sequence reads with 44 STR loci commonly used in forensics, five STRs (TH01, TPOX, D18S51, DYS391, and D10S1248) were matched. We compared these "nucleosome protected STRs" (NPSTRs) with five other non-NPSTRs using mini-STR primer design, real-time PCR, and capillary gel electrophoresis on artificially degraded DNA. Moreover, genotyping performance of the five NPSTRs and five non-NPSTRs was also tested with real casework samples. All results show that loci located in nucleosomes are more likely to be successfully genotyped in degraded samples. In conclusion, after further strict validation, these markers could be incorporated into future forensic and paleontology identification kits, resulting in higher discriminatory power for certain degraded sample types. PMID:27189082

  3. Whole Genome Sequencing Identifies a Novel Factor Required for Secretory Granule Maturation in Tetrahymena thermophila.

    PubMed

    Kontur, Cassandra; Kumar, Santosh; Lan, Xun; Pritchard, Jonathan K; Turkewitz, Aaron P

    2016-01-01

    Unbiased genetic approaches have a unique ability to identify novel genes associated with specific biological pathways. Thanks to next generation sequencing, forward genetic strategies can be expanded to a wider range of model organisms. The formation of secretory granules, called mucocysts, in the ciliate Tetrahymena thermophila relies, in part, on ancestral lysosomal sorting machinery, but is also likely to involve novel factors. In prior work, multiple strains with defects in mucocyst biogenesis were generated by nitrosoguanidine mutagenesis, and characterized using genetic and cell biological approaches, but the genetic lesions themselves were unknown. Here, we show that analyzing one such mutant by whole genome sequencing reveals a novel factor in mucocyst formation. Strain UC620 has both morphological and biochemical defects in mucocyst maturation, a process analogous to dense core granule maturation in animals. Illumina sequencing of a pool of UC620 F2 clones identified a missense mutation in a novel gene called MMA1 (Mucocyst maturation). The defects in UC620 were rescued by expression of a wild-type copy of MMA1, and disrupting MMA1 in an otherwise wild-type strain phenocopies UC620. The product of MMA1, characterized as a CFP-tagged copy, encodes a large soluble cytosolic protein. A small fraction of Mma1p-CFP is pelletable, which may reflect association with endosomes. The gene has no identifiable homologs except in other Tetrahymena species, and therefore represents an evolutionarily recent innovation that is required for granule maturation. PMID:27317773

  4. Whole Genome Sequencing Identifies a Novel Factor Required for Secretory Granule Maturation in Tetrahymena thermophila

    PubMed Central

    Kontur, Cassandra; Kumar, Santosh; Lan, Xun; Pritchard, Jonathan K.; Turkewitz, Aaron P.

    2016-01-01

    Unbiased genetic approaches have a unique ability to identify novel genes associated with specific biological pathways. Thanks to next generation sequencing, forward genetic strategies can be expanded to a wider range of model organisms. The formation of secretory granules, called mucocysts, in the ciliate Tetrahymena thermophila relies, in part, on ancestral lysosomal sorting machinery, but is also likely to involve novel factors. In prior work, multiple strains with defects in mucocyst biogenesis were generated by nitrosoguanidine mutagenesis, and characterized using genetic and cell biological approaches, but the genetic lesions themselves were unknown. Here, we show that analyzing one such mutant by whole genome sequencing reveals a novel factor in mucocyst formation. Strain UC620 has both morphological and biochemical defects in mucocyst maturation—a process analogous to dense core granule maturation in animals. Illumina sequencing of a pool of UC620 F2 clones identified a missense mutation in a novel gene called MMA1 (Mucocyst maturation). The defects in UC620 were rescued by expression of a wild-type copy of MMA1, and disrupting MMA1 in an otherwise wild-type strain phenocopies UC620. The product of MMA1, characterized as a CFP-tagged copy, encodes a large soluble cytosolic protein. A small fraction of Mma1p-CFP is pelletable, which may reflect association with endosomes. The gene has no identifiable homologs except in other Tetrahymena species, and therefore represents an evolutionarily recent innovation that is required for granule maturation. PMID:27317773

  5. Paired Tumor and Normal Whole Genome Sequencing of Metastatic Olfactory Neuroblastoma

    PubMed Central

    Weiss, Glen J.; Liang, Winnie S.; Izatt, Tyler; Arora, Shilpi; Cherni, Irene; Raju, Robert N.; Hostetter, Galen; Kurdoglu, Ahmet; Christoforides, Alexis; Sinari, Shripad; Baker, Angela S.; Metpally, Raghu; Tembe, Waibhav D.; Phillips, Lori

    2012-01-01

    Background: Olfactory neuroblastoma (ONB) is a rare cancer of the sinonasal tract with little molecular characterization. We performed whole genome sequencing (WGS) on paired normal and tumor DNA from a patient with metastatic-ONB to identify the somatic alterations that might be drivers of tumorigenesis and/or metastatic progression. Methodology/Principal Findings: Genomic DNA was isolated from fresh frozen tissue from a metastatic lesion and whole blood, followed by WGS at >30X depth, alignment and mapping, and mutation analyses. Sanger sequencing was used to confirm selected mutations. Sixty-two somatic short nucleotide variants (SNVs) and five deletions were identified inside coding regions, each causing a non-synonymous DNA sequence change. We selected seven SNVs and validated them by Sanger sequencing. In the metastatic ONB samples collected several months prior to WGS, all seven mutations were present. However, in the original surgical resection specimen (prior to evidence of metastatic disease), mutations in KDR, MYC, SIN3B, and NLRC4 genes were not present, suggesting that these were acquired with disease progression and/or as a result of post-treatment effects. Conclusions/Significance: This work provides insight into the evolution of ONB cancer cells and provides a window into the more complex factors, including tumor clonality and multiple driver mutations. PMID:22649506
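
    The paired tumor/normal comparison described in this abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the `Variant` type, the `somatic_candidates` function and the toy coordinates are hypothetical, and real somatic callers also model read depth, allele fraction and sequencing error rather than taking a plain set difference.

```python
from typing import NamedTuple, Set


class Variant(NamedTuple):
    """A variant call, identified by position and alleles (hypothetical type)."""
    chrom: str
    pos: int   # 1-based position
    ref: str
    alt: str


def somatic_candidates(tumor: Set[Variant], normal: Set[Variant]) -> Set[Variant]:
    """Return variants called in the tumor but absent from the matched normal."""
    return tumor - normal


# Toy call sets; the coordinates are invented for illustration only.
normal_calls = {Variant("chr1", 1000, "A", "G"), Variant("chr2", 500, "C", "T")}
tumor_calls = normal_calls | {Variant("chr4", 2000, "G", "A")}

candidates = somatic_candidates(tumor_calls, normal_calls)
print(sorted(candidates))
```

    Germline variants appear in both call sets and drop out, leaving only tumor-specific candidates for follow-up validation such as the Sanger sequencing mentioned above.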

  6. Neuropeptide evolution: Chelicerate neurohormone and neuropeptide genes may reflect one or more whole genome duplications.

    PubMed

    Veenstra, Jan A

    2016-04-01

    Four genomes and two transcriptomes from six Chelicerate species were analyzed for the presence of neuropeptide and neurohormone precursors and their GPCRs. The genome from the spider Stegodyphus mimosarum yielded 87 neuropeptide precursors and 120 neuropeptide GPCRs. Many neuropeptide transcripts were also found in the transcriptomes of three other spiders, Latrodectus hesperus, Parasteatoda tepidariorum and Acanthoscurria geniculata. For the scorpion Mesobuthus martensii the numbers are 79 and 93 respectively. The very small genome of the house dust mite, Dermatophagoides farinae, on the other hand contains a much smaller number of such genes. A few new putative Arthropod neuropeptide genes were discovered. Thus, both spiders and the scorpion have an achatin gene and in spiders there are two different genes encoding myosuppressin-like peptides while spiders also have two genes encoding novel LGamides. Another finding is the presence of trissin in spiders and scorpions, while neuropeptide genes that seem to be orthologs of Lottia LFRYamide and Platynereis CCRFamide were also found. Such genes were also found in various insect species, but seem to be lacking from the Holometabola. The Chelicerate neuropeptide and neuropeptide GPCR genes often have paralogs. As the large majority of these are probably not due to local gene duplications, it is plausible that they reflect the effects of one or more ancient whole genome duplications. PMID:26928473

  7. Two Rounds of Whole Genome Duplication in the Ancestral Vertebrate

    SciTech Connect

    Dehal, Paramvir; Boore, Jeffrey L.

    2005-04-12

    The hypothesis that the relatively large and complex vertebrate genome was created by two ancient, whole genome duplications has been hotly debated, but remains unresolved. We reconstructed the evolutionary relationships of all gene families from the complete gene sets of a tunicate, fish, mouse, and human, then determined when each gene duplicated relative to the evolutionary tree of the organisms. We confirmed the results of earlier studies that there remains little signal of these events in numbers of duplicated genes, gene tree topology, or the number of genes per multigene family. However, when we plotted the genomic map positions of only the subset of paralogous genes that were duplicated prior to the fish-tetrapod split, their global physical organization provides unmistakable evidence of two distinct genome duplication events early in vertebrate evolution indicated by clear patterns of 4-way paralogous regions covering a large part of the human genome. Our results highlight the potential for these large-scale genomic events to have driven the evolutionary success of the vertebrate lineage.

  8. Prioritizing disease-linked variants, genes, and pathways with an interactive whole-genome analysis pipeline.

    PubMed

    Lee, In-Hee; Lee, Kyungjoon; Hsing, Michael; Choe, Yongjoon; Park, Jin-Ho; Kim, Shu Hee; Bohn, Justin M; Neu, Matthew B; Hwang, Kyu-Baek; Green, Robert C; Kohane, Isaac S; Kong, Sek Won

    2014-05-01

    Whole-genome sequencing (WGS) studies are uncovering disease-associated variants in both rare and nonrare diseases. Utilizing next-generation sequencing for WGS requires a series of computational methods for alignment, variant detection, and annotation, and the accuracy and reproducibility of annotation results are essential for clinical implementation. However, annotating WGS with up-to-date genomic information is still challenging for biomedical researchers. Here, we present one of the fastest and most highly scalable annotation, filtering, and analysis pipelines, gNOME, to prioritize phenotype-associated variants while minimizing false-positive findings. The intuitive graphical user interface of gNOME facilitates the selection of phenotype-associated variants, and the result summaries are provided at variant, gene, and genome levels. Moreover, the enrichment results of specific variants, genes, and gene sets between two groups, or compared with population-scale WGS datasets that are already integrated in the pipeline, can help the interpretation. We found a small number of discordant results between annotation software tools, in part due to different reporting strategies for the variants with complex impacts. Using two published whole-exome datasets of uveal melanoma and bladder cancer, we demonstrated gNOME's accuracy of variant annotation and the enrichment of loss-of-function variants in known cancer pathways. gNOME Web server and source codes are freely available to the academic community (http://gnome.tchlab.org). PMID:24478219

  9. A field guide to whole-genome sequencing, assembly and annotation

    PubMed Central

    Ekblom, Robert; Wolf, Jochen B W

    2014-01-01

    Genome sequencing projects were long confined to biomedical model organisms and required the concerted effort of large consortia. Rapid progress in high-throughput sequencing technology and the simultaneous development of bioinformatic tools have democratized the field. It is now within reach for individual research groups in the eco-evolutionary and conservation community to generate de novo draft genome sequences for any organism of choice. Because of the cost and considerable effort involved in such an endeavour, the important first step is to thoroughly consider whether a genome sequence is necessary for addressing the biological question at hand. Once this decision is taken, a genome project requires careful planning with respect to the organism involved and the intended quality of the genome draft. Here, we briefly review the state of the art within this field and provide a step-by-step introduction to the workflow involved in genome sequencing, assembly and annotation with particular reference to large and complex genomes. This tutorial is targeted at scientists with a background in conservation genetics, but more generally, provides useful practical guidance for researchers engaging in whole-genome sequencing projects. PMID:25553065

  10. Rapid Whole-Genome Sequencing for Genetic Disease Diagnosis in Neonatal Intensive Care Units

    PubMed Central

    Saunders, Carol Jean; Miller, Neil Andrew; Soden, Sarah Elizabeth; Dinwiddie, Darrell Lee; Noll, Aaron; Alnadi, Noor Abu; Andraws, Nevene; Patterson, Melanie LeAnn; Krivohlavek, Lisa Ann; Fellis, Joel; Humphray, Sean; Saffrey, Peter; Kingsbury, Zoya; Weir, Jacqueline Claire; Betley, Jason; Grocock, Russell James; Margulies, Elliott Harrison; Farrow, Emily Gwendolyn; Artman, Michael; Safina, Nicole Pauline; Petrikin, Joshua Erin; Hall, Kevin Peter; Kingsmore, Stephen Francis

    2014-01-01

    Monogenic diseases are frequent causes of neonatal morbidity and mortality, and disease presentations are often undifferentiated at birth. More than 3500 monogenic diseases have been characterized, but clinical testing is available for only some of them and many feature clinical and genetic heterogeneity. Hence, an immense unmet need exists for improved molecular diagnosis in infants. Because disease progression is extremely rapid, albeit heterogeneous, in newborns, molecular diagnoses must occur quickly to be relevant for clinical decision-making. We describe 50-hour differential diagnosis of genetic disorders by whole-genome sequencing (WGS) that features automated bioinformatic analysis and is intended to be a prototype for use in neonatal intensive care units. Retrospective 50-hour WGS identified known molecular diagnoses in two children. Prospective WGS disclosed potential molecular diagnosis of a severe GJB2-related skin disease in one neonate; BRAT1-related lethal neonatal rigidity and multifocal seizure syndrome in another infant; identified BCL9L as a novel, recessive visceral heterotaxy gene (HTX6) in a pedigree; and ruled out known candidate genes in one infant. Sequencing of parents or affected siblings expedited the identification of disease genes in prospective cases. Thus, rapid WGS can potentially broaden and foreshorten differential diagnosis, resulting in fewer empirical treatments and faster progression to genetic and prognostic counseling. PMID:23035047

  11. Whole Genome Sequencing and Complete Genetic Analysis Reveals Novel Pathways to Glycopeptide Resistance in Staphylococcus aureus

    PubMed Central

    Renzoni, Adriana; Andrey, Diego O.; Jousselin, Ambre; Barras, Christine; Monod, Antoinette; Vaudaux, Pierre; Lew, Daniel; Kelley, William L.

    2011-01-01

    The precise mechanisms leading to the emergence of low-level glycopeptide resistance in Staphylococcus aureus are poorly understood. In this study, we used whole genome deep sequencing to detect differences between two isogenic strains: a parental strain and a stable derivative selected stepwise for survival on 4 µg/ml teicoplanin, but which grows at higher drug concentrations (MIC 8 µg/ml). We uncovered only three single nucleotide changes in the selected strain. Nonsense mutations occurred in stp1, encoding a serine/threonine phosphatase, and in yjbH, encoding a post-transcriptional negative regulator of the redox/thiol stress sensor and global transcriptional regulator, Spx. A missense mutation (G45R) occurred in the histidine kinase sensor of cell wall stress, VraS. Using genetic methods, all single, pairwise combinations, and a fully reconstructed triple mutant were evaluated for their contribution to low-level glycopeptide resistance. We found a synergistic cooperation between dual phospho-signalling systems and a subtle contribution from YjbH, suggesting the activation of oxidative stress defences via Spx. To our knowledge, this is the first genetic demonstration of multiple sensor and stress pathways contributing simultaneously to glycopeptide resistance development. The multifactorial nature of glycopeptide resistance in this strain suggests a complex reprogramming of cell physiology to survive in the face of drug challenge. PMID:21738716

  12. Whole Genome Duplications and a ‘Function’ for Junk DNA? Facts and Hypotheses

    PubMed Central

    Veitia, Reiner A.; Bottani, Samuel

    2009-01-01

    Background: The lack of correlation between genome size and organismal complexity is understood in terms of the massive presence of repetitive and non-coding DNA. This non-coding subgenome has long been called “junk” DNA. However, it might have important functions. Generation of junk DNA depends on proliferation of selfish DNA elements and on local or global DNA duplication followed by genic non-functionalization. Methodology/Principal Findings: Evidence from genomic analyses and experimental data indicates that Whole Genome Duplications (WGD) are often followed by a return to the diploid state, through DNA deletions and intra/interchromosomal rearrangements. We use simple theoretical models and simulations to explore how a WGD accompanied by sequence deletions might affect the dosage balance often required among several gene products involved in regulatory processes. We find that potential genomic deletions leading to changes in nuclear and cell volume might potentially perturb gene dosage balance. Conclusions/Significance: The potentially negative impact of DNA deletions can be buffered if deleted genic DNA is, at least temporarily, replaced by repetitive DNA so that the nuclear/cell volume remains compatible with normal living. Thus, we speculate that retention of non-functionalized non-coding DNA, and replacement of deleted DNA through proliferation of selfish elements, might help avoid dosage imbalances in cycles of polyploidization and diploidization, which are particularly frequent in plants. PMID:20011530

  13. PKS and NRPS gene clusters from microbial symbiont cells of marine sponges by whole genome amplification.

    PubMed

    Siegl, Alexander; Hentschel, Ute

    2010-08-01

    Whole genome amplification (WGA) approaches provide genomic information on single microbial cells and hold great promise for the field of environmental microbiology. Here, the microbial consortia of the marine sponge Aplysina aerophoba were sorted by fluorescence-activated cell sorting (FACS) and then subjected to WGA. A cosmid library was constructed from the WGA product of a sample containing two bacterial cells, one a member of the candidate phylum Poribacteria and one of a sponge-specific clade of Chloroflexi. Library screening led to the genomic characterization of three cosmid clones, encoding a polyketide synthase (PKS), a non-ribosomal peptide synthetase (NRPS) and the Chloroflexi 16S rRNA gene. PCR screening of WGA products from additional, FACS-sorted single bacterial symbiont cells supports the assignment of the Sup-PKS gene to the Poribacteria and the novel NRPS gene to the Chloroflexi. This promising single-cell genomics approach has permitted cloning of entire gene clusters from single microbial cells of known phylogenetic origin and thus provides a sought-after link between phylogeny and function. PMID:23766222

  14. Whole genome sequencing shows sleeping sickness relapse is due to parasite regrowth and not reinfection.

    PubMed

    Richardson, Joshua B; Evans, Benjamin; Pyana, Patient P; Van Reet, Nick; Sistrom, Mark; Büscher, Philippe; Aksoy, Serap; Caccone, Adalgisa

    2016-02-01

    The trypanosome Trypanosoma brucei gambiense (Tbg) is a cause of human African trypanosomiasis (HAT) endemic to many parts of sub-Saharan Africa. The disease is almost invariably fatal if untreated and there is no vaccine, which makes monitoring and managing drug resistance highly relevant. A recent study of HAT cases from the Democratic Republic of the Congo reported a high incidence of relapses in patients treated with melarsoprol. Of the 19 Tbg strains isolated from patients enrolled in this study, four pairs were obtained from the same patient before treatment and after relapse. We used whole genome sequencing to investigate whether these patients were infected with a new strain, or if the original strain had regrown to pathogenic levels. Clustering analysis of 5938 single nucleotide polymorphisms supports the hypothesis of regrowth of the original strain, as we found that strains isolated before and after treatment from the same patient were more similar to each other than to other isolates. We also identified 23 novel genes that could affect melarsoprol sensitivity, representing a promising new set of targets for future functional studies. This work exemplifies the utility of using evolutionary approaches to provide novel insights and tools for disease control. PMID:26834831
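
    The regrowth-versus-reinfection reasoning in this abstract rests on same-patient isolates being more similar to each other than to any other isolate. The sketch below illustrates that check under stated assumptions: the isolate names and genotype strings are invented toy data, and a simple Hamming distance stands in for whatever clustering method the authors applied to their 5938 SNPs.

```python
from typing import Dict


def hamming(a: str, b: str) -> int:
    """Count SNP positions at which two equal-length genotype strings differ."""
    if len(a) != len(b):
        raise ValueError("genotype strings must have equal length")
    return sum(x != y for x, y in zip(a, b))


def nearest_neighbor(genotypes: Dict[str, str]) -> Dict[str, str]:
    """Map each isolate to the other isolate with the smallest SNP distance."""
    nearest = {}
    for name, seq in genotypes.items():
        others = [
            (hamming(seq, other_seq), other)
            for other, other_seq in genotypes.items()
            if other != name
        ]
        nearest[name] = min(others)[1]  # ties are broken by isolate name
    return nearest


# Hypothetical isolates: one string of SNP alleles per isolate.
isolates = {
    "P1_before": "AAGTCCGA",
    "P1_after":  "AAGTCCGT",
    "P2_before": "TTGACCGA",
    "P2_after":  "TTGACGGA",
}

nn = nearest_neighbor(isolates)
for name, partner in nn.items():
    print(f"{name} is closest to {partner}")
```

    If relapse were caused by reinfection with an unrelated strain, the post-relapse isolate would not be expected to pair with its own pre-treatment isolate in this way.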

  15. A proposed clinical decision support architecture capable of supporting whole genome sequence information.

    PubMed

    Welch, Brandon M; Loya, Salvador Rodriguez; Eilbeck, Karen; Kawamoto, Kensaku

    2014-04-01

    Whole genome sequence (WGS) information may soon be widely available to help clinicians personalize the care and treatment of patients. However, considerable barriers exist, which may hinder the effective utilization of WGS information in a routine clinical care setting. Clinical decision support (CDS) offers a potential solution to overcome such barriers and to facilitate the effective use of WGS information in the clinic. However, genomic information is complex and will require significant considerations when developing CDS capabilities. As such, this manuscript lays out a conceptual framework for a CDS architecture designed to deliver WGS-guided CDS within the clinical workflow. To handle the complexity and breadth of WGS information, the proposed CDS framework leverages service-oriented capabilities and orchestrates the interaction of several independently-managed components. These independently-managed components include the genome variant knowledge base, the genome database, the CDS knowledge base, a CDS controller and the electronic health record (EHR). A key design feature is that genome data can be stored separately from the EHR. This paper describes in detail: (1) each component of the architecture; (2) the interaction of the components; and (3) how the architecture attempts to overcome the challenges associated with WGS information. We believe that service-oriented CDS capabilities will be essential to using WGS information for personalized medicine. PMID:25411644
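
    The component list in this abstract can be made concrete with a small sketch. It is only an illustration of the orchestration idea, not the authors' design: every class name, method and data value below is hypothetical, the identifiers are placeholders rather than real variants or clinical rules, and the EHR is reduced to the caller that supplies a patient identifier.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class GenomeDatabase:
    """Holds each patient's variants, stored separately from the EHR."""
    variants_by_patient: Dict[str, List[str]]

    def variants_for(self, patient_id: str) -> List[str]:
        return self.variants_by_patient.get(patient_id, [])


@dataclass
class VariantKnowledgeBase:
    """Maps a variant identifier to an interpretation label."""
    interpretations: Dict[str, str]

    def interpret(self, variant: str) -> Optional[str]:
        return self.interpretations.get(variant)


@dataclass
class CDSKnowledgeBase:
    """Maps an interpretation label to a recommendation."""
    rules: Dict[str, str]

    def recommend(self, interpretation: str) -> Optional[str]:
        return self.rules.get(interpretation)


@dataclass
class CDSController:
    """Orchestrates the independently managed components for one request."""
    genome_db: GenomeDatabase
    variant_kb: VariantKnowledgeBase
    cds_kb: CDSKnowledgeBase

    def advise(self, patient_id: str) -> List[str]:
        advice = []
        for variant in self.genome_db.variants_for(patient_id):
            interpretation = self.variant_kb.interpret(variant)
            if interpretation is None:
                continue  # no knowledge about this variant yet
            recommendation = self.cds_kb.recommend(interpretation)
            if recommendation is not None:
                advice.append(recommendation)
        return advice


# Placeholder data; an EHR would trigger advise() during the clinical workflow.
controller = CDSController(
    GenomeDatabase({"patient-1": ["variant_A", "variant_B"]}),
    VariantKnowledgeBase({"variant_A": "interpretation_X"}),
    CDSKnowledgeBase({"interpretation_X": "recommendation_1"}),
)
print(controller.advise("patient-1"))
```

    Keeping each knowledge base behind its own interface mirrors the abstract's point that the components are independently managed, so one can be updated without touching the others.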

  16. Whole-genome re-sequencing of non-model organisms: lessons from unmapped reads.

    PubMed

    Gouin, A; Legeai, F; Nouhaud, P; Whibley, A; Simon, J-C; Lemaitre, C

    2015-05-01

    Unmapped reads are often discarded from the analysis of whole-genome re-sequencing, but new biological information and insights can be uncovered through their analysis. In this paper, we investigate unmapped reads from the re-sequencing data of 33 pea aphid genomes from individuals specialized on different host plants. The unmapped reads for each individual were retrieved following mapping to the Acyrthosiphon pisum reference genome and its mitochondrial and symbiont genomes. These sets of unmapped reads were then cross-compared, revealing that a significant number of these unmapped sequences were conserved across individuals. Interestingly, sequences were most commonly shared between individuals adapted to the same host plant, suggesting that these sequences may contribute to the divergence between host plant specialized biotypes. Analysis of the contigs obtained from assembling the unmapped reads pooled by biotype allowed us to recover some divergent genomic regions previously excluded from analysis and to discover putative novel sequences of A. pisum and its symbionts. In conclusion, this study emphasizes the interest of the unmapped component of re-sequencing data sets and the potential loss of important information. We here propose strategies to aid the capture and interpretation of this information. PMID:25269379
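
    Retrieving unmapped reads after mapping, as described in this abstract, can be sketched for alignments stored in SAM text format. The abstract does not name the tools or file formats the authors used, so this is an assumption for illustration: it relies only on the SAM convention that bit 0x4 of the FLAG field marks an unmapped segment, and the function name and toy records are hypothetical.

```python
from typing import Iterable, Iterator, Tuple

UNMAPPED_FLAG = 0x4  # SAM FLAG bit meaning "segment unmapped"


def unmapped_reads(sam_lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (read name, sequence) for every unmapped record in SAM text."""
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])  # column 2 is FLAG
        if flag & UNMAPPED_FLAG:
            yield fields[0], fields[9]  # column 1 is QNAME, column 10 is SEQ


# Toy SAM content with one header line and four alignment records.
toy_sam = [
    "@HD\tVN:1.6\tSO:unsorted",
    "r1\t0\tchr1\t100\t60\t8M\t*\t0\t0\tACGTACGT\tIIIIIIII",
    "r2\t4\t*\t0\t0\t*\t*\t0\t0\tTTTTGGGG\tIIIIIIII",
    "r3\t16\tchr1\t200\t60\t8M\t*\t0\t0\tGGGGCCCC\tIIIIIIII",
    "r4\t77\t*\t0\t0\t*\t*\t0\t0\tCCCCAAAA\tIIIIIIII",
]

recovered = list(unmapped_reads(toy_sam))
print(recovered)
```

    The recovered sequences could then be pooled and assembled, which is the step the abstract describes for discovering divergent or novel regions.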

  17. Whole-Genome Sequencing of the World’s Oldest People

    PubMed Central

    Gierman, Hinco J.; Fortney, Kristen; Roach, Jared C.; Coles, Natalie S.; Li, Hong; Glusman, Gustavo; Markov, Glenn J.; Smith, Justin D.; Hood, Leroy; Coles, L. Stephen; Kim, Stuart K.

    2014-01-01

    Supercentenarians (110 years or older) are the world’s oldest people. Seventy-four are alive worldwide, with twenty-two in the United States. We performed whole-genome sequencing on 17 supercentenarians to explore the genetic basis underlying extreme human longevity. We found no significant evidence of enrichment for a single rare protein-altering variant or for a gene harboring different rare protein-altering variants in supercentenarian compared to control genomes. We followed up on the gene most enriched for rare protein-altering variants in our cohort of supercentenarians, TSHZ3, by sequencing it in a second cohort of 99 long-lived individuals but did not find a significant enrichment. The genome of one supercentenarian had a pathogenic mutation in DSC2, known to predispose to arrhythmogenic right ventricular cardiomyopathy, which is recommended to be reported to this individual as an incidental finding according to a recent position statement by the American College of Medical Genetics and Genomics. Even with this pathogenic mutation, the proband lived to over 110 years. The entire list of rare protein-altering variants and DNA sequence of all 17 supercentenarian genomes is available as a resource to assist the discovery of the genetic basis of extreme longevity in future studies. PMID:25390934

  18. Exome and whole-genome sequencing of esophageal adenocarcinoma identifies recurrent driver events and mutational complexity.

    PubMed

    Dulak, Austin M; Stojanov, Petar; Peng, Shouyong; Lawrence, Michael S; Fox, Cameron; Stewart, Chip; Bandla, Santhoshi; Imamura, Yu; Schumacher, Steven E; Shefler, Erica; McKenna, Aaron; Carter, Scott L; Cibulskis, Kristian; Sivachenko, Andrey; Saksena, Gordon; Voet, Douglas; Ramos, Alex H; Auclair, Daniel; Thompson, Kristin; Sougnez, Carrie; Onofrio, Robert C; Guiducci, Candace; Beroukhim, Rameen; Zhou, Zhongren; Lin, Lin; Lin, Jules; Reddy, Rishindra; Chang, Andrew; Landrenau, Rodney; Pennathur, Arjun; Ogino, Shuji; Luketich, James D; Golub, Todd R; Gabriel, Stacey B; Lander, Eric S; Beer, David G; Godfrey, Tony E; Getz, Gad; Bass, Adam J

    2013-05-01

    The incidence of esophageal adenocarcinoma (EAC) has risen 600% over the last 30 years. With a 5-year survival rate of ~15%, the identification of new therapeutic targets for EAC is greatly important. We analyze the mutation spectra from whole-exome sequencing of 149 EAC tumor-normal pairs, 15 of which have also been subjected to whole-genome sequencing. We identify a mutational signature defined by a high prevalence of A>C transversions at AA dinucleotides. Statistical analysis of exome data identified 26 significantly mutated genes. Of these genes, five (TP53, CDKN2A, SMAD4, ARID1A and PIK3CA) have previously been implicated in EAC. The new significantly mutated genes include chromatin-modifying factors and candidate contributors SPG20, TLR4, ELMO1 and DOCK2. Functional analyses of EAC-derived mutations in ELMO1 identify increased cellular invasion. Therefore, we suggest the potential activation of the RAC1 pathway as a contributor to EAC tumorigenesis. PMID:23525077

  19. Discovery of new Mycoplasma pneumoniae antigens by use of a whole-genome lambda display library.

    PubMed

    Beghetto, Elisa; De Paolis, Francesca; Montagnani, Francesca; Cellesi, Carla; Gargano, Nicola

    2009-01-01

    Mycoplasma pneumoniae is the leading cause of atypical pneumonia in children and young adults. Bacterial colonization can occur in both the upper and the lower respiratory tracts and take place both endemically and epidemically worldwide. Characteristically, the infection is chronic in onset and recovery and both humoral and cell-mediated mechanisms are involved in the response to bacterial colonization. To identify bacterial proteins recognized by host antibody responses, a whole-genome M. pneumoniae library was created and displayed on lambda bacteriophage. The challenge of such a library with sera from individuals hospitalized for mycoplasmal pneumonia allowed the identification of a panel of recombinant bacteriophages carrying B-cell epitopes. Among the already known M. pneumoniae B-cell antigens, our results confirmed the immunogenicity of P1 and P30 adhesins. Also, the data presented in this study localized, within their sequences, the immunodominant epitopes recognized by human immunoglobulins. Furthermore, library screening allowed the identification of four novel immunogenic polypeptides, respectively, encoded by fragments of the MPN152, MPN426, MPN456 and MPN-500 open reading frames, highlighting and further confirming the potential of lambda display technology in antigen and epitope discovery. PMID:18992837

  20. Multidrug-resistant Escherichia coli soft tissue infection investigated with bacterial whole genome sequencing.

    PubMed

    Buchanan, Ruaridh; Stoesser, Nicole; Crook, Derrick; Bowler, Ian C J W

    2014-01-01

    A 45-year-old man with dilated cardiomyopathy presented with acute leg pain and erythema suggestive of necrotising fasciitis. Initial surgical exploration revealed no necrosis and treatment for a soft tissue infection was started. Blood and tissue cultures unexpectedly grew a Gram-negative bacillus, subsequently identified by an automated broth microdilution phenotyping system as an extended-spectrum β-lactamase producing Escherichia coli. The patient was treated with a 3-week course of antibiotics (ertapenem followed by ciprofloxacin) and debridement for small areas of necrosis, followed by skin grafting. The presence of E. coli triggered investigation of both host and pathogen. The patient was found to have previously undiagnosed liver disease, a risk factor for E. coli soft tissue infection. Whole genome sequencing of isolates from all specimens confirmed they were clonal, of sequence type ST131 and associated with a likely plasmid-associated AmpC (CMY-2), several other resistance genes and a number of virulence factors. PMID:25331151

  1. Deep Whole-Genome Sequencing to Detect Mixed Infection of Mycobacterium tuberculosis.

    PubMed

    Gan, Mingyu; Liu, Qingyun; Yang, Chongguang; Gao, Qian; Luo, Tao

    2016-01-01

    Mixed infection by multiple Mycobacterium tuberculosis (MTB) strains is associated with poor treatment outcome of tuberculosis (TB). Traditional genotyping methods have been used to detect mixed infections of MTB; however, their sensitivity and resolution are limited. Deep whole-genome sequencing (WGS) has proved highly sensitive and discriminative for studying population heterogeneity of MTB. Here, we developed a phylogenetic-based method to detect MTB mixed infections using WGS data. We collected published WGS data of 782 global MTB strains from public databases. We called homogeneous and heterogeneous single nucleotide variations (SNVs) of individual strains by mapping short reads to the ancestral MTB reference genome. We constructed a phylogenomic database based on 68,639 homogeneous SNVs of 652 MTB strains. A mixed infection was called when multiple evolutionary paths were identified by mapping the SNVs of an individual sample to the phylogenomic database. By simulation, our method could specifically detect mixed infections when the sequencing depth of minor strains was as low as 1× coverage, and when the genomic distance of two mixed strains was as small as 16 SNVs. By applying our methods to all 782 samples, we detected 47 mixed infections and 45 of them were caused by locally endemic strains. The results indicate that our method is highly sensitive and discriminative for identifying mixed infections from deep WGS data of MTB isolates. PMID:27391214
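
    The distinction the abstract draws between homogeneous and heterogeneous SNVs can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function name `classify_snv`, the frequency cutoffs, and the read counts are all hypothetical and chosen only to show the idea of labelling a site by its alternate-allele frequency.

```python
def classify_snv(alt_reads, total_reads, low=0.05, high=0.95):
    """Label a site by its alternate-allele frequency.

    The cutoffs `low` and `high` are illustrative assumptions, not values
    taken from the study.
    """
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    freq = alt_reads / total_reads
    if freq >= high:
        return "homogeneous"      # nearly all reads carry the variant
    if freq > low:
        return "heterogeneous"    # reads are split between two alleles
    return "reference"            # too few variant reads to call


# Hypothetical read counts at three sites: (alt reads, total reads)
sites = {"site_a": (98, 100), "site_b": (30, 100), "site_c": (1, 100)}
calls = {name: classify_snv(alt, tot) for name, (alt, tot) in sites.items()}
```

    In a sample carrying two strains, sites that differ between the strains would tend to fall into the heterogeneous class, which is the kind of signal a phylogeny-based method can then place on separate evolutionary paths.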

  2. Digital Droplet Multiple Displacement Amplification (ddMDA) for Whole Genome Sequencing of Limited DNA Samples

    PubMed Central

    Rhee, Minsoung; Light, Yooli K.; Meagher, Robert J.; Singh, Anup K.

    2016-01-01

    Multiple displacement amplification (MDA) is a widely used technique for amplification of DNA from samples containing limited amounts of DNA (e.g., uncultivable microbes or clinical samples) before whole genome sequencing. Despite its advantages of high yield and fidelity, it suffers from high amplification bias and non-specific amplification when amplifying sub-nanogram quantities of template DNA. Here, we present a microfluidic digital droplet MDA (ddMDA) technique where partitioning of the template DNA into thousands of sub-nanoliter droplets, each containing a small number of DNA fragments, greatly reduces the competition among DNA fragments for primers and polymerase, thereby greatly reducing amplification bias. Consequently, the ddMDA approach enabled a more uniform coverage of amplification over the entire length of the genome, with significantly lower bias and non-specific amplification than conventional MDA. For a sample containing 0.1 pg/μL of E. coli DNA (equivalent of ~3/1000 of an E. coli genome per droplet), ddMDA achieves a 65-fold increase in coverage in de novo assembly, and more than 20-fold increase in specificity (percentage of reads mapping to E. coli) compared to the conventional tube MDA. ddMDA offers a powerful method useful for many applications including medical diagnostics, forensics, and environmental microbiology. PMID:27144304
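
    The stated equivalence between 0.1 pg/μL and roughly 3/1000 of a genome per droplet can be sanity-checked with a short calculation. The sketch below is an illustration under assumptions that are not in the abstract: an E. coli genome of about 4.6 Mb, an average mass of 650 g/mol per base pair of double-stranded DNA, and the premise that the quoted concentration applies to the droplet contents. The function names are hypothetical.

```python
AVOGADRO = 6.022e23   # molecules per mole
BP_MASS = 650.0       # assumed average g/mol per base pair of dsDNA
GENOME_BP = 4.6e6     # assumed E. coli genome size in base pairs


def genome_mass_g(genome_bp=GENOME_BP):
    """Mass in grams of one genome copy."""
    return genome_bp * BP_MASS / AVOGADRO


def genomes_per_nl(conc_pg_per_ul, genome_bp=GENOME_BP):
    """Genome copies per nanoliter at a given DNA concentration."""
    grams_per_ul = conc_pg_per_ul * 1e-12
    genomes_per_ul = grams_per_ul / genome_mass_g(genome_bp)
    return genomes_per_ul / 1000.0   # 1 uL = 1000 nL


def droplet_volume_nl(genomes_per_droplet, conc_pg_per_ul):
    """Droplet volume implied by a target genome fraction per droplet."""
    return genomes_per_droplet / genomes_per_nl(conc_pg_per_ul)


copies_per_nl = genomes_per_nl(0.1)                    # about 0.02
implied_volume_nl = droplet_volume_nl(3 / 1000, 0.1)   # about 0.15 nL
```

    Under these assumptions the implied droplet volume is on the order of 0.15 nL, which is consistent with the abstract's description of sub-nanoliter droplets.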

  3. Digital Droplet Multiple Displacement Amplification (ddMDA) for Whole Genome Sequencing of Limited DNA Samples.

    PubMed

    Rhee, Minsoung; Light, Yooli K; Meagher, Robert J; Singh, Anup K

    2016-01-01

    Multiple displacement amplification (MDA) is a widely used technique for amplification of DNA from samples containing limited amounts of DNA (e.g., uncultivable microbes or clinical samples) before whole genome sequencing. Despite its advantages of high yield and fidelity, it suffers from high amplification bias and non-specific amplification when amplifying sub-nanogram quantities of template DNA. Here, we present a microfluidic digital droplet MDA (ddMDA) technique where partitioning of the template DNA into thousands of sub-nanoliter droplets, each containing a small number of DNA fragments, greatly reduces the competition among DNA fragments for primers and polymerase, thereby greatly reducing amplification bias. Consequently, the ddMDA approach enabled a more uniform coverage of amplification over the entire length of the genome, with significantly lower bias and non-specific amplification than conventional MDA. For a sample containing 0.1 pg/μL of E. coli DNA (equivalent of ~3/1000 of an E. coli genome per droplet), ddMDA achieves a 65-fold increase in coverage in de novo assembly, and more than 20-fold increase in specificity (percentage of reads mapping to E. coli) compared to the conventional tube MDA. ddMDA offers a powerful method useful for many applications including medical diagnostics, forensics, and environmental microbiology. PMID:27144304

  4. Whole-Genome Analyses of Korean Native and Holstein Cattle Breeds by Massively Parallel Sequencing

    PubMed Central

    Stothard, Paul; Chung, Won-Hyong; Jeon, Heoyn-Jeong; Miller, Stephen P.; Choi, So-Young; Lee, Jeong-Koo; Yang, Bokyoung; Lee, Kyung-Tai; Han, Kwang-Jin; Kim, Hyeong-Cheol; Jeong, Dongkee; Oh, Jae-Don; Kim, Namshin; Kim, Tae-Hun; Lee, Hak-Kyo; Lee, Sung-Jin

    2014-01-01

    A main goal of cattle genomics is to identify DNA differences that account for variations in economically important traits. In this study, we performed whole-genome analyses of three important cattle breeds in Korea—Hanwoo, Jeju Heugu, and Korean Holstein—using the Illumina HiSeq 2000 sequencing platform. We achieved 25.5-, 29.6-, and 29.5-fold coverage of the Hanwoo, Jeju Heugu, and Korean Holstein genomes, respectively, and identified a total of 10.4 million single nucleotide polymorphisms (SNPs), of which 54.12% were found to be novel. We also detected 1,063,267 insertions–deletions (InDels) across the genomes (78.92% novel). Annotations of the datasets identified a total of 31,503 nonsynonymous SNPs and 859 frameshift InDels that could affect phenotypic variations in traits of interest. Furthermore, genome-wide copy number variation regions (CNVRs) were detected by comparing the Hanwoo, Jeju Heugu, and previously published Chikso genomes against that of Korean Holstein. A total of 992, 284, and 1881 CNVRs, respectively, were detected throughout the genome. Moreover, 53, 65, 45, and 82 putative regions of homozygosity (ROH) were identified in Hanwoo, Jeju Heugu, Chikso, and Korean Holstein respectively. The results of this study provide a valuable foundation for further investigations to dissect the molecular mechanisms underlying variation in economically important traits in cattle and to develop genetic markers for use in cattle breeding. PMID:24992012

  5. GBSA: a comprehensive software for analysing whole genome bisulfite sequencing data

    PubMed Central

    Benoukraf, Touati; Wongphayak, Sarawut; Hadi, Luqman Hakim Abdul; Wu, Mengchu; Soong, Richie

    2013-01-01

    High-throughput sequencing is increasingly being used in combination with bisulfite (BS) assays to study DNA methylation at nucleotide resolution. Although several programmes provide genome-wide alignment of BS-treated reads, the resulting information is not readily interpretable and often requires further bioinformatic steps for meaningful analysis. Current post-alignment BS-sequencing programmes are generally focused on the gene-specific level, a restrictive feature when analysis in the non-coding regions, such as enhancers and intergenic microRNAs, is required. Here, we present Genome Bisulfite Sequencing Analyser (GBSA—http://ctrad-csi.nus.edu.sg/gbsa), a free open-source software capable of analysing whole-genome bisulfite sequencing data with either a gene-centric or gene-independent focus. Through analysis of the largest published data sets to date, we demonstrate GBSA’s features in providing sequencing quality assessment, methylation scoring, functional data management and visualization of genomic methylation at nucleotide resolution. Additionally, we show that GBSA’s output can be easily integrated with other high-throughput sequencing data, such as RNA-Seq or ChIP-seq, to elucidate the role of methylated intergenic regions in gene regulation. In essence, GBSA allows an investigator to explore not only known loci but also all the genomic regions, for which methylation studies could lead to the discovery of new regulatory mechanisms. PMID:23268441

  6. Rapid phylogenetic analysis of large samples of recombinant bacterial whole genome sequences using Gubbins.

    PubMed

    Croucher, Nicholas J; Page, Andrew J; Connor, Thomas R; Delaney, Aidan J; Keane, Jacqueline A; Bentley, Stephen D; Parkhill, Julian; Harris, Simon R

    2015-02-18

    The emergence of new sequencing technologies has facilitated the use of bacterial whole genome alignments for evolutionary studies and outbreak analyses. These datasets, of increasing size, often include examples of multiple different mechanisms of horizontal sequence transfer resulting in substantial alterations to prokaryotic chromosomes. The impact of these processes demands rapid and flexible approaches able to account for recombination when reconstructing isolates' recent diversification. Gubbins is an iterative algorithm that uses spatial scanning statistics to identify loci containing elevated densities of base substitutions suggestive of horizontal sequence transfer while concurrently constructing a maximum likelihood phylogeny based on the putative point mutations outside these regions of high sequence diversity. Simulations demonstrate the algorithm generates highly accurate reconstructions under realistically parameterized models of bacterial evolution, and achieves convergence in only a few hours on alignments of hundreds of bacterial genome sequences. Gubbins is appropriate for reconstructing the recent evolutionary history of a variety of haploid genotype alignments, as it makes no assumptions about the underlying mechanism of recombination. The software is freely available for download at github.com/sanger-pathogens/Gubbins, implemented in Python and C and supported on Linux and Mac OS X. PMID:25414349
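
    The idea of scanning an alignment for loci with elevated densities of base substitutions can be sketched in a few lines of Python. This is not the Gubbins algorithm or its API: it is a simplified, fixed-window binomial scan with a Bonferroni correction, and the function names, window size, and positions below are hypothetical.

```python
from math import comb


def binomial_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def dense_windows(snp_positions, genome_length, window, alpha=0.05):
    """Return (start, end, count) for windows with an excess of substitutions.

    Null model (an assumption of this sketch): each substitution falls in a
    given window with probability window / genome_length.
    """
    n = len(snp_positions)
    p = window / genome_length
    n_windows = genome_length // window
    hits = []
    for start in range(0, genome_length - window + 1, window):
        k = sum(start <= pos < start + window for pos in snp_positions)
        # Bonferroni correction over the number of windows tested
        if binomial_tail(k, n, p) * n_windows < alpha:
            hits.append((start, start + window, k))
    return hits


# Hypothetical data: 10 scattered substitutions plus 8 clustered ones
scattered = [150, 550, 1250, 2350, 3450, 4550, 6650, 7750, 8850, 9950]
clustered = [5000 + 10 * i for i in range(8)]
hits = dense_windows(scattered + clustered, genome_length=10_000, window=100)
```

    Only the window holding the clustered substitutions is flagged here. A method of the kind described in the abstract would go further, excluding such regions and rebuilding the phylogeny iteratively from the remaining point mutations.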

  7. Whole Genome Sequencing Reveals a De Novo SHANK3 Mutation in Familial Autism Spectrum Disorder

    PubMed Central

    Nemirovsky, Sergio I.; Córdoba, Marta; Zaiat, Jonathan J.; Completa, Sabrina P.; Vega, Patricia A.; González-Morón, Dolores; Medina, Nancy M.; Fabbro, Mónica; Romero, Soledad; Brun, Bianca; Revale, Santiago; Ogara, María Florencia; Pecci, Adali; Marti, Marcelo; Vazquez, Martin; Turjanski, Adrián; Kauffman, Marcelo A.

    2015-01-01

    Introduction: Clinical genomics promises to be especially suitable for the study of etiologically heterogeneous conditions such as Autism Spectrum Disorder (ASD). Here we present three siblings with ASD where we evaluated the usefulness of Whole Genome Sequencing (WGS) for the diagnostic approach to ASD. Methods: We identified a family segregating ASD in three siblings with an unidentified cause. We performed WGS in the three probands and used a state-of-the-art comprehensive bioinformatic analysis pipeline and prioritized the identified variants located in genes likely to be related to ASD. We validated the finding by Sanger sequencing in the probands and their parents. Results: Three male siblings presented a syndrome characterized by severe intellectual disability, absence of language, autism spectrum symptoms and epilepsy with negative family history for mental retardation, language disorders, ASD or other psychiatric disorders. We found germline mosaicism for a heterozygous deletion of a cytosine in exon 21 of the SHANK3 gene, resulting in a missense sequence of 5 codons followed by a premature stop codon (NM_033517:c.3259_3259delC, p.Ser1088Profs*6). Conclusions: We reported an infrequent form of familial ASD where WGS proved useful in the clinic. We identified a mutation in SHANK3 that underscores its relevance in Autism Spectrum Disorder. PMID:25646853

  8. Mining metagenomic whole genome sequences revealed subdominant but constant Lactobacillus population in the human gut microbiota.

    PubMed

    Rossi, Maddalena; Martínez-Martínez, Daniel; Amaretti, Alberto; Ulrici, Alessandro; Raimondi, Stefano; Moya, Andrés

    2016-06-01

    The genus Lactobacillus includes over 215 species that colonize plants, foods, sewage and the gastrointestinal tract (GIT) of humans and animals. In the GIT, the Lactobacillus population can be made up of true inhabitants or of bacteria occasionally ingested with fermented or spoiled foods, or with probiotics. This study longitudinally surveyed Lactobacillus species and strains in the feces of a healthy subject through whole genome sequencing (WGS) data-mining, in order to identify members of the permanent or transient populations. At three time points (0, 670 and 700 d), 58 different species were identified, 16 of them being retrieved for the first time in human feces. L. rhamnosus, L. ruminis, L. delbrueckii, L. plantarum, L. casei and L. acidophilus were the most represented, with estimated amounts ranging between 6 and 8 Log (cells g^-1), while the others were detected at 4 or 5 Log (cells g^-1). 86 Lactobacillus strains belonging to 52 species were identified. 43 seemingly occupied the GIT as true residents, since they were detected over a time span of almost 2 years in all three samples or in 2 samples separated by 670 or 700 d. As a whole, a stable community of lactobacilli was disclosed, with wide and understudied biodiversity. PMID:27043715

  9. Bonus Organisms in High-Throughput Eukaryotic Whole-Genome Shotgun Assembly

    SciTech Connect

    Pangilinan, Jasmyn; Shapiro, Harris; Tu, Hank; Platt, Darren

    2006-02-06

    The DOE Joint Genome Institute has sequenced over 50 eukaryotic genomes, ranging in size from 15 MB to 1.6 GB, over a wide range of organism types. In the course of doing so, it has become clear that a substantial fraction of these data sets contains bonus organisms, usually prokaryotes, in addition to the desired genome. While some of these additional organisms are extraneous contamination, they are sometimes symbionts, and so can be of biological interest. Therefore, it is desirable to assemble the bonus organisms along with the main genome. This transforms the problem into one of metagenomic assembly, which is considerably more challenging than traditional whole-genome shotgun (WGS) assembly. The different organisms will usually be present at different sequence depths, which is difficult to handle in most WGS assemblers. In addition, with multiple distinct genomes present, chimerism can produce cross-organism combinations. Finally, there is no guarantee that only a single bonus organism will be present. For example, one JGI project contained at least two different prokaryotic contaminants, plus a 145 KB plasmid of unknown origin. We have developed techniques to routinely identify and handle such bonus organisms in a high-throughput sequencing environment. Approaches include screening and partitioning the unassembled data, and iterative subassemblies. These methods are applicable not only to bonus organisms, but also to desired components such as organelles. These procedures have the additional benefit of identifying, and allowing for the removal of, cloning artifacts such as E. coli and spurious vector inclusions.

  10. Identification of Salmonella for public health surveillance using whole genome sequencing

    PubMed Central

    Ashton, Philip M.; Nair, Satheesh; Peters, Tansy M.; Bale, Janet A.; Powell, David G.; Painset, Anaïs; Tewolde, Rediat; Schaefer, Ulf; de Pinna, Elizabeth M.; Grant, Kathie A.

    2016-01-01

    In April 2015, Public Health England implemented whole genome sequencing (WGS) as a routine typing tool for public health surveillance of Salmonella, adopting a multilocus sequence typing (MLST) approach as a replacement for traditional serotyping. The WGS derived sequence type (ST) was compared to the phenotypic serotype for 6,887 isolates of S. enterica subspecies I, and of these, 6,616 (96%) were concordant. Of the 4% (n = 271) of isolates of subspecies I exhibiting a mismatch, 119 were due to a process error in the laboratory, 26 were likely caused by the serotype designation in the MLST database being incorrect and 126 occurred when two different serovars belonged to the same ST. The population structure of S. enterica subspecies II–IV differs markedly from that of subspecies I and, based on current data, defining the serovar from the clonal complex may be less appropriate for the classification of this group. Novel sequence types that were not present in the MLST database were identified in 8.6% of the total number of samples tested (including S. enterica subspecies I–IV and S. bongori) and these 654 isolates belonged to 326 novel STs. For S. enterica subspecies I, WGS MLST derived serotyping is a high throughput, accurate, robust, reliable typing method, well suited to routine public health surveillance. The combined output of ST and serovar supports the maintenance of traditional serovar nomenclature while providing additional insight on the true phylogenetic relationship between isolates. PMID:27069781
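
    The concordance figures quoted for subspecies I can be cross-checked with a few lines of Python. The counts are taken directly from the abstract; the variable names and the per-cause percentage breakdown are additions made here for illustration.

```python
total_isolates = 6887   # S. enterica subspecies I isolates compared
concordant = 6616       # WGS-derived ST agreed with phenotypic serotype

# Causes of mismatch as listed in the abstract
mismatch_causes = {
    "laboratory process error": 119,
    "incorrect serotype in MLST database": 26,
    "two serovars sharing one ST": 126,
}

mismatched = sum(mismatch_causes.values())
concordance_pct = 100 * concordant / total_isolates

# Share of each cause among the mismatches (derived here, not reported)
cause_share_pct = {
    cause: round(100 * n / mismatched, 1) for cause, n in mismatch_causes.items()
}
```

    The three causes sum to the 271 mismatches reported, and concordant plus mismatched isolates account for all 6,887, so the 96% and 4% figures are internally consistent.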

  11. Attitudes of African Americans toward Return of Results from Exome and Whole Genome Sequencing

    PubMed Central

    Yu, Joon-Ho; Crouch, Julia; Jamal, Seema M.; Tabor, Holly K.; Bamshad, Michael J.

    2013-01-01

    Exome sequencing and whole genome sequencing (ES/WGS) present patients and research participants with the opportunity to benefit from a broad scope of genetic results of clinical and personal utility. Yet, this potential for benefit also risks disenfranchising populations such as African Americans (AAs) that are already underrepresented in genetic research and utilize genetic tests at lower rates than other populations. Understanding a diverse range of perspectives on consenting for ES/WGS and receiving ES/WGS results is necessary to ensure parity in genomic health care and research. We conducted a series of 13 focus groups (n=76) to investigate if and how attitudes toward participation in ES/WGS research and return of results from ES/WGS differ between self described AAs and non-AAs. The majority of both AAs and non-AAs were willing to participate in WGS studies and receive individual genetic results, but the fraction not interested in either was higher in AAs. This is due in part to different expectations of health benefits from ES/WGS and how results should be managed. Our results underscore the need to develop and test culturally tailored strategies for returning ES/WGS results to AAs. PMID:23610051

  12. Rapid phylogenetic analysis of large samples of recombinant bacterial whole genome sequences using Gubbins

    PubMed Central

    Croucher, Nicholas J.; Page, Andrew J.; Connor, Thomas R.; Delaney, Aidan J.; Keane, Jacqueline A.; Bentley, Stephen D.; Parkhill, Julian; Harris, Simon R.

    2015-01-01

    The emergence of new sequencing technologies has facilitated the use of bacterial whole genome alignments for evolutionary studies and outbreak analyses. These datasets, of increasing size, often include examples of multiple different mechanisms of horizontal sequence transfer resulting in substantial alterations to prokaryotic chromosomes. The impact of these processes demands rapid and flexible approaches able to account for recombination when reconstructing isolates’ recent diversification. Gubbins is an iterative algorithm that uses spatial scanning statistics to identify loci containing elevated densities of base substitutions suggestive of horizontal sequence transfer while concurrently constructing a maximum likelihood phylogeny based on the putative point mutations outside these regions of high sequence diversity. Simulations demonstrate the algorithm generates highly accurate reconstructions under realistically parameterized models of bacterial evolution, and achieves convergence in only a few hours on alignments of hundreds of bacterial genome sequences. Gubbins is appropriate for reconstructing the recent evolutionary history of a variety of haploid genotype alignments, as it makes no assumptions about the underlying mechanism of recombination. The software is freely available for download at github.com/sanger-pathogens/Gubbins, implemented in Python and C and supported on Linux and Mac OS X. PMID:25414349

  13. Evaluating and Characterizing Ancient Whole-Genome Duplications in Plants with Gene Count Data.

    PubMed

    Tiley, George P; Ané, Cécile; Burleigh, J Gordon

    2016-01-01

    Whole-genome duplications (WGDs) have helped shape the genomes of land plants, and recent evidence suggests that the genomes of all angiosperms have experienced at least two ancient WGDs. In plants, WGDs often are followed by rapid fractionation, in which many homeologous gene copies are lost. Thus, it can be extremely difficult to identify, let alone characterize, ancient WGDs. In this study, we use a new maximum likelihood estimator to test for evidence of ancient WGDs in land plants and estimate the fraction of new gene copies that are retained following a WGD using gene count data, the number of gene copies in gene families. We identified evidence of many putative ancient WGDs in land plants and found that the genome fractionation rates vary tremendously among ancient WGDs. Analyses of WGDs within Brassicales also indicate that background gene duplication and loss rates vary across land plants, and different gene families have different probabilities of being retained following a WGD. Although our analyses are largely robust to errors in duplication and loss rates and the choice of priors, simulations indicate that this method can have trouble detecting multiple WGDs that occur on the same branch, especially when the gene retention rates for ancient WGDs are very low. They also suggest that we should carefully evaluate evidence for some ancient plant WGD hypotheses. PMID:26988251

  14. Long insert whole genome sequencing for copy number variant and translocation detection

    PubMed Central

    Liang, Winnie S.; Aldrich, Jessica; Tembe, Waibhav; Kurdoglu, Ahmet; Cherni, Irene; Phillips, Lori; Reiman, Rebecca; Baker, Angela; Weiss, Glen J.; Carpten, John D.; Craig, David W.

    2014-01-01

    As next-generation sequencing continues to have an expanding presence in the clinic, the identification of the most cost-effective and robust strategy for identifying copy number changes and translocations in tumor genomes is needed. We hypothesized that performing shallow whole genome sequencing (WGS) of 900–1000-bp inserts (long insert WGS, LI-WGS) improves our ability to detect these events, compared with shallow WGS of 300–400-bp inserts. A priori analyses show that LI-WGS requires less sequencing compared with short insert WGS to achieve a target physical coverage, and that LI-WGS requires less sequence coverage to detect a heterozygous event with a power of 0.99. We thus developed an LI-WGS library preparation protocol based off of Illumina’s WGS library preparation protocol and illustrate the feasibility of performing LI-WGS. We additionally applied LI-WGS to three separate tumor/normal DNA pairs collected from patients diagnosed with different cancers to demonstrate our application of LI-WGS on actual patient samples for identification of somatic copy number alterations and translocations. With the evolution of sequencing technologies and bioinformatics analyses, we show that modifications to current approaches may improve our ability to interrogate cancer genomes. PMID:24071583
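
    The claim that long inserts need less sequencing to reach a target physical coverage follows from the standard definitions: physical coverage counts the whole fragment spanned by a read pair, while sequence coverage counts only the sequenced bases. The sketch below is not the authors' a priori analysis. The genome size, read length, target coverage, and the insert sizes of 950 bp and 350 bp (midpoints of the ranges in the abstract) are assumptions, and the function names are hypothetical.

```python
import math


def pairs_for_physical_coverage(target, genome_bp, insert_bp):
    """Read pairs needed so that insert spans cover the genome `target` times."""
    return math.ceil(target * genome_bp / insert_bp)


def sequence_coverage(n_pairs, read_len, genome_bp):
    """Fold coverage by sequenced bases from paired reads."""
    return 2 * n_pairs * read_len / genome_bp


GENOME_BP = 3_000_000_000   # assumed human genome size
READ_LEN = 100              # assumed read length per end
TARGET = 30                 # assumed target physical coverage

long_pairs = pairs_for_physical_coverage(TARGET, GENOME_BP, 950)
short_pairs = pairs_for_physical_coverage(TARGET, GENOME_BP, 350)

long_seq_cov = sequence_coverage(long_pairs, READ_LEN, GENOME_BP)
short_seq_cov = sequence_coverage(short_pairs, READ_LEN, GENOME_BP)
```

    With these illustrative numbers, the long-insert library reaches the same physical coverage with roughly 350/950 of the read pairs, and therefore at a proportionally lower sequence coverage.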

  15. Use of bacterial whole-genome sequencing to investigate local persistence and spread in bovine tuberculosis.

    PubMed

    Trewby, Hannah; Wright, David; Breadon, Eleanor L; Lycett, Samantha J; Mallon, Tom R; McCormick, Carl; Johnson, Paul; Orton, Richard J; Allen, Adrian R; Galbraith, Julie; Herzyk, Pawel; Skuce, Robin A; Biek, Roman; Kao, Rowland R

    2016-03-01

    Mycobacterium bovis is the causal agent of bovine tuberculosis, one of the most important diseases currently facing the UK cattle industry. Here, we use high-density whole genome sequencing (WGS) in a defined sub-population of M. bovis in 145 cattle across 66 herd breakdowns to gain insights into local spread and persistence. We show that despite low divergence among isolates, WGS can in principle expose contributions of under-sampled host populations to M. bovis transmission. However, we demonstrate that in our data such a signal is due to molecular type switching, which had been previously undocumented for M. bovis. Isolates from farms with a known history of direct cattle movement between them did not show a statistical signal of higher genetic similarity. Despite an overall signal of genetic isolation by distance, genetic distances also showed no apparent relationship with spatial distance among affected farms over distances <5 km. Using simulations, we find that even over the brief evolutionary timescale covered by our data, Bayesian phylogeographic approaches are feasible. Applying such approaches showed that M. bovis dispersal in this system is heterogeneous but slow overall, averaging 2 km/year. These results confirm that widespread application of WGS to M. bovis will bring novel and important insights into the dynamics of M. bovis spread and persistence, but that the current questions most pertinent to control will be best addressed using approaches that more directly integrate WGS with additional epidemiological data. PMID:26972511

  16. Prioritizing disease-linked variants, genes, and pathways with an interactive whole genome analysis pipeline

    PubMed Central

    Lee, In-Hee; Lee, Kyungjoon; Hsing, Michael; Choe, Yongjoon; Park, Jin-Ho; Kim, Shu Hee; Bohn, Justin M.; Neu, Matthew B.; Hwang, Kyu-Baek; Green, Robert C.; Kohane, Isaac S.; Kong, Sek Won

    2014-01-01

    Whole genome sequencing (WGS) studies are uncovering disease-associated variants in both rare and non-rare diseases. Using next-generation sequencing for WGS requires a series of computational methods for alignment, variant detection, and annotation, and the accuracy and reproducibility of annotation results are essential for clinical implementation. However, annotating WGS with up-to-date genomic information is still challenging for biomedical researchers. Here we present one of the fastest and most scalable annotation, filtering, and analysis pipelines, gNOME, to prioritize phenotype-associated variants while minimizing false positive findings. The intuitive graphical user interface of gNOME facilitates the selection of phenotype-associated variants, and the result summaries are provided at variant, gene, and genome levels. Moreover, the enrichment results for specific variants, genes, and gene sets, either between two groups or compared to population-scale WGS datasets that are already integrated in the pipeline, can help the interpretation. We found a small number of discordant results between annotation software tools, in part due to different reporting strategies for variants with complex impacts. Using two published whole exome datasets of uveal melanoma and bladder cancer, we demonstrated gNOME's accuracy of variant annotation and the enrichment of loss-of-function variants in known cancer pathways. The gNOME web server and source code are freely available to the academic community. PMID:24478219

  17. The implications of whole-genome sequencing in the control of tuberculosis.

    PubMed

    Lee, Robyn S; Behr, Marcel A

    2016-04-01

    The availability of whole-genome sequencing (WGS) as a tool for the diagnosis and clinical management of tuberculosis (TB) offers considerable promise in the fight against this stubborn epidemic. However, like other new technologies, the best application of WGS remains to be determined, for both conceptual and technical reasons. In this review, we consider the potential value of WGS in the clinical laboratory for the detection of Mycobacterium tuberculosis and the prediction of antibiotic resistance. We also discuss issues pertaining to data generation, interpretation and dissemination, given that WGS has to date been generally performed in research labs where results are not necessarily packaged in a clinician-friendly format. Although WGS is far more accessible now than it was in the past, the transition from a research tool to study TB into a clinical test to manage this disease may require further fine-tuning. Improvements will likely come through iterative efforts that involve both the laboratories ready to move TB into the genomic era and the front-line clinical/public health staff who will be interpreting the results to inform management decisions. PMID:27034776

  18. Prospective Whole-Genome Sequencing Enhances National Surveillance of Listeria monocytogenes

    PubMed Central

    Kwong, Jason C.; Mercoulia, Karolina; Tomita, Takehiro; Easton, Marion; Li, Hua Y.; Bulach, Dieter M.; Stinear, Timothy P.; Seemann, Torsten

    2015-01-01

    Whole-genome sequencing (WGS) has emerged as a powerful tool for comparing bacterial isolates in outbreak detection and investigation. Here we demonstrate that WGS performed prospectively for national epidemiologic surveillance of Listeria monocytogenes has the capacity to be superior to our current approaches using pulsed-field gel electrophoresis (PFGE), multilocus sequence typing (MLST), multilocus variable-number tandem-repeat analysis (MLVA), binary typing, and serotyping. Initially 423 L. monocytogenes isolates underwent WGS, and comparisons uncovered a diverse genetic population structure derived from three distinct lineages. MLST, binary typing, and serotyping results inferred in silico from the WGS data were highly concordant (>99%) with laboratory typing performed in parallel. However, WGS was able to identify distinct nested clusters within groups of isolates that were otherwise indistinguishable using our current typing methods. Routine WGS was then used for prospective epidemiologic surveillance on a further 97 L. monocytogenes isolates over a 12-month period, which provided a greater level of discrimination than that of conventional typing for inferring linkage to point source outbreaks. A risk-based alert system based on WGS similarity was used to inform epidemiologists required to act on the data. Our experience shows that WGS can be adopted for prospective L. monocytogenes surveillance and investigated for other pathogens relevant to public health. PMID:26607978

  19. ENCODE whole-genome data in the UCSC Genome Browser: update 2012

    PubMed Central

    Rosenbloom, Kate R.; Dreszer, Timothy R.; Long, Jeffrey C.; Malladi, Venkat S.; Sloan, Cricket A.; Raney, Brian J.; Cline, Melissa S.; Karolchik, Donna; Barber, Galt P.; Clawson, Hiram; Diekhans, Mark; Fujita, Pauline A.; Goldman, Mary; Gravell, Robert C.; Harte, Rachel A.; Hinrichs, Angie S.; Kirkup, Vanessa M.; Kuhn, Robert M.; Learned, Katrina; Maddren, Morgan; Meyer, Laurence R.; Pohl, Andy; Rhead, Brooke; Wong, Matthew C.; Zweig, Ann S.; Haussler, David; Kent, W. James

    2012-01-01

    The Encyclopedia of DNA Elements (ENCODE) Consortium is entering its 5th year of production-level effort generating high-quality whole-genome functional annotations of the human genome. The past year has brought the ENCODE compendium of functional elements to critical mass, with a diverse set of 27 biochemical assays now covering 200 distinct human cell types. Within the mouse genome, which has been under study by ENCODE groups for the past 2 years, 37 cell types have been assayed. Over 2000 individual experiments have been completed and submitted to the Data Coordination Center for public use. UCSC makes this data available on the quality-reviewed public Genome Browser (http://genome.ucsc.edu) and on an early-access Preview Browser (http://genome-preview.ucsc.edu). Visual browsing, data mining and download of raw and processed data files are all supported. An ENCODE portal (http://encodeproject.org) provides specialized tools and information about the ENCODE data sets. PMID:22075998

  20. Whole-genome plasma sequencing reveals focal amplifications as a driving force in metastatic prostate cancer.

    PubMed

    Ulz, Peter; Belic, Jelena; Graf, Ricarda; Auer, Martina; Lafer, Ingrid; Fischereder, Katja; Webersinke, Gerald; Pummer, Karl; Augustin, Herbert; Pichler, Martin; Hoefler, Gerald; Bauernhofer, Thomas; Geigl, Jochen B; Heitzer, Ellen; Speicher, Michael R

    2016-01-01

    Genomic alterations in metastatic prostate cancer remain incompletely characterized. Here we analyse 493 prostate cancer cases from the TCGA database and perform whole-genome plasma sequencing on 95 plasma samples derived from 43 patients with metastatic prostate cancer. From these samples, we identify established driver aberrations in a cancer-related gene in nearly all cases (97.7%), including driver gene fusions (TMPRSS2:ERG), driver focal deletions (PTEN, RYBP and SHQ1) and driver amplifications (AR and MYC). In serial plasma analyses, we observe changes in focal amplifications in 40% of cases. The mean time interval between new amplifications was 26.4 weeks (range: 5-52 weeks), suggesting that they represent rapid adaptations to selection pressure. An increase in neuron-specific enolase is accompanied by clonal pattern changes in the tumour genome, most consistent with subclonal diversification of the tumour. Our findings suggest a high plasticity of prostate cancer genomes with newly occurring focal amplifications as a driving force in progression. PMID:27328849

  1. Evaluating and Characterizing Ancient Whole-Genome Duplications in Plants with Gene Count Data

    PubMed Central

    Tiley, George P.; Ané, Cécile; Burleigh, J. Gordon

    2016-01-01

    Whole-genome duplications (WGDs) have helped shape the genomes of land plants, and recent evidence suggests that the genomes of all angiosperms have experienced at least two ancient WGDs. In plants, WGDs often are followed by rapid fractionation, in which many homeologous gene copies are lost. Thus, it can be extremely difficult to identify, let alone characterize, ancient WGDs. In this study, we use a new maximum likelihood estimator to test for evidence of ancient WGDs in land plants and to estimate the fraction of new gene copies that are retained following a WGD using gene count data, that is, the number of gene copies in gene families. We identified evidence of many putative ancient WGDs in land plants and found that the genome fractionation rates vary tremendously among ancient WGDs. Analyses of WGDs within Brassicales also indicate that background gene duplication and loss rates vary across land plants, and different gene families have different probabilities of being retained following a WGD. Although our analyses are largely robust to errors in duplication and loss rates and the choice of priors, simulations indicate that this method can have trouble detecting multiple WGDs that occur on the same branch, especially when the gene retention rates for ancient WGDs are very low. They also suggest that we should carefully evaluate evidence for some ancient plant WGD hypotheses. PMID:26988251

  2. BALSA: integrated secondary analysis for whole-genome and whole-exome sequencing, accelerated by GPU.

    PubMed

    Luo, Ruibang; Wong, Yiu-Lun; Law, Wai-Chun; Lee, Lap-Kei; Cheung, Jeanno; Liu, Chi-Man; Lam, Tak-Wah

    2014-01-01

    This paper reports an integrated solution, called BALSA, for the secondary analysis of next generation sequencing data; it exploits the computational power of the GPU and intricate memory management to give a fast and accurate analysis. From raw reads to variants (including SNPs and Indels), BALSA, using just a single computing node with a commodity GPU board, takes 5.5 h to process 50-fold whole genome sequencing (∼750 million 100 bp paired-end reads), or just 25 min for 210-fold whole exome sequencing. BALSA's speed is rooted in its parallel algorithms, which effectively exploit a GPU to speed up processes like alignment, realignment and statistical testing. BALSA incorporates a 16-genotype model to support the calling of SNPs and Indels and achieves competitive variant calling accuracy and sensitivity when compared to the ensemble of six popular variant callers. BALSA also supports efficient identification of somatic SNVs and CNVs; experiments showed that BALSA recovers all the previously validated somatic SNVs and CNVs, and it is more sensitive for somatic Indel detection. BALSA outputs variants in VCF format. A pileup-like SNAPSHOT format, while maintaining the same fidelity as BAM in variant calling, enables efficient storage and indexing, and facilitates the development of apps for downstream analyses. BALSA is available at: http://sourceforge.net/p/balsa. PMID:24949238

  3. Whole Genome Sequencing demonstrates that Geographic Variation of Escherichia coli O157 Genotypes Dominates Host Association

    PubMed Central

    Strachan, Norval J. C.; Rotariu, Ovidiu; Lopes, Bruno; MacRae, Marion; Fairley, Susan; Laing, Chad; Gannon, Victor; Allison, Lesley J.; Hanson, Mary F.; Dallman, Tim; Ashton, Philip; Franz, Eelco; van Hoek, Angela H. A. M.; French, Nigel P.; George, Tessy; Biggs, Patrick J.; Forbes, Ken J.

    2015-01-01

    Genetic variation in an infectious disease pathogen can be driven by ecological niche dissimilarities arising from different host species and different geographical locations. Whole genome sequencing was used to compare E. coli O157 isolates from host reservoirs (cattle and sheep) from Scotland and to compare genetic variation of isolates (human, animal, environmental/food) obtained from Scotland, New Zealand, Netherlands, Canada and the USA. Nei’s genetic distance calculated from core genome single nucleotide polymorphisms (SNPs) demonstrated that the animal isolates were from the same population. Investigation of the Shiga toxin bacteriophage and their insertion sites (SBI typing) revealed that cattle and sheep isolates had statistically indistinguishable rarefaction profiles, diversity and genotypes. In contrast, isolates from different countries exhibited significant differences in Nei’s genetic distance and SBI typing. Hence, after successful international transmission, which has occurred on multiple occasions, local genetic variation occurs, resulting in a global patchwork of continental and trans-continental phylogeographic clades. These findings are important for three reasons: first, understanding transmission and evolution of infectious diseases associated with multiple host reservoirs and multi-geographic locations; second, highlighting the relevance of the sheep reservoir when considering farm based interventions; and third, improving our understanding of why human disease incidence varies across the world. PMID:26442781
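
    The abstract above does not say which form of Nei's genetic distance was used. As a minimal sketch, assuming the standard (1972) form D = -ln(I) computed from per-locus allele frequencies, the calculation could look like the following. The function name and input layout are illustrative assumptions, not the authors' pipeline.

```python
import math


def nei_standard_distance(freqs_x, freqs_y):
    """Nei's standard genetic distance between two populations.

    freqs_x, freqs_y: one list of allele frequencies per locus, with loci
    and alleles in the same order for both populations (hypothetical layout).
    """
    if not freqs_x or len(freqs_x) != len(freqs_y):
        raise ValueError("both populations need the same, non-zero number of loci")

    jx = jy = jxy = 0.0
    for locus_x, locus_y in zip(freqs_x, freqs_y):
        if len(locus_x) != len(locus_y):
            raise ValueError("each locus needs the same alleles in both populations")
        jx += sum(p * p for p in locus_x)      # homozygosity term, population X
        jy += sum(q * q for q in locus_y)      # homozygosity term, population Y
        jxy += sum(p * q for p, q in zip(locus_x, locus_y))  # shared-allele term

    if jxy == 0.0:
        return math.inf  # no shared alleles at any locus

    identity = jxy / math.sqrt(jx * jy)  # normalized genetic identity I
    return -math.log(identity)
```

    With this definition, two populations with identical allele frequencies give a distance of zero (up to floating-point rounding), and the distance grows as the populations share fewer alleles.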

  4. Kuwaiti population subgroup of nomadic Bedouin ancestry—Whole genome sequence and analysis

    PubMed Central

    John, Sumi Elsa; Thareja, Gaurav; Hebbar, Prashantha; Behbehani, Kazem; Thanaraj, Thangavel Alphonse; Alsmadi, Osama

    2014-01-01

    The native Kuwaiti population comprises three distinct genetic subgroups of Persian, “city-dwelling” Saudi Arabian tribe, and nomadic “tent-dwelling” Bedouin ancestry. The Bedouin subgroup is characterized by the presence of 17% African ancestry; it owes its origin to nomadic tribes of the deserts of the Arabian Peninsula and North Africa. By sequencing the whole genome of a Kuwaiti male from this subgroup at 41X coverage, we report 3,752,878 SNPs, 411,839 indels, and 8451 structural variations. A neighbor-joining tree, based on shared variant positions carrying disease-risk alleles between the Bedouin and other continental genomes, places the Bedouin genome at the nexus of African, Asian, and European genomes, in concordance with the geographical location of Kuwait and the Peninsula. In congruence with the participant's medical history of morbid obesity and bronchial asthma, risk alleles are seen at deleterious SNPs associated with obesity and asthma. Many of the observed deleterious ‘novel’ variants lie in genes associated with autosomal recessive disorders characteristic of the region. PMID:26484159

  5. New Perspectives on Microbial Community Distortion after Whole-Genome Amplification

    PubMed Central

    DeSantis, Todd Z.; Santo Domingo, Jorge W.; Ashbolt, Nicholas

    2015-01-01

    Whole-genome amplification (WGA) has become an important tool to explore the genomic information of microorganisms in an environmental sample with limited biomass; however, potential selective biases during the amplification processes are poorly understood. Here, we describe the effects of WGA on 31 different microbial communities from five biotopes that also included low-biomass samples from drinking water and groundwater. Our findings provide evidence that microbiome segregation by biotope was possible despite WGA treatment. Nevertheless, samples from different biotopes revealed different levels of distortion, with genomic GC content significantly correlated with WGA perturbation. Certain phylogenetic clades revealed a homogeneous trend across various sample types; for instance, Alpha- and Betaproteobacteria showed a decrease in their abundance after WGA treatment. On the other hand, Enterobacteriaceae, an important biomarker group for fecal contamination in groundwater and drinking water, were strongly affected by WGA treatment without a predictable pattern. These novel results describe the impact of WGA on low-biomass samples and may highlight issues to be aware of when designing future metagenomic studies that necessitate preceding WGA treatment. PMID:26010362

  6. The extent of whole-genome copy number alterations predicts aggressive features in primary melanomas.

    PubMed

    Gandolfi, Greta; Longo, Caterina; Moscarella, Elvira; Zalaudek, Iris; Sancisi, Valentina; Raucci, Margherita; Manzotti, Gloria; Gugnoni, Mila; Piana, Simonetta; Argenziano, Giuseppe; Ciarrocchi, Alessia

    2016-03-01

    Recent evidence indicates that melanoma comprises distinct types of tumors and suggests that specific morphological features may help predict its clinical behavior. Using a SNP-array approach, we quantified chromosomal copy number alterations (CNA) across the whole genome in 41 primary melanomas and found a high degree of heterogeneity in their genomic asset. Association analysis correlating the number and relative length of CNA with clinical, morphological, and dermoscopic attributes of melanoma revealed that features of aggressiveness were strongly linked to the overall amount of genomic damage. Furthermore, we observed that melanoma progression and survival were mainly affected by a low number of large chromosome losses and a high number of small gains. We identified the alterations most frequently associated with aggressive melanoma, and by integrating our data with publicly available gene expression profiles, we identified five genes whose expression was found to be necessary for melanoma cell proliferation. In conclusion, this work provides new evidence that the phenotypic heterogeneity of melanoma reflects a parallel genetic diversity and lays the basis to define novel strategies for a more precise prognostic stratification of patients. PMID:26575206

  7. Whole genome and transcriptome sequencing of matched primary and peritoneal metastatic gastric carcinoma

    PubMed Central

    Zhang, J.; Huang, J. Y.; Chen, Y. N.; Yuan, F.; Zhang, H.; Yan, F. H.; Wang, M. J.; Wang, G.; Su, M.; Lu, G; Huang, Y.; Dai, H.; Ji, J.; Zhang, J.; Zhang, J. N.; Jiang, Y. N.; Chen, S. J.; Zhu, Z. G.; Yu, Y. Y.

    2015-01-01

    Gastric cancer is one of the most aggressive cancers and is the second leading cause of cancer death worldwide. Approximately 40% of global gastric cancer cases occur in China, with peritoneal metastasis being the prevalent form of recurrence and metastasis in advanced disease. Currently, there are limited clinical approaches for predicting and treating peritoneal metastasis, resulting in a 6-month average survival time. Comprehensive genome analysis will help uncover the pathogenesis of peritoneal metastasis. Here we describe a comprehensive whole-genome and transcriptome sequencing analysis of one advanced gastric cancer case, including non-cancerous mucosa, primary cancer and matched peritoneal metastatic cancer. Peripheral blood was used as the normal control. We identified 27 mutated genes, of which 19 genes are reported in the COSMIC database (ZNF208, CRNN, ATXN3, DCTN1, RP1L1, PRB4, PRB1, MUC4, HS6ST3, MUC17, JAM2, ITGAD, IREB2, IQUB, CORO1B, CCDC121, AKAP2, ACAN and ACADL), and eight genes have not previously been described in gastric cancer (CCDC178, ARMC4, TUBB6, PLIN4, PKLR, PDZD2, DMBT1 and DAB1). Additionally, GPX4 and MPND, in the 19q13.3-13.4 region, are characterized as a novel fusion gene. This study disclosed novel biological markers and tumorigenic pathways that could predict peritoneal metastasis in gastric cancer. PMID:26330360

  8. Landscape of somatic mutations in 560 breast cancer whole-genome sequences.

    PubMed

    Nik-Zainal, Serena; Davies, Helen; Staaf, Johan; Ramakrishna, Manasa; Glodzik, Dominik; Zou, Xueqing; Martincorena, Inigo; Alexandrov, Ludmil B; Martin, Sancha; Wedge, David C; Van Loo, Peter; Ju, Young Seok; Smid, Marcel; Brinkman, Arie B; Morganella, Sandro; Aure, Miriam R; Lingjærde, Ole Christian; Langerød, Anita; Ringnér, Markus; Ahn, Sung-Min; Boyault, Sandrine; Brock, Jane E; Broeks, Annegien; Butler, Adam; Desmedt, Christine; Dirix, Luc; Dronov, Serge; Fatima, Aquila; Foekens, John A; Gerstung, Moritz; Hooijer, Gerrit K J; Jang, Se Jin; Jones, David R; Kim, Hyung-Yong; King, Tari A; Krishnamurthy, Savitri; Lee, Hee Jin; Lee, Jeong-Yeon; Li, Yilong; McLaren, Stuart; Menzies, Andrew; Mustonen, Ville; O'Meara, Sarah; Pauporté, Iris; Pivot, Xavier; Purdie, Colin A; Raine, Keiran; Ramakrishnan, Kamna; Rodríguez-González, F Germán; Romieu, Gilles; Sieuwerts, Anieta M; Simpson, Peter T; Shepherd, Rebecca; Stebbings, Lucy; Stefansson, Olafur A; Teague, Jon; Tommasi, Stefania; Treilleux, Isabelle; Van den Eynden, Gert G; Vermeulen, Peter; Vincent-Salomon, Anne; Yates, Lucy; Caldas, Carlos; van't Veer, Laura; Tutt, Andrew; Knappskog, Stian; Tan, Benita Kiat Tee; Jonkers, Jos; Borg, Åke; Ueno, Naoto T; Sotiriou, Christos; Viari, Alain; Futreal, P Andrew; Campbell, Peter J; Span, Paul N; Van Laere, Steven; Lakhani, Sunil R; Eyfjord, Jorunn E; Thompson, Alastair M; Birney, Ewan; Stunnenberg, Hendrik G; van de Vijver, Marc J; Martens, John W M; Børresen-Dale, Anne-Lise; Richardson, Andrea L; Kong, Gu; Thomas, Gilles; Stratton, Michael R

    2016-06-01

    We analysed whole-genome sequences of 560 breast cancers to advance understanding of the driver mutations conferring clonal advantage and the mutational processes generating somatic mutations. We found that 93 protein-coding cancer genes carried probable driver mutations. Some non-coding regions exhibited high mutation frequencies, but most have distinctive structural features probably causing elevated mutation rates and do not contain driver mutations. Mutational signature analysis was extended to genome rearrangements and revealed twelve base substitution and six rearrangement signatures. Three rearrangement signatures, characterized by tandem duplications or deletions, appear associated with defective homologous-recombination-based DNA repair: one with deficient BRCA1 function and another with deficient BRCA1 or BRCA2 function, while the cause of the third is unknown. This analysis of all classes of somatic mutation across exons, introns and intergenic regions highlights the repertoire of cancer genes and mutational processes operating, and progresses towards a comprehensive account of the somatic genetic basis of breast cancer. PMID:27135926

  9. Living laboratory: Whole-genome sequencing as a learning healthcare enterprise

    PubMed Central

    Angrist, M.; Jamal, L.

    2014-01-01

    With the proliferation of affordable large-scale human genomic data come profound and vexing questions about management of such data and their clinical uncertainty. These issues challenge the view that genomic research on human beings can (or should) be fully segregated from clinical genomics, either conceptually or practically. Here we argue that the historical sharp distinction between clinical care and research is especially problematic in the context of large-scale genomic sequencing of people with suspected genetic conditions. Core goals of both enterprises (e.g., understanding genotype-phenotype relationships; generating an evidence base for genomic medicine) are more likely to be realized at a population scale if both those ordering and those undergoing sequencing for clinical reasons are routinely and longitudinally studied. Rather than relying on expensive and lengthy randomized clinical trials and meta-analyses, we propose leveraging nascent clinical-research hybrid frameworks into a broader, more permanent instantiation of exploratory medical sequencing. Such an investment could enlighten stakeholders about the real-life challenges posed by whole-genome sequencing, e.g., establishing the clinical actionability of genetic variants, returning “off-target” results to families, developing effective service delivery models and monitoring long-term outcomes. PMID:25045831

  10. Whole-genome sequencing identifies genetic alterations in pediatric low-grade gliomas.

    PubMed

    Zhang, Jinghui; Wu, Gang; Miller, Claudia P; Tatevossian, Ruth G; Dalton, James D; Tang, Bo; Orisme, Wilda; Punchihewa, Chandanamali; Parker, Matthew; Qaddoumi, Ibrahim; Boop, Fredrick A; Lu, Charles; Kandoth, Cyriac; Ding, Li; Lee, Ryan; Huether, Robert; Chen, Xiang; Hedlund, Erin; Nagahawatte, Panduka; Rusch, Michael; Boggs, Kristy; Cheng, Jinjun; Becksfort, Jared; Ma, Jing; Song, Guangchun; Li, Yongjin; Wei, Lei; Wang, Jianmin; Shurtleff, Sheila; Easton, John; Zhao, David; Fulton, Robert S; Fulton, Lucinda L; Dooling, David J; Vadodaria, Bhavin; Mulder, Heather L; Tang, Chunlao; Ochoa, Kerri; Mullighan, Charles G; Gajjar, Amar; Kriwacki, Richard; Sheer, Denise; Gilbertson, Richard J; Mardis, Elaine R; Wilson, Richard K; Downing, James R; Baker, Suzanne J; Ellison, David W

    2013-06-01

    The most common pediatric brain tumors are low-grade gliomas (LGGs). We used whole-genome sequencing to identify multiple new genetic alterations involving BRAF, RAF1, FGFR1, MYB, MYBL1 and genes with histone-related functions, including H3F3A and ATRX, in 39 LGGs and low-grade glioneuronal tumors (LGGNTs). Only a single non-silent somatic alteration was detected in 24 of 39 (62%) tumors. Intragenic duplications of the portion of FGFR1 encoding the tyrosine kinase domain (TKD) and rearrangements of MYB were recurrent and mutually exclusive in 53% of grade II diffuse LGGs. Transplantation of Trp53-null neonatal astrocytes expressing FGFR1 with the duplication involving the TKD into the brains of nude mice generated high-grade astrocytomas with short latency and 100% penetrance. FGFR1 with the duplication induced FGFR1 autophosphorylation and upregulation of the MAPK/ERK and PI3K pathways, which could be blocked by specific inhibitors. Focusing on the therapeutically challenging diffuse LGGs, our study of 151 tumors has discovered genetic alterations and potential therapeutic targets across the entire range of pediatric LGGs and LGGNTs. PMID:23583981

  11. Wide-cross whole-genome radiation hybrid mapping of cotton (Gossypium hirsutum L.).

    PubMed Central

    Gao, Wenxiang; Chen, Z Jeffrey; Yu, John Z; Raska, Dwaine; Kohel, Russell J; Womack, James E; Stelly, David M

    2004-01-01

    We report the development and characterization of a "wide-cross whole-genome radiation hybrid" (WWRH) panel from cotton (Gossypium hirsutum L.). Chromosomes were segmented by gamma-irradiation of G. hirsutum (n = 26) pollen, and segmented chromosomes were rescued after in vivo fertilization of G. barbadense egg cells (n = 26). A 5-krad gamma-ray WWRH mapping panel (N = 93) was constructed and genotyped at 102 SSR loci. SSR marker retention frequencies were higher than those for animal systems and marker retention patterns were informative. Using the program RHMAP, 52 of 102 SSR markers were mapped into 16 syntenic groups. Linkage group 9 (LG 9) SSR markers BNL0625 and BNL2805 had been colocalized by linkage analysis, but their order was resolved by differential retention among WWRH plants. Two linkage groups, LG 13 and LG 9, were combined into one syntenic group, and the chromosome 1 linkage group marker BNL4053 was reassigned to chromosome 9. Analyses of cytogenetic stocks supported synteny of LG 9 and LG 13 and localized them to the short arm of chromosome 17. They also supported reassignment of marker BNL4053 to the long arm of chromosome 9. A WWRH map of the syntenic group composed of linkage groups 9 and 13 was constructed by maximum-likelihood analysis under the general retention model. The results demonstrate not only the feasibility of WWRH panel construction and mapping, but also complementarity to traditional linkage mapping and cytogenetic methods. PMID:15280245

  12. Sequence variants from whole genome sequencing a large group of Icelanders.

    PubMed

    Gudbjartsson, Daniel F; Sulem, Patrick; Helgason, Hannes; Gylfason, Arnaldur; Gudjonsson, Sigurjon A; Zink, Florian; Oddson, Asmundur; Magnusson, Gisli; Halldorsson, Bjarni V; Hjartarson, Eirikur; Sigurdsson, Gunnar Th; Kong, Augustine; Helgason, Agnar; Masson, Gisli; Magnusson, Olafur Th; Thorsteinsdottir, Unnur; Stefansson, Kari

    2015-01-01

    We have accumulated considerable data on the genetic makeup of the Icelandic population by sequencing the whole genomes of 2,636 Icelanders to a depth of at least 10X and by chip genotyping 101,584 more. The sequencing was done with Illumina technology. The median sequencing depth was 20X and 909 individuals were sequenced to a depth of at least 30X. We found 20 million single nucleotide polymorphisms (SNPs) and 1.5 million insertions/deletions (indels) that passed stringent quality control. Almost all the common SNPs (derived allele frequency (DAF) over 2%) that we identified in Iceland have been observed by either dbSNP (build 137) or the Exome Sequencing Project (ESP) while only 60 and 20% of rare (DAF<0.5%) SNPs and indels in coding regions, the most heavily studied parts of the genome, have been observed in the public databases. Features of our variant data, such as the transition/transversion ratio and the length distribution of indels, are similar to published reports. PMID:25977816
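
    The transition/transversion (Ts/Tv) ratio mentioned above is a common quality check on SNP call sets. The sketch below shows how it is typically computed from (reference, alternate) base pairs; the function name and input format are assumptions made for illustration and are not taken from the study.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
BASES = PURINES | PYRIMIDINES


def ts_tv_ratio(snps):
    """Return the transition/transversion ratio for (ref, alt) base pairs.

    A transition swaps purine for purine (A<->G) or pyrimidine for
    pyrimidine (C<->T); every other single-base change is a transversion.
    """
    transitions = transversions = 0
    for ref, alt in snps:
        ref, alt = ref.upper(), alt.upper()
        if ref not in BASES or alt not in BASES or ref == alt:
            raise ValueError(f"not a single-nucleotide substitution: {ref}>{alt}")
        pair = {ref, alt}
        if pair <= PURINES or pair <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    if transversions == 0:
        raise ValueError("ratio undefined: no transversions observed")
    return transitions / transversions
```

    For example, `ts_tv_ratio([("A", "G"), ("C", "T"), ("A", "C")])` counts two transitions and one transversion.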

  13. MALINA: a web service for visual analytics of human gut microbiota whole-genome metagenomic reads

    PubMed Central

    2012-01-01

    MALINA is a web service for bioinformatic analysis of whole-genome metagenomic data obtained from human gut microbiota sequencing. As input data, it accepts metagenomic reads of various sequencing technologies, including long reads (such as Sanger and 454 sequencing) and next-generation (including SOLiD and Illumina). It is, to the authors’ knowledge, the first metagenomic web service capable of processing SOLiD color-space reads. The web service allows phylogenetic and functional profiling of metagenomic samples using coverage depth resulting from the alignment of the reads to the catalogue of reference sequences which are built into the pipeline and contain prevalent microbial genomes and genes of human gut microbiota. The obtained metagenomic composition vectors are processed by the statistical analysis and visualization module containing methods for clustering, dimension reduction and group comparison. Additionally, the MALINA database includes vectors of bacterial and functional composition for human gut microbiota samples from a large number of existing studies, allowing their comparative analysis together with user samples, namely datasets from the Russian Metagenome project, MetaHIT and the Human Microbiome Project (downloaded from http://hmpdacc.org). MALINA is made freely available on the web at http://malina.metagenome.ru. The website is implemented in JavaScript (using Ext JS), Microsoft .NET Framework, MS SQL, Python, with all major browsers supported. PMID:23216677

  14. Allele-specific copy-number discovery from whole-genome and whole-exome sequencing.

    PubMed

    Wang, WeiBo; Wang, Wei; Sun, Wei; Crowley, James J; Szatkiewicz, Jin P

    2015-08-18

    Copy-number variants (CNVs) are a major form of genetic variation and a risk factor for various human diseases, so it is crucial to accurately detect and characterize them. It is conceivable that allele-specific reads from high-throughput sequencing data could be leveraged to both enhance CNV detection and produce allele-specific copy number (ASCN) calls. Although statistical methods have been developed to detect CNVs using whole-genome sequence (WGS) and/or whole-exome sequence (WES) data, information from allele-specific read counts has not yet been adequately exploited. In this paper, we develop an integrated method, called AS-GENSENG, which incorporates allele-specific read counts in CNV detection and estimates ASCN using either WGS or WES data. To evaluate the performance of AS-GENSENG, we conducted extensive simulations, generated empirical data using existing WGS and WES data sets and validated predicted CNVs using an independent methodology. We conclude that AS-GENSENG not only predicts accurate ASCN calls but also improves the accuracy of total copy number calls, owing to its unique ability to exploit information from both total and allele-specific read counts while accounting for various experimental biases in sequence data. Our novel, user-friendly and computationally efficient method and a complete analytic protocol are freely available at https://sourceforge.net/projects/asgenseng/. PMID:25883151

  15. Digital droplet multiple displacement amplification (ddMDA) for whole genome sequencing of limited DNA samples

    DOE PAGESBeta

    Rhee, Minsoung; Light, Yooli K.; Meagher, Robert J.; Singh, Anup K.; Kumar-Sinha, Chandan

    2016-05-04

    Multiple displacement amplification (MDA) is a widely used technique for amplification of DNA from samples containing limited amounts of DNA (e.g., uncultivable microbes or clinical samples) before whole genome sequencing. Despite its advantages of high yield and fidelity, it suffers from high amplification bias and non-specific amplification when amplifying sub-nanogram amounts of template DNA. Here, we present a microfluidic digital droplet MDA (ddMDA) technique where partitioning of the template DNA into thousands of sub-nanoliter droplets, each containing a small number of DNA fragments, greatly reduces the competition among DNA fragments for primers and polymerase, thereby greatly reducing amplification bias. Consequently, the ddMDA approach enabled a more uniform coverage of amplification over the entire length of the genome, with significantly lower bias and non-specific amplification than conventional MDA. For a sample containing 0.1 pg/μL of E. coli DNA (equivalent of ~3/1000 of an E. coli genome per droplet), ddMDA achieves a 65-fold increase in coverage in de novo assembly, and more than 20-fold increase in specificity (percentage of reads mapping to E. coli) compared to the conventional tube MDA. ddMDA offers a powerful method useful for many applications including medical diagnostics, forensics, and environmental microbiology.
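
    The figure of ~3/1000 of a genome per droplet can be sanity-checked with a short back-of-the-envelope calculation. The sketch below assumes an E. coli genome of about 4.6 Mb and an average mass of 650 g/mol per base pair; neither value is stated in the abstract, and the droplet volume it derives is an implied estimate, not a reported measurement.

```python
AVOGADRO = 6.02214076e23      # molecules per mole
AVG_BP_MASS = 650.0           # g/mol per base pair (assumed average)
ECOLI_GENOME_BP = 4.6e6       # approximate E. coli genome size (assumed)


def genome_mass_fg(genome_bp):
    """Mass of one double-stranded genome copy, in femtograms."""
    grams = genome_bp * AVG_BP_MASS / AVOGADRO
    return grams * 1e15


def genomes_per_nl(conc_pg_per_ul, genome_bp):
    """Genome equivalents per nanoliter at a given DNA concentration."""
    # 1 pg/uL equals 1 fg/nL, because both units scale by a factor of 1000.
    conc_fg_per_nl = conc_pg_per_ul
    return conc_fg_per_nl / genome_mass_fg(genome_bp)


def implied_droplet_volume_nl(genomes_per_droplet, conc_pg_per_ul, genome_bp):
    """Droplet volume (nL) implied by a target genome load per droplet."""
    return genomes_per_droplet / genomes_per_nl(conc_pg_per_ul, genome_bp)


if __name__ == "__main__":
    volume = implied_droplet_volume_nl(3 / 1000, 0.1, ECOLI_GENOME_BP)
    print(f"one genome: {genome_mass_fg(ECOLI_GENOME_BP):.2f} fg")
    print(f"implied droplet volume: {volume:.2f} nL")
```

    Under these assumptions one genome copy weighs about 5 fg, so 0.1 pg/μL corresponds to roughly 0.02 genome equivalents per nanoliter, and a load of 3/1000 of a genome implies droplets of roughly 0.15 nL, consistent with the "sub-nanoliter" description.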

  16. Protein Microarrays

    NASA Astrophysics Data System (ADS)

    Ricard-Blum, S.

    Proteins are key actors in the life of the cell, involved in many physiological and pathological processes. Since variations in the expression of messenger RNA are not systematically correlated with variations in the protein levels, the latter better reflect the way a cell functions. Protein microarrays thus supply complementary information to DNA chips. They are used in particular to analyse protein expression profiles, to detect proteins within complex biological media, and to study protein-protein interactions, which give information about the functions of those proteins [3-9]. They have the same advantages as DNA microarrays for high-throughput analysis, miniaturisation, and the possibility of automation. Section 18.1 gives a brief overview of proteins. Following this, Sect. 18.2 describes how protein microarrays can be made on flat supports, explaining how proteins can be produced and immobilised on a solid support, and discussing the different kinds of substrate and detection method. Section 18.3 discusses the particular format of protein microarrays in suspension. The diversity of protein microarrays and their applications are then reported in Sect. 18.4, with applications to therapeutics (protein-drug interactions) and diagnostics. The prospects for future developments of protein microarrays are then outlined in the conclusion. The bibliography provides an extensive list of reviews and detailed references for those readers who wish to go further in this area. Indeed, the aim of the present chapter is not to give an exhaustive or detailed analysis of the state of the art, but rather to provide the reader with the basic elements needed to understand how proteins are designed and used.

  17. Multi-Platform Satellite Based Estimates of Runoff in Ungauged Areas

    NASA Astrophysics Data System (ADS)

    Seo, J. Y.; Lee, S.-I.

    2015-10-01

    Over the past decades, extreme weather events such as floods and droughts have steadily increased. Ungauged or hard-to-reach areas in particular turn out to be the areas most affected by unexpected water-related disasters, usually because of insufficient observation data, deteriorating infrastructure, and inadequate water management systems. For these reasons, reliable estimation of runoff is important for planning and implementing water projects in ungauged areas. North Korea, whose terrain is mostly hilly and mountainous, has become vulnerable to floods and droughts due to poor watershed management based on unreliable hydrological information, along with rapid deforestation. Runoff estimation using data from multi-platform satellites with broad spatio-temporal coverage could be a valuable substitute for ground-observed measurements. In this study, monthly runoff in North Korea (38°N - 43°N, 124°E - 131°E) was estimated by combining space-borne data from multi-platform satellites with ground observations. The period of analysis is January 2003 to December 2013. The data sets used for this study are: (1) Terrestrial Water Storage Anomaly (TWSA) from the Gravity Recovery and Climate Experiment (GRACE), (2) evapotranspiration from the Moderate Resolution Imaging Spectroradiometer (MODIS), (3) satellite-observed precipitation from the Tropical Rainfall Measuring Mission (TRMM), and (4) ground-observed precipitation from the World Meteorological Organization (WMO) (see Figure 1 and Table 1). These components are balanced with the terrestrial water storage change, and runoff can be estimated from eq. (1).
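
    Eq. (1) itself is not reproduced in this record. A minimal sketch of the balance it most likely refers to is the standard terrestrial water balance, runoff = precipitation - evapotranspiration - storage change, with the storage change taken as the month-to-month difference in TWSA. That form, the backward difference, and the function names below are assumptions made for illustration; all quantities are in mm per month.

```python
def storage_change(twsa_mm):
    """Month-to-month change in terrestrial water storage (backward difference).

    The first month has no predecessor, so its change is reported as None.
    """
    return [None] + [twsa_mm[i] - twsa_mm[i - 1] for i in range(1, len(twsa_mm))]


def monthly_runoff(precip_mm, et_mm, twsa_mm):
    """Runoff R = P - ET - dS for each month (None where dS is undefined)."""
    if not (len(precip_mm) == len(et_mm) == len(twsa_mm)):
        raise ValueError("all input series must cover the same months")
    ds = storage_change(twsa_mm)
    return [
        None if d is None else p - e - d
        for p, e, d in zip(precip_mm, et_mm, ds)
    ]


if __name__ == "__main__":
    # Hypothetical values, not data from the study.
    print(monthly_runoff([100, 120, 80], [40, 50, 45], [10, 25, 15]))
```

    A month in which storage falls (negative dS) yields more runoff than P - ET alone, because water released from storage leaves the basin as well.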

  18. Multi-platform metabolomics assays for human lung lavage fluids in an air pollution exposure study.

    PubMed

    Surowiec, Izabella; Karimpour, Masoumeh; Gouveia-Figueira, Sandra; Wu, Junfang; Unosson, Jon; Bosson, Jenny A; Blomberg, Anders; Pourazar, Jamshid; Sandström, Thomas; Behndig, Annelie F; Trygg, Johan; Nording, Malin L

    2016-07-01

    Metabolomics protocols are used to comprehensively characterize the metabolite content of biological samples by exploiting cutting-edge analytical platforms, such as gas chromatography (GC) or liquid chromatography (LC) coupled to mass spectrometry (MS) assays, as well as nuclear magnetic resonance (NMR) assays. We have developed novel sample preparation procedures combined with GC-MS, LC-MS, and NMR metabolomics profiling for analyzing bronchial wash (BW) and bronchoalveolar lavage (BAL) fluid from 15 healthy volunteers following exposure to biodiesel exhaust and filtered air. Our aim was to investigate the responsiveness of metabolite profiles in the human lung to air pollution exposure derived from combustion of biofuels, such as rapeseed methyl ester biodiesel, which are increasingly being promoted as alternatives to conventional fossil fuels. Our multi-platform approach enabled us to detect the greatest number of unique metabolites yet reported in BW and BAL fluid (82 in total). All of the metabolomics assays indicated that the metabolite profiles of the BW and BAL fluids differed appreciably, with 46 metabolites showing significantly different levels in the corresponding lung compartments. Furthermore, the GC-MS assay revealed an effect of biodiesel exhaust exposure on the levels of 1-monostearylglycerol, sucrose, inosine, nonanoic acid, and ethanolamine (in BAL) and pentadecanoic acid (in BW), whereas the LC-MS assay indicated a shift in the levels of niacinamide (in BAL). The NMR assay only identified lactic acid (in BW) as being responsive to biodiesel exhaust exposure. Our findings demonstrate that the proposed multi-platform approach is useful for wide metabolomics screening of BW and BAL fluids and can facilitate elucidation of metabolites responsive to biodiesel exhaust exposure. Graphical abstract: study workflow.

  19. Whole-genome sequencing reveals complex mechanisms of intrinsic resistance to BRAF inhibition

    PubMed Central

    Turajlic, S.; Furney, S. J.; Stamp, G.; Rana, S.; Ricken, G.; Oduko, Y.; Saturno, G.; Springer, C.; Hayes, A.; Gore, M.; Larkin, J.; Marais, R.

    2014-01-01

    Background BRAF is mutated in ∼42% of human melanomas (COSMIC, http://www.sanger.ac.uk/genetics/CGP/cosmic/) and pharmacological BRAF inhibitors such as vemurafenib and dabrafenib achieve dramatic responses in patients whose tumours harbour BRAFV600 mutations. Objective responses occur in ∼50% of patients and disease stabilisation in a further ∼30%, but ∼20% of patients present primary or innate resistance and do not respond. Here, we investigated the underlying cause of treatment failure in a patient with BRAF mutant melanoma who presented primary resistance. Methods We carried out whole-genome sequencing and single nucleotide polymorphism (SNP) array analysis of five metastatic tumours from the patient. We validated mechanisms of resistance in a cell line derived from the patient's tumour. Results We observed that the majority of the single-nucleotide variants identified were shared across all tumour sites, but also saw site-specific copy-number alterations in discrete cell populations at different sites. We found that two ubiquitous mutations mediated resistance to BRAF inhibition in these tumours. A mutation in GNAQ sustained mitogen-activated protein kinase (MAPK) signalling, whereas a mutation in PTEN activated the PI3K/AKT pathway. Inhibition of both pathways synergised to block the growth of the cells. Conclusions Our analyses show that the five metastases arose from a common progenitor and acquired additional alterations after disease dissemination. We demonstrate that a distinct combination of mutations mediated primary resistance to BRAF inhibition in this patient. These mutations were present in all five tumours and in a tumour sample taken before BRAF inhibitor treatment was administered. Inhibition of both pathways was required to block tumour cell growth, suggesting that combined targeting of these pathways could have been a valid therapeutic approach for this patient. PMID:24504448

  20. Whole Genome Sequencing Increases Molecular Diagnostic Yield Compared with Current Diagnostic Testing for Inherited Retinal Disease

    PubMed Central

    Ellingford, Jamie M.; Barton, Stephanie; Bhaskar, Sanjeev; Williams, Simon G.; Sergouniotis, Panagiotis I.; O'Sullivan, James; Lamb, Janine A.; Perveen, Rahat; Hall, Georgina; Newman, William G.; Bishop, Paul N.; Roberts, Stephen A.; Leach, Rick; Tearle, Rick; Bayliss, Stuart; Ramsden, Simon C.; Nemeth, Andrea H.; Black, Graeme C.M.

    2016-01-01

    Purpose To compare the efficacy of whole genome sequencing (WGS) with targeted next-generation sequencing (NGS) in the diagnosis of inherited retinal disease (IRD). Design Case series. Participants A total of 562 patients diagnosed with IRD. Methods We performed a direct comparative analysis of current molecular diagnostics with WGS. We retrospectively reviewed the findings from a diagnostic NGS DNA test for 562 patients with IRD. A subset of 46 of 562 patients (encompassing potential clinical outcomes of diagnostic analysis) also underwent WGS, and we compared mutation detection rates and molecular diagnostic yields. In addition, we compared the sensitivity and specificity of the 2 techniques to identify known single nucleotide variants (SNVs) using 6 control samples with publicly available genotype data. Main Outcome Measures Diagnostic yield of genomic testing. Results Across known disease-causing genes, targeted NGS and WGS achieved similar levels of sensitivity and specificity for SNV detection. However, WGS also identified 14 clinically relevant genetic variants that had not been identified by NGS diagnostic testing for the 46 individuals with IRD. These variants included large deletions and variants in noncoding regions of the genome. Identification of these variants confirmed a molecular diagnosis of IRD for 11 of the 33 individuals referred for WGS who had not obtained a molecular diagnosis through targeted NGS testing. Weighted estimates, accounting for population structure, suggest that WGS methods could result in an overall 29% (95% confidence interval, 15–45) uplift in diagnostic yield. Conclusions We show that WGS methods can detect disease-causing genetic variants missed by current NGS diagnostic methodologies for IRD and thereby demonstrate the clinical utility and additional value of WGS. PMID:26872967

  1. Whole Genome Duplications Shaped the Receptor Tyrosine Kinase Repertoire of Jawed Vertebrates

    PubMed Central

    Brunet, Frédéric G.; Volff, Jean-Nicolas; Schartl, Manfred

    2016-01-01

    The receptor tyrosine kinase (RTK) gene family, involved primarily in cell growth and differentiation, comprises proteins with a common enzymatic tyrosine kinase intracellular domain adjacent to a transmembrane region. The amino-terminal portion of RTKs is extracellular and made of different domains, the combination of which characterizes each of the 20 RTK subfamilies among mammals. We analyzed a total of 7,376 RTK sequences among 143 vertebrate species to provide here the first comprehensive census of the jawed vertebrate repertoire. We ascertained the 58 genes previously described in the human and mouse genomes and established their phylogenetic relationships. We also identified five additional RTKs amounting to a total of 63 genes in jawed vertebrates. We found that the vertebrate RTK gene family has been shaped by the two successive rounds of whole genome duplications (WGD) called 1R and 2R (1R/2R) that occurred at the base of the vertebrates. In addition, the Vegfr and Ephrin receptor subfamilies were expanded by single gene duplications. In teleost fish, 23 additional RTK genes have been retained after another expansion through the fish-specific third round (3R) of WGD. Several lineage-specific gene losses were observed. For instance, birds have lost three RTKs, and different genes are missing in several fish sublineages. The RTK gene family presents an unusually high gene retention rate from the vertebrate WGDs (58.75% after 1R/2R, 64.4% after 3R), resulting in an expansion that might be correlated with the evolution of complexity of vertebrate cellular communication and intracellular signaling. PMID:27260203

  2. Utility of Whole-Genome Sequencing in Characterizing Acinetobacter Epidemiology and Analyzing Hospital Outbreaks

    PubMed Central

    Fitzpatrick, Margaret A.; Hauser, Alan R.

    2015-01-01

    Acinetobacter baumannii frequently causes nosocomial infections and outbreaks. Whole-genome sequencing (WGS) is a promising technique for strain typing and outbreak investigations. We compared the performance of conventional methods with WGS for strain typing clinical Acinetobacter isolates and analyzing a carbapenem-resistant A. baumannii (CRAB) outbreak. We performed two band-based typing techniques (pulsed-field gel electrophoresis and repetitive extragenic palindromic-PCR), multilocus sequence type (MLST) analysis, and WGS on 148 Acinetobacter calcoaceticus-A. baumannii complex bloodstream isolates collected from a single hospital from 2005 to 2012. Phylogenetic trees inferred from core-genome single nucleotide polymorphisms (SNPs) confirmed three Acinetobacter species within this collection. Four major A. baumannii clonal lineages (as defined by MLST) circulated during the study, three of which are globally distributed and one of which is novel. WGS indicated that a threshold of 2,500 core SNPs accurately distinguished A. baumannii isolates from different clonal lineages. The band-based techniques performed poorly in assigning isolates to clonal lineages and exhibited little agreement with sequence-based techniques. After applying WGS to a CRAB outbreak that occurred during the study, we identified a threshold of 2.5 core SNPs that distinguished nonoutbreak from outbreak strains. WGS was more discriminatory than the band-based techniques and was used to construct a more accurate transmission map that resolved many of the plausible transmission routes suggested by epidemiologic links. Our study demonstrates that WGS is superior to conventional techniques for A. baumannii strain typing and outbreak analysis. These findings support the incorporation of WGS into health care infection prevention efforts. PMID:26699703
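
    The two thresholds reported in this abstract (2,500 core SNPs between clonal lineages, 2.5 core SNPs between outbreak and nonoutbreak strains) can be applied to pairwise distances as in the following sketch. The abstract does not say how distances were computed or on which side of a threshold a borderline value falls, so the simple mismatch count over an aligned core genome, the strict "less than" comparisons, and the function names are all assumptions made for illustration.

```python
LINEAGE_THRESHOLD = 2500    # core SNPs, from the abstract
OUTBREAK_THRESHOLD = 2.5    # core SNPs, from the abstract


def core_snp_distance(seq_a, seq_b):
    """Count differing positions in two aligned core-genome sequences.

    Positions where either sequence has a gap or an ambiguous base are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    valid = set("ACGT")
    return sum(
        1
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a in valid and b in valid and a != b
    )


def classify_pair(distance):
    """Label a pair of isolates by its core SNP distance (assumed orientation)."""
    if distance < OUTBREAK_THRESHOLD:
        return "same outbreak"
    if distance < LINEAGE_THRESHOLD:
        return "same clonal lineage"
    return "different clonal lineages"


if __name__ == "__main__":
    d = core_snp_distance("ACGTN", "ACCTA")  # toy alignment, one countable difference
    print(d, classify_pair(d))
```

    Because SNP counts are integers, a threshold of 2.5 separates pairs differing by at most 2 SNPs from pairs differing by 3 or more, whichever comparison operator is used.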

  3. Economic evidence on identifying clinically actionable findings with whole-genome sequencing: a scoping review.

    PubMed

    Douglas, Michael P; Ladabaum, Uri; Pletcher, Mark J; Marshall, Deborah A; Phillips, Kathryn A

    2016-02-01

    The American College of Medical Genetics and Genomics (ACMG) recommends that mutations in 56 genes for 24 conditions are clinically actionable and should be reported as secondary findings after whole-genome sequencing (WGS). Our aim was to identify published economic evaluations of detecting mutations in these genes among the general population or among targeted/high-risk populations and conditions and identify gaps in knowledge. A targeted PubMed search from 1994 through November 2014 was performed, and we included original, English-language articles reporting cost-effectiveness or a cost-to-utility ratio or net benefits/benefit-cost focused on screening (not treatment) for conditions and genes listed by the ACMG. Articles were screened, classified as targeting a high-risk or general population, and abstracted by two reviewers. General population studies were evaluated for actual cost-effectiveness measures (e.g., incremental cost-effectiveness ratios (ICER)), whereas studies of targeted populations were evaluated for whether at least one scenario proposed was cost-effective (e.g., ICER of ≤$100,000 per life-year or quality-adjusted life-year gained). A total of 607 studies were identified, and 32 relevant studies were included. Identified studies addressed fewer than one-third (7 of 24; 29%) of the ACMG conditions. The cost-effectiveness of screening in the general population was examined for only 2 of 24 conditions (8%). The cost-effectiveness of most genetic findings that the ACMG recommends for return has not been evaluated in economic studies or in the context of screening in the general population. The individual studies do not directly address the cost-effectiveness of WGS. PMID:25996638
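
    For readers unfamiliar with the measure, an incremental cost-effectiveness ratio (ICER) is the extra cost of a strategy divided by the extra health effect it buys. The sketch below applies the ≤$100,000 per life-year or QALY benchmark mentioned in the abstract; the function names, the input figures, and the simplified handling of strategies that are less effective are assumptions for illustration, not the review's method.

```python
def icer(cost_new, cost_old, effect_new, effect_old):
    """Incremental cost per unit of effect (e.g., dollars per QALY gained)."""
    delta_effect = effect_new - effect_old
    if delta_effect == 0:
        raise ValueError("ICER is undefined when the effects are equal")
    return (cost_new - cost_old) / delta_effect


def is_cost_effective(cost_new, cost_old, effect_new, effect_old,
                      threshold=100_000):
    """Apply a willingness-to-pay threshold to the comparison.

    Simplification: a strategy that is no more effective than the comparator
    is reported as not cost-effective, even if it is cheaper.
    """
    delta_cost = cost_new - cost_old
    delta_effect = effect_new - effect_old
    if delta_effect <= 0:
        return False
    if delta_cost <= 0:
        return True  # dominant: more effective and not more costly
    return delta_cost / delta_effect <= threshold


if __name__ == "__main__":
    # Hypothetical screening strategy: $10,000 more for 0.25 QALY gained.
    print(icer(12_000, 2_000, 10.25, 10.0))
    print(is_cost_effective(12_000, 2_000, 10.25, 10.0))
```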

  4. Resolving Evolutionary Relationships in Closely Related Species with Whole-Genome Sequencing Data

    PubMed Central

    Nater, Alexander; Burri, Reto; Kawakami, Takeshi; Smeds, Linnéa; Ellegren, Hans

    2015-01-01

    Using genetic data to resolve the evolutionary relationships of species is of major interest in evolutionary and systematic biology. However, reconstructing the sequence of speciation events, the so-called species tree, in closely related and potentially hybridizing species is very challenging. Processes such as incomplete lineage sorting and interspecific gene flow result in local gene genealogies that differ in their topology from the species tree, and analyses of few loci with a single sequence per species are likely to produce conflicting or even misleading results. To study these phenomena on a full phylogenomic scale, we use whole-genome sequence data from 200 individuals of four black-and-white flycatcher species with so far unresolved phylogenetic relationships to infer gene tree topologies and visualize genome-wide patterns of gene tree incongruence. Using phylogenetic analysis in nonoverlapping 10-kb windows, we show that gene tree topologies are extremely diverse and change on a very small physical scale. Moreover, we find strong evidence for gene flow among flycatcher species, with distinct patterns of reduced introgression on the Z chromosome. To resolve species relationships on the background of widespread gene tree incongruence, we used four complementary coalescent-based methods for species tree reconstruction, including complex modeling approaches that incorporate post-divergence gene flow among species. This allowed us to infer the most likely species tree with high confidence. Based on this finding, we show that regions of reduced effective population size, which have been suggested as particularly useful for species tree inference, can produce positively misleading species tree topologies. Our findings disclose the pitfalls of using loci potentially under selection as phylogenetic markers and highlight the potential of modeling approaches to disentangle species relationships in systems with large effective population sizes and post

  5. Clinical application of whole-genome sequencing to inform treatment for multidrug-resistant tuberculosis cases.

    PubMed

    Witney, Adam A; Gould, Katherine A; Arnold, Amber; Coleman, David; Delgado, Rachel; Dhillon, Jasvir; Pond, Marcus J; Pope, Cassie F; Planche, Tim D; Stoker, Neil G; Cosgrove, Catherine A; Butcher, Philip D; Harrison, Thomas S; Hinds, Jason

    2015-05-01

    The treatment of drug-resistant tuberculosis cases is challenging, as drug options are limited, and the existing diagnostics are inadequate. Whole-genome sequencing (WGS) has been used in a clinical setting to investigate six cases of suspected extensively drug-resistant Mycobacterium tuberculosis (XDR-TB) encountered at a London teaching hospital between 2008 and 2014. Sixteen isolates from six suspected XDR-TB cases were sequenced; five cases were analyzed in a clinically relevant time frame, with one case sequenced retrospectively. WGS identified mutations in the M. tuberculosis genes associated with antibiotic resistance that are likely to be responsible for the phenotypic resistance. Thus, an evidence base was developed to inform the clinical decisions made around antibiotic treatment over prolonged periods. All strains in this study belonged to the East Asian (Beijing) lineage, and the strain relatedness was consistent with the expectations from the case histories, confirming one contact transmission event. We demonstrate that WGS data can be produced in a clinically relevant time scale some weeks before drug sensitivity testing (DST) data are available, and they actively help clinical decision-making through the assessment of whether an isolate (i) has a particular resistance mutation where there are absent or contradictory DST results, (ii) has no further resistance markers and therefore is unlikely to be XDR, or (iii) is identical to an isolate of known resistance (i.e., a likely transmission event). A small number of discrepancies between the genotypic predictions and phenotypic DST results are discussed in the wider context of the interpretation and reporting of WGS results. PMID:25673793

  6. Temporal Dynamics of Avian Populations during Pleistocene Revealed by Whole-Genome Sequences.

    PubMed

    Nadachowska-Brzyska, Krystyna; Li, Cai; Smeds, Linnea; Zhang, Guojie; Ellegren, Hans

    2015-05-18

    Global climate fluctuations have significantly influenced the distribution and abundance of biodiversity. During unfavorable glacial periods, many species experienced range contraction and fragmentation, expanding again during interglacials. An understanding of the evolutionary consequences of both historical and ongoing climate changes requires knowledge of the temporal dynamics of population numbers during such climate cycles. Variation in abundance should have left clear signatures in the patterns of intraspecific genetic variation in extant species, from which historical effective population sizes (N(e)) can be estimated. We analyzed whole-genome sequences of 38 avian species in a pairwise sequentially Markovian coalescent (PSMC, [5]) framework to quantitatively reveal changes in N(e) from approximately 10 million to 10 thousand years ago. Significant fluctuations in N(e) over time were evident for most species. The most pronounced pattern observed in many species was a severe reduction in N(e) coinciding with the beginning of the last glacial period (LGP). Among species, N(e) varied by at least three orders of magnitude, exceeding 1 million in the most abundant species. Several species on the IUCN Red List of Threatened Species showed long-term reduction in population size, predating recent declines. We conclude that cycles of population expansions and contractions have been a common feature of many bird species during the Quaternary period, likely coinciding with climate cycles. Population size reduction should have increased the risk of extinction but may also have promoted speciation. Species that have experienced long-term declines may be especially vulnerable to recent anthropogenic threats. PMID:25891404

  7. Identification of single nucleotide polymorphisms from the transcriptome of an organism with a whole genome duplication

    PubMed Central

    2013-01-01

    Background The common ancestor of salmonid fishes, including rainbow trout (Oncorhynchus mykiss), experienced a whole genome duplication between 20 and 100 million years ago, and many of the duplicated genes have been retained in the trout genome. This retention complicates efforts to detect allelic variation in salmonid fishes. Specifically, single nucleotide polymorphism (SNP) detection is problematic because nucleotide variation can be found between the duplicate copies (paralogs) of a gene as well as between alleles. Results We present a method of differentiating between allelic and paralogous (gene copy) sequence variants, allowing identification of SNPs in organisms with multiple copies of a gene or set of genes. The basic strategy is to: 1) identify windows of unique cDNA sequences with homology to each other, 2) compare these unique cDNAs if they are not shared between individuals (i.e. the cDNA is homozygous in one individual and homozygous for another cDNA in the other individual), and 3) give a “SNP score” value between zero and one to each candidate sequence variant based on six criteria. Using this strategy we were able to detect about seven thousand potential SNPs from the transcriptomes of several clonal lines of rainbow trout. When directly compared to a pre-validated set of SNPs in polyploid wheat, we were also able to estimate the false-positive rate of this strategy as 0 to 28% depending on parameters used. Conclusions This strategy has an advantage over traditional techniques of SNP identification because another dimension of sequencing information is utilized. This method is especially well suited for identifying SNPs in polyploids, both outbred and inbred, but would tend to be conservative for diploid organisms. PMID:24237905

  8. Prediction of Expected Years of Life Using Whole-Genome Markers

    PubMed Central

    de los Campos, Gustavo; Klimentidis, Yann C.; Vazquez, Ana I.; Allison, David B.

    2012-01-01

    Genetic factors are believed to account for 25% of the interindividual differences in Years of Life (YL) among humans. However, the genetic loci that have thus far been found to be associated with YL explain a very small proportion of the expected genetic variation in this trait, perhaps reflecting the complexity of the trait and the limitations of traditional association studies when applied to traits affected by a large number of small-effect genes. Using data from the Framingham Heart Study and statistical methods borrowed largely from the field of animal genetics (whole-genome prediction, WGP), we developed a WGP model for the study of YL and evaluated the extent to which thousands of genetic variants across the genome examined simultaneously can be used to predict interindividual differences in YL. We find that a sizable proportion of differences in YL—which were unexplained by age at entry, sex, smoking and BMI—can be accounted for and predicted using WGP methods. The contribution of genomic information to prediction accuracy was even higher than that of smoking and body mass index (BMI) combined; two predictors that are considered among the most important life-shortening factors. We evaluated the impacts of familial relationships and population structure (as described by the first two marker-derived principal components) and concluded that in our dataset population structure explained partially, but not fully the gains in prediction accuracy obtained with WGP. Further inspection of prediction accuracies by age at death indicated that most of the gains in predictive ability achieved with WGP were due to the increased accuracy of prediction of early mortality, perhaps reflecting the ability of WGP to capture differences in genetic risk to deadly diseases such as cancer, which are most often responsible for early mortality in our sample. PMID:22848416

  9. Detecting and locating whole genome duplications on a phylogeny: a probabilistic approach.

    PubMed

    Rabier, Charles-Elie; Ta, Tram; Ané, Cécile

    2014-03-01

    Whole genome duplications (WGDs) followed by massive gene loss occurred in the evolutionary history of many groups. WGDs are usually inferred from the age distribution of paralogs (Ks-based methods) or from gene collinearity data (synteny). However, Ks-based methods are restricted to detecting recent WGDs because of saturation effects and the difficulty of dating old duplicates, and synteny is difficult to reconstruct for distantly related species. Recently, Jiao et al. (Jiao Y, Wickett N, Ayyampalayam S, Chanderbali AS, Landherr L, Ralph PE, Tomsho LP, Hu Y, Liang H, Soltis PS, et al. 2011. Ancestral polyploidy in seed plants and angiosperms. Nature 473:97-100) introduced an empirical method that aims to detect a peak in duplication ages among nodes selected from a previous phylogenetic analysis. In this context, we present here two rigorous methods based on data from multiple gene families and on a new probabilistic model. Our model assumes that all gene lineages are instantaneously duplicated at the WGD event with a possible almost-immediate loss of some extra copies. Our reconciliation method relies on aligned molecular sequences, whereas our gene count method relies only on gene count data across species. We show, using extensive simulations, that both methods have a good detection power. Surprisingly, the gene count method enjoys no loss of power compared with the reconciliation method, despite the fact that sequence information is not used. We finally illustrate the performance of our methods on a benchmark yeast data set. Both methods are able to detect the well-known WGD in the Saccharomyces cerevisiae clade and agree on a small retention rate at the WGD, as established by synteny-based methods. PMID:24361993

  10. Effective preparation of Plasmodium vivax field isolates for high-throughput whole genome sequencing.

    PubMed

    Auburn, Sarah; Marfurt, Jutta; Maslen, Gareth; Campino, Susana; Ruano Rubio, Valentin; Manske, Magnus; Machunter, Barbara; Kenangalem, Enny; Noviyanti, Rintis; Trianty, Leily; Sebayang, Boni; Wirjanata, Grennady; Sriprawat, Kanlaya; Alcock, Daniel; Macinnis, Bronwyn; Miotto, Olivo; Clark, Taane G; Russell, Bruce; Anstey, Nicholas M; Nosten, François; Kwiatkowski, Dominic P; Price, Ric N

    2013-01-01

    Whole genome sequencing (WGS) of Plasmodium vivax is problematic due to the reliance on clinical isolates which are generally low in parasitaemia and sample volume. Furthermore, clinical isolates contain a significant contaminating background of host DNA which confounds efforts to map short read sequence of the target P. vivax DNA. Here, we discuss a methodology to significantly improve the success of P. vivax WGS on natural (non-adapted) patient isolates. Using 37 patient isolates from Indonesia, Thailand, and travellers, we assessed the application of CF11-based white blood cell filtration alone and in combination with short term ex vivo schizont maturation. Although CF11 filtration reduced human DNA contamination in 8 Indonesian isolates tested, additional short-term culture increased the P. vivax DNA yield from a median of 0.15 to 6.2 ng µl⁻¹ packed red blood cells (pRBCs) (p = 0.001) and reduced the human DNA percentage from a median of 33.9% to 6.22% (p = 0.008). Furthermore, post-CF11 and culture samples from Thailand gave a median P. vivax DNA yield of 2.34 ng µl⁻¹ pRBCs, and 2.65% human DNA. In 22 P. vivax patient isolates prepared with the 2-step method, we demonstrate high depth (median 654X coverage) and breadth (≥89%) of coverage on the Illumina GAII and HiSeq platforms. In contrast to the A+T-rich P. falciparum genome, negligible bias was observed in coverage depth between coding and non-coding regions of the P. vivax genome. This uniform coverage will greatly facilitate the detection of SNPs and copy number variants across the genome, enabling unbiased exploration of the natural diversity in P. vivax populations. PMID:23308154

  11. Identification of copy number variants in whole-genome data using Reference Coverage Profiles.

    PubMed

    Glusman, Gustavo; Severson, Alissa; Dhankani, Varsha; Robinson, Max; Farrah, Terry; Mauldin, Denise E; Stittrich, Anna B; Ament, Seth A; Roach, Jared C; Brunkow, Mary E; Bodian, Dale L; Vockley, Joseph G; Shmulevich, Ilya; Niederhuber, John E; Hood, Leroy

    2015-01-01

    The identification of DNA copy numbers from short-read sequencing data remains a challenge for both technical and algorithmic reasons. The raw data for these analyses are measured in tens to hundreds of gigabytes per genome; transmitting, storing, and analyzing such large files is cumbersome, particularly for methods that analyze several samples simultaneously. We developed a very efficient representation of depth of coverage (150-1000× compression) that enables such analyses. Current methods for analyzing variants in whole-genome sequencing (WGS) data frequently miss copy number variants (CNVs), particularly hemizygous deletions in the 1-100 kb range. To fill this gap, we developed a method to identify CNVs in individual genomes, based on comparison to joint profiles pre-computed from a large set of genomes. We analyzed depth of coverage in over 6000 high quality (>40×) genomes. The depth of coverage has strong sequence-specific fluctuations only partially explained by global parameters like %GC. To account for these fluctuations, we constructed multi-genome profiles representing the observed or inferred diploid depth of coverage at each position along the genome. These Reference Coverage Profiles (RCPs) take into account the diverse technologies and pipeline versions used. Normalization of the scaled coverage to the RCP followed by hidden Markov model (HMM) segmentation enables efficient detection of CNVs and large deletions in individual genomes. Use of pre-computed multi-genome coverage profiles improves our ability to analyze each individual genome. We make available RCPs and tools for performing these analyses on personal genomes. We expect the increased sensitivity and specificity for individual genome analysis to be critical for achieving clinical-grade genome interpretation. PMID:25741365
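
    The two steps named in this abstract, normalization of coverage to a reference profile followed by hidden Markov model segmentation, can be illustrated with a toy example. This is a sketch under stated assumptions, not the authors' implementation: the three copy-number states, the Gaussian emission model, the values of `sigma` and `stay_prob`, and all function names are hypothetical choices made here.

```python
import math

STATES = (1, 2, 3)  # assumed copy-number states: deletion, diploid, duplication


def normalize(coverage, reference_profile):
    """Divide observed per-window coverage by the expected diploid coverage."""
    return [c / r for c, r in zip(coverage, reference_profile)]


def _log_emission(ratio, copy_number, sigma):
    """Gaussian log-likelihood (up to a constant) around copy_number / 2."""
    mean = copy_number / 2.0
    return -0.5 * ((ratio - mean) / sigma) ** 2


def viterbi_copy_number(ratios, sigma=0.15, stay_prob=0.99):
    """Most likely copy-number path for a series of normalized coverage ratios."""
    if not ratios:
        return []
    log_stay = math.log(stay_prob)
    log_switch = math.log((1.0 - stay_prob) / (len(STATES) - 1))

    def transition(prev, cur):
        return log_stay if prev == cur else log_switch

    # Uniform prior over the starting state (constant term omitted).
    score = {s: _log_emission(ratios[0], s, sigma) for s in STATES}
    back_pointers = []
    for ratio in ratios[1:]:
        new_score, pointer = {}, {}
        for cur in STATES:
            best_prev = max(STATES, key=lambda p: score[p] + transition(p, cur))
            pointer[cur] = best_prev
            new_score[cur] = (score[best_prev] + transition(best_prev, cur)
                              + _log_emission(ratio, cur, sigma))
        score = new_score
        back_pointers.append(pointer)

    # Trace back from the best final state.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointer in reversed(back_pointers):
        state = pointer[state]
        path.append(state)
    path.reverse()
    return path


if __name__ == "__main__":
    # Hypothetical windows: uneven expected coverage, a 4-window hemizygous
    # deletion, and one noisy window that should not be called as a variant.
    reference = [40.0, 30.0, 50.0, 40.0, 20.0, 60.0, 40.0, 40.0, 35.0, 45.0, 40.0, 40.0]
    true_ratio = [1.0, 1.3, 1.0, 1.0, 0.5, 0.5, 0.5, 0.5, 1.0, 1.0, 1.0, 1.0]
    coverage = [r * t for r, t in zip(reference, true_ratio)]
    print(viterbi_copy_number(normalize(coverage, reference)))
```

    Dividing by the reference profile removes position-specific fluctuations before segmentation, and the high `stay_prob` makes the model ignore isolated noisy windows while still calling a run of consistently low coverage.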

  12. Whole Genome Duplications Shaped the Receptor Tyrosine Kinase Repertoire of Jawed Vertebrates.

    PubMed

    Brunet, Frédéric G; Volff, Jean-Nicolas; Schartl, Manfred

    2016-01-01

    The receptor tyrosine kinase (RTK) gene family, involved primarily in cell growth and differentiation, comprises proteins with a common enzymatic tyrosine kinase intracellular domain adjacent to a transmembrane region. The amino-terminal portion of RTKs is extracellular and made of different domains, the combination of which characterizes each of the 20 RTK subfamilies among mammals. We analyzed a total of 7,376 RTK sequences among 143 vertebrate species to provide here the first comprehensive census of the jawed vertebrate repertoire. We ascertained the 58 genes previously described in the human and mouse genomes and established their phylogenetic relationships. We also identified five additional RTKs amounting to a total of 63 genes in jawed vertebrates. We found that the vertebrate RTK gene family has been shaped by the two successive rounds of whole genome duplications (WGD) called 1R and 2R (1R/2R) that occurred at the base of the vertebrates. In addition, the Vegfr and Ephrin receptor subfamilies were expanded by single gene duplications. In teleost fish, 23 additional RTK genes have been retained after another expansion through the fish-specific third round (3R) of WGD. Several lineage-specific gene losses were observed. For instance, birds have lost three RTKs, and different genes are missing in several fish sublineages. The RTK gene family presents an unusually high gene retention rate from the vertebrate WGDs (58.75% after 1R/2R, 64.4% after 3R), resulting in an expansion that might be correlated with the evolution of complexity of vertebrate cellular communication and intracellular signaling. PMID:27260203

  13. Whole genome sequencing of enriched chloroplast DNA using the Illumina GAII platform

    PubMed Central

    2010-01-01

    Background: Complete chloroplast genome sequences provide a valuable source of molecular markers for studies in molecular ecology and evolution of plants. To obtain complete genome sequences, recent studies have made use of the polymerase chain reaction to amplify overlapping fragments from conserved gene loci. However, this approach is time consuming and can be more difficult to implement where gene organisation differs among plants. An alternative approach is to first isolate chloroplasts and then use the capacity of high-throughput sequencing to obtain complete genome sequences. We report our findings from studies of the latter approach, which used a simple chloroplast isolation procedure, multiply-primed rolling circle amplification of chloroplast DNA, Illumina Genome Analyzer II sequencing, and de novo assembly of paired-end sequence reads. Results: A modified rapid chloroplast isolation protocol was used to obtain plant DNA that was enriched for chloroplast DNA, but nevertheless contained nuclear and mitochondrial DNA. Multiply-primed rolling circle amplification of this mixed template produced sufficient quantities of chloroplast DNA, even when the amount of starting material was small, and improved the template quality for Illumina Genome Analyzer II (hereafter Illumina GAII) sequencing. We demonstrate, using independent samples of karaka (Corynocarpus laevigatus), that there is high fidelity in the sequence obtained from this template. Although less than 20% of our sequenced reads could be mapped to the chloroplast genome, it was relatively easy to assemble complete chloroplast genome sequences from the mixture of nuclear, mitochondrial and chloroplast reads. Conclusions: We report successful whole genome sequencing of chloroplast DNA from karaka, obtained efficiently and with high fidelity. PMID:20920211

  14. Whole-genome resequencing uncovers molecular signatures of natural and sexual selection in wild bighorn sheep.

    PubMed

    Kardos, Marty; Luikart, Gordon; Bunch, Rowan; Dewey, Sarah; Edwards, William; McWilliam, Sean; Stephenson, John; Allendorf, Fred W; Hogg, John T; Kijas, James

    2015-11-01

    The identification of genes influencing fitness is central to our understanding of the genetic basis of adaptation and how it shapes phenotypic variation in wild populations. Here, we used whole-genome resequencing of wild Rocky Mountain bighorn sheep (Ovis canadensis) to >50-fold coverage to identify 2.8 million single nucleotide polymorphisms (SNPs) and genomic regions bearing signatures of directional selection (i.e. selective sweeps). A comparison of SNP diversity between the X chromosome and the autosomes indicated that bighorn males had a dramatically reduced long-term effective population size compared to females. This probably reflects a long history of intense sexual selection mediated by male-male competition for mates. Selective sweep scans based on heterozygosity and nucleotide diversity revealed evidence for a selective sweep shared across multiple populations at RXFP2, a gene that strongly affects horn size in domestic ungulates. The massive horns carried by bighorn rams appear to have evolved in part via strong positive selection at RXFP2. We identified evidence for selection within individual populations at genes affecting early body growth and cellular response to hypoxia; however, these must be interpreted more cautiously as genetic drift is strong within local populations and may have caused false positives. These results represent a rare example of strong genomic signatures of selection identified at genes with known function in wild populations of a nonmodel species. Our results also showcase the value of reference genome assemblies from agricultural or model species for studies of the genomic basis of adaptation in closely related wild taxa. PMID:26454263
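    A heterozygosity-based sweep scan of the kind mentioned in this abstract can be sketched as a sliding-window computation. This is a generic illustration, not the study's pipeline: the non-overlapping windows, the use of expected heterozygosity 2p(1-p) per SNP, the z-score cutoff, and the function names are all assumptions. Windows whose heterozygosity falls far below the chromosome-wide mean are reported as candidate sweep regions.

```python
from statistics import mean, pstdev


def window_heterozygosity(snps, window_size, chrom_length):
    """Per-window expected heterozygosity per base pair.

    snps is a list of (position, allele_frequency) pairs with 0-based
    positions smaller than chrom_length. Each SNP contributes 2p(1-p),
    and the window total is divided by the window size."""
    n_windows = (chrom_length + window_size - 1) // window_size
    totals = [0.0] * n_windows
    for pos, p in snps:
        totals[pos // window_size] += 2.0 * p * (1.0 - p)
    return [t / window_size for t in totals]


def candidate_sweeps(het, z_cutoff=-2.0):
    """Indices of windows whose heterozygosity z-score is at or below
    z_cutoff (a hypothetical threshold)."""
    mu = mean(het)
    sd = pstdev(het)
    if sd == 0:
        return []
    return [i for i, h in enumerate(het) if (h - mu) / sd <= z_cutoff]


# Usage: ten 100-bp windows with five SNPs each; window 4 carries only
# rare variants, mimicking the loss of diversity around a swept locus.
snps = []
for w in range(10):
    freq = 0.02 if w == 4 else 0.5
    for k in range(5):
        snps.append((w * 100 + k * 20, freq))

het = window_heterozygosity(snps, window_size=100, chrom_length=1000)
hits = candidate_sweeps(het)
```

    As the abstract cautions, low-diversity windows within a single population can also arise from genetic drift, so a scan like this only nominates candidates.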

  15. Tracking a hospital outbreak of KPC-producing ST11 Klebsiella pneumoniae with whole genome sequencing.

    PubMed

    Jiang, Y; Wei, Z; Wang, Y; Hua, X; Feng, Y; Yu, Y

    2015-11-01

    An outbreak of carbapenem-resistant Klebsiella pneumoniae strains emerged at a hospital, and was tracked in order to understand the spread of these infectious pathogens. A total of 66 K. pneumoniae strains were collect