Sample records for large genomic differences

  1. CoCoNUT: an efficient system for the comparison and analysis of genomes

    PubMed Central

    2008-01-01

    Background Comparative genomics is the analysis and comparison of genomes from different species. This area of research is driven by the large number of sequenced genomes and heavily relies on efficient algorithms and software to perform pairwise and multiple genome comparisons. Results Most of the software tools available are tailored for one specific task. In contrast, we have developed a novel system CoCoNUT (Computational Comparative geNomics Utility Toolkit) that allows solving several different tasks in a unified framework: (1) finding regions of high similarity among multiple genomic sequences and aligning them, (2) comparing two draft or multi-chromosomal genomes, (3) locating large segmental duplications in large genomic sequences, and (4) mapping cDNA/EST to genomic sequences. Conclusion CoCoNUT is competitive with other software tools w.r.t. the quality of the results. The use of state of the art algorithms and data structures allows CoCoNUT to solve comparative genomics tasks more efficiently than previous tools. With the improved user interface (including an interactive visualization component), CoCoNUT provides a unified, versatile, and easy-to-use software tool for large scale studies in comparative genomics. PMID:19014477

  2. The Peculiar Landscape of Repetitive Sequences in the Olive (Olea europaea L.) Genome

    PubMed Central

    Barghini, Elena; Natali, Lucia; Cossu, Rosa Maria; Giordani, Tommaso; Pindo, Massimo; Cattonaro, Federica; Scalabrin, Simone; Velasco, Riccardo; Morgante, Michele; Cavallini, Andrea

    2014-01-01

    Analyzing genome structure in different species provides insight into the evolution of plant genome size. Olive (Olea europaea L.) has a medium-sized haploid genome of 1.4 Gb, whose structure is largely uncharacterized, despite the growing importance of this tree as an oil crop. Next-generation sequencing technologies and different computational procedures have been used to study the composition of the olive genome and its repetitive fraction. A total of 2.03 and 2.3 genome equivalents of Illumina and 454 reads from genomic DNA, respectively, were assembled following different procedures, which produced more than 200,000 differently redundant contigs, with mean length higher than 1,000 nt. Mapping Illumina reads onto the assembled sequences was used to estimate their redundancy. The genome data set was subdivided into highly and medium redundant and nonredundant contigs. By combining identification and mapping of repeated sequences, it was established that tandem repeats represent a very large portion of the olive genome (∼31% of the whole genome), consisting of six main families of different length, two of which were first discovered in these experiments. The other large redundant class in the olive genome is represented by transposable elements (especially long terminal repeat-retrotransposons). On the whole, the results of our analyses show the peculiar landscape of the olive genome, related to the massive amplification of tandem repeats, more than that reported for any other sequenced plant genome. PMID:24671744

  3. The peculiar landscape of repetitive sequences in the olive (Olea europaea L.) genome.

    PubMed

    Barghini, Elena; Natali, Lucia; Cossu, Rosa Maria; Giordani, Tommaso; Pindo, Massimo; Cattonaro, Federica; Scalabrin, Simone; Velasco, Riccardo; Morgante, Michele; Cavallini, Andrea

    2014-04-01

    Analyzing genome structure in different species provides insight into the evolution of plant genome size. Olive (Olea europaea L.) has a medium-sized haploid genome of 1.4 Gb, whose structure is largely uncharacterized, despite the growing importance of this tree as an oil crop. Next-generation sequencing technologies and different computational procedures have been used to study the composition of the olive genome and its repetitive fraction. A total of 2.03 and 2.3 genome equivalents of Illumina and 454 reads from genomic DNA, respectively, were assembled following different procedures, which produced more than 200,000 differently redundant contigs, with mean length higher than 1,000 nt. Mapping Illumina reads onto the assembled sequences was used to estimate their redundancy. The genome data set was subdivided into highly and medium redundant and nonredundant contigs. By combining identification and mapping of repeated sequences, it was established that tandem repeats represent a very large portion of the olive genome (∼31% of the whole genome), consisting of six main families of different length, two of which were first discovered in these experiments. The other large redundant class in the olive genome is represented by transposable elements (especially long terminal repeat-retrotransposons). On the whole, the results of our analyses show the peculiar landscape of the olive genome, related to the massive amplification of tandem repeats, more than that reported for any other sequenced plant genome.

  4. Genomic diversity of the human intestinal parasite Entamoeba histolytica

    PubMed Central

    2012-01-01

    Background Entamoeba histolytica is a significant cause of disease worldwide. However, little is known about the genetic diversity of the parasite. We re-sequenced the genomes of ten laboratory cultured lines of the eukaryotic pathogen Entamoeba histolytica in order to develop a picture of genetic diversity across the genome. Results The extreme nucleotide composition bias and repetitiveness of the E. histolytica genome provide a challenge for short-read mapping, yet we were able to define putative single nucleotide polymorphisms in a large portion of the genome. The results suggest a rather low level of single nucleotide diversity, although genes and gene families with putative roles in virulence are among the more polymorphic genes. We did observe large differences in coverage depth among genes, indicating differences in gene copy number between genomes. We found evidence indicating that recombination has occurred in the history of the sequenced genomes, suggesting that E. histolytica may reproduce sexually. Conclusions E. histolytica displays a relatively low level of nucleotide diversity across its genome. However, large differences in gene family content and gene copy number are seen among the sequenced genomes. The pattern of polymorphism indicates that E. histolytica reproduces sexually, or has done so in the past, which has previously been suggested but not proven. PMID:22630046
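
The link the abstract above draws between coverage depth and gene copy number can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names, the median normalization, and the `min_norm` cutoff are all assumptions made for illustration. Each gene's mean read depth is divided by the sample's median gene depth, and the normalized values of two samples are then compared.

```python
from statistics import median


def normalized_depth(gene_depths):
    """Scale each gene's mean depth by the sample's median gene depth.

    gene_depths: dict mapping gene id -> mean read depth (hypothetical input).
    """
    med = median(gene_depths.values())
    if med <= 0:
        raise ValueError("median depth must be positive")
    return {gene: depth / med for gene, depth in gene_depths.items()}


def copy_number_ratio(sample_a, sample_b, min_norm=0.1):
    """Ratio of normalized depths (A / B) for genes present in both samples.

    Genes whose normalized depth in B falls below min_norm (an assumed
    cutoff) are skipped to avoid unstable ratios.
    """
    norm_a = normalized_depth(sample_a)
    norm_b = normalized_depth(sample_b)
    shared = norm_a.keys() & norm_b.keys()
    return {g: norm_a[g] / norm_b[g] for g in shared if norm_b[g] >= min_norm}


# Toy depths: gene g3 is covered four times deeper than the median in A only.
line_a = {"g1": 10.0, "g2": 10.0, "g3": 40.0}
line_b = {"g1": 20.0, "g2": 20.0, "g3": 20.0}
ratios = copy_number_ratio(line_a, line_b)
```

A ratio well above or below 1 for a gene would be a candidate copy number difference between the two lines. In a genome with strong composition bias and repeats, as described above, mapping artefacts would need to be ruled out first.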

  5. Interactive Exploration on Large Genomic Datasets.

    PubMed

    Tu, Eric

    2016-01-01

    The prevalence of large genomics datasets has made the need to explore this data more important. Large sequencing projects like the 1000 Genomes Project [1], which reconstructed the genomes of 2,504 individuals sampled from 26 populations, have produced over 200TB of publicly available data. Meanwhile, existing genomic visualization tools have been unable to scale with the growing amount of larger, more complex data. This difficulty is acute when viewing large regions (over 1 megabase, or 1,000,000 bases of DNA), or when concurrently viewing multiple samples of data. While genomic processing pipelines have shifted towards using distributed computing techniques, such as with ADAM [4], genomic visualization tools have not. In this work we present Mango, a scalable genome browser built on top of ADAM that can run both locally and on a cluster. Mango presents a combination of different optimizations that can be combined in a single application to drive novel genomic visualization techniques over terabytes of genomic data. By building visualization on top of a distributed processing pipeline, we can perform visualization queries over large regions that are not possible with current tools, and decrease the time for viewing large data sets. Mango is part of the Big Data Genomics project at University of California-Berkeley [25] and is published under the Apache 2 license. Mango is available at https://github.com/bigdatagenomics/mango.

  6. Genome sequencing of ovine isolates of Mycobacterium avium subspecies paratuberculosis offers insights into host association

    PubMed Central

    2012-01-01

    Background The genome of Mycobacterium avium subspecies paratuberculosis (MAP) is remarkably homogeneous among the genomes of bovine, human and wildlife isolates. However, previous work in our laboratories with the bovine K-10 strain has revealed substantial differences compared to sheep isolates. To systematically characterize all genomic differences that may be associated with the specific hosts, we sequenced the genomes of three U.S. sheep isolates and also obtained an optical map. Results Our analysis of one of the isolates, MAP S397, revealed a genome 4.8 Mb in size with 4,700 open reading frames (ORFs). Comparative analysis of the MAP S397 isolate showed it acquired approximately 10 large sequence regions that are shared with the human M. avium subsp. hominissuis strain 104 and lost 2 large regions that are present in the bovine strain. In addition, optical mapping defined the presence of 7 large inversions between the bovine and ovine genomes (~ 2.36 Mb). Whole-genome sequencing of 2 additional sheep strains of MAP (JTC1074 and JTC7565) further confirmed genomic homogeneity of the sheep isolates despite the presence of polymorphisms on the nucleotide level. Conclusions Comparative sequence analysis employed here provided a better understanding of the host association, evolution of members of the M. avium complex and could help in deciphering the phenotypic differences observed among sheep and cattle strains of MAP. A similar approach based on whole-genome sequencing combined with optical mapping could be employed to examine closely related pathogens. We propose an evolutionary scenario for M. avium complex strains based on these genome sequences. PMID:22409516

  7. Mutational and structural analysis of diffuse large B-cell lymphoma using whole genome sequencing | Office of Cancer Genomics

    Cancer.gov

    Abstract: Diffuse large B-cell lymphoma (DLBCL) is a genetically heterogeneous cancer comprising at least two molecular subtypes that differ in gene expression and distribution of mutations. Recently, application of genome/exome sequencing and RNA-seq to DLBCL has revealed numerous genes that are recurrent targets of somatic point mutation in this disease.

  8. The patterns of genomic variances and covariances across genome for milk production traits between Chinese and Nordic Holstein populations.

    PubMed

    Li, Xiujin; Lund, Mogens Sandø; Janss, Luc; Wang, Chonglong; Ding, Xiangdong; Zhang, Qin; Su, Guosheng

    2017-03-15

    With the development of SNP chips, SNP information provides an efficient approach to further disentangle different patterns of genomic variances and covariances across the genome for traits of interest. Due to the interaction between genotype and environment as well as possible differences in genetic background, it is reasonable to treat the performances of a biological trait in different populations as different but genetically correlated traits. In the present study, we performed an investigation on the patterns of region-specific genomic variances, covariances and correlations between Chinese and Nordic Holstein populations for three milk production traits. Variances and covariances between Chinese and Nordic Holstein populations were estimated for genomic regions at three different levels of genome region (all SNP as one region, each chromosome as one region and every 100 SNP as one region) using a novel multi-trait random regression model which uses latent variables to model heterogeneous variance and covariance. In the scenario of the whole genome as one region, the genomic variances, covariances and correlations obtained from the new multi-trait Bayesian method were comparable to those obtained from a multi-trait GBLUP for all the three milk production traits. In the scenario of each chromosome as one region, BTA 14 and BTA 5 accounted for very large genomic variance, covariance and correlation for milk yield and fat yield, whereas no specific chromosome showed very large genomic variance, covariance and correlation for protein yield. In the scenario of every 100 SNP as one region, most regions explained <0.50% of genomic variance and covariance for milk yield and fat yield, and explained <0.30% for protein yield, while some regions could present large variance and covariance. Although overall correlations between two populations for the three traits were positive and high, a few regions still showed weakly positive or highly negative genomic correlations for milk yield and fat yield. The new multi-trait Bayesian method using latent variables to model heterogeneous variance and covariance could work well for estimating the genomic variances and covariances for all genome regions simultaneously. Those estimated genomic parameters could be useful to improve the genomic prediction accuracy for Chinese and Nordic Holstein populations using joint reference data in the future.

  9. Bulky Trichomonad Genomes: Encoding a Swiss Army Knife.

    PubMed

    Barratt, Joel; Gough, Rory; Stark, Damien; Ellis, John

    2016-10-01

    The trichomonads are a remarkably successful lineage of ancient, predominantly parasitic protozoa. Recent molecular analyses have revealed extensive duplication of certain genetic loci in trichomonads. Consequently, their genomes are exceptionally large compared to other parasitic protozoa. Retention of these large gene expansions across different trichomonad families raises the question: do these duplications afford an advantage? Many duplicated genes are linked to the parasitic lifestyle and some are regulated differently to their paralogues, suggesting they have acquired new functions. It is proposed that these large genomes encode a Swiss army knife of sorts, packed with a multitude of tools for use in many different circumstances. This may have bestowed trichomonads with the extraordinary versatility that has undoubtedly contributed to their success. Copyright © 2016 Elsevier Ltd. All rights reserved.

  10. The Contribution of Short Repeats of Low Sequence Complexity to Large Conifer Genomes

    Treesearch

    A. Schmidt; R.L. Doudrick; J.S. Heslop-Harrison; T. Schmidt

    2000-01-01

    Abstract: The abundance and genomic organization of six simple sequence repeats, consisting of di-, tri-, and tetranucleotide sequence motifs, and a minisatellite repeat have been analyzed in different gymnosperms by Southern hybridization. Within the gymnosperm genomes investigated, the abundance and genomic organization of micro- and...

  11. Base-By-Base: single nucleotide-level analysis of whole viral genome alignments.

    PubMed

    Brodie, Ryan; Smith, Alex J; Roper, Rachel L; Tcherepanov, Vasily; Upton, Chris

    2004-07-14

    With ever increasing numbers of closely related virus genomes being sequenced, it has become desirable to be able to compare two genomes at a level more detailed than gene content because two strains of an organism may share the same set of predicted genes but still differ in their pathogenicity profiles. For example, detailed comparison of multiple isolates of the smallpox virus genome (each approximately 200 kb, with 200 genes) is not feasible without new bioinformatics tools. A software package, Base-By-Base, has been developed that provides visualization tools to enable researchers to 1) rapidly identify and correct alignment errors in large, multiple genome alignments; and 2) generate tabular and graphical output of differences between the genomes at the nucleotide level. Base-By-Base uses detailed annotation information about the aligned genomes and can list each predicted gene with nucleotide differences, display whether variations occur within promoter regions or coding regions and whether these changes result in amino acid substitutions. Base-By-Base can connect to our mySQL database (Virus Orthologous Clusters; VOCs) to retrieve detailed annotation information about the aligned genomes or use information from text files. Base-By-Base enables users to quickly and easily compare large viral genomes; it highlights small differences that may be responsible for important phenotypic differences such as virulence. It is available via the Internet using Java Web Start and runs on Macintosh, PC and Linux operating systems with the Java 1.4 virtual machine.
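
The tabular, nucleotide-level difference output described above can be pictured with a small sketch. This is not Base-By-Base code: the function name and the tuple layout are hypothetical. It assumes two rows of an existing alignment, equal in length, with `-` marking gaps.

```python
def alignment_differences(ref, other):
    """List differing columns between two aligned sequences.

    Returns tuples (column, ref_base, other_base, kind), with 1-based
    alignment columns. kind is 'substitution', 'insertion' (gap in ref)
    or 'deletion' (gap in other).
    """
    if len(ref) != len(other):
        raise ValueError("aligned sequences must have equal length")
    diffs = []
    for col, (a, b) in enumerate(zip(ref.upper(), other.upper()), start=1):
        if a == b:
            continue  # identical bases, or a gap in both rows
        if a == "-":
            kind = "insertion"
        elif b == "-":
            kind = "deletion"
        else:
            kind = "substitution"
        diffs.append((col, a, b, kind))
    return diffs


# Toy alignment: one substitution at column 2, one insertion at column 4.
table = alignment_differences("ACG-T", "ATGCT")
```

Mapping each column back to gene or promoter coordinates, as the abstract describes, would additionally require annotation data that this sketch does not model.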

  12. Extreme-Scale De Novo Genome Assembly

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Georganas, Evangelos; Hofmeyr, Steven; Egan, Rob

    De novo whole genome assembly reconstructs genomic sequence from short, overlapping, and potentially erroneous DNA segments and is one of the most important computations in modern genomics. This work presents HipMer, a high-quality end-to-end de novo assembler designed for extreme scale analysis, via efficient parallelization of the Meraculous code. Genome assembly software has many components, each of which stresses different components of a computer system. This chapter explains the computational challenges involved in each step of the HipMer pipeline, the key distributed data structures, and communication costs in detail. We present performance results of assembling the human genome and the large hexaploid wheat genome on large supercomputers up to tens of thousands of cores.

  13. In Depth Characterization of Repetitive DNA in 23 Plant Genomes Reveals Sources of Genome Size Variation in the Legume Tribe Fabeae.

    PubMed

    Macas, Jiří; Novák, Petr; Pellicer, Jaume; Čížková, Jana; Koblížková, Andrea; Neumann, Pavel; Fuková, Iva; Doležel, Jaroslav; Kelly, Laura J; Leitch, Ilia J

    2015-01-01

    The differential accumulation and elimination of repetitive DNA are key drivers of genome size variation in flowering plants, yet there have been few studies which have analysed how different types of repeats in related species contribute to genome size evolution within a phylogenetic context. This question is addressed here by conducting large-scale comparative analysis of repeats in 23 species from four genera of the monophyletic legume tribe Fabeae, representing a 7.6-fold variation in genome size. Phylogenetic analysis and genome size reconstruction revealed that this diversity arose from genome size expansions and contractions in different lineages during the evolution of Fabeae. Employing a combination of low-pass genome sequencing with novel bioinformatic approaches resulted in identification and quantification of repeats making up 55-83% of the investigated genomes. In turn, this enabled an analysis of how each major repeat type contributed to the genome size variation encountered. Differential accumulation of repetitive DNA was found to account for 85% of the genome size differences between the species, and most (57%) of this variation was found to be driven by a single lineage of Ty3/gypsy LTR-retrotransposons, the Ogre elements. Although the amounts of several other lineages of LTR-retrotransposons and the total amount of satellite DNA were also positively correlated with genome size, their contributions to genome size variation were much smaller (up to 6%). Repeat analysis within a phylogenetic framework also revealed profound differences in the extent of sequence conservation between different repeat types across Fabeae. In addition to these findings, the study has provided a proof of concept for the approach combining recent developments in sequencing and bioinformatics to perform comparative analyses of repetitive DNAs in a large number of non-model species without the need to assemble their genomes.

  14. Landscape genomics reveals altered genome wide diversity within revegetated stands of Eucalyptus microcarpa (Grey Box).

    PubMed

    Jordan, Rebecca; Dillon, Shannon K; Prober, Suzanne M; Hoffmann, Ary A

    2016-12-01

    In order to contribute to evolutionary resilience and adaptive potential in highly modified landscapes, revegetated areas should ideally reflect levels of genetic diversity within and across natural stands. Landscape genomic analyses enable such diversity patterns to be characterized at genome and chromosomal levels. Landscape-wide patterns of genomic diversity were assessed in Eucalyptus microcarpa, a dominant tree species widely used in revegetation in Southeastern Australia. Trees from small and large patches within large remnants, small isolated remnants and revegetation sites were assessed across the now highly fragmented distribution of this species using the DArTseq genomic approach. Genomic diversity was similar within all three types of remnant patches analysed, although often significantly but only slightly lower in revegetation sites compared with natural remnants. Differences in diversity between stand types varied across chromosomes. Genomic differentiation was higher between small, isolated remnants, and among revegetated sites compared with natural stands. We conclude that small remnants and revegetated sites of our E. microcarpa samples largely but not completely capture patterns in genomic diversity across the landscape. Genomic approaches provide a powerful tool for assessing restoration efforts across the landscape. © 2016 The Authors. New Phytologist © 2016 New Phytologist Trust.

  15. Transposable element genomic fissuring in Pyrenophora teres is associated with genome expansion and dynamics of host-pathogen genetic interactions

    USDA-ARS's Scientific Manuscript database

    Pyrenophora teres, P. teres f. teres (PTT) and P. teres f. maculata (PTM) cause significant diseases in barley, but little is known about the large-scale genomic differences that may distinguish the two forms. Comprehensive genome assemblies were constructed from long DNA reads, optical and genetic ...

  16. Kullback Leibler divergence in complete bacterial and phage genomes

    PubMed Central

    Akhter, Sajia; Kashef, Mona T.; Ibrahim, Eslam S.; Bailey, Barbara

    2017-01-01

    The amino acid content of the proteins encoded by a genome may predict the coding potential of that genome and may reflect lifestyle restrictions of the organism. Here, we calculated the Kullback–Leibler divergence from the mean amino acid content as a metric to compare the amino acid composition for a large set of bacterial and phage genome sequences. Using these data, we demonstrate that (i) there is a significant difference between amino acid utilization in different phylogenetic groups of bacteria and phages; (ii) many of the bacteria with the most skewed amino acid utilization profiles, or the bacteria that host phages with the most skewed profiles, are endosymbionts or parasites; (iii) the skews in the distribution are not restricted to certain metabolic processes but are common across all bacterial genomic subsystems; (iv) amino acid utilization profiles strongly correlate with GC content in bacterial genomes but very weakly correlate with the G+C percent in phage genomes. These findings might be exploited to distinguish coding from non-coding sequences in large data sets, such as metagenomic sequence libraries, to help in prioritizing subsequent analyses. PMID:29204318
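
The metric named in the abstract above can be written out as a short worked sketch. The abstract does not give the exact formula or preprocessing, so this assumes the standard definition D_KL(P || Q) = sum over amino acids of p * log2(p / q), where P is one genome's amino acid frequency profile and Q is the mean profile across genomes. The function names and the pseudocount are illustrative assumptions.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def aa_frequencies(proteins, pseudocount=1e-9):
    """Amino acid frequency profile of a set of protein sequences.

    The tiny pseudocount (an assumption of this sketch) keeps every
    frequency non-zero so the log ratio below is always defined.
    """
    counts = Counter()
    for seq in proteins:
        counts.update(c for c in seq.upper() if c in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no standard amino acids found")
    denom = total + pseudocount * len(AMINO_ACIDS)
    return {a: (counts[a] + pseudocount) / denom for a in AMINO_ACIDS}


def mean_profile(profiles):
    """Unweighted mean of several frequency profiles."""
    n = len(profiles)
    return {a: sum(p[a] for p in profiles) / n for a in AMINO_ACIDS}


def kl_divergence(p, q):
    """D_KL(P || Q) in bits for two profiles over AMINO_ACIDS."""
    return sum(p[a] * math.log2(p[a] / q[a]) for a in AMINO_ACIDS if p[a] > 0)


# Toy proteomes: genome_b is heavily skewed towards lysine (K).
genome_a = aa_frequencies(["MKTAYIAKQR", "MSDEVLGHPW"])
genome_b = aa_frequencies(["MKKKKKKKKK", "MKKNKKIKKK"])
mean = mean_profile([genome_a, genome_b])
skew_a = kl_divergence(genome_a, mean)
skew_b = kl_divergence(genome_b, mean)
```

A larger value marks a genome whose amino acid usage departs more from the average, which is the sense of "skewed" used in the abstract.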

  17. Kullback Leibler divergence in complete bacterial and phage genomes.

    PubMed

    Akhter, Sajia; Aziz, Ramy K; Kashef, Mona T; Ibrahim, Eslam S; Bailey, Barbara; Edwards, Robert A

    2017-01-01

    The amino acid content of the proteins encoded by a genome may predict the coding potential of that genome and may reflect lifestyle restrictions of the organism. Here, we calculated the Kullback-Leibler divergence from the mean amino acid content as a metric to compare the amino acid composition for a large set of bacterial and phage genome sequences. Using these data, we demonstrate that (i) there is a significant difference between amino acid utilization in different phylogenetic groups of bacteria and phages; (ii) many of the bacteria with the most skewed amino acid utilization profiles, or the bacteria that host phages with the most skewed profiles, are endosymbionts or parasites; (iii) the skews in the distribution are not restricted to certain metabolic processes but are common across all bacterial genomic subsystems; (iv) amino acid utilization profiles strongly correlate with GC content in bacterial genomes but very weakly correlate with the G+C percent in phage genomes. These findings might be exploited to distinguish coding from non-coding sequences in large data sets, such as metagenomic sequence libraries, to help in prioritizing subsequent analyses.

  18. Genome analysis and polar tube firing dynamics of mosquito-infecting microsporidia

    USDA-ARS's Scientific Manuscript database

    Microsporidia are highly divergent fungi that are obligate intracellular pathogens of a wide range of host organisms. Here we review recent findings from the genome sequences of mosquito-infecting microsporidian species Edhazardia aedis and Vavraia culicis, which show large differences in genome siz...

  19. The structure and evolution of angiosperm nuclear genomes.

    PubMed

    Bennetzen, J L

    1998-04-01

    Despite several decades of investigation, the organization of angiosperm genomes remained largely unknown until very recently. Data describing the sequence composition of large segments of genomes, covering hundreds of kilobases of contiguous sequence, have only become available in the past two years. Recent results indicate commonalities in the characteristics of many plant genomes, including in the structure of chromosomal components like telomeres and centromeres, and in the order and content of genes. Major differences between angiosperms have been associated mainly with repetitive DNAs, both gene families and mobile elements. Intriguing new studies have begun to characterize the dynamic three-dimensional structures of chromosomes and chromatin, and the relationship between genome structure and co-ordinated gene function.

  20. Whole-proteome phylogeny of large dsDNA viruses and parvoviruses through a composition vector method related to dynamical language model

    PubMed Central

    2010-01-01

    Background The vast sequence divergence among different virus groups has presented a great challenge to alignment-based analysis of virus phylogeny. Due to the problems caused by the uncertainty in alignment, existing tools for phylogenetic analysis based on multiple alignment could not be directly applied to the whole-genome comparison and phylogenomic studies of viruses. There has been a growing interest in alignment-free methods for phylogenetic analysis using complete genome data. Among the alignment-free methods, a dynamical language (DL) method proposed by our group has successfully been applied to the phylogenetic analysis of bacteria and chloroplast genomes. Results In this paper, the DL method is used to analyze the whole-proteome phylogeny of 124 large dsDNA viruses and 30 parvoviruses, two data sets with large difference in genome size. The trees from our analyses are in good agreement to the latest classification of large dsDNA viruses and parvoviruses by the International Committee on Taxonomy of Viruses (ICTV). Conclusions The present method provides a new way for recovering the phylogeny of large dsDNA viruses and parvoviruses, and also some insights on the affiliation of a number of unclassified viruses. In comparison, some alignment-free methods such as the CV Tree method can be used for recovering the phylogeny of large dsDNA viruses, but they are not suitable for resolving the phylogeny of parvoviruses with a much smaller genome size. PMID:20565983

  1. Whole genome comparison of a large collection of mycobacteriophages reveals a continuum of phage genetic diversity

    PubMed Central

    Pope, Welkin H; Bowman, Charles A; Russell, Daniel A; Jacobs-Sera, Deborah; Asai, David J; Cresawn, Steven G; Jacobs, William R; Hendrix, Roger W; Lawrence, Jeffrey G; Hatfull, Graham F; Abbazia, Patrick; Ababio, Amma; Adam, Naazneen

    2015-01-01

    The bacteriophage population is large, dynamic, ancient, and genetically diverse. Limited genomic information shows that phage genomes are mosaic, and the genetic architecture of phage populations remains ill-defined. To understand the population structure of phages infecting a single host strain, we isolated, sequenced, and compared 627 phages of Mycobacterium smegmatis. Their genetic diversity is considerable, and there are 28 distinct genomic types (clusters) with related nucleotide sequences. However, amino acid sequence comparisons show pervasive genomic mosaicism, and quantification of inter-cluster and intra-cluster relatedness reveals a continuum of genetic diversity, albeit with uneven representation of different phages. Furthermore, rarefaction analysis shows that the mycobacteriophage population is not closed, and there is a constant influx of genes from other sources. Phage isolation and analysis was performed by a large consortium of academic institutions, illustrating the substantial benefits of a disseminated, structured program involving large numbers of freshman undergraduates in scientific discovery. DOI: http://dx.doi.org/10.7554/eLife.06416.001 PMID:25919952

  2. Whole genome comparison of a large collection of mycobacteriophages reveals a continuum of phage genetic diversity.

    PubMed

    Pope, Welkin H; Bowman, Charles A; Russell, Daniel A; Jacobs-Sera, Deborah; Asai, David J; Cresawn, Steven G; Jacobs, William R; Hendrix, Roger W; Lawrence, Jeffrey G; Hatfull, Graham F

    2015-04-28

    The bacteriophage population is large, dynamic, ancient, and genetically diverse. Limited genomic information shows that phage genomes are mosaic, and the genetic architecture of phage populations remains ill-defined. To understand the population structure of phages infecting a single host strain, we isolated, sequenced, and compared 627 phages of Mycobacterium smegmatis. Their genetic diversity is considerable, and there are 28 distinct genomic types (clusters) with related nucleotide sequences. However, amino acid sequence comparisons show pervasive genomic mosaicism, and quantification of inter-cluster and intra-cluster relatedness reveals a continuum of genetic diversity, albeit with uneven representation of different phages. Furthermore, rarefaction analysis shows that the mycobacteriophage population is not closed, and there is a constant influx of genes from other sources. Phage isolation and analysis was performed by a large consortium of academic institutions, illustrating the substantial benefits of a disseminated, structured program involving large numbers of freshman undergraduates in scientific discovery.

  3. GEnomes Management Application (GEM.app): a new software tool for large-scale collaborative genome analysis.

    PubMed

    Gonzalez, Michael A; Lebrigio, Rafael F Acosta; Van Booven, Derek; Ulloa, Rick H; Powell, Eric; Speziani, Fiorella; Tekin, Mustafa; Schüle, Rebecca; Züchner, Stephan

    2013-06-01

    Novel genes are now identified at a rapid pace for many Mendelian disorders, and increasingly, for genetically complex phenotypes. However, new challenges have also become evident: (1) effectively managing larger exome and/or genome datasets, especially for smaller labs; (2) direct hands-on analysis and contextual interpretation of variant data in large genomic datasets; and (3) many small and medium-sized clinical and research-based investigative teams around the world are generating data that, if combined and shared, will significantly increase the opportunities for the entire community to identify new genes. To address these challenges, we have developed GEnomes Management Application (GEM.app), a software tool to annotate, manage, visualize, and analyze large genomic datasets (https://genomics.med.miami.edu/). GEM.app currently contains ∼1,600 whole exomes from 50 different phenotypes studied by 40 principal investigators from 15 different countries. The focus of GEM.app is on user-friendly analysis for nonbioinformaticians to make next-generation sequencing data directly accessible. Yet, GEM.app provides powerful and flexible filter options, including single family filtering, across family/phenotype queries, nested filtering, and evaluation of segregation in families. In addition, the system is fast, obtaining results within 4 sec across ∼1,200 exomes. We believe that this system will further enhance identification of genetic causes of human disease. © 2013 Wiley Periodicals, Inc.

  4. Exploring the feasibility of using copy number variants as genetic markers through large-scale whole genome sequencing experiments

    USDA-ARS's Scientific Manuscript database

    Copy number variants (CNV) are large scale duplications or deletions of genomic sequence that are caused by a diverse set of molecular phenomena that are distinct from single nucleotide polymorphism (SNP) formation. Due to their different mechanisms of formation, CNVs are often difficult to track us...

  5. A robust clustering algorithm for identifying problematic samples in genome-wide association studies.

    PubMed

    Bellenguez, Céline; Strange, Amy; Freeman, Colin; Donnelly, Peter; Spencer, Chris C A

    2012-01-01

    High-throughput genotyping arrays provide an efficient way to survey single nucleotide polymorphisms (SNPs) across the genome in large numbers of individuals. Downstream analysis of the data, for example in genome-wide association studies (GWAS), often involves statistical models of genotype frequencies across individuals. The complexities of the sample collection process and the potential for errors in the experimental assay can lead to biases and artefacts in an individual's inferred genotypes. Rather than attempting to model these complications, it has become a standard practice to remove individuals whose genome-wide data differ from the sample at large. Here we describe a simple, but robust, statistical algorithm to identify samples with atypical summaries of genome-wide variation. Its use as a semi-automated quality control tool is demonstrated using several summary statistics, selected to identify different potential problems, and it is applied to two different genotyping platforms and sample collections. The algorithm is written in R and is freely available at www.well.ox.ac.uk/chris-spencer. Contact: chris.spencer@well.ox.ac.uk. Supplementary data are available at Bioinformatics online.

  6. Genetic Drift, Not Life History or RNAi, Determine Long-Term Evolution of Transposable Elements

    PubMed Central

    Szitenberg, Amir; Cha, Soyeon; Opperman, Charles H.; Bird, David M.; Blaxter, Mark L.; Lunt, David H.

    2016-01-01

    Transposable elements (TEs) are a major source of genome variation across the branches of life. Although TEs may play an adaptive role in their host’s genome, they are more often deleterious, and purifying selection is an important factor controlling their genomic loads. In contrast, life history, mating system, GC content, and RNAi pathways have been suggested to account for the disparity of TE loads in different species. Previous studies of fungal, plant, and animal genomes have reported conflicting results regarding the direction in which these genomic features drive TE evolution. Many of these studies have had limited power, however, because they studied taxonomically narrow systems, comparing only a limited number of phylogenetically independent contrasts, and did not address long-term effects on TE evolution. Here, we test the long-term determinants of TE evolution by comparing 42 nematode genomes spanning over 500 million years of diversification. This analysis includes numerous transitions between life history states, and RNAi pathways, and evaluates if these forces are sufficiently persistent to affect the long-term evolution of TE loads in eukaryotic genomes. Although we demonstrate statistical power to detect selection, we find no evidence that variation in these factors influences genomic TE loads across extended periods of time. In contrast, the effects of genetic drift appear to persist and control TE variation among species. We suggest that variation in the tested factors is largely inconsequential to the large differences in TE content observed between genomes, and only by these large-scale comparisons can we distinguish long-term and persistent effects from transient or random changes. PMID:27566762

  7. Transposable Element Genomic Fissuring in Pyrenophora teres Is Associated With Genome Expansion and Dynamics of Host–Pathogen Genetic Interactions

    PubMed Central

    Syme, Robert A.; Martin, Anke; Wyatt, Nathan A.; Lawrence, Julie A.; Muria-Gonzalez, Mariano J.; Friesen, Timothy L.; Ellwood, Simon R.

    2018-01-01

    Pyrenophora teres, P. teres f. teres (PTT) and P. teres f. maculata (PTM) cause significant diseases in barley, but little is known about the large-scale genomic differences that may distinguish the two forms. Comprehensive genome assemblies were constructed from long DNA reads, optical and genetic maps. As repeat masking in fungal genomes influences the final gene annotations, an accurate and reproducible pipeline was developed to ensure comparability between isolates. The genomes of the two forms are highly collinear, each composed of 12 chromosomes. Genome evolution in P. teres is characterized by genome fissuring through the insertion and expansion of transposable elements (TEs), a process that isolates blocks of genic sequence. The phenomenon is particularly pronounced in PTT, which has a larger, more repetitive genome than PTM and more recent transposon activity measured by the frequency and size of genome fissures. PTT has a longer cultivated host association and, notably, a greater range of host–pathogen genetic interactions compared to other Pyrenophora spp., a property which associates better with genome size than pathogen lifestyle. The two forms possess similar complements of TE families with Tc1/Mariner and LINE-like Tad-1 elements more abundant in PTT. Tad-1 was only detectable as vestigial fragments in PTM and, within the forms, differences in genome sizes and the presence and absence of several TE families indicated recent lineage invasions. Gene differences between P. teres forms are mainly associated with gene-sparse regions near or within TE-rich regions, with many genes possessing characteristics of fungal effectors. Instances of gene interruption by transposons resulting in pseudogenization were detected in PTT. In addition, both forms have a large complement of secondary metabolite gene clusters indicating significant capacity to produce an array of different molecules. 
    This study provides genomic resources for functional genetics to help dissect factors underlying the host–pathogen interactions. PMID:29720997

  8. Functional and topological characteristics of mammalian regulatory domains

    PubMed Central

    Symmons, Orsolya; Uslu, Veli Vural; Tsujimura, Taro; Ruf, Sandra; Nassari, Sonya; Schwarzer, Wibke; Ettwiller, Laurence; Spitz, François

    2014-01-01

    Long-range regulatory interactions play an important role in shaping gene-expression programs. However, the genomic features that organize these activities are still poorly characterized. We conducted a large operational analysis to chart the distribution of gene regulatory activities along the mouse genome, using hundreds of insertions of a regulatory sensor. We found that enhancers distribute their activities along broad regions and not in a gene-centric manner, defining large regulatory domains. Remarkably, these domains correlate strongly with the recently described TADs, which partition the genome into distinct self-interacting blocks. Different features, including specific repeats and CTCF-binding sites, correlate with the transition zones separating regulatory domains, and may help to further organize promiscuously distributed regulatory influences within large domains. These findings support a model of genomic organization where TADs confine regulatory activities to specific but large regulatory domains, contributing to the establishment of specific gene expression profiles. PMID:24398455

  9. Integrative analysis of the Caenorhabditis elegans genome by the modENCODE project.

    PubMed

    Gerstein, Mark B; Lu, Zhi John; Van Nostrand, Eric L; Cheng, Chao; Arshinoff, Bradley I; Liu, Tao; Yip, Kevin Y; Robilotto, Rebecca; Rechtsteiner, Andreas; Ikegami, Kohta; Alves, Pedro; Chateigner, Aurelien; Perry, Marc; Morris, Mitzi; Auerbach, Raymond K; Feng, Xin; Leng, Jing; Vielle, Anne; Niu, Wei; Rhrissorrakrai, Kahn; Agarwal, Ashish; Alexander, Roger P; Barber, Galt; Brdlik, Cathleen M; Brennan, Jennifer; Brouillet, Jeremy Jean; Carr, Adrian; Cheung, Ming-Sin; Clawson, Hiram; Contrino, Sergio; Dannenberg, Luke O; Dernburg, Abby F; Desai, Arshad; Dick, Lindsay; Dosé, Andréa C; Du, Jiang; Egelhofer, Thea; Ercan, Sevinc; Euskirchen, Ghia; Ewing, Brent; Feingold, Elise A; Gassmann, Reto; Good, Peter J; Green, Phil; Gullier, Francois; Gutwein, Michelle; Guyer, Mark S; Habegger, Lukas; Han, Ting; Henikoff, Jorja G; Henz, Stefan R; Hinrichs, Angie; Holster, Heather; Hyman, Tony; Iniguez, A Leo; Janette, Judith; Jensen, Morten; Kato, Masaomi; Kent, W James; Kephart, Ellen; Khivansara, Vishal; Khurana, Ekta; Kim, John K; Kolasinska-Zwierz, Paulina; Lai, Eric C; Latorre, Isabel; Leahey, Amber; Lewis, Suzanna; Lloyd, Paul; Lochovsky, Lucas; Lowdon, Rebecca F; Lubling, Yaniv; Lyne, Rachel; MacCoss, Michael; Mackowiak, Sebastian D; Mangone, Marco; McKay, Sheldon; Mecenas, Desirea; Merrihew, Gennifer; Miller, David M; Muroyama, Andrew; Murray, John I; Ooi, Siew-Loon; Pham, Hoang; Phippen, Taryn; Preston, Elicia A; Rajewsky, Nikolaus; Rätsch, Gunnar; Rosenbaum, Heidi; Rozowsky, Joel; Rutherford, Kim; Ruzanov, Peter; Sarov, Mihail; Sasidharan, Rajkumar; Sboner, Andrea; Scheid, Paul; Segal, Eran; Shin, Hyunjin; Shou, Chong; Slack, Frank J; Slightam, Cindie; Smith, Richard; Spencer, William C; Stinson, E O; Taing, Scott; Takasaki, Teruaki; Vafeados, Dionne; Voronina, Ksenia; Wang, Guilin; Washington, Nicole L; Whittle, Christina M; Wu, Beijing; Yan, Koon-Kiu; Zeller, Georg; Zha, Zheng; Zhong, Mei; Zhou, Xingliang; Ahringer, Julie; Strome, Susan; Gunsalus, Kristin C; 
    Micklem, Gos; Liu, X Shirley; Reinke, Valerie; Kim, Stuart K; Hillier, LaDeana W; Henikoff, Steven; Piano, Fabio; Snyder, Michael; Stein, Lincoln; Lieb, Jason D; Waterston, Robert H

    2010-12-24

    We systematically generated large-scale data sets to improve genome annotation for the nematode Caenorhabditis elegans, a key model organism. These data sets include transcriptome profiling across a developmental time course, genome-wide identification of transcription factor-binding sites, and maps of chromatin organization. From this, we created more complete and accurate gene models, including alternative splice forms and candidate noncoding RNAs. We constructed hierarchical networks of transcription factor-binding and microRNA interactions and discovered chromosomal locations bound by an unusually large number of transcription factors. Different patterns of chromatin composition and histone modification were revealed between chromosome arms and centers, with similarly prominent differences between autosomes and the X chromosome. Integrating data types, we built statistical models relating chromatin, transcription factor binding, and gene expression. Overall, our analyses ascribed putative functions to most of the conserved genome.

  10. Complete nucleotide sequence of the Cryptomeria japonica D. Don. chloroplast genome and comparative chloroplast genomics: diversified genomic structure of coniferous species.

    PubMed

    Hirao, Tomonori; Watanabe, Atsushi; Kurita, Manabu; Kondo, Teiji; Takata, Katsuhiko

    2008-06-23

    The recent determination of complete chloroplast (cp) genomic sequences of various plant species has enabled numerous comparative analyses as well as advances in plant and genome evolutionary studies. In angiosperms, the complete cp genome sequences of about 70 species have been determined, whereas those of only three gymnosperm species, Cycas taitungensis, Pinus thunbergii, and Pinus koraiensis have been established. The lack of information regarding the gene content and genomic structure of gymnosperm cp genomes may severely hamper further progress of plant and cp genome evolutionary studies. To address this need, we report here the complete nucleotide sequence of the cp genome of Cryptomeria japonica, the first in the Cupressaceae sensu lato of gymnosperms, and provide a comparative analysis of their gene content and genomic structure that illustrates the unique genomic features of gymnosperms. The C. japonica cp genome is 131,810 bp in length, with 112 single copy genes and two duplicated (trnI-CAU, trnQ-UUG) genes that give a total of 116 genes. Compared to other land plant cp genomes, the C. japonica cp has lost one of the relevant large inverted repeats (IRs) found in angiosperms, fern, liverwort, and gymnosperms, such as Cycas and Ginkgo, and additionally has completely lost its trnR-CCG, partially lost its trnT-GGU, and shows diversification of accD. The genomic structure of the C. japonica cp genome also differs significantly from those of other plant species. For example, we estimate that a minimum of 15 inversions would be required to transform the gene organization of the Pinus thunbergii cp genome into that of C. japonica. In the C. japonica cp genome, direct repeat and inverted repeat sequences are observed at the inversion and translocation endpoints, and these sequences may be associated with the genomic rearrangements. The observed differences in genomic structure between C. japonica and other land plants, including pines, strongly support the theory that the large IRs stabilize the cp genome. Furthermore, the deleted large IR and the numerous genomic rearrangements that have occurred in the C. japonica cp genome provide new insights into both the evolutionary lineage of coniferous species in gymnosperm and the evolution of the cp genome.

  11. Mosaic Graphs and Comparative Genomics in Phage Communities

    PubMed Central

    Belcaid, Mahdi; Bergeron, Anne

    2010-01-01

    Comparing the genomes of two closely related viruses often produces mosaics where nearly identical sequences alternate with sequences that are unique to each genome. When several closely related genomes are compared, the unique sequences are likely to be shared with third genomes, leading to virus mosaic communities. Here we present comparative analysis of sets of Staphylococcus aureus phages that share large identical sequences with up to three other genomes, and with different partners along their genomes. We introduce mosaic graphs to represent these complex recombination events, and use them to illustrate the breadth and depth of sequence sharing: some genomes are almost completely made up of shared sequences, while genomes that share very large identical sequences can adopt alternate functional modules. Mosaic graphs also allow us to identify breakpoints that could eventually be used for the construction of recombination networks. These findings have several implications on phage metagenomics assembly, on the horizontal gene transfer paradigm, and more generally on the understanding of the composition and evolutionary dynamics of virus communities. PMID:20874413

  12. Signatures of host specialization and a recent transposable element burst in the dynamic one-speed genome of the fungal barley powdery mildew pathogen.

    PubMed

    Frantzeskakis, Lamprinos; Kracher, Barbara; Kusch, Stefan; Yoshikawa-Maekawa, Makoto; Bauer, Saskia; Pedersen, Carsten; Spanu, Pietro D; Maekawa, Takaki; Schulze-Lefert, Paul; Panstruga, Ralph

    2018-05-22

    Powdery mildews are biotrophic pathogenic fungi infecting a number of economically important plants. The grass powdery mildew, Blumeria graminis, has become a model organism to study host specialization of obligate biotrophic fungal pathogens. We resolved the large-scale genomic architecture of B. graminis forma specialis hordei (Bgh) to explore the potential influence of its genome organization on the co-evolutionary process with its host plant, barley (Hordeum vulgare). The near-chromosome level assemblies of the Bgh reference isolate DH14 and one of the most diversified isolates, RACE1, enabled a comparative analysis of these haploid genomes, which are highly enriched with transposable elements (TEs). We found largely retained genome synteny and gene repertoires, yet detected copy number variation (CNV) of secretion signal peptide-containing protein-coding genes (SPs) and locally disrupted synteny blocks. Genes coding for sequence-related SPs are often locally clustered, but neither the SPs nor the TEs reside preferentially in genomic regions with unique features. Extended comparative analysis with different host-specific B. graminis formae speciales revealed the existence of a core suite of SPs, but also isolate-specific SP sets as well as congruence of SP CNV and phylogenetic relationship. We further detected evidence for a recent, lineage-specific expansion of TEs in the Bgh genome. The characteristics of the Bgh genome (largely retained synteny, CNV of SP genes, recently proliferated TEs and a lack of significant compartmentalization) are consistent with a "one-speed" genome that differs in its architecture and (co-)evolutionary pattern from the "two-speed" genomes reported for several other filamentous phytopathogens.

  13. Big Data Analytics for Genomic Medicine

    PubMed Central

    He, Karen Y.; Ge, Dongliang; He, Max M.

    2017-01-01

    Genomic medicine attempts to build individualized strategies for diagnostic or therapeutic decision-making by utilizing patients’ genomic information. Big Data analytics uncovers hidden patterns, unknown correlations, and other insights through examining large-scale various data sets. While integration and manipulation of diverse genomic data and comprehensive electronic health records (EHRs) on a Big Data infrastructure exhibit challenges, they also provide a feasible opportunity to develop an efficient and effective approach to identify clinically actionable genetic variants for individualized diagnosis and therapy. In this paper, we review the challenges of manipulating large-scale next-generation sequencing (NGS) data and diverse clinical data derived from the EHRs for genomic medicine. We introduce possible solutions for different challenges in manipulating, managing, and analyzing genomic and clinical data to implement genomic medicine. Additionally, we also present a practical Big Data toolset for identifying clinically actionable genetic variants using high-throughput NGS data and EHRs. PMID:28212287

  14. Big Data Analytics for Genomic Medicine.

    PubMed

    He, Karen Y; Ge, Dongliang; He, Max M

    2017-02-15

    Genomic medicine attempts to build individualized strategies for diagnostic or therapeutic decision-making by utilizing patients' genomic information. Big Data analytics uncovers hidden patterns, unknown correlations, and other insights through examining large-scale various data sets. While integration and manipulation of diverse genomic data and comprehensive electronic health records (EHRs) on a Big Data infrastructure exhibit challenges, they also provide a feasible opportunity to develop an efficient and effective approach to identify clinically actionable genetic variants for individualized diagnosis and therapy. In this paper, we review the challenges of manipulating large-scale next-generation sequencing (NGS) data and diverse clinical data derived from the EHRs for genomic medicine. We introduce possible solutions for different challenges in manipulating, managing, and analyzing genomic and clinical data to implement genomic medicine. Additionally, we also present a practical Big Data toolset for identifying clinically actionable genetic variants using high-throughput NGS data and EHRs.

  15. Methods comparison for microsatellite marker development: Different isolation methods, different yield efficiency

    NASA Astrophysics Data System (ADS)

    Zhan, Aibin; Bao, Zhenmin; Hu, Xiaoli; Lu, Wei; Hu, Jingjie

    2009-06-01

    Microsatellite markers have become one of the most important molecular tools used in many areas of research. A large number of microsatellite markers are required for whole genome surveys in the fields of molecular ecology, quantitative genetics and genomics. Therefore, it is important to identify versatile, low-cost, efficient, and time- and labor-saving methods to develop a large panel of microsatellite markers. In this study, we used Zhikong scallop (Chlamys farreri) as the target species to compare the efficiency of the five methods derived from three strategies for microsatellite marker development. The results showed that the strategy of constructing a small-insert genomic DNA library resulted in poor efficiency, while the microsatellite-enriched strategy greatly improved the isolation efficiency. Although the strategy of mining public databases is time- and cost-saving, it is difficult to obtain a large number of microsatellite markers, mainly due to the limited sequence data of non-model species deposited in public databases. Based on the results in this study, we recommend two methods, the microsatellite-enriched library construction method and the FIASCO-colony hybridization method, for large-scale microsatellite marker development. Both methods were derived from the microsatellite-enriched strategy. The experimental results obtained from Zhikong scallop also provide a reference for microsatellite marker development in other species with large genomes.

  16. Compositional patterns in the genomes of unicellular eukaryotes.

    PubMed

    Costantini, Maria; Alvarez-Valin, Fernando; Costantini, Susan; Cammarano, Rosalia; Bernardi, Giorgio

    2013-11-05

    The genomes of multicellular eukaryotes are compartmentalized in mosaics of isochores, large and fairly homogeneous stretches of DNA that belong to a small number of families characterized by different average GC levels, by different gene concentration (that increase with GC), different chromatin structures, different replication timing in the cell cycle, and other different properties. A question raised by these basic results concerns how far back in evolution the compartmentalized organization of the eukaryotic genomes arose. In the present work we approached this problem by studying the compositional organization of the genomes from the unicellular eukaryotes for which full sequences are available, the sample used being representative. The average GC levels of the genomes from unicellular eukaryotes cover an extremely wide range (19%-60% GC) and the compositional patterns of individual genomes are extremely different but all genomes tested show a compositional compartmentalization. The average GC range of the genomes of unicellular eukaryotes is very broad (as broad as that of prokaryotes) and individual compositional patterns cover a very broad range from very narrow to very complex. Both features are not surprising for organisms that are very far from each other both in terms of phylogenetic distances and of environmental life conditions. Most importantly, all genomes tested, a representative sample of all supergroups of unicellular eukaryotes, are compositionally compartmentalized, a major difference with prokaryotes.
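    The GC levels discussed in this record can be made concrete with a small calculation. The sketch below is a minimal illustration only: the function names, the window size, and the toy sequence are hypothetical, and the study's actual procedure for assessing compositional compartmentalization is not described in the abstract. It computes the GC fraction of a sequence and of consecutive non-overlapping windows, so that stretches with different average GC levels show up as different window values.

```python
def gc_fraction(seq):
    """Fraction of G and C among the unambiguous bases (A, C, G, T) of `seq`."""
    seq = seq.upper()
    gc = sum(1 for base in seq if base in "GC")
    acgt = sum(1 for base in seq if base in "ACGT")
    # Sequences with no unambiguous bases (for example all N) yield 0.0.
    return gc / acgt if acgt else 0.0


def windowed_gc(seq, window):
    """GC fraction of consecutive, non-overlapping windows of length `window`.

    A trailing partial window is ignored.
    """
    return [
        gc_fraction(seq[start:start + window])
        for start in range(0, len(seq) - window + 1, window)
    ]


# Toy sequence, invented for illustration: an AT-rich block then a GC-rich block.
toy_sequence = "AAAATTTT" + "GGGGCCCC"
profile = windowed_gc(toy_sequence, window=8)
```

    On real genomes the window would be far larger than in this toy case, and the choice of window size is itself an analysis decision that the abstract does not specify.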

  17. Genomic comparison of closely related Giant Viruses supports an accordion-like model of evolution.

    PubMed

    Filée, Jonathan

    2015-01-01

    Genome gigantism has so far been observed in Phycodnaviridae and Mimiviridae (order Megavirales). The origin and evolution of these Giant Viruses (GVs) remain open questions. Interestingly, the availability of a collection of closely related GV genomes enabling genomic comparisons offers the opportunity to better understand the different evolutionary forces acting on these genomes. Whole genome alignment for five groups of viruses belonging to the Mimiviridae and Phycodnaviridae families shows that there is no trend of genome expansion or general tendency of genome contraction. Instead, GV genomes accumulated genomic mutations over time, with gene gains compensating for the different losses. In addition, each lineage displays specific patterns of genome evolution. Mimiviridae (megaviruses and mimiviruses) and Chlorella Phycodnaviruses evolved mainly by duplications and losses of genes belonging to large paralogous families (including movements of diverse mobile genetic elements), whereas Micromonas and Ostreococcus Phycodnaviruses derive most of their genetic novelties through lateral gene transfers. Taken together, these data support an accordion-like model of evolution in which GV genomes have undergone successive steps of gene gain and gene loss, lending credit to the hypothesis that genome gigantism appeared early, before the diversification of the different GV lineages.

  18. Phylogenomic, Pan-genomic, Pathogenomic and Evolutionary Genomic Insights into the Agronomically Relevant Enterobacteria Pantoea ananatis and Pantoea stewartii

    PubMed Central

    De Maayer, Pieter; Aliyu, Habibu; Vikram, Surendra; Blom, Jochen; Duffy, Brion; Cowan, Don A.; Smits, Theo H. M.; Venter, Stephanus N.; Coutinho, Teresa A.

    2017-01-01

    Pantoea ananatis is ubiquitously found in the environment and causes disease on a wide range of plant hosts. By contrast, its sister species, Pantoea stewartii subsp. stewartii is the host-specific causative agent of the devastating maize disease Stewart’s wilt. This pathogen has a restricted lifecycle, overwintering in an insect vector before being introduced into susceptible maize cultivars, causing disease and returning to overwinter in its vector. The other subspecies, P. stewartii subsp. indologenes, has been isolated from different plant hosts and is predicted to proliferate in different environmental niches. Here we have, by the use of comparative genomics and a comprehensive suite of bioinformatic tools, analyzed the genomes of ten P. stewartii and nineteen P. ananatis strains. Our phylogenomic analyses have revealed that there are two distinct clades within P. ananatis while far less phylogenetic diversity was observed among the P. stewartii subspecies. Pan-genome analyses revealed that a large core genome comprising 3,571 protein coding sequences is shared among the twenty-nine compared strains. Furthermore, we showed that an extensive accessory genome made up largely by a mobilome of plasmids, integrated prophages, integrative and conjugative elements and insertion elements has resulted in extensive diversification of P. stewartii and P. ananatis. While these organisms share many pathogenicity determinants, our comparative genomic analyses show that they differ in terms of the secretion systems they encode. The genomic differences identified in this study have allowed us to postulate on the divergent evolutionary histories of the analyzed P. ananatis and P. stewartii strains and on the molecular basis underlying their ecological success and host range. PMID:28959245
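    The core and accessory genome concepts used in this record can be expressed as simple set operations. The sketch below is a hedged illustration, not the pipeline used in the study: the function name, the strain labels, and the gene-family identifiers are hypothetical, and it assumes that genes have already been grouped into families by some upstream orthology step. The core genome is taken as the families present in every strain, and the accessory genome as everything else in the pan-genome.

```python
def partition_pan_genome(strains):
    """Split a pan-genome into core and accessory gene families.

    `strains` maps a strain name to an iterable of gene-family identifiers
    (hypothetical input format). Returns a (core, accessory) pair of sets.
    """
    gene_sets = [set(families) for families in strains.values()]
    if not gene_sets:
        return set(), set()
    pan = set().union(*gene_sets)          # every family seen in any strain
    core = set.intersection(*gene_sets)    # families present in all strains
    return core, pan - core


# Toy data, invented for illustration only.
toy_strains = {
    "strain_1": {"famA", "famB", "famC"},
    "strain_2": {"famA", "famB"},
    "strain_3": {"famA", "famD"},
}
core, accessory = partition_pan_genome(toy_strains)
```

    Strict presence in every strain is the simplest definition of a core genome. Analyses of draft assemblies sometimes relax it to presence in most strains, which would be a different design choice from the one sketched here.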

  19. Phylogenomic, Pan-genomic, Pathogenomic and Evolutionary Genomic Insights into the Agronomically Relevant Enterobacteria Pantoea ananatis and Pantoea stewartii.

    PubMed

    De Maayer, Pieter; Aliyu, Habibu; Vikram, Surendra; Blom, Jochen; Duffy, Brion; Cowan, Don A; Smits, Theo H M; Venter, Stephanus N; Coutinho, Teresa A

    2017-01-01

    Pantoea ananatis is ubiquitously found in the environment and causes disease on a wide range of plant hosts. By contrast, its sister species, Pantoea stewartii subsp. stewartii is the host-specific causative agent of the devastating maize disease Stewart's wilt. This pathogen has a restricted lifecycle, overwintering in an insect vector before being introduced into susceptible maize cultivars, causing disease and returning to overwinter in its vector. The other subspecies, P. stewartii subsp. indologenes, has been isolated from different plant hosts and is predicted to proliferate in different environmental niches. Here we have, by the use of comparative genomics and a comprehensive suite of bioinformatic tools, analyzed the genomes of ten P. stewartii and nineteen P. ananatis strains. Our phylogenomic analyses have revealed that there are two distinct clades within P. ananatis while far less phylogenetic diversity was observed among the P. stewartii subspecies. Pan-genome analyses revealed that a large core genome comprising 3,571 protein coding sequences is shared among the twenty-nine compared strains. Furthermore, we showed that an extensive accessory genome made up largely by a mobilome of plasmids, integrated prophages, integrative and conjugative elements and insertion elements has resulted in extensive diversification of P. stewartii and P. ananatis. While these organisms share many pathogenicity determinants, our comparative genomic analyses show that they differ in terms of the secretion systems they encode. The genomic differences identified in this study have allowed us to postulate on the divergent evolutionary histories of the analyzed P. ananatis and P. stewartii strains and on the molecular basis underlying their ecological success and host range.

  20. A Novel Genome-Information Content-Based Statistic for Genome-Wide Association Analysis Designed for Next-Generation Sequencing Data

    PubMed Central

    Luo, Li; Zhu, Yun

    2012-01-01

    The genome-wide association studies (GWAS) designed for next-generation sequencing data involve testing association of genomic variants, including common, low frequency, and rare variants. The current strategies for association studies are well developed for identifying association of common variants with the common diseases, but may be ill-suited when large amounts of allelic heterogeneity are present in sequence data. Recently, group tests that analyze their collective frequency differences between cases and controls shift the current variant-by-variant analysis paradigm for GWAS of common variants to the collective test of multiple variants in the association analysis of rare variants. However, group tests ignore differences in genetic effects among SNPs at different genomic locations. As an alternative to group tests, we developed a novel genome-information content-based statistic for testing association of the entire allele frequency spectrum of genomic variation with the diseases. To evaluate the performance of the proposed statistic, we use large-scale simulations based on whole genome low coverage pilot data in the 1000 Genomes Project to calculate the type 1 error rates and power of seven alternative statistics: a genome-information content-based statistic, the generalized T², collapsing method, multivariate and collapsing (CMC) method, individual χ² test, weighted-sum statistic, and variable threshold statistic. Finally, we apply the seven statistics to published resequencing dataset from ANGPTL3, ANGPTL4, ANGPTL5, and ANGPTL6 genes in the Dallas Heart Study. We report that the genome-information content-based statistic has significantly improved type 1 error rates and higher power than the other six statistics in both simulated and empirical datasets. PMID:22651812

  1. A novel genome-information content-based statistic for genome-wide association analysis designed for next-generation sequencing data.

    PubMed

    Luo, Li; Zhu, Yun; Xiong, Momiao

    2012-06-01

    The genome-wide association studies (GWAS) designed for next-generation sequencing data involve testing association of genomic variants, including common, low frequency, and rare variants. The current strategies for association studies are well developed for identifying association of common variants with the common diseases, but may be ill-suited when large amounts of allelic heterogeneity are present in sequence data. Recently, group tests that analyze their collective frequency differences between cases and controls shift the current variant-by-variant analysis paradigm for GWAS of common variants to the collective test of multiple variants in the association analysis of rare variants. However, group tests ignore differences in genetic effects among SNPs at different genomic locations. As an alternative to group tests, we developed a novel genome-information content-based statistic for testing association of the entire allele frequency spectrum of genomic variation with the diseases. To evaluate the performance of the proposed statistic, we use large-scale simulations based on whole genome low coverage pilot data in the 1000 Genomes Project to calculate the type 1 error rates and power of seven alternative statistics: a genome-information content-based statistic, the generalized T², collapsing method, multivariate and collapsing (CMC) method, individual χ² test, weighted-sum statistic, and variable threshold statistic. Finally, we apply the seven statistics to published resequencing dataset from ANGPTL3, ANGPTL4, ANGPTL5, and ANGPTL6 genes in the Dallas Heart Study. We report that the genome-information content-based statistic has significantly improved type 1 error rates and higher power than the other six statistics in both simulated and empirical datasets.

  2. Large-scale chromosome folding versus genomic DNA sequences: A discrete double Fourier transform technique.

    PubMed

    Chechetkin, V R; Lobzin, V V

    2017-08-07

    The use of state-of-the-art techniques combining imaging methods and high-throughput genomic mapping tools has led to significant progress in detailing the chromosome architecture of various organisms. However, a gap still remains between the rapidly growing structural data on the chromosome folding and the large-scale genome organization. Could a part of information on the chromosome folding be obtained directly from underlying genomic DNA sequences abundantly stored in the databanks? To answer this question, we developed an original discrete double Fourier transform (DDFT). DDFT serves for the detection of large-scale genome regularities associated with domains/units at the different levels of hierarchical chromosome folding. The method is versatile and can be applied to both genomic DNA sequences and corresponding physico-chemical parameters such as base-pairing free energy. The latter characteristic is closely related to the replication and transcription and can also be used for the assessment of temperature or supercoiling effects on the chromosome folding. We tested the method on the genome of E. coli K-12 and found good correspondence with the annotated domains/units established experimentally. As a brief illustration of further abilities of DDFT, the study of large-scale genome organization for bacteriophage PHIX174 and bacterium Caulobacter crescentus is also included. The combined experimental, modeling, and bioinformatic DDFT analysis should yield more complete knowledge on the chromosome architecture and genome organization. Copyright © 2017 Elsevier Ltd. All rights reserved.

  3. Nomadic lifestyle of Lactobacillus plantarum revealed by comparative genomics of 54 strains isolated from different habitats.

    PubMed

    Martino, Maria Elena; Bayjanov, Jumamurat R; Caffrey, Brian E; Wels, Michiel; Joncour, Pauline; Hughes, Sandrine; Gillet, Benjamin; Kleerebezem, Michiel; van Hijum, Sacha A F T; Leulier, François

    2016-12-01

    The ability of bacteria to adapt to diverse environmental conditions is well-known. The process of bacterial adaptation to a niche has been linked to large changes in the genome content, showing that many bacterial genomes reflect the constraints imposed by their habitat. However, some highly versatile bacteria are found in diverse habitats that share almost nothing in common. Lactobacillus plantarum is a lactic acid bacterium that is found in a large variety of habitats. With the aim of unravelling the link between evolution and ecological versatility of L. plantarum, we analysed the genomes of 54 L. plantarum strains isolated from different environments. Comparative genome analysis identified a high level of genomic diversity and plasticity among the strains analysed. Phylogenomic and functional divergence studies coupled with gene-trait matching analyses revealed a mixed distribution of the strains, which was uncoupled from their environmental origin. Our findings revealed the absence of specific genomic signatures marking adaptations of L. plantarum towards the diverse habitats it is associated with. This suggests fundamentally similar trends of genome evolution in L. plantarum, which occur in a manner that is apparently uncoupled from ecological constraint and reflects the nomadic lifestyle of this species. © 2016 The Authors. Environmental Microbiology published by Society for Applied Microbiology and John Wiley & Sons Ltd.

  4. Genomic gigantism: DNA loss is slow in mountain grasshoppers.

    PubMed

    Bensasson, D; Petrov, D A; Zhang, D X; Hartl, D L; Hewitt, G M

    2001-02-01

    Several studies have shown DNA loss to be inversely correlated with genome size in animals. These studies include a comparison between Drosophila and the cricket, Laupala, but there has been no assessment of DNA loss in insects with very large genomes. Podisma pedestris, the brown mountain grasshopper, has a genome over 100 times as large as that of Drosophila and 10 times as large as that of Laupala. We used 58 paralogous nuclear pseudogenes of mitochondrial origin to study the characteristics of insertion, deletion, and point substitution in P. pedestris and Italopodisma. In animals, these pseudogenes are "dead on arrival"; they are abundant in many different eukaryotes, and their mitochondrial origin simplifies the identification of point substitutions accumulated in nuclear pseudogene lineages. There appears to be a mononucleotide repeat within the 643-bp pseudogene sequence studied that acts as a strong hot spot for insertions or deletions (indels). Because the data for other insect species did not contain such an unusual region, hot spots were excluded from species comparisons. The rate of DNA loss relative to point substitution appears to be considerably and significantly lower in the grasshoppers studied than in Drosophila or Laupala. This suggests that the inverse correlation between genome size and the rate of DNA loss can be extended to comparisons between insects with large or gigantic genomes (i.e., Laupala and Podisma). The low rate of DNA loss implies that in grasshoppers, the accumulation of point mutations is a more potent force for obscuring ancient pseudogenes than their loss through indel accumulation, whereas the reverse is true for Drosophila. The main factor contributing to the difference in the rates of DNA loss estimated for grasshoppers, crickets, and Drosophila appears to be deletion size. Large deletions are relatively rare in Podisma and Italopodisma.

  5. International network of cancer genome projects

    PubMed Central

    2010-01-01

    The International Cancer Genome Consortium (ICGC) was launched to coordinate large-scale cancer genome studies in tumors from 50 different cancer types and/or subtypes that are of clinical and societal importance across the globe. Systematic studies of over 25,000 cancer genomes at the genomic, epigenomic, and transcriptomic levels will reveal the repertoire of oncogenic mutations, uncover traces of the mutagenic influences, define clinically-relevant subtypes for prognosis and therapeutic management, and enable the development of new cancer therapies. PMID:20393554

  6. Microarray Data Processing Techniques for Genome-Scale Network Inference from Large Public Repositories.

    PubMed

    Chockalingam, Sriram; Aluru, Maneesha; Aluru, Srinivas

    2016-09-19

    Pre-processing of microarray data is a well-studied problem. Furthermore, all popular platforms come with their own recommended best practices for differential analysis of genes. However, for genome-scale network inference using microarray data collected from large public repositories, these methods filter out a considerable number of genes. This is primarily due to the effects of aggregating a diverse array of experiments with different technical and biological scenarios. Here we introduce a pre-processing pipeline suitable for inferring genome-scale gene networks from large microarray datasets. We show that partitioning of the available microarray datasets according to biological relevance into tissue- and process-specific categories significantly extends the limits of downstream network construction. We demonstrate the effectiveness of our pre-processing pipeline by inferring genome-scale networks for the model plant Arabidopsis thaliana using two different construction methods and a collection of 11,760 Affymetrix ATH1 microarray chips. Our pre-processing pipeline and the datasets used in this paper are made available at http://alurulab.cc.gatech.edu/microarray-pp.

  7. Genome-wide heterogeneity of nucleotide substitution model fit.

    PubMed

    Arbiza, Leonardo; Patricio, Mateus; Dopazo, Hernán; Posada, David

    2011-01-01

    At a genomic scale, the patterns that have shaped molecular evolution are believed to be largely heterogeneous. Consequently, comparative analyses should use appropriate probabilistic substitution models that capture the main features under which different genomic regions have evolved. While efforts have concentrated on the development and understanding of model selection techniques, no descriptions of overall relative substitution model fit at the genome level have been reported. Here, we provide a characterization of best-fit substitution models across three genomic data sets including coding regions from mammals, vertebrates, and Drosophila (24,000 alignments). According to the Akaike Information Criterion (AIC), 82 of 88 models considered were selected as best-fit models on at least one occasion, although with very different frequencies. Most parameter estimates also varied broadly among genes. Patterns found for vertebrates and Drosophila were quite similar and often more complex than those found in mammals. Phylogenetic trees derived from models in the 95% confidence interval set showed much less variance and were significantly closer to the tree estimated under the best-fit model than trees derived from models outside this interval. Although alternative criteria selected simpler models than the AIC, they suggested similar patterns. Altogether, our results show that at a genomic scale, different gene alignments for the same set of taxa are best explained by a large variety of different substitution models and that model choice has implications for different parameter estimates including the inferred phylogenetic trees. After taking into account the differences related to sample size, our results suggest a noticeable diversity in the underlying evolutionary process. Altogether, we conclude that the use of model selection techniques is important to obtain consistent phylogenetic estimates from real data at a genomic scale.
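The model selection step described in this abstract relies on the Akaike Information Criterion, AIC = 2k - 2 ln L, where k is the number of free parameters and ln L the maximized log-likelihood; the model with the lowest AIC is the best fit. The sketch below is only an illustration of that arithmetic: the log-likelihood values are invented, the parameter counts cover substitution parameters only (branch lengths are ignored, an assumption made here for brevity), and the helper names are hypothetical rather than taken from the study.

```python
import math
from typing import Dict, Tuple


def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike Information Criterion: 2k - 2 ln L (lower is better)."""
    return 2.0 * n_params - 2.0 * log_likelihood


def akaike_weights(aic_scores: Dict[str, float]) -> Dict[str, float]:
    """Relative support for each model, normalized to sum to 1."""
    best = min(aic_scores.values())
    rel = {name: math.exp(-0.5 * (score - best))
           for name, score in aic_scores.items()}
    total = sum(rel.values())
    return {name: value / total for name, value in rel.items()}


# Hypothetical fits for one alignment: model -> (ln L, free substitution parameters).
fits: Dict[str, Tuple[float, int]] = {
    "JC69": (-5200.0, 0),
    "HKY85": (-5100.0, 4),
    "GTR": (-5098.0, 8),
}

scores = {name: aic(lnl, k) for name, (lnl, k) in fits.items()}
best_model = min(scores, key=scores.get)   # model with the lowest AIC
weights = akaike_weights(scores)
```

In this toy case the richest model has the highest likelihood but is not selected, because its gain of 2 log-likelihood units does not offset the penalty for 4 extra parameters. Ranking models by cumulative Akaike weight is one common way to build a confidence set of models like the 95% set mentioned in the abstract.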

  8. Genomic prediction in contrast to a genome-wide association study in explaining heritable variation of complex growth traits in breeding populations of Eucalyptus.

    PubMed

    Müller, Bárbara S F; Neves, Leandro G; de Almeida Filho, Janeo E; Resende, Márcio F R; Muñoz, Patricio R; Dos Santos, Paulo E T; Filho, Estefano Paludzyszyn; Kirst, Matias; Grattapaglia, Dario

    2017-07-11

    The advent of high-throughput genotyping technologies coupled to genomic prediction methods established a new paradigm to integrate genomics and breeding. We carried out whole-genome prediction and contrasted it to a genome-wide association study (GWAS) for growth traits in breeding populations of Eucalyptus benthamii (n = 505) and Eucalyptus pellita (n = 732). Both species are of increasing commercial interest for the development of germplasm adapted to environmental stresses. Predictive ability reached 0.16 in E. benthamii and 0.44 in E. pellita for diameter growth. Predictive abilities using either Genomic BLUP or different Bayesian methods were similar, suggesting that growth adequately fits the infinitesimal model. Genomic prediction models using ~5000-10,000 SNPs provided predictive abilities equivalent to using all 13,787 and 19,506 SNPs genotyped in the E. benthamii and E. pellita populations, respectively. No difference was detected in predictive ability when different sets of SNPs were utilized, based on position (equidistantly genome-wide, inside genes, linkage disequilibrium pruned or on single chromosomes), as long as the total number of SNPs used was above ~5000. Predictive abilities obtained by removing relatedness between training and validation sets fell near zero for E. benthamii and were halved for E. pellita. These results corroborate the current view that relatedness is the main driver of genomic prediction, although some short-range historical linkage disequilibrium (LD) was likely captured for E. pellita. A GWAS identified only one significant association for volume growth in E. pellita, illustrating the fact that while genome-wide regression is able to account for large proportions of the heritability, very little or none of it is captured into significant associations using GWAS in breeding populations of the size evaluated in this study. This study provides further experimental data supporting positive prospects of using genome-wide data to capture large proportions of trait heritability and predict growth traits in trees with accuracies equal to or better than those attainable by phenotypic selection. Additionally, our results document the superiority of the whole-genome regression approach in accounting for large proportions of the heritability of complex traits such as growth in contrast to the limited value of the local GWAS approach toward breeding applications in forest trees.

  9. Karyological evidence of hybridogenesis in Greenlings (Teleostei: Hexagrammidae).

    PubMed

    Suzuki, Shota; Arai, Katsutoshi; Munehara, Hiroyuki

    2017-01-01

    Two types of natural hybrids were discovered in populations of three Hexagrammos species (Teleostei: Hexagrammidae) distributed off the southern coast of Hokkaido in the North Pacific Ocean. Both hybrids reproduce by hybridogenesis, in which the maternal haploid genome is transmitted to offspring without recombination and the paternal haploid genome is eliminated during gametogenesis. While natural hybrids are unisexual and reproduce hemiclonally by backcrossing with the paternal species (BC-P), artificial F1-hybrids between the pure species produce recombinant gametes. Thus, despite having the same genome composition, the natural hybrids and the F1-hybrids are not genetically identical. Here, to clarify the differences between both hybrids, we examined the karyotypes of the three Hexagrammos species, their natural hybrids, the artificial F1-hybrids, and several backcrosses. Artificial F1-hybrids have karyotypes and chromosome numbers that are intermediate between those of the parental species. Conversely, the natural hybrids differed from F1-hybrids by having several large metacentric chromosomes and microchromosomes. Since the entire maternal haploid genome is inherited by the natural hybrids, maternal backcrosses (BC-M) between natural hybrids and males of the maternal species (H. octogrammus; Hoc) have a hemiclonal Hoc genome with large chromosomes from the mother and a normal Hoc genome from the father. However, the large chromosomes disappear in offspring of BC-M, probably due to fissuring during gametogenesis. Similarly, microsatellite DNA analysis revealed that chromosomes of BC-M undergo recombination. These findings suggest that genetic factors associated with hemiclonal reproduction may be located on the large metacentric chromosomes of natural hybrids.

  10. Lightweight genome viewer: portable software for browsing genomics data in its chromosomal context

    PubMed Central

    Faith, Jeremiah J; Olson, Andrew J; Gardner, Timothy S; Sachidanandam, Ravi

    2007-01-01

    Background: Lightweight genome viewer (lwgv) is a web-based tool for visualization of sequence annotations in their chromosomal context. It performs most of the functions of larger genome browsers, while relying on standard flat-file formats and bypassing the database needs of most visualization tools. Visualization as an aid to discovery requires display of novel data in conjunction with static annotations in their chromosomal context. With database-based systems, displaying dynamic results requires temporary tables that need to be tracked for removal. Results: lwgv simplifies the visualization of user-generated results on a local computer. The dynamic results of these analyses are written to transient files, which can import static content from a more permanent file. lwgv is currently used in many different applications, from whole genome browsers to single-gene RNAi design visualization, demonstrating its applicability in a large variety of contexts and scales. Conclusion: lwgv provides a lightweight alternative to large genome browsers for visualizing biological annotations and dynamic analyses in their chromosomal context. It is particularly suited for applications ranging from short sequences to medium-sized genomes when the creation and maintenance of a large software and database infrastructure is not necessary or desired. PMID:17877794

  11. Lightweight genome viewer: portable software for browsing genomics data in its chromosomal context.

    PubMed

    Faith, Jeremiah J; Olson, Andrew J; Gardner, Timothy S; Sachidanandam, Ravi

    2007-09-18

    Lightweight genome viewer (lwgv) is a web-based tool for visualization of sequence annotations in their chromosomal context. It performs most of the functions of larger genome browsers, while relying on standard flat-file formats and bypassing the database needs of most visualization tools. Visualization as an aid to discovery requires display of novel data in conjunction with static annotations in their chromosomal context. With database-based systems, displaying dynamic results requires temporary tables that need to be tracked for removal. lwgv simplifies the visualization of user-generated results on a local computer. The dynamic results of these analyses are written to transient files, which can import static content from a more permanent file. lwgv is currently used in many different applications, from whole genome browsers to single-gene RNAi design visualization, demonstrating its applicability in a large variety of contexts and scales. lwgv provides a lightweight alternative to large genome browsers for visualizing biological annotations and dynamic analyses in their chromosomal context. It is particularly suited for applications ranging from short sequences to medium-sized genomes when the creation and maintenance of a large software and database infrastructure is not necessary or desired.

  12. Fundamental differences in diversity and genomic population structure between Atlantic and Pacific Prochlorococcus.

    PubMed

    Kashtan, Nadav; Roggensack, Sara E; Berta-Thompson, Jessie W; Grinberg, Maor; Stepanauskas, Ramunas; Chisholm, Sallie W

    2017-09-01

    The Atlantic and Pacific Oceans represent different biogeochemical regimes in which the abundant marine cyanobacterium Prochlorococcus thrives. We have shown that Prochlorococcus populations in the Atlantic are composed of hundreds of genomically, and likely ecologically, distinct coexisting subpopulations with distinct genomic backbones. Here we ask if differences in the ecology and selection pressures between the Atlantic and Pacific are reflected in the diversity and genomic composition of their indigenous Prochlorococcus populations. We applied large-scale single-cell genomics and compared the cell-by-cell genomic composition of wild populations of co-occurring cells from samples from Station ALOHA off Hawaii, and from Bermuda Atlantic Time Series Station off Bermuda. We reveal fundamental differences in diversity and genomic structure of populations between the sites. The Pacific populations are more diverse than those in the Atlantic, composed of significantly more coexisting subpopulations and lacking dominant subpopulations. Prochlorococcus from the two sites seem to be composed of mostly non-overlapping distinct sets of subpopulations with different genomic backbones, likely reflecting different sets of ocean-specific micro-niches. Furthermore, phylogenetically closely related strains carry ocean-associated nutrient acquisition genes likely reflecting differences in major selection pressures between the oceans. This differential selection, along with geographic separation, clearly has a significant role in shaping these populations.

  13. Piggy: a rapid, large-scale pan-genome analysis tool for intergenic regions in bacteria.

    PubMed

    Thorpe, Harry A; Bayliss, Sion C; Sheppard, Samuel K; Feil, Edward J

    2018-04-01

    The concept of the "pan-genome," which refers to the total complement of genes within a given sample or species, is well established in bacterial genomics. Rapid and scalable pipelines are available for managing and interpreting pan-genomes from large batches of annotated assemblies. However, despite overwhelming evidence that variation in intergenic regions in bacteria can directly influence phenotypes, most current approaches for analyzing pan-genomes focus exclusively on protein-coding sequences. To address this we present Piggy, a novel pipeline that emulates Roary except that it is based only on intergenic regions. A key utility provided by Piggy is the detection of highly divergent ("switched") intergenic regions (IGRs) upstream of genes. We demonstrate the use of Piggy on large datasets of clinically important lineages of Staphylococcus aureus and Escherichia coli. For S. aureus, we show that highly divergent (switched) IGRs are associated with differences in gene expression and we establish a multilocus reference database of IGR alleles (igMLST; implemented in BIGSdb).

  14. Compositional patterns in the genomes of unicellular eukaryotes

    PubMed Central

    2013-01-01

    Background: The genomes of multicellular eukaryotes are compartmentalized in mosaics of isochores, large and fairly homogeneous stretches of DNA that belong to a small number of families characterized by different average GC levels, by different gene concentrations (that increase with GC), different chromatin structures, different replication timing in the cell cycle, and other different properties. A question raised by these basic results concerns how far back in evolution the compartmentalized organization of the eukaryotic genomes arose. Results: In the present work we approached this problem by studying the compositional organization of the genomes from the unicellular eukaryotes for which full sequences are available, the sample used being representative. The average GC levels of the genomes from unicellular eukaryotes cover an extremely wide range (19%-60% GC) and the compositional patterns of individual genomes are extremely different but all genomes tested show a compositional compartmentalization. Conclusions: The average GC range of the genomes of unicellular eukaryotes is very broad (as broad as that of prokaryotes) and individual compositional patterns cover a very broad range from very narrow to very complex. Both features are not surprising for organisms that are very far from each other both in terms of phylogenetic distances and of environmental life conditions. Most importantly, all genomes tested, a representative sample of all supergroups of unicellular eukaryotes, are compositionally compartmentalized, a major difference with prokaryotes. PMID:24188247

  15. Phylogenomic evidence for ancient hybridization in the genomes of living cats (Felidae)

    PubMed Central

    Li, Gang; Davis, Brian W.; Eizirik, Eduardo; Murphy, William J.

    2016-01-01

    Inter-species hybridization has been recently recognized as potentially common in wild animals, but the extent to which it shapes modern genomes is still poorly understood. Distinguishing historical hybridization events from other processes leading to phylogenetic discordance among different markers requires a well-resolved species tree that considers all modes of inheritance and overcomes systematic problems due to rapid lineage diversification by sampling large genomic character sets. Here, we assessed genome-wide phylogenetic variation across a diverse mammalian family, Felidae (cats). We combined genotypes from a genome-wide SNP array with additional autosomal, X- and Y-linked variants to sample ∼150 kb of nuclear sequence, in addition to complete mitochondrial genomes generated using light-coverage Illumina sequencing. We present the first robust felid time tree that accounts for unique maternal, paternal, and biparental evolutionary histories. Signatures of phylogenetic discordance were abundant in the genomes of modern cats, in many cases indicating hybridization as the most likely cause. Comparison of big cat whole-genome sequences revealed a substantial reduction of X-linked divergence times across several large recombination cold spots, which were highly enriched for signatures of selection-driven post-divergence hybridization between the ancestors of the snow leopard and lion lineages. These results highlight the mosaic origin of modern felid genomes and the influence of sex chromosomes and sex-biased dispersal in post-speciation gene flow. A complete resolution of the tree of life will require comprehensive genomic sampling of biparental and sex-limited genetic variation to identify and control for phylogenetic conflict caused by ancient admixture and sex-biased differences in genomic transmission. PMID:26518481

  16. An epidemiological perspective of personalized medicine: the Estonian experience

    PubMed Central

    Milani, L; Leitsalu, L; Metspalu, A

    2015-01-01

    Milani L, Leitsalu L, Metspalu A (University of Tartu). An epidemiological perspective of personalized medicine: the Estonian experience (Review). J Intern Med 2015; 277: 188–200. The Estonian Biobank and several other biobanks established over a decade ago are now starting to yield valuable longitudinal follow-up data for large numbers of individuals. These samples have been used in hundreds of different genome-wide association studies, resulting in the identification of reliable disease-associated variants. The focus of genomic research has started to shift from identifying genetic and nongenetic risk factors associated with common complex diseases to understanding the underlying mechanisms of the diseases and suggesting novel targets for therapy. However, translation of findings from genomic research into medical practice is still lagging, mainly due to insufficient evidence of clinical validity and utility. In this review, we examine the different elements required for the implementation of personalized medicine based on genomic information. First, biobanks and genome centres are required and have been established for the high-throughput genomic screening of large numbers of samples. Secondly, the combination of susceptibility alleles into polygenic risk scores has improved risk prediction of cardiovascular disease, breast cancer and several other diseases. Finally, national health information systems are being developed internationally, to combine data from electronic medical records from different sources, and also to gradually incorporate genomic information. We focus on the experience in Estonia, one of several countries with national goals towards more personalized health care based on genomic information, where the unique combination of elements required to accomplish this goal is already in place. PMID:25339628

  17. The Arab genome: Health and wealth.

    PubMed

    Zayed, Hatem

    2016-11-05

    The 22 Arab nations have a unique genetic structure, which reflects both conserved and diverse gene pools due to the prevalent endogamous and consanguineous marriage culture and the long history of admixture among different ethnic subcultures descended from the Asian, European, and African continents. Human genome sequencing has enabled large-scale genomic studies of different populations and has become a powerful tool for studying disease predictions and diagnosis. Despite the importance of the Arab genome for better understanding the dynamics of the human genome, discovering rare genetic variations, and studying early human migration out of Africa, it is poorly represented in human genome databases, such as HapMap and the 1000 Genomes Project. In this review, I demonstrate the significance of sequencing the Arab genome and setting an Arab genome reference(s) for better understanding the molecular pathogenesis of genetic diseases, discovering novel/rare variants, and identifying a meaningful genotype-phenotype correlation for complex diseases. Copyright © 2016. Published by Elsevier B.V.

  18. Comparative analysis of complete chloroplast genome sequence and inversion variation in Lasthenia burkei (Madieae, Asteraceae).

    PubMed

    Walker, Joseph F; Zanis, Michael J; Emery, Nancy C

    2014-04-01

    Complete chloroplast genome studies can help resolve relationships among large, complex plant lineages such as Asteraceae. We present the first whole plastome from the Madieae tribe and compare its sequence variation to other chloroplast genomes in Asteraceae. We used high throughput sequencing to obtain the Lasthenia burkei chloroplast genome. We compared sequence structure and rates of molecular evolution in the small single copy (SSC), large single copy (LSC), and inverted repeat (IR) regions to those for eight Asteraceae accessions and one Solanaceae accession. The chloroplast sequence of L. burkei is 150 746 bp and contains 81 unique protein coding genes and 4 coding ribosomal RNA sequences. We identified three major inversions in the L. burkei chloroplast, all of which have been found in other Asteraceae lineages, and a previously unreported inversion in Lactuca sativa. Regions flanking inversions contained tRNA sequences, but did not have particularly high G + C content. Substitution rates varied among the SSC, LSC, and IR regions, and rates of evolution within each region varied among species. Some observed differences in rates of molecular evolution may be explained by the relative proportion of coding to noncoding sequence within regions. Rates of molecular evolution vary substantially within and among chloroplast genomes, and major inversion events may be promoted by the presence of tRNAs. Collectively, these results provide insight into different mechanisms that may promote intramolecular recombination and the inversion of large genomic regions in the plastome.

  19. DNA Asymmetric Strand Bias Affects the Amino Acid Composition of Mitochondrial Proteins

    PubMed Central

    Min, Xiang Jia; Hickey, Donal A.

    2007-01-01

    Variations in GC content between genomes have been extensively documented. Genomes with comparable GC contents can, however, still differ in the apportionment of the G and C nucleotides between the two DNA strands. This asymmetric strand bias is known as GC skew. Here, we have investigated the impact of differences in nucleotide skew on the amino acid composition of the encoded proteins. We compared orthologous genes between animal mitochondrial genomes that show large differences in GC and AT skews. Specifically, we compared the mitochondrial genomes of mammals, which are characterized by a negative GC skew and a positive AT skew, to those of flatworms, which show the opposite skews for both GC and AT base pairs. We found that the mammalian proteins are highly enriched in amino acids encoded by CA-rich codons (as predicted by their negative GC and positive AT skews), whereas their flatworm orthologs were enriched in amino acids encoded by GT-rich codons (also as predicted from their skews). We found that these differences in mitochondrial strand asymmetry (measured as GC and AT skews) can have very large, predictable effects on the composition of the encoded proteins. PMID:17974594
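GC skew and AT skew, as used in this abstract, are conventionally computed on one strand as (G - C)/(G + C) and (A - T)/(A + T). The sketch below shows that calculation for a single sequence. The function names and the toy sequence are hypothetical and not taken from the paper, which does not state its exact procedure in the abstract.

```python
from collections import Counter


def skew(seq: str, first: str, second: str) -> float:
    """Return (first - second) / (first + second) for two base counts on one strand."""
    counts = Counter(seq.upper())
    total = counts[first] + counts[second]
    if total == 0:          # neither base present: define the skew as 0
        return 0.0
    return (counts[first] - counts[second]) / total


def gc_skew(seq: str) -> float:
    """GC skew: negative when C outnumbers G on the given strand."""
    return skew(seq, "G", "C")


def at_skew(seq: str) -> float:
    """AT skew: positive when A outnumbers T on the given strand."""
    return skew(seq, "A", "T")


# Hypothetical CA-rich fragment, mimicking the mammalian pattern described above.
toy_sequence = "ACCACCATTACC"
toy_gc = gc_skew(toy_sequence)   # negative: C is in excess over G
toy_at = at_skew(toy_sequence)   # positive: A is in excess over T
```

A strand with negative GC skew and positive AT skew is rich in C and A, which is why the abstract predicts an excess of amino acids encoded by CA-rich codons in mammalian mitochondrial proteins.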

  20. How to kill the honey bee larva: genomic potential and virulence mechanisms of Paenibacillus larvae.

    PubMed

    Djukic, Marvin; Brzuszkiewicz, Elzbieta; Fünfhaus, Anne; Voss, Jörn; Gollnow, Kathleen; Poppinga, Lena; Liesegang, Heiko; Garcia-Gonzalez, Eva; Genersch, Elke; Daniel, Rolf

    2014-01-01

    Paenibacillus larvae, a Gram-positive bacterial pathogen, causes American Foulbrood (AFB), which is the most serious infectious disease of honey bees. In order to investigate the genomic potential of P. larvae, two strains belonging to two different genotypes were sequenced and used for comparative genome analysis. The complete genome sequence of P. larvae strain DSM 25430 (genotype ERIC II) consisted of 4,056,006 bp and harbored 3,928 predicted protein-encoding genes. The draft genome sequence of P. larvae strain DSM 25719 (genotype ERIC I) comprised 4,579,589 bp and contained 4,868 protein-encoding genes. Both strains harbored a 9.7 kb plasmid and encoded a large number of virulence-associated proteins such as toxins and collagenases. In addition, genes encoding large multimodular enzymes producing nonribosomal peptides or polyketides were identified. In the genome of strain DSM 25719 seven toxin associated loci were identified and analyzed. Five of them encoded putatively functional toxins. The genome of strain DSM 25430 harbored several toxin loci that showed similarity to corresponding loci in the genome of strain DSM 25719, but were non-functional due to point mutations or disruption by transposases. Although both strains cause AFB, significant differences between the genomes were observed including genome size, number and composition of transposases, insertion elements, predicted phage regions, and strain-specific island-like regions. Transposases, integrases and recombinases are important drivers for genome plasticity. A total of 390 and 273 mobile elements were found in strain DSM 25430 and strain DSM 25719, respectively. Comparative genomics of both strains revealed acquisition of virulence factors by horizontal gene transfer and provided insights into evolution and pathogenicity.

  1. pico-PLAZA, a genome database of microbial photosynthetic eukaryotes.

    PubMed

    Vandepoele, Klaas; Van Bel, Michiel; Richard, Guilhem; Van Landeghem, Sofie; Verhelst, Bram; Moreau, Hervé; Van de Peer, Yves; Grimsley, Nigel; Piganeau, Gwenael

    2013-08-01

    With the advent of next generation genome sequencing, the number of sequenced algal genomes and transcriptomes is rapidly growing. Although a few genome portals exist to browse individual genome sequences, exploring complete genome information from multiple species for the analysis of user-defined sequences or gene lists remains a major challenge. pico-PLAZA is a web-based resource (http://bioinformatics.psb.ugent.be/pico-plaza/) for algal genomics that combines different data types with intuitive tools to explore genomic diversity, perform integrative evolutionary sequence analysis and study gene functions. Apart from homologous gene families, multiple sequence alignments, phylogenetic trees, Gene Ontology, InterPro and text-mining functional annotations, different interactive viewers are available to study genome organization using gene collinearity and synteny information. Different search functions, documentation pages, export functions and an extensive glossary are available to guide non-expert scientists. To illustrate the versatility of the platform, different case studies are presented demonstrating how pico-PLAZA can be used to functionally characterize large-scale EST/RNA-Seq data sets and to perform environmental genomics. Functional enrichment analysis of 16 Phaeodactylum tricornutum transcriptome libraries offers a molecular view on diatom adaptation to different environments of ecological relevance. Furthermore, we show how complementary genomic data sources can easily be combined to identify marker genes to study the diversity and distribution of algal species, for example in metagenomes, or to quantify intraspecific diversity from environmental strains. © 2013 John Wiley & Sons Ltd and Society for Applied Microbiology.

  2. Assembly of the Genome of the Disease Vector Aedes aegypti onto a Genetic Linkage Map Allows Mapping of Genes Affecting Disease Transmission

    PubMed Central

    Juneja, Punita; Osei-Poku, Jewelna; Ho, Yung S.; Ariani, Cristina V.; Palmer, William J.; Pain, Arnab; Jiggins, Francis M.

    2014-01-01

    The mosquito Aedes aegypti transmits some of the most important human arboviruses, including dengue, yellow fever and chikungunya viruses. It has a large genome containing many repetitive sequences, which has resulted in the genome being poorly assembled — there are 4,758 scaffolds, few of which have been assigned to a chromosome. To allow the mapping of genes affecting disease transmission, we have improved the genome assembly by scoring a large number of SNPs in recombinant progeny from a cross between two strains of Ae. aegypti, and used these to generate a genetic map. This revealed a high rate of misassemblies in the current genome, where, for example, sequences from different chromosomes were found on the same scaffold. Once these were corrected, we were able to assign 60% of the genome sequence to chromosomes and approximately order the scaffolds along the chromosome. We found that there are very large regions of suppressed recombination around the centromeres, which can extend to as much as 47% of the chromosome. To illustrate the utility of this new genome assembly, we mapped a gene that makes Ae. aegypti resistant to the human parasite Brugia malayi, and generated a list of candidate genes that could be affecting the trait. PMID:24498447

  3. Pediatric Genomic Data Inventory (PGDI) Overview

    Cancer.gov

    Pediatric cancer is a genetic disease that can differ substantially from similar malignancies in an adult population. To fuel new discoveries and treatments specific to pediatric oncologies, the NCI Office of Cancer Genomics has developed a dynamic resource known as the Pediatric Genomic Data Inventory to allow investigators to more easily locate genomic datasets. This resource lists known ongoing and completed sequencing projects of pediatric cancer cohorts from the United States and other countries, along with some basic details and reference metadata.

  4. Differentially Methylated Region-Representational Difference Analysis (DMR-RDA): A Powerful Method to Identify DMRs in Uncharacterized Genomes.

    PubMed

    Sasheva, Pavlina; Grossniklaus, Ueli

    2017-01-01

    Over recent years, it has become increasingly clear that environmental influences can affect the epigenomic landscape and that some epigenetic variants can have heritable, phenotypic effects. While there are a variety of methods to perform genome-wide analyses of DNA methylation in model organisms, this is still a challenging task for non-model organisms without a reference genome. Differentially methylated region-representational difference analysis (DMR-RDA) is a sensitive and powerful PCR-based technique that isolates DNA fragments that are differentially methylated between two otherwise identical genomes. The technique does not require special equipment and is independent of prior knowledge about the genome. It is even applicable to genomes that have high complexity and a large size, making it the method of choice for the analysis of plant non-model systems.

  5. Proteogenomic Investigation of Strain Variation in Clinical Mycobacterium tuberculosis Isolates.

    PubMed

    Heunis, Tiaan; Dippenaar, Anzaan; Warren, Robin M; van Helden, Paul D; van der Merwe, Ruben G; Gey van Pittius, Nicolaas C; Pain, Arnab; Sampson, Samantha L; Tabb, David L

    2017-10-06

    Mycobacterium tuberculosis consists of a large number of different strains that display unique virulence characteristics. Whole-genome sequencing has revealed substantial genetic diversity among clinical M. tuberculosis isolates, and elucidating the phenotypic variation encoded by this genetic diversity will be of the utmost importance to fully understand M. tuberculosis biology and pathogenicity. In this study, we integrated whole-genome sequencing and mass spectrometry (GeLC-MS/MS) to reveal strain-specific characteristics in the proteomes of two clinical M. tuberculosis Latin American-Mediterranean isolates. Using this approach, we identified 59 peptides containing single amino acid variants, which covered ∼9% of all coding nonsynonymous single nucleotide variants detected by whole-genome sequencing. Furthermore, we identified 29 distinct peptides that mapped to a hypothetical protein not present in the M. tuberculosis H37Rv reference proteome. Here, we provide evidence for the expression of this protein in the clinical M. tuberculosis SAWC3651 isolate. The strain-specific databases enabled confirmation of genomic differences (i.e., large genomic regions of difference and nonsynonymous single nucleotide variants) in these two clinical M. tuberculosis isolates and allowed strain differentiation at the proteome level. Our results contribute to the growing field of clinical microbial proteogenomics and can improve our understanding of phenotypic variation in clinical M. tuberculosis isolates.

  6. Genome size variation in deep-sea amphipods

    PubMed Central

    Jamieson, A. J.; Piertney, S. B.

    2017-01-01

    Genome size varies considerably across taxa, and extensive research effort has gone into understanding whether variation can be explained by differences in key ecological and life-history traits among species. The extreme environmental conditions that characterize the deep sea have been hypothesized to promote large genome sizes in eukaryotes. Here we test this supposition by examining genome sizes among 13 species of deep-sea amphipods from the Mariana, Kermadec and New Hebrides trenches. Genome sizes were estimated using flow cytometry and found to vary nine-fold, ranging from 4.06 pg (4.04 Gb) in Paralicella caperesca to 34.79 pg (34.02 Gb) in Alicella gigantea. Phylogenetic independent contrast analysis identified a relationship between genome size and maximum body size, though this was largely driven by those species that display size gigantism. There was a distinct shift in the genome size trait diversification rate in the supergiant amphipod A. gigantea relative to the rest of the group. The variation in genome size observed is striking and argues against genome size being driven by a common evolutionary history, ecological niche and life-history strategy in deep-sea amphipods. PMID:28989783
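
    The abstract above reports genome sizes both as DNA mass (pg) and as sequence length (Gb). As a minimal sketch of that unit conversion, assuming the commonly used factor of 1 pg ≈ 0.978 Gb (the abstract does not state which factor the authors applied), the largest estimate and the fold range can be checked as follows. The function name `pg_to_gb` is illustrative only.

```python
# Assumed conversion factor: 1 pg of DNA corresponds to about 0.978 Gb (978 Mb).
# The abstract does not state the factor used, so this is an assumption.
PG_TO_GB = 0.978


def pg_to_gb(picograms: float) -> float:
    """Convert a genome size from picograms of DNA to gigabases."""
    return picograms * PG_TO_GB


# Largest genome reported in the abstract (Alicella gigantea).
largest_gb = pg_to_gb(34.79)
print(round(largest_gb, 2))

# Fold difference between the largest and smallest pg estimates in the abstract.
fold_range = 34.79 / 4.06
print(round(fold_range, 1))
```

    Under this assumed factor, 34.79 pg corresponds to about 34.02 Gb, and the ratio of the two pg values is roughly 8.6, in line with the "nine-fold" range described.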

  7. An ancient genome duplication contributed to the abundance of metabolic genes in the moss Physcomitrella patens

    PubMed Central

    Rensing, Stefan A; Ick, Julia; Fawcett, Jeffrey A; Lang, Daniel; Zimmer, Andreas; Van de Peer, Yves; Reski, Ralf

    2007-01-01

    Background: Analyses of complete genomes and large collections of gene transcripts have shown that most, if not all seed plants have undergone one or more genome duplications in their evolutionary past. Results: In this study, based on a large collection of EST sequences, we provide evidence that the haploid moss Physcomitrella patens is a paleopolyploid as well. Based on the construction of linearized phylogenetic trees we infer the genome duplication to have occurred between 30 and 60 million years ago. Gene Ontology and pathway association of the duplicated genes in P. patens reveal different biases of gene retention compared with seed plants. Conclusion: Metabolic genes seem to have been retained in excess following the genome duplication in P. patens. This might, at least partly, explain the versatility of metabolism, as described for P. patens and other mosses, in comparison to other land plants. PMID:17683536

  8. Evaluation of different sources of DNA for use in genome wide studies and forensic application.

    PubMed

    Al Safar, Habiba S; Abidi, Fatima H; Khazanehdari, Kamal A; Dadour, Ian R; Tay, Guan K

    2011-02-01

    In the field of epidemiology, Genome-Wide Association Studies (GWAS) are commonly used to identify genetic predispositions of many human diseases. Large repositories housing biological specimens for clinical and genetic investigations have been established to store material and data for these studies. The logistics of specimen collection and sample storage can be onerous, and new strategies have to be explored. This study examines three different DNA sources (namely, degraded genomic DNA, amplified degraded genomic DNA and amplified extracted DNA from FTA card) for GWAS using the Illumina platform. No significant difference in call rate was detected between amplified degraded genomic DNA extracted from whole blood and amplified DNA retrieved from FTA™ cards. However, using unamplified-degraded genomic DNA reduced the call rate to a mean of 42.6% compared to amplified DNA extracted from FTA card (mean of 96.6%). This study establishes the utility of FTA™ cards as a viable storage matrix for cells from which DNA can be extracted to perform GWAS analysis.

  9. Survey of microsatellite DNA in pine

    Treesearch

    Craig S. Echt; P. May-Marquardt

    1997-01-01

    A large insert genomic library from eastern white pine (Pinus strobus) was probed for the microsatellite motifs (AC)n and (AG)n, all 10 trinucleotide motifs, and 22 of the 33 possible tetranucleotide motifs. For comparison with a species from a different subgenus, a loblolly pine (Pinus taeda) genomic...

  10. Draft Genome Sequences of 510 Listeria monocytogenes Strains from Food Isolates and Human Listeriosis Cases from Northern Italy.

    PubMed

    Lomonaco, Sara; Gallina, Silvia; Filipello, Virginia; Sanchez Leon, Maria; Kastanis, George John; Allard, Marc; Brown, Eric; Amato, Ettore; Pontello, Mirella; Decastelli, Lucia

    2018-01-18

    Listeriosis outbreaks are frequently multistate/multicountry outbreaks, underlining the importance of molecular typing data for several diverse and well-characterized isolates. Large-scale whole-genome sequencing studies on Listeria monocytogenes isolates from non-U.S. locations have been limited. Herein, we describe the draft genome sequences of 510 L. monocytogenes isolates from northern Italy from different sources.

  11. Performance of genotype imputation for low frequency and rare variants from the 1000 genomes.

    PubMed

    Zheng, Hou-Feng; Rong, Jing-Jing; Liu, Ming; Han, Fang; Zhang, Xing-Wei; Richards, J Brent; Wang, Li

    2015-01-01

    Genotype imputation is now routinely applied in genome-wide association studies (GWAS) and meta-analyses. However, most imputations have been run using HapMap samples as the reference, and imputation of low frequency and rare variants (minor allele frequency (MAF) < 5%) has not been systematically assessed. With the emergence of next-generation sequencing, large reference panels (such as the 1000 Genomes panel) are available to facilitate imputation of these variants. Therefore, in order to estimate the performance of low frequency and rare variants imputation, we imputed 153 individuals, each of whom had 3 different genotype array data including 317k, 610k and 1 million SNPs, to three different reference panels: the 1000 Genomes pilot March 2010 release (1KGpilot), the 1000 Genomes interim August 2010 release (1KGinterim), and the 1000 Genomes phase1 November 2010 and May 2011 release (1KGphase1) by using IMPUTE version 2. The differences between these three releases of the 1000 Genomes data are the sample size, ancestry diversity, number of variants and their frequency spectrum. We found that both reference panel and GWAS chip density affect the imputation of low frequency and rare variants. 1KGphase1 outperformed the other 2 panels, with a higher concordance rate, a higher proportion of well-imputed variants (info>0.4) and a higher mean info score in each MAF bin. Similarly, the 1M chip array outperformed 610K and 317K. However, for very rare variants (MAF ≤ 0.3%), only 0-1% of the variants were well imputed. We conclude that the imputation of low frequency and rare variants improves with larger reference panels and higher density of genome-wide genotyping arrays. Yet, despite a large reference panel size and dense genotyping density, very rare variants remain difficult to impute.

  12. FISH Oracle 2: a web server for integrative visualization of genomic data in cancer research.

    PubMed

    Mader, Malte; Simon, Ronald; Kurtz, Stefan

    2014-03-31

    A comprehensive view on all relevant genomic data is instrumental for understanding the complex patterns of molecular alterations typically found in cancer cells. One of the most effective ways to rapidly obtain an overview of genomic alterations in large amounts of genomic data is the integrative visualization of genomic events. We developed FISH Oracle 2, a web server for the interactive visualization of different kinds of downstream processed genomics data typically available in cancer research. A powerful search interface and a fast visualization engine provide a highly interactive visualization for such data. High quality image export enables life scientists to easily communicate their results. A comprehensive data administration makes it possible to keep track of the available data sets. We applied FISH Oracle 2 to published data and found evidence that, in colorectal cancer cells, the gene TTC28 may be inactivated in two different ways, a fact that has not been published before. The interactive nature of FISH Oracle 2 and the possibility to store, select and visualize large amounts of downstream processed data support life scientists in generating hypotheses. The export of high quality images supports explanatory data visualization, simplifying the communication of new biological findings. A FISH Oracle 2 demo server and the software are available at http://www.zbh.uni-hamburg.de/fishoracle.

  13. DNA Extraction Protocols for Whole-Genome Sequencing in Marine Organisms.

    PubMed

    Panova, Marina; Aronsson, Henrik; Cameron, R Andrew; Dahl, Peter; Godhe, Anna; Lind, Ulrika; Ortega-Martinez, Olga; Pereyra, Ricardo; Tesson, Sylvie V M; Wrange, Anna-Lisa; Blomberg, Anders; Johannesson, Kerstin

    2016-01-01

    The marine environment harbors a large proportion of the total biodiversity on this planet, including the majority of the Earth's different phyla and classes. Studying the genomes of marine organisms can bring interesting insights into genome evolution. Today, almost all marine organismal groups are understudied with respect to their genomes. One potential reason is that extraction of high-quality DNA in sufficient amounts is challenging for many marine species. This is due to high polysaccharide content, polyphenols and other secondary metabolites that will inhibit downstream DNA library preparations. Consequently, protocols developed for vertebrates and plants do not always perform well for invertebrates and algae. In addition, many marine species have large population sizes and, as a consequence, highly variable genomes. Thus, to facilitate the sequence read assembly process during genome sequencing, it is desirable to obtain enough DNA from a single individual, which is a challenge in many species of invertebrates and algae. Here, we present DNA extraction protocols for seven marine species (four invertebrates, two algae, and a marine yeast), optimized to provide sufficient DNA quality and yield for de novo genome sequencing projects.

  14. Genome evolution in Reptilia, the sister group of mammals.

    PubMed

    Janes, Daniel E; Organ, Christopher L; Fujita, Matthew K; Shedlock, Andrew M; Edwards, Scott V

    2010-01-01

    The genomes of birds and nonavian reptiles (Reptilia) are critical for understanding genome evolution in mammals and amniotes generally. Despite decades of study at the chromosomal and single-gene levels, and the evidence for great diversity in genome size, karyotype, and sex chromosome diversity, reptile genomes are virtually unknown in the comparative genomics era. The recent sequencing of the chicken and zebra finch genomes, in conjunction with genome scans and the online publication of the Anolis lizard genome, has begun to clarify the events leading from an ancestral amniote genome--predicted to be large and to possess a diverse repeat landscape on par with mammals and a birdlike sex chromosome system--to the small and highly streamlined genomes of birds. Reptilia exhibit a wide range of evolutionary rates of different subgenomes and, from isochores to mitochondrial DNA, provide a critical contrast to the genomic paradigms established in mammals.

  15. Rapid construction of genome map for large yellow croaker (Larimichthys crocea) by the whole-genome mapping in BioNano Genomics Irys system.

    PubMed

    Xiao, Shijun; Li, Jiongtang; Ma, Fengshou; Fang, Lujing; Xu, Shuangbin; Chen, Wei; Wang, Zhi Yong

    2015-09-03

    Large yellow croaker (Larimichthys crocea) is an important commercial fish in China and East Asia. The annual production of the species from the aqua-farming industry is about 90 thousand tons. In spite of its economic importance, genetic studies of economic traits and genomic selections of the species are hindered by the lack of genomic resources. Specifically, a whole-genome physical map of large yellow croaker is still missing. The traditional BAC-based fingerprint method is extremely time- and labour-consuming. Here we report the first genome map construction using the high-throughput whole-genome mapping technique by nanochannel arrays in the BioNano Genomics Irys system. For an optimal marker density of ~10 per 100 kb, the nicking endonuclease Nt.BspQ1 was chosen for the genome map generation. 645,305 DNA molecules with a total length of ~112 Gb were labelled and detected, covering more than 160X of the large yellow croaker genome. Employing the IrysView package and signature patterns in raw DNA molecules, a whole-genome map of large yellow croaker was assembled into 686 maps with a total length of 727 Mb, which was consistent with the estimated genome size. The N50 length of the whole-genome map, including 126 maps, was up to 1.7 Mb. The excellent hybrid alignment with the large yellow croaker draft genome validated the consensus genome map assembly and highlighted a promising application of whole-genome mapping on draft genome sequence super-scaffolding. The genome map data of large yellow croaker are accessible on lycgenomics.jmu.edu.cn/pm. Using the state-of-the-art whole-genome mapping technique in the Irys system, the first whole-genome map for large yellow croaker has been constructed, which greatly facilitates the ongoing genomic and evolutionary studies for the species. To our knowledge, this is the first public report on genome map construction by whole-genome mapping for aquatic organisms. Our study demonstrates a promising application of whole-genome mapping to genome map construction for other non-model organisms in a fast and reliable manner.
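
    The abstract above summarizes the assembly with an N50 length and the number of maps that make it up. As a minimal sketch of how such a statistic is usually computed (this is the standard definition, not necessarily the exact procedure used by the authors or by IrysView), the function below returns the N50 length together with the count of sequences needed to reach half of the total length. The function name `n50` and the toy lengths are hypothetical.

```python
def n50(lengths):
    """Return (N50 length, number of sequences needed to reach half the total).

    Sequences are taken from longest to shortest; the N50 is the length of the
    sequence at which the running sum first reaches 50% of the total length.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half_total:
            return length, count
    raise ValueError("lengths must be non-empty")


# Toy example with made-up lengths (not data from the study).
toy_lengths = [5, 4, 3, 2, 1]
print(n50(toy_lengths))
```

    For the toy input the total is 15, so the two longest sequences (5 + 4 = 9) already cover half of it, giving an N50 of 4 reached with 2 sequences. Read the same way, the reported figures would mean that the 126 longest maps account for at least half of the 727 Mb total.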

  16. Reference-guided assembly of four diverse Arabidopsis thaliana genomes

    PubMed Central

    Schneeberger, Korbinian; Ossowski, Stephan; Ott, Felix; Klein, Juliane D.; Wang, Xi; Lanz, Christa; Smith, Lisa M.; Cao, Jun; Fitz, Joffrey; Warthmann, Norman; Henz, Stefan R.; Huson, Daniel H.; Weigel, Detlef

    2011-01-01

    We present whole-genome assemblies of four divergent Arabidopsis thaliana strains that complement the 125-Mb reference genome sequence released a decade ago. Using a newly developed reference-guided approach, we assembled large contigs from 9 to 42 Gb of Illumina short-read data from the Landsberg erecta (Ler-1), C24, Bur-0, and Kro-0 strains, which have been sequenced as part of the 1,001 Genomes Project for this species. Using alignments against the reference sequence, we first reduced the complexity of the de novo assembly and later integrated reads without similarity to the reference sequence. As an example, half of the noncentromeric C24 genome was covered by scaffolds that are longer than 260 kb, with a maximum of 2.2 Mb. Moreover, over 96% of the reference genome was covered by the reference-guided assembly, compared with only 87% with a complete de novo assembly. Comparisons with 2 Mb of dideoxy sequence reveal that the per-base error rate of the reference-guided assemblies was below 1 in 10,000. Our assemblies provide a detailed, genomewide picture of large-scale differences between A. thaliana individuals, most of which are difficult to access with alignment-consensus methods only. We demonstrate their practical relevance in studying the expression differences of polymorphic genes and show how the analysis of sRNA sequencing data can lead to erroneous conclusions if aligned against the reference genome alone. Genome assemblies, raw reads, and further information are accessible through http://1001genomes.org/projects/assemblies.html. PMID:21646520

  17. Structural divergence among genomes of closely related baculoviruses and its implications for baculovirus evolution

    USDA-ARS's Scientific Manuscript database

    Baculoviruses are members of a large, well-characterized family of dsDNA viruses that have been identified from insects of the orders Lepidoptera, Hymenoptera, and Diptera. Baculovirus genomes from different virus species generally exhibit a considerable degree of structural diversity. However, so...

  18. Segregation distortion causes large-scale differences between male and female genomes in hybrid ants.

    PubMed

    Kulmuni, Jonna; Seifert, Bernhard; Pamilo, Pekka

    2010-04-20

    Hybridization in isolated populations can lead either to hybrid breakdown and extinction or in some cases to speciation. The basis of hybrid breakdown lies in genetic incompatibilities between diverged genomes. In social Hymenoptera, the consequences of hybridization can differ from those in other animals because of haplodiploidy and sociality. Selection pressures differ between sexes because males are haploid and females are diploid. Furthermore, sociality and group living may allow survival of hybrid genotypes. We show that hybridization in Formica ants has resulted in a stable situation in which the males form two highly divergent gene pools whereas all the females are hybrids. This causes an exceptional situation with large-scale differences between male and female genomes. The genotype differences indicate strong transmission ratio distortion depending on offspring sex, whereby the mother transmits some alleles exclusively to her daughters and other alleles exclusively to her sons. The genetic differences between the sexes and the apparent lack of multilocus hybrid genotypes in males can be explained by recessive incompatibilities which cause the elimination of hybrid males because of their haploid genome. Alternatively, differentiation between sexes could be created by prezygotic segregation into male-forming and female-forming gametes in diploid females. Differentiation between sexes is stable and maintained throughout generations. The present study shows a unique outcome of hybridization and demonstrates that hybridization has the potential of generating evolutionary novelties in animals.

  19. Large differences in the genome organization of different plant Trypanosomatid parasites (Phytomonas spp.) reveal wide evolutionary divergences between taxa.

    PubMed

    Marín, C; Dollet, M; Pagès, M; Bastien, P

    2009-03-01

    All currently known plant trypanosomes have been grouped in the genus Phytomonas spp., although they can differ greatly in terms of both their biological properties and effects upon the host. Those parasitizing the phloem sap are specifically associated with lethal syndromes in Latin America, such as phloem necrosis of coffee, 'Hartrot' of coconut and 'Marchitez sorpresiva' of oil palm, that inflict considerable economic losses in endemic countries. The genomic organization of one group of Phytomonas (D) considered as representative of the genus has been published previously. The present work presents the genomic structure of two representative isolates from the pathogenic phloem-restricted group (H) of Phytomonas, analyzed by pulsed field gel electrophoresis followed by hybridization with chromosome-specific DNA markers. It came as a surprise to observe an extremely different genomic organization in this group as compared with that of group D. Most notably, the chromosome number is 7 in this group (with a genome size of 10 Mb) versus 21 in group D (totalling 25 Mb). These data reveal an unsuspected genomic diversity within plant trypanosomatids that may justify a further debate about their division into different genera.

  20. 'Mind genomics': the experimental, inductive science of the ordinary, and its application to aspects of food and feeding.

    PubMed

    Moskowitz, Howard R

    2012-11-05

    The paper introduces the empirical science of 'mind genomics', whose objective is to understand the dimensions of ordinary, everyday experience, identify mind-set segments of people who value different aspects of that everyday experience, and then assign a new person to a mind-set by a statistically appropriate procedure. By studying different experiences using experimental design of ideas, 'mind genomics' constructs an empirical, inductive science of perception and experience, layer by layer. The ultimate objective of 'mind genomics' is a large-scale science of experience created using induction, with the science based upon emergent commonalities across many different types of daily experience. The particular topic investigated in the paper is the experience of healthful snacks, what makes a person 'want' them, and the dollar value of different sensory aspects of the healthful snack. Copyright © 2012 Elsevier Inc. All rights reserved.

  1. Independent evolution of genomic characters during major metazoan transitions.

    PubMed

    Simakov, Oleg; Kawashima, Takeshi

    2017-07-15

    Metazoan evolution encompasses a vast evolutionary time scale spanning over 600 million years. Our ability to infer ancestral metazoan characters, both morphological and functional, is limited by our understanding of the nature and evolutionary dynamics of the underlying regulatory networks. Increasing coverage of metazoan genomes enables us to identify the evolutionary changes of the relevant genomic characters such as the loss or gain of coding sequences, gene duplications, micro- and macro-synteny, and non-coding element evolution in different lineages. In this review we describe recent advances in our understanding of ancestral metazoan coding and non-coding features, as deduced from genomic comparisons. Some genomic changes such as innovations in gene and linkage content occur at different rates across metazoan clades, suggesting some level of independence among genomic characters. While their contribution to biological innovation remains largely unclear, we review recent literature about certain genomic changes that do correlate with changes to specific developmental pathways and metazoan innovations. In particular, we discuss the origins of the recently described pharyngeal cluster which is conserved across deuterostome genomes, and highlight different genomic features that have contributed to the evolution of this group. We also assess our current capacity to infer ancestral metazoan states from gene models and comparative genomics tools and elaborate on the future directions of metazoan comparative genomics relevant to evo-devo studies. Copyright © 2016 The Authors. Published by Elsevier Inc. All rights reserved.

  2. Complete Genome Sequence and Comparative Genomics of a Novel Myxobacterium Myxococcus hansupus

    PubMed Central

    Sharma, Gaurav; Narwani, Tarun; Subramanian, Srikrishna

    2016-01-01

    Myxobacteria, a group of Gram-negative aerobes, belong to the class δ-proteobacteria and order Myxococcales. Unlike anaerobic δ-proteobacteria, they exhibit several unusual physiogenomic properties like gliding motility, desiccation-resistant myxospores and large genomes with high coding density. Here we report a 9.5 Mbp complete genome of Myxococcus hansupus that encodes 7,753 proteins. Phylogenomic and genome-genome distance based analysis suggest that Myxococcus hansupus is a novel member of the genus Myxococcus. Comparative genome analysis with other members of the genus Myxococcus was performed to explore their genome diversity. The variation in number of unique proteins observed across different species is suggestive of diversity at the genus level while the overrepresentation of several Pfam families indicates the extent and mode of genome expansion as compared to non-Myxococcales δ-proteobacteria. PMID:26900859

  3. The number of genes encoding repeat domain-containing proteins positively correlates with genome size in amoebal giant viruses

    PubMed Central

    Shukla, Avi; Chatterjee, Anirvan

    2018-01-01

    Curiously, in viruses, the virion volume appears to be predominantly driven by genome length rather than the number of proteins it encodes or geometric constraints. With their large genome and giant particle size, amoebal viruses (AVs) are ideally suited to study the relationship between genome and virion size and explore the role of genome plasticity in their evolutionary success. Different genomic regions of AVs exhibit distinct genealogies. Although the vertically transferred core genes and their functions are universally conserved across the nucleocytoplasmic large DNA virus (NCLDV) families and are essential for their replication, the horizontally acquired genes are variable across families and are lineage-specific. When compared with other giant virus families, we observed a near-linear increase in the number of genes encoding repeat domain-containing proteins (RDCPs) with the increase in the genome size of AVs. From what is known about the functions of RDCPs in bacteria and eukaryotes and their prevalence in the AV genomes, we envisage important roles for RDCPs in the life cycle of AVs, their genome expansion, and plasticity. This observation also supports the evolution of AVs from a smaller viral ancestor by the acquisition of diverse gene families from the environment including RDCPs that might have helped in host adaptation. PMID:29308275

  4. Insights into the Evolution of Mitochondrial Genome Size from Complete Sequences of Citrullus lanatus and Cucurbita pepo (Cucurbitaceae)

    PubMed Central

    Alverson, Andrew J.; Wei, XiaoXin; Rice, Danny W.; Stern, David B.; Barry, Kerrie; Palmer, Jeffrey D.

    2010-01-01

    The mitochondrial genomes of seed plants are unusually large and vary in size by at least an order of magnitude. Much of this variation occurs within a single family, the Cucurbitaceae, whose genomes range from an estimated 390 to 2,900 kb in size. We sequenced the mitochondrial genomes of Citrullus lanatus (watermelon: 379,236 nt) and Cucurbita pepo (zucchini: 982,833 nt)—the two smallest characterized cucurbit mitochondrial genomes—and determined their RNA editing content. The relatively compact Citrullus mitochondrial genome actually contains more and longer genes and introns, longer segmental duplications, and more discernibly nuclear-derived DNA. The large size of the Cucurbita mitochondrial genome reflects the accumulation of unprecedented amounts of both chloroplast sequences (>113 kb) and short repeated sequences (>370 kb). A low mutation rate has been hypothesized to underlie increases in both genome size and RNA editing frequency in plant mitochondria. However, despite its much larger genome, Cucurbita has a significantly higher synonymous substitution rate (and presumably mutation rate) than Citrullus but comparable levels of RNA editing. The evolution of mutation rate, genome size, and RNA editing are apparently decoupled in Cucurbitaceae, reflecting either simple stochastic variation or governance by different factors. PMID:20118192

  5. Rainbow: a tool for large-scale whole-genome sequencing data analysis using cloud computing.

    PubMed

    Zhao, Shanrong; Prenger, Kurt; Smith, Lance; Messina, Thomas; Fan, Hongtao; Jaeger, Edward; Stephens, Susan

    2013-06-27

    Technical improvements have decreased sequencing costs and, as a result, the size and number of genomic datasets have increased rapidly. Because of the lower cost, large amounts of sequence data are now being produced by small to midsize research groups. Crossbow is a software tool that can detect single nucleotide polymorphisms (SNPs) in whole-genome sequencing (WGS) data from a single subject; however, Crossbow has a number of limitations when applied to multiple subjects from large-scale WGS projects. The data storage and CPU resources that are required for large-scale whole genome sequencing data analyses are too large for many core facilities and individual laboratories to provide. To help meet these challenges, we have developed Rainbow, a cloud-based software package that can assist in the automation of large-scale WGS data analyses. Here, we evaluated the performance of Rainbow by analyzing 44 different whole-genome-sequenced subjects. Rainbow has the capacity to process genomic data from more than 500 subjects in two weeks using cloud computing provided by the Amazon Web Service. The time includes the import and export of the data using Amazon Import/Export service. The average cost of processing a single sample in the cloud was less than 120 US dollars. Compared with Crossbow, the main improvements incorporated into Rainbow include the ability: (1) to handle BAM as well as FASTQ input files; (2) to split large sequence files for better load balance downstream; (3) to log the running metrics in data processing and monitoring multiple Amazon Elastic Compute Cloud (EC2) instances; and (4) to merge SOAPsnp outputs for multiple individuals into a single file to facilitate downstream genome-wide association studies. Rainbow is a scalable, cost-effective, and open-source tool for large-scale WGS data analysis. For human WGS data sequenced by either the Illumina HiSeq 2000 or HiSeq 2500 platforms, Rainbow can be used straight out of the box. Rainbow is available for third-party implementation and use, and can be downloaded from http://s3.amazonaws.com/jnj_rainbow/index.html.
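    Improvement (2) in the abstract above, splitting large sequence files for better downstream load balance, can be illustrated with a minimal sketch. This is not Rainbow's actual implementation: the function name `split_fastq` and the parameter `reads_per_chunk` are hypothetical, and the sketch assumes the common four-lines-per-read FASTQ layout (no multi-line sequences).

```python
from itertools import islice
from typing import Iterable, Iterator, List


def split_fastq(lines: Iterable[str], reads_per_chunk: int) -> Iterator[List[str]]:
    """Yield lists of FASTQ lines, each holding at most `reads_per_chunk` reads.

    Assumption (not from the source): every read occupies exactly four lines
    (header, sequence, '+', qualities).
    """
    if reads_per_chunk < 1:
        raise ValueError("reads_per_chunk must be at least 1")
    it = iter(lines)
    while True:
        # Pull the next block of complete reads (4 lines per read).
        chunk = list(islice(it, 4 * reads_per_chunk))
        if not chunk:
            return
        if len(chunk) % 4 != 0:
            raise ValueError("truncated FASTQ record at end of input")
        yield chunk


if __name__ == "__main__":
    # Hypothetical toy input: five reads, split into chunks of two reads.
    toy = []
    for i in range(5):
        toy += [f"@read{i}", "ACGT", "+", "IIII"]
    for n, chunk in enumerate(split_fastq(toy, 2)):
        print(n, len(chunk) // 4, "reads")
```

    Because the function consumes any iterable of lines, it can be fed an open file handle, so the whole file never has to be held in memory; each yielded chunk could then be written out and dispatched to a separate worker.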

  6. Rainbow: a tool for large-scale whole-genome sequencing data analysis using cloud computing

    PubMed Central

    2013-01-01

    Background Technical improvements have decreased sequencing costs and, as a result, the size and number of genomic datasets have increased rapidly. Because of the lower cost, large amounts of sequence data are now being produced by small to midsize research groups. Crossbow is a software tool that can detect single nucleotide polymorphisms (SNPs) in whole-genome sequencing (WGS) data from a single subject; however, Crossbow has a number of limitations when applied to multiple subjects from large-scale WGS projects. The data storage and CPU resources that are required for large-scale whole genome sequencing data analyses are too large for many core facilities and individual laboratories to provide. To help meet these challenges, we have developed Rainbow, a cloud-based software package that can assist in the automation of large-scale WGS data analyses. Results Here, we evaluated the performance of Rainbow by analyzing 44 different whole-genome-sequenced subjects. Rainbow has the capacity to process genomic data from more than 500 subjects in two weeks using cloud computing provided by the Amazon Web Service. The time includes the import and export of the data using Amazon Import/Export service. The average cost of processing a single sample in the cloud was less than 120 US dollars. Compared with Crossbow, the main improvements incorporated into Rainbow include the ability: (1) to handle BAM as well as FASTQ input files; (2) to split large sequence files for better load balance downstream; (3) to log the running metrics in data processing and monitoring multiple Amazon Elastic Compute Cloud (EC2) instances; and (4) to merge SOAPsnp outputs for multiple individuals into a single file to facilitate downstream genome-wide association studies. Conclusions Rainbow is a scalable, cost-effective, and open-source tool for large-scale WGS data analysis. For human WGS data sequenced by either the Illumina HiSeq 2000 or HiSeq 2500 platforms, Rainbow can be used straight out of the box. Rainbow is available for third-party implementation and use, and can be downloaded from http://s3.amazonaws.com/jnj_rainbow/index.html. PMID:23802613

  7. Complete Genomic Sequence and Comparative Analysis of the Genome Segments of Sweet Potato Chlorotic Stunt Virus in China

    PubMed Central

    Qin, Yanhong; Wang, Li; Zhang, Zhenchen; Qiao, Qi; Zhang, Desheng; Tian, Yuting; Wang, Shuang; Wang, Yongjiang; Yan, Zhaoling

    2014-01-01

    Background Sweet potato chlorotic stunt virus (family Closteroviridae, genus Crinivirus) features a large bipartite, single-stranded, positive-sense RNA genome. To date, only three complete genomic sequences of SPCSV can be accessed through GenBank. SPCSV was first detected in China in 2011, but only partial genomic sequences have been determined in the country. No report on the complete genomic sequence and genome structure of Chinese SPCSV isolates or the genetic relation between isolates from China and other countries is available. Methodology/Principal Findings The complete genomic sequences of five isolates from different areas in China were characterized. This study is the first to report the complete genome sequences of SPCSV from whitefly vectors. Genome structure analysis showed that isolates of WA and EA strains from China have the same coding protein as isolates Can181-9 and m2-47, respectively. Twenty cp genes and four RNA1 partial segments were sequenced and analyzed, and the nucleotide identities of complete genomic, cp, and RNA1 partial sequences were determined. Results indicated high conservation among strains and significant differences between WA and EA strains. Genetic analysis demonstrated that, except for isolates from Guangdong Province, SPCSVs from other areas belong to the WA strain. Genome organization analysis showed that the isolates in this study lack the p22 gene. Conclusions/Significance We presented the complete genome sequences of SPCSV in China. Comparison of nucleotide identities and genome structures between these isolates and previously reported isolates showed slight differences. The nucleotide identities of different SPCSV isolates showed high conservation among strains and significant differences between strains. All nine isolates in this study lacked the p22 gene. WA strains were more extensively distributed than EA strains in China. These data provide important insights into the molecular variation and genomic structure of SPCSV in China as well as genetic relationships among isolates from China and other countries. PMID:25170926

  8. Group-based variant calling leveraging next-generation supercomputing for large-scale whole-genome sequencing studies.

    PubMed

    Standish, Kristopher A; Carland, Tristan M; Lockwood, Glenn K; Pfeiffer, Wayne; Tatineni, Mahidhar; Huang, C Chris; Lamberth, Sarah; Cherkas, Yauheniya; Brodmerkel, Carrie; Jaeger, Ed; Smith, Lance; Rajagopal, Gunaretnam; Curran, Mark E; Schork, Nicholas J

    2015-09-22

    Next-generation sequencing (NGS) technologies have become much more efficient, allowing whole human genomes to be sequenced faster and cheaper than ever before. However, processing the raw sequence reads associated with NGS technologies requires care and sophistication in order to draw compelling inferences about phenotypic consequences of variation in human genomes. It has been shown that different approaches to variant calling from NGS data can lead to different conclusions. Ensuring appropriate accuracy and quality in variant calling can come at a computational cost. We describe our experience implementing and evaluating a group-based approach to calling variants on large numbers of whole human genomes. We explore the influence of many factors that may impact the accuracy and efficiency of group-based variant calling, including group size, the biogeographical backgrounds of the individuals who have been sequenced, and the computing environment used. We make efficient use of the Gordon supercomputer cluster at the San Diego Supercomputer Center by incorporating job-packing and parallelization considerations into our workflow while calling variants on 437 whole human genomes generated as part of a large association study. We ultimately find that our workflow resulted in high-quality variant calls in a computationally efficient manner. We argue that studies like ours should motivate further investigations combining hardware-oriented advances in computing systems with algorithmic developments to tackle emerging 'big data' problems in biomedical research brought on by the expansion of NGS technologies.

  9. Comparative whole genome DNA methylation profiling of cattle sperm and somatic tissues reveals striking hypomethylated patterns in sperm

    USDA-ARS's Scientific Manuscript database

    Using whole-genome bisulfite sequencing (WGBS), we profiled the DNA methylome of cattle sperm through comparison with three bovine somatic tissues (mammary gland, brain and blood). Large differences between them were observed in the methylation patterns of global CpGs, pericentromeric satellites, p...

  10. New DArT markers for oat provide enhanced map coverage and global germplasm characterization

    USDA-ARS's Scientific Manuscript database

    Genomic discovery in oat and its application to oat improvement have been hindered by a lack of common markers on different genetic maps, and by the difficulty of conducting whole-genome analysis using high throughput markers. In this study we developed, characterized, and applied a large set oat g...

  11. Evidence of evolutionary history and selective sweeps in the genome of Meishan pig reveals its genetic and phenotypic characterization

    USDA-ARS's Scientific Manuscript database

    Meishan is a famous Chinese indigenous pig breed known for its extremely high fecundity. To explore if Meishan has unique evolutionary process and genome characteristics differing from other pig breeds, we systematically analyzed its genetic divergence, and demographic history by large-scale reseque...

  12. Comparative Genomics Reveals High Genomic Diversity in the Genus Photobacterium

    PubMed Central

    Machado, Henrique; Gram, Lone

    2017-01-01

    Vibrionaceae is a large marine bacterial family, which can constitute up to 50% of the prokaryotic population in marine waters. Photobacterium is the second largest genus in the family and we used comparative genomics on 35 strains representing 16 of the 28 species described so far, to understand the genomic diversity present in the Photobacterium genus. Such understanding is important for ecophysiology studies of the genus. We used whole genome sequences to evaluate phylogenetic relationships using several analyses (16S rRNA, MLSA, fur, amino-acid usage, ANI), which allowed us to identify two misidentified strains. Genome analyses also revealed occurrence of higher and lower GC content clades, correlating with phylogenetic clusters. Pan- and core-genome analysis revealed the conservation of 25% of the genome throughout the genus, with a large and open pan-genome. The major source of genomic diversity could be traced to the smaller chromosome and plasmids. Several of the physiological traits studied in the genus did not correlate with phylogenetic data. Since horizontal gene transfer (HGT) is often suggested as a source of genetic diversity and a potential driver of genomic evolution in bacterial species, we looked into evidence of such in Photobacterium genomes. Genomic islands were the source of genomic differences between strains of the same species. Also, we found transposase genes and CRISPR arrays that suggest multiple encounters with foreign DNA. Presence of genomic exchange traits was widespread and abundant in the genus, suggesting a role in genomic evolution. The high genetic variability and indications of genetic exchange make it difficult to elucidate genome evolutionary paths and raise the awareness of the roles of foreign DNA in the genomic evolution of environmental organisms. PMID:28706512
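    The pan- and core-genome analysis mentioned in the abstract above rests on simple set operations once genes have been grouped into families. The following is a minimal sketch under stated assumptions, not the authors' pipeline: the function name `core_and_pan`, the strain labels, and the gene-family identifiers are all hypothetical, and the orthology clustering that produces the families is assumed to have been done upstream.

```python
from typing import Dict, Set, Tuple


def core_and_pan(genomes: Dict[str, Set[str]]) -> Tuple[Set[str], Set[str]]:
    """Return (core, pan) gene-family sets for a collection of genomes.

    core = families present in every genome (intersection)
    pan  = families present in at least one genome (union)
    """
    if not genomes:
        raise ValueError("at least one genome is required")
    gene_sets = list(genomes.values())
    core = set.intersection(*gene_sets)
    pan = set.union(*gene_sets)
    return core, pan


if __name__ == "__main__":
    # Hypothetical toy data: three strains with made-up gene-family IDs.
    toy = {
        "strain_a": {"g1", "g2", "g3"},
        "strain_b": {"g1", "g2", "g4"},
        "strain_c": {"g1", "g5"},
    }
    core, pan = core_and_pan(toy)
    print(sorted(core), sorted(pan))
```

    A pan-genome is commonly called open when the union keeps growing as further genomes are added; tracking `len(pan)` while adding strains one at a time is a simple way to look at that trend.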

  13. Comparative Genomics Reveals High Genomic Diversity in the Genus Photobacterium.

    PubMed

    Machado, Henrique; Gram, Lone

    2017-01-01

    Vibrionaceae is a large marine bacterial family, which can constitute up to 50% of the prokaryotic population in marine waters. Photobacterium is the second largest genus in the family and we used comparative genomics on 35 strains representing 16 of the 28 species described so far, to understand the genomic diversity present in the Photobacterium genus. Such understanding is important for ecophysiology studies of the genus. We used whole genome sequences to evaluate phylogenetic relationships using several analyses (16S rRNA, MLSA, fur, amino-acid usage, ANI), which allowed us to identify two misidentified strains. Genome analyses also revealed occurrence of higher and lower GC content clades, correlating with phylogenetic clusters. Pan- and core-genome analysis revealed the conservation of 25% of the genome throughout the genus, with a large and open pan-genome. The major source of genomic diversity could be traced to the smaller chromosome and plasmids. Several of the physiological traits studied in the genus did not correlate with phylogenetic data. Since horizontal gene transfer (HGT) is often suggested as a source of genetic diversity and a potential driver of genomic evolution in bacterial species, we looked into evidence of such in Photobacterium genomes. Genomic islands were the source of genomic differences between strains of the same species. Also, we found transposase genes and CRISPR arrays that suggest multiple encounters with foreign DNA. Presence of genomic exchange traits was widespread and abundant in the genus, suggesting a role in genomic evolution. The high genetic variability and indications of genetic exchange make it difficult to elucidate genome evolutionary paths and raise the awareness of the roles of foreign DNA in the genomic evolution of environmental organisms.

  14. Can multi-subpopulation reference sets improve the genomic predictive ability for pigs?

    PubMed

    Fangmann, A; Bergfelder-Drüing, S; Tholen, E; Simianer, H; Erbe, M

    2015-12-01

    In most countries and for most livestock species, genomic evaluations are obtained from within-breed analyses. To achieve reliable breeding values, however, a sufficient reference sample size is essential. To increase this size, the use of multibreed reference populations for small populations is considered a suitable option in other species. Over decades, the separate breeding work of different pig breeding organizations in Germany has led to stratified subpopulations in the breed German Large White. Due to this fact and the limited number of Large White animals available in each organization, there was a pressing need for ascertaining if multi-subpopulation genomic prediction is superior compared with within-subpopulation prediction in pigs. Direct genomic breeding values were estimated with genomic BLUP for the trait "number of piglets born alive" using genotype data (Illumina Porcine 60K SNP BeadChip) from 2,053 German Large White animals from five different commercial pig breeding companies. To assess the prediction accuracy of within- and multi-subpopulation reference sets, a random 5-fold cross-validation with 20 replications was performed. The five subpopulations considered were only slightly differentiated from each other. However, the prediction accuracy of the multi-subpopulations approach was not better than that of the within-subpopulation evaluation, for which the predictive ability was already high. Reference sets composed of closely related multi-subpopulation sets performed better than sets of distantly related subpopulations but not better than the within-subpopulation approach. Despite the low differentiation of the five subpopulations, the genetic connectedness between these different subpopulations seems to be too small to improve the prediction accuracy by applying multi-subpopulation reference sets. Consequently, resources should be used for enlarging the reference population within subpopulation, for example, by adding genotyped females.
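    The validation scheme described above, a random 5-fold cross-validation with 20 replications, can be sketched as an index generator. This is only an illustration of the splitting step, not the authors' code: the function name `repeated_kfold` and the seed are hypothetical, and fitting genomic BLUP on each training set is left out.

```python
import random
from typing import Iterator, List, Tuple


def repeated_kfold(
    n: int, k: int = 5, replications: int = 20, seed: int = 0
) -> Iterator[Tuple[int, List[int], List[int]]]:
    """Yield (replication, train_indices, test_indices) for repeated random k-fold CV."""
    if not 2 <= k <= n:
        raise ValueError("k must satisfy 2 <= k <= n")
    rng = random.Random(seed)
    for rep in range(replications):
        idx = list(range(n))
        rng.shuffle(idx)  # a fresh random partition for every replication
        folds = [idx[i::k] for i in range(k)]
        for f in range(k):
            test = sorted(folds[f])
            train = sorted(j for g, fold in enumerate(folds) if g != f for j in fold)
            yield rep, train, test


if __name__ == "__main__":
    # 2,053 animals as in the abstract; 5 folds x 20 replications = 100 splits.
    splits = list(repeated_kfold(2053))
    print(len(splits), len(splits[0][1]), len(splits[0][2]))
```

    Within each replication every animal is used for testing exactly once, so averaging the accuracy over the 100 splits reduces the dependence of the result on any single random partition.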

  15. Evolution of Genome Size and Complexity in Pinus

    PubMed Central

    Morse, Alison M.; Peterson, Daniel G.; Islam-Faridi, M. Nurul; Smith, Katherine E.; Magbanua, Zenaida; Garcia, Saul A.; Kubisiak, Thomas L.; Amerson, Henry V.; Carlson, John E.; Nelson, C. Dana; Davis, John M.

    2009-01-01

    Background Genome evolution in the gymnosperm lineage of seed plants has given rise to many of the most complex and largest plant genomes, however the elements involved are poorly understood. Methodology/Principal Findings Gymny is a previously undescribed retrotransposon family in Pinus that is related to Athila elements in Arabidopsis. Gymny elements are dispersed throughout the modern Pinus genome and occupy a physical space at least the size of the Arabidopsis thaliana genome. In contrast to previously described retroelements in Pinus, the Gymny family was amplified or introduced after the divergence of pine and spruce (Picea). If retrotransposon expansions are responsible for genome size differences within the Pinaceae, as they are in angiosperms, then they have yet to be identified. In contrast, molecular divergence of Gymny retrotransposons together with other families of retrotransposons can account for the large genome complexity of pines along with protein-coding genic DNA, as revealed by massively parallel DNA sequence analysis of Cot fractionated genomic DNA. Conclusions/Significance Most of the enormous genome complexity of pines can be explained by divergence of retrotransposons, however the elements responsible for genome size variation are yet to be identified. Genomic resources for Pinus including those reported here should assist in further defining whether and how the roles of retrotransposons differ in the evolution of angiosperm and gymnosperm genomes. PMID:19194510

  16. GeNets: a unified web platform for network-based genomic analyses.

    PubMed

    Li, Taibo; Kim, April; Rosenbluh, Joseph; Horn, Heiko; Greenfeld, Liraz; An, David; Zimmer, Andrew; Liberzon, Arthur; Bistline, Jon; Natoli, Ted; Li, Yang; Tsherniak, Aviad; Narayan, Rajiv; Subramanian, Aravind; Liefeld, Ted; Wong, Bang; Thompson, Dawn; Calvo, Sarah; Carr, Steve; Boehm, Jesse; Jaffe, Jake; Mesirov, Jill; Hacohen, Nir; Regev, Aviv; Lage, Kasper

    2018-06-18

    Functional genomics networks are widely used to identify unexpected pathway relationships in large genomic datasets. However, it is challenging to compare the signal-to-noise ratios of different networks and to identify the optimal network with which to interpret a particular genetic dataset. We present GeNets, a platform in which users can train a machine-learning model (Quack) to carry out these comparisons and execute, store, and share analyses of genetic and RNA-sequencing datasets.

  17. FISH Oracle 2: a web server for integrative visualization of genomic data in cancer research

    PubMed Central

    2014-01-01

    Background A comprehensive view on all relevant genomic data is instrumental for understanding the complex patterns of molecular alterations typically found in cancer cells. One of the most effective ways to rapidly obtain an overview of genomic alterations in large amounts of genomic data is the integrative visualization of genomic events. Results We developed FISH Oracle 2, a web server for the interactive visualization of different kinds of downstream processed genomics data typically available in cancer research. A powerful search interface and a fast visualization engine provide a highly interactive visualization for such data. High quality image export enables the life scientist to easily communicate their results. A comprehensive data administration allows users to keep track of the available data sets. We applied FISH Oracle 2 to published data and found evidence that, in colorectal cancer cells, the gene TTC28 may be inactivated in two different ways, a fact that has not been published before. Conclusions The interactive nature of FISH Oracle 2 and the possibility to store, select and visualize large amounts of downstream processed data support life scientists in generating hypotheses. The export of high quality images supports explanatory data visualization, simplifying the communication of new biological findings. A FISH Oracle 2 demo server and the software are available at http://www.zbh.uni-hamburg.de/fishoracle. PMID:24684958

  18. Whole genome SNP discovery and analysis of genetic diversity in Turkey (Meleagris gallopavo)

    PubMed Central

    2012-01-01

    Background The turkey (Meleagris gallopavo) is an important agricultural species and the second largest contributor to the world’s poultry meat production. Genetic improvement is attributed largely to selective breeding programs that rely on highly heritable phenotypic traits, such as body size and breast muscle development. Commercial breeding with small effective population sizes and epistasis can result in loss of genetic diversity, which in turn can lead to reduced individual fitness and reduced response to selection. The presence of genomic diversity in domestic livestock species is therefore of great importance and a prerequisite for rapid and accurate genetic improvement of selected breeds in various environments, as well as to facilitate rapid adaptation to potential changes in breeding goals. Genomic selection requires a large number of genetic markers such as single nucleotide polymorphisms (SNPs), the most abundant source of genetic variation within the genome. Results Alignment of next generation sequencing data of 32 individual turkeys from different populations was used for the discovery of 5.49 million SNPs, which subsequently were used for the analysis of genetic diversity among the different populations. All of the commercial lines branched from a single node relative to the heritage varieties and the South Mexican turkey population. Heterozygosity of all individuals from the different turkey populations ranged from 0.17-2.73 SNPs/Kb, while heterozygosity of populations ranged from 0.73-1.64 SNPs/Kb. The average frequency of heterozygous SNPs in individual turkeys was 1.07 SNPs/Kb. Five genomic regions with very low nucleotide variation were identified in domestic turkeys that showed state of fixation towards alleles different than wild alleles. Conclusion The turkey genome is much less diverse with a relatively low frequency of heterozygous SNPs as compared to other livestock species like chicken and pig. The whole genome SNP discovery study in turkey resulted in the detection of 5.49 million putative SNPs compared to the reference genome. All commercial lines appear to share a common origin. Presence of different alleles/haplotypes in the SM population highlights that specific haplotypes have been selected in the modern domesticated turkey. PMID:22891612
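    The heterozygosity figures above are densities, expressed as heterozygous SNPs per kilobase. A minimal sketch of that arithmetic follows; the function name `snps_per_kb` and the example counts are hypothetical, and which bases count toward the denominator (for instance, only positions with enough read coverage) is an assumption the abstract does not settle.

```python
def snps_per_kb(heterozygous_snps: int, callable_bases: int) -> float:
    """Density of heterozygous SNPs per 1,000 bases of callable sequence."""
    if heterozygous_snps < 0:
        raise ValueError("heterozygous_snps must be non-negative")
    if callable_bases <= 0:
        raise ValueError("callable_bases must be positive")
    return heterozygous_snps / (callable_bases / 1000)


if __name__ == "__main__":
    # Hypothetical counts chosen only to illustrate the unit conversion:
    # 1,070 heterozygous sites across 1,000,000 callable bases.
    print(snps_per_kb(1070, 1_000_000))
```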

  19. Ecological genomics of adaptation and speciation in fungi.

    PubMed

    Leducq, Jean-Baptiste

    2014-01-01

    Fungi play a central role in both ecosystems and human societies. This is in part because they have adopted a large diversity of life history traits to conquer a wide variety of ecological niches. Here, I review recent fungal genomics studies that explored the molecular origins and the adaptive significance of this diversity. First, macro-ecological genomics studies revealed that fungal genomes were highly remodelled during their evolution. This remodelling, in terms of genome organization and size, occurred through the proliferation of non-coding elements, gene compaction, gene loss and the expansion of large families of adaptive genes. These features vary greatly among fungal clades, and are correlated with different life history traits such as multicellularity, pathogenicity, symbiosis, and sexual reproduction. Second, micro-ecological genomics studies, based on population genomics, experimental evolution and quantitative trait loci approaches, have allowed a deeper exploration of early evolutionary steps of the above adaptations. Fungi, and especially budding yeasts, were used intensively to characterize early mutations and chromosomal rearrangements that underlie the acquisition of new adaptive traits allowing them to conquer new ecological niches and potentially leading to speciation. By uncovering the ecological factors and genomic modifications that underlie adaptation, these studies showed that Fungi are powerful models for ecological genomics (eco-genomics), and that this approach, so far mainly developed in a few model species, should be expanded to the whole kingdom.

  20. EGenBio: A Data Management System for Evolutionary Genomics and Biodiversity

    PubMed Central

    Nahum, Laila A; Reynolds, Matthew T; Wang, Zhengyuan O; Faith, Jeremiah J; Jonna, Rahul; Jiang, Zhi J; Meyer, Thomas J; Pollock, David D

    2006-01-01

    Background Evolutionary genomics requires management and filtering of large numbers of diverse genomic sequences for accurate analysis and inference on evolutionary processes of genomic and functional change. We developed Evolutionary Genomics and Biodiversity (EGenBio) to begin to address this. Description EGenBio is a system for manipulation and filtering of large numbers of sequences, integrating curated sequence alignments and phylogenetic trees, managing evolutionary analyses, and visualizing their output. EGenBio is organized into three conceptual divisions, Evolution, Genomics, and Biodiversity. The Genomics division includes tools for selecting pre-aligned sequences from different genes and species, and for modifying and filtering these alignments for further analysis. Species searches are handled through queries that can be modified based on a tree-based navigation system and saved. The Biodiversity division contains tools for analyzing individual sequences or sequence alignments, whereas the Evolution division contains tools involving phylogenetic trees. Alignments are annotated with analytical results and modification history using our PRAED format. A miscellaneous Tools section and Help framework are also available. EGenBio was developed around our comparative genomic research and a prototype database of mtDNA genomes. It utilizes MySQL-relational databases and dynamic page generation, and calls numerous custom programs. Conclusion EGenBio was designed to serve as a platform for tools and resources to ease combined analysis in evolution, genomics, and biodiversity. PMID:17118150

  1. Comparing large covariance matrices under weak conditions on the dependence structure and its application to gene clustering.

    PubMed

    Chang, Jinyuan; Zhou, Wen; Zhou, Wen-Xin; Wang, Lan

    2017-03-01

    Comparing large covariance matrices has important applications in modern genomics, where scientists are often interested in understanding whether relationships (e.g., dependencies or co-regulations) among a large number of genes vary between different biological states. We propose a computationally fast procedure for testing the equality of two large covariance matrices when the dimensions of the covariance matrices are much larger than the sample sizes. A distinguishing feature of the new procedure is that it imposes no structural assumptions on the unknown covariance matrices. Hence, the test is robust with respect to various complex dependence structures that frequently arise in genomics. We prove that the proposed procedure is asymptotically valid under weak moment conditions. As an interesting application, we derive a new gene clustering algorithm which shares the same nice property of avoiding restrictive structural assumptions for high-dimensional genomics data. Using an asthma gene expression dataset, we illustrate how the new test helps compare the covariance matrices of the genes across different gene sets/pathways between the disease group and the control group, and how the gene clustering algorithm provides new insights on the way gene clustering patterns differ between the two groups. The proposed methods have been implemented in an R-package HDtest and are available on CRAN. © 2016, The International Biometric Society.

  2. The gradient boosting algorithm and random boosting for genome-assisted evaluation in large data sets.

    PubMed

    González-Recio, O; Jiménez-Montero, J A; Alenda, R

    2013-01-01

    In the next few years, with the advent of high-density single nucleotide polymorphism (SNP) arrays and genome sequencing, genomic evaluation methods will need to deal with a large number of genetic variants and an increasing sample size. The boosting algorithm is a machine-learning technique that may alleviate the drawbacks of dealing with such large data sets. This algorithm combines different predictors in a sequential manner with some shrinkage on them; each predictor is applied consecutively to the residuals from the committee formed by the previous ones to form a final prediction based on a subset of covariates. Here, a detailed description is provided and examples using a toy data set are included. A modification of the algorithm called "random boosting" was proposed to increase predictive ability and decrease computation time of genome-assisted evaluation in large data sets. Random boosting uses a random selection of markers to add a subsequent weak learner to the predictive model. These modifications were applied to a real data set composed of 1,797 bulls genotyped for 39,714 SNP. Deregressed proofs of 4 yield traits and 1 type trait from January 2009 routine evaluations were used as dependent variables. A 2-fold cross-validation scenario was implemented. Sires born before 2005 were used as a training sample (1,576 and 1,562 for production and type traits, respectively), whereas younger sires were used as a testing sample to evaluate predictive ability of the algorithm on yet-to-be-observed phenotypes. Comparison with the original algorithm was provided. The predictive ability of the algorithm was measured as Pearson correlations between observed and predicted responses. Further, estimated bias was computed as the average difference between observed and predicted phenotypes. 
The results showed that the modification of the original boosting algorithm could be run in 1% of the time used with the original algorithm and with negligible differences in accuracy and bias. This modification may be used to speed up the computation of genome-assisted evaluation in large data sets such as those obtained from consortiums. Copyright © 2013 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.
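
    The boosting scheme summarized in this abstract (weak learners fitted one after another to the residuals of the current committee, each shrunk before it is added, with "random boosting" restricting every step to a random subset of markers) can be illustrated with a minimal sketch. This is not the authors' implementation: the weak learner (a single-marker least-squares fit), the function names, and the default values of `n_iter`, `shrinkage` and `frac` are all assumptions chosen for illustration. Pearson correlation and bias follow the definitions given in the abstract.

```python
import random

def fit_single_marker(x, r):
    """Least-squares fit of residuals r on one marker x: (slope, intercept, SSE)."""
    n = len(x)
    mx, mr = sum(x) / n, sum(r) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0.0:  # monomorphic marker: only an intercept can be fitted
        return 0.0, mr, sum((ri - mr) ** 2 for ri in r)
    slope = sum((xi - mx) * (ri - mr) for xi, ri in zip(x, r)) / sxx
    intercept = mr - slope * mx
    sse = sum((ri - (intercept + slope * xi)) ** 2 for xi, ri in zip(x, r))
    return slope, intercept, sse

def random_boost(X, y, n_iter=200, shrinkage=0.1, frac=0.5, seed=0):
    """X: list of samples, each a list of marker genotypes; y: phenotypes."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    columns = [[row[j] for row in X] for j in range(p)]
    mean_y = sum(y) / n
    pred = [mean_y] * n
    model = []                    # one (marker, slope, intercept) per iteration
    k = max(1, int(frac * p))     # markers sampled per iteration (assumed rule)
    for _ in range(n_iter):
        resid = [yi - pi for yi, pi in zip(y, pred)]
        candidates = rng.sample(range(p), k)   # the "random" part
        j, slope, intercept, _ = min(
            ((c,) + fit_single_marker(columns[c], resid) for c in candidates),
            key=lambda t: t[3],                # smallest residual sum of squares
        )
        model.append((j, shrinkage * slope, shrinkage * intercept))
        pred = [pi + shrinkage * (intercept + slope * xj)
                for pi, xj in zip(pred, columns[j])]
    return mean_y, model

def predict(mean_y, model, row):
    """Sum the shrunken contributions of all weak learners for one sample."""
    return mean_y + sum(b + a * row[j] for j, a, b in model)

def pearson(a, b):
    """Pearson correlation between observed and predicted responses."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5

def bias(observed, predicted):
    """Average difference between observed and predicted phenotypes."""
    return sum(o - p for o, p in zip(observed, predicted)) / len(observed)
```

    Setting `frac=1.0` in this sketch evaluates every marker at each iteration, which corresponds to componentwise boosting without the random selection. Smaller values reduce the work per iteration, which is the source of the speed-up the abstract reports.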

  3. Comparative sequence analysis of Sordaria macrospora and Neurospora crassa as a means to improve genome annotation.

    PubMed

    Nowrousian, Minou; Würtz, Christian; Pöggeler, Stefanie; Kück, Ulrich

    2004-03-01

    One of the most challenging parts of large scale sequencing projects is the identification of functional elements encoded in a genome. Recently, studies of genomes of up to six different Saccharomyces species have demonstrated that a comparative analysis of genome sequences from closely related species is a powerful approach to identify open reading frames and other functional regions within genomes [Science 301 (2003) 71, Nature 423 (2003) 241]. Here, we present a comparison of selected sequences from Sordaria macrospora to their corresponding Neurospora crassa orthologous regions. Our analysis indicates that due to the high degree of sequence similarity and conservation of overall genomic organization, S. macrospora sequence information can be used to simplify the annotation of the N. crassa genome.

  4. Distinguishing noise from signal in patterns of genomic divergence in a highly polymorphic avian radiation.

    PubMed

    Campagna, Leonardo; Gronau, Ilan; Silveira, Luís Fábio; Siepel, Adam; Lovette, Irby J

    2015-08-01

    Recently diverged taxa provide the opportunity to search for the genetic basis of the phenotypes that distinguish them. Genomic scans aim to identify loci that are diverged with respect to an otherwise weakly differentiated genetic background. These loci are candidates for being past targets of selection because they behave differently from the rest of the genome that has either not yet differentiated or that may cross species barriers through introgressive hybridization. Here we use a reduced-representation genomic approach to explore divergence among six species of southern capuchino seedeaters, a group of recently radiated sympatric passerine birds in the genus Sporophila. For the first time in these taxa, we discovered a small proportion of markers that appeared differentiated among species. However, when assessing the significance of these signatures of divergence, we found that similar patterns can also be recovered from random grouping of individuals representing different species. A detailed demographic inference indicates that genetic differences among Sporophila species could be the consequence of neutral processes, which include a very large ancestral effective population size that accentuates the effects of incomplete lineage sorting. As these neutral phenomena can generate genomic scan patterns that mimic those of markers involved in speciation and phenotypic differentiation, they highlight the need for caution when ascertaining and interpreting differentiated markers between species, especially when large numbers of markers are surveyed. Our study provides new insights into the demography of the southern capuchino radiation and proposes controls to distinguish signal from noise in similar genomic scans. © 2015 John Wiley & Sons Ltd.

  5. Reference-free comparative genomics of 174 chloroplasts.

    PubMed

    Kua, Chai-Shian; Ruan, Jue; Harting, John; Ye, Cheng-Xi; Helmus, Matthew R; Yu, Jun; Cannon, Charles H

    2012-01-01

    Direct analysis of unassembled genomic data could greatly increase the power of short read DNA sequencing technologies and allow comparative genomics of organisms without a completed reference available. Here, we compare 174 chloroplasts by analyzing the taxonomic distribution of short kmers across genomes [1]. We then assemble de novo contigs centered on informative variation. The localized de novo contigs can be separated into two major classes: tip = unique to a single genome and group = shared by a subset of genomes. Prior to assembly, we found that ~18% of the chloroplast was duplicated in the inverted repeat (IR) region across a four-fold difference in genome sizes, from a highly reduced parasitic orchid [2] to a massive algal chloroplast [3], including gnetophytes [4] and cycads [5]. The conservation of this ratio between single copy and duplicated sequence was basal among green plants, independent of photosynthesis and mechanism of genome size change, and different in gymnosperms and lower plants. Major lineages in the angiosperm clade differed in the pattern of shared kmers and de novo contigs. For example, parasitic plants demonstrated an expected accelerated overall rate of evolution, while the hemi-parasitic genomes contained a great deal more novel sequence than holo-parasitic plants, suggesting different mechanisms at different stages of genomic contraction. Additionally, the legumes are diverging more quickly and in different ways than other major families. Small duplicated fragments of the rrn23 genes were deeply conserved among seed plants, including among several species without the IR regions, indicating a crucial functional role of this duplication.
Localized de novo assembly of informative kmers greatly reduces the complexity of large comparative analyses by confining the analysis to a small partition of data and genomes relevant to the specific question, allowing direct analysis of next-gen sequence data from previously unstudied genomes and rapid discovery of informative candidate regions.
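
    The "tip" versus "group" distinction described in this abstract can be illustrated at the k-mer level with a short sketch. It assumes each genome is available as a plain string; the function names, the extra `shared_by_all` label for k-mers present in every genome, and the omission of reverse-complement handling are simplifications introduced here, not part of the published method.

```python
from collections import defaultdict

def kmers(seq, k):
    """Return the set of all k-length substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def classify_kmers(genomes, k):
    """genomes: dict mapping genome name -> sequence string.

    Returns dict mapping k-mer -> (label, set of genome names), where label is
    'tip' (one genome), 'group' (a proper subset of at least two genomes) or
    'shared_by_all' (every genome; a label assumed for this sketch).
    """
    owners = defaultdict(set)
    for name, seq in genomes.items():
        for km in kmers(seq.upper(), k):
            owners[km].add(name)
    total = len(genomes)
    classes = {}
    for km, names in owners.items():
        if len(names) == 1:
            label = "tip"
        elif len(names) < total:
            label = "group"
        else:
            label = "shared_by_all"
        classes[km] = (label, names)
    return classes
```

    For example, with three toy sequences `{"a": "ACGTAC", "b": "ACGTTT", "c": "ACGGGG"}` and `k=3`, the k-mer `GTA` occurs only in `a` (tip), `CGT` occurs in `a` and `b` (group), and `ACG` occurs in all three.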

  6. Reference-Free Comparative Genomics of 174 Chloroplasts

    PubMed Central

    Kua, Chai-Shian; Ruan, Jue; Harting, John; Ye, Cheng-Xi; Helmus, Matthew R.; Yu, Jun; Cannon, Charles H.

    2012-01-01

    Direct analysis of unassembled genomic data could greatly increase the power of short read DNA sequencing technologies and allow comparative genomics of organisms without a completed reference available. Here, we compare 174 chloroplasts by analyzing the taxonomic distribution of short kmers across genomes [1]. We then assemble de novo contigs centered on informative variation. The localized de novo contigs can be separated into two major classes: tip = unique to a single genome and group = shared by a subset of genomes. Prior to assembly, we found that ∼18% of the chloroplast was duplicated in the inverted repeat (IR) region across a four-fold difference in genome sizes, from a highly reduced parasitic orchid [2] to a massive algal chloroplast [3], including gnetophytes [4] and cycads [5]. The conservation of this ratio between single copy and duplicated sequence was basal among green plants, independent of photosynthesis and mechanism of genome size change, and different in gymnosperms and lower plants. Major lineages in the angiosperm clade differed in the pattern of shared kmers and de novo contigs. For example, parasitic plants demonstrated an expected accelerated overall rate of evolution, while the hemi-parasitic genomes contained a great deal more novel sequence than holo-parasitic plants, suggesting different mechanisms at different stages of genomic contraction. Additionally, the legumes are diverging more quickly and in different ways than other major families. Small duplicated fragments of the rrn23 genes were deeply conserved among seed plants, including among several species without the IR regions, indicating a crucial functional role of this duplication.
Localized de novo assembly of informative kmers greatly reduces the complexity of large comparative analyses by confining the analysis to a small partition of data and genomes relevant to the specific question, allowing direct analysis of next-gen sequence data from previously unstudied genomes and rapid discovery of informative candidate regions. PMID:23185288

  7. Comparative genomics reveals insights into avian genome evolution and adaptation

    PubMed Central

    Zhang, Guojie; Li, Cai; Li, Qiye; Li, Bo; Larkin, Denis M.; Lee, Chul; Storz, Jay F.; Antunes, Agostinho; Greenwold, Matthew J.; Meredith, Robert W.; Ödeen, Anders; Cui, Jie; Zhou, Qi; Xu, Luohao; Pan, Hailin; Wang, Zongji; Jin, Lijun; Zhang, Pei; Hu, Haofu; Yang, Wei; Hu, Jiang; Xiao, Jin; Yang, Zhikai; Liu, Yang; Xie, Qiaolin; Yu, Hao; Lian, Jinmin; Wen, Ping; Zhang, Fang; Li, Hui; Zeng, Yongli; Xiong, Zijun; Liu, Shiping; Zhou, Long; Huang, Zhiyong; An, Na; Wang, Jie; Zheng, Qiumei; Xiong, Yingqi; Wang, Guangbiao; Wang, Bo; Wang, Jingjing; Fan, Yu; da Fonseca, Rute R.; Alfaro-Núñez, Alonzo; Schubert, Mikkel; Orlando, Ludovic; Mourier, Tobias; Howard, Jason T.; Ganapathy, Ganeshkumar; Pfenning, Andreas; Whitney, Osceola; Rivas, Miriam V.; Hara, Erina; Smith, Julia; Farré, Marta; Narayan, Jitendra; Slavov, Gancho; Romanov, Michael N; Borges, Rui; Machado, João Paulo; Khan, Imran; Springer, Mark S.; Gatesy, John; Hoffmann, Federico G.; Opazo, Juan C.; Håstad, Olle; Sawyer, Roger H.; Kim, Heebal; Kim, Kyu-Won; Kim, Hyeon Jeong; Cho, Seoae; Li, Ning; Huang, Yinhua; Bruford, Michael W.; Zhan, Xiangjiang; Dixon, Andrew; Bertelsen, Mads F.; Derryberry, Elizabeth; Warren, Wesley; Wilson, Richard K; Li, Shengbin; Ray, David A.; Green, Richard E.; O’Brien, Stephen J.; Griffin, Darren; Johnson, Warren E.; Haussler, David; Ryder, Oliver A.; Willerslev, Eske; Graves, Gary R.; Alström, Per; Fjeldså, Jon; Mindell, David P.; Edwards, Scott V.; Braun, Edward L.; Rahbek, Carsten; Burt, David W.; Houde, Peter; Zhang, Yong; Yang, Huanming; Wang, Jian; Jarvis, Erich D.; Gilbert, M. Thomas P.; Wang, Jun

    2015-01-01

    Birds are the most species-rich class of tetrapod vertebrates and have wide relevance across many research fields. We explored bird macroevolution using full genomes from 48 avian species representing all major extant clades. The avian genome is principally characterized by its constrained size, which predominantly arose because of lineage-specific erosion of repetitive elements, large segmental deletions, and gene loss. Avian genomes furthermore show a remarkably high degree of evolutionary stasis at the levels of nucleotide sequence, gene synteny, and chromosomal structure. Despite this pattern of conservation, we detected many non-neutral evolutionary changes in protein-coding genes and noncoding regions. These analyses reveal that pan-avian genomic diversity covaries with adaptations to different lifestyles and convergent evolution of traits. PMID:25504712

  8. The Divided Bacterial Genome: Structure, Function, and Evolution.

    PubMed

    diCenzo, George C; Finan, Turlough M

    2017-09-01

    Approximately 10% of bacterial genomes are split between two or more large DNA fragments, a genome architecture referred to as a multipartite genome. This multipartite organization is found in many important organisms, including plant symbionts, such as the nitrogen-fixing rhizobia, and plant, animal, and human pathogens, including the genera Brucella, Vibrio, and Burkholderia. The availability of many complete bacterial genome sequences means that we can now examine on a broad scale the characteristics of the different types of DNA molecules in a genome. Recent work has begun to shed light on the unique properties of each class of replicon, the unique functional role of chromosomal and nonchromosomal DNA molecules, and how the exploitation of novel niches may have driven the evolution of the multipartite genome. The aims of this review are to (i) outline the literature regarding bacterial genomes that are divided into multiple fragments, (ii) provide a meta-analysis of completed bacterial genomes from 1,708 species as a way of reviewing the abundant information present in these genome sequences, and (iii) provide an encompassing model to explain the evolution and function of the multipartite genome structure. This review covers, among other topics, salient genome terminology; mechanisms of multipartite genome formation; the phylogenetic distribution of multipartite genomes; how each part of a genome differs with respect to genomic signatures, genetic variability, and gene functional annotation; how each DNA molecule may interact; as well as the costs and benefits of this genome structure. Copyright © 2017 American Society for Microbiology.

  9. Genomic and Genetic Diversity within the Pseudomonas fluorescens Complex

    PubMed Central

    Garrido-Sanz, Daniel; Meier-Kolthoff, Jan P.; Göker, Markus; Martín, Marta; Rivilla, Rafael; Redondo-Nieto, Miguel

    2016-01-01

    The Pseudomonas fluorescens complex includes Pseudomonas strains that have been taxonomically assigned to more than fifty different species, many of which have been described as plant growth-promoting rhizobacteria (PGPR) with potential applications in biocontrol and biofertilization. So far the phylogeny of this complex has been analyzed according to phenotypic traits, 16S rDNA, MLSA and inferred by whole-genome analysis. However, since most of the type strains have not been fully sequenced and new species are frequently described, correlation between taxonomy and phylogenomic analysis is missing. In recent years, the genomes of a large number of strains have been sequenced, showing important genomic heterogeneity and providing information suitable for genomic studies that are important to understand the genomic and genetic diversity shown by strains of this complex. Based on MLSA and several whole-genome sequence-based analyses of 93 sequenced strains, we have divided the P. fluorescens complex into eight phylogenomic groups that agree with previous works based on type strains. Digital DDH (dDDH) identified 69 species and 75 subspecies within the 93 genomes. The eight groups corresponded to clustering with a threshold of 31.8% dDDH, in full agreement with our MLSA. The Average Nucleotide Identity (ANI) approach showed inconsistencies regarding the assignment to species and to the eight groups. The small core genome of 1,334 CDSs and the large pan-genome of 30,848 CDSs, show the large diversity and genetic heterogeneity of the P. fluorescens complex. However, a low number of strains were enough to explain most of the CDSs diversity at core and strain-specific genomic fractions. Finally, the identification and analysis of group-specific genome and the screening for distinctive characters revealed a phylogenomic distribution of traits among the groups that provided insights into biocontrol and bioremediation applications as well as their role as PGPR. 
PMID:26915094
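
    The core genome and pan-genome figures quoted in this abstract correspond to the intersection and the union of the CDS sets of all strains, and the strain-specific fraction to the CDSs found in exactly one strain. A minimal sketch follows, assuming each CDS has already been assigned to an ortholog cluster identifier (the clustering step itself is not shown, and the function names and identifiers are hypothetical).

```python
def core_and_pan(strain_genes):
    """strain_genes: dict mapping strain name -> iterable of ortholog cluster IDs.

    Returns (core, pan): clusters present in every strain, and clusters
    present in at least one strain.
    """
    gene_sets = [set(genes) for genes in strain_genes.values()]
    if not gene_sets:
        return set(), set()
    core = set.intersection(*gene_sets)
    pan = set().union(*gene_sets)
    return core, pan

def strain_specific(strain_genes):
    """Return dict mapping strain name -> clusters found in that strain only."""
    result = {}
    for name, genes in strain_genes.items():
        others = set().union(
            *(set(g) for other, g in strain_genes.items() if other != name)
        )
        result[name] = set(genes) - others
    return result
```

    Applied to three toy strains carrying clusters `{A, B, C}`, `{A, B, D}` and `{A, E}`, the core genome is `{A}` and the pan-genome is `{A, B, C, D, E}`.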

  10. Extensive Mobilome-Driven Genome Diversification in Mouse Gut-Associated Bacteroides vulgatus mpk

    PubMed Central

    Lange, Anna; Beier, Sina; Steimle, Alex; Autenrieth, Ingo B.; Huson, Daniel H.; Frick, Julia-Stefanie

    2016-01-01

    Like many other Bacteroides species, Bacteroides vulgatus strain mpk, a mouse fecal isolate which was shown to promote intestinal homeostasis, utilizes a variety of mobile elements for genome evolution. Based on sequences collected by Pacific Biosciences SMRT sequencing technology, we discuss the challenges of assembling and studying a bacterial genome of high plasticity. Additionally, we conducted comparative genomics comparing this commensal strain with the B. vulgatus type strain ATCC 8482 as well as multiple other Bacteroides and Parabacteroides strains to reveal the most important differences and identify the unique features of B. vulgatus mpk. The genome of B. vulgatus mpk harbors a large and diverse set of mobile element proteins compared with other sequenced Bacteroides strains. We found evidence of a number of different horizontal gene transfer events and a genome landscape that has been extensively altered by different mobilization events. A CRISPR/Cas system could be identified that provides a possible mechanism for preventing the integration of invading external DNA. We propose that the high genome plasticity and the introduced genome instabilities of B. vulgatus mpk arising from the various mobilization events might play an important role not only in its adaptation to the challenging intestinal environment in general, but also in its ability to interact with the gut microbiota. PMID:27071651

  11. Influence of sequence and size of DNA on packaging efficiency of parvovirus MVM-based vectors.

    PubMed

    Brandenburger, A; Coessens, E; El Bakkouri, K; Velu, T

    1999-05-01

    We have derived a vector from the autonomous parvovirus MVM(p), which expresses human IL-2 specifically in transformed cells (Russell et al., J. Virol 1992;66:2821-2828). Testing the therapeutic potential of these vectors in vivo requires high-titer stocks. Stocks with a titer of 10^9 can be obtained after concentration and purification (Avalosse et al., J. Virol. Methods 1996;62:179-183), but this method requires large culture volumes and cannot easily be scaled up. We wanted to increase the production of recombinant virus at the initial transfection step. Poor vector titers could be due to inadequate genome amplification or to inefficient packaging. Here we show that intracellular amplification of MVM vector genomes is not the limiting factor for vector production. Several vector genomes of different size and/or structure were amplified to an equal extent. Their amplification was also equivalent to that of a cotransfected wild-type genome. We did not observe any interference between vector and wild-type genomes at the level of DNA amplification. Despite equivalent genome amplification, vector titers varied greatly between the different genomes, presumably owing to differences in packaging efficiency. Genomes with a size close to 100% that of wild type were packaged most efficiently with loss of efficiency at lower and higher sizes. However, certain genomes of identical size showed different packaging efficiencies, illustrating the importance of the DNA sequence, and probably its structure.

  12. Internet Versus Virtual Reality Settings for Genomics Information Provision.

    PubMed

    Persky, Susan; Kistler, William D; Klein, William M P; Ferrer, Rebecca A

    2018-06-22

    Current models of genomic information provision will be unable to handle large-scale clinical integration of genomic information, as may occur in primary care settings. Therefore, adoption of digital tools for genetic and genomic information provision is anticipated, primarily using Internet-based, distributed approaches. The emerging consumer communication platform of virtual reality (VR) is another potential intermediate approach between face-to-face and distributed Internet platforms to engage in genomics education and information provision. This exploratory study assessed whether provision of genomics information about body weight in a simulated, VR-based consultation (relative to a distributed, Internet platform) would be associated with differences in health behavior-related attitudes and beliefs, and interpersonal reactions to the avatar-physician. We also assessed whether outcomes differed depending upon whether genomic versus lifestyle-oriented information was conveyed. There were significant differences between communication platforms for all health behavior-oriented outcomes. Following communication in the VR setting, participants reported greater self-efficacy, dietary behavioral intentions, and exercise behavioral intentions than in the Internet-based setting. There were no differences in trust of the physician by setting, and no interaction between setting effects and the content of the information. This study was a first attempt to examine the potential capabilities of a VR-based communication setting for conveying genomic content in the context of weight management. There may be benefits to use of VR settings for communication about genomics, as well as more traditional health information, when it comes to influencing the attitudes and beliefs that underlie healthy lifestyle behaviors.

  13. Impacts of Genome-Wide Analyses on Our Understanding of Human Herpesvirus Diversity and Evolution.

    PubMed

    Renner, Daniel W; Szpara, Moriah L

    2018-01-01

    Until fairly recently, genome-wide evolutionary dynamics and within-host diversity were more commonly examined in the context of small viruses than in the context of large double-stranded DNA viruses such as herpesviruses. The high mutation rates and more compact genomes of RNA viruses have inspired the investigation of population dynamics for these species, and recent data now suggest that herpesviruses might also be considered candidates for population modeling. High-throughput sequencing (HTS) and bioinformatics have expanded our understanding of herpesviruses through genome-wide comparisons of sequence diversity, recombination, allele frequency, and selective pressures. Here we discuss recent data on the mechanisms that generate herpesvirus genomic diversity and underlie the evolution of these virus families. We focus on human herpesviruses, with key insights drawn from veterinary herpesviruses and other large DNA virus families. We consider the impacts of cell culture on herpesvirus genomes and how to accurately describe the viral populations under study. The need for a strong foundation of high-quality genomes is also discussed, since it underlies all secondary genomic analyses such as RNA sequencing (RNA-Seq), chromatin immunoprecipitation, and ribosome profiling. Areas where we foresee future progress, such as the linking of viral genetic differences to phenotypic or clinical outcomes, are highlighted as well. Copyright © 2017 Renner and Szpara.

  14. The Cancer Genome Atlas Pan-Cancer Analysis Project

    PubMed Central

    Weinstein, John N.; Collisson, Eric A.; Mills, Gordon B.; Shaw, Kenna M.; Ozenberger, Brad A.; Ellrott, Kyle; Shmulevich, Ilya; Sander, Chris; Stuart, Joshua M.

    2014-01-01

    Cancer can take hundreds of different forms depending on the location, cell of origin and spectrum of genomic alterations that promote oncogenesis and affect therapeutic response. Although many genomic events with direct phenotypic impact have been identified, much of the complex molecular landscape remains incompletely charted for most cancer lineages. For that reason, The Cancer Genome Atlas (TCGA) Research Network has profiled and analyzed large numbers of human tumours to discover molecular aberrations at the DNA, RNA, protein, and epigenetic levels. The resulting rich data provide a major opportunity to develop an integrated picture of commonalities, differences, and emergent themes across tumour lineages. The Pan-Cancer initiative compares the first twelve tumour types profiled by TCGA. Analysis of the molecular aberrations and their functional roles across tumour types will teach us how to extend therapies effective in one cancer type to others with a similar genomic profile. PMID:24071849

  15. Genome Variation Map: a data repository of genome variations in BIG Data Center

    PubMed Central

    Tian, Dongmei; Li, Cuiping; Tang, Bixia; Dong, Lili; Xiao, Jingfa; Bao, Yiming; Zhao, Wenming; He, Hang

    2018-01-01

    The Genome Variation Map (GVM; http://bigd.big.ac.cn/gvm/) is a public data repository of genome variations. As a core resource in the BIG Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences, GVM is dedicated to collecting, integrating and visualizing genome variations for a wide range of species, accepts submissions of different types of genome variations from all over the world and provides free open access to all publicly available data in support of worldwide research activities. Unlike existing related databases, GVM features integration of a large number of genome variations for a broad diversity of species including human, cultivated plants and domesticated animals. Specifically, the current implementation of GVM not only houses a total of ∼4.9 billion variants for 19 species including chicken, dog, goat, human, poplar, rice and tomato, but also incorporates 8669 individual genotypes and 13 262 manually curated high-quality genotype-to-phenotype associations for non-human species. In addition, GVM provides friendly intuitive web interfaces for data submission, browsing, search and visualization. Collectively, GVM serves as an important resource for archiving genomic variation data, helpful for better understanding population genetic diversity and deciphering complex mechanisms associated with different phenotypes. PMID:29069473

  16. Methylation Sensitive Amplification Polymorphism Sequencing (MSAP-Seq)-A Method for High-Throughput Analysis of Differentially Methylated CCGG Sites in Plants with Large Genomes.

    PubMed

    Chwialkowska, Karolina; Korotko, Urszula; Kosinska, Joanna; Szarejko, Iwona; Kwasniewski, Miroslaw

    2017-01-01

    Epigenetic mechanisms, including histone modifications and DNA methylation, mutually regulate chromatin structure, maintain genome integrity, and affect gene expression and transposon mobility. Variations in DNA methylation within plant populations, as well as methylation in response to internal and external factors, are of increasing interest, especially in the crop research field. Methylation Sensitive Amplification Polymorphism (MSAP) is one of the most commonly used methods for assessing DNA methylation changes in plants. This method involves gel-based visualization of PCR fragments from selectively amplified DNA that are cleaved using methylation-sensitive restriction enzymes. In this study, we developed and validated a new method based on the conventional MSAP approach called Methylation Sensitive Amplification Polymorphism Sequencing (MSAP-Seq). We improved the MSAP-based approach by replacing the conventional separation of amplicons on polyacrylamide gels with direct, high-throughput sequencing using Next Generation Sequencing (NGS) and automated data analysis. MSAP-Seq allows for global sequence-based identification of changes in DNA methylation. This technique was validated in Hordeum vulgare. However, MSAP-Seq can be straightforwardly implemented in different plant species, including crops with large, complex and highly repetitive genomes. The incorporation of high-throughput sequencing into MSAP-Seq enables parallel and direct analysis of DNA methylation in hundreds of thousands of sites across the genome. MSAP-Seq provides direct genomic localization of changes and enables quantitative evaluation. We have shown that the MSAP-Seq method specifically targets gene-containing regions and that a single analysis can cover three-quarters of all genes in large genomes. Moreover, MSAP-Seq's simplicity, cost effectiveness, and high-multiplexing capability make this method highly affordable.
Therefore, MSAP-Seq can be used for DNA methylation analysis in crop plants with large and complex genomes.

  17. Methylation Sensitive Amplification Polymorphism Sequencing (MSAP-Seq)—A Method for High-Throughput Analysis of Differentially Methylated CCGG Sites in Plants with Large Genomes

    PubMed Central

    Chwialkowska, Karolina; Korotko, Urszula; Kosinska, Joanna; Szarejko, Iwona; Kwasniewski, Miroslaw

    2017-01-01

    Epigenetic mechanisms, including histone modifications and DNA methylation, mutually regulate chromatin structure, maintain genome integrity, and affect gene expression and transposon mobility. Variations in DNA methylation within plant populations, as well as methylation in response to internal and external factors, are of increasing interest, especially in the crop research field. Methylation Sensitive Amplification Polymorphism (MSAP) is one of the most commonly used methods for assessing DNA methylation changes in plants. This method involves gel-based visualization of PCR fragments from selectively amplified DNA that are cleaved using methylation-sensitive restriction enzymes. In this study, we developed and validated a new method based on the conventional MSAP approach called Methylation Sensitive Amplification Polymorphism Sequencing (MSAP-Seq). We improved the MSAP-based approach by replacing the conventional separation of amplicons on polyacrylamide gels with direct, high-throughput sequencing using Next Generation Sequencing (NGS) and automated data analysis. MSAP-Seq allows for global sequence-based identification of changes in DNA methylation. This technique was validated in Hordeum vulgare. However, MSAP-Seq can be straightforwardly implemented in different plant species, including crops with large, complex and highly repetitive genomes. The incorporation of high-throughput sequencing into MSAP-Seq enables parallel and direct analysis of DNA methylation in hundreds of thousands of sites across the genome. MSAP-Seq provides direct genomic localization of changes and enables quantitative evaluation. We have shown that the MSAP-Seq method specifically targets gene-containing regions and that a single analysis can cover three-quarters of all genes in large genomes. Moreover, MSAP-Seq's simplicity, cost effectiveness, and high-multiplexing capability make this method highly affordable. 
Therefore, MSAP-Seq can be used for DNA methylation analysis in crop plants with large and complex genomes. PMID:29250096

  18. Evolutionary Dynamics of Microsatellite Distribution in Plants: Insight from the Comparison of Sequenced Brassica, Arabidopsis and Other Angiosperm Species

    PubMed Central

    Shi, Jiaqin; Huang, Shunmou; Fu, Donghui; Yu, Jinyin; Wang, Xinfa; Hua, Wei; Liu, Shengyi; Liu, Guihua; Wang, Hanzhong

    2013-01-01

    Despite their ubiquity and functional importance, microsatellites have been largely ignored in comparative genomics, mostly due to the lack of genomic information. In the current study, microsatellite distribution was characterized and compared in the whole genomes and both the coding and non-coding DNA sequences of the sequenced Brassica, Arabidopsis and other angiosperm species to investigate their evolutionary dynamics in plants. The variation in the microsatellite frequencies of these angiosperm species was much smaller than those for their microsatellite numbers and genome sizes, suggesting that microsatellite frequency may be relatively stable in plants. The microsatellite frequencies of these angiosperm species were significantly negatively correlated with both their genome sizes and transposable elements contents. The pattern of microsatellite distribution may differ according to the different genomic regions (such as coding and non-coding sequences). The observed differences in many important microsatellite characteristics (especially the distribution with respect to motif length, type and repeat number) of these angiosperm species were generally accordant with their phylogenetic distance, which suggested that the evolutionary dynamics of microsatellite distribution may be generally consistent with plant divergence/evolution. Importantly, by comparing these microsatellite characteristics (especially the distribution with respect to motif type) the angiosperm species (aside from a few species) all clustered into two obviously different groups that were largely represented by monocots and dicots, suggesting a complex and generally dichotomous evolutionary pattern of microsatellite distribution in angiosperms. 
Polyploidy may lead to a slight increase in microsatellite frequency in the coding sequences and a significant decrease in microsatellite frequency in the whole genome/non-coding sequences, but have little effect on the microsatellite distribution with respect to motif length, type and repeat number. Interestingly, several microsatellite characteristics seemed to be constant in plant evolution, which can be well explained by the general biological rules. PMID:23555856
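
    Microsatellite frequency, as compared across species in this abstract, is commonly expressed as the number of microsatellite loci per megabase of sequence. The sketch below shows one way to compute it with a regular expression. The single `min_repeats` threshold applied to all motif lengths, the 1 to 6 nt motif range, and the function names are assumptions made for illustration; the study's own detection criteria are not given in the abstract.

```python
import re

def find_microsatellites(seq, min_repeats=4, max_motif=6):
    """Return a list of (start, motif, repeat_count) for perfect tandem repeats.

    The lazy quantifier makes the regex try the shortest motif first, so
    'ATATATAT' is reported as motif 'AT' rather than 'ATAT'.
    """
    pattern = re.compile(
        r"([ACGT]{1,%d}?)\1{%d,}" % (max_motif, min_repeats - 1)
    )
    hits = []
    for m in pattern.finditer(seq.upper()):
        motif = m.group(1)
        hits.append((m.start(), motif, len(m.group(0)) // len(motif)))
    return hits

def frequency_per_mb(seq, **kwargs):
    """Number of microsatellite loci per megabase of sequence."""
    return len(find_microsatellites(seq, **kwargs)) / (len(seq) / 1e6)
```

    Running the same function separately on coding and non-coding sequence would give the per-region frequencies that the abstract contrasts.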

  19. Local admixture of amplified and diversified secreted pathogenesis determinants shapes mosaic Toxoplasma gondii genomes

    PubMed Central

    Lorenzi, Hernan; Khan, Asis; Behnke, Michael S.; Namasivayam, Sivaranjani; Swapna, Lakshmipuram S.; Hadjithomas, Michalis; Karamycheva, Svetlana; Pinney, Deborah; Brunk, Brian P.; Ajioka, James W.; Ajzenberg, Daniel; Boothroyd, John C.; Boyle, Jon P.; Dardé, Marie L.; Diaz-Miranda, Maria A.; Dubey, Jitender P.; Fritz, Heather M.; Gennari, Solange M.; Gregory, Brian D.; Kim, Kami; Saeij, Jeroen P. J.; Su, Chunlei; White, Michael W.; Zhu, Xing-Quan; Howe, Daniel K.; Rosenthal, Benjamin M.; Grigg, Michael E.; Parkinson, John; Liu, Liang; Kissinger, Jessica C.; Roos, David S.; David Sibley, L

    2016-01-01

    Toxoplasma gondii is among the most prevalent parasites worldwide, infecting many wild and domestic animals and causing zoonotic infections in humans. T. gondii differs substantially in its broad distribution from closely related parasites that typically have narrow, specialized host ranges. To elucidate the genetic basis for these differences, we compared the genomes of 62 globally distributed T. gondii isolates to several closely related coccidian parasites. Our findings reveal that tandem amplification and diversification of secretory pathogenesis determinants is the primary feature that distinguishes the closely related genomes of these biologically diverse parasites. We further show that the unusual population structure of T. gondii is characterized by clade-specific inheritance of large conserved haploblocks that are significantly enriched in tandemly clustered secretory pathogenesis determinants. The shared inheritance of these conserved haploblocks, which show a different ancestry than the genome as a whole, may thus influence transmission, host range and pathogenicity. PMID:26738725

  20. Large genomic differences between Moraxella bovoculi isolates acquired from the eyes of cattle with conjunctivitis versus the deep nasopharynx of asymptomatic cattle

    USDA-ARS's Scientific Manuscript database

    Moraxella bovoculi is a recently described bacterium that is associated with infectious bovine keratoconjunctivitis (IBK) or "pinkeye" in cattle. In this study, closed circularized genomes were generated for seven M. bovoculi isolates: three that originated from the eyes of clinical IBK bovine case...

  1. Single-Molecule FISH Reveals Non-selective Packaging of Rift Valley Fever Virus Genome Segments

    PubMed Central

    Wichgers Schreur, Paul J.; Kortekaas, Jeroen

    2016-01-01

    The bunyavirus genome comprises a small (S), medium (M), and large (L) RNA segment of negative polarity. Although genome segmentation confers evolutionary advantages by enabling genome reassortment events with related viruses, genome segmentation also complicates genome replication and packaging. Accumulating evidence suggests that genomes of viruses with eight or more genome segments are incorporated into virions by highly selective processes. Remarkably, little is known about the genome packaging process of the tri-segmented bunyaviruses. Here, we evaluated, by single-molecule RNA fluorescence in situ hybridization (FISH), the intracellular spatio-temporal distribution and replication kinetics of the Rift Valley fever virus (RVFV) genome and determined the segment composition of mature virions. The results reveal that the RVFV genome segments start to replicate near the site of infection before spreading and replicating throughout the cytoplasm followed by translocation to the virion assembly site at the Golgi network. Although the average intracellular S, M and L genome segments approached a 1:1:1 ratio, major differences in genome segment ratios were observed among cells. We also observed a significant number of cells lacking evidence of M-segment replication. Analysis of two-segmented replicons and four-segmented viruses subsequently confirmed the previous notion that Golgi recruitment is mediated by the Gn glycoprotein. The absence of colocalization of the different segments in the cytoplasm and the successful rescue of a tri-segmented variant with a codon shuffled M-segment suggested that inter-segment interactions are unlikely to drive the copackaging of the different segments into a single virion. The latter was confirmed by direct visualization of RNPs inside mature virions which showed that the majority of virions lack one or more genome segments. Altogether, this study suggests that RVFV genome packaging is a non-selective process. PMID:27548280

  2. Genomic Repeat Abundances Contain Phylogenetic Signal

    PubMed Central

    Dodsworth, Steven; Chase, Mark W.; Kelly, Laura J.; Leitch, Ilia J.; Macas, Jiří; Novák, Petr; Piednoël, Mathieu; Weiss-Schneeweiss, Hanna; Leitch, Andrew R.

    2015-01-01

    A large proportion of genomic information, particularly repetitive elements, is usually ignored when researchers are using next-generation sequencing. Here we demonstrate the usefulness of this repetitive fraction in phylogenetic analyses, utilizing comparative graph-based clustering of next-generation sequence reads, which results in abundance estimates of different classes of genomic repeats. Phylogenetic trees are then inferred based on the genome-wide abundance of different repeat types treated as continuously varying characters; such repeats are scattered across chromosomes and in angiosperms can constitute a majority of nuclear genomic DNA. In six diverse examples, five angiosperms and one insect, this method provides generally well-supported relationships at interspecific and intergeneric levels that agree with results from more standard phylogenetic analyses of commonly used markers. We propose that this methodology may prove especially useful in groups where there is little genetic differentiation in standard phylogenetic markers. At the same time as providing data for phylogenetic inference, this method additionally yields a wealth of data for comparative studies of genome evolution. PMID:25261464

  3. Large-Scale Gene Relocations following an Ancient Genome Triplication Associated with the Diversification of Core Eudicots.

    PubMed

    Wang, Yupeng; Ficklin, Stephen P; Wang, Xiyin; Feltus, F Alex; Paterson, Andrew H

    2016-01-01

    Different modes of gene duplication including whole-genome duplication (WGD), and tandem, proximal and dispersed duplications are widespread in angiosperm genomes. Small-scale, stochastic gene relocations and transposed gene duplications are widely accepted to be the primary mechanisms for the creation of dispersed duplicates. However, here we show that most surviving ancient dispersed duplicates in core eudicots originated from large-scale gene relocations within a narrow window of time following a genome triplication (γ) event that occurred in the stem lineage of core eudicots. We name these surviving ancient dispersed duplicates as relocated γ duplicates. In Arabidopsis thaliana, relocated γ, WGD and single-gene duplicates have distinct features with regard to gene functions, essentiality, and protein interactions. Relative to γ duplicates, relocated γ duplicates have higher non-synonymous substitution rates, but comparable levels of expression and regulation divergence. Thus, relocated γ duplicates should be distinguished from WGD and single-gene duplicates for evolutionary investigations. Our results suggest large-scale gene relocations following the γ event were associated with the diversification of core eudicots.

  4. Large-Scale Gene Relocations following an Ancient Genome Triplication Associated with the Diversification of Core Eudicots

    PubMed Central

    Wang, Yupeng; Ficklin, Stephen P.; Wang, Xiyin; Feltus, F. Alex; Paterson, Andrew H.

    2016-01-01

    Different modes of gene duplication including whole-genome duplication (WGD), and tandem, proximal and dispersed duplications are widespread in angiosperm genomes. Small-scale, stochastic gene relocations and transposed gene duplications are widely accepted to be the primary mechanisms for the creation of dispersed duplicates. However, here we show that most surviving ancient dispersed duplicates in core eudicots originated from large-scale gene relocations within a narrow window of time following a genome triplication (γ) event that occurred in the stem lineage of core eudicots. We name these surviving ancient dispersed duplicates as relocated γ duplicates. In Arabidopsis thaliana, relocated γ, WGD and single-gene duplicates have distinct features with regard to gene functions, essentiality, and protein interactions. Relative to γ duplicates, relocated γ duplicates have higher non-synonymous substitution rates, but comparable levels of expression and regulation divergence. Thus, relocated γ duplicates should be distinguished from WGD and single-gene duplicates for evolutionary investigations. Our results suggest large-scale gene relocations following the γ event were associated with the diversification of core eudicots. PMID:27195960

  5. Opportunities and challenges of big data for the social sciences: The case of genomic data.

    PubMed

    Liu, Hexuan; Guo, Guang

    2016-09-01

    In this paper, we draw attention to one unique and valuable source of big data, genomic data, by demonstrating the opportunities they provide to social scientists. We discuss different types of large-scale genomic data and recent advances in statistical methods and computational infrastructure used to address challenges in managing and analyzing such data. We highlight how these data and methods can be used to benefit social science research. Copyright © 2016 Elsevier Inc. All rights reserved.

  6. Reptiles and mammals have differentially retained long conserved noncoding sequences from the amniote ancestor.

    PubMed

    Janes, D E; Chapus, C; Gondo, Y; Clayton, D F; Sinha, S; Blatti, C A; Organ, C L; Fujita, M K; Balakrishnan, C N; Edwards, S V

    2011-01-01

    Many noncoding regions of genomes appear to be essential to genome function. Conservation of large numbers of noncoding sequences has been reported repeatedly among mammals but not thus far among birds and reptiles. By searching genomes of chicken (Gallus gallus), zebra finch (Taeniopygia guttata), and green anole (Anolis carolinensis), we quantified the conservation among birds and reptiles and across amniotes of long, conserved noncoding sequences (LCNS), which we define as sequences ≥500 bp in length and exhibiting ≥95% similarity between species. We found 4,294 LCNS shared between chicken and zebra finch and 574 LCNS shared by the two birds and Anolis. The percent of genomes comprised by LCNS in the two birds (0.0024%) is notably higher than the percent in mammals (<0.0003% to <0.001%), differences that we show may be explained in part by differences in genome-wide substitution rates. We reconstruct a large number of LCNS for the amniote ancestor (ca. 8,630) and hypothesize differential loss and substantial turnover of these sites in descendent lineages. By contrast, we estimated a small role for recruitment of LCNS via acquisition of novel functions over time. Across amniotes, LCNS are significantly enriched with transcription factor binding sites for many developmental genes, and 2.9% of LCNS shared between the two birds show evidence of expression in brain expressed sequence tag databases. These results show that the rate of retention of LCNS from the amniote ancestor differs between mammals and Reptilia (including birds) and that this may reflect differing roles and constraints in gene regulation.

  7. Reptiles and Mammals Have Differentially Retained Long Conserved Noncoding Sequences from the Amniote Ancestor

    PubMed Central

    Janes, D.E.; Chapus, C.; Gondo, Y.; Clayton, D.F.; Sinha, S.; Blatti, C.A.; Organ, C.L.; Fujita, M.K.; Balakrishnan, C.N.; Edwards, S.V.

    2010-01-01

    Many noncoding regions of genomes appear to be essential to genome function. Conservation of large numbers of noncoding sequences has been reported repeatedly among mammals but not thus far among birds and reptiles. By searching genomes of chicken (Gallus gallus), zebra finch (Taeniopygia guttata), and green anole (Anolis carolinensis), we quantified the conservation among birds and reptiles and across amniotes of long, conserved noncoding sequences (LCNS), which we define as sequences ≥500 bp in length and exhibiting ≥95% similarity between species. We found 4,294 LCNS shared between chicken and zebra finch and 574 LCNS shared by the two birds and Anolis. The percent of genomes comprised by LCNS in the two birds (0.0024%) is notably higher than the percent in mammals (<0.0003% to <0.001%), differences that we show may be explained in part by differences in genome-wide substitution rates. We reconstruct a large number of LCNS for the amniote ancestor (ca. 8,630) and hypothesize differential loss and substantial turnover of these sites in descendent lineages. By contrast, we estimated a small role for recruitment of LCNS via acquisition of novel functions over time. Across amniotes, LCNS are significantly enriched with transcription factor binding sites for many developmental genes, and 2.9% of LCNS shared between the two birds show evidence of expression in brain expressed sequence tag databases. These results show that the rate of retention of LCNS from the amniote ancestor differs between mammals and Reptilia (including birds) and that this may reflect differing roles and constraints in gene regulation. PMID:21183607

  8. CGDV: a webtool for circular visualization of genomics and transcriptomics data.

    PubMed

    Jha, Vineet; Singh, Gulzar; Kumar, Shiva; Sonawane, Amol; Jere, Abhay; Anamika, Krishanpal

    2017-10-24

    Interpretation of large-scale data is very challenging, and currently there is a scarcity of web tools that support automated visualization of a variety of high-throughput genomics and transcriptomics data for a wide variety of model organisms along with user-defined karyotypes. A circular plot provides holistic visualization of high-throughput large-scale data, but it is very complex and challenging to generate, as most of the available tools need informatics expertise to install and run. We have developed CGDV (Circos for Genomics and Transcriptomics Data Visualization), a webtool based on Circos, for seamless and automated visualization of a variety of large-scale genomics and transcriptomics data. CGDV takes the output of analyzed genomics or transcriptomics data in different formats, such as vcf, bed, xls, tab-delimited matrix text file, CNVnator raw output and gene fusion raw output, to plot a circular view of the sample data. CGDV takes care of generating the intermediate files required for Circos. CGDV is freely available at https://cgdv-upload.persistent.co.in/cgdv/ . The circular plot for each data type is tailored to gain the best biological insights into the data. The inter-relationship between data points, homologous sequences, genes involved in fusion events, differential expression pattern, sequencing depth, types and size of variations and enrichment of DNA binding proteins can be seen using CGDV. CGDV thus helps biologists and bioinformaticians to visualize a variety of genomics and transcriptomics data seamlessly.

  9. Molecular analysis of the anaerobic rumen fungus Orpinomyces - insights into an AT-rich genome.

    PubMed

    Nicholson, Matthew J; Theodorou, Michael K; Brookman, Jayne L

    2005-01-01

    The anaerobic gut fungi occupy a unique niche in the intestinal tract of large herbivorous animals and are thought to act as primary colonizers of plant material during digestion. They are the only known obligately anaerobic fungi but molecular analysis of this group has been hampered by difficulties in their culture and manipulation, and by their extremely high A+T nucleotide content. This study begins to answer some of the fundamental questions about the structure and organization of the anaerobic gut fungal genome. Directed plasmid libraries using genomic DNA digested with highly or moderately rich AT-specific restriction enzymes (VspI and EcoRI) were prepared from a polycentric Orpinomyces isolate. Clones were sequenced from these libraries and the breadth of genomic inserts, both genic and intergenic, was characterized. Genes encoding numerous functions not previously characterized for these fungi were identified, including cytoskeletal, secretory pathway and transporter genes. A peptidase gene with no introns and having sequence similarity to a gene encoding a bacterial peptidase was also identified, extending the range of metabolic enzymes resulting from apparent trans-kingdom transfer from bacteria to fungi, as previously characterized largely for genes encoding plant-degrading enzymes. This paper presents the first thorough analysis of the genic, intergenic and rDNA regions of a variety of genomic segments from an anaerobic gut fungus and provides observations on rules governing intron boundaries, the codon biases observed with different types of genes, and the sequence of only the second anaerobic gut fungal promoter reported. Large numbers of retrotransposon sequences of different types were found and the authors speculate on the possible consequences of any such transposon activity in the genome. 
    The coding sequences identified included several orphan gene sequences, including one with regions strongly suggestive of structural proteins such as collagens and lampirin. This gene was present as a single copy in Orpinomyces, was expressed during vegetative growth and was also detected in genomes from another gut fungal genus, Neocallimastix.

  10. From Conventional to Next Generation Sequencing of Epstein-Barr Virus Genomes.

    PubMed

    Kwok, Hin; Chiang, Alan Kwok Shing

    2016-02-24

    Genomic sequences of Epstein-Barr virus (EBV) have been of interest because the virus is associated with cancers, such as nasopharyngeal carcinoma, and conditions such as infectious mononucleosis. The progress of whole-genome EBV sequencing has been limited by the inefficiency and cost of first-generation sequencing technology. With the advancement of next-generation sequencing (NGS) and target enrichment strategies, an increasing number of EBV genomes have been published. These genomes were sequenced using different approaches, either with or without EBV DNA enrichment. This review provides an overview of the EBV genomes published to date, and a description of the sequencing technology and bioinformatic analyses employed in generating these sequences. We further explored ways through which the quality of sequencing data can be improved, such as using DNA oligos for capture hybridization, and longer insert size and read length in the sequencing runs. These advances will enable large-scale genomic sequencing of EBV, which will facilitate a better understanding of the genetic variations of EBV in different geographic regions and discovery of potentially pathogenic variants in specific diseases.

  11. Three chromosomal rearrangements promote genomic divergence between migratory and stationary ecotypes of Atlantic cod.

    PubMed

    Berg, Paul R; Star, Bastiaan; Pampoulie, Christophe; Sodeland, Marte; Barth, Julia M I; Knutsen, Halvor; Jakobsen, Kjetill S; Jentoft, Sissel

    2016-03-17

    Identification of genome-wide patterns of divergence provides insight on how genomes are influenced by selection and can reveal the potential for local adaptation in spatially structured populations. In Atlantic cod - historically a major marine resource - Northeast-Arctic and Norwegian coastal cod are recognized by fundamental differences in migratory and non-migratory behavior, respectively. However, the genomic architecture underlying such behavioral ecotypes is unclear. Here, we have analyzed more than 8,000 polymorphic SNPs distributed throughout all 23 linkage groups and show that loci putatively under selection are localized within three distinct genomic regions, each several megabases long, covering approximately 4% of the Atlantic cod genome. These regions likely represent genomic inversions. The frequencies of these distinct regions differ markedly between the ecotypes, which spawn in the vicinity of each other, in contrast to the low level of divergence in the rest of the genome. The observed patterns strongly suggest that these chromosomal rearrangements are instrumental in local adaptation and separation of Atlantic cod populations, leaving footprints of large genomic regions under selection. Our findings demonstrate the power of using genomic information in further understanding the population dynamics and defining management units in one of the world's most economically important marine resources.

  12. Using large-scale genome variation cohorts to decipher the molecular mechanism of cancer.

    PubMed

    Habermann, Nina; Mardin, Balca R; Yakneen, Sergei; Korbel, Jan O

    2016-01-01

    Characterizing genomic structural variations (SVs) in the human genome remains challenging, and there is a growing interest to understand somatic SVs occurring in cancer, a disease of the genome. A havoc-causing SV process known as chromothripsis scars the genome when localized chromosome shattering and repair occur in a one-off catastrophe. Recent efforts led to the development of a set of conceptual criteria for the inference of chromothripsis events in cancer genomes and to the development of experimental model systems for studying this striking DNA alteration process in vitro. We discuss these approaches, and additionally touch upon current "Big Data" efforts that employ hybrid cloud computing to enable studies of numerous cancer genomes in an effort to search for commonalities and differences in molecular DNA alteration processes in cancer. Copyright © 2016. Published by Elsevier SAS.

  13. Burkholderia xenovorans LB400 harbors a multi-replicon, 9.73-Mbp genome shaped for versatility

    PubMed Central

    Chain, Patrick S. G.; Denef, Vincent J.; Konstantinidis, Konstantinos T.; Vergez, Lisa M.; Agulló, Loreine; Reyes, Valeria Latorre; Hauser, Loren; Córdova, Macarena; Gómez, Luis; González, Myriam; Land, Miriam; Lao, Victoria; Larimer, Frank; LiPuma, John J.; Mahenthiralingam, Eshwar; Malfatti, Stephanie A.; Marx, Christopher J.; Parnell, J. Jacob; Ramette, Alban; Richardson, Paul; Seeger, Michael; Smith, Daryl; Spilker, Theodore; Sul, Woo Jun; Tsoi, Tamara V.; Ulrich, Luke E.; Zhulin, Igor B.; Tiedje, James M.

    2006-01-01

    Burkholderia xenovorans LB400 (LB400), a well studied, effective polychlorinated biphenyl-degrader, has one of the two largest known bacterial genomes and is the first nonpathogenic Burkholderia isolate sequenced. From an evolutionary perspective, we find significant differences in functional specialization between the three replicons of LB400, as well as a more relaxed selective pressure for genes located on the two smaller vs. the largest replicon. High genomic plasticity, diversity, and specialization within the Burkholderia genus are exemplified by the conservation of only 44% of the genes between LB400 and Burkholderia cepacia complex strain 383. Even among four B. xenovorans strains, genome size varies from 7.4 to 9.73 Mbp. The latter is largely explained by our findings that >20% of the LB400 sequence was recently acquired by means of lateral gene transfer. Although a range of genetic factors associated with in vivo survival and intercellular interactions are present, these genetic factors are likely related to niche breadth rather than determinants of pathogenicity. The presence of at least eleven “central aromatic” and twenty “peripheral aromatic” pathways in LB400, among the highest in any sequenced bacterial genome, supports this hypothesis. Finally, in addition to the experimentally observed redundancy in benzoate degradation and formaldehyde oxidation pathways, the fact that 17.6% of proteins have a better LB400 paralog than an ortholog in a different genome highlights the importance of gene duplication and repeated acquirement, which, coupled with their divergence, raises questions regarding the role of paralogs and potential functional redundancies in large-genome microbes. PMID:17030797

  14. Genomic diversity of necrotic enteritis-associated strains of Clostridium perfringens: a review.

    PubMed

    Lacey, Jake A; Johanesen, Priscilla A; Lyras, Dena; Moore, Robert J

    2016-06-01

    The investigation of genomic variation between Clostridium perfringens isolates from poultry has been an important tool to enhance our understanding of the genetic basis of strain pathogenicity and the epidemiology of virulent and avirulent strains within the context of necrotic enteritis (NE). The earliest studies used whole genome profiling techniques such as pulsed-field gel electrophoresis to differentiate isolates and determine their relative levels of relatedness. DNA sequencing has been used to investigate genetic variation in (a) individual genes, such as those encoding the alpha and NetB toxins; (b) panels of housekeeping genes for multi-locus sequence typing and (c) most recently whole genome sequencing to build a more complete picture of genomic differences between isolates. Conclusions drawn from these studies include: differential carriage of large conjugative plasmids accounts for a large proportion of inter-strain differences; plasmid-encoded genes are more highly conserved than chromosomal genes, perhaps indicating a relatively recent origin for the plasmids; isolates from NE-affected birds fall into three distinct sequence-based clades while non-pathogenic isolates from healthy birds tend to be more genomically diverse. Overall, the NE causing strains are closely related to C. perfringens isolates from other birds and other diseases whereas the non-pathogenic poultry strains are generally more remotely related to either the pathogenic strains or the strains from other birds. Genomic analysis has indicated that genes in addition to netB are associated with NE pathogenic isolates. Collectively, this work has resulted in a deeper understanding of the pathogenesis of this important poultry disease.

  15. Burkholderia xenovorans LB400 harbors a multi-replicon, 9.73-Mbp genome shaped for versatility

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Chain, Patrick S. G.; Denef, Vincent; Konstantinidis, Konstantinos T

    2006-01-01

    Burkholderia xenovorans LB400 (LB400), a well studied, effective polychlorinated biphenyl-degrader, has one of the two largest known bacterial genomes and is the first nonpathogenic Burkholderia isolate sequenced. From an evolutionary perspective, we find significant differences in functional specialization between the three replicons of LB400, as well as a more relaxed selective pressure for genes located on the two smaller vs. the largest replicon. High genomic plasticity, diversity, and specialization within the Burkholderia genus are exemplified by the conservation of only 44% of the genes between LB400 and Burkholderia cepacia complex strain 383. Even among four B. xenovorans strains, genome size varies from 7.4 to 9.73 Mbp. The latter is largely explained by our findings that >20% of the LB400 sequence was recently acquired by means of lateral gene transfer. Although a range of genetic factors associated with in vivo survival and intercellular interactions are present, these genetic factors are likely related to niche breadth rather than determinants of pathogenicity. The presence of at least eleven 'central aromatic' and twenty 'peripheral aromatic' pathways in LB400, among the highest in any sequenced bacterial genome, supports this hypothesis. Finally, in addition to the experimentally observed redundancy in benzoate degradation and formaldehyde oxidation pathways, the fact that 17.6% of proteins have a better LB400 paralog than an ortholog in a different genome highlights the importance of gene duplication and repeated acquirement, which, coupled with their divergence, raises questions regarding the role of paralogs and potential functional redundancies in large-genome microbes.

  16. Genomic insights from whole genome sequencing of four clonal outbreak Campylobacter jejuni assessed within the global C. jejuni population.

    PubMed

    Clark, Clifford G; Berry, Chrystal; Walker, Matthew; Petkau, Aaron; Barker, Dillon O R; Guan, Cai; Reimer, Aleisha; Taboada, Eduardo N

    2016-12-03

    Whole genome sequencing (WGS) is useful for determining clusters of human cases, investigating outbreaks, and defining the population genetics of bacteria. It also provides information about other aspects of bacterial biology, including classical typing results, virulence, and adaptive strategies of the organism. Cell culture invasion and protein expression patterns of four related multilocus sequence type 21 (ST21) C. jejuni isolates from a significant Canadian water-borne outbreak were previously associated with the presence of a CJIE1 prophage. Whole genome sequencing was used to examine the genetic diversity among these isolates and confirm that previous observations could be attributed to differential prophage carriage. Moreover, we sought to determine the presence of genome sequences that could be used as surrogate markers to delineate outbreak-associated isolates. Differential carriage of the CJIE1 prophage was identified as the major genetic difference among the four outbreak isolates. High quality single-nucleotide variant (hqSNV) and core genome multilocus sequence typing (cgMLST) clustered these isolates within expanded datasets consisting of additional C. jejuni strains. The number and location of homopolymeric tract regions was identical in all four outbreak isolates but differed from all other C. jejuni examined. Comparative genomics and PCR amplification enabled the identification of large chromosomal inversions of approximately 93 kb and 388 kb within the outbreak isolates associated with transducer-like proteins containing long nucleotide repeat sequences. The 93-kb inversion was characteristic of the outbreak-associated isolates, and the gene content of this inverted region displayed high synteny with the reference strain. 
    The four outbreak isolates were clonally derived and differed mainly in the presence of the CJIE1 prophage, validating earlier findings linking the prophage to phenotypic differences in virulence assays and protein expression. The identification of large, genetically syntenous chromosomal inversions in the genomes of outbreak-associated isolates provided a unique method for discriminating outbreak isolates from the background population. Transducer-like proteins appear to be associated with the chromosomal inversions. CgMLST and hqSNV analysis also effectively delineated the outbreak isolates within the larger C. jejuni population structure.

  17. Between Two Fern Genomes

    PubMed Central

    2014-01-01

    Ferns are the only major lineage of vascular plants not represented by a sequenced nuclear genome. This lack of genome sequence information significantly impedes our ability to understand and reconstruct genome evolution not only in ferns, but across all land plants. Azolla and Ceratopteris are ideal and complementary candidates to be the first ferns to have their nuclear genomes sequenced. They differ dramatically in genome size, life history, and habit, and thus represent the immense diversity of extant ferns. Together, this pair of genomes will facilitate myriad large-scale comparative analyses across ferns and all land plants. Here we review the unique biological characteristics of ferns and describe a number of outstanding questions in plant biology that will benefit from the addition of ferns to the set of taxa with sequenced nuclear genomes. We explain why the fern clade is pivotal for understanding genome evolution across land plants, and we provide a rationale for how knowledge of fern genomes will enable progress in research beyond the ferns themselves. PMID:25324969

  18. Viruses Roll the Dice: The Stochastic Behavior of Viral Genome Molecules Accelerates Viral Adaptation at the Cell and Tissue Levels

    PubMed Central

    Miyashita, Shuhei; Ishibashi, Kazuhiro; Kishino, Hirohisa; Ishikawa, Masayuki

    2015-01-01

    Recent studies on evolutionarily distant viral groups have shown that the number of viral genomes that establish cell infection after cell-to-cell transmission is unexpectedly small (1–20 genomes). This aspect of viral infection appears to be important for the adaptation and survival of viruses. To clarify how the number of viral genomes that establish cell infection is determined, we developed a simulation model of cell infection for tomato mosaic virus (ToMV), a positive-strand RNA virus. The model showed that stochastic processes that govern the replication or degradation of individual genomes result in the infection by a small number of genomes, while a large number of infectious genomes are introduced in the cell. It also predicted two interesting characteristics regarding cell infection patterns: stochastic variation among cells in the number of viral genomes that establish infection and stochastic inequality in the accumulation of their progenies in each cell. Both characteristics were validated experimentally by inoculating tobacco cells with a library of nucleotide sequence–tagged ToMV and analyzing the viral genomes that accumulated in each cell using a high-throughput sequencer. An additional simulation model revealed that these two characteristics enhance selection during tissue infection. The cell infection model also predicted a mechanism that enhances selection at the cellular level: a small difference in the replication abilities of coinfected variants results in a large difference in individual accumulation via the multiple-round formation of the replication complex (i.e., the replication machinery). Importantly, this predicted effect was observed in vivo. The cell infection model was robust to changes in the parameter values, suggesting that other viruses could adopt similar adaptation mechanisms. 
Taken together, these data reveal a comprehensive picture of viral infection processes including replication, cell-to-cell transmission, and evolution, which are based on the stochastic behavior of the viral genome molecules in each cell. PMID:25781391

  19. Comparative genomics reveals insights into avian genome evolution and adaptation.

    PubMed

    Zhang, Guojie; Li, Cai; Li, Qiye; Li, Bo; Larkin, Denis M; Lee, Chul; Storz, Jay F; Antunes, Agostinho; Greenwold, Matthew J; Meredith, Robert W; Ödeen, Anders; Cui, Jie; Zhou, Qi; Xu, Luohao; Pan, Hailin; Wang, Zongji; Jin, Lijun; Zhang, Pei; Hu, Haofu; Yang, Wei; Hu, Jiang; Xiao, Jin; Yang, Zhikai; Liu, Yang; Xie, Qiaolin; Yu, Hao; Lian, Jinmin; Wen, Ping; Zhang, Fang; Li, Hui; Zeng, Yongli; Xiong, Zijun; Liu, Shiping; Zhou, Long; Huang, Zhiyong; An, Na; Wang, Jie; Zheng, Qiumei; Xiong, Yingqi; Wang, Guangbiao; Wang, Bo; Wang, Jingjing; Fan, Yu; da Fonseca, Rute R; Alfaro-Núñez, Alonzo; Schubert, Mikkel; Orlando, Ludovic; Mourier, Tobias; Howard, Jason T; Ganapathy, Ganeshkumar; Pfenning, Andreas; Whitney, Osceola; Rivas, Miriam V; Hara, Erina; Smith, Julia; Farré, Marta; Narayan, Jitendra; Slavov, Gancho; Romanov, Michael N; Borges, Rui; Machado, João Paulo; Khan, Imran; Springer, Mark S; Gatesy, John; Hoffmann, Federico G; Opazo, Juan C; Håstad, Olle; Sawyer, Roger H; Kim, Heebal; Kim, Kyu-Won; Kim, Hyeon Jeong; Cho, Seoae; Li, Ning; Huang, Yinhua; Bruford, Michael W; Zhan, Xiangjiang; Dixon, Andrew; Bertelsen, Mads F; Derryberry, Elizabeth; Warren, Wesley; Wilson, Richard K; Li, Shengbin; Ray, David A; Green, Richard E; O'Brien, Stephen J; Griffin, Darren; Johnson, Warren E; Haussler, David; Ryder, Oliver A; Willerslev, Eske; Graves, Gary R; Alström, Per; Fjeldså, Jon; Mindell, David P; Edwards, Scott V; Braun, Edward L; Rahbek, Carsten; Burt, David W; Houde, Peter; Zhang, Yong; Yang, Huanming; Wang, Jian; Jarvis, Erich D; Gilbert, M Thomas P; Wang, Jun

    2014-12-12

    Birds are the most species-rich class of tetrapod vertebrates and have wide relevance across many research fields. We explored bird macroevolution using full genomes from 48 avian species representing all major extant clades. The avian genome is principally characterized by its constrained size, which predominantly arose because of lineage-specific erosion of repetitive elements, large segmental deletions, and gene loss. Avian genomes furthermore show a remarkably high degree of evolutionary stasis at the levels of nucleotide sequence, gene synteny, and chromosomal structure. Despite this pattern of conservation, we detected many non-neutral evolutionary changes in protein-coding genes and noncoding regions. These analyses reveal that pan-avian genomic diversity covaries with adaptations to different lifestyles and convergent evolution of traits. Copyright © 2014, American Association for the Advancement of Science.

  20. Insights from 20 years of bacterial genome sequencing

    DOE PAGES

    Land, Miriam L.; Hauser, Loren; Jun, Se-Ran; ...

    2015-02-27

    Since the first two complete bacterial genome sequences were published in 1995, the science of bacteria has dramatically changed. Using third-generation DNA sequencing, it is possible to completely sequence a bacterial genome in a few hours and identify some types of methylation sites along the genome as well. Sequencing of bacterial genomes is now a standard procedure, and the information from tens of thousands of bacterial genomes has had a major impact on our views of the bacterial world. In this review, we explore a series of questions to highlight some insights that comparative genomics has produced. To date, there are genome sequences available from 50 different bacterial phyla and 11 different archaeal phyla. However, the distribution is quite skewed towards a few phyla that contain model organisms. But the breadth is continuing to improve, with projects dedicated to filling in less characterized taxonomic groups. The clustered regularly interspaced short palindromic repeats (CRISPR)-Cas system provides bacteria with immunity against viruses, which outnumber bacteria by tenfold. How fast can we go? Second-generation sequencing has produced a large number of draft genomes (close to 90 % of bacterial genomes in GenBank are currently not complete); third-generation sequencing can potentially produce a finished genome in a few hours, and at the same time provide methylation sites along the entire chromosome. The diversity of bacterial communities is extensive as is evident from the genome sequences available from 50 different bacterial phyla and 11 different archaeal phyla. Genome sequencing can help in classifying an organism, and in the case where multiple genomes of the same species are available, it is possible to calculate the pan- and core genomes; comparison of more than 2000 Escherichia coli genomes finds an E. coli core genome of about 3100 gene families and a total of about 89,000 different gene families.
Why do we care about bacterial genome sequencing? There are many practical applications, such as genome-scale metabolic modeling, biosurveillance, bioforensics, and infectious disease epidemiology. In the near future, high-throughput sequencing of patient metagenomic samples could revolutionize medicine in terms of speed and accuracy of finding pathogens and knowing how to treat them.
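
    The pan- and core-genome calculation mentioned in this abstract can be illustrated with plain set operations. The following is a minimal sketch, not the method used in the review: it assumes each genome has already been reduced to a set of gene-family identifiers, and the function name `pan_and_core`, the strain names, and the family IDs are all hypothetical.

```python
def pan_and_core(genomes):
    """Return (pan, core) for a dict mapping genome name -> gene-family IDs.

    pan  = union of all families (found in at least one genome)
    core = intersection of all families (found in every genome)
    """
    families = [set(v) for v in genomes.values()]
    if not families:
        return set(), set()
    pan = set().union(*families)
    core = set.intersection(*families)
    return pan, core


# Hypothetical toy input: three strains with made-up gene-family IDs.
toy = {
    "strain_A": {"fam1", "fam2", "fam3"},
    "strain_B": {"fam1", "fam2", "fam4"},
    "strain_C": {"fam1", "fam2", "fam5", "fam6"},
}

pan, core = pan_and_core(toy)
print(len(pan), len(core))  # 6 families in the pan-genome, 2 in the core
```

    In this toy case the core shrinks and the pan-genome grows as strains are added, the same qualitative pattern the abstract reports for E. coli (about 3100 core families against about 89,000 in total).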

  1. Insights from 20 years of bacterial genome sequencing

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Land, Miriam L.; Hauser, Loren; Jun, Se-Ran

    Since the first two complete bacterial genome sequences were published in 1995, the science of bacteria has dramatically changed. Using third-generation DNA sequencing, it is possible to completely sequence a bacterial genome in a few hours and identify some types of methylation sites along the genome as well. Sequencing of bacterial genomes is now a standard procedure, and the information from tens of thousands of bacterial genomes has had a major impact on our views of the bacterial world. In this review, we explore a series of questions to highlight some insights that comparative genomics has produced. To date, there are genome sequences available from 50 different bacterial phyla and 11 different archaeal phyla. However, the distribution is quite skewed towards a few phyla that contain model organisms. But the breadth is continuing to improve, with projects dedicated to filling in less characterized taxonomic groups. The clustered regularly interspaced short palindromic repeats (CRISPR)-Cas system provides bacteria with immunity against viruses, which outnumber bacteria by tenfold. How fast can we go? Second-generation sequencing has produced a large number of draft genomes (close to 90 % of bacterial genomes in GenBank are currently not complete); third-generation sequencing can potentially produce a finished genome in a few hours, and at the same time provide methylation sites along the entire chromosome. The diversity of bacterial communities is extensive as is evident from the genome sequences available from 50 different bacterial phyla and 11 different archaeal phyla. Genome sequencing can help in classifying an organism, and in the case where multiple genomes of the same species are available, it is possible to calculate the pan- and core genomes; comparison of more than 2000 Escherichia coli genomes finds an E. coli core genome of about 3100 gene families and a total of about 89,000 different gene families.
Why do we care about bacterial genome sequencing? There are many practical applications, such as genome-scale metabolic modeling, biosurveillance, bioforensics, and infectious disease epidemiology. In the near future, high-throughput sequencing of patient metagenomic samples could revolutionize medicine in terms of speed and accuracy of finding pathogens and knowing how to treat them.

  2. Legume genome evolution viewed through the Medicago truncatula and Lotus japonicus genomes

    PubMed Central

    Cannon, Steven B.; Sterck, Lieven; Rombauts, Stephane; Sato, Shusei; Cheung, Foo; Gouzy, Jérôme; Wang, Xiaohong; Mudge, Joann; Vasdewani, Jayprakash; Schiex, Thomas; Spannagl, Manuel; Monaghan, Erin; Nicholson, Christine; Humphray, Sean J.; Schoof, Heiko; Mayer, Klaus F. X.; Rogers, Jane; Quétier, Francis; Oldroyd, Giles E.; Debellé, Frédéric; Cook, Douglas R.; Retzel, Ernest F.; Roe, Bruce A.; Town, Christopher D.; Tabata, Satoshi; Van de Peer, Yves; Young, Nevin D.

    2006-01-01

    Genome sequencing of the model legumes, Medicago truncatula and Lotus japonicus, provides an opportunity for large-scale sequence-based comparison of two genomes in the same plant family. Here we report synteny comparisons between these species, including details about chromosome relationships, large-scale synteny blocks, microsynteny within blocks, and genome regions lacking clear correspondence. The Lotus and Medicago genomes share a minimum of 10 large-scale synteny blocks, each with substantial collinearity and frequently extending the length of whole chromosome arms. The proportion of genes syntenic and collinear within each synteny block is relatively homogeneous. Medicago–Lotus comparisons also indicate similar and largely homogeneous gene densities, although gene-containing regions in Mt occupy 20–30% more space than Lj counterparts, primarily because of larger numbers of Mt retrotransposons. Because the interpretation of genome comparisons is complicated by large-scale genome duplications, we describe synteny, synonymous substitutions and phylogenetic analyses to identify and date a probable whole-genome duplication event. There is no direct evidence for any recent large-scale genome duplication in either Medicago or Lotus but instead a duplication predating speciation. Phylogenetic comparisons place this duplication within the Rosid I clade, clearly after the split between legumes and Salicaceae (poplar). PMID:17003129

  3. VAT: a computational framework to functionally annotate variants in personal genomes within a cloud-computing environment.

    PubMed

    Habegger, Lukas; Balasubramanian, Suganthi; Chen, David Z; Khurana, Ekta; Sboner, Andrea; Harmanci, Arif; Rozowsky, Joel; Clarke, Declan; Snyder, Michael; Gerstein, Mark

    2012-09-01

    The functional annotation of variants obtained through sequencing projects is generally assumed to be a simple intersection of genomic coordinates with genomic features. However, complexities arise for several reasons, including the differential effects of a variant on alternatively spliced transcripts, as well as the difficulty in assessing the impact of small insertions/deletions and large structural variants. Taking these factors into consideration, we developed the Variant Annotation Tool (VAT) to functionally annotate variants from multiple personal genomes at the transcript level as well as obtain summary statistics across genes and individuals. VAT also allows visualization of the effects of different variants, integrates allele frequencies and genotype data from the underlying individuals and facilitates comparative analysis between different groups of individuals. VAT can either be run through a command-line interface or as a web application. Finally, in order to enable on-demand access and to minimize unnecessary transfers of large data files, VAT can be run as a virtual machine in a cloud-computing environment. VAT is implemented in C and PHP. The VAT web service, Amazon Machine Image, source code and detailed documentation are available at vat.gersteinlab.org.
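
    The "simple intersection of genomic coordinates with genomic features" that this abstract takes as its starting point, and the way alternative transcripts complicate it, can be sketched in a few lines. This is an illustrative sketch only and not VAT's implementation (VAT itself is written in C and PHP); the `Feature` type, the `annotate` function, the half-open 0-based coordinate convention, and the feature names are all assumptions made for the example.

```python
from typing import NamedTuple


class Feature(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive (assumed convention)
    end: int    # 0-based, exclusive (assumed convention)
    name: str


def annotate(chrom, pos, features):
    """Return names of all features whose interval [start, end) contains pos."""
    return [f.name for f in features
            if f.chrom == chrom and f.start <= pos < f.end]


# Hypothetical annotation: two transcripts of the same gene with
# overlapping but different exon boundaries, plus an unrelated gene.
features = [
    Feature("chr1", 100, 200, "geneA:transcript1:exon1"),
    Feature("chr1", 150, 250, "geneA:transcript2:exon1"),
    Feature("chr2", 100, 200, "geneB:transcript1:exon1"),
]

print(annotate("chr1", 160, features))  # hits both transcripts of geneA
print(annotate("chr1", 120, features))  # hits transcript1 only
```

    The second call shows why annotation at the transcript level matters: the same gene is affected in one isoform but not the other. A linear scan is used here for clarity; a real tool would index features (for example with an interval tree) before querying millions of variants.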

  4. Structure of faustovirus, a large dsDNA virus

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Klose, Thomas; Reteno, Dorine G.; Benamar, Samia

    Many viruses protect their genome with a combination of a protein shell with or without a membrane layer. In this paper, we describe the structure of faustovirus, the first DNA virus (to our knowledge) that has been found to use two protein shells to encapsidate and protect its genome. The crystal structure of the major capsid protein, in combination with cryo-electron microscopy structures of two different maturation stages of the virus, shows that the outer virus shell is composed of a double jelly-roll protein that can be found in many double-stranded DNA viruses. The structure of the repeating hexameric unit of the inner shell is different from all other known capsid proteins. In addition to the unique architecture, the region of the genome that encodes the major capsid protein stretches over 17,000 bp and contains a large number of introns and exons. Finally, this complexity might help the virus to rapidly adapt to new environments or hosts.

  5. Structure of faustovirus, a large dsDNA virus

    DOE PAGES

    Klose, Thomas; Reteno, Dorine G.; Benamar, Samia; ...

    2016-05-16

    Many viruses protect their genome with a combination of a protein shell with or without a membrane layer. In this paper, we describe the structure of faustovirus, the first DNA virus (to our knowledge) that has been found to use two protein shells to encapsidate and protect its genome. The crystal structure of the major capsid protein, in combination with cryo-electron microscopy structures of two different maturation stages of the virus, shows that the outer virus shell is composed of a double jelly-roll protein that can be found in many double-stranded DNA viruses. The structure of the repeating hexameric unit of the inner shell is different from all other known capsid proteins. In addition to the unique architecture, the region of the genome that encodes the major capsid protein stretches over 17,000 bp and contains a large number of introns and exons. Finally, this complexity might help the virus to rapidly adapt to new environments or hosts.

  6. The Neandertal genome and ancient DNA authenticity

    PubMed Central

    Green, Richard E; Briggs, Adrian W; Krause, Johannes; Prüfer, Kay; Burbano, Hernán A; Siebauer, Michael; Lachmann, Michael; Pääbo, Svante

    2009-01-01

    Recent advances in high-throughput DNA sequencing have made genome-scale analyses of genomes of extinct organisms possible. With these new opportunities come new difficulties in assessing the authenticity of the DNA sequences retrieved. We discuss how these difficulties can be addressed, particularly with regard to analyses of the Neandertal genome. We argue that only direct assays of DNA sequence positions in which Neandertals differ from all contemporary humans can serve as a reliable means to estimate human contamination. Indirect measures, such as the extent of DNA fragmentation, nucleotide misincorporations, or comparison of derived allele frequencies in different fragment size classes, are unreliable. Fortunately, interim approaches based on mtDNA differences between Neandertals and current humans, detection of male contamination through Y chromosomal sequences, and repeated sequencing from the same fossil to detect autosomal contamination allow initial large-scale sequencing of Neandertal genomes. This will result in the discovery of fixed differences in the nuclear genome between Neandertals and current humans that can serve as future direct assays for contamination. For analyses of other fossil hominins, which may become possible in the future, we suggest a similar ‘boot-strap’ approach in which interim approaches are applied until sufficient data for more definitive direct assays are acquired. PMID:19661919

  7. Accommodating the load

    PubMed Central

    Metcalfe, Cushla J.; Casane, Didier

    2013-01-01

    Very large genomes, that is, those above 20 Gb, are rare but widely distributed throughout the eukaryotes. They are found within the diatoms, dinoflagellates, metazoans and green plants, but so far have not been found in the excavates. There is a known positive correlation between genome size and the proportion of the genome composed of transposable elements (TEs). Very large genomes may therefore be expected to be almost entirely composed of TEs. Of the large genomes examined, in the angiosperms, gymnosperms and the dinoflagellates only a small portion of the genome was identified as TEs; most of these genomes were unidentified and may be novel or diverse TEs. In the salamanders and lungfish, 25 to 47% of the genome was identifiable retrotransposons, that is, TEs that copy themselves before insertion. However, the predominant class of TEs found in the lungfish was not the same as that found in the salamanders. The little data we have at the moment therefore suggests that the diversity and abundance of TEs is variable between taxa with large genomes, similar to patterns found in taxa with smaller genomes. Based on results from the human genome, we suggest that the ‘missing’ portion of the lungfish and salamander genomes are old, highly divergent, and therefore inactive copies of TEs. The data available indicate that, unlike plants with large genomes, neither the lungfish nor the salamanders show an increased risk of extinction. Based on a slow rate of DNA loss in salamanders it has been suggested that the large salamander genome is the result of run-away genome expansion involving genome size increases via TE proliferation associated with reduced recombination rate. We know of no studies on DNA loss or recombination rates in lungfish genomes; however, a similar scenario could describe the process of genome expansion in the lungfish.
A series of waves of TE transposition and sequence decay would describe the pattern of TE content seen in both the lungfish and the salamanders. The lungfish and salamanders, therefore, may accommodate their large load of TEs because these TEs have accumulated gradually over a long period of time and have been subject to inactivation and decay. PMID:24616835

  8. ELSI practices in genomic research in East Asia: implications for research collaboration and public participation

    PubMed Central

    2014-01-01

    Common infrastructures and platforms are required for international collaborations in large-scale human genomic research and policy development, such as the Global Alliance for Genomics and Health and the ‘ELSI 2.0’ initiative. Such initiatives may require international harmonization of ethical and regulatory requirements. To enable this, however, a greater understanding of issues and practices that relate to the ethical, legal and social implications (ELSI) of genomic research will be needed for the different countries and global regions involved in such research. Here, we review the ELSI practices and regulations for genomic research in six East Asian countries (China, Indonesia, Japan, Singapore, South Korea and Taiwan), highlighting the main similarities and differences between these countries, and more generally, in relation to Western countries. While there are significant differences in ELSI practices among these East Asian countries, there is a consistent emphasis on advancing genomic science and technology. In addition, considerable emphasis is placed on informed consent for participation in research, whether through the contribution of tissue samples or personal information. However, a higher level of engagement with interested stakeholders and the public will be needed in some countries. PMID:24944586

  9. Atlantic salmon populations reveal adaptive divergence of immune related genes - a duplicated genome under selection.

    PubMed

    Kjærner-Semb, Erik; Ayllon, Fernando; Furmanek, Tomasz; Wennevik, Vidar; Dahle, Geir; Niemelä, Eero; Ozerov, Mikhail; Vähä, Juha-Pekka; Glover, Kevin A; Rubin, Carl J; Wargelius, Anna; Edvardsen, Rolf B

    2016-08-11

    Populations of Atlantic salmon display highly significant genetic differences with unresolved molecular basis. These differences may result from separate postglacial colonization patterns, diversifying natural selection and adaptation, or a combination. Adaptation could be influenced or even facilitated by the recent whole genome duplication in the salmonid lineage which resulted in a partly tetraploid species with duplicated genes and regions. In order to elucidate the genes and genomic regions underlying the genetic differences, we conducted a genome wide association study using whole genome resequencing data from eight populations from Northern and Southern Norway. From a total of ~4.5 million sequencing-derived SNPs, more than 10 % showed significant differentiation between populations from these two regions and ten selective sweeps on chromosomes 5, 10, 11, 13-15, 21, 24 and 25 were identified. These comprised 59 genes, of which 15 had one or more differentiated missense mutation. Our analysis showed that most sweeps have paralogous regions in the partially tetraploid genome, each lacking the high number of significant SNPs found in the sweeps. The most significant sweep was found on Chr 25 and carried several missense mutations in the antiviral mx genes, suggesting that these populations have experienced differing viral pressures. Interestingly the second most significant sweep, found on Chr 5, contains two genes involved in the NF-KB pathway (nkap and nkrf), which is also a known pathogen target that controls a large number of processes in animals. Our results show that natural selection acting on immune related genes has contributed to genetic divergence between salmon populations in Norway. The differences between populations may have been facilitated by the plasticity of the salmon genome. The observed signatures of selection in duplicated genomic regions suggest that the recently duplicated genome has provided raw material for evolutionary adaptation.

  10. Global biogeography of Prochlorococcus genome diversity in the surface ocean.

    PubMed

    Kent, Alyssa G; Dupont, Chris L; Yooseph, Shibu; Martiny, Adam C

    2016-08-01

    Prochlorococcus, the smallest known photosynthetic bacterium, is abundant in the ocean's surface layer despite large variation in environmental conditions. There are several genetically divergent lineages within Prochlorococcus and superimposed on this phylogenetic diversity is extensive gene gain and loss. The environmental role in shaping the global ocean distribution of genome diversity in Prochlorococcus is largely unknown, particularly in a framework that considers the vertical and lateral mechanisms of evolution. Here we show that Prochlorococcus field populations from a global circumnavigation harbor extensive genome diversity across the surface ocean, but this diversity is not randomly distributed. We observed a significant correspondence between phylogenetic and gene content diversity, including regional differences in both phylogenetic composition and gene content that were related to environmental factors. Several gene families were strongly associated with specific regions and environmental factors, including the identification of a set of genes related to lower nutrient and temperature regions. Metagenomic assemblies of natural Prochlorococcus genomes reinforced this association by providing linkage of genes across genomic backbones. Overall, our results show that the phylogeography in Prochlorococcus taxonomy is echoed in its genome content. Thus environmental variation shapes the functional capabilities and associated ecosystem role of the globally abundant Prochlorococcus.

  11. Comparing memory-efficient genome assemblers on stand-alone and cloud infrastructures.

    PubMed

    Kleftogiannis, Dimitrios; Kalnis, Panos; Bajic, Vladimir B

    2013-01-01

    A fundamental problem in bioinformatics is genome assembly. Next-generation sequencing (NGS) technologies produce large volumes of fragmented genome reads, which require large amounts of memory to assemble the complete genome efficiently. With recent improvements in DNA sequencing technologies, it is expected that the memory footprint required for the assembly process will increase dramatically and will emerge as a limiting factor in processing widely available NGS-generated reads. In this report, we compare current memory-efficient techniques for genome assembly with respect to quality, memory consumption and execution time. Our experiments prove that it is possible to generate draft assemblies of reasonable quality on conventional multi-purpose computers with very limited available memory by choosing suitable assembly methods. Our study reveals the minimum memory requirements for different assembly programs even when data volume exceeds memory capacity by orders of magnitude. By combining existing methodologies, we propose two general assembly strategies that can improve short-read assembly approaches and result in reduction of the memory footprint. Finally, we discuss the possibility of utilizing cloud infrastructures for genome assembly and we comment on some findings regarding suitable computational resources for assembly.

  12. Complete Chloroplast Genome Sequences of Important Oilseed Crop Sesamum indicum L

    PubMed Central

    Yi, Dong-Keun; Kim, Ki-Joong

    2012-01-01

    Sesamum indicum is an important crop plant species for yielding oil. The complete chloroplast (cp) genome of S. indicum (GenBank acc no. JN637766) is 153,324 bp in length, and has a pair of inverted repeat (IR) regions consisting of 25,141 bp each. The lengths of the large single copy (LSC) and the small single copy (SSC) regions are 85,170 bp and 17,872 bp, respectively. Comparative cp DNA sequence analyses of S. indicum with other cp genomes reveal that the genome structure, gene order, gene and intron contents, AT contents, codon usage, and transcription units are similar to the typical angiosperm cp genomes. Nucleotide diversity of the IR region between Sesamum and three other cp genomes is much lower than that of the LSC and SSC regions in both the coding region and noncoding region. In summary, the regional constraints strongly affect the sequence evolution of the cp genomes, while the functional constraints weakly affect the sequence evolution of cp genomes. Five short inversions associated with short palindromic sequences that form stem-loop structures were observed in the chloroplast genome of S. indicum. Twenty-eight different simple sequence repeat (SSR) loci have been detected in the chloroplast genome of S. indicum. Almost all of the SSR loci were composed of A or T, so this may also contribute to the A-T richness of the cp genome of S. indicum. Seven large repeated loci in the chloroplast genome of S. indicum were also identified, and these loci are useful for developing S. indicum-specific cp genome vectors. The complete cp DNA sequences of S. indicum reported in this paper are a prerequisite to modifying this important oilseed crop by cp genetic engineering techniques. PMID:22606240
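
    The region lengths quoted in this abstract are internally consistent with the usual quadripartite chloroplast layout (one LSC, one SSC, and two IR copies). The short check below uses only the figures given above; the function name `quadripartite_length` is a hypothetical helper introduced for illustration.

```python
def quadripartite_length(lsc, ssc, ir):
    """Total length of a circular cp genome with one LSC, one SSC and two IR copies."""
    return lsc + ssc + 2 * ir


# Lengths in bp, as reported in the abstract.
LSC = 85_170
SSC = 17_872
IR = 25_141
TOTAL = 153_324

# 85,170 + 17,872 + 2 * 25,141 = 153,324
assert quadripartite_length(LSC, SSC, IR) == TOTAL

# Share of the genome occupied by the two IR copies (roughly one third).
print(f"IR fraction: {2 * IR / TOTAL:.3f}")
```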

  13. Structural and functional analysis of the finished genome of the recently isolated toxic Anabaena sp. WA102.

    PubMed

    Brown, Nathan M; Mueller, Ryan S; Shepardson, Jonathan W; Landry, Zachary C; Morré, Jeffrey T; Maier, Claudia S; Hardy, F Joan; Dreher, Theo W

    2016-06-13

    Very few closed genomes of the cyanobacteria that commonly produce toxic blooms in lakes and reservoirs are available, limiting our understanding of the properties of these organisms. A new anatoxin-a-producing member of the Nostocaceae, Anabaena sp. WA102, was isolated from a freshwater lake in Washington State, USA, in 2013 and maintained in non-axenic culture. The Anabaena sp. WA102 5.7 Mbp genome assembly has been closed with long-read, single-molecule sequencing and separately a draft genome assembly has been produced with short-read sequencing technology. The closed and draft genome assemblies are compared, showing a correlation between long repeats in the genome and the many gaps in the short-read assembly. Anabaena sp. WA102 encodes anatoxin-a biosynthetic genes, as does its close relative Anabaena sp. AL93 (also introduced in this study). These strains are distinguished by differences in the genes for light-harvesting phycobilins, with Anabaena sp. AL93 possessing a phycoerythrocyanin operon. Biologically relevant structural variants in the Anabaena sp. WA102 genome were detected only by long-read sequencing: a tandem triplication of the anaBCD promoter region in the anatoxin-a synthase gene cluster (not triplicated in Anabaena sp. AL93) and a 5-kbp deletion variant present in two-thirds of the population. The genome has a large number of mobile elements (160). Strikingly, there was no synteny with the genome of its nearest fully assembled relative, Anabaena sp. 90. Structural and functional genome analyses indicate that Anabaena sp. WA102 has a flexible genome. Genome closure, which can be readily achieved with long-read sequencing, reveals large scale (e.g., gene order) and local structural features that should be considered in understanding genome evolution and function.

  14. Environmental genomics of "Haloquadratum walsbyi" in a saltern crystallizer indicates a large pool of accessory genes in an otherwise coherent species

    PubMed Central

    Legault, Boris A; Lopez-Lopez, Arantxa; Alba-Casado, Jose Carlos; Doolittle, W Ford; Bolhuis, Henk; Rodriguez-Valera, Francisco; Papke, R Thane

    2006-01-01

    Background Mature saturated brine (crystallizers) communities are largely dominated (>80% of cells) by the square halophilic archaeon "Haloquadratum walsbyi". The recent cultivation of the strain HBSQ001 and the sequencing of its genome allows comparison with the metagenome of this taxonomically simplified environment. Similar studies carried out in other extreme environments have revealed very little diversity in gene content among the cell lineages present. Results The metagenome of the microbial community of a crystallizer pond has been analyzed by end sequencing a 2000 clone fosmid library and comparing the sequences obtained with the genome sequence of "Haloquadratum walsbyi". The genome of the sequenced strain was retrieved nearly complete within this environmental DNA library. However, many ORFs that could be ascribed to the "Haloquadratum" metapopulation by common genome characteristics or scaffolding to the strain genome were not present in the specific sequenced isolate. Particularly, three regions of the sequenced genome were associated with multiple rearrangements and the presence of different genes from the metapopulation. Many transposition and phage related genes were found within this pool which, together with the associated atypical GC content in these areas, supports lateral gene transfer mediated by these elements as the most probable genetic cause of this variability. Additionally, these sequences were highly enriched in putative regulatory and signal transduction functions. Conclusion These results point to a large pan-genome (total gene repertoire of the genus/species) even in this highly specialized extremophile and at a single geographic location. The extensive gene repertoire is what might be expected of a population that exploits a diverse nutrient pool, resulting from the degradation of biomass produced at lower salinities. PMID:16820057

  15. Extensive Mobilome-Driven Genome Diversification in Mouse Gut-Associated Bacteroides vulgatus mpk.

    PubMed

    Lange, Anna; Beier, Sina; Steimle, Alex; Autenrieth, Ingo B; Huson, Daniel H; Frick, Julia-Stefanie

    2016-04-25

    Like many other Bacteroides species, Bacteroides vulgatus strain mpk, a mouse fecal isolate which was shown to promote intestinal homeostasis, utilizes a variety of mobile elements for genome evolution. Based on sequences collected by Pacific Biosciences SMRT sequencing technology, we discuss the challenges of assembling and studying a bacterial genome of high plasticity. Additionally, we conducted comparative genomics comparing this commensal strain with the B. vulgatus type strain ATCC 8482 as well as multiple other Bacteroides and Parabacteroides strains to reveal the most important differences and identify the unique features of B. vulgatus mpk. The genome of B. vulgatus mpk harbors a large and diverse set of mobile element proteins compared with other sequenced Bacteroides strains. We found evidence of a number of different horizontal gene transfer events and a genome landscape that has been extensively altered by different mobilization events. A CRISPR/Cas system could be identified that provides a possible mechanism for preventing the integration of invading external DNA. We propose that the high genome plasticity and the introduced genome instabilities of B. vulgatus mpk arising from the various mobilization events might play an important role not only in its adaptation to the challenging intestinal environment in general, but also in its ability to interact with the gut microbiota. © The Author(s) 2016. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.

  16. D-GENIES: dot plot large genomes in an interactive, efficient and simple way.

    PubMed

    Cabanettes, Floréal; Klopp, Christophe

    2018-01-01

    Dot plots are widely used to quickly compare sequence sets. They provide a synthetic similarity overview, highlighting repetitions, breaks and inversions. Different tools have been developed to easily generate genomic alignment dot plots, but they are often limited in the input sequence size. D-GENIES is a standalone and web application performing large genome alignments using minimap2 software package and generating interactive dot plots. It enables users to sort query sequences along the reference, zoom in the plot and download several image, alignment or sequence files. D-GENIES is an easy-to-install, open-source software package (GPL) developed in Python and JavaScript. The source code is available at https://github.com/genotoul-bioinfo/dgenies and it can be tested at http://dgenies.toulouse.inra.fr/.
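
    The dot-plot idea described in this record can be illustrated with a minimal sketch. It marks every position pair where the same k-mer starts in both sequences. This is not how D-GENIES works (the abstract states that it relies on minimap2 alignments); exact k-mer matching is swapped in here only because it is small enough to read, and the names `kmer_dot_plot` and `render` are hypothetical.

```python
def kmer_dot_plot(ref, query, k=3):
    """Return sorted (ref_pos, query_pos) pairs where an identical k-mer starts.

    Illustrative exact-match dot plot; NOT the minimap2-based method of D-GENIES.
    """
    # Index every k-mer of the reference by its start positions.
    index = {}
    for i in range(len(ref) - k + 1):
        index.setdefault(ref[i:i + k], []).append(i)
    # Look up each query k-mer and record one dot per shared occurrence.
    dots = []
    for j in range(len(query) - k + 1):
        for i in index.get(query[j:j + k], ()):
            dots.append((i, j))
    return sorted(dots)


def render(dots, ref_len, query_len, k=3):
    """Draw the dots as text: columns are reference positions, rows are query positions."""
    cells = set(dots)
    rows = []
    for j in range(query_len - k + 1):
        rows.append("".join("*" if (i, j) in cells else "."
                            for i in range(ref_len - k + 1)))
    return "\n".join(rows)


if __name__ == "__main__":
    ref, query = "ACGTACGT", "ACGT"
    dots = kmer_dot_plot(ref, query, k=4)
    print(dots)
    print(render(dots, len(ref), len(query), k=4))
```

    A repeat in the reference shows up as several dots in the same row, and an unbroken diagonal indicates collinear sequence.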

  17. Animal Mitochondrial DNA as We Do Not Know It: mt-Genome Organization and Evolution in Nonbilaterian Lineages

    PubMed Central

    Pett, Walker

    2016-01-01

    Animal mitochondrial DNA (mtDNA) is commonly described as a small, circular molecule that is conserved in size, gene content, and organization. Data collected in the last decade have challenged this view by revealing considerable diversity in animal mitochondrial genome organization. Much of this diversity has been found in nonbilaterian animals (phyla Cnidaria, Ctenophora, Placozoa, and Porifera), which, from a phylogenetic perspective, form the main branches of the animal tree along with Bilateria. Within these groups, mt-genomes are characterized by varying numbers of both linear and circular chromosomes, extra genes (e.g. atp9, polB, tatC), large variation in the number of encoded mitochondrial transfer RNAs (tRNAs) (0–25), at least seven different genetic codes, presence/absence of introns, tRNA and mRNA editing, fragmented ribosomal RNA genes, translational frameshifting, highly variable substitution rates, and a large range of genome sizes. This newly discovered diversity allows a better understanding of the evolutionary plasticity and conservation of animal mtDNA and provides insights into the molecular and evolutionary mechanisms shaping mitochondrial genomes. PMID:27557826

  18. Post-genomic insights into the plant polysaccharide degradation potential of Aspergillus nidulans and comparison to Aspergillus niger and Aspergillus oryzae.

    PubMed

    Coutinho, Pedro M; Andersen, Mikael R; Kolenova, Katarina; vanKuyk, Patricia A; Benoit, Isabelle; Gruben, Birgit S; Trejo-Aguilar, Blanca; Visser, Hans; van Solingen, Piet; Pakula, Tiina; Seiboth, Bernard; Battaglia, Evy; Aguilar-Osorio, Guillermo; de Jong, Jan F; Ohm, Robin A; Aguilar, Mariana; Henrissat, Bernard; Nielsen, Jens; Stålbrand, Henrik; de Vries, Ronald P

    2009-03-01

    The plant polysaccharide degradative potential of Aspergillus nidulans was analysed in detail and compared to that of Aspergillus niger and Aspergillus oryzae using a combination of bioinformatics, physiology and transcriptomics. Manual verification indicated that 28.4% of the A. nidulans ORFs analysed in this study do not contain a secretion signal, of which 40% may be secreted through a non-classical method. While significant differences were found between the species in the numbers of ORFs assigned to the relevant CAZy families, no significant difference was observed in growth on polysaccharides. Growth differences were observed between the Aspergilli and Podospora anserina, which has a more different genomic potential for polysaccharide degradation, suggesting that large genomic differences are required to cause growth differences on polysaccharides. Differences were also detected between the Aspergilli in the presence of putative regulatory sequences in the promoters of the ORFs of this study and correlation of the presence of putative XlnR binding sites to induction by xylose was detected for A. niger. These data demonstrate differences at genome content, substrate specificity of the enzymes and gene regulation in these three Aspergilli, which likely reflect their individual adaptation to their natural biotope.

  19. Large transcription units unify copy number variants and common fragile sites arising under replication stress.

    PubMed

    Wilson, Thomas E; Arlt, Martin F; Park, So Hae; Rajendran, Sountharia; Paulsen, Michelle; Ljungman, Mats; Glover, Thomas W

    2015-02-01

    Copy number variants (CNVs) resulting from genomic deletions and duplications and common fragile sites (CFSs) seen as breaks on metaphase chromosomes are distinct forms of structural chromosome instability precipitated by replication inhibition. Although they share a common induction mechanism, it is not known how CNVs and CFSs are related or why some genomic loci are much more prone to their occurrence. Here we compare large sets of de novo CNVs and CFSs in several experimental cell systems to each other and to overlapping genomic features. We first show that CNV hotspots and CFSs occurred at the same human loci within a given cultured cell line. Bru-seq nascent RNA sequencing further demonstrated that although genomic regions with low CNV frequencies were enriched in transcribed genes, the CNV hotspots that matched CFSs specifically corresponded to the largest active transcription units in both human and mouse cells. Consistently, active transcription units >1 Mb were robust cell-type-specific predictors of induced CNV hotspots and CFS loci. Unlike most transcribed genes, these very large transcription units replicated late and organized deletion and duplication CNVs into their transcribed and flanking regions, respectively, supporting a role for transcription in replication-dependent lesion formation. These results indicate that active large transcription units drive extreme locus- and cell-type-specific genomic instability under replication stress, resulting in both CNVs and CFSs as different manifestations of perturbed replication dynamics. © 2015 Wilson et al.; Published by Cold Spring Harbor Laboratory Press.

  20. Large transcription units unify copy number variants and common fragile sites arising under replication stress

    PubMed Central

    Park, So Hae; Rajendran, Sountharia; Paulsen, Michelle; Ljungman, Mats; Glover, Thomas W.

    2015-01-01

    Copy number variants (CNVs) resulting from genomic deletions and duplications and common fragile sites (CFSs) seen as breaks on metaphase chromosomes are distinct forms of structural chromosome instability precipitated by replication inhibition. Although they share a common induction mechanism, it is not known how CNVs and CFSs are related or why some genomic loci are much more prone to their occurrence. Here we compare large sets of de novo CNVs and CFSs in several experimental cell systems to each other and to overlapping genomic features. We first show that CNV hotspots and CFSs occurred at the same human loci within a given cultured cell line. Bru-seq nascent RNA sequencing further demonstrated that although genomic regions with low CNV frequencies were enriched in transcribed genes, the CNV hotspots that matched CFSs specifically corresponded to the largest active transcription units in both human and mouse cells. Consistently, active transcription units >1 Mb were robust cell-type-specific predictors of induced CNV hotspots and CFS loci. Unlike most transcribed genes, these very large transcription units replicated late and organized deletion and duplication CNVs into their transcribed and flanking regions, respectively, supporting a role for transcription in replication-dependent lesion formation. These results indicate that active large transcription units drive extreme locus- and cell-type-specific genomic instability under replication stress, resulting in both CNVs and CFSs as different manifestations of perturbed replication dynamics. PMID:25373142

  1. Global Organization of a Positive-strand RNA Virus Genome

    PubMed Central

    Wu, Baodong; Grigull, Jörg; Ore, Moriam O.; Morin, Sylvie; White, K. Andrew

    2013-01-01

    The genomes of plus-strand RNA viruses contain many regulatory sequences and structures that direct different viral processes. The traditional view of these RNA elements is as local structures present in non-coding regions. However, this view is changing due to the discovery of regulatory elements in coding regions and functional long-range intra-genomic base pairing interactions. The ∼4.8 kb long RNA genome of the tombusvirus tomato bushy stunt virus (TBSV) contains these types of structural features, including six different functional long-distance interactions. We hypothesized that to achieve these multiple interactions this viral genome must utilize a large-scale organizational strategy and, accordingly, we sought to assess the global conformation of the entire TBSV genome. Atomic force micrographs of the genome indicated a mostly condensed structure composed of interconnected protrusions extending from a central hub. This configuration was consistent with the genomic secondary structure model generated using high-throughput selective 2′-hydroxyl acylation analysed by primer extension (i.e. SHAPE), which predicted different sized RNA domains originating from a central region. Known RNA elements were identified in both domain and inter-domain regions, and novel structural features were predicted and functionally confirmed. Interestingly, only two of the six long-range interactions known to form were present in the structural model. However, for those interactions that did not form, complementary partner sequences were positioned relatively close to each other in the structure, suggesting that the secondary structure level of viral genome structure could provide a basic scaffold for the formation of different long-range interactions. The higher-order structural model for the TBSV RNA genome provides a snapshot of the complex framework that allows multiple functional components to operate in concert within a confined context. PMID:23717202

  2. Genomic structural variation contributes to phenotypic change of industrial bioethanol yeast Saccharomyces cerevisiae.

    PubMed

    Zhang, Ke; Zhang, Li-Jie; Fang, Ya-Hong; Jin, Xin-Na; Qi, Lei; Wu, Xue-Chang; Zheng, Dao-Qiong

    2016-03-01

    Genomic structural variation (GSV) is a ubiquitous phenomenon observed in the genomes of Saccharomyces cerevisiae strains with different genetic backgrounds; however, the physiological and phenotypic effects of GSV are not well understood. Here, we first revealed the genetic characteristics of a widely used industrial S. cerevisiae strain, ZTW1, by whole genome sequencing. ZTW1 was identified as an aneuploidy strain and a large-scale GSV was observed in the ZTW1 genome compared with the genome of a diploid strain YJS329. These GSV events led to copy number variations (CNVs) in many chromosomal segments as well as one whole chromosome in the ZTW1 genome. Changes in the DNA dosage of certain functional genes directly affected their expression levels and the resultant ZTW1 phenotypes. Moreover, CNVs of large chromosomal regions triggered an aneuploidy stress in ZTW1. This stress decreased the proliferation ability and tolerance of ZTW1 to various stresses, while aneuploidy response stress may also provide some benefits to the fermentation performance of the yeast, including increased fermentation rates and decreased byproduct generation. This work reveals genomic characters of the bioethanol S. cerevisiae strain ZTW1 and suggests that GSV is an important kind of mutation that changes the traits of industrial S. cerevisiae strains. © FEMS 2016. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.

  3. Genome Variation Map: a data repository of genome variations in BIG Data Center.

    PubMed

    Song, Shuhui; Tian, Dongmei; Li, Cuiping; Tang, Bixia; Dong, Lili; Xiao, Jingfa; Bao, Yiming; Zhao, Wenming; He, Hang; Zhang, Zhang

    2018-01-04

    The Genome Variation Map (GVM; http://bigd.big.ac.cn/gvm/) is a public data repository of genome variations. As a core resource in the BIG Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences, GVM is dedicated to collecting, integrating and visualizing genome variations for a wide range of species, accepts submissions of different types of genome variations from all over the world and provides free open access to all publicly available data in support of worldwide research activities. Unlike existing related databases, GVM features integration of a large number of genome variations for a broad diversity of species including human, cultivated plants and domesticated animals. Specifically, the current implementation of GVM not only houses a total of ∼4.9 billion variants for 19 species including chicken, dog, goat, human, poplar, rice and tomato, but also incorporates 8669 individual genotypes and 13 262 manually curated high-quality genotype-to-phenotype associations for non-human species. In addition, GVM provides friendly, intuitive web interfaces for data submission, browsing, search and visualization. Collectively, GVM serves as an important resource for archiving genomic variation data, helpful for better understanding population genetic diversity and deciphering complex mechanisms associated with different phenotypes. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.

  4. Genic rather than genome-wide differences between sexually deceptive Ophrys orchids with different pollinators.

    PubMed

    Sedeek, Khalid E M; Scopece, Giovanni; Staedler, Yannick M; Schönenberger, Jürg; Cozzolino, Salvatore; Schiestl, Florian P; Schlüter, Philipp M

    2014-12-01

    High pollinator specificity and the potential for simple genetic changes to affect pollinator attraction make sexually deceptive orchids an ideal system for the study of ecological speciation, in which change of flower odour is likely important. This study surveys reproductive barriers and differences in floral phenotypes in a group of four closely related, coflowering sympatric Ophrys species and uses a genotyping-by-sequencing (GBS) approach to obtain information on the proportion of the genome that is differentiated between species. Ophrys species were found to effectively lack postpollination barriers, but are strongly isolated by their different pollinators (floral isolation) and, to a smaller extent, by shifts in flowering time (temporal isolation). Although flower morphology and perhaps labellum coloration may contribute to floral isolation, reproductive barriers may largely be due to differences in flower odour chemistry. GBS revealed shared polymorphism throughout the Ophrys genome, with very little population structure between species. Genome scans for FST outliers identified few markers that are highly differentiated between species and repeatable in several populations. These genome scans also revealed highly differentiated polymorphisms in genes with putative involvement in floral odour production, including a previously identified candidate gene thought to be involved in the biosynthesis of pseudo-pheromones by the orchid flowers. Taken together, these data suggest that ecological speciation associated with different pollinators in sexually deceptive orchids has a genic rather than a genomic basis, placing these species at an early phase of genomic divergence within the 'speciation continuum'. © 2014 The Authors. Molecular Ecology published by John Wiley & Sons Ltd.
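
    The genome scan for FST outliers mentioned in this record rests on a per-marker differentiation statistic. The sketch below computes a simple FST for one biallelic marker from per-population allele frequencies, as (HT - HS) / HT, and keeps markers above a cutoff. The abstract does not say which estimator or outlier test the authors used, so the formula, the equal weighting of populations, the fixed threshold and the names `fst_biallelic` and `flag_outliers` are all assumptions made for illustration.

```python
def fst_biallelic(freqs):
    """FST for one biallelic marker from per-population allele frequencies.

    Uses FST = (HT - HS) / HT with equally weighted populations; this is an
    assumed textbook formulation, not necessarily the estimator of the study.
    """
    if not freqs:
        raise ValueError("need at least one population")
    # Mean expected heterozygosity within populations.
    hs = sum(2.0 * p * (1.0 - p) for p in freqs) / len(freqs)
    # Expected heterozygosity of the pooled population.
    p_bar = sum(freqs) / len(freqs)
    ht = 2.0 * p_bar * (1.0 - p_bar)
    if ht == 0.0:
        return 0.0  # marker is monomorphic overall, so no differentiation
    return (ht - hs) / ht


def flag_outliers(marker_freqs, threshold=0.5):
    """Return names of markers whose FST exceeds a hypothetical fixed threshold."""
    return [name for name, freqs in marker_freqs.items()
            if fst_biallelic(freqs) > threshold]


if __name__ == "__main__":
    markers = {
        "m1": [0.50, 0.50],  # same frequency in both populations
        "m2": [0.20, 0.80],  # moderately differentiated
        "m3": [0.00, 1.00],  # fixed for alternative alleles
    }
    for name, freqs in markers.items():
        print(name, round(fst_biallelic(freqs), 3))
    print(flag_outliers(markers))
```

    Real outlier scans usually compare each marker against a genome-wide or simulated null distribution rather than a fixed cutoff; the threshold here only keeps the example short.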

  5. Arpeggio: harmonic compression of ChIP-seq data reveals protein-chromatin interaction signatures

    PubMed Central

    Stanton, Kelly Patrick; Parisi, Fabio; Strino, Francesco; Rabin, Neta; Asp, Patrik; Kluger, Yuval

    2013-01-01

    Researchers generating new genome-wide data in an exploratory sequencing study can gain biological insights by comparing their data with well-annotated data sets possessing similar genomic patterns. Data compression techniques are needed for efficient comparisons of a new genomic experiment with large repositories of publicly available profiles. Furthermore, data representations that allow comparisons of genomic signals from different platforms and across species enhance our ability to leverage these large repositories. Here, we present a signal processing approach that characterizes protein–chromatin interaction patterns at length scales of several kilobases. This allows us to efficiently compare numerous chromatin-immunoprecipitation sequencing (ChIP-seq) data sets consisting of many types of DNA-binding proteins collected from a variety of cells, conditions and organisms. Importantly, these interaction patterns broadly reflect the biological properties of the binding events. To generate these profiles, termed Arpeggio profiles, we applied harmonic deconvolution techniques to the autocorrelation profiles of the ChIP-seq signals. We used 806 publicly available ChIP-seq experiments and showed that Arpeggio profiles with similar spectral densities shared biological properties. Arpeggio profiles of ChIP-seq data sets revealed characteristics that are not easily detected by standard peak finders. They also allowed us to relate sequencing data sets from different genomes, experimental platforms and protocols. Arpeggio is freely available at http://sourceforge.net/p/arpeggio/wiki/Home/. PMID:23873955
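
    The first step named in this record, taking the autocorrelation profile of a ChIP-seq signal, can be sketched as follows. The code computes a normalized autocorrelation of a binned coverage vector and reports the lag with the strongest correlation. It does not implement the harmonic deconvolution that Arpeggio applies afterwards; the function names and the toy signal are hypothetical, and Arpeggio's own implementation may differ.

```python
def autocorrelation(signal, max_lag):
    """Normalized autocorrelation of a 1-D signal for lags 0..max_lag.

    Illustrative only; this is not Arpeggio's harmonic deconvolution.
    """
    n = len(signal)
    if max_lag >= n:
        raise ValueError("max_lag must be smaller than the signal length")
    mean = sum(signal) / n
    dev = [x - mean for x in signal]
    denom = sum(d * d for d in dev)
    if denom == 0.0:
        raise ValueError("constant signal has no defined autocorrelation")
    # Correlate the centred signal with a copy of itself shifted by each lag.
    return [sum(dev[t] * dev[t + lag] for t in range(n - lag)) / denom
            for lag in range(max_lag + 1)]


def dominant_lag(signal, max_lag):
    """Return the lag (>= 1) at which the autocorrelation is highest."""
    ac = autocorrelation(signal, max_lag)
    return max(range(1, max_lag + 1), key=lambda lag: ac[lag])


if __name__ == "__main__":
    # Toy binned coverage with a peak every third bin.
    coverage = [1, 0, 0] * 4
    print([round(v, 3) for v in autocorrelation(coverage, 5)])
    print(dominant_lag(coverage, 5))
```

    For a signal with regularly spaced enrichment, the dominant lag recovers the spacing in bins; multiplying by the bin width would give a length scale in base pairs.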

  6. Arpeggio: harmonic compression of ChIP-seq data reveals protein-chromatin interaction signatures.

    PubMed

    Stanton, Kelly Patrick; Parisi, Fabio; Strino, Francesco; Rabin, Neta; Asp, Patrik; Kluger, Yuval

    2013-09-01

    Researchers generating new genome-wide data in an exploratory sequencing study can gain biological insights by comparing their data with well-annotated data sets possessing similar genomic patterns. Data compression techniques are needed for efficient comparisons of a new genomic experiment with large repositories of publicly available profiles. Furthermore, data representations that allow comparisons of genomic signals from different platforms and across species enhance our ability to leverage these large repositories. Here, we present a signal processing approach that characterizes protein-chromatin interaction patterns at length scales of several kilobases. This allows us to efficiently compare numerous chromatin-immunoprecipitation sequencing (ChIP-seq) data sets consisting of many types of DNA-binding proteins collected from a variety of cells, conditions and organisms. Importantly, these interaction patterns broadly reflect the biological properties of the binding events. To generate these profiles, termed Arpeggio profiles, we applied harmonic deconvolution techniques to the autocorrelation profiles of the ChIP-seq signals. We used 806 publicly available ChIP-seq experiments and showed that Arpeggio profiles with similar spectral densities shared biological properties. Arpeggio profiles of ChIP-seq data sets revealed characteristics that are not easily detected by standard peak finders. They also allowed us to relate sequencing data sets from different genomes, experimental platforms and protocols. Arpeggio is freely available at http://sourceforge.net/p/arpeggio/wiki/Home/.

  7. Effect of Teosinte Cytoplasmic Genomes on Maize Phenotype

    PubMed Central

    Allen, James O.

    2005-01-01

    Determining the contribution of organelle genes to plant phenotype is hampered by several factors, including the paucity of variation in the plastid and mitochondrial genomes. To circumvent this problem, evolutionary divergence between maize (Zea mays ssp. mays) and the teosintes, its closest relatives, was utilized as a source of cytoplasmic genetic variation. Maize lines in which the maize organelle genomes were replaced through serial backcrossing by those representing the entire genus, yielding alloplasmic sublines, or cytolines were created. To avoid the confounding effects of segregating nuclear alleles, an inbred maize line was utilized. Cytolines with Z. mays teosinte cytoplasms were generally indistinguishable from maize. However, cytolines with cytoplasm from the more distantly related Z. luxurians, Z. diploperennis, or Z. perennis exhibited a plethora of differences in growth, development, morphology, and function. Significant differences were observed for 56 of the 58 characters studied. Each cytoline was significantly different from the inbred line for most characters. For a given character, variation was often greater among cytolines having cytoplasms from the same species than among those from different species. The characters differed largely independently of each other. These results suggest that the cytoplasm contributes significantly to a large proportion of plant traits and that many of the organelle genes are phenotypically important. PMID:15731518

  8. The Evolution of Host Specialization in the Vertebrate Gut Symbiont Lactobacillus reuteri

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Frese, Steven A.; Benson, Andrew K.; Tannock, Gerald W.

    Recent research has provided mechanistic insight into the important contributions of the gut microbiota to vertebrate biology, but questions remain about the evolutionary processes that have shaped this symbiosis. In the present study, we showed in experiments with gnotobiotic mice that the evolution of Lactobacillus reuteri with rodents resulted in the emergence of host specialization. To identify genomic events marking adaptations to the murine host, we compared the genome of the rodent isolate L. reuteri 100-23 with that of the human isolate L. reuteri F275, and we identified hundreds of genes that were specific to each strain. In order to differentiate true host-specific genome content from strain-level differences, comparative genome hybridizations were performed to query 57 L. reuteri strains originating from six different vertebrate hosts in combination with genome sequence comparisons of nine strains encompassing five phylogenetic lineages of the species. This approach revealed that rodent strains, although showing a high degree of genomic plasticity, possessed a specific genome inventory that was rare or absent in strains from other vertebrate hosts. The distinct genome content of L. reuteri lineages reflected the niche characteristics in the gastrointestinal tracts of their respective hosts, and inactivation of seven out of eight representative rodent-specific genes in L. reuteri 100-23 resulted in impaired ecological performance in the gut of mice. The comparative genomic analyses suggested fundamentally different trends of genome evolution in rodent and human L. reuteri populations, with the former possessing a large and adaptable pan-genome while the latter being subjected to a process of reductive evolution. In conclusion, this study provided experimental evidence and a molecular basis for the evolution of host specificity in a vertebrate gut symbiont, and it identified genomic events that have shaped this process.

  9. Independent assessment and improvement of wheat genome sequence assemblies using Fosill jumping libraries.

    PubMed

    Lu, Fu-Hao; McKenzie, Neil; Kettleborough, George; Heavens, Darren; Clark, Matthew D; Bevan, Michael W

    2018-05-01

    The accurate sequencing and assembly of very large, often polyploid, genomes remains a challenging task, limiting long-range sequence information and phased sequence variation for applications such as plant breeding. The 15-Gb hexaploid bread wheat (Triticum aestivum) genome has been particularly challenging to sequence, and several different approaches have recently generated long-range assemblies. Mapping and understanding the types of assembly errors are important for optimising future sequencing and assembly approaches and for comparative genomics. Here we use a Fosill 38-kb jumping library to assess medium and longer-range order of different publicly available wheat genome assemblies. Modifications to the Fosill protocol generated longer Illumina sequences and enabled comprehensive genome coverage. Analyses of two independent Bacterial Artificial Chromosome (BAC)-based chromosome-scale assemblies, two independent Illumina whole genome shotgun assemblies, and a hybrid Single Molecule Real Time (SMRT-PacBio) and short read (Illumina) assembly were carried out. We revealed a surprising scale and variety of discrepancies using Fosill mate-pair mapping and validated several of each class. In addition, Fosill mate-pairs were used to scaffold a whole genome Illumina assembly, leading to a 3-fold increase in N50 values. Our analyses, using an independent means to validate different wheat genome assemblies, show that whole genome shotgun assemblies based solely on Illumina sequences are significantly more accurate by all measures compared to BAC-based chromosome-scale assemblies and hybrid SMRT-Illumina approaches. Although current whole genome assemblies are reasonably accurate and useful, additional improvements will be needed to generate complete assemblies of wheat genomes using open-source, computationally efficient, and cost-effective methods.

  10. The Cancer Genome Atlas Pan-Cancer analysis project.

    PubMed

    Weinstein, John N; Collisson, Eric A; Mills, Gordon B; Shaw, Kenna R Mills; Ozenberger, Brad A; Ellrott, Kyle; Shmulevich, Ilya; Sander, Chris; Stuart, Joshua M

    2013-10-01

    The Cancer Genome Atlas (TCGA) Research Network has profiled and analyzed large numbers of human tumors to discover molecular aberrations at the DNA, RNA, protein and epigenetic levels. The resulting rich data provide a major opportunity to develop an integrated picture of commonalities, differences and emergent themes across tumor lineages. The Pan-Cancer initiative compares the first 12 tumor types profiled by TCGA. Analysis of the molecular aberrations and their functional roles across tumor types will teach us how to extend therapies effective in one cancer type to others with a similar genomic profile.

  11. Evolution of the core and pan-genome of Streptococcus: positive selection, recombination, and genome composition

    PubMed Central

    Lefébure, Tristan; Stanhope, Michael J

    2007-01-01

    Background The genus Streptococcus is one of the most diverse and important human and agricultural pathogens. This study employs comparative evolutionary analyses of 26 Streptococcus genomes to yield an improved understanding of the relative roles of recombination and positive selection in pathogen adaptation to their hosts. Results Streptococcus genomes exhibit extreme levels of evolutionary plasticity, with high levels of gene gain and loss during species and strain evolution. S. agalactiae has a large pan-genome, with little recombination in its core-genome, while S. pyogenes has a smaller pan-genome and much more recombination of its core-genome, perhaps reflecting the greater habitat, and gene pool, diversity for S. agalactiae compared to S. pyogenes. Core-genome recombination was evident in all lineages (18% to 37% of the core-genome judged to be recombinant), while positive selection was mainly observed during species differentiation (from 11% to 34% of the core-genome). Positive selection pressure was unevenly distributed across lineages and biochemical main role categories. S. suis was the lineage with the greatest level of positive selection pressure, the largest number of unique loci selected, and the largest amount of gene gain and loss. Conclusion Recombination is an important evolutionary force in shaping Streptococcus genomes, not only in the acquisition of significant portions of the genome as lineage specific loci, but also in facilitating rapid evolution of the core-genome. Positive selection, although undoubtedly a slower process, has nonetheless played an important role in adaptation of the core-genome of different Streptococcus species to different hosts. PMID:17475002

  12. Draft genome of the gayal, Bos frontalis

    PubMed Central

    Wang, Ming-Shan; Zeng, Yan; Wang, Xiao; Nie, Wen-Hui; Wang, Jin-Huan; Su, Wei-Ting; Xiong, Zi-Jun; Wang, Sheng; Qu, Kai-Xing; Yan, Shou-Qing; Yang, Min-Min; Wang, Wen; Dong, Yang; Zhang, Ya-Ping

    2017-01-01

    Gayal (Bos frontalis), also known as mithan or mithun, is a large endangered semi-domesticated bovine that has a limited geographical distribution in the hill-forests of China, Northeast India, Bangladesh, Myanmar, and Bhutan. Many questions about the gayal such as its origin, population history, and genetic basis of local adaptation remain largely unresolved. De novo sequencing and assembly of the whole gayal genome provides an opportunity to address these issues. We report a high-depth sequencing, de novo assembly, and annotation of a female Chinese gayal genome. Based on the Illumina genomic sequencing platform, we have generated 350.38 Gb of raw data from 16 different insert-size libraries. A total of 276.86 Gb of clean data is retained after quality control. The assembled genome is about 2.85 Gb with scaffold and contig N50 sizes of 2.74 Mb and 14.41 kb, respectively. Repetitive elements account for 48.13% of the genome. Gene annotation has yielded 26 667 protein-coding genes, of which 97.18% have been functionally annotated. BUSCO assessment shows that our assembly captures 93% (3183 of 4104) of the core eukaryotic genes and 83.1% of vertebrate universal single-copy orthologs. We provide the first comprehensive de novo genome of the gayal. This genetic resource is integral for investigating the origin of the gayal and performing comparative genomic studies to improve understanding of the speciation and divergence of bovine species. The assembled genome could be used as reference in future population genetic studies of gayal. PMID:29048483
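
    The scaffold and contig N50 sizes quoted in this record follow the usual definition: the length L such that sequences of length L or longer contain at least half of the assembled bases. A minimal sketch of that calculation is given below; the function name `n50` and the toy lengths are illustrative and are not taken from the gayal assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths.

    N50 is the length of the sequence at which the running total, taken from
    longest to shortest, first reaches half of the summed length.
    """
    if not lengths:
        raise ValueError("need at least one sequence length")
    half_total = sum(lengths) / 2.0
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half_total:
            return length
    raise AssertionError("unreachable: running total always reaches half")


if __name__ == "__main__":
    toy_contigs = [2, 2, 2, 3, 3, 4, 8, 8]  # hypothetical lengths
    print(n50(toy_contigs))
```

    Because N50 depends only on the length distribution, it measures contiguity and says nothing about whether the sequences are assembled correctly.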

  13. Large protein as a potential target for use in rabies diagnostics.

    PubMed

    Santos Katz, I S; Dias, M H; Lima, I F; Chaves, L B; Ribeiro, O G; Scheffer, K C; Iwai, L K

    Rabies is a zoonotic viral disease that remains a serious threat to public health worldwide. The rabies lyssavirus (RABV) genome encodes five structural proteins, multifunctional and significant for pathogenicity. The large protein (L) presents well-conserved genomic regions, which may be a good alternative to generate informative datasets for development of new methods for rabies diagnosis. This paper describes the development of a technique for the identification of L protein in several RABV strains from different hosts, demonstrating that MS-based proteomics is a potential method for antigen identification and a good alternative for rabies diagnosis.

  14. Systematic CpT (ApG) depletion and CpG excess are unique genomic signatures of large DNA viruses infecting invertebrates.

    PubMed

    Upadhyay, Mohita; Sharma, Neha; Vivekanandan, Perumal

    2014-01-01

    Differences in the relative abundance of dinucleotides, if any, may provide important clues on host-driven evolution of viruses. We studied dinucleotide frequencies of large DNA viruses infecting vertebrates (n = 105; viruses infecting mammals = 99; viruses infecting aves = 6; viruses infecting reptiles = 1) and invertebrates (n = 88; viruses infecting insects = 84; viruses infecting crustaceans = 4). We have identified systematic depletion of CpT(ApG) dinucleotides and over-representation of CpG dinucleotides as the unique genomic signature of large DNA viruses infecting invertebrates. Detailed investigation of this unique genomic signature suggests the existence of invertebrate host-induced pressures specifically targeting CpT(ApG) and CpG dinucleotides. The depletion of CpT dinucleotides among large DNA viruses infecting invertebrates is, at least in part, explained by non-canonical DNA methylation by the infected host. Our findings highlight the role of invertebrate host-related factors in shaping virus evolution and they also provide the necessary framework for future studies on evolution, epigenetics and molecular biology of viruses infecting this group of hosts.
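
    Depletion or excess of a dinucleotide is commonly expressed as an observed/expected odds ratio, rho_XY = f_XY / (f_X * f_Y), where values below 1 suggest depletion and values above 1 suggest over-representation. The abstract does not say which statistic the authors used, so the sketch below is only one plausible way to compute such a signal; the function name and the single-strand counting are assumptions.

```python
from collections import Counter


def dinucleotide_odds_ratio(seq, dinuc):
    """Observed/expected ratio for one dinucleotide on a single strand.

    Assumption: rho_XY = f_XY / (f_X * f_Y), with overlapping dinucleotide
    windows; this is not necessarily the measure used in the paper.
    """
    seq = seq.upper()
    dinuc = dinuc.upper()
    if len(dinuc) != 2 or len(seq) < 2:
        raise ValueError("need a 2-letter dinucleotide and a sequence of length >= 2")
    mono = Counter(seq)
    pairs = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    f_x = mono[dinuc[0]] / len(seq)
    f_y = mono[dinuc[1]] / len(seq)
    if f_x == 0 or f_y == 0:
        return float("nan")  # ratio is undefined when a base is absent
    f_xy = pairs[dinuc] / (len(seq) - 1)
    return f_xy / (f_x * f_y)


# Toy sequence (hypothetical): CG occurs twice among three windows.
print(dinucleotide_odds_ratio("CGCG", "CG"))
```

    Running this per genome for CpG and CpT, and then comparing the distributions between host groups, would mirror the kind of comparison the abstract describes.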

  15. Genome re-assignment of Arachis trinitensis (Sect. Arachis, Leguminosae) and its implications for the genetic origin of cultivated peanut

    PubMed Central

    2010-01-01

    The karyotype structure of Arachis trinitensis was studied by conventional Feulgen staining, CMA/DAPI banding and rDNA loci detection by fluorescence in situ hybridization (FISH) in order to establish its genome status and test the hypothesis that this species is a genome donor of cultivated peanut. Conventional staining revealed that the karyotype lacked the small “A chromosomes” characteristic of the A genome. In agreement with this, chromosomal banding showed that none of the chromosomes had the large centromeric bands expected for A chromosomes. FISH revealed one pair each of 5S and 45S rDNA loci, located in different medium-sized metacentric chromosomes. Collectively, these results suggest that A. trinitensis should be removed from the A genome and be considered as a B or non-A genome species. The pattern of heterochromatic bands and rDNA loci of A. trinitensis differ markedly from any of the complements of A. hypogaea, suggesting that the former species is unlikely to be one of the wild diploid progenitors of the latter. PMID:21637581

  16. Test Pricing and Reimbursement in Genomic Medicine: Towards a General Strategy.

    PubMed

    Vozikis, Athanassios; Cooper, David N; Mitropoulou, Christina; Kambouris, Manousos E; Brand, Angela; Dolzan, Vita; Fortina, Paolo; Innocenti, Federico; Lee, Ming Ta Michael; Leyens, Lada; Macek, Milan; Al-Mulla, Fahd; Prainsack, Barbara; Squassina, Alessio; Taruscio, Domenica; van Schaik, Ron H; Vayena, Effy; Williams, Marc S; Patrinos, George P

    2016-01-01

    This paper aims to provide an overview of the rationale and basic principles guiding the governance of genomic testing services, to clarify their objectives, and allocate and define responsibilities among stakeholders in a health-care system, with a special focus on the EU countries. Particular attention is paid to issues pertaining to pricing and reimbursement policies, the availability of essential genomic tests which differs between various countries owing to differences in disease prevalence and public health relevance, the prescribing and use of genomic testing services according to existing or new guidelines, budgetary and fiscal control, the balance between price and access to innovative testing, monitoring and evaluation for cost-effectiveness and safety, and the development of research capacity. We conclude that addressing the specific items put forward in this article will help to create a robust policy in relation to pricing and reimbursement in genomic medicine. This will contribute to an effective and sustainable health-care system and will prove beneficial to the economy at large. © 2016 S. Karger AG, Basel.

  17. Sequence Polymorphisms and Structural Variations among Four Grapevine (Vitis vinifera L.) Cultivars Representing Sardinian Agriculture

    PubMed Central

    Mercenaro, Luca; Nieddu, Giovanni; Porceddu, Andrea; Pezzotti, Mario; Camiolo, Salvatore

    2017-01-01

    The genetic diversity among grapevine (Vitis vinifera L.) cultivars that underlies differences in agronomic performance and wine quality reflects the accumulation of single nucleotide polymorphisms (SNPs) and small indels as well as larger genomic variations. A combination of high throughput sequencing and mapping against the grapevine reference genome allows the creation of comprehensive sequence variation maps. We used next generation sequencing and bioinformatics to generate an inventory of SNPs and small indels in four widely cultivated Sardinian grape cultivars (Bovale sardo, Cannonau, Carignano and Vermentino). More than 3,200,000 SNPs were identified with high statistical confidence. Some of the SNPs caused the appearance of premature stop codons and thus identified putative pseudogenes. The analysis of SNP distribution along chromosomes led to the identification of large genomic regions with uninterrupted series of homozygous SNPs. We used a digital comparative genomic hybridization approach to identify 6526 genomic regions with significant differences in copy number among the four cultivars compared to the reference sequence, including 81 regions shared between all four cultivars and 4953 specific to single cultivars (representing 1.2 and 75.9% of total copy number variation, respectively). Reads mapping at a distance that was not compatible with the insert size were used to identify a dataset of putative large deletions with cultivar Cannonau revealing the highest number. The analysis of genes mapping to these regions provided a list of candidates that may explain some of the phenotypic differences among the Bovale sardo, Cannonau, Carignano and Vermentino cultivars. PMID:28775732

  18. The Drosophila genome nexus: a population genomic resource of 623 Drosophila melanogaster genomes, including 197 from a single ancestral range population.

    PubMed

    Lack, Justin B; Cardeno, Charis M; Crepeau, Marc W; Taylor, William; Corbett-Detig, Russell B; Stevens, Kristian A; Langley, Charles H; Pool, John E

    2015-04-01

    Hundreds of wild-derived Drosophila melanogaster genomes have been published, but rigorous comparisons across data sets are precluded by differences in alignment methodology. The most common approach to reference-based genome assembly is a single round of alignment followed by quality filtering and variant detection. We evaluated variations and extensions of this approach and settled on an assembly strategy that utilizes two alignment programs and incorporates both substitutions and short indels to construct an updated reference for a second round of mapping prior to final variant detection. Utilizing this approach, we reassembled published D. melanogaster population genomic data sets and added unpublished genomes from several sub-Saharan populations. Most notably, we present aligned data from phase 3 of the Drosophila Population Genomics Project (DPGP3), which provides 197 genomes from a single ancestral range population of D. melanogaster (from Zambia). The large sample size, high genetic diversity, and potentially simpler demographic history of the DPGP3 sample will make this a highly valuable resource for fundamental population genetic research. The complete set of assemblies described here, termed the Drosophila Genome Nexus, presently comprises 623 consistently aligned genomes and is publicly available in multiple formats with supporting documentation and bioinformatic tools. This resource will greatly facilitate population genomic analysis in this model species by reducing the methodological differences between data sets. Copyright © 2015 by the Genetics Society of America.

  19. Mitochondrial pathogenic mutations are population-specific.

    PubMed

    Breen, Michael S; Kondrashov, Fyodor A

    2010-12-31

    Surveying deleterious variation in human populations is crucial for our understanding, diagnosis and potential treatment of human genetic pathologies. A number of recent genome-wide analyses focused on the prevalence of segregating deleterious alleles in the nuclear genome. However, such studies have not been conducted for the mitochondrial genome. We present a systematic survey of polymorphisms in the human mitochondrial genome, including those predicted to be deleterious and those that correspond to known pathogenic mutations. Analyzing 4458 completely sequenced mitochondrial genomes we characterize the genetic diversity of different types of single nucleotide polymorphisms (SNPs) in African (L haplotypes) and non-African (M and N haplotypes) populations. We find that the overall level of polymorphism is higher in the mitochondrial compared to the nuclear genome, although the mitochondrial genome appears to be under stronger selection as indicated by proportionally fewer nonsynonymous than synonymous substitutions. The African mitochondrial genomes show higher heterozygosity, a greater number of polymorphic sites and higher frequencies of polymorphisms for synonymous, benign and damaging polymorphism than non-African genomes. However, African genomes carry significantly fewer SNPs that have been previously characterized as pathogenic compared to non-African genomes. Finding SNPs classified as pathogenic to be the only category of polymorphisms that are more abundant in non-African genomes is best explained by a systematic ascertainment bias that favours the discovery of pathogenic polymorphisms segregating in non-African populations. This further suggests that, contrary to the common disease-common variant hypothesis, pathogenic mutations are largely population-specific and different SNPs may be associated with the same disease in different populations. Therefore, to obtain a comprehensive picture of the deleterious variability in the human population, as well as to improve the diagnostics of individuals carrying African mitochondrial haplotypes, it is necessary to survey different populations independently. This article was reviewed by Dr Mikhail Gelfand, Dr Vasily Ramensky (nominated by Dr Eugene Koonin) and Dr David Rand (nominated by Dr Laurence Hurst).

  20. A DNA methylation map of human cancer at single base-pair resolution.

    PubMed

    Vidal, E; Sayols, S; Moran, S; Guillaumet-Adkins, A; Schroeder, M P; Royo, R; Orozco, M; Gut, M; Gut, I; Lopez-Bigas, N; Heyn, H; Esteller, M

    2017-10-05

    Although single base-pair resolution DNA methylation landscapes for embryonic and different somatic cell types provided important insights into epigenetic dynamics and cell-type specificity, such comprehensive profiling is incomplete across human cancer types. This prompted us to perform genome-wide DNA methylation profiling of 22 samples derived from normal tissues and associated neoplasms, including primary tumors and cancer cell lines. Unlike their invariant normal counterparts, cancer samples exhibited highly variable CpG methylation levels in a large proportion of the genome, involving progressive changes during tumor evolution. The whole-genome sequencing results from selected samples were replicated in a large cohort of 1112 primary tumors of various cancer types using genome-scale DNA methylation analysis. Specifically, we determined DNA hypermethylation of promoters and enhancers regulating tumor-suppressor genes, with potential cancer-driving effects. DNA hypermethylation events showed evidence of positive selection, mutual exclusivity and tissue specificity, suggesting their active participation in neoplastic transformation. Our data highlight the extensive changes in DNA methylation that occur in cancer onset, progression and dissemination.

  1. Assemblathon 2: evaluating de novo methods of genome assembly in three vertebrate species

    PubMed Central

    2013-01-01

    Background The process of generating raw genome sequence data continues to become cheaper, faster, and more accurate. However, assembly of such data into high-quality, finished genome sequences remains challenging. Many genome assembly tools are available, but they differ greatly in terms of their performance (speed, scalability, hardware requirements, acceptance of newer read technologies) and in their final output (composition of assembled sequence). More importantly, it remains largely unclear how to best assess the quality of assembled genome sequences. The Assemblathon competitions are intended to assess current state-of-the-art methods in genome assembly. Results In Assemblathon 2, we provided a variety of sequence data to be assembled for three vertebrate species (a bird, a fish, and a snake). This resulted in a total of 43 submitted assemblies from 21 participating teams. We evaluated these assemblies using a combination of optical map data, Fosmid sequences, and several statistical methods. From over 100 different metrics, we chose ten key measures by which to assess the overall quality of the assemblies. Conclusions Many current genome assemblers produced useful assemblies, containing a significant representation of their genes and overall genome structure. However, the high degree of variability between the entries suggests that there is still much room for improvement in the field of genome assembly and that approaches which work well in assembling the genome of one species may not necessarily work well for another. PMID:23870653

  2. Different Evolutionary Paths to Complexity for Small and Large Populations of Digital Organisms

    PubMed Central

    2016-01-01

    A major aim of evolutionary biology is to explain the respective roles of adaptive versus non-adaptive changes in the evolution of complexity. While selection is certainly responsible for the spread and maintenance of complex phenotypes, this does not automatically imply that strong selection enhances the chance for the emergence of novel traits, that is, the origination of complexity. Population size is one parameter that alters the relative importance of adaptive and non-adaptive processes: as population size decreases, selection weakens and genetic drift grows in importance. Because of this relationship, many theories invoke a role for population size in the evolution of complexity. Such theories are difficult to test empirically because of the time required for the evolution of complexity in biological populations. Here, we used digital experimental evolution to test whether large or small asexual populations tend to evolve greater complexity. We find that both small and large—but not intermediate-sized—populations are favored to evolve larger genomes, which provides the opportunity for subsequent increases in phenotypic complexity. However, small and large populations followed different evolutionary paths towards these novel traits. Small populations evolved larger genomes by fixing slightly deleterious insertions, while large populations fixed rare beneficial insertions that increased genome size. These results demonstrate that genetic drift can lead to the evolution of complexity in small populations and that purifying selection is not powerful enough to prevent the evolution of complexity in large populations. PMID:27923053

  3. A bacterial genome in transition - an exceptional enrichment of IS elements but lack of evidence for recent transposition in the symbiont Amoebophilus asiaticus

    PubMed Central

    2011-01-01

    Background Insertion sequence (IS) elements are important mediators of genome plasticity and are widespread among bacterial and archaeal genomes. The 1.88 Mbp genome of the obligate intracellular amoeba symbiont Amoebophilus asiaticus contains an unusually large number of transposase genes (n = 354; 23% of all genes). Results The transposase genes in the A. asiaticus genome can be assigned to 16 different IS elements termed ISCaa1 to ISCaa16, which are represented by 2 to 24 full-length copies, respectively. Despite this high IS element load, the A. asiaticus genome displays a GC skew pattern typical for most bacterial genomes, indicating that no major rearrangements have occurred recently. Additionally, the high sequence divergence of some IS elements, the high number of truncated IS element copies (n = 143), as well as the absence of direct repeats in most IS elements suggest that the IS elements of A. asiaticus are transpositionally inactive. Although we could show transcription of 13 IS elements, we did not find experimental evidence for transpositional activity, corroborating our results from sequence analyses. However, we detected contiguous transcripts between IS elements and their downstream genes at nine loci in the A. asiaticus genome, indicating that some IS elements influence the transcription of downstream genes, some of which might be important for host cell interaction. Conclusions Taken together, the IS elements in the A. asiaticus genome are currently in the process of degradation and largely represent reflections of the evolutionary past of A. asiaticus in which its genome was shaped by their activity. PMID:21943072
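
    The GC skew pattern mentioned in this record is usually computed per window as (G - C) / (G + C); in most bacterial genomes its cumulative sum changes direction near the replication origin and terminus, which is why an undisturbed pattern is read as evidence against recent large rearrangements. A minimal sketch under that standard definition follows; the window size, function names and toy sequence are assumptions, not details from the paper.

```python
def gc_skew(seq, window):
    """Return (G - C) / (G + C) for consecutive non-overlapping windows.

    Assumption: windows with no G or C are assigned a skew of 0.0, and a
    trailing partial window is ignored.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    seq = seq.upper()
    skews = []
    for start in range(0, len(seq) - window + 1, window):
        chunk = seq[start:start + window]
        g, c = chunk.count("G"), chunk.count("C")
        skews.append((g - c) / (g + c) if g + c else 0.0)
    return skews


def cumulative(values):
    """Running sum, as used for cumulative GC skew plots."""
    total, out = 0.0, []
    for value in values:
        total += value
        out.append(total)
    return out


# Toy sequence (hypothetical): skew flips sign between the two windows.
per_window = gc_skew("GGGCCCGC", 4)
print(per_window, cumulative(per_window))
```

    On a real genome one would use windows of several kilobases and plot the cumulative curve against genome position.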

  4. GenomeRNAi: a database for cell-based RNAi phenotypes.

    PubMed

    Horn, Thomas; Arziman, Zeynep; Berger, Juerg; Boutros, Michael

    2007-01-01

    RNA interference (RNAi) has emerged as a powerful tool to generate loss-of-function phenotypes in a variety of organisms. Combined with the sequence information of almost completely annotated genomes, RNAi technologies have opened new avenues to conduct systematic genetic screens for every annotated gene in the genome. As increasingly large datasets of RNAi-induced phenotypes become available, an important challenge remains the systematic integration and annotation of functional information. Genome-wide RNAi screens have been performed both in Caenorhabditis elegans and Drosophila for a variety of phenotypes and several RNAi libraries have become available to assess phenotypes for almost every gene in the genome. These screens were performed using different types of assays from visible phenotypes to focused transcriptional readouts and provide a rich data source for functional annotation across different species. The GenomeRNAi database provides access to published RNAi phenotypes obtained from cell-based screens and maps them to their genomic locus, including possible non-specific regions. The database also gives access to sequence information of RNAi probes used in various screens. It can be searched by phenotype, by gene, by RNAi probe or by sequence and is accessible at http://rnai.dkfz.de.

  5. GenomeRNAi: a database for cell-based RNAi phenotypes

    PubMed Central

    Horn, Thomas; Arziman, Zeynep; Berger, Juerg; Boutros, Michael

    2007-01-01

    RNA interference (RNAi) has emerged as a powerful tool to generate loss-of-function phenotypes in a variety of organisms. Combined with the sequence information of almost completely annotated genomes, RNAi technologies have opened new avenues to conduct systematic genetic screens for every annotated gene in the genome. As increasingly large datasets of RNAi-induced phenotypes become available, an important challenge remains the systematic integration and annotation of functional information. Genome-wide RNAi screens have been performed both in Caenorhabditis elegans and Drosophila for a variety of phenotypes and several RNAi libraries have become available to assess phenotypes for almost every gene in the genome. These screens were performed using different types of assays from visible phenotypes to focused transcriptional readouts and provide a rich data source for functional annotation across different species. The GenomeRNAi database provides access to published RNAi phenotypes obtained from cell-based screens and maps them to their genomic locus, including possible non-specific regions. The database also gives access to sequence information of RNAi probes used in various screens. It can be searched by phenotype, by gene, by RNAi probe or by sequence and is accessible at http://rnai.dkfz.de. PMID:17135194

  6. BactoGeNIE: A large-scale comparative genome visualization for big displays

    DOE PAGES

    Aurisano, Jillian; Reda, Khairi; Johnson, Andrew; ...

    2015-08-13

    The volume of complete bacterial genome sequence data available to comparative genomics researchers is rapidly increasing. However, visualizations in comparative genomics--which aim to enable analysis tasks across collections of genomes--suffer from visual scalability issues. While large, multi-tiled and high-resolution displays have the potential to address scalability issues, new approaches are needed to take advantage of such environments, in order to enable the effective visual analysis of large genomics datasets. In this paper, we present Bacterial Gene Neighborhood Investigation Environment, or BactoGeNIE, a novel and visually scalable design for comparative gene neighborhood analysis on large display environments. We evaluate BactoGeNIE through a case study on close to 700 draft Escherichia coli genomes, and present lessons learned from our design process. In conclusion, BactoGeNIE accommodates comparative tasks over substantially larger collections of neighborhoods than existing tools and explicitly addresses visual scalability. Given current trends in data generation, scalable designs of this type may inform visualization design for large-scale comparative research problems in genomics.

  7. BactoGeNIE: a large-scale comparative genome visualization for big displays

    PubMed Central

    2015-01-01

    Background The volume of complete bacterial genome sequence data available to comparative genomics researchers is rapidly increasing. However, visualizations in comparative genomics--which aim to enable analysis tasks across collections of genomes--suffer from visual scalability issues. While large, multi-tiled and high-resolution displays have the potential to address scalability issues, new approaches are needed to take advantage of such environments, in order to enable the effective visual analysis of large genomics datasets. Results In this paper, we present Bacterial Gene Neighborhood Investigation Environment, or BactoGeNIE, a novel and visually scalable design for comparative gene neighborhood analysis on large display environments. We evaluate BactoGeNIE through a case study on close to 700 draft Escherichia coli genomes, and present lessons learned from our design process. Conclusions BactoGeNIE accommodates comparative tasks over substantially larger collections of neighborhoods than existing tools and explicitly addresses visual scalability. Given current trends in data generation, scalable designs of this type may inform visualization design for large-scale comparative research problems in genomics. PMID:26329021

  8. BactoGeNIE: A large-scale comparative genome visualization for big displays

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Aurisano, Jillian; Reda, Khairi; Johnson, Andrew

    The volume of complete bacterial genome sequence data available to comparative genomics researchers is rapidly increasing. However, visualizations in comparative genomics--which aim to enable analysis tasks across collections of genomes--suffer from visual scalability issues. While large, multi-tiled and high-resolution displays have the potential to address scalability issues, new approaches are needed to take advantage of such environments, in order to enable the effective visual analysis of large genomics datasets. In this paper, we present Bacterial Gene Neighborhood Investigation Environment, or BactoGeNIE, a novel and visually scalable design for comparative gene neighborhood analysis on large display environments. We evaluate BactoGeNIE through a case study on close to 700 draft Escherichia coli genomes, and present lessons learned from our design process. In conclusion, BactoGeNIE accommodates comparative tasks over substantially larger collections of neighborhoods than existing tools and explicitly addresses visual scalability. Given current trends in data generation, scalable designs of this type may inform visualization design for large-scale comparative research problems in genomics.

  9. A pair of new BAC and BIBAC vectors that facilitate BAC/BIBAC library construction and intact large genomic DNA insert exchange.

    PubMed

    Shi, Xue; Zeng, Haiyang; Xue, Yadong; Luo, Meizhong

    2011-10-11

    Large-insert BAC and BIBAC libraries are important tools for structural and functional genomics studies of eukaryotic genomes. To facilitate the construction of BAC and BIBAC libraries and the transfer of complete large BAC inserts into BIBAC vectors, which is desired in positional cloning, we developed a pair of new BAC and BIBAC vectors. The new BAC vector pIndigoBAC536-S and the new BIBAC vector BIBAC-S have the following features: 1) both contain two 18-bp non-palindromic I-SceI sites in an inverted orientation at positions that flank an identical DNA fragment containing the lacZ selection marker and the cloning site. Large DNA inserts can be excised from the vectors as single fragments by cutting with I-SceI, allowing the inserts to be easily sized. More importantly, because the two vectors contain different antibiotic resistance genes for transformant selection and produce the same non-complementary 3' protruding ATAA ends by I-SceI that suppress self- and inter-ligations, the exchange of intact large genomic DNA inserts between the BAC and BIBAC vectors is straightforward; 2) both were constructed as high-copy composite vectors. Reliable linearized and dephosphorylated original low-copy pIndigoBAC536-S and BIBAC-S vectors that are ready for library construction can be prepared from the high-copy composite vectors pHZAUBAC1 and pHZAUBIBAC1, respectively, without the need for additional preparation steps or special reagents, thus simplifying the construction of BAC and BIBAC libraries. BIBAC clones constructed with the new BIBAC-S vector are stable in both E. coli and Agrobacterium. The vectors can be accessed through our website http://GResource.hzau.edu.cn. The two new vectors and their respective high-copy composite vectors can largely facilitate the construction and characterization of BAC and BIBAC libraries. The transfer of complete large genomic DNA inserts from one vector to the other is made straightforward.

  10. Genome-wide single nucleotide polymorphisms reveal population history and adaptive divergence in wild guppies.

    PubMed

    Willing, Eva-Maria; Bentzen, Paul; van Oosterhout, Cock; Hoffmann, Margarete; Cable, Joanne; Breden, Felix; Weigel, Detlef; Dreyer, Christine

    2010-03-01

    Adaptation of guppies (Poecilia reticulata) to contrasting upland and lowland habitats has been extensively studied with respect to behaviour, morphology and life history traits. Yet population history has not been studied at the whole-genome level. Although single nucleotide polymorphisms (SNPs) are the most abundant form of variation in many genomes and consequently very informative for a genome-wide picture of standing natural variation in populations, genome-wide SNP data are rarely available for wild vertebrates. Here we use genetically mapped SNP markers to comprehensively survey genetic variation within and among naturally occurring guppy populations from a wide geographic range in Trinidad and Venezuela. Results from three different clustering methods, Neighbor-net, principal component analysis (PCA) and Bayesian analysis show that the population substructure agrees with geographic separation and largely with previously hypothesized patterns of historical colonization. Within major drainages (Caroni, Oropouche and Northern), populations are genetically similar, but those in different geographic regions are highly divergent from one another, with some indications of ancient shared polymorphisms. Clear genomic signatures of a previous introduction experiment were seen, and we detected additional potential admixture events. Headwater populations were significantly less heterozygous than downstream populations. Pairwise F(ST) values revealed marked differences in allele frequencies among populations from different regions, and also among populations within the same region. F(ST) outlier methods indicated some regions of the genome as being under directional selection. Overall, this study demonstrates the power of a genome-wide SNP data set to inform studies on natural variation, adaptation and evolution of wild populations.
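
    Pairwise F(ST) summarizes how far allele frequencies differ between two populations. For a single biallelic SNP, the textbook form is F_ST = (H_T - H_S) / H_T, where H_S is the mean expected heterozygosity within populations and H_T is the expected heterozygosity of the pooled population. The sketch below uses that simple definition with equal population weights; it is an illustrative assumption and not the estimator reported by the authors, who may have applied sample-size corrections.

```python
def expected_heterozygosity(p):
    """Expected heterozygosity 2p(1 - p) at a biallelic locus."""
    return 2.0 * p * (1.0 - p)


def pairwise_fst(p1, p2):
    """Wright's F_ST for one biallelic SNP and two equally weighted populations.

    Assumption: p1 and p2 are the frequencies of the same allele in each
    population; no correction for sample size is applied.
    """
    for p in (p1, p2):
        if not 0.0 <= p <= 1.0:
            raise ValueError("allele frequencies must lie in [0, 1]")
    h_s = (expected_heterozygosity(p1) + expected_heterozygosity(p2)) / 2.0
    h_t = expected_heterozygosity((p1 + p2) / 2.0)
    if h_t == 0.0:
        return float("nan")  # locus is monomorphic overall; F_ST undefined
    return (h_t - h_s) / h_t


# Hypothetical frequencies: identical populations versus strongly diverged ones.
print(pairwise_fst(0.3, 0.3), pairwise_fst(0.2, 0.8))
```

    Loci whose values sit far above the genome-wide distribution are the kind of candidates that F(ST) outlier methods flag for directional selection.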

  11. Comparative genomic analysis of Acinetobacter strains isolated from murine colonic crypts.

    PubMed

    Saffarian, Azadeh; Touchon, Marie; Mulet, Céline; Tournebize, Régis; Passet, Virginie; Brisse, Sylvain; Rocha, Eduardo P C; Sansonetti, Philippe J; Pédron, Thierry

    2017-07-11

    A restricted set of aerobic bacteria dominated by the Acinetobacter genus was identified in murine intestinal colonic crypts. The vicinity of such bacteria with intestinal stem cells could indicate that they protect the crypt against cytotoxic and genotoxic signals. Genome analyses of these bacteria were performed to better appreciate their biodegradative capacities. Two taxonomically different clusters of Acinetobacter were isolated from murine proximal colonic crypts, one was identified as A. modestus and the other as A. radioresistens. Their identification was performed through biochemical parameters and housekeeping gene sequencing. After selection of one strain of each cluster (A. modestus CM11G and A. radioresistens CM38.2), comparative genomic analysis was performed on whole-genome sequencing data. The antibiotic resistance pattern of these two strains is different, in line with the many genes involved in resistance to heavy metals identified in both genomes. Moreover whereas the operon benABCDE involved in benzoate metabolism is encoded by the two genomes, the operon antABC encoding the anthranilate dioxygenase, and the phenol hydroxylase gene cluster are absent in the A. modestus genomic sequence, indicating that the two strains have different capacities to metabolize xenobiotics. A common feature of the two strains is the presence of a type IV pili system, and the presence of genes encoding proteins pertaining to secretion systems such as Type I and Type II secretion systems. Our comparative genomic analysis revealed that different Acinetobacter isolated from the same biological niche, even if they share a large majority of genes, possess unique features that could play a specific role in the protection of the intestinal crypt.

  12. A difference in the pattern of repair in a large genomic region in UV-irradiated normal human and Cockayne syndrome cells.

    PubMed

    Shanower, G A; Kantor, G J

    1997-11-01

    Xeroderma pigmentosum group C cells repair DNA damaged by ultraviolet radiation in an unusual pattern throughout the genome. They remove cyclobutane pyrimidine dimers only from the DNA of transcriptionally active chromatin regions and only from the strand that contains the transcribed strand. The repair proceeds in a manner that creates damage-free islands which are in some cases much larger than the active gene associated with them. For example, the small transcriptionally active beta-actin gene (3.5 kb) is repaired as part of a 50 kb single-stranded region. The repair responsible for creating these islands requires active transcription, suggesting that the two activities are coupled. A preferential repair pathway in normal human cells promotes repair of actively transcribed DNA strands and is coupled to transcription. It is not known if similar large islands, referred to as repair domains, are preferentially created as a result of the coupling. Data are presented showing that in normal cells, preferential repair in the beta-actin region is associated with the creation of a large, completely repaired region in the partially repaired genome. Repair at other genomic locations which contain inactive genes (insulin, 754) does not create similar large regions as quickly. In contrast, repair in Cockayne syndrome cells, which are defective in the preferential repair pathway but not in genome-overall repair, proceeds in the beta-actin region by a mechanism which does not create preferentially a large repaired region. Thus a correlation between the activity required to preferentially repair active genes and that required to create repaired domains is detected. We propose an involvement of the transcription-repair coupling factor in a coordinated repair pathway for removing DNA damage from entire transcription units.

  13. Anthocyanin inhibits propidium iodide DNA fluorescence in Euphorbia pulcherrima: implications for genome size variation and flow cytometry.

    PubMed

    Bennett, Michael D; Price, H James; Johnston, J Spencer

    2008-04-01

    Measuring genome size by flow cytometry assumes direct proportionality between nuclear DNA staining and DNA amount. By 1997 it was recognized that secondary metabolites may affect DNA staining, thereby causing inaccuracy. Here experiments are reported with poinsettia (Euphorbia pulcherrima) with green leaves and red bracts rich in phenolics. DNA content was estimated as fluorescence of propidium iodide (PI)-stained nuclei of poinsettia and/or pea (Pisum sativum) using flow cytometry. Tissue was chopped, or two tissues co-chopped, in Galbraith buffer alone or with six concentrations of cyanidin-3-rutinoside (a cyanidin-3-rhamnoglucoside contributing to red coloration in poinsettia). There were large differences in PI staining (35-70 %) between 2C nuclei from green leaf and red bract tissue in poinsettia. These largely disappeared when pea leaflets were co-chopped with poinsettia tissue as an internal standard. However, smaller (2.8-6.9 %) differences remained, and red bracts gave significantly lower 1C genome size estimates (1.69-1.76 pg) than green leaves (1.81 pg). Chopping pea or poinsettia tissue in buffer with 0-200 µM cyanidin-3-rutinoside showed that the effects of natural inhibitors in red bracts of poinsettia on PI staining were largely reproduced in a dose-dependent way by this anthocyanin. Given their near-ubiquitous distribution, many suspected roles and known effects on DNA staining, anthocyanins are a potent, potential cause of significant error variation in genome size estimations for many plant tissues and taxa. This has important implications of wide practical and theoretical significance. When choosing genome size calibration standards it seems prudent to select materials producing little or no anthocyanin. Reviewing the literature identifies clear examples in which claims of intraspecific variation in genome size are probably artefacts caused by natural variation in anthocyanin levels or correlated with environmental factors known to induce variation in pigmentation.
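
    The proportionality assumption described in this abstract can be illustrated with a minimal sketch. It assumes a co-chopped internal standard of known DNA content; the function names and the peak values in the usage note are hypothetical and are not taken from the study. Only the 1C estimates (1.81 pg for green leaves, 1.69-1.76 pg for red bracts) come from the abstract.

```python
def genome_size_from_peaks(sample_peak, standard_peak, standard_pg):
    """Estimate sample DNA content (pg) from fluorescence peak means.

    Assumes PI fluorescence is directly proportional to DNA amount,
    which is exactly the assumption that staining inhibitors violate.
    """
    if sample_peak <= 0 or standard_peak <= 0:
        raise ValueError("peak means must be positive")
    return standard_pg * sample_peak / standard_peak


def relative_shortfall_percent(reference_pg, test_pg):
    """Percent by which test_pg falls below reference_pg."""
    return 100.0 * (reference_pg - test_pg) / reference_pg


# 1C estimates quoted in the abstract: green leaf vs. red bract range.
low = relative_shortfall_percent(1.81, 1.76)
high = relative_shortfall_percent(1.81, 1.69)
print(f"red bract estimates are {low:.1f}% to {high:.1f}% below green leaf")
```

    With hypothetical peak means of 200 (sample) and 1000 (standard) and a hypothetical standard of 9.0 pg, `genome_size_from_peaks(200, 1000, 9.0)` gives 1.8 pg. If an inhibitor lowers the sample peak but not the standard peak, the estimate drops by the same proportion, which is how a staining artefact can be mistaken for genome size variation.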

  14. Impact of QTL minor allele frequency on genomic evaluation using real genotype data and simulated phenotypes in Japanese Black cattle.

    PubMed

    Uemoto, Yoshinobu; Sasaki, Shinji; Kojima, Takatoshi; Sugimoto, Yoshikazu; Watanabe, Toshio

    2015-11-19

    Genetic variance that is not captured by single nucleotide polymorphisms (SNPs) is due to imperfect linkage disequilibrium (LD) between SNPs and quantitative trait loci (QTLs), and the extent of LD between SNPs and QTLs depends on different minor allele frequencies (MAF) between them. To evaluate the impact of MAF of QTLs on genomic evaluation, we performed a simulation study using real cattle genotype data. In total, 1368 Japanese Black cattle and 592,034 SNPs (Illumina BovineHD BeadChip) were used. We simulated phenotypes using real genotypes under different scenarios, varying the MAF categories, QTL heritability, number of QTLs, and distribution of QTL effect. After generating true breeding values and phenotypes, QTL heritability was estimated and the prediction accuracy of genomic estimated breeding value (GEBV) was assessed under different SNP densities, prediction models, and population sizes by a reference-test validation design. The extent of LD between SNPs and QTLs in this population was higher in the QTLs with high MAF than in those with low MAF. The effect of MAF of QTLs depended on the genetic architecture, evaluation strategy, and population size in genomic evaluation. In genetic architecture, genomic evaluation was affected by the MAF of QTLs combined with the QTL heritability and the distribution of QTL effect. The number of QTLs did not affect genomic evaluation when it was more than 50. In the evaluation strategy, we showed that different SNP densities and prediction models affect the heritability estimation and genomic prediction and that this depends on the MAF of QTLs. In addition, accurate QTL heritability and GEBV were obtained using denser SNP information and a prediction model that accounted for SNPs with low and high MAFs. In population size, a large sample size is needed to increase the accuracy of GEBV. The MAF of QTLs had an impact on heritability estimation and prediction accuracy. Most genetic variance can be captured using denser SNPs and a prediction model that accounts for MAF, but a large sample size is needed to increase the accuracy of GEBV under all QTL MAF categories.
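
    The two quantities this abstract turns on, MAF and LD between a SNP and a QTL, can be sketched as below. This is an illustration only and not the authors' pipeline: genotypes are assumed to be coded as allele dosages (0, 1, 2), LD is approximated by the squared correlation of dosages (a common stand-in for r² when phase is unknown), and the toy genotype vectors are invented.

```python
def minor_allele_frequency(dosages):
    """MAF from 0/1/2 allele dosages of diploid individuals."""
    p = sum(dosages) / (2 * len(dosages))
    return min(p, 1 - p)


def genotype_r2(a, b):
    """Squared Pearson correlation between two dosage vectors."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a monomorphic locus carries no LD information
    return cov * cov / (var_a * var_b)


# Invented toy data: a rare QTL allele and a common SNP allele.
qtl = [0, 0, 0, 0, 0, 0, 0, 1, 0, 1]
snp = [0, 1, 2, 1, 0, 2, 1, 1, 0, 2]

print("QTL MAF:", minor_allele_frequency(qtl))
print("SNP MAF:", minor_allele_frequency(snp))
print("r2:", round(genotype_r2(qtl, snp), 3))
```

    When the two loci differ strongly in MAF, as in this toy case, r² cannot approach 1, which is the mechanism the abstract gives for low-MAF QTLs being captured less well by SNPs.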

  15. Wheat EST resources for functional genomics of abiotic stress

    PubMed Central

    Houde, Mario; Belcaid, Mahdi; Ouellet, François; Danyluk, Jean; Monroy, Antonio F; Dryanova, Ani; Gulick, Patrick; Bergeron, Anne; Laroche, André; Links, Matthew G; MacCarthy, Luke; Crosby, William L; Sarhan, Fathey

    2006-01-01

    Background Wheat is an excellent species to study freezing tolerance and other abiotic stresses. However, the sequence of the wheat genome has not been completely characterized due to its complexity and large size. To circumvent this obstacle and identify genes involved in cold acclimation and associated stresses, a large scale EST sequencing approach was undertaken by the Functional Genomics of Abiotic Stress (FGAS) project. Results We generated 73,521 quality-filtered ESTs from eleven cDNA libraries constructed from wheat plants exposed to various abiotic stresses and at different developmental stages. In addition, 196,041 ESTs for which tracefiles were available from the National Science Foundation wheat EST sequencing program and DuPont were also quality-filtered and used in the analysis. Clustering of the combined ESTs with d2_cluster and TGICL yielded a few large clusters containing several thousand ESTs that were refractory to routine clustering techniques. To resolve this problem, the sequence proximity and "bridges" were identified by an e-value distance graph to manually break clusters into smaller groups. Assembly of the resolved ESTs generated a 75,488 unique sequence set (31,580 contigs and 43,908 singletons/singlets). Digital expression analyses indicated that the FGAS dataset is enriched in stress-regulated genes compared to the other public datasets. Over 43% of the unique sequence set was annotated and classified into functional categories according to Gene Ontology. Conclusion We have annotated 29,556 different sequences, an almost 5-fold increase in annotated sequences compared to the available wheat public databases. Digital expression analysis combined with gene annotation helped in the identification of several pathways associated with abiotic stress. The genomic resources and knowledge developed by this project will contribute to a better understanding of the different mechanisms that govern stress tolerance in wheat and other cereals. PMID:16772040

  16. Non-equivalent contributions of maternal and paternal genomes to early plant embryogenesis.

    PubMed

    Del Toro-De León, Gerardo; García-Aguilar, Marcelina; Gillmor, C Stewart

    2014-10-30

    Zygotic genome activation in metazoans typically occurs several hours to a day after fertilization, and thus maternal RNAs and proteins drive early animal embryo development. In plants, despite several molecular studies of post-fertilization transcriptional activation, the timing of zygotic genome activation remains a matter of debate. For example, two recent reports that used different hybrid ecotype combinations for RNA sequence profiling of early Arabidopsis embryo transcriptomes came to divergent conclusions. One identified paternal contributions that varied by gene, but with overall maternal dominance, while the other found that the maternal and paternal genomes are transcriptionally equivalent. Here we assess paternal gene activation functionally in an isogenic background, by performing a large-scale genetic analysis of 49 EMBRYO DEFECTIVE genes and testing the ability of wild-type paternal alleles to complement phenotypes conditioned by mutant maternal alleles. Our results demonstrate that wild-type paternal alleles for nine of these genes are completely functional 2 days after pollination, with the remaining 40 genes showing partial activity beginning at 2, 3 or 5 days after pollination. Using our functional assay, we also demonstrate that different hybrid combinations exhibit significant variation in paternal allele activation, reconciling the apparently contradictory results of previous transcriptional studies. The variation in timing of gene function that we observe confirms that paternal genome activation does not occur in one early discrete step, provides large-scale functional evidence that maternal and paternal genomes make non-equivalent contributions to early plant embryogenesis, and uncovers an unexpectedly profound effect of hybrid genetic backgrounds on paternal gene activity.

  17. Successful application of FTA Classic Card technology and use of bacteriophage phi29 DNA polymerase for large-scale field sampling and cloning of complete maize streak virus genomes.

    PubMed

    Owor, Betty E; Shepherd, Dionne N; Taylor, Nigel J; Edema, Richard; Monjane, Adérito L; Thomson, Jennifer A; Martin, Darren P; Varsani, Arvind

    2007-03-01

    Leaf samples from 155 maize streak virus (MSV)-infected maize plants were collected from 155 farmers' fields in 23 districts in Uganda in May/June 2005 by leaf-pressing infected samples onto FTA Classic Cards. Viral DNA was successfully extracted from cards stored at room temperature for 9 months. The diversity of 127 MSV isolates was analysed by PCR-generated RFLPs. Six representative isolates having different RFLP patterns and causing either severe, moderate or mild disease symptoms, were chosen for amplification from FTA cards by bacteriophage phi29 DNA polymerase using the TempliPhi system. Full-length genomes were inserted into a cloning vector using a unique restriction enzyme site, and sequenced. The 1.3-kb PCR product amplified directly from FTA-eluted DNA and used for RFLP analysis was also cloned and sequenced. Comparison of cloned whole genome sequences with those of the original PCR products indicated that the correct virus genome had been cloned and that no errors were introduced by the phi29 polymerase. This is the first successful large-scale application of FTA card technology to the field, and illustrates the ease with which large numbers of infected samples can be collected and stored for downstream molecular applications such as diversity analysis and cloning of potentially new virus genomes.

  18. Nucleolus association of chromosomal domains is largely maintained in cellular senescence despite massive nuclear reorganisation.

    PubMed

    Dillinger, Stefan; Straub, Tobias; Németh, Attila

    2017-01-01

    Mammalian chromosomes are organized in structural and functional domains of 0.1-10 Mb, which are characterized by high self-association frequencies in the nuclear space and different contact probabilities with nuclear sub-compartments. They exhibit distinct chromatin modification patterns, gene expression levels and replication timing. Recently, nucleolus-associated chromosomal domains (NADs) have been discovered, yet their precise genomic organization and dynamics are still largely unknown. Here, we use nucleolus genomics and single-cell experiments to address these questions in human embryonic fibroblasts during replicative senescence. Genome-wide mapping reveals 1,646 NADs in proliferating cells, which cover about 38% of the annotated human genome. They are mainly heterochromatic and correlate with late replicating loci. Using Hi-C data analysis, we show that interactions of NADs dominate interphase chromosome contacts in the 10-50 Mb distance range. Interestingly, only minute changes in nucleolar association are observed upon senescence. These spatial rearrangements in subdomains smaller than 100 kb are accompanied with local transcriptional changes. In contrast, large centromeric and pericentromeric satellite repeat clusters extensively dissociate from nucleoli in senescent cells. Accordingly, H3K9me3-marked heterochromatin gets remodelled at the perinucleolar space as revealed by immunofluorescence analyses. Collectively, this study identifies connections between the nucleolus, 3D genome structure, and cellular aging at the level of interphase chromosome organization.

  19. Nucleolus association of chromosomal domains is largely maintained in cellular senescence despite massive nuclear reorganisation

    PubMed Central

    Dillinger, Stefan

    2017-01-01

    Mammalian chromosomes are organized in structural and functional domains of 0.1–10 Mb, which are characterized by high self-association frequencies in the nuclear space and different contact probabilities with nuclear sub-compartments. They exhibit distinct chromatin modification patterns, gene expression levels and replication timing. Recently, nucleolus-associated chromosomal domains (NADs) have been discovered, yet their precise genomic organization and dynamics are still largely unknown. Here, we use nucleolus genomics and single-cell experiments to address these questions in human embryonic fibroblasts during replicative senescence. Genome-wide mapping reveals 1,646 NADs in proliferating cells, which cover about 38% of the annotated human genome. They are mainly heterochromatic and correlate with late replicating loci. Using Hi-C data analysis, we show that interactions of NADs dominate interphase chromosome contacts in the 10–50 Mb distance range. Interestingly, only minute changes in nucleolar association are observed upon senescence. These spatial rearrangements in subdomains smaller than 100 kb are accompanied with local transcriptional changes. In contrast, large centromeric and pericentromeric satellite repeat clusters extensively dissociate from nucleoli in senescent cells. Accordingly, H3K9me3-marked heterochromatin gets remodelled at the perinucleolar space as revealed by immunofluorescence analyses. Collectively, this study identifies connections between the nucleolus, 3D genome structure, and cellular aging at the level of interphase chromosome organization. PMID:28575119

  20. A high-quality human reference panel reveals the complexity and distribution of genomic structural variants.

    PubMed

    Hehir-Kwa, Jayne Y; Marschall, Tobias; Kloosterman, Wigard P; Francioli, Laurent C; Baaijens, Jasmijn A; Dijkstra, Louis J; Abdellaoui, Abdel; Koval, Vyacheslav; Thung, Djie Tjwan; Wardenaar, René; Renkens, Ivo; Coe, Bradley P; Deelen, Patrick; de Ligt, Joep; Lameijer, Eric-Wubbo; van Dijk, Freerk; Hormozdiari, Fereydoun; Uitterlinden, André G; van Duijn, Cornelia M; Eichler, Evan E; de Bakker, Paul I W; Swertz, Morris A; Wijmenga, Cisca; van Ommen, Gert-Jan B; Slagboom, P Eline; Boomsma, Dorret I; Schönhuth, Alexander; Ye, Kai; Guryev, Victor

    2016-10-06

    Structural variation (SV) represents a major source of differences between individual human genomes and has been linked to disease phenotypes. However, the majority of studies neither provide a global view of the full spectrum of these variants nor integrate them into reference panels of genetic variation. Here, we analyse whole genome sequencing data of 769 individuals from 250 Dutch families, and provide a haplotype-resolved map of 1.9 million genome variants across 9 different variant classes, including novel forms of complex indels, and retrotransposition-mediated insertions of mobile elements and processed RNAs. A large proportion are previously underreported variants sized between 21 and 100 bp. We detect 4 megabases of novel sequence, encoding 11 new transcripts. Finally, we show 191 known, trait-associated SNPs to be in strong linkage disequilibrium with SVs and demonstrate that our panel facilitates accurate imputation of SVs in unrelated individuals.

  1. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Labbe, Jessy L; Uehling, Jessie K; Payen, Thibaut

    The last 10 years have seen the cost of sequencing complete genomes decrease at an incredible speed. This has led to an increase in the number of genomes sequenced across the fungal tree of life as well as a wide variety of plant genomes. The increase in sequencing has permitted us to study the evolution of organisms on a genomic scale. A number of talks during the conference discussed the importance of transposable elements (TEs) that are present in almost all species of fungi. These TEs represent an especially large percentage of genomic space in fungi that interact with plants. Thierry Rouxel (INRA, Nancy, France) showed the link between speciation in the Leptosphaeria complex and the expansion of TE families. For example, in the Leptosphaeria complex, one species associated with oilseed rape has experienced a recent and massive burst of movement by a few TE families. The alterations caused by these TEs took place in discrete regions of the genome, leading to shuffling of the genomic landscape and the appearance of genes specific to the species, such as effectors useful for the interactions with a particular plant (Rouxel et al., 2011). Other presentations showed the importance of TEs in affecting genome organization. For example, in Amanita different species appear to have been invaded by different TE families (Veneault-Fourrey & Martin, 2011).

  2. EULER-PCR: finishing experiments for repeat resolution.

    PubMed

    Mulyukov, Zufar; Pevzner, Pavel A

    2002-01-01

    Genomic sequencing typically generates a large collection of unordered contigs or scaffolds. Contig ordering (also known as gap closure) is a non-trivial algorithmic and experimental problem since even relatively simple-to-assemble bacterial genomes typically result in a large set of contigs. Neighboring contigs may be separated either by gaps in read coverage or by repeats. In the latter case we say that the contigs are separated by pseudogaps, and we emphasize the important difference between gap closure and pseudogap closure. The existing gap closure approaches do not distinguish between gaps and pseudogaps and treat them in the same way. We describe a new fast strategy for closing pseudogaps (repeat resolution). Since in highly repetitive genomes the number of pseudogaps may exceed the number of gaps by an order of magnitude, this approach provides a significant advantage over the existing gap closure methods.

  3. Mycobacterium leprae: genes, pseudogenes and genetic diversity

    PubMed Central

    Singh, Pushpendra; Cole, Stewart T

    2011-01-01

    Leprosy, which has afflicted human populations for millennia, results from infection with Mycobacterium leprae, an unculturable pathogen with an exceptionally long generation time. Considerable insight into the biology and drug resistance of the leprosy bacillus has been obtained from genomics. M. leprae has undergone reductive evolution and pseudogenes now occupy half of its genome. Comparative genomics of four different strains revealed remarkable conservation of the genome (99.995% identity) yet uncovered 215 polymorphic sites, mainly single nucleotide polymorphisms, and a handful of new pseudogenes. Mapping these polymorphisms in a large panel of strains defined 16 single nucleotide polymorphism-subtypes that showed strong geographical associations and helped retrace the evolution of M. leprae. PMID:21162636

  4. Demographic history, selection and functional diversity of the canine genome.

    PubMed

    Ostrander, Elaine A; Wayne, Robert K; Freedman, Adam H; Davis, Brian W

    2017-12-01

    The domestic dog represents one of the most dramatic long-term evolutionary experiments undertaken by humans. From a large wolf-like progenitor, unparalleled diversity in phenotype and behaviour has developed in dogs, providing a model for understanding the developmental and genomic mechanisms of diversification. We discuss pattern and process in domestication, beginning with general findings about early domestication and problems in documenting selection at the genomic level. Furthermore, we summarize genotype-phenotype studies based first on single nucleotide polymorphism (SNP) genotyping and then with whole-genome data and show how an understanding of evolution informs topics as different as human history, adaptive and deleterious variation, morphological development, ageing, cancer and behaviour.

  5. Emerging patterns of somatic mutations in cancer

    PubMed Central

    Watson, Ian R.; Takahashi, Koichi; Futreal, P. Andrew; Chin, Lynda

    2014-01-01

    The advance in technological tools for massively parallel, high-throughput sequencing of DNA has enabled the comprehensive characterization of somatic mutations in a large number of tumor samples. Here, we review recent cancer genomic studies that have assembled emerging views of the landscapes of somatic mutations through deep sequencing analyses of the coding exomes and whole genomes in various cancer types. We discuss the comparative genomics of different cancers, including mutation rates, spectrums, and roles of environmental insults that influence these processes. We highlight the developing statistical approaches used to identify significantly mutated genes, and discuss the emerging biological and clinical insights from such analyses as well as the challenges ahead in translating these genomic data into clinical impacts. PMID:24022702

  6. Genome-Wide Association Mapping and Genomic Prediction Elucidate the Genetic Architecture of Morphological Traits in Arabidopsis.

    PubMed

    Kooke, Rik; Kruijer, Willem; Bours, Ralph; Becker, Frank; Kuhn, André; van de Geest, Henri; Buntjer, Jaap; Doeswijk, Timo; Guerra, José; Bouwmeester, Harro; Vreugdenhil, Dick; Keurentjes, Joost J B

    2016-04-01

    Quantitative traits in plants are controlled by a large number of genes and their interaction with the environment. To disentangle the genetic architecture of such traits, natural variation within species can be explored by studying genotype-phenotype relationships. Genome-wide association studies that link phenotypes to thousands of single nucleotide polymorphism markers are nowadays common practice for such analyses. In many cases, however, the identified individual loci cannot fully explain the heritability estimates, suggesting missing heritability. We analyzed 349 Arabidopsis accessions and found extensive variation and high heritabilities for different morphological traits. The number of significant genome-wide associations was, however, very low. The application of genomic prediction models that take into account the effects of all individual loci may greatly enhance the elucidation of the genetic architecture of quantitative traits in plants. Here, genomic prediction models revealed different genetic architectures for the morphological traits. Integrating genomic prediction and association mapping enabled the assignment of many plausible candidate genes explaining the observed variation. These genes were analyzed for functional and sequence diversity, and good indications that natural allelic variation in many of these genes contributes to phenotypic variation were obtained. For ACS11, an ethylene biosynthesis gene, haplotype differences explaining variation in the ratio of petiole and leaf length could be identified. © 2016 American Society of Plant Biologists. All Rights Reserved.

  7. Prophage Integrase Typing Is a Useful Indicator of Genomic Diversity in Salmonella enterica

    PubMed Central

    Colavecchio, Anna; D’Souza, Yasmin; Tompkins, Elizabeth; Jeukens, Julie; Freschi, Luca; Emond-Rheault, Jean-Guillaume; Kukavica-Ibrulj, Irena; Boyle, Brian; Bekal, Sadjia; Tamber, Sandeep; Levesque, Roger C.; Goodridge, Lawrence D.

    2017-01-01

    Salmonella enterica is a bacterial species that is a major cause of illness in humans and food-producing animals. S. enterica exhibits considerable inter-serovar diversity, as evidenced by the large number of host adapted serovars that have been identified. The development of methods to assess genome diversity in S. enterica will help to further define the limits of diversity in this foodborne pathogen. Thus, we evaluated a PCR assay, which targets prophage integrase genes, as a rapid method to investigate S. enterica genome diversity. To evaluate the PCR prophage integrase assay, 49 isolates of S. enterica were selected, including 19 clinical isolates from clonal serovars (Enteritidis and Heidelberg) that commonly cause human illness, and 30 isolates from food-associated Salmonella serovars that rarely cause human illness. The number of integrase genes identified by the PCR assay was compared to the number of integrase genes within intact prophages identified by whole genome sequencing and phage finding program PHASTER. The PCR assay identified a total of 147 prophage integrase genes within the 49 S. enterica genomes (79 integrase genes in the food-associated Salmonella isolates, 50 integrase genes in S. Enteritidis, and 18 integrase genes in S. Heidelberg). In comparison, whole genome sequencing and PHASTER identified a total of 75 prophage integrase genes within 102 intact prophages in the 49 S. enterica genomes (44 integrase genes in the food-associated Salmonella isolates, 21 integrase genes in S. Enteritidis, and 9 integrase genes in S. Heidelberg). Collectively, both the PCR assay and PHASTER identified the presence of a large diversity of prophage integrase genes in the food-associated isolates compared to the clinical isolates, thus indicating a high degree of diversity in the food-associated isolates, and confirming the clonal nature of S. Enteritidis and S. Heidelberg. Moreover, PHASTER revealed a diversity of 29 different types of prophages and 23 different integrase genes within the food-associated isolates, but only identified four different phages and integrase genes within clonal isolates of S. Enteritidis and S. Heidelberg. These results demonstrate the potential usefulness of PCR based detection of prophage integrase genes as a rapid indicator of genome diversity in S. enterica. PMID:28740489

  8. Prophage Integrase Typing Is a Useful Indicator of Genomic Diversity in Salmonella enterica.

    PubMed

    Colavecchio, Anna; D'Souza, Yasmin; Tompkins, Elizabeth; Jeukens, Julie; Freschi, Luca; Emond-Rheault, Jean-Guillaume; Kukavica-Ibrulj, Irena; Boyle, Brian; Bekal, Sadjia; Tamber, Sandeep; Levesque, Roger C; Goodridge, Lawrence D

    2017-01-01

    Salmonella enterica is a bacterial species that is a major cause of illness in humans and food-producing animals. S. enterica exhibits considerable inter-serovar diversity, as evidenced by the large number of host adapted serovars that have been identified. The development of methods to assess genome diversity in S. enterica will help to further define the limits of diversity in this foodborne pathogen. Thus, we evaluated a PCR assay, which targets prophage integrase genes, as a rapid method to investigate S. enterica genome diversity. To evaluate the PCR prophage integrase assay, 49 isolates of S. enterica were selected, including 19 clinical isolates from clonal serovars (Enteritidis and Heidelberg) that commonly cause human illness, and 30 isolates from food-associated Salmonella serovars that rarely cause human illness. The number of integrase genes identified by the PCR assay was compared to the number of integrase genes within intact prophages identified by whole genome sequencing and phage finding program PHASTER. The PCR assay identified a total of 147 prophage integrase genes within the 49 S. enterica genomes (79 integrase genes in the food-associated Salmonella isolates, 50 integrase genes in S. Enteritidis, and 18 integrase genes in S. Heidelberg). In comparison, whole genome sequencing and PHASTER identified a total of 75 prophage integrase genes within 102 intact prophages in the 49 S. enterica genomes (44 integrase genes in the food-associated Salmonella isolates, 21 integrase genes in S. Enteritidis, and 9 integrase genes in S. Heidelberg). Collectively, both the PCR assay and PHASTER identified the presence of a large diversity of prophage integrase genes in the food-associated isolates compared to the clinical isolates, thus indicating a high degree of diversity in the food-associated isolates, and confirming the clonal nature of S. Enteritidis and S. Heidelberg. Moreover, PHASTER revealed a diversity of 29 different types of prophages and 23 different integrase genes within the food-associated isolates, but only identified four different phages and integrase genes within clonal isolates of S. Enteritidis and S. Heidelberg. These results demonstrate the potential usefulness of PCR based detection of prophage integrase genes as a rapid indicator of genome diversity in S. enterica.

  9. Developmental pathways inferred from modularity, morphological integration and fluctuating asymmetry patterns in the human face.

    PubMed

    Quinto-Sánchez, Mirsha; Muñoz-Muñoz, Francesc; Gomez-Valdes, Jorge; Cintas, Celia; Navarro, Pablo; Cerqueira, Caio Cesar Silva de; Paschetta, Carolina; de Azevedo, Soledad; Ramallo, Virginia; Acuña-Alonzo, Victor; Adhikari, Kaustubh; Fuentes-Guajardo, Macarena; Hünemeier, Tábita; Everardo, Paola; de Avila, Francisco; Jaramillo, Claudia; Arias, Williams; Gallo, Carla; Poletti, Giovani; Bedoya, Gabriel; Bortolini, Maria Cátira; Canizales-Quinteros, Samuel; Rothhammer, Francisco; Rosique, Javier; Ruiz-Linares, Andres; Gonzalez-Jose, Rolando

    2018-01-17

    Facial asymmetries are usually measured and interpreted as proxies to developmental noise. However, analyses focused on its developmental and genetic architecture are scarce. To advance on this topic, studies based on a comprehensive and simultaneous analysis of modularity, morphological integration and facial asymmetries including both phenotypic and genomic information are needed. Here we explore several modularity hypotheses on a sample of Latin American mestizos, in order to test if modularity and integration patterns differ across several genomic ancestry backgrounds. To do so, 4104 individuals were analyzed using 3D photogrammetry reconstructions and a set of 34 facial landmarks placed on each individual. We found a pattern of modularity and integration that is conserved across sub-samples differing in their genomic ancestry background. Specifically, a signal of modularity based on functional demands and organization of the face is regularly observed across the whole sample. Our results shed more light on previous evidence obtained from Genome Wide Association Studies performed on the same samples, indicating the action of different genomic regions contributing to the expression of the nose and mouth facial phenotypes. Our results also indicate that large samples including phenotypic and genomic metadata enable a better understanding of the developmental and genetic architecture of craniofacial phenotypes.

  10. Repetitive DNA in the pea (Pisum sativum L.) genome: comprehensive characterization using 454 sequencing and comparison to soybean and Medicago truncatula

    PubMed Central

    Macas, Jiří; Neumann, Pavel; Navrátilová, Alice

    2007-01-01

    Background Extraordinary size variation of higher plant nuclear genomes is in large part caused by differences in accumulation of repetitive DNA. This makes repetitive DNA of great interest for studying the molecular mechanisms shaping architecture and function of complex plant genomes. However, due to methodological constraints of conventional cloning and sequencing, a global description of repeat composition is available for only a very limited number of higher plants. In order to provide further data required for investigating evolutionary patterns of repeated DNA within and between species, we used a novel approach based on massive parallel sequencing which allowed a comprehensive repeat characterization in our model species, garden pea (Pisum sativum). Results Analysis of 33.3 Mb sequence data resulted in quantification and partial sequence reconstruction of major repeat families occurring in the pea genome with at least thousands of copies. Our results showed that the pea genome is dominated by LTR-retrotransposons, estimated at 140,000 copies/1C. Ty3/gypsy elements are less diverse and accumulated to higher copy numbers than Ty1/copia. This is in part due to a large population of Ogre-like retrotransposons which alone make up over 20% of the genome. In addition to numerous types of mobile elements, we have discovered a set of novel satellite repeats and two additional variants of telomeric sequences. Comparative genome analysis revealed that there are only a few repeat sequences conserved between pea and soybean genomes. On the other hand, all major families of pea mobile elements are well represented in M. truncatula. Conclusion We have demonstrated that even in a species with a relatively large genome like pea, where a single 454-sequencing run provided only 0.77% coverage, the generated sequences were sufficient to reconstruct and analyze major repeat families corresponding to a total of 35–48% of the genome. 
These data provide a starting point for further investigations of legume plant genomes based on their global comparative analysis and for the development of more sophisticated approaches for data mining. PMID:18031571
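
    The coverage figures quoted in this abstract can be cross-checked with simple arithmetic. The sketch below assumes that "coverage" means sequenced bases divided by haploid (1C) genome size; the function name is hypothetical, and the implied genome size is a back-of-the-envelope figure, not one stated in the record.

```python
def implied_genome_size_mb(sequenced_mb: float, coverage_fraction: float) -> float:
    """Genome size (Mb) implied by a sequenced amount at a given fractional coverage.

    Assumes coverage = sequenced bases / genome size, which is an
    interpretation of the abstract and not something it states explicitly.
    """
    if coverage_fraction <= 0:
        raise ValueError("coverage_fraction must be positive")
    return sequenced_mb / coverage_fraction


# Figures taken from the abstract: 33.3 Mb of sequence, 0.77% coverage.
size_mb = implied_genome_size_mb(33.3, 0.0077)

# Repeat families were reported to correspond to 35-48% of the genome.
repeat_low_mb = 0.35 * size_mb
repeat_high_mb = 0.48 * size_mb

print(f"implied 1C genome size: {size_mb:.0f} Mb")
print(f"reconstructed repeats: {repeat_low_mb:.0f}-{repeat_high_mb:.0f} Mb")
```

    Under that assumption the two numbers in the abstract are mutually consistent with a genome of a little over 4 Gb.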

  11. Extensive structural variations between mitochondrial genomes of CMS and normal peppers (Capsicum annuum L.) revealed by complete nucleotide sequencing.

    PubMed

    Jo, Yeong Deuk; Choi, Yoomi; Kim, Dong-Hwan; Kim, Byung-Dong; Kang, Byoung-Cheorl

    2014-07-04

    Cytoplasmic male sterility (CMS) is an inability to produce functional pollen that is caused by mutation of the mitochondrial genome. Comparative analyses of mitochondrial genomes of lines with and without CMS in several species have revealed structural differences between genomes, including extensive rearrangements caused by recombination. However, the mitochondrial genome structure and the DNA rearrangements that may be related to CMS have not been characterized in Capsicum spp. We obtained the complete mitochondrial genome sequences of the pepper CMS line FS4401 (507,452 bp) and the fertile line Jeju (511,530 bp). Comparative analysis between mitochondrial genomes of peppers and tobacco that are included in Solanaceae revealed extensive DNA rearrangements and poor conservation in non-coding DNA. In comparison between pepper lines, FS4401 and Jeju mitochondrial DNAs contained the same complement of protein coding genes except for one additional copy of an atp6 gene (ψatp6-2) in FS4401. In terms of genome structure, we found eighteen syntenic blocks in the two mitochondrial genomes, which have been rearranged in each genome. By contrast, sequences between syntenic blocks, which were specific to each line, accounted for 30,380 and 17,847 bp in FS4401 and Jeju, respectively. The previously-reported CMS candidate genes, orf507 and ψatp6-2, were located on the edges of the largest sequence segments that were specific to FS4401. In this region, a large number of small sequence segments which were absent or found at different locations in the Jeju mitochondrial genome were combined together. The incorporation of repeats and overlapping of connected sequence segments by a few nucleotides implied that extensive rearrangements by homologous recombination might be involved in evolution of this region. Further analysis using mtDNA pairs from other plant species revealed common features of DNA regions around CMS-associated genes.
Although a large portion of sequence context was shared by mitochondrial genomes of CMS and male-fertile pepper lines, extensive genome rearrangements were detected. CMS candidate genes were located on the edges of highly rearranged CMS-specific DNA regions and near repeat sequences. These characteristics were detected among CMS-associated genes in other species, implying a common mechanism might be involved in the evolution of CMS-associated genes.

  12. Genome-Wide Analysis of Grain Yield Stability and Environmental Interactions in a Multiparental Soybean Population.

    PubMed

    Xavier, Alencar; Jarquin, Diego; Howard, Reka; Ramasubramanian, Vishnu; Specht, James E; Graef, George L; Beavis, William D; Diers, Brian W; Song, Qijian; Cregan, Perry B; Nelson, Randall; Mian, Rouf; Shannon, J Grover; McHale, Leah; Wang, Dechun; Schapaugh, William; Lorenz, Aaron J; Xu, Shizhong; Muir, William M; Rainey, Katy M

    2018-02-02

    Genetic improvement toward optimized and stable agronomic performance of soybean genotypes is desirable for food security. Understanding how genotypes perform in different environmental conditions helps breeders develop sustainable cultivars adapted to target regions. Complex traits of importance are known to be controlled by a large number of genomic regions with small effects whose magnitude and direction are modulated by environmental factors. Knowledge of the constraints and undesirable effects resulting from genotype by environmental interactions is a key objective in improving selection procedures in soybean breeding programs. In this study, the genetic basis of soybean grain yield responsiveness to environmental factors was examined in a large soybean nested association population. For this, a genome-wide association to performance stability estimates generated from a Finlay-Wilkinson analysis and the inclusion of the interaction between marker genotypes and environmental factors was implemented. Genomic footprints were investigated by analysis and meta-analysis using a recently published multiparent model. Results indicated that specific soybean genomic regions were associated with stability, and that multiplicative interactions were present between environments and genetic background. Seven genomic regions in six chromosomes were identified as being associated with genotype-by-environment interactions. This study provides insight into genomic assisted breeding aimed at achieving a more stable agronomic performance of soybean, and documented opportunities to exploit genomic regions that were specifically associated with interactions involving environments and subpopulations. Copyright © 2018 Xavier et al.
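
    For readers unfamiliar with the Finlay-Wilkinson analysis named above, the following is a minimal sketch of the classical regression idea: each genotype's yield is regressed on an environmental index (environment mean minus grand mean), and the slope serves as a stability estimate. The function name and toy yields are hypothetical, the data are assumed to be balanced (every genotype observed in every environment), and this is not the authors' actual model, which also included marker-by-environment interactions.

```python
from statistics import mean


def finlay_wilkinson_slopes(yields):
    """Return {genotype: slope} from regressing yield on the environmental index.

    `yields` maps each genotype to a list of yields, one per environment,
    with environments in the same order for every genotype (balanced data).
    """
    n_env = len(next(iter(yields.values())))
    # Environmental index: mean yield in each environment, centred on the grand mean.
    env_means = [mean(y[j] for y in yields.values()) for j in range(n_env)]
    grand_mean = mean(env_means)
    index = [m - grand_mean for m in env_means]
    ss_index = sum(e * e for e in index)
    if ss_index == 0:
        raise ValueError("environments do not differ; slopes are undefined")

    slopes = {}
    for genotype, y in yields.items():
        g_mean = mean(y)
        # Ordinary least-squares slope of yield on the centred index.
        slopes[genotype] = sum(e * (v - g_mean) for e, v in zip(index, y)) / ss_index
    return slopes


# Toy data: three hypothetical genotypes in three environments.
toy_yields = {
    "G1": [2.0, 4.0, 6.0],
    "G2": [3.0, 4.0, 5.0],
    "G3": [1.0, 4.0, 7.0],
}
slopes = finlay_wilkinson_slopes(toy_yields)
for genotype, slope in slopes.items():
    print(genotype, round(slope, 2))
```

    A slope near 1 indicates average responsiveness to the environment; slopes below 1 indicate a more stable genotype and slopes above 1 a more responsive one. With balanced data the slopes average to 1 by construction.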

  13. VAT: a computational framework to functionally annotate variants in personal genomes within a cloud-computing environment

    PubMed Central

    Habegger, Lukas; Balasubramanian, Suganthi; Chen, David Z.; Khurana, Ekta; Sboner, Andrea; Harmanci, Arif; Rozowsky, Joel; Clarke, Declan; Snyder, Michael; Gerstein, Mark

    2012-01-01

    Summary: The functional annotation of variants obtained through sequencing projects is generally assumed to be a simple intersection of genomic coordinates with genomic features. However, complexities arise for several reasons, including the differential effects of a variant on alternatively spliced transcripts, as well as the difficulty in assessing the impact of small insertions/deletions and large structural variants. Taking these factors into consideration, we developed the Variant Annotation Tool (VAT) to functionally annotate variants from multiple personal genomes at the transcript level as well as obtain summary statistics across genes and individuals. VAT also allows visualization of the effects of different variants, integrates allele frequencies and genotype data from the underlying individuals and facilitates comparative analysis between different groups of individuals. VAT can either be run through a command-line interface or as a web application. Finally, in order to enable on-demand access and to minimize unnecessary transfers of large data files, VAT can be run as a virtual machine in a cloud-computing environment. Availability and Implementation: VAT is implemented in C and PHP. The VAT web service, Amazon Machine Image, source code and detailed documentation are available at vat.gersteinlab.org. Contact: lukas.habegger@yale.edu or mark.gerstein@yale.edu Supplementary Information: Supplementary data are available at Bioinformatics online. PMID:22743228
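
    The "simple intersection of genomic coordinates with genomic features" that the abstract takes as its starting point can be sketched as below. This is only the naive baseline, not VAT's implementation (which is written in C and PHP and works at the transcript level); the `Feature` type, the function name, the half-open coordinate convention and the example intervals are all assumptions made for illustration.

```python
from typing import List, NamedTuple


class Feature(NamedTuple):
    """A hypothetical annotated interval, 0-based and half-open: [start, end)."""
    name: str
    chrom: str
    start: int
    end: int


def overlapping_features(chrom: str, pos: int, features: List[Feature]) -> List[str]:
    """Return names of features on `chrom` whose interval contains position `pos`."""
    return [
        f.name
        for f in features
        if f.chrom == chrom and f.start <= pos < f.end
    ]


# Made-up features; two of them overlap on chr1.
features = [
    Feature("exon_A", "chr1", 100, 200),
    Feature("exon_B", "chr1", 150, 300),
    Feature("exon_C", "chr2", 100, 200),
]

hits = overlapping_features("chr1", 160, features)
print(hits)
```

    A single position can fall in several overlapping features, as with the two chr1 intervals here, which hints at why the abstract treats alternatively spliced transcripts and larger variants as complications beyond this lookup.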

  14. Intrapopulation Genome Size Variation in D. melanogaster Reflects Life History Variation and Plasticity

    PubMed Central

    Ellis, Lisa L.; Huang, Wen; Quinn, Andrew M.; Ahuja, Astha; Alfrejd, Ben; Gomez, Francisco E.; Hjelmen, Carl E.; Moore, Kristi L.; Mackay, Trudy F. C.; Johnston, J. Spencer; Tarone, Aaron M.

    2014-01-01

    We determined female genome sizes using flow cytometry for 211 Drosophila melanogaster sequenced inbred strains from the Drosophila Genetic Reference Panel, and found significant conspecific and intrapopulation variation in genome size. We also compared several life history traits for 25 lines with large and 25 lines with small genomes in three thermal environments, and found that genome size as well as genome size by temperature interactions significantly correlated with survival to pupation and adulthood, time to pupation, female pupal mass, and female eclosion rates. Genome size accounted for up to 23% of the variation in developmental phenotypes, but the contribution of genome size to variation in life history traits was plastic and varied according to the thermal environment. Expression data implicate differences in metabolism that correspond to genome size variation. These results indicate that significant genome size variation exists within D. melanogaster and this variation may impact the evolutionary ecology of the species. Genome size variation accounts for a significant portion of life history variation in an environmentally dependent manner, suggesting that potential fitness effects associated with genome size variation also depend on environmental conditions. PMID:25057905

  15. Host genetics of response to porcine reproductive and respiratory syndrome in nursery pigs.

    PubMed

    Dekkers, Jack; Rowland, Raymond R R; Lunney, Joan K; Plastow, Graham

    2017-09-01

    PRRS is the most costly disease in the US pig industry. While vaccination, biosecurity and eradication efforts have had some success, the variability and infectiousness of PRRS virus strains have hampered the effectiveness of these measures. We propose the use of genetic selection of pigs as an additional and complementary effort. Several studies have shown that host response to PRRS infection has a sizeable genetic component and recent advances in genomics provide opportunities to capitalize on these genetic differences and improve our understanding of host response to PRRS. While work is also ongoing to understand the genetic basis of host response to reproductive PRRS, the focus of this review is on research conducted on host response to PRRS in the nursery and grow-finish phase as part of the PRRS Host Genetics Consortium. Using experimental infection of large numbers of commercial nursery pigs, combined with deep phenotyping and genomics, this research has identified a major gene that is associated with host response to PRRS. Further functional genomics work identified the GBP5 gene as harboring the putative causative mutation. GBP5 is associated with innate immune response. Subsequent work has validated the effect of this genomic region on host response to a second PRRSV strain and to PRRS vaccination and co-infection of nursery pigs with PRRSV and PCV2b. A genetic marker near GBP5 is available to the industry for use in selection. Genetic differences in host response beyond GBP5 appear to be highly polygenic, i.e. controlled by many genes across the genome, each with a small effect. Such effects can be capitalized on in a selection program using genomic prediction on large numbers of genetic markers across the genome. Additional work has also identified the genetic basis of antibody response to PRRS, which could lead to the use of vaccine response as an indicator trait to select for host response to PRRS.
Other genomic analyses, including gene expression analyses, have identified genes and modules of genes that are associated with differences in host response to PRRS and can be used to further understand and utilize differences in host response. Together, these results demonstrate that genetic selection can be an additional and complementary tool to combat PRRS in the swine industry. Copyright © 2017 Elsevier B.V. All rights reserved.

  16. Radiation hybrid maps of D-genome of Aegilops tauschii and their application in sequence assembly of large and complex plant genomes

    USDA-ARS's Scientific Manuscript database

    The large and complex genome of bread wheat (Triticum aestivum L., ~17 Gb) requires high-resolution genome maps saturated with ordered markers to assist in anchoring and orienting BAC contigs/ sequence scaffolds for whole genome sequence assembly. Radiation hybrid (RH) mapping has proven to be an e...

  17. Ensembl 2004.

    PubMed

    Birney, E; Andrews, D; Bevan, P; Caccamo, M; Cameron, G; Chen, Y; Clarke, L; Coates, G; Cox, T; Cuff, J; Curwen, V; Cutts, T; Down, T; Durbin, R; Eyras, E; Fernandez-Suarez, X M; Gane, P; Gibbins, B; Gilbert, J; Hammond, M; Hotz, H; Iyer, V; Kahari, A; Jekosch, K; Kasprzyk, A; Keefe, D; Keenan, S; Lehvaslaiho, H; McVicker, G; Melsopp, C; Meidl, P; Mongin, E; Pettett, R; Potter, S; Proctor, G; Rae, M; Searle, S; Slater, G; Smedley, D; Smith, J; Spooner, W; Stabenau, A; Stalker, J; Storey, R; Ureta-Vidal, A; Woodwark, C; Clamp, M; Hubbard, T

    2004-01-01

    The Ensembl (http://www.ensembl.org/) database project provides a bioinformatics framework to organize biology around the sequences of large genomes. It is a comprehensive and integrated source of annotation of large genome sequences, available via interactive website, web services or flat files. As well as being one of the leading sources of genome annotation, Ensembl is an open source software engineering project to develop a portable system able to handle very large genomes and associated requirements. The facilities of the system range from sequence analysis to data storage and visualization and installations exist around the world both in companies and at academic sites. With a total of nine genome sequences available from Ensembl and more genomes to follow, recent developments have focused mainly on closer integration between genomes and external data.

  18. Challenges and opportunities for genomic developmental neuropsychology: examples from the Penn-Drexel collaborative battery.

    PubMed

    Gur, Ruben C; Irani, Farzin; Seligman, Sarah; Calkins, Monica E; Richard, Jan; Gur, Raquel E

    2011-08-01

    Genomics has been revolutionizing medicine over the past decade by offering mechanistic insights into disease processes and engendering the age of "individualized medicine." Because of the sheer number of measures generated by gene sequencing methods, genomics requires "Big Science" where large datasets on genes are analyzed in reference to electronic medical record data. This revolution has largely bypassed the behavioral neurosciences, mainly because of the paucity of behavioral data in medical records and the labor-intensity of available neuropsychological assessment methods. We describe the development and implementation of an efficient neuroscience-based computerized battery, coupled with a computerized clinical assessment procedure. This assessment package has been applied to a genomic study of 10,000 children aged 8-21, of whom 1000 also undergo neuroimaging. Results from the first 3000 participants indicate sensitivity to neurodevelopmental trajectories. Sex differences were evident, with females outperforming males in memory and social cognition domains, while for spatial processing males were more accurate and faster, and they were faster on simple motor tasks. The study illustrates what will hopefully become a major component of the work of clinical and research neuropsychologists as invaluable participants in the dawning age of Big Science neuropsychological genomics.

  19. Practical Approaches for Detecting Selection in Microbial Genomes.

    PubMed

    Hedge, Jessica; Wilson, Daniel J

    2016-02-01

    Microbial genome evolution is shaped by a variety of selective pressures. Understanding how these processes occur can help to address important problems in microbiology by explaining observed differences in phenotypes, including virulence and resistance to antibiotics. Greater access to whole-genome sequencing provides microbiologists with the opportunity to perform large-scale analyses of selection in novel settings, such as within individual hosts. This tutorial aims to guide researchers through the fundamentals underpinning popular methods for measuring selection in pathogens. These methods are transferable to a wide variety of organisms, and the exercises provided are designed for researchers with any level of programming experience.

  20. Atlas2 Cloud: a framework for personal genome analysis in the cloud

    PubMed Central

    2012-01-01

    Background Until recently, sequencing has primarily been carried out in large genome centers which have invested heavily in developing the computational infrastructure that enables genomic sequence analysis. The recent advancements in next generation sequencing (NGS) have led to a wide dissemination of sequencing technologies and data, to highly diverse research groups. It is expected that clinical sequencing will become part of diagnostic routines shortly. However, limited accessibility to computational infrastructure and high quality bioinformatic tools, and the demand for personnel skilled in data analysis and interpretation remains a serious bottleneck. To this end, the cloud computing and Software-as-a-Service (SaaS) technologies can help address these issues. Results We successfully enabled the Atlas2 Cloud pipeline for personal genome analysis on two different cloud service platforms: a community cloud via the Genboree Workbench, and a commercial cloud via the Amazon Web Services using Software-as-a-Service model. We report a case study of personal genome analysis using our Atlas2 Genboree pipeline. We also outline a detailed cost structure for running Atlas2 Amazon on whole exome capture data, providing cost projections in terms of storage, compute and I/O when running Atlas2 Amazon on a large data set. Conclusions We find that providing a web interface and an optimized pipeline clearly facilitates usage of cloud computing for personal genome analysis, but for it to be routinely used for large scale projects there needs to be a paradigm shift in the way we develop tools, in standard operating procedures, and in funding mechanisms. PMID:23134663

  1. Atlas2 Cloud: a framework for personal genome analysis in the cloud.

    PubMed

    Evani, Uday S; Challis, Danny; Yu, Jin; Jackson, Andrew R; Paithankar, Sameer; Bainbridge, Matthew N; Jakkamsetti, Adinarayana; Pham, Peter; Coarfa, Cristian; Milosavljevic, Aleksandar; Yu, Fuli

    2012-01-01

    Until recently, sequencing has primarily been carried out in large genome centers which have invested heavily in developing the computational infrastructure that enables genomic sequence analysis. The recent advancements in next generation sequencing (NGS) have led to a wide dissemination of sequencing technologies and data, to highly diverse research groups. It is expected that clinical sequencing will become part of diagnostic routines shortly. However, limited accessibility to computational infrastructure and high quality bioinformatic tools, and the demand for personnel skilled in data analysis and interpretation remains a serious bottleneck. To this end, the cloud computing and Software-as-a-Service (SaaS) technologies can help address these issues. We successfully enabled the Atlas2 Cloud pipeline for personal genome analysis on two different cloud service platforms: a community cloud via the Genboree Workbench, and a commercial cloud via the Amazon Web Services using Software-as-a-Service model. We report a case study of personal genome analysis using our Atlas2 Genboree pipeline. We also outline a detailed cost structure for running Atlas2 Amazon on whole exome capture data, providing cost projections in terms of storage, compute and I/O when running Atlas2 Amazon on a large data set. We find that providing a web interface and an optimized pipeline clearly facilitates usage of cloud computing for personal genome analysis, but for it to be routinely used for large scale projects there needs to be a paradigm shift in the way we develop tools, in standard operating procedures, and in funding mechanisms.

  2. Parallel altitudinal clines reveal trends in adaptive evolution of genome size in Zea mays

    PubMed Central

    Berg, Jeremy J.; Birchler, James A.; Grote, Mark N.; Lorant, Anne; Quezada, Juvenal

    2018-01-01

    While the vast majority of genome size variation in plants is due to differences in repetitive sequence, we know little about how selection acts on repeat content in natural populations. Here we investigate parallel changes in intraspecific genome size and repeat content of domesticated maize (Zea mays) landraces and their wild relative teosinte across altitudinal gradients in Mesoamerica and South America. We combine genotyping, low coverage whole-genome sequence data, and flow cytometry to test for evidence of selection on genome size and individual repeat abundance. We find that population structure alone cannot explain the observed variation, implying that clinal patterns of genome size are maintained by natural selection. Our modeling additionally provides evidence of selection on individual heterochromatic knob repeats, likely due to their large individual contribution to genome size. To better understand the phenotypes driving selection on genome size, we conducted a growth chamber experiment using a population of highland teosinte exhibiting extensive variation in genome size. We find weak support for a positive correlation between genome size and cell size, but stronger support for a negative correlation between genome size and the rate of cell production. Reanalyzing published data of cell counts in maize shoot apical meristems, we then identify a negative correlation between cell production rate and flowering time. Together, our data suggest a model in which variation in genome size is driven by natural selection on flowering time across altitudinal clines, connecting intraspecific variation in repetitive sequence to important differences in adaptive phenotypes. PMID:29746459

  3. Serendipitous discovery of Wolbachia genomes in multiple Drosophila species.

    PubMed

    Salzberg, Steven L; Dunning Hotopp, Julie C; Delcher, Arthur L; Pop, Mihai; Smith, Douglas R; Eisen, Michael B; Nelson, William C

    2005-01-01

    The Trace Archive is a repository for the raw, unanalyzed data generated by large-scale genome sequencing projects. The existence of this data offers scientists the possibility of discovering additional genomic sequences beyond those originally sequenced. In particular, if the source DNA for a sequencing project came from a species that was colonized by another organism, then the project may yield substantial amounts of genomic DNA, including near-complete genomes, from the symbiotic or parasitic organism. By searching the publicly available repository of DNA sequencing trace data, we discovered three new species of the bacterial endosymbiont Wolbachia pipientis in three different species of fruit fly: Drosophila ananassae, D. simulans, and D. mojavensis. We extracted all sequences with partial matches to a previously sequenced Wolbachia strain and assembled those sequences using customized software. For one of the three new species, the data recovered were sufficient to produce an assembly that covers more than 95% of the genome; for a second species the data produce the equivalent of a 'light shotgun' sampling of the genome, covering an estimated 75-80% of the genome; and for the third species the data cover approximately 6-7% of the genome. The results of this study reveal an unexpected benefit of depositing raw data in a central genome sequence repository: new species can be discovered within this data. The differences between these three new Wolbachia genomes and the previously sequenced strain revealed numerous rearrangements and insertions within each lineage and hundreds of novel genes. The three new genomes, with annotation, have been deposited in GenBank.

  4. Imputation of unordered markers and the impact on genomic selection accuracy

    USDA-ARS's Scientific Manuscript database

    Genomic selection, a breeding method that promises to accelerate rates of genetic gain, requires dense, genome-wide marker data. Genotyping-by-sequencing can generate a large number of de novo markers. However, without a reference genome, these markers are unordered and typically have a large propo...

  5. Integration of information and volume visualization for analysis of cell lineage and gene expression during embryogenesis

    NASA Astrophysics Data System (ADS)

    Cedilnik, Andrej; Baumes, Jeffrey; Ibanez, Luis; Megason, Sean; Wylie, Brian

    2008-01-01

    Dramatic technological advances in the field of genomics have made it possible to sequence the complete genomes of many different organisms. With this overwhelming amount of data at hand, biologists are now confronted with the challenge of understanding the function of the many different elements of the genome. One of the best places to start gaining insight on the mechanisms by which the genome controls an organism is the study of embryogenesis. There are multiple and inter-related layers of information that must be established in order to understand how the genome controls the formation of an organism. One is cell lineage which describes how patterns of cell division give rise to different parts of an organism. Another is gene expression which describes when and where different genes are turned on. Both of these data types can now be acquired using fluorescent laser-scanning (confocal or 2-photon) microscopy of embryos tagged with fluorescent proteins to generate 3D movies of developing embryos. However, analyzing the wealth of resulting images requires tools capable of interactively visualizing several different types of information as well as being scalable to terabytes of data. This paper describes how the combination of existing large data volume visualization and the new Titan information visualization framework of the Visualization Toolkit (VTK) can be applied to the problem of studying the cell lineage of an organism. In particular, by linking the visualization of spatial and temporal gene expression data with novel ways of visualizing cell lineage data, users can study how the genome regulates different aspects of embryonic development.

  6. An Exploration into Fern Genome Space.

    PubMed

    Wolf, Paul G; Sessa, Emily B; Marchant, Daniel Blaine; Li, Fay-Wei; Rothfels, Carl J; Sigel, Erin M; Gitzendanner, Matthew A; Visger, Clayton J; Banks, Jo Ann; Soltis, Douglas E; Soltis, Pamela S; Pryer, Kathleen M; Der, Joshua P

    2015-08-26

    Ferns are one of the few remaining major clades of land plants for which a complete genome sequence is lacking. Knowledge of genome space in ferns will enable broad-scale comparative analyses of land plant genes and genomes, provide insights into genome evolution across green plants, and shed light on genetic and genomic features that characterize ferns, such as their high chromosome numbers and large genome sizes. As part of an initial exploration into fern genome space, we used a whole genome shotgun sequencing approach to obtain low-density coverage (∼0.4X to 2X) for six fern species from the Polypodiales (Ceratopteris, Pteridium, Polypodium, Cystopteris), Cyatheales (Plagiogyria), and Gleicheniales (Dipteris). We explore these data to characterize the proportion of the nuclear genome represented by repetitive sequences (including DNA transposons, retrotransposons, ribosomal DNA, and simple repeats) and protein-coding genes, and to extract chloroplast and mitochondrial genome sequences. Such initial sweeps of fern genomes can provide information useful for selecting a promising candidate fern species for whole genome sequencing. We also describe variation of genomic traits across our sample and highlight some differences and similarities in repeat structure between ferns and seed plants. © The Author(s) 2015. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.

  7. Analyses of deep mammalian sequence alignments and constraint predictions for 1% of the human genome

    PubMed Central

    Margulies, Elliott H.; Cooper, Gregory M.; Asimenos, George; Thomas, Daryl J.; Dewey, Colin N.; Siepel, Adam; Birney, Ewan; Keefe, Damian; Schwartz, Ariel S.; Hou, Minmei; Taylor, James; Nikolaev, Sergey; Montoya-Burgos, Juan I.; Löytynoja, Ari; Whelan, Simon; Pardi, Fabio; Massingham, Tim; Brown, James B.; Bickel, Peter; Holmes, Ian; Mullikin, James C.; Ureta-Vidal, Abel; Paten, Benedict; Stone, Eric A.; Rosenbloom, Kate R.; Kent, W. James; Bouffard, Gerard G.; Guan, Xiaobin; Hansen, Nancy F.; Idol, Jacquelyn R.; Maduro, Valerie V.B.; Maskeri, Baishali; McDowell, Jennifer C.; Park, Morgan; Thomas, Pamela J.; Young, Alice C.; Blakesley, Robert W.; Muzny, Donna M.; Sodergren, Erica; Wheeler, David A.; Worley, Kim C.; Jiang, Huaiyang; Weinstock, George M.; Gibbs, Richard A.; Graves, Tina; Fulton, Robert; Mardis, Elaine R.; Wilson, Richard K.; Clamp, Michele; Cuff, James; Gnerre, Sante; Jaffe, David B.; Chang, Jean L.; Lindblad-Toh, Kerstin; Lander, Eric S.; Hinrichs, Angie; Trumbower, Heather; Clawson, Hiram; Zweig, Ann; Kuhn, Robert M.; Barber, Galt; Harte, Rachel; Karolchik, Donna; Field, Matthew A.; Moore, Richard A.; Matthewson, Carrie A.; Schein, Jacqueline E.; Marra, Marco A.; Antonarakis, Stylianos E.; Batzoglou, Serafim; Goldman, Nick; Hardison, Ross; Haussler, David; Miller, Webb; Pachter, Lior; Green, Eric D.; Sidow, Arend

    2007-01-01

    A key component of the ongoing ENCODE project involves rigorous comparative sequence analyses for the initially targeted 1% of the human genome. Here, we present orthologous sequence generation, alignment, and evolutionary constraint analyses of 23 mammalian species for all ENCODE targets. Alignments were generated using four different methods; comparisons of these methods reveal large-scale consistency but substantial differences in terms of small genomic rearrangements, sensitivity (sequence coverage), and specificity (alignment accuracy). We describe the quantitative and qualitative trade-offs concomitant with alignment method choice and the levels of technical error that need to be accounted for in applications that require multisequence alignments. Using the generated alignments, we identified constrained regions using three different methods. While the different constraint-detecting methods are in general agreement, there are important discrepancies relating to both the underlying alignments and the specific algorithms. However, by integrating the results across the alignments and constraint-detecting methods, we produced constraint annotations that were found to be robust based on multiple independent measures. Analyses of these annotations illustrate that most classes of experimentally annotated functional elements are enriched for constrained sequences; however, large portions of each class (with the exception of protein-coding sequences) do not overlap constrained regions. The latter elements might not be under primary sequence constraint, might not be constrained across all mammals, or might have expendable molecular functions. Conversely, 40% of the constrained sequences do not overlap any of the functional elements that have been experimentally identified. Together, these findings demonstrate and quantify how many genomic functional elements await basic molecular characterization. PMID:17567995

  8. The UCSC genome browser and associated tools

    PubMed Central

    Haussler, David; Kent, W. James

    2013-01-01

    The UCSC Genome Browser (http://genome.ucsc.edu) is a graphical viewer for genomic data now in its 13th year. Since the early days of the Human Genome Project, it has presented an integrated view of genomic data of many kinds. Now home to assemblies for 58 organisms, the Browser presents visualization of annotations mapped to genomic coordinates. The ability to juxtapose annotations of many types facilitates inquiry-driven data mining. Gene predictions, mRNA alignments, epigenomic data from the ENCODE project, conservation scores from vertebrate whole-genome alignments and variation data may be viewed at any scale from a single base to an entire chromosome. The Browser also includes many other widely used tools, including BLAT, which is useful for alignments from high-throughput sequencing experiments. Private data uploaded as Custom Tracks and Data Hubs in many formats may be displayed alongside the rich compendium of precomputed data in the UCSC database. The Table Browser is a full-featured graphical interface, which allows querying, filtering and intersection of data tables. The Saved Session feature allows users to store and share customized views, enhancing the utility of the system for organizing multiple trains of thought. Binary Alignment/Map (BAM), Variant Call Format and the Personal Genome Single Nucleotide Polymorphisms (SNPs) data formats are useful for visualizing a large sequencing experiment (whole-genome or whole-exome), where the differences between the data set and the reference assembly may be displayed graphically. Support for high-throughput sequencing extends to compact, indexed data formats, such as BAM, bigBed and bigWig, allowing rapid visualization of large datasets from RNA-seq and ChIP-seq experiments via local hosting. PMID:22908213

  9. The UCSC genome browser and associated tools.

    PubMed

    Kuhn, Robert M; Haussler, David; Kent, W James

    2013-03-01

    The UCSC Genome Browser (http://genome.ucsc.edu) is a graphical viewer for genomic data now in its 13th year. Since the early days of the Human Genome Project, it has presented an integrated view of genomic data of many kinds. Now home to assemblies for 58 organisms, the Browser presents visualization of annotations mapped to genomic coordinates. The ability to juxtapose annotations of many types facilitates inquiry-driven data mining. Gene predictions, mRNA alignments, epigenomic data from the ENCODE project, conservation scores from vertebrate whole-genome alignments and variation data may be viewed at any scale from a single base to an entire chromosome. The Browser also includes many other widely used tools, including BLAT, which is useful for alignments from high-throughput sequencing experiments. Private data uploaded as Custom Tracks and Data Hubs in many formats may be displayed alongside the rich compendium of precomputed data in the UCSC database. The Table Browser is a full-featured graphical interface, which allows querying, filtering and intersection of data tables. The Saved Session feature allows users to store and share customized views, enhancing the utility of the system for organizing multiple trains of thought. Binary Alignment/Map (BAM), Variant Call Format and the Personal Genome Single Nucleotide Polymorphisms (SNPs) data formats are useful for visualizing a large sequencing experiment (whole-genome or whole-exome), where the differences between the data set and the reference assembly may be displayed graphically. Support for high-throughput sequencing extends to compact, indexed data formats, such as BAM, bigBed and bigWig, allowing rapid visualization of large datasets from RNA-seq and ChIP-seq experiments via local hosting.

  10. Genomics education for medical professionals - the current UK landscape.

    PubMed

    Slade, Ingrid; Subramanian, Deepak N; Burton, Hilary

    2016-08-01

    Genomics education in the UK is at an early stage of development, and its pace of evolution has lagged behind that of the genomics research upon which it is based. As a result, knowledge of genomics and its applications remains limited among non-specialist clinicians. In this review article, we describe the complex landscape for genomics education within the UK, and highlight the large number and variety of organisations that can influence, direct and provide genomics training to medical professionals. Postgraduate genomics education is being shaped by the work of the Health Education England (HEE) Genomics Education Programme, working in conjunction with the Joint Committee on Genomics in Medicine. The success of their work will be greatly enhanced by the full cooperation and engagement of the many groups, societies and organisations involved with medical education and training (such as the royal colleges). Without this cooperation, there is a risk of poor coordination and unnecessary duplication of work. Leadership from an organisation such as the HEE Genomics Education Programme will have a key role in guiding the formulation and delivery of genomics education policy by various stakeholders among the different disciplines in medicine. © 2016 Royal College of Physicians.

  11. Quantifying the major mechanisms of recent gene duplications in the human and mouse genomes: a novel strategy to estimate gene duplication rates

    PubMed Central

    Pan, Deng; Zhang, Liqing

    2007-01-01

    Background The rate of gene duplication is an important parameter in the study of evolution, but the influence of gene conversion and technical problems have confounded previous attempts to provide a satisfying estimate. We propose a new strategy to estimate the rate that involves separate quantification of the rates of two different mechanisms of gene duplication and subsequent combination of the two rates, based on their respective contributions to the overall gene duplication rate. Results Previous estimates of gene duplication rates are based on small gene families. Therefore, to assess the applicability of this to families of all sizes, we looked at both two-copy gene families and the entire genome. We studied unequal crossover and retrotransposition, and found that these mechanisms of gene duplication are largely independent and account for a substantial amount of duplicated genes. Unequal crossover contributed more to duplications in the entire genome than retrotransposition did, but this contribution was significantly less in two-copy gene families, and duplicated genes arising from this mechanism are more likely to be retained. Combining rates of duplication using the two mechanisms, we estimated the overall rates to be from approximately 0.515 to 1.49 × 10⁻³ per gene per million years in human, and from approximately 1.23 to 4.23 × 10⁻³ in mouse. The rates estimated from two-copy gene families are always lower than those from the entire genome, and so it is not appropriate to use small families to estimate the rate for the entire genome. Conclusion We present a novel strategy for estimating gene duplication rates. Our results show that different mechanisms contribute differently to the evolution of small and large gene families. PMID:17683522

  12. Genome-wide SNP analysis reveals a genetic basis for sea-age variation in a wild population of Atlantic salmon (Salmo salar).

    PubMed

    Johnston, Susan E; Orell, Panu; Pritchard, Victoria L; Kent, Matthew P; Lien, Sigbjørn; Niemelä, Eero; Erkinaro, Jaakko; Primmer, Craig R

    2014-07-01

    Delaying sexual maturation can lead to larger body size and higher reproductive success, but carries an increased risk of death before reproducing. Classical life history theory predicts that trade-offs between reproductive success and survival should lead to the evolution of an optimal strategy in a given population. However, variation in mating strategies generally persists, and in general, there remains a poor understanding of genetic and physiological mechanisms underlying this variation. One extreme case of this is in the Atlantic salmon (Salmo salar), which can show variation in the age at which they return from their marine migration to spawn (i.e. their 'sea age'). This results in large size differences between strategies, with direct implications for individual fitness. Here, we used an Illumina Infinium SNP array to identify regions of the genome associated with variation in sea age in a large population of Atlantic salmon in Northern Europe, implementing individual-based genome-wide association studies (GWAS) and population-based FST outlier analyses. We identified several regions of the genome which vary in association with phenotype and/or selection between sea ages, with nearby genes having functions related to muscle development, metabolism, immune response and mate choice. In addition, we found that individuals of different sea ages belong to different, yet sympatric populations in this system, indicating that reproductive isolation may be driven by divergence between stable strategies. Overall, this study demonstrates how genome-wide methodologies can be integrated with samples collected from wild, structured populations to understand their ecology and evolution in a natural context. © 2014 John Wiley & Sons Ltd.

  13. The complete mitochondrial genome of the house dust mite Dermatophagoides pteronyssinus (Trouessart): a novel gene arrangement among arthropods

    PubMed Central

    Dermauw, Wannes; Van Leeuwen, Thomas; Vanholme, Bartel; Tirry, Luc

    2009-01-01

    Background The apparent scarcity of available sequence data has greatly impeded evolutionary studies in Acari (mites and ticks). This subclass encompasses over 48,000 species and forms the largest group within the Arachnida. Although mitochondrial genomes are widely utilised for phylogenetic and population genetic studies, only 20 mitochondrial genomes of Acari have been determined, of which only one belongs to the diverse order of the Sarcoptiformes. In this study, we describe the mitochondrial genome of the European house dust mite Dermatophagoides pteronyssinus, the most important member of this largely neglected group. Results The mitochondrial genome of D. pteronyssinus is a circular DNA molecule of 14,203 bp. It contains the complete set of 37 genes (13 protein coding genes, 2 rRNA genes and 22 tRNA genes), usually present in metazoan mitochondrial genomes. The mitochondrial gene order differs considerably from that of other Acari mitochondrial genomes. Compared to the mitochondrial genome of Limulus polyphemus, considered as the ancestral arthropod pattern, only 11 of the 38 gene boundaries are conserved. The majority strand has a 72.6% AT-content but a GC-skew of 0.194. This skew is the reverse of that normally observed for typical animal mitochondrial genomes. A microsatellite was detected in a large non-coding region (286 bp), which probably functions as the control region. Almost all tRNA genes lack a T-arm, provoking the formation of canonical cloverleaf tRNA-structures, and both rRNA genes are considerably reduced in size. Finally, the genomic sequence was used to perform a phylogenetic study. Both maximum likelihood and Bayesian inference analysis clustered D. pteronyssinus with Steganacarus magnus, forming a sistergroup of the Trombidiformes. Conclusion Although the mitochondrial genome of D. pteronyssinus shares different features with previously characterised Acari mitochondrial genomes, it is unique in many ways. Gene order is extremely rearranged and represents a new pattern within the Acari. Both tRNAs and rRNAs are truncated, corroborating the theory of the functional co-evolution of these molecules. Furthermore, the strong and reversed GC- and AT-skews suggest the inversion of the control region as an evolutionary event. Finally, phylogenetic analysis using concatenated mt gene sequences succeeded in recovering Acari relationships concordant with traditional views of phylogeny of Acari. PMID:19284646
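
    The AT-content and GC-skew figures quoted in this abstract are simple base-composition statistics. The sketch below assumes the conventional definitions, GC-skew = (G - C)/(G + C) and AT-skew = (A - T)/(A + T), computed on one strand; the abstract does not spell out its formulas, and the function names and the toy sequence are purely illustrative.

```python
def at_content(seq: str) -> float:
    """Fraction of A and T bases among all A/C/G/T bases in the sequence."""
    s = seq.upper()
    at = s.count("A") + s.count("T")
    total = at + s.count("G") + s.count("C")
    if total == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return at / total


def skew(seq: str, first: str, second: str) -> float:
    """Generic skew (first - second) / (first + second) for two bases."""
    s = seq.upper()
    x, y = s.count(first), s.count(second)
    if x + y == 0:
        raise ValueError(f"sequence contains no {first} or {second}")
    return (x - y) / (x + y)


def gc_skew(seq: str) -> float:
    """Assumed definition: (G - C) / (G + C) on the given strand."""
    return skew(seq, "G", "C")


def at_skew(seq: str) -> float:
    """Assumed definition: (A - T) / (A + T) on the given strand."""
    return skew(seq, "A", "T")


if __name__ == "__main__":
    toy = "ATTTGGGCATTA"  # illustrative sequence, not real mite mtDNA
    print(at_content(toy), gc_skew(toy), at_skew(toy))
```

    A positive GC-skew on the majority strand, as reported here, means that G outnumbers C on that strand.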

  14. Construction of a plant-transformation-competent BIBAC library and genome sequence analysis of polyploid Upland cotton (Gossypium hirsutum L.)

    PubMed Central

    2013-01-01

    Background Cotton, one of the world’s leading crops, is important to the world’s textile and energy industries, and is a model species for studies of plant polyploidization, cellulose biosynthesis and cell wall biogenesis. Here, we report the construction of a plant-transformation-competent binary bacterial artificial chromosome (BIBAC) library and comparative genome sequence analysis of polyploid Upland cotton (Gossypium hirsutum L.) with one of its diploid putative progenitor species, G. raimondii Ulbr. Results We constructed the cotton BIBAC library in a vector competent for high-molecular-weight DNA transformation in different plant species through either Agrobacterium or particle bombardment. The library contains 76,800 clones with an average insert size of 135 kb, providing an approximate 99% probability of obtaining at least one positive clone from the library using a single-copy probe. The quality and utility of the library were verified by identifying BIBACs containing genes important for fiber development, fiber cellulose biosynthesis, seed fatty acid metabolism, cotton-nematode interaction, and bacterial blight resistance. In order to gain an insight into the Upland cotton genome and its relationship with G. raimondii, we sequenced nearly 10,000 BIBAC ends (BESs) randomly selected from the library, generating approximately one BES for every 250 kb along the Upland cotton genome. The retroelement Gypsy/DIRS1 family predominates in the Upland cotton genome, accounting for over 77% of all transposable elements. From the BESs, we identified 1,269 simple sequence repeats (SSRs), of which 1,006 were new, thus providing additional markers for cotton genome research. Surprisingly, comparative sequence analysis showed that Upland cotton is much more diverged from G. raimondii at the genomic sequence level than expected. There seems to be no significant difference between the relationships of the Upland cotton D- and A-subgenomes with the G. raimondii genome, even though G. raimondii contains a D genome (D5). Conclusions The library represents the first BIBAC library in cotton and related species, thus providing tools useful for integrative physical mapping, large-scale genome sequencing and large-scale functional analysis of the Upland cotton genome. Comparative sequence analysis provides insights into the Upland cotton genome, and a possible mechanism underlying the divergence and evolution of polyploid Upland cotton from its diploid putative progenitor species, G. raimondii. PMID:23537070
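
    The quoted ~99% probability of recovering a single-copy sequence can be checked with the standard Clarke-Carbon library formula, P = 1 - (1 - I/G)^N, where N is the number of clones, I the mean insert size and G the haploid genome size. The abstract gives N and I but neither the formula nor G; the sketch below assumes G of roughly 2.4 Gb for Upland cotton, and that value is an assumption, not taken from the record.

```python
def library_coverage(n_clones: int, insert_bp: float, genome_bp: float) -> float:
    """Genome equivalents represented in the library: N * I / G."""
    return n_clones * insert_bp / genome_bp


def prob_at_least_one_clone(n_clones: int, insert_bp: float, genome_bp: float) -> float:
    """Clarke-Carbon probability that a single-copy locus is in at least one clone."""
    return 1.0 - (1.0 - insert_bp / genome_bp) ** n_clones


if __name__ == "__main__":
    N_CLONES = 76_800   # from the abstract
    INSERT_BP = 135e3   # from the abstract (135 kb average insert)
    GENOME_BP = 2.4e9   # ASSUMPTION: approximate haploid genome size of G. hirsutum

    print(f"coverage  : {library_coverage(N_CLONES, INSERT_BP, GENOME_BP):.2f}x")
    print(f"P(>=1 hit): {prob_at_least_one_clone(N_CLONES, INSERT_BP, GENOME_BP):.4f}")
    # Average spacing of 10,000 sequenced BIBAC ends under the same assumed genome size
    print(f"BES spacing: {GENOME_BP / 10_000 / 1e3:.0f} kb")
```

    Under the assumed genome size, the library holds a little over four genome equivalents and the probability falls between 98% and 99%, in line with the approximate figure in the abstract; the BES spacing likewise comes out close to the stated one per 250 kb.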

  15. TRACTOR_DB: a database of regulatory networks in gamma-proteobacterial genomes

    PubMed Central

    González, Abel D.; Espinosa, Vladimir; Vasconcelos, Ana T.; Pérez-Rueda, Ernesto; Collado-Vides, Julio

    2005-01-01

    Experimental data on the Escherichia coli transcriptional regulatory system has been used in the past years to predict new regulatory elements (promoters, transcription factors (TFs), TFs' binding sites and operons) within its genome. As more genomes of gamma-proteobacteria are being sequenced, the prediction of these elements in a growing number of organisms has become more feasible, as a step towards the study of how different bacteria respond to environmental changes at the level of transcriptional regulation. In this work, we present TRACTOR_DB (TRAnscription FaCTORs' predicted binding sites in prokaryotic genomes), a relational database that contains computational predictions of new members of 74 regulons in 17 gamma-proteobacterial genomes. For these predictions we used a comparative genomics approach regarding which several proof-of-principle articles for large regulons have been published. TRACTOR_DB may be currently accessed at http://www.bioinfo.cu/Tractor_DB, http://www.tractor.lncc.br/ or at http://www.cifn.unam.mx/Computational_Genomics/tractorDB. Contact Email id is tractor@cifn.unam.mx. PMID:15608293

  16. Evolution and Diversity of the Human Hepatitis D Virus Genome

    PubMed Central

    Huang, Chi-Ruei; Lo, Szecheng J.

    2010-01-01

    Human hepatitis delta virus (HDV) is the RNA virus with the smallest genome. The HDV genome is divided into a viroid-like sequence and a protein-coding sequence, which could have originated from different sources; the genome was eventually constituted through RNA recombination. It subsequently diversified through the accumulation of mutations, selected through interactions of the mutated RNA and proteins with host factors that allow infectious virions to form. Therefore, we propose that the conservation of the HDV nucleotide sequence is closely related to its functionality. Genome analysis of known HDV isolates shows that the C-terminal coding sequences of the large delta antigen (LDAg) are more diverse than other regions of the protein-coding sequence, yet variants that retain the biological function of interacting with the clathrin heavy chain can be selected and maintained. Since viruses interact with many host factors, including those involved in escaping the host immune response, designing a program to predict RNA genome evolution is a great challenge. PMID:20204073

  17. VCGDB: a dynamic genome database of the Chinese population

    PubMed Central

    2014-01-01

    Background The data released by the 1000 Genomes Project contain an increasing number of genome sequences from different nations and populations with a large number of genetic variations. As a result, the focus of human genome studies is changing from single and static to complex and dynamic. The currently available human reference genome (GRCh37) is based on sequencing data from 13 anonymous Caucasian volunteers, which might limit the scope of genomics, transcriptomics, epigenetics, and genome wide association studies. Description We used the massive amount of sequencing data published by the 1000 Genomes Project Consortium to construct the Virtual Chinese Genome Database (VCGDB), a dynamic genome database of the Chinese population based on the whole genome sequencing data of 194 individuals. VCGDB provides dynamic genomic information, which contains 35 million single nucleotide variations (SNVs), 0.5 million insertions/deletions (indels), and 29 million rare variations, together with genomic annotation information. VCGDB also provides a highly interactive user-friendly virtual Chinese genome browser (VCGBrowser) with functions like seamless zooming and real-time searching. In addition, we have established three population-specific consensus Chinese reference genomes that are compatible with mainstream alignment software. Conclusions VCGDB offers a feasible strategy for processing big data to keep pace with the biological data explosion by providing a robust resource for genomics studies; in particular, studies aimed at finding regions of the genome associated with diseases. PMID:24708222

  18. Systematic CpT (ApG) Depletion and CpG Excess Are Unique Genomic Signatures of Large DNA Viruses Infecting Invertebrates

    PubMed Central

    Upadhyay, Mohita; Sharma, Neha; Vivekanandan, Perumal

    2014-01-01

    Differences in the relative abundance of dinucleotides, if any, may provide important clues on host-driven evolution of viruses. We studied dinucleotide frequencies of large DNA viruses infecting vertebrates (n = 105; viruses infecting mammals = 99; viruses infecting aves = 6; viruses infecting reptiles = 1) and invertebrates (n = 88; viruses infecting insects = 84; viruses infecting crustaceans = 4). We have identified systematic depletion of CpT(ApG) dinucleotides and over-representation of CpG dinucleotides as the unique genomic signature of large DNA viruses infecting invertebrates. Detailed investigation of this unique genomic signature suggests the existence of invertebrate host-induced pressures specifically targeting CpT(ApG) and CpG dinucleotides. The depletion of CpT dinucleotides among large DNA viruses infecting invertebrates is, at least in part, explained by non-canonical DNA methylation by the infected host. Our findings highlight the role of invertebrate host-related factors in shaping virus evolution and they also provide the necessary framework for future studies on evolution, epigenetics and molecular biology of viruses infecting this group of hosts. PMID:25369195
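
    Relative dinucleotide abundance is commonly expressed as an observed/expected odds ratio, rho(XY) = f(XY) / (f(X) f(Y)), where values below 1 indicate depletion and values above 1 indicate excess. The abstract does not state which estimator the authors used, so the following is only a minimal single-strand sketch of that common definition. The paired notation CpT(ApG) suggests the authors may have counted both strands, which this sketch does not do; the function name is illustrative.

```python
from collections import Counter


def dinucleotide_odds_ratios(seq: str) -> dict[str, float]:
    """Observed/expected ratio f(XY) / (f(X) * f(Y)) for all 16 dinucleotides.

    Assumes the sequence contains only A, C, G and T (single strand, no
    reverse-complement symmetrisation). Ratios are NaN when a base is absent.
    """
    s = seq.upper()
    n = len(s)
    if n < 2:
        raise ValueError("need at least two bases")
    mono = Counter(s)
    di = Counter(s[i:i + 2] for i in range(n - 1))
    ratios = {}
    for x in "ACGT":
        for y in "ACGT":
            fx, fy = mono[x] / n, mono[y] / n
            fxy = di[x + y] / (n - 1)
            ratios[x + y] = fxy / (fx * fy) if fx > 0 and fy > 0 else float("nan")
    return ratios


if __name__ == "__main__":
    toy = "ACGTCGCGATCGCGTACG"  # illustrative sequence, not a viral genome
    r = dinucleotide_odds_ratios(toy)
    print({k: round(v, 2) for k, v in r.items() if k in ("CG", "CT", "AG")})
```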

  19. Large-scale genomic analyses reveal the population structure and evolutionary trends of Streptococcus agalactiae strains in Brazilian fish farms.

    PubMed

    Barony, Gustavo M; Tavares, Guilherme C; Pereira, Felipe L; Carvalho, Alex F; Dorella, Fernanda A; Leal, Carlos A G; Figueiredo, Henrique C P

    2017-10-19

    Streptococcus agalactiae is a major pathogen and a hindrance to tilapia farming worldwide. The aims of this work were to analyze the genomic evolution of Brazilian strains of S. agalactiae and to establish spatial and temporal relations between strains isolated from different outbreaks of streptococcosis. A total of 39 strains were obtained from outbreaks and their whole genomes were sequenced and annotated for comparative analysis of multilocus sequence typing, genomic similarity and whole genome multilocus sequence typing (wgMLST). The Brazilian strains presented two sequence types, including a newly described ST, and a non-typeable lineage. The use of wgMLST could differentiate each strain in a single clone and was used to establish temporal and geographical correlations among strains. Bayesian phylogenomic analysis suggests that the studied Brazilian population was co-introduced in the country with their host, approximately 60 years ago. Brazilian strains of S. agalactiae were shown to be heterogeneous in their genome sequences and were distributed in different regions of the country according to their genotype, which allowed the use of wgMLST analysis to track each outbreak event individually.

  20. Insights into the genetic architecture of morphological traits in two passerine bird species.

    PubMed

    Silva, C N S; McFarlane, S E; Hagen, I J; Rönnegård, L; Billing, A M; Kvalnes, T; Kemppainen, P; Rønning, B; Ringsby, T H; Sæther, B-E; Qvarnström, A; Ellegren, H; Jensen, H; Husby, A

    2017-09-01

    Knowledge about the underlying genetic architecture of phenotypic traits is needed to understand and predict evolutionary dynamics. The number of causal loci, magnitude of the effects and location in the genome are, however, still largely unknown. Here, we use genome-wide single-nucleotide polymorphism (SNP) data from two large-scale data sets on house sparrows and collared flycatchers to examine the genetic architecture of different morphological traits (tarsus length, wing length, body mass, bill depth, bill length, total and visible badge size and white wing patches). Genomic heritabilities were estimated using relatedness calculated from SNPs. The proportion of variance captured by the SNPs (SNP-based heritability) was lower in house sparrows compared with collared flycatchers, as expected given marker density (6348 SNPs in house sparrows versus 38 689 SNPs in collared flycatchers). Indeed, after downsampling to similar SNP density and sample size, this estimate was no longer markedly different between species. Chromosome-partitioning analyses demonstrated that the proportion of variance explained by each chromosome was significantly positively related to the chromosome size for some traits and, generally, that larger chromosomes tended to explain proportionally more variation than smaller chromosomes. Finally, we found two genome-wide significant associations with very small-effect sizes. One SNP on chromosome 20 was associated with bill length in house sparrows and explained 1.2% of phenotypic variation (V_P), and one SNP on chromosome 4 was associated with tarsus length in collared flycatchers (3% of V_P). Although we cannot exclude the possibility of undetected large-effect loci, our results indicate a polygenic basis for morphological traits.

  1. A Draft Sequence of the Neandertal Genome

    PubMed Central

    Green, Richard E.; Li, Heng; Zhai, Weiwei; Fritz, Markus Hsi-Yang; Hansen, Nancy F.; Durand, Eric Y.; Malaspinas, Anna-Sapfo; Jensen, Jeffrey D.; Marques-Bonet, Tomas; Alkan, Can; Prüfer, Kay; Meyer, Matthias; Burbano, Hernán A.; Good, Jeffrey M.; Schultz, Rigo; Aximu-Petri, Ayinuer; Butthof, Anne; Höber, Barbara; Höffner, Barbara; Siegemund, Madlen; Weihmann, Antje; Nusbaum, Chad; Lander, Eric S.; Russ, Carsten; Novod, Nathaniel; Affourtit, Jason; Egholm, Michael; Verna, Christine; Rudan, Pavao; Brajkovic, Dejana; Kucan, Željko; Gušic, Ivan; Doronichev, Vladimir B.; Golovanova, Liubov V.; Lalueza-Fox, Carles; de la Rasilla, Marco; Fortea, Javier; Rosas, Antonio; Schmitz, Ralf W.; Johnson, Philip L. F.; Eichler, Evan E.; Falush, Daniel; Birney, Ewan; Mullikin, James C.; Slatkin, Montgomery; Nielsen, Rasmus; Kelso, Janet; Lachmann, Michael; Reich, David; Pääbo, Svante

    2016-01-01

    Neandertals, the closest evolutionary relatives of present-day humans, lived in large parts of Europe and western Asia before disappearing 30,000 years ago. We present a draft sequence of the Neandertal genome composed of more than 4 billion nucleotides from three individuals. Comparisons of the Neandertal genome to the genomes of five present-day humans from different parts of the world identify a number of genomic regions that may have been affected by positive selection in ancestral modern humans, including genes involved in metabolism and in cognitive and skeletal development. We show that Neandertals shared more genetic variants with present-day humans in Eurasia than with present-day humans in sub-Saharan Africa, suggesting that gene flow from Neandertals into the ancestors of non-Africans occurred before the divergence of Eurasian groups from each other. PMID:20448178

  2. Characterization of an E3 Ubiquitin Ligase that Degrades Neurofibromin in Vitro and Vivo

    DTIC Science & Technology

    2012-04-01

    and Duronio, R.J. (2008). Identifying determinants of cullin binding specificity among the three functionally different Drosophila melanogaster Roc... Deshaies and Joazeiro, 2009; Nakayama and Nakayama, 2006). Although a large number of F-box proteins were found in the human genome (Jin et al., 2004... genome walking (Figure S1A). The insertion disrupts the Sag transcript resulting in a truncated fusion mRNA that encodes partial Sag N

  3. Evolution of genome size and chromosome number in the carnivorous plant genus Genlisea (Lentibulariaceae), with a new estimate of the minimum genome size in angiosperms

    PubMed Central

    Fleischmann, Andreas; Michael, Todd P.; Rivadavia, Fernando; Sousa, Aretuza; Wang, Wenqin; Temsch, Eva M.; Greilhuber, Johann; Müller, Kai F.; Heubl, Günther

    2014-01-01

    Background and Aims Some species of Genlisea possess ultrasmall nuclear genomes, the smallest known among angiosperms, and some have been found to have chromosomes of diminutive size, which may explain why chromosome numbers and karyotypes are not known for the majority of species of the genus. However, other members of the genus do not possess ultrasmall genomes, nor do most taxa studied in related genera of the family or order. This study therefore examined the evolution of genome sizes and chromosome numbers in Genlisea in a phylogenetic context. The correlations of genome size with chromosome number and size, with the phylogeny of the group and with growth forms and habitats were also examined. Methods Nuclear genome sizes were measured from cultivated plant material for a comprehensive sampling of taxa, including nearly half of all species of Genlisea and representing all major lineages. Flow cytometric measurements were conducted in parallel in two laboratories in order to compare the consistency of different methods and controls. Chromosome counts were performed for the majority of taxa, comparing different staining techniques for the ultrasmall chromosomes. Key Results Genome sizes of 15 taxa of Genlisea are presented and interpreted in a phylogenetic context. A high degree of congruence was found between genome size distribution and the major phylogenetic lineages. Ultrasmall genomes with 1C values of <100 Mbp were almost exclusively found in a derived lineage of South American species. The ancestral haploid chromosome number was inferred to be n = 8. Chromosome numbers in Genlisea ranged from 2n = 2x = 16 to 2n = 4x = 32. Ascendant dysploid series (2n = 36, 38) are documented for three derived taxa. The different ploidy levels corresponded to the two subgenera, but were not directly correlated to differences in genome size; the three different karyotype ranges mirrored the different sections of the genus. The smallest known plant genomes were not found in G. margaretae, as previously reported, but in G. tuberosa (1C ≈ 61 Mbp) and some strains of G. aurea (1C ≈ 64 Mbp). Conclusions Genlisea is an ideal candidate model organism for the understanding of genome reduction as the genus includes species with both relatively large (∼1700 Mbp) and ultrasmall (∼61 Mbp) genomes. This comparative, phylogeny-based analysis of genome sizes and karyotypes in Genlisea provides essential data for selection of suitable species for comparative whole-genome analyses, as well as for further studies on both the molecular and cytogenetic basis of genome reduction in plants. PMID:25274549

  4. Toxicogenomics in regulatory ecotoxicology

    USGS Publications Warehouse

    Ankley, Gerald T.; Daston, George P.; Degitz, Sigmund J.; Denslow, Nancy D.; Hoke, Robert A.; Kennedy, Sean W.; Miracle, Ann L.; Perkins, Edward J.; Snape, Jason; Tillitt, Donald E.; Tyler, Charles R.; Versteeg, Donald

    2006-01-01

    Recently, we have witnessed an explosion of different genomic approaches that, through a combination of advanced biological, instrumental, and bioinformatic techniques, can yield a previously unparalleled amount of data concerning the molecular and biochemical status of organisms. Fueled partially by large, well-publicized efforts such as the Human Genome Project, genomic research has become a rapidly growing topical area in multiple biological disciplines. Since 1999, when the term “toxicogenomics” was coined to describe the application of genomics to toxicology (1), a rapid increase in publications on the topic has occurred (Figure 1). The potential utility of toxicogenomics in toxicological research and regulatory activities has been the subject of scientific discussions and, as with any new technology, has evoked a wide range of opinion (2–6).

  5. Clustering analysis of proteins from microbial genomes at multiple levels of resolution.

    PubMed

    Zaslavsky, Leonid; Ciufo, Stacy; Fedorov, Boris; Tatusova, Tatiana

    2016-08-31

    Microbial genomes at the National Center for Biotechnology Information (NCBI) represent a large collection of more than 35,000 assemblies. There are several complexities associated with the data: a great variation in sampling density since human pathogens are densely sampled while other bacteria are less represented; different protein families occur in annotations with different frequencies; and the quality of genome annotation varies greatly. In order to extract useful information from these sophisticated data, the analysis needs to be performed at multiple levels of phylogenomic resolution and protein similarity, with an adequate sampling strategy. Protein clustering is used to construct meaningful and stable groups of similar proteins to be used for analysis and functional annotation. Our approach is to create protein clusters at three levels. First, tight clusters in groups of closely-related genomes (species-level clades) are constructed using a combined approach that takes into account both sequence similarity and genome context. Second, clustroids of conservative in-clade clusters are organized into seed global clusters. Finally, global protein clusters are built around the seed clusters. We propose filtering strategies that allow limiting the protein set included in global clustering. The in-clade clustering procedure, subsequent selection of clustroids and organization into seed global clusters provides a robust representation and high rate of compression. Seed protein clusters are further extended by adding related proteins. Extended seed clusters include a significant part of the data and represent all major known cell machinery. The remaining part, coming from either non-conservative (unique) or rapidly evolving proteins, from rare genomes, or resulting from low-quality annotation, does not group together well. Processing these proteins requires significant computational resources and results in a large number of questionable clusters. The developed filtering strategies make it possible to identify and exclude such peripheral proteins, limiting the protein dataset in global clustering. Overall, the proposed methodology allows the relevant data at different levels of detail to be obtained and data redundancy eliminated while keeping biologically interesting variations.

  6. Members of the Candidate Phyla Radiation are functionally differentiated by carbon and nitrogen cycling capabilities.

    NASA Astrophysics Data System (ADS)

    Danczak, R.; Johnston, M.; Kenah, C.; Slattery, M.; Wrighton, K. C.; Wilkins, M.

    2017-12-01

    The Candidate Phyla Radiation (CPR) is a recently described expansion of the tree of life that represents more than 15% of all bacterial diversity and putatively contains over 70 different phyla. Despite this broad phylogenetic variation, these microorganisms often feature limited functional diversity, with members generally characterized as obligate fermenters. Additionally, much of the data describing CPR phyla has been generated from a limited number of environments, constraining our knowledge of their functional roles and biogeographical distribution. To better understand subsurface CPR microorganisms, we sampled four groundwater wells over two years across three Ohio counties. Samples were analyzed using 16S rRNA gene amplicon and shotgun metagenomic sequencing. Amplicon results indicated that CPR members comprised 2-20% of the microbial communities, with relative abundances stable through time in Athens and Greene county samples but dynamic in Licking county groundwater. Shotgun metagenomic analyses generated 71 putative CPR genomes, representing roughly 32 known phyla and potentially two new phyla, Candidatus Brownbacteria and Candidatus Hugbacteria. While these genomes largely mirrored typical CPR metabolism, some features were previously uncharacterized. For instance, a nirK-encoded nitrite reductase was found in four of our Parcubacteria genomes and multiple CPR genomes from other studies, indicating a possibly undescribed role for these microorganisms in denitrification. Additionally, glycoside hydrolase (GH) family profiles for our genomes and over 2000 other CPR genomes were analyzed to characterize their carbon processing potential. Although common trends were present throughout the radiation, differences highlighted mechanisms that may allow microorganisms across the CPR to occupy various subsurface niches. For example, members of the Microgenomates superphylum appear to potentially degrade a wider range of carbon substrates than other CPR phyla. 
    The CPR appear to be distributed across a range of groundwater systems and often constitute a large fraction of the microbial population. Further sampling of such environments will resolve this phylogenetically broad radiation at finer taxonomic levels and will likely solidify functional differences between phyla.

  7. SNPchiMp: a database to disentangle the SNPchip jungle in bovine livestock.

    PubMed

    Nicolazzi, Ezequiel Luis; Picciolini, Matteo; Strozzi, Francesco; Schnabel, Robert David; Lawley, Cindy; Pirani, Ali; Brew, Fiona; Stella, Alessandra

    2014-02-11

    Currently, six commercial whole-genome SNP chips are available for cattle genotyping, produced by two different genotyping platforms. Technical issues need to be addressed to combine data that originate from the different platforms, or from different versions of the same array generated by the manufacturer. For example: i) genome coordinates for SNPs may refer to different genome assemblies; ii) reference genome sequences are updated over time, changing the positions of, or even removing, sequences which contain SNPs; iii) not all commercial SNP IDs are searchable within public databases; iv) SNPs can be coded using different formats and referencing different strands (e.g. A/B or A/C/T/G alleles, referencing forward/reverse, top/bottom or plus/minus strand); v) due to new information being discovered, higher density chips do not necessarily include all the SNPs present in the lower density chips; and vi) SNP IDs may not be consistent across chips and platforms. Most researchers and breed associations manage SNP data in real-time and thus require tools to standardise data in a user-friendly manner. Here we present SNPchiMp, a MySQL database linked to an open access web-based interface. Features of this interface include, but are not limited to, the following functions: 1) referencing the SNP mapping information to the latest genome assembly, 2) extraction of information contained in dbSNP for SNPs present in all commercially available bovine chips, and 3) identification of SNPs in common between two or more bovine chips (e.g. for SNP imputation from lower to higher density). In addition, SNPchiMp can retrieve this information on subsets of SNPs, accessing such data either via physical position on a supported assembly, or by a list of SNP IDs, rs or ss identifiers. This tool combines many different sources of information that are otherwise time consuming to obtain and difficult to integrate.
    SNPchiMp not only provides the information in a user-friendly format, but also enables researchers to perform a large number of operations with a few clicks of the mouse. This significantly reduces the time needed to execute the large number of operations required to manage SNP data.
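
    The "SNPs in common between two or more chips" function described above amounts to a set intersection over SNP identifiers. The following is a minimal sketch of that idea only, not SNPchiMp's actual implementation: the function name shared_snps, the chip names, and the rs identifiers are all hypothetical, and it assumes the identifiers have already been standardised to one naming scheme.

```python
from typing import Dict, Set


def shared_snps(chips: Dict[str, Set[str]], *names: str) -> Set[str]:
    """Return the SNP identifiers present on every named chip."""
    if len(names) < 2:
        raise ValueError("need at least two chip names")
    # Intersect the identifier sets of all requested chips.
    return set.intersection(*(chips[name] for name in names))


# Hypothetical chip contents keyed by rs identifier (illustrative only).
chips = {
    "low_density": {"rs1", "rs2", "rs3"},
    "high_density": {"rs2", "rs3", "rs4", "rs5"},
}

# SNPs usable as anchors when imputing from the lower to the higher density chip.
common = shared_snps(chips, "low_density", "high_density")
```

    In practice the harder part is the standardisation step the abstract lists (assembly, strand, allele coding, ID scheme); the intersection is only meaningful once those agree.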

  8. Large-Scale Comparative Phenotypic and Genomic Analyses Reveal Ecological Preferences of Shewanella Species and Identify Metabolic Pathways Conserved at the Genus Level

    PubMed Central

    Rodrigues, Jorge L. M.; Serres, Margrethe H.; Tiedje, James M.

    2011-01-01

    The use of comparative genomics for the study of different microbiological species has increased substantially as sequence technologies become more affordable. However, efforts to fully link a genotype to its phenotype remain limited to the development of one mutant at a time. In this study, we provided a high-throughput alternative to this limiting step by coupling comparative genomics to the use of phenotype arrays for five sequenced Shewanella strains. Positive phenotypes were obtained for 441 nutrients (C, N, P, and S sources), with N-based compounds being the most utilized for all strains. Many genes and pathways predicted by genome analyses were confirmed with the comparative phenotype assay, and three degradation pathways believed to be missing in Shewanella were confirmed as missing. A number of previously unknown gene products were predicted to be parts of pathways or to have a function, expanding the number of gene targets for future genetic analyses. Ecologically, the comparative high-throughput phenotype analysis provided insights into niche specialization among the five different strains. For example, Shewanella amazonensis strain SB2B, isolated from the Amazon River delta, was capable of utilizing 60 C compounds, whereas Shewanella sp. strain W3-18-1, isolated from deep marine sediment, utilized only 25 of them. In spite of the large number of nutrient sources yielding positive results, our study indicated that except for the N sources, they were not sufficiently informative to predict growth phenotypes from increasing evolutionary distances. Our results indicate the importance of phenotypic evaluation for confirming genome predictions. This strategy will accelerate the functional discovery of genes and provide an ecological framework for microbial genome sequencing projects. PMID:21642407

  9. Accessing the SEED genome databases via Web services API: tools for programmers.

    PubMed

    Disz, Terry; Akhter, Sajia; Cuevas, Daniel; Olson, Robert; Overbeek, Ross; Vonstein, Veronika; Stevens, Rick; Edwards, Robert A

    2010-06-14

    The SEED integrates many publicly available genome sequences into a single resource. The database contains accurate and up-to-date annotations based on the subsystems concept that leverages clustering between genomes and other clues to accurately and efficiently annotate microbial genomes. The backend is used as the foundation for many genome annotation tools, such as the Rapid Annotation using Subsystems Technology (RAST) server for whole genome annotation, the metagenomics RAST server for random community genome annotations, and the annotation clearinghouse for exchanging annotations from different resources. In addition to a web user interface, the SEED also provides a Web services-based API for programmatic access to the data in the SEED, allowing the development of third-party tools and mash-ups. The currently exposed Web services encompass over forty different methods for accessing data related to microbial genome annotations. The Web services provide comprehensive access to the database back end, allowing any programmer access to the most consistent and accurate genome annotations available. The Web services are deployed using a platform-independent, service-oriented approach that allows the user to choose the most suitable programming platform for their application. Example code demonstrates that Web services can be used to access the SEED using common bioinformatics programming languages such as Perl, Python, and Java. We present a novel approach to access the SEED database. Using Web services, a robust API for access to genomics data is provided, without requiring large volume downloads all at once. The API ensures timely access to the most current datasets available, including the new genomes as soon as they come online.

  10. Genomic differentiation among wild cyanophages despite widespread horizontal gene transfer

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Gregory, Ann C.; Solonenko, Sergei A.; Ignacio-Espinoza, J. Cesar

    Genetic recombination is a driving force in genome evolution. Among viruses it has a dual role. For genomes with higher fitness, it maintains genome integrity in the face of high mutation rates. Conversely, for genomes with lower fitness, it provides immediate access to sequence space that cannot be reached by mutation alone. How recombination impacts the cohesion and dissolution of individual whole genomes within viral sequence space is poorly understood across double-stranded DNA bacteriophages (a.k.a. phages) due to the challenges of obtaining appropriately scaled genomic datasets. In this study we explore the role of recombination in both maintaining and differentiating whole genomes of 142 wild double-stranded DNA marine cyanophages. Phylogenomic analysis across the 51 core genes revealed ten lineages, six of which were well represented. These phylogenomic lineages represent discrete genotypic populations based on comparisons of intra- and inter-lineage shared gene content, genome-wide average nucleotide identity, as well as detected gaps in the distribution of pairwise differences between genomes. McDonald-Kreitman selection tests identified putative niche-differentiating genes under positive selection that differed across the six well-represented genotypic populations and that may have driven initial divergence. Concurrent with patterns of recombination of discrete populations, recombination analyses of both genic and intergenic regions largely revealed decreased genetic exchange across individual genomes between, relative to within, populations. Lastly, these findings suggest that discrete double-stranded DNA marine cyanophage populations occur in nature and are maintained by patterns of recombination akin to those observed in bacteria, archaea and in sexual eukaryotes.

  11. Genomic differentiation among wild cyanophages despite widespread horizontal gene transfer

    DOE PAGES

    Gregory, Ann C.; Solonenko, Sergei A.; Ignacio-Espinoza, J. Cesar; ...

    2016-11-16

    Genetic recombination is a driving force in genome evolution. Among viruses it has a dual role. For genomes with higher fitness, it maintains genome integrity in the face of high mutation rates. Conversely, for genomes with lower fitness, it provides immediate access to sequence space that cannot be reached by mutation alone. How recombination impacts the cohesion and dissolution of individual whole genomes within viral sequence space is poorly understood across double-stranded DNA bacteriophages (a.k.a. phages) due to the challenges of obtaining appropriately scaled genomic datasets. In this study we explore the role of recombination in both maintaining and differentiating whole genomes of 142 wild double-stranded DNA marine cyanophages. Phylogenomic analysis across the 51 core genes revealed ten lineages, six of which were well represented. These phylogenomic lineages represent discrete genotypic populations based on comparisons of intra- and inter-lineage shared gene content, genome-wide average nucleotide identity, as well as detected gaps in the distribution of pairwise differences between genomes. McDonald-Kreitman selection tests identified putative niche-differentiating genes under positive selection that differed across the six well-represented genotypic populations and that may have driven initial divergence. Concurrent with patterns of recombination of discrete populations, recombination analyses of both genic and intergenic regions largely revealed decreased genetic exchange across individual genomes between, relative to within, populations. Lastly, these findings suggest that discrete double-stranded DNA marine cyanophage populations occur in nature and are maintained by patterns of recombination akin to those observed in bacteria, archaea and in sexual eukaryotes.
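
    The McDonald-Kreitman test mentioned above compares nonsynonymous and synonymous changes that are fixed between lineages (Dn, Ds) with those still polymorphic within a lineage (Pn, Ps). As a rough sketch, assuming the four counts are already available for a gene, the commonly reported neutrality index and its complement alpha can be computed as below. The function name mk_summary and the example counts are hypothetical; the abstract does not state which summary statistics or significance test the authors used.

```python
def mk_summary(dn: int, ds: int, pn: int, ps: int) -> dict:
    """Summarise a McDonald-Kreitman 2x2 table.

    dn, ds: fixed nonsynonymous / synonymous differences between lineages.
    pn, ps: nonsynonymous / synonymous polymorphisms within a lineage.
    """
    if 0 in (dn, ds, ps):
        raise ValueError("dn, ds and ps must be non-zero to form the ratios")
    # Neutrality index: ratio of the polymorphism ratio to the divergence ratio.
    ni = (pn / ps) / (dn / ds)
    # alpha = 1 - NI; values above 0 are consistent with positive selection.
    return {"neutrality_index": ni, "alpha": 1.0 - ni}


# Hypothetical counts for one gene (illustrative only, not from the study).
result = mk_summary(dn=10, ds=20, pn=5, ps=40)
```

    A neutrality index below 1 indicates an excess of fixed nonsynonymous differences; a significance test on the 2x2 table (for example Fisher's exact test) would normally accompany these summaries.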

  12. Genomic Analysis of Childhood Brain Tumors: Methods for Genome-Wide Discovery and Precision Medicine Become Mainstream.

    PubMed

    Mack, Stephen C; Northcott, Paul A

    2017-07-20

    Recent breakthroughs in next-generation sequencing technology and complementary genomic platforms have transformed our capacity to interrogate the molecular landscapes of human cancers, including childhood brain tumors. Numerous high-throughput genomic studies have been reported for the major histologic brain tumor entities diagnosed in children, including interrogations at the level of the genome, epigenome, and transcriptome, many of which have yielded essential new insights into disease biology. The nature of these discoveries has been largely platform dependent, exemplifying the usefulness of applying different genomic and computational strategies, or integrative approaches, to address specific biologic and/or clinical questions. The goal of this article is to summarize the spectrum of molecular profiling methods available for investigating genomic aspects of childhood brain tumors in both the research and the clinical setting. We provide an overview of the main next-generation sequencing and array-based technologies currently being applied in this field and draw from key examples in the recent neuro-oncology literature to illustrate how these genomic approaches have profoundly advanced our understanding of individual tumor entities. Moreover, we discuss the current status of genomic profiling in the clinic and how different platforms are being used to improve patient diagnosis and stratification, as well as to identify actionable targets for informing molecularly guided therapies, especially for patients for whom conventional standard-of-care treatments have failed. Both the demand for genomic testing and the main challenges associated with incorporating genomics into the clinical management of pediatric patients with brain tumors are discussed, as are recommendations for incorporating these assays into future clinical trials.

  13. Cancer Genomics: Diversity and Disparity Across Ethnicity and Geography.

    PubMed

    Tan, Daniel S W; Mok, Tony S K; Rebbeck, Timothy R

    2016-01-01

    Ethnic and geographic differences in cancer incidence, prognosis, and treatment outcomes can be attributed to diversity in the inherited (germline) and somatic genome. Although international large-scale sequencing efforts are beginning to unravel the genomic underpinnings of cancer traits, much remains to be known about the underlying mechanisms and determinants of genomic diversity. Carcinogenesis is a dynamic, complex phenomenon representing the interplay between genetic and environmental factors that results in divergent phenotypes across ethnicities and geography. For example, compared with whites, there is a higher incidence of prostate cancer among Africans and African Americans, and the disease is generally more aggressive and fatal. Genome-wide association studies have identified germline susceptibility loci that may account for differences between the African and non-African patients, but the lack of availability of appropriate cohorts for replication studies and the incomplete understanding of genomic architecture across populations pose major limitations. We further discuss the transformative potential of routine diagnostic evaluation for actionable somatic alterations, using lung cancer as an example, highlighting implications of population disparities, current hurdles in implementation, and the far-reaching potential of clinical genomics in enhancing cancer prevention, diagnosis, and treatment. As we enter the era of precision cancer medicine, a concerted multinational effort is key to addressing population and genomic diversity as well as overcoming barriers and geographical disparities in research and health care delivery. © 2015 by American Society of Clinical Oncology.

  14. Genome-wide evidence for divergent selection between populations of a major agricultural pathogen.

    PubMed

    Hartmann, Fanny E; McDonald, Bruce A; Croll, Daniel

    2018-06-01

    The genetic and environmental homogeneity in agricultural ecosystems is thought to impose strong and uniform selection pressures. However, the impact of this selection on plant pathogen genomes remains largely unknown. We aimed to identify the proportion of the genome and the specific gene functions under positive selection in populations of the fungal wheat pathogen Zymoseptoria tritici. First, we performed genome scans in four field populations that were sampled from different continents and on distinct wheat cultivars to test which genomic regions are under recent selection. Based on extended haplotype homozygosity and composite likelihood ratio tests, we identified 384 and 81 selective sweeps affecting 4% and 0.5% of the 35 Mb core genome, respectively. We found differences both in the number and the position of selective sweeps across the genome between populations. Using an XtX-based outlier detection approach, we identified 51 extremely divergent genomic regions between the allopatric populations, suggesting that divergent selection led to locally adapted pathogen populations. We performed an outlier detection analysis between two sympatric populations infecting two different wheat cultivars to identify evidence for host-driven selection. Selective sweep regions harboured genes that are likely to play a role in successfully establishing host infections. We also identified secondary metabolite gene clusters and an enrichment in genes encoding transporter and protein localization functions. The latter gene functions mediate responses to environmental stress, including interactions with the host. The distinct gene functions under selection indicate that both local host genotypes and abiotic factors contributed to local adaptation. © 2018 The Authors. Molecular Ecology Published by John Wiley & Sons Ltd.

  15. Plant functional genomics

    NASA Astrophysics Data System (ADS)

    Holtorf, Hauke; Guitton, Marie-Christine; Reski, Ralf

    2002-04-01

    Functional genome analysis of plants has entered the high-throughput stage. The complete genome information from key species such as Arabidopsis thaliana and rice is now available and will further boost the application of a range of new technologies to functional plant gene analysis. To broadly assign functions to unknown genes, different fast and multiparallel approaches are currently used and developed. These new technologies are based on known methods but are adapted and improved to accommodate comprehensive, large-scale gene analysis, i.e. such techniques are novel in the sense that their design allows researchers to analyse many genes at the same time and at an unprecedented pace. Such methods allow analysis of the different constituents of the cell that help to deduce gene function, namely the transcripts, proteins and metabolites. Similarly, the phenotypic variations of entire mutant collections can now be analysed in a much faster and more efficient way than before. The different methodologies have developed to form their own fields within the functional genomics technological platform and are termed transcriptomics, proteomics, metabolomics and phenomics. Gene function, however, cannot be inferred by using only one such approach. Rather, it is only by bringing together all the information collected by different functional genomic tools that one will be able to unequivocally assign functions to unknown plant genes. This review focuses on current technical developments and their impact on the field of plant functional genomics. The lower plant Physcomitrella is introduced as a new model system for gene function analysis, owing to its high rate of homologous recombination.

  16. Comparative Genome Analysis of Three Eukaryotic Parasites with Differing Abilities To Transform Leukocytes Reveals Key Mediators of Theileria-Induced Leukocyte Transformation

    PubMed Central

    Hayashida, Kyoko; Hara, Yuichiro; Abe, Takashi; Yamasaki, Chisato; Toyoda, Atsushi; Kosuge, Takehide; Suzuki, Yutaka; Sato, Yoshiharu; Kawashima, Shuichi; Katayama, Toshiaki; Wakaguri, Hiroyuki; Inoue, Noboru; Homma, Keiichi; Tada-Umezaki, Masahito; Yagi, Yukio; Fujii, Yasuyuki; Habara, Takuya; Kanehisa, Minoru; Watanabe, Hidemi; Ito, Kimihito; Gojobori, Takashi; Sugawara, Hideaki; Imanishi, Tadashi; Weir, William; Gardner, Malcolm; Pain, Arnab; Shiels, Brian; Hattori, Masahira; Nene, Vishvanath; Sugimoto, Chihiro

    2012-01-01

    ABSTRACT We sequenced the genome of Theileria orientalis, a tick-borne apicomplexan protozoan parasite of cattle. The focus of this study was a comparative genome analysis of T. orientalis relative to other highly pathogenic Theileria species, T. parva and T. annulata. T. parva and T. annulata induce transformation of infected cells of lymphocyte or macrophage/monocyte lineages; in contrast, T. orientalis does not induce uncontrolled proliferation of infected leukocytes and multiplies predominantly within infected erythrocytes. While synteny across homologous chromosomes of the three Theileria species was found to be well conserved overall, subtelomeric structures were found to differ substantially, as T. orientalis lacks the large tandemly arrayed subtelomere-encoded variable secreted protein-encoding gene family. Moreover, expansion of particular gene families by gene duplication was found in the genomes of the two transforming Theileria species, most notably, the TashAT/TpHN and Tar/Tpr gene families. Gene families that are present only in T. parva and T. annulata and not in T. orientalis, Babesia bovis, or Plasmodium were also identified. Identification of differences between the genome sequences of Theileria species with different abilities to transform and immortalize bovine leukocytes will provide insight into proteins and mechanisms that have evolved to induce and regulate this process. The T. orientalis genome database is available at http://totdb.czc.hokudai.ac.jp/. PMID:22951932

  17. Large-Scale Genomic Analysis of Codon Usage in Dengue Virus and Evaluation of Its Phylogenetic Dependence

    PubMed Central

    Lara-Ramírez, Edgar E.; Salazar, Ma Isabel; López-López, María de Jesús; Salas-Benito, Juan Santiago; Sánchez-Varela, Alejandro

    2014-01-01

    The increasing number of available dengue virus (DENV) genome sequences makes it possible to identify the factors contributing to DENV evolution. In the present study, codon usage in serotypes 1–4 (DENV1–4) was explored for 3047 sequenced genomes using different statistical methods. The correlation analysis of total GC content (GC) with GC content at the three nucleotide positions of codons (GC1, GC2, and GC3), as well as plots of the effective number of codons (ENC, ENCp) versus GC3, revealed mutational bias and purifying selection pressures as the major forces influencing codon usage, but with distinct pressure on specific nucleotide positions in the codon. The correspondence analysis (CA) and clustering analysis on relative synonymous codon usage (RSCU) within each serotype showed clustering patterns similar to those of the phylogenetic analysis of nucleotide sequences for DENV1–4. These clustering patterns are strongly related to the geographic origin of the virus. The phylogenetic dependence analysis also suggests that stabilizing selection acts on the codon usage bias. Our large-scale analysis reveals new features of DENV genomic evolution. PMID:25136631
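
    Two of the statistics named above have simple definitions: GC1, GC2 and GC3 are the GC fractions at the first, second and third codon positions, and RSCU is a codon's observed count divided by the mean count of all codons encoding the same amino acid. The sketch below illustrates both under stated assumptions: it uses the standard genetic code, expects an in-frame coding sequence, and the helper names (codons, gc_by_position, rscu) are hypothetical rather than taken from the study's software.

```python
from collections import Counter, defaultdict

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def codons(seq):
    """Split an in-frame coding sequence into complete codons."""
    seq = seq.upper().replace("U", "T")
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]


def gc_by_position(seq):
    """Return (GC1, GC2, GC3): the GC fraction at each codon position."""
    cds = codons(seq)
    if not cds:
        raise ValueError("sequence contains no complete codon")
    return tuple(sum(c[p] in "GC" for c in cds) / len(cds) for p in range(3))


def rscu(seq):
    """Relative synonymous codon usage for every amino acid that occurs."""
    counts = Counter(c for c in codons(seq) if c in CODON_TABLE)
    families = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        families[aa].append(codon)
    values = {}
    for aa, synonyms in families.items():
        total = sum(counts[c] for c in synonyms)
        if aa == "*" or total == 0:
            continue  # skip stop codons and amino acids not observed
        for c in synonyms:
            # observed count / (total for the amino acid / family size)
            values[c] = counts[c] * len(synonyms) / total
    return values


# Toy in-frame sequence of four alanine codons (illustrative only).
example = "GCTGCTGCCGCA"
gc1, gc2, gc3 = gc_by_position(example)
usage = rscu(example)
```

    An RSCU of 1.0 means a codon is used as often as expected under equal usage of its synonyms; values above or below 1.0 indicate over- or under-representation.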

  18. A DNA methylation map of human cancer at single base-pair resolution

    PubMed Central

    Vidal, E; Sayols, S; Moran, S; Guillaumet-Adkins, A; Schroeder, M P; Royo, R; Orozco, M; Gut, M; Gut, I; Lopez-Bigas, N; Heyn, H; Esteller, M

    2017-01-01

    Although single base-pair resolution DNA methylation landscapes for embryonic and different somatic cell types provided important insights into epigenetic dynamics and cell-type specificity, such comprehensive profiling is incomplete across human cancer types. This prompted us to perform genome-wide DNA methylation profiling of 22 samples derived from normal tissues and associated neoplasms, including primary tumors and cancer cell lines. Unlike their invariant normal counterparts, cancer samples exhibited highly variable CpG methylation levels in a large proportion of the genome, involving progressive changes during tumor evolution. The whole-genome sequencing results from selected samples were replicated in a large cohort of 1112 primary tumors of various cancer types using genome-scale DNA methylation analysis. Specifically, we determined DNA hypermethylation of promoters and enhancers regulating tumor-suppressor genes, with potential cancer-driving effects. DNA hypermethylation events showed evidence of positive selection, mutual exclusivity and tissue specificity, suggesting their active participation in neoplastic transformation. Our data highlight the extensive changes in DNA methylation that occur in cancer onset, progression and dissemination. PMID:28581523

  19. Chloroplast DNA Structural Variation, Phylogeny, and Age of Divergence among Diploid Cotton Species.

    PubMed

    Chen, Zhiwen; Feng, Kun; Grover, Corrinne E; Li, Pengbo; Liu, Fang; Wang, Yumei; Xu, Qin; Shang, Mingzhao; Zhou, Zhongli; Cai, Xiaoyan; Wang, Xingxing; Wendel, Jonathan F; Wang, Kunbo; Hua, Jinping

    2016-01-01

    The cotton genus (Gossypium spp.) contains 8 monophyletic diploid genome groups (A, B, C, D, E, F, G, K) and a single allotetraploid clade (AD). To gain insight into the phylogeny of Gossypium and molecular evolution of the chloroplast genome in this group, we performed a comparative analysis of 19 Gossypium chloroplast genomes, six reported here for the first time. Nucleotide distance in non-coding regions was about three times that of coding regions. As expected, distances were smaller within than among genome groups. Phylogenetic topologies based on nucleotide and indel data support the resolution of the 8 genome groups into 6 clades. Phylogenetic analysis of indel distribution among the 19 genomes demonstrates contrasting evolutionary dynamics in different clades, with a parallel genome downsizing in two genome groups and a biased accumulation of insertions in the clade containing the cultivated cottons leading to large (for Gossypium) chloroplast genomes. Divergence time estimates derived from the cpDNA sequence suggest that the major diploid clades had diverged approximately 10 to 11 million years ago. The complete nucleotide sequences of 6 cpDNA genomes are provided, offering a resource for cytonuclear studies in Gossypium.

  20. Chloroplast DNA Structural Variation, Phylogeny, and Age of Divergence among Diploid Cotton Species

    PubMed Central

    Li, Pengbo; Liu, Fang; Wang, Yumei; Xu, Qin; Shang, Mingzhao; Zhou, Zhongli; Cai, Xiaoyan; Wang, Xingxing; Wendel, Jonathan F.; Wang, Kunbo

    2016-01-01

    The cotton genus (Gossypium spp.) contains 8 monophyletic diploid genome groups (A, B, C, D, E, F, G, K) and a single allotetraploid clade (AD). To gain insight into the phylogeny of Gossypium and molecular evolution of the chloroplast genome in this group, we performed a comparative analysis of 19 Gossypium chloroplast genomes, six reported here for the first time. Nucleotide distance in non-coding regions was about three times that of coding regions. As expected, distances were smaller within than among genome groups. Phylogenetic topologies based on nucleotide and indel data support the resolution of the 8 genome groups into 6 clades. Phylogenetic analysis of indel distribution among the 19 genomes demonstrates contrasting evolutionary dynamics in different clades, with a parallel genome downsizing in two genome groups and a biased accumulation of insertions in the clade containing the cultivated cottons leading to large (for Gossypium) chloroplast genomes. Divergence time estimates derived from the cpDNA sequence suggest that the major diploid clades had diverged approximately 10 to 11 million years ago. The complete nucleotide sequences of 6 cpDNA genomes are provided, offering a resource for cytonuclear studies in Gossypium. PMID:27309527

  1. Comparison of the genomic sequence of the microminipig, a novel breed of swine, with the genomic database for conventional pig.

    PubMed

    Miura, Naoki; Kucho, Ken-Ichi; Noguchi, Michiko; Miyoshi, Noriaki; Uchiumi, Toshiki; Kawaguchi, Hiroaki; Tanimoto, Akihide

    2014-01-01

    The microminipig, which weighs less than 10 kg at an early stage of maturity, has been reported as a potential experimental model animal. Its extremely small size and other distinct characteristics suggest the possibility of a number of differences between the genome of the microminipig and that of conventional pigs. In this study, we analyzed the genomes of two healthy microminipigs using a next-generation sequencer SOLiD™ system. We then compared the obtained genomic sequences with a genomic database for the domestic pig (Sus scrofa). The mapping coverage of sequenced tags from the microminipig to conventional pig genomic sequences was greater than 96% and we detected no clear, substantial genomic variance from these data. The results may indicate that the distinct characteristics of the microminipig derive from small-scale alterations in the genome, such as single nucleotide polymorphisms or translational modifications, rather than large-scale deletion or insertion polymorphisms. Further investigation of the entire genomic sequence of the microminipig with methods enabling deeper coverage is required to elucidate the genetic basis of its distinct phenotypic traits. Copyright © 2014 International Institute of Anticancer Research (Dr. John G. Delinassios), All rights reserved.

  2. Development of a database system for mapping insertional mutations onto the mouse genome with large-scale experimental data

    PubMed Central

    2009-01-01

    Background Insertional mutagenesis is an effective method for functional genomic studies in various organisms. It can rapidly generate easily tractable mutations. A large-scale insertional mutagenesis with the piggyBac (PB) transposon is currently performed in mice at the Institute of Developmental Biology and Molecular Medicine (IDM), Fudan University in Shanghai, China. This project is carried out via collaborations among multiple groups overseeing interconnected experimental steps and continuously generates a large volume of experimental data. Therefore, the project calls for an efficient database system for recording, management, statistical analysis, and information exchange. Results This paper presents a database application called MP-PBmice (insertional mutation mapping system of PB Mutagenesis Information Center), which is developed to serve the on-going large-scale PB insertional mutagenesis project. A lightweight enterprise-level development framework, Struts-Spring-Hibernate, is used here to ensure constructive and flexible support to the application. The MP-PBmice database system has three major features: strict access-control, efficient workflow control, and good expandability. It supports the collaboration among different groups that enter data and exchange information on a daily basis, and is capable of providing real-time progress reports for the whole project. MP-PBmice can be easily adapted for other large-scale insertional mutation mapping projects and the source code of this software is freely available at http://www.idmshanghai.cn/PBmice. Conclusion MP-PBmice is a web-based application for large-scale insertional mutation mapping onto the mouse genome, implemented with the widely used framework Struts-Spring-Hibernate. This system is already in use by the on-going genome-wide PB insertional mutation mapping project at IDM, Fudan University. PMID:19958505

  3. Unravelling the hidden ancestry of American admixed populations.

    PubMed

    Montinaro, Francesco; Busby, George B J; Pascali, Vincenzo L; Myers, Simon; Hellenthal, Garrett; Capelli, Cristian

    2015-03-24

    The movement of people into the Americas has brought different populations into contact, and contemporary American genomes are the product of a range of complex admixture events. Here we apply a haplotype-based ancestry identification approach to a large set of genome-wide SNP data from a variety of American, European and African populations to determine the contributions of different ancestral populations to the Americas. Our results provide a fine-scale characterization of the source populations, identify a series of novel, previously unreported contributions from Africa and Europe and highlight geohistorical structure in the ancestry of American admixed populations.

  4. The mitochondrial genome of the ascalaphid owlfly Libelloides macaronius and comparative evolutionary mitochondriomics of neuropterid insects

    PubMed Central

    2011-01-01

    Background The insect order Neuroptera encompasses more than 5,700 described species. To date, only three neuropteran mitochondrial genomes have been fully and one partly sequenced. Current knowledge on neuropteran mitochondrial genomes is limited, and new data are strongly required. In the present work, the mitochondrial genome of the ascalaphid owlfly Libelloides macaronius is described and compared with the known neuropterid mitochondrial genomes: Megaloptera, Neuroptera and Raphidioptera. These analyses are further extended to other endopterygotan orders. Results The mitochondrial genome of L. macaronius is a circular molecule 15,890 bp long. It includes the entire set of 37 genes usually present in animal mitochondrial genomes. The gene order of this newly sequenced genome is unique among Neuroptera and differs from the ancestral type of insects in the translocation of trnC. The L. macaronius genome shows the lowest A+T content (74.50%) among known neuropterid genomes. Protein-coding genes possess the typical mitochondrial start codons, except for cox1, which has an unusual ACG. Comparisons among endopterygotan mitochondrial genomes showed that A+T content and AT/GC-skews exhibit a broad range of variation among 84 analyzed taxa. Comparative analyses showed that neuropterid mitochondrial protein-coding genes experienced complex evolutionary histories, involving features ranging from codon usage to rate of substitution, that make them potential markers for population genetics/phylogenetics studies at different taxonomic ranks. The 22 tRNAs show variable substitution patterns in Neuropterida, with higher sequence conservation in genes located on the α strand. Inferred secondary structures for neuropterid rrnS and rrnL genes largely agree with those known for other insects. For the first time, a model is provided for domain I of an insect rrnL. The control region in Neuropterida, as in other insects, is a fast-evolving genomic region, characterized by AT-rich motifs. Conclusions The new genome shares many features with known neuropteran genomes but differs in its low A+T content. Comparative analysis of neuropterid mitochondrial genes showed that they experienced distinct evolutionary patterns. Both tRNA families and ribosomal RNAs show composite substitution pathways. The neuropterid mitochondrial genome is characterized by a complex evolutionary history. PMID:21569260

  5. Structured populations of Sulfolobus acidocaldarius with susceptibility to mobile genetic elements

    USGS Publications Warehouse

    Anderson, Rika E.; Kouris, Angela; Seward, Christopher H.; Campbell, Kate M.; Whitaker, Rachel J.

    2017-01-01

    The impact of a structured environment on genome evolution can be determined through comparative population genomics of species that live in the same habitat. Recent work comparing three genome sequences of Sulfolobus acidocaldarius suggested that highly structured, extreme, hot spring environments do not limit dispersal of this thermoacidophile, in contrast to other co-occurring Sulfolobus species. Instead, a high level of conservation among these three S. acidocaldarius genomes was hypothesized to result from rapid, global-scale dispersal promoted by low susceptibility to viruses that sets S. acidocaldarius apart from its sister Sulfolobus species. To test this hypothesis, we conducted a comparative analysis of 47 genomes of S. acidocaldarius from spatial and temporal sampling of two hot springs in Yellowstone National Park. While we confirm the low diversity in the core genome, we observe differentiation among S. acidocaldarius populations, likely resulting from low migration among hot spring “islands” in Yellowstone National Park. Patterns of genomic variation indicate that differing geological contexts result in the elimination or preservation of diversity among differentiated populations. We observe multiple deletions associated with a large genomic island rich in glycosyltransferases, differential integrations of the Sulfolobus turreted icosahedral virus, as well as two different plasmid elements. These data demonstrate that neither rapid dispersal nor lack of mobile genetic elements result in low diversity in the S. acidocaldarius genomes. We suggest instead that significant differences in the recent evolutionary history, or the intrinsic evolutionary rates, of sister Sulfolobus species result in the relatively low diversity of the S. acidocaldarius genome.

  6. Allele-specific control of replication timing and genome organization during development.

    PubMed

    Rivera-Mulia, Juan Carlos; Dimond, Andrew; Vera, Daniel; Trevilla-Garcia, Claudia; Sasaki, Takayo; Zimmerman, Jared; Dupont, Catherine; Gribnau, Joost; Fraser, Peter; Gilbert, David M

    2018-05-07

    DNA replication occurs in a defined temporal order known as the replication-timing (RT) program. RT is regulated during development in discrete chromosomal units, coordinated with transcriptional activity and 3D genome organization. Here, we derived distinct cell types from F1 hybrid musculus X castaneus mouse crosses and exploited the high single nucleotide polymorphism (SNP) density to characterize allelic differences in RT (Repli-seq), genome organization (Hi-C and promoter-capture Hi-C), gene expression (total nuclear RNA-seq) and chromatin accessibility (ATAC-seq). We also present HARP: a new computational tool for sorting SNPs in phased genomes to efficiently measure allele-specific genome-wide data. Analysis of six different hybrid mESC clones with different genomes (C57BL/6, 129/sv and CAST/Ei), parental configurations and gender revealed significant RT asynchrony between alleles across ~12% of the autosomal genome linked to sub-species genomes but not to parental origin, growth conditions or gender. RT asynchrony in mESCs strongly correlated with changes in Hi-C compartments between alleles but not SNP density, gene expression, imprinting or chromatin accessibility. We then tracked mESC RT asynchronous regions during development by analyzing differentiated cell types including extraembryonic endoderm stem (XEN) cells, 4 male and female primary mouse embryonic fibroblasts (MEFs) and neural precursor cells (NPCs) differentiated in vitro from mESCs with opposite parental configurations. We found that RT asynchrony and allelic discordance in Hi-C compartments seen in mESCs was largely lost in all differentiated cell types, coordinated with a more uniform Hi-C compartment arrangement, suggesting that genome organization of homologues converges to similar folding patterns during cell fate commitment. Published by Cold Spring Harbor Laboratory Press.

  7. Genome size diversity in orchids: consequences and evolution

    PubMed Central

    Leitch, I. J.; Kahandawala, I.; Suda, J.; Hanson, L.; Ingrouille, M. J.; Chase, M. W.; Fay, M. F.

    2009-01-01

    Background The amount of DNA comprising the genome of an organism (its genome size) varies a remarkable 40 000-fold across eukaryotes, yet most groups are characterized by much narrower ranges (e.g. 14-fold in gymnosperms, 3- to 4-fold in mammals). Angiosperms stand out as one of the most variable groups with genome sizes varying nearly 2000-fold. Nevertheless within angiosperms the majority of families are characterized by genomes which are small and vary little. Species with large genomes are mostly restricted to a few monocot families including Orchidaceae. Scope A survey of the literature revealed that genome size data for Orchidaceae are comparatively rare representing just 327 species. Nevertheless they reveal that Orchidaceae are currently the most variable angiosperm family with genome sizes ranging 168-fold (1C = 0·33–55·4 pg). Analysing the data provided insights into the distribution, evolution and possible consequences to the plant of this genome size diversity. Conclusions Superimposing the data onto the increasingly robust phylogenetic tree of Orchidaceae revealed how different subfamilies were characterized by distinct genome size profiles. Epidendroideae possessed the greatest range of genome sizes, although the majority of species had small genomes. In contrast, the largest genomes were found in subfamilies Cypripedioideae and Vanilloideae. Genome size evolution within this subfamily was analysed as this is the only one with reasonable representation of data. This approach highlighted striking differences in genome size and karyotype evolution between the closely related Cypripedium, Paphiopedilum and Phragmipedium. As to the consequences of genome size diversity, various studies revealed that this has both practical (e.g. application of genetic fingerprinting techniques) and biological consequences (e.g. affecting where and when an orchid may grow) and emphasizes the importance of obtaining further genome size data given the considerable phylogenetic gaps which have been highlighted by the current study. PMID:19168860

  8. Optimized distributed systems achieve significant performance improvement on sorted merging of massive VCF files.

    PubMed

    Sun, Xiaobo; Gao, Jingjing; Jin, Peng; Eng, Celeste; Burchard, Esteban G; Beaty, Terri H; Ruczinski, Ingo; Mathias, Rasika A; Barnes, Kathleen; Wang, Fusheng; Qin, Zhaohui S

    2018-06-01

    Sorted merging of genomic data is a common data operation necessary in many sequencing-based studies. It involves sorting and merging genomic data from different subjects by their genomic locations. In particular, merging a large number of variant call format (VCF) files is frequently required in large-scale whole-genome sequencing or whole-exome sequencing projects. Traditional single-machine based methods become increasingly inefficient when processing large numbers of files due to the excessive computation time and Input/Output bottleneck. Distributed systems and more recent cloud-based systems offer an attractive solution. However, carefully designed and optimized workflow patterns and execution plans (schemas) are required to take full advantage of the increased computing power while overcoming bottlenecks to achieve high performance. In this study, we custom-design optimized schemas for three Apache big data platforms, Hadoop (MapReduce), HBase, and Spark, to perform sorted merging of a large number of VCF files. These schemas all adopt the divide-and-conquer strategy to split the merging job into sequential phases/stages consisting of subtasks that are conquered in an ordered, parallel, and bottleneck-free way. In two illustrating examples, we test the performance of our schemas on merging multiple VCF files into either a single TPED or a single VCF file, which are benchmarked with the traditional single/parallel multiway-merge methods, message passing interface (MPI)-based high-performance computing (HPC) implementation, and the popular VCFTools. Our experiments suggest all three schemas either deliver a significant improvement in efficiency or render much better strong and weak scalabilities over traditional methods. Our findings provide generalized scalable schemas for performing sorted merging on genetics and genomics data using these Apache distributed systems.
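    The single-machine multiway merge that the abstract uses as a baseline can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: records are simplified to (chromosome, position, genotype) tuples rather than full VCF lines, and `CHROM_ORDER`, `tag`, `site_key`, and `merge_sorted` are hypothetical names introduced here.

```python
import heapq
from itertools import groupby

# Hypothetical chromosome ordering; a real pipeline would read it from the VCF header.
CHROM_ORDER = {"1": 0, "2": 1, "X": 2}


def tag(subject, records):
    """Attach the subject index to every record of one sorted input stream."""
    for rec in records:
        yield (subject, rec)


def site_key(tagged):
    """Order tagged records by genomic location (chromosome rank, position)."""
    _subject, (chrom, pos, _genotype) = tagged
    return (CHROM_ORDER[chrom], pos)


def merge_sorted(streams):
    """Multiway-merge per-subject streams that are already sorted by location.

    Returns one row per site: (chrom, pos, [genotype per subject]),
    with "./." marking subjects that have no record at that site.
    """
    n = len(streams)
    tagged = [tag(i, s) for i, s in enumerate(streams)]
    merged = heapq.merge(*tagged, key=site_key)  # lazy k-way merge
    rows = []
    for _, group in groupby(merged, key=site_key):
        genotypes = ["./."] * n
        for subject, (chrom, pos, gt) in group:
            genotypes[subject] = gt
        rows.append((chrom, pos, genotypes))
    return rows


subject_a = [("1", 100, "0/1"), ("1", 250, "1/1"), ("X", 5, "0/0")]
subject_b = [("1", 100, "0/0"), ("2", 40, "0/1")]
for row in merge_sorted([subject_a, subject_b]):
    print(row)
```

    Because `heapq.merge` consumes its inputs lazily, memory use grows with the number of input streams rather than with their length. The sketch still runs on one machine and one core, which is the limitation the distributed schemas in the abstract are meant to address.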

  9. Optimized distributed systems achieve significant performance improvement on sorted merging of massive VCF files

    PubMed Central

    Gao, Jingjing; Jin, Peng; Eng, Celeste; Burchard, Esteban G; Beaty, Terri H; Ruczinski, Ingo; Mathias, Rasika A; Barnes, Kathleen; Wang, Fusheng

    2018-01-01

    Abstract Background Sorted merging of genomic data is a common data operation necessary in many sequencing-based studies. It involves sorting and merging genomic data from different subjects by their genomic locations. In particular, merging a large number of variant call format (VCF) files is frequently required in large-scale whole-genome sequencing or whole-exome sequencing projects. Traditional single-machine based methods become increasingly inefficient when processing large numbers of files due to the excessive computation time and Input/Output bottleneck. Distributed systems and more recent cloud-based systems offer an attractive solution. However, carefully designed and optimized workflow patterns and execution plans (schemas) are required to take full advantage of the increased computing power while overcoming bottlenecks to achieve high performance. Findings In this study, we custom-design optimized schemas for three Apache big data platforms, Hadoop (MapReduce), HBase, and Spark, to perform sorted merging of a large number of VCF files. These schemas all adopt the divide-and-conquer strategy to split the merging job into sequential phases/stages consisting of subtasks that are conquered in an ordered, parallel, and bottleneck-free way. In two illustrating examples, we test the performance of our schemas on merging multiple VCF files into either a single TPED or a single VCF file, which are benchmarked with the traditional single/parallel multiway-merge methods, message passing interface (MPI)–based high-performance computing (HPC) implementation, and the popular VCFTools. Conclusions Our experiments suggest all three schemas either deliver a significant improvement in efficiency or render much better strong and weak scalabilities over traditional methods. Our findings provide generalized scalable schemas for performing sorted merging on genetics and genomics data using these Apache distributed systems. PMID:29762754

  10. How life changes itself: the Read-Write (RW) genome.

    PubMed

    Shapiro, James A

    2013-09-01

    The genome has traditionally been treated as a Read-Only Memory (ROM) subject to change by copying errors and accidents. In this review, I propose that we need to change that perspective and understand the genome as an intricately formatted Read-Write (RW) data storage system constantly subject to cellular modifications and inscriptions. Cells operate under changing conditions and are continually modifying themselves by genome inscriptions. These inscriptions occur over three distinct time-scales (cell reproduction, multicellular development and evolutionary change) and involve a variety of different processes at each time scale (forming nucleoprotein complexes, epigenetic formatting and changes in DNA sequence structure). Research dating back to the 1930s has shown that genetic change is the result of cell-mediated processes, not simply accidents or damage to the DNA. This cell-active view of genome change applies to all scales of DNA sequence variation, from point mutations to large-scale genome rearrangements and whole genome duplications (WGDs). This conceptual change to active cell inscriptions controlling RW genome functions has profound implications for all areas of the life sciences. © 2013 Elsevier B.V. All rights reserved.

  11. Host-Associated Genomic Features of the Novel Uncultured Intracellular Pathogen Ca. Ichthyocystis Revealed by Direct Sequencing of Epitheliocysts

    PubMed Central

    Qi, Weihong; Vaughan, Lloyd; Katharios, Pantelis; Schlapbach, Ralph; Seth-Smith, Helena M.B.

    2016-01-01

    Advances in single-cell and mini-metagenome sequencing have enabled important investigations into uncultured bacteria. In this study, we applied the mini-metagenome sequencing method to assemble genome drafts of the uncultured causative agents of epitheliocystis, an emerging infectious disease in the Mediterranean aquaculture species gilthead seabream. We sequenced multiple cyst samples and constructed 11 genome drafts from a novel beta-proteobacterial lineage, Candidatus Ichthyocystis. The draft genomes demonstrate features typical of pathogenic bacteria with an obligate intracellular lifestyle: a reduced genome of up to 2.6 Mb, reduced G + C content, and reduced metabolic capacity. Reconstruction of metabolic pathways reveals that Ca. Ichthyocystis genomes lack all amino acid synthesis pathways, compelling them to scavenge from the fish host. All genomes encode type II, III, and IV secretion systems, a large repertoire of predicted effectors, and a type IV pilus. These are all considered to be virulence factors, required for adherence, invasion, and host manipulation. However, no evidence of lipopolysaccharide synthesis could be found. Beyond the core functions shared within the genus, alignments showed distinction into different species, characterized by alternative large gene families. These comprise up to a third of each genome, appear to have arisen through duplication and diversification, encode many effector proteins, and are seemingly critical for virulence. Thus, Ca. Ichthyocystis represents a novel obligatory intracellular pathogenic beta-proteobacterial lineage. The methods used: mini-metagenome analysis and manual annotation, have generated important insights into the lifestyle and evolution of the novel, uncultured pathogens, elucidating many putative virulence factors including an unprecedented array of novel gene families. PMID:27190004

  12. Comparative Genomic Analysis Reveals a Diverse Repertoire of Genes Involved in Prokaryote-Eukaryote Interactions within the Pseudovibrio Genus

    PubMed Central

    Romano, Stefano; Fernàndez-Guerra, Antonio; Reen, F. Jerry; Glöckner, Frank O.; Crowley, Susan P.; O'Sullivan, Orla; Cotter, Paul D.; Adams, Claire; Dobson, Alan D. W.; O'Gara, Fergal

    2016-01-01

    Strains of the Pseudovibrio genus have been detected worldwide, mainly as part of bacterial communities associated with marine invertebrates, particularly sponges. This recurrent association has been considered as an indication of a symbiotic relationship between these microbes and their host. Until recently, the availability of only two genomes, belonging to closely related strains, has limited the knowledge on the genomic and physiological features of the genus to a single phylogenetic lineage. Here we present 10 newly sequenced genomes of Pseudovibrio strains isolated from marine sponges from the west coast of Ireland, and including the other two publicly available genomes we performed an extensive comparative genomic analysis. Homogeneity was apparent in terms of both the orthologous genes and the metabolic features shared amongst the 12 strains. At the genomic level, a key physiological difference observed amongst the isolates was the presence only in strain P. axinellae AD2 of genes encoding proteins involved in assimilatory nitrate reduction, which was then proved experimentally. We then focused on studying those systems known to be involved in the interactions with eukaryotic and prokaryotic cells. This analysis revealed that the genus harbors a large diversity of toxin-like proteins, secretion systems and their potential effectors. Their distribution in the genus was not always consistent with the phylogenetic relationship of the strains. Finally, our analyses identified new genomic islands encoding potential toxin-immunity systems, previously unknown in the genus. Our analyses shed new light on the Pseudovibrio genus, indicating a large diversity of both metabolic features and systems for interacting with the host. The diversity in both distribution and abundance of these systems amongst the strains underlines how metabolically and phylogenetically similar bacteria may use different strategies to interact with the host and find a niche within its microbiota. Our data suggest the presence of a sponge-specific lineage of Pseudovibrio. The reduction in genome size and the loss of some systems potentially used to successfully enter the host lead to the hypothesis that P. axinellae strain AD2 may be a lineage that presents an ancient association with the host and that may be vertically transmitted to the progeny. PMID:27065959

  13. Next-Generation Sequencing: The Translational Medicine Approach from “Bench to Bedside to Population”

    PubMed Central

    Beigh, Mohammad Muzafar

    2016-01-01

    Humans have predicted the relationship between heredity and diseases for a long time. Only at the beginning of the last century did scientists begin to discover the connotations between different genes and disease phenotypes. Recent trends in next-generation sequencing (NGS) technologies have brought a great momentum in biomedical research that in turn has remarkably augmented our basic understanding of human biology and its associated diseases. State-of-the-art next generation biotechnologies have started making huge strides in our current understanding of mechanisms of various chronic illnesses like cancers, metabolic disorders, neurodegenerative anomalies, etc. We are experiencing a renaissance in biomedical research primarily driven by next generation biotechnologies like genomics, transcriptomics, proteomics, metabolomics, lipidomics etc. Although genomic discoveries are at the forefront of next generation omics technologies, their implementation in the clinical arena has been painstakingly slow, mainly because of high reaction costs and unavailability of requisite computational tools for large-scale data analysis. However, rapid innovations and the steadily lowering cost of sequence-based chemistries, along with the development of advanced bioinformatics tools, have lately prompted the launching and implementation of large-scale massively parallel genome sequencing programs in different fields such as medical genetics, infectious biology, and agricultural sciences. Recent advances in large-scale omics technologies are bringing healthcare research beyond the traditional “bench to bedside” approach to more of a continuum that will include improvements in public healthcare and will be primarily based on a predictive, preventive, personalized, and participatory medicine approach (P4). Recent large-scale research projects in genetic and infectious disease biology have indicated that massively parallel whole-genome/whole-exome sequencing, transcriptome analysis, and other functional genomic tools can reveal a large number of unique functional elements and/or markers that otherwise would be undetected by traditional sequencing methodologies. Therefore, the latest trends in biomedical research are giving birth to a new branch of medicine commonly referred to as personalized and/or precision medicine. Developments in the post-genomic era are believed to completely restructure the present clinical pattern of disease prevention and treatment as well as methods of diagnosis and prognosis. The next important step in the direction of the precision/personalized medicine approach should be its early adoption in clinics for future medical interventions. Consequently, in coming years, next generation biotechnologies will reorient medical practice more towards disease prediction and prevention approaches rather than curing them at later stages of their development and progression, even at wider population level(s) for general public healthcare system. PMID:28930123

  14. Efficiency of multi-breed genomic selection for dairy cattle breeds with different sizes of reference population.

    PubMed

    Hozé, C; Fritz, S; Phocas, F; Boichard, D; Ducrocq, V; Croiseau, P

    2014-01-01

    Single-breed genomic selection (GS) based on medium single nucleotide polymorphism (SNP) density (~50,000; 50K) is now routinely implemented in several large cattle breeds. However, building large enough reference populations remains a challenge for many medium or small breeds. The high-density BovineHD BeadChip (HD chip; Illumina Inc., San Diego, CA) containing 777,609 SNP developed in 2010 is characterized by short-distance linkage disequilibrium expected to be maintained across breeds. Therefore, combining reference populations can be envisioned. A population of 1,869 influential ancestors from 3 dairy breeds (Holstein, Montbéliarde, and Normande) was genotyped with the HD chip. Using this sample, 50K genotypes were imputed within breed to high-density genotypes, leading to a large HD reference population. This population was used to develop a multi-breed genomic evaluation. The goal of this paper was to investigate the gain of multi-breed genomic evaluation for a small breed. The advantage of using a large breed (Normande in the present study) to mimic a small breed is the large potential validation population to compare alternative genomic selection approaches more reliably. In the Normande breed, 3 training sets were defined with 1,597, 404, and 198 bulls, and a unique validation set included the 394 youngest bulls. For each training set, estimated breeding values (EBV) were computed using pedigree-based BLUP, single-breed BayesC, or multi-breed BayesC for which the reference population was formed by any of the Normande training data sets and 4,989 Holstein and 1,788 Montbéliarde bulls. Phenotypes were standardized by within-breed genetic standard deviation, the proportion of polygenic variance was set to 30%, and the estimated number of SNP with a nonzero effect was about 7,000. The 2 genomic selection (GS) approaches were performed using either the 50K or HD genotypes. The correlations between EBV and observed daughter yield deviations (DYD) were computed for 6 traits and using the different prediction approaches. Compared with pedigree-based BLUP, the average gain in accuracy with GS in small populations was 0.057 for the single-breed and 0.086 for multi-breed approach. This gain was up to 0.193 and 0.209, respectively, with the large reference population. Improvement of EBV prediction due to the multi-breed evaluation was higher for animals not closely related to the reference population. In the case of a breed with a small reference population size, the increase in correlation due to multi-breed GS was 0.141 for bulls without their sire in reference population compared with 0.016 for bulls with their sire in reference population. These results demonstrate that multi-breed GS can contribute to increase genomic evaluation accuracy in small breeds. Copyright © 2014 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.

  15. Similar Genetic Architecture with Shared and Unique Quantitative Trait Loci for Bacterial Cold Water Disease Resistance in Two Rainbow Trout Breeding Populations

    PubMed Central

    Vallejo, Roger L.; Liu, Sixin; Gao, Guangtu; Fragomeni, Breno O.; Hernandez, Alvaro G.; Leeds, Timothy D.; Parsons, James E.; Martin, Kyle E.; Evenhuis, Jason P.; Welch, Timothy J.; Wiens, Gregory D.; Palti, Yniv

    2017-01-01

    Bacterial cold water disease (BCWD) causes significant mortality and economic losses in salmonid aquaculture. In previous studies, we identified moderate-large effect quantitative trait loci (QTL) for BCWD resistance in rainbow trout (Oncorhynchus mykiss). However, the recent availability of a 57 K SNP array and a reference genome assembly have enabled us to conduct genome-wide association studies (GWAS) that overcome several experimental limitations from our previous work. In the current study, we conducted GWAS for BCWD resistance in two rainbow trout breeding populations using two genotyping platforms, the 57 K Affymetrix SNP array and restriction-associated DNA (RAD) sequencing. Overall, we identified 14 moderate-large effect QTL that explained up to 60.8% of the genetic variance in one of the two populations and 27.7% in the other. Four of these QTL were found in both populations explaining a substantial proportion of the variance, although major differences were also detected between the two populations. Our results confirm that BCWD resistance is controlled by the oligogenic inheritance of few moderate-large effect loci and a large-unknown number of loci each having a small effect on BCWD resistance. We detected differences in QTL number and genome location between two GWAS models (weighted single-step GBLUP and Bayes B), which highlights the utility of using different models to uncover QTL. The RAD-SNPs detected a greater number of QTL than the 57 K SNP array in one population, suggesting that the RAD-SNPs may uncover polymorphisms that are more unique and informative for the specific population in which they were discovered. PMID:29109734

  16. Similar Genetic Architecture with Shared and Unique Quantitative Trait Loci for Bacterial Cold Water Disease Resistance in Two Rainbow Trout Breeding Populations.

    PubMed

    Vallejo, Roger L; Liu, Sixin; Gao, Guangtu; Fragomeni, Breno O; Hernandez, Alvaro G; Leeds, Timothy D; Parsons, James E; Martin, Kyle E; Evenhuis, Jason P; Welch, Timothy J; Wiens, Gregory D; Palti, Yniv

    2017-01-01

    Bacterial cold water disease (BCWD) causes significant mortality and economic losses in salmonid aquaculture. In previous studies, we identified moderate-large effect quantitative trait loci (QTL) for BCWD resistance in rainbow trout (Oncorhynchus mykiss). However, the recent availability of a 57 K SNP array and a reference genome assembly have enabled us to conduct genome-wide association studies (GWAS) that overcome several experimental limitations from our previous work. In the current study, we conducted GWAS for BCWD resistance in two rainbow trout breeding populations using two genotyping platforms, the 57 K Affymetrix SNP array and restriction-associated DNA (RAD) sequencing. Overall, we identified 14 moderate-large effect QTL that explained up to 60.8% of the genetic variance in one of the two populations and 27.7% in the other. Four of these QTL were found in both populations explaining a substantial proportion of the variance, although major differences were also detected between the two populations. Our results confirm that BCWD resistance is controlled by the oligogenic inheritance of few moderate-large effect loci and a large-unknown number of loci each having a small effect on BCWD resistance. We detected differences in QTL number and genome location between two GWAS models (weighted single-step GBLUP and Bayes B), which highlights the utility of using different models to uncover QTL. The RAD-SNPs detected a greater number of QTL than the 57 K SNP array in one population, suggesting that the RAD-SNPs may uncover polymorphisms that are more unique and informative for the specific population in which they were discovered.

  17. Molecular analysis of vector genome structures after liver transduction by conventional and self-complementary adeno-associated viral serotype vectors in murine and nonhuman primate models.

    PubMed

    Sun, Xun; Lu, You; Bish, Lawrence T; Calcedo, Roberto; Wilson, James M; Gao, Guangping

    2010-06-01

    Vectors based on several new adeno-associated viral (AAV) serotypes demonstrated strong hepatocyte tropism and transduction efficiency in both small- and large-animal models for liver-directed gene transfer. Efficiency of liver transduction by AAV vectors can be further improved in both murine and nonhuman primate (NHP) animals when the vector genomes are packaged in a self-complementary (sc) format. In an attempt to understand potential molecular mechanism(s) responsible for enhanced transduction efficiency of the sc vector in liver, we performed extensive molecular studies of genome structures of conventional single-stranded (ss) and sc AAV vectors from liver after AAV gene transfer in both mice and NHPs. These included treatment with exonucleases with specific substrate preferences, single-cutter restriction enzyme digestion and polarity-specific hybridization-based vector genome mapping, and bacteriophage phi29 DNA polymerase-mediated and double-stranded circular template-specific rescue of persisted circular genomes. In mouse liver, vector genomes of both genome formats seemed to persist primarily as episomal circular forms, but sc vectors converted into circular forms more rapidly and efficiently. However, the overall differences in vector genome abundance and structure in the liver between ss and sc vectors could not account for the remarkable differences in transduction. Molecular structures of persistent genomes of both ss and sc vectors were significantly more heterogeneous in macaque liver, with noticeable structural rearrangements that warrant further characterizations.

  18. Molecular Analysis of Vector Genome Structures After Liver Transduction by Conventional and Self-Complementary Adeno-Associated Viral Serotype Vectors in Murine and Nonhuman Primate Models

    PubMed Central

    Sun, Xun; Lu, You; Bish, Lawrence T.; Calcedo, Roberto; Wilson, James M.

    2010-01-01

    Abstract Vectors based on several new adeno-associated viral (AAV) serotypes demonstrated strong hepatocyte tropism and transduction efficiency in both small- and large-animal models for liver-directed gene transfer. Efficiency of liver transduction by AAV vectors can be further improved in both murine and nonhuman primate (NHP) animals when the vector genomes are packaged in a self-complementary (sc) format. In an attempt to understand potential molecular mechanism(s) responsible for enhanced transduction efficiency of the sc vector in liver, we performed extensive molecular studies of genome structures of conventional single-stranded (ss) and sc AAV vectors from liver after AAV gene transfer in both mice and NHPs. These included treatment with exonucleases with specific substrate preferences, single-cutter restriction enzyme digestion and polarity-specific hybridization-based vector genome mapping, and bacteriophage ϕ29 DNA polymerase-mediated and double-stranded circular template-specific rescue of persisted circular genomes. In mouse liver, vector genomes of both genome formats seemed to persist primarily as episomal circular forms, but sc vectors converted into circular forms more rapidly and efficiently. However, the overall differences in vector genome abundance and structure in the liver between ss and sc vectors could not account for the remarkable differences in transduction. Molecular structures of persistent genomes of both ss and sc vectors were significantly more heterogeneous in macaque liver, with noticeable structural rearrangements that warrant further characterizations. PMID:20113166

  19. Performance comparison of two efficient genomic selection methods (gsbay & MixP) applied in aquacultural organisms

    NASA Astrophysics Data System (ADS)

    Su, Hailin; Li, Hengde; Wang, Shi; Wang, Yangfan; Bao, Zhenmin

    2017-02-01

    Genomic selection is increasingly popular in animal and plant breeding industries all around the world, as it can be applied early in life without impacting selection candidates. The objective of this study was to bring the advantages of genomic selection to scallop breeding. Two different genomic selection tools, MixP and gsbay, were applied to genomic evaluation of simulated data and Zhikong scallop (Chlamys farreri) field data. The results were compared with the genomic best linear unbiased prediction (GBLUP) method, which has been applied widely. Our results showed that both MixP and gsbay could accurately estimate single-nucleotide polymorphism (SNP) marker effects, and thereby could be applied for the analysis of genomic estimated breeding values (GEBV). In simulated data from different scenarios, the accuracy of GEBV ranged from 0.20 to 0.78 with MixP, from 0.21 to 0.67 with gsbay, and from 0.21 to 0.61 with GBLUP. Estimations made by MixP and gsbay were expected to be more reliable than those estimated by GBLUP. Predictions made by gsbay were more robust, while with MixP the computation is much faster, especially in dealing with large-scale data. These results suggested that both algorithms implemented by MixP and gsbay are feasible to carry out genomic selection in scallop breeding, and more genotype data will be necessary to produce genomic estimated breeding values with a higher accuracy for the industry.

  20. Comparative genomic analysis of single-molecule sequencing and hybrid approaches for finishing the Clostridium autoethanogenum JA1-1 strain DSM 10061 genome

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Brown, Steven D; Nagaraju, Shilpa; Utturkar, Sagar M

    Background Clostridium autoethanogenum strain JA1-1 (DSM 10061) is an acetogen capable of fermenting CO, CO2 and H2 (e.g. from syngas or waste gases) into biofuel ethanol and commodity chemicals such as 2,3-butanediol. A draft genome sequence consisting of 100 contigs has been published. Results A closed, high-quality genome sequence for C. autoethanogenum DSM10061 was generated using only the latest single-molecule DNA sequencing technology and without the need for manual finishing. It is assigned to the most complex genome classification based upon genome features such as repeats, prophage, and nine copies of the rRNA gene operons. It has a low G + C content of 31.1%. Illumina, 454, and Illumina/454 hybrid assemblies were generated and then compared to the draft and PacBio assemblies using summary statistics, CGAL, QUAST and REAPR bioinformatics tools and comparative genomic approaches. Assemblies based upon shorter read DNA technologies were confounded by the large number of repeats and their size, which in the case of the rRNA gene operons were ~5 kb. CRISPR (Clustered Regularly Interspaced Short Palindromic Repeats) systems among biotechnologically relevant Clostridia were classified and related to plasmid content and prophages. Potential associations between plasmid content and CRISPR systems may have implications for historical industrial scale Acetone-Butanol-Ethanol (ABE) fermentation failures and future large scale bacterial fermentations. While C. autoethanogenum contains an active CRISPR system, no such system is present in the closely related Clostridium ljungdahlii DSM 13528. A common prophage inserted into the Arg-tRNA shared between the strains suggests a common ancestor. However, C. ljungdahlii contains several additional putative prophages and it has more than double the amount of prophage DNA compared to C. autoethanogenum.
Other differences include important metabolic genes for central metabolism (such as an additional hydrogenase and the absence of a phosphoenolpyruvate synthase) and substrate utilization pathways (mannose and aromatics utilization) that might explain phenotypic differences between C. autoethanogenum and C. ljungdahlii. Conclusions Single molecule sequencing will be increasingly used to produce finished microbial genomes. The complete genome will facilitate comparative genomics and functional genomics and support future comparisons between Clostridia and studies that examine the evolution of plasmids, bacteriophage and CRISPR systems.

  1. Whole genome sequencing of Rhodotorula mucilaginosa isolated from the chewing stick (Distemonanthus benthamianus): insights into Rhodotorula phylogeny, mitogenome dynamics and carotenoid biosynthesis.

    PubMed

    Gan, Han Ming; Thomas, Bolaji N; Cavanaugh, Nicole T; Morales, Grace H; Mayers, Ashley N; Savka, Michael A; Hudson, André O

    2017-01-01

    In industry, the yeast Rhodotorula mucilaginosa is commonly used for the production of carotenoids. The production of carotenoids is important because they are used as natural colorants in food and some carotenoids are precursors of retinol (vitamin A). However, the identification and molecular characterization of the carotenoid pathway/s in species belonging to the genus Rhodotorula is scarce due to the lack of genomic information thus potentially impeding effective metabolic engineering of these yeast strains for improved carotenoid production. In this study, we report the isolation, identification, characterization and the whole nuclear genome and mitogenome sequence of the endophyte R. mucilaginosa RIT389 isolated from Distemonanthus benthamianus, a plant known for its anti-fungal and antibacterial properties and commonly used as chewing sticks. The assembled genome of R. mucilaginosa RIT389 is 19 Mbp in length with an estimated genomic heterozygosity of 9.29%. Whole genome phylogeny supports the species designation of strain RIT389 within the genus in addition to supporting the monophyly of the currently sequenced Rhodotorula species. Further, we report for the first time, the recovery of the complete mitochondrial genome of R. mucilaginosa using the genome skimming approach. The assembled mitogenome is at least 7,000 bases larger than that of Rhodotorula taiwanensis which is largely attributed to the presence of large intronic regions containing open reading frames coding for homing endonuclease from the LAGLIDADG and GIY-YIG families. Furthermore, genomic regions containing the key genes for carotenoid production were identified in R. mucilaginosa RIT389, revealing differences in gene synteny that may play a role in the regulation of the biotechnologically important carotenoid synthesis pathways in yeasts.

  2. Living with genome instability: the adaptation of phytoplasmas to diverse environments of their insect and plant hosts

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bai, Xiaodong; Zhang, Jianhua; Ewing, Adam

    Phytoplasmas (Candidatus Phytoplasma, Class Mollicutes) cause disease in hundreds of economically important plants, and are obligately transmitted by sap-feeding insects of the order Hemiptera, mainly leafhoppers and psyllids. The 706,569-bp chromosome and four plasmids of aster yellows phytoplasma strain witches broom (AY-WB) were sequenced and compared to the onion yellows phytoplasma strain M (OY-M) genome. The phytoplasmas have small repeat-rich genomes. The repeated DNAs are organized into large clusters, potential mobile units (PMUs), which contain tra5 insertion sequences (ISs), and specialized sigma factors and membrane proteins. So far, PMUs are unique to phytoplasmas. Compared to mycoplasmas, phytoplasmas lack several recombination and DNA modification functions, and therefore phytoplasmas probably use different mechanisms of recombination, likely involving PMUs, for the creation of variability, allowing phytoplasmas to adjust to the diverse environments of plants and insects. The irregular GC skews and presence of ISs and large repeated sequences in the AY-WB and OY-M genomes are indicative of high genomic plasticity. Nevertheless, segments of ~250 kb, located between genes lplA and glnQ are syntenic between the two phytoplasmas, contain the majority of the metabolic genes and no ISs. AY-WB is further along in the reductive evolution process than OY-M. The AY-WB genome is ~154 kb smaller than the OY-M genome, primarily as a result of fewer multicopy sequences, including PMUs. Further, AY-WB lacks genes that are truncated and are part of incomplete pathways in OY-M. This is the first comparative phytoplasma genome analysis and report of the existence of PMUs in phytoplasma genomes.

  3. Navigating the currents of seascape genomics: how spatial analyses can augment population genomic studies

    PubMed Central

    Crandall, Eric D.; Liggins, Libby; Bongaerts, Pim; Treml, Eric A.

    2016-01-01

    Population genomic approaches are making rapid inroads in the study of non-model organisms, including marine taxa. To date, these marine studies have predominantly focused on rudimentary metrics describing the spatial and environmental context of their study region (e.g., geographical distance, average sea surface temperature, average salinity). We contend that a more nuanced and considered approach to quantifying seascape dynamics and patterns can strengthen population genomic investigations and help identify spatial, temporal, and environmental factors associated with differing selective regimes or demographic histories. Nevertheless, approaches for quantifying marine landscapes are complicated. Characteristic features of the marine environment, including pelagic living in flowing water (experienced by most marine taxa at some point in their life cycle), require a well-designed spatial-temporal sampling strategy and analysis. Many genetic summary statistics used to describe populations may be inappropriate for marine species with large population sizes, large species ranges, stochastic recruitment, and asymmetrical gene flow. Finally, statistical approaches for testing associations between seascapes and population genomic patterns are still maturing with no single approach able to capture all relevant considerations. None of these issues are completely unique to marine systems and therefore similar issues and solutions will be shared for many organisms regardless of habitat. Here, we outline goals and spatial approaches for landscape genomics with an emphasis on marine systems and review the growing empirical literature on seascape genomics. We review established tools and approaches and highlight promising new strategies to overcome select issues including a strategy to spatially optimize sampling. 
Despite the many challenges, we argue that marine systems may be especially well suited for identifying candidate genomic regions under environmentally mediated selection and that seascape genomic approaches are especially useful for identifying robust locus-by-environment associations. PMID:29491947

  4. Whole genome sequencing of Rhodotorula mucilaginosa isolated from the chewing stick (Distemonanthus benthamianus): insights into Rhodotorula phylogeny, mitogenome dynamics and carotenoid biosynthesis

    PubMed Central

    Thomas, Bolaji N.; Cavanaugh, Nicole T.; Morales, Grace H.; Mayers, Ashley N.; Savka, Michael A.

    2017-01-01

    In industry, the yeast Rhodotorula mucilaginosa is commonly used for the production of carotenoids. The production of carotenoids is important because they are used as natural colorants in food and some carotenoids are precursors of retinol (vitamin A). However, the identification and molecular characterization of the carotenoid pathway/s in species belonging to the genus Rhodotorula is scarce due to the lack of genomic information thus potentially impeding effective metabolic engineering of these yeast strains for improved carotenoid production. In this study, we report the isolation, identification, characterization and the whole nuclear genome and mitogenome sequence of the endophyte R. mucilaginosa RIT389 isolated from Distemonanthus benthamianus, a plant known for its anti-fungal and antibacterial properties and commonly used as chewing sticks. The assembled genome of R. mucilaginosa RIT389 is 19 Mbp in length with an estimated genomic heterozygosity of 9.29%. Whole genome phylogeny supports the species designation of strain RIT389 within the genus in addition to supporting the monophyly of the currently sequenced Rhodotorula species. Further, we report for the first time, the recovery of the complete mitochondrial genome of R. mucilaginosa using the genome skimming approach. The assembled mitogenome is at least 7,000 bases larger than that of Rhodotorula taiwanensis which is largely attributed to the presence of large intronic regions containing open reading frames coding for homing endonuclease from the LAGLIDADG and GIY-YIG families. Furthermore, genomic regions containing the key genes for carotenoid production were identified in R. mucilaginosa RIT389, revealing differences in gene synteny that may play a role in the regulation of the biotechnologically important carotenoid synthesis pathways in yeasts. PMID:29158974

  5. Navigating the currents of seascape genomics: how spatial analyses can augment population genomic studies.

    PubMed

    Riginos, Cynthia; Crandall, Eric D; Liggins, Libby; Bongaerts, Pim; Treml, Eric A

    2016-12-01

    Population genomic approaches are making rapid inroads in the study of non-model organisms, including marine taxa. To date, these marine studies have predominantly focused on rudimentary metrics describing the spatial and environmental context of their study region (e.g., geographical distance, average sea surface temperature, average salinity). We contend that a more nuanced and considered approach to quantifying seascape dynamics and patterns can strengthen population genomic investigations and help identify spatial, temporal, and environmental factors associated with differing selective regimes or demographic histories. Nevertheless, approaches for quantifying marine landscapes are complicated. Characteristic features of the marine environment, including pelagic living in flowing water (experienced by most marine taxa at some point in their life cycle), require a well-designed spatial-temporal sampling strategy and analysis. Many genetic summary statistics used to describe populations may be inappropriate for marine species with large population sizes, large species ranges, stochastic recruitment, and asymmetrical gene flow. Finally, statistical approaches for testing associations between seascapes and population genomic patterns are still maturing with no single approach able to capture all relevant considerations. None of these issues are completely unique to marine systems and therefore similar issues and solutions will be shared for many organisms regardless of habitat. Here, we outline goals and spatial approaches for landscape genomics with an emphasis on marine systems and review the growing empirical literature on seascape genomics. We review established tools and approaches and highlight promising new strategies to overcome select issues including a strategy to spatially optimize sampling. 
Despite the many challenges, we argue that marine systems may be especially well suited for identifying candidate genomic regions under environmentally mediated selection and that seascape genomic approaches are especially useful for identifying robust locus-by-environment associations.

  6. Use of randomly mutagenized genomic cDNA banks of potato spindle tuber viroid to screen for viable versions of the viroid genome.

    PubMed

    Więsyk, Aneta; Candresse, Thierry; Zagórski, Włodzimierz; Góra-Sochacka, Anna

    2011-02-01

    In an effort to study sequence space allowing the recovery of viable potato spindle tuber viroid (PSTVd) variants we have developed an in vivo selection (Selex) method to produce and bulk-inoculate by agroinfiltration large PSTVd cDNA banks in which a short stretch of the genome is mutagenized to saturation. This technique was applied to two highly conserved 6 nt-long regions of the PSTVd genome, the left terminal loop (TL bank) and part of the polypurine stretch in the upper strand of pre-melting loop 1 (PM1 bank). In each case, PSTVd accumulation was observed in a large fraction of bank-inoculated tomato plants. Characterization of the progeny molecules showed the recovery of the parental PSTVd sequence in 89 % (TL bank) and 18 % (PM1 bank) of the analysed plants. In addition, viable and genetically stable PSTVd variants with mutations outside of the known natural variability of PSTVd were recovered in both cases, although at different rates. In the case of the TL region, mutations were recovered at five of the six mutagenized positions (357, 358, 359, 1 and 3 of the genome) while for the PM1 region mutations were recovered at all six targeted positions (50-55), providing significant new insight on the plasticity of the PSTVd genome.

  7. Comparative genome analysis of 19 Ureaplasma urealyticum and Ureaplasma parvum strains

    PubMed Central

    2012-01-01

    Background Ureaplasma urealyticum (UUR) and Ureaplasma parvum (UPA) are sexually transmitted bacteria among humans implicated in a variety of disease states including but not limited to: nongonococcal urethritis, infertility, adverse pregnancy outcomes, chorioamnionitis, and bronchopulmonary dysplasia in neonates. There are 10 distinct serotypes of UUR and 4 of UPA. Efforts to determine whether difference in pathogenic potential exists at the ureaplasma serovar level have been hampered by limitations of antibody-based typing methods, multiple cross-reactions and poor discriminating capacity in clinical samples containing two or more serovars. Results We determined the genome sequences of the American Type Culture Collection (ATCC) type strains of all UUR and UPA serovars as well as four clinical isolates of UUR for which we were not able to determine serovar designation. UPA serovars had 0.75−0.78 Mbp genomes and UUR serovars were 0.84−0.95 Mbp. The original classification of ureaplasma isolates into distinct serovars was largely based on differences in the major ureaplasma surface antigen called the multiple banded antigen (MBA) and reactions of human and animal sera to the organisms. Whole genome analysis of the 14 serovars and the 4 clinical isolates showed the mba gene was part of a large superfamily, which is a phase variable gene system, and that some serovars have identical sets of mba genes. Most of the differences among serovars are hypothetical genes, and in general the two species and 14 serovars are extremely similar at the genome level. Conclusions Comparative genome analysis suggests UUR is more capable of acquiring genes horizontally, which may contribute to its greater virulence for some conditions. 
The overwhelming evidence of extensive horizontal gene transfer among these organisms from our previous studies combined with our comparative analysis indicates that ureaplasmas exist as quasi-species rather than as stable serovars in their native environment. Therefore, differential pathogenicity and clinical outcome of a ureaplasmal infection is most likely not on the serovar level, but rather may be due to the presence or absence of potential pathogenicity factors in an individual ureaplasma clinical isolate and/or patient to patient differences in terms of autoimmunity and microbiome. PMID:22646228

  8. Comparative genome analysis of 19 Ureaplasma urealyticum and Ureaplasma parvum strains.

    PubMed

    Paralanov, Vanya; Lu, Jin; Duffy, Lynn B; Crabb, Donna M; Shrivastava, Susmita; Methé, Barbara A; Inman, Jason; Yooseph, Shibu; Xiao, Li; Cassell, Gail H; Waites, Ken B; Glass, John I

    2012-05-30

    Ureaplasma urealyticum (UUR) and Ureaplasma parvum (UPA) are sexually transmitted bacteria among humans implicated in a variety of disease states including but not limited to: nongonococcal urethritis, infertility, adverse pregnancy outcomes, chorioamnionitis, and bronchopulmonary dysplasia in neonates. There are 10 distinct serotypes of UUR and 4 of UPA. Efforts to determine whether difference in pathogenic potential exists at the ureaplasma serovar level have been hampered by limitations of antibody-based typing methods, multiple cross-reactions and poor discriminating capacity in clinical samples containing two or more serovars. We determined the genome sequences of the American Type Culture Collection (ATCC) type strains of all UUR and UPA serovars as well as four clinical isolates of UUR for which we were not able to determine serovar designation. UPA serovars had 0.75-0.78 Mbp genomes and UUR serovars were 0.84-0.95 Mbp. The original classification of ureaplasma isolates into distinct serovars was largely based on differences in the major ureaplasma surface antigen called the multiple banded antigen (MBA) and reactions of human and animal sera to the organisms. Whole genome analysis of the 14 serovars and the 4 clinical isolates showed the mba gene was part of a large superfamily, which is a phase variable gene system, and that some serovars have identical sets of mba genes. Most of the differences among serovars are hypothetical genes, and in general the two species and 14 serovars are extremely similar at the genome level. Comparative genome analysis suggests UUR is more capable of acquiring genes horizontally, which may contribute to its greater virulence for some conditions. The overwhelming evidence of extensive horizontal gene transfer among these organisms from our previous studies combined with our comparative analysis indicates that ureaplasmas exist as quasi-species rather than as stable serovars in their native environment. 
Therefore, differential pathogenicity and clinical outcome of a ureaplasmal infection is most likely not on the serovar level, but rather may be due to the presence or absence of potential pathogenicity factors in an individual ureaplasma clinical isolate and/or patient to patient differences in terms of autoimmunity and microbiome.

  9. DNA transposons have colonized the genome of the giant virus Pandoravirus salinus.

    PubMed

    Sun, Cheng; Feschotte, Cédric; Wu, Zhiqiang; Mueller, Rachel Lockridge

    2015-06-12

    Transposable elements are mobile DNA sequences that are widely distributed in prokaryotic and eukaryotic genomes, where they represent a major force in genome evolution. However, transposable elements have rarely been documented in viruses, and their contribution to viral genome evolution remains largely unexplored. Pandoraviruses are recently described DNA viruses with genome sizes that exceed those of some prokaryotes, rivaling parasitic eukaryotes. These large genomes appear to include substantial noncoding intergenic spaces, which provide potential locations for transposable element insertions. However, no mobile genetic elements have yet been reported in pandoravirus genomes. Here, we report a family of miniature inverted-repeat transposable elements (MITEs) in the Pandoravirus salinus genome, representing the first description of a virus populated with a canonical transposable element family that proliferated by transposition within the viral genome. The MITE family, which we name Submariner, includes 30 copies with all the hallmarks of MITEs: short length, terminal inverted repeats, TA target site duplication, and no coding capacity. Submariner elements show signs of transposition and are undetectable in the genome of Pandoravirus dulcis, the closest known relative of Pandoravirus salinus. We identified a DNA transposon related to Submariner in the genome of Acanthamoeba castellanii, a species thought to host pandoraviruses, which contains remnants of coding sequence for a Tc1/mariner transposase. These observations suggest that the Submariner MITEs of P. salinus belong to the widespread Tc1/mariner superfamily and may have been mobilized by an amoebozoan host. Ten of the 30 MITEs in the P. salinus genome are located within coding regions of predicted genes, while others are close to genes, suggesting that these transposons may have contributed to viral genetic novelty.
Our discovery highlights the remarkable ability of DNA transposons to colonize and shape genomes from all domains of life, as well as giant viruses. Our findings continue to blur the division between viral and cellular genomes, adhering to the emerging view that the content, dynamics, and evolution of the genomes of giant viruses do not substantially differ from those of cellular organisms.

  10. Reduced representation approaches to interrogate genome diversity in large repetitive plant genomes.

    PubMed

    Hirsch, Cory D; Evans, Joseph; Buell, C Robin; Hirsch, Candice N

    2014-07-01

    Technology and software improvements in the last decade now provide methodologies to access the genome sequence of not only a single accession, but also multiple accessions of plant species. This provides a means to interrogate species diversity at the genome level. Ample diversity among accessions in a collection of species can be found, including single-nucleotide polymorphisms, insertions and deletions, copy number variation and presence/absence variation. For species with small, non-repetitive rich genomes, re-sequencing of query accessions is robust, highly informative, and economically feasible. However, for species with moderate to large sized repetitive-rich genomes, technical and economic barriers prevent en masse genome re-sequencing of accessions. Multiple approaches to access a focused subset of loci in species with larger genomes have been developed, including reduced representation sequencing, exome capture and transcriptome sequencing. Collectively, these approaches have enabled interrogation of diversity on a genome scale for large plant genomes, including crop species important to worldwide food security. © The Author 2014. Published by Oxford University Press.

  11. The genomic architecture and association genetics of adaptive characters using a candidate SNP approach in boreal black spruce

    PubMed Central

    2013-01-01

    Background The genomic architecture of adaptive traits remains poorly understood in non-model plants. Various approaches can be used to bridge this gap, including the mapping of quantitative trait loci (QTL) in pedigrees, and genetic association studies in non-structured populations. Here we present results on the genomic architecture of adaptive traits in black spruce, which is a widely distributed conifer of the North American boreal forest. As an alternative to the usual candidate gene approach, a candidate SNP approach was developed for association testing. Results A genetic map containing 231 gene loci was used to identify QTL that were related to budset timing and to tree height assessed over multiple years and sites. Twenty-two unique genomic regions were identified, including 20 that were related to budset timing and 6 that were related to tree height. From results of outlier detection and bulk segregant analysis for adaptive traits using DNA pool sequencing of 434 genes, 52 candidate SNPs were identified and subsequently tested in genetic association studies for budset timing and tree height assessed over multiple years and sites. A total of 34 (65%) SNPs were significantly associated with budset timing, or tree height, or both. Although the percentages of explained variance (PVE) by individual SNPs were small, several significant SNPs were shared between sites and among years. Conclusions The sharing of genomic regions and significant SNPs between budset timing and tree height indicates pleiotropic effects. Significant QTLs and SNPs differed quite greatly among years, suggesting that different sets of genes for the same characters are involved at different stages in the tree’s life history. The functional diversity of genes carrying significant SNPs and low observed PVE further indicated that a large number of polymorphisms are involved in adaptive genetic variation. 
Accordingly, for undomesticated species such as black spruce with natural populations of large effective size and low linkage disequilibrium, efficient marker systems that are predictive of adaptation should require the survey of large numbers of SNPs. Candidate SNP approaches like the one developed in the present study could contribute to reducing these numbers. PMID:23724860

  12. Genomic epidemiology of Cryptococcus yeasts identifies adaptation to environmental niches underpinning infection across an African HIV/AIDS cohort.

    PubMed

    Vanhove, Mathieu; Beale, Mathew A; Rhodes, Johanna; Chanda, Duncan; Lakhi, Shabir; Kwenda, Geoffrey; Molloy, Sile; Karunaharan, Natasha; Stone, Neil; Harrison, Thomas S; Bicanic, Tihana; Fisher, Matthew C

    2017-04-01

    Emerging infections caused by fungi have become a widely recognized global phenomenon and are causing an increasing burden of disease. Genomic techniques are providing new insights into the structure of fungal populations, revealing hitherto undescribed fine-scale adaptations to environments and hosts that govern their emergence as infections. Cryptococcal meningitis is a neglected tropical disease that is responsible for a large proportion of AIDS-related deaths across Africa; however, the ecological determinants that underlie a patient's risk of infection remain largely unexplored. Here, we use genome sequencing and ecological genomics to decipher the evolutionary ecology of the aetiological agents of cryptococcal meningitis, Cryptococcus neoformans and Cryptococcus gattii, across the central African country of Zambia. We show that the occurrence of these two pathogens is differentially associated with biotic (macroecological) and abiotic (physical) factors across two key African ecoregions, Central Miombo woodlands and Zambezi Mopane woodlands. We show that speciation of Cryptococcus has resulted in adaptation to occupy different ecological niches, with C. neoformans found to occupy Zambezi Mopane woodlands and C. gattii primarily recovered from Central Miombo woodlands. Genome sequencing shows that C. neoformans causes 95% of human infections in this region, of which over three-quarters belonged to the globalized lineage VNI. We show that VNI infections are largely associated with urbanized populations in Zambia. Conversely, the majority of C. neoformans isolates recovered in the environment belong to the genetically diverse African-endemic lineage VNB, and we show hitherto unmapped levels of genomic diversity within this lineage. Our results reveal the complex evolutionary ecology that underpins the reservoirs of infection for this, and likely other, deadly pathogenic fungi. © 2016 The Authors. Molecular Ecology Published by John Wiley & Sons Ltd.

  13. A Single Transcriptome of a Green Toad (Bufo viridis) Yields Candidate Genes for Sex Determination and -Differentiation and Non-Anonymous Population Genetic Markers

    PubMed Central

    Gerchen, Jörn F.; Reichert, Samuel J.; Röhr, Johannes T.; Dieterich, Christoph; Kloas, Werner

    2016-01-01

    Large genome size, including immense repetitive and non-coding fractions, still presents challenges for capacity, bioinformatics and thus affordability of whole genome sequencing in most amphibians. Here, we test the performance of a single transcriptome to understand whether it can provide a cost-efficient resource for species with large unknown genomes. Using RNA from six different tissues from a single Palearctic green toad (Bufo viridis) specimen and HiSeq 2000, we obtained 22.5 million reads and publish >100,000 unigene sequences. To evaluate efficacy and quality, we first use this data to identify green toad specific candidate genes, known from other vertebrates for their role in sex determination and differentiation. Of a list of 37 genes, the transcriptome yielded 32 (87%), many of which provide the first such data for this non-model anuran species. However, for many of these genes, only fragments could be retrieved. To also allow applications to population genetics, we further used the transcriptome for the targeted development of 21 non-anonymous microsatellites and tested them in genetic families and backcrosses. Eleven markers were specifically developed to be located on the B. viridis sex chromosomes; for eight markers we can indeed demonstrate sex-specific transmission in genetic families. Depending on phylogenetic distance, several markers, which are sex-linked in green toads, show high cross-amplification success across the anuran phylogeny, involving nine systematic anuran families. Our data support the view that single transcriptome sequencing (based on multiple tissues) provides a reliable genomic resource and cost-efficient method for non-model amphibian species with large genome size and, despite limitations, should be considered as long as genome sequencing remains unaffordable for most species. PMID:27232626

  14. Meta-Analysis of Genome-Wide Scans Provides Evidence for Sex- and Site-Specific Regulation of Bone Mass

    PubMed Central

    Sham, Pak C; Zintzaras, Elias; Lewis, Cathryn M; Deng, Hong-Wen; Econs, Michael J; Karasik, David; Devoto, Marcella; Kammerer, Candace M; Spector, Tim; Andrew, Toby; Cupples, L Adrienne; Duncan, Emma L; Foroud, Tatiana; Kiel, Douglas P; Koller, Daniel; Langdahl, Bente; Mitchell, Braxton D; Peacock, Munro; Recker, Robert; Shen, Hui; Sol-Church, Katia; Spotila, Loretta D; Uitterlinden, Andre G; Wilson, Scott G; Kung, Annie WC; Ralston, Stuart H

    2014-01-01

    Several genome-wide scans have been performed to detect loci that regulate BMD, but these have yielded inconsistent results, with limited replication of linkage peaks in different studies. In an effort to improve statistical power for detection of these loci, we performed a meta-analysis of genome-wide scans in which spine or hip BMD were studied. Evidence was gained to suggest that several chromosomal loci regulate BMD in a site-specific and sex-specific manner. Introduction BMD is a heritable trait and an important predictor of osteoporotic fracture risk. Several genome-wide scans have been performed in an attempt to detect loci that regulate BMD, but there has been limited replication of linkage peaks between studies. In an attempt to resolve these inconsistencies, we conducted a collaborative meta-analysis of genome-wide linkage scans in which femoral neck BMD (FN-BMD) or lumbar spine BMD (LS-BMD) had been studied. Materials and Methods Data were accumulated from nine genome-wide scans involving 11,842 subjects. Data were analyzed separately for LS-BMD and FN-BMD and by sex. For each study, genomic bins of 30 cM were defined and ranked according to the maximum LOD score they contained. While various densitometers were used in different studies, the ranking approach that we used means that the results are not confounded by the fact that different measurement devices were used. Significance for high average rank and heterogeneity was obtained through Monte Carlo testing. Results For LS-BMD, the quantitative trait locus (QTL) with greatest significance was on chromosome 1p13.3-q23.3 (p = 0.004), but this exhibited high heterogeneity and the effect was specific for women. Other significant LS-BMD QTLs were on chromosomes 12q24.31-qter, 3p25.3-p22.1, 11p12-q13.3, and 1q32-q42.3, including one on 18p11-q12.3 that had not been detected by individual studies. For FN-BMD, the strongest QTL was on chromosome 9q31.1-q33.3 (p = 0.002). 
Other significant QTLs were identified on chromosomes 17p12-q21.33, 14q13.1-q24.1, 9q21.32-q31.1, and 5q14.3-q23.2. There was no correlation in average ranks of bins between men and women and the loci that regulated BMD in men and women and at different sites were largely distinct. Conclusions This large-scale meta-analysis provided evidence for replication of several QTLs identified in previous studies and also identified a QTL on chromosome 18p11-q12.3, which had not been detected by individual studies. However, despite the large sample size, none of the individual loci identified reached genome-wide significance. PMID:17228994
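    The rank-based procedure summarized under Materials and Methods can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes each study supplies one maximum LOD score per bin, uses an unweighted average of ranks, and tests only the "high average rank" statistic. The function names and the toy scores are hypothetical.

```python
import random

def rank_bins(max_lod_per_bin):
    """Rank bins within one study; the highest maximum LOD gets the highest rank.
    Tied scores share the average of the ranks they span."""
    n = len(max_lod_per_bin)
    order = sorted(range(n), key=lambda i: max_lod_per_bin[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and max_lod_per_bin[order[j + 1]] == max_lod_per_bin[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def average_ranks(studies):
    """Average each bin's within-study rank across all studies."""
    per_study = [rank_bins(s) for s in studies]
    n_bins = len(studies[0])
    return [sum(r[b] for r in per_study) / len(per_study) for b in range(n_bins)]

def monte_carlo_pvalues(studies, n_sim=2000, seed=0):
    """Empirical p-value per bin: how often a random reassignment of ranks
    within each study gives an average rank at least as high as observed."""
    rng = random.Random(seed)
    per_study = [rank_bins(s) for s in studies]
    n_bins = len(studies[0])
    observed = average_ranks(studies)
    exceed = [0] * n_bins
    for _ in range(n_sim):
        shuffled = [rng.sample(r, len(r)) for r in per_study]
        for b in range(n_bins):
            sim = sum(r[b] for r in shuffled) / len(shuffled)
            if sim >= observed[b]:
                exceed[b] += 1
    return [(e + 1) / (n_sim + 1) for e in exceed]

# Toy input (hypothetical): three studies, four bins, bin index 2 is top in all.
toy_studies = [
    [0.2, 1.1, 3.0, 0.5],
    [0.4, 0.3, 2.5, 1.0],
    [1.2, 0.1, 2.8, 0.6],
]
toy_avg = average_ranks(toy_studies)
toy_p = monte_carlo_pvalues(toy_studies)
```

    Because only within-study ranks enter the statistic, differences in absolute LOD scale between studies (for example, from different densitometers) do not affect the result, which is the property the abstract relies on.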

  15. From Genes to Environment: Using integrative genomics to build a “systems level” understanding of autism spectrum disorders

    PubMed Central

    Hu, Valerie W.

    2012-01-01

    Autism spectrum disorders (ASD) are pervasive neurodevelopmental disorders that affect an estimated 1 in 110 individuals. Although there is a strong genetic component associated with these disorders, this review focuses on the multi-factorial nature of ASD and how different genome-wide (genomic) approaches contribute to our understanding of autism. Emphasis is placed on the need to study defined ASD phenotypes as well as to integrate large-scale ‘omics’ data in order to develop a “systems level” perspective of ASD which, in turn, is necessary to allow predictions regarding responses to specific perturbations and interventions. PMID:22497667

  16. Isolation and amplification of genomic DNA from recalcitrant dried berries of black pepper (Piper nigrum L.)--a medicinal spice.

    PubMed

    Dhanya, K; Kizhakkayil, Jaleel; Syamkumar, S; Sasikumar, B

    2007-10-01

    Black pepper is an important medicinal spice traded internationally. The extraction of high quality genomic DNA for PCR amplification from dried black pepper is challenging because of the exceptionally large amounts of oxidized polyphenolic compounds, polysaccharides and other secondary metabolites. Here we report a modified hexadecyl trimethyl ammonium bromide (CTAB) protocol by incorporating potassium acetate and a final PEG precipitation step to isolate PCR amplifiable genomic DNA from dried and powdered berries of black pepper. The protocol has trade implications as it will help in the PCR characterization of traded black peppers from different countries.

  17. Random Distribution Pattern and Non-adaptivity of Genome Size in a Highly Variable Population of Festuca pallens

    PubMed Central

    Šmarda, Petr; Bureš, Petr; Horová, Lucie

    2007-01-01

    Background and Aims The spatial and statistical distribution of genome sizes and the adaptivity of genome size to some types of habitat, vegetation or microclimatic conditions were investigated in a tetraploid population of Festuca pallens. The population was previously documented to vary highly in genome size and is assumed as a model for the study of the initial stages of genome size differentiation. Methods Using DAPI flow cytometry, samples were measured repeatedly with diploid Festuca pallens as the internal standard. Altogether 172 plants from 57 plots (2·25 m2), distributed in contrasting habitats over the whole locality in South Moravia, Czech Republic, were sampled. The differences in DNA content were confirmed by the double peaks of simultaneously measured samples. Key Results At maximum, a 1·115-fold difference in genome size was observed. The statistical distribution of genome sizes was found to be continuous and best fits the extreme (Gumbel) distribution with rare occurrences of extremely large genomes (positive-skewed), as it is similar for the log-normal distribution of the whole Angiosperms. Even plants from the same plot frequently varied considerably in genome size and the spatial distribution of genome sizes was generally random and unautocorrelated (P > 0·05). The observed spatial pattern and the overall lack of correlations of genome size with recognized vegetation types or microclimatic conditions indicate the absence of ecological adaptivity of genome size in the studied population. Conclusions These experimental data on intraspecific genome size variability in Festuca pallens argue for the absence of natural selection and the selective non-significance of genome size in the initial stages of genome size differentiation, and corroborate the current hypothetical model of genome size evolution in Angiosperms (Bennetzen et al., 2005, Annals of Botany 95: 127–132). PMID:17565968

  18. How may targeted proteomics complement genomic data in breast cancer?

    PubMed

    Guerin, Mathilde; Gonçalves, Anthony; Toiron, Yves; Baudelet, Emilie; Audebert, Stéphane; Boyer, Jean-Baptiste; Borg, Jean-Paul; Camoin, Luc

    2017-01-01

    Breast cancer (BC) is the most common female cancer in the world and was recently deconstructed into different molecular entities. Although most of the recent assays to characterize tumors at the molecular level are genomic-based, proteins are the actual executors of cellular functions and represent the vast majority of targets for anticancer drugs. Accumulated data has demonstrated an important level of quantitative and qualitative discrepancies between genomic/transcriptomic alterations and their protein counterparts, mostly related to the large number of post-translational modifications. Areas covered: This review will present novel proteomics technologies such as Reverse Phase Protein Array (RPPA) or mass-spectrometry (MS) based approaches that have emerged and that could progressively replace old-fashioned methods (e.g. immunohistochemistry, ELISA, etc.) to validate proteins as diagnostic, prognostic or predictive biomarkers, and eventually monitor them in routine practice. Expert commentary: These different targeted proteomic approaches, able to complement genomic data in BC and characterize tumors more precisely, will allow more personalized treatment for each patient and tumor.

  19. Comparative analysis of metazoan chromatin organization.

    PubMed

    Ho, Joshua W K; Jung, Youngsook L; Liu, Tao; Alver, Burak H; Lee, Soohyun; Ikegami, Kohta; Sohn, Kyung-Ah; Minoda, Aki; Tolstorukov, Michael Y; Appert, Alex; Parker, Stephen C J; Gu, Tingting; Kundaje, Anshul; Riddle, Nicole C; Bishop, Eric; Egelhofer, Thea A; Hu, Sheng'en Shawn; Alekseyenko, Artyom A; Rechtsteiner, Andreas; Asker, Dalal; Belsky, Jason A; Bowman, Sarah K; Chen, Q Brent; Chen, Ron A-J; Day, Daniel S; Dong, Yan; Dose, Andrea C; Duan, Xikun; Epstein, Charles B; Ercan, Sevinc; Feingold, Elise A; Ferrari, Francesco; Garrigues, Jacob M; Gehlenborg, Nils; Good, Peter J; Haseley, Psalm; He, Daniel; Herrmann, Moritz; Hoffman, Michael M; Jeffers, Tess E; Kharchenko, Peter V; Kolasinska-Zwierz, Paulina; Kotwaliwale, Chitra V; Kumar, Nischay; Langley, Sasha A; Larschan, Erica N; Latorre, Isabel; Libbrecht, Maxwell W; Lin, Xueqiu; Park, Richard; Pazin, Michael J; Pham, Hoang N; Plachetka, Annette; Qin, Bo; Schwartz, Yuri B; Shoresh, Noam; Stempor, Przemyslaw; Vielle, Anne; Wang, Chengyang; Whittle, Christina M; Xue, Huiling; Kingston, Robert E; Kim, Ju Han; Bernstein, Bradley E; Dernburg, Abby F; Pirrotta, Vincenzo; Kuroda, Mitzi I; Noble, William S; Tullius, Thomas D; Kellis, Manolis; MacAlpine, David M; Strome, Susan; Elgin, Sarah C R; Liu, Xiaole Shirley; Lieb, Jason D; Ahringer, Julie; Karpen, Gary H; Park, Peter J

    2014-08-28

    Genome function is dynamically regulated in part by chromatin, which consists of the histones, non-histone proteins and RNA molecules that package DNA. Studies in Caenorhabditis elegans and Drosophila melanogaster have contributed substantially to our understanding of molecular mechanisms of genome function in humans, and have revealed conservation of chromatin components and mechanisms. Nevertheless, the three organisms have markedly different genome sizes, chromosome architecture and gene organization. On human and fly chromosomes, for example, pericentric heterochromatin flanks single centromeres, whereas worm chromosomes have dispersed heterochromatin-like regions enriched in the distal chromosomal 'arms', and centromeres distributed along their lengths. To systematically investigate chromatin organization and associated gene regulation across species, we generated and analysed a large collection of genome-wide chromatin data sets from cell lines and developmental stages in worm, fly and human. Here we present over 800 new data sets from our ENCODE and modENCODE consortia, bringing the total to over 1,400. Comparison of combinatorial patterns of histone modifications, nuclear lamina-associated domains, organization of large-scale topological domains, chromatin environment at promoters and enhancers, nucleosome positioning, and DNA replication patterns reveals many conserved features of chromatin organization among the three organisms. We also find notable differences in the composition and locations of repressive chromatin. These data sets and analyses provide a rich resource for comparative and species-specific investigations of chromatin composition, organization and function.

  20. A New Perspective on Polyploid Fragaria (Strawberry) Genome Composition Based on Large-Scale, Multi-Locus Phylogenetic Analysis

    PubMed Central

    Yang, Yilong

    2017-01-01

    The subgenomic compositions of the octoploid (2n = 8× = 56) strawberry (Fragaria) species, including the economically important cultivated species Fragaria x ananassa, have been a topic of long-standing interest. Phylogenomic approaches utilizing next-generation sequencing technologies offer a new window into species relationships and the subgenomic compositions of polyploids. We have conducted a large-scale phylogenetic analysis of Fragaria (strawberry) species using the Fluidigm Access Array system and 454 sequencing platform. About 24 single-copy or low-copy nuclear genes distributed across the genome were amplified and sequenced from 96 genomic DNA samples representing 16 Fragaria species from diploid (2×) to decaploid (10×), including the most extensive sampling of octoploid taxa yet reported. Individual gene trees were constructed by different tree-building methods. Mosaic genomic structures of diploid Fragaria species consisting of sequences at different phylogenetic positions were observed. Our findings support the presence in octoploid species of genetic signatures from at least five diploid ancestors (F. vesca, F. iinumae, F. bucharica, F. viridis, and at least one additional allele contributor of unknown identity), and question the extent to which distinct subgenomes are preserved over evolutionary time in the allopolyploid Fragaria species. In addition, our data support divergence between the two wild octoploid species, F. virginiana and F. chiloensis. PMID:29045639

  1. The Effect of Different Oceanic Abiotic Factors on Prokaryotic Body Sizes

    NASA Astrophysics Data System (ADS)

    Pidathala, S.; Bellon, M.; Heim, N.; Payne, J.

    2016-12-01

    We are studying the impact of abiotic factors in the Pacific and Atlantic on prokaryotic body sizes and genome sizes because we are interested in the manner in which abiotic factors influence genome sizes independent of their influence on body sizes. Some research has been done in the past on marine bacterial evolution, including data collection on marine ecology in relation to bacterial body sizes (Straza 2009). We are using the abiotic factors temperature, salinity, and pH to compare the biovolumes/genome sizes of different phyla by using R. We made 9 scatter plots to model these relationships. Regardless of the phyla or the ocean, we found that there is no relation between pH, temperature, and body size, with several exceptions: Deinococcus thermus has an indirect relationship with size in respect to temperature; size only correlates to temperature for phyla that are thermophiles. We also found that bacteria like D. thermus and Thermotogae are taxa only found in higher temperatures. Additionally, almost all phyla have genome sizes restricted by certain pH levels: Proteobacteria only reach genomes with acidity levels greater than 6. In terms of salinity levels, certain bacteria are only found within a small range, and others, like Proteobacteria, can only reach genomes at low salinity levels. Finally, Proteobacteria have large genome sizes between 30 and 40 °, and Crenarchaeota have constant genome sizes in higher temperatures. In conclusion, we discovered that these abiotic factors generally do not affect body size, with the exception of D. thermus' indirect relationship to temperature due to its small biovolume in high temperatures. However, we determined that these abiotic factors have a great impact on genome sizes. This is due to genome size independence from body size. Also, genome size could have served as an adaptive feature for bacteria in marine environments, explaining why different phyla may have diverged to accommodate their lifestyles.

  2. Using microarrays to identify positional candidate genes for QTL: the case study of ACTH response in pigs.

    PubMed

    Jouffe, Vincent; Rowe, Suzanne; Liaubet, Laurence; Buitenhuis, Bart; Hornshøj, Henrik; SanCristobal, Magali; Mormède, Pierre; de Koning, D J

    2009-07-16

    Microarray studies can supplement QTL studies by suggesting potential candidate genes in the QTL regions, which by themselves are too large to provide a limited selection of candidate genes. Here we provide a case study where we explore ways to integrate QTL data and microarray data for the pig, which has only a partial genome sequence. We outline various procedures to localize differentially expressed genes on the pig genome and link this with information on published QTL. The starting point is a set of 237 differentially expressed cDNA clones in adrenal tissue from two pig breeds, before and after treatment with adrenocorticotropic hormone (ACTH). Different approaches to localize the differentially expressed (DE) genes to the pig genome showed different levels of success and a clear lack of concordance for some genes between the various approaches. For a focused analysis on 12 genes, overlapping QTL from the public domain were presented. Also, differentially expressed genes underlying QTL for ACTH response were described. Using the latest version of the draft sequence, the differentially expressed genes were mapped to the pig genome. This enabled co-location of DE genes and previously studied QTL regions, but the draft genome sequence is still incomplete and will contain many errors. A further step to explore links between DE genes and QTL at the pathway level was largely unsuccessful due to the lack of annotation of the pig genome. This could be improved by further comparative mapping analyses but this would be time consuming. This paper provides a case study for the integration of QTL data and microarray data for a species with limited genome sequence information and annotation. The results illustrate the challenges that must be addressed but also provide a roadmap for future work that is applicable to other non-model species.
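    The co-location step described above, matching mapped differentially expressed (DE) genes against published QTL regions, amounts to an interval-overlap query once both have genome coordinates. The sketch below is a minimal illustration under that assumption; the record layout, gene and QTL names, and coordinates are all hypothetical and do not come from the study.

```python
from collections import namedtuple

# Hypothetical record layout: a named interval on one chromosome (coordinates in bp).
Interval = namedtuple("Interval", "name chrom start end")

def overlaps(a, b):
    """True if two closed intervals on the same chromosome share at least one base."""
    return a.chrom == b.chrom and a.start <= b.end and b.start <= a.end

def colocate(genes, qtls):
    """Map each gene name to the names of the QTL regions it overlaps.
    Genes with no overlapping QTL are omitted from the result."""
    hits = {}
    for g in genes:
        matched = [q.name for q in qtls if overlaps(g, q)]
        if matched:
            hits[g.name] = matched
    return hits

# Toy data (hypothetical names and positions).
de_genes = [
    Interval("geneA", "chr1", 1_000_000, 1_020_000),
    Interval("geneB", "chr1", 9_000_000, 9_010_000),
    Interval("geneC", "chr7", 500_000, 530_000),
]
qtl_regions = [
    Interval("QTL_1", "chr1", 800_000, 2_500_000),
    Interval("QTL_2", "chr7", 520_000, 4_000_000),
    Interval("QTL_3", "chr7", 100_000, 510_000),
]
candidates = colocate(de_genes, qtl_regions)
```

    A linear scan like this is adequate for a few hundred clones; with an incomplete draft assembly, the harder part is obtaining trustworthy coordinates in the first place, as the abstract notes.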

  3. Variable presence of the inverted repeat and plastome stability in Erodium

    PubMed Central

    Blazier, John C.; Jansen, Robert K.; Mower, Jeffrey P.; Govindu, Madhu; Zhang, Jin; Weng, Mao-Lun; Ruhlman, Tracey A.

    2016-01-01

    Background and Aims Several unrelated lineages such as plastids, viruses and plasmids, have converged on quadripartite genomes of similar size with large and small single copy regions and a large inverted repeat (IR). Except for Erodium (Geraniaceae), saguaro cactus and some legumes, the plastomes of all photosynthetic angiosperms display this structure. The functional significance of the IR is not understood and Erodium provides a system to examine the role of the IR in the long-term stability of these genomes. We compared the degree of genomic rearrangement in plastomes of Erodium that differ in the presence and absence of the IR. Methods We sequenced 17 new Erodium plastomes. Using 454, Illumina, PacBio and Sanger sequences, 16 genomes were assembled and categorized along with one incomplete and two previously published Erodium plastomes. We conducted phylogenetic analyses among these species using a dataset of 19 protein-coding genes and determined if significantly higher evolutionary rates had caused the long branch seen previously in phylogenetic reconstructions within the genus. Bioinformatic comparisons were also performed to evaluate plastome evolution across the genus. Key Results Erodium plastomes fell into four types (Type 1–4) that differ in their substitution rates, short dispersed repeat content and degree of genomic rearrangement, gene and intron content and GC content. Type 4 plastomes had significantly higher rates of synonymous substitutions (dS) for all genes and for 14 of the 19 genes non-synonymous substitutions (dN) were significantly accelerated. We evaluated the evidence for a single IR loss in Erodium and in doing so discovered that Type 4 plastomes contain a novel IR. Conclusions The presence or absence of the IR does not affect plastome stability in Erodium. 
Rather, the overall repeat content shows a negative correlation with genome stability, a pattern in agreement with other angiosperm groups and recent findings on genome stability in bacterial endosymbionts. PMID:27192713

  4. Variable presence of the inverted repeat and plastome stability in Erodium.

    PubMed

    Blazier, John C; Jansen, Robert K; Mower, Jeffrey P; Govindu, Madhu; Zhang, Jin; Weng, Mao-Lun; Ruhlman, Tracey A

    2016-06-01

    Several unrelated lineages such as plastids, viruses and plasmids, have converged on quadripartite genomes of similar size with large and small single copy regions and a large inverted repeat (IR). Except for Erodium (Geraniaceae), saguaro cactus and some legumes, the plastomes of all photosynthetic angiosperms display this structure. The functional significance of the IR is not understood and Erodium provides a system to examine the role of the IR in the long-term stability of these genomes. We compared the degree of genomic rearrangement in plastomes of Erodium that differ in the presence and absence of the IR. We sequenced 17 new Erodium plastomes. Using 454, Illumina, PacBio and Sanger sequences, 16 genomes were assembled and categorized along with one incomplete and two previously published Erodium plastomes. We conducted phylogenetic analyses among these species using a dataset of 19 protein-coding genes and determined if significantly higher evolutionary rates had caused the long branch seen previously in phylogenetic reconstructions within the genus. Bioinformatic comparisons were also performed to evaluate plastome evolution across the genus. Erodium plastomes fell into four types (Type 1-4) that differ in their substitution rates, short dispersed repeat content and degree of genomic rearrangement, gene and intron content and GC content. Type 4 plastomes had significantly higher rates of synonymous substitutions (dS) for all genes and for 14 of the 19 genes non-synonymous substitutions (dN) were significantly accelerated. We evaluated the evidence for a single IR loss in Erodium and in doing so discovered that Type 4 plastomes contain a novel IR. The presence or absence of the IR does not affect plastome stability in Erodium. Rather, the overall repeat content shows a negative correlation with genome stability, a pattern in agreement with other angiosperm groups and recent findings on genome stability in bacterial endosymbionts. © The Author 2016. 
Published by Oxford University Press on behalf of the Annals of Botany Company.

  5. BAMSI: a multi-cloud service for scalable distributed filtering of massive genome data.

    PubMed

    Ausmees, Kristiina; John, Aji; Toor, Salman Z; Hellander, Andreas; Nettelblad, Carl

    2018-06-26

    The advent of next-generation sequencing (NGS) has made whole-genome sequencing of cohorts of individuals a reality. Primary datasets of raw or aligned reads of this sort can get very large. For scientific questions where curated called variants are not sufficient, the sheer size of the datasets makes analysis prohibitively expensive. In order to make re-analysis of such data feasible without the need to have access to a large-scale computing facility, we have developed a highly scalable, storage-agnostic framework, an associated API and an easy-to-use web user interface to execute custom filters on large genomic datasets. We present BAMSI, a Software-as-a-Service (SaaS) solution for filtering of the 1000 Genomes phase 3 set of aligned reads, with the possibility of extension and customization to other sets of files. Unique to our solution is the capability of simultaneously utilizing many different mirrors of the data to increase the speed of the analysis. In particular, if the data is available in private or public clouds - an increasingly common scenario for both academic and commercial cloud providers - our framework allows for seamless deployment of filtering workers close to data. We show results indicating that such a setup improves the horizontal scalability of the system, and present a possible use case of the framework by performing an analysis of structural variation in the 1000 Genomes data set. BAMSI constitutes a framework for efficient filtering of large genomic data sets that is flexible in the use of compute as well as storage resources. The data resulting from the filter is assumed to be greatly reduced in size, and can easily be downloaded or routed into e.g. a Hadoop cluster for subsequent interactive analysis using Hive, Spark or similar tools. In this respect, our framework also suggests a general model for making very large datasets of high scientific value more accessible by offering the possibility for organizations to share the cost of hosting data on hot storage, without compromising the scalability of downstream analysis.

  6. Comparative genome analysis identifies two large deletions in the genome of highly-passaged attenuated Streptococcus agalactiae strain YM001 compared to the parental pathogenic strain HN016.

    PubMed

    Wang, Rui; Li, Liping; Huang, Yan; Luo, Fuguang; Liang, Wanwen; Gan, Xi; Huang, Ting; Lei, Aiying; Chen, Ming; Chen, Lianfu

    2015-11-04

    Streptococcus agalactiae (S. agalactiae), also known as group B Streptococcus (GBS), is an important pathogen for neonatal pneumonia, meningitis, bovine mastitis, and fish meningoencephalitis. The global outbreaks of Streptococcus disease in tilapia cause huge economic losses and also threaten human food safety. To investigate the mechanism of S. agalactiae pathogenesis in tilapia and develop an attenuated S. agalactiae vaccine, this study sequenced and comparatively analyzed the whole genomes of virulent wild-type S. agalactiae strain HN016 and its highly-passaged attenuated strain YM001 derived from tilapia. We performed Illumina sequencing of DNA prepared from strains HN016 and YM001. Sequenced reads were assembled, and nucleotide comparisons, single nucleotide polymorphisms (SNPs), and indels were analyzed between the draft genomes of HN016 and YM001. Clustered regularly interspaced short palindromic repeats (CRISPRs) and prophage were detected and analyzed in different S. agalactiae strains. The genome of S. agalactiae YM001 was 2,047,957 bp with a GC content of 35.61%; it contained 2044 genes and 88 RNAs. Meanwhile, the genome of S. agalactiae HN016 was 2,064,722 bp with a GC content of 35.66%; it had 2063 genes and 101 RNAs. Comparative genome analysis indicated that, compared with HN016, the YM001 genome had two significant large deletions, of 5832 and 11,116 bp respectively, resulting in the deletion of three rRNA and ten tRNA genes, as well as the deletion and functional damage of ten genes related to metabolism, transport, growth, anti-stress, etc. Besides these two large deletions, ten other deletions and 28 single nucleotide variations (SNVs) were also identified, mainly affecting the metabolism- and growth-related genes. The genome of attenuated S. agalactiae YM001 showed significant variations, resulting in the deletion of 10 functional genes, compared to the parental pathogenic strain HN016. The deleted and mutated functional genes all encode metabolism- and growth-related proteins, not the known virulence proteins, indicating that the metabolism- and growth-related genes are important for the pathogenesis of S. agalactiae.

  7. redGEM: Systematic reduction and analysis of genome-scale metabolic reconstructions for development of consistent core metabolic models

    PubMed Central

    Ataman, Meric

    2017-01-01

    Genome-scale metabolic reconstructions have proven to be valuable resources in enhancing our understanding of metabolic networks as they encapsulate all known metabolic capabilities of the organisms from genes to proteins to their functions. However, the complexity of these large metabolic networks often hinders their utility in various practical applications. Although reduced models are commonly used for modeling and in integrating experimental data, they are often inconsistent across different studies and laboratories due to different criteria and detail, which can compromise transferability of the findings and also integration of experimental data from different groups. In this study, we have developed a systematic semi-automatic approach to reduce genome-scale models into core models in a consistent and logical manner focusing on the central metabolism or subsystems of interest. The method minimizes the loss of information using an approach that combines graph-based search and optimization methods. The resulting core models are shown to be able to capture key properties of the genome-scale models and preserve consistency in terms of biomass and by-product yields, flux and concentration variability and gene essentiality. The development of these “consistently-reduced” models will help to clarify and facilitate integration of different experimental data to draw new understanding that can be directly extendable to genome-scale models. PMID:28727725

  8. RNAslider: a faster engine for consecutive windows folding and its application to the analysis of genomic folding asymmetry.

    PubMed

    Horesh, Yair; Wexler, Ydo; Lebenthal, Ilana; Ziv-Ukelson, Michal; Unger, Ron

    2009-03-04

    Scanning large genomes with a sliding window in search of locally stable RNA structures is a well motivated problem in bioinformatics. Given a predefined window size L and an RNA sequence S of size N (L < N), the consecutive windows folding problem is to compute the minimal free energy (MFE) for the folding of each of the L-sized substrings of S. The consecutive windows folding problem can be naively solved in O(NL^3) by applying any of the classical cubic-time RNA folding algorithms to each of the N-L windows of size L. Recently an O(NL^2) solution for this problem has been described. Here, we describe and implement an O(NLψ(L)) engine for the consecutive windows folding problem, where ψ(L) is shown to converge to O(1) under the assumption of a standard probabilistic polymer folding model, yielding an O(L) speedup which is experimentally confirmed. Using this tool, we note an intriguing directionality (5'-3' vs. 3'-5') folding bias, i.e. that the minimal free energy (MFE) of folding is higher in the native direction of the DNA than in the reverse direction of various genomic regions in several organisms including regions of the genomes that do not encode proteins or ncRNA. This bias largely emerges from the genomic dinucleotide bias which affects the MFE, however we see some variations in the folding bias in the different genomic regions when normalized to the dinucleotide bias. We also present results from calculating the MFE landscape of a mouse chromosome 1, characterizing the MFE of the long ncRNA molecules that reside in this chromosome. The efficient consecutive windows folding engine described in this paper allows for genome wide scans for ncRNA molecules as well as large-scale statistics. This is implemented here as a software tool, called RNAslider, and applied to the scanning of long chromosomes, leading to the observation of features that are visible only on a large scale.
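    For orientation, the naive baseline mentioned in the abstract (fold every window independently with a cubic-time algorithm) can be sketched as below. This is not the RNAslider algorithm, and it does not compute a thermodynamic MFE: as a simplifying assumption it substitutes Nussinov-style base-pair maximization for the energy model, so each window's score is a pair count rather than a free energy. The function names and the minimum loop length of 3 are illustrative choices.

```python
# Canonical Watson-Crick pairs plus G-U wobble pairs.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}

def max_pairs(seq, min_loop=3):
    """Nussinov-style dynamic program: maximum number of nested base pairs
    in seq, requiring at least min_loop unpaired bases inside any hairpin.
    Runs in O(L^3) time for a sequence of length L."""
    n = len(seq)
    if n == 0:
        return 0
    dp = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i][j - 1]  # case 1: position j stays unpaired
            # case 2: position j pairs with some k, leaving >= min_loop bases between
            for k in range(i, j - min_loop):
                if (seq[k], seq[j]) in PAIRS:
                    left = dp[i][k - 1] if k > i else 0
                    best = max(best, left + 1 + dp[k + 1][j - 1])
            dp[i][j] = best
    return dp[0][n - 1]

def window_scores(seq, window):
    """Naive consecutive-windows scan: fold each length-`window` substring
    from scratch, sharing no work between overlapping windows."""
    return [max_pairs(seq[i:i + window]) for i in range(len(seq) - window + 1)]

scores = window_scores("GGGAAACCCA", 9)
```

    Adjacent windows share all but one base at each end, so recomputing every table from scratch repeats most of the work; reusing that overlap is what the faster engines described in the abstract exploit.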

  9. GSuite HyperBrowser: integrative analysis of dataset collections across the genome and epigenome.

    PubMed

    Simovski, Boris; Vodák, Daniel; Gundersen, Sveinung; Domanska, Diana; Azab, Abdulrahman; Holden, Lars; Holden, Marit; Grytten, Ivar; Rand, Knut; Drabløs, Finn; Johansen, Morten; Mora, Antonio; Lund-Andersen, Christin; Fromm, Bastian; Eskeland, Ragnhild; Gabrielsen, Odd Stokke; Ferkingstad, Egil; Nakken, Sigve; Bengtsen, Mads; Nederbragt, Alexander Johan; Thorarensen, Hildur Sif; Akse, Johannes Andreas; Glad, Ingrid; Hovig, Eivind; Sandve, Geir Kjetil

    2017-07-01

    Recent large-scale undertakings such as ENCODE and Roadmap Epigenomics have generated experimental data mapped to the human reference genome (as genomic tracks) representing a variety of functional elements across a large number of cell types. Despite the high potential value of these publicly available data for a broad variety of investigations, little attention has been given to the analytical methodology necessary for their widespread utilisation. We here present a first principled treatment of the analysis of collections of genomic tracks. We have developed novel computational and statistical methodology to permit comparative and confirmatory analyses across multiple and disparate data sources. We delineate a set of generic questions that are useful across a broad range of investigations and discuss the implications of choosing different statistical measures and null models. Examples include contrasting analyses across different tissues or diseases. The methodology has been implemented in a comprehensive open-source software system, the GSuite HyperBrowser. To make the functionality accessible to biologists, and to facilitate reproducible analysis, we have also developed a web-based interface providing an expertly guided and customizable way of utilizing the methodology. With this system, many novel biological questions can flexibly be posed and rapidly answered. Through a combination of streamlined data acquisition, interoperable representation of dataset collections, and customizable statistical analysis with guided setup and interpretation, the GSuite HyperBrowser represents a first comprehensive solution for integrative analysis of track collections across the genome and epigenome. The software is available at: https://hyperbrowser.uio.no. © The Author 2017. Published by Oxford University Press.

  10. Whole-genome sequence of the Tibetan frog Nanorana parkeri and the comparative evolution of tetrapod genomes.

    PubMed

    Sun, Yan-Bo; Xiong, Zi-Jun; Xiang, Xue-Yan; Liu, Shi-Ping; Zhou, Wei-Wei; Tu, Xiao-Long; Zhong, Li; Wang, Lu; Wu, Dong-Dong; Zhang, Bao-Lin; Zhu, Chun-Ling; Yang, Min-Min; Chen, Hong-Man; Li, Fang; Zhou, Long; Feng, Shao-Hong; Huang, Chao; Zhang, Guo-Jie; Irwin, David; Hillis, David M; Murphy, Robert W; Yang, Huan-Ming; Che, Jing; Wang, Jun; Zhang, Ya-Ping

    2015-03-17

    The development of efficient sequencing techniques has resulted in large numbers of genomes being available for evolutionary studies. However, only one genome is available for all amphibians, that of Xenopus tropicalis, which is distantly related to the majority of frogs. More than 96% of frogs belong to the Neobatrachia, and no genome exists for this group. This dearth of amphibian genomes greatly restricts genomic studies of amphibians and, more generally, our understanding of tetrapod genome evolution. To fill this gap, we provide the de novo genome of a Tibetan Plateau frog, Nanorana parkeri, and compare it to that of X. tropicalis and other vertebrates. This genome encodes more than 20,000 protein-coding genes, a number similar to that of Xenopus. Although the genome size of Nanorana is considerably larger than that of Xenopus (2.3 vs. 1.5 Gb), most of the difference is due to the respective number of transposable elements in the two genomes. The two frogs exhibit considerable conserved whole-genome synteny despite having diverged approximately 266 Ma, indicating a slow rate of DNA structural evolution in anurans. Multigenome synteny blocks further show that amphibians have fewer interchromosomal rearrangements than mammals but have a comparable rate of intrachromosomal rearrangements. Our analysis also identifies 11 Mb of anuran-specific highly conserved elements that will be useful for comparative genomic analyses of frogs. The Nanorana genome offers an improved understanding of evolution of tetrapod genomes and also provides a genomic reference for other evolutionary studies.

  11. Complete genome sequence of Nocardiopsis dassonvillei type strain (IMRU 509T)

    PubMed Central

    Sun, Hui; Lapidus, Alla; Nolan, Matt; Lucas, Susan; Del Rio, Tijana Glavina; Tice, Hope; Cheng, Jan-Fang; Tapia, Roxane; Han, Cliff; Goodwin, Lynne; Pitluck, Sam; Pagani, Ioanna; Ivanova, Natalia; Mavromatis, Konstantinos; Mikhailova, Natalia; Pati, Amrita; Chen, Amy; Palaniappan, Krishna; Land, Miriam; Hauser, Loren; Chang, Yun-Juan; Jeffries, Cynthia D.; Djao, Olivier Duplex Ngatchou; Rohde, Manfred; Sikorski, Johannes; Göker, Markus; Woyke, Tanja; Bristow, James; Eisen, Jonathan A.; Markowitz, Victor; Hugenholtz, Philip; Kyrpides, Nikos C.; Klenk, Hans-Peter

    2010-01-01

    Nocardiopsis dassonvillei (Brocq-Rousseau 1904) Meyer 1976 is the type species of the genus Nocardiopsis, which in turn is the type genus of the family Nocardiopsaceae. This species is of interest because of its ecological versatility. Members of N. dassonvillei have been isolated from a large variety of natural habitats such as soil and marine sediments, from different plant and animal materials as well as from human patients. Moreover, representatives of the genus Nocardiopsis participate actively in biopolymer degradation. This is the first complete genome sequence in the family Nocardiopsaceae. Here we describe the features of this organism, together with the complete genome sequence and annotation. The 6,543,312 bp long genome consists of a 5.77 Mbp chromosome and a 0.78 Mbp plasmid and, with its 5,570 protein-coding and 77 RNA genes, is a part of the Genomic Encyclopedia of Bacteria and Archaea project. PMID:21304737

  12. Use of low-coverage, large-insert, short-read data for rapid and accurate generation of enhanced-quality draft Pseudomonas genome sequences.

    PubMed

    O'Brien, Heath E; Gong, Yunchen; Fung, Pauline; Wang, Pauline W; Guttman, David S

    2011-01-01

    Next-generation genomic technology has both greatly accelerated the pace of genome research and increased our reliance on draft genome sequences. While groups such as the Genomics Standards Consortium have made strong efforts to promote genome standards, there is still a general lack of uniformity among published draft genomes, leading to challenges for downstream comparative analyses. This lack of uniformity is a particular problem when using standard draft genomes that frequently have large numbers of low-quality sequencing tracts. Here we present a proposal for an "enhanced-quality draft" genome that identifies at least 95% of the coding sequences, thereby effectively providing a full accounting of the genic component of the genome. Enhanced-quality draft genomes are easily attainable through a combination of small- and large-insert next-generation, paired-end sequencing. We illustrate the generation of an enhanced-quality draft genome by re-sequencing the plant pathogenic bacterium Pseudomonas syringae pv. phaseolicola 1448A (Pph 1448A), which has a published, closed genome sequence of 5.93 Mbp. We use a combination of Illumina paired-end and mate-pair sequencing, and surprisingly find that de novo assemblies with 100x paired-end coverage and mate-pair sequencing with as low as 2-5x coverage are substantially better than assemblies based on higher coverage. The rapid and low-cost generation of large numbers of enhanced-quality draft genome sequences will be of particular value for microbial diagnostics and biosecurity, which rely on precise discrimination of potentially dangerous clones from closely related benign strains.

  13. Smoking Gun or Circumstantial Evidence? Comparison of Statistical Learning Methods using Functional Annotations for Prioritizing Risk Variants.

    PubMed

    Gagliano, Sarah A; Ravji, Reena; Barnes, Michael R; Weale, Michael E; Knight, Jo

    2015-08-24

    Although technology has triumphed in facilitating routine genome sequencing, new challenges have been created for the data-analyst. Genome-scale surveys of human variation generate volumes of data that far exceed capabilities for laboratory characterization. By incorporating functional annotations as predictors, statistical learning has been widely investigated for prioritizing genetic variants likely to be associated with complex disease. We compared three published prioritization procedures, which use different statistical learning algorithms and different predictors with regard to the quantity, type and coding. We also explored different combinations of algorithm and annotation set. As an application, we tested which methodology performed best for prioritizing variants using data from a large schizophrenia meta-analysis by the Psychiatric Genomics Consortium. Results suggest that all methods have considerable (and similar) predictive accuracies (AUCs 0.64-0.71) in test set data, but there is more variability in the application to the schizophrenia GWAS. In conclusion, a variety of algorithms and annotations seem to have a similar potential to effectively enrich true risk variants in genome-scale datasets, however none offer more than incremental improvement in prediction. We discuss how methods might be evolved for risk variant prediction to address the impending bottleneck of the new generation of genome re-sequencing studies.

  14. Comparative genomic characterization of citrus-associated Xylella fastidiosa strains.

    PubMed

    da Silva, Vivian S; Shida, Cláudio S; Rodrigues, Fabiana B; Ribeiro, Diógenes C D; de Souza, Alessandra A; Coletta-Filho, Helvécio D; Machado, Marcos A; Nunes, Luiz R; de Oliveira, Regina Costa

    2007-12-21

    The xylem-inhabiting bacterium Xylella fastidiosa (Xf) is the causal agent of Pierce's disease (PD) in vineyards and citrus variegated chlorosis (CVC) in orange trees. Both of these economically-devastating diseases are caused by distinct strains of this complex group of microorganisms, which has motivated researchers to conduct extensive genomic sequencing projects with Xf strains. This sequence information, along with other molecular tools, has been used to estimate the evolutionary history of the group and provide clues to understand the capacity of Xf to infect different hosts, causing a variety of symptoms. Nonetheless, although significant amounts of information have been generated from Xf strains, a large proportion of these efforts has concentrated on the study of North American strains, limiting our understanding about the genomic composition of South American strains - which is particularly important for CVC-associated strains. This paper describes the first genome-wide comparison among South American Xf strains, involving 6 distinct citrus-associated bacteria. Comparative analyses performed through a microarray-based approach allowed identification and characterization of large mobile genetic elements that seem to be exclusive to South American strains. Moreover, a large-scale sequencing effort, based on Suppressive Subtraction Hybridization (SSH), identified 290 new ORFs, distributed in 135 Groups of Orthologous Elements, throughout the genomes of these bacteria. Results from microarray-based comparisons provide further evidence concerning activity of horizontally transferred elements, reinforcing their importance as major mediators in the evolution of Xf. Moreover, the microarray-based genomic profiles showed similarity between Xf strains 9a5c and Fb7, which is unexpected, given the geographical and chronological differences associated with the isolation of these microorganisms. The newly identified ORFs, obtained by SSH, represent an approximately 10% increase in our current knowledge of the South American Xf gene pool and include new putative virulence factors, as well as novel potential markers for strain identification. Surprisingly, this list of novel elements includes sequences previously believed to be unique to North American strains, pointing to the necessity of revising the list of specific markers that may be used for identification of distinct Xf strains.

  15. CyanoGEBA: A Better Understanding of Cynobacterial Diversity through Large-Scale Genomics (JGI Seventh Annual User Meeting 2012: Genomics of Energy and Environment)

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Shih, Patrick

    2012-03-22

    Patrick Shih, representing both the University of California, Berkeley and JGI, gives a talk titled "CyanoGEBA: A Better Understanding of Cynobacterial Diversity through Large-scale Genomics" at the JGI 7th Annual Users Meeting: Genomics of Energy & Environment Meeting on March 22, 2012 in Walnut Creek, California.

  16. CyanoGEBA: A Better Understanding of Cynobacterial Diversity through Large-Scale Genomics (JGI Seventh Annual User Meeting 2012: Genomics of Energy and Environment)

    ScienceCinema

    Shih, Patrick

    2018-01-10

    Patrick Shih, representing both the University of California, Berkeley and JGI, gives a talk titled "CyanoGEBA: A Better Understanding of Cynobacterial Diversity through Large-scale Genomics" at the JGI 7th Annual Users Meeting: Genomics of Energy & Environment Meeting on March 22, 2012 in Walnut Creek, California.

  17. Comparative genomics of Eucalyptus and Corymbia reveals low rates of genome structural rearrangement.

    PubMed

    Butler, J B; Vaillancourt, R E; Potts, B M; Lee, D J; King, G J; Baten, A; Shepherd, M; Freeman, J S

    2017-05-22

    Previous studies suggest genome structure is largely conserved between Eucalyptus species. However, it is unknown if this conservation extends to more divergent eucalypt taxa. We performed comparative genomics between the eucalypt genera Eucalyptus and Corymbia. Our results will facilitate transfer of genomic information between these important taxa and provide further insights into the rate of structural change in tree genomes. We constructed three high density linkage maps for two Corymbia species (Corymbia citriodora subsp. variegata and Corymbia torelliana) which were used to compare genome structure between both species and Eucalyptus grandis. Genome structure was highly conserved between the Corymbia species. However, the comparison of Corymbia and E. grandis suggests large (from 1-13 MB) intra-chromosomal rearrangements have occurred on seven of the 11 chromosomes. Most rearrangements were supported through comparisons of the three independent Corymbia maps to the E. grandis genome sequence, and to other independently constructed Eucalyptus linkage maps. These are the first large scale chromosomal rearrangements discovered between eucalypts. Nonetheless, in the general context of plants, the genomic structure of the two genera was remarkably conserved; adding to a growing body of evidence that conservation of genome structure is common amongst woody angiosperms.

  18. CGCI Investigators Reveal Comprehensive Landscape of Diffuse Large B-Cell Lymphoma (DLBCL) Genomes | Office of Cancer Genomics

    Cancer.gov

    Researchers from British Columbia Cancer Agency used whole genome sequencing to analyze 40 DLBCL cases and 13 cell lines in order to fill in the gaps of the complex landscape of DLBCL genomes. Their analysis, “Mutational and structural analysis of diffuse large B-cell lymphoma using whole genome sequencing,” was published online in Blood on May 22. The authors are Ryan Morin, Marco Marra, and colleagues.  

  19. Re-annotation, improved large-scale assembly and establishment of a catalogue of noncoding loci for the genome of the model brown alga Ectocarpus.

    PubMed

    Cormier, Alexandre; Avia, Komlan; Sterck, Lieven; Derrien, Thomas; Wucher, Valentin; Andres, Gwendoline; Monsoor, Misharl; Godfroy, Olivier; Lipinska, Agnieszka; Perrineau, Marie-Mathilde; Van De Peer, Yves; Hitte, Christophe; Corre, Erwan; Coelho, Susana M; Cock, J Mark

    2017-04-01

    The genome of the filamentous brown alga Ectocarpus was the first to be completely sequenced from within the brown algal group and has served as a key reference genome both for this lineage and for the stramenopiles. We present a complete structural and functional reannotation of the Ectocarpus genome. The large-scale assembly of the Ectocarpus genome was significantly improved and genome-wide gene re-annotation using extensive RNA-seq data improved the structure of 11 108 existing protein-coding genes and added 2030 new loci. A genome-wide analysis of splicing isoforms identified an average of 1.6 transcripts per locus. A large number of previously undescribed noncoding genes were identified and annotated, including 717 loci that produce long noncoding RNAs. Conservation of lncRNAs between Ectocarpus and another brown alga, the kelp Saccharina japonica, suggests that at least a proportion of these loci serve a function. Finally, a large collection of single nucleotide polymorphism-based markers was developed for genetic analyses. These resources are available through an updated and improved genome database. This study significantly improves the utility of the Ectocarpus genome as a high-quality reference for the study of many important aspects of brown algal biology and as a reference for genomic analyses across the stramenopiles. © 2016 The Authors. New Phytologist © 2016 New Phytologist Trust.

  20. Challenges and opportunities for genomic developmental neuropsychology: Examples from the Penn-Drexel collaborative battery

    PubMed Central

    Gur, Ruben C.; Irani, Farzin; Seligman, Sarah; Calkins, Monica E.; Richard, Jan; Gur, Raquel E.

    2014-01-01

    Genomics has been revolutionizing medicine over the past decade by offering mechanistic insights into disease processes and harboring the age of “individualized medicine.” Because of the sheer number of measures generated by gene sequencing methods, genomics requires “Big Science” where large datasets on genes are analyzed in reference to electronic medical record data. This revolution has largely bypassed the behavioral neurosciences, mainly because of the paucity of behavioral data in medical records and the labor intensity of available neuropsychological assessment methods. We describe the development and implementation of an efficient neuroscience-based computerized battery, coupled with a computerized clinical assessment procedure. This assessment package has been applied to a genomic study of 10,000 children aged 8-21, of whom 1000 also undergo neuroimaging. Results from the first 3000 participants indicate sensitivity to neurodevelopmental trajectories. Sex differences were evident, with females outperforming males in memory and social cognition domains, while for spatial processing males were more accurate and faster, and they were faster on simple motor tasks. The study illustrates what will hopefully become a major component of the work of clinical and research neuropsychologists as invaluable participants in the dawning age of Big Science neuropsychological genomics. PMID:21902564

  1. The Multipartite Mitochondrial Genome of Liposcelis bostrychophila: Insights into the Evolution of Mitochondrial Genomes in Bilateral Animals

    PubMed Central

    Yuan, Ming-Long; Dou, Wei; Barker, Stephen C.; Wang, Jin-Jun

    2012-01-01

    Booklice (order Psocoptera) in the genus Liposcelis are major pests to stored grains worldwide and are closely related to parasitic lice (order Phthiraptera). We sequenced the mitochondrial (mt) genome of Liposcelis bostrychophila and found that the typical single mt chromosome of bilateral animals has fragmented into and been replaced by two medium-sized chromosomes in this booklouse; each of these chromosomes has about half of the genes of the typical mt chromosome of bilateral animals. These mt chromosomes are 8,530 bp (mt chromosome I) and 7,933 bp (mt chromosome II) in size. Intriguingly, mt chromosome I is twice as abundant as chromosome II. It appears that the selection pressure for compact mt genomes in bilateral animals favors small mt chromosomes when small mt chromosomes co-exist with the typical large mt chromosomes. Thus, small mt chromosomes may have selective advantages over large mt chromosomes in bilateral animals. Phylogenetic analyses of mt genome sequences of Psocodea (i.e. Psocoptera plus Phthiraptera) indicate that: 1) the order Psocoptera (booklice and barklice) is paraphyletic; and 2) the order Phthiraptera (the parasitic lice) is monophyletic. Within parasitic lice, however, the suborder Ischnocera is paraphyletic; this differs from the traditional view that each suborder of parasitic lice is monophyletic. PMID:22479490

  2. Universal and idiosyncratic characteristic lengths in bacterial genomes

    NASA Astrophysics Data System (ADS)

    Junier, Ivan; Frémont, Paul; Rivoire, Olivier

    2018-05-01

    In condensed matter physics, simplified descriptions are obtained by coarse-graining the features of a system at a certain characteristic length, defined as the typical length beyond which some properties are no longer correlated. From a physics standpoint, in vitro DNA has thus a characteristic length of 300 base pairs (bp), the Kuhn length of the molecule beyond which correlations in its orientations are typically lost. From a biology standpoint, in vivo DNA has a characteristic length of 1000 bp, the typical length of genes. Since bacteria live in very different physico-chemical conditions and since their genomes lack translational invariance, whether larger, universal characteristic lengths exist is a non-trivial question. Here, we examine this problem by leveraging the large number of fully sequenced genomes available in public databases. By analyzing GC content correlations and the evolutionary conservation of gene contexts (synteny) in hundreds of bacterial chromosomes, we conclude that a fundamental characteristic length around 10–20 kb can be defined. This characteristic length reflects elementary structures involved in the coordination of gene expression, which are present all along the genome of nearly all bacteria. Technically, reaching this conclusion required us to implement methods that are insensitive to the presence of large idiosyncratic genomic features, which may co-exist along these fundamental universal structures.
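
    The idea of a characteristic length as the scale beyond which correlations fade can be illustrated with a windowed GC-content autocorrelation. This is a minimal sketch, not the method used in the paper: the function names and the non-overlapping windowing scheme are assumptions made for illustration.

```python
def gc_fraction_windows(seq, window):
    """GC fraction in consecutive, non-overlapping windows of the given size."""
    seq = seq.upper()
    return [
        sum(1 for base in seq[i:i + window] if base in "GC") / window
        for i in range(0, len(seq) - window + 1, window)
    ]


def autocorrelation(values, lag):
    """Normalized autocorrelation of a series at a given lag (1.0 at lag 0)."""
    n = len(values)
    if n == 0 or lag >= n:
        return 0.0
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values)
    if var == 0:
        return 0.0
    cov = sum((values[i] - mean) * (values[i + lag] - mean) for i in range(n - lag))
    return cov / var


if __name__ == "__main__":
    gc = gc_fraction_windows("GGCCAATTGGCCAATT", 4)
    # Print the correlation at each lag; the lag (times window size) at which
    # it drops toward zero gives a rough estimate of a correlation length.
    for lag in range(len(gc)):
        print(lag, autocorrelation(gc, lag))
```

    On a real chromosome one would use windows of hundreds of base pairs and look for the lag at which the correlation decays, keeping in mind that large idiosyncratic features can dominate such a simple estimator.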

  3. Not All Particles Are Equal: The Selective Enrichment of Particle-Associated Bacteria from the Mediterranean Sea.

    PubMed

    López-Pérez, Mario; Kimes, Nikole E; Haro-Moreno, Jose M; Rodriguez-Valera, Francisco

    2016-01-01

    We have used two metagenomic approaches, direct sequencing of natural samples and sequencing after enrichment, to characterize communities of prokaryotes associated with particles. In the first approach, different size filters (0.22 and 5 μm) were used to identify prokaryotic microbes of free-living and particle-attached bacterial communities in the Mediterranean water column. A subtractive metagenomic approach was used to characterize the dominant microbial groups in the large size fraction that were not present in the free-living one. They belonged mainly to Actinobacteria, Planctomycetes, Flavobacteria and Proteobacteria. In addition, marine microbial communities enriched by incubation with different kinds of particulate material have been studied by metagenomic assembly. Different particle kinds (diatomaceous earth, sand, chitin and cellulose) were colonized by very different communities of bacteria belonging to Roseobacter, Vibrio, Bacteriovorax, and Lacinutrix that were distant relatives of genomes already described from marine habitats. Besides, using assembly from deep metagenomic sequencing from the particle-specific enrichments we were able to determine a total of 20 groups of contigs (eight of them with >50% completeness) and reconstruct de novo five new genomes of novel species within marine clades (>79% completeness and <1.8% contamination). We also describe for the first time the genome of a marine Rhizobiales phage that seems to infect a broad range of Alphaproteobacteria and live in habitats as diverse as soil, marine sediment and water column. The metagenomic recruitment of the communities found by direct sequencing of the large size filter and by enrichment had nearly no overlap. These results indicate that these reconstructed genomes are part of the rare biosphere which exists at nominal levels under natural conditions.

  4. Genome Science: A Video Tour of the Washington University Genome Sequencing Center for High School and Undergraduate Students

    PubMed Central

    2005-01-01

    Sequencing of the human genome has ushered in a new era of biology. The technologies developed to facilitate the sequencing of the human genome are now being applied to the sequencing of other genomes. In 2004, a partnership was formed between Washington University School of Medicine Genome Sequencing Center's Outreach Program and Washington University Department of Biology Science Outreach to create a video tour depicting the processes involved in large-scale sequencing. “Sequencing a Genome: Inside the Washington University Genome Sequencing Center” is a tour of the laboratory that follows the steps in the sequencing pipeline, interspersed with animated explanations of the scientific procedures used at the facility. Accompanying interviews with the staff illustrate different entry levels for a career in genome science. This video project serves as an example of how research and academic institutions can provide teachers and students with access and exposure to innovative technologies at the forefront of biomedical research. Initial feedback on the video from undergraduate students, high school teachers, and high school students provides suggestions for use of this video in a classroom setting to supplement present curricula. PMID:16341256

  5. Oncogenomic portals for the visualization and analysis of genome-wide cancer data

    PubMed Central

    Klonowska, Katarzyna; Czubak, Karol; Wojciechowska, Marzena; Handschuh, Luiza; Zmienko, Agnieszka; Figlerowicz, Marek; Dams-Kozlowska, Hanna; Kozlowski, Piotr

    2016-01-01

    Somatically acquired genomic alterations that drive oncogenic cellular processes are of great scientific and clinical interest. Since the initiation of large-scale cancer genomic projects (e.g., the Cancer Genome Project, The Cancer Genome Atlas, and the International Cancer Genome Consortium cancer genome projects), a number of web-based portals have been created to facilitate access to multidimensional oncogenomic data and assist with the interpretation of the data. The portals provide the visualization of small-size mutations, copy number variations, methylation, and gene/protein expression data that can be correlated with the available clinical, epidemiological, and molecular features. Additionally, the portals enable analysis of the gathered data with various user-friendly statistical tools. Herein, we present a highly illustrated review of seven portals, i.e., Tumorscape, UCSC Cancer Genomics Browser, ICGC Data Portal, COSMIC, cBioPortal, IntOGen, and BioProfiling.de. All of the selected portals are user-friendly and can be exploited by scientists from different cancer-associated fields, including those without a bioinformatics background. It is expected that the use of the portals will contribute to a better understanding of cancer molecular etiology and will ultimately accelerate the translation of genomic knowledge into clinical practice. PMID:26484415

  6. Oncogenomic portals for the visualization and analysis of genome-wide cancer data.

    PubMed

    Klonowska, Katarzyna; Czubak, Karol; Wojciechowska, Marzena; Handschuh, Luiza; Zmienko, Agnieszka; Figlerowicz, Marek; Dams-Kozlowska, Hanna; Kozlowski, Piotr

    2016-01-05

    Somatically acquired genomic alterations that drive oncogenic cellular processes are of great scientific and clinical interest. Since the initiation of large-scale cancer genomic projects (e.g., the Cancer Genome Project, The Cancer Genome Atlas, and the International Cancer Genome Consortium cancer genome projects), a number of web-based portals have been created to facilitate access to multidimensional oncogenomic data and assist with the interpretation of the data. The portals provide the visualization of small-size mutations, copy number variations, methylation, and gene/protein expression data that can be correlated with the available clinical, epidemiological, and molecular features. Additionally, the portals enable analysis of the gathered data with various user-friendly statistical tools. Herein, we present a highly illustrated review of seven portals, i.e., Tumorscape, UCSC Cancer Genomics Browser, ICGC Data Portal, COSMIC, cBioPortal, IntOGen, and BioProfiling.de. All of the selected portals are user-friendly and can be exploited by scientists from different cancer-associated fields, including those without a bioinformatics background. It is expected that the use of the portals will contribute to a better understanding of cancer molecular etiology and will ultimately accelerate the translation of genomic knowledge into clinical practice.

  7. Next-generation sequencing detects repetitive elements expansion in giant genomes of annual killifish genus Austrolebias (Cyprinodontiformes, Rivulidae).

    PubMed

    García, G; Ríos, N; Gutiérrez, V

    2015-06-01

    Among Neotropical fish fauna, the South American killifish genus Austrolebias (Cyprinodontiformes: Rivulidae) constitutes an excellent model to study the genomic evolutionary processes underlying speciation events. Recently, unusually large genome size has been described in 16 species of this genus, with an average DNA content of about 5.95 ± 0.45 pg per diploid cell (mean C-value of about 2.98 pg). In the present paper we explore the possible origin of this unparalleled genomic increase by means of comparative analysis of the repetitive components using NGS (454-Roche) technology in the lowest and highest Rivulidae genomes. Here, we provide the first annotated Rivulidae-repeated sequences composition and their relative repetitive fraction in both genomes. Remarkably, the genomic proportion of the moderately repetitive DNA in the Austrolebias charrua genome (45%) is approximately twice that of the closely related Rivulinae taxon Cynopoecilus melanotaenia (25%). The present work provides evidence about the impact of the repeat families that could be distinctly proliferated among sublineages within the Rivulidae fish group, explaining the great genome size differences encompassing the differentiation and speciation events in this family.

  8. The impact of selection, gene flow and demographic history on heterogeneous genomic divergence: three-spine sticklebacks in divergent environments.

    PubMed

    Ferchaud, Anne-Laure; Hansen, Michael M

    2016-01-01

    Heterogeneous genomic divergence between populations may reflect selection, but should also be seen in conjunction with gene flow and drift, particularly population bottlenecks. Marine and freshwater three-spine stickleback (Gasterosteus aculeatus) populations often exhibit different lateral armour plate morphs. Moreover, strikingly parallel genomic footprints across different marine-freshwater population pairs are interpreted as parallel evolution and gene reuse. Nevertheless, in some geographic regions like the North Sea and Baltic Sea, different patterns are observed. Freshwater populations in coastal regions are often dominated by marine morphs, suggesting that gene flow overwhelms selection, and genomic parallelism may also be less pronounced. We used RAD sequencing for analysing 28 888 SNPs in two marine and seven freshwater populations in Denmark, Europe. Freshwater populations represented a variety of environments: river populations accessible to gene flow from marine sticklebacks and large and small isolated lakes with and without fish predators. Sticklebacks in an accessible river environment showed minimal morphological and genomewide divergence from marine populations, supporting the hypothesis of gene flow overriding selection. Allele frequency spectra suggested bottlenecks in all freshwater populations, and particularly two small lake populations. However, genomic footprints ascribed to selection could nevertheless be identified. No genomic regions were consistent freshwater-marine outliers, and parallelism was much lower than in other comparable studies. Two genomic regions previously described to be under divergent selection in freshwater and marine populations were outliers between different freshwater populations. We ascribe these patterns to stronger environmental heterogeneity among freshwater populations in our study as compared to most other studies, although the demographic history involving bottlenecks should also be considered in the interpretation of results. © 2015 John Wiley & Sons Ltd.

  9. Minimal-assumption inference from population-genomic data

    NASA Astrophysics Data System (ADS)

    Weissman, Daniel; Hallatschek, Oskar

    Samples of multiple complete genome sequences contain vast amounts of information about the evolutionary history of populations, much of it in the associations among polymorphisms at different loci. Current methods that take advantage of this linkage information rely on models of recombination and coalescence, limiting the sample sizes and populations that they can analyze. We introduce a method, Minimal-Assumption Genomic Inference of Coalescence (MAGIC), that reconstructs key features of the evolutionary history, including the distribution of coalescence times, by integrating information across genomic length scales without using an explicit model of recombination, demography or selection. Using simulated data, we show that MAGIC's performance is comparable to PSMC' on single diploid samples generated with standard coalescent and recombination models. More importantly, MAGIC can also analyze arbitrarily large samples and is robust to changes in the coalescent and recombination processes. Using MAGIC, we show that the inferred coalescence time histories of samples of multiple human genomes exhibit inconsistencies with a description in terms of an effective population size based on single-genome data.

  10. Conserved intergenic sequences revealed by CTAG-profiling in Salmonella: thermodynamic modeling for function prediction

    NASA Astrophysics Data System (ADS)

    Tang, Le; Zhu, Songling; Mastriani, Emilio; Fang, Xin; Zhou, Yu-Jie; Li, Yong-Guo; Johnston, Randal N.; Guo, Zheng; Liu, Gui-Rong; Liu, Shu-Lin

    2017-03-01

    Highly conserved short sequences help identify functional genomic regions and facilitate genomic annotation. We used Salmonella as the model to search the genome for evolutionarily conserved regions and focused on the tetranucleotide sequence CTAG for its potentially important functions. In Salmonella, CTAG is highly conserved across the lineages and large numbers of CTAG-containing short sequences fall in intergenic regions, strongly indicating their biological importance. Computer modeling demonstrated stable stem-loop structures in some of the CTAG-containing intergenic regions, and substitution of a nucleotide of the CTAG sequence would radically rearrange the free energy and disrupt the structure. The postulated degeneration of CTAG takes distinct patterns among Salmonella lineages and provides novel information about genomic divergence and evolution of these bacterial pathogens. Comparison of the vertically and horizontally transmitted genomic segments showed different CTAG distribution landscapes, with the genome amelioration process to remove CTAG taking place inward from both terminals of the horizontally acquired segment.
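
    A rough idea of what counting CTAG sites by genomic context involves is sketched below. This is not the authors' CTAG-profiling pipeline: the sequence and the gene interval are made-up toy inputs, and the function names are hypothetical. Because CTAG is its own reverse complement, scanning one strand is enough to find every site.

```python
def find_motif(seq, motif="CTAG"):
    """Return 0-based start positions of every occurrence of motif in seq."""
    seq, motif = seq.upper(), motif.upper()
    k = len(motif)
    return [i for i in range(len(seq) - k + 1) if seq[i:i + k] == motif]


def split_by_region(positions, genes, k=4):
    """Classify motif hits as genic (fully inside a gene) or intergenic.

    genes is a list of half-open (start, end) intervals; hypothetical toy input.
    """
    counts = {"genic": 0, "intergenic": 0}
    for p in positions:
        inside = any(start <= p and p + k <= end for start, end in genes)
        counts["genic" if inside else "intergenic"] += 1
    return counts


toy_genome = "ATGCTAGAAATTTCTAGCCCTAGG"  # made-up sequence
toy_genes = [(0, 9)]                      # made-up annotation
hits = find_motif(toy_genome)
print(hits, split_by_region(hits, toy_genes))
```

    On a real genome the intervals would come from an annotation file, and a hit that straddles a gene boundary is counted here as intergenic, which is a simplifying choice.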

  11. Clustering of Pan- and Core-genome of Lactobacillus provides Novel Evolutionary Insights for Differentiation.

    PubMed

    Inglin, Raffael C; Meile, Leo; Stevens, Marc J A

    2018-04-24

    Bacterial taxonomy aims to classify bacteria based on true evolutionary events and relies on a polyphasic approach that includes phenotypic, genotypic and chemotaxonomic analyses. Until now, complete genomes are largely ignored in taxonomy. The genus Lactobacillus consists of 173 species and many genomes are available to study taxonomy and evolutionary events. We analyzed and clustered 98 completely sequenced genomes of the genus Lactobacillus and 234 draft genomes of 5 different Lactobacillus species, i.e. L. reuteri, L. delbrueckii, L. plantarum, L. rhamnosus and L. helveticus. The core-genome of the genus Lactobacillus contains 266 genes and the pan-genome 20'800 genes. Clustering of the Lactobacillus pan- and core-genome resulted in two highly similar trees. This shows that evolutionary history is traceable in the core-genome and that clustering of the core-genome is sufficient to explore relationships. Clustering of core- and pan-genomes at species' level resulted in similar trees as well. Detailed analyses of the core-genomes showed that the functional class "genetic information processing" is conserved in the core-genome but that "signaling and cellular processes" is not. The latter class encodes functions that are involved in environmental interactions. Evolution of lactobacilli seems therefore directed by the environment. The type species L. delbrueckii was analyzed in detail and its pan-genome based tree contained two major clades whose members contained different genes yet identical functions. In addition, evidence for horizontal gene transfer between strains of L. delbrueckii, L. plantarum, and L. rhamnosus, and between species of the genus Lactobacillus is presented. Our data provide evidence for evolution of some lactobacilli according to a parapatric-like model for species differentiation. Core-genome trees are useful to detect evolutionary relationships in lactobacilli and might be useful in taxonomic analyses. 
Lactobacillus' evolution is directed by the environment and HGT.
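
    The core-genome and pan-genome figures quoted above rest on a simple set definition: the core-genome is the set of genes present in every genome, and the pan-genome is the set of genes present in at least one. The sketch below illustrates that definition only. It assumes genes have already been grouped into orthologous groups, which is the hard part of a real analysis, and the strain names and group labels are invented.

```python
def pan_and_core(genomes):
    """Return (pan, core) gene sets for a mapping of strain -> set of gene IDs.

    Gene IDs are assumed to be ortholog-group labels shared across strains.
    """
    gene_sets = list(genomes.values())
    if not gene_sets:
        return set(), set()
    pan = set().union(*gene_sets)          # present in at least one genome
    core = set.intersection(*gene_sets)    # present in every genome
    return pan, core


# Invented toy data: three strains, five ortholog groups.
toy = {
    "strain_A": {"og1", "og2", "og3"},
    "strain_B": {"og1", "og2", "og4"},
    "strain_C": {"og1", "og5"},
}
pan, core = pan_and_core(toy)
print(len(pan), sorted(core))
```

    Adding a genome can only shrink the core set and grow the pan set, which is why a genus-wide core-genome of a few hundred genes can sit alongside a pan-genome of many thousands.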

  12. Techniques for Large-Scale Bacterial Genome Manipulation and Characterization of the Mutants with Respect to In Silico Metabolic Reconstructions.

    PubMed

    diCenzo, George C; Finan, Turlough M

    2018-01-01

    The rate at which all genes within a bacterial genome can be identified far exceeds the ability to characterize these genes. To assist in associating genes with cellular functions, a large-scale bacterial genome deletion approach can be employed to rapidly screen tens to thousands of genes for desired phenotypes. Here, we provide a detailed protocol for the generation of deletions of large segments of bacterial genomes that relies on the activity of a site-specific recombinase. In this procedure, two recombinase recognition target sequences are introduced into known positions of a bacterial genome through single cross-over plasmid integration. Subsequent expression of the site-specific recombinase mediates recombination between the two target sequences, resulting in the excision of the intervening region and its loss from the genome. We further illustrate how this deletion system can be readily adapted to function as a large-scale in vivo cloning procedure, in which the region excised from the genome is captured as a replicative plasmid. We next provide a procedure for the metabolic analysis of bacterial large-scale genome deletion mutants using the Biolog Phenotype MicroArray™ system. Finally, a pipeline is described, and a sample Matlab script is provided, for the integration of the obtained data with a draft metabolic reconstruction for the refinement of the reactions and gene-protein-reaction relationships in a metabolic reconstruction.

  13. Evolution of the Australian lungfish (Neoceratodus forsteri) genome: a major role for CR1 and L2 LINE elements.

    PubMed

    Metcalfe, Cushla J; Filée, Jonathan; Germon, Isabelle; Joss, Jean; Casane, Didier

    2012-11-01

    Haploid genomes greater than 25,000 Mb are rare; among animals, only the lungfish and some of the salamanders and crustaceans are known to have genomes this large. There is very little data on the structure of genomes this size. It is known, however, that for animal genomes up to 3,000 Mb, there is in general a good correlation between genome size and the percent of the genome composed of repetitive sequence and that this repetitive component is highly dynamic. In this study, we sampled the Australian lungfish genome using three mini-genomic libraries and found that with very little sequence, the results converged on an estimate of 40% of the genome being composed of recognizable transposable elements (TEs), chiefly from the CR1 and L2 long interspersed nuclear element clades. We further characterized the CR1 and L2 elements in the lungfish genome and show that although most CR1 elements probably represent recent amplifications, the L2 elements are more diverse and are more likely the result of a series of amplifications. We suggest that our sampling method has probably underestimated the recognizable TE content. However, on the basis of the most likely sources of error, we suggest that this very large genome is not largely composed of recently amplified, undetected TEs but may instead include a large component of older degenerate TEs. Based on these estimates, and on Thomson's (Thomson K. 1972. An attempt to reconstruct evolutionary changes in the cellular DNA content of lungfish. J Exp Zool. 180:363-372) inference that in the lineage leading to the extant Australian lungfish, there was a massive increase in genome size between 350 and 200 mya, after which the size of the genome changed little, we speculate that the very large Australian lungfish genome may be the result of a massive amplification of TEs followed by a long period with a very low rate of sequence removal and some ongoing TE activity.

  14. A draft physical map of a D-genome cotton species (Gossypium raimondii)

    PubMed Central

    2010-01-01

    Background Genetically anchored physical maps of large eukaryotic genomes have proven useful both for their intrinsic merit and as an adjunct to genome sequencing. Cultivated tetraploid cottons, Gossypium hirsutum and G. barbadense, share a common ancestor formed by a merger of the A and D genomes about 1-2 million years ago. Toward the long-term goal of characterizing the spectrum of diversity among cotton genomes, the worldwide cotton community has prioritized the D genome progenitor Gossypium raimondii for complete sequencing. Results A whole genome physical map of G. raimondii, the putative D genome ancestral species of tetraploid cottons was assembled, integrating genetically-anchored overgo hybridization probes, agarose based fingerprints and 'high information content fingerprinting' (HICF). A total of 13,662 BAC-end sequences and 2,828 DNA probes were used in genetically anchoring 1585 contigs to a cotton consensus genetic map, and 370 and 438 contigs, respectively to Arabidopsis thaliana (AT) and Vitis vinifera (VV) whole genome sequences. Conclusion Several lines of evidence suggest that the G. raimondii genome is comprised of two qualitatively different components. Much of the gene rich component is aligned to the Arabidopsis and Vitis vinifera genomes and shows promise for utilizing translational genomic approaches in understanding this important genome and its resident genes. The integrated genetic-physical map is of value both in assembling and validating a planned reference sequence. PMID:20569427

  15. CellLineNavigator: a workbench for cancer cell line analysis

    PubMed Central

    Krupp, Markus; Itzel, Timo; Maass, Thorsten; Hildebrandt, Andreas; Galle, Peter R.; Teufel, Andreas

    2013-01-01

    The CellLineNavigator database, freely available at http://www.medicalgenomics.org/celllinenavigator, is a web-based workbench for large scale comparisons of a large collection of diverse cell lines. It aims to support experimental design in the fields of genomics, systems biology and translational biomedical research. Currently, this compendium holds genome wide expression profiles of 317 different cancer cell lines, categorized into 57 different pathological states and 28 individual tissues. To enlarge the scope of CellLineNavigator, the database was furthermore closely linked to commonly used bioinformatics databases and knowledge repositories. To ensure easy data access and search ability, a simple data and an intuitive querying interface were implemented. It allows the user to explore and filter gene expression, focusing on pathological or physiological conditions. For a more complex search, the advanced query interface may be used to query for (i) differentially expressed genes; (ii) pathological or physiological conditions; or (iii) gene names or functional attributes, such as Kyoto Encyclopaedia of Genes and Genomes pathway maps. These queries may also be combined. Finally, CellLineNavigator allows additional advanced analysis of differentially regulated genes by a direct link to the Database for Annotation, Visualization and Integrated Discovery (DAVID) Bioinformatics Resources. PMID:23118487

  16. Genetic Comparison of B. Anthracis and its Close Relatives Using AFLP and PCR Analysis

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Jackson, P.J.; Hill, K.K.; Laker, M.T.

    1999-02-01

    Amplified Fragment Length Polymorphism (AFLP) analysis allows a rapid, relatively simple analysis of a large portion of a microbial genome, providing information about the species and its phylogenetic relationship to other microbes (Vos, et al., 1995). The method simply surveys the genome for length and sequence polymorphisms. The pattern identified can be used for comparison to the genomes of other species. Unlike other methods, it does not rely on analysis of a single genetic locus that may bias the interpretation of results and it does not require any prior knowledge of the targeted organism. Moreover, a standard set of reagents can be applied to any species without using species-specific information or molecular probes. The authors are using AFLPs to rapidly identify different bacterial species. A comparison of AFLP profiles generated from a large battery of B. anthracis strains shows very little variability among different isolates (Keim, et al., 1997). By contrast, there is a significant difference between AFLP profiles generated for any B. anthracis strain and even the most closely related Bacillus species. Sufficient variability is apparent among all known microbial species to allow phylogenetic analysis based on large numbers of genetically unlinked loci. These striking differences among AFLP profiles allow unambiguous identification of previously identified species and phylogenetic placement of newly characterized isolates relative to known species based on a large number of independent genetic loci. Data generated thus far show that the method provides phylogenetic analyses that are consistent with other widely accepted phylogenetic methods. However, AFLP analysis provides a more detailed analysis of the targets and samples a much larger portion of the genome. Consequently, it provides an inexpensive, rapid means of characterizing microbial isolates to further differentiate among strains and closely related microbial species. Such information cannot be rapidly generated by other means. AFLP sample analysis quickly generates a very large amount of molecular information about microbial genomes. However, this information cannot be analyzed rapidly using manual methods. The authors are developing a large archive of electronic AFLP signatures that is being used to identify isolates collected from medical, veterinary, forensic and environmental samples. They are also developing the computational packages necessary to rapidly and unambiguously analyze the AFLP profiles and conduct a phylogenetic comparison of these data relative to information already in the database. They will use this archive and the associated algorithms to determine the species identity of previously uncharacterized isolates and place them phylogenetically relative to other microbes based on their AFLP signatures. This study provides significant new information about microbes with environmental, veterinary and medical significance. This information can be used in further studies to understand the relationships among these species and the factors that distinguish them from one another. It should also allow identification of unique factors that contribute to important microbial traits including pathogenicity and virulence. They are also using AFLP data to identify, isolate and sequence DNA fragments that are unique to particular microbial species and strains. The fragment patterns and sequence information provide insights into the complexity and organization of bacterial genomes relative to one another. They also provide the information necessary for development of species-specific PCR primers that can be used to interrogate complex samples for the presence of B. anthracis, other microbial pathogens or their remnants.

  17. Comparative Genomics Analysis of Streptococcus Isolates from the Human Small Intestine Reveals their Adaptation to a Highly Dynamic Ecosystem

    PubMed Central

    Van den Bogert, Bartholomeus; Boekhorst, Jos; Herrmann, Ruth; Smid, Eddy J.; Zoetendal, Erwin G.; Kleerebezem, Michiel

    2013-01-01

    The human small-intestinal microbiota is characterised by relatively large and dynamic Streptococcus populations. In this study, genome sequences of small-intestinal streptococci from S. mitis, S. bovis, and S. salivarius species-groups were determined and compared with those from 58 Streptococcus strains in public databases. The Streptococcus pangenome consists of 12,403 orthologous groups of which 574 are shared among all sequenced streptococci and are defined as the Streptococcus core genome. Genome mining of the small-intestinal streptococci focused on functions playing an important role in the interaction of these streptococci in the small-intestinal ecosystem, including natural competence and nutrient-transport and metabolism. Analysis of the small-intestinal Streptococcus genomes predicts a high capacity to synthesize amino acids and various vitamins as well as substantial divergence in their carbohydrate transport and metabolic capacities, which is in agreement with observed physiological differences between these Streptococcus strains. Gene-specific PCR-strategies enabled evaluation of conservation of Streptococcus populations in intestinal samples from different human individuals, revealing that the S. salivarius strains were frequently detected in the small-intestine microbiota, supporting the representative value of the genomes provided in this study. Finally, the Streptococcus genomes allow prediction of the effect of dietary substances on Streptococcus population dynamics in the human small-intestine. PMID:24386196

  18. Comparison of single-molecule sequencing and hybrid approaches for finishing the genome of Clostridium autoethanogenum and analysis of CRISPR systems in industrial relevant Clostridia

    PubMed Central

    2014-01-01

    Background Clostridium autoethanogenum strain JA1-1 (DSM 10061) is an acetogen capable of fermenting CO, CO2 and H2 (e.g. from syngas or waste gases) into biofuel ethanol and commodity chemicals such as 2,3-butanediol. A draft genome sequence consisting of 100 contigs has been published. Results A closed, high-quality genome sequence for C. autoethanogenum DSM10061 was generated using only the latest single-molecule DNA sequencing technology and without the need for manual finishing. It is assigned to the most complex genome classification based upon genome features such as repeats, prophage, nine copies of the rRNA gene operons. It has a low G + C content of 31.1%. Illumina, 454, and Illumina/454 hybrid assemblies were generated and then compared to the draft and PacBio assemblies using summary statistics, CGAL, QUAST and REAPR bioinformatics tools and comparative genomic approaches. Assemblies based upon shorter read DNA technologies were confounded by the large number of repeats and their size, which in the case of the rRNA gene operons were ~5 kb. CRISPR (Clustered Regularly Interspaced Short Palindromic Repeats) systems among biotechnologically relevant Clostridia were classified and related to plasmid content and prophages. Potential associations between plasmid content and CRISPR systems may have implications for historical industrial scale Acetone-Butanol-Ethanol (ABE) fermentation failures and future large scale bacterial fermentations. While C. autoethanogenum contains an active CRISPR system, no such system is present in the closely related Clostridium ljungdahlii DSM 13528. A common prophage inserted into the Arg-tRNA shared between the strains suggests a common ancestor. However, C. ljungdahlii contains several additional putative prophages and it has more than double the amount of prophage DNA compared to C. autoethanogenum. Other differences include important metabolic genes for central metabolism (such as an additional hydrogenase and the absence of a phosphoenolpyruvate synthase) and substrate utilization pathways (mannose and aromatics utilization) that might explain phenotypic differences between C. autoethanogenum and C. ljungdahlii. Conclusions Single molecule sequencing will be increasingly used to produce finished microbial genomes. The complete genome will facilitate comparative genomics and functional genomics and support future comparisons between Clostridia and studies that examine the evolution of plasmids, bacteriophage and CRISPR systems. PMID:24655715

  19. Comparative analysis of the complete genome of an epidemic hospital sequence type 203 clone of vancomycin-resistant Enterococcus faecium

    PubMed Central

    2013-01-01

    Background In this report we have explored the genomic and microbiological basis for a sustained increase in bloodstream infections at a major Australian hospital caused by Enterococcus faecium multi-locus sequence type (ST) 203, an outbreak strain that has largely replaced a predecessor ST17 sequence type. Results To establish a ST203 reference sequence we fully assembled and annotated the genome of Aus0085, a 2009 vancomycin-resistant Enterococcus faecium (VREfm) bloodstream isolate, and the first example of a completed ST203 genome. Aus0085 has a 3.2 Mb genome, comprising a 2.9 Mb circular chromosome and six circular plasmids (2 kb–130 kb). Twelve percent of the 3222 coding sequences (CDS) in Aus0085 are not present in ST17 E. faecium Aus0004 and ST18 E. faecium TX16. Extending this comparison to an additional 12 ST17 and 14 ST203 E. faecium hospital isolate genomes revealed only six genomic regions spanning 41 kb that were present in all ST203 and absent from all ST17 genomes. The 40 CDS have predicted functions that include ion transport, riboflavin metabolism and two phosphotransferase systems. Comparison of the vancomycin resistance-conferring Tn1549 transposon between Aus0004 and Aus0085 revealed differences in transposon length and insertion site, and van locus sequence variation that correlated with a higher vancomycin MIC in Aus0085. Additional phenotype comparisons between ST17 and ST203 isolates showed that while there were no differences in biofilm-formation and killing of Galleria mellonella, ST203 isolates grew significantly faster and out-competed ST17 isolates in growth assays. Conclusions Here we have fully assembled and annotated the first ST203 genome, and then characterized the genomic differences between ST17 and ST203 E. faecium. We also show that ST203 E. faecium are faster growing and can out-compete ST17 E. faecium. 
While a causal genetic basis for these phenotype differences is not provided here, this study revealed conserved genetic differences between the two clones, differences that can now be tested to explain the molecular basis for the success and emergence of ST203 E. faecium. PMID:24004955

  20. Understanding the differences between genome sequences of Escherichia coli B strains REL606 and BL21(DE3), and comparison of the closely related E. coli B and K-12 genomes

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Studier, F.W.; Daegelen, P.; Lenski, R. E.

    2009-12-01

    Each difference between the genome sequences of Escherichia coli B strains REL606 and BL21(DE3) can be interpreted in light of known laboratory manipulations plus a gene conversion between ribosomal RNA operons. Two treatments with 1-methyl-3-nitro-1-nitrosoguanidine in the REL606 lineage produced at least 93 single-base-pair mutations (~90% GC-to-AT transitions) and 3 single-base-pair GC deletions. Two UV treatments in the BL21(DE3) lineage produced only 4 single-base-pair mutations but 16 large deletions. P1 transductions from K-12 into the two B lineages produced 317 single-base-pair differences and 9 insertions or deletions, reflecting differences between B DNA in BL21(DE3) and integrated restriction fragments of K-12 DNA inherited by REL606. Two sites showed selective enrichment of spontaneous mutations. No unselected spontaneous single-base-pair mutations were evident. The genome sequences revealed that a progenitor of REL606 had been misidentified, explaining initially perplexing differences. Limited sequencing of other B strains defined characteristic properties of B and allowed assembly of the inferred genome of the ancestral B of Delbrueck and Luria. Comparison of the B and K-12 genomes shows that more than half of the 3793 proteins of their basic genomes are predicted to be identical, although ~310 appear to be functional in either B or K-12 but not in both. The ancestral basic genome appears to have had ~4039 coding sequences occupying ~4.0 Mbp. Repeated horizontal transfer from diverged Escherichia coli genomes and homologous recombination may explain the observed variable distribution of single-base-pair differences. Fifteen sites are occupied by phage-related elements, but only six by comparable elements at the same site. More than 50 sites are occupied by IS elements in both B and K, 16 in common, and likely founding IS elements are identified. A signature of widespread cryptic phage P4-type mobile elements was identified. Complex deletions (dense clusters of small deletions and substitutions) apparently removed nonessential genes from ~30 sites in the basic genomes.

  1. Genome Analysis of Staphylococcus agnetis, an Agent of Lameness in Broiler Chickens

    PubMed Central

    Ojha, Sohita; Pummill, Jeff F.; Koon, Joseph A.; Wideman, Robert F.; Rhoads, Douglas D.

    2015-01-01

    Lameness in broiler chickens is a significant animal welfare and financial issue. Lameness can be enhanced by rearing young broilers on wire flooring. We have identified Staphylococcus agnetis as significantly involved in bacterial chondronecrosis with osteomyelitis (BCO) in proximal tibia and femorae, leading to lameness in broiler chickens in the wire floor system. Administration of S. agnetis in water induces lameness. Previously reported in some cases of cattle mastitis, this is the first report of this poorly described pathogen in chickens. We used long and short read next generation sequencing to assemble single finished contigs for the genome and a large plasmid from the chicken pathogen. Comparison of the S. agnetis genome to those of other pathogenic Staphylococci shows that S. agnetis contains a distinct repertoire of virulence determinants. Additionally, the S. agnetis genome has several regions that differ substantially from the genomes of other pathogenic Staphylococci. Comparison of our finished genome to a recent draft genome for a cattle mastitis isolate suggests that future investigations focus on the evolutionary epidemiology of this emerging pathogen of domestic animals. PMID:26606420

  2. Automated array-based genomic profiling in chronic lymphocytic leukemia: Development of a clinical tool and discovery of recurrent genomic alterations

    PubMed Central

    Schwaenen, Carsten; Nessling, Michelle; Wessendorf, Swen; Salvi, Tatjana; Wrobel, Gunnar; Radlwimmer, Bernhard; Kestler, Hans A.; Haslinger, Christian; Stilgenbauer, Stephan; Döhner, Hartmut; Bentz, Martin; Lichter, Peter

    2004-01-01

    B cell chronic lymphocytic leukemia (B-CLL) is characterized by a highly variable clinical course. Recurrent chromosomal imbalances provide significant prognostic markers. Risk-adapted therapy based on genomic alterations has become an option that is currently being tested in clinical trials. To supply a robust tool for such large scale studies, we developed a comprehensive DNA microarray dedicated to the automated analysis of recurrent genomic imbalances in B-CLL by array-based comparative genomic hybridization (matrix–CGH). Validation of this chip in a series of 106 B-CLL cases revealed a high specificity and sensitivity that fulfils the criteria for application in clinical oncology. This chip is immediately applicable within clinical B-CLL treatment trials that evaluate whether B-CLL cases with distinct chromosomal abnormalities should be treated with chemotherapy of different intensities and/or stem cell transplantation. Through the control set of DNA fragments equally distributed over the genome, recurrent genomic imbalances were discovered: trisomy of chromosome 19 and gain of the MYCN oncogene correlating with an elevation of MYCN mRNA expression. PMID:14730057

  3. Assembling large genomes: analysis of the stick insect (Clitarchus hookeri) genome reveals a high repeat content and sex-biased genes associated with reproduction.

    PubMed

    Wu, Chen; Twort, Victoria G; Crowhurst, Ross N; Newcomb, Richard D; Buckley, Thomas R

    2017-11-16

    Stick insects (Phasmatodea) have a high incidence of parthenogenesis and other alternative reproductive strategies, yet the genetic basis of reproduction is poorly understood. Phasmatodea includes nearly 3000 species, yet only the genome of Timema cristinae has been published to date. Clitarchus hookeri is a geographical parthenogenetic stick insect distributed across New Zealand. Sexual reproduction dominates in northern habitats but is replaced by parthenogenesis in the south. Here, we present a de novo genome assembly of a female C. hookeri and use it to detect candidate genes associated with gamete production and development in females and males. We also explore the factors underlying large genome size in stick insects. The C. hookeri genome assembly was 4.2 Gb, similar to the flow cytometry estimate, making it the second largest insect genome sequenced and assembled to date. Like the large genome of Locusta migratoria, the genome of C. hookeri is also highly repetitive and the predicted gene models are much longer than those from most other sequenced insect genomes, largely due to longer introns. Miniature inverted repeat transposable elements (MITEs), absent in the much smaller T. cristinae genome, are the most abundant repeat type in the C. hookeri genome assembly. Mapping RNA-Seq reads from female and male gonadal transcriptomes onto the genome assembly resulted in the identification of 39,940 gene loci, 15.8% and 37.6% of which showed female-biased and male-biased expression, respectively. The genes that were over-expressed in females were mostly associated with molecular transportation, developmental process, oocyte growth and reproductive process; whereas, the male-biased genes were enriched in rhythmic process, molecular transducer activity and synapse. Several genes involved in the juvenile hormone synthesis pathway were also identified. The evolution of large insect genomes such as the L. migratoria and C. hookeri genomes is most likely due to the accumulation of repetitive regions and intron elongation. MITEs contributed significantly to the growth of C. hookeri genome size yet are surprisingly absent from the T. cristinae genome. Sex-biased genes identified from gonadal tissues, including genes involved in juvenile hormone synthesis, provide interesting candidates for the further study of flexible reproduction in stick insects.

  4. Performance and Scalability of Discriminative Metrics for Comparative Gene Identification in 12 Drosophila Genomes

    PubMed Central

    Lin, Michael F.; Deoras, Ameya N.; Rasmussen, Matthew D.; Kellis, Manolis

    2008-01-01

    Comparative genomics of multiple related species is a powerful methodology for the discovery of functional genomic elements, and its power should increase with the number of species compared. Here, we use 12 Drosophila genomes to study the power of comparative genomics metrics to distinguish between protein-coding and non-coding regions. First, we study the relative power of different comparative metrics and their relationship to single-species metrics. We find that even relatively simple multi-species metrics robustly outperform advanced single-species metrics, especially for shorter exons (≤240 nt), which are common in animal genomes. Moreover, the two capture largely independent features of protein-coding genes, with different sensitivity/specificity trade-offs, such that their combinations lead to even greater discriminatory power. In addition, we study how discovery power scales with the number and phylogenetic distance of the genomes compared. We find that species at a broad range of distances are comparably effective informants for pairwise comparative gene identification, but that these are surpassed by multi-species comparisons at similar evolutionary divergence. In particular, while pairwise discovery power plateaued at larger distances and never outperformed the most advanced single-species metrics, multi-species comparisons continued to benefit even from the most distant species with no apparent saturation. Last, we find that genes in functional categories typically considered fast-evolving can nonetheless be recovered at very high rates using comparative methods. Our results have implications for comparative genomics analyses in any species, including the human. PMID:18421375

  5. An ensemble model of competitive multi-factor binding of the genome

    PubMed Central

    Wasson, Todd; Hartemink, Alexander J.

    2009-01-01

    Hundreds of different factors adorn the eukaryotic genome, binding to it in large number. These DNA binding factors (DBFs) include nucleosomes, transcription factors (TFs), and other proteins and protein complexes, such as the origin recognition complex (ORC). DBFs compete with one another for binding along the genome, yet many current models of genome binding do not consider different types of DBFs together simultaneously. Additionally, binding is a stochastic process that results in a continuum of binding probabilities at any position along the genome, but many current models tend to consider positions as being either binding sites or not. Here, we present a model that allows a multitude of DBFs, each at different concentrations, to compete with one another for binding sites along the genome. The result is an “occupancy profile,” a probabilistic description of the DNA occupancy of each factor at each position. We implement our model efficiently as the software package COMPETE. We demonstrate genome-wide and at specific loci how modeling nucleosome binding alters TF binding, and vice versa, and illustrate how factor concentration influences binding occupancy. Binding cooperativity between nearby TFs arises implicitly via mutual competition with nucleosomes. Our method applies not only to TFs, but also recapitulates known occupancy profiles of a well-studied replication origin with and without ORC binding. Importantly, the sequence preferences our model takes as input are derived from in vitro experiments. This ensures that the calculated occupancy profiles are the result of the forces of competition represented explicitly in our model and the inherent sequence affinities of the constituent DBFs. PMID:19720867

  6. Defining the diverse spectrum of inversions, complex structural variation, and chromothripsis in the morbid human genome.

    PubMed

    Collins, Ryan L; Brand, Harrison; Redin, Claire E; Hanscom, Carrie; Antolik, Caroline; Stone, Matthew R; Glessner, Joseph T; Mason, Tamara; Pregno, Giulia; Dorrani, Naghmeh; Mandrile, Giorgia; Giachino, Daniela; Perrin, Danielle; Walsh, Cole; Cipicchio, Michelle; Costello, Maura; Stortchevoi, Alexei; An, Joon-Yong; Currall, Benjamin B; Seabra, Catarina M; Ragavendran, Ashok; Margolin, Lauren; Martinez-Agosto, Julian A; Lucente, Diane; Levy, Brynn; Sanders, Stephan J; Wapner, Ronald J; Quintero-Rivera, Fabiola; Kloosterman, Wigard; Talkowski, Michael E

    2017-03-06

    Structural variation (SV) influences genome organization and contributes to human disease. However, the complete mutational spectrum of SV has not been routinely captured in disease association studies. We sequenced 689 participants with autism spectrum disorder (ASD) and other developmental abnormalities to construct a genome-wide map of large SV. Using long-insert jumping libraries at 105X mean physical coverage and linked-read whole-genome sequencing from 10X Genomics, we document seven major SV classes at ~5 kb SV resolution. Our results encompass 11,735 distinct large SV sites, 38.1% of which are novel and 16.8% of which are balanced or complex. We characterize 16 recurrent subclasses of complex SV (cxSV), revealing that: (1) cxSV are larger and rarer than canonical SV; (2) each genome harbors 14 large cxSV on average; (3) 84.4% of large cxSVs involve inversion; and (4) most large cxSV (93.8%) have not been delineated in previous studies. Rare SVs are more likely to disrupt coding and regulatory non-coding loci, particularly when truncating constrained and disease-associated genes. We also identify multiple cases of catastrophic chromosomal rearrangements known as chromoanagenesis, including somatic chromoanasynthesis, and extreme balanced germline chromothripsis events involving up to 65 breakpoints and 60.6 Mb across four chromosomes, further defining rare categories of extreme cxSV. These data provide a foundational map of large SV in the morbid human genome and demonstrate a previously underappreciated abundance and diversity of cxSV that should be considered in genomic studies of human disease.

  7. Research Guidelines in the Era of Large-scale Collaborations: An Analysis of Genome-wide Association Study Consortia

    PubMed Central

    Austin, Melissa A.; Hair, Marilyn S.; Fullerton, Stephanie M.

    2012-01-01

    Scientific research has shifted from studies conducted by single investigators to the creation of large consortia. Genetic epidemiologists, for example, now collaborate extensively for genome-wide association studies (GWAS). The effect has been a stream of confirmed disease-gene associations. However, effects on human subjects oversight, data-sharing, publication and authorship practices, research organization and productivity, and intellectual property remain to be examined. The aim of this analysis was to identify all research consortia that had published the results of a GWAS analysis since 2005, characterize them, determine which have publicly accessible guidelines for research practices, and summarize the policies in these guidelines. A review of the National Human Genome Research Institute’s Catalog of Published Genome-Wide Association Studies identified 55 GWAS consortia as of April 1, 2011. These consortia were comprised of individual investigators, research centers, studies, or other consortia and studied 48 different diseases or traits. Only 14 (25%) were found to have publicly accessible research guidelines on consortia websites. The available guidelines provide information on organization, governance, and research protocols; half address institutional review board approval. Details of publication, authorship, data-sharing, and intellectual property vary considerably. Wider access to consortia guidelines is needed to establish appropriate research standards with broad applicability to emerging forms of large-scale collaboration. PMID:22491085

  8. Analyzing large scale genomic data on the cloud with Sparkhit

    PubMed Central

    Huang, Liren; Krüger, Jan

    2018-01-01

    Abstract Motivation The increasing amount of next-generation sequencing data poses a fundamental challenge on large scale genomic analytics. Existing tools use different distributed computational platforms to scale-out bioinformatics workloads. However, the scalability of these tools is not efficient. Moreover, they have heavy run time overheads when pre-processing large amounts of data. To address these limitations, we have developed Sparkhit: a distributed bioinformatics framework built on top of the Apache Spark platform. Results Sparkhit integrates a variety of analytical methods. It is implemented in the Spark extended MapReduce model. It runs 92–157 times faster than MetaSpark on metagenomic fragment recruitment and 18–32 times faster than Crossbow on data pre-processing. We analyzed 100 terabytes of data across four genomic projects in the cloud in 21 h, which includes the run times of cluster deployment and data downloading. Furthermore, our application on the entire Human Microbiome Project shotgun sequencing data was completed in 2 h, presenting an approach to easily associate large amounts of public datasets with reference data. Availability and implementation Sparkhit is freely available at: https://rhinempi.github.io/sparkhit/. Contact asczyrba@cebitec.uni-bielefeld.de Supplementary information Supplementary data are available at Bioinformatics online. PMID:29253074

  9. Unprecedented large inverted repeats at the replication terminus of circular bacterial chromosomes suggest a novel mode of chromosome rescue

    PubMed Central

    El Kafsi, Hela; Loux, Valentin; Mariadassou, Mahendra; Blin, Camille; Chiapello, Hélène; Abraham, Anne-Laure; Maguin, Emmanuelle; van de Guchte, Maarten

    2017-01-01

    The first Lactobacillus delbrueckii ssp. bulgaricus genome sequence revealed the presence of a very large inverted repeat (IR), a DNA sequence arrangement which thus far seemed inconceivable in a non-manipulated circular bacterial chromosome, at the replication terminus. This intriguing observation prompted us to investigate if similar IRs could be found in other bacteria. IRs with sizes varying from 38 to 76 kbp were found at the replication terminus of all 5 L. delbrueckii ssp. bulgaricus chromosomes analysed, but in none of 1373 other chromosomes. They represent the first naturally occurring very large IRs detected in circular bacterial genomes. A comparison of the L. bulgaricus replication terminus regions and the corresponding regions without IR in 5 L. delbrueckii ssp. lactis genomes leads us to propose a model for the formation and evolution of the IRs. The DNA sequence data are consistent with a novel model of chromosome rescue after premature replication termination or irreversible chromosome damage near the replication terminus, involving mechanisms analogous to those proposed in the formation of very large IRs in human cancer cells. We postulate that the L. delbrueckii ssp. bulgaricus-specific IRs in different strains derive from a single ancestral IR of at least 93 kbp. PMID:28281695

  10. Home - The Cancer Genome Atlas - Cancer Genome - TCGA

    Cancer.gov

    The Cancer Genome Atlas (TCGA) is a comprehensive and coordinated effort to accelerate our understanding of the molecular basis of cancer through the application of genome analysis technologies, including large-scale genome sequencing.

  11. Genome-wide map of Apn1 binding sites under oxidative stress in Saccharomyces cerevisiae.

    PubMed

    Morris, Lydia P; Conley, Andrew B; Degtyareva, Natalya; Jordan, I King; Doetsch, Paul W

    2017-11-01

    The DNA in cells is continuously exposed to reactive oxygen species resulting in toxic and mutagenic DNA damage. Although the repair of oxidative DNA damage occurs primarily through the base excision repair (BER) pathway, the nucleotide excision repair (NER) pathway processes some of the same lesions. In addition, damage tolerance mechanisms, such as recombination and translesion synthesis, enable cells to tolerate oxidative DNA damage, especially when BER and NER capacities are exceeded. Thus, disruption of BER alone or disruption of BER and NER in Saccharomyces cerevisiae leads to increased mutations as well as large-scale genomic rearrangements. Previous studies demonstrated that a particular region of chromosome II is susceptible to chronic oxidative stress-induced chromosomal rearrangements, suggesting the existence of DNA damage and/or DNA repair hotspots. Here we investigated the relationship between oxidative damage and genomic instability utilizing chromatin immunoprecipitation combined with DNA microarray technology to profile DNA repair sites along yeast chromosomes under different oxidative stress conditions. We targeted the major yeast AP endonuclease Apn1 as a representative BER protein. Our results indicate that Apn1 target sequences are enriched for cytosine and guanine nucleotides. We predict that BER protects these sites in the genome because guanines and cytosines are thought to be especially susceptible to oxidative attack, thereby preventing large-scale genome destabilization from chronic accumulation of DNA damage. Information from our studies should provide insight into how regional deployment of oxidative DNA damage management systems along chromosomes protects against large-scale rearrangements. Copyright © 2017 John Wiley & Sons, Ltd.

  12. GDC 2: Compression of large collections of genomes

    PubMed Central

    Deorowicz, Sebastian; Danek, Agnieszka; Niemiec, Marcin

    2015-01-01

    Falling prices for high-throughput genome sequencing are changing the landscape of modern genomics. A number of large-scale projects aimed at sequencing many human genomes are in progress. Genome sequencing is also becoming an important aid in personalized medicine. One significant side effect of this change is the need to store and transfer huge amounts of genomic data. In this paper we deal with the problem of compressing large collections of complete genomic sequences. We propose an algorithm that is able to compress the collection of 1092 human diploid genomes about 9,500 times. This result is about 4 times better than what is offered by the other existing compressors. Moreover, our algorithm is very fast, as it processes the data at a speed of 200 MB/s on a modern workstation. As a consequence, the proposed algorithm allows storing complete genomic collections at low cost, e.g., the examined collection of 1092 human genomes needs only about 700 MB when compressed, which can be compared to about 6.7 TB of uncompressed FASTA files. The source code is available at http://sun.aei.polsl.pl/REFRESH/index.php?page=projects&project=gdc&subpage=about. PMID:26108279

  13. GDC 2: Compression of large collections of genomes.

    PubMed

    Deorowicz, Sebastian; Danek, Agnieszka; Niemiec, Marcin

    2015-06-25

    Falling prices for high-throughput genome sequencing are changing the landscape of modern genomics. A number of large-scale projects aimed at sequencing many human genomes are in progress. Genome sequencing is also becoming an important aid in personalized medicine. One significant side effect of this change is the need to store and transfer huge amounts of genomic data. In this paper we deal with the problem of compressing large collections of complete genomic sequences. We propose an algorithm that is able to compress the collection of 1092 human diploid genomes about 9,500 times. This result is about 4 times better than what is offered by the other existing compressors. Moreover, our algorithm is very fast, as it processes the data at a speed of 200 MB/s on a modern workstation. As a consequence, the proposed algorithm allows storing complete genomic collections at low cost, e.g., the examined collection of 1092 human genomes needs only about 700 MB when compressed, which can be compared to about 6.7 TB of uncompressed FASTA files. The source code is available at http://sun.aei.polsl.pl/REFRESH/index.php?page=projects&project=gdc&subpage=about.

  14. Radiation-induced genomic instability: radiation quality and dose response

    NASA Technical Reports Server (NTRS)

    Smith, Leslie E.; Nagar, Shruti; Kim, Grace J.; Morgan, William F.

    2003-01-01

    Genomic instability is a term used to describe a phenomenon that results in the accumulation of multiple changes required to convert a stable genome of a normal cell to an unstable genome characteristic of a tumor. There has been considerable recent debate concerning the importance of genomic instability in human cancer and its temporal occurrence in the carcinogenic process. Radiation is capable of inducing genomic instability in mammalian cells and instability is thought to be the driving force responsible for radiation carcinogenesis. Genomic instability is characterized by a large collection of diverse endpoints that include large-scale chromosomal rearrangements and aberrations, amplification of genetic material, aneuploidy, micronucleus formation, microsatellite instability, and gene mutation. The capacity of radiation to induce genomic instability depends to a large extent on radiation quality or linear energy transfer (LET) and dose. There appears to be a low dose threshold effect with low LET, beyond which no additional genomic instability is induced. Low doses of both high and low LET radiation are capable of inducing this phenomenon. This report reviews data concerning dose rate effects of high and low LET radiation and their capacity to induce genomic instability assayed by chromosomal aberrations, delayed lethal mutations, micronuclei and apoptosis.

  15. Patterns of genome size variation in snapping shrimp.

    PubMed

    Jeffery, Nicholas W; Hultgren, Kristin; Chak, Solomon Tin Chi; Gregory, T Ryan; Rubenstein, Dustin R

    2016-06-01

    Although crustaceans vary extensively in genome size, little is known about how genome size may affect the ecology and evolution of species in this diverse group, in part due to the lack of large genome size datasets. Here we investigate interspecific, intraspecific, and intracolony variation in genome size in 39 species of Synalpheus shrimps, representing one of the largest genome size datasets for a single genus within crustaceans. We find that genome size ranges approximately 4-fold across Synalpheus with little phylogenetic signal, and is not related to body size. In a subset of these species, genome size is related to chromosome size, but not to chromosome number, suggesting that despite large genomes, these species are not polyploid. Interestingly, there appears to be 35% intraspecific genome size variation in Synalpheus idios among geographic regions, and up to 30% variation in Synalpheus duffyi genome size within the same colony.

  16. Comparative Genome Analysis Provides Insights into Both the Lifestyle of Acidithiobacillus ferrivorans Strain CF27 and the Chimeric Nature of the Iron-Oxidizing Acidithiobacilli Genomes.

    PubMed

    Tran, Tam T T; Mangenot, Sophie; Magdelenat, Ghislaine; Payen, Emilie; Rouy, Zoé; Belahbib, Hassiba; Grail, Barry M; Johnson, D Barrie; Bonnefoy, Violaine; Talla, Emmanuel

    2017-01-01

    The iron-oxidizing species Acidithiobacillus ferrivorans is one of few acidophiles able to oxidize ferrous iron and reduced inorganic sulfur compounds at low temperatures (<10°C). To complete the genome of At. ferrivorans strain CF27, new sequences were generated, and an updated assembly and functional annotation were undertaken, followed by a comparative analysis with other Acidithiobacillus species whose genomes are publicly available. The At. ferrivorans CF27 genome comprises a 3,409,655 bp chromosome and a 46,453 bp plasmid. At. ferrivorans CF27 possesses genes allowing its adaptation to cold, metal(loid)-rich environments, as well as others that enable it to sense environmental changes, allowing At. ferrivorans CF27 to escape hostile conditions and to move toward favorable locations. Interestingly, the genome of At. ferrivorans CF27 exhibits a large number of genomic islands (mostly containing genes of unknown function), suggesting that a large number of genes has been acquired by horizontal gene transfer over time. Furthermore, several genes specific to At. ferrivorans CF27 have been identified that could be responsible for the phenotypic differences of this strain compared to other Acidithiobacillus species. Most genes located inside At. ferrivorans CF27-specific gene clusters which have been analyzed were expressed by both ferrous iron-grown and sulfur-attached cells, indicating that they are not pseudogenes and may play a role in both situations. Analysis of the taxonomic composition of genomes of the Acidithiobacillia infers that they are chimeric in nature, supporting the premise that they belong to a particular taxonomic class, distinct from other proteobacterial subgroups.

  17. Analysis of the giant genomes of Fritillaria (Liliaceae) indicates that a lack of DNA removal characterizes extreme expansions in genome size.

    PubMed

    Kelly, Laura J; Renny-Byfield, Simon; Pellicer, Jaume; Macas, Jiří; Novák, Petr; Neumann, Pavel; Lysak, Martin A; Day, Peter D; Berger, Madeleine; Fay, Michael F; Nichols, Richard A; Leitch, Andrew R; Leitch, Ilia J

    2015-10-01

    Plants exhibit an extraordinary range of genome sizes, varying by > 2000-fold between the smallest and largest recorded values. In the absence of polyploidy, changes in the amount of repetitive DNA (transposable elements and tandem repeats) are primarily responsible for genome size differences between species. However, there is ongoing debate regarding the relative importance of amplification of repetitive DNA versus its deletion in governing genome size. Using data from 454 sequencing, we analysed the most repetitive fraction of some of the largest known genomes for diploid plant species, from members of Fritillaria. We revealed that genomic expansion has not resulted from the recent massive amplification of just a handful of repeat families, as shown in species with smaller genomes. Instead, the bulk of these immense genomes is composed of highly heterogeneous, relatively low-abundance repeat-derived DNA, supporting a scenario where amplified repeats continually accumulate due to infrequent DNA removal. Our results indicate that a lack of deletion and low turnover of repetitive DNA are major contributors to the evolution of extremely large genomes and show that their size cannot simply be accounted for by the activity of a small number of high-abundance repeat families. © 2015 The Authors. New Phytologist © 2015 New Phytologist Trust.

  18. Asymmetric histone modifications between the original and derived loci of human segmental duplications

    PubMed Central

    Zheng, Deyou

    2008-01-01

    Background Sequencing and annotation of several mammalian genomes have revealed that segmental duplications are a common architectural feature of primate genomes; in fact, about 5% of the human genome is composed of large blocks of interspersed segmental duplications. These segmental duplications have been implicated in genomic copy-number variation, gene novelty, and various genomic disorders. However, the molecular processes involved in the evolution and regulation of duplicated sequences remain largely unexplored. Results In this study, the profile of about 20 histone modifications within human segmental duplications was characterized using high-resolution, genome-wide data derived from a ChIP-Seq study. The analysis demonstrates that derivative loci of segmental duplications often differ significantly from the original with respect to many histone methylations. Further investigation showed that genes are present three times more frequently in the original than in the derivative, whereas pseudogenes exhibit the opposite trend. These asymmetries tend to increase with the age of segmental duplications. The uneven distribution of genes and pseudogenes does not, however, fully account for the asymmetry in the profile of histone modifications. Conclusion The first systematic analysis of histone modifications between segmental duplications demonstrates that two seemingly 'identical' genomic copies are distinct in their epigenomic properties. Results here suggest that local chromatin environments may be implicated in the discrimination of derived copies of segmental duplications from their originals, leading to a biased pseudogenization of the new duplicates. The data also indicate that further exploration of the interactions between histone modification and sequence degeneration is necessary in order to understand the divergence of duplicated sequences. PMID:18598352

  19. Genome-Wide Search Identifies 1.9 Mb from the Polar Bear Y Chromosome for Evolutionary Analyses

    PubMed Central

    Bidon, Tobias; Schreck, Nancy; Hailer, Frank; Nilsson, Maria A.; Janke, Axel

    2015-01-01

    The male-inherited Y chromosome is the major haploid fraction of the mammalian genome, rendering Y-linked sequences an indispensable resource for evolutionary research. However, despite recent large-scale genome sequencing approaches, only a handful of Y chromosome sequences have been characterized to date, mainly in model organisms. Using polar bear (Ursus maritimus) genomes, we compare two different in silico approaches to identify Y-linked sequences: 1) Similarity to known Y-linked genes and 2) difference in the average read depth of autosomal versus sex chromosomal scaffolds. Specifically, we mapped available genomic sequencing short reads from a male and a female polar bear against the reference genome and identify 112 Y-chromosomal scaffolds with a combined length of 1.9 Mb. We verified the in silico findings for the longer polar bear scaffolds by male-specific in vitro amplification, demonstrating the reliability of the average read depth approach. The obtained Y chromosome sequences contain protein-coding sequences, single nucleotide polymorphisms, microsatellites, and transposable elements that are useful for evolutionary studies. A high-resolution phylogeny of the polar bear patriline shows two highly divergent Y chromosome lineages, obtained from analysis of the identified Y scaffolds in 12 previously published male polar bear genomes. Moreover, we find evidence of gene conversion among ZFX and ZFY sequences in the giant panda lineage and in the ancestor of ursine and tremarctine bears. Thus, the identification of Y-linked scaffold sequences from unordered genome sequences yields valuable data to infer phylogenomic and population-genomic patterns in bears. PMID:26019166

  20. The Variable Regions of Lactobacillus rhamnosus Genomes Reveal the Dynamic Evolution of Metabolic and Host-Adaptation Repertoires

    PubMed Central

    Ceapa, Corina; Davids, Mark; Ritari, Jarmo; Lambert, Jolanda; Wels, Michiel; Douillard, François P.; Smokvina, Tamara; de Vos, Willem M.; Knol, Jan; Kleerebezem, Michiel

    2016-01-01

    Lactobacillus rhamnosus is a diverse Gram-positive species with strains isolated from different ecological niches. Here, we report the genome sequence analysis of 40 diverse strains of L. rhamnosus and their genomic comparison, with a focus on the variable genome. Genomic comparison of 40 L. rhamnosus strains discriminated the conserved genes (core genome) and regions of plasticity involving frequent rearrangements and horizontal transfer (variome). The L. rhamnosus core genome encompasses 2,164 genes, out of 4,711 genes in total (the pan-genome). The accessory genome is dominated by genes encoding carbohydrate transport and metabolism, extracellular polysaccharides (EPS) biosynthesis, bacteriocin production, pili production, the cas system, and the associated clustered regularly interspaced short palindromic repeat (CRISPR) loci, and more than 100 transporter functions and mobile genetic elements like phages, plasmid genes, and transposons. A clade distribution based on amino acid differences between core (shared) proteins matched with the clade distribution obtained from the presence–absence of variable genes. The phylogenetic and variome tree overlap indicated that frequent events of gene acquisition and loss dominated the evolutionary segregation of the strains within this species, which is paralleled by evolutionary diversification of core gene functions. The CRISPR-Cas system could have contributed to this evolutionary segregation. Lactobacillus rhamnosus strains contain the genetic and metabolic machinery with strain-specific gene functions required to adapt to a large range of environments. A remarkable congruency of the evolutionary relatedness of the strains’ core and variome functions, possibly favoring interspecies genetic exchanges, underlines the importance of gene-acquisition and loss within the L. rhamnosus strain diversification. PMID:27358423

  1. WheatGenome.info: an integrated database and portal for wheat genome information.

    PubMed

    Lai, Kaitao; Berkman, Paul J; Lorenc, Michal Tadeusz; Duran, Chris; Smits, Lars; Manoli, Sahana; Stiller, Jiri; Edwards, David

    2012-02-01

    Bread wheat (Triticum aestivum) is one of the most important crop plants, globally providing staple food for a large proportion of the human population. However, improvement of this crop has been limited due to its large and complex genome. Advances in genomics are supporting wheat crop improvement. We provide a variety of web-based systems hosting wheat genome and genomic data to support wheat research and crop improvement. WheatGenome.info is an integrated database resource which includes multiple web-based applications. These include a GBrowse2-based wheat genome viewer with BLAST search portal, TAGdb for searching wheat second-generation genome sequence data, wheat autoSNPdb, links to wheat genetic maps using CMap and CMap3D, and a wheat genome Wiki to allow interaction between diverse wheat genome sequencing activities. This system includes links to a variety of wheat genome resources hosted at other research organizations. This integrated database aims to accelerate wheat genome research and is freely accessible via the web interface at http://www.wheatgenome.info/.

  2. Bridging the gap between marker-assisted and genomic selection of heading time and plant height in hybrid wheat.

    PubMed

    Zhao, Y; Mette, M F; Gowda, M; Longin, C F H; Reif, J C

    2014-06-01

    Based on data from field trials with a large collection of 135 elite winter wheat inbred lines and 1604 F1 hybrids derived from them, we compared the accuracy of prediction of marker-assisted selection and current genomic selection approaches for the model traits heading time and plant height in a cross-validation approach. For heading time, the high accuracy seen with marker-assisted selection severely dropped with genomic selection approaches RR-BLUP (ridge regression best linear unbiased prediction) and BayesCπ, whereas for plant height, accuracy was low with marker-assisted selection as well as RR-BLUP and BayesCπ. Differences in the linkage disequilibrium structure of the functional and single-nucleotide polymorphism markers relevant for the two traits were identified in a simulation study as a likely explanation for the different trends in accuracies of prediction. A new genomic selection approach, weighted best linear unbiased prediction (W-BLUP), designed to treat the effects of known functional markers more appropriately, proved to increase the accuracy of prediction for both traits and thus closes the gap between marker-assisted and genomic selection.

  3. Bridging the gap between marker-assisted and genomic selection of heading time and plant height in hybrid wheat

    PubMed Central

    Zhao, Y; Mette, M F; Gowda, M; Longin, C F H; Reif, J C

    2014-01-01

    Based on data from field trials with a large collection of 135 elite winter wheat inbred lines and 1604 F1 hybrids derived from them, we compared the accuracy of prediction of marker-assisted selection and current genomic selection approaches for the model traits heading time and plant height in a cross-validation approach. For heading time, the high accuracy seen with marker-assisted selection severely dropped with genomic selection approaches RR-BLUP (ridge regression best linear unbiased prediction) and BayesCπ, whereas for plant height, accuracy was low with marker-assisted selection as well as RR-BLUP and BayesCπ. Differences in the linkage disequilibrium structure of the functional and single-nucleotide polymorphism markers relevant for the two traits were identified in a simulation study as a likely explanation for the different trends in accuracies of prediction. A new genomic selection approach, weighted best linear unbiased prediction (W-BLUP), designed to treat the effects of known functional markers more appropriately, proved to increase the accuracy of prediction for both traits and thus closes the gap between marker-assisted and genomic selection. PMID:24518889

  4. Discovery and mapping of single feature polymorphisms in wheat using Affymetrix arrays

    PubMed Central

    Bernardo, Amy N; Bradbury, Peter J; Ma, Hongxiang; Hu, Shengwa; Bowden, Robert L; Buckler, Edward S; Bai, Guihua

    2009-01-01

    Background Wheat (Triticum aestivum L.) is a staple food crop worldwide. The wheat genome has not yet been sequenced due to its huge genome size (~17,000 Mb) and high levels of repetitive sequences; the whole genome sequence may not be expected in the near future. Available linkage maps have low marker density due to limitation in available markers; therefore new technologies that detect genome-wide polymorphisms are still needed to discover a large number of new markers for construction of high-resolution maps. A high-resolution map is a critical tool for gene isolation, molecular breeding and genomic research. Single feature polymorphism (SFP) is a new microarray-based type of marker that is detected by hybridization of DNA or cRNA to oligonucleotide probes. This study was conducted to explore the feasibility of using the Affymetrix GeneChip to discover and map SFPs in the large hexaploid wheat genome. Results Six wheat varieties of diverse origins (Ning 7840, Clark, Jagger, Encruzilhada, Chinese Spring, and Opata 85) were analyzed for significant probe by variety interactions and 396 probe sets with SFPs were identified. A subset of 164 unigenes was sequenced and 54% showed polymorphism within probes. Microarray analysis of 71 recombinant inbred lines from the cross Ning 7840/Clark identified 955 SFPs and 877 of them were mapped together with 269 simple sequence repeat markers. The SFPs were randomly distributed within a chromosome but were unevenly distributed among different genomes. The B genome had the most SFPs, and the D genome had the least. Map positions of a selected set of SFPs were validated by mapping single nucleotide polymorphism using SNaPshot and comparing with expressed sequence tags mapping data. Conclusion The Affymetrix array is a cost-effective platform for SFP discovery and SFP mapping in wheat. The new high-density map constructed in this study will be a useful tool for genetic and genomic research in wheat. PMID:19480702

  5. Genome size variation affects song attractiveness in grasshoppers: evidence for sexual selection against large genomes.

    PubMed

    Schielzeth, Holger; Streitner, Corinna; Lampe, Ulrike; Franzke, Alexandra; Reinhold, Klaus

    2014-12-01

    Genome size is largely uncorrelated to organismal complexity and adaptive scenarios. Genetic drift as well as intragenomic conflict have been put forward to explain this observation. We here study the impact of genome size on sexual attractiveness in the bow-winged grasshopper Chorthippus biguttulus. Grasshoppers show particularly large variation in genome size due to the high prevalence of supernumerary chromosomes that are considered (mildly) selfish, as evidenced by non-Mendelian inheritance and fitness costs if present in high numbers. We ranked male grasshoppers by song characteristics that are known to affect female preferences in this species and scored genome sizes of attractive and unattractive individuals from the extremes of this distribution. We find that attractive singers have significantly smaller genomes, demonstrating that genome size is reflected in male courtship songs and that females prefer songs of males with small genomes. Such a genome size dependent mate preference effectively selects against selfish genetic elements that tend to increase genome size. The data therefore provide a novel example of how sexual selection can reinforce natural selection and can act as an agent in an intragenomic arms race. Furthermore, our findings indicate an underappreciated route of how choosy females could gain indirect benefits. © 2014 The Author(s). Evolution © 2014 The Society for the Study of Evolution.

  6. Genome-wide computational prediction and analysis of core promoter elements across plant monocots and dicots

    USDA-ARS's Scientific Manuscript database

    Transcription initiation, essential to gene expression regulation, involves recruitment of basal transcription factors to the core promoter elements (CPEs). The distribution of currently known CPEs across plant genomes is largely unknown. This is the first large scale genome-wide report on the compu...

  7. Evolutionary dynamics of retrotransposons assessed by high-throughput sequencing in wild relatives of wheat.

    PubMed

    Senerchia, Natacha; Wicker, Thomas; Felber, François; Parisod, Christian

    2013-01-01

    Transposable elements (TEs) represent a major fraction of plant genomes and drive their evolution. An improved understanding of genome evolution requires the dynamics of a large number of TE families to be considered. We put forward an approach bypassing the required step of a complete reference genome to assess the evolutionary trajectories of high copy number TE families from genome snapshot with high-throughput sequencing. Low coverage sequencing of the complex genomes of Aegilops cylindrica and Ae. geniculata using 454 identified more than 70% of the sequences as known TEs, mainly long terminal repeat (LTR) retrotransposons. Comparing the abundance of reads as well as patterns of sequence diversity and divergence within and among genomes assessed the dynamics of 44 major LTR retrotransposon families of the 165 identified. In particular, molecular population genetics on individual TE copies distinguished recently active from quiescent families and highlighted different evolutionary trajectories of retrotransposons among related species. This work presents a suite of tools suitable for current sequencing data, allowing to address the genome-wide evolutionary dynamics of TEs at the family level and advancing our understanding of the evolution of nonmodel genomes.

  8. Discovery, genotyping and characterization of structural variation and novel sequence at single nucleotide resolution from de novo genome assemblies on a population scale.

    PubMed

    Liu, Siyang; Huang, Shujia; Rao, Junhua; Ye, Weijian; Krogh, Anders; Wang, Jun

    2015-01-01

    Comprehensive recognition of genomic variation in one individual is important for understanding disease and developing personalized medication and treatment. Many tools based on DNA re-sequencing exist for identification of single nucleotide polymorphisms, small insertions and deletions (indels) as well as large deletions. However, these approaches consistently display a substantial bias against the recovery of complex structural variants and novel sequence in individual genomes and do not provide interpretation information such as the annotation of ancestral state and formation mechanism. We present a novel approach implemented in a single software package, AsmVar, to discover, genotype and characterize different forms of structural variation and novel sequence from population-scale de novo genome assemblies up to nucleotide resolution. Application of AsmVar to several human de novo genome assemblies captures a wide spectrum of structural variants and novel sequences present in the human population in high sensitivity and specificity. Our method provides a direct solution for investigating structural variants and novel sequences from de novo genome assemblies, facilitating the construction of population-scale pan-genomes. Our study also highlights the usefulness of the de novo assembly strategy for definition of genome structure.

  9. Advances in Genetical Genomics of Plants

    PubMed Central

    Joosen, R.V.L.; Ligterink, W.; Hilhorst, H.W.M.; Keurentjes, J.J.B.

    2009-01-01

    Natural variation provides a valuable resource to study the genetic regulation of quantitative traits. In quantitative trait locus (QTL) analyses this variation, captured in segregating mapping populations, is used to identify the genomic regions affecting these traits. The identification of the causal genes underlying QTLs is a major challenge for which the detection of gene expression differences is of major importance. By combining genetics with large scale expression profiling (i.e. genetical genomics), resulting in expression QTLs (eQTLs), great progress can be made in connecting phenotypic variation to genotypic diversity. In this review we discuss examples from human, mouse, Drosophila, yeast and plant research to illustrate the advances in genetical genomics, with a focus on understanding the regulatory mechanisms underlying natural variation. With their tolerance to inbreeding, short generation time and ease to generate large families, plants are ideal subjects to test new concepts in genetics. The comprehensive resources which are available for Arabidopsis make it a favorite model plant but genetical genomics also found its way to important crop species like rice, barley and wheat. We discuss eQTL profiling with respect to cis and trans regulation and show how combined studies with other ‘omics’ technologies, such as metabolomics and proteomics may further augment current information on transcriptional, translational and metabolomic signaling pathways and enable reconstruction of detailed regulatory networks. The fast developments in the ‘omics’ area will offer great potential for genetical genomics to elucidate the genotype-phenotype relationships for both fundamental and applied research. PMID:20514216

  10. Mimivirus shows dramatic genome reduction after intraamoebal culture

    PubMed Central

    Boyer, Mickaël; Azza, Saïd; Barrassi, Lina; Klose, Thomas; Campocasso, Angélique; Pagnier, Isabelle; Fournous, Ghislain; Borg, Audrey; Robert, Catherine; Zhang, Xinzheng; Desnues, Christelle; Henrissat, Bernard; Rossmann, Michael G.; La Scola, Bernard; Raoult, Didier

    2011-01-01

    Most phagocytic protist viruses have large particles and genomes as well as many laterally acquired genes that may be associated with a sympatric intracellular life (a community-associated lifestyle with viruses, bacteria, and eukaryotes) and the presence of virophages. By subculturing Mimivirus 150 times in a germ-free amoebal host, we observed the emergence of a bald form of the virus that lacked surface fibers and replicated in a morphologically different type of viral factory. When studying a 0.40-μm filtered cloned particle, we found that its genome size shifted from 1.2 (M1) to 0.993 Mb (M4), mainly due to large deletions occurring at both ends of the genome. Some of the lost genes are encoding enzymes required for posttranslational modification of the structural viral proteins, such as glycosyltransferases and ankyrin repeat proteins. Proteomic analysis allowed identification of three proteins, probably required for the assembly of virus fibers. The genes for two of these were found to be deleted from the M4 virus genome. The proteins associated with fibers are highly antigenic and can be recognized by mouse and human antimimivirus antibodies. In addition, the bald strain (M4) was not able to propagate the sputnik virophage. Overall, the Mimivirus transition from a sympatric to an allopatric lifestyle was associated with a stepwise genome reduction and the production of a predominantly bald virophage resistant strain. The new axenic ecosystem allowed the allopatric Mimivirus to lose unnecessary genes that might be involved in the control of competitors. PMID:21646533

  11. Mimivirus shows dramatic genome reduction after intraamoebal culture.

    PubMed

    Boyer, Mickaël; Azza, Saïd; Barrassi, Lina; Klose, Thomas; Campocasso, Angélique; Pagnier, Isabelle; Fournous, Ghislain; Borg, Audrey; Robert, Catherine; Zhang, Xinzheng; Desnues, Christelle; Henrissat, Bernard; Rossmann, Michael G; La Scola, Bernard; Raoult, Didier

    2011-06-21

    Most phagocytic protist viruses have large particles and genomes as well as many laterally acquired genes that may be associated with a sympatric intracellular life (a community-associated lifestyle with viruses, bacteria, and eukaryotes) and the presence of virophages. By subculturing Mimivirus 150 times in a germ-free amoebal host, we observed the emergence of a bald form of the virus that lacked surface fibers and replicated in a morphologically different type of viral factory. When studying a 0.40-μm filtered cloned particle, we found that its genome size shifted from 1.2 (M1) to 0.993 Mb (M4), mainly due to large deletions occurring at both ends of the genome. Some of the lost genes are encoding enzymes required for posttranslational modification of the structural viral proteins, such as glycosyltransferases and ankyrin repeat proteins. Proteomic analysis allowed identification of three proteins, probably required for the assembly of virus fibers. The genes for two of these were found to be deleted from the M4 virus genome. The proteins associated with fibers are highly antigenic and can be recognized by mouse and human antimimivirus antibodies. In addition, the bald strain (M4) was not able to propagate the sputnik virophage. Overall, the Mimivirus transition from a sympatric to an allopatric lifestyle was associated with a stepwise genome reduction and the production of a predominantly bald virophage resistant strain. The new axenic ecosystem allowed the allopatric Mimivirus to lose unnecessary genes that might be involved in the control of competitors.

  12. Genomic Data Quality Impacts Automated Detection of Lateral Gene Transfer in Fungi

    PubMed Central

    Dupont, Pierre-Yves; Cox, Murray P.

    2017-01-01

    Lateral gene transfer (LGT, also known as horizontal gene transfer), an atypical mechanism of transferring genes between species, has almost become the default explanation for genes that display an unexpected composition or phylogeny. Numerous methods of detecting LGT events all rely on two fundamental strategies: primary structure composition or gene tree/species tree comparisons. Discouragingly, the results of these different approaches rarely coincide. With the wealth of genome data now available, detection of laterally transferred genes is increasingly being attempted in large uncurated eukaryotic datasets. However, detection methods depend greatly on the quality of the underlying genomic data, which are typically complex for eukaryotes. Furthermore, given the automated nature of genomic data collection, it is typically impractical to manually verify all protein or gene models, orthology predictions, and multiple sequence alignments, requiring researchers to accept a substantial margin of error in their datasets. Using a test case comprising plant-associated genomes across the fungal kingdom, this study reveals that composition- and phylogeny-based methods have little statistical power to detect laterally transferred genes. In particular, phylogenetic methods reveal extreme levels of topological variation in fungal gene trees, the vast majority of which show departures from the canonical species tree. Therefore, it is inherently challenging to detect LGT events in typical eukaryotic genomes. This finding is in striking contrast to the large number of claims for laterally transferred genes in eukaryotic species that routinely appear in the literature, and questions how many of these proposed examples are statistically well supported. PMID:28235827

  13. Targeted isolation, sequence assembly and characterization of two white spruce (Picea glauca) BAC clones for terpenoid synthase and cytochrome P450 genes involved in conifer defence reveal insights into a conifer genome

    PubMed Central

    2009-01-01

    Background Conifers are a large group of gymnosperm trees which are separated from the angiosperms by more than 300 million years of independent evolution. Conifer genomes are extremely large and contain considerable amounts of repetitive DNA. Currently, conifer sequence resources exist predominantly as expressed sequence tags (ESTs) and full-length (FL)cDNAs. There is no genome sequence available for a conifer or any other gymnosperm. Conifer defence-related genes often group into large families with closely related members. The goals of this study are to assess the feasibility of targeted isolation and sequence assembly of conifer BAC clones containing specific genes from two large gene families, and to characterize large segments of genomic DNA sequence for the first time from a conifer. Results We used a PCR-based approach to identify BAC clones for two target genes, a terpene synthase (3-carene synthase; 3CAR) and a cytochrome P450 (CYP720B4) from a non-arrayed genomic BAC library of white spruce (Picea glauca). Shotgun genomic fragments isolated from the BAC clones were sequenced to a depth of 15.6- and 16.0-fold coverage, respectively. Assembly and manual curation yielded sequence scaffolds of 172 kbp (3CAR) and 94 kbp (CYP720B4) long. Inspection of the genomic sequences revealed the intron-exon structures, the putative promoter regions and putative cis-regulatory elements of these genes. Sequences related to transposable elements (TEs), high complexity repeats and simple repeats were prevalent and comprised approximately 40% of the sequenced genomic DNA. An in silico simulation of the effect of sequencing depth on the quality of the sequence assembly provides direction for future efforts of conifer genome sequencing. Conclusion We report the first targeted cloning, sequencing, assembly, and annotation of large segments of genomic DNA from a conifer. 
    We demonstrate that genomic BAC clones for individual members of multi-member gene families can be isolated in a gene-specific fashion. The results of the present work provide important new information about the structure and content of conifer genomic DNA that will guide future efforts to sequence and assemble conifer genomes. PMID:19656416

  14. Targeted isolation, sequence assembly and characterization of two white spruce (Picea glauca) BAC clones for terpenoid synthase and cytochrome P450 genes involved in conifer defence reveal insights into a conifer genome.

    PubMed

    Hamberger, Björn; Hall, Dawn; Yuen, Mack; Oddy, Claire; Hamberger, Britta; Keeling, Christopher I; Ritland, Carol; Ritland, Kermit; Bohlmann, Jörg

    2009-08-06

    Conifers are a large group of gymnosperm trees which are separated from the angiosperms by more than 300 million years of independent evolution. Conifer genomes are extremely large and contain considerable amounts of repetitive DNA. Currently, conifer sequence resources exist predominantly as expressed sequence tags (ESTs) and full-length (FL)cDNAs. There is no genome sequence available for a conifer or any other gymnosperm. Conifer defence-related genes often group into large families with closely related members. The goals of this study are to assess the feasibility of targeted isolation and sequence assembly of conifer BAC clones containing specific genes from two large gene families, and to characterize large segments of genomic DNA sequence for the first time from a conifer. We used a PCR-based approach to identify BAC clones for two target genes, a terpene synthase (3-carene synthase; 3CAR) and a cytochrome P450 (CYP720B4) from a non-arrayed genomic BAC library of white spruce (Picea glauca). Shotgun genomic fragments isolated from the BAC clones were sequenced to a depth of 15.6- and 16.0-fold coverage, respectively. Assembly and manual curation yielded sequence scaffolds of 172 kbp (3CAR) and 94 kbp (CYP720B4) long. Inspection of the genomic sequences revealed the intron-exon structures, the putative promoter regions and putative cis-regulatory elements of these genes. Sequences related to transposable elements (TEs), high complexity repeats and simple repeats were prevalent and comprised approximately 40% of the sequenced genomic DNA. An in silico simulation of the effect of sequencing depth on the quality of the sequence assembly provides direction for future efforts of conifer genome sequencing. We report the first targeted cloning, sequencing, assembly, and annotation of large segments of genomic DNA from a conifer. We demonstrate that genomic BAC clones for individual members of multi-member gene families can be isolated in a gene-specific fashion. 
    The results of the present work provide important new information about the structure and content of conifer genomic DNA that will guide future efforts to sequence and assemble conifer genomes.

  15. Complete chloroplast genome sequence of a tree fern Alsophila spinulosa: insights into evolutionary changes in fern chloroplast genomes.

    PubMed

    Gao, Lei; Yi, Xuan; Yang, Yong-Xia; Su, Ying-Juan; Wang, Ting

    2009-06-11

    Ferns have generally been neglected in studies of chloroplast genomics. Before this study, only one polypod and two basal ferns had their complete chloroplast (cp) genome reported. Tree ferns represent an ancient fern lineage that first occurred in the Late Triassic. In recent phylogenetic analyses, tree ferns were shown to be the sister group of polypods, the most diverse group of living ferns. Availability of cp genome sequence from a tree fern will facilitate interpretation of the evolutionary changes of fern cp genomes. Here we have sequenced the complete cp genome of a scaly tree fern Alsophila spinulosa (Cyatheaceae). The Alsophila cp genome is 156,661 base pairs (bp) in size, and has a typical quadripartite structure with the large (LSC, 86,308 bp) and small single copy (SSC, 21,623 bp) regions separated by two copies of an inverted repeat (IRs, 24,365 bp each). This genome contains 117 different genes encoding 85 proteins, 4 rRNAs and 28 tRNAs. Pseudogenes of ycf66 and trnT-UGU are also detected in this genome. A unique trnR-UCG gene (derived from trnR-CCG) is found between rbcL and accD. The Alsophila cp genome shares some unusual characteristics with the previously sequenced cp genome of the polypod fern Adiantum capillus-veneris, including the absence of 5 tRNA genes that exist in most other cp genomes. The genome shows a high degree of synteny with that of Adiantum, but differs considerably from two basal ferns (Angiopteris evecta and Psilotum nudum). At one endpoint of an ancient inversion we detected a highly repeated 565-bp-region that is absent from the Adiantum cp genome. An additional minor inversion of the trnD-GUC, which is possibly shared by all ferns, was identified by comparison between the fern and other land plant cp genomes. By comparing four fern cp genome sequences it was confirmed that two major rearrangements distinguish higher leptosporangiate ferns from basal fern lineages. 
    The Alsophila cp genome is very similar to that of the polypod fern Adiantum in terms of gene content, gene order and GC content. However, there exist some striking differences between them: the trnR-UCG gene represents a putative molecular apomorphy of tree ferns; and the repeats observed at one inversion endpoint may be a vestige of some unknown rearrangement(s). This work provided fresh insights into the fern cp genome evolution as well as useful data for future phylogenetic studies.
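
    The region lengths and gene counts reported in this abstract can be cross-checked against the stated totals. The following is a minimal sketch, assuming that the quadripartite structure means total length = LSC + SSC + 2 x IR; the function name `quadripartite_length` is illustrative and does not come from the source.

```python
# Values (in base pairs) as reported in the Alsophila spinulosa abstract.
LSC = 86_308             # large single copy region
SSC = 21_623             # small single copy region
IR = 24_365              # one copy of the inverted repeat
REPORTED_TOTAL = 156_661

def quadripartite_length(lsc: int, ssc: int, ir: int) -> int:
    """Total length of a quadripartite genome: two single-copy
    regions plus two copies of the inverted repeat (assumed model)."""
    return lsc + ssc + 2 * ir

total = quadripartite_length(LSC, SSC, IR)

# Gene tally reported in the abstract: proteins + rRNAs + tRNAs.
gene_total = 85 + 4 + 28

print(total, total == REPORTED_TOTAL)
print(gene_total)
```

    Under this assumption the four regions sum to the reported 156,661 bp, and the three gene classes sum to the reported 117 genes.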

  16. Genome-wide chromatin state transitions associated with developmental and environmental cues.

    PubMed

    Zhu, Jiang; Adli, Mazhar; Zou, James Y; Verstappen, Griet; Coyne, Michael; Zhang, Xiaolan; Durham, Timothy; Miri, Mohammad; Deshpande, Vikram; De Jager, Philip L; Bennett, David A; Houmard, Joseph A; Muoio, Deborah M; Onder, Tamer T; Camahort, Ray; Cowan, Chad A; Meissner, Alexander; Epstein, Charles B; Shoresh, Noam; Bernstein, Bradley E

    2013-01-31

    Differences in chromatin organization are key to the multiplicity of cell states that arise from a single genetic background, yet the landscapes of in vivo tissues remain largely uncharted. Here, we mapped chromatin genome-wide in a large and diverse collection of human tissues and stem cells. The maps yield unprecedented annotations of functional genomic elements and their regulation across developmental stages, lineages, and cellular environments. They also reveal global features of the epigenome, related to nuclear architecture, that also vary across cellular phenotypes. Specifically, developmental specification is accompanied by progressive chromatin restriction as the default state transitions from dynamic remodeling to generalized compaction. Exposure to serum in vitro triggers a distinct transition that involves de novo establishment of domains with features of constitutive heterochromatin. We describe how these global chromatin state transitions relate to chromosome and nuclear architecture, and discuss their implications for lineage fidelity, cellular senescence, and reprogramming. Copyright © 2013 Elsevier Inc. All rights reserved.

  17. Genomic analysis of organismal complexity in the multicellular green alga Volvox carteri

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Prochnik, Simon E.; Umen, James; Nedelcu, Aurora

    2010-07-01

    Analysis of the Volvox carteri genome reveals that this green alga's increased organismal complexity and multicellularity are associated with modifications in protein families shared with its unicellular ancestor, and not with large-scale innovations in protein coding capacity. The multicellular green alga Volvox carteri and its morphologically diverse close relatives (the volvocine algae) are uniquely suited for investigating the evolution of multicellularity and development. We sequenced the 138 Mb genome of V. carteri and compared its ~14,500 predicted proteins to those of its unicellular relative, Chlamydomonas reinhardtii. Despite fundamental differences in organismal complexity and life history, the two species have similar protein-coding potentials, and few species-specific protein-coding gene predictions. Interestingly, volvocine algal-specific proteins are enriched in Volvox, including those associated with an expanded and highly compartmentalized extracellular matrix. Our analysis shows that increases in organismal complexity can be associated with modifications of lineage-specific proteins rather than large-scale invention of protein-coding capacity.

  18. LTR Retrotransposons Contribute to Genomic Gigantism in Plethodontid Salamanders

    PubMed Central

    Sun, Cheng; Shepard, Donald B.; Chong, Rebecca A.; López Arriaza, José; Hall, Kathryn; Castoe, Todd A.; Feschotte, Cédric; Pollock, David D.; Mueller, Rachel Lockridge

    2012-01-01

    Among vertebrates, most of the largest genomes are found within the salamanders, a clade of amphibians that includes 613 species. Salamander genome sizes range from ∼14 to ∼120 Gb. Because genome size is correlated with nucleus and cell sizes, as well as other traits, morphological evolution in salamanders has been profoundly affected by genomic gigantism. However, the molecular mechanisms driving genomic expansion in this clade remain largely unknown. Here, we present the first comparative analysis of transposable element (TE) content in salamanders. Using high-throughput sequencing, we generated genomic shotgun data for six species from the Plethodontidae, the largest family of salamanders. We then developed a pipeline to mine TE sequences from shotgun data in taxa with limited genomic resources, such as salamanders. Our summaries of overall TE abundance and diversity for each species demonstrate that TEs make up a substantial portion of salamander genomes, and that all of the major known types of TEs are represented in salamanders. The most abundant TE superfamilies found in the genomes of our six focal species are similar, despite substantial variation in genome size. However, our results demonstrate a major difference between salamanders and other vertebrates: salamander genomes contain much larger amounts of long terminal repeat (LTR) retrotransposons, primarily Ty3/gypsy elements. Thus, the extreme increase in genome size that occurred in salamanders was likely accompanied by a shift in TE landscape. These results suggest that increased proliferation of LTR retrotransposons was a major molecular mechanism contributing to genomic expansion in salamanders. PMID:22200636

  19. Low gene copy number shows that arbuscular mycorrhizal fungi inherit genetically different nuclei.

    PubMed

    Hijri, Mohamed; Sanders, Ian R

    2005-01-13

    Arbuscular mycorrhizal fungi (AMF) are ancient asexually reproducing organisms that form symbioses with the majority of plant species, improving plant nutrition and promoting plant diversity. Little is known about the evolution or organization of the genomes of any eukaryotic symbiont or ancient asexual organism. Direct evidence shows that one AMF species is heterokaryotic; that is, containing populations of genetically different nuclei. It has been suggested, however, that the genetic variation passed from generation to generation in AMF is simply due to multiple chromosome sets (that is, high ploidy). Here we show that previously documented genetic variation in Pol-like sequences, which are passed from generation to generation, cannot be due to either high ploidy or repeated gene duplications. Our results provide the clearest evidence so far for substantial genetic differences among nuclei in AMF. We also show that even AMF with a very large nuclear DNA content are haploid. An underlying principle of evolutionary theory is that an individual passes on one or half of its genome to each of its progeny. The coexistence of a population of many genomes in AMF and their transfer to subsequent generations, therefore, has far-reaching consequences for understanding genome evolution.

  20. Rates and Genomic Consequences of Spontaneous Mutational Events in Drosophila melanogaster

    PubMed Central

    Schrider, Daniel R.; Houle, David; Lynch, Michael; Hahn, Matthew W.

    2013-01-01

    Because spontaneous mutation is the source of all genetic diversity, measuring mutation rates can reveal how natural selection drives patterns of variation within and between species. We sequenced eight genomes produced by a mutation-accumulation experiment in Drosophila melanogaster. Our analysis reveals that point mutation and small indel rates vary significantly between the two different genetic backgrounds examined. We also find evidence that ∼2% of mutational events affect multiple closely spaced nucleotides. Unlike previous similar experiments, we were able to estimate genome-wide rates of large deletions and tandem duplications. These results suggest that, at least in inbred lines like those examined here, mutational pressures may result in net growth rather than contraction of the Drosophila genome. By comparing our mutation rate estimates to polymorphism data, we are able to estimate the fraction of new mutations that are eliminated by purifying selection. These results suggest that ∼99% of duplications and deletions are deleterious—making them 10 times more likely to be removed by selection than nonsynonymous mutations. Our results illuminate not only the rates of new small- and large-scale mutations, but also the selective forces that they encounter once they arise. PMID:23733788

  1. The Whole-Genome and Transcriptome of the Manila Clam (Ruditapes philippinarum).

    PubMed

    Mun, Seyoung; Kim, Yun-Ji; Markkandan, Kesavan; Shin, Wonseok; Oh, Sumin; Woo, Jiyoung; Yoo, Jongsu; An, Hyesuck; Han, Kyudong

    2017-06-01

    The manila clam, Ruditapes philippinarum, is an important bivalve species in worldwide aquaculture including Korea. The aquaculture production of R. philippinarum is under threat from diverse environmental factors including viruses, microorganisms, parasites, and water conditions with subsequently declining production. In spite of its importance as a marine resource, the reference genome of R. philippinarum for comprehensive genetic studies is largely unexplored. Here, we report the de novo whole-genome and transcriptome assembly of R. philippinarum across three different tissues (foot, gill, and adductor muscle), and provide the basic data for advanced studies in selective breeding and disease control in order to obtain successful aquaculture systems. An approximately 2.56 Gb high quality whole-genome was assembled with various library construction methods. A total of 108,034 protein coding gene models were predicted and repetitive elements including simple sequence repeats and noncoding RNAs were identified to further understanding of the genetic background of R. philippinarum for genomics-assisted breeding. Comparative analysis with the bivalve marine invertebrates uncover that the gene family related to complement C1q was enriched. Furthermore, we performed transcriptome analysis with three different tissues in order to support genome annotation and then identified 41,275 transcripts which were annotated. The R. philippinarum genome resource will markedly advance a wide range of potential genetic studies, a reference genome for comparative analysis of bivalve species and unraveling mechanisms of biological processes in molluscs. We believe that the R. philippinarum genome will serve as an initial platform for breeding better-quality clams using a genomic approach. © The Author 2017. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.

  2. Pharmacogenetics: Implications of Race and Ethnicity on Defining Genetic Profiles for Personalized Medicine

    PubMed Central

    Ortega, Victor E.; Meyers, Deborah A.

    2014-01-01

    Pharmacogenetics is being used to develop personalized therapies specific to individuals from different ethnic or racial groups. Pharmacogenetic studies to date have been primarily performed in trial cohorts consisting of non-Hispanic whites of European descent. A “bottleneck” or collapse of genetic diversity associated with the first human colonization of Europe during the Upper Paleolithic period, followed by the recent mixing of African, European, and Native American ancestries has resulted in different ethnic groups with varying degrees of genetic diversity. Differences in genetic ancestry may introduce genetic variation which has the potential to alter the therapeutic efficacy of commonly used asthma therapies, for example β2-adrenergic receptor agonists (beta agonists). Pharmacogenetic studies of admixed ethnic groups have been limited to small candidate gene association studies of which the best example is the gene coding for the receptor target of beta agonist therapy, ADRB2. Large consortium-based sequencing studies are using next-generation whole-genome sequencing to provide a diverse genome map of different admixed populations which can be used for future pharmacogenetic studies. These studies will include candidate gene studies, genome-wide association studies, and whole-genome admixture-based approaches which account for ancestral genetic structure, complex haplotypes, gene-gene interactions, and rare variants to detect and replicate novel pharmacogenetic loci. PMID:24369795

  3. Genomics Portals: integrative web-platform for mining genomics data.

    PubMed

    Shinde, Kaustubh; Phatak, Mukta; Johannes, Freudenberg M; Chen, Jing; Li, Qian; Vineet, Joshi K; Hu, Zhen; Ghosh, Krishnendu; Meller, Jaroslaw; Medvedovic, Mario

    2010-01-13

    A large amount of experimental data generated by modern high-throughput technologies is available through various public repositories. Our knowledge about molecular interaction networks, functional biological pathways and transcriptional regulatory modules is rapidly expanding, and is being organized in lists of functionally related genes. Jointly, these two sources of information hold a tremendous potential for gaining new insights into functioning of living systems. Genomics Portals platform integrates access to an extensive knowledge base and a large database of human, mouse, and rat genomics data with basic analytical visualization tools. It provides the context for analyzing and interpreting new experimental data and the tool for effective mining of a large number of publicly available genomics datasets stored in the back-end databases. The uniqueness of this platform lies in the volume and the diversity of genomics data that can be accessed and analyzed (gene expression, ChIP-chip, ChIP-seq, epigenomics, computationally predicted binding sites, etc), and the integration with an extensive knowledge base that can be used in such analysis. The integrated access to primary genomics data, functional knowledge and analytical tools makes Genomics Portals platform a unique tool for interpreting results of new genomics experiments and for mining the vast amount of data stored in the Genomics Portals backend databases. Genomics Portals can be accessed and used freely at http://GenomicsPortals.org.

  4. Genomics Portals: integrative web-platform for mining genomics data

    PubMed Central

    2010-01-01

    Background A large amount of experimental data generated by modern high-throughput technologies is available through various public repositories. Our knowledge about molecular interaction networks, functional biological pathways and transcriptional regulatory modules is rapidly expanding, and is being organized in lists of functionally related genes. Jointly, these two sources of information hold a tremendous potential for gaining new insights into functioning of living systems. Results Genomics Portals platform integrates access to an extensive knowledge base and a large database of human, mouse, and rat genomics data with basic analytical visualization tools. It provides the context for analyzing and interpreting new experimental data and the tool for effective mining of a large number of publicly available genomics datasets stored in the back-end databases. The uniqueness of this platform lies in the volume and the diversity of genomics data that can be accessed and analyzed (gene expression, ChIP-chip, ChIP-seq, epigenomics, computationally predicted binding sites, etc), and the integration with an extensive knowledge base that can be used in such analysis. Conclusion The integrated access to primary genomics data, functional knowledge and analytical tools makes Genomics Portals platform a unique tool for interpreting results of new genomics experiments and for mining the vast amount of data stored in the Genomics Portals backend databases. Genomics Portals can be accessed and used freely at http://GenomicsPortals.org. PMID:20070909

  5. mySyntenyPortal: an application package to construct websites for synteny block analysis.

    PubMed

    Lee, Jongin; Lee, Daehwan; Sim, Mikang; Kwon, Daehong; Kim, Juyeon; Ko, Younhee; Kim, Jaebum

    2018-06-05

    Advances in sequencing technologies have facilitated large-scale comparative genomics based on whole genome sequencing. Constructing and investigating conserved genomic regions among multiple species (called synteny blocks) are essential in comparative genomics. However, they require significant amounts of computational resources and time in addition to bioinformatics skills. Many web interfaces have been developed to make such tasks easier. However, these web interfaces cannot be customized for users who want to use their own set of genome sequences or definition of synteny blocks. To resolve this limitation, we present mySyntenyPortal, a stand-alone application package to construct websites for synteny block analyses by using users' own genome data. mySyntenyPortal provides both command line and web-based interfaces to build and manage websites for large-scale comparative genomic analyses. The websites can also be easily published and accessed by other users. To demonstrate the usability of mySyntenyPortal, we present an example study for building websites to compare genomes of three mammalian species (human, mouse, and cow) and show how they can be easily utilized to identify potential genes affected by genome rearrangements. mySyntenyPortal will contribute to extended comparative genomic analyses based on large-scale whole genome sequences by providing unique functionality to support the easy creation of interactive websites for synteny block analyses from users' own genome data.

  6. The complete mitochondrial genome of the medicinal fungus Ganoderma applanatum (Polyporales, Basidiomycota).

    PubMed

    Wang, Xin-Cun; Shao, Junjie; Liu, Chang

    2016-07-01

    We have determined the complete nucleotide sequence of the mitochondrial genome of the medicinal fungus Ganoderma applanatum (Pers.) Pat. using next-generation sequencing technology. The circular molecule is 119,803 bp long with a GC content of 26.66%. Gene prediction revealed genes encoding 15 conserved proteins, 25 tRNAs, and the large and small ribosomal RNAs; all genes are located on the same strand except trnW-CCA. Compared with previously sequenced genomes of G. lucidum, G. meredithiae and G. sinense, the order of the protein and rRNA genes is highly conserved; however, the types of tRNA genes are slightly different. The mitochondrial genome of G. applanatum will contribute to the understanding of the phylogeny and evolution of Ganoderma and Ganodermataceae, the group containing many species with high medicinal value.
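
    As an illustration of how a GC content figure such as the 26.66% above is typically obtained, the following minimal sketch counts G and C bases as a percentage of all unambiguous bases. The function name `gc_content` and the toy sequence are assumptions made for this example; they are not taken from the study, and the authors' actual tooling is not stated in the record.

```python
def gc_content(sequence: str) -> float:
    """Return the percentage of G and C among unambiguous bases (A, C, G, T).

    Hypothetical helper for illustration; ambiguous symbols such as N are ignored.
    """
    seq = sequence.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    if total == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return 100.0 * gc / total


# Toy sequence (not real G. applanatum data): 4 of 12 bases are G or C.
toy_sequence = "ATGCATTTAAGC"
toy_gc_percent = gc_content(toy_sequence)
```

    For a full mitochondrial genome, the same function would be applied to the complete sequence string read from a FASTA file.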

  7. Two Rounds of Whole Genome Duplication in the Ancestral Vertebrate

    PubMed Central

    Dehal, Paramvir; Boore, Jeffrey L

    2005-01-01

    The hypothesis that the relatively large and complex vertebrate genome was created by two ancient, whole genome duplications has been hotly debated, but remains unresolved. We reconstructed the evolutionary relationships of all gene families from the complete gene sets of a tunicate, fish, mouse, and human, and then determined when each gene duplicated relative to the evolutionary tree of the organisms. We confirmed the results of earlier studies that there remains little signal of these events in numbers of duplicated genes, gene tree topology, or the number of genes per multigene family. However, when we plotted the genomic map positions of only the subset of paralogous genes that were duplicated prior to the fish–tetrapod split, their global physical organization provides unmistakable evidence of two distinct genome duplication events early in vertebrate evolution indicated by clear patterns of four-way paralogous regions covering a large part of the human genome. Our results highlight the potential for these large-scale genomic events to have driven the evolutionary success of the vertebrate lineage. PMID:16128622

  8. Representation matters: quantitative behavioral variation in wild worm strains

    NASA Astrophysics Data System (ADS)

    Brown, Andre

    Natural genetic variation in populations is the basis of genome-wide association studies, an approach that has been applied in large studies of humans to study the genetic architecture of complex traits including disease risk. Of course, the traits you choose to measure determine which associated genes you discover (or miss). In large-scale human studies, the measured traits are usually taken as a given during the association step because they are expensive to collect and standardize. Working with the nematode worm C. elegans, we do not have the same constraints. In this talk I will describe how large-scale imaging of worm behavior allows us to develop alternative representations of behavior that vary differently across wild populations. The alternative representations yield novel traits that can be used for genome-wide association studies and may reveal basic properties of the genotype-phenotype map that are obscured if only a small set of fixed traits are used.

  9. Evolutionary Story of a Satellite DNA from Phodopus sungorus (Rodentia, Cricetidae)

    PubMed Central

    Paço, Ana; Adega, Filomena; Meštrović, Nevenka; Plohl, Miroslav; Chaves, Raquel

    2014-01-01

    With the goal of contributing to the understanding of satellite DNA evolution and its genomic involvement, in this work we isolated and characterized the first satellite DNA (PSUcentSat) from Phodopus sungorus (Cricetidae). Physical mapping of this sequence in P. sungorus showed large PSUcentSat arrays located at the heterochromatic (peri)centromeric region of five autosomal pairs and the Y-chromosome. The presence of orthologous PSUcentSat sequences in the genomes of other Cricetidae and Muridae rodents was also verified; these, however, present an interspersed chromosomal distribution. This distribution pattern suggests a scattered PSUcentSat location in an ancestor of the Muridae/Cricetidae families that afterwards assumed, in the descendant genome of P. sungorus, a localization restricted to the (peri)centromeric region of a few chromosomes. We believe that after the divergence of the studied species, PSUcentSat was most probably highly amplified in the (peri)centromeric region of some chromosome pairs of this hamster by recombinational mechanisms. The bouquet chromosome configuration (prophase I) possibly plays an important role in this selective amplification, providing physical proximity of centromeric regions between chromosomes with similar size and/or morphology. This seems particularly evident for the acrocentric chromosomes of P. sungorus (including the Y-chromosome), all presenting large PSUcentSat arrays at the (peri)centromeric region. The conservation of this sequence in the studied genomes and its (peri)centromeric amplification in P. sungorus strongly suggest functional significance, with this satellite family possibly having different functions in the different genomes. The verification of PSUcentSat transcriptional activity in normal proliferative cells suggests that its transcription is not stage-limited, as described for some other satellites. PMID:25336681

  10. Frequent loss of lineages and deficient duplications accounted for low copy number of disease resistance genes in Cucurbitaceae

    PubMed Central

    2013-01-01

    Background The sequenced genomes of cucumber, melon and watermelon have relatively few R-genes, with 70, 75 and 55 copies only, respectively. The mechanism for low copy number of R-genes in Cucurbitaceae genomes remains unknown. Results Manual annotation of R-genes in the sequenced genomes of Cucurbitaceae species showed that approximately half of them are pseudogenes. Comparative analysis of R-genes showed frequent loss of R-gene loci in different Cucurbitaceae species. Phylogenetic analysis, data mining and PCR cloning using degenerate primers indicated that Cucurbitaceae has limited number of R-gene lineages (subfamilies). Comparison between R-genes from Cucurbitaceae and those from poplar and soybean suggested frequent loss of R-gene lineages in Cucurbitaceae. Furthermore, the average number of R-genes per lineage in Cucurbitaceae species is approximately 1/3 that in soybean or poplar. Therefore, both loss of lineages and deficient duplications in extant lineages accounted for the low copy number of R-genes in Cucurbitaceae. No extensive chimeras of R-genes were found in any of the sequenced Cucurbitaceae genomes. Nevertheless, one lineage of R-genes from Trichosanthes kirilowii, a wild Cucurbitaceae species, exhibits chimeric structures caused by gene conversions, and may contain a large number of distinct R-genes in natural populations. Conclusions Cucurbitaceae species have limited number of R-gene lineages and each genome harbors relatively few R-genes. The scarcity of R-genes in Cucurbitaceae species was due to frequent loss of R-gene lineages and infrequent duplications in extant lineages. The evolutionary mechanisms for large variation of copy number of R-genes in different plant species were discussed. PMID:23682795

  11. Newly discovered young CORE-SINEs in marsupial genomes.

    PubMed

    Munemasa, Maruo; Nikaido, Masato; Nishihara, Hidenori; Donnellan, Stephen; Austin, Christopher C; Okada, Norihiro

    2008-01-15

    Although recent mammalian genome projects have uncovered a large part of genomic component of various groups, several repetitive sequences still remain to be characterized and classified for particular groups. The short interspersed repetitive elements (SINEs) distributed among marsupial genomes are one example. We have identified and characterized two new SINEs from marsupial genomes that belong to the CORE-SINE family, characterized by a highly conserved "CORE" domain. PCR and genomic dot blot analyses revealed that the distribution of each SINE shows distinct patterns among the marsupial genomes, implying different timing of their retroposition during the evolution of marsupials. The members of Mar3 (Marsupialia 3) SINE are distributed throughout the genomes of all marsupials, whereas the Mac1 (Macropodoidea 1) SINE is distributed specifically in the genomes of kangaroos. Sequence alignment of the Mar3 SINEs revealed that they can be further divided into four subgroups, each of which has diagnostic nucleotides. The insertion patterns of each SINE at particular genomic loci, together with the distribution patterns of each SINE, suggest that the Mar3 SINEs have intensively amplified after the radiation of diprotodontians, whereas the Mac1 SINE has amplified only slightly after the divergence of hypsiprimnodons from other macropods. By compiling the information of CORE-SINEs characterized to date, we propose a comprehensive picture of how SINE evolution occurred in the genomes of marsupials.

  12. Evaluation of non-additive genetic variation in feed-related traits of broiler chickens.

    PubMed

    Li, Y; Hawken, R; Sapp, R; George, A; Lehnert, S A; Henshall, J M; Reverter, A

    2017-03-01

    Genome-wide association mapping and genomic predictions of phenotype of individuals in livestock are predominately based on the detection and estimation of additive genetic effects. Non-additive genetic effects are largely ignored. Studies in animals, plants, and humans to assess the impact of non-additive genetic effects in genetic analyses have led to differing conclusions. In this paper, we examined the consequences of including non-additive genetic effects in genome-wide association mapping and genomic prediction of total genetic values in a commercial population of 5,658 broiler chickens genotyped for 45,176 single nucleotide polymorphism (SNP) markers. We employed mixed-model equations and restricted maximum likelihood to analyze 7 feed related traits (TRT1 - TRT7). Dominance variance accounted for a significant proportion of the total genetic variance in all 7 traits, ranging from 29.5% for TRT1 to 58.4% for TRT7. Using a 5-fold cross-validation schema, we found that in spite of the large dominance component, including the estimated dominance effects in the prediction of total genetic values did not improve the accuracy of the predictions for any of the phenotypes. We offer some possible explanations for this counter-intuitive result including the possible confounding of dominance deviations with common environmental effects such as hatch, different directional effects of SNP additive and dominance variations, and the gene-gene interactions' failure to contribute to the level of variance. © 2016 Poultry Science Association Inc.
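
    The 5-fold cross-validation schema mentioned above can be sketched as follows. This is a generic illustration only: the function name `k_fold_indices`, the random shuffling, and the seed are assumptions, and the record does not say how the study actually assigned its 5,658 birds to folds.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation.

    Hypothetical helper: shuffles the sample indices once, deals them into
    k folds of near-equal size, and holds out each fold in turn.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# Same population size as in the abstract; every animal is held out exactly once.
splits = list(k_fold_indices(5658, k=5))
```

    In a genomic prediction setting, a model would be fitted on each `train` subset and its predicted genetic values compared with the phenotypes in the matching `test` subset; prediction accuracy is then summarized across the five folds.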

  13. Phylogeny and mitochondrial gene order variation in Lophotrochozoa in the light of new mitogenomic data from Nemertea

    PubMed Central

    Podsiadlowski, Lars; Braband, Anke; Struck, Torsten H; von Döhren, Jörn; Bartolomaeus, Thomas

    2009-01-01

    Background The new animal phylogeny established several taxa which were not identified by morphological analyses, most prominently the Ecdysozoa (arthropods, roundworms, priapulids and others) and Lophotrochozoa (molluscs, annelids, brachiopods and others). Lophotrochozoan interrelationships are under discussion, e.g. regarding the position of Nemertea (ribbon worms), which have been proposed as sister group to, e.g., Mollusca, Brachiozoa or Platyhelminthes. Mitochondrial genomes have contributed sequence data and gene order characters to the deep metazoan phylogeny debate. Results In this study we present the first complete mitochondrial genome record for a member of the Nemertea, Lineus viridis. Except for two genes, trnP and trnT, all genes are located on the same strand. While gene order is most similar to that of the brachiopod Terebratulina retusa, sequence based analyses of mitochondrial genes place nemerteans close to molluscs, phoronids and entoprocts without clear preference for one of these taxa as sister group. Conclusion Almost all recent analyses with large datasets show good support for a taxon comprising Annelida, Mollusca, Brachiopoda, Phoronida and Nemertea. But the relationships among these taxa vary between different studies. The analysis of gene order differences gives evidence for a multiple independent occurrence of a large inversion in the mitochondrial genome of Lophotrochozoa and a re-inversion of the same part in gastropods. We hypothesize that some regions of the genome have a higher chance for intramolecular recombination than others and gene order data have to be analysed carefully to detect convergent rearrangement events. PMID:19660126

  14. A New Perspective on Polyploid Fragaria (Strawberry) Genome Composition Based on Large-Scale, Multi-Locus Phylogenetic Analysis.

    PubMed

    Yang, Yilong; Davis, Thomas M

    2017-12-01

    The subgenomic compositions of the octoploid (2n = 8× = 56) strawberry (Fragaria) species, including the economically important cultivated species Fragaria x ananassa, have been a topic of long-standing interest. Phylogenomic approaches utilizing next-generation sequencing technologies offer a new window into species relationships and the subgenomic compositions of polyploids. We have conducted a large-scale phylogenetic analysis of Fragaria (strawberry) species using the Fluidigm Access Array system and 454 sequencing platform. About 24 single-copy or low-copy nuclear genes distributed across the genome were amplified and sequenced from 96 genomic DNA samples representing 16 Fragaria species from diploid (2×) to decaploid (10×), including the most extensive sampling of octoploid taxa yet reported. Individual gene trees were constructed by different tree-building methods. Mosaic genomic structures of diploid Fragaria species consisting of sequences at different phylogenetic positions were observed. Our findings support the presence in octoploid species of genetic signatures from at least five diploid ancestors (F. vesca, F. iinumae, F. bucharica, F. viridis, and at least one additional allele contributor of unknown identity), and question the extent to which distinct subgenomes are preserved over evolutionary time in the allopolyploid Fragaria species. In addition, our data support divergence between the two wild octoploid species, F. virginiana and F. chiloensis. © The Author 2017. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.

  15. Sauropod dinosaurs evolved moderately sized genomes unrelated to body size.

    PubMed

    Organ, Chris L; Brusatte, Stephen L; Stein, Koen

    2009-12-22

    Sauropodomorph dinosaurs include the largest land animals to have ever lived, some reaching up to 10 times the mass of an African elephant. Despite their status defining the upper range for body size in land animals, it remains unknown whether sauropodomorphs evolved larger-sized genomes than non-avian theropods, their sister taxon, or whether a relationship exists between genome size and body size in dinosaurs, two questions critical for understanding broad patterns of genome evolution in dinosaurs. Here we report inferences of genome size for 10 sauropodomorph taxa. The estimates are derived from a Bayesian phylogenetic generalized least squares approach that generates posterior distributions of regression models relating genome size to osteocyte lacunae volume in extant tetrapods. We estimate that the average genome size of sauropodomorphs was 2.02 pg (range of species means: 1.77-2.21 pg), a value in the upper range of extant birds (mean = 1.42 pg, range: 0.97-2.16 pg) and near the average for extant non-avian reptiles (mean = 2.24 pg, range: 1.05-5.44 pg). The results suggest that the variation in size and architecture of genomes in extinct dinosaurs was lower than the variation found in mammals. A substantial difference in genome size separates the two major clades within dinosaurs, Ornithischia (large genomes) and Saurischia (moderate to small genomes). We find no relationship between body size and estimated genome size in extinct dinosaurs, which suggests that neutral forces did not dominate the evolution of genome size in this group.

  16. The Large Mitochondrial Genome of Symbiodinium minutum Reveals Conserved Noncoding Sequences between Dinoflagellates and Apicomplexans

    PubMed Central

    Shoguchi, Eiichi; Shinzato, Chuya; Hisata, Kanako; Satoh, Nori; Mungpakdee, Sutada

    2015-01-01

    Even though mitochondrial genomes, which characterize eukaryotic cells, were first discovered more than 50 years ago, mitochondrial genomics remains an important topic in molecular biology and genome sciences. The Phylum Alveolata comprises three major groups (ciliates, apicomplexans, and dinoflagellates), the mitochondrial genomes of which have diverged widely. Even though the gene content of dinoflagellate mitochondrial genomes is reportedly comparable to that of apicomplexans, the highly fragmented and rearranged genome structures of dinoflagellates have frustrated whole genomic analysis. Consequently, noncoding sequences and gene arrangements of dinoflagellate mitochondrial genomes have not been well characterized. Here we report that the continuous assembled genome (∼326 kb) of the dinoflagellate, Symbiodinium minutum, is AT-rich (∼64.3%) and that it contains three protein-coding genes. Based upon in silico analysis, the remaining 99% of the genome comprises transcriptomic noncoding sequences. RNA edited sites and unique, possible start and stop codons clarify conserved regions among dinoflagellates. Our massive transcriptome analysis shows that almost all regions of the genome are transcribed, including 27 possible fragmented ribosomal RNA genes and 12 uncharacterized small RNAs that are similar to mitochondrial RNA genes of the malarial parasite, Plasmodium falciparum. Gene map comparisons show that gene order is only slightly conserved between S. minutum and P. falciparum. However, small RNAs and intergenic sequences share sequence similarities with P. falciparum, suggesting that the function of noncoding sequences has been preserved despite development of very different genome structures. PMID:26199191

  17. Biased distributions and decay of long interspersed nuclear elements in the chicken genome.

    PubMed

    Abrusán, György; Krambeck, Hans-Jürgen; Junier, Thomas; Giordano, Joti; Warburton, Peter E

    2008-01-01

    The genomes of birds are much smaller than mammalian genomes, and transposable elements (TEs) make up only 10% of the chicken genome, compared with the 45% of the human genome. To study the mechanisms that constrain the copy numbers of TEs, and as a consequence the genome size of birds, we analyzed the distributions of LINEs (CR1's) and SINEs (MIRs) on the chicken autosomes and Z chromosome. We show that (1) CR1 repeats are longest on the Z chromosome and their length is negatively correlated with the local GC content; (2) the decay of CR1 elements is highly biased, and the 5'-ends of the insertions are lost much faster than their 3'-ends; (3) the GC distribution of CR1 repeats shows a bimodal pattern with repeats enriched in both AT-rich and GC-rich regions of the genome, but the CR1 families show large differences in their GC distribution; and (4) the few MIRs in the chicken are most abundant in regions with intermediate GC content. Our results indicate that the primary mechanism that removes repeats from the chicken genome is ectopic exchange and that the low abundance of repeats in avian genomes is likely to be the consequence of their high recombination rates.

  18. Genome editing of crops: A renewed opportunity for food security.

    PubMed

    Georges, Fawzy; Ray, Heather

    2017-01-02

    Genome editing of crop plants is a rapidly advancing technology whereby targeted mutations can be introduced into a plant genome in a highly specific manner and with great precision. For the most part, the technology does not incorporate transgenic modifications and is far superior to conventional chemical mutagenesis. In this study we bring into focus some of the underlying differences between the 3 existing technologies: classical plant breeding, genetic modification and genome editing. We discuss some of the main achievements from each area and highlight their common characteristics and individual limitations, while emphasizing the unique capabilities of genome editing. We subsequently examine the possible regulatory mechanisms which governments may be inclined to use in assessing the status of genome edited products. If assessed on the basis of their phenotype rather than the process by which they are obtained, these products will be categorized as equivalent to those produced by classical mutagenesis. This would mean that genome edited products will not be subject to the restrictions imposed on genetically modified products, except in some cases where the mutation involves a large sequence insertion into the genome. We conclude by examining the potential of societal acceptance of genome editing technology, reinforced by a scientific perspective on promoting such acceptance.

  19. Endozoicomonas genomes reveal functional adaptation and plasticity in bacterial strains symbiotically associated with diverse marine hosts

    PubMed Central

    Neave, Matthew J.; Michell, Craig T.; Apprill, Amy; Voolstra, Christian R.

    2017-01-01

    Endozoicomonas bacteria are globally distributed and often abundantly associated with diverse marine hosts including reef-building corals, yet their function remains unknown. In this study we generated novel Endozoicomonas genomes from single cells and metagenomes obtained directly from the corals Stylophora pistillata, Pocillopora verrucosa, and Acropora humilis. We then compared these culture-independent genomes to existing genomes of bacterial isolates acquired from a sponge, sea slug, and coral to examine the functional landscape of this enigmatic genus. Sequencing and analysis of single cells and metagenomes resulted in four novel genomes with 60–76% and 81–90% genome completeness, respectively. These data also confirmed that Endozoicomonas genomes are large and are not streamlined for an obligate endosymbiotic lifestyle, implying that they have free-living stages. All genomes show an enrichment of genes associated with carbon sugar transport and utilization and protein secretion, potentially indicating that Endozoicomonas contribute to the cycling of carbohydrates and the provision of proteins to their respective hosts. Importantly, besides these commonalities, the genomes showed evidence for differential functional specificity and diversification, including genes for the production of amino acids. Given this metabolic diversity of Endozoicomonas we propose that different genotypes play disparate roles and have diversified in concert with their hosts. PMID:28094347

  20. Genome editing of crops: A renewed opportunity for food security

    PubMed Central

    Georges, Fawzy

    2017-01-01

    ABSTRACT Genome editing of crop plants is a rapidly advancing technology whereby targeted mutations can be introduced into a plant genome in a highly specific manner and with great precision. For the most part, the technology does not incorporate transgenic modifications and is far superior to conventional chemical mutagenesis. In this study we bring into focus some of the underlying differences between the 3 existing technologies: classical plant breeding, genetic modification and genome editing. We discuss some of the main achievements from each area and highlight their common characteristics and individual limitations, while emphasizing the unique capabilities of genome editing. We subsequently examine the possible regulatory mechanisms which governments may be inclined to use in assessing the status of genome edited products. If assessed on the basis of their phenotype rather than the process by which they are obtained, these products will be categorized as equivalent to those produced by classical mutagenesis. This would mean that genome edited products will not be subject to the restrictions imposed on genetically modified products, except in some cases where the mutation involves a large sequence insertion into the genome. We conclude by examining the potential of societal acceptance of genome editing technology, reinforced by a scientific perspective on promoting such acceptance. PMID:28075688

  1. A variant of Rubus yellow net virus with altered genomic organization.

    PubMed

    Diaz-Lara, Alfredo; Mosier, Nola J; Keller, Karen E; Martin, Robert R

    2015-02-01

    Rubus yellow net virus (RYNV) is a member of the genus Badnavirus (family: Caulimoviridae). RYNV infects Rubus species causing chlorosis of the tissue along the leaf veins, giving an unevenly distributed netted symptom in some cultivars of red and black raspberry. Recently, a strain of RYNV was sequenced from a Rubus idaeus plant in Alberta, Canada, exhibiting such symptoms. The viral genome contained seven open reading frames (ORFs) with five of them in the sense-strand, including a large polyprotein. Here we describe a graft-transmissible strain of RYNV from Europe infecting cultivar 'Baumforth's Seedling A' (named RYNV-BS), which was sequenced using rolling circle amplification, enzymatic digestion, cloning and primer walking, and it was resequenced at a 5X coverage. This sequence was then compared with the RYNV-Ca genome and significant differences were observed. Genomic analysis identified differences in the arrangement of coding regions, promoter elements, and presence of motifs. The genomic organization of RYNV-BS consisted of five ORFs (four ORFs in the sense-strand and one ORF in the antisense-strand). ORFs 1, 2, and 3 showed a high degree of homology to RYNV-Ca, while ORFs 4 and 6 of RYNV-BS were quite distinct. Also, the predicted ORFs 5 and 7 in the RYNV-Ca were absent in the RYNV-BS sequence. These differences may account for the lack of aphid transmissibility of RYNV-BS.

  2. Genomic rearrangements and signatures of breeding in the allo-octoploid strawberry as revealed through an allele dose based SSR linkage map

    PubMed Central

    2014-01-01

    Background Breeders in the allo-octoploid strawberry currently make little use of molecular marker tools. As a first step of a QTL discovery project on fruit quality traits and resistance to soil-borne pathogens such as Phytophthora cactorum and Verticillium we built a genome-wide SSR linkage map for the cross Holiday x Korona. We used the previously published MADCE method to obtain full haplotype information for both of the parental cultivars, facilitating in-depth studies on their genomic organisation. Results The linkage map incorporates 508 segregating loci and represents each of the 28 chromosome pairs of octoploid strawberry, spanning an estimated length of 2050 cM. The sub-genomes are denoted according to their sequence divergence from F. vesca as revealed by marker performance. The map revealed high overall synteny between the sub-genomes, but also revealed two large inversions on LG2C and LG2D, of which the latter was confirmed using a separate mapping population. We discovered interesting breeding features within the parental cultivars by in-depth analysis of our haplotype data. The linkage map-derived homozygosity level of Holiday was similar to the pedigree-derived inbreeding level (33% and 29%, respectively). For Korona we found that the observed homozygosity level was over three times higher than expected from the pedigree (13% versus 3.6%). This could indicate selection pressure on genes that have favourable effects in homozygous states. The level of kinship between Holiday and Korona derived from our linkage map was 2.5 times higher than the pedigree-derived value. This large difference could be evidence of selection pressure enacted by strawberry breeders towards specific haplotypes. Conclusion The obtained SSR linkage map provides a good base for QTL discovery. It also provides the first biologically relevant basis for the discernment and notation of sub-genomes. For the first time, we revealed genomic rearrangements that were verified in a separate mapping population. We believe that haplotype information will become increasingly important in identifying marker-trait relationships and regions that are under selection pressure within breeding material. Our attempt at providing a biological basis for the discernment of sub-genomes warrants follow-up studies to streamline the naming of the sub-genomes among different octoploid strawberry maps. PMID:24581289

  3. Genomic rearrangements and signatures of breeding in the allo-octoploid strawberry as revealed through an allele dose based SSR linkage map.

    PubMed

    van Dijk, Thijs; Pagliarani, Giulia; Pikunova, Anna; Noordijk, Yolanda; Yilmaz-Temel, Hulya; Meulenbroek, Bert; Visser, Richard G F; van de Weg, Eric

    2014-03-01

    Breeders in the allo-octoploid strawberry currently make little use of molecular marker tools. As a first step of a QTL discovery project on fruit quality traits and resistance to soil-borne pathogens such as Phytophthora cactorum and Verticillium we built a genome-wide SSR linkage map for the cross Holiday x Korona. We used the previously published MADCE method to obtain full haplotype information for both of the parental cultivars, facilitating in-depth studies on their genomic organisation. The linkage map incorporates 508 segregating loci and represents each of the 28 chromosome pairs of octoploid strawberry, spanning an estimated length of 2050 cM. The sub-genomes are denoted according to their sequence divergence from F. vesca as revealed by marker performance. The map revealed high overall synteny between the sub-genomes, but also revealed two large inversions on LG2C and LG2D, of which the latter was confirmed using a separate mapping population. We discovered interesting breeding features within the parental cultivars by in-depth analysis of our haplotype data. The linkage map-derived homozygosity level of Holiday was similar to the pedigree-derived inbreeding level (33% and 29%, respectively). For Korona we found that the observed homozygosity level was over three times higher than expected from the pedigree (13% versus 3.6%). This could indicate selection pressure on genes that have favourable effects in homozygous states. The level of kinship between Holiday and Korona derived from our linkage map was 2.5 times higher than the pedigree-derived value. This large difference could be evidence of selection pressure enacted by strawberry breeders towards specific haplotypes. The obtained SSR linkage map provides a good base for QTL discovery. It also provides the first biologically relevant basis for the discernment and notation of sub-genomes. For the first time, we revealed genomic rearrangements that were verified in a separate mapping population. 
We believe that haplotype information will become increasingly important in identifying marker-trait relationships and regions that are under selection pressure within breeding material. Our attempt at providing a biological basis for the discernment of sub-genomes warrants follow-up studies to streamline the naming of the sub-genomes among different octoploid strawberry maps.

  4. EUPAN enables pan-genome studies of a large number of eukaryotic genomes.

    PubMed

    Hu, Zhiqiang; Sun, Chen; Lu, Kuang-Chen; Chu, Xixia; Zhao, Yue; Lu, Jinyuan; Shi, Jianxin; Wei, Chaochun

    2017-08-01

    Pan-genome analyses are routinely carried out for bacteria to interpret the within-species gene presence/absence variations (PAVs). However, pan-genome analyses are rare for eukaryotes due to the large sizes and higher complexities of their genomes. Here we proposed EUPAN, a eukaryotic pan-genome analysis toolkit, enabling automatic large-scale eukaryotic pan-genome analyses and detection of gene PAVs at a relatively low sequencing depth. In the previous studies, we demonstrated the effectiveness and high accuracy of EUPAN in the pan-genome analysis of 453 rice genomes, in which we also revealed widespread gene PAVs among individual rice genomes. Moreover, EUPAN can be directly applied to the current re-sequencing projects primarily focusing on single nucleotide polymorphisms. EUPAN is implemented in Perl, R and C++. It is supported under Linux and preferred for a computer cluster with LSF and SLURM job scheduling system. EUPAN together with its standard operating procedure (SOP) is freely available for non-commercial use (CC BY-NC 4.0) at http://cgm.sjtu.edu.cn/eupan/index.html. Contact: ccwei@sjtu.edu.cn or jianxin.shi@sjtu.edu.cn. Supplementary data are available at Bioinformatics online. © The Author (2017). Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com

  5. Genome resequencing in Populus: Revealing large-scale genome variation and implications on specialized-trait genomics

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Muchero, Wellington; Labbe, Jessy L; Priya, Ranjan

    2014-01-01

    To date, Populus ranks among a few plant species with a complete genome sequence and other highly developed genomic resources. With the first genome sequence among all tree species, Populus has been adopted as a suitable model organism for genomic studies in trees. However, far from being just a model species, Populus is a key renewable economic resource that plays a significant role in providing raw materials for the biofuel and pulp and paper industries. Therefore, aside from leading frontiers of basic tree molecular biology and ecological research, Populus leads frontiers in addressing global economic challenges related to fuel and fiber production. The latter fact suggests that research aimed at improving quality and quantity of Populus as a raw material will likely drive the pursuit of more targeted and deeper research in order to unlock the economic potential tied in molecular biology processes that drive this tree species. Advances in genome sequence-driven technologies, such as resequencing individual genotypes, which in turn facilitates large scale SNP discovery and identification of large scale polymorphisms are key determinants of future success in these initiatives. In this treatise we discuss implications of genome sequence-enabled technologies on Populus genomic and genetic studies of complex and specialized-traits.

  6. The complete genomes of Lactobacillus plantarum and Lactobacillus johnsonii reveal extensive differences in chromosome organization and gene content.

    PubMed

    Boekhorst, Jos; Siezen, Roland J; Zwahlen, Marie-Camille; Vilanova, David; Pridmore, Raymond D; Mercenier, Annick; Kleerebezem, Michiel; de Vos, Willem M; Brüssow, Harald; Desiere, Frank

    2004-11-01

    The first comprehensive comparative analysis of lactobacilli was done by comparing the genomes of Lactobacillus plantarum (3.3 Mb) and Lactobacillus johnsonii (2.0 Mb). L. johnsonii is predominantly found in the gastrointestinal tract, while L. plantarum is also found on plants and plant-derived material, and is used in a variety of industrial fermentations. The L. plantarum and L. johnsonii chromosomes have only 28 regions with conservation of gene order, totalling about 0.75 Mb; these regions are not co-linear, indicating major chromosomal rearrangements. Metabolic reconstruction indicates many differences between L. johnsonii and L. plantarum: numerous enzymes involved in sugar metabolism and in biosynthesis of amino acids, nucleotides, fatty acids and cofactors are lacking in L. johnsonii. Major differences were seen in the number and types of putative extracellular proteins, which are of interest because of their possible role in host-microbe interactions. The differences between L. plantarum and L. johnsonii, both in genome organization and gene content, are exceptionally large for two bacteria of the same genus, emphasizing the difficulty in taxonomic classification of lactobacilli.

  7. Extracting DNA from 'jaws': high yield and quality from archived tiger shark (Galeocerdo cuvier) skeletal material.

    PubMed

    Nielsen, E E; Morgan, J A T; Maher, S L; Edson, J; Gauthier, M; Pepperell, J; Holmes, B J; Bennett, M B; Ovenden, J R

    2017-05-01

    Archived specimens are highly valuable sources of DNA for retrospective genetic/genomic analysis. However, often limited effort has been made to evaluate and optimize extraction methods, which may be crucial for downstream applications. Here, we assessed and optimized the usefulness of abundant archived skeletal material from sharks as a source of DNA for temporal genomic studies. Six different methods for DNA extraction, encompassing two different commercial kits and three different protocols, were applied to material, so-called bio-swarf, from contemporary and archived jaws and vertebrae of tiger sharks (Galeocerdo cuvier). Protocols were compared for DNA yield and quality using a qPCR approach. For jaw swarf, all methods provided relatively high DNA yield and quality, while large differences in yield between protocols were observed for vertebrae. Similar results were obtained from samples of white shark (Carcharodon carcharias). Application of the optimized methods to 38 museum and private angler trophy specimens dating back to 1912 yielded sufficient DNA for downstream genomic analysis for 68% of the samples. No clear relationships between age of samples, DNA quality and quantity were observed, likely reflecting different preparation and storage methods for the trophies. Trial sequencing of DNA capture genomic libraries using 20 000 baits revealed that a significant proportion of captured sequences were derived from tiger sharks. This study demonstrates that archived shark jaws and vertebrae are potential high-yield sources of DNA for genomic-scale analysis. It also highlights that even for similar tissue types, a careful evaluation of extraction protocols can vastly improve DNA yield. © 2016 John Wiley & Sons Ltd.

  8. Whole-Genome Resequencing of Experimental Populations Reveals Polygenic Basis of Egg-Size Variation in Drosophila melanogaster

    PubMed Central

    Jha, Aashish R.; Miles, Cecelia M.; Lippert, Nodia R.; Brown, Christopher D.; White, Kevin P.; Kreitman, Martin

    2015-01-01

    Complete genome resequencing of populations holds great promise in deconstructing complex polygenic traits to elucidate molecular and developmental mechanisms of adaptation. Egg size is a classic adaptive trait in insects, birds, and other taxa, but its highly polygenic architecture has prevented high-resolution genetic analysis. We used replicated experimental evolution in Drosophila melanogaster and whole-genome sequencing to identify consistent signatures of polygenic egg-size adaptation. A generalized linear-mixed model revealed reproducible allele frequency differences between replicated experimental populations selected for large and small egg volumes at approximately 4,000 single nucleotide polymorphisms (SNPs). Several hundred distinct genomic regions contain clusters of these SNPs and have lower heterozygosity than the genomic background, consistent with selection acting on polymorphisms in these regions. These SNPs are also enriched among genes expressed in Drosophila ovaries and many of these genes have well-defined functions in Drosophila oogenesis. Additional genes regulating egg development, growth, and cell size show evidence of directional selection as genes regulating these biological processes are enriched for highly differentiated SNPs. Genetic crosses performed with a subset of candidate genes demonstrated that these genes influence egg size, at least in the large genetic background. These findings confirm the highly polygenic architecture of this adaptive trait, and suggest the involvement of many novel candidate genes in regulating egg size. PMID:26044351

  9. Orthogonal control of expression mean and variance by epigenetic features at different genomic loci

    DOE PAGES

    Dey, Siddharth S.; Foley, Jonathan E.; Limsirichai, Prajit; ...

    2015-05-05

    While gene expression noise has been shown to drive dramatic phenotypic variations, the molecular basis for this variability in mammalian systems is not well understood. Gene expression has been shown to be regulated by promoter architecture and the associated chromatin environment. However, the exact contribution of these two factors in regulating expression noise has not been explored. Using a dual-reporter lentiviral model system, we deconvolved the influence of the promoter sequence to systematically study the contribution of the chromatin environment at different genomic locations in regulating expression noise. By integrating a large-scale analysis to quantify mRNA levels by smFISH and protein levels by flow cytometry in single cells, we found that mean expression and noise are uncorrelated across genomic locations. Furthermore, we showed that this independence could be explained by the orthogonal control of mean expression by the transcript burst size and noise by the burst frequency. Finally, we showed that genomic locations displaying higher expression noise are associated with more repressed chromatin, thereby indicating the contribution of the chromatin environment in regulating expression noise.

  10. Modular architecture of the T4 phage superfamily: A conserved core genome and a plastic periphery

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Comeau, Andre M.; Bertrand, Claire; Letarov, Andrei

    2007-06-05

    Among the most numerous objects in the biosphere, phages show enormous diversity in morphology and genetic content. We have sequenced 7 T4-like phages and compared their genome architecture. All seven phages share a core genome with T4 that is interrupted by several hyperplastic regions (HPRs) where most of their divergence occurs. The core primarily includes homologues of essential T4 genes, such as the virion structure and DNA replication genes. In contrast, the HPRs contain mostly novel genes of unknown function and origin. A few of the HPR genes that can be assigned putative functions, such as a series of novel Internal Proteins, are implicated in phage adaptation to the host. Thus, the T4-like genome appears to be partitioned into discrete segments that fulfil different functions and behave differently in evolution. Such partitioning may be critical for these large and complex phages to maintain their flexibility, while simultaneously allowing them to conserve their highly successful virion design and mode of replication.

  11. Extensive Gene Remodeling in the Viral World: New Evidence for Nongradual Evolution in the Mobilome Network

    PubMed Central

    Jachiet, Pierre-Alain; Colson, Philippe; Lopez, Philippe; Bapteste, Eric

    2014-01-01

    Complex nongradual evolutionary processes such as gene remodeling are difficult to model, to visualize, and to investigate systematically. Despite these challenges, the creation of composite (or mosaic) genes by combination of genetic segments from unrelated gene families was established as an important adaptive phenomenon in eukaryotic genomes. In contrast, almost no general studies have been conducted to quantify composite genes in viruses. Although viral genome mosaicism has been well-described, the extent of gene mosaicism and its rules of emergence remain largely unexplored. Applying methods from graph theory to inclusive similarity networks, and using data from more than 3,000 complete viral genomes, we provide the first demonstration that composite genes in viruses are 1) functionally biased, 2) involved in key aspects of the arms race between cells and viruses, and 3) can be classified into two distinct types of composite genes in all viral classes. Beyond the quantification of the widespread recombination of genes among different viruses of the same class, we also report a striking sharing of genetic information between viruses of different classes and with different nucleic acid types. This latter discovery provides novel evidence for the existence of a large and complex mobilome network, which appears partly bound by the sharing of genetic information and by the formation of composite genes between mobile entities with different genetic material. Considering that there are around 10E31 viruses on the planet, gene remodeling appears as a hugely significant way of generating and moving novel sequences between different kinds of organisms on Earth. PMID:25104113

  12. A DNA-based pattern classifier with in vitro learning and associative recall for genomic characterization and biosensing without explicit sequence knowledge.

    PubMed

    Lee, Ju Seok; Chen, Junghuei; Deaton, Russell; Kim, Jin-Woo

    2014-01-01

    Genetic material extracted from in situ microbial communities has high promise as an indicator of biological system status. However, the challenge is to access genomic information from all organisms at the population or community scale to monitor the biosystem's state. Hence, there is a need for a better diagnostic tool that provides a holistic view of a biosystem's genomic status. Here, we introduce an in vitro methodology for genomic pattern classification of biological samples that taps large amounts of genetic information from all genes present and uses that information to detect changes in genomic patterns and classify them. We developed a biosensing protocol, termed Biological Memory, that has in vitro computational capabilities to "learn" and "store" genomic sequence information directly from genomic samples without knowledge of their explicit sequences, and that discovers differences in vitro between previously unknown inputs and learned memory molecules. The Memory protocol was designed and optimized based upon (1) common in vitro recombinant DNA operations using 20-base random probes, including polymerization, nuclease digestion, and magnetic bead separation, to capture a snapshot of the genomic state of a biological sample as a DNA memory and (2) the thermal stability of DNA duplexes between new input and the memory to detect similarities and differences. For efficient read out, a microarray was used as an output method. When the microarray-based Memory protocol was implemented to test its capability and sensitivity using genomic DNA from two model bacterial strains, i.e., Escherichia coli K12 and Bacillus subtilis, results indicate that the Memory protocol can "learn" input DNA, "recall" similar DNA, differentiate between dissimilar DNA, and detect relatively small concentration differences in samples. 
This study demonstrated not only the in vitro information processing capabilities of DNA, but also its promise as a genomic pattern classifier that could access information from all organisms in a biological system without explicit genomic information. The Memory protocol has high potential for many applications, including in situ biomonitoring of ecosystems, screening for diseases, biosensing of pathological features in water and food supplies, and non-biological information processing of memory devices, among many.

  13. Mining of the Uncharacterized Cytochrome P450 Genes Involved in Alkaloid Biosynthesis in California Poppy Using a Draft Genome Sequence

    PubMed Central

    Hori, Kentaro; Yamada, Yasuyuki; Purwanto, Ratmoyo; Minakuchi, Yohei; Toyoda, Atsushi; Hirakawa, Hideki

    2018-01-01

    Land plants produce specialized low molecular weight metabolites to adapt to various environmental stressors, such as UV radiation, pathogen infection, wounding and animal feeding damage. Due to the large variety of stresses, plants produce various chemicals, particularly plant species-specific alkaloids, through specialized biosynthetic pathways. In this study, using a draft genome sequence and querying known biosynthetic cytochrome P450 (P450) enzyme-encoding genes, we characterized the P450 genes involved in benzylisoquinoline alkaloid (BIA) biosynthesis in California poppy (Eschscholzia californica), as P450s are key enzymes involved in the diversification of specialized metabolism. Our in silico studies showed that all identified enzyme-encoding genes involved in BIA biosynthesis were found in the draft genome sequence of approximately 489 Mb, which covered approximately 97% of the whole genome (502 Mb). Further analyses showed that some P450 families involved in BIA biosynthesis, i.e. the CYP80, CYP82 and CYP719 families, were more enriched in the genome of E. californica than in the genome of Arabidopsis thaliana, a plant that does not produce BIAs. CYP82 family genes were highly abundant, so we measured the expression of CYP82 genes with respect to alkaloid accumulation in different plant tissues and two cell lines whose BIA production differs to estimate the functions of the genes. Further characterization revealed two highly homologous P450s (CYP82P2 and CYP82P3) that exhibited 10-hydroxylase activities with different substrate specificities. Here, we discuss the evolution of the P450 genes and the potential for further genome mining of the genes encoding the enzymes involved in BIA biosynthesis. PMID:29301019

  14. Genome-wide analysis of LTR-retrotransposon diversity and its impact on the evolution of the genus Helianthus (L.).

    PubMed

    Mascagni, Flavia; Giordani, Tommaso; Ceccarelli, Marilena; Cavallini, Andrea; Natali, Lucia

    2017-08-18

    Genome divergence by mobile elements activity and recombination is a continuous process that plays a key role in the evolution of species. Nevertheless, knowledge on retrotransposon-related variability among species belonging to the same genus is still limited. Considering the importance of the genus Helianthus, a model system for studying the ecological genetics of speciation and adaptation, we performed a comparative analysis of the repetitive genome fraction across ten species and one subspecies of sunflower, focusing on long terminal repeat retrotransposons at superfamily, lineage and sublineage levels. After determining the relative genome size of each species, genomic DNA was isolated and subjected to Illumina sequencing. Then, different assembling and clustering approaches allowed exploring the repetitive component of all genomes. On average, repetitive DNA in Helianthus species represented more than 75% of the genome, being composed mostly by long terminal repeat retrotransposons. Also, the prevalence of Gypsy over Copia superfamily was observed and, among lineages, Chromovirus was by far the most represented. Although nearly all the same sublineages are present in all species, we found considerable variability in the abundance of diverse retrotransposon lineages and sublineages, especially between annual and perennial species. This large variability should indicate that different events of amplification or loss related to these elements occurred following species separation and should have been involved in species differentiation. Our data allowed us inferring on the extent of interspecific repetitive DNA variation related to LTR-RE abundance, investigating the relationship between changes of LTR-RE abundance and the evolution of the genus, and determining the degree of coevolution of different LTR-RE lineages or sublineages between and within species. 
Moreover, the data suggested that LTR-RE abundance in a species was affected by the annual or perennial habit of that species.

  15. Interspecific hybridization as a genomic stressor inducing mobilization of transposable elements in Drosophila

    PubMed Central

    Guerreiro, Maria Pilar García

    2014-01-01

    Transposable elements (TEs) are DNA sequences able to be mobilized in host genomes. They are currently recognized as the major mutation inducers because of their insertion in the target, their effect on neighboring regions, or their ectopic recombination. A large number of factors including chemical and physical factors as well as intraspecific crosses have traditionally been identified as inducers of transposition. Besides environmental factors, interspecific crosses have also been proposed as promoters of transposition of particular TEs in plants and different animals. Our previous published work includes a genome-wide survey with the set of genomic TEs and shows that interspecific hybridization between the species Drosophila buzzatii and Drosophila koepferae induces genomic instability by transposition bursts. A high percentage of this instability corresponds to TEs belonging to classes I and II. The detailed study of three TEs (Osvaldo, Helena, and Galileo), representative of the different TE families, shows an increase of transposition in hybrids compared with parental species, that varies depending on the element. This study suggests ample variation in TE regulation mechanisms and the question is why this variation occurs. Interspecific hybridization is a genomic stressor that disrupts the stability of TEs probably contributing to a relaxation of the mechanisms controlling TEs in the Drosophila genome. In this commentary paper we will discuss these results and the molecular mechanisms that could explain these increases of transposition rates observed in interspecific Drosophila hybrids. PMID:25136509

  16. COGNAT: a web server for comparative analysis of genomic neighborhoods.

    PubMed

    Klimchuk, Olesya I; Konovalov, Kirill A; Perekhvatov, Vadim V; Skulachev, Konstantin V; Dibrova, Daria V; Mulkidjanian, Armen Y

    2017-11-22

    In prokaryotic genomes, functionally coupled genes can be organized in conserved gene clusters enabling their coordinated regulation. Such clusters could contain one or several operons, which are groups of co-transcribed genes. Those genes that evolved from a common ancestral gene by speciation (i.e. orthologs) are expected to have similar genomic neighborhoods in different organisms, whereas those copies of the gene that are responsible for dissimilar functions (i.e. paralogs) could be found in dissimilar genomic contexts. Comparative analysis of genomic neighborhoods facilitates the prediction of co-regulated genes and helps to discern different functions in large protein families. We intended, building on the attribution of gene sequences to the clusters of orthologous groups of proteins (COGs), to provide a method for visualization and comparative analysis of genomic neighborhoods of evolutionary related genes, as well as a respective web server. Here we introduce the COmparative Gene Neighborhoods Analysis Tool (COGNAT), a web server for comparative analysis of genomic neighborhoods. The tool is based on the COG database, as well as the Pfam protein families database. As an example, we show the utility of COGNAT in identifying a new type of membrane protein complex that is formed by paralog(s) of one of the membrane subunits of the NADH:quinone oxidoreductase of type 1 (COG1009) and a cytoplasmic protein of unknown function (COG3002). This article was reviewed by Drs. Igor Zhulin, Uri Gophna and Igor Rogozin.

  17. Cancer Genomics: Integrative and Scalable Solutions in R / Bioconductor | Informatics Technology for Cancer Research (ITCR)

    Cancer.gov

    This proposal develops scalable R / Bioconductor software infrastructure and data resources to integrate complex, heterogeneous, and large cancer genomic experiments. The falling cost of genomic assays facilitates collection of multiple data types (e.g., gene and transcript expression, structural variation, copy number, methylation, and microRNA data) from a set of clinical specimens. Furthermore, substantial resources are now available from large consortium activities like The Cancer Genome Atlas (TCGA).

  18. Idiosyncratic Genome Degradation in a Bacterial Endosymbiont of Periodical Cicadas.

    PubMed

    Campbell, Matthew A; Łukasik, Piotr; Simon, Chris; McCutcheon, John P

    2017-11-20

    When a free-living bacterium transitions to a host-beneficial endosymbiotic lifestyle, it almost invariably loses a large fraction of its genome [1, 2]. The resulting small genomes often become stable in size, structure, and coding capacity [3-5], as exemplified by Sulcia muelleri, a nutritional endosymbiont of cicadas. Sulcia's partner endosymbiont, Hodgkinia cicadicola, similarly remains co-linear in some cicadas diverged by millions of years [6, 7]. But in the long-lived periodical cicada Magicicada tredecim, the Hodgkinia genome has split into dozens of tiny, gene-sparse circles that sometimes reside in distinct Hodgkinia cells [8]. Previous data suggested that all other Magicicada species harbor complex Hodgkinia populations, but the timing, number of origins, and outcomes of the splitting process were unknown. Here, by sequencing Hodgkinia metagenomes from the remaining six Magicicada and two sister species, we show that each Magicicada species harbors Hodgkinia populations of at least 20 genomic circles. We find little synteny among the 256 Hodgkinia circles analyzed except between the most closely related cicada species. Gene phylogenies show multiple Hodgkinia lineages in the common ancestor of Magicicada and its closest known relatives but that most splitting has occurred within Magicicada and has given rise to highly variable Hodgkinia gene dosages among species. These data show that Hodgkinia genome degradation has proceeded down different paths in different Magicicada species and support a model of genomic degradation that is stochastic in outcome and nonadaptive for the host. These patterns mirror the genomic instability seen in some mitochondria. Copyright © 2017 Elsevier Ltd. All rights reserved.

  19. Sequence comparison of prefrontal cortical brain transcriptome from a tame and an aggressive silver fox (Vulpes vulpes).

    PubMed

    Kukekova, Anna V; Johnson, Jennifer L; Teiling, Clotilde; Li, Lewyn; Oskina, Irina N; Kharlamova, Anastasiya V; Gulevich, Rimma G; Padte, Ravee; Dubreuil, Michael M; Vladimirova, Anastasiya V; Shepeleva, Darya V; Shikhevich, Svetlana G; Sun, Qi; Ponnala, Lalit; Temnykh, Svetlana V; Trut, Lyudmila N; Acland, Gregory M

    2011-10-03

    Two strains of the silver fox (Vulpes vulpes), with markedly different behavioral phenotypes, have been developed by long-term selection for behavior. Foxes from the tame strain exhibit friendly behavior towards humans, paralleling the sociability of canine puppies, whereas foxes from the aggressive strain are defensive and exhibit aggression to humans. To understand the genetic differences underlying these behavioral phenotypes fox-specific genomic resources are needed. cDNA from mRNA from pre-frontal cortex of a tame and an aggressive fox was sequenced using the Roche 454 FLX Titanium platform (>2.5 million reads & 0.9 Gbase of tame fox sequence; >3.3 million reads & 1.2 Gbase of aggressive fox sequence). Over 80% of the fox reads were assembled into contigs. Mapping fox reads against the fox transcriptome assembly and the dog genome identified over 30,000 high confidence fox-specific SNPs. Fox transcripts for approximately 14,000 genes were identified using SwissProt and the dog RefSeq databases. An at least 2-fold expression difference between the two samples (p < 0.05) was observed for 335 genes, fewer than 3% of the total number of genes identified in the fox transcriptome. Transcriptome sequencing significantly expanded genomic resources available for the fox, a species without a sequenced genome. In a very cost-efficient manner this yielded a large number of fox-specific SNP markers for genetic studies and provided significant insights into the gene expression profile of the fox pre-frontal cortex; expression differences between the two fox samples; and a catalogue of potentially important gene-specific sequence variants. This result demonstrates the utility of this approach for developing genomic resources in species with limited genomic information.

  20. Sequence comparison of prefrontal cortical brain transcriptome from a tame and an aggressive silver fox (Vulpes vulpes)

    PubMed Central

    2011-01-01

    Background Two strains of the silver fox (Vulpes vulpes), with markedly different behavioral phenotypes, have been developed by long-term selection for behavior. Foxes from the tame strain exhibit friendly behavior towards humans, paralleling the sociability of canine puppies, whereas foxes from the aggressive strain are defensive and exhibit aggression to humans. To understand the genetic differences underlying these behavioral phenotypes fox-specific genomic resources are needed. Results cDNA from mRNA from pre-frontal cortex of a tame and an aggressive fox was sequenced using the Roche 454 FLX Titanium platform (> 2.5 million reads & 0.9 Gbase of tame fox sequence; >3.3 million reads & 1.2 Gbase of aggressive fox sequence). Over 80% of the fox reads were assembled into contigs. Mapping fox reads against the fox transcriptome assembly and the dog genome identified over 30,000 high confidence fox-specific SNPs. Fox transcripts for approximately 14,000 genes were identified using SwissProt and the dog RefSeq databases. An at least 2-fold expression difference between the two samples (p < 0.05) was observed for 335 genes, fewer than 3% of the total number of genes identified in the fox transcriptome. Conclusions Transcriptome sequencing significantly expanded genomic resources available for the fox, a species without a sequenced genome. In a very cost efficient manner this yielded a large number of fox-specific SNP markers for genetic studies and provided significant insights into the gene expression profile of the fox pre-frontal cortex; expression differences between the two fox samples; and a catalogue of potentially important gene-specific sequence variants. This result demonstrates the utility of this approach for developing genomic resources in species with limited genomic information. PMID:21967120

  1. Lessons learnt on the analysis of large sequence data in animal genomics.

    PubMed

    Biscarini, F; Cozzi, P; Orozco-Ter Wengel, P

    2018-04-06

    The 'omics revolution has made a large amount of sequence data available to researchers and the industry. This has had a profound impact in the field of bioinformatics, stimulating unprecedented advancements in this discipline. This is usually looked at from the perspective of human 'omics, in particular human genomics. Plant and animal genomics, however, have also been deeply influenced by next-generation sequencing technologies, with several genomics applications now popular among researchers and the breeding industry. Genomics tends to generate huge amounts of data, and genomic sequence data account for an increasing proportion of big data in biological sciences, due largely to decreasing sequencing and genotyping costs and to large-scale sequencing and resequencing projects. The analysis of big data poses a challenge to scientists, as data gathering currently takes place at a faster pace than does data processing and analysis, and the associated computational burden is increasingly taxing, making even simple manipulation, visualization and transferring of data a cumbersome operation. The time consumed by the processing and analysing of huge data sets may be at the expense of data quality assessment and critical interpretation. Additionally, when analysing lots of data, something is likely to go awry (the software may crash or stop), and it can be very frustrating to track down the error. We herein review the most relevant issues related to tackling these challenges and problems, from the perspective of animal genomics, and provide researchers that lack extensive computing experience with guidelines that will help when processing large genomic data sets. © 2018 Stichting International Foundation for Animal Genetics.

  2. Evidence for inter-specific recombination among the mitochondrial genomes of Fusarium species in the Gibberella fujikuroi complex.

    PubMed

    Fourie, Gerda; van der Merwe, Nicolaas A; Wingfield, Brenda D; Bogale, Mesfin; Tudzynski, Bettina; Wingfield, Michael J; Steenkamp, Emma T

    2013-09-08

    The availability of mitochondrial genomes has allowed for the resolution of numerous questions regarding the evolutionary history of fungi and other eukaryotes. In the Gibberella fujikuroi species complex, the exact relationships among the so-called "African", "Asian" and "American" Clades remain largely unresolved, irrespective of the markers employed. In this study, we considered the feasibility of using mitochondrial genes to infer the phylogenetic relationships among Fusarium species in this complex. The mitochondrial genomes of representatives of the three Clades (Fusarium circinatum, F. verticillioides and F. fujikuroi) were characterized and we determined whether or not the mitochondrial genomes of these fungi have value in resolving the higher level evolutionary relationships in the complex. Overall, the mitochondrial genomes of the three species displayed a high degree of synteny, with all the genes (protein coding genes, unique ORFs, ribosomal RNA and tRNA genes) in identical order and orientation, as well as introns that share similar positions within genes. The intergenic regions and introns generally contributed significantly to the size differences and diversity observed among these genomes. Phylogenetic analysis of the concatenated protein-coding dataset separated members of the Gibberella fujikuroi complex from other Fusarium species and suggested that F. fujikuroi ("Asian" Clade) is basal in the complex. However, individual mitochondrial gene trees were largely incongruent with one another and with the concatenated gene tree, because six distinct phylogenetic trees were recovered from the various single gene datasets. The mitochondrial genomes of Fusarium species in the Gibberella fujikuroi complex are remarkably similar to those of the previously characterized Fusarium species and Sordariomycetes. Despite apparently representing a single replicative unit, all of the genes encoded on the mitochondrial genomes of these fungi do not share the same evolutionary history. This incongruence could be due to biased selection on some genes or recombination among mitochondrial genomes. The results thus suggest that the use of individual mitochondrial genes for phylogenetic inference could mask the true relationships between species in this complex.

  3. A High Quality Draft Consensus Sequence of the Genome of a Heterozygous Grapevine Variety

    PubMed Central

    Cartwright, Dustin A.; Cestaro, Alessandro; Pruss, Dmitry; Pindo, Massimo; FitzGerald, Lisa M.; Vezzulli, Silvia; Reid, Julia; Malacarne, Giulia; Iliev, Diana; Coppola, Giuseppina; Wardell, Bryan; Micheletti, Diego; Macalma, Teresita; Facci, Marco; Mitchell, Jeff T.; Perazzolli, Michele; Eldredge, Glenn; Gatto, Pamela; Oyzerski, Rozan; Moretto, Marco; Gutin, Natalia; Stefanini, Marco; Chen, Yang; Segala, Cinzia; Davenport, Christine; Demattè, Lorenzo; Mraz, Amy; Battilana, Juri; Stormo, Keith; Costa, Fabrizio; Tao, Quanzhou; Si-Ammour, Azeddine; Harkins, Tim; Lackey, Angie; Perbost, Clotilde; Taillon, Bruce; Stella, Alessandra; Solovyev, Victor; Fawcett, Jeffrey A.; Sterck, Lieven; Vandepoele, Klaas; Grando, Stella M.; Toppo, Stefano; Moser, Claudio; Lanchbury, Jerry; Bogden, Robert; Skolnick, Mark; Sgaramella, Vittorio; Bhatnagar, Satish K.; Fontana, Paolo; Gutin, Alexander; Van de Peer, Yves; Salamini, Francesco; Viola, Roberto

    2007-01-01

    Background Worldwide, grapes and their derived products have a large market. The cultivated grape species Vitis vinifera has potential to become a model for fruit tree genetics. Like many plant species, it is highly heterozygous, which is an additional challenge to modern whole genome shotgun sequencing. In this paper a high quality draft genome sequence of a cultivated clone of V. vinifera Pinot Noir is presented. Principal Findings We estimate the genome size of V. vinifera to be 504.6 Mb. Genomic sequences corresponding to 477.1 Mb were assembled in 2,093 metacontigs and 435.1 Mb were anchored to the 19 linkage groups (LGs). The number of predicted genes is 29,585, of which 96.1% were assigned to LGs. This assembly of the grape genome provides candidate genes implicated in traits relevant to grapevine cultivation, such as those influencing wine quality, via secondary metabolites, and those connected with the extreme susceptibility of grape to pathogens. Single nucleotide polymorphism (SNP) distribution was consistent with a diffuse haplotype structure across the genome. Of around 2,000,000 SNPs, 1,751,176 were mapped to chromosomes and one or more of them were identified in 86.7% of anchored genes. The relative age of grape duplicated genes was estimated and this made it possible to reveal a relatively recent Vitis-specific large scale duplication event concerning at least 10 chromosomes (duplication not reported before). Conclusions Sanger shotgun sequencing and highly efficient sequencing by synthesis (SBS), together with dedicated assembly programs, resolved a complex heterozygous genome. A consensus sequence of the genome and a set of mapped marker loci were generated. Homologous chromosomes of Pinot Noir differ by 11.2% of their DNA (hemizygous DNA plus chromosomal gaps). SNP markers are offered as a tool with the potential of introducing a new era in the molecular breeding of grape. PMID:18094749

  4. Rapid and Recent Evolution of LTR Retrotransposons Drives Rice Genome Evolution During the Speciation of AA-Genome Oryza Species

    PubMed Central

    Zhang, Qun-Jie; Gao, Li-Zhi

    2017-01-01

    The dynamics of long terminal repeat (LTR) retrotransposons and their contribution to genome evolution during plant speciation have remained largely unanswered. Here, we perform a genome-wide comparison of all eight Oryza AA-genome species, and identify 3911 intact LTR retrotransposons classified into 790 families. The top 44 most abundant LTR retrotransposon families show patterns of rapid and distinct diversification since the species split over the last ∼4.8 MY (million years). Phylogenetic and read depth analyses of 11 representative retrotransposon families further provide a comprehensive evolutionary landscape of these changes. Compared with Ty1-copia, independent bursts of Ty3-gypsy retrotransposon expansions have occurred with the three largest showing signatures of lineage-specific evolution. The estimated insertion times of 2213 complete retrotransposons from the top 23 most abundant families reveal divergent life histories marked by speedy accumulation, decline, and extinction that differed radically between species. We hypothesize that this rapid evolution of LTR retrotransposons not only divergently shaped the architecture of rice genomes but also contributed to the process of speciation and diversification of rice. PMID:28413161

  5. Inversions and inverted transpositions as the basis for an almost universal "format" of genome sequences.

    PubMed

    Albrecht-Buehler, Guenter

    2007-09-01

    In genome duplexes that exceed 100 kb the frequency distributions of their trinucleotides (triplet profiles) are the same in both strands. This remarkable symmetry, sometimes called Chargaff's second parity rule, is not the result of base pairing, but can be explained as the result of countless inversions and inverted transpositions that occurred throughout evolution (G. Albrecht-Buehler, 2006, Proc. Natl. Acad. Sci. USA 103, 17828-17833). Furthermore, comparing the triplet profiles of genomes from a large number of different taxa and species revealed that they were not only strand-symmetrical, but even surprisingly similar to one another (majority profile; G. Albrecht-Buehler, 2007, Genomics 89, 596-601). The present article proposes that the same inversion/transposition mechanism(s) that created the strand symmetry may also explain the existence of the majority profile. Thus they may be key factors in the creation of an almost universal "format" in which genome sequences are written. One may speculate that this universality of genome format may facilitate horizontal gene transfer and, thus, accelerate evolution.
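
    The strand symmetry described in this abstract can be checked numerically. The sketch below is an illustration only, not the author's method: it counts overlapping trinucleotides on a sequence and on its reverse complement and reports how far the two profiles differ. The function names and the distance measure (half the summed absolute count difference, divided by the number of triplets) are assumptions made for this example.

```python
from collections import Counter

# Translation table for complementing unambiguous DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")
BASES = set("ACGT")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def triplet_profile(seq):
    """Count every overlapping trinucleotide made only of A, C, G, T."""
    seq = seq.upper()
    counts = Counter(seq[i:i + 3] for i in range(len(seq) - 2))
    return {t: n for t, n in counts.items() if set(t) <= BASES}


def strand_asymmetry(seq):
    """Distance between the triplet profiles of the two strands.

    0.0 means the profiles are identical; 1.0 means they share no triplets.
    """
    seq = seq.upper()
    fwd = triplet_profile(seq)
    rev = triplet_profile(reverse_complement(seq))
    total = sum(fwd.values())
    if total == 0:
        return 0.0
    keys = set(fwd) | set(rev)
    diff = sum(abs(fwd.get(k, 0) - rev.get(k, 0)) for k in keys)
    return diff / (2 * total)
```

    On a real genome one would expect the value to fall towards zero as the sequence length grows past the roughly 100 kb scale mentioned in the abstract; short sequences can be arbitrarily asymmetric.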

  6. Transposable Elements in Human Cancer: Causes and Consequences of Deregulation.

    PubMed

    Anwar, Sumadi Lukman; Wulaningsih, Wahyu; Lehmann, Ulrich

    2017-05-04

    Transposable elements (TEs) comprise nearly half of the human genome and play an essential role in the maintenance of genomic stability, chromosomal architecture, and transcriptional regulation. TEs are repetitive sequences consisting of RNA transposons, DNA transposons, and endogenous retroviruses that can invade the human genome with a substantial contribution in human evolution and genomic diversity. TEs are therefore firmly regulated from early embryonic development and during the entire course of human life by epigenetic mechanisms, in particular DNA methylation and histone modifications. The deregulation of TEs has been reported in some developmental diseases, as well as for different types of human cancers. To date, the role of TEs, the mechanisms underlying TE reactivation, and the interplay with DNA methylation in human cancers remain largely unexplained. We reviewed the loss of epigenetic regulation and subsequent genomic instability, chromosomal aberrations, transcriptional deregulation, oncogenic activation, and aberrations of non-coding RNAs as the potential mechanisms underlying TE deregulation in human cancers.

  7. Transposable Elements in Human Cancer: Causes and Consequences of Deregulation

    PubMed Central

    Anwar, Sumadi Lukman; Wulaningsih, Wahyu; Lehmann, Ulrich

    2017-01-01

    Transposable elements (TEs) comprise nearly half of the human genome and play an essential role in the maintenance of genomic stability, chromosomal architecture, and transcriptional regulation. TEs are repetitive sequences consisting of RNA transposons, DNA transposons, and endogenous retroviruses that can invade the human genome with a substantial contribution in human evolution and genomic diversity. TEs are therefore firmly regulated from early embryonic development and during the entire course of human life by epigenetic mechanisms, in particular DNA methylation and histone modifications. The deregulation of TEs has been reported in some developmental diseases, as well as for different types of human cancers. To date, the role of TEs, the mechanisms underlying TE reactivation, and the interplay with DNA methylation in human cancers remain largely unexplained. We reviewed the loss of epigenetic regulation and subsequent genomic instability, chromosomal aberrations, transcriptional deregulation, oncogenic activation, and aberrations of non-coding RNAs as the potential mechanisms underlying TE deregulation in human cancers. PMID:28471386

  8. Contribution of Mobile Group II Introns to Sinorhizobium meliloti Genome Evolution.

    PubMed

    Toro, Nicolás; Martínez-Abarca, Francisco; Molina-Sánchez, María D; García-Rodríguez, Fernando M; Nisa-Martínez, Rafael

    2018-01-01

    Mobile group II introns are ribozymes and retroelements that probably originate from bacteria. Sinorhizobium meliloti, the nitrogen-fixing endosymbiont of legumes of genus Medicago, harbors a large number of these retroelements. One of these elements, RmInt1, has been particularly successful at colonizing this multipartite genome. Many studies have improved our understanding of RmInt1 and phylogenetically related group II introns, their mobility mechanisms, spread and dynamics within S. meliloti and closely related species. Although RmInt1 conserves the ancient retroelement behavior, its evolutionary history suggests that this group II intron has played a role in the short- and long-term evolution of the S. meliloti genome. We will discuss its proposed role in genome evolution by controlling the spread and coexistence of potentially harmful mobile genetic elements, by ectopic transposition to different genetic loci as a source of early genomic variation and by generating sequence variation after a very slow degradation process, through intron remnants that may have continued to evolve, contributing to bacterial speciation.

  9. Contribution of Mobile Group II Introns to Sinorhizobium meliloti Genome Evolution

    PubMed Central

    Toro, Nicolás; Martínez-Abarca, Francisco; Molina-Sánchez, María D.; García-Rodríguez, Fernando M.; Nisa-Martínez, Rafael

    2018-01-01

    Mobile group II introns are ribozymes and retroelements that probably originate from bacteria. Sinorhizobium meliloti, the nitrogen-fixing endosymbiont of legumes of genus Medicago, harbors a large number of these retroelements. One of these elements, RmInt1, has been particularly successful at colonizing this multipartite genome. Many studies have improved our understanding of RmInt1 and phylogenetically related group II introns, their mobility mechanisms, spread and dynamics within S. meliloti and closely related species. Although RmInt1 conserves the ancient retroelement behavior, its evolutionary history suggests that this group II intron has played a role in the short- and long-term evolution of the S. meliloti genome. We will discuss its proposed role in genome evolution by controlling the spread and coexistence of potentially harmful mobile genetic elements, by ectopic transposition to different genetic loci as a source of early genomic variation and by generating sequence variation after a very slow degradation process, through intron remnants that may have continued to evolve, contributing to bacterial speciation. PMID:29670598

  10. The Génolevures database.

    PubMed

    Martin, Tiphaine; Sherman, David J; Durrens, Pascal

    2011-01-01

    The Génolevures online database (URL: http://www.genolevures.org) stores and provides the data and results obtained by the Génolevures Consortium through several campaigns of genome annotation of the yeasts in the Saccharomycotina subphylum (hemiascomycetes). This database is dedicated to large-scale comparison of these genomes, storing not only the different chromosomal elements detected in the sequences, but also the logical relations between them. The database is divided into a public part, accessible to anyone through Internet, and a private part where the Consortium members make genome annotations with our Magus annotation system; this system is used to annotate several related genomes in parallel. The public database is widely consulted and offers structured data, organized using a REST web site architecture that allows for automated requests. The implementation of the database, as well as its associated tools and methods, is evolving to cope with the influx of genome sequences produced by Next Generation Sequencing (NGS). Copyright © 2011 Académie des sciences. Published by Elsevier SAS. All rights reserved.

  11. First complete genome sequence of infectious laryngotracheitis virus

    PubMed Central

    2011-01-01

    Background Infectious laryngotracheitis virus (ILTV) is an alphaherpesvirus that causes acute respiratory disease in chickens worldwide. To date, only one complete genomic sequence of ILTV has been reported. This sequence was generated by concatenating partial sequences from six different ILTV strains. Thus, the full genomic sequence of a single (individual) strain of ILTV has not been determined previously. This study aimed to use high throughput sequencing technology to determine the complete genomic sequence of a live attenuated vaccine strain of ILTV. Results The complete genomic sequence of the Serva vaccine strain of ILTV was determined, annotated and compared to the concatenated ILTV reference sequence. The genome size of the Serva strain was 152,628 bp, with a G + C content of 48%. A total of 80 predicted open reading frames were identified. The Serva strain had 96.5% DNA sequence identity with the concatenated ILTV sequence. Notably, the concatenated ILTV sequence was found to lack four large regions of sequence, including 528 bp and 594 bp of sequence in the UL29 and UL36 genes, respectively, and two copies of a 1,563 bp sequence in the repeat regions. Considerable differences in the size of the predicted translation products of 4 other genes (UL54, UL30, UL37 and UL38) were also identified. More than 530 single-nucleotide polymorphisms (SNPs) were identified. Most SNPs were located within three genomic regions, corresponding to sequence from the SA-2 ILTV vaccine strain in the concatenated ILTV sequence. Conclusions This is the first complete genomic sequence of an individual ILTV strain. This sequence will facilitate future comparative genomic studies of ILTV by providing an appropriate reference sequence for the sequence analysis of other ILTV strains. PMID:21501528

  12. Genome-Wide Search Identifies 1.9 Mb from the Polar Bear Y Chromosome for Evolutionary Analyses.

    PubMed

    Bidon, Tobias; Schreck, Nancy; Hailer, Frank; Nilsson, Maria A; Janke, Axel

    2015-05-27

    The male-inherited Y chromosome is the major haploid fraction of the mammalian genome, rendering Y-linked sequences an indispensable resource for evolutionary research. However, despite recent large-scale genome sequencing approaches, only a handful of Y chromosome sequences have been characterized to date, mainly in model organisms. Using polar bear (Ursus maritimus) genomes, we compare two different in silico approaches to identify Y-linked sequences: 1) Similarity to known Y-linked genes and 2) difference in the average read depth of autosomal versus sex chromosomal scaffolds. Specifically, we mapped available genomic sequencing short reads from a male and a female polar bear against the reference genome and identify 112 Y-chromosomal scaffolds with a combined length of 1.9 Mb. We verified the in silico findings for the longer polar bear scaffolds by male-specific in vitro amplification, demonstrating the reliability of the average read depth approach. The obtained Y chromosome sequences contain protein-coding sequences, single nucleotide polymorphisms, microsatellites, and transposable elements that are useful for evolutionary studies. A high-resolution phylogeny of the polar bear patriline shows two highly divergent Y chromosome lineages, obtained from analysis of the identified Y scaffolds in 12 previously published male polar bear genomes. Moreover, we find evidence of gene conversion among ZFX and ZFY sequences in the giant panda lineage and in the ancestor of ursine and tremarctine bears. Thus, the identification of Y-linked scaffold sequences from unordered genome sequences yields valuable data to infer phylogenomic and population-genomic patterns in bears. © The Author(s) 2015. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
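
    The read-depth approach in this abstract rests on a simple expectation: a Y-linked scaffold collects reads from the male sample but almost none from the female sample. The following is a minimal sketch of that idea, not the authors' pipeline. It assumes per-scaffold mean depths have already been computed and divided by each sample's genome-wide mean; the function name and both thresholds are hypothetical values chosen for illustration.

```python
def y_linked_candidates(male_depth, female_depth,
                        max_ratio=0.3, min_male_depth=0.2):
    """Return scaffold names whose female-to-male depth ratio is low.

    male_depth, female_depth: dicts mapping scaffold name to normalised
    mean read depth (1.0 = the sample's genome-wide average).
    max_ratio and min_male_depth are illustrative thresholds, not
    values taken from the study.
    """
    candidates = []
    for name, m in male_depth.items():
        if m < min_male_depth:
            # Too little male coverage to say anything about this scaffold.
            continue
        f = female_depth.get(name, 0.0)
        if f / m <= max_ratio:
            candidates.append(name)
    return sorted(candidates)


# Toy input: an autosomal, an X-like, a Y-like and a poorly covered scaffold.
male = {"scaf1": 1.0, "scaf2": 0.5, "scaf3": 0.5, "scaf4": 0.01}
female = {"scaf1": 1.0, "scaf2": 1.0, "scaf3": 0.02}
hits = y_linked_candidates(male, female)
```

    In the toy input only `scaf3` is returned: the autosomal scaffold has equal depth in both sexes, the X-like scaffold has higher female depth, and the last scaffold is skipped for lack of male coverage. Candidates found this way would still need laboratory confirmation, as the abstract reports doing by male-specific amplification.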

  13. Genome-Wide Identification and Transferability of Microsatellite Markers between Palmae Species

    PubMed Central

    Xiao, Yong; Xia, Wei; Ma, Jianwei; Mason, Annaliese S.; Fan, Haikuo; Shi, Peng; Lei, Xintao; Ma, Zilong; Peng, Ming

    2016-01-01

    The Palmae family contains 202 genera and approximately 2800 species. Except for Elaeis guineensis and Phoenix dactylifera, almost no genetic and genomic information is available for Palmae species. Therefore, this is an obstacle to the conservation and genetic assessment of Palmae species, especially those that are currently endangered. The study was performed to develop a large number of microsatellite markers which can be used for genetic analysis in different Palmae species. Based on the assembled genome of E. guineensis and P. dactylifera, a total of 814 383 and 371 629 microsatellites were identified. Among these microsatellites identified in E. guineensis, 734 509 primer pairs could be designed from the flanking sequences of these microsatellites. The majority (618 762) of these designed primer pairs had in silico products in the genome of E. guineensis. These 618 762 primer pairs were subsequently used to in silico amplify the genome of P. dactylifera. A total of 7 265 conserved microsatellites were identified between E. guineensis and P. dactylifera. One hundred and thirty-five primer pairs flanking the conserved SSRs were stochastically selected and validated to have high cross-genera transferability, varying from 16.7 to 93.3% with an average of 73.7%. These genome-wide conserved microsatellite markers will provide a useful tool for genetic assessment and conservation of different Palmae species in the future. PMID:27826307
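
    Genome-wide microsatellite identification, the first step in this abstract, amounts to scanning a sequence for short motifs repeated in tandem. The sketch below uses a regular expression for this; it is not the tool used in the study, and the function name, motif length range (2 to 6 nt) and minimum repeat count are assumptions made for the example.

```python
import re


def find_microsatellites(seq, min_unit=2, max_unit=6, min_repeats=4):
    """Return (start, unit, repeats) for each perfect tandem repeat.

    The lazy quantifier makes the regex try the shortest motif first,
    so (AC)n is reported with unit "AC" rather than "ACAC".
    Single-base runs such as AAAA are skipped.
    """
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_unit, max_unit, min_repeats - 1)
    )
    hits = []
    for match in pattern.finditer(seq.upper()):
        unit = match.group(1)
        if len(set(unit)) == 1:
            # Homopolymer run matched as e.g. "AA" repeated; not reported.
            continue
        repeats = len(match.group(0)) // len(unit)
        hits.append((match.start(), unit, repeats))
    return hits


# Toy sequence: (AC)5, a run of ten A, then (CAT)4.
toy = "GGTT" + "AC" * 5 + "TTGG" + "A" * 10 + "CAT" * 4 + "G"
found = find_microsatellites(toy)
```

    Primer design on the flanking sequence and in silico amplification against a second genome, the later steps in the abstract, would build on the start positions returned here.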

  14. Genomics-Enabled Next-Generation Breeding Approaches for Developing System-Specific Drought Tolerant Hybrids in Maize

    PubMed Central

    Nepolean, Thirunavukkarsau; Kaul, Jyoti; Mukri, Ganapati; Mittal, Shikha

    2018-01-01

    Breeding science has contributed immensely to global food security. Several varieties and hybrids in different food crops including maize have been released through conventional breeding. The ever growing population, decreasing agricultural land, lowering water table, changing climate, and other variables pose a tremendous challenge to researchers to improve the production and productivity of food crops. Drought is one of the major problems to sustain and improve the productivity of food crops including maize in tropical and subtropical production systems. With the advent of novel genomics and breeding tools, the practice of breeding has changed tremendously in the last two decades. Drought tolerance is a combination of several component traits with a quantitative mode of inheritance. Rapid DNA and RNA sequencing tools and high-throughput SNP genotyping techniques, trait mapping, functional characterization, genomic selection, rapid generation advancement, and other tools are now available to understand the genetics of drought tolerance and to accelerate the breeding cycle. Informatics plays a complementary role by managing the big data generated from the large-scale genomics and breeding experiments. Genome editing is the latest technique to alter specific genes to improve the trait expression. Integration of novel genomics, next-generation breeding, and informatics tools will accelerate the stress breeding process and increase the genetic gain under different production systems. PMID:29696027

  15. TCGA4U: A Web-Based Genomic Analysis Platform To Explore And Mine TCGA Genomic Data For Translational Research.

    PubMed

    Huang, Zhenzhen; Duan, Huilong; Li, Haomin

    2015-01-01

    Large-scale human cancer genomics projects, such as TCGA, generated large genomics data for further study. Exploring and mining these data to obtain meaningful analysis results can help researchers find potential genomics alterations that intervene the development and metastasis of tumors. We developed a web-based gene analysis platform, named TCGA4U, which used statistics methods and models to help translational investigators explore, mine and visualize human cancer genomic characteristic information from the TCGA datasets. Furthermore, through Gene Ontology (GO) annotation and clinical data integration, the genomic data were transformed into biological process, molecular function, cellular component and survival curves to help researchers identify potential driver genes. Clinical researchers without expertise in data analysis will benefit from such a user-friendly genomic analysis platform.

  16. Living with Genome Instability: the Adaptation of Phytoplasmas to Diverse Environments of Their Insect and Plant Hosts

    PubMed Central

    Bai, Xiaodong; Zhang, Jianhua; Ewing, Adam; Miller, Sally A.; Jancso Radek, Agnes; Shevchenko, Dmitriy V.; Tsukerman, Kiryl; Walunas, Theresa; Lapidus, Alla; Campbell, John W.; Hogenhout, Saskia A.

    2006-01-01

    Phytoplasmas (“Candidatus Phytoplasma,” class Mollicutes) cause disease in hundreds of economically important plants and are obligately transmitted by sap-feeding insects of the order Hemiptera, mainly leafhoppers and psyllids. The 706,569-bp chromosome and four plasmids of aster yellows phytoplasma strain witches' broom (AY-WB) were sequenced and compared to the onion yellows phytoplasma strain M (OY-M) genome. The phytoplasmas have small repeat-rich genomes. This comparative analysis revealed that the repeated DNAs are organized into large clusters of potential mobile units (PMUs), which contain tra5 insertion sequences (ISs) and genes for specialized sigma factors and membrane proteins. So far, these PMUs appear to be unique to phytoplasmas. Compared to mycoplasmas, phytoplasmas lack several recombination and DNA modification functions, and therefore, phytoplasmas may use different mechanisms of recombination, likely involving PMUs, for the creation of variability, allowing phytoplasmas to adjust to the diverse environments of plants and insects. The irregular GC skews and the presence of ISs and large repeated sequences in the AY-WB and OY-M genomes are indicative of high genomic plasticity. Nevertheless, segments of ∼250 kb located between the lplA and glnQ genes are syntenic between the two phytoplasmas and contain the majority of the metabolic genes and no ISs. AY-WB appears to be further along in the reductive evolution process than OY-M. The AY-WB genome is ∼154 kb smaller than the OY-M genome, primarily as a result of fewer multicopy sequences, including PMUs. Furthermore, AY-WB lacks genes that are truncated and are part of incomplete pathways in OY-M. PMID:16672622

  17. An integrative strategy to identify the entire protein coding potential of prokaryotic genomes by proteogenomics.

    PubMed

    Omasits, Ulrich; Varadarajan, Adithi R; Schmid, Michael; Goetze, Sandra; Melidis, Damianos; Bourqui, Marc; Nikolayeva, Olga; Québatte, Maxime; Patrignani, Andrea; Dehio, Christoph; Frey, Juerg E; Robinson, Mark D; Wollscheid, Bernd; Ahrens, Christian H

    2017-12-01

    Accurate annotation of all protein-coding sequences (CDSs) is an essential prerequisite to fully exploit the rapidly growing repertoire of completely sequenced prokaryotic genomes. However, large discrepancies among the number of CDSs annotated by different resources, missed functional short open reading frames (sORFs), and overprediction of spurious ORFs represent serious limitations. Our strategy toward accurate and complete genome annotation consolidates CDSs from multiple reference annotation resources, ab initio gene prediction algorithms and in silico ORFs (a modified six-frame translation considering alternative start codons) in an integrated proteogenomics database (iPtgxDB) that covers the entire protein-coding potential of a prokaryotic genome. By extending the PeptideClassifier concept of unambiguous peptides for prokaryotes, close to 95% of the identifiable peptides imply one distinct protein, largely simplifying downstream analysis. Searching a comprehensive Bartonella henselae proteomics data set against such an iPtgxDB allowed us to unambiguously identify novel ORFs uniquely predicted by each resource, including lipoproteins, differentially expressed and membrane-localized proteins, novel start sites and wrongly annotated pseudogenes. Most novelties were confirmed by targeted, parallel reaction monitoring mass spectrometry, including unique ORFs and single amino acid variations (SAAVs) identified in a re-sequenced laboratory strain that are not present in its reference genome. We demonstrate the general applicability of our strategy for genomes with varying GC content and distinct taxonomic origin. We release iPtgxDBs for B. henselae, Bradyrhizobium diazoefficiens and Escherichia coli and the software to generate both proteogenomics search databases and integrated annotation files that can be viewed in a genome browser for any prokaryote. © 2017 Omasits et al.; Published by Cold Spring Harbor Laboratory Press.

  18. Epiviz: a view inside the design of an integrated visual analysis software for genomics

    PubMed Central

    2015-01-01

    Background Computational and visual data analysis for genomics has traditionally involved a combination of tools and resources, of which the most ubiquitous consist of genome browsers, focused mainly on integrative visualization of large numbers of big datasets, and computational environments, focused on data modeling of a small number of moderately sized datasets. Workflows that involve the integration and exploration of multiple heterogeneous data sources, small and large, public and user specific have been poorly addressed by these tools. In our previous work, we introduced Epiviz, which bridges the gap between the two types of tools, simplifying these workflows. Results In this paper we expand on the design decisions behind Epiviz, and introduce a series of new advanced features that further support the type of interactive exploratory workflow we have targeted. We discuss three ways in which Epiviz advances the field of genomic data analysis: 1) it brings code to interactive visualizations at various different levels; 2) takes the first steps in the direction of collaborative data analysis by incorporating user plugins from source control providers, as well as by allowing analysis states to be shared among the scientific community; 3) combines established analysis features that have never before been available simultaneously in a genome browser. In our discussion section, we present security implications of the current design, as well as a series of limitations and future research steps. Conclusions Since many of the design choices of Epiviz are novel in genomics data analysis, this paper serves both as a document of our own approaches with lessons learned, as well as a start point for future efforts in the same direction for the genomics community. PMID:26328750

  19. Comparative genome analysis of a large Dutch Legionella pneumophila strain collection identifies five markers highly correlated with clinical strains

    PubMed Central

    2010-01-01

    Background Discrimination between clinical and environmental strains within many bacterial species is currently underexplored. Genomic analyses have clearly shown the enormous variability in genome composition between different strains of a bacterial species. In this study we have used Legionella pneumophila, the causative agent of Legionnaire's disease, to search for genomic markers related to pathogenicity. During a large surveillance study in The Netherlands well-characterized patient-derived strains and environmental strains were collected. We have used a mixed-genome microarray to perform comparative-genome analysis of 257 strains from this collection. Results Microarray analysis indicated that 480 DNA markers (out of in total 3360 markers) showed clear variation in presence between individual strains and these were therefore selected for further analysis. Unsupervised statistical analysis of these markers showed the enormous genomic variation within the species but did not show any correlation with a pathogenic phenotype. We therefore used supervised statistical analysis to identify discriminating markers. Genetic programming was used both to identify predictive markers and to define their interrelationships. A model consisting of five markers was developed that together correctly predicted 100% of the clinical strains and 69% of the environmental strains. Conclusions A novel approach for identifying predictive markers enabling discrimination between clinical and environmental isolates of L. pneumophila is presented. Out of over 3000 possible markers, five were selected that together enabled correct prediction of all the clinical strains included in this study. This novel approach for identifying predictive markers can be applied to all bacterial species, allowing for better discrimination between strains well equipped to cause human disease and relatively harmless strains. PMID:20630115

  20. GenomeFingerprinter: the genome fingerprint and the universal genome fingerprint analysis for systematic comparative genomics.

    PubMed

    Ai, Yuncan; Ai, Hannan; Meng, Fanmei; Zhao, Lei

    2013-01-01

    No attention has been paid on comparing a set of genome sequences crossing genetic components and biological categories with far divergence over large size range. We define it as the systematic comparative genomics and aim to develop the methodology. First, we create a method, GenomeFingerprinter, to unambiguously produce a set of three-dimensional coordinates from a sequence, followed by one three-dimensional plot and six two-dimensional trajectory projections, to illustrate the genome fingerprint of a given genome sequence. Second, we develop a set of concepts and tools, and thereby establish a method called the universal genome fingerprint analysis (UGFA). Particularly, we define the total genetic component configuration (TGCC) (including chromosome, plasmid, and phage) for describing a strain as a systematic unit, the universal genome fingerprint map (UGFM) of TGCC for differentiating strains as a universal system, and the systematic comparative genomics (SCG) for comparing a set of genomes crossing genetic components and biological categories. Third, we construct a method of quantitative analysis to compare two genomes by using the outcome dataset of genome fingerprint analysis. Specifically, we define the geometric center and its geometric mean for a given genome fingerprint map, followed by the Euclidean distance, the differentiate rate, and the weighted differentiate rate to quantitatively describe the difference between two genomes of comparison. Moreover, we demonstrate the applications through case studies on various genome sequences, giving tremendous insights into the critical issues in microbial genomics and taxonomy. We have created a method, GenomeFingerprinter, for rapidly computing, geometrically visualizing, intuitively comparing a set of genomes at genome fingerprint level, and hence established a method called the universal genome fingerprint analysis, as well as developed a method of quantitative analysis of the outcome dataset. 
    These have set up the methodology of systematic comparative genomics based on the genome fingerprint analysis.

  1. Ecophysiology of Freshwater Verrucomicrobia Inferred from Metagenome-Assembled Genomes

    PubMed Central

    He, Shaomei; Stevens, Sarah L. R.; Chan, Leong-Keat; Bertilsson, Stefan; Glavina del Rio, Tijana; Tringe, Susannah G.; Malmstrom, Rex R.

    2017-01-01

    ABSTRACT Microbes are critical in carbon and nutrient cycling in freshwater ecosystems. Members of the Verrucomicrobia are ubiquitous in such systems, and yet their roles and ecophysiology are not well understood. In this study, we recovered 19 Verrucomicrobia draft genomes by sequencing 184 time-series metagenomes from a eutrophic lake and a humic bog that differ in carbon source and nutrient availabilities. These genomes span four of the seven previously defined Verrucomicrobia subdivisions and greatly expand knowledge of the genomic diversity of freshwater Verrucomicrobia. Genome analysis revealed their potential role as (poly)saccharide degraders in freshwater, uncovered interesting genomic features for this lifestyle, and suggested their adaptation to nutrient availabilities in their environments. Verrucomicrobia populations differ significantly between the two lakes in glycoside hydrolase gene abundance and functional profiles, reflecting the autochthonous and terrestrially derived allochthonous carbon sources of the two ecosystems, respectively. Interestingly, a number of genomes recovered from the bog contained gene clusters that potentially encode a novel porin-multiheme cytochrome c complex and might be involved in extracellular electron transfer in the anoxic humus-rich environment. Notably, most epilimnion genomes have large numbers of so-called “Planctomycete-specific” cytochrome c-encoding genes, which exhibited distribution patterns nearly opposite to those seen with glycoside hydrolase genes, probably associated with the different levels of environmental oxygen availability and carbohydrate complexity between lakes/layers. Overall, the recovered genomes represent a major step toward understanding the role, ecophysiology, and distribution of Verrucomicrobia in freshwater. IMPORTANCE Freshwater Verrucomicrobia spp. 
    are cosmopolitan in lakes and rivers, and yet their roles and ecophysiology are not well understood, as cultured freshwater Verrucomicrobia spp. are restricted to one subdivision of this phylum. Here, we greatly expanded the known genomic diversity of this freshwater lineage by recovering 19 Verrucomicrobia draft genomes from 184 metagenomes collected from a eutrophic lake and a humic bog across multiple years. Most of these genomes represent the first freshwater representatives of several Verrucomicrobia subdivisions. Genomic analysis revealed Verrucomicrobia to be potential (poly)saccharide degraders and suggested their adaptation to carbon sources of different origins in the two contrasting ecosystems. We identified putative extracellular electron transfer genes and so-called “Planctomycete-specific” cytochrome c-encoding genes and identified their distinct distribution patterns between the lakes/layers. Overall, our analysis greatly advances the understanding of the function, ecophysiology, and distribution of freshwater Verrucomicrobia, while highlighting their potential role in freshwater carbon cycling. PMID:28959738

  2. Revealing the selection history of adaptive loci using genome-wide scans for selection: an example from domestic sheep.

    PubMed

    Rochus, Christina Marie; Tortereau, Flavie; Plisson-Petit, Florence; Restoux, Gwendal; Moreno-Romieux, Carole; Tosser-Klopp, Gwenola; Servin, Bertrand

    2018-01-23

    One of the approaches to detect genetics variants affecting fitness traits is to identify their surrounding genomic signatures of past selection. With established methods for detecting selection signatures and the current and future availability of large datasets, such studies should have the power to not only detect these signatures but also to infer their selective histories. Domesticated animals offer a powerful model for these approaches as they adapted rapidly to environmental and human-mediated constraints in a relatively short time. We investigated this question by studying a large dataset of 542 individuals from 27 domestic sheep populations raised in France, genotyped for more than 500,000 SNPs. Population structure analysis revealed that this set of populations harbour a large part of European sheep diversity in a small geographical area, offering a powerful model for the study of adaptation. Identification of extreme SNP and haplotype frequency differences between populations listed 126 genomic regions likely affected by selection. These signatures revealed selection at loci commonly identified as selection targets in many species ("selection hotspots") including ABCG2, LCORL/NCAPG, MSTN, and coat colour genes such as ASIP, MC1R, MITF, and TYRP1. For one of these regions (ABCG2, LCORL/NCAPG), we could propose a historical scenario leading to the introgression of an adaptive allele into a new genetic background. Among selection signatures, we found clear evidence for parallel selection events in different genetic backgrounds, most likely for different mutations. We confirmed this allelic heterogeneity in one case by resequencing the MC1R gene in three black-faced breeds. Our study illustrates how dense genetic data in multiple populations allows the deciphering of evolutionary history of populations and of their adaptive mutations.

  3. The complete chloroplast genome sequences of Lychnis wilfordii and Silene capitata and comparative analyses with other Caryophyllaceae genomes.

    PubMed

    Kang, Jong-Soo; Lee, Byoung Yoon; Kwak, Myounghai

    2017-01-01

    The complete chloroplast genomes of Lychnis wilfordii and Silene capitata were determined and compared with ten previously reported Caryophyllaceae chloroplast genomes. The chloroplast genome sequences of L. wilfordii and S. capitata contain 152,320 bp and 150,224 bp, respectively. The gene contents and orders among 12 Caryophyllaceae species are consistent, but several microstructural changes have occurred. Expansion of the inverted repeat (IR) regions at the large single copy (LSC)/IRb and small single copy (SSC)/IR boundaries led to partial or entire gene duplications. Additionally, rearrangements of the LSC region were caused by gene inversions and/or transpositions. The 18 kb inversions, which occurred three times in different lineages of tribe Sileneae, were thought to be facilitated by the intermolecular duplicated sequences. Sequence analyses of the L. wilfordii and S. capitata genomes revealed 39 and 43 repeats, respectively, including forward, palindromic, and reverse repeats. In addition, a total of 67 and 56 simple sequence repeats were discovered in the L. wilfordii and S. capitata chloroplast genomes, respectively. Finally, we constructed phylogenetic trees of the 12 Caryophyllaceae species and two Amaranthaceae species based on 73 protein-coding genes using both maximum parsimony and likelihood methods.

  4. Natural Selection and Recombination Rate Variation Shape Nucleotide Polymorphism Across the Genomes of Three Related Populus Species

    PubMed Central

    Wang, Jing; Street, Nathaniel R.; Scofield, Douglas G.; Ingvarsson, Pär K.

    2016-01-01

    A central aim of evolutionary genomics is to identify the relative roles that various evolutionary forces have played in generating and shaping genetic variation within and among species. Here we use whole-genome resequencing data to characterize and compare genome-wide patterns of nucleotide polymorphism, site frequency spectrum, and population-scaled recombination rates in three species of Populus: Populus tremula, P. tremuloides, and P. trichocarpa. We find that P. tremuloides has the highest level of genome-wide variation, skewed allele frequencies, and population-scaled recombination rates, whereas P. trichocarpa harbors the lowest. Our findings highlight multiple lines of evidence suggesting that natural selection, due to both purifying and positive selection, has widely shaped patterns of nucleotide polymorphism at linked neutral sites in all three species. Differences in effective population sizes and rates of recombination largely explain the disparate magnitudes and signatures of linked selection that we observe among species. The present work provides the first phylogenetic comparative study on a genome-wide scale in forest trees. This information will also improve our ability to understand how various evolutionary forces have interacted to influence genome evolution among related species. PMID:26721855

  5. Natural Selection and Recombination Rate Variation Shape Nucleotide Polymorphism Across the Genomes of Three Related Populus Species.

    PubMed

    Wang, Jing; Street, Nathaniel R; Scofield, Douglas G; Ingvarsson, Pär K

    2016-03-01

    A central aim of evolutionary genomics is to identify the relative roles that various evolutionary forces have played in generating and shaping genetic variation within and among species. Here we use whole-genome resequencing data to characterize and compare genome-wide patterns of nucleotide polymorphism, site frequency spectrum, and population-scaled recombination rates in three species of Populus: Populus tremula, P. tremuloides, and P. trichocarpa. We find that P. tremuloides has the highest level of genome-wide variation, skewed allele frequencies, and population-scaled recombination rates, whereas P. trichocarpa harbors the lowest. Our findings highlight multiple lines of evidence suggesting that natural selection, due to both purifying and positive selection, has widely shaped patterns of nucleotide polymorphism at linked neutral sites in all three species. Differences in effective population sizes and rates of recombination largely explain the disparate magnitudes and signatures of linked selection that we observe among species. The present work provides the first phylogenetic comparative study on a genome-wide scale in forest trees. This information will also improve our ability to understand how various evolutionary forces have interacted to influence genome evolution among related species. Copyright © 2016 by the Genetics Society of America.

  6. Population-Sequencing as a Biomarker of Burkholderia mallei and Burkholderia pseudomallei Evolution through Microbial Forensic Analysis.

    PubMed

    Jakupciak, John P; Wells, Jeffrey M; Karalus, Richard J; Pawlowski, David R; Lin, Jeffrey S; Feldman, Andrew B

    2013-01-01

    Large-scale genomics projects are identifying biomarkers to detect human disease. B. pseudomallei and B. mallei are two closely related select agents that cause melioidosis and glanders. Accurate characterization of metagenomic samples is dependent on accurate measurements of genetic variation between isolates with resolution down to strain level. Often single biomarker sensitivity is augmented by use of multiple or panels of biomarkers. In parallel with single biomarker validation, advances in DNA sequencing enable analysis of entire genomes in a single run: population-sequencing. Potentially, direct sequencing could be used to analyze an entire genome to serve as the biomarker for genome identification. However, genome variation and population diversity complicate use of direct sequencing, as well as differences caused by sample preparation protocols including sequencing artifacts and mistakes. As part of a Department of Homeland Security program in bacterial forensics, we examined how to implement whole genome sequencing (WGS) analysis as a judicially defensible forensic method for attributing microbial sample relatedness; and also to determine the strengths and limitations of whole genome sequence analysis in a forensics context. Herein, we demonstrate use of sequencing to provide genetic characterization of populations: direct sequencing of populations.

  7. Population-Sequencing as a Biomarker of Burkholderia mallei and Burkholderia pseudomallei Evolution through Microbial Forensic Analysis

    PubMed Central

    Jakupciak, John P.; Wells, Jeffrey M.; Karalus, Richard J.; Pawlowski, David R.; Lin, Jeffrey S.; Feldman, Andrew B.

    2013-01-01

    Large-scale genomics projects are identifying biomarkers to detect human disease. B. pseudomallei and B. mallei are two closely related select agents that cause melioidosis and glanders. Accurate characterization of metagenomic samples is dependent on accurate measurements of genetic variation between isolates with resolution down to strain level. Often single biomarker sensitivity is augmented by use of multiple or panels of biomarkers. In parallel with single biomarker validation, advances in DNA sequencing enable analysis of entire genomes in a single run: population-sequencing. Potentially, direct sequencing could be used to analyze an entire genome to serve as the biomarker for genome identification. However, genome variation and population diversity complicate use of direct sequencing, as well as differences caused by sample preparation protocols including sequencing artifacts and mistakes. As part of a Department of Homeland Security program in bacterial forensics, we examined how to implement whole genome sequencing (WGS) analysis as a judicially defensible forensic method for attributing microbial sample relatedness; and also to determine the strengths and limitations of whole genome sequence analysis in a forensics context. Herein, we demonstrate use of sequencing to provide genetic characterization of populations: direct sequencing of populations. PMID:24455204

  8. The complete chloroplast genome of Capsicum annuum var. glabriusculum using Illumina sequencing.

    PubMed

    Raveendar, Sebastin; Na, Young-Wang; Lee, Jung-Ro; Shim, Donghwan; Ma, Kyung-Ho; Lee, Sok-Young; Chung, Jong-Wook

    2015-07-20

    Chloroplast (cp) genome sequences provide a valuable source for DNA barcoding. Molecular phylogenetic studies have concentrated on DNA sequencing of conserved gene loci. However, this approach is time consuming and more difficult to implement when gene organization differs among species. Here we report the complete re-sequencing of the cp genome of Capsicum pepper (Capsicum annuum var. glabriusculum) using the Illumina platform. The total length of the cp genome is 156,817 bp with a 37.7% overall GC content. A pair of inverted repeats (IRs) of 50,284 bp were separated by a small single copy (SSC; 18,948 bp) and a large single copy (LSC; 87,446 bp). The number of cp genes in C. annuum var. glabriusculum is the same as that in other Capsicum species. Variations in the lengths of LSC, SSC and IR regions were the main contributors to the size variation in the cp genome of this species. A total of 125 simple sequence repeats (SSRs) and 48 insertion or deletion variants were found by sequence alignment of Capsicum cp genomes. These findings provide a foundation for further investigation of cp genome evolution in Capsicum and other higher plants.

  9. Assembly of the Lactuca sativa, L. cv. Tizian draft genome sequence reveals differences within major resistance complex 1 as compared to the cv. Salinas reference genome.

    PubMed

    Verwaaijen, Bart; Wibberg, Daniel; Nelkner, Johanna; Gordin, Miriam; Rupp, Oliver; Winkler, Anika; Bremges, Andreas; Blom, Jochen; Grosch, Rita; Pühler, Alfred; Schlüter, Andreas

    2018-02-10

    Lettuce (Lactuca sativa, L.) is an important annual plant of the family Asteraceae (Compositae). The commercial lettuce cultivar Tizian has been used in various scientific studies investigating the interaction of the plant with phytopathogens or biological control agents. Here, we present the de novo draft genome sequencing and gene prediction for this specific cultivar derived from transcriptome sequence data. The assembled scaffolds amount to a size of 2.22 Gb. Based on RNAseq data, 31,112 transcript isoforms were identified. Functional predictions for these transcripts were determined within the GenDBE annotation platform. Comparison with the cv. Salinas reference genome revealed a high degree of sequence similarity on genome and transcriptome levels, with an average amino acid identity of 99%. Furthermore, it was observed that two large regions are either missing or are highly divergent within the cv. Tizian genome compared to cv. Salinas. One of these regions covers the major resistance complex 1 region of cv. Salinas. The cv. Tizian draft genome sequence provides a valuable resource for future functional and transcriptome analyses focused on this lettuce cultivar. Copyright © 2017 Elsevier B.V. All rights reserved.

  10. Genomic tests for ovarian cancer detection and management.

    PubMed

    Myers, Evan R; Havrilesky, Laura J; Kulasingam, Shalini L; Sanders, Gillian D; Cline, Kathryn E; Gray, Rebecca N; Berchuck, Andrew; McCrory, Douglas C

    2006-10-01

    To assess the evidence that the use of genomic tests for ovarian cancer screening, diagnosis, and treatment leads to improved outcomes. PubMed and reference lists of recent reviews. We evaluated tests for: (a) single gene products; (b) genetic variations affecting risk of ovarian cancer; (c) gene expression; and (d) proteomics. For tests covered in recent evidence reports (cancer antigen 125 [CA-125] and breast cancer genes 1 and 2 [BRCA1/2]), we added studies published subsequent to the reports. We sought evidence on: (a) the analytic performance of tests in clinical laboratories; (b) the sensitivity and specificity of tests in different patient populations; (c) the clinical impact of testing in asymptomatic women, women with suspected ovarian cancer, and women with diagnosed ovarian cancer; (d) the harms of genomic testing; and (e) the impact of direct-to-consumer and direct-to-physician advertising on appropriate use of tests. We also constructed a computer simulation model to test the impact of different assumptions about ovarian cancer natural history on the relative effectiveness of different strategies. There are reasonable data on the clinical laboratory performance of most radioimmunoassays, but the majority of the data on other genomic tests comes from research laboratories. Genomic test sensitivity/specificity estimates are limited by small sample sizes, spectrum bias, and unrealistically large prevalences of ovarian cancer; in particular, estimates of positive predictive values derived from most of the studies are substantially higher than would be expected in most screening or diagnostic settings. We found no evidence relevant to the question of the impact of genomic tests on health outcomes in asymptomatic women. Although there is a relatively large literature on the association of test results and various clinical outcomes, the clinical utility of changing management based on these results has not been evaluated. 
    We found no evidence that genomic tests for ovarian cancer have unique harms beyond those common to other tests for genetic susceptibility or other tests used in screening, diagnosis, and management of ovarian cancer. Studies of a direct-to-consumer campaign for BRCA1/2 testing suggest increased utilization, but the effect on "appropriateness" was unclear. Model simulations suggest that annual screening, even with a highly sensitive test, will not reduce ovarian cancer mortality by more than 50 percent; frequent screening has a very low positive predictive value, even with a highly specific test. Although research remains promising, adaptation of genomic tests into clinical practice must await appropriately designed and powered studies in relevant clinical settings.
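
    The abstract above notes that positive predictive values are inflated when study prevalence is unrealistically high and stay very low in screening even with a highly specific test. A short worked sketch using Bayes' rule illustrates why. The sensitivity, specificity, and prevalence values below are hypothetical and chosen only for illustration; they are not taken from the report.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """Bayes' rule: probability of disease given a positive test result."""
    true_positive = sensitivity * prevalence
    false_positive = (1.0 - specificity) * (1.0 - prevalence)
    return true_positive / (true_positive + false_positive)


if __name__ == "__main__":
    # Hypothetical test characteristics (assumptions, not values from the report).
    sensitivity, specificity = 0.95, 0.99
    # Prevalence in an enriched study sample versus a rarer screening setting.
    for prevalence in (0.5, 0.01, 0.0005):
        ppv = positive_predictive_value(sensitivity, specificity, prevalence)
        print(f"prevalence={prevalence:.4f}  PPV={ppv:.3f}")
```

    With the test characteristics held fixed, the predictive value falls as prevalence falls, because false positives from the large unaffected group outnumber true positives from the small affected group.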

  11. The draft genome of the pest tephritid fruit fly Bactrocera tryoni: resources for the genomic analysis of hybridising species.

    PubMed

    Gilchrist, Anthony Stuart; Shearman, Deborah C A; Frommer, Marianne; Raphael, Kathryn A; Deshpande, Nandan P; Wilkins, Marc R; Sherwin, William B; Sved, John A

    2014-12-20

    The tephritid fruit flies include a number of economically important pests of horticulture, with a large accumulated body of research on their biology and control. Amongst the Tephritidae, the genus Bactrocera, containing over 400 species, presents various species groups of potential utility for genetic studies of speciation, behaviour or pest control. In Australia, there exists a triad of closely-related, sympatric Bactrocera species which do not mate in the wild but which, despite distinct morphologies and behaviours, can be force-mated in the laboratory to produce fertile hybrid offspring. To exploit the opportunities offered by genomics, such as the efficient identification of genetic loci central to pest behaviour and to the earliest stages of speciation, investigators require genomic resources for future investigations. We produced a draft de novo genome assembly of Australia's major tephritid pest species, Bactrocera tryoni. The male genome (650-700 Mbp) includes approximately 150 Mb of interspersed repetitive DNA sequences and 60 Mb of satellite DNA. Assessment using conserved core eukaryotic sequences indicated 98% completeness. Over 16,000 MAKER-derived gene models showed a large degree of overlap with other Dipteran reference genomes. The sequence of the ribosomal RNA transcribed unit was also determined. Unscaffolded assemblies of B. neohumeralis and B. jarvisi were then produced; comparison with B. tryoni showed that the species are more closely related than any Drosophila species pair. The similarity of the genomes was exploited to identify 4924 potentially diagnostic indels between the species, all of which occur in non-coding regions. This first draft B. tryoni genome resembles other dipteran genomes in terms of size and putative coding sequences. 
    For all three species included in this study, we have identified a comprehensive set of non-redundant repetitive sequences, including the ribosomal RNA unit, and have quantified the major satellite DNA families. These genetic resources will facilitate the further investigations of genetic mechanisms responsible for the behavioural and morphological differences between these three species and other tephritids. We have also shown how whole genome sequence data can be used to generate simple diagnostic tests between very closely-related species where only one of the species is scaffolded.

  12. Missing data imputation and haplotype phase inference for genome-wide association studies

    PubMed Central

    Browning, Sharon R.

    2009-01-01

    Imputation of missing data and the use of haplotype-based association tests can improve the power of genome-wide association studies (GWAS). In this article, I review methods for haplotype inference and missing data imputation, and discuss their application to GWAS. I discuss common features of the best algorithms for haplotype phase inference and missing data imputation in large-scale data sets, as well as some important differences between classes of methods, and highlight the methods that provide the highest accuracy and fastest computational performance. PMID:18850115

  13. Genomic Characterization of VIM Metallo-β-Lactamase-Producing Alcaligenes faecalis from Gaza, Palestine.

    PubMed

    Al Laham, Nahed; Chavda, Kalyan D; Cienfuegos-Gallet, Astrid V; Kreiswirth, Barry N; Chen, Liang

    2017-11-01

    Carbapenemase-producing Gram-negative bacteria (CP-GNB) have increasingly spread worldwide, and different families of carbapenemases have been identified in various bacterial species. Here, we report the identification of five VIM metallo-β-lactamase-producing Alcaligenes faecalis isolates associated with a small outbreak in a large hospital in Gaza, Palestine. Next-generation sequencing analysis showed blaVIM-2 is harbored by a chromosomal genomic island among three strains, while blaVIM-4 is carried by a novel plasmid in two strains. Copyright © 2017 American Society for Microbiology.

  14. The temporal program of chromosome replication: genomewide replication in clb5Δ Saccharomyces cerevisiae.

    PubMed

    McCune, Heather J; Danielson, Laura S; Alvino, Gina M; Collingwood, David; Delrow, Jeffrey J; Fangman, Walton L; Brewer, Bonita J; Raghuraman, M K

    2008-12-01

    Temporal regulation of origin activation is widely thought to explain the pattern of early- and late-replicating domains in the Saccharomyces cerevisiae genome. Recently, single-molecule analysis of replication suggested that stochastic processes acting on origins with different probabilities of activation could generate the observed kinetics of replication without requiring an underlying temporal order. To distinguish between these possibilities, we examined a clb5Δ strain, where origin firing is largely limited to the first half of S phase, to ask whether all origins nonspecifically show decreased firing (as expected for disordered firing) or if only some origins ("late" origins) are affected. Approximately half the origins in the mutant genome show delayed replication while the remainder replicate largely on time. The delayed regions can encompass hundreds of kilobases and generally correspond to regions that replicate late in wild-type cells. Kinetic analysis of replication in wild-type cells reveals broad windows of origin firing for both early and late origins. Our results are consistent with a temporal model in which origins can show some heterogeneity in both time and probability of origin firing, but clustering of temporally like origins nevertheless yields a genome that is organized into blocks showing different replication times.

  15. Regulation of TRP channels by steroids: Implications in physiology and diseases.

    PubMed

    Kumar, Ashutosh; Kumari, Shikha; Majhi, Rakesh Kumar; Swain, Nirlipta; Yadav, Manoj; Goswami, Chandan

    2015-09-01

    While effects of different steroids on the gene expression and regulation are well established, it is proven that steroids can also exert rapid non-genomic actions in several tissues and cells. In most cases, these non-genomic rapid effects of steroids are actually due to intracellular mobilization of Ca(2+)- and other ions suggesting that Ca(2+) channels are involved in such effects. Transient Receptor Potential (TRP) ion channels or TRPs are the largest group of non-selective and polymodal ion channels which cause Ca(2+)-influx in response to different physical and chemical stimuli. While non-genomic actions of different steroids on different ion channels have been established to some extent, involvement of TRPs in such functions is largely unexplored. In this review, we critically analyze the literature and summarize how different steroids as well as their metabolic precursors and derivatives can exert non-genomic effects by acting on different TRPs qualitatively and/or quantitatively. Such effects have physiological repercussion on systems such as in sperm cells, immune cells, bone cells, neuronal cells and many others. Different TRPs are also endogenously expressed in diverse steroid-producing tissues and thus may have importance in steroid synthesis as well, a process which is tightly controlled by the intracellular Ca(2+) concentrations. Tissue and cell-specific expression of TRP channels are also regulated by different steroids. Understanding of the crosstalk between TRP channels and different steroids may have strong significance in physiological, endocrinological and pharmacological context and in future these compounds can also be used as potential biomedicine. Copyright © 2014 Elsevier Inc. All rights reserved.

  16. Measurement and genetics of human subcortical and hippocampal asymmetries in large datasets.

    PubMed

    Guadalupe, Tulio; Zwiers, Marcel P; Teumer, Alexander; Wittfeld, Katharina; Vasquez, Alejandro Arias; Hoogman, Martine; Hagoort, Peter; Fernandez, Guillen; Buitelaar, Jan; Hegenscheid, Katrin; Völzke, Henry; Franke, Barbara; Fisher, Simon E; Grabe, Hans J; Francks, Clyde

    2014-07-01

    Functional and anatomical asymmetries are prevalent features of the human brain, linked to gender, handedness, and cognition. However, little is known about the neurodevelopmental processes involved. In zebrafish, asymmetries arise in the diencephalon before extending within the central nervous system. We aimed to identify genes involved in the development of subtle, left-right volumetric asymmetries of human subcortical structures using large datasets. We first tested the feasibility of measuring left-right volume differences in such large-scale samples, as assessed by two automated methods of subcortical segmentation (FSL|FIRST and FreeSurfer), using data from 235 subjects who had undergone MRI twice. We tested the agreement between the first and second scan, and the agreement between the segmentation methods, for measures of bilateral volumes of six subcortical structures and the hippocampus, and their volumetric asymmetries. We also tested whether there were biases introduced by left-right differences in the regional atlases used by the methods, by analyzing left-right flipped images. While many bilateral volumes were measured well (scan-rescan r = 0.6-0.8), most asymmetries, with the exception of the caudate nucleus, showed lower repeatabilities. We meta-analyzed genome-wide association scan results for caudate nucleus asymmetry in a combined sample of 3,028 adult subjects but did not detect associations at genome-wide significance (P < 5 × 10(-8)). There was no enrichment of genetic association in genes involved in left-right patterning of the viscera. Our results provide important information for researchers who are currently aiming to carry out large-scale genome-wide studies of subcortical and hippocampal volumes, and their asymmetries. Copyright © 2013 Wiley Periodicals, Inc.

  17. A Novel Protective Vaccine Antigen from the Core Escherichia coli Genome

    PubMed Central

    Moriel, Danilo G.; Tan, Lendl; Goh, Kelvin G. K.; Ipe, Deepak S.; Lo, Alvin W.; Peters, Kate M.

    2016-01-01

    ABSTRACT Escherichia coli is a versatile pathogen capable of causing intestinal and extraintestinal infections that result in a huge burden of global human disease. The diversity of E. coli is reflected by its multiple different pathotypes and mosaic genome composition. E. coli strains are also a major driver of antibiotic resistance, emphasizing the urgent need for new treatment and prevention measures. Here, we used a large data set comprising 1,700 draft and complete genomes to define the core and accessory genome of E. coli and demonstrated the overlapping relationship between strains from different pathotypes. In combination with proteomic investigation, this analysis revealed core genes that encode surface-exposed or secreted proteins that represent potential broad-coverage vaccine antigens. One of these antigens, YncE, was characterized as a conserved immunogenic antigen able to protect against acute systemic infection in mice after vaccination. Overall, this work provides a genomic blueprint for future analyses of conserved and accessory E. coli genes. The work also identified YncE as a novel antigen that could be exploited in the development of a vaccine against all pathogenic E. coli strains, an important direction given the high global incidence of infections caused by multidrug-resistant strains for which there are few effective antibiotics. IMPORTANCE E. coli is a multifaceted pathogen of major significance to global human health and an important contributor to increasing antibiotic resistance. Given the paucity of therapies still effective against multidrug-resistant pathogenic E. coli strains, novel treatment and prevention strategies are urgently required. In this study, we defined the core and accessory components of the E. coli genome by examining a large collection of draft and completely sequenced strains available from public databases. This data set was mined by employing a reverse-vaccinology approach in combination with proteomics to identify putative broadly protective vaccine antigens. One such antigen was identified that was highly immunogenic and induced protection in a mouse model of bacteremia. Overall, our study provides a genomic and proteomic framework for the selection of novel vaccine antigens that could mediate broad protection against pathogenic E. coli. PMID:27904885

  18. Heritable gene expression differences between apomictic clone members in Taraxacum officinale: Insights into early stages of evolutionary divergence in asexual plants.

    PubMed

    Ferreira de Carvalho, Julie; Oplaat, Carla; Pappas, Nikolaos; Derks, Martijn; de Ridder, Dick; Verhoeven, Koen J F

    2016-03-08

    Asexual reproduction has the potential to enhance deleterious mutation accumulation and to constrain adaptive evolution. One source of mutations that can be especially relevant in recent asexuals is activity of transposable elements (TEs), which may have experienced selection for high transposition rates in sexual ancestor populations. Predictions of genomic divergence under asexual reproduction therefore likely include a large contribution of transposable elements but limited adaptive divergence. For plants empirical insight into genome divergence under asexual reproduction remains limited. Here, we characterize expression divergence between clone members of a single apomictic lineage of the common dandelion (Taraxacum officinale) to contribute to our knowledge of genome evolution under asexuality. Using RNA-Seq, we show that about one third of heritable divergence within the apomictic lineage is driven by TEs and TE-related gene activity. In addition, we identify non-random transcriptional differences in pathways related to acyl-lipid and abscisic acid metabolisms which might reflect functional divergence within the apomictic lineage. We analyze SNPs in the transcriptome to assess genetic divergence between the apomictic clone members and reveal that heritable expression differences between the accessions are not explained simply by genome-wide genetic divergence. The present study depicts a first effort towards a more complete understanding of apomictic plant genome evolution. We identify abundant TE activity and ecologically relevant functional genes and pathways affecting heritable within-lineage expression divergence. These findings offer valuable resources for future work looking at epigenetic silencing and Cis-regulation of gene expression with particular emphasis on the effects of TE activity on asexual species' genome.

  19. Annotation-based genome-wide SNP discovery in the large and complex Aegilops tauschii genome using next-generation sequencing without a reference genome sequence

    PubMed Central

    2011-01-01

    Background Many plants have large and complex genomes with an abundance of repeated sequences. Many plants are also polyploid. Both of these attributes typify the genome architecture in the tribe Triticeae, whose members include economically important wheat, rye and barley. Large genome sizes, an abundance of repeated sequences, and polyploidy present challenges to genome-wide SNP discovery using next-generation sequencing (NGS) of total genomic DNA by making alignment and clustering of short reads generated by the NGS platforms difficult, particularly in the absence of a reference genome sequence. Results An annotation-based, genome-wide SNP discovery pipeline is reported using NGS data for large and complex genomes without a reference genome sequence. Roche 454 shotgun reads with low genome coverage of one genotype are annotated in order to distinguish single-copy sequences and repeat junctions from repetitive sequences and sequences shared by paralogous genes. Multiple genome equivalents of shotgun reads of another genotype generated with SOLiD or Solexa are then mapped to the annotated Roche 454 reads to identify putative SNPs. A pipeline program package, AGSNP, was developed and used for genome-wide SNP discovery in Aegilops tauschii, the diploid source of the wheat D genome, which has a genome size of 4.02 Gb, of which 90% is repetitive sequences. Genomic DNA of Ae. tauschii accession AL8/78 was sequenced with the Roche 454 NGS platform. Genomic DNA and cDNA of Ae. tauschii accession AS75 was sequenced primarily with SOLiD, although some Solexa and Roche 454 genomic sequences were also generated. A total of 195,631 putative SNPs were discovered in gene sequences, 155,580 putative SNPs were discovered in uncharacterized single-copy regions, and another 145,907 putative SNPs were discovered in repeat junctions. These SNPs were dispersed across the entire Ae. tauschii genome. To assess the false positive SNP discovery rate, DNA containing putative SNPs was amplified by PCR from AL8/78 and AS75 and resequenced with the ABI 3730 xl. In a sample of 302 randomly selected putative SNPs, 84.0% in gene regions, 88.0% in repeat junctions, and 81.3% in uncharacterized regions were validated. Conclusion An annotation-based genome-wide SNP discovery pipeline for NGS platforms was developed. The pipeline is suitable for SNP discovery in genomic libraries of complex genomes and does not require a reference genome sequence. The pipeline is applicable to all current NGS platforms, provided that at least one such platform generates relatively long reads. The pipeline package, AGSNP, and the discovered 497,118 Ae. tauschii SNPs can be accessed at (http://avena.pw.usda.gov/wheatD/agsnp.shtml). PMID:21266061
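
    The per-category SNP counts in the abstract above can be cross-checked against the reported total with a few lines of Python. This is only a reader-side sanity check, not part of the AGSNP pipeline; the dictionary keys and variable names are labels chosen here for illustration.

```python
# Putative SNP counts as reported in the abstract, by sequence category.
# The key names are illustrative labels, not identifiers from AGSNP.
putative_snps = {
    "gene sequences": 195_631,
    "uncharacterized single-copy regions": 155_580,
    "repeat junctions": 145_907,
}

total = sum(putative_snps.values())
print(total)  # 497118, matching the total stated in the abstract

# Share of each category in the total, rounded for display.
shares = {name: round(100 * n / total, 1) for name, n in putative_snps.items()}
for name, pct in shares.items():
    print(f"{name}: {pct}%")
```

    The three categories sum exactly to the 497,118 SNPs the authors report as available for download.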

  20. Genomic basis for the convergent evolution of electric organs

    PubMed Central

    Gallant, Jason R.; Traeger, Lindsay L.; Volkening, Jeremy D.; Moffett, Howell; Chen, Po-Hao; Novina, Carl D.; Phillips, George N.; Anand, Rene; Wells, Gregg B.; Pinch, Matthew; Güth, Robert; Unguez, Graciela A.; Albert, James S.; Zakon, Harold H.; Samanta, Manoj P.; Sussman, Michael R.

    2017-01-01

    Little is known about the genetic basis of convergent traits that originate repeatedly over broad taxonomic scales. The myogenic electric organ has evolved six times in fishes to produce electric fields used in communication, navigation, predation, or defense. We have examined the genomic basis of the convergent anatomical and physiological origins of these organs by assembling the genome of the electric eel (Electrophorus electricus) and sequencing electric organ and skeletal muscle transcriptomes from three lineages that have independently evolved electric organs. Our results indicate that, despite millions of years of evolution and large differences in the morphology of electric organ cells, independent lineages have leveraged similar transcription factors and developmental and cellular pathways in the evolution of electric organs. PMID:24970089

  1. Sequencing of Seven Haloarchaeal Genomes Reveals Patterns of Genomic Flux

    PubMed Central

    Lynch, Erin A.; Langille, Morgan G. I.; Darling, Aaron; Wilbanks, Elizabeth G.; Haltiner, Caitlin; Shao, Katie S. Y.; Starr, Michael O.; Teiling, Clotilde; Harkins, Timothy T.; Edwards, Robert A.; Eisen, Jonathan A.; Facciotti, Marc T.

    2012-01-01

    We report the sequencing of seven genomes from two haloarchaeal genera, Haloferax and Haloarcula. Ease of cultivation and the existence of well-developed genetic and biochemical tools for several diverse haloarchaeal species make haloarchaea a model group for the study of archaeal biology. The unique physiological properties of these organisms also make them good candidates for novel enzyme discovery for biotechnological applications. Seven genomes were sequenced to ∼20× coverage and assembled to an average of 50 contigs (range 5 scaffolds - 168 contigs). Comparisons of protein-coding gene complements revealed large-scale differences in COG functional group enrichment between these genera. Analysis of genes encoding machinery for DNA metabolism reveals genera-specific expansions of the general transcription factor TATA binding protein as well as a history of extensive duplication and horizontal transfer of the proliferating cell nuclear antigen. Insights gained from this study emphasize the importance of haloarchaea for investigation of archaeal biology. PMID:22848480

  2. Importing statistical measures into Artemis enhances gene identification in the Leishmania genome project.

    PubMed

    Aggarwal, Gautam; Worthey, E A; McDonagh, Paul D; Myler, Peter J

    2003-06-07

    Seattle Biomedical Research Institute (SBRI) as part of the Leishmania Genome Network (LGN) is sequencing chromosomes of the trypanosomatid protozoan species Leishmania major. At SBRI, chromosomal sequence is annotated using a combination of trained and untrained non-consensus gene-prediction algorithms with ARTEMIS, an annotation platform with rich and user-friendly interfaces. Here we describe a methodology used to import results from three different protein-coding gene-prediction algorithms (GLIMMER, TESTCODE and GENESCAN) into the ARTEMIS sequence viewer and annotation tool. Comparison of these methods, along with the CODONUSAGE algorithm built into ARTEMIS, shows the importance of combining methods to more accurately annotate the L. major genomic sequence. An improvised and powerful tool for gene prediction has been developed by importing data from widely-used algorithms into an existing annotation platform. This approach is especially fruitful in the Leishmania genome project where there is a large proportion of novel genes requiring manual annotation.

  3. Molecular biology of mycoplasmas: from the minimum cell concept to the artificial cell.

    PubMed

    Cordova, Caio M M; Hoeltgebaum, Daniela L; Machado, Laís D P N; Santos, Larissa Dos

    2016-01-01

    Mycoplasmas are a large group of bacteria, sorted into different genera in the Mollicutes class, whose main characteristic in common, besides the small genome, is the absence of cell wall. They are considered cellular and molecular biology study models. We present an updated review of the molecular biology of these model microorganisms and the development of replicative vectors for the transformation of mycoplasmas. Synthetic biology studies inspired by these pioneering works became possible and won the attention of the mainstream media. For the first time, an artificial genome was synthesized (a minimal genome produced from consensus sequences obtained from mycoplasmas). For the first time, a functional artificial cell has been constructed by introducing a genome completely synthesized within a cell envelope of a mycoplasma obtained by transformation techniques. Therefore, this article offers an updated insight to the state of the art of these peculiar organisms' molecular biology.

  4. A review of genomic data warehousing systems.

    PubMed

    Triplet, Thomas; Butler, Gregory

    2014-07-01

    To facilitate the integration and querying of genomics data, a number of generic data warehousing frameworks have been developed. They differ in their design and capabilities, as well as their intended audience. We provide a comprehensive and quantitative review of those genomic data warehousing frameworks in the context of large-scale systems biology. We reviewed in detail four genomic data warehouses (BioMart, BioXRT, InterMine and PathwayTools) freely available to the academic community. We quantified 20 aspects of the warehouses, covering the accuracy of their responses, their computational requirements and development efforts. Performance of the warehouses was evaluated under various hardware configurations to help laboratories optimize hardware expenses. Each aspect of the benchmark may be dynamically weighted by scientists using our online tool BenchDW (http://warehousebenchmark.fungalgenomics.ca/benchmark/) to build custom warehouse profiles and tailor our results to their specific needs.

  5. GenomeDiagram: a python package for the visualization of large-scale genomic data.

    PubMed

    Pritchard, Leighton; White, Jennifer A; Birch, Paul R J; Toth, Ian K

    2006-03-01

    We present GenomeDiagram, a flexible, open-source Python module for the visualization of large-scale genomic, comparative genomic and other data with reference to a single chromosome or other biological sequence. GenomeDiagram may be used to generate publication-quality vector graphics, rastered images and in-line streamed graphics for webpages. The package integrates with datatypes from the BioPython project, and is available for Windows, Linux and Mac OS X systems. GenomeDiagram is freely available as source code (under GNU Public License) at http://bioinf.scri.ac.uk/lp/programs.html, and requires Python 2.3 or higher, and recent versions of the ReportLab and BioPython packages. A user manual, example code and images are available at http://bioinf.scri.ac.uk/lp/programs.html.

  6. Genomic tools for behavioural ecologists to understand repeatable individual differences in behaviour.

    PubMed

    Bengston, Sarah E; Dahan, Romain A; Donaldson, Zoe; Phelps, Steven M; van Oers, Kees; Sih, Andrew; Bell, Alison M

    2018-06-01

    Behaviour is a key interface between an animal's genome and its environment. Repeatable individual differences in behaviour have been extensively documented in animals, but the molecular underpinnings of behavioural variation among individuals within natural populations remain largely unknown. Here, we offer a critical review of when molecular techniques may yield new insights, and we provide specific guidance on how and whether the latest tools available are appropriate given different resources, system and organismal constraints, and experimental designs. Integrating molecular genetic techniques with other strategies to study the proximal causes of behaviour provides opportunities to expand rapidly into new avenues of exploration. Such endeavours will enable us to better understand how repeatable individual differences in behaviour have evolved, how they are expressed and how they can be maintained within natural populations of animals.

  7. Draft genome of a Xanthomonas perforans strain associated with pith necrosis.

    PubMed

    Torelli, Emanuela; Aiello, Dalia; Polizzi, Giancarlo; Firrao, Giuseppe; Cirvilleri, Gabriella

    2015-02-01

    Xanthomonas perforans causes bacterial spot of tomato and pepper. A genome draft of an unusual isolate (strain 4P1S2), differing in that it was associated with stem pith necrosis, was assembled from Illumina MiSeq sequencing data using the draft of X. perforans strain 91-118 as a reference. The resulting draft (accession number JRWW00000000) largely overlapped with the reference draft. In addition, the reads not mapping on the reference assembly were selected and used for a further assembly, which revealed a large putative plasmid. The analysis of the predicted proteins showed only a few gene features that could be potentially implicated in the switch of a phytopathological behavior. © FEMS 2015. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.

  8. Spatial organization of chromatin domains and compartments in single chromosomes

    NASA Astrophysics Data System (ADS)

    Wang, Siyuan; Su, Jun-Han; Beliveau, Brian; Bintu, Bogdan; Moffitt, Jeffrey; Wu, Chao-Ting; Zhuang, Xiaowei

    The spatial organization of chromatin critically affects genome function. Recent chromosome-conformation-capture studies have revealed topologically associating domains (TADs) as a conserved feature of chromatin organization, but how TADs are spatially organized in individual chromosomes remains unknown. Here, we developed an imaging method for mapping the spatial positions of numerous genomic regions along individual chromosomes and traced the positions of TADs in human interphase autosomes and X chromosomes. We observed that chromosome folding deviates from the ideal fractal-globule model at large length scales and that TADs are largely organized into two compartments spatially arranged in a polarized manner in individual chromosomes. Active and inactive X chromosomes adopt different folding and compartmentalization configurations. These results suggest that the spatial organization of chromatin domains can change in response to regulation.

  9. Simultaneous non-contiguous deletions using large synthetic DNA and site-specific recombinases

    PubMed Central

    Krishnakumar, Radha; Grose, Carissa; Haft, Daniel H.; Zaveri, Jayshree; Alperovich, Nina; Gibson, Daniel G.; Merryman, Chuck; Glass, John I.

    2014-01-01

    Toward achieving rapid and large scale genome modification directly in a target organism, we have developed a new genome engineering strategy that uses a combination of bioinformatics aided design, large synthetic DNA and site-specific recombinases. Using Cre recombinase we swapped a target 126-kb segment of the Escherichia coli genome with a 72-kb synthetic DNA cassette, thereby effectively eliminating over 54 kb of genomic DNA from three non-contiguous regions in a single recombination event. We observed complete replacement of the native sequence with the modified synthetic sequence through the action of the Cre recombinase and no competition from homologous recombination. Because of the versatility and high-efficiency of the Cre-lox system, this method can be used in any organism where this system is functional as well as adapted to use with other highly precise genome engineering systems. Compared to present-day iterative approaches in genome engineering, we anticipate this method will greatly speed up the creation of reduced, modularized and optimized genomes through the integration of deletion analyses data, transcriptomics, synthetic biology and site-specific recombination. PMID:24914053

  10. Genome Sequencing of Museum Specimens Reveals Rapid Changes in the Genetic Composition of Honey Bees in California

    PubMed Central

    Ramirez, Santiago R; Dean, Cheryl A; Sciligo, Amber; Tsutsui, Neil D

    2018-01-01

    Abstract The western honey bee, Apis mellifera, is an enormously influential pollinator in both natural and managed ecosystems. In North America, this species has been introduced numerous times from a variety of different source populations in Europe and Africa. Since then, feral populations have expanded into many different environments across their broad introduced range. Here, we used whole genome sequencing of historical museum specimens and newly collected modern populations from California (USA) to analyze the impact of demography and selection on introduced populations during the past 105 years. We find that populations from both northern and southern California exhibit pronounced genetic changes, but have changed in different ways. In northern populations, honey bees underwent a substantial shift from western European to eastern European ancestry since the 1960s, whereas southern populations are dominated by the introgression of Africanized genomes during the past two decades. Additionally, we identify an isolated island population that has experienced comparatively little change over a large time span. Fine-scale comparison of different populations and time points also revealed SNPs that differ in frequency, highlighting a number of genes that may be important for recent adaptations in these introduced populations. PMID:29346588

  11. Trajectories and Drivers of Genome Evolution in Surface-Associated Marine Phaeobacter

    PubMed Central

    Sikorski, Johannes; Bunk, Boyke; Scheuner, Carmen; Meier-Kolthoff, Jan P; Spröer, Cathrin; Gram, Lone; Overmann, Jörg

    2017-01-01

    Abstract The extent of genome divergence and the evolutionary events leading to speciation of marine bacteria have mostly been studied for (locally) abundant, free-living groups. The genus Phaeobacter is found on different marine surfaces, seems to occupy geographically disjunct habitats, and is involved in different biotic interactions, and was therefore targeted in the present study. The analysis of the chromosomes of 32 closely related but geographically spread Phaeobacter strains revealed an exceptionally large, highly syntenic core genome. The flexible gene pool is constantly but slightly expanding across all Phaeobacter lineages. The horizontally transferred genes mostly originated from bacteria of the Roseobacter group and horizontal transfer most likely was mediated by gene transfer agents. No evidence for geographic isolation and habitat specificity of the different phylogenomic Phaeobacter clades was detected based on the sources of isolation. In contrast, the functional gene repertoire and physiological traits of different phylogenomic Phaeobacter clades were sufficiently distinct to suggest an adaptation to an associated lifestyle with algae, to additional nutrient sources, or toxic heavy metals. Our study reveals that the evolutionary trajectories of surface-associated marine bacteria can differ significantly from free-living marine bacteria or marine generalists. PMID:29194520

  12. Genome sequencing of four Aureobasidium pullulans varieties: biotechnological potential, stress tolerance, and description of new species.

    PubMed

    Gostinčar, Cene; Ohm, Robin A; Kogej, Tina; Sonjak, Silva; Turk, Martina; Zajc, Janja; Zalar, Polona; Grube, Martin; Sun, Hui; Han, James; Sharma, Aditi; Chiniquy, Jennifer; Ngan, Chew Yee; Lipzen, Anna; Barry, Kerrie; Grigoriev, Igor V; Gunde-Cimerman, Nina

    2014-07-01

    Aureobasidium pullulans is a black-yeast-like fungus used for production of the polysaccharide pullulan and the antimycotic aureobasidin A, and as a biocontrol agent in agriculture. It can cause opportunistic human infections, and it inhabits various extreme environments. To promote the understanding of these traits, we performed de-novo genome sequencing of the four varieties of A. pullulans. The 25.43-29.62 Mb genomes of these four varieties of A. pullulans encode between 10266 and 11866 predicted proteins. Their genomes encode most of the enzyme families involved in degradation of plant material and many sugar transporters, and they have genes possibly associated with degradation of plastic and aromatic compounds. Proteins believed to be involved in the synthesis of pullulan and siderophores, but not of aureobasidin A, are predicted. Putative stress-tolerance genes include several aquaporins and aquaglyceroporins, large numbers of alkali-metal cation transporters, genes for the synthesis of compatible solutes and melanin, all of the components of the high-osmolarity glycerol pathway, and bacteriorhodopsin-like proteins. All of these genomes contain a homothallic mating-type locus. The differences between these four varieties of A. pullulans are large enough to justify their redefinition as separate species: A. pullulans, A. melanogenum, A. subglaciale and A. namibiae. The redundancy observed in several gene families can be linked to the nutritional versatility of these species and their particular stress tolerance. The availability of the genome sequences of the four Aureobasidium species should improve their biotechnological exploitation and promote our understanding of their stress-tolerance mechanisms, diverse lifestyles, and pathogenic potential.

  13. Pyrosequencing-based comparative genome analysis of the nosocomial pathogen Enterococcus faecium and identification of a large transferable pathogenicity island

    PubMed Central

    2010-01-01

    Background The Gram-positive bacterium Enterococcus faecium is an important cause of nosocomial infections in immunocompromised patients. Results We present a pyrosequencing-based comparative genome analysis of seven E. faecium strains that were isolated from various sources. In the genomes of clinical isolates several antibiotic resistance genes were identified, including the vanA transposon that confers resistance to vancomycin in two strains. A functional comparison between E. faecium and the related opportunistic pathogen E. faecalis based on differences in the presence of protein families, revealed divergence in plant carbohydrate metabolic pathways and oxidative stress defense mechanisms. The E. faecium pan-genome was estimated to be essentially unlimited in size, indicating that E. faecium can efficiently acquire and incorporate exogenous DNA in its gene pool. One of the most prominent sources of genomic diversity consists of bacteriophages that have integrated in the genome. The CRISPR-Cas system, which contributes to immunity against bacteriophage infection in prokaryotes, is not present in the sequenced strains. Three sequenced isolates carry the esp gene, which is involved in urinary tract infections and biofilm formation. The esp gene is located on a large pathogenicity island (PAI), which is between 64 and 104 kb in size. Conjugation experiments showed that the entire esp PAI can be transferred horizontally and inserts in a site-specific manner. Conclusions Genes involved in environmental persistence, colonization and virulence can easily be acquired by E. faecium. This will make the development of successful treatment strategies targeted against this organism a challenge for years to come. PMID:20398277

  14. Genome-wide SNP scan in a porcine Large White×Minzhu intercross population reveals a locus influencing muscle mass on chromosome 2.

    PubMed

    Liu, Xin; Wang, Li Gang; Luo, Wei Zhen; Li, Yong; Liang, Jing; Yan, Hua; Zhao, Ke Bin; Wang, Li Xian; Zhang, Long Chao

    2014-12-01

    A high-density single nucleotide polymorphism (SNP) array containing 62 163 markers was employed for a genome-wide association study (GWAS) to identify variants associated with lean meat in ham (LMH, %) and lean meat percentage (LMP, %) within a porcine Large White×Minzhu intercross population. For each individual, LMH and LMP were measured after slaughter at the age of 240±7 days. A total of 557 F2 animals were genotyped. The GWAS revealed that 21 SNPs showed significant genome-wide or chromosome-wide associations with LMH and LMP by the Genome-wide Rapid Association using Mixed Model and Regression-Genomic Control approach. Nineteen significant genome-wide SNPs were mapped to the distal end of Sus scrofa chromosome (SSC) 2, where IGF2, a major known gene responsible for muscle mass, is located. A conditioned analysis, in which the genotype of the strongest associated SNP is included as a fixed effect in the model, showed that those significant SNPs on SSC2 were derived from a single quantitative trait locus. The two chromosome-wide associated SNPs on SSC1 disappeared after conditioned analysis, suggesting that this association signal is a false association derived from using an F2 population. The present result is expected to lead to novel insights into muscle mass in different pig breeds and lays a preliminary foundation for follow-up studies for identification of causal mutations for subsequent application in marker-assisted selection programs for improving muscle mass in pigs. © 2014 Japanese Society of Animal Science.
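
    The idea behind the conditioned analysis described above can be illustrated with a small sketch. It is not the study's method: the authors used a mixed-model approach (GRAMMAR-GC), whereas this example swaps in ordinary least squares on invented toy genotypes, and every name and value in it (top_snp, nearby_snp, phenotype) is hypothetical. It shows how a SNP that looks associated on its own can lose its effect once the strongest SNP is fitted as a fixed effect.

```python
# Toy illustration of a conditional association test using plain least squares.
# All genotypes and phenotypes below are invented; they are not data from the study.

def mean(values):
    return sum(values) / len(values)

def fit_line(x, y):
    """Return (intercept, slope) of the least-squares line y ~ a + b*x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope

def residuals(x, y):
    """Residuals of y after regressing it on x (with an intercept)."""
    a, b = fit_line(x, y)
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]

# Genotypes coded as allele counts (0, 1, 2). nearby_snp is correlated with
# top_snp, mimicking two markers in linkage disequilibrium at one locus.
top_snp = [0, 0, 1, 1, 1, 2, 2, 2]
nearby_snp = [0, 1, 1, 1, 2, 2, 2, 1]
# The phenotype depends only on top_snp (hypothetical effect of 2 units per allele).
phenotype = [10.0 + 2.0 * g for g in top_snp]

# Marginal test: nearby_snp alone appears to affect the phenotype.
_, marginal_effect = fit_line(nearby_snp, phenotype)

# Conditional test: remove the part of both variables explained by top_snp,
# then regress residual on residual (equivalent to fitting both SNPs jointly).
_, conditional_effect = fit_line(
    residuals(top_snp, nearby_snp),
    residuals(top_snp, phenotype),
)

print(f"marginal effect of nearby SNP:    {marginal_effect:.3f}")
print(f"conditional effect of nearby SNP: {conditional_effect:.3f}")
```

    In this toy case the marginal effect is clearly positive while the conditional effect vanishes, which is the pattern the abstract describes for the SSC2 SNPs tracing back to a single quantitative trait locus.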

  15. Internally deleted WNV genomes isolated from exotic birds in New Mexico: function in cells, mosquitoes, and mice.

    PubMed

    Pesko, Kendra N; Fitzpatrick, Kelly A; Ryan, Elizabeth M; Shi, Pei-Yong; Zhang, Bo; Lennon, Niall J; Newman, Ruchi M; Henn, Matthew R; Ebel, Gregory D

    2012-05-25

    Most RNA viruses exist in their hosts as a heterogeneous population of related variants. Due to error prone replication, mutants are constantly generated which may differ in individual fitness from the population as a whole. Here we characterize three WNV isolates that contain, along with full-length genomes, mutants with large internal deletions to structural and nonstructural protein-coding regions. The isolates were all obtained from lorikeets that died from WNV at the Rio Grande Zoo in Albuquerque, NM between 2005 and 2007. The deletions are approximately 2kb, in frame, and result in the elimination of the complete envelope, and portions of the prM and NS-1 proteins. In Vero cell culture, these internally deleted WNV genomes function as defective interfering particles, reducing the production of full-length virus when introduced at high multiplicities of infection. In mosquitoes, the shortened WNV genomes reduced infection and dissemination rates, and virus titers overall, and were not detected in legs or salivary secretions at 14 or 21 days post-infection. In mice, inoculation with internally deleted genomes did not attenuate pathogenesis relative to full-length or infectious clone derived virus, and shortened genomes were not detected in mice at the time of death. These observations provide evidence that large deletions may occur within flavivirus populations more frequently than has generally been appreciated and suggest that they impact population phenotype minimally. Additionally, our findings suggest that highly similar mutants may frequently occur in particular vertebrate hosts. Copyright © 2012 Elsevier Inc. All rights reserved.

  16. A Novel Virus Detected in Papillomas and Carcinomas of the Endangered Western Barred Bandicoot (Perameles bougainville) Exhibits Genomic Features of both the Papillomaviridae and Polyomaviridae

    PubMed Central

    Woolford, Lucy; Rector, Annabel; Van Ranst, Marc; Ducki, Andrea; Bennett, Mark D.; Nicholls, Philip K.; Warren, Kristin S.; Swan, Ralph A.; Wilcox, Graham E.; O'Hara, Amanda J.

    2007-01-01

    Conservation efforts to prevent the extinction of the endangered western barred bandicoot (Perameles bougainville) are currently hindered by a progressively debilitating cutaneous and mucocutaneous papillomatosis and carcinomatosis syndrome observed in captive and wild populations. In this study, we detected a novel virus, designated the bandicoot papillomatosis carcinomatosis virus type 1 (BPCV1), in lesional tissue from affected western barred bandicoots using multiply primed rolling-circle amplification and PCR with the cutaneotropic papillomavirus primer pairs FAP59/FAP64 and AR-L1F8/AR-L1R9. Sequencing of the BPCV1 genome revealed a novel prototype virus exhibiting genomic properties of both the Papillomaviridae and the Polyomaviridae. Papillomaviral properties included a large genome size (∼7.3 kb) and the presence of open reading frames (ORFs) encoding canonical L1 and L2 structural proteins. The genomic organization in which structural and nonstructural proteins were encoded on different strands of the double-stranded genome and the presence of ORFs encoding the nonstructural proteins large T and small t antigens were, on the other hand, typical polyomaviral features. BPCV1 may represent the first member of a novel virus family, descended from a common ancestor of the papillomaviruses and polyomaviruses recognized today. Alternatively, it may represent the product of ancient recombination between members of these two virus families. The discovery of this virus could have implications for the current taxonomic classification of Papillomaviridae and Polyomaviridae and can provide further insight into the evolution of these ancient virus families. PMID:17898069

  17. The Complete Mitochondrial Genome of Gossypium hirsutum and Evolutionary Analysis of Higher Plant Mitochondrial Genomes

    PubMed Central

    Su, Aiguo; Geng, Jianing; Grover, Corrinne E.; Hu, Songnian; Hua, Jinping

    2013-01-01

    Background Mitochondria are the main manufacturers of cellular ATP in eukaryotes. The plant mitochondrial genome contains a large amount of foreign DNA and repeated sequences that undergo frequent intramolecular recombination. Upland Cotton (Gossypium hirsutum L.) is one of the main natural fiber crops and also an important oil-producing plant in the world. Sequencing of the cotton mitochondrial (mt) genome could aid research on the evolution of plant mt genomes. Methodology/Principal Findings We combined 454 sequencing with screening of a fosmid library of the Gossypium hirsutum mt genome and sequencing of positive clones, and conducted a series of evolutionary analyses on the mt genomes of Cycas taitungensis and 24 angiosperms. After data assembly and contig joining, the complete mitochondrial genome sequence of G. hirsutum was obtained. The completed G. hirsutum mt genome is 621,884 bp in length and contains 68 genes, including 35 protein genes, four rRNA genes and 29 tRNA genes. Five gene clusters are found conserved in all plant mt genomes; one and four clusters are specifically conserved in monocots and dicots, respectively. Homologous sequences are distributed along the plant mt genomes, and closely related species share the most homologous sequences. For species that have both mt and chloroplast genome sequences available, we checked the location of cp-like migration and found several fragments closely linked with mitochondrial genes. Conclusion The G. hirsutum mt genome possesses most of the common characters of higher plant mt genomes. The existence of syntenic gene clusters, as well as the conservation of some intergenic sequences and genic content among the plant mt genomes, suggests that evolution of mt genomes is consistent with plant taxonomy but independent among different species. PMID:23940520

  18. The complete mitochondrial genome of Gossypium hirsutum and evolutionary analysis of higher plant mitochondrial genomes.

    PubMed

    Liu, Guozheng; Cao, Dandan; Li, Shuangshuang; Su, Aiguo; Geng, Jianing; Grover, Corrinne E; Hu, Songnian; Hua, Jinping

    2013-01-01

    Mitochondria are the main manufacturers of cellular ATP in eukaryotes. The plant mitochondrial genome contains a large amount of foreign DNA and repeated sequences that undergo frequent intramolecular recombination. Upland Cotton (Gossypium hirsutum L.) is one of the main natural fiber crops and also an important oil-producing plant in the world. Sequencing of the cotton mitochondrial (mt) genome could aid research on the evolution of plant mt genomes. We combined 454 sequencing with screening of a fosmid library of the Gossypium hirsutum mt genome and sequencing of positive clones, and conducted a series of evolutionary analyses on the mt genomes of Cycas taitungensis and 24 angiosperms. After data assembly and contig joining, the complete mitochondrial genome sequence of G. hirsutum was obtained. The completed G. hirsutum mt genome is 621,884 bp in length and contains 68 genes, including 35 protein genes, four rRNA genes and 29 tRNA genes. Five gene clusters are found conserved in all plant mt genomes; one and four clusters are specifically conserved in monocots and dicots, respectively. Homologous sequences are distributed along the plant mt genomes, and closely related species share the most homologous sequences. For species that have both mt and chloroplast genome sequences available, we checked the location of cp-like migration and found several fragments closely linked with mitochondrial genes. The G. hirsutum mt genome possesses most of the common characters of higher plant mt genomes. The existence of syntenic gene clusters, as well as the conservation of some intergenic sequences and genic content among the plant mt genomes, suggests that evolution of mt genomes is consistent with plant taxonomy but independent among different species.

  19. Altools: a user friendly NGS data analyser.

    PubMed

    Camiolo, Salvatore; Sablok, Gaurav; Porceddu, Andrea

    2016-02-17

    Genotyping by re-sequencing has become a standard approach to estimate single nucleotide polymorphism (SNP) diversity, haplotype structure and the biodiversity and has been defined as an efficient approach to address geographical population genomics of several model species. To access core SNPs and insertion/deletion polymorphisms (indels), and to infer the phyletic patterns of speciation, most such approaches map short reads to the reference genome. Variant calling is important to establish patterns of genome-wide association studies (GWAS) for quantitative trait loci (QTLs), and to determine the population and haplotype structure based on SNPs, thus allowing content-dependent trait and evolutionary analysis. Several tools have been developed to investigate such polymorphisms as well as more complex genomic rearrangements such as copy number variations, presence/absence variations and large deletions. The programs available for this purpose have different strengths (e.g. accuracy, sensitivity and specificity) and weaknesses (e.g. low computation speed, complex installation procedure and absence of a user-friendly interface). Here we introduce Altools, a software package that is easy to install and use, which allows the precise detection of polymorphisms and structural variations. Altools uses the BWA/SAMtools/VarScan pipeline to call SNPs and indels, and the dnaCopy algorithm to achieve genome segmentation according to local coverage differences in order to identify copy number variations. It also uses insert size information from the alignment of paired-end reads and detects potential large deletions. A double mapping approach (BWA/BLASTn) identifies precise breakpoints while ensuring rapid elaboration. Finally, Altools implements several processes that yield deeper insight into the genes affected by the detected polymorphisms. 
Altools was used to analyse both simulated and real next-generation sequencing (NGS) data and performed satisfactorily in terms of positive predictive values, sensitivity, the identification of large deletion breakpoints and copy number detection. Altools is fast, reliable and easy to use for the mining of NGS data. The software package also attempts to link identified polymorphisms and structural variants to their biological functions thus providing more valuable information than similar tools.
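
As a rough illustration of the insert-size idea described in the Altools abstract above, the sketch below flags read pairs whose mapped insert size is far above the library's typical value, which is one signal of a possible large deletion. It is not Altools' implementation: the function name, the median/MAD cutoff and the factor k are assumptions made only for this example.

```python
from statistics import median

def flag_deletion_candidates(insert_sizes, k=5.0):
    """Return indices of read pairs with unusually large insert sizes.

    insert_sizes: mapped insert sizes (bp), one per read pair.
    k: hypothetical multiplier on the median absolute deviation (MAD).
    """
    med = median(insert_sizes)
    # MAD is a robust spread estimate that outliers barely influence.
    mad = median(abs(size - med) for size in insert_sizes)
    # Guard against a zero MAD when most pairs share one insert size.
    cutoff = med + k * max(mad, 1.0)
    return [i for i, size in enumerate(insert_sizes) if size > cutoff]

# Toy library with a ~300 bp insert size and one stretched pair.
sizes = [300, 310, 295, 305, 2300, 298, 302]
print(flag_deletion_candidates(sizes))
```

A real caller would additionally cluster flagged pairs by position and confirm breakpoints, as the abstract describes for the BWA/BLASTn double mapping step.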

  20. Infection cycles of large DNA viruses: Emerging themes and underlying questions

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Mutsafi, Yael, E-mail: yael.mutsafi@weizmann.ac.il; Fridmann-Sirkis, Yael; Milrot, Elad

    The discovery of giant DNA viruses and the recent realization that such viruses are diverse and abundant blurred the distinction between viruses and cells. These findings elicited lively debates on the nature and origin of viruses as well as on their potential roles in the evolution of cells. The following essay is, however, concerned with new insights into fundamental structural and physical aspects of viral replication that were derived from studies conducted on large DNA viruses. Specifically, the entirely cytoplasmic replication cycles of Mimivirus and Vaccinia are discussed in light of the highly limited trafficking of large macromolecules in the crowded cytoplasm of cells. The extensive spatiotemporal order revealed by cytoplasmic viral factories is described and contended to play an important role in promoting the efficiency of these ‘nuclear-like’ organelles. Generation of single-layered internal membrane sheets in Mimivirus and Vaccinia, which proceeds through a novel membrane biogenesis mechanism that enables continuous supply of lipids, is highlighted as an intriguing case study of self-assembly. Mimivirus genome encapsidation was shown to occur through a portal different from the ‘stargate’ portal that is used for genome release. Such a ‘division of labor’ is proposed to enhance the efficacy of translocation processes of very large viral genomes. Finally, open questions concerning the infection cycles of giant viruses to which future studies are likely to provide novel and exciting answers are discussed. - Highlights: • The discovery of giant DNA viruses blurs the distinction between viruses and cells. • Mimivirus and Vaccinia replicate exclusively in their host cytoplasm. • Mimivirus genome is delivered through a unique portal coined the Stargate. • Generation of Mimivirus internal membrane proceeds through a novel pathway.

  1. The complete mitochondrial genomes of two rice planthoppers, Nilaparvata lugens and Laodelphax striatellus: conserved genome rearrangement in Delphacidae and discovery of new characteristics of atp8 and tRNA genes.

    PubMed

    Zhang, Kai-Jun; Zhu, Wen-Chao; Rong, Xia; Zhang, Yan-Kai; Ding, Xiu-Lei; Liu, Jing; Chen, Da-Song; Du, Yu; Hong, Xiao-Yue

    2013-06-22

    Nilaparvata lugens (the brown planthopper, BPH) and Laodelphax striatellus (the small brown planthopper, SBPH) are two of the most important pests of rice. Until now, only one mitochondrial genome of a rice planthopper had been sequenced, and very little dependable mitochondrial information was available for research on population genetics, phylogeographics and phylogenetic evolution of these pests. To get more valuable information from the mitochondria, we sequenced the complete mitochondrial genomes of BPH and SBPH. These two planthoppers were infected with two different functional Wolbachia (intracellular endosymbiont) strains (wLug and wStri). Since both mitochondria and Wolbachia are transmitted by cytoplasmic inheritance and were difficult to separate when purifying the Wolbachia particles, sequencing the Wolbachia genomes with a next-generation sequencing method also yielded nearly complete mitochondrial genome sequences of these two rice planthoppers. After gap closing, we present high quality and reliable complete mitochondrial genomes of these two planthoppers. The mitogenomes of N. lugens (BPH) and L. striatellus (SBPH) are 17,619 bp and 16,431 bp long with A + T contents of 76.95% and 77.17%, respectively. Both species have typical circular mitochondrial genomes that encode the complete set of 37 genes which are usually found in metazoans. However, the BPH mitogenome also possesses two additional copies of the trnC gene. In both mitochondrial genomes, the atp8 gene was conspicuously shorter than in all other known insect mitochondrial genomes (99 bp for BPH, 102 bp for SBPH). The finding that two rearrangement regions (trnC-trnW and nad6-trnP-trnT), which differ from those of other known insects, occur in these two distantly related planthoppers indicates that the mitochondrial gene order might be conserved in Delphacidae. The large non-coding fragment (the A+T-rich region), putatively responsible for the control of replication and transcription of mitochondria, contained a block of variable number of tandem repeats (VNTRs) in different natural individuals of these two planthoppers. Comparison with a previously sequenced individual of SBPH revealed that the mitochondrial genetic variation within a species exists not only in the sequence and secondary structure of genes, but also in the gene order (the different location of the trnH gene). The mitochondrial genome arrangement pattern found in planthoppers involved rearrangements of both tRNA genes and protein-coding genes (PCGs). Different species from different genera of Delphacidae possessing the same mitochondrial gene rearrangement suggests that gene rearrangements of the mitochondrial genome probably occurred before the differentiation of this family. After comparatively analyzing the gene order of different species of Hemiptera, we propose that, except for some specific taxonomic groups (e.g. the whiteflies), the gene order might have diversified at the family level of this order. The VNTRs detected in the control region might provide additional genetic markers for studying population genetics, individual difference and phylogeographics of planthoppers.
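
The A + T contents quoted in the abstract above are simple base-composition percentages. A minimal sketch of that calculation follows; the function name and the choice to ignore ambiguous bases such as N are assumptions for illustration, not the authors' procedure.

```python
def at_content(seq):
    """Return the A+T percentage of a DNA sequence.

    Only unambiguous bases (A, C, G, T) are counted; anything else,
    such as N, is skipped. This convention is an assumption.
    """
    bases = [b for b in seq.upper() if b in "ACGT"]
    if not bases:
        raise ValueError("sequence contains no A, C, G or T bases")
    at = sum(1 for b in bases if b in "AT")
    return 100.0 * at / len(bases)

# Toy sequence: 4 of 6 counted bases are A or T.
print(round(at_content("AATTGC"), 2))
```

Applied to a full mitogenome sequence, this kind of function would yield a single percentage comparable to the 76.95% and 77.17% values reported for BPH and SBPH.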

  2. The complete mitochondrial genomes of two rice planthoppers, Nilaparvata lugens and Laodelphax striatellus: conserved genome rearrangement in Delphacidae and discovery of new characteristics of atp8 and tRNA genes

    PubMed Central

    2013-01-01

    Background Nilaparvata lugens (the brown planthopper, BPH) and Laodelphax striatellus (the small brown planthopper, SBPH) are two of the most important pests of rice. Until now, only one mitochondrial genome of a rice planthopper had been sequenced, and very little dependable mitochondrial information was available for research on population genetics, phylogeographics and phylogenetic evolution of these pests. To get more valuable information from the mitochondria, we sequenced the complete mitochondrial genomes of BPH and SBPH. These two planthoppers were infected with two different functional Wolbachia (intracellular endosymbiont) strains (wLug and wStri). Since both mitochondria and Wolbachia are transmitted by cytoplasmic inheritance and were difficult to separate when purifying the Wolbachia particles, sequencing the Wolbachia genomes with a next-generation sequencing method also yielded nearly complete mitochondrial genome sequences of these two rice planthoppers. After gap closing, we present high quality and reliable complete mitochondrial genomes of these two planthoppers. Results The mitogenomes of N. lugens (BPH) and L. striatellus (SBPH) are 17,619 bp and 16,431 bp long with A + T contents of 76.95% and 77.17%, respectively. Both species have typical circular mitochondrial genomes that encode the complete set of 37 genes which are usually found in metazoans. However, the BPH mitogenome also possesses two additional copies of the trnC gene. In both mitochondrial genomes, the atp8 gene was conspicuously shorter than in all other known insect mitochondrial genomes (99 bp for BPH, 102 bp for SBPH). The finding that two rearrangement regions (trnC-trnW and nad6-trnP-trnT), which differ from those of other known insects, occur in these two distantly related planthoppers indicates that the mitochondrial gene order might be conserved in Delphacidae. The large non-coding fragment (the A+T-rich region), putatively responsible for the control of replication and transcription of mitochondria, contained a block of variable number of tandem repeats (VNTRs) in different natural individuals of these two planthoppers. Comparison with a previously sequenced individual of SBPH revealed that the mitochondrial genetic variation within a species exists not only in the sequence and secondary structure of genes, but also in the gene order (the different location of the trnH gene). Conclusion The mitochondrial genome arrangement pattern found in planthoppers involved rearrangements of both tRNA genes and protein-coding genes (PCGs). Different species from different genera of Delphacidae possessing the same mitochondrial gene rearrangement suggests that gene rearrangements of the mitochondrial genome probably occurred before the differentiation of this family. After comparatively analyzing the gene order of different species of Hemiptera, we propose that, except for some specific taxonomic groups (e.g. the whiteflies), the gene order might have diversified at the family level of this order. The VNTRs detected in the control region might provide additional genetic markers for studying population genetics, individual difference and phylogeographics of planthoppers. PMID:23799924

  3. Ecological genomics meets community-level modelling of biodiversity: mapping the genomic landscape of current and future environmental adaptation.

    PubMed

    Fitzpatrick, Matthew C; Keller, Stephen R

    2015-01-01

    Local adaptation is a central feature of most species occupying spatially heterogeneous environments, and may factor critically in responses to environmental change. However, most efforts to model the response of species to climate change ignore intraspecific variation due to local adaptation. Here, we present a new perspective on spatial modelling of organism-environment relationships that combines genomic data and community-level modelling to develop scenarios regarding the geographic distribution of genomic variation in response to environmental change. Rather than modelling species within communities, we use these techniques to model large numbers of loci across genomes. Using balsam poplar (Populus balsamifera) as a case study, we demonstrate how our framework can accommodate nonlinear responses of loci to environmental gradients. We identify a threshold response to temperature in the circadian clock gene GIGANTEA-5 (GI5), suggesting that this gene has experienced strong local adaptation to temperature. We also demonstrate how these methods can map ecological adaptation from genomic data, including the identification of predicted differences in the genetic composition of populations under current and future climates. Community-level modelling of genomic variation represents an important advance in landscape genomics and spatial modelling of biodiversity that moves beyond species-level assessments of climate change vulnerability. © 2014 John Wiley & Sons Ltd/CNRS.

  4. Segmental duplications: evolution and impact among the current Lepidoptera genomes.

    PubMed

    Zhao, Qian; Ma, Dongna; Vasseur, Liette; You, Minsheng

    2017-07-06

    Structural variation among genomes is now viewed to be as important as single nucleotide polymorphisms in influencing the phenotype and evolution of a species. Segmental duplication (SD) is defined as segments of DNA with homologous sequence. Here, we performed a systematic analysis of segmental duplications (SDs) among five lepidopteran reference genomes (Plutella xylostella, Danaus plexippus, Bombyx mori, Manduca sexta and Heliconius melpomene) to understand their potential impact on the evolution of these species. We find that the SD content differed substantially among species, ranging from 1.2% of the genome in B. mori to 15.2% in H. melpomene. Most SDs formed very high identity (similarity higher than 90%) blocks but had very few large blocks. Comparative analysis showed that most of the SDs arose after the divergence of each lineage and we found that P. xylostella and H. melpomene showed more duplications than other species, suggesting they might be able to tolerate extensive levels of variation in their genomes. Conserved ancestral and species specific SD events were assessed, revealing multiple examples of the gain, loss or maintenance of SDs over time. SD content analysis showed that most of the genes embedded in SD regions belonged to species-specific SDs ("Unique" SDs). Functional analysis of these genes suggested their potential roles in the lineage-specific evolution. SDs and flanking regions often contained transposable elements (TEs) and this association suggested some involvement in SD formation. Further studies on comparison of gene expression level between SDs and non-SDs showed that the expression level of genes embedded in SDs was significantly lower, suggesting that structural changes in the genomes are involved in gene expression differences in species. The results showed that most of the SDs were "unique SDs", which originated after species formation. Functional analysis suggested that SDs might play different roles in different species. Our results provide a valuable resource beyond the genetic mutation to explore the genome structure for future Lepidoptera research.

  5. Resources and costs for microbial sequence analysis evaluated using virtual machines and cloud computing.

    PubMed

    Angiuoli, Samuel V; White, James R; Matalka, Malcolm; White, Owen; Fricke, W Florian

    2011-01-01

    The widespread popularity of genomic applications is threatened by the "bioinformatics bottleneck" resulting from uncertainty about the cost and infrastructure needed to meet increasing demands for next-generation sequence analysis. Cloud computing services have been discussed as potential new bioinformatics support systems but have not been evaluated thoroughly. We present benchmark costs and runtimes for common microbial genomics applications, including 16S rRNA analysis, microbial whole-genome shotgun (WGS) sequence assembly and annotation, WGS metagenomics and large-scale BLAST. Sequence dataset types and sizes were selected to correspond to outputs typically generated by small- to midsize facilities equipped with 454 and Illumina platforms, except for WGS metagenomics where sampling of Illumina data was used. Automated analysis pipelines, as implemented in the CloVR virtual machine, were used in order to guarantee transparency, reproducibility and portability across different operating systems, including the commercial Amazon Elastic Compute Cloud (EC2), which was used to attach real dollar costs to each analysis type. We found considerable differences in computational requirements, runtimes and costs associated with different microbial genomics applications. While all 16S analyses completed on a single-CPU desktop in under three hours, microbial genome and metagenome analyses utilized multi-CPU support of up to 120 CPUs on Amazon EC2, where each analysis completed in under 24 hours for less than $60. Representative datasets were used to estimate maximum data throughput on different cluster sizes and to compare costs between EC2 and comparable local grid servers. 
Although bioinformatics requirements for microbial genomics depend on dataset characteristics and the analysis protocols applied, our results suggest that smaller sequencing facilities (up to three Roche/454 or one Illumina GAIIx sequencer) invested in 16S rRNA amplicon sequencing, microbial single-genome and metagenomics WGS projects can achieve cost-efficient bioinformatics support using CloVR in combination with Amazon EC2 as an alternative to local computing centers.

  6. Resources and Costs for Microbial Sequence Analysis Evaluated Using Virtual Machines and Cloud Computing

    PubMed Central

    Angiuoli, Samuel V.; White, James R.; Matalka, Malcolm; White, Owen; Fricke, W. Florian

    2011-01-01

    Background The widespread popularity of genomic applications is threatened by the “bioinformatics bottleneck” resulting from uncertainty about the cost and infrastructure needed to meet increasing demands for next-generation sequence analysis. Cloud computing services have been discussed as potential new bioinformatics support systems but have not been evaluated thoroughly. Results We present benchmark costs and runtimes for common microbial genomics applications, including 16S rRNA analysis, microbial whole-genome shotgun (WGS) sequence assembly and annotation, WGS metagenomics and large-scale BLAST. Sequence dataset types and sizes were selected to correspond to outputs typically generated by small- to midsize facilities equipped with 454 and Illumina platforms, except for WGS metagenomics where sampling of Illumina data was used. Automated analysis pipelines, as implemented in the CloVR virtual machine, were used in order to guarantee transparency, reproducibility and portability across different operating systems, including the commercial Amazon Elastic Compute Cloud (EC2), which was used to attach real dollar costs to each analysis type. We found considerable differences in computational requirements, runtimes and costs associated with different microbial genomics applications. While all 16S analyses completed on a single-CPU desktop in under three hours, microbial genome and metagenome analyses utilized multi-CPU support of up to 120 CPUs on Amazon EC2, where each analysis completed in under 24 hours for less than $60. Representative datasets were used to estimate maximum data throughput on different cluster sizes and to compare costs between EC2 and comparable local grid servers. 
Conclusions Although bioinformatics requirements for microbial genomics depend on dataset characteristics and the analysis protocols applied, our results suggest that smaller sequencing facilities (up to three Roche/454 or one Illumina GAIIx sequencer) invested in 16S rRNA amplicon sequencing, microbial single-genome and metagenomics WGS projects can achieve cost-efficient bioinformatics support using CloVR in combination with Amazon EC2 as an alternative to local computing centers. PMID:22028928

  7. Analysis of the Pantoea ananatis pan-genome reveals factors underlying its ability to colonize and interact with plant, insect and vertebrate hosts.

    PubMed

    De Maayer, Pieter; Chan, Wai Yin; Rubagotti, Enrico; Venter, Stephanus N; Toth, Ian K; Birch, Paul R J; Coutinho, Teresa A

    2014-05-27

    Pantoea ananatis is found in a wide range of natural environments, including water, soil, as part of the epi- and endophytic flora of various plant hosts, and in the insect gut. Some strains have proven effective as biological control agents and plant-growth promoters, while other strains have been implicated in diseases of a broad range of plant hosts and humans. By analysing the pan-genome of eight sequenced P. ananatis strains isolated from different sources we identified factors potentially underlying its ability to colonize and interact with hosts in both the plant and animal Kingdoms. The pan-genome of the eight compared P. ananatis strains consisted of a core genome comprised of 3,876 protein coding sequences (CDSs) and a sizeable accessory genome consisting of 1,690 CDSs. We estimate that ~106 unique CDSs would be added to the pan-genome with each additional P. ananatis genome sequenced in the future. The accessory fraction is derived mainly from integrated prophages and codes mostly for proteins of unknown function. Comparison of the translated CDSs on the P. ananatis pan-genome with the proteins encoded on all sequenced bacterial genomes currently available revealed that P. ananatis carries a number of CDSs with orthologs restricted to bacteria associated with distinct hosts, namely plant-, animal- and insect-associated bacteria. These CDSs encode proteins with putative roles in transport and metabolism of carbohydrate and amino acid substrates, adherence to host tissues, protection against plant and animal defense mechanisms and the biosynthesis of potential pathogenicity determinants including insecticidal peptides, phytotoxins and type VI secretion system effectors. P. ananatis has an 'open' pan-genome typical of bacterial species that colonize several different environments. The pan-genome incorporates a large number of genes encoding proteins that may enable P. 
ananatis to colonize, persist in and potentially cause disease symptoms in a wide range of plant and animal hosts.
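
The core/accessory split described in the P. ananatis abstract above can be pictured with plain set operations. The sketch below is only an illustration under stated assumptions: genes are represented by hypothetical ortholog-cluster identifiers, and the function name and toy data are invented for the example rather than taken from the study.

```python
def partition_pan_genome(strain_genes):
    """Split a pan-genome into core and accessory gene sets.

    strain_genes: dict mapping a strain name to an iterable of
    ortholog-cluster identifiers (hypothetical labels).
    Core = clusters present in every strain; accessory = the rest.
    """
    gene_sets = [set(genes) for genes in strain_genes.values()]
    if not gene_sets:
        return set(), set()
    pan = set().union(*gene_sets)
    core = set.intersection(*gene_sets)
    return core, pan - core

# Toy data for three made-up strains.
strains = {
    "strain_1": {"geneA", "geneB", "geneC"},
    "strain_2": {"geneA", "geneB", "geneD"},
    "strain_3": {"geneA", "geneB"},
}
core, accessory = partition_pan_genome(strains)
print(sorted(core), sorted(accessory))
```

In a real analysis the hard part is the upstream ortholog clustering; once clusters are assigned, counts such as the 3,876 core and 1,690 accessory CDSs reported above correspond to the sizes of these two sets.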

  8. Genomic imprinting in Drosophila has properties of both mammalian and insect imprinting.

    PubMed

    Anaka, Matthew; Lynn, Audra; McGinn, Patrick; Lloyd, Vett K

    2009-02-01

    Genomic imprinting is a process that marks DNA, causing a change in gene or chromosome behavior, depending on the sex of the transmitting parent. In mammals, most examples of genomic imprinting affect the transcription of individual or small clusters of genes whereas in insects, genomic imprinting tends to silence entire chromosomes. This has been interpreted as evidence of independent evolutionary origins for imprinting. To investigate how these types of imprinting are related, we performed a phenotypic, molecular, and cytological analysis of an imprinted chromosome in Drosophila melanogaster. Analysis of this chromosome reveals that the imprint results in transcriptional silencing. Yet, the domain of transcriptional silencing is very large, extending at least 1.2 Mb and encompassing over 100 genes, and is associated with decreased somatic polytenization of the entire chromosome. We propose that repression of somatic replication in polytenized cells, as a secondary response to the imprint, acts to extend the size of the imprinted domain to an entire chromosome. Thus, imprinting in Drosophila has properties of both typical mammalian and insect imprinting which suggests that genomic imprinting in Drosophila and mammals is not fundamentally different; imprinting is manifest as transcriptional silencing of a few genes or silencing of an entire chromosome depending on secondary processes such as differences in gene density and polytenization.

  9. Tissue-specific NETs alter genome organization and regulation even in a heterologous system.

    PubMed

    de Las Heras, Jose I; Zuleger, Nikolaj; Batrakou, Dzmitry G; Czapiewski, Rafal; Kerr, Alastair R W; Schirmer, Eric C

    2017-01-02

    Different cell types exhibit distinct patterns of 3D genome organization that correlate with changes in gene expression in tissue and differentiation systems. Several tissue-specific nuclear envelope transmembrane proteins (NETs) have been found to influence the spatial positioning of genes and chromosomes that normally occurs during tissue differentiation. Here we study 3 such NETs: NET29, NET39, and NET47, which are expressed preferentially in fat, muscle and liver, respectively. We found that even when exogenously expressed in a heterologous system they can specify particular genome organization patterns and alter gene expression. Each NET affected largely different subsets of genes. Notably, the liver-specific NET47 upregulated many genes in HT1080 fibroblast cells that are normally upregulated in hepatogenesis, showing that tissue-specific NETs can favor expression patterns associated with the tissue where the NET is normally expressed. Similarly, global profiling of peripheral chromatin after exogenous expression of these NETs using lamin B1 DamID revealed that each NET affected the nuclear positioning of distinct sets of genomic regions with a significant tissue-specific component. Thus NET influences on genome organization can contribute to gene expression changes associated with differentiation even in the absence of other factors and overt cellular differentiation changes.

  10. Identification of copy number variants in whole-genome data using Reference Coverage Profiles

    PubMed Central

    Glusman, Gustavo; Severson, Alissa; Dhankani, Varsha; Robinson, Max; Farrah, Terry; Mauldin, Denise E.; Stittrich, Anna B.; Ament, Seth A.; Roach, Jared C.; Brunkow, Mary E.; Bodian, Dale L.; Vockley, Joseph G.; Shmulevich, Ilya; Niederhuber, John E.; Hood, Leroy

    2015-01-01

    The identification of DNA copy numbers from short-read sequencing data remains a challenge for both technical and algorithmic reasons. The raw data for these analyses are measured in tens to hundreds of gigabytes per genome; transmitting, storing, and analyzing such large files is cumbersome, particularly for methods that analyze several samples simultaneously. We developed a very efficient representation of depth of coverage (150–1000× compression) that enables such analyses. Current methods for analyzing variants in whole-genome sequencing (WGS) data frequently miss copy number variants (CNVs), particularly hemizygous deletions in the 1–100 kb range. To fill this gap, we developed a method to identify CNVs in individual genomes, based on comparison to joint profiles pre-computed from a large set of genomes. We analyzed depth of coverage in over 6000 high quality (>40×) genomes. The depth of coverage has strong sequence-specific fluctuations only partially explained by global parameters like %GC. To account for these fluctuations, we constructed multi-genome profiles representing the observed or inferred diploid depth of coverage at each position along the genome. These Reference Coverage Profiles (RCPs) take into account the diverse technologies and pipeline versions used. Normalization of the scaled coverage to the RCP followed by hidden Markov model (HMM) segmentation enables efficient detection of CNVs and large deletions in individual genomes. Use of pre-computed multi-genome coverage profiles improves our ability to analyze each individual genome. We make available RCPs and tools for performing these analyses on personal genomes. We expect the increased sensitivity and specificity for individual genome analysis to be critical for achieving clinical-grade genome interpretation. PMID:25741365
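
    The normalization step in this abstract can be sketched as follows. This is an illustration under stated assumptions, not the authors' method: coverage is assumed to be pre-binned, scaling uses the median, and a simple run-length threshold stands in for the HMM segmentation the abstract describes. The function names, the threshold of -0.6, and the toy coverage values are all hypothetical.

```python
import math
from statistics import median


def log2_ratio(sample_cov, reference_cov, eps=1e-9):
    """Scale both tracks by their median, then return per-bin log2(sample/reference)."""
    s_med = median(sample_cov)
    r_med = median(reference_cov)
    return [
        math.log2((s / s_med + eps) / (r / r_med + eps))
        for s, r in zip(sample_cov, reference_cov)
    ]


def call_low_runs(ratios, threshold=-0.6, min_bins=3):
    """Return half-open (start, end) runs of at least min_bins bins below threshold.

    A plain threshold is used here in place of HMM segmentation (assumption).
    """
    runs, start = [], None
    for i, value in enumerate(ratios):
        if value < threshold:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_bins:
                runs.append((start, i))
            start = None
    if start is not None and len(ratios) - start >= min_bins:
        runs.append((start, len(ratios)))
    return runs


# Toy data: a flat reference profile and a sample with half coverage in bins 3..6,
# as expected for a hemizygous deletion (log2 ratio near -1).
reference = [30.0] * 10
sample = [30.0, 30.0, 30.0, 15.0, 15.0, 15.0, 15.0, 30.0, 30.0, 30.0]
ratios = log2_ratio(sample, reference)
print(call_low_runs(ratios))
```

    Dividing by a reference profile rather than a genome-wide mean is what removes position-specific coverage fluctuations. A real profile would vary along the genome instead of being flat as in this toy example.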

  11. Cloud computing for genomic data analysis and collaboration.

    PubMed

    Langmead, Ben; Nellore, Abhinav

    2018-04-01

    Next-generation sequencing has made major strides in the past decade. Studies based on large sequencing data sets are growing in number, and public archives for raw sequencing data have been doubling in size every 18 months. Leveraging these data requires researchers to use large-scale computational resources. Cloud computing, a model whereby users rent computers and storage from large data centres, is a solution that is gaining traction in genomics research. Here, we describe how cloud computing is used in genomics for research and large-scale collaborations, and argue that its elasticity, reproducibility and privacy features make it ideally suited for the large-scale reanalysis of publicly available archived data, including privacy-protected data.

  12. Amplification of the 1731 LTR retrotransposon in Drosophila melanogaster cultured cells: origin of neocopies and impact on the genome.

    PubMed

    Maisonhaute, Claude; Ogereau, David; Hua-Van, Aurélie; Capy, Pierre

    2007-05-15

    Transposable elements (TEs) represent a large fraction of the eukaryotic genome. In Drosophila melanogaster, about 20% of the genome corresponds to such middle repetitive DNA dispersed sequences. A fraction of TEs is composed of elements showing a retrovirus-like structure, the LTR-retrotransposons, the first TEs to be described in the Drosophila genome. Interestingly, in D. melanogaster embryonic immortal cell culture genomes the copy number of these LTR-retrotransposons was revealed to be higher than the copy number in the Drosophila genome, presumably as the result of transposition of some copies to new genomic locations [Potter, S.S., Brorein Jr., W.J., Dunsmuir, P., Rubin, G.M., 1979. Transposition of elements of the 412, copia and 297 dispersed repeated gene families in Drosophila. Cell 17, 415-427; Junakovic, N., Di Franco, C., Best-Belpomme, M., Echalier, G., 1988. On the transposition of copia-like nomadic elements in cultured Drosophila cells. Chromosoma 97, 212-218]. This suggests that so many transpositions modified the genome organisation and consequently the expression of targeted genes. To understand what has directed the transposition of TEs in Drosophila cell culture genomes, a search to identify the newly transposed copies was undertaken using 1731, an LTR-retrotransposon. A comparison between 1731 full-length elements found in the fly sequenced genome (y(1); cn(1)bw(1), sp(1) stock) and 1731 full-length elements amplified by PCR in the two cell lines was done. The resulting data provide evidence that all 1731 neocopies were derived from a single copy slightly active in the Drosophila genome and subsequently strongly activated in cultured cells; and that this active copy is related to a newly evolved genomic variant (Kalmykova, A.I., et al., 2004. Selective expansion of the newly evolved genomic variants of retrotransposon 1731 in the Drosophila genomes. Mol. Biol. Evol. 21, 2281-2289). Moreover, neocopies are shown to be inserted in different sets of genes in the two cell lines, suggesting they might be involved in the biological and physiological differences observed between Kc and S2 cell lines.

  13. Pharmacogenomic agreement between two cancer cell line data sets.

    PubMed

    2015-12-03

    Large cancer cell line collections broadly capture the genomic diversity of human cancers and provide valuable insight into anti-cancer drug response. Here we show substantial agreement and biological consilience between drug sensitivity measurements and their associated genomic predictors from two publicly available large-scale pharmacogenomics resources: The Cancer Cell Line Encyclopedia and the Genomics of Drug Sensitivity in Cancer databases.

  14. Augmenting Chinese hamster genome assembly by identifying regions of high confidence.

    PubMed

    Vishwanathan, Nandita; Bandyopadhyay, Arpan A; Fu, Hsu-Yuan; Sharma, Mohit; Johnson, Kathryn C; Mudge, Joann; Ramaraj, Thiruvarangan; Onsongo, Getiria; Silverstein, Kevin A T; Jacob, Nitya M; Le, Huong; Karypis, George; Hu, Wei-Shou

    2016-09-01

    Chinese hamster ovary (CHO) cell lines are the dominant industrial workhorses for therapeutic recombinant protein production. The availability of the genome sequence of Chinese hamster and CHO cells will spur further genome and RNA sequencing of producing cell lines. However, the mammalian genomes assembled using shotgun sequencing data still contain regions of uncertain quality due to assembly errors. Identifying high confidence regions in the assembled genome will facilitate its use for cell engineering and genome engineering. We assembled two independent drafts of the Chinese hamster genome by de novo assembly from shotgun sequencing reads and by re-scaffolding and gap-filling the draft genome from NCBI for improved scaffold lengths and gap fractions. We then used the two independent assemblies to identify high confidence regions using two different approaches. First, the two independent assemblies were compared at the sequence level to identify their consensus regions as "high confidence regions", which account for at least 78 % of the assembled genome. Further, a genome-wide comparison of the Chinese hamster scaffolds with mouse chromosomes revealed scaffolds with large blocks of collinearity, which were also compiled as high-quality scaffolds. Genome-scale collinearity was complemented with EST-based synteny, which also revealed conserved gene order compared to mouse. As cell line sequencing becomes more commonly practiced, the approaches reported here are useful for assessing the quality of assembly and potentially facilitate the engineering of cell lines. Copyright © 2016 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  15. Private and Efficient Query Processing on Outsourced Genomic Databases.

    PubMed

    Ghasemi, Reza; Al Aziz, Md Momin; Mohammed, Noman; Dehkordi, Massoud Hadian; Jiang, Xiaoqian

    2017-09-01

    Applications of genomic studies are spreading rapidly in many domains of science and technology such as healthcare, biomedical research, direct-to-consumer services, and legal and forensic. However, there are a number of obstacles that make it hard to access and process a big genomic database for these applications. First, sequencing genomic sequence is a time consuming and expensive process. Second, it requires large-scale computation and storage systems to process genomic sequences. Third, genomic databases are often owned by different organizations, and thus, not available for public usage. Cloud computing paradigm can be leveraged to facilitate the creation and sharing of big genomic databases for these applications. Genomic data owners can outsource their databases in a centralized cloud server to ease the access of their databases. However, data owners are reluctant to adopt this model, as it requires outsourcing the data to an untrusted cloud service provider that may cause data breaches. In this paper, we propose a privacy-preserving model for outsourcing genomic data to a cloud. The proposed model enables query processing while providing privacy protection of genomic databases. Privacy of the individuals is guaranteed by permuting and adding fake genomic records in the database. These techniques allow cloud to evaluate count and top-k queries securely and efficiently. Experimental results demonstrate that a count and a top-k query over 40 Single Nucleotide Polymorphisms (SNPs) in a database of 20 000 records takes around 100 and 150 s, respectively.

  16. Private and Efficient Query Processing on Outsourced Genomic Databases

    PubMed Central

    Ghasemi, Reza; Al Aziz, Momin; Mohammed, Noman; Dehkordi, Massoud Hadian; Jiang, Xiaoqian

    2017-01-01

    Applications of genomic studies are spreading rapidly in many domains of science and technology such as healthcare, biomedical research, direct-to-consumer services, and legal and forensic. However, there are a number of obstacles that make it hard to access and process a big genomic database for these applications. First, sequencing genomic sequence is a time-consuming and expensive process. Second, it requires large-scale computation and storage systems to process genomic sequences. Third, genomic databases are often owned by different organizations and thus not available for public usage. Cloud computing paradigm can be leveraged to facilitate the creation and sharing of big genomic databases for these applications. Genomic data owners can outsource their databases in a centralized cloud server to ease the access of their databases. However, data owners are reluctant to adopt this model, as it requires outsourcing the data to an untrusted cloud service provider that may cause data breaches. In this paper, we propose a privacy-preserving model for outsourcing genomic data to a cloud. The proposed model enables query processing while providing privacy protection of genomic databases. Privacy of the individuals is guaranteed by permuting and adding fake genomic records in the database. These techniques allow cloud to evaluate count and top-k queries securely and efficiently. Experimental results demonstrate that a count and a top-k query over 40 SNPs in a database of 20,000 records takes around 100 and 150 seconds, respectively. PMID:27834660

  17. [Genome editing of industrial microorganism].

    PubMed

    Zhu, Linjiang; Li, Qi

    2015-03-01

    Genome editing is defined as highly effective and precise modification of the cellular genome on a large scale. In recent years, such genome-editing methods have been rapidly developed in the field of industrial strain improvement. These quickly evolving methods thoroughly change the old mode of inefficient genetic modification, which is "one modification, one selection marker, and one target site". Highly effective modification modes in genome editing have been developed, including simultaneous modification of multiple genes; highly effective insertion, replacement, and deletion of target genes at the genome scale; and cut-paste of a large DNA fragment. These new tools for microbial genome editing will certainly be applied widely, increase the efficiency of industrial strain improvement, and promote the revolution of the traditional fermentation industry and the rapid development of novel industrial biotechnology such as production of biofuel and biomaterial. The technological principles of these genome-editing methods and their applications are summarized in this review, which can benefit the engineering and construction of industrial microorganisms.

  18. Complex multi-enhancer contacts captured by Genome Architecture Mapping (GAM)

    PubMed Central

    Beagrie, Robert A.; Scialdone, Antonio; Schueler, Markus; Kraemer, Dorothee C.A.; Chotalia, Mita; Xie, Sheila Q.; Barbieri, Mariano; de Santiago, Inês; Lavitas, Liron-Mark; Branco, Miguel R.; Fraser, James; Dostie, Josée; Game, Laurence; Dillon, Niall; Edwards, Paul A.W.; Nicodemi, Mario; Pombo, Ana

    2017-01-01

    Summary The organization of the genome in the nucleus and the interactions of genes with their regulatory elements are key features of transcriptional control and their disruption can cause disease. We developed a novel genome-wide method, Genome Architecture Mapping (GAM), for measuring chromatin contacts, and other features of three-dimensional chromatin topology, based on sequencing DNA from a large collection of thin nuclear sections. We apply GAM to mouse embryonic stem cells and identify an enrichment for specific interactions between active genes and enhancers across very large genomic distances, using a mathematical model ‘SLICE’ (Statistical Inference of Co-segregation). GAM also reveals an abundance of three-way contacts genome-wide, especially between regions that are highly transcribed or contain super-enhancers, highlighting a previously inaccessible complexity in genome architecture and a major role for gene-expression specific contacts in organizing the genome in mammalian nuclei. PMID:28273065

  19. Low levels of LTR retrotransposon deletion by ectopic recombination in the gigantic genomes of salamanders.

    PubMed

    Frahry, Matthew Blake; Sun, Cheng; Chong, Rebecca A; Mueller, Rachel Lockridge

    2015-02-01

    Across the tree of life, species vary dramatically in nuclear genome size. Mutations that add or remove sequences from genomes (insertions or deletions, or indels) are the ultimate source of this variation. Differences in the tempo and mode of insertion and deletion across taxa have been proposed to contribute to evolutionary diversity in genome size. Among vertebrates, most of the largest genomes are found within the salamanders, an amphibian clade with genome sizes ranging from ~14 to ~120 Gb. Salamander genomes have been shown to experience slower rates of DNA loss through small (i.e., <30 bp) deletions than do other vertebrate genomes. However, no studies have addressed DNA loss from salamander genomes resulting from larger deletions. Here, we focus on one type of large deletion: ectopic-recombination-mediated removal of LTR retrotransposon sequences. In ectopic recombination, double-strand breaks are repaired using a "wrong" (i.e., ectopic, or non-allelic) template sequence, typically another locus of similar sequence. When breaks occur within the LTR portions of LTR retrotransposons, ectopic-recombination-mediated repair can produce deletions that remove the internal transposon sequence and the equivalent of one of the two LTR sequences. These deletions leave a signature in the genome: a solo LTR sequence. We compared levels of solo LTRs in the genomes of four salamander species with levels present in five vertebrates with smaller genomes. Our results demonstrate that salamanders have low levels of solo LTRs, suggesting that ectopic-recombination-mediated deletion of LTR retrotransposons occurs more slowly than in other vertebrates with smaller genomes.

  20. The Large Mitochondrial Genome of Symbiodinium minutum Reveals Conserved Noncoding Sequences between Dinoflagellates and Apicomplexans.

    PubMed

    Shoguchi, Eiichi; Shinzato, Chuya; Hisata, Kanako; Satoh, Nori; Mungpakdee, Sutada

    2015-07-20

    Even though mitochondrial genomes, which characterize eukaryotic cells, were first discovered more than 50 years ago, mitochondrial genomics remains an important topic in molecular biology and genome sciences. The Phylum Alveolata comprises three major groups (ciliates, apicomplexans, and dinoflagellates), the mitochondrial genomes of which have diverged widely. Even though the gene content of dinoflagellate mitochondrial genomes is reportedly comparable to that of apicomplexans, the highly fragmented and rearranged genome structures of dinoflagellates have frustrated whole genomic analysis. Consequently, noncoding sequences and gene arrangements of dinoflagellate mitochondrial genomes have not been well characterized. Here we report that the continuous assembled genome (∼326 kb) of the dinoflagellate, Symbiodinium minutum, is AT-rich (∼64.3%) and that it contains three protein-coding genes. Based upon in silico analysis, the remaining 99% of the genome comprises transcriptomic noncoding sequences. RNA edited sites and unique, possible start and stop codons clarify conserved regions among dinoflagellates. Our massive transcriptome analysis shows that almost all regions of the genome are transcribed, including 27 possible fragmented ribosomal RNA genes and 12 uncharacterized small RNAs that are similar to mitochondrial RNA genes of the malarial parasite, Plasmodium falciparum. Gene map comparisons show that gene order is only slightly conserved between S. minutum and P. falciparum. However, small RNAs and intergenic sequences share sequence similarities with P. falciparum, suggesting that the function of noncoding sequences has been preserved despite development of very different genome structures. © The Author(s) 2015. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
