multiple genome comparisons: Topics by Science.gov

Sample records for multiple genome comparisons

CoCoNUT: an efficient system for the comparison and analysis of genomes

PubMed Central

2008-01-01

Background Comparative genomics is the analysis and comparison of genomes from different species. This area of research is driven by the large number of sequenced genomes and relies heavily on efficient algorithms and software to perform pairwise and multiple genome comparisons. Results Most of the software tools available are tailored for one specific task. In contrast, we have developed a novel system, CoCoNUT (Computational Comparative geNomics Utility Toolkit), that allows solving several different tasks in a unified framework: (1) finding regions of high similarity among multiple genomic sequences and aligning them, (2) comparing two draft or multi-chromosomal genomes, (3) locating large segmental duplications in large genomic sequences, and (4) mapping cDNA/EST to genomic sequences. Conclusion CoCoNUT is competitive with other software tools with respect to the quality of the results. The use of state-of-the-art algorithms and data structures allows CoCoNUT to solve comparative genomics tasks more efficiently than previous tools. With the improved user interface (including an interactive visualization component), CoCoNUT provides a unified, versatile, and easy-to-use software tool for large-scale studies in comparative genomics. PMID:19014477
OrthoVenn: a web server for genome wide comparison and annotation of orthologous clusters across multiple species

USDA-ARS's Scientific Manuscript database

Genome wide analysis of orthologous clusters is an important component of comparative genomics studies. Identifying the overlap among orthologous clusters can enable us to elucidate the function and evolution of proteins across multiple species. Here, we report a web platform named OrthoVenn that i...
Conservation in the face of diversity: multistrain analysis of an intracellular bacterium

USDA-ARS's Scientific Manuscript database

Comparisons of multiple strains revealed that A. marginale has a closed-core genome with few highly plastic regions, which include the msp2 and msp3 genes, as well as the aaap locus. Comparison of the Florida and St. Maries genome sequences found that SNPs comprise 0.8% of the longer Florida genome,...
GenPlay Multi-Genome, a tool to compare and analyze multiple human genomes in a graphical interface.

PubMed

Lajugie, Julien; Fourel, Nicolas; Bouhassira, Eric E

2015-01-01

Parallel visualization of multiple individual human genomes is a complex endeavor that is rapidly gaining importance with the increasing number of personal, phased and cancer genomes that are being generated. It requires the display of variants such as SNPs, indels and structural variants that are unique to specific genomes and the introduction of multiple overlapping gaps in the reference sequence. Here, we describe GenPlay Multi-Genome, an application specifically written to visualize and analyze multiple human genomes in parallel. GenPlay Multi-Genome is ideally suited for the comparison of allele-specific expression and functional genomic data obtained from multiple phased genomes in a graphical interface with access to multiple-track operation. It also allows the analysis of data that have been aligned to custom genomes rather than to a standard reference and can be used as a Variant Call Format (VCF) file browser and as a tool to compare different genome assemblies, such as hg19 and hg38. GenPlay is available under the GNU public license (GPL-3) from http://genplay.einstein.yu.edu. The source code is available at https://github.com/JulienLajugie/GenPlay. © The Author 2014. Published by Oxford University Press.
MOSAIC: an online database dedicated to the comparative genomics of bacterial strains at the intra-species level.

PubMed

Chiapello, Hélène; Gendrault, Annie; Caron, Christophe; Blum, Jérome; Petit, Marie-Agnès; El Karoui, Meriem

2008-11-27

The recent availability of complete sequences for numerous closely related bacterial genomes opens up new challenges in comparative genomics. Several methods have been developed to align complete genomes at the nucleotide level but their use and the biological interpretation of results are not straightforward. It is therefore necessary to develop new resources to access, analyze, and visualize genome comparisons. Here we present recent developments on MOSAIC, a generalist comparative bacterial genome database. This database provides the bacteriologist community with easy access to comparisons of complete bacterial genomes at the intra-species level. The strategy we developed for comparison allows us to define two types of regions in bacterial genomes: backbone segments (i.e., regions conserved in all compared strains) and variable segments (i.e., regions that are either specific to or variable in one of the aligned genomes). Definition of these segments at the nucleotide level allows precise comparative and evolutionary analyses of both coding and non-coding regions of bacterial genomes. Such work is easily performed using the MOSAIC Web interface, which allows browsing and graphical visualization of genome comparisons. The MOSAIC database now includes 493 pairwise comparisons and 35 multiple maximal comparisons representing 78 bacterial species. Genome conserved regions (backbones) and variable segments are presented in various formats for further analysis. A graphical interface allows visualization of aligned genomes and functional annotations. The MOSAIC database is available online at http://genome.jouy.inra.fr/mosaic.
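The backbone/variable distinction described in this record can be illustrated with a minimal Python sketch. It assumes the conserved (backbone) intervals on one genome are already known from an alignment and simply returns the remaining stretches as variable segments. The function name, the half-open 0-based coordinates, and the example numbers are illustrative assumptions and are not taken from MOSAIC.

```python
def variable_segments(genome_length, backbone):
    """Return the stretches of a genome not covered by backbone intervals.

    backbone: iterable of (start, end) half-open, 0-based intervals that
    are assumed (hypothetically) to come from a prior genome alignment.
    Overlapping or unsorted backbone intervals are tolerated.
    """
    segments = []
    cursor = 0  # first position not yet assigned to any segment
    for start, end in sorted(backbone):
        if start > cursor:
            # Gap between the previous backbone interval and this one.
            segments.append((cursor, start))
        cursor = max(cursor, end)
    if cursor < genome_length:
        # Trailing stretch after the last backbone interval.
        segments.append((cursor, genome_length))
    return segments


# Hypothetical 100 bp genome with three conserved intervals.
print(variable_segments(100, [(10, 30), (25, 50), (70, 90)]))
# [(0, 10), (50, 70), (90, 100)]
```

Working at the level of coordinates rather than genes is what lets such a partition cover both coding and non-coding regions.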
Comparative analysis and visualization of multiple collinear genomes

PubMed Central

2012-01-01

Background Genome browsers are a common tool used by biologists to visualize genomic features including genes, polymorphisms, and many others. However, existing genome browsers and visualization tools are not well-suited to perform meaningful comparative analysis among a large number of genomes. With the increasing quantity and availability of genomic data, there is an increased burden to provide useful visualization and analysis tools for comparison of multiple collinear genomes such as the large panels of model organisms which are the basis for much of the current genetic research. Results We have developed a novel web-based tool for visualizing and analyzing multiple collinear genomes. Our tool illustrates genome-sequence similarity through a mosaic of intervals representing local phylogeny, subspecific origin, and haplotype identity. Comparative analysis is facilitated through reordering and clustering of tracks, which can vary throughout the genome. In addition, we provide local phylogenetic trees as an alternate visualization to assess local variations. Conclusions Unlike previous genome browsers and viewers, ours allows for simultaneous and comparative analysis. Our browser provides intuitive selection and interactive navigation about features of interest. Dynamic visualizations adjust to scale and data content making analysis at variable resolutions and of multiple data sets more informative. We demonstrate our genome browser for an extensive set of genomic data sets composed of almost 200 distinct mouse laboratory strains. PMID:22536897
BEACON: automated tool for Bacterial GEnome Annotation ComparisON.

PubMed

Kalkatawi, Manal; Alam, Intikhab; Bajic, Vladimir B

2015-08-18

Genome annotation is one way of summarizing the existing knowledge about genomic characteristics of an organism. There has been an increased interest during the last several decades in computer-based structural and functional genome annotation. Many methods for this purpose have been developed for eukaryotes and prokaryotes. Our study focuses on comparison of functional annotations of prokaryotic genomes. To the best of our knowledge there is no fully automated system for detailed comparison of functional genome annotations generated by different annotation methods (AMs). The presence of many AMs and development of new ones introduce needs to: (a) compare different annotations for a single genome, and (b) generate annotation by combining individual ones. To address these issues we developed an Automated Tool for Bacterial GEnome Annotation ComparisON (BEACON) that benefits both AM developers and annotation analysers. BEACON provides detailed comparison of gene function annotations of prokaryotic genomes obtained by different AMs and generates extended annotations through combination of individual ones. To illustrate BEACON's utility, we provide a comparison analysis of multiple different annotations generated for four genomes and show on these examples that the extended annotation can increase the number of genes annotated by putative functions by up to 27%, while the number of genes without any function assignment is reduced. We developed BEACON, a fast tool for an automated and systematic comparison of different annotations of single genomes. The extended annotation assigns putative functions to many genes with unknown functions. BEACON is available under GNU General Public License version 3.0 and is accessible at: http://www.cbrc.kaust.edu.sa/BEACON/.
Fast and Accurate Approximation to Significance Tests in Genome-Wide Association Studies

PubMed Central

Zhang, Yu; Liu, Jun S.

2011-01-01

Genome-wide association studies commonly involve simultaneous tests of millions of single nucleotide polymorphisms (SNPs) for disease association. The SNPs in nearby genomic regions, however, are often highly correlated due to linkage disequilibrium (LD, a genetic term for correlation). Simple Bonferroni correction for multiple comparisons is therefore too conservative. Permutation tests, which are often employed in practice, are both computationally expensive for genome-wide studies and limited in their scope. We present an accurate and computationally efficient method, based on Poisson de-clumping heuristics, for approximating genome-wide significance of SNP associations. Compared with permutation tests and other multiple comparison adjustment approaches, our method computes the most accurate and robust p-value adjustments for millions of correlated comparisons within seconds. We demonstrate analytically that the accuracy and the efficiency of our method are nearly independent of the sample size, the number of SNPs, and the scale of p-values to be adjusted. In addition, our method can be easily adapted to estimate false discovery rate. When applied to genome-wide SNP datasets, we observed highly variable p-value adjustment results evaluated from different genomic regions. The variation in adjustments along the genome, however, is well conserved between the European and the African populations. The p-value adjustments are significantly correlated with LD among SNPs, recombination rates, and SNP densities. Given the large variability of sequence features in the genome, we further discuss a novel approach of using SNP-specific (local) thresholds to detect genome-wide significant associations. This article has supplementary material online. PMID:22140288
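For context on the baseline this record criticizes, the following minimal Python sketch shows the plain Bonferroni correction only. It is not the Poisson de-clumping method of the abstract, and the function names and example numbers are illustrative assumptions. Because the correction treats every test as independent, the per-test threshold shrinks in proportion to the number of SNPs, which is why it is overly conservative when neighbouring SNPs are correlated through LD.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under Bonferroni correction."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


def bonferroni_adjust(p_values):
    """Bonferroni-adjusted p-values: min(1, p * m) for m tests."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Hypothetical study: one million SNP tests at family-wise alpha = 0.05.
print(bonferroni_threshold(0.05, 1_000_000))

# Hypothetical raw p-values from three tests.
print(bonferroni_adjust([0.01, 0.5, 0.2]))
```

With one million tests the per-test threshold is on the order of 5e-8, regardless of how many of those tests are effectively redundant.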
Aligning the unalignable: bacteriophage whole genome alignments.

PubMed

Bérard, Sèverine; Chateau, Annie; Pompidor, Nicolas; Guertin, Paul; Bergeron, Anne; Swenson, Krister M

2016-01-13

In recent years, many studies focused on the description and comparison of large sets of related bacteriophage genomes. Due to the peculiar mosaic structure of these genomes, few informative approaches for comparing whole genomes exist: dot plot diagrams give a mostly qualitative assessment of the similarity/dissimilarity between two or more genomes, and clustering techniques are used to classify genomes. Multiple alignments are conspicuously absent from this scene. Indeed, whole genome aligners interpret lack of similarity between sequences as an indication of rearrangements, insertions, or losses. This behavior makes them ill-prepared to align bacteriophage genomes, where even closely related strains can accomplish the same biological function with highly dissimilar sequences. In this paper, we propose a multiple alignment strategy that exploits functional collinearity shared by related strains of bacteriophages, and uses partial orders to capture mosaicism of sets of genomes. As classical alignments do, the computed alignments can be used to predict that genes have the same biological function, even in the absence of detectable similarity. The Alpha aligner implements these ideas in visual interactive displays, and is used to compute several examples of alignments of Staphylococcus aureus and Mycobacterium bacteriophages, involving up to 29 genomes. Using these datasets, we prove that Alpha alignments are at least as good as those computed by standard aligners. Comparison with the progressive Mauve aligner - which implements a partial order strategy, but whose alignments are linearized - shows a greatly improved interactive graphic display, while avoiding misalignments. Multiple alignments of whole bacteriophage genomes work, and will become an important conceptual and visual tool in comparative genomics of sets of related strains.
A Python implementation of Alpha, along with installation instructions for Ubuntu and OSX, is available on Bitbucket (https://bitbucket.org/thekswenson/alpha).
Genomic Changes Associated with Reproductive and Migratory Ecotypes in Sockeye Salmon (Oncorhynchus nerka)

PubMed Central

Veale, Andrew J.

2017-01-01

Mechanisms underlying adaptive evolution can best be explored using paired populations displaying similar phenotypic divergence, illuminating the genomic changes associated with specific life history traits. Here, we used paired migratory [anadromous vs. resident (kokanee)] and reproductive [shore- vs. stream-spawning] ecotypes of sockeye salmon (Oncorhynchus nerka) sampled from seven lakes and two rivers spanning three catchments (Columbia, Fraser, and Skeena) in British Columbia, Canada to investigate the patterns and processes underlying their divergence. Restriction-site associated DNA sequencing was used to genotype this sampling at 7,347 single nucleotide polymorphisms, 334 of which were identified as outlier loci and candidates for divergent selection within at least one ecotype comparison. Sixty-eight of these outliers were present in two or more comparisons, with 33 detected across multiple catchments. Of particular note, one locus was detected as the most significant outlier between shore and stream-spawning ecotypes in multiple comparisons and across catchments (Columbia, Fraser, and Snake). We also detected several genomic islands of divergence, some shared among comparisons, potentially showing linked signals of differential selection. The single nucleotide polymorphisms and genomic regions identified in our study offer a range of mechanistic hypotheses associated with the genetic basis of O. nerka life history variation and provide novel tools for informing fisheries management. PMID:29045601
BLAST Ring Image Generator (BRIG): simple prokaryote genome comparisons

PubMed Central

2011-01-01

Background Visualisation of genome comparisons is invaluable for helping to determine genotypic differences between closely related prokaryotes. New visualisation and abstraction methods are required in order to improve the validation, interpretation and communication of genome sequence information, especially with the increasing amount of data arising from next-generation sequencing projects. Visualising a prokaryote genome as a circular image has become a powerful means of displaying informative comparisons of one genome to a number of others. Several programs, imaging libraries and internet resources already exist for this purpose; however, most are either limited in the number of comparisons they can show, are unable to adequately utilise draft genome sequence data, or require a knowledge of command-line scripting for implementation. Currently, there is no freely available desktop application that enables users to rapidly visualise comparisons between hundreds of draft or complete genomes in a single image. Results BLAST Ring Image Generator (BRIG) can generate images that show multiple prokaryote genome comparisons, without an arbitrary limit on the number of genomes compared. The output image shows similarity between a central reference sequence and other sequences as a set of concentric rings, where BLAST matches are coloured on a sliding scale indicating a defined percentage identity. Images can also include draft genome assembly information to show read coverage, assembly breakpoints and collapsed repeats. In addition, BRIG supports the mapping of unassembled sequencing reads against one or more central reference sequences. Many types of custom data and annotations can be shown using BRIG, making it a versatile approach for visualising a range of genomic comparison data. BRIG is readily accessible to any user, as it assumes no specialist computational knowledge and will perform all required file parsing and BLAST comparisons automatically.
Conclusions There is a clear need for a user-friendly program that can produce genome comparisons for a large number of prokaryote genomes with an emphasis on rapidly utilising unfinished or unassembled genome data. Here we present BRIG, a cross-platform application that enables the interactive generation of comparative genomic images via a simple graphical-user interface. BRIG is freely available for all operating systems at http://sourceforge.net/projects/brig/. PMID:21824423
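The idea of colouring BLAST matches on a sliding scale of percentage identity can be sketched in a few lines of Python. This is only an illustration of the general technique and is not BRIG's own code: the function name, the linear blend from white to a base colour, and the 50% lower cutoff are all assumptions made for the example.

```python
def identity_to_rgb(identity, base_rgb=(0, 0, 255), min_identity=50.0):
    """Map a percentage identity to an RGB colour on a sliding scale.

    Matches at 100% identity get the full base colour, matches at
    min_identity are drawn white, and anything below min_identity
    returns None (nothing is drawn). All defaults are hypothetical.
    """
    if identity < min_identity:
        return None
    # Fraction of the way from the cutoff to a perfect match.
    frac = (min(identity, 100.0) - min_identity) / (100.0 - min_identity)
    # Linear blend per channel from white (255) toward the base colour.
    return tuple(round(255 + (c - 255) * frac) for c in base_rgb)


# Hypothetical BLAST hits as (start, end, percent identity) on the reference.
hits = [(0, 1200, 100.0), (1500, 2100, 50.0), (2500, 2600, 40.0)]
for start, end, identity in hits:
    print(start, end, identity_to_rgb(identity))
```

In a ring image, each query genome would use its own base colour, so that ring position identifies the genome and colour intensity conveys similarity.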
BLAST Ring Image Generator (BRIG): simple prokaryote genome comparisons.

PubMed

Alikhan, Nabil-Fareed; Petty, Nicola K; Ben Zakour, Nouri L; Beatson, Scott A

2011-08-08

Visualisation of genome comparisons is invaluable for helping to determine genotypic differences between closely related prokaryotes. New visualisation and abstraction methods are required in order to improve the validation, interpretation and communication of genome sequence information, especially with the increasing amount of data arising from next-generation sequencing projects. Visualising a prokaryote genome as a circular image has become a powerful means of displaying informative comparisons of one genome to a number of others. Several programs, imaging libraries and internet resources already exist for this purpose; however, most are either limited in the number of comparisons they can show, are unable to adequately utilise draft genome sequence data, or require a knowledge of command-line scripting for implementation. Currently, there is no freely available desktop application that enables users to rapidly visualise comparisons between hundreds of draft or complete genomes in a single image. BLAST Ring Image Generator (BRIG) can generate images that show multiple prokaryote genome comparisons, without an arbitrary limit on the number of genomes compared. The output image shows similarity between a central reference sequence and other sequences as a set of concentric rings, where BLAST matches are coloured on a sliding scale indicating a defined percentage identity. Images can also include draft genome assembly information to show read coverage, assembly breakpoints and collapsed repeats. In addition, BRIG supports the mapping of unassembled sequencing reads against one or more central reference sequences. Many types of custom data and annotations can be shown using BRIG, making it a versatile approach for visualising a range of genomic comparison data. BRIG is readily accessible to any user, as it assumes no specialist computational knowledge and will perform all required file parsing and BLAST comparisons automatically.
There is a clear need for a user-friendly program that can produce genome comparisons for a large number of prokaryote genomes with an emphasis on rapidly utilising unfinished or unassembled genome data. Here we present BRIG, a cross-platform application that enables the interactive generation of comparative genomic images via a simple graphical-user interface. BRIG is freely available for all operating systems at http://sourceforge.net/projects/brig/.
Identification of Ohnolog Genes Originating from Whole Genome Duplication in Early Vertebrates, Based on Synteny Comparison across Multiple Genomes.

PubMed

Singh, Param Priya; Arora, Jatin; Isambert, Hervé

2015-07-01

Whole genome duplications (WGD) have now been firmly established in all major eukaryotic kingdoms. In particular, all vertebrates descend from two rounds of WGD that occurred in their jawless ancestor some 500 million years ago. Paralogs retained from WGD, also coined 'ohnologs' after Susumu Ohno, have been shown to be typically associated with development, signaling and gene regulation. Ohnologs, which amount to about 20 to 35% of genes in the human genome, have also been shown to be prone to dominant deleterious mutations and frequently implicated in cancer and genetic diseases. Hence, identifying ohnologs is central to better understanding the evolution of vertebrates and their susceptibility to genetic diseases. Early computational analyses to identify vertebrate ohnologs relied on content-based synteny comparisons between the human genome and a single invertebrate outgroup genome or within the human genome itself. These approaches are thus limited by lineage-specific rearrangements in individual genomes. We report, in this study, the identification of vertebrate ohnologs based on the quantitative assessment and integration of synteny conservation between six amniote vertebrates and six invertebrate outgroups. Such a synteny comparison across multiple genomes is shown to enhance the statistical power of ohnolog identification in vertebrates compared to earlier approaches, by overcoming lineage-specific genome rearrangements. Ohnolog gene families can be browsed and downloaded for three statistical confidence levels or recompiled for specific, user-defined, significance criteria at http://ohnologs.curie.fr/. In light of the importance of WGD on the genetic makeup of vertebrates, our analysis provides a useful resource for researchers interested in gaining further insights into vertebrate evolution and genetic diseases.
Identification of Ohnolog Genes Originating from Whole Genome Duplication in Early Vertebrates, Based on Synteny Comparison across Multiple Genomes

PubMed Central

Singh, Param Priya; Arora, Jatin; Isambert, Hervé

2015-01-01

Whole genome duplications (WGD) have now been firmly established in all major eukaryotic kingdoms. In particular, all vertebrates descend from two rounds of WGD that occurred in their jawless ancestor some 500 million years ago. Paralogs retained from WGD, also coined ‘ohnologs’ after Susumu Ohno, have been shown to be typically associated with development, signaling and gene regulation. Ohnologs, which amount to about 20 to 35% of genes in the human genome, have also been shown to be prone to dominant deleterious mutations and frequently implicated in cancer and genetic diseases. Hence, identifying ohnologs is central to better understanding the evolution of vertebrates and their susceptibility to genetic diseases. Early computational analyses to identify vertebrate ohnologs relied on content-based synteny comparisons between the human genome and a single invertebrate outgroup genome or within the human genome itself. These approaches are thus limited by lineage-specific rearrangements in individual genomes. We report, in this study, the identification of vertebrate ohnologs based on the quantitative assessment and integration of synteny conservation between six amniote vertebrates and six invertebrate outgroups. Such a synteny comparison across multiple genomes is shown to enhance the statistical power of ohnolog identification in vertebrates compared to earlier approaches, by overcoming lineage-specific genome rearrangements. Ohnolog gene families can be browsed and downloaded for three statistical confidence levels or recompiled for specific, user-defined, significance criteria at http://ohnologs.curie.fr/. In light of the importance of WGD on the genetic makeup of vertebrates, our analysis provides a useful resource for researchers interested in gaining further insights into vertebrate evolution and genetic diseases. PMID:26181593
Comparative genomics approach to detecting split-coding regions in a low-coverage genome: lessons from the chimaera Callorhinchus milii (Holocephali, Chondrichthyes).

PubMed

Dessimoz, Christophe; Zoller, Stefan; Manousaki, Tereza; Qiu, Huan; Meyer, Axel; Kuraku, Shigehiro

2011-09-01

Recent development of deep sequencing technologies has facilitated de novo genome sequencing projects, now conducted even by individual laboratories. However, this will yield more and more genome sequences that are not well assembled, and will hinder thorough annotation when no closely related reference genome is available. One of the challenging issues is the identification of protein-coding sequences split into multiple unassembled genomic segments, which can confound orthology assignment and various laboratory experiments requiring the identification of individual genes. In this study, using the genome of a cartilaginous fish, Callorhinchus milii, as test case, we performed gene prediction using a model specifically trained for this genome. We implemented an algorithm, designated ESPRIT, to identify possible linkages between multiple protein-coding portions derived from a single genomic locus split into multiple unassembled genomic segments. We developed a validation framework based on an artificially fragmented human genome, improvements between early and recent mouse genome assemblies, comparison with experimentally validated sequences from GenBank, and phylogenetic analyses. Our strategy provided insights into practical solutions for efficient annotation of only partially sequenced (low-coverage) genomes. To our knowledge, our study is the first formulation of a method to link unassembled genomic segments based on proteomes of relatively distantly related species as references.
Comparative genomics approach to detecting split-coding regions in a low-coverage genome: lessons from the chimaera Callorhinchus milii (Holocephali, Chondrichthyes)

PubMed Central

Zoller, Stefan; Manousaki, Tereza; Qiu, Huan; Meyer, Axel; Kuraku, Shigehiro

2011-01-01

Recent development of deep sequencing technologies has facilitated de novo genome sequencing projects, now conducted even by individual laboratories. However, this will yield more and more genome sequences that are not well assembled, and will hinder thorough annotation when no closely related reference genome is available. One of the challenging issues is the identification of protein-coding sequences split into multiple unassembled genomic segments, which can confound orthology assignment and various laboratory experiments requiring the identification of individual genes. In this study, using the genome of a cartilaginous fish, Callorhinchus milii, as test case, we performed gene prediction using a model specifically trained for this genome. We implemented an algorithm, designated ESPRIT, to identify possible linkages between multiple protein-coding portions derived from a single genomic locus split into multiple unassembled genomic segments. We developed a validation framework based on an artificially fragmented human genome, improvements between early and recent mouse genome assemblies, comparison with experimentally validated sequences from GenBank, and phylogenetic analyses. Our strategy provided insights into practical solutions for efficient annotation of only partially sequenced (low-coverage) genomes. To our knowledge, our study is the first formulation of a method to link unassembled genomic segments based on proteomes of relatively distantly related species as references. PMID:21712341
OrthoVenn: a web server for genome wide comparison and annotation of orthologous clusters across multiple species.

PubMed

Wang, Yi; Coleman-Derr, Devin; Chen, Guoping; Gu, Yong Q

2015-07-01

Genome wide analysis of orthologous clusters is an important component of comparative genomics studies. Identifying the overlap among orthologous clusters can enable us to elucidate the function and evolution of proteins across multiple species. Here, we report a web platform named OrthoVenn that is useful for genome wide comparisons and visualization of orthologous clusters. OrthoVenn provides coverage of vertebrates, metazoa, protists, fungi, plants and bacteria for the comparison of orthologous clusters and also supports uploading of customized protein sequences from user-defined species. An interactive Venn diagram, summary counts, and functional summaries of the disjunction and intersection of clusters shared between species are displayed as part of the OrthoVenn result. OrthoVenn also includes in-depth views of the clusters using various sequence analysis tools. Furthermore, OrthoVenn identifies orthologous clusters of single copy genes and allows for a customized search of clusters of specific genes through key words or BLAST. OrthoVenn is an efficient and user-friendly web server freely accessible at http://probes.pw.usda.gov/OrthoVenn or http://aegilops.wheat.ucdavis.edu/OrthoVenn. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.
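The overlap among orthologous clusters that a Venn diagram summarizes can be computed with ordinary set operations. The Python sketch below is a minimal illustration under stated assumptions: each species is represented by the set of cluster IDs it has genes in, and the function counts how many clusters fall in each Venn region, that is, are present in exactly a given combination of species. The function name, data layout, and example IDs are hypothetical and do not describe OrthoVenn's implementation.

```python
from collections import Counter


def venn_regions(clusters_by_species):
    """Count orthologous clusters in each region of a Venn diagram.

    clusters_by_species maps a species name to the set of cluster IDs
    in which that species has at least one gene. The result maps a
    frozenset of species names to the number of clusters found in
    exactly those species and no others.
    """
    membership = {}  # cluster ID -> set of species containing it
    for species, clusters in clusters_by_species.items():
        for cluster_id in clusters:
            membership.setdefault(cluster_id, set()).add(species)
    return Counter(frozenset(species) for species in membership.values())


# Hypothetical cluster assignments for three species.
data = {
    "species_A": {1, 2, 3},
    "species_B": {2, 3, 4},
    "species_C": {3, 5},
}
regions = venn_regions(data)
for combo, count in sorted(regions.items(), key=lambda kv: sorted(kv[0])):
    print(sorted(combo), count)
```

Counting exclusive regions rather than raw pairwise intersections keeps the totals additive: every cluster is counted once, in the region for the exact set of species that share it.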
PSAT: A web tool to compare genomic neighborhoods of multiple prokaryotic genomes

PubMed Central

Fong, Christine; Rohmer, Laurence; Radey, Matthew; Wasnick, Michael; Brittnacher, Mitchell J

2008-01-01

Background The conservation of gene order among prokaryotic genomes can provide valuable insight into gene function, protein interactions, or events by which genomes have evolved. Although some tools are available for visualizing and comparing the order of genes between genomes of study, few support an efficient and organized analysis between large numbers of genomes. The Prokaryotic Sequence homology Analysis Tool (PSAT) is a web tool for comparing gene neighborhoods among multiple prokaryotic genomes. Results PSAT utilizes a database that is preloaded with gene annotation, BLAST hit results, and gene-clustering scores designed to help identify regions of conserved gene order. Researchers use the PSAT web interface to find a gene of interest in a reference genome and efficiently retrieve the sequence homologs found in other bacterial genomes. The tool generates a graphic of the genomic neighborhood surrounding the selected gene and the corresponding regions for its homologs in each comparison genome. Homologs in each region are color coded to assist users with analyzing gene order among various genomes. In contrast to common comparative analysis methods that filter sequence homolog data based on alignment score cutoffs, PSAT leverages gene context information for homologs, including those with weak alignment scores, enabling a more sensitive analysis. Features for constraining or ordering results are designed to help researchers browse results from large numbers of comparison genomes in an organized manner. PSAT has been demonstrated to be useful for helping to identify gene orthologs and potential functional gene clusters, and detecting genome modifications that may result in loss of function. Conclusion PSAT allows researchers to investigate the order of genes within local genomic neighborhoods of multiple genomes. 
A PSAT web server for public use is available for performing analyses on a growing set of reference genomes through any web browser with no client side software setup or installation required. Source code is freely available to researchers interested in setting up a local version of PSAT for analysis of genomes not available through the public server. Access to the public web server and instructions for obtaining source code can be found at . PMID:18366802
Genomic Changes Associated with Reproductive and Migratory Ecotypes in Sockeye Salmon (Oncorhynchus nerka).

PubMed

Veale, Andrew J; Russello, Michael A

2017-10-01

Mechanisms underlying adaptive evolution can best be explored using paired populations displaying similar phenotypic divergence, illuminating the genomic changes associated with specific life history traits. Here, we used paired migratory [anadromous vs. resident (kokanee)] and reproductive [shore- vs. stream-spawning] ecotypes of sockeye salmon (Oncorhynchus nerka) sampled from seven lakes and two rivers spanning three catchments (Columbia, Fraser, and Skeena) in British Columbia, Canada to investigate the patterns and processes underlying their divergence. Restriction-site associated DNA sequencing was used to genotype this sampling at 7,347 single nucleotide polymorphisms, 334 of which were identified as outlier loci and candidates for divergent selection within at least one ecotype comparison. Sixty-eight of these outliers were present in two or more comparisons, with 33 detected across multiple catchments. Of particular note, one locus was detected as the most significant outlier between shore and stream-spawning ecotypes in multiple comparisons and across catchments (Columbia, Fraser, and Snake). We also detected several genomic islands of divergence, some shared among comparisons, potentially showing linked signals of differential selection. The single nucleotide polymorphisms and genomic regions identified in our study offer a range of mechanistic hypotheses associated with the genetic basis of O. nerka life history variation and provide novel tools for informing fisheries management. © The Author 2017. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
Screening synteny blocks in pairwise genome comparisons through integer programming.

PubMed

Tang, Haibao; Lyons, Eric; Pedersen, Brent; Schnable, James C; Paterson, Andrew H; Freeling, Michael

2011-04-18

It is difficult to accurately interpret chromosomal correspondences such as true orthology and paralogy due to significant divergence of genomes from a common ancestor. Analyses are particularly problematic among lineages that have repeatedly experienced whole genome duplication (WGD) events. To compare multiple "subgenomes" derived from genome duplications, we need to relax the traditional requirements of "one-to-one" syntenic matchings of genomic regions in order to reflect "one-to-many" or more generally "many-to-many" matchings. However, this relaxation may result in the identification of synteny blocks that are derived from ancient shared WGDs that are not of interest. For many downstream analyses, we need to eliminate weak, low scoring alignments from pairwise genome comparisons. Our goal is to objectively select a subset of synteny blocks whose total scores are maximized while respecting the duplication history of the genomes in comparison. We call this "quota-based" screening of synteny blocks in order to appropriately fill a quota of syntenic relationships within one genome or between two genomes having WGD events. We have formulated the synteny block screening as an optimization problem known as "Binary Integer Programming" (BIP), which is solved using existing linear programming solvers. The computer program QUOTA-ALIGN performs this task by creating a clear objective function that maximizes the compatible set of synteny blocks under given constraints on overlaps and depths (corresponding to the duplication history in respective genomes). Such a procedure is useful for any pairwise synteny alignments, but is most useful in lineages affected by multiple WGDs, like plants or fish lineages. For example, there should be a 1:2 ploidy relationship between genome A and B if genome B had an independent WGD subsequent to the divergence of the two genomes.
We show through simulations and real examples using plant genomes in the rosid superorder that the quota-based screening can eliminate ambiguous synteny blocks and focus on specific genomic evolutionary events, like the divergence of lineages (in cross-species comparisons) and the most recent WGD (in self comparisons). The QUOTA-ALIGN algorithm screens a set of synteny blocks to retain only those compatible with a user specified ploidy relationship between two genomes. These blocks, in turn, may be used for additional downstream analyses such as identifying true orthologous regions in interspecific comparisons. There are two major contributions of QUOTA-ALIGN: 1) reducing the block screening task to a BIP problem, which is novel; 2) providing an efficient software pipeline starting from all-against-all BLAST to the screened synteny blocks with dot plot visualizations. Python code and full documentation are publicly available at http://github.com/tanghaibao/quota-alignment. The QUOTA-ALIGN program is also integrated as a major component in SynMap http://genomevolution.com/CoGe/SynMap.pl, offering easier access to thousands of genomes for non-programmers. © 2011 Tang et al; licensee BioMed Central Ltd.
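The quota-based screening described in this abstract can be illustrated on a toy instance. The following is a minimal sketch, not the QUOTA-ALIGN implementation: it assumes each synteny block is summarized by a score and one closed interval per genome, it models the quota as a maximum coverage depth on each genome axis, and it solves the binary program by exhaustive enumeration rather than with a linear programming solver. The block names, coordinates, scores, and the function names `depth_ok` and `screen_blocks` are all hypothetical.

```python
from itertools import product


def depth_ok(intervals, max_depth):
    """Return True if no position is covered by more than max_depth closed intervals."""
    events = []
    for start, end in intervals:
        events.append((start, 1))      # interval opens at start
        events.append((end + 1, -1))   # interval closes just after end
    depth = 0
    # At equal positions, -1 sorts before +1, so adjacent intervals do not stack.
    for _, delta in sorted(events):
        depth += delta
        if depth > max_depth:
            return False
    return True


def screen_blocks(blocks, max_depth_a, max_depth_b):
    """Pick the 0/1 assignment of blocks with maximal total score under both quotas.

    Exhaustive search over 2**n assignments: fine for a toy example only.
    """
    best_score, best_names = 0, ()
    for x in product((0, 1), repeat=len(blocks)):
        chosen = [b for b, keep in zip(blocks, x) if keep]
        if not depth_ok([b["a"] for b in chosen], max_depth_a):
            continue
        if not depth_ok([b["b"] for b in chosen], max_depth_b):
            continue
        score = sum(b["score"] for b in chosen)
        if score > best_score:
            best_score, best_names = score, tuple(b["name"] for b in chosen)
    return best_score, best_names


# Hypothetical blocks: genome B had an extra WGD, so one region of A matches two of B.
BLOCKS = [
    {"name": "ortho_1", "a": (1, 100), "b": (1, 100), "score": 90},
    {"name": "ortho_2", "a": (1, 100), "b": (501, 600), "score": 70},
    {"name": "ancient_1", "a": (20, 80), "b": (901, 960), "score": 30},
    {"name": "ancient_2", "a": (301, 360), "b": (30, 70), "score": 15},
]

# 1:2 relationship: a region of A may be covered twice, a region of B only once.
print(screen_blocks(BLOCKS, max_depth_a=2, max_depth_b=1))
```

Under the 1:2 setting the two high-scoring blocks are kept and the weaker blocks, which would exceed the allowed depth, are screened out. Tightening both depths to 1 leaves a single block.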

Decelerated genome evolution in modern vertebrates revealed by analysis of multiple lancelet genomes

PubMed Central

Huang, Shengfeng; Chen, Zelin; Yan, Xinyu; Yu, Ting; Huang, Guangrui; Yan, Qingyu; Pontarotti, Pierre Antoine; Zhao, Hongchen; Li, Jie; Yang, Ping; Wang, Ruihua; Li, Rui; Tao, Xin; Deng, Ting; Wang, Yiquan; Li, Guang; Zhang, Qiujin; Zhou, Sisi; You, Leiming; Yuan, Shaochun; Fu, Yonggui; Wu, Fenfang; Dong, Meiling; Chen, Shangwu; Xu, Anlong

2014-01-01

Vertebrates diverged from other chordates ~500 Myr ago and experienced successful innovations and adaptations, but the genomic basis underlying vertebrate origins is not fully understood. Here we suggest, through comparison with multiple lancelet (amphioxus) genomes, that ancient vertebrates experienced high rates of protein evolution, genome rearrangement and domain shuffling and that these rates greatly slowed down after the divergence of jawed and jawless vertebrates. Compared with lancelets, modern vertebrates retain, at least relatively, less protein diversity, fewer nucleotide polymorphisms, domain combinations and conserved non-coding elements (CNE). Modern vertebrates also lost substantial transposable element (TE) diversity, whereas lancelets preserve high TE diversity that includes even the long-sought RAG transposon. Lancelets also exhibit rapid gene turnover, pervasive transcription, fastest exon shuffling in metazoans and substantial TE methylation not observed in other invertebrates. These new lancelet genome sequences provide new insights into the chordate ancestral state and vertebrate evolution. PMID:25523484
Decelerated genome evolution in modern vertebrates revealed by analysis of multiple lancelet genomes.

PubMed

Huang, Shengfeng; Chen, Zelin; Yan, Xinyu; Yu, Ting; Huang, Guangrui; Yan, Qingyu; Pontarotti, Pierre Antoine; Zhao, Hongchen; Li, Jie; Yang, Ping; Wang, Ruihua; Li, Rui; Tao, Xin; Deng, Ting; Wang, Yiquan; Li, Guang; Zhang, Qiujin; Zhou, Sisi; You, Leiming; Yuan, Shaochun; Fu, Yonggui; Wu, Fenfang; Dong, Meiling; Chen, Shangwu; Xu, Anlong

2014-12-19

Vertebrates diverged from other chordates ~500 Myr ago and experienced successful innovations and adaptations, but the genomic basis underlying vertebrate origins is not fully understood. Here we suggest, through comparison with multiple lancelet (amphioxus) genomes, that ancient vertebrates experienced high rates of protein evolution, genome rearrangement and domain shuffling and that these rates greatly slowed down after the divergence of jawed and jawless vertebrates. Compared with lancelets, modern vertebrates retain, at least relatively, less protein diversity, fewer nucleotide polymorphisms, domain combinations and conserved non-coding elements (CNE). Modern vertebrates also lost substantial transposable element (TE) diversity, whereas lancelets preserve high TE diversity that includes even the long-sought RAG transposon. Lancelets also exhibit rapid gene turnover, pervasive transcription, fastest exon shuffling in metazoans and substantial TE methylation not observed in other invertebrates. These new lancelet genome sequences provide new insights into the chordate ancestral state and vertebrate evolution.
Signatures of adaptation in the weedy rice genome

USDA-ARS's Scientific Manuscript database

Weedy rice is a common problem and by-product of domestication that has evolved multiple times from cultivated and wild rice relatives. Here we use whole genome sequences to examine the origin and adaptation of the two major US weedy red rice strains, with a comparison to Chinese weedy red rice. We f...
Genomicus update 2015: KaryoView and MatrixView provide a genome-wide perspective to multispecies comparative genomics

PubMed Central

Louis, Alexandra; Nguyen, Nga Thi Thuy; Muffato, Matthieu; Roest Crollius, Hugues

2015-01-01

The Genomicus web server (http://www.genomicus.biologie.ens.fr/genomicus) is a visualization tool allowing comparative genomics in four different phyla (Vertebrate, Fungi, Metazoan and Plants). It provides access to genomic information from extant species, as well as ancestral gene content and gene order for vertebrates and flowering plants. Here we present the new features available for vertebrate genome with a focus on new graphical tools. The interface to enter the database has been improved, two pairwise genome comparison tools are now available (KaryoView and MatrixView) and the multiple genome comparison tools (PhyloView and AlignView) propose three new kinds of representation and a more intuitive menu. These new developments have been implemented for Genomicus portal dedicated to vertebrates. This allows the analysis of 68 extant animal genomes, as well as 58 ancestral reconstructed genomes. The Genomicus server also provides access to ancestral gene orders, to facilitate evolutionary and comparative genomics studies, as well as computationally predicted regulatory interactions, thanks to the representation of conserved non-coding elements with their putative gene targets. PMID:25378326
OrthoMCL: Identification of Ortholog Groups for Eukaryotic Genomes

PubMed Central

Li, Li; Stoeckert, Christian J.; Roos, David S.

2003-01-01

The identification of orthologous groups is useful for genome annotation, studies on gene/protein evolution, comparative genomics, and the identification of taxonomically restricted sequences. Methods successfully exploited for prokaryotic genome analysis have proved difficult to apply to eukaryotes, however, as larger genomes may contain multiple paralogous genes, and sequence information is often incomplete. OrthoMCL provides a scalable method for constructing orthologous groups across multiple eukaryotic taxa, using a Markov Cluster algorithm to group (putative) orthologs and paralogs. This method performs similarly to the INPARANOID algorithm when applied to two genomes, but can be extended to cluster orthologs from multiple species. OrthoMCL clusters are coherent with groups identified by EGO, but improved recognition of “recent” paralogs permits overlapping EGO groups representing the same gene to be merged. Comparison with previously assigned EC annotations suggests a high degree of reliability, implying utility for automated eukaryotic genome annotation. OrthoMCL has been applied to the proteome data set from seven publicly available genomes (human, fly, worm, yeast, Arabidopsis, the malaria parasite Plasmodium falciparum, and Escherichia coli). A Web interface allows queries based on individual genes or user-defined phylogenetic patterns (http://www.cbil.upenn.edu/gene-family). Analysis of clusters incorporating P. falciparum genes identifies numerous enzymes that were incompletely annotated in first-pass annotation of the parasite genome. PMID:12952885
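The Markov Cluster step mentioned in this abstract alternates two operations on a column-stochastic similarity matrix: expansion (matrix squaring) and inflation (element-wise powering followed by column renormalization). The following is a minimal pure-Python sketch of that general procedure, not OrthoMCL's code. The similarity weights stand in for normalized BLAST-derived scores and are made up, and the inflation value, iteration cap, and function names are assumptions for illustration.

```python
def normalize_columns(m):
    """Scale every column of a square matrix so that it sums to 1."""
    n = len(m)
    out = [row[:] for row in m]
    for j in range(n):
        total = sum(m[i][j] for i in range(n))
        for i in range(n):
            out[i][j] = m[i][j] / total if total else 0.0
    return out


def expand(m):
    """Expansion: square the matrix (two-step random walks)."""
    n = len(m)
    return [[sum(m[i][k] * m[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def inflate(m, power):
    """Inflation: raise entries to a power, then renormalize columns."""
    return normalize_columns([[v ** power for v in row] for row in m])


def mcl(weights, inflation=2.0, max_iter=50, eps=1e-6):
    """Cluster a symmetric similarity matrix; returns a set of frozensets of node indices."""
    n = len(weights)
    # Add self-loops so every column has mass, then make the matrix column-stochastic.
    m = normalize_columns([[weights[i][j] + (1.0 if i == j else 0.0)
                            for j in range(n)] for i in range(n)])
    for _ in range(max_iter):
        nxt = inflate(expand(m), inflation)
        change = max(abs(nxt[i][j] - m[i][j]) for i in range(n) for j in range(n))
        m = nxt
        if change < eps:
            break
    clusters = set()
    for i in range(n):
        if m[i][i] > eps:  # row i belongs to an attractor
            clusters.add(frozenset(j for j in range(n) if m[i][j] > eps))
    return clusters


# Hypothetical similarities: two tight groups of three proteins joined by one weak hit.
toy = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 0.1, 0, 0],
    [0, 0, 0.1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
print(sorted(sorted(c) for c in mcl(toy)))
```

Higher inflation values generally yield finer clusters, so the parameter acts as the main granularity control in this kind of clustering.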
A Thousand Fly Genomes: An Expanded Drosophila Genome Nexus.

PubMed

Lack, Justin B; Lange, Jeremy D; Tang, Alison D; Corbett-Detig, Russell B; Pool, John E

2016-12-01

The Drosophila Genome Nexus is a population genomic resource that provides D. melanogaster genomes from multiple sources. To facilitate comparisons across data sets, genomes are aligned using a common reference alignment pipeline which involves two rounds of mapping. Regions of residual heterozygosity, identity-by-descent, and recent population admixture are annotated to enable data filtering based on the user's needs. Here, we present a significant expansion of the Drosophila Genome Nexus, which brings the current data object to a total of 1,121 wild-derived genomes. New additions include 305 previously unpublished genomes from inbred lines representing six population samples in Egypt, Ethiopia, France, and South Africa, along with another 193 genomes added from recently-published data sets. We also provide an aligned D. simulans genome to facilitate divergence comparisons. This improved resource will broaden the range of population genomic questions that can be addressed from multi-population allele frequencies and haplotypes in this model species. The larger set of genomes will also enhance the discovery of functionally relevant natural variation that exists within and between populations. © The Author 2016. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
MetaPhinder—Identifying Bacteriophage Sequences in Metagenomic Data Sets

PubMed Central

Villarroel, Julia; Lund, Ole; Voldby Larsen, Mette; Nielsen, Morten

2016-01-01

Bacteriophages are the most abundant biological entity on the planet, but at the same time do not account for much of the genetic material isolated from most environments due to their small genome sizes. They also show great genetic diversity and mosaic genomes, making it challenging to analyze and understand them. Here we present MetaPhinder, a method to identify assembled genomic fragments (i.e., contigs) of phage origin in metagenomic data sets. The method is based on a comparison to a database of whole genome bacteriophage sequences, integrating hits to multiple genomes to accommodate the mosaic genome structure of many bacteriophages. The method is demonstrated to outperform both BLAST methods based on single hits and methods based on k-mer comparisons. MetaPhinder is available as a web service at the Center for Genomic Epidemiology https://cge.cbs.dtu.dk/services/MetaPhinder/, while the source code can be downloaded from https://bitbucket.org/genomicepidemiology/metaphinder or https://github.com/vanessajurtz/MetaPhinder. PMID:27684958
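The idea of integrating hits to multiple genomes can be made concrete with a small interval-merging example. The following is a minimal sketch under stated assumptions, not MetaPhinder's scoring: it assumes BLAST-like hits reduced to a genome label and 1-based inclusive query coordinates on the contig, and it only compares contig coverage from all genomes combined against coverage from the best single genome. The hit values and the function names are hypothetical.

```python
def merged_coverage(hits, contig_length):
    """Fraction of the contig covered by at least one hit (1-based, inclusive coordinates)."""
    covered = 0
    cur_start = cur_end = None
    for start, end in sorted((h["q_start"], h["q_end"]) for h in hits):
        if cur_end is None or start > cur_end:
            # Close the previous merged interval and open a new one.
            if cur_end is not None:
                covered += cur_end - cur_start + 1
            cur_start, cur_end = start, end
        else:
            # Overlapping hit: extend the current merged interval.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        covered += cur_end - cur_start + 1
    return covered / contig_length


def best_single_genome_coverage(hits, contig_length):
    """Largest coverage achievable using hits from one database genome only."""
    by_genome = {}
    for h in hits:
        by_genome.setdefault(h["genome"], []).append(h)
    return max(merged_coverage(group, contig_length) for group in by_genome.values())


# Hypothetical hits of one 1,000 bp contig against three phage genomes.
HITS = [
    {"genome": "phage_A", "q_start": 1, "q_end": 400},
    {"genome": "phage_B", "q_start": 300, "q_end": 700},
    {"genome": "phage_C", "q_start": 650, "q_end": 900},
]

print(merged_coverage(HITS, 1000))              # coverage using all genomes together
print(best_single_genome_coverage(HITS, 1000))  # coverage from the best single genome
```

For a mosaic contig like this one, no single database genome explains much of the sequence, while the combined hits cover most of it. That is the situation in which a single-hit criterion would tend to miss a phage contig.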
MetaPhinder-Identifying Bacteriophage Sequences in Metagenomic Data Sets.

PubMed

Jurtz, Vanessa Isabell; Villarroel, Julia; Lund, Ole; Voldby Larsen, Mette; Nielsen, Morten

Bacteriophages are the most abundant biological entity on the planet, but at the same time do not account for much of the genetic material isolated from most environments due to their small genome sizes. They also show great genetic diversity and mosaic genomes, making it challenging to analyze and understand them. Here we present MetaPhinder, a method to identify assembled genomic fragments (i.e., contigs) of phage origin in metagenomic data sets. The method is based on a comparison to a database of whole genome bacteriophage sequences, integrating hits to multiple genomes to accommodate the mosaic genome structure of many bacteriophages. The method is demonstrated to outperform both BLAST methods based on single hits and methods based on k-mer comparisons. MetaPhinder is available as a web service at the Center for Genomic Epidemiology https://cge.cbs.dtu.dk/services/MetaPhinder/, while the source code can be downloaded from https://bitbucket.org/genomicepidemiology/metaphinder or https://github.com/vanessajurtz/MetaPhinder.
Genome Sequences for Multiple Clavibacter Strains from Different Subspecies

PubMed Central

Yuan, Xiaoli (Kat)

2017-01-01

ABSTRACT The Gram-positive genus Clavibacter harbors economically important plant pathogens infecting a variety of agricultural crops, such as potato, tomato, corn, barley, etc. Here, we report five new genome sequences, those of strains CFIA-Cs3N, CFIA-CsR14, LMG 3663T, LMG 7333T, and ATCC 33566T, from different subspecies of Clavibacter michiganensis. All these genomic data will be used for reclassification and niche-adapted feature comparisons. PMID:28935724
Genomic Sequencing of Bordetella pertussis for Epidemiology and Global Surveillance of Whooping Cough.

PubMed

Bouchez, Valérie; Guglielmini, Julien; Dazas, Mélody; Landier, Annie; Toubiana, Julie; Guillot, Sophie; Criscuolo, Alexis; Brisse, Sylvain

2018-06-01

Bordetella pertussis causes whooping cough, a highly contagious respiratory disease that is reemerging in many world regions. The spread of antigen-deficient strains may threaten acellular vaccine efficacy. Dynamics of strain transmission are poorly defined because of shortcomings in current strain genotyping methods. Our objective was to develop a whole-genome genotyping strategy with sufficient resolution for local epidemiologic questions and sufficient reproducibility to enable international comparisons of clinical isolates. We defined a core genome multilocus sequence typing scheme comprising 2,038 loci and demonstrated its congruence with whole-genome single-nucleotide polymorphism variation. Most cases of intrafamilial groups of isolates or of multiple isolates recovered from the same patient were distinguished from temporally and geographically cocirculating isolates. However, epidemiologically unrelated isolates were sometimes nearly undistinguishable. We set up a publicly accessible core genome multilocus sequence typing database to enable global comparisons of B. pertussis isolates, opening the way for internationally coordinated surveillance.
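Comparing isolates under a core genome multilocus sequence typing scheme comes down to comparing allele numbers locus by locus. The following is a minimal sketch of such a comparison, assuming each isolate is represented as a mapping from locus name to allele number, with `None` for an uncalled locus. The locus names, allele numbers, and the function name `allelic_distance` are hypothetical and are not taken from the published scheme or its database.

```python
def allelic_distance(profile_a, profile_b):
    """Count loci with different alleles, using only loci called in both profiles.

    Returns (number_of_differences, number_of_loci_compared).
    """
    shared = [
        locus for locus in profile_a
        if locus in profile_b
        and profile_a[locus] is not None
        and profile_b[locus] is not None
    ]
    differences = sum(1 for locus in shared if profile_a[locus] != profile_b[locus])
    return differences, len(shared)


# Hypothetical four-locus profiles; a real scheme would list all of its loci.
isolate_1 = {"locus_0001": 1, "locus_0002": 4, "locus_0003": 2, "locus_0004": None}
isolate_2 = {"locus_0001": 1, "locus_0002": 5, "locus_0003": 2, "locus_0004": 7}

print(allelic_distance(isolate_1, isolate_2))
```

Reporting the number of loci actually compared alongside the difference count keeps profiles with missing calls from looking artificially similar.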
Towards the analysis of the genomes of single cells: further characterisation of the multiple displacement amplification.

PubMed

Panelli, Simona; Damiani, Giuseppe; Espen, Luca; Micheli, Gioacchino; Sgaramella, Vittorio

2006-05-10

The development of methods for the analysis and comparison of the nucleic acids contained in single cells is an ambitious and challenging goal that may provide useful insights in many physiopathological processes. We review here some of the published protocols for the amplification of whole genomes (WGA). We focus on the reaction known as Multiple Displacement Amplification (MDA), which probably represents the most reliable and efficient WGA protocol developed to date. We discuss some recent advances and applications, as well as some modifications to the reaction, which should improve its use and enlarge its range of applicability possibly to degraded genomes, and also to RNA via complementary DNA.
Genome Sequences for Multiple Clavibacter Strains from Different Subspecies.

PubMed

Li, Xiang Sean; Yuan, Xiaoli Kat

2017-09-21

The Gram-positive genus Clavibacter harbors economically important plant pathogens infecting a variety of agricultural crops, such as potato, tomato, corn, barley, etc. Here, we report five new genome sequences, those of strains CFIA-Cs3N, CFIA-CsR14, LMG 3663T, LMG 7333T, and ATCC 33566T, from different subspecies of Clavibacter michiganensis. All these genomic data will be used for reclassification and niche-adapted feature comparisons. © Crown copyright 2017.
Design of multiple sequence alignment algorithms on parallel, distributed memory supercomputers.

PubMed

Church, Philip C; Goscinski, Andrzej; Holt, Kathryn; Inouye, Michael; Ghoting, Amol; Makarychev, Konstantin; Reumann, Matthias

2011-01-01

The challenge of comparing two or more genomes that have undergone recombination and substantial amounts of segmental loss and gain has recently been addressed for small numbers of genomes. However, datasets of hundreds of genomes are now common and their sizes will only increase in the future. Multiple sequence alignment of hundreds of genomes remains an intractable problem due to quadratic increases in compute time and memory footprint. To date, most alignment algorithms are designed for commodity clusters without parallelism. Hence, we propose the design of a multiple sequence alignment algorithm on massively parallel, distributed memory supercomputers to enable research into comparative genomics on large data sets. Following the methodology of the sequential progressiveMauve algorithm, we design data structures including sequences and sorted k-mer lists on the IBM Blue Gene/P supercomputer (BG/P). Preliminary results show that we can reduce the memory footprint so that we can potentially align over 250 bacterial genomes on a single BG/P compute node. We verify our results on a dataset of E. coli, Shigella and S. pneumoniae genomes. Our implementation returns results matching those of the original algorithm but in 1/2 the time and with 1/4 the memory footprint for scaffold building. In this study, we have laid the basis for multiple sequence alignment of large-scale datasets on a massively parallel, distributed memory supercomputer, thus enabling comparison of hundreds instead of a few genome sequences within reasonable time.
Whole-genome relationships among Francisella bacteria of diverse origins define new species and provide specific regions for detection

DOE PAGES

Challacombe, Jean Faust; Petersen, Jeannine M.; Gallegos-Graves, La Verne A.; ...

2016-11-23

Francisella tularensis is a highly virulent zoonotic pathogen that causes tularemia and, because of weaponization efforts in past world wars, is considered a tier 1 biothreat agent. Detection and surveillance of F. tularensis may be confounded by the presence of uncharacterized, closely related organisms. Through DNA-based diagnostics and environmental surveys, novel clinical and environmental Francisella isolates have been obtained in recent years. Here we present 7 new Francisella genomes and a comparison of their characteristics to each other and to 24 publicly available genomes as well as a comparative analysis of 16S rRNA and sdhA genes from over 90 Francisella strains. Delineation of new species in bacteria is challenging, especially when isolates having very close genomic characteristics exhibit different physiological features—for example, when some are virulent pathogens in humans and animals while others are nonpathogenic or are opportunistic pathogens. Species resolution within Francisella varies with analyses of single genes, multiple gene or protein sets, or whole-genome comparisons of nucleic acid and amino acid sequences. Analyses focusing on single genes (16S rRNA, sdhA), multiple gene sets (virulence genes, lipopolysaccharide [LPS] biosynthesis genes, pathogenicity island), and whole-genome comparisons (nucleotide and protein) gave congruent results, but with different levels of discrimination confidence. We designate four new species within the genus; Francisella opportunistica sp. nov. (MA06-7296), Francisella salina sp. nov. (TX07-7308), Francisella uliginis sp. nov. (TX07-7310), and Francisella frigiditurris sp. nov. (CA97-1460). Lastly, this study provides a robust comparative framework to discern species and virulence features of newly detected Francisella bacteria.
Whole-genome relationships among Francisella bacteria of diverse origins define new species and provide specific regions for detection

DOE Office of Scientific and Technical Information (OSTI.GOV)

Challacombe, Jean Faust; Petersen, Jeannine M.; Gallegos-Graves, La Verne A.

Francisella tularensis is a highly virulent zoonotic pathogen that causes tularemia and, because of weaponization efforts in past world wars, is considered a tier 1 biothreat agent. Detection and surveillance of F. tularensis may be confounded by the presence of uncharacterized, closely related organisms. Through DNA-based diagnostics and environmental surveys, novel clinical and environmental Francisella isolates have been obtained in recent years. Here we present 7 new Francisella genomes and a comparison of their characteristics to each other and to 24 publicly available genomes as well as a comparative analysis of 16S rRNA and sdhA genes from over 90 Francisella strains. Delineation of new species in bacteria is challenging, especially when isolates having very close genomic characteristics exhibit different physiological features—for example, when some are virulent pathogens in humans and animals while others are nonpathogenic or are opportunistic pathogens. Species resolution within Francisella varies with analyses of single genes, multiple gene or protein sets, or whole-genome comparisons of nucleic acid and amino acid sequences. Analyses focusing on single genes (16S rRNA, sdhA), multiple gene sets (virulence genes, lipopolysaccharide [LPS] biosynthesis genes, pathogenicity island), and whole-genome comparisons (nucleotide and protein) gave congruent results, but with different levels of discrimination confidence. We designate four new species within the genus; Francisella opportunistica sp. nov. (MA06-7296), Francisella salina sp. nov. (TX07-7308), Francisella uliginis sp. nov. (TX07-7310), and Francisella frigiditurris sp. nov. (CA97-1460). Lastly, this study provides a robust comparative framework to discern species and virulence features of newly detected Francisella bacteria.
Genomicus update 2015: KaryoView and MatrixView provide a genome-wide perspective to multispecies comparative genomics.

PubMed

Louis, Alexandra; Nguyen, Nga Thi Thuy; Muffato, Matthieu; Roest Crollius, Hugues

2015-01-01

The Genomicus web server (http://www.genomicus.biologie.ens.fr/genomicus) is a visualization tool allowing comparative genomics in four different phyla (Vertebrate, Fungi, Metazoan and Plants). It provides access to genomic information from extant species, as well as ancestral gene content and gene order for vertebrates and flowering plants. Here we present the new features available for vertebrate genome with a focus on new graphical tools. The interface to enter the database has been improved, two pairwise genome comparison tools are now available (KaryoView and MatrixView) and the multiple genome comparison tools (PhyloView and AlignView) propose three new kinds of representation and a more intuitive menu. These new developments have been implemented for Genomicus portal dedicated to vertebrates. This allows the analysis of 68 extant animal genomes, as well as 58 ancestral reconstructed genomes. The Genomicus server also provides access to ancestral gene orders, to facilitate evolutionary and comparative genomics studies, as well as computationally predicted regulatory interactions, thanks to the representation of conserved non-coding elements with their putative gene targets. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.
Phylogenetic shadowing of primate sequences to find functional regions of the human genome.

PubMed

Boffelli, Dario; McAuliffe, Jon; Ovcharenko, Dmitriy; Lewis, Keith D; Ovcharenko, Ivan; Pachter, Lior; Rubin, Edward M

2003-02-28

Nonhuman primates represent the most relevant model organisms to understand the biology of Homo sapiens. The recent divergence and associated overall sequence conservation between individual members of this taxon have nonetheless largely precluded the use of primates in comparative sequence studies. We used sequence comparisons of an extensive set of Old World and New World monkeys and hominoids to identify functional regions in the human genome. Analysis of these data enabled the discovery of primate-specific gene regulatory elements and the demarcation of the exons of multiple genes. Much of the information content of the comprehensive primate sequence comparisons could be captured with a small subset of phylogenetically close primates. These results demonstrate the utility of intraprimate sequence comparisons to discover common mammalian as well as primate-specific functional elements in the human genome, which are unattainable through the evaluation of more evolutionarily distant species.
Genome Alignment Spanning Major Poaceae Lineages Reveals Heterogeneous Evolutionary Rates and Alters Inferred Dates for Key Evolutionary Events.

PubMed

Wang, Xiyin; Wang, Jingpeng; Jin, Dianchuan; Guo, Hui; Lee, Tae-Ho; Liu, Tao; Paterson, Andrew H

2015-06-01

Multiple comparisons among genomes can clarify their evolution, speciation, and functional innovations. To date, the genome sequences of eight grasses representing the most economically important Poaceae (grass) clades have been published, and their genomic-level comparison is an essential foundation for evolutionary, functional, and translational research. Using a formal and conservative approach, we aligned these genomes. Direct comparison of paralogous gene pairs all duplicated simultaneously reveals striking variation in evolutionary rates among whole genomes, with nucleotide substitution slowest in rice and up to 48% faster in other grasses, adding a new dimension to the value of rice as a grass model. We reconstructed ancestral genome contents for major evolutionary nodes, potentially contributing to understanding the divergence and speciation of grasses. Recent fossil evidence suggests revisions of the estimated dates of key evolutionary events, implying that the pan-grass polyploidization occurred ∼96 million years ago and could not be related to the Cretaceous-Tertiary mass extinction as previously inferred. Adjusted dating to reflect both updated fossil evidence and lineage-specific evolutionary rates suggested that maize subgenome divergence and maize-sorghum divergence were virtually simultaneous, a coincidence that would be explained if polyploidization directly contributed to speciation. This work lays a solid foundation for Poaceae translational genomics. Copyright © 2015 The Author. Published by Elsevier Inc. All rights reserved.
Multiple recent horizontal transfers of a large genomic region in cheese making fungi.

PubMed

Cheeseman, Kevin; Ropars, Jeanne; Renault, Pierre; Dupont, Joëlle; Gouzy, Jérôme; Branca, Antoine; Abraham, Anne-Laure; Ceppi, Maurizio; Conseiller, Emmanuel; Debuchy, Robert; Malagnac, Fabienne; Goarin, Anne; Silar, Philippe; Lacoste, Sandrine; Sallet, Erika; Bensimon, Aaron; Giraud, Tatiana; Brygoo, Yves

2014-01-01

While the extent and impact of horizontal transfers in prokaryotes are widely acknowledged, their importance to the eukaryotic kingdom is unclear and thought by many to be anecdotal. Here we report multiple recent transfers of a huge genomic island between Penicillium spp. found in the food environment. Sequencing of the two leading filamentous fungi used in cheese making, P. roqueforti and P. camemberti, and comparison with the penicillin producer P. rubens reveals a 575 kb long genomic island in P. roqueforti--called Wallaby--present as identical fragments at non-homologous loci in P. camemberti and P. rubens. Wallaby is detected in Penicillium collections exclusively in strains from food environments. Wallaby encompasses about 250 predicted genes, some of which are probably involved in competition with microorganisms. The occurrence of multiple recent eukaryotic transfers in the food environment provides strong evidence for the importance of this understudied and probably underestimated phenomenon in eukaryotes.
Multiple recent horizontal transfers of a large genomic region in cheese making fungi

PubMed Central

Cheeseman, Kevin; Ropars, Jeanne; Renault, Pierre; Dupont, Joëlle; Gouzy, Jérôme; Branca, Antoine; Abraham, Anne-Laure; Ceppi, Maurizio; Conseiller, Emmanuel; Debuchy, Robert; Malagnac, Fabienne; Goarin, Anne; Silar, Philippe; Lacoste, Sandrine; Sallet, Erika; Bensimon, Aaron; Giraud, Tatiana; Brygoo, Yves

2014-01-01

While the extent and impact of horizontal transfers in prokaryotes are widely acknowledged, their importance to the eukaryotic kingdom is unclear and thought by many to be anecdotal. Here we report multiple recent transfers of a huge genomic island between Penicillium spp. found in the food environment. Sequencing of the two leading filamentous fungi used in cheese making, P. roqueforti and P. camemberti, and comparison with the penicillin producer P. rubens reveals a 575 kb long genomic island in P. roqueforti—called Wallaby—present as identical fragments at non-homologous loci in P. camemberti and P. rubens. Wallaby is detected in Penicillium collections exclusively in strains from food environments. Wallaby encompasses about 250 predicted genes, some of which are probably involved in competition with microorganisms. The occurrence of multiple recent eukaryotic transfers in the food environment provides strong evidence for the importance of this understudied and probably underestimated phenomenon in eukaryotes. PMID:24407037

Initial genome sequencing and analysis of multiple myeloma

PubMed Central

Chapman, Michael A.; Lawrence, Michael S.; Keats, Jonathan J.; Cibulskis, Kristian; Sougnez, Carrie; Schinzel, Anna C.; Harview, Christina L.; Brunet, Jean-Philippe; Ahmann, Gregory J.; Adli, Mazhar; Anderson, Kenneth C.; Ardlie, Kristin G.; Auclair, Daniel; Baker, Angela; Bergsagel, P. Leif; Bernstein, Bradley E.; Drier, Yotam; Fonseca, Rafael; Gabriel, Stacey B.; Hofmeister, Craig C.; Jagannath, Sundar; Jakubowiak, Andrzej J.; Krishnan, Amrita; Levy, Joan; Liefeld, Ted; Lonial, Sagar; Mahan, Scott; Mfuko, Bunmi; Monti, Stefano; Perkins, Louise M.; Onofrio, Robb; Pugh, Trevor J.; Vincent Rajkumar, S.; Ramos, Alex H.; Siegel, David S.; Sivachenko, Andrey; Trudel, Suzanne; Vij, Ravi; Voet, Douglas; Winckler, Wendy; Zimmerman, Todd; Carpten, John; Trent, Jeff; Hahn, William C.; Garraway, Levi A.; Meyerson, Matthew; Lander, Eric S.; Getz, Gad; Golub, Todd R.

2013-01-01

Multiple myeloma is an incurable malignancy of plasma cells, and its pathogenesis is poorly understood. Here we report the massively parallel sequencing of 38 tumor genomes and their comparison to matched normal DNAs. Several new and unexpected oncogenic mechanisms were suggested by the pattern of somatic mutation across the dataset. These include the mutation of genes involved in protein translation (seen in nearly half of the patients), genes involved in histone methylation, and genes involved in blood coagulation. In addition, a broader than anticipated role of NF-κB signaling was suggested by mutations in 11 members of the NF-κB pathway. Of potential immediate clinical relevance, activating mutations of the kinase BRAF were observed in 4% of patients, suggesting the evaluation of BRAF inhibitors in multiple myeloma clinical trials. These results indicate that cancer genome sequencing of large collections of samples will yield new insights into cancer not anticipated by existing knowledge. PMID:21430775
Whole-Genome Comparison Reveals Novel Genetic Elements That Characterize the Genome of Industrial Strains of Saccharomyces cerevisiae

PubMed Central

Borneman, Anthony R.; Desany, Brian A.; Riches, David; Affourtit, Jason P.; Forgan, Angus H.; Pretorius, Isak S.; Egholm, Michael; Chambers, Paul J.

2011-01-01

Human intervention has subjected the yeast Saccharomyces cerevisiae to multiple rounds of independent domestication and thousands of generations of artificial selection. As a result, this species comprises a genetically diverse collection of natural isolates as well as domesticated strains that are used in specific industrial applications. However the scope of genetic diversity that was captured during the domesticated evolution of the industrial representatives of this important organism remains to be determined. To begin to address this, we have produced whole-genome assemblies of six commercial strains of S. cerevisiae (four wine and two brewing strains). These represent the first genome assemblies produced from S. cerevisiae strains in their industrially-used forms and the first high-quality assemblies for S. cerevisiae strains used in brewing. By comparing these sequences to six existing high-coverage S. cerevisiae genome assemblies, clear signatures were found that defined each industrial class of yeast. This genetic variation was comprised of both single nucleotide polymorphisms and large-scale insertions and deletions, with the latter often being associated with ORF heterogeneity between strains. This included the discovery of more than twenty probable genes that had not been identified previously in the S. cerevisiae genome. Comparison of this large number of S. cerevisiae strains also enabled the characterization of a cluster of five ORFs that have integrated into the genomes of the wine and bioethanol strains on multiple occasions and at diverse genomic locations via what appears to involve the resolution of a circular DNA intermediate. This work suggests that, despite the scrutiny that has been directed at the yeast genome, there remains a significant reservoir of ORFs and novel modes of genetic transmission that may have significant phenotypic impact in this important model and industrial species. PMID:21304888
Genome-wide comparison and taxonomic relatedness of multiple Xylella fastidiosa strains reveal the occurrence of three subspecies and a new Xylella species.

PubMed

Marcelletti, Simone; Scortichini, Marco

2016-10-01

A total of 21 Xylella fastidiosa strains were assessed by comparing their genomes to infer their taxonomic relationships. The whole-genome-based average nucleotide identity and tetranucleotide frequency correlation coefficient analyses were performed. In addition, a consensus tree based on comparisons of 956 core gene families, and a genome-wide phylogenetic tree and a Neighbor-net network were constructed with 820,088 nucleotides (i.e., approximately 30-33% of the entire X. fastidiosa genome). All approaches revealed the occurrence of three well-demarcated genetic clusters that represent X. fastidiosa subspecies fastidiosa, multiplex and pauca, with the latter appearing to diverge. We suggest that the proposed but never formally described subspecies 'sandyi' and 'morus' are instead members of the subspecies fastidiosa. These analyses support the view that the Xylella strain isolated from Pyrus pyrifolia in Taiwan is likely to be a new species. A widely used multilocus sequence typing analysis yielded conflicting results.
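The tetranucleotide frequency correlation mentioned in the abstract above can be illustrated with a minimal sketch. This is not the authors' pipeline: it assumes raw 4-mer frequencies on one strand and a plain Pearson correlation, whereas published tetranucleotide analyses typically normalize counts against expected frequencies first. The function names `tetra_freqs` and `pearson` are hypothetical.

```python
from itertools import product
from math import sqrt

# All 256 possible tetranucleotides in a fixed order (illustrative helper).
KMERS = ["".join(p) for p in product("ACGT", repeat=4)]


def tetra_freqs(seq):
    """Return relative frequencies of the 256 tetranucleotides in seq.

    Windows containing characters other than A, C, G, T are skipped.
    """
    seq = seq.upper()
    counts = dict.fromkeys(KMERS, 0)
    for i in range(len(seq) - 3):
        kmer = seq[i:i + 4]
        if kmer in counts:
            counts[kmer] += 1
    total = sum(counts.values())
    return [counts[k] / total if total else 0.0 for k in KMERS]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # correlation is undefined for a constant vector
    return cov / sqrt(vx * vy)
```

In use, `pearson(tetra_freqs(genome_a), tetra_freqs(genome_b))` gives one coefficient per genome pair; values closer to 1 indicate more similar tetranucleotide usage.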
G-Anchor: a novel approach for whole-genome comparative mapping utilizing evolutionary conserved DNA sequences.

PubMed

Lenis, Vasileios Panagiotis E; Swain, Martin; Larkin, Denis M

2018-05-01

Cross-species whole-genome sequence alignment is a critical first step for genome comparative analyses, ranging from the detection of sequence variants to studies of chromosome evolution. Animal genomes are large and complex, and whole-genome alignment is a computationally intense process, requiring expensive high-performance computing systems due to the need to explore extensive local alignments. With hundreds of sequenced animal genomes available from multiple projects, there is an increasing demand for genome comparative analyses. Here, we introduce G-Anchor, a new, fast, and efficient pipeline that uses a strictly limited but highly effective set of local sequence alignments to anchor (or map) an animal genome to another species' reference genome. G-Anchor makes novel use of a databank of highly conserved DNA sequence elements. We demonstrate how these elements may be aligned to a pair of genomes, creating anchors. These anchors enable the rapid mapping of scaffolds from a de novo assembled genome to chromosome assemblies of a reference species. Our results demonstrate that G-Anchor can successfully anchor a vertebrate genome onto a phylogenetically related reference species genome using a desktop or laptop computer within a few hours and with comparable accuracy to that achieved by a highly accurate whole-genome alignment tool such as LASTZ. G-Anchor thus makes whole-genome comparisons accessible to researchers with limited computational resources. G-Anchor is a ready-to-use tool for anchoring a pair of vertebrate genomes. It may be used with large genomes that contain a significant fraction of evolutionarily conserved DNA sequences and that are not highly repetitive, polyploid, or excessively fragmented. G-Anchor is not a substitute for whole-genome aligning software but can be used for fast and accurate initial genome comparisons. G-Anchor is freely available and a ready-to-use tool for the pairwise comparison of two genomes.
HAL: a hierarchical format for storing and analyzing multiple genome alignments.

PubMed

Hickey, Glenn; Paten, Benedict; Earl, Dent; Zerbino, Daniel; Haussler, David

2013-05-15

Large multiple genome alignments and inferred ancestral genomes are ideal resources for comparative studies of molecular evolution, and advances in sequencing and computing technology are making them increasingly obtainable. These structures can provide a rich understanding of the genetic relationships between all subsets of species they contain. Current formats for storing genomic alignments, such as XMFA and MAF, are all indexed or ordered using a single reference genome, however, which limits the information that can be queried with respect to other species and clades. This loss of information grows with the number of species under comparison, as well as their phylogenetic distance. We present HAL, a compressed, graph-based hierarchical alignment format for storing multiple genome alignments and ancestral reconstructions. HAL graphs are indexed on all genomes they contain. Furthermore, they are organized phylogenetically, which allows for modular and parallel access to arbitrary subclades without fragmentation because of rearrangements that have occurred in other lineages. HAL graphs can be created or read with a comprehensive C++ API. A set of tools is also provided to perform basic operations, such as importing and exporting data, identifying mutations and coordinate mapping (liftover). All documentation and source code for the HAL API and tools are freely available at http://github.com/glennhickey/hal. Contact: hickey@soe.ucsc.edu or haussler@soe.ucsc.edu. Supplementary data are available at Bioinformatics online.
Autopolyploidy genome duplication preserves other ancient genome duplications in Atlantic salmon (Salmo salar).

PubMed

Christensen, Kris A; Davidson, William S

2017-01-01

Salmonids (e.g. Atlantic salmon, Pacific salmon, and trouts) have a long legacy of genome duplication. In addition to three ancient genome duplications that all teleosts are thought to share, salmonids have had one additional genome duplication. We explored a methodology for untangling these duplications from each other to better understand them in Atlantic salmon. In this methodology, homeologous regions (paralogous/duplicated genomic regions originating from a whole genome duplication) from the most recent genome duplication were assumed to have duplicated genes at greater density and have greater sequence similarity. This assumption was used to differentiate duplicated gene pairs in Atlantic salmon that are either from the most recent genome duplication or from earlier duplications. From a comparison with multiple vertebrate species, it is clear that Atlantic salmon have retained more duplicated genes from ancient genome duplications than other vertebrates--often at higher density in the genome and containing fewer synonymous mutations. It may be that polysomic inheritance is the mechanism responsible for maintaining ancient gene duplicates in salmonids. Polysomic inheritance (when multiple chromosomes pair during meiosis) is thought to be relatively common in salmonids compared to other vertebrate species. These findings illuminate how genome duplications may not only increase the number of duplicated genes, but may also be involved in the maintenance of them from previous genome duplications as well.
The Harvest suite for rapid core-genome alignment and visualization of thousands of intraspecific microbial genomes.

PubMed

Treangen, Todd J; Ondov, Brian D; Koren, Sergey; Phillippy, Adam M

2014-01-01

Whole-genome sequences are now available for many microbial species and clades; however, existing whole-genome alignment methods are limited in their ability to perform sequence comparisons of multiple sequences simultaneously. Here we present the Harvest suite of core-genome alignment and visualization tools for the rapid and simultaneous analysis of thousands of intraspecific microbial strains. Harvest includes Parsnp, a fast core-genome multi-aligner, and Gingr, a dynamic visual platform. Together they provide interactive core-genome alignments, variant calls, recombination detection, and phylogenetic trees. Using simulated and real data we demonstrate that our approach exhibits unrivaled speed while maintaining the accuracy of existing methods. The Harvest suite is open-source and freely available from: http://github.com/marbl/harvest.
Short and long-term genome stability analysis of prokaryotic genomes.

PubMed

Brilli, Matteo; Liò, Pietro; Lacroix, Vincent; Sagot, Marie-France

2013-05-08

Gene organization dynamics is actively studied because it provides useful evolutionary information, makes functional annotation easier and often enables the characterization of pathogens. There is therefore a strong interest in understanding the variability of this trait and the possible correlations with life-style. Two kinds of events affect genome organization: on one hand, translocations and recombinations change the relative position of genes shared by two genomes (i.e. the backbone gene order); on the other, insertions and deletions leave the backbone gene order unchanged but alter the gene neighborhoods by breaking the syntenic regions. A complete picture of genome organization evolution therefore requires accounting for both kinds of events. We developed an approach where we model chromosomes as graphs on which we compute different stability estimators; we consider genome rearrangements as well as the effect of gene insertions and deletions. In the first part of the paper, we fit a measure of backbone gene order conservation (hereinafter called backbone stability) against phylogenetic distance for over 3000 genome comparisons, improving existing models for the divergence in time of backbone stability. Intra- and inter-specific comparisons were treated separately to focus on different time-scales. The use of multiple genomes of the same species allowed us to identify genomes with diverging gene order with respect to their conspecifics. The inter-species analysis indicates that pathogens are more often unstable than non-pathogens. In the second part of the text, we show that in pathogens, gene content dynamics (insertions and deletions) have a much more dramatic effect on genome organization stability than backbone rearrangements. In this work, we studied genome organization divergence taking into account the contribution of both genome order rearrangements and genome content dynamics. By studying species with multiple sequenced genomes available, we were able to explore genome organization stability at different time-scales and to find significant differences for pathogen and non-pathogen species. The output of our framework also allows the identification of conserved gene clusters and/or partial occurrences thereof, making it possible to explore how gene clusters assembled during evolution.
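The distinction drawn above between backbone rearrangements and insertions/deletions can be made concrete with a small sketch. It is an illustrative estimator only, not the stability measure used by the authors: it assumes circular chromosomes, unique gene identifiers (no paralogs), and scores the fraction of neighboring gene pairs in the shared backbone that are preserved. All function names are hypothetical.

```python
def backbone(order, shared):
    """Keep only the genes present in both genomes, preserving their order."""
    return [g for g in order if g in shared]


def adjacencies(order, circular=True):
    """Return the set of unordered neighboring gene pairs in a gene order."""
    pairs = set()
    n = len(order)
    last = n if circular else n - 1
    for i in range(last):
        pairs.add(frozenset((order[i], order[(i + 1) % n])))
    return pairs


def backbone_conservation(order_a, order_b):
    """Fraction of backbone adjacencies of genome A also found in genome B."""
    shared = set(order_a) & set(order_b)
    adj_a = adjacencies(backbone(order_a, shared))
    adj_b = adjacencies(backbone(order_b, shared))
    if not adj_a:
        return 0.0
    return len(adj_a & adj_b) / len(adj_a)
```

Under these assumptions, an insertion such as `x` in `["g1", "g2", "x", "g3", "g4"]` leaves the score against `["g1", "g2", "g3", "g4", "y"]` at 1.0, while swapping two shared genes lowers it.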
eShadow: A tool for comparing closely related sequences

DOE Office of Scientific and Technical Information (OSTI.GOV)

Ovcharenko, Ivan; Boffelli, Dario; Loots, Gabriela G.

2004-01-15

Primate sequence comparisons are difficult to interpret due to the high degree of sequence similarity shared between such closely related species. Recently, a novel method, phylogenetic shadowing, has been pioneered for predicting functional elements in the human genome through the analysis of multiple primate sequence alignments. We have expanded this theoretical approach to create a computational tool, eShadow, for the identification of elements under selective pressure in multiple sequence alignments of closely related genomes, such as in comparisons of human to primate or mouse to rat DNA. This tool integrates two different statistical methods and allows for the dynamic visualization of the resulting conservation profile. eShadow also includes a versatile optimization module capable of training the underlying Hidden Markov Model to differentially predict functional sequences. This module grants the tool high flexibility in the analysis of multiple sequence alignments and in comparing sequences with different divergence rates. Here, we describe the eShadow comparative tool and its potential uses for analyzing both multiple nucleotide and protein alignments to predict putative functional elements. The eShadow tool is publicly available at http://eshadow.dcode.org/
Comparison of clinical outcomes and genomic characteristics of single focus and multifocal glioblastoma

PubMed Central

Paulsson, Anna K.; Holmes, Jordan A.; Peiffer, Ann M.; Miller, Lance D.; Liu, Wennuan; Xu, Jianfeng; Hinson, William H.; Lesser, Glenn J.; Laxton, Adrian W.; Tatter, Stephen B.; Debinski, Waldemar

2014-01-01

We investigate the differences in molecular signature and clinical outcomes between multiple lesion glioblastoma (GBM) and single focus GBM in the modern treatment era. Between August 2000 and May 2010, 161 patients with GBM were treated with modern radiotherapy techniques. Of this group, 33 were considered to have multiple lesion GBM (25 multifocal and 8 multicentric). Patterns of failure, time to progression and overall survival were compared based on whether the tumor was considered a single focus or multiple lesion GBM. Genomic groupings and methylation status were also investigated as a possible predictor of multifocality in a cohort of 41 patients with available tissue for analysis. There was no statistically significant difference in overall survival (p < 0.3) between the multiple lesion tumors (8.2 months) and single focus GBM (11 months). Progression free survival was superior in the single focus tumors (7.1 months) as compared to multi-focal (5.6 months, p = 0.02). For patients with single focus, multifocal and multicentric GBM, 81, 76 and 88 % of treatment failures occurred in the 60 Gy volume (p < 0.5), while 54, 72, and 38 % of treatment failures occurred in the 46 Gy volume (p < 0.4). Out of field failures were rare in both single focus and multiple foci GBM (7 vs 3 %). Genomic groupings and methylation status were not found to predict for multifocality. Patterns of failure, survival and genomic signatures for multiple lesion GBM do not appreciably differ when compared to single focus tumors. PMID:24990827
Novel genomic rearrangements mediated by multiple genetic elements in Streptococcus pyogenes M23ND confer potential for evolutionary persistence

PubMed Central

Bao, Yun-Juan; Liang, Zhong; Mayfield, Jeffrey A.; McShan, William M.; Lee, Shaun W.; Ploplis, Victoria A.; Castellino, Francis J.

2016-01-01

Symmetric genomic rearrangements around replication axes in genomes are commonly observed in prokaryotic genomes, including Group A Streptococcus (GAS). However, asymmetric rearrangements are rare. Our previous studies showed that the hypervirulent invasive GAS strain, M23ND, containing an inactivated transcriptional regulator system, covRS, exhibits unique extensive asymmetric rearrangements, which reconstructed a genomic structure distinct from other GAS genomes. In the current investigation, we identified the rearrangement events and examined the genetic consequences and evolutionary implications underlying the rearrangements. By comparison with a close phylogenetic relative, M18-MGAS8232, we propose a molecular model wherein a series of asymmetric rearrangements have occurred in M23ND, involving translocations, inversions and integrations mediated by multiple factors, viz., rRNA-comX (factor for late competence), transposons and phage-encoded gene segments. Assessments of the cumulative gene orientations and GC skews reveal that the asymmetric genomic rearrangements did not affect the general genomic integrity of the organism. However, functional distributions reveal re-clustering of a broad set of CovRS-regulated actively transcribed genes, including virulence factors and metabolic genes, to the same leading strand, with high confidence (p-value ~10^-10). The re-clustering of the genes suggests a potential selection advantage for the spatial proximity to the transcription complexes, which may contain the global transcriptional regulator, CovRS, and other RNA polymerases. Their proximities allow for efficient transcription of the genes required for growth, virulence and persistence. A new paradigm of survival strategies of GAS strains is provided through multiple genomic rearrangements, while, at the same time, maintaining genomic integrity. PMID:27329479
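The GC skew assessment mentioned above can be sketched as follows. This is a generic illustration, not the authors' procedure: it uses the common definition (G - C) / (G + C) over non-overlapping windows, and the window size and function names are assumptions chosen for the example.

```python
def gc_skew(window):
    """GC skew of one window: (G - C) / (G + C), or 0.0 if it has no G or C."""
    g = window.count("G")
    c = window.count("C")
    return (g - c) / (g + c) if (g + c) else 0.0


def cumulative_gc_skew(seq, window=1000):
    """Running sum of per-window GC skew along the sequence."""
    seq = seq.upper()
    total = 0.0
    curve = []
    for start in range(0, len(seq), window):
        total += gc_skew(seq[start:start + window])
        curve.append(total)
    return curve
```

In many bacterial chromosomes the extremes of such a cumulative curve are used to locate the replication origin and terminus, so a curve that keeps a single clear minimum and maximum after rearrangement is consistent with preserved strand organization.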
SWPhylo - A Novel Tool for Phylogenomic Inferences by Comparison of Oligonucleotide Patterns and Integration of Genome-Based and Gene-Based Phylogenetic Trees.

PubMed

Yu, Xiaoyu; Reva, Oleg N

2018-01-01

Modern phylogenetic studies may benefit from the analysis of complete genome sequences of various microorganisms. Evolutionary inferences based on genome-scale analysis are believed to be more accurate than the gene-based alternative. However, the computational complexity of current phylogenomic procedures, inappropriateness of standard phylogenetic tools to process genome-wide data, and lack of reliable substitution models which correlates with alignment-free phylogenomic approaches deter microbiologists from using these opportunities. For example, the super-matrix and super-tree approaches of phylogenomics use multiple integrated genomic loci or individual gene-based trees to infer an overall consensus tree. However, these approaches potentially multiply errors of gene annotation and sequence alignment, not to mention the computational complexity and laboriousness of the methods. In this article, we demonstrate that the annotation- and alignment-free comparison of genome-wide tetranucleotide frequencies, termed oligonucleotide usage patterns (OUPs), allowed a fast and reliable inference of phylogenetic trees. These were congruent to the corresponding whole genome super-matrix trees in terms of tree topology when compared with other known approaches including 16S ribosomal RNA and GyrA protein sequence comparison, complete genome-based MAUVE, and CVTree methods. A Web-based program to perform the alignment-free OUP-based phylogenomic inferences was implemented at http://swphylo.bi.up.ac.za/. Applicability of the tool was tested on different taxa from subspecies to intergeneric levels. Distinguishing between closely related taxonomic units may be enforced by providing the program with alignments of marker protein sequences, e.g., GyrA.
SWPhylo – A Novel Tool for Phylogenomic Inferences by Comparison of Oligonucleotide Patterns and Integration of Genome-Based and Gene-Based Phylogenetic Trees

PubMed Central

Yu, Xiaoyu; Reva, Oleg N

2018-01-01

Modern phylogenetic studies may benefit from the analysis of complete genome sequences of various microorganisms. Evolutionary inferences based on genome-scale analysis are believed to be more accurate than the gene-based alternative. However, the computational complexity of current phylogenomic procedures, inappropriateness of standard phylogenetic tools to process genome-wide data, and lack of reliable substitution models which correlates with alignment-free phylogenomic approaches deter microbiologists from using these opportunities. For example, the super-matrix and super-tree approaches of phylogenomics use multiple integrated genomic loci or individual gene-based trees to infer an overall consensus tree. However, these approaches potentially multiply errors of gene annotation and sequence alignment, not to mention the computational complexity and laboriousness of the methods. In this article, we demonstrate that the annotation- and alignment-free comparison of genome-wide tetranucleotide frequencies, termed oligonucleotide usage patterns (OUPs), allowed a fast and reliable inference of phylogenetic trees. These were congruent to the corresponding whole genome super-matrix trees in terms of tree topology when compared with other known approaches including 16S ribosomal RNA and GyrA protein sequence comparison, complete genome-based MAUVE, and CVTree methods. A Web-based program to perform the alignment-free OUP-based phylogenomic inferences was implemented at http://swphylo.bi.up.ac.za/. Applicability of the tool was tested on different taxa from subspecies to intergeneric levels. Distinguishing between closely related taxonomic units may be enforced by providing the program with alignments of marker protein sequences, e.g., GyrA. PMID:29511354
Multimode drug inducible CRISPR/Cas9 devices for transcriptional activation and genome editing

PubMed Central

Lu, Jia; Zhao, Chen; Zhao, Yingze; Zhang, Jingfang; Zhang, Yue; Chen, Li; Han, Qiyuan; Ying, Yue; Peng, Shuai; Ai, Runna; Wang, Yu

2018-01-01

Precise investigation and manipulation of dynamic biological processes often requires molecular modulation in a controlled inducible manner. The clustered, regularly interspaced, short palindromic repeats (CRISPR)/CRISPR associated protein 9 (Cas9) has emerged as a versatile tool for targeted gene editing and transcriptional programming. Here, we designed and vigorously optimized a series of Hybrid drug Inducible CRISPR/Cas9 Technologies (HIT) for transcriptional activation by grafting a mutated human estrogen receptor (ERT2) to multiple CRISPR/Cas9 systems, which renders them 4-hydroxytamoxifen (4-OHT) inducible for the access of genome. Further, extra functionality of simultaneous genome editing was achieved with one device we named HIT2. Optimized terminal devices herein delivered advantageous performances in comparison with several existing designs. They exerted selective, titratable, rapid and reversible response to drug induction. In addition, these designs were successfully adapted to an orthogonal Cas9. HIT systems developed in this study can be applied for controlled modulation of potentially any genomic loci in multiple modes. PMID:29237052
An analytical framework for whole-genome sequence association studies and its implications for autism spectrum disorder.

PubMed

Werling, Donna M; Brand, Harrison; An, Joon-Yong; Stone, Matthew R; Zhu, Lingxue; Glessner, Joseph T; Collins, Ryan L; Dong, Shan; Layer, Ryan M; Markenscoff-Papadimitriou, Eirene; Farrell, Andrew; Schwartz, Grace B; Wang, Harold Z; Currall, Benjamin B; Zhao, Xuefang; Dea, Jeanselle; Duhn, Clif; Erdman, Carolyn A; Gilson, Michael C; Yadav, Rachita; Handsaker, Robert E; Kashin, Seva; Klei, Lambertus; Mandell, Jeffrey D; Nowakowski, Tomasz J; Liu, Yuwen; Pochareddy, Sirisha; Smith, Louw; Walker, Michael F; Waterman, Matthew J; He, Xin; Kriegstein, Arnold R; Rubenstein, John L; Sestan, Nenad; McCarroll, Steven A; Neale, Benjamin M; Coon, Hilary; Willsey, A Jeremy; Buxbaum, Joseph D; Daly, Mark J; State, Matthew W; Quinlan, Aaron R; Marth, Gabor T; Roeder, Kathryn; Devlin, Bernie; Talkowski, Michael E; Sanders, Stephan J

2018-05-01

Genomic association studies of common or rare protein-coding variation have established robust statistical approaches to account for multiple testing. Here we present a comparable framework to evaluate rare and de novo noncoding single-nucleotide variants, insertion/deletions, and all classes of structural variation from whole-genome sequencing (WGS). Integrating genomic annotations at the level of nucleotides, genes, and regulatory regions, we define 51,801 annotation categories. Analyses of 519 autism spectrum disorder families did not identify association with any categories after correction for 4,123 effective tests. Without appropriate correction, biologically plausible associations are observed in both cases and controls. Despite excluding previously identified gene-disrupting mutations, coding regions still exhibited the strongest associations. Thus, in autism, the contribution of de novo noncoding variation is probably modest in comparison to that of de novo coding variants. Robust results from future WGS studies will require large cohorts and comprehensive analytical strategies that consider the substantial multiple-testing burden.
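The multiple-testing burden described above can be illustrated with simple arithmetic. The sketch below assumes a Bonferroni-style correction and a family-wise significance level of 0.05; the abstract states only that results were corrected for 4,123 effective tests, so both the method and the alpha value are assumptions made for illustration, and the function names are hypothetical.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test p-value threshold that keeps the family-wise error rate at alpha."""
    return alpha / n_tests


def is_significant(p_value, alpha, n_tests):
    """True if a p-value survives Bonferroni correction for n_tests tests."""
    return p_value < bonferroni_threshold(alpha, n_tests)


# Assumed alpha of 0.05 with the 4,123 effective tests reported in the abstract.
threshold = bonferroni_threshold(0.05, 4123)
print(f"per-test threshold: {threshold:.3g}")
```

With these assumed values the per-test threshold is roughly 1.2e-05, so a category with a nominal p-value of 0.001 would look suggestive in isolation yet fail after correction.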
A novel statistical method for quantitative comparison of multiple ChIP-seq datasets.

PubMed

Chen, Li; Wang, Chi; Qin, Zhaohui S; Wu, Hao

2015-06-15

ChIP-seq is a powerful technology to measure the protein binding or histone modification strength at the whole-genome scale. Although there are a number of methods available for single ChIP-seq data analysis (e.g. 'peak detection'), a rigorous statistical method for quantitative comparison of multiple ChIP-seq datasets with the considerations of data from control experiment, signal to noise ratios, biological variations and multiple-factor experimental designs is under-developed. In this work, we develop a statistical method to perform quantitative comparison of multiple ChIP-seq datasets and detect genomic regions showing differential protein binding or histone modification. We first detect peaks from all datasets and then union them to form a single set of candidate regions. The read counts from IP experiment at the candidate regions are assumed to follow a Poisson distribution. The underlying Poisson rates are modeled as an experiment-specific function of artifacts and biological signals. We then obtain the estimated biological signals and compare them through the hypothesis testing procedure in a linear model framework. Simulations and real data analyses demonstrate that the proposed method provides more accurate and robust results compared with existing ones. An R software package ChIPComp is freely available at http://web1.sph.emory.edu/users/hwu30/software/ChIPComp.html. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
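The first step described above, taking the union of peaks from all datasets to form one set of candidate regions, can be sketched as an interval merge. This is not code from the ChIPComp package (which is written in R); it is a minimal Python illustration that assumes peaks are given as hypothetical (chromosome, start, end) tuples and that overlapping or touching peaks are merged.

```python
def union_peaks(peak_sets):
    """Merge peaks from several datasets into non-overlapping candidate regions.

    peak_sets: iterable of lists of (chrom, start, end) tuples.
    Returns a sorted list of merged (chrom, start, end) tuples.
    """
    all_peaks = sorted(p for peaks in peak_sets for p in peaks)
    merged = []
    for chrom, start, end in all_peaks:
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            # Overlaps or touches the previous region: extend it.
            prev = merged[-1]
            merged[-1] = (chrom, prev[1], max(prev[2], end))
        else:
            merged.append((chrom, start, end))
    return merged
```

Read counts for each dataset would then be tallied over the same merged regions, which is what makes the per-region counts comparable across experiments.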
PLEXdb: Gene expression resources for plants and plant pathogens

USDA-ARS?s Scientific Manuscript database

PLEXdb (Plant Expression Database), in partnership with community databases, supports comparisons of gene expression across multiple plant and pathogen species, promoting individuals and/or consortia to upload genome-scale data sets to contrast them to previously archived data. These analyses facili...
PSP: rapid identification of orthologous coding genes under positive selection across multiple closely related prokaryotic genomes.

PubMed

Su, Fei; Ou, Hong-Yu; Tao, Fei; Tang, Hongzhi; Xu, Ping

2013-12-27

With genomic sequences of many closely related bacterial strains made available by deep sequencing, it is now possible to investigate trends in prokaryotic microevolution. Positive selection is a sub-process of microevolution, in which a particular mutation is favored, causing the allele frequency to continuously shift in one direction. Wide scanning of prokaryotic genomes has shown that positive selection at the molecular level is much more frequent than expected. Genes with significant positive selection may play key roles in bacterial adaptation to different environmental pressures. However, selection pressure analyses are computationally intensive and awkward to configure. Here we describe an open access web server, designated PSP (Positive Selection analysis for Prokaryotic genomes), for performing evolutionary analysis on orthologous coding genes, specially designed for rapid comparison of dozens of closely related prokaryotic genomes. Remarkably, PSP facilitates functional exploration at multiple levels by assignments and enrichments of KO, GO or COG terms. To illustrate this user-friendly tool, we analyzed Escherichia coli and Bacillus cereus genomes and found that several genes, which play key roles in human infection and antibiotic resistance, show significant evidence of positive selection. PSP is freely available to all users without any login requirement at: http://db-mml.sjtu.edu.cn/PSP/. PSP ultimately allows researchers to do genome-scale analysis for evolutionary selection across multiple prokaryotic genomes rapidly and easily, and identify the genes undergoing positive selection, which may play key roles in the interactions of host-pathogen and/or environmental adaptation.
Sockeye: A 3D Environment for Comparative Genomics

PubMed Central

Montgomery, Stephen B.; Astakhova, Tamara; Bilenky, Mikhail; Birney, Ewan; Fu, Tony; Hassel, Maik; Melsopp, Craig; Rak, Marcin; Robertson, A. Gordon; Sleumer, Monica; Siddiqui, Asim S.; Jones, Steven J.M.

2004-01-01

Comparative genomics techniques are used in bioinformatics analyses to identify the structural and functional properties of DNA sequences. As the amount of available sequence data steadily increases, the ability to perform large-scale comparative analyses has become increasingly relevant. In addition, the growing complexity of genomic feature annotation means that new approaches to genomic visualization need to be explored. We have developed a Java-based application called Sockeye that uses three-dimensional (3D) graphics technology to facilitate the visualization of annotation and conservation across multiple sequences. This software uses the Ensembl database project to import sequence and annotation information from several eukaryotic species. A user can additionally import their own custom sequence and annotation data. Individual annotation objects are displayed in Sockeye by using custom 3D models. Ensembl-derived and imported sequences can be analyzed by using a suite of multiple and pair-wise alignment algorithms. The results of these comparative analyses are also displayed in the 3D environment of Sockeye. By using the Java3D API to visualize genomic data in a 3D environment, we are able to compactly display cross-sequence comparisons. This provides the user with a novel platform for visualizing and comparing genomic feature organization. PMID:15123592
Gene discovery by chemical mutagenesis and whole-genome sequencing in Dictyostelium.

PubMed

Li, Cheng-Lin Frank; Santhanam, Balaji; Webb, Amanda Nicole; Zupan, Blaž; Shaulsky, Gad

2016-09-01

Whole-genome sequencing is a useful approach for identification of chemical-induced lesions, but previous applications involved tedious genetic mapping to pinpoint the causative mutations. We propose that saturation mutagenesis under low mutagenic loads, followed by whole-genome sequencing, should allow direct implication of genes by identifying multiple independent alleles of each relevant gene. We tested the hypothesis by performing three genetic screens with chemical mutagenesis in the social soil amoeba Dictyostelium discoideum. Through genome sequencing, we successfully identified mutant genes with multiple alleles in near-saturation screens, including resistance to intense illumination and strong suppressors of defects in an allorecognition pathway. We tested the causality of the mutations by comparison to published data and by direct complementation tests, finding both dominant and recessive causative mutations. Therefore, our strategy provides a cost- and time-efficient approach to gene discovery by integrating chemical mutagenesis and whole-genome sequencing. The method should be applicable to many microbial systems, and it is expected to revolutionize the field of functional genomics in Dictyostelium by greatly expanding the mutation spectrum relative to other common mutagenesis methods. © 2016 Li et al.; Published by Cold Spring Harbor Laboratory Press.
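The idea of implicating a gene by finding multiple independent alleles across sequenced mutants can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the input format (mutant, gene, position), the function name and the rule that distinct positions within a gene count as independent alleles are all assumptions made for illustration.

```python
from collections import defaultdict


def genes_with_multiple_alleles(mutations, min_alleles=2):
    """Return {gene: allele_count} for genes hit at >= min_alleles distinct positions.

    mutations: iterable of (mutant_id, gene, position) tuples (hypothetical format).
    The same position seen in several mutants is counted once, since it may
    reflect a shared background variant rather than an independent allele.
    """
    positions_by_gene = defaultdict(set)
    for _mutant_id, gene, position in mutations:
        positions_by_gene[gene].add(position)
    return {
        gene: len(positions)
        for gene, positions in positions_by_gene.items()
        if len(positions) >= min_alleles
    }


# Made-up variant calls from six mutants of one screen.
calls = [
    ("m1", "geneA", 101),
    ("m2", "geneA", 250),
    ("m3", "geneB", 77),
    ("m4", "geneA", 101),  # same position as m1, not counted again
    ("m5", "geneC", 5),
    ("m6", "geneC", 9),
]
hits = genes_with_multiple_alleles(calls)
print(hits)
```

In this toy input, geneA and geneC are each hit at two distinct positions and would be candidates for follow-up, while geneB, with a single allele, would not.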

Comparative genomics in the Asteraceae reveals little evidence for parallel evolutionary change in invasive taxa.

PubMed

Hodgins, Kathryn A; Bock, Dan G; Hahn, Min A; Heredia, Sylvia M; Turner, Kathryn G; Rieseberg, Loren H

2015-05-01

Asteraceae, the largest family of flowering plants, has given rise to many notorious invasive species. Using publicly available transcriptome assemblies from 35 Asteraceae, including six major invasive species, we examined evidence for micro- and macro-evolutionary genomic changes associated with invasion. To detect episodes of positive selection repeated across multiple introductions, we conducted comparisons between native and introduced genotypes from six focal species and identified genes with elevated rates of amino acid change (dN/dS). We then looked for evidence of positive selection at a broader phylogenetic scale across all taxa. As invasive species may experience founder events during colonization and spread, we also looked for evidence of increased genetic load in introduced genotypes. We rarely found evidence for parallel changes in orthologous genes in the intraspecific comparisons, but in some cases we identified changes in members of the same gene family. Using among-species comparisons, we detected positive selection in 0.003-0.69% and 2.4-7.8% of the genes using site and stochastic branch-site models, respectively. These genes had diverse putative functions, including defence response, stress response and herbicide resistance, although there was no clear pattern in the GO terms. There was no indication that introduced genotypes have a higher proportion of deleterious alleles than native genotypes in the six focal species, suggesting multiple introductions and admixture mitigated the impact of drift. Our findings provide little evidence for common genomic responses in invasive taxa of the Asteraceae and hence suggest that multiple evolutionary pathways may lead to adaptation during introduction and spread in these species. © 2014 John Wiley & Sons Ltd.
Divergent copies of the large inverted repeat in the chloroplast genomes of ulvophycean green algae.

PubMed

Turmel, Monique; Otis, Christian; Lemieux, Claude

2017-04-20

The chloroplast genomes of many algae and almost all land plants carry two identical copies of a large inverted repeat (IR) sequence that can pair for flip-flop recombination and undergo expansion/contraction. Although the IR has been lost multiple times during the evolution of the green algae, the underlying mechanisms are still largely unknown. A recent comparison of IR-lacking and IR-containing chloroplast genomes of chlorophytes from the Ulvophyceae (Ulotrichales) suggested that differential elimination of genes from the IR copies might lead to IR loss. To gain deeper insights into the evolutionary history of the chloroplast genome in the Ulvophyceae, we analyzed the genomes of Ignatius tetrasporus and Pseudocharacium americanum (Ignatiales, an order not previously sampled), Dangemannia microcystis (Oltmannsiellopsidales), Pseudoneochloris marina (Ulvales) and also Chamaetrichon capsulatum and Trichosarcina mucosa (Ulotrichales). Our comparison of these six chloroplast genomes with those previously reported for nine ulvophyceans revealed unsuspected variability. All newly examined genomes feature an IR, but remarkably, the copies of the IR present in the Ignatiales, Pseudoneochloris, and Chamaetrichon diverge in sequence, with the tRNA genes from the rRNA operon missing in one IR copy. The implications of this unprecedented finding for the mechanism of IR loss and flip-flop recombination are discussed.
Cryptosporidium as a testbed for single cell genome characterization of unicellular eukaryotes.

PubMed

Troell, Karin; Hallström, Björn; Divne, Anna-Maria; Alsmark, Cecilia; Arrighi, Romanico; Huss, Mikael; Beser, Jessica; Bertilsson, Stefan

2016-06-23

Infectious disease involving multiple genetically distinct populations of pathogens is frequently concurrent, but difficult to detect or describe with current routine methodology. Cryptosporidium sp. is a widespread gastrointestinal protozoan of global significance in both animals and humans. It cannot be easily maintained in culture and infections of multiple strains have been reported. To explore the potential use of single cell genomics methodology for revealing genome-level variation in clinical samples from Cryptosporidium-infected hosts, we sorted individual oocysts for subsequent genome amplification and full-genome sequencing. Cells were identified with fluorescent antibodies with an 80 % success rate for the entire single cell genomics workflow, demonstrating that the methodology can be applied directly to purified fecal samples. Ten amplified genomes from sorted single cells were selected for genome sequencing and compared both to the original population and a reference genome in order to evaluate the accuracy and performance of the method. Single cell genome coverage was on average 81 % even with a moderate sequencing effort and by combining the 10 single cell genomes, the full genome was accounted for. By a comparison to the original sample, biological variation could be distinguished and separated from noise introduced in the amplification. As a proof of principle, we have demonstrated the power of applying single cell genomics to dissect infectious disease caused by closely related parasite species or subtypes. The workflow can easily be expanded and adapted to target other protozoans, and potential applications include mapping genome-encoded traits, virulence, pathogenicity, host specificity and resistance at the level of cells as truly meaningful biological units.
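The coverage figures above (per-cell coverage and the coverage reached by pooling several single-cell genomes) can be illustrated with a small Python sketch. It assumes each genome is summarized as a list of half-open covered intervals on a single reference sequence; the function name and the toy numbers are hypothetical and are not taken from the study.

```python
def coverage_fraction(intervals, genome_length):
    """Fraction of the reference covered by half-open [start, end) intervals.

    Overlapping intervals are counted once. Intervals are assumed to lie
    within 0..genome_length.
    """
    covered = 0
    current_end = 0  # rightmost position already counted
    for start, end in sorted(intervals):
        start = max(start, current_end)  # skip the part already counted
        if end > start:
            covered += end - start
            current_end = end
    return covered / genome_length


GENOME_LENGTH = 100  # toy reference length
cell_a = [(0, 40), (30, 60)]  # covers positions 0..59
cell_b = [(50, 100)]          # covers positions 50..99

print(coverage_fraction(cell_a, GENOME_LENGTH))
print(coverage_fraction(cell_b, GENOME_LENGTH))
# Pooling the intervals of both cells gives the combined coverage.
print(coverage_fraction(cell_a + cell_b, GENOME_LENGTH))
```

Neither toy cell covers the whole reference on its own, but the pooled intervals do, which mirrors how combining single-cell genomes can account for regions missed in any one amplification.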
Population and clinical genetics of human transposable elements in the (post) genomic era

PubMed Central

Rishishwar, Lavanya; Wang, Lu; Clayton, Evan A.; Mariño-Ramírez, Leonardo; McDonald, John F.; Jordan, I. King

2017-01-01

Recent technological developments in genomics, bioinformatics and high-throughput experimental techniques are providing opportunities to study ongoing human transposable element (TE) activity at an unprecedented level of detail. It is now possible to characterize genome-wide collections of TE insertion sites for multiple human individuals, within and between populations, and for a variety of tissue types. Comparison of TE insertion site profiles between individuals captures the germline activity of TEs and reveals insertion site variants that segregate as polymorphisms among human populations, whereas comparison among tissue types ascertains somatic TE activity that generates cellular heterogeneity. In this review, we provide an overview of these new technologies and explore their implications for population and clinical genetic studies of human TEs. We cover both recent published results on human TE insertion activity as well as the prospects for future TE studies related to human evolution and health. PMID:28228978
Identification of novel RNA secondary structures within the hepatitis C virus genome reveals a cooperative involvement in genome packaging

PubMed Central

Stewart, H.; Bingham, R.J.; White, S. J.; Dykeman, E. C.; Zothner, C.; Tuplin, A. K.; Stockley, P. G.; Twarock, R.; Harris, M.

2016-01-01

The specific packaging of the hepatitis C virus (HCV) genome is hypothesised to be driven by Core-RNA interactions. To identify the regions of the viral genome involved in this process, we used SELEX (systematic evolution of ligands by exponential enrichment) to identify RNA aptamers which bind specifically to Core in vitro. Comparison of these aptamers to multiple HCV genomes revealed the presence of a conserved terminal loop motif within short RNA stem-loop structures. We postulated that interactions of these motifs, as well as sub-motifs which were present in HCV genomes at statistically significant levels, with the Core protein may drive virion assembly. We mutated 8 of these predicted motifs within the HCV infectious molecular clone JFH-1, thereby producing a range of mutant viruses predicted to possess altered RNA secondary structures. RNA replication and viral titre were unaltered in viruses possessing only one mutated structure. However, infectivity titres were decreased in viruses possessing a higher number of mutated regions. This work thus identified multiple novel RNA motifs which appear to contribute to genome packaging. We suggest that these structures act as cooperative packaging signals to drive specific RNA encapsidation during HCV assembly. PMID:26972799
PRESAGE: PRivacy-preserving gEnetic testing via SoftwAre Guard Extension.

PubMed

Chen, Feng; Wang, Chenghong; Dai, Wenrui; Jiang, Xiaoqian; Mohammed, Noman; Al Aziz, Md Momin; Sadat, Md Nazmus; Sahinalp, Cenk; Lauter, Kristin; Wang, Shuang

2017-07-26

Advances in DNA sequencing technologies have prompted a wide range of genomic applications to improve healthcare and facilitate biomedical research. However, privacy and security concerns have emerged as a challenge for utilizing cloud computing to handle sensitive genomic data. We present one of the first implementations of a Software Guard Extension (SGX)-based framework for securely outsourced genetic testing, which leverages multiple cryptographic protocols and a minimal perfect hash scheme to enable efficient and secure data storage and computation outsourcing. We compared the performance of the proposed PRESAGE framework with the state-of-the-art homomorphic encryption scheme, as well as the plaintext implementation. The experimental results demonstrated significant performance gains over the homomorphic encryption methods and a small computational overhead in comparison to the plaintext implementation. The proposed PRESAGE provides an alternative solution for secure and efficient genomic data outsourcing in an untrusted cloud by using a hybrid framework that combines secure hardware and multiple crypto protocols.
Calibrating genomic and allelic coverage bias in single-cell sequencing.

PubMed

Zhang, Cheng-Zhong; Adalsteinsson, Viktor A; Francis, Joshua; Cornils, Hauke; Jung, Joonil; Maire, Cecile; Ligon, Keith L; Meyerson, Matthew; Love, J Christopher

2015-04-16

Artifacts introduced in whole-genome amplification (WGA) make it difficult to derive accurate genomic information from single-cell genomes and require different analytical strategies from bulk genome analysis. Here, we describe statistical methods to quantitatively assess the amplification bias resulting from whole-genome amplification of single-cell genomic DNA. Analysis of single-cell DNA libraries generated by different technologies revealed universal features of the genome coverage bias predominantly generated at the amplicon level (1-10 kb). The magnitude of coverage bias can be accurately calibrated from low-pass sequencing (~0.1×) to predict the depth-of-coverage yield of single-cell DNA libraries sequenced at arbitrary depths. We further provide a benchmark comparison of single-cell libraries generated by multi-strand displacement amplification (MDA) and multiple annealing and looping-based amplification cycles (MALBAC). Finally, we develop statistical models to calibrate allelic bias in single-cell whole-genome amplification and demonstrate a census-based strategy for efficient and accurate variant detection from low-input biopsy samples.
Calibrating genomic and allelic coverage bias in single-cell sequencing

PubMed Central

Francis, Joshua; Cornils, Hauke; Jung, Joonil; Maire, Cecile; Ligon, Keith L.; Meyerson, Matthew; Love, J. Christopher

2016-01-01

Artifacts introduced in whole-genome amplification (WGA) make it difficult to derive accurate genomic information from single-cell genomes and require different analytical strategies from bulk genome analysis. Here, we describe statistical methods to quantitatively assess the amplification bias resulting from whole-genome amplification of single-cell genomic DNA. Analysis of single-cell DNA libraries generated by different technologies revealed universal features of the genome coverage bias predominantly generated at the amplicon level (1–10 kb). The magnitude of coverage bias can be accurately calibrated from low-pass sequencing (~0.1 ×) to predict the depth-of-coverage yield of single-cell DNA libraries sequenced at arbitrary depths. We further provide a benchmark comparison of single-cell libraries generated by multi-strand displacement amplification (MDA) and multiple annealing and looping-based amplification cycles (MALBAC). Finally, we develop statistical models to calibrate allelic bias in single-cell whole-genome amplification and demonstrate a census-based strategy for efficient and accurate variant detection from low-input biopsy samples. PMID:25879913
Genome of the Actinomycete Plant Pathogen Clavibacter michiganensis subsp. sepedonicus Suggests Recent Niche Adaptation

PubMed Central

Bentley, Stephen D.; Corton, Craig; Brown, Susan E.; Barron, Andrew; Clark, Louise; Doggett, Jon; Harris, Barbara; Ormond, Doug; Quail, Michael A.; May, Georgiana; Francis, David; Knudson, Dennis; Parkhill, Julian; Ishimaru, Carol A.

2008-01-01

Clavibacter michiganensis subsp. sepedonicus is a plant-pathogenic bacterium and the causative agent of bacterial ring rot, a devastating agricultural disease under strict quarantine control and zero tolerance in the seed potato industry. This organism appears to be largely restricted to an endophytic lifestyle, proliferating within plant tissues and unable to persist in the absence of plant material. Analysis of the genome sequence of C. michiganensis subsp. sepedonicus and comparison with the genome sequences of related plant pathogens revealed a dramatic recent evolutionary history. The genome contains 106 insertion sequence elements, which appear to have been active in extensive rearrangement of the chromosome compared to that of Clavibacter michiganensis subsp. michiganensis. There are 110 pseudogenes with overrepresentation in functions associated with carbohydrate metabolism, transcriptional regulation, and pathogenicity. Genome comparisons also indicated that there is substantial gene content diversity within the species, probably due to differential gene acquisition and loss. These genomic features and evolutionary dating suggest that there was recent adaptation for life in a restricted niche where nutrient diversity and perhaps competition are low, correlated with a reduced ability to exploit previously occupied complex niches outside the plant. Toleration of factors such as multiplication and integration of insertion sequence elements, genome rearrangements, and functional disruption of many genes and operons seems to indicate that there has been general relaxation of selective pressure on a large proportion of the genome. PMID:18192393
Whole genome comparisons of Fragaria, Prunus and Malus reveal different modes of evolution between Rosaceous subfamilies

PubMed Central

2012-01-01

Background Rosaceae include numerous economically important and morphologically diverse species. Comparative mapping between the member species in Rosaceae has indicated some level of synteny. Recently the whole genomes of three crop species, peach, apple and strawberry, which belong to different genera of the Rosaceae family, have been sequenced, allowing in-depth comparison of these genomes. Results Our analysis using the whole genome sequences of peach, apple and strawberry identified 1399 orthologous regions between the three genomes, with a mean length of around 100 kb. Each peach chromosome showed major orthology mostly to one strawberry chromosome, but to more than two apple chromosomes, suggesting that the apple genome went through more chromosomal fissions in addition to the whole genome duplication after the divergence of the three genera. However, the distribution of contiguous ancestral regions, identified using the multiple genome rearrangements and ancestors (MGRA) algorithm, suggested that the Fragaria genome went through a greater number of small scale rearrangements compared to the other genomes since they diverged from a common ancestor. Using the contiguous ancestral regions, we reconstructed a hypothetical ancestral genome for the Rosaceae composed of nine chromosomes and propose the evolutionary steps from the ancestral genome to the extant Fragaria, Prunus and Malus genomes. Conclusion Our analysis shows that different modes of evolution may have played major roles in different subfamilies of Rosaceae. The hypothetical ancestral genome of Rosaceae and the evolutionary steps that lead to three different lineages of Rosaceae will facilitate our understanding of plant genome evolution as well as have a practical impact on knowledge transfer among member species of Rosaceae. PMID:22475018
Ancient bacterial endosymbionts of insects: Genomes as sources of insight and springboards for inquiry.

PubMed

Wernegreen, Jennifer J

2017-09-15

Ancient associations between insects and bacteria provide models to study intimate host-microbe interactions. Currently, a wealth of genome sequence data for long-term, obligately intracellular (primary) endosymbionts of insects reveals profound genomic consequences of this specialized bacterial lifestyle. Those consequences include severe genome reduction and extreme base compositions. This minireview highlights the utility of genome sequence data to understand how, and why, endosymbionts have been pushed to such extremes, and to illuminate the functional consequences of such extensive genome change. While the static snapshots provided by individual endosymbiont genomes are valuable, comparative analyses of multiple genomes have shed light on evolutionary mechanisms. Namely, genome comparisons have told us that selection is important in fine-tuning gene content, but at the same time, mutational pressure and genetic drift contribute to genome degradation. Examples from Blochmannia, the primary endosymbiont of the ant tribe Camponotini, illustrate the value and constraints of genome sequence data, and exemplify how genomes can serve as a springboard for further comparative and experimental inquiry. Copyright © 2017. Published by Elsevier Inc.
Whole Genome Sequencing for Genomics-Guided Investigations of Escherichia coli O157:H7 Outbreaks.

PubMed

Rusconi, Brigida; Sanjar, Fatemeh; Koenig, Sara S K; Mammel, Mark K; Tarr, Phillip I; Eppinger, Mark

2016-01-01

Multi-isolate whole genome sequencing (WGS) and typing for outbreak investigations has become a reality in the post-genomics era. We applied this technology to strains from Escherichia coli O157:H7 outbreaks. These include isolates from seven North American outbreaks, as well as multiple isolates from the same patient and from different infected individuals in the same household. Customized high-resolution bioinformatics sequence typing strategies were developed to assess the core genome and mobilome plasticity. Sequence typing was performed using an in-house single nucleotide polymorphism (SNP) discovery and validation pipeline. Discriminatory power becomes of particular importance for the investigation of isolates from outbreaks in which macrogenomic techniques such as pulsed-field gel electrophoresis or multiple locus variable number tandem repeat analysis do not differentiate closely related organisms. We also characterized differences in the phage inventory, allowing us to identify plasticity among outbreak strains that is not detectable at the core genome level. Our comprehensive analysis of the mobilome identified multiple plasmids that have not previously been associated with this lineage. Applied phylogenomics approaches provide strong molecular evidence for exceptionally little heterogeneity of strains within outbreaks and demonstrate the value of intra-cluster comparisons, rather than basing the analysis on archetypal reference strains. Next generation sequencing and whole genome typing strategies provide the technological foundation for genomic epidemiology outbreak investigation utilizing its significantly higher sample throughput, cost efficiency, and phylogenetic relatedness accuracy. These phylogenomics approaches have major public health relevance in translating information from the sequence-based survey to support timely and informed countermeasures. Polymorphisms identified in this work offer robust phylogenetic signals that index both short- and long-term evolution and can complement currently employed typing schemes for outbreak ex- and inclusion, diagnostics, surveillance, and forensic studies.
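SNP-based comparison of outbreak isolates can be illustrated with a minimal Python sketch that counts pairwise differences in an already aligned core genome. The record describes an in-house SNP discovery and validation pipeline whose details are not given here, so this is only a generic illustration: the function names, the isolate labels and the rule of skipping ambiguous (N) and gap (-) positions are assumptions.

```python
from itertools import combinations


def snp_distance(seq_a, seq_b, ignore="N-"):
    """Count positions where two aligned sequences differ.

    Positions with an ambiguous base or gap in either sequence are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(
        1
        for a, b in zip(seq_a, seq_b)
        if a != b and a not in ignore and b not in ignore
    )


def pairwise_distances(alignment):
    """Return {(name_x, name_y): SNP distance} for every pair of isolates."""
    return {
        (x, y): snp_distance(alignment[x], alignment[y])
        for x, y in combinations(sorted(alignment), 2)
    }


# Toy core-genome alignment (hypothetical isolates, eight sites).
isolates = {
    "outbreak1_a": "ACGTACGT",
    "outbreak1_b": "ACGTACGA",
    "outbreak2_a": "ACTTANGA",
}
distances = pairwise_distances(isolates)
for pair, d in distances.items():
    print(pair, d)
```

A distance matrix of this kind is the usual starting point for clustering isolates; real analyses additionally filter SNP calls for quality and mask recombinant or mobile regions.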
Whole Genome Sequencing for Genomics-Guided Investigations of Escherichia coli O157:H7 Outbreaks

PubMed Central

Rusconi, Brigida; Sanjar, Fatemeh; Koenig, Sara S. K.; Mammel, Mark K.; Tarr, Phillip I.; Eppinger, Mark

2016-01-01

Multi-isolate whole genome sequencing (WGS) and typing for outbreak investigations has become a reality in the post-genomics era. We applied this technology to strains from Escherichia coli O157:H7 outbreaks. These include isolates from seven North American outbreaks, as well as multiple isolates from the same patient and from different infected individuals in the same household. Customized high-resolution bioinformatics sequence typing strategies were developed to assess the core genome and mobilome plasticity. Sequence typing was performed using an in-house single nucleotide polymorphism (SNP) discovery and validation pipeline. Discriminatory power becomes of particular importance for the investigation of isolates from outbreaks in which macrogenomic techniques such as pulsed-field gel electrophoresis or multiple locus variable number tandem repeat analysis do not differentiate closely related organisms. We also characterized differences in the phage inventory, allowing us to identify plasticity among outbreak strains that is not detectable at the core genome level. Our comprehensive analysis of the mobilome identified multiple plasmids that have not previously been associated with this lineage. Applied phylogenomics approaches provide strong molecular evidence for exceptionally little heterogeneity of strains within outbreaks and demonstrate the value of intra-cluster comparisons, rather than basing the analysis on archetypal reference strains. Next generation sequencing and whole genome typing strategies provide the technological foundation for genomic epidemiology outbreak investigation utilizing its significantly higher sample throughput, cost efficiency, and phylogenetic relatedness accuracy. These phylogenomics approaches have major public health relevance in translating information from the sequence-based survey to support timely and informed countermeasures. Polymorphisms identified in this work offer robust phylogenetic signals that index both short- and long-term evolution and can complement currently employed typing schemes for outbreak ex- and inclusion, diagnostics, surveillance, and forensic studies. PMID:27446025
Comparison of the Live Attenuated Yellow Fever Vaccine 17D-204 Strain to Its Virulent Parental Strain Asibi by Deep Sequencing

PubMed Central

Beck, Andrew; Tesh, Robert B.; Wood, Thomas G.; Widen, Steven G.; Ryman, Kate D.; Barrett, Alan D. T.

2014-01-01

Background. The first comparison of a live RNA viral vaccine strain to its wild-type parental strain by deep sequencing is presented using as a model the yellow fever virus (YFV) live vaccine strain 17D-204 and its wild-type parental strain, Asibi. Methods. The YFV 17D-204 vaccine genome was compared to that of the parental strain Asibi by massively parallel methods. Variability was compared on multiple scales of the viral genomes. A modeled exploration of small-frequency variants was performed to reconstruct plausible regions of mutational plasticity. Results. Overt quasispecies diversity is a feature of the parental strain, whereas the live vaccine strain lacks diversity according to multiple independent measurements. A lack of attenuating mutations in the Asibi population relative to that of 17D-204 was observed, demonstrating that the vaccine strain was derived by discrete mutation of Asibi and not by selection of genomes in the wild-type population. Conclusions. Relative quasispecies structure is a plausible correlate of attenuation for live viral vaccines. Analyses such as these of attenuated viruses improve our understanding of the molecular basis of vaccine attenuation and provide critical information on the stability of live vaccines and the risk of reversion to virulence. PMID:24141982
Comparison of the live attenuated yellow fever vaccine 17D-204 strain to its virulent parental strain Asibi by deep sequencing.

PubMed

Beck, Andrew; Tesh, Robert B; Wood, Thomas G; Widen, Steven G; Ryman, Kate D; Barrett, Alan D T

2014-02-01

The first comparison of a live RNA viral vaccine strain to its wild-type parental strain by deep sequencing is presented using as a model the yellow fever virus (YFV) live vaccine strain 17D-204 and its wild-type parental strain, Asibi. The YFV 17D-204 vaccine genome was compared to that of the parental strain Asibi by massively parallel methods. Variability was compared on multiple scales of the viral genomes. A modeled exploration of small-frequency variants was performed to reconstruct plausible regions of mutational plasticity. Overt quasispecies diversity is a feature of the parental strain, whereas the live vaccine strain lacks diversity according to multiple independent measurements. A lack of attenuating mutations in the Asibi population relative to that of 17D-204 was observed, demonstrating that the vaccine strain was derived by discrete mutation of Asibi and not by selection of genomes in the wild-type population. Relative quasispecies structure is a plausible correlate of attenuation for live viral vaccines. Analyses such as these of attenuated viruses improve our understanding of the molecular basis of vaccine attenuation and provide critical information on the stability of live vaccines and the risk of reversion to virulence.
Genome-Wide Comparison of Magnaporthe Species Reveals a Host-Specific Pattern of Secretory Proteins and Transposable Elements

PubMed Central

Gowda, Malali

2016-01-01

Blast disease caused by the Magnaporthe species is a major factor affecting the productivity of rice, wheat and millets. This study was aimed at generating genomic information for rice and non-rice Magnaporthe isolates to understand the extent of genetic variation. We have sequenced the whole genome of the Magnaporthe isolates, infecting rice (leaf and neck), finger millet (leaf and neck), foxtail millet (leaf) and buffel grass (leaf). Rice and finger millet isolates infecting both leaf and neck tissues were sequenced, since the damage and yield loss caused due to neck blast is much higher as compared to leaf blast. The genome-wide comparison was carried out to study the variability in gene content, candidate effectors, repeat element distribution, genes involved in carbohydrate metabolism and SNPs. The analysis of repeat element footprints revealed some genes such as naringenin, 2-oxoglutarate 3-dioxygenase being targeted by Pot2 and Occan, in isolates from different host species. Some repeat insertions were host-specific while other insertions were randomly shared between isolates. The distributions of repeat elements, secretory proteins, CAZymes and SNPs showed significant variation across host-specific lineages of Magnaporthe indicating an independent genome evolution orchestrated by multiple genomic factors. PMID:27658241
Implications of the plastid genome sequence of Typha (Typhaceae, Poales) for understanding genome evolution in Poaceae.

PubMed

Guisinger, Mary M; Chumley, Timothy W; Kuehl, Jennifer V; Boore, Jeffrey L; Jansen, Robert K

2010-02-01

Plastid genomes of the grasses (Poaceae) are unusual in their organization and rates of sequence evolution. There has been a recent surge in the availability of grass plastid genome sequences, but a comprehensive comparative analysis of genome evolution has not been performed that includes any related families in the Poales. We report on the plastid genome of Typha latifolia, the first non-grass Poales sequenced to date, and we present comparisons of genome organization and sequence evolution within Poales. Our results confirm that grass plastid genomes exhibit acceleration in both genomic rearrangements and nucleotide substitutions. Poaceae have multiple structural rearrangements, including three inversions, three gene losses (accD, ycf1, ycf2), intron losses in two genes (clpP, rpoC1), and expansion of the inverted repeat (IR) into both large and small single-copy regions. These rearrangements are restricted to the Poaceae, and IR expansion into the small single-copy region correlates with the phylogeny of the family. Comparisons of 73 protein-coding genes for 47 angiosperms including nine Poaceae genera confirm that the branch leading to Poaceae has significantly accelerated rates of change relative to other monocots and angiosperms. Furthermore, rates of sequence evolution within grasses are lower, indicating a deceleration during diversification of the family. Overall there is a strong correlation between accelerated rates of genomic rearrangements and nucleotide substitutions in Poaceae, a phenomenon that has been noted recently throughout angiosperms. The cause of the correlation is unknown, but faulty DNA repair has been suggested in other systems including bacterial and animal mitochondrial genomes.
Base-By-Base: single nucleotide-level analysis of whole viral genome alignments.

PubMed

Brodie, Ryan; Smith, Alex J; Roper, Rachel L; Tcherepanov, Vasily; Upton, Chris

2004-07-14

With ever-increasing numbers of closely related virus genomes being sequenced, it has become desirable to be able to compare two genomes at a level more detailed than gene content because two strains of an organism may share the same set of predicted genes but still differ in their pathogenicity profiles. For example, detailed comparison of multiple isolates of the smallpox virus genome (each approximately 200 kb, with 200 genes) is not feasible without new bioinformatics tools. A software package, Base-By-Base, has been developed that provides visualization tools to enable researchers to 1) rapidly identify and correct alignment errors in large, multiple genome alignments; and 2) generate tabular and graphical output of differences between the genomes at the nucleotide level. Base-By-Base uses detailed annotation information about the aligned genomes and can list each predicted gene with nucleotide differences, display whether variations occur within promoter regions or coding regions and whether these changes result in amino acid substitutions. Base-By-Base can connect to our MySQL database (Virus Orthologous Clusters; VOCs) to retrieve detailed annotation information about the aligned genomes or use information from text files. Base-By-Base enables users to quickly and easily compare large viral genomes; it highlights small differences that may be responsible for important phenotypic differences such as virulence. It is available via the Internet using Java Web Start and runs on Macintosh, PC and Linux operating systems with the Java 1.4 virtual machine.
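To make the idea of a nucleotide-level difference table concrete, here is a minimal Python sketch. It is not Base-By-Base code; the function names `alignment_differences` and `classify` are hypothetical, and the sketch assumes two sequences that are already aligned to equal length with `-` marking gaps.

```python
from typing import List, Tuple


def alignment_differences(ref: str, query: str) -> List[Tuple[int, str, str]]:
    """List (column, ref_base, query_base) for every differing alignment column.

    Columns are 1-based positions in the alignment, not in either genome.
    """
    if len(ref) != len(query):
        raise ValueError("aligned sequences must have equal length")
    diffs = []
    for col, (r, q) in enumerate(zip(ref.upper(), query.upper()), start=1):
        if r != q:
            diffs.append((col, r, q))
    return diffs


def classify(ref_base: str, query_base: str) -> str:
    """Label a differing column relative to the reference sequence."""
    if ref_base == "-":
        return "insertion"
    if query_base == "-":
        return "deletion"
    return "substitution"


if __name__ == "__main__":
    for col, r, q in alignment_differences("ATGCC-TA", "ATGTCATA"):
        print(col, r, q, classify(r, q))
```

A fuller tool would also map each column back to gene annotations to decide whether a change falls in a promoter or coding region, which this sketch leaves out.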
The Reference Genome of the Halophytic Plant Eutrema salsugineum

PubMed Central

Yang, Ruolin; Jarvis, David E.; Chen, Hao; Beilstein, Mark A.; Grimwood, Jane; Jenkins, Jerry; Shu, ShengQiang; Prochnik, Simon; Xin, Mingming; Ma, Chuang; Schmutz, Jeremy; Wing, Rod A.; Mitchell-Olds, Thomas; Schumaker, Karen S.; Wang, Xiangfeng

2013-01-01

Halophytes are plants that can naturally tolerate high concentrations of salt in the soil, and their tolerance to salt stress may occur through various evolutionary and molecular mechanisms. Eutrema salsugineum is a halophytic species in the Brassicaceae that can naturally tolerate multiple types of abiotic stresses that typically limit crop productivity, including extreme salinity and cold. It has been widely used as a laboratory model for stress biology research in plants. Here, we present the reference genome sequence (241 Mb) of E. salsugineum at 8× coverage sequenced using the traditional Sanger sequencing-based approach with comparison to its close relative Arabidopsis thaliana. The E. salsugineum genome contains 26,531 protein-coding genes and 51.4% of its genome is composed of repetitive sequences that mostly reside in pericentromeric regions. Comparative analyses of the genome structures, protein-coding genes, microRNAs, stress-related pathways, and estimated translation efficiency of proteins between E. salsugineum and A. thaliana suggest that halophyte adaptation to environmental stresses may occur via a global network adjustment of multiple regulatory mechanisms. The E. salsugineum genome provides a resource to identify naturally occurring genetic alterations contributing to the adaptation of halophytic plants to salinity and that might be bioengineered in related crop species. PMID:23518688
Solving the problem of comparing whole bacterial genomes across different sequencing platforms.

PubMed

Kaas, Rolf S; Leekitcharoenphon, Pimlapas; Aarestrup, Frank M; Lund, Ole

2014-01-01

Whole genome sequencing (WGS) shows great potential for real-time monitoring and identification of infectious disease outbreaks. However, rapid and reliable comparison of data generated in multiple laboratories and using multiple technologies is essential. So far studies have focused on using one technology because each technology has a systematic bias, making integration of data generated from different platforms difficult. We developed two different procedures for identifying variable sites and inferring phylogenies in WGS data across multiple platforms. The methods were evaluated on three bacterial data sets sequenced on three different platforms (Illumina, 454, Ion Torrent). We show that the methods are able to overcome the systematic biases caused by the sequencers and infer the expected phylogenies. We conclude that these procedures succeed because all informative sites included in the analysis are validated. The procedures are available as web tools.

A universal genomic coordinate translator for comparative genomics

PubMed Central

2014-01-01

Background Genomic duplications constitute major events in the evolution of species, allowing paralogous copies of genes to take on fine-tuned biological roles. Unambiguously identifying the orthology relationship between copies across multiple genomes can be resolved by synteny, i.e. the conserved order of genomic sequences. However, a comprehensive analysis of duplication events and their contributions to evolution would require all-to-all genome alignments, which increases at N^2 with the number of available genomes, N. Results Here, we introduce Kraken, software that omits the all-to-all requirement by recursively traversing a graph of pairwise alignments and dynamically re-computing orthology. Kraken scales linearly with the number of targeted genomes, N, which allows for including large numbers of genomes in analyses. We first evaluated the method on the set of 12 Drosophila genomes, finding that orthologous correspondence computed indirectly through a graph of multiple synteny maps comes at minimal cost in terms of sensitivity, but reduces overall computational runtime by an order of magnitude. We then used the method on three well-annotated mammalian genomes, human, mouse, and rat, and show that up to 93% of protein coding transcripts have unambiguous pairwise orthologous relationships across the genomes. On a nucleotide level, 70 to 83% of exons match exactly at both splice junctions, and up to 97% on at least one junction. We last applied Kraken to an RNA-sequencing dataset from multiple vertebrates and diverse tissues, where we confirmed that brain-specific gene family members, i.e. one-to-many or many-to-many homologs, are more highly correlated across species than single-copy (i.e. one-to-one homologous) genes.
Not limited to protein coding genes, Kraken also identifies thousands of newly identified transcribed loci, likely non-coding RNAs that are consistently transcribed in human, chimpanzee and gorilla, and maintain significant correlation of expression levels across species. Conclusions Kraken is a computational genome coordinate translator that facilitates cross-species comparisons, distinguishes orthologs from paralogs, and does not require costly all-to-all whole genome mappings. Kraken is freely available under LGPL from http://github.com/nedaz/kraken. PMID:24976580
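The idea of translating coordinates through a graph of pairwise alignments, rather than aligning every genome to every other, can be sketched in Python as follows. This is an illustration only and not Kraken's implementation: the names `translate`, `translate_once` and `all_to_all_alignments` are hypothetical, synteny blocks are assumed to be gapless and on the same strand, and the graph is searched breadth-first.

```python
from collections import deque
from typing import Dict, List, Optional, Tuple

# (src_start, src_end, dst_start): half-open source interval, same strand, no gaps.
Block = Tuple[int, int, int]
Maps = Dict[Tuple[str, str], List[Block]]


def all_to_all_alignments(n: int) -> int:
    """Number of unordered genome pairs, which grows quadratically with n."""
    return n * (n - 1) // 2


def translate_once(blocks: List[Block], pos: int) -> Optional[int]:
    """Map one position through a single pairwise synteny map."""
    for src_start, src_end, dst_start in blocks:
        if src_start <= pos < src_end:
            return dst_start + (pos - src_start)
    return None


def translate(maps: Maps, src: str, dst: str, pos: int) -> Optional[int]:
    """Breadth-first search over genomes, carrying the translated position."""
    queue = deque([(src, pos)])
    seen = {src}
    while queue:
        genome, p = queue.popleft()
        if genome == dst:
            return p
        for (a, b), blocks in maps.items():
            if a == genome and b not in seen:
                q = translate_once(blocks, p)
                if q is not None:
                    seen.add(b)
                    queue.append((b, q))
    return None


if __name__ == "__main__":
    maps = {
        ("human", "mouse"): [(100, 200, 1000)],
        ("mouse", "rat"): [(1000, 1100, 5000)],
    }
    print(translate(maps, "human", "rat", 150))
    print(all_to_all_alignments(12), "pairs all-to-all versus", 12 - 1, "in a chain")
```

In this toy setup a human position reaches rat through mouse even though no direct human-to-rat map exists, which is the property that lets the number of required alignments grow with N rather than with the number of genome pairs.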
A universal genomic coordinate translator for comparative genomics.

PubMed

Zamani, Neda; Sundström, Görel; Meadows, Jennifer R S; Höppner, Marc P; Dainat, Jacques; Lantz, Henrik; Haas, Brian J; Grabherr, Manfred G

2014-06-30

Genomic duplications constitute major events in the evolution of species, allowing paralogous copies of genes to take on fine-tuned biological roles. Unambiguously identifying the orthology relationship between copies across multiple genomes can be resolved by synteny, i.e. the conserved order of genomic sequences. However, a comprehensive analysis of duplication events and their contributions to evolution would require all-to-all genome alignments, which increases at N^2 with the number of available genomes, N. Here, we introduce Kraken, software that omits the all-to-all requirement by recursively traversing a graph of pairwise alignments and dynamically re-computing orthology. Kraken scales linearly with the number of targeted genomes, N, which allows for including large numbers of genomes in analyses. We first evaluated the method on the set of 12 Drosophila genomes, finding that orthologous correspondence computed indirectly through a graph of multiple synteny maps comes at minimal cost in terms of sensitivity, but reduces overall computational runtime by an order of magnitude. We then used the method on three well-annotated mammalian genomes, human, mouse, and rat, and show that up to 93% of protein coding transcripts have unambiguous pairwise orthologous relationships across the genomes. On a nucleotide level, 70 to 83% of exons match exactly at both splice junctions, and up to 97% on at least one junction. We last applied Kraken to an RNA-sequencing dataset from multiple vertebrates and diverse tissues, where we confirmed that brain-specific gene family members, i.e. one-to-many or many-to-many homologs, are more highly correlated across species than single-copy (i.e. one-to-one homologous) genes.
Not limited to protein coding genes, Kraken also identifies thousands of newly identified transcribed loci, likely non-coding RNAs that are consistently transcribed in human, chimpanzee and gorilla, and maintain significant correlation of expression levels across species. Kraken is a computational genome coordinate translator that facilitates cross-species comparisons, distinguishes orthologs from paralogs, and does not require costly all-to-all whole genome mappings. Kraken is freely available under LGPL from http://github.com/nedaz/kraken.
Assembly, Annotation, and Analysis of Multiple Mycorrhizal Fungal Genomes

DOE Office of Scientific and Technical Information (OSTI.GOV)

Initiative Consortium, Mycorrhizal Genomics; Kuo, Alan; Grigoriev, Igor

Mycorrhizal fungi play critical roles in host plant health, soil community structure and chemistry, and carbon and nutrient cycling, all areas of intense interest to the US Dept. of Energy (DOE) Joint Genome Institute (JGI). To this end we are building on our earlier sequencing of the Laccaria bicolor genome by partnering with INRA-Nancy and the mycorrhizal research community in the MGI to sequence and analyze dozens of mycorrhizal genomes of all Basidiomycota and Ascomycota orders and multiple ecological types (ericoid, orchid, and ectomycorrhizal). JGI has developed and deployed high-throughput sequencing techniques, and Assembly, RNASeq, and Annotation Pipelines. In 2012 alone we sequenced, assembled, and annotated 12 draft or improved genomes of mycorrhizae, and predicted ~232831 genes and ~15011 multigene families. All of this data is publicly available on JGI MycoCosm (http://jgi.doe.gov/fungi/), which provides access to both the genome data and tools with which to analyze the data. Preliminary comparisons of the current total of 14 public mycorrhizal genomes suggest that 1) short secreted proteins potentially involved in symbiosis are more enriched in some orders than in others amongst the mycorrhizal Agaricomycetes, 2) there are wide ranges of numbers of genes involved in certain functional categories, such as signal transduction and post-translational modification, and 3) novel gene families are specific to some ecological types.
Ancient genomic architecture for mammalian olfactory receptor clusters

PubMed Central

Aloni, Ronny; Olender, Tsviya; Lancet, Doron

2006-01-01

Background Mammalian olfactory receptor (OR) genes reside in numerous genomic clusters of up to several dozen genes. Whole-genome sequence alignment nets of five mammals allow their comprehensive comparison, aimed at reconstructing the ancestral olfactory subgenome. Results We developed a new and general tool for genome-wide definition of genomic gene clusters conserved in multiple species. Syntenic orthologs, defined as gene pairs showing conservation of both genomic location and coding sequence, were subjected to a graph theory algorithm for discovering CLICs (clusters in conservation). When applied to ORs in five mammals, including the marsupial opossum, more than 90% of the OR genes were found within a framework of 48 multi-species CLICs, invoking a general conservation of gene order and composition. A detailed analysis of individual CLICs revealed multiple differences among species, interpretable through species-specific genomic rearrangements and reflecting complex mammalian evolutionary dynamics. One significant instance involves CLIC #1, which lacks a human member, implying the human-specific deletion of an OR cluster, whose mouse counterpart has been tentatively associated with isovaleric acid odorant detection. Conclusion The identified multi-species CLICs demonstrate that most of the mammalian OR clusters have a common ancestry, preceding the split between marsupials and placental mammals. However, only two of these CLICs were capable of incorporating chicken OR genes, parsimoniously implying that all other CLICs emerged subsequent to the avian-mammalian divergence. PMID:17010214
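The abstract does not name the graph algorithm used to discover CLICs. One simple way to group per-species gene clusters that are linked by syntenic orthologs is to take connected components of the link graph, sketched below in Python with a union-find structure. The choice of connected components, the function name `connected_components` and the cluster labels are all assumptions made for illustration.

```python
from typing import Dict, Iterable, List, Set, Tuple


def connected_components(edges: Iterable[Tuple[str, str]]) -> List[Set[str]]:
    """Group nodes joined by any chain of edges, using union-find."""
    parent: Dict[str, str] = {}

    def find(x: str) -> str:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    groups: Dict[str, Set[str]] = {}
    for node in list(parent):
        groups.setdefault(find(node), set()).add(node)
    # Sort for a deterministic order of the returned components.
    return sorted(groups.values(), key=lambda members: sorted(members))


if __name__ == "__main__":
    # Each edge links two species-level clusters sharing a syntenic ortholog pair.
    links = [
        ("human:c1", "mouse:c7"),
        ("mouse:c7", "opossum:c3"),
        ("human:c2", "mouse:c9"),
    ]
    for component in connected_components(links):
        print(sorted(component))
```

Under this reading, a component with no member from one species (for example no `human:` node) would correspond to the kind of species-specific loss the abstract describes for CLIC #1.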
Transcriptome and methylome profiling reveals relics of genome dominance in the mesopolyploid Brassica oleracea

PubMed Central

2014-01-01

Background Brassica oleracea is a valuable vegetable species that has contributed to human health and nutrition for hundreds of years and comprises multiple distinct cultivar groups with diverse morphological and phytochemical attributes. In addition to this phenotypic wealth, B. oleracea offers unique insights into polyploid evolution, as it results from multiple ancestral polyploidy events and a final Brassiceae-specific triplication event. Further, B. oleracea represents one of the diploid genomes that formed the economically important allopolyploid oilseed, Brassica napus. A deeper understanding of B. oleracea genome architecture provides a foundation for crop improvement strategies throughout the Brassica genus. Results We generate an assembly representing 75% of the predicted B. oleracea genome using a hybrid Illumina/Roche 454 approach. Two dense genetic maps are generated to anchor almost 92% of the assembled scaffolds to nine pseudo-chromosomes. Over 50,000 genes are annotated and 40% of the genome predicted to be repetitive, thus contributing to the increased genome size of B. oleracea compared to its close relative B. rapa. A snapshot of both the leaf transcriptome and methylome allows comparisons to be made across the triplicated sub-genomes, which resulted from the most recent Brassiceae-specific polyploidy event. Conclusions Differential expression of the triplicated syntelogs and cytosine methylation levels across the sub-genomes suggest residual marks of the genome dominance that led to the current genome architecture. Although cytosine methylation does not correlate with individual gene dominance, the independent methylation patterns of triplicated copies suggest epigenetic mechanisms play a role in the functional diversification of duplicate genes. PMID:24916971
Performance and Scalability of Discriminative Metrics for Comparative Gene Identification in 12 Drosophila Genomes

PubMed Central

Lin, Michael F.; Deoras, Ameya N.; Rasmussen, Matthew D.; Kellis, Manolis

2008-01-01

Comparative genomics of multiple related species is a powerful methodology for the discovery of functional genomic elements, and its power should increase with the number of species compared. Here, we use 12 Drosophila genomes to study the power of comparative genomics metrics to distinguish between protein-coding and non-coding regions. First, we study the relative power of different comparative metrics and their relationship to single-species metrics. We find that even relatively simple multi-species metrics robustly outperform advanced single-species metrics, especially for shorter exons (≤240 nt), which are common in animal genomes. Moreover, the two capture largely independent features of protein-coding genes, with different sensitivity/specificity trade-offs, such that their combinations lead to even greater discriminatory power. In addition, we study how discovery power scales with the number and phylogenetic distance of the genomes compared. We find that species at a broad range of distances are comparably effective informants for pairwise comparative gene identification, but that these are surpassed by multi-species comparisons at similar evolutionary divergence. In particular, while pairwise discovery power plateaued at larger distances and never outperformed the most advanced single-species metrics, multi-species comparisons continued to benefit even from the most distant species with no apparent saturation. Last, we find that genes in functional categories typically considered fast-evolving can nonetheless be recovered at very high rates using comparative methods. Our results have implications for comparative genomics analyses in any species, including the human. PMID:18421375
Typing and comparative genome analysis of Brucella melitensis isolated from Lebanon.

PubMed

Abou Zaki, Natalia; Salloum, Tamara; Osman, Marwan; Rafei, Rayane; Hamze, Monzer; Tokajian, Sima

2017-10-16

Brucella melitensis is the main causative agent of the zoonotic disease brucellosis. This study aimed at typing and characterizing genetic variation in 33 Brucella isolates recovered from patients in Lebanon. Bruce-ladder multiplex PCR and PCR-RFLP of omp31, omp2a and omp2b were performed. Sixteen representative isolates were chosen for draft-genome sequencing and analyzed to determine variations in virulence, resistance, genomic islands, prophages and insertion sequences. Comparative whole-genome single nucleotide polymorphism analysis was also performed. The isolates were confirmed to be B. melitensis. Genome analysis revealed multiple virulence determinants and efflux pumps. Genome comparisons and single nucleotide polymorphisms divided the isolates based on geographical distribution but revealed high levels of similarity between the strains. Sequence divergence in B. melitensis was mainly due to lateral gene transfer of mobile elements. This is the first report of an in-depth genomic characterization of B. melitensis in Lebanon. © FEMS 2017. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
Metavir 2: new tools for viral metagenome comparison and assembled virome analysis

PubMed Central

2014-01-01

Background Metagenomics, based on culture-independent sequencing, is a well-fitted approach to provide insights into the composition, structure and dynamics of environmental viral communities. Following recent advances in sequencing technologies, new challenges arise for existing bioinformatic tools dedicated to viral metagenome (i.e. virome) analysis as (i) the number of viromes is rapidly growing and (ii) large genomic fragments can now be obtained by assembling the huge amount of sequence data generated for each metagenome. Results To face these challenges, a new version of Metavir was developed. First, all Metavir tools have been adapted to support comparative analysis of viromes in order to improve the analysis of multiple datasets. In addition to the sequence comparison previously provided, viromes can now be compared through their k-mer frequencies, their taxonomic compositions, recruitment plots and phylogenetic trees containing sequences from different datasets. Second, a new section has been specifically designed to handle assembled viromes made of thousands of large genomic fragments (i.e. contigs). This section includes an annotation pipeline for uploaded viral contigs (gene prediction, similarity search against reference viral genomes and protein domains) and an extensive comparison between contigs and reference genomes. Contigs and their annotations can be explored on the website through specifically developed dynamic genomic maps and interactive networks. Conclusions The new features of Metavir 2 allow users to explore and analyze viromes composed of raw reads or assembled fragments through a set of adapted tools and a user-friendly interface. PMID:24646187
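Comparing viromes through their k-mer frequencies can be illustrated with a short Python sketch. This is not Metavir code: the function names are hypothetical, k = 2 is chosen only to keep the example small, and Euclidean distance is an assumed stand-in for whatever measure the server actually uses.

```python
from collections import Counter
from itertools import product
from math import sqrt
from typing import Dict, Iterable


def kmer_profile(seqs: Iterable[str], k: int = 2) -> Dict[str, float]:
    """Relative frequency of every DNA k-mer across a set of sequences."""
    counts: Counter = Counter()
    for seq in seqs:
        s = seq.upper()
        for i in range(len(s) - k + 1):
            kmer = s[i:i + k]
            if set(kmer) <= set("ACGT"):  # skip k-mers with N or other symbols
                counts[kmer] += 1
    total = sum(counts.values())
    all_kmers = ("".join(p) for p in product("ACGT", repeat=k))
    return {km: (counts[km] / total if total else 0.0) for km in all_kmers}


def euclidean(p: Dict[str, float], q: Dict[str, float]) -> float:
    """Euclidean distance between two profiles built with the same k."""
    return sqrt(sum((p[key] - q[key]) ** 2 for key in p))


if __name__ == "__main__":
    virome_a = kmer_profile(["ATATATAT", "ATATGC"])
    virome_b = kmer_profile(["GCGCGCGC", "GCGCAT"])
    print(round(euclidean(virome_a, virome_b), 3))
```

Because the profile depends only on k-mer counts, it can be computed on raw reads or on assembled contigs without any alignment to reference genomes.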
Toxin-antitoxin systems mqsR/ygiT and dinJ/RelE of Xylella fastidiosa

USDA-ARS's Scientific Manuscript database

The plant pathogen Xylella fastidiosa (Xf) encodes multiple toxin-antitoxin (TA) system homologues, including relE/dinJ and mqsR/ygiT. Phylogenetic analyses indicate these two Xf TA systems have distinct evolutionary histories. Genomic comparisons among Xf subspecies/strains reveal TA systems are ...
Comparative Genomics of Campylobacter iguaniorum to Unravel Genetic Regions Associated with Reptilian Hosts

PubMed Central

Gilbert, Maarten J.; Miller, William G.; Yee, Emma; Kik, Marja; Zomer, Aldert L.; Wagenaar, Jaap A.; Duim, Birgitta

2016-01-01

Abstract Campylobacter iguaniorum is most closely related to the species C. fetus, C. hyointestinalis, and C. lanienae. Reptiles, chelonians and lizards in particular, appear to be a primary reservoir of this Campylobacter species. Here we report the genome comparison of C. iguaniorum strain 1485E, isolated from a bearded dragon (Pogona vitticeps), and strain 2463D, isolated from a green iguana (Iguana iguana), with the genomes of closely related taxa, in particular with reptile-associated C. fetus subsp. testudinum. In contrast to C. fetus, C. iguaniorum is lacking an S-layer encoding region. Furthermore, a defined lipooligosaccharide biosynthesis locus, encoding multiple glycosyltransferases and bounded by waa genes, is absent from C. iguaniorum. Instead, multiple predicted glycosylation regions were identified in C. iguaniorum. One of these regions is > 50 kb with deviant G + C content, suggesting acquisition via lateral transfer. These similar, but non-homologous glycosylation regions were located at the same position on the genome in both strains. Multiple genes encoding respiratory enzymes not identified to date within the C. fetus clade were present. C. iguaniorum shared highest homology with C. hyointestinalis and C. fetus. As in reptile-associated C. fetus subsp. testudinum, a putative tricarballylate catabolism locus was identified. However, despite colonizing a shared host, no recent recombination between both taxa was detected. This genomic study provides a better understanding of host adaptation, virulence, phylogeny, and evolution of C. iguaniorum and related Campylobacter taxa. PMID:27604878
ISRNA: an integrative online toolkit for short reads from high-throughput sequencing data.

PubMed

Luo, Guan-Zheng; Yang, Wei; Ma, Ying-Ke; Wang, Xiu-Jie

2014-02-01

Integrative Short Reads NAvigator (ISRNA) is an online toolkit for analyzing high-throughput small RNA sequencing data. Besides the high-speed genome mapping function, ISRNA provides statistics for genomic location, length distribution and nucleotide composition bias analysis of sequence reads. Number of reads mapped to known microRNAs and other classes of short non-coding RNAs, coverage of short reads on genes, expression abundance of sequence reads as well as some other analysis functions are also supported. The versatile search functions enable users to select sequence reads according to their sub-sequences, expression abundance, genomic location, relationship to genes, etc. A specialized genome browser is integrated to visualize the genomic distribution of short reads. ISRNA also supports management and comparison among multiple datasets. ISRNA is implemented in Java/C++/Perl/MySQL and can be freely accessed at http://omicslab.genetics.ac.cn/ISRNA/.
CMG-biotools, a free workbench for basic comparative microbial genomics.

PubMed

Vesth, Tammi; Lagesen, Karin; Acar, Öncel; Ussery, David

2013-01-01

Today, there are more than a hundred times as many sequenced prokaryotic genomes as there were in the year 2000. The economical sequencing of genomic DNA has facilitated a whole new approach to microbial genomics. The real power of genomics is manifested through comparative genomics that can reveal strain specific characteristics, diversity within species and many other aspects. However, comparative genomics is a field not easily entered into by scientists with few computational skills. The CMG-biotools package is designed for microbiologists with limited knowledge of computational analysis and can be used to perform a number of analyses and comparisons of genomic data. The CMG-biotools system presents a stand-alone interface for comparative microbial genomics. The package is a customized operating system, based on Xubuntu 10.10, available through the open source Ubuntu project. The system can be installed on a virtual computer, allowing the user to run the system alongside any other operating system. Source codes for all programs are provided under GNU license, which makes it possible to transfer the programs to other systems if so desired. We here demonstrate the package by comparing and analyzing the diversity within the class Negativicutes, represented by 31 genomes including 10 genera. The analyses include 16S rRNA phylogeny, basic DNA and codon statistics, proteome comparisons using BLAST and graphical analyses of DNA structures. This paper shows the strength and diverse use of the CMG-biotools system. The system can be installed on a wide range of host operating systems and utilizes as much of the host computer as desired. It allows the user to compare multiple genomes, from various sources using standardized data formats and intuitive visualizations of results. The examples presented here clearly show that users with limited computational experience can perform complicated analysis without much training.
Divergence with gene flow across a speciation continuum of Heliconius butterflies.

PubMed

Supple, Megan A; Papa, Riccardo; Hines, Heather M; McMillan, W Owen; Counterman, Brian A

2015-09-24

A key to understanding the origins of species is determining the evolutionary processes that drive the patterns of genomic divergence during speciation. New genomic technologies enable the study of high-resolution genomic patterns of divergence across natural speciation continua, where taxa pairs with different levels of reproductive isolation can be used as proxies for different stages of speciation. Empirical studies of these speciation continua can provide valuable insights into how genomes diverge during speciation. We examine variation across a handful of genomic regions in parapatric and allopatric populations of Heliconius butterflies with varying levels of reproductive isolation. Genome sequences were mapped to 2.2-Mb of the H. erato genome, including 1-Mb across the red color pattern locus and multiple regions unlinked to color pattern variation. Phylogenetic analyses reveal a speciation continuum of pairs of hybridizing races and incipient species in the Heliconius erato clade. Comparisons of hybridizing pairs of divergently colored races and incipient species reveal that genomic divergence increases with ecological and reproductive isolation, not only across the locus responsible for adaptive variation in red wing coloration, but also at genomic regions unlinked to color pattern. We observe high levels of divergence between the incipient species H. erato and H. himera, suggesting that divergence may accumulate early in the speciation process. Comparisons of genomic divergence between the incipient species and allopatric races suggest that limited gene flow cannot account for the observed high levels of divergence between the incipient species. Our results provide a reconstruction of the speciation continuum across the H. erato clade and provide insights into the processes that drive genomic divergence during speciation, establishing the H. erato clade as a powerful framework for the study of speciation.
Development and assessment of whole-genome oligonucleotide microarrays to analyze an anaerobic microbial community and its responses to oxidative stress.

PubMed

Scholten, Johannes C M; Culley, David E; Nie, Lei; Munn, Kyle J; Chow, Lely; Brockman, Fred J; Zhang, Weiwen

2007-06-29

The application of DNA microarray technology to investigate multiple-species microbial communities presents great challenges. In this study, we reported the design and quality assessment of four whole genome oligonucleotide microarrays for two syntrophic bacteria, Desulfovibrio vulgaris and Syntrophobacter fumaroxidans, and two archaeal methanogens, Methanosarcina barkeri and Methanospirillum hungatei, and their application to analyze global gene expression in a four-species microbial community in response to oxidative stress. In order to minimize the possibility of cross-hybridization, cross-genome comparison was performed to ensure that all probes were unique to each genome so that the microarrays could provide species-level resolution. Microarray quality was validated by the good reproducibility of experimental measurements of multiple biological and analytical replicates. This study showed that S. fumaroxidans and M. hungatei responded to the oxidative stress with up-regulation of several genes known to be involved in reactive oxygen species (ROS) detoxification, such as catalase and rubrerythrin in S. fumaroxidans and thioredoxin and heat shock protein Hsp20 in M. hungatei. However, D. vulgaris seemed to be less sensitive to the oxidative stress as a member of a four-species community, since no gene involved in ROS detoxification was up-regulated. Our work demonstrated the successful application of microarrays to a multiple-species microbial community, and our preliminary results indicated that this approach could provide novel insights on the metabolism within microbial communities.
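The cross-genome check for probe uniqueness can be sketched in Python as below. This is a simplified illustration rather than the authors' pipeline: it rejects a probe only when its exact sequence, or its reverse complement, occurs in a non-target genome, whereas real cross-hybridization screening would also consider near matches. The function names and the toy sequences are hypothetical.

```python
from typing import Dict, List, Tuple

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def species_specific_probes(
    probes: Dict[str, Tuple[str, str]],
    genomes: Dict[str, str],
) -> List[str]:
    """Return IDs of probes with no exact match on either strand of a non-target genome.

    probes maps probe_id -> (target_species, probe_sequence);
    genomes maps species -> genome sequence.
    """
    kept = []
    for probe_id, (target, seq) in probes.items():
        rc = reverse_complement(seq)
        off_target = [
            species
            for species, genome in genomes.items()
            if species != target and (seq in genome or rc in genome)
        ]
        if not off_target:
            kept.append(probe_id)
    return kept


if __name__ == "__main__":
    genomes = {"A": "ACGTTTGACCA", "B": "GGGACGTTTCC"}
    probes = {
        "p1": ("A", "ACGTTT"),  # also present in genome B
        "p2": ("A", "TGACCA"),  # found only in genome A
        "p3": ("B", "TGGTCA"),  # reverse complement occurs in genome A
    }
    print(species_specific_probes(probes, genomes))
```

Substring search is adequate for toy inputs; for whole genomes an indexed search would be the usual choice.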
Comparison of single cell sequencing data between two whole genome amplification methods on two sequencing platforms.

PubMed

Chen, DaYang; Zhen, HeFu; Qiu, Yong; Liu, Ping; Zeng, Peng; Xia, Jun; Shi, QianYu; Xie, Lin; Zhu, Zhu; Gao, Ya; Huang, GuoDong; Wang, Jian; Yang, HuanMing; Chen, Fang

2018-03-21

Research based on a strategy of single-cell low-coverage whole genome sequencing (SLWGS) has enabled better reproducibility and accuracy for detection of copy number variations (CNVs). The whole genome amplification (WGA) method and sequencing platform are critical factors for successful SLWGS (<0.1 × coverage). In this study, we compared single cell and multiple cells sequencing data produced by the HiSeq2000 and Ion Proton platforms using two WGA kits and then comprehensively evaluated the GC-bias, reproducibility, uniformity and CNV detection among different experimental combinations. Our analysis demonstrated that the PicoPLEX WGA Kit resulted in higher reproducibility, lower sequencing error frequency but more GC-bias than the GenomePlex Single Cell WGA Kit (WGA4 kit) independent of the cell number on the HiSeq2000 platform. While on the Ion Proton platform, the WGA4 kit (both single cell and multiple cells) had higher uniformity and less GC-bias but lower reproducibility than those of the PicoPLEX WGA Kit. Moreover, on these two sequencing platforms, depending on cell number, the performance of the two WGA kits was different for both sensitivity and specificity on CNV detection. The results can help researchers who plan to use SLWGS on single or multiple cells to select appropriate experimental conditions for their applications.
Global Genomic Diversity of Oryza sativa Varieties Revealed by Comparative Physical Mapping

PubMed Central

Wang, Xiaoming; Kudrna, David A.; Pan, Yonglong; Wang, Hao; Liu, Lin; Lin, Haiyan; Zhang, Jianwei; Song, Xiang; Goicoechea, Jose Luis; Wing, Rod A.; Zhang, Qifa; Luo, Meizhong

2014-01-01

Bacterial artificial chromosome (BAC) physical maps embedding a large number of BAC end sequences (BESs) were generated for Oryza sativa ssp. indica varieties Minghui 63 (MH63) and Zhenshan 97 (ZS97) and were compared with the genome sequences of O. sativa ssp. japonica cv. Nipponbare and O. sativa ssp. indica cv. 93-11. The comparisons exhibited substantial diversities in terms of large structural variations and small substitutions and indels. Genome-wide BAC-sized and contig-sized structural variations were detected, and the shared variations were analyzed. In the expansion regions of the Nipponbare reference sequence, in comparison to the MH63 and ZS97 physical maps, as well as to the previously constructed 93-11 physical map, the amounts and types of the repeat contents, and the outputs of gene ontology analysis, were significantly different from those of the whole genome. Using the physical maps of four wild Oryza species from OMAP (http://www.omap.org) as a control, we detected many conserved and divergent regions related to the evolution process of O. sativa. Between the BESs of MH63 and ZS97 and the two reference sequences, a total of 1532 polymorphic simple sequence repeats (SSRs), 71,383 SNPs, 1767 multiple nucleotide polymorphisms, 6340 insertions, and 9137 deletions were identified. This study provides independent whole-genome resources for intra- and intersubspecies comparisons and functional genomics studies in O. sativa. Both the comparative physical maps and the GBrowse, which integrated the QTL and molecular markers from GRAMENE (http://www.gramene.org) with our physical maps and analysis results, are open to the public through our Web site (http://gresource.hzau.edu.cn/resource/resource.html). PMID:24424778
Assessment of Recombination in the S-segment Genome of Crimean-Congo Hemorrhagic Fever Virus in Iran.

PubMed

Chinikar, Sadegh; Shah-Hosseini, Nariman; Bouzari, Saeid; Shokrgozar, Mohammad Ali; Mostafavi, Ehsan; Jalali, Tahmineh; Khakifirouz, Sahar; Groschup, Martin H; Niedrig, Matthias

2016-03-01

Crimean-Congo Hemorrhagic Fever Virus (CCHFV) belongs to genus Nairovirus and family Bunyaviridae. The main aim of this study was to investigate the extent of recombination in S-segment genome of CCHFV in Iran. Samples were isolated from Iranian patients and those available in GenBank, and analyzed by phylogenetic and bootscan methods. Through comparison of the phylogenetic trees based on full length sequences and partial fragments in the S-segment genome of CCHFV, genetic switch was evident, due to recombination event. Moreover, evidence of multiple recombination events was detected in query isolates when bootscan analysis was used by SimPlot software. Switch of different genomic regions between different strains by recombination could contribute to CCHFV diversification and evolution. The occurrence of recombination in CCHFV has a critical impact on epidemiological investigations and vaccine design.
Bacillus safensis FO-36b and Bacillus pumilus SAFR-032: a whole genome comparison of two spacecraft assembly facility isolates.

PubMed

Tirumalai, Madhan R; Stepanov, Victor G; Wünsche, Andrea; Montazari, Saied; Gonzalez, Racquel O; Venkateswaran, Kasturi; Fox, George E

2018-06-08

Bacillus strains producing highly resistant spores have been isolated from cleanrooms and spacecraft assembly facilities. Organisms that can survive such conditions merit planetary protection concern and, if that resistance can be transferred to other organisms, a health concern too. To further efforts to understand these resistances, the complete genome of Bacillus safensis strain FO-36b, which produces spores resistant to peroxide and radiation, was determined. The genome was compared to the complete genome of B. pumilus SAFR-032, and the draft genomes of B. safensis JPL-MERTA-8-2 and the type strain B. pumilus ATCC7061T. Additional comparisons were made to 61 draft genomes that have been mostly identified as strains of B. pumilus or B. safensis. The FO-36b gene order is essentially the same as that in SAFR-032 and other B. pumilus strains. The annotated genome has 3850 open reading frames and 40 noncoding RNAs and riboswitches. Of these, 307 are not shared by SAFR-032, and 65 are also not shared by MERTA and ATCC7061T. The FO-36b genome has ten unique open reading frames and two phage-like regions, homologous to the Bacillus bacteriophage SPP1 and Brevibacillus phage Jimmer1. Differing remnants of the Jimmer1 phage are found in essentially all B. safensis / B. pumilus strains. Seven unique genes are part of these phage elements. Whole-genome phylogenetic analysis of the B. pumilus, B. safensis and other Firmicutes genomes separates them into three distinct clusters. Two clusters are subgroups of B. pumilus while one houses all the B. safensis strains. The genome-genome distance analysis and a phylogenetic analysis of gyrA sequences corroborated these results. It is not immediately obvious that the presence or absence of any specific gene or combination of genes is responsible for the variations in resistance seen. It is quite possible that distinctions in gene regulation can alter the expression levels of key proteins, thereby changing the organism's resistance properties without gain or loss of a particular gene. What is clear is that phage elements contribute significantly to genome variability. Multiple genome comparison indicates that many strains named as B. pumilus likely belong to the B. safensis group.
The evolution of the natural killer complex; a comparison between mammals using new high-quality genome assemblies and targeted annotation

USDA-ARS?s Scientific Manuscript database

Natural killer (NK) cells are a diverse population of lymphocytes with a range of biological roles including essential immune functions. NK cell diversity is created by the differential expression of cell surface receptors which modulate activation and function, including multiple subfamilies of C-t...
Phylo-VISTA: Interactive visualization of multiple DNA sequence alignments

DOE Office of Scientific and Technical Information (OSTI.GOV)

Shah, Nameeta; Couronne, Olivier; Pennacchio, Len A.

The power of multi-sequence comparison for biological discovery is well established. The need for new capabilities to visualize and compare cross-species alignment data is intensified by the growing number of genomic sequence datasets being generated for an ever-increasing number of organisms. To be efficient these visualization algorithms must support the ability to accommodate consistently a wide range of evolutionary distances in a comparison framework based upon phylogenetic relationships. Results: We have developed Phylo-VISTA, an interactive tool for analyzing multiple alignments by visualizing a similarity measure for multiple DNA sequences. The complexity of visual presentation is effectively organized using a framework based upon interspecies phylogenetic relationships. The phylogenetic organization supports rapid, user-guided interspecies comparison. To aid in navigation through large sequence datasets, Phylo-VISTA leverages concepts from VISTA that provide a user with the ability to select and view data at varying resolutions. The combination of multiresolution data visualization and analysis, combined with the phylogenetic framework for interspecies comparison, produces a highly flexible and powerful tool for visual data analysis of multiple sequence alignments. Availability: Phylo-VISTA is available at http://www-gsd.lbl.gov/phylovista. It requires an Internet browser with Java Plugin 1.4.2 and it is integrated into the global alignment program LAGAN at http://lagan.stanford.edu.

GCView: the genomic context viewer for protein homology searches

PubMed Central

Grin, Iwan; Linke, Dirk

2011-01-01

Genomic neighborhood can provide important insights into evolution and function of a protein or gene. When looking at operons, changes in operon structure and composition can only be revealed by looking at the operon as a whole. To facilitate the analysis of the genomic context of a query in multiple organisms we have developed Genomic Context Viewer (GCView). GCView accepts results from one or multiple protein homology searches such as BLASTp as input. For each hit, the neighboring protein-coding genes are extracted, the regions of homology are labeled for each input and the results are presented as a clear, interactive graphical output. It is also possible to add more searches to iteratively refine the output. GCView groups outputs by the hits for different proteins. This allows for easy comparison of different operon compositions and structures. The tool is embedded in the framework of the Bioinformatics Toolkit of the Max-Planck Institute for Developmental Biology (MPI Toolkit). Job results from the homology search tools inside the MPI Toolkit can be forwarded to GCView and results can be subsequently analyzed by sequence analysis tools. Results are stored online, allowing for later reinspection. GCView is freely available at http://toolkit.tuebingen.mpg.de/gcview. PMID:21609955
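The neighbor-extraction step described in the GCView abstract can be illustrated with a minimal Python sketch. It is not GCView's implementation: the function name `neighbors_of_hit`, the `flank` parameter, and the representation of a genome as an ordered list of gene identifiers are all assumptions made for illustration.

```python
def neighbors_of_hit(gene_order, hit, flank=2):
    """Return the hit gene together with up to `flank` genes on each side.

    gene_order: gene identifiers listed in their order along the genome
                (hypothetical input format, not GCView's own).
    hit:        identifier of the gene matched by the homology search.
    """
    idx = gene_order.index(hit)
    # Clamp the left edge so a hit near the start does not wrap around.
    start = max(idx - flank, 0)
    return gene_order[start:idx + flank + 1]


genes = ["g1", "g2", "g3", "g4", "g5", "g6"]
print(neighbors_of_hit(genes, "g4", flank=1))
```

Running the same function over the hits from several organisms gives one gene window per organism, which is the kind of side-by-side operon view the abstract describes.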
Purification of High Molecular Weight Genomic DNA from Powdery Mildew for Long-Read Sequencing.

PubMed

Feehan, Joanna M; Scheibel, Katherine E; Bourras, Salim; Underwood, William; Keller, Beat; Somerville, Shauna C

2017-03-31

The powdery mildew fungi are a group of economically important fungal plant pathogens. Relatively little is known about the molecular biology and genetics of these pathogens, in part due to a lack of well-developed genetic and genomic resources. These organisms have large, repetitive genomes, which have made genome sequencing and assembly prohibitively difficult. Here, we describe methods for the collection, extraction, purification and quality control assessment of high molecular weight genomic DNA from one powdery mildew species, Golovinomyces cichoracearum. The protocol described includes mechanical disruption of spores followed by an optimized phenol/chloroform genomic DNA extraction. A typical yield was 7 µg DNA per 150 mg conidia. The genomic DNA that is isolated using this procedure is suitable for long-read sequencing (i.e., > 48.5 kbp). Quality control measures to ensure the size, yield, and purity of the genomic DNA are also described in this method. Sequencing of the genomic DNA of the quality described here will allow for the assembly and comparison of multiple powdery mildew genomes, which in turn will lead to a better understanding and improved control of this agricultural pathogen.
Mating system shifts and transposable element evolution in the plant genus Capsella.

PubMed

Agren, J Ågren; Wang, Wei; Koenig, Daniel; Neuffer, Barbara; Weigel, Detlef; Wright, Stephen I

2014-07-16

Despite having predominately deleterious fitness effects, transposable elements (TEs) are major constituents of eukaryote genomes in general and of plant genomes in particular. Although the proportion of the genome made up of TEs varies at least four-fold across plants, the relative importance of the evolutionary forces shaping variation in TE abundance and distributions across taxa remains unclear. Under several theoretical models, mating system plays an important role in governing the evolutionary dynamics of TEs. Here, we use the recently sequenced Capsella rubella reference genome and short-read whole genome sequencing of multiple individuals to quantify abundance, genome distributions, and population frequencies of TEs in three recently diverged species of differing mating system, two self-compatible species (C. rubella and C. orientalis) and their self-incompatible outcrossing relative, C. grandiflora. We detect different dynamics of TE evolution in our two self-compatible species; C. rubella shows a small increase in transposon copy number, while C. orientalis shows a substantial decrease relative to C. grandiflora. The direction of this change in copy number is genome wide and consistent across transposon classes. For insertions near genes, however, we detect the highest abundances in C. grandiflora. Finally, we also find differences in the population frequency distributions across the three species. Overall, our results suggest that the evolution of selfing may have different effects on TE evolution on a short and on a long timescale. Moreover, cross-species comparisons of transposon abundance are sensitive to reference genome bias, and efforts to control for this bias are key when making comparisons across species.
Screening for SNPs with Allele-Specific Methylation based on Next-Generation Sequencing Data.

PubMed

Hu, Bo; Ji, Yuan; Xu, Yaomin; Ting, Angela H

2013-05-01

Allele-specific methylation (ASM) has long been studied but mainly documented in the context of genomic imprinting and X chromosome inactivation. Taking advantage of the next-generation sequencing technology, we conduct a high-throughput sequencing experiment with four prostate cell lines to survey the whole genome and identify single nucleotide polymorphisms (SNPs) with ASM. A Bayesian approach is proposed to model the counts of short reads for each SNP conditional on its genotypes of multiple subjects, leading to a posterior probability of ASM. We flag SNPs with high posterior probabilities of ASM by accounting for multiple comparisons based on posterior false discovery rates. Applying the Bayesian approach to the in-house prostate cell line data, we identify 269 SNPs as candidates of ASM. A simulation study is carried out to demonstrate the quantitative performance of the proposed approach.
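The abstract does not spell out how the posterior false discovery rate is controlled. One common rule, assumed here rather than taken from the paper, ranks SNPs by posterior probability and keeps the largest set whose mean posterior error (1 minus the posterior) stays at or below a target level. The function name, the `alpha` parameter, and the example values below are hypothetical.

```python
def flag_by_posterior_fdr(posteriors, alpha=0.05):
    """Return indices of flagged items with expected FDR at or below alpha.

    posteriors: posterior probabilities of ASM, one per SNP (illustrative input).
    """
    # Rank SNPs from the most to the least likely to show ASM.
    order = sorted(range(len(posteriors)), key=lambda i: posteriors[i], reverse=True)
    flagged = []
    error_sum = 0.0  # running sum of (1 - posterior): expected false discoveries
    for i in order:
        new_sum = error_sum + (1.0 - posteriors[i])
        # Stop once adding this SNP would push the mean error above alpha.
        if new_sum / (len(flagged) + 1) > alpha:
            break
        error_sum = new_sum
        flagged.append(i)
    return flagged


posterior_asm = [0.99, 0.40, 0.97, 0.80, 0.999]  # made-up values
print(flag_by_posterior_fdr(posterior_asm, alpha=0.05))  # [4, 0, 2]
```

Because the items are visited in decreasing order of posterior probability, the running mean error never decreases, so stopping at the first violation is sufficient.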
Comparative Genomics of Campylobacter iguaniorum to Unravel Genetic Regions Associated with Reptilian Hosts.

PubMed

Gilbert, Maarten J; Miller, William G; Yee, Emma; Kik, Marja; Zomer, Aldert L; Wagenaar, Jaap A; Duim, Birgitta

2016-10-05

Campylobacter iguaniorum is most closely related to the species C. fetus, C. hyointestinalis, and C. lanienae. Reptiles, chelonians and lizards in particular, appear to be a primary reservoir of this Campylobacter species. Here we report the genome comparison of C. iguaniorum strain 1485E, isolated from a bearded dragon (Pogona vitticeps), and strain 2463D, isolated from a green iguana (Iguana iguana), with the genomes of closely related taxa, in particular with reptile-associated C. fetus subsp. testudinum. In contrast to C. fetus, C. iguaniorum is lacking an S-layer encoding region. Furthermore, a defined lipooligosaccharide biosynthesis locus, encoding multiple glycosyltransferases and bounded by waa genes, is absent from C. iguaniorum. Instead, multiple predicted glycosylation regions were identified in C. iguaniorum. One of these regions is > 50 kb with deviant G + C content, suggesting acquisition via lateral transfer. These similar, but non-homologous glycosylation regions were located at the same position on the genome in both strains. Multiple genes encoding respiratory enzymes not identified to date within the C. fetus clade were present. C. iguaniorum shared highest homology with C. hyointestinalis and C. fetus. As in reptile-associated C. fetus subsp. testudinum, a putative tricarballylate catabolism locus was identified. However, despite colonizing a shared host, no recent recombination between both taxa was detected. This genomic study provides a better understanding of host adaptation, virulence, phylogeny, and evolution of C. iguaniorum and related Campylobacter taxa. © The Author 2016. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
Insights into archaeal evolution and symbiosis from the genomes of a nanoarchaeon and its inferred crenarchaeal host from Obsidian Pool, Yellowstone National Park.

PubMed

Podar, Mircea; Makarova, Kira S; Graham, David E; Wolf, Yuri I; Koonin, Eugene V; Reysenbach, Anna-Louise

2013-04-22

A single cultured marine organism, Nanoarchaeum equitans, represents the Nanoarchaeota branch of symbiotic Archaea, with a highly reduced genome and unusual features such as multiple split genes. The first terrestrial hyperthermophilic member of the Nanoarchaeota was collected from Obsidian Pool, a thermal feature in Yellowstone National Park, separated by single cell isolation, and sequenced together with its putative host, a Sulfolobales archaeon. Both the new Nanoarchaeota (Nst1) and N. equitans lack most biosynthetic capabilities, and phylogenetic analysis of ribosomal RNA and protein sequences indicates that the two form a deep-branching archaeal lineage. However, the Nst1 genome is more than 20% larger, and encodes a complete gluconeogenesis pathway as well as the full complement of archaeal flagellum proteins. With a larger genome, a smaller repertoire of split protein encoding genes and no split non-contiguous tRNAs, Nst1 appears to have experienced less severe genome reduction than N. equitans. These findings imply that, rather than representing ancestral characters, the extremely compact genomes and multiple split genes of Nanoarchaeota are derived characters associated with their symbiotic or parasitic lifestyle. The inferred host of Nst1 is potentially autotrophic, with a streamlined genome and simplified central and energetic metabolism as compared to other Sulfolobales. Comparison of the N. equitans and Nst1 genomes suggests that the marine and terrestrial lineages of Nanoarchaeota share a common ancestor that was already a symbiont of another archaeon. The two distinct Nanoarchaeota-host genomic data sets offer novel insights into the evolution of archaeal symbiosis and parasitism, enabling further studies of the cellular and molecular mechanisms of these relationships. This article was reviewed by Patrick Forterre, Bettina Siebers (nominated by Michael Galperin) and Purification Lopez-Garcia.
Leek yellow stripe virus isolates from Brazil form a distant clade based on the P1 gene

USDA-ARS?s Scientific Manuscript database

The complete genomic sequence of a garlic isolate of Leek yellow stripe virus from Brazil (LYSV-MG) has been determined, and phylogenetic comparisons made to LYSV isolates from other parts of the world. In addition, the nucleotide sequence of the 5'UTR and part of the P1 gene of multiple LYSV isolat...
Functional genomic Landscape of Human Breast Cancer drivers, vulnerabilities, and resistance

PubMed Central

Marcotte, Richard; Sayad, Azin; Brown, Kevin R.; Sanchez-Garcia, Felix; Reimand, Jüri; Haider, Maliha; Virtanen, Carl; Bradner, James E.; Bader, Gary D.; Mills, Gordon B.; Pe’er, Dana; Moffat, Jason; Neel, Benjamin G.

2016-01-01

Summary Large-scale genomic studies have identified multiple somatic aberrations in breast cancer, including copy number alterations, and point mutations. Still, identifying causal variants and emergent vulnerabilities that arise as a consequence of genetic alterations remain major challenges. We performed whole genome shRNA “dropout screens” on 77 breast cancer cell lines. Using a hierarchical linear regression algorithm to score our screen results and integrate them with accompanying detailed genetic and proteomic information, we identify vulnerabilities in breast cancer, including candidate “drivers,” and reveal general functional genomic properties of cancer cells. Comparisons of gene essentiality with drug sensitivity data suggest potential resistance mechanisms, effects of existing anti-cancer drugs, and opportunities for combination therapy. Finally, we demonstrate the utility of this large dataset by identifying BRD4 as a potential target in luminal breast cancer, and PIK3CA mutations as a resistance determinant for BET-inhibitors. PMID:26771497
Assessment of Recombination in the S-segment Genome of Crimean-Congo Hemorrhagic Fever Virus in Iran

PubMed Central

Chinikar, Sadegh; Shah-Hosseini, Nariman; Bouzari, Saeid; Shokrgozar, Mohammad Ali; Mostafavi, Ehsan; Jalali, Tahmineh; Khakifirouz, Sahar; Groschup, Martin H; Niedrig, Matthias

2016-01-01

Background: Crimean-Congo Hemorrhagic Fever Virus (CCHFV) belongs to genus Nairovirus and family Bunyaviridae. The main aim of this study was to investigate the extent of recombination in S-segment genome of CCHFV in Iran. Methods: Samples were isolated from Iranian patients and those available in GenBank, and analyzed by phylogenetic and bootscan methods. Results: Through comparison of the phylogenetic trees based on full length sequences and partial fragments in the S-segment genome of CCHFV, genetic switch was evident, due to recombination event. Moreover, evidence of multiple recombination events was detected in query isolates when bootscan analysis was used by SimPlot software. Conclusion: Switch of different genomic regions between different strains by recombination could contribute to CCHFV diversification and evolution. The occurrence of recombination in CCHFV has a critical impact on epidemiological investigations and vaccine design. PMID:27047968
GWFASTA: server for FASTA search in eukaryotic and microbial genomes.

PubMed

Issac, Biju; Raghava, G P S

2002-09-01

Similarity searches are a powerful method for solving important biological problems such as database scanning, evolutionary studies, gene prediction, and protein structure prediction. FASTA is a widely used sequence comparison tool for rapid database scanning. Here we describe the GWFASTA server that was developed to assist the FASTA user in similarity searches against partially and/or completely sequenced genomes. GWFASTA consists of more than 60 microbial genomes, eight eukaryote genomes, and proteomes of annotated genomes. In fact, it provides the maximum number of databases for similarity searching from a single platform. GWFASTA allows the submission of more than one sequence as a single query for a FASTA search. It also provides integrated post-processing of FASTA output, including compositional analysis of proteins, multiple sequences alignment, and phylogenetic analysis. Furthermore, it summarizes the search results organism-wise for prokaryotes and chromosome-wise for eukaryotes. Thus, the integration of different tools for sequence analyses makes GWFASTA a powerful tool for biologists.
The genome and phenome of the green alga Chloroidium sp. UTEX 3007 reveal adaptive traits for desert acclimatization.

PubMed

Nelson, David R; Khraiwesh, Basel; Fu, Weiqi; Alseekh, Saleh; Jaiswal, Ashish; Chaiboonchoe, Amphun; Hazzouri, Khaled M; O'Connor, Matthew J; Butterfoss, Glenn L; Drou, Nizar; Rowe, Jillian D; Harb, Jamil; Fernie, Alisdair R; Gunsalus, Kristin C; Salehi-Ashtiani, Kourosh

2017-06-17

To investigate the phenomic and genomic traits that allow green algae to survive in deserts, we characterized a ubiquitous species, Chloroidium sp. UTEX 3007, which we isolated from multiple locations in the United Arab Emirates (UAE). Metabolomic analyses of Chloroidium sp. UTEX 3007 indicated that the alga accumulates a broad range of carbon sources, including several desiccation tolerance-promoting sugars and unusually large stores of palmitate. Growth assays revealed capacities to grow in salinities from zero to 60 g/L and to grow heterotrophically on >40 distinct carbon sources. Assembly and annotation of genomic reads yielded a 52.5 Mbp genome with 8153 functionally annotated genes. Comparison with other sequenced green algae revealed unique protein families involved in osmotic stress tolerance and saccharide metabolism that support phenomic studies. Our results reveal the robust and flexible biology utilized by a green alga to successfully inhabit a desert coastline.
Heat*seq: an interactive web tool for high-throughput sequencing experiment comparison with public data.

PubMed

Devailly, Guillaume; Mantsoki, Anna; Joshi, Anagha

2016-11-01

Better protocols and decreasing costs have made high-throughput sequencing experiments now accessible even to small experimental laboratories. However, comparing one or few experiments generated by an individual lab to the vast amount of relevant data freely available in the public domain might be limited due to lack of bioinformatics expertise. Though several tools, including genome browsers, allow such comparison at a single gene level, they do not provide a genome-wide view. We developed Heat*seq, a web-tool that allows genome scale comparison of high throughput experiments (chromatin immuno-precipitation followed by sequencing, RNA-sequencing and Cap Analysis of Gene Expression) provided by a user, to the data in the public domain. Heat*seq currently contains over 12 000 experiments across diverse tissues and cell types in human, mouse and drosophila. Heat*seq displays interactive correlation heatmaps, with an ability to dynamically subset datasets to contextualize user experiments. High quality figures and tables are produced and can be downloaded in multiple formats. Web application: http://www.heatstarseq.roslin.ed.ac.uk/ Source code: https://github.com/gdevailly Contact: Guillaume.Devailly@roslin.ed.ac.uk or Anagha.Joshi@roslin.ed.ac.uk Supplementary information: Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.
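The idea behind a correlation heatmap of one user experiment against public datasets can be sketched in Python as follows. The abstract does not state which correlation measure Heat*seq uses; Pearson correlation is assumed here, and the function names, dataset names, and values are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def rank_public_datasets(user_profile, public_profiles):
    """Sort public datasets by their correlation with the user's experiment."""
    scores = {name: pearson(user_profile, profile)
              for name, profile in public_profiles.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Made-up signal values over the same four genomic features.
user = [1.0, 2.0, 3.0, 4.0]
public = {"a": [2.0, 4.0, 6.0, 8.0], "b": [4.0, 3.0, 2.0, 1.0], "c": [1.0, 3.0, 2.0, 4.0]}
ranking = rank_public_datasets(user, public)
print([name for name, _ in ranking])  # ['a', 'c', 'b']
```

Each row of a heatmap would correspond to one such list of correlations; subsetting the public datasets, as the tool allows, amounts to filtering the `public_profiles` dictionary before ranking.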
Genome sequencing and analysis of a highly virulent Vibrio parahaemolyticus strain isolated from the marine environment

NASA Astrophysics Data System (ADS)

Parks, M. C.; Moreno, E.

2016-02-01

Vibrio parahaemolyticus [Vp] is a Gram-negative bacterium and a natural inhabitant of coastal marine ecosystems worldwide. Vp is also a coincidental pathogen of humans. Virulent strains are commonly identified by the presence of the thermostable direct (tdh) or tdh-related (trh) hemolysin genes. However, virulence is multifaceted and many clinical Vp isolates do not carry tdh or trh. In this study, we sequenced and assembled the draft genome of a tdh- and trh-negative environmental isolate (805) shown previously to be highly virulent in zebrafish. To investigate potential mechanisms of virulence, we compared 805 to the clinical V. parahaemolyticus type strain (RIMD2210633). Pairwise comparison revealed the presence of multiple genomic regions including an IncF conjugative pilus (1.3 Kb) and a colicin V plasmid (1.49 Kb). These features are homologous to genomic regions present in clinical V. vulnificus and V. cholerae strains. Genome comparison also revealed the presence of five toxin-antitoxin systems. Isolate 805 likely attained these new features through the lateral acquisition of mobile genomic material - a hypothesis supported by the aberrant GC content of these regions. Colicin V plasmids are a diverse group of IncF plasmids found in invasive bacterial strains. Similarly, an abundance of toxin-antitoxin systems have been linked to virulence in Gram-negative bacteria. Current efforts are focused on characterizing 142 coding features present in 805 but absent from the type strain.
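Regions with aberrant GC content, which the abstract cites as support for lateral acquisition, can be screened for with a simple sliding window. The sketch below is an illustration under assumed parameters (window size, step, and tolerance are hypothetical), not the method used in the study.

```python
def gc_fraction(seq):
    """Fraction of G and C bases in a DNA sequence (0.0 for an empty string)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def aberrant_gc_windows(genome, window=1000, step=500, tolerance=0.10):
    """List (start, end, gc) for windows whose GC differs from the genome-wide GC."""
    background = gc_fraction(genome)
    hits = []
    last_start = max(len(genome) - window, 0)
    for start in range(0, last_start + 1, step):
        end = min(start + window, len(genome))
        gc = gc_fraction(genome[start:end])
        # Flag the window when it deviates from the background by more than tolerance.
        if abs(gc - background) > tolerance:
            hits.append((start, end, gc))
    return hits


# Toy sequence: an AT-only background with a 10-base G-only insert.
toy = "A" * 40 + "G" * 10 + "A" * 40
print(aberrant_gc_windows(toy, window=10, step=10, tolerance=0.5))  # [(40, 50, 1.0)]
```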
Variability Studies of Two Prunus-Infecting Fabaviruses with the Aid of High-Throughput Sequencing

PubMed Central

Sarkisova, Tatiana; Lenz, Ondřej; Přibylová, Jaroslava; Špak, Josef; Lotos, Leonidas; Beta, Christina; Katsiani, Asimina; Candresse, Thierry

2018-01-01

During their lifetime, perennial woody plants are expected to face multiple infection events. Furthermore, multiple genotypes of individual virus species may co-infect the same host. This may eventually lead to a situation where plants harbor complex communities of viral species/strains. Using high-throughput sequencing, we describe co-infection of sweet and sour cherry trees with diverse genomic variants of two closely related viruses, namely prunus virus F (PrVF) and cherry virus F (CVF). Both viruses are most homologous to members of the Fabavirus genus (Secoviridae family). The comparison of CVF and PrVF RNA2 genomic sequences suggests that the two viruses may significantly differ in their expression strategy. Indeed, similar to comoviruses, the smaller genomic segment of PrVF, RNA2, may be translated in two collinear proteins while CVF likely expresses only the shorter of these two proteins. Linked with the observation that identity levels between the coat proteins of these two viruses are significantly below the family species demarcation cut-off, these findings support the idea that CVF and PrVF represent two separate Fabavirus species. PMID:29670059
Horizontal transfer of potential mobile units in phytoplasmas

PubMed Central

Ku, Chuan; Lo, Wen-Sui; Kuo, Chih-Horng

2013-01-01

Phytoplasmas are uncultivated phytopathogenic bacteria that cause diseases in a wide range of economically important plants. Through secretion of effector proteins, they are able to manipulate their plant hosts to facilitate their multiplication and dispersal by insect vectors. The genome sequences of several phytoplasmas have been characterized to date and a group of putative composite transposons called potential mobile units (PMUs) are found in these highly reduced genomes. Recently, our team reported the genome sequence and comparative analysis of a peanut witches’ broom (PnWB) phytoplasma, the first representative of the phytoplasma 16SrII group. Comparisons between the species phylogeny and the phylogenies of the PMU genes revealed that the PnWB PMU is likely to have been transferred from the 16SrI group. This indicates that PMUs are not only the DNA unit for transposition within a genome, but also for horizontal transfer among divergent phytoplasma lineages. Given the association of PMUs with effector genes, the mobility of PMUs across genomes has important implications for phytoplasma ecology and evolution. PMID:24251068
Diverse and highly recombinant anelloviruses associated with Weddell seals in Antarctica

PubMed Central

Fahsbender, Elizabeth; Kim, Stacy; Kraberger, Simona; Frankfurter, Greg; Eilers, Alice A.; Shero, Michelle R.; Beltran, Roxanne; Kirkham, Amy; McCorkell, Robert; Berngartt, Rachel K.; Male, Maketalena F.; Ballard, Grant; Ainley, David G.; Breitbart, Mya

2017-01-01

Abstract The viruses circulating among Antarctic wildlife remain largely unknown. In an effort to identify viruses associated with Weddell seals (Leptonychotes weddellii) inhabiting the Ross Sea, vaginal and nasal swabs, and faecal samples were collected between November 2014 and February 2015. In addition, a Weddell seal kidney and South Polar skua (Stercorarius maccormicki) faeces were opportunistically sampled. Using high throughput sequencing, we identified and recovered 152 anellovirus genomes that share 63–70% genome-wide identities with other pinniped anelloviruses. Genome-wide pairwise comparisons coupled with phylogenetic analysis revealed two novel anellovirus species, tentatively named torque teno Leptonychotes weddellii virus (TTLwV) -1 and -2. TTLwV-1 (n = 133, genomes encompassing 40 genotypes) is highly recombinant, whereas TTLwV-2 (n = 19, genomes encompassing three genotypes) is relatively less recombinant. This study documents ubiquitous TTLwVs among Weddell seals in Antarctica with frequent co-infection by multiple genotypes, however, the role these anelloviruses play in seal health remains unknown. PMID:28744371
Diverse and highly recombinant anelloviruses associated with Weddell seals in Antarctica.

PubMed

Fahsbender, Elizabeth; Burns, Jennifer M; Kim, Stacy; Kraberger, Simona; Frankfurter, Greg; Eilers, Alice A; Shero, Michelle R; Beltran, Roxanne; Kirkham, Amy; McCorkell, Robert; Berngartt, Rachel K; Male, Maketalena F; Ballard, Grant; Ainley, David G; Breitbart, Mya; Varsani, Arvind

2017-01-01

The viruses circulating among Antarctic wildlife remain largely unknown. In an effort to identify viruses associated with Weddell seals (Leptonychotes weddellii) inhabiting the Ross Sea, vaginal and nasal swabs, and faecal samples were collected between November 2014 and February 2015. In addition, a Weddell seal kidney and South Polar skua (Stercorarius maccormicki) faeces were opportunistically sampled. Using high throughput sequencing, we identified and recovered 152 anellovirus genomes that share 63-70% genome-wide identities with other pinniped anelloviruses. Genome-wide pairwise comparisons coupled with phylogenetic analysis revealed two novel anellovirus species, tentatively named torque teno Leptonychotes weddellii virus (TTLwV) -1 and -2. TTLwV-1 (n = 133, genomes encompassing 40 genotypes) is highly recombinant, whereas TTLwV-2 (n = 19, genomes encompassing three genotypes) is relatively less recombinant. This study documents ubiquitous TTLwVs among Weddell seals in Antarctica with frequent co-infection by multiple genotypes, however, the role these anelloviruses play in seal health remains unknown.
Horizontal transfer of potential mobile units in phytoplasmas.

PubMed

Ku, Chuan; Lo, Wen-Sui; Kuo, Chih-Horng

2013-09-01

Phytoplasmas are uncultivated phytopathogenic bacteria that cause diseases in a wide range of economically important plants. Through secretion of effector proteins, they are able to manipulate their plant hosts to facilitate their multiplication and dispersal by insect vectors. The genome sequences of several phytoplasmas have been characterized to date and a group of putative composite transposons called potential mobile units (PMUs) are found in these highly reduced genomes. Recently, our team reported the genome sequence and comparative analysis of a peanut witches' broom (PnWB) phytoplasma, the first representative of the phytoplasma 16SrII group. Comparisons between the species phylogeny and the phylogenies of the PMU genes revealed that the PnWB PMU is likely to have been transferred from the 16SrI group. This indicates that PMUs are not only the DNA unit for transposition within a genome, but also for horizontal transfer among divergent phytoplasma lineages. Given the association of PMUs with effector genes, the mobility of PMUs across genomes has important implications for phytoplasma ecology and evolution.
Identification of Genetic Bases of Vibrio fluvialis Species-Specific Biochemical Pathways and Potential Virulence Factors by Comparative Genomic Analysis

PubMed Central

Lu, Xin; Liang, Weili; Wang, Yunduan; Xu, Jialiang

2014-01-01

Vibrio fluvialis is an important food-borne pathogen that causes diarrheal illness and sometimes extraintestinal infections in humans. In this study, we sequenced the genome of a clinical V. fluvialis strain and determined its phylogenetic relationships with other Vibrio species by comparative genomic analysis. We found that the closest relationship was between V. fluvialis and V. furnissii, followed by those with V. cholerae and V. mimicus. Moreover, based on genome comparisons and gene complementation experiments, we revealed genetic mechanisms of the biochemical tests that differentiate V. fluvialis from closely related species. Importantly, we identified a variety of genes encoding potential virulence factors, including multiple hemolysins, transcriptional regulators, and environmental survival and adaptation apparatuses, and the type VI secretion system, which is indicative of complex regulatory pathways modulating pathogenesis in this organism. The availability of V. fluvialis genome sequences may promote our understanding of pathogenic mechanisms for this emerging pathogen. PMID:24441165
The utility of multiple molecular methods including whole genome sequencing as tools to differentiate Escherichia coli O157:H7 outbreaks.

PubMed

Berenger, Byron M; Berry, Chrystal; Peterson, Trevor; Fach, Patrick; Delannoy, Sabine; Li, Vincent; Tschetter, Lorelee; Nadon, Celine; Honish, Lance; Louie, Marie; Chui, Linda

2015-01-01

A standardised method for determining Escherichia coli O157:H7 strain relatedness using whole genome sequencing or virulence gene profiling is not yet established. We sought to assess the capacity of either high-throughput polymerase chain reaction (PCR) of 49 virulence genes, core-genome single-nucleotide variants (SNVs) or k-mer clustering to discriminate between outbreak-associated and sporadic E. coli O157:H7 isolates. Three outbreaks and multiple sporadic isolates from the province of Alberta, Canada were included in the study. Two of the outbreaks occurred concurrently in 2014 and one occurred in 2012. Pulsed-field gel electrophoresis (PFGE) and multilocus variable-number tandem repeat analysis (MLVA) were employed as comparator typing methods. The virulence gene profiles of isolates from the 2012 and 2014 Alberta outbreak events and contemporary sporadic isolates were mostly identical; therefore the set of virulence genes chosen in this study was not discriminatory enough to distinguish between outbreak clusters. Concordant with PFGE and MLVA results, core genome SNV and k-mer phylogenies clustered isolates from the 2012 and 2014 outbreaks as distinct events. k-mer phylogenies demonstrated increased discriminatory power compared with core SNV phylogenies. Prior to the widespread implementation of whole genome sequencing for routine public health use, issues surrounding cost, technical expertise, software standardisation, and data sharing/comparisons must be addressed.
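The abstract above does not say how k-mer profiles were compared, so the following is only a minimal sketch of one common approach: each sequence is reduced to its set of k-mers and pairs are scored by Jaccard distance. The function names, the toy sequences, and the choice of Jaccard distance with k = 5 are all assumptions made for illustration, not the study's method.

```python
from itertools import combinations


def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard_distance(a, b):
    """1 - |A & B| / |A | B|; defined as 0.0 when both sets are empty."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def pairwise_kmer_distances(genomes, k=5):
    """Map each unordered pair of names (sorted) to its k-mer Jaccard distance."""
    profiles = {name: kmers(seq, k) for name, seq in genomes.items()}
    return {
        (x, y): jaccard_distance(profiles[x], profiles[y])
        for x, y in combinations(sorted(profiles), 2)
    }


# Hypothetical toy sequences; real isolates would be whole-genome assemblies.
toy = {
    "outbreak_2012_a": "ACGTACGTTAGC",
    "outbreak_2012_b": "ACGTACGTTAGC",
    "sporadic": "ACGTTCGATAGC",
}
distances = pairwise_kmer_distances(toy, k=5)
```

A distance matrix of this kind could then be passed to a clustering or tree-building step; identical sequences score 0.0 and sequences sharing few k-mers approach 1.0.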

CMG-Biotools, a Free Workbench for Basic Comparative Microbial Genomics

PubMed Central

Vesth, Tammi; Lagesen, Karin; Acar, Öncel; Ussery, David

2013-01-01

Background Today, there are more than a hundred times as many sequenced prokaryotic genomes as were present in the year 2000. The economical sequencing of genomic DNA has facilitated a whole new approach to microbial genomics. The real power of genomics is manifested through comparative genomics that can reveal strain specific characteristics, diversity within species and many other aspects. However, comparative genomics is a field not easily entered into by scientists with few computational skills. The CMG-biotools package is designed for microbiologists with limited knowledge of computational analysis and can be used to perform a number of analyses and comparisons of genomic data. Results The CMG-biotools system presents a stand-alone interface for comparative microbial genomics. The package is a customized operating system, based on Xubuntu 10.10, available through the open source Ubuntu project. The system can be installed on a virtual computer, allowing the user to run the system alongside any other operating system. Source code for all programs is provided under GNU license, which makes it possible to transfer the programs to other systems if so desired. We here demonstrate the package by comparing and analyzing the diversity within the class Negativicutes, represented by 31 genomes including 10 genera. The analyses include 16S rRNA phylogeny, basic DNA and codon statistics, proteome comparisons using BLAST and graphical analyses of DNA structures. Conclusion This paper shows the strength and diverse use of the CMG-biotools system. The system can be installed on a wide range of host operating systems and utilizes as much of the host computer as desired. It allows the user to compare multiple genomes from various sources using standardized data formats and intuitive visualizations of results. The examples presented here clearly show that users with limited computational experience can perform complicated analysis without much training. PMID:23577086
Screening for SNPs with Allele-Specific Methylation based on Next-Generation Sequencing Data

PubMed Central

Hu, Bo; Xu, Yaomin

2013-01-01

Allele-specific methylation (ASM) has long been studied but mainly documented in the context of genomic imprinting and X chromosome inactivation. Taking advantage of the next-generation sequencing technology, we conduct a high-throughput sequencing experiment with four prostate cell lines to survey the whole genome and identify single nucleotide polymorphisms (SNPs) with ASM. A Bayesian approach is proposed to model the counts of short reads for each SNP conditional on its genotypes of multiple subjects, leading to a posterior probability of ASM. We flag SNPs with high posterior probabilities of ASM by accounting for multiple comparisons based on posterior false discovery rates. Applying the Bayesian approach to the in-house prostate cell line data, we identify 269 SNPs as candidates of ASM. A simulation study is carried out to demonstrate the quantitative performance of the proposed approach. PMID:23710259
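The abstract does not give the exact rule used to flag SNPs, so the sketch below shows one standard way to control a posterior (Bayesian) false discovery rate: rank items by posterior probability and flag the largest top-ranked set whose average posterior probability of being a false positive stays at or below a threshold. The function name, the threshold `alpha`, and the example probabilities are hypothetical.

```python
def flag_by_posterior_fdr(posteriors, alpha=0.05):
    """Return the sorted indices of flagged items.

    Items are ranked by posterior probability (highest first). The largest
    top-k set whose mean of (1 - posterior) is <= alpha is flagged; that
    mean is the expected fraction of false discoveries within the set.
    """
    order = sorted(range(len(posteriors)),
                   key=lambda i: posteriors[i], reverse=True)
    running_error = 0.0
    best_k = 0
    for k, idx in enumerate(order, start=1):
        running_error += 1.0 - posteriors[idx]
        if running_error / k <= alpha:
            best_k = k
    return sorted(order[:best_k])


# Hypothetical posterior probabilities of ASM for five SNPs.
flagged = flag_by_posterior_fdr([0.99, 0.40, 0.97, 0.90, 0.10], alpha=0.05)
```

Because the ranking is by decreasing posterior probability, the running mean error never decreases, so the flagged set is always a prefix of the ranking.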
Linking the potato genome to the conserved ortholog set (COS) markers

PubMed Central

2013-01-01

Background Conserved ortholog set (COS) markers are an important functional genomics resource that has greatly improved orthology detection in Asterid species. A comprehensive list of these markers is available at Sol Genomics Network (http://solgenomics.net/) and many of these have been placed on the genetic maps of a number of solanaceous species. Results We amplified over 300 COS markers from eight potato accessions involving two diploid landraces of Solanum tuberosum Andigenum group (formerly classified as S. goniocalyx, S. phureja), and a dihaploid clone derived from a modern tetraploid cultivar of S. tuberosum and the wild species S. berthaultii, S. chomatophilum, and S. paucissectum. Using the BLASTn algorithm (Basic Local Alignment Search Tool of the NCBI, National Center for Biotechnology Information), we mapped the DNA sequences of these markers onto the potato genome sequence. Additionally, we mapped a subset of these markers genetically in potato and compared their physical and genetic locations in potato with their genetic locations in tomato. We found that most of the COS markers are single-copy in the reference genome of potato and that the genetic location in tomato and physical location in potato sequence are mostly in agreement. However, we did find some COS markers that are present in multiple copies and some that map to unexpected locations. Sequence comparisons between species show that some of these markers may be paralogs. Conclusions The sequence-based physical map helps identify markers for traits of interest, thereby reducing the number of markers to be tested for applications like marker assisted selection, diversity, and phylogenetic studies. PMID:23758607
Seeking Optimal Region-Of-Interest (ROI) Single-Value Summary Measures for fMRI Studies in Imaging Genetics

PubMed Central

Tong, Yunxia; Chen, Qiang; Nichols, Thomas E.; Rasetti, Roberta; Callicott, Joseph H.; Berman, Karen F.; Weinberger, Daniel R.; Mattay, Venkata S.

2016-01-01

A data-driven hypothesis-free genome-wide association (GWA) approach in imaging genetics studies allows screening the entire genome to discover novel genes that modulate brain structure, chemistry, and function. However, a whole brain voxel-wise analysis approach in such genome-wide based imaging genetic studies can be computationally intense and also likely has low statistical power since a stringent multiple comparisons correction is needed for searching over the entire genome and brain. In imaging genetics with functional magnetic resonance imaging (fMRI) phenotypes, since many experimental paradigms activate focal regions that can be pre-specified based on a priori knowledge, reducing the voxel-wise search to single-value summary measures within a priori ROIs could prove efficient and promising. The goal of this investigation is to evaluate the sensitivity and reliability of different single-value ROI summary measures and provide guidance for future work. Four different fMRI databases were tested, and comparisons across different groups (patients with schizophrenia, their siblings, vs. normal control subjects; across genotype groups) were conducted. Our results show that four of these measures, particularly those that represent values from the top most-activated voxels within an ROI, are more powerful at reliably detecting group differences and generate greater effect sizes than the others. PMID:26974435
Efficient exploration of pan-cancer networks by generalized covariance selection and interactive web content

PubMed Central

Kling, Teresia; Johansson, Patrik; Sanchez, José; Marinescu, Voichita D.; Jörnsten, Rebecka; Nelander, Sven

2015-01-01

Statistical network modeling techniques are increasingly important tools to analyze cancer genomics data. However, current tools and resources are not designed to work across multiple diagnoses and technical platforms, thus limiting their applicability to comprehensive pan-cancer datasets such as The Cancer Genome Atlas (TCGA). To address this, we describe a new data driven modeling method, based on generalized Sparse Inverse Covariance Selection (SICS). The method integrates genetic, epigenetic and transcriptional data from multiple cancers, to define links that are present in multiple cancers, a subset of cancers, or a single cancer. It is shown to be statistically robust and effective at detecting direct pathway links in data from TCGA. To facilitate interpretation of the results, we introduce a publicly accessible tool (cancerlandscapes.org), in which the derived networks are explored as interactive web content, linked to several pathway and pharmacological databases. To evaluate the performance of the method, we constructed a model for eight TCGA cancers, using data from 3900 patients. The model rediscovered known mechanisms and contained interesting predictions. Possible applications include prediction of regulatory relationships, comparison of network modules across multiple forms of cancer and identification of drug targets. PMID:25953855
W-curve alignments for HIV-1 genomic comparisons.

PubMed

Cork, Douglas J; Lembark, Steven; Tovanabutra, Sodsai; Robb, Merlin L; Kim, Jerome H

2010-06-01

The W-curve was originally developed as a graphical visualization technique for viewing DNA and RNA sequences. Its ability to render features of DNA also makes it suitable for computational studies. Its main advantage in this area is utilizing a single-pass algorithm for comparing the sequences. Avoiding recursion during sequence alignments offers advantages for speed and in-process resources. The graphical technique also allows for multiple models of comparison to be used depending on the nucleotide patterns embedded in similar whole genomic sequences. The W-curve approach allows us to compare large numbers of samples quickly. We are currently tuning the algorithm to accommodate quirks specific to HIV-1 genomic sequences so that it can be used to aid in diagnostic and vaccine efforts. Tracking the molecular evolution of the virus has been greatly hampered by gap-associated problems predominantly embedded within the envelope gene of the virus. Gaps and hypermutation of the virus slow conventional string-based alignments of the whole genome. This paper describes the W-curve algorithm itself, and how we have adapted it for comparison of similar HIV-1 genomes. A treebuilding method is developed with the W-curve that utilizes a novel Cylindrical Coordinate distance method and gap analysis method. HIV-1 C2-V5 env sequence regions from a Mother/Infant cohort study are used in the comparison. The output distance matrix and neighbor results produced by the W-curve are functionally equivalent to those from Clustal for C2-V5 sequences in the mother/infant pairs infected with CRF01_AE. Significant potential exists for utilizing this method in place of conventional string-based alignment of HIV-1 genomes, such as Clustal X. With W-curve heuristic alignment, it may be possible to obtain clinically useful results in a short time, short enough to affect clinical choices for acute treatment.
A description of the W-curve generation process, including a comparison technique of aligning extremes of the curves to effectively phase-shift them past the HIV-1 gap problem, is presented. Besides yielding similar neighbor-joining phenogram topologies, most Mother and Infant C2-V5 sequences in the cohort pairs geometrically map closest to each other, indicating that W-curve heuristics overcame any gap problem.
Inferring the Minimal Genome of Mesoplasma florum by Comparative Genomics and Transposon Mutagenesis.

PubMed

Baby, Vincent; Lachance, Jean-Christophe; Gagnon, Jules; Lucier, Jean-François; Matteau, Dominick; Knight, Tom; Rodrigue, Sébastien

2018-01-01

The creation and comparison of minimal genomes will help better define the most fundamental mechanisms supporting life. Mesoplasma florum is a near-minimal, fast-growing, nonpathogenic bacterium potentially amenable to genome reduction efforts. In a comparative genomic study of 13 M. florum strains, including 11 newly sequenced genomes, we have identified the core genome and open pangenome of this species. Our results show that all of the strains have approximately 80% of their gene content in common. Of the remaining 20%, 17% of the genes were found in multiple strains and 3% were unique to any given strain. On the basis of random transposon mutagenesis, we also estimated that ~290 out of 720 genes are essential for M. florum L1 in rich medium. We next evaluated different genome reduction scenarios for M. florum L1 by using gene conservation and essentiality data, as well as comparisons with the first working approximation of a minimal organism, Mycoplasma mycoides JCVI-syn3.0. Our results suggest that 409 of the 473 M. mycoides JCVI-syn3.0 genes have orthologs in M. florum L1. Conversely, 57 putatively essential M. florum L1 genes have no homolog in M. mycoides JCVI-syn3.0. This suggests differences in minimal genome compositions, even for these evolutionarily closely related bacteria. IMPORTANCE The last years have witnessed the development of whole-genome cloning and transplantation methods and the complete synthesis of entire chromosomes. Recently, the first minimal cell, Mycoplasma mycoides JCVI-syn3.0, was created. Despite these milestone achievements, several questions remain to be answered. For example, is the composition of minimal genomes virtually identical in phylogenetically related species? On the basis of comparative genomics and transposon mutagenesis, we investigated this question by using an alternative model, Mesoplasma florum, that is also amenable to genome reduction efforts. 
Our results suggest that the creation of additional minimal genomes could help reveal different gene compositions and strategies that can support life, even within closely related species.
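As a quick check on the gene counts quoted in this abstract, the proportions can be worked out directly. The variable names are illustrative; the figure of 290 essential genes is approximate in the source ("~290").

```python
# Counts as reported in the abstract above.
total_genes_l1 = 720        # genes in M. florum L1
essential_l1 = 290          # approximate number of essential genes
syn3_genes = 473            # genes in M. mycoides JCVI-syn3.0
syn3_with_ortholog = 409    # JCVI-syn3.0 genes with an ortholog in M. florum L1

# Share of M. florum L1 genes estimated to be essential in rich medium.
essential_fraction = essential_l1 / total_genes_l1

# Share of JCVI-syn3.0 genes that have an ortholog in M. florum L1,
# and the number that do not.
ortholog_fraction = syn3_with_ortholog / syn3_genes
syn3_without_ortholog = syn3_genes - syn3_with_ortholog

print(f"essential in L1: {essential_fraction:.1%}")
print(f"JCVI-syn3.0 genes with L1 ortholog: {ortholog_fraction:.1%}")
print(f"JCVI-syn3.0 genes without L1 ortholog: {syn3_without_ortholog}")
```

So roughly two fifths of the M. florum L1 genes are estimated to be essential, and 64 of the JCVI-syn3.0 genes lack an ortholog in M. florum L1.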
Elevated Rate of Genome Rearrangements in Radiation-Resistant Bacteria.

PubMed

Repar, Jelena; Supek, Fran; Klanjscek, Tin; Warnecke, Tobias; Zahradka, Ksenija; Zahradka, Davor

2017-04-01

A number of bacterial, archaeal, and eukaryotic species are known for their resistance to ionizing radiation. One of the challenges these species face is a potent environmental source of DNA double-strand breaks, potential drivers of genome structure evolution. Efficient and accurate DNA double-strand break repair systems have been demonstrated in several unrelated radiation-resistant species and are putative adaptations to the DNA damaging environment. Such adaptations are expected to compensate for the genome-destabilizing effect of environmental DNA damage and may be expected to result in a more conserved gene order in radiation-resistant species. However, here we show that rates of genome rearrangements, measured as loss of gene order conservation with time, are higher in radiation-resistant species in multiple, phylogenetically independent groups of bacteria. Comparison of indicators of selection for genome organization between radiation-resistant and phylogenetically matched, nonresistant species argues against tolerance to disruption of genome structure as a strategy for radiation resistance. Interestingly, an important mechanism affecting genome rearrangements in prokaryotes, the symmetrical inversions around the origin of DNA replication, shapes genome structure of both radiation-resistant and nonresistant species. In conclusion, the opposing effects of environmental DNA damage and DNA repair result in elevated rates of genome rearrangements in radiation-resistant bacteria. Copyright © 2017 Repar et al.
Ultrafast Comparison of Personal Genomes via Precomputed Genome Fingerprints.

PubMed

Glusman, Gustavo; Mauldin, Denise E; Hood, Leroy E; Robinson, Max

2017-01-01

We present an ultrafast method for comparing personal genomes. We transform the standard genome representation (lists of variants relative to a reference) into "genome fingerprints" via locality sensitive hashing. The resulting genome fingerprints can be meaningfully compared even when the input data were obtained using different sequencing technologies, processed using different pipelines, represented in different data formats and relative to different reference versions. Furthermore, genome fingerprints are robust to up to 30% missing data. Because of their reduced size, computation on the genome fingerprints is fast and requires little memory. For example, we could compute all-against-all pairwise comparisons among the 2504 genomes in the 1000 Genomes data set in 67 s at high quality (21 μs per comparison, on a single processor), and achieved a lower quality approximation in just 11 s. Efficient computation enables scaling up a variety of important genome analyses, including quantifying relatedness, recognizing duplicative sequenced genomes in a set, population reconstruction, and many others. The original genome representation cannot be reconstructed from its fingerprint, effectively decoupling genome comparison from genome interpretation; the method thus has significant implications for privacy-preserving genome analytics.
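The timing figures in this abstract can be cross-checked with a little arithmetic: an all-against-all comparison of n genomes involves n(n - 1)/2 unordered pairs. The sketch below assumes that self-comparisons are excluded and each pair is counted once, which the abstract does not state explicitly.

```python
from math import comb

n_genomes = 2504
pairs = comb(n_genomes, 2)  # number of unordered pairs, n * (n - 1) / 2

# Time per comparison in microseconds for the two reported run times.
high_quality_us = 67 / pairs * 1e6
low_quality_us = 11 / pairs * 1e6

print(f"pairs: {pairs}")
print(f"high quality: {high_quality_us:.1f} microseconds per comparison")
print(f"low quality: {low_quality_us:.1f} microseconds per comparison")
```

Under that assumption the 67 s run works out to a little over 21 microseconds per pair, in line with the figure reported in the abstract.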
Cross-Study Comparison Reveals Common Genomic, Network, and Functional Signatures of Desiccation Resistance in Drosophila melanogaster

PubMed Central

Telonis-Scott, Marina; Sgrò, Carla M.; Hoffmann, Ary A.; Griffin, Philippa C.

2016-01-01

Repeated attempts to map the genomic basis of complex traits often yield different outcomes because of the influence of genetic background, gene-by-environment interactions, and/or statistical limitations. However, where repeatability is low at the level of individual genes, overlap often occurs in gene ontology categories, genetic pathways, and interaction networks. Here we report on the genomic overlap for natural desiccation resistance from a Pool-genome-wide association study experiment and a selection experiment in flies collected from the same region in southeastern Australia in different years. We identified over 600 single nucleotide polymorphisms associated with desiccation resistance in flies derived from almost 1,000 wild-caught genotypes, a similar number of loci to that observed in our previous genomic study of selected lines, demonstrating the genetic complexity of this ecologically important trait. By harnessing the power of cross-study comparison, we narrowed the candidates from almost 400 genes in each study to a core set of 45 genes, enriched for stimulus, stress, and defense responses. In addition to gene-level overlap, there was higher order congruence at the network and functional levels, suggesting genetic redundancy in key stress sensing, stress response, immunity, signaling, and gene expression pathways. We also identified variants linked to different molecular aspects of desiccation physiology previously verified from functional experiments. Our approach provides insight into the genomic basis of a complex and ecologically important trait and predicts candidate genetic pathways to explore in multiple genetic backgrounds and related species within a functional framework. PMID:26733490
The complete chloroplast genome sequence of Cephalotaxus oliveri (Cephalotaxaceae): evolutionary comparison of Cephalotaxus chloroplast DNAs and insights into the loss of inverted repeat copies in gymnosperms.

PubMed

Yi, Xuan; Gao, Lei; Wang, Bo; Su, Ying-Juan; Wang, Ting

2013-01-01

We have determined the complete chloroplast (cp) genome sequence of Cephalotaxus oliveri. The genome is 134,337 bp in length, encodes 113 genes, and lacks inverted repeat (IR) regions. Genome-wide mutational dynamics have been investigated through comparative analysis of the cp genomes of C. oliveri and C. wilsoniana. Gene order transformation analyses indicate that when distinct isomers are considered as alternative structures for the ancestral cp genome of cupressophyte and Pinaceae lineages, it is not possible to distinguish between hypotheses favoring retention of the same IR region in cupressophyte and Pinaceae cp genomes from a hypothesis proposing independent loss of IRA and IRB. Furthermore, in cupressophyte cp genomes, the highly reduced IRs are replaced by short repeats that have the potential to mediate homologous recombination, analogous to the situation in Pinaceae. The importance of repeats in the mutational dynamics of cupressophyte cp genomes is also illustrated by the accD reading frame, which has undergone extreme length expansion in cupressophytes. This has been caused by a large insertion comprising multiple repeat sequences. Overall, we find that the distribution of repeats, indels, and substitutions is significantly correlated in Cephalotaxus cp genomes, consistent with a hypothesis that repeats play a role in inducing substitutions and indels in conifer cp genomes.
Evidence-based green algal genomics reveals marine diversity and ancestral characteristics of land plants.

PubMed

van Baren, Marijke J; Bachy, Charles; Reistetter, Emily Nahas; Purvine, Samuel O; Grimwood, Jane; Sudek, Sebastian; Yu, Hang; Poirier, Camille; Deerinck, Thomas J; Kuo, Alan; Grigoriev, Igor V; Wong, Chee-Hong; Smith, Richard D; Callister, Stephen J; Wei, Chia-Lin; Schmutz, Jeremy; Worden, Alexandra Z

2016-03-31

Prasinophytes are widespread marine green algae that are related to plants. Cellular abundance of the prasinophyte Micromonas has reportedly increased in the Arctic due to climate-induced changes. Thus, studies of these unicellular eukaryotes are important for marine ecology and for understanding Viridiplantae evolution and diversification. We generated evidence-based Micromonas gene models using proteomics and RNA-Seq to improve prasinophyte genomic resources. First, sequences of four chromosomes in the 22 Mb Micromonas pusilla (CCMP1545) genome were finished. Comparison with the finished 21 Mb genome of Micromonas commoda (RCC299; named herein) shows they share ≤8,141 of ~10,000 protein-encoding genes, depending on the analysis method. Unlike RCC299 and other sequenced eukaryotes, CCMP1545 has two abundant repetitive intron types and a high percentage (26%) of GC splice donors. Micromonas has more genus-specific protein families (19%) than other genome-sequenced prasinophytes (11%). Comparative analyses using predicted proteomes from other prasinophytes reveal proteins likely related to scale formation and ancestral photosynthesis. Our studies also indicate that peptidoglycan (PG) biosynthesis enzymes have been lost in multiple independent events in select prasinophytes and plants. However, CCMP1545, polar Micromonas CCMP2099 and prasinophytes from other classes retain the entire PG pathway, like moss and glaucophyte algae. Surprisingly, multiple vascular plants also have the PG pathway, except the Penicillin-Binding Protein, and share a unique bi-domain protein potentially associated with the pathway. Alongside Micromonas experiments using antibiotics that halt bacterial PG biosynthesis, the findings highlight unrecognized phylogenetic complexity in PG-pathway retention and implicate a role in chloroplast structure or division in several extant Viridiplantae lineages.
Extensive differences in gene loss and architecture between related prasinophytes underscore their divergence. PG biosynthesis genes from the cyanobacterial endosymbiont that became the plastid have been selectively retained in multiple plants and algae, implying a biological function. Our studies provide robust genomic resources for emerging model algae, advancing knowledge of marine phytoplankton and plant evolution.
Whole-proteome phylogeny of large dsDNA viruses and parvoviruses through a composition vector method related to dynamical language model

PubMed Central

2010-01-01

Background The vast sequence divergence among different virus groups has presented a great challenge to alignment-based analysis of virus phylogeny. Due to the problems caused by the uncertainty in alignment, existing tools for phylogenetic analysis based on multiple alignment could not be directly applied to the whole-genome comparison and phylogenomic studies of viruses. There has been a growing interest in alignment-free methods for phylogenetic analysis using complete genome data. Among the alignment-free methods, a dynamical language (DL) method proposed by our group has successfully been applied to the phylogenetic analysis of bacteria and chloroplast genomes. Results In this paper, the DL method is used to analyze the whole-proteome phylogeny of 124 large dsDNA viruses and 30 parvoviruses, two data sets with a large difference in genome size. The trees from our analyses are in good agreement with the latest classification of large dsDNA viruses and parvoviruses by the International Committee on Taxonomy of Viruses (ICTV). Conclusions The present method provides a new way of recovering the phylogeny of large dsDNA viruses and parvoviruses, and also some insights into the affiliation of a number of unclassified viruses. In comparison, some alignment-free methods such as the CV Tree method can be used for recovering the phylogeny of large dsDNA viruses, but they are not suitable for resolving the phylogeny of parvoviruses with a much smaller genome size. PMID:20565983
Challenges and strategies for implementing genomic services in diverse settings: experiences from the Implementing GeNomics In pracTicE (IGNITE) network.

PubMed

Sperber, Nina R; Carpenter, Janet S; Cavallari, Larisa H; Damschroder, Laura J; Cooper-DeHoff, Rhonda M; Denny, Joshua C; Ginsburg, Geoffrey S; Guan, Yue; Horowitz, Carol R; Levy, Kenneth D; Levy, Mia A; Madden, Ebony B; Matheny, Michael E; Pollin, Toni I; Pratt, Victoria M; Rosenman, Marc; Voils, Corrine I; Weitzel, Kristen W; Wilke, Russell A; Wu, R Ryanne; Orlando, Lori A

2017-05-22

To realize potential public health benefits from genetic and genomic innovations, understanding how best to implement the innovations into clinical care is important. The objective of this study was to synthesize data on challenges identified by six diverse projects that are part of a National Human Genome Research Institute (NHGRI)-funded network focused on implementing genomics into practice and strategies to overcome these challenges. We used a multiple-case study approach with each project considered as a case and qualitative methods to elicit and describe themes related to implementation challenges and strategies. We describe challenges and strategies in an implementation framework and typology to enable consistent definitions and cross-case comparisons. Strategies were linked to challenges based on expert review and shared themes. Three challenges were identified by all six projects, and strategies to address these challenges varied across the projects. One common challenge was to increase the relative priority of integrating genomics within the health system electronic health record (EHR). Four projects used data warehousing techniques to accomplish the integration. The second common challenge was to strengthen clinicians' knowledge and beliefs about genomic medicine. To overcome this challenge, all projects developed educational materials and conducted meetings and outreach focused on genomic education for clinicians. The third challenge was engaging patients in the genomic medicine projects. Strategies to overcome this challenge included use of mass media to spread the word, actively involving patients in implementation (e.g., a patient advisory board), and preparing patients to be active participants in their healthcare decisions. This is the first collaborative evaluation focusing on the description of genomic medicine innovations implemented in multiple real-world clinical settings. 
Findings suggest that strategies to facilitate integration of genomic data within existing EHRs and educate stakeholders about the value of genomic services are considered important for effective implementation. Future work could build on these findings to evaluate which strategies are optimal under what conditions. This information will be useful for guiding translation of discoveries to clinical care, which, in turn, can provide data to inform continual improvement of genomic innovations and their applications.
easyGWAS: A Cloud-Based Platform for Comparing the Results of Genome-Wide Association Studies.

PubMed

Grimm, Dominik G; Roqueiro, Damian; Salomé, Patrice A; Kleeberger, Stefan; Greshake, Bastian; Zhu, Wangsheng; Liu, Chang; Lippert, Christoph; Stegle, Oliver; Schölkopf, Bernhard; Weigel, Detlef; Borgwardt, Karsten M

2017-01-01

The ever-growing availability of high-quality genotypes for a multitude of species has enabled researchers to explore the underlying genetic architecture of complex phenotypes at an unprecedented level of detail using genome-wide association studies (GWAS). The systematic comparison of results obtained from GWAS of different traits opens up new possibilities, including the analysis of pleiotropic effects. Other advantages that result from the integration of multiple GWAS are the ability to replicate GWAS signals and to increase statistical power to detect such signals through meta-analyses. In order to facilitate the simple comparison of GWAS results, we present easyGWAS, a powerful, species-independent online resource for computing, storing, sharing, annotating, and comparing GWAS. The easyGWAS tool supports multiple species, the uploading of private genotype data and summary statistics of existing GWAS, as well as advanced methods for comparing GWAS results across different experiments and data sets in an interactive and user-friendly interface. easyGWAS is also a public data repository for GWAS data and summary statistics and already includes published data and results from several major GWAS. We demonstrate the potential of easyGWAS with a case study of the model organism Arabidopsis thaliana, using flowering and growth-related traits. © 2016 American Society of Plant Biologists. All rights reserved.
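The abstract mentions gaining statistical power through meta-analysis without naming a method. As an illustration only, the sketch below uses a standard fixed-effect, inverse-variance weighted meta-analysis of per-study effect sizes for a single variant; it is not taken from easyGWAS, and the function name and example numbers are hypothetical.

```python
from math import sqrt


def fixed_effect_meta(betas, standard_errors):
    """Combine per-study effect estimates with inverse-variance weights.

    Returns (combined effect, combined standard error, z-score).
    Each study is weighted by 1 / se**2, so more precise studies count more.
    """
    weights = [1.0 / se ** 2 for se in standard_errors]
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    se = sqrt(1.0 / total)
    return beta, se, beta / se


# Two hypothetical studies of the same variant with equal precision.
combined_beta, combined_se, z_score = fixed_effect_meta([0.2, 0.4], [0.1, 0.1])
```

With two equally precise studies the combined effect is the simple average of the two estimates, while the combined standard error shrinks by a factor of the square root of two, which is where the gain in power comes from.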
From genome-wide to candidate gene: an investigation of variation at the major histocompatibility complex in common bottlenose dolphins exposed to harmful algal blooms.

PubMed

Cammen, Kristina M; Wilcox, Lynsey A; Rosel, Patricia E; Wells, Randall S; Read, Andrew J

2015-02-01

The role the major histocompatibility complex (MHC) plays in response to exposure to environmental toxins is relatively poorly understood, particularly in comparison to its well-described role in pathogen immunity. We investigated associations between MHC diversity and resistance to brevetoxins in common bottlenose dolphins (Tursiops truncatus). A previous genome-wide association study investigating an apparent difference in harmful algal bloom (HAB) resistance among dolphin populations in the Gulf of Mexico identified genetic variation associated with survival in close genomic proximity to multiple MHC class II loci. Here, we characterized genetic variation at DQA, DQB, DRA, and DRB loci in dolphins from central-west Florida and the Florida Panhandle, including dolphins that died during HABs and dolphins presumed to have survived HAB exposure. We found that DRB and DQB exhibited patterns of genetic differentiation among geographic regions that differed from neutral microsatellite loci. In addition, genetic differentiation at DRB across multiple pairwise comparisons of live and dead dolphins was greater than differentiation observed at neutral loci. Our findings at these MHC loci did not approach the strength of association with survival previously described for a nearby genetic variant. However, the results provide evidence that selective pressures at the MHC vary among dolphin populations that differ in the frequency of HAB exposure and that the overall composition of DRB variants differs between dolphin survivors and non-survivors of HABs. These results may suggest a potential role of MHC diversity in variable survival of bottlenose dolphins exposed to HABs.
Fast alignment-free sequence comparison using spaced-word frequencies.

PubMed

Leimeister, Chris-Andre; Boden, Marcus; Horwege, Sebastian; Lindner, Sebastian; Morgenstern, Burkhard

2014-07-15

Alignment-free methods for sequence comparison are increasingly used for genome analysis and phylogeny reconstruction; they circumvent various difficulties of traditional alignment-based approaches. In particular, alignment-free methods are much faster than pairwise or multiple alignments. They are, however, less accurate than methods based on sequence alignment. Most alignment-free approaches work by comparing the word composition of sequences. A well-known problem with these methods is that neighbouring word matches are far from independent. To reduce the statistical dependency between adjacent word matches, we propose to use 'spaced words', defined by patterns of 'match' and 'don't care' positions, for alignment-free sequence comparison. We describe a fast implementation of this approach using recursive hashing and bit operations, and we show that further improvements can be achieved by using multiple patterns instead of single patterns. To evaluate our approach, we use spaced-word frequencies as a basis for fast phylogeny reconstruction. Using real-world and simulated sequence data, we demonstrate that our multiple-pattern approach produces better phylogenies than approaches relying on contiguous words. Our program is freely available at http://spaced.gobics.de/. © The Author 2014. Published by Oxford University Press.
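To make the notion of 'spaced words' concrete, here is a small sketch that extracts spaced words from a sequence using patterns of match ('1') and don't-care ('0') positions, builds relative-frequency profiles over one or more patterns, and compares two profiles. The function names, the pooled normalization, and the Euclidean distance are illustrative assumptions; the authors' implementation uses recursive hashing and bit operations, which this plain version does not attempt to reproduce.

```python
from collections import Counter
from math import sqrt


def spaced_words(seq, pattern):
    """Yield the spaced word at every window of the sequence.

    Only characters under '1' (match) positions are kept; characters
    under '0' (don't care) positions are skipped.
    """
    keep = [i for i, c in enumerate(pattern) if c == "1"]
    for start in range(len(seq) - len(pattern) + 1):
        window = seq[start:start + len(pattern)]
        yield "".join(window[i] for i in keep)


def frequency_profile(seq, patterns):
    """Relative frequencies of (pattern, spaced word) pairs over all patterns."""
    counts = Counter()
    for pattern in patterns:
        for word in spaced_words(seq, pattern):
            counts[(pattern, word)] += 1
    total = sum(counts.values())
    return {key: n / total for key, n in counts.items()} if total else {}


def euclidean_distance(profile_a, profile_b):
    """Euclidean distance between two sparse frequency profiles."""
    keys = set(profile_a) | set(profile_b)
    return sqrt(sum((profile_a.get(k, 0.0) - profile_b.get(k, 0.0)) ** 2
                    for k in keys))


if __name__ == "__main__":
    patterns = ["1101", "1011"]  # made-up patterns with three match positions
    a = frequency_profile("ACGTACGTTGCA", patterns)
    b = frequency_profile("ACGTTCGTAGCA", patterns)
    print(euclidean_distance(a, b))
```

A matrix of such pairwise distances could then be passed to a distance-based tree-building method, which is how the abstract describes using spaced-word frequencies for phylogeny reconstruction.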
Convergent adaptive evolution in marginal environments: unloading transposable elements as a common strategy among mangrove genomes.

PubMed

Lyu, Haomin; He, Ziwen; Wu, Chung-I; Shi, Suhua

2018-01-01

Several clades of mangrove trees independently invade the interface between land and sea at the margin of woody plant distribution. As phenotypic convergence among mangroves is common, the possibility of convergent adaptation in their genomes is quite intriguing. To study this molecular convergence, we sequenced multiple mangrove genomes. In this study, we focused on the evolution of transposable elements (TEs) in relation to the genome size evolution. TEs, generally considered genomic parasites, are the most common components of woody plant genomes. Analyzing the long terminal repeat-retrotransposon (LTR-RT) type of TE, we estimated their death rates by counting solo-LTRs and truncated elements. We found that all lineages of mangroves massively and convergently reduce TE loads in comparison to their nonmangrove relatives; as a consequence, genome size reduction happens independently in all six mangrove lineages; TE load reduction in mangroves can be attributed to the paucity of young elements; the rarity of young LTR-RTs is a consequence of fewer births rather than excess death. In conclusion, mangrove genomes employ a convergent strategy of TE load reduction by suppressing element origination in their independent adaptation to a new environment. © 2017 The Authors. New Phytologist © 2017 New Phytologist Trust.
Ultrafast Comparison of Personal Genomes via Precomputed Genome Fingerprints

PubMed Central

Glusman, Gustavo; Mauldin, Denise E.; Hood, Leroy E.; Robinson, Max

2017-01-01

We present an ultrafast method for comparing personal genomes. We transform the standard genome representation (lists of variants relative to a reference) into “genome fingerprints” via locality sensitive hashing. The resulting genome fingerprints can be meaningfully compared even when the input data were obtained using different sequencing technologies, processed using different pipelines, represented in different data formats and relative to different reference versions. Furthermore, genome fingerprints are robust to up to 30% missing data. Because of their reduced size, computation on the genome fingerprints is fast and requires little memory. For example, we could compute all-against-all pairwise comparisons among the 2504 genomes in the 1000 Genomes data set in 67 s at high quality (21 μs per comparison, on a single processor), and achieved a lower quality approximation in just 11 s. Efficient computation enables scaling up a variety of important genome analyses, including quantifying relatedness, recognizing duplicative sequenced genomes in a set, population reconstruction, and many others. The original genome representation cannot be reconstructed from its fingerprint, effectively decoupling genome comparison from genome interpretation; the method thus has significant implications for privacy-preserving genome analytics. PMID:29018478
Increased alignment sensitivity improves the usage of genome alignments for comparative gene annotation.

PubMed

Sharma, Virag; Hiller, Michael

2017-08-21

Genome alignments provide a powerful basis to transfer gene annotations from a well-annotated reference genome to many other aligned genomes. The completeness of these annotations crucially depends on the sensitivity of the underlying genome alignment. Here, we investigated the impact of the genome alignment parameters and found that parameters with a higher sensitivity allow the detection of thousands of novel alignments between orthologous exons that have been missed before. In particular, comparisons between species separated by an evolutionary distance of >0.75 substitutions per neutral site, like human and other non-placental vertebrates, benefit from increased sensitivity. To systematically test if increased sensitivity improves comparative gene annotations, we built a multiple alignment of 144 vertebrate genomes and used this alignment to map human genes to the other 143 vertebrates with CESAR. We found that higher alignment sensitivity substantially improves the completeness of comparative gene annotations by adding on average 2382 and 7440 novel exons and 117 and 317 novel genes for mammalian and non-mammalian species, respectively. Our results suggest a more sensitive alignment strategy that should generally be used for genome alignments between distantly-related species. Our 144-vertebrate genome alignment and the comparative gene annotations (https://bds.mpi-cbg.de/hillerlab/144VertebrateAlignment_CESAR/) are a valuable resource for comparative genomics. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.

Uprobe: a genome-wide universal probe resource for comparative physical mapping in vertebrates.

PubMed

Kellner, Wendy A; Sullivan, Robert T; Carlson, Brian H; Thomas, James W

2005-01-01

Interspecies comparisons are important for deciphering the functional content and evolution of genomes. The expansive array of >70 public vertebrate genomic bacterial artificial chromosome (BAC) libraries can provide a means of comparative mapping, sequencing, and functional analysis of targeted chromosomal segments that is independent and complementary to whole-genome sequencing. However, at the present time, no complementary resource exists for the efficient targeted physical mapping of the majority of these BAC libraries. Universal overgo-hybridization probes, designed from regions of sequenced genomes that are highly conserved between species, have been demonstrated to be an effective resource for the isolation of orthologous regions from multiple BAC libraries in parallel. Here we report the application of the universal probe design principle across entire genomes, and the subsequent creation of a complementary probe resource, Uprobe, for screening vertebrate BAC libraries. Uprobe currently consists of whole-genome sets of universal overgo-hybridization probes designed for screening mammalian or avian/reptilian libraries. Retrospective analysis, experimental validation of the probe design process on a panel of representative BAC libraries, and estimates of probe coverage across the genome indicate that the majority of all eutherian and avian/reptilian genes or regions of interest can be isolated using Uprobe. Future implementation of the universal probe design strategy will be used to create an expanded number of whole-genome probe sets that will encompass all vertebrate genomes.
Insights into archaeal evolution and symbiosis from the genomes of a nanoarchaeon and its inferred crenarchaeal host from Obsidian Pool, Yellowstone National Park

PubMed Central

2013-01-01

Background A single cultured marine organism, Nanoarchaeum equitans, represents the Nanoarchaeota branch of symbiotic Archaea, with a highly reduced genome and unusual features such as multiple split genes. Results The first terrestrial hyperthermophilic member of the Nanoarchaeota was collected from Obsidian Pool, a thermal feature in Yellowstone National Park, separated by single cell isolation, and sequenced together with its putative host, a Sulfolobales archaeon. Both the new Nanoarchaeota (Nst1) and N. equitans lack most biosynthetic capabilities, and phylogenetic analysis of ribosomal RNA and protein sequences indicates that the two form a deep-branching archaeal lineage. However, the Nst1 genome is more than 20% larger, and encodes a complete gluconeogenesis pathway as well as the full complement of archaeal flagellum proteins. With a larger genome, a smaller repertoire of split protein encoding genes and no split non-contiguous tRNAs, Nst1 appears to have experienced less severe genome reduction than N. equitans. These findings imply that, rather than representing ancestral characters, the extremely compact genomes and multiple split genes of Nanoarchaeota are derived characters associated with their symbiotic or parasitic lifestyle. The inferred host of Nst1 is potentially autotrophic, with a streamlined genome and simplified central and energetic metabolism as compared to other Sulfolobales. Conclusions Comparison of the N. equitans and Nst1 genomes suggests that the marine and terrestrial lineages of Nanoarchaeota share a common ancestor that was already a symbiont of another archaeon. The two distinct Nanoarchaeota-host genomic data sets offer novel insights into the evolution of archaeal symbiosis and parasitism, enabling further studies of the cellular and molecular mechanisms of these relationships. Reviewers This article was reviewed by Patrick Forterre, Bettina Siebers (nominated by Michael Galperin) and Purification Lopez-Garcia PMID:23607440
FANCA safeguards interphase and mitosis during hematopoiesis in vivo

PubMed Central

Abdul-Sater, Zahi; Cerabona, Donna; Sierra Potchanant, Elizabeth; Sun, Zejin; Enzor, Rikki; He, Ying; Robertson, Kent; Goebel, W. Scott; Nalepa, Grzegorz

2015-01-01

The Fanconi anemia (FA/BRCA) signaling network controls multiple genome-housekeeping checkpoints, from interphase DNA repair to mitosis. The in vivo role of abnormal cell division in FA remains unknown. Here, we quantified the origins of genomic instability in FA patients and mice in vivo and ex vivo. We found that both mitotic errors and interphase DNA damage significantly contribute to genomic instability during FA-deficient hematopoiesis and in non-hematopoietic human and murine FA primary cells. Super-resolution microscopy coupled with functional assays revealed that FANCA shuttles to the pericentriolar material (PCM) to regulate spindle assembly at mitotic entry. Loss of FA signaling rendered cells hypersensitive to spindle chemotherapeutics and allowed escape from the chemotherapy-induced spindle assembly checkpoint. In support of these findings, direct comparison of DNA cross-linking and antimitotic chemotherapeutics in primary FANCA−/− cells revealed genomic instability originating through divergent cell cycle checkpoint aberrations. Our data indicate that FA/BRCA signaling functions as an in vivo gatekeeper of genomic integrity throughout interphase and mitosis, which may have implications for future targeted therapies in FA and FA-deficient cancers. PMID:26366677
Distilled single-cell genome sequencing and de novo assembly for sparse microbial communities.

PubMed

Taghavi, Zeinab; Movahedi, Narjes S; Draghici, Sorin; Chitsaz, Hamidreza

2013-10-01

Identification of every single genome present in a microbial sample is an important and challenging task with crucial applications. It is challenging because there are typically millions of cells in a microbial sample, the vast majority of which elude cultivation. The most accurate method to date is exhaustive single-cell sequencing using multiple displacement amplification, which is simply intractable for a large number of cells. However, there is hope for breaking this barrier, as the number of different cell types with distinct genome sequences is usually much smaller than the number of cells. Here, we present a novel divide and conquer method to sequence and de novo assemble all distinct genomes present in a microbial sample with a sequencing cost and computational complexity proportional to the number of genome types, rather than the number of cells. The method is implemented in a tool called Squeezambler. We evaluated Squeezambler on simulated data. The proposed divide and conquer method successfully reduces the cost of sequencing in comparison with the naïve exhaustive approach. Squeezambler and datasets are available at http://compbio.cs.wayne.edu/software/squeezambler/.
Insights from genomic comparisons of genetically monomorphic bacterial pathogens

PubMed Central

Achtman, Mark

2012-01-01

Some of the most deadly bacterial diseases, including leprosy, anthrax and plague, are caused by bacterial lineages with extremely low levels of genetic diversity, the so-called ‘genetically monomorphic bacteria’. It has only become possible to analyse the population genetics of such bacteria since the recent advent of high-throughput comparative genomics. The genomes of genetically monomorphic lineages contain very few polymorphic sites, which often reflect unambiguous clonal genealogies. Some genetically monomorphic lineages have evolved in the last decades, e.g. antibiotic-resistant Staphylococcus aureus, whereas others have evolved over several millennia, e.g. the cause of plague, Yersinia pestis. Based on recent results, it is now possible to reconstruct the sources and the history of pandemic waves of plague by a combined analysis of phylogeographic signals in Y. pestis plus polymorphisms found in ancient DNA. Different from historical accounts based exclusively on human disease, Y. pestis evolved in China, or the vicinity, and has spread globally on multiple occasions. These routes of transmission can be reconstructed from the genealogy, most precisely for the most recent pandemic that was spread from Hong Kong in multiple independent waves in 1894. PMID:22312053
The genome and phenome of the green alga Chloroidium sp. UTEX 3007 reveal adaptive traits for desert acclimatization

PubMed Central

Nelson, David R; Khraiwesh, Basel; Fu, Weiqi; Alseekh, Saleh; Jaiswal, Ashish; Chaiboonchoe, Amphun; Hazzouri, Khaled M; O’Connor, Matthew J; Butterfoss, Glenn L; Drou, Nizar; Rowe, Jillian D; Harb, Jamil; Fernie, Alisdair R; Gunsalus, Kristin C; Salehi-Ashtiani, Kourosh

2017-01-01

To investigate the phenomic and genomic traits that allow green algae to survive in deserts, we characterized a ubiquitous species, Chloroidium sp. UTEX 3007, which we isolated from multiple locations in the United Arab Emirates (UAE). Metabolomic analyses of Chloroidium sp. UTEX 3007 indicated that the alga accumulates a broad range of carbon sources, including several desiccation tolerance-promoting sugars and unusually large stores of palmitate. Growth assays revealed capacities to grow in salinities from zero to 60 g/L and to grow heterotrophically on >40 distinct carbon sources. Assembly and annotation of genomic reads yielded a 52.5 Mbp genome with 8153 functionally annotated genes. Comparison with other sequenced green algae revealed unique protein families involved in osmotic stress tolerance and saccharide metabolism that support phenomic studies. Our results reveal the robust and flexible biology utilized by a green alga to successfully inhabit a desert coastline. DOI: http://dx.doi.org/10.7554/eLife.25783.001 PMID:28623667
Functional Genomic Landscape of Human Breast Cancer Drivers, Vulnerabilities, and Resistance.

PubMed

Marcotte, Richard; Sayad, Azin; Brown, Kevin R; Sanchez-Garcia, Felix; Reimand, Jüri; Haider, Maliha; Virtanen, Carl; Bradner, James E; Bader, Gary D; Mills, Gordon B; Pe'er, Dana; Moffat, Jason; Neel, Benjamin G

2016-01-14

Large-scale genomic studies have identified multiple somatic aberrations in breast cancer, including copy number alterations and point mutations. Still, identifying causal variants and emergent vulnerabilities that arise as a consequence of genetic alterations remain major challenges. We performed whole-genome small hairpin RNA (shRNA) "dropout screens" on 77 breast cancer cell lines. Using a hierarchical linear regression algorithm to score our screen results and integrate them with accompanying detailed genetic and proteomic information, we identify vulnerabilities in breast cancer, including candidate "drivers," and reveal general functional genomic properties of cancer cells. Comparisons of gene essentiality with drug sensitivity data suggest potential resistance mechanisms, effects of existing anti-cancer drugs, and opportunities for combination therapy. Finally, we demonstrate the utility of this large dataset by identifying BRD4 as a potential target in luminal breast cancer and PIK3CA mutations as a resistance determinant for BET-inhibitors. Copyright © 2016 Elsevier Inc. All rights reserved.
Species Choice for Comparative Genomics: Being Greedy Works

PubMed Central

Pardi, Fabio; Goldman, Nick

2005-01-01

Several projects investigating genetic function and evolution through sequencing and comparison of multiple genomes are now underway. These projects consume many resources, and appropriate planning should be devoted to choosing which species to sequence, potentially involving cooperation among different sequencing centres. A widely discussed criterion for species choice is the maximisation of evolutionary divergence. Our mathematical formalization of this problem surprisingly shows that the best long-term cooperative strategy coincides with the seemingly short-term “greedy” strategy of always choosing the next best single species. Other criteria influencing species choice, such as medical relevance or sequencing costs, can also be accommodated in our approach, suggesting our results' broad relevance in scientific policy decisions. PMID:16327885
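The "greedy" strategy mentioned in the abstract can be illustrated with a small sketch. It assumes evolutionary divergence is measured as the total branch length of the rooted tree spanned by the chosen species (a phylogenetic-diversity style score), and it represents the tree as a hypothetical `child -> (parent, branch_length)` dictionary. Both choices are assumptions made for illustration, not details taken from the paper.

```python
def path_edges(tree, leaf):
    """Edges (named by their child node) on the path from a leaf to the root."""
    edges = []
    node = leaf
    while node in tree:  # the root has no entry in the dictionary
        edges.append(node)
        node = tree[node][0]
    return edges


def greedy_species_choice(tree, leaves, k):
    """Pick k species, each time adding the one that contributes the most
    branch length not already covered by earlier picks (requires k <= len(leaves)).

    Returns (chosen_species_in_order, total_branch_length_covered).
    """
    covered = set()
    chosen = []

    def gain(leaf):
        return sum(tree[e][1] for e in path_edges(tree, leaf) if e not in covered)

    for _ in range(k):
        candidates = [leaf for leaf in leaves if leaf not in chosen]
        best = max(candidates, key=gain)
        chosen.append(best)
        covered.update(path_edges(tree, best))
    return chosen, sum(tree[e][1] for e in covered)


if __name__ == "__main__":
    # Made-up tree: root -> X (1.0), root -> C (5.0), X -> A (2.0), X -> B (3.0)
    tree = {"X": ("root", 1.0), "C": ("root", 5.0),
            "A": ("X", 2.0), "B": ("X", 3.0)}
    print(greedy_species_choice(tree, ["A", "B", "C"], 2))  # (['C', 'B'], 9.0)
```

In this toy tree the first pick is C (gain 5.0) and the second is B (gain 4.0, including the shared branch to X), for a total of 9.0; the abstract's claim is that this step-by-step choice coincides with the best long-term strategy.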
The Drosophila genome nexus: a population genomic resource of 623 Drosophila melanogaster genomes, including 197 from a single ancestral range population.

PubMed

Lack, Justin B; Cardeno, Charis M; Crepeau, Marc W; Taylor, William; Corbett-Detig, Russell B; Stevens, Kristian A; Langley, Charles H; Pool, John E

2015-04-01

Hundreds of wild-derived Drosophila melanogaster genomes have been published, but rigorous comparisons across data sets are precluded by differences in alignment methodology. The most common approach to reference-based genome assembly is a single round of alignment followed by quality filtering and variant detection. We evaluated variations and extensions of this approach and settled on an assembly strategy that utilizes two alignment programs and incorporates both substitutions and short indels to construct an updated reference for a second round of mapping prior to final variant detection. Utilizing this approach, we reassembled published D. melanogaster population genomic data sets and added unpublished genomes from several sub-Saharan populations. Most notably, we present aligned data from phase 3 of the Drosophila Population Genomics Project (DPGP3), which provides 197 genomes from a single ancestral range population of D. melanogaster (from Zambia). The large sample size, high genetic diversity, and potentially simpler demographic history of the DPGP3 sample will make this a highly valuable resource for fundamental population genetic research. The complete set of assemblies described here, termed the Drosophila Genome Nexus, presently comprises 623 consistently aligned genomes and is publicly available in multiple formats with supporting documentation and bioinformatic tools. This resource will greatly facilitate population genomic analysis in this model species by reducing the methodological differences between data sets. Copyright © 2015 by the Genetics Society of America.
ChloroMitoCU: Codon patterns across organelle genomes for functional genomics and evolutionary applications.

PubMed

Sablok, Gaurav; Chen, Ting-Wen; Lee, Chi-Ching; Yang, Chi; Gan, Ruei-Chi; Wegrzyn, Jill L; Porta, Nicola L; Nayak, Kinshuk C; Huang, Po-Jung; Varotto, Claudio; Tang, Petrus

2017-06-01

Organelle genomes are widely thought to have arisen from reduction events involving cyanobacterial and archaeal genomes, in the case of chloroplasts, or α-proteobacterial genomes, in the case of mitochondria. Heterogeneity in base composition and codon preference has long been the subject of investigation of topics ranging from phylogenetic distortion to the design of overexpression cassettes for transgenic expression. From the overexpression point of view, it is critical to systematically analyze the codon usage patterns of the organelle genomes. In light of the importance of codon usage patterns in the development of hyper-expression organelle transgenics, we present ChloroMitoCU, the first-ever curated, web-based reference catalog of the codon usage patterns in organelle genomes. ChloroMitoCU contains the pre-compiled codon usage patterns of 328 chloroplast genomes (29,960 CDS) and 3,502 mitochondrial genomes (49,066 CDS), enabling genome-wide exploration and comparative analysis of codon usage patterns across species. ChloroMitoCU allows the phylogenetic comparison of codon usage patterns across organelle genomes, the prediction of codon usage patterns based on user-submitted transcripts or assembled organelle genes, and comparative analysis with the pre-compiled patterns across species of interest. ChloroMitoCU can increase our understanding of the biased patterns of codon usage in organelle genomes across multiple clades. ChloroMitoCU can be accessed at: http://chloromitocu.cgu.edu.tw/. © The Author 2017. Published by Oxford University Press on behalf of Kazusa DNA Research Institute.
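As a rough illustration of what a codon usage pattern is, the sketch below counts codons in a coding sequence and computes relative synonymous codon usage (RSCU), a common summary in which each codon's count is divided by the mean count of its synonymous family. The abstract does not state which statistics ChloroMitoCU reports, so RSCU is an assumed example. The block also uses the standard genetic code for simplicity, whereas many mitochondrial genomes use alternative codes that would need a different table.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (an assumption for
# illustration; organelle genomes often require a different translation table).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def codon_counts(cds):
    """Count in-frame codons of a coding sequence, skipping ambiguous ones."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    codons = (cds[i:i + 3] for i in range(0, usable, 3))
    return Counter(c for c in codons if c in CODON_TABLE)


def rscu(counts):
    """Relative synonymous codon usage: observed count divided by the mean
    count across all codons encoding the same amino acid (or stop)."""
    families = {}
    for codon, aa in CODON_TABLE.items():
        families.setdefault(aa, []).append(codon)
    values = {}
    for codons in families.values():
        total = sum(counts[c] for c in codons)
        if total == 0:
            continue  # amino acid not observed; RSCU is undefined
        expected = total / len(codons)
        for c in codons:
            values[c] = counts[c] / expected
    return values


if __name__ == "__main__":
    counts = codon_counts("ATGGCTGCTGCC")  # made-up CDS: Met Ala Ala Ala
    print(rscu(counts))
```

An RSCU above 1 marks a codon used more often than expected under uniform synonymous usage, so comparing RSCU vectors between genomes is one simple way to compare codon preference across species.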
gmos: Rapid Detection of Genome Mosaicism over Short Evolutionary Distances

PubMed Central

Domazet-Lošo, Mirjana; Domazet-Lošo, Tomislav

2016-01-01

Prokaryotic and viral genomes are often altered by recombination and horizontal gene transfer. The existing methods for detecting recombination are primarily aimed at viral genomes or sets of loci, since the expensive computation of underlying statistical models often hinders the comparison of complete prokaryotic genomes. As an alternative, alignment-free solutions are more efficient, but cannot map (align) a query to subject genomes. To address this problem, we have developed gmos (Genome MOsaic Structure), a new program that determines the mosaic structure of query genomes when compared to a set of closely related subject genomes. The program first computes local alignments between query and subject genomes and then reconstructs the query mosaic structure by choosing the best local alignment for each query region. To accomplish the analysis quickly, the program mostly relies on pairwise alignments and constructs multiple sequence alignments over short overlapping subject regions only when necessary. This fine-tuned implementation achieves an efficiency comparable to an alignment-free tool. The program performs well for simulated and real data sets of closely related genomes and can be used for fast recombination detection; for instance, when a new prokaryotic pathogen is discovered. As an example, gmos was used to detect genome mosaicism in a pathogenic Enterococcus faecium strain compared to seven closely related genomes. The analysis took less than two minutes on a single 2.1 GHz processor. The output is available in fasta format and can be visualized using an accessory program, gmosDraw (freely available with gmos). PMID:27846272
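The second step described above, reconstructing the query mosaic by choosing the best local alignment for each query region, can be sketched in a simplified form. The version below assumes local alignments have already been computed and are given as hypothetical `(query_start, query_end, subject, score)` tuples with half-open coordinates; it then assigns each query position to the highest-scoring alignment covering it and merges adjacent positions into segments. This is an illustrative simplification, not the gmos algorithm itself, which also builds multiple sequence alignments over overlapping subject regions when necessary.

```python
def mosaic_structure(query_length, alignments):
    """Assign each query position to its best-scoring local alignment.

    alignments: iterable of (query_start, query_end, subject, score),
    with half-open query coordinates [query_start, query_end).
    Returns a list of (start, end, subject) segments covering the query;
    subject is None where no alignment covers the region.
    """
    best = [None] * query_length  # best (score, subject) seen per position
    for q_start, q_end, subject, score in alignments:
        for pos in range(max(q_start, 0), min(q_end, query_length)):
            if best[pos] is None or score > best[pos][0]:
                best[pos] = (score, subject)

    segments = []
    for pos, hit in enumerate(best):
        subject = hit[1] if hit else None
        if segments and segments[-1][2] == subject:
            segments[-1][1] = pos + 1  # extend the current segment
        else:
            segments.append([pos, pos + 1, subject])
    return [tuple(s) for s in segments]


if __name__ == "__main__":
    hits = [(0, 6, "S1", 50), (4, 10, "S2", 80)]  # made-up alignments
    print(mosaic_structure(10, hits))  # [(0, 4, 'S1'), (4, 10, 'S2')]
```

A change of subject between neighbouring segments, as at position 4 in the example, is the kind of breakpoint that would be inspected as a candidate recombination event.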
Genomic comparison of virulent and non-virulent Streptococcus agalactiae in fish.

PubMed

Delannoy, C M J; Zadoks, R N; Crumlish, M; Rodgers, D; Lainson, F A; Ferguson, H W; Turnbull, J; Fontaine, M C

2016-01-01

Streptococcus agalactiae infections in fish are predominantly caused by beta-haemolytic strains of clonal complex (CC) 7, notably its namesake sequence type (ST) 7, or by non-haemolytic strains of CC552, including the globally distributed ST260. In contrast, CC23, including its namesake ST23, has been associated with a wide homeothermic and poikilothermic host range, but never with fish. The aim of this study was to determine whether ST23 is virulent in fish and to identify genomic markers of fish adaptation of S. agalactiae. Intraperitoneal challenge of Nile tilapia, Oreochromis niloticus (Linnaeus), showed that ST260 is lethal at doses down to 10^2 cfu per fish, whereas ST23 does not cause disease at 10^7 cfu per fish. Comparison of the genome sequence of ST260 and ST23 with those of strains derived from fish, cattle and humans revealed the presence of genomic elements that are unique to subpopulations of S. agalactiae that have the ability to infect fish (CC7 and CC552). These loci occurred in clusters exhibiting typical signatures of mobile genetic elements. PCR-based screening of a collection of isolates from multiple host species confirmed the association of selected genes with fish-derived strains. Several fish-associated genes encode proteins that potentially provide fitness in the aquatic environment. © 2014 John Wiley & Sons Ltd.
Sequence analysis of the complete genome of Trichoplusia ni single nucleopolyhedrovirus and the identification of a baculoviral photolyase gene

DOE Office of Scientific and Technical Information (OSTI.GOV)

Willis, Leslie G.; Siepp, Robyn; Stewart, Taryn M.

2005-08-01

The genome of the Trichoplusia ni single nucleopolyhedrovirus (TnSNPV), a group II NPV which infects the cabbage looper (T. ni), has been completely sequenced and analyzed. The TnSNPV DNA genome consists of 134,394 bp and has an overall G + C content of 39%. Gene analysis predicted 144 open reading frames (ORFs) of 150 nucleotides or greater that showed minimal overlap. Comparisons with previously sequenced baculoviruses indicate that 119 TnSNPV ORFs were homologues of previously reported viral gene sequences. Ninety-four TnSNPV ORFs returned an Autographa californica multiple NPV (AcMNPV) homologue while 25 ORFs returned poor or no sequence matches with the current databases. A putative photolyase gene was also identified that had highest amino acid identity to the photolyase genes of Chrysodeixis chalcites NPV (ChchNPV) (47%) and Danio rerio (zebrafish) (40%). In addition, unlike all other baculoviruses, no obvious homologous repeat (hr) sequences were identified. Comparison of the TnSNPV and AcMNPV genomes provides a unique opportunity to examine two baculoviruses that are highly virulent for a common insect host (T. ni) yet belong to diverse baculovirus taxonomic groups and possess distinct biological features. In vitro fusion assays demonstrated that the TnSNPV F protein induces membrane fusion and syncytia formation and were compared to syncytia formed by AcMNPV GP64.
Standard operating procedure for calculating genome-to-genome distances based on high-scoring segment pairs.

PubMed

Auch, Alexander F; Klenk, Hans-Peter; Göker, Markus

2010-01-28

DNA-DNA hybridization (DDH) is a widely applied wet-lab technique to obtain an estimate of the overall similarity between the genomes of two organisms. Microbiologists chose to base the species concept for prokaryotes ultimately on DDH as a pragmatic approach to deciding on the recognition of novel species; this choice also allowed a relatively high degree of standardization compared to other areas of taxonomy. However, DDH is tedious and error-prone and, first and foremost, cannot be used to incrementally establish a comparative database. Recent studies have shown that in-silico methods for the comparison of genome sequences can be used to replace DDH. Considering the ongoing rapid technological progress of sequencing methods, genome-based prokaryote taxonomy is coming into reach. However, calculating distances between genomes is dependent on multiple choices for software and program settings. We here provide an overview of the modifications that can be applied to distance methods based on high-scoring segment pairs (HSPs) or maximally unique matches (MUMs) and that need to be documented. General recommendations on determining HSPs using BLAST or other algorithms are also provided. As a reference implementation, we introduce the GGDC web server (http://ggdc.gbdp.org).
Seshat: A Web service for accurate annotation, validation, and analysis of TP53 variants generated by conventional and next-generation sequencing.

PubMed

Tikkanen, Tuomas; Leroy, Bernard; Fournier, Jean Louis; Risques, Rosa Ana; Malcikova, Jitka; Soussi, Thierry

2018-07-01

Accurate annotation of genomic variants in human diseases is essential to allow personalized medicine. Assessment of somatic and germline TP53 alterations has now reached the clinic and is required in several circumstances such as the identification of the most effective cancer therapy for patients with chronic lymphocytic leukemia (CLL). Here, we present Seshat, a Web service for annotating TP53 information derived from sequencing data. A flexible framework allows the use of standard file formats such as Mutation Annotation Format (MAF) or Variant Call Format (VCF), as well as common TXT files. Seshat performs accurate variant annotations using the Human Genome Variation Society (HGVS) nomenclature and the stable TP53 genomic reference provided by the Locus Reference Genomic (LRG). In addition, using the 2017 release of the UMD_TP53 database, Seshat provides multiple statistical information for each TP53 variant including database frequency, functional activity, or pathogenicity. The information is delivered in standardized output tables that minimize errors and facilitate comparison of mutational data across studies. Seshat is a beneficial tool to interpret the ever-growing TP53 sequencing data generated by multiple sequencing platforms and it is freely available via the TP53 Website, http://p53.fr or directly at http://vps338341.ovh.net/. © 2018 Wiley Periodicals, Inc.
Genome resolved analysis of a premature infant gut microbial community reveals a Varibaculum cambriense genome and a shift towards fermentation-based metabolism during the third week of life.

PubMed

Brown, Christopher T; Sharon, Itai; Thomas, Brian C; Castelle, Cindy J; Morowitz, Michael J; Banfield, Jillian F

2013-12-17

The premature infant gut has low individual but high inter-individual microbial diversity compared with adults. Based on prior 16S rRNA gene surveys, many species from this environment are expected to be similar to those previously detected in the human microbiota. However, the level of genomic novelty and metabolic variation of strains found in the infant gut remains relatively unexplored. To study the stability and function of early microbial colonizers of the premature infant gut, nine stool samples were taken during the third week of life of a premature male infant delivered via Caesarean section. Metagenomic sequences were assembled and binned into near-complete and partial genomes, enabling strain-level genomic analysis of the microbial community. We reconstructed eleven near-complete and six partial bacterial genomes representative of the key members of the microbial community. Twelve of these genomes share >90% putative ortholog amino acid identity with reference genomes. Manual curation of the assembly of one particularly novel genome resulted in the first essentially complete genome sequence (in three pieces, the order of which could not be determined due to a repeat) for Varibaculum cambriense (strain Dora), a medically relevant species that has been implicated in abscess formation. During the period studied, the microbial community undergoes a compositional shift, in which obligate anaerobes (fermenters) overtake Escherichia coli as the most abundant species. Other species remain stable, probably due to their ability to either respire anaerobically or grow by fermentation, and their capacity to tolerate fluctuating levels of oxygen. Metabolic predictions for V. cambriense suggest that, like other members of the microbial community, this organism is able to process various sugar substrates and make use of multiple different electron acceptors during anaerobic respiration.
Genome comparisons within the family Actinomycetaceae reveal important differences related to respiratory metabolism and motility. Genome-based analysis provided direct insight into strain-specific potential for anaerobic respiration and yielded the first genome for the genus Varibaculum. Importantly, comparison of these de novo assembled genomes with closely related isolate genomes supported the accuracy of the metagenomic methodology. Over a one-week period, the early gut microbial community transitioned to a community with a higher representation of obligate anaerobes, emphasizing both taxonomic and metabolic instability during colonization.
Genome resolved analysis of a premature infant gut microbial community reveals a Varibaculum cambriense genome and a shift towards fermentation-based metabolism during the third week of life

PubMed Central

2013-01-01

Background The premature infant gut has low individual but high inter-individual microbial diversity compared with adults. Based on prior 16S rRNA gene surveys, many species from this environment are expected to be similar to those previously detected in the human microbiota. However, the level of genomic novelty and metabolic variation of strains found in the infant gut remains relatively unexplored. Results To study the stability and function of early microbial colonizers of the premature infant gut, nine stool samples were taken during the third week of life of a premature male infant delivered via Caesarean section. Metagenomic sequences were assembled and binned into near-complete and partial genomes, enabling strain-level genomic analysis of the microbial community. We reconstructed eleven near-complete and six partial bacterial genomes representative of the key members of the microbial community. Twelve of these genomes share >90% putative ortholog amino acid identity with reference genomes. Manual curation of the assembly of one particularly novel genome resulted in the first essentially complete genome sequence (in three pieces, the order of which could not be determined due to a repeat) for Varibaculum cambriense (strain Dora), a medically relevant species that has been implicated in abscess formation. During the period studied, the microbial community undergoes a compositional shift, in which obligate anaerobes (fermenters) overtake Escherichia coli as the most abundant species. Other species remain stable, probably due to their ability to either respire anaerobically or grow by fermentation, and their capacity to tolerate fluctuating levels of oxygen. Metabolic predictions for V. cambriense suggest that, like other members of the microbial community, this organism is able to process various sugar substrates and make use of multiple different electron acceptors during anaerobic respiration. 
Genome comparisons within the family Actinomycetaceae reveal important differences related to respiratory metabolism and motility. Conclusions Genome-based analysis provided direct insight into strain-specific potential for anaerobic respiration and yielded the first genome for the genus Varibaculum. Importantly, comparison of these de novo assembled genomes with closely related isolate genomes supported the accuracy of the metagenomic methodology. Over a one-week period, the early gut microbial community transitioned to a community with a higher representation of obligate anaerobes, emphasizing both taxonomic and metabolic instability during colonization. PMID:24451181
MetaQUAST: evaluation of metagenome assemblies.

PubMed

Mikheenko, Alla; Saveliev, Vladislav; Gurevich, Alexey

2016-04-01

During the past years we have witnessed the rapid development of new metagenome assembly methods. Although there are many benchmark utilities designed for single-genome assemblies, there is no well-recognized evaluation and comparison tool for metagenomic-specific analogues. In this article, we present MetaQUAST, a modification of QUAST, the state-of-the-art tool for genome assembly evaluation based on alignment of contigs to a reference. MetaQUAST addresses such features of metagenome datasets as (i) unknown species content by detecting and downloading reference sequences, (ii) huge diversity by giving comprehensive reports for multiple genomes and (iii) presence of closely related species by detecting chimeric contigs. We demonstrate MetaQUAST performance by comparing several leading assemblers on one simulated and two real datasets. Availability: http://bioinf.spbau.ru/metaquast. Contact: aleksey.gurevich@spbu.ru. Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
Comparison of Next-Generation Sequencing Systems

PubMed Central

Liu, Lin; Li, Yinhu; Li, Siliang; Hu, Ni; He, Yimin; Pong, Ray; Lin, Danni; Lu, Lihua; Law, Maggie

2012-01-01

With the fast development and wide application of next-generation sequencing (NGS) technologies, genomic sequence information is within reach to help decode life mysteries, make better crops, detect pathogens, and improve quality of life. NGS systems are typically represented by SOLiD/Ion Torrent PGM from Life Sciences, Genome Analyzer/HiSeq 2000/MiSeq from Illumina, and GS FLX Titanium/GS Junior from Roche. Beijing Genomics Institute (BGI), which possesses the world's biggest sequencing capacity, has multiple NGS systems including 137 HiSeq 2000, 27 SOLiD, one Ion Torrent PGM, one MiSeq, and one 454 sequencer. We have accumulated extensive experience in sample handling, sequencing, and bioinformatics analysis. In this paper, technologies of these systems are reviewed, and first-hand data from extensive experience is summarized and analyzed to discuss the advantages and specifics associated with each sequencing system. Finally, applications of NGS are summarized. PMID:22829749

A genetic linkage map for the apicomplexan protozoan parasite Eimeria maxima and comparison with Eimeria tenella.

PubMed

Blake, Damer P; Oakes, Richard; Smith, Adrian L

2011-02-01

Eimeria maxima is one of the seven Eimeria spp. that infect the chicken and cause the disease coccidiosis. The well characterised immunogenicity and genetic diversity associated with E. maxima promote its use in genetics-led studies on avian coccidiosis. The development of a genetic map for E. maxima, presented here based upon 647 amplified fragment length polymorphism markers typed from 22 clonal hybrid lines and assembled into 13 major linkage groups, is a major new resource for work with this parasite. Comparison with genetic maps produced for other coccidial parasites indicates relatively high levels of genetic recombination. Conversion of ∼14% of the markers representing the major linkage groups to sequence characterised amplified region markers can provide a scaffold for the assembly of future genomic sequences as well as providing a foundation for more detailed genetic maps. Comparison with the Eimeria tenella genetic map produced 10 years ago has revealed a less biased marker distribution, with no more than nine markers mapped within any unresolved heritable unit. Nonetheless, preliminary bioinformatic characterisation of the three largest publicly available genomic E. maxima sequences suggests that the feature-poor/feature-rich structure which has previously been found to define the first sequenced E. tenella chromosome also defines the E. maxima genome. The significance of such a segmented genome and the apparent potential for variation in genetic recombination will be relevant to haplotype stability and the longevity of future anticoccidial strategies based upon multiple loci targeted by novel chemotherapeutic drugs or recombinant subunit vaccines. Copyright © 2010 Australian Society for Parasitology Inc. Published by Elsevier Ltd. All rights reserved.
Comparison of Various Nuclear Localization Signal-Fused Cas9 Proteins and Cas9 mRNA for Genome Editing in Zebrafish

PubMed Central

Hu, Peinan; Zhao, Xueying; Zhang, Qinghua; Li, Weiming; Zu, Yao

2018-01-01

The clustered regularly interspaced short palindromic repeats (CRISPR)/Cas9 system has been proven to be an efficient and precise genome editing technology in various organisms. However, the gene editing efficiencies of Cas9 proteins with a nuclear localization signal (NLS) fused to different termini and Cas9 mRNA have not been systematically compared. Here, we compared the ability of Cas9 proteins with NLS fused to the N-, C-, or both the N- and C-termini and N-NLS-Cas9-NLS-C mRNA to target two sites in the tyr gene and two sites in the gol gene related to pigmentation in zebrafish. Phenotypic analysis revealed that all types of Cas9 led to hypopigmentation in similar proportions of injected embryos. Genome analysis by T7 Endonuclease I (T7E1) assays demonstrated that all types of Cas9 similarly induced mutagenesis in four target sites. Sequencing results further confirmed that a high frequency of indels occurred in the target sites (tyr1 > 66%, tyr2 > 73%, gol1 > 50%, and gol2 > 35%), as well as various types (more than six) of indel mutations observed in all four types of Cas9-injected embryos. Furthermore, all types of Cas9 showed efficient targeted mutagenesis on multiplex genome editing, resulting in multiple phenotypes simultaneously. Collectively, we conclude that various NLS-fused Cas9 proteins and Cas9 mRNAs have similar genome editing efficiencies on targeting single or multiple genes, suggesting that the efficiency of CRISPR/Cas9 genome editing is highly dependent on guide RNAs (gRNAs) and gene loci. These findings may help to simplify the selection of Cas9 for gene editing using the CRISPR/Cas9 system. PMID:29295818
As Clear as Mud? Determining the Diversity and Prevalence of Prophages in the Draft Genomes of Estuarine Isolates of Clostridium difficile.

PubMed

Hargreaves, Katherine R; Otieno, James R; Thanki, Anisha; Blades, Matthew J; Millard, Andrew D; Browne, Hilary P; Lawley, Trevor D; Clokie, Martha R J

2015-05-27

The bacterium Clostridium difficile is a significant cause of nosocomial infections worldwide. The pathogenic success of this organism can be attributed to its flexible genome which is characterized by the exchange of mobile genetic elements, and by ongoing genome evolution. Despite its pathogenic status, C. difficile can also be carried asymptomatically, and has been isolated from natural environments such as water and sediments where multiple strain types (ribotypes) are found in close proximity. These include ribotypes which are associated with disease, as well as those that are less commonly isolated from patients. Little is known about the genomic content of strains in such reservoirs in the natural environment. In this study, draft genomes have been generated for 13 C. difficile isolates from estuarine sediments including clinically relevant and environmental associated types. To identify the genetic diversity within this strain collection, whole-genome comparisons were performed using the assemblies. The strains are highly genetically diverse with regards to the C. difficile "mobilome," which includes transposons and prophage elements. We identified a novel transposon-like element in two R078 isolates. Multiple, related and unrelated, prophages were detected in isolates across ribotype groups, including two novel prophage elements and those related to the transducing phage φC2. The susceptibility of these isolates to lytic phage infection was tested using a panel of characterized phages found from the same locality. In conclusion, estuarine sediments are a source of genetically diverse C. difficile strains with a complex network of prophages, which could contribute to the emergence of new strains in clinics. © The Author(s) 2015. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
As Clear as Mud? Determining the Diversity and Prevalence of Prophages in the Draft Genomes of Estuarine Isolates of Clostridium difficile

PubMed Central

Hargreaves, Katherine R.; Otieno, James R.; Thanki, Anisha; Blades, Matthew J.; Millard, Andrew D.; Browne, Hilary P.; Lawley, Trevor D.; Clokie, Martha R.J.

2015-01-01

The bacterium Clostridium difficile is a significant cause of nosocomial infections worldwide. The pathogenic success of this organism can be attributed to its flexible genome which is characterized by the exchange of mobile genetic elements, and by ongoing genome evolution. Despite its pathogenic status, C. difficile can also be carried asymptomatically, and has been isolated from natural environments such as water and sediments where multiple strain types (ribotypes) are found in close proximity. These include ribotypes which are associated with disease, as well as those that are less commonly isolated from patients. Little is known about the genomic content of strains in such reservoirs in the natural environment. In this study, draft genomes have been generated for 13 C. difficile isolates from estuarine sediments including clinically relevant and environmental associated types. To identify the genetic diversity within this strain collection, whole-genome comparisons were performed using the assemblies. The strains are highly genetically diverse with regards to the C. difficile “mobilome,” which includes transposons and prophage elements. We identified a novel transposon-like element in two R078 isolates. Multiple, related and unrelated, prophages were detected in isolates across ribotype groups, including two novel prophage elements and those related to the transducing phage φC2. The susceptibility of these isolates to lytic phage infection was tested using a panel of characterized phages found from the same locality. In conclusion, estuarine sediments are a source of genetically diverse C. difficile strains with a complex network of prophages, which could contribute to the emergence of new strains in clinics. PMID:26019165
Multiple genome alignment for identifying the core structure among moderately related microbial genomes.

PubMed

Uchiyama, Ikuo

2008-10-31

Identifying the set of intrinsically conserved genes, or the genomic core, among related genomes is crucial for understanding prokaryotic genomes where horizontal gene transfers are common. Although core genome identification appears to be obvious among very closely related genomes, it becomes more difficult when more distantly related genomes are compared. Here, we consider the core structure as a set of sufficiently long segments in which gene orders are conserved so that they are likely to have been inherited mainly through vertical transfer, and developed a method for identifying the core structure by finding the order of pre-identified orthologous groups (OGs) that maximally retains the conserved gene orders. The method was applied to genome comparisons of two well-characterized families, Bacillaceae and Enterobacteriaceae, and identified their core structures comprising 1438 and 2125 OGs, respectively. The core sets contained most of the essential genes and their related genes, which were primarily included in the intersection of the two core sets comprising around 700 OGs. The definition of the genomic core based on gene order conservation was demonstrated to be more robust than the simpler approach based only on gene conservation. We also investigated the core structures in terms of G+C content homogeneity and phylogenetic congruence, and found that the core genes primarily exhibited the expected characteristic, i.e., being indigenous and sharing the same history, more than the non-core genes. The results demonstrate that our strategy of genome alignment based on gene order conservation can provide an effective approach to identify the genomic core among moderately related microbial genomes.
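The contrast drawn above between a core defined by gene conservation alone and one that also requires conserved gene order can be sketched as follows. This is an illustrative simplification, not the author's alignment method: it assumes each genome is given as an ordered list of orthologous group (OG) identifiers, and it approximates "conserved gene order" by adjacent OG pairs shared by every genome. The function names are hypothetical.

```python
def core_ortholog_groups(genomes):
    """Simpler approach: OGs present in every genome (gene conservation only).

    genomes maps a genome name to an ordered list of OG identifiers.
    """
    group_sets = [set(order) for order in genomes.values()]
    if not group_sets:
        return set()
    return set.intersection(*group_sets)


def adjacencies(order):
    """Unordered pairs of distinct OGs that are neighbours in one genome."""
    return {frozenset(pair) for pair in zip(order, order[1:]) if pair[0] != pair[1]}


def conserved_adjacencies(genomes):
    """Order-aware approximation: neighbouring OG pairs found in every genome."""
    adjacency_sets = [adjacencies(list(order)) for order in genomes.values()]
    if not adjacency_sets:
        return set()
    return set.intersection(*adjacency_sets)


if __name__ == "__main__":
    toy = {
        "g1": ["A", "B", "C", "D"],
        "g2": ["B", "A", "C", "E"],
        "g3": ["C", "A", "B", "D"],
    }
    print(sorted(core_ortholog_groups(toy)))
    print([sorted(pair) for pair in conserved_adjacencies(toy)])
```

In the toy data, three OGs are conserved in all genomes but only one adjacency survives, which illustrates why an order-based definition is stricter than a presence-based one.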
Legume genome evolution viewed through the Medicago truncatula and Lotus japonicus genomes

PubMed Central

Cannon, Steven B.; Sterck, Lieven; Rombauts, Stephane; Sato, Shusei; Cheung, Foo; Gouzy, Jérôme; Wang, Xiaohong; Mudge, Joann; Vasdewani, Jayprakash; Schiex, Thomas; Spannagl, Manuel; Monaghan, Erin; Nicholson, Christine; Humphray, Sean J.; Schoof, Heiko; Mayer, Klaus F. X.; Rogers, Jane; Quétier, Francis; Oldroyd, Giles E.; Debellé, Frédéric; Cook, Douglas R.; Retzel, Ernest F.; Roe, Bruce A.; Town, Christopher D.; Tabata, Satoshi; Van de Peer, Yves; Young, Nevin D.

2006-01-01

Genome sequencing of the model legumes, Medicago truncatula and Lotus japonicus, provides an opportunity for large-scale sequence-based comparison of two genomes in the same plant family. Here we report synteny comparisons between these species, including details about chromosome relationships, large-scale synteny blocks, microsynteny within blocks, and genome regions lacking clear correspondence. The Lotus and Medicago genomes share a minimum of 10 large-scale synteny blocks, each with substantial collinearity and frequently extending the length of whole chromosome arms. The proportion of genes syntenic and collinear within each synteny block is relatively homogeneous. Medicago–Lotus comparisons also indicate similar and largely homogeneous gene densities, although gene-containing regions in Mt occupy 20–30% more space than Lj counterparts, primarily because of larger numbers of Mt retrotransposons. Because the interpretation of genome comparisons is complicated by large-scale genome duplications, we describe synteny, synonymous substitutions and phylogenetic analyses to identify and date a probable whole-genome duplication event. There is no direct evidence for any recent large-scale genome duplication in either Medicago or Lotus but instead a duplication predating speciation. Phylogenetic comparisons place this duplication within the Rosid I clade, clearly after the split between legumes and Salicaceae (poplar). PMID:17003129
Genome alignment with graph data structures: a comparison

PubMed Central

2014-01-01

Background Recent advances in rapid, low-cost sequencing have opened up the opportunity to study complete genome sequences. The computational approach of multiple genome alignment allows investigation of evolutionarily related genomes in an integrated fashion, providing a basis for downstream analyses such as rearrangement studies and phylogenetic inference. Graphs have proven to be a powerful tool for coping with the complexity of genome-scale sequence alignments. The potential of graphs to intuitively represent all aspects of genome alignments led to the development of graph-based approaches for genome alignment. These approaches construct a graph from a set of local alignments, and derive a genome alignment through identification and removal of graph substructures that indicate errors in the alignment. Results We compare the structures of commonly used graphs in terms of their abilities to represent alignment information. We describe how the graphs can be transformed into each other, and identify and classify graph substructures common to one or more graphs. Based on previous approaches, we compile a list of modifications that remove these substructures. Conclusion We show that crucial pieces of alignment information, associated with inversions and duplications, are not visible in the structure of all graphs. If we neglect vertex or edge labels, the graphs differ in their information content. Still, many ideas are shared among all graph-based approaches. Based on these findings, we outline a conceptual framework for graph-based genome alignment that can assist in the development of future genome alignment tools. PMID:24712884
Comparison of phasing strategies for whole human genomes

PubMed Central

Kirkness, Ewen; Schork, Nicholas J.

2018-01-01

Humans are a diploid species that inherit one set of chromosomes paternally and one homologous set of chromosomes maternally. Unfortunately, most human sequencing initiatives ignore this fact in that they do not directly delineate the nucleotide content of the maternal and paternal copies of the 23 chromosomes individuals possess (i.e., they do not ‘phase’ the genome) often because of the costs and complexities of doing so. We compared 11 different widely-used approaches to phasing human genomes using the publicly available ‘Genome-In-A-Bottle’ (GIAB) phased version of the NA12878 genome as a gold standard. The phasing strategies we compared included laboratory-based assays that prepare DNA in unique ways to facilitate phasing as well as purely computational approaches that seek to reconstruct phase information from general sequencing reads and constructs or population-level haplotype frequency information obtained through a reference panel of haplotypes. To assess the performance of the 11 approaches, we used metrics that included, among others, switch error rates, haplotype block lengths, the proportion of fully phase-resolved genes, phasing accuracy and yield between pairs of SNVs. Our comparisons suggest that a hybrid or combined approach that leverages: 1. population-based phasing using the SHAPEIT software suite, 2. either genome-wide sequencing read data or parental genotypes, and 3. a large reference panel of variant and haplotype frequencies, provides a fast and efficient way to produce highly accurate phase-resolved individual human genomes. We found that for population-based approaches, phasing performance is enhanced with the addition of genome-wide read data; e.g., whole genome shotgun and/or RNA sequencing reads. Further, we found that the inclusion of parental genotype data within a population-based phasing strategy can provide as much as a ten-fold reduction in phasing errors. 
We also considered a majority voting scheme for the construction of a consensus haplotype combining multiple predictions for enhanced performance and site coverage. Finally, we also identified DNA sequence signatures associated with the genomic regions harboring phasing switch errors, which included regions of low polymorphism or SNV density. PMID:29621242
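One of the metrics named above, the switch error rate, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the evaluation code used in the study: each haplotype is represented as a string of 0/1 alleles at the same ordered heterozygous sites (the second haplotype being its complement), and the rate is defined as the number of adjacent site pairs whose relative phase disagrees with the gold standard, divided by the number of adjacent pairs. The function name is hypothetical.

```python
def switch_error_rate(truth, predicted):
    """Fraction of adjacent heterozygous site pairs phased inconsistently.

    truth and predicted are equal-length strings of '0'/'1' giving the allele
    carried by one haplotype at each heterozygous site. Comparing relative
    phase between neighbours makes the result independent of which parental
    haplotype is listed first.
    """
    if len(truth) != len(predicted):
        raise ValueError("haplotypes must cover the same sites")
    if len(truth) < 2:
        return 0.0
    switches = 0
    for i in range(1, len(truth)):
        truth_same = truth[i] == truth[i - 1]
        predicted_same = predicted[i] == predicted[i - 1]
        if truth_same != predicted_same:
            switches += 1
    return switches / (len(truth) - 1)


if __name__ == "__main__":
    # The prediction matches the truth for three sites, then flips phase once.
    print(switch_error_rate("010011", "010100"))
```

A fully complemented prediction (for example "1010" against "0101") scores zero, since swapping the labels of the two haplotypes is not a phasing error.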
Finding approximate gene clusters with Gecko 3.

PubMed

Winter, Sascha; Jahn, Katharina; Wehner, Stefanie; Kuchenbecker, Leon; Marz, Manja; Stoye, Jens; Böcker, Sebastian

2016-11-16

Gene-order-based comparison of multiple genomes provides signals for functional analysis of genes and the evolutionary process of genome organization. Gene clusters are regions of co-localized genes on genomes of different species. The rapid increase in sequenced genomes necessitates bioinformatics tools for finding gene clusters in hundreds of genomes. Existing tools are often restricted to few (in many cases, only two) genomes, and often make restrictive assumptions such as short perfect conservation, conserved gene order or monophyletic gene clusters. We present Gecko 3, an open-source software for finding gene clusters in hundreds of bacterial genomes, that comes with an easy-to-use graphical user interface. The underlying gene cluster model is intuitive, can cope with low degrees of conservation as well as misannotations and is complemented by a sound statistical evaluation. To evaluate the biological benefit of Gecko 3 and to exemplify our method, we search for gene clusters in a dataset of 678 bacterial genomes using Synechocystis sp. PCC 6803 as a reference. We confirm detected gene clusters reviewing the literature and comparing them to a database of operons; we detect two novel clusters, which were confirmed by publicly available experimental RNA-Seq data. The computational analysis is carried out on a laptop computer in <40 min. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.
Genome multiplication as adaptation to tissue survival: evidence from gene expression in mammalian heart and liver.

PubMed

Anatskaya, Olga V; Vinogradov, Alexander E

2007-01-01

To elucidate the functional significance of genome multiplication in somatic tissues, we performed a large-scale analysis of ploidy-associated changes in expression of non-tissue-specific (i.e., broadly expressed) genes in the heart and liver of human and mouse (6585 homologous genes were analyzed). These species have inverse patterns of polyploidization in cardiomyocytes and hepatocytes. The between-species comparison of two pairs of homologous tissues with crisscross contrast in ploidy levels allows the removal of the effects of species and tissue specificity on the profile of gene activity. The different tests performed from the standpoint of modular biology revealed a consistent picture of ploidy-associated alteration in a wide range of functional gene groups. The major effects consisted of hypoxia-inducible factor-triggered changes in main cellular processes and signaling pathways, activation of defense against DNA lesions, acceleration of protein turnover and transcription, and the impairment of apoptosis, the immune response, and cytoskeleton maintenance. We also found a severe decline in aerobic respiration and stimulation of sugar and fatty acid metabolism. These metabolic rearrangements create a special type of metabolism that can be considered intermediate between aerobic and anaerobic. The metabolic and physiological changes revealed (reflected in the alteration of gene expression) help explain the unique ability of polyploid tissues to combine proliferation and differentiation, which are separated in diploid tissues. We argue that genome multiplication promotes cell survival and tissue regeneration under stressful conditions.
Comparative Genomics Reveals the Core Gene Toolbox for the Fungus-Insect Symbiosis

PubMed Central

Stata, Matt; Wang, Wei; White, Merlin M.; Moncalvo, Jean-Marc

2018-01-01

ABSTRACT Modern genomics has shed light on many entomopathogenic fungi and expanded our knowledge widely; however, little is known about the genomic features of the insect-commensal fungi. Harpellales are obligate commensals living in the digestive tracts of disease-bearing insects (black flies, midges, and mosquitoes). In this study, we produced and annotated whole-genome sequences of nine Harpellales taxa and conducted the first comparative analyses to infer the genomic diversity within the members of the Harpellales. The genomes of the insect gut fungi feature low (26% to 37%) GC content and large genome size variations (25 to 102 Mb). Further comparisons with insect-pathogenic fungi (from both Ascomycota and Zoopagomycota), as well as with free-living relatives (as negative controls), helped to identify a gene toolbox that is essential to the fungus-insect symbiosis. The results not only narrow the genomic scope of fungus-insect interactions from several thousands to eight core players but also distinguish host invasion strategies employed by insect pathogens and commensals. The genomic content suggests that insect commensal fungi rely mostly on adhesion protein anchors that target digestive system, while entomopathogenic fungi have higher numbers of transmembrane helices, signal peptides, and pathogen-host interaction (PHI) genes across the whole genome and enrich genes as well as functional domains to inactivate the host inflammation system and suppress the host defense. Phylogenomic analyses have revealed that genome sizes of Harpellales fungi vary among lineages with an integer-multiple pattern, which implies that ancient genome duplications may have occurred within the gut of insects. PMID:29764946
One Bacterial Cell, One Complete Genome

DOE Office of Scientific and Technical Information (OSTI.GOV)

Woyke, Tanja; Tighe, Damon; Mavrommatis, Konstantinos

2010-04-26

While the bulk of the finished microbial genomes sequenced to date are derived from cultured bacterial and archaeal representatives, the vast majority of microorganisms elude current culturing attempts, severely limiting the ability to recover complete or even partial genomes from these environmental species. Single cell genomics is a novel culture-independent approach, which enables access to the genetic material of an individual cell. No single cell genome has to our knowledge been closed and finished to date. Here we report the completed genome from an uncultured single cell of Candidatus Sulcia muelleri DMIN. Digital PCR on single symbiont cells isolated from the bacteriome of the green sharpshooter Draeculacephala minerva allowed us to assess that this bacterium is polyploid with genome copies ranging from approximately 200–900 per cell, making it a most suitable target for single cell finishing efforts. For single cell shotgun sequencing, an individual Sulcia cell was isolated and whole genome amplified by multiple displacement amplification (MDA). Sanger-based finishing methods allowed us to close the genome. To verify the correctness of our single cell genome and exclude MDA-derived artifacts, we independently shotgun sequenced and assembled the Sulcia genome from pooled bacteriomes using a metagenomic approach, yielding a nearly identical genome. Four variations we detected appear to be genuine biological differences between the two samples. Comparison of the single cell genome with bacteriome metagenomic sequence data detected two single nucleotide polymorphisms (SNPs), indicating extremely low genetic diversity within a Sulcia population. This study demonstrates the power of single cell genomics to generate a complete, high quality, non-composite reference genome within an environmental sample, which can be used for population genetic analyses.
Reticulate classification of mosaic microbial genomes using NeAT website.

PubMed

Lima-Mendez, Gipsi

2012-01-01

The tree of life is the classical representation of the evolutionary relationships between existent species. A tree is appropriate to display the divergence of species through mutation, i.e., by vertical descent. However, lateral gene transfer (LGT) is excluded from such representations. When LGT contribution to genome evolution cannot be neglected (e.g., for prokaryotes and mobile genetic elements), the tree becomes misleading. Networks appear as an intuitive way to represent both vertical and horizontal relationships, while overlapping groups within such graphs are more suitable for their classification. Here, we describe a method to represent both vertical and horizontal relationships. We start with a set of genomes whose coded proteins have been grouped into families based on sequence similarity. Next, all pairs of genomes are compared, counting the number of proteins classified into the same family. From this comparison, we derive a weighted graph where genomes with a significant number of similar proteins are linked. Finally, we apply a two-step clustering of this graph to produce a classification where nodes can be assigned to multiple clusters. The procedure can be performed using the Network Analysis Tools (NeAT) website.
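The graph-building step described in this abstract can be illustrated with a minimal Python sketch. It is not the NeAT implementation: the function name `shared_family_graph`, the toy genome names, and the fixed `min_shared` cut-off (standing in for the abstract's "significant number of similar proteins") are all assumptions made for illustration, the sketch counts shared families rather than individual proteins, and the two-step clustering is not shown.

```python
from itertools import combinations


def shared_family_graph(genome_families, min_shared=2):
    """Build a weighted genome graph from protein-family membership.

    genome_families maps a genome name to the set of protein families
    found in it. An edge (a, b) is kept when the two genomes share at
    least `min_shared` families; its weight is the number shared.
    The fixed threshold is a simplification of a significance test.
    """
    edges = {}
    for a, b in combinations(sorted(genome_families), 2):
        shared = len(genome_families[a] & genome_families[b])
        if shared >= min_shared:
            edges[(a, b)] = shared
    return edges


# Hypothetical toy data: three genomes and their protein families.
genomes = {
    "phageA": {"F1", "F2", "F3"},
    "phageB": {"F2", "F3", "F4"},
    "phageC": {"F4", "F5"},
}
graph = shared_family_graph(genomes)
print(graph)
```

With the default threshold only phageA and phageB are linked. Lowering `min_shared` to 1 also links phageB and phageC, which shows how the cut-off shapes the graph passed on to clustering.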
5C-ID: Increased resolution Chromosome-Conformation-Capture-Carbon-Copy with in situ 3C and double alternating primer design.

PubMed

Kim, Ji Hun; Titus, Katelyn R; Gong, Wanfeng; Beagan, Jonathan A; Cao, Zhendong; Phillips-Cremins, Jennifer E

2018-05-14

Mammalian genomes are folded in a hierarchy of compartments, topologically associating domains (TADs), subTADs, and looping interactions. Currently, there is a great need to evaluate the link between chromatin topology and genome function across many biological conditions and genetic perturbations. Hi-C can generate genome-wide maps of looping interactions but is intractable for high-throughput comparison of loops across multiple conditions due to the enormous number of reads (>6 Billion) required per library. Here, we describe 5C-ID, a new version of Chromosome-Conformation-Capture-Carbon-Copy (5C) with restriction digest and ligation performed in the nucleus (in situ Chromosome-Conformation-Capture (3C)) and ligation-mediated amplification performed with a double alternating primer design. We demonstrate that 5C-ID produces higher-resolution 3D genome folding maps with reduced spatial noise using markedly lower cell numbers than canonical 5C. 5C-ID enables the creation of high-resolution, high-coverage maps of chromatin loops in up to a 30 Megabase subset of the genome at a fraction of the cost of Hi-C. Copyright © 2018 Elsevier Inc. All rights reserved.
Genome-scale reconstruction of the sigma factor network in Escherichia coli: topology and functional states

PubMed Central

2014-01-01

Background At the beginning of the transcription process, the RNA polymerase (RNAP) core enzyme requires a σ-factor to recognize the genomic location at which the process initiates. Although the crucial role of σ-factors has long been appreciated and characterized for many individual promoters, we do not yet have a genome-scale assessment of their function. Results Using multiple genome-scale measurements, we elucidated the network of σ-factor and promoter interactions in Escherichia coli. The reconstructed network includes 4,724 σ-factor-specific promoters corresponding to transcription units (TUs), representing an increase of more than 300% over what has been previously reported. The reconstructed network was used to investigate competition between alternative σ-factors (the σ70 and σ38 regulons), confirming the competition model of σ substitution and negative regulation by alternative σ-factors. Comparison with σ-factor binding in Klebsiella pneumoniae showed that transcriptional regulation of conserved genes in closely related species is unexpectedly divergent. Conclusions The reconstructed network reveals the regulatory complexity of the promoter architecture in prokaryotic genomes, and opens a path to the direct determination of the systems biology of their transcriptional regulatory networks. PMID:24461193
Comparative genomics of four closely related Clostridium perfringens bacteriophages reveals variable evolution among core genes with therapeutic potential

PubMed Central

2011-01-01

Background Because biotechnological uses of bacteriophage gene products as alternatives to conventional antibiotics will require a thorough understanding of their genomic context, we sequenced and analyzed the genomes of four closely related phages isolated from Clostridium perfringens, an important agricultural and human pathogen. Results Phage whole-genome tetra-nucleotide signatures and proteomic tree topologies correlated closely with host phylogeny. Comparisons of our phage genomes to 26 others revealed three shared COGs; of particular interest within this core genome was an endolysin (PF01520, an N-acetylmuramoyl-L-alanine amidase) and a holin (PF04531). Comparative analyses of the evolutionary history and genomic context of these common phage proteins revealed two important results: 1) strongly significant host-specific sequence variation within the endolysin, and 2) a protein domain architecture apparently unique to our phage genomes in which the endolysin is located upstream of its associated holin. Endolysin sequences from our phages were one of two very distinct genotypes distinguished by variability within the putative enzymatically-active domain. The shared or core genome was comprised of genes with multiple sequence types belonging to five pfam families, and genes belonging to 12 pfam families, including the holin genes, which were nearly identical. Conclusions Significant genomic diversity exists even among closely-related bacteriophages. Holins and endolysins represent conserved functions across divergent phage genomes and, as we demonstrate here, endolysins can have significant variability and host-specificity even among closely-related genomes. Endolysins in our phage genomes may be subject to different selective pressures than the rest of the genome. These findings may have important implications for potential biotechnological applications of phage gene products. PMID:21631945
Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project.

PubMed

Birney, Ewan; Stamatoyannopoulos, John A; Dutta, Anindya; Guigó, Roderic; Gingeras, Thomas R; Margulies, Elliott H; Weng, Zhiping; Snyder, Michael; Dermitzakis, Emmanouil T; Thurman, Robert E; Kuehn, Michael S; Taylor, Christopher M; Neph, Shane; Koch, Christoph M; Asthana, Saurabh; Malhotra, Ankit; Adzhubei, Ivan; Greenbaum, Jason A; Andrews, Robert M; Flicek, Paul; Boyle, Patrick J; Cao, Hua; Carter, Nigel P; Clelland, Gayle K; Davis, Sean; Day, Nathan; Dhami, Pawandeep; Dillon, Shane C; Dorschner, Michael O; Fiegler, Heike; Giresi, Paul G; Goldy, Jeff; Hawrylycz, Michael; Haydock, Andrew; Humbert, Richard; James, Keith D; Johnson, Brett E; Johnson, Ericka M; Frum, Tristan T; Rosenzweig, Elizabeth R; Karnani, Neerja; Lee, Kirsten; Lefebvre, Gregory C; Navas, Patrick A; Neri, Fidencio; Parker, Stephen C J; Sabo, Peter J; Sandstrom, Richard; Shafer, Anthony; Vetrie, David; Weaver, Molly; Wilcox, Sarah; Yu, Man; Collins, Francis S; Dekker, Job; Lieb, Jason D; Tullius, Thomas D; Crawford, Gregory E; Sunyaev, Shamil; Noble, William S; Dunham, Ian; Denoeud, France; Reymond, Alexandre; Kapranov, Philipp; Rozowsky, Joel; Zheng, Deyou; Castelo, Robert; Frankish, Adam; Harrow, Jennifer; Ghosh, Srinka; Sandelin, Albin; Hofacker, Ivo L; Baertsch, Robert; Keefe, Damian; Dike, Sujit; Cheng, Jill; Hirsch, Heather A; Sekinger, Edward A; Lagarde, Julien; Abril, Josep F; Shahab, Atif; Flamm, Christoph; Fried, Claudia; Hackermüller, Jörg; Hertel, Jana; Lindemeyer, Manja; Missal, Kristin; Tanzer, Andrea; Washietl, Stefan; Korbel, Jan; Emanuelsson, Olof; Pedersen, Jakob S; Holroyd, Nancy; Taylor, Ruth; Swarbreck, David; Matthews, Nicholas; Dickson, Mark C; Thomas, Daryl J; Weirauch, Matthew T; Gilbert, James; Drenkow, Jorg; Bell, Ian; Zhao, XiaoDong; Srinivasan, K G; Sung, Wing-Kin; Ooi, Hong Sain; Chiu, Kuo Ping; Foissac, Sylvain; Alioto, Tyler; Brent, Michael; Pachter, Lior; Tress, Michael L; Valencia, Alfonso; Choo, Siew Woh; Choo, Chiou Yu; Ucla, Catherine; Manzano, Caroline; 
Wyss, Carine; Cheung, Evelyn; Clark, Taane G; Brown, James B; Ganesh, Madhavan; Patel, Sandeep; Tammana, Hari; Chrast, Jacqueline; Henrichsen, Charlotte N; Kai, Chikatoshi; Kawai, Jun; Nagalakshmi, Ugrappa; Wu, Jiaqian; Lian, Zheng; Lian, Jin; Newburger, Peter; Zhang, Xueqing; Bickel, Peter; Mattick, John S; Carninci, Piero; Hayashizaki, Yoshihide; Weissman, Sherman; Hubbard, Tim; Myers, Richard M; Rogers, Jane; Stadler, Peter F; Lowe, Todd M; Wei, Chia-Lin; Ruan, Yijun; Struhl, Kevin; Gerstein, Mark; Antonarakis, Stylianos E; Fu, Yutao; Green, Eric D; Karaöz, Ulaş; Siepel, Adam; Taylor, James; Liefer, Laura A; Wetterstrand, Kris A; Good, Peter J; Feingold, Elise A; Guyer, Mark S; Cooper, Gregory M; Asimenos, George; Dewey, Colin N; Hou, Minmei; Nikolaev, Sergey; Montoya-Burgos, Juan I; Löytynoja, Ari; Whelan, Simon; Pardi, Fabio; Massingham, Tim; Huang, Haiyan; Zhang, Nancy R; Holmes, Ian; Mullikin, James C; Ureta-Vidal, Abel; Paten, Benedict; Seringhaus, Michael; Church, Deanna; Rosenbloom, Kate; Kent, W James; Stone, Eric A; Batzoglou, Serafim; Goldman, Nick; Hardison, Ross C; Haussler, David; Miller, Webb; Sidow, Arend; Trinklein, Nathan D; Zhang, Zhengdong D; Barrera, Leah; Stuart, Rhona; King, David C; Ameur, Adam; Enroth, Stefan; Bieda, Mark C; Kim, Jonghwan; Bhinge, Akshay A; Jiang, Nan; Liu, Jun; Yao, Fei; Vega, Vinsensius B; Lee, Charlie W H; Ng, Patrick; Shahab, Atif; Yang, Annie; Moqtaderi, Zarmik; Zhu, Zhou; Xu, Xiaoqin; Squazzo, Sharon; Oberley, Matthew J; Inman, David; Singer, Michael A; Richmond, Todd A; Munn, Kyle J; Rada-Iglesias, Alvaro; Wallerman, Ola; Komorowski, Jan; Fowler, Joanna C; Couttet, Phillippe; Bruce, Alexander W; Dovey, Oliver M; Ellis, Peter D; Langford, Cordelia F; Nix, David A; Euskirchen, Ghia; Hartman, Stephen; Urban, Alexander E; Kraus, Peter; Van Calcar, Sara; Heintzman, Nate; Kim, Tae Hoon; Wang, Kun; Qu, Chunxu; Hon, Gary; Luna, Rosa; Glass, Christopher K; Rosenfeld, M Geoff; Aldred, Shelley Force; Cooper, Sara J; Halees, 
Anason; Lin, Jane M; Shulha, Hennady P; Zhang, Xiaoling; Xu, Mousheng; Haidar, Jaafar N S; Yu, Yong; Ruan, Yijun; Iyer, Vishwanath R; Green, Roland D; Wadelius, Claes; Farnham, Peggy J; Ren, Bing; Harte, Rachel A; Hinrichs, Angie S; Trumbower, Heather; Clawson, Hiram; Hillman-Jackson, Jennifer; Zweig, Ann S; Smith, Kayla; Thakkapallayil, Archana; Barber, Galt; Kuhn, Robert M; Karolchik, Donna; Armengol, Lluis; Bird, Christine P; de Bakker, Paul I W; Kern, Andrew D; Lopez-Bigas, Nuria; Martin, Joel D; Stranger, Barbara E; Woodroffe, Abigail; Davydov, Eugene; Dimas, Antigone; Eyras, Eduardo; Hallgrímsdóttir, Ingileif B; Huppert, Julian; Zody, Michael C; Abecasis, Gonçalo R; Estivill, Xavier; Bouffard, Gerard G; Guan, Xiaobin; Hansen, Nancy F; Idol, Jacquelyn R; Maduro, Valerie V B; Maskeri, Baishali; McDowell, Jennifer C; Park, Morgan; Thomas, Pamela J; Young, Alice C; Blakesley, Robert W; Muzny, Donna M; Sodergren, Erica; Wheeler, David A; Worley, Kim C; Jiang, Huaiyang; Weinstock, George M; Gibbs, Richard A; Graves, Tina; Fulton, Robert; Mardis, Elaine R; Wilson, Richard K; Clamp, Michele; Cuff, James; Gnerre, Sante; Jaffe, David B; Chang, Jean L; Lindblad-Toh, Kerstin; Lander, Eric S; Koriabine, Maxim; Nefedov, Mikhail; Osoegawa, Kazutoyo; Yoshinaga, Yuko; Zhu, Baoli; de Jong, Pieter J

2007-06-14

We report the generation and analysis of functional data from multiple, diverse experiments performed on a targeted 1% of the human genome as part of the pilot phase of the ENCODE Project. These data have been further integrated and augmented by a number of evolutionary and computational analyses. Together, our results advance the collective knowledge about human genome function in several major areas. First, our studies provide convincing evidence that the genome is pervasively transcribed, such that the majority of its bases can be found in primary transcripts, including non-protein-coding transcripts, and those that extensively overlap one another. Second, systematic examination of transcriptional regulation has yielded new understanding about transcription start sites, including their relationship to specific regulatory sequences and features of chromatin accessibility and histone modification. Third, a more sophisticated view of chromatin structure has emerged, including its inter-relationship with DNA replication and transcriptional regulation. Finally, integration of these new sources of information, in particular with respect to mammalian evolution based on inter- and intra-species sequence comparisons, has yielded new mechanistic and evolutionary insights concerning the functional landscape of the human genome. Together, these studies are defining a path for pursuit of a more comprehensive characterization of human genome function.
Mycobacterial species as case-study of comparative genome analysis.

PubMed

Zakham, F; Belayachi, L; Ussery, D; Akrim, M; Benjouad, A; El Aouad, R; Ennaji, M M

2011-02-08

The genus Mycobacterium comprises more than 120 species, including important human pathogens that cause major public health problems and illnesses. Further, with more than 100 genome sequences from this genus, comparative genome analysis can provide new insights for better understanding the evolutionary events of these species and improving drugs, vaccines, and diagnostic tools for controlling Mycobacterial diseases. In the present study we aim to outline a comparative genome analysis of fourteen Mycobacterial genomes: M. avium subsp. paratuberculosis K-10, M. bovis AF2122/97, M. bovis BCG str. Pasteur 1173P2, M. leprae Br4923, M. marinum M, M. sp. KMS, M. sp. MCS, M. tuberculosis CDC1551, M. tuberculosis F11, M. tuberculosis H37Ra, M. tuberculosis H37Rv, M. tuberculosis KZN 1435, M. ulcerans Agy99, and M. vanbaalenii PYR-1. For this purpose, the genomes were compared by genome length, GC content, and number of genes in different databases (GenBank, RefSeq, and Prodigal). The BLAST matrix of these genomes has been computed to summarize the similarity between species in a simple scheme. As a result of multiple genome analysis, the pan and core genome have been defined for twelve Mycobacterial species. We have also introduced the genome atlas of the reference strain M. tuberculosis H37Rv, which gives a good overview of this genome. To examine the phylogenetic relationships among these bacteria, a phylogenetic tree has been constructed from the 16S rRNA gene for tuberculosis and non-tuberculosis Mycobacteria to understand the evolutionary events of these species.
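The pan and core genome mentioned in this abstract can be sketched in Python as a set union and a set intersection over per-strain gene families. This is a toy illustration, not the authors' pipeline: the function name `pan_and_core`, the strain labels, and the gene-family labels are hypothetical, and a real analysis would first cluster genes into families (for example from BLAST hits) before this step.

```python
def pan_and_core(gene_sets):
    """Return (pan, core) for a mapping of strain name -> set of gene families.

    pan  = families present in at least one strain (union)
    core = families present in every strain (intersection)
    """
    sets = list(gene_sets.values())
    pan = set().union(*sets)
    core = set.intersection(*sets)
    return pan, core


# Hypothetical toy input; the labels are placeholders, not real annotations.
strains = {
    "strain1": {"famA", "famB", "famC"},
    "strain2": {"famA", "famB"},
    "strain3": {"famA", "famD"},
}
pan, core = pan_and_core(strains)
print(len(pan), len(core))
```

As more strains are added the pan genome can only grow or stay the same and the core genome can only shrink or stay the same, which is why both are usually reported against the number of genomes included.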
Pseudoscorpion mitochondria show rearranged genes and genome-wide reductions of RNA gene sizes and inferred structures, yet typical nucleotide composition bias

PubMed Central

2012-01-01

Background Pseudoscorpions are chelicerates and have historically been viewed as being most closely related to solifuges, harvestmen, and scorpions. No mitochondrial genomes of pseudoscorpions have been published, but the mitochondrial genomes of some lineages of Chelicerata possess unusual features, including short rRNA genes and tRNA genes that lack sequence to encode arms of the canonical cloverleaf-shaped tRNA. Additionally, some chelicerates possess an atypical guanine-thymine nucleotide bias on the major coding strand of their mitochondrial genomes. Results We sequenced the mitochondrial genomes of two divergent taxa from the chelicerate order Pseudoscorpiones. We find that these genomes possess unusually short tRNA genes that do not encode cloverleaf-shaped tRNA structures. Indeed, in one genome, all 22 tRNA genes lack sequence to encode canonical cloverleaf structures. We also find that the large ribosomal RNA genes are substantially shorter than those of most arthropods. We inferred secondary structures of the LSU rRNAs from both pseudoscorpions, and find that they have lost multiple helices. Based on comparisons with the crystal structure of the bacterial ribosome, two of these helices were likely contact points with tRNA T-arms or D-arms as they pass through the ribosome during protein synthesis. The mitochondrial gene arrangements of both pseudoscorpions differ from the ancestral chelicerate gene arrangement. One genome is rearranged with respect to the location of protein-coding genes, the small rRNA gene, and at least 8 tRNA genes. The other genome contains 6 tRNA genes in novel locations. Most chelicerates with rearranged mitochondrial genes show a genome-wide reversal of the CA nucleotide bias typical for arthropods on their major coding strand, and instead possess a GT bias. Yet despite their extensive rearrangement, these pseudoscorpion mitochondrial genomes possess a CA bias on the major coding strand. 
Phylogenetic analyses of all 13 mitochondrial protein-coding gene sequences consistently yield trees that place pseudoscorpions as sister to acariform mites. Conclusion The well-supported phylogenetic placement of pseudoscorpions as sister to Acariformes differs from some previous analyses based on morphology. However, these two lineages share multiple molecular evolutionary traits, including substantial mitochondrial genome rearrangements, extensive nucleotide substitution, and loss of helices in their inferred tRNA and rRNA structures. PMID:22409411
Strategies and tools for whole genome alignments

DOE Office of Scientific and Technical Information (OSTI.GOV)

Couronne, Olivier; Poliakov, Alexander; Bray, Nicolas

2002-11-25

The availability of the assembled mouse genome makes possible, for the first time, an alignment and comparison of two large vertebrate genomes. We have investigated different strategies of alignment for the subsequent analysis of conservation of genomes that are effective for different quality assemblies. These strategies were applied to the comparison of the working draft of the human genome with the Mouse Genome Sequencing Consortium assembly, as well as other intermediate mouse assemblies. Our methods are fast and the resulting alignments exhibit a high degree of sensitivity, covering more than 90 percent of known coding exons in the human genome. We have obtained such coverage while preserving specificity. With a view towards the end user, we have developed a suite of tools and websites for automatically aligning, and subsequently browsing and working with whole genome comparisons. We describe the use of these tools to identify conserved non-coding regions between the human and mouse genomes, some of which have not been identified by other methods.

From Ambiguities to Insights: Query-based Comparisons of High-Dimensional Data

NASA Astrophysics Data System (ADS)

Kowalski, Jeanne; Talbot, Conover; Tsai, Hua L.; Prasad, Nijaguna; Umbricht, Christopher; Zeiger, Martha A.

2007-11-01

Genomic technologies will revolutionize drug discovery and development; that much is universally agreed upon. The high dimension of data from such technologies has challenged available data analytic methods; that much is apparent. To date, large-scale data repositories have not been utilized in ways that permit their wealth of information to be efficiently processed for knowledge, presumably due in large part to inadequate analytical tools to address numerous comparisons of high-dimensional data. In candidate gene discovery, expression comparisons are often made between two features (e.g., cancerous versus normal), such that the enumeration of outcomes is manageable. With multiple features, the setting becomes more complex, in terms of comparing expression levels of tens of thousands of transcripts across hundreds of features. In this case, the number of outcomes, while enumerable, rapidly becomes large and unmanageable, and scientific inquiries become more abstract, such as "which one of these (compounds, stimuli, etc.) is not like the others?" We develop analytical tools that promote more extensive, efficient, and rigorous utilization of the public data resources generated by the massive support of genomic studies. Our work innovates by enabling access to such metadata with logically formulated scientific inquiries that define, compare and integrate query-comparison pair relations for analysis. We demonstrate our computational tool's potential to address an outstanding biomedical informatics issue of identifying reliable molecular markers in thyroid cancer. Our proposed query-based comparison (QBC) facilitates access to and efficient utilization of metadata through logically formed inquiries expressed as query-based comparisons by organizing and comparing results from biotechnologies to address applications in biomedicine.
Geographic isolates of Lymantria dispar multiple nucleopolyhedrovirus: Genome sequence analysis and pathogenicity against European and Asian gypsy moth strains

USDA-ARS's Scientific Manuscript database

Geographic isolates of Lymantria dispar multiple nucleopolyhedrovirus: Genome sequence analysis and pathogenicity against European and Asian gypsy moth strains. To evaluate the genetic diversity of Lymantria dispar nucleopolyhedrovirus (LdMNPV) at the genomic level, the genomes of three isolates of...
A new polymorphic and multicopy MHC gene family related to nonmammalian class I

DOE Office of Scientific and Technical Information (OSTI.GOV)

Leelayuwat, C.; Degli-Esposti, M.A.; Abraham, L.J.

1994-12-31

The authors have used genomic analysis to characterize a region of the central major histocompatibility complex (MHC) spanning approximately 300 kilobases (kb) between TNF and HLA-B. This region has been suggested to carry genetic factors relevant to the development of autoimmune diseases such as myasthenia gravis (MG) and insulin dependent diabetes mellitus (IDDM). Genomic sequence was analyzed for coding potential, using two neural network programs, GRAIL and GeneParser. A genomic probe, JAB, containing putative coding sequences (PERB11) located 60 kb centromeric of HLA-B, was used for northern analysis of human tissues. Multiple transcripts were detected. Southern analysis of genomic DNA and overlapping YAC clones, covering the region from BAT1 to HLA-F, indicated that there are at least five copies of PERB11, four of which are located within this region of the MHC. The partial cDNA sequence of PERB11 was obtained from poly-A RNA derived from skeletal muscle. The putative amino acid sequence of PERB11 shares approximately 30% identity to MHC class I molecules from various species, including reptiles, chickens, and frogs, as well as to other MHC class I-like molecules, such as the IgG FcR of the mouse and rat and the human Zn-α2-glycoprotein. From direct comparison of amino acid sequences, it is concluded that PERB11 is a distinct molecule more closely related to nonmammalian than known mammalian MHC class I molecules. Genomic sequence analysis of PERB11 from five MHC ancestral haplotypes (AH) indicated that the gene is polymorphic at both DNA and protein level. The results suggest that the authors have identified a novel polymorphic gene family with multiple copies within the MHC. 48 refs., 10 figs., 2 tabs.
Simulated space radiation-induced mutants in the mouse kidney display widespread genomic change

PubMed Central

Grygoryev, Dmytro; Lasarev, Michael; Ohlrich, Anna; Rwatambuga, Furaha A.; Johnson, Sorrel; Dan, Cristian; Eckelmann, Bradley; Hryciw, Gwen; Mao, Jian-Hua; Snijders, Antoine M.; Gauny, Stacey; Kronenberg, Amy

2017-01-01

Exposure to a small number of high-energy heavy charged particles (HZE ions), as found in the deep space environment, could significantly affect astronaut health following prolonged periods of space travel if these ions induce mutations and related cancers. In this study, we used an in vivo mutagenesis assay to define the mutagenic effects of accelerated 56Fe ions (1 GeV/amu, 151 keV/μm) in the mouse kidney epithelium exposed to doses ranging from 0.25 to 2.0 Gy. These doses represent fluences ranging from 1 to 8 particle traversals per cell nucleus. The Aprt locus, located on chromosome 8, was used to select induced and spontaneous mutants. To fully define the mutagenic effects, we used multiple endpoints including mutant frequencies, mutation spectrum for chromosome 8, translocations involving chromosome 8, and mutations affecting non-selected chromosomes. The results demonstrate mutagenic effects that often affect multiple chromosomes for all Fe ion doses tested. For comparison with the most abundant sparsely ionizing particle found in space, we also examined the mutagenic effects of high-energy protons (1 GeV, 0.24 keV/μm) at 0.5 and 1.0 Gy. Similar doses of protons were not as mutagenic as Fe ions for many assays, though genomic effects were detected in Aprt mutants at these doses. Considered as a whole, the data demonstrate that Fe ions are highly mutagenic at the low doses and fluences of relevance to human spaceflight, and that cells with considerable genomic mutations are readily induced by these exposures and persist in the kidney epithelium. The level of genomic change produced by low fluence exposure to heavy ions is reminiscent of the extensive rearrangements seen in tumor genomes suggesting a potential initiation step in radiation carcinogenesis. PMID:28683078
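The correspondence this abstract gives between dose (0.25 to 2.0 Gy) and fluence (1 to 8 particle traversals per nucleus) can be checked with the standard dose-fluence relation for charged particles in water of unit density, D [Gy] = 1.602 × 10⁻⁹ × LET [keV/μm] × Φ [cm⁻²]. The Python sketch below is an illustration only: the nuclear cross-sectional area of 100 μm² is an assumed value that the abstract does not state, and the function names are hypothetical.

```python
# Converts (keV/um) * (particles/cm^2) to Gy for material of density 1 g/cm^3.
GY_PER_KEV_UM_CM2 = 1.602e-9


def fluence_per_cm2(dose_gy, let_kev_per_um):
    """Particle fluence (per cm^2) that delivers `dose_gy` at the given LET."""
    return dose_gy / (GY_PER_KEV_UM_CM2 * let_kev_per_um)


def mean_traversals(dose_gy, let_kev_per_um, nucleus_area_um2):
    """Average number of particle traversals per nucleus of the given area."""
    area_cm2 = nucleus_area_um2 * 1e-8  # 1 um^2 = 1e-8 cm^2
    return fluence_per_cm2(dose_gy, let_kev_per_um) * area_cm2


LET_FE = 151.0        # keV/um, from the abstract
NUCLEUS_AREA = 100.0  # um^2, ASSUMED for illustration; not given in the abstract

low = mean_traversals(0.25, LET_FE, NUCLEUS_AREA)
high = mean_traversals(2.0, LET_FE, NUCLEUS_AREA)
print(round(low, 2), round(high, 2))
```

Under that assumed area the estimate comes out close to 1 traversal at 0.25 Gy and close to 8 at 2.0 Gy, in line with the range quoted in the abstract. A different nuclear area would scale both values proportionally.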
Simulated space radiation-induced mutants in the mouse kidney display widespread genomic change.

PubMed

Turker, Mitchell S; Grygoryev, Dmytro; Lasarev, Michael; Ohlrich, Anna; Rwatambuga, Furaha A; Johnson, Sorrel; Dan, Cristian; Eckelmann, Bradley; Hryciw, Gwen; Mao, Jian-Hua; Snijders, Antoine M; Gauny, Stacey; Kronenberg, Amy

2017-01-01

Exposure to a small number of high-energy heavy charged particles (HZE ions), as found in the deep space environment, could significantly affect astronaut health following prolonged periods of space travel if these ions induce mutations and related cancers. In this study, we used an in vivo mutagenesis assay to define the mutagenic effects of accelerated 56Fe ions (1 GeV/amu, 151 keV/μm) in the mouse kidney epithelium exposed to doses ranging from 0.25 to 2.0 Gy. These doses represent fluences ranging from 1 to 8 particle traversals per cell nucleus. The Aprt locus, located on chromosome 8, was used to select induced and spontaneous mutants. To fully define the mutagenic effects, we used multiple endpoints including mutant frequencies, mutation spectrum for chromosome 8, translocations involving chromosome 8, and mutations affecting non-selected chromosomes. The results demonstrate mutagenic effects that often affect multiple chromosomes for all Fe ion doses tested. For comparison with the most abundant sparsely ionizing particle found in space, we also examined the mutagenic effects of high-energy protons (1 GeV, 0.24 keV/μm) at 0.5 and 1.0 Gy. Similar doses of protons were not as mutagenic as Fe ions for many assays, though genomic effects were detected in Aprt mutants at these doses. Considered as a whole, the data demonstrate that Fe ions are highly mutagenic at the low doses and fluences of relevance to human spaceflight, and that cells with considerable genomic mutations are readily induced by these exposures and persist in the kidney epithelium. The level of genomic change produced by low fluence exposure to heavy ions is reminiscent of the extensive rearrangements seen in tumor genomes suggesting a potential initiation step in radiation carcinogenesis.
eXframe: reusable framework for storage, analysis and visualization of genomics experiments

PubMed Central

2011-01-01

Background Genome-wide experiments are routinely conducted to measure gene expression, DNA-protein interactions and epigenetic status. Structured metadata for these experiments is imperative for a complete understanding of experimental conditions, to enable consistent data processing and to allow retrieval, comparison, and integration of experimental results. Even though several repositories have been developed for genomics data, only a few provide annotation of samples and assays using controlled vocabularies. Moreover, many of them are tailored for a single type of technology or measurement and do not support the integration of multiple data types. Results We have developed eXframe - a reusable web-based framework for genomics experiments that provides 1) the ability to publish structured data compliant with accepted standards 2) support for multiple data types including microarrays and next generation sequencing 3) query, analysis and visualization integration tools (enabled by consistent processing of the raw data and annotation of samples) and is available as open-source software. We present two case studies where this software is currently being used to build repositories of genomics experiments - one contains data from hematopoietic stem cells and another from Parkinson's disease patients. Conclusion The web-based framework eXframe offers structured annotation of experiments as well as uniform processing and storage of molecular data from microarray and next generation sequencing platforms. The framework allows users to query and integrate information across species, technologies, measurement types and experimental conditions. Our framework is reusable and freely modifiable - other groups or institutions can deploy their own custom web-based repositories based on this software. It is interoperable with the most important data formats in this domain. We hope that other groups will not only use eXframe, but also contribute their own useful modifications. 
PMID:22103807
Analysis of the genome-wide variations among multiple strains of the plant pathogenic bacterium Xylella fastidiosa

PubMed Central

Doddapaneni, Harshavardhan; Yao, Jiqiang; Lin, Hong; Walker, M Andrew; Civerolo, Edwin L

2006-01-01

Background The Gram-negative, xylem-limited phytopathogenic bacterium Xylella fastidiosa is responsible for causing economically important diseases in grapevine, citrus and many other plant species. Despite its economic impact, relatively little is known about the genomic variations among strains isolated from different hosts and their influence on the population genetics of this pathogen. With the availability of genome sequence information for four strains, it is now possible to perform genome-wide analyses to identify and categorize such DNA variations and to understand their influence on strain functional divergence. Results There are 1,579 genes and 194 non-coding homologous sequences present in the genomes of all four strains, representing a 76.2% conservation of the sequenced genome. About 60% of the X. fastidiosa unique sequences exist as tandem gene clusters of 6 or more genes. Multiple alignments identified 12,754 SNPs and 14,449 INDELs in the 1528 common genes and 20,779 SNPs and 10,075 INDELs in the 194 non-coding sequences. The average SNP frequency was 1.08 × 10⁻² per base pair of DNA and the average INDEL frequency was 2.06 × 10⁻² per base pair of DNA. On average, 60.33% of the SNPs were synonymous while 39.67% were non-synonymous. The mutation frequency, primarily in the form of external INDELs, was the main type of sequence variation. The relative similarity between the strains was discussed according to the INDEL and SNP differences. The numbers of genes unique to each strain were 60 (9a5c), 54 (Dixon), 83 (Ann1) and 9 (Temecula-1). A subset of the strain-specific genes showed significant differences in terms of their codon usage and GC composition from the native genes, suggesting their xenologous origin. Tandem repeat analysis of the genomic sequences of the four strains identified associations of repeat sequences with hypothetical and phage-related functions. Conclusion INDELs and strain-specific genes have been identified as the main source of variations among strains, with individual strains showing different rates of genome evolution. Based on these genome comparisons, it appears that the Pierce's disease strain Temecula-1 genome represents the ancestral genome of X. fastidiosa. Results of this analysis are publicly available in the form of a web database. PMID:16948851
Genome Sequencing and Comparative Analysis of Stenotrophomonas acidaminiphila Reveal Evolutionary Insights Into Sulfamethoxazole Resistance.

PubMed

Huang, Yao-Ting; Chen, Jia-Min; Ho, Bing-Ching; Wu, Zong-Yen; Kuo, Rita C; Liu, Po-Yu

2018-01-01

Stenotrophomonas acidaminiphila is an aerobic, glucose non-fermentative, Gram-negative bacterium that has been isolated from various environmental sources, particularly aquatic ecosystems. Although resistance to multiple antimicrobial agents has been reported in S. acidaminiphila, the mechanisms are largely unknown. Here, for the first time, we report the complete genome and antimicrobial resistome analysis of a clinical isolate S. acidaminiphila SUNEO which is resistant to sulfamethoxazole. Comparative analysis among closely related strains identified common and strain-specific genes. In particular, comparison with a sulfamethoxazole-sensitive strain identified a mutation within the sulfonamide-binding site of folP in SUNEO, which may reduce the binding affinity of sulfamethoxazole. Selection pressure analysis indicated folP in SUNEO is under purifying selection, which may be owing to long-term administration of sulfonamide against Stenotrophomonas.
Three copies of a single protein II-encoding sequence in the genome of Neisseria gonorrhoeae JS3: evidence for gene conversion and gene duplication.

PubMed

van der Ley, P

1988-11-01

Gonococci express a family of related outer membrane proteins designated protein II (P.II). These surface proteins are subject to both phase variation and antigenic variation. The P.II gene repertoire of Neisseria gonorrhoeae strain JS3 was found to consist of at least ten genes, eight of which were cloned. Sequence analysis and DNA hybridization studies revealed that one particular P.II-encoding sequence is present in three distinct, but almost identical, copies in the JS3 genome. These genes encode the P.II protein that was previously identified as P.IIc. Comparison of their sequences shows that the multiple copies of this P.IIc-encoding gene might have been generated by both gene conversion and gene duplication.
Comparative Genomics of an Unusual Biogeographic Disjunction in the Cotton Tribe (Gossypieae) Yields Insights into Genome Downsizing

PubMed Central

Arick, Mark A; Conover, Justin L; Thrash, Adam; Sanders, William S; Hsu, Chuan-Yu; Naqvi, Rubab Zahra; Farooq, Muhammad; Li, Xiaochong; Gong, Lei; Mudge, Joann; Ramaraj, Thiruvarangan; Udall, Joshua A; Peterson, Daniel G

2017-01-01

Abstract Long-distance insular dispersal is associated with divergence and speciation because of founder effects and strong genetic drift. The cotton tribe (Gossypieae) has experienced multiple transoceanic dispersals, generating an aggregate geographic range that encompasses much of the tropics and subtropics worldwide. Two genera in the Gossypieae, Kokia and Gossypioides, exhibit a remarkable geographic disjunction, being restricted to the Hawaiian Islands and Madagascar/East Africa, respectively. We assembled and used de novo genome sequences to address questions regarding the divergence of these two genera from each other and from their sister-group, Gossypium. In addition, we explore processes underlying the genome downsizing that characterizes Kokia and Gossypioides relative to other genera in the tribe. Using 13,000 gene orthologs and synonymous substitution rates, we show that the two disjuncts last shared a common ancestor ∼5 Ma, or half as long ago as their divergence from Gossypium. We report relative stasis in the transposable element fraction. In comparison to Gossypium, there is loss of ∼30% of the gene content in the two disjunct genera and a history of genome-wide accumulation of deletions. In both genera, there is a genome-wide bias toward deletions over insertions, and the number of gene losses exceeds the number of gains by ∼2- to 4-fold. The genomic analyses presented here elucidate genomic consequences of the demographic and biogeographic history of these closest relatives of Gossypium, and enhance their value as phylogenetic outgroups. PMID:29194487
Comparison of Penalty Functions for Sparse Canonical Correlation Analysis

PubMed Central

Chalise, Prabhakar; Fridley, Brooke L.

2011-01-01

Canonical correlation analysis (CCA) is a widely used multivariate method for assessing the association between two sets of variables. However, when the number of variables far exceeds the number of subjects, such as in the case of large-scale genomic studies, the traditional CCA method is not appropriate. In addition, when the variables are highly correlated the sample covariance matrices become unstable or undefined. To overcome these two issues, sparse canonical correlation analysis (SCCA) for multiple data sets has been proposed using a Lasso type of penalty. However, these methods do not have direct control over sparsity of solution. An additional step that uses Bayesian Information Criterion (BIC) has also been suggested to further filter out unimportant features. In this paper, a comparison of four penalty functions (Lasso, Elastic-net, SCAD and Hard-threshold) for SCCA with and without the BIC filtering step has been carried out using both real and simulated genotypic and mRNA expression data. This study indicates that the SCAD penalty with BIC filter would be a preferable penalty function for application of SCCA to genomic data. PMID:21984855
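As an illustration of how the four penalties named above differ, the sketch below gives the standard scalar thresholding rule associated with each one. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the elastic-net scaling by 1/(1 + lam2), and the SCAD default a = 3.7 are textbook conventions assumed here, and the abstract does not say how the penalties were applied inside the SCCA iterations.

```python
def soft_threshold(z, lam):
    """Lasso rule: shrink z toward zero by lam; values within [-lam, lam] become 0."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def hard_threshold(z, lam):
    """Hard-threshold rule: keep z unchanged if |z| > lam, otherwise set it to 0."""
    return z if abs(z) > lam else 0.0


def elastic_net_threshold(z, lam1, lam2):
    """Elastic-net rule (assumed form): soft-threshold by lam1, then scale by 1/(1 + lam2)."""
    return soft_threshold(z, lam1) / (1.0 + lam2)


def scad_threshold(z, lam, a=3.7):
    """SCAD rule: soft-threshold small values, leave large values unshrunk,
    and interpolate linearly in between (a > 2; a = 3.7 is the usual default)."""
    magnitude = abs(z)
    if magnitude <= 2.0 * lam:
        return soft_threshold(z, lam)
    if magnitude <= a * lam:
        sign = 1.0 if z > 0 else -1.0
        return ((a - 1.0) * z - sign * a * lam) / (a - 2.0)
    return z


if __name__ == "__main__":
    # Hypothetical loadings, chosen only to show the three SCAD regimes.
    for z in (0.5, 1.5, 3.0, 5.0):
        print(z, soft_threshold(z, 1.0), hard_threshold(z, 1.0), scad_threshold(z, 1.0))
```

The practical difference is that the Lasso rule shrinks every surviving coefficient by the same amount, whereas SCAD leaves large coefficients untouched, which is one common argument for preferring it when a few strong features are expected.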
Genomic changes in an attenuated genotype I Japanese encephalitis virus and comparison with virulent parental strain.

PubMed

Zhou, Yuyong; Wu, Rui; Feng, Yao; Zhao, Qin; Wen, Xintian; Huang, Xiaobo; Wen, Yiping; Yan, Qigui; Huang, Yong; Ma, Xiaoping; Han, Xinfeng; Cao, Sanjie

2018-06-01

Genotype I Japanese encephalitis virus (JEV) strain SCYA201201 was previously isolated from brain tissues of aborted piglets. In this study, we obtained an attenuated SCYA201201-0901 strain by serial passage of strain SCYA201201-1 in Syrian baby hamster kidney cells, combined with multiple plaque purifications and selection for virulence in mice. We investigated the genetic changes associated with attenuation by comparing the entire genomes of SCYA201201-0901 and SCYA201201-1. Sequence comparisons identified 14 common amino acid substitutions in the coding region, with two nucleotide point mutations in the 5'-untranslated region (UTR) and another three in the 3'-UTR, which differed between the attenuated and virulent strains. In addition, a total of 13 silent nucleotide mutations were found after attenuation. These substitutions, alone or in combination, may be responsible for the attenuated phenotype of the SCYA201201-0901 strain in mice. This information will contribute to our understanding of attenuation and of the molecular basis of virulence in genotype I strains such as SCYA201201-0901, as well as aiding the development of safer JEV vaccines.
The relationship between the human genome and microbiome comes into view

PubMed Central

Goodrich, Julia K.; Davenport, Emily R.; Clark, Andrew G.; Ley, Ruth E.

2017-01-01

The microbiome’s involvement in health and disease, and the complexity of its composition and function, make it intriguing to consider human genetic factors that impact microbiome composition. Genes may influence health through their ability to promote a stable microbial community in the gut. Studies of heritability yield a consistent subset of microbes that are impacted by genes, but the use of genome-wide association studies (GWAS) to identify specific genetic variants associated with microbiota phenotypes has proven challenging. Processing microbiome datasets into traits to be modeled and reducing the burden of multiple testing are just some of the technical hurdles in microbiome GWAS. Studies to date are small by GWAS standards, making cross-study comparisons and validations particularly important in identifying authentic signals. Cross-study comparisons are hampered by differences in analytical approaches. Nevertheless, some consistent associations have emerged between populations, most notably between Bifidobacteria and the lactase non-persister genotype. These early successes open the way for the microbiome to be incorporated into studies that quantify interactions among genotype, environment, and the microbiome for predicting disease susceptibility. PMID:28934590
The Oxytricha trifallax Macronuclear Genome: A Complex Eukaryotic Genome with 16,000 Tiny Chromosomes

PubMed Central

Swart, Estienne C.; Bracht, John R.; Magrini, Vincent; Minx, Patrick; Chen, Xiao; Zhou, Yi; Khurana, Jaspreet S.; Goldman, Aaron D.; Nowacki, Mariusz; Schotanus, Klaas; Jung, Seolkyoung; Fulton, Robert S.; Ly, Amy; McGrath, Sean; Haub, Kevin; Wiggins, Jessica L.; Storton, Donna; Matese, John C.; Parsons, Lance; Chang, Wei-Jen; Bowen, Michael S.; Stover, Nicholas A.; Jones, Thomas A.; Eddy, Sean R.; Herrick, Glenn A.; Doak, Thomas G.; Wilson, Richard K.; Mardis, Elaine R.; Landweber, Laura F.

2013-01-01

The macronuclear genome of the ciliate Oxytricha trifallax displays an extreme and unique eukaryotic genome architecture with extensive genomic variation. During sexual genome development, the expressed, somatic macronuclear genome is whittled down to the genic portion of a small fraction (∼5%) of its precursor “silent” germline micronuclear genome by a process of “unscrambling” and fragmentation. The tiny macronuclear “nanochromosomes” typically encode single, protein-coding genes (a small portion, 10%, encode 2–8 genes), have minimal noncoding regions, and are differentially amplified to an average of ∼2,000 copies. We report the high-quality genome assembly of ∼16,000 complete nanochromosomes (∼50 Mb haploid genome size) that vary from 469 bp to 66 kb long (mean ∼3.2 kb) and encode ∼18,500 genes. Alternative DNA fragmentation processes ∼10% of the nanochromosomes into multiple isoforms that usually encode complete genes. Nucleotide diversity in the macronucleus is very high (SNP heterozygosity is ∼4.0%), suggesting that Oxytricha trifallax may have one of the largest known effective population sizes of eukaryotes. Comparison to other ciliates with nonscrambled genomes and long macronuclear chromosomes (on the order of 100 kb) suggests several candidate proteins that could be involved in genome rearrangement, including domesticated MULE and IS1595-like DDE transposases. The assembly of the highly fragmented Oxytricha macronuclear genome is the first completed genome with such an unusual architecture. This genome sequence provides tantalizing glimpses into novel molecular biology and evolution. For example, Oxytricha maintains tens of millions of telomeres per cell and has also evolved an intriguing expansion of telomere end-binding proteins. In conjunction with the micronuclear genome in progress, the O. trifallax macronuclear genome will provide an invaluable resource for investigating programmed genome rearrangements, complementing studies of rearrangements arising during evolution and disease. PMID:23382650
Genome Comparisons Reveal a Dominant Mechanism of Chromosome Number Reduction in Grasses and Accelerated Genome Evolution in Triticeae

USDA-ARS's Scientific Manuscript database

Single nucleotide polymorphism was employed in the construction of a high-resolution, expressed sequence tag (EST) map of Aegilops tauschii, the diploid source of the wheat D genome. Comparison of the map with the rice and sorghum genome sequences revealed 50 inversions and translocations; 2, 8, and...
A Gene-Oriented Haplotype Comparison Reveals Recently Selected Genomic Regions in Temperate and Tropical Maize Germplasm

PubMed Central

Zhang, Jie; Li, Yongxiang; Zheng, Jun; Zhang, Hongwei; Yang, Xiaohong; Wang, Jianhua; Wang, Guoying

2017-01-01

The extensive genetic variation present in maize (Zea mays) germplasm makes it possible to detect signatures of positive artificial selection that occurred during temperate and tropical maize improvement. Here we report an analysis of 532,815 polymorphisms from a maize association panel consisting of 368 diverse temperate and tropical inbred lines. We developed a gene-oriented approach adapting exonic polymorphisms to identify recently selected alleles by comparing haplotypes across the maize genome. This analysis revealed evidence of selection for more than 1100 genomic regions during recent improvement, and included regulatory genes and key genes with visible mutant phenotypes. We find that selected candidate target genes in temperate maize are enriched in biosynthetic processes, and further examination of these candidates highlights two cases, sucrose flux and oil storage, in which multiple genes in a common pathway can be cooperatively selected. Finally, based on available parallel gene expression data, we hypothesize that some genes were selected for regulatory variations, resulting in altered gene expression. PMID:28099470
The 3D Structure of the Immunoglobulin Heavy-Chain Locus: Implications for Long-Range Genomic Interactions

PubMed Central

Jhunjhunwala, Suchit; van Zelm, Menno C.; Peak, Mandy M.; Cutchin, Steve; Riblet, Roy; van Dongen, Jacques J.M.; Grosveld, Frank G.; Knoch, Tobias A.; Murre, Cornelis

2009-01-01

SUMMARY The immunoglobulin heavy-chain (Igh) locus is organized into distinct regions that contain multiple variable (VH), diversity (DH), joining (JH) and constant (CH) coding elements. How the Igh locus is structured in 3D space is unknown. To probe the topography of the Igh locus, spatial distance distributions were determined between 12 genomic markers that span the entire Igh locus. Comparison of the distance distributions to computer simulations of alternative chromatin arrangements predicted that the Igh locus is organized into compartments containing clusters of loops separated by linkers. Trilateration and triple-point angle measurements indicated the mean relative 3D positions of the VH, DH, JH, and CH elements, showed compartmentalization and striking conformational changes involving VH and DH-JH elements during early B cell development. In pro-B cells, the entire repertoire of VH regions (2 Mbp) appeared to have merged and juxtaposed to the DH elements, mechanistically permitting long-range genomic interactions to occur with relatively high frequency. PMID:18423198
Genomic-based multiple-trait evaluation in Eucalyptus grandis using dominant DArT markers.

PubMed

Cappa, Eduardo P; El-Kassaby, Yousry A; Muñoz, Facundo; Garcia, Martín N; Villalba, Pamela V; Klápště, Jaroslav; Marcucci Poltri, Susana N

2018-06-01

We investigated the impact of combining the pedigree- and genomic-based relationship matrices in a multiple-trait individual-tree mixed model (a.k.a., multiple-trait combined approach) on the estimates of heritability and on the genomic correlations between growth and stem straightness in an open-pollinated Eucalyptus grandis population. Additionally, the added advantage of incorporating genomic information on the theoretical accuracies of parents and offspring breeding values was evaluated. Our results suggested that the use of the combined approach for estimating heritabilities and additive genetic correlations in multiple-trait evaluations is advantageous and including genomic information increases the expected accuracy of breeding values. Furthermore, the multiple-trait combined approach was proven to be superior to the single-trait combined approach in predicting breeding values, in particular for low-heritability traits. Finally, our results advocate the use of the combined approach in forest tree progeny testing trials, specifically when a multiple-trait individual-tree mixed model is considered. Copyright © 2018 Elsevier B.V. All rights reserved.
DNA methylome signature in rheumatoid arthritis.

PubMed

Nakano, Kazuhisa; Whitaker, John W; Boyle, David L; Wang, Wei; Firestein, Gary S

2013-01-01

Epigenetics can influence disease susceptibility and severity. While DNA methylation of individual genes has been explored in autoimmunity, no unbiased systematic analyses have been reported. Therefore, a genome-wide evaluation of DNA methylation loci in fibroblast-like synoviocytes (FLS) isolated from the site of disease in rheumatoid arthritis (RA) was performed. Genomic DNA was isolated from six RA and five osteoarthritis (OA) FLS lines and evaluated using the Illumina HumanMethylation450 chip. Cluster analysis of data was performed and corrected using Benjamini-Hochberg adjustment for multiple comparisons. Methylation was confirmed by pyrosequencing and gene expression was determined by qPCR. Pathway analysis was performed using the Kyoto Encyclopedia of Genes and Genomes. RA and control FLS segregated based on DNA methylation, with 1859 differentially methylated loci. Hypomethylated loci were identified in key genes relevant to RA, such as CHI3L1, CASP1, STAT3, MAP3K5, MEFV and WISP3. Hypermethylation was also observed, including TGFBR2 and FOXO1. Hypomethylation of individual genes was associated with increased gene expression. Grouped analysis identified 207 hypermethylated or hypomethylated genes with multiple differentially methylated loci, including COL1A1, MEFV and TNF. Hypomethylation was increased in multiple pathways related to cell migration, including focal adhesion, cell adhesion, transendothelial migration and extracellular matrix interactions. Confirmatory studies with OA and normal FLS also demonstrated segregation of RA from control FLS based on methylation pattern. Differentially methylated genes could alter FLS gene expression and contribute to the pathogenesis of RA. DNA methylation of critical genes suggests that RA FLS are imprinted and implicates epigenetic contributions to inflammatory arthritis.
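For readers unfamiliar with the Benjamini-Hochberg adjustment mentioned in this abstract, the following is a minimal sketch of the standard step-up procedure in plain Python. The function name and the example p-values are hypothetical; the abstract does not describe the software or settings the authors actually used.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values, in the input order.

    Each p-value is multiplied by m / rank (rank 1 = smallest p-value),
    then a running minimum taken from the largest rank downward keeps the
    adjusted values monotone in the original p-values.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices, ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, pvalues[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


if __name__ == "__main__":
    # Hypothetical per-locus p-values, for illustration only.
    example = [0.01, 0.04, 0.03, 0.005]
    print(benjamini_hochberg(example))
```

A locus would then be called differentially methylated when its adjusted value falls below the chosen false discovery rate, for example 0.05.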
Construction of a nurse shark (Ginglymostoma cirratum) bacterial artificial chromosome (BAC) library and a preliminary genome survey.

PubMed

Luo, Meizhong; Kim, Hyeran; Kudrna, Dave; Sisneros, Nicholas B; Lee, So-Jeong; Mueller, Christopher; Collura, Kristi; Zuccolo, Andrea; Buckingham, E Bryan; Grim, Suzanne M; Yanagiya, Kazuyo; Inoko, Hidetoshi; Shiina, Takashi; Flajnik, Martin F; Wing, Rod A; Ohta, Yuko

2006-05-03

Sharks are members of the taxonomic class Chondrichthyes, the oldest living jawed vertebrates. Genomic studies of this group, in comparison to representative species in other vertebrate taxa, will allow us to theorize about the fundamental genetic, developmental, and functional characteristics in the common ancestor of all jawed vertebrates. In order to obtain mapping and sequencing data for comparative genomics, we constructed a bacterial artificial chromosome (BAC) library for the nurse shark, Ginglymostoma cirratum. The BAC library consists of 313,344 clones with an average insert size of 144 kb, covering ~4.5 × 10¹⁰ bp and thus providing an 11-fold coverage of the haploid genome. BAC end sequence analyses revealed, in addition to LINEs and SINEs commonly found in other animal and plant genomes, two new groups of nurse shark-specific repetitive elements, NSRE1 and NSRE2 that seem to be major components of the nurse shark genome. Screening the library with single-copy or multi-copy gene probes showed 6-28 primary positive clones per probe of which 50-90% were true positives, demonstrating that the BAC library is representative of the different regions of the nurse shark genome. Furthermore, some BAC clones contained multiple genes, making physical mapping feasible. We have constructed a deep-coverage, high-quality, large insert, and publicly available BAC library for a cartilaginous fish. It will be very useful to the scientific community interested in shark genomic structure, comparative genomics, and functional studies. We found two new groups of repetitive elements specific to the nurse shark genome, which may contribute to the architecture and evolution of the nurse shark genome.
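The coverage figures in this abstract can be checked with simple arithmetic. The sketch below uses only the numbers quoted above (313,344 clones, 144 kb mean insert, 11-fold coverage); the function names are hypothetical, and the haploid genome size it prints is implied by those numbers rather than stated in the abstract.

```python
def library_bases(num_clones, mean_insert_bp):
    """Total bases held in the library: number of clones times mean insert size."""
    return num_clones * mean_insert_bp


def implied_genome_size(total_bases, fold_coverage):
    """Haploid genome size implied by a given fold coverage."""
    return total_bases / fold_coverage


if __name__ == "__main__":
    total = library_bases(313_344, 144_000)   # 144 kb expressed in bp
    genome = implied_genome_size(total, 11)   # 11-fold coverage, from the abstract
    print(f"library content: {total:.3e} bp")
    print(f"implied haploid genome: {genome:.3e} bp")
```

The product is about 4.5 × 10¹⁰ bp, matching the abstract, and dividing by the stated 11-fold coverage implies a haploid genome of roughly 4.1 × 10⁹ bp.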

Automated typing of red blood cell and platelet antigens: a whole-genome sequencing study.

PubMed

Lane, William J; Westhoff, Connie M; Gleadall, Nicholas S; Aguad, Maria; Smeland-Wagman, Robin; Vege, Sunitha; Simmons, Daimon P; Mah, Helen H; Lebo, Matthew S; Walter, Klaudia; Soranzo, Nicole; Di Angelantonio, Emanuele; Danesh, John; Roberts, David J; Watkins, Nick A; Ouwehand, Willem H; Butterworth, Adam S; Kaufman, Richard M; Rehm, Heidi L; Silberstein, Leslie E; Green, Robert C

2018-06-01

There are more than 300 known red blood cell (RBC) antigens and 33 platelet antigens that differ between individuals. Sensitisation to antigens is a serious complication that can occur in prenatal medicine and after blood transfusion, particularly for patients who require multiple transfusions. Although pre-transfusion compatibility testing largely relies on serological methods, reagents are not available for many antigens. Methods based on single-nucleotide polymorphism (SNP) arrays have been used, but typing for ABO and Rh, the most important blood groups, cannot be done with SNP typing alone. We aimed to develop a novel method based on whole-genome sequencing to identify RBC and platelet antigens. This whole-genome sequencing study is a subanalysis of data from patients in the whole-genome sequencing arm of the MedSeq Project randomised controlled trial (NCT01736566) with no measured patient outcomes. We created a database of molecular changes in RBC and platelet antigens and developed an automated antigen-typing algorithm based on whole-genome sequencing (bloodTyper). This algorithm was iteratively improved to address cis-trans haplotype ambiguities and homologous gene alignments. Whole-genome sequencing data from 110 MedSeq participants (30 × depth) were used to initially validate bloodTyper through comparison with conventional serology and SNP methods for typing of 38 RBC antigens in 12 blood-group systems and 22 human platelet antigens. bloodTyper was further validated with whole-genome sequencing data from 200 INTERVAL trial participants (15 × depth) with serological comparisons. We iteratively improved bloodTyper by comparing its typing results with conventional serological and SNP typing in three rounds of testing. The initial whole-genome sequencing typing algorithm was 99·5% concordant across the first 20 MedSeq genomes. Addressing discordances led to development of an improved algorithm that was 99·8% concordant for the remaining 90 MedSeq genomes.
Additional modifications led to the final algorithm, which was 99·2% concordant across 200 INTERVAL genomes (or 99·9% after adjustment for the lower depth of coverage). By enabling more precise antigen-matching of patients with blood donors, antigen typing based on whole-genome sequencing provides a novel approach to improve transfusion outcomes with the potential to transform the practice of transfusion medicine. National Human Genome Research Institute, Doris Duke Charitable Foundation, National Health Service Blood and Transplant, National Institute for Health Research, and Wellcome Trust. Copyright © 2018 Elsevier Ltd. All rights reserved.
A Phylogenomic Assessment of Ancient Polyploidy and Genome Evolution across the Poales

PubMed Central

McKain, Michael R.; Tang, Haibao; McNeal, Joel R.; Ayyampalayam, Saravanaraj; Davis, Jerrold I.; dePamphilis, Claude W.; Givnish, Thomas J.; Pires, J. Chris; Stevenson, Dennis Wm.; Leebens-Mack, James H.

2016-01-01

Comparisons of flowering plant genomes reveal multiple rounds of ancient polyploidy characterized by large intragenomic syntenic blocks. Three such whole-genome duplication (WGD) events, designated as rho (ρ), sigma (σ), and tau (τ), have been identified in the genomes of cereal grasses. Precise dating of these WGD events is necessary to investigate how they have influenced diversification rates, evolutionary innovations, and genomic characteristics such as the GC profile of protein-coding sequences. The timing of these events has remained uncertain due to the paucity of monocot genome sequence data outside the grass family (Poaceae). Phylogenomic analysis of protein-coding genes from sequenced genomes and transcriptome assemblies from 35 species, including representatives of all families within the Poales, has resolved the timing of rho and sigma relative to speciation events and placed tau prior to divergence of Asparagales and the commelinids but after divergence with eudicots. Examination of gene family phylogenies indicates that rho occurred just prior to the diversification of Poaceae and sigma occurred before early diversification of Poales lineages but after the Poales-commelinid split. Additional lineage-specific WGD events were identified on the basis of the transcriptome data. Gene families exhibiting high GC content are underrepresented among those with duplicate genes that persisted following these genome duplications. However, genome duplications had little overall influence on lineage-specific changes in the GC content of coding genes. Improved resolution of the timing of WGD events in monocot history provides evidence for the influence of polyploidization on functional evolution and species diversification. PMID:26988252
Genetic characterization of human herpesvirus type 1: Full-length genome sequence of strain obtained from an encephalitis case from India.

PubMed

Bondre, Vijay P; Sankararaman, Vasudha; Andhare, Vijaysinh; Tupekar, Manisha; Sapkal, Gajanan N

2016-11-01

Human herpes simplex virus 1 (HSV-1) is the most common cause of sporadic encephalitis in humans that contributes to >10 per cent of the encephalitis cases occurring worldwide. Availability of limited full genome sequences from a small number of isolates resulted in poor understanding of host and viral factors responsible for variable clinical outcome. In this study genetic relationship, extent and source of recombination using full-length genome sequence derived from a newly isolated HSV-1 isolate was studied in comparison with those sampled from patients with varied clinical outcome. Full genome sequence of HSV-1 isolated from cerebrospinal fluid (CSF) of a patient with acute encephalitis syndrome (AES) by inoculation in baby hamster kidney-21 (BHK-21) cells was determined using next-generation sequencing (NGS) technology. Phylogenetic analysis of the newly generated sequence in comparison with 33 additional full-length genomes defined genetic relationship with worldwide distributed strains. The bootscan and similarity plot analysis defined recombination crossovers and similarities between newly isolated Indian HSV-1 with six Asian and a total of 34 worldwide isolated strains. Mapping of 376,332 reads amplified from HSV-1 DNA by NGS generated full-length genome of 151,024 bp from newly isolated Indian HSV-1. Phylogenetic analysis classified worldwide distributed strains into three major evolutionary lineages correlating to their geographic distribution. Lineage 1 containing strains were isolated from America and Europe; lineage 2 contained all the strains from Asian countries along with the North American KOS and RE strains whereas the South African isolates were distributed into two groups under lineage 3. Recombination analysis confirmed events of recombination in Indian HSV-1 genome resulting from mixing of different strains evolved in Asian countries. 
Our results showed that the full-length genome sequence generated from an Indian HSV-1 isolate shared close genetic relationship with the American KOS and Chinese CR38 strains which belonged to the Asian genetic lineage. Recombination analysis of Indian isolate demonstrated multiple recombination crossover points throughout the genome. This full-length genome sequence amplified from the Indian isolate would be helpful to study HSV evolution, genetic basis of differential pathogenesis, host-virus interactions and viral factors contributing towards differential clinical outcome in human infections.
Genetic characterization of human herpesvirus type 1: Full-length genome sequence of strain obtained from an encephalitis case from India

PubMed Central

Bondre, Vijay P.; Sankararaman, Vasudha; Andhare, Vijaysinh; Tupekar, Manisha; Sapkal, Gajanan N.

2016-01-01

Background & objectives: Human herpes simplex virus 1 (HSV-1) is the most common cause of sporadic encephalitis in humans that contributes to >10 per cent of the encephalitis cases occurring worldwide. Availability of limited full genome sequences from a small number of isolates resulted in poor understanding of host and viral factors responsible for variable clinical outcome. In this study genetic relationship, extent and source of recombination using full-length genome sequence derived from a newly isolated HSV-1 isolate was studied in comparison with those sampled from patients with varied clinical outcome. Methods: Full genome sequence of HSV-1 isolated from cerebrospinal fluid (CSF) of a patient with acute encephalitis syndrome (AES) by inoculation in baby hamster kidney-21 (BHK-21) cells was determined using next-generation sequencing (NGS) technology. Phylogenetic analysis of the newly generated sequence in comparison with 33 additional full-length genomes defined genetic relationship with worldwide distributed strains. The bootscan and similarity plot analysis defined recombination crossovers and similarities between newly isolated Indian HSV-1 with six Asian and a total of 34 worldwide isolated strains. Results: Mapping of 376,332 reads amplified from HSV-1 DNA by NGS generated full-length genome of 151,024 bp from newly isolated Indian HSV-1. Phylogenetic analysis classified worldwide distributed strains into three major evolutionary lineages correlating to their geographic distribution. Lineage 1 containing strains were isolated from America and Europe; lineage 2 contained all the strains from Asian countries along with the North American KOS and RE strains whereas the South African isolates were distributed into two groups under lineage 3. Recombination analysis confirmed events of recombination in Indian HSV-1 genome resulting from mixing of different strains evolved in Asian countries. 
Interpretation & conclusions: Our results showed that the full-length genome sequence generated from an Indian HSV-1 isolate shared close genetic relationship with the American KOS and Chinese CR38 strains which belonged to the Asian genetic lineage. Recombination analysis of Indian isolate demonstrated multiple recombination crossover points throughout the genome. This full-length genome sequence amplified from the Indian isolate would be helpful to study HSV evolution, genetic basis of differential pathogenesis, host-virus interactions and viral factors contributing towards differential clinical outcome in human infections. PMID:28361829
Comparison and quantitative verification of mapping algorithms for whole genome bisulfite sequencing

USDA-ARS's Scientific Manuscript database

Coupling bisulfite conversion with next-generation sequencing (Bisulfite-seq) enables genome-wide measurement of DNA methylation, but poses unique challenges for mapping. However, despite a proliferation of Bisulfite-seq mapping tools, no systematic comparison of their genomic coverage and quantitat...
GenColors: annotation and comparative genomics of prokaryotes made easy.

PubMed

Romualdi, Alessandro; Felder, Marius; Rose, Dominic; Gausmann, Ulrike; Schilhabel, Markus; Glöckner, Gernot; Platzer, Matthias; Sühnel, Jürgen

2007-01-01

GenColors (gencolors.fli-leibniz.de) is a new web-based software/database system aimed at an improved and accelerated annotation of prokaryotic genomes considering information on related genomes and making extensive use of genome comparison. It offers a seamless integration of data from ongoing sequencing projects and annotated genomic sequences obtained from GenBank. A variety of export/import filters manages an effective data flow from sequence assembly and manipulation programs (e.g., GAP4) to GenColors and back as well as to standard GenBank file(s). The genome comparison tools include best bidirectional hits, gene conservation, syntenies, and gene core sets. Precomputed UniProt matches allow annotation and analysis in an effective manner. In addition to these analysis options, base-specific quality data (coverage and confidence) can also be handled if available. The GenColors system can be used both for annotation purposes in ongoing genome projects and as an analysis tool for finished genomes. GenColors comes in two types, as dedicated genome browsers and as the Jena Prokaryotic Genome Viewer (JPGV). Dedicated genome browsers contain genomic information on a set of related genomes and offer a large number of options for genome comparison. The system has been efficiently used in the genomic sequencing of Borrelia garinii and is currently applied to various ongoing genome projects on Borrelia, Legionella, Escherichia, and Pseudomonas genomes. One of these dedicated browsers, the Spirochetes Genome Browser (sgb.fli-leibniz.de) with Borrelia, Leptospira, and Treponema genomes, is freely accessible. The others will be released after finalization of the corresponding genome projects. JPGV (jpgv.fli-leibniz.de) offers information on almost all finished bacterial genomes, as compared to the dedicated browsers with reduced genome comparison functionality, however. 
As of January 2006, this viewer includes 632 genomic elements (e.g., chromosomes and plasmids) of 293 species. The system provides versatile quick and advanced search options for all currently known prokaryotic genomes and generates circular and linear genome plots. Gene information sheets contain basic gene information, database search options, and links to external databases. GenColors is also available on request for local installation.
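The GenColors record above lists best bidirectional hits among its genome comparison tools. The abstract does not say how GenColors computes them, so the following is only a minimal sketch of the general idea: two genes form a best bidirectional hit when each is the other's top-scoring match. The score dictionaries, the gene names, and both function names are hypothetical and are not taken from GenColors.

```python
from typing import Dict, List, Tuple

Scores = Dict[Tuple[str, str], float]  # (query gene, subject gene) -> similarity score


def best_hits(scores: Scores) -> Dict[str, str]:
    """Return, for every query gene, the subject gene with the highest score."""
    best: Dict[str, Tuple[str, float]] = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def best_bidirectional_hits(a_vs_b: Scores, b_vs_a: Scores) -> List[Tuple[str, str]]:
    """Pairs (a, b) where b is a's best hit and a is b's best hit."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)


# Hypothetical toy scores (for example, alignment bit scores).
a_vs_b = {("a1", "b1"): 200, ("a1", "b2"): 50, ("a2", "b2"): 120, ("a3", "b1"): 90}
b_vs_a = {("b1", "a1"): 195, ("b2", "a2"): 110, ("b2", "a1"): 40}

pairs = best_bidirectional_hits(a_vs_b, b_vs_a)
print(pairs)
```

In this toy input, gene a3 has b1 as its best hit, but b1's best hit is a1, so a3 is not reported. Tie handling and score thresholds are left out of the sketch on purpose.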
Origins and Domestication of Cultivated Banana Inferred from Chloroplast and Nuclear Genes

PubMed Central

Zhang, Cui; Wang, Xin-Feng; Shi, Feng-Xue; Chen, Wen-Na; Ge, Xue-Jun

2013-01-01

Background Cultivated bananas are large, vegetatively-propagated members of the genus Musa. More than 1,000 cultivars are grown worldwide and they are major economic and food resources in numerous developing countries. It has been suggested that cultivated bananas originated from the islands of Southeast Asia (ISEA) and have been developed through complex geodomestication pathways. However, the maternal and parental donors of most cultivars are unknown, and the pattern of nucleotide diversity in domesticated banana has not been fully resolved. Methodology/Principal Findings We studied the genetics of 16 cultivated and 18 wild Musa accessions using two single-copy nuclear (granule-bound starch synthase I, GBSS I, also known as Waxy, and alcohol dehydrogenase 1, Adh1) and two chloroplast (maturase K, matK, and the trnL-F gene cluster) genes. The results of phylogenetic analyses showed that all A-genome haplotypes of cultivated bananas were grouped together with those of ISEA subspecies of M. acuminata (A-genome). Similarly, the B- and S-genome haplotypes of cultivated bananas clustered with the wild species M. balbisiana (B-genome) and M. schizocarpa (S-genome), respectively. Notably, it has been shown that distinct haplotypes of each cultivar (A-genome group) were nested together to different ISEA subspecies M. acuminata. Analyses of nucleotide polymorphism in the Waxy and Adh1 genes revealed that, in comparison to the wild relatives, cultivated banana exhibited slightly lower nucleotide diversity both across all sites and specifically at silent sites. However, dramatically reduced nucleotide diversity was found at nonsynonymous sites for cultivated bananas. Conclusions/Significance Our study not only confirmed the origin of cultivated banana as arising from multiple intra- and inter-specific hybridization events, but also showed that cultivated banana may have not suffered a severe genetic bottleneck during the domestication process. 
Importantly, our findings suggested that multiple maternal origins and a reduction in nucleotide diversity at nonsynonymous sites are general attributes of cultivated bananas. PMID:24260405
Comparison of Various Nuclear Localization Signal-Fused Cas9 Proteins and Cas9 mRNA for Genome Editing in Zebrafish.

PubMed

Hu, Peinan; Zhao, Xueying; Zhang, Qinghua; Li, Weiming; Zu, Yao

2018-03-02

The clustered regularly interspaced short palindromic repeats (CRISPR)/Cas9 system has been proven to be an efficient and precise genome editing technology in various organisms. However, the gene editing efficiencies of Cas9 proteins with a nuclear localization signal (NLS) fused to different termini and Cas9 mRNA have not been systematically compared. Here, we compared the ability of Cas9 proteins with NLS fused to the N-, C-, or both the N- and C-termini and N-NLS-Cas9-NLS-C mRNA to target two sites in the tyr gene and two sites in the gol gene related to pigmentation in zebrafish. Phenotypic analysis revealed that all types of Cas9 led to hypopigmentation in similar proportions of injected embryos. Genome analysis by T7 Endonuclease I (T7E1) assays demonstrated that all types of Cas9 similarly induced mutagenesis in four target sites. Sequencing results further confirmed that a high frequency of indels occurred in the target sites (tyr1 > 66%, tyr2 > 73%, gol1 > 50%, and gol2 > 35%), as well as various types (more than six) of indel mutations observed in all four types of Cas9-injected embryos. Furthermore, all types of Cas9 showed efficient targeted mutagenesis on multiplex genome editing, resulting in multiple phenotypes simultaneously. Collectively, we conclude that various NLS-fused Cas9 proteins and Cas9 mRNAs have similar genome editing efficiencies on targeting single or multiple genes, suggesting that the efficiency of CRISPR/Cas9 genome editing is highly dependent on guide RNAs (gRNAs) and gene loci. These findings may help to simplify the selection of Cas9 for gene editing using the CRISPR/Cas9 system. Copyright © 2018 Hu et al.
Genome structure and emerging evidence of an incipient sex chromosome in Populus

DOE Office of Scientific and Technical Information (OSTI.GOV)

Yin, Tongming; DiFazio, Stephen P; Gunter, Lee E

The genus Populus consists of dioecious woody species with largely unknown genetic mechanisms for gender determination. We have discovered genetic and genomic features in the peritelomeric region of chromosome XIX that suggest this region of the Populus genome is in the process of developing characteristics of a sex chromosome. We have identified a gender-associated locus that consistently maps to this region. Furthermore, comparison of genetic maps across multiple Populus families reveals consistently distorted segregation within this region. We have intensively characterized this region using an F1 interspecific cross involving the female genotype that was used for genome sequencing. This region shows suppressed recombination and high divergence between the alternate haplotypes, as revealed by dense map-based genome assembly using microsatellite markers. The suppressed recombination, distorted segregation, and haplotype divergence were observed only for the maternal parent in this cross. Furthermore, the progeny of this cross showed a strongly male-biased sex ratio, in agreement with Haldane's rule that postulates that the heterogametic sex is more likely to be absent, rare, or sterile in interspecific crosses. Together, these results support the role of chromosome XIX in sex determination and suggest that sex determination in Populus occurs through a ZW system in which the female is the heterogametic gender.
Gamarada debralockiae gen. nov. sp. nov.-the genome of the most widespread Australian ericoid mycorrhizal fungus.

PubMed

Midgley, David J; Sutcliffe, Brodie; Greenfield, Paul; Tran-Dinh, Nai

2018-05-01

This study describes a novel ericoid mycorrhizal fungus (ErMF), Gamarada debralockiae Midgley and Tran-Dinh gen. nov. sp. nov. Additionally, catabolism was explored from a genomic perspective. The nuclear and mitochondrial genomes of G. debralockiae were sequenced. Morphological characteristics were assessed on various media. Catabolic genes of G. debralockiae were explored using SignalP and dbCAN. Phylogenetic comparisons were undertaken using Phylogeny.fr. The 58.5-Mbp draft genome of G. debralockiae contained 17,075 putative genes. The complete mitochondrial genome was 28,168 bp in length. In culture, G. debralockiae produces slow-growing non-sporulating colonies. Gamarada debralockiae has many putative secreted catabolic enzymes. Phylogeny indicated G. debralockiae was distinct from known ascomycetous ErMF: Pezoloma ericae, Meliniomyces spp., Oidiodendron spp., and Cairneyella variabilis. It is closely related to many undescribed plant root-associated fungi and its nearest described relative is Hyphodiscus brevicollaris. Gamarada debralockiae has been recovered from virtually all Australian ericoid mycorrhizal studies and biogeographic data suggests the taxon is widespread in Australia. Gamarada debralockiae has similar catabolic potential to C. variabilis and co-occurs with C. variabilis at Australian sites. Plants that host multiple ErMF may benefit from subtle differences in catabolism that improve access to nitrogen and phosphorus from within recalcitrant organic matter.
Mitochondrial genome evolution and tRNA truncation in Acariformes mites: new evidence from eriophyoid mites

PubMed Central

Xue, Xiao-Feng; Guo, Jing-Feng; Dong, Yan; Hong, Xiao-Yue; Shao, Renfu

2016-01-01

The subclass Acari (mites and ticks) comprises two super-orders: Acariformes and Parasitiformes. Most species of the Parasitiformes known retained the ancestral pattern of mitochondrial (mt) gene arrangement of arthropods, and their mt tRNAs have the typical cloverleaf structure. All of the species of the Acariformes known, however, have rearranged mt genomes and truncated mt tRNAs. We sequenced the mt genomes of two species of Eriophyoidea: Phyllocoptes taishanensis and Epitrimerus sabinae. The mt genomes of P. taishanensis and E. sabinae are 13,475 bp and 13,531 bp, respectively, are circular and contain the 37 genes typical of animals; most mt tRNAs are highly truncated in both mites. On the other hand, these two eriophyoid mites have the least rearranged mt genomes seen in the Acariformes. Comparison between eriophyoid mites and other Acariformes mites showed that: 1) the most recent common ancestor of Acariformes mites retained the ancestral pattern of mt gene arrangement of arthropods with slight modifications; 2) truncation of tRNAs for cysteine, phenylalanine and histidine occurred once in the most recent common ancestor of Acariformes mites whereas truncation of other tRNAs occurred multiple times; and 3) the placement of eriophyoid mites in the order Trombidiformes needs to be reviewed. PMID:26732998
Paternal Genome Elimination in Liposcelis Booklice (Insecta: Psocodea)

PubMed Central

Hodson, Christina N.; Hamilton, Phineas T.; Dilworth, Dave; Nelson, Chris J.; Curtis, Caitlin I.; Perlman, Steve J.

2017-01-01

How sex is determined in insects is diverse and dynamic, and includes male heterogamety, female heterogamety, and haplodiploidy. In many insect lineages, sex determination is either completely unknown or poorly studied. We studied sex determination in Psocodea—a species-rich order of insects that includes parasitic lice, barklice, and booklice. We focus on a recently discovered species of Liposcelis booklice (Psocodea: Troctomorpha), which are among the closest free-living relatives of parasitic lice. Using genetic, genomic, and immunohistochemical approaches, we show that this group exhibits paternal genome elimination (PGE), an unusual mode of sex determination that involves genomic imprinting. Controlled crosses, following a genetic marker over multiple generations, demonstrated that males only transmit to offspring genes they inherited from their mother. Immunofluorescence microscopy revealed densely packed chromocenters associated with H3K9me3—a conserved marker for heterochromatin—in males, but not in females, suggesting silencing of chromosomes in males. Genome assembly and comparison of read coverage in male and female libraries showed no evidence for differentiated sex chromosomes. We also found that females produce more sons early in life, consistent with facultative sex allocation. It is likely that PGE is widespread in Psocodea, including human lice. This order represents a promising model for studying this enigmatic mode of sex determination. PMID:28292917
Single-Cell Whole-Genome Amplification and Sequencing: Methodology and Applications.

PubMed

Huang, Lei; Ma, Fei; Chapman, Alec; Lu, Sijia; Xie, Xiaoliang Sunney

2015-01-01

We present a survey of single-cell whole-genome amplification (WGA) methods, including degenerate oligonucleotide-primed polymerase chain reaction (DOP-PCR), multiple displacement amplification (MDA), and multiple annealing and looping-based amplification cycles (MALBAC). The key parameters to characterize the performance of these methods are defined, including genome coverage, uniformity, reproducibility, unmappable rates, chimera rates, allele dropout rates, false positive rates for calling single-nucleotide variations, and ability to call copy-number variations. Using these parameters, we compare five commercial WGA kits by performing deep sequencing of multiple single cells. We also discuss several major applications of single-cell genomics, including studies of whole-genome de novo mutation rates, the early evolution of cancer genomes, circulating tumor cells (CTCs), meiotic recombination of germ cells, preimplantation genetic diagnosis (PGD), and preimplantation genomic screening (PGS) for in vitro-fertilized embryos.
Statistical Methods in Integrative Genomics

PubMed Central

Richardson, Sylvia; Tseng, George C.; Sun, Wei

2016-01-01

Statistical methods in integrative genomics aim to answer important biology questions by jointly analyzing multiple types of genomic data (vertical integration) or aggregating the same type of data across multiple studies (horizontal integration). In this article, we introduce different types of genomic data and data resources, and then review statistical methods of integrative genomics, with emphasis on the motivation and rationale of these methods. We conclude with some summary points and future research directions. PMID:27482531
Genome of an arbuscular mycorrhizal fungus provides insight into the oldest plant symbiosis.

PubMed

Tisserant, Emilie; Malbreil, Mathilde; Kuo, Alan; Kohler, Annegret; Symeonidi, Aikaterini; Balestrini, Raffaella; Charron, Philippe; Duensing, Nina; Frei dit Frey, Nicolas; Gianinazzi-Pearson, Vivienne; Gilbert, Luz B; Handa, Yoshihiro; Herr, Joshua R; Hijri, Mohamed; Koul, Raman; Kawaguchi, Masayoshi; Krajinski, Franziska; Lammers, Peter J; Masclaux, Frederic G; Murat, Claude; Morin, Emmanuelle; Ndikumana, Steve; Pagni, Marco; Petitpierre, Denis; Requena, Natalia; Rosikiewicz, Pawel; Riley, Rohan; Saito, Katsuharu; San Clemente, Hélène; Shapiro, Harris; van Tuinen, Diederik; Bécard, Guillaume; Bonfante, Paola; Paszkowski, Uta; Shachar-Hill, Yair Y; Tuskan, Gerald A; Young, J Peter W; Young, Peter W; Sanders, Ian R; Henrissat, Bernard; Rensing, Stefan A; Grigoriev, Igor V; Corradi, Nicolas; Roux, Christophe; Martin, Francis

2013-12-10

The mutualistic symbiosis involving Glomeromycota, a distinctive phylum of early diverging Fungi, is widely hypothesized to have promoted the evolution of land plants during the middle Paleozoic. These arbuscular mycorrhizal fungi (AMF) perform vital functions in the phosphorus cycle that are fundamental to sustainable crop plant productivity. The unusual biological features of AMF have long fascinated evolutionary biologists. The coenocytic hyphae host a community of hundreds of nuclei and reproduce clonally through large multinucleated spores. It has been suggested that the AMF maintain a stable assemblage of several different genomes during the life cycle, but this genomic organization has been questioned. Here we introduce the 153-Mb haploid genome of Rhizophagus irregularis and its repertoire of 28,232 genes. The observed low level of genome polymorphism (0.43 SNP per kb) is not consistent with the occurrence of multiple, highly diverged genomes. The expansion of mating-related genes suggests the existence of cryptic sex-related processes. A comparison of gene categories confirms that R. irregularis is close to the Mucoromycotina. The AMF obligate biotrophy is not explained by genome erosion or any related loss of metabolic complexity in central metabolism, but is marked by a lack of genes encoding plant cell wall-degrading enzymes and of genes involved in toxin and thiamine synthesis. A battery of mycorrhiza-induced secreted proteins is expressed in symbiotic tissues. The present comprehensive repertoire of R. irregularis genes provides a basis for future research on symbiosis-related mechanisms in Glomeromycota.
Development of a real-time PCR for detection of Staphylococcus pseudintermedius using a novel automated comparison of whole-genome sequences.

PubMed

Verstappen, Koen M; Huijbregts, Loes; Spaninks, Mirlin; Wagenaar, Jaap A; Fluit, Ad C; Duim, Birgitta

2017-01-01

Staphylococcus pseudintermedius is an opportunistic pathogen in dogs and cats and occasionally causes infections in humans. S. pseudintermedius is often resistant to multiple classes of antimicrobials. It requires a reliable detection so that it is not misidentified as S. aureus. Phenotypic and currently-used molecular-based diagnostic assays lack specificity or are labour-intensive using multiplex PCR or nucleic acid sequencing. The aim of this study was to identify a specific target for real-time PCR by comparing whole genome sequences of S. pseudintermedius and non-pseudintermedius. Genome sequences were downloaded from public repositories and supplemented by isolates that were sequenced in this study. A Perl-script was written that analysed 300-nt fragments from a reference genome sequence of S. pseudintermedius and checked if this sequence was present in other S. pseudintermedius genomes (n = 74) and non-pseudintermedius genomes (n = 138). Six sequences specific for S. pseudintermedius were identified (sequence length between 300-500 nt). One sequence, which was located in the spsJ gene, was used to develop primers and a probe. The real-time PCR showed 100% specificity when testing for S. pseudintermedius isolates (n = 54), and eight other staphylococcal species (n = 43). In conclusion, a novel approach by comparing whole genome sequences identified a sequence that is specific for S. pseudintermedius and provided a real-time PCR target for rapid and reliable detection of S. pseudintermedius.
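The fragment screen described in the record above (cut a reference genome into fixed-length fragments, keep those found in every target genome and in no non-target genome) can be illustrated as follows. This is not the authors' Perl script: the sketch assumes exact substring matching on one strand, whereas the original may well have used a similarity search, and the function name, parameters, and toy sequences are all hypothetical.

```python
from typing import List, Sequence, Tuple


def specific_fragments(
    reference: str,
    targets: Sequence[str],
    non_targets: Sequence[str],
    size: int = 300,
    step: int = 300,
) -> List[Tuple[int, str]]:
    """Return (start, fragment) for reference windows that occur in all
    target genomes and in none of the non-target genomes.

    Assumption: presence means an exact substring match on the given strand.
    """
    hits: List[Tuple[int, str]] = []
    for start in range(0, len(reference) - size + 1, step):
        fragment = reference[start:start + size]
        in_all_targets = all(fragment in genome for genome in targets)
        in_any_non_target = any(fragment in genome for genome in non_targets)
        if in_all_targets and not in_any_non_target:
            hits.append((start, fragment))
    return hits


# Toy sequences with a 4-nt window so the behaviour is easy to follow.
reference = "ACGTTTGACCGA"
targets = ["GGACGTTTGACCGA", "ACGTAATTGACC"]
non_targets = ["TTACGTCC", "GGGGCCCC"]

candidates = specific_fragments(reference, targets, non_targets, size=4, step=4)
print(candidates)
```

In the toy data the first window is rejected because it also occurs in a non-target sequence, and the last window is rejected because one target lacks it. A realistic screen would also need to consider the reverse complement and tolerate mismatches.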
Bursts of retrotransposition reproduced in Arabidopsis.

PubMed

Tsukahara, Sayuri; Kobayashi, Akie; Kawabe, Akira; Mathieu, Olivier; Miura, Asuka; Kakutani, Tetsuji

2009-09-17

Retrotransposons, which proliferate by reverse transcription of RNA intermediates, comprise a major portion of plant genomes. Plants often change the genome size and organization during evolution by rapid proliferation and deletion of long terminal repeat (LTR) retrotransposons. Precise transposon sequences throughout the Arabidopsis thaliana genome and the trans-acting mutations affecting epigenetic states make it an ideal model organism with which to study transposon dynamics. Here we report the mobilization of various families of endogenous A. thaliana LTR retrotransposons identified through genetic and genomic approaches with high-resolution genomic tiling arrays and mutants in the chromatin-remodelling gene DDM1 (DECREASE IN DNA METHYLATION 1). Using multiple lines of self-pollinated ddm1 mutant, we detected an increase in copy number, and verified this for various retrotransposons in a gypsy family (ATGP3) and copia families (ATCOPIA13, ATCOPIA21, ATCOPIA93), and also for a DNA transposon of a Mutator family, VANDAL21. A burst of retrotransposition occurred stochastically and independently for each element, suggesting an additional autocatalytic process. Furthermore, comparison of the identified LTR retrotransposons in related Arabidopsis species revealed that a lineage-specific burst of retrotransposition of these elements did indeed occur in natural Arabidopsis populations. The recent burst of retrotransposition in natural population is targeted to centromeric repeats, which is presumably less harmful than insertion into genes. The ddm1-induced retrotransposon proliferations and genome rearrangements mimic the transposon-mediated genome dynamics during evolution and provide experimental systems with which to investigate the controlling molecular factors directly.
Genome of an arbuscular mycorrhizal fungus provides insight into the oldest plant symbiosis

PubMed Central

Tisserant, Emilie; Malbreil, Mathilde; Kuo, Alan; Kohler, Annegret; Symeonidi, Aikaterini; Balestrini, Raffaella; Charron, Philippe; Duensing, Nina; Frei dit Frey, Nicolas; Gianinazzi-Pearson, Vivienne; Gilbert, Luz B.; Handa, Yoshihiro; Herr, Joshua R.; Hijri, Mohamed; Koul, Raman; Kawaguchi, Masayoshi; Krajinski, Franziska; Lammers, Peter J.; Masclaux, Frederic G.; Murat, Claude; Morin, Emmanuelle; Ndikumana, Steve; Pagni, Marco; Petitpierre, Denis; Requena, Natalia; Rosikiewicz, Pawel; Riley, Rohan; Saito, Katsuharu; San Clemente, Hélène; Shapiro, Harris; van Tuinen, Diederik; Bécard, Guillaume; Bonfante, Paola; Paszkowski, Uta; Shachar-Hill, Yair Y.; Tuskan, Gerald A.; Young, J. Peter W.; Sanders, Ian R.; Henrissat, Bernard; Rensing, Stefan A.; Grigoriev, Igor V.; Corradi, Nicolas; Roux, Christophe; Martin, Francis

2013-01-01

The mutualistic symbiosis involving Glomeromycota, a distinctive phylum of early diverging Fungi, is widely hypothesized to have promoted the evolution of land plants during the middle Paleozoic. These arbuscular mycorrhizal fungi (AMF) perform vital functions in the phosphorus cycle that are fundamental to sustainable crop plant productivity. The unusual biological features of AMF have long fascinated evolutionary biologists. The coenocytic hyphae host a community of hundreds of nuclei and reproduce clonally through large multinucleated spores. It has been suggested that the AMF maintain a stable assemblage of several different genomes during the life cycle, but this genomic organization has been questioned. Here we introduce the 153-Mb haploid genome of Rhizophagus irregularis and its repertoire of 28,232 genes. The observed low level of genome polymorphism (0.43 SNP per kb) is not consistent with the occurrence of multiple, highly diverged genomes. The expansion of mating-related genes suggests the existence of cryptic sex-related processes. A comparison of gene categories confirms that R. irregularis is close to the Mucoromycotina. The AMF obligate biotrophy is not explained by genome erosion or any related loss of metabolic complexity in central metabolism, but is marked by a lack of genes encoding plant cell wall-degrading enzymes and of genes involved in toxin and thiamine synthesis. A battery of mycorrhiza-induced secreted proteins is expressed in symbiotic tissues. The present comprehensive repertoire of R. irregularis genes provides a basis for future research on symbiosis-related mechanisms in Glomeromycota. PMID:24277808
Assembly and comparison of two closely related Brassica napus genomes.

PubMed

Bayer, Philipp E; Hurgobin, Bhavna; Golicz, Agnieszka A; Chan, Chon-Kit Kenneth; Yuan, Yuxuan; Lee, HueyTyng; Renton, Michael; Meng, Jinling; Li, Ruiyuan; Long, Yan; Zou, Jun; Bancroft, Ian; Chalhoub, Boulos; King, Graham J; Batley, Jacqueline; Edwards, David

2017-12-01

As an increasing number of plant genome sequences become available, it is clear that gene content varies between individuals, and the challenge arises to predict the gene content of a species. However, genome comparison is often confounded by variation in assembly and annotation. Differentiating between true gene absence and variation in assembly or annotation is essential for the accurate identification of conserved and variable genes in a species. Here, we present the de novo assembly of the B. napus cultivar Tapidor and comparison with an improved assembly of the Brassica napus cultivar Darmor-bzh. Both cultivars were annotated using the same method to allow comparison of gene content. We identified genes unique to each cultivar and differentiate these from artefacts due to variation in the assembly and annotation. We demonstrate that using a common annotation pipeline can result in different gene predictions, even for closely related cultivars, and repeat regions which collapse during assembly impact whole genome comparison. After accounting for differences in assembly and annotation, we demonstrate that the genome of Darmor-bzh contains a greater number of genes than the genome of Tapidor. Our results are the first step towards comparison of the true differences between B. napus genomes and highlight the potential sources of error in future production of a B. napus pangenome. © 2017 The Authors. Plant Biotechnology Journal published by Society for Experimental Biology and The Association of Applied Biologists and John Wiley & Sons Ltd.
Multiplexed fragaria chloroplast genome sequencing

Treesearch

W. Njuguna; A. Liston; R. Cronn; N.V. Bassil

2010-01-01

A method to sequence multiple chloroplast genomes using ultra high throughput sequencing technologies was recently described. Complete chloroplast genome sequences can resolve phylogenetic relationships at low taxonomic levels and identify informative point mutations and indels. The objective of this research was to sequence multiple Fragaria...

The invasive MED/Q Bemisia tabaci genome: a tale of gene loss and gene gain.

PubMed

Xie, Wen; Yang, Xin; Chen, Chunhai; Yang, Zezhong; Guo, Litao; Wang, Dan; Huang, Jinqun; Zhang, Hailin; Wen, Yanan; Zhao, Jinyang; Wu, Qingjun; Wang, Shaoli; Coates, Brad S; Zhou, Xuguo; Zhang, Youjun

2018-01-22

Sweetpotato whitefly, Bemisia tabaci MED/Q and MEAM1/B, are two economically important invasive species that cause considerable damages to agriculture crops through direct feeding and indirect vectoring of plant pathogens. Recently, a draft genome of B. tabaci MED/Q has been assembled. In this study, we focus on the genomic comparison between MED/Q and MEAM1/B, with a special interest in MED/Q's genomic signatures that may contribute to the highly invasive nature of this emerging insect pest. The genomes of both species share similarity in syntenic blocks, but have significant divergence in the gene coding sequence. Expansion of cytochrome P450 monooxygenases and UDP glycosyltransferases in MED/Q and MEAM1/B genome is functionally validated for mediating insecticide resistance in MED/Q using in vivo RNAi. The amino acid biosynthesis pathways in MED/Q genome are partitioned among the host and endosymbiont genomes in a manner distinct from other hemipterans. Evidence of horizontal gene transfer to the host genome may explain their obligate relationship. Putative loss-of-function in the immune deficiency-signaling pathway due to the gene loss is a shared ancestral trait among hemipteran insects. The expansion of detoxification genes families, such as P450s, may contribute to the development of insecticide resistance traits and a broad host range in MED/Q and MEAM1/B, and facilitate species' invasions into intensively managed cropping systems. Numerical and compositional changes in multiple gene families (gene loss and gene gain) in the MED/Q genome sets a foundation for future hypothesis testing that will advance our understanding of adaptation, viral transmission, symbiosis, and plant-insect-pathogen tritrophic interactions.
Novel Loci for Metabolic Networks and Multi-Tissue Expression Studies Reveal Genes for Atherosclerosis

PubMed Central

Inouye, Michael; Ripatti, Samuli; Kettunen, Johannes; Lyytikäinen, Leo-Pekka; Oksala, Niku; Laurila, Pirkka-Pekka; Kangas, Antti J.; Soininen, Pasi; Savolainen, Markku J.; Viikari, Jorma; Kähönen, Mika; Perola, Markus; Salomaa, Veikko; Raitakari, Olli; Lehtimäki, Terho; Taskinen, Marja-Riitta; Järvelin, Marjo-Riitta; Ala-Korpela, Mika; Palotie, Aarno; de Bakker, Paul I. W.

2012-01-01

Association testing of multiple correlated phenotypes offers better power than univariate analysis of single traits. We analyzed 6,600 individuals from two population-based cohorts with both genome-wide SNP data and serum metabolomic profiles. From the observed correlation structure of 130 metabolites measured by nuclear magnetic resonance, we identified 11 metabolic networks and performed a multivariate genome-wide association analysis. We identified 34 genomic loci at genome-wide significance, of which 7 are novel. In comparison to univariate tests, multivariate association analysis identified nearly twice as many significant associations in total. Multi-tissue gene expression studies identified variants in our top loci, SERPINA1 and AQP9, as eQTLs and showed that SERPINA1 and AQP9 expression in human blood was associated with metabolites from their corresponding metabolic networks. Finally, liver expression of AQP9 was associated with atherosclerotic lesion area in mice, and in human arterial tissue both SERPINA1 and AQP9 were shown to be upregulated (6.3-fold and 4.6-fold, respectively) in atherosclerotic plaques. Our study illustrates the power of multi-phenotype GWAS and highlights candidate genes for atherosclerosis. PMID:22916037
The variable genomic architecture of isolation between hybridizing species of house mice.

PubMed

Teeter, Katherine C; Thibodeau, Lisa M; Gompert, Zachariah; Buerkle, C Alex; Nachman, Michael W; Tucker, Priscilla K

2010-02-01

Studies of the genetics of hybrid zones can provide insight into the genomic architecture of species boundaries. By examining patterns of introgression of multiple loci across a hybrid zone, it may be possible to identify regions of the genome that have experienced selection. Here, we present a comparison of introgression in two replicate transects through the house mouse hybrid zone through central Europe, using data from 41 single nucleotide markers. Using both genomic and geographic clines, we found many differences in patterns of introgression between the two transects, as well as some similarities. We found that many loci may have experienced the effects of selection at linked sites, including selection against hybrid genotypes, as well as positive selection in the form of genotypes introgressed into a foreign genetic background. We also found many positive associations of conspecific alleles among unlinked markers, which could be caused by epistatic interactions. Different patterns of introgression in the two transects highlight the challenge of using hybrid zones to identify genes underlying isolation and raise the possibility that the genetic basis of isolation between these species may be dependent on the local population genetic make-up or the local ecological setting.
Comparison of genome-wide selection strategies to identify furfural tolerance genes in Escherichia coli.

PubMed

Glebes, Tirzah Y; Sandoval, Nicholas R; Gillis, Jacob H; Gill, Ryan T

2015-01-01

Engineering both feedstock and product tolerance is important for transitioning towards next-generation biofuels derived from renewable sources. Tolerance to chemical inhibitors typically results in complex phenotypes, for which multiple genetic changes must often be made to confer tolerance. Here, we performed a genome-wide search for furfural-tolerant alleles using the TRackable Multiplex Recombineering (TRMR) method (Warner et al. (2010), Nature Biotechnology), which uses chromosomally integrated mutations directed towards increased or decreased expression of virtually every gene in Escherichia coli. We employed various growth selection strategies to assess the role of selection design towards growth enrichments. We also compared genes with increased fitness from our TRMR selection to those from a previously reported genome-wide identification study of furfural tolerance genes using a plasmid-based genomic library approach (Glebes et al. (2014) PLOS ONE). In several cases, growth improvements were observed for the chromosomally integrated promoter/RBS mutations but not for the plasmid-based overexpression constructs. Through this assessment, four novel tolerance genes, ahpC, yhjH, rna, and dicA, were identified and confirmed for their effect on improving growth in the presence of furfural. © 2014 Wiley Periodicals, Inc.
Bluejay 1.0: genome browsing and comparison with rich customization provision and dynamic resource linking

PubMed Central

Soh, Jung; Gordon, Paul MK; Taschuk, Morgan L; Dong, Anguo; Ah-Seng, Andrew C; Turinsky, Andrei L; Sensen, Christoph W

2008-01-01

Background The Bluejay genome browser has been developed over several years to address the challenges posed by the ever increasing number of data types as well as the increasing volume of data in genome research. Beginning with a browser capable of rendering views of XML-based genomic information and providing scalable vector graphics output, we have now completed version 1.0 of the system with many additional features. Our development efforts were guided by our observation that biologists who use both gene expression profiling and comparative genomics gain functional insights above and beyond those provided by traditional per-gene analyses. Results Bluejay 1.0 is a genome viewer integrating genome annotation with: (i) gene expression information; and (ii) comparative analysis with an unlimited number of other genomes in the same view. This allows the biologist to see a gene not just in the context of its genome, but also its regulation and its evolution. Bluejay now has rich provision for personalization by users: (i) numerous display customization features; (ii) the availability of waypoints for marking multiple points of interest on a genome and subsequently utilizing them; and (iii) the ability to take user relevance feedback of annotated genes or textual items to offer personalized recommendations. Bluejay 1.0 also embeds the Seahawk browser for the Moby protocol, enabling users to seamlessly invoke hundreds of Web Services on genomic data of interest without any hard-coding. Conclusion Bluejay offers a unique set of customizable genome-browsing features, with the goal of allowing biologists to quickly focus on, analyze, compare, and retrieve related information on the parts of the genomic data they are most interested in. We expect these capabilities of Bluejay to benefit the many biologists who want to answer complex questions using the information available from completely sequenced genomes. PMID:18940007
Detection of genomic signatures of recent selection in commercial broiler chickens.

PubMed

Fu, Weixuan; Lee, William R; Abasht, Behnam

2016-08-26

Identification of the genomic signatures of recent selection may help uncover causal polymorphisms controlling traits relevant to recent decades of selective breeding in livestock. In this study, we aimed at detecting signatures of recent selection in commercial broiler chickens using genotype information from single nucleotide polymorphisms (SNPs). A total of 565 chickens from five commercial purebred lines, including three broiler sire (male) lines and two broiler dam (female) lines, were genotyped using the 60K SNP Illumina iSelect chicken array. To detect genomic signatures of recent selection, we applied two methods based on population comparison, cross-population extended haplotype homozygosity (XP-EHH) and cross-population composite likelihood ratio (XP-CLR), and further analyzed the results to find genomic regions under recent selection in multiple purebred lines. A total of 321 candidate selection regions spanning approximately 1.45% of the chicken genome in each line were detected by consensus of results of both XP-EHH and XP-CLR methods. To minimize false discovery due to genetic drift, only 42 of the candidate selection regions that were shared by 2 or more purebred lines were considered as high-confidence selection regions in the study. Of these 42 regions, 20 were 50 kb or less while 4 regions were larger than 0.5 Mb. In total, 91 genes could be found in the 42 regions, among which 19 regions contained only 1 or 2 genes, and 9 regions were located at gene deserts. Our results provide a genome-wide scan of recent selection signatures in five purebred lines of commercial broiler chickens. We found several candidate genes for recent selection in multiple lines, such as SOX6 (Sex Determining Region Y-Box 6) and cTR (Thyroid hormone receptor beta). These genes may have been under recent selection due to their essential roles in growth, development and reproduction in chickens. Furthermore, our results suggest that in some candidate regions, the same or opposite alleles have been under recent selection in multiple lines. Most of the candidate genes in the selection regions are novel, and as such they should be of great interest for future research into the genetic architecture of traits relevant to modern broiler breeding.
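The filtering step described in this abstract (keeping only candidate regions shared by two or more purebred lines) can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: the abstract does not say how "shared" was defined, so interval overlap on the same chromosome is an assumption here, and the `Region` type, line names, and coordinates are all hypothetical toy values.

```python
from collections import namedtuple

# Hypothetical region record; coordinates are half-open [start, end).
Region = namedtuple("Region", ["chrom", "start", "end"])


def overlaps(a, b):
    """True if two regions lie on the same chromosome and share at least one base."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def shared_regions(regions_by_line, min_lines=2):
    """Return (line, region) pairs whose region overlaps candidates in enough lines.

    ASSUMPTION: 'shared' means overlapping; the source abstract does not define it.
    """
    kept = []
    for line, regions in regions_by_line.items():
        for region in regions:
            # Count this line plus every other line with an overlapping candidate.
            n_lines = 1 + sum(
                any(overlaps(region, other) for other in others)
                for other_line, others in regions_by_line.items()
                if other_line != line
            )
            if n_lines >= min_lines:
                kept.append((line, region))
    return kept


# Toy candidate regions for three made-up lines.
candidates = {
    "sire_A": [Region("chr1", 100, 200), Region("chr2", 50, 80)],
    "sire_B": [Region("chr1", 150, 260)],
    "dam_A": [Region("chr3", 10, 40)],
}
high_confidence = shared_regions(candidates)
```

In this toy input only the two overlapping chr1 regions survive the filter; regions seen in a single line are dropped, which mirrors the abstract's rationale of reducing false discoveries caused by drift.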
Functional equivalency inferred from "authoritative sources" in networks of homologous proteins.

PubMed

Natarajan, Shreedhar; Jakobsson, Eric

2009-06-12

A one-on-one mapping of protein functionality across different species is a critical component of comparative analysis. This paper presents a heuristic algorithm for discovering the Most Likely Functional Counterparts (MoLFunCs) of a protein, based on simple concepts from network theory. A key feature of our algorithm is utilization of the user's knowledge to assign high confidence to selected functional identification. We show use of the algorithm to retrieve functional equivalents for 7 membrane proteins, from an exploration of almost 40 genomes from multiple online resources. We verify the functional equivalency of our dataset through a series of tests that include sequence, structure and function comparisons. Comparison is made to the OMA methodology, which also identifies one-on-one mapping between proteins from different species. Based on that comparison, we believe that incorporation of user's knowledge as a key aspect of the technique adds value to purely statistical formal methods.
Functional Equivalency Inferred from “Authoritative Sources” in Networks of Homologous Proteins

PubMed Central

Natarajan, Shreedhar; Jakobsson, Eric

2009-01-01

A one-on-one mapping of protein functionality across different species is a critical component of comparative analysis. This paper presents a heuristic algorithm for discovering the Most Likely Functional Counterparts (MoLFunCs) of a protein, based on simple concepts from network theory. A key feature of our algorithm is utilization of the user's knowledge to assign high confidence to selected functional identification. We show use of the algorithm to retrieve functional equivalents for 7 membrane proteins, from an exploration of almost 40 genomes from multiple online resources. We verify the functional equivalency of our dataset through a series of tests that include sequence, structure and function comparisons. Comparison is made to the OMA methodology, which also identifies one-on-one mapping between proteins from different species. Based on that comparison, we believe that incorporation of user's knowledge as a key aspect of the technique adds value to purely statistical formal methods. PMID:19521530
DL-ADR: a novel deep learning model for classifying genomic variants into adverse drug reactions.

PubMed

Liang, Zhaohui; Huang, Jimmy Xiangji; Zeng, Xing; Zhang, Gang

2016-08-10

Genomic variations are associated with the metabolism and the occurrence of adverse reactions of many therapeutic agents. The polymorphisms at over 2000 locations of cytochrome P450 enzymes (CYP), due to many factors such as ethnicity, mutations, and inheritance, contribute to the diversity of response and side effects of various drugs. The associations among single nucleotide polymorphisms (SNPs), internal pharmacokinetic patterns and vulnerability to specific adverse reactions have become one of the research interests of pharmacogenomics. Conventional genome-wide association studies (GWAS) mainly focus on the relation of single or multiple SNPs to a specific risk factor, which is a one-to-many relation. However, there are no robust methods to establish a many-to-many network which can combine the direct and indirect associations between multiple SNPs and a series of events (e.g. adverse reactions, metabolic patterns, prognostic factors etc.). In this paper, we present a novel deep learning model based on generative stochastic networks and a hidden Markov chain to classify the observed samples with SNPs on five loci of two genes (CYP2D6 and CYP1A2) respectively to the vulnerable population of 14 types of adverse reactions. A supervised deep learning model is proposed in this study. A revised generative stochastic network (GSN) model whose transitions are driven by the hidden Markov chain is used. The data of the training set are collected from clinical observation. The training set is composed of 83 observations of blood samples with the genotypes respectively on CYP2D6*2, *10, *14 and CYP1A2*1C, *1F. The samples are genotyped by the polymerase chain reaction (PCR) method. A hidden Markov chain is used as the transition operator to simulate the probabilistic distribution. The model can perform learning at lower cost compared to the conventional maximal likelihood method because the transition distribution is conditional on the previous state of the hidden Markov chain. A least square loss (LASSO) algorithm and a k-Nearest Neighbors (kNN) algorithm are used as the baselines for comparison and to evaluate the performance of our proposed deep learning model. There are 53 adverse reactions reported during the observation. They are assigned to 14 categories. In the comparison of classification accuracy, the deep learning model shows superiority over the LASSO and kNN models with a rate over 80%. In the comparison of reliability, the deep learning model shows the best stability among the three models. Machine learning provides a new method to explore the complex associations among genomic variations and multiple events in pharmacogenomics studies. The new deep learning algorithm is capable of classifying various SNPs to the corresponding adverse reactions. We expect that as more genomic variations are added as features and more observations are made, the deep learning model can improve its performance and can act as a black-box but reliable verifier for other GWAS studies.
A brief introduction to web-based genome browsers.

PubMed

Wang, Jun; Kong, Lei; Gao, Ge; Luo, Jingchu

2013-03-01

A genome browser provides a graphical interface for users to browse, search, retrieve and analyze genomic sequence and annotation data. Web-based genome browsers can be classified into general genome browsers covering multiple species and species-specific genome browsers. In this review, we attempt to give an overview of the main functions and features of web-based genome browsers, covering data visualization, retrieval, analysis and customization. To give a brief introduction to the multiple-species genome browser, we describe the user interface and main functions of the Ensembl and UCSC genome browsers using the human alpha-globin gene cluster as an example. We further use the MSU and the Rice-Map genome browsers to show some special features of species-specific genome browsers, taking the rice transcription factor gene OsSPL14 as an example.
Determination and analysis of the genome sequence of Spodoptera littoralis multiple nucleopolyhedrovirus

USDA-ARS's Scientific Manuscript database

The Spodoptera littoralis multiple nucleopolyhedrovirus (SpliMNPV), a pathogen of the Egyptian cotton leaf worm Spodoptera littoralis, was subjected to sequencing of its entire DNA genome and bioassay analysis comparing its virulence to that of other baculoviruses. The annotated SpliMNPV genome of...
GTF2IRD2 is located in the Williams–Beuren syndrome critical region 7q11.23 and encodes a protein with two TFII-I-like helix–loop–helix repeats

PubMed Central

Makeyev, Aleksandr V.; Erdenechimeg, Lkhamsuren; Mungunsukh, Ognoon; Roth, Jutta J.; Enkhmandakh, Badam; Ruddle, Frank H.; Bayarsaihan, Dashzeveg

2004-01-01

Williams–Beuren syndrome (also known as Williams syndrome) is caused by a deletion of a 1.55- to 1.84-megabase region from chromosome band 7q11.23. GTF2IRD1 and GTF2I, located within this critical region, encode proteins of the TFII-I family with multiple helix–loop–helix domains known as I repeats. In the present work, we characterize a third member, GTF2IRD2, which has sequence and structural similarity to the GTF2I and GTF2IRD1 paralogs. The ORF encodes a protein with several features characteristic of regulatory factors, including two I repeats, two leucine zippers, and a single Cys-2/His-2 zinc finger. The genomic organization of human, baboon, rat, and mouse genes is well conserved. Our exon-by-exon comparison has revealed that GTF2IRD2 is more closely related to GTF2I than to GTF2IRD1 and apparently is derived from the GTF2I sequence. The comparison of GTF2I and GTF2IRD2 genes revealed two distinct regions of homology, indicating that the helix–loop–helix domain structure of the GTF2IRD2 gene has been generated by two independent genomic duplications. We speculate that GTF2I is derived from GTF2IRD1 as a result of local duplication and the further evolution of its structure was associated with its functional specialization. Comparison of genomic sequences surrounding GTF2IRD2 genes in mice and humans allows refinement of the centromeric breakpoint position of the primate-specific inversion within the Williams–Beuren syndrome critical region. PMID:15243160
GTF2IRD2 is located in the Williams-Beuren syndrome critical region 7q11.23 and encodes a protein with two TFII-I-like helix-loop-helix repeats.

PubMed

Makeyev, Aleksandr V; Erdenechimeg, Lkhamsuren; Mungunsukh, Ognoon; Roth, Jutta J; Enkhmandakh, Badam; Ruddle, Frank H; Bayarsaihan, Dashzeveg

2004-07-27

Williams-Beuren syndrome (also known as Williams syndrome) is caused by a deletion of a 1.55- to 1.84-megabase region from chromosome band 7q11.23. GTF2IRD1 and GTF2I, located within this critical region, encode proteins of the TFII-I family with multiple helix-loop-helix domains known as I repeats. In the present work, we characterize a third member, GTF2IRD2, which has sequence and structural similarity to the GTF2I and GTF2IRD1 paralogs. The ORF encodes a protein with several features characteristic of regulatory factors, including two I repeats, two leucine zippers, and a single Cys-2/His-2 zinc finger. The genomic organization of human, baboon, rat, and mouse genes is well conserved. Our exon-by-exon comparison has revealed that GTF2IRD2 is more closely related to GTF2I than to GTF2IRD1 and apparently is derived from the GTF2I sequence. The comparison of GTF2I and GTF2IRD2 genes revealed two distinct regions of homology, indicating that the helix-loop-helix domain structure of the GTF2IRD2 gene has been generated by two independent genomic duplications. We speculate that GTF2I is derived from GTF2IRD1 as a result of local duplication and the further evolution of its structure was associated with its functional specialization. Comparison of genomic sequences surrounding GTF2IRD2 genes in mice and humans allows refinement of the centromeric breakpoint position of the primate-specific inversion within the Williams-Beuren syndrome critical region.
Phenotypic diversification by enhanced genome restructuring after induction of multiple DNA double-strand breaks.

PubMed

Muramoto, Nobuhiko; Oda, Arisa; Tanaka, Hidenori; Nakamura, Takahiro; Kugou, Kazuto; Suda, Kazuki; Kobayashi, Aki; Yoneda, Shiori; Ikeuchi, Akinori; Sugimoto, Hiroki; Kondo, Satoshi; Ohto, Chikara; Shibata, Takehiko; Mitsukawa, Norihiro; Ohta, Kunihiro

2018-05-18

DNA double-strand break (DSB)-mediated genome rearrangements are assumed to provide diverse raw genetic materials enabling accelerated adaptive evolution; however, the consequences of massive simultaneous DSB formation in cells and the resulting phenotypic impact remain unclear. Here, we establish an artificial genome-restructuring technology by conditionally introducing multiple genomic DSBs in vivo using a temperature-dependent endonuclease TaqI. Application in yeast and Arabidopsis thaliana generates strains with phenotypes, including improved ethanol production from xylose at higher temperature and increased plant biomass, that are stably inherited by offspring after multiple passages. High-throughput genome resequencing revealed that these strains harbor diverse rearrangements, including copy number variations, translocations in retrotransposons, and direct end-joinings at TaqI-cleavage sites. Furthermore, large-scale rearrangements occur frequently in diploid yeasts (28.1%) and tetraploid plants (46.3%), whereas haploid yeasts and diploid plants undergo minimal rearrangement. This genome-restructuring system (TAQing system) will enable rapid genome breeding and aid genome-evolution studies.
Evolutionary conservation, diversity and specificity of LTR retrotransposons in flowering plants: insights from genome-wide analysis and multi-specific comparison

USDA-ARS's Scientific Manuscript database

The availability of complete or nearly complete genome sequences from several plant species permits detailed discovery and cross-species comparison of transposable elements (TEs) at the whole genome level. We initially investigated 510 LTR-retrotransposon (LTR-RT) families that are comprised of 32,...
Apophysomyces variabilis: draft genome sequence and comparison of predictive virulence determinants with other medically important Mucorales.

PubMed

Prakash, Hariprasath; Rudramurthy, Shivaprakash Mandya; Gandham, Prasad S; Ghosh, Anup Kumar; Kumar, Milner M; Badapanda, Chandan; Chakrabarti, Arunaloke

2017-09-18

Apophysomyces species are prevalent in tropical countries and A. variabilis is the second most frequent agent causing mucormycosis in India. Among Apophysomyces species, A. elegans, A. trapeziformis and A. variabilis are commonly incriminated in human infections. The genome sequences of A. elegans and A. trapeziformis are available in public databases, but not that of A. variabilis. We, therefore, performed whole genome sequencing of A. variabilis to explore its genomic structure and possible genes determining the virulence of the organism. The whole genome of A. variabilis NCCPF 102052 was sequenced and the genomic structure of A. variabilis was compared with already available genome structures of A. elegans, A. trapeziformis and other medically important Mucorales. The total size of the genome assembly of A. variabilis was 39.38 Mb with 12,764 protein-coding genes. The transposable elements (TEs) were low in the Apophysomyces genome and the retrotransposon Ty3-gypsy was the common TE. Phylogenetically, Apophysomyces species were grouped closely with Phycomyces blakesleeanus. OrthoMCL analysis revealed 3025 orthologous proteins that were common to those three pathogenic Apophysomyces species. Expansion of multiple gene families/duplication was observed in Apophysomyces genomes. Approximately 6% of Apophysomyces genes were predicted to be associated with virulence on PHIbase analysis. The virulence determinants included the protein families of CotH proteins (invasins), proteases, iron utilisation pathways, siderophores and signal transduction pathways. Serine proteases were the major group of proteases found in all Apophysomyces genomes. The carbohydrate active enzymes (CAZymes) constitute the majority of the secretory proteins. The present study is the first attempt to sequence and analyze the genomic structure of A. variabilis. Together with the available genome sequences of A. elegans and A. trapeziformis, the study helped to indicate the possible virulence determinants of pathogenic Apophysomyces species. The presence of unique CAZymes in the cell wall might be exploited in future for antifungal drug development.
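Once an ortholog-clustering tool has assigned proteins to groups, counting the groups common to all species is a simple tally. The sketch below shows one way to do that; it is not OrthoMCL's output format or the authors' script. The species tags, group names, and protein identifiers are hypothetical, and the category labels are an illustrative choice.

```python
from collections import Counter

# Hypothetical short tags for three species being compared.
SPECIES = ("sp1", "sp2", "sp3")


def classify_group(members):
    """Classify one ortholog group from its (species, protein_id) members."""
    counts = Counter(species for species, _ in members)
    present = [s for s in SPECIES if counts[s] > 0]
    if len(present) == 1:
        return "unique_to_" + present[0]
    if len(present) == len(SPECIES):
        if all(counts[s] == 1 for s in SPECIES):
            return "single_copy_all"      # one protein from each species
        if all(counts[s] > 1 for s in SPECIES):
            return "multi_copy_all"       # duplicated in every species
        return "mixed_copy_all"           # shared by all, duplicated in some
    return "partial"                      # missing from at least one species


# Toy groups, as a clustering tool might report them after parsing.
groups = {
    "g1": [("sp1", "a1"), ("sp2", "b1"), ("sp3", "c1")],
    "g2": [("sp3", "c2"), ("sp3", "c3")],
    "g3": [("sp1", "a2"), ("sp2", "b2"), ("sp3", "c4"), ("sp3", "c5")],
    "g4": [("sp1", "a3"), ("sp2", "b3")],
}
tally = Counter(classify_group(members) for members in groups.values())
core_groups = sum(n for label, n in tally.items() if label.endswith("_all"))
```

Here `core_groups` counts every group with at least one protein from each species, which is the kind of figure reported as proteins "common to" all three genomes.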
Comparison of the protein-coding genomes of three deep-sea, sulfur-oxidising bacteria: "Candidatus Ruthia magnifica", "Candidatus Vesicomyosocius okutanii" and Thiomicrospira crunogena.

PubMed

McGill, Susan E; Barker, Daniel

2017-07-20

" Candidatus Ruthia magnifica", "Candidatus Vesicomyosocius okutanii" and Thiomicrospira crunogena are all sulfur-oxidising bacteria found in deep-sea vent environments. Recent research suggests that the two symbiotic organisms, "Candidatus R. magnifica" and "Candidatus V. okutanii", may share common ancestry with the autonomously living species T. crunogena. We used comparative genomics to examine the genome-wide protein-coding content of all three species to explore their similarities. In particular, we used the OrthoMCL algorithm to sort proteins into groups of putative orthologs on the basis of sequence similarity. The OrthoMCL inflation parameter was tuned using biological criteria. Using the tuned value, OrthoMCL delimited 1070 protein groups. 63.5% of these groups contained one protein from each species. Two groups contained duplicate protein copies from all three species. 123 groups were unique to T. crunogena and ten groups included multiple copies of T. crunogena proteins but only single copies from the other species. "Candidatus R. magnifica" had one unique group, and had multiple copies in one group where the other species had a single copy. There were no groups unique to "Candidatus V. okutanii", and no groups in which there were multiple "Candidatus V. okutanii" proteins but only single proteins from the other species. Results align with previous suggestions that all three species share a common ancestor. However this is not definitive evidence to make taxonomic conclusions and the possibility of horizontal gene transfer was not investigated. Methodologically, the tuning of the OrthoMCL inflation parameter using biological criteria provides further methods to refine the OrthoMCL procedure.
Genome comparison of three serovar 5 pathogenic strains of Haemophilus parasuis: insights into an evolving swine pathogen.

PubMed

Bello-Ortí, Bernardo; Aragon, Virginia; Pina-Pedrero, Sonia; Bensaid, Albert

2014-09-01

Haemophilus parasuis is the causative agent of Glässer's disease, a systemic disorder characterized by polyarthritis, polyserositis and meningitis in pigs. Although it is well known that H. parasuis serovar 5 is the most prevalent serovar associated with the disease, the genetic differences among strains are only now being discovered. Genomes from two serovar 5 strains, SH0165 and 29755, are already available. Here, we present the draft genome of a third H. parasuis serovar 5 strain, the formal serovar 5 reference strain Nagasaki. An in silico genome subtractive analysis with full-length predicted genes of the three H. parasuis serovar 5 strains detected 95, 127 and 95 strain-specific genes (SSGs) for Nagasaki, SH0165 and 29755, respectively. We found that the genomic diversity within these three strains was high, in part because of a high number of mobile elements. Furthermore, a detailed analysis of large sequence polymorphisms (LSPs), encompassing regions ranging from 2 to 16 kb, revealed LSPs in virulence-related elements, such as a Toll-IL receptor, the AcrA multidrug efflux protein, an ATP-binding cassette (ABC) transporter, lipopolysaccharide-synthetizing enzymes and a tripartite ATP-independent periplasmic (TRAP) transporter. The whole-genome codon adaptation index (CAI) was also calculated and revealed values similar to other well-known bacterial pathogens. In addition, whole-genome SNP analysis indicated that nucleotide changes tended to be increased in membrane-related genes. This analysis provides further evidence that the genome of H. parasuis has been subjected to multiple lateral gene transfers (LGTs) and to fine-tuning of virulence factors, and has the potential for accelerated genome evolution. © 2014 The Authors.
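For readers unfamiliar with the codon adaptation index mentioned above, the following sketch computes it in its commonly used form: the geometric mean of each codon's relative adaptiveness, where a codon's weight is its count in a reference gene set divided by the count of the most frequent synonymous codon. The abstract does not state which reference set or software was used, so the codon families and counts below are toy assumptions, and skipping zero-weight or unlisted codons is one convention among several.

```python
import math


def relative_adaptiveness(ref_counts, synonymous_families):
    """Weight per codon: its reference count divided by the top count in its family."""
    weights = {}
    for family in synonymous_families:
        top = max(ref_counts.get(codon, 0) for codon in family)
        if top == 0:
            continue  # family never seen in the reference set
        for codon in family:
            weights[codon] = ref_counts.get(codon, 0) / top
    return weights


def cai(sequence, weights):
    """Geometric mean of codon weights over an in-frame coding sequence.

    ASSUMPTION: codons with no weight or zero weight are skipped.
    """
    codons = [sequence[i:i + 3] for i in range(0, len(sequence) - 2, 3)]
    logs = [math.log(weights[c]) for c in codons if weights.get(c, 0) > 0]
    return math.exp(sum(logs) / len(logs)) if logs else float("nan")


# Toy reference usage for two synonymous families (Ala: GCT/GCC, Lys: AAA/AAG).
families = [("GCT", "GCC"), ("AAA", "AAG")]
reference = {"GCT": 10, "GCC": 5, "AAA": 8, "AAG": 8}
w = relative_adaptiveness(reference, families)
score = cai("GCCAAA", w)
```

With these toy weights, GCC scores 0.5 and AAA scores 1.0, so the two-codon sequence gets the geometric mean of those two values. A genome-wide value would average over all codons of all genes in the same way.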
BACCardI--a tool for the validation of genomic assemblies, assisting genome finishing and intergenome comparison.

PubMed

Bartels, Daniela; Kespohl, Sebastian; Albaum, Stefan; Drüke, Tanja; Goesmann, Alexander; Herold, Julia; Kaiser, Olaf; Pühler, Alfred; Pfeiffer, Friedhelm; Raddatz, Günter; Stoye, Jens; Meyer, Folker; Schuster, Stephan C

2005-04-01

We provide the graphical tool BACCardI for the construction of virtual clone maps from standard assembler output files or BLAST based sequence comparisons. This new tool has been applied to numerous genome projects to solve various problems including (a) validation of whole genome shotgun assemblies, (b) support for contig ordering in the finishing phase of a genome project, and (c) intergenome comparison between related strains when only one of the strains has been sequenced and a large insert library is available for the other. The BACCardI software can seamlessly interact with various sequence assembly packages. Genomic assemblies generated from sequence information need to be validated by independent methods such as physical maps. The time-consuming task of building physical maps can be circumvented by virtual clone maps derived from read pair information of large insert libraries.
Association of Cancer Susceptibility Variants with Risk of Multiple Primary Cancers: the Population Architecture using Genomics and Epidemiology Study

PubMed Central

Park, S. Lani; Caberto, Christian P.; Lin, Yi; Goodloe, Robert J.; Dumitrescu, Logan; Love, Shelly-Ann; Matise, Tara C.; Hindorff, Lucia A.; Fowke, Jay H.; Schumacher, Fredrick R.; Beebe-Dimmer, Jennifer; Chen, Chu; Hou, Lifang; Thomas, Fridtjof; Deelman, Ewa; Han, Ying; Peters, Ulrike; North, Kari E.; Heiss, Gerardo; Crawford, Dana C.; Haiman, Christopher A.; Wilkens, Lynne R.; Bush, William S.; Kooperberg, Charles; Cheng, Iona; Le Marchand, Loïc

2014-01-01

Background Multiple primary cancers account for ~16% of all incident cancers in the U.S. While genome-wide association studies (GWAS) have identified many common genetic variants associated with various cancer sites, no study has examined the association of these genetic variants with risk of multiple primary cancers (MPC). Methods As part of the NHGRI Population Architecture using Genomics and Epidemiology (PAGE) study, we used data from the Multiethnic Cohort and Women’s Health Initiative. Incident MPC (IMPC) cases (n=1,385) were defined as participants diagnosed with >1 incident cancers after cohort entry. Participants diagnosed with only one incident cancer after cohort entry with follow-up equal to or longer than IMPC cases served as controls (single-index cancer controls; n=9,626). Fixed-effects meta-analyses of unconditional logistic regression analyses were used to evaluate the association between cancer risk variants and IMPC risk. To account for multiple comparisons, we used the false positive report probability (FPRP) to determine statistical significance. Results A nicotine dependence-associated and lung cancer variant, CHRNA3 rs578776 (OR=1.16, 95% CI=1.05–1.26; p=0.004) and two breast cancer variants, EMBP1 rs11249433 and TOX3 rs3803662 (OR=1.16, 95% CI=1.04–1.28; p=0.005 and OR=1.13, 95% CI=1.03–1.23; p=0.006) were significantly associated with risk of IMPC. The associations for rs578776 and rs11249433 remained (p<0.05) after removing subjects who had lung or breast cancers, respectively (p-values≤0.046). These associations did not show significant heterogeneity by smoking status (p-heterogeneity≥0.53). Conclusions Our study has identified rs578776 and rs11249433 as risk variants for IMPC. Impact These findings may help to identify genetic regions associated with IMPC risk. PMID:25139936
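A fixed-effects meta-analysis of per-cohort logistic regression results, as named in the Methods above, is typically an inverse-variance weighted average of the log odds ratios. The sketch below shows that standard calculation under stated assumptions: the per-cohort estimates and standard errors are made-up toy numbers, not values from the PAGE study, and the function names are hypothetical.

```python
import math


def fixed_effect_meta(log_ors, std_errs):
    """Inverse-variance weighted pooled log odds ratio and its standard error."""
    weights = [1.0 / se ** 2 for se in std_errs]
    pooled = sum(w * b for w, b in zip(weights, log_ors)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    return pooled, pooled_se


def or_with_ci(log_or, se, z=1.96):
    """Odds ratio with an approximate 95% confidence interval (normal approximation)."""
    return (
        math.exp(log_or),
        math.exp(log_or - z * se),
        math.exp(log_or + z * se),
    )


# Toy per-cohort estimates (hypothetical): log(OR) and standard error.
cohort_log_ors = [math.log(1.2), math.log(1.1)]
cohort_ses = [0.05, 0.05]

pooled, pooled_se = fixed_effect_meta(cohort_log_ors, cohort_ses)
odds_ratio, ci_low, ci_high = or_with_ci(pooled, pooled_se)
```

Because the two toy cohorts have equal standard errors, the pooled log odds ratio is their simple average and the pooled standard error shrinks by a factor of the square root of two. Cohorts with smaller standard errors would receive proportionally more weight.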

Therapeutic Remyelination Strategies in a Novel Model of Multiple Sclerosis: Japanese Macaque Encephalomyelitis

DTIC Science & Technology

2011-05-01

genome was determined and compared to simian and human herpesvirus genomes representing alpha-herpesviruses, beta-herpesviruses and gamma-1 and...of JMRV Genome with Select Simian and Human Herpesvirus Genomes Showing Percent Nucleotide Sequence Identity Virus JMRV RRV KSHV HVS RhLCV EBV RhCMV...2 - Introduction Particular viruses, especially gamma-herpesviruses, may act as a trigger of multiple sclerosis (MS) (Levin et
Novel genomic findings in multiple myeloma identified through routine diagnostic sequencing.

PubMed

Ryland, Georgina L; Jones, Kate; Chin, Melody; Markham, John; Aydogan, Elle; Kankanige, Yamuna; Caruso, Marisa; Guinto, Jerick; Dickinson, Michael; Prince, H Miles; Yong, Kwee; Blombery, Piers

2018-05-14

Multiple myeloma is a genomically complex haematological malignancy with many genomic alterations recognised as important in diagnosis, prognosis and therapeutic decision making. Here, we provide a summary of genomic findings identified through routine diagnostic next-generation sequencing at our centre. A cohort of 86 patients with multiple myeloma underwent diagnostic sequencing using a custom hybridisation-based panel targeting 104 genes. Sequence variants, genome-wide copy number changes and structural rearrangements were detected using a bioinformatics pipeline developed in-house. At least one mutation was found in 69 (80%) patients. Frequently mutated genes included TP53 (36%), KRAS (22.1%), NRAS (15.1%), FAM46C/DIS3 (8.1%) and TET2/FGFR3 (5.8%), including multiple mutations not previously described in myeloma. Importantly, we observed TP53 mutations in the absence of a 17p deletion in 8% of the cohort, highlighting the need for sequencing-based assessment in addition to cytogenetics to identify these high-risk patients. Multiple novel copy number changes and immunoglobulin heavy chain translocations are also discussed. Our results demonstrate that many clinically relevant genomic findings remain in multiple myeloma which have not yet been identified through large-scale sequencing efforts, and provide important mechanistic insights into plasma cell pathobiology.
When Whole-Genome Alignments Just Won't Work: kSNP v2 Software for Alignment-Free SNP Discovery and Phylogenetics of Hundreds of Microbial Genomes

PubMed Central

Gardner, Shea N.; Hall, Barry G.

2013-01-01

Effective use of rapid and inexpensive whole genome sequencing for microbes requires fast, memory efficient bioinformatics tools for sequence comparison. The kSNP v2 software finds single nucleotide polymorphisms (SNPs) in whole genome data. kSNP v2 has numerous improvements over kSNP v1 including SNP gene annotation; better scaling for draft genomes available as assembled contigs or raw, unassembled reads; a tool to identify the optimal value of k; distribution of packages of executables for Linux and Mac OS X for ease of installation and user-friendly use; and a detailed User Guide. SNP discovery is based on k-mer analysis, and requires no multiple sequence alignment or the selection of a single reference genome. Most target sets with hundreds of genomes complete in minutes to hours. SNP phylogenies are built by maximum likelihood, parsimony, and distance, based on all SNPs, only core SNPs, or SNPs present in some intermediate user-specified fraction of targets. The SNP-based trees that result are consistent with known taxonomy. kSNP v2 can handle many gigabases of sequence in a single run, and if one or more annotated genomes are included in the target set, SNPs are annotated with protein coding and other information (UTRs, etc.) from GenBank file(s). We demonstrate application of kSNP v2 on sets of viral and bacterial genomes, and discuss in detail analysis of a set of 68 finished E. coli and Shigella genomes and a set of the same genomes to which have been added 47 assemblies and four “raw read” genomes of O104:H4 strains from the recent European E. coli outbreak that resulted in both bloody diarrhea and hemolytic uremic syndrome (HUS), and caused at least 50 deaths. PMID:24349125
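The idea of k-mer based, alignment-free SNP discovery can be illustrated with a small sketch: for an odd k, two genomes carry a candidate SNP when they share k-mers whose flanking bases match but whose central base differs. This is only a simplified illustration of that idea, not kSNP's actual implementation; it ignores reverse complements and other filtering the real software may apply, and all names and sequences are hypothetical.

```python
from collections import defaultdict


def kmers(seq, k):
    """Yield every overlapping k-mer of a sequence."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def candidate_snps(genomes, k=5):
    """Map (left_flank, right_flank) -> {genome: central base} for variable loci."""
    if k % 2 == 0:
        raise ValueError("k must be odd so that each k-mer has a central base")
    mid = k // 2
    loci = defaultdict(dict)  # flanks -> {genome name: set of central bases}
    for name, seq in genomes.items():
        for kmer in kmers(seq, k):
            flanks = (kmer[:mid], kmer[mid + 1:])
            loci[flanks].setdefault(name, set()).add(kmer[mid])

    snps = {}
    for flanks, by_genome in loci.items():
        # Skip loci that are ambiguous within a single genome (repeated flanks).
        if any(len(bases) > 1 for bases in by_genome.values()):
            continue
        alleles = {name: next(iter(bases)) for name, bases in by_genome.items()}
        if len(set(alleles.values())) > 1:
            snps[flanks] = alleles
    return snps


# Two toy sequences differing at a single position (T versus A).
toy_genomes = {"g1": "ACGTACC", "g2": "ACGAACC"}
snps = candidate_snps(toy_genomes, k=5)
```

No alignment or reference genome is needed because the flanking bases themselves identify the locus. The resulting allele table per locus is the kind of input a SNP-based phylogeny could be built from.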
When whole-genome alignments just won't work: kSNP v2 software for alignment-free SNP discovery and phylogenetics of hundreds of microbial genomes.

PubMed

Gardner, Shea N; Hall, Barry G

2013-01-01

Effective use of rapid and inexpensive whole genome sequencing for microbes requires fast, memory efficient bioinformatics tools for sequence comparison. The kSNP v2 software finds single nucleotide polymorphisms (SNPs) in whole genome data. kSNP v2 has numerous improvements over kSNP v1 including SNP gene annotation; better scaling for draft genomes available as assembled contigs or raw, unassembled reads; a tool to identify the optimal value of k; distribution of packages of executables for Linux and Mac OS X for ease of installation and user-friendly use; and a detailed User Guide. SNP discovery is based on k-mer analysis, and requires no multiple sequence alignment or the selection of a single reference genome. Most target sets with hundreds of genomes complete in minutes to hours. SNP phylogenies are built by maximum likelihood, parsimony, and distance, based on all SNPs, only core SNPs, or SNPs present in some intermediate user-specified fraction of targets. The SNP-based trees that result are consistent with known taxonomy. kSNP v2 can handle many gigabases of sequence in a single run, and if one or more annotated genomes are included in the target set, SNPs are annotated with protein coding and other information (UTRs, etc.) from GenBank file(s). We demonstrate application of kSNP v2 on sets of viral and bacterial genomes, and discuss in detail analysis of a set of 68 finished E. coli and Shigella genomes and a set of the same genomes to which have been added 47 assemblies and four "raw read" genomes of O104:H4 strains from the recent European E. coli outbreak that resulted in both bloody diarrhea and hemolytic uremic syndrome (HUS), and caused at least 50 deaths.
A Unified Framework for Association Analysis with Multiple Related Phenotypes

PubMed Central

Stephens, Matthew

2013-01-01

We consider the problem of assessing associations between multiple related outcome variables, and a single explanatory variable of interest. This problem arises in many settings, including genetic association studies, where the explanatory variable is genotype at a genetic variant. We outline a framework for conducting this type of analysis, based on Bayesian model comparison and model averaging for multivariate regressions. This framework unifies several common approaches to this problem, and includes both standard univariate and standard multivariate association tests as special cases. The framework also unifies the problems of testing for associations and explaining associations – that is, identifying which outcome variables are associated with genotype. This provides an alternative to the usual, but conceptually unsatisfying, approach of resorting to univariate tests when explaining and interpreting significant multivariate findings. The method is computationally tractable genome-wide for modest numbers of phenotypes (e.g. 5–10), and can be applied to summary data, without access to raw genotype and phenotype data. We illustrate the methods on both simulated examples, and to a genome-wide association study of blood lipid traits where we identify 18 potential novel genetic associations that were not identified by univariate analyses of the same data. PMID:23861737
Alignment-free inference of hierarchical and reticulate phylogenomic relationships.

PubMed

Bernard, Guillaume; Chan, Cheong Xin; Chan, Yao-Ban; Chua, Xin-Yi; Cong, Yingnan; Hogan, James M; Maetschke, Stefan R; Ragan, Mark A

2017-06-30

We are amidst an ongoing flood of sequence data arising from the application of high-throughput technologies, and a concomitant fundamental revision in our understanding of how genomes evolve individually and within the biosphere. Workflows for phylogenomic inference must accommodate data that are not only much larger than before, but often more error prone and perhaps misassembled, or not assembled in the first place. Moreover, genomes of microbes, viruses and plasmids evolve not only by tree-like descent with modification but also by incorporating stretches of exogenous DNA. Thus, next-generation phylogenomics must address computational scalability while rethinking the nature of orthogroups, the alignment of multiple sequences and the inference and comparison of trees. New phylogenomic workflows have begun to take shape based on so-called alignment-free (AF) approaches. Here, we review the conceptual foundations of AF phylogenetics for the hierarchical (vertical) and reticulate (lateral) components of genome evolution, focusing on methods based on k-mers. We reflect on what seems to be successful, and on where further development is needed. © The Author 2017. Published by Oxford University Press.
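As a rough illustration of what a k-mer-based alignment-free comparison can look like, the sketch below computes a Jaccard distance between the k-mer sets of two sequences. The Jaccard measure, the function names, and the toy sequences are assumptions chosen for simplicity; the review covers a range of AF methods and this is not presented as any one of them.

```python
def kmer_set(seq, k):
    """All distinct substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard_distance(seq_a, seq_b, k=2):
    """1 - |A ∩ B| / |A ∪ B| over the two k-mer sets (0 = same k-mer content)."""
    a, b = kmer_set(seq_a, k), kmer_set(seq_b, k)
    union = a | b
    if not union:
        raise ValueError("both sequences are shorter than k")
    return 1.0 - len(a & b) / len(union)


print(jaccard_distance("ACGT", "ACGA", k=2))
```

A matrix of such pairwise distances could then feed a distance-based tree method, with no multiple sequence alignment step in between.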
The analysis of genomic structures in the L1 family of cell adhesion molecules provides no evidence for exon shuffling events after the separation of arthropod and chordate lineages.

PubMed

Zhao, G; Hortsch, M

1998-07-17

Members of the L1 family of neural cell adhesion molecules consist of multiple extracellular immunoglobulin and fibronectin type III domains that mediate the adhesive properties of this group of transmembrane proteins. In vertebrate genomes, these protein domains are separated by introns, and it has been suggested that L1-type genes might have been subject to exon-shuffling events during evolution. However, comparison of the human L1-CAM and the chicken neurofascin gene with the genomic structure of their Drosophila homologue, neuroglian, indicates that no major rearrangement of protein domains has taken place subsequent to the split of the arthropod and chordate phyla. The Drosophila neuroglian gene appears to have lost most of the introns that have been conserved in the human L1-CAM and the chicken neurofascin gene. Nevertheless, exon shuffling or the generation of new exons by mutational changes might have been responsible for the generation of additional, alternatively spliced exons in L1-type genes.
Development of Real Time PCR Using Novel Genomic Target for Detection of Multiple Salmonella Serovars from Milk and Chickens

USDA-ARS's Scientific Manuscript database

Background: A highly sensitive and specific novel genomic and plasmid target-based PCR platform was developed to detect multiple Salmonella serovars (S. Heidelberg, S. Dublin, S. Hadar, S. Kentucky and S. Enteritidis). Through extensive genome mining of protein databases of these serovars and compar...
Mycobacterium leprae RecA is structurally analogous but functionally distinct from Mycobacterium tuberculosis RecA protein.

PubMed

Patil, K Neelakanteshwar; Singh, Pawan; Harsha, Sri; Muniyappa, K

2011-12-01

Mycobacterium leprae is closely related to Mycobacterium tuberculosis, yet causes a very different illness. Detailed genomic comparison between these two species of mycobacteria reveals that the decaying M. leprae genome contains less than half of the M. tuberculosis functional genes. The reduction of genome size and accumulation of pseudogenes in the M. leprae genome is thought to result from multiple recombination events between related repetitive sequences, which provided the impetus to investigate the recombination-like activities of RecA protein. In this study, we have cloned, over-expressed and purified M. leprae RecA and compared its activities with that of M. tuberculosis RecA. Both proteins, despite being 91% identical at the amino acid level, exhibit strikingly different binding profiles for single-stranded DNA with varying GC contents, in the ability to catalyze the formation of D-loops and to promote DNA strand exchange. The kinetics and the extent of single-stranded DNA-dependent ATPase and coprotease activities were nearly equivalent between these two recombinases. However, the degree of inhibition exerted by a range of ATP:ADP ratios was greater on strand exchange promoted by M. leprae RecA compared to its M. tuberculosis counterpart. Taken together, our results provide insights into the mechanistic aspects of homologous recombination and coprotease activity promoted by M. leprae RecA, and further suggest that it differs from the M. tuberculosis counterpart. These results are consistent with an emerging concept of DNA-sequence influenced structural differences in RecA nucleoprotein filaments and how these differences reflect on the multiple activities associated with RecA protein. Copyright © 2011 Elsevier B.V. All rights reserved.
Prediction of constitutive A-to-I editing sites from human transcriptomes in the absence of genomic sequences

PubMed Central

2013-01-01

Background Adenosine-to-inosine (A-to-I) RNA editing is recognized as a cellular mechanism for generating both RNA and protein diversity. Inosine base pairs with cytidine during reverse transcription and therefore appears as guanosine during sequencing of cDNA. Current approaches of RNA editing identification largely depend on the comparison between transcriptomes and genomic DNA (gDNA) sequencing datasets from the same individuals, and it has been challenging to identify editing candidates from transcriptomes in the absence of gDNA information. Results We have developed a new strategy to accurately predict constitutive RNA editing sites from publicly available human RNA-seq datasets in the absence of relevant genomic sequences. Our approach establishes new parameters to increase the ability to map mismatches and to minimize sequencing/mapping errors and unreported genome variations. We identified 695 novel constitutive A-to-I editing sites that appear in clusters (named “editing boxes”) in multiple samples and which exhibit spatial and dynamic regulation across human tissues. Some of these editing boxes are enriched in non-repetitive regions lacking inverted repeat structures and contain an extremely high conversion frequency of As to Is. We validated a number of editing boxes in multiple human cell lines and confirmed that ADAR1 is responsible for the observed promiscuous editing events in non-repetitive regions, further expanding our knowledge of the catalytic substrate of A-to-I RNA editing by ADAR enzymes. Conclusions The approach we present here provides a novel way of identifying A-to-I RNA editing events by analyzing only RNA-seq datasets. This method has allowed us to gain new insights into RNA editing and should also aid in the identification of more constitutive A-to-I editing sites from additional transcriptomes. PMID:23537002
A statistical method for the detection of variants from next-generation resequencing of DNA pools.

PubMed

Bansal, Vikas

2010-06-15

Next-generation sequencing technologies have enabled the sequencing of several human genomes in their entirety. However, the routine resequencing of complete genomes remains infeasible. The massive capacity of next-generation sequencers can be harnessed for sequencing specific genomic regions in hundreds to thousands of individuals. Sequencing-based association studies are currently limited by the low level of multiplexing offered by sequencing platforms. Pooled sequencing represents a cost-effective approach for studying rare variants in large populations. To utilize the power of DNA pooling, it is important to accurately identify sequence variants from pooled sequencing data. Detection of rare variants from pooled sequencing represents a different challenge than detection of variants from individual sequencing. We describe a novel statistical approach, CRISP [Comprehensive Read analysis for Identification of Single Nucleotide Polymorphisms (SNPs) from Pooled sequencing] that is able to identify both rare and common variants by using two approaches: (i) comparing the distribution of allele counts across multiple pools using contingency tables and (ii) evaluating the probability of observing multiple non-reference base calls due to sequencing errors alone. Information about the distribution of reads between the forward and reverse strands and the size of the pools is also incorporated within this framework to filter out false variants. Validation of CRISP on two separate pooled sequencing datasets generated using the Illumina Genome Analyzer demonstrates that it can detect 80-85% of SNPs identified using individual sequencing while achieving a low false discovery rate (3-5%). Comparison with previous methods for pooled SNP detection demonstrates the significantly lower false positive and false negative rates for CRISP. Implementation of this method is available at http://polymorphism.scripps.edu/~vbansal/software/CRISP/.
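Approach (i) in this abstract, comparing allele counts across pools with a contingency table, can be pictured with a plain Pearson chi-square statistic on a 2 × n table of reference and non-reference counts. The choice of Pearson's statistic, the function name, and the counts are assumptions made for illustration; the abstract does not say which test CRISP applies to its tables.

```python
def allele_count_chi_square(ref_counts, alt_counts):
    """Pearson chi-square statistic for a 2 x n table (rows: ref/alt, columns: pools)."""
    if len(ref_counts) != len(alt_counts):
        raise ValueError("need one ref and one alt count per pool")
    pool_totals = [r + a for r, a in zip(ref_counts, alt_counts)]
    grand_total = sum(pool_totals)
    statistic = 0.0
    for row in (ref_counts, alt_counts):
        row_total = sum(row)
        for observed, pool_total in zip(row, pool_totals):
            expected = row_total * pool_total / grand_total
            if expected > 0:
                statistic += (observed - expected) ** 2 / expected
    return statistic


# Hypothetical counts: the alternate allele appears in only one of two pools.
print(allele_count_chi_square([100, 80], [0, 20]))
```

A statistic near zero means the allele is spread evenly across pools, as a uniform sequencing error would be, while a large value points to an allele concentrated in particular pools.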
Pleiotropic and Sex-Specific Effects of Cancer GWAS SNPs on Melanoma Risk in the Population Architecture Using Genomics and Epidemiology (PAGE) Study

PubMed Central

Kocarnik, Jonathan M.; Park, S. Lani; Han, Jiali; Dumitrescu, Logan; Cheng, Iona; Wilkens, Lynne R.; Schumacher, Fredrick R.; Kolonel, Laurence; Carlson, Chris S.; Crawford, Dana C.; Goodloe, Robert J.; Dilks, Holli H.; Baker, Paxton; Richardson, Danielle; Matise, Tara C.; Ambite, José Luis; Song, Fengju; Qureshi, Abrar A.; Zhang, Mingfeng; Duggan, David; Hutter, Carolyn; Hindorff, Lucia; Bush, William S.; Kooperberg, Charles; Le Marchand, Loic; Peters, Ulrike

2015-01-01

Background Several regions of the genome show pleiotropic associations with multiple cancers. We sought to evaluate whether 181 single-nucleotide polymorphisms previously associated with various cancers in genome-wide association studies were also associated with melanoma risk. Methods We evaluated 2,131 melanoma cases and 20,353 controls from three studies in the Population Architecture using Genomics and Epidemiology (PAGE) study (EAGLE-BioVU, MEC, WHI) and two collaborating studies (HPFS, NHS). Overall and sex-stratified analyses were performed across studies. Results We observed statistically significant associations with melanoma for two lung cancer SNPs in the TERT-CLPTM1L locus (Bonferroni-corrected p<2.8×10⁻⁴), replicating known pleiotropic effects at this locus. In sex-stratified analyses, we also observed a potential male-specific association between prostate cancer risk variant rs12418451 and melanoma risk (OR=1.22, p=8.0×10⁻⁴). No other variants in our study were associated with melanoma after multiple comparisons adjustment (p>2.8×10⁻⁴). Conclusions We provide confirmatory evidence of pleiotropic associations with melanoma for two SNPs previously associated with lung cancer, and provide suggestive evidence for a male-specific association with melanoma for prostate cancer variant rs12418451. This SNP is located near TPCN2, an ion transport gene containing SNPs which have been previously associated with hair pigmentation but not melanoma risk. Previous evidence provides biological plausibility for this association, and suggests a complex interplay between ion transport, pigmentation, and melanoma risk that may vary by sex. If confirmed, these pleiotropic relationships may help elucidate shared molecular pathways between cancers and related phenotypes. PMID:25789475
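The significance threshold quoted in this abstract is consistent with a Bonferroni correction over the 181 SNPs tested. The short check below assumes a family-wise alpha of 0.05, which the abstract does not state explicitly; the function name is illustrative.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests


threshold = bonferroni_threshold(0.05, 181)  # alpha = 0.05 is an assumption
print(f"{threshold:.1e}")  # 2.8e-04

# The sex-stratified rs12418451 result (p = 8.0e-4) does not pass this threshold.
print(8.0e-4 < threshold)  # False
```

This is why the abstract describes the rs12418451 finding as suggestive rather than statistically significant.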
UFO: a web server for ultra-fast functional profiling of whole genome protein sequences.

PubMed

Meinicke, Peter

2009-09-02

Functional profiling is a key technique to characterize and compare the functional potential of entire genomes. The estimation of profiles according to an assignment of sequences to functional categories is a computationally expensive task because it requires the comparison of all protein sequences from a genome with a usually large database of annotated sequences or sequence families. Based on machine learning techniques for Pfam domain detection, the UFO web server for ultra-fast functional profiling allows researchers to process large protein sequence collections instantaneously. Besides the frequencies of Pfam and GO categories, the user also obtains the sequence specific assignments to Pfam domain families. In addition, a comparison with existing genomes provides dissimilarity scores with respect to 821 reference proteomes. Considering the underlying UFO domain detection, the results on 206 test genomes indicate a high sensitivity of the approach. In comparison with current state-of-the-art HMMs, the runtime measurements show a considerable speed up in the range of four orders of magnitude. For an average size prokaryotic genome, the computation of a functional profile together with its comparison typically requires about 10 seconds of processing time. For the first time the UFO web server makes it possible to get a quick overview on the functional inventory of newly sequenced organisms. The genome scale comparison with a large number of precomputed profiles allows a first guess about functionally related organisms. The service is freely available and does not require user registration or specification of a valid email address.
Short interspersed elements (SINEs) are a major source of canine genomic diversity.

PubMed

Wang, Wei; Kirkness, Ewen F

2005-12-01

SINEs are retrotransposons that have enjoyed remarkable reproductive success during the course of mammalian evolution, and have played a major role in shaping mammalian genomes. Previously, an analysis of survey-sequence data from an individual dog (a poodle) indicated that canine genomes harbor a high frequency of alleles that differ only by the absence or presence of a SINEC_Cf repeat. Comparison of this survey-sequence data with a draft genome sequence of a distinct dog (a boxer) has confirmed this prediction, and revealed the chromosomal coordinates for >10,000 loci that are bimorphic for SINEC_Cf insertions. Analysis of SINE insertion sites from the genomes of nine additional dogs indicates that 3%-5% are absent from either the poodle or boxer genome sequences--suggesting that an additional 10,000 bimorphic loci could be readily identified in the general dog population. We describe a methodology that can be used to identify these loci, and could be adapted to exploit these bimorphic loci for genotyping purposes. Approximately half of all annotated canine genes contain SINEC_Cf repeats, and these elements are occasionally transcribed. When transcribed in the antisense orientation, they provide splice acceptor sites that can result in incorporation of novel exons. The high frequency of bimorphic SINE insertions in the dog population is predicted to provide numerous examples of allele-specific transcription patterns that will be valuable for the study of differential gene expression among multiple dog breeds.
The streamlined genome of Phytomonas spp. relative to human pathogenic kinetoplastids reveals a parasite tailored for plants.

PubMed

Porcel, Betina M; Denoeud, France; Opperdoes, Fred; Noel, Benjamin; Madoui, Mohammed-Amine; Hammarton, Tansy C; Field, Mark C; Da Silva, Corinne; Couloux, Arnaud; Poulain, Julie; Katinka, Michael; Jabbari, Kamel; Aury, Jean-Marc; Campbell, David A; Cintron, Roxana; Dickens, Nicholas J; Docampo, Roberto; Sturm, Nancy R; Koumandou, V Lila; Fabre, Sandrine; Flegontov, Pavel; Lukeš, Julius; Michaeli, Shulamit; Mottram, Jeremy C; Szöőr, Balázs; Zilberstein, Dan; Bringaud, Frédéric; Wincker, Patrick; Dollet, Michel

2014-02-01

Members of the family Trypanosomatidae infect many organisms, including animals, plants and humans. Plant-infecting trypanosomes are grouped under the single genus Phytomonas, failing to reflect the wide biological and pathological diversity of these protists. While some Phytomonas spp. multiply in the latex of plants, or in fruit or seeds without apparent pathogenicity, others colonize the phloem sap and afflict plants of substantial economic value, including the coffee tree, coconut and oil palms. Plant trypanosomes have not been studied extensively at the genome level, a major gap in understanding and controlling pathogenesis. We describe the genome sequences of two plant trypanosomatids, one pathogenic isolate from a Guianan coconut and one non-symptomatic isolate from Euphorbia collected in France. Although these parasites have extremely distinct pathogenic impacts, very few genes are unique to either, with the vast majority of genes shared by both isolates. Significantly, both Phytomonas spp. genomes consist essentially of single copy genes for the bulk of their metabolic enzymes, whereas other trypanosomatids e.g. Leishmania and Trypanosoma possess multiple paralogous genes or families. Indeed, comparison with other trypanosomatid genomes revealed a highly streamlined genome, encoding for a minimized metabolic system while conserving the major pathways, and with retention of a full complement of endomembrane organelles, but with no evidence for functional complexity. Identification of the metabolic genes of Phytomonas provides opportunities for establishing in vitro culturing of these fastidious parasites and new tools for the control of agricultural plant disease.
The Streamlined Genome of Phytomonas spp. Relative to Human Pathogenic Kinetoplastids Reveals a Parasite Tailored for Plants

PubMed Central

Porcel, Betina M.; Denoeud, France; Opperdoes, Fred; Noel, Benjamin; Madoui, Mohammed-Amine; Hammarton, Tansy C.; Field, Mark C.; Da Silva, Corinne; Couloux, Arnaud; Poulain, Julie; Katinka, Michael; Jabbari, Kamel; Aury, Jean-Marc; Campbell, David A.; Cintron, Roxana; Dickens, Nicholas J.; Docampo, Roberto; Sturm, Nancy R.; Koumandou, V. Lila; Fabre, Sandrine; Flegontov, Pavel; Lukeš, Julius; Michaeli, Shulamit; Mottram, Jeremy C.; Szöőr, Balázs; Zilberstein, Dan; Bringaud, Frédéric; Wincker, Patrick; Dollet, Michel

2014-01-01

Members of the family Trypanosomatidae infect many organisms, including animals, plants and humans. Plant-infecting trypanosomes are grouped under the single genus Phytomonas, failing to reflect the wide biological and pathological diversity of these protists. While some Phytomonas spp. multiply in the latex of plants, or in fruit or seeds without apparent pathogenicity, others colonize the phloem sap and afflict plants of substantial economic value, including the coffee tree, coconut and oil palms. Plant trypanosomes have not been studied extensively at the genome level, a major gap in understanding and controlling pathogenesis. We describe the genome sequences of two plant trypanosomatids, one pathogenic isolate from a Guianan coconut and one non-symptomatic isolate from Euphorbia collected in France. Although these parasites have extremely distinct pathogenic impacts, very few genes are unique to either, with the vast majority of genes shared by both isolates. Significantly, both Phytomonas spp. genomes consist essentially of single copy genes for the bulk of their metabolic enzymes, whereas other trypanosomatids e.g. Leishmania and Trypanosoma possess multiple paralogous genes or families. Indeed, comparison with other trypanosomatid genomes revealed a highly streamlined genome, encoding for a minimized metabolic system while conserving the major pathways, and with retention of a full complement of endomembrane organelles, but with no evidence for functional complexity. Identification of the metabolic genes of Phytomonas provides opportunities for establishing in vitro culturing of these fastidious parasites and new tools for the control of agricultural plant disease. PMID:24516393
Characterization of a Genomic Signature of Pregnancy in the Breast

PubMed Central

Belitskaya-Lévy, Ilana; Zeleniuch-Jacquotte, Anne; Russo, Jose; Russo, Irma H.; Bordás, Pal; Åhman, Janet; Afanasyeva, Yelena; Johansson, Robert; Lenner, Per; Li, Xiaochun; de Cicco, Ricardo López; Peri, Suraj; Ross, Eric; Russo, Patricia A.; Santucci-Pereira, Julia; Sheriff, Fathima S.; Slifker, Michael; Hallmans, Göran; Toniolo, Paolo; Arslan, Alan A.

2012-01-01

The objective of the current study was to comprehensively compare the genomic profiles in the breast of parous and nulliparous postmenopausal women to identify genes that permanently change their expression following pregnancy. The study was designed as a two-phase approach. In the discovery phase, we compared breast genomic profiles of 37 parous with 18 nulliparous postmenopausal women. In the validation phase, confirmation of the genomic patterns observed in the discovery phase was sought in an independent set of 30 parous and 22 nulliparous postmenopausal women. RNA was hybridized to Affymetrix HG_U133 Plus 2.0 oligonucleotide arrays containing probes to 54,675 transcripts; the arrays were scanned and the images analyzed using Affymetrix GCOS software. Surrogate variable analysis, logistic regression and significance analysis for microarrays were used to identify statistically significant differences in expression of genes. The False Discovery Rate (FDR) approach was used to control for multiple comparisons. We found that 208 genes (305 probe sets) were differentially expressed between parous and nulliparous women in both discovery and validation phases of the study at an FDR of 10% and with at least a 1.25-fold change. These genes are involved in regulation of transcription, centrosome organization, RNA splicing, cell cycle control, adhesion and differentiation. The results provide persuasive evidence that full-term pregnancy induces long-term genomic changes in the breast. The genomic signature of pregnancy could be used as an intermediate marker to assess potential chemopreventive interventions with hormones mimicking the effects of pregnancy for prevention of breast cancer. PMID:21622728
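For readers unfamiliar with FDR control, the sketch below shows the Benjamini-Hochberg step-up procedure at a 10% FDR. The abstract does not say which FDR procedure was used, so Benjamini-Hochberg is an assumption here, and the p-values are made up for illustration.

```python
def benjamini_hochberg(p_values, fdr=0.10):
    """Return a list of booleans: True where the hypothesis is rejected at the given FDR."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    # Find the largest rank whose p-value falls under its step-up threshold.
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * fdr:
            cutoff_rank = rank
    rejected = [False] * m
    for idx in order[:cutoff_rank]:
        rejected[idx] = True
    return rejected


# Hypothetical per-probe-set p-values.
print(benjamini_hochberg([0.001, 0.04, 0.03, 0.5], fdr=0.10))
```

Unlike a Bonferroni correction, the threshold grows with the rank of each p-value, which keeps more true positives when thousands of probe sets are tested at once.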
Genomic Data Quality Impacts Automated Detection of Lateral Gene Transfer in Fungi

PubMed Central

Dupont, Pierre-Yves; Cox, Murray P.

2017-01-01

Lateral gene transfer (LGT, also known as horizontal gene transfer), an atypical mechanism of transferring genes between species, has almost become the default explanation for genes that display an unexpected composition or phylogeny. Numerous methods of detecting LGT events all rely on two fundamental strategies: primary structure composition or gene tree/species tree comparisons. Discouragingly, the results of these different approaches rarely coincide. With the wealth of genome data now available, detection of laterally transferred genes is increasingly being attempted in large uncurated eukaryotic datasets. However, detection methods depend greatly on the quality of the underlying genomic data, which are typically complex for eukaryotes. Furthermore, given the automated nature of genomic data collection, it is typically impractical to manually verify all protein or gene models, orthology predictions, and multiple sequence alignments, requiring researchers to accept a substantial margin of error in their datasets. Using a test case comprising plant-associated genomes across the fungal kingdom, this study reveals that composition- and phylogeny-based methods have little statistical power to detect laterally transferred genes. In particular, phylogenetic methods reveal extreme levels of topological variation in fungal gene trees, the vast majority of which show departures from the canonical species tree. Therefore, it is inherently challenging to detect LGT events in typical eukaryotic genomes. This finding is in striking contrast to the large number of claims for laterally transferred genes in eukaryotic species that routinely appear in the literature, and questions how many of these proposed examples are statistically well supported. PMID:28235827
SigTree: A Microbial Community Analysis Tool to Identify and Visualize Significantly Responsive Branches in a Phylogenetic Tree.

PubMed

Stevens, John R; Jones, Todd R; Lefevre, Michael; Ganesan, Balasubramanian; Weimer, Bart C

2017-01-01

Microbial community analysis experiments to assess the effect of a treatment intervention (or environmental change) on the relative abundance levels of multiple related microbial species (or operational taxonomic units) simultaneously using high throughput genomics are becoming increasingly common. Within the framework of the evolutionary phylogeny of all species considered in the experiment, this translates to a statistical need to identify the phylogenetic branches that exhibit a significant consensus response (in terms of operational taxonomic unit abundance) to the intervention. We present the R software package SigTree, a collection of flexible tools that make use of meta-analysis methods and regular expressions to identify and visualize significantly responsive branches in a phylogenetic tree, while appropriately adjusting for multiple comparisons.
Short template switch events explain mutation clusters in the human genome.

PubMed

Löytynoja, Ari; Goldman, Nick

2017-06-01

Resequencing efforts are uncovering the extent of genetic variation in humans and provide data to study the evolutionary processes shaping our genome. One recurring puzzle in both intra- and inter-species studies is the high frequency of complex mutations comprising multiple nearby base substitutions or insertion-deletions. We devised a generalized mutation model of template switching during replication that extends existing models of genome rearrangement and used this to study the role of template switch events in the origin of short mutation clusters. Applied to the human genome, our model detects thousands of template switch events during the evolution of human and chimp from their common ancestor and hundreds of events between two independently sequenced human genomes. Although many of these are consistent with a template switch mechanism previously proposed for bacteria, our model also identifies new types of mutations that create short inversions, some flanked by paired inverted repeats. The local template switch process can create numerous complex mutation patterns, including hairpin loop structures, and explains multinucleotide mutations and compensatory substitutions without invoking positive selection, speculative mechanisms, or implausible coincidence. Clustered sequence differences are challenging for current mapping and variant calling methods, and we show that many erroneous variant annotations exist in human reference data. Local template switch events may have been neglected as an explanation for complex mutations because of biases in commonly used analyses. Incorporation of our model into reference-based analysis pipelines and comparisons of de novo assembled genomes will lead to improved understanding of genome variation and evolution. © 2017 Löytynoja and Goldman; Published by Cold Spring Harbor Laboratory Press.

xGDBvm: A Web GUI-Driven Workflow for Annotating Eukaryotic Genomes in the Cloud

PubMed Central

Merchant, Nirav

2016-01-01

Genome-wide annotation of gene structure requires the integration of numerous computational steps. Currently, annotation is arguably best accomplished through collaboration of bioinformatics and domain experts, with broad community involvement. However, such a collaborative approach is not scalable at today’s pace of sequence generation. To address this problem, we developed the xGDBvm software, which uses an intuitive graphical user interface to access a number of common genome analysis and gene structure tools, preconfigured in a self-contained virtual machine image. Once their virtual machine instance is deployed through iPlant’s Atmosphere cloud services, users access the xGDBvm workflow via a unified Web interface to manage inputs, set program parameters, configure links to high-performance computing (HPC) resources, view and manage output, apply analysis and editing tools, or access contextual help. The xGDBvm workflow will mask the genome, compute spliced alignments from transcript and/or protein inputs (locally or on a remote HPC cluster), predict gene structures and gene structure quality, and display output in a public or private genome browser complete with accessory tools. Problematic gene predictions are flagged and can be reannotated using the integrated yrGATE annotation tool. xGDBvm can also be configured to append or replace existing data or load precomputed data. Multiple genomes can be annotated and displayed, and outputs can be archived for sharing or backup. xGDBvm can be adapted to a variety of use cases including de novo genome annotation, reannotation, comparison of different annotations, and training or teaching. PMID:27020957
StreptoBase: An Oral Streptococcus mitis Group Genomic Resource and Analysis Platform.

PubMed

Zheng, Wenning; Tan, Tze King; Paterson, Ian C; Mutha, Naresh V R; Siow, Cheuk Chuen; Tan, Shi Yang; Old, Lesley A; Jakubovics, Nicholas S; Choo, Siew Woh

2016-01-01

The oral streptococci are spherical Gram-positive bacteria categorized under the phylum Firmicutes which are among the most common causative agents of bacterial infective endocarditis (IE) and are also important agents in septicaemia in neutropenic patients. The Streptococcus mitis group is comprised of 13 species including some of the most common human oral colonizers such as S. mitis, S. oralis, S. sanguinis and S. gordonii as well as species such as S. tigurinus, S. oligofermentans and S. australis that have only recently been classified and are poorly understood at present. We present StreptoBase, which provides a specialized free resource focusing on the genomic analyses of oral species from the mitis group. It currently hosts 104 S. mitis group genomes including 27 novel mitis group strains that we sequenced using the high throughput Illumina HiSeq technology platform, and provides a comprehensive set of genome sequences for analyses, particularly comparative analyses and visualization of both cross-species and cross-strain characteristics of S. mitis group bacteria. StreptoBase incorporates sophisticated in-house designed bioinformatics web tools such as the Pairwise Genome Comparison (PGC) tool and the Pathogenomic Profiling Tool (PathoProT), which facilitate comparative pathogenomics analysis of Streptococcus strains. Examples are provided to demonstrate how StreptoBase can be employed to compare the genome structure of different S. mitis group bacteria and putative virulence gene profiles across multiple streptococcal strains. In conclusion, StreptoBase offers access to a range of streptococcal genomic resources as well as analysis tools and will be an invaluable platform to accelerate research in streptococci. Database URL: http://streptococcus.um.edu.my.
xGDBvm: A Web GUI-Driven Workflow for Annotating Eukaryotic Genomes in the Cloud.

PubMed

Duvick, Jon; Standage, Daniel S; Merchant, Nirav; Brendel, Volker P

2016-04-01

Genome-wide annotation of gene structure requires the integration of numerous computational steps. Currently, annotation is arguably best accomplished through collaboration of bioinformatics and domain experts, with broad community involvement. However, such a collaborative approach is not scalable at today's pace of sequence generation. To address this problem, we developed the xGDBvm software, which uses an intuitive graphical user interface to access a number of common genome analysis and gene structure tools, preconfigured in a self-contained virtual machine image. Once their virtual machine instance is deployed through iPlant's Atmosphere cloud services, users access the xGDBvm workflow via a unified Web interface to manage inputs, set program parameters, configure links to high-performance computing (HPC) resources, view and manage output, apply analysis and editing tools, or access contextual help. The xGDBvm workflow will mask the genome, compute spliced alignments from transcript and/or protein inputs (locally or on a remote HPC cluster), predict gene structures and gene structure quality, and display output in a public or private genome browser complete with accessory tools. Problematic gene predictions are flagged and can be reannotated using the integrated yrGATE annotation tool. xGDBvm can also be configured to append or replace existing data or load precomputed data. Multiple genomes can be annotated and displayed, and outputs can be archived for sharing or backup. xGDBvm can be adapted to a variety of use cases including de novo genome annotation, reannotation, comparison of different annotations, and training or teaching. © 2016 American Society of Plant Biologists. All rights reserved.
Synteny of Prunus and other model plant species

PubMed Central

Jung, Sook; Jiwan, Derick; Cho, Ilhyung; Lee, Taein; Abbott, Albert; Sosinski, Bryon; Main, Dorrie

2009-01-01

Background Fragmentary conservation of synteny has been reported between map-anchored Prunus sequences and Arabidopsis. With the availability of genome sequence for fellow rosid I members Populus and Medicago, we analyzed the synteny between Prunus and the three model genomes. Eight Prunus BAC sequences and map-anchored Prunus sequences were used in the comparison. Results We found well-conserved synteny across the Prunus species – peach, plum, and apricot – and Populus using a set of homologous Prunus BACs. Conversely, we could not detect any synteny with Arabidopsis in this region. Other peach BACs also showed extensive synteny with Populus. The syntenic regions detected were up to 477 kb in Populus. Two syntenic regions between Arabidopsis and these BACs were much shorter, around 10 kb. We also found syntenic regions that are conserved between the Prunus BACs and Medicago. The array of synteny corresponded with the proposed whole genome duplication events in Populus and Medicago. Using map-anchored Prunus sequences, we detected many syntenic blocks with several gene pairs between Prunus and Populus or Arabidopsis. We observed a more complex network of synteny between Prunus and Arabidopsis, indicative of multiple genome duplications and subsequent gene loss in Arabidopsis. Conclusion Our results show striking microsynteny between the Prunus BACs and the genomes of Populus and Medicago. In macrosynteny analysis, more distinct Prunus regions were syntenic to Populus than to Arabidopsis. PMID:19208249
Retroposed SNOfall--a mammalian-wide comparison of platypus snoRNAs.

PubMed

Schmitz, Jürgen; Zemann, Anja; Churakov, Gennady; Kuhl, Heiner; Grützner, Frank; Reinhardt, Richard; Brosius, Jürgen

2008-06-01

Diversification of mammalian species began more than 160 million years ago when the egg-laying monotremes diverged from live bearing mammals. The duck-billed platypus (Ornithorhynchus anatinus) and echidnas are the only potential contemporary witnesses of this period and, thereby, provide a unique insight into mammalian genome evolution. It has become clear that small RNAs are major regulatory agents in eukaryotic cells, and the significant role of non-protein-coding (npc) RNAs in transcription, processing, and translation is now well accepted. Here we show that the platypus genome contains more than 200 small nucleolar (sno) RNAs among hundreds of other diverse npcRNAs. Their comparison among key mammalian groups and other vertebrates enabled us to reconstruct a complete temporal pathway of acquisition and loss of these snoRNAs. In platypus we found cis- and trans-duplication distribution patterns for snoRNAs, which have not been described in any other vertebrates but are known to occur in nematodes. An exciting novelty in platypus is a snoRNA-derived retroposon (termed snoRTE) that facilitates a very effective dispersal of an H/ACA snoRNA via RTE-mediated retroposition. From more than 40,000 detected full-length and truncated genomic copies of this snoRTE, at least 21 are processed into mature snoRNAs. High-copy retroposition via multiple host gene-promoted transcription units is a novel pathway for combining housekeeping function and SINE-like dispersal and reveals a new dimension in the evolution of novel snoRNA function.
Retroposed SNOfall—A mammalian-wide comparison of platypus snoRNAs

PubMed Central

Schmitz, Jürgen; Zemann, Anja; Churakov, Gennady; Kuhl, Heiner; Grützner, Frank; Reinhardt, Richard; Brosius, Jürgen

2008-01-01

Diversification of mammalian species began more than 160 million years ago when the egg-laying monotremes diverged from live bearing mammals. The duck-billed platypus (Ornithorhynchus anatinus) and echidnas are the only potential contemporary witnesses of this period and, thereby, provide a unique insight into mammalian genome evolution. It has become clear that small RNAs are major regulatory agents in eukaryotic cells, and the significant role of non-protein-coding (npc) RNAs in transcription, processing, and translation is now well accepted. Here we show that the platypus genome contains more than 200 small nucleolar (sno) RNAs among hundreds of other diverse npcRNAs. Their comparison among key mammalian groups and other vertebrates enabled us to reconstruct a complete temporal pathway of acquisition and loss of these snoRNAs. In platypus we found cis- and trans-duplication distribution patterns for snoRNAs, which have not been described in any other vertebrates but are known to occur in nematodes. An exciting novelty in platypus is a snoRNA-derived retroposon (termed snoRTE) that facilitates a very effective dispersal of an H/ACA snoRNA via RTE-mediated retroposition. From more than 40,000 detected full-length and truncated genomic copies of this snoRTE, at least 21 are processed into mature snoRNAs. High-copy retroposition via multiple host gene-promoted transcription units is a novel pathway for combining housekeeping function and SINE-like dispersal and reveals a new dimension in the evolution of novel snoRNA function. PMID:18463303
Genome-wide association study of the four-constitution medicine.

PubMed

Yin, Chang Shik; Park, Hi Joon; Chung, Joo-Ho; Lee, Hye-Jung; Lee, Byung-Cheol

2009-12-01

Four-constitution medicine (FCM), also known as Sasang constitutional medicine and heir to a long tradition of individualized acupuncture medicine, is one of the holistic, traditional systems of constitution that appraise and categorize individual differences into four major types. This study reports the first genome-wide association study on FCM, to explore the genetic basis of FCM and facilitate the integration of FCM with conventional individual differences research. Healthy individuals of the Korean population were classified into the four constitutional types (FCTs). A total of 353,202 single nucleotide polymorphisms (SNPs) were typed using whole genome amplified samples, and six-way comparison of FCM types provided lists of significantly differential SNPs. In one-to-one FCT comparisons, 15,944 SNPs were significantly differential, and 5 SNPs were commonly significant in all of the three comparisons. In one-to-two FCT comparisons, 22,616 SNPs were significantly differential, and 20 SNPs were commonly significant in all of the three comparison groups. This study presents the association between genome-wide SNP profiles and the categorization of the FCM, and it could further provide a starting point for genome-based identification and research of the constitutions of FCM.
Refinement of light-responsive transcript lists using rice oligonucleotide arrays: evaluation of gene-redundancy.

PubMed

Jung, Ki-Hong; Dardick, Christopher; Bartley, Laura E; Cao, Peijian; Phetsom, Jirapa; Canlas, Patrick; Seo, Young-Su; Shultz, Michael; Ouyang, Shu; Yuan, Qiaoping; Frank, Bryan C; Ly, Eugene; Zheng, Li; Jia, Yi; Hsia, An-Ping; An, Kyungsook; Chou, Hui-Hsien; Rocke, David; Lee, Geun Cheol; Schnable, Patrick S; An, Gynheung; Buell, C Robin; Ronald, Pamela C

2008-10-06

Studies of gene function are often hampered by gene-redundancy, especially in organisms with large genomes such as rice (Oryza sativa). We present an approach for using transcriptomics data to focus functional studies and address redundancy. To this end, we have constructed and validated an inexpensive and publicly available rice oligonucleotide near-whole genome array, called the rice NSF45K array. We generated expression profiles for light- vs. dark-grown rice leaf tissue and validated the biological significance of the data by analyzing sources of variation and confirming expression trends with reverse transcription polymerase chain reaction. We examined trends in the data by evaluating enrichment of gene ontology terms at multiple false discovery rate thresholds. To compare data generated with the NSF45K array with published results, we developed publicly available, web-based tools (www.ricearray.org). The Oligo and EST Anatomy Viewer enables visualization of EST-based expression profiling data for all genes on the array. The Rice Multi-platform Microarray Search Tool facilitates comparison of gene expression profiles across multiple rice microarray platforms. Finally, we incorporated gene expression and biochemical pathway data to reduce the number of candidate gene products putatively participating in the eight steps of the photorespiration pathway from 52 to 10, based on expression levels of putatively functionally redundant genes. We confirmed the efficacy of this method to cope with redundancy by correctly predicting participation in photorespiration of a gene with five paralogs. Applying these methods will accelerate rice functional genomics.
Evidence-based green algal genomics reveals marine diversity and ancestral characteristics of land plants

DOE Office of Scientific and Technical Information (OSTI.GOV)

van Baren, Marijke J.; Bachy, Charles; Reistetter, Emily Nahas

Prasinophytes are widespread marine green algae that are related to plants. Abundance of the genus Micromonas has reportedly increased in the Arctic due to climate-induced changes. Thus, studies of these organisms are important for marine ecology and understanding Viridiplantae evolution and diversification. We generated evidence-based Micromonas gene models using proteomics and RNA-Seq to improve prasinophyte genomic resources. First, sequences of four chromosomes in the 22 Mb Micromonas pusilla (CCMP1545) genome were finished. Comparison with the finished 21 Mb Micromonas commoda (RCC299) shows they share ≤ 8,142 of ~10,000 protein-encoding genes, depending on the analysis method. Unlike RCC299 and other sequenced eukaryotes, CCMP1545 has two abundant repetitive intron types and a high percentage (26%) of GC splice donors. Micromonas has more genus-specific protein families (19%) than other genome-sequenced prasinophytes (11%). Comparative analyses using predicted proteomes from other prasinophytes reveal proteins likely related to scale formation and ancestral photosynthesis. Our studies also indicate that peptidoglycan (PG) biosynthesis enzymes have been lost in multiple independent events in select prasinophytes and most plants. However, CCMP1545, polar Micromonas CCMP2099 and prasinophytes from other classes retain the entire PG pathway, like moss and glaucophyte algae. Multiple vascular plants that share a unique bi-domain protein also have the pathway, except the Penicillin-Binding-Protein. Alongside Micromonas experiments using antibiotics that halt bacterial PG biosynthesis, the findings highlight unrecognized phylogenetic complexity in the PG-pathway retention and implicate a role in chloroplast structure or division in several extant Viridiplantae lineages. Extensive differences in gene loss and architecture between related prasinophytes underscore their extensive divergence.
PG biosynthesis genes from the cyanobacterial endosymbiont that became the plastid have been selectively retained in some plants and algae, implying a biological function. As a result, our studies provide robust genomic resources for emerging model algae, advancing knowledge of marine phytoplankton and plant evolution.
Evidence-based green algal genomics reveals marine diversity and ancestral characteristics of land plants

DOE PAGES

van Baren, Marijke J.; Bachy, Charles; Reistetter, Emily Nahas; ...

2016-03-31

Prasinophytes are widespread marine green algae that are related to plants. Abundance of the genus Micromonas has reportedly increased in the Arctic due to climate-induced changes. Thus, studies of these organisms are important for marine ecology and understanding Viridiplantae evolution and diversification. We generated evidence-based Micromonas gene models using proteomics and RNA-Seq to improve prasinophyte genomic resources. First, sequences of four chromosomes in the 22 Mb Micromonas pusilla (CCMP1545) genome were finished. Comparison with the finished 21 Mb Micromonas commoda (RCC299) shows they share ≤ 8,142 of ~10,000 protein-encoding genes, depending on the analysis method. Unlike RCC299 and other sequenced eukaryotes, CCMP1545 has two abundant repetitive intron types and a high percentage (26%) of GC splice donors. Micromonas has more genus-specific protein families (19%) than other genome-sequenced prasinophytes (11%). Comparative analyses using predicted proteomes from other prasinophytes reveal proteins likely related to scale formation and ancestral photosynthesis. Our studies also indicate that peptidoglycan (PG) biosynthesis enzymes have been lost in multiple independent events in select prasinophytes and most plants. However, CCMP1545, polar Micromonas CCMP2099 and prasinophytes from other classes retain the entire PG pathway, like moss and glaucophyte algae. Multiple vascular plants that share a unique bi-domain protein also have the pathway, except the Penicillin-Binding-Protein. Alongside Micromonas experiments using antibiotics that halt bacterial PG biosynthesis, the findings highlight unrecognized phylogenetic complexity in the PG-pathway retention and implicate a role in chloroplast structure or division in several extant Viridiplantae lineages. Extensive differences in gene loss and architecture between related prasinophytes underscore their extensive divergence.
PG biosynthesis genes from the cyanobacterial endosymbiont that became the plastid have been selectively retained in some plants and algae, implying a biological function. As a result, our studies provide robust genomic resources for emerging model algae, advancing knowledge of marine phytoplankton and plant evolution.
Variation in Recombination Rate and Its Genetic Determinism in Sheep Populations

PubMed Central

Petit, Morgane; Astruc, Jean-Michel; Sarry, Julien; Drouilhet, Laurence; Fabre, Stéphane; Moreno, Carole R.; Servin, Bertrand

2017-01-01

Recombination is a complex biological process that results from a cascade of multiple events during meiosis. Understanding the genetic determinism of recombination can help to understand if and how these events are interacting. To tackle this question, we studied the patterns of recombination in sheep, using multiple approaches and data sets. We constructed male recombination maps in a dairy breed from the south of France (the Lacaune breed) at a fine scale by combining meiotic recombination rates from a large pedigree genotyped with a 50K SNP array and historical recombination rates from a sample of unrelated individuals genotyped with a 600K SNP array. This analysis revealed recombination patterns in sheep similar to other mammals but also genome regions that have likely been affected by directional and diversifying selection. We estimated the average recombination rate of Lacaune sheep at 1.5 cM/Mb, identified ∼50,000 crossover hotspots on the genome, and found a high correlation between historical and meiotic recombination rate estimates. A genome-wide association study revealed two major loci affecting interindividual variation in recombination rate in Lacaune, including the RNF212 and HEI10 genes and possibly two other loci of smaller effects including the KCNJ15 and FSHR genes. The comparison of these new results to those obtained previously in a distantly related population of domestic sheep (the Soay) revealed that Soay and Lacaune males have a very similar distribution of recombination along the genome. The two data sets were thus combined to create more precise male meiotic recombination maps in Sheep. However, despite their similar recombination maps, Soay and Lacaune males were found to exhibit different heritabilities and QTL effects for interindividual variation in genome-wide recombination rates. This highlights the robustness of recombination patterns to underlying variation in their genetic determinism. PMID:28978774
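The abstract reports an average recombination rate in cM/Mb. As a rough illustration of how such a figure is derived, the sketch below converts a crossover count into a rate, assuming the common approximation that one crossover per 100 meioses corresponds to 1 cM. The function name and the numbers are hypothetical and are not taken from the study, whose actual estimation procedure is not described in the abstract.

```python
def recombination_rate_cm_per_mb(crossovers, meioses, start_bp, end_bp):
    """Average recombination rate of a genomic interval, in cM/Mb.

    Genetic length is approximated as 100 * crossovers / meioses (cM),
    which is an assumption suitable only for illustration.
    """
    if meioses <= 0 or end_bp <= start_bp:
        raise ValueError("need meioses > 0 and end_bp > start_bp")
    centimorgans = 100.0 * crossovers / meioses
    megabases = (end_bp - start_bp) / 1e6
    return centimorgans / megabases


# Hypothetical example: 30 crossovers observed in 1,000 meioses
# within a 2 Mb window.
rate = recombination_rate_cm_per_mb(30, 1000, 10_000_000, 12_000_000)
print(f"{rate:.2f} cM/Mb")
```

Summing crossovers over many windows and dividing by the total physical length would give a genome-wide average in the same units.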
Variation in Recombination Rate and Its Genetic Determinism in Sheep Populations.

PubMed

Petit, Morgane; Astruc, Jean-Michel; Sarry, Julien; Drouilhet, Laurence; Fabre, Stéphane; Moreno, Carole R; Servin, Bertrand

2017-10-01

Recombination is a complex biological process that results from a cascade of multiple events during meiosis. Understanding the genetic determinism of recombination can help to understand if and how these events are interacting. To tackle this question, we studied the patterns of recombination in sheep, using multiple approaches and data sets. We constructed male recombination maps in a dairy breed from the south of France (the Lacaune breed) at a fine scale by combining meiotic recombination rates from a large pedigree genotyped with a 50K SNP array and historical recombination rates from a sample of unrelated individuals genotyped with a 600K SNP array. This analysis revealed recombination patterns in sheep similar to other mammals but also genome regions that have likely been affected by directional and diversifying selection. We estimated the average recombination rate of Lacaune sheep at 1.5 cM/Mb, identified ∼50,000 crossover hotspots on the genome, and found a high correlation between historical and meiotic recombination rate estimates. A genome-wide association study revealed two major loci affecting interindividual variation in recombination rate in Lacaune, including the RNF212 and HEI10 genes and possibly two other loci of smaller effects including the KCNJ15 and FSHR genes. The comparison of these new results to those obtained previously in a distantly related population of domestic sheep (the Soay) revealed that Soay and Lacaune males have a very similar distribution of recombination along the genome. The two data sets were thus combined to create more precise male meiotic recombination maps in Sheep. However, despite their similar recombination maps, Soay and Lacaune males were found to exhibit different heritabilities and QTL effects for interindividual variation in genome-wide recombination rates. This highlights the robustness of recombination patterns to underlying variation in their genetic determinism. 
Copyright © 2017 by the Genetics Society of America.
Annotation of the Clostridium Acetobutylicum Genome

DOE Office of Scientific and Technical Information (OSTI.GOV)

Daly, M. J.

The genome sequence of the solvent-producing bacterium Clostridium acetobutylicum ATCC824 has been determined by the shotgun approach. The genome consists of a 3.94 Mb chromosome and a 192 kb megaplasmid that contains the majority of genes responsible for solvent production. Comparison of C. acetobutylicum to Bacillus subtilis reveals significant local conservation of gene order, which has not been seen in comparisons of other genomes with similar, or, in some cases, closer, phylogenetic proximity. This conservation allows the prediction of many previously undetected operons in both bacteria.
Genome-wide comparisons of phylogenetic similarities between partial genomic regions and the full-length genome in Hepatitis E virus genotyping.

PubMed

Wang, Shuai; Wei, Wei; Luo, Xuenong; Cai, Xuepeng

2014-01-01

Besides the complete genome, different partial genomic sequences of Hepatitis E virus (HEV) have been used in genotyping studies, making it difficult to compare the results based on them. No commonly agreed partial region for HEV genotyping has been determined. In this study, we used a statistical method to evaluate the phylogenetic performance of each partial genomic sequence genome-wide, comparing evolutionary distances between genomic regions and the full-length genomes of 101 HEV isolates to identify short genomic regions that can reproduce HEV genotype assignments based on full-length genomes. Several genomic regions, especially one at the 3'-terminus of the papain-like cysteine protease domain, were found to have relatively high phylogenetic correlations with the full-length genome. Phylogenetic analyses confirmed that these regions performed identically to the full-length genome in genotyping, dividing the HEV isolates involved into reasonable genotypes. This analysis may be of value in developing a partial sequence-based consensus classification of HEV species.
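The abstract does not name the statistical method used. One plausible way to score how well a partial region tracks the full-length genome is to correlate the pairwise distances computed from the region with those computed from the whole alignment. The sketch below does this with simple p-distances and a Pearson correlation; the distance model, the function names, and the toy alignment are all assumptions made for illustration.

```python
from itertools import combinations
from math import sqrt


def p_distance(a, b):
    """Proportion of differing sites between two aligned sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be aligned, equal-length and non-empty")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def pairwise_distances(seqs, start=0, end=None):
    """p-distances for every pair of isolates over alignment columns [start, end)."""
    names = sorted(seqs)
    return [p_distance(seqs[m][start:end], seqs[n][start:end])
            for m, n in combinations(names, 2)]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation undefined for constant distances")
    return cov / sqrt(var_x * var_y)


# Toy alignment (hypothetical isolates, not real HEV sequences).
alignment = {
    "iso1": "ACGTACGTACGT",
    "iso2": "ACGTACGTACGA",
    "iso3": "TCGAACCTAGGA",
    "iso4": "TCGAACCTAGGT",
}
full = pairwise_distances(alignment)
region = pairwise_distances(alignment, 0, 6)  # first six columns only
r = pearson(region, full)
print(f"correlation between region and full-length distances: {r:.3f}")
```

Sliding the window across the alignment and ranking windows by this correlation would highlight candidate regions, under the stated assumptions.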
Draft genome sequence of non-shiga toxin-producing Escherichia coli O157 NCCP15738.

PubMed

Kwon, Taesoo; Kim, Jung-Beom; Bak, Young-Seok; Yu, Young-Bin; Kwon, Ki Sung; Kim, Won; Cho, Seung-Hak

2016-01-01

The non-shiga toxin-producing Escherichia coli (non-STEC) O157 is a pathogenic strain that causes diarrhea but does not cause hemolytic-uremic syndrome or hemorrhagic colitis. Here, we present the 5-Mb draft genome sequence of non-STEC O157 NCCP15738, which was isolated from the feces of a Korean patient with diarrhea, and describe its features and the structural basis for its genome evolution. A total of 565-Mbp paired-end reads were generated using the Illumina-HiSeq 2000 platform. The reads were assembled into 135 scaffolds by de novo assembly. The assembled genome size of NCCP15738 was 5,005,278 bp with an N50 value of 142,450 bp and 50.65% G+C content. Using Rapid Annotation using Subsystem Technology analysis, we predicted 4780 ORFs and 31 RNA genes. The evolutionary tree was inferred from multiple sequence alignment of 45 E. coli species. The most closely related neighbor of NCCP15738 indicated by whole-genome phylogeny was E. coli UMNK88, but that indicated by multilocus sequence analysis was E. coli DH1(ME8569). A comparison between the NCCP15738 genome and those of the reference strains E. coli K-12 substr. MG1655 and EHEC O157:H7 EDL933 by bioinformatics analyses revealed unique genes in NCCP15738 associated with lysis protein S, two-component signal transduction system, conjugation, the flagellum, nucleotide-binding proteins, and metal-ion binding proteins. Notably, NCCP15738 has a dual flagella system like that in Vibrio parahaemolyticus, Aeromonas spp., and Rhodospirillum centenum. The draft genome sequence and the results of bioinformatics analysis of NCCP15738 provide the basis for understanding the genomic evolution of this strain.
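The assembly statistics quoted above (N50 and G+C content) follow standard definitions. A minimal sketch of both is given here, using hypothetical scaffold lengths and a toy sequence rather than the NCCP15738 data. Treating ambiguous bases such as N as excluded from the G+C denominator is an assumption, since conventions differ between tools.

```python
def n50(lengths):
    """Length L such that sequences of length >= L cover at least half the assembly."""
    if not lengths:
        raise ValueError("no sequence lengths given")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


def gc_percent(sequence):
    """G+C content as a percentage of unambiguous bases (A, C, G, T)."""
    seq = sequence.upper()
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return 100.0 * (seq.count("G") + seq.count("C")) / counted


# Hypothetical scaffold lengths (bp) and a toy sequence.
scaffolds = [80, 70, 50, 40, 30, 20]
print(n50(scaffolds))            # 70
print(gc_percent("ATGCGCNNAT"))  # 50.0
```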
Comparative genomic analysis of multiple strains of two unusual plant pathogens: Pseudomonas corrugata and Pseudomonas mediterranea

PubMed Central

Trantas, Emmanouil A.; Licciardello, Grazia; Almeida, Nalvo F.; Witek, Kamil; Strano, Cinzia P.; Duxbury, Zane; Ververidis, Filippos; Goumas, Dimitrios E.; Jones, Jonathan D. G.; Guttman, David S.; Catara, Vittoria; Sarris, Panagiotis F.

2015-01-01

The non-fluorescent pseudomonads, Pseudomonas corrugata (Pcor) and P. mediterranea (Pmed), are closely related species that cause pith necrosis, a disease of tomato that causes severe crop losses. However, they also show strong antagonistic effects against economically important pathogens, demonstrating their potential for utilization as biological control agents. In addition, their metabolic versatility makes them attractive for the production of commercial biomolecules and bioremediation. An extensive comparative genomics study is required to dissect the mechanisms that Pcor and Pmed employ to cause disease and to prevent disease caused by other pathogens, and to mine their genomes for genes that encode proteins involved in commercially important chemical pathways. Here, we present the draft genomes of nine Pcor and Pmed strains from different geographical locations. This analysis covered significant genetic heterogeneity and allowed in-depth genomic comparison. All examined strains were able to trigger symptoms in tomato plants but not all induced a hypersensitive-like response in Nicotiana benthamiana. Genome-mining revealed the absence of a type III secretion system and known type III effector-encoding genes from all examined Pcor and Pmed strains. The lack of a type III secretion system appears to be unique among the plant pathogenic pseudomonads. Several gene clusters coding for type VI secretion systems were detected in all genomes. Genome-mining also revealed the presence of gene clusters for biosynthesis of siderophores, polyketides, non-ribosomal peptides, and hydrogen cyanide. A highly conserved quorum sensing system was detected in all strains, although species-specific differences were observed. Our study provides the basis for in-depth investigations regarding the molecular mechanisms underlying virulence strategies in the battle between plants and microbes. PMID:26300874
The GAAS metagenomic tool and its estimations of viral and microbial average genome size in four major biomes.

PubMed

Angly, Florent E; Willner, Dana; Prieto-Davó, Alejandra; Edwards, Robert A; Schmieder, Robert; Vega-Thurber, Rebecca; Antonopoulos, Dionysios A; Barott, Katie; Cottrell, Matthew T; Desnues, Christelle; Dinsdale, Elizabeth A; Furlan, Mike; Haynes, Matthew; Henn, Matthew R; Hu, Yongfei; Kirchman, David L; McDole, Tracey; McPherson, John D; Meyer, Folker; Miller, R Michael; Mundt, Egbert; Naviaux, Robert K; Rodriguez-Mueller, Beltran; Stevens, Rick; Wegley, Linda; Zhang, Lixin; Zhu, Baoli; Rohwer, Forest

2009-12-01

Metagenomic studies characterize both the composition and diversity of uncultured viral and microbial communities. BLAST-based comparisons have typically been used for such analyses; however, sampling biases, high percentages of unknown sequences, and the use of arbitrary thresholds to find significant similarities can decrease the accuracy and validity of estimates. Here, we present Genome relative Abundance and Average Size (GAAS), a complete software package that provides improved estimates of community composition and average genome length for metagenomes in both textual and graphical formats. GAAS implements a novel methodology to control for sampling bias via length normalization, to adjust for multiple BLAST similarities by similarity weighting, and to select significant similarities using relative alignment lengths. In benchmark tests, the GAAS method was robust to both high percentages of unknown sequences and to variations in metagenomic sequence read lengths. Re-analysis of the Sargasso Sea virome using GAAS indicated that standard methodologies for metagenomic analysis may dramatically underestimate the abundance and importance of organisms with small genomes in environmental systems. Using GAAS, we conducted a meta-analysis of microbial and viral average genome lengths in over 150 metagenomes from four biomes to determine whether genome lengths vary consistently between and within biomes, and between microbial and viral communities from the same environment. Significant differences between biomes and within aquatic sub-biomes (oceans, hypersaline systems, freshwater, and microbialites) suggested that average genome length is a fundamental property of environments driven by factors at the sub-biome level. The behavior of paired viral and microbial metagenomes from the same environment indicated that microbial and viral average genome sizes are independent of each other, but indicative of community responses to stressors and environmental conditions.
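To make the idea of length normalization and similarity weighting concrete, the sketch below estimates relative genome abundance and average genome length from weighted read-to-genome hits. It is an illustration of the general principle only and not the GAAS implementation: the weighting scheme, function names, and the toy genomes are assumptions.

```python
from collections import defaultdict


def relative_abundance(hits, genome_lengths):
    """Estimate relative abundance of genomes from weighted similarity hits.

    hits: list of (read_id, genome_id, weight) tuples; the weights are
    hypothetical similarity scores, rescaled so each read counts once.
    """
    per_read = defaultdict(float)
    for read_id, _, weight in hits:
        per_read[read_id] += weight

    score = defaultdict(float)
    for read_id, genome_id, weight in hits:
        if per_read[read_id] > 0:
            score[genome_id] += weight / per_read[read_id]

    # Length normalization: a longer genome yields more reads per copy,
    # so divide each score by the genome length.
    corrected = {g: s / genome_lengths[g] for g, s in score.items()}
    total = sum(corrected.values())
    if total == 0:
        raise ValueError("no usable hits")
    return {g: c / total for g, c in corrected.items()}


def average_genome_length(abundance, genome_lengths):
    """Abundance-weighted mean genome length."""
    return sum(a * genome_lengths[g] for g, a in abundance.items())


# Toy data: ten reads hit a 50 kb genome, one read hits a 5 kb genome.
lengths = {"large_phage": 50_000, "small_phage": 5_000}
hits = [(f"r{i}", "large_phage", 1.0) for i in range(10)]
hits.append(("r10", "small_phage", 1.0))

abundance = relative_abundance(hits, lengths)
print(abundance)
print(average_genome_length(abundance, lengths))
```

In this toy case the raw read counts favor the large genome ten to one, while the length-corrected estimate treats the two genomes as equally abundant, which is the kind of underestimation of small genomes the abstract describes.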
Environmental genomics of "Haloquadratum walsbyi" in a saltern crystallizer indicates a large pool of accessory genes in an otherwise coherent species

PubMed Central

Legault, Boris A; Lopez-Lopez, Arantxa; Alba-Casado, Jose Carlos; Doolittle, W Ford; Bolhuis, Henk; Rodriguez-Valera, Francisco; Papke, R Thane

2006-01-01

Background Mature saturated brine (crystallizers) communities are largely dominated (>80% of cells) by the square halophilic archaeon "Haloquadratum walsbyi". The recent cultivation of the strain HBSQ001 and the sequencing of its genome allow comparison with the metagenome of this taxonomically simplified environment. Similar studies carried out in other extreme environments have revealed very little diversity in gene content among the cell lineages present. Results The metagenome of the microbial community of a crystallizer pond has been analyzed by end sequencing a 2000 clone fosmid library and comparing the sequences obtained with the genome sequence of "Haloquadratum walsbyi". The genome of the sequenced strain was retrieved nearly complete within this environmental DNA library. However, many ORFs that could be ascribed to the "Haloquadratum" metapopulation by common genome characteristics or scaffolding to the strain genome were not present in the specific sequenced isolate. Particularly, three regions of the sequenced genome were associated with multiple rearrangements and the presence of different genes from the metapopulation. Many transposition and phage related genes were found within this pool which, together with the associated atypical GC content in these areas, supports lateral gene transfer mediated by these elements as the most probable genetic cause of this variability. Additionally, these sequences were highly enriched in putative regulatory and signal transduction functions. Conclusion These results point to a large pan-genome (total gene repertoire of the genus/species) even in this highly specialized extremophile and at a single geographic location. The extensive gene repertoire is what might be expected of a population that exploits a diverse nutrient pool, resulting from the degradation of biomass produced at lower salinities. PMID:16820057
Conservation and loss of ribosomal RNA gene sites in diploid and polyploid Fragaria (Rosaceae)

PubMed Central

2011-01-01

Background The genus Fragaria comprises species at ploidy levels ranging from diploid (2n = 2x = 14) to decaploid (2n = 10x = 70). Fluorescence in situ hybridization with 5S and 25S rDNA probes was performed to gather cytogenetic information that illuminates genomic divergence among different taxa at multiple ploidy levels, as well as to explore the evolution of ribosomal RNA genes during polyploidization in Fragaria. Results Root tip cells of diploid taxa were typified by two 5S and six 25S rDNA hybridization signals of varying intensities, providing a baseline for comparisons within the genus. In three exceptional diploid genotypes, F. nilgerrensis (CFRA 1358 and CFRA 1825) and F. vesca 'Yellow Wonder', two 5S but only four 25S rDNA sites were found but with differing site losses. The numbers of 5S and 25S rDNA signals, respectively were three and nine in a triploid F. ×bifera accession, and were four and twelve in three tetraploids, thus occurring in proportional 1.5× and 2× multiples of the typical diploid pattern. In hexaploid F. moschata, a proportional multiple of six 5S rDNA sites was observed, but the number of 25S rDNA sites was one or two less than the proportionate prediction of eighteen. This apparent tendency toward rDNA site loss at higher ploidy was markedly expanded in octoploids, which displayed only two 5S and ten 25S rDNA sites. In the two decaploids examined, the numbers of 5S and 25S rDNA signals, respectively, were four and fifteen in F. virginiana subsp. platypetala, and six and twelve in F. iturupensis. Conclusions Among diploid Fragaria species, a general consistency of rDNA site numbers implies conserved genomic organization, but highly variable 25S signal sizes and intensities and two instances of site loss suggest concurrent high dynamics of rDNA copy numbers among both homologs and non-homologs. 
General conservation of rDNA site numbers in lower ploidy, but marked site number reductions at higher ploidy levels, suggest complex evolution of rDNA sites during polyploidization and/or independent evolutionary pathways for 6x versus higher ploidy strawberries. Site number comparisons suggest common genomic composition among natural octoploids, and independent origins of the two divergent decaploid accessions. PMID:22074487
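The "proportional multiple" reasoning in this abstract is simple arithmetic: scale the typical diploid pattern (two 5S and six 25S sites) by ploidy and compare with the observed counts. The short sketch below does this for the triploid, tetraploid, and octoploid counts quoted above; the function and variable names are illustrative.

```python
# Typical diploid (2x) pattern reported in the abstract.
DIPLOID_SITES = {"5S": 2, "25S": 6}


def expected_sites(ploidy):
    """Expected rDNA site counts if sites scale proportionally with ploidy."""
    return {locus: count * ploidy // 2 for locus, count in DIPLOID_SITES.items()}


# Observed counts quoted in the abstract.
observed = {
    3: {"5S": 3, "25S": 9},    # triploid F. xbifera accession
    4: {"5S": 4, "25S": 12},   # tetraploids
    8: {"5S": 2, "25S": 10},   # octoploids
}

for ploidy, counts in observed.items():
    expected = expected_sites(ploidy)
    deficit = {locus: expected[locus] - counts[locus] for locus in counts}
    print(f"{ploidy}x expected={expected} observed={counts} deficit={deficit}")
```

The triploid and tetraploid rows show no deficit, whereas the octoploid row falls well short of the proportional expectation, matching the site loss described in the text.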
Comparison of de novo assembly statistics of Cucumis sativus L.

NASA Astrophysics Data System (ADS)

Wojcieszek, Michał; Kuśmirek, Wiktor; Pawełkowicz, Magdalena; Pląder, Wojciech; Nowak, Robert M.

2017-08-01

Genome sequencing is the core of genomic research. As NGS has matured and sequencing costs have fallen, the bottleneck has shifted to genome assembly. Developing a proper tool for this task is essential, because assembly quality strongly affects downstream research. Here we present a comparison of several de Bruijn graph assemblers tested on C. sativus genomic reads. The assessment shows that the newly developed software, dnaasm, provides better results in terms of quantity and quality: the number of generated sequences is lower by 5-33%, with up to two-fold higher N50. Quality checks showed that dnaasm generated reliable results. This provides a strong base for future genomic analysis.
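The two statistics compared in this abstract, number of sequences and N50, are simple to compute from a list of contig lengths. The sketch below uses the common definition of N50 and made-up contig lengths. It is not taken from dnaasm or from the study's evaluation scripts.

```python
def n50(lengths):
    """Return the N50: the length L such that contigs of length >= L
    together cover at least half of the total assembly length."""
    if not lengths:
        raise ValueError("no contig lengths given")
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length


# Hypothetical assemblies with the same total length (300 bp).
contigs_a = [100, 80, 60, 40, 20]
contigs_b = [200, 60, 40]

for name, contigs in (("A", contigs_a), ("B", contigs_b)):
    print(name, "sequences:", len(contigs), "N50:", n50(contigs))
```

In this toy case assembly B has fewer sequences and a higher N50 than assembly A, which is the direction of improvement the abstract reports.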

PrimerDesign-M: A multiple-alignment based multiple-primer design tool for walking across variable genomes

DOE PAGES

Yoon, Hyejin; Leitner, Thomas

2014-12-17

Analyses of entire viral genomes or mtDNA require comprehensive design of many primers across their genomes. In addition, simultaneous optimization of several DNA primer design criteria may improve overall experimental efficiency and downstream bioinformatic processing. To achieve these goals, we developed PrimerDesign-M. It includes several options for multiple-primer design, allowing researchers to efficiently design walking primers that cover long DNA targets, such as entire HIV-1 genomes, and it optimizes primers simultaneously informed by genetic diversity in multiple alignments and experimental design constraints given by the user. PrimerDesign-M can also design primers that include DNA barcodes and minimize primer dimerization. PrimerDesign-M finds optimal primers for highly variable DNA targets and facilitates design flexibility by suggesting alternative designs to adapt to experimental conditions.
Accurate prediction of protein–protein interactions from sequence alignments using a Bayesian method

PubMed Central

Burger, Lukas; van Nimwegen, Erik

2008-01-01

Accurate and large-scale prediction of protein–protein interactions directly from amino-acid sequences is one of the great challenges in computational biology. Here we present a new Bayesian network method that predicts interaction partners using only multiple alignments of amino-acid sequences of interacting protein domains, without tunable parameters, and without the need for any training examples. We first apply the method to bacterial two-component systems and comprehensively reconstruct two-component signaling networks across all sequenced bacteria. Comparisons of our predictions with known interactions show that our method infers interaction partners genome-wide with high accuracy. To demonstrate the general applicability of our method we show that it also accurately predicts interaction partners in a recent dataset of polyketide synthases. Analysis of the predicted genome-wide two-component signaling networks shows that cognates (interacting kinase/regulator pairs, which lie adjacent on the genome) and orphans (which lie isolated) form two relatively independent components of the signaling network in each genome. In addition, while most genes are predicted to have only a small number of interaction partners, we find that 10% of orphans form a separate class of ‘hub' nodes that distribute and integrate signals to and from up to tens of different interaction partners. PMID:18277381
Insights from Human/Mouse genome comparisons

DOE Office of Scientific and Technical Information (OSTI.GOV)

Pennacchio, Len A.

2003-03-30

Large-scale public genomic sequencing efforts have provided a wealth of vertebrate sequence data poised to provide insights into mammalian biology. These include deep genomic sequence coverage of human, mouse, rat, zebrafish, and two pufferfish (Fugu rubripes and Tetraodon nigroviridis) (Aparicio et al. 2002; Lander et al. 2001; Venter et al. 2001; Waterston et al. 2002). In addition, a high priority has been placed on determining the genomic sequence of chimpanzee, dog, cow, frog, and chicken (Boguski 2002). While only recently available, whole genome sequence data have provided the unique opportunity to globally compare complete genome contents. Furthermore, the shared evolutionary ancestry of vertebrate species has allowed the development of comparative genomic approaches to identify ancient conserved sequences with functionality. Accordingly, this review focuses on the initial comparison of available mammalian genomes and describes various insights derived from such analysis.
Phylogenomic Insights into Mouse Evolution Using a Pseudoreference Approach

PubMed Central

Sarver, Brice A.J.; Keeble, Sara; Cosart, Ted; Tucker, Priscilla K.; Dean, Matthew D.

2017-01-01

Comparative genomic studies are now possible across a broad range of evolutionary timescales, but the generation and analysis of genomic data across many different species still present a number of challenges. The most sophisticated genotyping and down-stream analytical frameworks are still predominantly based on comparisons to high-quality reference genomes. However, established genomic resources are often limited within a given group of species, necessitating comparisons to divergent reference genomes that could restrict or bias comparisons across a phylogenetic sample. Here, we develop a scalable pseudoreference approach to iteratively incorporate sample-specific variation into a genome reference and reduce the effects of systematic mapping bias in downstream analyses. To characterize this framework, we used targeted capture to sequence whole exomes (∼54 Mbp) in 12 lineages (ten species) of mice spanning the Mus radiation. We generated whole exome pseudoreferences for all species and show that this iterative reference-based approach improved basic genomic analyses that depend on mapping accuracy while preserving the associated annotations of the mouse reference genome. We then use these pseudoreferences to resolve evolutionary relationships among these lineages while accounting for phylogenetic discordance across the genome, contributing an important resource for comparative studies in the mouse system. We also describe patterns of genomic introgression among lineages and compare our results to previous studies. Our general approach can be applied to whole or partitioned genomic data and is easily portable to any system with sufficient genomic resources, providing a useful framework for phylogenomic studies in mice and other taxa. PMID:28338821
Low-pass sequencing for microbial comparative genomics

PubMed Central

Goo, Young Ah; Roach, Jared; Glusman, Gustavo; Baliga, Nitin S; Deutsch, Kerry; Pan, Min; Kennedy, Sean; DasSarma, Shiladitya; Victor Ng, Wailap; Hood, Leroy

2004-01-01

Background We studied four extremely halophilic archaea by low-pass shotgun sequencing: (1) the metabolically versatile Haloarcula marismortui; (2) the non-pigmented Natrialba asiatica; (3) the psychrophile Halorubrum lacusprofundi and (4) the Dead Sea isolate Halobaculum gomorrense. Approximately one thousand single pass genomic sequences per genome were obtained. The data were analyzed by comparative genomic analyses using the completed Halobacterium sp. NRC-1 genome as a reference. Low-pass shotgun sequencing is a simple, inexpensive, and rapid approach that can readily be performed on any cultured microbe. Results As expected, the four archaeal halophiles analyzed exhibit both bacterial and eukaryotic characteristics as well as uniquely archaeal traits. All five halophiles exhibit greater than sixty percent GC content and low isoelectric points (pI) for their predicted proteins. Multiple insertion sequence (IS) elements, often involved in genome rearrangements, were identified in H. lacusprofundi and H. marismortui. The core biological functions that govern cellular and genetic mechanisms of H. sp. NRC-1 appear to be conserved in these four other halophiles. Multiple TATA box binding protein (TBP) and transcription factor IIB (TFB) homologs were identified from most of the four shotgunned halophiles. The reconstructed molecular tree of all five halophiles shows a large divergence between these species, but with the closest relationship being between H. sp. NRC-1 and H. lacusprofundi. Conclusion Despite the diverse habitats of these species, all five halophiles share (1) high GC content and (2) low protein isoelectric points, which are characteristics associated with environmental exposure to UV radiation and hypersalinity, respectively. Identification of multiple IS elements in the genome of H. lacusprofundi and H. 
marismortui suggest that genome structure and dynamic genome reorganization might be similar to that previously observed in the IS-element rich genome of H. sp. NRC-1. Identification of multiple TBP and TFB homologs in these four halophiles are consistent with the hypothesis that different types of complex transcriptional regulation may occur through multiple TBP-TFB combinations in response to rapidly changing environmental conditions. Low-pass shotgun sequence analyses of genomes permit extensive and diverse analyses, and should be generally useful for comparative microbial genomics. PMID:14718067
Syntenic block overlap multiplicities with a panel of reference genomes provide a signature of ancient polyploidization events.

PubMed

Zheng, Chunfang; Santos Muñoz, Daniella; Albert, Victor A; Sankoff, David

2015-01-01

Following whole genome duplication (WGD), there is a compact distribution of gene similarities within the genome reflecting duplicate pairs of all the genes in the genome. With time, the distribution broadens and loses volume due to variable decay of duplicate gene similarity and to the process of duplicate gene loss. If there are two WGD, the older one becomes so reduced and broad that it merges with the tail of the distributions resulting from more recent events, and it becomes difficult to distinguish them. The goal of this paper is to advance statistical methods of identifying, or at least counting, the WGD events in the lineage of a given genome. For a set of 15 angiosperm genomes, we analyze all 15 × 14 = 210 ordered pairs of target genome versus reference genome, using SynMap to find syntenic blocks. We consider all sets of B ≥ 2 syntenic blocks in the target genome that overlap in the reference genome as evidence of WGD activity in the target, whether it be one event or several. We hypothesize that in fitting an exponential function to the tail of the empirical distribution f (B) of block multiplicities, the size of the exponent will reflect the amount of WGD in the history of the target genome. By amalgamating the results from all reference genomes, a range of values of SynMap parameters, and alternative cutoff points for the tail, we find a clear pattern whereby multiple-WGD core eudicots have the smallest (negative) exponents, followed by core eudicots with only the single "γ" triplication in their history, followed by a non-core eudicot with a single WGD, followed by the monocots, with a basal angiosperm, the WGD-free Amborella having the largest exponent. The hypothesis that the exponent of the fit to the tail of the multiplicity distribution is a signature of the amount of WGD is verified, but there is also a clear complicating factor in the monocot clade, where a history of multiple WGD is not reflected in a small exponent.
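Fitting an exponential to the tail of the multiplicity distribution f(B) can be sketched as a least-squares line fit on log f(B). The code below is a generic illustration under that assumption. The abstract does not specify the authors' fitting procedure, and the counts used here are synthetic, not SynMap output.

```python
import math


def fit_exponential_tail(counts, cutoff=2):
    """Fit f(B) ~ a * exp(k * B) for B >= cutoff by ordinary least squares
    on log f(B). `counts` maps multiplicity B to frequency f(B).
    Returns (a, k), where k is the fitted exponent."""
    points = [(mult, math.log(freq))
              for mult, freq in counts.items()
              if mult >= cutoff and freq > 0]
    if len(points) < 2:
        raise ValueError("need at least two tail points to fit")
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    k = sxy / sxx
    a = math.exp(mean_y - k * mean_x)
    return a, k


# Synthetic tail that decays exactly as 1000 * exp(-0.5 * B).
synthetic = {mult: 1000 * math.exp(-0.5 * mult) for mult in range(2, 9)}
a, k = fit_exponential_tail(synthetic)
print(f"a = {a:.1f}, exponent = {k:.3f}")

# Number of ordered (target, reference) pairs for 15 genomes, as in the abstract.
print(15 * 14)
```

The `cutoff` parameter stands in for the "alternative cutoff points for the tail" mentioned in the abstract. Varying it shows how sensitive the fitted exponent is to where the tail is taken to begin.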
Pseudomonas Genome Database: facilitating user-friendly, comprehensive comparisons of microbial genomes.

PubMed

Winsor, Geoffrey L; Van Rossum, Thea; Lo, Raymond; Khaira, Bhavjinder; Whiteside, Matthew D; Hancock, Robert E W; Brinkman, Fiona S L

2009-01-01

Pseudomonas aeruginosa is a well-studied opportunistic pathogen that is particularly known for its intrinsic antimicrobial resistance, diverse metabolic capacity, and its ability to cause life threatening infections in cystic fibrosis patients. The Pseudomonas Genome Database (http://www.pseudomonas.com) was originally developed as a resource for peer-reviewed, continually updated annotation for the Pseudomonas aeruginosa PAO1 reference strain genome. In order to facilitate cross-strain and cross-species genome comparisons with other Pseudomonas species of importance, we have now expanded the database capabilities to include all Pseudomonas species, and have developed or incorporated methods to facilitate high quality comparative genomics. The database contains robust assessment of orthologs, a novel ortholog clustering method, and incorporates five views of the data at the sequence and annotation levels (Gbrowse, Mauve and custom views) to facilitate genome comparisons. A choice of simple and more flexible user-friendly Boolean search features allows researchers to search and compare annotations or sequences within or between genomes. Other features include more accurate protein subcellular localization predictions and a user-friendly, Boolean searchable log file of updates for the reference strain PAO1. This database aims to continue to provide a high quality, annotated genome resource for the research community and is available under an open source license.
TreeQ-VISTA: An Interactive Tree Visualization Tool with Functional Annotation Query Capabilities

DOE Office of Scientific and Technical Information (OSTI.GOV)

Gu, Shengyin; Anderson, Iain; Kunin, Victor

2007-05-07

Summary: We describe a general multiplatform exploratory tool called TreeQ-Vista, designed for presenting functional annotations in a phylogenetic context. Traits, such as phenotypic and genomic properties, are interactively queried from a relational database with a user-friendly interface which provides a set of tools for users with or without SQL knowledge. The query results are projected onto a phylogenetic tree and can be displayed in multiple color groups. A rich set of browsing, grouping and query tools is provided to facilitate trait exploration, comparison and analysis. Availability: The program, detailed tutorial and examples are available online at http://genome-test.lbl.gov/vista/TreeQVista.
What can comparative genomics tell us about species concepts in the genus Aspergillus?

DOE Office of Scientific and Technical Information (OSTI.GOV)

Rokas, Antonis; Payne, Gary; Federova, Natalie D.

2007-12-15

Understanding the nature of species' boundaries is a fundamental question in evolutionary biology. The availability of genomes from several species of the genus Aspergillus allows us for the first time to examine the demarcation of fungal species at the whole-genome level. Here, we examine four case studies, two of which involve intraspecific comparisons, whereas the other two deal with interspecific genomic comparisons between closely related species. These four comparisons reveal significant variation in the nature of species boundaries across Aspergillus. For example, comparisons between A. fumigatus and Neosartorya fischeri (the teleomorph of A. fischerianus) and between A. oryzae and A. flavus suggest that measures of sequence similarity and species-specific genes are significantly higher for the A. fumigatus - N. fischeri pair. Importantly, the values obtained from the comparison between A. oryzae and A. flavus are remarkably similar to those obtained from an intra-specific comparison of A. fumigatus strains, giving support to the proposal that A. oryzae represents a distinct ecotype of A. flavus and not a distinct species. We argue that genomic data can aid Aspergillus taxonomy by serving as a source of novel and unprecedented amounts of comparative data, as a resource for the development of additional diagnostic tools, and finally as a knowledge database about the biological differences between strains and species.
Structural and transcriptional analysis of plant genes encoding the bifunctional lysine ketoglutarate reductase saccharopine dehydrogenase enzyme.

PubMed

Anderson, Olin D; Coleman-Derr, Devin; Gu, Yong Q; Heath, Sekou

2010-06-16

Among the dietary essential amino acids, the most severely limiting in the cereals is lysine. Since cereals make up half of the human diet, lysine limitation has quality/nutritional consequences. The breakdown of lysine is controlled mainly by the catabolic bifunctional enzyme lysine ketoglutarate reductase - saccharopine dehydrogenase (LKR/SDH). The LKR/SDH gene has been reported to produce transcripts for the bifunctional enzyme and separate monofunctional transcripts. In addition to lysine metabolism, this gene has been implicated in a number of metabolic and developmental pathways, which along with its production of multiple transcript types and complex exon/intron structure suggest an important node in plant metabolism. Understanding more about the LKR/SDH gene is thus interesting both from applied standpoint and for basic plant metabolism. The current report describes a wheat genomic fragment containing an LKR/SDH gene and adjacent genes. The wheat LKR/SDH genomic segment was found to originate from the A-genome of wheat, and EST analysis indicates all three LKR/SDH genes in hexaploid wheat are transcriptionally active. A comparison of a set of plant LKR/SDH genes suggests regions of greater sequence conservation likely related to critical enzymatic functions and metabolic controls. Although most plants contain only a single LKR/SDH gene per genome, poplar contains at least two functional bifunctional genes in addition to a monofunctional LKR gene. Analysis of ESTs finds evidence for monofunctional LKR transcripts in switchgrass, and monofunctional SDH transcripts in wheat, Brachypodium, and poplar. The analysis of a wheat LKR/SDH gene and comparative structural and functional analyses among available plant genes provides new information on this important gene. 
Both the structure of the LKR/SDH gene and the immediately adjacent genes show lineage-specific differences between monocots and dicots, and findings suggest variation in activity of LKR/SDH genes among plants. Although most plant genomes seem to contain a single conserved LKR/SDH gene per genome, poplar possesses multiple contiguous genes. A preponderance of SDH transcripts suggests the LKR region may be more rate-limiting. Only switchgrass has EST evidence for LKR monofunctional transcripts. Evidence for monofunctional SDH transcripts shows a novel intron in wheat, Brachypodium, and poplar.
Genome analysis following a national increase in Scarlet Fever in England 2014.

PubMed

Chalker, Victoria; Jironkin, Aleksey; Coelho, Juliana; Al-Shahib, Ali; Platt, Steve; Kapatai, Georgia; Daniel, Roger; Dhami, Chenchal; Laranjeira, Marisa; Chambers, Timothy; Guy, Rebecca; Lamagni, Theresa; Harrison, Timothy; Chand, Meera; Johnson, Alan P; Underwood, Anthony

2017-03-10

During a substantial elevation in scarlet fever (SF) notifications in 2014 a national genomic study was undertaken of Streptococcus pyogenes (Group A Streptococci, GAS) isolates from patients with SF with comparison to isolates from patients with invasive disease (iGAS) to test the hypotheses that the increase in SF was due to either the introduction of one or more new/emerging strains in the population in England or the transmission of a known genetic element through the population of GAS by horizontal gene transfer (HGT) resulting in infections with an increased likelihood of causing SF. Isolates were collected to provide geographical representation, for approximately 5% of SF isolates from each region from 1st April 2014 to 18th June 2014. Contemporaneous iGAS isolates for which genomic data were available were included for comparison. Data were analysed in order to determine emm gene sequence type, phylogenetic lineage and genomic clade representation, the presence of known prophage elements and the presence of genes known to confer pathogenicity and resistance to antibiotics. 555 isolates were analysed, 303 from patients with SF and 252 from patients with iGAS. Isolates from patients with SF were of multiple distinct emm sequence types and phylogenetic lineages. Prior to data normalisation, emm3 was the predominant type (accounting for 42.9% of SF isolates, 130/303, 95% CI 37.5-48.5; 14.7% higher than the percentage of emm3 isolates found in the iGAS isolates). Post-normalisation, emm types 4 and 12 were found to be over-represented in patients with SF versus iGAS (p < 0.001). A single gene, ssa, was over-represented in isolates from patients with SF. No single phage was found to be over-represented in SF vs iGAS. However, a "meta-ssa" phage defined by the presence of :315.2, SPsP6, MGAS10750.3 or HK360ssa, was found to be over-represented. 
The HKU360.vir phage was not detected yet the HKU360.ssa phage was present in 43/63 emm12 isolates but not found to be over-represented in isolates from patients with SF. There is no evidence that the increased number of SF cases was a strain-specific or known mobile element specific phenomenon, as the increase in SF cases was associated with multiple lineages of GAS.
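The emm3 proportion and its confidence interval (130 of 303 SF isolates) can be recomputed as a sanity check. The abstract does not say which interval method the authors used. The sketch below assumes a Wilson score interval with z = 1.96, which is one common choice and gives bounds close to those reported.

```python
import math


def wilson_interval(successes, total, z=1.96):
    """Wilson score interval for a binomial proportion (assumed method;
    the abstract does not name the one actually used)."""
    p = successes / total
    denom = 1 + z * z / total
    centre = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return centre - half, centre + half


low, high = wilson_interval(130, 303)
print(f"emm3 among SF isolates: {130 / 303:.1%} (95% CI {low:.1%} to {high:.1%})")
```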
Geographic isolates of Lymantria dispar multiple nucleopolyhedrovirus: Genome sequence analysis and pathogenicity against European and Asian gypsy moth strains

Treesearch

Robert L. Harrison; Daniel L. Rowley; Melody A. Keena

2016-01-01

Isolates of the baculovirus species Lymantria dispar multiple nucleopolyhedrovirus have been formulated and applied to suppress outbreaks of the gypsy moth, L. dispar. To evaluate the genetic diversity in this species at the genomic level, the genomes of three isolates from Massachusetts, USA (LdMNPV-Aba624), Spain (LdMNPV-3054...
Automated ensemble assembly and validation of microbial genomes.

PubMed

Koren, Sergey; Treangen, Todd J; Hill, Christopher M; Pop, Mihai; Phillippy, Adam M

2014-05-03

The continued democratization of DNA sequencing has sparked a new wave of development of genome assembly and assembly validation methods. As individual research labs, rather than centralized centers, begin to sequence the majority of new genomes, it is important to establish best practices for genome assembly. However, recent evaluations such as GAGE and the Assemblathon have concluded that there is no single best approach to genome assembly. Instead, it is preferable to generate multiple assemblies and validate them to determine which is most useful for the desired analysis; this is a labor-intensive process that is often impossible or unfeasible. To encourage best practices supported by the community, we present iMetAMOS, an automated ensemble assembly pipeline; iMetAMOS encapsulates the process of running, validating, and selecting a single assembly from multiple assemblies. iMetAMOS packages several leading open-source tools into a single binary that automates parameter selection and execution of multiple assemblers, scores the resulting assemblies based on multiple validation metrics, and annotates the assemblies for genes and contaminants. We demonstrate the utility of the ensemble process on 225 previously unassembled Mycobacterium tuberculosis genomes as well as a Rhodobacter sphaeroides benchmark dataset. On these real data, iMetAMOS reliably produces validated assemblies and identifies potential contamination without user intervention. In addition, intelligent parameter selection produces assemblies of R. sphaeroides comparable to or exceeding the quality of those from the GAGE-B evaluation, affecting the relative ranking of some assemblers. Ensemble assembly with iMetAMOS provides users with multiple, validated assemblies for each genome. 
Although computationally limited to small or mid-sized genomes, this approach is the most effective and reproducible means for generating high-quality assemblies and enables users to select an assembly best tailored to their specific needs.
An improved model for whole genome phylogenetic analysis by Fourier transform.

PubMed

Yin, Changchuan; Yau, Stephen S-T

2015-10-07

DNA sequence similarity comparison is one of the major steps in computational phylogenetic studies. The sequence comparison of closely related DNA sequences and genomes is usually performed by multiple sequence alignments (MSA). While the MSA method is accurate for some types of sequences, it may produce incorrect results when DNA sequences have undergone rearrangements, as in many bacterial and viral genomes. It is also limited by its computational complexity for comparing large volumes of data. Previously, we proposed an alignment-free method that exploits the full information contents of DNA sequences by Discrete Fourier Transform (DFT), but still with some limitations. Here, we present a significantly improved method for the similarity comparison of DNA sequences by DFT. In this method, we map DNA sequences into 2-dimensional (2D) numerical sequences and then apply DFT to transform the 2D numerical sequences into the frequency domain. In the 2D mapping, the nucleotide composition of a DNA sequence is a determinant factor, and the mapping reduces the nucleotide composition bias in the distance measure, thus improving the similarity measure of DNA sequences. To compare the DFT power spectra of DNA sequences with different lengths, we propose an improved even scaling algorithm to extend shorter DFT power spectra to the longest length of the underlying sequences. After the DFT power spectra are evenly scaled, the spectra have the same dimensionality in the Fourier frequency space, and the Euclidean distances between the full Fourier power spectra of the DNA sequences are used as the dissimilarity metric. The improved DFT method, with increased computational performance by 2D numerical representation, is applicable to any DNA sequences of different length ranges. We assess the accuracy of the improved DFT similarity measure in hierarchical clustering of different DNA sequences including simulated and real datasets. 
The method yields accurate and reliable phylogenetic trees and demonstrates that the improved DFT dissimilarity measure is an efficient and effective similarity measure of DNA sequences. Due to its high efficiency and accuracy, the proposed DFT similarity measure is successfully applied on phylogenetic analysis for individual genes and large whole bacterial genomes. Copyright © 2015 Elsevier Ltd. All rights reserved.
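The pipeline described in this abstract (2D numerical mapping, DFT power spectrum, scaling to a common length, Euclidean distance) can be sketched in a few lines. Everything specific below is an assumption made for illustration: the nucleotide-to-plane mapping, the use of complex numbers to hold the 2D values, and the linear-interpolation stretch that stands in for the authors' even scaling algorithm. The abstract does not give those details.

```python
import cmath
import math

# Hypothetical 2D mapping: each nucleotide becomes a point in the plane,
# stored as a complex number (real part = x, imaginary part = y).
MAPPING = {"A": complex(0, -1), "T": complex(0, 1),
           "C": complex(-1, 0), "G": complex(1, 0)}


def power_spectrum(seq):
    """Naive O(n^2) DFT power spectrum of the mapped sequence."""
    x = [MAPPING[base] for base in seq.upper()]
    n = len(x)
    spectrum = []
    for k in range(n):
        coeff = sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        spectrum.append(abs(coeff) ** 2)
    return spectrum


def stretch(spectrum, m):
    """Stretch an n-point spectrum to m >= n points by linear interpolation
    (a stand-in for the paper's even scaling, not a reproduction of it)."""
    n = len(spectrum)
    if m < n:
        raise ValueError("target length must not be shorter than the spectrum")
    if m == n:
        return list(spectrum)
    out = []
    for k in range(m):
        pos = k * (n - 1) / (m - 1)
        lo = int(math.floor(pos))
        hi = min(lo + 1, n - 1)
        frac = pos - lo
        out.append((1 - frac) * spectrum[lo] + frac * spectrum[hi])
    return out


def dft_distance(seq_a, seq_b):
    """Euclidean distance between length-matched power spectra."""
    pa, pb = power_spectrum(seq_a), power_spectrum(seq_b)
    m = max(len(pa), len(pb))
    pa, pb = stretch(pa, m), stretch(pb, m)
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(pa, pb)))


print(dft_distance("ACGTACGT", "ACGTACGT"))
print(dft_distance("ACGTACGT", "AAAAAAAAAAAA"))
```

A pairwise distance matrix built with `dft_distance` could then be passed to any hierarchical clustering routine. For real genomes the naive DFT loop would need to be replaced by an FFT.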
A Supervised Statistical Learning Approach for Accurate Legionella pneumophila Source Attribution during Outbreaks

PubMed Central

Buultjens, Andrew H.; Chua, Kyra Y. L.; Baines, Sarah L.; Kwong, Jason; Gao, Wei; Cutcher, Zoe; Adcock, Stuart; Ballard, Susan; Schultz, Mark B.; Tomita, Takehiro; Subasinghe, Nela; Carter, Glen P.; Pidot, Sacha J.; Franklin, Lucinda; Seemann, Torsten; Gonçalves Da Silva, Anders

2017-01-01

ABSTRACT Public health agencies are increasingly relying on genomics during Legionnaires' disease investigations. However, the causative bacterium (Legionella pneumophila) has an unusual population structure, with extreme temporal and spatial genome sequence conservation. Furthermore, Legionnaires' disease outbreaks can be caused by multiple L. pneumophila genotypes in a single source. These factors can confound cluster identification using standard phylogenomic methods. Here, we show that a statistical learning approach based on L. pneumophila core genome single nucleotide polymorphism (SNP) comparisons eliminates ambiguity for defining outbreak clusters and accurately predicts exposure sources for clinical cases. We illustrate the performance of our method by genome comparisons of 234 L. pneumophila isolates obtained from patients and cooling towers in Melbourne, Australia, between 1994 and 2014. This collection included one of the largest reported Legionnaires' disease outbreaks, which involved 125 cases at an aquarium. Using only sequence data from L. pneumophila cooling tower isolates and including all core genome variation, we built a multivariate model using discriminant analysis of principal components (DAPC) to find cooling tower-specific genomic signatures and then used it to predict the origin of clinical isolates. Model assignments were 93% congruent with epidemiological data, including the aquarium Legionnaires' disease outbreak and three other unrelated outbreak investigations. We applied the same approach to a recently described investigation of Legionnaires' disease within a UK hospital and observed a model predictive ability of 86%. We have developed a promising means to breach L. pneumophila genetic diversity extremes and provide objective source attribution data for outbreak investigations. 
IMPORTANCE Microbial outbreak investigations are moving to a paradigm where whole-genome sequencing and phylogenetic trees are used to support epidemiological investigations. It is critical that outbreak source predictions are accurate, particularly for pathogens, like Legionella pneumophila, which can spread widely and rapidly via cooling system aerosols, causing Legionnaires' disease. Here, by studying hundreds of Legionella pneumophila genomes collected over 21 years around a major Australian city, we uncovered limitations with the phylogenetic approach that could lead to a misidentification of outbreak sources. We implement instead a statistical learning technique that eliminates the ambiguity of inferring disease transmission from phylogenies. Our approach takes geolocation information and core genome variation from environmental L. pneumophila isolates to build statistical models that predict with high confidence the environmental source of clinical L. pneumophila during disease outbreaks. We show the versatility of the technique by applying it to unrelated Legionnaires' disease outbreaks in Australia and the UK. PMID:28821546
Genomic sequencing and analyses of HearMNPV—a new Multinucleocapsid nucleopolyhedrovirus isolated from Helicoverpa armigera

PubMed Central

2012-01-01

Background HearMNPV, a nucleopolyhedrovirus (NPV) that infects the cotton bollworm, Helicoverpa armigera, comprises multiple rod-shaped nucleocapsids in the virion (as detected by electron microscopy). HearMNPV shows a different host range compared with H. armigera single-nucleocapsid NPV (HearSNPV). To better understand HearMNPV, the HearMNPV genome was sequenced and analyzed. Methods The morphology of HearMNPV was observed by electron microscopy. qPCR was used to determine the replication kinetics of HearMNPV in H. armigera in vivo. A random genomic library of HearMNPV was constructed according to the “partial filling-in” method, and the sequence and organization of the HearMNPV genome were analyzed and compared with sequence data from other baculoviruses. Results Real-time qPCR showed that HearMNPV DNA replication included a decreasing phase, a latent phase, an exponential phase, and a stationary phase during infection of H. armigera. The HearMNPV genome consists of 154,196 base pairs, with a G + C content of 40.07%. 162 putative ORFs were detected in the HearMNPV genome, which represented 90.16% of the genome. The remaining 9.84% constitutes four homologous regions and other non-coding regions. The gene content and gene arrangement in HearMNPV were most similar to those of Mamestra configurata NPV-B (MacoNPV-B), but were different from those of HearSNPV. Comparison of the genomes of HearMNPV and MacoNPV-B suggested that HearMNPV has a deletion of a 5.4-kb fragment containing five ORFs. In addition, HearMNPV orf66, bro genes, and hrs are different from the corresponding parts of the MacoNPV-B genome. Conclusions HearMNPV can replicate in vivo in H. armigera and in vitro, and is a new NPV isolate distinguished from HearSNPV. HearMNPV is most closely related to MacoNPV-B, but has a distinct genomic structure, content, and organization. PMID:22913743
Medium-sized tandem repeats represent an abundant component of the Drosophila virilis genome.

PubMed

Abdurashitov, Murat A; Gonchar, Danila A; Chernukhin, Valery A; Tomilov, Victor N; Tomilova, Julia E; Schostak, Natalia G; Zatsepina, Olga G; Zelentsova, Elena S; Evgen'ev, Michael B; Degtyarev, Sergey K H

2013-11-09

Previously, we developed a simple method for carrying out a restriction enzyme analysis of eukaryotic DNA in silico, based on the known DNA sequences of the genomes. This method allows the user to calculate the lengths of all DNA fragments that are formed after a whole genome is digested at the theoretical recognition sites of a given restriction enzyme. A comparison of the observed peaks in distribution diagrams with the results from DNA cleavage using several restriction enzymes performed in vitro has shown good correspondence between the theoretical and experimental data in several cases. Here, we applied this approach to the annotated genome of Drosophila virilis, which is extremely rich in various repeats, and explored the combined approach to perform the restriction analysis of D. virilis DNA. This approach enabled us to reveal three abundant medium-sized tandem repeats within the D. virilis genome. While the 225 bp repeats were revealed previously in intergenic non-transcribed spacers between ribosomal genes of D. virilis, two other families comprising 154 bp and 172 bp repeats had not been described. A Tandem Repeats Finder search demonstrated that the 154 bp and 172 bp units are organized in multiple clusters in the genome of D. virilis. Characteristically, only the 154 bp repeats derived from a Helitron transposon are transcribed. Using in silico digestion in combination with conventional restriction analysis and sequencing of repeated DNA fragments enabled us to isolate and characterize three highly abundant families of medium-sized repeats present in the D. virilis genome. These repeats comprise a significant portion of the genome and may have important roles in genome function and structural integrity. Therefore, we demonstrated an approach which makes it possible to investigate in detail the gross arrangement and expression of medium-sized repeats based on sequencing data, even in the case of incompletely assembled and/or annotated genomes.
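The in silico digestion idea described in this record can be illustrated with a minimal Python sketch. It is not the authors' software: the function names `digest` and `most_common_length`, the single-strand search (adequate only for palindromic sites), and the toy sequence are all assumptions made for illustration. A tandem repeat whose unit contains one recognition site shows up as a peak at the unit length in the fragment-length distribution.

```python
import re
from collections import Counter


def digest(sequence, site, cut_offset):
    """Return fragment lengths for a linear sequence cut at every site.

    `site` is the recognition sequence and `cut_offset` is the position of
    the cut inside the site (e.g. 1 for G^AATTC). Only the given strand is
    searched, which is sufficient for palindromic sites (an assumption).
    """
    seq = sequence.upper()
    pattern = "(?=" + re.escape(site.upper()) + ")"  # lookahead: overlapping hits
    cuts = [m.start() + cut_offset for m in re.finditer(pattern, seq)]
    bounds = [0] + cuts + [len(seq)]
    # Lengths of the pieces between consecutive cut positions.
    return [b - a for a, b in zip(bounds, bounds[1:]) if b > a]


def most_common_length(fragment_lengths):
    """Return (length, count) for the highest peak of the distribution."""
    return Counter(fragment_lengths).most_common(1)[0]
```

For example, digesting the hypothetical sequence `"TT" + "GAATTCAAAA" * 5 + "GG"` with `digest(seq, "GAATTC", 1)` yields four 10 bp fragments plus two end fragments, so `most_common_length` reports the 10 bp repeat unit as the peak.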
High Resolution Typing by Whole Genome Mapping Enables Discrimination of LA-MRSA (CC398) Strains and Identification of Transmission Events

PubMed Central

Bosch, Thijs; Verkade, Erwin; van Luit, Martijn; Pot, Bruno; Vauterin, Paul; Burggrave, Ronald; Savelkoul, Paul; Kluytmans, Jan; Schouls, Leo

2013-01-01

After its emergence in 2003, a livestock-associated (LA-)MRSA clade (CC398) has caused an impressive increase in the number of isolates submitted for the Dutch national MRSA surveillance and now comprises 40% of all isolates. The currently used molecular typing techniques have limited discriminatory power for this MRSA clade, which hampers studies on the origin and transmission routes. Recently, a new molecular analysis technique named whole genome mapping was introduced. This method creates high-resolution, ordered whole genome restriction maps that may have potential for strain typing. In this study, we assessed and validated the capability of whole genome mapping to differentiate LA-MRSA isolates. Multiple validation experiments showed that whole genome mapping produced highly reproducible results. Assessment of the technique on two well-documented MRSA outbreaks showed that whole genome mapping was able to confirm one outbreak, but revealed major differences between the maps of the second, indicating that not all isolates belonged to this outbreak. Whole genome mapping of LA-MRSA isolates that were epidemiologically unlinked provided a much higher discriminatory power than spa-typing or MLVA. In contrast, maps created from LA-MRSA isolates obtained during a proven LA-MRSA outbreak were nearly indistinguishable, showing that transmission of LA-MRSA can be detected by whole genome mapping. Finally, whole genome maps of LA-MRSA isolates originating from two unrelated veterinarians and their household members showed that veterinarians may carry and transmit different LA-MRSA strains at the same time. No such conclusions could be drawn based on spa-typing and MLVA. Although PFGE seems to be suitable for molecular typing of LA-MRSA, WGM provides a much higher discriminatory power.
Furthermore, whole genome mapping can provide a comparison with other maps within 2 days after the bacterial culture is received, making it suitable to investigate transmission events and outbreaks caused by LA-MRSA. PMID:23805225
An integrative and applicable phylogenetic footprinting framework for cis-regulatory motifs identification in prokaryotic genomes.

PubMed

Liu, Bingqiang; Zhang, Hanyuan; Zhou, Chuan; Li, Guojun; Fennell, Anne; Wang, Guanghui; Kang, Yu; Liu, Qi; Ma, Qin

2016-08-09

Phylogenetic footprinting is an important computational technique for identifying cis-regulatory motifs in orthologous regulatory regions from multiple genomes, as motifs tend to evolve slower than their surrounding non-functional sequences. Its application, however, has several difficulties for optimizing the selection of orthologous data and reducing the false positives in motif prediction. Here we present an integrative phylogenetic footprinting framework for accurate motif predictions in prokaryotic genomes (MP(3)). The framework includes a new orthologous data preparation procedure, an additional promoter scoring and pruning method and an integration of six existing motif finding algorithms as basic motif search engines. Specifically, we collected orthologous genes from available prokaryotic genomes and built the orthologous regulatory regions based on sequence similarity of promoter regions. This procedure made full use of the large-scale genomic data and taxonomy information and filtered out the promoters with limited contribution to produce a high quality orthologous promoter set. The promoter scoring and pruning is implemented through motif voting by a set of complementary predicting tools that mine as many motif candidates as possible and simultaneously eliminate the effect of random noise. We have applied the framework to Escherichia coli k12 genome and evaluated the prediction performance through comparison with seven existing programs. This evaluation was systematically carried out at the nucleotide and binding site level, and the results showed that MP(3) consistently outperformed other popular motif finding tools. We have integrated MP(3) into our motif identification and analysis server DMINDA, allowing users to efficiently identify and analyze motifs in 2,072 completely sequenced prokaryotic genomes. The performance evaluation indicated that MP(3) is effective for predicting regulatory motifs in prokaryotic genomes. 
Its application may advance the elucidation of transcriptional regulation mechanisms, benefiting the genomic research community and prokaryotic genome researchers in particular.
A Single Multiplex crRNA Array for FnCpf1-Mediated Human Genome Editing.

PubMed

Sun, Huihui; Li, Fanfan; Liu, Jie; Yang, Fayu; Zeng, Zhenhai; Lv, Xiujuan; Tu, Mengjun; Liu, Yeqing; Ge, Xianglian; Liu, Changbao; Zhao, Junzhao; Zhang, Zongduan; Qu, Jia; Song, Zongming; Gu, Feng

2018-06-15

Cpf1 has been harnessed as a tool for genome manipulation in various species because of its simplicity and high efficiency. Our recent study demonstrated that FnCpf1 could be utilized for human genome editing with notable advantages for target sequence selection due to the flexibility of the protospacer adjacent motif (PAM) sequence. Multiplex genome editing provides a powerful tool for targeting members of multigene families, dissecting gene networks, modeling multigenic disorders in vivo, and applying gene therapy. However, there are no reports at present that show FnCpf1-mediated multiplex genome editing via a single customized CRISPR RNA (crRNA) array. In the present study, we utilize a single customized crRNA array to simultaneously target multiple genes in human cells. In addition, we also demonstrate that a single customized crRNA array to target multiple sites in one gene could be achieved. Collectively, FnCpf1, a powerful genome-editing tool for multiple genomic targets, can be harnessed for effective manipulation of the human genome. Copyright © 2018 The American Society of Gene and Cell Therapy. Published by Elsevier Inc. All rights reserved.

Insights from 20 years of bacterial genome sequencing

DOE PAGES

Land, Miriam L.; Hauser, Loren; Jun, Se-Ran; ...

2015-02-27

Since the first two complete bacterial genome sequences were published in 1995, the science of bacteria has dramatically changed. Using third-generation DNA sequencing, it is possible to completely sequence a bacterial genome in a few hours and identify some types of methylation sites along the genome as well. Sequencing of bacterial genomes is now a standard procedure, and the information from tens of thousands of bacterial genomes has had a major impact on our views of the bacterial world. In this review, we explore a series of questions to highlight some insights that comparative genomics has produced. To date, there are genome sequences available from 50 different bacterial phyla and 11 different archaeal phyla. However, the distribution is quite skewed towards a few phyla that contain model organisms. But the breadth is continuing to improve, with projects dedicated to filling in less characterized taxonomic groups. The clustered regularly interspaced short palindromic repeats (CRISPR)-Cas system provides bacteria with immunity against viruses, which outnumber bacteria by tenfold. How fast can we go? Second-generation sequencing has produced a large number of draft genomes (close to 90 % of bacterial genomes in GenBank are currently not complete); third-generation sequencing can potentially produce a finished genome in a few hours, and at the same time provide methylation sites along the entire chromosome. The diversity of bacterial communities is extensive, as is evident from the genome sequences available from 50 different bacterial phyla and 11 different archaeal phyla. Genome sequencing can help in classifying an organism, and in the case where multiple genomes of the same species are available, it is possible to calculate the pan- and core genomes; comparison of more than 2000 Escherichia coli genomes finds an E. coli core genome of about 3100 gene families and a total of about 89,000 different gene families.
Why do we care about bacterial genome sequencing? There are many practical applications, such as genome-scale metabolic modeling, biosurveillance, bioforensics, and infectious disease epidemiology. In the near future, high-throughput sequencing of patient metagenomic samples could revolutionize medicine in terms of speed and accuracy of finding pathogens and knowing how to treat them.
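The pan- and core-genome calculation mentioned in this record reduces, in its simplest form, to set operations over gene families. The Python sketch below assumes that each genome has already been reduced to a set of gene-family identifiers (the clustering step itself is not shown); the function name `pan_and_core` and the toy strain data are hypothetical and are not taken from the review.

```python
def pan_and_core(genomes):
    """Compute the pan- and core genome from gene-family presence data.

    `genomes` maps a genome name to an iterable of gene-family identifiers.
    The pan-genome is the union of all families; the core genome is the
    set of families present in every genome.
    """
    families = [set(f) for f in genomes.values()]
    if not families:
        return set(), set()
    pan = set().union(*families)
    core = set.intersection(*families)
    return pan, core


# Toy input: three hypothetical strains and their gene families.
toy = {
    "strain_1": {"famA", "famB", "famC"},
    "strain_2": {"famA", "famB", "famD"},
    "strain_3": {"famA", "famE"},
}
pan, core = pan_and_core(toy)
```

With real data the core genome shrinks and the pan-genome grows as more genomes are added, which is the pattern behind the roughly 3100 core versus 89,000 total gene families quoted above.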
NCI collaborates with Multiple Myeloma Research Foundation

Cancer.gov

The National Cancer Institute (NCI) announced a collaboration with the Multiple Myeloma Research Foundation (MMRF) to incorporate MMRF's wealth of genomic and clinical data on the disease into the NCI Genomic Data Commons (GDC), a publicly available datab
Polycistronic tRNA and CRISPR guide-RNA enables highly efficient multiplexed genome engineering in human cells

PubMed Central

Dong, Fengping; Xie, Kabin; Chen, Yueying; Yang, Yinong; Mao, Yingwei

2016-01-01

CRISPR/Cas9 has been widely used for genomic editing in many organisms. Many human diseases are caused by multiple mutations. The CRISPR/Cas9 system provides a potential tool to introduce multiple mutations in a genome. To mimic complicated genomic variants in human diseases, such as multiple gene deletions or mutations, two or more small guide RNAs (sgRNAs) need to be introduced together. This can be achieved by separate Pol III promoters in a construct. However, limited enzyme sites and increased insertion size lower the efficiency of making such a construct. Here, we report a strategy to quickly assemble multiple sgRNAs in one construct using a polycistronic-tRNA-gRNA (PTG) strategy. Taking advantage of the endogenous tRNA processing system in mammalian cells, we efficiently express multiple sgRNAs driven by only one Pol III promoter. Using an all-in-one construct carrying PTG, we disrupt the deacetylase domain in multiple histone deacetylases (HDACs) in human cells simultaneously. We demonstrate that multiple HDAC deletions significantly affect the activation of the Wnt-signaling pathway. Thus, this method enables efficient targeting of multiple genes and provides a useful tool to establish mutated cells mimicking human diseases. PMID:27890617
Sequence of the tomato chloroplast DNA and evolutionary comparison of solanaceous plastid genomes.

PubMed

Kahlau, Sabine; Aspinall, Sue; Gray, John C; Bock, Ralph

2006-08-01

Tomato, Solanum lycopersicum (formerly Lycopersicon esculentum), has long been one of the classical model species of plant genetics. More recently, solanaceous species have become a model of evolutionary genomics, with several EST projects and a tomato genome project having been initiated. As a first contribution toward deciphering the genetic information of tomato, we present here the complete sequence of the tomato chloroplast genome (plastome). The size of this circular genome is 155,461 base pairs (bp), with an average AT content of 62.14%. It contains 114 genes and conserved open reading frames (ycfs). Comparison with the previously sequenced plastid DNAs of Nicotiana tabacum and Atropa belladonna reveals patterns of plastid genome evolution in the Solanaceae family and identifies varying degrees of conservation of individual plastid genes. In addition, we discovered several new sites of RNA editing by cytidine-to-uridine conversion. A detailed comparison of editing patterns in the three solanaceous species highlights the dynamics of RNA editing site evolution in chloroplasts. To assess the level of intraspecific plastome variation in tomato, the plastome of a second tomato cultivar was sequenced. Comparison of the two genotypes (IPA-6, bred in South America, and Ailsa Craig, bred in Europe) revealed no nucleotide differences, suggesting that the plastomes of modern tomato cultivars display very little, if any, sequence variation.
Assessing the Robustness of Complete Bacterial Genome Segmentations

NASA Astrophysics Data System (ADS)

Devillers, Hugo; Chiapello, Hélène; Schbath, Sophie; El Karoui, Meriem

Comparison of closely related bacterial genomes has revealed the presence of highly conserved sequences forming a "backbone" that is interrupted by numerous, less conserved, DNA fragments. Segmentation of bacterial genomes into backbone and variable regions is particularly useful to investigate bacterial genome evolution. Several software tools have been designed to compare complete bacterial chromosomes and a few online databases store pre-computed genome comparisons. However, very few statistical methods are available to evaluate the reliability of these software tools and to compare the results obtained with them. To fill this gap, we have developed two local scores to measure the robustness of bacterial genome segmentations. Our method uses a simulation procedure based on random perturbations of the compared genomes. The scores presented in this paper are simple to implement, and our results show that they make it easy to discriminate between robust and non-robust bacterial genome segmentations when using aligners such as MAUVE and MGA.
Factors affecting reproducibility between genome-scale siRNA-based screens

PubMed Central

Barrows, Nicholas J.; Le Sommer, Caroline; Garcia-Blanco, Mariano A.; Pearson, James L.

2011-01-01

RNA interference-based screening is a powerful new genomic technology which addresses gene function en masse. To evaluate factors influencing hit list composition and reproducibility, we performed two identically designed small interfering RNA (siRNA)-based, whole genome screens for host factors supporting yellow fever virus infection. These screens represent two separate experiments completed five months apart and allow the direct assessment of the reproducibility of a given siRNA technology when performed in the same environment. Candidate hit lists generated by sum rank, median absolute deviation, z-score, and strictly standardized mean difference were compared within and between whole genome screens. Application of these analysis methodologies within a single screening dataset using a fixed threshold equivalent to a p-value ≤ 0.001 resulted in hit lists ranging from 82 to 1,140 members and highlighted the tremendous impact analysis methodology has on hit list composition. Intra- and inter-screen reproducibility was significantly influenced by the analysis methodology and ranged from 32% to 99%. This study also highlighted the power of testing at least two independent siRNAs for each gene product in primary screens. To facilitate validation we conclude by suggesting methods to reduce false discovery at the primary screening stage. In this study we present the first comprehensive comparison of multiple analysis strategies, and demonstrate the impact of the analysis methodology on the composition of the “hit list”. Therefore, we propose that the entire dataset derived from functional genome-scale screens, especially if publicly funded, should be made available as is done with data derived from gene expression and genome-wide association studies. PMID:20625183
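The dependence of hit-list composition on analysis method can be seen even on a toy plate. The Python sketch below compares an ordinary z-score with a median/MAD-based robust z-score; it is an illustration only, not the screens' actual pipeline. The function names, the threshold of 2, the convention that a low readout marks a hit, and the nine toy values are all assumptions.

```python
import statistics


def z_scores(values):
    """Classical z-scores: centre on the mean, scale by the sample SD."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


def robust_z_scores(values):
    """Robust z-scores: centre on the median, scale by 1.4826 * MAD."""
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    scale = 1.4826 * mad  # makes MAD comparable to SD for normal data
    return [(v - med) / scale for v in values]


def hits(scores, threshold):
    """Indices of wells whose score falls at or below -threshold."""
    return {i for i, s in enumerate(scores) if s <= -threshold}


# Toy normalized infection readouts; wells 7 and 8 are knocked down.
wells = [1.0, 1.1, 0.9, 1.0, 1.05, 0.95, 1.0, 0.2, 0.6]
z_hits = hits(z_scores(wells), 2.0)
mad_hits = hits(robust_z_scores(wells), 2.0)
```

Here the strong outlier inflates the standard deviation, so the z-score method calls only well 7, while the MAD-based method calls wells 7 and 8: the same data and threshold give two different hit lists.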
Genome-wide genetic variation and comparison of fruit-associated traits between kumquat (Citrus japonica) and Clementine mandarin (Citrus clementina).

PubMed

Liu, Tian-Jia; Li, Yong-Ping; Zhou, Jing-Jing; Hu, Chun-Gen; Zhang, Jin-Zhi

2018-03-01

The comprehensive genetic variation of two citrus species was analyzed at the genome and transcriptome levels. A total of 1090 differentially expressed genes were found during fruit development by RNA-sequencing. Fruit size (fruit equatorial diameter) and weight (fresh weight) are the two most important components determining yield and consumer acceptability for many horticultural crops. However, little is known about the genetic control of these traits. Here, we performed whole-genome resequencing to reveal the comprehensive genetic variation related to fruit development between kumquat (Citrus japonica) and Clementine mandarin (Citrus clementina). In total, 5,865,235 single-nucleotide polymorphisms (SNPs) and 414,447 insertions/deletions (InDels) were identified in the two citrus species. Based on integrative analysis of the genome and transcriptome of fruit, 640,801 SNPs and 20,733 InDels were identified. The features, genomic distribution, functional effect, and other characteristics of these genetic variations were explored. RNA-sequencing identified 1090 differentially expressed genes (DEGs) during fruit development of kumquat and Clementine mandarin. Gene Ontology analysis revealed that these genes were involved in various molecular functions and biological processes. In addition, the genetic variation of 939 DEGs and 74 multiple fruit development pathway genes from previous reports was also identified. A global survey identified 24,237 specific alternative splicing events in the two citrus species and showed that intron retention is the most prevalent pattern of alternative splicing. These genome variation data provide a foundation for further exploration of citrus diversity and gene-phenotype relationships and for future research on molecular breeding to improve kumquat, Clementine mandarin and related species.
Community perceptions of genomic research: implications for addressing health disparities.

PubMed

Isler, Malika Roman; Sutton, Karey; Cadigan, R Jean; Corbie-Smith, Giselle

2013-01-01

Increasing the engagement of racial and ethnic minorities in genomic research may help alleviate health disparities. This paper examines community perceptions of the relationships between race, genes, environment, and health disparities, and it discusses how such perceptions may influence participation in genomic research. We conducted semi-structured interviews with 91 African American, Latino, and white lay community members and community leaders in North Carolina. Using constant comparison methods, we identified, compared, and developed linkages between conceptual categories and respondent groups. Participants described gene-environment interactions as contributing to group differences in health outcomes, expressed the belief that genetic predisposition to disease differs across groups, and said that social conditions trigger group-level genetic differences and create poorer health outcomes among African Americans. Given the regional presence of major research institutions and the relatively high education level of many participants, this sample may not reflect the perspectives of those most disparately affected by health disparities. Members from multiple community sectors share perceptions and may respond to similar approaches when attempts are made to increase participation in genomic research. Researchers may inadvertently fuel the perception that health disparities experienced by minorities are rooted in the shared genomes of a particular group as distinct from those of other groups. The way researchers use race and ethnicity in recruitment, analysis, and communication of research findings inaccurately implies that there are genetic differences between races, when categories of social experience or ancestry may more accurately characterize health differences. 
Understanding these issues is crucial to designing effective community engagement strategies, recruitment plans, and messages about genomic research, which could ultimately help to lessen health disparities.
The Genome and Methylome of a Beetle with Complex Social Behavior, Nicrophorus vespilloides (Coleoptera: Silphidae).

PubMed

Cunningham, Christopher B; Ji, Lexiang; Wiberg, R Axel W; Shelton, Jennifer; McKinney, Elizabeth C; Parker, Darren J; Meagher, Richard B; Benowitz, Kyle M; Roy-Zokan, Eileen M; Ritchie, Michael G; Brown, Susan J; Schmitz, Robert J; Moore, Allen J

2015-10-09

Testing for conserved and novel mechanisms underlying phenotypic evolution requires a diversity of genomes available for comparison spanning multiple independent lineages. For example, complex social behavior in insects has been investigated primarily with eusocial lineages, nearly all of which are Hymenoptera. If conserved genomic influences on sociality do exist, we need data from a wider range of taxa that also vary in their levels of sociality. Here, we present the assembled and annotated genome of the subsocial beetle Nicrophorus vespilloides, a species long used to investigate evolutionary questions of complex social behavior. We used this genome to address two questions. First, do aspects of life history, such as using a carcass to breed, predict overlap in gene models more strongly than phylogeny? We found that the overlap in gene models was similar between N. vespilloides and all other insect groups regardless of life history. Second, like other insects with highly developed social behavior but unlike other beetles, does N. vespilloides have DNA methylation? We found strong evidence for an active DNA methylation system. The distribution of methylation was similar to that in other insects, with exons having the most methylated CpGs. Methylation status appears highly conserved; 85% of the methylated genes in N. vespilloides are also methylated in the hymenopteran Nasonia vitripennis. This genome adds a coleopteran resource to answer questions about the evolution and mechanistic basis of sociality and to address questions about the potential role of methylation in social behavior. © The Author(s) 2015. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
Archaeal Clusters of Orthologous Genes (arCOGs): An Update and Application for Analysis of Shared Features between Thermococcales, Methanococcales, and Methanobacteriales

PubMed Central

Makarova, Kira S.; Wolf, Yuri I.; Koonin, Eugene V.

2015-01-01

With the continuously accelerating genome sequencing from diverse groups of archaea and bacteria, accurate identification of gene orthology and availability of readily expandable clusters of orthologous genes are essential for the functional annotation of new genomes. We report an update of the collection of archaeal Clusters of Orthologous Genes (arCOGs) to cover, on average, 91% of the protein-coding genes in 168 archaeal genomes. The new arCOGs were constructed using refined algorithms for orthology identification combined with extensive manual curation, including incorporation of the results of several completed and ongoing research projects in archaeal genomics. A new level of classification is introduced: superclusters that unite two or more arCOGs and more completely reflect gene family evolution than individual, disconnected arCOGs. Assessment of the current archaeal genome annotation in public databases indicates that consistent use of arCOGs can significantly improve the annotation quality. In addition to their utility for genome annotation, arCOGs also are a platform for phylogenomic analysis. We explore this aspect of arCOGs by performing a phylogenomic study of the Thermococci that are traditionally viewed as the basal branch of the Euryarchaeota. The results of phylogenomic analysis that involved both comparison of multiple phylogenetic trees and a search for putative derived shared characters by using phyletic patterns extracted from the arCOGs reveal a likely evolutionary relationship between the Thermococci, Methanococci, and Methanobacteria. The arCOGs are expected to be instrumental for a comprehensive phylogenomic study of the archaea. PMID:25764277
Complete genome analysis of three Acinetobacter baumannii clinical isolates in China for insight into the diversification of drug resistance elements.

PubMed

Zhu, Lingxiang; Yan, Zhongqiang; Zhang, Zhaojun; Zhou, Qiming; Zhou, Jinchun; Wakeland, Edward K; Fang, Xiangdong; Xuan, Zhenyu; Shen, Dingxia; Li, Quan-Zhen

2013-01-01

The emergence and rapid spread of multidrug-resistant Acinetobacter baumannii strains has become a major health threat worldwide. To better understand the genetic recombination related to the acquisition of drug-resistance elements during bacterial infection, we performed complete genome analysis on three newly isolated multidrug-resistant A. baumannii strains from Beijing using next-generation sequencing technology. Whole genome comparison revealed that all 3 strains share some common drug-resistance elements, including carbapenem-resistant bla OXA-23 and tetracycline (tet) resistance islands, but the genome structures are diversified among strains. Various genomic islands are interspersed on the genome with transposons and insertions, reflecting the recombination flexibility during the acquisition of the resistance elements. The blood-isolated BJAB07104 and ascites-isolated BJAB0868 exhibit high similarity in their genome structure to most of the global clone II strains, suggesting these two strains belong to the dominant outbreak strains prevalent worldwide. A large resistance island (RI) of about 121 kb, carrying a cluster of resistance-related genes, was inserted into the ATPase gene on the BJAB07104 and BJAB0868 genomes. A 78-kb insertion element carrying the tra locus and the bla OXA-23 island can be either inserted into one of the tniB genes in the 121-kb RI on the chromosome, or transformed to a conjugative plasmid in the two BJAB strains. The third strain of this study, BJAB0715, which was isolated from spinal fluid, exhibits much more divergence compared with the above two strains. It harbors multiple drug-resistance elements including a truncated AbaR-22-like RI on its genome. One of the unique features of this strain is that it carries both bla OXA-23 and bla OXA-58 genes on its genome. In addition, an Acinetobacter lwoffii adeABC efflux element was found inserted into the ATPase position in BJAB0715.
Our comparative analysis of currently completed Acinetobacter baumannii genomes revealed extensive and dynamic genome organizations, which may help the bacteria acquire drug-resistance elements into their genomes.
Whole-genome sequencing of two North American Drosophila melanogaster populations reveals genetic differentiation and positive selection.

PubMed

Campo, D; Lehmann, K; Fjeldsted, C; Souaiaia, T; Kao, J; Nuzhdin, S V

2013-10-01

The prevailing demographic model for Drosophila melanogaster suggests that the colonization of North America occurred very recently from a subset of European flies that rapidly expanded across the continent. This model implies a sudden population growth and range expansion consistent with very low or no population subdivision. As flies adapt to new environments, local adaptation events may be expected. To describe demographic and selective events during North American colonization, we have generated a data set of 35 individual whole-genome sequences from inbred lines of D. melanogaster from a west coast US population (Winters, California, USA) and compared them with a public genome data set from Raleigh (Raleigh, North Carolina, USA). We analysed nuclear and mitochondrial genomes and described levels of variation and divergence within and between these two North American D. melanogaster populations. Both populations exhibit negative values of Tajima's D across the genome, a common signature of demographic expansion. We also detected a low but significant level of genome-wide differentiation between the two populations, as well as multiple allele surfing events, which can be the result of gene drift in local subpopulations on the edge of an expansion wave. In contrast to this genome-wide pattern, we uncovered a 50-kilobase segment in chromosome arm 3L that showed all the hallmarks of a soft selective sweep in both populations. A comparison of allele frequencies within this divergent region among six populations from three continents allowed us to cluster these populations in two differentiated groups, providing evidence for the action of natural selection on a global scale. © 2013 John Wiley & Sons Ltd.
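The abstract above uses negative Tajima's D as a signature of demographic expansion. As a rough illustration only, the statistic can be sketched from an alignment as follows. The function name `tajimas_d` and the toy sequences are hypothetical, every character (including gaps) is treated as an ordinary allele, and nothing here is taken from the authors' pipeline.

```python
from itertools import combinations
from math import sqrt


def tajimas_d(seqs):
    """Tajima's D for a list of equal-length aligned sequences (strings)."""
    n = len(seqs)
    # For n < 4 the variance term is zero, so the statistic is undefined.
    if n < 4 or len({len(s) for s in seqs}) != 1:
        raise ValueError("need at least four aligned sequences of equal length")

    # S: number of segregating (polymorphic) alignment columns.
    s_sites = sum(1 for column in zip(*seqs) if len(set(column)) > 1)
    if s_sites == 0:
        return 0.0  # convention chosen for this sketch; D is undefined here

    # pi: mean number of pairwise differences between sequences.
    pairs = list(combinations(seqs, 2))
    pi = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)

    # Standard normalising constants from Tajima (1989).
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n**2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)

    variance = e1 * s_sites + e2 * s_sites * (s_sites - 1)
    return (pi - s_sites / a1) / sqrt(variance)


# Toy data: every variant is a singleton, i.e. an excess of rare alleles.
rare_variants = ["AAAA", "TAAA", "ATAA", "AATA", "AAAT"]
# Toy data: two equally frequent haplotypes, i.e. intermediate frequencies.
balanced = ["AAAA", "AAAA", "TTTT", "TTTT"]
```

An excess of rare variants, as in `rare_variants`, pulls D below zero, while intermediate-frequency variants, as in `balanced`, push it above zero.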
Whole-Genome-Sequencing characterization of bloodstream infection-causing hypervirulent Klebsiella pneumoniae of capsular serotype K2 and ST374.

PubMed

Wang, Xiaoli; Xie, Yingzhou; Li, Gang; Liu, Jialin; Li, Xiaobin; Tian, Lijun; Sun, Jingyong; Ou, Hong-Yu; Qu, Hongping

2018-01-01

Hypervirulent K. pneumoniae variants (hvKP) have been increasingly reported worldwide, causing metastasis of severe infections such as liver abscesses and bacteremia. The capsular serotype K2 hvKP strains show diverse multi-locus sequence types (MLSTs), but with limited genetics and virulence information. In this study, we report a hypermucoviscous K. pneumoniae strain, RJF293, isolated from a human bloodstream sample in a Chinese hospital. It caused a metastatic infection and fatal septic shock in a critical patient. The microbiological features and genetic background were investigated with multiple approaches. Strain RJF293 was determined to be multilocus sequence type (ST) 374 and serotype K2, displayed a median lethal dose (LD50) of 1.5 × 10^2 CFU in BALB/c mice and was as virulent as the ST23 K1 serotype hvKP strain NTUH-K2044 in a mouse lethality assay. Whole genome sequencing revealed that the RJF293 genome codes for 32 putative virulence factors and exhibits a unique presence/absence pattern in comparison to the other 105 completely sequenced K. pneumoniae genomes. Whole genome SNP-based phylogenetic analysis revealed that strain RJF293 formed a single clade, distant from those containing either ST66 or ST86 hvKP. Compared to the other sequenced hvKP chromosomes, RJF293 contains several strain-variable regions, including one prophage, one ICEKp1 family integrative and conjugative element and six large genomic islands. The sequencing of the first complete genome of an ST374 K2 hvKP clinical strain should reinforce our understanding of the epidemiology and virulence mechanisms of this bloodstream infection-causing hvKP with clinical significance.
Whole-Genome-Sequencing characterization of bloodstream infection-causing hypervirulent Klebsiella pneumoniae of capsular serotype K2 and ST374

PubMed Central

Wang, Xiaoli; Xie, Yingzhou; Li, Gang; Liu, Jialin; Li, Xiaobin; Tian, Lijun; Sun, Jingyong; Qu, Hongping

2018-01-01

Hypervirulent K. pneumoniae variants (hvKP) have been increasingly reported worldwide, causing metastasis of severe infections such as liver abscesses and bacteremia. The capsular serotype K2 hvKP strains show diverse multi-locus sequence types (MLSTs), but with limited genetics and virulence information. In this study, we report a hypermucoviscous K. pneumoniae strain, RJF293, isolated from a human bloodstream sample in a Chinese hospital. It caused a metastatic infection and fatal septic shock in a critical patient. The microbiological features and genetic background were investigated with multiple approaches. Strain RJF293 was determined to be multilocus sequence type (ST) 374 and serotype K2, displayed a median lethal dose (LD50) of 1.5 × 10^2 CFU in BALB/c mice and was as virulent as the ST23 K1 serotype hvKP strain NTUH-K2044 in a mouse lethality assay. Whole genome sequencing revealed that the RJF293 genome codes for 32 putative virulence factors and exhibits a unique presence/absence pattern in comparison to the other 105 completely sequenced K. pneumoniae genomes. Whole genome SNP-based phylogenetic analysis revealed that strain RJF293 formed a single clade, distant from those containing either ST66 or ST86 hvKP. Compared to the other sequenced hvKP chromosomes, RJF293 contains several strain-variable regions, including one prophage, one ICEKp1 family integrative and conjugative element and six large genomic islands. The sequencing of the first complete genome of an ST374 K2 hvKP clinical strain should reinforce our understanding of the epidemiology and virulence mechanisms of this bloodstream infection-causing hvKP with clinical significance. PMID:29338592
Divergence of Mammalian Higher Order Chromatin Structure Is Associated with Developmental Loci

PubMed Central

Chambers, Emily V.; Bickmore, Wendy A.; Semple, Colin A.

2013-01-01

Several recent studies have examined different aspects of mammalian higher order chromatin structure (replication timing, lamina association and Hi-C inter-locus interactions) and have suggested that most of these features of genome organisation are conserved over evolution. However, the extent of evolutionary divergence in higher order structure has not been rigorously measured across the mammalian genome, and until now little has been known about the characteristics of any divergent loci present. Here, we generate a dataset combining multiple measurements of chromatin structure and organisation over many embryonic cell types for both human and mouse that, for the first time, allows a comprehensive assessment of the extent of structural divergence between mammalian genomes. Comparison of orthologous regions confirms that all measurable facets of higher order structure are conserved between human and mouse, across the vast majority of the detectably orthologous genome. This broad similarity is observed in spite of many loci possessing cell type specific structures. However, we also identify hundreds of regions (from 100 Kb to 2.7 Mb in size) showing consistent evidence of divergence between these species, constituting at least 10% of the orthologous mammalian genome and encompassing many hundreds of human and mouse genes. These regions show unusual shifts in human GC content, are unevenly distributed across both genomes, and are enriched in human subtelomeric regions. Divergent regions are also relatively enriched for genes showing divergent expression patterns between human and mouse ES cells, implying these regions cause divergent regulation. Particular divergent loci are strikingly enriched in genes implicated in vertebrate development, suggesting important roles for structural divergence in the evolution of mammalian developmental programmes.
These data suggest that, though relatively rare in the mammalian genome, divergence in higher order chromatin structure has played important roles during evolution. PMID:23592965
The Methanosarcina barkeri genome: comparative analysis with Methanosarcina acetivorans and Methanosarcina mazei reveals extensive rearrangement within methanosarcinal genomes

DOE Office of Scientific and Technical Information (OSTI.GOV)

Maeder, Dennis L.; Anderson, Iain; Brettin, Thomas S.

2006-05-19

We report here a comparative analysis of the genome sequence of Methanosarcina barkeri with those of Methanosarcina acetivorans and Methanosarcina mazei. All three genomes share a conserved double origin of replication and many gene clusters. M. barkeri is distinguished by having an organization that is well conserved with respect to the other Methanosarcinae in the region proximal to the origin of replication with interspecies gene similarities as high as 95%. However it is disordered and marked by increased transposase frequency and decreased gene synteny and gene density in the proximal semi-genome. Of the 3680 open reading frames in M. barkeri, 678 had paralogs with better than 80% similarity to both M. acetivorans and M. mazei while 128 nonhypothetical orfs were unique (non-paralogous) amongst these species including a complete formate dehydrogenase operon, two genes required for N-acetylmuramic acid synthesis, a 14 gene gas vesicle cluster and a bacterial P450-specific ferredoxin reductase cluster not previously observed or characterized in this genus. A cryptic 36 kbp plasmid sequence was detected in M. barkeri that contains an orc1 gene flanked by a presumptive origin of replication consisting of 38 tandem repeats of a 143 nt motif. Three-way comparison of these genomes reveals differing mechanisms for the accrual of changes. Elongation of the large M. acetivorans is the result of multiple gene-scale insertions and duplications uniformly distributed in that genome, while M. barkeri is characterized by localized inversions associated with the loss of gene content. In contrast, the relatively short M. mazei most closely approximates the ancestral organizational state.
Fourteen-Genome Comparison Identifies DNA Markers for Severe-Disease-Associated Strains of Clostridium difficile

PubMed Central

Forgetta, Vincenzo; Oughton, Matthew T.; Marquis, Pascale; Brukner, Ivan; Blanchette, Ruth; Haub, Kevin; Magrini, Vince; Mardis, Elaine R.; Gerding, Dale N.; Loo, Vivian G.; Miller, Mark A.; Mulvey, Michael R.; Rupnik, Maja; Dascal, Andre; Dewar, Ken

2011-01-01

Clostridium difficile is a common cause of infectious diarrhea in hospitalized patients. A severe and increased incidence of C. difficile infection (CDI) is associated predominantly with the NAP1 strain; however, the existence of other severe-disease-associated (SDA) strains and the extensive genetic diversity across C. difficile complicate reliable detection and diagnosis. Comparative genome analysis of 14 sequenced genomes, including those of a subset of NAP1 isolates, allowed the assessment of genetic diversity within and between strain types to identify DNA markers that are associated with severe disease. Comparative genome analysis of 14 isolates, including five publicly available strains, revealed that C. difficile has a core genome of 3.4 Mb, comprising ∼3,000 genes. Analysis of the core genome identified candidate DNA markers that were subsequently evaluated using a multistrain panel of 177 isolates, representing more than 50 pulsovars and 8 toxinotypes. A subset of 117 isolates from the panel had associated patient data that allowed assessment of an association between the DNA markers and severe CDI. We identified 20 candidate DNA markers for species-wide detection and 10,683 single nucleotide polymorphisms (SNPs) associated with the predominant SDA strain (NAP1). A species-wide detection candidate marker, the sspA gene, was found to be the same across 177 sequenced isolates and lacked significant similarity to those of other species. Candidate SNPs in genes CD1269 and CD1265 were found to associate more closely with disease severity than currently used diagnostic markers, as they were also present in the toxin A-negative and B-positive (A-B+) strain types. The genetic markers identified illustrate the potential of comparative genomics for the discovery of diagnostic DNA-based targets that are species specific or associated with multiple SDA strains. PMID:21508155
Conserved structure and expression of hsp70 paralogs in teleost fishes.

PubMed

Metzger, David C H; Hemmer-Hansen, Jakob; Schulte, Patricia M

2016-06-01

The cytosolic 70KDa heat shock proteins (Hsp70s) are widely used as biomarkers of environmental stress in ecological and toxicological studies in fish. Here we analyze teleost genome sequences to show that two genes encoding inducible hsp70s (hsp70-1 and hsp70-2) are likely present in all teleost fish. Phylogenetic and synteny analyses indicate that hsp70-1 and hsp70-2 are distinct paralogs that originated prior to the diversification of the teleosts. The promoters of both genes contain a TATA box and conserved heat shock elements (HSEs), but unlike mammalian HSP70s, both genes contain an intron in the 5' UTR. The hsp70-2 gene has undergone tandem duplication in several species. In addition, many other teleost genome assemblies have multiple copies of hsp70-2 present on separate, small, genomic scaffolds. To verify that these represent poorly assembled tandem duplicates, we cloned the genomic region surrounding hsp70-2 in Fundulus heteroclitus and showed that the hsp70-2 gene copies that are on separate scaffolds in the genome assembly are arranged as tandem duplicates. Real-time quantitative PCR of F. heteroclitus genomic DNA indicates that four copies of the hsp70-2 gene are likely present in the F. heteroclitus genome. Comparison of expression patterns in F. heteroclitus and Gasterosteus aculeatus demonstrates that hsp70-2 has a higher fold increase than hsp70-1 following heat shock in gill but not in muscle tissue, revealing a conserved difference in expression patterns between isoforms and tissues. These data indicate that ecological and toxicological studies using hsp70 as a biomarker in teleosts should take this complexity into account. Copyright © 2016 Elsevier Inc. All rights reserved.

Multiple Testing in the Context of Gene Discovery in Sickle Cell Disease Using Genome-Wide Association Studies.

PubMed

Kuo, Kevin H M

2017-01-01

The issue of multiple testing, also termed multiplicity, is ubiquitous in studies where multiple hypotheses are tested simultaneously. Genome-wide association study (GWAS), a type of genetic association study that has gained popularity in the past decade, is most susceptible to the issue of multiple testing. Different methodologies have been employed to address the issue of multiple testing in GWAS. The purpose of the review is to examine the methodologies employed in dealing with multiple testing in the context of gene discovery using GWAS in sickle cell disease complications.
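The review above does not name the corrections it covers, so the following is only an illustrative sketch of one widely used approach, the Benjamini-Hochberg step-up procedure for controlling the false discovery rate. The function name `benjamini_hochberg` and the example p-values are hypothetical.

```python
def benjamini_hochberg(pvalues, alpha=0.05):
    """Return a list of booleans: True where the hypothesis is rejected.

    Implements the Benjamini-Hochberg step-up rule: sort the p-values,
    find the largest rank k with p_(k) <= (k / m) * alpha, and reject the
    k hypotheses with the smallest p-values.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])

    # Largest rank whose p-value falls under its rank-specific threshold.
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank / m * alpha:
            max_rank = rank

    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_rank:
            rejected[idx] = True
    return rejected


# Hypothetical p-values from five association tests.
example_pvalues = [0.001, 0.012, 0.025, 0.035, 0.6]
example_decisions = benjamini_hochberg(example_pvalues)
```

With these made-up values the procedure rejects the four smallest p-values, whereas a family-wise threshold of 0.05 / 5 = 0.01 would retain only the first. That difference is the usual trade-off between controlling the false discovery rate and controlling the family-wise error rate.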
Examination of Association to Autism of Common Genetic Variation in Genes Related to Dopamine

PubMed Central

Anderson, B.M.; Schnetz-Boutaud, N.; Bartlett, J.; Wright, H.H.; Abramson, R.K.; Cuccaro, M.L.; Gilbert, J.R.; Pericak-Vance, M.A.; Haines, J.L.

2010-01-01

Autism is a severe neurodevelopmental disorder characterized by a triad of complications. Autistic individuals display significant disturbances in language and reciprocal social interactions, combined with repetitive and stereotypic behaviors. Prevalence studies suggest that autism is more common than originally believed, with recent estimates citing a rate of one in 150. Although this genomic approach has yielded multiple suggestive regions, a specific risk locus has yet to be identified and widely confirmed. Because many etiologies have been suggested for this complex syndrome, we hypothesize that one of the difficulties in identifying autism genes is that multiple genetic variants may be required to significantly increase the risk of developing autism. Thus we took the alternative approach of examining 14 prominent dopamine pathway candidate genes for detailed study by genotyping 28 SNPs. Although we did observe a nominally significant association for rs2239535 (p=.008) on chromosome 20, single locus analysis did not reveal any results as significant after correction for multiple comparisons. No significant interaction was identified when Multifactor Dimensionality Reduction (MDR) was employed to test specifically for multilocus effects. Although genome-wide linkage scans in autism have provided support for linkage to various loci along the dopamine pathway, our study does not provide strong evidence of linkage or association to any specific gene or combination of genes within the pathway. These results demonstrate that common genetic variation within the tested genes located within this pathway at most play a minor to moderate role in overall autism pathogenesis. PMID:19360691
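The abstract above reports that a nominal p = .008 did not survive correction across the 28 genotyped SNPs. The correction method is not stated there, so the sketch below assumes a simple Bonferroni adjustment at alpha = 0.05 purely to show the arithmetic. Both function names are hypothetical.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests


def bonferroni_adjust(p, n_tests):
    """Bonferroni-adjusted p-value, capped at 1."""
    return min(1.0, p * n_tests)


# Values from the abstract: 28 SNPs, nominal p = 0.008.
# alpha = 0.05 is an assumption, not stated in the source.
threshold = bonferroni_threshold(0.05, 28)   # roughly 0.0018
adjusted_p = bonferroni_adjust(0.008, 28)    # roughly 0.224
survives_correction = 0.008 <= threshold
```

Under these assumptions the nominal p-value sits well above the per-test threshold, which is consistent with the abstract's statement that no single-locus result remained significant.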
Association between polygenic risk for schizophrenia, neurocognition and social cognition across development

PubMed Central

Germine, L; Robinson, E B; Smoller, J W; Calkins, M E; Moore, T M; Hakonarson, H; Daly, M J; Lee, P H; Holmes, A J; Buckner, R L; Gur, R C; Gur, R E

2016-01-01

Breakthroughs in genomics have begun to unravel the genetic architecture of schizophrenia risk, providing methods for quantifying schizophrenia polygenic risk based on common genetic variants. Our objective in the current study was to understand the relationship between schizophrenia genetic risk variants and neurocognitive development in healthy individuals. We first used combined genomic and neurocognitive data from the Philadelphia Neurodevelopmental Cohort (4303 participants ages 8–21 years) to screen 26 neurocognitive phenotypes for their association with schizophrenia polygenic risk. Schizophrenia polygenic risk was estimated for each participant based on summary statistics from the most recent schizophrenia genome-wide association analysis (Psychiatric Genomics Consortium 2014). After correction for multiple comparisons, greater schizophrenia polygenic risk was significantly associated with reduced speed of emotion identification and verbal reasoning. These associations were significant by age 9 years and there was no evidence of interaction between schizophrenia polygenic risk and age on neurocognitive performance. We then looked at the association between schizophrenia polygenic risk and emotion identification speed in the Harvard/MGH Brain Genomics Superstruct Project sample (695 participants ages 18–35 years), where we replicated the association between schizophrenia polygenic risk and emotion identification speed. These analyses provide evidence for a replicable association between polygenic risk for schizophrenia and a specific aspect of social cognition. Our findings indicate that individual differences in genetic risk for schizophrenia are linked with the development of aspects of social cognition and potentially verbal reasoning, and that these associations emerge relatively early in development. PMID:27754483
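The polygenic risk estimate described above is, in its simplest form, a weighted sum of risk-allele counts with weights taken from GWAS summary statistics. A minimal sketch of that idea follows. The variant identifiers, weights, and the function name `polygenic_risk_score` are all invented for illustration, and real pipelines add steps (variant pruning, p-value thresholds, ancestry adjustment) that are not shown.

```python
def polygenic_risk_score(dosages, weights):
    """Weighted sum of risk-allele dosages.

    dosages: dict mapping variant id -> allele count (0, 1 or 2)
    weights: dict mapping variant id -> effect size (e.g. log odds ratio)
    Variants missing from `weights` are ignored.
    """
    return sum(weights[v] * count for v, count in dosages.items() if v in weights)


# Hypothetical effect sizes and one hypothetical participant.
toy_weights = {"variant_1": 0.10, "variant_2": -0.05, "variant_3": 0.20}
toy_participant = {"variant_1": 2, "variant_2": 1, "variant_3": 0, "variant_4": 2}
toy_score = polygenic_risk_score(toy_participant, toy_weights)
```

Here `variant_4` has no weight and is skipped, so the score is 2 × 0.10 + 1 × (−0.05) + 0 × 0.20.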
Genome-wide DNA methylation measurements in prostate tissues uncovers novel prostate cancer diagnostic biomarkers and transcription factor binding patterns.

PubMed

Kirby, Marie K; Ramaker, Ryne C; Roberts, Brian S; Lasseigne, Brittany N; Gunther, David S; Burwell, Todd C; Davis, Nicholas S; Gulzar, Zulfiqar G; Absher, Devin M; Cooper, Sara J; Brooks, James D; Myers, Richard M

2017-04-17

Current diagnostic tools for prostate cancer lack specificity and sensitivity for detecting very early lesions. DNA methylation is a stable genomic modification that is detectable in peripheral patient fluids such as urine and blood plasma that could serve as a non-invasive diagnostic biomarker for prostate cancer. We measured genome-wide DNA methylation patterns in 73 clinically annotated fresh-frozen prostate cancers and 63 benign-adjacent prostate tissues using the Illumina Infinium HumanMethylation450 BeadChip array. We overlaid the most significantly differentially methylated sites in the genome with transcription factor binding sites measured by the Encyclopedia of DNA Elements consortium. We used logistic regression and receiver operating characteristic curves to assess the performance of candidate diagnostic models. We identified methylation patterns that have a high predictive power for distinguishing malignant prostate tissue from benign-adjacent prostate tissue, and these methylation signatures were validated using data from The Cancer Genome Atlas Project. Furthermore, by overlaying ENCODE transcription factor binding data, we observed an enrichment of enhancer of zeste homolog 2 binding in gene regulatory regions with higher DNA methylation in malignant prostate tissues. DNA methylation patterns are greatly altered in prostate cancer tissue in comparison to benign-adjacent tissue. We have discovered patterns of DNA methylation marks that can distinguish prostate cancers with high specificity and sensitivity in multiple patient tissue cohorts, and we have identified transcription factors binding in these differentially methylated regions that may play important roles in prostate cancer development.
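The abstract above evaluates candidate diagnostic models with receiver operating characteristic curves. As a hedged illustration of what the area under such a curve measures, the sketch below computes AUC as the fraction of (malignant, benign) pairs that a score ranks correctly, with ties counted as half. The function name `roc_auc` and the toy scores are hypothetical and do not come from the study.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: iterable of 1 (positive, e.g. malignant) or 0 (negative)
    scores: iterable of model scores, higher meaning more likely positive
    """
    positives = [s for l, s in zip(labels, scores) if l == 1]
    negatives = [s for l, s in zip(labels, scores) if l == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative sample")

    # A positive scored above a negative counts 1, a tie counts 0.5.
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


# Toy example: two malignant (1) and two benign (0) samples.
toy_auc = roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])
```

In the toy example three of the four malignant-benign pairs are ordered correctly, giving an AUC of 0.75. A value of 1.0 means perfect separation and 0.5 means the score is no better than chance.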
Association between polygenic risk for schizophrenia, neurocognition and social cognition across development.

PubMed

Germine, L; Robinson, E B; Smoller, J W; Calkins, M E; Moore, T M; Hakonarson, H; Daly, M J; Lee, P H; Holmes, A J; Buckner, R L; Gur, R C; Gur, R E

2016-10-18

Breakthroughs in genomics have begun to unravel the genetic architecture of schizophrenia risk, providing methods for quantifying schizophrenia polygenic risk based on common genetic variants. Our objective in the current study was to understand the relationship between schizophrenia genetic risk variants and neurocognitive development in healthy individuals. We first used combined genomic and neurocognitive data from the Philadelphia Neurodevelopmental Cohort (4303 participants ages 8-21 years) to screen 26 neurocognitive phenotypes for their association with schizophrenia polygenic risk. Schizophrenia polygenic risk was estimated for each participant based on summary statistics from the most recent schizophrenia genome-wide association analysis (Psychiatric Genomics Consortium 2014). After correction for multiple comparisons, greater schizophrenia polygenic risk was significantly associated with reduced speed of emotion identification and verbal reasoning. These associations were significant by age 9 years and there was no evidence of interaction between schizophrenia polygenic risk and age on neurocognitive performance. We then looked at the association between schizophrenia polygenic risk and emotion identification speed in the Harvard/MGH Brain Genomics Superstruct Project sample (695 participants ages 18-35 years), where we replicated the association between schizophrenia polygenic risk and emotion identification speed. These analyses provide evidence for a replicable association between polygenic risk for schizophrenia and a specific aspect of social cognition. Our findings indicate that individual differences in genetic risk for schizophrenia are linked with the development of aspects of social cognition and potentially verbal reasoning, and that these associations emerge relatively early in development.
High quality draft genome sequence of the moderately halophilic bacterium Pontibacillus yanchengensis Y32(T) and comparison among Pontibacillus genomes.

PubMed

Huang, Jing; Qiao, Zi Xu; Tang, Jing Wei; Wang, Gejiao

2015-01-01

Pontibacillus yanchengensis Y32(T) is an aerobic, motile, Gram-positive, endospore-forming, and moderately halophilic bacterium isolated from a salt field. In this study, we describe the features of P. yanchengensis strain Y32(T) together with a comparison with four other Pontibacillus genomes. The 4,281,464 bp high-quality-draft genome of strain Y32(T) is arranged into 153 contigs containing 3,965 protein-coding genes and 77 RNA encoding genes. The genome of strain Y32(T) possesses many genes related to its halophilic character, flagellar assembly and chemotaxis to support its survival in a salt-rich environment.
Genomic comparison of closely related Giant Viruses supports an accordion-like model of evolution.

PubMed

Filée, Jonathan

2015-01-01

Genome gigantism occurs so far in Phycodnaviridae and Mimiviridae (order Megavirales). Origin and evolution of these Giant Viruses (GVs) remain open questions. Interestingly, availability of a collection of closely related GV genomes enabling genomic comparisons offers the opportunity to better understand the different evolutionary forces acting on these genomes. Whole genome alignment for five groups of viruses belonging to the Mimiviridae and Phycodnaviridae families shows that there is no trend of genome expansion or general tendency of genome contraction. Instead, GV genomes accumulated genomic mutations over time with gene gains compensating for the different losses. In addition, each lineage displays specific patterns of genome evolution. Mimiviridae (megaviruses and mimiviruses) and Chlorella Phycodnaviruses evolved mainly by duplications and losses of genes belonging to large paralogous families (including movements of diverse mobile genetic elements), whereas Micromonas and Ostreococcus Phycodnaviruses derive most of their genetic novelties through lateral gene transfers. Taken together, these data support an accordion-like model of evolution in which GV genomes have undergone successive steps of gene gain and gene loss, accrediting the hypothesis that genome gigantism appears early, before the diversification of the different GV lineages.
Multiple Myeloma Genomics: A Systematic Review.

PubMed

Weaver, Casey J; Tariman, Joseph D

2017-08-01

This integrative review describes the genomic variants that have been found to be associated with poor prognosis in patients diagnosed with multiple myeloma (MM). Second, it identifies MM genetic and genomic changes using next-generation sequencing, specifically whole-genome sequencing or exome sequencing. A search for peer-reviewed articles through PubMed, EBSCOhost, and DePaul WorldCat Libraries Worldwide yielded 33 articles that were included in the final analysis. The most commonly reported genetic changes were KRAS, NRAS, TP53, FAM46C, BRAF, DIS3, ATM, and CCND1. These genetic changes play a role in the pathogenesis of MM, prognostication, and therapeutic targets for novel therapies. MM genetics and genomics are expanding rapidly; oncology nurse clinicians must have basic competencies in genetics and genomics to help patients understand the complexities of genetic and genomic alterations and be able to refer patients to appropriate genomic professionals if needed. Copyright © 2017 Elsevier Inc. All rights reserved.
COMPARISON OF COMPARATIVE GENOMIC HYBRIDIZATIONS TECHNOLOGIES ACROSS MICROARRAY PLATFORMS

EPA Science Inventory

Comparative Genomic Hybridization (CGH) measures DNA copy number differences between a reference genome and a test genome. The DNA samples are differentially labeled and hybridized to an immobilized substrate. In early CGH experiments, the DNA targets were hybridized to metaphase...
Genomic Diversity of Burkholderia pseudomallei Clinical Isolates: Subtractive Hybridization Reveals a Burkholderia mallei-Specific Prophage in B. pseudomallei 1026b

DTIC Science & Technology

2004-06-01

identification of several new virulence gene candidates. In particular, K96243 harbors multiple genomic islands with relatively low GC contents, suggesting...coli, Streptococcus pyogenes, Staphylococcus aureus, S. enterica, and Xylella fastidiosa (11, 16, 17). The genomic sequencing results for multiple... virulence genes by subtractive hybridization: identification of capsular polysaccharide of Burkholderia pseudomallei as a major virulence determinant
Discovering functional DNA elements using population genomic information: a proof of concept using human mtDNA.

PubMed

Schrider, Daniel R; Kern, Andrew D

2014-06-09

Identifying the complete set of functional elements within the human genome would be a windfall for multiple areas of biological research including medicine, molecular biology, and evolution. Complete knowledge of function would aid in the prioritization of loci when searching for the genetic bases of disease or adaptive phenotypes. Because mutations that disrupt function are disfavored by natural selection, purifying selection leaves a detectable signature within functional elements; accordingly, this signal has been exploited for over a decade through the use of genomic comparisons of distantly related species. While this is so, the functional complement of the genome changes extensively across time and between lineages; therefore, evidence of the current action of purifying selection in humans is essential. Because the removal of deleterious mutations by natural selection also reduces within-species genetic diversity within functional loci, dense population genetic data have the potential to reveal genomic elements that are currently functional. Here, we assess the potential of this approach by examining an ultradeep sample of human mitochondrial genomes (n = 16,411). We show that the high density of polymorphism in this data set precisely delineates regions experiencing purifying selection. Furthermore, we show that the number of segregating alleles at a site is strongly correlated with its divergence across species after accounting for known mutational biases in human mitochondrial DNA (ρ = 0.51; P < 2.2 × 10^-16). These two measures track one another at a remarkably fine scale across many loci, a correlation that is purely the result of natural selection. Our results demonstrate that genetic variation has the potential to reveal with surprising precision which regions in the genome are currently performing important functions and likely to have deleterious fitness effects when mutated.
As more complete human genomes are sequenced, similar power to reveal purifying selection may be achievable in the human nuclear genome. © The Author(s) 2014. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
Whole genome investigation of a divergent clade of the pathogen Streptococcus suis

PubMed Central

Baig, Abiyad; Weinert, Lucy A.; Peters, Sarah E.; Howell, Kate J.; Chaudhuri, Roy R.; Wang, Jinhong; Holden, Matthew T. G.; Parkhill, Julian; Langford, Paul R.; Rycroft, Andrew N.; Wren, Brendan W.; Tucker, Alexander W.; Maskell, Duncan J.

2015-01-01

Streptococcus suis is a major porcine and zoonotic pathogen responsible for significant economic losses in the pig industry and an increasing number of human cases. Multiple isolates of S. suis show marked genomic diversity. Here, we report the analysis of whole genome sequences of nine pig isolates that caused disease typical of S. suis and had phenotypic characteristics of S. suis, but their genomes were divergent from those of many other S. suis isolates. Comparison of protein sequences predicted from divergent genomes with those from normal S. suis reduced the size of core genome from 793 to only 397 genes. Divergence was clear if phylogenetic analysis was performed on reduced core genes and MLST alleles. Phylogenies based on certain other genes (16S rRNA, sodA, recN, and cpn60) did not show divergence for all isolates, suggesting recombination between some divergent isolates with normal S. suis for these genes. Indeed, there is evidence of recent recombination between the divergent and normal S. suis genomes for 249 of 397 core genes. In addition, phylogenetic analysis based on the 16S rRNA gene and 132 genes that were conserved between the divergent isolates and representatives of the broader Streptococcus genus showed that divergent isolates were more closely related to S. suis. Six out of nine divergent isolates possessed a S. suis-like capsule region with variation in capsular gene sequences but the remaining three did not have a discrete capsule locus. The majority (40/70), of virulence-associated genes in normal S. suis were present in the divergent genomes. Overall, the divergent isolates extend the current diversity of S. suis species but the phenotypic similarities and the large amount of gene exchange with normal S. suis gives insufficient evidence to assign these isolates to a new species or subspecies. Further, sampling and whole genome analysis of more isolates is warranted to understand the diversity of the species. PMID:26583006
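The abstract above notes that adding the divergent isolates reduced the core genome from 793 to 397 genes. The underlying idea, that a core genome is the set of genes shared by every genome under comparison and therefore can only shrink as genomes are added, can be sketched as follows. The isolate names, gene labels, and the function name `core_genome` are invented, and real analyses cluster predicted proteins by sequence similarity rather than matching labels.

```python
def core_genome(gene_sets):
    """Genes present in every genome.

    gene_sets: dict mapping genome name -> iterable of gene (cluster) ids
    """
    sets = [set(genes) for genes in gene_sets.values()]
    return set.intersection(*sets) if sets else set()


# Hypothetical gene content for two typical isolates and one divergent isolate.
typical = {
    "isolate_A": {"g1", "g2", "g3", "g4"},
    "isolate_B": {"g1", "g2", "g3", "g5"},
}
with_divergent = dict(typical, divergent_X={"g1", "g3", "g6"})

core_typical = core_genome(typical)
core_all = core_genome(with_divergent)
```

In this toy case the core drops from three genes to two once the divergent isolate is included, mirroring in miniature the reduction reported in the abstract.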
SynFind: Compiling Syntenic Regions across Any Set of Genomes on Demand.

PubMed

Tang, Haibao; Bomhoff, Matthew D; Briones, Evan; Zhang, Liangsheng; Schnable, James C; Lyons, Eric

2015-11-11

The identification of conserved syntenic regions enables discovery of predicted locations for orthologous and homeologous genes, even when no such gene is present. This capability means that synteny-based methods are far more effective than sequence similarity-based methods in identifying true-negatives, a necessity for studying gene loss and gene transposition. However, the identification of syntenic regions requires complex analyses which must be repeated for pairwise comparisons between any two species. Therefore, as the number of published genomes increases, there is a growing demand for scalable, simple-to-use applications to perform comparative genomic analyses that cater to both gene family studies and genome-scale studies. We implemented SynFind, a web-based tool that addresses this need. Given one query genome, SynFind is capable of identifying conserved syntenic regions in any set of target genomes. SynFind is capable of reporting per-gene information, useful for researchers studying specific gene families, as well as genome-wide data sets of syntenic gene and predicted gene locations, critical for researchers focused on large-scale genomic analyses. Inference of syntenic homologs provides the basis for correlation of functional changes around genes of interest between related organisms. Deployed on the CoGe online platform, SynFind is connected to the genomic data from over 15,000 organisms from all domains of life as well as supporting multiple releases of the same organism. SynFind makes use of a powerful job execution framework that promises scalability and reproducibility. SynFind can be accessed at http://genomevolution.org/CoGe/SynFind.pl. A video tutorial of SynFind using Phytophthora as an example is available at http://www.youtube.com/watch?v=2Agczny9Nyc. © The Author(s) 2015. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
A population genomic scan in Chorthippus grasshoppers unveils previously unknown phenotypic divergence.

PubMed

Berdan, Emma L; Mazzoni, Camila J; Waurick, Isabelle; Roehr, Johannes T; Mayer, Frieder

2015-08-01

Understanding the genetics of speciation and the processes that drive it is a central goal of evolutionary biology. Grasshoppers of the Chorthippus species group differ strongly in calling song (and corresponding female preferences) but are exceedingly similar in other characteristics such as morphology. Here, we performed a population genomic scan on three Chorthippus species (Chorthippus biguttulus, C. mollis and C. brunneus) to gain insight into the genes and processes involved in divergence and speciation in this group. Using an RNA-seq approach, we examined functional variation between the species by calling SNPs for each of the three species pairs and using FST-based approaches to identify outliers. We found approximately 1% of SNPs in each comparison to be outliers. Between 37% and 40% of these outliers were nonsynonymous SNPs (as opposed to a global level of 17%) indicating that we recovered loci under selection. Among the outliers were several genes that may be involved in song production and hearing as well as genes involved in other traits such as food preferences and metabolism. Differences in food preferences between species were confirmed with a behavioural experiment. This indicates that multiple phenotypic differences implicating multiple evolutionary processes (sexual selection and natural selection) are present between the species. © 2015 John Wiley & Sons Ltd.
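As a rough illustration of what an FST-based outlier scan involves, the sketch below computes a heterozygosity-based FST for a single biallelic SNP in two equally weighted populations and then keeps the top-ranked fraction of SNPs. The abstract does not state which FST estimator or outlier test the authors used, so the formula choice, the function names, and the 1% cutoff parameter are assumptions made for illustration.

```python
def fst_biallelic(p1, p2):
    """FST = (H_T - H_S) / H_T for one biallelic SNP.

    p1 and p2 are the frequencies of the same allele in two populations,
    which are weighted equally here (an assumption of this sketch).
    """
    p_bar = (p1 + p2) / 2
    h_t = 2 * p_bar * (1 - p_bar)  # expected heterozygosity, pooled
    if h_t == 0:
        return 0.0  # allele fixed in both populations: no variation to partition
    h_s = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2  # mean within-population
    return (h_t - h_s) / h_t


def top_outliers(fst_by_snp, fraction=0.01):
    """Return the SNP identifiers with the highest FST (at least one)."""
    n_keep = max(1, int(len(fst_by_snp) * fraction))
    ranked = sorted(fst_by_snp, key=fst_by_snp.get, reverse=True)
    return ranked[:n_keep]
```

A simple quantile cutoff like this only ranks SNPs; it does not by itself test whether the top-ranked loci depart from neutral expectations.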
Comparison of taxon-specific versus general locus sets for targeted sequence capture in plant phylogenomics.

PubMed

Chau, John H; Rahfeldt, Wolfgang A; Olmstead, Richard G

2018-03-01

Targeted sequence capture can be used to efficiently gather sequence data for large numbers of loci, such as single-copy nuclear loci. Most published studies in plants have used taxon-specific locus sets developed individually for a clade using multiple genomic and transcriptomic resources. General locus sets can also be developed from loci that have been identified as single-copy and have orthologs in large clades of plants. We identify and compare a taxon-specific locus set and three general locus sets (conserved ortholog set [COSII], shared single-copy nuclear [APVO SSC] genes, and pentatricopeptide repeat [PPR] genes) for targeted sequence capture in Buddleja (Scrophulariaceae) and outgroups. We evaluate their performance in terms of assembly success, sequence variability, and resolution and support of inferred phylogenetic trees. The taxon-specific locus set had the most target loci. Assembly success was high for all locus sets in Buddleja samples. For outgroups, general locus sets had greater assembly success. Taxon-specific and PPR loci had the highest average variability. The taxon-specific data set produced the best-supported tree, but all data sets showed improved resolution over previous non-sequence capture data sets. General locus sets can be a useful source of sequence capture targets, especially if multiple genomic resources are not available for a taxon.
Lateral Gene Transfer in a Heavy Metal-Contaminated-Groundwater Microbial Community

PubMed Central

Hemme, Christopher L.; Green, Stefan J.; Rishishwar, Lavanya; Prakash, Om; Pettenato, Angelica; Chakraborty, Romy; Deutschbauer, Adam M.; Van Nostrand, Joy D.; Wu, Liyou; He, Zhili; Jordan, I. King; Arkin, Adam P.; Kostka, Joel E.

2016-01-01

ABSTRACT Unraveling the drivers controlling the response and adaptation of biological communities to environmental change, especially anthropogenic activities, is a central but poorly understood issue in ecology and evolution. Comparative genomics studies suggest that lateral gene transfer (LGT) is a major force driving microbial genome evolution, but its role in the evolution of microbial communities remains elusive. To delineate the importance of LGT in mediating the response of a groundwater microbial community to heavy metal contamination, representative Rhodanobacter reference genomes were sequenced and compared to shotgun metagenome sequences. 16S rRNA gene-based amplicon sequence analysis indicated that Rhodanobacter populations were highly abundant in contaminated wells with low pHs and high levels of nitrate and heavy metals but remained rare in the uncontaminated wells. Sequence comparisons revealed that multiple geochemically important genes, including genes encoding Fe2+/Pb2+ permeases, most denitrification enzymes, and cytochrome c553, were native to Rhodanobacter and not subjected to LGT. In contrast, the Rhodanobacter pangenome contained a recombinational hot spot in which numerous metal resistance genes were subjected to LGT and/or duplication. In particular, Co2+/Zn2+/Cd2+ efflux and mercuric resistance operon genes appeared to be highly mobile within Rhodanobacter populations. Evidence of multiple duplications of a mercuric resistance operon common to most Rhodanobacter strains was also observed. Collectively, our analyses indicated the importance of LGT during the evolution of groundwater microbial communities in response to heavy metal contamination, and a conceptual model was developed to display such adaptive evolutionary processes for explaining the extreme dominance of Rhodanobacter populations in the contaminated groundwater microbiome. PMID:27048805
Methicillin Resistant Staphylococcus aureus Transmission in a Ghanaian Burn Unit: The Importance of Active Surveillance in Resource-Limited Settings.

PubMed

Amissah, Nana Ama; Buultjens, Andrew H; Ablordey, Anthony; van Dam, Lieke; Opoku-Ware, Ampomah; Baines, Sarah L; Bulach, Dieter; Tetteh, Caitlin S; Prah, Isaac; van der Werf, Tjip S; Friedrich, Alexander W; Seemann, Torsten; van Dijl, Jan Maarten; Stienstra, Ymkje; Stinear, Timothy P; Rossen, John W

2017-01-01

Objectives: Staphylococcus aureus infections in burn patients can lead to serious complications and death. The frequency of S. aureus infection is high in low- and middle-income countries presumably due to limited resources, misuse of antibiotics and poor infection control. The objective of the present study was to apply population genomics to precisely define, for the first time, the transmission of antibiotic resistant S. aureus in a resource-limited setting in sub-Saharan Africa. Methods: Staphylococcus aureus surveillance was performed amongst burn patients and healthcare workers during a 7-month survey within the burn unit of the Korle Bu Teaching Hospital in Ghana. Results: Sixty-six S. aureus isolates (59 colonizing and 7 clinical) were obtained from 31 patients and 10 healthcare workers. Twenty-one of these isolates were ST250-IV methicillin-resistant S. aureus (MRSA). Notably, 25 (81%) of the 31 patients carried or were infected with S. aureus within 24 h of admission. Genome comparisons revealed six distinct S. aureus clones circulating in the burn unit, and demonstrated multiple transmission events between patients and healthcare workers. Further, the collected S. aureus isolates exhibited a wide range of genotypic resistances to antibiotics, including trimethoprim (21%), aminoglycosides (33%), oxacillin (33%), chloramphenicol (50%), tetracycline (59%) and fluoroquinolones (100%). Conclusion: Population genomics uncovered multiple transmission events of S. aureus, especially MRSA, within the investigated burn unit. Our findings highlight lapses in infection control and prevention, and underscore the great importance of active surveillance to protect burn victims against multi-drug resistant pathogens in resource-limited settings.
Whole Genome Comparison Reveals High Levels of Inbreeding and Strain Redundancy Across the Spectrum of Commercial Wine Strains of Saccharomyces cerevisiae

PubMed Central

Borneman, Anthony R.; Forgan, Angus H.; Kolouchova, Radka; Fraser, James A.; Schmidt, Simon A.

2016-01-01

Humans have been consuming wines for more than 7000 yr. For most of this time, fermentations were presumably performed by strains of Saccharomyces cerevisiae that naturally found their way into the fermenting must. In contrast, most commercial wines are now produced by inoculation with pure yeast monocultures, ensuring consistent, reliable and reproducible fermentations, and there are now hundreds of these yeast starter cultures commercially available. In order to thoroughly investigate the genetic diversity that has been captured by over 50 yr of commercial wine yeast development and domestication, whole genome sequencing has been performed on 212 strains of S. cerevisiae, including 119 commercial wine and brewing starter strains, and wine isolates from across seven decades. Comparative genomic analysis indicates that, despite their large numbers, commercial strains, and wine strains in general, are extremely similar genetically, possessing all of the hallmarks of a population bottleneck, and high levels of inbreeding. In addition, many commercial strains from multiple suppliers are nearly genetically identical, suggesting that the limits of effective genetic variation within this genetically narrow group may be approaching saturation. PMID:26869621
A Y-Encoded Suppressor of Feminization Arose via Lineage-Specific Duplication of a Cytokinin Response Regulator in Kiwifruit

PubMed Central

Ohtani, Haruka; Morimoto, Takuya; Beppu, Kenji; Kataoka, Ikuo

2018-01-01

Dioecy, the presence of male and female flowers on distinct individuals, has evolved independently in multiple plant lineages, and the genes involved in this differential development are just starting to be uncovered in a few species. Here, we used genomic approaches to investigate this pathway in kiwifruits (genus Actinidia). Genome-wide cataloging of male-specific subsequences, combined with transcriptome analysis, led to the identification of a type-C cytokinin response regulator as a potential sex determinant gene in this genus. Functional transgenic analyses in two model systems, Arabidopsis thaliana and Nicotiana tabacum, indicated that this gene acts as a dominant suppressor of carpel development, prompting us to name it Shy Girl (SyGI). Evolutionary analyses in a panel of Actinidia species revealed that SyGI is located in the Y-specific region of the genome and probably arose from a lineage-specific gene duplication. Comparisons with the duplicated autosomal counterpart, and with orthologs from other angiosperms, suggest that the SyGI-specific duplication and subsequent evolution of cis-elements may have played a key role in the acquisition of separate sexes in this species. PMID:29626069
Highly Conserved Mitochondrial Genomes among Multicellular Red Algae of the Florideophyceae

PubMed Central

Yang, Eun Chan; Kim, Kyeong Mi; Kim, Su Yeon; Lee, JunMo; Boo, Ga Hun; Lee, Jung-Hyun; Nelson, Wendy A.; Yi, Gangman; Schmidt, William E.; Fredericq, Suzanne; Boo, Sung Min; Bhattacharya, Debashish; Yoon, Hwan Su

2015-01-01

Two red algal classes, the Florideophyceae (approximately 7,100 spp.) and Bangiophyceae (approximately 193 spp.), comprise 98% of red algal diversity in marine and freshwater habitats. These two classes form well-supported monophyletic groups in most phylogenetic analyses. Nonetheless, the interordinal relationships remain largely unresolved, in particular in the largest subclass Rhodymeniophycidae that includes 70% of all species. To elucidate red algal phylogenetic relationships and study organelle evolution, we determined the sequence of 11 mitochondrial genomes (mtDNA) from 5 florideophycean subclasses. These mtDNAs were combined with existing data, resulting in a database of 25 florideophytes and 12 bangiophytes (including cyanidiophycean species). A concatenated alignment of mt proteins was used to resolve ordinal relationships in the Rhodymeniophycidae. Red algal mtDNA genome comparisons showed 47 instances of gene rearrangement including 12 that distinguish Bangiophyceae from Hildenbrandiophycidae, and 5 that distinguish Hildenbrandiophycidae from Nemaliophycidae. These organelle data support a rapid radiation and surprisingly high conservation of mtDNA gene synteny among the morphologically divergent multicellular lineages of Rhodymeniophycidae. In contrast, we find extensive mitochondrial gene rearrangements when comparing Bangiophyceae and Florideophyceae and multiple examples of gene loss among the different red algal lineages. PMID:26245677

Physical and genetic map of Streptococcus thermophilus A054.

PubMed Central

Roussel, Y; Pebay, M; Guedon, G; Simonet, J M; Decaris, B

1994-01-01

The three restriction endonucleases SfiI, BssHII, and SmaI were found to generate fragments with suitable size distributions for mapping the genome of Streptococcus thermophilus A054. A total of 5, 8, and 24 fragments were produced with SfiI, BssHII, and SmaI, respectively. An average genome size of 1,824 kb was determined by summing the total fragment sizes obtained by digestions with these three enzymes. Partial and multiple digestions of genomic DNA in conjunction with Southern hybridization were used to map SfiI, BssHII, and SmaI fragments. All restriction fragments were arranged in a unique circular chromosome. Southern hybridization analysis with specific probes allowed 23 genetic markers to be located on the restriction map. Among them, six rrn loci were precisely located. The area of the chromosome containing the ribosomal operons was further detailed by mapping some of the ApaI and SgrAI sites. Comparison of macrorestriction patterns from three clones derived from strain A054 revealed two variable regions in the chromosome. One was associated with the tandem rrnD and rrnE loci, and the other was mapped in the region of the lactose operon. PMID:8002562
The ENCODE Project at UC Santa Cruz.

PubMed

Thomas, Daryl J; Rosenbloom, Kate R; Clawson, Hiram; Hinrichs, Angie S; Trumbower, Heather; Raney, Brian J; Karolchik, Donna; Barber, Galt P; Harte, Rachel A; Hillman-Jackson, Jennifer; Kuhn, Robert M; Rhead, Brooke L; Smith, Kayla E; Thakkapallayil, Archana; Zweig, Ann S; Haussler, David; Kent, W James

2007-01-01

The goal of the Encyclopedia Of DNA Elements (ENCODE) Project is to identify all functional elements in the human genome. The pilot phase is for comparison of existing methods and for the development of new methods to rigorously analyze a defined 1% of the human genome sequence. Experimental datasets are focused on the origin of replication, DNase I hypersensitivity, chromatin immunoprecipitation, promoter function, gene structure, pseudogenes, non-protein-coding RNAs, transcribed RNAs, multiple sequence alignment and evolutionarily constrained elements. The ENCODE project at UCSC website (http://genome.ucsc.edu/ENCODE) is the primary portal for the sequence-based data produced as part of the ENCODE project. In the pilot phase of the project, over 30 labs provided experimental results for a total of 56 browser tracks supported by 385 database tables. The site provides researchers with a number of tools that allow them to visualize and analyze the data as well as download data for local analyses. This paper describes the portal to the data, highlights the data that has been made available, and presents the tools that have been developed within the ENCODE project. Access to the data and types of interactive analysis that are possible are illustrated through supplemental examples.
Systematic genomic identification of colorectal cancer genes delineating advanced from early clinical stage and metastasis

PubMed Central

2013-01-01

Background Colorectal cancer is the third leading cause of cancer deaths in the United States. The initial assessment of colorectal cancer involves clinical staging that takes into account the extent of primary tumor invasion, determining the number of lymph nodes with metastatic cancer and the identification of metastatic sites in other organs. Advanced clinical stage indicates metastatic cancer, either in regional lymph nodes or in distant organs. While the genomic and genetic basis of colorectal cancer has been elucidated to some degree, less is known about the identity of specific cancer genes that are associated with advanced clinical stage and metastasis. Methods We compiled multiple genomic data types (mutations, copy number alterations, gene expression and methylation status) as well as clinical meta-data from The Cancer Genome Atlas (TCGA). We used an elastic-net regularized regression method on the combined genomic data to identify genetic aberrations and their associated cancer genes that are indicators of clinical stage. We ranked candidate genes by their regression coefficient and level of support from multiple assay modalities. Results A fit of the elastic-net regularized regression to 197 samples and integrated analysis of four genomic platforms identified the set of top gene predictors of advanced clinical stage, including: WRN, SYK, DDX5 and ADRA2C. These genetic features were identified robustly in bootstrap resampling analysis. Conclusions We conducted an analysis integrating multiple genomic features including mutations, copy number alterations, gene expression and methylation. This integrated approach in which one considers all of these genomic features performs better than any individual genomic assay. We identified multiple genes that robustly delineate advanced clinical stage, suggesting their possible role in colorectal cancer metastatic progression. PMID:24308539
Setting Up the JBrowse Genome Browser

PubMed Central

Skinner, Mitchell E; Holmes, Ian H

2010-01-01

JBrowse is a web-based tool for visualizing genomic data. Unlike most other web-based genome browsers, JBrowse exploits the capabilities of the user's web browser to make scrolling and zooming fast and smooth. It supports the browsers used by almost all internet users, and is relatively simple to install. JBrowse can utilize multiple types of data in a variety of common genomic data formats, including genomic feature data in bioperl databases, GFF files, and BED files, and quantitative data in wiggle files. This unit describes how to obtain the JBrowse software, set it up on a Linux or Mac OS X computer running as a web server and incorporate genome annotation data from multiple sources into JBrowse. After completing the protocols described in this unit, the reader will have a web site that other users can visit to browse the genomic data. PMID:21154710
Detection of genomic rearrangements in cucumber using genomecmp software

NASA Astrophysics Data System (ADS)

Kulawik, Maciej; Pawełkowicz, Magdalena Ewa; Wojcieszek, Michał; Pląder, Wojciech; Nowak, Robert M.

2017-08-01

Comparative genomics is a rapidly evolving science, driven by the increasing amount of genome sequence information available in databases. A simple comparison of general genome features such as genome size, number of genes, and chromosome number provides an entry point into comparative genomic analysis. Here we present the utility of the new tool genomecmp for finding rearrangements across the compared sequences, and its applications in plant comparative genomics.
A MAD-Bayes Algorithm for State-Space Inference and Clustering with Application to Querying Large Collections of ChIP-Seq Data Sets.

PubMed

Zuo, Chandler; Chen, Kailei; Keleş, Sündüz

2017-06-01

Current analytic approaches for querying large collections of chromatin immunoprecipitation followed by sequencing (ChIP-seq) data from multiple cell types rely on individual analysis of each data set (i.e., peak calling) independently. This approach discards the fact that functional elements are frequently shared among related cell types and leads to overestimation of the extent of divergence between different ChIP-seq samples. Methods geared toward multisample investigations have limited applicability in settings that aim to integrate 100s to 1000s of ChIP-seq data sets for query loci (e.g., thousands of genomic loci with a specific binding site). Recently, Zuo et al. developed a hierarchical framework for state-space matrix inference and clustering, named MBASIC, to enable joint analysis of user-specified loci across multiple ChIP-seq data sets. Although this versatile framework estimates both the underlying state-space (e.g., bound vs. unbound) and also groups loci with similar patterns together, its Expectation-Maximization-based estimation structure hinders its applicability with large numbers of loci and samples. We address this limitation by developing a MAP-based asymptotic derivations from Bayes (MAD-Bayes) framework for MBASIC. This results in a K-means-like optimization algorithm that converges rapidly and hence enables exploring multiple initialization schemes and flexibility in tuning. Comparison with MBASIC indicates that this speed comes at a relatively insignificant loss in estimation accuracy. Although MAD-Bayes MBASIC is specifically designed for the analysis of user-specified loci, it is able to capture overall patterns of histone marks from multiple ChIP-seq data sets similar to those identified by genome-wide segmentation methods such as ChromHMM and Spectacle.
Microbial Genome Analysis and Comparisons: Web-based Protocols and Resources

USDA-ARS's Scientific Manuscript database

Fully annotated genome sequences of many microorganisms are publicly available as a resource. However, in-depth analysis of these genomes using specialized tools is required to derive meaningful information. We describe here the utility of three powerful publicly available genome databases and ana...
The spotted gar genome illuminates vertebrate evolution and facilitates human-teleost comparisons.

PubMed

Braasch, Ingo; Gehrke, Andrew R; Smith, Jeramiah J; Kawasaki, Kazuhiko; Manousaki, Tereza; Pasquier, Jeremy; Amores, Angel; Desvignes, Thomas; Batzel, Peter; Catchen, Julian; Berlin, Aaron M; Campbell, Michael S; Barrell, Daniel; Martin, Kyle J; Mulley, John F; Ravi, Vydianathan; Lee, Alison P; Nakamura, Tetsuya; Chalopin, Domitille; Fan, Shaohua; Wcisel, Dustin; Cañestro, Cristian; Sydes, Jason; Beaudry, Felix E G; Sun, Yi; Hertel, Jana; Beam, Michael J; Fasold, Mario; Ishiyama, Mikio; Johnson, Jeremy; Kehr, Steffi; Lara, Marcia; Letaw, John H; Litman, Gary W; Litman, Ronda T; Mikami, Masato; Ota, Tatsuya; Saha, Nil Ratan; Williams, Louise; Stadler, Peter F; Wang, Han; Taylor, John S; Fontenot, Quenton; Ferrara, Allyse; Searle, Stephen M J; Aken, Bronwen; Yandell, Mark; Schneider, Igor; Yoder, Jeffrey A; Volff, Jean-Nicolas; Meyer, Axel; Amemiya, Chris T; Venkatesh, Byrappa; Holland, Peter W H; Guiguen, Yann; Bobe, Julien; Shubin, Neil H; Di Palma, Federica; Alföldi, Jessica; Lindblad-Toh, Kerstin; Postlethwait, John H

2016-04-01

To connect human biology to fish biomedical models, we sequenced the genome of spotted gar (Lepisosteus oculatus), whose lineage diverged from teleosts before teleost genome duplication (TGD). The slowly evolving gar genome has conserved in content and size many entire chromosomes from bony vertebrate ancestors. Gar bridges teleosts to tetrapods by illuminating the evolution of immunity, mineralization and development (mediated, for example, by Hox, ParaHox and microRNA genes). Numerous conserved noncoding elements (CNEs; often cis regulatory) undetectable in direct human-teleost comparisons become apparent using gar: functional studies uncovered conserved roles for such cryptic CNEs, facilitating annotation of sequences identified in human genome-wide association studies. Transcriptomic analyses showed that the sums of expression domains and expression levels for duplicated teleost genes often approximate the patterns and levels of expression for gar genes, consistent with subfunctionalization. The gar genome provides a resource for understanding evolution after genome duplication, the origin of vertebrate genomes and the function of human regulatory sequences.
The spotted gar genome illuminates vertebrate evolution and facilitates human-to-teleost comparisons

PubMed Central

Braasch, Ingo; Gehrke, Andrew R.; Smith, Jeramiah J.; Kawasaki, Kazuhiko; Manousaki, Tereza; Pasquier, Jeremy; Amores, Angel; Desvignes, Thomas; Batzel, Peter; Catchen, Julian; Berlin, Aaron M.; Campbell, Michael S.; Barrell, Daniel; Martin, Kyle J.; Mulley, John F.; Ravi, Vydianathan; Lee, Alison P.; Nakamura, Tetsuya; Chalopin, Domitille; Fan, Shaohua; Wcisel, Dustin; Cañestro, Cristian; Sydes, Jason; Beaudry, Felix E. G.; Sun, Yi; Hertel, Jana; Beam, Michael J.; Fasold, Mario; Ishiyama, Mikio; Johnson, Jeremy; Kehr, Steffi; Lara, Marcia; Letaw, John H.; Litman, Gary W.; Litman, Ronda T.; Mikami, Masato; Ota, Tatsuya; Saha, Nil Ratan; Williams, Louise; Stadler, Peter F.; Wang, Han; Taylor, John S.; Fontenot, Quenton; Ferrara, Allyse; Searle, Stephen M. J.; Aken, Bronwen; Yandell, Mark; Schneider, Igor; Yoder, Jeffrey A.; Volff, Jean-Nicolas; Meyer, Axel; Amemiya, Chris T.; Venkatesh, Byrappa; Holland, Peter W. H.; Guiguen, Yann; Bobe, Julien; Shubin, Neil H.; Di Palma, Federica; Alföldi, Jessica; Lindblad-Toh, Kerstin; Postlethwait, John H.

2016-01-01

To connect human biology to fish biomedical models, we sequenced the genome of spotted gar (Lepisosteus oculatus), whose lineage diverged from teleosts before the teleost genome duplication (TGD). The slowly evolving gar genome conserved in content and size many entire chromosomes from bony vertebrate ancestors. Gar bridges teleosts to tetrapods by illuminating the evolution of immunity, mineralization, and development (e.g., Hox, ParaHox, and miRNA genes). Numerous conserved non-coding elements (CNEs, often cis-regulatory) undetectable in direct human-teleost comparisons become apparent using gar: functional studies uncovered conserved roles of such cryptic CNEs, facilitating annotation of sequences identified in human genome-wide association studies. Transcriptomic analyses revealed that the sum of expression domains and levels from duplicated teleost genes often approximate patterns and levels of gar genes, consistent with subfunctionalization. The gar genome provides a resource for understanding evolution after genome duplication, the origin of vertebrate genomes, and the function of human regulatory sequences. PMID:26950095
ProteinWorldDB: querying radical pairwise alignments among protein sets from complete genomes.

PubMed

Otto, Thomas Dan; Catanho, Marcos; Tristão, Cristian; Bezerra, Márcia; Fernandes, Renan Mathias; Elias, Guilherme Steinberger; Scaglia, Alexandre Capeletto; Bovermann, Bill; Berstis, Viktors; Lifschitz, Sergio; de Miranda, Antonio Basílio; Degrave, Wim

2010-03-01

Many analyses in modern biological research are based on comparisons between biological sequences, resulting in functional, evolutionary and structural inferences. When large numbers of sequences are compared, heuristics are often used resulting in a certain lack of accuracy. In order to improve and validate results of such comparisons, we have performed radical all-against-all comparisons of 4 million protein sequences belonging to the RefSeq database, using an implementation of the Smith-Waterman algorithm. This extremely intensive computational approach was made possible with the help of World Community Grid, through the Genome Comparison Project. The resulting database, ProteinWorldDB, which contains coordinates of pairwise protein alignments and their respective scores, is now made available. Users can download, compare and analyze the results, filtered by genomes, protein functions or clusters. ProteinWorldDB is integrated with annotations derived from Swiss-Prot, Pfam, KEGG, NCBI Taxonomy database and gene ontology. The database is a unique and valuable asset, representing a major effort to create a reliable and consistent dataset of cross-comparisons of the whole protein content encoded in hundreds of completely sequenced genomes using a rigorous dynamic programming approach. The database can be accessed through http://proteinworlddb.org
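For readers unfamiliar with the Smith-Waterman algorithm named in this abstract, the following is a minimal sketch of its scoring recurrence. The function name and the scoring parameters (match +2, mismatch -1, linear gap penalty -2) are illustrative assumptions; protein comparisons such as those in ProteinWorldDB would normally use a substitution matrix, and the abstract does not describe the implementation or parameters actually used.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score between sequences a and b.

    Only the score is computed (no traceback), keeping one row of the
    dynamic programming table in memory at a time.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A cell never drops below zero: that is what makes the alignment local.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best
```

Because every cell of the table is filled, the cost grows with the product of the two sequence lengths, which is why an all-against-all comparison of millions of sequences required a distributed computing grid rather than heuristic shortcuts.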
Integrative prescreening in analysis of multiple cancer genomic studies

PubMed Central

2012-01-01

Background In high throughput cancer genomic studies, results from the analysis of single datasets often suffer from a lack of reproducibility because of small sample sizes. Integrative analysis can effectively pool and analyze multiple datasets and provides a cost effective way to improve reproducibility. In integrative analysis, simultaneously analyzing all genes profiled may incur high computational cost. A computationally affordable remedy is prescreening, which fits marginal models, can be conducted in a parallel manner, and has low computational cost. Results An integrative prescreening approach is developed for the analysis of multiple cancer genomic datasets. Simulation shows that the proposed integrative prescreening has better performance than alternatives, particularly including prescreening with individual datasets, an intensity approach and meta-analysis. We also analyze multiple microarray gene profiling studies on liver and pancreatic cancers using the proposed approach. Conclusions The proposed integrative prescreening provides an effective way to reduce the dimensionality in cancer genomic studies. It can be coupled with existing analysis methods to identify cancer markers. PMID:22799431
Creation of a Recombinant Rift Valley Fever Virus with a Two-Segmented Genome

PubMed Central

Brennan, Benjamin; Welch, Stephen R.; McLees, Angela; Elliott, Richard M.

2011-01-01

Rift Valley fever virus (RVFV; family Bunyaviridae) is a clinically important, mosquito-borne pathogen of both livestock and humans, which is found mainly in sub-Saharan Africa and the Arabian Peninsula. RVFV has a trisegmented single-stranded RNA (ssRNA) genome. The L and M segments are negative sense and encode the L protein (viral polymerase) on the L segment and the virion glycoproteins Gn and Gc as well as two other proteins, NSm and 78K, on the M segment. The S segment uses an ambisense coding strategy to express the nucleocapsid protein, N, and the nonstructural protein, NSs. Both the NSs and NSm proteins are dispensable for virus growth in tissue culture. Using reverse genetics, we generated a recombinant virus, designated r2segMP12, containing a two-segmented genome in which the NSs coding sequence was replaced with that for the Gn and Gc precursor. Thus, r2segMP12 lacks an M segment, and although it was attenuated in comparison to the three-segmented parental virus in both mammalian and insect cell cultures, it was genetically stable over multiple passages. We further show that the virus can stably maintain an M-like RNA segment encoding the enhanced green fluorescent protein gene. The implications of these findings for RVFV genome packaging and the potential to develop multivalent live-attenuated vaccines are discussed. PMID:21795328
Genomic survey of Clostridium difficile reservoirs in the East of England implicates environmental contamination of wastewater treatment plants by clinical lineages.

PubMed

Moradigaravand, Danesh; Gouliouris, Theodore; Ludden, Catherine; Reuter, Sandra; Jamrozy, Dorota; Blane, Beth; Naydenova, Plamena; Judge, Kim; Aliyu, Sani H; Hadjirin, Nazreen F; Holmes, Mark A; Török, Estée; Brown, Nicholas M; Parkhill, Julian; Peacock, Sharon

2018-03-02

There is growing evidence that patients with Clostridium difficile-associated diarrhoea often acquire their infecting strain before hospital admission. Wastewater is known to be a potential source of surface water that is contaminated with C. difficile spores. Here, we describe a study that used genome sequencing to compare C. difficile isolated from multiple wastewater treatment plants across the East of England and from patients with clinical disease at a major hospital in the same region. We confirmed that C. difficile from 65 patients were highly diverse and that most cases were not linked to other active cases in the hospital. In total, 186 C. difficile isolates were recovered from effluent water obtained from 18 municipal treatment plants at the point of release into the environment. Whole genome comparisons of clinical and environmental isolates demonstrated highly related populations, and confirmed extensive release of toxigenic C. difficile into surface waters. An analysis based on multilocus sequence types (STs) identified 19 distinct STs in the clinical collection and 38 STs in the wastewater collection, with 13 of 44 STs common to both clinical and wastewater collections. Furthermore, we identified five pairs of highly similar isolates (≤2 SNPs different in the core genome) in clinical and wastewater collections. Strategies to control community acquisition should consider the need for bacterial control of treated wastewater.
Karyotype and genome size of Iberochondrostoma almacai (Teleostei, Cyprinidae) and comparison with the sister-species I. lusitanicum

PubMed Central

2009-01-01

This study aimed to define the karyotype of the recently described Iberian endemic Iberochondrostoma almacai, to revisit the previously documented chromosome polymorphisms of its sister species I. lusitanicum using C-, Ag-/CMA3 and RE-banding, and to compare the two species' genome sizes. A 2n = 50 karyotype (with the exception of a triploid I. lusitanicum specimen) and a corresponding haploid chromosome formula of 7M:15SM:3A (FN = 94) were found. Multiple NORs were observed in both species (in two submetacentric chromosome pairs, one of them clearly homologous) and a higher intra- and interpopulational variability was evidenced in I. lusitanicum. Flow cytometry measurements of nuclear DNA content showed some significant differences in genome size both between and within species: the genome of I. almacai was smaller than that of I. lusitanicum (mean values 2.61 and 2.93 pg, respectively), which presented a clear interpopulational variability (mean values ranging from 2.72 to 3.00 pg). These data allowed the distinction of both taxa and confirmed the existence of two well differentiated groups within I. lusitanicum: one that includes the populations from the right bank of the Tejo and Samarra drainages, and another that groups the southern populations. The peculiar differences between the two species, presently listed as “Critically Endangered”, reinforced the importance of this study for future conservation plans. PMID:21637679
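The chromosome counts in the abstract above can be cross-checked with a little arithmetic. The sketch below is only an illustration: it assumes the common convention that metacentric (M) and submetacentric (SM) chromosomes are counted as two-armed and acrocentric (A) chromosomes as one-armed, which the abstract does not state explicitly. The function and constant names are invented for this example.

```python
# Haploid chromosome formula reported in the abstract: 7M:15SM:3A.
HAPLOID_FORMULA = {"M": 7, "SM": 15, "A": 3}

# Assumed convention (not stated in the abstract): M and SM chromosomes
# contribute two arms each, acrocentric (A) chromosomes contribute one.
ARMS = {"M": 2, "SM": 2, "A": 1}

def diploid_number(formula):
    """Diploid chromosome number 2n from a haploid formula."""
    return 2 * sum(formula.values())

def fundamental_number(formula, arms=ARMS):
    """Fundamental number FN: total chromosome arms in the diploid set."""
    return 2 * sum(count * arms[kind] for kind, count in formula.items())

print(diploid_number(HAPLOID_FORMULA))      # 50
print(fundamental_number(HAPLOID_FORMULA))  # 94
```

Under that assumption, the formula reproduces both reported values, 2n = 50 and FN = 94.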
Improving amphibian genomic resources: a multitissue reference transcriptome of an iconic invader.

PubMed

Richardson, Mark F; Sequeira, Fernando; Selechnik, Daniel; Carneiro, Miguel; Vallinoto, Marcelo; Reid, Jack G; West, Andrea J; Crossland, Michael R; Shine, Richard; Rollins, Lee A

2018-01-01

Cane toads (Rhinella marina) are an iconic invasive species introduced to 4 continents and well utilized for studies of rapid evolution in introduced environments. Despite the long introduction history of this species, its profound ecological impacts, and its utility for demonstrating evolutionary principles, genetic information is sparse. Here we produce a de novo transcriptome spanning multiple tissues and life stages to enable investigation of the genetic basis of previously identified rapid phenotypic change over the introduced range. Using approximately 1.9 billion reads from developing tadpoles and 6 adult tissue-specific cDNA libraries, as well as a transcriptome assembly pipeline encompassing 100 separate de novo assemblies, we constructed 62 202 transcripts, of which we functionally annotated ∼50%. Our transcriptome assembly exhibits 90% full-length completeness of the Benchmarking Universal Single-Copy Orthologs data set. Robust assembly metrics and comparisons with several available anuran transcriptomes and genomes indicate that our cane toad assembly is one of the most complete anuran genomic resources available. This comprehensive anuran transcriptome will provide a valuable resource for investigation of genes under selection during invasion in cane toads, but will also greatly expand our general knowledge of anuran genomes, which are underrepresented in the literature. The data set is publicly available in NCBI and GigaDB to serve as a resource for other researchers. © The Authors 2017. Published by Oxford University Press.
Genomic survey of Clostridium difficile reservoirs in the East of England implicates environmental contamination of wastewater treatment plants by clinical lineages

PubMed Central

Moradigaravand, Danesh; Gouliouris, Theodore; Ludden, Catherine; Reuter, Sandra; Jamrozy, Dorota; Blane, Beth; Naydenova, Plamena; Judge, Kim; Aliyu, Sani H.; Hadjirin, Nazreen F.; Holmes, Mark A.; Török, Estée; Brown, Nicholas M.; Parkhill, Julian; Peacock, Sharon

2018-01-01

There is growing evidence that patients with Clostridium difficile-associated diarrhoea often acquire their infecting strain before hospital admission. Wastewater is known to be a potential source of surface water that is contaminated with C. difficile spores. Here, we describe a study that used genome sequencing to compare C. difficile isolated from multiple wastewater treatment plants across the East of England and from patients with clinical disease at a major hospital in the same region. We confirmed that C. difficile from 65 patients were highly diverse and that most cases were not linked to other active cases in the hospital. In total, 186 C. difficile isolates were recovered from effluent water obtained from 18 municipal treatment plants at the point of release into the environment. Whole genome comparisons of clinical and environmental isolates demonstrated highly related populations, and confirmed extensive release of toxigenic C. difficile into surface waters. An analysis based on multilocus sequence types (STs) identified 19 distinct STs in the clinical collection and 38 STs in the wastewater collection, with 13 of 44 STs common to both clinical and wastewater collections. Furthermore, we identified five pairs of highly similar isolates (≤2 SNPs different in the core genome) in clinical and wastewater collections. Strategies to control community acquisition should consider the need for bacterial control of treated wastewater. PMID:29498619
Improving amphibian genomic resources: a multitissue reference transcriptome of an iconic invader

PubMed Central

Reid, Jack G; Crossland, Michael R

2018-01-01

Abstract Background Cane toads (Rhinella marina) are an iconic invasive species introduced to 4 continents and well utilized for studies of rapid evolution in introduced environments. Despite the long introduction history of this species, its profound ecological impacts, and its utility for demonstrating evolutionary principles, genetic information is sparse. Here we produce a de novo transcriptome spanning multiple tissues and life stages to enable investigation of the genetic basis of previously identified rapid phenotypic change over the introduced range. Findings Using approximately 1.9 billion reads from developing tadpoles and 6 adult tissue-specific cDNA libraries, as well as a transcriptome assembly pipeline encompassing 100 separate de novo assemblies, we constructed 62 202 transcripts, of which we functionally annotated ∼50%. Our transcriptome assembly exhibits 90% full-length completeness of the Benchmarking Universal Single-Copy Orthologs data set. Robust assembly metrics and comparisons with several available anuran transcriptomes and genomes indicate that our cane toad assembly is one of the most complete anuran genomic resources available. Conclusions This comprehensive anuran transcriptome will provide a valuable resource for investigation of genes under selection during invasion in cane toads, but will also greatly expand our general knowledge of anuran genomes, which are underrepresented in the literature. The data set is publicly available in NCBI and GigaDB to serve as a resource for other researchers. PMID:29186423
Colistin-Resistant Acinetobacter baumannii Clinical Strains with Deficient Biofilm Formation

PubMed Central

Dafopoulou, Konstantina; Xavier, Basil Britto; Hotterbeekx, An; Janssens, Lore; Lammens, Christine; Dé, Emmanuelle; Goossens, Herman; Tsakris, Athanasios; Malhotra-Kumar, Surbhi

2015-01-01

In two pairs of clinical colistin-susceptible/colistin-resistant (Csts/Cstr) Acinetobacter baumannii strains, the Cstr strains showed significantly decreased biofilm formation in static and dynamic assays (P < 0.001) and lower relative fitness (P < 0.05) compared with those of the Csts counterparts. The whole-genome sequencing comparison of strain pairs identified a mutation converting a stop codon to lysine (*241K) in LpsB (involved in lipopolysaccharide [LPS] synthesis) in one Cstr strain and a frameshift mutation in CarO and the loss of a 47,969-bp element containing multiple genes associated with biofilm production in the other. PMID:26666921
[The life cycle of Rubella Virus].

PubMed

Sakata, Masafumi; Mori, Yoshio

2014-01-01

Rubella virus (RV), the causative agent of rubella, is the sole member of the genus Rubivirus in the family Togaviridae. RV has a positive-sense, single-stranded RNA genome. The natural host range of RV is limited to humans, and rubella is generally considered a childhood disease. When a woman is infected with RV during early pregnancy, her fetus may develop severe birth defects known as congenital rubella syndrome. In this review, the RV life cycle from virus entry to budding is illustrated in comparison with those of member viruses of the genus Alphavirus in the same family. The multiple functions of the RV capsid protein are also introduced.
Mitochondrial genome deletions and minicircles are common in lice (Insecta: Phthiraptera)

PubMed Central

2011-01-01

Background The gene composition, gene order and structure of the mitochondrial genome are remarkably stable across bilaterian animals. Lice (Insecta: Phthiraptera) are a major exception to this genomic stability in that the canonical single chromosome with 37 genes found in almost all other bilaterians has been lost in multiple lineages in favour of multiple, minicircular chromosomes with less than 37 genes on each chromosome. Results Minicircular mt genomes are found in six of the ten louse species examined to date and three types of minicircles were identified: heteroplasmic minicircles which coexist with full sized mt genomes (type 1); multigene chromosomes with short, simple control regions, we infer that the genome consists of several such chromosomes (type 2); and multiple, single to three gene chromosomes with large, complex control regions (type 3). Mapping minicircle types onto a phylogenetic tree of lice fails to show a pattern of their occurrence consistent with an evolutionary series of minicircle types. Analysis of the nuclear-encoded, mitochondrially-targetted genes inferred from the body louse, Pediculus, suggests that the loss of mitochondrial single-stranded binding protein (mtSSB) may be responsible for the presence of minicircles in at least species with the most derived type 3 minicircles (Pediculus, Damalinia). Conclusions Minicircular mt genomes are common in lice and appear to have arisen multiple times within the group. Life history adaptive explanations which attribute minicircular mt genomes in lice to the adoption of blood-feeding in the Anoplura are not supported by this expanded data set as minicircles are found in multiple non-blood feeding louse groups but are not found in the blood-feeding genus Heterodoxus. 
In contrast, a mechanistic explanation based on the loss of mtSSB suggests that minicircles may be selectively favoured due to the incapacity of the mt replisome to synthesize long replicative products without mtSSB, and thus that the loss of this gene led to the formation of minicircles in lice. PMID:21813020

Mitochondrial genome deletions and minicircles are common in lice (Insecta: Phthiraptera).

PubMed

Cameron, Stephen L; Yoshizawa, Kazunori; Mizukoshi, Atsushi; Whiting, Michael F; Johnson, Kevin P

2011-08-04

The gene composition, gene order and structure of the mitochondrial genome are remarkably stable across bilaterian animals. Lice (Insecta: Phthiraptera) are a major exception to this genomic stability in that the canonical single chromosome with 37 genes found in almost all other bilaterians has been lost in multiple lineages in favour of multiple, minicircular chromosomes with less than 37 genes on each chromosome. Minicircular mt genomes are found in six of the ten louse species examined to date and three types of minicircles were identified: heteroplasmic minicircles which coexist with full sized mt genomes (type 1); multigene chromosomes with short, simple control regions, we infer that the genome consists of several such chromosomes (type 2); and multiple, single to three gene chromosomes with large, complex control regions (type 3). Mapping minicircle types onto a phylogenetic tree of lice fails to show a pattern of their occurrence consistent with an evolutionary series of minicircle types. Analysis of the nuclear-encoded, mitochondrially-targetted genes inferred from the body louse, Pediculus, suggests that the loss of mitochondrial single-stranded binding protein (mtSSB) may be responsible for the presence of minicircles in at least species with the most derived type 3 minicircles (Pediculus, Damalinia). Minicircular mt genomes are common in lice and appear to have arisen multiple times within the group. Life history adaptive explanations which attribute minicircular mt genomes in lice to the adoption of blood-feeding in the Anoplura are not supported by this expanded data set as minicircles are found in multiple non-blood feeding louse groups but are not found in the blood-feeding genus Heterodoxus. 
In contrast, a mechanistic explanation based on the loss of mtSSB suggests that minicircles may be selectively favoured due to the incapacity of the mt replisome to synthesize long replicative products without mtSSB, and thus that the loss of this gene led to the formation of minicircles in lice.
Mutational analysis of multiple lung cancers: Discrimination between primary and metastatic lung cancers by genomic profile.

PubMed

Goto, Taichiro; Hirotsu, Yosuke; Mochizuki, Hitoshi; Nakagomi, Takahiro; Shikata, Daichi; Yokoyama, Yujiro; Oyama, Toshio; Amemiya, Kenji; Okimoto, Kenichiro; Omata, Masao

2017-05-09

In cases of multiple lung cancers, individual tumors may represent either a primary lung cancer or both primary and metastatic lung cancers. Treatment selection varies depending on such features, and this discrimination is critically important in predicting prognosis. The present study was undertaken to determine the efficacy and validity of mutation analysis as a means of determining whether multiple lung cancers are primary or metastatic in nature. The study involved 12 patients who underwent surgery in our department for multiple lung cancers between July 2014 and March 2016. Tumor cells were collected from formalin-fixed paraffin-embedded tissues of the primary lesions by using laser capture microdissection, and targeted sequencing of 53 lung cancer-related genes was performed. In surgically treated patients with multiple lung cancers, the driver mutation profile differed among the individual tumors. Meanwhile, in a case of a solitary lung tumor that appeared after surgery for double primary lung cancers, gene mutation analysis using a bronchoscopic biopsy sample revealed a gene mutation profile consistent with the surgically resected specimen, thus demonstrating that the tumor in this case was metastatic. In cases of multiple lung cancers, the comparison of driver mutation profiles clarifies the clonal origin of the tumors and enables discrimination between primary and metastatic tumors.
Comparative genome-wide analysis reveals that Burkholderia contaminans MS14 possesses multiple antimicrobial biosynthesis genes but not major genetic loci required for pathogenesis.

PubMed

Deng, Peng; Wang, Xiaoqiang; Baird, Sonya M; Showmaker, Kurt C; Smith, Leif; Peterson, Daniel G; Lu, Shien

2016-06-01

Burkholderia contaminans MS14 shows significant antimicrobial activities against plant and animal pathogenic fungi and bacteria. The antifungal agent occidiofungin produced by MS14 has great potential for development of biopesticides and pharmaceutical drugs. However, the use of Burkholderia species as biocontrol agent in agriculture is restricted due to the difficulties in distinguishing between plant growth-promoting bacteria and the pathogenic bacteria. The complete MS14 genome was sequenced and analyzed to find what beneficial and virulence-related genes it harbors. The phylogenetic relatedness of B. contaminans MS14 and other 17 Burkholderia species was also analyzed. To research MS14's potential virulence, the gene regions related to the antibiotic production, antibiotic resistance, and virulence were compared between MS14 and other Burkholderia genomes. The genome of B. contaminans MS14 was sequenced and annotated. The genomic analyses reveal the presence of multiple gene sets for antimicrobial biosynthesis, which contribute to its antimicrobial activities. BLAST results indicate that the MS14 genome harbors a large number of unique regions. MS14 is closely related to another plant growth-promoting Burkholderia strain B. lata 383 according to the average nucleotide identity data. Moreover, according to the phylogenetic analysis, plant growth-promoting species isolated from soils and mammalian pathogenic species are clustered together, respectively. MS14 has multiple antimicrobial activity-related genes identified from the genome, but it lacks key virulence-related gene loci found in the pathogenic strains. Additionally, plant growth-promoting Burkholderia species have one or more antimicrobial biosynthesis genes in their genomes as compared with nonplant growth-promoting soil-isolated Burkholderia species. On the other hand, pathogenic species harbor multiple virulence-associated gene loci that are not present in nonpathogenic Burkholderia species. 
The MS14 genome and the genomes of other Burkholderia species show considerable diversity. Multiple antimicrobial agent biosynthesis genes were identified in the genomes of plant growth-promoting species of Burkholderia. In addition, compared with nonpathogenic Burkholderia species, pathogenic Burkholderia species have more characterized homologs of the gene loci known to contribute to pathogenicity and virulence in plants and animals. © 2016 The Authors. MicrobiologyOpen published by John Wiley & Sons Ltd.
The MAX Statistic is Less Powerful for Genome Wide Association Studies Under Most Alternative Hypotheses.

PubMed

Shifflett, Benjamin; Huang, Rong; Edland, Steven D

2017-01-01

Genotypic association studies are prone to inflated type I error rates if multiple hypothesis testing is performed, e.g., sequentially testing for recessive, multiplicative, and dominant risk. Alternatives to multiple hypothesis testing include the model independent genotypic χ² test, the efficiency robust MAX statistic, which corrects for multiple comparisons but with some loss of power, or a single Armitage test for multiplicative trend, which has optimal power when the multiplicative model holds but with some loss of power when dominant or recessive models underlie the genetic association. We used Monte Carlo simulations to describe the relative performance of these three approaches under a range of scenarios. All three approaches maintained their nominal type I error rates. The genotypic χ² and MAX statistics were more powerful when testing a strictly recessive genetic effect or when testing a dominant effect when the allele frequency was high. The Armitage test for multiplicative trend was most powerful for the broad range of scenarios where heterozygote risk is intermediate between recessive and dominant risk. Moreover, all tests had limited power to detect recessive genetic risk unless the sample size was large, and conversely all tests were relatively well powered to detect dominant risk. Taken together, these results suggest the general utility of the multiplicative trend test when the underlying genetic model is unknown.
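To make the statistics compared in the abstract above more concrete, here is a minimal sketch of the Cochran-Armitage trend statistic and a MAX-type statistic on a 2 x 3 case-control genotype table. It is not the authors' code. It assumes the conventional genotype scores (0, 0, 1) for recessive, (0, 1, 2) for trend, and (0, 1, 1) for dominant models, and the counts are invented for illustration. All function names are hypothetical.

```python
import math

# Assumed genotype scores (genotypes aa, Aa, AA) for each genetic model.
MODEL_SCORES = {
    "recessive": (0, 0, 1),
    "trend": (0, 1, 2),      # conventional Armitage trend scoring
    "dominant": (0, 1, 1),
}

def trend_z(cases, controls, scores):
    """Cochran-Armitage Z statistic for a 2 x 3 genotype table."""
    totals = [r + s for r, s in zip(cases, controls)]
    n = sum(totals)
    n_cases = sum(cases)
    n_controls = n - n_cases
    sum_xr = sum(x * r for x, r in zip(scores, cases))
    sum_xn = sum(x * m for x, m in zip(scores, totals))
    sum_x2n = sum(x * x * m for x, m in zip(scores, totals))
    numerator = n * sum_xr - n_cases * sum_xn
    variance = (n_cases * n_controls / n) * (n * sum_x2n - sum_xn ** 2)
    return numerator / math.sqrt(variance)

def max_statistic(cases, controls):
    """Largest |Z| over the three models, plus the per-model Z values."""
    zs = {m: trend_z(cases, controls, s) for m, s in MODEL_SCORES.items()}
    best = max(zs, key=lambda m: abs(zs[m]))
    return best, abs(zs[best]), zs

# Hypothetical genotype counts (aa, Aa, AA), invented for illustration.
cases = [10, 20, 30]
controls = [30, 20, 10]
model, max_z, all_z = max_statistic(cases, controls)
print(model, round(max_z, 3))  # trend 4.472
```

The sketch stops at the test statistic. Because the three Z values are correlated, the maximum does not follow a standard normal distribution under the null, so a p-value for it would need simulation or a dedicated approximation.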
Sequencing of the large dsDNA genome of Oryctes rhinoceros nudivirus using multiple displacement amplification of nanogram amounts of virus DNA.

PubMed

Wang, Yongjie; Kleespies, Regina G; Ramle, Moslim B; Jehle, Johannes A

2008-09-01

The genomic sequence analysis of many large dsDNA viruses is hampered by the lack of sufficient sample material. Here, we report a whole genome amplification of the Oryctes rhinoceros nudivirus (OrNV) isolate Ma07, starting from as little as about 10 ng of purified viral DNA, by application of a multiple displacement amplification (MDA) method based on phi29 DNA polymerase and exonuclease-resistant random hexamers. About 60 µg of high molecular weight DNA with fragment sizes of up to 25 kbp was amplified. A genomic DNA clone library was generated using the product DNA. With 8-fold sequencing coverage, the 127,615 bp OrNV whole genome was sequenced successfully. The results demonstrate that MDA-based whole genome amplification enables rapid access to genomic information from exiguous virus samples.
CAMBerVis: visualization software to support comparative analysis of multiple bacterial strains.

PubMed

Woźniak, Michał; Wong, Limsoon; Tiuryn, Jerzy

2011-12-01

A number of inconsistencies in genome annotations are documented among bacterial strains. Visualization of the differences may help biologists to make correct decisions in spurious cases. We have developed a visualization tool, CAMBerVis, to support comparative analysis of multiple bacterial strains. The software manages simultaneous visualization of multiple bacterial genomes, enabling visual analysis focused on genome structure annotations. The CAMBerVis software is freely available at the project website: http://bioputer.mimuw.edu.pl/camber. Input datasets for Mycobacterium tuberculosis and Staphylococcus aureus are integrated with the software as examples. Contact: m.wozniak@mimuw.edu.pl. Supplementary data are available at Bioinformatics online.
Comparative genomics of Eucalyptus and Corymbia reveals low rates of genome structural rearrangement.

PubMed

Butler, J B; Vaillancourt, R E; Potts, B M; Lee, D J; King, G J; Baten, A; Shepherd, M; Freeman, J S

2017-05-22

Previous studies suggest genome structure is largely conserved between Eucalyptus species. However, it is unknown if this conservation extends to more divergent eucalypt taxa. We performed comparative genomics between the eucalypt genera Eucalyptus and Corymbia. Our results will facilitate transfer of genomic information between these important taxa and provide further insights into the rate of structural change in tree genomes. We constructed three high density linkage maps for two Corymbia species (Corymbia citriodora subsp. variegata and Corymbia torelliana) which were used to compare genome structure between both species and Eucalyptus grandis. Genome structure was highly conserved between the Corymbia species. However, the comparison of Corymbia and E. grandis suggests large (from 1-13 Mb) intra-chromosomal rearrangements have occurred on seven of the 11 chromosomes. Most rearrangements were supported through comparisons of the three independent Corymbia maps to the E. grandis genome sequence, and to other independently constructed Eucalyptus linkage maps. These are the first large scale chromosomal rearrangements discovered between eucalypts. Nonetheless, in the general context of plants, the genomic structure of the two genera was remarkably conserved, adding to a growing body of evidence that conservation of genome structure is common amongst woody angiosperms.
Whole-genome sequence analysis of the Mycobacterium avium complex and proposal of the transfer of Mycobacterium yongonense to Mycobacterium intracellulare subsp. yongonense subsp. nov.

PubMed

Castejon, Maria; Menéndez, Maria Carmen; Comas, Iñaki; Vicente, Ana; Garcia, Maria J

2018-06-01

Bacterial whole-genome sequences contain informative features of their evolutionary pathways. Comparison of whole-genome sequences has become the method of choice for classification of prokaryotes, thus allowing the identification of bacteria from an evolutionary perspective, and providing data to resolve some current controversies. Currently, controversy exists about the assignment of members of the Mycobacterium avium complex, as is the case for Mycobacterium yongonense and 'Mycobacterium indicus pranii'. These two mycobacteria, closely related to Mycobacterium intracellulare on the basis of standard phenotypic and single gene-sequence comparisons, were not considered members of that species on the basis of some particular differences displayed by a single strain. Whole-genome sequence comparison procedures, namely the average nucleotide identity and the genome distance, showed that those two mycobacteria should be considered members of the species M. intracellulare. The results were confirmed with other supplementary whole-genome comparison methods. According to the data provided, Mycobacterium yongonense and 'Mycobacterium indicus pranii' should be renamed and included as members of M. intracellulare. This study highlights the problems caused when a novel species is accepted on the basis of a single strain, as was the case for M. yongonense. Based mainly on whole-genome sequence analysis, we conclude that M. yongonense should be reclassified as a subspecies of Mycobacterium intracellulare, as Mycobacterium intracellulare subsp. yongonense, and that 'Mycobacterium indicus pranii' should be classified in the same subspecies as the type strain of Mycobacterium intracellulare, as Mycobacterium intracellulare subsp. intracellulare.
Whole genome assembly of a natto production strain Bacillus subtilis natto from very short read data.

PubMed

Nishito, Yukari; Osana, Yasunori; Hachiya, Tsuyoshi; Popendorf, Kris; Toyoda, Atsushi; Fujiyama, Asao; Itaya, Mitsuhiro; Sakakibara, Yasubumi

2010-04-16

Bacillus subtilis natto is closely related to the laboratory standard strain B. subtilis Marburg 168, and functions as a starter for the production of the traditional Japanese food "natto" made from soybeans. Although re-sequencing whole genomes of several laboratory domesticated B. subtilis 168 derivatives has already been attempted using short read sequencing data, the assembly of the whole genome sequence of a closely related strain, B. subtilis natto, from very short read data is more challenging, particularly with our aim to assemble one fully connected scaffold from short reads around 35 bp in length. We applied a comparative genome assembly method, which combines de novo assembly and reference guided assembly, to one of the B. subtilis natto strains. We successfully assembled 28 scaffolds and managed to avoid substantial fragmentation. Completion of the assembly through long PCR experiments resulted in one connected scaffold for B. subtilis natto. Based on the assembled genome sequence, our orthologous gene analysis between natto BEST195 and Marburg 168 revealed that 82.4% of 4375 predicted genes in BEST195 are one-to-one orthologous to genes in 168, with two genes in-paralog, 3.2% are deleted in 168, 14.3% are inserted in BEST195, and 5.9% of genes present in 168 are deleted in BEST195. The natto genome contains the same alleles in the promoter region of degQ and the coding region of swrAA as the wild strain, RO-FF-1. These are specific for gamma-PGA production ability, which is related to natto production. Further, the B. subtilis natto strain completely lacked a polyketide synthesis operon, disrupted the plipastatin production operon, and possesses previously unidentified transposases. The determination of the whole genome sequence of Bacillus subtilis natto provided detailed analyses of a set of genes related to natto production, demonstrating the number and locations of insertion sequences that B. subtilis natto harbors but B. subtilis 168 lacks. 
Multiple genome-level comparisons among five closely related Bacillus species were also carried out. The determined genome sequence of B. subtilis natto and gene annotations are available from the Natto genome browser http://natto-genome.org/.
Integrating genome assemblies with MAIA

PubMed Central

Nijkamp, Jurgen; Winterbach, Wynand; van den Broek, Marcel; Daran, Jean-Marc; Reinders, Marcel; de Ridder, Dick

2010-01-01

Motivation: De novo assembly of a eukaryotic genome with next-generation sequencing data is still a challenging task. Over the past few years several assemblers have been developed, often suitable for one specific type of sequencing data. The number of known genomes is expanding rapidly, therefore it becomes possible to use multiple reference genomes for assembly projects. We introduce an assembly integrator that makes use of all available data, i.e. multiple de novo assemblies and mappings against multiple related genomes, by optimizing a weighted combination of criteria. Results: The developed algorithm was applied on the de novo sequencing of the Saccharomyces cerevisiae CEN.PK 113-7D strain. Using Solexa and 454 read data, two de novo and three comparative assemblies were constructed and subsequently integrated, yielding 29 contigs, covering more than 12 Mbp; a drastic improvement compared with the single assemblies. Availability: MAIA is available as a Matlab package and can be downloaded from http://bioinformatics.tudelft.nl Contact: j.f.nijkamp@tudelft.nl PMID:20823304
Reduced representation approaches to interrogate genome diversity in large repetitive plant genomes.

PubMed

Hirsch, Cory D; Evans, Joseph; Buell, C Robin; Hirsch, Candice N

2014-07-01

Technology and software improvements in the last decade now provide methodologies to access the genome sequence of not only a single accession, but also multiple accessions of plant species. This provides a means to interrogate species diversity at the genome level. Ample diversity among accessions in a collection of species can be found, including single-nucleotide polymorphisms, insertions and deletions, copy number variation and presence/absence variation. For species with small, non-repetitive rich genomes, re-sequencing of query accessions is robust, highly informative, and economically feasible. However, for species with moderate to large sized repetitive-rich genomes, technical and economic barriers prevent en masse genome re-sequencing of accessions. Multiple approaches to access a focused subset of loci in species with larger genomes have been developed, including reduced representation sequencing, exome capture and transcriptome sequencing. Collectively, these approaches have enabled interrogation of diversity on a genome scale for large plant genomes, including crop species important to worldwide food security. © The Author 2014. Published by Oxford University Press.
Production of individualized V gene databases reveals high levels of immunoglobulin genetic diversity

NASA Astrophysics Data System (ADS)

Corcoran, Martin M.; Phad, Ganesh E.; Bernat, Néstor Vázquez; Stahl-Hennig, Christiane; Sumida, Noriyuki; Persson, Mats A. A.; Martin, Marcel; Hedestam, Gunilla B. Karlsson

2016-12-01

Comprehensive knowledge of immunoglobulin genetics is required to advance our understanding of B cell biology. Validated immunoglobulin variable (V) gene databases are close to completion only for human and mouse. We present a novel computational approach, IgDiscover, that identifies germline V genes from expressed repertoires to a specificity of 100%. IgDiscover uses a cluster identification process to produce candidate sequences that, once filtered, result in individualized germline V gene databases. IgDiscover was tested in multiple species, validated by genomic cloning and cross library comparisons and produces comprehensive gene databases even where limited genomic sequence is available. IgDiscover analysis of the allelic content of the Indian and Chinese-origin rhesus macaques reveals high levels of immunoglobulin gene diversity in this species. Further, we describe a novel human IGHV3-21 allele and confirm significant gene differences between Balb/c and C57BL6 mouse strains, demonstrating the power of IgDiscover as a germline V gene discovery tool.
A comprehensive analysis of replicative lifespan in 4,698 single-gene deletion strains uncovers conserved mechanisms of aging

PubMed Central

McCormick, Mark A.; Delaney, Joe R.; Tsuchiya, Mitsuhiro; Tsuchiyama, Scott; Shemorry, Anna; Sim, Sylvia; Chou, Annie Chia-Zong; Ahmed, Umema; Carr, Daniel; Murakami, Christopher J.; Schleit, Jennifer; Sutphin, George L.; Wasko, Brian M.; Bennett, Christopher F.; Wang, Adrienne M.; Olsen, Brady; Beyer, Richard P.; Bammler, Theodor K.; Prunkard, Donna; Johnson, Simon C.; Pennypacker, Juniper K.; An, Elroy; Anies, Arieanna; Castanza, Anthony S.; Choi, Eunice; Dang, Nick; Enerio, Shiena; Fletcher, Marissa; Fox, Lindsay; Goswami, Sarani; Higgins, Sean A.; Holmberg, Molly A.; Hu, Di; Hui, Jessica; Jelic, Monika; Jeong, Ki-Soo; Johnston, Elijah; Kerr, Emily O.; Kim, Jin; Kim, Diana; Kirkland, Katie; Klum, Shannon; Kotireddy, Soumya; Liao, Eric; Lim, Michael; Lin, Michael S.; Lo, Winston C.; Lockshon, Dan; Miller, Hillary A.; Moller, Richard M.; Muller, Brian; Oakes, Jonathan; Pak, Diana N.; Peng, Zhao Jun; Pham, Kim M.; Pollard, Tom G.; Pradeep, Prarthana; Pruett, Dillon; Rai, Dilreet; Robison, Brett; Rodriguez, Ariana A.; Ros, Bopharoth; Sage, Michael; Singh, Manpreet K.; Smith, Erica D.; Snead, Katie; Solanky, Amrita; Spector, Benjamin L.; Steffen, Kristan K.; Tchao, Bie Nga; Ting, Marc K.; Wende, Helen Vander; Wang, Dennis; Welton, K. Linnea; Westman, Eric A.; Brem, Rachel B.; Liu, Xin-guang; Suh, Yousin; Zhou, Zhongjun; Kaeberlein, Matt; Kennedy, Brian K.

2015-01-01

SUMMARY Many genes that affect replicative lifespan (RLS) in the budding yeast Saccharomyces cerevisiae also affect aging in other organisms such as C. elegans and M. musculus. We performed a systematic analysis of yeast RLS in a set of 4,698 viable single-gene deletion strains. Multiple functional gene clusters were identified, and full genome-to-genome comparison demonstrated a significant conservation in longevity pathways between yeast and C. elegans. Among the mechanisms of aging identified, deletion of tRNA exporter LOS1 robustly extended lifespan. Dietary restriction (DR) and inhibition of mechanistic Target of Rapamycin (mTOR) exclude Los1 from the nucleus in a Rad53-dependent manner. Moreover, lifespan extension from deletion of LOS1 is non-additive with DR or mTOR inhibition, and results in Gcn4 transcription factor activation. Thus, the DNA damage response and mTOR converge on Los1-mediated nuclear tRNA export to regulate Gcn4 activity and aging. PMID:26456335
A Secure Web Application Providing Public Access to High-Performance Data Intensive Scientific Resources - ScalaBLAST Web Application

DOE Office of Scientific and Technical Information (OSTI.GOV)

Curtis, Darren S.; Peterson, Elena S.; Oehmen, Chris S.

2008-05-04

This work presents the ScalaBLAST Web Application (SWA), a web based application implemented using the PHP script language, MySQL DBMS, and Apache web server under a GNU/Linux platform. SWA is an application built as part of the Data Intensive Computer for Complex Biological Systems (DICCBS) project at the Pacific Northwest National Laboratory (PNNL). SWA delivers accelerated throughput of bioinformatics analysis via high-performance computing through a convenient, easy-to-use web interface. This approach greatly enhances emerging fields of study in biology such as ontology-based homology, and multiple whole genome comparisons which, in the absence of a tool like SWA, require a heroic effort to overcome the computational bottleneck associated with genome analysis. The current version of SWA includes a user account management system, a web based user interface, and a backend process that generates the files necessary for the Internet scientific community to submit a ScalaBLAST parallel processing job on a dedicated cluster.
Genome-wide meta-analysis identifies multiple novel associations and ethnic heterogeneity of psoriasis susceptibility.

PubMed

Yin, Xianyong; Low, Hui Qi; Wang, Ling; Li, Yonghong; Ellinghaus, Eva; Han, Jiali; Estivill, Xavier; Sun, Liangdan; Zuo, Xianbo; Shen, Changbing; Zhu, Caihong; Zhang, Anping; Sanchez, Fabio; Padyukov, Leonid; Catanese, Joseph J; Krueger, Gerald G; Duffin, Kristina Callis; Mucha, Sören; Weichenthal, Michael; Weidinger, Stephan; Lieb, Wolfgang; Foo, Jia Nee; Li, Yi; Sim, Karseng; Liany, Herty; Irwan, Ishak; Teo, Yikying; Theng, Colin T S; Gupta, Rashmi; Bowcock, Anne; De Jager, Philip L; Qureshi, Abrar A; de Bakker, Paul I W; Seielstad, Mark; Liao, Wilson; Ståhle, Mona; Franke, Andre; Zhang, Xuejun; Liu, Jianjun

2015-04-23

Psoriasis is a common inflammatory skin disease with complex genetics and different degrees of prevalence across ethnic populations. Here we present the largest trans-ethnic genome-wide meta-analysis (GWMA) of psoriasis in 15,369 cases and 19,517 controls of Caucasian and Chinese ancestries. We identify four novel associations at LOC144817, COG6, RUNX1 and TP63, as well as three novel secondary associations within IFIH1 and IL12B. Fine-mapping analysis of MHC region demonstrates an important role for all three HLA class I genes and a complex and heterogeneous pattern of HLA associations between Caucasian and Chinese populations. Further, trans-ethnic comparison suggests population-specific effect or allelic heterogeneity for 11 loci. These population-specific effects contribute significantly to the ethnic diversity of psoriasis prevalence. This study not only provides novel biological insights into the involvement of immune and keratinocyte development mechanism, but also demonstrates a complex and heterogeneous genetic architecture of psoriasis susceptibility across ethnic populations.
Production of individualized V gene databases reveals high levels of immunoglobulin genetic diversity

PubMed Central

Corcoran, Martin M.; Phad, Ganesh E.; Bernat, Néstor Vázquez; Stahl-Hennig, Christiane; Sumida, Noriyuki; Persson, Mats A.A.; Martin, Marcel; Hedestam, Gunilla B. Karlsson

2016-01-01

Comprehensive knowledge of immunoglobulin genetics is required to advance our understanding of B cell biology. Validated immunoglobulin variable (V) gene databases are close to completion only for human and mouse. We present a novel computational approach, IgDiscover, that identifies germline V genes from expressed repertoires to a specificity of 100%. IgDiscover uses a cluster identification process to produce candidate sequences that, once filtered, result in individualized germline V gene databases. IgDiscover was tested in multiple species, validated by genomic cloning and cross library comparisons and produces comprehensive gene databases even where limited genomic sequence is available. IgDiscover analysis of the allelic content of the Indian and Chinese-origin rhesus macaques reveals high levels of immunoglobulin gene diversity in this species. Further, we describe a novel human IGHV3-21 allele and confirm significant gene differences between Balb/c and C57BL6 mouse strains, demonstrating the power of IgDiscover as a germline V gene discovery tool. PMID:27995928
Genome-wide meta-analysis identifies multiple novel associations and ethnic heterogeneity of psoriasis susceptibility

PubMed Central

Yin, Xianyong; Low, Hui Qi; Wang, Ling; Li, Yonghong; Ellinghaus, Eva; Han, Jiali; Estivill, Xavier; Sun, Liangdan; Zuo, Xianbo; Shen, Changbing; Zhu, Caihong; Zhang, Anping; Sanchez, Fabio; Padyukov, Leonid; Catanese, Joseph J.; Krueger, Gerald G.; Duffin, Kristina Callis; Mucha, Sören; Weichenthal, Michael; Weidinger, Stephan; Lieb, Wolfgang; Foo, Jia Nee; Li, Yi; Sim, Karseng; Liany, Herty; Irwan, Ishak; Teo, Yikying; Theng, Colin T. S.; Gupta, Rashmi; Bowcock, Anne; De Jager, Philip L.; Qureshi, Abrar A.; de Bakker, Paul I. W.; Seielstad, Mark; Liao, Wilson; Ståhle, Mona; Franke, Andre; Zhang, Xuejun; Liu, Jianjun

2015-01-01

Psoriasis is a common inflammatory skin disease with complex genetics and different degrees of prevalence across ethnic populations. Here we present the largest trans-ethnic genome-wide meta-analysis (GWMA) of psoriasis in 15,369 cases and 19,517 controls of Caucasian and Chinese ancestries. We identify four novel associations at LOC144817, COG6, RUNX1 and TP63, as well as three novel secondary associations within IFIH1 and IL12B. Fine-mapping analysis of MHC region demonstrates an important role for all three HLA class I genes and a complex and heterogeneous pattern of HLA associations between Caucasian and Chinese populations. Further, trans-ethnic comparison suggests population-specific effect or allelic heterogeneity for 11 loci. These population-specific effects contribute significantly to the ethnic diversity of psoriasis prevalence. This study not only provides novel biological insights into the involvement of immune and keratinocyte development mechanism, but also demonstrates a complex and heterogeneous genetic architecture of psoriasis susceptibility across ethnic populations. PMID:25903422
A multiplexed system for quantitative comparisons of chromatin landscapes

PubMed Central

van Galen, Peter; Viny, Aaron D.; Ram, Oren; Ryan, Russell J.H.; Cotton, Matthew J.; Donohue, Laura; Sievers, Cem; Drier, Yotam; Liau, Brian B.; Gillespie, Shawn M.; Carroll, Kaitlin M.; Cross, Michael B.; Levine, Ross L.; Bernstein, Bradley E.

2015-01-01

Genome-wide profiling of histone modifications can provide systematic insight into the regulatory elements and programs engaged in a given cell type. However, conventional chromatin immunoprecipitation and sequencing (ChIP-seq) does not capture quantitative information on histone modification levels, requires large amounts of starting material, and involves tedious processing of each individual sample. Here we address these limitations with a technology that leverages DNA barcoding to profile chromatin quantitatively and in multiplexed format. We concurrently map relative levels of multiple histone modifications across multiple samples, each comprising as few as a thousand cells. We demonstrate the technology by monitoring dynamic changes following inhibition of P300, EZH2 or KDM5, by linking altered epigenetic landscapes to chromatin regulator mutations, and by mapping active and repressive marks in purified human hematopoietic stem cells. Hence, this technology enables quantitative studies of chromatin state dynamics across rare cell types, genotypes, environmental conditions and drug treatments. PMID:26687680
Direct detection of methylation in genomic DNA

PubMed Central

Bart, A.; van Passel, M. W. J.; van Amsterdam, K.; van der Ende, A.

2005-01-01

The identification of methylated sites on bacterial genomic DNA would be a useful tool to study the major roles of DNA methylation in prokaryotes: distinction of self and nonself DNA, direction of post-replicative mismatch repair, control of DNA replication and cell cycle, and regulation of gene expression. Three types of methylated nucleobases are known: N6-methyladenine, 5-methylcytosine and N4-methylcytosine. The aim of this study was to develop a method to detect all three types of DNA methylation in complete genomic DNA. It was previously shown that N6-methyladenine and 5-methylcytosine in plasmid and viral DNA can be detected by intersequence trace comparison of methylated and unmethylated DNA. We extended this method to include N4-methylcytosine detection in both in vitro and in vivo methylated DNA. Furthermore, application of intersequence trace comparison was extended to bacterial genomic DNA. Finally, we present evidence that intrasequence comparison suffices to detect methylated sites in genomic DNA. In conclusion, we present a method to detect all three natural types of DNA methylation in bacterial genomic DNA. This provides the possibility to define the complete methylome of any prokaryote. PMID:16091626
Evolutionary blueprint for host- and niche-adaptation in Staphylococcus aureus clonal complex CC30

PubMed Central

McGavin, Martin J.; Arsic, Benjamin; Nickerson, Nicholas N.

2012-01-01

Staphylococcus aureus clonal complex CC30 has caused infectious epidemics for more than 60 years, and, therefore, provides a model system to evaluate how evolution has influenced the disease potential of closely related strains. In previous multiple genome comparisons, phylogenetic analyses established three major branches that evolved from a common ancestor. Clade 1, comprising historic pandemic phage type 80/81 methicillin susceptible S. aureus (MSSA), and Clade 2, comprising contemporary community acquired methicillin resistant S. aureus (CA-MRSA), were hyper-virulent in murine infection models. Conversely, Clade 3 strains, comprising contemporary hospital associated MRSA (HA-MRSA) and clinical MSSA, exhibited attenuated virulence, due to common single nucleotide polymorphisms (SNPs) that abrogate production of α-hemolysin Hla, and interfere with signaling of the accessory gene regulator agr. We have now completed in silico genome comparisons of 15 additional CC30 genomes in the public domain, to assess the hypothesis that Clade 3 has evolved to favor niche adaptation. In addition to SNPs that influence agr and hla, other common traits of Clade 3 include tryptophan auxotrophy due to a di-nucleotide deletion within trpD, a premature stop codon within isdH encoding an immunogenic cell surface protein involved in iron acquisition, loss of a genomic toxin–antitoxin (TA) addiction module, acquisition of S. aureus pathogenicity islands SaPI4, and SaPI2 encoding toxic shock syndrome toxin tst, and increased copy number of insertion sequence ISSau2, which appears to target transcription terminators. Compared to other Clade 3 MSSA, S. aureus MN8, which is associated with Staphylococcal toxic shock syndrome, exhibited a unique ISSau2 insertion, and enhanced production of toxic shock syndrome toxin encoded by SaPI2. Cumulatively, our data support the notion that Clade 3 strains are following an evolutionary blueprint toward niche-adaptation. PMID:22919639

In silico Comparison of 19 Porphyromonas gingivalis Strains in Genomics, Phylogenetics, Phylogenomics and Functional Genomics.

PubMed

Chen, Tsute; Siddiqui, Huma; Olsen, Ingar

2017-01-01

Currently, genome sequences of a total of 19 Porphyromonas gingivalis strains are available, including eight completed genomes (strains W83, ATCC 33277, TDC60, HG66, A7436, AJW4, 381, and A7A1-28) and 11 high-coverage draft sequences (JCVI SC001, F0185, F0566, F0568, F0569, F0570, SJD2, W4087, W50, Ando, and MP4-504) that are assembled into fewer than 300 contigs. The objective was to compare these genomes at both nucleotide and protein sequence levels in order to understand their phylogenetic and functional relatedness. Four copies of 16S rRNA gene sequences were identified in each of the eight complete genomes and one in the other 11 unfinished genomes. These 43 16S rRNA sequences represent only 24 unique sequences and the derived phylogenetic tree suggests a possible evolutionary history for these strains. Phylogenomic comparison based on shared proteins and whole genome nucleotide sequences consistently showed two groups with closely related members: one consisted of ATCC 33277, 381, and HG66, another of W83, W50, and A7436. At least 1,037 core/shared proteins were identified in the 19 P. gingivalis genomes based on the most stringent detecting parameters. Comparative functional genomics based on genome-wide comparisons between NCBI and RAST annotations, as well as additional approaches, revealed functions that are unique or missing in individual P. gingivalis strains, or species-specific in all P. gingivalis strains, when compared to a neighboring species P. asaccharolytica. All the comparative results of this study are available online for download at ftp://www.homd.org/publication_data/20160425/.
In silico Comparison of 19 Porphyromonas gingivalis Strains in Genomics, Phylogenetics, Phylogenomics and Functional Genomics

PubMed Central

Chen, Tsute; Siddiqui, Huma; Olsen, Ingar

2017-01-01

Currently, genome sequences of a total of 19 Porphyromonas gingivalis strains are available, including eight completed genomes (strains W83, ATCC 33277, TDC60, HG66, A7436, AJW4, 381, and A7A1-28) and 11 high-coverage draft sequences (JCVI SC001, F0185, F0566, F0568, F0569, F0570, SJD2, W4087, W50, Ando, and MP4-504) that are assembled into fewer than 300 contigs. The objective was to compare these genomes at both nucleotide and protein sequence levels in order to understand their phylogenetic and functional relatedness. Four copies of 16S rRNA gene sequences were identified in each of the eight complete genomes and one in the other 11 unfinished genomes. These 43 16S rRNA sequences represent only 24 unique sequences and the derived phylogenetic tree suggests a possible evolutionary history for these strains. Phylogenomic comparison based on shared proteins and whole genome nucleotide sequences consistently showed two groups with closely related members: one consisted of ATCC 33277, 381, and HG66, another of W83, W50, and A7436. At least 1,037 core/shared proteins were identified in the 19 P. gingivalis genomes based on the most stringent detecting parameters. Comparative functional genomics based on genome-wide comparisons between NCBI and RAST annotations, as well as additional approaches, revealed functions that are unique or missing in individual P. gingivalis strains, or species-specific in all P. gingivalis strains, when compared to a neighboring species P. asaccharolytica. All the comparative results of this study are available online for download at ftp://www.homd.org/publication_data/20160425/. PMID:28261563
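The core/shared and strain-specific comparisons described in this record can be illustrated with plain set operations. The following is only a minimal sketch, assuming each genome has already been reduced to a set of protein-family identifiers; the identifiers, the toy data, and the function names are hypothetical, and the orthology clustering step that the study itself used is not shown or implied. The last line checks the 16S rRNA count stated in the abstract.

```python
def core_families(genomes):
    """Return the protein families present in every genome.

    genomes: dict mapping a strain name to a set of family identifiers
    (hypothetical identifiers, assumed to come from a prior clustering step).
    """
    family_sets = list(genomes.values())
    if not family_sets:
        return set()
    return set.intersection(*family_sets)


def strain_specific_families(genomes):
    """Return, per strain, the families found in that strain and no other."""
    unique = {}
    for name, families in genomes.items():
        others = set().union(
            *(fams for other, fams in genomes.items() if other != name)
        )
        unique[name] = families - others
    return unique


# Toy data, not taken from the study.
toy_genomes = {
    "W83": {"fam1", "fam2", "fam3"},
    "ATCC 33277": {"fam1", "fam2", "fam4"},
    "TDC60": {"fam1", "fam2"},
}
core = core_families(toy_genomes)
unique = strain_specific_families(toy_genomes)

# Eight complete genomes with four 16S copies each, plus one copy in each
# of the 11 draft genomes, gives the 43 sequences reported in the abstract.
total_16s = 8 * 4 + 11 * 1
```

Requiring presence in every genome is the strictest possible definition of "core"; with draft assemblies a family can be missed simply because it falls in an assembly gap, so a relaxed threshold is sometimes preferred.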
Genomic Diversity of Burkholderia pseudomallei Clinical Isolates: Subtractive Hybridization Reveals a Burkholderia mallei-Specific Prophage in B. pseudomallei 1026b

DTIC Science & Technology

2004-06-01

identification of several new virulence gene candidates. In particular, K96243 harbors multiple genomic islands with relatively low GC contents...differences were observed. Prophage-encoded virulence factors in other bacterial species have been described (5), and it was of interest to see if gene ... Xylella fastidiosa (11, 16, 17). The genomic sequencing results for multiple strains of Streptococcus and Xylella suggest that different disease
Antagonism between Staphylococcus epidermidis and Propionibacterium acnes and its genomic basis.

PubMed

Christensen, Gitte J M; Scholz, Christian F P; Enghild, Jan; Rohde, Holger; Kilian, Mogens; Thürmer, Andrea; Brzuszkiewicz, Elzbieta; Lomholt, Hans B; Brüggemann, Holger

2016-02-29

Propionibacterium acnes and Staphylococcus epidermidis live in close proximity on human skin, and both bacterial species can be isolated from normal and acne vulgaris-affected skin sites. The antagonistic interactions between the two species are poorly understood, as well as the potential significance of bacterial interferences for the skin microbiota. Here, we performed simultaneous antagonism assays to detect inhibitory activities between multiple isolates of the two species. Selected strains were sequenced to identify the genomic basis of their antimicrobial phenotypes. First, we screened 77 P. acnes strains isolated from healthy and acne-affected skin, and representing all known phylogenetic clades (I, II, and III), for their antimicrobial activities against 12 S. epidermidis isolates. One particular phylogroup (I-2) exhibited a higher antimicrobial activity than other P. acnes phylogroups. All genomes of type I-2 strains carry an island encoding the biosynthesis of a thiopeptide with possible antimicrobial activity against S. epidermidis. Second, 20 S. epidermidis isolates were examined for inhibitory activity against 25 P. acnes strains. The majority of S. epidermidis strains were able to inhibit P. acnes. Genomes of S. epidermidis strains with strong, medium and no inhibitory activities against P. acnes were sequenced. Genome comparison underlined the diversity of S. epidermidis and detected multiple clade- or strain-specific mobile genetic elements encoding a variety of functions important in antibiotic and stress resistance, biofilm formation and interbacterial competition, including bacteriocins such as epidermin. One isolate with an extraordinary antimicrobial activity against P. acnes harbors a functional ESAT-6 secretion system that might be involved in the antimicrobial activity against P. acnes via the secretion of polymorphic toxins. 
Taken together, our study suggests that interspecies interactions could potentially jeopardize balances in the skin microbiota. In particular, S. epidermidis strains possess an arsenal of different mechanisms to inhibit P. acnes. However, whether such interactions are relevant in skin disorders such as acne vulgaris remains questionable, since no difference in the antimicrobial activity against, or the sensitivity towards, S. epidermidis could be detected between health- and acne-associated strains of P. acnes.
Cloned plasmid DNA fragments as calibrators for controlling GMOs: different real-time duplex quantitative PCR methods.

PubMed

Taverniers, Isabel; Van Bockstaele, Erik; De Loose, Marc

2004-03-01

Analytical real-time PCR technology is a powerful tool for implementation of the GMO labeling regulations enforced in the EU. The quality of analytical measurement data obtained by quantitative real-time PCR depends on the correct use of calibrator and reference materials (RMs). For GMO methods of analysis, the choice of appropriate RMs is currently under debate. So far, genomic DNA solutions from certified reference materials (CRMs) are most often used as calibrators for GMO quantification by means of real-time PCR. However, due to some intrinsic features of these CRMs, errors may be expected in the estimations of DNA sequence quantities. In this paper, two new real-time PCR methods are presented for Roundup Ready soybean, in which two types of plasmid DNA fragments are used as calibrators. Single-target plasmids (STPs) diluted in a background of genomic DNA were used in the first method. Multiple-target plasmids (MTPs) containing both sequences in one molecule were used as calibrators for the second method. Both methods simultaneously detect a promoter 35S sequence as GMO-specific target and a lectin gene sequence as endogenous reference target in a duplex PCR. For the estimation of relative GMO percentages both "delta C(T)" and "standard curve" approaches are tested. Delta C(T) methods are based on direct comparison of measured C(T) values of both the GMO-specific target and the endogenous target. Standard curve methods measure absolute amounts of target copies or haploid genome equivalents. A duplex delta C(T) method with STP calibrators performed at least as well as a similar method with genomic DNA calibrators from commercial CRMs. Besides this, high quality results were obtained with a standard curve method using MTP calibrators. This paper demonstrates that plasmid DNA molecules containing either one or multiple target sequences form perfect alternative calibrators for GMO quantification and are especially suitable for duplex PCR reactions.
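The two quantification approaches named in this record can be sketched numerically. This is an illustrative sketch only, not the authors' validated procedure: it assumes both targets amplify with the same efficiency (2.0 meaning perfect doubling per cycle), that each target is present once per haploid genome, and it omits any calibrator correction. The function names and all numeric values are hypothetical.

```python
def gmo_percent_delta_ct(ct_gmo, ct_ref, efficiency=2.0):
    """Relative GMO content (%) by direct comparison of two C(T) values.

    ct_gmo: C(T) of the GMO-specific target (e.g. the 35S promoter)
    ct_ref: C(T) of the endogenous reference target (e.g. the lectin gene)
    efficiency: assumed amplification factor per cycle, shared by both targets
    """
    delta_ct = ct_gmo - ct_ref
    return 100.0 * efficiency ** (-delta_ct)


def copies_from_standard_curve(ct, slope, intercept):
    """Invert a standard curve of the form C(T) = slope * log10(copies) + intercept."""
    return 10.0 ** ((ct - intercept) / slope)


def gmo_percent_standard_curve(ct_gmo, ct_ref, curve_gmo, curve_ref):
    """Relative GMO content (%) from absolute copy numbers of both targets.

    curve_gmo, curve_ref: (slope, intercept) pairs, assumed to have been
    fitted beforehand from a calibrator dilution series.
    """
    gmo_copies = copies_from_standard_curve(ct_gmo, *curve_gmo)
    ref_copies = copies_from_standard_curve(ct_ref, *curve_ref)
    return 100.0 * gmo_copies / ref_copies
```

With these assumptions, a GMO target that crosses the threshold one cycle later than the reference corresponds to 50% relative content, and each further cycle halves the estimate. The delta C(T) form needs no dilution series but is sensitive to unequal efficiencies, which is one reason the standard curve form is often run alongside it.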
Next-generation sequencing of the Trichinella murrelli mitochondrial genome allows comprehensive comparison of its divergence from the principal agent of human trichinellosis, Trichinella spiralis.

PubMed

Webb, Kristen M; Rosenthal, Benjamin M

2011-01-01

The mitochondrial genome's non-recombinant mode of inheritance and relatively rapid rate of evolution have promoted its use as a marker for studying the biogeographic history and evolutionary interrelationships among many metazoan species. A modest portion of the mitochondrial genome has been defined for 12 species and genotypes of parasites in the genus Trichinella, but its adequacy in representing the mitochondrial genome as a whole remains unclear, as the complete coding sequence has been characterized only for Trichinella spiralis. Here, we sought to comprehensively describe the extent and nature of divergence between the mitochondrial genomes of T. spiralis (which poses the most appreciable zoonotic risk owing to its capacity to establish persistent infections in domestic pigs) and Trichinella murrelli (which is the most prevalent species in North American wildlife hosts, but which poses relatively little risk to the safety of pork). Next generation sequencing methodologies and scaffold and de novo assembly strategies were employed. The entire protein-coding region was sequenced (13,917 bp), along with a portion of the highly repetitive non-coding region (1524 bp) of the mitochondrial genome of T. murrelli with a combined average read depth of 250 reads. The accuracy of base calling, estimated from coding region sequence, was found to exceed 99.3%. Genome content and gene order were not found to be significantly different from those of T. spiralis. An overall inter-species sequence divergence of 9.5% was estimated. Significant variation was identified when the amount of variation between species at each gene is compared to the average amount of variation between species across the coding region. Next generation sequencing is a highly effective means to obtain previously unknown mitochondrial genome sequence.
Particular to parasites, the extremely deep coverage achieved through this method allows for the detection of sequence heterogeneity between the multiple individuals that necessarily comprise such templates. Copyright © 2010 Elsevier B.V. All rights reserved.
Identifying candidate drivers of drug response in heterogeneous cancer by mining high throughput genomics data.

PubMed

Nabavi, Sheida

2016-08-15

With advances in technologies, huge amounts of multiple types of high-throughput genomics data are available. These data have tremendous potential to identify new and clinically valuable biomarkers to guide the diagnosis, assessment of prognosis, and treatment of complex diseases, such as cancer. Integrating, analyzing, and interpreting big and noisy genomics data to obtain biologically meaningful results, however, remains highly challenging. Mining genomics datasets by utilizing advanced computational methods can help to address these issues. To facilitate the identification of a short list of biologically meaningful genes as candidate drivers of anti-cancer drug resistance from an enormous amount of heterogeneous data, we employed statistical machine-learning techniques and integrated genomics datasets. We developed a computational method that integrates gene expression, somatic mutation, and copy number aberration data of sensitive and resistant tumors. In this method, an integrative method based on module network analysis is applied to identify potential driver genes. This is followed by cross-validation and a comparison of the results of the sensitive and resistant groups to obtain the final list of candidate biomarkers. We applied this method to the ovarian cancer data from The Cancer Genome Atlas. The final result contains biologically relevant genes, such as COL11A1, which has been reported as a cis-platinum resistant biomarker for epithelial ovarian carcinoma in several recent studies. The described method yields a short list of aberrant genes that also control the expression of their co-regulated genes. The results suggest that the unbiased data driven computational method can identify biologically relevant candidate biomarkers. It can be utilized in a wide range of applications that compare two conditions with highly heterogeneous datasets.
Genomic Diversity of Erwinia carotovora subsp. carotovora and Its Correlation with Virulence

PubMed Central

Yap, Mee-Ngan; Barak, Jeri D.; Charkowski, Amy O.

2004-01-01

We used genetic and biochemical methods to examine the genomic diversity of the enterobacterial plant pathogen Erwinia carotovora subsp. carotovora. The results obtained with each method showed that E. carotovora subsp. carotovora strains isolated from one ecological niche, potato plants, are surprisingly diverse compared to related pathogens. A comparison of 23 partial mdh sequences revealed a maximum pairwise difference of 10.49% and an average pairwise difference of 2.13%, values which are much greater than the maximum variation (1.81%) and average variation (0.75%) previously reported for Escherichia coli. Pulsed-field gel electrophoresis analysis of I-CeuI-digested genomic DNA revealed seven rrn operons in all E. carotovora subsp. carotovora strains examined except strain WPP17, which had only six copies. We identified 26 I-CeuI restriction fragment length polymorphism patterns and observed significant polymorphism in fragment sizes ranging from 100 to 450 kb for all strains. We detected large plasmids in two strains, including the model strain E. carotovora subsp. carotovora 71. The two least virulent strains had an unusual chromosomal structure, suggesting that a particular pulsotype is correlated with virulence. To compare chromosomal organization of multiple enterobacterial genomes, several genes were mapped onto I-CeuI fragments. We identified portions of the genome that appear to be conserved across enterobacteria and portions that have undergone genome rearrangements. We found that the least virulent strain, WPP17, failed to oxidize cellobiose and was missing several hrp and hrc genes. The unexpected variability among isolates obtained from clonal hosts in one region and in one season suggests that factors other than the host plant, potato, drive the evolution of this common environmental bacterium and key plant pathogen. PMID:15128563
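The maximum and average pairwise differences quoted in this record are summary statistics over all pairs of aligned sequences. The sketch below shows one common way to compute them, the uncorrected proportion of differing sites; it assumes the sequences are already aligned to equal length and counts every differing column, including gaps, as a difference. How the authors treated gaps or ambiguous bases is not stated in the abstract, and the function names and toy sequences here are hypothetical.

```python
from itertools import combinations


def percent_difference(seq_a, seq_b):
    """Percentage of aligned positions at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    mismatches = sum(a != b for a, b in zip(seq_a, seq_b))
    return 100.0 * mismatches / len(seq_a)


def pairwise_summary(sequences):
    """Return (maximum, average) percent difference over all sequence pairs."""
    diffs = [percent_difference(a, b) for a, b in combinations(sequences, 2)]
    if not diffs:
        raise ValueError("need at least two sequences")
    return max(diffs), sum(diffs) / len(diffs)


# Toy alignment, not real mdh data.
toy_alignment = ["ACGTACGTAC", "ACGTACGTAA", "ACGTTCGTAA"]
max_diff, mean_diff = pairwise_summary(toy_alignment)
```

For 23 sequences, as in the record, this evaluates 253 pairs, so the average reflects the whole collection while the maximum is set by the single most divergent pair.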
CMS: A Web-Based System for Visualization and Analysis of Genome-Wide Methylation Data of Human Cancers

PubMed Central

Huang, Yi-Wen; Roa, Juan C.; Goodfellow, Paul J.; Kizer, E. Lynette; Huang, Tim H. M.; Chen, Yidong

2013-01-01

Background DNA methylation of promoter CpG islands is associated with gene suppression, and its unique genome-wide profiles have been linked to tumor progression. With high-throughput sequencing technologies, it is now possible to efficiently determine genome-wide methylation profiles in cancer cells. Also, experimental and computational technologies make it possible to find the functional relationship between cancer-specific methylation patterns and their clinicopathological parameters. Methodology/Principal Findings Cancer methylome system (CMS) is a web-based database application designed for the visualization, comparison and statistical analysis of human cancer-specific DNA methylation. Methylation intensities were obtained from MBDCap-sequencing, pre-processed and stored in the database. 191 patient samples (169 tumor and 22 normal specimens) and 41 breast cancer cell-lines are deposited in the database, comprising about 6.6 billion uniquely mapped sequence reads. This provides comprehensive and genome-wide epigenetic portraits of human breast cancer and endometrial cancer to date. Two views are proposed for users to better understand methylation structure at the genomic level or systemic methylation alteration at the gene level. In addition, a variety of annotation tracks are provided to cover genomic information. CMS includes important analytic functions for interpretation of methylation data, such as the detection of differentially methylated regions, statistical calculation of global methylation intensities, multiple gene sets of biologically significant categories, and interactivity with UCSC via custom-track data. We also present examples of discoveries utilizing the framework. Conclusions/Significance CMS provides visualization and analytic functions for cancer methylome datasets.
A comprehensive collection of datasets, a variety of embedded analytic functions and extensive applications with biological and translational significance make this system powerful and unique in cancer methylation research. CMS is freely accessible at: http://cbbiweb.uthscsa.edu/KMethylomes/. PMID:23630576
Comparison of Intracellular "Ca. Endomicrobium Trichonymphae" Genomovars Illuminates the Requirement and Decay of Defense Systems against Foreign DNA.

PubMed

Izawa, Kazuki; Kuwahara, Hirokazu; Kihara, Kumiko; Yuki, Masahiro; Lo, Nathan; Itoh, Takehiko; Ohkuma, Moriya; Hongoh, Yuichi

2016-10-13

"Candidatus Endomicrobium trichonymphae" (Bacteria; Elusimicrobia) is an obligate intracellular symbiont of the cellulolytic protist genus Trichonympha in the termite gut. A previous genome analysis of "Ca Endomicrobium trichonymphae" phylotype Rs-D17 (genomovar Ri2008), obtained from a Trichonympha agilis cell in the gut of the termite Reticulitermes speratus, revealed that its genome is small (1.1 Mb) and contains many pseudogenes; it is in the course of reductive genome evolution. Here we report the complete genome sequence of another Rs-D17 genomovar, Ti2015, obtained from a different T. agilis cell present in an R. speratus gut. These two genomovars share most intact protein-coding genes and pseudogenes, showing 98.6% chromosome sequence similarity. However, characteristic differences were found in their defense systems, which comprised restriction-modification and CRISPR/Cas systems. The repertoire of intact restriction-modification systems differed between the genomovars, and two of the three CRISPR/Cas loci in genomovar Ri2008 are pseudogenized or missing in genomovar Ti2015. These results suggest relaxed selection pressure for maintaining these defense systems. Nevertheless, the remaining CRISPR/Cas system in each genomovar appears to be active; none of the "spacer" sequences (112 in Ri2008 and 128 in Ti2015) were shared whereas the "repeat" sequences were identical. Furthermore, we obtained draft genomes of three additional endosymbiotic Endomicrobium phylotypes from different host protist species, and discovered multiple, intact CRISPR/Cas systems in each genome. Collectively, unlike bacteriome endosymbionts in insects, the Endomicrobium endosymbionts of termite-gut protists appear to require defense against foreign DNA, although the required level of defense has likely been reduced during their intracellular lives. © The Author 2016. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
CMS: a web-based system for visualization and analysis of genome-wide methylation data of human cancers.

PubMed

Gu, Fei; Doderer, Mark S; Huang, Yi-Wen; Roa, Juan C; Goodfellow, Paul J; Kizer, E Lynette; Huang, Tim H M; Chen, Yidong

2013-01-01

DNA methylation of promoter CpG islands is associated with gene suppression, and its unique genome-wide profiles have been linked to tumor progression. Coupled with high-throughput sequencing technologies, it can now efficiently determine genome-wide methylation profiles in cancer cells. Also, experimental and computational technologies make it possible to find the functional relationship between cancer-specific methylation patterns and their clinicopathological parameters. Cancer methylome system (CMS) is a web-based database application designed for the visualization, comparison and statistical analysis of human cancer-specific DNA methylation. Methylation intensities were obtained from MBDCap-sequencing, pre-processed and stored in the database. 191 patient samples (169 tumor and 22 normal specimen) and 41 breast cancer cell-lines are deposited in the database, comprising about 6.6 billion uniquely mapped sequence reads. This provides comprehensive and genome-wide epigenetic portraits of human breast cancer and endometrial cancer to date. Two views are proposed for users to better understand methylation structure at the genomic level or systemic methylation alteration at the gene level. In addition, a variety of annotation tracks are provided to cover genomic information. CMS includes important analytic functions for interpretation of methylation data, such as the detection of differentially methylated regions, statistical calculation of global methylation intensities, multiple gene sets of biologically significant categories, interactivity with UCSC via custom-track data. We also present examples of discoveries utilizing the framework. CMS provides visualization and analytic functions for cancer methylome datasets. A comprehensive collection of datasets, a variety of embedded analytic functions and extensive applications with biological and translational significance make this system powerful and unique in cancer methylation research. 
CMS is freely accessible at: http://cbbiweb.uthscsa.edu/KMethylomes/.
Within-host whole genome analysis of an antibiotic resistant Pseudomonas aeruginosa strain sub-type in cystic fibrosis.

PubMed

Sherrard, Laura J; Tai, Anna S; Wee, Bryan A; Ramsay, Kay A; Kidd, Timothy J; Ben Zakour, Nouri L; Whiley, David M; Beatson, Scott A; Bell, Scott C

2017-01-01

A Pseudomonas aeruginosa AUST-02 strain sub-type (M3L7) has been identified in Australia, infects the lungs of some people with cystic fibrosis and is associated with antibiotic resistance. Multiple clonal lineages may emerge during treatment, with mutations in chromosomally encoded antibiotic resistance genes commonly observed. Here we describe the within-host diversity and antibiotic resistance of M3L7 during and after antibiotic treatment of an acute pulmonary exacerbation using whole genome sequencing and show both variation and shared mutations in important genes. Eleven isolates from an M3L7 population (n = 134) isolated over 3 months from an individual with cystic fibrosis underwent whole genome sequencing. A phylogeny based on core genome SNPs identified three distinct phylogenetic groups comprising two groups with higher rates of mutation (hypermutators) and one non-hypermutator group. Genomes were screened for acquired antibiotic resistance genes, with the result suggesting that M3L7 resistance is principally driven by chromosomal mutations as no acquired mechanisms were detected. Small genetic variations, shared by all 11 isolates, were found in 49 genes associated with antibiotic resistance including frame-shift mutations (mexA, mexT), premature stop codons (oprD, mexB) and mutations in quinolone-resistance determining regions (gyrA, parE). However, whole genome sequencing also revealed mutations in 21 genes that were acquired following divergence of groups, which may also impact the activity of antibiotics and multi-drug efflux pumps. Comparison of mutations with minimum inhibitory concentrations of anti-pseudomonal antibiotics could not easily explain all resistance profiles observed. These data further demonstrate the complexity of chronic and antibiotic resistant P. aeruginosa infection, where a multitude of genotypically diverse sub-lineages might co-exist during and after intravenous antibiotic treatment.
Recalibrating Equus evolution using the genome sequence of an early Middle Pleistocene horse.

PubMed

Orlando, Ludovic; Ginolhac, Aurélien; Zhang, Guojie; Froese, Duane; Albrechtsen, Anders; Stiller, Mathias; Schubert, Mikkel; Cappellini, Enrico; Petersen, Bent; Moltke, Ida; Johnson, Philip L F; Fumagalli, Matteo; Vilstrup, Julia T; Raghavan, Maanasa; Korneliussen, Thorfinn; Malaspinas, Anna-Sapfo; Vogt, Josef; Szklarczyk, Damian; Kelstrup, Christian D; Vinther, Jakob; Dolocan, Andrei; Stenderup, Jesper; Velazquez, Amhed M V; Cahill, James; Rasmussen, Morten; Wang, Xiaoli; Min, Jiumeng; Zazula, Grant D; Seguin-Orlando, Andaine; Mortensen, Cecilie; Magnussen, Kim; Thompson, John F; Weinstock, Jacobo; Gregersen, Kristian; Røed, Knut H; Eisenmann, Véra; Rubin, Carl J; Miller, Donald C; Antczak, Douglas F; Bertelsen, Mads F; Brunak, Søren; Al-Rasheid, Khaled A S; Ryder, Oliver; Andersson, Leif; Mundy, John; Krogh, Anders; Gilbert, M Thomas P; Kjær, Kurt; Sicheritz-Ponten, Thomas; Jensen, Lars Juhl; Olsen, Jesper V; Hofreiter, Michael; Nielsen, Rasmus; Shapiro, Beth; Wang, Jun; Willerslev, Eske

2013-07-04

The rich fossil record of equids has made them a model for evolutionary processes. Here we present a 1.12-times coverage draft genome from a horse bone recovered from permafrost dated to approximately 560-780 thousand years before present (kyr BP). Our data represent the oldest full genome sequence determined so far by almost an order of magnitude. For comparison, we sequenced the genome of a Late Pleistocene horse (43 kyr BP), and modern genomes of five domestic horse breeds (Equus ferus caballus), a Przewalski's horse (E. f. przewalskii) and a donkey (E. asinus). Our analyses suggest that the Equus lineage giving rise to all contemporary horses, zebras and donkeys originated 4.0-4.5 million years before present (Myr BP), twice the conventionally accepted time to the most recent common ancestor of the genus Equus. We also find that horse population size fluctuated multiple times over the past 2 Myr, particularly during periods of severe climatic changes. We estimate that the Przewalski's and domestic horse populations diverged 38-72 kyr BP, and find no evidence of recent admixture between the domestic horse breeds and the Przewalski's horse investigated. This supports the contention that Przewalski's horses represent the last surviving wild horse population. We find similar levels of genetic variation among Przewalski's and domestic populations, indicating that the former are genetically viable and worthy of conservation efforts. We also find evidence for continuous selection on the immune system and olfaction throughout horse evolution. Finally, we identify 29 genomic regions among horse breeds that deviate from neutrality and show low levels of genetic variation compared to the Przewalski's horse. Such regions could correspond to loci selected early during domestication.
Common position of indels that cause deviations from canonical genome organization in different measles virus strains.

PubMed

Ivancic-Jelecki, Jelena; Slovic, Anamarija; Šantak, Maja; Tešović, Goran; Forcic, Dubravko

2016-07-29

The canonical genome organization of measles virus (MV) is characterized by total size of 15 894 nucleotides (nts) and defined length of every genomic region, both coding and non-coding. Only rarely have reports of strains possessing non-canonical genomic properties (possessing indels, with or without the change of total genome length) been published. The observed mutations are mutually compensatory in a sense that the total genome length remains polyhexameric. Although programmed and highly precise pseudo-templated nucleotide additions during transcription are inherent to polymerases of all viruses belonging to family Paramyxoviridae, a similar mechanism that would serve to non-randomly correct genome length, if an indel has occurred during replication, has so far not been described in the context of a complete virus genome. We compiled all complete MV genomic sequences (64 in total) available in open access sequence databases. Multiple sequence comparisons and phylogenetic analyses were performed with the aim of exploring whether non-recombinant and non-evolutionary linked measles strains that show deviations from canonical genome organization possess a common genetic characteristic. In 11 MV sequences we detected deviations from canonical genome organization due to short indels located within homopolymeric stretches or next to them. In nine out of 11 identified non-canonical MV sequences, a common feature was observed: one mutation, either an insertion or a deletion, was located in a 28 nts long region in F gene 5' untranslated region (positions 5051-5078 in genomic cDNA of canonical strains). This segment is composed of five tandemly linked homopolymeric stretches, its consensus sequence is G6-7C7-8A6-7G1-3C5-6. Although none of the mononucleotide repeats within this segment has fixed length, the total number of nts in canonical strains is always 28. 
These nine non-canonical strains, as well as the tenth (not mutated in 5051-5078 segment), can be grouped in three clusters, based on their passage histories/epidemiological data/genetic similarities. There are no indications that the 3 clusters are evolutionary linked, other than the fact that they all belong to clade D. A common narrow genomic region was found to be mutated in different, non-related, wild type strains suggesting that this region might have a function in non-random genome length corrections occurring during MV replication.
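The two numerical constraints in this abstract (a genome length that stays polyhexameric, and a 28-nt segment built from five homopolymeric stretches) can be checked with a few lines of Python. This is only an illustrative sketch: the function names and the example segment are hypothetical, and the segment is constructed here to be consistent with the reported consensus G6-7C7-8A6-7G1-3C5-6 rather than taken from any real strain.

```python
from itertools import groupby
from typing import List, Tuple

# Canonical measles virus genome length as stated in the abstract.
CANONICAL_LENGTH = 15894


def is_polyhexameric(genome_length: int) -> bool:
    """Return True when the genome length is a multiple of six."""
    return genome_length % 6 == 0


def homopolymer_runs(segment: str) -> List[Tuple[str, int]]:
    """Collapse a sequence into (nucleotide, run length) pairs."""
    return [(base, len(list(group))) for base, group in groupby(segment.upper())]


# Hypothetical segment (an assumption for illustration only): run lengths
# 6, 7, 7, 2, 6 fall within the reported consensus ranges and sum to 28.
example_segment = "G" * 6 + "C" * 7 + "A" * 7 + "G" * 2 + "C" * 6

if __name__ == "__main__":
    print(is_polyhexameric(CANONICAL_LENGTH))
    print(len(example_segment), homopolymer_runs(example_segment))
```

A single uncompensated insertion or deletion changes the length by one and breaks the multiple-of-six property, which is why the abstract describes the observed indels as mutually compensatory.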
Genomic sequence analysis of the Illinois strain of the Agrotis ipsilon multiple nucleopolyhedrovirus

USDA-ARS's Scientific Manuscript database

The Agrotis ipsilon multiple nucleopolyhedrovirus (AgipMNPV) is a group II nucleopolyhedrovirus (NPV) from the black cutworm, A. ipsilon, with potential as a biopesticide to control infestations of cutworm larvae. The genome of the Illinois strain of AgipMNPV was completely sequenced. The AgipMNPV...
Origin and Reticulate Evolutionary Process of Wheatgrass Elymus trachycaulus (Triticeae: Poaceae)

PubMed Central

Zuo, Hongwei; Wu, Panpan; Wu, Dexiang; Sun, Genlou

2015-01-01

To study the origin and evolutionary dynamics of tetraploid Elymus trachycaulus, which has been cytologically defined as containing StH genomes, thirteen accessions of E. trachycaulus were analyzed using two low-copy nuclear genes, Pepc (phosphoenolpyruvate carboxylase) and Rpb2 (the second largest subunit of RNA polymerase II), and one chloroplast region, trnL–trnF (spacer between the tRNA-Leu (UAA) gene and the tRNA-Phe (GAA) gene). Our chloroplast data indicated that Pseudoroegneria (St genome) was the maternal donor of E. trachycaulus. Rpb2 data indicated that the St genome in E. trachycaulus originated from either P. strigosa, P. stipifolia, P. spicata or P. geniculate. The Hordeum (H genome)-like sequences of E. trachycaulus are polyphyletic in the Pepc tree, suggesting that the H genome in E. trachycaulus was contributed by multiple sources, whether due to multiple origins or introgression resulting from subsequent hybridization. Failure to recover the St copy of the Pepc sequence in most accessions of E. trachycaulus might be caused by genome convergent evolution in allopolyploids. Multiple copies of the H-like Pepc sequence from each accession, with relatively large deletions and insertions, might be caused by either instability of the Pepc sequence in the H genome or incomplete concerted evolution. Our results highlight the complex evolutionary history of E. trachycaulus. PMID:25946188
Comparative Genomics of Oral Isolates of Streptococcus mutans by in silico Genome Subtraction Does Not Reveal Accessory DNA Associated with Severe Early Childhood Caries

PubMed Central

Argimón, Silvia; Konganti, Kranti; Chen, Hao; Alekseyenko, Alexander V.; Brown, Stuart; Caufield, Page W.

2014-01-01

Comparative genomics is a popular method for the identification of microbial virulence determinants, especially since the sequencing of a large number of whole bacterial genomes from pathogenic and non-pathogenic strains has become relatively inexpensive. The bioinformatics pipelines for comparative genomics usually include gene prediction and annotation and can require significant computer power. To circumvent this, we developed a rapid method for genome-scale in silico subtractive hybridization, based on blastn and independent of feature identification and annotation. Whole genome comparisons by in silico genome subtraction were performed to identify genetic loci specific to Streptococcus mutans strains associated with severe early childhood caries (S-ECC), compared to strains isolated from caries-free (CF) children. The genome similarity of the 20 S. mutans strains included in this study, calculated by Simrank k-mer sharing, ranged from 79.5 to 90.9%, confirming this is a genetically heterogeneous group of strains. We identified strain-specific genetic elements in 19 strains, with sizes ranging from 200 bp to 39 kb. These elements contained protein-coding regions with functions mostly associated with mobile DNA. We did not, however, identify any genetic loci consistently associated with dental caries, i.e., shared by all the S-ECC strains and absent in the CF strains. Conversely, we did not identify any genetic loci specific with the healthy group. Comparison of previously published genomes from pathogenic and carriage strains of Neisseria meningitidis with our in silico genome subtraction yielded the same set of genes specific to the pathogenic strains, thus validating our method. Our results suggest that S. mutans strains derived from caries active or caries free dentitions cannot be differentiated based on the presence or absence of specific genetic elements. 
Our in silico genome subtraction method is available as the Microbial Genome Comparison (MGC) tool, with a user-friendly JAVA graphical interface. PMID:24291226
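The genome similarity figures above are described as k-mer sharing. A minimal Python sketch of that general idea follows; it is not the Simrank implementation or the MGC tool, and the function names, the default k, and the choice to normalize by the query's k-mer count are all assumptions made for illustration.

```python
from typing import Set


def kmers(seq: str, k: int) -> Set[str]:
    """Return the set of distinct k-mers in a sequence."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def kmer_sharing(query: str, target: str, k: int = 7) -> float:
    """Fraction of the query's distinct k-mers that also occur in the target.

    This normalization is an assumption; other tools may use a different one.
    """
    query_kmers = kmers(query, k)
    if not query_kmers:
        return 0.0
    return len(query_kmers & kmers(target, k)) / len(query_kmers)


if __name__ == "__main__":
    # Query k-mers: ACG, CGT, GTA, TAC; two of them (ACG, CGT) occur in the target.
    print(kmer_sharing("ACGTAC", "ACGTTT", k=3))  # 0.5
```

Because the score is normalized by the query, it is asymmetric: swapping the two sequences can give a different value when they differ in length or composition.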
Partial genome assembly for a candidate division OP11 single cell from an anoxic spring (Zodletone Spring, Oklahoma).

PubMed

Youssef, Noha H; Blainey, Paul C; Quake, Stephen R; Elshahed, Mostafa S

2011-11-01

Members of candidate division OP11 are widely distributed in terrestrial and marine ecosystems, yet little information regarding their metabolic capabilities and ecological role within such habitats is currently available. Here, we report on the microfluidic isolation, multiple-displacement-amplification, pyrosequencing, and genomic analysis of a single cell (ZG1) belonging to candidate division OP11. Genome analysis of the ∼270-kb partial genome assembly obtained showed that it had no particular similarity to a specific phylum. Four hundred twenty-three open reading frames were identified, 46% of which had no function prediction. In-depth analysis revealed a heterotrophic lifestyle, with genes encoding endoglucanase, amylopullulanase, and laccase enzymes, suggesting a capacity for utilization of cellulose, starch, and, potentially, lignin, respectively. Genes encoding several glycolysis enzymes as well as formate utilization were identified, but no evidence for an electron transport chain was found. The presence of genes encoding various components of lipopolysaccharide biosynthesis indicates a Gram-negative bacterial cell wall. The partial genome also provides evidence for antibiotic resistance (β-lactamase, aminoglycoside phosphotransferase), as well as antibiotic production (bacteriocin) and extracellular bactericidal peptidases. Multiple mechanisms for stress response were identified, as were elements of type I and type IV secretion systems. Finally, housekeeping genes identified within the partial genome were used to demonstrate the OP11 affiliation of multiple hitherto unclassified genomic fragments from multiple database-deposited metagenomic data sets. These results provide the first glimpse into the lifestyle of a member of a ubiquitous, yet poorly understood bacterial candidate division.
The Genome Portal of the Department of Energy Joint Genome Institute

DOE Office of Scientific and Technical Information (OSTI.GOV)

Nordberg, Henrik; Cantor, Michael; Dushekyo, Serge

2014-03-14

The JGI Genome Portal (http://genome.jgi.doe.gov) provides unified access to all JGI genomic databases and analytical tools. A user can search, download and explore multiple data sets available for all DOE JGI sequencing projects including their status, assemblies and annotations of sequenced genomes. In the past 2 years the Genome Portal was significantly updated, with a specific emphasis on efficient handling of the rapidly growing amount of diverse genomic data accumulated in JGI. A critical aspect of handling big data in genomics is the development of visualization and analysis tools that allow scientists to derive meaning from what are otherwise terabases of inert sequence. An interactive visualization tool developed in the group allows us to explore contigs resulting from a single metagenome assembly. Implemented with modern web technologies that take advantage of the power of the computer's graphical processing unit (GPU), the tool allows the user to easily navigate more than 100,000 data points in multiple dimensions, among many biologically meaningful parameters of a dataset such as relative abundance, contig length, and G+C content.
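Two of the per-contig parameters named above, contig length and G+C content, are simple to compute. The Python sketch below shows one way to derive them; it is not JGI code, and the function names and the decision to ignore ambiguous bases such as N are assumptions.

```python
from typing import Dict


def gc_content(seq: str) -> float:
    """Fraction of G and C among unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / counted


def contig_summary(name: str, seq: str) -> Dict[str, object]:
    """Collect the plotted parameters for one contig (hypothetical record layout)."""
    return {"name": name, "length": len(seq), "gc": gc_content(seq)}


if __name__ == "__main__":
    print(contig_summary("contig_1", "GGCCAATT"))
```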
Secure searching of biomarkers through hybrid homomorphic encryption scheme.

PubMed

Kim, Miran; Song, Yongsoo; Cheon, Jung Hee

2017-07-26

As genome sequencing technology develops rapidly, there has lately been an increasing need to keep genomic data secure even when stored in the cloud and still used for research. We are interested in designing a protocol for the secure outsourcing matching problem on encrypted data. We propose an efficient method to securely search a matching position with the query data and extract some information at the position. After decryption, only a small number of comparisons with the query information need to be performed in the plaintext state. We apply this method to find a set of biomarkers in encrypted genomes. The important feature of our method is to encode a genomic database as a single element of a polynomial ring. Since our method requires a single homomorphic multiplication of the hybrid scheme for query computation, it has the advantage over the previous methods in parameter size, computation complexity, and communication cost. In particular, the extraction procedure not only prevents leakage of database information that has not been queried by the user but also reduces the communication cost by half. We evaluate the performance of our method and verify that the computation on large-scale personal data can be securely and practically outsourced to a cloud environment during data analysis. It takes about 3.9 s to search-and-extract the reference and alternate sequences at the queried position in a database of size 4M. Our solution for finding a set of biomarkers in DNA sequences shows that cryptographic techniques have progressed to the point where they can support real-world genome data analysis in a cloud environment.

Reptilian Transcriptomes v2.0: An Extensive Resource for Sauropsida Genomics and Transcriptomics

PubMed Central

Tzika, Athanasia C.; Ullate-Agote, Asier; Grbic, Djordje; Milinkovitch, Michel C.

2015-01-01

Despite the availability of deep-sequencing techniques, genomic and transcriptomic data remain unevenly distributed across phylogenetic groups. For example, reptiles are poorly represented in sequence databases, hindering functional evolutionary and developmental studies in these lineages substantially more diverse than mammals. In addition, different studies use different assembly and annotation protocols, inhibiting meaningful comparisons. Here, we present the “Reptilian Transcriptomes Database 2.0,” which provides extensive annotation of transcriptomes and genomes from species covering the major reptilian lineages. To this end, we sequenced normalized complementary DNA libraries of multiple adult tissues and various embryonic stages of the leopard gecko and the corn snake and gathered published reptilian sequence data sets from representatives of the four extant orders of reptiles: Squamata (snakes and lizards), the tuatara, crocodiles, and turtles. The LANE runner 2.0 software was implemented to annotate all assemblies within a single integrated pipeline. We show that this approach increases the annotation completeness of the assembled transcriptomes/genomes. We then built large concatenated protein alignments of single-copy genes and inferred phylogenetic trees that support the positions of turtles and the tuatara as sister groups of Archosauria and Squamata, respectively. The Reptilian Transcriptomes Database 2.0 resource will be updated to include selected new data sets as they become available, thus making it a reference for differential expression studies, comparative genomics and transcriptomics, linkage mapping, molecular ecology, and phylogenomic analyses involving reptiles. The database is available at www.reptilian-transcriptomes.org and can be enquired using a wwwblast server installed at the University of Geneva. PMID:26133641
Multiple capacitors for natural genetic variation in Drosophila melanogaster.

PubMed

Takahashi, Kazuo H

2013-03-01

Cryptic genetic variation (CGV) or a standing genetic variation that is not ordinarily expressed as a phenotype is released when the robustness of organisms is impaired under environmental or genetic perturbations. Evolutionary capacitors modulate the amount of genetic variation exposed to natural selection and hidden cryptically; they have a fundamental effect on the evolvability of traits on evolutionary timescales. In this study, I have demonstrated the effects of multiple genomic regions of Drosophila melanogaster on CGV in wing shape. I examined the effects of 61 genomic deficiencies on quantitative and qualitative natural genetic variation in the wing shape of D. melanogaster. I have identified 10 genomic deficiencies that do not encompass a known candidate evolutionary capacitor, Hsp90, exposing natural CGV differently depending on the location of the deficiencies in the genome. Furthermore, five genomic deficiencies uncovered qualitative CGV in wing morphology. These findings suggest that CGV in wing shape of wild-type D. melanogaster is regulated by multiple capacitors with divergent functions. Future analysis of genes encompassed by these genomic regions would help elucidate novel capacitor genes and better understand the general features of capacitors regarding natural genetic variation. © 2012 Blackwell Publishing Ltd.
Comprehensive performance comparison of high-resolution array platforms for genome-wide Copy Number Variation (CNV) analysis in humans.

PubMed

Haraksingh, Rajini R; Abyzov, Alexej; Urban, Alexander Eckehart

2017-04-24

High-resolution microarray technology is routinely used in basic research and clinical practice to efficiently detect copy number variants (CNVs) across the entire human genome. A new generation of arrays combining high probe densities with optimized designs will comprise essential tools for genome analysis in the coming years. We systematically compared the genome-wide CNV detection power of all 17 available array designs from the Affymetrix, Agilent, and Illumina platforms by hybridizing the well-characterized genome of 1000 Genomes Project subject NA12878 to all arrays, and performing data analysis using both manufacturer-recommended and platform-independent software. We benchmarked the resulting CNV call sets from each array using a gold standard set of CNVs for this genome derived from 1000 Genomes Project whole genome sequencing data. The arrays tested comprise both SNP and aCGH platforms with varying designs and contain between ~0.5 to ~4.6 million probes. Across the arrays CNV detection varied widely in number of CNV calls (4-489), CNV size range (~40 bp to ~8 Mbp), and percentage of non-validated CNVs (0-86%). We discovered strikingly strong effects of specific array design principles on performance. For example, some SNP array designs with the largest numbers of probes and extensive exonic coverage produced a considerable number of CNV calls that could not be validated, compared to designs with probe numbers that are sometimes an order of magnitude smaller. This effect was only partially ameliorated using different analysis software and optimizing data analysis parameters. High-resolution microarrays will continue to be used as reliable, cost- and time-efficient tools for CNV analysis. However, different applications tolerate different limitations in CNV detection. Our study quantified how these arrays differ in total number and size range of detected CNVs as well as sensitivity, and determined how each array balances these attributes. 
This analysis will inform appropriate array selection for future CNV studies, and allow better assessment of the CNV-analytical power of both published and ongoing array-based genomics studies. Furthermore, our findings emphasize the importance of concurrent use of multiple analysis algorithms and independent experimental validation in array-based CNV detection studies.
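The "percentage of non-validated CNVs" reported above depends on how a call is matched to the gold standard. The abstract does not state the matching rule, so the Python sketch below assumes a common convention, 50% reciprocal overlap on half-open intervals; the function names, the threshold, and the coordinates in the usage example are all hypothetical.

```python
from typing import List, Tuple

# (chromosome, start, end) with half-open coordinates; an assumed layout.
Interval = Tuple[str, int, int]


def reciprocal_overlap(a: Interval, b: Interval) -> float:
    """Smaller of the two overlap fractions, or 0.0 when the intervals do not overlap."""
    if a[0] != b[0]:
        return 0.0
    overlap = min(a[2], b[2]) - max(a[1], b[1])
    if overlap <= 0:
        return 0.0
    return min(overlap / (a[2] - a[1]), overlap / (b[2] - b[1]))


def percent_non_validated(calls: List[Interval], gold: List[Interval],
                          threshold: float = 0.5) -> float:
    """Percentage of calls with no gold-standard CNV at or above the threshold."""
    if not calls:
        return 0.0
    unvalidated = sum(
        1 for call in calls
        if not any(reciprocal_overlap(call, ref) >= threshold for ref in gold)
    )
    return 100.0 * unvalidated / len(calls)


if __name__ == "__main__":
    calls = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 100, 200)]
    gold = [("chr1", 120, 210), ("chr1", 1000, 2000)]
    print(percent_non_validated(calls, gold))  # two of the three calls are unmatched
```

The all-pairs comparison is quadratic and is kept for clarity; call sets of the size reported here (at most a few hundred calls per array) do not need an interval index.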
Controlling the Rate of GWAS False Discoveries

PubMed Central

Brzyski, Damian; Peterson, Christine B.; Sobczyk, Piotr; Candès, Emmanuel J.; Bogdan, Malgorzata; Sabatti, Chiara

2017-01-01

With the rise of both the number and the complexity of traits of interest, control of the false discovery rate (FDR) in genetic association studies has become an increasingly appealing and accepted target for multiple comparison adjustment. While a number of robust FDR-controlling strategies exist, the nature of this error rate is intimately tied to the precise way in which discoveries are counted, and the performance of FDR-controlling procedures is satisfactory only if there is a one-to-one correspondence between what scientists describe as unique discoveries and the number of rejected hypotheses. The presence of linkage disequilibrium between markers in genome-wide association studies (GWAS) often leads researchers to consider the signal associated to multiple neighboring SNPs as indicating the existence of a single genomic locus with possible influence on the phenotype. This a posteriori aggregation of rejected hypotheses results in inflation of the relevant FDR. We propose a novel approach to FDR control that is based on prescreening to identify the level of resolution of distinct hypotheses. We show how FDR-controlling strategies can be adapted to account for this initial selection both with theoretical results and simulations that mimic the dependence structure to be expected in GWAS. We demonstrate that our approach is versatile and useful when the data are analyzed using both tests based on single markers and multiple regression. We provide an R package that allows practitioners to apply our procedure on standard GWAS format data, and illustrate its performance on lipid traits in the North Finland Birth Cohort 66 cohort study. PMID:27784720
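For orientation, the standard Benjamini-Hochberg step-up procedure is the usual baseline for FDR control over individual hypotheses. The Python sketch below implements only that baseline; it is not the prescreening approach proposed in this record and not the authors' R package, and the function name and example p-values are hypothetical.

```python
from typing import List


def benjamini_hochberg(pvalues: List[float], q: float = 0.05) -> List[bool]:
    """Return one rejection flag per hypothesis using the BH step-up rule."""
    m = len(pvalues)
    # Indices of the hypotheses ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: pvalues[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        # Track the largest rank whose p-value is under its threshold q * rank / m.
        if pvalues[idx] <= q * rank / m:
            cutoff_rank = rank
    rejected = [False] * m
    for idx in order[:cutoff_rank]:
        rejected[idx] = True
    return rejected


if __name__ == "__main__":
    print(benjamini_hochberg([0.01, 0.02, 0.03, 0.5]))  # [True, True, True, False]
```

Each flag here refers to a single marker. As the abstract points out, grouping neighboring rejected SNPs into loci afterwards changes what counts as a discovery, so the FDR guaranteed at the marker level does not carry over to the locus level.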
Controlling the Rate of GWAS False Discoveries.

PubMed

Brzyski, Damian; Peterson, Christine B; Sobczyk, Piotr; Candès, Emmanuel J; Bogdan, Malgorzata; Sabatti, Chiara

2017-01-01

With the rise of both the number and the complexity of traits of interest, control of the false discovery rate (FDR) in genetic association studies has become an increasingly appealing and accepted target for multiple comparison adjustment. While a number of robust FDR-controlling strategies exist, the nature of this error rate is intimately tied to the precise way in which discoveries are counted, and the performance of FDR-controlling procedures is satisfactory only if there is a one-to-one correspondence between what scientists describe as unique discoveries and the number of rejected hypotheses. The presence of linkage disequilibrium between markers in genome-wide association studies (GWAS) often leads researchers to consider the signal associated to multiple neighboring SNPs as indicating the existence of a single genomic locus with possible influence on the phenotype. This a posteriori aggregation of rejected hypotheses results in inflation of the relevant FDR. We propose a novel approach to FDR control that is based on prescreening to identify the level of resolution of distinct hypotheses. We show how FDR-controlling strategies can be adapted to account for this initial selection both with theoretical results and simulations that mimic the dependence structure to be expected in GWAS. We demonstrate that our approach is versatile and useful when the data are analyzed using both tests based on single markers and multiple regression. We provide an R package that allows practitioners to apply our procedure on standard GWAS format data, and illustrate its performance on lipid traits in the North Finland Birth Cohort 66 cohort study. Copyright © 2017 by the Genetics Society of America.
Single-cell genomic sequencing using Multiple Displacement Amplification.

PubMed

Lasken, Roger S

2007-10-01

Single microbial cells can now be sequenced using DNA amplified by the Multiple Displacement Amplification (MDA) reaction. The few femtograms of DNA in a bacterium are amplified into micrograms of high molecular weight DNA suitable for DNA library construction and Sanger sequencing. The MDA-generated DNA also performs well when used directly as template for pyrosequencing by the 454 Life Sciences method. While MDA from single cells loses some of the genomic sequence, this approach will greatly accelerate the pace of sequencing from uncultured microbes. The genetically linked sequences from single cells are also a powerful tool to be used in guiding genomic assembly of shotgun sequences of multiple organisms from environmental DNA extracts (metagenomic sequences).
Whole-genome multiple displacement amplification from single cells.

PubMed

Spits, Claudia; Le Caignec, Cédric; De Rycke, Martine; Van Haute, Lindsey; Van Steirteghem, André; Liebaers, Inge; Sermon, Karen

2006-01-01

Multiple displacement amplification (MDA) is a recently described method of whole-genome amplification (WGA) that has proven efficient in the amplification of small amounts of DNA, including DNA from single cells. Compared with PCR-based WGA methods, MDA generates DNA with a higher molecular weight and shows better genome coverage. This protocol was developed for preimplantation genetic diagnosis, and details a method for performing single-cell MDA using the phi29 DNA polymerase. It can also be useful for the amplification of other minute quantities of DNA, such as from forensic material or microdissected tissue. The protocol includes the collection and lysis of single cells, and all materials and steps involved in the MDA reaction. The whole procedure takes 3 h and generates 1-2 µg of DNA from a single cell, which is suitable for multiple downstream applications, such as sequencing, short tandem repeat analysis or array comparative genomic hybridization.
Metabolomic Profiling and Genomic Study of a Marine Sponge-Associated Streptomyces sp

PubMed Central

Viegelmann, Christina; Margassery, Lekha Menon; Kennedy, Jonathan; Zhang, Tong; O’Brien, Ciarán; O’Gara, Fergal; Morrissey, John P.; Dobson, Alan D. W.; Edrada-Ebel, RuAngelie

2014-01-01

Metabolomics and genomics are two complementary platforms for analyzing an organism as they provide information on the phenotype and genotype, respectively. These two techniques were applied in the dereplication and identification of bioactive compounds from a Streptomyces sp. (SM8) isolated from the sponge Haliclona simulans from Irish waters. Streptomyces strain SM8 extracts showed antibacterial and antifungal activity. NMR analysis of the active fractions proved that hydroxylated saturated fatty acids were the major components present in the antibacterial fractions. Antimycin compounds were initially putatively identified in the antifungal fractions using LC-Orbitrap. Their presence was later confirmed by comparison to a standard. Genomic analysis of Streptomyces sp. SM8 revealed the presence of multiple secondary metabolism gene clusters, including a gene cluster for the biosynthesis of the antifungal antimycin family of compounds. The antimycin gene cluster of Streptomyces sp. SM8 was inactivated by disruption of the antimycin biosynthesis gene antC. Extracts from this mutant strain showed loss of antimycin production and significantly less antifungal activity than the wild-type strain. Three butenolides, 4,10-dihydroxy-10-methyl-dodec-2-en-1,4-olide (1), 4,11-dihydroxy-10-methyl-dodec-2-en-1,4-olide (2), and 4-hydroxy-10-methyl-11-oxo-dodec-2-en-1,4-olide (3) that had previously been reported from marine Streptomyces species were also isolated from SM8. Comparison of the extracts of Streptomyces strain SM8 and its host sponge, H. simulans, using LC-Orbitrap revealed the presence of metabolites common to both extracts, providing direct evidence linking sponge metabolites to a specific microbial symbiont. PMID:24893324
Genome-wide association study of clinical dimensions of schizophrenia: polygenic effect on disorganized symptoms.

PubMed

Fanous, Ayman H; Zhou, Baiyu; Aggen, Steven H; Bergen, Sarah E; Amdur, Richard L; Duan, Jubao; Sanders, Alan R; Shi, Jianxin; Mowry, Bryan J; Olincy, Ann; Amin, Farooq; Cloninger, C Robert; Silverman, Jeremy M; Buccola, Nancy G; Byerley, William F; Black, Donald W; Freedman, Robert; Dudbridge, Frank; Holmans, Peter A; Ripke, Stephan; Gejman, Pablo V; Kendler, Kenneth S; Levinson, Douglas F

2012-12-01

Multiple sources of evidence suggest that genetic factors influence variation in clinical features of schizophrenia. The authors present the first genome-wide association study (GWAS) of dimensional symptom scores among individuals with schizophrenia. Based on the Lifetime Dimensions of Psychosis Scale ratings of 2,454 case subjects of European ancestry from the Molecular Genetics of Schizophrenia (MGS) sample, three symptom factors (positive, negative/disorganized, and mood) were identified with exploratory factor analysis. Quantitative scores for each factor from a confirmatory factor analysis were analyzed for association with 696,491 single-nucleotide polymorphisms (SNPs) using linear regression, with correction for age, sex, clinical site, and ancestry. Polygenic score analysis was carried out to determine whether case and comparison subjects in 16 Psychiatric GWAS Consortium (PGC) schizophrenia samples (excluding MGS samples) differed in scores computed by weighting their genotypes by MGS association test results for each symptom factor. No genome-wide significant associations were observed between SNPs and factor scores. Most of the SNPs producing the strongest evidence for association were in or near genes involved in neurodevelopment, neuroprotection, or neurotransmission, including genes playing a role in Mendelian CNS diseases, but no statistically significant effect was observed for any defined gene pathway. Finally, polygenic scores based on MGS GWAS results for the negative/disorganized factor were significantly different between case and comparison subjects in the PGC data set; for MGS subjects, negative/disorganized factor scores were correlated with polygenic scores generated using case-control GWAS results from the other PGC samples. 
The polygenic signal that has been observed in cross-sample analyses of schizophrenia GWAS data sets could be in part related to genetic effects on negative and disorganized symptoms (i.e., core features of chronic schizophrenia).
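The polygenic score analysis mentioned in this abstract can be sketched as a weighted sum: each subject's allele counts are weighted by per-SNP effect estimates taken from a separate association analysis. This is a generic sketch under that assumption, not the authors' pipeline; the function name, the effect sizes, and the genotypes are hypothetical.

```python
def polygenic_score(genotypes, weights):
    """Weighted sum of allele counts (0, 1 or 2 per SNP) using per-SNP weights."""
    if len(genotypes) != len(weights):
        raise ValueError("genotypes and weights must cover the same SNPs")
    return sum(g * w for g, w in zip(genotypes, weights))


if __name__ == "__main__":
    # Hypothetical effect estimates for three SNPs from a discovery sample.
    weights = [0.1, 0.3, -0.2]
    # Hypothetical allele counts for two subjects in a target sample.
    subjects = {"subject_a": [2, 1, 0], "subject_b": [0, 0, 2]}
    for name, genotypes in subjects.items():
        print(name, polygenic_score(genotypes, weights))
```

In a study of this kind, scores computed this way would then be compared between groups or correlated with a phenotype in the target sample.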
High-Throughput resequencing of maize landraces at genomic regions associated with flowering time

USDA-ARS's Scientific Manuscript database

Despite the reduction in the price of sequencing, it remains expensive to sequence and assemble whole, complex genomes of multiple samples for population studies, particularly for large genomes like those of many crop species. Enrichment of target genome regions coupled with next generation sequenci...
ProteinWorldDB: querying radical pairwise alignments among protein sets from complete genomes

PubMed Central

Otto, Thomas Dan; Catanho, Marcos; Tristão, Cristian; Bezerra, Márcia; Fernandes, Renan Mathias; Elias, Guilherme Steinberger; Scaglia, Alexandre Capeletto; Bovermann, Bill; Berstis, Viktors; Lifschitz, Sergio; de Miranda, Antonio Basílio; Degrave, Wim

2010-01-01

Motivation: Many analyses in modern biological research are based on comparisons between biological sequences, resulting in functional, evolutionary and structural inferences. When large numbers of sequences are compared, heuristics are often used resulting in a certain lack of accuracy. In order to improve and validate results of such comparisons, we have performed radical all-against-all comparisons of 4 million protein sequences belonging to the RefSeq database, using an implementation of the Smith–Waterman algorithm. This extremely intensive computational approach was made possible with the help of World Community Grid™, through the Genome Comparison Project. The resulting database, ProteinWorldDB, which contains coordinates of pairwise protein alignments and their respective scores, is now made available. Users can download, compare and analyze the results, filtered by genomes, protein functions or clusters. ProteinWorldDB is integrated with annotations derived from Swiss-Prot, Pfam, KEGG, NCBI Taxonomy database and gene ontology. The database is a unique and valuable asset, representing a major effort to create a reliable and consistent dataset of cross-comparisons of the whole protein content encoded in hundreds of completely sequenced genomes using a rigorous dynamic programming approach. Availability: The database can be accessed through http://proteinworlddb.org Contact: otto@fiocruz.br PMID:20089515
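For orientation, the Smith-Waterman algorithm named in this abstract can be sketched as the dynamic programme below. It is a simplified illustration that assumes a fixed match/mismatch score and a linear gap penalty; protein comparisons normally use a substitution matrix and often affine gap penalties, and the scoring scheme actually used for ProteinWorldDB is not stated in the abstract. The function name and default scores are hypothetical.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score of sequences a and b (linear gap penalty)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, so scores never drop below 0.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


if __name__ == "__main__":
    print(smith_waterman_score("XXACGTYY", "ZZACGTWW"))
```

Because every cell of the score matrix is filled, the cost grows with the product of the two sequence lengths, which is why an all-against-all comparison of millions of sequences calls for grid-scale computing.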
The Chloroplast Genome of Symplocarpus renifolius: A Comparison of Chloroplast Genome Structure in Araceae.

PubMed

Choi, Kyoung Su; Park, Kyu Tae; Park, SeonJoo

2017-11-16

Symplocarpus renifolius is a member of Araceae family that is extraordinarily diverse in appearance. Previous studies on chloroplast genomes in Araceae were focused on duckweeds (Lemnoideae) and root crops (Colocasia, commonly known as taro). Here, we determined the chloroplast genome of Symplocarpus renifolius and compared the factors, such as genes and inverted repeat (IR) junctions and performed phylogenetic analysis using other Araceae species. The chloroplast genome of S. renifolius is 158,521 bp and includes 113 genes. A comparison among the Araceae chloroplast genomes showed that infA in Lemna, Spirodela, Wolffiella, Wolffia, Dieffenbachia and Colocasia has been lost or has become a pseudogene and has only been retained in Symplocarpus. In the Araceae chloroplast DNA (cpDNA), psbZ is retained. However, psbZ duplication occurred in Wolffia species and tandem repeats were noted around the duplication regions. A comparison of the IR junction in Araceae species revealed the presence of ycf1 and rps15 in the small single copy region, whereas duckweed species contained ycf1 and rps15 in the IR region. The phylogenetic analyses of the chloroplast genomes revealed that Symplocarpus are a basal group and are sister to the other Araceae species. Consequently, infA deletion or pseudogene events in Araceae occurred after the divergence of Symplocarpus and aquatic plants (duckweeds) in Araceae and duplication events of rps15 and ycf1 occurred in the IR region.
Towards the delineation of the ancestral eutherian genome organization: comparative genome maps of human and the African elephant (Loxodonta africana) generated by chromosome painting.

PubMed Central

Frönicke, Lutz; Wienberg, Johannes; Stone, Gary; Adams, Lisa; Stanyon, Roscoe

2003-01-01

This study presents a whole-genome comparison of human and a representative of the Afrotherian clade, the African elephant, generated by reciprocal Zoo-FISH. An analysis of Afrotheria genomes is of special interest, because recent DNA sequence comparisons identify them as the oldest placental mammalian clade. Complete sets of whole-chromosome specific painting probes for the African elephant and human were constructed by degenerate oligonucleotide-primed PCR amplification of flow-sorted chromosomes. Comparative genome maps are presented based on their hybridization patterns. These maps show that the elephant has a moderately rearranged chromosome complement when compared to humans. The human paint probes identified 53 evolutionary conserved segments on the 27 autosomal elephant chromosomes and the X chromosome. Reciprocal experiments with elephant probes delineated 68 conserved segments in the human genome. The comparison with a recent aardvark and elephant Zoo-FISH study delineates new chromosomal traits which link the two Afrotherian species phylogenetically. In the absence of any morphological evidence the chromosome painting data offer the first non-DNA sequence support for an Afrotherian clade. The comparative human and elephant genome maps provide new insights into the karyotype organization of the proto-afrotherian, the ancestor of extant placental mammals, which most probably consisted of 2n=46 chromosomes. PMID:12965023
Genomic suppression subtractive hybridization as a tool to identify differences in mycorrhizal fungal genomes.

PubMed

Murat, Claude; Zampieri, Elisa; Vallino, Marta; Daghino, Stefania; Perotto, Silvia; Bonfante, Paola

2011-05-01

Characterization of genomic variation among different microbial species, or different strains of the same species, is a field of significant interest with a wide range of potential applications. We have investigated the genomic variation in mycorrhizal fungal genomes through genomic suppressive subtractive hybridization. The comparison was between phylogenetically distant and close truffle species (Tuber spp.), and between isolates of the ericoid mycorrhizal fungus Oidiodendron maius featuring different degrees of metal tolerance. In the interspecies experiment, almost all the sequences that were identified in the Tuber melanosporum genome and absent in Tuber borchii and Tuber indicum corresponded to transposable elements. In the intraspecies comparison, some specific sequences corresponded to regions coding for enzymes, among them a glutathione synthetase known to be involved in metal tolerance. This approach is a quick and rather inexpensive tool to develop molecular markers for mycorrhizal fungi tracking and barcoding, to identify functional genes and to investigate the genome plasticity, adaptation and evolution. © 2011 Federation of European Microbiological Societies. Published by Blackwell Publishing Ltd. All rights reserved.
Comparative Genomics Reveals the Core Gene Toolbox for the Fungus-Insect Symbiosis.

PubMed

Wang, Yan; Stata, Matt; Wang, Wei; Stajich, Jason E; White, Merlin M; Moncalvo, Jean-Marc

2018-05-15

Modern genomics has shed light on many entomopathogenic fungi and expanded our knowledge widely; however, little is known about the genomic features of the insect-commensal fungi. Harpellales are obligate commensals living in the digestive tracts of disease-bearing insects (black flies, midges, and mosquitoes). In this study, we produced and annotated whole-genome sequences of nine Harpellales taxa and conducted the first comparative analyses to infer the genomic diversity within the members of the Harpellales. The genomes of the insect gut fungi feature low (26% to 37%) GC content and large genome size variations (25 to 102 Mb). Further comparisons with insect-pathogenic fungi (from both Ascomycota and Zoopagomycota), as well as with free-living relatives (as negative controls), helped to identify a gene toolbox that is essential to the fungus-insect symbiosis. The results not only narrow the genomic scope of fungus-insect interactions from several thousands to eight core players but also distinguish host invasion strategies employed by insect pathogens and commensals. The genomic content suggests that insect commensal fungi rely mostly on adhesion protein anchors that target digestive system, while entomopathogenic fungi have higher numbers of transmembrane helices, signal peptides, and pathogen-host interaction (PHI) genes across the whole genome and enrich genes as well as functional domains to inactivate the host inflammation system and suppress the host defense. Phylogenomic analyses have revealed that genome sizes of Harpellales fungi vary among lineages with an integer-multiple pattern, which implies that ancient genome duplications may have occurred within the gut of insects. IMPORTANCE Insect guts harbor various microbes that are important for host digestion, immune response, and disease dispersal in certain cases. Bacteria, which are among the primary endosymbionts, have been studied extensively. 
However, fungi, which are also frequently encountered, are poorly known with respect to their biology within the insect guts. To understand the genomic features and related biology, we produced the whole-genome sequences of nine gut commensal fungi from disease-bearing insects (black flies, midges, and mosquitoes). The results show that insect gut fungi tend to have low GC content across their genomes. By comparing these commensals with entomopathogenic and free-living fungi that have available genome sequences, we found a universal core gene toolbox that is unique and thus potentially important for the insect-fungus symbiosis. This comparative work also uncovered different host invasion strategies employed by insect pathogens and commensals, as well as a model system to study ancient fungal genome duplication within the gut of insects. © Crown copyright 2018.
Multiple-Locus Variable-Number Tandem-Repeats Analysis of Escherichia coli O157 using PCR multiplexing and multi-colored capillary electrophoresis.

PubMed

Lindstedt, Bjørn-Arne; Vardund, Traute; Kapperud, Georg

2004-08-01

The Multiple-Locus Variable-Number Tandem-Repeats Analysis (MLVA) method is currently being used as the primary typing tool for Shiga-toxin-producing Escherichia coli (STEC) O157 isolates in our laboratory. The initial assay was performed using a single fluorescent dye and the different patterns were assigned using a gel image. Here, we present a significantly improved assay using multiple dye colors and enhanced PCR multiplexing to increase speed, and ease the interpretation of the results. The different MLVA patterns are now based on allele sizes entered as character values, thus removing the uncertainties introduced when analyzing band patterns from the gel image. We additionally propose an easy numbering scheme for the identification of separate isolates that will facilitate exchange of typing data. Seventy-two human and animal strains of Shiga-toxin-producing E. coli O157 were used for the development of the improved MLVA assay. The method is based on capillary separation of multiplexed PCR products of VNTR loci in the E. coli O157 genome labeled with multiple fluorescent dyes. The different alleles at each locus were then assigned to allele numbers, which were used for strain comparison.
MSeq-CNV: accurate detection of Copy Number Variation from Sequencing of Multiple samples.

PubMed

Malekpour, Seyed Amir; Pezeshk, Hamid; Sadeghi, Mehdi

2018-03-05

Currently a few tools are capable of detecting genome-wide Copy Number Variations (CNVs) based on sequencing of multiple samples. Although aberrations in mate pair insertion sizes provide additional hints for the CNV detection based on multiple samples, the majority of the current tools rely only on the depth of coverage. Here, we propose a new algorithm (MSeq-CNV) which allows detecting common CNVs across multiple samples. MSeq-CNV applies a mixture density for modeling aberrations in depth of coverage and abnormalities in the mate pair insertion sizes. Each component in this mixture density applies a Binomial distribution for modeling the number of mate pairs with aberration in the insertion size and also a Poisson distribution for emitting the read counts, in each genomic position. MSeq-CNV is applied on simulated data and also on real data of six HapMap individuals with high-coverage sequencing, in 1000 Genomes Project. These individuals include a CEU trio of European ancestry and a YRI trio of Nigerian ethnicity. Ancestry of these individuals is studied by clustering the identified CNVs. MSeq-CNV is also applied for detecting CNVs in two samples with low-coverage sequencing in 1000 Genomes Project and six samples from the Simons Genome Diversity Project.
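The mixture component described in this abstract combines a Poisson term for the read count with a Binomial term for the number of mate pairs with an aberrant insertion size. The sketch below evaluates such a component at a single genomic position and picks the most likely copy-number state. It assumes the two terms are independent given the state; the state names, mixture weights, and parameter values are hypothetical, and the way MSeq-CNV fits its parameters or combines samples is not given in the abstract and is not reproduced here.

```python
import math


def poisson_logpmf(k, lam):
    """Log-probability of k events under a Poisson distribution with mean lam > 0."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)


def binomial_logpmf(k, n, p):
    """Log-probability of k successes in n trials with success probability 0 < p < 1."""
    log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_choose + k * math.log(p) + (n - k) * math.log(1 - p)


def most_likely_state(read_count, n_pairs, n_aberrant, states):
    """Return the state maximising weight * Poisson(read_count) * Binomial(n_aberrant).

    `states` maps a state name to (mixture weight, Poisson mean, aberrant-pair rate).
    """
    def log_posterior(name):
        weight, lam, p = states[name]
        return (math.log(weight)
                + poisson_logpmf(read_count, lam)
                + binomial_logpmf(n_aberrant, n_pairs, p))
    return max(states, key=log_posterior)


if __name__ == "__main__":
    # Hypothetical parameters: (weight, expected read count, aberrant mate-pair rate).
    states = {
        "normal": (0.90, 30.0, 0.02),
        "deletion": (0.05, 15.0, 0.30),
        "duplication": (0.05, 45.0, 0.30),
    }
    print(most_likely_state(14, 10, 4, states))
    print(most_likely_state(30, 10, 0, states))
```

Working in log space avoids numerical underflow when many positions or samples are multiplied together.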
Comparative genomics of duplicate γ-glutamyl transferase genes in teleosts: medaka (Oryzias latipes), stickleback (Gasterosteus aculeatus), green spotted pufferfish (Tetraodon nigroviridis), fugu (Takifugu rubripes), and zebrafish (Danio rerio).

PubMed

Law, Sheran Hiu Wan; Redelings, Benjamin David; Kullman, Seth William

2012-01-15

The availability of multiple teleost (bony fish) genomes is providing unprecedented opportunities to understand the diversity and function of gene duplication events using comparative genomics. Here we examine multiple paralogous genes of γ-glutamyl transferase (GGT) in several distantly related teleost species including medaka, stickleback, green spotted pufferfish, fugu, and zebrafish. Through mining genome databases, we have identified multiple GGT orthologs. Duplicate (paralogous) GGT sequences for GGT1 (GGT1 a and b), GGTL1 (GGTL1 a and b), and GGTL3 (GGTL3 a and b) were identified for each species. Phylogenetic analysis suggests that GGTs are ancient proteins conserved across most metazoan phyla and those paralogous GGTs in teleosts likely arose from the serial 3R genome duplication events. A third GGTL1 gene (GGTL1c) was found in green spotted pufferfish; however, this gene is not present in medaka, stickleback, or fugu. Similarly, one or both paralogs of GGTL3 appear to have been lost in green spotted pufferfish, fugu, and zebrafish. Syntenic relationships were highly maintained between duplicated teleost chromosomes, among teleosts and across ray-finned (Actinopterygii) and lobe-finned (Sarcopterygii) species. To assess subfunction partitioning, six medaka GGT genes were cloned and assessed for developmental and tissue-specific expression. On the basis of these data, we propose a modification of the "duplication-degeneration-complementation" model of subfunction partitioning where quantitative differences rather than absolute differences in gene expression are observed between gene paralogs. Our results demonstrate that multiple GGT genes have been retained within teleost genomes. Questions remain, however, regarding the functional roles of multiple GGTs in these species. Copyright © 2011 Wiley Periodicals, Inc., A Wiley Company.
Chromosome catastrophes involve replication mechanisms generating complex genomic rearrangements

PubMed Central

Liu, Pengfei; Erez, Ayelet; Sreenath Nagamani, Sandesh C.; Dhar, Shweta U.; Kołodziejska, Katarzyna E.; Dharmadhikari, Avinash V.; Cooper, M. Lance; Wiszniewska, Joanna; Zhang, Feng; Withers, Marjorie A.; Bacino, Carlos A.; Campos-Acevedo, Luis Daniel; Delgado, Mauricio R.; Freedenberg, Debra; Garnica, Adolfo; Grebe, Theresa A.; Hernández-Almaguer, Dolores; Immken, LaDonna; Lalani, Seema R.; McLean, Scott D.; Northrup, Hope; Scaglia, Fernando; Strathearn, Lane; Trapane, Pamela; Kang, Sung-Hae L.; Patel, Ankita; Cheung, Sau Wai; Hastings, P. J.; Stankiewicz, Paweł; Lupski, James R.; Bi, Weimin

2011-01-01

SUMMARY Complex genomic rearrangements (CGR) consisting of two or more breakpoint junctions have been observed in genomic disorders. Recently, a chromosome catastrophe phenomenon termed chromothripsis, in which numerous genomic rearrangements are apparently acquired in one single catastrophic event, was described in multiple cancers. Here we show that constitutionally acquired CGRs share similarities with cancer chromothripsis. In the 17 CGR cases investigated we observed localization and multiple copy number changes including deletions, duplications and/or triplications, as well as extensive translocations and inversions. Genomic rearrangements involved varied in size and complexities; in one case, array comparative genomic hybridization revealed 18 copy number changes. Breakpoint sequencing identified characteristic features, including small templated insertions at breakpoints and microhomology at breakpoint junctions, which have been attributed to replicative processes. The resemblance between CGR and chromothripsis suggests similar mechanistic underpinnings. Such chromosome catastrophic events appear to reflect basic DNA metabolism operative throughout an organism’s life cycle. PMID:21925314
Recovering complete and draft population genomes from metagenome datasets

DOE PAGES

Sangwan, Naseer; Xia, Fangfang; Gilbert, Jack A.

2016-03-08

Assembly of metagenomic sequence data into microbial genomes is of fundamental value to improving our understanding of microbial ecology and metabolism by elucidating the functional potential of hard-to-culture microorganisms. Here, we provide a synthesis of available methods to bin metagenomic contigs into species-level groups and highlight how genetic diversity, sequencing depth, and coverage influence binning success. Despite the computational cost on application to deeply sequenced complex metagenomes (e.g., soil), covarying patterns of contig coverage across multiple datasets significantly improve the binning process. We also discuss and compare current genome validation methods and reveal how these methods tackle the problem of chimeric genome bins, i.e., sequences from multiple species. Finally, we explore how population genome assembly can be used to uncover biogeographic trends and to characterize the effect of in situ functional constraints on the genome-wide evolution.

Recovering complete and draft population genomes from metagenome datasets

DOE Office of Scientific and Technical Information (OSTI.GOV)

Sangwan, Naseer; Xia, Fangfang; Gilbert, Jack A.

Assembly of metagenomic sequence data into microbial genomes is of fundamental value to improving our understanding of microbial ecology and metabolism by elucidating the functional potential of hard-to-culture microorganisms. Here, we provide a synthesis of available methods to bin metagenomic contigs into species-level groups and highlight how genetic diversity, sequencing depth, and coverage influence binning success. Despite the computational cost on application to deeply sequenced complex metagenomes (e.g., soil), covarying patterns of contig coverage across multiple datasets significantly improve the binning process. We also discuss and compare current genome validation methods and reveal how these methods tackle the problem of chimeric genome bins, i.e., sequences from multiple species. Finally, we explore how population genome assembly can be used to uncover biogeographic trends and to characterize the effect of in situ functional constraints on the genome-wide evolution.
Comparison of domestic and foreign genotypes by country and continent

USDA-ARS's Scientific Manuscript database

Genomic evaluations for foreign animals are easily computed, and reliabilities are highest for animals well connected to the domestic reference population and managed in similar environments. Genomic and pedigree relationships, inbreeding, pedigree completeness, pedigree accuracy and genomic merit w...
Genome-wide association study of immunoglobulin light chain amyloidosis in three patient cohorts: comparison with myeloma.

PubMed

da Silva Filho, M I; Försti, A; Weinhold, N; Meziane, I; Campo, C; Huhn, S; Nickel, J; Hoffmann, P; Nöthen, M M; Jöckel, K-H; Landi, S; Mitchell, J S; Johnson, D; Morgan, G J; Houlston, R; Goldschmidt, H; Jauch, A; Milani, P; Merlini, G; Rowcieno, D; Hawkins, P; Hegenbart, U; Palladini, G; Wechalekar, A; Schönland, S O; Hemminki, K

2017-08-01

Immunoglobulin light chain (AL) amyloidosis is characterized by tissue deposition of amyloid fibers derived from immunoglobulin light chain. AL amyloidosis and multiple myeloma (MM) originate from monoclonal gammopathy of undetermined significance. We wanted to characterize germline susceptibility to AL amyloidosis using a genome-wide association study (GWAS) on 1229 AL amyloidosis patients from Germany, UK and Italy, and 7526 healthy local controls. For comparison with MM, recent GWAS data on 3790 cases were used. For AL amyloidosis, single nucleotide polymorphisms (SNPs) at 10 loci showed evidence of an association at P < 10⁻⁵ with homogeneity of results from the 3 sample sets; some of these were previously documented to influence MM risk, including the SNP at the IRF4 binding site. In AL amyloidosis, rs9344 at the splice site of cyclin D1, promoting translocation (11;14), reached the highest significance, P = 7.80 × 10⁻¹¹; the SNP was only marginally significant in MM. SNP rs79419269 close to gene SMARCD3 involved in chromatin remodeling was also significant (P = 5.2 × 10⁻⁸). These data provide evidence for common genetic susceptibility to AL amyloidosis and MM. Cyclin D1 is a more prominent driver in AL amyloidosis than in MM, but the links to aggregation of light chains need to be demonstrated.
Genomic identification of regulatory elements by evolutionary sequence comparison and functional analysis.

PubMed

Loots, Gabriela G

2008-01-01

Despite remarkable recent advances in genomics that have enabled us to identify most of the genes in the human genome, comparable efforts to define transcriptional cis-regulatory elements that control gene expression are lagging behind. The difficulty of this task stems from two equally important problems: our knowledge of how regulatory elements are encoded in genomes remains elementary, and there is a vast genomic search space for regulatory elements, since most of mammalian genomes are noncoding. Comparative genomic approaches are having a remarkable impact on the study of transcriptional regulation in eukaryotes and currently represent the most efficient and reliable methods of predicting noncoding sequences likely to control the patterns of gene expression. By subjecting eukaryotic genomic sequences to computational comparisons and subsequent experimentation, we are inching our way toward a more comprehensive catalog of common regulatory motifs that lie behind fundamental biological processes. We are still far from comprehending how the transcriptional regulatory code is encrypted in the human genome and providing an initial global view of regulatory gene networks, but collectively, the continued development of comparative and experimental approaches will rapidly expand our knowledge of the transcriptional regulome.
CRISPR/Cas9-Based Multiplex Genome Editing in Monocot and Dicot Plants.

PubMed

Ma, Xingliang; Liu, Yao-Guang

2016-07-01

The clustered regularly interspaced short palindromic repeats (CRISPR)/Cas9-mediated genome targeting system has been applied to a variety of organisms, including plants. Compared to other genome-targeting technologies such as zinc-finger nucleases (ZFNs) and transcription activator-like effector nucleases (TALENs), the CRISPR/Cas9 system is easier to use and has much higher editing efficiency. In addition, multiple "single guide RNAs" (sgRNAs) with different target sequences can be designed to direct the Cas9 protein to multiple genomic sites for simultaneous multiplex editing. Here, we present a procedure for highly efficient multiplex genome targeting in monocot and dicot plants using a versatile and robust CRISPR/Cas9 vector system, emphasizing the construction of binary constructs with multiple sgRNA expression cassettes in one round of cloning using Golden Gate ligation. We also describe the genotyping of targeted mutations in transgenic plants by direct Sanger sequencing followed by decoding of superimposed sequencing chromatograms containing biallelic or heterozygous mutations using the Web-based tool DSDecode. Copyright © 2016 John Wiley & Sons, Inc.
Insights into natural products biosynthesis from analysis of 490 polyketide synthases from Fusarium.

PubMed

Brown, Daren W; Proctor, Robert H

2016-04-01

Species of the fungus Fusarium collectively cause disease on almost all crop plants and produce numerous natural products (NPs), including some of the mycotoxins of greatest concern to agriculture. Many Fusarium NPs are derived from polyketide synthases (PKSs), large multi-domain enzymes that catalyze sequential condensation of simple carboxylic acids to form polyketides. To gain insight into the biosynthesis of polyketide-derived NPs in Fusarium, we retrieved 488 PKS gene sequences from genome sequences of 31 species of the fungus. In addition to these apparently functional PKS genes, the genomes collectively included 81 pseudogenized PKS genes. Phylogenetic analysis resolved the PKS genes into 67 clades, and based on multiple lines of evidence, we propose that homologs in each clade are responsible for synthesis of a polyketide that is distinct from those synthesized by PKSs in other clades. The presence and absence of PKS genes among the species examined indicated marked differences in distribution of PKS homologs. Comparisons of Fusarium PKS genes and genes flanking them to those from other Ascomycetes provided evidence that Fusarium has the genetic potential to synthesize multiple NPs that are the same or similar to those reported in other fungi, but that have not yet been reported in Fusarium. The results also highlight ways in which such analyses can help guide identification of novel Fusarium NPs and differences in NP biosynthetic capabilities that exist among fungi. Published by Elsevier Inc.
GWIPS‐viz as a tool for exploring ribosome profiling evidence supporting the synthesis of alternative proteoforms

PubMed Central

Michel, Audrey M.; Ahern, Anna M.; Donohue, Claire A.

2015-01-01

The boundaries of protein coding sequences are more difficult to define at the 5′ end than at the 3′ end due to potential multiple translation initiation sites (TISs). Even in the presence of phylogenetic data, the use of sequence information only may not be sufficient for the accurate identification of TISs. Traditional proteomics approaches may also fail because the N‐termini of newly synthesized proteins are often processed. Thus ribosome profiling (ribo‐seq), producing a snapshot of the ribosome distribution across the entire transcriptome, is an attractive experimental technique for the purpose of TIS location exploration. The GWIPS‐viz (Genome Wide Information on Protein Synthesis visualized) browser (http://gwips.ucc.ie) provides free access to the genomic alignments of ribo‐seq data and corresponding mRNA‐seq data along with relevant annotation tracks. In this brief, we illustrate how GWIPS‐viz can be used to explore the ribosome occupancy at the 5′ ends of protein coding genes to assess the activity of AUG and non‐AUG TISs responsible for the synthesis of proteoforms with alternative or heterogeneous N‐termini. The presence of ribo‐seq tracks for various organisms allows for cross‐species comparison of orthologous genes and the availability of datasets from multiple laboratories permits the assessment of the technical reproducibility of the ribosome densities. PMID:25736862
Recapitulating phylogenies using k-mers: from trees to networks.

PubMed

Bernard, Guillaume; Ragan, Mark A; Chan, Cheong Xin

2016-01-01

Ernst Haeckel based his landmark Tree of Life on the supposed ontogenic recapitulation of phylogeny, i.e. that successive embryonic stages during the development of an organism re-trace the morphological forms of its ancestors over the course of evolution. Much of this idea has since been discredited. Today, phylogenies are often based on families of molecular sequences. The standard approach starts with a multiple sequence alignment, in which the sequences are arranged relative to each other in a way that maximises a measure of similarity position-by-position along their entire length. A tree (or sometimes a network) is then inferred. Rigorous multiple sequence alignment is computationally demanding, and evolutionary processes that shape the genomes of many microbes (bacteria, archaea and some morphologically simple eukaryotes) can add further complications. In particular, recombination, genome rearrangement and lateral genetic transfer undermine the assumptions that underlie multiple sequence alignment, and imply that a tree-like structure may be too simplistic. Here, using genome sequences of 143 bacterial and archaeal genomes, we construct a network of phylogenetic relatedness based on the number of shared k -mers (subsequences at fixed length k ). Our findings suggest that the network captures not only key aspects of microbial genome evolution as inferred from a tree, but also features that are not treelike. The method is highly scalable, allowing for investigation of genome evolution across a large number of genomes. Instead of using specific regions or sequences from genome sequences, or indeed Haeckel's idea of ontogeny, we argue that genome phylogenies can be inferred using k -mers from whole-genome sequences. Representing these networks dynamically allows biological questions of interest to be formulated and addressed quickly and in a visually intuitive manner.
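The shared k-mer idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy sequences, the choice to count distinct k-mers (rather than occurrences), and the `min_shared` edge threshold are all assumptions made for illustration.

```python
from itertools import combinations

def kmers(seq, k):
    """Return the set of distinct length-k subsequences of seq."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def shared_kmer_network(genomes, k, min_shared=1):
    """Build a weighted edge list: each pair of genomes is linked
    by the number of distinct k-mers they have in common.
    Pairs sharing fewer than min_shared k-mers get no edge."""
    profiles = {name: kmers(seq, k) for name, seq in genomes.items()}
    edges = {}
    for a, b in combinations(sorted(profiles), 2):
        shared = len(profiles[a] & profiles[b])
        if shared >= min_shared:
            edges[(a, b)] = shared
    return edges

# Hypothetical toy "genomes"; real inputs would be whole-genome sequences.
toy = {"g1": "ACGTACGTGA", "g2": "ACGTACGTTT", "g3": "GGGGCCCCAA"}
network = shared_kmer_network(toy, k=4)
print(network)
```

Because no alignment is computed, the cost is dominated by k-mer extraction and set intersection, which is one reason such approaches can scale to many genomes.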
The Pathogen-Host Interactions database (PHI-base): additions and future developments

PubMed Central

Urban, Martin; Pant, Rashmi; Raghunath, Arathi; Irvine, Alistair G.; Pedro, Helder; Hammond-Kosack, Kim E.

2015-01-01

Rapidly evolving pathogens cause a diverse array of diseases and epidemics that threaten crop yield, food security as well as human, animal and ecosystem health. To combat infection greater comparative knowledge is required on the pathogenic process in multiple species. The Pathogen-Host Interactions database (PHI-base) catalogues experimentally verified pathogenicity, virulence and effector genes from bacterial, fungal and protist pathogens. Mutant phenotypes are associated with gene information. The included pathogens infect a wide range of hosts including humans, animals, plants, insects, fish and other fungi. The current version, PHI-base 3.6, available at http://www.phi-base.org, stores information on 2875 genes, 4102 interactions, 110 host species, 160 pathogenic species (103 plant, 3 fungal and 54 animal infecting species) and 181 diseases drawn from 1243 references. Phenotypic and gene function information has been obtained by manual curation of the peer-reviewed literature. A controlled vocabulary consisting of nine high-level phenotype terms permits comparisons and data analysis across the taxonomic space. PHI-base phenotypes were mapped via their associated gene information to reference genomes available in Ensembl Genomes. Virulence genes and hotspots can be visualized directly in genome browsers. Future plans for PHI-base include development of tools facilitating community-led curation and inclusion of the corresponding host target(s). PMID:25414340
Human CD4 T cell epitopes selective for Vaccinia versus Variola virus.

PubMed

Probst, Alicia; Besse, Aurore; Favry, Emmanuel; Imbert, Gilles; Tanchou, Valérie; Castelli, Florence Anne; Maillere, Bernard

2013-04-01

Due to the high degree of sequence identity between Orthopoxvirus species, the specific B and T cell responses raised against these viruses are largely cross-reactive and poorly selective. We therefore searched for CD4 T cell epitopes present in the conserved parts of the Vaccinia genome (VACV) but absent from Variola viruses (VARV), with a view to identifying immunogenic sequences selective for VACV. We identified three long peptide fragments from the B7R, B10R and E7R proteins by in silico comparisons of the poxvirus genomes, and evaluated the recognition of these fragments by VACV-specific T cell lines derived from healthy donors. For the 12 CD4 T cell epitopes identified, we assessed their binding to common HLA-DR allotypes and their capacity to induce peptide-specific CD4 T-cell lines. Four peptides from B7R and B10R displayed a broad binding specificity for HLA-DR molecules and induced multiple T cell lines from healthy donors. Besides their absence from VARV, the two B10R peptide sequences were mutated in the Cowpox virus and completely absent from the Monkeypox genome. This work contributes to the development of differential diagnosis of poxvirus infections. Copyright © 2012 Elsevier Ltd. All rights reserved.
Integrative Functional Genomics for Systems Genetics in GeneWeaver.org.

PubMed

Bubier, Jason A; Langston, Michael A; Baker, Erich J; Chesler, Elissa J

2017-01-01

The abundance of existing functional genomics studies permits an integrative approach to interpreting and resolving the results of diverse systems genetics studies. However, a major challenge lies in assembling and harmonizing heterogeneous data sets across species for facile comparison to the positional candidate genes and coexpression networks that come from systems genetic studies. GeneWeaver is an online database and suite of tools at www.geneweaver.org that allows for fast aggregation and analysis of gene set-centric data. GeneWeaver contains curated experimental data together with resource-level data such as GO annotations, MP annotations, and KEGG pathways, along with persistent stores of user entered data sets. These can be entered directly into GeneWeaver or transferred from widely used resources such as GeneNetwork.org. Data are analyzed using statistical tools and advanced graph algorithms to discover new relations, prioritize candidate genes, and generate function hypotheses. Here we use GeneWeaver to find genes common to multiple gene sets, prioritize candidate genes from a quantitative trait locus, and characterize a set of differentially expressed genes. Coupling a large multispecies repository curated and empirical functional genomics data to fast computational tools allows for the rapid integrative analysis of heterogeneous data for interpreting and extrapolating systems genetics results.
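Finding genes common to multiple gene sets, one of the uses mentioned above, reduces to counting set membership. The sketch below is a generic illustration and does not use or reproduce any GeneWeaver API; the function name, the `min_sets` parameter, and the example set contents are hypothetical.

```python
from collections import Counter

def genes_in_multiple_sets(gene_sets, min_sets=2):
    """Return {gene: number of sets containing it} for genes
    that appear in at least min_sets of the given gene sets."""
    counts = Counter(
        gene for members in gene_sets.values() for gene in set(members)
    )
    return {gene: n for gene, n in counts.items() if n >= min_sets}

# Hypothetical gene sets, e.g. a QTL candidate list and two expression studies.
example_sets = {
    "qtl_candidates": ["Gene1", "Gene2", "Gene3"],
    "expression_study_a": ["Gene2", "Gene3", "Gene4"],
    "expression_study_b": ["Gene3", "Gene5"],
}
common = genes_in_multiple_sets(example_sets)
print(common)
```

Ranking genes by the number of independent sets that contain them is one simple way to prioritize candidates; a real analysis would also need to map identifiers across species.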
A greedy, graph-based algorithm for the alignment of multiple homologous gene lists.

PubMed

Fostier, Jan; Proost, Sebastian; Dhoedt, Bart; Saeys, Yvan; Demeester, Piet; Van de Peer, Yves; Vandepoele, Klaas

2011-03-15

Many comparative genomics studies rely on the correct identification of homologous genomic regions using accurate alignment tools. In such case, the alphabet of the input sequences consists of complete genes, rather than nucleotides or amino acids. As optimal multiple sequence alignment is computationally impractical, a progressive alignment strategy is often employed. However, such an approach is susceptible to the propagation of alignment errors in early pairwise alignment steps, especially when dealing with strongly diverged genomic regions. In this article, we present a novel accurate and efficient greedy, graph-based algorithm for the alignment of multiple homologous genomic segments, represented as ordered gene lists. Based on provable properties of the graph structure, several heuristics are developed to resolve local alignment conflicts that occur due to gene duplication and/or rearrangement events on the different genomic segments. The performance of the algorithm is assessed by comparing the alignment results of homologous genomic segments in Arabidopsis thaliana to those obtained by using both a progressive alignment method and an earlier graph-based implementation. Especially for datasets that contain strongly diverged segments, the proposed method achieves a substantially higher alignment accuracy, and proves to be sufficiently fast for large datasets including a few dozens of eukaryotic genomes. http://bioinformatics.psb.ugent.be/software. The algorithm is implemented as a part of the i-ADHoRe 3.0 package.
Genomic and Epigenomic Alterations in Cancer.

PubMed

Chakravarthi, Balabhadrapatruni V S K; Nepal, Saroj; Varambally, Sooryanarayana

2016-07-01

Multiple genetic and epigenetic events characterize tumor progression and define the identity of the tumors. Advances in high-throughput technologies, like gene expression profiling, next-generation sequencing, proteomics, and metabolomics, have enabled detailed molecular characterization of various tumors. The integration and analyses of these high-throughput data have unraveled many novel molecular aberrations and network alterations in tumors. These molecular alterations include multiple cancer-driving mutations, gene fusions, amplification, deletion, and post-translational modifications, among others. Many of these genomic events are being used in cancer diagnosis, whereas others are therapeutically targeted with small-molecule inhibitors. Multiple genes/enzymes that play a role in DNA and histone modifications are also altered in various cancers, changing the epigenomic landscape during cancer initiation and progression. Apart from protein-coding genes, studies are uncovering the critical regulatory roles played by noncoding RNAs and noncoding regions of the genome during cancer progression. Many of these genomic and epigenetic events function in tandem to drive tumor development and metastasis. Concurrent advances in genome-modulating technologies, like gene silencing and genome editing, are providing ability to understand in detail the process of cancer initiation, progression, and signaling as well as opening up avenues for therapeutic targeting. In this review, we discuss some of the recent advances in cancer genomic and epigenomic research. Copyright © 2016 American Society for Investigative Pathology. Published by Elsevier Inc. All rights reserved.
Multiple origins of interdependent endosymbiotic complexes in a genus of cicadas.

PubMed

Łukasik, Piotr; Nazario, Katherine; Van Leuven, James T; Campbell, Matthew A; Meyer, Mariah; Michalik, Anna; Pessacq, Pablo; Simon, Chris; Veloso, Claudio; McCutcheon, John P

2018-01-09

Bacterial endosymbionts that provide nutrients to hosts often have genomes that are extremely stable in structure and gene content. In contrast, the genome of the endosymbiont Hodgkinia cicadicola has fractured into multiple distinct lineages in some species of the cicada genus Tettigades. To better understand the frequency, timing, and outcomes of Hodgkinia lineage splitting throughout this cicada genus, we sampled cicadas over three field seasons in Chile and performed genomics and microscopy on representative samples. We found that a single ancestral Hodgkinia lineage has split at least six independent times in Tettigades over the last 4 million years, resulting in complexes of between two and six distinct Hodgkinia lineages per host. Individual genomes in these symbiotic complexes differ dramatically in relative abundance, genome size, organization, and gene content. Each Hodgkinia lineage retains a small set of core genes involved in genetic information processing, but the high level of gene loss experienced by all genomes suggests that extensive sharing of gene products among symbiont cells must occur. In total, Hodgkinia complexes that consist of multiple lineages encode nearly complete sets of genes present on the ancestral single lineage and presumably perform the same functions as symbionts that have not undergone splitting. However, differences in the timing of the splits, along with dissimilar gene loss patterns on the resulting genomes, have led to very different outcomes of lineage splitting in extant cicadas.
Evaluating cell lines as tumour models by comparison of genomic profiles

PubMed Central

Domcke, Silvia; Sinha, Rileen; Levine, Douglas A.; Sander, Chris; Schultz, Nikolaus

2013-01-01

Cancer cell lines are frequently used as in vitro tumour models. Recent molecular profiles of hundreds of cell lines from The Cancer Cell Line Encyclopedia and thousands of tumour samples from the Cancer Genome Atlas now allow a systematic genomic comparison of cell lines and tumours. Here we analyse a panel of 47 ovarian cancer cell lines and identify those that have the highest genetic similarity to ovarian tumours. Our comparison of copy-number changes, mutations and mRNA expression profiles reveals pronounced differences in molecular profiles between commonly used ovarian cancer cell lines and high-grade serous ovarian cancer tumour samples. We identify several rarely used cell lines that more closely resemble cognate tumour profiles than commonly used cell lines, and we propose these lines as the most suitable models of ovarian cancer. Our results indicate that the gap between cell lines and tumours can be bridged by genomically informed choices of cell line models for all tumour types. PMID:23839242
Applications of the 1000 Genomes Project resources

PubMed Central

Zheng-Bradley, Xiangqun

2017-01-01

The 1000 Genomes Project created a valuable, worldwide reference for human genetic variation. Common uses of the 1000 Genomes dataset include genotype imputation supporting Genome-wide Association Studies, mapping expression Quantitative Trait Loci, filtering non-pathogenic variants from exome, whole genome and cancer genome sequencing projects, and genetic analysis of population structure and molecular evolution. In this article, we will highlight some of the multiple ways that the 1000 Genomes data can be and has been utilized for genetic studies. PMID:27436001
An analysis of synteny of Arachis with Lotus and Medicago sheds new light on the structure, stability and evolution of legume genomes

PubMed Central

Bertioli, David J; Moretzsohn, Marcio C; Madsen, Lene H; Sandal, Niels; Leal-Bertioli, Soraya CM; Guimarães, Patricia M; Hougaard, Birgit K; Fredslund, Jakob; Schauser, Leif; Nielsen, Anna M; Sato, Shusei; Tabata, Satoshi; Cannon, Steven B; Stougaard, Jens

2009-01-01

Background Most agriculturally important legumes fall within two sub-clades of the Papilionoid legumes: the Phaseoloids and Galegoids, which diverged about 50 Mya. The Phaseoloids are mostly tropical and include crops such as common bean and soybean. The Galegoids are mostly temperate and include clover, fava bean and the model legumes Lotus and Medicago (both with substantially sequenced genomes). In contrast, peanut (Arachis hypogaea) falls in the Dalbergioid clade which is more basal in its divergence within the Papilionoids. The aim of this work was to integrate the genetic map of Arachis with Lotus and Medicago and improve our understanding of the Arachis genome and legume genomes in general. To do this we placed on the Arachis map, comparative anchor markers defined using a previously described bioinformatics pipeline. Also we investigated the possible role of transposons in the patterns of synteny that were observed. Results The Arachis genetic map was substantially aligned with Lotus and Medicago with most synteny blocks presenting a single main affinity to each genome. This indicates that the last common whole genome duplication within the Papilionoid legumes predated the divergence of Arachis from the Galegoids and Phaseoloids sufficiently that the common ancestral genome was substantially diploidized. The Arachis and model legume genomes comparison made here, together with a previously published comparison of Lotus and Medicago allowed all possible Arachis-Lotus-Medicago species by species comparisons to be made and genome syntenies observed. Distinct conserved synteny blocks and non-conserved regions were present in all genome comparisons, implying that certain legume genomic regions are consistently more stable during evolution than others. We found that in Medicago and possibly also in Lotus, retrotransposons tend to be more frequent in the variable regions. 
Furthermore, while these variable regions generally have lower densities of single copy genes than the more conserved regions, some harbor high densities of the fast evolving disease resistance genes. Conclusion We suggest that gene space in Papilionoids may be divided into two broadly defined components: more conserved regions which tend to have low retrotransposon densities and are relatively stable during evolution; and variable regions that tend to have high retrotransposon densities, and whose frequent restructuring may fuel the evolution of some gene families. PMID:19166586
A Genome Wide Association Study Identifies Common Variants Associated with Lipid Levels in the Chinese Population

PubMed Central

Wu, Chen; Yang, Handong; Yu, Dianke; Yang, Xiaobo; Zhang, Xiaomin; Wang, Yiqin; Sun, Jielin; Gao, Yong; Tan, Aihua; He, Yunfeng; Zhang, Haiying; Qin, Xue; Zhu, Jingwen; Li, Huaixing; Lin, Xu; Zhu, Jiang; Min, Xinwen; Lang, Mingjian; Li, Dongfeng; Zhai, Kan; Chang, Jiang; Tan, Wen; Yuan, Jing; Chen, Weihong; Wang, Youjie; Wei, Sheng; Miao, Xiaoping; Wang, Feng; Fang, Weimin; Liang, Yuan; Deng, Qifei; Dai, Xiayun; Lin, Dafeng; Huang, Suli; Guo, Huan; Lilly Zheng, S.; Xu, Jianfeng; Lin, Dongxin; Hu, Frank B.; Wu, Tangchun

2013-01-01

Plasma lipid levels are important risk factors for cardiovascular disease and are influenced by genetic and environmental factors. Recent genome wide association studies (GWAS) have identified several lipid-associated loci, but these loci have been identified primarily in European populations. In order to identify genetic markers for lipid levels in a Chinese population and analyze the heterogeneity between Europeans and Asians, especially Chinese, we performed a meta-analysis of two genome wide association studies on four common lipid traits including total cholesterol (TC), triglycerides (TG), low-density lipoprotein cholesterol (LDL) and high-density lipoprotein cholesterol (HDL) in a Han Chinese population totaling 3,451 healthy subjects. Replication was performed in an additional 8,830 subjects of Han Chinese ethnicity. We replicated eight loci associated with lipid levels previously reported in a European population. The loci genome wide significantly associated with TC were near DOCK7, HMGCR and ABO; those genome wide significantly associated with TG were near APOA1/C3/A4/A5 and LPL; those genome wide significantly associated with LDL were near HMGCR, ABO and TOMM40; and those genome wide significantly associated with HDL were near LPL, LIPC and CETP. In addition, an additive genotype score of eight SNPs representing the eight loci that were found to be associated with lipid levels was associated with higher TC, TG and LDL levels (P = 5.52×10⁻¹⁶, 1.38×10⁻⁶ and 5.59×10⁻⁹, respectively). These findings suggest the cumulative effects of multiple genetic loci on plasma lipid levels. Comparisons with previous GWAS of lipids highlight heterogeneity in allele frequency and in effect size for some loci between Chinese and European populations. The results from our GWAS provided comprehensive and convincing evidence of the genetic determinants of plasma lipid levels in a Chinese population. PMID:24386095
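An additive genotype score of the kind mentioned above is commonly computed by counting risk alleles across the selected SNPs. The sketch below assumes an unweighted count (0, 1 or 2 risk alleles per SNP); the abstract does not say whether the study weighted alleles by effect size, and the SNP identifiers, alleles and function name here are hypothetical placeholders.

```python
def additive_genotype_score(genotypes, risk_alleles):
    """Sum the number of risk alleles carried across all scored SNPs.

    genotypes:    {snp_id: two-letter genotype string, e.g. "AG"}
    risk_alleles: {snp_id: single-letter risk allele, e.g. "A"}
    """
    score = 0
    for snp, risk in risk_alleles.items():
        # str.count gives 0, 1 or 2 copies of the risk allele.
        score += genotypes[snp].count(risk)
    return score

# Hypothetical three-SNP example (the study used eight SNPs).
risk = {"snp1": "A", "snp2": "G", "snp3": "A"}
person = {"snp1": "AG", "snp2": "GG", "snp3": "CT"}
score = additive_genotype_score(person, risk)
print(score)
```

The score can then be tested for association with each lipid trait, for example as a predictor in a regression model.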
Metagenomic Signatures of Bacterial Adaptation to Life in the Phyllosphere of a Salt-Secreting Desert Tree.

PubMed

Finkel, Omri M; Delmont, Tom O; Post, Anton F; Belkin, Shimshon

2016-05-01

The leaves of Tamarix aphylla, a globally distributed, salt-secreting desert tree, are dotted with alkaline droplets of high salinity. To successfully inhabit these organic carbon-rich droplets, bacteria need to be adapted to multiple stress factors, including high salinity, high alkalinity, high UV radiation, and periodic desiccation. To identify genes that are important for survival in this harsh habitat, microbial community DNA was extracted from the leaf surfaces of 10 Tamarix aphylla trees along a 350-km longitudinal gradient. Shotgun metagenomic sequencing, contig assembly, and binning yielded 17 genome bins, six of which were >80% complete. These genomic bins, representing three phyla (Proteobacteria, Bacteroidetes, and Firmicutes), were closely related to halophilic and alkaliphilic taxa isolated from aquatic and soil environments. Comparison of these genomic bins to the genomes of their closest relatives revealed functional traits characteristic of bacterial populations inhabiting the Tamarix phyllosphere, independent of their taxonomic affiliation. These functions, most notably light-sensing genes, are postulated to represent important adaptations toward colonization of this habitat. Plant leaves are an extensive and diverse microbial habitat, forming the main interface between solar energy and the terrestrial biosphere. There are hundreds of thousands of plant species in the world, exhibiting a wide range of morphologies, leaf surface chemistries, and ecological ranges. In order to understand the core adaptations of microorganisms to this habitat, it is important to diversify the type of leaves that are studied. This study provides an analysis of the genomic content of the most abundant bacterial inhabitants of the globally distributed, salt-secreting desert tree Tamarix aphylla. Draft genomes of these bacteria were assembled, using the culture-independent technique of assembly and binning of metagenomic data. 
Analysis of the genomes reveals traits that are important for survival in this habitat, most notably, light-sensing and light utilization genes. Copyright © 2016, American Society for Microbiology. All Rights Reserved.
Segmental duplications: evolution and impact among the current Lepidoptera genomes.

PubMed

Zhao, Qian; Ma, Dongna; Vasseur, Liette; You, Minsheng

2017-07-06

Structural variation among genomes is now viewed to be as important as single nucleotide polymorphisms in influencing the phenotype and evolution of a species. Segmental duplication (SD) is defined as segments of DNA with homologous sequence. Here, we performed a systematic analysis of segmental duplications (SDs) among five lepidopteran reference genomes (Plutella xylostella, Danaus plexippus, Bombyx mori, Manduca sexta and Heliconius melpomene) to understand their potential impact on the evolution of these species. We find that the SDs content differed substantially among species, ranging from 1.2% of the genome in B. mori to 15.2% in H. melpomene. Most SDs formed very high identity (similarity higher than 90%) blocks but had very few large blocks. Comparative analysis showed that most of the SDs arose after the divergence of each lineage and we found that P. xylostella and H. melpomene showed more duplications than other species, suggesting they might be able to tolerate extensive levels of variation in their genomes. Conserved ancestral and species specific SD events were assessed, revealing multiple examples of the gain, loss or maintenance of SDs over time. SDs content analysis showed that most of the genes embedded in SDs regions belonged to species-specific SDs ("Unique" SDs). Functional analysis of these genes suggested their potential roles in the lineage-specific evolution. SDs and flanking regions often contained transposable elements (TEs) and this association suggested some involvement in SDs formation. Further studies on comparison of gene expression level between SDs and non-SDs showed that the expression level of genes embedded in SDs was significantly lower, suggesting that structure changes in the genomes are involved in gene expression differences in species. The results showed that most of the SDs were "unique SDs", which originated after species formation. Functional analysis suggested that SDs might play different roles in different species. 
Our results provide a valuable resource beyond the genetic mutation to explore the genome structure for future Lepidoptera research.

Lineage tracing of genome-edited alleles reveals high fidelity axolotl limb regeneration.

PubMed

Flowers, Grant Parker; Sanor, Lucas D; Crews, Craig M

2017-09-16

Salamanders are unparalleled among tetrapods in their ability to regenerate many structures, including entire limbs, and the study of this ability may provide insights into human regenerative therapies. The complex structure of the limb poses challenges to the investigation of the cellular and molecular basis of its regeneration. Using CRISPR/Cas, we genetically labelled unique cell lineages within the developing axolotl embryo and tracked the frequency of each lineage within amputated and fully regenerated limbs. This allowed us, for the first time, to assess the contributions of multiple low frequency cell lineages to the regenerating limb at once. Our comparisons reveal that regenerated limbs are high fidelity replicas of the originals even after repeated amputations.
Comparison of the complete genome sequences of four γ-hexachlorocyclohexane-degrading bacterial strains: insights into the evolution of bacteria able to degrade a recalcitrant man-made pesticide.

PubMed

Tabata, Michiro; Ohhata, Satoshi; Nikawadori, Yuki; Kishida, Kouhei; Sato, Takuya; Kawasumi, Toru; Kato, Hiromi; Ohtsubo, Yoshiyuki; Tsuda, Masataka; Nagata, Yuji

2016-12-01

γ-Hexachlorocyclohexane (γ-HCH) is a recalcitrant man-made chlorinated pesticide. Here, the complete genome sequences of four γ-HCH-degrading sphingomonad strains, which are most unlikely to have been derived from one ancestral γ-HCH degrader, were compared. Together with several experimental data, we showed that (i) all the four strains carry almost identical linA to linE genes for the conversion of γ-HCH to maleylacetate (designated "specific" lin genes), (ii) considerably different genes are used for the metabolism of maleylacetate in one of the four strains, and (iii) the linKLMN genes for the putative ABC transporter necessary for γ-HCH utilization exhibit structural divergence, which reflects the phylogenetic relationship of their hosts. Replicon organization and location of the lin genes in the four genomes differ significantly from one another, and most of the specific lin genes are located on multiple sphingomonad-unique plasmids. Copies of IS6100, the most abundant insertion sequence in the four strains, are often located in close proximity to the specific lin genes. Analysis of the footprints of target duplication upon IS6100 transposition and the experimental detection of IS6100 transposition strongly suggested that the IS6100 transposition has caused dynamic genome rearrangements and the diversification of lin-flanking regions in the four strains. © The Author 2016. Published by Oxford University Press on behalf of Kazusa DNA Research Institute.
Whole genome analysis of an MDR Beijing/W strain of Mycobacterium tuberculosis with large genomic deletions associated with resistance to isoniazid.

PubMed

Zhang, Qiufen; Wan, Baoshan; Zhou, Aiping; Ni, Jinjing; Xu, Zhihong; Li, Shuxian; Tao, Jing; Yao, YuFeng

2016-05-15

Mycobacterium tuberculosis (M.tb) is one of the most prevalent bacterial pathogens in the world. With its wide geographical spread and hypervirulence, the Beijing/W family is the most successful M.tb lineage. China is a country with a high tuberculosis (TB) burden and a high multiple drug-resistant TB (MDR-TB) burden, and Beijing/W family strains account for the largest share of MDR strains. To study the genetic basis of the virulence and drug resistance of Beijing/W family strains, we performed whole genome sequencing of M.tb strain W146, a clinical Beijing/W genotype MDR strain isolated from Wuxi, Jiangsu province, China. Compared with the genome sequence of M.tb strain H37Rv, we found that strain W146 lacks three large fragments and that the loss of the furA-katG operon confers isoniazid resistance. Besides the loss of the furA-katG operon, strain W146 harbored almost all known drug resistance-associated mutations. Comparative analysis of single nucleotide polymorphisms (SNPs) and indels between strain W146 and Beijing/W genotype strains and non-Beijing/W genotype strains revealed that strain W146 possessed some unique mutations, which may be related to drug resistance, transmission and pathogenicity. These findings will help to understand the large sequence polymorphisms (LSPs) and the transmission and drug resistance related genetic characteristics of the Beijing/W genotype of M.tb. Copyright © 2016 Elsevier B.V. All rights reserved.
Protein Sequence Annotation Tool (PSAT): A centralized web-based meta-server for high-throughput sequence annotations

DOE Office of Scientific and Technical Information (OSTI.GOV)

Leung, Elo; Huang, Amy; Cadag, Eithon

In this study, we introduce the Protein Sequence Annotation Tool (PSAT), a web-based, sequence annotation meta-server for performing integrated, high-throughput, genome-wide sequence analyses. Our goals in building PSAT were to (1) create an extensible platform for integration of multiple sequence-based bioinformatics tools, (2) enable functional annotations and enzyme predictions over large input protein fasta data sets, and (3) provide a web interface for convenient execution of the tools. In this paper, we demonstrate the utility of PSAT by annotating the predicted peptide gene products of Herbaspirillum sp. strain RV1423, importing the results of PSAT into EC2KEGG, and using the resulting functional comparisons to identify a putative catabolic pathway, thereby distinguishing RV1423 from a well annotated Herbaspirillum species. This analysis demonstrates that high-throughput enzyme predictions, provided by PSAT processing, can be used to identify metabolic potential in an otherwise poorly annotated genome. Lastly, PSAT is a meta server that combines the results from several sequence-based annotation and function prediction codes, and is available at http://psat.llnl.gov/psat/. PSAT stands apart from other sequence-based genome annotation systems in providing a high-throughput platform for rapid de novo enzyme predictions and sequence annotations over large input protein sequence data sets in FASTA. PSAT is most appropriately applied in annotation of large protein FASTA sets that may or may not be associated with a single genome.
Common genetic variants related to genomic integrity and risk of papillary thyroid cancer

PubMed Central

Neta, Gila; Brenner, Alina V.; Sturgis, Erich M.; Pfeiffer, Ruth M.; Hutchinson, Amy A.; Aschebrook-Kilfoy, Briseis; Yeager, Meredith; Xu, Li; Wheeler, William; Abend, Michael; Ron, Elaine; Tucker, Margaret A.; Chanock, Stephen J.; Sigurdson, Alice J.

2011-01-01

DNA damage is an important mechanism in carcinogenesis, so genes related to maintaining genomic integrity may influence papillary thyroid cancer (PTC) risk. Candidate gene studies targeting some of these genes have identified only a few polymorphisms associated with risk of PTC. Here, we expanded the scope of previous candidate studies by increasing the number and coverage of genes related to maintenance of genomic integrity. We evaluated 5077 tag single-nucleotide polymorphisms (SNPs) from 340 candidate gene regions hypothesized to be involved in DNA repair, epigenetics, tumor suppression, apoptosis, telomere function and cell cycle control and signaling pathways in a case–control study of 344 PTC cases and 452 matched controls. We estimated odds ratios for associations of single SNPs with PTC risk and combined P values for SNPs in the same gene region or pathway to obtain gene region-specific or pathway-specific P values using adaptive rank-truncated product methods. Nine SNPs had P values <0.0005, three of which were in HDAC4 and were inversely related to PTC risk. After multiple comparisons adjustment, no SNPs remained associated with PTC risk. Seven gene regions were associated with PTC risk at P < 0.01, including HUS1, ALKBH3, HDAC4, BAK1, FAF1_CDKN2C, DACT3 and FZD6. Our results suggest a possible role of genes involved in maintenance of genomic integrity in relation to risk of PTC. PMID:21642358
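The single-SNP association step in the abstract above can be illustrated with a minimal sketch. It computes an unadjusted odds ratio with a Wald 95% confidence interval from a 2×2 table, plus a Bonferroni threshold as one familiar multiple-comparisons adjustment. The function names and the example counts are hypothetical, and the abstract does not say which adjustment the authors applied, so Bonferroni is an assumption used only for illustration.

```python
import math


def odds_ratio(case_carriers, case_noncarriers, control_carriers, control_noncarriers):
    """Return (odds ratio, lower, upper) with a Wald 95% CI for a 2x2 table."""
    estimate = (case_carriers * control_noncarriers) / (case_noncarriers * control_carriers)
    # Standard error of log(OR) is the root of the summed reciprocal cell counts.
    se = math.sqrt(
        1 / case_carriers + 1 / case_noncarriers
        + 1 / control_carriers + 1 / control_noncarriers
    )
    lower = math.exp(math.log(estimate) - 1.96 * se)
    upper = math.exp(math.log(estimate) + 1.96 * se)
    return estimate, lower, upper


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under a Bonferroni correction (assumed method)."""
    return alpha / n_tests


# Hypothetical counts, not data from the study.
estimate, lower, upper = odds_ratio(10, 20, 5, 40)
threshold = bonferroni_threshold(0.05, 5077)  # 5077 tag SNPs, as stated in the abstract
```

Under this assumed correction, a nominal P value of 0.0005 is well above the per-test threshold for 5077 tests, which is consistent with the abstract's report that no single SNP remained associated after adjustment.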
Protein Sequence Annotation Tool (PSAT): A centralized web-based meta-server for high-throughput sequence annotations

DOE PAGES

Leung, Elo; Huang, Amy; Cadag, Eithon; ...

2016-01-20

In this study, we introduce the Protein Sequence Annotation Tool (PSAT), a web-based, sequence annotation meta-server for performing integrated, high-throughput, genome-wide sequence analyses. Our goals in building PSAT were to (1) create an extensible platform for integration of multiple sequence-based bioinformatics tools, (2) enable functional annotations and enzyme predictions over large input protein FASTA data sets, and (3) provide a web interface for convenient execution of the tools. In this paper, we demonstrate the utility of PSAT by annotating the predicted peptide gene products of Herbaspirillum sp. strain RV1423, importing the results of PSAT into EC2KEGG, and using the resulting functional comparisons to identify a putative catabolic pathway, thereby distinguishing RV1423 from a well annotated Herbaspirillum species. This analysis demonstrates that high-throughput enzyme predictions, provided by PSAT processing, can be used to identify metabolic potential in an otherwise poorly annotated genome. Lastly, PSAT is a meta server that combines the results from several sequence-based annotation and function prediction codes, and is available at http://psat.llnl.gov/psat/. PSAT stands apart from other sequence-based genome annotation systems in providing a high-throughput platform for rapid de novo enzyme predictions and sequence annotations over large input protein sequence data sets in FASTA. PSAT is most appropriately applied in annotation of large protein FASTA sets that may or may not be associated with a single genome.
Gene amplification confers glyphosate resistance in Amaranthus palmeri

PubMed Central

Gaines, Todd A.; Zhang, Wenli; Wang, Dafu; Bukun, Bekir; Chisholm, Stephen T.; Shaner, Dale L.; Nissen, Scott J.; Patzoldt, William L.; Tranel, Patrick J.; Culpepper, A. Stanley; Grey, Timothy L.; Webster, Theodore M.; Vencill, William K.; Sammons, R. Douglas; Jiang, Jiming; Preston, Christopher; Leach, Jan E.; Westra, Philip

2009-01-01

The herbicide glyphosate became widely used in the United States and other parts of the world after the commercialization of glyphosate-resistant crops. These crops have constitutive overexpression of a glyphosate-insensitive form of the herbicide target site gene, 5-enolpyruvylshikimate-3-phosphate synthase (EPSPS). Increased use of glyphosate over multiple years imposes selective genetic pressure on weed populations. We investigated recently discovered glyphosate-resistant Amaranthus palmeri populations from Georgia, in comparison with normally sensitive populations. EPSPS enzyme activity from resistant and susceptible plants was equally inhibited by glyphosate, which led us to use quantitative PCR to measure relative copy numbers of the EPSPS gene. Genomes of resistant plants contained from 5-fold to more than 160-fold more copies of the EPSPS gene than did genomes of susceptible plants. Quantitative RT-PCR on cDNA revealed that EPSPS expression was positively correlated with genomic EPSPS relative copy number. Immunoblot analyses showed that increased EPSPS protein level also correlated with EPSPS genomic copy number. EPSPS gene amplification was heritable, correlated with resistance in pseudo-F2 populations, and is proposed to be the molecular basis of glyphosate resistance. FISH revealed that EPSPS genes were present on every chromosome and, therefore, gene amplification was likely not caused by unequal chromosome crossing over. This occurrence of gene amplification as an herbicide resistance mechanism in a naturally occurring weed population is particularly significant because it could threaten the sustainable use of glyphosate-resistant crop technology. PMID:20018685
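As a rough illustration of how a relative gene copy number can be derived from quantitative PCR threshold cycles, the sketch below uses the common 2^(-ΔΔCt) calculation. This is an assumption: the abstract does not state which quantification model or reference gene the authors used, and the function name, arguments, and Ct values here are hypothetical. The formula also assumes roughly 100% amplification efficiency for both amplicons.

```python
def relative_copy_number(ct_target_sample, ct_ref_sample,
                         ct_target_calibrator, ct_ref_calibrator):
    """Relative copy number of a target gene by the 2^(-ddCt) method.

    Each Ct is a qPCR threshold cycle. The reference gene is assumed to be
    single-copy and unchanged between the sample and the calibrator.
    """
    delta_sample = ct_target_sample - ct_ref_sample
    delta_calibrator = ct_target_calibrator - ct_ref_calibrator
    ddct = delta_sample - delta_calibrator
    return 2.0 ** (-ddct)


# Hypothetical Ct values: the target crosses threshold 5 cycles earlier in the
# sample than in the calibrator, relative to the reference gene.
fold = relative_copy_number(20.0, 25.0, 25.0, 25.0)
```

Each cycle of difference corresponds to a doubling under the stated efficiency assumption, so a 5-cycle shift corresponds to a 32-fold difference.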
Patterns of genome evolution that have accompanied host adaptation in Salmonella

PubMed Central

Langridge, Gemma C.; Fookes, Maria; Connor, Thomas R.; Feltwell, Theresa; Feasey, Nicholas; Parsons, Bryony N.; Seth-Smith, Helena M. B.; Barquist, Lars; Stedman, Anna; Humphrey, Tom; Wigley, Paul; Peters, Sarah E.; Maskell, Duncan J.; Corander, Jukka; Chabalgoity, Jose A.; Barrow, Paul; Parkhill, Julian; Dougan, Gordon; Thomson, Nicholas R.

2015-01-01

Many bacterial pathogens are specialized, infecting one or few hosts, and this is often associated with more acute disease presentation. Specific genomes show markers of this specialization, which often reflect a balance between gene acquisition and functional gene loss. Within Salmonella enterica subspecies enterica, a single lineage exists that includes human and animal pathogens adapted to cause infection in different hosts, including S. enterica serovar Enteritidis (multiple hosts), S. Gallinarum (birds), and S. Dublin (cattle). This provides an excellent evolutionary context in which differences between these pathogen genomes can be related to host range. Genome sequences were obtained from ∼60 isolates selected to represent the known diversity of this lineage. Examination and comparison of the clades within the phylogeny of this lineage revealed signs of host restriction as well as evolutionary events that mark a path to host generalism. We have identified the nature and order of events for both evolutionary trajectories. The impact of functional gene loss was predicted based upon position within metabolic pathways and confirmed with phenotyping assays. The structure of S. Enteritidis is more complex than previously known, as a second clade of S. Enteritidis was revealed that is distinct from those commonly seen to cause disease in humans or animals, and that is more closely related to S. Gallinarum. Isolates from this second clade were tested in a chick model of infection and exhibited a reduced colonization phenotype, which we postulate represents an intermediate stage in pathogen–host adaptation. PMID:25535353
USE OF COMPETITIVE DNA HYBRIDIZATION TO IDENTIFY DIFFERENCES IN THE GENOMES OF TWO CLOSELY RELATED FECAL INDICATOR BACTERIA

EPA Science Inventory

Although recent technological advances in DNA sequencing and computational biology now allow scientists to compare entire microbial genomes, comparisons of closely related bacterial species and individual isolates by whole-genome sequencing approaches remains prohibitively expens...
Genomic islands of divergence are not affected by geography of speciation in sunflowers.

PubMed

Renaut, S; Grassa, C J; Yeaman, S; Moyers, B T; Lai, Z; Kane, N C; Bowers, J E; Burke, J M; Rieseberg, L H

2013-01-01

Genomic studies of speciation often report the presence of highly differentiated genomic regions interspersed within a milieu of weakly diverged loci. The formation of these speciation islands is generally attributed to reduced inter-population gene flow near loci under divergent selection, but few studies have critically evaluated this hypothesis. Here, we report on transcriptome scans among four recently diverged pairs of sunflower (Helianthus) species that vary in the geographical context of speciation. We find that genetic divergence is lower in sympatric and parapatric comparisons, consistent with a role for gene flow in eroding neutral differences. However, genomic islands of divergence are numerous and small in all comparisons, and contrary to expectations, island number and size are not significantly affected by levels of interspecific gene flow. Rather, island formation is strongly associated with reduced recombination rates. Overall, our results indicate that the functional architecture of genomes plays a larger role in shaping genomic divergence than does the geography of speciation.
Genome Analysis of Staphylococcus agnetis, an Agent of Lameness in Broiler Chickens

PubMed Central

Ojha, Sohita; Pummill, Jeff F.; Koon, Joseph A.; Wideman, Robert F.; Rhoads, Douglas D.

2015-01-01

Lameness in broiler chickens is a significant animal welfare and financial issue. Lameness can be enhanced by rearing young broilers on wire flooring. We have identified Staphylococcus agnetis as significantly involved in bacterial chondronecrosis with osteomyelitis (BCO) in proximal tibia and femorae, leading to lameness in broiler chickens in the wire floor system. Administration of S. agnetis in water induces lameness. Previously reported in some cases of cattle mastitis, this is the first report of this poorly described pathogen in chickens. We used long and short read next generation sequencing to assemble single finished contigs for the genome and a large plasmid from the chicken pathogen. Comparison of the S. agnetis genome to those of other pathogenic Staphylococci shows that S. agnetis contains a distinct repertoire of virulence determinants. Additionally, the S. agnetis genome has several regions that differ substantially from the genomes of other pathogenic Staphylococci. Comparison of our finished genome to a recent draft genome for a cattle mastitis isolate suggests that future investigations focus on the evolutionary epidemiology of this emerging pathogen of domestic animals. PMID:26606420
Conserved microstructure of the Brassica B Genome of Brassica nigra in relation to homologous regions of Arabidopsis thaliana, B. rapa and B. oleracea

PubMed Central

2013-01-01

Background The Brassica B genome is known to carry several important traits, yet there have been limited analyses of its underlying genome structure, especially in comparison to the closely related A and C genomes. A bacterial artificial chromosome (BAC) library of Brassica nigra was developed and screened with 17 genes from a 222 kb region of A. thaliana that had been well characterised in both the Brassica A and C genomes. Results Fingerprinting of 483 apparently non-redundant clones defined physical contigs for the corresponding regions in B. nigra. The target region is duplicated in A. thaliana and six homologous contigs were found in B. nigra resulting from the whole genome triplication event shared by the Brassiceae tribe. BACs representative of each region were sequenced to elucidate the level of microscale rearrangements across the Brassica species divide. Conclusions Although the B genome species separated from the A/C lineage some 6 Mya, comparisons between the three paleopolyploid Brassica genomes revealed extensive conservation of gene content and sequence identity. The level of fractionation or gene loss varied across genomes and genomic regions; however, the greatest loss of genes was observed to be common to all three genomes. One large-scale chromosomal rearrangement differentiated the B genome suggesting such events could contribute to the lack of recombination observed between B genome species and those of the closely related A/C lineage. PMID:23586706
Draft genome sequence of the coccolithovirus Emiliania huxleyi virus 202.

PubMed

Nissimov, Jozef I; Worthy, Charlotte A; Rooks, Paul; Napier, Johnathan A; Kimmance, Susan A; Henn, Matthew R; Ogata, Hiroyuki; Allen, Michael J

2012-02-01

Emiliania huxleyi virus 202 (EhV-202) is a member of the Coccolithoviridae, a group of viruses that infect the marine coccolithophorid Emiliania huxleyi. EhV-202 has a 160- to 180-nm-diameter icosahedral structure and a genome of approximately 407 kbp, consisting of 485 coding sequences (CDSs). Here we describe the genomic features of EhV-202, together with a draft genome sequence and its annotation, highlighting the homology and heterogeneity of this genome in comparison with the EhV-86 reference genome.
Draft genome sequence of the Coccolithovirus Emiliania huxleyi virus 203.

PubMed

Nissimov, Jozef I; Worthy, Charlotte A; Rooks, Paul; Napier, Johnathan A; Kimmance, Susan A; Henn, Matthew R; Ogata, Hiroyuki; Allen, Michael J

2011-12-01

The Coccolithoviridae are a recently discovered group of viruses that infect the marine coccolithophorid Emiliania huxleyi. Emiliania huxleyi virus 203 (EhV-203) has a 160- to 180-nm-diameter icosahedral structure and a genome of approximately 400 kbp, consisting of 464 coding sequences (CDSs). Here we describe the genomic features of EhV-203 together with a draft genome sequence and its annotation, highlighting the homology and heterogeneity of this genome in comparison with the EhV-86 reference genome.
The Chloroplast Genome of Symplocarpus renifolius: A Comparison of Chloroplast Genome Structure in Araceae

PubMed Central

Park, Kyu Tae

2017-01-01

Symplocarpus renifolius is a member of the Araceae family, which is extraordinarily diverse in appearance. Previous studies on chloroplast genomes in Araceae focused on duckweeds (Lemnoideae) and root crops (Colocasia, commonly known as taro). Here, we determined the chloroplast genome of Symplocarpus renifolius, compared features such as gene content and inverted repeat (IR) junctions, and performed a phylogenetic analysis with other Araceae species. The chloroplast genome of S. renifolius is 158,521 bp and includes 113 genes. A comparison among the Araceae chloroplast genomes showed that infA in Lemna, Spirodela, Wolffiella, Wolffia, Dieffenbachia and Colocasia has been lost or has become a pseudogene and has only been retained in Symplocarpus. In the Araceae chloroplast DNA (cpDNA), psbZ is retained. However, psbZ duplication occurred in Wolffia species and tandem repeats were noted around the duplication regions. A comparison of the IR junction in Araceae species revealed the presence of ycf1 and rps15 in the small single copy region, whereas duckweed species contained ycf1 and rps15 in the IR region. The phylogenetic analyses of the chloroplast genomes revealed that Symplocarpus is a basal group and is sister to the other Araceae species. Consequently, infA deletion or pseudogene events in Araceae occurred after the divergence of Symplocarpus and aquatic plants (duckweeds) in Araceae and duplication events of rps15 and ycf1 occurred in the IR region. PMID:29144427
Variation block-based genomics method for crop plants.

PubMed

Kim, Yul Ho; Park, Hyang Mi; Hwang, Tae-Young; Lee, Seuk Ki; Choi, Man Soo; Jho, Sungwoong; Hwang, Seungwoo; Kim, Hak-Min; Lee, Dongwoo; Kim, Byoung-Chul; Hong, Chang Pyo; Cho, Yun Sung; Kim, Hyunmin; Jeong, Kwang Ho; Seo, Min Jung; Yun, Hong Tai; Kim, Sun Lim; Kwon, Young-Up; Kim, Wook Han; Chun, Hye Kyung; Lim, Sang Jong; Shin, Young-Ah; Choi, Ik-Young; Kim, Young Sun; Yoon, Ho-Sung; Lee, Suk-Ha; Lee, Sunghoon

2014-06-15

In contrast with wild species, cultivated crop genomes consist of reshuffled recombination blocks, which arose through crossing and selection. Accordingly, recombination block-based genomics analysis can be an effective approach for the screening of target loci for agricultural traits. We propose the variation block method, which is a three-step process for recombination block detection and comparison. The first step is to detect variations by comparing the short-read DNA sequences of the cultivar to the reference genome of the target crop. Next, sequence blocks with variation patterns are examined and defined. The boundaries between the variation-containing sequence blocks are regarded as recombination sites. All the assumed recombination sites in the cultivar set are used to split the genomes, and the resulting sequence regions are termed variation blocks. Finally, the genomes are compared using the variation blocks. The variation block method identified recurring recombination blocks accurately and successfully represented block-level diversities in the publicly available genomes of 31 soybean and 23 rice accessions. The practicality of this approach was demonstrated by the identification of a putative locus determining soybean hilum color. We suggest that the variation block method is an efficient genomics method for the recombination block-level comparison of crop genomes. We expect that this method will facilitate the development of crop genomics by bringing genomics technologies to the field of crop breeding.
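The three steps described in the abstract above can be sketched roughly as follows. This is a simplified illustration, not the authors' implementation: it assumes variant positions have already been called against the reference, groups variants into blocks with a hypothetical `max_gap` parameter, and treats block edges as the assumed recombination sites. All function names and the toy data are hypothetical.

```python
from bisect import bisect_left


def variant_blocks(positions, max_gap):
    """Merge variant positions into (start, end) blocks.

    Variants no more than max_gap apart are placed in the same block.
    """
    blocks = []
    for pos in sorted(positions):
        if blocks and pos - blocks[-1][1] <= max_gap:
            blocks[-1][1] = pos
        else:
            blocks.append([pos, pos])
    return [tuple(block) for block in blocks]


def variation_blocks(cultivars, chrom_len, max_gap):
    """Split [0, chrom_len) at every block boundary seen in any cultivar."""
    cuts = {0, chrom_len}
    for positions in cultivars.values():
        for start, end in variant_blocks(positions, max_gap):
            cuts.update((start, end + 1))
    edges = sorted(cuts)
    return list(zip(edges[:-1], edges[1:]))


def block_profile(positions, segments):
    """Return 1 for each half-open segment containing a variant, else 0."""
    ordered = sorted(positions)
    profile = []
    for start, end in segments:
        i = bisect_left(ordered, start)
        profile.append(1 if i < len(ordered) and ordered[i] < end else 0)
    return profile


# Toy data: variant positions on one chromosome for two hypothetical cultivars.
cultivars = {"cultivar_a": [10, 12, 15], "cultivar_b": [50, 52]}
segments = variation_blocks(cultivars, chrom_len=100, max_gap=5)
profiles = {name: block_profile(pos, segments) for name, pos in cultivars.items()}
```

Comparing the per-segment profiles across cultivars gives a block-level view of shared and distinct regions, which is the kind of comparison the abstract describes in its final step.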
MANTIS: a phylogenetic framework for multi-species genome comparisons.

PubMed

Tzika, Athanasia C; Helaers, Raphaël; Van de Peer, Yves; Milinkovitch, Michel C

2008-01-15

Practitioners of comparative genomics face huge analytical challenges as whole genome sequences and functional/expression data accumulate. Furthermore, the field would greatly benefit from a better integration of this wealth of data with evolutionary concepts. Here, we present MANTIS, a relational database for the analysis of (i) gains and losses of genes on specific branches of the metazoan phylogeny, (ii) reconstructed genome content of ancestral species and (iii) over- or under-representation of functions/processes and tissue specificity of gained, duplicated and lost genes. MANTIS estimates the most likely positions of gene losses on the true phylogeny using a maximum-likelihood function. A user-friendly interface and an extensive query system allow users to investigate questions pertaining to gene identity, phylogenetic mapping and function/expression parameters. MANTIS is freely available at http://www.mantisdb.org and constitutes the missing link between multi-species genome comparisons and functional analyses.
Draft Genomes of Anopheles cracens and Anopheles maculatus: Comparison of Simian Malaria and Human Malaria Vectors in Peninsular Malaysia

PubMed Central

Chen, Junhui; Zhong, Zhen; Jian, Jianbo; Amir, Amirah; Cheong, Fei-Wen; Sum, Jia-Siang; Fong, Mun-Yik

2016-01-01

Anopheles cracens has been incriminated as the vector of human knowlesi malaria in peninsular Malaysia. It is also a good laboratory vector of Plasmodium falciparum and P. vivax. The distribution of An. cracens overlaps with that of An. maculatus, the human malaria vector in peninsular Malaysia that seems to be refractory to P. knowlesi infection in natural settings. Whole genome sequencing was performed on An. cracens and An. maculatus collected here. The draft genome of An. cracens was 395 Mb in size, whereas the An. maculatus draft genome was 499 Mb. Comparison with the published Malaysian An. maculatus genome suggested that the An. maculatus specimen used in this study belongs to a different geographical race. Comparative analyses highlighted the similarities and differences between An. cracens and An. maculatus, providing new insights into their biological behavior and characteristics. PMID:27347683
[Investigation of RNA viral genome amplification by multiple displacement amplification technique].

PubMed

Pang, Zheng; Li, Jian-Dong; Li, Chuan; Liang, Mi-Fang; Li, De-Xin

2013-06-01

To facilitate the detection of newly emerging or rare viral infectious diseases, a negative-strand RNA virus (severe fever with thrombocytopenia syndrome bunyavirus) and a positive-strand RNA virus (dengue virus) were used to investigate nonspecific amplification of RNA viral genomes from clinical samples by the multiple displacement amplification technique. Serial 10-fold dilutions of purified viral RNA were used as mock samples with different pathogen loads. After a series of sequential reactions, single-strand cDNA, double-strand cDNA, and double-strand cDNA treated with ligation, without or with supplemental RNA, were generated; Phi29 DNA polymerase-dependent isothermal amplification was then performed, and finally the target gene copies were quantified by real-time PCR assays to evaluate the amplification efficiencies of the various methods. The results showed that multiple displacement amplification of single-strand or double-strand cDNA templates had limited effect, whereas the fold increase for double-strand cDNA templates treated with ligation could reach 6 × 10^3, and even 2 × 10^5 when supplemental RNA was present; better results were obtained when viral RNA loads were lower. An RNA viral genome amplification system using the multiple displacement amplification technique was established in this study, and effective amplification of low-load RNA viral genomes was achieved, which could provide a tool for synthesizing adequate viral genome material for multiplex pathogen detection.
Comparison of Burrows-Wheeler transform-based mapping algorithms used in high-throughput whole-genome sequencing: application to Illumina data for livestock genomes

USDA-ARS's Scientific Manuscript database

Ongoing developments and cost decreases in next-generation sequencing (NGS) technologies have led to an increase in their application, which has greatly enhanced the fields of genetics and genomics. Mapping sequence reads onto a reference genome is a fundamental step in the analysis of NGS data. Eff...

The Genome of the Anaerobic Fungus Orpinomyces sp. Strain C1A Reveals the Unique Evolutionary History of a Remarkable Plant Biomass Degrader

PubMed Central

Youssef, Noha H.; Couger, M. B.; Struchtemeyer, Christopher G.; Liggenstoffer, Audra S.; Prade, Rolf A.; Najar, Fares Z.; Atiyeh, Hasan K.; Wilkins, Mark R.

2013-01-01

Anaerobic gut fungi represent a distinct early-branching fungal phylum (Neocallimastigomycota) and reside in the rumen, hindgut, and feces of ruminant and nonruminant herbivores. The genome of an anaerobic fungal isolate, Orpinomyces sp. strain C1A, was sequenced using a combination of Illumina and PacBio single-molecule real-time (SMRT) technologies. The large genome (100.95 Mb, 16,347 genes) displayed extremely low G+C content (17.0%), large noncoding intergenic regions (73.1%), proliferation of microsatellite repeats (4.9%), and multiple gene duplications. Comparative genomic analysis identified multiple genes and pathways that are absent in Dikarya genomes but present in early-branching fungal lineages and/or nonfungal Opisthokonta. These included genes for posttranslational fucosylation, the production of specific intramembrane proteases and extracellular protease inhibitors, the formation of a complete axoneme and intraflagellar trafficking machinery, and a near-complete focal adhesion machinery. Analysis of the lignocellulolytic machinery in the C1A genome revealed an extremely rich repertoire, with evidence of horizontal gene acquisition from multiple bacterial lineages. Experimental analysis indicated that strain C1A is a remarkable biomass degrader, capable of simultaneous saccharification and fermentation of the cellulosic and hemicellulosic fractions in multiple untreated grasses and crop residues examined, with the process significantly enhanced by mild pretreatments. This capability, acquired during its separate evolutionary trajectory in the rumen, along with its resilience and invasiveness compared to prokaryotic anaerobes, renders anaerobic fungi promising agents for consolidated bioprocessing schemes in biofuels production. PMID:23709508
The genome of the anaerobic fungus Orpinomyces sp. strain C1A reveals the unique evolutionary history of a remarkable plant biomass degrader.

PubMed

Youssef, Noha H; Couger, M B; Struchtemeyer, Christopher G; Liggenstoffer, Audra S; Prade, Rolf A; Najar, Fares Z; Atiyeh, Hasan K; Wilkins, Mark R; Elshahed, Mostafa S

2013-08-01

Anaerobic gut fungi represent a distinct early-branching fungal phylum (Neocallimastigomycota) and reside in the rumen, hindgut, and feces of ruminant and nonruminant herbivores. The genome of an anaerobic fungal isolate, Orpinomyces sp. strain C1A, was sequenced using a combination of Illumina and PacBio single-molecule real-time (SMRT) technologies. The large genome (100.95 Mb, 16,347 genes) displayed extremely low G+C content (17.0%), large noncoding intergenic regions (73.1%), proliferation of microsatellite repeats (4.9%), and multiple gene duplications. Comparative genomic analysis identified multiple genes and pathways that are absent in Dikarya genomes but present in early-branching fungal lineages and/or nonfungal Opisthokonta. These included genes for posttranslational fucosylation, the production of specific intramembrane proteases and extracellular protease inhibitors, the formation of a complete axoneme and intraflagellar trafficking machinery, and a near-complete focal adhesion machinery. Analysis of the lignocellulolytic machinery in the C1A genome revealed an extremely rich repertoire, with evidence of horizontal gene acquisition from multiple bacterial lineages. Experimental analysis indicated that strain C1A is a remarkable biomass degrader, capable of simultaneous saccharification and fermentation of the cellulosic and hemicellulosic fractions in multiple untreated grasses and crop residues examined, with the process significantly enhanced by mild pretreatments. This capability, acquired during its separate evolutionary trajectory in the rumen, along with its resilience and invasiveness compared to prokaryotic anaerobes, renders anaerobic fungi promising agents for consolidated bioprocessing schemes in biofuels production.
Evidence-based gene models for structural and functional annotations of the oil palm genome.

PubMed

Chan, Kuang-Lim; Tatarinova, Tatiana V; Rosli, Rozana; Amiruddin, Nadzirah; Azizi, Norazah; Halim, Mohd Amin Ab; Sanusi, Nik Shazana Nik Mohd; Jayanthi, Nagappan; Ponomarenko, Petr; Triska, Martin; Solovyev, Victor; Firdaus-Raih, Mohd; Sambanthamurthi, Ravigadevi; Murphy, Denis; Low, Eng-Ti Leslie

2017-09-08

Oil palm is an important source of edible oil. The importance of the crop, as well as its long breeding cycle (10-12 years), led to the sequencing of its genome in 2013 to pave the way for genomics-guided breeding. Nevertheless, the first set of gene predictions, although useful, had many fragmented genes. Classification and characterization of genes associated with traits of interest, such as those for fatty acid biosynthesis and disease resistance, were also limited. Lipid-related genes, especially fatty acid (FA)-related genes, are of particular interest for the oil palm as they specify oil yields and quality. This paper presents the characterization of the oil palm genome using different gene prediction methods and comparative genomics analysis, identification of FA biosynthesis and disease resistance genes, and the development of an annotation database and bioinformatics tools. Using two independent gene-prediction pipelines, Fgenesh++ and Seqping, 26,059 oil palm genes with transcriptome and RefSeq support were identified from the oil palm genome. These coding regions of the genome have a characteristic broad distribution of GC3 (fraction of cytosine and guanine in the third position of a codon) with over half the GC3-rich genes (GC3 ≥ 0.75286) being intronless. In comparison, only one-seventh of the oil palm genes identified are intronless. Using comparative genomics analysis, characterization of conserved domains and active sites, and expression analysis, 42 key genes involved in FA biosynthesis in oil palm were identified. For three of them, namely EgFABF, EgFABH and EgFAD3, segmental duplication events were detected. Our analysis also identified 210 candidate resistance genes in six classes, grouped by their protein domain structures.
We present an accurate and comprehensive annotation of the oil palm genome, focusing on analysis of important categories of genes (GC3-rich and intronless), as well as those associated with important functions, such as FA biosynthesis and disease resistance. The study demonstrated the advantages of having an integrated approach to gene prediction and developed a computational framework for combining multiple genome annotations. These results, available in the oil palm annotation database (http://palmxplore.mpob.gov.my), will provide important resources for studies on the genomes of oil palm and related crops. This article was reviewed by Alexander Kel, Igor Rogozin, and Vladimir A. Kuznetsov.
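The GC3 statistic defined in the abstract above (the fraction of cytosine and guanine in the third position of a codon) is simple to compute for a single coding sequence. The following is a minimal sketch: it assumes an in-frame coding sequence starting at the first codon position, ignores any trailing partial codon, and uses the abstract's 0.75286 cutoff for the hypothetical helper `is_gc3_rich`. Both function names are illustrative and are not taken from the authors' tools.

```python
GC3_RICH_CUTOFF = 0.75286  # threshold quoted in the abstract


def gc3(cds):
    """Fraction of G or C at third codon positions of an in-frame CDS."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # drop a trailing partial codon, if any
    thirds = cds[2:usable:3]
    if not thirds:
        raise ValueError("sequence is shorter than one codon")
    return sum(base in "GC" for base in thirds) / len(thirds)


def is_gc3_rich(cds, cutoff=GC3_RICH_CUTOFF):
    """True when the sequence's GC3 meets or exceeds the cutoff."""
    return gc3(cds) >= cutoff


# Toy sequence: codons ATG, GCC, AAA have third bases G, C, A.
value = gc3("ATGGCCAAA")
```

Ambiguous bases such as N count as non-GC in this sketch; a production pipeline would probably need an explicit policy for them.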
A discrimination index for selecting markers of tumor growth dynamic across multiple cancer studies with a cure fraction.

PubMed

Rouam, Sigrid; Broët, Philippe

2013-08-01

To identify genomic markers with a consistent effect on tumor dynamics across multiple cancer series, discrimination indices based on proportional hazards models can be used since they do not depend heavily on the sample size. However, the underlying assumption of proportionality of the hazards does not always hold, especially when the studied population is a mixture of cured and uncured patients, as in early-stage cancers. We propose a novel index that quantifies the capability of a genomic marker to separate uncured patients according to their time-to-event outcomes. It allows the identification of genomic markers characterizing tumor growth dynamics across multiple studies. Simulation results show that our index performs better than classical indices based on the Cox model. It is affected by neither the sample size nor the cure fraction. In a cross-study of early-stage breast cancers, the index allows the selection of genomic markers with a potentially consistent effect on tumor growth dynamics. Copyright © 2013 Elsevier Inc. All rights reserved.
Genomic and Epigenomic Insights into Nutrition and Brain Disorders

PubMed Central

Dauncey, Margaret Joy

2013-01-01

Considerable evidence links many neuropsychiatric, neurodevelopmental and neurodegenerative disorders with multiple complex interactions between genetics and environmental factors such as nutrition. Mental health problems, autism, eating disorders, Alzheimer’s disease, schizophrenia, Parkinson’s disease and brain tumours are related to individual variability in numerous protein-coding and non-coding regions of the genome. However, genotype does not necessarily determine neurological phenotype because the epigenome modulates gene expression in response to endogenous and exogenous regulators, throughout the life-cycle. Studies using both genome-wide analysis of multiple genes and comprehensive analysis of specific genes are providing new insights into genetic and epigenetic mechanisms underlying nutrition and neuroscience. This review provides a critical evaluation of the following related areas: (1) recent advances in genomic and epigenomic technologies, and their relevance to brain disorders; (2) the emerging role of non-coding RNAs as key regulators of transcription, epigenetic processes and gene silencing; (3) novel approaches to nutrition, epigenetics and neuroscience; (4) gene-environment interactions, especially in the serotonergic system, as a paradigm of the multiple signalling pathways affected in neuropsychiatric and neurological disorders. Current and future advances in these four areas should contribute significantly to the prevention, amelioration and treatment of multiple devastating brain disorders. PMID:23503168
Phenetic Comparison of Prokaryotic Genomes Using k-mers

PubMed Central

Déraspe, Maxime; Raymond, Frédéric; Boisvert, Sébastien; Culley, Alexander; Roy, Paul H.; Laviolette, François; Corbeil, Jacques

2017-01-01

Abstract Bacterial genomics studies are getting more extensive and complex, requiring new ways to envision analyses. Using the Ray Surveyor software, we demonstrate that comparison of genomes based on their k-mer content allows reconstruction of phenetic trees without the need of prior data curation, such as core genome alignment of a species. We validated the methodology using simulated genomes and previously published phylogenomic studies of Streptococcus pneumoniae and Pseudomonas aeruginosa. We also investigated the relationship of specific genetic determinants with bacterial population structures. By comparing clusters from the complete genomic content of a genome population with clusters from specific functional categories of genes, we can determine how the population structures are correlated. Indeed, the strain clustering based on a subset of k-mers allows determination of its similarity with the whole genome clusters. We also applied this methodology on 42 species of bacteria to determine the correlational significance of five important bacterial genomic characteristics. For example, intrinsic resistance is more important in P. aeruginosa than in S. pneumoniae, and the former has increased correlation of its population structure with antibiotic resistance genes. The global view of the pangenome of bacteria also demonstrated the taxa-dependent interaction of population structure with antibiotic resistance, bacteriophage, plasmid, and mobile element k-mer data sets. PMID:28957508
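The k-mer comparison described in this abstract can be illustrated with a small sketch. It is not the Ray Surveyor implementation: the Jaccard distance, the value k=4, the function names, and the toy sequences are all assumptions chosen for illustration, and the abstract does not state which similarity measure the software uses.

```python
from itertools import combinations


def kmers(seq, k):
    """Return the set of all k-length substrings of a sequence."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard_distance(a, b):
    """Return 1 - |A ∩ B| / |A ∪ B| for two k-mer sets."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def distance_matrix(genomes, k=4):
    """Pairwise k-mer Jaccard distances for a dict of name -> sequence."""
    sets = {name: kmers(seq, k) for name, seq in genomes.items()}
    dist = {(name, name): 0.0 for name in sets}
    for x, y in combinations(sets, 2):
        d = jaccard_distance(sets[x], sets[y])
        dist[(x, y)] = dist[(y, x)] = d
    return dist


# Toy sequences (hypothetical, far shorter than real genomes).
toy = {
    "A": "ACGTACGTACGTTTGA",
    "B": "ACGTACGTACGTTTGC",
    "C": "GGGCCCAAATTTGGGC",
}
d = distance_matrix(toy, k=4)
```

A distance matrix of this kind could then be passed to a clustering or neighbor-joining routine to build a phenetic tree; restricting the k-mer sets to those found in one functional category of genes would give the subset-based clustering the abstract mentions.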
Comparative analysis of rosaceous genomes and the reconstruction of a putative ancestral genome for the family

PubMed Central

2011-01-01

Background Comparative genome mapping studies in Rosaceae have been conducted until now by aligning genetic maps within the same genus, or closely related genera and using a limited number of common markers. The growing body of genomics resources and sequence data for both Prunus and Fragaria permits detailed comparisons between these genera and the recently released Malus × domestica genome sequence. Results We generated a comparative analysis using 806 molecular markers that are anchored genetically to the Prunus and/or Fragaria reference maps, and physically to the Malus genome sequence. Markers in common for Malus and Prunus, and Malus and Fragaria, respectively were 784 and 148. The correspondence between marker positions was high and conserved syntenic blocks were identified among the three genera in the Rosaceae. We reconstructed a proposed ancestral genome for the Rosaceae. Conclusions A genome containing nine chromosomes is the most likely candidate for the ancestral Rosaceae progenitor. The number of chromosomal translocations observed between the three genera investigated was low. However, the number of inversions identified among Malus and Prunus was much higher than any reported genome comparisons in plants, suggesting that small inversions have played an important role in the evolution of these two genera or of the Rosaceae. PMID:21226921
Single Molecule Analysis of Replicated DNA Reveals the Usage of Multiple KSHV Genome Regions for Latent Replication

PubMed Central

Verma, Subhash C.; Lu, Jie; Cai, Qiliang; Kosiyatrakul, Settapong; McDowell, Maria E.; Schildkraut, Carl L.; Robertson, Erle S.

2011-01-01

Kaposi's sarcoma associated herpesvirus (KSHV), an etiologic agent of Kaposi's sarcoma, Body Cavity Based Lymphoma and Multicentric Castleman's Disease, establishes lifelong latency in infected cells. The KSHV genome tethers to the host chromosome with the help of a latency associated nuclear antigen (LANA). Additionally, LANA supports replication of the latent origins within the terminal repeats by recruiting cellular factors. Our previous studies identified and characterized another latent origin, which supported the replication of plasmids ex-vivo without LANA expression in trans. Therefore identification of an additional origin site prompted us to analyze the entire KSHV genome for replication initiation sites using single molecule analysis of replicated DNA (SMARD). Our results showed that replication of DNA can initiate throughout the KSHV genome and the usage of these regions is not conserved in two different KSHV strains investigated. SMARD also showed that the utilization of multiple replication initiation sites occurs across large regions of the genome rather than a specified sequence. The replication origin of the terminal repeats showed only a slight preference for their usage indicating that LANA dependent origin at the terminal repeats (TR) plays only a limited role in genome duplication. Furthermore, we performed chromatin immunoprecipitation for ORC2 and MCM3, which are part of the pre-replication initiation complex to determine the genomic sites where these proteins accumulate, to provide further characterization of potential replication initiation sites on the KSHV genome. The ChIP data confirmed accumulation of these pre-RC proteins at multiple genomic sites in a cell cycle dependent manner. Our data also show that both the frequency and the sites of replication initiation vary within the two KSHV genomes studied here, suggesting that initiation of replication is likely to be affected by the genomic context rather than the DNA sequences. PMID:22072974
Deep whole-genome sequencing of 100 southeast Asian Malays.

PubMed

Wong, Lai-Ping; Ong, Rick Twee-Hee; Poh, Wan-Ting; Liu, Xuanyao; Chen, Peng; Li, Ruoying; Lam, Kevin Koi-Yau; Pillai, Nisha Esakimuthu; Sim, Kar-Seng; Xu, Haiyan; Sim, Ngak-Leng; Teo, Shu-Mei; Foo, Jia-Nee; Tan, Linda Wei-Lin; Lim, Yenly; Koo, Seok-Hwee; Gan, Linda Seo-Hwee; Cheng, Ching-Yu; Wee, Sharon; Yap, Eric Peng-Huat; Ng, Pauline Crystal; Lim, Wei-Yen; Soong, Richie; Wenk, Markus Rene; Aung, Tin; Wong, Tien-Yin; Khor, Chiea-Chuen; Little, Peter; Chia, Kee-Seng; Teo, Yik-Ying

2013-01-10

Whole-genome sequencing across multiple samples in a population provides an unprecedented opportunity for comprehensively characterizing the polymorphic variants in the population. Although the 1000 Genomes Project (1KGP) has offered brief insights into the value of population-level sequencing, the low coverage has compromised the ability to confidently detect rare and low-frequency variants. In addition, the composition of populations in the 1KGP is not complete, despite the fact that the study design has been extended to more than 2,500 samples from more than 20 population groups. The Malays are one of the Austronesian groups predominantly present in Southeast Asia and Oceania, and the Singapore Sequencing Malay Project (SSMP) aims to perform deep whole-genome sequencing of 100 healthy Malays. By sequencing at a minimum of 30× coverage, we have illustrated the higher sensitivity at detecting low-frequency and rare variants and the ability to investigate the presence of hotspots of functional mutations. Compared to the low-pass sequencing in the 1KGP, the deeper coverage allows more functional variants to be identified for each person. A comparison of the fidelity of genotype imputation of Malays indicated that a population-specific reference panel, such as the SSMP, outperforms a cosmopolitan panel with larger number of individuals for common SNPs. For lower-frequency (<5%) markers, a larger number of individuals might have to be whole-genome sequenced so that the accuracy currently afforded by the 1KGP can be achieved. The SSMP data are expected to be the benchmark for evaluating the value of deep population-level sequencing versus low-pass sequencing, especially in populations that are poorly represented in population-genetics studies. Copyright © 2013 The American Society of Human Genetics. Published by Elsevier Inc. All rights reserved.
Deep Whole-Genome Sequencing of 100 Southeast Asian Malays

PubMed Central

Wong, Lai-Ping; Ong, Rick Twee-Hee; Poh, Wan-Ting; Liu, Xuanyao; Chen, Peng; Li, Ruoying; Lam, Kevin Koi-Yau; Pillai, Nisha Esakimuthu; Sim, Kar-Seng; Xu, Haiyan; Sim, Ngak-Leng; Teo, Shu-Mei; Foo, Jia-Nee; Tan, Linda Wei-Lin; Lim, Yenly; Koo, Seok-Hwee; Gan, Linda Seo-Hwee; Cheng, Ching-Yu; Wee, Sharon; Yap, Eric Peng-Huat; Ng, Pauline Crystal; Lim, Wei-Yen; Soong, Richie; Wenk, Markus Rene; Aung, Tin; Wong, Tien-Yin; Khor, Chiea-Chuen; Little, Peter; Chia, Kee-Seng; Teo, Yik-Ying

2013-01-01

Whole-genome sequencing across multiple samples in a population provides an unprecedented opportunity for comprehensively characterizing the polymorphic variants in the population. Although the 1000 Genomes Project (1KGP) has offered brief insights into the value of population-level sequencing, the low coverage has compromised the ability to confidently detect rare and low-frequency variants. In addition, the composition of populations in the 1KGP is not complete, despite the fact that the study design has been extended to more than 2,500 samples from more than 20 population groups. The Malays are one of the Austronesian groups predominantly present in Southeast Asia and Oceania, and the Singapore Sequencing Malay Project (SSMP) aims to perform deep whole-genome sequencing of 100 healthy Malays. By sequencing at a minimum of 30× coverage, we have illustrated the higher sensitivity at detecting low-frequency and rare variants and the ability to investigate the presence of hotspots of functional mutations. Compared to the low-pass sequencing in the 1KGP, the deeper coverage allows more functional variants to be identified for each person. A comparison of the fidelity of genotype imputation of Malays indicated that a population-specific reference panel, such as the SSMP, outperforms a cosmopolitan panel with larger number of individuals for common SNPs. For lower-frequency (<5%) markers, a larger number of individuals might have to be whole-genome sequenced so that the accuracy currently afforded by the 1KGP can be achieved. The SSMP data are expected to be the benchmark for evaluating the value of deep population-level sequencing versus low-pass sequencing, especially in populations that are poorly represented in population-genetics studies. PMID:23290073
Assessing the presence of shared genetic architecture between Alzheimer's disease and major depressive disorder using genome-wide association data

PubMed Central

Gibson, J; Russ, T C; Adams, M J; Clarke, T-K; Howard, D M; Hall, L S; Fernandez-Pujals, A M; Wigmore, E M; Hayward, C; Davies, G; Murray, A D; Smith, B H; Porteous, D J; Deary, I J; McIntosh, A M

2017-01-01

Major depressive disorder (MDD) and Alzheimer's disease (AD) are both common in older age and frequently co-occur. Numerous phenotypic studies based on clinical diagnoses suggest that a history of depression increases risk of subsequent AD, although the basis of this relationship is uncertain. Both illnesses are polygenic, and shared genetic risk factors could explain some of the observed association. We used genotype data to test whether MDD and AD have an overlapping polygenic architecture in two large population-based cohorts, Generation Scotland's Scottish Family Health Study (GS:SFHS; N=19 889) and UK Biobank (N=25 118), and whether age of depression onset influences any relationship. Using two complementary techniques, we found no evidence that the disorders are influenced by common genetic variants. Using linkage disequilibrium score regression with genome-wide association study (GWAS) summary statistics from the International Genomics of Alzheimer's Project, we report no significant genetic correlation between AD and MDD (rG=−0.103, P=0.59). Polygenic risk scores (PRS) generated using summary data from International Genomics of Alzheimer's Project (IGAP) and the Psychiatric Genomics Consortium were used to assess potential pleiotropy between the disorders. PRS for MDD were nominally associated with participant-recalled AD family history in GS:SFHS, although this association did not survive multiple comparison testing. AD PRS were not associated with depression status or late-onset depression, and a survival analysis showed no association between age of depression onset and genetic risk for AD. This study found no evidence to support a common polygenic structure for AD and MDD, suggesting that the comorbidity of these disorders is not explained by common genetic variants. PMID:28418403
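A polygenic risk score of the kind mentioned in this abstract is, in its simplest form, a weighted sum of allele counts. The sketch below is a minimal illustration only: the SNP identifiers, effect sizes, p-values, the p-value threshold, and the treatment of missing genotypes as zero are hypothetical assumptions, not details taken from the study.

```python
def polygenic_risk_score(effect_sizes, dosages, p_values=None, p_threshold=1.0):
    """Sum effect size x risk-allele count over SNPs passing a p-value threshold.

    effect_sizes: dict of SNP id -> per-allele effect from GWAS summary statistics
    dosages:      dict of SNP id -> risk-allele count (0, 1 or 2) for one person
    p_values:     optional dict of SNP id -> GWAS p-value used for thresholding
    """
    score = 0.0
    for snp, beta in effect_sizes.items():
        if p_values is not None and p_values[snp] > p_threshold:
            continue  # SNP does not pass the inclusion threshold
        # Assumption: a SNP missing from the genotype data contributes nothing.
        score += beta * dosages.get(snp, 0)
    return score


# Hypothetical summary statistics and one hypothetical genotype.
effects = {"snp_a": 0.2, "snp_b": -0.1, "snp_c": 0.05}
pvals = {"snp_a": 1e-6, "snp_b": 0.04, "snp_c": 0.6}
person = {"snp_a": 2, "snp_b": 1, "snp_c": 1}

prs_all = polygenic_risk_score(effects, person)
prs_strict = polygenic_risk_score(effects, person, pvals, p_threshold=0.05)
```

Scores computed this way for each participant would then be tested for association with the second trait, which is the step at which multiple comparison correction applies.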
Genome-wide association study of CSF biomarkers Abeta1-42, t-tau, and p-tau181p in the ADNI cohort.

PubMed

Kim, S; Swaminathan, S; Shen, L; Risacher, S L; Nho, K; Foroud, T; Shaw, L M; Trojanowski, J Q; Potkin, S G; Huentelman, M J; Craig, D W; DeChairo, B M; Aisen, P S; Petersen, R C; Weiner, M W; Saykin, A J

2011-01-04

CSF levels of Aβ1-42, t-tau, and p-tau181p are potential early diagnostic markers for probable Alzheimer disease (AD). The influence of genetic variation on these markers has been investigated for candidate genes but not on a genome-wide basis. We report a genome-wide association study (GWAS) of CSF biomarkers (Aβ1-42, t-tau, p-tau181p, p-tau181p/Aβ1-42, and t-tau/Aβ1-42). A total of 374 non-Hispanic Caucasian participants in the Alzheimer's Disease Neuroimaging Initiative cohort with quality-controlled CSF and genotype data were included in this analysis. The main effect of single nucleotide polymorphisms (SNPs) under an additive genetic model was assessed on each of 5 CSF biomarkers. The p values of all SNPs for each CSF biomarker were adjusted for multiple comparisons by the Bonferroni method. We focused on SNPs with corrected p<0.01 (uncorrected p<3.10×10⁻⁸) and secondarily examined SNPs with uncorrected p values less than 10⁻⁵ to identify potential candidates. Four SNPs in the regions of the APOE, LOC100129500, TOMM40, and EPC2 genes reached genome-wide significance for associations with one or more CSF biomarkers. SNPs in CCDC134, ABCG2, SREBF2, and NFATC4, although not reaching genome-wide significance, were identified as potential candidates. In addition to known candidate genes, APOE, TOMM40, and one hypothetical gene LOC100129500 partially overlapping APOE; one novel gene, EPC2, and several other interesting genes were associated with CSF biomarkers that are related to AD. These findings, especially the new EPC2 results, require replication in independent cohorts.
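The Bonferroni thresholds quoted in this abstract can be related by simple arithmetic. The sketch below assumes the standard Bonferroni rule (per-test threshold = corrected level / number of tests). The abstract does not report the number of tests, so the figure derived here is only what the two quoted thresholds imply, and whether that count covers one biomarker or all five is not stated.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test p-value threshold that keeps the family-wise level at alpha."""
    return alpha / n_tests


def bonferroni_adjust(p_value, n_tests):
    """Bonferroni-corrected p-value, capped at 1."""
    return min(1.0, p_value * n_tests)


def implied_number_of_tests(alpha, per_test_threshold):
    """Number of tests implied by a corrected level and a per-test threshold."""
    return alpha / per_test_threshold


# Corrected p < 0.01 and uncorrected p < 3.10e-8 are the values in the abstract.
n_implied = implied_number_of_tests(0.01, 3.10e-8)
```

Under these assumptions the two thresholds imply on the order of 320,000 tests, while the secondary cutoff of 10⁻⁵ is far more permissive and is used only to flag candidates.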
Low Frequency Variants, Collapsed Based on Biological Knowledge, Uncover Complexity of Population Stratification in 1000 Genomes Project Data

PubMed Central

Moore, Carrie B.; Wallace, John R.; Wolfe, Daniel J.; Frase, Alex T.; Pendergrass, Sarah A.; Weiss, Kenneth M.; Ritchie, Marylyn D.

2013-01-01

Analyses investigating low frequency variants have the potential for explaining additional genetic heritability of many complex human traits. However, the natural frequencies of rare variation between human populations strongly confound genetic analyses. We have applied a novel collapsing method to identify biological features with low frequency variant burden differences in thirteen populations sequenced by the 1000 Genomes Project. Our flexible collapsing tool utilizes expert biological knowledge from multiple publicly available database sources to direct feature selection. Variants were collapsed according to genetically driven features, such as evolutionary conserved regions, regulatory regions genes, and pathways. We have conducted an extensive comparison of low frequency variant burden differences (MAF<0.03) between populations from 1000 Genomes Project Phase I data. We found that on average 26.87% of gene bins, 35.47% of intergenic bins, 42.85% of pathway bins, 14.86% of ORegAnno regulatory bins, and 5.97% of evolutionary conserved regions show statistically significant differences in low frequency variant burden across populations from the 1000 Genomes Project. The proportion of bins with significant differences in low frequency burden depends on the ancestral similarity of the two populations compared and types of features tested. Even closely related populations had notable differences in low frequency burden, but fewer differences than populations from different continents. Furthermore, conserved or functionally relevant regions had fewer significant differences in low frequency burden than regions under less evolutionary constraint. This degree of low frequency variant differentiation across diverse populations and feature elements highlights the critical importance of considering population stratification in the new era of DNA sequencing and low frequency variant genomic analyses. PMID:24385916
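The collapsing step described in this abstract can be sketched as follows. This is an illustrative reading, not the authors' tool: the bin definitions, variant identifiers, genotype encoding (0, 1 or 2 alternate alleles per diploid individual) and function names are assumptions; only the MAF<0.03 cutoff comes from the abstract.

```python
MAF_CUTOFF = 0.03  # low-frequency cutoff quoted in the abstract


def minor_allele_stats(genotypes):
    """Return (minor allele frequency, minor allele count) for one variant."""
    n_alleles = 2 * len(genotypes)
    alt_count = sum(genotypes)
    minor_count = min(alt_count, n_alleles - alt_count)
    return minor_count / n_alleles, minor_count


def bin_burden(variants, bins, cutoff=MAF_CUTOFF):
    """Total minor-allele count of low-frequency variants in each bin.

    variants: dict of variant id -> list of genotypes in one population
    bins:     dict of bin name (gene, pathway, region) -> list of variant ids
    """
    burden = {}
    for bin_name, variant_ids in bins.items():
        total = 0
        for vid in variant_ids:
            maf, minor_count = minor_allele_stats(variants[vid])
            if maf < cutoff:  # keep only low-frequency variants
                total += minor_count
        burden[bin_name] = total
    return burden


# Hypothetical population of 20 individuals and three variants.
variants = {
    "v1": [1] + [0] * 19,      # MAF 1/40 = 0.025, kept
    "v2": [1] * 5 + [0] * 15,  # MAF 5/40 = 0.125, excluded
    "v3": [0] * 19 + [1],      # MAF 0.025, kept
}
bins = {"geneA": ["v1", "v2"], "geneB": ["v3"], "geneC": ["v2"]}
burden = bin_burden(variants, bins)
```

Running the same function on each population and comparing the per-bin burdens between population pairs would give the kind of comparison the abstract reports; the statistical test used for that comparison is not specified in the abstract.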
The adaptation of Escherichia coli cells grown in simulated microgravity for an extended period is both phenotypic and genomic.

PubMed

Tirumalai, Madhan R; Karouia, Fathi; Tran, Quyen; Stepanov, Victor G; Bruce, Rebekah J; Ott, C Mark; Pierson, Duane L; Fox, George E

2017-01-01

Microorganisms impact spaceflight in a variety of ways. They play a positive role in biological systems, such as waste water treatment but can be problematic through buildups of biofilms that can affect advanced life support. Of special concern is the possibility that during extended missions, the microgravity environment will provide positive selection for undesirable genomic changes. Such changes could affect microbial antibiotic sensitivity and possibly pathogenicity. To evaluate this possibility, Escherichia coli (lac plus) cells were grown for over 1000 generations on Luria Broth medium under low-shear modeled microgravity conditions in a high aspect rotating vessel. This is the first study of its kind to grow bacteria for multiple generations over an extended period under low-shear modeled microgravity. Comparisons were made to a non-adaptive control strain using growth competitions. After 1000 generations, the final low-shear modeled microgravity-adapted strain readily outcompeted the unadapted lac minus strain. A portion of this advantage was maintained when the low-shear modeled microgravity strain was first grown in a shake flask environment for 10, 20, or 30 generations of growth. Genomic sequencing of the 1000 generation strain revealed 16 mutations. Of the five changes affecting codons, none were neutral. It is not clear how significant these mutations are as individual changes or as a group. It is concluded that part of the long-term adaptation to low-shear modeled microgravity is likely genomic. The strain was monitored for acquisition of antibiotic resistance by VITEK analysis throughout the adaptation period. Despite the evidence of genomic adaptation, resistance to a variety of antibiotics was never observed.
The genome of Pelobacter carbinolicus reveals surprising metabolic capabilities and physiological features

DOE Office of Scientific and Technical Information (OSTI.GOV)

Aklujkar, Muktak; Haveman, Shelley; DiDonato Jr, Raymond

2012-01-01

Background: The bacterium Pelobacter carbinolicus is able to grow by fermentation, syntrophic hydrogen/formate transfer, or electron transfer to sulfur from short-chain alcohols, hydrogen or formate; it does not oxidize acetate and is not known to ferment any sugars or grow autotrophically. The genome of P. carbinolicus was sequenced in order to understand its metabolic capabilities and physiological features in comparison with its relatives, acetate-oxidizing Geobacter species. Results: Pathways were predicted for catabolism of known substrates: 2,3-butanediol, acetoin, glycerol, 1,2-ethanediol, ethanolamine, choline and ethanol. Multiple isozymes of 2,3-butanediol dehydrogenase, ATP synthase and [FeFe]-hydrogenase were differentiated and assigned roles according to their structural properties and genomic contexts. The absence of asparagine synthetase and the presence of a mutant tRNA for asparagine encoded among RNA-active enzymes suggest that P. carbinolicus may make asparaginyl-tRNA in a novel way. Catabolic glutamate dehydrogenases were discovered, implying that the tricarboxylic acid (TCA) cycle can function catabolically. A phosphotransferase system for uptake of sugars was discovered, along with enzymes that function in 2,3-butanediol production. Pyruvate:ferredoxin/flavodoxin oxidoreductase was identified as a potential bottleneck in both the supply of oxaloacetate for oxidation of acetate by the TCA cycle and the connection of glycolysis to production of ethanol. The P. carbinolicus genome was found to encode autotransporters and various appendages, including three proteins with similarity to the geopilin of electroconductive nanowires. Conclusions: Several surprising metabolic capabilities and physiological features were predicted from the genome of P. carbinolicus, suggesting that it is more versatile than anticipated.
GenoMycDB: a database for comparative analysis of mycobacterial genes and genomes.

PubMed

Catanho, Marcos; Mascarenhas, Daniel; Degrave, Wim; Miranda, Antonio Basílio de

2006-03-31

Several databases and computational tools have been created with the aim of organizing, integrating and analyzing the wealth of information generated by large-scale sequencing projects of mycobacterial genomes and those of other organisms. However, with very few exceptions, these databases and tools do not allow for massive and/or dynamic comparison of these data. GenoMycDB (http://www.dbbm.fiocruz.br/GenoMycDB) is a relational database built for large-scale comparative analyses of completely sequenced mycobacterial genomes, based on their predicted protein content. Its central structure is composed of the results obtained after pair-wise sequence alignments among all the predicted proteins coded by the genomes of six mycobacteria: Mycobacterium tuberculosis (strains H37Rv and CDC1551), M. bovis AF2122/97, M. avium subsp. paratuberculosis K10, M. leprae TN, and M. smegmatis MC2 155. The database stores the computed similarity parameters of every aligned pair, providing for each protein sequence the predicted subcellular localization, the assigned cluster of orthologous groups, the features of the corresponding gene, and links to several important databases. Tables containing pairs or groups of potential homologs between selected species/strains can be produced dynamically by user-defined criteria, based on one or multiple sequence similarity parameters. In addition, searches can be restricted according to the predicted subcellular localization of the protein, the DNA strand of the corresponding gene and/or the description of the protein. Massive data search and/or retrieval are available, and different ways of exporting the result are offered. GenoMycDB provides an on-line resource for the functional classification of mycobacterial proteins as well as for the analysis of genome structure, organization, and evolution.
Comparison of Marker-Based Genomic Estimated Breeding Values and Phenotypic Evaluation for Selection of Bacterial Spot Resistance in Tomato.

PubMed

Liabeuf, Debora; Sim, Sung-Chur; Francis, David M

2018-03-01

Bacterial spot affects tomato crops (Solanum lycopersicum) grown under humid conditions. Major genes and quantitative trait loci (QTL) for resistance have been described, and multiple loci from diverse sources need to be combined to improve disease control. We investigated genomic selection (GS) prediction models for resistance to Xanthomonas euvesicatoria and experimentally evaluated the accuracy of these models. The training population consisted of 109 families combining resistance from four sources and directionally selected from a population of 1,100 individuals. The families were evaluated on a plot basis in replicated inoculated trials and genotyped with single nucleotide polymorphisms (SNP). We compared the prediction ability of models developed with 14 to 387 SNP. Genomic estimated breeding values (GEBV) were derived using Bayesian least absolute shrinkage and selection operator regression (BL) and ridge regression (RR). Evaluations were based on leave-one-out cross validation and on empirical observations in replicated field trials using the next generation of inbred progeny and a hybrid population resulting from selections in the training population. Prediction ability was evaluated based on correlations between GEBV and phenotypes (r_g), percentage of coselection between genomic and phenotypic selection, and relative efficiency of selection (r_g/r_p). Results were similar with BL and RR models. Models using only markers previously identified as significantly associated with resistance but weighted based on GEBV and mixed models with markers associated with resistance treated as fixed effects and markers distributed in the genome treated as random effects offered greater accuracy and a high percentage of coselection. The accuracy of these models to predict the performance of progeny and hybrids exceeded the accuracy of phenotypic selection.
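The evaluation metrics named in this abstract can be sketched with plain Python. The data below are invented, the convention that lower disease scores indicate more resistance is an assumption, and the abstract does not define r_p precisely, so it is treated here as a value supplied by the user.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def coselection(gebv, phenotype, n_select):
    """Percentage of families picked by both genomic and phenotypic ranking.

    Assumption: lower values mean more resistance, so the n_select lowest
    values are selected under each criterion.
    """
    idx = range(len(gebv))
    top_g = set(sorted(idx, key=lambda i: gebv[i])[:n_select])
    top_p = set(sorted(idx, key=lambda i: phenotype[i])[:n_select])
    return 100.0 * len(top_g & top_p) / n_select


def relative_efficiency(r_g, r_p):
    """Ratio r_g / r_p; r_p is a user-supplied accuracy of phenotypic selection."""
    return r_g / r_p


# Hypothetical GEBV and observed disease scores for six families.
gebv = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
phenotype = [1.2, 1.9, 3.5, 3.1, 5.2, 6.4]

r_g = pearson(gebv, phenotype)
shared = coselection(gebv, phenotype, n_select=3)
```

In this toy case two of the three selected families coincide, so the coselection percentage is two thirds; with real data the GEBV would come from a fitted model such as ridge regression rather than being given directly.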
Chromosomal Speciation in the Genomics Era: Disentangling Phylogenetic Evolution of Rock-wallabies.

PubMed

Potter, Sally; Bragg, Jason G; Blom, Mozes P K; Deakin, Janine E; Kirkpatrick, Mark; Eldridge, Mark D B; Moritz, Craig

2017-01-01

The association of chromosome rearrangements (CRs) with speciation is well established, and there is a long history of theory and evidence relating to "chromosomal speciation." Genomic sequencing has the potential to provide new insights into how reorganization of genome structure promotes divergence, and in model systems has demonstrated reduced gene flow in rearranged segments. However, there are limits to what we can understand from a small number of model systems, which each only tell us about one episode of chromosomal speciation. Progressing from patterns of association between chromosome (and genic) change, to understanding processes of speciation requires both comparative studies across diverse systems and integration of genome-scale sequence comparisons with other lines of evidence. Here, we showcase a promising example of chromosomal speciation in a non-model organism, the endemic Australian marsupial genus Petrogale. We present initial phylogenetic results from exon-capture that resolve a history of divergence associated with extensive and repeated CRs. Yet it remains challenging to disentangle gene tree heterogeneity caused by recent divergence and gene flow in this and other such recent radiations. We outline a way forward for better integration of comparative genomic sequence data with evidence from molecular cytogenetics, and analyses of shifts in the recombination landscape and potential disruption of meiotic segregation and epigenetic programming. In all likelihood, CRs impact multiple cellular processes and these effects need to be considered together, along with effects of genic divergence. Understanding the effects of CRs together with genic divergence will require development of more integrative theory and inference methods. Together, new data and analysis tools will combine to shed light on long standing questions of how chromosome and genic divergence promote speciation.
Assessing the presence of shared genetic architecture between Alzheimer's disease and major depressive disorder using genome-wide association data.

PubMed

Gibson, J; Russ, T C; Adams, M J; Clarke, T-K; Howard, D M; Hall, L S; Fernandez-Pujals, A M; Wigmore, E M; Hayward, C; Davies, G; Murray, A D; Smith, B H; Porteous, D J; Deary, I J; McIntosh, A M

2017-04-18

Major depressive disorder (MDD) and Alzheimer's disease (AD) are both common in older age and frequently co-occur. Numerous phenotypic studies based on clinical diagnoses suggest that a history of depression increases risk of subsequent AD, although the basis of this relationship is uncertain. Both illnesses are polygenic, and shared genetic risk factors could explain some of the observed association. We used genotype data to test whether MDD and AD have an overlapping polygenic architecture in two large population-based cohorts, Generation Scotland's Scottish Family Health Study (GS:SFHS; N=19 889) and UK Biobank (N=25 118), and whether age of depression onset influences any relationship. Using two complementary techniques, we found no evidence that the disorders are influenced by common genetic variants. Using linkage disequilibrium score regression with genome-wide association study (GWAS) summary statistics from the International Genomics of Alzheimer's Project, we report no significant genetic correlation between AD and MDD (rG=−0.103, P=0.59). Polygenic risk scores (PRS) generated using summary data from International Genomics of Alzheimer's Project (IGAP) and the Psychiatric Genomics Consortium were used to assess potential pleiotropy between the disorders. PRS for MDD were nominally associated with participant-recalled AD family history in GS:SFHS, although this association did not survive multiple comparison testing. AD PRS were not associated with depression status or late-onset depression, and a survival analysis showed no association between age of depression onset and genetic risk for AD. This study found no evidence to support a common polygenic structure for AD and MDD, suggesting that the comorbidity of these disorders is not explained by common genetic variants.
Applications of the 1000 Genomes Project resources.

PubMed

Zheng-Bradley, Xiangqun; Flicek, Paul

2017-05-01

The 1000 Genomes Project created a valuable, worldwide reference for human genetic variation. Common uses of the 1000 Genomes dataset include genotype imputation supporting Genome-wide Association Studies, mapping expression Quantitative Trait Loci, filtering non-pathogenic variants from exome, whole genome and cancer genome sequencing projects, and genetic analysis of population structure and molecular evolution. In this article, we will highlight some of the multiple ways that the 1000 Genomes data can be and has been utilized for genetic studies. © The Author 2016. Published by Oxford University Press.

Development of microbial genome-probing microarrays using digital multiple displacement amplification of uncultivated microbial single cells.

PubMed

Chang, Ho-Won; Sung, Youlboong; Kim, Kyoung-Ho; Nam, Young-Do; Roh, Seong Woon; Kim, Min-Soo; Jeon, Che Ok; Bae, Jin-Woo

2008-08-15

A crucial problem in the use of previously developed genome-probing microarrays (GPM) has been the inability to use uncultivated bacterial genomes to take advantage of the high sensitivity and specificity of GPM in microbial detection and monitoring. We show here a method, digital multiple displacement amplification (MDA), to amplify and analyze various genomes obtained from single uncultivated bacterial cells. We used 15 genomes from key microbes involved in dichloromethane (DCM)-dechlorinating enrichment as microarray probes to uncover the bacterial population dynamics of samples without PCR amplification. Genomic DNA amplified from single cells originated from uncultured bacteria with 80.3-99.4% similarity to 16S rRNA genes of cultivated bacteria. The digital MDA-GPM method successfully monitored the dynamics of DCM-dechlorinating communities from different phases of enrichment status. Without a priori knowledge of microbial diversity, the digital MDA-GPM method could be designed to monitor most microbial populations in a given environmental sample.
Minimal-assumption inference from population-genomic data

NASA Astrophysics Data System (ADS)

Weissman, Daniel; Hallatschek, Oskar

Samples of multiple complete genome sequences contain vast amounts of information about the evolutionary history of populations, much of it in the associations among polymorphisms at different loci. Current methods that take advantage of this linkage information rely on models of recombination and coalescence, limiting the sample sizes and populations that they can analyze. We introduce a method, Minimal-Assumption Genomic Inference of Coalescence (MAGIC), that reconstructs key features of the evolutionary history, including the distribution of coalescence times, by integrating information across genomic length scales without using an explicit model of recombination, demography or selection. Using simulated data, we show that MAGIC's performance is comparable to PSMC' on single diploid samples generated with standard coalescent and recombination models. More importantly, MAGIC can also analyze arbitrarily large samples and is robust to changes in the coalescent and recombination processes. Using MAGIC, we show that the inferred coalescence time histories of samples of multiple human genomes exhibit inconsistencies with a description in terms of an effective population size based on single-genome data.
Comparative genomics of Lactobacillus

PubMed Central

Kant, Ravi; Blom, Jochen; Palva, Airi; Siezen, Roland J.; de Vos, Willem M.

2011-01-01

Summary The genus Lactobacillus includes a diverse group of bacteria consisting of many species that are associated with fermentations of plants, meat or milk. In addition, various lactobacilli are natural inhabitants of the intestinal tract of humans and other animals. Finally, several Lactobacillus strains are marketed as probiotics, as their consumption can confer a health benefit to the host. Presently, 154 Lactobacillus species are known and a growing fraction of these are subject to draft genome sequencing. However, complete genome sequences are needed to provide a platform for detailed genomic comparisons. Therefore, we selected a total of 20 genomes of various Lactobacillus strains for which complete genomic sequences have been reported. These genomes had sizes varying from 1.8 to 3.3 Mb and other characteristic features, such as G+C content that ranged from 33% to 51%. The Lactobacillus pan genome was found to consist of approximately 14 000 protein‐encoding genes, while all 20 genomes shared a total of 383 sets of orthologous genes that defined the Lactobacillus core genome (LCG). Based on advanced phylogeny of the proteins encoded by this LCG, we grouped the 20 strains into three main groups and defined core group genes present in all genomes of a single group, signature group genes shared in all genomes of one group but absent in all other Lactobacillus genomes, and Group‐specific ORFans present in core group genes of one group and absent in all other complete genomes. The latter are of specific value in defining the different groups of genomes. The study provides a platform for present individual comparisons as well as future analysis of new Lactobacillus genomes. PMID:21375712
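The pan genome, core genome, core group genes and signature group genes described in this abstract can be read as plain set operations over per-genome sets of orthologous-group identifiers. The following is a minimal sketch under that assumption, not the authors' pipeline: the strain names, ortholog identifiers, and function names are hypothetical, and real analyses first have to build the ortholog sets from sequence comparisons.

```python
from typing import Dict, Iterable, Set


def pan_genome(genomes: Dict[str, Set[str]]) -> Set[str]:
    """Union of all ortholog sets: every gene family seen in any genome."""
    return set().union(*genomes.values())


def core_genes(gene_sets: Iterable[Set[str]]) -> Set[str]:
    """Intersection of ortholog sets: gene families present in every genome given."""
    sets = list(gene_sets)
    if not sets:
        return set()
    return set.intersection(*sets)


def signature_group_genes(genomes: Dict[str, Set[str]], group: Iterable[str]) -> Set[str]:
    """Gene families shared by all genomes in `group` and absent from all others."""
    members = set(group)
    shared = core_genes(genomes[name] for name in members)
    outside = set().union(*(genes for name, genes in genomes.items() if name not in members))
    return shared - outside


# Hypothetical toy data: three strains, five orthologous groups.
toy = {
    "strainA": {"og1", "og2", "og3", "og5"},
    "strainB": {"og1", "og2", "og3"},
    "strainC": {"og1", "og4"},
}

print(sorted(pan_genome(toy)))
print(sorted(core_genes(toy.values())))
print(sorted(signature_group_genes(toy, ["strainA", "strainB"])))
```

In this toy input, the core group genes of strainA and strainB are og1, og2 and og3, but og1 also occurs in strainC, so only og2 and og3 count as signature group genes.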
Economic importance, taxonomic representation and scientific priority as drivers of genome sequencing projects.

PubMed

Vallée, Geneviève C; Muñoz, Daniella Santos; Sankoff, David

2016-11-11

Of the approximately two hundred sequenced plant genomes, how many and which ones were sequenced motivated by strictly or largely scientific considerations, and how many by chiefly economic, in a wide sense, incentives? And how large a role does publication opportunity play? In an integration of multiple disparate databases and other sources of information, we collect and analyze data on the size (number of species) in the plant orders and families containing sequenced genomes, on the trade value of these species, and of all the same-family or same-order species, and on the publication priority within the family and order. These data are subjected to multiple regression and other statistical analyses. We find that despite the initial importance of model organisms, it is clearly economic considerations that outweigh others in the choice of genome to be sequenced. This has important implications for generalizations about plant genomes, since human choices of plants to harvest (and cultivate) will have incurred many biases with respect to phenotypic characteristics and hence of genomic properties, and recent genomic evolution will also have been affected by human agricultural practices.
Comparison of different methods for isolation of bacterial DNA from retail oyster tissues

USDA-ARS?s Scientific Manuscript database

Oysters are filter-feeders that bio-accumulate bacteria in water while feeding. To evaluate the bacterial genomic DNA extracted from retail oyster tissues, including the gills and digestive glands, four isolation methods were used. Genomic DNA extraction was performed using the Allmag™ Blood Genomic...
A salmonid EST genomic study: genes, duplications, phylogeny and microarrays

USDA-ARS?s Scientific Manuscript database

Background: Salmonids are of interest because of their relatively recent genome duplication, and their extensive use in wild fisheries and aquaculture. A comprehensive gene list and a comparison of genes in some of the different species provide valuable genomic information for one of the most wide...
Complete genome sequence of the plant pathogen Erwinia amylovora strain ATCC 49946

USDA-ARS?s Scientific Manuscript database

Erwinia amylovora causes the economically important disease fire blight that affects rosaceous plants, especially pear and apple. Here we report the complete genome sequence and annotation of strain ATCC 49946. The analysis of the sequence and its comparison with sequenced genomes of closely related...
Genome sequence of Lactobacillus rhamnosus ATCC 8530.

PubMed

Pittet, Vanessa; Ewen, Emily; Bushell, Barry R; Ziola, Barry

2012-02-01

Lactobacillus rhamnosus is found in the human gastrointestinal tract and is important for probiotics. We became interested in L. rhamnosus isolate ATCC 8530 in relation to beer spoilage and hops resistance. We report here the genome sequence of this isolate, along with a brief comparison to other available L. rhamnosus genome sequences.
Microarray Genomic Systems Development

DTIC Science & Technology

2008-06-01

11 species), Escherichia coli TOP10 (7 strains), and Geobacillus stearothermophilus. Using standard molecular biology methods, we isolated genomic...comparisons. Results: Different species of bacteria, including Escherichia coli, Bacillus bacteria, and Geobacillus stearothermophilus produce qualitatively...oligonucleotides to labelled genomic DNA from a set of test samples, including eleven Bacillus species, Geobacillus stearothermophilus, and seven Escherichia
High-throughput comparison, functional annotation, and metabolic modeling of plant genomes using the PlantSEED resource

USDA-ARS?s Scientific Manuscript database

The increasing number of sequenced plant genomes is placing new demands on the methods applied to analyze, annotate, and model these genomes. Today's annotation pipelines result in inconsistent gene assignments that complicate comparative analyses and prevent efficient construction of metabolic mode...
Ocean biogeochemistry modeled with emergent trait-based genomics

NASA Astrophysics Data System (ADS)

Coles, V. J.; Stukel, M. R.; Brooks, M. T.; Burd, A.; Crump, B. C.; Moran, M. A.; Paul, J. H.; Satinsky, B. M.; Yager, P. L.; Zielinski, B. L.; Hood, R. R.

2017-12-01

Marine ecosystem models have advanced to incorporate metabolic pathways discovered with genomic sequencing, but direct comparisons between models and “omics” data are lacking. We developed a model that directly simulates metagenomes and metatranscriptomes for comparison with observations. Model microbes were randomly assigned genes for specialized functions, and communities of 68 species were simulated in the Atlantic Ocean. Unfit organisms were replaced, and the model self-organized to develop community genomes and transcriptomes. Emergent communities from simulations that were initialized with different cohorts of randomly generated microbes all produced realistic vertical and horizontal ocean nutrient, genome, and transcriptome gradients. Thus, the library of gene functions available to the community, rather than the distribution of functions among specific organisms, drove community assembly and biogeochemical gradients in the model ocean.
CnidBase: The Cnidarian Evolutionary Genomics Database

PubMed Central

Ryan, Joseph F.; Finnerty, John R.

2003-01-01

CnidBase, the Cnidarian Evolutionary Genomics Database, is a tool for investigating the evolutionary, developmental and ecological factors that affect gene expression and gene function in cnidarians. In turn, CnidBase will help to illuminate the role of specific genes in shaping cnidarian biodiversity in the present day and in the distant past. CnidBase highlights evolutionary changes between species within the phylum Cnidaria and structures genomic and expression data to facilitate comparisons to non-cnidarian metazoans. CnidBase aims to further the progress that has already been made in the realm of cnidarian evolutionary genomics by creating a central community resource which will help drive future research and facilitate more accurate classification and comparison of new experimental data with existing data. CnidBase is available at http://cnidbase.bu.edu/. PMID:12519972
Hemorrhage and Subsequent Allogenic Red Blood Cell Transfusion are Associated With Characteristic Monocyte Messenger RNA Expression Patterns in Patients After Multiple Injury—A Genome Wide View

PubMed Central

Bogner, Viktoria; Baker, Henry V.; Kanz, Karl-Georg; Moldawer, L. L.; Mutschler, Wolf; Biberthaler, Peter

2014-01-01

Introduction Because outcome after severe trauma is frequently affected by massive blood loss and consecutive hemorrhagic shock, replacement of red blood cell (RBC) units remains indispensable. Administration of RBC units is an independent risk factor for adverse outcome in patients with trauma. The impact of massive blood transfusion or uncrossmatched blood transfusion on the patients’ immune response in the early posttraumatic period remains unclear. Material Thirteen patients presenting with blunt multiple injuries (Injury Severity Score >16) were studied. Monocytes were obtained on admission and at 6, 12, 24, 48, and 72 hours after trauma. Biotinylated complementary RNA targets were hybridized to Affymetrix HG U 133A microarrays. The data were analyzed first by a supervised analysis based on whether the patients received massive blood transfusions, then by hierarchical clustering, and finally by Ingenuity pathway analysis. Results Supervised analysis identified 224 probe sets as differentially expressed (p < 0.001) in patients who received massive blood transfusion compared with those who did not. In addition, 331 probe sets were found to be differentially expressed (p < 0.001) in patients who received uncrossmatched RBC units compared with those who received exclusively crossmatched units. Functional pathway analysis of the respective gene expression profiles suggests a contributory role for the AKT/PI3Kinase pathway, the mitogen-activated protein-kinase pathway, the Ubiquitin pathway, and diverse inflammatory networks. Conclusion We present for the first time a serial, sequential screening analysis of monocyte messenger RNA expression patterns in patients with multiple trauma, indicating a strongly significant association between the patients’ genomic response in blood monocytes and massive or uncrossmatched RBC substitution. PMID:19820587
Interpreting Mammalian Evolution using Fugu Genome Comparisons

DOE Office of Scientific and Technical Information (OSTI.GOV)

Stubbs, L; Ovcharenko, I; Loots, G G

2004-04-02

Comparative sequence analysis of the human and the pufferfish Fugu rubripes (fugu) genomes has revealed several novel functional coding and noncoding regions in the human genome. In particular, the fugu genome has been extremely valuable for identifying transcriptional regulatory elements in human loci harboring unusually high levels of evolutionary conservation to rodent genomes. In such regions, the large evolutionary distance between human and fishes provides an additional filter through which functional noncoding elements can be detected with high efficiency.
The Complete Chloroplast Genome of Wild Rice (Oryza minuta) and Its Comparison to Related Species.

PubMed

Asaf, Sajjad; Waqas, Muhammad; Khan, Abdul L; Khan, Muhammad A; Kang, Sang-Mo; Imran, Qari M; Shahzad, Raheem; Bilal, Saqib; Yun, Byung-Wook; Lee, In-Jung

2017-01-01

Oryza minuta, a tetraploid wild relative of cultivated rice (family Poaceae), possesses a BBCC genome and contains genes that confer resistance to bacterial blight (BB) and white-backed (WBPH) and brown (BPH) plant hoppers. Based on the importance of this wild species, this study aimed to understand the phylogenetic relationships of O. minuta with other Oryza species through an in-depth analysis of the composition and diversity of the chloroplast (cp) genome. The analysis revealed a cp genome size of 135,094 bp with a typical quadripartite structure, consisting of a pair of inverted repeats separated by small and large single copies, 139 representative genes, and 419 randomly distributed microsatellites. The genomic organization, gene order, GC content and codon usage are similar to those of typical angiosperm cp genomes. Approximately 30 forward, 28 tandem and 20 palindromic repeats were detected in the O. minuta cp genome. Comparison of the complete O. minuta cp genome with eleven other Oryza species showed a high degree of sequence similarity and relatively high divergence of intergenic spacers. Phylogenetic analyses based on the complete genome sequence, 65 shared genes and the matK gene showed the same topologies, and O. minuta forms a single clade with parental O. punctata. Thus, the complete O. minuta cp genome provides interesting insights and valuable information that can be used to identify related species and reconstruct its phylogeny.
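Two of the summary statistics reported in this abstract, GC content and microsatellite counts, can be illustrated on a short sequence. This is a minimal sketch, not the tools or thresholds used in the study: the function names, the maximum repeat-unit length of 6 bp, and the minimum of 3 repeats are assumptions chosen for illustration.

```python
import re
from typing import List, Tuple


def gc_content(seq: str) -> float:
    """Fraction of bases that are G or C (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def find_microsatellites(seq: str, max_unit: int = 6, min_repeats: int = 3) -> List[Tuple[int, str, int]]:
    """Return (start, repeat_unit, copy_number) for perfect tandem repeats.

    The lazy quantifier makes the regex try the shortest repeat unit first,
    so 'AAAAAA' is reported as six copies of 'A' rather than three of 'AA'.
    """
    pattern = re.compile(r"([ACGT]{1,%d}?)\1{%d,}" % (max_unit, min_repeats - 1))
    hits = []
    for match in pattern.finditer(seq.upper()):
        unit = match.group(1)
        hits.append((match.start(), unit, len(match.group(0)) // len(unit)))
    return hits


# Hypothetical toy sequence, not taken from any Oryza genome.
example = "GGATATATATCC"
print(round(gc_content(example), 3))
print(find_microsatellites(example))
```

Published microsatellite surveys usually apply different minimum copy numbers for each unit length, so counts from a single threshold like the one above would not be directly comparable to the 419 reported here.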
Extensive genome rearrangements and multiple horizontal gene transfers in a population of Pyrococcus isolates from Vulcano Island, Italy.

PubMed

White, James R; Escobar-Paramo, Patricia; Mongodin, Emmanuel F; Nelson, Karen E; DiRuggiero, Jocelyne

2008-10-01

The extent of chromosome rearrangements in Pyrococcus isolates from marine hydrothermal vents in Vulcano Island, Italy, was evaluated by high-throughput genomic methods. The results illustrate the dynamic nature of the genomes of the genus Pyrococcus and raise the possibility of a connection between rapidly changing environmental conditions and adaptive genomic properties.
Shifting from clonal to sexual reproduction in aphids: physiological and developmental aspects.

PubMed

Le Trionnaire, Gaël; Hardie, Jim; Jaubert-Possamai, Stéphanie; Simon, Jean-Christophe; Tagu, Denis

2008-08-01

Developmental biology is one of the fastest growing and most fascinating research fields in the life sciences. Across the wide range of modes of embryonic development, a fundamental difference exists between organisms with sexual and asexual development. Aphids are unusual organisms that display alternative pathways of sexual and asexual development, the orientation of the pathway being determined by environmental conditions. These insects offer a well-suited system in which to study developmental plasticity, because a side-by-side comparison of sexual and asexual development can be made in individuals with the same genotype. In this review, we describe the developmental mechanisms that have evolved in aphids for alternative sexual and asexual reproduction. In particular, we discuss how environmental cues orientate the reproductive mode of aphids from signal perception to endocrine regulation, and propose a comparative analysis of sexual and asexual gametogenesis and embryogenesis, which has been made possible by the development of molecular methods. As a result of the recent development of genomic resources in aphids, we expect these species will permit major advances in the study of the genomic basis underlying the choice of developmental fate and multiple reproduction strategies.
Campylobacter fetus subsp. testudinum subsp. nov., isolated from humans and reptiles.

PubMed

Fitzgerald, Collette; Tu, Zheng Chao; Patrick, Mary; Stiles, Tracy; Lawson, Andy J; Santovenia, Monica; Gilbert, Maarten J; van Bergen, Marcel; Joyce, Kevin; Pruckler, Janet; Stroika, Steven; Duim, Birgitta; Miller, William G; Loparev, Vladimir; Sinnige, Jan C; Fields, Patricia I; Tauxe, Robert V; Blaser, Martin J; Wagenaar, Jaap A

2014-09-01

A polyphasic study was undertaken to determine the taxonomic position of 13 Campylobacter fetus-like strains from humans (n = 8) and reptiles (n = 5). The results of matrix-assisted laser desorption ionization time-of-flight (MALDI-TOF) MS and genomic data from sap analysis, 16S rRNA gene and hsp60 sequence comparison, pulsed-field gel electrophoresis, amplified fragment length polymorphism analysis, DNA-DNA hybridization and whole genome sequencing demonstrated that these strains are closely related to C. fetus but clearly differentiated from recognized subspecies of C. fetus. Therefore, this unique cluster of 13 strains represents a novel subspecies within the species C. fetus, for which the name Campylobacter fetus subsp. testudinum subsp. nov. is proposed, with strain 03-427(T) ( = ATCC BAA-2539(T) = LMG 27499(T)) as the type strain. Although this novel taxon could not be differentiated from C. fetus subsp. fetus and C. fetus subsp. venerealis using conventional phenotypic tests, MALDI-TOF MS revealed the presence of multiple phenotypic biomarkers which distinguish Campylobacter fetus subsp. testudinum subsp. nov. from recognized subspecies of C. fetus.
A Comprehensive Analysis of Replicative Lifespan in 4,698 Single-Gene Deletion Strains Uncovers Conserved Mechanisms of Aging.

PubMed

McCormick, Mark A; Delaney, Joe R; Tsuchiya, Mitsuhiro; Tsuchiyama, Scott; Shemorry, Anna; Sim, Sylvia; Chou, Annie Chia-Zong; Ahmed, Umema; Carr, Daniel; Murakami, Christopher J; Schleit, Jennifer; Sutphin, George L; Wasko, Brian M; Bennett, Christopher F; Wang, Adrienne M; Olsen, Brady; Beyer, Richard P; Bammler, Theodor K; Prunkard, Donna; Johnson, Simon C; Pennypacker, Juniper K; An, Elroy; Anies, Arieanna; Castanza, Anthony S; Choi, Eunice; Dang, Nick; Enerio, Shiena; Fletcher, Marissa; Fox, Lindsay; Goswami, Sarani; Higgins, Sean A; Holmberg, Molly A; Hu, Di; Hui, Jessica; Jelic, Monika; Jeong, Ki-Soo; Johnston, Elijah; Kerr, Emily O; Kim, Jin; Kim, Diana; Kirkland, Katie; Klum, Shannon; Kotireddy, Soumya; Liao, Eric; Lim, Michael; Lin, Michael S; Lo, Winston C; Lockshon, Dan; Miller, Hillary A; Moller, Richard M; Muller, Brian; Oakes, Jonathan; Pak, Diana N; Peng, Zhao Jun; Pham, Kim M; Pollard, Tom G; Pradeep, Prarthana; Pruett, Dillon; Rai, Dilreet; Robison, Brett; Rodriguez, Ariana A; Ros, Bopharoth; Sage, Michael; Singh, Manpreet K; Smith, Erica D; Snead, Katie; Solanky, Amrita; Spector, Benjamin L; Steffen, Kristan K; Tchao, Bie Nga; Ting, Marc K; Vander Wende, Helen; Wang, Dennis; Welton, K Linnea; Westman, Eric A; Brem, Rachel B; Liu, Xin-Guang; Suh, Yousin; Zhou, Zhongjun; Kaeberlein, Matt; Kennedy, Brian K

2015-11-03

Many genes that affect replicative lifespan (RLS) in the budding yeast Saccharomyces cerevisiae also affect aging in other organisms such as C. elegans and M. musculus. We performed a systematic analysis of yeast RLS in a set of 4,698 viable single-gene deletion strains. Multiple functional gene clusters were identified, and full genome-to-genome comparison demonstrated a significant conservation in longevity pathways between yeast and C. elegans. Among the mechanisms of aging identified, deletion of tRNA exporter LOS1 robustly extended lifespan. Dietary restriction (DR) and inhibition of mechanistic Target of Rapamycin (mTOR) exclude Los1 from the nucleus in a Rad53-dependent manner. Moreover, lifespan extension from deletion of LOS1 is nonadditive with DR or mTOR inhibition, and results in Gcn4 transcription factor activation. Thus, the DNA damage response and mTOR converge on Los1-mediated nuclear tRNA export to regulate Gcn4 activity and aging. Copyright © 2015 Elsevier Inc. All rights reserved.
GenomeGems: evaluation of genetic variability from deep sequencing data

PubMed Central

2012-01-01

Background Detection of disease-causing mutations using Deep Sequencing technologies poses great challenges. In particular, organizing the large number of sequences generated so that mutations that might be biologically relevant are easily identified is a difficult task. Yet only a limited number of automatic, accessible tools exist for this purpose. Findings We developed GenomeGems to fill this gap by enabling the user to view and compare Single Nucleotide Polymorphisms (SNPs) from multiple datasets and to load the data onto the UCSC Genome Browser for an expanded and familiar visualization. As such, via automatic, clear and accessible presentation of processed Deep Sequencing data, our tool aims to facilitate ranking of genomic SNP calling. GenomeGems runs on a local Personal Computer (PC) and is freely available at http://www.tau.ac.il/~nshomron/GenomeGems. Conclusions GenomeGems enables researchers to identify potential disease-causing SNPs in an efficient manner. This enables rapid turnover of information and leads to further experimental SNP validation. The tool allows the user to compare and visualize SNPs from multiple experiments and to easily load SNP data onto the UCSC Genome Browser for further detailed information. PMID:22748151
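Comparing SNPs across multiple datasets, as described above, can be pictured as counting how many datasets report each variant. The sketch below is an illustration under assumptions, not GenomeGems' actual method or file format: the `(chromosome, position, reference, alternate)` tuple representation, the function name, and the sample data are all hypothetical.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple

# Hypothetical SNP key: (chromosome, 1-based position, reference base, alternate base).
Snp = Tuple[str, int, str, str]


def rank_snps_by_recurrence(datasets: Dict[str, Set[Snp]]) -> List[Tuple[Snp, int]]:
    """Count how many datasets contain each SNP, most widely shared first.

    Ties are broken by the SNP key so the ordering is deterministic.
    """
    counts = Counter(snp for snps in datasets.values() for snp in snps)
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical toy input: three samples with partially overlapping calls.
samples = {
    "sample1": {("chr1", 100, "A", "G"), ("chr2", 200, "C", "T")},
    "sample2": {("chr1", 100, "A", "G"), ("chr3", 300, "G", "A")},
    "sample3": {("chr1", 100, "A", "G"), ("chr2", 200, "C", "T")},
}

for snp, n_datasets in rank_snps_by_recurrence(samples):
    print(snp, n_datasets)
```

Because each dataset is a set, a variant is counted at most once per dataset, so the count reflects recurrence across experiments rather than read depth.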
