Science.gov

Sample records for comparative genomic motif

  1. Comparative Whole-Genome Mapping To Determine Staphylococcus aureus Genome Size, Virulence Motifs, and Clonality

    PubMed Central

    Pantrang, Madhulatha; Stahl, Buffy; Briska, Adam M.; Stemper, Mary E.; Wagner, Trevor K.; Zentz, Emily B.; Callister, Steven M.; Lovrich, Steven D.; Henkhaus, John K.; Dykes, Colin W.

    2012-01-01

    Despite being a clonal pathogen, Staphylococcus aureus continues to acquire virulence and antibiotic-resistant genes located on mobile genetic elements such as genomic islands, prophages, pathogenicity islands, and the staphylococcal chromosomal cassette mec (SCCmec) by horizontal gene transfer from other staphylococci. The potential virulence of a S. aureus strain is often determined by comparing its pulsed-field gel electrophoresis (PFGE) or multilocus sequence typing profiles to that of known epidemic or virulent clones and by PCR of the toxin genes. Whole-genome mapping (formerly optical mapping), which is a high-resolution ordered restriction mapping of a bacterial genome, is a relatively new genomic tool that allows comparative analysis across entire bacterial genomes to identify regions of genomic similarities and dissimilarities, including small and large insertions and deletions. We explored whether whole-genome maps (WGMs) of methicillin-resistant S. aureus (MRSA) could be used to predict the presence of methicillin resistance, SCCmec type, and Panton-Valentine leukocidin (PVL)-producing genes on an S. aureus genome. We determined the WGMs of 47 diverse clinical isolates of S. aureus, including well-characterized reference MRSA strains, and annotated the signature restriction pattern in SCCmec types, arginine catabolic mobile element (ACME), and PVL-carrying prophage, PhiSa2 or PhiSa2-like regions on the genome. WGMs of these isolates accurately characterized them as MRSA or methicillin-sensitive S. aureus based on the presence or absence of the SCCmec motif, ACME and the unique signature pattern for the prophage insertion that harbored the PVL genes. Susceptibility to methicillin resistance and the presence of mecA, SCCmec types, and PVL genes were confirmed by PCR. A WGM clustering approach was further able to discriminate isolates within the same PFGE clonal group. These results showed that WGMs could be used not only to genotype S. aureus but also to

  2. Comparative whole-genome mapping to determine Staphylococcus aureus genome size, virulence motifs, and clonality.

    PubMed

    Shukla, Sanjay K; Pantrangi, Madhulatha; Stahl, Buffy; Briska, Adam M; Stemper, Mary E; Wagner, Trevor K; Zentz, Emily B; Callister, Steven M; Lovrich, Steven D; Henkhaus, John K; Dykes, Colin W

    2012-11-01

    Despite being a clonal pathogen, Staphylococcus aureus continues to acquire virulence and antibiotic-resistant genes located on mobile genetic elements such as genomic islands, prophages, pathogenicity islands, and the staphylococcal chromosomal cassette mec (SCCmec) by horizontal gene transfer from other staphylococci. The potential virulence of a S. aureus strain is often determined by comparing its pulsed-field gel electrophoresis (PFGE) or multilocus sequence typing profiles to that of known epidemic or virulent clones and by PCR of the toxin genes. Whole-genome mapping (formerly optical mapping), which is a high-resolution ordered restriction mapping of a bacterial genome, is a relatively new genomic tool that allows comparative analysis across entire bacterial genomes to identify regions of genomic similarities and dissimilarities, including small and large insertions and deletions. We explored whether whole-genome maps (WGMs) of methicillin-resistant S. aureus (MRSA) could be used to predict the presence of methicillin resistance, SCCmec type, and Panton-Valentine leukocidin (PVL)-producing genes on an S. aureus genome. We determined the WGMs of 47 diverse clinical isolates of S. aureus, including well-characterized reference MRSA strains, and annotated the signature restriction pattern in SCCmec types, arginine catabolic mobile element (ACME), and PVL-carrying prophage, PhiSa2 or PhiSa2-like regions on the genome. WGMs of these isolates accurately characterized them as MRSA or methicillin-sensitive S. aureus based on the presence or absence of the SCCmec motif, ACME and the unique signature pattern for the prophage insertion that harbored the PVL genes. Susceptibility to methicillin resistance and the presence of mecA, SCCmec types, and PVL genes were confirmed by PCR. A WGM clustering approach was further able to discriminate isolates within the same PFGE clonal group. These results showed that WGMs could be used not only to genotype S. aureus but also to

  3. CSMET: Comparative Genomic Motif Detection via Multi-Resolution Phylogenetic Shadowing

    PubMed Central

    Kolar, Mladen; Xing, Eric P.

    2008-01-01

    Functional turnover of transcription factor binding sites (TFBSs), such as whole-motif loss or gain, are common events during genome evolution. Conventional probabilistic phylogenetic shadowing methods model the evolution of genomes only at nucleotide level, and lack the ability to capture the evolutionary dynamics of functional turnover of aligned sequence entities. As a result, comparative genomic search of non-conserved motifs across evolutionarily related taxa remains a difficult challenge, especially in higher eukaryotes, where the cis-regulatory regions containing motifs can be long and divergent; existing methods rely heavily on specialized pattern-driven heuristic search or sampling algorithms, which can be difficult to generalize and hard to interpret based on phylogenetic principles. We propose a new method: Conditional Shadowing via Multi-resolution Evolutionary Trees, or CSMET, which uses a context-dependent probabilistic graphical model that allows aligned sites from different taxa in a multiple alignment to be modeled by either a background or an appropriate motif phylogeny conditioning on the functional specifications of each taxon. The functional specifications themselves are the output of a phylogeny which models the evolution not of individual nucleotides, but of the overall functionality (e.g., functional retention or loss) of the aligned sequence segments over lineages. Combining this method with a hidden Markov model that autocorrelates evolutionary rates on successive sites in the genome, CSMET offers a principled way to take into consideration lineage-specific evolution of TFBSs during motif detection, and a readily computable analytical form of the posterior distribution of motifs under TFBS turnover. On both simulated and real Drosophila cis-regulatory modules, CSMET outperforms other state-of-the-art comparative genomic motif finders. PMID:18535663

  4. SALAD database: a motif-based database of protein annotations for plant comparative genomics.

    PubMed

    Mihara, Motohiro; Itoh, Takeshi; Izawa, Takeshi

    2010-01-01

    Proteins often have several motifs with distinct evolutionary histories. Proteins with similar motifs have similar biochemical properties and thus related biological functions. We constructed a unique comparative genomics database termed the SALAD database (http://salad.dna.affrc.go.jp/salad/) from plant-genome-based proteome data sets. We extracted evolutionarily conserved motifs by MEME software from 209,529 protein-sequence annotation groups selected by BLASTP from the proteome data sets of 10 species: rice, sorghum, Arabidopsis thaliana, grape, a lycophyte, a moss, 3 algae, and yeast. Similarity clustering of each protein group was performed by pairwise scoring of the motif patterns of the sequences. The SALAD database provides a user-friendly graphical viewer that displays a motif pattern diagram linked to the resulting bootstrapped dendrogram for each protein group. Amino-acid-sequence-based and nucleotide-sequence-based phylogenetic trees for motif combination alignment, a logo comparison diagram for each clade in the tree, and a Pfam-domain pattern diagram are also available. We also developed a viewer named 'SALAD on ARRAYs' to view arbitrary microarray data sets of paralogous genes linked to the same dendrogram in a window. The SALAD database is a powerful tool for comparing protein sequences and can provide valuable hints for biological analysis.

  5. SALAD database: a motif-based database of protein annotations for plant comparative genomics

    PubMed Central

    Mihara, Motohiro; Itoh, Takeshi; Izawa, Takeshi

    2010-01-01

    Proteins often have several motifs with distinct evolutionary histories. Proteins with similar motifs have similar biochemical properties and thus related biological functions. We constructed a unique comparative genomics database termed the SALAD database (http://salad.dna.affrc.go.jp/salad/) from plant-genome-based proteome data sets. We extracted evolutionarily conserved motifs by MEME software from 209 529 protein-sequence annotation groups selected by BLASTP from the proteome data sets of 10 species: rice, sorghum, Arabidopsis thaliana, grape, a lycophyte, a moss, 3 algae, and yeast. Similarity clustering of each protein group was performed by pairwise scoring of the motif patterns of the sequences. The SALAD database provides a user-friendly graphical viewer that displays a motif pattern diagram linked to the resulting bootstrapped dendrogram for each protein group. Amino-acid-sequence-based and nucleotide-sequence-based phylogenetic trees for motif combination alignment, a logo comparison diagram for each clade in the tree, and a Pfam-domain pattern diagram are also available. We also developed a viewer named ‘SALAD on ARRAYs’ to view arbitrary microarray data sets of paralogous genes linked to the same dendrogram in a window. The SALAD database is a powerful tool for comparing protein sequences and can provide valuable hints for biological analysis. PMID:19854933

  6. Unitary circular code motifs in genomes of eukaryotes.

    PubMed

    El Soufi, Karim; Michel, Christian J

    A set X of 20 trinucleotides was identified in genes of bacteria, eukaryotes, plasmids and viruses, which has in average the highest occurrence in reading frame compared to its two shifted frames (Michel, 2015; Arquès and Michel, 1996). This set X has an interesting mathematical property as X is a circular code (Arquès and Michel, 1996). Thus, the motifs from this circular code X, called X motifs, have the property to always retrieve, synchronize and maintain the reading frame in genes. The origin of this circular code X in genes is an open problem since its discovery in 1996. Here, we first show that the unitary circular codes (UCC), i.e. sets of one word, allow to generate unitary circular code motifs (UCC motifs), i.e. a concatenation of the same motif (simple repeats) leading to low complexity DNA. Three classes of UCC motifs are studied here: repeated dinucleotides (D(+) motifs), repeated trinucleotides (T(+) motifs) and repeated tetranucleotides (T(+) motifs). Thus, the D(+), T(+) and T(+) motifs allow to retrieve, synchronize and maintain a frame modulo 2, modulo 3 and modulo 4, respectively, and their shifted frames (1 modulo 2; 1 and 2 modulo 3; 1, 2 and 3 modulo 4 according to the C(2), C(3) and C(4) properties, respectively) in the DNA sequences. The statistical distribution of the D(+), T(+) and T(+) motifs is analyzed in the genomes of eukaryotes. A UCC motif and its comp lementary UCC motif have the same distribution in the eukaryotic genomes. Furthermore, a UCC motif and its complementary UCC motif have increasing occurrences contrary to their number of hydrogen bonds, very significant with the T(+) motifs. The longest D(+), T(+) and T(+) motifs in the studied eukaryotic genomes are also given. Surprisingly, a scarcity of repeated trinucleotides (T(+) motifs) in the large eukaryotic genomes is observed compared to the D(+) and T(+) motifs. This result has been investigated and may be explained by two outcomes. Repeated trinucleotides (T(+) motifs

  7. Transcription factor motif quality assessment requires systematic comparative analysis

    PubMed Central

    Kibet, Caleb Kipkurui; Machanick, Philip

    2016-01-01

    Transcription factor (TF) binding site prediction remains a challenge in gene regulatory research due to degeneracy and potential variability in binding sites in the genome. Dozens of algorithms designed to learn binding models (motifs) have generated many motifs available in research papers with a subset making it to databases like JASPAR, UniPROBE and Transfac. The presence of many versions of motifs from the various databases for a single TF and the lack of a standardized assessment technique makes it difficult for biologists to make an appropriate choice of binding model and for algorithm developers to benchmark, test and improve on their models. In this study, we review and evaluate the approaches in use, highlight differences and demonstrate the difficulty of defining a standardized motif assessment approach. We review scoring functions, motif length, test data and the type of performance metrics used in prior studies as some of the factors that influence the outcome of a motif assessment. We show that the scoring functions and statistics used in motif assessment influence ranking of motifs in a TF-specific manner. We also show that TF binding specificity can vary by source of genomic binding data. We also demonstrate that information content of a motif is not in isolation a measure of motif quality but is influenced by TF binding behaviour. We conclude that there is a need for an easy-to-use tool that presents all available evidence for a comparative analysis. PMID:27092243

  8. [Specific motifs in the genomes of the family Chlamydiaceae].

    PubMed

    Demkin, V V; Kirillova, N V

    2012-01-01

    Specific motifs in the genomes of the family Chlamydiaceae were discussed. The search for genetic markers ofbacteria identification and typing is an urgent problem. The progress in sequencing technology resulted in compilation of the database of genomic nucleotide sequences of bacteria. This raised the problem of the search and selection of genetic targets for identification and typing in bacterial genes based on comparative analysis of complete genomic sequences. The goal of this work was to implement comparative genetic analysis of different species of the family Chlamydiaceae. This analysis was focused to detection of specific motifs capable of serving as genetic marker of this family. The consensus domains were detected using the Visual Basic for Application software for MS Excel. Complete coincidence of segments 25 nucleotide long was used as the test for consensus domain selection. One complete genomic sequence for each of 8 bacterial species was taken for the experiment. The experimental sample did not contain complete sequence of C. suis, because at the moment of this research this species was absence in the database GenBank. Comparative assay of the sequences of the C. trachomatis and other representatives of the family Chlamydiaceae revealed 41 common motifs for 8 Chlamydiaceae species tested in this work. The maximal number of consensus motifs was observed in genes of ribosomal RNA and t-RNA. In addition to genes of r-RNA and t-RNA consensus motifs were observed in 5 genes and 6 intergene segments. The gene CTL0299, CTLO800, dagA, and hctA consensus motifs detected in this work can be regarded as identification domains of the family Chlamydiaceae.

  9. Unsupervised statistical discovery of spaced motifs in prokaryotic genomes.

    PubMed

    Tong, Hao; Schliekelman, Paul; Mrázek, Jan

    2017-01-05

    DNA sequences contain repetitive motifs which have various functions in the physiology of the organism. A number of methods have been developed for discovery of such sequence motifs with a primary focus on detection of regulatory motifs and particularly transcription factor binding sites. Most motif-finding methods apply probabilistic models to detect motifs characterized by unusually high number of copies of the motif in the analyzed sequences. We present a novel method for detection of pairs of motifs separated by spacers of variable nucleotide sequence but conserved length. Unlike existing methods for motif discovery, the motifs themselves are not required to occur at unusually high frequency but only to exhibit a significant preference to occur at a specific distance from each other. In the present implementation of the method, motifs are represented by pentamers and all pairs of pentamers are evaluated for statistically significant preference for a specific distance. An important step of the algorithm eliminates motif pairs where the spacers separating the two motifs exhibit a high degree of sequence similarity; such motif pairs likely arise from duplications of the whole segment including the motifs and the spacer rather than due to selective constraints indicative of a functional importance of the motif pair. The method was used to scan 569 complete prokaryotic genomes for novel sequence motifs. Some motifs detected were previously known but other motifs found in the search appear to be novel. Selected motif pairs were subjected to further investigation and in some cases their possible biological functions were proposed. We present a new motif-finding technique that is applicable to scanning complete genomes for sequence motifs. The results from analysis of 569 genomes suggest that the method detects previously known motifs that are expected to be found as well as new motifs that are unlikely to be discovered by traditional motif-finding methods. We conclude

  10. Finding sequence motifs in prokaryotic genomes--a brief practical guide for a microbiologist.

    PubMed

    Mrázek, Jan

    2009-09-01

    Finding significant nucleotide sequence motifs in prokaryotic genomes can be divided into three types of tasks: (1) supervised motif finding, where a sample of motif sequences is used to find other similar sequences in genomes; (2) unsupervised motif finding, which typically relates to the task of finding regulatory motifs and protein binding sites and (3) exploratory motif finding, which aims to identify potential functionally significant sequence motifs as those that are unusual in some statistical sense. This article provides a conceptual overview for each type of task, a brief description of basic algorithms used in their solution, and a review of selected relevant software available online.

  11. Genomic analysis of membrane protein families: abundance and conserved motifs

    PubMed Central

    Liu, Yang; Engelman, Donald M; Gerstein, Mark

    2002-01-01

    Background Polytopic membrane proteins can be related to each other on the basis of the number of transmembrane helices and sequence similarities. Building on the Pfam classification of protein domain families, and using transmembrane-helix prediction and sequence-similarity searching, we identified a total of 526 well-characterized membrane protein families in 26 recently sequenced genomes. To this we added a clustering of a number of predicted but unclassified membrane proteins, resulting in a total of 637 membrane protein families. Results Analysis of the occurrence and composition of these families revealed several interesting trends. The number of assigned membrane protein domains has an approximately linear relationship to the total number of open reading frames (ORFs) in 26 genomes studied. Caenorhabditis elegans is an apparent outlier, because of its high representation of seven-span transmembrane (7-TM) chemoreceptor families. In all genomes, including that of C. elegans, the number of distinct membrane protein families has a logarithmic relation to the number of ORFs. Glycine, proline, and tyrosine locations tend to be conserved in transmembrane regions within families, whereas isoleucine, valine, and methionine locations are relatively mutable. Analysis of motifs in putative transmembrane helices reveals that GxxxG and GxxxxxxG (which can be written GG4 and GG7, respectively; see Materials and methods) are among the most prevalent. This was noted in earlier studies; we now find these motifs are particularly well conserved in families, however, especially those corresponding to transporters, symporters, and channels. Conclusions We carried out a genome-wide analysis on patterns of the classified polytopic membrane protein families and analyzed the distribution of conserved amino acids and motifs in the transmembrane helix regions in these families. PMID:12372142

  12. Ensembl comparative genomics resources

    PubMed Central

    Muffato, Matthieu; Beal, Kathryn; Fitzgerald, Stephen; Gordon, Leo; Pignatelli, Miguel; Vilella, Albert J.; Searle, Stephen M. J.; Amode, Ridwan; Brent, Simon; Spooner, William; Kulesha, Eugene; Yates, Andrew; Flicek, Paul

    2016-01-01

    Evolution provides the unifying framework with which to understand biology. The coherent investigation of genic and genomic data often requires comparative genomics analyses based on whole-genome alignments, sets of homologous genes and other relevant datasets in order to evaluate and answer evolutionary-related questions. However, the complexity and computational requirements of producing such data are substantial: this has led to only a small number of reference resources that are used for most comparative analyses. The Ensembl comparative genomics resources are one such reference set that facilitates comprehensive and reproducible analysis of chordate genome data. Ensembl computes pairwise and multiple whole-genome alignments from which large-scale synteny, per-base conservation scores and constrained elements are obtained. Gene alignments are used to define Ensembl Protein Families, GeneTrees and homologies for both protein-coding and non-coding RNA genes. These resources are updated frequently and have a consistent informatics infrastructure and data presentation across all supported species. Specialized web-based visualizations are also available including synteny displays, collapsible gene tree plots, a gene family locator and different alignment views. The Ensembl comparative genomics infrastructure is extensively reused for the analysis of non-vertebrate species by other projects including Ensembl Genomes and Gramene and much of the information here is relevant to these projects. The consistency of the annotation across species and the focus on vertebrates makes Ensembl an ideal system to perform and support vertebrate comparative genomic analyses. We use robust software and pipelines to produce reference comparative data and make it freely available. Database URL: http://www.ensembl.org. PMID:26896847

  13. Ensembl comparative genomics resources.

    PubMed

    Herrero, Javier; Muffato, Matthieu; Beal, Kathryn; Fitzgerald, Stephen; Gordon, Leo; Pignatelli, Miguel; Vilella, Albert J; Searle, Stephen M J; Amode, Ridwan; Brent, Simon; Spooner, William; Kulesha, Eugene; Yates, Andrew; Flicek, Paul

    2016-01-01

    Evolution provides the unifying framework with which to understand biology. The coherent investigation of genic and genomic data often requires comparative genomics analyses based on whole-genome alignments, sets of homologous genes and other relevant datasets in order to evaluate and answer evolutionary-related questions. However, the complexity and computational requirements of producing such data are substantial: this has led to only a small number of reference resources that are used for most comparative analyses. The Ensembl comparative genomics resources are one such reference set that facilitates comprehensive and reproducible analysis of chordate genome data. Ensembl computes pairwise and multiple whole-genome alignments from which large-scale synteny, per-base conservation scores and constrained elements are obtained. Gene alignments are used to define Ensembl Protein Families, GeneTrees and homologies for both protein-coding and non-coding RNA genes. These resources are updated frequently and have a consistent informatics infrastructure and data presentation across all supported species. Specialized web-based visualizations are also available including synteny displays, collapsible gene tree plots, a gene family locator and different alignment views. The Ensembl comparative genomics infrastructure is extensively reused for the analysis of non-vertebrate species by other projects including Ensembl Genomes and Gramene and much of the information here is relevant to these projects. The consistency of the annotation across species and the focus on vertebrates makes Ensembl an ideal system to perform and support vertebrate comparative genomic analyses. We use robust software and pipelines to produce reference comparative data and make it freely available. Database URL: http://www.ensembl.org. © The Author(s) 2016. Published by Oxford University Press.

  14. An integrative and applicable phylogenetic footprinting framework for cis-regulatory motifs identification in prokaryotic genomes.

    PubMed

    Liu, Bingqiang; Zhang, Hanyuan; Zhou, Chuan; Li, Guojun; Fennell, Anne; Wang, Guanghui; Kang, Yu; Liu, Qi; Ma, Qin

    2016-08-09

    Phylogenetic footprinting is an important computational technique for identifying cis-regulatory motifs in orthologous regulatory regions from multiple genomes, as motifs tend to evolve slower than their surrounding non-functional sequences. Its application, however, has several difficulties for optimizing the selection of orthologous data and reducing the false positives in motif prediction. Here we present an integrative phylogenetic footprinting framework for accurate motif predictions in prokaryotic genomes (MP(3)). The framework includes a new orthologous data preparation procedure, an additional promoter scoring and pruning method and an integration of six existing motif finding algorithms as basic motif search engines. Specifically, we collected orthologous genes from available prokaryotic genomes and built the orthologous regulatory regions based on sequence similarity of promoter regions. This procedure made full use of the large-scale genomic data and taxonomy information and filtered out the promoters with limited contribution to produce a high quality orthologous promoter set. The promoter scoring and pruning is implemented through motif voting by a set of complementary predicting tools that mine as many motif candidates as possible and simultaneously eliminate the effect of random noise. We have applied the framework to Escherichia coli k12 genome and evaluated the prediction performance through comparison with seven existing programs. This evaluation was systematically carried out at the nucleotide and binding site level, and the results showed that MP(3) consistently outperformed other popular motif finding tools. We have integrated MP(3) into our motif identification and analysis server DMINDA, allowing users to efficiently identify and analyze motifs in 2,072 completely sequenced prokaryotic genomes. The performance evaluation indicated that MP(3) is effective for predicting regulatory motifs in prokaryotic genomes. Its application may enhance

  15. Discovery of widespread GTP-binding motifs in genomic DNA and RNA.

    PubMed

    Curtis, Edward A; Liu, David R

    2013-04-18

    Biological RNAs that bind small molecules have been implicated in a variety of regulatory and catalytic processes. Inspired by these examples, we used in vitro selection to search a pool of genome-encoded RNA fragments for naturally occurring GTP aptamers. Several aptamer classes were identified, including one (the "G motif") with a G-quadruplex structure. Further analysis revealed that most RNA and DNA G-quadruplexes bind GTP. The G motif is abundant in eukaryotes, and the human genome contains ~75,000 examples with dissociation constants comparable to the GTP concentration of a eukaryotic cell (~300 μM). G-quadruplexes play roles in diverse cellular processes, and our findings raise the possibility that GTP may play a role in the function of these elements. Consistent with this possibility, the sequence requirements of several classes of regulatory G-quadruplexes parallel those of GTP binding. Copyright © 2013 Elsevier Ltd. All rights reserved.

  16. Ebolavirus comparative genomics

    DOE PAGES

    Jun, Se-Ran; Leuze, Michael R.; Nookaew, Intawat; ...

    2015-07-14

    The 2014 Ebola outbreak in West Africa is the largest documented for this virus. We examine the dynamics of this genome, comparing more than one hundred currently available ebolavirus genomes to each other and to other viral genomes. Based on oligomer frequency analysis, the family Filoviridae forms a distinct group from all other sequenced viral genomes. All filovirus genomes sequenced to date encode proteins with similar functions and gene order, although there is considerable divergence in sequences between the three genera Ebolavirus, Cuevavirus, and Marburgvirus within the family Filoviridae. Whereas all ebolavirus genomes are quite similar (multiple sequences of themore » same strain are often identical), variation is most common in the intergenic regions and within specific areas of the genes encoding the glycoprotein (GP), nucleoprotein (NP), and polymerase (L). We predict regions that could contain epitope-binding sites, which might be good vaccine targets. In conclusion, this information, combined with glycosylation sites and experimentally determined epitopes, can identify the most promising regions for the development of therapeutic strategies.« less

  17. Ebolavirus comparative genomics

    PubMed Central

    Jun, Se-Ran; Leuze, Michael R.; Nookaew, Intawat; Uberbacher, Edward C.; Land, Miriam; Zhang, Qian; Wanchai, Visanu; Chai, Juanjuan; Nielsen, Morten; Trolle, Thomas; Lund, Ole; Buzard, Gregory S.; Pedersen, Thomas D.; Wassenaar, Trudy M.; Ussery, David W.

    2015-01-01

    The 2014 Ebola outbreak in West Africa is the largest documented for this virus. To examine the dynamics of this genome, we compare more than 100 currently available ebolavirus genomes to each other and to other viral genomes. Based on oligomer frequency analysis, the family Filoviridae forms a distinct group from all other sequenced viral genomes. All filovirus genomes sequenced to date encode proteins with similar functions and gene order, although there is considerable divergence in sequences between the three genera Ebolavirus, Cuevavirus and Marburgvirus within the family Filoviridae. Whereas all ebolavirus genomes are quite similar (multiple sequences of the same strain are often identical), variation is most common in the intergenic regions and within specific areas of the genes encoding the glycoprotein (GP), nucleoprotein (NP) and polymerase (L). We predict regions that could contain epitope-binding sites, which might be good vaccine targets. This information, combined with glycosylation sites and experimentally determined epitopes, can identify the most promising regions for the development of therapeutic strategies. This manuscript has been authored by UT-Battelle, LLC under Contract No. DE-AC05-00OR22725 with the U.S. Department of Energy. The United States Government retains and the publisher, by accepting the article for publication, acknowledges that the United States Government retains a non-exclusive, paid-up, irrevocable, world-wide license to publish or reproduce the published form of this manuscript, or allow others to do so, for United States Government purposes. The Department of Energy will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan (http://energy.gov/downloads/doe-public-access-plan). PMID:26175035

  18. Ebolavirus comparative genomics.

    PubMed

    Jun, Se-Ran; Leuze, Michael R; Nookaew, Intawat; Uberbacher, Edward C; Land, Miriam; Zhang, Qian; Wanchai, Visanu; Chai, Juanjuan; Nielsen, Morten; Trolle, Thomas; Lund, Ole; Buzard, Gregory S; Pedersen, Thomas D; Wassenaar, Trudy M; Ussery, David W

    2015-09-01

    The 2014 Ebola outbreak in West Africa is the largest documented for this virus. To examine the dynamics of this genome, we compare more than 100 currently available ebolavirus genomes to each other and to other viral genomes. Based on oligomer frequency analysis, the family Filoviridae forms a distinct group from all other sequenced viral genomes. All filovirus genomes sequenced to date encode proteins with similar functions and gene order, although there is considerable divergence in sequences between the three genera Ebolavirus, Cuevavirus and Marburgvirus within the family Filoviridae. Whereas all ebolavirus genomes are quite similar (multiple sequences of the same strain are often identical), variation is most common in the intergenic regions and within specific areas of the genes encoding the glycoprotein (GP), nucleoprotein (NP) and polymerase (L). We predict regions that could contain epitope-binding sites, which might be good vaccine targets. This information, combined with glycosylation sites and experimentally determined epitopes, can identify the most promising regions for the development of therapeutic strategies.This manuscript has been authored by UT-Battelle, LLC under Contract No. DE-AC05-00OR22725 with the U.S. Department of Energy. The United States Government retains and the publisher, by accepting the article for publication, acknowledges that the United States Government retains a non-exclusive, paid-up, irrevocable, world-wide license to publish or reproduce the published form of this manuscript, or allow others to do so, for United States Government purposes. The Department of Energy will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan (http://energy.gov/downloads/doe-public-access-plan).

  19. Genome-wide conserved consensus transcription factor binding motifs are hyper-methylated

    PubMed Central

    2010-01-01

    Background DNA methylation can regulate gene expression by modulating the interaction between DNA and proteins or protein complexes. Conserved consensus motifs exist across the human genome ("predicted transcription factor binding sites": "predicted TFBS") but the large majority of these are proven by chromatin immunoprecipitation and high throughput sequencing (ChIP-seq) not to be biological transcription factor binding sites ("empirical TFBS"). We hypothesize that DNA methylation at conserved consensus motifs prevents promiscuous or disorderly transcription factor binding. Results Using genome-wide methylation maps of the human heart and sperm, we found that all conserved consensus motifs as well as the subset of those that reside outside CpG islands have an aggregate profile of hyper-methylation. In contrast, empirical TFBS with conserved consensus motifs have a profile of hypo-methylation. 40% of empirical TFBS with conserved consensus motifs resided in CpG islands whereas only 7% of all conserved consensus motifs were in CpG islands. Finally we further identified a minority subset of TF whose profiles are either hypo-methylated or neutral at their respective conserved consensus motifs implicating that these TF may be responsible for establishing or maintaining an un-methylated DNA state, or whose binding is not regulated by DNA methylation. Conclusions Our analysis supports the hypothesis that at least for a subset of TF, empirical binding to conserved consensus motifs genome-wide may be controlled by DNA methylation. PMID:20875111

  20. Analysis of Genomic Sequence Motifs for Deciphering Transcription Factor Binding and Transcriptional Regulation in Eukaryotic Cells

    PubMed Central

    Boeva, Valentina

    2016-01-01

    Eukaryotic genomes contain a variety of structured patterns: repetitive elements, binding sites of DNA and RNA associated proteins, splice sites, and so on. Often, these structured patterns can be formalized as motifs and described using a proper mathematical model such as position weight matrix and IUPAC consensus. Two key tasks are typically carried out for motifs in the context of the analysis of genomic sequences. These are: identification in a set of DNA regions of over-represented motifs from a particular motif database, and de novo discovery of over-represented motifs. Here we describe existing methodology to perform these two tasks for motifs characterizing transcription factor binding. When applied to the output of ChIP-seq and ChIP-exo experiments, or to promoter regions of co-modulated genes, motif analysis techniques allow for the prediction of transcription factor binding events and enable identification of transcriptional regulators and co-regulators. The usefulness of motif analysis is further exemplified in this review by how motif discovery improves peak calling in ChIP-seq and ChIP-exo experiments and, when coupled with information on gene expression, allows insights into physical mechanisms of transcriptional modulation. PMID:26941778

  1. Phytozome Comparative Plant Genomics Portal

    SciTech Connect

    Goodstein, David; Batra, Sajeev; Carlson, Joseph; Hayes, Richard; Phillips, Jeremy; Shu, Shengqiang; Schmutz, Jeremy; Rokhsar, Daniel

    2014-09-09

    The Dept. of Energy Joint Genome Institute is a genomics user facility supporting DOE mission science in the areas of Bioenergy, Carbon Cycling, and Biogeochemistry. The Plant Program at the JGI applies genomic, analytical, computational and informatics platforms and methods to: 1. Understand and accelerate the improvement (domestication) of bioenergy crops 2. Characterize and moderate plant response to climate change 3. Use comparative genomics to identify constrained elements and infer gene function 4. Build high quality genomic resource platforms of JGI Plant Flagship genomes for functional and experimental work 5. Expand functional genomic resources for Plant Flagship genomes

  2. Comparative genomics of Lactobacillus

    PubMed Central

    Kant, Ravi; Blom, Jochen; Palva, Airi; Siezen, Roland J.; de Vos, Willem M.

    2011-01-01

    Summary The genus Lactobacillus includes a diverse group of bacteria consisting of many species that are associated with fermentations of plants, meat or milk. In addition, various lactobacilli are natural inhabitants of the intestinal tract of humans and other animals. Finally, several Lactobacillus strains are marketed as probiotics as their consumption can confer a health benefit to host. Presently, 154 Lactobacillus species are known and a growing fraction of these are subject to draft genome sequencing. However, complete genome sequences are needed to provide a platform for detailed genomic comparisons. Therefore, we selected a total of 20 genomes of various Lactobacillus strains for which complete genomic sequences have been reported. These genomes had sizes varying from 1.8 to 3.3 Mb and other characteristic features, such as G+C content that ranged from 33% to 51%. The Lactobacillus pan genome was found to consist of approximately 14 000 protein‐encoding genes while all 20 genomes shared a total of 383 sets of orthologous genes that defined the Lactobacillus core genome (LCG). Based on advanced phylogeny of the proteins encoded by this LCG, we grouped the 20 strains into three main groups and defined core group genes present in all genomes of a single group, signature group genes shared in all genomes of one group but absent in all other Lactobacillus genomes, and Group‐specific ORFans present in core group genes of one group and absent in all other complete genomes. The latter are of specific value in defining the different groups of genomes. The study provides a platform for present individual comparisons as well as future analysis of new Lactobacillus genomes. PMID:21375712

  3. Predicting candidate genomic sequences that correspond to synthetic functional RNA motifs

    PubMed Central

    Laserson, Uri; Gan, Hin Hark; Schlick, Tamar

    2005-01-01

    Riboswitches and RNA interference are important emerging mechanisms found in many organisms to control gene expression. To enhance our understanding of such RNA roles, finding small regulatory motifs in genomes presents a challenge on a wide scale. Many simple functional RNA motifs have been found by in vitro selection experiments, which produce synthetic target-binding aptamers as well as catalytic RNAs, including the hammerhead ribozyme. Motivated by the prediction of Piganeau and Schroeder [(2003) Chem. Biol., 10, 103–104] that synthetic RNAs may have natural counterparts, we develop and apply an efficient computational protocol for identifying aptamer-like motifs in genomes. We define motifs from the sequence and structural information of synthetic aptamers, search for sequences in genomes that will produce motif matches, and then evaluate the structural stability and statistical significance of the potential hits. Our application to aptamers for streptomycin, chloramphenicol, neomycin B and ATP identifies 37 candidate sequences (in coding and non-coding regions) that fold to the target aptamer structures in bacterial and archaeal genomes. Further energetic screening reveals that several candidates exhibit energetic properties and sequence conservation patterns that are characteristic of functional motifs. Besides providing candidates for experimental testing, our computational protocol offers an avenue for expanding natural RNA's functional repertoire. PMID:16254081

  4. Comparative genomics of nematodes.

    PubMed

    Mitreva, Makedonka; Blaxter, Mark L; Bird, David M; McCarter, James P

    2005-10-01

    Recent transcriptome and genome projects have dramatically expanded the biological data available across the phylum Nematoda. Here we summarize analyses of these sequences, which have revealed multiple unexpected results. Despite a uniform body plan, nematodes are more diverse at the molecular level than was previously recognized, with many species- and group-specific novel genes. In the genus Caenorhabditis, changes in chromosome arrangement, particularly local inversions, are also rapid, with breakpoints occurring at 50-fold the rate in vertebrates. Tylenchid plant parasitic nematode genomes contain several genes closely related to genes in bacteria, implicating horizontal gene transfer events in the origins of plant parasitism. Functional genomics techniques are also moving from Caenorhabditis elegans to application throughout the phylum. Soon, eight more draft nematode genome sequences will be available. This unique resource will underpin both molecular understanding of these most abundant metazoan organisms and aid in the examination of the dynamics of genome evolution in animals.

  5. Population genomics and transcriptional consequences of regulatory motif variation in globally diverse Saccharomyces cerevisiae strains.

    PubMed

    Connelly, Caitlin F; Skelly, Daniel A; Dunham, Maitreya J; Akey, Joshua M

    2013-07-01

    Noncoding genetic variation is known to significantly influence gene expression levels in a growing number of specific cases; however, the patterns of genome-wide noncoding variation present within populations, the evolutionary forces acting on noncoding variants, and the relative effects of regulatory polymorphisms on transcript abundance are not well characterized. Here, we address these questions by analyzing patterns of regulatory variation in motifs for 177 DNA binding proteins in 37 strains of Saccharomyces cerevisiae. Between S. cerevisiae strains, we found considerable polymorphism in regulatory motifs across strains (mean π = 0.005) as well as diversity in regulatory motifs (mean 0.91 motifs differences per regulatory region). Population genetics analyses reveal that motifs are under purifying selection, and there is considerable heterogeneity in the magnitude of selection across different motifs. Finally, we obtained RNA-Seq data in 22 strains and identified 49 polymorphic DNA sequence motifs in 30 distinct genes that are significantly associated with transcriptional differences between strains. In 22 of these genes, there was a single polymorphic motif associated with expression in the upstream region. Our results provide comprehensive insights into the evolutionary trajectory of regulatory variation in yeast and the characteristics of a compendium of regulatory alleles.

  6. Population Genomics and Transcriptional Consequences of Regulatory Motif Variation in Globally Diverse Saccharomyces cerevisiae Strains

    PubMed Central

    Connelly, Caitlin F.; Skelly, Daniel A.; Dunham, Maitreya J.; Akey, Joshua M.

    2013-01-01

    Noncoding genetic variation is known to significantly influence gene expression levels in a growing number of specific cases; however, the patterns of genome-wide noncoding variation present within populations, the evolutionary forces acting on noncoding variants, and the relative effects of regulatory polymorphisms on transcript abundance are not well characterized. Here, we address these questions by analyzing patterns of regulatory variation in motifs for 177 DNA binding proteins in 37 strains of Saccharomyces cerevisiae. Between S. cerevisiae strains, we found considerable polymorphism in regulatory motifs across strains (mean π = 0.005) as well as diversity in regulatory motifs (mean 0.91 motifs differences per regulatory region). Population genetics analyses reveal that motifs are under purifying selection, and there is considerable heterogeneity in the magnitude of selection across different motifs. Finally, we obtained RNA-Seq data in 22 strains and identified 49 polymorphic DNA sequence motifs in 30 distinct genes that are significantly associated with transcriptional differences between strains. In 22 of these genes, there was a single polymorphic motif associated with expression in the upstream region. Our results provide comprehensive insights into the evolutionary trajectory of regulatory variation in yeast and the characteristics of a compendium of regulatory alleles. PMID:23619145

  7. Identification of DNA Motifs Implicated in Maintenance of Bacterial Core Genomes by Predictive Modeling

    PubMed Central

    Halpern, David; Chiapello, Hélène; Schbath, Sophie; Robin, Stéphane; Hennequet-Antier, Christelle; Gruss, Alexandra; El Karoui, Meriem

    2007-01-01

    Bacterial biodiversity at the species level, in terms of gene acquisition or loss, is so immense that it raises the question of how essential chromosomal regions are spared from uncontrolled rearrangements. Protection of the genome likely depends on specific DNA motifs that impose limits on the regions that undergo recombination. Although most such motifs remain unidentified, they are theoretically predictable based on their genomic distribution properties. We examined the distribution of the “crossover hotspot instigator,” or Chi, in Escherichia coli, and found that its exceptional distribution is restricted to the core genome common to three strains. We then formulated a set of criteria that were incorporated in a statistical model to search core genomes for motifs potentially involved in genome stability in other species. Our strategy led us to identify and biologically validate two distinct heptamers that possess Chi properties, one in Staphylococcus aureus, and the other in several streptococci. This strategy paves the way for wide-scale discovery of other important functional noncoding motifs that distinguish core genomes from the strain-variable regions. PMID:17941709

  8. Identification of DNA motifs implicated in maintenance of bacterial core genomes by predictive modeling.

    PubMed

    Halpern, David; Chiapello, Hélène; Schbath, Sophie; Robin, Stéphane; Hennequet-Antier, Christelle; Gruss, Alexandra; El Karoui, Meriem

    2007-09-01

    Bacterial biodiversity at the species level, in terms of gene acquisition or loss, is so immense that it raises the question of how essential chromosomal regions are spared from uncontrolled rearrangements. Protection of the genome likely depends on specific DNA motifs that impose limits on the regions that undergo recombination. Although most such motifs remain unidentified, they are theoretically predictable based on their genomic distribution properties. We examined the distribution of the "crossover hotspot instigator," or Chi, in Escherichia coli, and found that its exceptional distribution is restricted to the core genome common to three strains. We then formulated a set of criteria that were incorporated in a statistical model to search core genomes for motifs potentially involved in genome stability in other species. Our strategy led us to identify and biologically validate two distinct heptamers that possess Chi properties, one in Staphylococcus aureus, and the other in several streptococci. This strategy paves the way for wide-scale discovery of other important functional noncoding motifs that distinguish core genomes from the strain-variable regions.

  9. Comparative genomics of Brassicaceae crops

    PubMed Central

    Sharma, Ashutosh; Li, Xiaonan; Lim, Yong Pyo

    2014-01-01

    The family Brassicaceae is one of the major groups of the plant kingdom and comprises diverse species of great economic, agronomic and scientific importance, including the model plant Arabidopsis. The sequencing of the Arabidopsis genome has revolutionized our knowledge in the field of plant biology and provides a foundation in genomics and comparative biology. Genomic resources have been utilized in Brassica for diversity analyses, construction of genetic maps and identification of agronomic traits. In Brassicaceae, comparative sequence analysis across the species has been utilized to understand genome structure, evolution and the detection of conserved genomic segments. In this review, we focus on the progress made in genetic resource development, genome sequencing and comparative mapping in Brassica and related species. The utilization of genomic resources and next-generation sequencing approaches in improvement of Brassica crops is also discussed. PMID:24987286

  10. Prevalent RNA recognition motif duplication in the human genome.

    PubMed

    Tsai, Yihsuan S; Gomez, Shawn M; Wang, Zefeng

    2014-05-01

    The sequence-specific recognition of RNA by proteins is mediated through various RNA binding domains, with the RNA recognition motif (RRM) being the most frequent and present in >50% of RNA-binding proteins (RBPs). Many RBPs contain multiple RRMs, and it is unclear how each RRM contributes to the binding specificity of the entire protein. We found that RRMs within the same RBP (i.e., sibling RRMs) tend to have significantly higher similarity than expected by chance. Sibling RRM pairs from RBPs shared by multiple species tend to have lower similarity than those found only in a single species, suggesting that multiple RRMs within the same protein might arise from domain duplication followed by divergence through random mutations. This finding is exemplified by a recent RRM domain duplication in DAZ proteins and an ancient duplication in PABP proteins. Additionally, we found that different similarities between sibling RRMs are associated with distinct functions of an RBP and that the RBPs tend to contain repetitive sequences with low complexity. Taken together, this study suggests that the number of RBPs with multiple RRMs has expanded in mammals and that the multiple sibling RRMs may recognize similar target motifs in a cooperative manner.

  11. Inference of self-regulated transcriptional networks by comparative genomics.

    PubMed

    Cornish, Joseph P; Matthews, Fialelei; Thomas, Julien R; Erill, Ivan

    2012-01-01

    The assumption of basic properties, like self-regulation, in simple transcriptional regulatory networks can be exploited to infer regulatory motifs from the growing amounts of genomic and meta-genomic data. These motifs can in principle be used to elucidate the nature and scope of transcriptional networks through comparative genomics. Here we assess the feasibility of this approach using the SOS regulatory network of Gram-positive bacteria as a test case. Using experimentally validated data, we show that the known regulatory motif can be inferred through the assumption of self-regulation. Furthermore, the inferred motif provides a more robust search pattern for comparative genomics than the experimental motifs defined in reference organisms. We take advantage of this robustness to generate a functional map of the SOS response in Gram-positive bacteria. Our results reveal definite differences in the composition of the LexA regulon between Firmicutes and Actinobacteria, and confirm that regulation of cell-division inhibition is a widespread characteristic of this network among Gram-positive bacteria.

  12. Genome-wide analysis of ethylene-responsive element binding factor-associated amphiphilic repression motif-containing transcriptional regulators in Arabidopsis.

    PubMed

    Kagale, Sateesh; Links, Matthew G; Rozwadowski, Kevin

    2010-03-01

    The ethylene-responsive element binding factor-associated amphiphilic repression (EAR) motif is a transcriptional regulatory motif identified in members of the ethylene-responsive element binding factor, C2H2, and auxin/indole-3-acetic acid families of transcriptional regulators. Sequence comparison of the core EAR motif sites from these proteins revealed two distinct conservation patterns: LxLxL and DLNxxP. Proteins containing these motifs play key roles in diverse biological functions by negatively regulating genes involved in developmental, hormonal, and stress signaling pathways. Through a genome-wide bioinformatics analysis, we have identified the complete repertoire of the EAR repressome in Arabidopsis (Arabidopsis thaliana) comprising 219 proteins belonging to 21 different transcriptional regulator families. Approximately 72% of these proteins contain a LxLxL type of EAR motif, 22% contain a DLNxxP type of EAR motif, and the remaining 6% have a motif where LxLxL and DLNxxP are overlapping. Published in vitro and in planta investigations support approximately 40% of these proteins functioning as negative regulators of gene expression. Comparative sequence analysis of EAR motif sites and adjoining regions has identified additional preferred residues and potential posttranslational modification sites that may influence the functionality of the EAR motif. Homology searches against protein databases of poplar (Populus trichocarpa), grapevine (Vitis vinifera), rice (Oryza sativa), and sorghum (Sorghum bicolor) revealed that the EAR motif is conserved across these diverse plant species. This genome-wide analysis represents the most extensive survey of EAR motif-containing proteins in Arabidopsis to date and provides a resource enabling investigations into their biological roles and the mechanism of EAR motif-mediated transcriptional regulation.

  13. High resolution genome wide binding event finding and motif discovery reveals transcription factor spatial binding constraints.

    PubMed

    Guo, Yuchun; Mahony, Shaun; Gifford, David K

    2012-01-01

    An essential component of genome function is the syntax of genomic regulatory elements that determine how diverse transcription factors interact to orchestrate a program of regulatory control. A precise characterization of in vivo spacing constraints between key transcription factors would reveal key aspects of this genomic regulatory language. To discover novel transcription factor spatial binding constraints in vivo, we developed a new integrative computational method, genome wide event finding and motif discovery (GEM). GEM resolves ChIP data into explanatory motifs and binding events at high spatial resolution by linking binding event discovery and motif discovery with positional priors in the context of a generative probabilistic model of ChIP data and genome sequence. GEM analysis of 63 transcription factors in 214 ENCODE human ChIP-Seq experiments recovers more known factor motifs than other contemporary methods, and discovers six new motifs for factors with unknown binding specificity. GEM's adaptive learning of binding-event read distributions allows it to further improve upon previous methods for processing ChIP-Seq and ChIP-exo data to yield unsurpassed spatial resolution and discovery of closely spaced binding events of the same factor. In a systematic analysis of in vivo sequence-specific transcription factor binding using GEM, we have found hundreds of spatial binding constraints between factors. GEM found 37 examples of factor binding constraints in mouse ES cells, including strong distance-specific constraints between Klf4 and other key regulatory factors. In human ENCODE data, GEM found 390 examples of spatially constrained pair-wise binding, including such novel pairs as c-Fos:c-Jun/USF1, CTCF/Egr1, and HNF4A/FOXA1. The discovery of new factor-factor spatial constraints in ChIP data is significant because it proposes testable models for regulatory factor interactions that will help elucidate genome function and the implementation of combinatorial

  14. Development of a salicylic acid inducible minimal sub-genomic transcript promoter from Figwort mosaic virus with enhanced root- and leaf-activity using TGACG motif rearrangement.

    PubMed

    Kumar, Deepak; Patro, Sunita; Ghosh, Jayasish; Das, Abhimanyu; Maiti, Indu B; Dey, Nrisingha

    2012-07-15

    In Figwort mosaic virus sub-genomic transcript promoter (F-Sgt), function of the TGACG-regulatory motif, was investigated in the background of artificially designed promoter sequences. The 131bp (FS, -100 to +31) long F-Sgt promoter sequence containing one TGACG motif [FS-(TGACG)] was engineered to generate a set of three modified promoter constructs: [FS-(TGACG)(2), containing one additional TGACG motif at 7 nucleotides upstream of the original one], [FS-(TGACG)(3), containing two additional TGACG motifs at 7 nucleotides upstream and two nucleotides downstream of the original one] and [FS-(TGCTG)(mu), having a mutated TGACG motif]. EMSA and foot-printing analysis confirmed binding of tobacco nuclear factors with modified TGACG motif/s. The transcription-activation of the GUS gene by the TGACG motif/s in above promoter constructs was examined in transgenic tobacco and Arabidopsis plants and observed that the transcription activation was affected by the spacing/s and number/s of the TGACG motif/s. The FS-(TGACG)(2) promoter showed strongest root-activity compared to other modified and CaMV35S promoters. Also under salicylic acid (SA) stress, the leaf-activity of the said promoter was further enhanced. All above findings were confirmed by real-time and semi-qRT PCR analysis. Taken together, these results clearly demonstrated that the TGACG motif plays an important role in inducing the root-specific expression of the F-Sgt promoter. This study advocates the importance of genetic manipulation of functional cis-motif for amending the tissue specificity of a plant promoter. SA inducible FS-(TGACG)(2) promoter with enhanced activity could be a useful candidate promoter for developing plants with enhanced crop productivity.

  15. Comparative Microbial Genomics and Forensics.

    PubMed

    Massey, Steven E

    2016-08-01

    Forensic science concerns the application of scientific techniques to questions of a legal nature and may also be used to address questions of historical importance. Forensic techniques are often used in legal cases that involve crimes against persons or property, and they increasingly may involve cases of bioterrorism, crimes against nature, medical negligence, or tracing the origin of food- and crop-borne disease. Given the rapid advance of genome sequencing and comparative genomics techniques, we ask how these might be used to address cases of a forensic nature, focusing on the use of microbial genome sequence analysis. Such analyses rely on the increasingly large numbers of microbial genomes present in public databases, the ability of individual investigators to rapidly sequence whole microbial genomes, and an increasing depth of understanding of their evolution and function. Suggestions are made as to how comparative microbial genomics might be applied forensically and may represent possibilities for the future development of forensic techniques. A particular emphasis is on the nascent field of genomic epidemiology, which utilizes rapid whole-genome sequencing to identify the source and spread of infectious outbreaks. Also discussed is the application of comparative microbial genomics to the study of historical epidemics and deaths and how the approaches developed may also be applicable to more recent and actionable cases.

  16. Cloud computing for comparative genomics.

    PubMed

    Wall, Dennis P; Kudtarkar, Parul; Fusaro, Vincent A; Pivovarov, Rimma; Patil, Prasad; Tonellato, Peter J

    2010-05-18

    Large comparative genomics studies and tools are becoming increasingly more compute-expensive as the number of available genome sequences continues to rise. The capacity and cost of local computing infrastructures are likely to become prohibitive with the increase, especially as the breadth of questions continues to rise. Alternative computing architectures, in particular cloud computing environments, may help alleviate this increasing pressure and enable fast, large-scale, and cost-effective comparative genomics strategies going forward. To test this, we redesigned a typical comparative genomics algorithm, the reciprocal smallest distance algorithm (RSD), to run within Amazon's Elastic Computing Cloud (EC2). We then employed the RSD-cloud for ortholog calculations across a wide selection of fully sequenced genomes. We ran more than 300,000 RSD-cloud processes within the EC2. These jobs were farmed simultaneously to 100 high capacity compute nodes using the Amazon Web Service Elastic Map Reduce and included a wide mix of large and small genomes. The total computation time took just under 70 hours and cost a total of $6,302 USD. The effort to transform existing comparative genomics algorithms from local compute infrastructures is not trivial. However, the speed and flexibility of cloud computing environments provides a substantial boost with manageable cost. The procedure designed to transform the RSD algorithm into a cloud-ready application is readily adaptable to similar comparative genomics problems.

  17. Cloud computing for comparative genomics

    PubMed Central

    2010-01-01

    Background Large comparative genomics studies and tools are becoming increasingly more compute-expensive as the number of available genome sequences continues to rise. The capacity and cost of local computing infrastructures are likely to become prohibitive with the increase, especially as the breadth of questions continues to rise. Alternative computing architectures, in particular cloud computing environments, may help alleviate this increasing pressure and enable fast, large-scale, and cost-effective comparative genomics strategies going forward. To test this, we redesigned a typical comparative genomics algorithm, the reciprocal smallest distance algorithm (RSD), to run within Amazon's Elastic Computing Cloud (EC2). We then employed the RSD-cloud for ortholog calculations across a wide selection of fully sequenced genomes. Results We ran more than 300,000 RSD-cloud processes within the EC2. These jobs were farmed simultaneously to 100 high capacity compute nodes using the Amazon Web Service Elastic Map Reduce and included a wide mix of large and small genomes. The total computation time took just under 70 hours and cost a total of $6,302 USD. Conclusions The effort to transform existing comparative genomics algorithms from local compute infrastructures is not trivial. However, the speed and flexibility of cloud computing environments provides a substantial boost with manageable cost. The procedure designed to transform the RSD algorithm into a cloud-ready application is readily adaptable to similar comparative genomics problems. PMID:20482786

  18. Comparative Genomics of the Cucurbitaceae

    USDA-ARS?s Scientific Manuscript database

    The genome size for watermelon, melon, cucumber, and pumpkin is 425, 454, 367, and 502 Mbp, respectively, and considered medium size as compared with most other crops. Whole-genome duplication is common in angiosperm plants. Research has revealed a paleohexaploidy (') event in the common ancestor of...

  19. DISCOVER: a feature-based discriminative method for motif search in complex genomes

    PubMed Central

    Fu, Wenjie; Ray, Pradipta; Xing, Eric P.

    2009-01-01

    Motivation: Identifying transcription factor binding sites (TFBSs) encoding complex regulatory signals in metazoan genomes remains a challenging problem in computational genomics. Due to degeneracy of nucleotide content among binding site instances or motifs, and intricate ‘grammatical organization’ of motifs within cis-regulatory modules (CRMs), extant pattern matching-based in silico motif search methods often suffer from impractically high false positive rates, especially in the context of analyzing large genomic datasets, and noisy position weight matrices which characterize binding sites. Here, we try to address this problem by using a framework to maximally utilize the information content of the genomic DNA in the region of query, taking cues from values of various biologically meaningful genetic and epigenetic factors in the query region such as clade-specific evolutionary parameters, presence/absence of nearby coding regions, etc. We present a new method for TFBS prediction in metazoan genomes that utilizes both the CRM architecture of sequences and a variety of features of individual motifs. Our proposed approach is based on a discriminative probabilistic model known as conditional random fields that explicitly optimizes the predictive probability of motif presence in large sequences, based on the joint effect of all such features. Results: This model overcomes weaknesses in earlier methods based on less effective statistical formalisms that are sensitive to spurious signals in the data. We evaluate our method on both simulated CRMs and real Drosophila sequences in comparison with a wide spectrum of existing models, and outperform the state of the art by 22% in F1 score. Availability and Implementation: The code is publicly available at http://www.sailing.cs.cmu.edu/discover.html. Contact: epxing@cs.cmu.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:19478006

  20. Culex genome is not just another genome for comparative genomics.

    PubMed

    Reddy, B P Niranjan; Labbé, Pierrick; Corbel, Vincent

    2012-03-30

    Formal publication of the Culex genome sequence has closed the human disease vector triangle by meeting the Anopheles gambiae and Aedes aegypti genome sequences. Compared to these other mosquitoes, Culex quinquefasciatus possesses many specific hallmark characteristics, and may thus provide different angles for research which ultimately leads to a practical solution for controlling the ever increasing burden of insect-vector-borne diseases around the globe. We argue the special importance of the cosmopolitan species- Culex genome sequence by invoking many interesting questions and the possible of potential of the Culex genome to answer those.

  1. Culex genome is not just another genome for comparative genomics

    PubMed Central

    2012-01-01

    Formal publication of the Culex genome sequence has closed the human disease vector triangle by meeting the Anopheles gambiae and Aedes aegypti genome sequences. Compared to these other mosquitoes, Culex quinquefasciatus possesses many specific hallmark characteristics, and may thus provide different angles for research which ultimately leads to a practical solution for controlling the ever increasing burden of insect-vector-borne diseases around the globe. We argue the special importance of the cosmopolitan species- Culex genome sequence by invoking many interesting questions and the possible of potential of the Culex genome to answer those. PMID:22463777

  2. Comparative genomics for biodiversity conservation

    PubMed Central

    Grueber, Catherine E.

    2015-01-01

    Genomic approaches are gathering momentum in biology and emerging opportunities lie in the creative use of comparative molecular methods for revealing the processes that influence diversity of wildlife. However, few comparative genomic studies are performed with explicit and specific objectives to aid conservation of wild populations. Here I provide a brief overview of comparative genomic approaches that offer specific benefits to biodiversity conservation. Because conservation examples are few, I draw on research from other areas to demonstrate how comparing genomic data across taxa may be used to inform the characterisation of conservation units and studies of hybridisation, as well as studies that provide conservation outcomes from a better understanding of the drivers of divergence. A comparative approach can also provide valuable insight into the threatening processes that impact rare species, such as emerging diseases and their management in conservation. In addition to these opportunities, I note areas where additional research is warranted. Overall, comparing and contrasting the genomic composition of threatened and other species provide several useful tools for helping to preserve the molecular biodiversity of the global ecosystem. PMID:26106461

  3. Genome-wide upstream motif analysis of Cryptosporidium parvum genes clustered by expression profile

    PubMed Central

    2013-01-01

    Background There are very few molecular genetic tools available to study the apicomplexan parasite Cryptosporidium parvum. The organism is not amenable to continuous in vitro cultivation or transfection, and purification of intracellular developmental stages in sufficient numbers for most downstream molecular applications is difficult and expensive since animal hosts are required. As such, very little is known about gene regulation in C. parvum. Results We have clustered whole-genome gene expression profiles generated from a previous study of seven post-infection time points of 3,281 genes to identify genes that show similar expression patterns throughout the first 72 hours of in vitro epithelial cell culture. We used the algorithms MEME, AlignACE and FIRE to identify conserved, overrepresented DNA motifs in the upstream promoter region of genes with similar expression profiles. The most overrepresented motifs were E2F (5′-TGGCGCCA-3′); G-box (5′-G.GGGG-3′); a well-documented ApiAP2 binding motif (5′-TGCAT-3′), and an unknown motif (5′-[A/C] AACTA-3′). We generated a recombinant C. parvum DNA-binding protein domain from a putative ApiAP2 transcription factor [CryptoDB: cgd8_810] and determined its binding specificity using protein-binding microarrays. We demonstrate that cgd8_810 can putatively bind the overrepresented G-box motif, implicating this ApiAP2 in the regulation of many gene clusters. Conclusion Several DNA motifs were identified in the upstream sequences of gene clusters that might serve as potential cis-regulatory elements. These motifs, in concert with protein DNA binding site data, establish for the first time the beginnings of a global C. parvum gene regulatory map that will contribute to our understanding of the development of this zoonotic parasite. PMID:23895416

  4. Genome-wide upstream motif analysis of Cryptosporidium parvum genes clustered by expression profile.

    PubMed

    Oberstaller, Jenna; Joseph, Sandeep J; Kissinger, Jessica C

    2013-07-29

    There are very few molecular genetic tools available to study the apicomplexan parasite Cryptosporidium parvum. The organism is not amenable to continuous in vitro cultivation or transfection, and purification of intracellular developmental stages in sufficient numbers for most downstream molecular applications is difficult and expensive since animal hosts are required. As such, very little is known about gene regulation in C. parvum. We have clustered whole-genome gene expression profiles generated from a previous study of seven post-infection time points of 3,281 genes to identify genes that show similar expression patterns throughout the first 72 hours of in vitro epithelial cell culture. We used the algorithms MEME, AlignACE and FIRE to identify conserved, overrepresented DNA motifs in the upstream promoter region of genes with similar expression profiles. The most overrepresented motifs were E2F (5'-TGGCGCCA-3'); G-box (5'-G.GGGG-3'); a well-documented ApiAP2 binding motif (5'-TGCAT-3'), and an unknown motif (5'-[A/C] AACTA-3'). We generated a recombinant C. parvum DNA-binding protein domain from a putative ApiAP2 transcription factor [CryptoDB: cgd8_810] and determined its binding specificity using protein-binding microarrays. We demonstrate that cgd8_810 can putatively bind the overrepresented G-box motif, implicating this ApiAP2 in the regulation of many gene clusters. Several DNA motifs were identified in the upstream sequences of gene clusters that might serve as potential cis-regulatory elements. These motifs, in concert with protein DNA binding site data, establish for the first time the beginnings of a global C. parvum gene regulatory map that will contribute to our understanding of the development of this zoonotic parasite.

  5. Comparative Genome Mapping in Brassica

    PubMed Central

    Lagercrantz, U.; Lydiate, D. J.

    1996-01-01

    A Brassica nigra genetic linkage map was developed from a highly polymorphic cross analyzed with a set of low copy number Brassica RFLP probes. The Brassica genome is extensively duplicated with eight distinct sets of chromosomal segments, each present in three copies, covering virtually the whole genome. Thus, B. nigra could be descended from a hexaploid ancestor. A comparative analysis of B. nigra, B. oleracea and B. rapa genomes, based on maps developed using a common set of RFLP probes, was also performed. The three genomes have distinct chromosomal structures differentiated by a large number of rearrangements, but collinear regions involving virtually the whole of each the three genomes were identified. The genic contents of B. nigra, B. oleracea and B. rapa were basically equivalent and differences in chromosome number (8, 9 and 10, respectively) are probably the result of chromsome fusions and/or fissions. The strong conservation of overall genic content across the three Brassica genomes mirrors the conservation of genic content observed over a much longer evolutionary span in cereals. However, the rate of chromosomal rearrangement in crucifers is much higher than that observed in cereal genomes. PMID:8978073

  6. Identification of regulatory motifs in the CHO genome for stable monoclonal antibody production.

    PubMed

    Takagi, Yasuhiro; Yamazaki, Tomomi; Masuda, Kenji; Nishii, Shigeaki; Kawakami, Bunsei; Omasa, Takeshi

    2016-08-20

    Chinese hamster ovary (CHO) cell lines are widely used for therapeutic protein production. When a transgene is integrated into the genome of a CHO cell, the expression level is highly dependent on the site of integration because of positional effects such as gene silencing. To overcome negative positional effects and establish stable CHO cell lines with high productivity, several regulatory DNA elements are used in vector construction. Previously, we established the CHO DR1000L-4N cell line, a stable and high copy number Dhfr gene-amplified cell line. It was hypothesized that the chromosomal location of the exogenous gene-amplified region in the CHO DR1000L-4N genome contains regulatory motifs for stable protein production. Therefore, we isolated DNA regulatory motifs from the CHO DR1000L-4N cell line and determined whether these motifs act as an insulator. Our results suggest that stable expression of a transgene can be promoted by the CHO genome sequence, and it would be a powerful tool for therapeutic protein manufacturing.

  7. Comparative genomic analyses in Asparagus.

    PubMed

    Kuhl, Joseph C; Havey, Michael J; Martin, William J; Cheung, Foo; Yuan, Qiaoping; Landherr, Lena; Hu, Yi; Leebens-Mack, James; Town, Christopher D; Sink, Kenneth C

    2005-12-01

    Garden asparagus (Asparagus officinalis L.) belongs to the monocot family Asparagaceae in the order Asparagales. Onion (Allium cepa L.) and Asparagus officinalis are 2 of the most economically important plants of the core Asparagales, a well supported monophyletic group within the Asparagales. Coding regions in onion have lower GC contents than the grasses. We compared the GC content of 3374 unique expressed sequence tags (ESTs) from A. officinalis with Lycoris longituba and onion (both members of the core Asparagales), Acorus americanus (sister to all other monocots), the grasses, and Arabidopsis. Although ESTs in A. officinalis and Acorus had a higher average GC content than Arabidopsis, Lycoris, and onion, all were clearly lower than the grasses. The Asparagaceae have the smallest nuclear genomes among all plants in the core Asparagales, which typically have huge genomes. Within the Asparagaceae, European Asparagus species have approximately twice the nuclear DNA of that of southern African Asparagus species. We cloned and sequenced 20 genomic amplicons from European A. officinalis and the southern African species Asparagus plumosus and observed no clear evidence for a recent genome doubling in A. officinalis relative to A. plumosus. These results indicate that members of the genus Asparagus with smaller genomes may be useful genomic models for plants in the core Asparagales.

  8. Genome-wide analysis of VQ motif-containing proteins in Moso bamboo (Phyllostachys edulis).

    PubMed

    Wang, Yujiao; Liu, Huanlong; Zhu, Dongyue; Gao, Yameng; Yan, Hanwei; Xiang, Yan

    2017-07-01

    29 Moso bamboo VQ proteins were genome-wide identified for the first time, and bioinformatics analysis was performed to investigate phylogenetic relationships and evolutionary divergence. The qRT-PCR data show that PeVQ genes response to different stress treatments. Accumulating evidence suggests that VQ motif-containing proteins in rice (Oryza sativa), Arabidopsis (Arabidopsis thaliana), and maize (Zea mays) play fundamental roles in response to various biotic and abiotic stresses. However, little is known about the functions of VQ family proteins in Moso bamboo (Phyllostachys edulis). In this study, we performed a genome-wide bioinformatic analysis and expression profiling of PeVQ genes. A total of 29 VQ genes was identified and divided into seven subgroups (I-VII) based on phylogenetic analysis. Gene structure and conserved motif analysis revealed that 25 of 29 VQ genes contained no introns. Multiple sequence alignment showed that Moso bamboo VQ motif-containing proteins contained five variations of the conserved motif. The time of duplication and divergence of Moso bamboo from rice and maize was calculated using K s analysis. A heat map was generated using microarray data from 29 Moso bamboo VQ genes suggesting that these genes were expressed in different tissues or developmental stages. Quantitative real-time PCR (qRT-PCR) and promoter analysis indicated that PeVQ genes were differentially regulated following treatment with polyethylene glycol, abscisic acid and salicylic acid. Our results provide a solid foundation for further research of the specific functions of VQ motif-containing proteins in Moso bamboo.

  9. Enhancer Identification through Comparative Genomics

    SciTech Connect

    Visel, Axel; Bristow, James; Pennacchio, Len A.

    2006-10-01

    With the availability of genomic sequence from numerousvertebrates, a paradigm shift has occurred in the identification ofdistant-acting gene regulatory elements. In contrast to traditionalgene-centric studies in which investigators randomly scanned genomicfragments that flank genes of interest in functional assays, the modernapproach begins electronically with publicly available comparativesequence datasets that provide investigators with prioritized lists ofputative functional sequences based on their evolutionary conservation.However, although a large number of tools and resources are nowavailable, application of comparative genomic approaches remains far fromtrivial. In particular, it requires users to dynamically consider thespecies and methods for comparison depending on the specific biologicalquestion under investigation. While there is currently no single generalrule to this end, it is clear that when applied appropriately,comparative genomic approaches exponentially increase our power ingenerating biological hypotheses for subsequent experimentaltesting.

  10. Plant Comparative and Functional Genomics

    DOE PAGES

    Yang, Xiaohan; Leebens-Mack, Jim; Chen, Feng; ...

    2015-01-01

    Plants form the foundation for our global ecosystem and are essential for environmental and human health. An increasing number of available plant genomes and tractable experimental systems, comparative and functional plant genomics research is greatly expanding our knowledge of the molecular basis of economically and nutritionally important traits in crop plants. Inferences drawn from comparative genomics are motivating experimental investigations of gene function and gene interactions. In this special issue aims to highlight recent advances made in comparative and functional genomics research in plants. Nine original research articles in this special issue cover five important topics: (1) transcription factor genemore » families relevant to abiotic stress tolerance; (2) plant secondary metabolism; (3) transcriptomebased markers for quantitative trait locus; (4) epigenetic modifications in plant-microbe interactions; and (5) computational prediction of protein-protein interactions. Finally, we studied the plant species in these articles which include model species as well as nonmodel plant species of economic importance (e.g., food crops and medicinal plants).« less

  11. GenomeFingerprinter: the genome fingerprint and the universal genome fingerprint analysis for systematic comparative genomics.

    PubMed

    Ai, Yuncan; Ai, Hannan; Meng, Fanmei; Zhao, Lei

    2013-01-01

    No attention has been paid on comparing a set of genome sequences crossing genetic components and biological categories with far divergence over large size range. We define it as the systematic comparative genomics and aim to develop the methodology. First, we create a method, GenomeFingerprinter, to unambiguously produce a set of three-dimensional coordinates from a sequence, followed by one three-dimensional plot and six two-dimensional trajectory projections, to illustrate the genome fingerprint of a given genome sequence. Second, we develop a set of concepts and tools, and thereby establish a method called the universal genome fingerprint analysis (UGFA). Particularly, we define the total genetic component configuration (TGCC) (including chromosome, plasmid, and phage) for describing a strain as a systematic unit, the universal genome fingerprint map (UGFM) of TGCC for differentiating strains as a universal system, and the systematic comparative genomics (SCG) for comparing a set of genomes crossing genetic components and biological categories. Third, we construct a method of quantitative analysis to compare two genomes by using the outcome dataset of genome fingerprint analysis. Specifically, we define the geometric center and its geometric mean for a given genome fingerprint map, followed by the Euclidean distance, the differentiate rate, and the weighted differentiate rate to quantitatively describe the difference between two genomes of comparison. Moreover, we demonstrate the applications through case studies on various genome sequences, giving tremendous insights into the critical issues in microbial genomics and taxonomy. We have created a method, GenomeFingerprinter, for rapidly computing, geometrically visualizing, intuitively comparing a set of genomes at genome fingerprint level, and hence established a method called the universal genome fingerprint analysis, as well as developed a method of quantitative analysis of the outcome dataset. These have set

  12. GenomeFingerprinter: The Genome Fingerprint and the Universal Genome Fingerprint Analysis for Systematic Comparative Genomics

    PubMed Central

    Ai, Yuncan; Ai, Hannan; Meng, Fanmei; Zhao, Lei

    2013-01-01

    Background No attention has been paid on comparing a set of genome sequences crossing genetic components and biological categories with far divergence over large size range. We define it as the systematic comparative genomics and aim to develop the methodology. Results First, we create a method, GenomeFingerprinter, to unambiguously produce a set of three-dimensional coordinates from a sequence, followed by one three-dimensional plot and six two-dimensional trajectory projections, to illustrate the genome fingerprint of a given genome sequence. Second, we develop a set of concepts and tools, and thereby establish a method called the universal genome fingerprint analysis (UGFA). Particularly, we define the total genetic component configuration (TGCC) (including chromosome, plasmid, and phage) for describing a strain as a systematic unit, the universal genome fingerprint map (UGFM) of TGCC for differentiating strains as a universal system, and the systematic comparative genomics (SCG) for comparing a set of genomes crossing genetic components and biological categories. Third, we construct a method of quantitative analysis to compare two genomes by using the outcome dataset of genome fingerprint analysis. Specifically, we define the geometric center and its geometric mean for a given genome fingerprint map, followed by the Euclidean distance, the differentiate rate, and the weighted differentiate rate to quantitatively describe the difference between two genomes of comparison. Moreover, we demonstrate the applications through case studies on various genome sequences, giving tremendous insights into the critical issues in microbial genomics and taxonomy. Conclusions We have created a method, GenomeFingerprinter, for rapidly computing, geometrically visualizing, intuitively comparing a set of genomes at genome fingerprint level, and hence established a method called the universal genome fingerprint analysis, as well as developed a method of quantitative analysis of the

  13. Comparative Genomics of the Eukaryotes

    PubMed Central

    Rubin, Gerald M.; Yandell, Mark D.; Wortman, Jennifer R.; Gabor Miklos, George L.; Nelson, Catherine R.; Hariharan, Iswar K.; Fortini, Mark E.; Li, Peter W.; Apweiler, Rolf; Fleischmann, Wolfgang; Cherry, J. Michael; Henikoff, Steven; Skupski, Marian P.; Misra, Sima; Ashburner, Michael; Birney, Ewan; Boguski, Mark S.; Brody, Thomas; Brokstein, Peter; Celniker, Susan E.; Chervitz, Stephen A.; Coates, David; Cravchik, Anibal; Gabrielian, Andrei; Galle, Richard F.; Gelbart, William M.; George, Reed A.; Goldstein, Lawrence S. B.; Gong, Fangcheng; Guan, Ping; Harris, Nomi L.; Hay, Bruce A.; Hoskins, Roger A.; Li, Jiayin; Li, Zhenya; Hynes, Richard O.; Jones, S. J. M.; Kuehl, Peter M.; Lemaitre, Bruno; Littleton, J. Troy; Morrison, Deborah K.; Mungall, Chris; O'Farrell, Patrick H.; Pickeral, Oxana K.; Shue, Chris; Vosshall, Leslie B.; Zhang, Jiong; Zhao, Qi; Zheng, Xiangqun H.; Zhong, Fei; Zhong, Wenyan; Gibbs, Richard; Venter, J. Craig; Adams, Mark D.; Lewis, Suzanna

    2009-01-01

    A comparative analysis of the genomes of Drosophila melanogaster, Caenorhabditis elegans, and Saccharomyces cerevisiae—and the proteins they are predicted to encode—was undertaken in the context of cellular, developmental, and evolutionary processes. The nonredundant protein sets of flies and worms are similar in size and are only twice that of yeast, but different gene families are expanded in each genome, and the multidomain proteins and signaling pathways of the fly and worm are far more complex than those of yeast. The fly has orthologs to 177 of the 289 human disease genes examined and provides the foundation for rapid analysis of some of the basic processes involved in human disease. PMID:10731134

  14. DeFCoM: analysis and modeling of transcription factor binding sites using a motif-centric genomic footprinter.

    PubMed

    Quach, Bryan; Furey, Terrence S

    2017-04-01

    Identifying the locations of transcription factor binding sites is critical for understanding how gene transcription is regulated across different cell types and conditions. Chromatin accessibility experiments such as DNaseI sequencing (DNase-seq) and Assay for Transposase Accessible Chromatin sequencing (ATAC-seq) produce genome-wide data that include distinct 'footprint' patterns at binding sites. Nearly all existing computational methods to detect footprints from these data assume that footprint signals are highly homogeneous across footprint sites. Additionally, a comprehensive and systematic comparison of footprinting methods for specifically identifying which motif sites for a specific factor are bound has not been performed. Using DNase-seq data from the ENCODE project, we show that a large degree of previously uncharacterized site-to-site variability exists in footprint signal across motif sites for a transcription factor. To model this heterogeneity in the data, we introduce a novel, supervised learning footprinter called Detecting Footprints Containing Motifs (DeFCoM). We compare DeFCoM to nine existing methods using evaluation sets from four human cell-lines and eighteen transcription factors and show that DeFCoM outperforms current methods in determining bound and unbound motif sites. We also analyze the impact of several biological and technical factors on the quality of footprint predictions to highlight important considerations when conducting footprint analyses and assessing the performance of footprint prediction methods. Finally, we show that DeFCoM can detect footprints using ATAC-seq data with similar accuracy as when using DNase-seq data. Python code available at https://bitbucket.org/bryancquach/defcom. bquach@email.unc.edu or tsfurey@email.unc.edu. Supplementary data are available at Bioinformatics online.

  15. Comparative genomics of transcriptional regulation of methionine metabolism in Proteobacteria.

    PubMed

    Leyn, Semen A; Suvorova, Inna A; Kholina, Tatiana D; Sherstneva, Sofia S; Novichkov, Pavel S; Gelfand, Mikhail S; Rodionov, Dmitry A

    2014-01-01

    Methionine metabolism and uptake genes in Proteobacteria are controlled by a variety of RNA and DNA regulatory systems. We have applied comparative genomics to reconstruct regulons for three known transcription factors, MetJ, MetR, and SahR, and three known riboswitch motifs, SAH, SAM-SAH, and SAM_alpha, in ∼ 200 genomes from 22 taxonomic groups of Proteobacteria. We also identified two novel regulons: a SahR-like transcription factor SamR controlling various methionine biosynthesis genes in the Xanthomonadales group, and a potential RNA regulatory element with terminator-antiterminator mechanism controlling the metX or metZ genes in beta-proteobacteria. For each analyzed regulator we identified the core, taxon-specific and genome-specific regulon members. By analyzing the distribution of these regulators in bacterial genomes and by comparing their regulon contents we elucidated possible evolutionary scenarios for the regulation of the methionine metabolism genes in Proteobacteria.

  16. A Bayesian hidden Markov model for motif discovery through joint modeling of genomic sequence and ChIP-chip data.

    PubMed

    Gelfond, Jonathan A L; Gupta, Mayetri; Ibrahim, Joseph G

    2009-12-01

    We propose a unified framework for the analysis of chromatin (Ch) immunoprecipitation (IP) microarray (ChIP-chip) data for detecting transcription factor binding sites (TFBSs) or motifs. ChIP-chip assays are used to focus the genome-wide search for TFBSs by isolating a sample of DNA fragments with TFBSs and applying this sample to a microarray with probes corresponding to tiled segments across the genome. Present analytical methods use a two-step approach: (i) analyze array data to estimate IP-enrichment peaks then (ii) analyze the corresponding sequences independently of intensity information. The proposed model integrates peak finding and motif discovery through a unified Bayesian hidden Markov model (HMM) framework that accommodates the inherent uncertainty in both measurements. A Markov chain Monte Carlo algorithm is formulated for parameter estimation, adapting recursive techniques used for HMMs. In simulations and applications to a yeast RAP1 dataset, the proposed method has favorable TFBS discovery performance compared to currently available two-stage procedures in terms of both sensitivity and specificity.

  17. A Bayesian hidden Markov model for motif discovery through joint modeling of genomic sequence and ChIP-chip data

    PubMed Central

    Gelfond, Jonathan A. L.; Gupta, Mayetri; Ibrahim, Joseph G.

    2009-01-01

    SUMMARY We propose a unified framework for the analysis of Chromatin (Ch) Immunoprecipitation (IP) microarray (ChIP-chip) data for detecting transcription factor binding sites (TFBSs) or motifs. ChIP-chip assays are used to focus the genome-wide search for TFBSs by isolating a sample of DNA fragments with TFBSs and applying this sample to a microarray with probes corresponding to tiled segments across the genome. Present analytical methods use a two-step approach: (i) analyze array data to estimate IP enrichment peaks then (ii) analyze the corresponding sequences independently of intensity information. The proposed model integrates peak finding and motif discovery through a unified Bayesian hidden Markov model (HMM) framework that accommodates the inherent uncertainty in both measurements. A Markov Chain Monte Carlo algorithm is formulated for parameter estimation, adapting recursive techniques used for HMMs. In simulations and applications to a yeast RAP1 dataset, the proposed method has favorable TFBS discovery performance compared to currently available two-stage procedures in terms of both sensitivity and specificity. PMID:19210737

  18. Genome-Wide Motif Statistics are Shaped by DNA Binding Proteins over Evolutionary Time Scales

    NASA Astrophysics Data System (ADS)

    Qian, Long; Kussell, Edo

    2016-10-01

    The composition of a genome with respect to all possible short DNA motifs impacts the ability of DNA binding proteins to locate and bind their target sites. Since nonfunctional DNA binding can be detrimental to cellular functions and ultimately to organismal fitness, organisms could benefit from reducing the number of nonfunctional DNA binding sites genome wide. Using in vitro measurements of binding affinities for a large collection of DNA binding proteins, in multiple species, we detect a significant global avoidance of weak binding sites in genomes. We demonstrate that the underlying evolutionary process leaves a distinct genomic hallmark in that similar words have correlated frequencies, a signal that we detect in all species across domains of life. We consider the possibility that natural selection against weak binding sites contributes to this process, and using an evolutionary model we show that the strength of selection needed to maintain global word compositions is on the order of point mutation rates. Likewise, we show that evolutionary mechanisms based on interference of protein-DNA binding with replication and mutational repair processes could yield similar results and operate with similar rates. On the basis of these modeling and bioinformatic results, we conclude that genome-wide word compositions have been molded by DNA binding proteins acting through tiny evolutionary steps over time scales spanning millions of generations.

  19. Sequence search and analysis of gene products containing RNA recognition motifs in the human genome.

    PubMed

    Malhotra, Sony; Sowdhamini, Ramanathan

    2014-12-22

    Gene expression is tightly regulated at both transcriptional and post-transcriptional levels. RNA-binding proteins are involved in post-transcriptional gene regulation events. They are involved in a variety of functions such as splicing, alternative splicing, nuclear import and export of mRNA, RNA stability and translation. There are several well-characterized RNA-binding motifs present in a whole genome, such as RNA recognition motif (RRM), KH domain, zinc-fingers etc. In the present study, we have investigated human genome for the presence of RRM-containing gene products starting from RRM domains in the Pfam (Protein family database) repository. In Pfam, seven families are recorded to contain RRM-containing proteins. We studied these families for their taxonomic representation, sequence features (identity, length, phylogeny) and structural properties (mapping conservation on the structures). We then examined the presence of RRM-containing gene products in Homo sapiens genome and identified 928 RRM-containing gene products. These were studied for their predicted domain architectures, biological processes, involvement in pathways, disease relevance and disorder content. RRM domains were observed to occur multiple times in a single polypeptide. However, there are 56 other co-existing domains involved in different regulatory functions. Further, functional enrichment analysis revealed that RRM-containing gene products are mainly involved in biological functions such as mRNA splicing and its regulation. Our sequence analysis identified RRM-containing gene products in the human genome and provides insights into their domain architectures and biological functions. Since mRNA splicing and gene regulation are important in the cellular machinery, this analysis provides an early overview of genes that carry out these functions.

  20. Comparative primate genomics: emerging patterns of genome content and dynamics.

    PubMed

    Rogers, Jeffrey; Gibbs, Richard A

    2014-05-01

    Advances in genome sequencing technologies have created new opportunities for comparative primate genomics. Genome assemblies have been published for various primate species, and analyses of several others are underway. Whole-genome assemblies for the great apes provide remarkable new information about the evolutionary origins of the human genome and the processes involved. Genomic data for macaques and other non-human primates offer valuable insights into genetic similarities and differences among species that are used as models for disease-related research. This Review summarizes current knowledge regarding primate genome content and dynamics, and proposes a series of goals for the near future.

  1. Comparative primate genomics: emerging patterns of genome content and dynamics

    PubMed Central

    Rogers, Jeffrey; Gibbs, Richard A.

    2014-01-01

    Preface Advances in genome sequencing technologies have created new opportunities for comparative primate genomics. Genome assemblies have been published for several primates, with analyses of several others underway. Whole genome assemblies for the great apes provide remarkable new information about the evolutionary origins of the human genome and the processes involved. Genomic data for macaques and other nonhuman primates provide valuable insight into genetic similarities and differences among species used as models for disease-related research. This review summarizes current knowledge regarding primate genome content and dynamics and offers a series of goals for the near future. PMID:24709753

  2. What constitutes an Arabian Helicobacter pylori? Lessons from comparative genomics.

    PubMed

    Kumar, Narender; Albert, M John; Al Abkal, Hanan; Siddique, Iqbal; Ahmed, Niyaz

    2017-02-01

    Helicobacter pylori, the human gastric pathogen, causes a variety of gastric diseases ranging from mild gastritis to gastric cancer. While the studies on H. pylori are dominated by those based on either East Asian or Western strains, information regarding H. pylori strains prevalent in the Middle East remains scarce. Therefore, we carried out whole-genome sequencing and comparative analysis of three H. pylori strains isolated from three native Arab, Kuwaiti patients. H. pylori strains were sequenced using Illumina platform. The sequence reads were filtered and draft genomes were assembled and annotated. Various pathogenicity-associated regions and phages present within the genomes were identified. Phylogenetic analysis was carried out to determine the genetic relatedness of Kuwaiti strains to various lineages of H. pylori. The core genome content and virulence-related genes were analyzed to assess the pathogenic potential. The three genomes clustered along with HpEurope strains in the phylogenetic tree comprising various H. pylori lineages. A total of 1187 genes spread among various functional classes were identified in the core genome analysis. The three genomes possessed a complete cagPAI and also retained most of the known outer membrane proteins as well as virulence-related genes. The cagA gene in all three strains consisted of an AB-C type EPIYA motif. The comparative genomic analysis of Kuwaiti H. pylori strains revealed a European ancestry and a high pathogenic potential. © 2016 John Wiley & Sons Ltd.

  3. Datasets for evolutionary comparative genomics

    PubMed Central

    Liberles, David A

    2005-01-01

    Many decisions about genome sequencing projects are directed by perceived gaps in the tree of life, or towards model organisms. With the goal of a better understanding of biology through the lens of evolution, however, there are additional genomes that are worth sequencing. One such rationale for whole-genome sequencing is discussed here, along with other important strategies for understanding the phenotypic divergence of species. PMID:16086856

  4. Gramene database: navigating plant comparative genomics resources

    USDA-ARS?s Scientific Manuscript database

    Gramene (http://www.gramene.org) is an online, open source, curated resource for plant comparative genomics and pathway analysis designed to support researchers working in plant genomics, breeding, evolutionary biology, system biology, and metabolic engineering. It exploits phylogenetic relationship...

  5. Comparative genomics of the liberibacteral plant pathogens

    USDA-ARS?s Scientific Manuscript database

    Comparative analyses of multiple Liberibacter genomes provide significant insights into the evolutionary history, genetic diversity, and phylogenetic and metabolomic capacities among pathogenic bacteria that have caused tremendous economic losses to agricultural crops. In addition, genomic analyses ...

  6. Cocoa/Cotton Comparative Genomics

    USDA-ARS?s Scientific Manuscript database

    With genome sequence from two members of the Malvaceae family recently made available, we are exploring syntenic relationships, gene content, and evolutionary trajectories between the cacao and cotton genomes. An assembly of cacao (Theobroma cacao) using Illumina and 454 sequence technology yielded ...

  7. Haemonchus contortus: Genome Structure, Organization and Comparative Genomics.

    PubMed

    Laing, R; Martinelli, A; Tracey, A; Holroyd, N; Gilleard, J S; Cotton, J A

    2016-01-01

    One of the first genome sequencing projects for a parasitic nematode was that for Haemonchus contortus. The open access data from the Wellcome Trust Sanger Institute provided a valuable early resource for the research community, particularly for the identification of specific genes and genetic markers. Later, a second sequencing project was initiated by the University of Melbourne, and the two draft genome sequences for H. contortus were published back-to-back in 2013. There is a pressing need for long-range genomic information for genetic mapping, population genetics and functional genomic studies, so we are continuing to improve the Wellcome Trust Sanger Institute assembly to provide a finished reference genome for H. contortus. This review describes this process, compares the H. contortus genome assemblies with draft genomes from other members of the strongylid group and discusses future directions for parasite genomics using the H. contortus model. Copyright © 2016 Elsevier Ltd. All rights reserved.

  8. Freshwater bacterial lifestyles inferred from comparative genomics.

    PubMed

    Livermore, Joshua A; Emrich, Scott J; Tan, John; Jones, Stuart E

    2014-03-01

    While micro-organisms actively mediate and participate in freshwater ecosystem services, we know little about freshwater microbial genetic diversity. Genome sequences are available for many bacteria from the human microbiome and the ocean (over 800 and 200, respectively), but only two freshwater genomes are currently available: the streamlined genomes of Polynucleobacter necessarius ssp. asymbioticus and the Actinobacterium AcI-B1. Here, we sequenced and analysed draft genomes of eight phylogentically diverse freshwater bacteria exhibiting a range of lifestyle characteristics. Comparative genomics of these bacteria reveals putative freshwater bacterial lifestyles based on differences in predicted growth rate, capability to respond to environmental stimuli and diversity of useable carbon substrates. Our conceptual model based on these genomic characteristics provides a foundation on which further ecophysiological and genomic studies can be built. In addition, these genomes greatly expand the diversity of existing genomic context for future studies on the ecology and genetics of freshwater bacteria.

  9. Comparative Genomics Reveals High Genomic Diversity in the Genus Photobacterium.

    PubMed

    Machado, Henrique; Gram, Lone

    2017-01-01

    Vibrionaceae is a large marine bacterial family, which can constitute up to 50% of the prokaryotic population in marine waters. Photobacterium is the second largest genus in the family and we used comparative genomics on 35 strains representing 16 of the 28 species described so far, to understand the genomic diversity present in the Photobacterium genus. Such understanding is important for ecophysiology studies of the genus. We used whole genome sequences to evaluate phylogenetic relationships using several analyses (16S rRNA, MLSA, fur, amino-acid usage, ANI), which allowed us to identify two misidentified strains. Genome analyses also revealed occurrence of higher and lower GC content clades, correlating with phylogenetic clusters. Pan- and core-genome analysis revealed the conservation of 25% of the genome throughout the genus, with a large and open pan-genome. The major source of genomic diversity could be traced to the smaller chromosome and plasmids. Several of the physiological traits studied in the genus did not correlate with phylogenetic data. Since horizontal gene transfer (HGT) is often suggested as a source of genetic diversity and a potential driver of genomic evolution in bacterial species, we looked into evidence of such in Photobacterium genomes. Genomic islands were the source of genomic differences between strains of the same species. Also, we found transposase genes and CRISPR arrays that suggest multiple encounters with foreign DNA. Presence of genomic exchange traits was widespread and abundant in the genus, suggesting a role in genomic evolution. The high genetic variability and indications of genetic exchange make it difficult to elucidate genome evolutionary paths and raise the awareness of the roles of foreign DNA in the genomic evolution of environmental organisms.

  10. Genome-Wide Analyses of Recombination Prone Regions Predict Role of DNA Structural Motif in Recombination

    PubMed Central

    Das, Swapan Kumar; Chowdhury, Shantanu

    2009-01-01

    HapMap findings reveal surprisingly asymmetric distribution of recombinogenic regions. Short recombinogenic regions (hotspots) are interspersed between large relatively non-recombinogenic regions. This raises the interesting possibility of DNA sequence and/or other cis- elements as determinants of recombination. We hypothesized the involvement of non-canonical sequences that can result in local non-B DNA structures and tested this using the G-quadruplex DNA as a model. G-quadruplex or G4 DNA is a unique form of four-stranded non-B DNA structure that engages certain G-rich sequences, presence of such motifs has been noted within telomeres. In support of this hypothesis, genome-wide computational analyses presented here reveal enrichment of potential G4 (PG4) DNA forming sequences within 25618 human hotspots relative to 9290 coldspots (p<0.0001). Furthermore, co-occurrence of PG4 DNA within several short sequence elements that are associated with recombinogenic regions was found to be significantly more than randomly expected. Interestingly, analyses of more than 50 DNA binding factors revealed that co-occurrence of PG4 DNA with target DNA binding sites of transcription factors c-Rel, NF-kappa B (p50 and p65) and Evi-1 was significantly enriched in recombination-prone regions. These observations support involvement of G4 DNA in recombination, predicting a functional model that is consistent with duplex-strand separation induced by formation of G4 motifs in supercoiled DNA and/or when assisted by other cellular factors. PMID:19198658

  11. Comparative genomics of protoploid Saccharomycetaceae

    PubMed Central

    Souciet, Jean-Luc; Dujon, Bernard; Gaillardin, Claude; Johnston, Mark; Baret, Philippe V.; Cliften, Paul; Sherman, David J.; Weissenbach, Jean; Westhof, Eric; Wincker, Patrick; Jubin, Claire; Poulain, Julie; Barbe, Valérie; Ségurens, Béatrice; Artiguenave, François; Anthouard, Véronique; Vacherie, Benoit; Val, Marie-Eve; Fulton, Robert S.; Minx, Patrick; Wilson, Richard; Durrens, Pascal; Jean, Géraldine; Marck, Christian; Martin, Tiphaine; Nikolski, Macha; Rolland, Thomas; Seret, Marie-Line; Casarégola, Serge; Despons, Laurence; Fairhead, Cécile; Fischer, Gilles; Lafontaine, Ingrid; Leh, Véronique; Lemaire, Marc; de Montigny, Jacky; Neuvéglise, Cécile; Thierry, Agnès; Blanc-Lenfle, Isabelle; Bleykasten, Claudine; Diffels, Julie; Fritsch, Emilie; Frangeul, Lionel; Goëffon, Adrien; Jauniaux, Nicolas; Kachouri-Lafond, Rym; Payen, Célia; Potier, Serge; Pribylova, Lenka; Ozanne, Christophe; Richard, Guy-Franck; Sacerdot, Christine; Straub, Marie-Laure; Talla, Emmanuel

    2009-01-01

    Our knowledge of yeast genomes remains largely dominated by the extensive studies on Saccharomyces cerevisiae and the consequences of its ancestral duplication, leaving the evolution of the entire class of hemiascomycetes only partly explored. We concentrate here on five species of Saccharomycetaceae, a large subdivision of hemiascomycetes, that we call “protoploid” because they diverged from the S. cerevisiae lineage prior to its genome duplication. We determined the complete genome sequences of three of these species: Kluyveromyces (Lachancea) thermotolerans and Saccharomyces (Lachancea) kluyveri (two members of the newly described Lachancea clade), and Zygosaccharomyces rouxii. We included in our comparisons the previously available sequences of Kluyveromyces lactis and Ashbya (Eremothecium) gossypii. Despite their broad evolutionary range and significant individual variations in each lineage, the five protoploid Saccharomycetaceae share a core repertoire of approximately 3300 protein families and a high degree of conserved synteny. Synteny blocks were used to define gene orthology and to infer ancestors. Far from representing minimal genomes without redundancy, the five protoploid yeasts contain numerous copies of paralogous genes, either dispersed or in tandem arrays, that, altogether, constitute a third of each genome. Ancient, conserved paralogs as well as novel, lineage-specific paralogs were identified. PMID:19525356

  12. Inferring divergence of context-dependent substitution rates in Drosophila genomes with applications to comparative genomics.

    PubMed

    Chachick, Ran; Tanay, Amos

    2012-07-01

    Nucleotide substitution is a major evolutionary driving force that can incrementally and stochastically give rise to broad divergence patterns among species. The substitution process at each genomic position is frequently modeled independently of the other positions, although complex interactions between nearby bases are known to significantly affect mutation rates. Here, we study the evolution of 12 fly genomes using new algorithms for accurate inference of parameter-rich substitution models. By comparing models between lineages, we reveal the evolutionary histories of substitution rates at different flanking nucleotide contexts. We demonstrate these driving forces of molecular evolution to be constantly changing, suggesting that neutral drift of mutation rates is an important factor in the evolution of genomes and their sequence composition. This observation is used to develop a scalable approach for parameter-rich comparative genomics. By screening short DNA sequences, we demonstrate how homeoboxes and other transcription factor binding motifs are highly conserved based on our parameter-rich models but not according to standard conservation assays. With the increasing availability of genome sequences, rich substitution models become an attractive and practical approach for evolutionary analysis in general and comparative genomics in particular.

  13. Evolutionary Conserved Motif Finder (ECMFinder) for genome-wide identification of clustered YY1- and CTCF-binding sites

    PubMed Central

    Kang, Keunsoo; Chung, Jae Hoon; Kim, Joomyeong

    2009-01-01

    We have developed a new bioinformatics approach called ECMFinder (Evolutionary Conserved Motif Finder). This program searches for a given DNA motif within the entire genome of one species and uses the gene association information of a potential transcription factor-binding site (TFBS) to screen the homologous regions of a second and third species. If multiple species have this potential TFBS in homologous positions, this program recognizes the identified TFBS as an evolutionary conserved motif (ECM). This program outputs a list of ECMs, which can be uploaded as a Custom Track in the UCSC genome browser and can be visualized along with other available data. The feasibility of this approach was tested by searching the genomes of three mammals (human, mouse and cow) with the DNA-binding motifs of YY1 and CTCF. This program successfully identified many clustered YY1- and CTCF-binding sites that are conserved among these species but were previously undetected. In particular, this program identified CTCF-binding sites that are located close to the Dlk1, Magel2 and Cdkn1c imprinted genes. Individual ChIP experiments confirmed the in vivo binding of the YY1 and CTCF proteins to most of these newly discovered binding sites, demonstrating the feasibility and usefulness of ECMFinder. PMID:19208640

  14. Comparative Reannotation of 21 Aspergillus Genomes

    SciTech Connect

    Salamov, Asaf; Riley, Robert; Kuo, Alan; Grigoriev, Igor

    2013-03-08

    We used comparative gene modeling to reannotate 21 Aspergillus genomes. Initial automatic annotation of individual genomes may contain some errors of different nature, e.g. missing genes, incorrect exon-intron structures, 'chimeras', which fuse 2 or more real genes or alternatively splitting some real genes into 2 or more models. The main premise behind the comparative modeling approach is that for closely related genomes most orthologous families have the same conserved gene structure. The algorithm maps all gene models predicted in each individual Aspergillus genome to the other genomes and, for each locus, selects from potentially many competing models, the one which most closely resembles the orthologous genes from other genomes. This procedure is iterated until no further change in gene models is observed. For Aspergillus genomes we predicted in total 4503 new gene models ( ~;;2percent per genome), supported by comparative analysis, additionally correcting ~;;18percent of old gene models. This resulted in a total of 4065 more genes with annotated PFAM domains (~;;3percent increase per genome). Analysis of a few genomes with EST/transcriptomics data shows that the new annotation sets also have a higher number of EST-supported splice sites at exon-intron boundaries.

  15. Building dictionaries of 1D and 3D motifs by mining the Unaligned 1D sequences of 17 archaeal and bacterial genomes.

    PubMed

    Rigoutsos, I; Gao, Y; Floratos, A; Parida, L

    1999-01-01

    We have used the Teiresias algorithm to carry out unsupervised pattern discovery in a database containing the unaligned ORFs from the 17 publicly available complete archaeal and bacterial genomes and build a 1D dictionary of motifs. These motifs which we refer to as seqlets account for and cover 97.88% of this genomic input at the level of amino acid positions. Each of the seqlets in this 1D dictionary was located among the sequences in Release 38.0 of the Protein Data Bank and the structural fragments corresponding to each seqlet's instances were identified and aligned in three dimensions: those of the seqlets that resulted in RMSD errors below a pre-selected threshold of 2.5 Angstroms were entered in a 3D dictionary of structurally conserved seqlets. These two dictionaries can be thought of as cross-indices that facilitate the tackling of tasks such as automated functional annotation of genomic sequences, local homology identification, local structure characterization, comparative genomics, etc.

  16. Genome-wide comparison of ferritin family from Archaea, Bacteria, Eukarya, and Viruses: its distribution, characteristic motif, and phylogenetic relationship.

    PubMed

    Bai, Lina; Xie, Ting; Hu, Qingqing; Deng, Changyan; Zheng, Rong; Chen, Wanping

    2015-10-01

    Ferritins are highly conserved proteins that are widely distributed in various species from archaea to humans. The ubiquitous characteristic of these proteins reflects the pivotal contribution of ferritins to the safe storage and timely delivery of iron to achieve iron homeostasis. This study investigated the ferritin genes in 248 genomes from various species, including viruses, archaea, bacteria, and eukarya. The distribution comparison suggests that mammals and eudicots possess abundant ferritin genes, whereas fungi contain very few ferritin genes. Archaea and bacteria show considerable numbers of ferritin genes. Generally, prokaryotes possess three types of ferritin (the typical ferritin, bacterioferritin, and DNA-binding protein from starved cell), whereas eukaryotes have various subunit types of ferritin, thereby indicating the individuation of the ferritin family during evolution. The characteristic motif analysis of ferritins suggested that all key residues specifying the unique structural motifs of ferritin are highly conserved across three domains of life. Meanwhile, the characteristic motifs were also distinguishable between ferritin groups, especially phytoferritins, which show a plant-specific motif. The phylogenetic analyses show that ferritins within the same subfamily or subunits are generally clustered together. The phylogenetic relationships among ferritin members suggest that both gene duplication and horizontal transfer contribute to the wide variety of ferritins, and their possible evolutionary scenario was also proposed. The results contribute to a better understanding of the distribution, characteristic motif, and evolutionary relationship of the ferritin family.

  17. Genome-wide comparison of ferritin family from Archaea, Bacteria, Eukarya, and Viruses: its distribution, characteristic motif, and phylogenetic relationship

    NASA Astrophysics Data System (ADS)

    Bai, Lina; Xie, Ting; Hu, Qingqing; Deng, Changyan; Zheng, Rong; Chen, Wanping

    2015-10-01

    Ferritins are highly conserved proteins that are widely distributed in various species from archaea to humans. The ubiquitous characteristic of these proteins reflects the pivotal contribution of ferritins to the safe storage and timely delivery of iron to achieve iron homeostasis. This study investigated the ferritin genes in 248 genomes from various species, including viruses, archaea, bacteria, and eukarya. The distribution comparison suggests that mammals and eudicots possess abundant ferritin genes, whereas fungi contain very few ferritin genes. Archaea and bacteria show considerable numbers of ferritin genes. Generally, prokaryotes possess three types of ferritin (the typical ferritin, bacterioferritin, and DNA-binding protein from starved cell), whereas eukaryotes have various subunit types of ferritin, thereby indicating the individuation of the ferritin family during evolution. The characteristic motif analysis of ferritins suggested that all key residues specifying the unique structural motifs of ferritin are highly conserved across three domains of life. Meanwhile, the characteristic motifs were also distinguishable between ferritin groups, especially phytoferritins, which show a plant-specific motif. The phylogenetic analyses show that ferritins within the same subfamily or subunits are generally clustered together. The phylogenetic relationships among ferritin members suggest that both gene duplication and horizontal transfer contribute to the wide variety of ferritins, and their possible evolutionary scenario was also proposed. The results contribute to a better understanding of the distribution, characteristic motif, and evolutionary relationship of the ferritin family.

  18. The genomic HDV ribozyme utilizes a previously unnoticed U-turn motif to accomplish fast site-specific catalysis

    PubMed Central

    Sefcikova, Jana; Krasovska, Maryna V.; Šponer, Jiří; Walter, Nils G.

    2007-01-01

    The genome of the human hepatitis delta virus (HDV) harbors a self-cleaving catalytic RNA motif, the genomic HDV ribozyme, whose crystal structure shows the dangling nucleotides 5′ of the cleavage site projecting away from the catalytic core. This 5′-sequence contains a clinically conserved U − 1 that we find to be essential for fast cleavage, as the order of activity follows U − 1 > C − 1 > A − 1 > G − 1, with a >25-fold activity loss from U − 1 to G − 1. Terbium(III) footprinting detects conformations for the P1.1 stem, the cleavage site wobble pair and the A-minor motif of the catalytic trefoil turn that depend on the identity of the N − 1 base. The most tightly folded catalytic core, resembling that of the reaction product, is found in the U − 1 wild-type precursor. Molecular dynamics simulations demonstrate that a U − 1 forms the most robust kink around the scissile phosphate, exposing it to the catalytic C75 in a previously unnoticed U-turn motif found also, for example, in the hammerhead ribozyme and tRNAs. Strikingly, we find that the common structural U-turn motif serves distinct functions in the HDV and hammerhead ribozymes. PMID:17337436

  19. Orthology for comparative genomics in the mouse genome database.

    PubMed

    Dolan, Mary E; Baldarelli, Richard M; Bello, Susan M; Ni, Li; McAndrews, Monica S; Bult, Carol J; Kadin, James A; Richardson, Joel E; Ringwald, Martin; Eppig, Janan T; Blake, Judith A

    2015-08-01

    The mouse genome database (MGD) is the model organism database component of the mouse genome informatics system at The Jackson Laboratory. MGD is the international data resource for the laboratory mouse and facilitates the use of mice in the study of human health and disease. Since its beginnings, MGD has included comparative genomics data with a particular focus on human-mouse orthology, an essential component of the use of mouse as a model organism. Over the past 25 years, novel algorithms and addition of orthologs from other model organisms have enriched comparative genomics in MGD data, extending the use of orthology data to support the laboratory mouse as a model of human biology. Here, we describe current comparative data in MGD and review the history and refinement of orthology representation in this resource.

  20. Gramene 2013: Comparative plant genomics resources

    USDA-ARS?s Scientific Manuscript database

    Gramene (http://www.gramene.org) is a curated online resource for comparative functional genomics in crops and model plant species, currently hosting 27 fully and 10 partially sequenced reference genomes in its build number 38. Its strength derives from the application of a phylogenetic framework fo...

  1. Gramene: a growing plant comparative genomics resource

    USDA-ARS?s Scientific Manuscript database

    Gramene (www.gramene.org) is a curated genetic, genomic and comparative genome analysis resource for the major crop species, such as rice, maize, wheat and many other plant (mainly grass) species. Gramene is an open-source project, with all data and software freely downloadable through the ftp site ...

  2. Comparative genomic analysis of esophageal cancers.

    PubMed

    Caygill, Christine P J; Gatenby, Piers A C; Herceg, Zdenko; Lima, Sheila C S; Pinto, Luis F R; Watson, Anthony; Wu, Ming-Shiang

    2014-09-01

    The following, from the 12th OESO World Conference: Cancers of the Esophagus, includes commentaries on comparative genomic analysis of esophageal cancers: genomic polymorphisms, the genetic and epigenetic drivers in esophageal cancers, and the collection of data in the UK Barrett's Oesophagus Registry.

  3. Comparative genomics of transcriptional regulation of methionine metabolism in proteobacteria

    DOE PAGES

    Leyn, Semen A.; Suvorova, Inna A.; Kholina, Tatiana D.; ...

    2014-11-20

    Methionine metabolism and uptake genes in Proteobacteria are controlled by a variety of RNA and DNA regulatory systems. We have applied comparative genomics to reconstruct regulons for three known transcription factors, MetJ, MetR, and SahR, and three known riboswitch motifs, SAH, SAM-SAH, and SAM_alpha, in ~200 genomes from 22 taxonomic groups of Proteobacteria. We also identified two novel regulons: a SahR-like transcription factor SamR controlling various methionine biosynthesis genes in the Xanthomonadales group, and a potential RNA regulatory element with terminator-antiterminator mechanism controlling the metX or metZ genes in beta-proteobacteria. For each analyzed regulator we identified the core, taxon-specific andmore » genome-specific regulon members. By analyzing the distribution of these regulators in bacterial genomes and by comparing their regulon contents we elucidated possible evolutionary scenarios for the regulation of the methionine metabolism genes in Proteobacteria.« less

  4. Comparative genomics of transcriptional regulation of methionine metabolism in proteobacteria

    SciTech Connect

    Leyn, Semen A.; Suvorova, Inna A.; Kholina, Tatiana D.; Sherstneva, Sofia S.; Novichkov, Pavel S.; Gelfand, Mikhail S.; Rodionov, Dmitry A.; Kuipers, Oscar P.

    2014-11-20

    Methionine metabolism and uptake genes in Proteobacteria are controlled by a variety of RNA and DNA regulatory systems. We have applied comparative genomics to reconstruct regulons for three known transcription factors, MetJ, MetR, and SahR, and three known riboswitch motifs, SAH, SAM-SAH, and SAM_alpha, in ~200 genomes from 22 taxonomic groups of Proteobacteria. We also identified two novel regulons: a SahR-like transcription factor SamR controlling various methionine biosynthesis genes in the Xanthomonadales group, and a potential RNA regulatory element with terminator-antiterminator mechanism controlling the metX or metZ genes in beta-proteobacteria. For each analyzed regulator we identified the core, taxon-specific and genome-specific regulon members. By analyzing the distribution of these regulators in bacterial genomes and by comparing their regulon contents we elucidated possible evolutionary scenarios for the regulation of the methionine metabolism genes in Proteobacteria.

  5. Comparative Genomics of Transcriptional Regulation of Methionine Metabolism in Proteobacteria

    PubMed Central

    Leyn, Semen A.; Suvorova, Inna A.; Kholina, Tatiana D.; Sherstneva, Sofia S.; Novichkov, Pavel S.; Gelfand, Mikhail S.; Rodionov, Dmitry A.

    2014-01-01

    Methionine metabolism and uptake genes in Proteobacteria are controlled by a variety of RNA and DNA regulatory systems. We have applied comparative genomics to reconstruct regulons for three known transcription factors, MetJ, MetR, and SahR, and three known riboswitch motifs, SAH, SAM-SAH, and SAM_alpha, in ∼200 genomes from 22 taxonomic groups of Proteobacteria. We also identified two novel regulons: a SahR-like transcription factor SamR controlling various methionine biosynthesis genes in the Xanthomonadales group, and a potential RNA regulatory element with terminator-antiterminator mechanism controlling the metX or metZ genes in beta-proteobacteria. For each analyzed regulator we identified the core, taxon-specific and genome-specific regulon members. By analyzing the distribution of these regulators in bacterial genomes and by comparing their regulon contents we elucidated possible evolutionary scenarios for the regulation of the methionine metabolism genes in Proteobacteria. PMID:25411846

  6. Comparative assembly hubs: Web-accessible browsers for comparative genomics

    PubMed Central

    Nguyen, Ngan; Hickey, Glenn; Raney, Brian J.; Armstrong, Joel; Clawson, Hiram; Zweig, Ann; Karolchik, Donna; Kent, William James; Haussler, David; Paten, Benedict

    2014-01-01

    Motivation: Researchers now have access to large volumes of genome sequences for comparative analysis, some generated by the plethora of public sequencing projects and, increasingly, from individual efforts. It is not possible, or necessarily desirable, that the public genome browsers attempt to curate all these data. Instead, a wealth of powerful tools is emerging to empower users to create their own visualizations and browsers. Results: We introduce a pipeline to easily generate collections of Web-accessible UCSC Genome Browsers interrelated by an alignment. It is intended to democratize our comparative genomic browser resources, serving the broad and growing community of evolutionary genomicists and facilitating easy public sharing via the Internet. Using the alignment, all annotations and the alignment itself can be efficiently viewed with reference to any genome in the collection, symmetrically. A new, intelligently scaled alignment display makes it simple to view all changes between the genomes at all levels of resolution, from substitutions to complex structural rearrangements, including duplications. To demonstrate this work, we create a comparative assembly hub containing 57 Escherichia coli and 9 Shigella genomes and show examples that highlight their unique biology. Availability and implementation: The source code is available as open source at: https://github.com/glennhickey/progressiveCactus The E.coli and Shigella genome hub is now a public hub listed on the UCSC browser public hubs Web page. Contact: benedict@soe.ucsc.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:25138168

  7. Homology-independent metrics for comparative genomics.

    PubMed

    Coutinho, Tarcisio José Domingos; Franco, Glória Regina; Lobo, Francisco Pereira

    2015-01-01

    A mainstream procedure to analyze the wealth of genomic data available nowadays is the detection of homologous regions shared across genomes, followed by the extraction of biological information from the patterns of conservation and variation observed in such regions. Although of pivotal importance, comparative genomic procedures that rely on homology inference are obviously not applicable if no homologous regions are detectable. This fact excludes a considerable portion of "genomic dark matter" with no significant similarity - and, consequently, no inferred homology to any other known sequence - from several downstream comparative genomic methods. In this review we compile several sequence metrics that do not rely on homology inference and can be used to compare nucleotide sequences and extract biologically meaningful information from them. These metrics comprise several compositional parameters calculated from sequence data alone, such as GC content, dinucleotide odds ratio, and several codon bias metrics. They also share other interesting properties, such as pervasiveness (patterns persist on smaller scales) and phylogenetic signal. We also cite examples where these homology-independent metrics have been successfully applied to support several bioinformatics challenges, such as taxonomic classification of biological sequences without homology inference. They where also used to detect higher-order patterns of interactions in biological systems, ranging from detecting coevolutionary trends between the genomes of viruses and their hosts to characterization of gene pools of entire microbial communities. We argue that, if correctly understood and applied, homology-independent metrics can add important layers of biological information in comparative genomic studies without prior homology inference.

  8. Human and mouse introns are linked to the same processes and functions through each genome's most frequent non-conserved motifs.

    PubMed

    Tsirigos, Aristotelis; Rigoutsos, Isidore

    2008-06-01

    We identified the most frequent, variable-length DNA sequence motifs in the human and mouse genomes and sub-selected those with multiple recurrences in the intergenic and intronic regions and at least one additional exonic instance in the corresponding genome. We discovered that these motifs have virtually no overlap with intronic sequences that are conserved between human and mouse, and thus are genome-specific. Moreover, we found that these motifs span a substantial fraction of previously uncharacterized human and mouse intronic space. Surprisingly, we found that these genome-specific motifs are over-represented in the introns of genes belonging to the same biological processes and molecular functions in both the human and mouse genomes even though the underlying sequences are not conserved between the two genomes. In fact, the processes and functions that are linked to these genome-specific sequence-motifs are distinct from the processes and functions which are associated with intronic regions that are conserved between human and mouse. The findings show that intronic regions from different genomes are linked to the same processes and functions in the absence of underlying sequence conservation. We highlight the ramifications of this observation with a concrete example that involves the microsatellite instability gene MLH1.

  9. A White Paper on Nematode Comparative Genomics

    PubMed Central

    Bird, David McK.; Blaxter, Mark L.; McCarter, James P.; Mitreva, Makedonka; Sternberg, Paul W.; Thomas, W. Kelley

    2005-01-01

    In response to the new opportunities for genome sequencing and comparative genomics, the Society of Nematology (SON) formed a committee to develop a white paper in support of the broad scientific needs associated with this phylum and interests of SON members. Although genome sequencing is expensive, the data generated are unique in biological systems in that genomes have the potential to be complete (every base of the genome can be accounted for), accurate (the data are digital and not subject to stochastic variation), and permanent (once obtained, the genome of a species does not need to be experimentally re-sampled). The availability of complete, accurate, and permanent genome sequences from diverse nematode species will underpin future studies into the biology and evolution of this phylum and the ecological associations (particularly parasitic) nematodes have with other organisms. We anticipate that upwards of 100 nematode genomes will be solved to varying levels of completion in the coming decade and suggest biological and practical considerations to guide the selection of the most informative taxa for sequencing. PMID:19262884

  10. A white paper on nematode comparative genomics.

    PubMed

    Bird, David McK; Blaxter, Mark L; McCarter, James P; Mitreva, Makedonka; Sternberg, Paul W; Thomas, W Kelley

    2005-12-01

    In response to the new opportunities for genome sequencing and comparative genomics, the Society of Nematology (SON) formed a committee to develop a white paper in support of the broad scientific needs associated with this phylum and interests of SON members. Although genome sequencing is expensive, the data generated are unique in biological systems in that genomes have the potential to be complete (every base of the genome can be accounted for), accurate (the data are digital and not subject to stochastic variation), and permanent (once obtained, the genome of a species does not need to be experimentally re-sampled). The availability of complete, accurate, and permanent genome sequences from diverse nematode species will underpin future studies into the biology and evolution of this phylum and the ecological associations (particularly parasitic) nematodes have with other organisms. We anticipate that upwards of 100 nematode genomes will be solved to varying levels of completion in the coming decade and suggest biological and practical considerations to guide the selection of the most informative taxa for sequencing.

  11. Comparative Genome Analysis of Enterobacter cloacae

    PubMed Central

    Liu, Wing-Yee; Wong, Chi-Fat; Chung, Karl Ming-Kar; Jiang, Jing-Wei; Leung, Frederick Chi-Ching

    2013-01-01

    The Enterobacter cloacae species includes an extremely diverse group of bacteria that are associated with plants, soil and humans. Publication of the complete genome sequence of the plant growth-promoting endophytic E. cloacae subsp. cloacae ENHKU01 provided an opportunity to perform the first comparative genome analysis between strains of this dynamic species. Examination of the pan-genome of E. cloacae showed that the conserved core genome retains the general physiological and survival genes of the species, while genomic factors in plasmids and variable regions determine the virulence of the human pathogenic E. cloacae strain; additionally, the diversity of fimbriae contributes to variation in colonization and host determination of different E. cloacae strains. Comparative genome analysis further illustrated that E. cloacae strains possess multiple mechanisms for antagonistic action against other microorganisms, which involve the production of siderophores and various antimicrobial compounds, such as bacteriocins, chitinases and antibiotic resistance proteins. The presence of Type VI secretion systems is expected to provide further fitness advantages for E. cloacae in microbial competition, thus allowing it to survive in different environments. Competition assays were performed to support our observations in genomic analysis, where E. cloacae subsp. cloacae ENHKU01 demonstrated antagonistic activities against a wide range of plant pathogenic fungal and bacterial species. PMID:24069314

  12. Comparative genomic analysis of the genus Enterococcus.

    PubMed

    Zhong, Zhi; Zhang, Wenyi; Song, Yuqin; Liu, Wenjun; Xu, Haiyan; Xi, Xiaoxia; Menghe, Bilige; Zhang, Heping; Sun, Zhihong

    2017-03-01

    As important lactic acid bacteria, Enterococcus species are widely used in the production of fermented food. However, as some strains of Enterococcus are opportunistic pathogens, their safety has not been generally accepted. In recent years, a large number of new species have been described and classified within the genus Enterococcus, so a better understanding of the genetic relationships and evolution of Enterococcus species is needed. In this study, the genomes of 29 type strains of Enterococcus species were sequenced. In combination with eight complete genome sequences from the Genbank database, the whole genomes of 37 strains of Enterococcus were comparatively analyzed. The average length of Enterococcus genomes was 3.20Mb and the average GC content was 37.99%. The core- and pan- genomes were defined based on the genomes of the 37 strains of Enterococcus. The core-genome contained 605 genes, a large proportion of which were associated with carbohydrate metabolism, protein metabolism, DNA and RNA metabolism. The phylogenetic tree showed that habitat is very important in the evolution of Enterococcus. The genetic relationships were closer in strains that come from similar habitats. According to the topology of the time tree, we found that humans and mammals may be the original hosts of Enterococcus, and then species from humans and mammals made a host-shift to plants, birds, food and other environments. However, it was just an evolutionary scenario, and more data and efforts were needed to prove this postulation. The comparative genomic analysis provided a snapshot of the evolution and genetic diversity of the genus Enterococcus, which paves the way for follow-up studies on its taxonomy and functional genomics. Copyright © 2017 Elsevier GmbH. All rights reserved.

  13. RegPredict: an integrated system for regulon inference in prokaryotes by comparative genomics approach

    SciTech Connect

    Novichkov, Pavel S.; Rodionov, Dmitry A.; Stavrovskaya, Elena D.; Novichkova, Elena S.; Kazakov, Alexey E.; Gelfand, Mikhail S.; Arkin, Adam P.; Mironov, Andrey A.; Dubchak, Inna

    2010-05-26

    RegPredict web server is designed to provide comparative genomics tools for reconstruction and analysis of microbial regulons using comparative genomics approach. The server allows the user to rapidly generate reference sets of regulons and regulatory motif profiles in a group of prokaryotic genomes. The new concept of a cluster of co-regulated orthologous operons allows the user to distribute the analysis of large regulons and to perform the comparative analysis of multiple clusters independently. Two major workflows currently implemented in RegPredict are: (i) regulon reconstruction for a known regulatory motif and (ii) ab initio inference of a novel regulon using several scenarios for the generation of starting gene sets. RegPredict provides a comprehensive collection of manually curated positional weight matrices of regulatory motifs. It is based on genomic sequences, ortholog and operon predictions from the MicrobesOnline. An interactive web interface of RegPredict integrates and presents diverse genomic and functional information about the candidate regulon members from several web resources. RegPredict is freely accessible at http://regpredict.lbl.gov.

  14. Comparative genomics of pyridoxal 5′-phosphate-dependent transcription factor regulons in Bacteria

    PubMed Central

    Suvorova, Inna A.

    2016-01-01

    The MocR-subfamily transcription factors (MocR-TFs) characterized by the GntR-family DNA-binding domain and aminotransferase-like sensory domain are broadly distributed among certain lineages of Bacteria. Characterized MocR-TFs bind pyridoxal 5′-phosphate (PLP) and control transcription of genes involved in PLP, gamma aminobutyric acid (GABA) and taurine metabolism via binding specific DNA operator sites. To identify putative target genes and DNA binding motifs of MocR-TFs, we performed comparative genomics analysis of over 250 bacterial genomes. The reconstructed regulons for 825 MocR-TFs comprise structural genes from over 200 protein families involved in diverse biological processes. Using the genome context and metabolic subsystem analysis we tentatively assigned functional roles for 38 out of 86 orthologous groups of studied regulators. Most of these MocR-TF regulons are involved in PLP metabolism, as well as utilization of GABA, taurine and ectoine. The remaining studied MocR-TF regulators presumably control genes encoding enzymes involved in reduction/oxidation processes, various transporters and PLP-dependent enzymes, for example aminotransferases. Predicted DNA binding motifs of MocR-TFs are generally similar in each orthologous group and are characterized by two to four repeated sequences. Identified motifs were classified according to their structures. Motifs with direct and/or inverted repeat symmetry constitute the majority of inferred DNA motifs, suggesting preferable TF dimerization in head-to-tail or head-to-head configuration. The obtained genomic collection of in silico reconstructed MocR-TF motifs and regulons in Bacteria provides a basis for future experimental characterization of molecular mechanisms for various regulators in this family. PMID:28348826

  15. Comparative genomics of green sulfur bacteria.

    PubMed

    Davenport, Colin; Ussery, David W; Tümmler, Burkhard

    2010-06-01

    Eleven completely sequenced Chlorobi genomes were compared in oligonucleotide usage, gene contents, and synteny. The green sulfur bacteria (GSB) are equipped with a core genome that sustains their anoxygenic phototrophic lifestyle by photosynthesis, sulfur oxidation, and CO(2) fixation. Whole-genome gene family and single gene sequence comparisons yielded similar phylogenetic trees of the sequenced chromosomes indicating a concerted vertical evolution of large gene sets. Chromosomal synteny of genes is not preserved in the phylum Chlorobi. The accessory genome is characterized by anomalous oligonucleotide usage and endows the strains with individual features for transport, secretion, cell wall, extracellular constituents, and a few elements of the biosynthetic apparatus. Giant genes are a peculiar feature of the genera Chlorobium and Prosthecochloris. The predicted proteins have a huge molecular weight of 10(6), and are probably instrumental for the bacteria to generate their own intimate (micro)environment.

  16. Comparative genomic hybridization with single cells after whole genome amplification

    SciTech Connect

    Haddad, B.R.; Baldini, A.; Hughes, M.R.

    1994-09-01

    Conventional karyotype analysis is the ideal way to diagnose chromosomal imbalances. However it requires cell culture and chromosome preparation. There are instances where a very small number of cells are available for cytogenetic evaluation and chromosomes cannot be obtained. Comparative genomic hybridization (CGH) is a novel molecular cytogenetic technique that provides information about genetic imbalances affecting the genome. The power of this technique lies in its ability to detect genetic imbalances using total genomic DNA. We have previously demonstrated the feasibility of whole genome amplification from single cells for subsequent analysis of multiple genetic loci by PCR. In this present work, we combine whole genome amplification with CGH to detect chromosomal imbalances from small numbers of cells. Both cytogenetically normal and abnormal cells were individually picked by micromanipulation and subjected to whole genome amplification using random oligonucleotide primers. Amplified test and control DNA were differentially labeled by incorporation of digoxigenin or biotin, mixed together and hybridized to normal male metaphase spreads. Hybridization was detected with two fluorochromes, rhodamine-anti-digoxigenin and FITC -Avidin. Ratio of intensities of the two fluorochromes along the target chromosomes was analyzed using locally developed computer imaging software. Using the combination of whole genome amplification and CGH, we were able to detect different chromosomal aneuploidies from 30, 20, and 10 cells. It can also be applied to the analysis of fetal cells sorted from maternal circulation, or to tumor cells obtained from needle biopsies or from different body fluids and effusions. Finally, its successful application to single cells will have a great impact on preimplantation diagnosis.

  17. Sequencing and comparing whole mitochondrial genomes ofanimals

    SciTech Connect

    Boore, Jeffrey L.; Macey, J. Robert; Medina, Monica

    2005-04-22

    Comparing complete animal mitochondrial genome sequences is becoming increasingly common for phylogenetic reconstruction and as a model for genome evolution. Not only are they much more informative than shorter sequences of individual genes for inferring evolutionary relatedness, but these data also provide sets of genome-level characters, such as the relative arrangements of genes, that can be especially powerful. We describe here the protocols commonly used for physically isolating mtDNA, for amplifying these by PCR or RCA, for cloning,sequencing, assembly, validation, and gene annotation, and for comparing both sequences and gene arrangements. On several topics, we offer general observations based on our experiences to date with determining and comparing complete mtDNA sequences.

  18. Comparative modeling of marsupial MHC class I molecules identifies structural polymorphisms affecting functional motifs.

    PubMed

    Daly, Kerry; Church, W Bret; Nicholas, Kevin; Williamson, Peter

    2007-11-01

    Major histocompatibility complex (MHC) class I molecules are transmembrane glycoproteins that present antigenic peptides to CD8+ T cells and are subsequently important for the initiation of an immune response. In this study novel MHC class I sequences from the tammar wallaby (Macropus eugenii) have been characterized. Analysis and comparative modeling of these and existing marsupial molecules reveals potential functional polymorphisms within peptide-binding grooves, MHC assembly motifs and the T cell receptor recognition interface. In addition, we show that a previously identified marsupial-specific insertion is within a region, which is known as a putative NK cell receptor (Ly49A) binding site in the mouse, suggesting that this site may be functionally active in marsupials. Further, the analysis highlighted differences in structural and sequence based grouping of marsupial MHC class I molecules. (c) 2007 Wiley-Liss, Inc.

  19. Comparative genomics allows the discovery of cis-regulatory elements in mosquitoes

    PubMed Central

    Sieglaff, Douglas H.; Dunn, W. Augustine; Xie, Xiaohui S.; Megy, Karyn; Marinotti, Osvaldo; James, Anthony A.

    2009-01-01

    The discovery and mapping of cis-regulatory elements is important for understanding regulation of gene transcription in mosquito vectors of human diseases. Genome sequence data are available for 3 species, Aedes aegypti, Anopheles gambiae, and Culex quinquefasciatus (Diptera: Culicidae), representing 2 subfamilies (Culicinae and Anophelinae) that are estimated to have diverged 145 to 200 million years ago. Comparative genomics tools were used to screen genomic DNA fragments located in the 5′-end flanking regions of orthologous genes. These analyses resulted in the identification of 137 sequences, designated “mosquito motifs,” 7 to 9 nucleotides in length, representing 18 families of putative cis-regulatory elements conserved significantly among the 3 species when compared to the fruit fly, Drosophila melanogaster. Forty-one of the motifs were implicated previously in experiments as sites for binding transcription factors or functioning in the regulation of mosquito gene expression. Further analyses revealed associations between specific motifs and expression profiles, particularly in those genes that show increased or decreased mRNA abundance in females following a blood meal, and those accumulating transcription products exclusively or preferentially in the midgut, fat bodies, or ovaries. These results validate the methodology and support a relationship between the discovered motifs and the conservation of hematophagy in mosquitoes. PMID:19211788

  20. VISTA - computational tools for comparative genomics

    SciTech Connect

    Frazer, Kelly A.; Pachter, Lior; Poliakov, Alexander; Rubin,Edward M.; Dubchak, Inna

    2004-01-01

    Comparison of DNA sequences from different species is a fundamental method for identifying functional elements in genomes. Here we describe the VISTA family of tools created to assist biologists in carrying out this task. Our first VISTA server at http://www-gsd.lbl.gov/VISTA/ was launched in the summer of 2000 and was designed to align long genomic sequences and visualize these alignments with associated functional annotations. Currently the VISTA site includes multiple comparative genomics tools and provides users with rich capabilities to browse pre-computed whole-genome alignments of large vertebrate genomes and other groups of organisms with VISTA Browser, submit their own sequences of interest to several VISTA servers for various types of comparative analysis, and obtain detailed comparative analysis results for a set of cardiovascular genes. We illustrate capabilities of the VISTA site by the analysis of a 180 kilobase (kb) interval on human chromosome 5 that encodes for the kinesin family member3A (KIF3A) protein.

  1. Ebolavirus comparative genomics

    SciTech Connect

    Jun, Se-Ran; Leuze, Michael R.; Nookaew, Intawat; Uberbacher, Edward C.; Land, Miriam; Zhang, Qian; Wanchai, Visanu; Chai, Juanjuan; Nielsen, Morten; Trolle, Thomas; Lund, Ole; Buzard, Gregory S.; Pedersen, Thomas D.; Ussery, David W.

    2015-07-14

    The 2014 Ebola outbreak in West Africa is the largest documented for this virus. We examine the dynamics of this genome, comparing more than one hundred currently available ebolavirus genomes to each other and to other viral genomes. Based on oligomer frequency analysis, the family Filoviridae forms a distinct group from all other sequenced viral genomes. All filovirus genomes sequenced to date encode proteins with similar functions and gene order, although there is considerable divergence in sequences between the three genera Ebolavirus, Cuevavirus, and Marburgvirus within the family Filoviridae. Whereas all ebolavirus genomes are quite similar (multiple sequences of the same strain are often identical), variation is most common in the intergenic regions and within specific areas of the genes encoding the glycoprotein (GP), nucleoprotein (NP), and polymerase (L). We predict regions that could contain epitope-binding sites, which might be good vaccine targets. In conclusion, this information, combined with glycosylation sites and experimentally determined epitopes, can identify the most promising regions for the development of therapeutic strategies.

  2. Gramene 2013: comparative plant genomics resources

    PubMed Central

    Monaco, Marcela K.; Stein, Joshua; Naithani, Sushma; Wei, Sharon; Dharmawardhana, Palitha; Kumari, Sunita; Amarasinghe, Vindhya; Youens-Clark, Ken; Thomason, James; Preece, Justin; Pasternak, Shiran; Olson, Andrew; Jiao, Yinping; Lu, Zhenyuan; Bolser, Dan; Kerhornou, Arnaud; Staines, Dan; Walts, Brandon; Wu, Guanming; D’Eustachio, Peter; Haw, Robin; Croft, David; Kersey, Paul J.; Stein, Lincoln; Jaiswal, Pankaj; Ware, Doreen

    2014-01-01

    Gramene (http://www.gramene.org) is a curated online resource for comparative functional genomics in crops and model plant species, currently hosting 27 fully and 10 partially sequenced reference genomes in its build number 38. Its strength derives from the application of a phylogenetic framework for genome comparison and the use of ontologies to integrate structural and functional annotation data. Whole-genome alignments complemented by phylogenetic gene family trees help infer syntenic and orthologous relationships. Genetic variation data, sequences and genome mappings available for 10 species, including Arabidopsis, rice and maize, help infer putative variant effects on genes and transcripts. The pathways section also hosts 10 species-specific metabolic pathways databases developed in-house or by our collaborators using Pathway Tools software, which facilitates searches for pathway, reaction and metabolite annotations, and allows analyses of user-defined expression datasets. Recently, we released a Plant Reactome portal featuring 133 curated rice pathways. This portal will be expanded for Arabidopsis, maize and other plant species. We continue to provide genetic and QTL maps and marker datasets developed by crop researchers. The project provides a unique community platform to support scientific research in plant genomics including studies in evolution, genetics, plant breeding, molecular biology, biochemistry and systems biology. PMID:24217918

  3. Comparative and demographic analysis of orangutan genomes

    PubMed Central

    Locke, Devin P.; Hillier, LaDeana W.; Warren, Wesley C.; Worley, Kim C.; Nazareth, Lynne V.; Muzny, Donna M.; Yang, Shiaw-Pyng; Wang, Zhengyuan; Chinwalla, Asif T.; Minx, Pat; Mitreva, Makedonka; Cook, Lisa; Delehaunty, Kim D.; Fronick, Catrina; Schmidt, Heather; Fulton, Lucinda A.; Fulton, Robert S.; Nelson, Joanne O.; Magrini, Vincent; Pohl, Craig; Graves, Tina A.; Markovic, Chris; Cree, Andy; Dinh, Huyen H.; Hume, Jennifer; Kovar, Christie L.; Fowler, Gerald R.; Lunter, Gerton; Meader, Stephen; Heger, Andreas; Ponting, Chris P.; Marques-Bonet, Tomas; Alkan, Can; Chen, Lin; Cheng, Ze; Kidd, Jeffrey M.; Eichler, Evan E.; White, Simon; Searle, Stephen; Vilella, Albert J.; Chen, Yuan; Flicek, Paul; Ma, Jian; Raney, Brian; Suh, Bernard; Burhans, Richard; Herrero, Javier; Haussler, David; Faria, Rui; Fernando, Olga; Darré, Fleur; Farré, Domènec; Gazave, Elodie; Oliva, Meritxell; Navarro, Arcadi; Roberto, Roberta; Capozzi, Oronzo; Archidiacono, Nicoletta; Valle, Giuliano Della; Purgato, Stefania; Rocchi, Mariano; Konkel, Miriam K.; Walker, Jerilyn A.; Ullmer, Brygg; Batzer, Mark A.; Smit, Arian F. A.; Hubley, Robert; Casola, Claudio; Schrider, Daniel R.; Hahn, Matthew W.; Quesada, Victor; Puente, Xose S.; Ordoñez, Gonzalo R.; López-Otín, Carlos; Vinar, Tomas; Brejova, Brona; Ratan, Aakrosh; Harris, Robert S.; Miller, Webb; Kosiol, Carolin; Lawson, Heather A.; Taliwal, Vikas; Martins, André L.; Siepel, Adam; RoyChoudhury, Arindam; Ma, Xin; Degenhardt, Jeremiah; Bustamante, Carlos D.; Gutenkunst, Ryan N.; Mailund, Thomas; Dutheil, Julien Y.; Hobolth, Asger; Schierup, Mikkel H.; Chemnick, Leona; Ryder, Oliver A.; Yoshinaga, Yuko; de Jong, Pieter J.; Weinstock, George M.; Rogers, Jeffrey; Mardis, Elaine R.; Gibbs, Richard A.; Wilson, Richard K.

    2011-01-01

    “Orangutan” is derived from the Malay term “man of the forest” and aptly describes the Southeast Asian great apes native to Sumatra and Borneo. The orangutan species, Pongo abelii (Sumatran) and Pongo pygmaeus (Bornean), are the most phylogenetically distant great apes from humans, thereby providing an informative perspective on hominid evolution. Here we present a Sumatran orangutan draft genome assembly and short read sequence data from five Sumatran and five Bornean orangutan genomes. Our analyses reveal that, compared to other primates, the orangutan genome has many unique features. Structural evolution of the orangutan genome has proceeded much more slowly than other great apes, evidenced by fewer rearrangements, less segmental duplication, a lower rate of gene family turnover and surprisingly quiescent Alu repeats, which have played a major role in restructuring other primate genomes. We also describe the first primate polymorphic neocentromere, found in both Pongo species, emphasizing the gradual evolution of orangutan genome structure. Orangutans have extremely low energy usage for a eutherian mammal1, far lower than their hominid relatives. Adding their genome to the repertoire of sequenced primates illuminates new signals of positive selection in several pathways including glycolipid metabolism. From the population perspective, both Pongo species are deeply diverse; however, Sumatran individuals possess greater diversity than their Bornean counterparts, and more species-specific variation. Our estimate of Bornean/Sumatran speciation time, 400k years ago (ya), is more recent than most previous studies and underscores the complexity of the orangutan speciation process. Despite a smaller modern census population size, the Sumatran effective population size (Ne) expanded exponentially relative to the ancestral Ne after the split, while Bornean Ne declined over the same period. Overall, the resources and analyses presented here offer new opportunities

  4. Comparative genomics of Shiga toxin encoding bacteriophages

    PubMed Central

    2012-01-01

    Background Stx bacteriophages are responsible for driving the dissemination of Stx toxin genes (stx) across their bacterial host range. Lysogens carrying Stx phages can cause severe, life-threatening disease and Stx toxin is an integral virulence factor. The Stx-bacteriophage vB_EcoP-24B, commonly referred to as Ф24B, is capable of multiply infecting a single bacterial host cell at a high frequency, with secondary infection increasing the rate at which subsequent bacteriophage infections can occur. This is biologically unusual, therefore determining the genomic content and context of Ф24B compared to other lambdoid Stx phages is important to understanding the factors controlling this phenomenon and determining whether they occur in other Stx phages. Results The genome of the Stx2 encoding phage, Ф24B was sequenced and annotated. The genomic organisation and general features are similar to other sequenced Stx bacteriophages induced from Enterohaemorrhagic Escherichia coli (EHEC), however Ф24B possesses significant regions of heterogeneity, with implications for phage biology and behaviour. The Ф24B genome was compared to other sequenced Stx phages and the archetypal lambdoid phage, lambda, using the Circos genome comparison tool and a PCR-based multi-loci comparison system. Conclusions The data support the hypothesis that Stx phages are mosaic, and recombination events between the host, phages and their remnants within the same infected bacterial cell will continue to drive the evolution of Stx phage variants and the subsequent dissemination of shigatoxigenic potential. PMID:22799768

  5. Comparative genomics of biotechnologically important yeasts

    USDA-ARS?s Scientific Manuscript database

    Ascomycete yeasts are metabolically diverse, with great potential for biotechnology. Here, we report the comparative genome analysis of 29 taxonomically and biotechnologically important yeasts, including 16 newly sequenced. We identify a genetic code change, CUG-Ala, in Pachysolen tannophilus in the...

  6. Genome sequencing and comparative genomics of honey bee microsporidia, Nosema apis reveal novel insights into host-parasite interactions

    PubMed Central

    2013-01-01

    Background The microsporidia parasite Nosema contributes to the steep global decline of honey bees that are critical pollinators of food crops. There are two species of Nosema that have been found to infect honey bees, Nosema apis and N. ceranae. Genome sequencing of N. apis and comparative genome analysis with N. ceranae, a fully sequenced microsporidia species, reveal novel insights into host-parasite interactions underlying the parasite infections. Results We applied the whole-genome shotgun sequencing approach to sequence and assemble the genome of N. apis which has an estimated size of 8.5 Mbp. We predicted 2,771 protein- coding genes and predicted the function of each putative protein using the Gene Ontology. The comparative genomic analysis led to identification of 1,356 orthologs that are conserved between the two Nosema species and genes that are unique characteristics of the individual species, thereby providing a list of virulence factors and new genetic tools for studying host-parasite interactions. We also identified a highly abundant motif in the upstream promoter regions of N. apis genes. This motif is also conserved in N. ceranae and other microsporidia species and likely plays a role in gene regulation across the microsporidia. Conclusions The availability of the N. apis genome sequence is a significant addition to the rapidly expanding body of microsprodian genomic data which has been improving our understanding of eukaryotic genome diversity and evolution in a broad sense. The predicted virulent genes and transcriptional regulatory elements are potential targets for innovative therapeutics to break down the life cycle of the parasite. PMID:23829473

  7. Genome sequencing and comparative genomics of honey bee microsporidia, Nosema apis reveal novel insights into host-parasite interactions.

    PubMed

    Chen, Yan ping; Pettis, Jeffery S; Zhao, Yan; Liu, Xinyue; Tallon, Luke J; Sadzewicz, Lisa D; Li, Renhua; Zheng, Huoqing; Huang, Shaokang; Zhang, Xuan; Hamilton, Michele C; Pernal, Stephen F; Melathopoulos, Andony P; Yan, Xianghe; Evans, Jay D

    2013-07-05

    The microsporidia parasite Nosema contributes to the steep global decline of honey bees that are critical pollinators of food crops. There are two species of Nosema that have been found to infect honey bees, Nosema apis and N. ceranae. Genome sequencing of N. apis and comparative genome analysis with N. ceranae, a fully sequenced microsporidia species, reveal novel insights into host-parasite interactions underlying the parasite infections. We applied the whole-genome shotgun sequencing approach to sequence and assemble the genome of N. apis which has an estimated size of 8.5 Mbp. We predicted 2,771 protein- coding genes and predicted the function of each putative protein using the Gene Ontology. The comparative genomic analysis led to identification of 1,356 orthologs that are conserved between the two Nosema species and genes that are unique characteristics of the individual species, thereby providing a list of virulence factors and new genetic tools for studying host-parasite interactions. We also identified a highly abundant motif in the upstream promoter regions of N. apis genes. This motif is also conserved in N. ceranae and other microsporidia species and likely plays a role in gene regulation across the microsporidia. The availability of the N. apis genome sequence is a significant addition to the rapidly expanding body of microsprodian genomic data which has been improving our understanding of eukaryotic genome diversity and evolution in a broad sense. The predicted virulent genes and transcriptional regulatory elements are potential targets for innovative therapeutics to break down the life cycle of the parasite.

  8. Screening of repetitive motifs inside the genome of the flat oyster (Ostrea edulis): Transposable elements and short tandem repeats.

    PubMed

    Vera, Manuel; Bello, Xabier; Álvarez-Dios, Jose-Antonio; Pardo, Belen G; Sánchez, Laura; Carlsson, Jens; Carlsson, Jeanette E L; Bartolomé, Carolina; Maside, Xulio; Martinez, Paulino

    2015-12-01

    The flat oyster (Ostrea edulis) is one of the most appreciated molluscs in Europe, but its production has been greatly reduced by the parasite Bonamia ostreae. Here, new generation genomic resources were used to analyse the repetitive fraction of the oyster genome, with the aim of developing molecular markers to face this main oyster production challenge. The resulting oyster database, consists of two sets of 10,318 and 7159 unique contigs (4.8 Mbp and 6.8 Mbp in total length) representing the oyster's genome (WG) and haemocyte transcriptome (HT), respectively. A total of 1083 sequences were identified as TE-derived, which corresponded to 4.0% of WG and 1.1% of HT. They were clustered into 142 homology groups, most of which were assigned to the Penelope order of retrotransposons, and to the Helitron and TIR DNA-transposons. Simple repeats and rRNA pseudogenes, also made a significant contribution to the oyster's genome (0.5% and 0.3% of WG and HT, respectively).The most frequent short tandem repeats identified in WG were tetranucleotide motifs while trinucleotide motifs were in HT. Forty identified microsatellite loci, 20 from each database, were selected for technical validation. Success was much lower among WG than HT microsatellites (15% vs 55%), which could reflect higher variation in anonymous regions interfering with primer annealing. All microsatellites developed adjusted to Hardy-Weinberg proportions and represent a useful tool to support future breeding programmes and to manage genetic resources of natural flat oyster beds. Copyright © 2015 Elsevier B.V. All rights reserved.

  9. BayesMotif: de novo protein sorting motif discovery from impure datasets

    PubMed Central

    2010-01-01

    Background Protein sorting is the process that newly synthesized proteins are transported to their target locations within or outside of the cell. This process is precisely regulated by protein sorting signals in different forms. A major category of sorting signals are amino acid sub-sequences usually located at the N-terminals or C-terminals of protein sequences. Genome-wide experimental identification of protein sorting signals is extremely time-consuming and costly. Effective computational algorithms for de novo discovery of protein sorting signals is needed to improve the understanding of protein sorting mechanisms. Methods We formulated the protein sorting motif discovery problem as a classification problem and proposed a Bayesian classifier based algorithm (BayesMotif) for de novo identification of a common type of protein sorting motifs in which a highly conserved anchor is present along with a less conserved motif regions. A false positive removal procedure is developed to iteratively remove sequences that are unlikely to contain true motifs so that the algorithm can identify motifs from impure input sequences. Results Experiments on both implanted motif datasets and real-world datasets showed that the enhanced BayesMotif algorithm can identify anchored sorting motifs from pure or impure protein sequence dataset. It also shows that the false positive removal procedure can help to identify true motifs even when there is only 20% of the input sequences containing true motif instances. Conclusion We proposed BayesMotif, a novel Bayesian classification based algorithm for de novo discovery of a special category of anchored protein sorting motifs from impure datasets. Compared to conventional motif discovery algorithms such as MEME, our algorithm can find less-conserved motifs with short highly conserved anchors. Our algorithm also has the advantage of easy incorporation of additional meta-sequence features such as hydrophobicity or charge of the motifs which

  10. Comparative genomics of chondrichthyan Hoxa clusters.

    PubMed

    Mulley, John F; Zhong, Ying-Fu; Holland, Peter Wh

    2009-09-02

    The chondrichthyan or cartilaginous fish (chimeras, sharks, skates and rays) occupy an important phylogenetic position as the sister group to all other jawed vertebrates and as an early lineage to diverge from the vertebrate lineage following two whole genome duplication events in vertebrate evolution. There have been few comparative genomic analyses incorporating data from chondrichthyan fish and none comparing genomic information from within the group. We have sequenced the complete Hoxa cluster of the Little Skate (Leucoraja erinacea) and compared to the published Hoxa cluster of the Horn Shark (Heterodontus francisci) and to available data from the Elephant Shark (Callorhinchus milii) genome project. A BAC clone containing the full Little Skate Hoxa cluster was fully sequenced and assembled. Analyses of coding sequences and conserved non-coding elements reveal a strikingly high level of conservation across the cartilaginous fish, with twenty ultraconserved elements (100%,100 bp) found between Skate and Horn Shark, compared to three between human and marsupials. We have also identified novel potential non-coding RNAs in the Skate BAC clone, some of which are conserved to other species. We find that the Little Skate Hoxa cluster is remarkably similar to the previously published Horn Shark Hoxa cluster with respect to sequence identity, gene size and intergenic distance despite over 180 million years of separation between the two lineages. We suggest that the genomes of cartilaginous fish are more highly conserved than those of tetrapods or teleost fish and so are more likely to have retained ancestral non-coding elements. While useful for isolating homologous DNA, this complicates bioinformatic approaches to identify chondrichthyan-specific non-coding DNA elements.

  11. Comparative genomics of chondrichthyan Hoxa clusters

    PubMed Central

    Mulley, John F; Zhong, Ying-Fu; Holland, Peter WH

    2009-01-01

    Background The chondrichthyan or cartilaginous fish (chimeras, sharks, skates and rays) occupy an important phylogenetic position as the sister group to all other jawed vertebrates and as an early lineage to diverge from the vertebrate lineage following two whole genome duplication events in vertebrate evolution. There have been few comparative genomic analyses incorporating data from chondrichthyan fish and none comparing genomic information from within the group. We have sequenced the complete Hoxa cluster of the Little Skate (Leucoraja erinacea) and compared to the published Hoxa cluster of the Horn Shark (Heterodontus francisci) and to available data from the Elephant Shark (Callorhinchus milii) genome project. Results A BAC clone containing the full Little Skate Hoxa cluster was fully sequenced and assembled. Analyses of coding sequences and conserved non-coding elements reveal a strikingly high level of conservation across the cartilaginous fish, with twenty ultraconserved elements (100%,100 bp) found between Skate and Horn Shark, compared to three between human and marsupials. We have also identified novel potential non-coding RNAs in the Skate BAC clone, some of which are conserved to other species. Conclusion We find that the Little Skate Hoxa cluster is remarkably similar to the previously published Horn Shark Hoxa cluster with respect to sequence identity, gene size and intergenic distance despite over 180 million years of separation between the two lineages. We suggest that the genomes of cartilaginous fish are more highly conserved than those of tetrapods or teleost fish and so are more likely to have retained ancestral non-coding elements. While useful for isolating homologous DNA, this complicates bioinformatic approaches to identify chondrichthyan-specific non-coding DNA elements PMID:19725973

  12. DEEP MOTIF DASHBOARD: VISUALIZING AND UNDERSTANDING GENOMIC SEQUENCES USING DEEP NEURAL NETWORKS.

    PubMed

    Lanchantin, Jack; Singh, Ritambhara; Wang, Beilun; Qi, Yanjun

    2016-01-01

    Deep neural network (DNN) models have recently obtained state-of-the-art prediction accuracy for the transcription factor binding (TFBS) site classification task. However, it remains unclear how these approaches identify meaningful DNA sequence signals and give insights as to why TFs bind to certain locations. In this paper, we propose a toolkit called the Deep Motif Dashboard (DeMo Dashboard) which provides a suite of visualization strategies to extract motifs, or sequence patterns from deep neural network models for TFBS classification. We demonstrate how to visualize and understand three important DNN models: convolutional, recurrent, and convolutional-recurrent networks. Our first visualization method is finding a test sequence's saliency map which uses first-order derivatives to describe the importance of each nucleotide in making the final prediction. Second, considering recurrent models make predictions in a temporal manner (from one end of a TFBS sequence to the other), we introduce temporal output scores, indicating the prediction score of a model over time for a sequential input. Lastly, a class-specific visualization strategy finds the optimal input sequence for a given TFBS positive class via stochastic gradient optimization. Our experimental results indicate that a convolutional-recurrent architecture performs the best among the three architectures. The visualization techniques indicate that CNN-RNN makes predictions by modeling both motifs as well as dependencies among them.

  13. RSAT peak-motifs: motif analysis in full-size ChIP-seq datasets.

    PubMed

    Thomas-Chollier, Morgane; Herrmann, Carl; Defrance, Matthieu; Sand, Olivier; Thieffry, Denis; van Helden, Jacques

    2012-02-01

    ChIP-seq is increasingly used to characterize transcription factor binding and chromatin marks at a genomic scale. Various tools are now available to extract binding motifs from peak data sets. However, most approaches are only available as command-line programs, or via a website but with size restrictions. We present peak-motifs, a computational pipeline that discovers motifs in peak sequences, compares them with databases, exports putative binding sites for visualization in the UCSC genome browser and generates an extensive report suited for both naive and expert users. It relies on time- and memory-efficient algorithms enabling the treatment of several thousand peaks within minutes. Regarding time efficiency, peak-motifs outperforms all comparable tools by several orders of magnitude. We demonstrate its accuracy by analyzing data sets ranging from 4000 to 1,28,000 peaks for 12 embryonic stem cell-specific transcription factors. In all cases, the program finds the expected motifs and returns additional motifs potentially bound by cofactors. We further apply peak-motifs to discover tissue-specific motifs in peak collections for the p300 transcriptional co-activator. To our knowledge, peak-motifs is the only tool that performs a complete motif analysis and offers a user-friendly web interface without any restriction on sequence size or number of peaks.

  14. RSAT peak-motifs: motif analysis in full-size ChIP-seq datasets

    PubMed Central

    Thomas-Chollier, Morgane; Herrmann, Carl; Defrance, Matthieu; Sand, Olivier; Thieffry, Denis; van Helden, Jacques

    2012-01-01

    ChIP-seq is increasingly used to characterize transcription factor binding and chromatin marks at a genomic scale. Various tools are now available to extract binding motifs from peak data sets. However, most approaches are only available as command-line programs, or via a website but with size restrictions. We present peak-motifs, a computational pipeline that discovers motifs in peak sequences, compares them with databases, exports putative binding sites for visualization in the UCSC genome browser and generates an extensive report suited for both naive and expert users. It relies on time- and memory-efficient algorithms enabling the treatment of several thousand peaks within minutes. Regarding time efficiency, peak-motifs outperforms all comparable tools by several orders of magnitude. We demonstrate its accuracy by analyzing data sets ranging from 4000 to 1 28 000 peaks for 12 embryonic stem cell-specific transcription factors. In all cases, the program finds the expected motifs and returns additional motifs potentially bound by cofactors. We further apply peak-motifs to discover tissue-specific motifs in peak collections for the p300 transcriptional co-activator. To our knowledge, peak-motifs is the only tool that performs a complete motif analysis and offers a user-friendly web interface without any restriction on sequence size or number of peaks. PMID:22156162

  15. A Comparative Map of the Zebrafish Genome

    PubMed Central

    Woods, Ian G.; Kelly, Peter D.; Chu, Felicia; Ngo-Hazelett, Phuong; Yan, Yi-Lin; Huang, Hui; Postlethwait, John H.; Talbot, William S.

    2000-01-01

    Zebrafish mutations define the functions of hundreds of essential genes in the vertebrate genome. To accelerate the molecular analysis of zebrafish mutations and to facilitate comparisons among the genomes of zebrafish and other vertebrates, we used a homozygous diploid meiotic mapping panel to localize polymorphisms in 691 previously unmapped genes and expressed sequence tags (ESTs). Together with earlier efforts, this work raises the total number of markers scored in the mapping panel to 2119, including 1503 genes and ESTs and 616 previously characterized simple-sequence length polymorphisms. Sequence analysis of zebrafish genes mapped in this study and in prior work identified putative human orthologs for 804 zebrafish genes and ESTs. Map comparisons revealed 139 new conserved syntenies, in which two or more genes are on the same chromosome in zebrafish and human. Although some conserved syntenies are quite large, there were changes in gene order within conserved groups, apparently reflecting the relatively frequent occurrence of inversions and other intrachromosomal rearrangements since the divergence of teleost and tetrapod ancestors. Comparative mapping also shows that there is not a one-to-one correspondence between zebrafish and human chromosomes. Mapping of duplicate gene pairs identified segments of 20 linkage groups that may have arisen during a genome duplication that occurred early in the evolution of teleosts after the divergence of teleost and mammalian ancestors. This comparative map will accelerate the molecular analysis of zebrafish mutations and enhance the understanding of the evolution of the vertebrate genome. PMID:11116086

  16. Comparative genomics tools applied to bioterrorism defence.

    PubMed

    Slezak, Tom; Kuczmarski, Tom; Ott, Linda; Torres, Clinton; Medeiros, Dan; Smith, Jason; Truitt, Brian; Mulakken, Nisha; Lam, Marisa; Vitalis, Elizabeth; Zemla, Adam; Zhou, Carol Ecale; Gardner, Shea

    2003-06-01

    Rapid advances in the genomic sequencing of bacteria and viruses over the past few years have made it possible to consider sequencing the genomes of all pathogens that affect humans and the crops and livestock upon which our lives depend. Recent events make it imperative that full genome sequencing be accomplished as soon as possible for pathogens that could be used as weapons of mass destruction or disruption. This sequence information must be exploited to provide rapid and accurate diagnostics to identify pathogens and distinguish them from harmless near-neighbours and hoaxes. The Chem-Bio Non-Proliferation (CBNP) programme of the US Department of Energy (DOE) began a large-scale effort of pathogen detection in early 2000 when it was announced that the DOE would be providing bio-security at the 2002 Winter Olympic Games in Salt Lake City, Utah. Our team at the Lawrence Livermore National Lab (LLNL) was given the task of developing reliable and validated assays for a number of the most likely bioterrorist agents. The short timeline led us to devise a novel system that utilised whole-genome comparison methods to rapidly focus on parts of the pathogen genomes that had a high probability of being unique. Assays developed with this approach have been validated by the Centers for Disease Control (CDC). They were used at the 2002 Winter Olympics, have entered the public health system, and have been in continual use for non-publicised aspects of homeland defence since autumn 2001. Assays have been developed for all major threat list agents for which adequate genomic sequence is available, as well as for other pathogens requested by various government agencies. Collaborations with comparative genomics algorithm developers have enabled our LLNL team to make major advances in pathogen detection, since many of the existing tools simply did not scale well enough to be of practical use for this application. It is hoped that a discussion of a real-life practical application of

  17. Whole-genome sequencing for comparative genomics and de novo genome assembly.

    PubMed

    Benjak, Andrej; Sala, Claudia; Hartkoorn, Ruben C

    2015-01-01

    Next-generation sequencing technologies for whole-genome sequencing of mycobacteria are rapidly becoming an attractive alternative to more traditional sequencing methods. In particular this technology is proving useful for genome-wide identification of mutations in mycobacteria (comparative genomics) as well as for de novo assembly of whole genomes. Next-generation sequencing however generates a vast quantity of data that can only be transformed into a usable and comprehensible form using bioinformatics. Here we describe the methodology one would use to prepare libraries for whole-genome sequencing, and the basic bioinformatics to identify mutations in a genome following Illumina HiSeq or MiSeq sequencing, as well as de novo genome assembly following sequencing using Pacific Biosciences (PacBio).

  18. Comparative Genomics of Ethanolamine Utilization▿ † ‡

    PubMed Central

    Tsoy, Olga; Ravcheev, Dmitry; Mushegian, Arcady

    2009-01-01

    Ethanolamine can be used as a source of carbon and nitrogen by phylogenetically diverse bacteria. Ethanolamine-ammonia lyase, the enzyme that breaks ethanolamine into acetaldehyde and ammonia, is encoded by the gene tandem eutBC. Despite extensive studies of ethanolamine utilization in Salmonella enterica serovar Typhimurium, much remains to be learned about EutBC structure and catalytic mechanism, about the evolutionary origin of ethanolamine utilization, and about regulatory links between the metabolism of ethanolamine itself and the ethanolamine-ammonia lyase cofactor adenosylcobalamin. We used computational analysis of sequences, structures, genome contexts, and phylogenies of ethanolamine-ammonia lyases to address these questions and to evaluate recent data-mining studies that have suggested an association between bacterial food poisoning and the diol utilization pathways. We found that EutBC evolution included recruitment of a TIM barrel and a Rossmann fold domain and their fusion to N-terminal α-helical domains to give EutB and EutC, respectively. This fusion was followed by recruitment and occasional loss of auxiliary ethanolamine utilization genes in Firmicutes and by several horizontal transfers, most notably from the firmicute stem to the Enterobacteriaceae and from Alphaproteobacteria to Actinobacteria. We identified a conserved DNA motif that likely represents the EutR-binding site and is shared by the ethanolamine and cobalamin operons in several enterobacterial species, suggesting a mechanism for coupling the biosyntheses of apoenzyme and cofactor in these species. Finally, we found that the food poisoning phenotype is associated with the structural components of metabolosome more strongly than with ethanolamine utilization genes or with paralogous propanediol utilization genes per se. PMID:19783625

  19. CLIMP: Clustering Motifs via Maximal Cliques with Parallel Computing Design

    PubMed Central

    Chen, Yong

    2016-01-01

    A set of conserved binding sites recognized by a transcription factor is called a motif, which can be found by many applications of comparative genomics for identifying over-represented segments. Moreover, when numerous putative motifs are predicted from a collection of genome-wide data, their similarity data can be represented as a large graph, where these motifs are connected to one another. However, an efficient clustering algorithm is desired for clustering the motifs that belong to the same groups and separating the motifs that belong to different groups, or even deleting an amount of spurious ones. In this work, a new motif clustering algorithm, CLIMP, is proposed by using maximal cliques and sped up by parallelizing its program. When a synthetic motif dataset from the database JASPAR, a set of putative motifs from a phylogenetic foot-printing dataset, and a set of putative motifs from a ChIP dataset are used to compare the performances of CLIMP and two other high-performance algorithms, the results demonstrate that CLIMP mostly outperforms the two algorithms on the three datasets for motif clustering, so that it can be a useful complement of the clustering procedures in some genome-wide motif prediction pipelines. CLIMP is available at http://sqzhang.cn/climp.html. PMID:27487245

  20. Mitotic control of human papillomavirus genome-containing cells is regulated by the function of the PDZ-binding motif of the E6 oncoprotein

    PubMed Central

    Marsh, Elizabeth K.; Delury, Craig P.; Davies, Nicholas J.; Weston, Christopher J.; Miah, Mohammed A.L.; Banks, Lawrence; Parish, Joanna L.

    2017-01-01

    The function of a conserved PDS95/DLG1/ZO1 (PDZ) binding motif (E6 PBM) at the C-termini of E6 oncoproteins of high-risk human papillomavirus (HPV) types contributes to the development of HPV-associated malignancies. Here, using a primary human keratinocyte-based model of the high-risk HPV18 life cycle, we identify a novel link between the E6 PBM and mitotic stability. In cultures containing a mutant genome in which the E6 PBM was deleted there was an increase in the frequency of abnormal mitoses, including multinucleation, compared to cells harboring the wild type HPV18 genome. The loss of the E6 PBM was associated with a significant increase in the frequency of mitotic spindle defects associated with anaphase and telophase. Furthermore, cells carrying this mutant genome had increased chromosome segregation defects and they also exhibited greater levels of genomic instability, as shown by an elevated level of centromere-positive micronuclei. In wild type HPV18 genome-containing organotypic cultures, the majority of mitotic cells reside in the suprabasal layers, in keeping with the hyperplastic morphology of the structures. However, in mutant genome-containing structures a greater proportion of mitotic cells were retained in the basal layer, which were often of undefined polarity, thus correlating with their reduced thickness. We conclude that the ability of E6 to target cellular PDZ proteins plays a critical role in maintaining mitotic stability of HPV infected cells, ensuring stable episome persistence and vegetative amplification. PMID:28061478

  1. Comparative analyses of the thermodynamic RNA binding signatures of different types of RNA recognition motifs

    PubMed Central

    Cléry, Antoine; Allain, Frédéric H-T

    2017-01-01

    Abstract RNA recognition motifs (RRMs) are structurally versatile domains important in regulation of alternative splicing. Structural mechanisms of sequence-specific recognition of single-stranded RNAs (ssRNAs) by RRMs are well understood. The thermodynamic strategies are however unclear. Therefore, we utilized microcalorimetry and semi-empirical analyses to comparatively analyze the cognate ssRNA binding thermodynamics of four different RRM domains, each with a different RNA binding mode. The different binding modes are: canonical binding to the β-sheet surface; canonical binding with involvement of N- and C-termini; binding to conserved loops; and binding to an α-helix. Our results identify enthalpy as the sole and general force driving association at physiological temperatures. Also, networks of weak interactions are a general feature regulating stability of the different RRM–ssRNA complexes. In agreement, non-polyelectrolyte effects contributed between ∼75 and 90% of the overall free energy of binding in the considered complexes. The various RNA binding modes also displayed enormous heat capacity differences, that upon dissection revealed large differential changes in hydration, conformations and dynamics upon binding RNA. Altogether, different modes employed by RRMs to bind cognate ssRNAs utilize various thermodynamics strategies during the association process. PMID:28334819

  2. Comparative analyses of the thermodynamic RNA binding signatures of different types of RNA recognition motifs.

    PubMed

    Samatanga, Brighton; Cléry, Antoine; Barraud, Pierre; Allain, Frédéric H-T; Jelesarov, Ilian

    2017-06-02

    RNA recognition motifs (RRMs) are structurally versatile domains important in regulation of alternative splicing. Structural mechanisms of sequence-specific recognition of single-stranded RNAs (ssRNAs) by RRMs are well understood. The thermodynamic strategies are however unclear. Therefore, we utilized microcalorimetry and semi-empirical analyses to comparatively analyze the cognate ssRNA binding thermodynamics of four different RRM domains, each with a different RNA binding mode. The different binding modes are: canonical binding to the β-sheet surface; canonical binding with involvement of N- and C-termini; binding to conserved loops; and binding to an α-helix. Our results identify enthalpy as the sole and general force driving association at physiological temperatures. Also, networks of weak interactions are a general feature regulating stability of the different RRM-ssRNA complexes. In agreement, non-polyelectrolyte effects contributed between ∼75 and 90% of the overall free energy of binding in the considered complexes. The various RNA binding modes also displayed enormous heat capacity differences, that upon dissection revealed large differential changes in hydration, conformations and dynamics upon binding RNA. Altogether, different modes employed by RRMs to bind cognate ssRNAs utilize various thermodynamics strategies during the association process. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.

  3. Comparative Analysis of Genome Sequences with VISTA

    DOE Data Explorer

    Dubchak, Inna

    VISTA is a comprehensive suite of programs and databases developed by and hosted at the Genomics Division of Lawrence Berkeley National Laboratory. They provide information and tools designed to facilitate comparative analysis of genomic sequences. Users have two ways to interact with the suite of applications at the VISTA portal. They can submit their own sequences and alignments for analysis (VISTA servers) or examine pre-computed whole-genome alignments of different species. A key menu option is the Enhancer Browser and Database at http://enhancer.lbl.gov/. The VISTA Enhancer Browser is a central resource for experimentally validated human noncoding fragments with gene enhancer activity as assessed in transgenic mice. Most of these noncoding elements were selected for testing based on their extreme conservation with other vertebrates. The results of this enhancer screen are provided through this publicly available website. The browser also features relevant results by external contributors and a large collection of additional genome-wide conserved noncoding elements which are candidate enhancer sequences. The LBL developers invite external groups to submit computational predictions of developmental enhancers. As of 10/19/2009 the database contains information on 1109 in vivo tested elements - 508 elements with enhancer activity.

  4. Comparative genome analysis of Basidiomycete fungi

    SciTech Connect

    Riley, Robert; Salamov, Asaf; Henrissat, Bernard; Nagy, Laszlo; Brown, Daren; Held, Benjamin; Baker, Scott; Blanchette, Robert; Boussau, Bastien; Doty, Sharon L.; Fagnan, Kirsten; Floudas, Dimitris; Levasseur, Anthony; Manning, Gerard; Martin, Francis; Morin, Emmanuelle; Otillar, Robert; Pisabarro, Antonio; Walton, Jonathan; Wolfe, Ken; Hibbett, David; Grigoriev, Igor

    2013-08-07

    Fungi of the phylum Basidiomycota (basidiomycetes), make up some 37percent of the described fungi, and are important in forestry, agriculture, medicine, and bioenergy. This diverse phylum includes symbionts, pathogens, and saprotrophs including the majority of wood decaying and ectomycorrhizal species. To better understand the genetic diversity of this phylum we compared the genomes of 35 basidiomycetes including 6 newly sequenced genomes. These genomes span extremes of genome size, gene number, and repeat content. Analysis of core genes reveals that some 48percent of basidiomycete proteins are unique to the phylum with nearly half of those (22percent) found in only one organism. Correlations between lifestyle and certain gene families are evident. Phylogenetic patterns of plant biomass-degrading genes in Agaricomycotina suggest a continuum rather than a dichotomy between the white rot and brown rot modes of wood decay. Based on phylogenetically-informed PCA analysis of wood decay genes, we predict that that Botryobasidium botryosum and Jaapia argillacea have properties similar to white rot species, although neither has typical ligninolytic class II fungal peroxidases (PODs). This prediction is supported by growth assays in which both fungi exhibit wood decay with white rot-like characteristics. Based on this, we suggest that the white/brown rot dichotomy may be inadequate to describe the full range of wood decaying fungi. Analysis of the rate of discovery of proteins with no or few homologs suggests the value of continued sequencing of basidiomycete fungi.

  5. Comparative genomics of biotechnologically important yeasts

    PubMed Central

    Riley, Robert; Haridas, Sajeet; Wolfe, Kenneth H.; Lopes, Mariana R.; Hittinger, Chris Todd; Göker, Markus; Salamov, Asaf A.; Wisecaver, Jennifer H.; Long, Tanya M.; Aerts, Andrea L.; Barry, Kerrie W.; Choi, Cindy; Clum, Alicia; Coughlan, Aisling Y.; Deshpande, Shweta; Douglass, Alexander P.; Hanson, Sara J.; Klenk, Hans-Peter; LaButti, Kurt M.; Lapidus, Alla; Lindquist, Erika A.; Lipzen, Anna M.; Meier-Kolthoff, Jan P.; Ohm, Robin A.; Otillar, Robert P.; Pangilinan, Jasmyn L.; Peng, Yi; Rosa, Carlos A.; Scheuner, Carmen; Sibirny, Andriy A.; Slot, Jason C.; Stielow, J. Benjamin; Sun, Hui; Kurtzman, Cletus P.; Blackwell, Meredith; Grigoriev, Igor V.

    2016-01-01

    Ascomycete yeasts are metabolically diverse, with great potential for biotechnology. Here, we report the comparative genome analysis of 29 taxonomically and biotechnologically important yeasts, including 16 newly sequenced. We identify a genetic code change, CUG-Ala, in Pachysolen tannophilus in the clade sister to the known CUG-Ser clade. Our well-resolved yeast phylogeny shows that some traits, such as methylotrophy, are restricted to single clades, whereas others, such as l-rhamnose utilization, have patchy phylogenetic distributions. Gene clusters, with variable organization and distribution, encode many pathways of interest. Genomics can predict some biochemical traits precisely, but the genomic basis of others, such as xylose utilization, remains unresolved. Our data also provide insight into early evolution of ascomycetes. We document the loss of H3K9me2/3 heterochromatin, the origin of ascomycete mating-type switching, and panascomycete synteny at the MAT locus. These data and analyses will facilitate the engineering of efficient biosynthetic and degradative pathways and gateways for genomic manipulation. PMID:27535936

  6. COMPARISON OF COMPARATIVE GENOMIC HYBRIDIZATIONS TECHNOLOGIES ACROSS MICROARRAY PLATFORMS

    EPA Science Inventory

    Comparative Genomic Hybridization (CGH) measures DNA copy number differences between a reference genome and a test genome. The DNA samples are differentially labeled and hybridized to an immobilized substrate. In early CGH experiments, the DNA targets were hybridized to metaphase...

  7. COMPARISON OF COMPARATIVE GENOMIC HYBRIDIZATIONS TECHNOLOGIES ACROSS MICROARRAY PLATFORMS

    EPA Science Inventory

    Comparative Genomic Hybridization (CGH) measures DNA copy number differences between a reference genome and a test genome. The DNA samples are differentially labeled and hybridized to an immobilized substrate. In early CGH experiments, the DNA targets were hybridized to metaphase...

  8. Image analysis in comparative genomic hybridization

    SciTech Connect

    Lundsteen, C.; Maahr, J.; Christensen, B.

    1995-01-01

    Comparative genomic hybridization (CGH) is a new technique by which genomic imbalances can be detected by combining in situ suppression hybridization of whole genomic DNA and image analysis. We have developed software for rapid, quantitative CGH image analysis by a modification and extension of the standard software used for routine karyotyping of G-banded metaphase spreads in the Magiscan chromosome analysis system. The DAPI-counterstained metaphase spread is karyotyped interactively. Corrections for image shifts between the DAPI, FITC, and TRITC images are done manually by moving the three images relative to each other. The fluorescence background is subtracted. A mean filter is applied to smooth the FITC and TRITC images before the fluorescence ratio between the individual FITC and TRITC-stained chromosomes is computed pixel by pixel inside the area of the chromosomes determined by the DAPI boundaries. Fluorescence intensity ratio profiles are generated, and peaks and valleys indicating possible gains and losses of test DNA are marked if they exceed ratios below 0.75 and above 1.25. By combining the analysis of several metaphase spreads, consistent findings of gains and losses in all or almost all spreads indicate chromosomal imbalance. Chromosomal imbalances are detected either by visual inspection of fluorescence ratio (FR) profiles or by a statistical approach that compares FR measurements of the individual case with measurements of normal chromosomes. The complete analysis of one metaphase can be carried out in approximately 10 minutes. 8 refs., 7 figs., 1 tab.

  9. Comparative genomics and evolution of eukaryotic phospholipidbiosynthesis

    SciTech Connect

    Lykidis, Athanasios

    2006-12-01

    Phospholipid biosynthetic enzymes produce diverse molecular structures and are often present in multiple forms encoded by different genes. This work utilizes comparative genomics and phylogenetics for exploring the distribution, structure and evolution of phospholipid biosynthetic genes and pathways in 26 eukaryotic genomes. Although the basic structure of the pathways was formed early in eukaryotic evolution, the emerging picture indicates that individual enzyme families followed unique evolutionary courses. For example, choline and ethanolamine kinases and cytidylyltransferases emerged in ancestral eukaryotes, whereas, multiple forms of the corresponding phosphatidyltransferases evolved mainly in a lineage specific manner. Furthermore, several unicellular eukaryotes maintain bacterial-type enzymes and reactions for the synthesis of phosphatidylglycerol and cardiolipin. Also, base-exchange phosphatidylserine synthases are widespread and ancestral enzymes. The multiplicity of phospholipid biosynthetic enzymes has been largely generated by gene expansion in a lineage specific manner. Thus, these observations suggest that phospholipid biosynthesis has been an actively evolving system. Finally, comparative genomic analysis indicates the existence of novel phosphatidyltransferases and provides a candidate for the uncharacterized eukaryotic phosphatidylglycerol phosphate phosphatase.

  10. Comparative genome map of human and cattle

    SciTech Connect

    Solinas-Toldo, S.; Fries, R.; Lengauer, C.

    1995-06-10

    Chromosomal homologies between individual human chromosomes and the bovine karyotype have been established by using a new approach termed Zoo-FISH. Labeled DNA libraries from flow-sorted human chromosomes were used as probes for fluorescence in situ hybridization on cattle chromosomes. All human DNA libraries, except the Y chromosome library, hybridized to one or more cattle chromosomes, identifying and delineating 50 segments of homology, most of them corresponding to the regions of homology as identified by the previous mapping of individual conserved loci. However, Zoo-FISH refines the comparative maps constructed by molecular gene mapping of individual loci by providing information on the boundaries of conserved regions in the absence of obvious cytogenetic homologies of human and bovine chromosomes. It allows study of karyotypic evolution and opens new avenues for genomic analysis by facilitating the extrapolation of results from the human genome initiative. 50 refs., 3 figs., 1 tab.

  11. Prevalent Sequences in the Human Genome Can Form Mini i-Motif Structures at Physiological pH.

    PubMed

    Mir, Bartomeu; Serrano, Israel; Buitrago, Diana; Orozco, Modesto; Escaja, Núria; González, Carlos

    2017-10-11

    We report here the solution structure of several repetitive DNA sequences containing d(TCGTTCCGT) and related repeats. At physiological pH, these sequences fold into i-motif like quadruplexes in which every two repeats a globular structure is stabilized by two hemiprotonated C:C(+) base pairs, flanked by two minor groove tetrads resulting from the association of G:C or G:T base pairs. The interaction between the minor groove tetrads and the nearby C:C(+) base pairs affords a strong stabilization, which results in effective pHT values above 7.5. Longer sequences with more than two repeats are able to fold in tandem, forming a rosary bead-like structure. Bioinformatics analysis shows that these sequences are prevalent in the human genome, and are present in development-related genes.

  12. Comparative proteogenomics: Combining mass spectrometry and comparative genomics to analyze multiple genomes

    PubMed Central

    Gupta, Nitin; Benhamida, Jamal; Bhargava, Vipul; Goodman, Daniel; Kain, Elisabeth; Kerman, Ian; Nguyen, Ngan; Ollikainen, Noah; Rodriguez, Jesse; Wang, Jian; Lipton, Mary S.; Romine, Margaret; Bafna, Vineet; Smith, Richard D.; Pevzner, Pavel A.

    2008-01-01

    Recent proliferation of low-cost DNA sequencing techniques will soon lead to an explosive growth in the number of sequenced genomes and will turn manual annotations into a luxury. Mass spectrometry recently emerged as a valuable technique for proteogenomic annotations that improves on the state-of-the-art in predicting genes and other features. However, previous proteogenomic approaches were limited to a single genome and did not take advantage of analyzing mass spectrometry data from multiple genomes at once. We show that such a comparative proteogenomics approach (like comparative genomics) allows one to address the problems that remained beyond the reach of the traditional “single proteome” approach in mass spectrometry. In particular, we show how comparative proteogenomics addresses the notoriously difficult problem of “one-hit-wonders” in proteomics, improves on the existing gene prediction tools in genomics, and allows identification of rare post-translational modifications. We therefore argue that complementing DNA sequencing projects by comparative proteogenomics projects can be a viable approach to improve both genomic and proteomic annotations. PMID:18426904

  13. Comparative proteogenomics: combining mass spectrometry and comparative genomics to analyze multiple genomes.

    PubMed

    Gupta, Nitin; Benhamida, Jamal; Bhargava, Vipul; Goodman, Daniel; Kain, Elisabeth; Kerman, Ian; Nguyen, Ngan; Ollikainen, Noah; Rodriguez, Jesse; Wang, Jian; Lipton, Mary S; Romine, Margaret; Bafna, Vineet; Smith, Richard D; Pevzner, Pavel A

    2008-07-01

    Recent proliferation of low-cost DNA sequencing techniques will soon lead to an explosive growth in the number of sequenced genomes and will turn manual annotations into a luxury. Mass spectrometry recently emerged as a valuable technique for proteogenomic annotations that improves on the state-of-the-art in predicting genes and other features. However, previous proteogenomic approaches were limited to a single genome and did not take advantage of analyzing mass spectrometry data from multiple genomes at once. We show that such a comparative proteogenomics approach (like comparative genomics) allows one to address the problems that remained beyond the reach of the traditional "single proteome" approach in mass spectrometry. In particular, we show how comparative proteogenomics addresses the notoriously difficult problem of "one-hit-wonders" in proteomics, improves on the existing gene prediction tools in genomics, and allows identification of rare post-translational modifications. We therefore argue that complementing DNA sequencing projects by comparative proteogenomics projects can be a viable approach to improve both genomic and proteomic annotations.

  14. Identification of a binding motif specific to HNF4 by comparative analysis of multiple nuclear receptors

    PubMed Central

    Fang, Bin; Mane-Padros, Daniel; Bolotin, Eugene; Jiang, Tao; Sladek, Frances M.

    2012-01-01

    Nuclear receptors (NRs) regulate gene expression by binding specific DNA sequences consisting of AG[G/T]TCA or AGAACA half site motifs in a variety of configurations. However, those motifs/configurations alone do not adequately explain the diversity of NR function in vivo. Here, a systematic examination of DNA binding specificity by protein-binding microarrays (PBMs) of three closely related human NRs—HNF4α, retinoid X receptor alpha (RXRα) and COUPTF2—reveals an HNF4-specific binding motif (H4-SBM), xxxxCAAAGTCCA, as well as a previously unrecognized polarity in the classical DR1 motif (AGGTCAxAGGTCA) for HNF4α, RXRα and COUPTF2 homodimers. ChIP-seq data indicate that the H4-SBM is uniquely bound by HNF4α but not 10 other NRs in vivo, while NRs PXR, FXRα, Rev-Erbα appear to bind adjacent to H4-SBMs. HNF4-specific DNA recognition and transactivation are mediated by residues Asp69 and Arg76 in the DNA-binding domain; this combination of amino acids is unique to HNF4 among all human NRs. Expression profiling and ChIP data predict ∼100 new human HNF4α target genes with an H4-SBM site, including several Co-enzyme A-related genes and genes with links to disease. These results provide important new insights into NR DNA binding. PMID:22383578

  15. A Glance at Microsatellite Motifs from 454 Sequencing Reads of Watermelon Genomic DNA

    USDA-ARS?s Scientific Manuscript database

    A single 454 (Life Sciences Sequencing Technology) run of Charleston Gray watermelon (Citrullus lanatus var. lanatus) genomic DNA was performed and sequence data were assembled. A large scale identification of simple sequence repeat (SSR) was performed and SSR sequence data were used for the develo...

  16. Comparative genomics approaches to study organism similarities and differences

    SciTech Connect

    Wei, Liping; Liu, Yueyi; Dubchak, Inna; Shon, John; Park, John

    2002-06-01

    Comparative genomics is a large-scale, holistic approach that compares two or more genomes to discover the similarities and differences between the genomes and to study the biology of the individual genomes. Comparative studies can be performed at different levels of the genomes to obtain multiple perspectives about the organisms. We discuss in detail the type of analyses that offer significant biological insights in the comparisons of (1) genome structure including overall genome statistics, repeats, genome rearrangement at both DNA and gene level, synteny, and breakpoints; (2) coding regions including gene content, protein content, orthologs, and paralogs; and (3) noncoding regions including the prediction of regulatory elements. We also briefly review the currently available computational tools in comparative genomics such as algorithms for genome-scale sequence alignment, gene identification, and nonhomology-based function prediction.

  17. [Sotos syndrome diagnosed by comparative genomic hybridisation].

    PubMed

    Saldarriaga, Wilmar; Molina-Barrera, Laura Camila; Ramírez-Cheyne, Julián

    2016-01-01

    Sotos Syndrome (SS) is a genetic disease with an autosomal dominant pattern caused by haplo-insufficiency of NSD1 gene secondary to point mutations or microdeletion of the 5q35 locus where the gene is located. It is a rare syndrome, occurring in 7 out of every 100,000 births. The objective of this report is to present the case of a 4 year-old patient with a global developmental delay, as well as specific physical findings suggesting a syndrome of genetic origin. Female patient, 4 years of age, thinning hair, triangular facie, long palpebral fissure, arched palate, prominent jaw, winged scapula and clinodactilia of the fifth finger both hands. The molecular test comparative genomic hybridisation test by microarray was subsequently performed, with the result showing 5q35.2 q35.3 region microdeletion of 2,082 MB, including the NSD1 gene. Finally, this article also proposes the performing of comparative genomic hybridisation as the first diagnostic option in cases where clinical findings are suggestive of SS. Copyright © 2015 Sociedad Chilena de Pediatría. Publicado por Elsevier España, S.L.U. All rights reserved.

  18. Assessment of transfer methods for comparative genomics of regulatory networks in bacteria.

    PubMed

    Kılıç, Sefa; Erill, Ivan

    2016-08-31

    Comparative genomics can leverage the vast amount of available genomic sequences to reconstruct and analyze transcriptional regulatory networks in Bacteria, but the efficacy of this approach hinges on the ability to transfer regulatory network information from reference species to the genomes under analysis. Several methods have been proposed to transfer regulatory information between bacterial species, but the paucity and distributed nature of experimental information on bacterial transcriptional networks have prevented their systematic evaluation. We report the compilation of a large catalog of transcription factor-binding sites across Bacteria and its use to systematically benchmark proposed transfer methods across pairs of bacterial species. We evaluate motif- and accuracy-based metrics to assess the results of regulatory network transfer and we identify the precision-recall area-under-the-curve as the best metric for this purpose due to the large class-imbalanced nature of the problem. Methods assuming conservation of the transcription factor-binding motif (motif-based) are shown to substantially outperform those assuming conservation of regulon composition (network-based), even though their efficiency can decrease sharply with increasing phylogenetic distance. Variations of the basic motif-based transfer method do not yield significant improvements in transfer accuracy. Our results indicate that detection of a large enough number of regulated orthologs is critical for network-based transfer methods, but that relaxing orthology requirements does not improve results. Using the transcriptional regulators LexA and Fur as case examples, we also show how DNA-binding domain sequence similarity can yield confounding results as an indicator of transfer efficiency for motif-based methods. Counter to standard practice, our evaluation of metrics to assess the efficiency of methods for regulatory network information transfer reveals that the area under precision

  19. Comparing graph representations of protein structure for mining family-specific residue-based packing motifs.

    PubMed

    Huan, Jun; Bandyopadhyay, Deepak; Wang, Wei; Snoeyink, Jack; Prins, Jan; Tropsha, Alexander

    2005-01-01

    We find recurring amino-acid residue packing patterns, or spatial motifs, that are characteristic of protein structural families, by applying a novel frequent subgraph mining algorithm to graph representations of protein three-dimensional structure. Graph nodes represent amino acids, and edges are chosen in one of three ways: first, using a threshold for contact distance between residues; second, using Delaunay tessellation; and third, using the recently developed almost-Delaunay edges. For a set of graphs representing a protein family from the Structural Classification of Proteins (SCOP) database, subgraph mining typically identifies several hundred common subgraphs corresponding to spatial motifs that are frequently found in proteins in the family but rarely found outside of it. We find that some of the large motifs map onto known functional regions in two protein families explored in this study, i.e., serine proteases and kinases. We find that graphs based on almost-Delaunay edges significantly reduce the number of edges in the graph representation and hence present computational advantage, yet the patterns extracted from such graphs have a biological interpretation approximately equivalent to that of those extracted from distance based graphs.

  20. Comparative genomics of drug resistance in Trypanosoma brucei rhodesiense.

    PubMed

    Graf, Fabrice E; Ludin, Philipp; Arquint, Christian; Schmidt, Remo S; Schaub, Nadia; Kunz Renggli, Christina; Munday, Jane C; Krezdorn, Jessica; Baker, Nicola; Horn, David; Balmer, Oliver; Caccone, Adalgisa; de Koning, Harry P; Mäser, Pascal

    2016-09-01

    Trypanosoma brucei rhodesiense is one of the causative agents of human sleeping sickness, a fatal disease that is transmitted by tsetse flies and restricted to Sub-Saharan Africa. Here we investigate two independent lines of T. b. rhodesiense that have been selected with the drugs melarsoprol and pentamidine over the course of 2 years, until they exhibited stable cross-resistance to an unprecedented degree. We apply comparative genomics and transcriptomics to identify the underlying mutations. Only few mutations have become fixed during selection. Three genes were affected by mutations in both lines: the aminopurine transporter AT1, the aquaporin AQP2, and the RNA-binding protein UBP1. The melarsoprol-selected line carried a large deletion including the adenosine transporter gene AT1, whereas the pentamidine-selected line carried a heterozygous point mutation in AT1, G430R, which rendered the transporter non-functional. Both resistant lines had lost AQP2, and both lines carried the same point mutation, R131L, in the RNA-binding motif of UBP1. The finding that concomitant deletion of the known resistance genes AT1 and AQP2 in T. b. brucei failed to phenocopy the high levels of resistance of the T. b. rhodesiense mutants indicated a possible role of UBP1 in melarsoprol-pentamidine cross-resistance. However, homozygous in situ expression of UBP1-Leu(131) in T. b. brucei did not affect the sensitivity to melarsoprol or pentamidine.

  1. Characterisation of an unusual telomere motif (TTTTTTAGGG)n in the plant Cestrum elegans (Solanaceae), a species with a large genome.

    PubMed

    Peška, Vratislav; Fajkus, Petr; Fojtová, Miloslava; Dvořáčková, Martina; Hapala, Jan; Dvořáček, Vojtěch; Polanská, Pavla; Leitch, Andrew R; Sýkorová, Eva; Fajkus, Jiří

    2015-05-01

    The characterization of unusual telomere sequence sheds light on patterns of telomere evolution, maintenance and function. Plant species from the closely related genera Cestrum, Vestia and Sessea (family Solanaceae) lack known plant telomeric sequences. Here we characterize the telomere of Cestrum elegans, work that was a challenge because of its large genome size and few chromosomes (1C 9.76 pg; n = 8). We developed an approach that combines BAL31 digestion, which digests DNA from the ends and chromosome breaks, with next-generation sequencing (NGS), to generate data analysed in RepeatExplorer, designed for de novo repeats identification and quantification. We identify an unique repeat motif (TTTTTTAGGG)n in C. elegans, occurring in ca. 30 400 copies per haploid genome, averaging ca. 1900 copies per telomere, and synthesized by telomerase. We demonstrate that the motif is synthesized by telomerase. The occurrence of an unusual eukaryote (TTTTTTAGGG)n telomeric motif in C. elegans represents a switch in motif from the 'typical' angiosperm telomere (TTTAGGG)n . That switch may have happened with the divergence of Cestrum, Sessea and Vestia. The shift in motif when it arose would have had profound effects on telomere activity. Thus our finding provides a unique handle to study how telomerase and telomeres responded to genetic change, studies that will shed more light on telomere function.

  2. Comparative genomic analysis of prion genes

    PubMed Central

    Premzl, Marko; Gamulin, Vera

    2007-01-01

    Background The homologues of human disease genes are expected to contribute to better understanding of physiological and pathogenic processes. We made use of the present availability of vertebrate genomic sequences, and we have conducted the most comprehensive comparative genomic analysis of the prion protein gene PRNP and its homologues, shadow of prion protein gene SPRN and doppel gene PRND, and prion testis-specific gene PRNT so far. Results While the SPRN and PRNP homologues are present in all vertebrates, PRND is known in tetrapods, and PRNT is present in primates. PRNT could be viewed as a TE-associated gene. Using human as the base sequence for genomic sequence comparisons (VISTA), we annotated numerous potential cis-elements. The conserved regions in SPRNs harbour the potential Sp1 sites in promoters (mammals, birds), C-rich intron splicing enhancers and PTB intron splicing silencers in introns (mammals, birds), and hsa-miR-34a sites in 3'-UTRs (eutherians). We showed the conserved PRNP upstream regions, which may be potential enhancers or silencers (primates, dog). In the PRNP 3'-UTRs, there are conserved cytoplasmic polyadenylation element sites (mammals, birds). The PRND core promoters include highly conserved CCAAT, CArG and TATA boxes (mammals). We deduced 42 new protein primary structures, and performed the first phylogenetic analysis of all vertebrate prion genes. Using the protein alignment which included 122 sequences, we constructed the neighbour-joining tree which showed four major clusters, including shadoos, shadoo2s and prion protein-likes (cluster 1), fish prion proteins (cluster 2), tetrapode prion proteins (cluster 3) and doppels (cluster 4). We showed that the entire prion protein conformationally plastic region is well conserved between eutherian prion proteins and shadoos (18–25% identity and 28–34% similarity), and there could be a potential structural compatibility between shadoos and the left-handed parallel beta-helical fold

  3. ERCC1: a comparative genomic perspective.

    PubMed

    Wilson, M D; Ruttan, C C; Koop, B F; Glickman, B W

    2001-01-01

    ERCC1 plays an essential role in the nucleotide excision repair (NER) of DNA. We compare 37 kb of sequence from the ERCC1 region on human chromosome 19q13.3 to the orthologous region on mouse chromosome 7. In addition to showing the conserved gene structure between ERCC1, ASE-1, and their murine counterparts, this genomic comparison reveals a highly conserved 497 bp segment found 5 kb upstream of ERCC1 exon 1 that contains a CpG island and previously unidentified "classical" promoter elements. Additional putative regulatory elements are also found within a conserved LINE-1 (long interspersed nuclear element) sequence 800 bp upstream of exon 1 in both human and mouse. Expressed sequence tag (EST) assemblies for human ERCC1 identified numerous splice variants involving exons 1, 2, 3, 7, 8, and 9 that could affect DNA repair efficiencies of ERCC1. A previously undescribed transcript that reads through exon 9 and utilizes the polyadenylation signal of a neighboring Alu element accounts for nearly half of the total splice variants identified in the human EST database. This transcript would theoretically translate to a larger ERCC1 protein product containing a novel C-terminal end. Overall, approximately 18% of publicly available ERCC1 cDNA sequences were determined to be splice variants, while no variants were found in the mouse. The ability to assess novel transcripts and identify candidate regulatory regions demonstrates the potential utility for a catalogue archiving comparative analyses for all genes involved in DNA repair. Our comparative genomic analysis of ERCC1 can be viewed at http://web.uvic.ca/-bioweb/laj.html. Copyright 2001 Wiley-Liss, Inc.

  4. Comparative chloroplast genomics: analyses including new sequences from the angiosperms Nuphar advena and Ranunculus macranthus

    PubMed Central

    Raubeson, Linda A; Peery, Rhiannon; Chumley, Timothy W; Dziubek, Chris; Fourcade, H Matthew; Boore, Jeffrey L; Jansen, Robert K

    2007-01-01

    Background The number of completely sequenced plastid genomes available is growing rapidly. This array of sequences presents new opportunities to perform comparative analyses. In comparative studies, it is often useful to compare across wide phylogenetic spans and, within angiosperms, to include representatives from basally diverging lineages such as the genomes reported here: Nuphar advena (from a basal-most lineage) and Ranunculus macranthus (a basal eudicot). We report these two new plastid genome sequences and make comparisons (within angiosperms, seed plants, or all photosynthetic lineages) to evaluate features such as the status of ycf15 and ycf68 as protein coding genes, the distribution of simple sequence repeats (SSRs) and longer dispersed repeats (SDR), and patterns of nucleotide composition. Results The Nuphar [GenBank:NC_008788] and Ranunculus [GenBank:NC_008796] plastid genomes share characteristics of gene content and organization with many other chloroplast genomes. Like other plastid genomes, these genomes are A+T-rich, except for rRNA and tRNA genes. Detailed comparisons of Nuphar with Nymphaea, another Nymphaeaceae, show that more than two-thirds of these genomes exhibit at least 95% sequence identity and that most SSRs are shared. In broader comparisons, SSRs vary among genomes in terms of abundance and length and most contain repeat motifs based on A and T nucleotides. Conclusion SSR and SDR abundance varies by genome and, for SSRs, is proportional to genome size. Long SDRs are rare in the genomes assessed. SSRs occur less frequently than predicted and, although the majority of the repeat motifs do include A and T nucleotides, the A+T bias in SSRs is less than that predicted from the underlying genomic nucleotide composition. In codon usage third positions show an A+T bias, however variation in codon usage does not correlate with differences in A+T-richness. Thus, although plastome nucleotide composition shows "A+T richness", an A+T bias is not

  5. Comparative chloroplast genomics: analyses including new sequences from the angiosperms Nuphar advena and Ranunculus macranthus.

    PubMed

    Raubeson, Linda A; Peery, Rhiannon; Chumley, Timothy W; Dziubek, Chris; Fourcade, H Matthew; Boore, Jeffrey L; Jansen, Robert K

    2007-06-15

    The number of completely sequenced plastid genomes available is growing rapidly. This array of sequences presents new opportunities to perform comparative analyses. In comparative studies, it is often useful to compare across wide phylogenetic spans and, within angiosperms, to include representatives from basally diverging lineages such as the genomes reported here: Nuphar advena (from a basal-most lineage) and Ranunculus macranthus (a basal eudicot). We report these two new plastid genome sequences and make comparisons (within angiosperms, seed plants, or all photosynthetic lineages) to evaluate features such as the status of ycf15 and ycf68 as protein coding genes, the distribution of simple sequence repeats (SSRs) and longer dispersed repeats (SDR), and patterns of nucleotide composition. The Nuphar [GenBank:NC_008788] and Ranunculus [GenBank:NC_008796] plastid genomes share characteristics of gene content and organization with many other chloroplast genomes. Like other plastid genomes, these genomes are A+T-rich, except for rRNA and tRNA genes. Detailed comparisons of Nuphar with Nymphaea, another Nymphaeaceae, show that more than two-thirds of these genomes exhibit at least 95% sequence identity and that most SSRs are shared. In broader comparisons, SSRs vary among genomes in terms of abundance and length and most contain repeat motifs based on A and T nucleotides. SSR and SDR abundance varies by genome and, for SSRs, is proportional to genome size. Long SDRs are rare in the genomes assessed. SSRs occur less frequently than predicted and, although the majority of the repeat motifs do include A and T nucleotides, the A+T bias in SSRs is less than that predicted from the underlying genomic nucleotide composition. In codon usage third positions show an A+T bias, however variation in codon usage does not correlate with differences in A+T-richness. Thus, although plastome nucleotide composition shows "A+T richness", an A+T bias is not apparent upon more in

  6. Comparative Genome Analysis in the Integrated Microbial Genomes(IMG) System

    SciTech Connect

    Kyrpides, Nikos C.; Markowitz, Victor M.

    2006-03-01

    Comparative genome analysis is critical for the effectiveexploration of a rapidly growing number of complete and draft sequencesfor microbial genomes. The Integrated Microbial Genomes (IMG) system(img.jgi.doe.gov) has been developed as a community resource thatprovides support for comparative analysis of microbial genomes in anintegrated context. IMG allows users to navigate the multidimensionalmicrobial genome data space and focus their analysis on a subset ofgenes, genomes, and functions of interest. IMG provides graphicalviewers, summaries and occurrence profile tools for comparing genes,pathways and functions (terms) across specific genomes. Genes can befurther examined using gene neighborhoods and compared with sequencealignment tools.

  7. Comparative genomic analysis of sixty mycobacteriophage genomes: Genome clustering, gene acquisition and gene size

    PubMed Central

    Hatfull, Graham F.; Jacobs-Sera, Deborah; Lawrence, Jeffrey G.; Pope, Welkin H.; Russell, Daniel A.; Ko, Ching-Chung; Weber, Rebecca J.; Patel, Manisha C.; Germane, Katherine L.; Edgar, Robert H.; Hoyte, Natasha N.; Bowman, Charles A.; Tantoco, Anthony T.; Paladin, Elizabeth C.; Myers, Marlana S.; Smith, Alexis L.; Grace, Molly S.; Pham, Thuy T.; O'Brien, Matthew B.; Vogelsberger, Amy M.; Hryckowian, Andrew J.; Wynalek, Jessica L.; Donis-Keller, Helen; Bogel, Matt W.; Peebles, Craig L.; Cresawn, Steve G.; Hendrix, Roger W.

    2010-01-01

    Mycobacteriophages are viruses that infect mycobacterial hosts. Expansion of a collection of sequenced phage genomes to a total of sixty – all infecting a common bacterial host – provides further insight into their diversity and evolution. Of the sixty phage genomes, 55 can be grouped into nine clusters according to their nucleotide sequence similarities, five of which can be further divided into subclusters; five genomes do not cluster with other phages. The sequence diversity between genomes within a cluster varies greatly; for example, the six genomes in cluster D share more than 97.5% average nucleotide similarity with each other. In contrast, similarity between the two genomes in Cluster I is barely detectable by diagonal plot analysis. The total of 6,858 predicted ORFs have been grouped into 1523 phamilies (phams) of related sequences, 46% of which possess only a single member. Only 18.8% of the phams have sequence similarity to non-mycobacteriophage database entries and fewer than 10% of all phams can be assigned functions based on database searching or synteny. Genome clustering facilitates the identification of genes that are in greatest genetic flux and are more likely to have been exchanged horizontally in relatively recent evolutionary time. Although mycobacteriophage genes exhibit smaller average size than genes of their host (205 residues compared to 315), phage genes in higher flux average only ∼100 amino acids, suggesting that the primary units of genetic exchange correspond to single protein domains. PMID:20064525

  8. A universal genomic coordinate translator for comparative genomics.

    PubMed

    Zamani, Neda; Sundström, Görel; Meadows, Jennifer R S; Höppner, Marc P; Dainat, Jacques; Lantz, Henrik; Haas, Brian J; Grabherr, Manfred G

    2014-06-30

    Genomic duplications constitute major events in the evolution of species, allowing paralogous copies of genes to take on fine-tuned biological roles. Unambiguously identifying the orthology relationship between copies across multiple genomes can be resolved by synteny, i.e. the conserved order of genomic sequences. However, a comprehensive analysis of duplication events and their contributions to evolution would require all-to-all genome alignments, which increases at N2 with the number of available genomes, N. Here, we introduce Kraken, software that omits the all-to-all requirement by recursively traversing a graph of pairwise alignments and dynamically re-computing orthology. Kraken scales linearly with the number of targeted genomes, N, which allows for including large numbers of genomes in analyses. We first evaluated the method on the set of 12 Drosophila genomes, finding that orthologous correspondence computed indirectly through a graph of multiple synteny maps comes at minimal cost in terms of sensitivity, but reduces overall computational runtime by an order of magnitude. We then used the method on three well-annotated mammalian genomes, human, mouse, and rat, and show that up to 93% of protein coding transcripts have unambiguous pairwise orthologous relationships across the genomes. On a nucleotide level, 70 to 83% of exons match exactly at both splice junctions, and up to 97% on at least one junction. We last applied Kraken to an RNA-sequencing dataset from multiple vertebrates and diverse tissues, where we confirmed that brain-specific gene family members, i.e. one-to-many or many-to-many homologs, are more highly correlated across species than single-copy (i.e. one-to-one homologous) genes. Not limited to protein coding genes, Kraken also identifies thousands of newly identified transcribed loci, likely non-coding RNAs that are consistently transcribed in human, chimpanzee and gorilla, and maintain significant correlation of expression levels across

  9. An evaluation of Comparative Genome Sequencing (CGS) by comparing two previously-sequenced bacterial genomes

    PubMed Central

    Herring, Christopher D; Palsson, Bernhard Ø

    2007-01-01

    Background With the development of new technology, it has recently become practical to resequence the genome of a bacterium after experimental manipulation. It is critical though to know the accuracy of the technique used, and to establish confidence that all of the mutations were detected. Results In order to evaluate the accuracy of genome resequencing using the microarray-based Comparative Genome Sequencing service provided by Nimblegen Systems Inc., we resequenced the E. coli strain W3110 Kohara using MG1655 as a reference, both of which have been completely sequenced using traditional sequencing methods. CGS detected 7 of 8 small sequence differences, one large deletion, and 9 of 12 IS element insertions present in W3110, but did not detect a large chromosomal inversion. In addition, we confirmed that CGS also detected 2 SNPs, one deletion and 7 IS element insertions that are not present in the genome sequence, which we attribute to changes that occurred after the creation of the W3110 lambda clone library. The false positive rate for SNPs was one per 244 Kb of genome sequence. Conclusion CGS is an effective way to detect multiple mutations present in one bacterium relative to another, and while highly cost-effective, is prone to certain errors. Mutations occurring in repeated sequences or in sequences with a high degree of secondary structure may go undetected. It is also critical to follow up on regions of interest in which SNPs were not called because they often indicate deletions or IS element insertions. PMID:17697331

  10. Comparative genomics reveals insights into avian genome evolution and adaptation.

    PubMed

    Zhang, Guojie; Li, Cai; Li, Qiye; Li, Bo; Larkin, Denis M; Lee, Chul; Storz, Jay F; Antunes, Agostinho; Greenwold, Matthew J; Meredith, Robert W; Ödeen, Anders; Cui, Jie; Zhou, Qi; Xu, Luohao; Pan, Hailin; Wang, Zongji; Jin, Lijun; Zhang, Pei; Hu, Haofu; Yang, Wei; Hu, Jiang; Xiao, Jin; Yang, Zhikai; Liu, Yang; Xie, Qiaolin; Yu, Hao; Lian, Jinmin; Wen, Ping; Zhang, Fang; Li, Hui; Zeng, Yongli; Xiong, Zijun; Liu, Shiping; Zhou, Long; Huang, Zhiyong; An, Na; Wang, Jie; Zheng, Qiumei; Xiong, Yingqi; Wang, Guangbiao; Wang, Bo; Wang, Jingjing; Fan, Yu; da Fonseca, Rute R; Alfaro-Núñez, Alonzo; Schubert, Mikkel; Orlando, Ludovic; Mourier, Tobias; Howard, Jason T; Ganapathy, Ganeshkumar; Pfenning, Andreas; Whitney, Osceola; Rivas, Miriam V; Hara, Erina; Smith, Julia; Farré, Marta; Narayan, Jitendra; Slavov, Gancho; Romanov, Michael N; Borges, Rui; Machado, João Paulo; Khan, Imran; Springer, Mark S; Gatesy, John; Hoffmann, Federico G; Opazo, Juan C; Håstad, Olle; Sawyer, Roger H; Kim, Heebal; Kim, Kyu-Won; Kim, Hyeon Jeong; Cho, Seoae; Li, Ning; Huang, Yinhua; Bruford, Michael W; Zhan, Xiangjiang; Dixon, Andrew; Bertelsen, Mads F; Derryberry, Elizabeth; Warren, Wesley; Wilson, Richard K; Li, Shengbin; Ray, David A; Green, Richard E; O'Brien, Stephen J; Griffin, Darren; Johnson, Warren E; Haussler, David; Ryder, Oliver A; Willerslev, Eske; Graves, Gary R; Alström, Per; Fjeldså, Jon; Mindell, David P; Edwards, Scott V; Braun, Edward L; Rahbek, Carsten; Burt, David W; Houde, Peter; Zhang, Yong; Yang, Huanming; Wang, Jian; Jarvis, Erich D; Gilbert, M Thomas P; Wang, Jun

    2014-12-12

    Birds are the most species-rich class of tetrapod vertebrates and have wide relevance across many research fields. We explored bird macroevolution using full genomes from 48 avian species representing all major extant clades. The avian genome is principally characterized by its constrained size, which predominantly arose because of lineage-specific erosion of repetitive elements, large segmental deletions, and gene loss. Avian genomes furthermore show a remarkably high degree of evolutionary stasis at the levels of nucleotide sequence, gene synteny, and chromosomal structure. Despite this pattern of conservation, we detected many non-neutral evolutionary changes in protein-coding genes and noncoding regions. These analyses reveal that pan-avian genomic diversity covaries with adaptations to different lifestyles and convergent evolution of traits. Copyright © 2014, American Association for the Advancement of Science.

  11. Comparative genomics reveals insights into avian genome evolution and adaptation

    PubMed Central

    Zhang, Guojie; Li, Cai; Li, Qiye; Li, Bo; Larkin, Denis M.; Lee, Chul; Storz, Jay F.; Antunes, Agostinho; Greenwold, Matthew J.; Meredith, Robert W.; Ödeen, Anders; Cui, Jie; Zhou, Qi; Xu, Luohao; Pan, Hailin; Wang, Zongji; Jin, Lijun; Zhang, Pei; Hu, Haofu; Yang, Wei; Hu, Jiang; Xiao, Jin; Yang, Zhikai; Liu, Yang; Xie, Qiaolin; Yu, Hao; Lian, Jinmin; Wen, Ping; Zhang, Fang; Li, Hui; Zeng, Yongli; Xiong, Zijun; Liu, Shiping; Zhou, Long; Huang, Zhiyong; An, Na; Wang, Jie; Zheng, Qiumei; Xiong, Yingqi; Wang, Guangbiao; Wang, Bo; Wang, Jingjing; Fan, Yu; da Fonseca, Rute R.; Alfaro-Núñez, Alonzo; Schubert, Mikkel; Orlando, Ludovic; Mourier, Tobias; Howard, Jason T.; Ganapathy, Ganeshkumar; Pfenning, Andreas; Whitney, Osceola; Rivas, Miriam V.; Hara, Erina; Smith, Julia; Farré, Marta; Narayan, Jitendra; Slavov, Gancho; Romanov, Michael N; Borges, Rui; Machado, João Paulo; Khan, Imran; Springer, Mark S.; Gatesy, John; Hoffmann, Federico G.; Opazo, Juan C.; Håstad, Olle; Sawyer, Roger H.; Kim, Heebal; Kim, Kyu-Won; Kim, Hyeon Jeong; Cho, Seoae; Li, Ning; Huang, Yinhua; Bruford, Michael W.; Zhan, Xiangjiang; Dixon, Andrew; Bertelsen, Mads F.; Derryberry, Elizabeth; Warren, Wesley; Wilson, Richard K; Li, Shengbin; Ray, David A.; Green, Richard E.; O’Brien, Stephen J.; Griffin, Darren; Johnson, Warren E.; Haussler, David; Ryder, Oliver A.; Willerslev, Eske; Graves, Gary R.; Alström, Per; Fjeldså, Jon; Mindell, David P.; Edwards, Scott V.; Braun, Edward L.; Rahbek, Carsten; Burt, David W.; Houde, Peter; Zhang, Yong; Yang, Huanming; Wang, Jian; Jarvis, Erich D.; Gilbert, M. Thomas P.; Wang, Jun

    2015-01-01

    Birds are the most species-rich class of tetrapod vertebrates and have wide relevance across many research fields. We explored bird macroevolution using full genomes from 48 avian species representing all major extant clades. The avian genome is principally characterized by its constrained size, which predominantly arose because of lineage-specific erosion of repetitive elements, large segmental deletions, and gene loss. Avian genomes furthermore show a remarkably high degree of evolutionary stasis at the levels of nucleotide sequence, gene synteny, and chromosomal structure. Despite this pattern of conservation, we detected many non-neutral evolutionary changes in protein-coding genes and noncoding regions. These analyses reveal that pan-avian genomic diversity covaries with adaptations to different lifestyles and convergent evolution of traits. PMID:25504712

  12. The bonobo genome compared with the chimpanzee and human genomes

    PubMed Central

    Prüfer, Kay; Munch, Kasper; Hellmann, Ines; Akagi, Keiko; Miller, Jason R.; Walenz, Brian; Koren, Sergey; Sutton, Granger; Kodira, Chinnappa; Winer, Roger; Knight, James R.; Mullikin, James C.; Meader, Stephen J.; Ponting, Chris P.; Lunter, Gerton; Higashino, Saneyuki; Hobolth, Asger; Dutheil, Julien; Karakoç, Emre; Alkan, Can; Sajjadian, Saba; Catacchio, Claudia Rita; Ventura, Mario; Marques-Bonet, Tomas; Eichler, Evan E.; André, Claudine; Atencia, Rebeca; Mugisha, Lawrence; Junhold, Jörg; Patterson, Nick; Siebauer, Michael; Good, Jeffrey M.; Fischer, Anne; Ptak, Susan E.; Lachmann, Michael; Symer, David E.; Mailund, Thomas; Schierup, Mikkel H.; Andrés, Aida M.; Kelso, Janet; Pääbo, Svante

    2012-01-01

    Two African apes are the closest living relatives of humans: the chimpanzee (Pan troglodytes) and the bonobo (Pan paniscus). Although they are similar in many respects, bonobos and chimpanzees differ strikingly in key social and sexual behaviours1–4, and for some of these traits they show more similarity with humans than with each other. Here we report the sequencing and assembly of the bonobo genome to study its evolutionary relationship with the chimpanzee and human genomes. We find that more than three per cent of the human genome is more closely related to either the bonobo or the chimpanzee genome than these are to each other. These regions allow various aspects of the ancestry of the two ape species to be reconstructed. In addition, many of the regions that overlap genes may eventually help us understand the genetic basis of phenotypes that humans share with one of the two apes to the exclusion of the other. PMID:22722832

  13. The bonobo genome compared with the chimpanzee and human genomes.

    PubMed

    Prüfer, Kay; Munch, Kasper; Hellmann, Ines; Akagi, Keiko; Miller, Jason R; Walenz, Brian; Koren, Sergey; Sutton, Granger; Kodira, Chinnappa; Winer, Roger; Knight, James R; Mullikin, James C; Meader, Stephen J; Ponting, Chris P; Lunter, Gerton; Higashino, Saneyuki; Hobolth, Asger; Dutheil, Julien; Karakoç, Emre; Alkan, Can; Sajjadian, Saba; Catacchio, Claudia Rita; Ventura, Mario; Marques-Bonet, Tomas; Eichler, Evan E; André, Claudine; Atencia, Rebeca; Mugisha, Lawrence; Junhold, Jörg; Patterson, Nick; Siebauer, Michael; Good, Jeffrey M; Fischer, Anne; Ptak, Susan E; Lachmann, Michael; Symer, David E; Mailund, Thomas; Schierup, Mikkel H; Andrés, Aida M; Kelso, Janet; Pääbo, Svante

    2012-06-28

    Two African apes are the closest living relatives of humans: the chimpanzee (Pan troglodytes) and the bonobo (Pan paniscus). Although they are similar in many respects, bonobos and chimpanzees differ strikingly in key social and sexual behaviours, and for some of these traits they show more similarity with humans than with each other. Here we report the sequencing and assembly of the bonobo genome to study its evolutionary relationship with the chimpanzee and human genomes. We find that more than three per cent of the human genome is more closely related to either the bonobo or the chimpanzee genome than these are to each other. These regions allow various aspects of the ancestry of the two ape species to be reconstructed. In addition, many of the regions that overlap genes may eventually help us understand the genetic basis of phenotypes that humans share with one of the two apes to the exclusion of the other.

  14. Identification of 22 candidate structured RNAs in bacteria using the CMfinder comparative genomics pipeline

    PubMed Central

    Weinberg, Zasha; Barrick, Jeffrey E.; Yao, Zizhen; Roth, Adam; Kim, Jane N.; Gore, Jeremy; Wang, Joy Xin; Lee, Elaine R.; Block, Kirsten F.; Sudarsan, Narasimhan; Neph, Shane; Tompa, Martin; Ruzzo, Walter L.

    2007-01-01

    We applied a computational pipeline based on comparative genomics to bacteria, and identified 22 novel candidate RNA motifs. We predicted six to be riboswitches, which are mRNA elements that regulate gene expression on binding a specific metabolite. In separate studies, we confirmed that two of these are novel riboswitches. Three other riboswitch candidates are upstream of either a putative transporter gene in the order Lactobacillales, citric acid cycle genes in Burkholderiales or molybdenum cofactor biosynthesis genes in several phyla. The remaining riboswitch candidate, the widespread Genes for the Environment, for Membranes and for Motility (GEMM) motif, is associated with genes important for natural competence in Vibrio cholerae and the use of metal ions as electron acceptors in Geobacter sulfurreducens. Among the other motifs, one has a genetic distribution similar to a previously published candidate riboswitch, ykkC/yxkD, but has a different structure. We identified possible non-coding RNAs in five phyla, and several additional cis-regulatory RNAs, including one in ε-proteobacteria (upstream of purD, involved in purine biosynthesis), and one in Cyanobacteria (within an ATP synthase operon). These candidate RNAs add to the growing list of RNA motifs involved in multiple cellular processes, and suggest that many additional RNAs remain to be discovered. PMID:17621584

  15. Genome-wide mining and comparative analysis of microsatellites in three macaque species.

    PubMed

    Liu, Sanxu; Hou, Wei; Sun, Tianlin; Xu, Yongtao; Li, Peng; Yue, Bisong; Fan, Zhenxin; Li, Jing

    2017-06-01

    Microsatellites are found in taxonomically different organisms, and such repeats are related with genomic structure, function and certain diseases. To characterize microsatellites for macaques, we searched and compared SSRs with 1-6 bp nucleotide motifs in rhesus, cynomolgus and pigtailed macaque. A total of 1395671, 1284929 and 1266348 perfect SSRs were mined, respectively. The most frequent perfect SSRs were mononucleotide SSRs. The most GC-content was in dinucleotide SSRs and the least was in the mononucleotide SSRs. Chromosome size was positively correlated with SSR number and negatively correlated with the relative frequency and density of SSRs. The GC content of chromosome SSRs were negatively correlated with relative frequency of SSRs and GC content of chromosome sequences. The features of microsatellite distribution in assembled genomes of the three species were greatly similar, which revealed that the distributional pattern of microsatellites is probably conservative in genus Macaca. The degenerated number of repeat motifs was found to be different in pentanucleotide and hexanucleotide repeats. Species-specific motifs for each macaque were significantly underrepresented. Overall, SSR frequencies of each chromosome in rhesus macaque were higher than in cynomolgus macaque. The maximum repeat times of mono- to pentanucleotide repeats in cynomolgus macaque was more than other two macaques. These results emphasize the genetic diversity and phylogenetic relationship of genus Macaca species. Our data will be beneficial for comparative genome mapping, understanding the distribution of SSRs and genome structure between these animal models, and provide a foundation for further development and identification of more macaque-specific SSRs.

  16. Update on Comparative Genomics of Legumes

    USDA-ARS?s Scientific Manuscript database

    This year marks the essential completion of the genome sequences of Glycine max, Medicago truncatula, and Lotus japonicus (soybean, barrel medic, and birdsfoot trefoil, respectively). The impact of these assembled, annotated genomes will be enormous. L. japonicus and M. truncatula, both forage crop...

  17. Comparative thermodynamic analysis of zinc binding to the His/Cys motif in virion infectivity factor.

    PubMed

    Ghimire-Rijal, Sudipa; Maynard, Ernest L

    2014-05-05

    HIV-1 virion infectivity factor (Vif) is an accessory protein that induces the proteasomal degradation of the host restriction factor, apolipoprotein B mRNA-editing enzyme catalytic polypeptide-like 3G (APOBEC3G). Degradation of APOBEC3G requires the interaction of Vif with Cul5, the scaffold for an E3 ubiquitin ligase. A highly conserved region in HIV-1 Vif termed the HCCH motif binds zinc and is critical for recruitment of Cul5 and degradation of APOBEC3G. To gain thermodynamic and mechanistic insight into zinc binding to diverse Vif proteins, we have employed a combination of isothermal titration calorimetry, analytical ultracentrifugation, and Cul5 pull down assays. The proton linkage of zinc binding to HIV-1 Vif was analyzed under different buffer conditions and consistent with the release of two Cys-thiol protons upon zinc binding, supporting earlier EXAFS studies. Zinc binding to Vif proteins from HIV-1, SIVAgm, HIV-2, and SIVMac followed a trend in which the enthalpy of zinc binding became less favorable and the entropy of zinc binding became more favorable. Using AUC, we determined that zinc induced oligomerization of Vif proteins from HIV-1 and SIVAgm but had little or no effect on the oligomeric properties of Vif proteins from HIV-2 and SIVMac. The zinc dependence of Cul5 recruitment by Vif was investigated. All Vif proteins except HIV-2 Vif required zinc to stabilize the interaction with Cul5. The trends in enthalpy-entropy compensation, zinc-induced oligomerization, and Cul5 recruitment are discussed in terms of the apo conformation of the HCCH motif and the role of zinc in stabilizing the structure of Vif.

  18. Genome-Wide Identification of Mitogen-Activated Protein Kinase Gene Family across Fungal Lineage Shows Presence of Novel and Diverse Activation Loop Motifs

    PubMed Central

    Mohanta, Tapan Kumar; Mohanta, Nibedita; Parida, Pratap; Panda, Sujogya Kumar; Ponpandian, Lakshmi Narayanan; Bae, Hanhong

    2016-01-01

    The mitogen-activated protein kinase (MAPK) is characterized by the presence of the T-E-Y, T-D-Y, and T-G-Y motifs in its activation loop region and plays a significant role in regulating diverse cellular responses in eukaryotic organisms. Availability of large-scale genome data in the fungal kingdom encouraged us to identify and analyse the fungal MAPK gene family consisting of 173 fungal species. The analysis of the MAPK gene family resulted in the discovery of several novel activation loop motifs (T-T-Y, T-I-Y, T-N-Y, T-H-Y, T-S-Y, K-G-Y, T-Q-Y, S-E-Y and S-D-Y) in fungal MAPKs. The phylogenetic analysis suggests that fungal MAPKs are non-polymorphic, had evolved from their common ancestors around 1500 million years ago, and are distantly related to plant MAPKs. We are the first to report the presence of nine novel activation loop motifs in fungal MAPKs. The specificity of the activation loop motif plays a significant role in controlling different growth and stress related pathways in fungi. Hence, the presences of these nine novel activation loop motifs in fungi are of special interest. PMID:26918378

  19. Comparative genomic hybridization in clinical cytogenetics

    SciTech Connect

    Bryndorf, T.; Kirchhoff, M.; Rose, H.

    1995-11-01

    We report the results of applying comparative genomic hybridization (CGH) in a cytogenetic service laboratory for (1) determination of the origin of extra and missing chromosomal material in intricate cases of unbalanced aberrations and (2) detection of common prenatal numerical chromosome aberrations. A total of 11 fetal samples were analyzed. Seven cases of complex unbalanced aberrations that could not be identified reliably by conventional cytogenetics were successfully resolved by CGH analysis. CGH results were validated by using FISH with chromosome-specific probes. Four cases representing common prenatal numerical aberrations (trisomy 21, 18, and 13 and monosomy X) were also successfully diagnosed by CGH. We conclude that CGH is a powerful adjunct to traditional cytogenetic techniques that makes it possible to solve clinical cases of intricate unbalanced aberrations in a single hybridization. CGH may also be a useful adjunct to screen for euchromatic involvement in marker chromosomes. Further technical development may render CGH applicable for routine aberration screening. 16 refs., 4 figs., 2 tabs.

  20. Comparative analysis of whole genome sequencing-based telomere length measurement techniques.

    PubMed

    Lee, Michael; Napier, Christine E; Yang, Sile F; Arthur, Jonathan W; Reddel, Roger R; Pickett, Hilda A

    2017-02-01

    Telomeres are regions of repetitive DNA at the ends of human chromosomes that function to maintain the integrity of the genome. Telomere attrition is associated with cellular ageing, whilst telomere maintenance is a prerequisite for malignant transformation. Whole genome sequencing (WGS) captures sequence information from the entire genome, including the telomeres, and is increasingly being applied in research and in the clinic. Several bioinformatics tools have been designed to determine telomere content and length from WGS data, and include Motif_counter, TelSeq, Computel, qMotif, and Telomerecat. These tools utilise different approaches to identify, quantify and normalise telomeric reads; however, it is not known how they compare to one another. Here we describe the details and utility of each tool, and directly compare WGS telomere length output with laboratory-based telomere length measurements. In addition, we evaluate the accessibility, practicality, speed, and additional features of each tool. Each tool was tested using a range of telomere read extraction criteria, to determine the optimal parameters for the specific WGS read length. The aim of this article is to improve the accessibility of WGS telomere length measurement tools, which have the potential to be applied to WGS cohorts for clinical as well as research benefit. Copyright © 2016 Elsevier Inc. All rights reserved.

  1. Comparative Analysis of Genome and Epigenome in Closely Related Medaka Species Identifies Conserved Sequence Preferences for DNA Hypomethylated Domains.

    PubMed

    Uno, Ayako; Nakamura, Ryohei; Tsukahara, Tatsuya; Qu, Wei; Sugano, Sumio; Suzuki, Yutaka; Morishita, Shinichi; Takeda, Hiroyuki

    2016-08-01

    The genomes of vertebrates are globally methylated, but a small portion of genomic regions are known to be hypomethylated. Although hypomethylated domains (HMDs) have been implicated in transcriptional regulation in various ways, how a HMD is determined in a particular genomic region remains elusive. To search for DNA motifs essential for the formation of HMDs, we performed the genome-wide comparative analysis of genome and DNA methylation patterns of the two medaka inbred lines, Hd-rRII1 and HNI-II, which are derived from northern and southern subpopulations of Japan and exhibit high levels of genetic variations (SNP, ∼ 3%). We successfully mapped > 70% of HMDs in both genomes and found that the majority of those mapped HMDs are conserved between the two lines (common HMDs). Unexpectedly, the average genetic variations are similar in the common HMD and other genome regions. However, we identified short well-conserved motifs that are specifically enriched in HMDs, suggesting that they may play roles in the establishment of HMDs in the medaka genome.

  2. Genome wide characterization of simple sequence repeats in watermelon genome and their application in comparative mapping and genetic diversity analysis.

    PubMed

    Zhu, Huayu; Song, Pengyao; Koo, Dal-Hoe; Guo, Luqin; Li, Yanman; Sun, Shouru; Weng, Yiqun; Yang, Luming

    2016-08-05

    Microsatellite markers are one of the most informative and versatile DNA-based markers used in plant genetic research, but their development has traditionally been difficult and costly. The whole genome sequencing with next-generation sequencing (NGS) technologies provides large amounts of sequence data to develop numerous microsatellite markers at whole genome scale. SSR markers have great advantage in cross-species comparisons and allow investigation of karyotype and genome evolution through highly efficient computation approaches such as in silico PCR. Here we described genome wide development and characterization of SSR markers in the watermelon (Citrullus lanatus) genome, which were then use in comparative analysis with two other important crop species in the Cucurbitaceae family: cucumber (Cucumis sativus L.) and melon (Cucumis melo L.). We further applied these markers in evaluating the genetic diversity and population structure in watermelon germplasm collections. A total of 39,523 microsatellite loci were identified from the watermelon draft genome with an overall density of 111 SSRs/Mbp, and 32,869 SSR primers were designed with suitable flanking sequences. The dinucleotide SSRs were the most common type representing 34.09 % of the total SSR loci and the AT-rich motifs were the most abundant in all nucleotide repeat types. In silico PCR analysis identified 832 and 925 SSR markers with each having a single amplicon in the cucumber and melon draft genome, respectively. Comparative analysis with these cross-species SSR markers revealed complicated mosaic patterns of syntenic blocks among the genomes of three species. In addition, genetic diversity analysis of 134 watermelon accessions with 32 highly informative SSR loci placed these lines into two groups with all accessions of C.lanatus var. citorides and three accessions of C. colocynthis clustered in one group and all accessions of C. lanatus var. lanatus and the remaining accessions of C. colocynthis

  3. Comparative Genomics of the Campylobacter lari Group

    PubMed Central

    Miller, William G.; Yee, Emma; Chapman, Mary H.; Smith, Timothy P.L.; Bono, James L.; Huynh, Steven; Parker, Craig T.; Vandamme, Peter; Luong, Khai; Korlach, Jonas

    2014-01-01

    The Campylobacter lari group is a phylogenetic clade within the epsilon subdivision of the Proteobacteria and is part of the thermotolerant Campylobacter spp., a division within the genus that includes the human pathogen Campylobacter jejuni. The C. lari group is currently composed of five species (C. lari, Campylobacter insulaenigrae, Campylobacter volucris, Campylobacter subantarcticus, and Campylobacter peloridis), as well as a group of strains termed the urease-positive thermophilic Campylobacter (UPTC) and other C. lari-like strains. Here we present the complete genome sequences of 11 C. lari group strains, including the five C. lari group species, four UPTC strains, and a lari-like strain isolated in this study. The genome of C. lari subsp. lari strain RM2100 was described previously. Analysis of the C. lari group genomes indicates that this group is highly related at the genome level. Furthermore, these genomes are strongly syntenic with minor rearrangements occurring only in 4 of the 12 genomes studied. The C. lari group can be bifurcated, based on the flagella and flagellar modification genes. Genomic analysis of the UPTC strains indicated that these organisms are variable but highly similar, closely related to but distinct from C. lari. Additionally, the C. lari group contains multiple genes encoding hemagglutination domain proteins, which are either contingency genes or linked to conserved contingency genes. Many of the features identified in strain RM2100, such as major deficiencies in amino acid biosynthesis and energy metabolism, are conserved across all 12 genomes, suggesting that these common features may play a role in the association of the C. lari group with coastal environments and watersheds. PMID:25381664

  4. Comparative Genomics of Large Mitochondria in Placozoans

    PubMed Central

    Signorovitch, Ana Y; Buss, Leo W; Dellaporta, Stephen L

    2007-01-01

    The first sequenced mitochondrial genome of a placozoan, Trichoplax adhaerens, challenged the conventional wisdom that a compact mitochondrial genome is a common feature among all animals. Three additional placozoan mitochondrial genomes representing highly divergent clades have been sequenced to determine whether the large Trichoplax mtDNA is a shared feature among members of the phylum Placozoa or a uniquely derived condition. All three mitochondrial genomes were found to be very large, 32- to 37-kb, circular molecules, having the typical 12 respiratory chain genes, 24 tRNAs, rnS, and rnL. They share with the Trichoplax mitochondrial genome the absence of atp8, atp9, and all ribosomal protein genes, the presence of several cox1 introns, and a large open reading frame containing an intron group I LAGLIDADG endonuclease domain. The differences in mtDNA size within Placozoa are due to variation in intergenic spacer regions and the presence or absence of long open reading frames of unknown function. Phylogenetic analyses of the 12 respiratory chain genes support the monophyly of Placozoa. The similarities in composition and structure between the three mitochondrial genomes reported here and that of Trichoplax's mtDNA suggest that their uncompacted state is a shared ancestral feature to other nonmetazoans while their gene content is a derived feature shared only among the Metazoa. PMID:17222063

  5. Comparative genomic hybridization: Detection of segmental aneusomies

    SciTech Connect

    Cronin, J.E.; Magrane, G.G.; Gray, J.W.

    1994-09-01

    Comparative genomic hybridization (CGH) has been used successfully to detect whole chromosome and segmental aneusomies. However, its sensitivity for detection of segmental aneusomies is still not well known. We present here an analysis of CGH sensitivity with emphasis on detection of abnormalities commonly found during pre-and neo-natal diagnosis. CGH is performed by hybridizing green and red fluorescing test and normal DNA samples, respectively, to normal metaphase spreads and measuring green:red fluorescence ratios along all chromosomes. The ratios are normalized such that 2 copies of a normal chromosome region in the test sample gives a ratio of 1.0. Alterations in test vs. control gene copy number range from 1.5 [trisomy] to 0.5 [monosomy]. Clinical samples analyzed included Wolf Hirschhorn (4p-), Cri du Chat (5p-) and DiGeorge (22q-). In addition, 7 cell lines with chromosome 21 segmental aneusomies were analyzed. These included 3 with terminal duplications, 1 with a terminal deletion, 1 with an interstitial deletion and 2 with interstitial amplifications. The DiGeorge deletion was the only deletion not deleted by CGH. This is not surprising as standard G banding does not routinely detect this 1-2 megabase deletion. The 4p- and 5p- monosomies were detected and breakpoints correctly assigned prospectively. Proximal alterations involving 21q22.11 are unambiguously defined. Specifically, two interstitial aneusomies involving this region are detected. Studies involving late prophase chromosome normal spreads gave identical breakpoints. Thus, analysis of extended chromosomes did not improve the sensitivity of the technique. Taken together, these data suggest that CGH can detect segmental aneusomies greater than 8 megabases in extent. Smaller aneusomies can, at times, be detected. Work is now underway to modify the analysis software to increase sensitivity and to decrease the amount of material needed for analysis.

  6. The Genome of Nosema sp. Isolate YNPr: A Comparative Analysis of Genome Evolution within the Nosema/Vairimorpha Clade.

    PubMed

    Xu, Jinshan; He, Qiang; Ma, Zhenggang; Li, Tian; Zhang, Xiaoyan; Debrunner-Vossbrinck, Bettina A; Zhou, Zeyang; Vossbrinck, Charles R

    2016-01-01

    The microsporidian parasite designated here as Nosema sp. Isolate YNPr was isolated from the cabbage butterfly Pieris rapae collected in Honghe Prefecture, Yunnan Province, China. The genome was sequenced by Illumina sequencing and compared to those of two related members of the Nosema/Vairimorpha clade, Nosema ceranae and Nosema apis. Based upon assembly statistics, the Nosema sp. YNPr genome is 3.36 x 106bp with a G+C content of 23.18% and 2,075 protein coding sequences. An "ACCCTT" motif is present approximately 50-bp upstream of the start codon, as reported from other members of the clade and from Encephalitozoon cuniculi, a sister taxon. Comparative small subunit ribosomal DNA (SSU rDNA) analysis as well as genome-wide phylogenetic analysis confirms a closer relationship between N. ceranae and Nosema sp. YNPr than between the two honeybee parasites N. ceranae and N. apis. The more closely related N. ceranae and Nosema sp. YNPr show similarities in a number of structural characteristics such as gene synteny, gene length, gene number, transposon composition and gene reduction. Based on transposable element content of the assemblies, the transposon content of Nosema sp. YNPr is 4.8%, that of N. ceranae is 3.7%, and that of N. apis is 2.5%, with large differences in the types of transposons present among these 3 species. Gene function annotation indicates that the number of genes participating in most metabolic activities is similar in all three species. However, the number of genes in the transcription, general function, and cysteine protease categories is greater in N. apis than in the other two species. Our studies further characterize the evolution of the Nosema/Vairimorpha clade of microsporidia. These organisms maintain variable but very reduced genomes. We are interested in understanding the effects of genetic drift versus natural selection on genome size in the microsporidia and in developing a testable hypothesis for further studies on the genomic ecology

  7. Comparative analysis and visualization of multiple collinear genomes

    PubMed Central

    2012-01-01

    Background Genome browsers are a common tool used by biologists to visualize genomic features including genes, polymorphisms, and many others. However, existing genome browsers and visualization tools are not well-suited to perform meaningful comparative analysis among a large number of genomes. With the increasing quantity and availability of genomic data, there is an increased burden to provide useful visualization and analysis tools for comparison of multiple collinear genomes such as the large panels of model organisms which are the basis for much of the current genetic research. Results We have developed a novel web-based tool for visualizing and analyzing multiple collinear genomes. Our tool illustrates genome-sequence similarity through a mosaic of intervals representing local phylogeny, subspecific origin, and haplotype identity. Comparative analysis is facilitated through reordering and clustering of tracks, which can vary throughout the genome. In addition, we provide local phylogenetic trees as an alternate visualization to assess local variations. Conclusions Unlike previous genome browsers and viewers, ours allows for simultaneous and comparative analysis. Our browser provides intuitive selection and interactive navigation about features of interest. Dynamic visualizations adjust to scale and data content making analysis at variable resolutions and of multiple data sets more informative. We demonstrate our genome browser for an extensive set of genomic data sets composed of almost 200 distinct mouse laboratory strains. PMID:22536897

  8. Comparative genomics and genome biology of invasive Campylobacter jejuni.

    PubMed

    Skarp, C P A; Akinrinade, O; Nilsson, A J E; Ellström, P; Myllykangas, S; Rautelin, H

    2015-11-25

    Campylobacter jejuni is a major pathogen in bacterial gastroenteritis worldwide and can cause bacteremia in severe cases. C. jejuni is highly structured into clonal lineages of which the ST677CC lineage has been overrepresented among C. jejuni isolates derived from blood. In this study, we characterized the genomes of 31 C. jejuni blood isolates and 24 faecal isolates belonging to ST677CC in order to study the genome biology related to C. jejuni invasiveness. We combined the genome analyses with phenotypical evidence on serum resistance which was associated with phase variation of wcbK; a GDP-mannose 4,6-dehydratase involved in capsular biosynthesis. We also describe the finding of a Type III restriction-modification system unique to the ST-794 sublineage. However, features previously considered to be related to pathogenesis of C. jejuni were either absent or disrupted among our strains. Our results refine the role of capsule features associated with invasive disease and accentuate the possibility of methylation and restriction enzymes in the potential of C. jejuni to establish invasive infections. Our findings underline the importance of studying clinically relevant well-characterized bacterial strains in order to understand pathogenesis mechanisms important in human infections.

  9. Comparative Genomics of Cluster O Mycobacteriophages

    PubMed Central

    Cresawn, Steven G.; Pope, Welkin H.; Jacobs-Sera, Deborah; Bowman, Charles A.; Russell, Daniel A.; Dedrick, Rebekah M.; Adair, Tamarah; Anders, Kirk R.; Ball, Sarah; Bollivar, David; Breitenberger, Caroline; Burnett, Sandra H.; Butela, Kristen; Byrnes, Deanna; Carzo, Sarah; Cornely, Kathleen A.; Cross, Trevor; Daniels, Richard L.; Dunbar, David; Findley, Ann M.; Gissendanner, Chris R.; Golebiewska, Urszula P.; Hartzog, Grant A.; Hatherill, J. Robert; Hughes, Lee E.; Jalloh, Chernoh S.; De Los Santos, Carla; Ekanem, Kevin; Khambule, Sphindile L.; King, Rodney A.; King-Smith, Christina; Klyczek, Karen; Krukonis, Greg P.; Laing, Christian; Lapin, Jonathan S.; Lopez, A. Javier; Mkhwanazi, Sipho M.; Molloy, Sally D.; Moran, Deborah; Munsamy, Vanisha; Pacey, Eddie; Plymale, Ruth; Poxleitner, Marianne; Reyna, Nathan; Schildbach, Joel F.; Stukey, Joseph; Taylor, Sarah E.; Ware, Vassie C.; Wellmann, Amanda L.; Westholm, Daniel; Wodarski, Donna; Zajko, Michelle; Zikalala, Thabiso S.; Hendrix, Roger W.; Hatfull, Graham F.

    2015-01-01

    Mycobacteriophages – viruses of mycobacterial hosts – are genetically diverse but morphologically are all classified in the Caudovirales with double-stranded DNA and tails. We describe here a group of five closely related mycobacteriophages – Corndog, Catdawg, Dylan, Firecracker, and YungJamal – designated as Cluster O with long flexible tails but with unusual prolate capsids. Proteomic analysis of phage Corndog particles, Catdawg particles, and Corndog-infected cells confirms expression of half of the predicted gene products and indicates a non-canonical mechanism for translation of the Corndog tape measure protein. Bioinformatic analysis identifies 8–9 strongly predicted SigA promoters and all five Cluster O genomes contain more than 30 copies of a 17 bp repeat sequence with dyad symmetry located throughout the genomes. Comparison of the Cluster O phages provides insights into phage genome evolution including the processes of gene flux by horizontal genetic exchange. PMID:25742016

  10. Comparative genomics of Mortierella elongata and its bacterial endosymbiont Mycoavidus cysteinexigens: Comparative genomics of Mortierella elongata

    DOE PAGES

    Uehling, J.; Gryganskyi, A.; Hameed, K.; ...

    2017-01-01

    Endosymbiosis of bacteria by eukaryotes is a defining feature of cellular evolution. In addition to well-known bacterial origins for mitochondria and chloroplasts, multiple origins of bacterial endosymbiosis are known within the cells of diverse animals, plants and fungi. Early-diverging lineages of terrestrial fungi harbor endosymbiotic bacteria belonging to the Burkholderiaceae. Furthermore, we sequenced the metagenome of the soil-inhabiting fungus Mortierella elongata and assembled the complete circular chromosome of its endosymbiont, Mycoavidus cysteinexigens, which we place within a lineage of endofungal symbionts that are sister clade to Burkholderia. The genome of M. elongata strain AG77 features a core set of primarymore » metabolic pathways for degradation of simple carbohydrates and lipid biosynthesis, while the M. cysteinexigens (AG77) genome is reduced in size and function. Experiments using antibiotics to cure the endobacterium from the host demonstrate that the fungal host metabolism is highly modulated by presence/ absence of M. cysteinexigens. In independent comparative phylogenomic analyses of fungal and bacterial genomes we find that they are consistent with an ancient origin for M. elongata M. cysteinexigens symbiosis, most likely over 350 million years ago and concomitant with the terrestrialization of Earth and diversification of land fungi and plants.« less

  11. The mitochondrial genome of the ascalaphid owlfly Libelloides macaronius and comparative evolutionary mitochondriomics of neuropterid insects.

    PubMed

    Negrisolo, Enrico; Babbucci, Massimiliano; Patarnello, Tomaso

    2011-05-10

    The insect order Neuroptera encompasses more than 5,700 described species. To date, only three neuropteran mitochondrial genomes have been fully and one partly sequenced. Current knowledge on neuropteran mitochondrial genomes is limited, and new data are strongly required. In the present work, the mitochondrial genome of the ascalaphid owlfly Libelloides macaronius is described and compared with the known neuropterid mitochondrial genomes: Megaloptera, Neuroptera and Raphidioptera. These analyses are further extended to other endopterygotan orders. The mitochondrial genome of L. macaronius is a circular molecule 15,890 bp long. It includes the entire set of 37 genes usually present in animal mitochondrial genomes. The gene order of this newly sequenced genome is unique among Neuroptera and differs from the ancestral type of insects in the translocation of trnC. The L. macaronius genome shows the lowest A+T content (74.50%) among known neuropterid genomes. Protein-coding genes possess the typical mitochondrial start codons, except for cox1, which has an unusual ACG. Comparisons among endopterygotan mitochondrial genomes showed that A+T content and AT/GC-skews exhibit a broad range of variation among 84 analyzed taxa. Comparative analyses showed that neuropterid mitochondrial protein-coding genes experienced complex evolutionary histories, involving features ranging from codon usage to rate of substitution, that make them potential markers for population genetics/phylogenetics studies at different taxonomic ranks. The 22 tRNAs show variable substitution patterns in Neuropterida, with higher sequence conservation in genes located on the α strand. Inferred secondary structures for neuropterid rrnS and rrnL genes largely agree with those known for other insects. For the first time, a model is provided for domain I of an insect rrnL. The control region in Neuropterida, as in other insects, is fast-evolving genomic region, characterized by AT-rich motifs. The new genome

  12. Initial sequencing and comparative analysis of the mouse genome

    SciTech Connect

    Waterston, Robert H.; Lindblad-Toh, Kerstin; Birney, Ewan; Rogers, Jane; Abril, Josep F.; Agarwal, Pankaj; Agarwala, Richa; Ainscough, Rachel; Alexandersson, Marina; An, Peter; Antonarakis, Stylianos E.; Attwood, John; Baertsch, Robert; Bailey, Jonathon; Barlow, Karen; Beck, Stephan; Berry, Eric; Birren, Bruce; Bloom, Toby; Bork, Peer; Botcherby, Marc; Bray, Nicolas; Brent, Michael R.; Brown, Daniel G.; Brown, Stephen D.; Bult, Carol; Burton, John; Butler, Jonathan; Campbell, Robert D.; Carninci, Piero; Cawley, Simon; Chiaromonte, Francesca; Chinwalla, Asif T.; Church, Deanna M.; Clamp, Michele; Clee, Christopher; Collins, Francis S.; Cook, Lisa L.; Copley, Richard R.; Coulson, Alan; Couronne, Olivier; Cuff, James; Curwen, Val; Cutts, Tim; Daly, Mark; David, Robert; Davies, Joy; Delehaunty, Kimberly D.; Deri, Justin; Dermitzakis, Emmanouil T.; Dewey, Colin; Dickens, Nicholas J.; Diekhans, Mark; Dodge, Sheila; Dubchak, Inna; Dunn, Diane M.; Eddy, Sean R.; Elnitski, Laura; Emes, Richard D.; Eswara, Pallavi; Eyras, Eduardo; Felsenfeld, Adam; Fewell, Ginger A.; Flicek, Paul; Foley, Karen; Frankel, Wayne N.; Fulton, Lucinda A.; Fulton, Robert S.; Furey, Terrence S.; Gage, Diane; Gibbs, Richard A.; Glusman, Gustavo; Gnerre, Sante; Goldman, Nick; Goodstadt, Leo; Grafham, Darren; Graves, Tina A.; Green, Eric D.; Gregory, Simon; Guigo, Roderic; Guyer, Mark; Hardison, Ross C.; Haussler, David; Hayashizaki, Yoshihide; Hillier, LaDeana W.; Hinrichs, Angela; Hlavina, Wratko; Holzer, Timothy; Hsu, Fan; Hua, Axin; Hubbard, Tim; Hunt, Adrienne; Jackson, Ian; Jaffe, David B.; Johnson, L. Steven; Jones, Matthew; Jones, Thomas A.; Joy, Ann; Kamal, Michael; Karlsson, Elinor K.; Karolchik, Donna; Kasprzyk, Arkadiusz; Kawai, Jun; Keibler, Evan; Kells, Cristyn; Kent, W. James; Kirby, Andrew; Kolbe, Diana L.; Korf, Ian; Kucherlapati, Raju S.; Kulbokas III, Edward J.; Kulp, David; Landers, Tom; Leger, J.P.; Leonard, Steven; Letunic, Ivica; Levine, Rosie; et al.

    2002-12-15

    The sequence of the mouse genome is a key informational tool for understanding the contents of the human genome and a key experimental tool for biomedical research. Here, we report the results of an international collaboration to produce a high-quality draft sequence of the mouse genome. We also present an initial comparative analysis of the mouse and human genomes, describing some of the insights that can be gleaned from the two sequences. We discuss topics including the analysis of the evolutionary forces shaping the size, structure and sequence of the genomes; the conservation of large-scale synteny across most of the genomes; the much lower extent of sequence orthology covering less than half of the genomes; the proportions of the genomes under selection; the number of protein-coding genes; the expansion of gene families related to reproduction and immunity; the evolution of proteins; and the identification of intraspecies polymorphism.

  13. Initial sequencing and comparative analysis of the mouse genome.

    PubMed

    Waterston, Robert H; Lindblad-Toh, Kerstin; Birney, Ewan; Rogers, Jane; Abril, Josep F; Agarwal, Pankaj; Agarwala, Richa; Ainscough, Rachel; Alexandersson, Marina; An, Peter; Antonarakis, Stylianos E; Attwood, John; Baertsch, Robert; Bailey, Jonathon; Barlow, Karen; Beck, Stephan; Berry, Eric; Birren, Bruce; Bloom, Toby; Bork, Peer; Botcherby, Marc; Bray, Nicolas; Brent, Michael R; Brown, Daniel G; Brown, Stephen D; Bult, Carol; Burton, John; Butler, Jonathan; Campbell, Robert D; Carninci, Piero; Cawley, Simon; Chiaromonte, Francesca; Chinwalla, Asif T; Church, Deanna M; Clamp, Michele; Clee, Christopher; Collins, Francis S; Cook, Lisa L; Copley, Richard R; Coulson, Alan; Couronne, Olivier; Cuff, James; Curwen, Val; Cutts, Tim; Daly, Mark; David, Robert; Davies, Joy; Delehaunty, Kimberly D; Deri, Justin; Dermitzakis, Emmanouil T; Dewey, Colin; Dickens, Nicholas J; Diekhans, Mark; Dodge, Sheila; Dubchak, Inna; Dunn, Diane M; Eddy, Sean R; Elnitski, Laura; Emes, Richard D; Eswara, Pallavi; Eyras, Eduardo; Felsenfeld, Adam; Fewell, Ginger A; Flicek, Paul; Foley, Karen; Frankel, Wayne N; Fulton, Lucinda A; Fulton, Robert S; Furey, Terrence S; Gage, Diane; Gibbs, Richard A; Glusman, Gustavo; Gnerre, Sante; Goldman, Nick; Goodstadt, Leo; Grafham, Darren; Graves, Tina A; Green, Eric D; Gregory, Simon; Guigó, Roderic; Guyer, Mark; Hardison, Ross C; Haussler, David; Hayashizaki, Yoshihide; Hillier, LaDeana W; Hinrichs, Angela; Hlavina, Wratko; Holzer, Timothy; Hsu, Fan; Hua, Axin; Hubbard, Tim; Hunt, Adrienne; Jackson, Ian; Jaffe, David B; Johnson, L Steven; Jones, Matthew; Jones, Thomas A; Joy, Ann; Kamal, Michael; Karlsson, Elinor K; Karolchik, Donna; Kasprzyk, Arkadiusz; Kawai, Jun; Keibler, Evan; Kells, Cristyn; Kent, W James; Kirby, Andrew; Kolbe, Diana L; Korf, Ian; Kucherlapati, Raju S; Kulbokas, Edward J; Kulp, David; Landers, Tom; Leger, J P; Leonard, Steven; Letunic, Ivica; Levine, Rosie; Li, Jia; Li, Ming; Lloyd, Christine; Lucas, Susan; Ma, Bin; Maglott, Donna R; Mardis, Elaine R; Matthews, Lucy; Mauceli, Evan; Mayer, John H; McCarthy, Megan; McCombie, W Richard; McLaren, Stuart; McLay, Kirsten; McPherson, John D; Meldrim, Jim; Meredith, Beverley; Mesirov, Jill P; Miller, Webb; Miner, Tracie L; Mongin, Emmanuel; Montgomery, Kate T; Morgan, Michael; Mott, Richard; Mullikin, James C; Muzny, Donna M; Nash, William E; Nelson, Joanne O; Nhan, Michael N; Nicol, Robert; Ning, Zemin; Nusbaum, Chad; O'Connor, Michael J; Okazaki, Yasushi; Oliver, Karen; Overton-Larty, Emma; Pachter, Lior; Parra, Genís; Pepin, Kymberlie H; Peterson, Jane; Pevzner, Pavel; Plumb, Robert; Pohl, Craig S; Poliakov, Alex; Ponce, Tracy C; Ponting, Chris P; Potter, Simon; Quail, Michael; Reymond, Alexandre; Roe, Bruce A; Roskin, Krishna M; Rubin, Edward M; Rust, Alistair G; Santos, Ralph; Sapojnikov, Victor; Schultz, Brian; Schultz, Jörg; Schwartz, Matthias S; Schwartz, Scott; Scott, Carol; Seaman, Steven; Searle, Steve; Sharpe, Ted; Sheridan, Andrew; Shownkeen, Ratna; Sims, Sarah; Singer, Jonathan B; Slater, Guy; Smit, Arian; Smith, Douglas R; Spencer, Brian; Stabenau, Arne; Stange-Thomann, Nicole; Sugnet, Charles; Suyama, Mikita; Tesler, Glenn; Thompson, Johanna; Torrents, David; Trevaskis, Evanne; Tromp, John; Ucla, Catherine; Ureta-Vidal, Abel; Vinson, Jade P; Von Niederhausern, Andrew C; Wade, Claire M; Wall, Melanie; Weber, Ryan J; Weiss, Robert B; Wendl, Michael C; West, Anthony P; Wetterstrand, Kris; Wheeler, Raymond; Whelan, Simon; Wierzbowski, Jamey; Willey, David; Williams, Sophie; Wilson, Richard K; Winter, Eitan; Worley, Kim C; Wyman, Dudley; Yang, Shan; Yang, Shiaw-Pyng; Zdobnov, Evgeny M; Zody, Michael C; Lander, Eric S

    2002-12-05

    The sequence of the mouse genome is a key informational tool for understanding the contents of the human genome and a key experimental tool for biomedical research. Here, we report the results of an international collaboration to produce a high-quality draft sequence of the mouse genome. We also present an initial comparative analysis of the mouse and human genomes, describing some of the insights that can be gleaned from the two sequences. We discuss topics including the analysis of the evolutionary forces shaping the size, structure and sequence of the genomes; the conservation of large-scale synteny across most of the genomes; the much lower extent of sequence orthology covering less than half of the genomes; the proportions of the genomes under selection; the number of protein-coding genes; the expansion of gene families related to reproduction and immunity; the evolution of proteins; and the identification of intraspecies polymorphism.

  14. Genome-wide distribution comparative and composition analysis of the SSRs in Poaceae.

    PubMed

    Wang, Yi; Yang, Chao; Jin, Qiaojun; Zhou, Dongjie; Wang, Shuangshuang; Yu, Yuanjie; Yang, Long

    2015-02-15

    The Poaceae family is of great importance to human beings since it comprises the cereal grasses which are the main sources for human food and animal feed. With the rapid growth of genomic data from Poaceae members, comparative genomics becomes a convinent method to study genetics of diffierent species. The SSRs (Simple Sequence Repeats) are widely used markers in the studies of Poaceae for their high abundance and stability. In this study, using the genomic sequences of 9 Poaceae species, we detected 11,993,943 SSR loci and developed 6,799,910 SSR primer pairs. The results show that SSRs are distributed on all the genomic elements in grass. Hexamer is the most frequent motif and AT/TA is the most frequent motif in dimer. The abundance of the SSRs has a positive linear relationship with the recombination rate. SSR sequences in the coding regions involve a higher GC content in the Poaceae than that in the other species. SSRs of 70-80 bp in length showed the highest AT/GC base ratio among all of these loci. The result shows the highest polymorphism rate belongs to the SSRs ranged from 30 bp to 40 bp. Using all the SSR primers of Japonica, nineteen universal primers were selected and located on the genome of the grass family. The information of SSR loci, the SSR primers and the tools of mining and analyzing SSR are provided in the PSSRD (Poaceae SSR Database, http://biodb.sdau.edu.cn/pssrd/). Our study and the PSSRD database provide a foundation for the comparative study in the Poaceae and it will accelerate the study on markers application, gene mapping and molecular breeding.

  15. A Model for Carbohydrate Metabolism in the Diatom Phaeodactylum tricornutum Deduced from Comparative Whole Genome Analysis

    PubMed Central

    Kaplan, Aaron; Caron, Lise; Weber, Till; Maheswari, Uma; Armbrust, E. Virginia; Bowler, Chris

    2008-01-01

    Background Diatoms are unicellular algae responsible for approximately 20% of global carbon fixation. Their evolution by secondary endocytobiosis resulted in a complex cellular structure and metabolism compared to algae with primary plastids. Methodology/Principal Findings The whole genome sequence of the diatom Phaeodactylum tricornutum has recently been completed. We identified and annotated genes for enzymes involved in carbohydrate pathways based on extensive EST support and comparison to the whole genome sequence of a second diatom, Thalassiosira pseudonana. Protein localization to mitochondria was predicted based on identified similarities to mitochondrial localization motifs in other eukaryotes, whereas protein localization to plastids was based on the presence of signal peptide motifs in combination with plastid localization motifs previously shown to be required in diatoms. We identified genes potentially involved in a C4-like photosynthesis in P. tricornutum and, on the basis of sequence-based putative localization of relevant proteins, discuss possible differences in carbon concentrating mechanisms and CO2 fixation between the two diatoms. We also identified genes encoding enzymes involved in photorespiration with one interesting exception: glycerate kinase was not found in either P. tricornutum or T. pseudonana. Various Calvin cycle enzymes were found in up to five different isoforms, distributed between plastids, mitochondria and the cytosol. Diatoms store energy either as lipids or as chrysolaminaran (a β-1,3-glucan) outside of the plastids. We identified various β-glucanases and large membrane-bound glucan synthases. Interestingly most of the glucanases appear to contain C-terminal anchor domains that may attach the enzymes to membranes. Conclusions/Significance Here we present a detailed synthesis of carbohydrate metabolism in diatoms based on the genome sequences of Thalassiosira pseudonana and Phaeodactylum tricornutum. This model provides novel

  16. Structural complexity of Dengue virus untranslated regions: cis-acting RNA motifs and pseudoknot interactions modulating functionality of the viral genome

    PubMed Central

    Sztuba-Solinska, Joanna; Teramoto, Tadahisa; Rausch, Jason W.; Shapiro, Bruce A.; Padmanabhan, Radhakrishnan; Le Grice, Stuart F. J.

    2013-01-01

    The Dengue virus (DENV) genome contains multiple cis-acting elements required for translation and replication. Previous studies indicated that a 719-nt subgenomic minigenome (DENV-MINI) is an efficient template for translation and (−) strand RNA synthesis in vitro. We performed a detailed structural analysis of DENV-MINI RNA, combining chemical acylation techniques, Pb2+ ion-induced hydrolysis and site-directed mutagenesis. Our results highlight protein-independent 5′–3′ terminal interactions involving hybridization between recognized cis-acting motifs. Probing analyses identified tandem dumbbell structures (DBs) within the 3′ terminus spaced by single-stranded regions, internal loops and hairpins with embedded GNRA-like motifs. Analysis of conserved motifs and top loops (TLs) of these dumbbells, and their proposed interactions with downstream pseudoknot (PK) regions, predicted an H-type pseudoknot involving TL1 of the 5′ DB and the complementary region, PK2. As disrupting the TL1/PK2 interaction, via ‘flipping’ mutations of PK2, previously attenuated DENV replication, this pseudoknot may participate in regulation of RNA synthesis. Computer modeling implied that this motif might function as autonomous structural/regulatory element. In addition, our studies targeting elements of the 3′ DB and its complementary region PK1 indicated that communication between 5′–3′ terminal regions strongly depends on structure and sequence composition of the 5′ cyclization region. PMID:23531545

  17. Zinc-finger transcription factors are associated with guanine quadruplex motifs in human, chimpanzee, mouse and rat promoters genome-wide.

    PubMed

    Kumar, Pankaj; Yadav, Vinod Kumar; Baral, Aradhita; Kumar, Parveen; Saha, Dhurjhoti; Chowdhury, Shantanu

    2011-10-01

    Function of non-B DNA structures are poorly understood though several bioinformatics studies predict role of the G-quadruplex DNA structure in transcription. Earlier, using transcriptome profiling we found evidence of widespread G-quadruplex-mediated gene regulation. Herein, we asked whether potential G-quadruplex (PG4) motifs associate with transcription factors (TF). This was analyzed using 220 position weight matrices [designated as transcription factor binding sites (TFBS)], representing 187 unique TF, in >75,000 genes in human, chimpanzee, mouse and rat. Results show binding sites of nine TFs, including that of AP-2, SP1, MAZ and VDR, occurred significantly within 100 bases of the PG4 motif (P < 1.24E-10). PG4-TFBS combinations were conserved in 'orthologously' related promoters across all four organisms and were associated with >850 genes in each genome. Remarkably, seven of the nine TFs were zinc-finger binding proteins indicating a novel characteristic of PG4 motifs. To test these findings, transcriptome profiles from human cell lines treated with G-quadruplex-specific molecules were used; 66 genes were significantly differentially expressed across both cell-types, which also harbored conserved PG4 motifs along with one/more of the nine TFBS. In addition, genes regulated by PG4-TFBS combinations were found to be co-regulated in human tissues, further emphasizing the regulatory significance of the associations.

  18. Detection of chromosomal abnormalities by comparative genomic hybridization.

    PubMed

    Lapierre, Jean-Michel; Tachdjian, Gérard

    2005-04-01

    Comparative genomic hybridization (CGH) is a modified in-situ hybridization technique. In this type of analysis, two differentially labeled genomic DNAs (study and reference) are cohybridized to normal metaphase spreads or to microarray. Chromosomal locations of copy number changes in the DNA segments of the study genome are revealed by a variable fluorescence intensity ratio along each target chromosome. Thus, CGH allows detection and mapping of DNA sequence copy differences between two genomes in a single experiment. Since its development, comparative genomic hybridization has been applied mostly as a research tool in the field of cancer cytogenetics to identify genetic changes in many previously unknown regions. It is also a powerful tool for detection and identification of unbalanced chromosomal abnormalities in prenatal, postnatal and preimplantation diagnostics. The development of comparative genomic hybridization and increase in resolution analysis by using the microarray-based technique offer new information on chromosomal pathologies and thus better management of patients.

  19. Analysis of the allohexaploid bread wheat genome (Triticum aestivum) using comparative whole genome shotgun sequencing

    USDA-ARS?s Scientific Manuscript database

    The large 17 Gb allopolyploid genome of bread wheat is a major challenge for genome analysis because it is composed of three closely- related and independently maintained genomes, with genes dispersed as small “islands” separated by vast tracts of repetitive DNA. We used a novel comparative genomi...

  20. Human-mouse comparative genomics: successes and failures to reveal functional regions of the human genome

    SciTech Connect

    Pennacchio, Len A.; Baroukh, Nadine; Rubin, Edward M.

    2003-05-15

    Deciphering the genetic code embedded within the human genome remains a significant challenge despite the human genome consortium's recent success at defining its linear sequence (Lander et al. 2001; Venter et al. 2001). While useful strategies exist to identify a large percentage of protein encoding regions, efforts to accurately define functional sequences in the remaining {approx}97 percent of the genome lag. Our primary interest has been to utilize the evolutionary relationship and the universal nature of genomic sequence information in vertebrates to reveal functional elements in the human genome. This has been achieved through the combined use of vertebrate comparative genomics to pinpoint highly conserved sequences as candidates for biological activity and transgenic mouse studies to address the functionality of defined human DNA fragments. Accordingly, we describe strategies and insights into functional sequences in the human genome through the use of comparative genomics coupled wit h functional studies in the mouse.

  1. AcCNET (Accessory Genome Constellation Network): comparative genomics software for accessory genome analysis using bipartite networks.

    PubMed

    Lanza, Val F; Baquero, Fernando; de la Cruz, Fernando; Coque, Teresa M

    2017-01-15

    AcCNET (Accessory genome Constellation Network) is a Perl application that aims to compare accessory genomes of a large number of genomic units, both at qualitative and quantitative levels. Using the proteomes extracted from the analysed genomes, AcCNET creates a bipartite network compatible with standard network analysis platforms. AcCNET allows merging phylogenetic and functional information about the concerned genomes, thus improving the capability of current methods of network analysis. The AcCNET bipartite network opens a new perspective to explore the pangenome of bacterial species, focusing on the accessory genome behind the idiosyncrasy of a particular strain and/or population.

  2. Comparative genomics of the lactic acid bacteria

    SciTech Connect

    Makarova, K.; Slesarev, A.; Wolf, Y.; Sorokin, A.; Mirkin, B.; Koonin, E.; Pavlov, A.; Pavlova, N.; Karamychev, V.; Polouchine, N.; Shakhova, V.; Grigoriev, I.; Lou, Y.; Rokhsar, D.; Lucas, S.; Huang, K.; Goodstein, D. M.; Hawkins, T.; Plengvidhya, V.; Welker, D.; Hughes, J.; Goh, Y.; Benson, A.; Baldwin, K.; Lee, J. -H.; Diaz-Muniz, I.; Dosti, B.; Smeianov, V; Wechter, W.; Barabote, R.; Lorca, G.; Altermann, E.; Barrangou, R.; Ganesan, B.; Xie, Y.; Rawsthorne, H.; Tamir, D.; Parker, C.; Breidt, F.; Broadbent, J.; Hutkins, R.; O'Sullivan, D.; Steele, J.; Unlu, G.; Saier, M.; Klaenhammer, T.; Richardson, P.; Kozyavkin, S.; Weimer, B.; Mills, D.

    2006-06-01

    Lactic acid-producing bacteria are associated with various plant and animal niches and play a key role in the production of fermented foods and beverages. We report nine genome sequences representing the phylogenetic and functional diversity of these bacteria. The small genomes of lactic acid bacteria encode a broad repertoire of transporters for efficient carbon and nitrogen acquisition from the nutritionally rich environments they inhabit and reflect a limited range of biosynthetic capabilities that indicate both prototrophic and auxotrophic strains. Phylogenetic analyses, comparison of gene content across the group, and reconstruction of ancestral gene sets indicate a combination of extensive gene loss and key gene acquisitions via horizontal gene transfer during the coevolution of lactic acid bacteria with their habitats.

  3. Comparative genomics of trypanosomatid parasitic protozoa.

    PubMed

    El-Sayed, Najib M; Myler, Peter J; Blandin, Gaëlle; Berriman, Matthew; Crabtree, Jonathan; Aggarwal, Gautam; Caler, Elisabet; Renauld, Hubert; Worthey, Elizabeth A; Hertz-Fowler, Christiane; Ghedin, Elodie; Peacock, Christopher; Bartholomeu, Daniella C; Haas, Brian J; Tran, Anh-Nhi; Wortman, Jennifer R; Alsmark, U Cecilia M; Angiuoli, Samuel; Anupama, Atashi; Badger, Jonathan; Bringaud, Frederic; Cadag, Eithon; Carlton, Jane M; Cerqueira, Gustavo C; Creasy, Todd; Delcher, Arthur L; Djikeng, Appolinaire; Embley, T Martin; Hauser, Christopher; Ivens, Alasdair C; Kummerfeld, Sarah K; Pereira-Leal, Jose B; Nilsson, Daniel; Peterson, Jeremy; Salzberg, Steven L; Shallom, Joshua; Silva, Joana C; Sundaram, Jaideep; Westenberger, Scott; White, Owen; Melville, Sara E; Donelson, John E; Andersson, Björn; Stuart, Kenneth D; Hall, Neil

    2005-07-15

    A comparison of gene content and genome architecture of Trypanosoma brucei, Trypanosoma cruzi, and Leishmania major, three related pathogens with different life cycles and disease pathology, revealed a conserved core proteome of about 6200 genes in large syntenic polycistronic gene clusters. Many species-specific genes, especially large surface antigen families, occur at nonsyntenic chromosome-internal and subtelomeric regions. Retroelements, structural RNAs, and gene family expansion are often associated with syntenic discontinuities that-along with gene divergence, acquisition and loss, and rearrangement within the syntenic regions-have shaped the genomes of each parasite. Contrary to recent reports, our analyses reveal no evidence that these species are descended from an ancestor that contained a photosynthetic endosymbiont.

  4. MicrobesOnline: an integrated portal for comparative and functional genomics

    SciTech Connect

    Dehal, Paramvir; Joachimiak, Marcin; Price, Morgan; Bates, John; Baumohl, Jason; Chivian, Dylan; Friedland, Greg; Huang, Kathleen; Keller, Keith; Novichkov, Pavel; Dubchak, Inna; Alm, Eric; Arkin, Adam

    2011-07-14

    Since 2003, MicrobesOnline (http://www.microbesonline.org) has been providing a community resource for comparative and functional genome analysis. The portal includes over 1000 complete genomes of bacteria, archaea and fungi and thousands of expression microarrays from diverse organisms ranging from model organisms such as Escherichia coli and Saccharomyces cerevisiae to environmental microbes such as Desulfovibrio vulgaris and Shewanella oneidensis. To assist in annotating genes and in reconstructing their evolutionary history, MicrobesOnline includes a comparative genome browser based on phylogenetic trees for every gene family as well as a species tree. To identify co-regulated genes, MicrobesOnline can search for genes based on their expression profile, and provides tools for identifying regulatory motifs and seeing if they are conserved. MicrobesOnline also includes fast phylogenetic profile searches, comparative views of metabolic pathways, operon predictions, a workbench for sequence analysis and integration with RegTransBase and other microbial genome resources. The next update of MicrobesOnline will contain significant new functionality, including comparative analysis of metagenomic sequence data. Programmatic access to the database, along with source code and documentation, is available at http://microbesonline.org/programmers.html.

  5. MicrobesOnline: an integrated portal for comparative and functional genomics

    SciTech Connect

    Dehal, Paramvir S.; Joachimiak, Marcin P.; Price, Morgan N.; Bates, John T.; Baumohl, Jason K.; Chivian, Dylan; Friedland, Greg D.; Huang, Katherine H.; Keller, Keith; Novichkov, Pavel S.; Dubchak, Inna L.; Alm, Eric J.; Arkin, Adam P.

    2009-09-17

    Since 2003, MicrobesOnline (http://www.microbesonline.org) has been providing a community resource for comparative and functional genome analysis. The portal includes over 1000 complete genomes of bacteria, archaea and fungi and thousands of expression microarrays from diverse organisms ranging from model organisms such as Escherichia coli and Saccharomyces cerevisiae to environmental microbes such as Desulfovibrio vulgaris and Shewanella oneidensis. To assist in annotating genes and in reconstructing their evolutionary history, MicrobesOnline includes a comparative genome browser based on phylogenetic trees for every gene family as well as a species tree. To identify co-regulated genes, MicrobesOnline can search for genes based on their expression profile, and provides tools for identifying regulatory motifs and seeing if they are conserved. MicrobesOnline also includes fast phylogenetic profile searches, comparative views of metabolic pathways, operon predictions, a workbench for sequence analysis and integration with RegTransBase and other microbial genome resources. The next update of MicrobesOnline will contain significant new functionality, including comparative analysis of metagenomic sequence data. Programmatic access to the database, along with source code and documentation, is available at http://microbesonline.org/programmers.html.

  6. Genome-wide prediction and functional validation of promoter motifs regulating gene expression in spore and infection stages of Phytophthora infestans.

    PubMed

    Roy, Sourav; Kagda, Meenakshi; Judelson, Howard S

    2013-03-01

    Most eukaryotic pathogens have complex life cycles in which gene expression networks orchestrate the formation of cells specialized for dissemination or host colonization. In the oomycete Phytophthora infestans, the potato late blight pathogen, major shifts in mRNA profiles during developmental transitions were identified using microarrays. We used those data with search algorithms to discover about 100 motifs that are over-represented in promoters of genes up-regulated in hyphae, sporangia, sporangia undergoing zoosporogenesis, swimming zoospores, or germinated cysts forming appressoria (infection structures). Most of the putative stage-specific transcription factor binding sites (TFBSs) thus identified had features typical of TFBSs such as position or orientation bias, palindromy, and conservation in related species. Each of six motifs tested in P. infestans transformants using the GUS reporter gene conferred the expected stage-specific expression pattern, and several were shown to bind nuclear proteins in gel-shift assays. Motifs linked to the appressoria-forming stage, including a functionally validated TFBS, were over-represented in promoters of genes encoding effectors and other pathogenesis-related proteins. To understand how promoter and genome architecture influence expression, we also mapped transcription patterns to the P. infestans genome assembly. Adjacent genes were not typically induced in the same stage, including genes transcribed in opposite directions from small intergenic regions, but co-regulated gene pairs occurred more than expected by random chance. These data help illuminate the processes regulating development and pathogenesis, and will enable future attempts to purify the cognate transcription factors.

  7. GenColors-based comparative genome databases for small eukaryotic genomes.

    PubMed

    Felder, Marius; Romualdi, Alessandro; Petzold, Andreas; Platzer, Matthias; Sühnel, Jürgen; Glöckner, Gernot

    2013-01-01

    Many sequence data repositories can give a quick and easily accessible overview on genomes and their annotations. Less widespread is the possibility to compare related genomes with each other in a common database environment. We have previously described the GenColors database system (http://gencolors.fli-leibniz.de) and its applications to a number of bacterial genomes such as Borrelia, Legionella, Leptospira and Treponema. This system has an emphasis on genome comparison. It combines data from related genomes and provides the user with an extensive set of visualization and analysis tools. Eukaryote genomes are normally larger than prokaryote genomes and thus pose additional challenges for such a system. We have, therefore, adapted GenColors to also handle larger datasets of small eukaryotic genomes and to display eukaryotic gene structures. Further recent developments include whole genome views, genome list options and, for bacterial genome browsers, the display of horizontal gene transfer predictions. Two new GenColors-based databases for two fungal species (http://fgb.fli-leibniz.de) and for four social amoebas (http://sacgb.fli-leibniz.de) were set up. Both new resources open up a single entry point for related genomes for the amoebozoa and fungal research communities and other interested users. Comparative genomics approaches are greatly facilitated by these resources.

  8. GenColors-based comparative genome databases for small eukaryotic genomes

    PubMed Central

    Felder, Marius; Romualdi, Alessandro; Petzold, Andreas; Platzer, Matthias; Sühnel, Jürgen; Glöckner, Gernot

    2013-01-01

    Many sequence data repositories can give a quick and easily accessible overview on genomes and their annotations. Less widespread is the possibility to compare related genomes with each other in a common database environment. We have previously described the GenColors database system (http://gencolors.fli-leibniz.de) and its applications to a number of bacterial genomes such as Borrelia, Legionella, Leptospira and Treponema. This system has an emphasis on genome comparison. It combines data from related genomes and provides the user with an extensive set of visualization and analysis tools. Eukaryote genomes are normally larger than prokaryote genomes and thus pose additional challenges for such a system. We have, therefore, adapted GenColors to also handle larger datasets of small eukaryotic genomes and to display eukaryotic gene structures. Further recent developments include whole genome views, genome list options and, for bacterial genome browsers, the display of horizontal gene transfer predictions. Two new GenColors-based databases for two fungal species (http://fgb.fli-leibniz.de) and for four social amoebas (http://sacgb.fli-leibniz.de) were set up. Both new resources open up a single entry point for related genomes for the amoebozoa and fungal research communities and other interested users. Comparative genomics approaches are greatly facilitated by these resources. PMID:23193285

  9. Comparative genetics and genomics of nematodes: genome structure, development, and lifestyle.

    PubMed

    Sommer, Ralf J; Streit, Adrian

    2011-01-01

    Nematodes are found in virtually all habitats on earth. Many of them are parasites of plants and animals, including humans. The free-living nematode, Caenorhabditis elegans, is one of the genetically best-studied model organisms and was the first metazoan whose genome was fully sequenced. In recent years, the draft genome sequences of another six nematodes representing four of the five major clades of nematodes were published. Compared to mammalian genomes, all these genomes are very small. Nevertheless, they contain almost the same number of genes as the human genome. Nematodes are therefore a very attractive system for comparative genetic and genomic studies, with C. elegans as an excellent baseline. Here, we review the efforts that were made to extend genetic analysis to nematodes other than C. elegans, and we compare the seven available nematode genomes. One of the most striking findings is the unexpectedly high incidence of gene acquisition through horizontal gene transfer (HGT).

  10. Ten years of bacterial genome sequencing: comparative-genomics-based discoveries.

    PubMed

    Binnewies, Tim T; Motro, Yair; Hallin, Peter F; Lund, Ole; Dunn, David; La, Tom; Hampson, David J; Bellgard, Matthew; Wassenaar, Trudy M; Ussery, David W

    2006-07-01

    It has been more than 10 years since the first bacterial genome sequence was published. Hundreds of bacterial genome sequences are now available for comparative genomics, and searching a given protein against more than a thousand genomes will soon be possible. The subject of this review will address a relatively straightforward question: "What have we learned from this vast amount of new genomic data?" Perhaps one of the most important lessons has been that genetic diversity, at the level of large-scale variation amongst even genomes of the same species, is far greater than was thought. The classical textbook view of evolution relying on the relatively slow accumulation of mutational events at the level of individual bases scattered throughout the genome has changed. One of the most obvious conclusions from examining the sequences from several hundred bacterial genomes is the enormous amount of diversity--even in different genomes from the same bacterial species. This diversity is generated by a variety of mechanisms, including mobile genetic elements and bacteriophages. An examination of the 20 Escherichia coli genomes sequenced so far dramatically illustrates this, with the genome size ranging from 4.6 to 5.5 Mbp; much of the variation appears to be of phage origin. This review also addresses mobile genetic elements, including pathogenicity islands and the structure of transposable elements. There are at least 20 different methods available to compare bacterial genomes. Metagenomics offers the chance to study genomic sequences found in ecosystems, including genomes of species that are difficult to culture. It has become clear that a genome sequence represents more than just a collection of gene sequences for an organism and that information concerning the environment and growth conditions for the organism are important for interpretation of the genomic data. The newly proposed Minimal Information about a Genome Sequence standard has been developed to obtain this

  11. Computational Methods for the Analysis of Array Comparative Genomic Hybridization

    PubMed Central

    Chari, Raj; Lockwood, William W.; Lam, Wan L.

    2006-01-01

    Array comparative genomic hybridization (array CGH) is a technique for assaying the copy number status of cancer genomes. The widespread use of this technology has lead to a rapid accumulation of high throughput data, which in turn has prompted the development of computational strategies for the analysis of array CGH data. Here we explain the principles behind array image processing, data visualization and genomic profile analysis, review currently available software packages, and raise considerations for future software development. PMID:17992253

  12. Comparative Genomics of an Emerging Amphibian Virus

    PubMed Central

    Epstein, Brendan; Storfer, Andrew

    2015-01-01

    Ranaviruses, a genus of the Iridoviridae, are large double-stranded DNA viruses that infect cold-blooded vertebrates worldwide. Ranaviruses have caused severe epizootics in commercial frog and fish populations, and are currently classified as notifiable pathogens in international trade. Previous work shows that a ranavirus that infects tiger salamanders throughout Western North America (Ambystoma tigrinum virus, or ATV) is in high prevalence among salamanders in the fishing bait trade. Bait ATV strains have elevated virulence and are transported long distances by humans, providing widespread opportunities for pathogen pollution. We sequenced the genomes of 15 strains of ATV collected from tiger salamanders across western North America and performed phylogenetic and population genomic analyses and tests for recombination. We find that ATV forms a monophyletic clade within the rest of the Ranaviruses and that it likely emerged within the last several thousand years, before human activities influenced its spread. We also identify several genes under strong positive selection, some of which appear to be involved in viral virulence and/or host immune evasion. In addition, we provide support for the pathogen pollution hypothesis with evidence of recombination among ATV strains, and potential bait-endemic strain recombination. PMID:26530419

  13. Comparative Genomics of an Emerging Amphibian Virus.

    PubMed

    Epstein, Brendan; Storfer, Andrew

    2015-11-03

    Ranaviruses, a genus of the Iridoviridae, are large double-stranded DNA viruses that infect cold-blooded vertebrates worldwide. Ranaviruses have caused severe epizootics in commercial frog and fish populations, and are currently classified as notifiable pathogens in international trade. Previous work shows that a ranavirus that infects tiger salamanders throughout Western North America (Ambystoma tigrinum virus, or ATV) is in high prevalence among salamanders in the fishing bait trade. Bait ATV strains have elevated virulence and are transported long distances by humans, providing widespread opportunities for pathogen pollution. We sequenced the genomes of 15 strains of ATV collected from tiger salamanders across western North America and performed phylogenetic and population genomic analyses and tests for recombination. We find that ATV forms a monophyletic clade within the rest of the Ranaviruses and that it likely emerged within the last several thousand years, before human activities influenced its spread. We also identify several genes under strong positive selection, some of which appear to be involved in viral virulence and/or host immune evasion. In addition, we provide support for the pathogen pollution hypothesis with evidence of recombination among ATV strains, and potential bait-endemic strain recombination.

  14. The Methanosarcina barkeri genome: comparative analysis withMethanosarcina acetivorans and Methanosarcina mazei reveals extensiverearrangement within methanosarcinal genomes

    SciTech Connect

    Maeder, Dennis L.; Anderson, Iain; Brettin, Thomas S.; Bruce,David C.; Gilna, Paul; Han, Cliff S.; Lapidus, Alla; Metcalf, William W.; Saunders, Elizabeth; Tapia, Roxanne; Sowers, Kevin R.

    2006-05-19

    We report here a comparative analysis of the genome sequence of Methanosarcina barkeri with those of Methanosarcina acetivorans and Methanosarcina mazei. All three genomes share a conserved double origin of replication and many gene clusters. M. barkeri is distinguished by having an organization that is well conserved with respect to the other Methanosarcinae in the region proximal to the origin of replication with interspecies gene similarities as high as 95%. However it is disordered and marked by increased transposase frequency and decreased gene synteny and gene density in the proximal semi-genome. Of the 3680 open reading frames in M. barkeri, 678 had paralogs with better than 80% similarity to both M. acetivorans and M. mazei while 128 nonhypothetical orfs were unique (non-paralogous) amongst these species including a complete formate dehydrogenase operon, two genes required for N-acetylmuramic acid synthesis, a 14 gene gas vesicle cluster and a bacterial P450-specific ferredoxin reductase cluster not previously observed or characterized in this genus. A cryptic 36 kbp plasmid sequence was detected in M. barkeri that contains an orc1 gene flanked by a presumptive origin of replication consisting of 38 tandem repeats of a 143 nt motif. Three-way comparison of these genomes reveals differing mechanisms for the accrual of changes. Elongation of the large M. acetivorans is the result of multiple gene-scale insertions and duplications uniformly distributed in that genome, while M. barkeri is characterized by localized inversions associated with the loss of gene content. In contrast, the relatively short M. mazei most closely approximates the ancestral organizational state.

  15. Complete Genome Sequence and Comparative Genomics of a Novel Myxobacterium Myxococcus hansupus.

    PubMed

    Sharma, Gaurav; Narwani, Tarun; Subramanian, Srikrishna

    2016-01-01

    Myxobacteria, a group of Gram-negative aerobes, belong to the class δ-proteobacteria and order Myxococcales. Unlike anaerobic δ-proteobacteria, they exhibit several unusual physiogenomic properties like gliding motility, desiccation-resistant myxospores and large genomes with high coding density. Here we report a 9.5 Mbp complete genome of Myxococcus hansupus that encodes 7,753 proteins. Phylogenomic and genome-genome distance based analysis suggest that Myxococcus hansupus is a novel member of the genus Myxococcus. Comparative genome analysis with other members of the genus Myxococcus was performed to explore their genome diversity. The variation in number of unique proteins observed across different species is suggestive of diversity at the genus level while the overrepresentation of several Pfam families indicates the extent and mode of genome expansion as compared to non-Myxococcales δ-proteobacteria.

  16. Complete Genome Sequence and Comparative Genomics of a Novel Myxobacterium Myxococcus hansupus

    PubMed Central

    Sharma, Gaurav; Narwani, Tarun; Subramanian, Srikrishna

    2016-01-01

    Myxobacteria, a group of Gram-negative aerobes, belong to the class δ-proteobacteria and order Myxococcales. Unlike anaerobic δ-proteobacteria, they exhibit several unusual physiogenomic properties like gliding motility, desiccation-resistant myxospores and large genomes with high coding density. Here we report a 9.5 Mbp complete genome of Myxococcus hansupus that encodes 7,753 proteins. Phylogenomic and genome-genome distance based analysis suggest that Myxococcus hansupus is a novel member of the genus Myxococcus. Comparative genome analysis with other members of the genus Myxococcus was performed to explore their genome diversity. The variation in number of unique proteins observed across different species is suggestive of diversity at the genus level while the overrepresentation of several Pfam families indicates the extent and mode of genome expansion as compared to non-Myxococcales δ-proteobacteria. PMID:26900859

  17. A phylogenetic foundation for comparative mammalian genomics.

    PubMed

    Waddell, P J; Kishino, H; Ota, R

    2001-01-01

    A major effort is being undertaken to sequence an array of mammalian genomes. Coincidentally, the evolutionary relationships of the 18 presently recognized orders of placental mammals are only just being resolved. In this work we construct and analyse the largest alignments of amino acid sequence data to date. Our findings allow us to set up a series of superordinal groups (clades) to act as prior hypotheses for further testing. Important findings include strong evidence for a clade of Euarchonta+Glires (=Supraprimates) comprised of primates, flying lemurs, tree shrews, lagomorphs and rodents. In addition, there is good evidence for a clade of all placental mammals except Xenarthra and Afrotheria (=Boreotheria) and for the previously recognised clades Laurasiatheria, Scrotifera, Fereuungulata, Ferae, Afrotheria, Euarchonta, Glires, and Eulipotyphla. Accordingly, a revised classification of the placental mammals is put forward. Using this and molecular divergence-time methods, the ages of the superordinal splits are estimated. While results are strongly consistent with the earliest superordinal divergences all being >65 mybp (Cretaceous period), they suffer from greater uncertainty than presently appreciated. The early primate split of tarsiers from the anthropoid lineage at ~55 mybp is seen to be an especially informative fossil calibration point. A statistical framework for testing clades using SINE data is presented and reveals significant support for the tarsier/anthropoid clade, as well as the clades Cetruminantia and Whippomorpha. Results also underline our thesis that while sequence analysis can help set up hypothesised clades, SINEs obtainable from sequencing 1-2 MB regions of placental genomes are essential to testing them. In contrast, derivations suggest that empirical Bayesian methods for sequence data may not be robust estimators of clades. Our findings, including the study of genes such as TP53, make a good case for the tree shrew as a closer relative

  18. Comparative genomic reconstruction of transcriptional networks controlling central metabolism in the Shewanella genus

    SciTech Connect

    Rodionov, Dmitry A.; Novichkov, Pavel; Stavrovskaya, Elena D.; Rodionova, Irina A.; Li, Xiaoqing; Kazanov, Marat D.; Ravcheev, Dmitry A.; Gerasimova, Anna V.; Kazakov, Alexey E.; Kovaleva, Galina Y.; Permina, Elizabeth A.; Laikova, Olga N.; Overbeek, Ross; Romine, Margaret F.; Fredrickson, Jim K.; Arkin, Adam P.; Dubchak, Inna; Osterman, Andrei L.; Gelfand, Mikhail S.

    2011-06-15

    Genome-scale prediction of gene regulation and reconstruction of transcriptional regulatory networks in bacteria is one of the critical tasks of modern genomics. Despite the growing number of genome-scale gene expression studies, our abilities to convert the results of these studies into accurate regulatory annotations and to project them from model to other organisms are extremely limited. The comparative genomics approaches and computational identification of regulatory sites are useful for the in silico reconstruction of transcriptional regulatory networks in bacteria. The Shewanella genus is comprised of metabolically versatile gamma-proteobacteria, whose lifestyles and natural environments are substantially different from Escherichia coli and other model bacterial species. To explore conservation and variations in the Shewanella transcriptional networks we analyzed the repertoire of transcription factors and performed genomics-based reconstruction and comparative analysis of regulons in 16 Shewanella genomes. The inferred regulatory network includes 82 transcription factors and their DNA binding sites, 8 riboswitches and 6 translational attenuators. Forty five regulons were newly inferred from the genome context analysis, whereas others were propagated from previously characterized regulons in the Enterobacteria and Pseudomonas spp.. However, even orthologous regulators with conserved DNA-binding motifs may control substantially different gene sets, revealing striking differences in regulatory strategies between the Shewanella spp. and E. coli. Multiple examples of regulatory network rewiring include regulon contraction and expansion (as in the case of PdhR, HexR, FadR), and numerous cases of recruiting non-orthologous regulators to control equivalent pathways (e.g. NagR for N-acetylglucosamine catabolism and PsrA for fatty acid degradation) and, conversely, orthologous regulators to control distinct pathways (e.g. TyrR, ArgR, Crp).

  19. Comparative genomics of autism and schizophrenia

    PubMed Central

    Crespi, Bernard; Stead, Philip; Elliot, Michael

    2010-01-01

    We used data from studies of copy-number variants (CNVs), single-gene associations, growth-signaling pathways, and intermediate phenotypes associated with brain growth to evaluate four alternative hypotheses for the genomic and developmental relationships between autism and schizophrenia: (i) autism subsumed in schizophrenia, (ii) independence, (iii) diametric, and (iv) partial overlap. Data from CNVs provides statistical support for the hypothesis that autism and schizophrenia are associated with reciprocal variants, such that at four loci, deletions predispose to one disorder, whereas duplications predispose to the other. Data from single-gene studies are inconsistent with a hypothesis based on independence, in that autism and schizophrenia share associated genes more often than expected by chance. However, differentiation between the partial overlap and diametric hypotheses using these data is precluded by limited overlap in the specific genetic markers analyzed in both autism and schizophrenia. Evidence from the effects of risk variants on growth-signaling pathways shows that autism-spectrum conditions tend to be associated with up-regulation of pathways due to loss of function mutations in negative regulators, whereas schizophrenia is associated with reduced pathway activation. Finally, data from studies of head and brain size phenotypes indicate that autism is commonly associated with developmentally-enhanced brain growth, whereas schizophrenia is characterized, on average, by reduced brain growth. These convergent lines of evidence appear most compatible with the hypothesis that autism and schizophrenia represent diametric conditions with regard to their genomic underpinnings, neurodevelopmental bases, and phenotypic manifestations as reflecting under-development versus dysregulated over-development of the human social brain. PMID:19955444

  20. Gramene 2016: comparative plant genomics and pathway resources

    USDA-ARS?s Scientific Manuscript database

    Gramene (http://www.gramene.org) is an online resource for comparative functional genomics in crops and model plant species. Its two main frameworks are genomes (collaboration with Ensembl Plants) and pathways (The Plant Reactome and archival BioCyc databases). Since our last NAR update, the data...

  1. Comparative and functional genomics of lipases in holometabolous insects.

    PubMed

    Horne, Irene; Haritos, Victoria S; Oakeshott, John G

    2009-08-01

    Lipases have key roles in insect lipid acquisition, storage and mobilisation and are also fundamental to many physiological processes underpinning insect reproduction, development, defence from pathogens and oxidative stress, and pheromone signalling. We have screened the recently sequenced genomes of five species from four orders of holometabolous insects, the dipterans Drosophila melanogaster and Anopheles gambiae, the hymenopteran Apis mellifera, the moth Bombyx mori and the beetle Tribolium castaneum, for the six major lipase families that are also found in other organisms. The two most numerous families in the insects, the neutral and acid lipases, are also the main families in mammals, albeit not in Caenorhabditis elegans, plants or microbes. Total numbers of the lipases vary two-fold across the five insect species, from numbers similar to those in mammals up to numbers comparable to those seen in C. elegans. Whilst there is a high degree of orthology with mammalian lipases in the other four families, the great majority of the insect neutral and acid lipases have arisen since the insect orders themselves diverged. Intriguingly, about 10% of the insect neutral and acid lipases have lost motifs critical for catalytic function. Examination of the length of lid and loop regions of the neutral lipase sequences suggest that most of the insect lipases lack triacylglycerol (TAG) hydrolysis activity, although the acid lipases all have intact cap domains required for TAG hydrolysis. We have also reviewed the sequence databases and scientific literature for insights into the expression profiles and functions of the insect neutral and acid lipases and the orthologues of the mammalian adipose triglyceride lipase which has a pivotal role in lipid mobilisation. These data suggest that some of the acid and neutral lipase diversity may be due to a requirement for rapid accumulation of dietary lipids. The different roles required of lipases at the four discrete life stages of

  2. Phytozome: a comparative platform for green plant genomics.

    PubMed

    Goodstein, David M; Shu, Shengqiang; Howson, Russell; Neupane, Rochak; Hayes, Richard D; Fazo, Joni; Mitros, Therese; Dirks, William; Hellsten, Uffe; Putnam, Nicholas; Rokhsar, Daniel S

    2012-01-01

    The number of sequenced plant genomes and associated genomic resources is growing rapidly with the advent of both an increased focus on plant genomics from funding agencies, and the application of inexpensive next generation sequencing. To interact with this increasing body of data, we have developed Phytozome (http://www.phytozome.net), a comparative hub for plant genome and gene family data and analysis. Phytozome provides a view of the evolutionary history of every plant gene at the level of sequence, gene structure, gene family and genome organization, while at the same time providing access to the sequences and functional annotations of a growing number (currently 25) of complete plant genomes, including all the land plants and selected algae sequenced at the Joint Genome Institute, as well as selected species sequenced elsewhere. Through a comprehensive plant genome database and web portal, these data and analyses are available to the broader plant science research community, providing powerful comparative genomics tools that help to link model systems with other plants of economic and ecological importance.

  3. [Prediction of Promoter Motifs in Virophages].

    PubMed

    Gong, Chaowen; Zhou, Xuewen; Pan, Yingjie; Wang, Yongjie

    2015-07-01

    Virophages have crucial roles in ecosystems and are the transport vectors of genetic materials. To shed light on regulation and control mechanisms in virophage--host systems as well as evolution between virophages and their hosts, the promoter motifs of virophages were predicted on the upstream regions of start codons using an analytical tool for prediction of promoter motifs: Multiple EM for Motif Elicitation. Seventeen potential promoter motifs were identified based on the E-value, location, number and length of promoters in genomes. Sputnik and zamilon motif 2 with AT-rich regions were distributed widely on genomes, suggesting that these motifs may be associated with regulation of the expression of various genes. Motifs containing the TCTA box were predicted to be late promoter motif in mavirus; motifs containing the ATCT box were the potential late promoter motif in the Ace Lake mavirus . AT-rich regions were identified on motif 2 in the Organic Lake virophage, motif 3 in Yellowstone Lake virophage (YSLV)1 and 2, motif 1 in YSLV3, and motif 1 and 2 in YSLV4, respectively. AT-rich regions were distributed widely on the genomes of virophages. All of these motifs may be promoter motifs of virophages. Our results provide insights into further exploration of temporal expression of genes in virophages as well as associations between virophages and giant viruses.

  4. Sinbase: an integrated database to study genomics, genetics and comparative genomics in Sesamum indicum.

    PubMed

    Wang, Linhai; Yu, Jingyin; Li, Donghua; Zhang, Xiurong

    2015-01-01

    Sesame (Sesamum indicum L.) is an ancient and important oilseed crop grown widely in tropical and subtropical areas. It belongs to the gigantic order Lamiales, which includes many well-known or economically important species, such as olive (Olea europaea), leonurus (Leonurus japonicus) and lavender (Lavandula spica), many of which have important pharmacological properties. Despite their importance, genetic and genomic analyses on these species have been insufficient due to a lack of reference genome information. The now available S. indicum genome will provide an unprecedented opportunity for studying both S. indicum genetic traits and comparative genomics. To deliver S. indicum genomic information to the worldwide research community, we designed Sinbase, a web-based database with comprehensive sesame genomic, genetic and comparative genomic information. Sinbase includes sequences of assembled sesame pseudomolecular chromosomes, protein-coding genes (27,148), transposable elements (372,167) and non-coding RNAs (1,748). In particular, Sinbase provides unique and valuable information on colinear regions with various plant genomes, including Arabidopsis thaliana, Glycine max, Vitis vinifera and Solanum lycopersicum. Sinbase also provides a useful search function and data mining tools, including a keyword search and local BLAST service. Sinbase will be updated regularly with new features, improvements to genome annotation and new genomic sequences, and is freely accessible at http://ocri-genomics.org/Sinbase/.

  5. Reference-Free Comparative Genomics of 174 Chloroplasts

    PubMed Central

    Kua, Chai-Shian; Ruan, Jue; Harting, John; Ye, Cheng-Xi; Helmus, Matthew R.; Yu, Jun; Cannon, Charles H.

    2012-01-01

    Direct analysis of unassembled genomic data could greatly increase the power of short read DNA sequencing technologies and allow comparative genomics of organisms without a completed reference available. Here, we compare 174 chloroplasts by analyzing the taxanomic distribution of short kmers across genomes [1]. We then assemble de novo contigs centered on informative variation. The localized de novo contigs can be separated into two major classes: tip = unique to a single genome and group = shared by a subset of genomes. Prior to assembly, we found that ∼18% of the chloroplast was duplicated in the inverted repeat (IR) region across a four-fold difference in genome sizes, from a highly reduced parasitic orchid [2] to a massive algal chloroplast [3], including gnetophytes [4] and cycads [5]. The conservation of this ratio between single copy and duplicated sequence was basal among green plants, independent of photosynthesis and mechanism of genome size change, and different in gymnosperms and lower plants. Major lineages in the angiosperm clade differed in the pattern of shared kmers and de novo contigs. For example, parasitic plants demonstrated an expected accelerated overall rate of evolution, while the hemi-parasitic genomes contained a great deal more novel sequence than holo-parasitic plants, suggesting different mechanisms at different stages of genomic contraction. Additionally, the legumes are diverging more quickly and in different ways than other major families. Small duplicated fragments of the rrn23 genes were deeply conserved among seed plants, including among several species without the IR regions, indicating a crucial functional role of this duplication. Localized de novo assembly of informative kmers greatly reduces the complexity of large comparative analyses by confining the analysis to a small partition of data and genomes relevant to the specific question, allowing direct analysis of next-gen sequence data from previously unstudied

  6. Comparative Genomics and Extensive Recombinations in Phage Communities

    NASA Astrophysics Data System (ADS)

    Poisson, Guylaine; Belcaid, Mahdi; Bergeron, Anne

    Comparing the genomes of two closely related viruses often produces mosaics where nearly identical sequences alternate with sequences that are unique to each genome. When several closely related genomes are compared, the unique sequences are likely to be shared with third genomes, leading to virus mosaic communities. Here we present comparative analysis of sets of Staphylococcus aureus phages that share large identical sequences with up to three other genomes, and with different partners along their genomes. We introduce mosaic graphs to represent these complex recombination events, and use them to illustrate the breath and depth of sequence sharing: some genomes are almost completely made up of shared sequences, while genomes that share very large identical sequences can adopt alternate functional modules. Mosaic graphs also allow us to identify breakpoints that could eventually be used for the construction of recombination networks. These findings have several implications on phage metagenomics assembly, on the horizontal gene transfer paradigm, and more generally on the understanding of the composition and evolutionary dynamics of virus communities.

  7. Enhanced annotations and features for comparing thousands of Pseudomonas genomes in the Pseudomonas genome database.

    PubMed

    Winsor, Geoffrey L; Griffiths, Emma J; Lo, Raymond; Dhillon, Bhavjinder K; Shay, Julie A; Brinkman, Fiona S L

    2016-01-04

    The Pseudomonas Genome Database (http://www.pseudomonas.com) is well known for the application of community-based annotation approaches for producing a high-quality Pseudomonas aeruginosa PAO1 genome annotation, and facilitating whole-genome comparative analyses with other Pseudomonas strains. To aid analysis of potentially thousands of complete and draft genome assemblies, this database and analysis platform was upgraded to integrate curated genome annotations and isolate metadata with enhanced tools for larger scale comparative analysis and visualization. Manually curated gene annotations are supplemented with improved computational analyses that help identify putative drug targets and vaccine candidates or assist with evolutionary studies by identifying orthologs, pathogen-associated genes and genomic islands. The database schema has been updated to integrate isolate metadata that will facilitate more powerful analysis of genomes across datasets in the future. We continue to place an emphasis on providing high-quality updates to gene annotations through regular review of the scientific literature and using community-based approaches including a major new Pseudomonas community initiative for the assignment of high-quality gene ontology terms to genes. As we further expand from thousands of genomes, we plan to provide enhancements that will aid data visualization and analysis arising from whole-genome comparative studies including more pan-genome and population-based approaches.

  8. Advances in understanding cis regulation of the plant gene with an emphasis on comparative genomics.

    PubMed

    Burgess, Diane G; Xu, Jie; Freeling, Michael

    2015-10-01

    The plant gene model remains largely an extrapolation from animals, with the cis functional unit, the gene, cast as a dynamic looping structure. Molecular genetics with model plants continues to make advances; highlighted here are quantitative-occupancy results from the Arabidopsis thaliana (Arabidopsis) Phytochrome-Interacting bHLH transcription Factors (PIF) quartet. Compared to this complex snapshot, results from chromatin occupancy and other Encyclopedia of DNA Elements (ENCODE)-like approaches increase our transcription factor-motif cognate library, but regulation cannot by itself be inferred from binding. Complementary published Arabidopsis conserved noncoding sequence lists are compared, evaluated, merged, and released. Comparative genomic approaches have identified a cis modifier of a gene's expression-hypothetically, a transposon-based 'rheostat'-that works in all cells, times and places.

  9. Dyneins Across Eukaryotes: A Comparative Genomic Analysis

    PubMed Central

    Wickstead, Bill; Gull, Keith

    2007-01-01

    Dyneins are large minus-end-directed microtubule motors. Each dynein contains at least one dynein heavy chain (DHC) and a variable number of intermediate chains (IC), light intermediate chains (LIC) and light chains (LC). Here, we used genome sequence data from 24 diverse eukaryotes to assess the distribution of DHCs, ICs, LICs and LCs across Eukaryota. Phylogenetic inference identified nine DHC families (two cytoplasmic and seven axonemal) and six IC families (one cytoplasmic). We confirm that dyneins have been lost from higher plants and show that this is most likely because of a single loss of cytoplasmic dynein 1 from the ancestor of Rhodophyta and Viridiplantae, followed by lineage-specific losses of other families. Independent losses in Entamoeba mean that at least three extant eukaryotic lineages are entirely devoid of dyneins. Cytoplasmic dynein 2 is associated with intraflagellar transport (IFT), but in two chromalveolate organisms, we find an IFT footprint without the retrograde motor. The distribution of one family of outer-arm dyneins accounts for 2-headed or 3-headed outer-arm ultrastructures observed in different organisms. One diatom species builds motile axonemes without any inner-arm dyneins (IAD), and the unexpected conservation of IAD I1 in non-flagellate algae and LC8 (DYNLL1/2) in all lineages reveals a surprising fluidity to dynein function. PMID:17897317

  10. Comparative genomics meets topology: a novel view on genome median and halving problems.

    PubMed

    Alexeev, Nikita; Avdeyev, Pavel; Alekseyev, Max A

    2016-11-11

    Genome median and genome halving are combinatorial optimization problems that aim at reconstruction of ancestral genomes by minimizing the number of evolutionary events between them and genomes of the extant species. While these problems have been widely studied in past decades, their solutions are often either not efficient or not biologically adequate. These shortcomings have been recently addressed by restricting the problems solution space. We show that the restricted variants of genome median and halving problems are, in fact, closely related. We demonstrate that these problems have a neat topological interpretation in terms of embedded graphs and polygon gluings. We illustrate how such interpretation can lead to solutions to these problems in particular cases. This study provides an unexpected link between comparative genomics and topology, and demonstrates advantages of solving genome median and halving problems within the topological framework.

  11. Whole Genome Amplification of Labeled Viable Single Cells Suited for Array-Comparative Genomic Hybridization.

    PubMed

    Kroneis, Thomas; El-Heliebi, Amin

    2015-01-01

    Understanding details of a complex biological system makes it necessary to dismantle it down to its components. Immunostaining techniques allow identification of several distinct cell types thereby giving an inside view of intercellular heterogeneity. Often staining reveals that the most remarkable cells are the rarest. To further characterize the target cells on a molecular level, single cell techniques are necessary. Here, we describe the immunostaining, micromanipulation, and whole genome amplification of single cells for the purpose of genomic characterization. First, we exemplify the preparation of cell suspensions from cultured cells as well as the isolation of peripheral mononucleated cells from blood. The target cell population is then subjected to immunostaining. After cytocentrifugation target cells are isolated by micromanipulation and forwarded to whole genome amplification. For whole genome amplification, we use GenomePlex(®) technology allowing downstream genomic analysis such as array-comparative genomic hybridization.

  12. The perennial ryegrass GenomeZipper: targeted use of genome resources for comparative grass genomics.

    PubMed

    Pfeifer, Matthias; Martis, Mihaela; Asp, Torben; Mayer, Klaus F X; Lübberstedt, Thomas; Byrne, Stephen; Frei, Ursula; Studer, Bruno

    2013-02-01

    Whole-genome sequences established for model and major crop species constitute a key resource for advanced genomic research. For outbreeding forage and turf grass species like ryegrasses (Lolium spp.), such resources have yet to be developed. Here, we present a model of the perennial ryegrass (Lolium perenne) genome on the basis of conserved synteny to barley (Hordeum vulgare) and the model grass genome Brachypodium (Brachypodium distachyon) as well as rice (Oryza sativa) and sorghum (Sorghum bicolor). A transcriptome-based genetic linkage map of perennial ryegrass served as a scaffold to establish the chromosomal arrangement of syntenic genes from model grass species. This scaffold revealed a high degree of synteny and macrocollinearity and was then utilized to anchor a collection of perennial ryegrass genes in silico to their predicted genome positions. This resulted in the unambiguous assignment of 3,315 out of 8,876 previously unmapped genes to the respective chromosomes. In total, the GenomeZipper incorporates 4,035 conserved grass gene loci, which were used for the first genome-wide sequence divergence analysis between perennial ryegrass, barley, Brachypodium, rice, and sorghum. The perennial ryegrass GenomeZipper is an ordered, information-rich genome scaffold, facilitating map-based cloning and genome assembly in perennial ryegrass and closely related Poaceae species. It also represents a milestone in describing synteny between perennial ryegrass and fully sequenced model grass genomes, thereby increasing our understanding of genome organization and evolution in the most important temperate forage and turf grass species.

  13. The dog genome: survey sequencing and comparative analysis.

    PubMed

    Kirkness, Ewen F; Bafna, Vineet; Halpern, Aaron L; Levy, Samuel; Remington, Karin; Rusch, Douglas B; Delcher, Arthur L; Pop, Mihai; Wang, Wei; Fraser, Claire M; Venter, J Craig

    2003-09-26

    A survey of the dog genome sequence (6.22 million sequence reads; 1.5x coverage) demonstrates the power of sample sequencing for comparative analysis of mammalian genomes and the generation of species-specific resources. More than 650 million base pairs (>25%) of dog sequence align uniquely to the human genome, including fragments of putative orthologs for 18,473 of 24,567 annotated human genes. Mutation rates, conserved synteny, repeat content, and phylogeny can be compared among human, mouse, and dog. A variety of polymorphic elements are identified that will be valuable for mapping the genetic basis of diseases and traits in the dog.

  14. Detecting correlations among functional-sequence motifs

    NASA Astrophysics Data System (ADS)

    Pirino, Davide; Rigosa, Jacopo; Ledda, Alice; Ferretti, Luca

    2012-06-01

    Sequence motifs are words of nucleotides in DNA with biological functions, e.g., gene regulation. Identification of such words proceeds through rejection of Markov models on the expected motif frequency along the genome. Additional biological information can be extracted from the correlation structure among patterns of motif occurrences. In this paper a log-linear multivariate intensity Poisson model is estimated via expectation maximization on a set of motifs along the genome of E. coli K12. The proposed approach allows for excitatory as well as inhibitory interactions among motifs and between motifs and other genomic features like gene occurrences. Our findings confirm previous stylized facts about such types of interactions and shed new light on genome-maintenance functions of some particular motifs. We expect these methods to be applicable to a wider set of genomic features.

  15. Comparative Genomics Reveals the Core and Accessory Genomes of Streptomyces Species.

    PubMed

    Kim, Ji-Nu; Kim, Yeonbum; Jeong, Yujin; Roe, Jung-Hye; Kim, Byung-Gee; Cho, Byung-Kwan

    2015-10-01

    The development of rapid and efficient genome sequencing methods has enabled us to study the evolutionary background of bacterial genetic information. Here, we present comparative genomic analysis of 17 Streptomyces species, for which the genome has been completely sequenced, using the pan-genome approach. The analysis revealed that 34,592 ortholog clusters constituted the pan-genome of these Streptomyces species, including 2,018 in the core genome, 11,743 in the dispensable genome, and 20,831 in the unique genome. The core genome was converged to a smaller number of genes than reported previously, with 3,096 gene families. Functional enrichment analysis showed that genes involved in transcription were most abundant in the Streptomyces pan-genome. Finally, we investigated core genes for the sigma factors, mycothiol biosynthesis pathway, and secondary metabolism pathways; our data showed that many genes involved in stress response and morphological differentiation were commonly expressed in Streptomyces species. Elucidation of the core genome offers a basis for understanding the functional evolution of Streptomyces species and provides insights into target selection for the construction of industrial strains.

  16. Comparative genomic paleontology across plant kingdom reveals the dynamics of TE-driven genome evolution.

    PubMed

    El Baidouri, Moaine; Panaud, Olivier

    2013-01-01

    Long terminal repeat-retrotransposons (LTR-RTs) are the most abundant class of transposable elements (TEs) in plants. They strongly impact the structure, function, and evolution of their host genome, and, in particular, their role in genome size variation has been clearly established. However, the dynamics of the process through which LTR-RTs have differentially shaped plant genomes is still poorly understood because of a lack of comparative studies. Using a new robust and automated family classification procedure, we exhaustively characterized the LTR-RTs in eight plant genomes for which a high-quality sequence is available (i.e., Arabidopsis thaliana, A. lyrata, grapevine, soybean, rice, Brachypodium dystachion, sorghum, and maize). This allowed us to perform a comparative genome-wide study of the retrotranspositional landscape in these eight plant lineages from both monocots and dicots. We show that retrotransposition has recurrently occurred in all plant genomes investigated, regardless their size, and through bursts, rather than a continuous process. Moreover, in each genome, only one or few LTR-RT families have been active in the recent past, and the difference in genome size among the species studied could thus mostly be accounted for by the extent of the latest transpositional burst(s). Following these bursts, LTR-RTs are efficiently eliminated from their host genomes through recombination and deletion, but we show that the removal rate is not lineage specific. These new findings lead us to propose a new model of TE-driven genome evolution in plants.

  17. IMGD: an integrated platform supporting comparative genomics and phylogenetics of insect mitochondrial genomes

    PubMed Central

    Lee, Wonhoon; Park, Jongsun; Choi, Jaeyoung; Jung, Kyongyong; Park, Bongsoo; Kim, Donghan; Lee, Jaeyoung; Ahn, Kyohun; Song, Wonho; Kang, Seogchan; Lee, Yong-Hwan; Lee, Seunghwan

    2009-01-01

    Background Sequences and organization of the mitochondrial genome have been used as markers to investigate evolutionary history and relationships in many taxonomic groups. The rapidly increasing mitochondrial genome sequences from diverse insects provide ample opportunities to explore various global evolutionary questions in the superclass Hexapoda. To adequately support such questions, it is imperative to establish an informatics platform that facilitates the retrieval and utilization of available mitochondrial genome sequence data. Results The Insect Mitochondrial Genome Database (IMGD) is a new integrated platform that archives the mitochondrial genome sequences from 25,747 hexapod species, including 112 completely sequenced and 20 nearly completed genomes and 113,985 partially sequenced mitochondrial genomes. The Species-driven User Interface (SUI) of IMGD supports data retrieval and diverse analyses at multi-taxon levels. The Phyloviewer implemented in IMGD provides three methods for drawing phylogenetic trees and displays the resulting trees on the web. The SNP database incorporated to IMGD presents the distribution of SNPs and INDELs in the mitochondrial genomes of multiple isolates within eight species. A newly developed comparative SNU Genome Browser supports the graphical presentation and interactive interface for the identified SNPs/INDELs. Conclusion The IMGD provides a solid foundation for the comparative mitochondrial genomics and phylogenetics of insects. All data and functions described here are available at the web site . PMID:19351385

  18. Polysaccharides utilization in human gut bacterium Bacteroides thetaiotaomicron: comparative genomics reconstruction of metabolic and regulatory networks

    PubMed Central

    2013-01-01

    Background Bacteroides thetaiotaomicron, a predominant member of the human gut microbiota, is characterized by its ability to utilize a wide variety of polysaccharides using the extensive saccharolytic machinery that is controlled by an expanded repertoire of transcription factors (TFs). The availability of genomic sequences for multiple Bacteroides species opens an opportunity for their comparative analysis to enable characterization of their metabolic and regulatory networks. Results A comparative genomics approach was applied for the reconstruction and functional annotation of the carbohydrate utilization regulatory networks in 11 Bacteroides genomes. Bioinformatics analysis of promoter regions revealed putative DNA-binding motifs and regulons for 31 orthologous TFs in the Bacteroides. Among the analyzed TFs there are 4 SusR-like regulators, 16 AraC-like hybrid two-component systems (HTCSs), and 11 regulators from other families. Novel DNA motifs of HTCSs and SusR-like regulators in the Bacteroides have the common structure of direct repeats with a long spacer between two conserved sites. Conclusions The inferred regulatory network in B. thetaiotaomicron contains 308 genes encoding polysaccharide and sugar catabolic enzymes, carbohydrate-binding and transport systems, and TFs. The analyzed TFs control pathways for utilization of host and dietary glycans to monosaccharides and their further interconversions to intermediates of the central metabolism. The reconstructed regulatory network allowed us to suggest and refine specific functional assignments for sugar catabolic enzymes and transporters, providing a substantial improvement to the existing metabolic models for B. thetaiotaomicron. The obtained collection of reconstructed TF regulons is available in the RegPrecise database (http://regprecise.lbl.gov). PMID:24330590

  19. Polysaccharides utilization in human gut bacterium Bacteroides thetaiotaomicron: comparative genomics reconstruction of metabolic and regulatory networks.

    PubMed

    Ravcheev, Dmitry A; Godzik, Adam; Osterman, Andrei L; Rodionov, Dmitry A

    2013-12-12

    Bacteroides thetaiotaomicron, a predominant member of the human gut microbiota, is characterized by its ability to utilize a wide variety of polysaccharides using the extensive saccharolytic machinery that is controlled by an expanded repertoire of transcription factors (TFs). The availability of genomic sequences for multiple Bacteroides species opens an opportunity for their comparative analysis to enable characterization of their metabolic and regulatory networks. A comparative genomics approach was applied for the reconstruction and functional annotation of the carbohydrate utilization regulatory networks in 11 Bacteroides genomes. Bioinformatics analysis of promoter regions revealed putative DNA-binding motifs and regulons for 31 orthologous TFs in the Bacteroides. Among the analyzed TFs there are 4 SusR-like regulators, 16 AraC-like hybrid two-component systems (HTCSs), and 11 regulators from other families. Novel DNA motifs of HTCSs and SusR-like regulators in the Bacteroides have the common structure of direct repeats with a long spacer between two conserved sites. The inferred regulatory network in B. thetaiotaomicron contains 308 genes encoding polysaccharide and sugar catabolic enzymes, carbohydrate-binding and transport systems, and TFs. The analyzed TFs control pathways for utilization of host and dietary glycans to monosaccharides and their further interconversions to intermediates of the central metabolism. The reconstructed regulatory network allowed us to suggest and refine specific functional assignments for sugar catabolic enzymes and transporters, providing a substantial improvement to the existing metabolic models for B. thetaiotaomicron. The obtained collection of reconstructed TF regulons is available in the RegPrecise database (http://regprecise.lbl.gov).

  20. Motifs from the deep

    PubMed Central

    Hwang, Tony W; Codrea, Vlad; Ellington, Andrew D

    2009-01-01

    Because of the increasing recognition of the importance of non-coding RNAs in gene regulation, there is considerable interest in identifying RNA motifs in genomic data. In a recent report in BMC Genomics, Breaker and colleagues describe a new algorithm for identifying functional noncoding RNAs in metagenomic sequences of marine organisms, a strategy that may be particularly effective for discovering new and unique riboswitches. PMID:19735583

  1. The dual role of LSD1 and HDAC3 in STAT5-dependent transcription is determined by protein interactions, binding affinities, motifs and genomic positions

    PubMed Central

    Nanou, Aikaterini; Toumpeki, Chrisavgi; Lavigne, Matthieu D.; Lazou, Vassiliki; Demmers, Jeroen; Paparountas, Triantafillos; Thanos, Dimitris; Katsantoni, Eleni

    2017-01-01

    STAT5 interacts with other factors to control transcription, and the mechanism of regulation is of interest as constitutive active STAT5 has been reported in malignancies. Here, LSD1 and HDAC3 were identified as novel STAT5a interacting partners in pro-B cells. Characterization of STAT5a, LSD1 and HDAC3 target genes by ChIP-seq and RNA-seq revealed gene subsets regulated by independent or combined action of the factors and LSD1/HDAC3 to play dual role in their activation or repression. Genes bound by STAT5a alone or in combination with weakly associated LSD1 or HDAC3 were enriched for the canonical STAT5a GAS motif, and such binding induced activation or repression. Strong STAT5 binding was seen more frequently in intergenic regions, which might function as distal enhancer elements. Groups of genes bound weaker by STAT5a and stronger by LSD1/HDAC3 showed an absence of the GAS motif, and were differentially regulated based on their genomic binding localization and binding affinities. These genes exhibited increased binding frequency in promoters, and in conjunction with the absence of GAS sites, the data indicate a requirement for stabilization by additional factors, which might recruit LSD1/HDAC3. Our study describes an interaction network of STAT5a/LSD1/HDAC3 and a dual function of LSD1/HDAC3 on STAT5-dependent transcription, defined by protein–protein interactions, genomic binding localization/affinity and motifs. PMID:27651463

  2. Mycobacterial species as case-study of comparative genome analysis.

    PubMed

    Zakham, F; Belayachi, L; Ussery, D; Akrim, M; Benjouad, A; El Aouad, R; Ennaji, M M

    2011-02-08

    The genus Mycobacterium represents more than 120 species including important pathogens of human and cause major public health problems and illnesses. Further, with more than 100 genome sequences from this genus, comparative genome analysis can provide new insights for better understanding the evolutionary events of these species and improving drugs, vaccines, and diagnostics tools for controlling Mycobacterial diseases. In this present study we aim to outline a comparative genome analysis of fourteen Mycobacterial genomes: M. avium subsp. paratuberculosis K—10, M. bovis AF2122/97, M. bovis BCG str. Pasteur 1173P2, M. leprae Br4923, M. marinum M, M. sp. KMS, M. sp. MCS, M. tuberculosis CDC1551, M. tuberculosis F11, M. tuberculosis H37Ra, M. tuberculosis H37Rv, M. tuberculosis KZN 1435 , M. ulcerans Agy99,and M. vanbaalenii PYR—1, For this purpose a comparison has been done based on their length of genomes, GC content, number of genes in different data bases (Genbank, Refseq, and Prodigal). The BLAST matrix of these genomes has been figured to give a lot of information about the similarity between species in a simple scheme. As a result of multiple genome analysis, the pan and core genome have been defined for twelve Mycobacterial species. We have also introduced the genome atlas of the reference strain M. tuberculosis H37Rv which can give a good overview of this genome. And for examining the phylogenetic relationships among these bacteria, a phylogenic tree has been constructed from 16S rRNA gene for tuberculosis and non tuberculosis Mycobacteria to understand the evolutionary events of these species.

  3. Evolution of Prdm Genes in Animals: Insights from Comparative Genomics

    PubMed Central

    Vervoort, Michel; Meulemeester, David; Béhague, Julien; Kerner, Pierre

    2016-01-01

    Prdm genes encode transcription factors with a subtype of SET domain known as the PRDF1-RIZ (PR) homology domain and a variable number of zinc finger motifs. These genes are involved in a wide variety of functions during animal development. As most Prdm genes have been studied in vertebrates, especially in mice, little is known about the evolution of this gene family. We searched for Prdm genes in the fully sequenced genomes of 93 different species representative of all the main metazoan lineages. A total of 976 Prdm genes were identified in these species. The number of Prdm genes per species ranges from 2 to 19. To better understand how the Prdm gene family has evolved in metazoans, we performed phylogenetic analyses using this large set of identified Prdm genes. These analyses allowed us to define 14 different subfamilies of Prdm genes and to establish, through ancestral state reconstruction, that 11 of them are ancestral to bilaterian animals. Three additional subfamilies were acquired during early vertebrate evolution (Prdm5, Prdm11, and Prdm17). Several gene duplication and gene loss events were identified and mapped onto the metazoan phylogenetic tree. By studying a large number of nonmetazoan genomes, we confirmed that Prdm genes likely constitute a metazoan-specific gene family. Our data also suggest that Prdm genes originated before the diversification of animals through the association of a single ancestral SET domain encoding gene with one or several zinc finger encoding genes. PMID:26560352

  4. Marine invertebrate lipases: Comparative and functional genomic analysis.

    PubMed

    Rivera-Perez, Crisalejandra

    2015-09-01

    Lipases are key enzymes involved in lipid digestion, storage and mobilization of reserves during fasting or heightened metabolic demand. This is a highly conserved process, essential for survival. The genomes of five marine invertebrate species with distinctive digestive system were screened for the six major lipase families. The two most common families in marine invertebrates, the neutral an acid lipases, are also the main families in mammals and insects. The number of lipases varies two-fold across analyzed genomes. A high degree of orthology with mammalian lipases was observed. Interestingly, 19% of the marine invertebrate lipases have lost motifs required for catalysis. Analysis of the lid and loop regions of the neutral lipases suggests that many marine invertebrates have a functional triacylglycerol hydrolytic activity as well as some acid lipases. A revision of the expression profiles and functional activity on sequences in databases and scientific literature provided information regarding the function of these families of enzymes in marine invertebrates. Copyright © 2015 Elsevier Inc. All rights reserved.

  5. The Genome of Nosema sp. Isolate YNPr: A Comparative Analysis of Genome Evolution within the Nosema/Vairimorpha Clade

    PubMed Central

    Ma, Zhenggang; Li, Tian; Zhang, Xiaoyan; Debrunner-Vossbrinck, Bettina A.; Zhou, Zeyang; Vossbrinck, Charles R.

    2016-01-01

    The microsporidian parasite designated here as Nosema sp. Isolate YNPr was isolated from the cabbage butterfly Pieris rapae collected in Honghe Prefecture, Yunnan Province, China. The genome was sequenced by Illumina sequencing and compared to those of two related members of the Nosema/Vairimorpha clade, Nosema ceranae and Nosema apis. Based upon assembly statistics, the Nosema sp. YNPr genome is 3.36 x 106bp with a G+C content of 23.18% and 2,075 protein coding sequences. An “ACCCTT” motif is present approximately 50-bp upstream of the start codon, as reported from other members of the clade and from Encephalitozoon cuniculi, a sister taxon. Comparative small subunit ribosomal DNA (SSU rDNA) analysis as well as genome-wide phylogenetic analysis confirms a closer relationship between N. ceranae and Nosema sp. YNPr than between the two honeybee parasites N. ceranae and N. apis. The more closely related N. ceranae and Nosema sp. YNPr show similarities in a number of structural characteristics such as gene synteny, gene length, gene number, transposon composition and gene reduction. Based on transposable element content of the assemblies, the transposon content of Nosema sp. YNPr is 4.8%, that of N. ceranae is 3.7%, and that of N. apis is 2.5%, with large differences in the types of transposons present among these 3 species. Gene function annotation indicates that the number of genes participating in most metabolic activities is similar in all three species. However, the number of genes in the transcription, general function, and cysteine protease categories is greater in N. apis than in the other two species. Our studies further characterize the evolution of the Nosema/Vairimorpha clade of microsporidia. These organisms maintain variable but very reduced genomes. We are interested in understanding the effects of genetic drift versus natural selection on genome size in the microsporidia and in developing a testable hypothesis for further studies on the genomic

  6. Evolutionary and comparative analyses of the soybean genome

    PubMed Central

    Cannon, Steven B.; Shoemaker, Randy C.

    2012-01-01

    The soybean genome assembly has been available since the end of 2008. Significant features of the genome include large, gene-poor, repeat-dense pericentromeric regions, spanning roughly 57% of the genome sequence; a relatively large genome size of ~1.15 billion bases; remnants of a genome duplication that occurred ~13 million years ago (Mya); and fainter remnants of older polyploidies that occurred ~58 Mya and >130 Mya. The genome sequence has been used to identify the genetic basis for numerous traits, including disease resistance, nutritional characteristics, and developmental features. The genome sequence has provided a scaffold for placement of many genomic feature elements, both from within soybean and from related species. These may be accessed at several websites, including http://www.phytozome.net, http://soybase.org, http://comparative-legumes.org, and http://www.legumebase.brc.miyazaki-u.ac.jp. The taxonomic position of soybean in the Phaseoleae tribe of the legumes means that there are approximately two dozen other beans and relatives that have undergone independent domestication, and which may have traits that will be useful for transfer to soybean. Methods of translating information between species in the Phaseoleae range from design of markers for marker assisted selection, to transformation with Agrobacterium or with other experimental transformation methods. PMID:23136483

  7. Comparative genomics of insect juvenile hormone biosynthesis⋆

    PubMed Central

    Noriega, F.G.; Ribeiro, J.M.C.; Koener, J.F.; Valenzuela, J.G.; Hernandez-Martinez, S.; Pham, V.M.; Feyereisen, R.

    2009-01-01

    The biosynthesis of insect juvenile hormone (JH) and its neuroendocrine control are attractive targets for chemical control of insect pests and vectors of disease. To facilitate the molecular study of JH biosynthesis, we analyzed ESTs from the glands producing JH, the corpora allata (CA) in the cockroach Diploptera punctata, an insect long used as a physiological model species and compared them with ESTs from the CA of the mosquitoes Aedes aegypti and Anopheles albimanus. The predicted genes were analyzed according to their probable functions with the Gene Ontology classification, and compared to Drosophila and Anopheles gambiae genes. A large number of reciprocal matches in the cDNA libraries of cockroach and mosquito CA were found. These matches defined known and suspected enzymes of the JH biosynthetic pathway, but also several proteins associated with signal transduction that might play a role in the modulation of JH synthesis by neuropeptides. The identification in both cockroach and mosquito CA of homologs of the small ligand binding proteins from insects, Takeout/JH binding protein and retinol-binding protein highlights a hitherto unsuspected complexity of metabolite trafficking, perhaps JH precursor trafficking, in these endocrine glands. Furthermore, many reciprocal matches for genes of unknown function may provide a fertile ground for an in-depth study of allatal-specific cell physiology. PMID:16551550

  8. Comparative genomics of Eucalyptus and Corymbia reveals low rates of genome structural rearrangement.

    PubMed

    Butler, J B; Vaillancourt, R E; Potts, B M; Lee, D J; King, G J; Baten, A; Shepherd, M; Freeman, J S

    2017-05-22

    Previous studies suggest genome structure is largely conserved between Eucalyptus species. However, it is unknown if this conservation extends to more divergent eucalypt taxa. We performed comparative genomics between the eucalypt genera Eucalyptus and Corymbia. Our results will facilitate transfer of genomic information between these important taxa and provide further insights into the rate of structural change in tree genomes. We constructed three high density linkage maps for two Corymbia species (Corymbia citriodora subsp. variegata and Corymbia torelliana) which were used to compare genome structure between both species and Eucalyptus grandis. Genome structure was highly conserved between the Corymbia species. However, the comparison of Corymbia and E. grandis suggests large (from 1-13 MB) intra-chromosomal rearrangements have occurred on seven of the 11 chromosomes. Most rearrangements were supported through comparisons of the three independent Corymbia maps to the E. grandis genome sequence, and to other independently constructed Eucalyptus linkage maps. These are the first large scale chromosomal rearrangements discovered between eucalypts. Nonetheless, in the general context of plants, the genomic structure of the two genera was remarkably conserved; adding to a growing body of evidence that conservation of genome structure is common amongst woody angiosperms.

  9. Comparing Vertebrate Whole-Genome Shotgun Reads to the Human Genome

    PubMed Central

    Chen, Rui; Bouck, John B.; Weinstock, George M.; Gibbs, Richard A.

    2001-01-01

    Multi-species sequence comparisons are a very efficient way to reveal conserved genes. Because sequence finishing is expensive and time consuming, many genome sequences are likely to stay incomplete. A challenge is to use these fragmented data for understanding the human genome. Methods for using cross-species whole-genome shotgun sequence (WGS) for genome annotation are described in this paper. About one-half million high-quality rat WGS reads (covering 7.5% of the rat genome) generated at the Baylor College of Medicine Human Genome Sequencing Center were compared with the human genome. Using computer-generated random reads as a negative control, a set of parameters was determined for reliable interpretation of BLAST search results. About 10% of the rat reads contain regions that are conserved in the human genomic sequence and about one-third of these include known gene-coding regions. Mapping the conserved regions to human chromosomes showed a 23-fold enrichment for coding regions compared with noncoding regions. This approach can also be applied to other mammalian genomes for gene finding. These data predicted ∼42,500 genes in the human, slightly more than reported previously. PMID:11691844

  10. Comparative genomics of vesicomyid clam (Bivalvia: Mollusca) chemosynthetic symbionts

    PubMed Central

    Newton, Irene LG; Girguis, Peter R; Cavanaugh, Colleen M

    2008-01-01

    Background The Vesicomyidae (Bivalvia: Mollusca) are a family of clams that form symbioses with chemosynthetic gamma-proteobacteria. They exist in environments such as hydrothermal vents and cold seeps and have a reduced gut and feeding groove, indicating a large dependence on their endosymbionts for nutrition. Recently, two vesicomyid symbiont genomes were sequenced, illuminating the possible nutritional contributions of the symbiont to the host and making genome-wide evolutionary analyses possible. Results To examine the genomic evolution of the vesicomyid symbionts, a comparative genomics framework, including the existing genomic data combined with heterologous microarray hybridization results, was used to analyze conserved gene content in four vesicomyid symbiont genomes. These four symbionts were chosen to include a broad phylogenetic sampling of the vesicomyid symbionts and represent distinct chemosynthetic environments: cold seeps and hydrothermal vents. Conclusion The results of this comparative genomics analysis emphasize the importance of the symbionts' chemoautotrophic metabolism within their hosts. The fact that these symbionts appear to be metabolically capable autotrophs underscores the extent to which the host depends on them for nutrition and reveals the key to invertebrate colonization of these challenging environments. PMID:19055818

  11. Comparative Genomics of a Parthenogenesis-Inducing Wolbachia Symbiont

    PubMed Central

    Lindsey, Amelia R. I.; Werren, John H.; Richards, Stephen; Stouthamer, Richard

    2016-01-01

    Wolbachia is an intracellular symbiont of invertebrates responsible for inducing a wide variety of phenotypes in its host. These host-Wolbachia relationships span the continuum from reproductive parasitism to obligate mutualism, and provide a unique system to study genomic changes associated with the evolution of symbiosis. We present the genome sequence from a parthenogenesis-inducing Wolbachia strain (wTpre) infecting the minute parasitoid wasp Trichogramma pretiosum. The wTpre genome is the most complete parthenogenesis-inducing Wolbachia genome available to date. We used comparative genomics across 16 Wolbachia strains, representing five supergroups, to identify a core Wolbachia genome of 496 sets of orthologous genes. Only 14 of these sets are unique to Wolbachia when compared to other bacteria from the Rickettsiales. We show that the B supergroup of Wolbachia, of which wTpre is a member, contains a significantly higher number of ankyrin repeat-containing genes than other supergroups. In the wTpre genome, there is evidence for truncation of the protein coding sequences in 20% of ORFs, mostly as a result of frameshift mutations. The wTpre strain represents a conversion from cytoplasmic incompatibility to a parthenogenesis-inducing lifestyle, and is required for reproduction in the Trichogramma host it infects. We hypothesize that the large number of coding frame truncations has accompanied the change in reproductive mode of the wTpre strain. PMID:27194801

  12. Arabidopsis transcription factors: genome-wide comparative analysis among eukaryotes.

    PubMed

    Riechmann, J L; Heard, J; Martin, G; Reuber, L; Jiang, C; Keddie, J; Adam, L; Pineda, O; Ratcliffe, O J; Samaha, R R; Creelman, R; Pilgrim, M; Broun, P; Zhang, J Z; Ghandehari, D; Sherman, B K; Yu, G

    2000-12-15

    The completion of the Arabidopsis thaliana genome sequence allows a comparative analysis of transcriptional regulators across the three eukaryotic kingdoms. Arabidopsis dedicates over 5% of its genome to code for more than 1500 transcription factors, about 45% of which are from families specific to plants. Arabidopsis transcription factors that belong to families common to all eukaryotes do not share significant similarity with those of the other kingdoms beyond the conserved DNA binding domains, many of which have been arranged in combinations specific to each lineage. The genome-wide comparison reveals the evolutionary generation of diversity in the regulation of transcription.

  13. Comparative genomic analysis of eutherian interferon-γ-inducible GTPases.

    PubMed

    Premzl, Marko

    2012-11-01

    The interferon-γ-inducible GTPases, IFGGs, are intracellular proteins involved in immune response against pathogens. A comprehensive comparative genomic review and analysis of eutherian IFGGs was carried out using public genomic sequences. The 64 eutherian IFGG genes were examined in detail and annotated. The eutherian IFGG promoter types were first catalogued followed by a phylogenetic analysis of eutherian IFGGs, which described five major IFGG clusters. The patterns of differential gene expansions and protein regions that may regulate IFGG catalytic features suggested a new classification of eutherian IFGGs. This mini-review has also provided new tests of reliability of public genomic sequences as well as tests of protein molecular evolution.

  14. Comparative Genome Analysis of Basidiomycete Fungi

    SciTech Connect

    Riley, Robert; Salamov, Asaf; Morin, Emmanuelle; Nagy, Laszlo; Manning, Gerard; Baker, Scott; Brown, Daren; Henrissat, Bernard; Levasseur, Anthony; Hibbett, David; Martin, Francis; Grigoriev, Igor

    2012-03-19

    Fungi of the phylum Basidiomycota (basidiomycetes), make up some 37percent of the described fungi, and are important in forestry, agriculture, medicine, and bioenergy. This diverse phylum includes the mushrooms, wood rots, symbionts, and plant and animal pathogens. To better understand the diversity of phenotypes in basidiomycetes, we performed a comparative analysis of 35 basidiomycete fungi spanning the diversity of the phylum. Phylogenetic patterns of lignocellulose degrading genes suggest a continuum rather than a sharp dichotomy between the white rot and brown rot modes of wood decay. Patterns of secondary metabolic enzymes give additional insight into the broad array of phenotypes found in the basidiomycetes. We suggest that the profile of an organism in lignocellulose-targeting genes can be used to predict its nutritional mode, and predict Dacryopinax sp. as a brown rot; Botryobasidium botryosum and Jaapia argillacea as white rots.

  15. Comparative genomics of first available bovine Anaplasma phagocytophilum genome obtained with targeted sequence capture.

    PubMed

    Dugat, Thibaud; Loux, Valentin; Marthey, Sylvain; Moroldo, Marco; Lagrée, Anne-Claire; Boulouis, Henri-Jean; Haddad, Nadia; Maillard, Renaud

    2014-11-17

    Anaplasma phagocytophilum is a zoonotic and obligate intracellular bacterium transmitted by ticks. In domestic ruminants, it is the causative agent of tick-borne fever, which causes significant economic losses in Europe. As A. phagocytophilum is difficult to isolate and cultivate, only nine genome sequences have been published to date, none of which originate from a bovine strain.Our goals were to; 1/ develop a sequencing methodology which efficiently circumvents the difficulties associated with A. phagocytophilum isolation and culture; 2/ describe the first genome of a bovine strain; and 3/ compare it with available genomes, in order to both explore key genomic features at the species level, and to identify candidate genes that could be specific to bovine strains. DNA was extracted from a bovine blood sample infected by A. phagocytophilum. Following a whole genome capture approach, A. phagocytophilum DNA was enriched 197-fold in the sample and then sequenced using Illumina technology. In total, 58.9% of obtained reads corresponded to the A. phagocytophilum genome, covering 85.3% of the HZ genome. Then by performing comparisons with nine previously-sequenced A. phagocytophilum genomes, we determined the core genome of these ten strains. Following analysis, 1281 coding DNA sequences, including 1001 complete sequences, were detected in the A. phagocytophilum bovine genome, of which four appeared to be unique to the bovine isolate. These four coding DNA sequences coded for "hypothetical proteins of unknown function" and require further analysis. We also identified nine proteins common to both European domestic ruminants tested. Using a whole genome capture approach, we have sequenced the first A. phagocytophilum genome isolated from a cow. To the best of our knowledge, this is the first time that this method has been used to selectively enrich pathogenic bacterial DNA from samples also containing host DNA. The four proteins unique to the A. phagocytophilum bovine

  16. Genomic and comparative genomic analyses of Rickettsia heilongjiangensis provide insight into its evolution and pathogenesis.

    PubMed

    Duan, Changsong; Xiong, Xiaolu; Qi, Yong; Gong, Wenping; Jiao, Jun; Wen, Bohai

    2014-08-01

    Rickettsia heilongjiangensis, the causative agent of far eastern spotted fever, is an obligate intracellular gram-negative bacterium that belongs to the spotted fever group rickettsiae. To understand the evolution and pathogenesis of R. heilongjiangensis, we analyzed its genome and compared it with other rickettsial genomes available in GenBank. The R. heilongjiangensis chromosome contains 1333 genes, including 1297 protein coding genes and 36 RNA coding genes. The genome also contains 121 pseudogenes, 54 insertion sequences, and 39 tandem repeats. Sixteen genes encoding the major components of the type IV secretion systems were identified in the R. heilongjiangensis genome. In total, 37 β-barrel outer membrane proteins were predicted in the genome, eight of which have been previously confirmed to be outer membrane proteins. In addition, 266 potential virulence factor genes, seven partially deleted antibiotic resistance genes, and a genomic island were identified in the genome. The codon usage in the genome is compatible with its low GC content, and the amino acid usage shows apparent bias. A comparative genomic analysis showed that R. heilongjiangensis and R. japonica share one unique fragment that may be a target sequence for a diagnostic assay. The orthologs of 37 genes of R. heilongjiangensis were found in pathogenic R. rickettsii str. Sheila Smith but not in non-pathogenic R. rickettsii str. Iowa, which may explain why R. heilongjiangensis is pathogenic. Pan-genome analysis showed that R. heilongjiangensis and 42 other rickettsiae strains share 693 core genes with a pan-genome size of 4837 genes. The pan-genome-based phylogeny showed that R. heilongjiangensis was closely related to R. japonica.

  17. Lactobacillus paracasei comparative genomics: towards species pan-genome definition and exploitation of diversity.

    PubMed

    Smokvina, Tamara; Wels, Michiel; Polka, Justyna; Chervaux, Christian; Brisse, Sylvain; Boekhorst, Jos; van Hylckama Vlieg, Johan E T; Siezen, Roland J

    2013-01-01

    Lactobacillus paracasei is a member of the normal human and animal gut microbiota and is used extensively in the food industry in starter cultures for dairy products or as probiotics. With the development of low-cost, high-throughput sequencing techniques it has become feasible to sequence many different strains of one species and to determine its "pan-genome". We have sequenced the genomes of 34 different L. paracasei strains, and performed a comparative genomics analysis. We analysed genome synteny and content, focussing on the pan-genome, core genome and variable genome. Each genome was shown to contain around 2800-3100 protein-coding genes, and comparative analysis identified over 4200 ortholog groups that comprise the pan-genome of this species, of which about 1800 ortholog groups make up the conserved core. Several factors previously associated with host-microbe interactions such as pili, cell-envelope proteinase, hydrolases p40 and p75 or the capacity to produce short branched-chain fatty acids (bkd operon) are part of the L. paracasei core genome present in all analysed strains. The variome consists mainly of hypothetical proteins, phages, plasmids, transposon/conjugative elements, and known functions such as sugar metabolism, cell-surface proteins, transporters, CRISPR-associated proteins, and EPS biosynthesis proteins. An enormous variety and variability of sugar utilization gene cassettes were identified, with each strain harbouring between 25-53 cassettes, reflecting the high adaptability of L. paracasei to different niches. A phylogenomic tree was constructed based on total genome contents, and together with an analysis of horizontal gene transfer events we conclude that evolution of these L. paracasei strains is complex and not always related to niche adaptation. The results of this genome content comparison was used, together with high-throughput growth experiments on various carbohydrates, to perform gene-trait matching analysis, in order to link

  18. Sputnik: a database platform for comparative plant genomics.

    PubMed

    Rudd, Stephen; Mewes, Hans-Werner; Mayer, Klaus F X

    2003-01-01

    Two million plant ESTs, from 20 different plant species, and totalling more than one 1000 Mbp of DNA sequence, represents a formidable transcriptomic resource. Sputnik uses the potential of this sequence resource to fill some of the information gap in the un-sequenced plant genomes and to serve as the foundation for in silicio comparative plant genomics. The complexity of the individual EST collections has been reduced using optimised EST clustering techniques. Annotation of cluster sequences is performed by exploiting and transferring information from the comprehensive knowledgebase already produced for the completed model plant genome (Arabidopsis thaliana) and by performing additional state of-the-art sequence analyses relevant to today's plant biologist. Functional predictions, comparative analyses and associative annotations for 500 000 plant EST derived peptides make Sputnik (http://mips.gsf.de/proj/sputnik/) a valid platform for contemporary plant genomics.

  19. The MicrobesOnline Web site for comparative genomics

    SciTech Connect

    Alm, Eric J.; Huang, Katherine H.; Price, Morgan N.; Koche,Richard P.; Keller, Keith; Dubchak, Inna L.; Arkin, Adam P.

    2004-11-05

    At present, hundreds of microbial genomes have been sequenced, and hundreds more are currently in the pipeline. The Virtual Institute for Microbial Stress and Survival has developed a publicly available suite of Web-based comparative genomic tools (http://www.microbesonline.org) designed to facilitate multispecies comparison among prokaryotes. Highlights of the Microbes Online Web site include operon and regulon predictions, a multispecies genome browser, a multispecies Gene Ontology browser, a comparative KEGG metabolic pathway viewer, a Bioinformatics Workbench for in-depth sequence analysis, and Gene Carts that allow users to save genes of interest for further study while they browse. In addition, we provide an interface for genome annotation, which like all of the tools reported here, is freely available to the scientific community.

  20. Comparative Genomics via Wavelet Analysis for Closely Related Bacteria

    NASA Astrophysics Data System (ADS)

    Song, Jiuzhou; Ware, Tony; Liu, Shu-Lin; Surette, M.

    2004-12-01

    Comparative genomics has been a valuable method for extracting and extrapolating genome information among closely related bacteria. The efficiency of the traditional methods is extremely influenced by the software method used. To overcome the problem here, we propose using wavelet analysis to perform comparative genomics. First, global comparison using wavelet analysis gives the difference at a quantitative level. Then local comparison using keto-excess or purine-excess plots shows precise positions of inversions, translocations, and horizontally transferred DNA fragments. We firstly found that the level of energy spectra difference is related to the similarity of bacteria strains; it could be a quantitative index to describe the similarities of genomes. The strategy is described in detail by comparisons of closely related strains: S.typhi CT18, S.typhi Ty2, S.typhimurium LT2, H.pylori 26695, and H.pylori J99.

  1. Sputnik: a database platform for comparative plant genomics

    PubMed Central

    Rudd, Stephen; Mewes, Hans-Werner; Mayer, Klaus F.X.

    2003-01-01

    Two million plant ESTs, from 20 different plant species, and totalling more than one 1000 Mbp of DNA sequence, represents a formidable transcriptomic resource. Sputnik uses the potential of this sequence resource to fill some of the information gap in the un-sequenced plant genomes and to serve as the foundation for in silicio comparative plant genomics. The complexity of the individual EST collections has been reduced using optimised EST clustering techniques. Annotation of cluster sequences is performed by exploiting and transferring information from the comprehensive knowledgebase already produced for the completed model plant genome (Arabidopsis thaliana) and by performing additional state of-the-art sequence analyses relevant to today's plant biologist. Functional predictions, comparative analyses and associative annotations for 500 000 plant EST derived peptides make Sputnik (http://mips.gsf.de/proj/sputnik/) a valid platform for contemporary plant genomics. PMID:12519965

  2. PSAT: A web tool to compare genomic neighborhoods of multiple prokaryotic genomes

    PubMed Central

    Fong, Christine; Rohmer, Laurence; Radey, Matthew; Wasnick, Michael; Brittnacher, Mitchell J

    2008-01-01

    Background The conservation of gene order among prokaryotic genomes can provide valuable insight into gene function, protein interactions, or events by which genomes have evolved. Although some tools are available for visualizing and comparing the order of genes between genomes of study, few support an efficient and organized analysis between large numbers of genomes. The Prokaryotic Sequence homology Analysis Tool (PSAT) is a web tool for comparing gene neighborhoods among multiple prokaryotic genomes. Results PSAT utilizes a database that is preloaded with gene annotation, BLAST hit results, and gene-clustering scores designed to help identify regions of conserved gene order. Researchers use the PSAT web interface to find a gene of interest in a reference genome and efficiently retrieve the sequence homologs found in other bacterial genomes. The tool generates a graphic of the genomic neighborhood surrounding the selected gene and the corresponding regions for its homologs in each comparison genome. Homologs in each region are color coded to assist users with analyzing gene order among various genomes. In contrast to common comparative analysis methods that filter sequence homolog data based on alignment score cutoffs, PSAT leverages gene context information for homologs, including those with weak alignment scores, enabling a more sensitive analysis. Features for constraining or ordering results are designed to help researchers browse results from large numbers of comparison genomes in an organized manner. PSAT has been demonstrated to be useful for helping to identify gene orthologs and potential functional gene clusters, and detecting genome modifications that may result in loss of function. Conclusion PSAT allows researchers to investigate the order of genes within local genomic neighborhoods of multiple genomes. A PSAT web server for public use is available for performing analyses on a growing set of reference genomes through any web browser with no client

  3. Comparative genomics reveals 104 candidate structured RNAs from bacteria, archaea, and their metagenomes

    PubMed Central

    2010-01-01

    Background Structured noncoding RNAs perform many functions that are essential for protein synthesis, RNA processing, and gene regulation. Structured RNAs can be detected by comparative genomics, in which homologous sequences are identified and inspected for mutations that conserve RNA secondary structure. Results By applying a comparative genomics-based approach to genome and metagenome sequences from bacteria and archaea, we identified 104 candidate structured RNAs and inferred putative functions for many of these. Twelve candidate metabolite-binding RNAs were identified, three of which were validated, including one reported herein that binds the coenzyme S-adenosylmethionine. Newly identified cis-regulatory RNAs are implicated in photosynthesis or nitrogen regulation in cyanobacteria, purine and one-carbon metabolism, stomach infection by Helicobacter, and many other physiological processes. A candidate riboswitch termed crcB is represented in both bacteria and archaea. Another RNA motif may control gene expression from 3'-untranslated regions of mRNAs, which is unusual for bacteria. Many noncoding RNAs that likely act in trans are also revealed, and several of the noncoding RNA candidates are found mostly or exclusively in metagenome DNA sequences. Conclusions This work greatly expands the variety of highly structured noncoding RNAs known to exist in bacteria and archaea and provides a starting point for biochemical and genetic studies needed to validate their biologic functions. Given the sustained rate of RNA discovery over several similar projects, we expect that far more structured RNAs remain to be discovered from bacterial and archaeal organisms. PMID:20230605

  4. Alfresco—A Workbench for Comparative Genomic Sequence Analysis

    PubMed Central

    Jareborg, Niclas; Durbin, Richard

    2000-01-01

    Comparative analysis of genomic sequences provides a powerful tool for identifying regions of potential biologic function; by comparing corresponding regions of genomes from suitable species, protein coding or regulatory regions can be identified by their homology. This requires the use of several specific types of computational analysis tools. Many programs exist for these types of analysis; not many exist for overall view/control of the results, which is necessary for large-scale genomic sequence analysis. Using Java, we have developed a new visualization tool that allows effective comparative genome sequence analysis. The program handles a pair of sequences from putatively homologous regions in different species. Results from various different existing external analysis programs, such as database searching, gene prediction, repeat masking, and alignment programs, are visualized and used to find corresponding functional sequence domains in the two sequences. The user interacts with the program through a graphic display of the genome regions, in which an independently scrollable and zoomable symbolic representation of the sequences is shown. As an example, the analysis of two unannotated orthologous genomic sequences from human and mouse containing parts of the UTY locus is presented. PMID:10958633

  5. Gramene 2016: comparative plant genomics and pathway resources.

    PubMed

    Tello-Ruiz, Marcela K; Stein, Joshua; Wei, Sharon; Preece, Justin; Olson, Andrew; Naithani, Sushma; Amarasinghe, Vindhya; Dharmawardhana, Palitha; Jiao, Yinping; Mulvaney, Joseph; Kumari, Sunita; Chougule, Kapeel; Elser, Justin; Wang, Bo; Thomason, James; Bolser, Daniel M; Kerhornou, Arnaud; Walts, Brandon; Fonseca, Nuno A; Huerta, Laura; Keays, Maria; Tang, Y Amy; Parkinson, Helen; Fabregat, Antonio; McKay, Sheldon; Weiser, Joel; D'Eustachio, Peter; Stein, Lincoln; Petryszak, Robert; Kersey, Paul J; Jaiswal, Pankaj; Ware, Doreen

    2016-01-04

    Gramene (http://www.gramene.org) is an online resource for comparative functional genomics in crops and model plant species. Its two main frameworks are genomes (collaboration with Ensembl Plants) and pathways (The Plant Reactome and archival BioCyc databases). Since our last NAR update, the database website adopted a new Drupal management platform. The genomes section features 39 fully assembled reference genomes that are integrated using ontology-based annotation and comparative analyses, and accessed through both visual and programmatic interfaces. Additional community data, such as genetic variation, expression and methylation, are also mapped for a subset of genomes. The Plant Reactome pathway portal (http://plantreactome.gramene.org) provides a reference resource for analyzing plant metabolic and regulatory pathways. In addition to ∼ 200 curated rice reference pathways, the portal hosts gene homology-based pathway projections for 33 plant species. Both the genome and pathway browsers interface with the EMBL-EBI's Expression Atlas to enable the projection of baseline and differential expression data from curated expression studies in plants. Gramene's archive website (http://archive.gramene.org) continues to provide previously reported resources on comparative maps, markers and QTL. To further aid our users, we have also introduced a live monthly educational webinar series and a Gramene YouTube channel carrying video tutorials. Published by Oxford University Press on behalf of Nucleic Acids Research 2015. This work is written by (a) US Government employee(s) and is in the public domain in the US.

  6. Gramene 2016: comparative plant genomics and pathway resources

    PubMed Central

    Tello-Ruiz, Marcela K.; Stein, Joshua; Wei, Sharon; Preece, Justin; Olson, Andrew; Naithani, Sushma; Amarasinghe, Vindhya; Dharmawardhana, Palitha; Jiao, Yinping; Mulvaney, Joseph; Kumari, Sunita; Chougule, Kapeel; Elser, Justin; Wang, Bo; Thomason, James; Bolser, Daniel M.; Kerhornou, Arnaud; Walts, Brandon; Fonseca, Nuno A.; Huerta, Laura; Keays, Maria; Tang, Y. Amy; Parkinson, Helen; Fabregat, Antonio; McKay, Sheldon; Weiser, Joel; D'Eustachio, Peter; Stein, Lincoln; Petryszak, Robert; Kersey, Paul J.; Jaiswal, Pankaj; Ware, Doreen

    2016-01-01

    Gramene (http://www.gramene.org) is an online resource for comparative functional genomics in crops and model plant species. Its two main frameworks are genomes (collaboration with Ensembl Plants) and pathways (The Plant Reactome and archival BioCyc databases). Since our last NAR update, the database website adopted a new Drupal management platform. The genomes section features 39 fully assembled reference genomes that are integrated using ontology-based annotation and comparative analyses, and accessed through both visual and programmatic interfaces. Additional community data, such as genetic variation, expression and methylation, are also mapped for a subset of genomes. The Plant Reactome pathway portal (http://plantreactome.gramene.org) provides a reference resource for analyzing plant metabolic and regulatory pathways. In addition to ∼200 curated rice reference pathways, the portal hosts gene homology-based pathway projections for 33 plant species. Both the genome and pathway browsers interface with the EMBL-EBI's Expression Atlas to enable the projection of baseline and differential expression data from curated expression studies in plants. Gramene's archive website (http://archive.gramene.org) continues to provide previously reported resources on comparative maps, markers and QTL. To further aid our users, we have also introduced a live monthly educational webinar series and a Gramene YouTube channel carrying video tutorials. PMID:26553803

  7. Comparative proteogenomics: combining mass spectrometry and comparative genomics to analyze multiple genomes

    SciTech Connect

    Gupta, Nitin; Benhamida, Jamal; Bhargava, Vipul; Goodman, Daniel; Kain , Elisabeth; Kerman, Ian; Nguyen , Ngan; Ollikainen, Noah; Rodriguez, Jesse; Wang, J.; Lipton, Mary S.; Romine, Margaret F.; Bafna, Vineet; Smith, Richard D.; Pevzner, Pavel A.

    2008-07-30

    While bacterial genome annotations have significantly improved in recent years, techniques for bacterial proteome annotation (including post-translational chemical modifications, signal peptides, proteolytic events, etc.) are still in their infancy. At the same time, the number of sequenced bacterial genomes is rising sharply, far outpacing our ability to validate the predicted genes, let alone annotate bacterial proteomes. In this study, we use tandem mass spectrometry (MS/MS) to annotate the proteome of Shewanella oneidensis MR-1, an important microbe for bioremediation. In particular, we provide the first comprehensive map of post-translational modifications in a bacterial genome, including a large number of chemical modifications, signal peptide cleavages and cleavage of N-terminal methionine residues. We also detect multiple genes that were missed or assigned incorrect start positions by gene prediction programs and suggest corrections to improve the gene annotation. This study demonstrates that complementing every genome sequencing project by an MS/MS project would significantly improve both genome and proteome annotations for a reasonable cost.

  8. Mammalian Comparative Genomics Reveals Genetic and Epigenetic Features Associated with Genome Reshuffling in Rodentia

    PubMed Central

    Capilla, Laia; Sánchez-Guillén, Rosa Ana; Farré, Marta; Paytuví-Gallart, Andreu; Malinverni, Roberto; Ventura, Jacint; Larkin, Denis M.

    2016-01-01

    Abstract Understanding how mammalian genomes have been reshuffled through structural changes is fundamental to the dynamics of its composition, evolutionary relationships between species and, in the long run, speciation. In this work, we reveal the evolutionary genomic landscape in Rodentia, the most diverse and speciose mammalian order, by whole-genome comparisons of six rodent species and six representative outgroup mammalian species. The reconstruction of the evolutionary breakpoint regions across rodent phylogeny shows an increased rate of genome reshuffling that is approximately two orders of magnitude greater than in other mammalian species here considered. We identified novel lineage and clade-specific breakpoint regions within Rodentia and analyzed their gene content, recombination rates and their relationship with constitutive lamina genomic associated domains, DNase I hypersensitivity sites and chromatin modifications. We detected an accumulation of protein-coding genes in evolutionary breakpoint regions, especially genes implicated in reproduction and pheromone detection and mating. Moreover, we found an association of the evolutionary breakpoint regions with active chromatin state landscapes, most probably related to gene enrichment. Our results have two important implications for understanding the mechanisms that govern and constrain mammalian genome evolution. The first is that the presence of genes related to species-specific phenotypes in evolutionary breakpoint regions reinforces the adaptive value of genome reshuffling. Second, that chromatin conformation, an aspect that has been often overlooked in comparative genomic studies, might play a role in modeling the genomic distribution of evolutionary breakpoints. PMID:28175287

  9. Comparative genomics of phages and prophages in lactic acid bacteria.

    PubMed

    Desiere, Frank; Lucchini, Sacha; Canchaya, Carlos; Ventura, Marco; Brüssow, Harald

    2002-08-01

    Comparative phage genomics has become possible due to the availability of more than 100 complete phage genome sequences and the development of powerful bioinformatics tools. This technology, profiting from classical molecular-biology knowledge, has opened avenues of research for topics, which were difficult to address in the past. Now, it is possible to retrace part of the evolutionary history of phage modules by comparative genomics. The diagnosis of relatedness is hereby not uniquely based on sequence similarity alone, but includes topological considerations of genome organization. Detailed transcription maps have allowed in silico predictions of genome organization to be verified and refined. This comparative knowledge is providing the basis for a new taxonomic classification concept for bacteriophages infecting low G + C-content Gram-positive bacteria based on the genetic organization of the structural gene module. An Sfi21-like and an Sfi11-like genus of Siphoviridae is proposed. The gene maps of many phages show remarkable synteny in their structural genes defining a lambda super-group within Siphoviridae. A hierarchy of relatedness within the lambda super-group suggests elements of vertical evolution in Siphoviridae. Tailed phages are the result of both vertical and horizontal evolution and are thus fascinating objects for the study of molecular evolution. Prophage sequences integrated into the genomes of their bacterial host present theoretical challenges for evolutionary biologists. Prophages represent up to 10% of the genome in some LAB. In pathogenic streptococci prophages confer genes of selective value for the lysogenic cell. The lysogenic conversion genes are located between the lysin gene and the right phage attachment site. Non-attributed genes were found at the same genome position of prophages from lactic streptococci. These genes belong to the few prophage genes transcribed in the lysogen. Prophages from dairy bacteria might therefore also

  10. SUPERFAMILY--sophisticated comparative genomics, data mining, visualization and phylogeny.

    PubMed

    Wilson, Derek; Pethica, Ralph; Zhou, Yiduo; Talbot, Charles; Vogel, Christine; Madera, Martin; Chothia, Cyrus; Gough, Julian

    2009-01-01

    SUPERFAMILY provides structural, functional and evolutionary information for proteins from all completely sequenced genomes, and large sequence collections such as UniProt. Protein domain assignments for over 900 genomes are included in the database, which can be accessed at http://supfam.org/. Hidden Markov models based on Structural Classification of Proteins (SCOP) domain definitions at the superfamily level are used to provide structural annotation. We recently produced a new model library based on SCOP 1.73. Family level assignments are also available. From the web site users can submit sequences for SCOP domain classification; search for keywords such as superfamilies, families, organism names, models and sequence identifiers; find over- and underrepresented families or superfamilies within a genome relative to other genomes or groups of genomes; compare domain architectures across selections of genomes and finally build multiple sequence alignments between Protein Data Bank (PDB), genomic and custom sequences. Recent extensions to the database include InterPro abstracts and Gene Ontology terms for superfamiles, taxonomic visualization of the distribution of families across the tree of life, searches for functionally similar domain architectures and phylogenetic trees. The database, models and associated scripts are available for download from the ftp site.

  11. SUPERFAMILY—sophisticated comparative genomics, data mining, visualization and phylogeny

    PubMed Central

    Wilson, Derek; Pethica, Ralph; Zhou, Yiduo; Talbot, Charles; Vogel, Christine; Madera, Martin; Chothia, Cyrus; Gough, Julian

    2009-01-01

    SUPERFAMILY provides structural, functional and evolutionary information for proteins from all completely sequenced genomes, and large sequence collections such as UniProt. Protein domain assignments for over 900 genomes are included in the database, which can be accessed at http://supfam.org/. Hidden Markov models based on Structural Classification of Proteins (SCOP) domain definitions at the superfamily level are used to provide structural annotation. We recently produced a new model library based on SCOP 1.73. Family level assignments are also available. From the web site users can submit sequences for SCOP domain classification; search for keywords such as superfamilies, families, organism names, models and sequence identifiers; find over- and underrepresented families or superfamilies within a genome relative to other genomes or groups of genomes; compare domain architectures across selections of genomes and finally build multiple sequence alignments between Protein Data Bank (PDB), genomic and custom sequences. Recent extensions to the database include InterPro abstracts and Gene Ontology terms for superfamiles, taxonomic visualization of the distribution of families across the tree of life, searches for functionally similar domain architectures and phylogenetic trees. The database, models and associated scripts are available for download from the ftp site. PMID:19036790

  12. hfAIM: A reliable bioinformatics approach for in silico genome-wide identification of autophagy-associated Atg8-interacting motifs in various organisms.

    PubMed

    Xie, Qingjun; Tzfadia, Oren; Levy, Matan; Weithorn, Efrat; Peled-Zehavi, Hadas; Van Parys, Thomas; Van de Peer, Yves; Galili, Gad

    2016-05-03

    Most of the proteins that are specifically turned over by selective autophagy are recognized by the presence of short Atg8 interacting motifs (AIMs) that facilitate their association with the autophagy apparatus. Such AIMs can be identified by bioinformatics methods based on their defined degenerate consensus F/W/Y-X-X-L/I/V sequences in which X represents any amino acid. Achieving reliability and/or fidelity of the prediction of such AIMs on a genome-wide scale represents a major challenge. Here, we present a bioinformatics approach, high fidelity AIM (hfAIM), which uses additional sequence requirements-the presence of acidic amino acids and the absence of positively charged amino acids in certain positions-to reliably identify AIMs in proteins. We demonstrate that the use of the hfAIM method allows for in silico high fidelity prediction of AIMs in AIM-containing proteins (ACPs) on a genome-wide scale in various organisms. Furthermore, by using hfAIM to identify putative AIMs in the Arabidopsis proteome, we illustrate a potential contribution of selective autophagy to various biological processes. More specifically, we identified 9 peroxisomal PEX proteins that contain hfAIM motifs, among which AtPEX1, AtPEX6 and AtPEX10 possess evolutionary-conserved AIMs. Bimolecular fluorescence complementation (BiFC) results verified that AtPEX6 and AtPEX10 indeed interact with Atg8 in planta. In addition, we show that mutations occurring within or nearby hfAIMs in PEX1, PEX6 and PEX10 caused defects in the growth and development of various organisms. Taken together, the above results suggest that the hfAIM tool can be used to effectively perform genome-wide in silico screens of proteins that are potentially regulated by selective autophagy. The hfAIM system is a web tool that can be accessed at link: http://bioinformatics.psb.ugent.be/hfAIM/.

  13. Comparative Genomic and Phylogenomic Analyses Reveal a Conserved Core Genome Shared by Estuarine and Oceanic Cyanopodoviruses.

    PubMed

    Huang, Sijun; Zhang, Si; Jiao, Nianzhi; Chen, Feng

    2015-01-01

    Podoviruses are among the major viral groups that infect marine picocyanobacteria Prochlorococcus and Synechococcus. Here, we reported the genome sequences of five Synechococcus podoviruses isolated from the estuarine environment, and performed comparative genomic and phylogenomic analyses based on a total of 20 cyanopodovirus genomes. The genomes of all the known marine cyanopodoviruses are highly syntenic. A pan-genome of 349 clustered orthologous groups was determined, among which 15 were core genes. These core genes make up nearly half of each genome in length, reflecting the high level of genome conservation among this cyanophage type. The whole genome phylogenies based on concatenated core genes and gene content were highly consistent and confirmed the separation of two discrete marine cyanopodovirus clusters MPP-A and MPP-B. The genomes within cluster MPP-B grouped into subclusters mainly corresponding to Prochlorococcus or Synechococcus host types. Auxiliary metabolic genes tend to occur in a specific phylogenetic group of these cyanopodoviruses. All the MPP-B phages analyzed here encode the photosynthesis gene psbA, which are absent in all the MPP-A genomes thus far. Interestingly, all the MPP-B and two MPP-A Synechococcus podoviruses encode the thymidylate synthase gene thyX, while at the same genome locus all the MPP-B Prochlorococcus podoviruses encode the transaldolase gene talC. Both genes are hypothesized to have the potential to facilitate the biosynthesis of deoxynucleotide for phage replication. Inheritance of specific functional genes could be important to the evolution and ecological fitness of certain cyanophage genotypes. Our analyses demonstrate that cyanopodoviruses of estuarine and oceanic origins share a conserved core genome and suggest that accessory genes may be related to environmental adaptation.

  14. Piggy-BACing the human genome I: constructing a porcine BAC physical map through comparative genomics.

    PubMed

    Rogatcheva, Margarita B; Chen, Kefei; Larkin, Denis M; Meyers, Stacey N; Marron, Brandy M; He, Weisong; Schook, Lawrence B; Beever, Jonathan E

    2008-01-01

    Availability of the human genome sequence and high similarity between humans and pigs at the molecular level provides an opportunity to use a comparative mapping approach to piggy-BAC the human genome. In order to advance the pig genome sequencing initiative, sequence similarity between large-scale porcine BAC-end sequences (BESs) and human genome sequence was used to construct a comparatively-anchored porcine physical map that is a first step towards sequencing the pig genome. A total of 50,300 porcine BAC clones were end-sequenced, yielding 76,906 BESs after trimming with an average read length of 538 bp. To anchor the porcine BACs on the human genome, these BESs were subjected to BLAST analysis using the human draft sequence, revealing 31.5% significant hits (E < e(-5)). Both genic and non-genic regions of homology contributed to the alignments between the human and porcine genomes. Porcine BESs with unique homology matches within the human genome provided a source of markers spaced approximately 70 to 300 kb along each human chromosome. In order to evaluate the utility of piggy-BACing human genome sequences, and confirm predictions of orthology, 193 evenly spaced BESs with similarity to HSA3 and HSA21 were selected and then utilized for developing a high-resolution (1.22 Mb) comparative radiation hybrid map of SSC13 that represents a fusion of HSA3 and HSA21. Resulting RH mapping of SSC13 covers 99% and 97% of HSA3 and HSA21, respectively. Seven evolutionary conserved blocks were identified including six on HSA3 and a single syntenic block corresponding to HSA21. The strategy of piggy-BACing the human genome described in this study demonstrates that through a directed, targeted comparative genomics approach construction of a high-resolution anchored physical map of the pig genome can be achieved. This map supports the selection of BACs to construct a minimal tiling path for genome sequencing and targeted gap filling. Moreover, this approach is highly relevant

  15. OGRe: a relational database for comparative analysis of mitochondrial genomes

    PubMed Central

    Jameson, Daniel; Gibson, Andrew P.; Hudelot, Cendrine; Higgs, Paul G.

    2003-01-01

    Organellar Genome Retrieval (OGRe) is a relational database of complete mitochondrial genome sequences for over 250 Metazoan species. OGRe provides a resource for the comparative analysis of mitochondrial genomes at several levels. At the sequence level, OGRe allows the retrieval of any selected set of mitochondrial genes from any selected set of species. Species are classified using a taxonomic system that allows easy selection of related groups of species. Sequence alignments are also available for some species. At the level of individual nucleotides, the system contains information on base frequencies and codon usage frequencies that can be compared between organisms. At the level of whole genomes, OGRe provides several ways of visualizing information on gene order. Diagrams illustrating the genome arrangement can be generated for any selected set of species automatically from the information in the database. Searches can be done based on gene arrangement to find sets of species that have the same order as one another. Diagrams for pairwise comparison of species can be produced that show the positions of break-points in the gene order and use colour to highlight the sections of the genome that have moved. OGRe is available from http://www.bioinf.man.ac.uk/ogre. PMID:12519982

  16. DCODE.ORG Anthology of Comparative Genomic Tools

    SciTech Connect

    Loots, G G; Ovcharenko, I

    2005-01-11

    Comparative genomics provides the means to demarcate functional regions in anonymous DNA sequences. The successful application of this method to identifying novel genes is currently shifting to deciphering the noncoding encryption of gene regulation across genomes. To facilitate the use of comparative genomics to practical applications in genetics and genomics we have developed several analytical and visualization tools for the analysis of arbitrary sequences and whole genomes. These tools include two alignment tools: zPicture and Mulan; a phylogenetic shadowing tool: eShadow for identifying lineage- and species-specific functional elements; two evolutionary conserved transcription factor analysis tools: rVista and multiTF; a tool for extracting cis-regulatory modules governing the expression of co-regulated genes, CREME; and a dynamic portal to multiple vertebrate and invertebrate genome alignments, the ECR Browser. Here we briefly describe each one of these tools and provide specific examples on their practical applications. All the tools are publicly available at the http://www.dcode.org/ web site.

  17. Assigning protein functions by comparative genome analysis protein phylogenetic profiles

    DOEpatents

    Pellegrini, Matteo; Marcotte, Edward M.; Thompson, Michael J.; Eisenberg, David; Grothe, Robert; Yeates, Todd O.

    2003-05-13

    A computational method system, and computer program are provided for inferring functional links from genome sequences. One method is based on the observation that some pairs of proteins A' and B' have homologs in another organism fused into a single protein chain AB. A trans-genome comparison of sequences can reveal these AB sequences, which are Rosetta Stone sequences because they decipher an interaction between A' and B. Another method compares the genomic sequence of two or more organisms to create a phylogenetic profile for each protein indicating its presence or absence across all the genomes. The profile provides information regarding functional links between different families of proteins. In yet another method a combination of the above two methods is used to predict functional links.

  18. Phytozome: a Tool for Green Plant Comparative Genomics

    DOE Data Explorer

    Phytozome is a joint project of the Department of Energy's Joint Genome Institute and the Center for Integrative Genomics to facilitate comparative genomic studies amongst green plants. Clusters of orthologous and paralogous genes that represent the modern descendents of ancestral gene sets are constructed at key phylogenetic nodes. These clusters allow easy access to clade specific orthology/paralogy relationships as well as clade specific genes and gene expansions. As of release v4.0, Phytozome provides access to nine sequenced and annotated green plant genomes, eight of which have been clustered into gene families at six evolutionarily significant nodes. Where possible, each gene has been annotated with PFAM, KOG, KEGG, and PANTHER assignments, and publicly available annotations from RefSeq, UniProt, TAIR, JGI are hyper-linked and searchable. [Copied from the Overview at http://www.phytozome.net/Phytozome_info.php

  19. Comparative Bacterial Proteomics: Analysis of the Core Genome Concept

    PubMed Central

    Callister, Stephen J.; McCue, Lee Ann; Turse, Joshua E.; Monroe, Matthew E.; Auberry, Kenneth J.; Smith, Richard D.; Adkins, Joshua N.; Lipton, Mary S.

    2008-01-01

    While comparative bacterial genomic studies commonly predict a set of genes indicative of common ancestry, experimental validation of the existence of this core genome requires extensive measurement and is typically not undertaken. Enabled by an extensive proteome database developed over six years, we have experimentally verified the expression of proteins predicted from genomic ortholog comparisons among 17 environmental and pathogenic bacteria. More exclusive relationships were observed among the expressed protein content of phenotypically related bacteria, which is indicative of the specific lifestyles associated with these organisms. Although genomic studies can establish relative orthologous relationships among a set of bacteria and propose a set of ancestral genes, our proteomics study establishes expressed lifestyle differences among conserved genes and proposes a set of expressed ancestral traits. PMID:18253490

  20. Comparative Bacterial Proteomics: Analysis of the Core Genome Concept

    SciTech Connect

    Callister, Stephen J.; McCue, Lee Ann; Turse, Josh E.; Monroe, Matthew E.; Auberry, Kenneth J.; Smith, Richard D.; Adkins, Joshua N.; Lipton, Mary S.

    2008-02-06

    Comparative bacterial genomic studies commonly predict a set of genes indicative of common ancestry. Experimental validation of the existence of this core genome requires extensive measurement and is not typically undertaken. Enabled by an extensive proteome database development over a six year period, we experimentally verified the expression of proteins predicted from genomic ortholog comparisons among 17 environmental and pathogenic bacteria. More exclusive relationships were observed among the expressed protein content of phenotypically related bacteria, which is indicative of the specific lifestyles associated with these organisms. While genomic studies establish relative orthologous relationships among a set of bacteria and propose a set of ancestral genes, our proteomics study establishes expressed lifestyle differences among conserved genes and proposes a set of expressed ancestral traits.

  1. FLAGdb(++): A Bioinformatic Environment to Study and Compare Plant Genomes.

    PubMed

    Tamby, Jean Philippe; Brunaud, Véronique

    2017-01-01

    Today, the growing knowledge and data accumulation on plant genomes do not solve in a simple way the task of gene function inference. Because data of different types are coming from various sources, we need to integrate and analyze them to help biologists in this task. We created FLAGdb(++) ( http://tools.ips2.u-psud.fr/FLAGdb ) to take up this challenge for a selection of plant genomes. In order to enrich gene function predictions, structural and functional annotations of the genomes are explored to generate meta-data and to compare them. Since data are numerous and complex, we focused on accessibility and visualization with an original and user-friendly interface. In this chapter we present the main tools of FLAGdb(++) and a use-case to explore a gene family: structural and functional properties of this family and research of orthologous genes in the other plant genomes.

  2. Comparative Genomics of Mycobacteria: Some Answers, Yet More New Questions

    PubMed Central

    Behr, Marcel A.

    2015-01-01

    Comparative genomic studies permit a genus-level perspective on the distinction between environmental mycobacteria and Mycobacterium tuberculosis, as well as a species-level assessment of genetic variability within M. tuberculosis. Both of these strata of evolutionary analysis serve to generate hypotheses regarding the genomic basis of M. tuberculosis virulence. In contrasting lessons from macroevolutionary study and microevolutionary study, one can form predictions about which segments of the genome are likely to be essential for or dispensable for the pathogenesis of tuberculosis. Although some of these predictions have been experimentally verified, notable exceptions challenge the direct link between these virulence factors and the capacity of M. tuberculosis to successfully cause disease and propagate between human hosts. These unexpected findings serve as the stimulus for further studies, using genomic comparisons and other approaches, to better define the remarkable success of this recalcitrant pathogen. PMID:25395374

  3. Genome-wide computational analysis reveals cardiomyocyte-specific transcriptional Cis-regulatory motifs that enable efficient cardiac gene therapy.

    PubMed

    Rincon, Melvin Y; Sarcar, Shilpita; Danso-Abeam, Dina; Keyaerts, Marleen; Matrai, Janka; Samara-Kuko, Ermira; Acosta-Sanchez, Abel; Athanasopoulos, Takis; Dickson, George; Lahoutte, Tony; De Bleser, Pieter; VandenDriessche, Thierry; Chuah, Marinee K

    2015-01-01

    Gene therapy is a promising emerging therapeutic modality for the treatment of cardiovascular diseases and hereditary diseases that afflict the heart. Hence, there is a need to develop robust cardiac-specific expression modules that allow for stable expression of the gene of interest in cardiomyocytes. We therefore explored a new approach based on a genome-wide bioinformatics strategy that revealed novel cardiac-specific cis-acting regulatory modules (CS-CRMs). These transcriptional modules contained evolutionary-conserved clusters of putative transcription factor binding sites that correspond to a "molecular signature" associated with robust gene expression in the heart. We then validated these CS-CRMs in vivo using an adeno-associated viral vector serotype 9 that drives a reporter gene from a quintessential cardiac-specific α-myosin heavy chain promoter. Most de novo designed CS-CRMs resulted in a >10-fold increase in cardiac gene expression. The most robust CRMs enhanced cardiac-specific transcription 70- to 100-fold. Expression was sustained and restricted to cardiomyocytes. We then combined the most potent CS-CRM4 with a synthetic heart and muscle-specific promoter (SPc5-12) and obtained a significant 20-fold increase in cardiac gene expression compared to the cytomegalovirus promoter. This study underscores the potential of rational vector design to improve the robustness of cardiac gene therapy.

  4. Genome-wide Computational Analysis Reveals Cardiomyocyte-specific Transcriptional Cis-regulatory Motifs That Enable Efficient Cardiac Gene Therapy

    PubMed Central

    Rincon, Melvin Y; Sarcar, Shilpita; Danso-Abeam, Dina; Keyaerts, Marleen; Matrai, Janka; Samara-Kuko, Ermira; Acosta-Sanchez, Abel; Athanasopoulos, Takis; Dickson, George; Lahoutte, Tony; De Bleser, Pieter; VandenDriessche, Thierry; Chuah, Marinee K

    2015-01-01

    Gene therapy is a promising emerging therapeutic modality for the treatment of cardiovascular diseases and hereditary diseases that afflict the heart. Hence, there is a need to develop robust cardiac-specific expression modules that allow for stable expression of the gene of interest in cardiomyocytes. We therefore explored a new approach based on a genome-wide bioinformatics strategy that revealed novel cardiac-specific cis-acting regulatory modules (CS-CRMs). These transcriptional modules contained evolutionary-conserved clusters of putative transcription factor binding sites that correspond to a “molecular signature” associated with robust gene expression in the heart. We then validated these CS-CRMs in vivo using an adeno-associated viral vector serotype 9 that drives a reporter gene from a quintessential cardiac-specific α-myosin heavy chain promoter. Most de novo designed CS-CRMs resulted in a >10-fold increase in cardiac gene expression. The most robust CRMs enhanced cardiac-specific transcription 70- to 100-fold. Expression was sustained and restricted to cardiomyocytes. We then combined the most potent CS-CRM4 with a synthetic heart and muscle-specific promoter (SPc5-12) and obtained a significant 20-fold increase in cardiac gene expression compared to the cytomegalovirus promoter. This study underscores the potential of rational vector design to improve the robustness of cardiac gene therapy. PMID:25195597

  5. Genome-wide comparative analysis of the IQD gene families in Arabidopsis thaliana and Oryza sativa.

    PubMed

    Abel, Steffen; Savchenko, Tatyana; Levy, Maggie

    2005-12-20

    Calcium signaling plays a prominent role in plants for coordinating a wide range of developmental processes and responses to environmental cues. Stimulus-specific generation of intracellular calcium transients, decoding of calcium signatures, and transformation of the signal into cellular responses are integral modules of the transduction process. Several hundred proteins with functions in calcium signaling circuits have been identified, and the number of downstream targets of calcium sensors is expected to increase. We previously identified a novel, calmodulin-binding nuclear protein, IQD1, which stimulates glucosinolate accumulation and plant defense in Arabidopsis thaliana. Here, we present a comparative genome-wide analysis of a new class of putative calmodulin target proteins in Arabidopsis and rice. We identified and analyzed 33 and 29 IQD1-like genes in Arabidopsis thaliana and Oryza sativa, respectively. The encoded IQD proteins contain a plant-specific domain of 67 conserved amino acid residues, referred to as the IQ67 domain, which is characterized by a unique and repetitive arrangement of three different calmodulin recruitment motifs, known as the IQ, 1-5-10, and 1-8-14 motifs. We demonstrated calmodulin binding for IQD20, the smallest IQD protein in Arabidopsis, which consists of a C-terminal IQ67 domain and a short N-terminal extension. A striking feature of IQD proteins is the high isoelectric point (approximately 10.3) and frequency of serine residues (approximately 11%). We compared the Arabidopsis and rice IQD gene families in terms of gene structure, chromosome location, predicted protein properties and motifs, phylogenetic relationships, and evolutionary history. The existence of an IQD-like gene in bryophytes suggests that IQD proteins are an ancient family of calmodulin-binding proteins and arose during the early evolution of land plants. Comparative phylogenetic analyses indicate that the major IQD gene lineages originated before the monocot

  6. Genome-wide comparative analysis of the IQD gene families in Arabidopsis thaliana and Oryza sativa

    PubMed Central

    Abel, Steffen; Savchenko, Tatyana; Levy, Maggie

    2005-01-01

    Background Calcium signaling plays a prominent role in plants for coordinating a wide range of developmental processes and responses to environmental cues. Stimulus-specific generation of intracellular calcium transients, decoding of calcium signatures, and transformation of the signal into cellular responses are integral modules of the transduction process. Several hundred proteins with functions in calcium signaling circuits have been identified, and the number of downstream targets of calcium sensors is expected to increase. We previously identified a novel, calmodulin-binding nuclear protein, IQD1, which stimulates glucosinolate accumulation and plant defense in Arabidopsis thaliana. Here, we present a comparative genome-wide analysis of a new class of putative calmodulin target proteins in Arabidopsis and rice. Results We identified and analyzed 33 and 29 IQD1-like genes in Arabidopsis thaliana and Oryza sativa, respectively. The encoded IQD proteins contain a plant-specific domain of 67 conserved amino acid residues, referred to as the IQ67 domain, which is characterized by a unique and repetitive arrangement of three different calmodulin recruitment motifs, known as the IQ, 1-5-10, and 1-8-14 motifs. We demonstrated calmodulin binding for IQD20, the smallest IQD protein in Arabidopsis, which consists of a C-terminal IQ67 domain and a short N-terminal extension. A striking feature of IQD proteins is the high isoelectric point (~10.3) and frequency of serine residues (~11%). We compared the Arabidopsis and rice IQD gene families in terms of gene structure, chromosome location, predicted protein properties and motifs, phylogenetic relationships, and evolutionary history. The existence of an IQD-like gene in bryophytes suggests that IQD proteins are an ancient family of calmodulin-binding proteins and arose during the early evolution of land plants. Conclusion Comparative phylogenetic analyses indicate that the major IQD gene lineages originated before the

  7. Origin replication complex binding, nucleosome depletion patterns, and a primary sequence motif can predict origins of replication in a genome with epigenetic centromeres.

    PubMed

    Tsai, Hung-Ji; Baller, Joshua A; Liachko, Ivan; Koren, Amnon; Burrack, Laura S; Hickman, Meleah A; Thevandavakkam, Mathuravani A; Rusche, Laura N; Berman, Judith

    2014-09-02

    Origins of DNA replication are key genetic elements, yet their identification remains elusive in most organisms. In previous work, we found that centromeres contain origins of replication (ORIs) that are determined epigenetically in the pathogenic yeast Candida albicans. In this study, we used origin recognition complex (ORC) binding and nucleosome occupancy patterns in Saccharomyces cerevisiae and Kluyveromyces lactis to train a machine learning algorithm to predict the position of active arm (noncentromeric) origins in the C. albicans genome. The model identified bona fide active origins as determined by the presence of replication intermediates on nondenaturing two-dimensional (2D) gels. Importantly, these origins function at their native chromosomal loci and also as autonomously replicating sequences (ARSs) on a linear plasmid. A "mini-ARS screen" identified at least one and often two ARS regions of ≥100 bp within each bona fide origin. Furthermore, a 15-bp AC-rich consensus motif was associated with the predicted origins and conferred autonomous replicating activity to the mini-ARSs. Thus, while centromeres and the origins associated with them are epigenetic, arm origins are dependent upon critical DNA features, such as a binding site for ORC and a propensity for nucleosome exclusion. DNA replication machinery is highly conserved, yet the definition of exactly what specifies a replication origin differs in different species. Here, we utilized computational genomics to predict origin locations in Candida albicans by combining locations of binding sites for the conserved origin replication complex, necessary for replication initiation, together with chromatin organization patterns. We identified predicted sequences that exhibited bona fide origin function and developed a linear plasmid assay to delimit the DNA fragments necessary for origin function. Additionally, we found that a short AC-rich motif, which is enriched in predicted origins, is required for

  8. Malignant canine mammary tumours: Preliminary genomic insights using oligonucleotide array comparative genomic hybridisation analysis.

    PubMed

    Santos, Marta; Dias-Pereira, Patrícia; Williams, Christina; Lopes, Carlos; Breen, Matthew

    2017-03-28

    Neoplastic mammary disease in female dogs represents a major health concern for dog owners and veterinarians, but the genomic basis of the disease is poorly understood. In this study, we performed high resolution oligonucleotide array comparative genomic hybridisation (oaCGH) to assess genome wide DNA copy number changes in 10 malignant canine mammary tumours from seven female dogs, including multiple tumours collected at one time from each of three female dogs. In all but two tumours, genomic imbalances were detected, with losses being more common than gains. Canine chromosomes 9, 22, 26, 27, 34 and X were most frequently affected. Dissimilar oaCGH ratio profiles were observed in multiple tumours from the same dogs, providing preliminary evidence for probable independent pathogenesis. Analysis of adjacent samples of one tumour revealed regional differences in the number of genomic imbalances, suggesting heterogeneity within tumours.

  9. The tiger genome and comparative analysis with lion and snow leopard genomes.

    PubMed

    Cho, Yun Sung; Hu, Li; Hou, Haolong; Lee, Hang; Xu, Jiaohui; Kwon, Soowhan; Oh, Sukhun; Kim, Hak-Min; Jho, Sungwoong; Kim, Sangsoo; Shin, Young-Ah; Kim, Byung Chul; Kim, Hyunmin; Kim, Chang-Uk; Luo, Shu-Jin; Johnson, Warren E; Koepfli, Klaus-Peter; Schmidt-Küntzel, Anne; Turner, Jason A; Marker, Laurie; Harper, Cindy; Miller, Susan M; Jacobs, Wilhelm; Bertola, Laura D; Kim, Tae Hyung; Lee, Sunghoon; Zhou, Qian; Jung, Hyun-Ju; Xu, Xiao; Gadhvi, Priyvrat; Xu, Pengwei; Xiong, Yingqi; Luo, Yadan; Pan, Shengkai; Gou, Caiyun; Chu, Xiuhui; Zhang, Jilin; Liu, Sanyang; He, Jing; Chen, Ying; Yang, Linfeng; Yang, Yulan; He, Jiaju; Liu, Sha; Wang, Junyi; Kim, Chul Hong; Kwak, Hwanjong; Kim, Jong-Soo; Hwang, Seungwoo; Ko, Junsu; Kim, Chang-Bae; Kim, Sangtae; Bayarlkhagva, Damdin; Paek, Woon Kee; Kim, Seong-Jin; O'Brien, Stephen J; Wang, Jun; Bhak, Jong

    2013-01-01

    Tigers and their close relatives (Panthera) are some of the world's most endangered species. Here we report the de novo assembly of an Amur tiger whole-genome sequence as well as the genomic sequences of a white Bengal tiger, African lion, white African lion and snow leopard. Through comparative genetic analyses of these genomes, we find genetic signatures that may reflect molecular adaptations consistent with the big cats' hypercarnivorous diet and muscle strength. We report a snow leopard-specific genetic determinant in EGLN1 (Met39>Lys39), which is likely to be associated with adaptation to high altitude. We also detect a TYR260G>A mutation likely responsible for the white lion coat colour. Tiger and cat genomes show similar repeat composition and an appreciably conserved synteny. Genomic data from the five big cats provide an invaluable resource for resolving easily identifiable phenotypes evident in very close, but distinct, species.

  10. The tiger genome and comparative analysis with lion and snow leopard genomes

    PubMed Central

    Cho, Yun Sung; Hu, Li; Hou, Haolong; Lee, Hang; Xu, Jiaohui; Kwon, Soowhan; Oh, Sukhun; Kim, Hak-Min; Jho, Sungwoong; Kim, Sangsoo; Shin, Young-Ah; Kim, Byung Chul; Kim, Hyunmin; Kim, Chang-uk; Luo, Shu-Jin; Johnson, Warren E.; Koepfli, Klaus-Peter; Schmidt-Küntzel, Anne; Turner, Jason A.; Marker, Laurie; Harper, Cindy; Miller, Susan M.; Jacobs, Wilhelm; Bertola, Laura D.; Kim, Tae Hyung; Lee, Sunghoon; Zhou, Qian; Jung, Hyun-Ju; Xu, Xiao; Gadhvi, Priyvrat; Xu, Pengwei; Xiong, Yingqi; Luo, Yadan; Pan, Shengkai; Gou, Caiyun; Chu, Xiuhui; Zhang, Jilin; Liu, Sanyang; He, Jing; Chen, Ying; Yang, Linfeng; Yang, Yulan; He, Jiaju; Liu, Sha; Wang, Junyi; Kim, Chul Hong; Kwak, Hwanjong; Kim, Jong-Soo; Hwang, Seungwoo; Ko, Junsu; Kim, Chang-Bae; Kim, Sangtae; Bayarlkhagva, Damdin; Paek, Woon Kee; Kim, Seong-Jin; O’Brien, Stephen J.; Wang, Jun; Bhak, Jong

    2013-01-01

    Tigers and their close relatives (Panthera) are some of the world’s most endangered species. Here we report the de novo assembly of an Amur tiger whole-genome sequence as well as the genomic sequences of a white Bengal tiger, African lion, white African lion and snow leopard. Through comparative genetic analyses of these genomes, we find genetic signatures that may reflect molecular adaptations consistent with the big cats’ hypercarnivorous diet and muscle strength. We report a snow leopard-specific genetic determinant in EGLN1 (Met39>Lys39), which is likely to be associated with adaptation to high altitude. We also detect a TYR260G>A mutation likely responsible for the white lion coat colour. Tiger and cat genomes show similar repeat composition and an appreciably conserved synteny. Genomic data from the five big cats provide an invaluable resource for resolving easily identifiable phenotypes evident in very close, but distinct, species. PMID:24045858

  11. Low-pass sequencing for microbial comparative genomics

    PubMed Central

    Goo, Young Ah; Roach, Jared; Glusman, Gustavo; Baliga, Nitin S; Deutsch, Kerry; Pan, Min; Kennedy, Sean; DasSarma, Shiladitya; Victor Ng, Wailap; Hood, Leroy

    2004-01-01

    Background We studied four extremely halophilic archaea by low-pass shotgun sequencing: (1) the metabolically versatile Haloarcula marismortui; (2) the non-pigmented Natrialba asiatica; (3) the psychrophile Halorubrum lacusprofundi and (4) the Dead Sea isolate Halobaculum gomorrense. Approximately one thousand single pass genomic sequences per genome were obtained. The data were analyzed by comparative genomic analyses using the completed Halobacterium sp. NRC-1 genome as a reference. Low-pass shotgun sequencing is a simple, inexpensive, and rapid approach that can readily be performed on any cultured microbe. Results As expected, the four archaeal halophiles analyzed exhibit both bacterial and eukaryotic characteristics as well as uniquely archaeal traits. All five halophiles exhibit greater than sixty percent GC content and low isoelectric points (pI) for their predicted proteins. Multiple insertion sequence (IS) elements, often involved in genome rearrangements, were identified in H. lacusprofundi and H. marismortui. The core biological functions that govern cellular and genetic mechanisms of H. sp. NRC-1 appear to be conserved in these four other halophiles. Multiple TATA box binding protein (TBP) and transcription factor IIB (TFB) homologs were identified from most of the four shotgunned halophiles. The reconstructed molecular tree of all five halophiles shows a large divergence between these species, but with the closest relationship being between H. sp. NRC-1 and H. lacusprofundi. Conclusion Despite the diverse habitats of these species, all five halophiles share (1) high GC content and (2) low protein isoelectric points, which are characteristics associated with environmental exposure to UV radiation and hypersalinity, respectively. Identification of multiple IS elements in the genome of H. lacusprofundi and H. marismortui suggest that genome structure and dynamic genome reorganization might be similar to that previously observed in the IS-element rich

  12. Low-pass sequencing for microbial comparative genomics.

    PubMed

    Goo, Young Ah; Roach, Jared; Glusman, Gustavo; Baliga, Nitin S; Deutsch, Kerry; Pan, Min; Kennedy, Sean; DasSarma, Shiladitya; Ng, Wailap Victor; Hood, Leroy

    2004-01-12

    We studied four extremely halophilic archaea by low-pass shotgun sequencing: (1) the metabolically versatile Haloarcula marismortui; (2) the non-pigmented Natrialba asiatica; (3) the psychrophile Halorubrum lacusprofundi and (4) the Dead Sea isolate Halobaculum gomorrense. Approximately one thousand single pass genomic sequences per genome were obtained. The data were analyzed by comparative genomic analyses using the completed Halobacterium sp. NRC-1 genome as a reference. Low-pass shotgun sequencing is a simple, inexpensive, and rapid approach that can readily be performed on any cultured microbe. As expected, the four archaeal halophiles analyzed exhibit both bacterial and eukaryotic characteristics as well as uniquely archaeal traits. All five halophiles exhibit greater than sixty percent GC content and low isoelectric points (pI) for their predicted proteins. Multiple insertion sequence (IS) elements, often involved in genome rearrangements, were identified in H. lacusprofundi and H. marismortui. The core biological functions that govern cellular and genetic mechanisms of H. sp. NRC-1 appear to be conserved in these four other halophiles. Multiple TATA box binding protein (TBP) and transcription factor IIB (TFB) homologs were identified from most of the four shotgunned halophiles. The reconstructed molecular tree of all five halophiles shows a large divergence between these species, but with the closest relationship being between H. sp. NRC-1 and H. lacusprofundi. Despite the diverse habitats of these species, all five halophiles share (1) high GC content and (2) low protein isoelectric points, which are characteristics associated with environmental exposure to UV radiation and hypersalinity, respectively. Identification of multiple IS elements in the genome of H. lacusprofundi and H. marismortui suggest that genome structure and dynamic genome reorganization might be similar to that previously observed in the IS-element rich genome of H. sp. NRC-1

  13. Comparative genomics of the Rab protein family in Apicomplexan parasites

    PubMed Central

    Langsley, Gordon; van Noort, Vera; Carret, Céline; Meissner, Markus; de Villiers, Etienne P.; Bishop, Richard; Pain, Arnab

    2008-01-01

    Rab genes encode a subgroup of small GTP-binding proteins within the ras super-family that regulate targeting and fusion of transport vesicles within the secretory and endocytic pathways. These genes are of particular interest in the protozoan phylum Apicomplexa, since a family of Rab GTPases has been described for Plasmodium and most putative secretory pathway proteins in Apicomplexa have conventional predicted signal peptides. Moreover, peptide motifs have now been identified within a large number of secreted Plasmodium proteins that direct their targeting to the red blood cell cytosol, the apicoplast, the food vacuole and Maurer's clefs; in contrast, motifs that direct proteins to secretory organelles (rhoptries, micronemes and microspheres) have yet to be defined. The nature of the vesicle in which these proteins are transported to their destinations remains unknown and morphological structures equivalent to the endoplasmic reticulum and trans-Golgi stacks typical of other eukaryotes cannot be visualised in Apicomplexa. Since Rab GTPases regulate vesicular traffic in all eukaryotes, and this traffic in intracellular parasites could regulate import of nutrient and drugs and export of antigens, host cell modulatory proteins and lactate we compare and contrast here the Rab families of Apicomplexa. PMID:18468471

  14. MIDDAS-M: Motif-Independent De Novo Detection of Secondary Metabolite Gene Clusters through the Integration of Genome Sequencing and Transcriptome Data

    PubMed Central

    Umemura, Myco; Koike, Hideaki; Nagano, Nozomi; Ishii, Tomoko; Kawano, Jin; Yamane, Noriko; Kozone, Ikuko; Horimoto, Katsuhisa; Shin-ya, Kazuo; Asai, Kiyoshi; Yu, Jiujiang; Bennett, Joan W.; Machida, Masayuki

    2013-01-01

    Many bioactive natural products are produced as “secondary metabolites” by plants, bacteria, and fungi. During the middle of the 20th century, several secondary metabolites from fungi revolutionized the pharmaceutical industry, for example, penicillin, lovastatin, and cyclosporine. They are generally biosynthesized by enzymes encoded by clusters of coordinately regulated genes, and several motif-based methods have been developed to detect secondary metabolite biosynthetic (SMB) gene clusters using the sequence information of typical SMB core genes such as polyketide synthases (PKS) and non-ribosomal peptide synthetases (NRPS). However, no detection method exists for SMB gene clusters that are functional and do not include core SMB genes at present. To advance the exploration of SMB gene clusters, especially those without known core genes, we developed MIDDAS-M, a motif-independent de novo detection algorithm for SMB gene clusters. We integrated virtual gene cluster generation in an annotated genome sequence with highly sensitive scoring of the cooperative transcriptional regulation of cluster member genes. MIDDAS-M accurately predicted 38 SMB gene clusters that have been experimentally confirmed and/or predicted by other motif-based methods in 3 fungal strains. MIDDAS-M further identified a new SMB gene cluster for ustiloxin B, which was experimentally validated. Sequence analysis of the cluster genes indicated a novel mechanism for peptide biosynthesis independent of NRPS. Because it is fully computational and independent of empirical knowledge about SMB core genes, MIDDAS-M allows a large-scale, comprehensive analysis of SMB gene clusters, including those with novel biosynthetic mechanisms that do not contain any functionally characterized genes. PMID:24391870

  15. Initial sequence and comparative analysis of the cat genome

    PubMed Central

    Pontius, Joan U.; Mullikin, James C.; Smith, Douglas R.; Lindblad-Toh, Kerstin; Gnerre, Sante; Clamp, Michele; Chang, Jean; Stephens, Robert; Neelam, Beena; Volfovsky, Natalia; Schäffer, Alejandro A.; Agarwala, Richa; Narfström, Kristina; Murphy, William J.; Giger, Urs; Roca, Alfred L.; Antunes, Agostinho; Menotti-Raymond, Marilyn; Yuhki, Naoya; Pecon-Slattery, Jill; Johnson, Warren E.; Bourque, Guillaume; Tesler, Glenn; O’Brien, Stephen J.

    2007-01-01

    The genome sequence (1.9-fold coverage) of an inbred Abyssinian domestic cat was assembled, mapped, and annotated with a comparative approach that involved cross-reference to annotated genome assemblies of six mammals (human, chimpanzee, mouse, rat, dog, and cow). The results resolved chromosomal positions for 663,480 contigs, 20,285 putative feline gene orthologs, and 133,499 conserved sequence blocks (CSBs). Additional annotated features include repetitive elements, endogenous retroviral sequences, nuclear mitochondrial (numt) sequences, micro-RNAs, and evolutionary breakpoints that suggest historic balancing of translocation and inversion incidences in distinct mammalian lineages. Large numbers of single nucleotide polymorphisms (SNPs), deletion insertion polymorphisms (DIPs), and short tandem repeats (STRs), suitable for linkage or association studies were characterized in the context of long stretches of chromosome homozygosity. In spite of the light coverage capturing ∼65% of euchromatin sequence from the cat genome, these comparative insights shed new light on the tempo and mode of gene/genome evolution in mammals, promise several research applications for the cat, and also illustrate that a comparative approach using more deeply covered mammals provides an informative, preliminary annotation of a light (1.9-fold) coverage mammal genome sequence. PMID:17975172

  16. The vacuolar protein sorting genes in insects: A comparative genome view.

    PubMed

    Li, Zhaofei; Blissard, Gary

    2015-07-01

    In eukaryotic cells, regulated vesicular trafficking is critical for directing protein transport and for recycling and degradation of membrane lipids and proteins. Through carefully regulated transport vesicles, the endomembrane system performs a large and important array of dynamic cellular functions while maintaining the integrity of the cellular membrane system. Genetic studies in yeast Saccharomyces cerevisiae have identified approximately 50 vacuolar protein sorting (VPS) genes involved in vesicle trafficking, and most of these genes are also characterized in mammals. The VPS proteins form distinct functional complexes, which include complexes known as ESCRT, retromer, CORVET, HOPS, GARP, and PI3K-III. Little is known about the orthologs of VPS proteins in insects. Here, with the newly annotated Manduca sexta genome, we carried out genomic comparative analysis of VPS proteins in yeast, humans, and 13 sequenced insect genomes representing the Orders Hymenoptera, Diptera, Hemiptera, Phthiraptera, Lepidoptera, and Coleoptera. Amino acid sequence alignments and domain/motif structure analyses reveal that most of the components of ESCRT, retromer, CORVET, HOPS, GARP, and PI3K-III are evolutionarily conserved across yeast, insects, and humans. However, in contrast to the VPS gene expansions observed in the human genome, only four VPS genes (VPS13, VPS16, VPS33, and VPS37) were expanded in the six insect Orders. Additionally, VPS2 was expanded only in species from Phthiraptera, Lepidoptera, and Coleoptera. These studies provide a baseline for understanding the evolution of vesicular trafficking across yeast, insect, and human genomes, and also provide a basis for further addressing specific functional roles of VPS proteins in insects. Copyright © 2014 Elsevier Ltd. All rights reserved.

  17. Motif enrichment tool.

    PubMed

    Blatti, Charles; Sinha, Saurabh

    2014-07-01

    The Motif Enrichment Tool (MET) provides an online interface that enables users to find major transcriptional regulators of their gene sets of interest. MET searches the appropriate regulatory region around each gene and identifies which transcription factor DNA-binding specificities (motifs) are statistically overrepresented. Motif enrichment analysis is currently available for many metazoan species including human, mouse, fruit fly, planaria and flowering plants. MET also leverages high-throughput experimental data such as ChIP-seq and DNase-seq from ENCODE and ModENCODE to identify the regulatory targets of a transcription factor with greater precision. The results from MET are produced in real time and are linked to a genome browser for easy follow-up analysis. Use of the web tool is free and open to all, and there is no login requirement. ADDRESS: http://veda.cs.uiuc.edu/MET/.

  18. Comparative genomic analysis of two brucellaphages of distant origins.

    PubMed

    Flores, Victor; López-Merino, Ahidé; Mendoza-Hernandez, Guillermo; Guarneros, Gabriel

    2012-04-01

    Here, we present the first complete genome sequence of brucellaphage Tbilisi (Tb) and compared it with that of Pr, a broad host-range brucellaphage recently isolated in Mexico. The genomes consist of 41,148 bp (Tb) and 38,253 bp (Pr), they differ mainly in the region encoding structural proteins, in which the genome of Tb shows two major insertions. Both genomes share 99.87% nucleotide identity, a high percentage of identity among phages isolated at so globally distant locations and temporally different occasions. Sequence analysis revealed 57 conserved ORFs, three transcriptional terminators and four putative transcriptional promoters. The co-occurrence of an ORF encoding a putative DnaA-like protein and a putative oriC-like origin of replication was found in both brucellaphages genomes, a feature not described in any other phage genome. These elements suggest that DNA replication in brucellaphages differs from other phages, and might resemble that of bacterial chromosomes. Copyright © 2012 Elsevier Inc. All rights reserved.

  19. Sequencing and comparative analyses of the genomes of zoysiagrasses

    PubMed Central

    Tanaka, Hidenori; Hirakawa, Hideki; Kosugi, Shunichi; Nakayama, Shinobu; Ono, Akiko; Watanabe, Akiko; Hashiguchi, Masatsugu; Gondo, Takahiro; Ishigaki, Genki; Muguerza, Melody; Shimizu, Katsuya; Sawamura, Noriko; Inoue, Takayasu; Shigeki, Yuichi; Ohno, Naoki; Tabata, Satoshi; Akashi, Ryo; Sato, Shusei

    2016-01-01

    Zoysia is a warm-season turfgrass, which comprises 11 allotetraploid species (2n = 4x = 40), each possessing different morphological and physiological traits. To characterize the genetic systems of Zoysia plants and to analyse their structural and functional differences in individual species and accessions, we sequenced the genomes of Zoysia species using HiSeq and MiSeq platforms. As a reference sequence of Zoysia species, we generated a high-quality draft sequence of the genome of Z. japonica accession ‘Nagirizaki’ (334 Mb) in which 59,271 protein-coding genes were predicted. In parallel, draft genome sequences of Z. matrella ‘Wakaba’ and Z. pacifica ‘Zanpa’ were also generated for comparative analyses. To investigate the genetic diversity among the Zoysia species, genome sequence reads of three additional accessions, Z. japonica ‘Kyoto’, Z. japonica ‘Miyagi’ and Z. matrella ‘Chiba Fair Green’, were accumulated, and aligned against the reference genome of ‘Nagirizaki’ along with those from ‘Wakaba’ and ‘Zanpa’. As a result, we detected 7,424,163 single-nucleotide polymorphisms and 852,488 short indels among these species. The information obtained in this study will be valuable for basic studies on zoysiagrass evolution and genetics as well as for the breeding of zoysiagrasses, and is made available in the ‘Zoysia Genome Database’ at http://zoysia.kazusa.or.jp. PMID:26975196

  20. Comparative analysis of rosaceous genomes and the reconstruction of a putative ancestral genome for the family

    PubMed Central

    2011-01-01

    Background Comparative genome mapping studies in Rosaceae have been conducted until now by aligning genetic maps within the same genus, or closely related genera and using a limited number of common markers. The growing body of genomics resources and sequence data for both Prunus and Fragaria permits detailed comparisons between these genera and the recently released Malus × domestica genome sequence. Results We generated a comparative analysis using 806 molecular markers that are anchored genetically to the Prunus and/or Fragaria reference maps, and physically to the Malus genome sequence. Markers in common for Malus and Prunus, and Malus and Fragaria, respectively were 784 and 148. The correspondence between marker positions was high and conserved syntenic blocks were identified among the three genera in the Rosaceae. We reconstructed a proposed ancestral genome for the Rosaceae. Conclusions A genome containing nine chromosomes is the most likely candidate for the ancestral Rosaceae progenitor. The number of chromosomal translocations observed between the three genera investigated was low. However, the number of inversions identified among Malus and Prunus was much higher than any reported genome comparisons in plants, suggesting that small inversions have played an important role in the evolution of these two genera or of the Rosaceae. PMID:21226921

  1. Comparative and demographic analysis of orang-utan genomes.

    PubMed

    Locke, Devin P; Hillier, LaDeana W; Warren, Wesley C; Worley, Kim C; Nazareth, Lynne V; Muzny, Donna M; Yang, Shiaw-Pyng; Wang, Zhengyuan; Chinwalla, Asif T; Minx, Pat; Mitreva, Makedonka; Cook, Lisa; Delehaunty, Kim D; Fronick, Catrina; Schmidt, Heather; Fulton, Lucinda A; Fulton, Robert S; Nelson, Joanne O; Magrini, Vincent; Pohl, Craig; Graves, Tina A; Markovic, Chris; Cree, Andy; Dinh, Huyen H; Hume, Jennifer; Kovar, Christie L; Fowler, Gerald R; Lunter, Gerton; Meader, Stephen; Heger, Andreas; Ponting, Chris P; Marques-Bonet, Tomas; Alkan, Can; Chen, Lin; Cheng, Ze; Kidd, Jeffrey M; Eichler, Evan E; White, Simon; Searle, Stephen; Vilella, Albert J; Chen, Yuan; Flicek, Paul; Ma, Jian; Raney, Brian; Suh, Bernard; Burhans, Richard; Herrero, Javier; Haussler, David; Faria, Rui; Fernando, Olga; Darré, Fleur; Farré, Domènec; Gazave, Elodie; Oliva, Meritxell; Navarro, Arcadi; Roberto, Roberta; Capozzi, Oronzo; Archidiacono, Nicoletta; Della Valle, Giuliano; Purgato, Stefania; Rocchi, Mariano; Konkel, Miriam K; Walker, Jerilyn A; Ullmer, Brygg; Batzer, Mark A; Smit, Arian F A; Hubley, Robert; Casola, Claudio; Schrider, Daniel R; Hahn, Matthew W; Quesada, Victor; Puente, Xose S; Ordoñez, Gonzalo R; López-Otín, Carlos; Vinar, Tomas; Brejova, Brona; Ratan, Aakrosh; Harris, Robert S; Miller, Webb; Kosiol, Carolin; Lawson, Heather A; Taliwal, Vikas; Martins, André L; Siepel, Adam; Roychoudhury, Arindam; Ma, Xin; Degenhardt, Jeremiah; Bustamante, Carlos D; Gutenkunst, Ryan N; Mailund, Thomas; Dutheil, Julien Y; Hobolth, Asger; Schierup, Mikkel H; Ryder, Oliver A; Yoshinaga, Yuko; de Jong, Pieter J; Weinstock, George M; Rogers, Jeffrey; Mardis, Elaine R; Gibbs, Richard A; Wilson, Richard K

    2011-01-27

    'Orang-utan' is derived from a Malay term meaning 'man of the forest' and aptly describes the southeast Asian great apes native to Sumatra and Borneo. The orang-utan species, Pongo abelii (Sumatran) and Pongo pygmaeus (Bornean), are the most phylogenetically distant great apes from humans, thereby providing an informative perspective on hominid evolution. Here we present a Sumatran orang-utan draft genome assembly and short read sequence data from five Sumatran and five Bornean orang-utan genomes. Our analyses reveal that, compared to other primates, the orang-utan genome has many unique features. Structural evolution of the orang-utan genome has proceeded much more slowly than other great apes, evidenced by fewer rearrangements, less segmental duplication, a lower rate of gene family turnover and surprisingly quiescent Alu repeats, which have played a major role in restructuring other primate genomes. We also describe a primate polymorphic neocentromere, found in both Pongo species, emphasizing the gradual evolution of orang-utan genome structure. Orang-utans have extremely low energy usage for a eutherian mammal, far lower than their hominid relatives. Adding their genome to the repertoire of sequenced primates illuminates new signals of positive selection in several pathways including glycolipid metabolism. From the population perspective, both Pongo species are deeply diverse; however, Sumatran individuals possess greater diversity than their Bornean counterparts, and more species-specific variation. Our estimate of Bornean/Sumatran speciation time, 400,000 years ago, is more recent than most previous studies and underscores the complexity of the orang-utan speciation process. Despite a smaller modern census population size, the Sumatran effective population size (N(e)) expanded exponentially relative to the ancestral N(e) after the split, while Bornean N(e) declined over the same period. Overall, the resources and analyses presented here offer new

  2. Comparative Analysis of Six Lagerstroemia Complete Chloroplast Genomes

    PubMed Central

    Xu, Chao; Dong, Wenpan; Li, Wenqing; Lu, Yizeng; Xie, Xiaoman; Jin, Xiaobai; Shi, Jipu; He, Kaihong; Suo, Zhili

    2017-01-01

    Crape myrtles are economically important ornamental trees of the genus Lagerstroemia L. (Lythraceae), with a distribution from tropical to northern temperate zones. They are positioned phylogenetically to a large subclade of rosids (in the eudicots) which contain more than 25% of all the angiosperms. They commonly bloom from summer till fall and are of significant value in city landscape and environmental protection. Morphological traits are shared inter-specifically among plants of Lagerstroemia to certain extent and are also influenced by environmental conditions and different developmental stages. Thus, classification of plants in Lagerstroemia at species and cultivar levels is still a challenging task. Chloroplast (cp) genome sequences have been proven to be an informative and valuable source of cp DNA markers for genetic diversity evaluation. In this study, the complete cp genomes of three Lagerstroemia species were newly sequenced, and three other published cp genome sequences of Lagerstroemia were retrieved for comparative analyses in order to obtain an upgraded understanding of the application value of genetic information from the cp genomes. The six cp genomes ranged from 152,049 bp (L. subcostata) to 152,526 bp (L. speciosa) in length. We analyzed nucleotide substitutions, insertions/deletions, and simple sequence repeats in the cp genomes, and discovered 12 relatively highly variable regions that will potentially provide plastid markers for further taxonomic, phylogenetic, and population genetics studies in Lagerstroemia. The phylogenetic relationships of the Lagerstroemia taxa inferred from the datasets from the cp genomes obtained high support, indicating that cp genome data may be useful in resolving relationships in this genus. PMID:28154574

  3. Floral gene resources from basal angiosperms for comparative genomics research.

    PubMed

    Albert, Victor A; Soltis, Douglas E; Carlson, John E; Farmerie, William G; Wall, P Kerr; Ilut, Daniel C; Solow, Teri M; Mueller, Lukas A; Landherr, Lena L; Hu, Yi; Buzgo, Matyas; Kim, Sangtae; Yoo, Mi-Jeong; Frohlich, Michael W; Perl-Treves, Rafael; Schlarbaum, Scott E; Bliss, Barbara J; Zhang, Xiaohong; Tanksley, Steven D; Oppenheimer, David G; Soltis, Pamela S; Ma, Hong; DePamphilis, Claude W; Leebens-Mack, James H

    2005-03-30

    The Floral Genome Project was initiated to bridge the genomic gap between the most broadly studied plant model systems. Arabidopsis and rice, although now completely sequenced and under intensive comparative genomic investigation, are separated by at least 125 million years of evolutionary time, and cannot in isolation provide a comprehensive perspective on structural and functional aspects of flowering plant genome dynamics. Here we discuss new genomic resources available to the scientific community, comprising cDNA libraries and Expressed Sequence Tag (EST) sequences for a suite of phylogenetically basal angiosperms specifically selected to bridge the evolutionary gaps between model plants and provide insights into gene content and genome structure in the earliest flowering plants. Random sequencing of cDNAs from representatives of phylogenetically important eudicot, non-grass monocot, and gymnosperm lineages has so far (as of 12/1/04) generated 70,514 ESTs and 48,170 assembled unigenes. Efficient sorting of EST sequences into putative gene families based on whole Arabidopsis/rice proteome comparison has permitted ready identification of cDNA clones for finished sequencing. Preliminarily, (i) proportions of functional categories among sequenced floral genes seem representative of the entire Arabidopsis transcriptome, (ii) many known floral gene homologues have been captured, and (iii) phylogenetic analyses of ESTs are providing new insights into the process of gene family evolution in relation to the origin and diversification of the angiosperms. Initial comparisons illustrate the utility of the EST data sets toward discovery of the basic floral transcriptome. These first findings also afford the opportunity to address a number of conspicuous evolutionary genomic questions, including reproductive organ transcriptome overlap between angiosperms and gymnosperms, genome-wide duplication history, lineage-specific gene duplication and functional divergence, and

  4. Evolution of cancer suppression as revealed by mammalian comparative genomics.

    PubMed

    Tollis, Marc; Schiffman, Joshua D; Boddy, Amy M

    2017-02-02

    Cancer suppression is an important feature in the evolution of large and long-lived animals. While some tumor suppression pathways are conserved among all multicellular organisms, others mechanisms of cancer resistance are uniquely lineage specific. Comparative genomics has become a powerful tool to discover these unique and shared molecular adaptations in respect to cancer suppression. These findings may one day be translated to human patients through evolutionary medicine. Here, we will review theory and methods of comparative cancer genomics and highlight major findings of cancer suppression across mammals. Our current knowledge of cancer genomics suggests that more efficient DNA repair and higher sensitivity to DNA damage may be the key to tumor suppression in large or long-lived mammals.

  5. Genome Properties: a system for the investigation of prokaryotic genetic content for microbiology, genome annotation and comparative genomics.

    PubMed

    Haft, Daniel H; Selengut, Jeremy D; Brinkac, Lauren M; Zafar, Nikhat; White, Owen

    2005-02-01

    The presence or absence of metabolic pathways and structures provide a context that makes protein annotation far more reliable. Compiling such information across microbial genomes improves the functional classification of proteins and provides a valuable resource for comparative genomics. We have created a Genome Properties system to present key aspects of prokaryotic biology using standardized computational methods and controlled vocabularies. Properties reflect gene content, phenotype, phylogeny and computational analyses. The results of searches using hidden Markov models allow many properties to be deduced automatically, especially for families of proteins (equivalogs) conserved in function since their last common ancestor. Additional properties are derived from curation, published reports and other forms of evidence. Genome Properties system was applied to 156 complete prokaryotic genomes, and is easily mined to find differences between species, correlations between metabolic features and families of uncharacterized proteins, or relationships among properties. Genome Properties can be found at http://www.tigr.org/Genome_Properties http://www.tigr.org/tigr-scripts/CMR2/genome_properties_references.spl.

  6. Genome informatics and vaccine targets in Corynebacterium urealyticum using two whole genomes, comparative genomics, and reverse vaccinology.

    PubMed

    Guimarães, Luis; Soares, Siomar; Trost, Eva; Blom, Jochen; Ramos, Rommel; Silva, Artur; Barh, Debmalya; Azevedo, Vasco

    2015-01-01

    Corynebacterium urealyticum is an opportunistic pathogen that normally lives on skin and mucous membranes in humans. This high Gram-positive bacteria can cause acute or encrusted cystitis, encrusted pyelitis, and pyelonephritis in immunocompromised patients. The bacteria is multi-drug resistant, and knowledge about the genes that contribute to its virulence is very limited. Two complete genome sequences were used in this comparative genomic study: C. urealyticum DSM 7109 and C. urealyticum DSM 7111. We used comparative genomics strategies to compare the two strains, DSM 7109 and DSM 7111, and to analyze their metabolic pathways, genome plasticity, and to predict putative antigenic targets. The genomes of these two strains together encode 2,115 non-redundant coding sequences, 1,823 of which are common to both genomes. We identified 188 strain-specific genes in DSM 7109 and 104 strain-specific genes in DSM 7111. The high number of strain-specific genes may be a result of horizontal gene transfer triggered by the large number of transposons in the genomes of these two strains. Screening for virulence factors revealed the presence of the spaDEF operon that encodes pili forming proteins. Therefore, spaDEF may play a pivotal role in facilitating the adhesion of the pathogen to the host tissue. Application of the reverse vaccinology method revealed 19 putative antigenic proteins that may be used in future studies as candidate drug or vaccine targets. The genome features and the presence of virulence factors in genomic islands in the two strains of C. urealyticum provide insights in the lifestyle of this opportunistic pathogen and may be useful in developing future therapeutic strategies.

  7. Whole genomic DNA sequencing and comparative genomic analysis of Arthrospira platensis: high genome plasticity and genetic diversity

    PubMed Central

    Xu, Teng; Qin, Song; Hu, Yongwu; Song, Zhijian; Ying, Jianchao; Li, Peizhen; Dong, Wei; Zhao, Fangqing; Yang, Huanming; Bao, Qiyu

    2016-01-01

    Arthrospira platensis is a multi-cellular and filamentous non-N2-fixing cyanobacterium that is capable of performing oxygenic photosynthesis. In this study, we determined the nearly complete genome sequence of A. platensis YZ. A. platensis YZ genome is a single, circular chromosome of 6.62 Mb in size. Phylogenetic and comparative genomic analyses revealed that A. platensis YZ was more closely related to A. platensis NIES-39 than Arthrospira sp. PCC 8005 and A. platensis C1. Broad gene gains were identified between A. platensis YZ and three other Arthrospira speices, some of which have been previously demonstrated that can be laterally transferred among different species, such as restriction-modification systems-coding genes. Moreover, unprecedented extensive chromosomal rearrangements among different strains were observed. The chromosomal rearrangements, particularly the chromosomal inversions, were analysed and estimated to be closely related to palindromes that involved long inverted repeat sequences and the extensively distributed type IIR restriction enzyme in the Arthrospira genome. In addition, species from genus Arthrospira unanimously contained the highest rate of repetitive sequence compared with the other species of order Oscillatoriales, suggested that sequence duplication significantly contributed to Arthrospira genome phylogeny. These results provided in-depth views into the genomic phylogeny and structural variation of A. platensis, as well as provide a valuable resource for functional genomics studies. PMID:27330141

  8. Comparative genomics of wild type yeast strains unveils important genome diversity

    PubMed Central

    Carreto, Laura; Eiriz, Maria F; Gomes, Ana C; Pereira, Patrícia M; Schuller, Dorit; Santos, Manuel AS

    2008-01-01

    Background Genome variability generates phenotypic heterogeneity and is of relevance for adaptation to environmental change, but the extent of such variability in natural populations is still poorly understood. For example, selected Saccharomyces cerevisiae strains are variable at the ploidy level, have gene amplifications, changes in chromosome copy number, and gross chromosomal rearrangements. This suggests that genome plasticity provides important genetic diversity upon which natural selection mechanisms can operate. Results In this study, we have used wild-type S. cerevisiae (yeast) strains to investigate genome variation in natural and artificial environments. We have used comparative genome hybridization on array (aCGH) to characterize the genome variability of 16 yeast strains, of laboratory and commercial origin, isolated from vineyards and wine cellars, and from opportunistic human infections. Interestingly, sub-telomeric instability was associated with the clinical phenotype, while Ty element insertion regions determined genomic differences of natural wine fermentation strains. Copy number depletion of ASP3 and YRF1 genes was found in all wild-type strains. Other gene families involved in transmembrane transport, sugar and alcohol metabolism or drug resistance had copy number changes, which also distinguished wine from clinical isolates. Conclusion We have isolated and genotyped more than 1000 yeast strains from natural environments and carried out an aCGH analysis of 16 strains representative of distinct genotype clusters. Important genomic variability was identified between these strains, in particular in sub-telomeric regions and in Ty-element insertion sites, suggesting that this type of genome variability is the main source of genetic diversity in natural populations of yeast. The data highlights the usefulness of yeast as a model system to unravel intraspecific natural genome diversity and to elucidate how natural selection shapes the yeast genome

  9. Whole genome annotation and comparative genomic analyses of bio-control fungus Purpureocillium lilacinum.

    PubMed

    Prasad, Pushplata; Varshney, Deepti; Adholeya, Alok

    2015-11-25

    The fungus Purpureocillium lilacinum is widely known as a biological control agent against plant parasitic nematodes. This research article consists of genomic annotation of the first draft of whole genome sequence of P. lilacinum. The study aims to decipher the putative genetic components of the fungus involved in nematode pathogenesis by performing comparative genomic analysis with nine closely related fungal species in Hypocreales. de novo genomic assembly was done and a total of 301 scaffolds were constructed for P. lilacinum genomic DNA. By employing structural genome prediction models, 13, 266 genes coding for proteins were predicted in the genome. Approximately 73% of the predicted genes were functionally annotated using Blastp, InterProScan and Gene Ontology. A 14.7% fraction of the predicted genes shared significant homology with genes in the Pathogen Host Interactions (PHI) database. The phylogenomic analysis carried out using maximum likelihood RAxML algorithm provided insight into the evolutionary relationship of P. lilacinum. In congruence with other closely related species in the Hypocreales namely, Metarhizium spp., Pochonia chlamydosporia, Cordyceps militaris, Trichoderma reesei and Fusarium spp., P. lilacinum has large gene sets coding for G-protein coupled receptors (GPCRs), proteases, glycoside hydrolases and carbohydrate esterases that are required for degradation of nematode-egg shell components. Screening of the genome by Antibiotics & Secondary Metabolite Analysis Shell (AntiSMASH) pipeline indicated that the genome potentially codes for a variety of secondary metabolites, possibly required for adaptation to heterogeneous lifestyles reported for P. lilacinum. Significant up-regulation of subtilisin-like serine protease genes in presence of nematode eggs in quantitative real-time analyses suggested potential role of serine proteases in nematode pathogenesis. The data offer a better understanding of Purpureocillium lilacinum genome and will

  10. Using comparative genomics to drive new discoveries in microbiology.

    PubMed

    Haft, Daniel H

    2015-02-01

    Bioinformatics looks to many microbiologists like a service industry. In this view, annotation starts with what is known from experiments in the lab, makes reasonable inferences of which genes match other genes in function, builds databases to make all that we know accessible, but creates nothing truly new. Experiments lead, then biocuration and computational biology follow. But the astounding success of genome sequencing is changing the annotation paradigm. Every genome sequenced is an intercepted coded message from the microbial world, and as all cryptographers know, it is easier to decode a thousand messages than a single message. Some biology is best discovered not by phenomenology, but by decoding genome content, forming hypotheses, and doing the first few rounds of validation computationally. Through such reasoning, a role and function may be assigned to a protein with no sequence similarity to any protein yet studied. Experimentation can follow after the discovery to cement and to extend the findings. Unfortunately, this approach remains so unfamiliar to most bench scientists that lab work and comparative genomics typically segregate to different teams working on unconnected projects. This review will discuss several themes in comparative genomics as a discovery method, including highly derived data, use of patterns of design to reason by analogy, and in silico testing of computationally generated hypotheses.

  11. Comparative omics-driven genome annotation refinement: application across Yersiniae.

    PubMed

    Schrimpe-Rutledge, Alexandra C; Jones, Marcus B; Chauhan, Sadhana; Purvine, Samuel O; Sanford, James A; Monroe, Matthew E; Brewer, Heather M; Payne, Samuel H; Ansong, Charles; Frank, Bryan C; Smith, Richard D; Peterson, Scott N; Motin, Vladimir L; Adkins, Joshua N

    2012-01-01

    Genome sequencing continues to be a rapidly evolving technology, yet most downstream aspects of genome annotation pipelines remain relatively stable or are even being abandoned. The annotation process is now performed almost exclusively in an automated fashion to balance the large number of sequences generated. One possible way of reducing errors inherent to automated computational annotations is to apply data from omics measurements (i.e. transcriptional and proteomic) to the un-annotated genome with a proteogenomic-based approach. Here, the concept of annotation refinement has been extended to include a comparative assessment of genomes across closely related species. Transcriptomic and proteomic data derived from highly similar pathogenic Yersiniae (Y. pestis CO92, Y. pestis Pestoides F, and Y. pseudotuberculosis PB1/+) was used to demonstrate a comprehensive comparative omic-based annotation methodology. Peptide and oligo measurements experimentally validated the expression of nearly 40% of each strain's predicted proteome and revealed the identification of 28 novel and 68 incorrect (i.e., observed frameshifts, extended start sites, and translated pseudogenes) protein-coding sequences within the three current genome annotations. Gene loss is presumed to play a major role in Y. pestis acquiring its niche as a virulent pathogen, thus the discovery of many translated pseudogenes, including the insertion-ablated argD, underscores a need for functional analyses to investigate hypotheses related to divergence. Refinements included the discovery of a seemingly essential ribosomal protein, several virulence-associated factors, a transcriptional regulator, and many hypothetical proteins that were missed during annotation.

  12. CFGP: a web-based, comparative fungal genomics platform.

    PubMed

    Park, Jongsun; Park, Bongsoo; Jung, Kyongyong; Jang, Suwang; Yu, Kwangyul; Choi, Jaeyoung; Kong, Sunghyung; Park, Jaejin; Kim, Seryun; Kim, Hyojeong; Kim, Soonok; Kim, Jihyun F; Blair, Jaime E; Lee, Kwangwon; Kang, Seogchan; Lee, Yong-Hwan

    2008-01-01

    Since the completion of the Saccharomyces cerevisiae genome sequencing project in 1996, the genomes of over 80 fungal species have been sequenced or are currently being sequenced. Resulting data provide opportunities for studying and comparing fungal biology and evolution at the genome level. To support such studies, the Comparative Fungal Genomics Platform (CFGP; http://cfgp.snu.ac.kr), a web-based multifunctional informatics workbench, was developed. The CFGP comprises three layers, including the basal layer, middleware and the user interface. The data warehouse in the basal layer contains standardized genome sequences of 65 fungal species. The middleware processes queries via six analysis tools, including BLAST, ClustalW, InterProScan, SignalP 3.0, PSORT II and a newly developed tool named BLASTMatrix. The BLASTMatrix permits the identification and visualization of genes homologous to a query across multiple species. The Data-driven User Interface (DUI) of the CFGP was built on a new concept of pre-collecting data and post-executing analysis instead of the 'fill-in-the-form-and-press-SUBMIT' user interfaces utilized by most bioinformatics sites. A tool termed Favorite, which supports the management of encapsulated sequence data and provides a personalized data repository to users, is another novel feature in the DUI.

  13. Comparative genomics and evolution of transcriptional regulons in Proteobacteria

    PubMed Central

    Kazakov, Alexey E.; Ravcheev, Dmitry A.; Stepanova, Vita V.; Novichkov, Pavel S.

    2016-01-01

    Comparative genomics approaches are broadly used for analysis of transcriptional regulation in bacterial genomes. In this work, we identified binding sites and reconstructed regulons for 33 orthologous groups of transcription factors (TFs) in 196 reference genomes from 21 taxonomic groups of Proteobacteria. Overall, we predict over 10 600 TF binding sites and identified more than 15 600 target genes for 1896 TFs constituting the studied orthologous groups of regulators. These include a set of orthologues for 21 metabolism-associated TFs from Escherichia coli and/or Shewanella that are conserved in five or more taxonomic groups and several additional TFs that represent non-orthologous substitutions of the metabolic regulators in some lineages of Proteobacteria. By comparing gene contents of the reconstructed regulons, we identified the core, taxonomy-specific and genome-specific TF regulon members and classified them by their metabolic functions. Detailed analysis of ArgR, TyrR, TrpR, HutC, HypR and other amino-acid-specific regulons demonstrated remarkable differences in regulatory strategies used by various lineages of Proteobacteria. The obtained genomic collection of in silico reconstructed TF regulons contains a large number of new regulatory interactions that await future experimental validation. The collection provides a framework for future evolutionary studies of transcriptional regulatory networks in Bacteria. It can be also used for functional annotation of putative metabolic transporters and enzymes that are abundant in the reconstructed regulons. PMID:28348857

  14. Comparative genomics and evolution of transcriptional regulons in Proteobacteria.

    PubMed

    Leyn, Semen A; Suvorova, Inna A; Kazakov, Alexey E; Ravcheev, Dmitry A; Stepanova, Vita V; Novichkov, Pavel S; Rodionov, Dmitry A

    2016-07-01

    Comparative genomics approaches are broadly used for analysis of transcriptional regulation in bacterial genomes. In this work, we identified binding sites and reconstructed regulons for 33 orthologous groups of transcription factors (TFs) in 196 reference genomes from 21 taxonomic groups of Proteobacteria. Overall, we predict over 10 600 TF binding sites and identified more than 15 600 target genes for 1896 TFs constituting the studied orthologous groups of regulators. These include a set of orthologues for 21 metabolism-associated TFs from Escherichia coli and/or Shewanella that are conserved in five or more taxonomic groups and several additional TFs that represent non-orthologous substitutions of the metabolic regulators in some lineages of Proteobacteria. By comparing gene contents of the reconstructed regulons, we identified the core, taxonomy-specific and genome-specific TF regulon members and classified them by their metabolic functions. Detailed analysis of ArgR, TyrR, TrpR, HutC, HypR and other amino-acid-specific regulons demonstrated remarkable differences in regulatory strategies used by various lineages of Proteobacteria. The obtained genomic collection of in silico reconstructed TF regulons contains a large number of new regulatory interactions that await future experimental validation. The collection provides a framework for future evolutionary studies of transcriptional regulatory networks in Bacteria. It can be also used for functional annotation of putative metabolic transporters and enzymes that are abundant in the reconstructed regulons.

  15. Comparative Genomics of Regulation of Fatty Acid and Branched-chain Amino Acid Utilization in Proteobacteria

    SciTech Connect

    Kazakov, Alexey E.; Rodionov, Dmitry A.; Arkin, Adam Paul; Dubchak, Inna; Gelfand, Mikhail S.; Alm, Eric

    2008-10-31

    Bacteria can use branched-chain amino acids (ILV, i.e. isoleucine, leucine, valine) and fatty acids (FA) as sole carbon and energy sources convering ILV into acetyl-CoA, propanoyl-CoA and propionyl-CoA, respectively. In this work, we used the comparative genomic approach to identify candidate transcriptional factors and DNA motifs that control ILV and FA utilization pathways in proteobacteria. The metabolic regulons were characterized based on the identification and comparison of candidate transcription factor binding sites in groups of phylogenetically related genomes. The reconstructed ILV/FA regulatory network demonstrates considerable variability and involves six transcriptional factors from the MerR, TetR and GntR families binding to eleven distinct DNA motifs. The ILV degradation genes in gamma- and beta-proteobacteria are mainly regulated by anovel regulator from the MerR family (e.g., LiuR in Pseudomonas aeruginosa) (40 species), in addition, the TetR-type regulator LiuQ was identified in some beta-proteobacteria (8 species). Besides the core set of ILV utilization genes, the LiuR regulon in some lineages is expanded to include genes from other metabolic pathways, such as the glyoxylate shunt and glutamate synthase in the Shewanella species. The FA degradation genes are controlled by four regulators including FadR in gamma-proteobacteria (34 species), PsrA in gamma- and beta-proteobacteria (45 species), FadP in beta-proteobacteria (14 species), and LiuR orthologs in alpha-proteobacteria (22 species). The remarkable variability of the regulatory systems associated with the FA degradation pathway is discussed from the functional and evolutionary points of view.

  16. Comparative genomics of regulation of fatty acid and branched-chain amino acid utilization in proteobacteria.

    PubMed

    Kazakov, Alexey E; Rodionov, Dmitry A; Alm, Eric; Arkin, Adam Paul; Dubchak, Inna; Gelfand, Mikhail S

    2009-01-01

    Bacteria can use branched-chain amino acids (ILV, i.e., isoleucine, leucine, valine) and fatty acids (FAs) as sole carbon and energy sources converting ILV into acetyl-coenzyme A (CoA), propanoyl-CoA, and propionyl-CoA, respectively. In this work, we used the comparative genomic approach to identify candidate transcriptional factors and DNA motifs that control ILV and FA utilization pathways in proteobacteria. The metabolic regulons were characterized based on the identification and comparison of candidate transcription factor binding sites in groups of phylogenetically related genomes. The reconstructed ILV/FA regulatory network demonstrates considerable variability and involves six transcriptional factors from the MerR, TetR, and GntR families binding to 11 distinct DNA motifs. The ILV degradation genes in gamma- and betaproteobacteria are regulated mainly by a novel regulator from the MerR family (e.g., LiuR in Pseudomonas aeruginosa) (40 species); in addition, the TetR-type regulator LiuQ was identified in some betaproteobacteria (eight species). Besides the core set of ILV utilization genes, the LiuR regulon in some lineages is expanded to include genes from other metabolic pathways, such as the glyoxylate shunt and glutamate synthase in Shewanella species. The FA degradation genes are controlled by four regulators including FadR in gammaproteobacteria (34 species), PsrA in gamma- and betaproteobacteria (45 species), FadP in betaproteobacteria (14 species), and LiuR orthologs in alphaproteobacteria (22 species). The remarkable variability of the regulatory systems associated with the FA degradation pathway is discussed from functional and evolutionary points of view.

  17. Comparing thousands of circular genomes using the CGView Comparison Tool

    PubMed Central

    2012-01-01

    Background Continued sequencing efforts coupled with advances in sequencing technology will lead to the completion of a vast number of small genomes. Whole-genome comparisons represent an important part of the analysis of any new genome sequence, as they can provide a better understanding of the biology and evolution of the source organism. Visualization of the results is important, as it allows information from a variety of sources to be integrated and interpreted. However, existing graphical comparison tools lack features needed for efficiently comparing a new genome to hundreds or thousands of existing sequences. Moreover, existing tools are limited in terms of the types of comparisons that can be performed, the extent to which the output can be customized, and the ease with which the entire process can be automated. Results The CGView Comparison Tool (CCT) is a package for visually comparing bacterial, plasmid, chloroplast, or mitochondrial sequences of interest to existing genomes or sequence collections. The comparisons are conducted using BLAST, and the BLAST results are presented in the form of graphical maps that can also show sequence features, gene and protein names, COG (Clusters of Orthologous Groups of proteins) category assignments, and sequence composition characteristics. CCT can generate maps in a variety of sizes, including 400 Megapixel maps suitable for posters. Comparisons can be conducted within a particular species or genus, or all available genomes can be used. The entire map creation process, from downloading sequences to redrawing zoomed maps, can be completed easily using scripts included with the CCT. User-defined features or analysis results can be included on maps, and maps can be extensively customized. To simplify program setup, a CCT virtual machine that includes all dependencies preinstalled is available. Detailed tutorials illustrating the use of CCT are included with the CCT documentation. Conclusion CCT can be used to visually

  18. Genome-Wide Comparative Analysis Reveals Similar Types of NBS Genes in Hybrid Citrus sinensis Genome and Original Citrus clementine Genome and Provides New Insights into Non-TIR NBS Genes

    PubMed Central

    Wang, Yunsheng; Zhou, Lijuan; Li, Dazhi; Dai, Liangying; Lawton-Rauh, Amy; Srimani, Pradip K.; Duan, Yongping; Luo, Feng

    2015-01-01

    In this study, we identified and compared nucleotide-binding site (NBS) domain-containing genes from three Citrus genomes (C. clementina, C. sinensis from USA and C. sinensis from China). Phylogenetic analysis of all Citrus NBS genes across these three genomes revealed that there are three approximately evenly numbered groups: one group contains the Toll-Interleukin receptor (TIR) domain and two different Non-TIR groups in which most of proteins contain the Coiled Coil (CC) domain. Motif analysis confirmed that the two groups of CC-containing NBS genes are from different evolutionary origins. We partitioned NBS genes into clades using NBS domain sequence distances and found most clades include NBS genes from all three Citrus genomes. This suggests that three Citrus genomes have similar numbers and types of NBS genes. We also mapped the re-sequenced reads of three pomelo and three mandarin genomes onto the C. sinensis genome. We found that most NBS genes of the hybrid C. sinensis genome have corresponding homologous genes in both pomelo and mandarin genomes. The homologous NBS genes in pomelo and mandarin suggest that the parental species of C. sinensis may contain similar types of NBS genes. This explains why the hybrid C. sinensis and original C. clementina have similar types of NBS genes in this study. Furthermore, we found that sequence variation amongst Citrus NBS genes were shaped by multiple independent and shared accelerated mutation accumulation events among different groups of NBS genes and in different Citrus genomes. Our comparative analyses yield valuable insight into the structure, organization and evolution of NBS genes in Citrus genomes. Furthermore, our comprehensive analysis showed that the non-TIR NBS genes can be divided into two groups that come from different evolutionary origins. This provides new insights into non-TIR genes, which have not received much attention. PMID:25811466

  19. Genome-wide comparative analysis reveals similar types of NBS genes in hybrid Citrus sinensis genome and original Citrus clementine genome and provides new insights into non-TIR NBS genes.

    PubMed

    Wang, Yunsheng; Zhou, Lijuan; Li, Dazhi; Dai, Liangying; Lawton-Rauh, Amy; Srimani, Pradip K; Duan, Yongping; Luo, Feng

    2015-01-01

    In this study, we identified and compared nucleotide-binding site (NBS) domain-containing genes from three Citrus genomes (C. clementina, C. sinensis from USA and C. sinensis from China). Phylogenetic analysis of all Citrus NBS genes across these three genomes revealed that there are three approximately evenly numbered groups: one group contains the Toll-Interleukin receptor (TIR) domain and two different Non-TIR groups in which most of proteins contain the Coiled Coil (CC) domain. Motif analysis confirmed that the two groups of CC-containing NBS genes are from different evolutionary origins. We partitioned NBS genes into clades using NBS domain sequence distances and found most clades include NBS genes from all three Citrus genomes. This suggests that three Citrus genomes have similar numbers and types of NBS genes. We also mapped the re-sequenced reads of three pomelo and three mandarin genomes onto the C. sinensis genome. We found that most NBS genes of the hybrid C. sinensis genome have corresponding homologous genes in both pomelo and mandarin genomes. The homologous NBS genes in pomelo and mandarin suggest that the parental species of C. sinensis may contain similar types of NBS genes. This explains why the hybrid C. sinensis and original C. clementina have similar types of NBS genes in this study. Furthermore, we found that sequence variation amongst Citrus NBS genes were shaped by multiple independent and shared accelerated mutation accumulation events among different groups of NBS genes and in different Citrus genomes. Our comparative analyses yield valuable insight into the structure, organization and evolution of NBS genes in Citrus genomes. Furthermore, our comprehensive analysis showed that the non-TIR NBS genes can be divided into two groups that come from different evolutionary origins. This provides new insights into non-TIR genes, which have not received much attention.

  20. The Effect of Orthology and Coregulation on Detecting Regulatory Motifs

    PubMed Central

    Storms, Valerie; Claeys, Marleen; Sanchez, Aminael; De Moor, Bart; Verstuyf, Annemieke; Marchal, Kathleen

    2010-01-01

    Background Computational de novo discovery of transcription factor binding sites is still a challenging problem. The growing number of sequenced genomes allows integrating orthology evidence with coregulation information when searching for motifs. Moreover, the more advanced motif detection algorithms explicitly model the phylogenetic relatedness between the orthologous input sequences and thus should be well adapted towards using orthologous information. In this study, we evaluated the conditions under which complementing coregulation with orthologous information improves motif detection for the class of probabilistic motif detection algorithms with an explicit evolutionary model. Methodology We designed datasets (real and synthetic) covering different degrees of coregulation and orthologous information to test how well Phylogibbs and Phylogenetic sampler, as representatives of the motif detection algorithms with evolutionary model performed as compared to MEME, a more classical motif detection algorithm that treats orthologs independently. Results and Conclusions Under certain conditions detecting motifs in the combined coregulation-orthology space is indeed more efficient than using each space separately, but this is not always the case. Moreover, the difference in success rate between the advanced algorithms and MEME is still marginal. The success rate of motif detection depends on the complex interplay between the added information and the specificities of the applied algorithms. Insights in this relation provide information useful to both developers and users. All benchmark datasets are available at http://homes.esat.kuleuven.be/~kmarchal/Supplementary_Storms_Valerie_PlosONE. PMID:20140085

  1. The effect of orthology and coregulation on detecting regulatory motifs.

    PubMed

    Storms, Valerie; Claeys, Marleen; Sanchez, Aminael; De Moor, Bart; Verstuyf, Annemieke; Marchal, Kathleen

    2010-02-03

    Computational de novo discovery of transcription factor binding sites is still a challenging problem. The growing number of sequenced genomes allows integrating orthology evidence with coregulation information when searching for motifs. Moreover, the more advanced motif detection algorithms explicitly model the phylogenetic relatedness between the orthologous input sequences and thus should be well adapted towards using orthologous information. In this study, we evaluated the conditions under which complementing coregulation with orthologous information improves motif detection for the class of probabilistic motif detection algorithms with an explicit evolutionary model. We designed datasets (real and synthetic) covering different degrees of coregulation and orthologous information to test how well Phylogibbs and Phylogenetic sampler, as representatives of the motif detection algorithms with evolutionary model performed as compared to MEME, a more classical motif detection algorithm that treats orthologs independently. Under certain conditions detecting motifs in the combined coregulation-orthology space is indeed more efficient than using each space separately, but this is not always the case. Moreover, the difference in success rate between the advanced algorithms and MEME is still marginal. The success rate of motif detection depends on the complex interplay between the added information and the specificities of the applied algorithms. Insights in this relation provide information useful to both developers and users. All benchmark datasets are available at http://homes.esat.kuleuven.be/~kmarchal/Supplementary_Storms_Valerie_PlosONE.

  2. Statistical methods for detecting genomic alterations through array-based comparative genomic hybridization (CGH).

    PubMed

    Wang, Yuedong; Guo, Sun-Wei

    2004-01-01

    Array-based comparative genomic hybridization (ABCGH) is an emerging high-resolution and high-throughput molecular genetic technique that allows genome-wide screening for chromosome alterations associated with tumorigenesis. Like the cDNA microarrays, ABCGH uses two differentially labeled test and reference DNAs which are cohybridized to cloned genomic fragments immobilized on glass slides. The hybridized DNAs are then detected in two different fluorochromes, and the significant deviation from unity in the ratios of the digitized intensity values is indicative of copy-number differences between the test and reference genomes. Proper statistical analyses need to account for many sources of variation besides genuine differences between the two genomes. In particular, spatial correlations, the variable nature of the ratio variance and non-Normal distribution call for careful statistical modeling. We propose two new statistics, the standard t-statistic and its modification with variances smoothed along the genome, and two tests for each statistic, the standard t-test and a test based on the hybrid adaptive spline (HAS). Simulations indicate that the smoothed t-statistic always improves the performance over the standard t-statistic. The t-tests are more powerful in detecting isolated alterations while those based on HAS are more powerful in detecting a cluster of alterations. We apply the proposed methods to the identification of genomic alterations in endometrium in women with endometriosis.

  3. fPoxDB: fungal peroxidase database for comparative genomics.

    PubMed

    Choi, Jaeyoung; Détry, Nicolas; Kim, Ki-Tae; Asiegbu, Fred O; Valkonen, Jari P T; Lee, Yong-Hwan

    2014-05-08

    analysis toolkits with easy-to-follow web interface offer a useful workbench to study comparative and evolutionary genomics of peroxidases in fungi.

  4. African relapsing Fever borreliae genomospecies revealed by comparative genomics.

    PubMed

    Elbir, Haitham; Abi-Rached, Laurent; Pontarotti, Pierre; Yoosuf, Niyaz; Drancourt, Michel

    2014-01-01

    Relapsing fever borreliae are vector-borne bacteria responsible for febrile infection in humans in North America, Africa, Asia, and in the Iberian Peninsula in Europe. Relapsing fever borreliae are phylogenetically closely related, yet they differ in pathogenicity and vectors. Their long-term taxonomy, based on geography and vector grouping, needs to be re-apprised in a genomic context. We therefore embarked into genomic analyses of relapsing fever borreliae, focusing on species found in Africa. Genome-wide phylogenetic analyses group Old World Borrelia crocidurae, Borrelia hispanica, B. duttonii, and B. recurrentis in one clade, and New World Borrelia turicatae and Borrelia hermsii in a second clade. Accordingly, average nucleotide identity is 99% among B. duttonii, B. recurrentis, and B. crocidurae and 96% between latter borreliae and B. hispanica while the similarity is 86% between Old World and New World borreliae. Comparative genomics indicates that the Old World relapsing fever B. duttonii, B. recurrentis, B. crocidurae, and B. hispanica have a 2,514-gene pan genome and a 933-gene core genome that includes 788 chromosomal and 145 plasmidic genes. Analyzing the role that natural selection has played in the evolution of Old World borreliae species revealed that 55 loci were under positive diversifying selection, including loci coding for membrane, flagellar, and chemotaxis proteins, three categories associated with adaption to specific niches. Genomic analyses led to a reappraisal of the taxonomy of relapsing fever borreliae in Africa. These analyses suggest that B. crocidurae, B. duttonii, and B. recurrentis are ecotypes of a unique genomospecies, while B. hispanica is a distinct species.

  5. African Relapsing Fever Borreliae Genomospecies Revealed by Comparative Genomics

    PubMed Central

    Elbir, Haitham; Abi-Rached, Laurent; Pontarotti, Pierre; Yoosuf, Niyaz; Drancourt, Michel

    2014-01-01

    Background: Relapsing fever borreliae are vector-borne bacteria responsible for febrile infection in humans in North America, Africa, Asia, and in the Iberian Peninsula in Europe. Relapsing fever borreliae are phylogenetically closely related, yet they differ in pathogenicity and vectors. Their long-term taxonomy, based on geography and vector grouping, needs to be re-apprised in a genomic context. We therefore embarked into genomic analyses of relapsing fever borreliae, focusing on species found in Africa. Results: Genome-wide phylogenetic analyses group Old World Borrelia crocidurae, Borrelia hispanica, B. duttonii, and B. recurrentis in one clade, and New World Borrelia turicatae and Borrelia hermsii in a second clade. Accordingly, average nucleotide identity is 99% among B. duttonii, B. recurrentis, and B. crocidurae and 96% between latter borreliae and B. hispanica while the similarity is 86% between Old World and New World borreliae. Comparative genomics indicates that the Old World relapsing fever B. duttonii, B. recurrentis, B. crocidurae, and B. hispanica have a 2,514-gene pan genome and a 933-gene core genome that includes 788 chromosomal and 145 plasmidic genes. Analyzing the role that natural selection has played in the evolution of Old World borreliae species revealed that 55 loci were under positive diversifying selection, including loci coding for membrane, flagellar, and chemotaxis proteins, three categories associated with adaption to specific niches. Conclusion: Genomic analyses led to a reappraisal of the taxonomy of relapsing fever borreliae in Africa. These analyses suggest that B. crocidurae, B. duttonii, and B. recurrentis are ecotypes of a unique genomospecies, while B. hispanica is a distinct species. PMID:25229054

  6. Phylogeny and comparative genome analysis of a Basidiomycete fungi

    SciTech Connect

    Riley, Robert W.; Salamov, Asaf; Grigoriev, Igor; Hibbett, David

    2011-03-14

    Fungi of the phylum Basidiomycota, make up some 37percent of the described fungi, and are important from the perspectives of forestry, agriculture, medicine, and bioenergy. This diverse phylum includes the mushrooms, wood rots, plant pathogenic rusts and smuts, and some human pathogens. To better understand these important fungi, we have undertaken a comparative genomic analysis of the Basidiomycetes with available sequenced genomes. We report a phylogeny that sheds light on previously unclear evolutionary relationships among the Basidiomycetes. We also define a `core proteome? based on protein families conserved in all Basidiomycetes. We identify key expansions and contractions in protein families that may be responsible for the degradation of plant biomass such as cellulose, hemicellulose, and lignin. Finally, we speculate as to the genomic changes that drove such expansions and contractions.

  7. A web server for mining Comparative Genomic Hybridization (CGH) data

    NASA Astrophysics Data System (ADS)

    Liu, Jun; Ranka, Sanjay; Kahveci, Tamer

    2007-11-01

    Advances in cytogenetics and molecular biology has established that chromosomal alterations are critical in the pathogenesis of human cancer. Recurrent chromosomal alterations provide cytological and molecular markers for the diagnosis and prognosis of disease. They also facilitate the identification of genes that are important in carcinogenesis, which in the future may help in the development of targeted therapy. A large amount of publicly available cancer genetic data is now available and it is growing. There is a need for public domain tools that allow users to analyze their data and visualize the results. This chapter describes a web based software tool that will allow researchers to analyze and visualize Comparative Genomic Hybridization (CGH) datasets. It employs novel data mining methodologies for clustering and classification of CGH datasets as well as algorithms for identifying important markers (small set of genomic intervals with aberrations) that are potentially cancer signatures. The developed software will help in understanding the relationships between genomic aberrations and cancer types.

  8. CyanoClust: comparative genome resources of cyanobacteria and plastids.

    PubMed

    Sasaki, Naobumi V; Sato, Naoki

    2010-01-01

    Cyanobacteria, which perform oxygen-evolving photosynthesis as do chloroplasts of plants and algae, are one of the best-studied prokaryotic phyla and one from which many representative genomes have been sequenced. Lack of a suitable comparative genomic database has been a problem in cyanobacterial genomics because many proteins involved in physiological functions such as photosynthesis and nitrogen fixation are not catalogued in commonly used databases, such as Clusters of Orthologous Proteins (COG). CyanoClust is a database of homolog groups in cyanobacteria and plastids that are produced by the program Gclust. We have developed a web-server system for the protein homology database featuring cyanobacteria and plastids. Database URL: http://cyanoclust.c.u-tokyo.ac.jp/.

  9. Comparative genomics of Neisseria meningitidis: core genome, islands of horizontal transfer and pathogen-specific genes.

    PubMed

    Dunning Hotopp, Julie C; Grifantini, Renata; Kumar, Nikhil; Tzeng, Yih Ling; Fouts, Derrick; Frigimelica, Elisabetta; Draghi, Monia; Giuliani, Marzia Monica; Rappuoli, Rino; Stephens, David S; Grandi, Guido; Tettelin, Hervé

    2006-12-01

    To better understand Neisseria meningitidis genomes and virulence, microarray comparative genome hybridization (mCGH) data were collected from one Neisseria cinerea, two Neisseria lactamica, two Neisseria gonorrhoeae and 48 Neisseria meningitidis isolates. For N. meningitidis, these isolates are from diverse clonal complexes, invasive and carriage strains, and all major serogroups. The microarray platform represented N. meningitidis strains MC58, Z2491 and FAM18, and N. gonorrhoeae FA1090. By comparing hybridization data to genome sequences, the core N. meningitidis genome and insertions/deletions (e.g. capsule locus, type I secretion system) related to pathogenicity were identified, including further characterization of the capsule locus, bioinformatics analysis of a type I secretion system, and identification of some metabolic pathways associated with intracellular survival in pathogens. Hybridization data clustered meningococcal isolates from similar clonal complexes that were distinguished by the differential presence of six distinct islands of horizontal transfer. Several of these islands contained prophage or other mobile elements, including a novel prophage and a transposon carrying portions of a type I secretion system. Acquisition of some genetic islands appears to have occurred in multiple lineages, including transfer between N. lactamica and N. meningitidis. However, island acquisition occurs infrequently, such that the genomic-level relationship is not obscured within clonal complexes. The N. meningitidis genome is characterized by the horizontal acquisition of multiple genetic islands; the study of these islands reveals important sets of genes varying between isolates and likely to be related to pathogenicity.

  10. Allelic genome structural variations in maize detected by array comparative genome hybridization.

    PubMed

    Beló, André; Beatty, Mary K; Hondred, David; Fengler, Kevin A; Li, Bailin; Rafalski, Antoni

    2010-01-01

    DNA polymorphisms such as insertion/deletions and duplications affecting genome segments larger than 1 kb are known as copy-number variations (CNVs) or structural variations (SVs). They have been recently studied in animals and humans by using array-comparative genome hybridization (aCGH), and have been associated with several human diseases. Their presence and phenotypic effects in plants have not been investigated on a genomic scale, although individual structural variations affecting traits have been described. We used aCGH to investigate the presence of CNVs in maize by comparing the genome of 13 maize inbred lines to B73. Analysis of hybridization signal ratios of 60,472 60-mer oligonucleotide probes between inbreds in relation to their location in the reference genome (B73) allowed us to identify clusters of probes that deviated from the ratio expected for equal copy-numbers. We found CNVs distributed along the maize genome in all chromosome arms. They occur with appreciable frequency in different germplasm subgroups, suggesting ancient origin. Validation of several CNV regions showed both insertion/deletions and copy-number differences. The nature of CNVs detected suggests CNVs might have a considerable impact on plant phenotypes, including disease response and heterosis.

  11. Comparative analysis of methods for genome-wide nucleosome cartography.

    PubMed

    Quintales, Luis; Vázquez, Enrique; Antequera, Francisco

    2015-07-01

    Nucleosomes contribute to compacting the genome into the nucleus and regulate the physical access of regulatory proteins to DNA either directly or through the epigenetic modifications of the histone tails. Precise mapping of nucleosome positioning across the genome is, therefore, essential to understanding the genome regulation. In recent years, several experimental protocols have been developed for this purpose that include the enzymatic digestion, chemical cleavage or immunoprecipitation of chromatin followed by next-generation sequencing of the resulting DNA fragments. Here, we compare the performance and resolution of these methods from the initial biochemical steps through the alignment of the millions of short-sequence reads to a reference genome to the final computational analysis to generate genome-wide maps of nucleosome occupancy. Because of the lack of a unified protocol to process data sets obtained through the different approaches, we have developed a new computational tool (NUCwave), which facilitates their analysis, comparison and assessment and will enable researchers to choose the most suitable method for any particular purpose. NUCwave is freely available at http://nucleosome.usal.es/nucwave along with a step-by-step protocol for its use. © The Author 2014. Published by Oxford University Press. For Permissions, please email: journals.permissions@oup.com.

  12. Comparative Genomics and the Gene Complement of a Minimal Cell

    NASA Astrophysics Data System (ADS)

    Islas, Sara; Becerra, Arturo; Luisi, P. Luigi; Lazcano, Antonio

    2004-02-01

    The concept of a minimal cell is discussed from the viewpoint of comparative genomics. Analysis of published DNA content values determined for 641 different archaeal and bacterial species by pulsed field gel electrophoresis has lead to a more precise definition of the genome size ranges of free-living and host-associated organisms. DNA content is not an indicator of phylogenetic position. However, the smallest genomes in our sample do not have a random distribution in rRNA-based evolutionary trees, and are found mostly in (a) the basal branches of the tree where thermophiles are located; and (b) in late clades, such as those of Gram positive bacteria. While the smallest-known genome size for an endosymbiont is only 450 kb, no free-living prokaryote has been described to have genomes <1450 kb. Estimates of the size of minimal gene complement can provide important insights in the primary biological functions required for a sustainable, reproducing cell nowadays and throughout evolutionary times, but definitions of the minimum cell is dependent on specific environments.

  13. Sequencing and comparative genome analysis of two pathogenic Streptococcus gallolyticus subspecies: genome plasticity, adaptation and virulence.

    PubMed

    Lin, I-Hsuan; Liu, Tze-Tze; Teng, Yu-Ting; Wu, Hui-Lun; Liu, Yen-Ming; Wu, Keh-Ming; Chang, Chuan-Hsiung; Hsu, Ming-Ta

    2011-01-01

    Streptococcus gallolyticus infections in humans are often associated with bacteremia, infective endocarditis and colon cancers. The disease manifestations are different depending on the subspecies of S. gallolyticus causing the infection. Here, we present the complete genomes of S. gallolyticus ATCC 43143 (biotype I) and S. pasteurianus ATCC 43144 (biotype II.2). The genomic differences between the two biotypes were characterized with comparative genomic analyses. The chromosome of ATCC 43143 and ATCC 43144 are 2,36 and 2,10 Mb in length and encode 2246 and 1869 CDS respectively. The organization and genomic contents of both genomes were most similar to the recently published S. gallolyticus UCN34, where 2073 (92%) and 1607 (86%) of the ATCC 43143 and ATCC 43144 CDS were conserved in UCN34 respectively. There are around 600 CDS conserved in all Streptococcus genomes, indicating the Streptococcus genus has a small core-genome (constitute around 30% of total CDS) and substantial evolutionary plasticity. We identified eight and five regions of genome plasticity in ATCC 43143 and ATCC 43144 respectively. Within these regions, several proteins were recognized to contribute to the fitness and virulence of each of the two subspecies. We have also predicted putative cell-surface associated proteins that could play a role in adherence to host tissues, leading to persistent infections causing sub-acute and chronic diseases in humans. This study showed evidence that the S. gallolyticus still possesses genes making it suitable in a rumen environment, whereas the ability for S. pasteurianus to live in rumen is reduced. The genome heterogeneity and genetic diversity among the two biotypes, especially membrane and lipoproteins, most likely contribute to the differences in the pathogenesis of the two S. gallolyticus biotypes and the type of disease an infected patient eventually develops.

  14. Sequencing and Comparative Genome Analysis of Two Pathogenic Streptococcus gallolyticus Subspecies: Genome Plasticity, Adaptation and Virulence

    PubMed Central

    Teng, Yu-Ting; Wu, Hui-Lun; Liu, Yen-Ming; Wu, Keh-Ming; Chang, Chuan-Hsiung; Hsu, Ming-Ta

    2011-01-01

    Streptococcus gallolyticus infections in humans are often associated with bacteremia, infective endocarditis and colon cancers. The disease manifestations are different depending on the subspecies of S. gallolyticus causing the infection. Here, we present the complete genomes of S. gallolyticus ATCC 43143 (biotype I) and S. pasteurianus ATCC 43144 (biotype II.2). The genomic differences between the two biotypes were characterized with comparative genomic analyses. The chromosome of ATCC 43143 and ATCC 43144 are 2,36 and 2,10 Mb in length and encode 2246 and 1869 CDS respectively. The organization and genomic contents of both genomes were most similar to the recently published S. gallolyticus UCN34, where 2073 (92%) and 1607 (86%) of the ATCC 43143 and ATCC 43144 CDS were conserved in UCN34 respectively. There are around 600 CDS conserved in all Streptococcus genomes, indicating the Streptococcus genus has a small core-genome (constitute around 30% of total CDS) and substantial evolutionary plasticity. We identified eight and five regions of genome plasticity in ATCC 43143 and ATCC 43144 respectively. Within these regions, several proteins were recognized to contribute to the fitness and virulence of each of the two subspecies. We have also predicted putative cell-surface associated proteins that could play a role in adherence to host tissues, leading to persistent infections causing sub-acute and chronic diseases in humans. This study showed evidence that the S. gallolyticus still possesses genes making it suitable in a rumen environment, whereas the ability for S. pasteurianus to live in rumen is reduced. The genome heterogeneity and genetic diversity among the two biotypes, especially membrane and lipoproteins, most likely contribute to the differences in the pathogenesis of the two S. gallolyticus biotypes and the type of disease an infected patient eventually develops. PMID:21633709

  15. Comparative Analysis of Genome Sequences Covering the Seven Cronobacter Species

    PubMed Central

    Cummings, Craig A.; Shih, Rita; Degoricija, Lovorka; Rico, Alain; Brzoska, Pius; Hamby, Stephen E.; Masood, Naqash; Hariri, Sumyya; Sonbol, Hana; Chuzhanova, Nadia; McClelland, Michael; Furtado, Manohar R.; Forsythe, Stephen J.

    2012-01-01

    Background Species of Cronobacter are widespread in the environment and are occasional food-borne pathogens associated with serious neonatal diseases, including bacteraemia, meningitis, and necrotising enterocolitis. The genus is composed of seven species: C. sakazakii, C. malonaticus, C. turicensis, C. dublinensis, C. muytjensii, C. universalis, and C. condimenti. Clinical cases are associated with three species, C. malonaticus, C. turicensis and, in particular, with C. sakazakii multilocus sequence type 4. Thus, it is plausible that virulence determinants have evolved in certain lineages. Methodology/Principal Findings We generated high quality sequence drafts for eleven Cronobacter genomes representing the seven Cronobacter species, including an ST4 strain of C. sakazakii. Comparative analysis of these genomes together with the two publicly available genomes revealed Cronobacter has over 6,000 genes in one or more strains and over 2,000 genes shared by all Cronobacter. Considerable variation in the presence of traits such as type six secretion systems, metal resistance (tellurite, copper and silver), and adhesins were found. C. sakazakii is unique in the Cronobacter genus in encoding genes enabling the utilization of exogenous sialic acid which may have clinical significance. The C. sakazakii ST4 strain 701 contained additional genes as compared to other C. sakazakii but none of them were known specific virulence-related genes. Conclusions/Significance Genome comparison revealed that pair-wise DNA sequence identity varies between 89 and 97% in the seven Cronobacter species, and also suggested various degrees of divergence. Sets of universal core genes and accessory genes unique to each strain were identified. These gene sequences can be used for designing genus/species specific detection assays. Genes encoding adhesins, T6SS, and metal resistance genes as well as prophages are found in only subsets of genomes and have contributed considerably to the variation of

  16. A Novel Bayesian DNA Motif Comparison Method for Clustering and Retrieval

    PubMed Central

    Margalit, Hanah; Friedman, Nir

    2008-01-01

    Characterizing the DNA-binding specificities of transcription factors is a key problem in computational biology that has been addressed by multiple algorithms. These usually take as input sequences that are putatively bound by the same factor and output one or more DNA motifs. A common practice is to apply several such algorithms simultaneously to improve coverage at the price of redundancy. In interpreting such results, two tasks are crucial: clustering of redundant motifs, and attributing the motifs to transcription factors by retrieval of similar motifs from previously characterized motif libraries. Both tasks inherently involve motif comparison. Here we present a novel method for comparing and merging motifs, based on Bayesian probabilistic principles. This method takes into account both the similarity in positional nucleotide distributions of the two motifs and their dissimilarity to the background distribution. We demonstrate the use of the new comparison method as a basis for motif clustering and retrieval procedures, and compare it to several commonly used alternatives. Our results show that the new method outperforms other available methods in accuracy and sensitivity. We incorporated the resulting motif clustering and retrieval procedures in a large-scale automated pipeline for analyzing DNA motifs. This pipeline integrates the results of various DNA motif discovery algorithms and automatically merges redundant motifs from multiple training sets into a coherent annotated library of motifs. Application of this pipeline to recent genome-wide transcription factor location data in S. cerevisiae successfully identified DNA motifs in a manner that is as good as semi-automated analysis reported in the literature. Moreover, we show how this analysis elucidates the mechanisms of condition-specific preferences of transcription factors. PMID:18463706

  17. G-quadruplex (G4) motifs in the maize (Zea mays L.) genome are enriched at specific locations in thousands of genes coupled to energy status, hypoxia, low sugar, and nutrient deprivation.

    PubMed

    Andorf, Carson M; Kopylov, Mykhailo; Dobbs, Drena; Koch, Karen E; Stroupe, M Elizabeth; Lawrence, Carolyn J; Bass, Hank W

    2014-12-20

    The G-quadruplex (G4) elements comprise a class of nucleic acid structures formed by stacking of guanine base quartets in a quadruple helix. This G4 DNA can form within or across single-stranded DNA molecules and is mutually exclusive with duplex B-form DNA. The reversibility and structural diversity of G4s make them highly versatile genetic structures, as demonstrated by their roles in various functions including telomere metabolism, genome maintenance, immunoglobulin gene diversification, transcription, and translation. Sequence motifs capable of forming G4 DNA are typically located in telomere repeat DNA and other non-telomeric genomic loci. To investigate their potential roles in a large-genome model plant species, we computationally identified 149,988 non-telomeric G4 motifs in maize (Zea mays L., B73 AGPv2), 29% of which were in non-repetitive genomic regions. G4 motif hotspots exhibited non-random enrichment in genes at two locations on the antisense strand, one in the 5' UTR and the other at the 5' end of the first intron. Several genic G4 motifs were shown to adopt sequence-specific and potassium-dependent G4 DNA structures in vitro. The G4 motifs were prevalent in key regulatory genes associated with hypoxia (group VII ERFs), oxidative stress (DJ-1/GATase1), and energy status (AMPK/SnRK) pathways. They also showed statistical enrichment for genes in metabolic pathways that function in glycolysis, sugar degradation, inositol metabolism, and base excision repair. Collectively, the maize G4 motifs may represent conditional regulatory elements that can aid in energy status gene responses. Such a network of elements could provide a mechanistic basis for linking energy status signals to gene regulation in maize, a model genetic system and major world crop species for feed, food, and fuel.

  18. Lactobacillus paracasei Comparative Genomics: Towards Species Pan-Genome Definition and Exploitation of Diversity

    PubMed Central

    Smokvina, Tamara; Wels, Michiel; Polka, Justyna; Chervaux, Christian; Brisse, Sylvain; Boekhorst, Jos; Vlieg, Johan E. T. van Hylckama; Siezen, Roland J.

    2013-01-01

    Lactobacillus paracasei is a member of the normal human and animal gut microbiota and is used extensively in the food industry in starter cultures for dairy products or as probiotics. With the development of low-cost, high-throughput sequencing techniques it has become feasible to sequence many different strains of one species and to determine its “pan-genome”. We have sequenced the genomes of 34 different L. paracasei strains, and performed a comparative genomics analysis. We analysed genome synteny and content, focussing on the pan-genome, core genome and variable genome. Each genome was shown to contain around 2800–3100 protein-coding genes, and comparative analysis identified over 4200 ortholog groups that comprise the pan-genome of this species, of which about 1800 ortholog groups make up the conserved core. Several factors previously associated with host-microbe interactions such as pili, cell-envelope proteinase, hydrolases p40 and p75 or the capacity to produce short branched-chain fatty acids (bkd operon) are part of the L. paracasei core genome present in all analysed strains. The variome consists mainly of hypothetical proteins, phages, plasmids, transposon/conjugative elements, and known functions such as sugar metabolism, cell-surface proteins, transporters, CRISPR-associated proteins, and EPS biosynthesis proteins. An enormous variety and variability of sugar utilization gene cassettes were identified, with each strain harbouring between 25–53 cassettes, reflecting the high adaptability of L. paracasei to different niches. A phylogenomic tree was constructed based on total genome contents, and together with an analysis of horizontal gene transfer events we conclude that evolution of these L. paracasei strains is complex and not always related to niche adaptation. The results of this genome content comparison was used, together with high-throughput growth experiments on various carbohydrates, to perform gene-trait matching analysis, in order to

  19. Comparative genomics reveals diversity among xanthomonads infecting tomato and pepper

    PubMed Central

    2011-01-01

    Background Bacterial spot of tomato and pepper is caused by four Xanthomonas species and is a major plant disease in warm humid climates. The four species are distinct from each other based on physiological and molecular characteristics. The genome sequence of strain 85-10, a member of one of the species, Xanthomonas euvesicatoria (Xcv) has been previously reported. To determine the relationship of the four species at the genome level and to investigate the molecular basis of their virulence and differing host ranges, draft genomic sequences of members of the other three species were determined and compared to strain 85-10. Results We sequenced the genomes of X. vesicatoria (Xv) strain 1111 (ATCC 35937), X. perforans (Xp) strain 91-118 and X. gardneri (Xg) strain 101 (ATCC 19865). The genomes were compared with each other and with the previously sequenced Xcv strain 85-10. In addition, the molecular features were predicted that may be required for pathogenicity including the type III secretion apparatus, type III effectors, other secretion systems, quorum sensing systems, adhesins, extracellular polysaccharide, and lipopolysaccharide determinants. Several novel type III effectors from Xg strain 101 and Xv strain 1111 genomes were computationally identified and their translocation was validated using a reporter gene assay. A homolog to Ax21, the elicitor of XA21-mediated resistance in rice, and a functional Ax21 sulfation system were identified in Xcv. Genes encoding proteins with functions mediated by type II and type IV secretion systems have also been compared, including enzymes involved in cell wall deconstruction, as contributors to pathogenicity. Conclusions Comparative genomic analyses revealed considerable diversity among bacterial spot pathogens, providing new insights into differences and similarities that may explain the diverse nature of these strains. Genes specific to pepper pathogens, such as the O-antigen of the lipopolysaccharide cluster, and genes

  20. Comparative genomics and transcriptomics of trait-gene association

    PubMed Central

    2012-01-01

    Background The Order Rickettsiales includes important tick-borne pathogens, from Rickettsia rickettsii, which causes Rocky Mountain spotted fever, to Anaplasma marginale, the most prevalent vector-borne pathogen of cattle. Although most pathogens in this Order are transmitted by arthropod vectors, little is known about the microbial determinants of transmission. A. marginale provides unique tools for studying the determinants of transmission, with multiple strain sequences available that display distinct and reproducible transmission phenotypes. The closed core A. marginale genome suggests that any phenotypic differences are due to single nucleotide polymorphisms (SNPs). We combined DNA/RNA comparative genomic approaches using strains with different tick transmission phenotypes and identified genes that segregate with transmissibility. Results Comparison of seven strains with different transmission phenotypes generated a list of SNPs affecting 18 genes and nine promoters. Transcriptional analysis found two candidate genes downstream from promoter SNPs that were differentially transcribed. To corroborate the comparative genomics approach we used three RNA-seq platforms to analyze the transcriptomes from two A. marginale strains with different transmission phenotypes. RNA-seq analysis confirmed the comparative genomics data and found 10 additional genes whose transcription between strains with distinct transmission efficiencies was significantly different. Six regions of the genome that contained no annotation were found to be transcriptionally active, and two of these newly identified transcripts were differentially transcribed. Conclusions This approach identified 30 genes and two novel transcripts potentially involved in tick transmission. We describe the transcriptome of an obligate intracellular bacterium in depth, while employing massive parallel sequencing to dissect an important trait in bacterial pathogenesis. PMID:23181781

  1. Complete mitochondrial genome of the red drum, Sciaenops ocellatus (Perciformes, Sciaenidae): absence of the typical conserved motif in the origin of the light-strand replication.

    PubMed

    Cheng, Yuanzhi; Shi, Ge; Xu, Tianjun; Li, Haiyan; Sun, Yueyan; Wang, Rixin

    2012-04-01

    In this study, the complete mitochondrial genome of the red drum Sciaenops ocellatus was determined first. The genome was 16,500 bp in length and contained 13 protein-coding genes, 2 ribosomal RNA genes, 22 transfer RNA genes, and 2 main non-coding regions (the control region and the origin of the light-strand replication); the gene composition and order of which were similar to most other vertebrates. The overall base composition of the heavy strand was T 25.5%, C 30.7%, A 27.5%, and G 16.3%, with a slight AT bias of 53%. Within the control region, the discrete and conserved sequence blocks were identified. Motif 5'-ACCGG-3' rather than 5'-GCCGG-3' was detected in the origin of light-strand replication (O(L)) of red drum, which is rare in the mitogenomes of Sciaenidae species. These results would play an important role in elucidating sequence-function relationships of the O(L).

  2. hfAIM: A reliable bioinformatics approach for in silico genome-wide identification of autophagy-associated Atg8-interacting motifs in various organisms

    PubMed Central

    Xie, Qingjun; Tzfadia, Oren; Levy, Matan; Weithorn, Efrat; Peled-Zehavi, Hadas; Van Parys, Thomas; Van de Peer, Yves; Galili, Gad

    2016-01-01

    ABSTRACT Most of the proteins that are specifically turned over by selective autophagy are recognized by the presence of short Atg8 interacting motifs (AIMs) that facilitate their association with the autophagy apparatus. Such AIMs can be identified by bioinformatics methods based on their defined degenerate consensus F/W/Y-X-X-L/I/V sequences in which X represents any amino acid. Achieving reliability and/or fidelity of the prediction of such AIMs on a genome-wide scale represents a major challenge. Here, we present a bioinformatics approach, high fidelity AIM (hfAIM), which uses additional sequence requirements—the presence of acidic amino acids and the absence of positively charged amino acids in certain positions—to reliably identify AIMs in proteins. We demonstrate that the use of the hfAIM method allows for in silico high fidelity prediction of AIMs in AIM-containing proteins (ACPs) on a genome-wide scale in various organisms. Furthermore, by using hfAIM to identify putative AIMs in the Arabidopsis proteome, we illustrate a potential contribution of selective autophagy to various biological processes. More specifically, we identified 9 peroxisomal PEX proteins that contain hfAIM motifs, among which AtPEX1, AtPEX6 and AtPEX10 possess evolutionary-conserved AIMs. Bimolecular fluorescence complementation (BiFC) results verified that AtPEX6 and AtPEX10 indeed interact with Atg8 in planta. In addition, we show that mutations occurring within or nearby hfAIMs in PEX1, PEX6 and PEX10 caused defects in the growth and development of various organisms. Taken together, the above results suggest that the hfAIM tool can be used to effectively perform genome-wide in silico screens of proteins that are potentially regulated by selective autophagy. The hfAIM system is a web tool that can be accessed at link: http://bioinformatics.psb.ugent.be/hfAIM/. PMID:27071037

  3. Using a color-coded ambigraphic nucleic acid notation to visualize conserved palindromic motifs within and across genomes.

    PubMed

    Rozak, David A; Rozak, Anthony J

    2014-01-22

    Ambiscript is a graphically-designed nucleic acid notation that uses symbol symmetries to support sequence complementation, highlight biologically-relevant palindromes, and facilitate the analysis of consensus sequences. Although the original Ambiscript notation was designed to easily represent consensus sequences for multiple sequence alignments, the notation's black-on-white ambiguity characters are unable to reflect the statistical distribution of nucleotides found at each position. We now propose a color-augmented ambigraphic notation to encode the frequency of positional polymorphisms in these consensus sequences. We have implemented this color-coding approach by creating an Adobe Flash® application ( http://www.ambiscript.org) that shades and colors modified Ambiscript characters according to the prevalence of the encoded nucleotide at each position in the alignment. The resulting graphic helps viewers perceive biologically-relevant patterns in multiple sequence alignments by uniquely combining color, shading, and character symmetries to highlight palindromes and inverted repeats in conserved DNA motifs. Juxtaposing an intuitive color scheme over the deliberate character symmetries of an ambigraphic nucleic acid notation yields a highly-functional nucleic acid notation that maximizes information content and successfully embodies key principles of graphic excellence put forth by the statistician and graphic design theorist, Edward Tufte.

  4. Using a color-coded ambigraphic nucleic acid notation to visualize conserved palindromic motifs within and across genomes

    PubMed Central

    2014-01-01

    Background Ambiscript is a graphically-designed nucleic acid notation that uses symbol symmetries to support sequence complementation, highlight biologically-relevant palindromes, and facilitate the analysis of consensus sequences. Although the original Ambiscript notation was designed to easily represent consensus sequences for multiple sequence alignments, the notation’s black-on-white ambiguity characters are unable to reflect the statistical distribution of nucleotides found at each position. We now propose a color-augmented ambigraphic notation to encode the frequency of positional polymorphisms in these consensus sequences. Results We have implemented this color-coding approach by creating an Adobe Flash® application ( http://www.ambiscript.org) that shades and colors modified Ambiscript characters according to the prevalence of the encoded nucleotide at each position in the alignment. The resulting graphic helps viewers perceive biologically-relevant patterns in multiple sequence alignments by uniquely combining color, shading, and character symmetries to highlight palindromes and inverted repeats in conserved DNA motifs. Conclusion Juxtaposing an intuitive color scheme over the deliberate character symmetries of an ambigraphic nucleic acid notation yields a highly-functional nucleic acid notation that maximizes information content and successfully embodies key principles of graphic excellence put forth by the statistician and graphic design theorist, Edward Tufte. PMID:24447494

  5. Substantial genome synteny preservation among woody angiosperm species: comparative genomics of Chinese chestnut (Castanea mollissima) and plant reference genomes.

    PubMed

    Staton, Margaret; Zhebentyayeva, Tetyana; Olukolu, Bode; Fang, Guang Chen; Nelson, Dana; Carlson, John E; Abbott, Albert G

    2015-10-05

    Chinese chestnut (Castanea mollissima) has emerged as a model species for the Fagaceae family with extensive genomic resources including a physical map, a dense genetic map and quantitative trait loci (QTLs) for chestnut blight resistance. These resources enable comparative genomics analyses relative to model plants. We assessed the degree of conservation between the chestnut genome and other well annotated and assembled plant genomic sequences, focusing on the QTL regions of most interest to the chestnut breeding community. The integrated physical and genetic map of Chinese chestnut has been improved to now include 858 shared sequence-based markers. The utility of the integrated map has also been improved through the addition of 42,970 BAC (bacterial artificial chromosome) end sequences spanning over 26 million bases of the estimated 800 Mb chestnut genome. Synteny between chestnut and ten model plant species was conducted on a macro-syntenic scale using sequences from both individual probes and BAC end sequences across the chestnut physical map. Blocks of synteny with chestnut were found in all ten reference species, with the percent of the chestnut physical map that could be aligned ranging from 10 to 39 %. The integrated genetic and physical map was utilized to identify BACs that spanned the three previously identified QTL regions conferring blight resistance. The clones were pooled and sequenced, yielding 396 sequence scaffolds covering 13.9 Mbp. Comparative genomic analysis on a microsytenic scale, using the QTL-associated genomic sequence, identified synteny from chestnut to other plant genomes ranging from 5.4 to 12.9 % of the genome sequences aligning. On both the macro- and micro-synteny levels, the peach, grape and poplar genomes were found to be the most structurally conserved with chestnut. Interestingly, these results did not strictly follow the expectation that decreased phylogenetic distance would correspond to increased levels of genome

  6. Comparing genomes with duplications: a computational complexity point of view.

    PubMed

    Blin, Guillaume; Chauve, Cedric; Fertin, Guillaume; Rizzi, Romeo; Vialette, Stéphane

    2007-01-01

    In this paper, we are interested in the computational complexity of computing (dis)similarity measures between two genomes when they contain duplicated genes or genomic markers, a problem that happens frequently when comparing whole nuclear genomes. Recently, several methods ( [1], [2]) have been proposed that are based on two steps to compute a given (dis)similarity measure M between two genomes G_1 and G_2: first, one establishes a oneto- one correspondence between genes of G_1 and genes of G_2 ; second, once this correspondence is established, it defines explicitly a permutation and it is then possible to quantify their similarity using classical measures defined for permutations, like the number of breakpoints. Hence these methods rely on two elements: a way to establish a one-to-one correspondence between genes of a pair of genomes, and a (dis)similarity measure for permutations. The problem is then, given a (dis)similarity measure for permutations, to compute a correspondence that defines an optimal permutation for this measure. We are interested here in two models to compute a one-to-one correspondence: the exemplar model, where all but one copy are deleted in both genomes for each gene family, and the matching model, that computes a maximal correspondence for each gene family. We show that for these two models, and for three (dis)similarity measures on permutations, namely the number of common intervals, the maximum adjacency disruption (MAD) number and the summed adjacency disruption (SAD) number, the problem of computing an optimal correspondence is NP-complete, and even APXhard for the MAD number and SAD number.

  7. Comparative Physical Mapping Between Oryza sativa (AA Genome Type) and O. punctata (BB Genome Type)

    PubMed Central

    Kim, HyeRan; Miguel, Phillip San; Nelson, William; Collura, Kristi; Wissotski, Marina; Walling, Jason G.; Kim, Jun Pyo; Jackson, Scott A.; Soderlund, Carol; Wing, Rod A.

    2007-01-01

    A comparative physical map of the AA genome (Oryza sativa) and the BB genome (O. punctata) was constructed by aligning a physical map of O. punctata, deduced from 63,942 BAC end sequences (BESs) and 34,224 fingerprints, onto the O. sativa genome sequence. The level of conservation of each chromosome between the two species was determined by calculating a ratio of BES alignments. The alignment result suggests more divergence of intergenic and repeat regions in comparison to gene-rich regions. Further, this characteristic enabled localization of heterochromatic and euchromatic regions for each chromosome of both species. The alignment identified 16 locations containing expansions, contractions, inversions, and transpositions. By aligning 40% of the punctata BES on the map, 87% of the punctata FPC map covered 98% of the O. sativa genome sequence. The genome size of O. punctata was estimated to be 8% larger than that of O. sativa with individual chromosome differences of 1.5–16.5%. The sum of expansions and contractions observed in regions >500 kb were similar, suggesting that most of the contractions/expansions contributing to the genome size difference between the two species are small, thus preserving the macro-collinearity between these species, which diverged ∼2 million years ago. PMID:17339227

  8. The Whole Genome Assembly and Comparative Genomic Research of Thellungiella parvula (Extremophile Crucifer) Mitochondrion

    PubMed Central

    Wang, Xuelin; Bi, Changwei; Xu, Yiqing; Wei, Suyun; Dai, Xiaogang; Yin, Tongming; Ye, Ning

    2016-01-01

    The complete nucleotide sequences of the mitochondrial (mt) genome of an extremophile species Thellungiella parvula (T. parvula) have been determined with the lengths of 255,773 bp. T. parvula mt genome is a circular sequence and contains 32 protein-coding genes, 19 tRNA genes, and three ribosomal RNA genes with a 11.5% coding sequence. The base composition of 27.5% A, 27.5% T, 22.7% C, and 22.3% G in descending order shows a slight bias of 55% AT. Fifty-three repeats were identified in the mitochondrial genome of T. parvula, including 24 direct repeats, 28 tandem repeats (TRs), and one palindromic repeat. Furthermore, a total of 199 perfect microsatellites have been mined with a high A/T content (83.1%) through simple sequence repeat (SSR) analysis and they were distributed unevenly within this mitochondrial genome. We also analyzed other plant mitochondrial genomes' evolution in general, providing clues for the understanding of the evolution of organelles genomes in plants. Comparing with other Brassicaceae species, T. parvula is related to Arabidopsis thaliana whose characters of low temperature resistance have been well documented. This study will provide important genetic tools for other Brassicaceae species research and improve yields of economically important plants. PMID:27148547

  9. Using comparative genome analysis to identify problems in annotated microbial genomes.

    PubMed

    Poptsova, Maria S; Gogarten, J Peter

    2010-07-01

    Genome annotation is a tedious task that is mostly done by automated methods; however, the accuracy of these approaches has been questioned since the beginning of the sequencing era. Genome annotation is a multilevel process, and errors can emerge at different stages: during sequencing, as a result of gene-calling procedures, and in the process of assigning gene functions. Missed or wrongly annotated genes differentially impact different types of analyses. Here we discuss and demonstrate how the methods of comparative genome analysis can refine annotations by locating missing orthologues. We also discuss possible reasons for errors and show that the second-generation annotation systems, which combine multiple gene-calling programs with similarity-based methods, perform much better than the first annotation tools. Since old errors may propagate to the newly sequenced genomes, we emphasize that the problem of continuously updating popular public databases is an urgent and unresolved one. Due to the progress in genome-sequencing technologies, automated annotation techniques will remain the main approach in the future. Researchers need to be aware of the existing errors in the annotation of even well-studied genomes, such as Escherichia coli, and consider additional quality control for their results.

  10. Dissecting Plant Genomes with the PLAZA Comparative Genomics Platform1[W

    PubMed Central

    Van Bel, Michiel; Proost, Sebastian; Wischnitzki, Elisabeth; Movahedi, Sara; Scheerlinck, Christopher; Van de Peer, Yves; Vandepoele, Klaas

    2012-01-01

    With the arrival of low-cost, next-generation sequencing, a multitude of new plant genomes are being publicly released, providing unseen opportunities and challenges for comparative genomics studies. Here, we present PLAZA 2.5, a user-friendly online research environment to explore genomic information from different plants. This new release features updates to previous genome annotations and a substantial number of newly available plant genomes as well as various new interactive tools and visualizations. Currently, PLAZA hosts 25 organisms covering a broad taxonomic range, including 13 eudicots, five monocots, one lycopod, one moss, and five algae. The available data consist of structural and functional gene annotations, homologous gene families, multiple sequence alignments, phylogenetic trees, and colinear regions within and between species. A new Integrative Orthology Viewer, combining information from different orthology prediction methodologies, was developed to efficiently investigate complex orthology relationships. Cross-species expression analysis revealed that the integration of complementary data types extended the scope of complex orthology relationships, especially between more distantly related species. Finally, based on phylogenetic profiling, we propose a set of core gene families within the green plant lineage that will be instrumental to assess the gene space of draft or newly sequenced plant genomes during the assembly or annotation phase. PMID:22198273

  11. Assessing the impact of comparative genomic sequence data on the functional annotation of the Drosophila genome

    PubMed Central

    Bergman, Casey M; Pfeiffer, Barret D; Rincón-Limas, Diego E; Hoskins, Roger A; Gnirke, Andreas; Mungall, Chris J; Wang, Adrienne M; Kronmiller, Brent; Pacleb, Joanne; Park, Soo; Stapleton, Mark; Wan, Kenneth; George, Reed A; de Jong, Pieter J; Botas, Juan; Rubin, Gerald M; Celniker, Susan E

    2002-01-01

    Background It is widely accepted that comparative sequence data can aid the functional annotation of genome sequences; however, the most informative species and features of genome evolution for comparison remain to be determined. Results We analyzed conservation in eight genomic regions (apterous, even-skipped, fushi tarazu, twist, and Rhodopsins 1, 2, 3 and 4) from four Drosophila species (D. erecta, D. pseudoobscura, D. willistoni, and D. littoralis) covering more than 500 kb of the D. melanogaster genome. All D. melanogaster genes (and 78-82% of coding exons) identified in divergent species such as D. pseudoobscura show evidence of functional constraint. Addition of a third species can reveal functional constraint in otherwise non-significant pairwise exon comparisons. Microsynteny is largely conserved, with rearrangement breakpoints, novel transposable element insertions, and gene transpositions occurring in similar numbers. Rates of amino-acid substitution are higher in uncharacterized genes relative to genes that have previously been studied. Conserved non-coding sequences (CNCSs) tend to be spatially clustered with conserved spacing between CNCSs, and clusters of CNCSs can be used to predict enhancer sequences. Conclusions Our results provide the basis for choosing species whose genome sequences would be most useful in aiding the functional annotation of coding and cis-regulatory sequences in Drosophila. Furthermore, this work shows how decoding the spatial organization of conserved sequences, such as the clustering of CNCSs, can complement efforts to annotate eukaryotic genomes on the basis of sequence conservation alone. PMID:12537575

  12. A Web-Based Comparative Genomics Tutorial for Investigating Microbial Genomes

    PubMed Central

    STRONG, MICHAEL; CASCIO, DUILIO; EISENBERG, DAVID

    2004-01-01

    As the number of completely sequenced microbial genomes continues to rise at an impressive rate, it is important to prepare students with the skills necessary to investigate microorganisms at the genomic level. As a part of the core curriculum for first-year graduate students in the biological sciences, we have implemented a web-based tutorial to introduce students to the fields of comparative and functional genomics. The tutorial focuses on recent computational methods for identifying functionally linked genes and proteins on a genome-wide scale and was used to introduce students to the Rosetta Stone, Phylogenetic Profile, conserved Gene Neighbor, and Operon computational methods. Students learned to use a number of publicly available web servers and databases to identify functionally linked genes in the Escherichia coli genome, with emphasis on genome organization and operon structure. The overall effectiveness of the tutorial was assessed based on student evaluations and homework assignments. The tutorial is available to other educators at http://www.doe-mbi.ucla.edu/~strong/m253.php. PMID:23653555

  13. A Mitochondrial Genome of Rhyparochromidae (Hemiptera: Heteroptera) and a Comparative Analysis of Related Mitochondrial Genomes

    PubMed Central

    Li, Teng; Yang, Jie; Li, Yinwan; Cui, Ying; Xie, Qiang; Bu, Wenjun; Hillis, David M.

    2016-01-01

    The Rhyparochromidae, the largest family of Lygaeoidea, encompasses more than 1,850 described species, but no mitochondrial genome has been sequenced to date. Here we describe the first mitochondrial genome for Rhyparochromidae: a complete mitochondrial genome of Panaorus albomaculatus (Scott, 1874). This mitochondrial genome is comprised of 16,345 bp, and contains the expected 37 genes and control region. The majority of the control region is made up of a large tandem-repeat region, which has a novel pattern not previously observed in other insects. The tandem-repeats region of P. albomaculatus consists of 53 tandem duplications (including one partial repeat), which is the largest number of tandem repeats among all the known insect mitochondrial genomes. Slipped-strand mispairing during replication is likely to have generated this novel pattern of tandem repeats. Comparative analysis of tRNA gene families in sequenced Pentatomomorpha and Lygaeoidea species shows that the pattern of nucleotide conservation is markedly higher on the J-strand. Phylogenetic reconstruction based on mitochondrial genomes suggests that Rhyparochromidae is not the sister group to all the remaining Lygaeoidea, and supports the monophyly of Lygaeoidea. PMID:27756915

  14. Detection of genomic imbalances by array based comparative genomic hybridisation in fetuses with multiple malformations

    PubMed Central

    Le Caignec, C; Boceno, M; Saugier-Veber, P; Jacquemont, S; Joubert, M; David, A; Frebourg, T; Rival, J

    2005-01-01

    Background: Malformations are a major cause of morbidity and mortality in full term infants and genomic imbalances are a significant component of their aetiology. However, the causes of defects in many patients with multiple congenital malformations remain unexplained despite thorough clinical examination and laboratory investigations. Methods: We used a commercially available array based comparative genomic hybridisation method (array CGH), able to screen all subtelomeric regions, main microdeletion syndromes, and 201 other regions covering the genome, to detect submicroscopic chromosomal imbalances in 49 fetuses with three or more significant anomalies and normal karyotype. Results: Array CGH identified eight genomic rearrangements (16.3%), all confirmed by quantitative multiplex PCR of short fluorescent fragments. Subtelomeric and interstitial deletions, submicroscopic duplications, and a complex genomic imbalance were identified. In four de novo cases (15qtel deletion, 16q23.1–q23.3 deletion, 22q11.2 deletion, and mosaicism for a rearranged chromosome 18), the genomic imbalance identified clearly underlay the pathological phenotype. In one case, the relationship between the genotype and phenotype was unclear, since a subtelomeric 6q deletion was detected in a mother and her two fetuses bearing multiple malformations. In three cases, a subtelomeric 10q duplication, probably a genomic polymorphism, was identified. Conclusions: The detection of 5/49 causative chromosomal imbalances (or 4/49 if the 6qtel deletion is not considered as causative) suggests wide genome screening when standard chromosome analysis is normal and confirms that array CGH will have a major impact on pre and postnatal diagnosis as well as providing information for more accurate genetic counselling. PMID:15689449

  15. Genomic characteristics and comparative genomics analysis of Penicillium chrysogenum KF-25

    PubMed Central

    2014-01-01

    Background Penicillium chrysogenum has been used in producing penicillin and derived β-lactam antibiotics for many years. Although the genome of the mutant strain P. chrysogenum Wisconsin 54-1255 has already been sequenced, the versatility and genetic diversity of this species still needs to be intensively studied. In this study, the genome of the wild-type P. chrysogenum strain KF-25, which has high activity against Ustilaginoidea virens, was sequenced and characterized. Results The genome of KF-25 was about 29.9 Mb in size and contained 9,804 putative open reading frames (orfs). Thirteen genes were predicted to encode two-component system proteins, of which six were putatively involved in osmolarity adaption. There were 33 putative secondary metabolism pathways and numerous genes that were essential in metabolite biosynthesis. Several P. chrysogenum virus untranslated region sequences were found in the KF-25 genome, suggesting that there might be a relationship between the virus and P. chrysogenum in evolution. Comparative genome analysis showed that the genomes of KF-25 and Wisconsin 54-1255 were highly similar, except that KF-25 was 2.3 Mb smaller. Three hundred and fifty-five KF-25 specific genes were found and the biological functions of the proteins encoded by these genes were mainly unknown (232, representing 65%), except for some orfs encoding proteins with predicted functions in transport, metabolism, and signal transduction. Numerous KF-25-specific genes were found to be associated with the pathogenicity and virulence of the strains, which were identical to those of wild-type P. chrysogenum NRRL 1951. Conclusion Genome sequencing and comparative analysis are helpful in further understanding the biology, evolution, and environment adaption of P. chrysogenum, and provide a new tool for identifying further functional metabolites. PMID:24555742

  16. Genomic characteristics and comparative genomics analysis of Penicillium chrysogenum KF-25.

    PubMed

    Peng, Qin; Yuan, Yihui; Gao, Meiying; Chen, Xupeng; Liu, Biao; Liu, Pengming; Wu, Yan; Wu, Dandan

    2014-02-21

    Penicillium chrysogenum has been used in producing penicillin and derived β-lactam antibiotics for many years. Although the genome of the mutant strain P. chrysogenum Wisconsin 54-1255 has already been sequenced, the versatility and genetic diversity of this species still needs to be intensively studied. In this study, the genome of the wild-type P. chrysogenum strain KF-25, which has high activity against Ustilaginoidea virens, was sequenced and characterized. The genome of KF-25 was about 29.9 Mb in size and contained 9,804 putative open reading frames (orfs). Thirteen genes were predicted to encode two-component system proteins, of which six were putatively involved in osmolarity adaption. There were 33 putative secondary metabolism pathways and numerous genes that were essential in metabolite biosynthesis. Several P. chrysogenum virus untranslated region sequences were found in the KF-25 genome, suggesting that there might be a relationship between the virus and P. chrysogenum in evolution. Comparative genome analysis showed that the genomes of KF-25 and Wisconsin 54-1255 were highly similar, except that KF-25 was 2.3 Mb smaller. Three hundred and fifty-five KF-25 specific genes were found and the biological functions of the proteins encoded by these genes were mainly unknown (232, representing 65%), except for some orfs encoding proteins with predicted functions in transport, metabolism, and signal transduction. Numerous KF-25-specific genes were found to be associated with the pathogenicity and virulence of the strains, which were identical to those of wild-type P. chrysogenum NRRL 1951. Genome sequencing and comparative analysis are helpful in further understanding the biology, evolution, and environment adaption of P. chrysogenum, and provide a new tool for identifying further functional metabolites.

  17. Genome Sequence and Comparative Genome Analysis of Lactobacillus casei: Insights into Their Niche-Associated Evolution

    PubMed Central

    Cai, Hui; Thompson, Rebecca; Budinich, Mateo F.; Broadbent, Jeff R.

    2009-01-01

    Lactobacillus casei is remarkably adaptable to diverse habitats and widely used in the food industry. To reveal the genomic features that contribute to its broad ecological adaptability and examine the evolution of the species, the genome sequence of L. casei ATCC 334 is analyzed and compared with other sequenced lactobacilli. This analysis reveals that ATCC 334 contains a high number of coding sequences involved in carbohydrate utilization and transcriptional regulation, reflecting its requirement for dealing with diverse environmental conditions. A comparison of the genome sequences of ATCC 334 to L. casei BL23 reveals 12 and 19 genomic islands, respectively. For a broader assessment of the genetic variability within L. casei, gene content of 21 L. casei strains isolated from various habitats (cheeses, n = 7; plant materials, n = 8; and human sources, n = 6) was examined by comparative genome hybridization with an ATCC 334-based microarray. This analysis resulted in identification of 25 hypervariable regions. One of these regions contains an overrepresentation of genes involved in carbohydrate utilization and transcriptional regulation and was thus proposed as a lifestyle adaptation island. Differences in L. casei genome inventory reveal both gene gain and gene decay. Gene gain, via acquisition of genomic islands, likely confers a fitness benefit in specific habitats. Gene decay, that is, loss of unnecessary ancestral traits, is observed in the cheese isolates and likely results in enhanced fitness in the dairy niche. This study gives the first picture of the stable versus variable regions in L. casei and provides valuable insights into evolution, lifestyle adaptation, and metabolic diversity of L. casei. PMID:20333194

  18. Floral gene resources from basal angiosperms for comparative genomics research

    PubMed Central

    Albert, Victor A; Soltis, Douglas E; Carlson, John E; Farmerie, William G; Wall, P Kerr; Ilut, Daniel C; Solow, Teri M; Mueller, Lukas A; Landherr, Lena L; Hu, Yi; Buzgo, Matyas; Kim, Sangtae; Yoo, Mi-Jeong; Frohlich, Michael W; Perl-Treves, Rafael; Schlarbaum, Scott E; Bliss, Barbara J; Zhang, Xiaohong; Tanksley, Steven D; Oppenheimer, David G; Soltis, Pamela S; Ma, Hong; dePamphilis, Claude W; Leebens-Mack, James H

    2005-01-01

    Background The Floral Genome Project was initiated to bridge the genomic gap between the most broadly studied plant model systems. Arabidopsis and rice, although now completely sequenced and under intensive comparative genomic investigation, are separated by at least 125 million years of evolutionary time, and cannot in isolation provide a comprehensive perspective on structural and functional aspects of flowering plant genome dynamics. Here we discuss new genomic resources available to the scientific community, comprising cDNA libraries and Expressed Sequence Tag (EST) sequences for a suite of phylogenetically basal angiosperms specifically selected to bridge the evolutionary gaps between model plants and provide insights into gene content and genome structure in the earliest flowering plants. Results Random sequencing of cDNAs from representatives of phylogenetically important eudicot, non-grass monocot, and gymnosperm lineages has so far (as of 12/1/04) generated 70,514 ESTs and 48,170 assembled unigenes. Efficient sorting of EST sequences into putative gene families based on whole Arabidopsis/rice proteome comparison has permitted ready identification of cDNA clones for finished sequencing. Preliminarily, (i) proportions of functional categories among sequenced floral genes seem representative of the entire Arabidopsis transcriptome, (ii) many known floral gene homologues have been captured, and (iii) phylogenetic analyses of ESTs are providing new insights into the process of gene family evolution in relation to the origin and diversification of the angiosperms. Conclusion Initial comparisons illustrate the utility of the EST data sets toward discovery of the basic floral transcriptome. These first findings also afford the opportunity to address a number of conspicuous evolutionary genomic questions, including reproductive organ transcriptome overlap between angiosperms and gymnosperms, genome-wide duplication history, lineage-specific gene duplication and

  19. Comparative genomics of 9 novel Paenibacillus larvae bacteriophages

    PubMed Central

    Stamereilers, Casey; LeBlanc, Lucy; Yost, Diane; Amy, Penny S.; Tsourkas, Philippos K.

    2016-01-01

    ABSTRACT American Foulbrood Disease, caused by the bacterium Paenibacillus larvae, is one of the most destructive diseases of the honeybee, Apis mellifera. Our group recently published the sequences of 9 new phages with the ability to infect and lyse P. larvae. Here, we characterize the genomes of these P. larvae phages, compare them to each other and to other sequenced P. larvae phages, and putatively identify protein function. The phage genomes are 38–45 kb in size and contain 68–86 genes, most of which appear to be unique to P. larvae phages. We classify P. larvae phages into 2 main clusters and one singleton based on nucleotide sequence identity. Three of the new phages show sequence similarity to other sequenced P. larvae phages, while the remaining 6 do not. We identified functions for roughly half of the P. larvae phage proteins, including structural, assembly, host lysis, DNA replication/metabolism, regulatory, and host-related functions. Structural and assembly proteins are highly conserved among our phages and are located at the start of the genome. DNA replication/metabolism, regulatory, and host-related proteins are located in the middle and end of the genome, and are not conserved, with many of these genes found in some of our phages but not others. All nine phages code for a conserved N-acetylmuramoyl-L-alanine amidase. Comparative analysis showed the phages use the “cohesive ends with 3′ overhang” DNA packaging strategy. This work is the first in-depth study of P. larvae phage genomics, and serves as a marker for future work in this area. PMID:27738559

  20. Comparative Analysis of Acinetobacters: Three Genomes for Three Lifestyles

    PubMed Central

    Vallenet, David; Nordmann, Patrice; Barbe, Valérie; Poirel, Laurent; Mangenot, Sophie; Bataille, Elodie; Dossat, Carole; Gas, Shahinaz; Kreimeyer, Annett; Lenoble, Patricia; Oztas, Sophie; Poulain, Julie; Segurens, Béatrice; Robert, Catherine; Abergel, Chantal; Claverie, Jean-Michel; Raoult, Didier; Médigue, Claudine; Weissenbach, Jean; Cruveiller, Stéphane

    2008-01-01

    Acinetobacter baumannii is the source of numerous nosocomial infections in humans and therefore deserves close attention as multidrug or even pandrug resistant strains are increasingly being identified worldwide. Here we report the comparison of two newly sequenced genomes of A. baumannii. The human isolate A. baumannii AYE is multidrug resistant whereas strain SDF, which was isolated from body lice, is antibiotic susceptible. As reference for comparison in this analysis, the genome of the soil-living bacterium A. baylyi strain ADP1 was used. The most interesting dissimilarities we observed were that i) whereas strain AYE and A. baylyi genomes harbored very few Insertion Sequence elements which could promote expression of downstream genes, strain SDF sequence contains several hundred of them that have played a crucial role in its genome reduction (gene disruptions and simple DNA loss); ii) strain SDF has low catabolic capacities compared to strain AYE. Interestingly, the latter has even higher catabolic capacities than A. baylyi which has already been reported as a very nutritionally versatile organism. This metabolic performance could explain the persistence of A. baumannii nosocomial strains in environments where nutrients are scarce; iii) several processes known to play a key role during host infection (biofilm formation, iron uptake, quorum sensing, virulence factors) were either different or absent, the best example of which is iron uptake. Indeed, strain AYE and A. baylyi use siderophore-based systems to scavenge iron from the environment whereas strain SDF uses an alternate system similar to the Haem Acquisition System (HAS). Taken together, all these observations suggest that the genome contents of the 3 Acinetobacters compared are partly shaped by life in distinct ecological niches: human (and more largely hospital environment), louse, soil. PMID:18350144

  1. Comparative genome sequencing reveals genomic signature of extreme desiccation tolerance in the anhydrobiotic midge

    PubMed Central

    Gusev, Oleg; Suetsugu, Yoshitaka; Cornette, Richard; Kawashima, Takeshi; Logacheva, Maria D.; Kondrashov, Alexey S.; Penin, Aleksey A.; Hatanaka, Rie; Kikuta, Shingo; Shimura, Sachiko; Kanamori, Hiroyuki; Katayose, Yuichi; Matsumoto, Takashi; Shagimardanova, Elena; Alexeev, Dmitry; Govorun, Vadim; Wisecaver, Jennifer; Mikheyev, Alexander; Koyanagi, Ryo; Fujie, Manabu; Nishiyama, Tomoaki; Shigenobu, Shuji; Shibata, Tomoko F.; Golygina, Veronika; Hasebe, Mitsuyasu; Okuda, Takashi; Satoh, Nori; Kikawada, Takahiro

    2014-01-01

    Anhydrobiosis represents an extreme example of tolerance adaptation to water loss, where an organism can survive in an ametabolic state until water returns. Here we report the first comparative analysis examining the genomic background of extreme desiccation tolerance, which is exclusively found in larvae of the only anhydrobiotic insect, Polypedilum vanderplanki. We compare the genomes of P. vanderplanki and a congeneric desiccation-sensitive midge P. nubifer. We determine that the genome of the anhydrobiotic species specifically contains clusters of multi-copy genes with products that act as molecular shields. In addition, the genome possesses several groups of genes with high similarity to known protective proteins. However, these genes are located in distinct paralogous clusters in the genome apart from the classical orthologues of the corresponding genes shared by both chironomids and other insects. The transcripts of these clustered paralogues contribute to a large majority of the mRNA pool in the desiccating larvae and most likely define successful anhydrobiosis. Comparison of expression patterns of orthologues between two chironomid species provides evidence for the existence of desiccation-specific gene expression systems in P. vanderplanki. PMID:25216354

  2. G-compass: a web-based comparative genome browser between human and other vertebrate genomes.

    PubMed

    Kawahara, Yoshihiro; Sakate, Ryuichi; Matsuya, Akihiro; Murakami, Katsuhiko; Sato, Yoshiharu; Zhang, Hao; Gojobori, Takashi; Itoh, Takeshi; Imanishi, Tadashi

    2009-12-15

    G-compass is designed for efficient comparative genome analysis between human and other vertebrate genomes. The current version of G-compass allows us to browse two corresponding genomic regions between human and another species in parallel. One-to-one evolutionarily conserved regions (i.e. orthologous regions) between species are highlighted along the genomes. Information such as locations of duplicated regions, copy number variations and mammalian ultra-conserved elements is also provided. These features of G-compass enable us to easily determine patterns of genomic rearrangements and changes in gene orders through evolutionary time. Since G-compass is a satellite database of H-InvDB, which is a comprehensive annotation resource for human genes and transcripts, users can easily refer to manually curated functional annotations and other abundant biological information for each human transcript. G-compass is expected to be a valuable tool for comparing human and model organisms and promoting the exchange of functional information. G-compass is freely available at http://www.h-invitational.jp/g-compass/. t.imanishi@aist.go.jp

  3. Streptococcus thermophilus core genome: comparative genome hybridization study of 47 strains.

    PubMed

    Rasmussen, Thomas Bovbjerg; Danielsen, Morten; Valina, Ondrej; Garrigues, Christel; Johansen, Eric; Pedersen, Martin Bastian

    2008-08-01

    A DNA microarray platform based on 2,200 genes from publicly available sequences was designed for Streptococcus thermophilus. We determined how single-nucleotide polymorphisms in the 65- to 75-mer oligonucleotide probe sequences affect the hybridization signals. The microarrays were then used for comparative genome hybridization (CGH) of 47 dairy S. thermophilus strains. An analysis of the exopolysaccharide genes in each strain confirmed previous findings that this class of genes is indeed highly variable. A phylogenetic tree based on the CGH data showed similar distances for most strains, indicating frequent recombination or gene transfer within S. thermophilus. By comparing genome sizes estimated from the microarrays and pulsed-field gel electrophoresis, the amount of unknown DNA in each strain was estimated. A core genome comprised of 1,271 genes detected in all 47 strains was identified. Likewise, a set of noncore genes detected in only some strains was identified. The concept of an industrial core genome is proposed. This is comprised of the genes in the core genome plus genes that are necessary in an applied industrial context.

  4. Genome analysis and comparative genomics of a Giardia intestinalis assemblage E isolate

    PubMed Central

    2010-01-01

    Background Giardia intestinalis is a protozoan parasite that causes diarrhea in a wide range of mammalian species. To further understand the genetic diversity between the Giardia intestinalis species, we have performed genome sequencing and analysis of a wild-type Giardia intestinalis sample from the assemblage E group, isolated from a pig. Results We identified 5012 protein coding genes, the majority of which are conserved compared to the previously sequenced genomes of the WB and GS strains in terms of microsynteny and sequence identity. Despite this, there is an unexpectedly large number of chromosomal rearrangements and several smaller structural changes that are present in all chromosomes. Novel members of the VSP, NEK Kinase and HCMP gene families were identified, which may reveal possible mechanisms for host specificity and new avenues for antigenic variation. We used comparative genomics of the three diverse Giardia intestinalis isolates P15, GS and WB to define a core proteome for this species complex and to identify lineage-specific genes. Extensive analyses of polymorphisms in the core proteome of Giardia revealed differential rates of divergence among cellular processes. Conclusions Our results indicate that despite a well conserved core of genes there is significant genome variation between Giardia isolates, both in terms of gene content, gene polymorphisms, structural chromosomal variations and surface molecule repertoires. This study improves the annotation of the Giardia genomes and enables the identification of functionally important variation. PMID:20929575

  5. Decoding the molecular evolution of human cognition using comparative genomics.

    PubMed

    Usui, Noriyoshi; Co, Marissa; Konopka, Genevieve

    2014-01-01

    Identification of genetic and molecular factors responsible for the specialized cognitive abilities of humans is expected to provide important insights into the mechanisms responsible for disorders of cognition such as autism, schizophrenia and Alzheimer's disease. Here, we discuss the use of comparative genomics for identifying salient genes and gene networks that may underlie cognition. We focus on the comparison of human and non-human primate brain gene expression and the utility of building gene coexpression networks for prioritizing hundreds of genes that differ in expression among the species queried. We also discuss the importance of and methods for functional studies of the individual genes identified. Together, this integration of comparative genomics with cellular and animal models should provide improved systems for developing effective therapeutics for disorders of cognition. © 2014 S. Karger AG, Basel.

  6. Decoding the molecular evolution of human cognition using comparative genomics

    PubMed Central

    Usui, Noriyoshi; Co, Marissa; Konopka, Genevieve

    2014-01-01

    Identification of genetic and molecular factors responsible for the specialized cognitive abilities of humans is expected to provide important insights into the mechanisms responsible for disorders of cognition such as autism, schizophrenia, and Alzheimer’s disease. Here, we discuss the use of comparative genomics for identifying salient genes and gene networks that may underlie cognition. We focus on the comparison of human and non-human primate brain gene expression and the utility of building gene co-expression networks for prioritizing hundreds of genes that differ in expression among the species queried. We also discuss the importance and methods for functional studies of individual genes identified. Together, this integration of comparative genomics with cellular and animal models should provide improved systems for developing effective therapeutics for disorders of cognition. PMID:25247723

  7. Whole-genome sequence of the Tibetan frog Nanorana parkeri and the comparative evolution of tetrapod genomes.

    PubMed

    Sun, Yan-Bo; Xiong, Zi-Jun; Xiang, Xue-Yan; Liu, Shi-Ping; Zhou, Wei-Wei; Tu, Xiao-Long; Zhong, Li; Wang, Lu; Wu, Dong-Dong; Zhang, Bao-Lin; Zhu, Chun-Ling; Yang, Min-Min; Chen, Hong-Man; Li, Fang; Zhou, Long; Feng, Shao-Hong; Huang, Chao; Zhang, Guo-Jie; Irwin, David; Hillis, David M; Murphy, Robert W; Yang, Huan-Ming; Che, Jing; Wang, Jun; Zhang, Ya-Ping

    2015-03-17

    The development of efficient sequencing techniques has resulted in large numbers of genomes being available for evolutionary studies. However, only one genome is available for all amphibians, that of Xenopus tropicalis, which is distantly related from the majority of frogs. More than 96% of frogs belong to the Neobatrachia, and no genome exists for this group. This dearth of amphibian genomes greatly restricts genomic studies of amphibians and, more generally, our understanding of tetrapod genome evolution. To fill this gap, we provide the de novo genome of a Tibetan Plateau frog, Nanorana parkeri, and compare it to that of X. tropicalis and other vertebrates. This genome encodes more than 20,000 protein-coding genes, a number similar to that of Xenopus. Although the genome size of Nanorana is considerably larger than that of Xenopus (2.3 vs. 1.5 Gb), most of the difference is due to the respective number of transposable elements in the two genomes. The two frogs exhibit considerable conserved whole-genome synteny despite having diverged approximately 266 Ma, indicating a slow rate of DNA structural evolution in anurans. Multigenome synteny blocks further show that amphibians have fewer interchromosomal rearrangements than mammals but have a comparable rate of intrachromosomal rearrangements. Our analysis also identifies 11 Mb of anuran-specific highly conserved elements that will be useful for comparative genomic analyses of frogs. The Nanorana genome offers an improved understanding of evolution of tetrapod genomes and also provides a genomic reference for other evolutionary studies.

  8. Comparative Omics-Driven Genome Annotation Refinement: Application across Yersiniae

    SciTech Connect

    Rutledge, Alexandra C.; Jones, Marcus B.; Chauhan, Sadhana; Purvine, Samuel O.; Sanford, James; Monroe, Matthew E.; Brewer, Heather M.; Payne, Samuel H.; Ansong, Charles; Frank, Bryan C.; Smith, Richard D.; Peterson, Scott; Motin, Vladimir L.; Adkins, Joshua N.

    2012-03-27

    Genome sequencing continues to be a rapidly evolving technology, yet most downstream aspects of genome annotation pipelines remain relatively stable or are even being abandoned. To date, the perceived value of manual curation for genome annotations is not offset by the real cost and time associated with the process. In order to balance the large number of sequences generated, the annotation process is now performed almost exclusively in an automated fashion for most genome sequencing projects. One possible way to reduce errors inherent to automated computational annotations is to apply data from 'omics' measurements (i.e. transcriptional and proteomic) to the un-annotated genome with a proteogenomic-based approach. This approach does require additional experimental and bioinformatics methods to include omics technologies; however, the approach is readily automatable and can benefit from rapid developments occurring in those research domains as well. The annotation process can be improved by experimental validation of transcription and translation and aid in the discovery of annotation errors. Here the concept of annotation refinement has been extended to include a comparative assessment of genomes across closely related species, as is becoming common in sequencing efforts. Transcriptomic and proteomic data derived from three highly similar pathogenic Yersiniae (Y. pestis CO92, Y. pestis pestoides F, and Y. pseudotuberculosis PB1/+) was used to demonstrate a comprehensive comparative omic-based annotation methodology. Peptide and oligo measurements experimentally validated the expression of nearly 40% of each strain's predicted proteome and revealed the identification of 28 novel and 68 previously incorrect protein-coding sequences (e.g., observed frameshifts, extended start sites, and translated pseudogenes) within the three current Yersinia genome annotations. Gene loss is presumed to play a major role in Y. pestis acquiring its niche as a virulent pathogen, thus

  9. Unlocking Holocentric Chromosomes: New Perspectives from Comparative and Functional Genomics?

    PubMed Central

    Mandrioli, Mauro; Manicardi, Gian Carlo

    2012-01-01

    The presence of chromosomes with diffuse centromeres (holocentric chromosomes) has been reported in several taxa since more than fifty years, but a full understanding of their origin is still lacking. Comparative and functional genomics are nowadays furnishing new data to better understand holocentric chromosome evolution thus opening new perspectives to analyse karyotype rearrangements in species with holocentric chromosomes in particular evidencing unusual common features, such as the uniform GC content and gene distribution along chromosomes. PMID:23372420

  10. GeneMarkS: a self-training method for prediction of gene starts in microbial genomes. Implications for finding sequence motifs in regulatory regions

    PubMed Central

    Besemer, John; Lomsadze, Alexandre; Borodovsky, Mark

    2001-01-01

    Improving the accuracy of prediction of gene starts is one of a few remaining open problems in computer prediction of prokaryotic genes. Its difficulty is caused by the absence of relatively strong sequence patterns identifying true translation initiation sites. In the current paper we show that the accuracy of gene start prediction can be improved by combining models of protein-coding and non-coding regions and models of regulatory sites near gene start within an iterative Hidden Markov model based algorithm. The new gene prediction method, called GeneMarkS, utilizes a non-supervised training procedure and can be used for a newly sequenced prokaryotic genome with no prior knowledge of any protein or rRNA genes. The GeneMarkS implementation uses an improved version of the gene finding program GeneMark.hmm, heuristic Markov models of coding and non-coding regions and the Gibbs sampling multiple alignment program. GeneMarkS predicted precisely 83.2% of the translation starts of GenBank annotated Bacillus subtilis genes and 94.4% of translation starts in an experimentally validated set of Escherichia coli genes. We have also observed that GeneMarkS detects prokaryotic genes, in terms of identifying open reading frames containing real genes, with an accuracy matching the level of the best currently used gene detection methods. Accurate translation start prediction, in addition to the refinement of protein sequence N-terminal data, provides the benefit of precise positioning of the sequence region situated upstream to a gene start. Therefore, sequence motifs related to transcription and translation regulatory sites can be revealed and analyzed with higher precision. These motifs were shown to possess a significant variability, the functional and evolutionary connections of which are discussed. PMID:11410670

  11. ChIP-exo signal associated with DNA-binding motifs provides insight into the genomic binding of the glucocorticoid receptor and cooperating transcription factors

    PubMed Central

    Starick, Stephan R.; Ibn-Salem, Jonas; Jurk, Marcel; Hernandez, Céline; Love, Michael I.; Chung, Ho-Ryun; Vingron, Martin; Thomas-Chollier, Morgane; Meijsing, Sebastiaan H.

    2015-01-01

    The classical DNA recognition sequence of the glucocorticoid receptor (GR) appears to be present at only a fraction of bound genomic regions. To identify sequences responsible for recruitment of this transcription factor (TF) to individual loci, we turned to the high-resolution ChIP-exo approach. We exploited this signal by determining footprint profiles of TF binding at single-base-pair resolution using ExoProfiler, a computational pipeline based on DNA binding motifs. When applied to our GR and the few available public ChIP-exo data sets, we find that ChIP-exo footprints are protein- and recognition sequence-specific signatures of genomic TF association. Furthermore, we show that ChIP-exo captures information about TFs other than the one directly targeted by the antibody in the ChIP procedure. Consequently, the shape of the ChIP-exo footprint can be used to discriminate between direct and indirect (tethering to other DNA-bound proteins) DNA association of GR. Together, our findings indicate that the absence of classical recognition sequences can be explained by direct GR binding to a broader spectrum of sequences than previously known, either as a homodimer or as a heterodimer binding together with a member of the ETS or TEAD families of TFs, or alternatively by indirect recruitment via FOX or STAT proteins. ChIP-exo footprints also bring structural insights and locate DNA:protein cross-link points that are compatible with crystal structures of the studied TFs. Overall, our generically applicable footprint-based approach uncovers new structural and functional insights into the diverse ways of genomic cooperation and association of TFs. PMID:25720775

  12. ChIP-exo signal associated with DNA-binding motifs provides insight into the genomic binding of the glucocorticoid receptor and cooperating transcription factors.

    PubMed

    Starick, Stephan R; Ibn-Salem, Jonas; Jurk, Marcel; Hernandez, Céline; Love, Michael I; Chung, Ho-Ryun; Vingron, Martin; Thomas-Chollier, Morgane; Meijsing, Sebastiaan H

    2015-06-01

    The classical DNA recognition sequence of the glucocorticoid receptor (GR) appears to be present at only a fraction of bound genomic regions. To identify sequences responsible for recruitment of this transcription factor (TF) to individual loci, we turned to the high-resolution ChIP-exo approach. We exploited this signal by determining footprint profiles of TF binding at single-base-pair resolution using ExoProfiler, a computational pipeline based on DNA binding motifs. When applied to our GR and the few available public ChIP-exo data sets, we find that ChIP-exo footprints are protein- and recognition sequence-specific signatures of genomic TF association. Furthermore, we show that ChIP-exo captures information about TFs other than the one directly targeted by the antibody in the ChIP procedure. Consequently, the shape of the ChIP-exo footprint can be used to discriminate between direct and indirect (tethering to other DNA-bound proteins) DNA association of GR. Together, our findings indicate that the absence of classical recognition sequences can be explained by direct GR binding to a broader spectrum of sequences than previously known, either as a homodimer or as a heterodimer binding together with a member of the ETS or TEAD families of TFs, or alternatively by indirect recruitment via FOX or STAT proteins. ChIP-exo footprints also bring structural insights and locate DNA:protein cross-link points that are compatible with crystal structures of the studied TFs. Overall, our generically applicable footprint-based approach uncovers new structural and functional insights into the diverse ways of genomic cooperation and association of TFs. © 2015 Starick et al.; Published by Cold Spring Harbor Laboratory Press.

  13. Comparative genome analysis of Pseudomonas genomes including Populus-associated isolates

    DOE PAGES

    Jun, Se Ran; Wassenaar, Trudy; Nookaew, Intawat; ...

    2016-01-01

    The Pseudomonas genus contains a metabolically versatile group of organisms that are known to occupy numerous ecological niches including the rhizosphere and endosphere of many plants influencing phylogenetic diversity and heterogeneity. In this study, comparative genome analysis was performed on over one thousand Pseudomonas genomes, including 21 Pseudomonas strains isolated from the roots of native Populus deltoides. Based on average amino acid identity, genomic clusters were identified within the Pseudomonas genus, which showed agreements with clades by NCBI and cliques by IMG. The P. fluorescens group was organized into 20 distinct genomic clusters, representing enormous diversity and heterogeneity. The speciesmore » P. aeruginosa showed clear distinction in their genomic relatedness compared to other Pseudomonas species groups based on the pan and core genome analysis. The 19 isolates of our 21 Populus-associated isolates formed three distinct subgroups within the P. fluorescens major group, supported by pathway profiles analysis, while two isolates were more closely related to P. chlororaphis and P. putida. The specific genes to Populus-associated subgroups were identified where genes specific to subgroup 1 include several sensory systems such as proteins which act in two-component signal transduction, a TonB-dependent receptor, and a phosphorelay sensor; specific genes to subgroup 2 contain unique hypothetical genes; and genes specific to subgroup 3 organisms have a different hydrolase activity. IMPORTANCE The comparative genome analyses of the genus Pseudomonas that included Populus-associated isolates resulted in novel insights into high diversity of Pseudomonas. Consistent and robust genomic clusters with phylogenetic homogeneity were identified, which resolved species-clades that are not clearly defined by 16S rRNA gene sequence analysis alone. The genomic clusters may be reflective of distinct ecological niches to which the organisms have adapted, but

  14. Comparative genome analysis of Pseudomonas genomes including Populus-associated isolates

    SciTech Connect

    Jun, Se Ran; Wassenaar, Trudy; Nookaew, Intawat; Hauser, Loren John; Wanchai, Visanu; Land, Miriam L.; Timm, Collin M.; Lu, Tse-Yuan S.; Schadt, Christopher Warren; Doktycz, Mitchel John; Pelletier, Dale A; Ussery, David W

    2016-01-01

    The Pseudomonas genus contains a metabolically versatile group of organisms that are known to occupy numerous ecological niches including the rhizosphere and endosphere of many plants influencing phylogenetic diversity and heterogeneity. In this study, comparative genome analysis was performed on over one thousand Pseudomonas genomes, including 21 Pseudomonas strains isolated from the roots of native Populus deltoides. Based on average amino acid identity, genomic clusters were identified within the Pseudomonas genus, which showed agreements with clades by NCBI and cliques by IMG. The P. fluorescens group was organized into 20 distinct genomic clusters, representing enormous diversity and heterogeneity. The species P. aeruginosa showed clear distinction in their genomic relatedness compared to other Pseudomonas species groups based on the pan and core genome analysis. The 19 isolates of our 21 Populus-associated isolates formed three distinct subgroups within the P. fluorescens major group, supported by pathway profiles analysis, while two isolates were more closely related to P. chlororaphis and P. putida. The specific genes to Populus-associated subgroups were identified where genes specific to subgroup 1 include several sensory systems such as proteins which act in two-component signal transduction, a TonB-dependent receptor, and a phosphorelay sensor; specific genes to subgroup 2 contain unique hypothetical genes; and genes specific to subgroup 3 organisms have a different hydrolase activity. IMPORTANCE The comparative genome analyses of the genus Pseudomonas that included Populus-associated isolates resulted in novel insights into high diversity of Pseudomonas. Consistent and robust genomic clusters with phylogenetic homogeneity were identified, which resolved species-clades that are not clearly defined by 16S rRNA gene sequence analysis alone. The genomic clusters may be reflective of distinct ecological niches to which the organisms have adapted, but this

  15. Bamboo Flowering from the Perspective of Comparative Genomics and Transcriptomics

    PubMed Central

    Biswas, Prasun; Chakraborty, Sukanya; Dutta, Smritikana; Pal, Amita; Das, Malay

    2016-01-01

    Bamboos are an important member of the subfamily Bambusoideae, family Poaceae. The plant group exhibits wide variation with respect to the timing (1–120 years) and nature (sporadic vs. gregarious) of flowering among species. Usually flowering in woody bamboos is synchronous across culms growing over a large area, known as gregarious flowering. In many monocarpic bamboos this is followed by mass death and seed setting. While in sporadic flowering an isolated wild clump may flower, set little or no seed and remain alive. Such wide variation in flowering time and extent means that the plant group serves as repositories for genes and expression patterns that are unique to bamboo. Due to the dearth of available genomic and transcriptomic resources, limited studies have been undertaken to identify the potential molecular players in bamboo flowering. The public release of the first bamboo genome sequence Phyllostachys heterocycla, availability of related genomes Brachypodium distachyon and Oryza sativa provide us the opportunity to study this long-standing biological problem in a comparative and functional genomics framework. We identified bamboo genes homologous to those of Oryza and Brachypodium that are involved in established pathways such as vernalization, photoperiod, autonomous, and hormonal regulation of flowering. Additionally, we investigated triggers like stress (drought), physiological maturity and micro RNAs that may play crucial roles in flowering. We also analyzed available transcriptome datasets of different bamboo species to identify genes and their involvement in bamboo flowering. Finally, we summarize potential research hurdles that need to be addressed in future research. PMID:28018419

  16. Sequencing and comparative analysis of the gorilla MHC genomic sequence.

    PubMed

    Wilming, Laurens G; Hart, Elizabeth A; Coggill, Penny C; Horton, Roger; Gilbert, James G R; Clee, Chris; Jones, Matt; Lloyd, Christine; Palmer, Sophie; Sims, Sarah; Whitehead, Siobhan; Wiley, David; Beck, Stephan; Harrow, Jennifer L

    2013-01-01

    Major histocompatibility complex (MHC) genes play a critical role in vertebrate immune response and because the MHC is linked to a significant number of auto-immune and other diseases it is of great medical interest. Here we describe the clone-based sequencing and subsequent annotation of the MHC region of the gorilla genome. Because the MHC is subject to extensive variation, both structural and sequence-wise, it is not readily amenable to study in whole genome shotgun sequence such as the recently published gorilla genome. The variation of the MHC also makes it of evolutionary interest and therefore we analyse the sequence in the context of human and chimpanzee. In our comparisons with human and re-annotated chimpanzee MHC sequence we find that gorilla has a trimodular RCCX cluster, versus the reference human bimodular cluster, and additional copies of Class I (pseudo)genes between Gogo-K and Gogo-A (the orthologues of HLA-K and -A). We also find that Gogo-H (and Patr-H) is coding versus the HLA-H pseudogene and, conversely, there is a Gogo-DQB2 pseudogene versus the HLA-DQB2 coding gene. Our analysis, which is freely available through the VEGA genome browser, provides the research community with a comprehensive dataset for comparative and evolutionary research of the MHC.

  17. Sequencing and comparative analysis of the gorilla MHC genomic sequence

    PubMed Central

    Wilming, Laurens G.; Hart, Elizabeth A.; Coggill, Penny C.; Horton, Roger; Gilbert, James G. R.; Clee, Chris; Jones, Matt; Lloyd, Christine; Palmer, Sophie; Sims, Sarah; Whitehead, Siobhan; Wiley, David; Beck, Stephan; Harrow, Jennifer L.

    2013-01-01

    Major histocompatibility complex (MHC) genes play a critical role in vertebrate immune response and because the MHC is linked to a significant number of auto-immune and other diseases it is of great medical interest. Here we describe the clone-based sequencing and subsequent annotation of the MHC region of the gorilla genome. Because the MHC is subject to extensive variation, both structural and sequence-wise, it is not readily amenable to study in whole genome shotgun sequence such as the recently published gorilla genome. The variation of the MHC also makes it of evolutionary interest and therefore we analyse the sequence in the context of human and chimpanzee. In our comparisons with human and re-annotated chimpanzee MHC sequence we find that gorilla has a trimodular RCCX cluster, versus the reference human bimodular cluster, and additional copies of Class I (pseudo)genes between Gogo-K and Gogo-A (the orthologues of HLA-K and -A). We also find that Gogo-H (and Patr-H) is coding versus the HLA-H pseudogene and, conversely, there is a Gogo-DQB2 pseudogene versus the HLA-DQB2 coding gene. Our analysis, which is freely available through the VEGA genome browser, provides the research community with a comprehensive dataset for comparative and evolutionary research of the MHC. PMID:23589541

  18. Ecology of marine Bacteroidetes: a comparative genomics approach

    PubMed Central

    Fernández-Gómez, Beatriz; Richter, Michael; Schüler, Margarete; Pinhassi, Jarone; Acinas, Silvia G; González, José M; Pedrós-Alió, Carlos

    2013-01-01

    Bacteroidetes are commonly assumed to be specialized in degrading high molecular weight (HMW) compounds and to have a preference for growth attached to particles, surfaces or algal cells. The first sequenced genomes of marine Bacteroidetes seemed to confirm this assumption. Many more genomes have been sequenced recently. Here, a comparative analysis of marine Bacteroidetes genomes revealed a life strategy different from those of other important phyla of marine bacterioplankton such as Cyanobacteria and Proteobacteria. Bacteroidetes have many adaptations to grow attached to particles, have the capacity to degrade polymers, including a large number of peptidases, glycoside hydrolases (GHs), glycosyl transferases, adhesion proteins, as well as the genes for gliding motility. Several of the polymer degradation genes are located in close association with genes for TonB-dependent receptors and transducers, suggesting an integrated regulation of adhesion and degradation of polymers. This confirmed the role of this abundant group of marine bacteria as degraders of particulate matter. Marine Bacteroidetes had a significantly larger number of proteases than GHs, while non-marine Bacteroidetes had equal numbers of both. Proteorhodopsin containing Bacteroidetes shared two characteristics: small genome size and a higher number of genes involved in CO2 fixation per Mb. The latter may be important in order to survive when floating freely in the illuminated, but nutrient-poor, ocean surface. PMID:23303374

  19. Bamboo Flowering from the Perspective of Comparative Genomics and Transcriptomics.

    PubMed

    Biswas, Prasun; Chakraborty, Sukanya; Dutta, Smritikana; Pal, Amita; Das, Malay

    2016-01-01

    Bamboos are an important member of the subfamily Bambusoideae, family Poaceae. The plant group exhibits wide variation with respect to the timing (1-120 years) and nature (sporadic vs. gregarious) of flowering among species. Usually flowering in woody bamboos is synchronous across culms growing over a large area, known as gregarious flowering. In many monocarpic bamboos this is followed by mass death and seed setting. While in sporadic flowering an isolated wild clump may flower, set little or no seed and remain alive. Such wide variation in flowering time and extent means that the plant group serves as repositories for genes and expression patterns that are unique to bamboo. Due to the dearth of available genomic and transcriptomic resources, limited studies have been undertaken to identify the potential molecular players in bamboo flowering. The public release of the first bamboo genome sequence Phyllostachys heterocycla, availability of related genomes Brachypodium distachyon and Oryza sativa provide us the opportunity to study this long-standing biological problem in a comparative and functional genomics framework. We identified bamboo genes homologous to those of Oryza and Brachypodium that are involved in established pathways such as vernalization, photoperiod, autonomous, and hormonal regulation of flowering. Additionally, we investigated triggers like stress (drought), physiological maturity and micro RNAs that may play crucial roles in flowering. We also analyzed available transcriptome datasets of different bamboo species to identify genes and their involvement in bamboo flowering. Finally, we summarize potential research hurdles that need to be addressed in future research.

  20. Comparative genomics of parasitic silkworm microsporidia reveal an association between genome expansion and host adaptation

    PubMed Central

    2013-01-01

    Background Microsporidian Nosema bombycis has received much attention because the pébrine disease of domesticated silkworms results in great economic losses in the silkworm industry. So far, no effective treatment could be found for pébrine. Compared to other known Nosema parasites, N. bombycis can unusually parasitize a broad range of hosts. To gain some insights into the underlying genetic mechanism of pathological ability and host range expansion in this parasite, a comparative genomic approach is conducted. The genome of two Nosema parasites, N. bombycis and N. antheraeae (an obligatory parasite to undomesticated silkworms Antheraea pernyi), were sequenced and compared with their distantly related species, N. ceranae (an obligatory parasite to honey bees). Results Our comparative genomics analysis show that the N. bombycis genome has greatly expanded due to the following three molecular mechanisms: 1) the proliferation of host-derived transposable elements, 2) the acquisition of many horizontally transferred genes from bacteria, and 3) the production of abundnant gene duplications. To our knowledge, duplicated genes derived not only from small-scale events (e.g., tandem duplications) but also from large-scale events (e.g., segmental duplications) have never been seen so abundant in any reported microsporidia genomes. Our relative dating analysis further indicated that these duplication events have arisen recently over very short evolutionary time. Furthermore, several duplicated genes involving in the cytotoxic metabolic pathway were found to undergo positive selection, suggestive of the role of duplicated genes on the adaptive evolution of pathogenic ability. Conclusions Genome expansion is rarely considered as the evolutionary outcome acting on those highly reduced and compact parasitic microsporidian genomes. This study, for the first time, demonstrates that the parasitic genomes can expand, instead of shrink, through several common molecular mechanisms

  1. The Genome Sequence of Caenorhabditis briggsae: A Platform for Comparative Genomics

    PubMed Central

    Bao, Zhirong; Blasiar, Darin; Blumenthal, Thomas; Brent, Michael R; Chen, Nansheng; Chinwalla, Asif; Clarke, Laura; Clee, Chris; Coghlan, Avril; Coulson, Alan; D'Eustachio, Peter; Fitch, David H. A; Fulton, Lucinda A; Fulton, Robert E; Griffiths-Jones, Sam; Harris, Todd W; Hillier, LaDeana W; Kamath, Ravi; Kuwabara, Patricia E; Mardis, Elaine R; Marra, Marco A; Miner, Tracie L; Minx, Patrick; Mullikin, James C; Plumb, Robert W; Rogers, Jane; Schein, Jacqueline E; Sohrmann, Marc; Spieth, John; Stajich, Jason E; Wei, Chaochun; Willey, David; Wilson, Richard K; Durbin, Richard; Waterston, Robert H

    2003-01-01

    The soil nematodes Caenorhabditis briggsae and Caenorhabditis elegans diverged from a common ancestor roughly 100 million years ago and yet are almost indistinguishable by eye. They have the same chromosome number and genome sizes, and they occupy the same ecological niche. To explore the basis for this striking conservation of structure and function, we have sequenced the C. briggsae genome to a high-quality draft stage and compared it to the finished C. elegans sequence. We predict approximately 19,500 protein-coding genes in the C. briggsae genome, roughly the same as in C. elegans. Of these, 12,200 have clear C. elegans orthologs, a further 6,500 have one or more clearly detectable C. elegans homologs, and approximately 800 C. briggsae genes have no detectable matches in C. elegans. Almost all of the noncoding RNAs (ncRNAs) known are shared between the two species. The two genomes exhibit extensive colinearity, and the rate of divergence appears to be higher in the chromosomal arms than in the centers. Operons, a distinctive feature of C. elegans, are highly conserved in C. briggsae, with the arrangement of genes being preserved in 96% of cases. The difference in size between the C. briggsae (estimated at approximately 104 Mbp) and C. elegans (100.3 Mbp) genomes is almost entirely due to repetitive sequence, which accounts for 22.4% of the C. briggsae genome in contrast to 16.5% of the C. elegans genome. Few, if any, repeat families are shared, suggesting that most were acquired after the two species diverged or are undergoing rapid evolution. Coclustering the C. elegans and C. briggsae proteins reveals 2,169 protein families of two or more members. Most of these are shared between the two species, but some appear to be expanding or contracting, and there seem to be as many as several hundred novel C. briggsae gene families. The C. briggsae draft sequence will greatly improve the annotation of the C. elegans genome. Based on similarity to C. briggsae, we found

  2. Diversity of Pseudomonas Genomes, Including Populus-Associated Isolates, as Revealed by Comparative Genome Analysis.

    PubMed

    Jun, Se-Ran; Wassenaar, Trudy M; Nookaew, Intawat; Hauser, Loren; Wanchai, Visanu; Land, Miriam; Timm, Collin M; Lu, Tse-Yuan S; Schadt, Christopher W; Doktycz, Mitchel J; Pelletier, Dale A; Ussery, David W

    2015-10-30

    The Pseudomonas genus contains a metabolically versatile group of organisms that are known to occupy numerous ecological niches, including the rhizosphere and endosphere of many plants. Their diversity influences the phylogenetic diversity and heterogeneity of these communities. On the basis of average amino acid identity, comparative genome analysis of >1,000 Pseudomonas genomes, including 21 Pseudomonas strains isolated from the roots of native Populus deltoides (eastern cottonwood) trees resulted in consistent and robust genomic clusters with phylogenetic homogeneity. All Pseudomonas aeruginosa genomes clustered together, and these were clearly distinct from other Pseudomonas species groups on the basis of pangenome and core genome analyses. In contrast, the genomes of Pseudomonas fluorescens were organized into 20 distinct genomic clusters, representing enormous diversity and heterogeneity. Most of our 21 Populus-associated isolates formed three distinct subgroups within the major P. fluorescens group, supported by pathway profile analysis, while two isolates were more closely related to Pseudomonas chlororaphis and Pseudomonas putida. Genes specific to Populus-associated subgroups were identified. Genes specific to subgroup 1 include several sensory systems that act in two-component signal transduction, a TonB-dependent receptor, and a phosphorelay sensor. Genes specific to subgroup 2 contain hypothetical genes, and genes specific to subgroup 3 were annotated with hydrolase activity. This study justifies the need to sequence multiple isolates, especially from P. fluorescens, which displays the most genetic variation, in order to study functional capabilities from a pangenomic perspective. This information will prove useful when choosing Pseudomonas strains for use to promote growth and increase disease resistance in plants.

  3. A Comparative Analysis of Mitochondrial Genomes in Eustigmatophyte Algae

    PubMed Central

    Ševčíková, Tereza; Klimeš, Vladimír; Zbránková, Veronika; Strnad, Hynek; Hroudová, Miluše; Vlček, Čestmír; Eliáš, Marek

    2016-01-01

    Eustigmatophyceae (Ochrophyta, Stramenopiles) is a small algal group with species of the genus Nannochloropsis being its best studied representatives. Nuclear and organellar genomes have been recently sequenced for several Nannochloropsis spp., but phylogenetically wider genomic studies are missing for eustigmatophytes. We sequenced mitochondrial genomes (mitogenomes) of three species representing most major eustigmatophyte lineages, Monodopsis sp. MarTras21, Vischeria sp. CAUP Q 202 and Trachydiscus minutus, and carried out their comparative analysis in the context of available data from Nannochloropsis and other stramenopiles, revealing a number of noticeable findings. First, mitogenomes of most eustigmatophytes are highly collinear and similar in the gene content, but extensive rearrangements and loss of three otherwise ubiquitous genes happened in the Vischeria lineage; this correlates with an accelerated evolution of mitochondrial gene sequences in this lineage. Second, eustigmatophytes appear to be the only ochrophyte group with the Atp1 protein encoded by the mitogenome. Third, eustigmatophyte mitogenomes uniquely share a truncated nad11 gene encoding only the C-terminal part of the Nad11 protein, while the N-terminal part is encoded by a separate gene in the nuclear genome. Fourth, UGA as a termination codon and the cognate release factor mRF2 were lost from mitochondria independently by the Nannochloropsis and T. minutus lineages. Finally, the rps3 gene in the mitogenome of Vischeria sp. is interrupted by the UAG codon, but the genome includes a gene for an unusual tRNA with an extended anticodon loop that we speculate may serve as a suppressor tRNA to properly decode the rps3 gene. PMID:26872774

  4. Comparative genomics of Wolbachia and the bacterial species concept.

    PubMed

    Ellegaard, Kirsten Maren; Klasson, Lisa; Näslund, Kristina; Bourtzis, Kostas; Andersson, Siv G E

    2013-04-01

    The importance of host-specialization to speciation processes in obligate host-associated bacteria is well known, as is also the ability of recombination to generate cohesion in bacterial populations. However, whether divergent strains of highly recombining intracellular bacteria, such as Wolbachia, can maintain their genetic distinctness when infecting the same host is not known. We first developed a protocol for the genome sequencing of uncultivable endosymbionts. Using this method, we have sequenced the complete genomes of the Wolbachia strains wHa and wNo, which occur as natural double infections in Drosophila simulans populations on the Seychelles and in New Caledonia. Taxonomically, wHa belong to supergroup A and wNo to supergroup B. A comparative genomics study including additional strains supported the supergroup classification scheme and revealed 24 and 33 group-specific genes, putatively involved in host-adaptation processes. Recombination frequencies were high for strains of the same supergroup despite different host-preference patterns, leading to genomic cohesion. The inferred recombination fragments for strains of different supergroups were of short sizes, and the genomes of the co-infecting Wolbachia strains wHa and wNo were not more similar to each other and did not share more genes than other A- and B-group strains that infect different hosts. We conclude that Wolbachia strains of supergroup A and B represent genetically distinct clades, and that strains of different supergroups can co-exist in the same arthropod host without converging into the same species. This suggests that the supergroups are irreversibly separated and that barriers other than host-specialization are able to maintain distinct clades in recombining endosymbiont populations. Acquiring a good knowledge of the barriers to genetic exchange in Wolbachia will advance our understanding of how endosymbiont communities are constructed from vertically and horizontally transmitted genes.

  5. Comparative Genomics of Wolbachia and the Bacterial Species Concept

    PubMed Central

    Näslund, Kristina; Bourtzis, Kostas; Andersson, Siv G. E.

    2013-01-01

    The importance of host-specialization to speciation processes in obligate host-associated bacteria is well known, as is also the ability of recombination to generate cohesion in bacterial populations. However, whether divergent strains of highly recombining intracellular bacteria, such as Wolbachia, can maintain their genetic distinctness when infecting the same host is not known. We first developed a protocol for the genome sequencing of uncultivable endosymbionts. Using this method, we have sequenced the complete genomes of the Wolbachia strains wHa and wNo, which occur as natural double infections in Drosophila simulans populations on the Seychelles and in New Caledonia. Taxonomically, wHa belong to supergroup A and wNo to supergroup B. A comparative genomics study including additional strains supported the supergroup classification scheme and revealed 24 and 33 group-specific genes, putatively involved in host-adaptation processes. Recombination frequencies were high for strains of the same supergroup despite different host-preference patterns, leading to genomic cohesion. The inferred recombination fragments for strains of different supergroups were of short sizes, and the genomes of the co-infecting Wolbachia strains wHa and wNo were not more similar to each other and did not share more genes than other A- and B-group strains that infect different hosts. We conclude that Wolbachia strains of supergroup A and B represent genetically distinct clades, and that strains of different supergroups can co-exist in the same arthropod host without converging into the same species. This suggests that the supergroups are irreversibly separated and that barriers other than host-specialization are able to maintain distinct clades in recombining endosymbiont populations. Acquiring a good knowledge of the barriers to genetic exchange in Wolbachia will advance our understanding of how endosymbiont communities are constructed from vertically and horizontally transmitted genes

  6. The genome sequence of Blochmannia floridanus: Comparative analysis of reduced genomes

    PubMed Central

    Gil, Rosario; Silva, Francisco J.; Zientz, Evelyn; Delmotte, François; González-Candelas, Fernando; Latorre, Amparo; Rausell, Carolina; Kamerbeek, Judith; Gadau, Jürgen; Hölldobler, Bert; van Ham, Roeland C. H. J.; Gross, Roy; Moya, Andrés

    2003-01-01

    Bacterial symbioses are widespread among insects, probably being one of the key factors of their evolutionary success. We present the complete genome sequence of Blochmannia floridanus, the primary endosymbiont of carpenter ants. Although these ants feed on a complex diet, this symbiosis very likely has a nutritional basis: Blochmannia is able to supply nitrogen and sulfur compounds to the host while it takes advantage of the host metabolic machinery. Remarkably, these bacteria lack all known genes involved in replication initiation (dnaA, priA, and recA). The phylogenetic analysis of a set of conserved protein-coding genes shows that Bl. floridanus is phylogenetically related to Buchnera aphidicola and Wigglesworthia glossinidia, the other endosymbiotic bacteria whose complete genomes have been sequenced so far. Comparative analysis of the five known genomes from insect endosymbiotic bacteria reveals they share only 313 genes, a number that may be close to the minimum gene set necessary to sustain endosymbiotic life. PMID:12886019

  7. fPoxDB: fungal peroxidase database for comparative genomics

    PubMed Central

    2014-01-01

    -based prediction and diverse analysis toolkits with easy-to-follow web interface offer a useful workbench to study comparative and evolutionary genomics of peroxidases in fungi. PMID:24885079

  8. Comparative genome analysis of Bacillus cereus group genomes withBacillus subtilis

    SciTech Connect

    Anderson, Iain; Sorokin, Alexei; Kapatral, Vinayak; Reznik, Gary; Bhattacharya, Anamitra; Mikhailova, Natalia; Burd, Henry; Joukov, Victor; Kaznadzey, Denis; Walunas, Theresa; D'Souza, Mark; Larsen, Niels; Pusch,Gordon; Liolios, Konstantinos; Grechkin, Yuri; Lapidus, Alla; Goltsman,Eugene; Chu, Lien; Fonstein, Michael; Ehrlich, S. Dusko; Overbeek, Ross; Kyrpides, Nikos; Ivanova, Natalia

    2005-09-14

    Genome features of the Bacillus cereus group genomes (representative strains of Bacillus cereus, Bacillus anthracis and Bacillus thuringiensis sub spp israelensis) were analyzed and compared with the Bacillus subtilis genome. A core set of 1,381 protein families among the four Bacillus genomes, with an additional set of 933 families common to the B. cereus group, was identified. Differences in signal transduction pathways, membrane transporters, cell surface structures, cell wall, and S-layer proteins suggesting differences in their phenotype were identified. The B. cereus group has signal transduction systems including a tyrosine kinase related to two-component system histidine kinases from B. subtilis. A model for regulation of the stress responsive sigma factor sigmaB in the B. cereus group different from the well studied regulation in B. subtilis has been proposed. Despite a high degree of chromosomal synteny among these genomes, significant differences in cell wall and spore coat proteins that contribute to the survival and adaptation in specific hosts has been identified.

  9. Reduction and expansion in microsporidian genome evolution: new insights from comparative genomics.

    PubMed

    Nakjang, Sirintra; Williams, Tom A; Heinz, Eva; Watson, Andrew K; Foster, Peter G; Sendra, Kacper M; Heaps, Sarah E; Hirt, Robert P; Martin Embley, T

    2013-01-01

    Microsporidia are an abundant group of obligate intracellular parasites of other eukaryotes, including immunocompromised humans, but the molecular basis of their intracellular lifestyle and pathobiology are poorly understood. New genomes from a taxonomically broad range of microsporidians, complemented by published expression data, provide an opportunity for comparative analyses to identify conserved and lineage-specific patterns of microsporidian genome evolution that have underpinned this success. In this study, we infer that a dramatic bottleneck in the last common microsporidian ancestor (LCMA) left a small conserved core of genes that was subsequently embellished by gene family expansion driven by gene acquisition in different lineages. Novel expressed protein families represent a substantial fraction of sequenced microsporidian genomes and are significantly enriched for signals consistent with secretion or membrane location. Further evidence of selection is inferred from the gain and reciprocal loss of functional domains between paralogous genes, for example, affecting transport proteins. Gene expansions among transporter families preferentially affect those that are located on the plasma membrane of model organisms, consistent with recruitment to plug conserved gaps in microsporidian biosynthesis and metabolism. Core microsporidian genes shared with other eukaryotes are enriched in orthologs that, in yeast, are highly expressed, highly connected, and often essential, consistent with strong negative selection against further reduction of the conserved gene set since the LCMA. Our study reveals that microsporidian genome evolution is a highly dynamic process that has balanced constraint, reductive evolution, and genome expansion during adaptation to an extraordinarily successful obligate intracellular lifestyle.

  10. A process for analysis of microarray comparative genomics hybridisation studies for bacterial genomes

    PubMed Central

    Carter, Ben; Wu, Guanghui; Woodward, Martin J; Anjum, Muna F

    2008-01-01

    Background Microarray based comparative genomic hybridisation (CGH) experiments have been used to study numerous biological problems including understanding genome plasticity in pathogenic bacteria. Typically such experiments produce large data sets that are difficult for biologists to handle. Although there are some programmes available for interpretation of bacterial transcriptomics data and CGH microarray data for looking at genetic stability in oncogenes, there are none specifically to understand the mosaic nature of bacterial genomes. Consequently a bottle neck still persists in accurate processing and mathematical analysis of these data. To address this shortfall we have produced a simple and robust CGH microarray data analysis process that may be automated in the future to understand bacterial genomic diversity. Results The process involves five steps: cleaning, normalisation, estimating gene presence and absence or divergence, validation, and analysis of data from test against three reference strains simultaneously. Each stage of the process is described and we have compared a number of methods available for characterising bacterial genomic diversity, for calculating the cut-off between gene presence and absence or divergence, and shown that a simple dynamic approach using a kernel density estimator performed better than both established, as well as a more sophisticated mixture modelling technique. We have also shown that current methods commonly used for CGH microarray analysis in tumour and cancer cell lines are not appropriate for analysing our data. Conclusion After carrying out the analysis and validation for three sequenced Escherichia coli strains, CGH microarray data from 19 E. coli O157 pathogenic test strains were used to demonstrate the benefits of applying this simple and robust process to CGH microarray studies using bacterial genomes. PMID:18230148

  11. Genome sequence of the β-rhizobium Cupriavidus taiwanensis and comparative genomics of rhizobia

    PubMed Central

    Amadou, Claire; Pascal, Géraldine; Mangenot, Sophie; Glew, Michelle; Bontemps, Cyril; Capela, Delphine; Carrère, Sébastien; Cruveiller, Stéphane; Dossat, Carole; Lajus, Aurélie; Marchetti, Marta; Poinsot, Véréna; Rouy, Zoé; Servin, Bertrand; Saad, Maged; Schenowitz, Chantal; Barbe, Valérie; Batut, Jacques; Médigue, Claudine; Masson-Boivin, Catherine

    2008-01-01

    We report the first complete genome sequence of a β-proteobacterial nitrogen-fixing symbiont of legumes, Cupriavidus taiwanensis LMG19424. The genome consists of two chromosomes of size 3.42 Mb and 2.50 Mb, and a large symbiotic plasmid of 0.56 Mb. The C. taiwanensis genome displays an unexpected high similarity with the genome of the saprophytic bacterium C. eutrophus H16, despite being 0.94 Mb smaller. Both organisms harbor two chromosomes with large regions of synteny interspersed by specific regions. In contrast, the two species host highly divergent plasmids, with the consequence that C. taiwanensis is symbiotically proficient and less metabolically versatile. Altogether, specific regions in C. taiwanensis compared with C. eutrophus cover 1.02 Mb and are enriched in genes associated with symbiosis or virulence in other bacteria. C. taiwanensis reveals characteristics of a minimal rhizobium, including the most compact (35-kb) symbiotic island (nod and nif) identified so far in any rhizobium. The atypical phylogenetic position of C. taiwanensis allowed insightful comparative genomics of all available rhizobium genomes. We did not find any gene that was both common and specific to all rhizobia, thus suggesting that a unique shared genetic strategy does not support symbiosis of rhizobia with legumes. Instead, phylodistribution analysis of more than 200 Sinorhizobium meliloti known symbiotic genes indicated large and complex variations of their occurrence in rhizobia and non-rhizobia. This led us to devise an in silico method to extract genes preferentially associated with rhizobia. We discuss how the novel genes we have identified may contribute to symbiotic adaptation. PMID:18490699

  12. SPANDx: a genomics pipeline for comparative analysis of large haploid whole genome re-sequencing datasets.

    PubMed

    Sarovich, Derek S; Price, Erin P

    2014-09-08

    Next-generation sequencing (NGS) is now a commonplace tool for molecular characterisation of virtually any species of interest. Despite the ever-increasing use of NGS in laboratories worldwide, analysis of whole genome re-sequencing (WGS) datasets from start to finish remains nontrivial due to the fragmented nature of NGS software and the lack of experienced bioinformaticists in many research teams. We describe SPANDx (Synergised Pipeline for Analysis of NGS Data in Linux), a new tool for high-throughput comparative analysis of haploid WGS datasets comprising one through thousands of genomes. SPANDx consolidates several well-validated, open-source packages into a single tool, mitigating the need to learn and manipulate individual NGS programs. SPANDx incorporates BWA for alignment of raw NGS reads against a reference genome or pan-genome, followed by data filtering, variant calling and annotation using Picard, GATK, SAMtools and SnpEff. BEDTools has also been included for genetic locus presence/absence (P/A) determination to easily visualise the core and accessory genomes. Additional SPANDx features include construction of error-corrected single-nucleotide polymorphism (SNP) and insertion-deletion matrices, and P/A matrices, to enable user-friendly visualisation of genetic variants. The SNP matrices generated using VCFtools and GATK are directly importable into PAUP*, PHYLIP or RAxML for downstream phylogenetic analysis. SPANDx has been developed to handle NGS data from Illumina, Ion Personal Genome Machine (PGM) and 454 platforms, and we demonstrate that it has comparable performance across Illumina MiSeq/HiSeq2000 and Ion PGM data. SPANDx is an all-in-one tool for comprehensive haploid WGS analysis. SPANDx is open source and is freely available at: http://sourceforge.net/projects/spandx/.

  13. Enabling comparative modeling of closely related genomes: Example genus Brucella

    DOE PAGES

    Faria, José P.; Edirisinghe, Janaka N.; Davis, James J.; ...

    2014-03-08

    For many scientific applications, it is highly desirable to be able to compare metabolic models of closely related genomes. In this study, we attempt to raise awareness to the fact that taking annotated genomes from public repositories and using them for metabolic model reconstructions is far from being trivial due to annotation inconsistencies. We are proposing a protocol for comparative analysis of metabolic models on closely related genomes, using fifteen strains of genus Brucella, which contains pathogens of both humans and livestock. This study lead to the identification and subsequent correction of inconsistent annotations in the SEED database, as wellmore » as the identification of 31 biochemical reactions that are common to Brucella, which are not originally identified by automated metabolic reconstructions. We are currently implementing this protocol for improving automated annotations within the SEED database and these improvements have been propagated into PATRIC, Model-SEED, KBase and RAST. This method is an enabling step for the future creation of consistent annotation systems and high-quality model reconstructions that will support in predicting accurate phenotypes such as pathogenicity, media requirements or type of respiration.« less

  14. New Target Regions for Human Hypertension via Comparative Genomics

    PubMed Central

    Stoll, Monika; Kwitek-Black, Anne E.; Cowley, Allen W.; Harris, Eugenie L.; Harrap, Stephen B.; Krieger, José E.; Printz, Morton P.; Provoost, Abraham P.; Sassard, Jean; Jacob, Howard J.

    2000-01-01

    Models of human disease have long been used to understand the basic pathophysiology of disease and to facilitate the discovery of new therapeutics. However, as long as models have been used there have been debates about the utility of these models and their ability to mimic clinical disease at the phenotypic level. The application of genetic studies to both humans and model systems allows for a new paradigm, whereby a novel comparative genomics strategy combined with phenotypic correlates can be used to bridge between clinical relevance and model utility. This study presents a comparative genomic map for “candidate hypertension loci in humans” based on translating QTLs between rat and human, predicting 26 chromosomal regions in the human genome that are very likely to harbor hypertension genes. The predictive power appears robust, as several of these regions have also been implicated in mouse, suggesting that these regions represent primary targets for the development of SNPs for linkage disequilibrium testing in humans and/or provide a means to select specific models for additional functional studies and the development of new therapeutics. PMID:10779487

  15. MGcV: the microbial genomic context viewer for comparative genome analysis

    PubMed Central

    2013-01-01

    Background Conserved gene context is used in many types of comparative genome analyses. It is used to provide leads on gene function, to guide the discovery of regulatory sequences, but also to aid in the reconstruction of metabolic networks. We present the Microbial Genomic context Viewer (MGcV), an interactive, web-based application tailored to strengthen the practice of manual comparative genome context analysis for bacteria. Results MGcV is a versatile, easy-to-use tool that renders a visualization of the genomic context of any set of selected genes, genes within a phylogenetic tree, genomic segments, or regulatory elements. It is tailored to facilitate laborious tasks such as the interactive annotation of gene function, the discovery of regulatory elements, or the sequence-based reconstruction of gene regulatory networks. We illustrate that MGcV can be used in gene function annotation by visually integrating information on prokaryotic genes, like their annotation as available from NCBI with other annotation data such as Pfam domains, sub-cellular location predictions and gene-sequence characteristics such as GC content. We also illustrate the usefulness of the interactive features that allow the graphical selection of genes to facilitate data gathering (e.g. upstream regions, ID’s or annotation), in the analysis and reconstruction of transcription regulation. Moreover, putative regulatory elements and their corresponding scores or data from RNA-seq and microarray experiments can be uploaded, visualized and interpreted in (ranked-) comparative context maps. The ranked maps allow the interpretation of predicted regulatory elements and experimental data in light of each other. Conclusion MGcV advances the manual comparative analysis of genes and regulatory elements by providing fast and flexible integration of gene related data combined with straightforward data retrieval. MGcV is available at http://mgcv.cmbi.ru.nl. PMID:23547764

  16. Ancient signals: comparative genomics of green plant CDPKs.

    PubMed

    Hamel, Louis-Philippe; Sheen, Jen; Séguin, Armand

    2014-02-01

    Calcium-dependent protein kinases (CDPKs) are multifunctional proteins that combine calcium-binding and signaling capabilities within a single gene product. This unique versatility enables multiple plant biological processes to be controlled, including developmental programs and stress responses. The genome of flowering plants typically encodes around 30 CDPK homologs that cluster in four conserved clades. In this review, we take advantage of the recent availability of genome sequences from green algae and early land plants to examine how well the previously described CDPK family from angiosperms compares to the broader evolutionary states associated with early diverging green plant lineages. Our analysis suggests that the current architecture of the CDPK family was shaped during the colonization of the land by plants, whereas CDPKs from ancestor green algae have continued to evolve independently. Crown Copyright © 2013. Published by Elsevier Ltd. All rights reserved.

  17. A Comparative Encyclopedia of DNA Elements in the Mouse Genome

    PubMed Central

    Yue, Feng; Cheng, Yong; Breschi, Alessandra; Vierstra, Jeff; Wu, Weisheng; Ryba, Tyrone; Sandstrom, Richard; Ma, Zhihai; Davis, Carrie; Pope, Benjamin D.; Shen, Yin; Pervouchine, Dmitri D.; Djebali, Sarah; Thurman, Bob; Kaul, Rajinder; Rynes, Eric; Kirilusha, Anthony; Marinov, Georgi K.; Williams, Brian A.; Trout, Diane; Amrhein, Henry; Fisher-Aylor, Katherine; Antoshechkin, Igor; DeSalvo, Gilberto; See, Lei-Hoon; Fastuca, Meagan; Drenkow, Jorg; Zaleski, Chris; Dobin, Alex; Prieto, Pablo; Lagarde, Julien; Bussotti, Giovanni; Tanzer, Andrea; Denas, Olgert; Li, Kanwei; Bender, M. A.; Zhang, Miaohua; Byron, Rachel; Groudine, Mark T.; McCleary, David; Pham, Long; Ye, Zhen; Kuan, Samantha; Edsall, Lee; Wu, Yi-Chieh; Rasmussen, Matthew D.; Bansal, Mukul S.; Keller, Cheryl A.; Morrissey, Christapher S.; Mishra, Tejaswini; Jain, Deepti; Dogan, Nergiz; Harris, Robert S.; Cayting, Philip; Kawli, Trupti; Boyle, Alan P.; Euskirchen, Ghia; Kundaje, Anshul; Lin, Shin; Lin, Yiing; Jansen, Camden; Malladi, Venkat S.; Cline, Melissa S.; Erickson, Drew T.; Kirkup, Vanessa M; Learned, Katrina; Sloan, Cricket A.; Rosenbloom, Kate R.; de Sousa, Beatriz Lacerda; Beal, Kathryn; Pignatelli, Miguel; Flicek, Paul; Lian, Jin; Kahveci, Tamer; Lee, Dongwon; Kent, W. James; Santos, Miguel Ramalho; Herrero, Javier; Notredame, Cedric; Johnson, Audra; Vong, Shinny; Lee, Kristen; Bates, Daniel; Neri, Fidencio; Diegel, Morgan; Canfield, Theresa; Sabo, Peter J.; Wilken, Matthew S.; Reh, Thomas A.; Giste, Erika; Shafer, Anthony; Kutyavin, Tanya; Haugen, Eric; Dunn, Douglas; Reynolds, Alex P.; Neph, Shane; Humbert, Richard; Hansen, R. Scott; De Bruijn, Marella; Selleri, Licia; Rudensky, Alexander; Josefowicz, Steven; Samstein, Robert; Eichler, Evan E.; Orkin, Stuart H.; Levasseur, Dana; Papayannopoulou, Thalia; Chang, Kai-Hsin; Skoultchi, Arthur; Gosh, Srikanta; Disteche, Christine; Treuting, Piper; Wang, Yanli; Weiss, Mitchell J.; Blobel, Gerd A.; Good, Peter J.; Lowdon, Rebecca F.; Adams, Leslie B.; Zhou, Xiao-Qiao; Pazin, Michael J.; Feingold, Elise A.; Wold, Barbara; Taylor, James; Kellis, Manolis; Mortazavi, Ali; Weissman, Sherman M.; Stamatoyannopoulos, John; Snyder, Michael P.; Guigo, Roderic; Gingeras, Thomas R.; Gilbert, David M.; Hardison, Ross C.; Beer, Michael A.; Ren, Bing

    2014-01-01

    Summary As the premier model organism in biomedical research, the laboratory mouse shares the majority of protein-coding genes with humans, yet the two mammals differ in significant ways. To gain greater insights into both shared and species-specific transcriptional and cellular regulatory programs in the mouse, the Mouse ENCODE Consortium has mapped transcription, DNase I hypersensitivity, transcription factor binding, chromatin modifications, and replication domains throughout the mouse genome in diverse cell and tissue types. By comparing with the human genome, we not only confirm substantial conservation in the newly annotated potential functional sequences, but also find a large degree of divergence of other sequences involved in transcriptional regulation, chromatin state and higher order chromatin organization. Our results illuminate the wide range of evolutionary forces acting on genes and their regulatory regions, and provide a general resource for research into mammalian biology and mechanisms of human diseases. PMID:25409824

  18. Beyond the thale: comparative genomics and genetics of Arabidopsis relatives.

    PubMed

    Koenig, Daniel; Weigel, Detlef

    2015-05-01

    For decades a small number of model species have rightly occupied a privileged position in laboratory experiments, but it is becoming increasingly clear that our knowledge of biology is greatly improved when informed by a broader diversity of species and evolutionary context. Arabidopsis thaliana has been the primary model organism for plants, benefiting from a high-quality reference genome sequence and resources for reverse genetics. However, recent studies have made a group of species also in the Brassicaceae family and closely related to A. thaliana a focal point for comparative molecular, genomic, phenotypic and evolutionary studies. In this Review, we emphasize how such studies complement continued study of the model plant itself, provide an evolutionary perspective and summarize our current understanding of genetic and phenotypic diversity in plants.

  19. A comparative encyclopedia of DNA elements in the mouse genome.

    PubMed

    Yue, Feng; Cheng, Yong; Breschi, Alessandra; Vierstra, Jeff; Wu, Weisheng; Ryba, Tyrone; Sandstrom, Richard; Ma, Zhihai; Davis, Carrie; Pope, Benjamin D; Shen, Yin; Pervouchine, Dmitri D; Djebali, Sarah; Thurman, Robert E; Kaul, Rajinder; Rynes, Eric; Kirilusha, Anthony; Marinov, Georgi K; Williams, Brian A; Trout, Diane; Amrhein, Henry; Fisher-Aylor, Katherine; Antoshechkin, Igor; DeSalvo, Gilberto; See, Lei-Hoon; Fastuca, Meagan; Drenkow, Jorg; Zaleski, Chris; Dobin, Alex; Prieto, Pablo; Lagarde, Julien; Bussotti, Giovanni; Tanzer, Andrea; Denas, Olgert; Li, Kanwei; Bender, M A; Zhang, Miaohua; Byron, Rachel; Groudine, Mark T; McCleary, David; Pham, Long; Ye, Zhen; Kuan, Samantha; Edsall, Lee; Wu, Yi-Chieh; Rasmussen, Matthew D; Bansal, Mukul S; Kellis, Manolis; Keller, Cheryl A; Morrissey, Christapher S; Mishra, Tejaswini; Jain, Deepti; Dogan, Nergiz; Harris, Robert S; Cayting, Philip; Kawli, Trupti; Boyle, Alan P; Euskirchen, Ghia; Kundaje, Anshul; Lin, Shin; Lin, Yiing; Jansen, Camden; Malladi, Venkat S; Cline, Melissa S; Erickson, Drew T; Kirkup, Vanessa M; Learned, Katrina; Sloan, Cricket A; Rosenbloom, Kate R; Lacerda de Sousa, Beatriz; Beal, Kathryn; Pignatelli, Miguel; Flicek, Paul; Lian, Jin; Kahveci, Tamer; Lee, Dongwon; Kent, W James; Ramalho Santos, Miguel; Herrero, Javier; Notredame, Cedric; Johnson, Audra; Vong, Shinny; Lee, Kristen; Bates, Daniel; Neri, Fidencio; Diegel, Morgan; Canfield, Theresa; Sabo, Peter J; Wilken, Matthew S; Reh, Thomas A; Giste, Erika; Shafer, Anthony; Kutyavin, Tanya; Haugen, Eric; Dunn, Douglas; Reynolds, Alex P; Neph, Shane; Humbert, Richard; Hansen, R Scott; De Bruijn, Marella; Selleri, Licia; Rudensky, Alexander; Josefowicz, Steven; Samstein, Robert; Eichler, Evan E; Orkin, Stuart H; Levasseur, Dana; Papayannopoulou, Thalia; Chang, Kai-Hsin; Skoultchi, Arthur; Gosh, Srikanta; Disteche, Christine; Treuting, Piper; Wang, Yanli; Weiss, Mitchell J; Blobel, Gerd A; Cao, Xiaoyi; Zhong, Sheng; Wang, Ting; Good, Peter J; Lowdon, Rebecca F; Adams, Leslie B; Zhou, Xiao-Qiao; Pazin, Michael J; Feingold, Elise A; Wold, Barbara; Taylor, James; Mortazavi, Ali; Weissman, Sherman M; Stamatoyannopoulos, John A; Snyder, Michael P; Guigo, Roderic; Gingeras, Thomas R; Gilbert, David M; Hardison, Ross C; Beer, Michael A; Ren, Bing

    2014-11-20

    The laboratory mouse shares the majority of its protein-coding genes with humans, making it the premier model organism in biomedical research, yet the two mammals differ in significant ways. To gain greater insights into both shared and species-specific transcriptional and cellular regulatory programs in the mouse, the Mouse ENCODE Consortium has mapped transcription, DNase I hypersensitivity, transcription factor binding, chromatin modifications and replication domains throughout the mouse genome in diverse cell and tissue types. By comparing with the human genome, we not only confirm substantial conservation in the newly annotated potential functional sequences, but also find a large degree of divergence of sequences involved in transcriptional regulation, chromatin state and higher order chromatin organization. Our results illuminate the wide range of evolutionary forces acting on genes and their regulatory regions, and provide a general resource for research into mammalian biology and mechanisms of human diseases.

  20. Industrial Acetogenic Biocatalysts: A Comparative Metabolic and Genomic Analysis

    PubMed Central

    Bengelsdorf, Frank R.; Poehlein, Anja; Linder, Sonja; Erz, Catarina; Hummel, Tim; Hoffmeister, Sabrina; Daniel, Rolf; Dürre, Peter

    2016-01-01

    Synthesis gas (syngas) fermentation by anaerobic acetogenic bacteria employing the Wood–Ljungdahl pathway is a bioprocess for production of biofuels and biocommodities. The major fermentation products of the most relevant biocatalytic strains (Clostridium ljungdahlii, C. autoethanogenum, C. ragsdalei, and C. coskatii) are acetic acid and ethanol. A comparative metabolic and genomic analysis using the mentioned biocatalysts might offer targets for metabolic engineering and thus improve the production of compounds apart from ethanol. Autotrophic growth and product formation of the four wild type (WT) strains were compared in uncontrolled batch experiments. The genomes of C. ragsdalei and C. coskatii were sequenced and the genome sequences of all four biocatalytic strains analyzed in comparative manner. Growth and product spectra (acetate, ethanol, 2,3-butanediol) of C. autoethanogenum, C. ljungdahlii, and C. ragsdalei were rather similar. In contrast, C. coskatii produced significantly less ethanol and its genome sequence lacks two genes encoding aldehyde:ferredoxin oxidoreductases (AOR). Comparative genome sequence analysis of the four WT strains revealed high average nucleotide identity (ANI) of C. ljungdahlii and C. autoethanogenum (99.3%) and C. coskatii (98.3%). In contrast, C. ljungdahlii WT and C. ragsdalei WT showed an ANI-based similarity of only 95.8%. Additionally, recombinant C. ljungdahlii strains were constructed that harbor an artificial acetone synthesis operon (ASO) consisting of the following genes: adc, ctfA, ctfB, and thlA (encoding acetoacetate decarboxylase, acetoacetyl-CoA:acetate/butyrate:CoA-transferase subunits A and B, and thiolase) under the control of thlA promoter (PthlA) from C. acetobutylicum or native pta-ack promoter (Ppta-ack) from C. ljungdahlii. Respective recombinant strains produced 2-propanol rather than acetone, due to the presence of a NADPH-dependent primary-secondary alcohol dehydrogenase that converts acetone to 2

  1. RNA motif search with data-driven element ordering.

    PubMed

    Rampášek, Ladislav; Jimenez, Randi M; Lupták, Andrej; Vinař, Tomáš; Brejová, Broňa

    2016-05-18

    In this paper, we study the problem of RNA motif search in long genomic sequences. This approach uses a combination of sequence and structure constraints to uncover new distant homologs of known functional RNAs. The problem is NP-hard and is traditionally solved by backtracking algorithms. We have designed a new algorithm for RNA motif search and implemented a new motif search tool RNArobo. The tool enhances the RNAbob descriptor language, allowing insertions in helices, which enables better characterization of ribozymes and aptamers. A typical RNA motif consists of multiple elements and the running time of the algorithm is highly dependent on their ordering. By approaching the element ordering problem in a principled way, we demonstrate more than 100-fold speedup of the search for complex motifs compared to previously published tools. We have developed a new method for RNA motif search that allows for a significant speedup of the search of complex motifs that include pseudoknots. Such speed improvements are crucial at a time when the rate of DNA sequencing outpaces growth in computing. RNArobo is available at http://compbio.fmph.uniba.sk/rnarobo .

  2. Comparative Genomics of the Ubiquitous, Hydrocarbon-degrading Genus Marinobacter

    NASA Astrophysics Data System (ADS)

    Singer, E.; Webb, E.; Edwards, K. J.

    2012-12-01

    The genus Marinobacter is amongst the most ubiquitous in the global oceans and strains have been isolated from a wide variety of marine environments, including offshore oil-well heads, coastal thermal springs, Antarctic sea water, saline soils and associations with diatoms and dinoflagellates. Many strains have been recognized to be important hydrocarbon degraders in various marine habitats presenting sometimes extreme pH or salinity conditions. Analysis of the genome of M. aquaeolei revealed enormous adaptation versatility with an assortment of strategies for carbon and energy acquisition, sensation, and defense. In an effort to elucidate the ecological and biogeochemical significance of the Marinobacters, seven Marinobacter strains from diverse environments were included in a comparative genomics study. Genomes were screened for metabolic and adaptation potential to elucidate the strategies responsible for the omnipresence of the Marinobacter genus and their remedial action potential in hydrocarbon-polluted waters. The core genome predominantly encodes for key genes involved in hydrocarbon degradation, biofilm-relevant processes, including utilization of external DNA, halotolerance, as well as defense mechanisms against heavy metals, antibiotics, and toxins. All Marinobacter strains were observed to degrade a wide spectrum of hydrocarbon species, including aliphatic, polycyclic aromatic as well as acyclic isoprenoid compounds. Various genes predicted to facilitate hydrocarbon degradation, e.g. alkane 1-monooxygenase, appear to have originated from lateral gene transfer as they are located on gene clusters of 10-20% lower GC-content compared to genome averages and are flanked by transposases. Top ortholog hits are found in other hydrocarbon degrading organisms, e.g. Alcanivorax borkumensis. Strategies for hydrocarbon uptake encoded by various Marinobacter strains include cell surface hydrophobicity adaptation via capsular polysaccharide biosynthesis and attachment

  3. Array comparative genomic hybridization in retinoma and retinoblastoma tissues.

    PubMed

    Sampieri, Katia; Amenduni, Mariangela; Papa, Filomena Tiziana; Katzaki, Eleni; Mencarelli, Maria Antonietta; Marozza, Annabella; Epistolato, Maria Carmela; Toti, Paolo; Lazzi, Stefano; Bruttini, Mirella; De Filippis, Roberta; De Francesco, Sonia; Longo, Ilaria; Meloni, Ilaria; Mari, Francesca; Acquaviva, Antonio; Hadjistilianou, Theodora; Renieri, Alessandra; Ariani, Francesca

    2009-03-01

    In retinoblastoma, two RB1 mutations are necessary for tumor development. Recurrent genomic rearrangements may represent subsequent events required for retinoblastoma progression. Array-comparative genomic hybridization was carried out in 18 eye samples, 10 from bilateral and eight from unilateral retinoblastoma patients. Two unilateral cases also showed areas of retinoma. The most frequent imbalance in retinoblastomas was 6p gain (40%), followed by gains at 1q12-q25.3, 2p24.3-p24.2, 9q22.2, and 9q33.1 and losses at 11q24.3, 13q13.2-q22.3, and 16q12.1-q21. Bilateral cases showed a lower number of imbalances than unilateral cases (P = 0.002). Unilateral cases were divided into low-level (< or = 4) and high-level (> or = 7) chromosomal instability groups. The first group presented with younger age at diagnosis (mean 511 days) compared with the second group (mean 1606 days). In one retinoma case ophthalmoscopically diagnosed as a benign lesion no rearrangements were detected, whereas the adjacent retinoblastoma displayed seven aberrations. The other retinoma case identified by retrospective histopathological examination shared three rearrangements with the adjacent retinoblastoma. Two other gene-free rearrangements were retinoma specific. One rearrangement, dup5p, was retinoblastoma specific and included the SKP2 gene. Genomic profiling indicated that the first retinoma was a pretumoral lesion, whereas the other represents a subclone of cells bearing 'benign' rearrangements overwhelmed by another subclone presenting aberrations with higher 'oncogenic' potential. In summary, the present study shows that bilateral and unilateral retinoblastoma have different chromosomal instability that correlates with the age of tumor onset in unilateral cases. This is the first report of genomic profiling in retinoma tissue, shedding light on the different nature of lesions named 'retinoma'.

  4. A New System for Comparative Functional Genomics of Saccharomyces Yeasts

    PubMed Central

    Caudy, Amy A.; Guan, Yuanfang; Jia, Yue; Hansen, Christina; DeSevo, Chris; Hayes, Alicia P.; Agee, Joy; Alvarez-Dominguez, Juan R.; Arellano, Hugo; Barrett, Daniel; Bauerle, Cynthia; Bisaria, Namita; Bradley, Patrick H.; Breunig, J. Scott; Bush, Erin; Cappel, David; Capra, Emily; Chen, Walter; Clore, John; Combs, Peter A.; Doucette, Christopher; Demuren, Olukunle; Fellowes, Peter; Freeman, Sam; Frenkel, Evgeni; Gadala-Maria, Daniel; Gawande, Richa; Glass, David; Grossberg, Samuel; Gupta, Anita; Hammonds-Odie, Latanya; Hoisos, Aaron; Hsi, Jenny; Hsu, Yu-Han Huang; Inukai, Sachi; Karczewski, Konrad J.; Ke, Xiaobo; Kojima, Mina; Leachman, Samuel; Lieber, Danny; Liebowitz, Anna; Liu, Julia; Liu, Yufei; Martin, Trevor; Mena, Jose; Mendoza, Rosa; Myhrvold, Cameron; Millian, Christian; Pfau, Sarah; Raj, Sandeep; Rich, Matt; Rokicki, Joe; Rounds, William; Salazar, Michael; Salesi, Matthew; Sharma, Rajani; Silverman, Sanford; Singer, Cara; Sinha, Sandhya; Staller, Max; Stern, Philip; Tang, Hanlin; Weeks, Sharon; Weidmann, Maxwell; Wolf, Ashley; Young, Carmen; Yuan, Jie; Crutchfield, Christopher; McClean, Megan; Murphy, Coleen T.; Llinás, Manuel; Botstein, David; Troyanskaya, Olga G.; Dunham, Maitreya J.

    2013-01-01

    Whole-genome sequencing, particularly in fungi, has progressed at a tremendous rate. More difficult, however, is experimental testing of the inferences about gene function that can be drawn from comparative sequence analysis alone. We present a genome-wide functional characterization of a sequenced but experimentally understudied budding yeast, Saccharomyces bayanus var. uvarum (henceforth referred to as S. bayanus), allowing us to map changes over the 20 million years that separate this organism from S. cerevisiae. We first created a suite of genetic tools to facilitate work in S. bayanus. Next, we measured the gene-expression response of S. bayanus to a diverse set of perturbations optimized using a computational approach to cover a diverse array of functionally relevant biological responses. The resulting data set reveals that gene-expression patterns are largely conserved, but significant changes may exist in regulatory networks such as carbohydrate utilization and meiosis. In addition to regulatory changes, our approach identified gene functions that have diverged. The functions of genes in core pathways are highly conserved, but we observed many changes in which genes are involved in osmotic stress, peroxisome biogenesis, and autophagy. A surprising number of genes specific to S. bayanus respond to oxidative stress, suggesting the organism may have evolved under different selection pressures than S. cerevisiae. This work expands the scope of genome-scale evolutionary studies from sequence-based analysis to rapid experimental characterization and could be adopted for functional mapping in any lineage of interest. Furthermore, our detailed characterization of S. bayanus provides a valuable resource for comparative functional genomics studies in yeast. PMID:23852385

  5. A new system for comparative functional genomics of Saccharomyces yeasts.

    PubMed

    Caudy, Amy A; Guan, Yuanfang; Jia, Yue; Hansen, Christina; DeSevo, Chris; Hayes, Alicia P; Agee, Joy; Alvarez-Dominguez, Juan R; Arellano, Hugo; Barrett, Daniel; Bauerle, Cynthia; Bisaria, Namita; Bradley, Patrick H; Breunig, J Scott; Bush, Erin; Cappel, David; Capra, Emily; Chen, Walter; Clore, John; Combs, Peter A; Doucette, Christopher; Demuren, Olukunle; Fellowes, Peter; Freeman, Sam; Frenkel, Evgeni; Gadala-Maria, Daniel; Gawande, Richa; Glass, David; Grossberg, Samuel; Gupta, Anita; Hammonds-Odie, Latanya; Hoisos, Aaron; Hsi, Jenny; Hsu, Yu-Han Huang; Inukai, Sachi; Karczewski, Konrad J; Ke, Xiaobo; Kojima, Mina; Leachman, Samuel; Lieber, Danny; Liebowitz, Anna; Liu, Julia; Liu, Yufei; Martin, Trevor; Mena, Jose; Mendoza, Rosa; Myhrvold, Cameron; Millian, Christian; Pfau, Sarah; Raj, Sandeep; Rich, Matt; Rokicki, Joe; Rounds, William; Salazar, Michael; Salesi, Matthew; Sharma, Rajani; Silverman, Sanford; Singer, Cara; Sinha, Sandhya; Staller, Max; Stern, Philip; Tang, Hanlin; Weeks, Sharon; Weidmann, Maxwell; Wolf, Ashley; Young, Carmen; Yuan, Jie; Crutchfield, Christopher; McClean, Megan; Murphy, Coleen T; Llinás, Manuel; Botstein, David; Troyanskaya, Olga G; Dunham, Maitreya J

    2013-09-01

    Whole-genome sequencing, particularly in fungi, has progressed at a tremendous rate. More difficult, however, is experimental testing of the inferences about gene function that can be drawn from comparative sequence analysis alone. We present a genome-wide functional characterization of a sequenced but experimentally understudied budding yeast, Saccharomyces bayanus var. uvarum (henceforth referred to as S. bayanus), allowing us to map changes over the 20 million years that separate this organism from S. cerevisiae. We first created a suite of genetic tools to facilitate work in S. bayanus. Next, we measured the gene-expression response of S. bayanus to a diverse set of perturbations optimized using a computational approach to cover a diverse array of functionally relevant biological responses. The resulting data set reveals that gene-expression patterns are largely conserved, but significant changes may exist in regulatory networks such as carbohydrate utilization and meiosis. In addition to regulatory changes, our approach identified gene functions that have diverged. The functions of genes in core pathways are highly conserved, but we observed many changes in which genes are involved in osmotic stress, peroxisome biogenesis, and autophagy. A surprising number of genes specific to S. bayanus respond to oxidative stress, suggesting the organism may have evolved under different selection pressures than S. cerevisiae. This work expands the scope of genome-scale evolutionary studies from sequence-based analysis to rapid experimental characterization and could be adopted for functional mapping in any lineage of interest. Furthermore, our detailed characterization of S. bayanus provides a valuable resource for comparative functional genomics studies in yeast.

  6. Comparative Genomics of Serratia spp.: Two Paths towards Endosymbiotic Life

    PubMed Central

    Manzano-Marín, Alejandro; Lamelas, Araceli; Moya, Andrés; Latorre, Amparo

    2012-01-01

    Symbiosis is a widespread phenomenon in nature, in which insects show a great number of these associations. Buchnera aphidicola, the obligate endosymbiont of aphids, coexists in some species with another intracellular bacterium, Serratia symbiotica. Of particular interest is the case of the cedar aphid Cinara cedri, where B. aphidicola BCc and S. symbiotica SCc need each other to fulfil their symbiotic role with the insect. Moreover, various features seem to indicate that S. symbiotica SCc is closer to an obligate endosymbiont than to other facultative S. symbiotica, such as the one described for the aphid Acirthosyphon pisum (S. symbiotica SAp). This work is based on the comparative genomics of five strains of Serratia, three free-living and two endosymbiotic ones (one facultative and one obligate) which should allow us to dissect the genome reduction taking place in the adaptive process to an intracellular life-style. Using a pan-genome approach, we have identified shared and strain-specific genes from both endosymbiotic strains and gained insight into the different genetic reduction both S. symbiotica have undergone. We have identified both retained and reduced functional categories in S. symbiotica compared to the Free-Living Serratia (FLS) that seem to be related with its endosymbiotic role in their specific host-symbiont systems. By means of a phylogenomic reconstruction we have solved the position of both endosymbionts with confidence, established the probable insect-pathogen origin of the symbiotic clade as well as the high amino-acid substitution rate in S. symbiotica SCc. Finally, we were able to quantify the minimal number of rearrangements suffered in the endosymbiotic lineages and reconstruct a minimal rearrangement phylogeny. All these findings provide important evidence for the existence of at least two distinctive S. symbiotica lineages that are characterized by different rearrangements, gene content, genome size and branch lengths. PMID:23077583

  7. Lineage-specific evolution of the vertebrate Otopetrin gene family revealed by comparative genomic analyses

    PubMed Central

    2011-01-01

    Background Mutations in the Otopetrin 1 gene (Otop1) in mice and fish produce an unusual bilateral vestibular pathology that involves the absence of otoconia without hearing impairment. The encoded protein, Otop1, is the only functionally characterized member of the Otopetrin Domain Protein (ODP) family; the extended sequence and structural preservation of ODP proteins in metazoans suggest a conserved functional role. Here, we use the tools of sequence- and cytogenetic-based comparative genomics to study the Otop1 and the Otop2-Otop3 genes and to establish their genomic context in 25 vertebrates. We extend our evolutionary study to include the gene mutated in Usher syndrome (USH) subtype 1G (Ush1g), both because of the head-to-tail clustering of Ush1g with Otop2 and because Otop1 and Ush1g mutations result in inner ear phenotypes. Results We established that OTOP1 is the boundary gene of an inversion polymorphism on human chromosome 4p16 that originated in the common human-chimpanzee lineage more than 6 million years ago. Other lineage-specific evolutionary events included a three-fold expansion of the Otop genes in Xenopus tropicalis and of Ush1g in teleostei fish. The tight physical linkage between Otop2 and Ush1g is conserved in all vertebrates. To further understand the functional organization of the Ushg1-Otop2 locus, we deduced a putative map of binding sites for CCCTC-binding factor (CTCF), a mammalian insulator transcription factor, from genome-wide chromatin immunoprecipitation-sequencing (ChIP-seq) data in mouse and human embryonic stem (ES) cells combined with detection of CTCF-binding motifs. Conclusions The results presented here clarify the evolutionary history of the vertebrate Otop and Ush1g families, and establish a framework for studying the possible interaction(s) of Ush1g and Otop in developmental pathways. PMID:21261979

  8. An Ordered Comparative Map of the Cattle and Human Genomes

    PubMed Central

    Band, Mark R.; Larson, Joshua H.; Rebeiz, Mark; Green, Cheryl A.; Heyen, D. Wayne; Donovan, Jena; Windish, Ryan; Steining, Chad; Mahyuddin, Prapti; Womack, James E.; Lewin, Harris A.

    2000-01-01

    A cattle–human whole-genome comparative map was constructed using parallel radiation hybrid (RH) mapping in conjunction with EST sequencing, database mining for unmapped cattle genes, and a predictive bioinformatics approach (COMPASS) for targeting specific homologous regions. A total of 768 genes were placed on the RH map in addition to 319 microsatellites used as anchor markers. Of these, 638 had human orthologs with mapping data, thus permitting construction of an ordered comparative map. The large number of ordered loci revealed ⋝ 105 conserved segments between the two genomes. The comparative map suggests that 41 translocation events, a minimum of 54 internal rearrangements, and repositioning of all but one centromere can account for the observed organizations of the cattle and human genomes. In addition, the COMPASS in silico mapping tool was shown to be 95% accurate in its ability to predict cattle chromosome location from random sequence data, demonstrating this tool to be valuable for efficient targeting of specific regions for detailed mapping. The comparative map generated will be a cornerstone for elucidating mammalian chromosome phylogeny and the identification of genes of agricultural importance.“Ought we, for instance, to begin by discussing each separate species—in virtue of some common element of their nature, and proceed from this as a basis for the consideration of them separately?” from Aristotle, On the Parts of Animals, 350 B.C.E. [The sequence data described in this paper have been submitted to the GenBank data library under accession nos. AW244888-AW244897, AW261132-AW261195, AW266849-AW267161, AW289175-AW289430, AW428566-AW428607, AW621146, AW621147.] PMID:10984454

  9. Comparative Genomics of Flatworms (Platyhelminthes) Reveals Shared Genomic Features of Ecto- and Endoparastic Neodermata

    PubMed Central

    Hahn, Christoph; Fromm, Bastian; Bachmann, Lutz

    2014-01-01

    The ectoparasitic Monogenea comprise a major part of the obligate parasitic flatworm diversity. Although genomic adaptations to parasitism have been studied in the endoparasitic tapeworms (Cestoda) and flukes (Trematoda), no representative of the Monogenea has been investigated yet. We present the high-quality draft genome of Gyrodactylus salaris, an economically important monogenean ectoparasite of wild Atlantic salmon (Salmo salar). A total of 15,488 gene models were identified, of which 7,102 were functionally annotated. The controversial phylogenetic relationships within the obligate parasitic Neodermata were resolved in a phylogenomic analysis using 1,719 gene models (alignment length of >500,000 amino acids) for a set of 16 metazoan taxa. The Monogenea were found basal to the Cestoda and Trematoda, which implies ectoparasitism being plesiomorphic within the Neodermata and strongly supports a common origin of complex life cycles. Comparative analysis of seven parasitic flatworm genomes identified shared genomic features for the ecto- and endoparasitic lineages, such as a substantial reduction of the core bilaterian gene complement, including the homeodomain-containing genes, and a loss of the piwi and vasa genes, which are considered essential for animal development. Furthermore, the shared loss of functional fatty acid biosynthesis pathways and the absence of peroxisomes, the latter organelles presumed ubiquitous in eukaryotes except for parasitic protozoans, were inferred. The draft genome of G. salaris opens for future in-depth analyses of pathogenicity and host specificity of poorly characterized G. salaris strains, and will enhance studies addressing the genomics of host–parasite interactions and speciation in the highly diverse monogenean flatworms. PMID:24732282

  10. Comparative genomics of flatworms (platyhelminthes) reveals shared genomic features of ecto- and endoparastic neodermata.

    PubMed

    Hahn, Christoph; Fromm, Bastian; Bachmann, Lutz

    2014-05-01

    The ectoparasitic Monogenea comprise a major part of the obligate parasitic flatworm diversity. Although genomic adaptations to parasitism have been studied in the endoparasitic tapeworms (Cestoda) and flukes (Trematoda), no representative of the Monogenea has been investigated yet. We present the high-quality draft genome of Gyrodactylus salaris, an economically important monogenean ectoparasite of wild Atlantic salmon (Salmo salar). A total of 15,488 gene models were identified, of which 7,102 were functionally annotated. The controversial phylogenetic relationships within the obligate parasitic Neodermata were resolved in a phylogenomic analysis using 1,719 gene models (alignment length of >500,000 amino acids) for a set of 16 metazoan taxa. The Monogenea were found basal to the Cestoda and Trematoda, which implies ectoparasitism being plesiomorphic within the Neodermata and strongly supports a common origin of complex life cycles. Comparative analysis of seven parasitic flatworm genomes identified shared genomic features for the ecto- and endoparasitic lineages, such as a substantial reduction of the core bilaterian gene complement, including the homeodomain-containing genes, and a loss of the piwi and vasa genes, which are considered essential for animal development. Furthermore, the shared loss of functional fatty acid biosynthesis pathways and the absence of peroxisomes, the latter organelles presumed ubiquitous in eukaryotes except for parasitic protozoans, were inferred. The draft genome of G. salaris opens for future in-depth analyses of pathogenicity and host specificity of poorly characterized G. salaris strains, and will enhance studies addressing the genomics of host-parasite interactions and speciation in the highly diverse monogenean flatworms.

  11. Genome Sequence of Desulfurella amilsii Strain TR1 and Comparative Genomics of Desulfurellaceae Family

    PubMed Central

    Florentino, Anna P.; Stams, Alfons J. M.; Sánchez-Andrea, Irene

    2017-01-01

    The acidotolerant sulfur reducer Desulfurella amilsii was isolated from sediments of Tinto River, an extremely acidic environment. Its ability to grow in a broad range of pH and to tolerate certain heavy metals offers potential for metal recovery processes. Here we report its high-quality draft genome sequence and compare it to the available genome sequences of other members of Desulfurellaceae family: D. acetivorans. D. multipotens, Hippea maritima. H. alviniae, H. medeae, and H. jasoniae. For most species, pairwise comparisons for average nucleotide identity (ANI) and in silico DNA–DNA hybridization (DDH) revealed ANI values from 67.5 to 80% and DDH values from 12.9 to 24.2%. D. acetivorans and D. multipotens, however, surpassed the estimated thresholds of species definition for both DDH (98.6%) and ANI (88.1%). Therefore, they should be merged to a single species. Comparative analysis of Desulfurellaceae genomes revealed different gene content for sulfur respiration between Desulfurella and Hippea species. Sulfur reductase is only encoded in D. amilsii, in which it is suggested to play a role in sulfur respiration, especially at low pH. Polysulfide reductase is only encoded in Hippea species; it is likely that this genus uses polysulfide as electron acceptor. Genes encoding thiosulfate reductase are present in all the genomes, but dissimilatory sulfite reductase is only present in Desulfurella species. Thus, thiosulfate respiration via sulfite is only likely in this genus. Although sulfur disproportionation occurs in Desulfurella species, the molecular mechanism behind this process is not yet understood, hampering a genome prediction. The metabolism of acetate in Desulfurella species can occur via the acetyl-CoA synthetase or via acetate kinase in combination with phosphate acetyltransferase, while in Hippea species, it might occur via the acetate kinase. Large differences in gene sets involved in resistance to acidic conditions were not detected among the

  12. Comparative Genomics of the Listeria monocytogenes ST204 Subgroup.

    PubMed

    Fox, Edward M; Allnutt, Theodore; Bradbury, Mark I; Fanning, Séamus; Chandry, P Scott

    2016-01-01

    The ST204 subgroup of Listeria monocytogenes is among the most frequently isolated in Australia from a range of environmental niches. In this study we provide a comparative genomics analysis of food and food environment isolates from geographically diverse sources. Analysis of the ST204 genomes showed a highly conserved core genome with the majority of variation seen in mobile genetic elements such as plasmids, transposons and phage insertions. Most strains (13/15) harbored plasmids, which although varying in size contained highly conserved sequences. Interestingly 4 isolates contained a conserved plasmid of 91,396 bp. The strains examined were isolated over a period of 12 years and from different geographic locations suggesting plasmids are an important component of the genetic repertoire of this subgroup and may provide a range of stress tolerance mechanisms. In addition to this 4 phage insertion sites and 2 transposons were identified among isolates, including a novel transposon. These genetic elements were highly conserved across isolates that harbored them, and also contained a range of genetic markers linked to stress tolerance and virulence. The maintenance of conserved mobile genetic elements in the ST204 population suggests these elements may contribute to the diverse range of niches colonized by ST204 isolates. Environmental stress selection may contribute to maintaining these genetic features, which in turn may be co-selecting for virulence markers relevant to clinical infection with ST204 isolates.

  13. Comparative genomics of the mimicry switch in Papilio dardanus

    PubMed Central

    Timmermans, Martijn J. T. N.; Baxter, Simon W.; Clark, Rebecca; Heckel, David G.; Vogel, Heiko; Collins, Steve; Papanicolaou, Alexie; Fukova, Iva; Joron, Mathieu; Thompson, Martin J.; Jiggins, Chris D.; ffrench-Constant, Richard H.; Vogler, Alfried P.

    2014-01-01

    The African Mocker Swallowtail, Papilio dardanus, is a textbook example in evolutionary genetics. Classical breeding experiments have shown that wing pattern variation in this polymorphic Batesian mimic is determined by the polyallelic H locus that controls a set of distinct mimetic phenotypes. Using bacterial artificial chromosome (BAC) sequencing, recombination analyses and comparative genomics, we show that H co-segregates with an interval of less than 500 kb that is collinear with two other Lepidoptera genomes and contains 24 genes, including the transcription factor genes engrailed (en) and invected (inv). H is located in a region of conserved gene order, which argues against any role for genomic translocations in the evolution of a hypothesized multi-gene mimicry locus. Natural populations of P. dardanus show significant associations of specific morphs with single nucleotide polymorphisms (SNPs), centred on en. In addition, SNP variation in the H region reveals evidence of non-neutral molecular evolution in the en gene alone. We find evidence for a duplication potentially driving physical constraints on recombination in the lamborni morph. Absence of perfect linkage disequilibrium between different genes in the other morphs suggests that H is limited to nucleotide positions in the regulatory and coding regions of en. Our results therefore support the hypothesis that a single gene underlies wing pattern variation in P. dardanus. PMID:24920480

  14. Comparative genomics of the mimicry switch in Papilio dardanus.

    PubMed

    Timmermans, Martijn J T N; Baxter, Simon W; Clark, Rebecca; Heckel, David G; Vogel, Heiko; Collins, Steve; Papanicolaou, Alexie; Fukova, Iva; Joron, Mathieu; Thompson, Martin J; Jiggins, Chris D; ffrench-Constant, Richard H; Vogler, Alfried P

    2014-07-22

    The African Mocker Swallowtail, Papilio dardanus, is a textbook example in evolutionary genetics. Classical breeding experiments have shown that wing pattern variation in this polymorphic Batesian mimic is determined by the polyallelic H locus that controls a set of distinct mimetic phenotypes. Using bacterial artificial chromosome (BAC) sequencing, recombination analyses and comparative genomics, we show that H co-segregates with an interval of less than 500 kb that is collinear with two other Lepidoptera genomes and contains 24 genes, including the transcription factor genes engrailed (en) and invected (inv). H is located in a region of conserved gene order, which argues against any role for genomic translocations in the evolution of a hypothesized multi-gene mimicry locus. Natural populations of P. dardanus show significant associations of specific morphs with single nucleotide polymorphisms (SNPs), centred on en. In addition, SNP variation in the H region reveals evidence of non-neutral molecular evolution in the en gene alone. We find evidence for a duplication potentially driving physical constraints on recombination in the lamborni morph. Absence of perfect linkage disequilibrium between different genes in the other morphs suggests that H is limited to nucleotide positions in the regulatory and coding regions of en. Our results therefore support the hypothesis that a single gene underlies wing pattern variation in P. dardanus.

  15. Comparative analysis of essential genes in prokaryotic genomic islands

    PubMed Central

    Zhang, Xi; Peng, Chong; Zhang, Ge; Gao, Feng

    2015-01-01

    Essential genes are thought to encode proteins that carry out the basic functions to sustain a cellular life, and genomic islands (GIs) usually contain clusters of horizontally transferred genes. It has been assumed that essential genes are not likely to be located in GIs, but systematical analysis of essential genes in GIs has not been explored before. Here, we have analyzed the essential genes in 28 prokaryotes by statistical method and reached a conclusion that essential genes in GIs are significantly fewer than those outside GIs. The function of 362 essential genes found in GIs has been explored further by BLAST against the Virulence Factor Database (VFDB) and the phage/prophage sequence database of PHAge Search Tool (PHAST). Consequently, 64 and 60 eligible essential genes are found to share the sequence similarity with the virulence factors and phage/prophages-related genes, respectively. Meanwhile, we find several toxin-related proteins and repressors encoded by these essential genes in GIs. The comparative analysis of essential genes in genomic islands will not only shed new light on the development of the prediction algorithm of essential genes, but also give a clue to detect the functionality of essential genes in genomic islands. PMID:26223387

  16. Comparative genomics reveals mobile pathogenicity chromosomes in Fusarium

    SciTech Connect

    Ma, Li Jun; van der Does, H. C.; Borkovich, Katherine A.; Coleman, Jeffrey J.; Daboussi, Marie-Jose; Di Pietro, Antonio; Dufresne, Marie; Freitag, Michael; Grabherr, Manfred; Henrissat, Bernard; Houterman, Petra M.; Kang, Seogchan; Shim, Won-Bo; Wolochuk, Charles; Xie, Xiaohui; Xu, Jin Rong; Antoniw, John; Baker, Scott E.; Bluhm, Burton H.; Breakspear, Andrew; Brown, Daren W.; Butchko, Robert A.; Chapman, Sinead; Coulson, Richard; Coutinho, Pedro M.; Danchin, Etienne G.; Diener, Andrew; Gale, Liane R.; Gardiner, Donald; Goff, Steven; Hammond-Kossack, Kim; Hilburn, Karen; Hua-Van, Aurelie; Jonkers, Wilfried; Kazan, Kemal; Kodira, Chinnappa D.; Koehrsen, Michael; Kumar, Lokesh; Lee, Yong Hwan; Li, Liande; Manners, John M.; Miranda-Saavedra, Diego; Mukherjee, Mala; Park, Gyungsoon; Park, Jongsun; Park, Sook Young; Proctor, Robert H.; Regev, Aviv; Ruiz-Roldan, M. C.; Sain, Divya; Sakthikumar, Sharadha; Sykes, Sean; Schwartz, David C.; Turgeon, Barbara G.; Wapinski, Ilan; Yoder, Olen; Young, Sarah; Zeng, Qiandong; Zhou, Shiguo; Galagan, James; Cuomo, Christina A.; Kistler, H. Corby; Rep, Martijn

    2010-03-18

    Fusarium species are among the most important phytopathogenic and toxigenic fungi, having significant impact on crop production and animal health. Distinctively, members of the F. oxysporum species complex exhibit wide host range but discontinuously distributed host specificity, reflecting remarkable genetic adaptability. To understand the molecular underpinnings of diverse phenotypic traits and their evolution in Fusarium, we compared the genomes of three economically important and phylogenetically related, yet phenotypically diverse plant-pathogenic species, F. graminearum, F. verticillioides and F. oxysporum f. sp. lycopersici. Our analysis revealed greatly expanded lineage-specific (LS) genomic regions in F. oxysporum that include four entire chromosomes, accounting for more than one-quarter of the genome. LS regions are rich in transposons and genes with distinct evolutionary profiles but related to pathogenicity. Experimentally, we demonstrate for the first time the transfer of two LS chromosomes between strains of F. oxysporum, resulting in the conversion of a non-pathogenic strain into a pathogen. Transfer of LS chromosomes between otherwise genetically isolated strains explains the polyphyletic origin of host specificity and the emergence of new pathogenic lineages in the F. oxysporum species complex, putting the evolution of fungal pathogenicity into a new perspective.

  17. Comparative Genomics of the Listeria monocytogenes ST204 Subgroup

    PubMed Central

    Fox, Edward M.; Allnutt, Theodore; Bradbury, Mark I.; Fanning, Séamus; Chandry, P. Scott

    2016-01-01

    The ST204 subgroup of Listeria monocytogenes is among the most frequently isolated in Australia from a range of environmental niches. In this study we provide a comparative genomics analysis of food and food environment isolates from geographically diverse sources. Analysis of the ST204 genomes showed a highly conserved core genome with the majority of variation seen in mobile genetic elements such as plasmids, transposons and phage insertions. Most strains (13/15) harbored plasmids, which although varying in size contained highly conserved sequences. Interestingly 4 isolates contained a conserved plasmid of 91,396 bp. The strains examined were isolated over a period of 12 years and from different geographic locations suggesting plasmids are an important component of the genetic repertoire of this subgroup and may provide a range of stress tolerance mechanisms. In addition to this 4 phage insertion sites and 2 transposons were identified among isolates, including a novel transposon. These genetic elements were highly conserved across isolates that harbored them, and also contained a range of genetic markers linked to stress tolerance and virulence. The maintenance of conserved mobile genetic elements in the ST204 population suggests these elements may contribute to the diverse range of niches colonized by ST204 isolates. Environmental stress selection may contribute to maintaining these genetic features, which in turn may be co-selecting for virulence markers relevant to clinical infection with ST204 isolates. PMID:28066377

  18. Detection of aneuploidy in single cells using comparative genomic hybridization.

    PubMed

    Voullaire, L; Wilton, L; Slater, H; Williamson, R

    1999-09-01

    The ability of comparative genomic hybridization (CGH) to detect aneuploidy following universal amplification of DNA from a single cell, or a small number of cells, was investigated with a view to preimplantation diagnosis following in vitro fertilization, and prenatal diagnosis using fetal erythroblasts obtained from maternal blood. The DNA obtained from lysed single cells was amplified using degenerate oligonucleotide-primed PCR (DOP-PCR). This product was labelled using nick translation and hybridized together with normal reference genomic DNA. The CGH fluorescent ratio profiles obtained could be used to determine aneuploidy with cut-off thresholds of 0.75 and 1.25. Deviation in the profiles in the heterochromatic regions was reduced by using, as a reference sample, normal genomic DNA that had also undergone DOP-PCR. Single cells known to be trisomic for chromosomes 13, 18 or 21 were analysed using this technique. The resolution of CGH with amplified DNA from a single cell is of the order of 40 Mb, sufficient for the diagnosis of trisomy 21, and possibly segmental aneuploidy of equivalent size. These results, and those of others, demonstrate that diagnosis of chromosomal aneuploidy in single cells is possible using CGH with DOP-PCR amplified DNA.

  19. An Inhibitory Motif on the 5’UTR of Several Rotavirus Genome Segments Affects Protein Expression and Reverse Genetics Strategies

    PubMed Central

    Papa, Guido; Eichwald, Catherine; Burrone, Oscar R.

    2016-01-01

    Rotavirus genome consists of eleven segments of dsRNA, each encoding one single protein. Viral mRNAs contain an open reading frame (ORF) flanked by relatively short untranslated regions (UTRs), whose role in the viral cycle remains elusive. Here we investigated the role of 5’UTRs in T7 polymerase-driven cDNAs expression in uninfected cells. The 5’UTRs of eight genome segments (gs3, gs5-6, gs7-11) of the simian SA11 strain showed a strong inhibitory effect on the expression of viral proteins. Decreased protein expression was due to both compromised transcription and translation and was independent of the ORF and the 3’UTR sequences. Analysis of several mutants of the 21-nucleotide long 5’UTR of gs 11 defined an inhibitory motif (IM) represented by its primary sequence rather than its secondary structure. IM was mapped to the 5’ terminal 6-nucleotide long pyrimidine-rich tract 5’-GGY(U/A)UY-3’. The 5’ terminal position within the mRNA was shown to be essentially required, as inhibitory activity was lost when IM was moved to an internal position. We identified two mutations (insertion of a G upstream the 5’UTR and the U to A mutation of the fifth nucleotide of IM) that render IM non-functional and increase the transcription and translation rate to levels that could considerably improve the efficiency of virus helper-free reverse genetics strategies. PMID:27846320

  20. Genome-Wide Identification and Analysis of the VQ Motif-Containing Protein Family in Chinese Cabbage (Brassica rapa L. ssp. Pekinensis).

    PubMed

    Zhang, Gaoyuan; Wang, Fengde; Li, Jingjuan; Ding, Qian; Zhang, Yihui; Li, Huayin; Zhang, Jiannong; Gao, Jianwei

    2015-12-02

    Previous studies have showed that the VQ motif-containing proteins in Arabidopsis thaliana and Oryza sativa play an important role in plant growth, development, and stress responses. However, little is known about the functions of the VQ genes in Brassica rapa (Chinese cabbage). In this study, we performed genome-wide identification, characterization, and expression analysis of the VQ genes in Chinese cabbage, especially under adverse environment. We identified 57 VQ genes and classified them into seven subgroups (I-VII), which were dispersedly distributed on chromosomes 1 to 10. The expansion of these genes mainly contributed to segmental and tandem duplication. Fifty-four VQ genes contained no introns and 50 VQ proteins were less than 300 amino acids in length. Quantitative real-time PCR showed that the VQ genes were differentially expressed in various tissues and during different abiotic stresses and plant hormone treatments. This study provides a comprehensive overview of Chinese cabbage VQ genes and will benefit the molecular breeding for resistance to stresses and disease, as well as further studies on the biological functions of the VQ proteins.

  1. Origin Replication Complex Binding, Nucleosome Depletion Patterns, and a Primary Sequence Motif Can Predict Origins of Replication in a Genome with Epigenetic Centromeres

    PubMed Central

    Tsai, Hung-Ji; Baller, Joshua A.; Liachko, Ivan; Koren, Amnon; Burrack, Laura S.; Hickman, Meleah A.; Thevandavakkam, Mathuravani A.; Rusche, Laura N.

    2014-01-01

    ABSTRACT Origins of DNA replication are key genetic elements, yet their identification remains elusive in most organisms. In previous work, we found that centromeres contain origins of replication (ORIs) that are determined epigenetically in the pathogenic yeast Candida albicans. In this study, we used origin recognition complex (ORC) binding and nucleosome occupancy patterns in Saccharomyces cerevisiae and Kluyveromyces lactis to train a machine learning algorithm to predict the position of active arm (noncentromeric) origins in the C. albicans genome. The model identified bona fide active origins as determined by the presence of replication intermediates on nondenaturing two-dimensional (2D) gels. Importantly, these origins function at their native chromosomal loci and also as autonomously replicating sequences (ARSs) on a linear plasmid. A “mini-ARS screen” identified at least one and often two ARS regions of ≥100 bp within each bona fide origin. Furthermore, a 15-bp AC-rich consensus motif was associated with the predicted origins and conferred autonomous replicating activity to the mini-ARSs. Thus, while centromeres and the origins associated with them are epigenetic, arm origins are dependent upon critical DNA features, such as a binding site for ORC and a propensity for nucleosome exclusion. PMID:25182328

  2. Comparative genomics of the tardigrades Hypsibius dujardini and Ramazzottius varieornatus.

    PubMed

    Yoshida, Yuki; Koutsovoulos, Georgios; Laetsch, Dominik R; Stevens, Lewis; Kumar, Sujai; Horikawa, Daiki D; Ishino, Kyoko; Komine, Shiori; Kunieda, Takekazu; Tomita, Masaru; Blaxter, Mark; Arakawa, Kazuharu

    2017-07-01

    Tardigrada, a phylum of meiofaunal organisms, have been at the center of discussions of the evolution of Metazoa, the biology of survival in extreme environments, and the role of horizontal gene transfer in animal evolution. Tardigrada are placed as sisters to Arthropoda and Onychophora (velvet worms) in the superphylum Panarthropoda by morphological analyses, but many molecular phylogenies fail to recover this relationship. This tension between molecular and morphological understanding may be very revealing of the mode and patterns of evolution of major groups. Limnoterrestrial tardigrades display extreme cryptobiotic abilities, including anhydrobiosis and cryobiosis, as do bdelloid rotifers, nematodes, and other animals of the water film. These extremophile behaviors challenge understanding of normal, aqueous physiology: how does a multicellular organism avoid lethal cellular collapse in the absence of liquid water? Meiofaunal species have been reported to have elevated levels of horizontal gene transfer (HGT) events, but how important this is in evolution, and particularly in the evolution of extremophile physiology, is unclear. To address these questions, we resequenced and reassembled the genome of H. dujardini, a limnoterrestrial tardigrade that can undergo anhydrobiosis only after extensive pre-exposure to drying conditions, and compared it to the genome of R. varieornatus, a related species with tolerance to rapid desiccation. The 2 species had contrasting gene expression responses to anhydrobiosis, with major transcriptional change in H. dujardini but limited regulation in R. varieornatus. We identified few horizontally transferred genes, but some of these were shown to be involved in entry into anhydrobiosis. Whole-genome molecular phylogenies supported a Tardigrada+Nematoda relationship over Tardigrada+Arthropoda, but rare genomic changes tended to support Tardigrada+Arthropoda.

  3. Comparative genomic analysis of Chlamydia trachomatis oculotropic and genitotropic strains.

    PubMed

    Carlson, John H; Porcella, Stephen F; McClarty, Grant; Caldwell, Harlan D

    2005-10-01

    Chlamydia trachomatis infection is an important cause of preventable blindness and sexually transmitted disease (STD) in humans. C. trachomatis exists as multiple serovariants that exhibit distinct organotropism for the eye or urogenital tract. We previously reported tissue-tropic correlations with the presence or absence of a functional tryptophan synthase and a putative GTPase-inactivating domain of the chlamydial toxin gene. This suggested that these genes may be the primary factors responsible for chlamydial disease organotropism. To test this hypothesis, the genome of an oculotropic trachoma isolate (A/HAR-13) was sequenced and compared to the genome of a genitotropic (D/UW-3) isolate. Remarkably, the genomes share 99.6% identity, supporting the conclusion that a functional tryptophan synthase enzyme and toxin might be the principal virulence factors underlying disease organotropism. Tarp (translocated actin-recruiting phosphoprotein) was identified to have variable numbers of repeat units within the N and C portions of the protein. A correlation exists between lymphogranuloma venereum serovars and the number of N-terminal repeats. Single-nucleotide polymorphism (SNP) analysis between the two genomes highlighted the minimal genetic variation. A disproportionate number of SNPs were observed within some members of the polymorphic membrane protein (pmp) autotransporter gene family that corresponded to predicted T-cell epitopes that bind HLA class I and II alleles. These results implicate Pmps as novel immune targets, which could advance future chlamydial vaccine strategies. Lastly, a novel target for PCR diagnostics was discovered that can discriminate between ocular and genital strains. This discovery will enhance epidemiological investigations in nations where both trachoma and chlamydial STD are endemic.

  4. Comparative genomics of the tardigrades Hypsibius dujardini and Ramazzottius varieornatus

    PubMed Central

    Yoshida, Yuki; Koutsovoulos, Georgios; Laetsch, Dominik R.; Stevens, Lewis; Kumar, Sujai; Horikawa, Daiki D.; Ishino, Kyoko; Komine, Shiori; Kunieda, Takekazu; Tomita, Masaru; Blaxter, Mark

    2017-01-01

    Tardigrada, a phylum of meiofaunal organisms, have been at the center of discussions of the evolution of Metazoa, the biology of survival in extreme environments, and the role of horizontal gene transfer in animal evolution. Tardigrada are placed as sisters to Arthropoda and Onychophora (velvet worms) in the superphylum Panarthropoda by morphological analyses, but many molecular phylogenies fail to recover this relationship. This tension between molecular and morphological understanding may be very revealing of the mode and patterns of evolution of major groups. Limnoterrestrial tardigrades display extreme cryptobiotic abilities, including anhydrobiosis and cryobiosis, as do bdelloid rotifers, nematodes, and other animals of the water film. These extremophile behaviors challenge understanding of normal, aqueous physiology: how does a multicellular organism avoid lethal cellular collapse in the absence of liquid water? Meiofaunal species have been reported to have elevated levels of horizontal gene transfer (HGT) events, but how important this is in evolution, and particularly in the evolution of extremophile physiology, is unclear. To address these questions, we resequenced and reassembled the genome of H. dujardini, a limnoterrestrial tardigrade that can undergo anhydrobiosis only after extensive pre-exposure to drying conditions, and compared it to the genome of R. varieornatus, a related species with tolerance to rapid desiccation. The 2 species had contrasting gene expression responses to anhydrobiosis, with major transcriptional change in H. dujardini but limited regulation in R. varieornatus. We identified few horizontally transferred genes, but some of these were shown to be involved in entry into anhydrobiosis. Whole-genome molecular phylogenies supported a Tardigrada+Nematoda relationship over Tardigrada+Arthropoda, but rare genomic changes tended to support Tardigrada+Arthropoda. PMID:28749982

  5. Reconstructing the Evolution of Brachypodium Genomes Using Comparative Chromosome Painting

    PubMed Central

    Betekhtin, Alexander; Jenkins, Glyn; Hasterok, Robert

    2014-01-01

    Brachypodium distachyon is a model for the temperate cereals and grasses and has a biology, genomics infrastructure and cytogenetic platform fit for purpose. It is a member of a genus with fewer than 20 species, which have different genome sizes, basic chromosome numbers and ploidy levels. The phylogeny and interspecific relationships of this group have not to date been resolved by sequence comparisons and karyotypical studies. The aims of this study are not only to reconstruct the evolution of Brachypodium karyotypes to resolve the phylogeny, but also to highlight the mechanisms that shape the evolution of grass genomes. This was achieved through the use of comparative chromosome painting (CCP) which hybridises fluorescent, chromosome-specific probes derived from B. distachyon to homoeologous meiotic chromosomes of its close relatives. The study included five diploids (B. distachyon 2n = 10, B. sylvaticum 2n = 18, B. pinnatum 2n = 16; 2n = 18, B. arbuscula 2n = 18 and B. stacei 2n = 20) three allotetraploids (B. pinnatum 2n = 28, B. phoenicoides 2n = 28 and B. hybridum 2n = 30), and two species of unknown ploidy (B. retusum 2n = 38 and B. mexicanum 2n = 40). On the basis of the patterns of hybridisation and incorporating published data, we propose two alternative, but similar, models of karyotype evolution in the genus Brachypodium. According to the first model, the extant genome of B. distachyon derives from B. mexicanum or B. stacei by several rounds of descending dysploidy, and the other diploids evolve from B. distachyon via ascending dysploidy. The allotetraploids arise by interspecific hybridisation and chromosome doubling between B. distachyon and other diploids. The second model differs from the first insofar as it incorporates an intermediate 2n = 18 species between the B. mexicanum or B. stacei progenitors and the dysploidic B. distachyon. PMID:25493646

  6. Comparative Genomic Analysis of Mannheimia haemolytica from Bovine Sources

    PubMed Central

    Klima, Cassidy L.; Cook, Shaun R.; Zaheer, Rahat; Laing, Chad; Gannon, Vick P.; Xu, Yong; Rasmussen, Jay; Potter, Andrew; Hendrick, Steve; Alexander, Trevor W.; McAllister, Tim A.

    2016-01-01

    Bovine respiratory disease is a common health problem in beef production. The primary bacterial agent involved, Mannheimia haemolytica, is a target for antimicrobial therapy and at risk for associated antimicrobial resistance development. The role of M. haemolytica in pathogenesis is linked to serotype with serotypes 1 (S1) and 6 (S6) isolated from pneumonic lesions and serotype 2 (S2) found in the upper respiratory tract of healthy animals. Here, we sequenced the genomes of 11 strains of M. haemolytica, representing all three serotypes and performed comparative genomics analysis to identify genetic features that may contribute to pathogenesis. Possible virulence associated genes were identified within 14 distinct prophage, including a periplasmic chaperone, a lipoprotein, peptidoglycan glycosyltransferase and a stress response protein. Prophage content ranged from 2–8 per genome, but was higher in S1 and S6 strains. A type I-C CRISPR-Cas system was identified in each strain with spacer diversity and organization conserved among serotypes. The majority of spacers occur in S1 and S6 strains and originate from phage suggesting that serotypes 1 and 6 may be more resistant to phage predation. However, two spacers complementary to the host chromosome targeting a UDP-N-acetylglucosamine 2-epimerase and a glycosyl transferases group 1 gene are present in S1 and S6 strains only indicating these serotypes may employ CRISPR-Cas to regulate gene expression to avoid host immune responses or enhance adhesion during infection. Integrative conjugative elements are present in nine of the eleven genomes. Three of these harbor extensive multi-drug resistance cassettes encoding resistance against the majority of drugs used to combat infection in beef cattle, including macrolides and tetracyclines used in human medicine. The findings here identify key features that are likely contributing to serotype related pathogenesis and specific targets for vaccine design intended to reduce the

  7. Comparative study of the gating motif and C-type inactivation in prokaryotic voltage-gated sodium channels.

    PubMed

    Irie, Katsumasa; Kitagawa, Kazuya; Nagura, Hitoshi; Imai, Tomoya; Shimomura, Takushi; Fujiyoshi, Yoshinori

    2010-02-05

    Prokaryotic voltage-gated sodium channels (Na(V)s) are homotetramers and are thought to inactivate through a single mechanism, named C-type inactivation. Here we report the voltage dependence and inactivation rate of the NaChBac channel from Bacillus halodurans, the first identified prokaryotic Na(V), as well as of three new homologues cloned from Bacillus licheniformis (Na(V)BacL), Shewanella putrefaciens (Na(V)SheP), and Roseobacter denitrificans (Na(V)RosD). We found that, although activated by a lower membrane potential, Na(V)BacL inactivates as slowly as NaChBac. Na(V)SheP and Na(V)RosD inactivate faster than NaChBac. Mutational analysis of helix S6 showed that residues corresponding to the "glycine hinge" and "PXP motif" in voltage-gated potassium channels are not obligatory for channel gating in these prokaryotic Na(V)s, but mutations in the regions changed the inactivation rates. Mutation of the region corresponding to the glycine hinge in Na(V)BacL (A214G), Na(V)SheP (A216G), and NaChBac (G219A) accelerated inactivation in these channels, whereas mutation of glycine to alanine in the lower part of helix S6 in NaChBac (G229A), Na(V)BacL (G224A), and Na(V)RosD (G217A) reduced the inactivation rate. These results imply that activation gating in prokaryotic Na(V)s does not require gating motifs and that the residues of helix S6 affect C-type inactivation rates in these channels.

  8. Whole genome comparative analysis of four Georgian grape cultivars.

    PubMed

    Tabidze, V; Pipia, I; Gogniashvili, M; Kunelauri, N; Ujmajuridze, L; Pirtskhalava, M; Vishnepolsky, B; Hernandez, A G; Fields, C J; Beridze, Tengiz

    2017-08-07

    Grapevine is the one of the most important fruit species in the world. Comparative genome sequencing of grape cultivars is very important for the interpretation of the grape genome and understanding its evolution. The genomes of four Georgian grape cultivars-Chkhaveri, Saperavi, Meskhetian green, and Rkatsiteli, belonging to different haplogroups, were resequenced. The shotgun genomic libraries of grape cultivars were sequenced on an Illumina HiSeq. Pinot Noir nuclear, mitochondrial, and chloroplast DNA were used as reference. Mitochondrial DNA of Chkhaveri closely matches that of the reference Pinot noir mitochondrial DNA, with the exception of 16 SNPs found in the Chkhaveri mitochondrial DNA. The number of SNPs in mitochondrial DNA from Saperavi, Meskhetian green, and Rkatsiteli was 764, 702, and 822, respectively. Nuclear DNA differs from the reference by 1,800,675 nt in Chkhaveri, 1,063,063 nt in Meskhetian green, 2,174,995 in Saperavi, and 5,011,513 in Rkatsiteli. Unlike mtDNA Pinot noir, chromosomal DNA is closer to the Meskhetian green than to other cultivars. Substantial differences in the number of SNPs in mitochondrial and nuclear DNA of Chkhaveri and Pinot noir cultivars are explained by backcrossing or introgression of their wild predecessors before or during the process of domestication. Annotation of chromosomal DNA of Georgian grape cultivars by MEGANTE, a web-based annotation system, shows 66,745 predicted genes (Chkhaveri-17,409; Saperavi-17,021; Meskhetian green-18,355; and Rkatsiteli-13,960). Among them, 106 predicted genes and 43 pseudogenes of terpene synthase genes were found in chromosomes 12, 18 random (18R), and 19. Four novel TPS genes not present in reference Pinot noir DNA were detected. Two of them-germacrene A synthase (Chromosome 18R) and (-) germacrene D synthase (Chromosome 19) can be identified as putatively full-length proteins. This work performs the first attempt of the comparative whole genome analysis of different haplogroups

  9. Genomic Sequencing of Orientia tsutsugamushi Strain Karp, an Assembly Comparable to the Genome Size of the Strain Ikeda

    PubMed Central

    Liao, Hsiao-Mei; Chao, Chien-Chung; Lei, Haiyan; Li, Bingjie; Tsai, Shien; Hung, Guo-Chiuan

    2016-01-01

    Orientia tsutsugamushi, an intracellular bacterium, belongs to the family Rickettsiaceae. This study presents the draft genome sequence of strain Karp, with 2.0 Mb as the size of the completed genome. This nearly finished draft genome sequence was annotated with the RAST server and the contents compared to those of the other strains. PMID:27540052

  10. Complete chloroplast genome sequence of Elodea canadensis and comparative analyses with other monocot plastid genomes.

    PubMed

    Huotari, Tea; Korpelainen, Helena

    2012-10-15

    Elodea canadensis is an aquatic angiosperm native to North America. It has attracted great attention due to its invasive nature when transported to new areas in its non-native range. We have determined the complete nucleotide sequence of the chloroplast (cp) genome of Elodea. Taxonomically Elodea is a basal monocot, and only few monocot cp genomes representing early lineages of monocots have been sequenced so far. The genome is a circular double-stranded DNA molecule 156,700 bp in length, and has a typical structure with large (LSC 86,194 bp) and small (SSC 17,810 bp) single-copy regions separated by a pair of inverted repeats (IRs 26,348 bp each). The Elodea cp genome contains 113 unique genes and 16 duplicated genes in the IR regions. A comparative analysis showed that the gene order and organization of the Elodea cp genome is almost identical to that of Amborella trichopoda, a basal angiosperm. The structure of IRs in Elodea is unique among monocot species with the whole cp genome sequenced. In Elodea and another monocot Lemna minor the borders between IRs and LSC are located upstream of rps 19 gene and downstream of trnH-GUG gene, while in most monocots, IR has extended to include both trnH and rps 19 genes. A phylogenetic analysis conducted using Bayesian method, based on the DNA sequences of 81 chloroplast genes from 17 monocot taxa provided support for the placement of Elodea together with Lemna as a basal monocot and the next diverging lineage of monocots after Acorales. In comparison with other monocots, the Elodea cp genome has gone through only few rearrangements or gene losses. IR of Elodea has a unique structure among the monocot species studied so far as its structure is similar to t