Sample records for comparing genomic features

  1. Detection of genomic rearrangements in cucumber using genomecmp software

    NASA Astrophysics Data System (ADS)

    Kulawik, Maciej; Pawełkowicz, Magdalena Ewa; Wojcieszek, Michał; Pląder, Wojciech; Nowak, Robert M.

    2017-08-01

    Comparative genomics is a rapidly evolving science, driven by the growing volume of genome sequence information available in databases. A simple comparison of general genome features, such as genome size, number of genes, and chromosome number, provides an entry point into comparative genomic analysis. Here we present the utility of the new tool genomecmp for finding rearrangements across the compared sequences, and its applications in plant comparative genomics.

  2. A genomic view of food-related and probiotic Enterococcus strains

    PubMed Central

    Suárez, Nadia; Hormigo, Ricardo; Fadda, Silvina; Saavedra, Lucila

    2017-01-01

    The study of enterococcal genomes has grown considerably in recent years. While special attention is paid to comparative genomic analysis among clinically relevant isolates, in this study we performed an exhaustive comparative analysis of enterococcal genomes of food origin and/or with potential to be used as probiotics. Beyond common genetic features, we especially aimed to identify those that are specific to enterococcal strains isolated from a certain food-related source as well as features present in a species-specific manner. Thus, the genome sequences of 25 Enterococcus strains, from 7 different species, were examined and compared. Their phylogenetic relationship was reconstructed based on orthologous proteins and whole genomes. Likewise, markers associated with successful colonization (bacteriocin genes and genomic islands) and genome plasticity (phages and clustered regularly interspaced short palindromic repeats) were investigated for lifestyle-specific genetic features. At the same time, a search for antibiotic resistance genes was carried out, since they are of great concern in the food industry. Finally, it was possible to locate 1617 FIGfam families as a core proteome universally present among the genera and to determine that most of the accessory genes code for hypothetical proteins, providing reasonable hints to support their functional characterization. PMID:27773878
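
    The split into a core proteome and accessory genes described above can be illustrated with plain set operations. This is only a minimal sketch, not the authors' pipeline: the function name `split_core_accessory`, the strain labels, and the family identifiers are hypothetical, and it assumes each genome has already been reduced to a set of protein-family identifiers (for example, FIGfam IDs).

```python
from typing import Dict, Set, Tuple


def split_core_accessory(
    families_by_genome: Dict[str, Set[str]]
) -> Tuple[Set[str], Set[str]]:
    """Return (core, accessory) protein-family sets.

    core      = families present in every genome
    accessory = families present in at least one genome but not in all
    """
    if not families_by_genome:
        return set(), set()
    family_sets = list(families_by_genome.values())
    core = set.intersection(*family_sets)
    pan = set.union(*family_sets)
    return core, pan - core


# Toy, made-up input: three hypothetical strains and four hypothetical families.
toy = {
    "strain_A": {"fam1", "fam2", "fam3"},
    "strain_B": {"fam1", "fam2", "fam4"},
    "strain_C": {"fam1", "fam2"},
}
core, accessory = split_core_accessory(toy)
print(sorted(core), sorted(accessory))
```

    With real data, the size of the core set would correspond to a count such as the 1617 families reported in the abstract, while the accessory set is where strain- or source-specific features would be looked for.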

  3. Genomic insights into strategies used by Xanthomonas albilineans with its reduced artillery to spread within sugarcane xylem vessels.

    PubMed

    Pieretti, Isabelle; Royer, Monique; Barbe, Valérie; Carrere, Sébastien; Koebnik, Ralf; Couloux, Arnaud; Darrasse, Armelle; Gouzy, Jérôme; Jacques, Marie-Agnès; Lauber, Emmanuelle; Manceau, Charles; Mangenot, Sophie; Poussier, Stéphane; Segurens, Béatrice; Szurek, Boris; Verdier, Valérie; Arlat, Matthieu; Gabriel, Dean W; Rott, Philippe; Cociancich, Stéphane

    2012-11-21

    Xanthomonas albilineans causes leaf scald, a lethal disease of sugarcane. X. albilineans exhibits distinctive pathogenic mechanisms, ecology and taxonomy compared to other species of Xanthomonas. For example, this species produces a potent DNA gyrase inhibitor called albicidin that is largely responsible for inducing disease symptoms; its habitat is limited to xylem; and the species exhibits large variability. A first manuscript on the complete genome sequence of the highly pathogenic X. albilineans strain GPE PC73 focused exclusively on distinctive genomic features shared with Xylella fastidiosa, another xylem-limited member of the Xanthomonadaceae. The present manuscript on the same genome sequence aims to describe all other pathogenicity-related genomic features of X. albilineans, and to compare, using suppression subtractive hybridization (SSH), genomic features of two strains differing in pathogenicity. Comparative genomic analyses showed that most of the known pathogenicity factors from other Xanthomonas species are conserved in X. albilineans, with the notable absence of two major determinants of the "artillery" of other plant pathogenic species of Xanthomonas: the xanthan gum biosynthesis gene cluster, and the type III secretion system Hrp (hypersensitive response and pathogenicity). Genomic features specific to X. albilineans that may contribute to specific adaptation of this pathogen to sugarcane xylem vessels were also revealed. SSH experiments led to the identification of 20 genes common to three highly pathogenic strains but missing in a less pathogenic strain. These 20 genes, which include four ABC transporter genes, a methyl-accepting chemotaxis protein gene and an oxidoreductase gene, could play a key role in pathogenicity. With the exception of hypothetical proteins revealed by our comparative genomic analyses and SSH experiments, no genes potentially involved in any offensive or counter-defensive mechanism specific to X. albilineans were identified, suggesting that X. albilineans has a reduced artillery compared to other pathogenic Xanthomonas species. Particular attention has therefore been given to genomic features specific to X. albilineans making it more capable of evading sugarcane surveillance systems or resisting sugarcane defense systems. This study confirms that X. albilineans is a highly distinctive species within the genus Xanthomonas, and opens new perspectives towards a greater understanding of the pathogenicity of this destructive sugarcane pathogen.

  4. Comparative analysis and visualization of multiple collinear genomes

    PubMed Central

    2012-01-01

    Background Genome browsers are a common tool used by biologists to visualize genomic features including genes, polymorphisms, and many others. However, existing genome browsers and visualization tools are not well-suited to perform meaningful comparative analysis among a large number of genomes. With the increasing quantity and availability of genomic data, there is an increased burden to provide useful visualization and analysis tools for comparison of multiple collinear genomes such as the large panels of model organisms which are the basis for much of the current genetic research. Results We have developed a novel web-based tool for visualizing and analyzing multiple collinear genomes. Our tool illustrates genome-sequence similarity through a mosaic of intervals representing local phylogeny, subspecific origin, and haplotype identity. Comparative analysis is facilitated through reordering and clustering of tracks, which can vary throughout the genome. In addition, we provide local phylogenetic trees as an alternate visualization to assess local variations. Conclusions Unlike previous genome browsers and viewers, ours allows for simultaneous and comparative analysis. Our browser provides intuitive selection and interactive navigation about features of interest. Dynamic visualizations adjust to scale and data content making analysis at variable resolutions and of multiple data sets more informative. We demonstrate our genome browser for an extensive set of genomic data sets composed of almost 200 distinct mouse laboratory strains. PMID:22536897

  5. A genomic view of food-related and probiotic Enterococcus strains.

    PubMed

    Bonacina, Julieta; Suárez, Nadia; Hormigo, Ricardo; Fadda, Silvina; Lechner, Marcus; Saavedra, Lucila

    2017-02-01

    The study of enterococcal genomes has grown considerably in recent years. While special attention is paid to comparative genomic analysis among clinically relevant isolates, in this study we performed an exhaustive comparative analysis of enterococcal genomes of food origin and/or with potential to be used as probiotics. Beyond common genetic features, we especially aimed to identify those that are specific to enterococcal strains isolated from a certain food-related source as well as features present in a species-specific manner. Thus, the genome sequences of 25 Enterococcus strains, from 7 different species, were examined and compared. Their phylogenetic relationship was reconstructed based on orthologous proteins and whole genomes. Likewise, markers associated with successful colonization (bacteriocin genes and genomic islands) and genome plasticity (phages and clustered regularly interspaced short palindromic repeats) were investigated for lifestyle-specific genetic features. At the same time, a search for antibiotic resistance genes was carried out, since they are of great concern in the food industry. Finally, it was possible to locate 1617 FIGfam families as a core proteome universally present among the genera and to determine that most of the accessory genes code for hypothetical proteins, providing reasonable hints to support their functional characterization. © The Author 2016. Published by Oxford University Press on behalf of Kazusa DNA Research Institute.

  6. PLAZA 3.0: an access point for plant comparative genomics

    PubMed Central

    Proost, Sebastian; Van Bel, Michiel; Vaneechoutte, Dries; Van de Peer, Yves; Inzé, Dirk; Mueller-Roeber, Bernd; Vandepoele, Klaas

    2015-01-01

    Comparative sequence analysis has significantly altered our view on the complexity of genome organization and gene functions in different kingdoms. PLAZA 3.0 is designed to make comparative genomics data for plants available through a user-friendly web interface. Structural and functional annotation, gene families, protein domains, phylogenetic trees and detailed information about genome organization can easily be queried and visualized. Compared with the first version released in 2009, which featured nine organisms, the number of integrated genomes is more than four times higher, and now covers 37 plant species. The new species provide a wider phylogenetic range as well as a more in-depth sampling of specific clades, and genomes of additional crop species are present. The functional annotation has been expanded and now comprises data from Gene Ontology, MapMan, UniProtKB/Swiss-Prot, PlnTFDB and PlantTFDB. Furthermore, we improved the algorithms to transfer functional annotation from well-characterized plant genomes to other species. The additional data and new features make PLAZA 3.0 (http://bioinformatics.psb.ugent.be/plaza/) a versatile and comprehensible resource for users wanting to explore genome information to study different aspects of plant biology, both in model and non-model organisms. PMID:25324309

  7. Meta-Analysis of DNA Tumor-Viral Integration Site Selection Indicates a Role for Repeats, Gene Expression and Epigenetics

    PubMed Central

    Doolittle-Hall, Janet M.; Cunningham Glasspoole, Danielle L.; Seaman, William T.; Webster-Cyriaque, Jennifer

    2015-01-01

    Oncoviruses cause tremendous global cancer burden. For several DNA tumor viruses, human genome integration is consistently associated with cancer development. However, genomic features associated with tumor viral integration are poorly understood. We sought to define genomic determinants for 1897 loci prone to hosting human papillomavirus (HPV), hepatitis B virus (HBV) or Merkel cell polyomavirus (MCPyV). These were compared to HIV, whose enzyme-mediated integration is well understood. A comprehensive catalog of integration sites was constructed from the literature and experimentally-determined HPV integration sites. Features were scored in eight categories (genes, expression, open chromatin, histone modifications, methylation, protein binding, chromatin segmentation and repeats) and compared to random loci. Random forest models determined loci classification and feature selection. HPV and HBV integrants were not fragile site associated. MCPyV preferred integration near sensory perception genes. Unique signatures of integration-associated predictive genomic features were detected. Importantly, repeats, actively-transcribed regions and histone modifications were common tumor viral integration signatures. PMID:26569308

  8. EDGAR: A software framework for the comparative analysis of prokaryotic genomes

    PubMed Central

    Blom, Jochen; Albaum, Stefan P; Doppmeier, Daniel; Pühler, Alfred; Vorhölter, Frank-Jörg; Zakrzewski, Martha; Goesmann, Alexander

    2009-01-01

    Background The introduction of next generation sequencing approaches has caused a rapid increase in the number of completely sequenced genomes. As one result of this development, it is now feasible to analyze large groups of related genomes in a comparative approach. A main task in comparative genomics is the identification of orthologous genes in different genomes and the classification of genes as core genes or singletons. Results To support these studies EDGAR – "Efficient Database framework for comparative Genome Analyses using BLAST score Ratios" – was developed. EDGAR is designed to automatically perform genome comparisons in a high throughput approach. Comparative analyses for 582 genomes across 75 genus groups taken from the NCBI genomes database were conducted with the software and the results were integrated into an underlying database. To demonstrate a specific application case, we analyzed ten genomes of the bacterial genus Xanthomonas, for which phylogenetic studies were awkward due to divergent taxonomic systems. The resultant phylogeny EDGAR provided was consistent with outcomes from traditional approaches performed recently and moreover, it was possible to root each strain with unprecedented accuracy. Conclusion EDGAR provides novel analysis features and significantly simplifies the comparative analysis of related genomes. The software supports a quick survey of evolutionary relationships and simplifies the process of obtaining new biological insights into the differential gene content of kindred genomes. Visualization features, like synteny plots or Venn diagrams, are offered to the scientific community through a web-based and therefore platform independent user interface, where the precomputed data sets can be browsed. PMID:19457249
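
    The idea of classifying genes with BLAST score ratios can be sketched as follows. This is an illustration under stated assumptions, not EDGAR's implementation: a score ratio is taken here as the bit score of a gene's best hit in another genome divided by the bit score of the gene aligned against itself, the fixed `cutoff` of 0.3 is an arbitrary illustrative value, and the "accessory" label for the in-between case is added for completeness (the abstract itself names only core genes and singletons). All function and genome names are hypothetical.

```python
from typing import Dict


def blast_score_ratio(hit_score: float, self_score: float) -> float:
    """Normalize a hit's bit score by the query's self-hit bit score."""
    if self_score <= 0:
        raise ValueError("self_score must be positive")
    return hit_score / self_score


def classify_gene(
    self_score: float,
    best_hit_scores: Dict[str, float],
    cutoff: float = 0.3,  # illustrative threshold, not a value from the source
) -> str:
    """Classify one reference gene from its best hit in each other genome.

    'core'      : score ratio >= cutoff in every other genome
    'singleton' : score ratio >= cutoff in no other genome
    'accessory' : anything in between
    """
    ratios = [blast_score_ratio(s, self_score) for s in best_hit_scores.values()]
    n_present = sum(r >= cutoff for r in ratios)
    if ratios and n_present == len(ratios):
        return "core"
    if n_present == 0:
        return "singleton"
    return "accessory"


# Made-up bit scores for one gene (self-hit score 200) against two genomes.
label_core = classify_gene(200.0, {"genome_2": 180.0, "genome_3": 150.0})
label_single = classify_gene(200.0, {"genome_2": 20.0, "genome_3": 0.0})
label_mixed = classify_gene(200.0, {"genome_2": 180.0, "genome_3": 10.0})
print(label_core, label_single, label_mixed)
```

    Normalizing by the self-hit score makes hits comparable across genes of different lengths, which is why a single threshold can be applied to all genes in a comparison.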

  9. Evolutionary and comparative analyses of the soybean genome

    PubMed Central

    Cannon, Steven B.; Shoemaker, Randy C.

    2012-01-01

    The soybean genome assembly has been available since the end of 2008. Significant features of the genome include large, gene-poor, repeat-dense pericentromeric regions, spanning roughly 57% of the genome sequence; a relatively large genome size of ~1.15 billion bases; remnants of a genome duplication that occurred ~13 million years ago (Mya); and fainter remnants of older polyploidies that occurred ~58 Mya and >130 Mya. The genome sequence has been used to identify the genetic basis for numerous traits, including disease resistance, nutritional characteristics, and developmental features. The genome sequence has provided a scaffold for placement of many genomic feature elements, both from within soybean and from related species. These may be accessed at several websites, including http://www.phytozome.net, http://soybase.org, http://comparative-legumes.org, and http://www.legumebase.brc.miyazaki-u.ac.jp. The taxonomic position of soybean in the Phaseoleae tribe of the legumes means that there are approximately two dozen other beans and relatives that have undergone independent domestication, and which may have traits that will be useful for transfer to soybean. Methods of translating information between species in the Phaseoleae range from design of markers for marker assisted selection, to transformation with Agrobacterium or with other experimental transformation methods. PMID:23136483

  10. Understanding the direction of evolution in Burkholderia glumae through comparative genomics.

    PubMed

    Lee, Hyun-Hee; Park, Jungwook; Kim, Jinnyun; Park, Inmyoung; Seo, Young-Su

    2016-02-01

    Members of the genus Burkholderia occupy remarkably diverse niches, with genome sizes ranging from ~3.75 to 11.29 Mbp. The genome of Burkholderia glumae ranges in size from ~5.81 to 7.89 Mbp. Unlike other plant pathogenic bacteria, B. glumae can infect a wide range of monocot and dicot plants. Comparative genome analysis of B. glumae strains can provide insight into genome variation as well as differential features of whole metabolism or pathways between multiple strains of B. glumae infecting the same host. Comparative analysis of complete genomes among B. glumae BGR1, B. glumae LMG 2196, and B. glumae PG1 revealed the largest departmentalization of genes onto separate replicons in B. glumae BGR1 and considerable downsizing of the genome in B. glumae LMG 2196. In addition, the presence of large-scale evolutionary events such as rearrangement and inversion and the development of highly specialized systems were found to be related to virulence-associated features in the three B. glumae strains. This connection may explain why this bacterium broadens its host range and reinforces its interaction with hosts.

  11. Sockeye: A 3D Environment for Comparative Genomics

    PubMed Central

    Montgomery, Stephen B.; Astakhova, Tamara; Bilenky, Mikhail; Birney, Ewan; Fu, Tony; Hassel, Maik; Melsopp, Craig; Rak, Marcin; Robertson, A. Gordon; Sleumer, Monica; Siddiqui, Asim S.; Jones, Steven J.M.

    2004-01-01

    Comparative genomics techniques are used in bioinformatics analyses to identify the structural and functional properties of DNA sequences. As the amount of available sequence data steadily increases, the ability to perform large-scale comparative analyses has become increasingly relevant. In addition, the growing complexity of genomic feature annotation means that new approaches to genomic visualization need to be explored. We have developed a Java-based application called Sockeye that uses three-dimensional (3D) graphics technology to facilitate the visualization of annotation and conservation across multiple sequences. This software uses the Ensembl database project to import sequence and annotation information from several eukaryotic species. A user can additionally import their own custom sequence and annotation data. Individual annotation objects are displayed in Sockeye by using custom 3D models. Ensembl-derived and imported sequences can be analyzed by using a suite of multiple and pair-wise alignment algorithms. The results of these comparative analyses are also displayed in the 3D environment of Sockeye. By using the Java3D API to visualize genomic data in a 3D environment, we are able to compactly display cross-sequence comparisons. This provides the user with a novel platform for visualizing and comparing genomic feature organization. PMID:15123592

  12. How much does the amphioxus genome represent the ancestor of chordates?

    PubMed

    Louis, Alexandra; Roest Crollius, Hugues; Robinson-Rechavi, Marc

    2012-03-01

    One of the main motivations to study amphioxus is its potential for understanding the last common ancestor of chordates, which notably gave rise to the vertebrates. An important feature in this respect is the slow evolutionary rate that seems to have characterized the cephalochordate lineage, making amphioxus an interesting proxy for the chordate ancestor, as well as a key lineage to include in comparative studies. Whereas slow evolution was first noticed at the phenotypic level, it has also been described at the genomic level. Here, we examine whether the amphioxus genome is indeed a good proxy for the genome of the chordate ancestor, with a focus on protein-coding genes. We investigate genome features, such as synteny, gene duplication and gene loss, and contrast the amphioxus genome with those of other deuterostomes that are used in comparative studies, such as Ciona, Oikopleura and urchin.

  13. PLAZA 3.0: an access point for plant comparative genomics.

    PubMed

    Proost, Sebastian; Van Bel, Michiel; Vaneechoutte, Dries; Van de Peer, Yves; Inzé, Dirk; Mueller-Roeber, Bernd; Vandepoele, Klaas

    2015-01-01

    Comparative sequence analysis has significantly altered our view on the complexity of genome organization and gene functions in different kingdoms. PLAZA 3.0 is designed to make comparative genomics data for plants available through a user-friendly web interface. Structural and functional annotation, gene families, protein domains, phylogenetic trees and detailed information about genome organization can easily be queried and visualized. Compared with the first version released in 2009, which featured nine organisms, the number of integrated genomes is more than four times higher, and now covers 37 plant species. The new species provide a wider phylogenetic range as well as a more in-depth sampling of specific clades, and genomes of additional crop species are present. The functional annotation has been expanded and now comprises data from Gene Ontology, MapMan, UniProtKB/Swiss-Prot, PlnTFDB and PlantTFDB. Furthermore, we improved the algorithms to transfer functional annotation from well-characterized plant genomes to other species. The additional data and new features make PLAZA 3.0 (http://bioinformatics.psb.ugent.be/plaza/) a versatile and comprehensible resource for users wanting to explore genome information to study different aspects of plant biology, both in model and non-model organisms. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.

  14. Genomes as geography: using GIS technology to build interactive genome feature maps

    PubMed Central

    Dolan, Mary E; Holden, Constance C; Beard, M Kate; Bult, Carol J

    2006-01-01

    Background Many commonly used genome browsers display sequence annotations and related attributes as horizontal data tracks that can be toggled on and off according to user preferences. Most genome browsers use only simple keyword searches and limit the display of detailed annotations to one chromosomal region of the genome at a time. We have employed concepts, methodologies, and tools that were developed for the display of geographic data to develop a Genome Spatial Information System (GenoSIS) for displaying genomes spatially, and interacting with genome annotations and related attribute data. In contrast to the paradigm of horizontally stacked data tracks used by most genome browsers, GenoSIS uses the concept of registered spatial layers composed of spatial objects for integrated display of diverse data. In addition to basic keyword searches, GenoSIS supports complex queries, including spatial queries, and dynamically generates genome maps. Our adaptation of the geographic information system (GIS) model in a genome context supports spatial representation of genome features at multiple scales with a versatile and expressive query capability beyond that supported by existing genome browsers. Results We implemented an interactive genome sequence feature map for the mouse genome in GenoSIS, an application that uses ArcGIS, a commercially available GIS software system. The genome features and their attributes are represented as spatial objects and data layers that can be toggled on and off according to user preferences or displayed selectively in response to user queries. GenoSIS supports the generation of custom genome maps in response to complex queries about genome features based on both their attributes and locations. Our example application of GenoSIS to the mouse genome demonstrates the powerful visualization and query capability of mature GIS technology applied in a novel domain. 
Conclusion Mapping tools developed specifically for geographic data can be exploited to display, explore and interact with genome data. The approach we describe here is organism independent and is equally useful for linear and circular chromosomes. One of the unique capabilities of GenoSIS compared to existing genome browsers is the capacity to generate genome feature maps dynamically in response to complex attribute and spatial queries. PMID:16984652

  15. Bluejay 1.0: genome browsing and comparison with rich customization provision and dynamic resource linking

    PubMed Central

    Soh, Jung; Gordon, Paul MK; Taschuk, Morgan L; Dong, Anguo; Ah-Seng, Andrew C; Turinsky, Andrei L; Sensen, Christoph W

    2008-01-01

    Background The Bluejay genome browser has been developed over several years to address the challenges posed by the ever increasing number of data types as well as the increasing volume of data in genome research. Beginning with a browser capable of rendering views of XML-based genomic information and providing scalable vector graphics output, we have now completed version 1.0 of the system with many additional features. Our development efforts were guided by our observation that biologists who use both gene expression profiling and comparative genomics gain functional insights above and beyond those provided by traditional per-gene analyses. Results Bluejay 1.0 is a genome viewer integrating genome annotation with: (i) gene expression information; and (ii) comparative analysis with an unlimited number of other genomes in the same view. This allows the biologist to see a gene not just in the context of its genome, but also its regulation and its evolution. Bluejay now has rich provision for personalization by users: (i) numerous display customization features; (ii) the availability of waypoints for marking multiple points of interest on a genome and subsequently utilizing them; and (iii) the ability to take user relevance feedback of annotated genes or textual items to offer personalized recommendations. Bluejay 1.0 also embeds the Seahawk browser for the Moby protocol, enabling users to seamlessly invoke hundreds of Web Services on genomic data of interest without any hard-coding. Conclusion Bluejay offers a unique set of customizable genome-browsing features, with the goal of allowing biologists to quickly focus on, analyze, compare, and retrieve related information on the parts of the genomic data they are most interested in. We expect these capabilities of Bluejay to benefit the many biologists who want to answer complex questions using the information available from completely sequenced genomes. PMID:18940007

  16. [Advance on genome research of Yersinia pestis bacteriophage].

    PubMed

    Tan, H L; Wang, P; Li, W

    2017-04-10

    Completion of the genome sequences of Yersinia pestis bacteriophages has offered researchers an unprecedented opportunity to carry out related genomic studies. This review draws on these genomic sequences to describe, from a genomic perspective, the essential genome features of Yersinia pestis bacteriophages. Genetic evolutionary relationships are discussed on the basis of comparative genomics. Functional descriptions derived from gene prediction and protein annotation provide evidence for further related studies.

  17. Comparative genome analysis of rice-pathogenic Burkholderia provides insight into capacity to adapt to different environments and hosts.

    PubMed

    Seo, Young-Su; Lim, Jae Yun; Park, Jungwook; Kim, Sunyoung; Lee, Hyun-Hee; Cheong, Hoon; Kim, Sang-Mok; Moon, Jae Sun; Hwang, Ingyu

    2015-05-06

    In addition to human and animal diseases, bacteria of the genus Burkholderia can cause plant diseases. The representative species of rice-pathogenic Burkholderia are Burkholderia glumae, B. gladioli, and B. plantarii, which primarily cause grain rot, sheath rot, and seedling blight, respectively, resulting in severe reductions in rice production. Though Burkholderia rice pathogens cause problems in rice-growing countries, comprehensive studies of these rice-pathogenic species aiming to control Burkholderia-mediated diseases are only in the early stages. We first sequenced the complete genome of B. plantarii ATCC 43733T. Second, we conducted comparative analysis of the newly sequenced B. plantarii ATCC 43733T genome with eleven complete or draft genomes of B. glumae and B. gladioli strains. Furthermore, we compared the genome of three rice Burkholderia pathogens with those of other Burkholderia species such as those found in environmental habitats and those known as animal/human pathogens. These B. glumae, B. gladioli, and B. plantarii strains have unique genes involved in toxoflavin or tropolone toxin production and the clustered regularly interspaced short palindromic repeats (CRISPR)-mediated bacterial immune system. Although the genome of B. plantarii ATCC 43733T has many common features with those of B. glumae and B. gladioli, this B. plantarii strain has several unique features, including quorum sensing and CRISPR/CRISPR-associated protein (Cas) systems. The complete genome sequence of B. plantarii ATCC 43733T and publicly available genomes of B. glumae BGR1 and B. gladioli BSR3 enabled comprehensive comparative genome analyses among three rice-pathogenic Burkholderia species responsible for tissue rotting and seedling blight. Our results suggest that B. glumae has evolved rapidly, or has undergone rapid genome rearrangements or deletions, in response to the hosts. 
It also clarifies the unique features of rice pathogenic Burkholderia species relative to other animal and human Burkholderia species.

  18. The standard operating procedure of the DOE-JGI Microbial Genome Annotation Pipeline (MGAP v.4).

    PubMed

    Huntemann, Marcel; Ivanova, Natalia N; Mavromatis, Konstantinos; Tripp, H James; Paez-Espino, David; Palaniappan, Krishnaveni; Szeto, Ernest; Pillay, Manoj; Chen, I-Min A; Pati, Amrita; Nielsen, Torben; Markowitz, Victor M; Kyrpides, Nikos C

    2015-01-01

    The DOE-JGI Microbial Genome Annotation Pipeline performs structural and functional annotation of microbial genomes that are further included into the Integrated Microbial Genome comparative analysis system. MGAP is applied to assembled nucleotide sequence datasets that are provided via the IMG submission site. Dataset submission for annotation first requires project and associated metadata description in GOLD. The MGAP sequence data processing consists of feature prediction including identification of protein-coding genes, non-coding RNAs and regulatory RNA features, as well as CRISPR elements. Structural annotation is followed by assignment of protein product names and functions.

  19. Exploring the genetic architecture and improving genomic prediction accuracy for mastitis and milk production traits in dairy cattle by mapping variants to hepatic transcriptomic regions responsive to intra-mammary infection.

    PubMed

    Fang, Lingzhao; Sahana, Goutam; Ma, Peipei; Su, Guosheng; Yu, Ying; Zhang, Shengli; Lund, Mogens Sandø; Sørensen, Peter

    2017-05-12

    A better understanding of the genetic architecture of complex traits can contribute to improving genomic prediction. We hypothesized that genomic variants associated with mastitis and milk production traits in dairy cattle are enriched in hepatic transcriptomic regions that are responsive to intra-mammary infection (IMI). Genomic markers [e.g. single nucleotide polymorphisms (SNPs)] from those regions, if included, may improve the predictive ability of a genomic model. We applied a genomic feature best linear unbiased prediction model (GFBLUP) to implement the above strategy by considering the hepatic transcriptomic regions responsive to IMI as genomic features. GFBLUP, an extension of GBLUP, includes a separate genomic effect of SNPs within a genomic feature, and allows differential weighting of the individual marker relationships in the prediction equation. Since GFBLUP is computationally intensive, we investigated whether a SNP set test could be a computationally fast way to preselect predictive genomic features. The SNP set test assesses the association between a genomic feature and a trait based on single-SNP genome-wide association studies. We applied these two approaches to mastitis and milk production traits (milk, fat and protein yield) in Holstein (HOL, n = 5056) and Jersey (JER, n = 1231) cattle. We observed that a majority of genomic features were enriched in genomic variants that were associated with mastitis and milk production traits. Compared to GBLUP, the accuracy of genomic prediction with GFBLUP was marginally improved (3.2 to 3.9%) in within-breed prediction. The highest increase (164.4%) in prediction accuracy was observed in across-breed prediction. The significance of genomic features based on the SNP set test was correlated with changes in prediction accuracy of GFBLUP (P < 0.05). 
GFBLUP provides a framework for integrating multiple layers of biological knowledge to provide novel insights into the biological basis of complex traits, and to improve the accuracy of genomic prediction. The SNP set test might be used as a first step to improve GFBLUP models. Approaches like GFBLUP and the SNP set test will become increasingly useful, as the functional annotations of genomes keep accumulating for a range of species and traits.

  20. Social insect genomes exhibit dramatic evolution in gene composition and regulation while preserving regulatory features linked to sociality

    PubMed Central

    Simola, Daniel F.; Wissler, Lothar; Donahue, Greg; Waterhouse, Robert M.; Helmkampf, Martin; Roux, Julien; Nygaard, Sanne; Glastad, Karl M.; Hagen, Darren E.; Viljakainen, Lumi; Reese, Justin T.; Hunt, Brendan G.; Graur, Dan; Elhaik, Eran; Kriventseva, Evgenia V.; Wen, Jiayu; Parker, Brian J.; Cash, Elizabeth; Privman, Eyal; Childers, Christopher P.; Muñoz-Torres, Monica C.; Boomsma, Jacobus J.; Bornberg-Bauer, Erich; Currie, Cameron R.; Elsik, Christine G.; Suen, Garret; Goodisman, Michael A.D.; Keller, Laurent; Liebig, Jürgen; Rawls, Alan; Reinberg, Danny; Smith, Chris D.; Smith, Chris R.; Tsutsui, Neil; Wurm, Yannick; Zdobnov, Evgeny M.; Berger, Shelley L.; Gadau, Jürgen

    2013-01-01

    Genomes of eusocial insects code for dramatic examples of phenotypic plasticity and social organization. We compared the genomes of seven ants, the honeybee, and various solitary insects to examine whether eusocial lineages share distinct features of genomic organization. Each ant lineage contains ∼4000 novel genes, but only 64 of these genes are conserved among all seven ants. Many gene families have been expanded in ants, notably those involved in chemical communication (e.g., desaturases and odorant receptors). Alignment of the ant genomes revealed reduced purifying selection compared with Drosophila without significantly reduced synteny. Correspondingly, ant genomes exhibit dramatic divergence of noncoding regulatory elements; however, extant conserved regions are enriched for novel noncoding RNAs and transcription factor–binding sites. Comparison of orthologous gene promoters between eusocial and solitary species revealed significant regulatory evolution in both cis (e.g., Creb) and trans (e.g., fork head) for nearly 2000 genes, many of which exhibit phenotypic plasticity. Our results emphasize that genomic changes can occur remarkably fast in ants, because two recently diverged leaf-cutter ant species exhibit faster accumulation of species-specific genes and greater divergence in regulatory elements compared with other ants or Drosophila. Thus, while the “socio-genomes” of ants and the honeybee are broadly characterized by a pervasive pattern of divergence in gene composition and regulation, they preserve lineage-specific regulatory features linked to eusociality. We propose that changes in gene regulation played a key role in the origins of insect eusociality, whereas changes in gene composition were more relevant for lineage-specific eusocial adaptations. PMID:23636946

  1. The standard operating procedure of the DOE-JGI Microbial Genome Annotation Pipeline (MGAP v.4)

    DOE PAGES

    Huntemann, Marcel; Ivanova, Natalia N.; Mavromatis, Konstantinos; ...

    2015-10-26

    The DOE-JGI Microbial Genome Annotation Pipeline performs structural and functional annotation of microbial genomes that are further included into the Integrated Microbial Genome comparative analysis system. MGAP is applied to assembled nucleotide sequence datasets that are provided via the IMG submission site. Dataset submission for annotation first requires project and associated metadata description in GOLD. The MGAP sequence data processing consists of feature prediction including identification of protein-coding genes, non-coding RNAs and regulatory RNA features, as well as CRISPR elements. In conclusion, structural annotation is followed by assignment of protein product names and functions.

  2. The standard operating procedure of the DOE-JGI Microbial Genome Annotation Pipeline (MGAP v.4)

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Huntemann, Marcel; Ivanova, Natalia N.; Mavromatis, Konstantinos

    The DOE-JGI Microbial Genome Annotation Pipeline performs structural and functional annotation of microbial genomes that are further included into the Integrated Microbial Genome comparative analysis system. MGAP is applied to assembled nucleotide sequence datasets that are provided via the IMG submission site. Dataset submission for annotation first requires project and associated metadata description in GOLD. The MGAP sequence data processing consists of feature prediction including identification of protein-coding genes, non-coding RNAs and regulatory RNA features, as well as CRISPR elements. In conclusion, structural annotation is followed by assignment of protein product names and functions.

  3. Comparative Genomics Reveals the Core Gene Toolbox for the Fungus-Insect Symbiosis.

    PubMed

    Wang, Yan; Stata, Matt; Wang, Wei; Stajich, Jason E; White, Merlin M; Moncalvo, Jean-Marc

    2018-05-15

    Modern genomics has shed light on many entomopathogenic fungi and expanded our knowledge widely; however, little is known about the genomic features of the insect-commensal fungi. Harpellales are obligate commensals living in the digestive tracts of disease-bearing insects (black flies, midges, and mosquitoes). In this study, we produced and annotated whole-genome sequences of nine Harpellales taxa and conducted the first comparative analyses to infer the genomic diversity within the members of the Harpellales. The genomes of the insect gut fungi feature low (26% to 37%) GC content and large genome size variations (25 to 102 Mb). Further comparisons with insect-pathogenic fungi (from both Ascomycota and Zoopagomycota), as well as with free-living relatives (as negative controls), helped to identify a gene toolbox that is essential to the fungus-insect symbiosis. The results not only narrow the genomic scope of fungus-insect interactions from several thousands to eight core players but also distinguish host invasion strategies employed by insect pathogens and commensals. The genomic content suggests that insect commensal fungi rely mostly on adhesion protein anchors that target digestive system, while entomopathogenic fungi have higher numbers of transmembrane helices, signal peptides, and pathogen-host interaction (PHI) genes across the whole genome and enrich genes as well as functional domains to inactivate the host inflammation system and suppress the host defense. Phylogenomic analyses have revealed that genome sizes of Harpellales fungi vary among lineages with an integer-multiple pattern, which implies that ancient genome duplications may have occurred within the gut of insects. IMPORTANCE Insect guts harbor various microbes that are important for host digestion, immune response, and disease dispersal in certain cases. Bacteria, which are among the primary endosymbionts, have been studied extensively. 
However, fungi, which are also frequently encountered, are poorly known with respect to their biology within the insect guts. To understand the genomic features and related biology, we produced the whole-genome sequences of nine gut commensal fungi from disease-bearing insects (black flies, midges, and mosquitoes). The results show that insect gut fungi tend to have low GC content across their genomes. By comparing these commensals with entomopathogenic and free-living fungi that have available genome sequences, we found a universal core gene toolbox that is unique and thus potentially important for the insect-fungus symbiosis. This comparative work also uncovered different host invasion strategies employed by insect pathogens and commensals, as well as a model system to study ancient fungal genome duplication within the gut of insects. © Crown copyright 2018.

  4. The integrated microbial genome resource of analysis.

    PubMed

    Checcucci, Alice; Mengoni, Alessio

    2015-01-01

    Integrated Microbial Genomes and Metagenomes (IMG) is a biocomputational system that provides information and support for annotation and comparative analysis of microbial genomes and metagenomes. IMG has been developed by the US Department of Energy (DOE)-Joint Genome Institute (JGI). The IMG platform contains both draft and complete genomes, sequenced by the Joint Genome Institute, along with other publicly available genomes. Genomes of strains belonging to Archaea, Bacteria, and Eukarya domains are present as well as those of viruses and plasmids. Here, we provide some essential features of the IMG system and a case study for pangenome analysis.

  5. Pseudomonas Genome Database: facilitating user-friendly, comprehensive comparisons of microbial genomes.

    PubMed

    Winsor, Geoffrey L; Van Rossum, Thea; Lo, Raymond; Khaira, Bhavjinder; Whiteside, Matthew D; Hancock, Robert E W; Brinkman, Fiona S L

    2009-01-01

    Pseudomonas aeruginosa is a well-studied opportunistic pathogen that is particularly known for its intrinsic antimicrobial resistance, diverse metabolic capacity, and its ability to cause life threatening infections in cystic fibrosis patients. The Pseudomonas Genome Database (http://www.pseudomonas.com) was originally developed as a resource for peer-reviewed, continually updated annotation for the Pseudomonas aeruginosa PAO1 reference strain genome. In order to facilitate cross-strain and cross-species genome comparisons with other Pseudomonas species of importance, we have now expanded the database capabilities to include all Pseudomonas species, and have developed or incorporated methods to facilitate high quality comparative genomics. The database contains robust assessment of orthologs, a novel ortholog clustering method, and incorporates five views of the data at the sequence and annotation levels (Gbrowse, Mauve and custom views) to facilitate genome comparisons. A choice of simple and more flexible user-friendly Boolean search features allows researchers to search and compare annotations or sequences within or between genomes. Other features include more accurate protein subcellular localization predictions and a user-friendly, Boolean searchable log file of updates for the reference strain PAO1. This database aims to continue to provide a high quality, annotated genome resource for the research community and is available under an open source license.

  6. Sheep genome functional annotation reveals proximal regulatory elements contributed to the evolution of modern breeds.

    PubMed

    Naval-Sanchez, Marina; Nguyen, Quan; McWilliam, Sean; Porto-Neto, Laercio R; Tellam, Ross; Vuocolo, Tony; Reverter, Antonio; Perez-Enciso, Miguel; Brauning, Rudiger; Clarke, Shannon; McCulloch, Alan; Zamani, Wahid; Naderi, Saeid; Rezaei, Hamid Reza; Pompanon, Francois; Taberlet, Pierre; Worley, Kim C; Gibbs, Richard A; Muzny, Donna M; Jhangiani, Shalini N; Cockett, Noelle; Daetwyler, Hans; Kijas, James

    2018-02-28

    Domestication fundamentally reshaped animal morphology, physiology and behaviour, offering the opportunity to investigate the molecular processes driving evolutionary change. Here we assess sheep domestication and artificial selection by comparing genome sequence from 43 modern breeds (Ovis aries) and their Asian mouflon ancestor (O. orientalis) to identify selection sweeps. Next, we provide a comparative functional annotation of the sheep genome, validated using experimental ChIP-Seq of sheep tissue. Using these annotations, we evaluate the impact of selection and domestication on regulatory sequences and find that sweeps are significantly enriched for protein coding genes, proximal regulatory elements of genes and genome features associated with active transcription. Finally, we find individual sites displaying strong allele frequency divergence are enriched for the same regulatory features. Our data demonstrate that remodelling of gene expression is likely to have been one of the evolutionary forces that drove phenotypic diversification of this common livestock species.

  7. Independent evolution of genomic characters during major metazoan transitions.

    PubMed

    Simakov, Oleg; Kawashima, Takeshi

    2017-07-15

    Metazoan evolution encompasses a vast evolutionary time scale spanning over 600 million years. Our ability to infer ancestral metazoan characters, both morphological and functional, is limited by our understanding of the nature and evolutionary dynamics of the underlying regulatory networks. Increasing coverage of metazoan genomes enables us to identify the evolutionary changes of the relevant genomic characters such as the loss or gain of coding sequences, gene duplications, micro- and macro-synteny, and non-coding element evolution in different lineages. In this review we describe recent advances in our understanding of ancestral metazoan coding and non-coding features, as deduced from genomic comparisons. Some genomic changes such as innovations in gene and linkage content occur at different rates across metazoan clades, suggesting some level of independence among genomic characters. While their contribution to biological innovation remains largely unclear, we review recent literature about certain genomic changes that do correlate with changes to specific developmental pathways and metazoan innovations. In particular, we discuss the origins of the recently described pharyngeal cluster which is conserved across deuterostome genomes, and highlight different genomic features that have contributed to the evolution of this group. We also assess our current capacity to infer ancestral metazoan states from gene models and comparative genomics tools and elaborate on the future directions of metazoan comparative genomics relevant to evo-devo studies. Copyright © 2016 The Authors. Published by Elsevier Inc. All rights reserved.

  8. Genomic diversity and versatility of Lactobacillus plantarum, a natural metabolic engineer.

    PubMed

    Siezen, Roland J; van Hylckama Vlieg, Johan E T

    2011-08-30

    In the past decade it has become clear that the lactic acid bacterium Lactobacillus plantarum occupies a diverse range of environmental niches and has an enormous diversity in phenotypic properties, metabolic capacity and industrial applications. In this review, we describe how genome sequencing, comparative genome hybridization and comparative genomics have provided insight into the underlying genomic diversity and versatility of L. plantarum. One of the main features appears to be genomic life-style islands consisting of numerous functional gene cassettes, in particular for carbohydrates utilization, which can be acquired, shuffled, substituted or deleted in response to niche requirements. In this sense, L. plantarum can be considered a "natural metabolic engineer".

  9. Genomic diversity and versatility of Lactobacillus plantarum, a natural metabolic engineer

    PubMed Central

    2011-01-01

    In the past decade it has become clear that the lactic acid bacterium Lactobacillus plantarum occupies a diverse range of environmental niches and has an enormous diversity in phenotypic properties, metabolic capacity and industrial applications. In this review, we describe how genome sequencing, comparative genome hybridization and comparative genomics have provided insight into the underlying genomic diversity and versatility of L. plantarum. One of the main features appears to be genomic life-style islands consisting of numerous functional gene cassettes, in particular for carbohydrates utilization, which can be acquired, shuffled, substituted or deleted in response to niche requirements. In this sense, L. plantarum can be considered a “natural metabolic engineer”. PMID:21995294

  10. Carnivore-specific SINEs (Can-SINEs): distribution, evolution, and genomic impact.

    PubMed

    Walters-Conte, Kathryn B; Johnson, Diana L E; Allard, Marc W; Pecon-Slattery, Jill

    2011-01-01

    Short interspersed nuclear elements (SINEs) are a type of class 1 transposable element (retrotransposon) with features that allow investigators to resolve evolutionary relationships between populations and species while providing insight into genome composition and function. Characterization of a Carnivora-specific SINE family, Can-SINEs, has aided comparative genomic studies by providing rare genomic changes and neutral sequence variants often needed to resolve difficult evolutionary questions. In addition, Can-SINEs constitute a significant source of functional diversity with Carnivora. Publication of the whole-genome sequence of domestic dog, domestic cat, and giant panda serves as a valuable resource in comparative genomic inferences gleaned from Can-SINEs. In anticipation of forthcoming studies bolstered by new genomic data, this review describes the discovery and characterization of Can-SINE motifs as well as describes composition, distribution, and effect on genome function. As the contribution of noncoding sequences to genomic diversity becomes more apparent, SINEs and other transposable elements will play an increasingly large role in mammalian comparative genomics.

  11. Carnivore-Specific SINEs (Can-SINEs): Distribution, Evolution, and Genomic Impact

    PubMed Central

    Johnson, Diana L.E.; Allard, Marc W.; Pecon-Slattery, Jill

    2011-01-01

    Short interspersed nuclear elements (SINEs) are a type of class 1 transposable element (retrotransposon) with features that allow investigators to resolve evolutionary relationships between populations and species while providing insight into genome composition and function. Characterization of a Carnivora-specific SINE family, Can-SINEs, has aided comparative genomic studies by providing rare genomic changes and neutral sequence variants often needed to resolve difficult evolutionary questions. In addition, Can-SINEs constitute a significant source of functional diversity with Carnivora. Publication of the whole-genome sequence of domestic dog, domestic cat, and giant panda serves as a valuable resource in comparative genomic inferences gleaned from Can-SINEs. In anticipation of forthcoming studies bolstered by new genomic data, this review describes the discovery and characterization of Can-SINE motifs as well as describes composition, distribution, and effect on genome function. As the contribution of noncoding sequences to genomic diversity becomes more apparent, SINEs and other transposable elements will play an increasingly large role in mammalian comparative genomics. PMID:21846743

  12. Genome-Wide Analysis of Transposon and Retroviral Insertions Reveals Preferential Integrations in Regions of DNA Flexibility.

    PubMed

    Vrljicak, Pavle; Tao, Shijie; Varshney, Gaurav K; Quach, Helen Ngoc Bao; Joshi, Adita; LaFave, Matthew C; Burgess, Shawn M; Sampath, Karuna

    2016-04-07

    DNA transposons and retroviruses are important transgenic tools for genome engineering. An important consideration affecting the choice of transgenic vector is their insertion site preferences. Previous large-scale analyses of Ds transposon integration sites in plants were done on the basis of reporter gene expression or germ-line transmission, making it difficult to discern vertebrate integration preferences. Here, we compare over 1300 Ds transposon integration sites in zebrafish with Tol2 transposon and retroviral integration sites. Genome-wide analysis shows that Ds integration sites in the presence or absence of marker selection are remarkably similar and distributed throughout the genome. No strict motif was found, but a preference for structural features in the target DNA associated with DNA flexibility (Twist, Tilt, Rise, Roll, Shift, and Slide) was observed. Remarkably, this feature is also found in transposon and retroviral integrations in maize and mouse cells. Our findings show that structural features influence the integration of heterologous DNA in genomes, and have implications for targeted genome engineering. Copyright © 2016 Vrljicak et al.

  13. Signatures of cytoplasmic proteins in the exoproteome distinguish community- and hospital-associated methicillin-resistant Staphylococcus aureus USA300 lineages.

    PubMed

    Mekonnen, Solomon A; Palma Medina, Laura M; Glasner, Corinna; Tsompanidou, Eleni; de Jong, Anne; Grasso, Stefano; Schaffer, Marc; Mäder, Ulrike; Larsen, Anders R; Gumpert, Heidi; Westh, Henrik; Völker, Uwe; Otto, Andreas; Becher, Dörte; van Dijl, Jan Maarten

    2017-08-18

    Methicillin-resistant Staphylococcus aureus (MRSA) is the common name for a heterogeneous group of highly drug-resistant staphylococci. Two major MRSA classes are distinguished based on epidemiology, namely community-associated (CA) and hospital-associated (HA) MRSA. Notably, the distinction of CA- and HA-MRSA based on molecular traits remains difficult due to the high genomic plasticity of S. aureus. Here we sought to pinpoint global distinguishing features of CA- and HA-MRSA through a comparative genome and proteome analysis of the notorious MRSA lineage USA300. We show for the first time that CA- and HA-MRSA isolates can be distinguished by 2 distinct extracellular protein abundance clusters that are predictive not only for epidemiologic behavior, but also for their growth and survival within epithelial cells. This 'exoproteome profiling' also groups more distantly related HA-MRSA isolates into the HA exoproteome cluster. Comparative genome analysis suggests that these distinctive features of CA- and HA-MRSA isolates relate predominantly to the accessory genome. Intriguingly, the identified exoproteome clusters differ in the relative abundance of typical cytoplasmic proteins, suggesting that signatures of cytoplasmic proteins in the exoproteome represent a new distinguishing feature of CA- and HA-MRSA. Our comparative genome and proteome analysis focuses attention on potentially distinctive roles of 'liberated' cytoplasmic proteins in the epidemiology and intracellular survival of CA- and HA-MRSA isolates. Such extracellular cytoplasmic proteins were recently invoked in staphylococcal virulence, but their implication in the epidemiology of MRSA is unprecedented.

  14. Mutational Dynamics of Aroid Chloroplast Genomes

    PubMed Central

    Ahmed, Ibrar; Biggs, Patrick J.; Matthews, Peter J.; Collins, Lesley J.; Hendy, Michael D.; Lockhart, Peter J.

    2012-01-01

    A characteristic feature of eukaryote and prokaryote genomes is the co-occurrence of nucleotide substitution and insertion/deletion (indel) mutations. Although similar observations have also been made for chloroplast DNA, genome-wide associations have not been reported. We determined the chloroplast genome sequences for two morphotypes of taro (Colocasia esculenta; family Araceae) and compared these with four publicly available aroid chloroplast genomes. Here, we report the extent of genome-wide association between direct and inverted repeats, indels, and substitutions in these aroid chloroplast genomes. We suggest that alternative but not mutually exclusive hypotheses explain the mutational dynamics of chloroplast genome evolution. PMID:23204304

  15. GenomicusPlants: a web resource to study genome evolution in flowering plants.

    PubMed

    Louis, Alexandra; Murat, Florent; Salse, Jérôme; Crollius, Hugues Roest

    2015-01-01

    Comparative genomics combined with phylogenetic reconstructions are powerful approaches to study the evolution of genes and genomes. However, the current rapid expansion of the volume of genomic information makes it increasingly difficult to interrogate, integrate and synthesize comparative genome data while taking into account the maximum breadth of information available. GenomicusPlants (http://www.genomicus.biologie.ens.fr/genomicus-plants) is an extension of the Genomicus webserver that addresses this issue by allowing users to explore flowering plant genomes in an intuitive way, across the broadest evolutionary scales. Extant genomes of 26 flowering plants can be analyzed, as well as 23 ancestral reconstructed genomes. Ancestral gene order provides a long-term chronological view of gene order evolution, greatly facilitating comparative genomics and evolutionary studies. Four main interfaces ('views') are available where: (i) PhyloView combines phylogenetic trees with comparisons of genomic loci across any number of genomes; (ii) AlignView projects loci of interest against all other genomes to visualize its topological conservation; (iii) MatrixView compares two genomes in a classical dotplot representation; and (iv) Karyoview visualizes chromosome karyotypes 'painted' with colours of another genome of interest. All four views are interconnected and benefit from many customizable features. © The Author 2014. Published by Oxford University Press on behalf of Japanese Society of Plant Physiologists.

  16. The mitochondrial genome of the ascalaphid owlfly Libelloides macaronius and comparative evolutionary mitochondriomics of neuropterid insects

    PubMed Central

    2011-01-01

    Background The insect order Neuroptera encompasses more than 5,700 described species. To date, only three neuropteran mitochondrial genomes have been fully and one partly sequenced. Current knowledge on neuropteran mitochondrial genomes is limited, and new data are strongly required. In the present work, the mitochondrial genome of the ascalaphid owlfly Libelloides macaronius is described and compared with the known neuropterid mitochondrial genomes: Megaloptera, Neuroptera and Raphidioptera. These analyses are further extended to other endopterygotan orders. Results The mitochondrial genome of L. macaronius is a circular molecule 15,890 bp long. It includes the entire set of 37 genes usually present in animal mitochondrial genomes. The gene order of this newly sequenced genome is unique among Neuroptera and differs from the ancestral type of insects in the translocation of trnC. The L. macaronius genome shows the lowest A+T content (74.50%) among known neuropterid genomes. Protein-coding genes possess the typical mitochondrial start codons, except for cox1, which has an unusual ACG. Comparisons among endopterygotan mitochondrial genomes showed that A+T content and AT/GC-skews exhibit a broad range of variation among 84 analyzed taxa. Comparative analyses showed that neuropterid mitochondrial protein-coding genes experienced complex evolutionary histories, involving features ranging from codon usage to rate of substitution, that make them potential markers for population genetics/phylogenetics studies at different taxonomic ranks. The 22 tRNAs show variable substitution patterns in Neuropterida, with higher sequence conservation in genes located on the α strand. Inferred secondary structures for neuropterid rrnS and rrnL genes largely agree with those known for other insects. For the first time, a model is provided for domain I of an insect rrnL. 
The control region in Neuropterida, as in other insects, is a fast-evolving genomic region, characterized by AT-rich motifs. Conclusions The new genome shares many features with known neuropteran genomes but differs in its low A+T content. Comparative analysis of neuropterid mitochondrial genes showed that they experienced distinct evolutionary patterns. Both tRNA families and ribosomal RNAs show composite substitution pathways. The neuropterid mitochondrial genome is characterized by a complex evolutionary history. PMID:21569260

  17. Genome-wide array-based comparative genomic hybridization (array-CGH) analysis in Aicardi Syndrome

    USDA-ARS's Scientific Manuscript database

    Aicardi syndrome is characterized by agenesis of the corpus callosum, chorioretinal lacunae, severe seizures (starting as infantile spasms), neuronal migration defects, mental retardation, costovertebral defects, and typical facial features. Because Aicardi syndrome is sporadic and affects only fem...

  18. Genome-wide phylogenetic analysis of the pathogenic potential of Vibrio furnissii

    PubMed Central

    Lux, Thomas M.; Lee, Rob; Love, John

    2014-01-01

    We recently reported the genome sequence of a free-living strain of Vibrio furnissii (NCTC 11218) harvested from an estuarine environment. V. furnissii is a widespread, free-living proteobacterium and emerging pathogen that can cause acute gastroenteritis in humans and lethal zoonoses in aquatic invertebrates, including farmed crustaceans and molluscs. Here we present the analyses to assess the potential pathogenic impact of V. furnissii. We compared the complete genome of V. furnissii with 8 other emerging and pathogenic Vibrio species. We selected and analyzed more deeply 10 genomic regions based upon unique or common features, and used 3 of these regions to construct a phylogenetic tree. Thus, we positioned V. furnissii more accurately than before and revealed a closer relationship between V. furnissii and V. cholerae than previously thought. However, V. furnissii lacks several important features normally associated with virulence in the human pathogens V. cholerae and V. vulnificus. A striking feature of the V. furnissii genome is the hugely increased Super Integron, compared to the other Vibrio. Analyses of predicted genomic islands resulted in the discovery of a protein sequence that is present only in Vibrio associated with diseases in aquatic animals. We also discovered evidence of high levels of horizontal gene transfer in V. furnissii. V. furnissii seems therefore to have a dynamic and fluid genome that could quickly adapt to environmental perturbation or increase its pathogenicity. Taken together, these analyses confirm the potential of V. furnissii as an emerging marine and possible human pathogen, especially in the developing, tropical, coastal regions that are most at risk from climate change. PMID:25191313

  19. Genome-wide phylogenetic analysis of the pathogenic potential of Vibrio furnissii.

    PubMed

    Lux, Thomas M; Lee, Rob; Love, John

    2014-01-01

    We recently reported the genome sequence of a free-living strain of Vibrio furnissii (NCTC 11218) harvested from an estuarine environment. V. furnissii is a widespread, free-living proteobacterium and emerging pathogen that can cause acute gastroenteritis in humans and lethal zoonoses in aquatic invertebrates, including farmed crustaceans and molluscs. Here we present the analyses to assess the potential pathogenic impact of V. furnissii. We compared the complete genome of V. furnissii with 8 other emerging and pathogenic Vibrio species. We selected and analyzed more deeply 10 genomic regions based upon unique or common features, and used 3 of these regions to construct a phylogenetic tree. Thus, we positioned V. furnissii more accurately than before and revealed a closer relationship between V. furnissii and V. cholerae than previously thought. However, V. furnissii lacks several important features normally associated with virulence in the human pathogens V. cholerae and V. vulnificus. A striking feature of the V. furnissii genome is the hugely increased Super Integron, compared to the other Vibrio. Analyses of predicted genomic islands resulted in the discovery of a protein sequence that is present only in Vibrio associated with diseases in aquatic animals. We also discovered evidence of high levels of horizontal gene transfer in V. furnissii. V. furnissii seems therefore to have a dynamic and fluid genome that could quickly adapt to environmental perturbation or increase its pathogenicity. Taken together, these analyses confirm the potential of V. furnissii as an emerging marine and possible human pathogen, especially in the developing, tropical, coastal regions that are most at risk from climate change.

  20. Comparative genomics of bdelloid rotifers: Insights from desiccating and nondesiccating species

    PubMed Central

    Almeida, Pedro; Wilson, Christopher G.; Smith, Thomas P.; Fontaneto, Diego; Crisp, Alastair; Micklem, Gos; Tunnacliffe, Alan

    2018-01-01

    Bdelloid rotifers are a class of microscopic invertebrates that have existed for millions of years apparently without sex or meiosis. They inhabit a variety of temporary and permanent freshwater habitats globally, and many species are remarkably tolerant of desiccation. Bdelloids offer an opportunity to better understand the evolution of sex and recombination, but previous work has emphasised desiccation as the cause of several unusual genomic features in this group. Here, we present high-quality whole-genome sequences of 3 bdelloid species: Rotaria macrura and R. magnacalcarata, which are both desiccation intolerant, and Adineta ricciae, which is desiccation tolerant. In combination with the published assembly of A. vaga, which is also desiccation tolerant, we apply a comparative genomics approach to evaluate the potential effects of desiccation tolerance and asexuality on genome evolution in bdelloids. We find that ancestral tetraploidy is conserved among all 4 bdelloid species, but homologous divergence in obligately aquatic Rotaria genomes is unexpectedly low. This finding is contrary to current models regarding the role of desiccation in shaping bdelloid genomes. In addition, we find that homologous regions in A. ricciae are largely collinear and do not form palindromic repeats as observed in the published A. vaga assembly. Consequently, several features interpreted as genomic evidence for long-term ameiotic evolution are not general to all bdelloid species, even within the same genus. Finally, we substantiate previous findings of high levels of horizontally transferred nonmetazoan genes in both desiccating and nondesiccating bdelloid species and show that this unusual feature is not shared by other animal phyla, even those with desiccation-tolerant representatives. These comparisons call into question the proposed role of desiccation in mediating horizontal genetic transfer. PMID:29689044

  1. The UCSC Genome Browser database: extensions and updates 2013.

    PubMed

    Meyer, Laurence R; Zweig, Ann S; Hinrichs, Angie S; Karolchik, Donna; Kuhn, Robert M; Wong, Matthew; Sloan, Cricket A; Rosenbloom, Kate R; Roe, Greg; Rhead, Brooke; Raney, Brian J; Pohl, Andy; Malladi, Venkat S; Li, Chin H; Lee, Brian T; Learned, Katrina; Kirkup, Vanessa; Hsu, Fan; Heitner, Steve; Harte, Rachel A; Haeussler, Maximilian; Guruvadoo, Luvina; Goldman, Mary; Giardine, Belinda M; Fujita, Pauline A; Dreszer, Timothy R; Diekhans, Mark; Cline, Melissa S; Clawson, Hiram; Barber, Galt P; Haussler, David; Kent, W James

    2013-01-01

    The University of California Santa Cruz (UCSC) Genome Browser (http://genome.ucsc.edu) offers online public access to a growing database of genomic sequence and annotations for a wide variety of organisms. The Browser is an integrated tool set for visualizing, comparing, analysing and sharing both publicly available and user-generated genomic datasets. As of September 2012, genomic sequence and a basic set of annotation 'tracks' are provided for 63 organisms, including 26 mammals, 13 non-mammal vertebrates, 3 invertebrate deuterostomes, 13 insects, 6 worms, yeast and sea hare. In the past year 19 new genome assemblies have been added, and we anticipate releasing another 28 in early 2013. Further, a large number of annotation tracks have been either added, updated by contributors or remapped to the latest human reference genome. Among these are an updated UCSC Genes track for human and mouse assemblies. We have also introduced several features to improve usability, including new navigation menus. This article provides an update to the UCSC Genome Browser database, which has been previously featured in the Database issue of this journal.

  2. Genome sequence and analysis of Lactobacillus helveticus

    PubMed Central

    Cremonesi, Paola; Chessa, Stefania; Castiglioni, Bianca

    2013-01-01

    The microbiological characterization of lactobacilli is historically well developed, but the genomic analysis is recent. Because of the widespread use of Lactobacillus helveticus in cheese technology, information concerning the heterogeneity in this species is accumulating rapidly. Recently, the genome of five L. helveticus strains was sequenced to completion and compared with other genomically characterized lactobacilli. The genomic analysis of the first sequenced strain, L. helveticus DPC 4571, isolated from cheese and selected for its characteristics of rapid lysis and high proteolytic activity, has revealed a plethora of genes with industrial potential including those responsible for key metabolic functions such as proteolysis, lipolysis, and cell lysis. These genes and their derived enzymes can facilitate the production of cheese and cheese derivatives with potential for use as ingredients in consumer foods. In addition, L. helveticus has the potential to produce peptides with a biological function, such as angiotensin converting enzyme (ACE) inhibitory activity, in fermented dairy products, demonstrating the therapeutic value of this species. A most intriguing feature of the genome of L. helveticus is the remarkable similarity in gene content with many intestinal lactobacilli. Comparative genomics has allowed the identification of key gene sets that facilitate a variety of lifestyles including adaptation to food matrices or the gastrointestinal tract. As genome sequence and functional genomic information continues to explode, key features of the genomes of L. helveticus strains continue to be discovered, answering many questions but also raising many new ones. PMID:23335916

  3. MobilomeFINDER: web-based tools for in silico and experimental discovery of bacterial genomic islands

    PubMed Central

    Ou, Hong-Yu; He, Xinyi; Harrison, Ewan M.; Kulasekara, Bridget R.; Thani, Ali Bin; Kadioglu, Aras; Lory, Stephen; Hinton, Jay C. D.; Barer, Michael R.; Rajakumar, Kumar

    2007-01-01

    MobilomeFINDER (http://mml.sjtu.edu.cn/MobilomeFINDER) is an interactive online tool that facilitates bacterial genomic island or ‘mobile genome’ (mobilome) discovery; it integrates the ArrayOme and tRNAcc software packages. ArrayOme utilizes a microarray-derived comparative genomic hybridization input data set to generate ‘inferred contigs’ produced by merging adjacent genes classified as ‘present’. Collectively these ‘fragments’ represent a hypothetical ‘microarray-visualized genome (MVG)’. ArrayOme permits recognition of discordances between physical genome and MVG sizes, thereby enabling identification of strains rich in microarray-elusive novel genes. Individual tRNAcc tools facilitate automated identification of genomic islands by comparative analysis of the contents and contexts of tRNA sites and other integration hotspots in closely related sequenced genomes. Accessory tools facilitate design of hotspot-flanking primers for in silico and/or wet-science-based interrogation of cognate loci in unsequenced strains and analysis of islands for features suggestive of foreign origins; island-specific and genome-contextual features are tabulated and represented in schematic and graphical forms. To date we have used MobilomeFINDER to analyse several Enterobacteriaceae, Pseudomonas aeruginosa and Streptococcus suis genomes. MobilomeFINDER enables high-throughput island identification and characterization through increased exploitation of emerging sequence data and PCR-based profiling of unsequenced test strains; subsequent targeted yeast recombination-based capture permits full-length sequencing and detailed functional studies of novel genomic islands. PMID:17537813
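    The 'inferred contig' idea described for ArrayOme, merging adjacent genes classified as 'present' and summing the merged fragments into a microarray-visualized genome (MVG) size, can be illustrated with a minimal sketch. This is not ArrayOme's actual code: the function names, the tuple layout (name, start, end, present) and the choice to ignore intergenic gaps are all assumptions made for illustration.

```python
from typing import List, Tuple

# Hypothetical input record: (gene name, start, end, called 'present' by the array)
GeneCall = Tuple[str, int, int, bool]
# Hypothetical output record: (contig start, contig end, member gene names)
Contig = Tuple[int, int, List[str]]


def inferred_contigs(calls: List[GeneCall]) -> List[Contig]:
    """Merge runs of consecutive 'present' genes (in genome order) into contigs."""
    contigs: List[Contig] = []
    current = None  # [start, end, names] of the run being built, or None
    for name, start, end, present in sorted(calls, key=lambda g: g[1]):
        if present:
            if current is None:
                current = [start, end, [name]]
            else:
                current[1] = max(current[1], end)
                current[2].append(name)
        elif current is not None:
            # An 'absent' gene closes the current run.
            contigs.append((current[0], current[1], current[2]))
            current = None
    if current is not None:
        contigs.append((current[0], current[1], current[2]))
    return contigs


def mvg_size(contigs: List[Contig]) -> int:
    """Total length of all inferred contigs (inclusive coordinates)."""
    return sum(end - start + 1 for start, end, _ in contigs)


calls = [
    ("a", 1, 100, True),
    ("b", 101, 200, True),
    ("c", 201, 300, False),
    ("d", 301, 400, True),
]
contigs = inferred_contigs(calls)
```

    Comparing `mvg_size(contigs)` with a physically measured genome size is the kind of discordance check the abstract mentions; how the real tool treats gaps, overlaps and uncertain calls is not stated in the record.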

  4. Genomicus update 2015: KaryoView and MatrixView provide a genome-wide perspective to multispecies comparative genomics

    PubMed Central

    Louis, Alexandra; Nguyen, Nga Thi Thuy; Muffato, Matthieu; Roest Crollius, Hugues

    2015-01-01

    The Genomicus web server (http://www.genomicus.biologie.ens.fr/genomicus) is a visualization tool allowing comparative genomics in four different phyla (Vertebrate, Fungi, Metazoan and Plants). It provides access to genomic information from extant species, as well as ancestral gene content and gene order for vertebrates and flowering plants. Here we present the new features available for vertebrate genome with a focus on new graphical tools. The interface to enter the database has been improved, two pairwise genome comparison tools are now available (KaryoView and MatrixView) and the multiple genome comparison tools (PhyloView and AlignView) propose three new kinds of representation and a more intuitive menu. These new developments have been implemented for Genomicus portal dedicated to vertebrates. This allows the analysis of 68 extant animal genomes, as well as 58 ancestral reconstructed genomes. The Genomicus server also provides access to ancestral gene orders, to facilitate evolutionary and comparative genomics studies, as well as computationally predicted regulatory interactions, thanks to the representation of conserved non-coding elements with their putative gene targets. PMID:25378326

  5. Comparative Genomics Reveals the Core Gene Toolbox for the Fungus-Insect Symbiosis

    PubMed Central

    Stata, Matt; Wang, Wei; White, Merlin M.; Moncalvo, Jean-Marc

    2018-01-01

    ABSTRACT Modern genomics has shed light on many entomopathogenic fungi and expanded our knowledge widely; however, little is known about the genomic features of the insect-commensal fungi. Harpellales are obligate commensals living in the digestive tracts of disease-bearing insects (black flies, midges, and mosquitoes). In this study, we produced and annotated whole-genome sequences of nine Harpellales taxa and conducted the first comparative analyses to infer the genomic diversity within the members of the Harpellales. The genomes of the insect gut fungi feature low (26% to 37%) GC content and large genome size variations (25 to 102 Mb). Further comparisons with insect-pathogenic fungi (from both Ascomycota and Zoopagomycota), as well as with free-living relatives (as negative controls), helped to identify a gene toolbox that is essential to the fungus-insect symbiosis. The results not only narrow the genomic scope of fungus-insect interactions from several thousands to eight core players but also distinguish host invasion strategies employed by insect pathogens and commensals. The genomic content suggests that insect commensal fungi rely mostly on adhesion protein anchors that target digestive system, while entomopathogenic fungi have higher numbers of transmembrane helices, signal peptides, and pathogen-host interaction (PHI) genes across the whole genome and enrich genes as well as functional domains to inactivate the host inflammation system and suppress the host defense. Phylogenomic analyses have revealed that genome sizes of Harpellales fungi vary among lineages with an integer-multiple pattern, which implies that ancient genome duplications may have occurred within the gut of insects. PMID:29764946

  6. Comparative chloroplast genomics: Analyses including new sequences from the angiosperms Nuphar advena and Ranunculus macranthus

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Raubeso, Linda A.; Peery, Rhiannon; Chumley, Timothy W.

    2007-03-01

    The number of completely sequenced plastid genomes available is growing rapidly. This new array of sequences presents new opportunities to perform comparative analyses. In comparative studies, it is most useful to compare across wide phylogenetic spans and, within angiosperms, to include representatives from basally diverging lineages such as the new genomes reported here: Nuphar advena (from a basal-most lineage) and Ranunculus macranthus (from the basal group of eudicots). We report these two new plastid genome sequences and make comparisons (within angiosperms, seed plants, or all photosynthetic lineages) to evaluate features such as the status of ycf15 and ycf68 as protein coding genes, the distribution of simple sequence repeats (SSRs) and longer dispersed repeats (SDR), and patterns of nucleotide composition.

  7. Predicting DNA Methylation State of CpG Dinucleotide Using Genome Topological Features and Deep Networks

    NASA Astrophysics Data System (ADS)

    Wang, Yiheng; Liu, Tong; Xu, Dong; Shi, Huidong; Zhang, Chaoyang; Mo, Yin-Yuan; Wang, Zheng

    2016-01-01

    The hypo- or hyper-methylation of the human genome is one of the epigenetic features of leukemia. However, experimental approaches have only determined the methylation state of a small portion of the human genome. We developed deep learning based (stacked denoising autoencoders, or SdAs) software named “DeepMethyl” to predict the methylation state of DNA CpG dinucleotides using features inferred from three-dimensional genome topology (based on Hi-C) and DNA sequence patterns. We used the experimental data from immortalised myelogenous leukemia (K562) and healthy lymphoblastoid (GM12878) cell lines to train the learning models and assess prediction performance. We have tested various SdA architectures with different configurations of hidden layer(s) and amount of pre-training data and compared the performance of deep networks relative to support vector machines (SVMs). Using the methylation states of sequentially neighboring regions as one of the learning features, an SdA achieved a blind test accuracy of 89.7% for GM12878 and 88.6% for K562. When the methylation states of sequentially neighboring regions are unknown, the accuracies are 84.82% for GM12878 and 72.01% for K562. We also analyzed the contribution of genome topological features inferred from Hi-C. DeepMethyl can be accessed at http://dna.cs.usm.edu/deepmethyl/.

  8. Predicting DNA Methylation State of CpG Dinucleotide Using Genome Topological Features and Deep Networks.

    PubMed

    Wang, Yiheng; Liu, Tong; Xu, Dong; Shi, Huidong; Zhang, Chaoyang; Mo, Yin-Yuan; Wang, Zheng

    2016-01-22

    The hypo- or hyper-methylation of the human genome is one of the epigenetic features of leukemia. However, experimental approaches have only determined the methylation state of a small portion of the human genome. We developed deep learning based (stacked denoising autoencoders, or SdAs) software named "DeepMethyl" to predict the methylation state of DNA CpG dinucleotides using features inferred from three-dimensional genome topology (based on Hi-C) and DNA sequence patterns. We used the experimental data from immortalised myelogenous leukemia (K562) and healthy lymphoblastoid (GM12878) cell lines to train the learning models and assess prediction performance. We have tested various SdA architectures with different configurations of hidden layer(s) and amount of pre-training data and compared the performance of deep networks relative to support vector machines (SVMs). Using the methylation states of sequentially neighboring regions as one of the learning features, an SdA achieved a blind test accuracy of 89.7% for GM12878 and 88.6% for K562. When the methylation states of sequentially neighboring regions are unknown, the accuracies are 84.82% for GM12878 and 72.01% for K562. We also analyzed the contribution of genome topological features inferred from Hi-C. DeepMethyl can be accessed at http://dna.cs.usm.edu/deepmethyl/.

  9. The Complete Chloroplast and Mitochondrial Genome Sequences of Boea hygrometrica: Insights into the Evolution of Plant Organellar Genomes

    PubMed Central

    Wang, Xumin; Deng, Xin; Zhang, Xiaowei; Hu, Songnian; Yu, Jun

    2012-01-01

    The complete nucleotide sequences of the chloroplast (cp) and mitochondrial (mt) genomes of resurrection plant Boea hygrometrica (Bh, Gesneriaceae) have been determined with the lengths of 153,493 bp and 510,519 bp, respectively. The smaller chloroplast genome contains more genes (147) with a 72% coding sequence, and the larger mitochondrial genome has fewer genes (65) with a coding fraction of 12%. Similar to other seed plants, the Bh cp genome has a typical quadripartite organization with a conserved gene in each region. The Bh mt genome has three recombinant sequence repeats of 222 bp, 843 bp, and 1474 bp in length, which divide the genome into a single master circle (MC) and four isomeric molecules. Compared to other angiosperms, one remarkable feature of the Bh mt genome is the frequent transfer of genetic material from the cp genome during recent Bh evolution. We also analyzed organellar genome evolution in general regarding genome features as well as compositional dynamics of sequence and gene structure/organization, providing clues for the understanding of the evolution of organellar genomes in plants. The cp-derived sequences including tRNAs found in angiosperm mt genomes support the conclusion that frequent gene transfer events may have begun early in the land plant lineage. PMID:22291979

  10. Comparative genomic analysis of Acinetobacter strains isolated from murine colonic crypts.

    PubMed

    Saffarian, Azadeh; Touchon, Marie; Mulet, Céline; Tournebize, Régis; Passet, Virginie; Brisse, Sylvain; Rocha, Eduardo P C; Sansonetti, Philippe J; Pédron, Thierry

    2017-07-11

    A restricted set of aerobic bacteria dominated by the Acinetobacter genus was identified in murine intestinal colonic crypts. The vicinity of such bacteria with intestinal stem cells could indicate that they protect the crypt against cytotoxic and genotoxic signals. Genome analyses of these bacteria were performed to better appreciate their biodegradative capacities. Two taxonomically different clusters of Acinetobacter were isolated from murine proximal colonic crypts, one was identified as A. modestus and the other as A. radioresistens. Their identification was performed through biochemical parameters and housekeeping gene sequencing. After selection of one strain of each cluster (A. modestus CM11G and A. radioresistens CM38.2), comparative genomic analysis was performed on whole-genome sequencing data. The antibiotic resistance pattern of these two strains is different, in line with the many genes involved in resistance to heavy metals identified in both genomes. Moreover whereas the operon benABCDE involved in benzoate metabolism is encoded by the two genomes, the operon antABC encoding the anthranilate dioxygenase, and the phenol hydroxylase gene cluster are absent in the A. modestus genomic sequence, indicating that the two strains have different capacities to metabolize xenobiotics. A common feature of the two strains is the presence of a type IV pili system, and the presence of genes encoding proteins pertaining to secretion systems such as Type I and Type II secretion systems. Our comparative genomic analysis revealed that different Acinetobacter isolated from the same biological niche, even if they share a large majority of genes, possess unique features that could play a specific role in the protection of the intestinal crypt.

  11. Genome Analysis of Streptococcus pyogenes Associated with Pharyngitis and Skin Infections

    PubMed Central

    Ibrahim, Joe; Eisen, Jonathan A.; Jospin, Guillaume; Coil, David A.; Khazen, Georges

    2016-01-01

    Streptococcus pyogenes is a very important human pathogen, commonly associated with skin or throat infections, but it can also cause life-threatening conditions including sepsis, streptococcal toxic shock syndrome, and necrotizing fasciitis. Various studies involving typing and molecular characterization of S. pyogenes have been published to date; however, next-generation sequencing (NGS) studies provide a comprehensive collection of an organism’s genetic variation. In this study, the genomes of nine S. pyogenes isolates associated with pharyngitis and skin infection were sequenced and studied for the presence of virulence genes, resistance elements, prophages, genomic recombination, and other genomic features. Additionally, a comparative phylogenetic analysis of the isolates with global clones highlighted their possible evolutionary lineage and their site of infection. The genomes were found to also house a multitude of features including gene regulation systems, virulence factors and antimicrobial resistance mechanisms. PMID:27977735

  12. Genomic Features and Niche-Adaptation of Enterococcus faecium Strains from Korean Soybean-Fermented Foods.

    PubMed

    Kim, Eun Bae; Jin, Gwi-Deuk; Lee, Jun-Yeong; Choi, Yun-Jaie

    2016-01-01

    Certain strains of Enterococcus faecium contribute beneficially to human health and food fermentation. However, other E. faecium strains are opportunistic pathogens due to the acquisition of virulence factors and antibiotic resistance determinants. To characterize E. faecium from soybean fermentation, we sequenced the genomes of 10 E. faecium strains from Korean soybean-fermented foods and analyzed their genomes by comparing them with 51 clinical and 52 non-clinical strains of different origins. Hierarchical clustering based on 13,820 orthologous genes from all E. faecium genomes showed that the 10 strains are distinguished from most of the clinical strains. Like non-clinical strains, their genomes are significantly smaller than clinical strains due to fewer accessory genes associated with antibiotic resistance, virulence, and mobile genetic elements. Moreover, we identified niche-associated gene gain and loss from the soybean strains. Thus, we conclude that soybean E. faecium strains might have evolved to have distinctive genomic features that may contribute to its ability to thrive during soybean fermentation.

  13. Genomic Features and Niche-Adaptation of Enterococcus faecium Strains from Korean Soybean-Fermented Foods

    PubMed Central

    Kim, Eun Bae; Jin, Gwi-Deuk; Lee, Jun-Yeong; Choi, Yun-Jaie

    2016-01-01

    Certain strains of Enterococcus faecium contribute beneficially to human health and food fermentation. However, other E. faecium strains are opportunistic pathogens due to the acquisition of virulence factors and antibiotic resistance determinants. To characterize E. faecium from soybean fermentation, we sequenced the genomes of 10 E. faecium strains from Korean soybean-fermented foods and analyzed their genomes by comparing them with 51 clinical and 52 non-clinical strains of different origins. Hierarchical clustering based on 13,820 orthologous genes from all E. faecium genomes showed that the 10 strains are distinguished from most of the clinical strains. Like non-clinical strains, their genomes are significantly smaller than clinical strains due to fewer accessory genes associated with antibiotic resistance, virulence, and mobile genetic elements. Moreover, we identified niche-associated gene gain and loss from the soybean strains. Thus, we conclude that soybean E. faecium strains might have evolved to have distinctive genomic features that may contribute to its ability to thrive during soybean fermentation. PMID:27070419

  14. Non-Random Inversion Landscapes in Prokaryotic Genomes Are Shaped by Heterogeneous Selection Pressures

    PubMed Central

    Repar, Jelena; Warnecke, Tobias

    2017-01-01

    Abstract Inversions are a major contributor to structural genome evolution in prokaryotes. Here, using a novel alignment-based method, we systematically compare 1,651 bacterial and 98 archaeal genomes to show that inversion landscapes are frequently biased toward (symmetric) inversions around the origin–terminus axis. However, symmetric inversion bias is not a universal feature of prokaryotic genome evolution but varies considerably across clades. At the extremes, inversion landscapes in Bacillus–Clostridium and Actinobacteria are dominated by symmetric inversions, while there is little or no systematic bias favoring symmetric rearrangements in archaea with a single origin of replication. Within clades, we find strong but clade-specific relationships between symmetric inversion bias and different features of adaptive genome architecture, including the distance of essential genes to the origin of replication and the preferential localization of genes on the leading strand. We suggest that heterogeneous selection pressures have converged to produce similar patterns of structural genome evolution across prokaryotes. PMID:28407093

  15. The complete mitochondrial genome of Arctic Calanus hyperboreus (Copepoda, Calanoida) reveals characteristic patterns in calanoid mitochondrial genome.

    PubMed

    Kim, Sanghee; Lim, Byung-Jin; Min, Gi-Sik; Choi, Han-Gu

    2013-05-10

    Copepoda is the most diverse and abundant group of crustaceans, but its phylogenetic relationships are ambiguous. Mitochondrial (mt) genomes are useful for studying evolutionary history, but only six complete Copepoda mt genomes have been made available and these have extremely rearranged genome structures. This study determined the mt genome of Calanus hyperboreus, making it the first reported Arctic copepod mt genome and the first complete mt genome of a calanoid copepod. The mt genome of C. hyperboreus is 17,910 bp in length and it contains the entire set of 37 mt genes, including 13 protein-coding genes, 2 rRNAs, and 22 tRNAs. It has a very unusual gene structure, including the longest control region reported for a crustacean, a large tRNA gene cluster, and reversed GC skews in 11 out of 13 protein-coding genes (84.6%). Despite the unusual features, comparing this genome to published copepod genomes revealed retained pan-crustacean features, as well as a conserved calanoid-specific pattern. Our data provide a foundation for exploring the calanoid pattern and the mechanisms of mt gene rearrangement in the evolutionary history of the copepod mt genome. Copyright © 2012 Elsevier B.V. All rights reserved.
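    For readers unfamiliar with the GC skew statistic mentioned above, the following sketch uses the standard definition, (G − C) / (G + C), computed per gene. It is an illustration only: the gene names and toy sequences are made up, and the code merely reports the sign of the skew. Which sign counts as "reversed" depends on the strand and the reference taxa, which the record does not specify.

```python
from typing import Dict


def gc_skew(seq: str) -> float:
    """Return (G - C) / (G + C) for a DNA sequence, or 0.0 if it has no G or C."""
    s = seq.upper()
    g = s.count("G")
    c = s.count("C")
    total = g + c
    return (g - c) / total if total else 0.0


def fraction_negative(genes: Dict[str, str]) -> float:
    """Fraction of genes whose GC skew is below zero."""
    if not genes:
        return 0.0
    negative = sum(1 for seq in genes.values() if gc_skew(seq) < 0)
    return negative / len(genes)


# Toy sequences (hypothetical, not taken from the C. hyperboreus genome).
toy_genes = {
    "geneA": "ATGCCCGTA",  # G=2, C=3 -> negative skew
    "geneB": "ATGGGGCTA",  # G=4, C=1 -> positive skew
}
share = fraction_negative(toy_genes)
```

    The percentage quoted in the abstract follows the same arithmetic: 11 of 13 protein-coding genes is 11 / 13 ≈ 84.6%.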

  16. Comparative genomics of Paracoccus sp. SM22M-07 isolated from coral mucus: insights into bacteria-host interactions.

    PubMed

    Carlos, Camila; Pereira, Letícia Bianca; Ottoboni, Laura Maria Mariscal

    2017-06-01

    One of the main goals of coral microbiology is to understand the ways in which coral-bacteria associations are established and maintained. This work describes the sequencing of the genome of Paracoccus sp. SM22M-07 isolated from the mucus of the endemic Brazilian coral species Mussismilia hispida. Comparative analysis was used to identify unique genomic features of SM22M-07 that might be involved in its adaptation to the marine ecosystem and the nutrient-rich environment provided by coral mucus, as well as in the establishment and strengthening of the interaction with the host. These features included genes related to the type IV protein secretion system, erythritol catabolism, and succinoglycan biosynthesis. We experimentally confirmed the production of succinoglycan by Paracoccus sp. SM22M-07 and we hypothesize that it may be involved in the association of the bacterium with coral surfaces.

  17. 16q24.1 microdeletion in a premature newborn: usefulness of array-based comparative genomic hybridization in persistent pulmonary hypertension of the newborn.

    PubMed

    Zufferey, Flore; Martinet, Danielle; Osterheld, Maria-Chiara; Niel-Bütschi, Florence; Giannoni, Eric; Schmutz, Nathalie Besuchet; Xia, Zhilian; Beckmann, Jacques S; Shaw-Smith, Charles; Stankiewicz, Pawel; Langston, Claire; Fellmann, Florence

    2011-11-01

    Report of a 16q24.1 deletion in a premature newborn, demonstrating the usefulness of array-based comparative genomic hybridization in persistent pulmonary hypertension of the newborn and multiple congenital malformations. Descriptive case report. Genetic department and neonatal intensive care unit of a tertiary care children's hospital. None. We report the case of a preterm male infant, born at 26 wks of gestation. A cardiac malformation and bilateral hydronephrosis were diagnosed at 19 wks of gestation. Karyotype analysis was normal, and a 22q11.2 microdeletion was excluded by fluorescence in situ hybridization analysis. A cesarean section was performed due to fetal distress. The patient developed persistent pulmonary hypertension unresponsive to mechanical ventilation and nitric oxide treatment and expired at 16 hrs of life. An autopsy revealed partial atrioventricular canal malformation and showed bilateral dilation of the renal pelvocaliceal system with bilateral ureteral stenosis and annular pancreas. Array-based comparative genomic hybridization analysis (Agilent oligoNT 44K, Agilent Technologies, Santa Clara, CA) showed an interstitial microdeletion encompassing the forkhead box gene cluster in 16q24.1. Review of the pulmonary microscopic examination showed the characteristic features of alveolar capillary dysplasia with misalignment of pulmonary veins. Some features were less prominent due to the gestational age. Our review of the literature shows that alveolar capillary dysplasia with misalignment of pulmonary veins is rare but probably underreported. Prematurity is not a usual presentation, and histologic features are difficult to interpret. In our case, array-based comparative genomic hybridization revealed a 16q24.1 deletion, leading to the final diagnosis of alveolar capillary dysplasia with misalignment of pulmonary veins. It emphasizes the usefulness of array-based comparative genomic hybridization analysis as a diagnostic tool with implications for both prognosis and management decisions in newborns with refractory persistent pulmonary hypertension and multiple congenital malformations.

  18. The Essential Genome of Escherichia coli K-12

    PubMed Central

    2018-01-01

    ABSTRACT Transposon-directed insertion site sequencing (TraDIS) is a high-throughput method coupling transposon mutagenesis with short-fragment DNA sequencing. It is commonly used to identify essential genes. Single gene deletion libraries are considered the gold standard for identifying essential genes. Currently, the TraDIS method has not been benchmarked against such libraries, and therefore, it remains unclear whether the two methodologies are comparable. To address this, a high-density transposon library was constructed in Escherichia coli K-12. Essential genes predicted from sequencing of this library were compared to existing essential gene databases. To decrease false-positive identification of essential genes, statistical data analysis included corrections for both gene length and genome length. Through this analysis, new essential genes and genes previously incorrectly designated essential were identified. We show that manual analysis of TraDIS data reveals novel features that would not have been detected by statistical analysis alone. Examples include short essential regions within genes, orientation-dependent effects, and fine-resolution identification of genome and protein features. Recognition of these insertion profiles in transposon mutagenesis data sets will assist genome annotation of less well characterized genomes and provides new insights into bacterial physiology and biochemistry. PMID:29463657

  19. The Echinococcus canadensis (G7) genome: a key knowledge of parasitic platyhelminth human diseases.

    PubMed

    Maldonado, Lucas L; Assis, Juliana; Araújo, Flávio M Gomes; Salim, Anna C M; Macchiaroli, Natalia; Cucher, Marcela; Camicia, Federico; Fox, Adolfo; Rosenzvit, Mara; Oliveira, Guilherme; Kamenetzky, Laura

    2017-02-27

    The parasite Echinococcus canadensis (G7) (phylum Platyhelminthes, class Cestoda) is one of the causative agents of echinococcosis. Echinococcosis is a worldwide chronic zoonosis affecting humans as well as domestic and wild mammals, which has been reported as a prioritized neglected disease by the World Health Organisation. No genomic data, comparative genomic analyses or efficient therapeutic and diagnostic tools are available for this severe disease. The information presented in this study will help to understand the peculiar biological characters and to design species-specific control tools. We sequenced, assembled and annotated the 115-Mb genome of E. canadensis (G7). Comparative genomic analyses using whole genome data of three Echinococcus species not only confirmed the status of E. canadensis (G7) as a separate species but also demonstrated a high nucleotide sequence divergence in relation to E. granulosus (G1). The E. canadensis (G7) genome contains 11,449 genes with a core set of 881 orthologs shared among five cestode species. Comparative genomics revealed that there are more single nucleotide polymorphisms (SNPs) between E. canadensis (G7) and E. granulosus (G1) than between E. canadensis (G7) and E. multilocularis. This result was unexpected since E. canadensis (G7) and E. granulosus (G1) were considered to belong to the species complex E. granulosus sensu lato. We described SNPs in known drug targets and metabolism genes in the E. canadensis (G7) genome. Regarding gene regulation, we analysed three particular features: CpG island distribution along the three Echinococcus genomes, DNA methylation system and small RNA pathway. The results suggest the occurrence of yet unknown gene regulation mechanisms in Echinococcus. This is the first work that addresses Echinococcus comparative genomics. The resources presented here will promote the study of mechanisms of parasite development as well as new tools for drug discovery. The availability of a high-quality genome assembly is critical for fully exploring the biology of a pathogenic organism. The E. canadensis (G7) genome presented in this study provides a unique opportunity to address the genetic diversity among the genus Echinococcus and its particular developmental features. At present, there is no unequivocal taxonomic classification of Echinococcus species; however, the genome-wide SNP analysis performed here revealed the phylogenetic distance among these three Echinococcus species. Additional cestode genomes need to be sequenced to be able to resolve their phylogeny.

  20. Draft genome sequence of Halomonas lutea strain YIM 91125 T (DSM 23508 T) isolated from the alkaline Lake Ebinur in Northwest China

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Gao, Xiao-Yang; Zhi, Xiao-Yang; Li, Hong-Wei

    Species of the genus Halomonas are halophilic and their flexible adaption to changes of salinity and temperature brings considerable potential biotechnology applications, such as degradation of organic pollutants and enzyme production. The type strain Halomonas lutea YIM 91125 T was isolated from a hypersaline lake in China. The genome of strain YIM 91125 T becomes the twelfth species sequenced in Halomonas, and the thirteenth species sequenced in Halomonadaceae. We described the features of H. lutea YIM 91125 T, together with the high quality draft genome sequence and annotation of its type strain. The 4,533,090 bp long genome of strain YIM 91125 T with its 4,284 protein-coding and 84 RNA genes is a part of Genomic Encyclopedia of Type Strains, Phase I: the one thousand microbial genomes (KMG-I) project. From the viewpoint of comparative genomics, H. lutea has a larger genome size and more specific genes, which indicated acquisition of function bringing better adaption to its environment. Finally, DDH analysis demonstrated that H. lutea is a distinctive species, and halophilic features and nitrogen metabolism related genes were discovered in its genome.

  1. Draft genome sequence of Halomonas lutea strain YIM 91125 T (DSM 23508 T) isolated from the alkaline Lake Ebinur in Northwest China

    DOE PAGES

    Gao, Xiao-Yang; Zhi, Xiao-Yang; Li, Hong-Wei; ...

    2015-01-20

    Species of the genus Halomonas are halophilic and their flexible adaption to changes of salinity and temperature brings considerable potential biotechnology applications, such as degradation of organic pollutants and enzyme production. The type strain Halomonas lutea YIM 91125 T was isolated from a hypersaline lake in China. The genome of strain YIM 91125 T becomes the twelfth species sequenced in Halomonas, and the thirteenth species sequenced in Halomonadaceae. We described the features of H. lutea YIM 91125 T, together with the high quality draft genome sequence and annotation of its type strain. The 4,533,090 bp long genome of strain YIM 91125 T with its 4,284 protein-coding and 84 RNA genes is a part of Genomic Encyclopedia of Type Strains, Phase I: the one thousand microbial genomes (KMG-I) project. From the viewpoint of comparative genomics, H. lutea has a larger genome size and more specific genes, which indicated acquisition of function bringing better adaption to its environment. Finally, DDH analysis demonstrated that H. lutea is a distinctive species, and halophilic features and nitrogen metabolism related genes were discovered in its genome.

  2. Clinical significance of rare copy number variations in epilepsy: a case-control survey using microarray-based comparative genomic hybridization.

    PubMed

    Striano, Pasquale; Coppola, Antonietta; Paravidino, Roberta; Malacarne, Michela; Gimelli, Stefania; Robbiano, Angela; Traverso, Monica; Pezzella, Marianna; Belcastro, Vincenzo; Bianchi, Amedeo; Elia, Maurizio; Falace, Antonio; Gazzerro, Elisabetta; Ferlazzo, Edoardo; Freri, Elena; Galasso, Roberta; Gobbi, Giuseppe; Molinatto, Cristina; Cavani, Simona; Zuffardi, Orsetta; Striano, Salvatore; Ferrero, Giovanni Battista; Silengo, Margherita; Cavaliere, Maria Luigia; Benelli, Matteo; Magi, Alberto; Piccione, Maria; Dagna Bricarelli, Franca; Coviello, Domenico A; Fichera, Marco; Minetti, Carlo; Zara, Federico

    2012-03-01

    To perform an extensive search for genomic rearrangements by microarray-based comparative genomic hybridization in patients with epilepsy. Prospective cohort study. Epilepsy centers in Italy. Two hundred seventy-nine patients with unexplained epilepsy, 265 individuals with nonsyndromic mental retardation but no epilepsy, and 246 healthy control subjects were screened by microarray-based comparative genomic hybridization. Identification of copy number variations (CNVs) and gene enrichment. Rare CNVs occurred in 26 patients (9.3%) and 16 healthy control subjects (6.5%) (P = .26). The CNVs identified in patients were larger (P = .03) and showed higher gene content (P = .02) than those in control subjects. The CNVs larger than 1 megabase (P = .002) and including more than 10 genes (P = .005) occurred more frequently in patients than in control subjects. Nine patients (34.6%) among those harboring rare CNVs showed rearrangements associated with emerging microdeletion or microduplication syndromes. Mental retardation and neuropsychiatric features were associated with rare CNVs (P = .004), whereas epilepsy type was not. The CNV rate in patients with epilepsy and mental retardation or neuropsychiatric features is not different from that observed in patients with mental retardation only. Moreover, significant enrichment of genes involved in ion transport was observed within CNVs identified in patients with epilepsy. Patients with epilepsy show a significantly increased burden of large, rare, gene-rich CNVs, particularly when associated with mental retardation and neuropsychiatric features. The limited overlap between CNVs observed in the epilepsy group and those observed in the group with mental retardation only as well as the involvement of specific (ion channel) genes indicate a specific association between the identified CNVs and epilepsy. 
Screening for CNVs should be performed for diagnostic purposes preferentially in patients with epilepsy and mental retardation or neuropsychiatric features.
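
    The carrier rates quoted in this abstract (26 of 279 patients, 16 of 246 controls) can be rechecked from the counts alone. The sketch below recomputes both percentages and runs a two-sided Fisher's exact test on the resulting 2x2 table. The abstract does not say which test produced P = .26, so the choice of Fisher's test is an assumption, and the function name is illustrative only.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x carriers in row 1 with margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of every table no more likely than the observed one.
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))


# Counts taken from the abstract: rare-CNV carriers and group sizes.
patients_cnv, patients_total = 26, 279
controls_cnv, controls_total = 16, 246

rate_patients = patients_cnv / patients_total
rate_controls = controls_cnv / controls_total
p_value = fisher_exact_two_sided(
    patients_cnv, patients_total - patients_cnv,
    controls_cnv, controls_total - controls_cnv,
)

print(f"patients: {rate_patients:.1%}  controls: {rate_controls:.1%}  p = {p_value:.3f}")
```

    An exact test is used here only because it needs nothing beyond the standard library; a chi-square test on the same table would be an equally reasonable reading of the abstract.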

  3. Enhanced annotations and features for comparing thousands of Pseudomonas genomes in the Pseudomonas genome database.

    PubMed

    Winsor, Geoffrey L; Griffiths, Emma J; Lo, Raymond; Dhillon, Bhavjinder K; Shay, Julie A; Brinkman, Fiona S L

    2016-01-04

    The Pseudomonas Genome Database (http://www.pseudomonas.com) is well known for the application of community-based annotation approaches for producing a high-quality Pseudomonas aeruginosa PAO1 genome annotation, and facilitating whole-genome comparative analyses with other Pseudomonas strains. To aid analysis of potentially thousands of complete and draft genome assemblies, this database and analysis platform was upgraded to integrate curated genome annotations and isolate metadata with enhanced tools for larger scale comparative analysis and visualization. Manually curated gene annotations are supplemented with improved computational analyses that help identify putative drug targets and vaccine candidates or assist with evolutionary studies by identifying orthologs, pathogen-associated genes and genomic islands. The database schema has been updated to integrate isolate metadata that will facilitate more powerful analysis of genomes across datasets in the future. We continue to place an emphasis on providing high-quality updates to gene annotations through regular review of the scientific literature and using community-based approaches including a major new Pseudomonas community initiative for the assignment of high-quality gene ontology terms to genes. As we further expand from thousands of genomes, we plan to provide enhancements that will aid data visualization and analysis arising from whole-genome comparative studies including more pan-genome and population-based approaches. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.

  4. Genomicus update 2015: KaryoView and MatrixView provide a genome-wide perspective to multispecies comparative genomics.

    PubMed

    Louis, Alexandra; Nguyen, Nga Thi Thuy; Muffato, Matthieu; Roest Crollius, Hugues

    2015-01-01

    The Genomicus web server (http://www.genomicus.biologie.ens.fr/genomicus) is a visualization tool allowing comparative genomics in four different phyla (Vertebrate, Fungi, Metazoan and Plants). It provides access to genomic information from extant species, as well as ancestral gene content and gene order for vertebrates and flowering plants. Here we present the new features available for vertebrate genome with a focus on new graphical tools. The interface to enter the database has been improved, two pairwise genome comparison tools are now available (KaryoView and MatrixView) and the multiple genome comparison tools (PhyloView and AlignView) propose three new kinds of representation and a more intuitive menu. These new developments have been implemented for Genomicus portal dedicated to vertebrates. This allows the analysis of 68 extant animal genomes, as well as 58 ancestral reconstructed genomes. The Genomicus server also provides access to ancestral gene orders, to facilitate evolutionary and comparative genomics studies, as well as computationally predicted regulatory interactions, thanks to the representation of conserved non-coding elements with their putative gene targets. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.

  5. Performance and Scalability of Discriminative Metrics for Comparative Gene Identification in 12 Drosophila Genomes

    PubMed Central

    Lin, Michael F.; Deoras, Ameya N.; Rasmussen, Matthew D.; Kellis, Manolis

    2008-01-01

    Comparative genomics of multiple related species is a powerful methodology for the discovery of functional genomic elements, and its power should increase with the number of species compared. Here, we use 12 Drosophila genomes to study the power of comparative genomics metrics to distinguish between protein-coding and non-coding regions. First, we study the relative power of different comparative metrics and their relationship to single-species metrics. We find that even relatively simple multi-species metrics robustly outperform advanced single-species metrics, especially for shorter exons (≤240 nt), which are common in animal genomes. Moreover, the two capture largely independent features of protein-coding genes, with different sensitivity/specificity trade-offs, such that their combinations lead to even greater discriminatory power. In addition, we study how discovery power scales with the number and phylogenetic distance of the genomes compared. We find that species at a broad range of distances are comparably effective informants for pairwise comparative gene identification, but that these are surpassed by multi-species comparisons at similar evolutionary divergence. In particular, while pairwise discovery power plateaued at larger distances and never outperformed the most advanced single-species metrics, multi-species comparisons continued to benefit even from the most distant species with no apparent saturation. Last, we find that genes in functional categories typically considered fast-evolving can nonetheless be recovered at very high rates using comparative methods. Our results have implications for comparative genomics analyses in any species, including the human. PMID:18421375

  6. CyanoClust: comparative genome resources of cyanobacteria and plastids.

    PubMed

    Sasaki, Naobumi V; Sato, Naoki

    2010-01-01

    Cyanobacteria, which perform oxygen-evolving photosynthesis as do chloroplasts of plants and algae, are one of the best-studied prokaryotic phyla and one from which many representative genomes have been sequenced. Lack of a suitable comparative genomic database has been a problem in cyanobacterial genomics because many proteins involved in physiological functions such as photosynthesis and nitrogen fixation are not catalogued in commonly used databases, such as Clusters of Orthologous Proteins (COG). CyanoClust is a database of homolog groups in cyanobacteria and plastids that are produced by the program Gclust. We have developed a web-server system for the protein homology database featuring cyanobacteria and plastids. Database URL: http://cyanoclust.c.u-tokyo.ac.jp/.

  7. The complete mitochondrial genome of the bag-shelter moth Ochrogaster lunifer (Lepidoptera, Notodontidae)

    PubMed Central

    Salvato, Paola; Simonato, Mauro; Battisti, Andrea; Negrisolo, Enrico

    2008-01-01

    Background Knowledge of animal mitochondrial genomes is very important to understand their molecular evolution as well as for phylogenetic and population genetic studies. The Lepidoptera encompasses more than 160,000 described species and is one of the largest insect orders. To date only nine lepidopteran mitochondrial DNAs have been fully and two others partly sequenced. Furthermore the taxon sampling is very scant. Thus advance of lepidopteran mitogenomics deeply requires new genomes derived from a broad taxon sampling. In the present work we describe the mitochondrial genome of the moth Ochrogaster lunifer. Results The mitochondrial genome of O. lunifer is a circular molecule 15593 bp long. It includes the entire set of 37 genes usually present in animal mitochondrial genomes. It also contains 7 intergenic spacers. The gene order of the newly sequenced genome is that typical for Lepidoptera and differs from the insect ancestral type for the placement of trnM. The 77.84% A+T content of its α strand is the lowest among known lepidopteran genomes. The mitochondrial genome of O. lunifer exhibits one of the most marked C-skews among available insect Pterygota genomes. The protein-coding genes have typical mitochondrial start codons except for cox1, which presents an unusual CGA. The O. lunifer genome exhibits the least biased synonymous codon usage among lepidopterans. Comparative genomics analysis identified atp6, cox1, cox2, cox3, cob, nad1, nad2, nad4, and nad5 as potential markers for population genetics/phylogenetics studies. A peculiar feature of the O. lunifer mitochondrial genome is that the intergenic spacers are mostly made of repetitive sequences. Conclusion The mitochondrial genome of O. lunifer is the first representative of the superfamily Noctuoidea, which accounts for about 40% of all described Lepidoptera. The new genome shares many features with other known lepidopteran genomes. It differs however for its low A+T content and marked C-skew.
Compared to other lepidopteran genomes it is less biased in synonymous codon usage. Comparative evolutionary analysis of lepidopteran mitochondrial genomes allowed the identification of previously neglected coding genes as potential phylogenetic markers. Presence of repetitive elements in intergenic spacers of O. lunifer genome supports the role of DNA slippage as possible mechanism to produce spacers during replication. PMID:18627592
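
    The A+T content and strand skew statistics reported in this abstract are simple base-count ratios. The sketch below assumes the commonly used definitions AT skew = (A - T)/(A + T) and GC skew = (G - C)/(G + C), under which a negative GC skew corresponds to an excess of C (a "C-skew"). The abstract does not state its formulas, so these definitions are an assumption, and the sequence is a made-up toy string, not O. lunifer data.

```python
from collections import Counter


def base_composition(seq):
    """Return A+T content, AT skew and GC skew for one DNA strand.

    Assumes the strand contains at least one A or T and at least one G or C;
    otherwise the skew ratios would divide by zero.
    """
    counts = Counter(seq.upper())
    a, c, g, t = (counts[base] for base in "ACGT")
    total = a + c + g + t  # ambiguous bases such as N are ignored
    return {
        "at_content": (a + t) / total,
        "at_skew": (a - t) / (a + t),
        "gc_skew": (g - c) / (g + c),
    }


# Hypothetical toy strand: 4 A, 4 T, 3 C, 1 G.
toy_strand = "ATTTAACCCGAT"
stats = base_composition(toy_strand)
print(stats)
```

    For a real mitochondrial genome the same function would be applied to the full sequence of the chosen strand, read from a FASTA file.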

  8. Use of biological priors enhances understanding of genetic architecture and genomic prediction of complex traits within and between dairy cattle breeds.

    PubMed

    Fang, Lingzhao; Sahana, Goutam; Ma, Peipei; Su, Guosheng; Yu, Ying; Zhang, Shengli; Lund, Mogens Sandø; Sørensen, Peter

    2017-08-10

    A better understanding of the genetic architecture underlying complex traits (e.g., the distribution of causal variants and their effects) may aid genomic prediction. Here, we hypothesized that the genomic variants of complex traits might be enriched in a subset of genomic regions defined by genes grouped on the basis of "Gene Ontology" (GO), and that incorporating this independent biological information into genomic prediction models might improve their predictive ability. Four complex traits (i.e., milk, fat and protein yields, and mastitis) together with imputed sequence variants in Holstein (HOL) and Jersey (JER) cattle were analysed. We first carried out a post-GWAS analysis in a HOL training population to assess the degree of enrichment of the association signals in the gene regions defined by each GO term. We then extended the genomic best linear unbiased prediction model (GBLUP) to a genomic feature BLUP (GFBLUP) model, including an additional genomic effect quantifying the joint effect of a group of variants located in a genomic feature. The GBLUP model using a single random effect assumes that all genomic variants contribute to the genomic relationship equally, whereas GFBLUP attributes different weights to the individual genomic relationships in the prediction equation based on the estimated genomic parameters. Our results demonstrate that the immune-relevant GO terms were more associated with mastitis than milk production, and several biologically meaningful GO terms improved the prediction accuracy with GFBLUP for the four traits, as compared with GBLUP. The improvement of the genomic prediction between breeds (the average increase across the four traits was 0.161) was more apparent than that within HOL (the average increase across the four traits was 0.020).
Our genomic feature modelling approaches provide a framework to simultaneously explore the genetic architecture and genomic prediction of complex traits by taking advantage of independent biological knowledge.
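
    The contrast between the two models described in this abstract can be written out compactly. The notation below is the usual one for these mixed models and is an assumption, since the abstract gives no formulas: y is the vector of phenotypes, mu the overall mean, G the genomic relationship matrix built from all variants, and G_f and G_r the relationship matrices built from variants inside and outside the genomic feature (for example, a GO term).

```latex
\begin{aligned}
\text{GBLUP:}\quad  & \mathbf{y} = \mathbf{1}\mu + \mathbf{g} + \mathbf{e},
  & \mathbf{g} &\sim N(\mathbf{0}, \mathbf{G}\sigma_g^2), \\
\text{GFBLUP:}\quad & \mathbf{y} = \mathbf{1}\mu + \mathbf{f} + \mathbf{r} + \mathbf{e},
  & \mathbf{f} &\sim N(\mathbf{0}, \mathbf{G}_f\sigma_f^2),\quad
    \mathbf{r} \sim N(\mathbf{0}, \mathbf{G}_r\sigma_r^2), \\
 & & \mathbf{e} &\sim N(\mathbf{0}, \mathbf{I}\sigma_e^2).
\end{aligned}
```

    In this form the single variance component of GBLUP weights all variants equally, while the separate estimates of sigma_f^2 and sigma_r^2 in GFBLUP let the feature variants carry a different weight from the rest of the genome.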

  9. Latent feature decompositions for integrative analysis of multi-platform genomic data

    PubMed Central

    Gregory, Karl B.; Momin, Amin A.; Coombes, Kevin R.; Baladandayuthapani, Veerabhadran

    2015-01-01

    Increased availability of multi-platform genomics data on matched samples has sparked research efforts to discover how diverse molecular features interact both within and between platforms. In addition, simultaneous measurements of genetic and epigenetic characteristics illuminate the roles their complex relationships play in disease progression and outcomes. However, integrative methods for diverse genomics data are faced with the challenges of ultra-high dimensionality and the existence of complex interactions both within and between platforms. We propose a novel modeling framework for integrative analysis based on decompositions of the large number of platform-specific features into a smaller number of latent features. Subsequently we build a predictive model for clinical outcomes accounting for both within- and between-platform interactions based on Bayesian model averaging procedures. Principal components, partial least squares and non-negative matrix factorization as well as sparse counterparts of each are used to define the latent features, and the performance of these decompositions is compared both on real and simulated data. The latent feature interactions are shown to preserve interactions between the original features and not only aid prediction but also allow explicit selection of outcome-related features. The methods are motivated by and applied to, a glioblastoma multiforme dataset from The Cancer Genome Atlas to predict patient survival times integrating gene expression, microRNA, copy number and methylation data. For the glioblastoma data, we find a high concordance between our selected prognostic genes and genes with known associations with glioblastoma. In addition, our model discovers several relevant cross-platform interactions such as copy number variation associated gene dosing and epigenetic regulation through promoter methylation. 
On simulated data, we show that our proposed method successfully incorporates interactions within and between genomic platforms to aid accurate prediction and variable selection. Our methods perform best when principal components are used to define the latent features. PMID:26146492

  10. Caryoscope: An Open Source Java application for viewing microarray data in a genomic context

    PubMed Central

    Awad, Ihab AB; Rees, Christian A; Hernandez-Boussard, Tina; Ball, Catherine A; Sherlock, Gavin

    2004-01-01

    Background Microarray-based comparative genome hybridization experiments generate data that can be mapped onto the genome. These data are interpreted more easily when represented graphically in a genomic context. Results We have developed Caryoscope, which is an open source Java application for visualizing microarray data from array comparative genome hybridization experiments in a genomic context. Caryoscope can read General Feature Format files (GFF files), as well as comma- and tab-delimited files, that define the genomic positions of the microarray reporters for which data are obtained. The microarray data can be browsed using an interactive, zoomable interface, which helps users identify regions of chromosomal deletion or amplification. The graphical representation of the data can be exported in a number of graphic formats, including publication-quality formats such as PostScript. Conclusion Caryoscope is a useful tool that can aid in the visualization, exploration and interpretation of microarray data in a genomic context. PMID:15488149
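
    The General Feature Format mentioned in this abstract is a tab-delimited text format with nine columns per feature line. The sketch below shows one way to read such lines into records; it is not Caryoscope's own parser (which is written in Java), and the class name, function name and sample rows are hypothetical.

```python
from typing import NamedTuple


class GffFeature(NamedTuple):
    """One GFF record; column names follow the nine standard GFF fields."""
    seqid: str
    source: str
    type: str
    start: int
    end: int
    score: str
    strand: str
    phase: str
    attributes: str


def parse_gff(lines):
    """Parse GFF text lines, skipping blank lines and '#' comment lines."""
    features = []
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 9:
            raise ValueError(f"expected 9 tab-separated columns, got {len(cols)}")
        features.append(
            GffFeature(cols[0], cols[1], cols[2], int(cols[3]), int(cols[4]), *cols[5:])
        )
    return features


# Hypothetical reporter positions, for illustration only.
sample_lines = [
    "##gff-version 3",
    "chr1\tarray\treporter\t1000\t1500\t.\t+\t.\tID=rep1",
    "chr2\tarray\treporter\t200\t260\t.\t-\t.\tID=rep2",
]
features = parse_gff(sample_lines)
for feature in features:
    print(feature.seqid, feature.start, feature.end, feature.attributes)
```

    Score and phase are kept as strings because both may hold the placeholder "." instead of a number.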

  11. An Integrated Metabolomic and Genomic Mining Workflow To Uncover the Biosynthetic Potential of Bacteria

    PubMed Central

    Maansson, Maria; Vynne, Nikolaj G.; Klitgaard, Andreas; Nybo, Jane L.; Melchiorsen, Jette; Nguyen, Don D.; Sanchez, Laura M.; Ziemert, Nadine; Dorrestein, Pieter C.

    2016-01-01

    ABSTRACT Microorganisms are a rich source of bioactives; however, chemical identification is a major bottleneck. Strategies that can prioritize the most prolific microbial strains and novel compounds are of great interest. Here, we present an integrated approach to evaluate the biosynthetic richness in bacteria and mine the associated chemical diversity. Thirteen strains closely related to Pseudoalteromonas luteoviolacea isolated from all over the Earth were analyzed using an untargeted metabolomics strategy, and metabolomic profiles were correlated with whole-genome sequences of the strains. We found considerable diversity: only 2% of the chemical features and 7% of the biosynthetic genes were common to all strains, while 30% of all features and 24% of the genes were unique to single strains. The list of chemical features was reduced to 50 discriminating features using a genetic algorithm and support vector machines. Features were dereplicated by tandem mass spectrometry (MS/MS) networking to identify molecular families of the same biosynthetic origin, and the associated pathways were probed using comparative genomics. Most of the discriminating features were related to antibacterial compounds, including the thiomarinols that were reported from P. luteoviolacea here for the first time. By comparative genomics, we identified the biosynthetic cluster responsible for the production of the antibiotic indolmycin, which could not be predicted with standard methods. In conclusion, we present an efficient, integrative strategy for elucidating the chemical richness of a given set of bacteria and link the chemistry to biosynthetic genes. IMPORTANCE We here combine chemical analysis and genomics to probe for new bioactive secondary metabolites based on their pattern of distribution within bacterial species. 
We demonstrate the usefulness of this combined approach in a group of marine Gram-negative bacteria closely related to Pseudoalteromonas luteoviolacea, which is a species known to produce a broad spectrum of chemicals. The approach allowed us to identify new antibiotics and their associated biosynthetic pathways. Combining chemical analysis and genetics is an efficient “mining” workflow for identifying diverse pharmaceutical candidates in a broad range of microorganisms and therefore of great use in bioprospecting. PMID:27822535

  12. Comparative genomics of two super-shedder isolates of Escherichia coli O157:H7

    PubMed Central

    Katani, Robab; Cote, Rebecca; Kudva, Indira T.; DebRoy, Chitrita; Arthur, Terrance M.

    2017-01-01

    Shiga toxin-producing Escherichia coli O157:H7 (O157) are zoonotic foodborne pathogens and of major public health concern that cause considerable intestinal and extra-intestinal illnesses in humans. O157 colonize the recto-anal junction (RAJ) of asymptomatic cattle who shed the bacterium into the environment through fecal matter. A small subset of cattle, termed super-shedders (SS), excrete O157 at a rate (≥ 10^4 CFU/g of feces) that is several orders of magnitude greater than other colonized cattle and play a major role in the prevalence and transmission of O157. To better understand microbial factors contributing to super-shedding we have recently sequenced two SS isolates, SS17 (GenBank accession no. CP008805) and SS52 (GenBank accession no. CP010304) and shown that SS isolates display a distinctive strongly adherent phenotype on bovine rectal squamous epithelial cells. Here we present a detailed comparative genomics analysis of SS17 and SS52 with other previously characterized O157 strains (EC4115, EDL933, Sakai, TW14359). The results highlight specific polymorphisms and genomic features shared amongst SS strains, and reveal several SNPs that are shared amongst SS isolates, including in genes involved in motility, adherence, and metabolism. Finally, our analyses reveal distinctive patterns of distribution of phage-associated genes amongst the two SS and other isolates. Together, the results of our comparative genomics studies suggest that while SS17 and SS52 share genomic features with other lineage I/II isolates, they likely have distinct recent evolutionary histories. Future comparative and functional genomic studies are needed to decipher the precise molecular basis for super shedding in O157. PMID:28797098

  13. Comparative genomics of two super-shedder isolates of Escherichia coli O157:H7.

    PubMed

    Katani, Robab; Cote, Rebecca; Kudva, Indira T; DebRoy, Chitrita; Arthur, Terrance M; Kapur, Vivek

    2017-01-01

    Shiga toxin-producing Escherichia coli O157:H7 (O157) are zoonotic foodborne pathogens and of major public health concern that cause considerable intestinal and extra-intestinal illnesses in humans. O157 colonize the recto-anal junction (RAJ) of asymptomatic cattle who shed the bacterium into the environment through fecal matter. A small subset of cattle, termed super-shedders (SS), excrete O157 at a rate (≥ 10^4 CFU/g of feces) that is several orders of magnitude greater than other colonized cattle and play a major role in the prevalence and transmission of O157. To better understand microbial factors contributing to super-shedding we have recently sequenced two SS isolates, SS17 (GenBank accession no. CP008805) and SS52 (GenBank accession no. CP010304) and shown that SS isolates display a distinctive strongly adherent phenotype on bovine rectal squamous epithelial cells. Here we present a detailed comparative genomics analysis of SS17 and SS52 with other previously characterized O157 strains (EC4115, EDL933, Sakai, TW14359). The results highlight specific polymorphisms and genomic features shared amongst SS strains, and reveal several SNPs that are shared amongst SS isolates, including in genes involved in motility, adherence, and metabolism. Finally, our analyses reveal distinctive patterns of distribution of phage-associated genes amongst the two SS and other isolates. Together, the results of our comparative genomics studies suggest that while SS17 and SS52 share genomic features with other lineage I/II isolates, they likely have distinct recent evolutionary histories. Future comparative and functional genomic studies are needed to decipher the precise molecular basis for super shedding in O157.

  14. Non-Random Inversion Landscapes in Prokaryotic Genomes Are Shaped by Heterogeneous Selection Pressures.

    PubMed

    Repar, Jelena; Warnecke, Tobias

    2017-08-01

    Inversions are a major contributor to structural genome evolution in prokaryotes. Here, using a novel alignment-based method, we systematically compare 1,651 bacterial and 98 archaeal genomes to show that inversion landscapes are frequently biased toward (symmetric) inversions around the origin-terminus axis. However, symmetric inversion bias is not a universal feature of prokaryotic genome evolution but varies considerably across clades. At the extremes, inversion landscapes in Bacillus-Clostridium and Actinobacteria are dominated by symmetric inversions, while there is little or no systematic bias favoring symmetric rearrangements in archaea with a single origin of replication. Within clades, we find strong but clade-specific relationships between symmetric inversion bias and different features of adaptive genome architecture, including the distance of essential genes to the origin of replication and the preferential localization of genes on the leading strand. We suggest that heterogeneous selection pressures have converged to produce similar patterns of structural genome evolution across prokaryotes. © The Author 2017. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.

  15. Genome U-Plot: a whole genome visualization.

    PubMed

    Gaitatzes, Athanasios; Johnson, Sarah H; Smadbeck, James B; Vasmatzis, George

    2018-05-15

    The ability to produce and analyze whole genome sequencing (WGS) data from samples with structural variations (SV) generated the need to visualize such abnormalities in simplified plots. Conventional two-dimensional representations of WGS data frequently use either circular or linear layouts. There are several diverse advantages regarding both these representations, but their major disadvantage is that they do not use the two-dimensional space very efficiently. We propose a layout, termed the Genome U-Plot, which spreads the chromosomes on a two-dimensional surface and essentially quadruples the spatial resolution. We present the Genome U-Plot for producing clear and intuitive graphs that allow researchers to generate novel insights and hypotheses by visualizing SVs such as deletions, amplifications, and chromoanagenesis events. The main features of the Genome U-Plot are its layered layout, its high spatial resolution and its improved aesthetic qualities. We compare conventional visualization schemas with the Genome U-Plot using visualization metrics such as number of line crossings and crossing angle resolution measures. Based on our metrics, we improve the readability of the resulting graph by at least 2-fold, making important features apparent and making it easy to identify important genomic changes. The result is a whole genome visualization tool with high spatial resolution and improved aesthetic qualities. An implementation and documentation of the Genome U-Plot is publicly available at https://github.com/gaitat/GenomeUPlot. Contact: vasmatzis.george@mayo.edu. Supplementary data are available at Bioinformatics online.

  16. Comparative Metagenomics of Gut and Ocean: Identification of Microbial Marker Genes for Complex Environmental Properties (2011 JGI User Meeting)

    ScienceCinema

    Bork, Peer

    2018-02-14

    The U.S. Department of Energy Joint Genome Institute (JGI) invited scientists interested in the application of genomics to bioenergy and environmental issues, as well as all current and prospective users and collaborators, to attend the annual DOE JGI Genomics of Energy & Environment Meeting held March 22-24, 2011 in Walnut Creek, Calif. The emphasis of this meeting was on the genomics of renewable energy strategies, carbon cycling, environmental gene discovery, and engineering of fuel-producing organisms. The meeting featured presentations by leading scientists advancing these topics. In this recording, Peer Bork of the European Molecular Biology Laboratory presents "Comparative Metagenomics of Gut and Ocean: Identification of Microbial Marker Genes for Complex Environmental Properties" at the 6th annual Genomics of Energy & Environment Meeting on March 23, 2011.

  17. Comparative analyses of Legionella species identifies genetic features of strains causing Legionnaires' disease.

    PubMed

    Gomez-Valero, Laura; Rusniok, Christophe; Rolando, Monica; Neou, Mario; Dervins-Ravault, Delphine; Demirtas, Jasmin; Rouy, Zoe; Moore, Robert J; Chen, Honglei; Petty, Nicola K; Jarraud, Sophie; Etienne, Jerome; Steinert, Michael; Heuner, Klaus; Gribaldo, Simonetta; Médigue, Claudine; Glöckner, Gernot; Hartland, Elizabeth L; Buchrieser, Carmen

    2014-01-01

    The genus Legionella comprises over 60 species. However, L. pneumophila and L. longbeachae alone cause over 95% of Legionnaires’ disease. To identify the genetic bases underlying the different capacities to cause disease we sequenced and compared the genomes of L. micdadei, L. hackeliae and L. fallonii (LLAP10), which are all rarely isolated from humans. We show that these Legionella species possess different virulence capacities in amoeba and macrophages, correlating with their occurrence in humans. Our comparative analysis of 11 Legionella genomes belonging to five species reveals highly heterogeneous genome content with over 60% representing species-specific genes; these comprise a complete prophage in L. micdadei, the first ever identified in a Legionella genome. Mobile elements are abundant in Legionella genomes; many encode type IV secretion systems for conjugative transfer, pointing to their importance for adaptation of the genus. The Dot/Icm secretion system is conserved, although the core set of substrates is small, as only 24 out of over 300 described Dot/Icm effector genes are present in all Legionella species. We also identified new eukaryotic motifs including thaumatin, synaptobrevin or clathrin/coatomer adaptine like domains. Legionella genomes are highly dynamic due to a large mobilome mainly comprising type IV secretion systems, while a minority of core substrates is shared among the diverse species. Eukaryotic like proteins and motifs remain a hallmark of the genus Legionella. Key factors such as proteins involved in oxygen binding, iron storage, host membrane transport and certain Dot/Icm substrates are specific features of disease-related strains.

  18. Comparative Genomics of Flatworms (Platyhelminthes) Reveals Shared Genomic Features of Ecto- and Endoparastic Neodermata

    PubMed Central

    Hahn, Christoph; Fromm, Bastian; Bachmann, Lutz

    2014-01-01

    The ectoparasitic Monogenea comprise a major part of the obligate parasitic flatworm diversity. Although genomic adaptations to parasitism have been studied in the endoparasitic tapeworms (Cestoda) and flukes (Trematoda), no representative of the Monogenea has been investigated yet. We present the high-quality draft genome of Gyrodactylus salaris, an economically important monogenean ectoparasite of wild Atlantic salmon (Salmo salar). A total of 15,488 gene models were identified, of which 7,102 were functionally annotated. The controversial phylogenetic relationships within the obligate parasitic Neodermata were resolved in a phylogenomic analysis using 1,719 gene models (alignment length of >500,000 amino acids) for a set of 16 metazoan taxa. The Monogenea were found basal to the Cestoda and Trematoda, which implies ectoparasitism being plesiomorphic within the Neodermata and strongly supports a common origin of complex life cycles. Comparative analysis of seven parasitic flatworm genomes identified shared genomic features for the ecto- and endoparasitic lineages, such as a substantial reduction of the core bilaterian gene complement, including the homeodomain-containing genes, and a loss of the piwi and vasa genes, which are considered essential for animal development. Furthermore, the shared loss of functional fatty acid biosynthesis pathways and the absence of peroxisomes, the latter organelles presumed ubiquitous in eukaryotes except for parasitic protozoans, were inferred. The draft genome of G. salaris opens for future in-depth analyses of pathogenicity and host specificity of poorly characterized G. salaris strains, and will enhance studies addressing the genomics of host–parasite interactions and speciation in the highly diverse monogenean flatworms. PMID:24732282

  19. Tracing phylogenomic events leading to diversity of Haemophilus influenzae and the emergence of Brazilian Purpuric Fever (BPF)-associated clones

    PubMed Central

    Papazisi, Leka; Ratnayake, Shashikala; Remortel, Brian G.; Bock, Geoffrey R.; Liang, Wei; Saeed, Alexander I.; Liu, Jia; Fleischmann, Robert D.; Kilian, Mogens; Peterson, Scott N.

    2010-01-01

    Here we report the use of a multi-genome DNA microarray to elucidate the genomic events associated with the emergence of the clonal variants of H. influenzae biogroup aegyptius causing Brazilian Purpuric Fever (BPF), an important pediatric disease with a high mortality rate. We performed directed genome sequencing of strain HK1212 unique loci to construct a species DNA microarray. Comparative genome hybridization using this microarray enabled us to determine and compare gene complements, and infer reliable phylogenomic relationships among members of the species. The higher genomic variability observed in the genomes of BPF-related strains (clones) and their close relatives may be characterized by significant gene flux related to a subset of functional role categories. We found that the acquisition of a large number of virulence determinants featuring numerous cell membrane proteins coupled to the loss of genes involved in transport, central biosynthetic pathways and in particular, energy production pathways to be characteristics of the BPF genomic variants. PMID:20654709

  20. Comparative Genomic Analysis Reveals a Diverse Repertoire of Genes Involved in Prokaryote-Eukaryote Interactions within the Pseudovibrio Genus

    PubMed Central

    Romano, Stefano; Fernàndez-Guerra, Antonio; Reen, F. Jerry; Glöckner, Frank O.; Crowley, Susan P.; O'Sullivan, Orla; Cotter, Paul D.; Adams, Claire; Dobson, Alan D. W.; O'Gara, Fergal

    2016-01-01

    Strains of the Pseudovibrio genus have been detected worldwide, mainly as part of bacterial communities associated with marine invertebrates, particularly sponges. This recurrent association has been considered as an indication of a symbiotic relationship between these microbes and their host. Until recently, the availability of only two genomes, belonging to closely related strains, has limited the knowledge on the genomic and physiological features of the genus to a single phylogenetic lineage. Here we present 10 newly sequenced genomes of Pseudovibrio strains isolated from marine sponges from the west coast of Ireland, and including the other two publicly available genomes we performed an extensive comparative genomic analysis. Homogeneity was apparent in terms of both the orthologous genes and the metabolic features shared amongst the 12 strains. At the genomic level, a key physiological difference observed amongst the isolates was the presence only in strain P. axinellae AD2 of genes encoding proteins involved in assimilatory nitrate reduction, which was then proved experimentally. We then focused on studying those systems known to be involved in the interactions with eukaryotic and prokaryotic cells. This analysis revealed that the genus harbors a large diversity of toxin-like proteins, secretion systems and their potential effectors. Their distribution in the genus was not always consistent with the phylogenetic relationship of the strains. Finally, our analyses identified new genomic islands encoding potential toxin-immunity systems, previously unknown in the genus. Our analyses shed new light on the Pseudovibrio genus, indicating a large diversity of both metabolic features and systems for interacting with the host. The diversity in both distribution and abundance of these systems amongst the strains underlines how metabolically and phylogenetically similar bacteria may use different strategies to interact with the host and find a niche within its microbiota. Our data suggest the presence of a sponge-specific lineage of Pseudovibrio. The reduction in genome size and the loss of some systems potentially used to successfully enter the host lead to the hypothesis that P. axinellae strain AD2 may be a lineage that presents an ancient association with the host and that may be vertically transmitted to the progeny. PMID:27065959

  1. Chætognath transcriptome reveals ancestral and unique features among bilaterians

    PubMed Central

    Marlétaz, Ferdinand; Gilles, André; Caubit, Xavier; Perez, Yvan; Dossat, Carole; Samain, Sylvie; Gyapay, Gabor; Wincker, Patrick; Le Parco, Yannick

    2008-01-01

    Background: The chætognaths (arrow worms) have puzzled zoologists for years because of their astonishing morphological and developmental characteristics. Despite their deuterostome-like development, phylogenomic studies recently positioned the chætognath phylum in protostomes, most likely in an early branching. This key phylogenetic position and the peculiar characteristics of chætognaths prompted further investigation of their genomic features. Results: Transcriptomic and genomic data were collected from the chætognath Spadella cephaloptera through the sequencing of expressed sequence tags and genomic bacterial artificial chromosome clones. Transcript comparisons at various taxonomic scales emphasized the conservation of a core gene set and phylogenomic analysis confirmed the basal position of chætognaths among protostomes. A detailed survey of transcript diversity and individual genotyping revealed a past genome duplication event in the chætognath lineage, which was, surprisingly, followed by a high retention rate of duplicated genes. Moreover, striking genetic heterogeneity was detected within the sampled population at the nuclear and mitochondrial levels but cannot be explained by cryptic speciation. Finally, we found evidence for trans-splicing maturation of transcripts through splice-leader addition in the chætognath phylum and we further report that this processing is associated with operonic transcription. Conclusion: These findings reveal both shared ancestral and unique derived characteristics of the chætognath genome, which suggests that this genome is likely the product of a very original evolutionary history. These features promote chætognaths as a pivotal model for comparative genomics, which could provide new clues for the investigation of the evolution of animal genomes. PMID:18533022

  2. Comparative Genetic Analyses of Human Rhinovirus C (HRV-C) Complete Genome from Malaysia.

    PubMed

    Khaw, Yam Sim; Chan, Yoke Fun; Jafar, Faizatul Lela; Othman, Norlijah; Chee, Hui Yee

    2016-01-01

    Human rhinovirus-C (HRV-C) has been implicated in more severe illnesses than HRV-A and HRV-B, however, the limited number of HRV-C complete genomes (complete 5' and 3' non-coding region and open reading frame sequences) has hindered the in-depth genetic study of this virus. This study aimed to sequence seven complete HRV-C genomes from Malaysia and compare their genetic characteristics with the 18 published HRV-Cs. Seven Malaysian HRV-C complete genomes were obtained with newly redesigned primers. The seven genomes were classified as HRV-C6, C12, C22, C23, C26, C42, and pat16 based on the VP4/VP2 and VP1 pairwise distance threshold classification. Five of the seven Malaysian isolates, namely, 3430-MY-10/C22, 8713-MY-10/C23, 8097-MY-11/C26, 1570-MY-10/C42, and 7383-MY-10/pat16 are the first newly sequenced complete HRV-C genomes. All seven Malaysian isolates genomes displayed nucleotide similarity of 63-81% among themselves and 63-96% with other HRV-Cs. Malaysian HRV-Cs had similar putative immunogenic sites, putative receptor utilization and potential antiviral sites as other HRV-Cs. The genomic features of Malaysian isolates were similar to those of other HRV-Cs. Negative selections were frequently detected in HRV-Cs complete coding sequences indicating that these sequences were under functional constraint. The present study showed that HRV-Cs from Malaysia have diverse genetic sequences but share conserved genomic features with other HRV-Cs. This genetic information could provide further aid in the understanding of HRV-C infection.

  3. Comparative Genetic Analyses of Human Rhinovirus C (HRV-C) Complete Genome from Malaysia

    PubMed Central

    Khaw, Yam Sim; Chan, Yoke Fun; Jafar, Faizatul Lela; Othman, Norlijah; Chee, Hui Yee

    2016-01-01

    Human rhinovirus-C (HRV-C) has been implicated in more severe illnesses than HRV-A and HRV-B, however, the limited number of HRV-C complete genomes (complete 5′ and 3′ non-coding region and open reading frame sequences) has hindered the in-depth genetic study of this virus. This study aimed to sequence seven complete HRV-C genomes from Malaysia and compare their genetic characteristics with the 18 published HRV-Cs. Seven Malaysian HRV-C complete genomes were obtained with newly redesigned primers. The seven genomes were classified as HRV-C6, C12, C22, C23, C26, C42, and pat16 based on the VP4/VP2 and VP1 pairwise distance threshold classification. Five of the seven Malaysian isolates, namely, 3430-MY-10/C22, 8713-MY-10/C23, 8097-MY-11/C26, 1570-MY-10/C42, and 7383-MY-10/pat16 are the first newly sequenced complete HRV-C genomes. All seven Malaysian isolates genomes displayed nucleotide similarity of 63–81% among themselves and 63–96% with other HRV-Cs. Malaysian HRV-Cs had similar putative immunogenic sites, putative receptor utilization and potential antiviral sites as other HRV-Cs. The genomic features of Malaysian isolates were similar to those of other HRV-Cs. Negative selections were frequently detected in HRV-Cs complete coding sequences indicating that these sequences were under functional constraint. The present study showed that HRV-Cs from Malaysia have diverse genetic sequences but share conserved genomic features with other HRV-Cs. This genetic information could provide further aid in the understanding of HRV-C infection. PMID:27199901

  4. Interrogation of the Burkholderia pseudomallei genome to address differential virulence among isolates

    DOE PAGES

    Challacombe, Jean F.; Stubben, Chris J.; Klimko, Christopher P.; ...

    2014-12-23

    Infection by the Gram-negative pathogen Burkholderia pseudomallei results in the disease melioidosis, acquired from the environment in parts of southeast Asia and northern Australia. Clinical symptoms of melioidosis range from acute (fever, pneumonia, septicemia, and localized infection) to chronic (abscesses in various organs and tissues, most commonly occurring in the lungs, liver, spleen, kidney, prostate and skeletal muscle), and persistent infections in humans are difficult to cure. Understanding the basic biology and genomics of B. pseudomallei is imperative for the development of new vaccines and therapeutic interventions. This formidable task is becoming more tractable due to the increasing number of B. pseudomallei genomes that are being sequenced and compared. Here, we compared three B. pseudomallei genomes, from strains MSHR668, K96243 and 1106a, to identify features that might explain why MSHR668 is more virulent than K96243 and 1106a in a mouse model of B. pseudomallei infection. Our analyses focused on metabolic, virulence and regulatory genes that were present in MSHR668 but absent from both K96243 and 1106a. We also noted features present in K96243 and 1106a but absent from MSHR668, and identified genomic differences that may contribute to variations in virulence noted among the three B. pseudomallei isolates. While this work contributes to our understanding of B. pseudomallei genomics, more detailed experiments are necessary to characterize the relevance of specific genomic features to B. pseudomallei metabolism and virulence. Functional analyses of metabolic networks, virulence and regulation show promise for examining the effects of B. pseudomallei on host cell metabolism and will lay a foundation for future prediction of the virulence of emerging strains. Continued emphasis in this area will be critical for protection against melioidosis, as a better understanding of what constitutes a fully virulent Burkholderia isolate may provide for better diagnostic and medical countermeasure strategies.

  5. Recent updates and developments to plant genome size databases

    PubMed Central

    Garcia, Sònia; Leitch, Ilia J.; Anadon-Rosell, Alba; Canela, Miguel Á.; Gálvez, Francisco; Garnatje, Teresa; Gras, Airy; Hidalgo, Oriane; Johnston, Emmeline; Mas de Xaxars, Gemma; Pellicer, Jaume; Siljak-Yakovlev, Sonja; Vallès, Joan; Vitales, Daniel; Bennett, Michael D.

    2014-01-01

    Two plant genome size databases have been recently updated and/or extended: the Plant DNA C-values database (http://data.kew.org/cvalues), and GSAD, the Genome Size in Asteraceae database (http://www.asteraceaegenomesize.com). While the first provides information on nuclear DNA contents across land plants and some algal groups, the second is focused on one of the largest and most economically important angiosperm families, Asteraceae. Genome size data have numerous applications: they can be used in comparative studies on genome evolution, or as a tool to appraise the cost of whole-genome sequencing programs. The growing interest in genome size and increasing rate of data accumulation has necessitated the continued update of these databases. Currently, the Plant DNA C-values database (Release 6.0, Dec. 2012) contains data for 8510 species, while GSAD has 1219 species (Release 2.0, June 2013), representing increases of 17 and 51%, respectively, in the number of species with genome size data, compared with previous releases. Here we provide overviews of the most recent releases of each database, and outline new features of GSAD. The latter include (i) a tool to visually compare genome size data between species, (ii) the option to export data and (iii) a webpage containing information about flow cytometry protocols. PMID:24288377

  6. Genomics of the Genus Bifidobacterium Reveals Species-Specific Adaptation to the Glycan-Rich Gut Environment

    PubMed Central

    Milani, Christian; Turroni, Francesca; Duranti, Sabrina; Lugli, Gabriele Andrea; Mancabelli, Leonardo; Ferrario, Chiara; van Sinderen, Douwe

    2015-01-01

    Bifidobacteria represent one of the dominant microbial groups that occur in the gut of various animals, being particularly prevalent during the suckling period of humans and other mammals. Their ability to compete with other gut bacteria is largely attributed to their saccharolytic features. Comparative and functional genomic as well as transcriptomic analyses have revealed the genetic background that underpins the overall saccharolytic phenotype for each of the 47 bifidobacterial (sub)species representing the genus Bifidobacterium, while also generating insightful information regarding carbohydrate resource sharing and cross-feeding among bifidobacteria. The abundance of bifidobacterial saccharolytic features in human microbiomes supports the notion that metabolic accessibility to dietary and/or host-derived glycans is a potent evolutionary force that has shaped the bifidobacterial genome. PMID:26590291

  7. Initial sequence and comparative analysis of the cat genome

    PubMed Central

    Pontius, Joan U.; Mullikin, James C.; Smith, Douglas R.; Lindblad-Toh, Kerstin; Gnerre, Sante; Clamp, Michele; Chang, Jean; Stephens, Robert; Neelam, Beena; Volfovsky, Natalia; Schäffer, Alejandro A.; Agarwala, Richa; Narfström, Kristina; Murphy, William J.; Giger, Urs; Roca, Alfred L.; Antunes, Agostinho; Menotti-Raymond, Marilyn; Yuhki, Naoya; Pecon-Slattery, Jill; Johnson, Warren E.; Bourque, Guillaume; Tesler, Glenn; O’Brien, Stephen J.

    2007-01-01

    The genome sequence (1.9-fold coverage) of an inbred Abyssinian domestic cat was assembled, mapped, and annotated with a comparative approach that involved cross-reference to annotated genome assemblies of six mammals (human, chimpanzee, mouse, rat, dog, and cow). The results resolved chromosomal positions for 663,480 contigs, 20,285 putative feline gene orthologs, and 133,499 conserved sequence blocks (CSBs). Additional annotated features include repetitive elements, endogenous retroviral sequences, nuclear mitochondrial (numt) sequences, micro-RNAs, and evolutionary breakpoints that suggest historic balancing of translocation and inversion incidences in distinct mammalian lineages. Large numbers of single nucleotide polymorphisms (SNPs), deletion insertion polymorphisms (DIPs), and short tandem repeats (STRs), suitable for linkage or association studies were characterized in the context of long stretches of chromosome homozygosity. In spite of the light coverage capturing ∼65% of euchromatin sequence from the cat genome, these comparative insights shed new light on the tempo and mode of gene/genome evolution in mammals, promise several research applications for the cat, and also illustrate that a comparative approach using more deeply covered mammals provides an informative, preliminary annotation of a light (1.9-fold) coverage mammal genome sequence. PMID:17975172

  8. Comparative genomics of Mortierella elongata and its bacterial endosymbiont Mycoavidus cysteinexigens

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Uehling, J.; Gryganskyi, A.; Hameed, K.

    Endosymbiosis of bacteria by eukaryotes is a defining feature of cellular evolution. In addition to well-known bacterial origins for mitochondria and chloroplasts, multiple origins of bacterial endosymbiosis are known within the cells of diverse animals, plants and fungi. Early-diverging lineages of terrestrial fungi harbor endosymbiotic bacteria belonging to the Burkholderiaceae. Furthermore, we sequenced the metagenome of the soil-inhabiting fungus Mortierella elongata and assembled the complete circular chromosome of its endosymbiont, Mycoavidus cysteinexigens, which we place within a lineage of endofungal symbionts that are sister clade to Burkholderia. The genome of M. elongata strain AG77 features a core set of primary metabolic pathways for degradation of simple carbohydrates and lipid biosynthesis, while the M. cysteinexigens (AG77) genome is reduced in size and function. Experiments using antibiotics to cure the endobacterium from the host demonstrate that the fungal host metabolism is highly modulated by presence/absence of M. cysteinexigens. In independent comparative phylogenomic analyses of fungal and bacterial genomes we find that they are consistent with an ancient origin for M. elongata–M. cysteinexigens symbiosis, most likely over 350 million years ago and concomitant with the terrestrialization of Earth and diversification of land fungi and plants.

  9. Comparative genomics of Mortierella elongata and its bacterial endosymbiont Mycoavidus cysteinexigens

    DOE PAGES

    Uehling, J.; Gryganskyi, A.; Hameed, K.; ...

    2017-01-11

    Endosymbiosis of bacteria by eukaryotes is a defining feature of cellular evolution. In addition to well-known bacterial origins for mitochondria and chloroplasts, multiple origins of bacterial endosymbiosis are known within the cells of diverse animals, plants and fungi. Early-diverging lineages of terrestrial fungi harbor endosymbiotic bacteria belonging to the Burkholderiaceae. Furthermore, we sequenced the metagenome of the soil-inhabiting fungus Mortierella elongata and assembled the complete circular chromosome of its endosymbiont, Mycoavidus cysteinexigens, which we place within a lineage of endofungal symbionts that are sister clade to Burkholderia. The genome of M. elongata strain AG77 features a core set of primary metabolic pathways for degradation of simple carbohydrates and lipid biosynthesis, while the M. cysteinexigens (AG77) genome is reduced in size and function. Experiments using antibiotics to cure the endobacterium from the host demonstrate that the fungal host metabolism is highly modulated by presence/absence of M. cysteinexigens. In independent comparative phylogenomic analyses of fungal and bacterial genomes we find that they are consistent with an ancient origin for M. elongata–M. cysteinexigens symbiosis, most likely over 350 million years ago and concomitant with the terrestrialization of Earth and diversification of land fungi and plants.

  10. Gramene 2016: comparative plant genomics and pathway resources

    PubMed Central

    Tello-Ruiz, Marcela K.; Stein, Joshua; Wei, Sharon; Preece, Justin; Olson, Andrew; Naithani, Sushma; Amarasinghe, Vindhya; Dharmawardhana, Palitha; Jiao, Yinping; Mulvaney, Joseph; Kumari, Sunita; Chougule, Kapeel; Elser, Justin; Wang, Bo; Thomason, James; Bolser, Daniel M.; Kerhornou, Arnaud; Walts, Brandon; Fonseca, Nuno A.; Huerta, Laura; Keays, Maria; Tang, Y. Amy; Parkinson, Helen; Fabregat, Antonio; McKay, Sheldon; Weiser, Joel; D'Eustachio, Peter; Stein, Lincoln; Petryszak, Robert; Kersey, Paul J.; Jaiswal, Pankaj; Ware, Doreen

    2016-01-01

    Gramene (http://www.gramene.org) is an online resource for comparative functional genomics in crops and model plant species. Its two main frameworks are genomes (collaboration with Ensembl Plants) and pathways (The Plant Reactome and archival BioCyc databases). Since our last NAR update, the database website adopted a new Drupal management platform. The genomes section features 39 fully assembled reference genomes that are integrated using ontology-based annotation and comparative analyses, and accessed through both visual and programmatic interfaces. Additional community data, such as genetic variation, expression and methylation, are also mapped for a subset of genomes. The Plant Reactome pathway portal (http://plantreactome.gramene.org) provides a reference resource for analyzing plant metabolic and regulatory pathways. In addition to ∼200 curated rice reference pathways, the portal hosts gene homology-based pathway projections for 33 plant species. Both the genome and pathway browsers interface with the EMBL-EBI's Expression Atlas to enable the projection of baseline and differential expression data from curated expression studies in plants. Gramene's archive website (http://archive.gramene.org) continues to provide previously reported resources on comparative maps, markers and QTL. To further aid our users, we have also introduced a live monthly educational webinar series and a Gramene YouTube channel carrying video tutorials. PMID:26553803

  11. Sinbase: an integrated database to study genomics, genetics and comparative genomics in Sesamum indicum.

    PubMed

    Wang, Linhai; Yu, Jingyin; Li, Donghua; Zhang, Xiurong

    2015-01-01

    Sesame (Sesamum indicum L.) is an ancient and important oilseed crop grown widely in tropical and subtropical areas. It belongs to the gigantic order Lamiales, which includes many well-known or economically important species, such as olive (Olea europaea), leonurus (Leonurus japonicus) and lavender (Lavandula spica), many of which have important pharmacological properties. Despite their importance, genetic and genomic analyses on these species have been insufficient due to a lack of reference genome information. The now available S. indicum genome will provide an unprecedented opportunity for studying both S. indicum genetic traits and comparative genomics. To deliver S. indicum genomic information to the worldwide research community, we designed Sinbase, a web-based database with comprehensive sesame genomic, genetic and comparative genomic information. Sinbase includes sequences of assembled sesame pseudomolecular chromosomes, protein-coding genes (27,148), transposable elements (372,167) and non-coding RNAs (1,748). In particular, Sinbase provides unique and valuable information on colinear regions with various plant genomes, including Arabidopsis thaliana, Glycine max, Vitis vinifera and Solanum lycopersicum. Sinbase also provides a useful search function and data mining tools, including a keyword search and local BLAST service. Sinbase will be updated regularly with new features, improvements to genome annotation and new genomic sequences, and is freely accessible at http://ocri-genomics.org/Sinbase/. © The Author 2014. Published by Oxford University Press on behalf of Japanese Society of Plant Physiologists. All rights reserved. For permissions, please email: journals.permissions@oup.com.

  12. The comparison of pathology in ferrets infected by H9N2 avian influenza viruses with different genomic features.

    PubMed

    Gao, Rongbao; Bai, Tian; Li, Xiaodan; Xiong, Ying; Huang, Yiwei; Pan, Ming; Zhang, Ye; Bo, Hong; Zou, Shumei; Shu, Yuelong

    2016-01-15

    H9N2 avian influenza virus circulates widely in poultry and has been responsible for sporadic human infections in several regions. Few studies have been conducted on the pathogenicity of H9N2 AIV isolates that have different genomic features. We compared the pathology induced by a novel reassortant H9N2 virus and two currently circulating H9N2 viruses that have different genomic features in ferrets. The results showed that the three viruses can induce infections with various amounts of viral shedding in ferrets. The novel H9N2 induced respiratory infection, but no pathological lesions were observed in lung tissues. The other two viruses induced mild to intermediate pathological lesions in lung tissues, although the clinical signs presented mildly in ferrets. The pathological lesions presented a diversity consistent with viral replication in ferrets. Copyright © 2015 Elsevier Inc. All rights reserved.

  13. Diversity and evolution of the emerging Pandoraviridae family.

    PubMed

    Legendre, Matthieu; Fabre, Elisabeth; Poirot, Olivier; Jeudy, Sandra; Lartigue, Audrey; Alempic, Jean-Marie; Beucher, Laure; Philippe, Nadège; Bertaux, Lionel; Christo-Foroux, Eugène; Labadie, Karine; Couté, Yohann; Abergel, Chantal; Claverie, Jean-Michel

    2018-06-11

    With DNA genomes reaching 2.5 Mb packed in particles of bacterium-like shape and dimension, the first two Acanthamoeba-infecting pandoraviruses remained up to now the most complex viruses since their discovery in 2013. Our isolation of three new strains from distant locations and environments is now used to perform the first comparative genomics analysis of the emerging worldwide-distributed Pandoraviridae family. Thorough annotation of the genomes combining transcriptomic, proteomic, and bioinformatic analyses reveals many non-coding transcripts and significantly reduces the former set of predicted protein-coding genes. Here we show that the pandoraviruses exhibit an open pan-genome, the enormous size of which is not adequately explained by gene duplications or horizontal transfers. As most of the strain-specific genes have no extant homolog and exhibit statistical features comparable to intergenic regions, we suggest that de novo gene creation could contribute to the evolution of the giant pandoravirus genomes.

  14. New Implications on Genomic Adaptation Derived from the Helicobacter pylori Genome Comparison

    PubMed Central

    Lara-Ramírez, Edgar Eduardo; Segura-Cabrera, Aldo; Guo, Xianwu; Yu, Gongxin; García-Pérez, Carlos Armando; Rodríguez-Pérez, Mario A.

    2011-01-01

    Background: Helicobacter pylori has a reduced genome and lives in a tough environment for long-term persistence. It evolved with its particular characteristics for biological adaptation. Because several H. pylori genome sequences are available, comparative analysis could help to better understand genomic adaptation of this particular bacterium. Principal Findings: We analyzed nine H. pylori genomes with emphasis on microevolution from a different perspective. Inversion was an important factor to shape the genome structure. Illegitimate recombination not only led to genomic inversion but also inverted fragment duplication, both of which contributed to the creation of new genes and gene family, and further, homological recombination contributed to events of inversion. Based on the information of genomic rearrangement, the first genome scaffold structure of H. pylori last common ancestor was produced. The core genome consists of 1186 genes, of which 22 genes could particularly adapt to human stomach niche. H. pylori contains high proportion of pseudogenes whose genesis was principally caused by homopolynucleotide (HPN) mutations. Such mutations are reversible and facilitate the control of gene expression through the change of DNA structure. The reversible mutations and a quasi-panmictic feature could allow such genes or gene fragments frequently transferred within or between populations. Hence, pseudogenes could be a reservoir of adaptation materials and the HPN mutations could be favorable to H. pylori adaptation, leading to HPN accumulation on the genomes, which corresponds to a special feature of Helicobacter species: extremely high HPN composition of genome. Conclusion: Our research demonstrated that both genome content and structure of H. pylori have been highly adapted to its particular life style. PMID:21387011

  15. Genomic and Phenomic Study of Mammary Pathogenic Escherichia coli

    PubMed Central

    Blum, Shlomo E.; Heller, Elimelech D.; Sela, Shlomo; Elad, Daniel; Edery, Nir; Leitner, Gabriel

    2015-01-01

    Escherichia coli is a major etiological agent of intra-mammary infections (IMI) in cows, leading to acute mastitis and causing great economic losses in dairy production worldwide. Particular strains cause persistent IMI, leading to recurrent mastitis. Virulence factors of mammary pathogenic E. coli (MPEC) involved in the pathogenesis of mastitis, as well as those differentiating strains causing acute or persistent mastitis, are largely unknown. This study aimed to identify virulence markers in MPEC through whole genome and phenome comparative analysis. MPEC strains causing acute (VL2874 and P4) or persistent (VL2732) mastitis were compared to an environmental strain (K71) and to the genomes of strains representing different E. coli pathotypes. Intra-mammary challenge in mice confirmed experimentally that the strains studied here have different pathogenic potential, and that the environmental strain K71 is non-pathogenic in the mammary gland. Analysis of whole genome sequences and predicted proteomes revealed high similarity among MPEC, whereas MPEC significantly differed from the non-mammary pathogenic strain K71, and from E. coli genomes from other pathotypes. Functional features identified in MPEC genomes and lacking in the non-mammary pathogenic strain were associated with synthesis of lipopolysaccharide and other membrane antigens, ferric-dicitrate iron acquisition and sugars metabolism. Features associated with cytotoxicity or intra-cellular survival were found specifically in the genomes of strains from severe and acute (VL2874) or persistent (VL2732) mastitis, respectively. MPEC genomes were relatively similar to strain K-12, which was subsequently shown here to be possibly pathogenic in the mammary gland. Phenome analysis showed that the persistent MPEC was the most versatile in terms of nutrients metabolized and acute MPEC the least. Among phenotypes unique to MPEC compared to the non-mammary pathogenic strain were uric acid and D-serine metabolism. This study reveals virulence factors and phenotypic characteristics of MPEC that may play a role in pathogenesis of E. coli mastitis. PMID:26327312

  16. Integrating epigenomic data and 3D genomic structure with a new measure of chromatin assortativity.

    PubMed

    Pancaldi, Vera; Carrillo-de-Santa-Pau, Enrique; Javierre, Biola Maria; Juan, David; Fraser, Peter; Spivakov, Mikhail; Valencia, Alfonso; Rico, Daniel

    2016-07-08

    Network analysis is a powerful way of modeling chromatin interactions. Assortativity is a network property used in social sciences to identify factors affecting how people establish social ties. We propose a new approach, using chromatin assortativity, to integrate the epigenomic landscape of a specific cell type with its chromatin interaction network and thus investigate which proteins or chromatin marks mediate genomic contacts. We use high-resolution promoter capture Hi-C and Hi-Cap data as well as ChIA-PET data from mouse embryonic stem cells to investigate promoter-centered chromatin interaction networks and calculate the presence of specific epigenomic features in the chromatin fragments constituting the nodes of the network. We estimate the association of these features with the topology of four chromatin interaction networks and identify features localized in connected areas of the network. Polycomb group proteins and associated histone marks are the features with the highest chromatin assortativity in promoter-centered networks. We then ask which features distinguish contacts amongst promoters from contacts between promoters and other genomic elements. We observe higher chromatin assortativity of the actively elongating form of RNA polymerase 2 (RNAPII) compared with inactive forms only in interactions between promoters and other elements. Contacts among promoters and between promoters and other elements have different characteristic epigenomic features. We identify a possible role for the elongating form of RNAPII in mediating interactions among promoters, enhancers, and transcribed gene bodies. Our approach facilitates the study of multiple genome-wide epigenomic profiles, considering network topology and allowing the comparison of chromatin interaction networks.
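    As an illustration of the assortativity idea described in the abstract above, the following is a minimal sketch of the standard numeric assortativity coefficient: the Pearson correlation of a node attribute across the two ends of every edge. It is not the authors' implementation, and their chromatin assortativity measure may differ in detail. The function name, the edge list, and the attribute values are hypothetical.

```python
def numeric_assortativity(edges, value):
    """Pearson correlation of a numeric node attribute across edge endpoints.

    edges: iterable of (node_u, node_v) pairs for an undirected network.
    value: dict mapping each node to a number (for example, the fraction
           of a chromatin fragment covered by some mark; hypothetical).
    """
    xs, ys = [], []
    for u, v in edges:
        # Count every undirected edge in both directions so that the
        # result does not depend on how each pair happens to be ordered.
        xs += [value[u], value[v]]
        ys += [value[v], value[u]]
    n = len(xs)
    if n == 0:
        raise ValueError("the network has no edges")
    mean = sum(xs) / n  # xs and ys contain the same values, so one mean suffices
    cov = sum((x - mean) * (y - mean) for x, y in zip(xs, ys)) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    if var == 0:
        raise ValueError("the attribute is constant across edge endpoints")
    return cov / var


# Toy example (made-up values): marked fragments contact marked fragments.
toy_values = {"a": 1.0, "b": 1.0, "c": 0.0, "d": 0.0}
print(numeric_assortativity([("a", "b"), ("c", "d")], toy_values))
```

    Values near 1 mean that fragments carrying the feature tend to contact each other, values near -1 mean they tend to contact fragments lacking it, and values near 0 indicate no association with network topology.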

  17. Low Frequency Variants, Collapsed Based on Biological Knowledge, Uncover Complexity of Population Stratification in 1000 Genomes Project Data

    PubMed Central

    Moore, Carrie B.; Wallace, John R.; Wolfe, Daniel J.; Frase, Alex T.; Pendergrass, Sarah A.; Weiss, Kenneth M.; Ritchie, Marylyn D.

    2013-01-01

    Analyses investigating low frequency variants have the potential for explaining additional genetic heritability of many complex human traits. However, the natural frequencies of rare variation between human populations strongly confound genetic analyses. We have applied a novel collapsing method to identify biological features with low frequency variant burden differences in thirteen populations sequenced by the 1000 Genomes Project. Our flexible collapsing tool utilizes expert biological knowledge from multiple publicly available database sources to direct feature selection. Variants were collapsed according to genetically driven features, such as evolutionarily conserved regions, regulatory regions, genes, and pathways. We have conducted an extensive comparison of low frequency variant burden differences (MAF<0.03) between populations from 1000 Genomes Project Phase I data. We found that on average 26.87% of gene bins, 35.47% of intergenic bins, 42.85% of pathway bins, 14.86% of ORegAnno regulatory bins, and 5.97% of evolutionarily conserved regions show statistically significant differences in low frequency variant burden across populations from the 1000 Genomes Project. The proportion of bins with significant differences in low frequency burden depends on the ancestral similarity of the two populations compared and types of features tested. Even closely related populations had notable differences in low frequency burden, but fewer differences than populations from different continents. Furthermore, conserved or functionally relevant regions had fewer significant differences in low frequency burden than regions under less evolutionary constraint. This degree of low frequency variant differentiation across diverse populations and feature elements highlights the critical importance of considering population stratification in the new era of DNA sequencing and low frequency variant genomic analyses. PMID:24385916
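    The collapsing step described in this abstract can be pictured with a short sketch. This is only an assumption-laden illustration, not the authors' tool: bins are simplified to named coordinate intervals, variants to (position, minor allele frequency) pairs, and the function name and toy data are hypothetical. Only the 0.03 MAF cutoff comes from the abstract.

```python
def low_frequency_burden(variants, bins, maf_threshold=0.03):
    """Count low-frequency variants falling inside each feature bin.

    variants: iterable of (position, maf) pairs on one chromosome.
    bins:     iterable of (name, start, end) intervals, inclusive.
    """
    burden = {name: 0 for name, _, _ in bins}
    for position, maf in variants:
        # Keep only variants that are present but rarer than the cutoff.
        if not 0 < maf < maf_threshold:
            continue
        for name, start, end in bins:
            if start <= position <= end:
                burden[name] += 1
    return burden


# Made-up toy data for illustration only.
toy_variants = [(100, 0.01), (150, 0.20), (250, 0.02), (900, 0.01)]
toy_bins = [("geneA", 50, 200), ("geneB", 201, 300)]
print(low_frequency_burden(toy_variants, toy_bins))
```

    Comparing such per-bin counts between two populations, with an appropriate statistical test, is the kind of burden comparison the abstract reports. The test itself is not sketched here.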

  18. The complete genome sequencing of Prevotella intermedia strain OMA14 and a subsequent fine-scale, intra-species genomic comparison reveal an unusual amplification of conjugative and mobile transposons and identify a novel Prevotella-lineage-specific repeat

    PubMed Central

    Naito, Mariko; Ogura, Yoshitoshi; Itoh, Takehiko; Shoji, Mikio; Okamoto, Masaaki; Hayashi, Tetsuya; Nakayama, Koji

    2016-01-01

    Prevotella intermedia is a pathogenic bacterium involved in periodontal diseases. Here, we present the complete genome sequence of a clinical strain, OMA14, of this bacterium along with the results of comparative genome analysis with strain 17 of the same species, whose genome has also been sequenced but not yet fully analysed. The genomes of both strains consist of two circular chromosomes: the larger chromosomes are similar in size and exhibit a high overall linearity of gene organizations, whereas the smaller chromosomes show a significant size variation and have undergone remarkable genome rearrangements. Unique features of the Pre. intermedia genomes are the presence of a remarkable number of essential genes on the second chromosomes and the abundance of conjugative and mobilizable transposons (CTns and MTns). The CTns/MTns are particularly abundant in the second chromosomes, are involved in their extensive genome rearrangement, and have introduced a number of strain-specific genes into each strain. We also found a novel 188-bp repeat sequence that has been highly amplified in Pre. intermedia and is specifically distributed among the Pre. intermedia-related species. These findings expand our understanding of the genetic features of Pre. intermedia and the roles of CTns and MTns in the evolution of bacteria. PMID:26645327

  19. Comparative genomics of the marine bacterial genus Glaciecola reveals the high degree of genomic diversity and genomic characteristic for cold adaptation.

    PubMed

    Qin, Qi-Long; Xie, Bin-Bin; Yu, Yong; Shu, Yan-Li; Rong, Jin-Cheng; Zhang, Yan-Jiao; Zhao, Dian-Li; Chen, Xiu-Lan; Zhang, Xi-Ying; Chen, Bo; Zhou, Bai-Cheng; Zhang, Yu-Zhong

    2014-06-01

    To what extent the genomes of different species belonging to one genus can be diverse and the relationship between genomic differentiation and environmental factors remain unclear for oceanic bacteria. With many new bacterial genera and species being isolated from marine environments, this question warrants attention. In this study, we sequenced all the type strains of the published species of Glaciecola, a recently defined cold-adapted genus with species from diverse marine locations, to study the genomic diversity and cold-adaptation strategy in this genus. The genome size diverged widely from 3.08 to 5.96 Mb, which can be explained by massive gene gain and loss events. Horizontal gene transfer and new gene emergence contributed substantially to the genome size expansion. The genus Glaciecola had an open pan-genome. Comparative genomic research indicated that species of the genus Glaciecola had high diversity in genome size, gene content and genetic relatedness. This may be prevalent in marine bacterial genera considering the dynamic and complex environments of the ocean. Species of Glaciecola had some common genomic features related to cold adaptation, which enable them to thrive and play a role in biogeochemical cycles in cold marine environments.

  20. Comparative Genomics in Drosophila.

    PubMed

    Oti, Martin; Pane, Attilio; Sammeth, Michael

    2018-01-01

    Since the pioneering studies of Thomas Hunt Morgan and coworkers at the dawn of the twentieth century, Drosophila melanogaster and its sister species have contributed tremendously to unveiling the rules underlying animal genetics, development, behavior, evolution, and human disease. Recent advances in DNA sequencing technologies launched Drosophila into the post-genomic era and paved the way for unprecedented comparative genomics investigations. The complete sequencing and systematic comparison of the genomes from 12 Drosophila species represents a milestone achievement in modern biology, which allowed a plethora of different studies ranging from the annotation of known and novel genomic features to the evolution of chromosomes and, ultimately, of entire genomes. Despite the efforts of countless laboratories worldwide, the vast amount of data that were produced over the past 15 years is far from being fully explored. In this chapter, we will review some of the bioinformatic approaches that were developed to interrogate the genomes of the 12 Drosophila species. Setting off from alignments of the entire genomic sequences, the degree of conservation can be separately evaluated for every region of the genome, providing first hints about elements that are under purifying selection and therefore likely functional. Furthermore, the careful analysis of repeated sequences sheds light on the evolutionary dynamics of transposons, an enigmatic and fascinating class of mobile elements housed in the genomes of animals and plants. Comparative genomics also aids in the computational identification of the transcriptionally active part of the genome, first and foremost of protein-coding loci, but also of transcribed yet apparently noncoding regions, which were once considered "junk" DNA. Finally, the synergy between functional and comparative genomics also facilitates in silico and in vivo studies on cis-acting regulatory elements, like transcription factor binding sites, which, due to their high degree of sequence variability, usually pose greater challenges for bioinformatics approaches.

  1. Extensive Mobilome-Driven Genome Diversification in Mouse Gut-Associated Bacteroides vulgatus mpk

    PubMed Central

    Lange, Anna; Beier, Sina; Steimle, Alex; Autenrieth, Ingo B.; Huson, Daniel H.; Frick, Julia-Stefanie

    2016-01-01

    Like many other Bacteroides species, Bacteroides vulgatus strain mpk, a mouse fecal isolate which was shown to promote intestinal homeostasis, utilizes a variety of mobile elements for genome evolution. Based on sequences collected by Pacific Biosciences SMRT sequencing technology, we discuss the challenges of assembling and studying a bacterial genome of high plasticity. Additionally, we conducted comparative genomics comparing this commensal strain with the B. vulgatus type strain ATCC 8482 as well as multiple other Bacteroides and Parabacteroides strains to reveal the most important differences and identify the unique features of B. vulgatus mpk. The genome of B. vulgatus mpk harbors a large and diverse set of mobile element proteins compared with other sequenced Bacteroides strains. We found evidence of a number of different horizontal gene transfer events and a genome landscape that has been extensively altered by different mobilization events. A CRISPR/Cas system could be identified that provides a possible mechanism for preventing the integration of invading external DNA. We propose that the high genome plasticity and the introduced genome instabilities of B. vulgatus mpk arising from the various mobilization events might play an important role not only in its adaptation to the challenging intestinal environment in general, but also in its ability to interact with the gut microbiota. PMID:27071651

  2. Global gene expression profiles of Phytophthora ramorum strain pr102 in response to plant host and tissue differentiation

    Treesearch

    Caroline M. Press; Niklaus J. Grunwald

    2008-01-01

    The release of the draft genome sequence of P. ramorum strain Pr102 enabled the construction of an oligonucleotide microarray of the entire genome of Pr102. The array contains 344,680 features (oligos) that represent the transcriptome of Pr102. P. ramorum RNA was extracted from mycelium and sporangia and used to compare gene...

  3. Genome sequencing and analysis of a highly virulent Vibrio parahaemolyticus strain isolated from the marine environment

    NASA Astrophysics Data System (ADS)

    Parks, M. C.; Moreno, E.

    2016-02-01

    Vibrio parahaemolyticus [Vp] is a Gram-negative bacterium and a natural inhabitant of coastal marine ecosystems worldwide. Vp is also a coincidental pathogen of humans. Virulent strains are commonly identified by the presence of the thermostable direct (tdh) or tdh-related (trh) hemolysin genes. However, virulence is multifaceted and many clinical Vp isolates do not carry tdh or trh. In this study, we sequenced and assembled the draft genome of a tdh- and trh-negative environmental isolate (805) shown previously to be highly virulent in zebrafish. To investigate potential mechanisms of virulence, we compared 805 to the clinical V. parahaemolyticus type strain (RIMD2210633). Pairwise comparison revealed the presence of multiple genomic regions including an IncF conjugative pilus (1.3 Kb) and a colicin V plasmid (1.49 Kb). These features are homologous to genomic regions present in clinical V. vulnificus and V. cholerae strains. Genome comparison also revealed the presence of five toxin-antitoxin systems. Isolate 805 likely attained these new features through the lateral acquisition of mobile genomic material - a hypothesis supported by the aberrant GC content of these regions. Colicin V plasmids are a diverse group of IncF plasmids found in invasive bacterial strains. Similarly, an abundance of toxin-antitoxin systems have been linked to virulence in Gram-negative bacteria. Current efforts are focused on characterizing 142 coding features present in 805 but absent from the type strain.
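    The GC-content argument in this abstract (regions acquired laterally often differ in GC content from the rest of the genome) can be illustrated with a small sliding-window sketch. This is a generic illustration under stated assumptions, not the authors' pipeline: the function names, window size, and toy sequence are hypothetical.

```python
def gc_fraction(seq):
    """Fraction of G and C among the unambiguous bases (A, C, G, T) in seq."""
    seq = seq.upper()
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        raise ValueError("sequence contains no A, C, G or T bases")
    return (seq.count("G") + seq.count("C")) / counted


def windowed_gc(seq, window, step):
    """Return (start, GC fraction) for each full window along the sequence."""
    return [
        (start, gc_fraction(seq[start:start + window]))
        for start in range(0, len(seq) - window + 1, step)
    ]


# Toy sequence (made up): an AT-rich stretch followed by a GC-rich stretch.
print(windowed_gc("AAAAGGGG", window=4, step=4))
```

    In practice, windows whose GC fraction departs strongly from the genome-wide value would be the candidates for laterally acquired material. The threshold for "aberrant" is a judgment call that this sketch does not make.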

  4. Comparative genomics reveals high biological diversity and specific adaptations in the industrially and medically important fungal genus Aspergillus.

    PubMed

    de Vries, Ronald P; Riley, Robert; Wiebenga, Ad; Aguilar-Osorio, Guillermo; Amillis, Sotiris; Uchima, Cristiane Akemi; Anderluh, Gregor; Asadollahi, Mojtaba; Askin, Marion; Barry, Kerrie; Battaglia, Evy; Bayram, Özgür; Benocci, Tiziano; Braus-Stromeyer, Susanna A; Caldana, Camila; Cánovas, David; Cerqueira, Gustavo C; Chen, Fusheng; Chen, Wanping; Choi, Cindy; Clum, Alicia; Dos Santos, Renato Augusto Corrêa; Damásio, André Ricardo de Lima; Diallinas, George; Emri, Tamás; Fekete, Erzsébet; Flipphi, Michel; Freyberg, Susanne; Gallo, Antonia; Gournas, Christos; Habgood, Rob; Hainaut, Matthieu; Harispe, María Laura; Henrissat, Bernard; Hildén, Kristiina S; Hope, Ryan; Hossain, Abeer; Karabika, Eugenia; Karaffa, Levente; Karányi, Zsolt; Kraševec, Nada; Kuo, Alan; Kusch, Harald; LaButti, Kurt; Lagendijk, Ellen L; Lapidus, Alla; Levasseur, Anthony; Lindquist, Erika; Lipzen, Anna; Logrieco, Antonio F; MacCabe, Andrew; Mäkelä, Miia R; Malavazi, Iran; Melin, Petter; Meyer, Vera; Mielnichuk, Natalia; Miskei, Márton; Molnár, Ákos P; Mulé, Giuseppina; Ngan, Chew Yee; Orejas, Margarita; Orosz, Erzsébet; Ouedraogo, Jean Paul; Overkamp, Karin M; Park, Hee-Soo; Perrone, Giancarlo; Piumi, Francois; Punt, Peter J; Ram, Arthur F J; Ramón, Ana; Rauscher, Stefan; Record, Eric; Riaño-Pachón, Diego Mauricio; Robert, Vincent; Röhrig, Julian; Ruller, Roberto; Salamov, Asaf; Salih, Nadhira S; Samson, Rob A; Sándor, Erzsébet; Sanguinetti, Manuel; Schütze, Tabea; Sepčić, Kristina; Shelest, Ekaterina; Sherlock, Gavin; Sophianopoulou, Vicky; Squina, Fabio M; Sun, Hui; Susca, Antonia; Todd, Richard B; Tsang, Adrian; Unkles, Shiela E; van de Wiele, Nathalie; van Rossen-Uffink, Diana; Oliveira, Juliana Velasco de Castro; Vesth, Tammi C; Visser, Jaap; Yu, Jae-Hyuk; Zhou, Miaomiao; Andersen, Mikael R; Archer, David B; Baker, Scott E; Benoit, Isabelle; Brakhage, Axel A; Braus, Gerhard H; Fischer, Reinhard; Frisvad, Jens C; Goldman, Gustavo H; Houbraken, Jos; Oakley, Berl; Pócsi, István; Scazzocchio, Claudio; Seiboth, Bernhard; vanKuyk, Patricia A; Wortman, Jennifer; Dyer, Paul S; Grigoriev, Igor V

    2017-02-14

    The fungal genus Aspergillus is of critical importance to humankind. Species include those with industrial applications, important pathogens of humans, animals and crops, a source of potent carcinogenic contaminants of food, and an important genetic model. The genome sequences of eight aspergilli have already been explored to investigate aspects of fungal biology, raising questions about evolution and specialization within this genus. We have generated genome sequences for ten novel, highly diverse Aspergillus species and compared these in detail to sister and more distant genera. Comparative studies of key aspects of fungal biology, including primary and secondary metabolism, stress response, biomass degradation, and signal transduction, revealed both conservation and diversity among the species. Observed genomic differences were validated with experimental studies. This revealed several highlights, such as the potential for sex in asexual species, organic acid production genes being a key feature of black aspergilli, alternative approaches for degrading plant biomass, and indications for the genetic basis of stress response. A genome-wide phylogenetic analysis demonstrated in detail the relationship of the newly genome sequenced species with other aspergilli. Many aspects of biological differences between fungal species cannot be explained by current knowledge obtained from genome sequences. The comparative genomics and experimental study, presented here, allows for the first time a genus-wide view of the biological diversity of the aspergilli and in many, but not all, cases linked genome differences to phenotype. Insights gained could be exploited for biotechnological and medical applications of fungi.

  5. Insights into Conifer Giga-Genomes

    PubMed Central

    De La Torre, Amanda R.; Birol, Inanc; Bousquet, Jean; Ingvarsson, Pär K.; Jansson, Stefan; Jones, Steven J.M.; Keeling, Christopher I.; MacKay, John; Nilsson, Ove; Ritland, Kermit; Street, Nathaniel; Yanchuk, Alvin; Zerbe, Philipp; Bohlmann, Jörg

    2014-01-01

    Insights from sequenced genomes of major land plant lineages have advanced research in almost every aspect of plant biology. Until recently, however, assembled genome sequences of gymnosperms have been missing from this picture. Conifers of the pine family (Pinaceae) are a group of gymnosperms that dominate large parts of the world’s forests. Despite their ecological and economic importance, conifers seemed long out of reach for complete genome sequencing, due in part to their enormous genome size (20–30 Gb) and the highly repetitive nature of their genomes. Technological advances in genome sequencing and assembly enabled the recent publication of three conifer genomes: white spruce (Picea glauca), Norway spruce (Picea abies), and loblolly pine (Pinus taeda). These genome sequences revealed distinctive features compared with other plant genomes and may represent a window into the past of seed plant genomes. This Update highlights recent advances, remaining challenges, and opportunities in light of the publication of the first conifer and gymnosperm genomes. PMID:25349325

  6. Insights into conifer giga-genomes.

    PubMed

    De La Torre, Amanda R; Birol, Inanc; Bousquet, Jean; Ingvarsson, Pär K; Jansson, Stefan; Jones, Steven J M; Keeling, Christopher I; MacKay, John; Nilsson, Ove; Ritland, Kermit; Street, Nathaniel; Yanchuk, Alvin; Zerbe, Philipp; Bohlmann, Jörg

    2014-12-01

    Insights from sequenced genomes of major land plant lineages have advanced research in almost every aspect of plant biology. Until recently, however, assembled genome sequences of gymnosperms have been missing from this picture. Conifers of the pine family (Pinaceae) are a group of gymnosperms that dominate large parts of the world's forests. Despite their ecological and economic importance, conifers seemed long out of reach for complete genome sequencing, due in part to their enormous genome size (20-30 Gb) and the highly repetitive nature of their genomes. Technological advances in genome sequencing and assembly enabled the recent publication of three conifer genomes: white spruce (Picea glauca), Norway spruce (Picea abies), and loblolly pine (Pinus taeda). These genome sequences revealed distinctive features compared with other plant genomes and may represent a window into the past of seed plant genomes. This Update highlights recent advances, remaining challenges, and opportunities in light of the publication of the first conifer and gymnosperm genomes. © 2014 American Society of Plant Biologists. All Rights Reserved.

  7. The genome sequence of Dyella jiangningensis FCAV SCS01 from a lignocellulose-decomposing microbial consortium metagenome reveals potential for biotechnological applications.

    PubMed

    Desiderato, Joana G; Alvarenga, Danillo O; Constancio, Milena T L; Alves, Lucia M C; Varani, Alessandro M

    2018-05-14

    Cellulose and its associated polymers are structural components of the plant cell wall, constituting one of the major sources of carbon and energy in nature. The carbon cycle is dependent on cellulose- and lignin-decomposing microbial communities and their enzymatic systems acting as consortia. These microbial consortia are under constant exploration for their potential biotechnological use. Herein, we describe the characterization of the genome of Dyella jiangningensis FCAV SCS01, recovered from the metagenome of a lignocellulose-degrading microbial consortium, which was isolated from a sugarcane crop soil under mechanical harvesting and covered by decomposing straw. The 4.7 Mbp genome encodes 4,194 proteins, including 36 glycoside hydrolases (GH), supporting the hypothesis that this bacterium may contribute to lignocellulose decomposition. Comparative analysis among fully sequenced Dyella species indicates that genome synteny is not conserved, and that D. jiangningensis FCAV SCS01 carries 372 unique genes, including genes coding for an alpha-glucosidase and a maltodextrin glucosidase, and other genes potentially related to biomass degradation. Additional genomic features, such as prophage-like regions, genomic islands and putative new biosynthetic clusters, were also uncovered. Overall, D. jiangningensis FCAV SCS01 represents the first South American Dyella genome sequenced and shows a feature related to biomass degradation that is exclusive within its genus.

  8. Analysis and Prediction of Exon Skipping Events from RNA-Seq with Sequence Information Using Rotation Forest.

    PubMed

    Du, Xiuquan; Hu, Changlin; Yao, Yu; Sun, Shiwei; Zhang, Yanping

    2017-12-12

    In bioinformatics, exon skipping (ES) event prediction is an essential part of alternative splicing (AS) event analysis. Although many methods have been developed to predict ES events, a solution has yet to be found. In this study, given the limitations of machine learning algorithms with RNA-Seq data or genome sequences, a new feature, called RS (RNA-seq and sequence) features, was constructed. These features include RNA-Seq features derived from the RNA-Seq data and sequence features derived from genome sequences. We propose a novel Rotation Forest classifier to predict ES events with the RS features (RotaF-RSES). To validate the efficacy of RotaF-RSES, a dataset from two human tissues was used, and RotaF-RSES achieved an accuracy of 98.4%, a specificity of 99.2%, a sensitivity of 94.1%, and an area under the curve (AUC) of 98.6%. When compared to the other available methods, the results indicate that RotaF-RSES is efficient and can predict ES events with RS features.
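    For readers unfamiliar with the evaluation figures quoted above, the following minimal sketch shows how accuracy, sensitivity, and specificity are derived from the four cells of a binary confusion matrix. The function name and the counts in the example are made up for illustration and are not taken from the study. AUC is omitted because it requires ranked prediction scores rather than counts.

```python
def classification_metrics(tp, fp, tn, fn):
    """Accuracy, sensitivity, and specificity from confusion-matrix counts.

    tp/fn: true ES events predicted as ES / missed.
    tn/fp: non-ES events predicted as non-ES / wrongly predicted as ES.
    """
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),  # also called recall or true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
    }


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
print(classification_metrics(tp=8, fp=1, tn=9, fn=2))
```

    When negatives greatly outnumber positives, accuracy is dominated by specificity, which is one reason a high overall accuracy can coexist with a lower sensitivity.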

  9. Draft genome sequence of chloride-tolerant Leptospirillum ferriphilum Sp-Cl from industrial bioleaching operations in northern Chile.

    PubMed

    Issotta, Francisco; Galleguillos, Pedro A; Moya-Beltrán, Ana; Davis-Belmar, Carol S; Rautenbach, George; Covarrubias, Paulo C; Acosta, Mauricio; Ossandon, Francisco J; Contador, Yasna; Holmes, David S; Marín-Eliantonio, Sabrina; Quatrini, Raquel; Demergasso, Cecilia

    2016-01-01

    Leptospirillum ferriphilum Sp-Cl is a Gram negative, thermotolerant, curved, rod-shaped bacterium, isolated from an industrial bioleaching operation in northern Chile, where chalcocite is the major copper mineral and copper hydroxychloride atacamite is present in variable proportions in the ore. This strain has unique features as compared to the other members of the species, namely resistance to elevated concentrations of chloride, sulfate and metals. Basic microbiological features and genomic properties of this biotechnologically relevant strain are described in this work. The 2,475,669 bp draft genome is arranged into 74 scaffolds of 74 contigs. A total of 48 RNA genes and 2,834 protein coding genes were predicted from its annotation; 55 % of these were assigned a putative function. Release of the genome sequence of this strain will provide further understanding of the mechanisms used by acidophilic bacteria to endure high osmotic stress and high chloride levels and of the role of chloride-tolerant iron-oxidizers in industrial bioleaching operations.

  10. Hemipteran Mitochondrial Genomes: Features, Structures and Implications for Phylogeny

    PubMed Central

    Wang, Yuan; Chen, Jing; Jiang, Li-Yun; Qiao, Ge-Xia

    2015-01-01

    The study of Hemipteran mitochondrial genomes (mitogenomes) began with the Chagas disease vector, Triatoma dimidiata, in 2001. At present, 90 complete Hemipteran mitogenomes have been sequenced and annotated. This review examines the history of Hemipteran mitogenome research and summarizes their main features, including genome organization, nucleotide composition, protein-coding genes, tRNAs and rRNAs, and non-coding regions. Special attention is given to the comparative analysis of repeat regions. Gene rearrangements are an additional data type for a few families, and most mitogenomes are arranged in the same order as that of the proposed ancestral insect. We also discuss and provide insights on the phylogenetic analyses of a variety of taxonomic levels. This review is expected to further expand our understanding of research in this field and serve as a valuable reference resource. PMID:26039239

  11. Short interspersed element (SINE) depletion and long interspersed element (LINE) abundance are not features universally required for imprinting.

    PubMed

    Cowley, Michael; de Burca, Anna; McCole, Ruth B; Chahal, Mandeep; Saadat, Ghazal; Oakey, Rebecca J; Schulz, Reiner

    2011-04-20

    Genomic imprinting is a form of gene dosage regulation in which a gene is expressed from only one of the alleles, in a manner dependent on the parent of origin. The mechanisms governing imprinted gene expression have been investigated in detail and have greatly contributed to our understanding of genome regulation in general. Both DNA sequence features, such as CpG islands, and epigenetic features, such as DNA methylation and non-coding RNAs, play important roles in achieving imprinted expression. However, the relative importance of these factors varies depending on the locus in question. Defining the minimal features that are absolutely required for imprinting would help us to understand how imprinting has evolved mechanistically. Imprinted retrogenes are a subset of imprinted loci that are relatively simple in their genomic organisation, being distinct from large imprinting clusters, and have the potential to be used as tools to address this question. Here, we compare the repeat element content of imprinted retrogene loci with non-imprinted controls that have a similar locus organisation. We observe no significant differences that are conserved between mouse and human, suggesting that the paucity of SINEs and relative abundance of LINEs at imprinted loci reported by others is not a sequence feature universally required for imprinting.

  12. Tracing phylogenomic events leading to diversity of Haemophilus influenzae and the emergence of Brazilian Purpuric Fever (BPF)-associated clones.

    PubMed

    Papazisi, Leka; Ratnayake, Shashikala; Remortel, Brian G; Bock, Geoffrey R; Liang, Wei; Saeed, Alexander I; Liu, Jia; Fleischmann, Robert D; Kilian, Mogens; Peterson, Scott N

    2010-11-01

    Here we report the use of a multi-genome DNA microarray to elucidate the genomic events associated with the emergence of the clonal variants of Haemophilus influenzae biogroup aegyptius causing Brazilian Purpuric Fever (BPF), an important pediatric disease with a high mortality rate. We performed directed genome sequencing of strain HK1212 unique loci to construct a species DNA microarray. Comparative genome hybridization using this microarray enabled us to determine and compare gene complements, and infer reliable phylogenomic relationships among members of the species. The higher genomic variability observed in the genomes of BPF-related strains (clones) and their close relatives may be characterized by significant gene flux related to a subset of functional role categories. We found that the acquisition of a large number of virulence determinants featuring numerous cell membrane proteins coupled to the loss of genes involved in transport, central biosynthetic pathways and in particular, energy production pathways to be characteristics of the BPF genomic variants. Copyright © 2010 Elsevier Inc. All rights reserved.

  13. Comparative whole genome analysis of six diagnostic brucellaphages.

    PubMed

    Farlow, Jason; Filippov, Andrey A; Sergueev, Kirill V; Hang, Jun; Kotorashvili, Adam; Nikolich, Mikeljon P

    2014-05-15

    Whole genome sequencing of six diagnostic brucellaphages, Tbilisi (Tb), Firenze (Fz), Weybridge (Wb), S708, Berkeley (Bk) and R/C, was followed with genomic comparisons including recently described genomes of the Tb phage from Mexico (TbM) and Pr phage to elucidate genomic diversity and candidate host range determinants. Comparative whole genome analysis revealed high sequence homogeneity among these brucellaphage genomes and resolved three genetic groups consistent with defined host range phenotypes. Group I was composed of Tb and Fz phages that are predominantly lytic for Brucella abortus and Brucella neotomae; Group II included Bk, R/C, and Pr phages that are lytic mainly for B. abortus, Brucella melitensis and Brucella suis; Group III was composed of Wb and S708 phages that are lytic for B. suis, B. abortus and B. neotomae. We found that the putative phage collar protein is a variable locus with features that may be contributing to the host specificities exhibited by different brucellaphage groups. The presence of several candidate host range determinants is illustrated herein for future dissection of the differential host specificity observed among these phages. Published by Elsevier B.V.

  14. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Challacombe, Jean F.; Stubben, Chris J.; Klimko, Christopher P.

    Infection by the Gram-negative pathogen Burkholderia pseudomallei results in the disease melioidosis, acquired from the environment in parts of southeast Asia and northern Australia. Clinical symptoms of melioidosis range from acute (fever, pneumonia, septicemia, and localized infection) to chronic (abscesses in various organs and tissues, most commonly occurring in the lungs, liver, spleen, kidney, prostate and skeletal muscle), and persistent infections in humans are difficult to cure. Understanding the basic biology and genomics of B. pseudomallei is imperative for the development of new vaccines and therapeutic interventions. This formidable task is becoming more tractable due to the increasing number of B. pseudomallei genomes that are being sequenced and compared. Here, we compared three B. pseudomallei genomes, from strains MSHR668, K96243 and 1106a, to identify features that might explain why MSHR668 is more virulent than K96243 and 1106a in a mouse model of B. pseudomallei infection. Our analyses focused on metabolic, virulence and regulatory genes that were present in MSHR668 but absent from both K96243 and 1106a. We also noted features present in K96243 and 1106a but absent from MSHR668, and identified genomic differences that may contribute to variations in virulence noted among the three B. pseudomallei isolates. While this work contributes to our understanding of B. pseudomallei genomics, more detailed experiments are necessary to characterize the relevance of specific genomic features to B. pseudomallei metabolism and virulence. Functional analyses of metabolic networks, virulence and regulation show promise for examining the effects of B. pseudomallei on host cell metabolism and will lay a foundation for future prediction of the virulence of emerging strains. Continued emphasis in this area will be critical for protection against melioidosis, as a better understanding of what constitutes a fully virulent Burkholderia isolate may provide for better diagnostic and medical countermeasure strategies.

  15. Complete mitochondrial DNA sequence of oyster Crassostrea hongkongensis-a case of "Tandem duplication-random loss" for genome rearrangement in Crassostrea?

    PubMed Central

    Yu, Ziniu; Wei, Zhengpeng; Kong, Xiaoyu; Shi, Wei

    2008-01-01

    Background Mitochondrial DNA sequences are extensively used as genetic markers not only for studies of population or ecological genetics, but also for phylogenetic and evolutionary analyses. Complete mt-sequences can reveal information about gene order and its variation, as well as gene and genome evolution when sequences from multiple phyla are compared. Mitochondrial gene order is highly variable among mollusks, with bivalves exhibiting the most variability. Of the 41 complete mt genomes sequenced so far, 12 are from bivalves. We determined, in the current study, the complete mitochondrial DNA sequence of Crassostrea hongkongensis. We present here an analysis of features of its gene content and genome organization in comparison with two other Crassostrea species to assess the variation within bivalves and among main groups of mollusks. Results The complete mitochondrial genome of C. hongkongensis was determined using long PCR and a primer walking sequencing strategy with genus-specific primers. The genome is 16,475 bp in length and contains 12 protein-coding genes (the atp8 gene is missing, as in most bivalves), 22 transfer RNA genes (including a suppressor tRNA gene), and 2 ribosomal RNA genes, all of which appear to be transcribed from the same strand. A striking finding of this study is that a DNA segment containing four tRNA genes (trnK1, trnC, trnQ1 and trnN) and two duplicated or split rRNA genes (rrnL5' and rrnS) is absent from the genome when compared with those of two other extant Crassostrea species, which is very likely a consequence of the loss of a single genomic region present in the ancestor of C. hongkongensis. This indicates that the region seems to be a "hot spot" of genomic rearrangements over the Crassostrea mt-genomes. The arrangement of protein-coding genes in C. hongkongensis is identical to that of Crassostrea gigas and Crassostrea virginica, but higher amino acid sequence identities are shared between C. hongkongensis and C. gigas than between other pairs. There exists significant codon bias, favoring codons ending in A or T and against those ending with C. Pair analysis of genome rearrangements showed that the rearrangement distance is great between C. gigas-C. hongkongensis and C. virginica, indicating a high degree of rearrangements within Crassostrea. The determination of the complete mt-genome of C. hongkongensis has yielded useful insight into features of gene order, variation, and evolution of Crassostrea and bivalve mt-genomes. Conclusion The mt-genome of C. hongkongensis shares some similarity with, and interesting differences to, other Crassostrea species and bivalves. The absence of trnC and trnN genes and duplicated or split rRNA genes from the C. hongkongensis genome is a completely novel feature not previously reported in Crassostrea species. The phenomenon is likely due to the loss of a segment that is present in other Crassostrea species and was present in the ancestor of C. hongkongensis, thus a case of "tandem duplication-random loss (TDRL)". The mt-genome and new feature presented here reveal and underline the high level of variation of gene order and gene content in Crassostrea and bivalves, inspiring more research to gain understanding of the mechanisms underlying gene and genome evolution in bivalves and mollusks. PMID:18847502

  16. The complete mitochondrial genome of the stomatopod crustacean Squilla mantis

    PubMed Central

    Cook, Charles E

    2005-01-01

    Background Animal mitochondrial genomes are physically separate from the much larger nuclear genomes and have proven useful both for phylogenetic studies and for understanding genome evolution. Within the phylum Arthropoda the subphylum Crustacea includes over 50,000 named species with immense variation in body plans and habitats, yet only 23 complete mitochondrial genomes are available from this subphylum. Results I describe here the complete mitochondrial genome of the crustacean Squilla mantis (Crustacea: Malacostraca: Stomatopoda). This 15994-nucleotide genome, the first described from a hoplocarid, contains the standard complement of 13 protein-coding genes, 22 transfer RNA genes, two ribosomal RNA genes, and a non-coding AT-rich region that is found in most other metazoans. The gene order is identical to that considered ancestral for hexapods and crustaceans. The 70% AT base composition is within the range described for other arthropods. A single unusual feature of the genome is a 230 nucleotide non-coding region between a serine transfer RNA and the nad1 gene, which has no apparent function. I also compare gene order, nucleotide composition, and codon usage of the S. mantis genome and eight other malacostracan crustaceans. A translocation of the histidine transfer RNA gene is shared by three taxa in the order Decapoda, infraorder Brachyura; Callinectes sapidus, Portunus trituberculatus and Pseudocarcinus gigas. This translocation may be diagnostic for the Brachyura. For all nine taxa nucleotide composition is biased towards AT-richness, as expected for arthropods, and is within the range reported for other arthropods. Codon usage is biased, and much of this bias is probably due to the skew in nucleotide composition towards AT-richness. Conclusion The mitochondrial genome of Squilla mantis contains one unusual feature, a 230 base pair non-coding region that has so far not been described in any other malacostracan. Comparisons with other Malacostraca show that all nine genomes, like most other mitochondrial genomes, share a bias toward AT-richness and a related bias in codon usage. The nine malacostracans included in this analysis are not representative of the diversity of the class Malacostraca, and additional malacostracan sequences would surely reveal other unusual genomic features that could be useful in understanding mitochondrial evolution in this taxon. PMID:16091132

  17. Whole-genome phylogeny of Escherichia coli/Shigella group by feature frequency profiles (FFPs)

    PubMed Central

    Sims, Gregory E.; Kim, Sung-Hou

    2011-01-01

    A whole-genome phylogeny of the Escherichia coli/Shigella group was constructed by using the feature frequency profile (FFP) method. This alignment-free approach uses the frequencies of l-mer features of whole genomes to infer phylogenic distances. We present two phylogenies that accentuate different aspects of E. coli/Shigella genomic evolution: (i) one based on the compositions of all possible features of length l = 24 (∼8.4 million features), which are likely to reveal the phenetic grouping and relationship among the organisms and (ii) the other based on the compositions of core features with low frequency and low variability (∼0.56 million features), which account for ∼69% of all commonly shared features among 38 taxa examined and are likely to have genome-wide lineal evolutionary signal. Shigella appears as a single clade when all possible features are used without filtering of noncore features. However, results using core features show that Shigella consists of at least two distantly related subclades, implying that the subclades evolved into a single clade because of a high degree of convergence influenced by mobile genetic elements and niche adaptation. In both FFP trees, the basal group of the E. coli/Shigella phylogeny is the B2 phylogroup, which contains primarily uropathogenic strains, suggesting that the E. coli/Shigella ancestor was likely a facultative or opportunistic pathogen. The extant commensal strains diverged relatively late and appear to be the result of reductive evolution of genomes. We also identify clade distinguishing features and their associated genomic regions within each phylogroup. Such features may provide useful information for understanding evolution of the groups and for quick diagnostic identification of each phylogroup. PMID:21536867
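
    The alignment-free idea described in this record can be sketched in a few lines. The sketch below counts overlapping l-mers to build a frequency profile and compares two profiles with the Jensen-Shannon divergence. The function names (ffp, js_divergence) are hypothetical, the choice of divergence is an assumption not stated in the abstract, and steps such as core-feature filtering are omitted, so this is an illustration of the general technique rather than the authors' implementation.

```python
from collections import Counter
from math import log2


def ffp(sequence, l):
    """Relative frequencies of all overlapping l-mers in a sequence."""
    seq = sequence.upper()
    counts = Counter(seq[i:i + l] for i in range(len(seq) - l + 1))
    total = sum(counts.values())
    return {kmer: n / total for kmer, n in counts.items()}


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two frequency profiles.

    Assumption: used here as the profile distance; it is 0 for identical
    profiles and 1 for profiles that share no features.
    """
    div = 0.0
    for kmer in set(p) | set(q):
        pi, qi = p.get(kmer, 0.0), q.get(kmer, 0.0)
        mi = (pi + qi) / 2  # midpoint distribution
        if pi > 0:
            div += 0.5 * pi * log2(pi / mi)
        if qi > 0:
            div += 0.5 * qi * log2(qi / mi)
    return div


if __name__ == "__main__":
    a = ffp("ACGTACGTACGT", 3)
    b = ffp("ACGTTCGTACGA", 3)
    print(js_divergence(a, b))
```

    A matrix of such pairwise divergences could then be passed to a standard tree-building method such as neighbor joining. With l = 24, as in the abstract, the profiles are very sparse, which is why a dictionary is used here rather than a dense vector.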

  18. Genomic analysis of WCP30 Phage of Weissella cibaria for Dairy Fermented Foods.

    PubMed

    Lee, Young-Duck; Park, Jong-Hyun

    2017-01-01

    In this study, we report the morphogenetic analysis and genome sequence of a new WCP30 phage of Weissella cibaria, isolated from a fermented food. Based on its morphology, as observed by transmission electron microscopy, WCP30 phage belongs to the family Siphoviridae. Genomic analysis of WCP30 phage showed that it had a 33,697-bp double-stranded DNA genome with 41.2% G+C content. Bioinformatics analysis of the genome revealed 35 open reading frames. A BLASTN search showed that WCP30 phage had low sequence similarity compared to other phages infecting lactic acid bacteria. This is the first report of the morphological features and complete genome sequence of WCP30 phage, which may be useful for controlling the fermentation of dairy foods.
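
    For readers unfamiliar with the statistic, G+C content is the percentage of bases in a sequence that are G or C; a value of 41.2% for a 33,697-bp genome corresponds to roughly 13,900 G or C bases. A minimal sketch follows. The function name gc_content is hypothetical, and ignoring ambiguous bases such as N is an assumption of this sketch, not something the abstract specifies.

```python
def gc_content(sequence):
    """Percentage of G and C among the unambiguous bases (A, C, G, T)."""
    seq = sequence.upper()
    gc = sum(1 for base in seq if base in "GC")
    acgt = sum(1 for base in seq if base in "ACGT")
    if acgt == 0:
        raise ValueError("sequence contains no A, C, G or T bases")
    return 100.0 * gc / acgt


if __name__ == "__main__":
    print(gc_content("ATGCGCTTAA"))
```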

  19. StereoGene: rapid estimation of genome-wide correlation of continuous or interval feature data.

    PubMed

    Stavrovskaya, Elena D; Niranjan, Tejasvi; Fertig, Elana J; Wheelan, Sarah J; Favorov, Alexander V; Mironov, Andrey A

    2017-10-15

    Genomic features with similar genome-wide distributions are generally hypothesized to be functionally related; for example, colocalization of histones and transcription start sites indicates chromatin regulation of transcription factor activity. Therefore, statistical algorithms to perform spatial, genome-wide correlation among genomic features are required. Here, we propose a method, StereoGene, that rapidly estimates genome-wide correlation among pairs of genomic features. These features may represent high-throughput data mapped to a reference genome or sets of genomic annotations in that reference genome. StereoGene enables correlation of continuous data directly, avoiding data binarization and the subsequent data loss. Correlations are computed among neighboring genomic positions using kernel correlation. Representing the correlation as a function of the genome position, StereoGene outputs the local correlation track as part of the analysis. StereoGene also accounts for confounders such as input DNA by partial correlation. We apply our method to numerous comparisons of ChIP-Seq datasets from the Human Epigenome Atlas and FANTOM CAGE to demonstrate its wide applicability. We observe changes in the correlation between epigenomic features across developmental trajectories of several tissue types that are consistent with known biology, and find a novel spatial correlation of CAGE clusters with donor splice sites and with poly(A) sites. These analyses provide examples for the broad applicability of StereoGene for regulatory genomics. The StereoGene C++ source code, program documentation, Galaxy integration scripts and examples are available from the project homepage http://stereogene.bioinf.fbb.msu.ru/. Contact: favorov@sensi.org. Supplementary data are available at Bioinformatics online. © The Author 2017. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com
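
    The notion of correlating neighboring positions through a kernel can be illustrated with a toy example. The sketch below smooths one track with a Gaussian kernel and then takes the Pearson correlation with the other track, so that signals a few positions apart still count as related. All names (gaussian_kernel, smooth, pearson, kernel_correlation) and the parameters sigma and radius are hypothetical; this is a simplified illustration under those assumptions, not StereoGene's algorithm or API.

```python
from math import exp, sqrt


def gaussian_kernel(sigma, radius):
    """Normalized Gaussian weights for offsets -radius..radius."""
    weights = [exp(-0.5 * (d / sigma) ** 2) for d in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def smooth(track, kernel):
    """Weighted sum of neighboring positions; positions off the ends are skipped."""
    radius = len(kernel) // 2
    n = len(track)
    out = []
    for i in range(n):
        acc = 0.0
        for j, w in enumerate(kernel):
            pos = i + j - radius
            if 0 <= pos < n:
                acc += w * track[pos]
        out.append(acc)
    return out


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant tracks."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)


def kernel_correlation(track_a, track_b, sigma=2.0, radius=6):
    """Correlate track_a with a kernel-smoothed copy of track_b."""
    return pearson(track_a, smooth(track_b, gaussian_kernel(sigma, radius)))


if __name__ == "__main__":
    a = [0.0] * 20
    b = [0.0] * 20
    a[10] = 1.0  # a peak in one feature
    b[11] = 1.0  # a peak in the other feature, one position away
    print(pearson(a, b), kernel_correlation(a, b))
```

    In this toy case the position-by-position correlation is slightly negative because the two peaks never coincide, while the kernel version is positive because the peaks are neighbors. Working on continuous values in this way also avoids the binarization step that the abstract mentions.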

  20. Three-Dimensional Genome Organization and Function in Drosophila

    PubMed Central

    Schwartz, Yuri B.; Cavalli, Giacomo

    2017-01-01

    Understanding how the metazoan genome is used during development and cell differentiation is one of the major challenges in the postgenomic era. Early studies in Drosophila suggested that three-dimensional (3D) chromosome organization plays important regulatory roles in this process, and recent technological advances have started to reveal connections at the molecular level. Here we will consider general features of the architectural organization of the Drosophila genome, providing historical perspective and insights from recent work. We will compare the linear and spatial segmentation of the fly genome and focus on the two key regulators of genome architecture: insulator components and Polycomb group proteins. With its unique set of genetic tools and a compact, well annotated genome, Drosophila is poised to remain a model system of choice for rapid progress in understanding principles of genome organization and to serve as a proving ground for development of 3D genome-engineering techniques. PMID:28049701

  1. Genome sequencing of mucosal melanomas reveals that they are driven by distinct mechanisms from cutaneous melanoma.

    PubMed

    Furney, Simon J; Turajlic, Samra; Stamp, Gordon; Nohadani, Mahrokh; Carlisle, Anna; Thomas, J Meirion; Hayes, Andrew; Strauss, Dirk; Gore, Martin; van den Oord, Joost; Larkin, James; Marais, Richard

    2013-07-01

    Mucosal melanoma displays distinct clinical and epidemiological features compared to cutaneous melanoma. Here we used whole genome and whole exome sequencing to characterize the somatic alterations and mutation spectra in the genomes of ten mucosal melanomas. We observed somatic mutation rates that are considerably lower than occur in sun-exposed cutaneous melanoma, but comparable to the rates seen in cancers not associated with exposure to known mutagens. In particular, the mutation signatures are not indicative of ultraviolet light- or tobacco smoke-induced DNA damage. Genes previously reported as mutated in other cancers were also mutated in mucosal melanoma. Notably, there were substantially more copy number and structural variations in mucosal melanoma than have been reported in cutaneous melanoma. Thus, mucosal and cutaneous melanomas are distinct diseases with discrete genetic features. Our data suggest that different mechanisms underlie the genesis of these diseases and that structural variations play a more important role in mucosal than in cutaneous melanomagenesis. Copyright © 2013 Pathological Society of Great Britain and Ireland. Published by John Wiley & Sons, Ltd.

  2. CFGP: a web-based, comparative fungal genomics platform.

    PubMed

    Park, Jongsun; Park, Bongsoo; Jung, Kyongyong; Jang, Suwang; Yu, Kwangyul; Choi, Jaeyoung; Kong, Sunghyung; Park, Jaejin; Kim, Seryun; Kim, Hyojeong; Kim, Soonok; Kim, Jihyun F; Blair, Jaime E; Lee, Kwangwon; Kang, Seogchan; Lee, Yong-Hwan

    2008-01-01

    Since the completion of the Saccharomyces cerevisiae genome sequencing project in 1996, the genomes of over 80 fungal species have been sequenced or are currently being sequenced. Resulting data provide opportunities for studying and comparing fungal biology and evolution at the genome level. To support such studies, the Comparative Fungal Genomics Platform (CFGP; http://cfgp.snu.ac.kr), a web-based multifunctional informatics workbench, was developed. The CFGP comprises three layers, including the basal layer, middleware and the user interface. The data warehouse in the basal layer contains standardized genome sequences of 65 fungal species. The middleware processes queries via six analysis tools, including BLAST, ClustalW, InterProScan, SignalP 3.0, PSORT II and a newly developed tool named BLASTMatrix. The BLASTMatrix permits the identification and visualization of genes homologous to a query across multiple species. The Data-driven User Interface (DUI) of the CFGP was built on a new concept of pre-collecting data and post-executing analysis instead of the 'fill-in-the-form-and-press-SUBMIT' user interfaces utilized by most bioinformatics sites. A tool termed Favorite, which supports the management of encapsulated sequence data and provides a personalized data repository to users, is another novel feature in the DUI.

  3. Bolbase: a comprehensive genomics database for Brassica oleracea.

    PubMed

    Yu, Jingyin; Zhao, Meixia; Wang, Xiaowu; Tong, Chaobo; Huang, Shunmou; Tehrim, Sadia; Liu, Yumei; Hua, Wei; Liu, Shengyi

    2013-09-30

    Brassica oleracea is a morphologically diverse species in the family Brassicaceae and contains a group of nutrition-rich vegetable crops, including common heading cabbage, cauliflower, broccoli, kohlrabi, kale and Brussels sprouts. This diversity along with its phylogenetic membership in a group of three diploid and three tetraploid species, and the recent availability of genome sequences within Brassica provide an unprecedented opportunity to study intra- and inter-species divergence and evolution in this species and its close relatives. We have developed a comprehensive database, Bolbase, which provides access to the B. oleracea genome data and comparative genomics information. The whole genome of B. oleracea is available, including nine fully assembled chromosomes and 1,848 scaffolds, with 45,758 predicted genes, 13,382 transposable elements, and 3,581 non-coding RNAs. Comparative genomics information is available, including syntenic regions among B. oleracea, Brassica rapa and Arabidopsis thaliana, synonymous (Ks) and non-synonymous (Ka) substitution rates between orthologous gene pairs, gene families or clusters, and differences in quantity, category, and distribution of transposable elements on chromosomes. Bolbase provides useful search and data mining tools, including a keyword search, a local BLAST server, and a customized GBrowse tool, which can be used to extract annotations of genome components, identify similar sequences and visualize syntenic regions among species. Users can download all genomic data and explore comparative genomics in a highly visual setting. Bolbase is the first resource platform for the B. oleracea genome and for genomic comparisons with its relatives, and thus it will help the research community to better study the function and evolution of Brassica genomes as well as enhance molecular breeding research. This database will be updated regularly with new features, improvements to genome annotation, and new genomic sequences as they become available. Bolbase is freely available at http://ocri-genomics.org/bolbase.

  4. A tutorial of diverse genome analysis tools found in the CoGe web-platform using Plasmodium spp. as a model

    PubMed Central

    Castillo, Andreina I; Nelson, Andrew D L; Haug-Baltzell, Asher K; Lyons, Eric

    2018-01-01

    Abstract Integrated platforms for storage, management, analysis and sharing of large quantities of omics data have become fundamental to comparative genomics. CoGe (https://genomevolution.org/coge/) is an online platform designed to manage and study genomic data, enabling both data- and hypothesis-driven comparative genomics. CoGe’s tools and resources can be used to organize and analyse both publicly available and private genomic data from any species. Here, we demonstrate the capabilities of CoGe through three example workflows using 17 Plasmodium genomes as a model. Plasmodium genomes present unique challenges for comparative genomics due to their rapidly evolving and highly variable genomic AT/GC content. These example workflows are intended to serve as templates to help guide researchers who would like to use CoGe to examine diverse aspects of genome evolution. In the first workflow, trends in genome composition and amino acid usage are explored. In the second, changes in genome structure and the distribution of synonymous (Ks) and non-synonymous (Kn) substitution values are evaluated across species with different levels of evolutionary relatedness. In the third workflow, microsyntenic analyses of multigene families’ genomic organization are conducted using two Plasmodium-specific gene families—serine repeat antigen, and cytoadherence-linked asexual gene—as models. In general, these example workflows show how to achieve quick, reproducible and shareable results using the CoGe platform. We were able to replicate previously published results, as well as leverage CoGe’s tools and resources to gain additional insight into various aspects of Plasmodium genome evolution. Our results highlight the usefulness of the CoGe platform, particularly in understanding complex features of genome evolution. Database URL: https://genomevolution.org/coge/

  5. FGWAS: Functional genome wide association analysis.

    PubMed

    Huang, Chao; Thompson, Paul; Wang, Yalin; Yu, Yang; Zhang, Jingwen; Kong, Dehan; Colen, Rivka R; Knickmeyer, Rebecca C; Zhu, Hongtu

    2017-10-01

    Functional phenotypes (e.g., subcortical surface representation), which commonly arise in imaging genetic studies, have been used to detect putative genes for complexly inherited neuropsychiatric and neurodegenerative disorders. However, existing statistical methods largely ignore the functional features (e.g., functional smoothness and correlation). The aim of this paper is to develop a functional genome-wide association analysis (FGWAS) framework to efficiently carry out whole-genome analyses of functional phenotypes. FGWAS consists of three components: a multivariate varying coefficient model, a global sure independence screening procedure, and a test procedure. Compared with the standard multivariate regression model, the multivariate varying coefficient model explicitly models the functional features of functional phenotypes through the integration of smooth coefficient functions and functional principal component analysis. Statistically, compared with existing methods for genome-wide association studies (GWAS), FGWAS can substantially boost the detection power for discovering important genetic variants influencing brain structure and function. Simulation studies show that FGWAS outperforms existing GWAS methods for searching sparse signals in an extremely large search space, while controlling for the family-wise error rate. We have successfully applied FGWAS to large-scale analysis of data from the Alzheimer's Disease Neuroimaging Initiative for 708 subjects, 30,000 vertices on the left and right hippocampal surfaces, and 501,584 SNPs. Copyright © 2017 Elsevier Inc. All rights reserved.

  6. Shifts in the evolutionary rate and intensity of purifying selection between two Brassica genomes revealed by analyses of orthologous transposons and relics of a whole genome triplication.

    PubMed

    Zhao, Meixia; Du, Jianchang; Lin, Feng; Tong, Chaobo; Yu, Jingyin; Huang, Shunmou; Wang, Xiaowu; Liu, Shengyi; Ma, Jianxin

    2013-10-01

    Recent sequencing of the Brassica rapa and Brassica oleracea genomes revealed extremely contrasting genomic features such as the abundance and distribution of transposable elements between the two genomes. However, whether and how these structural differentiations may have influenced the evolutionary rates of the two genomes since their split from a common ancestor are unknown. Here, we investigated and compared the rates of nucleotide substitution between two long terminal repeats (LTRs) of individual orthologous LTR-retrotransposons, the rates of synonymous and non-synonymous substitution among triplicated genes retained in both genomes from a shared whole genome triplication event, and the rates of genetic recombination estimated/deduced by the comparison of physical and genetic distances along chromosomes and ratios of solo LTRs to intact elements. Overall, LTR sequences and genic sequences showed more rapid nucleotide substitution in B. rapa than in B. oleracea. Synonymous substitution of triplicated genes retained from a shared whole genome triplication was detected at higher rates in B. rapa than in B. oleracea. Interestingly, non-synonymous substitution was observed at lower rates in the former than in the latter, indicating shifted densities of purifying selection between the two genomes. In addition to evolutionary asymmetry, orthologous genes differentially regulated and/or disrupted by transposable elements between the two genomes were also characterized. Our analyses suggest that local genomic and epigenomic features, such as recombination rates and chromatin dynamics reshaped by independent proliferation of transposable elements and elimination between the two genomes, are perhaps partially the causes and partially the outcomes of the observed inter-specific asymmetric evolution. © 2013 Purdue University The Plant Journal © 2013 John Wiley & Sons Ltd.

  7. The complete genome sequencing of Prevotella intermedia strain OMA14 and a subsequent fine-scale, intra-species genomic comparison reveal an unusual amplification of conjugative and mobile transposons and identify a novel Prevotella-lineage-specific repeat.

    PubMed

    Naito, Mariko; Ogura, Yoshitoshi; Itoh, Takehiko; Shoji, Mikio; Okamoto, Masaaki; Hayashi, Tetsuya; Nakayama, Koji

    2016-02-01

    Prevotella intermedia is a pathogenic bacterium involved in periodontal diseases. Here, we present the complete genome sequence of a clinical strain, OMA14, of this bacterium along with the results of comparative genome analysis with strain 17 of the same species, whose genome has also been sequenced but not yet fully analysed. The genomes of both strains consist of two circular chromosomes: the larger chromosomes are similar in size and exhibit a high overall linearity of gene organizations, whereas the smaller chromosomes show a significant size variation and have undergone remarkable genome rearrangements. Unique features of the Pre. intermedia genomes are the presence of a remarkable number of essential genes on the second chromosomes and the abundance of conjugative and mobilizable transposons (CTns and MTns). The CTns/MTns are particularly abundant in the second chromosomes, are involved in their extensive genome rearrangement, and have introduced a number of strain-specific genes into each strain. We also found a novel 188-bp repeat sequence that has been highly amplified in Pre. intermedia and is specifically distributed among the Pre. intermedia-related species. These findings expand our understanding of the genetic features of Pre. intermedia and the roles of CTns and MTns in the evolution of bacteria. © The Author 2015. Published by Oxford University Press on behalf of Kazusa DNA Research Institute.

  8. The Sinocyclocheilus cavefish genome provides insights into cave adaptation.

    PubMed

    Yang, Junxing; Chen, Xiaoli; Bai, Jie; Fang, Dongming; Qiu, Ying; Jiang, Wansheng; Yuan, Hui; Bian, Chao; Lu, Jiang; He, Shiyang; Pan, Xiaofu; Zhang, Yaolei; Wang, Xiaoai; You, Xinxin; Wang, Yongsi; Sun, Ying; Mao, Danqing; Liu, Yong; Fan, Guangyi; Zhang, He; Chen, Xiaoyong; Zhang, Xinhui; Zheng, Lanping; Wang, Jintu; Cheng, Le; Chen, Jieming; Ruan, Zhiqiang; Li, Jia; Yu, Hui; Peng, Chao; Ma, Xingyu; Xu, Junmin; He, You; Xu, Zhengfeng; Xu, Pao; Wang, Jian; Yang, Huanming; Wang, Jun; Whitten, Tony; Xu, Xun; Shi, Qiong

    2016-01-04

    An emerging cavefish model, the cyprinid genus Sinocyclocheilus, is endemic to the massive southwestern karst area adjacent to the Qinghai-Tibetan Plateau of China. In order to understand whether orogeny influenced the evolution of these species, and how genomes change under isolation, especially in subterranean habitats, we performed whole-genome sequencing and comparative analyses of three species in this genus, S. grahami, S. rhinocerous and S. anshuiensis. These species are surface-dwelling, semi-cave-dwelling and cave-restricted, respectively. The assembled genome sizes of S. grahami, S. rhinocerous and S. anshuiensis are 1.75 Gb, 1.73 Gb and 1.68 Gb, respectively. Divergence time and population history analyses of these species reveal that their speciation and population dynamics are correlated with the different stages of uplifting of the Qinghai-Tibetan Plateau. We carried out comparative analyses of these genomes and found that many genetic changes, such as gene loss (e.g. opsin genes), pseudogenes (e.g. crystallin genes), mutations (e.g. melanogenesis-related genes), deletions (e.g. scale-related genes) and down-regulation (e.g. circadian rhythm pathway genes), are possibly associated with the regressive features (such as eye degeneration, albinism, rudimentary scales and lack of circadian rhythms), and that some gene expansion (e.g. taste-related transcription factor gene) may point to the constructive features (such as enhanced taste buds) which evolved in these cave fishes. As the first report on cavefish genomes among distinct species in Sinocyclocheilus, our work provides not only insights into genetic mechanisms of cave adaptation, but also represents a fundamental resource for a better understanding of cavefish biology.

  9. Novel Virulent and Broad-Host-Range Erwinia amylovora Bacteriophages Reveal a High Degree of Mosaicism and a Relationship to Enterobacteriaceae Phages

    PubMed Central

    Born, Yannick; Fieseler, Lars; Marazzi, Janine; Lurz, Rudi; Duffy, Brion; Loessner, Martin J.

    2011-01-01

    A diverse set of 24 novel phages infecting the fire blight pathogen Erwinia amylovora was isolated from fruit production environments in Switzerland. Based on initial screening, four phages (L1, M7, S6, and Y2) with broad host ranges were selected for detailed characterization and genome sequencing. Phage L1 is a member of the Podoviridae, with a 39.3-kbp genome featuring invariable genome ends with direct terminal repeats. Phage S6, another podovirus, was also found to possess direct terminal repeats but has a larger genome (74.7 kbp), and the virus particle exhibits a complex tail fiber structure. Phages M7 and Y2 both belong to the Myoviridae family and feature long, contractile tails and genomes of 84.7 kbp (M7) and 56.6 kbp (Y2), respectively, with direct terminal repeats. The architecture of all four phage genomes is typical for tailed phages, i.e., organized into function-specific gene clusters. All four phages completely lack genes or functions associated with lysogeny control, which correlates well with their broad host ranges and indicates strictly lytic (virulent) lifestyles without the possibility for host lysogenization. Comparative genomics revealed that M7 is similar to E. amylovora virus ΦEa21-4, whereas L1, S6, and Y2 are unrelated to any other E. amylovora phage. Instead, they feature similarities to enterobacterial viruses T7, N4, and ΦEcoM-GJ1. In a series of laboratory experiments, we provide proof of concept that specific two-phage cocktails offer the potential for biocontrol of the pathogen. PMID:21764969

  10. Novel virulent and broad-host-range Erwinia amylovora bacteriophages reveal a high degree of mosaicism and a relationship to Enterobacteriaceae phages.

    PubMed

    Born, Yannick; Fieseler, Lars; Marazzi, Janine; Lurz, Rudi; Duffy, Brion; Loessner, Martin J

    2011-09-01

    A diverse set of 24 novel phages infecting the fire blight pathogen Erwinia amylovora was isolated from fruit production environments in Switzerland. Based on initial screening, four phages (L1, M7, S6, and Y2) with broad host ranges were selected for detailed characterization and genome sequencing. Phage L1 is a member of the Podoviridae, with a 39.3-kbp genome featuring invariable genome ends with direct terminal repeats. Phage S6, another podovirus, was also found to possess direct terminal repeats but has a larger genome (74.7 kbp), and the virus particle exhibits a complex tail fiber structure. Phages M7 and Y2 both belong to the Myoviridae family and feature long, contractile tails and genomes of 84.7 kbp (M7) and 56.6 kbp (Y2), respectively, with direct terminal repeats. The architecture of all four phage genomes is typical for tailed phages, i.e., organized into function-specific gene clusters. All four phages completely lack genes or functions associated with lysogeny control, which correlates well with their broad host ranges and indicates strictly lytic (virulent) lifestyles without the possibility for host lysogenization. Comparative genomics revealed that M7 is similar to E. amylovora virus ΦEa21-4, whereas L1, S6, and Y2 are unrelated to any other E. amylovora phage. Instead, they feature similarities to enterobacterial viruses T7, N4, and ΦEcoM-GJ1. In a series of laboratory experiments, we provide proof of concept that specific two-phage cocktails offer the potential for biocontrol of the pathogen.

  11. Comparative genomics of Clavibacter michiganensis subspecies, pathogens of important agricultural crops.

    PubMed

    Tambong, James T

    2017-01-01

    Subspecies of Clavibacter michiganensis are important phytobacterial pathogens causing devastating diseases in several agricultural crops. The genome organizations of these pathogens are poorly understood. Here, the complete genomes of 5 subspecies (C. michiganensis subsp. michiganensis, Cmm; C. michiganensis subsp. sepedonicus, Cms; C. michiganensis subsp. nebraskensis, Cmn; C. michiganensis subsp. insidiosus, Cmi; and C. michiganensis subsp. capsici, Cmc) were analyzed. This study assessed the taxonomic position of the subspecies based on 16S rRNA and genome-based DNA homology and concludes that there is ample evidence to elevate some of the subspecies to species level. Comparative genomics analysis indicated distinct genomic features evident on the DNA structural atlases and annotation features. Based on orthologous gene analysis, about 2300 CDSs are shared across all the subspecies, and Cms showed the highest number of subspecies-specific CDSs, most of which are mobile elements, suggesting that Cms could be more prone to translocation of foreign genes. Cms and Cmi had the highest number of pseudogenes, an indication of potentially degenerating genomes. The stress response factors that may be involved in cold/heat shock, detoxification, oxidative stress, osmoregulation, and carbon utilization are outlined. For example, the wco-cluster encoding extracellular polysaccharide II is highly conserved, while the sucrose-6-phosphate hydrolase that catalyzes the hydrolysis of sucrose-6-phosphate yielding glucose-6-phosphate and fructose is highly divergent. A unique second form of the enzyme is only present in Cmn NCPPB 2581. Also, twenty-eight plasmid-borne CDSs in the other subspecies were found to have homologues in the chromosomal genome of Cmn, which is known not to carry plasmids. These CDSs include pathogenesis-related factors such as Endocellulases E1 and Beta-glucosidase.
    The results presented here provide insight into the functional organization of the genomes of five core C. michiganensis subspecies, enabling a better understanding of these phytobacteria.
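
    The orthologous gene analysis summarized above (CDSs shared across all subspecies versus subspecies-specific CDSs) can be illustrated with plain set operations. The following is a minimal sketch, not the method used in the study: it assumes that ortholog groups have already been assigned, and the genome labels, group IDs and the function name core_and_specific are hypothetical.

```python
from collections import Counter


def core_and_specific(presence):
    """Split ortholog groups into core and genome-specific sets.

    presence maps a genome label to the set of ortholog group IDs
    found in that genome (hypothetical input format).
    """
    # Core: groups present in every genome.
    core = set.intersection(*presence.values())
    # Count in how many genomes each group occurs.
    counts = Counter(g for groups in presence.values() for g in groups)
    # Specific: groups that occur in exactly one genome.
    specific = {
        genome: {g for g in groups if counts[g] == 1}
        for genome, groups in presence.items()
    }
    return core, specific


# Toy presence/absence data, invented for illustration only.
presence = {
    "genome_A": {"og1", "og2", "og3"},
    "genome_B": {"og1", "og2", "og4", "og5"},
    "genome_C": {"og1", "og2", "og5"},
}
core, specific = core_and_specific(presence)
print(sorted(core))
print({genome: sorted(groups) for genome, groups in specific.items()})
```

    In this toy input, og5 occurs in two genomes, so it counts as neither core nor genome-specific; such groups make up the remainder of the accessory set.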

  12. Comparative genomics of Clavibacter michiganensis subspecies, pathogens of important agricultural crops

    PubMed Central

    2017-01-01

    Subspecies of Clavibacter michiganensis are important phytobacterial pathogens causing devastating diseases in several agricultural crops. The genome organizations of these pathogens are poorly understood. Here, the complete genomes of 5 subspecies (C. michiganensis subsp. michiganensis, Cmm; C. michiganensis subsp. sepedonicus, Cms; C. michiganensis subsp. nebraskensis, Cmn; C. michiganensis subsp. insidiosus, Cmi; and C. michiganensis subsp. capsici, Cmc) were analyzed. This study assessed the taxonomic position of the subspecies based on 16S rRNA and genome-based DNA homology and concludes that there is ample evidence to elevate some of the subspecies to species level. Comparative genomics analysis indicated distinct genomic features evident on the DNA structural atlases and annotation features. Based on orthologous gene analysis, about 2300 CDSs are shared across all the subspecies, and Cms showed the highest number of subspecies-specific CDSs, most of which are mobile elements, suggesting that Cms could be more prone to translocation of foreign genes. Cms and Cmi had the highest number of pseudogenes, an indication of potentially degenerating genomes. The stress response factors that may be involved in cold/heat shock, detoxification, oxidative stress, osmoregulation, and carbon utilization are outlined. For example, the wco-cluster encoding extracellular polysaccharide II is highly conserved, while the sucrose-6-phosphate hydrolase that catalyzes the hydrolysis of sucrose-6-phosphate yielding glucose-6-phosphate and fructose is highly divergent. A unique second form of the enzyme is only present in Cmn NCPPB 2581. Also, twenty-eight plasmid-borne CDSs in the other subspecies were found to have homologues in the chromosomal genome of Cmn, which is known not to carry plasmids. These CDSs include pathogenesis-related factors such as Endocellulases E1 and Beta-glucosidase.
    The results presented here provide insight into the functional organization of the genomes of five core C. michiganensis subspecies, enabling a better understanding of these phytobacteria. PMID:28319117

  13. The dog genome map and its use in mammalian comparative genomics.

    PubMed

    Switonski, Marek; Szczerbal, Izabela; Nowacka, Joanna

    2004-01-01

    The dog genome organization was extensively studied in the last ten years. The most important achievements are the well-developed marker genome maps, including over 3200 marker loci, and a survey of the DNA genome sequence. This knowledge, along with the most advanced map of the human genome, turned out to be very useful in comparative genomic studies. On the one hand, it has promoted the development of marker genome maps of other species of the family Canidae (red fox, arctic fox, Chinese raccoon dog) as well as studies on the evolution of their karyotype. But the most important approach is the comparative analysis of human and canine hereditary diseases. At present, causative gene mutations are known for 30 canine hereditary diseases. A majority of them have human counterparts with similar clinical and molecular features. Studies on identification of genes having a major impact on some multifactorial diseases (hip dysplasia, epilepsy) and cancers (multifocal renal cystadenocarcinoma and nodular dermatofibrosis) are advanced. Very promising are the results of gene therapy for certain canine monogenic diseases (haemophilia, hereditary retinal dystrophy, mucopolysaccharidosis), which have human equivalents. The above-mentioned examples prove a very important model role of the dog in studies of human genetic diseases. On the other hand, the identification of gene mutations responsible for hereditary diseases has a substantial impact on breeding strategy in the dog.

  14. Extensive Mobilome-Driven Genome Diversification in Mouse Gut-Associated Bacteroides vulgatus mpk.

    PubMed

    Lange, Anna; Beier, Sina; Steimle, Alex; Autenrieth, Ingo B; Huson, Daniel H; Frick, Julia-Stefanie

    2016-04-25

    Like many other Bacteroides species, Bacteroides vulgatus strain mpk, a mouse fecal isolate which was shown to promote intestinal homeostasis, utilizes a variety of mobile elements for genome evolution. Based on sequences collected by Pacific Biosciences SMRT sequencing technology, we discuss the challenges of assembling and studying a bacterial genome of high plasticity. Additionally, we conducted comparative genomics comparing this commensal strain with the B. vulgatus type strain ATCC 8482 as well as multiple other Bacteroides and Parabacteroides strains to reveal the most important differences and identify the unique features of B. vulgatus mpk. The genome of B. vulgatus mpk harbors a large and diverse set of mobile element proteins compared with other sequenced Bacteroides strains. We found evidence of a number of different horizontal gene transfer events and a genome landscape that has been extensively altered by different mobilization events. A CRISPR/Cas system could be identified that provides a possible mechanism for preventing the integration of invading external DNA. We propose that the high genome plasticity and the introduced genome instabilities of B. vulgatus mpk arising from the various mobilization events might play an important role not only in its adaptation to the challenging intestinal environment in general, but also in its ability to interact with the gut microbiota. © The Author(s) 2016. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.

  15. The Simons Genome Diversity Project: 300 genomes from 142 diverse populations.

    PubMed

    Mallick, Swapan; Li, Heng; Lipson, Mark; Mathieson, Iain; Gymrek, Melissa; Racimo, Fernando; Zhao, Mengyao; Chennagiri, Niru; Nordenfelt, Susanne; Tandon, Arti; Skoglund, Pontus; Lazaridis, Iosif; Sankararaman, Sriram; Fu, Qiaomei; Rohland, Nadin; Renaud, Gabriel; Erlich, Yaniv; Willems, Thomas; Gallo, Carla; Spence, Jeffrey P; Song, Yun S; Poletti, Giovanni; Balloux, Francois; van Driem, George; de Knijff, Peter; Romero, Irene Gallego; Jha, Aashish R; Behar, Doron M; Bravi, Claudio M; Capelli, Cristian; Hervig, Tor; Moreno-Estrada, Andres; Posukh, Olga L; Balanovska, Elena; Balanovsky, Oleg; Karachanak-Yankova, Sena; Sahakyan, Hovhannes; Toncheva, Draga; Yepiskoposyan, Levon; Tyler-Smith, Chris; Xue, Yali; Abdullah, M Syafiq; Ruiz-Linares, Andres; Beall, Cynthia M; Di Rienzo, Anna; Jeong, Choongwon; Starikovskaya, Elena B; Metspalu, Ene; Parik, Jüri; Villems, Richard; Henn, Brenna M; Hodoglugil, Ugur; Mahley, Robert; Sajantila, Antti; Stamatoyannopoulos, George; Wee, Joseph T S; Khusainova, Rita; Khusnutdinova, Elza; Litvinov, Sergey; Ayodo, George; Comas, David; Hammer, Michael F; Kivisild, Toomas; Klitz, William; Winkler, Cheryl A; Labuda, Damian; Bamshad, Michael; Jorde, Lynn B; Tishkoff, Sarah A; Watkins, W Scott; Metspalu, Mait; Dryomov, Stanislav; Sukernik, Rem; Singh, Lalji; Thangaraj, Kumarasamy; Pääbo, Svante; Kelso, Janet; Patterson, Nick; Reich, David

    2016-10-13

    Here we report the Simons Genome Diversity Project data set: high quality genomes from 300 individuals from 142 diverse populations. These genomes include at least 5.8 million base pairs that are not present in the human reference genome. Our analysis reveals key features of the landscape of human genome variation, including that the rate of accumulation of mutations has accelerated by about 5% in non-Africans compared to Africans since divergence. We show that the ancestors of some pairs of present-day human populations were substantially separated by 100,000 years ago, well before the archaeologically attested onset of behavioural modernity. We also demonstrate that indigenous Australians, New Guineans and Andamanese do not derive substantial ancestry from an early dispersal of modern humans; instead, their modern human ancestry is consistent with coming from the same source as that of other non-Africans.

  16. The Simons Genome Diversity Project: 300 genomes from 142 diverse populations

    PubMed Central

    Mallick, Swapan; Li, Heng; Lipson, Mark; Mathieson, Iain; Gymrek, Melissa; Racimo, Fernando; Zhao, Mengyao; Chennagiri, Niru; Nordenfelt, Susanne; Tandon, Arti; Skoglund, Pontus; Lazaridis, Iosif; Sankararaman, Sriram; Fu, Qiaomei; Rohland, Nadin; Renaud, Gabriel; Erlich, Yaniv; Willems, Thomas; Gallo, Carla; Spence, Jeffrey P.; Song, Yun S.; Poletti, Giovanni; Balloux, Francois; van Driem, George; de Knijff, Peter; Romero, Irene Gallego; Jha, Aashish R.; Behar, Doron M.; Bravi, Claudio M.; Capelli, Cristian; Hervig, Tor; Moreno-Estrada, Andres; Posukh, Olga L.; Balanovska, Elena; Balanovsky, Oleg; Karachanak-Yankova, Sena; Sahakyan, Hovhannes; Toncheva, Draga; Yepiskoposyan, Levon; Tyler-Smith, Chris; Xue, Yali; Abdullah, M. Syafiq; Ruiz-Linares, Andres; Beall, Cynthia M.; Di Rienzo, Anna; Jeong, Choongwon; Starikovskaya, Elena B.; Metspalu, Ene; Parik, Jüri; Villems, Richard; Henn, Brenna M.; Hodoglugil, Ugur; Mahley, Robert; Sajantila, Antti; Stamatoyannopoulos, George; Wee, Joseph T. S.; Khusainova, Rita; Khusnutdinova, Elza; Litvinov, Sergey; Ayodo, George; Comas, David; Hammer, Michael; Kivisild, Toomas; Klitz, William; Winkler, Cheryl; Labuda, Damian; Bamshad, Michael; Jorde, Lynn B.; Tishkoff, Sarah A.; Watkins, W. Scott; Metspalu, Mait; Dryomov, Stanislav; Sukernik, Rem; Singh, Lalji; Thangaraj, Kumarasamy; Pääbo, Svante; Kelso, Janet; Patterson, Nick; Reich, David

    2016-01-01

    We report the Simons Genome Diversity Project (SGDP) dataset: high quality genomes from 300 individuals from 142 diverse populations. These genomes include at least 5.8 million base pairs that are not present in the human reference genome. Our analysis reveals key features of the landscape of human genome variation, including that the rate of accumulation of mutations has accelerated by about 5% in non-Africans compared to Africans since divergence. We show that the ancestors of some pairs of present-day human populations were substantially separated by 100,000 years ago, well before the archaeologically attested onset of behavioral modernity. We also demonstrate that indigenous Australians, New Guineans and Andamanese do not derive substantial ancestry from an early dispersal of modern humans; instead, their modern human ancestry is consistent with coming from the same source as that in other non-Africans. PMID:27654912

  17. TU-CD-BRB-12: Radiogenomics of MRI-Guided Prostate Cancer Biopsy Habitats

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Stoyanova, R; Lynne, C; Abraham, S

    2015-06-15

    Purpose: Diagnostic prostate biopsies are subject to sampling bias. We hypothesize that quantitative imaging with multiparametric (MP)-MRI can more accurately direct targeted biopsies to index lesions associated with highest risk clinical and genomic features. Methods: Regionally distinct prostate habitats were delineated on MP-MRI (T2-weighted, perfusion and diffusion imaging). Directed biopsies were performed on 17 habitats from 6 patients using MRI-ultrasound fusion. Biopsy location was characterized with 52 radiographic features. Transcriptome-wide analysis of 1.4 million RNA probes was performed on RNA from each habitat. Genomic features with insignificant expression values (<0.25) and interquartile range <0.5 were filtered out, leaving a total of 212 genes. Correlation between imaging features, genes and a 22-feature genomic classifier (GC), developed as a prognostic assay for metastasis after radical prostatectomy, was investigated. Results: High quality genomic data were derived from 17 (100%) biopsies. Using the 212 ‘unbiased’ genes, the samples clustered by patient origin in unsupervised analysis. When only prostate cancer related genomic features were used, hierarchical clustering revealed samples clustered by needle-biopsy Gleason score (GS). Similarly, principal component analysis of the imaging features found that the primary source of variance segregated the samples into high (≥7) and low (6) GS. Pearson’s correlation analysis of genes with significant expression showed two main patterns of gene expression clustering prostate peripheral and transitional zone MRI features. Two-way hierarchical clustering of GC with radiomics features resulted in the expected groupings of high and low expressed genes in this metastasis signature. Conclusions: MP-MRI-targeted diagnostic biopsies can potentially improve risk stratification by directing pathological and genomic analysis to clinically significant index lesions. As determinant lesions are more reliably identified, targeting with radiotherapy should improve outcome. This is the first demonstration of a link between quantitative imaging features (radiomics) with genomic features in MRI-directed prostate biopsies. The research was supported by NIH-NCI R01 CA 189295 and R01 CA 189295; E Davicioni is partial owner of GenomeDx Biosciences, Inc. M Takhar, N Erho, L Lam, C Buerki and E Davicioni are current employees at GenomeDx Biosciences, Inc.
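
    The filtering and correlation steps named in the Methods can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not say how the expression threshold is applied per gene, so the sketch assumes a gene is kept when its maximum expression is at least 0.25 and its interquartile range is at least 0.5. All gene names, values and function names are invented.

```python
import math
import statistics


def iqr(values):
    """Interquartile range using inclusive quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q3 - q1


def keep_gene(values, min_expr=0.25, min_iqr=0.5):
    """Assumed rule: keep genes that are expressed and variable."""
    return max(values) >= min_expr and iqr(values) >= min_iqr


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Toy expression values per gene across five samples (invented).
expression = {
    "gene_a": [0.1, 0.2, 0.1, 0.2, 0.1],  # low expression: dropped
    "gene_b": [1.0, 1.1, 1.0, 1.1, 1.0],  # nearly constant: dropped
    "gene_c": [1.0, 2.0, 3.0, 4.0, 5.0],  # expressed and variable: kept
}
# One toy imaging feature measured on the same five samples (invented).
imaging_feature = [2.0, 4.0, 6.0, 8.0, 10.0]

kept = {gene: v for gene, v in expression.items() if keep_gene(v)}
r = pearson(kept["gene_c"], imaging_feature)
print(list(kept), round(r, 3))
```

    In a real analysis the correlation would be computed for every retained gene against every imaging feature, giving a matrix that can then be clustered.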

  18. Conserved noncoding sequences conserve biological networks and influence genome evolution.

    PubMed

    Xie, Jianbo; Qian, Kecheng; Si, Jingna; Xiao, Liang; Ci, Dong; Zhang, Deqiang

    2018-05-01

    Comparative genomics approaches have identified numerous conserved cis-regulatory sequences near genes in plant genomes. Despite the identification of these conserved noncoding sequences (CNSs), our knowledge of their functional importance and selection remains limited. Here, we used a combination of DNA methylome analysis, microarray expression analyses, and functional annotation to study these sequences in the model tree Populus trichocarpa. Methylation in CG contexts and non-CG contexts was lower in CNSs, particularly CNSs in the 5'-upstream regions of genes, compared with other sites in the genome. We observed that CNSs are enriched in genes with transcription and binding functions, and this also associated with syntenic genes and those from whole-genome duplications, suggesting that cis-regulatory sequences play a key role in genome evolution. We detected a significant positive correlation between CNS number and protein interactions, suggesting that CNSs may have roles in the evolution and maintenance of biological networks. The divergence of CNSs indicates that duplication-degeneration-complementation drives the subfunctionalization of a proportion of duplicated genes from whole-genome duplication. Furthermore, population genomics confirmed that most CNSs are under strong purifying selection and only a small subset of CNSs shows evidence of adaptive evolution. These findings provide a foundation for future studies exploring these key genomic features in the maintenance of biological networks, local adaptation, and transcription.

  19. The Naegleria genome: a free-living microbial eukaryote lends unique insights into core eukaryotic cell biology

    PubMed Central

    Fritz-Laylin, Lillian K.; Ginger, Michael L.; Walsh, Charles; Dawson, Scott C.; Fulton, Chandler

    2016-01-01

    Naegleria gruberi, a free-living protist, has long been treasured as a model for basal body and flagellar assembly due to its ability to differentiate from crawling amoebae into swimming flagellates. The full genome sequence of Naegleria gruberi has recently been used to estimate gene families ancestral to all eukaryotes and to identify novel aspects of Naegleria biology, including likely facultative anaerobic metabolism, extensive signaling cascades, and evidence for sexuality. Distinctive features of the Naegleria genome and nuclear biology provide unique perspectives for comparative cell biology, including cell division, RNA processing and nucleolar assembly. We highlight here exciting novel aspects of Naegleria biology identified through genomic analysis. PMID:21392573

  20. MED: a new non-supervised gene prediction algorithm for bacterial and archaeal genomes.

    PubMed

    Zhu, Huaiqiu; Hu, Gang-Qing; Yang, Yi-Fan; Wang, Jin; She, Zhen-Su

    2007-03-16

    Despite remarkable success in the computational prediction of genes in Bacteria and Archaea, the lack of a comprehensive understanding of prokaryotic gene structures prevents further elucidation of differences among genomes. It continues to be interesting to develop new ab initio algorithms which not only accurately predict genes, but also facilitate comparative studies of prokaryotic genomes. This paper describes a new prokaryotic gene-finding algorithm based on a comprehensive statistical model of protein coding Open Reading Frames (ORFs) and Translation Initiation Sites (TISs). The former is based on a linguistic "Entropy Density Profile" (EDP) model of coding DNA sequence and the latter comprises several relevant features related to translation initiation. They are combined to form a so-called Multivariate Entropy Distance (MED) algorithm, MED 2.0, that incorporates several strategies in the iterative program. The iterations enable us to develop a non-supervised learning process and to obtain a set of genome-specific parameters for the gene structure, before making the prediction of genes. Results of extensive tests show that MED 2.0 achieves competitively high performance in gene prediction for both 5' and 3' end matches, compared to the current best prokaryotic gene finders. The advantage of MED 2.0 is particularly evident for GC-rich genomes and archaeal genomes. Furthermore, the genome-specific parameters given by MED 2.0 match the current understanding of prokaryotic genomes and may serve as tools for comparative genomic studies. In particular, MED 2.0 is shown to reveal divergent translation initiation mechanisms in archaeal genomes while making a more accurate prediction of TISs compared to the existing gene finders and the current GenBank annotation.

  1. Novel Insights into Tree Biology and Genome Evolution as Revealed Through Genomics.

    PubMed

    Neale, David B; Martínez-García, Pedro J; De La Torre, Amanda R; Montanari, Sara; Wei, Xiao-Xin

    2017-04-28

    Reference genome sequences are the key to the discovery of genes and gene families that determine traits of interest. Recent progress in sequencing technologies has enabled a rapid increase in genome sequencing of tree species, allowing the dissection of complex characters of economic importance, such as fruit and wood quality and resistance to biotic and abiotic stresses. Although the number of reference genome sequences for trees lags behind those for other plant species, it is not too early to gain insight into the unique features that distinguish trees from nontree plants. Our review of the published data suggests that, although many gene families are conserved among herbaceous and tree species, some gene families, such as those involved in resistance to biotic and abiotic stresses and in the synthesis and transport of sugars, are often expanded in tree genomes. As the genomes of more tree species are sequenced, comparative genomics will further elucidate the complexity of tree genomes and how this relates to traits unique to trees.

  2. Chromatin Landscapes of Retroviral and Transposon Integration Profiles

    PubMed Central

    Badhai, Jitendra; Rust, Alistair G.; Rad, Roland; Hilkens, John; Berns, Anton; van Lohuizen, Maarten; Wessels, Lodewyk F. A.; de Ridder, Jeroen

    2014-01-01

    The ability of retroviruses and transposons to insert their genetic material into host DNA makes them widely used tools in molecular biology, cancer research and gene therapy. However, these systems have biases that may strongly affect research outcomes. To address this issue, we generated very large datasets consisting of to unselected integrations in the mouse genome for the Sleeping Beauty (SB) and piggyBac (PB) transposons, and the Mouse Mammary Tumor Virus (MMTV). We analyzed (epi)genomic features to generate bias maps at both local and genome-wide scales. MMTV showed a remarkably uniform distribution of integrations across the genome. More distinct preferences were observed for the two transposons, with PB showing remarkable resemblance to bias profiles of the Murine Leukemia Virus. Furthermore, we present a model where target site selection is directed at multiple scales. At a large scale, target site selection is similar across systems, and defined by domain-oriented features, namely expression of proximal genes, proximity to CpG islands and to genic features, chromatin compaction and replication timing. Notable differences between the systems are mainly observed at smaller scales, and are directed by a diverse range of features. To study the effect of these biases on integration sites occupied under selective pressure, we turned to insertional mutagenesis (IM) screens. In IM screens, putative cancer genes are identified by finding frequently targeted genomic regions, or Common Integration Sites (CISs). Within three recently completed IM screens, we identified 7%–33% putative false positive CISs, which are likely not the result of the oncogenic selection process. Moreover, results indicate that PB, compared to SB, is more suited to tag oncogenes. PMID:24721906

  3. PSAT: A web tool to compare genomic neighborhoods of multiple prokaryotic genomes

    PubMed Central

    Fong, Christine; Rohmer, Laurence; Radey, Matthew; Wasnick, Michael; Brittnacher, Mitchell J

    2008-01-01

    Background: The conservation of gene order among prokaryotic genomes can provide valuable insight into gene function, protein interactions, or events by which genomes have evolved. Although some tools are available for visualizing and comparing the order of genes between genomes of study, few support an efficient and organized analysis between large numbers of genomes. The Prokaryotic Sequence homology Analysis Tool (PSAT) is a web tool for comparing gene neighborhoods among multiple prokaryotic genomes. Results: PSAT utilizes a database that is preloaded with gene annotation, BLAST hit results, and gene-clustering scores designed to help identify regions of conserved gene order. Researchers use the PSAT web interface to find a gene of interest in a reference genome and efficiently retrieve the sequence homologs found in other bacterial genomes. The tool generates a graphic of the genomic neighborhood surrounding the selected gene and the corresponding regions for its homologs in each comparison genome. Homologs in each region are color coded to assist users with analyzing gene order among various genomes. In contrast to common comparative analysis methods that filter sequence homolog data based on alignment score cutoffs, PSAT leverages gene context information for homologs, including those with weak alignment scores, enabling a more sensitive analysis. Features for constraining or ordering results are designed to help researchers browse results from large numbers of comparison genomes in an organized manner. PSAT has been demonstrated to be useful for helping to identify gene orthologs and potential functional gene clusters, and detecting genome modifications that may result in loss of function. Conclusion: PSAT allows researchers to investigate the order of genes within local genomic neighborhoods of multiple genomes.
    A PSAT web server for public use is available for performing analyses on a growing set of reference genomes through any web browser with no client side software setup or installation required. Source code is freely available to researchers interested in setting up a local version of PSAT for analysis of genomes not available through the public server. Access to the public web server and instructions for obtaining source code can be found at . PMID:18366802
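
    The idea of comparing gene neighborhoods, as described in the Results above, can be shown with a small sketch. It is not PSAT's implementation or its gene-clustering score: it simply takes a window of genes around a gene of interest in two genomes and reports which homolog families occur in both windows. The gene IDs, family labels and function names are hypothetical.

```python
def neighborhood(gene_order, gene, flank=2):
    """Return the genes within `flank` positions of `gene`, in order."""
    i = gene_order.index(gene)
    return gene_order[max(0, i - flank): i + flank + 1]


def shared_families(ref_window, other_window, family_of):
    """Homolog families present in both neighborhood windows."""
    ref = {family_of[g] for g in ref_window if g in family_of}
    other = {family_of[g] for g in other_window if g in family_of}
    return ref & other


# Toy gene orders for a reference and a comparison genome (invented).
reference = ["r1", "r2", "r3", "r4", "r5"]
comparison = ["o9", "o2", "o3", "o4", "o8"]

# Hypothetical homolog family assignment, e.g. derived from BLAST hits.
family_of = {
    "r1": "F1", "r2": "F2", "r3": "F3", "r4": "F4", "r5": "F5",
    "o2": "F2", "o3": "F3", "o4": "F4", "o8": "F8", "o9": "F9",
}

ref_window = neighborhood(reference, "r3", flank=1)
cmp_window = neighborhood(comparison, "o3", flank=1)
conserved = shared_families(ref_window, cmp_window, family_of)
print(ref_window, cmp_window, sorted(conserved))
```

    A larger shared set around a weakly scoring homolog is the kind of gene-context evidence the abstract describes; how that evidence is actually scored is left open here.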

  4. MicroScope: a platform for microbial genome annotation and comparative genomics

    PubMed Central

    Vallenet, D.; Engelen, S.; Mornico, D.; Cruveiller, S.; Fleury, L.; Lajus, A.; Rouy, Z.; Roche, D.; Salvignol, G.; Scarpelli, C.; Médigue, C.

    2009-01-01

    The initial outcome of genome sequencing is the creation of long text strings written in a four letter alphabet. The role of in silico sequence analysis is to assist biologists in the act of associating biological knowledge with these sequences, allowing investigators to make inferences and predictions that can be tested experimentally. A wide variety of software is available to the scientific community, and can be used to identify genomic objects, before predicting their biological functions. However, only a limited number of biologically interesting features can be revealed from an isolated sequence. Comparative genomics tools, on the other hand, by bringing together the information contained in numerous genomes simultaneously, allow annotators to make inferences based on the idea that evolution and natural selection are central to the definition of all biological processes. We have developed the MicroScope platform in order to offer a web-based framework for the systematic and efficient revision of microbial genome annotation and comparative analysis (http://www.genoscope.cns.fr/agc/microscope). Starting with the description of the flow chart of the annotation processes implemented in the MicroScope pipeline, and the development of traditional and novel microbial annotation and comparative analysis tools, this article emphasizes the essential role of expert annotation as a complement of automatic annotation. Several examples illustrate the use of implemented tools for the review and curation of annotations of both new and publicly available microbial genomes within MicroScope’s rich integrated genome framework. The platform is used as a viewer in order to browse updated annotation information of available microbial genomes (more than 440 organisms to date), and in the context of new annotation projects (117 bacterial genomes). 
The human expertise gathered in the MicroScope database (about 280,000 independent annotations) contributes to improving the quality of microbial genome annotation, especially for genomes initially analyzed by automatic procedures alone. Database URLs: http://www.genoscope.cns.fr/agc/mage and http://www.genoscope.cns.fr/agc/microcyc PMID:20157493

  5. Comparative Genomics of Oral Isolates of Streptococcus mutans by in silico Genome Subtraction Does Not Reveal Accessory DNA Associated with Severe Early Childhood Caries

    PubMed Central

    Argimón, Silvia; Konganti, Kranti; Chen, Hao; Alekseyenko, Alexander V.; Brown, Stuart; Caufield, Page W.

    2014-01-01

    Comparative genomics is a popular method for the identification of microbial virulence determinants, especially since the sequencing of a large number of whole bacterial genomes from pathogenic and non-pathogenic strains has become relatively inexpensive. The bioinformatics pipelines for comparative genomics usually include gene prediction and annotation and can require significant computer power. To circumvent this, we developed a rapid method for genome-scale in silico subtractive hybridization, based on blastn and independent of feature identification and annotation. Whole genome comparisons by in silico genome subtraction were performed to identify genetic loci specific to Streptococcus mutans strains associated with severe early childhood caries (S-ECC), compared to strains isolated from caries-free (CF) children. The genome similarity of the 20 S. mutans strains included in this study, calculated by Simrank k-mer sharing, ranged from 79.5 to 90.9%, confirming this is a genetically heterogeneous group of strains. We identified strain-specific genetic elements in 19 strains, with sizes ranging from 200 bp to 39 kb. These elements contained protein-coding regions with functions mostly associated with mobile DNA. We did not, however, identify any genetic loci consistently associated with dental caries, i.e., shared by all the S-ECC strains and absent in the CF strains. Conversely, we did not identify any genetic loci specific to the healthy group. Comparison of previously published genomes from pathogenic and carriage strains of Neisseria meningitidis with our in silico genome subtraction yielded the same set of genes specific to the pathogenic strains, thus validating our method. Our results suggest that S. mutans strains derived from caries-active or caries-free dentitions cannot be differentiated based on the presence or absence of specific genetic elements. Our in silico genome subtraction method is available as the Microbial Genome Comparison (MGC) tool, with a user-friendly Java graphical interface. PMID:24291226
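
    The k-mer sharing similarity mentioned in this abstract can be illustrated with a minimal sketch. This is not the Simrank implementation: the function names, the default k of 7, and the normalisation by the smaller k-mer set are all assumptions made for illustration only.

```python
def kmers(seq, k):
    """Return the set of unique k-mers in a sequence (case-insensitive)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def kmer_sharing(seq_a, seq_b, k=7):
    """Percentage of unique k-mers shared by two sequences.

    Assumption: the shared count is divided by the size of the smaller
    k-mer set, which makes the score symmetric. Other tools may
    normalise differently.
    """
    a, b = kmers(seq_a, k), kmers(seq_b, k)
    if not a or not b:
        # A sequence shorter than k has no k-mers to compare.
        return 0.0
    return 100.0 * len(a & b) / min(len(a), len(b))


if __name__ == "__main__":
    # Toy sequences, hypothetical and far shorter than real genomes.
    print(kmer_sharing("ACGTACGTTGCA", "ACGTACGATGCA", k=4))
```

    On whole genomes the sets become large, so a real tool would likely use a more compact k-mer representation than Python strings.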

  6. MicroScope: a platform for microbial genome annotation and comparative genomics.

    PubMed

    Vallenet, D; Engelen, S; Mornico, D; Cruveiller, S; Fleury, L; Lajus, A; Rouy, Z; Roche, D; Salvignol, G; Scarpelli, C; Médigue, C

    2009-01-01

    The initial outcome of genome sequencing is the creation of long text strings written in a four letter alphabet. The role of in silico sequence analysis is to assist biologists in the act of associating biological knowledge with these sequences, allowing investigators to make inferences and predictions that can be tested experimentally. A wide variety of software is available to the scientific community, and can be used to identify genomic objects, before predicting their biological functions. However, only a limited number of biologically interesting features can be revealed from an isolated sequence. Comparative genomics tools, on the other hand, by bringing together the information contained in numerous genomes simultaneously, allow annotators to make inferences based on the idea that evolution and natural selection are central to the definition of all biological processes. We have developed the MicroScope platform in order to offer a web-based framework for the systematic and efficient revision of microbial genome annotation and comparative analysis (http://www.genoscope.cns.fr/agc/microscope). Starting with the description of the flow chart of the annotation processes implemented in the MicroScope pipeline, and the development of traditional and novel microbial annotation and comparative analysis tools, this article emphasizes the essential role of expert annotation as a complement of automatic annotation. Several examples illustrate the use of implemented tools for the review and curation of annotations of both new and publicly available microbial genomes within MicroScope's rich integrated genome framework. The platform is used as a viewer in order to browse updated annotation information of available microbial genomes (more than 440 organisms to date), and in the context of new annotation projects (117 bacterial genomes). 
The human expertise gathered in the MicroScope database (about 280,000 independent annotations) contributes to improving the quality of microbial genome annotation, especially for genomes initially analyzed by automatic procedures alone. Database URLs: http://www.genoscope.cns.fr/agc/mage and http://www.genoscope.cns.fr/agc/microcyc.

  7. Gramene 2013: comparative plant genomics resources.

    PubMed

    Monaco, Marcela K; Stein, Joshua; Naithani, Sushma; Wei, Sharon; Dharmawardhana, Palitha; Kumari, Sunita; Amarasinghe, Vindhya; Youens-Clark, Ken; Thomason, James; Preece, Justin; Pasternak, Shiran; Olson, Andrew; Jiao, Yinping; Lu, Zhenyuan; Bolser, Dan; Kerhornou, Arnaud; Staines, Dan; Walts, Brandon; Wu, Guanming; D'Eustachio, Peter; Haw, Robin; Croft, David; Kersey, Paul J; Stein, Lincoln; Jaiswal, Pankaj; Ware, Doreen

    2014-01-01

    Gramene (http://www.gramene.org) is a curated online resource for comparative functional genomics in crops and model plant species, currently hosting 27 fully and 10 partially sequenced reference genomes in its build number 38. Its strength derives from the application of a phylogenetic framework for genome comparison and the use of ontologies to integrate structural and functional annotation data. Whole-genome alignments complemented by phylogenetic gene family trees help infer syntenic and orthologous relationships. Genetic variation data, sequences and genome mappings available for 10 species, including Arabidopsis, rice and maize, help infer putative variant effects on genes and transcripts. The pathways section also hosts 10 species-specific metabolic pathways databases developed in-house or by our collaborators using Pathway Tools software, which facilitates searches for pathway, reaction and metabolite annotations, and allows analyses of user-defined expression datasets. Recently, we released a Plant Reactome portal featuring 133 curated rice pathways. This portal will be expanded for Arabidopsis, maize and other plant species. We continue to provide genetic and QTL maps and marker datasets developed by crop researchers. The project provides a unique community platform to support scientific research in plant genomics including studies in evolution, genetics, plant breeding, molecular biology, biochemistry and systems biology.

  8. Chloroplast genome expansion by intron multiplication in the basal psychrophilic euglenoid Eutreptiella pomquetensis

    PubMed Central

    Bennett, Matthew S.; Triemer, Richard E.; Preisfeld, Angelika

    2017-01-01

    Background: Over the last few years multiple studies have been published showing a great diversity in size of chloroplast genomes (cpGenomes), and in the arrangement of gene clusters, in the Euglenales. However, while these genomes provided important insights into the evolution of cpGenomes across the Euglenales and within their genera, only two genomes were analyzed in regard to genomic variability between and within Euglenales and Eutreptiales. To better understand the dynamics of chloroplast genome evolution in early evolving Eutreptiales, this study focused on the cpGenome of Eutreptiella pomquetensis, and the spread and peculiarities of introns. Methods: The Etl. pomquetensis cpGenome was sequenced, annotated and afterwards examined in structure, size, gene order and intron content. These features were compared with other euglenoid cpGenomes as well as those of prasinophyte green algae, including Pyramimonas parkeae. Results and Discussion: At 130,561 bp, the chloroplast genome of Etl. pomquetensis, a basal taxon in the phototrophic euglenoids, was considerably larger than the two other Eutreptiales cpGenomes sequenced so far. Although the detected quadripartite structure resembled most green algae and plant chloroplast genomes, the gene content of the single copy regions in Etl. pomquetensis was completely different from those observed in green algae and plants. The gene composition of Etl. pomquetensis was extensively changed and turned out to be almost identical to other Eutreptiales and Euglenales, and not to P. parkeae. Furthermore, the cpGenome of Etl. pomquetensis was unexpectedly permeated by a high number of introns, which led to a substantially larger genome. The 51 identified introns of Etl. pomquetensis showed two major unique features: (i) more than half of the introns displayed a high level of pairwise identities; (ii) no group III introns could be identified in the protein coding genes. These findings support the hypothesis that group III introns are degenerated group II introns and evolved later. PMID:28852596

  9. Gekko japonicus genome reveals evolution of adhesive toe pads and tail regeneration

    PubMed Central

    Liu, Yan; Zhou, Qian; Wang, Yongjun; Luo, Longhai; Yang, Jian; Yang, Linfeng; Liu, Mei; Li, Yingrui; Qian, Tianmei; Zheng, Yuan; Li, Meiyuan; Li, Jiang; Gu, Yun; Han, Zujing; Xu, Man; Wang, Yingjie; Zhu, Changlai; Yu, Bin; Yang, Yumin; Ding, Fei; Jiang, Jianping; Yang, Huanming; Gu, Xiaosong

    2015-01-01

    Reptiles are the most morphologically and physiologically diverse tetrapods, and have undergone 300 million years of adaptive evolution. Within the reptilian tetrapods, geckos possess several interesting features, including the ability to regenerate autotomized tails and to climb on smooth surfaces. Here we sequence the genome of Gekko japonicus (Schlegel's Japanese Gecko) and investigate genetic elements related to its physiology. We obtain a draft G. japonicus genome sequence of 2.55 Gb and annotate 22,487 genes. Comparative genomic analysis reveals specific gene family expansions or reductions that are associated with the formation of adhesive setae, nocturnal vision and tail regeneration, as well as the diversification of olfactory sensation. The obtained genomic data provide robust genetic evidence of adaptive evolution in reptiles. PMID:26598231

  10. CSTminer: a web tool for the identification of coding and noncoding conserved sequence tags through cross-species genome comparison

    PubMed Central

    Castrignanò, Tiziana; Canali, Alessandro; Grillo, Giorgio; Liuni, Sabino; Mignone, Flavio; Pesole, Graziano

    2004-01-01

    The identification and characterization of genome tracts that are highly conserved across species during evolution may contribute significantly to the functional annotation of whole-genome sequences. Indeed, such sequences are likely to correspond to known or unknown coding exons or regulatory motifs. Here, we present a web server implementing a previously developed algorithm that, by comparing user-submitted genome sequences, is able to identify statistically significant conserved blocks and assess their coding or noncoding nature through the measure of a coding potential score. The web tool, available at http://www.caspur.it/CSTminer/, is dynamically interconnected with the Ensembl genome resources and produces a graphical output showing a map of detected conserved sequences and annotated gene features. PMID:15215464

  11. GenomeVx: simple web-based creation of editable circular chromosome maps.

    PubMed

    Conant, Gavin C; Wolfe, Kenneth H

    2008-03-15

    We describe GenomeVx, a web-based tool for making editable, publication-quality, maps of mitochondrial and chloroplast genomes and of large plasmids. These maps show the location of genes and chromosomal features as well as a position scale. The program takes as input either raw feature positions or GenBank records. In the latter case, features are automatically extracted and colored, an example of which is given. Output is in the Adobe Portable Document Format (PDF) and can be edited by programs such as Adobe Illustrator. GenomeVx is available at http://wolfe.gen.tcd.ie/GenomeVx

  12. Gramene 2016: comparative plant genomics and pathway resources.

    PubMed

    Tello-Ruiz, Marcela K; Stein, Joshua; Wei, Sharon; Preece, Justin; Olson, Andrew; Naithani, Sushma; Amarasinghe, Vindhya; Dharmawardhana, Palitha; Jiao, Yinping; Mulvaney, Joseph; Kumari, Sunita; Chougule, Kapeel; Elser, Justin; Wang, Bo; Thomason, James; Bolser, Daniel M; Kerhornou, Arnaud; Walts, Brandon; Fonseca, Nuno A; Huerta, Laura; Keays, Maria; Tang, Y Amy; Parkinson, Helen; Fabregat, Antonio; McKay, Sheldon; Weiser, Joel; D'Eustachio, Peter; Stein, Lincoln; Petryszak, Robert; Kersey, Paul J; Jaiswal, Pankaj; Ware, Doreen

    2016-01-04

    Gramene (http://www.gramene.org) is an online resource for comparative functional genomics in crops and model plant species. Its two main frameworks are genomes (collaboration with Ensembl Plants) and pathways (The Plant Reactome and archival BioCyc databases). Since our last NAR update, the database website adopted a new Drupal management platform. The genomes section features 39 fully assembled reference genomes that are integrated using ontology-based annotation and comparative analyses, and accessed through both visual and programmatic interfaces. Additional community data, such as genetic variation, expression and methylation, are also mapped for a subset of genomes. The Plant Reactome pathway portal (http://plantreactome.gramene.org) provides a reference resource for analyzing plant metabolic and regulatory pathways. In addition to ∼ 200 curated rice reference pathways, the portal hosts gene homology-based pathway projections for 33 plant species. Both the genome and pathway browsers interface with the EMBL-EBI's Expression Atlas to enable the projection of baseline and differential expression data from curated expression studies in plants. Gramene's archive website (http://archive.gramene.org) continues to provide previously reported resources on comparative maps, markers and QTL. To further aid our users, we have also introduced a live monthly educational webinar series and a Gramene YouTube channel carrying video tutorials. Published by Oxford University Press on behalf of Nucleic Acids Research 2015. This work is written by (a) US Government employee(s) and is in the public domain in the US.

  13. Complete nucleotide sequence of the Cryptomeria japonica D. Don. chloroplast genome and comparative chloroplast genomics: diversified genomic structure of coniferous species.

    PubMed

    Hirao, Tomonori; Watanabe, Atsushi; Kurita, Manabu; Kondo, Teiji; Takata, Katsuhiko

    2008-06-23

    The recent determination of complete chloroplast (cp) genomic sequences of various plant species has enabled numerous comparative analyses as well as advances in plant and genome evolutionary studies. In angiosperms, the complete cp genome sequences of about 70 species have been determined, whereas those of only three gymnosperm species, Cycas taitungensis, Pinus thunbergii, and Pinus koraiensis have been established. The lack of information regarding the gene content and genomic structure of gymnosperm cp genomes may severely hamper further progress of plant and cp genome evolutionary studies. To address this need, we report here the complete nucleotide sequence of the cp genome of Cryptomeria japonica, the first in the Cupressaceae sensu lato of gymnosperms, and provide a comparative analysis of their gene content and genomic structure that illustrates the unique genomic features of gymnosperms. The C. japonica cp genome is 131,810 bp in length, with 112 single copy genes and two duplicated (trnI-CAU, trnQ-UUG) genes that give a total of 116 genes. Compared to other land plant cp genomes, the C. japonica cp has lost one of the relevant large inverted repeats (IRs) found in angiosperms, fern, liverwort, and gymnosperms, such as Cycas and Ginkgo, and additionally has completely lost its trnR-CCG, partially lost its trnT-GGU, and shows diversification of accD. The genomic structure of the C. japonica cp genome also differs significantly from those of other plant species. For example, we estimate that a minimum of 15 inversions would be required to transform the gene organization of the Pinus thunbergii cp genome into that of C. japonica. In the C. japonica cp genome, direct repeat and inverted repeat sequences are observed at the inversion and translocation endpoints, and these sequences may be associated with the genomic rearrangements. The observed differences in genomic structure between C. japonica and other land plants, including pines, strongly support the theory that the large IRs stabilize the cp genome. Furthermore, the deleted large IR and the numerous genomic rearrangements that have occurred in the C. japonica cp genome provide new insights into both the evolutionary lineage of coniferous species in gymnosperm and the evolution of the cp genome.

  14. The mitochondrial genome sequences of the round goby and the sand goby reveal patterns of recent evolution in gobiid fish.

    PubMed

    Adrian-Kalchhauser, Irene; Svensson, Ola; Kutschera, Verena E; Alm Rosenblad, Magnus; Pippel, Martin; Winkler, Sylke; Schloissnig, Siegfried; Blomberg, Anders; Burkhardt-Holm, Patricia

    2017-02-16

    Vertebrate mitochondrial genomes are optimized for fast replication and low cost of RNA expression. Accordingly, they are devoid of introns, are transcribed as polycistrons and contain very little intergenic sequence. Usually, vertebrate mitochondrial genomes measure between 16.5 and 17 kilobases (kb). During genome sequencing projects for two novel vertebrate models, the invasive round goby and the sand goby, we found that the sand goby genome is exceptionally small (16.4 kb), while the mitochondrial genome of the round goby is much larger than expected for a vertebrate. It is 19 kb in size and is thus one of the largest fish and even vertebrate mitochondrial genomes known to date. The expansion is attributable to a sequence insertion downstream of the putative transcriptional start site. This insertion carries traces of repeats from the control region, but is mostly novel. To get more information about this phenomenon, we gathered all available mitochondrial genomes of Gobiidae and of nine gobioid species, performed phylogenetic analyses, analysed gene arrangements, and compared gobiid mitochondrial genome sizes, ecological information and other species characteristics with respect to the mitochondrial phylogeny. This allowed us, among other things, to identify a unique arrangement of tRNAs among Ponto-Caspian gobies. Our results indicate that the round goby mitochondrial genome may contain novel features. Since mitochondrial genome organisation is tightly linked to energy metabolism, these features may be linked to its invasion success. Also, the unique tRNA arrangement among Ponto-Caspian gobies may be helpful in studying the evolution of this highly adaptive and invasive species group. Finally, we find that the phylogeny of gobiids can be further refined by the use of longer stretches of linked DNA sequence.

  15. SigWin-detector: a Grid-enabled workflow for discovering enriched windows of genomic features related to DNA sequences.

    PubMed

    Inda, Márcia A; van Batenburg, Marinus F; Roos, Marco; Belloum, Adam S Z; Vasunin, Dmitry; Wibisono, Adianto; van Kampen, Antoine H C; Breit, Timo M

    2008-08-08

    Chromosome location is often used as a scaffold to organize genomic information in both the living cell and molecular biological research. Thus, ever-increasing amounts of data about genomic features are stored in public databases and can be readily visualized by genome browsers. To perform in silico experimentation conveniently with this genomics data, biologists need tools to process and compare datasets routinely and explore the obtained results interactively. The complexity of such experimentation requires these tools to be based on an e-Science approach, hence generic, modular, and reusable. A virtual laboratory environment with workflows, workflow management systems, and Grid computation are therefore essential. Here we apply an e-Science approach to develop SigWin-detector, a workflow-based tool that can detect significantly enriched windows of (genomic) features in a (DNA) sequence in a fast and reproducible way. For proof-of-principle, we utilize a biological use case to detect regions of increased and decreased gene expression (RIDGEs and anti-RIDGEs) in human transcriptome maps. We improved the original method for RIDGE detection by replacing the costly step of estimation by random sampling with a faster analytical formula for computing the distribution of the null hypothesis being tested and by developing a new algorithm for computing moving medians. SigWin-detector was developed using the WS-VLAM workflow management system and consists of several reusable modules that are linked together in a basic workflow. The configuration of this basic workflow can be adapted to satisfy the requirements of the specific in silico experiment. As we show with the results from analyses in the biological use case on RIDGEs, SigWin-detector is an efficient and reusable Grid-based tool for discovering windows enriched for features of a particular type in any sequence of values. 
Thus, SigWin-detector provides the proof-of-principle for the modular e-Science based concept of integrative bioinformatics experimentation.
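
    The moving-median step mentioned in this abstract can be sketched as follows. This is a naive sliding-window version for illustration, not the faster algorithm the authors developed; the function name and the sample values are hypothetical.

```python
from statistics import median


def moving_median(values, window):
    """Median of every contiguous window of the given size.

    Naive approach: each window is sorted independently by
    statistics.median, so the cost grows with both the sequence
    length and the window size.
    """
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    return [
        median(values[i:i + window])
        for i in range(len(values) - window + 1)
    ]


if __name__ == "__main__":
    # Hypothetical expression values ordered by chromosome position.
    expression = [1, 5, 2, 8, 3]
    print(moving_median(expression, 3))
```

    Windows whose median is unusually high or low relative to a null distribution would then be the candidates for enriched or depleted regions; that statistical test is not shown here.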

  16. A privacy-preserving solution for compressed storage and selective retrieval of genomic data.

    PubMed

    Huang, Zhicong; Ayday, Erman; Lin, Huang; Aiyar, Raeka S; Molyneaux, Adam; Xu, Zhenyu; Fellay, Jacques; Steinmetz, Lars M; Hubaux, Jean-Pierre

    2016-12-01

    In clinical genomics, the continuous evolution of bioinformatic algorithms and sequencing platforms makes it beneficial to store patients' complete aligned genomic data in addition to variant calls relative to a reference sequence. Due to the large size of human genome sequence data files (varying from 30 GB to 200 GB depending on coverage), two major challenges facing genomics laboratories are the costs of storage and the efficiency of the initial data processing. In addition, privacy of genomic data is becoming an increasingly serious concern, yet no standard data storage solutions exist that enable compression, encryption, and selective retrieval. Here we present a privacy-preserving solution named SECRAM (Selective retrieval on Encrypted and Compressed Reference-oriented Alignment Map) for the secure storage of compressed aligned genomic data. Our solution enables selective retrieval of encrypted data and improves the efficiency of downstream analysis (e.g., variant calling). Compared with BAM, the de facto standard for storing aligned genomic data, SECRAM uses 18% less storage. Compared with CRAM, one of the most compressed nonencrypted formats (using 34% less storage than BAM), SECRAM maintains efficient compression and downstream data processing, while allowing for unprecedented levels of security in genomic data storage. Compared with previous work, the distinguishing features of SECRAM are that (1) it is position-based instead of read-based, and (2) it allows random querying of a subregion from a BAM-like file in an encrypted form. Our method thus offers a space-saving, privacy-preserving, and effective solution for the storage of clinical genomic data. © 2016 Huang et al.; Published by Cold Spring Harbor Laboratory Press.

  17. A privacy-preserving solution for compressed storage and selective retrieval of genomic data

    PubMed Central

    Huang, Zhicong; Ayday, Erman; Lin, Huang; Aiyar, Raeka S.; Molyneaux, Adam; Xu, Zhenyu; Hubaux, Jean-Pierre

    2016-01-01

    In clinical genomics, the continuous evolution of bioinformatic algorithms and sequencing platforms makes it beneficial to store patients’ complete aligned genomic data in addition to variant calls relative to a reference sequence. Due to the large size of human genome sequence data files (varying from 30 GB to 200 GB depending on coverage), two major challenges facing genomics laboratories are the costs of storage and the efficiency of the initial data processing. In addition, privacy of genomic data is becoming an increasingly serious concern, yet no standard data storage solutions exist that enable compression, encryption, and selective retrieval. Here we present a privacy-preserving solution named SECRAM (Selective retrieval on Encrypted and Compressed Reference-oriented Alignment Map) for the secure storage of compressed aligned genomic data. Our solution enables selective retrieval of encrypted data and improves the efficiency of downstream analysis (e.g., variant calling). Compared with BAM, the de facto standard for storing aligned genomic data, SECRAM uses 18% less storage. Compared with CRAM, one of the most compressed nonencrypted formats (using 34% less storage than BAM), SECRAM maintains efficient compression and downstream data processing, while allowing for unprecedented levels of security in genomic data storage. Compared with previous work, the distinguishing features of SECRAM are that (1) it is position-based instead of read-based, and (2) it allows random querying of a subregion from a BAM-like file in an encrypted form. Our method thus offers a space-saving, privacy-preserving, and effective solution for the storage of clinical genomic data. PMID:27789525

  18. MBGD update 2013: the microbial genome database for exploring the diversity of microbial world.

    PubMed

    Uchiyama, Ikuo; Mihara, Motohiro; Nishide, Hiroyo; Chiba, Hirokazu

    2013-01-01

    The microbial genome database for comparative analysis (MBGD, available at http://mbgd.genome.ad.jp/) is a platform for microbial genome comparison based on orthology analysis. As its unique feature, MBGD allows users to conduct orthology analysis among any specified set of organisms; this flexibility allows MBGD to adapt to a variety of microbial genomic studies. Reflecting the huge diversity of the microbial world, the number of microbial genome projects has now reached several thousand. To efficiently explore the diversity of the entire body of microbial genomic data, MBGD now provides summary pages for pre-calculated ortholog tables among various taxonomic groups. For some closely related taxa, MBGD also provides conserved synteny information (core genome alignment) pre-calculated using the CoreAligner program. In addition, an efficient incremental updating procedure can create an extended ortholog table by adding genomes to the default ortholog table generated from the representative set of genomes. Combined with the dynamic orthology calculation for any specified set of organisms, MBGD is an efficient and flexible tool for exploring microbial genome diversity.

  19. Comparative Genomic Analyses of Clavibacter michiganensis subsp. insidiosus and Pathogenicity on Medicago truncatula.

    PubMed

    Lu, You; Ishimaru, Carol A; Glazebrook, Jane; Samac, Deborah A

    2018-02-01

    Clavibacter michiganensis is the most economically important gram-positive bacterial plant pathogen, with subspecies that cause serious diseases of maize, wheat, tomato, potato, and alfalfa. Much less is known about pathogenesis involving gram-positive plant pathogens than is known for gram-negative bacteria. Comparative genome analyses of C. michiganensis subspecies affecting tomato, potato, and maize have provided insights on pathogenicity. In this study, we identified strains of C. michiganensis subsp. insidiosus with contrasting pathogenicity on three accessions of the model legume Medicago truncatula. We generated complete genome sequences for two strains and compared these to a previously sequenced strain and genome sequences of four other subspecies. The three C. michiganensis subsp. insidiosus strains varied in gene content due to genome rearrangements, most likely facilitated by insertion elements, and plasmid number, which varied from one to three depending on strain. The core C. michiganensis genome consisted of 1,917 genes, with 379 genes unique to C. michiganensis subsp. insidiosus. An operon for synthesis of the extracellular blue pigment indigoidine, enzymes for pectin degradation, and an operon for inositol metabolism are among the unique features. Secreted serine proteases belonging to both the pat-1 and ppa families were present but highly diverged from those in other subspecies.

  20. Complete genome sequence and comparative genomics of the probiotic yeast Saccharomyces boulardii.

    PubMed

    Khatri, Indu; Tomar, Rajul; Ganesan, K; Prasad, G S; Subramanian, Srikrishna

    2017-03-23

    The probiotic yeast Saccharomyces boulardii (Sb) is known to be effective against many gastrointestinal disorders and antibiotic-associated diarrhea. To understand the molecular basis of the probiotic properties ascribed to Sb, we determined the complete genomes of two strains of Sb, i.e. Biocodex and unique28, and the draft genomes of three other Sb strains that are marketed as probiotics in India. We compared these genomes with 145 strains of S. cerevisiae (Sc) to understand genome-level similarities and differences between these yeasts. A feature that distinguishes Sb from other Sc is the absence of the Ty elements Ty1, Ty3 and Ty4 and the associated LTRs. However, we could identify complete Ty2 and Ty5 elements in Sb. The genes for the hexose transporters HXT11 and HXT9 and for asparagine utilization are absent in all Sb strains. We find differences in repeat periods and copy numbers of repeats in flocculin genes that are likely related to the differential adhesion of Sb as compared to Sc. Core-proteome based taxonomy places Sb strains along with wine strains of Sc. We find the introgression of five genes from Z. bailii into chromosome IV of Sb and wine strains of Sc. Intriguingly, genes involved in conferring known probiotic properties to Sb are conserved in most Sc strains.

  1. CFGP: a web-based, comparative fungal genomics platform

    PubMed Central

    Park, Jongsun; Park, Bongsoo; Jung, Kyongyong; Jang, Suwang; Yu, Kwangyul; Choi, Jaeyoung; Kong, Sunghyung; Park, Jaejin; Kim, Seryun; Kim, Hyojeong; Kim, Soonok; Kim, Jihyun F.; Blair, Jaime E.; Lee, Kwangwon; Kang, Seogchan; Lee, Yong-Hwan

    2008-01-01

    Since the completion of the Saccharomyces cerevisiae genome sequencing project in 1996, the genomes of over 80 fungal species have been sequenced or are currently being sequenced. Resulting data provide opportunities for studying and comparing fungal biology and evolution at the genome level. To support such studies, the Comparative Fungal Genomics Platform (CFGP; http://cfgp.snu.ac.kr), a web-based multifunctional informatics workbench, was developed. The CFGP comprises three layers, including the basal layer, middleware and the user interface. The data warehouse in the basal layer contains standardized genome sequences of 65 fungal species. The middleware processes queries via six analysis tools, including BLAST, ClustalW, InterProScan, SignalP 3.0, PSORT II and a newly developed tool named BLASTMatrix. The BLASTMatrix permits the identification and visualization of genes homologous to a query across multiple species. The Data-driven User Interface (DUI) of the CFGP was built on a new concept of pre-collecting data and post-executing analysis instead of the ‘fill-in-the-form-and-press-SUBMIT’ user interfaces utilized by most bioinformatics sites. A tool termed Favorite, which supports the management of encapsulated sequence data and provides a personalized data repository to users, is another novel feature in the DUI. PMID:17947331

  2. Hybrid genome assembly and annotation of Paenibacillus pasadenensis strain R16 reveals insights on endophytic life style and antifungal activity

    PubMed Central

    Passera, Alessandro; Marcolungo, Luca; Brasca, Milena; Quaglino, Fabio; Cantaloni, Chiara; Delledonne, Massimo

    2018-01-01

    Bacteria of the Paenibacillus genus are becoming important in many fields of science, including agriculture, for their positive effects on the health of plants. However, there is little information available on this genus compared to other bacteria (such as Bacillus or Pseudomonas), especially when considering genomic information. Sequencing the genomes of plant-beneficial bacteria is a crucial step to identify the genetic elements underlying the adaptation to life inside a plant host and, in particular, which of these features determine the differences between a helpful microorganism and a pathogenic one. In this study, we have characterized the genome of Paenibacillus pasadenensis, strain R16, recently investigated for its antifungal activities and plant-associated features. A hybrid assembly approach was used, integrating the very precise reads obtained by Illumina technology and long fragments acquired with Oxford Nanopore Technology (ONT) sequencing. De novo genome assembly based solely on Illumina reads generated a relatively fragmented assembly of 5.72 Mbp in 99 ungapped sequences with an N50 length of 544 Kbp; hybrid assembly, integrating Illumina and ONT reads, improved the assembly quality, generating a genome of 5.75 Mbp, organized in 6 contigs with an N50 length of 3.4 Mbp. Annotation of the latter genome identified 4987 coding sequences, of which 1610 are hypothetical proteins. Enrichment analysis identified pathways of particular interest for the endophyte biology, including the chitin-utilization pathway and the incomplete siderophore pathway, which hints at siderophore parasitism. In addition, the analysis led to the identification of genes for the production of terpenes, for example farnesol, which was hypothesized to be the main antifungal molecule produced by the strain.
The functional analysis of the genome confirmed several plant-associated, plant-growth promotion, and biocontrol traits of strain R16, thus adding insights into the genetic bases of these complex features, and of the Paenibacillus genus in general. PMID:29351296
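
    The abstract above compares assemblies by their N50 length. As a minimal sketch of how that statistic is usually computed (the function name and the toy contig lengths are illustrative assumptions, not data from the R16 assembly), N50 is the length of the contig at which the running total of contigs sorted from longest to shortest first reaches half of the assembly size:

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    Contigs are sorted from longest to shortest; N50 is the length of the
    contig at which the cumulative sum first reaches half of the total.
    """
    if not lengths:
        raise ValueError("need at least one contig length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Hypothetical contig lengths in bp (illustration only).
toy_contigs = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_contigs))
```

    A fragmented assembly has many short contigs and therefore a small N50, which is why a rise from hundreds of Kbp to several Mbp is read as an improvement in contiguity.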

  3. Genome-Wide Locations of Potential Epimutations Associated with Environmentally Induced Epigenetic Transgenerational Inheritance of Disease Using a Sequential Machine Learning Prediction Approach.

    PubMed

    Haque, M Muksitul; Holder, Lawrence B; Skinner, Michael K

    2015-01-01

    Environmentally induced epigenetic transgenerational inheritance of disease and phenotypic variation involves germline transmitted epimutations. The primary epimutations identified involve altered differential DNA methylation regions (DMRs). Different environmental toxicants have been shown to promote exposure (i.e., toxicant) specific signatures of germline epimutations. Analysis of genomic features associated with these epimutations identified low-density CpG regions (<3 CpG / 100bp) termed CpG deserts and a number of unique DNA sequence motifs. The rat genome was annotated for these and additional relevant features. The objective of the current study was to use a machine learning computational approach to predict all potential epimutations in the genome. A number of previously identified sperm epimutations were used as training sets. A novel machine learning approach using a sequential combination of Active Learning and Imbalance Class Learner analysis was developed. The transgenerational sperm epimutation analysis identified approximately 50K individual sites with a 1 kb mean size and 3,233 regions that had a minimum of three adjacent sites with a mean size of 3.5 kb. A select number of the most relevant genomic features were identified with the low density CpG deserts being a critical genomic feature of the features selected. A similar independent analysis with transgenerational somatic cell epimutation training sets identified a smaller number of 1,503 regions of genome-wide predicted sites and differences in genomic feature contributions. The predicted genome-wide germline (sperm) epimutations were found to be distinct from the predicted somatic cell epimutations. Validation of the genome-wide germline predicted sites used two recently identified transgenerational sperm epimutation signature sets from the pesticides dichlorodiphenyltrichloroethane (DDT) and methoxychlor (MXC) exposure lineage F3 generation. 
Analysis of this positive validation data set showed a 100% prediction accuracy for all the DDT-MXC sperm epimutations. Observations further elucidate the genomic features associated with transgenerational germline epimutations and identify a genome-wide set of potential epimutations that can be used to facilitate identification of epigenetic diagnostics for ancestral environmental exposures and disease susceptibility.
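
    The abstract above defines CpG deserts by a density threshold (<3 CpG / 100bp). The following is a minimal sketch of how such a threshold could be applied to a sequence; the fixed, non-overlapping 100 bp windows, the function names and the toy sequence are assumptions made for illustration and are not taken from the study:

```python
def cpg_per_100bp(window):
    """Number of CpG dinucleotides per 100 bp in one window."""
    seq = window.upper()
    return 100.0 * seq.count("CG") / len(seq)


def low_cpg_windows(sequence, window=100, step=100, threshold=3.0):
    """Return (start, end, density) for every window below the threshold.

    Windows are scored independently, so a CpG that straddles a window
    boundary is not counted in this simplified sketch.
    """
    hits = []
    for start in range(0, len(sequence) - window + 1, step):
        chunk = sequence[start:start + window]
        density = cpg_per_100bp(chunk)
        if density < threshold:
            hits.append((start, start + window, density))
    return hits


# Toy sequence: 100 bp without any CpG followed by 100 bp of pure CpG.
toy = "AT" * 50 + "CG" * 50
print(low_cpg_windows(toy))
```

    Only the first window of the toy sequence falls below the threshold; a real analysis would also have to decide how to merge adjacent windows into regions.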

  4. Structure of the germline genome of Tetrahymena thermophila and relationship to the massively rearranged somatic genome

    PubMed Central

    Hamilton, Eileen P; Kapusta, Aurélie; Huvos, Piroska E; Bidwell, Shelby L; Zafar, Nikhat; Tang, Haibao; Hadjithomas, Michalis; Krishnakumar, Vivek; Badger, Jonathan H; Caler, Elisabet V; Russ, Carsten; Zeng, Qiandong; Fan, Lin; Levin, Joshua Z; Shea, Terrance; Young, Sarah K; Hegarty, Ryan; Daza, Riza; Gujja, Sharvari; Wortman, Jennifer R; Birren, Bruce W; Nusbaum, Chad; Thomas, Jainy; Carey, Clayton M; Pritham, Ellen J; Feschotte, Cédric; Noto, Tomoko; Mochizuki, Kazufumi; Papazyan, Romeo; Taverna, Sean D; Dear, Paul H; Cassidy-Hanley, Donna M; Xiong, Jie; Miao, Wei; Orias, Eduardo; Coyne, Robert S

    2016-01-01

    The germline genome of the binucleated ciliate Tetrahymena thermophila undergoes programmed chromosome breakage and massive DNA elimination to generate the somatic genome. Here, we present a complete sequence assembly of the germline genome and analyze multiple features of its structure and its relationship to the somatic genome, shedding light on the mechanisms of genome rearrangement as well as the evolutionary history of this remarkable germline/soma differentiation. Our results strengthen the notion that a complex, dynamic, and ongoing interplay between mobile DNA elements and the host genome has shaped Tetrahymena chromosome structure, locally and globally. Non-standard outcomes of rearrangement events, including the generation of short-lived somatic chromosomes and excision of DNA interrupting protein-coding regions, may represent novel forms of developmental gene regulation. We also compare Tetrahymena’s germline/soma differentiation to that of other characterized ciliates, illustrating the wide diversity of adaptations that have occurred within this phylum. DOI: http://dx.doi.org/10.7554/eLife.19090.001 PMID:27892853

  5. Genomic Prediction for Quantitative Traits Is Improved by Mapping Variants to Gene Ontology Categories in Drosophila melanogaster

    PubMed Central

    Edwards, Stefan M.; Sørensen, Izel F.; Sarup, Pernille; Mackay, Trudy F. C.; Sørensen, Peter

    2016-01-01

    Predicting individual quantitative trait phenotypes from high-resolution genomic polymorphism data is important for personalized medicine in humans, plant and animal breeding, and adaptive evolution. However, this is difficult for populations of unrelated individuals when the number of causal variants is low relative to the total number of polymorphisms and causal variants individually have small effects on the traits. We hypothesized that mapping molecular polymorphisms to genomic features such as genes and their gene ontology categories could increase the accuracy of genomic prediction models. We developed a genomic feature best linear unbiased prediction (GFBLUP) model that implements this strategy and applied it to three quantitative traits (startle response, starvation resistance, and chill coma recovery) in the unrelated, sequenced inbred lines of the Drosophila melanogaster Genetic Reference Panel. Our results indicate that subsetting markers based on genomic features increases the predictive ability relative to the standard genomic best linear unbiased prediction (GBLUP) model. Both models use all markers, but GFBLUP allows differential weighting of the individual genetic marker relationships, whereas GBLUP weighs the genetic marker relationships equally. Simulation studies show that it is possible to further increase the accuracy of genomic prediction for complex traits using this model, provided the genomic features are enriched for causal variants. Our GFBLUP model using prior information on genomic features enriched for causal variants can increase the accuracy of genomic predictions in populations of unrelated individuals and provides a formal statistical framework for leveraging and evaluating information across multiple experimental studies to provide novel insights into the genetic architecture of complex traits. PMID:27235308

  6. Nematode.net update 2011: addition of data sets and tools featuring next-generation sequencing data

    PubMed Central

    Martin, John; Abubucker, Sahar; Heizer, Esley; Taylor, Christina M.; Mitreva, Makedonka

    2012-01-01

    Nematode.net (http://nematode.net) has been a publicly available resource for studying nematodes for over a decade. In the past 3 years, we reorganized Nematode.net to provide more user-friendly navigation through the site, a necessity due to the explosion of data from next-generation sequencing platforms. Organism-centric portals containing dynamically generated data are available for over 56 different nematode species. Next-generation data has been added to the various data-mining portals hosted, including NemaBLAST and NemaBrowse. The NemaPath metabolic pathway viewer builds associations using KOs, rather than ECs to provide more accurate and fine-grained descriptions of proteins. Two new features for data analysis and comparative genomics have been added to the site. NemaSNP enables the user to perform population genetics studies in various nematode populations using next-generation sequencing data. HelmCoP (Helminth Control and Prevention) as an independent component of Nematode.net provides an integrated resource for storage, annotation and comparative genomics of helminth genomes to aid in learning more about nematode genomes, as well as drug, pesticide, vaccine and drug target discovery. With this update, Nematode.net will continue to realize its original goal to disseminate diverse bioinformatic data sets and provide analysis tools to the broad scientific community in a useful and user-friendly manner. PMID:22139919

  7. The role of DNA repair in herpesvirus pathogenesis.

    PubMed

    Brown, Jay C

    2014-10-01

    In cells latently infected with a herpesvirus, the viral DNA is present in the cell nucleus, but it is not extensively replicated or transcribed. In this suppressed state the virus DNA is vulnerable to mutagenic events that affect the host cell and have the potential to destroy the virus' genetic integrity. Despite the potential for genetic damage, however, herpesvirus sequences are well conserved after reactivation from latency. To account for this apparent paradox, I have tested the idea that host cell-encoded mechanisms of DNA repair are able to control genetic damage to latent herpesviruses. Studies were focused on homologous recombination-dependent DNA repair (HR). Methods of DNA sequence analysis were employed to scan herpesvirus genomes for DNA features able to activate HR. Analyses were carried out with a total of 39 herpesvirus DNA sequences, a group that included viruses from the alpha-, beta- and gamma-subfamilies. The results showed that all 39 genome sequences were enriched in two or more of the eight recombination-initiating features examined. The results were interpreted to indicate that HR can stabilize latent herpesvirus genomes. The results also showed, unexpectedly, that repair-initiating DNA features differed in alpha- compared to gamma-herpesviruses. Whereas inverted and tandem repeats predominated in alpha-herpesviruses, gamma-herpesviruses were enriched in short, GC-rich initiation sequences such as CCCAG and depleted in repeats. In alpha-herpesviruses, repair-initiating repeat sequences were found to be concentrated in a specific region (the S segment) of the genome while repair-initiating short sequences were distributed more uniformly in gamma-herpesviruses. The results suggest that repair pathways are activated differently in alpha- compared to gamma-herpesviruses. Copyright © 2014. Published by Elsevier Inc.

  8. Chromosome catastrophes involve replication mechanisms generating complex genomic rearrangements

    PubMed Central

    Liu, Pengfei; Erez, Ayelet; Sreenath Nagamani, Sandesh C.; Dhar, Shweta U.; Kołodziejska, Katarzyna E.; Dharmadhikari, Avinash V.; Cooper, M. Lance; Wiszniewska, Joanna; Zhang, Feng; Withers, Marjorie A.; Bacino, Carlos A.; Campos-Acevedo, Luis Daniel; Delgado, Mauricio R.; Freedenberg, Debra; Garnica, Adolfo; Grebe, Theresa A.; Hernández-Almaguer, Dolores; Immken, LaDonna; Lalani, Seema R.; McLean, Scott D.; Northrup, Hope; Scaglia, Fernando; Strathearn, Lane; Trapane, Pamela; Kang, Sung-Hae L.; Patel, Ankita; Cheung, Sau Wai; Hastings, P. J.; Stankiewicz, Paweł; Lupski, James R.; Bi, Weimin

    2011-01-01

    SUMMARY Complex genomic rearrangements (CGR) consisting of two or more breakpoint junctions have been observed in genomic disorders. Recently, a chromosome catastrophe phenomenon termed chromothripsis, in which numerous genomic rearrangements are apparently acquired in one single catastrophic event, was described in multiple cancers. Here we show that constitutionally acquired CGRs share similarities with cancer chromothripsis. In the 17 CGR cases investigated we observed localization and multiple copy number changes including deletions, duplications and/or triplications, as well as extensive translocations and inversions. Genomic rearrangements involved varied in size and complexities; in one case, array comparative genomic hybridization revealed 18 copy number changes. Breakpoint sequencing identified characteristic features, including small templated insertions at breakpoints and microhomology at breakpoint junctions, which have been attributed to replicative processes. The resemblance between CGR and chromothripsis suggests similar mechanistic underpinnings. Such chromosome catastrophic events appear to reflect basic DNA metabolism operative throughout an organism’s life cycle. PMID:21925314

  9. Genomic features of bacterial adaptation to plants

    PubMed Central

    Levy, Asaf; Gonzalez, Isai Salas; Mittelviefhaus, Maximilian; Clingenpeel, Scott; Paredes, Sur Herrera; Miao, Jiamin; Wang, Kunru; Devescovi, Giulia; Stillman, Kyra; Monteiro, Freddy; Alvarez, Bryan Rangel; Lundberg, Derek S.; Lu, Tse-Yuan; Lebeis, Sarah; Jin, Zhao; McDonald, Meredith; Klein, Andrew P.; Feltcher, Meghan E.; del Rio, Tijana Glavina; Grant, Sarah R.; Doty, Sharon L.; Ley, Ruth E.; Zhao, Bingyu; Venturi, Vittorio; Pelletier, Dale A.; Vorholt, Julia A.; Tringe, Susannah G.; Woyke, Tanja; Dangl, Jeffery L.

    2017-01-01

    Plants intimately associate with diverse bacteria. Plant-associated (PA) bacteria have ostensibly evolved genes enabling adaptation to the plant environment. However, the identities of such genes are mostly unknown and their functions are poorly characterized. We sequenced 484 genomes of bacterial isolates from roots of Brassicaceae, poplar, and maize. We then compared 3837 bacterial genomes to identify thousands of PA gene clusters. Genomes of PA bacteria encode more carbohydrate metabolism functions and fewer mobile elements than related non-plant associated genomes. We experimentally validated candidates from two sets of PA genes, one involved in plant colonization, the other serving in microbe-microbe competition between PA bacteria. We also identified 64 PA protein domains that potentially mimic plant domains; some are shared with PA fungi and oomycetes. This work expands the genome-based understanding of plant-microbe interactions and provides leads for efficient and sustainable agriculture through microbiome engineering. PMID:29255260

  10. Minke whale genome and aquatic adaptation in cetaceans

    PubMed Central

    Yim, Hyung-Soon; Cho, Yun Sung; Guang, Xuanmin; Kang, Sung Gyun; Jeong, Jae-Yeon; Cha, Sun-Shin; Oh, Hyun-Myung; Lee, Jae-Hak; Yang, Eun Chan; Kwon, Kae Kyoung; Kim, Yun Jae; Kim, Tae Wan; Kim, Wonduck; Jeon, Jeong Ho; Kim, Sang-Jin; Choi, Dong Han; Jho, Sungwoong; Kim, Hak-Min; Ko, Junsu; Kim, Hyunmin; Shin, Young-Ah; Jung, Hyun-Ju; Zheng, Yuan; Wang, Zhuo; Chen, Yan

    2014-01-01

    The shift from terrestrial to aquatic life by whales was a substantial evolutionary event. Here we report the whole-genome sequencing and de novo assembly of the minke whale genome, as well as the whole-genome sequences of three minke whales, a fin whale, a bottlenose dolphin and a finless porpoise. Our comparative genomic analysis identified an expansion in the whale lineage of gene families associated with stress-responsive proteins and anaerobic metabolism, whereas gene families related to body hair and sensory receptors were contracted. Our analysis also identified whale-specific mutations in genes encoding antioxidants and enzymes controlling blood pressure and salt concentration. Overall the whale-genome sequences exhibited distinct features that are associated with the physiological and morphological changes needed for life in an aquatic environment, marked by resistance to physiological stresses caused by a lack of oxygen, increased amounts of reactive oxygen species and high salt levels. PMID:24270359

  11. Minke whale genome and aquatic adaptation in cetaceans.

    PubMed

    Yim, Hyung-Soon; Cho, Yun Sung; Guang, Xuanmin; Kang, Sung Gyun; Jeong, Jae-Yeon; Cha, Sun-Shin; Oh, Hyun-Myung; Lee, Jae-Hak; Yang, Eun Chan; Kwon, Kae Kyoung; Kim, Yun Jae; Kim, Tae Wan; Kim, Wonduck; Jeon, Jeong Ho; Kim, Sang-Jin; Choi, Dong Han; Jho, Sungwoong; Kim, Hak-Min; Ko, Junsu; Kim, Hyunmin; Shin, Young-Ah; Jung, Hyun-Ju; Zheng, Yuan; Wang, Zhuo; Chen, Yan; Chen, Ming; Jiang, Awei; Li, Erli; Zhang, Shu; Hou, Haolong; Kim, Tae Hyung; Yu, Lili; Liu, Sha; Ahn, Kung; Cooper, Jesse; Park, Sin-Gi; Hong, Chang Pyo; Jin, Wook; Kim, Heui-Soo; Park, Chankyu; Lee, Kyooyeol; Chun, Sung; Morin, Phillip A; O'Brien, Stephen J; Lee, Hang; Kimura, Jumpei; Moon, Dae Yeon; Manica, Andrea; Edwards, Jeremy; Kim, Byung Chul; Kim, Sangsoo; Wang, Jun; Bhak, Jong; Lee, Hyun Sook; Lee, Jung-Hyun

    2014-01-01

    The shift from terrestrial to aquatic life by whales was a substantial evolutionary event. Here we report the whole-genome sequencing and de novo assembly of the minke whale genome, as well as the whole-genome sequences of three minke whales, a fin whale, a bottlenose dolphin and a finless porpoise. Our comparative genomic analysis identified an expansion in the whale lineage of gene families associated with stress-responsive proteins and anaerobic metabolism, whereas gene families related to body hair and sensory receptors were contracted. Our analysis also identified whale-specific mutations in genes encoding antioxidants and enzymes controlling blood pressure and salt concentration. Overall the whale-genome sequences exhibited distinct features that are associated with the physiological and morphological changes needed for life in an aquatic environment, marked by resistance to physiological stresses caused by a lack of oxygen, increased amounts of reactive oxygen species and high salt levels.

  12. Comparative genomic analysis of novel bacteriophages infecting Vibrio parahaemolyticus isolated from western and southern coastal areas of Korea.

    PubMed

    Yu, Junhyeok; Lim, Jeong-A; Kwak, Su-Jin; Park, Jong-Hyun; Chang, Hyun-Joo

    2018-05-01

    Vibrio parahaemolyticus, a foodborne pathogen, has become resistant to antibiotics. Therefore, alternative bio-control agents such as bacteriophages are urgently needed for its control. Six novel bacteriophages specific to V. parahaemolyticus (vB_VpaP_KF1~2, vB_VpaS_KF3~6) were characterized at the molecular level in this study. Genomic similarity analysis revealed that these six bacteriophages could be divided into two groups with different genomic features, phylogenetic grouping, and morphologies. The two groups of bacteriophages had their own genes with different mechanisms for infection, assembly, and metabolism. Our results could be used as a future reference to study phage genomics or apply phages in future bio-control studies.

  13. Bolbase: a comprehensive genomics database for Brassica oleracea

    PubMed Central

    2013-01-01

    Background: Brassica oleracea is a morphologically diverse species in the family Brassicaceae and contains a group of nutrition-rich vegetable crops, including common heading cabbage, cauliflower, broccoli, kohlrabi, kale, and Brussels sprouts. This diversity, along with its phylogenetic membership in a group of three diploid and three tetraploid species and the recent availability of genome sequences within Brassica, provides an unprecedented opportunity to study intra- and inter-species divergence and evolution in this species and its close relatives. Description: We have developed a comprehensive database, Bolbase, which provides access to the B. oleracea genome data and comparative genomics information. The whole genome of B. oleracea is available, including nine fully assembled chromosomes and 1,848 scaffolds, with 45,758 predicted genes, 13,382 transposable elements, and 3,581 non-coding RNAs. Comparative genomics information is available, including syntenic regions among B. oleracea, Brassica rapa and Arabidopsis thaliana, synonymous (Ks) and non-synonymous (Ka) substitution rates between orthologous gene pairs, gene families or clusters, and differences in quantity, category, and distribution of transposable elements on chromosomes. Bolbase provides useful search and data mining tools, including a keyword search, a local BLAST server, and a customized GBrowse tool, which can be used to extract annotations of genome components, identify similar sequences and visualize syntenic regions among species. Users can download all genomic data and explore comparative genomics in a highly visual setting. Conclusions: Bolbase is the first resource platform for the B. oleracea genome and for genomic comparisons with its relatives, and thus it will help the research community to better study the function and evolution of Brassica genomes as well as enhance molecular breeding research.
This database will be updated regularly with new features, improvements to genome annotation, and new genomic sequences as they become available. Bolbase is freely available at http://ocri-genomics.org/bolbase. PMID:24079801

  14. Mammalian Comparative Genomics Reveals Genetic and Epigenetic Features Associated with Genome Reshuffling in Rodentia

    PubMed Central

    Capilla, Laia; Sánchez-Guillén, Rosa Ana; Farré, Marta; Paytuví-Gallart, Andreu; Malinverni, Roberto; Ventura, Jacint; Larkin, Denis M.

    2016-01-01

    Understanding how mammalian genomes have been reshuffled through structural changes is fundamental to the dynamics of its composition, evolutionary relationships between species and, in the long run, speciation. In this work, we reveal the evolutionary genomic landscape in Rodentia, the most diverse and speciose mammalian order, by whole-genome comparisons of six rodent species and six representative outgroup mammalian species. The reconstruction of the evolutionary breakpoint regions across rodent phylogeny shows an increased rate of genome reshuffling that is approximately two orders of magnitude greater than in other mammalian species here considered. We identified novel lineage and clade-specific breakpoint regions within Rodentia and analyzed their gene content, recombination rates and their relationship with constitutive lamina genomic associated domains, DNase I hypersensitivity sites and chromatin modifications. We detected an accumulation of protein-coding genes in evolutionary breakpoint regions, especially genes implicated in reproduction and pheromone detection and mating. Moreover, we found an association of the evolutionary breakpoint regions with active chromatin state landscapes, most probably related to gene enrichment. Our results have two important implications for understanding the mechanisms that govern and constrain mammalian genome evolution. The first is that the presence of genes related to species-specific phenotypes in evolutionary breakpoint regions reinforces the adaptive value of genome reshuffling. Second, that chromatin conformation, an aspect that has been often overlooked in comparative genomic studies, might play a role in modeling the genomic distribution of evolutionary breakpoints. PMID:28175287

  15. Mammalian Comparative Genomics Reveals Genetic and Epigenetic Features Associated with Genome Reshuffling in Rodentia.

    PubMed

    Capilla, Laia; Sánchez-Guillén, Rosa Ana; Farré, Marta; Paytuví-Gallart, Andreu; Malinverni, Roberto; Ventura, Jacint; Larkin, Denis M; Ruiz-Herrera, Aurora

    2016-12-01

    Understanding how mammalian genomes have been reshuffled through structural changes is fundamental to the dynamics of its composition, evolutionary relationships between species and, in the long run, speciation. In this work, we reveal the evolutionary genomic landscape in Rodentia, the most diverse and speciose mammalian order, by whole-genome comparisons of six rodent species and six representative outgroup mammalian species. The reconstruction of the evolutionary breakpoint regions across rodent phylogeny shows an increased rate of genome reshuffling that is approximately two orders of magnitude greater than in other mammalian species here considered. We identified novel lineage and clade-specific breakpoint regions within Rodentia and analyzed their gene content, recombination rates and their relationship with constitutive lamina genomic associated domains, DNase I hypersensitivity sites and chromatin modifications. We detected an accumulation of protein-coding genes in evolutionary breakpoint regions, especially genes implicated in reproduction and pheromone detection and mating. Moreover, we found an association of the evolutionary breakpoint regions with active chromatin state landscapes, most probably related to gene enrichment. Our results have two important implications for understanding the mechanisms that govern and constrain mammalian genome evolution. The first is that the presence of genes related to species-specific phenotypes in evolutionary breakpoint regions reinforces the adaptive value of genome reshuffling. Second, that chromatin conformation, an aspect that has been often overlooked in comparative genomic studies, might play a role in modeling the genomic distribution of evolutionary breakpoints.

  16. The Genome of the “Great Speciator” Provides Insights into Bird Diversification

    PubMed Central

    Cornetti, Luca; Valente, Luis M.; Dunning, Luke T.; Quan, Xueping; Black, Richard A.; Hébert, Olivier; Savolainen, Vincent

    2015-01-01

    Among birds, white-eyes (genus Zosterops) have diversified so extensively that Jared Diamond and Ernst Mayr referred to them as the “great speciator.” The Zosterops lineage exhibits some of the fastest rates of species diversification among vertebrates, and its members are the most prolific passerine island colonizers. We present a high-quality genome assembly for the silvereye (Zosterops lateralis), a white-eye species consisting of several subspecies distributed across multiple islands. We investigate the genetic basis of rapid diversification in white-eyes by conducting genomic analyses at varying taxonomic levels. First, we compare the silvereye genome with those of birds from different families and search for genomic features that may be unique to Zosterops. Second, we compare the genomes of different species of white-eyes from Lifou island (South Pacific), using whole genome resequencing and restriction site associated DNA. Third, we contrast the genomes of two subspecies of silvereye that differ in plumage color. In accordance with theory, we show that white-eyes have high rates of substitutions, gene duplication, and positive selection relative to other birds. Below genus level, we find that genomic differentiation accumulates rapidly and reveals contrasting demographic histories between sympatric species on Lifou, indicative of past interspecific interactions. Finally, we highlight genes possibly involved in color polymorphism between the subspecies of silvereye. By providing the first whole-genome sequence resources for white-eyes and by conducting analyses at different taxonomic levels, we provide genomic evidence underpinning this extraordinary bird radiation. PMID:26338191

  17. Comparative Analysis of the First Complete Enterococcus faecium Genome

    PubMed Central

    Lam, Margaret M. C.; Seemann, Torsten; Bulach, Dieter M.; Gladman, Simon L.; Chen, Honglei; Haring, Volker; Moore, Robert J.; Ballard, Susan; Grayson, M. Lindsay; Johnson, Paul D. R.; Howden, Benjamin P.

    2012-01-01

    Vancomycin-resistant enterococci (VRE) are one of the leading causes of nosocomial infections in health care facilities around the globe. In particular, infections caused by vancomycin-resistant Enterococcus faecium are becoming increasingly common. Comparative and functional genomic studies of E. faecium isolates have so far been limited owing to the lack of a fully assembled E. faecium genome sequence. Here we address this issue and report the complete 3.0-Mb genome sequence of the multilocus sequence type 17 vancomycin-resistant Enterococcus faecium strain Aus0004, isolated from the bloodstream of a patient in Melbourne, Australia, in 1998. The genome comprises a 2.9-Mb circular chromosome and three circular plasmids. The chromosome harbors putative E. faecium virulence factors such as enterococcal surface protein, hemolysin, and collagen-binding adhesin. Aus0004 has a very large accessory genome (38%) that includes three prophage and two genomic islands absent among 22 other E. faecium genomes. One of the prophage was present as inverted 50-kb repeats that appear to have facilitated a 683-kb chromosomal inversion across the replication terminus, resulting in a striking replichore imbalance. Other distinctive features include 76 insertion sequence elements and a single chromosomal copy of Tn1549 containing the vanB vancomycin resistance element. A complete E. faecium genome will be a useful resource to assist our understanding of this emerging nosocomial pathogen. PMID:22366422

  18. Sequencing of the Litchi Downy Blight Pathogen Reveals It Is a Phytophthora Species With Downy Mildew-Like Characteristics.

    PubMed

    Ye, Wenwu; Wang, Yang; Shen, Danyu; Li, Delong; Pu, Tianhuizi; Jiang, Zide; Zhang, Zhengguang; Zheng, Xiaobo; Tyler, Brett M; Wang, Yuanchao

    2016-07-01

    On the basis of its downy mildew-like morphology, the litchi downy blight pathogen was previously named Peronophythora litchii. Recently, however, it was proposed to transfer this pathogen to Phytophthora clade 4. To better characterize this unusual oomycete species and important fruit pathogen, we obtained the genome sequence of Phytophthora litchii and compared it to those from other oomycete species. P. litchii has a small genome with tightly spaced genes. On the basis of a multilocus phylogenetic analysis, the placement of P. litchii in the genus Phytophthora is strongly supported. Effector proteins predicted included 245 RxLR, 30 necrosis-and-ethylene-inducing protein-like, and 14 crinkler proteins. The typical motifs, phylogenies, and activities of these effectors were typical for a Phytophthora species. However, like the genome features of the analyzed downy mildews, P. litchii exhibited a streamlined genome with a relatively small number of genes in both core and species-specific protein families. The low GC content and slight codon preferences of P. litchii sequences were similar to those of the analyzed downy mildews and a subset of Phytophthora species. Taken together, these observations suggest that P. litchii is a Phytophthora pathogen that is in the process of acquiring downy mildew-like genomic and morphological features. Thus P. litchii may provide a novel model for investigating morphological development and genomic adaptation in oomycete pathogens.

  19. Obligate Biotrophy Features Unraveled by the Genomic Analysis of the Rust Fungi, Melampsora larici-populina and Puccinia graminis f. sp. tritici

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Duplessis, Sebastien; Cuomo, Christina A.; Lin, Yao-Cheng

    Rust fungi are some of the most devastating pathogens of crop plants. They are obligate biotrophs, which extract nutrients only from living plant tissues and cannot grow apart from their hosts. Their lifestyle has slowed the dissection of molecular mechanisms underlying host invasion and avoidance or suppression of plant innate immunity. We sequenced the 101 mega base pair genome of Melampsora larici-populina, the causal agent of poplar leaf rust, and the 89 mega base pair genome of Puccinia graminis f. sp. tritici, the causal agent of wheat and barley stem rust. We then compared the 16,841 predicted proteins of M. larici-populina to the 18,241 predicted proteins of P. graminis f. sp. tritici. Genomic features related to their obligate biotrophic life-style include expanded lineage-specific gene families, a large repertoire of effector-like small secreted proteins (SSPs), impaired nitrogen and sulfur assimilation pathways, and expanded families of amino-acid, oligopeptide and hexose membrane transporters. The dramatic upregulation of transcripts coding for SSPs, secreted hydrolytic enzymes, and transporters in planta suggests that they play a role in host infection and nutrient acquisition. Some of these genomic hallmarks are mirrored in the genomes of other microbial eukaryotes that have independently evolved to infect plants, indicating convergent adaptation to a biotrophic existence inside plant cells.

  20. Mapping genomic features to functional traits through microbial whole genome sequences.

    PubMed

    Zhang, Wei; Zeng, Erliang; Liu, Dan; Jones, Stuart E; Emrich, Scott

    2014-01-01

    Recently, the utility of trait-based approaches for microbial communities has been identified. Increasing availability of whole genome sequences provides the opportunity to explore the genetic foundations of a variety of functional traits. We proposed a machine learning framework to quantitatively link the genomic features with functional traits. Genes from bacterial genomes belonging to different functional traits were grouped into Clusters of Orthologous Groups (COGs), which were used as features. Then, the TF-IDF technique from the text mining domain was applied to transform the data to accommodate the abundance and importance of each COG. After TF-IDF processing, COGs were ranked using feature selection methods to identify their relevance to the functional trait of interest. Extensive experimental results demonstrated that functional trait related genes can be detected using our method. Further, the method has the potential to provide novel biological insights.
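
    The TF-IDF step described above can be illustrated with a minimal sketch. It treats each genome as a "document" and each COG as a "term", and uses the textbook weighting (relative frequency times the log of inverse document frequency). The abstract does not state which TF-IDF variant the authors used, so the formula, the function name cog_tfidf, and the toy genome and COG labels are all assumptions made for illustration.

```python
import math


def cog_tfidf(cog_counts):
    """Return TF-IDF weights for a {genome: {cog_id: gene_count}} mapping.

    TF  = count of the COG in the genome / total COG-assigned genes in the genome
    IDF = ln(number of genomes / number of genomes containing the COG)
    This is the textbook form, assumed here; the paper may use another variant.
    """
    n_genomes = len(cog_counts)

    # Document frequency: in how many genomes does each COG occur?
    doc_freq = {}
    for counts in cog_counts.values():
        for cog, count in counts.items():
            if count > 0:
                doc_freq[cog] = doc_freq.get(cog, 0) + 1

    weights = {}
    for genome, counts in cog_counts.items():
        total = sum(counts.values())
        weights[genome] = {
            cog: (count / total) * math.log(n_genomes / doc_freq[cog])
            for cog, count in counts.items()
            if count > 0
        }
    return weights


# Hypothetical toy data: three genomes, three placeholder COG labels.
toy_counts = {
    "g1": {"COG_A": 2, "COG_B": 1},
    "g2": {"COG_A": 1, "COG_C": 3},
    "g3": {"COG_A": 1, "COG_B": 1},
}
weights = cog_tfidf(toy_counts)
```

    In this toy input COG_A occurs in every genome, so its weight is zero everywhere, while COG_C, which is abundant in a single genome, receives the largest weight. That is the behaviour that makes the transform useful before ranking COGs against a trait.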

  1. SVGenes: a library for rendering genomic features in scalable vector graphic format.

    PubMed

    Etherington, Graham J; MacLean, Daniel

    2013-08-01

    Drawing genomic features in attractive and informative ways is a key task in visualization of genomics data. Scalable Vector Graphics (SVG) format is a modern and flexible open standard that provides advanced features including modular graphic design, advanced web interactivity and animation within a suitable client. SVGs do not suffer from loss of image quality on re-scaling and provide the ability to edit individual elements of a graphic on the whole object level independent of the whole image. These features make SVG a potentially useful format for the preparation of publication quality figures including genomic objects such as genes or sequencing coverage and for web applications that require rich user-interaction with the graphical elements. SVGenes is a Ruby-language library that uses SVG primitives to render typical genomic glyphs through a simple and flexible Ruby interface. The library implements a simple Page object that spaces and contains horizontal Track objects that in turn style, colour and position features within them. Tracks are the level at which visual information is supplied providing the full styling capability of the SVG standard. Genomic entities like genes, transcripts and histograms are modelled in Glyph objects that are attached to a track and take advantage of SVG primitives to render the genomic features in a track as any of a selection of defined glyphs. The feature model within SVGenes is simple but flexible and not dependent on particular existing gene feature formats meaning graphics for any existing datasets can easily be created without need for conversion. The library is provided as a Ruby Gem from https://rubygems.org/gems/bio-svgenes under the MIT license, and open source code is available at https://github.com/danmaclean/bioruby-svgenes also under the MIT License.
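
    The page/track/feature layering described above can be sketched in a few lines. The following is not the SVGenes API (SVGenes is a Ruby library and its actual classes and method names are not shown in the abstract). It is a hypothetical Python analogue in which the names Feature, Track, Page and to_svg are invented for illustration, and every feature is drawn as a plain SVG rectangle.

```python
from dataclasses import dataclass, field


@dataclass
class Feature:
    start: int  # genomic start coordinate
    end: int    # genomic end coordinate


@dataclass
class Track:
    fill: str                                  # styling lives at track level
    features: list = field(default_factory=list)


@dataclass
class Page:
    width: int                                 # drawing width in SVG user units
    track_height: int = 20
    tracks: list = field(default_factory=list)

    def to_svg(self, region_start, region_end):
        """Render all tracks for the given genomic region as an SVG string."""
        scale = self.width / (region_end - region_start)
        height = self.track_height * len(self.tracks)
        parts = [
            f'<svg xmlns="http://www.w3.org/2000/svg" '
            f'width="{self.width}" height="{height}">'
        ]
        for row, track in enumerate(self.tracks):
            y = row * self.track_height  # the page stacks tracks vertically
            for feat in track.features:
                x = (feat.start - region_start) * scale
                w = (feat.end - feat.start) * scale
                parts.append(
                    f'<rect x="{x:g}" y="{y}" width="{w:g}" '
                    f'height="{self.track_height - 4}" fill="{track.fill}"/>'
                )
        parts.append("</svg>")
        return "\n".join(parts)


page = Page(width=100, tracks=[Track(fill="steelblue", features=[Feature(10, 30)])])
svg = page.to_svg(0, 100)
```

    The sketch keeps styling on the track and geometry on the feature, which mirrors the division of labour the abstract describes. A fuller version would swap the rectangle for different glyph shapes.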

  2. Genome-Wide Detection and Analysis of Multifunctional Genes

    PubMed Central

    Pritykin, Yuri; Ghersi, Dario; Singh, Mona

    2015-01-01

    Many genes can play a role in multiple biological processes or molecular functions. Identifying multifunctional genes at the genome-wide level and studying their properties can shed light upon the complexity of molecular events that underpin cellular functioning, thereby leading to a better understanding of the functional landscape of the cell. However, to date, genome-wide analysis of multifunctional genes (and the proteins they encode) has been limited. Here we introduce a computational approach that uses known functional annotations to extract genes playing a role in at least two distinct biological processes. We leverage functional genomics data sets for three organisms—H. sapiens, D. melanogaster, and S. cerevisiae—and show that, as compared to other annotated genes, genes involved in multiple biological processes possess distinct physicochemical properties, are more broadly expressed, tend to be more central in protein interaction networks, tend to be more evolutionarily conserved, and are more likely to be essential. We also find that multifunctional genes are significantly more likely to be involved in human disorders. These same features also hold when multifunctionality is defined with respect to molecular functions instead of biological processes. Our analysis uncovers key features about multifunctional genes, and is a step towards a better genome-wide understanding of gene multifunctionality. PMID:26436655

  3. EuPathDB: the eukaryotic pathogen genomics database resource

    PubMed Central

    Aurrecoechea, Cristina; Barreto, Ana; Basenko, Evelina Y.; Brestelli, John; Brunk, Brian P.; Cade, Shon; Crouch, Kathryn; Doherty, Ryan; Falke, Dave; Fischer, Steve; Gajria, Bindu; Harb, Omar S.; Heiges, Mark; Hertz-Fowler, Christiane; Hu, Sufen; Iodice, John; Kissinger, Jessica C.; Lawrence, Cris; Li, Wei; Pinney, Deborah F.; Pulman, Jane A.; Roos, David S.; Shanmugasundram, Achchuthan; Silva-Franco, Fatima; Steinbiss, Sascha; Stoeckert, Christian J.; Spruill, Drew; Wang, Haiming; Warrenfeltz, Susanne; Zheng, Jie

    2017-01-01

    The Eukaryotic Pathogen Genomics Database Resource (EuPathDB, http://eupathdb.org) is a collection of databases covering 170+ eukaryotic pathogens (protists & fungi), along with relevant free-living and non-pathogenic species, and select pathogen hosts. To facilitate the discovery of meaningful biological relationships, the databases couple preconfigured searches with visualization and analysis tools for comprehensive data mining via intuitive graphical interfaces and APIs. All data are analyzed with the same workflows, including creation of gene orthology profiles, so data are easily compared across data sets, data types and organisms. EuPathDB is updated with numerous new analysis tools, features, data sets and data types. New tools include GO, metabolic pathway and word enrichment analyses plus an online workspace for analysis of personal, non-public, large-scale data. Expanded data content is mostly genomic and functional genomic data while new data types include protein microarray, metabolic pathways, compounds, quantitative proteomics, copy number variation, and polysomal transcriptomics. New features include consistent categorization of searches, data sets and genome browser tracks; redesigned gene pages; effective integration of alternative transcripts; and a EuPathDB Galaxy instance for private analyses of a user's data. Forthcoming upgrades include user workspaces for private integration of data with existing EuPathDB data and improved integration and presentation of host–pathogen interactions. PMID:27903906

  4. Compactness of viral genomes: effect of disperse and localized random mutations

    NASA Astrophysics Data System (ADS)

    Lošdorfer Božič, Anže; Micheletti, Cristian; Podgornik, Rudolf; Tubiana, Luca

    2018-02-01

    Genomes of single-stranded RNA viruses have evolved to optimize several concurrent properties. One of them is the architecture of their genomic folds, which must not only feature precise structural elements at specific positions, but also allow for overall spatial compactness. The latter was shown to be disrupted by random synonymous mutations, a disruption which can consequently negatively affect genome encapsidation. In this study, we use three mutation schemes with different degrees of locality to mutate the genomes of phage MS2 and Brome Mosaic virus in order to understand the observed sensitivity of the global compactness of their folds. We find that mutating local stretches of their genomes’ sequence or structure is less disruptive to their compactness compared to inducing randomly-distributed mutations. Our findings are indicative of a mechanism for the conservation of compactness acting on a global scale of the genomes, and have several implications for understanding the interplay between local and global architecture of viral RNA genomes.

  5. Genomic Definition of Hypervirulent and Multidrug-Resistant Klebsiella pneumoniae Clonal Groups

    PubMed Central

    Bialek-Davenet, Suzanne; Criscuolo, Alexis; Ailloud, Florent; Passet, Virginie; Jones, Louis; Delannoy-Vieillard, Anne-Sophie; Garin, Benoit; Le Hello, Simon; Arlet, Guillaume; Nicolas-Chanoine, Marie-Hélène; Decré, Dominique

    2014-01-01

    Multidrug-resistant and highly virulent Klebsiella pneumoniae isolates are emerging, but the clonal groups (CGs) corresponding to these high-risk strains have remained imprecisely defined. We aimed to identify K. pneumoniae CGs on the basis of genome-wide sequence variation and to provide a simple bioinformatics tool to extract virulence and resistance gene data from genomic data. We sequenced 48 K. pneumoniae isolates, mostly of serotypes K1 and K2, and compared the genomes with 119 publicly available genomes. A total of 694 highly conserved genes were included in a core-genome multilocus sequence typing scheme, and cluster analysis of the data enabled precise definition of globally distributed hypervirulent and multidrug-resistant CGs. In addition, we created a freely accessible database, BIGSdb-Kp, to enable rapid extraction of medically and epidemiologically relevant information from genomic sequences of K. pneumoniae. Although drug-resistant and virulent K. pneumoniae populations were largely nonoverlapping, isolates with combined virulence and resistance features were detected. PMID:25341126
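
    The idea behind grouping isolates by a core-genome MLST scheme can be sketched as follows. Each isolate is reduced to a profile of allele numbers, one per locus, and isolates whose profiles differ at few loci are placed in the same group. The abstract does not describe the clustering algorithm or threshold, so the single-linkage rule, the threshold value, and the isolate and locus names below are assumptions made for illustration.

```python
def allelic_distance(profile_a, profile_b):
    """Fraction of loci called in both profiles whose allele numbers differ."""
    shared = [
        locus
        for locus in profile_a
        if profile_a[locus] is not None and profile_b.get(locus) is not None
    ]
    if not shared:
        raise ValueError("profiles share no called loci")
    return sum(profile_a[l] != profile_b[l] for l in shared) / len(shared)


def single_linkage_groups(profiles, threshold):
    """Group isolates linked by any chain of pairwise distances <= threshold."""
    names = list(profiles)
    parent = {name: name for name in names}

    def find(name):
        # Union-find lookup with path halving.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if allelic_distance(profiles[a], profiles[b]) <= threshold:
                parent[find(a)] = find(b)

    groups = {}
    for name in names:
        groups.setdefault(find(name), set()).add(name)
    return sorted(groups.values(), key=lambda g: sorted(g))


# Hypothetical toy profiles over four loci (a real scheme here has 694 genes).
toy_profiles = {
    "iso1": {"locus1": 1, "locus2": 1, "locus3": 1, "locus4": 1},
    "iso2": {"locus1": 1, "locus2": 1, "locus3": 1, "locus4": 2},
    "iso3": {"locus1": 5, "locus2": 6, "locus3": 7, "locus4": 8},
}
groups = single_linkage_groups(toy_profiles, threshold=0.25)
```

    Missing allele calls are represented by None and skipped, so incomplete assemblies do not inflate distances.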

  6. Improving de novo sequence assembly using machine learning and comparative genomics for overlap correction.

    PubMed

    Palmer, Lance E; Dejori, Mathaeus; Bolanos, Randall; Fasulo, Daniel

    2010-01-15

    With the rapid expansion of DNA sequencing databases, it is now feasible to identify relevant information from prior sequencing projects and completed genomes and apply it to de novo sequencing of new organisms. As an example, this paper demonstrates how such extra information can be used to improve de novo assemblies by augmenting the overlapping step. Finding all pairs of overlapping reads is a key task in many genome assemblers, and to this end, highly efficient algorithms have been developed to find alignments in large collections of sequences. It is well known that due to repeated sequences, many aligned pairs of reads nevertheless do not overlap. But no overlapping algorithm to date takes a rigorous approach to separating aligned but non-overlapping read pairs from true overlaps. We present an approach that extends the Minimus assembler by a data driven step to classify overlaps as true or false prior to contig construction. We trained several different classification models within the Weka framework using various statistics derived from overlaps of reads available from prior sequencing projects. These statistics included percent mismatch and k-mer frequencies within the overlaps as well as a comparative genomics score derived from mapping reads to multiple reference genomes. We show that in real whole-genome sequencing data from the E. coli and S. aureus genomes, by providing a curated set of overlaps to the contigging phase of the assembler, we nearly doubled the median contig length (N50) without sacrificing coverage of the genome or increasing the number of mis-assemblies. Machine learning methods that use comparative and non-comparative features to classify overlaps as true or false can be used to improve the quality of a sequence assembly.
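
    Two quantities mentioned above are simple to compute and are sketched here under stated assumptions. The first function derives two of the named overlap statistics (percent mismatch and k-mer counts) from an ungapped overlap; the paper's exact feature definitions are not given in the abstract, so the function name, the choice k=3, and the ungapped-alignment simplification are illustrative. The second function computes N50 in its usual sense: the contig length at which contigs of that length or longer contain at least half of the assembled bases. This is a length-weighted statistic rather than a plain median of contig lengths.

```python
from collections import Counter


def overlap_features(read_a_part, read_b_part, k=3):
    """Features for the aligned, equal-length portions of two overlapping reads."""
    if not read_a_part or len(read_a_part) != len(read_b_part):
        raise ValueError("expected two non-empty strings of equal length")
    mismatches = sum(x != y for x, y in zip(read_a_part, read_b_part))
    kmers = Counter(
        read_a_part[i:i + k] for i in range(len(read_a_part) - k + 1)
    )
    return {
        "percent_mismatch": 100.0 * mismatches / len(read_a_part),
        "kmer_counts": kmers,
    }


def n50(contig_lengths):
    """Length L such that contigs of length >= L hold at least half the bases."""
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length
    raise ValueError("no contigs given")


features = overlap_features("ACGTACGT", "ACGTACGA")
assembly_n50 = n50([2, 2, 2, 3, 3, 4, 8, 8])
```

    A feature dictionary of this kind would be one row of the training table passed to a classifier; the classifier itself (the authors used models from the Weka framework) is outside the scope of this sketch.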

  7. Complete genome sequence and comparative analysis of Acetobacter pasteurianus 386B, a strain well-adapted to the cocoa bean fermentation ecosystem.

    PubMed

    Illeghems, Koen; De Vuyst, Luc; Weckx, Stefan

    2013-08-01

    Acetobacter pasteurianus 386B, an acetic acid bacterium originating from a spontaneous cocoa bean heap fermentation, proved to be an ideal functional starter culture for cocoa bean fermentations. It is able to dominate the fermentation process, thereby resisting high acetic acid concentrations and temperatures. However, the molecular mechanisms underlying its metabolic capabilities and niche adaptations are unknown. In this study, whole-genome sequencing and comparative genome analysis were used to investigate this strain's mechanisms to dominate the cocoa bean fermentation process. The genome sequence of A. pasteurianus 386B is composed of a 2.8-Mb chromosome and seven plasmids. The annotation of 2875 protein-coding sequences revealed important characteristics, including several metabolic pathways, the occurrence of strain-specific genes such as an endopolygalacturonase, and the presence of mechanisms involved in tolerance towards various stress conditions. Furthermore, the low number of transposases in the genome and the absence of complete phage genomes indicate that this strain might be more genetically stable compared with other A. pasteurianus strains, which is an important advantage for the use of this strain as a functional starter culture. Comparative genome analysis with other members of the Acetobacteraceae confirmed the functional properties of A. pasteurianus 386B, such as its thermotolerant nature and unique genetic composition. Genome analysis of A. pasteurianus 386B provided detailed insights into the underlying mechanisms of its metabolic features, niche adaptations, and tolerance towards stress conditions. Combination of these data with previous experimental knowledge enabled an integrated, global overview of the functional characteristics of this strain. This knowledge will enable improved fermentation strategies and selection of appropriate acetic acid bacteria strains as functional starter culture for cocoa bean fermentation processes.

  8. Lactobacillus rossiae, a Vitamin B12 Producer, Represents a Metabolically Versatile Species within the Genus Lactobacillus

    PubMed Central

    De Angelis, Maria; Bottacini, Francesca; Fosso, Bruno; Kelleher, Philip; Calasso, Maria; Di Cagno, Raffaella; Ventura, Marco; Picardi, Ernesto; van Sinderen, Douwe; Gobbetti, Marco

    2014-01-01

    Lactobacillus rossiae is an obligately hetero-fermentative lactic acid bacterium, which can be isolated from a broad range of environments including sourdoughs, vegetables, fermented meat and flour, as well as the gastrointestinal tract of both humans and animals. In order to unravel distinctive genomic features of this particular species and investigate the phylogenetic positioning within the genus Lactobacillus, comparative genomics and phylogenomic approaches, followed by functional analyses were performed on L. rossiae DSM 15814T, showing how this type strain not only occupies an independent phylogenetic branch, but also possesses genomic features underscoring its biotechnological potential. This strain in fact represents one of a small number of bacteria known to encode a complete de novo biosynthetic pathway of vitamin B12 (in addition to other B vitamins such as folate and riboflavin). In addition, it possesses the capacity to utilize an extensive set of carbon sources, a characteristic that may contribute to environmental adaptation, perhaps enabling the strain's ability to populate different niches. PMID:25264826

  9. PATRIC: the Comprehensive Bacterial Bioinformatics Resource with a Focus on Human Pathogenic Species

    PubMed Central

    Gillespie, Joseph J.; Wattam, Alice R.; Cammer, Stephen A.; Gabbard, Joseph L.; Shukla, Maulik P.; Dalay, Oral; Driscoll, Timothy; Hix, Deborah; Mane, Shrinivasrao P.; Mao, Chunhong; Nordberg, Eric K.; Scott, Mark; Schulman, Julie R.; Snyder, Eric E.; Sullivan, Daniel E.; Wang, Chunxia; Warren, Andrew; Williams, Kelly P.; Xue, Tian; Seung Yoo, Hyun; Zhang, Chengdong; Zhang, Yan; Will, Rebecca; Kenyon, Ronald W.; Sobral, Bruno W.

    2011-01-01

    Funded by the National Institute of Allergy and Infectious Diseases, the Pathosystems Resource Integration Center (PATRIC) is a genomics-centric relational database and bioinformatics resource designed to assist scientists in infectious-disease research. Specifically, PATRIC provides scientists with (i) a comprehensive bacterial genomics database, (ii) a plethora of associated data relevant to genomic analysis, and (iii) an extensive suite of computational tools and platforms for bioinformatics analysis. While the primary aim of PATRIC is to advance the knowledge underlying the biology of human pathogens, all publicly available genome-scale data for bacteria are compiled and continually updated, thereby enabling comparative analyses to reveal the basis for differences between infectious free-living and commensal species. Herein we summarize the major features available at PATRIC, dividing the resources into two major categories: (i) organisms, genomes, and comparative genomics and (ii) recurrent integration of community-derived associated data. Additionally, we present two experimental designs typical of bacterial genomics research and report on the execution of both projects using only PATRIC data and tools. These applications encompass a broad range of the data and analysis tools available, illustrating practical uses of PATRIC for the biologist. Finally, a summary of PATRIC's outreach activities, collaborative endeavors, and future research directions is provided. PMID:21896772

  10. Predicting conformational ensembles and genome-wide transcription factor binding sites from DNA sequences.

    PubMed

    Andrabi, Munazah; Hutchins, Andrew Paul; Miranda-Saavedra, Diego; Kono, Hidetoshi; Nussinov, Ruth; Mizuguchi, Kenji; Ahmad, Shandar

    2017-06-22

    DNA shape is emerging as an important determinant of transcription factor binding beyond just the DNA sequence. The only tool for large-scale DNA shape estimates, DNAshape, was derived from Monte-Carlo simulations and predicts four broad and static DNA shape features: Propeller twist, Helical twist, Minor groove width and Roll. The contributions of other shape features, e.g., Shift, Slide and Opening, cannot be evaluated using DNAshape. Here, we report a novel method, DynaSeq, which predicts molecular dynamics-derived ensembles of a more exhaustive set of DNA shape features. We compared the DNAshape and DynaSeq predictions for the common features and applied both to predict the genome-wide binding sites of 1312 TFs available from protein interaction quantification (PIQ) data. The results indicate a good agreement between the two methods for the common shape features and point to advantages in using DynaSeq. Predictive models employing ensembles from individual conformational parameters revealed that base-pair opening - known to be important in strand separation - was the best predictor of transcription factor-binding sites (TFBS) followed by features employed by DNAshape. Of note, TFBS could be predicted not only from the features at the target motif sites, but also from those as far as 200 nucleotides away from the motif.

  11. Insights into archaeal evolution and symbiosis from the genomes of a nanoarchaeon and its inferred crenarchaeal host from Obsidian Pool, Yellowstone National Park.

    PubMed

    Podar, Mircea; Makarova, Kira S; Graham, David E; Wolf, Yuri I; Koonin, Eugene V; Reysenbach, Anna-Louise

    2013-04-22

    A single cultured marine organism, Nanoarchaeum equitans, represents the Nanoarchaeota branch of symbiotic Archaea, with a highly reduced genome and unusual features such as multiple split genes. The first terrestrial hyperthermophilic member of the Nanoarchaeota was collected from Obsidian Pool, a thermal feature in Yellowstone National Park, separated by single cell isolation, and sequenced together with its putative host, a Sulfolobales archaeon. Both the new Nanoarchaeota (Nst1) and N. equitans lack most biosynthetic capabilities, and phylogenetic analysis of ribosomal RNA and protein sequences indicates that the two form a deep-branching archaeal lineage. However, the Nst1 genome is more than 20% larger, and encodes a complete gluconeogenesis pathway as well as the full complement of archaeal flagellum proteins. With a larger genome, a smaller repertoire of split protein encoding genes and no split non-contiguous tRNAs, Nst1 appears to have experienced less severe genome reduction than N. equitans. These findings imply that, rather than representing ancestral characters, the extremely compact genomes and multiple split genes of Nanoarchaeota are derived characters associated with their symbiotic or parasitic lifestyle. The inferred host of Nst1 is potentially autotrophic, with a streamlined genome and simplified central and energetic metabolism as compared to other Sulfolobales. Comparison of the N. equitans and Nst1 genomes suggests that the marine and terrestrial lineages of Nanoarchaeota share a common ancestor that was already a symbiont of another archaeon. The two distinct Nanoarchaeota-host genomic data sets offer novel insights into the evolution of archaeal symbiosis and parasitism, enabling further studies of the cellular and molecular mechanisms of these relationships. This article was reviewed by Patrick Forterre, Bettina Siebers (nominated by Michael Galperin) and Purification Lopez-Garcia.

  12. Genomic correlates of recombination rate and its variability across eight recombination maps in the western honey bee (Apis mellifera L.).

    PubMed

    Ross, Caitlin R; DeFelice, Dominick S; Hunt, Greg J; Ihle, Kate E; Amdam, Gro V; Rueppell, Olav

    2015-02-21

    Meiotic recombination has traditionally been explained based on the structural requirement to stabilize homologous chromosome pairs to ensure their proper meiotic segregation. Competing hypotheses seek to explain the emerging findings of significant heterogeneity in recombination rates within and between genomes, but intraspecific comparisons of genome-wide recombination patterns are rare. The honey bee (Apis mellifera) exhibits the highest rate of genomic recombination among multicellular animals with about five cross-over events per chromatid. Here, we present a comparative analysis of recombination rates across eight genetic linkage maps of the honey bee genome to investigate which genomic sequence features are correlated with recombination rate and with its variation across the eight data sets, which have average marker spacings ranging from 1 Mbp to 120 kbp. Overall, we found that GC content explained best the variation in local recombination rate along chromosomes at the analyzed 100 kbp scale. In contrast, variation among the different maps was correlated to the abundance of microsatellites and several specific tri- and tetra-nucleotides. The combined evidence from eight medium-scale recombination maps of the honey bee genome suggests that recombination rate variation in this highly recombining genome might be due to the DNA configuration instead of distinct sequence motifs. However, more fine-scale analyses are needed. The empirical basis of eight differing genetic maps allowed for robust conclusions about the correlates of the local recombination rates and enabled the study of the relation between DNA features and variability in local recombination rates, which is particularly relevant in the honey bee genome with its exceptionally high recombination rate.
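
    The kind of analysis summarized above, relating GC content in fixed windows to local recombination rate, can be sketched as follows. The abstract does not say which statistic or software was used, so the Pearson correlation, the function names, and the toy sequence are assumptions made for illustration. A real analysis would use window=100_000 on chromosome sequences rather than the 4 bp toy window shown here.

```python
import math


def gc_content(seq):
    """GC fraction among unambiguous bases; None if the window has none."""
    seq = seq.upper()
    called = sum(seq.count(base) for base in "ACGT")
    if called == 0:
        return None
    return (seq.count("G") + seq.count("C")) / called


def windowed_gc(seq, window):
    """GC fraction for consecutive non-overlapping full-length windows."""
    return [
        gc_content(seq[i:i + window])
        for i in range(0, len(seq) - window + 1, window)
    ]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for a constant list")
    return sxy / math.sqrt(sxx * syy)


gc_values = windowed_gc("GGCCAATTGCAT", window=4)
```

    The resulting list can be passed to pearson together with one recombination-rate estimate per window, for example the local rate in cM/Mb derived from a linkage map.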

  13. Genomic Features of the Damselfly Calopteryx splendens Representing a Sister Clade to Most Insect Orders

    PubMed Central

    Ioannidis, Panagiotis; Simao, Felipe A.; Waterhouse, Robert M.; Manni, Mosè; Seppey, Mathieu; Robertson, Hugh M.; Misof, Bernhard; Niehuis, Oliver

    2017-01-01

    Insects comprise the most diverse and successful animal group with over one million described species that are found in almost every terrestrial and limnic habitat, with many being used as important models in genetics, ecology, and evolutionary research. Genome sequencing projects have greatly expanded the sampling of species from many insect orders, but genomic resources for species of certain insect lineages have remained relatively limited to date. To address this paucity, we sequenced the genome of the banded demoiselle, Calopteryx splendens, a damselfly (Odonata: Zygoptera) belonging to Palaeoptera, the clade containing the first winged insects. The 1.6 Gbp C. splendens draft genome assembly is one of the largest insect genomes sequenced to date and encodes a predicted set of 22,523 protein-coding genes. Comparative genomic analyses with other sequenced insects identified a relatively small repertoire of C. splendens detoxification genes, which could explain its previously noted sensitivity to habitat pollution. Intriguingly, this repertoire includes a cytochrome P450 gene not previously described in any insect genome. The C. splendens immune gene repertoire appears relatively complete and features several genes encoding novel multi-domain peptidoglycan recognition proteins. Analysis of chemosensory genes revealed the presence of both gustatory and ionotropic receptors, as well as the insect odorant receptor coreceptor gene (OrCo) and at least four partner odorant receptors (ORs). This represents the oldest known instance of a complete OrCo/OR system in insects, and provides the molecular underpinning for odonate olfaction. The C. splendens genome improves the sampling of insect lineages that diverged before the radiation of Holometabola and offers new opportunities for molecular-level evolutionary, ecological, and behavioral studies. PMID:28137743

  14. Genome Sequence of Azospirillum brasilense CBG497 and Comparative Analyses of Azospirillum Core and Accessory Genomes provide Insight into Niche Adaptation

    PubMed Central

    Wisniewski-Dyé, Florence; Lozano, Luis; Acosta-Cruz, Erika; Borland, Stéphanie; Drogue, Benoît; Prigent-Combaret, Claire; Rouy, Zoé; Barbe, Valérie; Mendoza Herrera, Alberto; González, Victor; Mavingui, Patrick

    2012-01-01

    Bacteria of the genus Azospirillum colonize roots of important cereals and grasses, and promote plant growth by several mechanisms, notably phytohormone synthesis. The genomes of several Azospirillum strains belonging to different species, isolated from various host plants and locations, were recently sequenced and published. In this study, an additional genome of an A. brasilense strain, isolated from maize grown on an alkaline soil in the northeast of Mexico, strain CBG497, was obtained. Comparative genomic analyses were performed on this new genome and three other genomes (A. brasilense Sp245, A. lipoferum 4B and Azospirillum sp. B510). The Azospirillum core genome was established and consists of 2,328 proteins, representing between 30% and 38% of the total encoded proteins within a genome. It is mainly chromosomally-encoded and contains 74% of genes of ancestral origin shared with some aquatic relatives. The non-ancestral part of the core genome is enriched in genes involved in signal transduction, in transport and in metabolism of carbohydrates and amino-acids, and in surface properties, features linked to adaptation in fluctuating environments, such as soil and rhizosphere. Many genes involved in colonization of plant roots, plant-growth promotion (such as those involved in phytohormone biosynthesis), and properties involved in rhizosphere adaptation (such as catabolism of phenolic compounds, uptake of iron) are restricted to a particular strain and/or species, strongly suggesting niche-specific adaptation. PMID:24705077
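
    The core/accessory split reported above reduces to set operations once every protein has been assigned to an orthologous family. The sketch below assumes that assignment has already been done by some upstream tool; the function names and the toy genome and family labels are hypothetical.

```python
def core_and_accessory(families_by_genome):
    """Split protein families into core (in every genome) and accessory."""
    family_sets = [set(families) for families in families_by_genome.values()]
    if not family_sets:
        raise ValueError("no genomes given")
    core = set.intersection(*family_sets)
    pan = set.union(*family_sets)
    return core, pan - core


def core_fraction(families_by_genome, core):
    """Share of each genome's families that belong to the core."""
    return {
        genome: len(core) / len(set(families))
        for genome, families in families_by_genome.items()
    }


# Hypothetical toy input: family identifiers per genome.
toy_genomes = {
    "genomeA": {"fam1", "fam2", "fam3", "fam4"},
    "genomeB": {"fam1", "fam2", "fam5"},
    "genomeC": {"fam1", "fam2", "fam3"},
}
core, accessory = core_and_accessory(toy_genomes)
fractions = core_fraction(toy_genomes, core)
```

    The per-genome fraction corresponds to the "between 30% and 38%" figure in the abstract: the same core set makes up a different share of each genome because total gene content differs between strains. The toy version counts families rather than individual proteins, which is a simplification.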

  15. Complete sequence and analysis of the mitochondrial genome of Hemiselmis andersenii CCMP644 (Cryptophyceae).

    PubMed

    Kim, Eunsoo; Lane, Christopher E; Curtis, Bruce A; Kozera, Catherine; Bowman, Sharen; Archibald, John M

    2008-05-12

    Cryptophytes are an enigmatic group of unicellular eukaryotes with plastids derived by secondary (i.e., eukaryote-eukaryote) endosymbiosis. Cryptophytes are unusual in that they possess four genomes: host cell-derived nuclear and mitochondrial genomes and endosymbiont-derived plastid and 'nucleomorph' genomes. The evolutionary origins of the host and endosymbiont components of cryptophyte algae are at present poorly understood. Thus far, a single complete mitochondrial genome sequence has been determined for the cryptophyte Rhodomonas salina. Here, the second complete mitochondrial genome of the cryptophyte alga Hemiselmis andersenii CCMP644 is presented. The H. andersenii mtDNA is 60,553 bp in size and encodes 30 structural RNAs and 36 protein-coding genes, all located on the same strand. A prominent feature of the genome is the presence of an approximately 20 kbp long intergenic region composed of numerous tandem and dispersed repeat units of between 22-336 bp. Adjacent to these repeats are 27 copies of palindromic sequences predicted to form stable DNA stem-loop structures. One such stem-loop is located near a GC-rich and GC-poor region and may have a regulatory function in replication or transcription. The H. andersenii mtDNA shares a number of features in common with the genome of the cryptophyte Rhodomonas salina, including general architecture, gene content, and the presence of a large repeat region. However, the H. andersenii mtDNA is devoid of inverted repeats and introns, which are present in R. salina. Comparative analyses of the suite of tRNAs encoded in the two genomes reveal that the H. andersenii mtDNA has lost or converted its original trnK(uuu) gene and possesses a trnS-derived 'trnK(uuu)', which appears unable to produce a functional tRNA. Mitochondrial protein coding gene phylogenies strongly support a variety of previously established eukaryotic groups, but fail to resolve the relationships among higher-order eukaryotic lineages.
    Comparison of the H. andersenii and R. salina mitochondrial genomes reveals a number of cryptophyte-specific genomic features, most notably the presence of a large repeat-rich intergenic region. However, unlike R. salina, the H. andersenii mtDNA does not possess introns and lacks a Lys-tRNA, which is presumably imported from the cytosol.

  16. A brief introduction to web-based genome browsers.

    PubMed

    Wang, Jun; Kong, Lei; Gao, Ge; Luo, Jingchu

    2013-03-01

    A genome browser provides a graphical interface for users to browse, search, retrieve and analyze genomic sequence and annotation data. Web-based genome browsers can be classified into general genome browsers covering multiple species and species-specific genome browsers. In this review, we attempt to give an overview of the main functions and features of web-based genome browsers, covering data visualization, retrieval, analysis and customization. To give a brief introduction to multiple-species genome browsers, we describe the user interface and main functions of the Ensembl and UCSC genome browsers using the human alpha-globin gene cluster as an example. We further use the MSU and the Rice-Map genome browsers to show some special features of species-specific genome browsers, taking the rice transcription factor gene OsSPL14 as an example.

  17. Scanning the human genome at kilobase resolution.

    PubMed

    Chen, Jun; Kim, Yeong C; Jung, Yong-Chul; Xuan, Zhenyu; Dworkin, Geoff; Zhang, Yanming; Zhang, Michael Q; Wang, San Ming

    2008-05-01

    Normal genome variation and pathogenic genome alteration frequently affect small regions in the genome. Identifying those genomic changes remains a technical challenge. We report here the development of the DGS (Ditag Genome Scanning) technique for high-resolution analysis of genome structure. The basic features of DGS include (1) use of frequently cutting restriction enzymes to fractionate the genome into small fragments; (2) collection of two tags from the two ends of a given DNA fragment to form a ditag that represents the fragment; (3) application of the 454 sequencing system to obtain a comprehensive ditag sequence collection; (4) determination of the genome origin of ditags by mapping to reference ditags from known genome sequences; (5) use of ditag sequences directly as the sense and antisense PCR primers to amplify the original DNA fragment. To study the relationship between ditags and genome structure, we performed a computational study using the human genome reference sequences as a model, and analyzed the ditags experimentally collected from the well-characterized normal human DNA GM15510 and the leukemic human DNA of Kasumi-1 cells. Our studies show that DGS provides kilobase resolution for studying genome structure with high specificity and high genome coverage. DGS can be applied to validate genome assembly, to compare genome similarity and variation in normal populations, and to identify genomic abnormalities including insertion, inversion, deletion, translocation, and amplification in pathological genomes such as cancer genomes.
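
    Steps (1), (2) and (4) of the abstract above can be illustrated with a small in silico sketch. It is not the authors' pipeline: the function names, the cut-before-the-site rule, and the toy tag length are all assumptions made for illustration only.

```python
from typing import Dict, List, Tuple


def digest(genome: str, site: str) -> List[Tuple[int, str]]:
    """Cut the sequence immediately before every occurrence of `site`.

    Returns (start_offset, fragment) pairs. Real enzymes cut at an
    enzyme-specific position inside the site; cutting before it is a
    simplifying assumption.
    """
    cuts = [0]
    pos = genome.find(site, 1)
    while pos != -1:
        cuts.append(pos)
        pos = genome.find(site, pos + 1)
    cuts.append(len(genome))
    return [(a, genome[a:b]) for a, b in zip(cuts, cuts[1:]) if b > a]


def make_ditag(fragment: str, tag_len: int) -> str:
    """Join the first and last `tag_len` bases of a fragment."""
    return fragment[:tag_len] + fragment[-tag_len:]


def reference_ditags(genome: str, site: str, tag_len: int) -> Dict[str, List[Tuple[int, int]]]:
    """Index every reference ditag by sequence -> [(fragment_start, fragment_length)]."""
    index: Dict[str, List[Tuple[int, int]]] = {}
    for start, frag in digest(genome, site):
        if len(frag) >= 2 * tag_len:  # skip fragments too short to give two tags
            index.setdefault(make_ditag(frag, tag_len), []).append((start, len(frag)))
    return index


def map_ditag(index: Dict[str, List[Tuple[int, int]]], ditag: str) -> List[Tuple[int, int]]:
    """Look up an observed ditag; an empty list means no reference fragment matches."""
    return index.get(ditag, [])
```

    In this toy model, an observed ditag that maps to no reference fragment is a candidate for a structural difference between the sample and the reference, which is the kind of signal the abstract describes DGS as detecting.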

  18. An Exploration into Fern Genome Space.

    PubMed

    Wolf, Paul G; Sessa, Emily B; Marchant, Daniel Blaine; Li, Fay-Wei; Rothfels, Carl J; Sigel, Erin M; Gitzendanner, Matthew A; Visger, Clayton J; Banks, Jo Ann; Soltis, Douglas E; Soltis, Pamela S; Pryer, Kathleen M; Der, Joshua P

    2015-08-26

    Ferns are one of the few remaining major clades of land plants for which a complete genome sequence is lacking. Knowledge of genome space in ferns will enable broad-scale comparative analyses of land plant genes and genomes, provide insights into genome evolution across green plants, and shed light on genetic and genomic features that characterize ferns, such as their high chromosome numbers and large genome sizes. As part of an initial exploration into fern genome space, we used a whole genome shotgun sequencing approach to obtain low-density coverage (∼0.4X to 2X) for six fern species from the Polypodiales (Ceratopteris, Pteridium, Polypodium, Cystopteris), Cyatheales (Plagiogyria), and Gleicheniales (Dipteris). We explore these data to characterize the proportion of the nuclear genome represented by repetitive sequences (including DNA transposons, retrotransposons, ribosomal DNA, and simple repeats) and protein-coding genes, and to extract chloroplast and mitochondrial genome sequences. Such initial sweeps of fern genomes can provide information useful for selecting a promising candidate fern species for whole genome sequencing. We also describe variation of genomic traits across our sample and highlight some differences and similarities in repeat structure between ferns and seed plants. © The Author(s) 2015. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.

  19. An Effective Big Data Supervised Imbalanced Classification Approach for Ortholog Detection in Related Yeast Species

    PubMed Central

    Galpert, Deborah; del Río, Sara; Herrera, Francisco; Ancede-Gallardo, Evys; Antunes, Agostinho; Agüero-Chapin, Guillermin

    2015-01-01

    Orthology detection requires more effective scaling algorithms. In this paper, a set of gene pair features based on similarity measures (alignment scores, sequence length, gene membership to conserved regions, and physicochemical profiles) are combined in a supervised pairwise ortholog detection approach to improve effectiveness considering low ortholog ratios in relation to the possible pairwise comparison between two genomes. In this scenario, big data supervised classifiers managing imbalance between ortholog and nonortholog pair classes allow for an effective scaling solution built from two genomes and extended to other genome pairs. The supervised approach was compared with RBH, RSD, and OMA algorithms by using the following yeast genome pairs: Saccharomyces cerevisiae-Kluyveromyces lactis, Saccharomyces cerevisiae-Candida glabrata, and Saccharomyces cerevisiae-Schizosaccharomyces pombe as benchmark datasets. Because of the large amount of imbalanced data, the building and testing of the supervised model were only possible by using big data supervised classifiers managing imbalance. Evaluation metrics taking low ortholog ratios into account were applied. From the effectiveness perspective, MapReduce Random Oversampling combined with Spark SVM outperformed RBH, RSD, and OMA, probably because of the consideration of gene pair features beyond alignment similarities combined with the advances in big data supervised classification. PMID:26605337
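
    RBH (reciprocal best hits), one of the baselines named in the abstract above, can be sketched in a few lines. This is a generic illustration, not the implementation used in the study: the input is assumed to be a dictionary of pairwise alignment scores, and ties are broken by whichever pair is seen first.

```python
from typing import Dict, Set, Tuple


def reciprocal_best_hits(scores: Dict[Tuple[str, str], float]) -> Set[Tuple[str, str]]:
    """Return gene pairs (a, b) where b is a's best hit and a is b's best hit.

    `scores` maps (gene_in_genome_A, gene_in_genome_B) to an alignment score;
    higher means more similar.
    """
    best_for_a: Dict[str, Tuple[float, str]] = {}
    best_for_b: Dict[str, Tuple[float, str]] = {}
    for (a, b), s in scores.items():
        if a not in best_for_a or s > best_for_a[a][0]:
            best_for_a[a] = (s, b)
        if b not in best_for_b or s > best_for_b[b][0]:
            best_for_b[b] = (s, a)
    # Keep a pair only when the best-hit relation holds in both directions.
    return {(a, b) for a, (_, b) in best_for_a.items() if best_for_b[b][1] == a}
```

    Because RBH uses only the single best alignment score in each direction, it ignores the additional gene-pair features (sequence length, membership in conserved regions, physicochemical profiles) that the supervised approach combines.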

  20. Splicing-Related Features of Introns Serve to Propel Evolution

    PubMed Central

    Luo, Yuping; Li, Chun; Gong, Xi; Wang, Yanlu; Zhang, Kunshan; Cui, Yaru; Sun, Yi Eve; Li, Siguang

    2013-01-01

    The role that spliceosomal intronic structures play in evolution has only begun to be elucidated. Comparative genomic analyses of fungal snoRNA sequences, which are often contained within introns and/or exons, revealed that about one-third of snoRNA-associated introns in three major snoRNA gene clusters manifested polymorphisms, likely resulting from intron loss and gain events during fungal evolution. Genomic deletions can clearly be observed as one mechanism underlying intron and exon loss, as well as generation of complex introns where several introns lie in juxtaposition without intercalating exons. Strikingly, by tracking conserved snoRNAs in introns, we found that some introns had moved from one position to another by excision from donor sites and insertion into target sites elsewhere in the genome without needing transposon structures. This study revealed the origin of many newly gained introns. Moreover, our analyses suggested that intron-containing sequences were more prone to sustainable structural changes than DNA sequences without introns, owing to the ability of introns to jump within the genome via unknown mechanisms. We propose that splicing-related structural features of introns serve as an additional motor to propel evolution. PMID:23516505

  1. Network-constrained group lasso for high-dimensional multinomial classification with application to cancer subtype prediction.

    PubMed

    Tian, Xinyu; Wang, Xuefeng; Chen, Jun

    2014-01-01

    The classic multinomial logit model, commonly used in multiclass regression problems, is restricted to few predictors and does not take into account the relationships among variables. It has limited use for genomic data, where the number of genomic features far exceeds the sample size. Genomic features such as gene expression levels are usually related by an underlying biological network. Efficient use of the network information is important to improve classification performance as well as biological interpretability. We proposed a multinomial logit model that is capable of addressing both the high dimensionality of predictors and the underlying network information. Group lasso was used to induce model sparsity, and a network constraint was imposed to induce smoothness of the coefficients with respect to the underlying network structure. To deal with the non-smoothness of the objective function in optimization, we developed a proximal gradient algorithm for efficient computation. The proposed model was compared to models with no prior structure information in both simulations and a problem of cancer subtype prediction with real gene expression data from The Cancer Genome Atlas (TCGA). The network-constrained model outperformed the traditional ones in both cases.
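
    The abstract above does not give its objective function. One common way to write a group-lasso penalty combined with a network-smoothness penalty is sketched below; the notation (coefficient matrix B, graph Laplacian L, tuning parameters lambda_1 and lambda_2, step size t) is assumed here and may differ from the paper's exact formulation.

```latex
% B is the p x K coefficient matrix (p features, K classes);
% \beta_{j\cdot} is row j (feature j across classes), \beta_{\cdot k} is column k;
% L is the Laplacian of the feature network with edge set E and weights w_{uv}.
\min_{B}\;
  -\frac{1}{n}\sum_{i=1}^{n}\log \Pr(y_i \mid x_i; B)
  \;+\; \lambda_1 \sum_{j=1}^{p} \lVert \beta_{j\cdot} \rVert_2
  \;+\; \lambda_2 \sum_{k=1}^{K} \beta_{\cdot k}^{\top} L\, \beta_{\cdot k},
\qquad
\beta_{\cdot k}^{\top} L\, \beta_{\cdot k}
  = \sum_{(u,v)\in E} w_{uv}\,(\beta_{uk}-\beta_{vk})^{2}.

% Proximal step for the non-smooth group term, applied row by row to the
% gradient-updated point Z with step size t:
\beta_{j\cdot} \leftarrow \max\!\Bigl(0,\; 1 - \frac{t\,\lambda_1}{\lVert z_{j\cdot}\rVert_2}\Bigr)\, z_{j\cdot}.
```

    Under this form, the first penalty removes a feature from all classes at once, while the second shrinks differences between coefficients of features that are connected in the network.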

  2. An Effective Big Data Supervised Imbalanced Classification Approach for Ortholog Detection in Related Yeast Species.

    PubMed

    Galpert, Deborah; Del Río, Sara; Herrera, Francisco; Ancede-Gallardo, Evys; Antunes, Agostinho; Agüero-Chapin, Guillermin

    2015-01-01

    Orthology detection requires more effective scaling algorithms. In this paper, a set of gene pair features based on similarity measures (alignment scores, sequence length, gene membership to conserved regions, and physicochemical profiles) are combined in a supervised pairwise ortholog detection approach to improve effectiveness considering low ortholog ratios in relation to the possible pairwise comparison between two genomes. In this scenario, big data supervised classifiers managing imbalance between ortholog and nonortholog pair classes allow for an effective scaling solution built from two genomes and extended to other genome pairs. The supervised approach was compared with RBH, RSD, and OMA algorithms by using the following yeast genome pairs: Saccharomyces cerevisiae-Kluyveromyces lactis, Saccharomyces cerevisiae-Candida glabrata, and Saccharomyces cerevisiae-Schizosaccharomyces pombe as benchmark datasets. Because of the large amount of imbalanced data, the building and testing of the supervised model were only possible by using big data supervised classifiers managing imbalance. Evaluation metrics taking low ortholog ratios into account were applied. From the effectiveness perspective, MapReduce Random Oversampling combined with Spark SVM outperformed RBH, RSD, and OMA, probably because of the consideration of gene pair features beyond alignment similarities combined with the advances in big data supervised classification.

  3. Comparative Genomics and Transcriptomics Analyses Reveal Divergent Lifestyle Features of Nematode Endoparasitic Fungus Hirsutella minnesotensis

    PubMed Central

    Lai, Yiling; Liu, Keke; Zhang, Xinyu; Zhang, Xiaoling; Li, Kuan; Wang, Niuniu; Shu, Chi; Wu, Yunpeng; Wang, Chengshu; Bushley, Kathryn E.; Xiang, Meichun; Liu, Xingzhong

    2014-01-01

    Hirsutella minnesotensis [Ophiocordycipitaceae (Hypocreales, Ascomycota)] is a dominant endoparasitic fungus whose conidia adhere to and penetrate the second-stage juveniles of soybean cyst nematode. Its genome was de novo sequenced and compared with those of five entomopathogenic fungi in the Hypocreales and three nematode-trapping fungi in the Orbiliales (Ascomycota). The genome of H. minnesotensis is 51.4 Mb, encodes 12,702 genes, and is enriched with transposable elements (up to 32%). Phylogenomic analysis revealed that H. minnesotensis diverged from entomopathogenic fungi in Hypocreales. The genome of H. minnesotensis resembles those of entomopathogenic fungi in having fewer genes encoding lectins for adhesion and glycoside hydrolases for cellulose degradation, but differs from those of nematode-trapping fungi in possessing more genes for protein degradation, signal transduction, and secondary metabolism. These results indicate that H. minnesotensis has evolved a different mechanism for nematode endoparasitism compared with nematode-trapping fungi. Transcriptomic analyses over the time course of parasitism revealed the upregulation of lectins, secreted proteases and genes for biosynthesis of secondary metabolites that could putatively be involved in host surface adhesion, cuticle degradation, and host manipulation. Genome and transcriptome analyses provided a comprehensive understanding of the evolution and lifestyle of nematode endoparasitism. PMID:25359922

  4. Genomic analysis of thermophilic Bacillus coagulans strains: efficient producers for platform bio-chemicals.

    PubMed

    Su, Fei; Xu, Ping

    2014-01-29

    Microbial strains with high substrate efficiency and excellent environmental tolerance are urgently needed for the production of platform bio-chemicals. Bacillus coagulans has these merits; however, little genetic information is available about this species. Here, we determined the genome sequences of five B. coagulans strains, and used a comparative genomic approach to reconstruct the central carbon metabolism of this species to explain their fermentation features. A novel xylose isomerase in the xylose utilization pathway was identified in these strains. Based on a genome-wide positive selection scan, the selection pressure on amino acid metabolism may have played a significant role in the thermal adaptation. We also researched the immune systems of B. coagulans strains, which provide them with acquired resistance to phages and mobile genetic elements. Our genomic analysis provides comprehensive insights into the genetic characteristics of B. coagulans and paves the way for improving and extending the uses of this species.

  5. Genomic analysis of thermophilic Bacillus coagulans strains: efficient producers for platform bio-chemicals

    PubMed Central

    Su, Fei; Xu, Ping

    2014-01-01

    Microbial strains with high substrate efficiency and excellent environmental tolerance are urgently needed for the production of platform bio-chemicals. Bacillus coagulans has these merits; however, little genetic information is available about this species. Here, we determined the genome sequences of five B. coagulans strains, and used a comparative genomic approach to reconstruct the central carbon metabolism of this species to explain their fermentation features. A novel xylose isomerase in the xylose utilization pathway was identified in these strains. Based on a genome-wide positive selection scan, the selection pressure on amino acid metabolism may have played a significant role in the thermal adaptation. We also researched the immune systems of B. coagulans strains, which provide them with acquired resistance to phages and mobile genetic elements. Our genomic analysis provides comprehensive insights into the genetic characteristics of B. coagulans and paves the way for improving and extending the uses of this species. PMID:24473268

  6. High-throughput comparison, functional annotation, and metabolic modeling of plant genomes using the PlantSEED resource

    PubMed Central

    Seaver, Samuel M. D.; Gerdes, Svetlana; Frelin, Océane; Lerma-Ortiz, Claudia; Bradbury, Louis M. T.; Zallot, Rémi; Hasnain, Ghulam; Niehaus, Thomas D.; El Yacoubi, Basma; Pasternak, Shiran; Olson, Robert; Pusch, Gordon; Overbeek, Ross; Stevens, Rick; de Crécy-Lagard, Valérie; Ware, Doreen; Hanson, Andrew D.; Henry, Christopher S.

    2014-01-01

    The increasing number of sequenced plant genomes is placing new demands on the methods applied to analyze, annotate, and model these genomes. Today’s annotation pipelines result in inconsistent gene assignments that complicate comparative analyses and prevent efficient construction of metabolic models. To overcome these problems, we have developed the PlantSEED, an integrated, metabolism-centric database to support subsystems-based annotation and metabolic model reconstruction for plant genomes. PlantSEED combines SEED subsystems technology, first developed for microbial genomes, with refined protein families and biochemical data to assign fully consistent functional annotations to orthologous genes, particularly those encoding primary metabolic pathways. Seamless integration with its parent, the prokaryotic SEED database, makes PlantSEED a unique environment for cross-kingdom comparative analysis of plant and bacterial genomes. The consistent annotations imposed by PlantSEED permit rapid reconstruction and modeling of primary metabolism for all plant genomes in the database. This feature opens the unique possibility of model-based assessment of the completeness and accuracy of gene annotation and thus allows computational identification of genes and pathways that are restricted to certain genomes or need better curation. We demonstrate the PlantSEED system by producing consistent annotations for 10 reference genomes. We also produce a functioning metabolic model for each genome, gapfilling to identify missing annotations and proposing gene candidates for missing annotations. Models are built around an extended biomass composition representing the most comprehensive published to date. To our knowledge, our models are the first to be published for seven of the genomes analyzed. PMID:24927599

  7. High-throughput comparison, functional annotation, and metabolic modeling of plant genomes using the PlantSEED resource.

    PubMed

    Seaver, Samuel M D; Gerdes, Svetlana; Frelin, Océane; Lerma-Ortiz, Claudia; Bradbury, Louis M T; Zallot, Rémi; Hasnain, Ghulam; Niehaus, Thomas D; El Yacoubi, Basma; Pasternak, Shiran; Olson, Robert; Pusch, Gordon; Overbeek, Ross; Stevens, Rick; de Crécy-Lagard, Valérie; Ware, Doreen; Hanson, Andrew D; Henry, Christopher S

    2014-07-01

    The increasing number of sequenced plant genomes is placing new demands on the methods applied to analyze, annotate, and model these genomes. Today's annotation pipelines result in inconsistent gene assignments that complicate comparative analyses and prevent efficient construction of metabolic models. To overcome these problems, we have developed the PlantSEED, an integrated, metabolism-centric database to support subsystems-based annotation and metabolic model reconstruction for plant genomes. PlantSEED combines SEED subsystems technology, first developed for microbial genomes, with refined protein families and biochemical data to assign fully consistent functional annotations to orthologous genes, particularly those encoding primary metabolic pathways. Seamless integration with its parent, the prokaryotic SEED database, makes PlantSEED a unique environment for cross-kingdom comparative analysis of plant and bacterial genomes. The consistent annotations imposed by PlantSEED permit rapid reconstruction and modeling of primary metabolism for all plant genomes in the database. This feature opens the unique possibility of model-based assessment of the completeness and accuracy of gene annotation and thus allows computational identification of genes and pathways that are restricted to certain genomes or need better curation. We demonstrate the PlantSEED system by producing consistent annotations for 10 reference genomes. We also produce a functioning metabolic model for each genome, gapfilling to identify missing annotations and proposing gene candidates for missing annotations. Models are built around an extended biomass composition representing the most comprehensive published to date. To our knowledge, our models are the first to be published for seven of the genomes analyzed.

  8. Nanopore Long-Read Guided Complete Genome Assembly of Hydrogenophaga intermedia, and Genomic Insights into 4-Aminobenzenesulfonate, p-Aminobenzoic Acid and Hydrogen Metabolism in the Genus Hydrogenophaga.

    PubMed

    Gan, Han M; Lee, Yin P; Austin, Christopher M

    2017-01-01

    We improved upon the previously reported draft genome of Hydrogenophaga intermedia strain PBC, a 4-aminobenzenesulfonate-degrading bacterium, by supplementing the assembly with Nanopore long reads, which enabled the reconstruction of the genome as a single contig. From the complete genome, major genes responsible for the catabolism of 4-aminobenzenesulfonate in strain PBC are clustered in two distinct genomic regions. Although the catabolic genes for 4-sulfocatechol, the deaminated product of 4-aminobenzenesulfonate, are only found in H. intermedia, the sad operon responsible for the first deamination step of 4-aminobenzenesulfonate is conserved in various Hydrogenophaga strains. The absence of the pabB gene in the complete genome of H. intermedia PBC is consistent with its p-aminobenzoic acid (pABA) auxotrophy but, surprisingly, comparative genomics analysis of 14 Hydrogenophaga genomes indicates that pABA auxotrophy is not an uncommon feature among members of this genus. Of even more interest, several Hydrogenophaga strains do not possess the genomic potential for hydrogen oxidation, calling for a revision to the taxonomic description of Hydrogenophaga as "hydrogen eating bacteria."

  9. Genomic analysis and temperature-dependent transcriptome profiles of the rhizosphere originating strain Pseudomonas aeruginosa M18

    PubMed Central

    2011-01-01

    Background: Our previously published reports described an effective biocontrol agent named Pseudomonas sp. M18; its 16S rDNA sequence and several regulator genes are homologous to those of P. aeruginosa, but it has several unusual phenotypic features. This study aims to explore its strain-specific genomic features and gene expression patterns at different temperatures. Results: The complete M18 genome is composed of a single chromosome of 6,327,754 base pairs containing 5684 open reading frames. Seven genomic islands, including two novel prophages and five specific non-phage islands, were identified besides the conserved P. aeruginosa core genome. Each prophage contains a putative chitinase coding gene, and prophage II contains a capB gene encoding a putative cold stress protein. The non-phage genomic islands contain genes responsible for pyoluteorin biosynthesis, environmental substance degradation and type I and III restriction-modification systems. Compared with other P. aeruginosa strains, the M18 genome has the fewest insertion sequences (3) and the most clustered regularly interspaced short palindromic repeats (3), which may contribute to its relative genome stability. Although the M18 genome is most closely related to that of P. aeruginosa strain LESB58, strain M18 is more susceptible to several antimicrobial agents and more easily cleared in a mouse acute lung infection model than strain LESB58. The whole M18 transcriptomic analysis indicated that 10.6% of the expressed genes are temperature-dependent, with 22 genes up-regulated at 28°C in three non-phage genomic islands and one prophage but none at 37°C. Conclusions: The P. aeruginosa strain M18 has evolved its specific genomic structures and temperature-dependent expression patterns to meet the requirements of fitness and competitiveness under the selective pressures imposed on the strain in the rhizosphere niche. PMID:21884571

  10. Genomic characterization of recurrent high-grade astroblastoma.

    PubMed

    Bale, Tejus A; Abedalthagafi, Malak; Bi, Wenya Linda; Kang, Yun Jee; Merrill, Parker; Dunn, Ian F; Dubuc, Adrian; Charbonneau, Sarah K; Brown, Loreal; Ligon, Azra H; Ramkissoon, Shakti H; Ligon, Keith L

    2016-01-01

    Astroblastomas are rare primary brain tumors, diagnosed based on histologic features. Not currently assigned a WHO grade, they typically display indolent behavior, with occasional variants taking a more aggressive course. We characterized the immunohistochemical characteristics, copy number (high-resolution array comparative genomic hybridization, OncoCopy) and mutational profile (targeted next-generation exome sequencing, OncoPanel) of a cohort of seven biopsies from four patients to identify recurrent genomic events that may help distinguish astroblastomas from other more common high-grade gliomas. We found that tumor histology was variable across patients and between primary and recurrent tumor samples. No common molecular features were identified among the four tumors. Mutations commonly observed in astrocytic tumors (IDH1/2, TP53, ATRX, and PTEN) or ependymoma were not identified. However one case with rapid clinical progression displayed mutations more commonly associated with GBM (NF1(N1054H/K63)*, PIK3CA(R38H) and ERG(A403T)). Conversely, another case, originally classified as glioblastoma with nine-year survival before recurrence, lacked a GBM mutational profile. Other mutations frequently seen in lower grade gliomas (BCOR, BCORL1, ERBB3, MYB, ATM) were also present in several tumors. Copy number changes were variable across tumors. Our findings indicate that astroblastomas have variable growth patterns and morphologic features, posing significant challenges to accurate classification in the absence of diagnostically specific copy number alterations and molecular features. Their histopathologic overlap with glioblastoma will likely confound the observation of long-term GBM "survivors". Further genomic profiling is needed to determine whether these tumors represent a distinct entity and to guide management strategies. Copyright © 2016 Elsevier Inc. All rights reserved.

  11. tRNomics: analysis of tRNA genes from 50 genomes of Eukarya, Archaea, and Bacteria reveals anticodon-sparing strategies and domain-specific features.

    PubMed Central

    Marck, Christian; Grosjean, Henri

    2002-01-01

    From 50 genomes of the three domains of life (7 eukarya, 13 archaea, and 30 bacteria), we extracted, analyzed, and compared over 4,000 sequences corresponding to cytoplasmic, nonorganellar tRNAs. For each genome, the complete set of tRNAs required to read the 61 sense codons was identified, which permitted revelation of three major anticodon-sparing strategies. Other features and sequence peculiarities analyzed are the following: (1) fit to the standard cloverleaf structure, (2) characteristic consensus sequences for elongator and initiator tDNAs, (3) frequencies of bases at each sequence position, (4) type and frequencies of conserved 2D and 3D base pairs, (5) anticodon/tDNA usages and anticodon-sparing strategies, (6) identification of the tRNA-Ile with anticodon CAU reading AUA, (7) size of variable arm, (8) occurrence and location of introns, (9) occurrence of 3'-CCA and 5'-extra G encoded at the tDNA level, and (10) distribution of the tRNA genes in genomes and their mode of transcription. Among all tRNA isoacceptors, we found that initiator tDNA-iMet is the most conserved across the three domains, yet domain-specific signatures exist. Also, according to which tRNA feature is considered (5'-extra G encoded in tDNAs-His, AUA codon read by tRNA-Ile with anticodon CAU, presence of intron, absence of "two-out-of-three" reading mode and short V-arm in tDNA-Tyr) Archaea sequester either with Bacteria or Eukarya. No common features between Eukarya and Bacteria not shared with Archaea could be unveiled. Thus, from the tRNomic point of view, Archaea appears as an "intermediate domain" between Eukarya and Bacteria. PMID:12403461

  12. Analyses of Genotypes and Phenotypes of Ten Chinese Patients with Wolf-Hirschhorn Syndrome by Multiplex Ligation-dependent Probe Amplification and Array Comparative Genomic Hybridization

    PubMed Central

    Yang, Wen-Xu; Pan, Hong; Li, Lin; Wu, Hai-Rong; Wang, Song-Tao; Bao, Xin-Hua; Jiang, Yu-Wu; Qi, Yu

    2016-01-01

    Background: Wolf-Hirschhorn syndrome (WHS) is a contiguous gene syndrome that is typically caused by a deletion of the distal portion of the short arm of chromosome 4. However, there are few reports about the features of Chinese WHS patients. This study aimed to characterize the clinical and molecular cytogenetic features of Chinese WHS patients using the combination of multiplex ligation-dependent probe amplification (MLPA) and array comparative genomic hybridization (array CGH). Methods: Clinical information was collected from ten patients with WHS. Genomic DNA was extracted from the peripheral blood of the patients. The deletions were analyzed by MLPA and array CGH. Results: All patients exhibited the core clinical symptoms of WHS, including severe growth delay, a Greek warrior helmet facial appearance, differing degrees of intellectual disability, and epilepsy or electroencephalogram anomalies. The 4p deletions ranged from 2.62 Mb to 17.25 Mb in size and included LETM1, WHSC1, and FGFR3. Conclusions: The combined use of MLPA and array CGH is an effective and specific means to diagnose WHS and allows for the precise identification of the breakpoints and sizes of deletions. The deletion of genes in the WHS candidate region is closely correlated with the core WHS phenotype. PMID:26960370

  13. Discovery and Validation of Novel Expression Signature for Postcystectomy Recurrence in High-Risk Bladder Cancer

    PubMed Central

    Lam, Lucia L.; Ghadessi, Mercedeh; Erho, Nicholas; Vergara, Ismael A.; Alshalalfa, Mohammed; Buerki, Christine; Haddad, Zaid; Sierocinski, Thomas; Triche, Timothy J.; Skinner, Eila C.; Davicioni, Elai; Daneshmand, Siamak; Black, Peter C.

    2014-01-01

    Background: Nearly half of muscle-invasive bladder cancer patients succumb to their disease following cystectomy. Selecting candidates for adjuvant therapy is currently based on clinical parameters with limited predictive power. This study aimed to develop and validate genomic-based signatures that can better identify patients at risk for recurrence than clinical models alone. Methods: Transcriptome-wide expression profiles were generated using 1.4 million-feature arrays on archival tumors from 225 patients who underwent radical cystectomy and had muscle-invasive and/or node-positive bladder cancer. Genomic (GC) and clinical (CC) classifiers for predicting recurrence were developed on a discovery set (n = 133). Performances of GC, CC, an independent clinical nomogram (IBCNC), and genomic-clinicopathologic classifiers (G-CC, G-IBCNC) were assessed in the discovery and independent validation (n = 66) sets. GC was further validated on four external datasets (n = 341). Discrimination and prognostic abilities of classifiers were compared using area under receiver-operating characteristic curves (AUCs). All statistical tests were two-sided. Results: A 15-feature GC was developed on the discovery set with area under curve (AUC) of 0.77 in the validation set. This was higher than individual clinical variables, IBCNC (AUC = 0.73), and comparable to CC (AUC = 0.78). Performance was improved upon combining GC with clinical nomograms (G-IBCNC, AUC = 0.82; G-CC, AUC = 0.86). G-CC high-risk patients had elevated recurrence probabilities (P < .001), with GC being the best predictor by multivariable analysis (P = .005). Genomic-clinicopathologic classifiers outperformed clinical nomograms by decision curve and reclassification analyses. GC performed the best in validation compared with seven prior signatures. GC markers remained prognostic across four independent datasets. Conclusions: The validated genomic-based classifiers outperform clinical models for predicting postcystectomy bladder cancer recurrence. This may be used to better identify patients who need more aggressive management. PMID:25344601
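
    The classifiers in the abstract above are compared by AUC. As a reminder of what that number measures, here is a minimal rank-based sketch: the AUC equals the probability that a randomly chosen recurrence case receives a higher score than a randomly chosen non-recurrence case, with ties counted as one half. The function name and the toy inputs are illustrative assumptions and are unrelated to the study's data.

```python
from typing import Sequence


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the ROC curve via pairwise comparison (Mann-Whitney form).

    `labels` uses 1 for the positive class (e.g. recurrence) and 0 otherwise.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    # Count positive-over-negative wins; a tie contributes half a win.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

    On this scale 0.5 corresponds to random ranking and 1.0 to perfect separation, which is how values such as 0.73, 0.77 and 0.86 in the abstract should be read.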

  14. FIGENIX: Intelligent automation of genomic annotation: expertise integration in a new software platform

    PubMed Central

    Gouret, Philippe; Vitiello, Vérane; Balandraud, Nathalie; Gilles, André; Pontarotti, Pierre; Danchin, Etienne GJ

    2005-01-01

    Background Two of the main objectives of the genomic and post-genomic era are to structurally and functionally annotate genomes, which consists of detecting the position and structure of genes and inferring their function (as well as that of other genomic features). Structural and functional annotation both require the complex chaining of numerous different software tools, algorithms and methods under the supervision of a biologist. The automation of these pipelines is necessary to manage the huge amounts of data released by sequencing projects. Several pipelines already automate some of this complex chaining but still require substantial input from biologists to supervise and control the results at various steps. Results Here we propose an innovative automated platform, FIGENIX, which includes an expert system capable of substituting for human expertise at several key steps. FIGENIX currently automates complex pipelines of structural and functional annotation under the supervision of the expert system (which makes it possible, for example, to make key decisions, check intermediate results or refine the dataset). The quality of the results produced by FIGENIX is comparable to that obtained by expert biologists, with a drastic gain in time and avoidance of errors due to human manipulation of data. Conclusion The core engine and expert system of the FIGENIX platform currently handle complex annotation processes of broad interest for the genomic community. They could be easily adapted to new or more specialized pipelines, such as the annotation of miRNAs, the classification of complex multigenic families, and the annotation of regulatory elements and other genomic features of interest. PMID:16083500

  15. The complete chloroplast genome sequence of the CAM epiphyte Spanish moss (Tillandsia usneoides, Bromeliaceae) and its comparative analysis.

    PubMed

    Poczai, Péter; Hyvönen, Jaakko

    2017-01-01

    Spanish moss (Tillandsia usneoides) is an epiphytic bromeliad widely distributed throughout tropical and warm temperate America. This plant is highly adapted to extreme environmental conditions. Striking features of this species include specialized trichomes (scales) covering the surface of its shoots aiding the absorption of water and nutrients directly from the atmosphere and a specific photosynthesis using crassulacean acid metabolism (CAM). Here we report the plastid genome of Spanish moss and present the comparison of genome organization and sequence evolution within Poales. The plastome of Spanish moss has a quadripartite structure consisting of a large single copy (LSC, 87,439 bp), two inverted regions (IRa and IRb, 26,803 bp) and short single copy (SSC, 18,612 bp) region. The plastid genome had 37.2% GC content and 134 genes with 88 being unique protein-coding genes and 20 of these are duplicated in the IR, similar to other reported bromeliads. Our study shows that early diverging lineages of Poales do not have high substitution rates as compared to grasses, and plastid genomes of bromeliads show structural features considered to be ancestral in graminids. These include the loss of the introns in the clpP and rpoC1 genes and the complete loss or partial degradation of accD and ycf genes in the Graminid clade. Further structural rearrangements appeared in the graminids lacking in Spanish moss, which include a 28-kb inversion between the trnG-UCC-rps14 region and 6-kb in the trnG-UCC-psbD, followed by a third <1kb inversion in the trnT sequence.
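
    The abstract gives the lengths of the individual plastome regions but not the total. A minimal Python sketch of the implied arithmetic follows, assuming the quoted 26,803 bp is the length of each inverted repeat copy (IRa and IRb), so that it is counted twice. The function name is illustrative only.

```python
def plastome_length(lsc_bp, ir_bp, ssc_bp):
    """Total length of a quadripartite plastome: LSC + IRa + IRb + SSC.

    Assumes ir_bp is the length of ONE inverted repeat copy.
    """
    return lsc_bp + 2 * ir_bp + ssc_bp


# Region lengths as reported in the abstract (bp).
total_bp = plastome_length(lsc_bp=87_439, ir_bp=26_803, ssc_bp=18_612)
print(total_bp)  # 159657 under the per-copy assumption above
```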

  16. The complete chloroplast genome sequence of the CAM epiphyte Spanish moss (Tillandsia usneoides, Bromeliaceae) and its comparative analysis

    PubMed Central

    Hyvönen, Jaakko

    2017-01-01

    Spanish moss (Tillandsia usneoides) is an epiphytic bromeliad widely distributed throughout tropical and warm temperate America. This plant is highly adapted to extreme environmental conditions. Striking features of this species include specialized trichomes (scales) covering the surface of its shoots aiding the absorption of water and nutrients directly from the atmosphere and a specific photosynthesis using crassulacean acid metabolism (CAM). Here we report the plastid genome of Spanish moss and present the comparison of genome organization and sequence evolution within Poales. The plastome of Spanish moss has a quadripartite structure consisting of a large single copy (LSC, 87,439 bp), two inverted regions (IRa and IRb, 26,803 bp) and short single copy (SSC, 18,612 bp) region. The plastid genome had 37.2% GC content and 134 genes with 88 being unique protein-coding genes and 20 of these are duplicated in the IR, similar to other reported bromeliads. Our study shows that early diverging lineages of Poales do not have high substitution rates as compared to grasses, and plastid genomes of bromeliads show structural features considered to be ancestral in graminids. These include the loss of the introns in the clpP and rpoC1 genes and the complete loss or partial degradation of accD and ycf genes in the Graminid clade. Further structural rearrangements appeared in the graminids lacking in Spanish moss, which include a 28-kb inversion between the trnG-UCC–rps14 region and 6-kb in the trnG-UCC–psbD, followed by a third <1kb inversion in the trnT sequence. PMID:29095905

  17. On an early gene for membrane-integral inorganic pyrophosphatase in the genome of an apparently pre-LUCA extremophile, the archaeon Candidatus Korarchaeum cryptofilum.

    PubMed

    Baltscheffsky, Herrick; Persson, Bengt

    2014-02-01

    A gene for membrane-integral inorganic pyrophosphatase (miPPase) was found in the composite genome of the extremophile archaeon Candidatus Korarchaeum cryptofilum (CKc). This korarchaeal genome shows unusual partial similarity to both major archaeal phyla Crenarchaeota and Euryarchaeota. Thus this Korarchaeote might have retained features that represent an ancestral archaeal form, existing before the occurrence of the evolutionary bifurcation into Crenarchaeota and Euryarchaeota. In addition, CKc lacks five genes that are common to early genomes at the LUCA border. These two properties independently suggest a pre-LUCA evolutionary position of this extremophile. Our finding of the miPPase gene in the CKc genome points to a role for the enzyme in the energy conversion of this very early archaeon. The structural features of its miPPase indicate that it can pump protons through membranes. An miPPase from the extremophile bacterium Caldicellulosiruptor saccharolyticus also has a sequence indicating a proton pump. Recent analysis of the three-dimensional structure of the miPPase from Vigna radiata has resulted in the recognition of a strongly acidic substrate (orthophosphate: Pi, pyrophosphate: PPi) binding pocket, containing 11 Asp and one Glu residues. Asp (aspartic acid) is an evolutionarily very early proteinaceous amino acid as compared to the later appearing Glu (glutamic acid). All the Asp residues are conserved in the miPPase of CKc, V. radiata and other miPPases. The high proportion of Asp, as compared to Glu, seems to strengthen our argument that biological energy conversion with binding and activities of orthophosphate (Pi) and energy-rich pyrophosphate (PPi) in connection with the origin and early evolution of life may have started with similar or even more primitive acidic peptide funnels and/or pockets.

  18. Complete Sequence and Analysis of the Mitochondrial Genome of Hemiselmis andersenii CCMP644 (Cryptophyceae)

    PubMed Central

    Kim, Eunsoo; Lane, Christopher E; Curtis, Bruce A; Kozera, Catherine; Bowman, Sharen; Archibald, John M

    2008-01-01

    Background Cryptophytes are an enigmatic group of unicellular eukaryotes with plastids derived by secondary (i.e., eukaryote-eukaryote) endosymbiosis. Cryptophytes are unusual in that they possess four genomes–a host cell-derived nuclear and mitochondrial genome and an endosymbiont-derived plastid and 'nucleomorph' genome. The evolutionary origins of the host and endosymbiont components of cryptophyte algae are at present poorly understood. Thus far, a single complete mitochondrial genome sequence has been determined for the cryptophyte Rhodomonas salina. Here, the second complete mitochondrial genome of the cryptophyte alga Hemiselmis andersenii CCMP644 is presented. Results The H. andersenii mtDNA is 60,553 bp in size and encodes 30 structural RNAs and 36 protein-coding genes, all located on the same strand. A prominent feature of the genome is the presence of a ~20 Kbp long intergenic region comprised of numerous tandem and dispersed repeat units of between 22–336 bp. Adjacent to these repeats are 27 copies of palindromic sequences predicted to form stable DNA stem-loop structures. One such stem-loop is located near a GC-rich and GC-poor region and may have a regulatory function in replication or transcription. The H. andersenii mtDNA shares a number of features in common with the genome of the cryptophyte Rhodomonas salina, including general architecture, gene content, and the presence of a large repeat region. However, the H. andersenii mtDNA is devoid of inverted repeats and introns, which are present in R. salina. Comparative analyses of the suite of tRNAs encoded in the two genomes reveal that the H. andersenii mtDNA has lost or converted its original trnK(uuu) gene and possesses a trnS-derived 'trnK(uuu)', which appears unable to produce a functional tRNA. Mitochondrial protein coding gene phylogenies strongly support a variety of previously established eukaryotic groups, but fail to resolve the relationships among higher-order eukaryotic lineages. 
Conclusion Comparison of the H. andersenii and R. salina mitochondrial genomes reveals a number of cryptophyte-specific genomic features, most notably the presence of a large repeat-rich intergenic region. However, unlike R. salina, the H. andersenii mtDNA does not possess introns and lacks a Lys-tRNA, which is presumably imported from the cytosol. PMID:18474103

  19. Automatic glaucoma diagnosis through medical imaging informatics.

    PubMed

    Liu, Jiang; Zhang, Zhuo; Wong, Damon Wing Kee; Xu, Yanwu; Yin, Fengshou; Cheng, Jun; Tan, Ngan Meng; Kwoh, Chee Keong; Xu, Dong; Tham, Yih Chung; Aung, Tin; Wong, Tien Yin

    2013-01-01

    Computer-aided diagnosis for screening utilizes computer-based analytical methodologies to process patient information. Glaucoma is the leading irreversible cause of blindness. Due to the lack of an effective and standard screening practice, more than 50% of the cases are undiagnosed, which prevents the early treatment of the disease. The aim was to design an automatic glaucoma diagnosis architecture, automatic glaucoma diagnosis through medical imaging informatics (AGLAIA-MII), that combines patient personal data, medical retinal fundus images, and patients' genome information for screening. 2258 cases from a population study were used to evaluate the screening software. These cases were attributed with patient personal data, retinal images and quality controlled genome data. Utilizing a multiple kernel learning-based classifier, AGLAIA-MII combined patient personal data, major image features, and important genome single nucleotide polymorphism (SNP) features. Receiver operating characteristic curves were plotted to compare AGLAIA-MII's performance with classifiers using patient personal data, images, and genome SNP separately. AGLAIA-MII was able to achieve an area under curve value of 0.866, better than 0.551, 0.722 and 0.810 by the individual personal data, image and genome information components, respectively. AGLAIA-MII also demonstrated a substantial improvement over the current glaucoma screening approach based on intraocular pressure. AGLAIA-MII demonstrates for the first time the capability of integrating patients' personal data, medical retinal image and genome information for automatic glaucoma diagnosis and screening in a large dataset from a population study. It paves the way for a holistic approach for automatic objective glaucoma diagnosis and screening.

  20. Genome features of moderately halophilic polyhydroxyalkanoate-producing Yangia sp. CCB-MM3.

    PubMed

    Lau, Nyok-Sean; Sam, Ka-Kei; Amirul, Abdullah Al-Ashraf

    2017-01-01

    Yangia sp. CCB-MM3 was one of several halophilic bacteria isolated from soil sediment in the estuarine Matang Mangrove, Malaysia. So far, no member from the genus Yangia, a member of the Rhodobacteraceae family, has been reported sequenced. In the current study, we present the first complete genome sequence of Yangia sp. strain CCB-MM3. The genome includes two chromosomes and five plasmids with a total length of 5,522,061 bp and an average GC content of 65%. Since a different strain of Yangia sp. (ND199) was reported to produce a polyhydroxyalkanoate copolymer, the ability for this production was tested in vitro and confirmed for strain CCB-MM3. Analysis of its genome sequence confirmed presence of a pathway for production of propionyl-CoA and gene cluster for PHA production in the sequenced strain. The genome sequence described will be a useful resource for understanding the physiology and metabolic potential of Yangia as well as for comparative genomic analysis with other Rhodobacteraceae.

  1. Genomic features of bacterial adaptation to plants

    DOE PAGES

    Levy, Asaf; Salas Gonzalez, Isai; Mittelviefhaus, Maximilian; ...

    2017-12-18

    Plants intimately associate with diverse bacteria. Plant-associated bacteria have ostensibly evolved genes that enable them to adapt to plant environments. However, the identities of such genes are mostly unknown, and their functions are poorly characterized. In this study, we sequenced 484 genomes of bacterial isolates from roots of Brassicaceae, poplar, and maize. We then compared 3,837 bacterial genomes to identify thousands of plant-associated gene clusters. Genomes of plant-associated bacteria encode more carbohydrate metabolism functions and fewer mobile elements than related non-plant-associated genomes do. We experimentally validated candidates from two sets of plant-associated genes: one involved in plant colonization, and the other serving in microbe–microbe competition between plant-associated bacteria. We also identified 64 plant-associated protein domains that potentially mimic plant domains; some are shared with plant-associated fungi and oomycetes. In conclusion, this work expands the genome-based understanding of plant–microbe interactions and provides potential leads for efficient and sustainable agriculture through microbiome engineering.

  2. Minimal-assumption inference from population-genomic data

    NASA Astrophysics Data System (ADS)

    Weissman, Daniel; Hallatschek, Oskar

    Samples of multiple complete genome sequences contain vast amounts of information about the evolutionary history of populations, much of it in the associations among polymorphisms at different loci. Current methods that take advantage of this linkage information rely on models of recombination and coalescence, limiting the sample sizes and populations that they can analyze. We introduce a method, Minimal-Assumption Genomic Inference of Coalescence (MAGIC), that reconstructs key features of the evolutionary history, including the distribution of coalescence times, by integrating information across genomic length scales without using an explicit model of recombination, demography or selection. Using simulated data, we show that MAGIC's performance is comparable to PSMC' on single diploid samples generated with standard coalescent and recombination models. More importantly, MAGIC can also analyze arbitrarily large samples and is robust to changes in the coalescent and recombination processes. Using MAGIC, we show that the inferred coalescence time histories of samples of multiple human genomes exhibit inconsistencies with a description in terms of an effective population size based on single-genome data.

  3. Genomic features of bacterial adaptation to plants

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Levy, Asaf; Salas Gonzalez, Isai; Mittelviefhaus, Maximilian

    Plants intimately associate with diverse bacteria. Plant-associated bacteria have ostensibly evolved genes that enable them to adapt to plant environments. However, the identities of such genes are mostly unknown, and their functions are poorly characterized. In this study, we sequenced 484 genomes of bacterial isolates from roots of Brassicaceae, poplar, and maize. We then compared 3,837 bacterial genomes to identify thousands of plant-associated gene clusters. Genomes of plant-associated bacteria encode more carbohydrate metabolism functions and fewer mobile elements than related non-plant-associated genomes do. We experimentally validated candidates from two sets of plant-associated genes: one involved in plant colonization, and the other serving in microbe–microbe competition between plant-associated bacteria. We also identified 64 plant-associated protein domains that potentially mimic plant domains; some are shared with plant-associated fungi and oomycetes. In conclusion, this work expands the genome-based understanding of plant–microbe interactions and provides potential leads for efficient and sustainable agriculture through microbiome engineering.

  4. Genome Structure of the Legume, Lotus japonicus

    PubMed Central

    Sato, Shusei; Nakamura, Yasukazu; Kaneko, Takakazu; Asamizu, Erika; Kato, Tomohiko; Nakao, Mitsuteru; Sasamoto, Shigemi; Watanabe, Akiko; Ono, Akiko; Kawashima, Kumiko; Fujishiro, Tsunakazu; Katoh, Midori; Kohara, Mitsuyo; Kishida, Yoshie; Minami, Chiharu; Nakayama, Shinobu; Nakazaki, Naomi; Shimizu, Yoshimi; Shinpo, Sayaka; Takahashi, Chika; Wada, Tsuyuko; Yamada, Manabu; Ohmido, Nobuko; Hayashi, Makoto; Fukui, Kiichi; Baba, Tomoya; Nakamichi, Tomoko; Mori, Hirotada; Tabata, Satoshi

    2008-01-01

    The legume Lotus japonicus has been widely used as a model system to investigate the genetic background of legume-specific phenomena such as symbiotic nitrogen fixation. Here, we report structural features of the L. japonicus genome. The 315.1-Mb sequences determined in this and previous studies correspond to 67% of the genome (472 Mb), and are likely to cover 91.3% of the gene space. Linkage mapping anchored 130-Mb sequences onto the six linkage groups. A total of 10 951 complete and 19 848 partial structures of protein-encoding genes were assigned to the genome. Comparative analysis of these genes revealed the expansion of several functional domains and gene families that are characteristic of L. japonicus. Synteny analysis detected traces of whole-genome duplication and the presence of synteny blocks with other plant genomes to various degrees. This study provides the first opportunity to look into the complex and unique genetic system of legumes. PMID:18511435

  5. Genome Features of “Dark-Fly”, a Drosophila Line Reared Long-Term in a Dark Environment

    PubMed Central

    Zhou, Jun; Sugiyama, Yuzo; Nishimura, Osamu; Aizu, Tomoyuki; Toyoda, Atsushi; Fujiyama, Asao; Agata, Kiyokazu

    2012-01-01

    Organisms are remarkably adapted to diverse environments by specialized metabolisms, morphology, or behaviors. To address the molecular mechanisms underlying environmental adaptation, we have utilized a Drosophila melanogaster line, termed “Dark-fly”, which has been maintained in constant dark conditions for 57 years (1400 generations). We found that Dark-fly exhibited higher fecundity in dark than in light conditions, indicating that Dark-fly possesses some traits advantageous in darkness. Using next-generation sequencing technology, we determined the whole genome sequence of Dark-fly and identified approximately 220,000 single nucleotide polymorphisms (SNPs) and 4,700 insertions or deletions (InDels) in the Dark-fly genome compared to the genome of the Oregon-R-S strain, a control strain. 1.8% of SNPs were classified as non-synonymous SNPs (nsSNPs: i.e., they alter the amino acid sequence of gene products). Among them, we detected 28 nonsense mutations (i.e., they produce a stop codon in the protein sequence) in the Dark-fly genome. These included genes encoding an olfactory receptor and a light receptor. We also searched runs of homozygosity (ROH) regions as putative regions selected during the population history, and found 21 ROH regions in the Dark-fly genome. We identified 241 genes carrying nsSNPs or InDels in the ROH regions. These include a cluster of alpha-esterase genes that are involved in detoxification processes. Furthermore, analysis of structural variants in the Dark-fly genome showed the deletion of a gene related to fatty acid metabolism. Our results revealed unique features of the Dark-fly genome and provided a list of potential candidate genes involved in environmental adaptation. PMID:22432011

  6. Comprehensive definition of genome features in Spirodela polyrhiza by high-depth physical mapping and short-read DNA sequencing strategies.

    PubMed

    Michael, Todd P; Bryant, Douglas; Gutierrez, Ryan; Borisjuk, Nikolai; Chu, Philomena; Zhang, Hanzhong; Xia, Jing; Zhou, Junfei; Peng, Hai; El Baidouri, Moaine; Ten Hallers, Boudewijn; Hastie, Alex R; Liang, Tiffany; Acosta, Kenneth; Gilbert, Sarah; McEntee, Connor; Jackson, Scott A; Mockler, Todd C; Zhang, Weixiong; Lam, Eric

    2017-02-01

    Spirodela polyrhiza is a fast-growing aquatic monocot with highly reduced morphology, genome size and number of protein-coding genes. Considering these biological features of Spirodela and its basal position in the monocot lineage, understanding its genome architecture could shed light on plant adaptation and genome evolution. Like many draft genomes, however, the 158-Mb Spirodela genome sequence has not been resolved to chromosomes, and important genome characteristics have not been defined. Here we deployed rapid genome-wide physical maps combined with high-coverage short-read sequencing to resolve the 20 chromosomes of Spirodela and to empirically delineate its genome features. Our data revealed a dramatic reduction in the number of the rDNA repeat units in Spirodela to fewer than 100, which is even fewer than that reported for yeast. Consistent with its unique phylogenetic position, small RNA sequencing revealed 29 Spirodela-specific microRNA, with only two being shared with Elaeis guineensis (oil palm) and Musa balbisiana (banana). Combining DNA methylation data and small RNA sequencing enabled the accurate prediction of 20.5% long terminal repeats (LTRs) that doubled the previous estimate, and revealed a high Solo:Intact LTR ratio of 8.2. Interestingly, we found that Spirodela has the lowest global DNA methylation levels (9%) of any plant species tested. Taken together our results reveal a genome that has undergone reduction, likely through eliminating non-essential protein coding genes, rDNA and LTRs. In addition to delineating the genome features of this unique plant, the methodologies described and large-scale genome resources from this work will enable future evolutionary and functional studies of this basal monocot family.

  7. Compact cancer biomarkers discovery using a swarm intelligence feature selection algorithm.

    PubMed

    Martinez, Emmanuel; Alvarez, Mario Moises; Trevino, Victor

    2010-08-01

    Biomarker discovery is a typical application of functional genomics. Due to the large number of genes studied simultaneously in microarray data, feature selection is a key step. Swarm intelligence has emerged as a solution for the feature selection problem. However, swarm intelligence settings for feature selection fail to select small feature subsets. We have proposed a swarm intelligence feature selection algorithm based on the initialization and update of only a subset of particles in the swarm. In this study, we tested our algorithm on 11 microarray datasets for brain, leukemia, lung, prostate, and others. We show that the proposed swarm intelligence algorithm successfully increases the classification accuracy and decreases the number of selected features compared to other swarm intelligence methods.

  8. Structural RNAs of known and unknown function identified in malaria parasites by comparative genomics and RNA analysis

    PubMed Central

    Chakrabarti, Kausik; Pearson, Michael; Grate, Leslie; Sterne-Weiler, Timothy; Deans, Jonathan; Donohue, John Paul; Ares, Manuel

    2007-01-01

    As the genomes of more eukaryotic pathogens are sequenced, understanding how molecular differences between parasite and host might be exploited to provide new therapies has become a major focus. Central to cell function are RNA-containing complexes involved in gene expression, such as the ribosome, the spliceosome, snoRNAs, RNase P, and telomerase, among others. In this article we identify by comparative genomics and validate by RNA analysis numerous previously unknown structural RNAs encoded by the Plasmodium falciparum genome, including the telomerase RNA, U3, 31 snoRNAs, as well as previously predicted spliceosomal snRNAs, SRP RNA, MRP RNA, and RNAse P RNA. Furthermore, we identify six new RNA coding genes of unknown function. To investigate the relationships of the RNA coding genes to other genomic features in related parasites, we developed a genome browser for P. falciparum (http://areslab.ucsc.edu/cgi-bin/hgGateway). Additional experiments provide evidence supporting the prediction that snoRNAs guide methylation of a specific position on U4 snRNA, as well as predicting an snRNA promoter element particular to Plasmodium sp. These findings should allow detailed structural comparisons between the RNA components of the gene expression machinery of the parasite and its vertebrate hosts. PMID:17901154

  9. Genomic comparison of multi-drug resistant invasive and colonizing Acinetobacter baumannii isolated from diverse human body sites reveals genomic plasticity.

    PubMed

    Sahl, Jason W; Johnson, J Kristie; Harris, Anthony D; Phillippy, Adam M; Hsiao, William W; Thom, Kerri A; Rasko, David A

    2011-06-04

    Acinetobacter baumannii has recently emerged as a significant global pathogen, with a surprisingly rapid acquisition of antibiotic resistance and spread within hospitals and health care institutions. This study examines the genomic content of three A. baumannii strains isolated from distinct body sites. Isolates from blood, peri-anal, and wound sources were examined in an attempt to identify genetic features that could be correlated to each isolation source. Pulsed-field gel electrophoresis, multi-locus sequence typing and antibiotic resistance profiles demonstrated genotypic and phenotypic variation. Each isolate was sequenced to high-quality draft status, which allowed for comparative genomic analyses with existing A. baumannii genomes. A high resolution, whole genome alignment method detailed the phylogenetic relationships of sequenced A. baumannii and found no correlation between phylogeny and body site of isolation. This method identified genomic regions unique to both those isolates found on the surface of the skin or in wounds, termed colonization isolates, and those identified from body fluids, termed invasive isolates; these regions may play a role in the pathogenesis and spread of this important pathogen. A PCR-based screen of 74 A. baumannii isolates demonstrated that these unique genes are not exclusive to either phenotype or isolation source; however, a conserved genomic region exclusive to all sequenced A. baumannii was identified and verified. The results of the comparative genome analysis and PCR assay show that A. baumannii is a diverse and genomically variable pathogen that appears to have the potential to cause a range of human disease regardless of the isolation source.

  10. The First Complete Mitochondrial Genome Sequences for Stomatopod Crustaceans: Implications for Phylogeny

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Swinstrom, Kirsten; Caldwell, Roy; Fourcade, H. Matthew

    2005-09-07

    We report the first complete mitochondrial genome sequences of stomatopods and compare their features to each other and to those of other crustaceans. Phylogenetic analyses of the concatenated mitochondrial protein-coding sequences were used to explore relationships within the Stomatopoda, within the malacostracan crustaceans, and among crustaceans and insects. Although these analyses support the monophyly of both Malacostraca and, within it, Stomatopoda, it also confirms the view of a paraphyletic Crustacea, with Malacostraca being more closely related to insects than to the branchiopod crustaceans.

  11. Comparative genomics of closely related Salmonella enterica serovar Typhi strains reveals genome dynamics and the acquisition of novel pathogenic elements.

    PubMed

    Yap, Kien-Pong; Gan, Han Ming; Teh, Cindy Shuan Ju; Chai, Lay Ching; Thong, Kwai Lin

    2014-11-20

    Typhoid fever is an infectious disease of global importance that is caused by Salmonella enterica subsp. enterica serovar Typhi (S. Typhi). This disease causes an estimated 200,000 deaths per year and remains a serious global health threat. S. Typhi is strictly a human pathogen, and some recovered individuals become long-term carriers who continue to shed the bacteria in their faeces, thus becoming main reservoirs of infection. A comparative genomics analysis combined with a phylogenomic analysis revealed that the strains from the outbreak and carrier were closely related with microvariations and possibly derived from a common ancestor. Additionally, the comparative genomics analysis with all of the other completely sequenced S. Typhi genomes revealed that strains BL196 and CR0044 exhibit unusual genomic variations despite S. Typhi being generally regarded as highly clonal. The two genomes shared distinct chromosomal architectures and uncommon genome features; notably, the presence of a ~10 kb novel genomic island containing uncharacterised virulence-related genes, and zot in particular. Variations were also detected in the T6SS system and genes that were related to SPI-10, insertion sequences, CRISPRs and nsSNPs among the studied genomes. Interestingly, the carrier strain CR0044 harboured far more genetic polymorphisms (83% mutant nsSNPs) compared with the closely related BL196 outbreak strain. Notably, the two highly related virulence-determinant genes, rpoS and tviE, were mutated in strains BL196 and CR0044, respectively, which revealed that the mutation in rpoS is stabilising, while that in tviE is destabilising. These microvariations provide novel insight into the optimisation of genes by the pathogens. However, the sporadic strain was found to be far more conserved compared with the others. The uncommon genomic variations in the two closely related BL196 and CR0044 strains suggest that S. Typhi is more diverse than previously thought.
Our study has demonstrated that the pathogen is continually acquiring new genes through horizontal gene transfer in the process of host adaptation, providing novel insight into its unusual genomic dynamics. The understanding of these strains and virulence factors, and particularly the strain that is associated with the large outbreak and the less studied asymptomatic Typhi carrier in the population, will have important impact on disease control.

  12. The Sequenced Angiosperm Genomes and Genome Databases.

    PubMed

    Chen, Fei; Dong, Wei; Zhang, Jiawei; Guo, Xinyue; Chen, Junhao; Wang, Zhengjia; Lin, Zhenguo; Tang, Haibao; Zhang, Liangsheng

    2018-01-01

    Angiosperms, the flowering plants, provide the essential resources for human life, such as food, energy, oxygen, and materials. They also promoted the evolution of humans, animals, and the planet Earth. Despite the numerous advances in genome reports and sequencing technologies, no review covers all the released angiosperm genomes and the genome databases for data sharing. Based on the rapid advances and innovations in database reconstruction in the last few years, here we provide a comprehensive review of three major types of angiosperm genome databases, including databases for a single species, for a specific angiosperm clade, and for multiple angiosperm species. The scope, tools, and data of each type of database and their features are concisely discussed. The genome databases for a single species or a clade of species are especially popular with specific groups of researchers, while a timely-updated comprehensive database is more powerful for addressing major scientific mysteries at the genome scale. Considering the low coverage of flowering plants in any available database, we propose construction of a comprehensive database to facilitate large-scale comparative studies of angiosperm genomes and to promote collaborative studies of important questions in plant biology.

  13. The genomes and comparative genomics of Lactobacillus delbrueckii phages.

    PubMed

    Riipinen, Katja-Anneli; Forsman, Päivi; Alatossava, Tapani

    2011-07-01

    Lactobacillus delbrueckii phages are a great source of genetic diversity. Here, the genome sequences of Lb. delbrueckii phages LL-Ku, c5 and JCL1032 were analyzed in detail, and the genetic diversity of Lb. delbrueckii phages belonging to different taxonomic groups was explored. The lytic isometric group b phages LL-Ku (31,080 bp) and c5 (31,841 bp) showed a minimum nucleotide sequence identity of 90% over about three-fourths of their genomes. The genomic locations of their lysis modules were unique, and the genomes featured several putative overlapping transcription units of genes. LL-Ku and c5 virions displayed peptidoglycan hydrolytic activity associated with a ~36-kDa protein similar in size to the endolysin. Unexpectedly, the 49,433-bp genome of the prolate phage JCL1032 (temperate, group c) revealed a conserved gene order within its structural genes. Lb. delbrueckii phages representing groups a (a phage LL-H), b and c possessed only limited protein sequence homology. Genomic comparison of LL-Ku and c5 suggested that diversification of Lb. delbrueckii phages is mainly due to insertions, deletions and recombination. For the first time, the complete genome sequences of group b and c Lb. delbrueckii phages are reported.

  14. Draft genome sequence of the silver pomfret fish, Pampus argenteus.

    PubMed

    AlMomin, Sabah; Kumar, Vinod; Al-Amad, Sami; Al-Hussaini, Mohsen; Dashti, Talal; Al-Enezi, Khaznah; Akbar, Abrar

    2016-01-01

    Silver pomfret, Pampus argenteus, is a fish species from coastal waters. Despite its high commercial value, this edible fish has not been sequenced. Hence, genetic and genomic studies of it have been limited. We report the first draft genome sequence of the silver pomfret, obtained using next-generation sequencing (NGS) technology. We assembled 38.7 Gb of nucleotides into scaffolds of 350 Mb with N50 of about 1.5 kb, using high-quality paired-end reads. These scaffolds represent 63.7% of the estimated silver pomfret genome length. The newly sequenced and assembled genome has 11.06% repetitive DNA regions, and this percentage is comparable to that of the tilapia genome. The genome analysis predicted 16,322 genes. About 91% of these genes showed homology with known proteins. Many gene clusters were annotated to protein and fatty-acid metabolism pathways that may be important in the context of the meat texture and immune system developmental processes. The reference genome can pave the way for the identification of many other genomic features that could improve breeding and population-management strategies, and it can also help characterize the genetic diversity of P. argenteus.
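
    For readers unfamiliar with the N50 figure quoted in this record, the following is a minimal Python sketch of one common way to compute it: sort the scaffold lengths from longest to shortest and report the length at which the running total first reaches half of the total assembly size. The function name `n50` and the example lengths are illustrative assumptions, not taken from the study, and assembly tools may differ in how they treat edge cases.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold (or contig) lengths.

    N50 is taken here as the length L such that scaffolds of length >= L
    together cover at least half of the summed assembly length.
    """
    if not lengths:
        raise ValueError("at least one length is required")
    total = sum(lengths)
    running = 0
    # Walk from the longest scaffold downwards, accumulating covered bases.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Hypothetical scaffold lengths in bp (not data from the study).
example_lengths = [2, 3, 4, 5, 6, 7, 8, 9, 10]
print(n50(example_lengths))
```

    A low N50 relative to total assembly size, as reported here, generally indicates that the assembly is split into many short scaffolds.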

  15. The Sequenced Angiosperm Genomes and Genome Databases

    PubMed Central

    Chen, Fei; Dong, Wei; Zhang, Jiawei; Guo, Xinyue; Chen, Junhao; Wang, Zhengjia; Lin, Zhenguo; Tang, Haibao; Zhang, Liangsheng

    2018-01-01

    Angiosperms, the flowering plants, provide essential resources for human life, such as food, energy, oxygen, and materials. They have also promoted the evolution of humans, animals, and the planet Earth. Despite numerous advances in genome reports and sequencing technologies, no review covers all the released angiosperm genomes and the genome databases for data sharing. Based on the rapid advances and innovations in database reconstruction in the last few years, here we provide a comprehensive review of three major types of angiosperm genome databases: databases for a single species, for a specific angiosperm clade, and for multiple angiosperm species. The scope, tools, and data of each type of database and their features are concisely discussed. The genome databases for a single species or a clade of species are especially popular with specific groups of researchers, while a regularly updated comprehensive database is more powerful for addressing major scientific mysteries at the genome scale. Considering the low coverage of flowering plants in any available database, we propose construction of a comprehensive database to facilitate large-scale comparative studies of angiosperm genomes and to promote collaborative studies of important questions in plant biology. PMID:29706973

  16. Specialized adaptation of a lactic acid bacterium to the milk environment: the comparative genomics of Streptococcus thermophilus LMD-9

    PubMed Central

    2011-01-01

    Background Streptococcus thermophilus represents the only species among the streptococci that has “Generally Regarded As Safe” status and that plays an economically important role in the fermentation of yogurt and cheeses. We conducted comparative genome analysis of S. thermophilus LMD-9 to identify unique gene features as well as features that contribute to its adaptation to the dairy environment. In addition, we investigated the transcriptome response of LMD-9 during growth in milk in the presence of Lactobacillus delbrueckii ssp. bulgaricus, a companion culture in yogurt fermentation, and during lytic bacteriophage infection. Results The S. thermophilus LMD-9 genome is comprised of a 1.8 Mbp circular chromosome (39.1% GC; 1,834 predicted open reading frames) and two small cryptic plasmids. Genome comparison with the previously sequenced LMG 18311 and CNRZ1066 strains revealed 114 kb of LMD-9 specific chromosomal region, including genes that encode for histidine biosynthetic pathway, a cell surface proteinase, various host defense mechanisms and a phage remnant. Interestingly, also unique to LMD-9 are genes encoding for a putative mucus-binding protein, a peptide transporter, and exopolysaccharide biosynthetic proteins that have close orthologs in human intestinal microorganisms. LMD-9 harbors a large number of pseudogenes (13% of ORFeome), indicating that like LMG 18311 and CNRZ1066, LMD-9 has also undergone major reductive evolution, with the loss of carbohydrate metabolic genes and virulence genes found in their streptococcal counterparts. Functional genome distribution analysis of ORFeomes among streptococci showed that all three S. thermophilus strains formed a distinct functional cluster, further establishing their specialized adaptation to the nutrient-rich milk niche. An upregulation of CRISPR1 expression in LMD-9 during lytic bacteriophage DT1 infection suggests its protective role against phage invasion. When co-cultured with L. bulgaricus, LMD-9 overexpressed genes involved in amino acid transport and metabolism as well as DNA replication. Conclusions The genome of S. thermophilus LMD-9 is shaped by its domestication in the dairy environment, with gene features that conferred rapid growth in milk, stress response mechanisms and host defense systems that are relevant to its industrial applications. The presence of a unique exopolysaccharide gene cluster and cell surface protein orthologs commonly associated with probiotic functionality revealed potential probiotic applications of LMD-9. PMID:21995282

  17. Specialized adaptation of a lactic acid bacterium to the milk environment: the comparative genomics of Streptococcus thermophilus LMD-9.

    PubMed

    Goh, Yong Jun; Goin, Caitlin; O'Flaherty, Sarah; Altermann, Eric; Hutkins, Robert

    2011-08-30

    Streptococcus thermophilus represents the only species among the streptococci that has "Generally Regarded As Safe" status and that plays an economically important role in the fermentation of yogurt and cheeses. We conducted comparative genome analysis of S. thermophilus LMD-9 to identify unique gene features as well as features that contribute to its adaptation to the dairy environment. In addition, we investigated the transcriptome response of LMD-9 during growth in milk in the presence of Lactobacillus delbrueckii ssp. bulgaricus, a companion culture in yogurt fermentation, and during lytic bacteriophage infection. The S. thermophilus LMD-9 genome is comprised of a 1.8 Mbp circular chromosome (39.1% GC; 1,834 predicted open reading frames) and two small cryptic plasmids. Genome comparison with the previously sequenced LMG 18311 and CNRZ1066 strains revealed 114 kb of LMD-9 specific chromosomal region, including genes that encode for histidine biosynthetic pathway, a cell surface proteinase, various host defense mechanisms and a phage remnant. Interestingly, also unique to LMD-9 are genes encoding for a putative mucus-binding protein, a peptide transporter, and exopolysaccharide biosynthetic proteins that have close orthologs in human intestinal microorganisms. LMD-9 harbors a large number of pseudogenes (13% of ORFeome), indicating that like LMG 18311 and CNRZ1066, LMD-9 has also undergone major reductive evolution, with the loss of carbohydrate metabolic genes and virulence genes found in their streptococcal counterparts. Functional genome distribution analysis of ORFeomes among streptococci showed that all three S. thermophilus strains formed a distinct functional cluster, further establishing their specialized adaptation to the nutrient-rich milk niche. An upregulation of CRISPR1 expression in LMD-9 during lytic bacteriophage DT1 infection suggests its protective role against phage invasion. When co-cultured with L. bulgaricus, LMD-9 overexpressed genes involved in amino acid transport and metabolism as well as DNA replication. The genome of S. thermophilus LMD-9 is shaped by its domestication in the dairy environment, with gene features that conferred rapid growth in milk, stress response mechanisms and host defense systems that are relevant to its industrial applications. The presence of a unique exopolysaccharide gene cluster and cell surface protein orthologs commonly associated with probiotic functionality revealed potential probiotic applications of LMD-9.

  18. Comparative features of sixteen yeast genomes having significant biotechnological interest

    USDA-ARS's Scientific Manuscript database

    Saccharomyces cerevisiae has been used in fermentations for millennia and metabolically engineered for decades. While its genetic system is powerful, its limited capacities for ATP and NADPH production along with the limited range of substrates that it will use for growth make it less useful for var...

  19. Comparative Genomics of a Plant-Parasitic Nematode Endosymbiont Suggest a Role in Nutritional Symbiosis

    PubMed Central

    Brown, Amanda M.V.; Howe, Dana K.; Wasala, Sulochana K.; Peetz, Amy B.; Zasada, Inga A.; Denver, Dee R.

    2015-01-01

    Bacterial mutualists can modulate the biochemical capacity of animals. Highly coevolved nutritional mutualists do this by synthesizing nutrients missing from the host’s diet. Genomics tools have advanced the study of these partnerships. Here we examined the endosymbiont Xiphinematobacter (phylum Verrucomicrobia) from the dagger nematode Xiphinema americanum, a migratory ectoparasite of numerous crops that also vectors nepovirus. Previously, this endosymbiont was identified in the gut, ovaries, and eggs, but its role was unknown. We explored the potential role of this symbiont using fluorescence in situ hybridization, genome sequencing, and comparative functional genomics. We report the first genome of an intracellular Verrucomicrobium and the first exclusively intracellular non-Wolbachia nematode symbiont. Results revealed that Xiphinematobacter had a small 0.916-Mb genome with only 817 predicted proteins, resembling genomes of other mutualist endosymbionts. Compared with free-living relatives, conserved proteins were shorter on average, and there was large-scale loss of regulatory pathways. Despite massive gene loss, more genes were retained for biosynthesis of amino acids predicted to be essential to the host. Gene ontology enrichment tests showed enrichment for biosynthesis of arginine, histidine, and aromatic amino acids, as well as thiamine and coenzyme A, diverging from the profiles of relatives Akkermansia muciniphila (in the human colon), Methylacidiphilum infernorum, and the mutualist Wolbachia from filarial nematodes. Together, these features and the location in the gut suggest that Xiphinematobacter functions as a nutritional mutualist, supplementing essential nutrients that are depleted in the nematode diet. This pattern points to evolutionary convergence with endosymbionts found in sap-feeding insects. PMID:26362082

  20. GFVO: the Genomic Feature and Variation Ontology.

    PubMed

    Baran, Joachim; Durgahee, Bibi Sehnaaz Begum; Eilbeck, Karen; Antezana, Erick; Hoehndorf, Robert; Dumontier, Michel

    2015-01-01

    Falling costs in genomic laboratory experiments have led to a steady increase of genomic feature and variation data. Multiple genomic data formats exist for sharing these data, and whilst they are similar, they are addressing slightly different data viewpoints and are consequently not fully compatible with each other. The fragmentation of data format specifications makes it hard to integrate and interpret data for further analysis with information from multiple data providers. As a solution, a new ontology is presented here for annotating and representing genomic feature and variation dataset contents. The Genomic Feature and Variation Ontology (GFVO) specifically addresses genomic data as it is regularly shared using the GFF3 (incl. FASTA), GTF, GVF and VCF file formats. GFVO simplifies data integration and enables linking of genomic annotations across datasets through common semantics of genomic types and relations. Availability and implementation. The latest stable release of the ontology is available via its base URI; previous and development versions are available at the ontology's GitHub repository: https://github.com/BioInterchange/Ontologies; versions of the ontology are indexed through BioPortal (without external class-/property-equivalences due to BioPortal release 4.10 limitations); examples and reference documentation is provided on a separate web-page: http://www.biointerchange.org/ontologies.html. GFVO version 1.0.2 is licensed under the CC0 1.0 Universal license (https://creativecommons.org/publicdomain/zero/1.0) and therefore de facto within the public domain; the ontology can be appropriated without attribution for commercial and non-commercial use.

  1. Discovery and mapping of single feature polymorphisms in wheat using Affymetrix arrays

    PubMed Central

    Bernardo, Amy N; Bradbury, Peter J; Ma, Hongxiang; Hu, Shengwa; Bowden, Robert L; Buckler, Edward S; Bai, Guihua

    2009-01-01

    Background Wheat (Triticum aestivum L.) is a staple food crop worldwide. The wheat genome has not yet been sequenced due to its huge genome size (~17,000 Mb) and high levels of repetitive sequences; the whole genome sequence may not be expected in the near future. Available linkage maps have low marker density due to limitation in available markers; therefore new technologies that detect genome-wide polymorphisms are still needed to discover a large number of new markers for construction of high-resolution maps. A high-resolution map is a critical tool for gene isolation, molecular breeding and genomic research. Single feature polymorphism (SFP) is a new microarray-based type of marker that is detected by hybridization of DNA or cRNA to oligonucleotide probes. This study was conducted to explore the feasibility of using the Affymetrix GeneChip to discover and map SFPs in the large hexaploid wheat genome. Results Six wheat varieties of diverse origins (Ning 7840, Clark, Jagger, Encruzilhada, Chinese Spring, and Opata 85) were analyzed for significant probe by variety interactions and 396 probe sets with SFPs were identified. A subset of 164 unigenes was sequenced and 54% showed polymorphism within probes. Microarray analysis of 71 recombinant inbred lines from the cross Ning 7840/Clark identified 955 SFPs and 877 of them were mapped together with 269 simple sequence repeat markers. The SFPs were randomly distributed within a chromosome but were unevenly distributed among different genomes. The B genome had the most SFPs, and the D genome had the least. Map positions of a selected set of SFPs were validated by mapping single nucleotide polymorphism using SNaPshot and comparing with expressed sequence tags mapping data. Conclusion The Affymetrix array is a cost-effective platform for SFP discovery and SFP mapping in wheat. The new high-density map constructed in this study will be a useful tool for genetic and genomic research in wheat. PMID:19480702

  2. Improved maize reference genome with single-molecule technologies.

    PubMed

    Jiao, Yinping; Peluso, Paul; Shi, Jinghua; Liang, Tiffany; Stitzer, Michelle C; Wang, Bo; Campbell, Michael S; Stein, Joshua C; Wei, Xuehong; Chin, Chen-Shan; Guill, Katherine; Regulski, Michael; Kumari, Sunita; Olson, Andrew; Gent, Jonathan; Schneider, Kevin L; Wolfgruber, Thomas K; May, Michael R; Springer, Nathan M; Antoniou, Eric; McCombie, W Richard; Presting, Gernot G; McMullen, Michael; Ross-Ibarra, Jeffrey; Dawe, R Kelly; Hastie, Alex; Rank, David R; Ware, Doreen

    2017-06-22

    Complete and accurate reference genomes and annotations provide fundamental tools for characterization of genetic and functional variation. These resources facilitate the determination of biological processes and support translation of research findings into improved and sustainable agricultural technologies. Many reference genomes for crop plants have been generated over the past decade, but these genomes are often fragmented and missing complex repeat regions. Here we report the assembly and annotation of a reference genome of maize, a genetic and agricultural model species, using single-molecule real-time sequencing and high-resolution optical mapping. Relative to the previous reference genome, our assembly features a 52-fold increase in contig length and notable improvements in the assembly of intergenic spaces and centromeres. Characterization of the repetitive portion of the genome revealed more than 130,000 intact transposable elements, allowing us to identify transposable element lineage expansions that are unique to maize. Gene annotations were updated using 111,000 full-length transcripts obtained by single-molecule real-time sequencing. In addition, comparative optical mapping of two other inbred maize lines revealed a prevalence of deletions in regions of low gene density and maize lineage-specific genes.

  3. Large Diversity of Nonstandard Genes and Dynamic Evolution of Chloroplast Genomes in Siphonous Green Algae (Bryopsidales, Chlorophyta)

    PubMed Central

    Leliaert, Frederik; Marcelino, Vanessa R

    2018-01-01

    Chloroplast genomes have undergone tremendous alterations through the evolutionary history of the green algae (Chloroplastida). This study focuses on the evolution of chloroplast genomes in the siphonous green algae (order Bryopsidales). We present five new chloroplast genomes, which along with existing sequences, yield a data set representing all but one family of the order. Using comparative phylogenetic methods, we investigated the evolutionary dynamics of genomic features in the order. Our results show extensive variation in chloroplast genome architecture and intron content. Variation in genome size is accounted for by the amount of intergenic space and freestanding open reading frames that do not show significant homology to standard plastid genes. We show the diversity of these nonstandard genes based on their conserved protein domains, which are often associated with mobile functions (reverse transcriptase/intron maturase, integrases, phage- or plasmid-DNA primases, transposases, ligases). Investigation of the introns showed proliferation of group II introns in the early evolution of the order and their subsequent loss in the core Halimedineae, possibly through RT-mediated intron loss. PMID:29635329

  4. Delta: a new web-based 3D genome visualization and analysis platform.

    PubMed

    Tang, Bixia; Li, Feifei; Li, Jing; Zhao, Wenming; Zhang, Zhihua

    2018-04-15

    Delta is an integrative visualization and analysis platform to facilitate visually annotating and exploring the 3D physical architecture of genomes. Delta takes a Hi-C or ChIA-PET contact matrix as input and predicts the topologically associating domains and chromatin loops in the genome. It then generates a physical 3D model that represents the plausible consensus 3D structure of the genome. Delta features a highly interactive visualization tool that enhances the integration of genome topology/physical structure with extensive genome annotation by juxtaposing the 3D model with diverse genomic assay outputs. Finally, by visually comparing the 3D model of the β-globin gene locus and its annotation, we speculated on a plausible transitory interaction pattern in the locus. A literature survey found experimental evidence supporting this speculation. This served as an example of intuitive hypothesis testing with the help of Delta. Delta is freely accessible from http://delta.big.ac.cn, and the source code is available at https://github.com/zhangzhwlab/delta. Contact: zhangzhihua@big.ac.cn. Supplementary data are available at Bioinformatics online.

  5. Can BI-RADS features on mammography be used as a surrogate for expensive genomic testing in breast cancer patients?

    NASA Astrophysics Data System (ADS)

    Harowicz, Michael R.; Marks, Jeffrey R.; Marcom, P. Kelly; Mazurowski, Maciej A.

    2017-03-01

    Medical oncologists increasingly rely on expensive genomic analysis to stratify patients for different treatment. The genomic markers are able to divide patients into groups that behave differently in terms of tumor presentation, likelihood of metastatic spread, and response to chemotherapy and radiation therapy. In recent years there has been a rapid increase in the number of genomic tests available, like the Oncotype DX test, which provides the risk of cancer recurrence for a subset of patients. Radiogenomics, a new field that investigates the relationship between imaging phenotypes and genomic characteristics, may offer a less expensive and less invasive imaging surrogate for molecular subtype and Oncotype DX recurrence score (ODRS). This retrospective study analyzes the relationship between Breast Imaging-Reporting and Data System (BI-RADS) features as assessed by radiologists on mammograms with molecular subtype and ODRS. We used data from patients with BI-RADS features (shape or margin) and a genomic feature (subtype or ODRS) for the following cohort: shape vs. subtype (n=69), margin vs. subtype (n=78), shape vs. ODRS (n=20), and margin vs. ODRS (n=18). The association between features was assessed using a Fisher's exact test. Our results show that shape assessed by radiologists according to the BI-RADS lexicon is associated with molecular subtype (p=0.0171), while BI-RADS features of shape and margin were not significantly associated with ODRS (p=0.7839, p=0.6047 respectively).
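
    As a rough illustration of the kind of test named in this record, here is a minimal Python sketch of a two-sided Fisher's exact test for a 2x2 contingency table, using only the standard library. The abstract does not state the table dimensions or the software used, so the 2x2 restriction, the function name `fisher_exact_2x2`, and the example counts are all assumptions made for illustration; they are not the study's data or method details.

```python
from math import comb


def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the table [[a, b], [c, d]].

    With row and column totals held fixed, the top-left cell follows a
    hypergeometric distribution. The two-sided p-value sums the probabilities
    of all tables that are no more likely than the observed one.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Probability that the top-left cell equals x given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_observed = prob(a)
    low = max(0, col1 - row2)
    high = min(row1, col1)
    # Small relative tolerance guards against floating-point ties.
    return sum(
        prob(x) for x in range(low, high + 1)
        if prob(x) <= p_observed * (1 + 1e-9)
    )


# Hypothetical counts, e.g. two shape categories by two subtype categories.
p_value = fisher_exact_2x2(3, 1, 1, 3)
print(round(p_value, 4))
```

    For tables with more than two rows or columns, which shape-by-subtype comparisons may well require, a generalized exact test from a statistics package would be needed instead.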

  6. Insights into bilaterian evolution from three spiralian genomes

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Simakov, Oleg; Marletaz, Ferdinand; Cho, Sung-Jin

    2012-01-07

    Current genomic perspectives on animal diversity neglect two prominent phyla, the molluscs and annelids, that together account for nearly one-third of known marine species and are important both ecologically and as experimental systems in classical embryology. Here we describe the draft genomes of the owl limpet (Lottia gigantea), a marine polychaete (Capitella teleta) and a freshwater leech (Helobdella robusta), and compare them with other animal genomes to investigate the origin and diversification of bilaterians from a genomic perspective. We find that the genome organization, gene structure and functional content of these species are more similar to those of some invertebrate deuterostome genomes (for example, amphioxus and sea urchin) than those of other protostomes that have been sequenced to date (flies, nematodes and flatworms). The conservation of these genomic features enables us to expand the inventory of genes present in the last common bilaterian ancestor, establish the tripartite diversification of bilaterians using multiple genomic characteristics and identify ancient conserved long- and short-range genetic linkages across metazoans. Superimposed on this broadly conserved pan-bilaterian background we find examples of lineage-specific genome evolution, including varying rates of rearrangement, intron gain and loss, expansions and contractions of gene families, and the evolution of clade-specific genes that produce the unique content of each genome.

  7. The genome sequence of E. coli W (ATCC 9637): comparative genome analysis and an improved genome-scale reconstruction of E. coli

    PubMed Central

    2011-01-01

    Background Escherichia coli is a model prokaryote, an important pathogen, and a key organism for industrial biotechnology. E. coli W (ATCC 9637), one of four strains designated as safe for laboratory purposes, has not been sequenced. E. coli W is a fast-growing strain and is the only safe strain that can utilize sucrose as a carbon source. Lifecycle analysis has demonstrated that sucrose from sugarcane is a preferred carbon source for industrial bioprocesses. Results We have sequenced and annotated the genome of E. coli W. The chromosome is 4,900,968 bp and encodes 4,764 ORFs. Two plasmids, pRK1 (102,536 bp) and pRK2 (5,360 bp), are also present. W has unique features relative to other sequenced laboratory strains (K-12, B and Crooks): it has a larger genome and belongs to phylogroup B1 rather than A. W also grows on a much broader range of carbon sources than does K-12. A genome-scale reconstruction was developed and validated in order to interrogate metabolic properties. Conclusions The genome of W is more similar to commensal and pathogenic B1 strains than phylogroup A strains, and therefore has greater utility for comparative analyses with these strains. W should therefore be the strain of choice, or 'type strain' for group B1 comparative analyses. The genome annotation and tools created here are expected to allow further utilization and development of E. coli W as an industrial organism for sucrose-based bioprocesses. Refinements in our E. coli metabolic reconstruction allow it to more accurately define E. coli metabolism relative to previous models. PMID:21208457

  8. Detecting Positive Selection of Korean Native Goat Populations Using Next-Generation Sequencing

    PubMed Central

    Lee, Wonseok; Ahn, Sojin; Taye, Mengistie; Sung, Samsun; Lee, Hyun-Jeong; Cho, Seoae; Kim, Heebal

    2016-01-01

    Goats (Capra hircus) are one of the oldest species of domesticated animals. Native Korean goats are a particularly interesting group, as they are indigenous to the area and were raised in the Korean peninsula almost 2,000 years ago. Although they have a small body size and produce low volumes of milk and meat, they are quite resistant to lumbar paralysis. Our study aimed to reveal the distinct genetic features and patterns of selection in native Korean goats by comparing the genomes of native Korean goat and crossbred goat populations. We sequenced the whole genome of 15 native Korean goats and 11 crossbred goats using next-generation sequencing (Illumina platform) to compare the genomes of the two populations. We found decreased nucleotide diversity in the native Korean goats compared to the crossbred goats. Genetic structural analysis demonstrated that the native Korean goat and crossbred goat populations shared a common ancestry, but were clearly distinct. Finally, to reveal the native Korean goat’s selective sweep region, selective sweep signals were identified in the native Korean goat genome using cross-population extended haplotype homozygosity (XP-EHH) and a cross-population composite likelihood ratio test (XP-CLR). As a result, we were able to identify candidate genes for recent selection, such as the CCR3 gene, which is related to lumbar paralysis resistance. Combined with future studies and recent goat genome information, this study will contribute to a thorough understanding of the native Korean goat genome. PMID:27989103
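
    The nucleotide diversity comparison mentioned in this record can be illustrated with a minimal Python sketch that computes the average number of pairwise differences per site across a set of aligned sequences. This is a textbook-style simplification under stated assumptions: the function name `nucleotide_diversity` and the toy sequences are hypothetical, and the study's actual pipeline (working from whole-genome variant calls, typically in genomic windows) is not described in the abstract.

```python
from itertools import combinations


def nucleotide_diversity(sequences):
    """Average pairwise differences per site for equal-length aligned sequences."""
    n = len(sequences)
    if n < 2:
        raise ValueError("at least two sequences are required")
    length = len(sequences[0])
    if length == 0 or any(len(s) != length for s in sequences):
        raise ValueError("sequences must be non-empty and of equal length")

    # Count mismatched positions over every unordered pair of sequences.
    total_differences = sum(
        sum(x != y for x, y in zip(s1, s2))
        for s1, s2 in combinations(sequences, 2)
    )
    number_of_pairs = n * (n - 1) // 2
    return total_differences / (number_of_pairs * length)


# Toy aligned sequences (hypothetical, not data from the study).
population = ["ACGT", "ACGA", "ACGT"]
print(nucleotide_diversity(population))
```

    A lower value in one population than another, as reported for the native Korean goats relative to the crossbred goats, means that randomly chosen pairs of individuals differ at fewer sites.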

  9. Detecting Positive Selection of Korean Native Goat Populations Using Next-Generation Sequencing.

    PubMed

    Lee, Wonseok; Ahn, Sojin; Taye, Mengistie; Sung, Samsun; Lee, Hyun-Jeong; Cho, Seoae; Kim, Heebal

    2016-12-01

    Goats (Capra hircus) are one of the oldest species of domesticated animals. Native Korean goats are a particularly interesting group, as they are indigenous to the area and were raised in the Korean peninsula almost 2,000 years ago. Although they have a small body size and produce low volumes of milk and meat, they are quite resistant to lumbar paralysis. Our study aimed to reveal the distinct genetic features and patterns of selection in native Korean goats by comparing the genomes of native Korean goat and crossbred goat populations. We sequenced the whole genome of 15 native Korean goats and 11 crossbred goats using next-generation sequencing (Illumina platform) to compare the genomes of the two populations. We found decreased nucleotide diversity in the native Korean goats compared to the crossbred goats. Genetic structural analysis demonstrated that the native Korean goat and crossbred goat populations shared a common ancestry, but were clearly distinct. Finally, to reveal the native Korean goat's selective sweep region, selective sweep signals were identified in the native Korean goat genome using cross-population extended haplotype homozygosity (XP-EHH) and a cross-population composite likelihood ratio test (XP-CLR). As a result, we were able to identify candidate genes for recent selection, such as the CCR3 gene, which is related to lumbar paralysis resistance. Combined with future studies and recent goat genome information, this study will contribute to a thorough understanding of the native Korean goat genome.

  10. Distinct p53 genomic binding patterns in normal and cancer-derived human cells

    PubMed Central

    McCorkle, Sean R; McCombie, WR; Dunn, John J

    2011-01-01

    Here, we report genome-wide analysis of the tumor suppressor p53 binding sites in normal human cells. 743 high-confidence ChIP-seq peaks representing putative genomic binding sites were identified in normal IMR90 fibroblasts using a reference chromatin sample. More than 40% were located within 2 kb of a transcription start site (TSS), a distribution similar to that documented for individually studied, functional p53 binding sites and, to date, not observed by previous p53 genome-wide studies. Nearly half of the high-confidence binding sites in the IMR90 cells reside in CpG islands in marked contrast to sites reported in cancer-derived cells. The distinct genomic features of the IMR90 binding sites do not reflect a distinct preference for specific sequences, since the de novo developed p53 motif based on our study is similar to those reported by genome-wide studies of cancer cells. More likely, the different chromatin landscape in normal, compared with cancer-derived cells, influences p53 binding via modulating availability of the sites. We compared the IMR90 ChIP-seq peaks to the recently published IMR90 methylome and demonstrated that they are enriched at hypomethylated DNA. Our study represents the first genome-wide, de novo mapping of p53 binding sites in normal human cells and reveals that p53 binding sites reside in distinct genomic landscapes in normal and cancer-derived human cells. PMID:22127205

  11. Evolutionary trajectories of snake genes and genomes revealed by comparative analyses of five-pacer viper

    PubMed Central

    Yin, Wei; Wang, Zong-ji; Li, Qi-ye; Lian, Jin-ming; Zhou, Yang; Lu, Bing-zheng; Jin, Li-jun; Qiu, Peng-xin; Zhang, Pei; Zhu, Wen-bo; Wen, Bo; Huang, Yi-jun; Lin, Zhi-long; Qiu, Bi-tao; Su, Xing-wen; Yang, Huan-ming; Zhang, Guo-jie; Yan, Guang-mei; Zhou, Qi

    2016-01-01

    Snakes have numerous features distinctive from other tetrapods and a rich history of genome evolution that is still obscure. Here, we report the high-quality genome of the five-pacer viper, Deinagkistrodon acutus, and comparative analyses with other representative snake and lizard genomes. We map the evolutionary trajectories of transposable elements (TEs), developmental genes and sex chromosomes onto the snake phylogeny. TEs exhibit dynamic lineage-specific expansion, and many viper TEs show brain-specific gene expression along with their nearby genes. We detect signatures of adaptive evolution in olfactory, venom and thermal-sensing genes and also functional degeneration of genes associated with vision and hearing. Lineage-specific relaxation of functional constraints on respective Hox and Tbx limb-patterning genes supports fossil evidence for a successive loss of forelimbs then hindlimbs during snake evolution. Finally, we infer that the ZW sex chromosome pair had undergone at least three recombination suppression events in the ancestor of advanced snakes. These results altogether forge a framework for our deep understanding into snakes' history of molecular evolution. PMID:27708285

  12. Contribution of transposable elements and distal enhancers to evolution of human-specific features of interphase chromatin architecture in embryonic stem cells.

    PubMed

    Glinsky, Gennadi V

    2018-03-01

    Transposable elements have made major evolutionary impacts on creation of primate-specific and human-specific genomic regulatory loci and species-specific genomic regulatory networks (GRNs). Molecular and genetic definitions of human-specific changes to GRNs contributing to development of unique to human phenotypes remain a highly significant challenge. Genome-wide proximity placement analysis of diverse families of human-specific genomic regulatory loci (HSGRL) identified topologically associating domains (TADs) that are significantly enriched for HSGRL and designated rapidly evolving in human TADs. Here, the analysis of HSGRL, hESC-enriched enhancers, super-enhancers (SEs), and specific sub-TAD structures termed super-enhancer domains (SEDs) has been performed. In the hESC genome, 331 of 504 (66%) of SED-harboring TADs contain HSGRL and 68% of SEDs co-localize with HSGRL, suggesting that emergence of HSGRL may have rewired SED-associated GRNs within specific TADs by inserting novel and/or erasing existing non-coding regulatory sequences. Consequently, markedly distinct features of the principal regulatory structures of interphase chromatin evolved in the hESC genome compared to mouse: the SED quantity is 3-fold higher and the median SED size is significantly larger. Concomitantly, the overall TAD quantity is increased by 42% while the median TAD size is significantly decreased (p = 9.11E-37) in the hESC genome. Present analyses illustrate a putative global role for transposable elements and HSGRL in shaping the human-specific features of the interphase chromatin organization and functions, which are facilitated by accelerated creation of novel transcription factor binding sites and new enhancers driven by targeted placement of HSGRL at defined genomic coordinates. 
A trend toward the convergence of TAD and SED architectures of interphase chromatin in the hESC genome may reflect changes of 3D-folding patterns of linear chromatin fibers designed to enhance both regulatory complexity and functional precision of GRNs by creating predominantly a single gene (or a set of functionally linked genes) per regulatory domain structures. Collectively, present analyses reveal critical evolutionary contributions of transposable elements and distal enhancers to creation of thousands of primate- and human-specific elements of a chromatin folding code, which defines the 3D context of interphase chromatin both restricting and facilitating biological functions of GRNs.

  13. FALDO: a semantic standard for describing the location of nucleotide and protein feature annotation.

    PubMed

    Bolleman, Jerven T; Mungall, Christopher J; Strozzi, Francesco; Baran, Joachim; Dumontier, Michel; Bonnal, Raoul J P; Buels, Robert; Hoehndorf, Robert; Fujisawa, Takatomo; Katayama, Toshiaki; Cock, Peter J A

    2016-06-13

    Nucleotide and protein sequence feature annotations are essential to understand biology on the genomic, transcriptomic, and proteomic level. Using Semantic Web technologies to query biological annotations, there was no standard that described this potentially complex location information as subject-predicate-object triples. We have developed an ontology, the Feature Annotation Location Description Ontology (FALDO), to describe the positions of annotated features on linear and circular sequences. FALDO can be used to describe nucleotide features in sequence records, protein annotations, and glycan binding sites, among other features in coordinate systems of the aforementioned "omics" areas. Using the same data format to represent sequence positions that are independent of file formats allows us to integrate sequence data from multiple sources and data types. The genome browser JBrowse is used to demonstrate accessing multiple SPARQL endpoints to display genomic feature annotations, as well as protein annotations from UniProt mapped to genomic locations. Our ontology allows users to uniformly describe - and potentially merge - sequence annotations from multiple sources. Data sources using FALDO can prospectively be retrieved using federalised SPARQL queries against public SPARQL endpoints and/or local private triple stores.
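
As a rough illustration of what "location information as subject-predicate-object triples" can look like, the sketch below builds FALDO-style triples as plain Python tuples. The term names (`location`, `Region`, `begin`, `end`, `ExactPosition`, `position`, `reference`) and the namespace are given as recalled from the FALDO vocabulary and should be checked against the published ontology; the function `region_triples` and the `example.org` identifiers are hypothetical.

```python
FALDO = "http://biohackathon.org/resource/faldo#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def region_triples(feature, reference, begin, end):
    """Describe a feature's location as a list of (subject, predicate, object).

    Hypothetical helper: node identifiers are derived from the feature IRI,
    and both boundaries are modelled as exact positions on `reference`.
    """
    region = feature + "/region"
    begin_node = region + "/begin"
    end_node = region + "/end"
    triples = [
        (feature, FALDO + "location", region),
        (region, RDF_TYPE, FALDO + "Region"),
        (region, FALDO + "begin", begin_node),
        (region, FALDO + "end", end_node),
    ]
    for node, pos in ((begin_node, begin), (end_node, end)):
        triples += [
            (node, RDF_TYPE, FALDO + "ExactPosition"),
            (node, FALDO + "position", pos),
            (node, FALDO + "reference", reference),
        ]
    return triples


if __name__ == "__main__":
    # Made-up identifiers and coordinates, for illustration only.
    for t in region_triples("http://example.org/gene1",
                            "http://example.org/chr1", 100, 250):
        print(t)
```

Because each boundary is its own node carrying a position and a reference sequence, the same structure can describe features on any sequence record regardless of the file format the annotation came from.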

  14. FALDO: a semantic standard for describing the location of nucleotide and protein feature annotation

    DOE PAGES

    Bolleman, Jerven T.; Mungall, Christopher J.; Strozzi, Francesco; ...

    2016-06-13

    Nucleotide and protein sequence feature annotations are essential to understand biology on the genomic, transcriptomic, and proteomic level. Using Semantic Web technologies to query biological annotations, there was no standard that described this potentially complex location information as subject-predicate-object triples. In this paper, we have developed an ontology, the Feature Annotation Location Description Ontology (FALDO), to describe the positions of annotated features on linear and circular sequences. FALDO can be used to describe nucleotide features in sequence records, protein annotations, and glycan binding sites, among other features in coordinate systems of the aforementioned “omics” areas. Using the same data format to represent sequence positions that are independent of file formats allows us to integrate sequence data from multiple sources and data types. The genome browser JBrowse is used to demonstrate accessing multiple SPARQL endpoints to display genomic feature annotations, as well as protein annotations from UniProt mapped to genomic locations. Our ontology allows users to uniformly describe – and potentially merge – sequence annotations from multiple sources. Finally, data sources using FALDO can prospectively be retrieved using federalised SPARQL queries against public SPARQL endpoints and/or local private triple stores.

  15. FALDO: a semantic standard for describing the location of nucleotide and protein feature annotation

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bolleman, Jerven T.; Mungall, Christopher J.; Strozzi, Francesco

    Nucleotide and protein sequence feature annotations are essential to understand biology on the genomic, transcriptomic, and proteomic level. Using Semantic Web technologies to query biological annotations, there was no standard that described this potentially complex location information as subject-predicate-object triples. In this paper, we have developed an ontology, the Feature Annotation Location Description Ontology (FALDO), to describe the positions of annotated features on linear and circular sequences. FALDO can be used to describe nucleotide features in sequence records, protein annotations, and glycan binding sites, among other features in coordinate systems of the aforementioned “omics” areas. Using the same data format to represent sequence positions that are independent of file formats allows us to integrate sequence data from multiple sources and data types. The genome browser JBrowse is used to demonstrate accessing multiple SPARQL endpoints to display genomic feature annotations, as well as protein annotations from UniProt mapped to genomic locations. Our ontology allows users to uniformly describe – and potentially merge – sequence annotations from multiple sources. Finally, data sources using FALDO can prospectively be retrieved using federalised SPARQL queries against public SPARQL endpoints and/or local private triple stores.

  16. Anonymization of electronic medical records for validating genome-wide association studies

    PubMed Central

    Loukides, Grigorios; Gkoulalas-Divanis, Aris; Malin, Bradley

    2010-01-01

    Genome-wide association studies (GWAS) facilitate the discovery of genotype–phenotype relations from population-based sequence databases, which is an integral facet of personalized medicine. The increasing adoption of electronic medical records allows large amounts of patients’ standardized clinical features to be combined with the genomic sequences of these patients and shared to support validation of GWAS findings and to enable novel discoveries. However, disseminating these data “as is” may lead to patient reidentification when genomic sequences are linked to resources that contain the corresponding patients’ identity information based on standardized clinical features. This work proposes an approach that provably prevents this type of data linkage and furnishes a result that helps support GWAS. Our approach automatically extracts potentially linkable clinical features and modifies them in a way that they can no longer be used to link a genomic sequence to a small number of patients, while preserving the associations between genomic sequences and specific sets of clinical features corresponding to GWAS-related diseases. Extensive experiments with real patient data derived from the Vanderbilt's University Medical Center verify that our approach generates data that eliminate the threat of individual reidentification, while supporting GWAS validation and clinical case analysis tasks. PMID:20385806

  17. The platypus genome unraveled.

    PubMed

    O'Brien, Stephen J

    2008-06-13

    The genome of the platypus has been sequenced, assembled, and annotated by an international genomics team. Like the animal itself the platypus genome contains an amalgam of mammal, reptile, and bird-like features.

  18. Web Apollo: a web-based genomic annotation editing platform.

    PubMed

    Lee, Eduardo; Helt, Gregg A; Reese, Justin T; Munoz-Torres, Monica C; Childers, Chris P; Buels, Robert M; Stein, Lincoln; Holmes, Ian H; Elsik, Christine G; Lewis, Suzanna E

    2013-08-30

    Web Apollo is the first instantaneous, collaborative genomic annotation editor available on the web. One of the natural consequences following from current advances in sequencing technology is that there are more and more researchers sequencing new genomes. These researchers require tools to describe the functional features of their newly sequenced genomes. With Web Apollo researchers can use any of the common browsers (for example, Chrome or Firefox) to jointly analyze and precisely describe the features of a genome in real time, whether they are in the same room or working from opposite sides of the world.

  19. Web Apollo: a web-based genomic annotation editing platform

    PubMed Central

    2013-01-01

    Web Apollo is the first instantaneous, collaborative genomic annotation editor available on the web. One of the natural consequences following from current advances in sequencing technology is that there are more and more researchers sequencing new genomes. These researchers require tools to describe the functional features of their newly sequenced genomes. With Web Apollo researchers can use any of the common browsers (for example, Chrome or Firefox) to jointly analyze and precisely describe the features of a genome in real time, whether they are in the same room or working from opposite sides of the world. PMID:24000942

  20. Comparative genomic analysis of single-molecule sequencing and hybrid approaches for finishing the Clostridium autoethanogenum JA1-1 strain DSM 10061 genome

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Brown, Steven D; Nagaraju, Shilpa; Utturkar, Sagar M

    Background: Clostridium autoethanogenum strain JA1-1 (DSM 10061) is an acetogen capable of fermenting CO, CO2 and H2 (e.g. from syngas or waste gases) into biofuel ethanol and commodity chemicals such as 2,3-butanediol. A draft genome sequence consisting of 100 contigs has been published. Results: A closed, high-quality genome sequence for C. autoethanogenum DSM10061 was generated using only the latest single-molecule DNA sequencing technology and without the need for manual finishing. It is assigned to the most complex genome classification based upon genome features such as repeats, prophage, and nine copies of the rRNA gene operons. It has a low G + C content of 31.1%. Illumina, 454, and Illumina/454 hybrid assemblies were generated and then compared to the draft and PacBio assemblies using summary statistics, CGAL, QUAST and REAPR bioinformatics tools and comparative genomic approaches. Assemblies based upon shorter read DNA technologies were confounded by the large number of repeats and their size, which in the case of the rRNA gene operons were ~5 kb. CRISPR (Clustered Regularly Interspaced Short Palindromic Repeats) systems among biotechnologically relevant Clostridia were classified and related to plasmid content and prophages. Potential associations between plasmid content and CRISPR systems may have implications for historical industrial scale Acetone-Butanol-Ethanol (ABE) fermentation failures and future large scale bacterial fermentations. While C. autoethanogenum contains an active CRISPR system, no such system is present in the closely related Clostridium ljungdahlii DSM 13528. A common prophage inserted into the Arg-tRNA shared between the strains suggests a common ancestor. However, C. ljungdahlii contains several additional putative prophages and it has more than double the amount of prophage DNA compared to C. autoethanogenum. Other differences include important metabolic genes for central metabolism (such as an additional hydrogenase and the absence of a phosphoenolpyruvate synthase) and substrate utilization pathways (mannose and aromatics utilization) that might explain phenotypic differences between C. autoethanogenum and C. ljungdahlii. Conclusions: Single molecule sequencing will be increasingly used to produce finished microbial genomes. The complete genome will facilitate comparative genomics and functional genomics and support future comparisons between Clostridia and studies that examine the evolution of plasmids, bacteriophage and CRISPR systems.

  1. Genetic Drift, Not Life History or RNAi, Determine Long-Term Evolution of Transposable Elements

    PubMed Central

    Szitenberg, Amir; Cha, Soyeon; Opperman, Charles H.; Bird, David M.; Blaxter, Mark L.; Lunt, David H.

    2016-01-01

    Transposable elements (TEs) are a major source of genome variation across the branches of life. Although TEs may play an adaptive role in their host’s genome, they are more often deleterious, and purifying selection is an important factor controlling their genomic loads. In contrast, life history, mating system, GC content, and RNAi pathways have been suggested to account for the disparity of TE loads in different species. Previous studies of fungal, plant, and animal genomes have reported conflicting results regarding the direction in which these genomic features drive TE evolution. Many of these studies have had limited power, however, because they studied taxonomically narrow systems, comparing only a limited number of phylogenetically independent contrasts, and did not address long-term effects on TE evolution. Here, we test the long-term determinants of TE evolution by comparing 42 nematode genomes spanning over 500 million years of diversification. This analysis includes numerous transitions between life history states and RNAi pathways, and evaluates whether these forces are sufficiently persistent to affect the long-term evolution of TE loads in eukaryotic genomes. Although we demonstrate statistical power to detect selection, we find no evidence that variation in these factors influences genomic TE loads across extended periods of time. In contrast, the effects of genetic drift appear to persist and control TE variation among species. We suggest that variation in the tested factors is largely inconsequential to the large differences in TE content observed between genomes, and only by these large-scale comparisons can we distinguish long-term and persistent effects from transient or random changes. PMID:27566762

  2. Metabolic Versatility and Antibacterial Metabolite Biosynthesis Are Distinguishing Genomic Features of the Fire Blight Antagonist Pantoea vagans C9-1

    PubMed Central

    Smits, Theo H. M.; Rezzonico, Fabio; Kamber, Tim; Blom, Jochen; Goesmann, Alexander; Ishimaru, Carol A.; Frey, Jürg E.; Stockwell, Virginia O.; Duffy, Brion

    2011-01-01

    Background: Pantoea vagans is a commercialized biological control agent used against the pome fruit bacterial disease fire blight, caused by Erwinia amylovora. Compared to other biocontrol agents, relatively little is currently known regarding Pantoea genetics. Better understanding of antagonist mechanisms of action and ecological fitness is critical to improving efficacy. Principal Findings: Genome analysis indicated two major factors contribute to biocontrol activity: competition for limiting substrates and antibacterial metabolite production. Pathways for utilization of a broad diversity of sugars and acquisition of iron were identified. Metabolism of sorbitol by P. vagans C9-1 may be a major metabolic feature in biocontrol of fire blight. Biosynthetic genes for the antibacterial peptide pantocin A were found on a chromosomal 28-kb genomic island, and for dapdiamide E on the plasmid pPag2. There was no evidence of potential virulence factors that could enable an animal or phytopathogenic lifestyle and no indication of any genetic-based biosafety risk in the antagonist. Conclusions: Identifying key determinants contributing to disease suppression allows the development of procedures to follow their expression in planta, and the genome sequence contributes to rational risk assessment regarding the use of the biocontrol strain in agricultural systems. PMID:21789243

  3. Cas9 versus Cas12a/Cpf1: Structure-function comparisons and implications for genome editing.

    PubMed

    Swarts, Daan C; Jinek, Martin

    2018-05-22

    Cas9 and Cas12a are multidomain CRISPR-associated nucleases that can be programmed with a guide RNA to bind and cleave complementary DNA targets. The guide RNA sequence can be varied, making these effector enzymes versatile tools for genome editing and gene regulation applications. While Cas9 is currently the best-characterized and most widely used nuclease for such purposes, Cas12a (previously named Cpf1) has recently emerged as an alternative for Cas9. Cas9 and Cas12a have distinct evolutionary origins and exhibit different structural architectures, resulting in distinct molecular mechanisms. Here we compare the structural and mechanistic features that distinguish Cas9 and Cas12a, and describe how these features modulate their activity. We discuss implications for genome editing, and how they may influence the choice of Cas9 or Cas12a for specific applications. Finally, we review recent studies in which Cas12a has been utilized as a genome editing tool. © 2018 Wiley Periodicals, Inc.

  4. SolEST database: a "one-stop shop" approach to the study of Solanaceae transcriptomes.

    PubMed

    D'Agostino, Nunzio; Traini, Alessandra; Frusciante, Luigi; Chiusano, Maria Luisa

    2009-11-30

    Since no genome sequences of solanaceous plants have yet been completed, expressed sequence tag (EST) collections represent a reliable tool for broad sampling of Solanaceae transcriptomes, an attractive route for understanding Solanaceae genome functionality and a powerful reference for the structural annotation of emerging Solanaceae genome sequences. We describe the SolEST database http://biosrv.cab.unina.it/solestdb which integrates different EST datasets from both cultivated and wild Solanaceae species and from two species of the genus Coffea. Background as well as processed data contained in the database, extensively linked to external related resources, represent an invaluable source of information for these plant families. Two novel features differentiate SolEST from other resources: i) the option of accessing and then visualizing Solanaceae EST/TC alignments along the emerging tomato and potato genome sequences; ii) the opportunity to compare different Solanaceae assemblies generated by diverse research groups in the attempt to address a common complaint in the SOL community. Different databases have been established worldwide for collecting Solanaceae ESTs and are related in concept, content and utility to the one presented herein. However, the SolEST database has several distinguishing features that make it appealing for the research community and facilitates a "one-stop shop" for the study of Solanaceae transcriptomes.

  5. Genetic resources offer efficient tools for rice functional genomics research.

    PubMed

    Lo, Shuen-Fang; Fan, Ming-Jen; Hsing, Yue-Ie; Chen, Liang-Jwu; Chen, Shu; Wen, Ien-Chie; Liu, Yi-Lun; Chen, Ku-Ting; Jiang, Mirng-Jier; Lin, Ming-Kuang; Rao, Meng-Yen; Yu, Lin-Chih; Ho, Tuan-Hua David; Yu, Su-May

    2016-05-01

    Rice is an important crop and major model plant for monocot functional genomics studies. With the establishment of various genetic resources for rice genomics, the next challenge is to systematically assign functions to predicted genes in the rice genome. Compared with the robustness of genome sequencing and bioinformatics techniques, progress in understanding the function of rice genes has lagged, hampering the utilization of rice genes for cereal crop improvement. The use of transfer DNA (T-DNA) insertional mutagenesis offers the advantage of uniform distribution throughout the rice genome, but preferentially in gene-rich regions, resulting in direct gene knockout or activation of genes within 20-30 kb up- and downstream of the T-DNA insertion site and high gene tagging efficiency. Here, we summarize the recent progress in functional genomics using the T-DNA-tagged rice mutant population. We also discuss important features of T-DNA activation- and knockout-tagging and promoter-trapping of the rice genome in relation to mutant and candidate gene characterizations and how to more efficiently utilize rice mutant populations and datasets for high-throughput functional genomics and phenomics studies by forward and reverse genetics approaches. These studies may facilitate the translation of rice functional genomics research to improvements of rice and other cereal crops. © 2015 John Wiley & Sons Ltd.

  6. Deep Investigation of Arabidopsis thaliana Junk DNA Reveals a Continuum between Repetitive Elements and Genomic Dark Matter

    PubMed Central

    Maumus, Florian; Quesneville, Hadi

    2014-01-01

    Eukaryotic genomes contain highly variable amounts of DNA with no apparent function. This so-called junk DNA is composed of two components: repeated and repeat-derived sequences (together referred to as the repeatome), and non-annotated sequences also known as genomic dark matter. Because of their high duplication rates as compared to other genomic features, transposable elements are predominant contributors to the repeatome, and the products of their decay are thought to be a major source of genomic dark matter. Determining the origin and composition of junk DNA is thus important for understanding genome evolution as well as host biology. In this study, we have used a combination of tools to show that the repeatome from the small and reducing A. thaliana genome is significantly larger than previously thought. Furthermore, we present the concepts and results from a series of innovative approaches suggesting that a significant amount of the A. thaliana dark matter is of repetitive origin. As a tentative standard for the community, we propose a deep compendium annotation of the A. thaliana repeatome that may help to further address genome evolution as well as transcriptional and epigenetic regulation in this model plant. PMID:24709859

  7. Genome-Based Taxonomic Classification of Bacteroidetes

    DOE PAGES

    Hahnke, Richard L.; Meier-Kolthoff, Jan P.; García-López, Marina; ...

    2016-12-20

    The bacterial phylum Bacteroidetes, characterized by a distinct gliding motility, occurs in a broad variety of ecosystems, habitats, life styles, and physiologies. Accordingly, taxonomic classification of the phylum, based on a limited number of features, proved difficult and controversial in the past, for example, when decisions were based on unresolved phylogenetic trees of the 16S rRNA gene sequence. Here we use a large collection of type-strain genomes from Bacteroidetes and closely related phyla for assessing their taxonomy based on the principles of phylogenetic classification and trees inferred from genome-scale data. No significant conflict between 16S rRNA gene and whole-genome phylogenetic analysis is found, whereas many but not all of the involved taxa are supported as monophyletic groups, particularly in the genome-scale trees. Phenotypic and phylogenomic features support the separation of Balneolaceae as new phylum Balneolaeota from Rhodothermaeota and of Saprospiraceae as new class Saprospiria from Chitinophagia. Epilithonimonas is nested within the older genus Chryseobacterium and without significant phenotypic differences; thus merging the two genera is proposed. Similarly, Vitellibacter is proposed to be included in Aequorivita. Flexibacter is confirmed as being heterogeneous and dissected, yielding six distinct genera. Hallella seregens is a later heterotypic synonym of Prevotella dentalis. Compared to values directly calculated from genome sequences, the G+C content mentioned in many species descriptions is too imprecise; moreover, corrected G+C content values have a significantly better fit to the phylogeny. Corresponding emendations of species descriptions are provided where necessary. Whereas most observed conflict with the current classification of Bacteroidetes is already visible in 16S rRNA gene trees, as expected whole-genome phylogenies are much better resolved.

  8. Genome-Based Taxonomic Classification of Bacteroidetes

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hahnke, Richard L.; Meier-Kolthoff, Jan P.; García-López, Marina

    The bacterial phylum Bacteroidetes, characterized by a distinct gliding motility, occurs in a broad variety of ecosystems, habitats, life styles, and physiologies. Accordingly, taxonomic classification of the phylum, based on a limited number of features, proved difficult and controversial in the past, for example, when decisions were based on unresolved phylogenetic trees of the 16S rRNA gene sequence. Here we use a large collection of type-strain genomes from Bacteroidetes and closely related phyla for assessing their taxonomy based on the principles of phylogenetic classification and trees inferred from genome-scale data. No significant conflict between 16S rRNA gene and whole-genome phylogenetic analysis is found, whereas many but not all of the involved taxa are supported as monophyletic groups, particularly in the genome-scale trees. Phenotypic and phylogenomic features support the separation of Balneolaceae as new phylum Balneolaeota from Rhodothermaeota and of Saprospiraceae as new class Saprospiria from Chitinophagia. Epilithonimonas is nested within the older genus Chryseobacterium and without significant phenotypic differences; thus merging the two genera is proposed. Similarly, Vitellibacter is proposed to be included in Aequorivita. Flexibacter is confirmed as being heterogeneous and dissected, yielding six distinct genera. Hallella seregens is a later heterotypic synonym of Prevotella dentalis. Compared to values directly calculated from genome sequences, the G+C content mentioned in many species descriptions is too imprecise; moreover, corrected G+C content values have a significantly better fit to the phylogeny. Corresponding emendations of species descriptions are provided where necessary. Whereas most observed conflict with the current classification of Bacteroidetes is already visible in 16S rRNA gene trees, as expected whole-genome phylogenies are much better resolved.

  9. Genome-Based Taxonomic Classification of Bacteroidetes

    PubMed Central

    Hahnke, Richard L.; Meier-Kolthoff, Jan P.; García-López, Marina; Mukherjee, Supratim; Huntemann, Marcel; Ivanova, Natalia N.; Woyke, Tanja; Kyrpides, Nikos C.; Klenk, Hans-Peter; Göker, Markus

    2016-01-01

    The bacterial phylum Bacteroidetes, characterized by a distinct gliding motility, occurs in a broad variety of ecosystems, habitats, life styles, and physiologies. Accordingly, taxonomic classification of the phylum, based on a limited number of features, proved difficult and controversial in the past, for example, when decisions were based on unresolved phylogenetic trees of the 16S rRNA gene sequence. Here we use a large collection of type-strain genomes from Bacteroidetes and closely related phyla for assessing their taxonomy based on the principles of phylogenetic classification and trees inferred from genome-scale data. No significant conflict between 16S rRNA gene and whole-genome phylogenetic analysis is found, whereas many but not all of the involved taxa are supported as monophyletic groups, particularly in the genome-scale trees. Phenotypic and phylogenomic features support the separation of Balneolaceae as new phylum Balneolaeota from Rhodothermaeota and of Saprospiraceae as new class Saprospiria from Chitinophagia. Epilithonimonas is nested within the older genus Chryseobacterium and without significant phenotypic differences; thus merging the two genera is proposed. Similarly, Vitellibacter is proposed to be included in Aequorivita. Flexibacter is confirmed as being heterogeneous and dissected, yielding six distinct genera. Hallella seregens is a later heterotypic synonym of Prevotella dentalis. Compared to values directly calculated from genome sequences, the G+C content mentioned in many species descriptions is too imprecise; moreover, corrected G+C content values have a significantly better fit to the phylogeny. Corresponding emendations of species descriptions are provided where necessary. Whereas most observed conflict with the current classification of Bacteroidetes is already visible in 16S rRNA gene trees, as expected whole-genome phylogenies are much better resolved. PMID:28066339
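
The phrase "values directly calculated from genome sequences" refers to computing G+C content from the sequence itself. A minimal sketch of that calculation follows; the function name `gc_content` and the choice to ignore ambiguous bases such as N are assumptions for illustration, not the procedure used in the study.

```python
def gc_content(sequence):
    """Return G+C content as a percentage of unambiguous bases (A, C, G, T).

    Hypothetical helper: ambiguous symbols such as N are ignored, which is
    one common convention but not the only one.
    """
    seq = sequence.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    if gc + at == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return 100.0 * gc / (gc + at)


if __name__ == "__main__":
    # Toy sequence, not taken from any genome in the study.
    print(round(gc_content("ATGCGCGTTANNNN"), 1))
```

Whether ambiguous bases count toward the denominator changes the result for draft assemblies, so the convention should be stated whenever a G+C value is reported.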

  10. The complete mitochondrial genome of the enigmatic bigheaded turtle (Platysternon): description of unusual genomic features and the reconciliation of phylogenetic hypotheses based on mitochondrial and nuclear DNA

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Parham, James F.; Feldman, Chris R.; Boore, Jeffrey L.

    2005-12-28

    The big-headed turtle (Platysternon megacephalum) from east Asia is the sole living representative of a poorly-studied turtle lineage (Platysternidae). It has no close living relatives, and its phylogenetic position within turtles is one of the outstanding controversies in turtle systematics. Platysternon was traditionally considered to be close to snapping turtles (Chelydridae) based on some studies of its morphology and mitochondrial (mt) DNA; however, other studies of morphology and nuclear (nu) DNA do not support that hypothesis. We sequenced the complete mt genome of Platysternon and the nearly complete mt genomes of two other relevant turtles and compared them to turtle mt genomes from the literature to form the largest molecular dataset used to date to address this issue. The resulting phylogeny robustly rejects the placement of Platysternon with Chelydridae, but instead shows that it is a member of the Testudinoidea, a diverse, nearly globally-distributed group that includes pond turtles and tortoises. We also discovered that Platysternon mtDNA has large-scale gene rearrangements and possesses two, nearly identical, control regions, features that distinguish it from all other studied turtles. Our study robustly determines the phylogenetic placement of Platysternon and provides a well-resolved outline of major turtle lineages, while demonstrating the significantly greater resolving power of comparing large amounts of mt sequence over that of short fragments. Earlier phylogenies placing Platysternon with chelydrids required a temporal gap in the fossil record that is now unnecessary. The duplicated control regions and gene rearrangements of the Platysternon mt DNA probably resulted from the duplication of part of the genome and then the subsequent loss of redundant genes. Although it is possible that having two control regions may provide some advantage, explaining why the control regions would be maintained while some of the duplicated genes were eroded, examples of this are rare. So far, duplicated control regions have been reported for mt genomes from just 12 clades of metazoans, including Platysternon.

  11. GBshape: a genome browser database for DNA shape annotations

    PubMed Central

    Chiu, Tsu-Pei; Yang, Lin; Zhou, Tianyin; Main, Bradley J.; Parker, Stephen C.J.; Nuzhdin, Sergey V.; Tullius, Thomas D.; Rohs, Remo

    2015-01-01

    Many regulatory mechanisms require a high degree of specificity in protein-DNA binding. Nucleotide sequence does not provide an answer to the question of why a protein binds only to a small subset of the many putative binding sites in the genome that share the same core motif. Whereas higher-order effects, such as chromatin accessibility, cooperativity and cofactors, have been described, DNA shape recently gained attention as another feature that fine-tunes the DNA binding specificities of some transcription factor families. Our Genome Browser for DNA shape annotations (GBshape; freely available at http://rohslab.cmb.usc.edu/GBshape/) provides minor groove width, propeller twist, roll, helix twist and hydroxyl radical cleavage predictions for the entire genomes of 94 organisms. Additional genomes can easily be added using the GBshape framework. GBshape can be used to visualize DNA shape annotations qualitatively in a genome browser track format, and to download quantitative values of DNA shape features as a function of genomic position at nucleotide resolution. As biological applications, we illustrate the periodicity of DNA shape features that are present in nucleosome-occupied sequences from human, fly and worm, and we demonstrate structural similarities between transcription start sites in the genomes of four Drosophila species. PMID:25326329

  12. Genomic evolution and chemoresistance in germ-cell tumours.

    PubMed

    Taylor-Weiner, Amaro; Zack, Travis; O'Donnell, Elizabeth; Guerriero, Jennifer L; Bernard, Brandon; Reddy, Anita; Han, G Celine; AlDubayan, Saud; Amin-Mansour, Ali; Schumacher, Steven E; Litchfield, Kevin; Turnbull, Clare; Gabriel, Stacey; Beroukhim, Rameen; Getz, Gad; Carter, Scott L; Hirsch, Michelle S; Letai, Anthony; Sweeney, Christopher; Van Allen, Eliezer M

    2016-11-30

    Germ-cell tumours (GCTs) are derived from germ cells and occur most frequently in the testes. GCTs are histologically heterogeneous and distinctly curable with chemotherapy. Gains of chromosome arm 12p and aneuploidy are nearly universal in GCTs, but specific somatic genomic features driving tumour initiation, chemosensitivity and progression are incompletely characterized. Here, using clinical whole-exome and transcriptome sequencing of precursor, primary (testicular and mediastinal) and chemoresistant metastatic human GCTs, we show that the primary somatic feature of GCTs is highly recurrent chromosome arm level amplifications and reciprocal deletions (reciprocal loss of heterozygosity), variations that are significantly enriched in GCTs compared to 19 other cancer types. These tumours also acquire KRAS mutations during the development from precursor to primary disease, and primary testicular GCTs (TGCTs) are uniformly wild type for TP53. In addition, by functional measurement of apoptotic signalling (BH3 profiling) of fresh tumour and adjacent tissue, we find that primary TGCTs have high mitochondrial priming that facilitates chemotherapy-induced apoptosis. Finally, by phylogenetic analysis of serial TGCTs that emerge with chemotherapy resistance, we show how TGCTs gain additional reciprocal loss of heterozygosity and that this is associated with loss of pluripotency markers (NANOG and POU5F1) in chemoresistant teratomas or transformed carcinomas. Our results demonstrate the distinct genomic features underlying the origins of this disease and associated with the chemosensitivity phenotype, as well as the rare progression to chemoresistance. These results identify the convergence of cancer genomics, mitochondrial priming and GCT evolution, and may provide insights into chemosensitivity and resistance in other cancers.

  13. A Ruby API to query the Ensembl database for genomic features.

    PubMed

    Strozzi, Francesco; Aerts, Jan

    2011-04-01

    The Ensembl database makes genomic features available via its Genome Browser. It is also possible to access the underlying data through a Perl API for advanced querying. We have developed a full-featured Ruby API to the Ensembl databases, providing the same functionality as the Perl interface with additional features. A single Ruby API is used to access different releases of the Ensembl databases and is also able to query multi-species databases. Most functionality of the API is provided using the ActiveRecord pattern. The library depends on introspection to make it release independent. The API is available through the Rubygem system and can be installed with the command gem install ruby-ensembl-api.

  14. Insights into archaeal evolution and symbiosis from the genomes of a nanoarchaeon and its inferred crenarchaeal host from Obsidian Pool, Yellowstone National Park

    PubMed Central

    2013-01-01

    Background: A single cultured marine organism, Nanoarchaeum equitans, represents the Nanoarchaeota branch of symbiotic Archaea, with a highly reduced genome and unusual features such as multiple split genes. Results: The first terrestrial hyperthermophilic member of the Nanoarchaeota was collected from Obsidian Pool, a thermal feature in Yellowstone National Park, separated by single cell isolation, and sequenced together with its putative host, a Sulfolobales archaeon. Both the new Nanoarchaeota (Nst1) and N. equitans lack most biosynthetic capabilities, and phylogenetic analysis of ribosomal RNA and protein sequences indicates that the two form a deep-branching archaeal lineage. However, the Nst1 genome is more than 20% larger, and encodes a complete gluconeogenesis pathway as well as the full complement of archaeal flagellum proteins. With a larger genome, a smaller repertoire of split protein encoding genes and no split non-contiguous tRNAs, Nst1 appears to have experienced less severe genome reduction than N. equitans. These findings imply that, rather than representing ancestral characters, the extremely compact genomes and multiple split genes of Nanoarchaeota are derived characters associated with their symbiotic or parasitic lifestyle. The inferred host of Nst1 is potentially autotrophic, with a streamlined genome and simplified central and energetic metabolism as compared to other Sulfolobales. Conclusions: Comparison of the N. equitans and Nst1 genomes suggests that the marine and terrestrial lineages of Nanoarchaeota share a common ancestor that was already a symbiont of another archaeon. The two distinct Nanoarchaeota-host genomic data sets offer novel insights into the evolution of archaeal symbiosis and parasitism, enabling further studies of the cellular and molecular mechanisms of these relationships. Reviewers: This article was reviewed by Patrick Forterre, Bettina Siebers (nominated by Michael Galperin) and Purification Lopez-Garcia. PMID:23607440

  15. Comparative analyses of putative toxin gene homologs from an Old World viper, Daboia russelii

    PubMed Central

    Krishnan, Neeraja M.

    2017-01-01

    Availability of snake genome sequences has opened up exciting areas of research on comparative genomics and gene diversity. One of the challenges in studying snake genomes is the acquisition of biological material from live animals, especially from the venomous ones, making the process cumbersome and time-consuming. Here, we report comparative sequence analyses of putative toxin gene homologs from Russell’s viper (Daboia russelii) using whole-genome sequencing data obtained from shed skin. When compared with the major venom proteins in Russell’s viper studied previously, we found 45–100% sequence similarity between the venom proteins and their putative homologs in the skin. Additionally, comparative analyses of 20 putative toxin gene family homologs provided evidence of unique sequence motifs in nerve growth factor (NGF), platelet derived growth factor (PDGF), Kunitz/Bovine pancreatic trypsin inhibitor (Kunitz BPTI), cysteine-rich secretory proteins, antigen 5, and pathogenesis-related 1 proteins (CAP) and cysteine-rich secretory protein (CRISP). In those derived proteins, we identified V11 and T35 in the NGF domain; F23 and A29 in the PDGF domain; N69, K2 and A5 in the CAP domain; and Q17 in the CRISP domain to be responsible for differences in the largest pockets across the protein domain structures in crotalines, viperines and elapids from the in silico structure-based analysis. Similarly, residues F10, Y11 and E20 appear to play an important role in the protein structures across the kunitz protein domain of viperids and elapids. Our study highlights the usefulness of shed skin in obtaining good quality high-molecular weight DNA for comparative genomic studies, and provides evidence towards the unique features and evolution of putative venom gene homologs in vipers. PMID:29230357

  16. Advancing The Cancer Genome Atlas glioma MRI collections with expert segmentation labels and radiomic features

    PubMed Central

    Bakas, Spyridon; Akbari, Hamed; Sotiras, Aristeidis; Bilello, Michel; Rozycki, Martin; Kirby, Justin S.; Freymann, John B.; Farahani, Keyvan; Davatzikos, Christos

    2017-01-01

    Gliomas belong to a group of central nervous system tumors, and consist of various sub-regions. Gold standard labeling of these sub-regions in radiographic imaging is essential for both clinical and computational studies, including radiomic and radiogenomic analyses. Towards this end, we release segmentation labels and radiomic features for all pre-operative multimodal magnetic resonance imaging (MRI) (n=243) of the multi-institutional glioma collections of The Cancer Genome Atlas (TCGA), publicly available in The Cancer Imaging Archive (TCIA). Pre-operative scans were identified in both glioblastoma (TCGA-GBM, n=135) and low-grade-glioma (TCGA-LGG, n=108) collections via radiological assessment. The glioma sub-region labels were produced by an automated state-of-the-art method and manually revised by an expert board-certified neuroradiologist. An extensive panel of radiomic features was extracted based on the manually-revised labels. This set of labels and features should enable i) direct utilization of the TCGA/TCIA glioma collections towards repeatable, reproducible and comparative quantitative studies leading to new predictive, prognostic, and diagnostic assessments, as well as ii) performance evaluation of computer-aided segmentation methods, and comparison to our state-of-the-art method. PMID:28872634

  17. Gee Fu: a sequence version and web-services database tool for genomic assembly, genome feature and NGS data.

    PubMed

    Ramirez-Gonzalez, Ricardo; Caccamo, Mario; MacLean, Daniel

    2011-10-01

    Scientists now use high-throughput sequencing technologies and short-read assembly methods to create draft genome assemblies in just days. Tools and pipelines like the assembler, and the workflow management environments make it easy for a non-specialist to implement complicated pipelines to produce genome assemblies and annotations very quickly. Such accessibility results in a proliferation of assemblies and associated files, often for many organisms. These assemblies get used as a working reference by lots of different workers, from a bioinformatician doing gene prediction to a bench scientist designing primers for PCR. Here we describe Gee Fu, a database tool for genomic assembly and feature data, including next-generation sequence alignments. Gee Fu is an instance of a Ruby-On-Rails web application on a feature database that provides web and console interfaces for input, visualization of feature data via AnnoJ, access to data through a web-service interface, an API for direct data access by Ruby scripts and access to feature data stored in BAM files. Gee Fu provides a platform for storing and sharing different versions of an assembly and associated features that can be accessed and updated by bench biologists and bioinformaticians in ways that are easy and useful for each. Availability: http://tinyurl.com/geefu. Contact: dan.maclean@tsl.ac.uk.

  18. Feature co-localization landscape of the human genome

    PubMed Central

    Ng, Siu-Kin; Hu, Taobo; Long, Xi; Chan, Cheuk-Hin; Tsang, Shui-Ying; Xue, Hong

    2016-01-01

    Although feature co-localizations could serve as useful guide-posts to genome architecture, a comprehensive and quantitative feature co-localization map of the human genome has been lacking. Herein we show that, in contrast to the conventional bipartite division of genomic sequences into genic and inter-genic regions, pairwise co-localizations of forty-two genomic features in the twenty-two autosomes based on 50-kb to 2,000-kb sequence windows indicate a tripartite zonal architecture comprising Genic zones enriched with gene-related features and Alu-elements; Proximal zones enriched with MIR- and L2-elements, transcription-factor-binding-sites (TFBSs), and conserved-indels (CIDs); and Distal zones enriched with L1-elements. Co-localizations between single-nucleotide-polymorphisms (SNPs) and copy-number-variations (CNVs) reveal a fraction of sequence windows displaying steeply enhanced levels of SNPs, CNVs and recombination rates that point to active adaptive evolution in such pathways as immune response, sensory perceptions, and cognition. The strongest positive co-localization observed between TFBSs and CIDs suggests a regulatory role of CIDs in cooperation with TFBSs. The positive co-localizations of cancer somatic CNVs (CNVT) with all Proximal zone and most Genic zone features, in contrast to the distinctly more restricted co-localizations exhibited by germline CNVs (CNVG), reveal disparate distributions of CNVTs and CNVGs indicative of dissimilarity in their underlying mechanisms. PMID:26854351
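
    The abstract above describes pairwise co-localizations of genomic features measured over fixed-size sequence windows. A minimal sketch of that general idea follows; it is not the authors' method. The abstract does not name the statistic used, so Pearson correlation of per-window counts is an assumption here, and the function names `window_counts` and `pearson` are hypothetical.

```python
from math import sqrt


def window_counts(positions, chrom_length, window_size):
    """Count features per fixed-size window along one chromosome.

    positions are 0-based feature coordinates; the last window may be shorter.
    """
    n_windows = (chrom_length + window_size - 1) // window_size
    counts = [0] * n_windows
    for pos in positions:
        if 0 <= pos < chrom_length:
            counts[pos // window_size] += 1
    return counts


def pearson(x, y):
    """Pearson correlation between two equal-length lists of window counts."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 windows")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant counts")
    return cov / sqrt(var_x * var_y)
```

    Under these assumptions, a positive value for two features suggests they tend to occupy the same windows, and a negative value suggests they tend to avoid each other; repeating the calculation at several window sizes shows how the relationship depends on scale.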

  19. Signatures of host specialization and a recent transposable element burst in the dynamic one-speed genome of the fungal barley powdery mildew pathogen.

    PubMed

    Frantzeskakis, Lamprinos; Kracher, Barbara; Kusch, Stefan; Yoshikawa-Maekawa, Makoto; Bauer, Saskia; Pedersen, Carsten; Spanu, Pietro D; Maekawa, Takaki; Schulze-Lefert, Paul; Panstruga, Ralph

    2018-05-22

    Powdery mildews are biotrophic pathogenic fungi infecting a number of economically important plants. The grass powdery mildew, Blumeria graminis, has become a model organism to study host specialization of obligate biotrophic fungal pathogens. We resolved the large-scale genomic architecture of B. graminis forma specialis hordei (Bgh) to explore the potential influence of its genome organization on the co-evolutionary process with its host plant, barley (Hordeum vulgare). The near-chromosome level assemblies of the Bgh reference isolate DH14 and one of the most diversified isolates, RACE1, enabled a comparative analysis of these haploid genomes, which are highly enriched with transposable elements (TEs). We found largely retained genome synteny and gene repertoires, yet detected copy number variation (CNV) of secretion signal peptide-containing protein-coding genes (SPs) and locally disrupted synteny blocks. Genes coding for sequence-related SPs are often locally clustered, but neither the SPs nor the TEs reside preferentially in genomic regions with unique features. Extended comparative analysis with different host-specific B. graminis formae speciales revealed the existence of a core suite of SPs, but also isolate-specific SP sets as well as congruence of SP CNV and phylogenetic relationship. We further detected evidence for a recent, lineage-specific expansion of TEs in the Bgh genome. The characteristics of the Bgh genome (largely retained synteny, CNV of SP genes, recently proliferated TEs and a lack of significant compartmentalization) are consistent with a "one-speed" genome that differs in its architecture and (co-)evolutionary pattern from the "two-speed" genomes reported for several other filamentous phytopathogens.

  20. The Genome of the Obligate Intracellular Parasite Trachipleistophora hominis: New Insights into Microsporidian Genome Dynamics and Reductive Evolution

    PubMed Central

    Heinz, Eva; Williams, Tom A.; Nakjang, Sirintra; Noël, Christophe J.; Swan, Daniel C.; Goldberg, Alina V.; Harris, Simon R.; Weinmaier, Thomas; Markert, Stephanie; Becher, Dörte; Bernhardt, Jörg; Dagan, Tal; Hacker, Christian; Lucocq, John M.; Schweder, Thomas; Rattei, Thomas; Hall, Neil; Hirt, Robert P.; Embley, T. Martin

    2012-01-01

    The dynamics of reductive genome evolution for eukaryotes living inside other eukaryotic cells are poorly understood compared to well-studied model systems involving obligate intracellular bacteria. Here we present 8.5 Mb of sequence from the genome of the microsporidian Trachipleistophora hominis, isolated from an HIV/AIDS patient, which is an outgroup to the smaller compacted-genome species that primarily inform ideas of evolutionary mode for these enormously successful obligate intracellular parasites. Our data provide detailed information on the gene content, genome architecture and intergenic regions of a larger microsporidian genome, while comparative analyses allowed us to infer genomic features and metabolism of the common ancestor of the species investigated. Gene length reduction and massive loss of metabolic capacity in the common ancestor was accompanied by the evolution of novel microsporidian-specific protein families, whose conservation among microsporidians, against a background of reductive evolution, suggests they may have important functions in their parasitic lifestyle. The ancestor had already lost many metabolic pathways but retained glycolysis and the pentose phosphate pathway to provide cytosolic ATP and reduced coenzymes, and it had a minimal mitochondrion (mitosome) making Fe-S clusters but not ATP. It possessed bacterial-like nucleotide transport proteins as a key innovation for stealing host-generated ATP, the machinery for RNAi, key elements of the early secretory pathway, canonical eukaryotic as well as microsporidian-specific regulatory elements, a diversity of repetitive and transposable elements, and relatively low average gene density. 
Microsporidian genome evolution thus appears to have proceeded in at least two major steps: an ancestral remodelling of the proteome upon transition to intracellular parasitism that involved reduction but also selective expansion, followed by a secondary compaction of genome architecture in some, but not all, lineages. PMID:23133373

  1. Resolving the homology—function relationship through comparative genomics of membrane-trafficking machinery and parasite cell biology

    PubMed Central

    Klinger, Christen M.; Ramirez-Macias, Inmaculada; Herman, Emily K.; Turkewitz, Aaron P.; Field, Mark C.; Dacks, Joel B.

    2016-01-01

    With advances in DNA sequencing technology, it is increasingly common and tractable to informatically look for genes of interest in the genomic databases of parasitic organisms and infer cellular states. Assignment of a putative gene function based on homology to functionally characterized genes in other organisms, though powerful, relies on the implicit assumption of functional homology, i.e. that orthology indicates conserved function. Eukaryotes reveal a dazzling array of cellular features and structural organization, suggesting a concomitant diversity in their underlying molecular machinery. Significantly, examples of novel functions for pre-existing or new paralogues are not uncommon. Do these examples undermine the basic assumption of functional homology, especially in parasitic protists, which are often highly derived? Here we examine the extent to which functional homology exists between organisms spanning the eukaryotic lineage. By comparing membrane trafficking proteins between parasitic protists and traditional model organisms, where direct functional evidence is available, we find that function is indeed largely conserved between orthologues, albeit with significant adaptation arising from the unique biological features within each lineage. PMID:27444378

  2. Local admixture of amplified and diversified secreted pathogenesis determinants shapes mosaic Toxoplasma gondii genomes

    PubMed Central

    Lorenzi, Hernan; Khan, Asis; Behnke, Michael S.; Namasivayam, Sivaranjani; Swapna, Lakshmipuram S.; Hadjithomas, Michalis; Karamycheva, Svetlana; Pinney, Deborah; Brunk, Brian P.; Ajioka, James W.; Ajzenberg, Daniel; Boothroyd, John C.; Boyle, Jon P.; Dardé, Marie L.; Diaz-Miranda, Maria A.; Dubey, Jitender P.; Fritz, Heather M.; Gennari, Solange M.; Gregory, Brian D.; Kim, Kami; Saeij, Jeroen P. J.; Su, Chunlei; White, Michael W.; Zhu, Xing-Quan; Howe, Daniel K.; Rosenthal, Benjamin M.; Grigg, Michael E.; Parkinson, John; Liu, Liang; Kissinger, Jessica C.; Roos, David S.; David Sibley, L

    2016-01-01

    Toxoplasma gondii is among the most prevalent parasites worldwide, infecting many wild and domestic animals and causing zoonotic infections in humans. T. gondii differs substantially in its broad distribution from closely related parasites that typically have narrow, specialized host ranges. To elucidate the genetic basis for these differences, we compared the genomes of 62 globally distributed T. gondii isolates to several closely related coccidian parasites. Our findings reveal that tandem amplification and diversification of secretory pathogenesis determinants is the primary feature that distinguishes the closely related genomes of these biologically diverse parasites. We further show that the unusual population structure of T. gondii is characterized by clade-specific inheritance of large conserved haploblocks that are significantly enriched in tandemly clustered secretory pathogenesis determinants. The shared inheritance of these conserved haploblocks, which show a different ancestry than the genome as a whole, may thus influence transmission, host range and pathogenicity. PMID:26738725

  3. Systems biology approach in plant abiotic stresses.

    PubMed

    Mohanta, Tapan Kumar; Bashir, Tufail; Hashem, Abeer; Abd Allah, Elsayed Fathi

    2017-12-01

    Plant abiotic stresses are the major constraint on plant growth and development, causing enormous crop losses across the world. Plants have unique features to defend themselves against these challenging adverse stress conditions. They modulate their phenotypes upon changes in physiological, biochemical, molecular and genetic information, thus making them tolerant against abiotic stresses. It is of paramount importance to determine the stress-tolerant traits of a diverse range of genotypes of plant species and integrate those traits for crop improvement. Stress-tolerant traits can be identified by conducting genome-wide analysis of stress-tolerant genotypes through the highly advanced structural and functional genomics approach. Specifically, whole-genome sequencing, development of molecular markers, genome-wide association studies and comparative analysis of interaction networks between tolerant and susceptible crop varieties grown under stress conditions can greatly facilitate discovery of novel agronomic traits that protect plants against abiotic stresses.

  4. Genome analysis of the platypus reveals unique signatures of evolution.

    PubMed

    Warren, Wesley C; Hillier, LaDeana W; Marshall Graves, Jennifer A; Birney, Ewan; Ponting, Chris P; Grützner, Frank; Belov, Katherine; Miller, Webb; Clarke, Laura; Chinwalla, Asif T; Yang, Shiaw-Pyng; Heger, Andreas; Locke, Devin P; Miethke, Pat; Waters, Paul D; Veyrunes, Frédéric; Fulton, Lucinda; Fulton, Bob; Graves, Tina; Wallis, John; Puente, Xose S; López-Otín, Carlos; Ordóñez, Gonzalo R; Eichler, Evan E; Chen, Lin; Cheng, Ze; Deakin, Janine E; Alsop, Amber; Thompson, Katherine; Kirby, Patrick; Papenfuss, Anthony T; Wakefield, Matthew J; Olender, Tsviya; Lancet, Doron; Huttley, Gavin A; Smit, Arian F A; Pask, Andrew; Temple-Smith, Peter; Batzer, Mark A; Walker, Jerilyn A; Konkel, Miriam K; Harris, Robert S; Whittington, Camilla M; Wong, Emily S W; Gemmell, Neil J; Buschiazzo, Emmanuel; Vargas Jentzsch, Iris M; Merkel, Angelika; Schmitz, Juergen; Zemann, Anja; Churakov, Gennady; Kriegs, Jan Ole; Brosius, Juergen; Murchison, Elizabeth P; Sachidanandam, Ravi; Smith, Carly; Hannon, Gregory J; Tsend-Ayush, Enkhjargal; McMillan, Daniel; Attenborough, Rosalind; Rens, Willem; Ferguson-Smith, Malcolm; Lefèvre, Christophe M; Sharp, Julie A; Nicholas, Kevin R; Ray, David A; Kube, Michael; Reinhardt, Richard; Pringle, Thomas H; Taylor, James; Jones, Russell C; Nixon, Brett; Dacheux, Jean-Louis; Niwa, Hitoshi; Sekita, Yoko; Huang, Xiaoqiu; Stark, Alexander; Kheradpour, Pouya; Kellis, Manolis; Flicek, Paul; Chen, Yuan; Webber, Caleb; Hardison, Ross; Nelson, Joanne; Hallsworth-Pepin, Kym; Delehaunty, Kim; Markovic, Chris; Minx, Pat; Feng, Yucheng; Kremitzki, Colin; Mitreva, Makedonka; Glasscock, Jarret; Wylie, Todd; Wohldmann, Patricia; Thiru, Prathapan; Nhan, Michael N; Pohl, Craig S; Smith, Scott M; Hou, Shunfeng; Nefedov, Mikhail; de Jong, Pieter J; Renfree, Marilyn B; Mardis, Elaine R; Wilson, Richard K

    2008-05-08

    We present a draft genome sequence of the platypus, Ornithorhynchus anatinus. This monotreme exhibits a fascinating combination of reptilian and mammalian characters. For example, platypuses have a coat of fur adapted to an aquatic lifestyle; platypus females lactate, yet lay eggs; and males are equipped with venom similar to that of reptiles. Analysis of the first monotreme genome aligned these features with genetic innovations. We find that reptile and platypus venom proteins have been co-opted independently from the same gene families; milk protein genes are conserved despite platypuses laying eggs; and immune gene family expansions are directly related to platypus biology. Expansions of protein, non-protein-coding RNA and microRNA families, as well as repeat elements, are identified. Sequencing of this genome now provides a valuable resource for deep mammalian comparative analyses, as well as for monotreme biology and conservation.

  5. DNA microarrays: a powerful genomic tool for biomedical and clinical research

    PubMed Central

    Trevino, Victor; Falciani, Francesco; Barrera-Saldaña, Hugo A.

    2007-01-01

    Among the many benefits of the Human Genome Project are new and powerful tools such as the genome-wide hybridization devices referred to as microarrays. Initially designed to measure gene transcriptional levels, microarray technologies are now used for comparing other genome features among individuals and their tissues and cells. Results provide valuable information on disease subcategories, disease prognosis, and treatment outcome. Likewise, they reveal differences in genetic makeup, regulatory mechanisms and subtle variations, moving us closer to the era of personalized medicine. To explain this powerful tool, its versatility and how it is dramatically changing the molecular approach to biomedical and clinical research, this review describes the technology, its applications, a didactic step-by-step review of a typical microarray protocol, and a real experiment. Finally, it calls on the medical community to assemble multidisciplinary teams to take advantage of this technology and its expanding applications, which on a single slide reveal our genetic inheritance and destiny. PMID:17660860

  6. Genome analysis of the platypus reveals unique signatures of evolution

    PubMed Central

    Warren, Wesley C.; Hillier, LaDeana W.; Marshall Graves, Jennifer A.; Birney, Ewan; Ponting, Chris P.; Grützner, Frank; Belov, Katherine; Miller, Webb; Clarke, Laura; Chinwalla, Asif T.; Yang, Shiaw-Pyng; Heger, Andreas; Locke, Devin P.; Miethke, Pat; Waters, Paul D.; Veyrunes, Frédéric; Fulton, Lucinda; Fulton, Bob; Graves, Tina; Wallis, John; Puente, Xose S.; López-Otín, Carlos; Ordóñez, Gonzalo R.; Eichler, Evan E.; Chen, Lin; Cheng, Ze; Deakin, Janine E.; Alsop, Amber; Thompson, Katherine; Kirby, Patrick; Papenfuss, Anthony T.; Wakefield, Matthew J.; Olender, Tsviya; Lancet, Doron; Huttley, Gavin A.; Smit, Arian F. A.; Pask, Andrew; Temple-Smith, Peter; Batzer, Mark A.; Walker, Jerilyn A.; Konkel, Miriam K.; Harris, Robert S.; Whittington, Camilla M.; Wong, Emily S. W.; Gemmell, Neil J.; Buschiazzo, Emmanuel; Vargas Jentzsch, Iris M.; Merkel, Angelika; Schmitz, Juergen; Zemann, Anja; Churakov, Gennady; Kriegs, Jan Ole; Brosius, Juergen; Murchison, Elizabeth P.; Sachidanandam, Ravi; Smith, Carly; Hannon, Gregory J.; Tsend-Ayush, Enkhjargal; McMillan, Daniel; Attenborough, Rosalind; Rens, Willem; Ferguson-Smith, Malcolm; Lefèvre, Christophe M.; Sharp, Julie A.; Nicholas, Kevin R.; Ray, David A.; Kube, Michael; Reinhardt, Richard; Pringle, Thomas H.; Taylor, James; Jones, Russell C.; Nixon, Brett; Dacheux, Jean-Louis; Niwa, Hitoshi; Sekita, Yoko; Huang, Xiaoqiu; Stark, Alexander; Kheradpour, Pouya; Kellis, Manolis; Flicek, Paul; Chen, Yuan; Webber, Caleb; Hardison, Ross; Nelson, Joanne; Hallsworth-Pepin, Kym; Delehaunty, Kim; Markovic, Chris; Minx, Pat; Feng, Yucheng; Kremitzki, Colin; Mitreva, Makedonka; Glasscock, Jarret; Wylie, Todd; Wohldmann, Patricia; Thiru, Prathapan; Nhan, Michael N.; Pohl, Craig S.; Smith, Scott M.; Hou, Shunfeng; Renfree, Marilyn B.; Mardis, Elaine R.; Wilson, Richard K.

    2009-01-01

    We present a draft genome sequence of the platypus, Ornithorhynchus anatinus. This monotreme exhibits a fascinating combination of reptilian and mammalian characters. For example, platypuses have a coat of fur adapted to an aquatic lifestyle; platypus females lactate, yet lay eggs; and males are equipped with venom similar to that of reptiles. Analysis of the first monotreme genome aligned these features with genetic innovations. We find that reptile and platypus venom proteins have been co-opted independently from the same gene families; milk protein genes are conserved despite platypuses laying eggs; and immune gene family expansions are directly related to platypus biology. Expansions of protein, non-protein-coding RNA and microRNA families, as well as repeat elements, are identified. Sequencing of this genome now provides a valuable resource for deep mammalian comparative analyses, as well as for monotreme biology and conservation. PMID:18464734

  7. Computational tools for copy number variation (CNV) detection using next-generation sequencing data: features and perspectives.

    PubMed

    Zhao, Min; Wang, Qingguo; Wang, Quan; Jia, Peilin; Zhao, Zhongming

    2013-01-01

    Copy number variation (CNV) is a prevalent form of critical genetic variation that leads to an abnormal number of copies of large genomic regions in a cell. Microarray-based comparative genome hybridization (arrayCGH) and genotyping arrays were the standard technologies for detecting large regions subject to copy number changes in genomes until, more recently, next-generation sequencing (NGS) made high-resolution sequence data available for this purpose. During the last several years, NGS-based analysis has been widely applied to identify CNVs in both healthy and diseased individuals. Correspondingly, the strong demand for NGS-based CNV analyses has fuelled development of numerous computational methods and tools for CNV detection. In this article, we review the recent advances in computational methods pertaining to CNV detection using whole genome and whole exome sequencing data. Additionally, we discuss their strengths and weaknesses and suggest directions for future development.

  8. Computational tools for copy number variation (CNV) detection using next-generation sequencing data: features and perspectives

    PubMed Central

    2013-01-01

    Copy number variation (CNV) is a prevalent form of critical genetic variation that leads to an abnormal number of copies of large genomic regions in a cell. Microarray-based comparative genome hybridization (arrayCGH) and genotyping arrays were the standard technologies for detecting large regions subject to copy number changes in genomes until, more recently, next-generation sequencing (NGS) made high-resolution sequence data available for this purpose. During the last several years, NGS-based analysis has been widely applied to identify CNVs in both healthy and diseased individuals. Correspondingly, the strong demand for NGS-based CNV analyses has fuelled development of numerous computational methods and tools for CNV detection. In this article, we review the recent advances in computational methods pertaining to CNV detection using whole genome and whole exome sequencing data. Additionally, we discuss their strengths and weaknesses and suggest directions for future development. PMID:24564169

  9. WormBase 2016: expanding to enable helminth genomic research.

    PubMed

    Howe, Kevin L; Bolt, Bruce J; Cain, Scott; Chan, Juancarlos; Chen, Wen J; Davis, Paul; Done, James; Down, Thomas; Gao, Sibyl; Grove, Christian; Harris, Todd W; Kishore, Ranjana; Lee, Raymond; Lomax, Jane; Li, Yuling; Muller, Hans-Michael; Nakamura, Cecilia; Nuin, Paulo; Paulini, Michael; Raciti, Daniela; Schindelman, Gary; Stanley, Eleanor; Tuli, Mary Ann; Van Auken, Kimberly; Wang, Daniel; Wang, Xiaodong; Williams, Gary; Wright, Adam; Yook, Karen; Berriman, Matthew; Kersey, Paul; Schedl, Tim; Stein, Lincoln; Sternberg, Paul W

    2016-01-04

    WormBase (www.wormbase.org) is a central repository for research data on the biology, genetics and genomics of Caenorhabditis elegans and other nematodes. The project has evolved from its original remit to collect and integrate all data for a single species, and now extends to numerous nematodes, ranging from evolutionary comparators of C. elegans to parasitic species that threaten plant, animal and human health. Research activity using C. elegans as a model system is as vibrant as ever, and we have created new tools for community curation in response to the ever-increasing volume and complexity of data. To better allow users to navigate their way through these data, we have made a number of improvements to our main website, including new tools for browsing genomic features and ontology annotations. Finally, we have developed a new portal for parasitic worm genomes. WormBase ParaSite (parasite.wormbase.org) contains all publicly available nematode and platyhelminth annotated genome sequences, and is designed specifically to support helminth genomic research. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.

  10. Community-led comparative genomic and phenotypic analysis of the aquaculture pathogen Pseudomonas baetica a390T sequenced by Ion semiconductor and Nanopore technologies

    PubMed Central

    Beaton, Ainsley; Lood, Cédric; Cunningham-Oakes, Edward; MacFadyen, Alison; Mullins, Alex J; Bestawy, Walid El; Botelho, João; Chevalier, Sylvie; Dalzell, Chloe; Dolan, Stephen K; Faccenda, Alberto; Ghequire, Maarten G K; Higgins, Steven; Kutschera, Alexander; Murray, Jordan; Redway, Martha; Salih, Talal; Smith, Brian A; Smits, Nathan; Thomson, Ryan; Woodcock, Stuart; Cornelis, Pierre; Lavigne, Rob; van Noort, Vera

    2018-01-01

    Abstract Pseudomonas baetica strain a390T is the type strain of this recently described species and here we present its high-contiguity draft genome. To celebrate the 16th International Conference on Pseudomonas, the genome of P. baetica strain a390T was sequenced using a unique combination of Ion Torrent semiconductor and Oxford Nanopore methods as part of a collaborative community-led project. Combining high-quality Ion Torrent sequences with long Nanopore reads rapidly gave a high-contiguity, high-quality, 16-contig genome sequence. Whole genome phylogenetic analysis places P. baetica within the P. koreensis clade of the P. fluorescens group. Comparison of the main genomic features of P. baetica with a variety of other Pseudomonas spp. suggests that it is a highly adaptable organism, typical of the genus. This strain was originally isolated from the liver of a diseased wedge sole fish, and genotypic and phenotypic analyses show that it is tolerant to osmotic stress and to oxytetracycline. PMID:29579234

  11. The Plant Genome Integrative Explorer Resource: PlantGenIE.org.

    PubMed

    Sundell, David; Mannapperuma, Chanaka; Netotea, Sergiu; Delhomme, Nicolas; Lin, Yao-Cheng; Sjödin, Andreas; Van de Peer, Yves; Jansson, Stefan; Hvidsten, Torgeir R; Street, Nathaniel R

    2015-12-01

    Accessing and exploring large-scale genomics data sets remains a significant challenge to researchers without specialist bioinformatics training. We present the integrated PlantGenIE.org platform for exploration of Populus, conifer and Arabidopsis genomics data, which includes expression networks and associated visualization tools. Standard features of a model organism database are provided, including genome browsers, gene list annotation, Blast homology searches and gene information pages. Community annotation updating is supported via integration of WebApollo. We have produced an RNA-sequencing (RNA-Seq) expression atlas for Populus tremula and have integrated these data within the expression tools. An updated version of the ComPlEx resource for performing comparative plant expression analyses of gene coexpression network conservation between species has also been integrated. The PlantGenIE.org platform provides intuitive access to large-scale and genome-wide genomics data from model forest tree species, facilitating both community contributions to annotation improvement and tools supporting use of the included data resources to inform biological insight. © 2015 The Authors. New Phytologist © 2015 New Phytologist Trust.

  12. Determination of the Core of a Minimal Bacterial Gene Set†

    PubMed Central

    Gil, Rosario; Silva, Francisco J.; Peretó, Juli; Moya, Andrés

    2004-01-01

    The availability of a large number of complete genome sequences raises the question of how many genes are essential for cellular life. Trying to reconstruct the core of the protein-coding gene set for a hypothetical minimal bacterial cell, we have performed a computational comparative analysis of eight bacterial genomes. Six of the analyzed genomes are very small due to a dramatic genome size reduction process, while the other two, corresponding to free-living relatives, are larger. The available data from several systematic experimental approaches to define all the essential genes in some completely sequenced bacterial genomes were also considered, and a reconstruction of a minimal metabolic machinery necessary to sustain life was carried out. The proposed minimal genome contains 206 protein-coding genes with all the genetic information necessary for self-maintenance and reproduction in the presence of a full complement of essential nutrients and in the absence of environmental stress. The main features of such a minimal gene set, as well as the metabolic functions that must be present in the hypothetical minimal cell, are discussed. PMID:15353568

  13. Comparative Genomics of a Plant-Parasitic Nematode Endosymbiont Suggest a Role in Nutritional Symbiosis.

    PubMed

    Brown, Amanda M V; Howe, Dana K; Wasala, Sulochana K; Peetz, Amy B; Zasada, Inga A; Denver, Dee R

    2015-09-10

    Bacterial mutualists can modulate the biochemical capacity of animals. Highly coevolved nutritional mutualists do this by synthesizing nutrients missing from the host's diet. Genomics tools have advanced the study of these partnerships. Here we examined the endosymbiont Xiphinematobacter (phylum Verrucomicrobia) from the dagger nematode Xiphinema americanum, a migratory ectoparasite of numerous crops that also vectors nepovirus. Previously, this endosymbiont was identified in the gut, ovaries, and eggs, but its role was unknown. We explored the potential role of this symbiont using fluorescence in situ hybridization, genome sequencing, and comparative functional genomics. We report the first genome of an intracellular Verrucomicrobium and the first exclusively intracellular non-Wolbachia nematode symbiont. Results revealed that Xiphinematobacter had a small 0.916-Mb genome with only 817 predicted proteins, resembling genomes of other mutualist endosymbionts. Compared with free-living relatives, conserved proteins were shorter on average, and there was large-scale loss of regulatory pathways. Despite massive gene loss, more genes were retained for biosynthesis of amino acids predicted to be essential to the host. Gene ontology enrichment tests showed enrichment for biosynthesis of arginine, histidine, and aromatic amino acids, as well as thiamine and coenzyme A, diverging from the profiles of relatives Akkermansia muciniphila (in the human colon), Methylacidiphilum infernorum, and the mutualist Wolbachia from filarial nematodes. Together, these features and the location in the gut suggest that Xiphinematobacter functions as a nutritional mutualist, supplementing essential nutrients that are depleted in the nematode diet. This pattern points to evolutionary convergence with endosymbionts found in sap-feeding insects. © The Author(s) 2015. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.

  14. RNAseq versus genome-predicted transcriptomes: a large population of novel transcripts identified in an Illumina-454 Hydra transcriptome.

    PubMed

    Wenger, Yvan; Galliot, Brigitte

    2013-03-25

    Evolutionary studies benefit from deep sequencing technologies that generate genomic and transcriptomic sequences from a variety of organisms. Genome sequencing and RNAseq have complementary strengths. In this study, we present the assembly of the most complete Hydra transcriptome to date along with a comparative analysis of the specific features of RNAseq and genome-predicted transcriptomes currently available in the freshwater hydrozoan Hydra vulgaris. To produce an accurate and extensive Hydra transcriptome, we combined Illumina and 454 Titanium reads, giving the primacy to Illumina over 454 reads to correct homopolymer errors. This strategy yielded an RNAseq transcriptome that contains 48'909 unique sequences including splice variants, representing approximately 24'450 distinct genes. Comparative analysis to the available genome-predicted transcriptomes identified 10'597 novel Hydra transcripts that encode 529 evolutionarily-conserved proteins. The annotation of 170 human orthologs points to critical functions in protein biosynthesis, FGF and TOR signaling, vesicle transport, immunity, cell cycle regulation, cell death, mitochondrial metabolism, transcription and chromatin regulation. However, a majority of these novel transcripts encode short ORFs, at least 767 of them corresponding to pseudogenes. This RNAseq transcriptome also lacks 11'270 predicted transcripts that correspond either to silent genes or to genes expressed below the detection level of this study. We established a simple and powerful strategy to combine Illumina and 454 reads and we produced, with genome assistance, an extensive and accurate Hydra transcriptome. The comparative analysis of the RNAseq transcriptome with genome-predicted transcriptomes led to the identification of large populations of novel as well as missing transcripts that might reflect Hydra-specific evolutionary events.

  15. RNAseq versus genome-predicted transcriptomes: a large population of novel transcripts identified in an Illumina-454 Hydra transcriptome

    PubMed Central

    2013-01-01

    Background Evolutionary studies benefit from deep sequencing technologies that generate genomic and transcriptomic sequences from a variety of organisms. Genome sequencing and RNAseq have complementary strengths. In this study, we present the assembly of the most complete Hydra transcriptome to date along with a comparative analysis of the specific features of RNAseq and genome-predicted transcriptomes currently available in the freshwater hydrozoan Hydra vulgaris. Results To produce an accurate and extensive Hydra transcriptome, we combined Illumina and 454 Titanium reads, giving the primacy to Illumina over 454 reads to correct homopolymer errors. This strategy yielded an RNAseq transcriptome that contains 48’909 unique sequences including splice variants, representing approximately 24’450 distinct genes. Comparative analysis to the available genome-predicted transcriptomes identified 10’597 novel Hydra transcripts that encode 529 evolutionarily-conserved proteins. The annotation of 170 human orthologs points to critical functions in protein biosynthesis, FGF and TOR signaling, vesicle transport, immunity, cell cycle regulation, cell death, mitochondrial metabolism, transcription and chromatin regulation. However, a majority of these novel transcripts encode short ORFs, at least 767 of them corresponding to pseudogenes. This RNAseq transcriptome also lacks 11’270 predicted transcripts that correspond either to silent genes or to genes expressed below the detection level of this study. Conclusions We established a simple and powerful strategy to combine Illumina and 454 reads and we produced, with genome assistance, an extensive and accurate Hydra transcriptome. The comparative analysis of the RNAseq transcriptome with genome-predicted transcriptomes led to the identification of large populations of novel as well as missing transcripts that might reflect Hydra-specific evolutionary events. PMID:23530871

  16. Two Antarctic penguin genomes reveal insights into their evolutionary history and molecular changes related to the Antarctic environment.

    PubMed

    Li, Cai; Zhang, Yong; Li, Jianwen; Kong, Lesheng; Hu, Haofu; Pan, Hailin; Xu, Luohao; Deng, Yuan; Li, Qiye; Jin, Lijun; Yu, Hao; Chen, Yan; Liu, Binghang; Yang, Linfeng; Liu, Shiping; Zhang, Yan; Lang, Yongshan; Xia, Jinquan; He, Weiming; Shi, Qiong; Subramanian, Sankar; Millar, Craig D; Meader, Stephen; Rands, Chris M; Fujita, Matthew K; Greenwold, Matthew J; Castoe, Todd A; Pollock, David D; Gu, Wanjun; Nam, Kiwoong; Ellegren, Hans; Ho, Simon Yw; Burt, David W; Ponting, Chris P; Jarvis, Erich D; Gilbert, M Thomas P; Yang, Huanming; Wang, Jian; Lambert, David M; Wang, Jun; Zhang, Guojie

    2014-01-01

    Penguins are flightless aquatic birds widely distributed in the Southern Hemisphere. The distinctive morphological and physiological features of penguins allow them to live an aquatic life, and some of them have successfully adapted to the hostile environments in Antarctica. To study the phylogenetic and population history of penguins and the molecular basis of their adaptations to Antarctica, we sequenced the genomes of the two Antarctic dwelling penguin species, the Adélie penguin [Pygoscelis adeliae] and emperor penguin [Aptenodytes forsteri]. Phylogenetic dating suggests that early penguins arose ~60 million years ago, coinciding with a period of global warming. Analysis of effective population sizes reveals that the two penguin species experienced population expansions from ~1 million years ago to ~100 thousand years ago, but responded differently to the climatic cooling of the last glacial period. Comparative genomic analyses with other available avian genomes identified molecular changes in genes related to epidermal structure, phototransduction, lipid metabolism, and forelimb morphology. Our sequencing and initial analyses of the first two penguin genomes provide insights into the timing of penguin origin, fluctuations in effective population sizes of the two penguin species over the past 10 million years, and the potential associations between these biological patterns and global climate change. The molecular changes compared with other avian genomes reflect both shared and diverse adaptations of the two penguin species to the Antarctic environment.

  17. The pomegranate (Punica granatum L.) genome provides insights into fruit quality and ovule developmental biology.

    PubMed

    Yuan, Zhaohe; Fang, Yanming; Zhang, Taikui; Fei, Zhangjun; Han, Fengming; Liu, Cuiyu; Liu, Min; Xiao, Wei; Zhang, Wenjing; Wu, Shan; Zhang, Mengwei; Ju, Youhui; Xu, Huili; Dai, He; Liu, Yujun; Chen, Yanhui; Wang, Lili; Zhou, Jianqing; Guan, Dian; Yan, Ming; Xia, Yanhua; Huang, Xianbin; Liu, Dongyuan; Wei, Hongmin; Zheng, Hongkun

    2017-12-22

    Pomegranate (Punica granatum L.) has an ancient cultivation history and has become an emerging profitable fruit crop due to its attractive features such as the bright red appearance and the high abundance of medicinally valuable ellagitannin-based compounds in its peel and aril. However, the limited genomic resources have restricted further elucidation of genetics and evolution of these interesting traits. Here, we report a 274-Mb high-quality draft pomegranate genome sequence, which covers approximately 81.5% of the estimated 336-Mb genome, consists of 2177 scaffolds with an N50 size of 1.7 Mb and contains 30 903 genes. Phylogenomic analysis supported that pomegranate belongs to the Lythraceae family rather than the monogeneric Punicaceae family, and comparative analyses showed that pomegranate and Eucalyptus grandis share the paleotetraploidy event. Integrated genomic and transcriptomic analyses provided insights into the molecular mechanisms underlying the biosynthesis of ellagitannin-based compounds, the colour formation in both peels and arils during pomegranate fruit development, and the unique ovule development processes that are characteristic of pomegranate. This genome sequence provides an important resource to expand our understanding of some unique biological processes and to facilitate both comparative biology studies and crop breeding. © 2017 The Authors. Plant Biotechnology Journal published by Society for Experimental Biology and The Association of Applied Biologists and John Wiley & Sons Ltd.
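
The assembly statistics quoted in this abstract (scaffold N50 and the share of the estimated genome that was assembled) follow standard definitions. A minimal Python sketch, assuming scaffold lengths are available as a plain list of integers; the function names `n50` and `assembled_percent` and the toy lengths are illustrative and do not come from the paper:

```python
def n50(lengths):
    """Return the N50: the largest length L such that scaffolds of
    length >= L together contain at least half of all assembled bases."""
    if not lengths:
        raise ValueError("need at least one scaffold length")
    total = sum(lengths)
    running = 0
    # Walk from the longest scaffold down until half the bases are covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


def assembled_percent(assembled_mb, estimated_mb):
    """Percentage of the estimated genome size covered by the assembly."""
    return round(100.0 * assembled_mb / estimated_mb, 1)


# Toy scaffold lengths (hypothetical, not the pomegranate scaffolds).
print(n50([2, 2, 2, 3, 3, 4, 8, 8]))   # 8
# Figures taken from the abstract: 274 Mb assembled of an estimated 336 Mb.
print(assembled_percent(274, 336))     # 81.5
```

The second call reproduces the "approximately 81.5%" coverage figure given in the abstract from its two reported sizes.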

  18. A machine learning approach for viral genome classification.

    PubMed

    Remita, Mohamed Amine; Halioui, Ahmed; Malick Diouara, Abou Abdallah; Daigle, Bruno; Kiani, Golrokh; Diallo, Abdoulaye Baniré

    2017-04-11

    Advances in cloning and sequencing technology are yielding a massive number of viral genomes. The classification and annotation of these genomes constitute important assets in the discovery of genomic variability, taxonomic characteristics and disease mechanisms. Existing classification methods are often designed for specific well-studied families of viruses. Thus, viral comparative genomic studies could benefit from more generic, fast and accurate tools for classifying and typing newly sequenced strains of diverse virus families. Here, we introduce a virus classification platform, CASTOR, based on machine learning methods. CASTOR is inspired by a well-known technique in molecular biology: restriction fragment length polymorphism (RFLP). It simulates, in silico, the restriction digestion of genomic material by different enzymes into fragments. It uses two metrics to construct feature vectors for machine learning algorithms in the classification step. We benchmark CASTOR for the classification of distinct datasets of human papillomaviruses (HPV), hepatitis B viruses (HBV) and human immunodeficiency viruses type 1 (HIV-1). Results reveal true positive rates of 99%, 99% and 98% for HPV Alpha species, HBV genotyping and HIV-1 M subtyping, respectively. Furthermore, CASTOR shows a competitive performance compared to well-known HIV-1 specific classifiers (REGA and COMET) on whole genomes and pol fragments. The performance of CASTOR, its genericity and robustness could enable novel and accurate large-scale virus studies. The CASTOR web platform provides open-access, collaborative and reproducible machine learning classifiers. CASTOR can be accessed at http://castor.bioinfo.uqam.ca.
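
The in silico digestion idea described in this abstract can be illustrated with a short Python sketch. This is not CASTOR's implementation: the abstract does not name its two metrics, so the fragment count and mean fragment length used here are assumptions, as are the three-enzyme table, the function names and the toy sequence. The recognition sites and cut positions are those of the real enzymes EcoRI, HindIII and BamHI.

```python
import re

# Small illustrative enzyme table: recognition site and cut offset
# (number of bases into the site, on the top strand).
ENZYMES = {
    "EcoRI": ("GAATTC", 1),
    "HindIII": ("AAGCTT", 1),
    "BamHI": ("GGATCC", 1),
}


def digest(sequence, site, cut_offset):
    """Cut a linear sequence at every occurrence of `site` and return
    the fragment lengths in order along the sequence."""
    seq = sequence.upper()
    # A lookahead pattern also finds overlapping site occurrences.
    cuts = [m.start() + cut_offset for m in re.finditer(f"(?={site})", seq)]
    bounds = [0] + cuts + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:]) if end > start]


def feature_vector(sequence, enzymes=ENZYMES):
    """Build a fixed-length vector: for each enzyme (sorted by name),
    the number of fragments and their mean length (assumed metrics)."""
    features = []
    for name in sorted(enzymes):
        site, offset = enzymes[name]
        lengths = digest(sequence, site, offset)
        features.append(len(lengths))
        features.append(sum(lengths) / len(lengths) if lengths else 0.0)
    return features


toy = "AAGAATTCTTTTGGATCCAA"  # hypothetical 20-bp sequence
print(digest(toy, "GAATTC", 1))  # [3, 17]
print(feature_vector(toy))       # [2, 10.0, 2, 10.0, 1, 20.0]
```

Vectors of this kind have the same length for every genome, which is what allows them to be passed to a standard classifier regardless of genome size.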

  19. A Pan-Genomic Approach to Understand the Basis of Host Adaptation in Achromobacter

    PubMed Central

    Jeukens, Julie; Freschi, Luca; Vincent, Antony T.; Emond-Rheault, Jean-Guillaume; Kukavica-Ibrulj, Irena; Charette, Steve J.

    2017-01-01

    Over the past decade, there has been a rising interest in Achromobacter sp., an emerging opportunistic pathogen responsible for nosocomial and cystic fibrosis lung infections. Species of this genus are ubiquitous in the environment, can outcompete resident microbiota, and are resistant to commonly used disinfectants as well as antibiotics. Nevertheless, the Achromobacter genus suffers from difficulties in diagnosis, unresolved taxonomy and limited understanding of how it adapts to the cystic fibrosis lung, not to mention other host environments. The goals of this first genus-wide comparative genomics study were to clarify the taxonomy of this genus and identify genomic features associated with pathogenicity and host adaptation. This was done with a widely applicable approach based on pan-genome analysis. First, using all publicly available genomes, a combination of phylogenetic analysis based on 1,780 conserved genes with average nucleotide identity and accessory genome composition allowed the identification of a largely clinical lineage composed of Achromobacter xylosoxidans, Achromobacter insuavis, Achromobacter dolens, and Achromobacter ruhlandii. Within this lineage, we identified 35 positively selected genes involved in metabolism, regulation and efflux-mediated antibiotic resistance. Second, resistome analysis showed that this clinical lineage carried additional antibiotic resistance genes compared with other isolates. Finally, we identified putative mobile elements that contribute 53% of the genus’s resistome and support horizontal gene transfer between Achromobacter and other ecologically similar genera. This study provides strong phylogenetic and pan-genomic bases to motivate further research on Achromobacter, and contributes to the understanding of opportunistic pathogen evolution. PMID:28383665

  20. Plastome Sequence Determination and Comparative Analysis for Members of the Lolium-Festuca Grass Species Complex

    PubMed Central

    Hand, Melanie L.; Spangenberg, German C.; Forster, John W.; Cogan, Noel O. I.

    2013-01-01

    Chloroplast genome sequences are of broad significance in plant biology, due to frequent use in molecular phylogenetics, comparative genomics, population genetics, and genetic modification studies. The present study used a second-generation sequencing approach to determine and assemble the plastid genomes (plastomes) of four representatives from the agriculturally important Lolium-Festuca species complex of pasture grasses (Lolium multiflorum, Festuca pratensis, Festuca altissima, and Festuca ovina). Total cellular DNA was extracted from either roots or leaves, was sequenced, and the output was filtered for plastome-related reads. A comparison between sources revealed fewer plastome-related reads from root-derived template but an increase in incidental bacterium-derived sequences. Plastome assembly and annotation indicated high levels of sequence identity and a conserved organization and gene content between species. However, frequent deletions within the F. ovina plastome appeared to contribute to a smaller plastid genome size. Comparative analysis with complete plastome sequences from other members of the Poaceae confirmed conservation of most grass-specific features. Detailed analysis of the rbcL–psaI intergenic region, however, revealed a “hot-spot” of variation characterized by independent deletion events. The evolutionary implications of this observation are discussed. The complete plastome sequences are anticipated to provide the basis for potential organelle-specific genetic modification of pasture grasses. PMID:23550121

  1. Comparative Analysis of the Mitochondrial Genomes of Callitettixini Spittlebugs (Hemiptera: Cercopidae) Confirms the Overall High Evolutionary Speed of the AT-Rich Region but Reveals the Presence of Short Conservative Elements at the Tribal Level

    PubMed Central

    Liu, Jie; Bu, Cuiping; Wipfler, Benjamin; Liang, Aiping

    2014-01-01

    The present study compares the mitochondrial genomes of five species of the spittlebug tribe Callitettixini (Hemiptera: Cercopoidea: Cercopidae) from eastern Asia. All genomes of the five species sequenced are circular double-stranded DNA molecules and range from 15,222 to 15,637 bp in length. They contain 22 tRNA genes, 13 protein coding genes (PCGs) and 2 rRNA genes and share the putative ancestral gene arrangement of insects. The PCGs show an extreme bias of nucleotide and amino acid composition. Significant differences in the substitution rates among the different genes, as well as among the different codon positions of each PCG, are revealed by the comparative evolutionary analyses. The substitution rates of the first and second codon positions of different PCGs are negatively correlated with their GC content. Among the five species, the AT-rich region features great differences in length and pattern and generally shows a 2–5 times higher substitution rate than the fastest PCG in the mitochondrial genome, atp8. Despite the significant variability in length, short conservative segments were identified in the AT-rich region within Callitettixini, although absent from the other groups of the spittlebug superfamily Cercopoidea. PMID:25285442

  2. Who's afraid of Homo sapiens?

    PubMed

    Preuss, Todd M

    2006-11-29

    Understanding how humans differ from other animals, as well as how we are like them, requires comparative investigations. For the purpose of documenting the distinctive features of humans, the most informative research involves comparing humans to our closest relatives, the chimpanzees and other great apes. Psychology and anthropology have maintained a tradition of empirical comparative research on human specializations of cognition. The neurosciences, by contrast, have been dominated by the model-animal research paradigm, which presupposes the commonality of "basic" features of brain organization across species and discourages serious treatment of species differences. As a result, the neurosciences have made little progress in understanding human brain specializations. Recent developments in neuroimaging, genomics, and other non-invasive techniques make it possible to directly compare humans and nonhuman species at levels of organization that were previously inaccessible, offering the hope of gaining a better understanding of the species-specific features of the human brain. This hope will be dashed, however, if chimpanzees and other great ape species become unavailable for even non-invasive research.

  3. Computational Identification of Genomic Features That Influence 3D Chromatin Domain Formation.

    PubMed

    Mourad, Raphaël; Cuvier, Olivier

    2016-05-01

    Recent advances in long-range Hi-C contact mapping have revealed the importance of the 3D structure of chromosomes in gene expression. A current challenge is to identify the key molecular drivers of this 3D structure. Several genomic features, such as architectural proteins and functional elements, were shown to be enriched at topological domain borders using classical enrichment tests. Here we propose multiple logistic regression to identify those genomic features that positively or negatively influence domain border establishment or maintenance. The model is flexible, and can account for statistical interactions among multiple genomic features. Using both simulated and real data, we show that our model outperforms enrichment test and non-parametric models, such as random forests, for the identification of genomic features that influence domain borders. Using Drosophila Hi-C data at a very high resolution of 1 kb, our model suggests that, among architectural proteins, BEAF-32 and CP190 are the main positive drivers of 3D domain borders. In humans, our model identifies well-known architectural proteins CTCF and cohesin, as well as ZNF143 and Polycomb group proteins as positive drivers of domain borders. The model also reveals the existence of several negative drivers that counteract the presence of domain borders including P300, RXRA, BCL11A and ELK1.
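
The multiple logistic regression described in this abstract can be sketched in plain Python. This is not the authors' code: the two binary features (hypothetical proteins A and B), the toy bins, the ridge penalty and the gradient-descent fitting routine are all assumptions made for illustration. Each row is one genomic bin, and the label records whether the bin is a domain border.

```python
import math


def fit_logistic(X, y, lr=0.5, epochs=2000, l2=0.01):
    """Fit border ~ features by batch gradient descent on the mean
    logistic loss, with a small L2 penalty on the non-intercept weights.
    Returns [intercept, w_1, ..., w_p]."""
    n, p = len(X), len(X[0])
    w = [0.0] * (p + 1)
    for _ in range(epochs):
        grad = [0.0] * (p + 1)
        for xi, yi in zip(X, y):
            z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))
            err = 1.0 / (1.0 + math.exp(-z)) - yi  # predicted minus observed
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        for j in range(p + 1):
            penalty = l2 * w[j] if j else 0.0  # intercept is not penalised
            w[j] -= lr * (grad[j] / n + penalty)
    return w


# Toy data: columns are presence of hypothetical proteins A and B in a bin.
X = [[1, 0], [1, 0], [1, 0], [1, 1], [0, 0],
     [0, 0], [0, 1], [0, 1], [1, 1], [0, 0]]
y = [1, 1, 1, 1, 0, 1, 0, 0, 0, 0]  # 1 = bin is a domain border

intercept, w_a, w_b = fit_logistic(X, y)
print("A:", "positive" if w_a > 0 else "negative")
print("B:", "positive" if w_b > 0 else "negative")
```

In this toy set, A comes out with a positive coefficient and B with a negative one, mirroring the positive and negative drivers described in the abstract. A statistical interaction between two features could be modelled by appending their product as an extra column before fitting.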

  4. Computational Identification of Genomic Features That Influence 3D Chromatin Domain Formation

    PubMed Central

    Mourad, Raphaël; Cuvier, Olivier

    2016-01-01

    Recent advances in long-range Hi-C contact mapping have revealed the importance of the 3D structure of chromosomes in gene expression. A current challenge is to identify the key molecular drivers of this 3D structure. Several genomic features, such as architectural proteins and functional elements, were shown to be enriched at topological domain borders using classical enrichment tests. Here we propose multiple logistic regression to identify those genomic features that positively or negatively influence domain border establishment or maintenance. The model is flexible, and can account for statistical interactions among multiple genomic features. Using both simulated and real data, we show that our model outperforms enrichment test and non-parametric models, such as random forests, for the identification of genomic features that influence domain borders. Using Drosophila Hi-C data at a very high resolution of 1 kb, our model suggests that, among architectural proteins, BEAF-32 and CP190 are the main positive drivers of 3D domain borders. In humans, our model identifies well-known architectural proteins CTCF and cohesin, as well as ZNF143 and Polycomb group proteins as positive drivers of domain borders. The model also reveals the existence of several negative drivers that counteract the presence of domain borders including P300, RXRA, BCL11A and ELK1. PMID:27203237

  5. Evolution of gastropod mitochondrial genome arrangements

    PubMed Central

    2008-01-01

    Background Gastropod mitochondrial genomes exhibit an unusually great variety of gene orders compared to other metazoan mitochondrial genomes, such as those of vertebrates. Hence, gastropod mitochondrial genomes constitute a good model system to study patterns, rates, and mechanisms of mitochondrial genome rearrangement. However, this kind of evolutionary comparative analysis requires a robust phylogenetic framework of the group under study, which has been elusive so far for gastropods in spite of the efforts carried out during the last two decades. Here, we report the complete nucleotide sequence of five mitochondrial genomes of gastropods (Pyramidella dolabrata, Ascobulla fragilis, Siphonaria pectinata, Onchidella celtica, and Myosotella myosotis), and we analyze them together with another ten complete mitochondrial genomes of gastropods currently available in molecular databases in order to reconstruct the phylogenetic relationships among the main lineages of gastropods. Results Comparative analyses with other mollusk mitochondrial genomes allowed us to describe molecular features and general trends in the evolution of mitochondrial genome organization in gastropods. Phylogenetic reconstruction with commonly used methods of phylogenetic inference (ME, MP, ML, BI) arrived at a single topology, which was used to reconstruct the evolution of mitochondrial gene rearrangements in the group. Conclusion Four main lineages were identified within gastropods: Caenogastropoda, Vetigastropoda, Patellogastropoda, and Heterobranchia. Caenogastropoda and Vetigastropoda are sister taxa, as are Patellogastropoda and Heterobranchia. This result rejects the validity of the derived clade Apogastropoda (Caenogastropoda + Heterobranchia). The position of Patellogastropoda remains unclear, likely due to long-branch attraction biases. Within Heterobranchia, the most heterogeneous group of gastropods, neither Euthyneura (because of the inclusion of P. dolabrata) nor Pulmonata (polyphyletic) nor Opisthobranchia (because of the inclusion of S. pectinata) were recovered as monophyletic groups. The gene order of the Vetigastropoda might represent the ancestral mitochondrial gene order for Gastropoda and we propose that at least three major rearrangements have taken place in the evolution of gastropods: one in the ancestor of Caenogastropoda, another in the ancestor of Patellogastropoda, and one more in the ancestor of Heterobranchia. PMID:18302768

  6. Evolution, functions, and mysteries of plant ARGONAUTE proteins.

    PubMed

    Zhang, Han; Xia, Rui; Meyers, Blake C; Walbot, Virginia

    2015-10-01

    ARGONAUTE (AGO) proteins bind small RNAs (sRNAs) to form RNA-induced silencing complexes for transcriptional and post-transcriptional gene silencing. Genomes of primitive plants encode only a few AGO proteins. The Arabidopsis thaliana genome encodes ten AGO proteins, designated AGO1 to AGO10. Most early studies focused on these ten proteins and their interacting sRNAs. AGOs in other flowering plant species have duplicated and diverged from this set, presumably corresponding to new, diverged or specific functions. Among these, the grass-specific AGO18 family has been discovered and implicated as playing important roles during plant reproduction and viral defense. This review covers our current knowledge about functions and features of AGO proteins in both eudicots and monocots and compares their similarities and differences. On the basis of these features, we propose a new nomenclature for some plant AGOs. Copyright © 2015 Elsevier Ltd. All rights reserved.

  7. Microdeletion of 19p13.3 in a girl with Peutz-Jeghers syndrome, intellectual disability, hypotonia, and distinctive features.

    PubMed

    Kuroda, Yukiko; Saito, Toshiyuki; Nagai, Jun-Ichi; Ida, Kazumi; Naruto, Takuya; Masuno, Mitsuo; Kurosawa, Kenji

    2015-02-01

    Peutz-Jeghers syndrome (PJS) is a rare autosomal dominant disease characterized by gastrointestinal polyposis and mucocutaneous pigmentation. Germline point mutations in the serine/threonine kinase 11 (STK11) have been identified in about 70% of patients with PJS. Only a few large genomic deletions have been identified. We report on a girl with PJS and multiple congenital anomalies. She had intellectual disability, umbilical hernia, bilateral inguinal hernias, scoliosis, and distinct facial appearance including prominent mandible, smooth philtrum, and malformed ears. She developed lip pigmentation at the age of 12 years but had no gastrointestinal polyps. Array comparative genomic hybridization revealed an approximately 610 kb deletion at 19p13.3, encompassing STK11. Together with previous reports, the identification of common clinical features suggests that microdeletion at 19p13.3 encompassing STK11 constitutes a distinctive phenotype. © 2014 Wiley Periodicals, Inc.

  8. Giraffe genome sequence reveals clues to its unique morphology and physiology

    PubMed Central

    Agaba, Morris; Ishengoma, Edson; Miller, Webb C.; McGrath, Barbara C.; Hudson, Chelsea N.; Bedoya Reina, Oscar C.; Ratan, Aakrosh; Burhans, Rico; Chikhi, Rayan; Medvedev, Paul; Praul, Craig A.; Wu-Cavener, Lan; Wood, Brendan; Robertson, Heather; Penfold, Linda; Cavener, Douglas R.

    2016-01-01

    The origins of giraffe's imposing stature and associated cardiovascular adaptations are unknown. Okapi, which lacks these unique features, is giraffe's closest relative and provides a useful comparison, to identify genetic variation underlying giraffe's long neck and cardiovascular system. The genomes of giraffe and okapi were sequenced, and through comparative analyses genes and pathways were identified that exhibit unique genetic changes and likely contribute to giraffe's unique features. Some of these genes are in the HOX, NOTCH and FGF signalling pathways, which regulate both skeletal and cardiovascular development, suggesting that giraffe's stature and cardiovascular adaptations evolved in parallel through changes in a small number of genes. Mitochondrial metabolism and volatile fatty acids transport genes are also evolutionarily diverged in giraffe and may be related to its unusual diet that includes toxic plants. Unexpectedly, substantial evolutionary changes have occurred in giraffe and okapi in double-strand break repair and centrosome functions. PMID:27187213

  9. MitoAge: a database for comparative analysis of mitochondrial DNA, with a special focus on animal longevity.

    PubMed

    Toren, Dmitri; Barzilay, Thomer; Tacutu, Robi; Lehmann, Gilad; Muradian, Khachik K; Fraifeld, Vadim E

    2016-01-04

    Mitochondria are the only organelles in the animal cells that have their own genome. Due to a key role in energy production, generation of damaging factors (ROS, heat), and apoptosis, mitochondria and mtDNA in particular have long been considered one of the major players in the mechanisms of aging, longevity and age-related diseases. The rapidly increasing number of species with fully sequenced mtDNA, together with accumulated data on longevity records, provides a new fascinating basis for comparative analysis of the links between mtDNA features and animal longevity. To facilitate such analyses and to support the scientific community in carrying these out, we developed the MitoAge database containing calculated mtDNA compositional features of the entire mitochondrial genome, mtDNA coding (tRNA, rRNA, protein-coding genes) and non-coding (D-loop) regions, and codon usage/amino acids frequency for each protein-coding gene. MitoAge includes 922 species with fully sequenced mtDNA and maximum lifespan records. The database is available through the MitoAge website (www.mitoage.org or www.mitoage.info), which provides the necessary tools for searching, browsing, comparing and downloading the data sets of interest for selected taxonomic groups across the Kingdom Animalia. The MitoAge website assists in statistical analysis of different features of the mtDNA and their correlative links to longevity. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.

  10. Minimalist ensemble algorithms for genome-wide protein localization prediction.

    PubMed

    Lin, Jhih-Rong; Mondal, Ananda Mohan; Liu, Rong; Hu, Jianjun

    2012-07-03

    Computational prediction of protein subcellular localization can greatly help to elucidate its functions. Despite the existence of dozens of protein localization prediction algorithms, the prediction accuracy and coverage are still low. Several ensemble algorithms have been proposed to improve the prediction performance, which usually include as many as 10 or more individual localization algorithms. However, their performance is still limited by the running complexity and redundancy among individual prediction algorithms. This paper proposed a novel method for rational design of minimalist ensemble algorithms for practical genome-wide protein subcellular localization prediction. The algorithm is based on combining a feature selection based filter and a logistic regression classifier. Using a novel concept of contribution scores, we analyzed issues of algorithm redundancy, consensus mistakes, and algorithm complementarity in designing ensemble algorithms. We applied the proposed minimalist logistic regression (LR) ensemble algorithm to two genome-wide datasets of Yeast and Human and compared its performance with current ensemble algorithms. Experimental results showed that the minimalist ensemble algorithm can achieve high prediction accuracy with only 1/3 to 1/2 of individual predictors of current ensemble algorithms, which greatly reduces computational complexity and running time. It was found that the high performance ensemble algorithms are usually composed of the predictors that together cover most of available features. Compared to the best individual predictor, our ensemble algorithm improved the prediction accuracy from AUC score of 0.558 to 0.707 for the Yeast dataset and from 0.628 to 0.646 for the Human dataset. Compared with popular weighted voting based ensemble algorithms, our classifier-based ensemble algorithms achieved much better performance without suffering from inclusion of too many individual predictors. 
We proposed a method for rational design of minimalist ensemble algorithms using feature selection and classifiers. The proposed minimalist ensemble algorithm based on logistic regression can achieve equal or better prediction performance while using only half or one-third of individual predictors compared to other ensemble algorithms. The results also suggested that meta-predictors that take advantage of a variety of features by combining individual predictors tend to achieve the best performance. The LR ensemble server and related benchmark datasets are available at http://mleg.cse.sc.edu/LRensemble/cgi-bin/predict.cgi.

  11. Minimalist ensemble algorithms for genome-wide protein localization prediction

    PubMed Central

    2012-01-01

    Background Computational prediction of protein subcellular localization can greatly help to elucidate its functions. Despite the existence of dozens of protein localization prediction algorithms, the prediction accuracy and coverage are still low. Several ensemble algorithms have been proposed to improve the prediction performance, which usually include as many as 10 or more individual localization algorithms. However, their performance is still limited by the running complexity and redundancy among individual prediction algorithms. Results This paper proposed a novel method for rational design of minimalist ensemble algorithms for practical genome-wide protein subcellular localization prediction. The algorithm is based on combining a feature selection based filter and a logistic regression classifier. Using a novel concept of contribution scores, we analyzed issues of algorithm redundancy, consensus mistakes, and algorithm complementarity in designing ensemble algorithms. We applied the proposed minimalist logistic regression (LR) ensemble algorithm to two genome-wide datasets of Yeast and Human and compared its performance with current ensemble algorithms. Experimental results showed that the minimalist ensemble algorithm can achieve high prediction accuracy with only 1/3 to 1/2 of individual predictors of current ensemble algorithms, which greatly reduces computational complexity and running time. It was found that the high performance ensemble algorithms are usually composed of the predictors that together cover most of available features. Compared to the best individual predictor, our ensemble algorithm improved the prediction accuracy from AUC score of 0.558 to 0.707 for the Yeast dataset and from 0.628 to 0.646 for the Human dataset. Compared with popular weighted voting based ensemble algorithms, our classifier-based ensemble algorithms achieved much better performance without suffering from inclusion of too many individual predictors. 
Conclusions We proposed a method for rational design of minimalist ensemble algorithms using feature selection and classifiers. The proposed minimalist ensemble algorithm based on logistic regression can achieve equal or better prediction performance while using only half or one-third of individual predictors compared to other ensemble algorithms. The results also suggested that meta-predictors that take advantage of a variety of features by combining individual predictors tend to achieve the best performance. The LR ensemble server and related benchmark datasets are available at http://mleg.cse.sc.edu/LRensemble/cgi-bin/predict.cgi. PMID:22759391

  12. The complete chloroplast DNA sequence of the green alga Oltmannsiellopsis viridis reveals a distinctive quadripartite architecture in the chloroplast genome of early diverging ulvophytes

    PubMed Central

    Pombert, Jean-François; Lemieux, Claude; Turmel, Monique

    2006-01-01

    Background The phylum Chlorophyta contains the majority of the green algae and is divided into four classes. The basal position of the Prasinophyceae has been well documented, but the divergence order of the Ulvophyceae, Trebouxiophyceae and Chlorophyceae is currently debated. The four complete chloroplast DNA (cpDNA) sequences presently available for representatives of these classes have revealed extensive variability in overall structure, gene content, intron composition and gene order. The chloroplast genome of Pseudendoclonium (Ulvophyceae), in particular, is characterized by an atypical quadripartite architecture that deviates from the ancestral type by a large inverted repeat (IR) featuring an inverted rRNA operon and a small single-copy (SSC) region containing 14 genes normally found in the large single-copy (LSC) region. To gain insights into the nature of the events that led to the reorganization of the chloroplast genome in the Ulvophyceae, we have determined the complete cpDNA sequence of Oltmannsiellopsis viridis, a representative of a distinct, early diverging lineage. Results The 151,933 bp IR-containing genome of Oltmannsiellopsis differs considerably from Pseudendoclonium and other chlorophyte cpDNAs in intron content and gene order, but shares close similarities with its ulvophyte homologue at the levels of quadripartite architecture, gene content and gene density. Oltmannsiellopsis cpDNA encodes 105 genes, contains five group I introns, and features many short dispersed repeats. As in Pseudendoclonium cpDNA, the rRNA genes in the IR are transcribed toward the single copy region featuring the genes typically found in the ancestral LSC region, and the opposite single copy region harbours genes characteristic of both the ancestral SSC and LSC regions. The 52 genes that were transferred from the ancestral LSC to SSC region include 12 of those observed in Pseudendoclonium cpDNA. 
Surprisingly, the overall gene organization of Oltmannsiellopsis cpDNA more closely resembles that of Chlorella (Trebouxiophyceae) cpDNA. Conclusion The chloroplast genome of the last common ancestor of Oltmannsiellopsis and Pseudendoclonium contained a minimum of 108 genes, carried only a few group I introns, and featured a distinctive quadripartite architecture. Numerous changes were experienced by the chloroplast genome in the lineages leading to Oltmannsiellopsis and Pseudendoclonium. Our comparative analyses of chlorophyte cpDNAs support the notion that the Ulvophyceae is sister to the Chlorophyceae. PMID:16472375

  13. Nonclinical and Clinical Enterococcus faecium Strains, but Not Enterococcus faecalis Strains, Have Distinct Structural and Functional Genomic Features

    PubMed Central

    Kim, Eun Bae

    2014-01-01

    Certain strains of Enterococcus faecium and Enterococcus faecalis contribute beneficially to animal health and food production, while others are associated with nosocomial infections. To determine whether there are structural and functional genomic features that are distinct between nonclinical (NC) and clinical (CL) strains of those species, we analyzed the genomes of 31 E. faecium and 38 E. faecalis strains. Hierarchical clustering of 7,017 orthologs found in the E. faecium pangenome revealed that NC strains clustered into two clades and are distinct from CL strains. NC E. faecium genomes are significantly smaller than CL genomes, and this difference was partly explained by significantly fewer mobile genetic elements (ME), virulence factors (VF), and antibiotic resistance (AR) genes. E. faecium ortholog comparisons identified 68 and 153 genes that are enriched for NC and CL strains, respectively. Proximity analysis showed that CL-enriched loci, and not NC-enriched loci, are more frequently colocalized on the genome with ME. In CL genomes, AR genes are also colocalized with ME, and VF are more frequently associated with CL-enriched loci. Genes in 23 functional groups are also differentially enriched between NC and CL E. faecium genomes. In contrast, differences were not observed between NC and CL E. faecalis genomes despite their having larger genomes than E. faecium. Our findings show that unlike E. faecalis, NC and CL E. faecium strains are equipped with distinct structural and functional genomic features indicative of adaptation to different environments. PMID:24141120

  14. Superior ab initio identification, annotation and characterisation of TEs and segmental duplications from genome assemblies.

    PubMed

    Zeng, Lu; Kortschak, R Daniel; Raison, Joy M; Bertozzi, Terry; Adelson, David L

    2018-01-01

    Transposable Elements (TEs) are mobile DNA sequences that make up significant fractions of amniote genomes. However, they are difficult to detect and annotate ab initio because of their variable features, lengths and clade-specific variants. We have addressed this problem by refining and developing a Comprehensive ab initio Repeat Pipeline (CARP) to identify and cluster TEs and other repetitive sequences in genome assemblies. The pipeline begins with a pairwise alignment using krishna, a custom aligner. Single linkage clustering is then carried out to produce families of repetitive elements. Consensus sequences are then filtered for protein coding genes and then annotated using Repbase and a custom library of retrovirus and reverse transcriptase sequences. This process yields three types of family: fully annotated, partially annotated and unannotated. Fully annotated families reflect recently diverged/young known TEs present in Repbase. The remaining two types of families contain a mixture of novel TEs and segmental duplications. These can be resolved by aligning these consensus sequences back to the genome to assess copy number vs. length distribution. Our pipeline has three significant advantages compared to other methods for ab initio repeat identification: 1) we generate not only consensus sequences, but keep the genomic intervals for the original aligned sequences, allowing straightforward analysis of evolutionary dynamics, 2) consensus sequences represent low-divergence, recently/currently active TE families, 3) segmental duplications are annotated as a useful by-product. We have compared our ab initio repeat annotations for 7 genome assemblies to other methods and demonstrate that CARP compares favourably with RepeatModeler, the most widely used repeat annotation package.
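
The single linkage clustering step mentioned in the abstract above can be sketched as follows. This is a minimal illustration, not the CARP implementation: it assumes the pairwise aligner's output has already been reduced to pairs of sequence identifiers, and the function name `single_linkage_families` and the IDs in the usage note are hypothetical.

```python
from collections import defaultdict


def single_linkage_families(pairs):
    """Group sequence IDs into families.

    Two IDs land in the same family whenever a chain of pairwise
    alignments connects them (single linkage), implemented here
    with a small union-find structure.
    """
    parent = {}

    def find(x):
        # Register unseen IDs as their own root, then walk to the root.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b  # merge the two families

    families = defaultdict(set)
    for seq_id in list(parent):
        families[find(seq_id)].add(seq_id)
    # Return a deterministic, sorted list of sorted member lists.
    return sorted(sorted(members) for members in families.values())
```

For example, the hypothetical alignment pairs `[("s1", "s2"), ("s2", "s3"), ("s4", "s5")]` yield two families, one containing `s1`, `s2` and `s3` and one containing `s4` and `s5`, even though `s1` and `s3` were never aligned to each other directly.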

  15. Superior ab initio identification, annotation and characterisation of TEs and segmental duplications from genome assemblies

    PubMed Central

    Zeng, Lu; Kortschak, R. Daniel; Raison, Joy M.

    2018-01-01

    Transposable Elements (TEs) are mobile DNA sequences that make up significant fractions of amniote genomes. However, they are difficult to detect and annotate ab initio because of their variable features, lengths and clade-specific variants. We have addressed this problem by refining and developing a Comprehensive ab initio Repeat Pipeline (CARP) to identify and cluster TEs and other repetitive sequences in genome assemblies. The pipeline begins with a pairwise alignment using krishna, a custom aligner. Single linkage clustering is then carried out to produce families of repetitive elements. Consensus sequences are then filtered for protein coding genes and then annotated using Repbase and a custom library of retrovirus and reverse transcriptase sequences. This process yields three types of family: fully annotated, partially annotated and unannotated. Fully annotated families reflect recently diverged/young known TEs present in Repbase. The remaining two types of families contain a mixture of novel TEs and segmental duplications. These can be resolved by aligning these consensus sequences back to the genome to assess copy number vs. length distribution. Our pipeline has three significant advantages compared to other methods for ab initio repeat identification: 1) we generate not only consensus sequences, but keep the genomic intervals for the original aligned sequences, allowing straightforward analysis of evolutionary dynamics, 2) consensus sequences represent low-divergence, recently/currently active TE families, 3) segmental duplications are annotated as a useful by-product. We have compared our ab initio repeat annotations for 7 genome assemblies to other methods and demonstrate that CARP compares favourably with RepeatModeler, the most widely used repeat annotation package. PMID:29538441

  16. A Novel Method for Accurate Operon Predictions in All Sequenced Prokaryotes

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Price, Morgan N.; Huang, Katherine H.; Alm, Eric J.

    2004-12-01

    We combine comparative genomic measures and the distance separating adjacent genes to predict operons in 124 completely sequenced prokaryotic genomes. Our method automatically tailors itself to each genome using sequence information alone, and thus can be applied to any prokaryote. For Escherichia coli K12 and Bacillus subtilis, our method is 85 and 83% accurate, respectively, which is similar to the accuracy of methods that use the same features but are trained on experimentally characterized transcripts. In Halobacterium NRC-1 and in Helicobacter pylori, our method correctly infers that genes in operons are separated by shorter distances than they are in E. coli, and its predictions using distance alone are more accurate than distance-only predictions trained on a database of E. coli transcripts. We use microarray data from six phylogenetically diverse prokaryotes to show that combining intergenic distance with comparative genomic measures further improves accuracy and that our method is broadly effective. Finally, we survey operon structure across 124 genomes, and find several surprises: H. pylori has many operons, contrary to previous reports; Bacillus anthracis has an unusual number of pseudogenes within conserved operons; and Synechocystis PCC6803 has many operons even though it has unusually wide spacings between conserved adjacent genes.
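
The distance-only idea referred to in this abstract can be illustrated with a deliberately simplified sketch. It is not the authors' method: the published approach tailors its distance model to each genome and adds comparative genomic measures, whereas the code below assumes a single fixed cutoff. The `Gene` record, the function name `predict_operons` and the default `max_gap` of 50 bp are all hypothetical choices made for illustration.

```python
from typing import NamedTuple


class Gene(NamedTuple):
    name: str
    start: int   # 1-based start coordinate
    end: int     # 1-based inclusive end coordinate
    strand: str  # "+" or "-"


def predict_operons(genes, max_gap=50):
    """Group adjacent same-strand genes whose intergenic gap is small.

    max_gap is an assumed, fixed cutoff in base pairs; a real method
    would estimate a genome-specific distance model instead.
    """
    ordered = sorted(genes, key=lambda g: g.start)
    operons = []
    for gene in ordered:
        if operons:
            prev = operons[-1][-1]
            gap = gene.start - prev.end - 1  # bases between the two genes
            if gene.strand == prev.strand and gap <= max_gap:
                operons[-1].append(gene)
                continue
        operons.append([gene])  # start a new candidate operon
    return [[g.name for g in operon] for operon in operons]
```

With four hypothetical genes, `a` (1 to 100, +), `b` (120 to 300, +), `c` (900 to 1200, +) and `d` (1210 to 1500, -), the sketch groups `a` with `b` and leaves `c` and `d` on their own: `c` is too far from `b`, and `d` is close to `c` but on the opposite strand.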

  17. Complex multi-enhancer contacts captured by Genome Architecture Mapping (GAM)

    PubMed Central

    Beagrie, Robert A.; Scialdone, Antonio; Schueler, Markus; Kraemer, Dorothee C.A.; Chotalia, Mita; Xie, Sheila Q.; Barbieri, Mariano; de Santiago, Inês; Lavitas, Liron-Mark; Branco, Miguel R.; Fraser, James; Dostie, Josée; Game, Laurence; Dillon, Niall; Edwards, Paul A.W.; Nicodemi, Mario; Pombo, Ana

    2017-01-01

    Summary The organization of the genome in the nucleus and the interactions of genes with their regulatory elements are key features of transcriptional control and their disruption can cause disease. We developed a novel genome-wide method, Genome Architecture Mapping (GAM), for measuring chromatin contacts, and other features of three-dimensional chromatin topology, based on sequencing DNA from a large collection of thin nuclear sections. We apply GAM to mouse embryonic stem cells and identify an enrichment for specific interactions between active genes and enhancers across very large genomic distances, using a mathematical model ‘SLICE’ (Statistical Inference of Co-segregation). GAM also reveals an abundance of three-way contacts genome-wide, especially between regions that are highly transcribed or contain super-enhancers, highlighting a previously inaccessible complexity in genome architecture and a major role for gene-expression specific contacts in organizing the genome in mammalian nuclei. PMID:28273065

  18. Diverse circovirus-like genome architectures revealed by environmental metagenomics.

    PubMed

    Rosario, Karyna; Duffy, Siobain; Breitbart, Mya

    2009-10-01

    Single-stranded DNA (ssDNA) viruses with circular genomes are the smallest viruses known to infect eukaryotes. The present study identified 10 novel genomes similar to ssDNA circoviruses through data-mining of public viral metagenomes. The metagenomic libraries included samples from reclaimed water and three different marine environments (Chesapeake Bay, British Columbia coastal waters and Sargasso Sea). All the genomes have similarities to the replication (Rep) protein of circoviruses; however, only half have genomic features consistent with known circoviruses. Some of the genomes exhibit a mixture of genomic features associated with different families of ssDNA viruses (i.e. circoviruses, geminiviruses and parvoviruses). Unique genome architectures and phylogenetic analysis of the Rep protein suggest that these viruses belong to novel genera and/or families. Investigating the complex community of ssDNA viruses in the environment can lead to the discovery of divergent species and help elucidate evolutionary links between ssDNA viruses.

  19. Comparison of Clinical Features and Outcomes in Patients with Extraskeletal Versus Skeletal Localized Ewing Sarcoma: A Report from the Children’s Oncology Group

    PubMed Central

    Cash, Thomas; McIlvaine, Elizabeth; Krailo, Mark D.; Lessnick, Stephen L.; Lawlor, Elizabeth R.; Laack, Nadia; Sorger, Joel; Marina, Neyssa; Grier, Holcombe E.; Granowetter, Linda; Womer, Richard B.; DuBois, Steven G.

    2016-01-01

    BACKGROUND The prognostic significance of having extraskeletal vs. skeletal Ewing sarcoma in the setting of modern chemotherapy protocols is unknown. The purpose of this study was to compare the clinical characteristics, biologic features, and outcomes for patients with extraskeletal and skeletal Ewing sarcoma. METHODS Patients had localized Ewing sarcoma (ES) and were treated on two consecutive protocols using 5-drug chemotherapy (INT-0154 and AEWS0031). Patients were analyzed based on having an extraskeletal (n=213) or skeletal (n=826) site of tumor origin. Event-free survival (EFS) was estimated using the Kaplan-Meier method, compared using the log-rank test, and modeled using Cox multivariate regression. RESULTS Patients with extraskeletal Ewing Sarcoma (EES) were more likely to have axial tumors (72% vs. 55%; P < 0.001), less likely to have tumors > 8 cm (9% vs. 17%; P < 0.01), and less likely to be white (81% vs. 87%; P < 0.001) compared to patients with skeletal ES. There was no difference in key genomic features (type of EWSR1 translocation, TP53 mutation, CDKN2A mutation/loss) between groups. After controlling for age, race, and primary site, EES was associated with superior EFS [hazard ratio = 0.69; 95% CI: 0.50–0.95; P = 0.02]. Among patients with EES, age ≥ 18 years, non-white race, and elevated baseline erythrocyte sedimentation rate (ESR) were independently associated with inferior EFS. CONCLUSION Clinical characteristics, but not key tumor genomic features, differ between EES and skeletal ES. Extraskeletal origin is a favorable prognostic factor, independent of age, race, and primary site. PMID:27297500
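
The Kaplan-Meier method named in the METHODS above can be sketched in a few lines. This is a generic textbook product-limit estimator, not the study's analysis code; the function name `kaplan_meier` and the toy follow-up data in the usage note are hypothetical.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of the survival function.

    times  : follow-up time for each patient
    events : 1 if the event was observed at that time, 0 if censored
    Returns a list of (time, survival probability) pairs, one per
    distinct time at which at least one event occurred.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_leaving = 0
        # Collect everyone whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events:
            # Patients censored at t still count as at risk at t.
            survival *= 1 - n_events / at_risk
            curve.append((t, survival))
        at_risk -= n_leaving
    return curve
```

For the toy input `times=[1, 2, 2, 3, 4]` and `events=[1, 1, 0, 1, 0]`, the estimate steps down to roughly 0.8, 0.6 and 0.3 at times 1, 2 and 3. Comparing two such curves with a log-rank test or a Cox model, as the study does, requires additional machinery that is not shown here.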

  20. Rapid CRISPR/Cas9-Mediated Cloning of Full-Length Epstein-Barr Virus Genomes from Latently Infected Cells.

    PubMed

    Yajima, Misako; Ikuta, Kazufumi; Kanda, Teru

    2018-04-03

    Herpesviruses have relatively large DNA genomes of more than 150 kb that are difficult to clone and sequence. Bacterial artificial chromosome (BAC) cloning of herpesvirus genomes is a powerful technique that greatly facilitates whole viral genome sequencing as well as functional characterization of reconstituted viruses. We describe recently invented technologies for rapid BAC cloning of herpesvirus genomes using CRISPR/Cas9-mediated homology-directed repair. We focus on recent BAC cloning techniques of Epstein-Barr virus (EBV) genomes and discuss the possible advantages of a CRISPR/Cas9-mediated strategy comparatively with precedent EBV-BAC cloning strategies. We also describe the design decisions of this technology as well as possible pitfalls and points to be improved in the future. The obtained EBV-BAC clones are subjected to long-read sequencing analysis to determine complete EBV genome sequence including repetitive regions. Rapid cloning and sequence determination of various EBV strains will greatly contribute to the understanding of their global geographical distribution. This technology can also be used to clone disease-associated EBV strains and test the hypothesis that they have special features that distinguish them from strains that infect asymptomatically.

  1. Rapid CRISPR/Cas9-Mediated Cloning of Full-Length Epstein-Barr Virus Genomes from Latently Infected Cells

    PubMed Central

    Ikuta, Kazufumi; Kanda, Teru

    2018-01-01

    Herpesviruses have relatively large DNA genomes of more than 150 kb that are difficult to clone and sequence. Bacterial artificial chromosome (BAC) cloning of herpesvirus genomes is a powerful technique that greatly facilitates whole viral genome sequencing as well as functional characterization of reconstituted viruses. We describe recently invented technologies for rapid BAC cloning of herpesvirus genomes using CRISPR/Cas9-mediated homology-directed repair. We focus on recent BAC cloning techniques of Epstein-Barr virus (EBV) genomes and discuss the possible advantages of a CRISPR/Cas9-mediated strategy comparatively with precedent EBV-BAC cloning strategies. We also describe the design decisions of this technology as well as possible pitfalls and points to be improved in the future. The obtained EBV-BAC clones are subjected to long-read sequencing analysis to determine complete EBV genome sequence including repetitive regions. Rapid cloning and sequence determination of various EBV strains will greatly contribute to the understanding of their global geographical distribution. This technology can also be used to clone disease-associated EBV strains and test the hypothesis that they have special features that distinguish them from strains that infect asymptomatically. PMID:29614006

  2. Association between genomic instability and evolutionary chromosomal rearrangements in Neotropical Primates.

    PubMed

    Puntieri, Fiona; Andrioli, Nancy B; Nieves, Mariela

    2018-06-14

    During the last decades the mammalian genome has been proposed to have regions prone to breakage and reorganization concentrated in certain chromosomal bands that seem to correspond to evolutionary breakpoints. These bands are likely to be involved in chromosome fragility or instability. In Primates, some biomarkers of genetic damage may be associated with various degrees of genomic instability. Here, we investigated the usefulness of Sister Chromatid Exchange (SCE) as a biomarker of potential sites of frequent chromosome breakage and rearrangement in Alouatta caraya, Ateles chamek, Ateles paniscus and Cebus cay. These Neotropical species have particular genomic and chromosomal features allowing the analysis of genomic instability for comparative purposes. We determined the frequency of spontaneous induction of SCEs and assessed the relationship between these and structural rearrangements implicated in the evolution of the primates of interest. Overall, A. caraya and C. cay presented a low proportion of statistically significant unstable bands, suggesting fairly stable genomes and the existence of some kind of protection against endogenous damage. In contrast, Ateles showed a highly significant proportion of unstable bands; these were mainly found in the rearranged regions, which is consistent with the numerous genomic reorganizations that might have occurred during the evolution of this genus.

  3. Genomic insights into the Acidobacteria reveal strategies for their success in terrestrial environments

    PubMed Central

    Trojan, Daniela; Roux, Simon; Herbold, Craig; Rattei, Thomas; Woebken, Dagmar

    2018-01-01

    Summary Members of the phylum Acidobacteria are abundant and ubiquitous across soils. We performed a large‐scale comparative genome analysis spanning subdivisions 1, 3, 4, 6, 8 and 23 (n = 24) with the goal to identify features to help explain their prevalence in soils and understand their ecophysiology. Our analysis revealed that bacteriophage integration events along with transposable and mobile elements influenced the structure and plasticity of these genomes. Low‐ and high‐affinity respiratory oxygen reductases were detected in multiple genomes, suggesting the capacity for growing across different oxygen gradients. Among many genomes, the capacity to use a diverse collection of carbohydrates, as well as inorganic and organic nitrogen sources (such as via extracellular peptidases), was detected – both advantageous traits in environments with fluctuating nutrient environments. We also identified multiple soil acidobacteria with the potential to scavenge atmospheric concentrations of H2, now encompassing mesophilic soil strains within the subdivision 1 and 3, in addition to a previously identified thermophilic strain in subdivision 4. This large‐scale acidobacteria genome analysis reveals traits that provide genomic, physiological and metabolic versatility, presumably allowing flexibility and versatility in the challenging and fluctuating soil environment. PMID:29327410

  4. multi-dice: R package for comparative population genomic inference under hierarchical co-demographic models of independent single-population size changes.

    PubMed

    Xue, Alexander T; Hickerson, Michael J

    2017-11-01

    Population genetic data from multiple taxa can address comparative phylogeographic questions about community-scale response to environmental shifts, and a useful strategy to this end is to employ hierarchical co-demographic models that directly test multi-taxa hypotheses within a single, unified analysis. This approach has been applied to classical phylogeographic data sets such as mitochondrial barcodes as well as reduced-genome polymorphism data sets that can yield 10,000s of SNPs, produced by emergent technologies such as RAD-seq and GBS. A strategy for the latter was accomplished by adapting the site frequency spectrum to a novel summarization of population genomic data across multiple taxa called the aggregate site frequency spectrum (aSFS), which potentially can be deployed under various inferential frameworks including approximate Bayesian computation, random forest and composite likelihood optimization. Here, we introduce the R package multi-dice, a wrapper program that exploits existing simulation software for flexible execution of hierarchical model-based inference using the aSFS, which is derived from reduced genome data, as well as mitochondrial data. We validate several novel software features such as applying alternative inferential frameworks, enforcing a minimal threshold of time surrounding co-demographic pulses and specifying flexible hyperprior distributions. In sum, multi-dice provides comparative analysis within the familiar R environment while allowing a high degree of user customization, and will thus serve as a tool for comparative phylogeography and population genomics. © 2017 The Authors. Molecular Ecology Resources Published by John Wiley & Sons Ltd.

  5. Comparative analysis of 2D and 3D distance measurements to study spatial genome organization.

    PubMed

    Finn, Elizabeth H; Pegoraro, Gianluca; Shachar, Sigal; Misteli, Tom

    2017-07-01

    The spatial organization of genomes is non-random, cell-type specific, and has been linked to cellular function. The investigation of spatial organization has traditionally relied extensively on fluorescence microscopy. The validity of the imaging methods used to probe spatial genome organization often depends on the accuracy and precision of distance measurements. Imaging-based measurements may use either 2D datasets or 3D datasets, which include the z-axis information in image stacks. Here we compare the suitability of 2D vs 3D distance measurements in the analysis of various features of spatial genome organization. We find in general good agreement between 2D and 3D analysis with higher convergence of measurements as the interrogated distance increases, especially in flat cells. Overall, 3D distance measurements are more accurate than 2D distances, but are also more susceptible to noise. In particular, z-stacks are prone to error due to imaging properties such as limited resolution along the z-axis and optical aberrations, and we also find significant deviations from unimodal distance distributions caused by low sampling frequency in z. These deviations are ameliorated by significantly higher sampling frequency in the z-direction. We conclude that 2D distances are preferred for comparative analyses between cells, but 3D distances are preferred when comparing to theoretical models in large samples of cells. In general and for practical purposes, 2D distance measurements are preferable for many applications of analysis of spatial genome organization. Published by Elsevier Inc.
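
    As a minimal illustration of how the two measurement types relate (a sketch, not the authors' analysis pipeline), the code below computes the 3D Euclidean distance between two loci and the 2D distance obtained by discarding z. The coordinates, units and function names are hypothetical and chosen only for illustration.

```python
import math

def distance_3d(p, q):
    """Euclidean distance between two (x, y, z) points."""
    return math.dist(p, q)

def distance_2d(p, q):
    """Euclidean distance after projecting both points onto the x-y plane."""
    return math.dist(p[:2], q[:2])

def relative_underestimate(p, q):
    """Fraction by which the 2D projection underestimates the 3D distance."""
    d3 = distance_3d(p, q)
    return 0.0 if d3 == 0 else (d3 - distance_2d(p, q)) / d3

# Hypothetical locus positions (arbitrary units).
locus_a = (0.0, 0.0, 0.0)
locus_tall = (3.0, 4.0, 12.0)  # large z separation, as in a tall nucleus
locus_flat = (3.0, 4.0, 0.5)   # small z separation, as in a flat cell

for name, locus in (("tall", locus_tall), ("flat", locus_flat)):
    print(name, distance_2d(locus_a, locus), distance_3d(locus_a, locus),
          relative_underestimate(locus_a, locus))
```

    Because the projection drops one coordinate, the 2D distance can never exceed the 3D distance, and the gap shrinks when the z separation is small relative to the in-plane separation. This fits the reported agreement in flat cells.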

  6. RELATIONSHIP BETWEEN PHYLOGENETIC DISTRIBUTION AND GENOMIC FEATURES IN NEUROSPORA CRASSA

    USDA-ARS's Scientific Manuscript database

    In the post-genome era, insufficient functional annotation of predicted genes greatly restricts the potential of mining genome data. We demonstrate that an evolutionary approach, which is independent of functional annotation, has great potential as a tool for genome analysis. We chose the genome o...

  7. Extending information retrieval methods to personalized genomic-based studies of disease.

    PubMed

    Ye, Shuyun; Dawson, John A; Kendziorski, Christina

    2014-01-01

    Genomic-based studies of disease now involve diverse types of data collected on large groups of patients. A major challenge facing statistical scientists is how best to combine the data, extract important features, and comprehensively characterize the ways in which they affect an individual's disease course and likelihood of response to treatment. We have developed a survival-supervised latent Dirichlet allocation (survLDA) modeling framework to address these challenges. Latent Dirichlet allocation (LDA) models have proven extremely effective at identifying themes common across large collections of text, but applications to genomics have been limited. Our framework extends LDA to the genome by considering each patient as a "document" with "text" detailing his/her clinical events and genomic state. We then further extend the framework to allow for supervision by a time-to-event response. The model enables the efficient identification of collections of clinical and genomic features that co-occur within patient subgroups, and then characterizes each patient by those features. An application of survLDA to The Cancer Genome Atlas ovarian project identifies informative patient subgroups showing differential response to treatment, and validation in an independent cohort demonstrates the potential for patient-specific inference.

  8. Viral phylogenomics using an alignment-free method: A three-step approach to determine optimal length of k-mer

    DOE PAGES

    Zhang, Qian; Jun, Se-Ran; Leuze, Michael; ...

    2017-01-19

    The development of rapid, economical genome sequencing has shed new light on the classification of viruses. As of October 2016, the National Center for Biotechnology Information (NCBI) database contained >2 million viral genome sequences and a reference set of ~4000 viral genome sequences that cover a wide range of known viral families. Whole-genome sequences can be used to improve viral classification and provide insight into the viral tree of life. However, due to the lack of evolutionary conservation amongst diverse viruses, it is not feasible to build a viral tree of life using traditional phylogenetic methods based on conserved proteins. In this study, we used an alignment-free method that uses k-mers as genomic features for a large-scale comparison of complete viral genomes available in RefSeq. To determine the optimal feature length, k (an essential step in constructing a meaningful dendrogram), we designed a comprehensive strategy that combines three approaches: (1) cumulative relative entropy, (2) average number of common features among genomes, and (3) the Shannon diversity index. This strategy was used to determine k for all 3,905 complete viral genomes in RefSeq. Lastly, the resulting dendrogram shows consistency with the viral taxonomy of the ICTV and the Baltimore classification of viruses.

  9. Viral phylogenomics using an alignment-free method: A three-step approach to determine optimal length of k-mer

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Zhang, Qian; Jun, Se-Ran; Leuze, Michael

    The development of rapid, economical genome sequencing has shed new light on the classification of viruses. As of October 2016, the National Center for Biotechnology Information (NCBI) database contained >2 million viral genome sequences and a reference set of ~4000 viral genome sequences that cover a wide range of known viral families. Whole-genome sequences can be used to improve viral classification and provide insight into the viral tree of life. However, due to the lack of evolutionary conservation amongst diverse viruses, it is not feasible to build a viral tree of life using traditional phylogenetic methods based on conserved proteins. In this study, we used an alignment-free method that uses k-mers as genomic features for a large-scale comparison of complete viral genomes available in RefSeq. To determine the optimal feature length, k (an essential step in constructing a meaningful dendrogram), we designed a comprehensive strategy that combines three approaches: (1) cumulative relative entropy, (2) average number of common features among genomes, and (3) the Shannon diversity index. This strategy was used to determine k for all 3,905 complete viral genomes in RefSeq. Lastly, the resulting dendrogram shows consistency with the viral taxonomy of the ICTV and the Baltimore classification of viruses.

  10. GBshape: a genome browser database for DNA shape annotations.

    PubMed

    Chiu, Tsu-Pei; Yang, Lin; Zhou, Tianyin; Main, Bradley J; Parker, Stephen C J; Nuzhdin, Sergey V; Tullius, Thomas D; Rohs, Remo

    2015-01-01

    Many regulatory mechanisms require a high degree of specificity in protein-DNA binding. Nucleotide sequence does not provide an answer to the question of why a protein binds only to a small subset of the many putative binding sites in the genome that share the same core motif. Whereas higher-order effects, such as chromatin accessibility, cooperativity and cofactors, have been described, DNA shape recently gained attention as another feature that fine-tunes the DNA binding specificities of some transcription factor families. Our Genome Browser for DNA shape annotations (GBshape; freely available at http://rohslab.cmb.usc.edu/GBshape/) provides minor groove width, propeller twist, roll, helix twist and hydroxyl radical cleavage predictions for the entire genomes of 94 organisms. Additional genomes can easily be added using the GBshape framework. GBshape can be used to visualize DNA shape annotations qualitatively in a genome browser track format, and to download quantitative values of DNA shape features as a function of genomic position at nucleotide resolution. As biological applications, we illustrate the periodicity of DNA shape features that are present in nucleosome-occupied sequences from human, fly and worm, and we demonstrate structural similarities between transcription start sites in the genomes of four Drosophila species. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.

  11. Viral Phylogenomics Using an Alignment-Free Method: A Three-Step Approach to Determine Optimal Length of k-mer

    PubMed Central

    Zhang, Qian; Jun, Se-Ran; Leuze, Michael; Ussery, David; Nookaew, Intawat

    2017-01-01

    The development of rapid, economical genome sequencing has shed new light on the classification of viruses. As of October 2016, the National Center for Biotechnology Information (NCBI) database contained >2 million viral genome sequences and a reference set of ~4000 viral genome sequences that cover a wide range of known viral families. Whole-genome sequences can be used to improve viral classification and provide insight into the viral “tree of life”. However, due to the lack of evolutionary conservation amongst diverse viruses, it is not feasible to build a viral tree of life using traditional phylogenetic methods based on conserved proteins. In this study, we used an alignment-free method that uses k-mers as genomic features for a large-scale comparison of complete viral genomes available in RefSeq. To determine the optimal feature length, k (an essential step in constructing a meaningful dendrogram), we designed a comprehensive strategy that combines three approaches: (1) cumulative relative entropy, (2) average number of common features among genomes, and (3) the Shannon diversity index. This strategy was used to determine k for all 3,905 complete viral genomes in RefSeq. The resulting dendrogram shows consistency with the viral taxonomy of the ICTV and the Baltimore classification of viruses. PMID:28102365
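
    To make the idea of k-mers as genomic features concrete, the sketch below counts overlapping k-mers in a sequence, computes the Shannon diversity index of the resulting counts, and counts the k-mers shared by two sequences. It is an illustrative sketch only: the function names and toy sequences are hypothetical, the cumulative relative entropy criterion is not implemented, and nothing here reproduces the authors' procedure for choosing k.

```python
import math
from collections import Counter

def kmer_counts(sequence, k):
    """Count every overlapping k-mer in a sequence."""
    seq = sequence.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

def shannon_diversity(counts):
    """Shannon index H = -sum(p * ln p) over the k-mer frequencies."""
    total = sum(counts.values())
    return -sum((n / total) * math.log(n / total) for n in counts.values())

def common_features(counts_a, counts_b):
    """Number of distinct k-mers present in both sequences."""
    return len(counts_a.keys() & counts_b.keys())

# Toy sequences (hypothetical, far shorter than any real viral genome).
genome_a = "ACGTACGTACGGATCC"
genome_b = "ACGTTTGCAAGGATCC"

for k in range(1, 6):
    a, b = kmer_counts(genome_a, k), kmer_counts(genome_b, k)
    print(k, round(shannon_diversity(a), 3), round(shannon_diversity(b), 3),
          common_features(a, b))
```

    Scanning k in this way shows the trade-off the abstract alludes to: short k-mers are shared by almost all genomes and carry little signal, while long k-mers become nearly unique to each genome.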

  12. A multi-scale analysis of bull sperm methylome revealed both species peculiarities and conserved tissue-specific features.

    PubMed

    Perrier, Jean-Philippe; Sellem, Eli; Prézelin, Audrey; Gasselin, Maxime; Jouneau, Luc; Piumi, François; Al Adhami, Hala; Weber, Michaël; Fritz, Sébastien; Boichard, Didier; Le Danvic, Chrystelle; Schibler, Laurent; Jammes, Hélène; Kiefer, Hélène

    2018-05-29

    Spermatozoa have a remarkable epigenome in line with their degree of specialization, their unique nature and different requirements for successful fertilization. Accordingly, perturbations in the establishment of DNA methylation patterns during male germ cell differentiation have been associated with infertility in several species. While bull semen is widely used in artificial insemination, the literature describing DNA methylation in bull spermatozoa is still scarce. The purpose of this study was therefore to characterize the bull sperm methylome relative to both bovine somatic cells and the sperm of other mammals through a multiscale analysis. The quantification of DNA methylation at CCGG sites using luminometric methylation assay (LUMA) highlighted the undermethylation of bull sperm compared to the sperm of rams, stallions, mice, goats and men. Total blood cells displayed a similarly high level of methylation in bulls and rams, suggesting that undermethylation of the bovine genome was specific to sperm. Annotation of CCGG sites in different species revealed no striking bias in the distribution of genome features targeted by LUMA that could explain undermethylation of bull sperm. To map DNA methylation at a genome-wide scale, bull sperm was compared with bovine liver, fibroblasts and monocytes using reduced representation bisulfite sequencing (RRBS) and immunoprecipitation of methylated DNA followed by microarray hybridization (MeDIP-chip). These two methods exhibited differences in terms of genome coverage, and consistently, two independent sets of sequences differentially methylated in sperm and somatic cells were identified for RRBS and MeDIP-chip. Remarkably, in the two sets most of the differentially methylated sequences were hypomethylated in sperm. In agreement with previous studies in other species, the sequences that were specifically hypomethylated in bull sperm targeted processes relevant to the germline differentiation program (piRNA metabolism, meiosis, spermatogenesis) and sperm functions (cell adhesion, fertilization), as well as satellites and rDNA repeats. These results highlight the undermethylation of bull spermatozoa when compared with both bovine somatic cells and the sperm of other mammals, and raise questions regarding the dynamics of DNA methylation in bovine male germline. Whether sperm undermethylation has potential interactions with structural variation in the cattle genome may deserve further attention.

  13. The Essential Genome of Escherichia coli K-12.

    PubMed

    Goodall, Emily C A; Robinson, Ashley; Johnston, Iain G; Jabbari, Sara; Turner, Keith A; Cunningham, Adam F; Lund, Peter A; Cole, Jeffrey A; Henderson, Ian R

    2018-02-20

    Transposon-directed insertion site sequencing (TraDIS) is a high-throughput method coupling transposon mutagenesis with short-fragment DNA sequencing. It is commonly used to identify essential genes. Single gene deletion libraries are considered the gold standard for identifying essential genes. Currently, the TraDIS method has not been benchmarked against such libraries, and therefore, it remains unclear whether the two methodologies are comparable. To address this, a high-density transposon library was constructed in Escherichia coli K-12. Essential genes predicted from sequencing of this library were compared to existing essential gene databases. To decrease false-positive identification of essential genes, statistical data analysis included corrections for both gene length and genome length. Through this analysis, new essential genes and genes previously incorrectly designated essential were identified. We show that manual analysis of TraDIS data reveals novel features that would not have been detected by statistical analysis alone. Examples include short essential regions within genes, orientation-dependent effects, and fine-resolution identification of genome and protein features. Recognition of these insertion profiles in transposon mutagenesis data sets will assist genome annotation of less well characterized genomes and provides new insights into bacterial physiology and biochemistry. IMPORTANCE Incentives to define lists of genes that are essential for bacterial survival include the identification of potential targets for antibacterial drug development, genes required for rapid growth for exploitation in biotechnology, and discovery of new biochemical pathways. To identify essential genes in Escherichia coli, we constructed a transposon mutant library of unprecedented density. Initial automated analysis of the resulting data revealed many discrepancies compared to the literature. We now report more extensive statistical analysis supported by both literature searches and detailed inspection of high-density TraDIS sequencing data for each putative essential gene for the E. coli model laboratory organism. This paper is important because it provides a better understanding of the essential genes of E. coli, reveals the limitations of relying on automated analysis alone, and provides a new standard for the analysis of TraDIS data. Copyright © 2018 Goodall et al.

  14. AbsIDconvert: An absolute approach for converting genetic identifiers at different granularities

    PubMed Central

    2012-01-01

    Background High-throughput molecular biology techniques yield vast amounts of data, often by detecting small portions of ribonucleotides corresponding to specific identifiers. Existing bioinformatic methodologies categorize and compare these elements using inferred descriptive annotation given this sequence information irrespective of the fact that it may not be representative of the identifier as a whole. Results All annotations, no matter the granularity, can be aligned to genomic sequences and therefore annotated by genomic intervals. We have developed AbsIDconvert, a methodology for converting between genomic identifiers by first mapping them onto a common universal coordinate system using an interval tree which is subsequently queried for overlapping identifiers. AbsIDconvert has many potential uses, including gene identifier conversion, identification of features within a genomic region, and cross-species comparisons. The utility is demonstrated in three case studies: 1) comparative genomic study mapping Plasmodium gene sequences to corresponding human and mosquito transcriptional regions; 2) cross-species study of Incyte clone sequences; and 3) analysis of human Ensembl transcripts mapped by Affymetrix® and Agilent microarray probes. AbsIDconvert currently supports ID conversion of 53 species for a given list of input identifiers, genomic sequence, or genome intervals. Conclusion AbsIDconvert provides an efficient and reliable mechanism for conversion between identifier domains of interest. The flexibility of this tool allows for custom definition of identifier domains contingent upon the availability and determination of a genomic mapping interval. As the genomes and the sequences for genetic elements are further refined, this tool will become increasingly useful and accurate. AbsIDconvert is freely available as a web application or downloadable as a virtual machine at: http://bioinformatics.louisville.edu/abid/. PMID:22967011
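
    The core idea of converting identifiers through shared genomic coordinates can be sketched as an overlap query. The example below is an assumption-laden illustration, not AbsIDconvert's code: it replaces the interval tree with a plain linear scan for brevity, treats coordinates as closed intervals, and uses made-up identifiers (`probe_1`, `tx_A`, and so on).

```python
from collections import namedtuple

# A feature placed on a common coordinate system (closed interval, 1-based).
Interval = namedtuple("Interval", "chrom start end identifier")

def overlaps(a, b):
    """True if both intervals lie on the same chromosome and share a base."""
    return a.chrom == b.chrom and a.start <= b.end and b.start <= a.end

def convert_ids(query, targets):
    """Return identifiers of all target features overlapping the query.

    Linear scan for clarity; an interval tree would answer the same
    query faster on large annotation sets.
    """
    return [t.identifier for t in targets if overlaps(query, t)]

# Hypothetical data: one microarray probe and three transcripts.
probe = Interval("chr1", 150, 175, "probe_1")
transcripts = [
    Interval("chr1", 100, 200, "tx_A"),
    Interval("chr1", 180, 300, "tx_B"),
    Interval("chr2", 150, 175, "tx_C"),
]

print(convert_ids(probe, transcripts))
```

    Because the match depends only on coordinates, the same query works for any pair of identifier domains that can be mapped to the same genome assembly.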

  15. Genomic characterization of a Helicobacter pylori isolate from a patient with gastric cancer in China

    PubMed Central

    2014-01-01

    Background Helicobacter pylori is well known for its relationship with the occurrence of several severe gastric diseases. The mechanisms of pathogenesis triggered by H. pylori are less well known. In this study, we report the genome sequence and genomic characterizations of H. pylori strain HLJ039 that was isolated from a patient with gastric cancer in the Chinese province of Heilongjiang, where there is a high incidence of gastric cancer. To investigate potential genomic features that may be involved in pathogenesis of carcinoma, the genome was compared to three previously sequenced genomes in this area. Result We obtained 42 contigs with a total length of 1,611,192 bp and predicted 1,687 coding sequences. Compared to strains isolated from gastritis and ulcers in this area, 10 different regions were identified as being unique for HLJ039; they mainly encoded type II restriction-modification enzyme, type II m6A methylase, DNA-cytosine methyltransferase, DNA methylase, and hypothetical proteins. A unique 547-bp fragment sharing 93% identity with a hypothetical protein of Helicobacter cinaedi ATCC BAA-847 was not present in any other previous H. pylori strains. Phylogenetic analysis based on core genome single nucleotide polymorphisms shows that HLJ039 is defined as hspEAsia subgroup, which belongs to the hpEastAsia group. Conclusion DNA methylations, variations of the genomic regions involved in restriction and modification systems, are the “hot” regions that may be related to the mechanism of H. pylori-induced gastric cancer. The genome sequence will provide useful information for the deep mining of potential mechanisms related to East Asian gastric cancer. PMID:24565107

  16. Virulence factors encoded by Legionella longbeachae identified on the basis of the genome sequence analysis of clinical isolate D-4968.

    PubMed

    Kozak, Natalia A; Buss, Meghan; Lucas, Claressa E; Frace, Michael; Govil, Dhwani; Travis, Tatiana; Olsen-Rasmussen, Melissa; Benson, Robert F; Fields, Barry S

    2010-02-01

    Legionella longbeachae causes most cases of legionellosis in Australia and may be underreported worldwide due to the lack of L. longbeachae-specific diagnostic tests. L. longbeachae displays distinctive differences in intracellular trafficking, caspase 1 activation, and infection in mouse models compared to Legionella pneumophila, yet these two species have indistinguishable clinical presentations in humans. Unlike other legionellae, which inhabit freshwater systems, L. longbeachae is found predominantly in moist soil. In this study, we sequenced and annotated the genome of an L. longbeachae clinical isolate from Oregon, isolate D-4968, and compared it to the previously published genomes of L. pneumophila. The results revealed that the D-4968 genome is larger than the L. pneumophila genome and has a gene order that is different from that of the L. pneumophila genome. Genes encoding structural components of type II, type IV Lvh, and type IV Icm/Dot secretion systems are conserved. In contrast, only 42/140 homologs of genes encoding L. pneumophila Icm/Dot substrates have been found in the D-4968 genome. L. longbeachae encodes numerous proteins with eukaryotic motifs and eukaryote-like proteins unique to this species, including 16 ankyrin repeat-containing proteins and a novel U-box protein. We predict that these proteins are secreted by the L. longbeachae Icm/Dot secretion system. In contrast to the L. pneumophila genome, the L. longbeachae D-4968 genome does not contain flagellar biosynthesis genes, yet it contains a chemotaxis operon. The lack of a flagellum explains the failure of L. longbeachae to activate caspase 1 and trigger pyroptosis in murine macrophages. These unique features of L. longbeachae may reflect adaptation of this species to life in soil.

  17. The Genome and Development-Dependent Transcriptomes of Pyronema confluens: A Window into Fungal Evolution

    PubMed Central

    Traeger, Stefanie; Altegoer, Florian; Freitag, Michael; Gabaldon, Toni; Kempken, Frank; Kumar, Abhishek; Marcet-Houben, Marina; Pöggeler, Stefanie; Stajich, Jason E.; Nowrousian, Minou

    2013-01-01

    Fungi are a large group of eukaryotes found in nearly all ecosystems. More than 250 fungal genomes have already been sequenced, greatly improving our understanding of fungal evolution, physiology, and development. However, for the Pezizomycetes, an early-diverging lineage of filamentous ascomycetes, there is so far only one genome available, namely that of the black truffle, Tuber melanosporum, a mycorrhizal species with unusual subterranean fruiting bodies. To help close the sequence gap among basal filamentous ascomycetes, and to allow conclusions about the evolution of fungal development, we sequenced the genome and assayed transcriptomes during development of Pyronema confluens, a saprobic Pezizomycete with a typical apothecium as fruiting body. With a size of 50 Mb and ∼13,400 protein-coding genes, the genome is more characteristic of higher filamentous ascomycetes than the large, repeat-rich truffle genome; however, some typical features are different in the P. confluens lineage, e.g. the genomic environment of the mating type genes that is conserved in higher filamentous ascomycetes, but only partly conserved in P. confluens. On the other hand, P. confluens has a full complement of fungal photoreceptors, and expression studies indicate that light perception might be similar to distantly related ascomycetes and, thus, represent a basic feature of filamentous ascomycetes. Analysis of spliced RNA-seq sequence reads allowed the detection of natural antisense transcripts for 281 genes. The P. confluens genome contains an unusually high number of predicted orphan genes, many of which are upregulated during sexual development, consistent with the idea of rapid evolution of sex-associated genes. Comparative transcriptomics identified the transcription factor gene pro44 that is upregulated during development in P. confluens and the Sordariomycete Sordaria macrospora. The P. confluens pro44 gene (PCON_06721) was used to complement the S. macrospora pro44 deletion mutant, showing functional conservation of this developmental regulator. PMID:24068976

  18. Microevolution Analysis of Bacillus coahuilensis Unveils Differences in Phosphorus Acquisition Strategies and Their Regulation.

    PubMed

    Gómez-Lunar, Zulema; Hernández-González, Ismael; Rodríguez-Torres, María-Dolores; Souza, Valeria; Olmedo-Álvarez, Gabriela

    2016-01-01

    Bacterial genomes undergo numerous events of gene losses and gains that generate genome variability among strains of the same species (microevolution). Our aim was to compare the genomes and relevant phenotypes of three Bacillus coahuilensis strains from two oligotrophic hydrological systems in the Cuatro Ciénegas Basin (México), to unveil the environmental challenges that this species copes with, and the microevolutionary differences in these genotypes. Since the strains were isolated from a low P environment, we placed emphasis on the search for different phosphorus acquisition strategies. The three B. coahuilensis strains exhibited similar numbers of coding DNA sequences, of which 82% (2,893) constituted the core genome, and 18% corresponded to accessory genes. Most of the genes in this last group were associated with mobile genetic elements (MGEs) or were annotated as hypothetical proteins. Ten percent of the pangenome consisted of strain-specific genes. Alignment of the three B. coahuilensis genomes indicated a high level of synteny and revealed the presence of several genomic islands. Unexpectedly, one of these islands contained genes that encode the 2-keto-3-deoxymannooctulosonic acid (Kdo) biosynthesis enzymes, a feature associated with cell walls of Gram-negative bacteria. Some microevolutionary changes were clearly associated with MGEs. Our analysis revealed inconsistencies between phenotype and genotype, which we suggest result from the impossibility of mapping regulatory features in genome analysis. Experimental results revealed variability in the types and numbers of auxotrophies between the strains that could not consistently be explained by in silico metabolic models. Several intraspecific differences in preferences for carbohydrate and phosphorus utilization were observed. Regarding phosphorus recycling, scavenging, and storage, variations were found between the three genomes. The three strains exhibited differences regarding alkaline phosphatase that revealed that in addition to gene gain and loss, regulatory adjustment of gene expression has also contributed to the intraspecific diversity of B. coahuilensis.

  19. Complete Genome Sequence of the Broad-Host-Range Vibriophage KVP40: Comparative Genomics of a T4-Related Bacteriophage

    PubMed Central

    Miller, Eric S.; Heidelberg, John F.; Eisen, Jonathan A.; Nelson, William C.; Durkin, A. Scott; Ciecko, Ann; Feldblyum, Tamara V.; White, Owen; Paulsen, Ian T.; Nierman, William C.; Lee, Jong; Szczypinski, Bridget; Fraser, Claire M.

    2003-01-01

    The complete genome sequence of the T4-like, broad-host-range vibriophage KVP40 has been determined. The genome sequence is 244,835 bp, with an overall G+C content of 42.6%. It encodes 386 putative protein-encoding open reading frames (CDSs), 30 tRNAs, 33 T4-like late promoters, and 57 potential rho-independent terminators. Overall, 92.1% of the KVP40 genome is coding, with an average CDS size of 587 bp. While 65% of the CDSs were unique to KVP40 and had no known function, the genome sequence and organization show specific regions of extensive conservation with phage T4. At least 99 KVP40 CDSs have homologs in the T4 genome (Blast alignments of 45 to 68% amino acid similarity). The shared CDSs represent 36% of all T4 CDSs but only 26% of those from KVP40. There is extensive representation of the DNA replication, recombination, and repair enzymes as well as the viral capsid and tail structural genes. KVP40 lacks several T4 enzymes involved in host DNA degradation, appears not to synthesize the modified cytosine (hydroxymethyl glucose) present in T-even phages, and lacks group I introns. KVP40 likely utilizes the T4-type sigma-55 late transcription apparatus, but features of early- or middle-mode transcription were not identified. There are 26 CDSs that have no viral homolog, and many did not necessarily originate from Vibrio spp., suggesting an even broader host range for KVP40. From these latter CDSs, an NAD salvage pathway was inferred that appears to be unique among bacteriophages. Features of the KVP40 genome that distinguish it from T4 are presented, as well as those, such as the replication and virion gene clusters, that are substantially conserved. PMID:12923095

  20. Structural Variation Shapes the Landscape of Recombination in Mouse

    PubMed Central

    Morgan, Andrew P.; Gatti, Daniel M.; Najarian, Maya L.; Keane, Thomas M.; Galante, Raymond J.; Pack, Allan I.; Mott, Richard; Churchill, Gary A.; de Villena, Fernando Pardo-Manuel

    2017-01-01

    Meiotic recombination is an essential feature of sexual reproduction that ensures faithful segregation of chromosomes and redistributes genetic variants in populations. Multiparent populations such as the Diversity Outbred (DO) mouse stock accumulate large numbers of crossover (CO) events between founder haplotypes, and thus present a unique opportunity to study the role of genetic variation in shaping the recombination landscape. We obtained high-density genotype data from 6886 DO mice, and localized 2.2 million CO events to intervals with a median size of 28 kb. The resulting sex-averaged genetic map of the DO population is highly concordant with large-scale (order 10 Mb) features of previously reported genetic maps for mouse. To examine fine-scale (order 10 kb) patterns of recombination in the DO, we overlaid putative recombination hotspots onto our CO intervals. We found that CO intervals are enriched in hotspots compared to the genomic background. However, as many as 26% of CO intervals do not overlap any putative hotspots, suggesting that our understanding of hotspots is incomplete. We also identified coldspots encompassing 329 Mb, or 12% of observable genome, in which there is little or no recombination. In contrast to hotspots, which are a few kilobases in size, and widely scattered throughout the genome, coldspots have a median size of 2.1 Mb and are spatially clustered. Coldspots are strongly associated with copy-number variant (CNV) regions, especially multi-allelic clusters, identified from whole-genome sequencing of 228 DO mice. Genes in these regions have reduced expression, and epigenetic features of closed chromatin in male germ cells, which suggests that CNVs may repress recombination by altering chromatin structure in meiosis. Our findings demonstrate how multiparent populations, by bridging the gap between large-scale and fine-scale genetic mapping, can reveal new features of the recombination landscape. PMID:28592499

  1. Structural Variation Shapes the Landscape of Recombination in Mouse.

    PubMed

    Morgan, Andrew P; Gatti, Daniel M; Najarian, Maya L; Keane, Thomas M; Galante, Raymond J; Pack, Allan I; Mott, Richard; Churchill, Gary A; de Villena, Fernando Pardo-Manuel

    2017-06-01

    Meiotic recombination is an essential feature of sexual reproduction that ensures faithful segregation of chromosomes and redistributes genetic variants in populations. Multiparent populations such as the Diversity Outbred (DO) mouse stock accumulate large numbers of crossover (CO) events between founder haplotypes, and thus present a unique opportunity to study the role of genetic variation in shaping the recombination landscape. We obtained high-density genotype data from 6886 DO mice, and localized 2.2 million CO events to intervals with a median size of 28 kb. The resulting sex-averaged genetic map of the DO population is highly concordant with large-scale (order 10 Mb) features of previously reported genetic maps for mouse. To examine fine-scale (order 10 kb) patterns of recombination in the DO, we overlaid putative recombination hotspots onto our CO intervals. We found that CO intervals are enriched in hotspots compared to the genomic background. However, as many as 26% of CO intervals do not overlap any putative hotspots, suggesting that our understanding of hotspots is incomplete. We also identified coldspots encompassing 329 Mb, or 12% of observable genome, in which there is little or no recombination. In contrast to hotspots, which are a few kilobases in size, and widely scattered throughout the genome, coldspots have a median size of 2.1 Mb and are spatially clustered. Coldspots are strongly associated with copy-number variant (CNV) regions, especially multi-allelic clusters, identified from whole-genome sequencing of 228 DO mice. Genes in these regions have reduced expression, and epigenetic features of closed chromatin in male germ cells, which suggests that CNVs may repress recombination by altering chromatin structure in meiosis. Our findings demonstrate how multiparent populations, by bridging the gap between large-scale and fine-scale genetic mapping, can reveal new features of the recombination landscape. Copyright © 2017 by the Genetics Society of America.
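
    The overlay of hotspots onto CO intervals described in this abstract can be illustrated with a small interval-overlap calculation. The following is only a minimal sketch, not the authors' pipeline: the function name `fraction_without_hotspot`, the half-open `(start, end)` coordinate convention, the single-chromosome input, and the assumption that hotspots do not overlap one another are all hypothetical choices made for illustration.

```python
from bisect import bisect_right


def fraction_without_hotspot(co_intervals, hotspots):
    """Return the fraction of CO intervals that overlap no hotspot.

    Both arguments are lists of half-open (start, end) intervals on a
    single chromosome. Hotspots are assumed (for this sketch) to be
    non-overlapping, so sorting by start also sorts them by end.
    """
    if not co_intervals:
        raise ValueError("need at least one CO interval")
    hotspots = sorted(hotspots)
    ends = [end for _, end in hotspots]
    missing = 0
    for start, end in co_intervals:
        # Index of the first hotspot that ends after this interval starts.
        i = bisect_right(ends, start)
        # No such hotspot, or it begins at/after the interval end: no overlap.
        if i == len(hotspots) or hotspots[i][0] >= end:
            missing += 1
    return missing / len(co_intervals)


# Toy coordinates, invented purely for illustration.
example_co = [(0, 10), (20, 30), (40, 50), (60, 70)]
example_hotspots = [(5, 6), (29, 31), (50, 55)]
example_fraction = fraction_without_hotspot(example_co, example_hotspots)
```

    On real data one would run this per chromosome and pool the counts; the binary search keeps the cost near O(n log m) for n intervals and m hotspots.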

  2. Bacteriophages of Gordonia spp. Display a Spectrum of Diversity and Genetic Relationships.

    PubMed

    Pope, Welkin H; Mavrich, Travis N; Garlena, Rebecca A; Guerrero-Bustamante, Carlos A; Jacobs-Sera, Deborah; Montgomery, Matthew T; Russell, Daniel A; Warner, Marcie H; Hatfull, Graham F

    2017-08-15

    The global bacteriophage population is large, dynamic, old, and highly diverse genetically. Many phages are tailed and contain double-stranded DNA, but these remain poorly characterized genomically. A collection of over 1,000 phages infecting Mycobacterium smegmatis reveals the diversity of phages of a common bacterial host, but their relationships to phages of phylogenetically proximal hosts are not known. Comparative sequence analysis of 79 phages isolated on Gordonia shows these also to be diverse and that the phages can be grouped into 14 clusters of related genomes, with an additional 14 phages that are "singletons" with no closely related genomes. One group of six phages is closely related to Cluster A mycobacteriophages, but the other Gordonia phages are distant relatives and share only 10% of their genes with the mycobacteriophages. The Gordonia phage genomes vary in genome length (17.1 to 103.4 kb), percentage of GC content (47 to 68.8%), and genome architecture and contain a variety of features not seen in other phage genomes. Like the mycobacteriophages, the highly mosaic Gordonia phages demonstrate a spectrum of genetic relationships. We show this is a general property of bacteriophages and suggest that any barriers to genetic exchange are soft and readily violable. IMPORTANCE Despite the numerical dominance of bacteriophages in the biosphere, there is a dearth of complete genomic sequences. Current genomic information reveals that phages are highly diverse genomically and have mosaic architectures formed by extensive horizontal genetic exchange. Comparative analysis of 79 phages of Gordonia shows them to not only be highly diverse, but to present a spectrum of relatedness. Most are distantly related to phages of the phylogenetically proximal host Mycobacterium smegmatis, although one group of Gordonia phages is more closely related to mycobacteriophages than to the other Gordonia phages. Phage genome sequence space remains largely unexplored, but further isolation and genomic comparison of phages targeted at related groups of hosts promise to reveal pathways of bacteriophage evolution. Copyright © 2017 Pope et al.

  3. Flavivirus and Filovirus EvoPrinters: New alignment tools for the comparative analysis of viral evolution.

    PubMed

    Brody, Thomas; Yavatkar, Amarendra S; Park, Dong Sun; Kuzin, Alexander; Ross, Jermaine; Odenwald, Ward F

    2017-06-01

    Flavivirus and Filovirus infections are serious epidemic threats to human populations. Multi-genome comparative analysis of these evolving pathogens affords a view of their essential, conserved sequence elements as well as progressive evolutionary changes. While phylogenetic analysis has yielded important insights, the growing number of available genomic sequences makes comparisons between hundreds of viral strains challenging. We report here a new approach for the comparative analysis of these hemorrhagic fever viruses that can superimpose an unlimited number of one-on-one alignments to identify important features within genomes of interest. We have adapted EvoPrinter alignment algorithms for the rapid comparative analysis of Flavivirus or Filovirus sequences including Zika and Ebola strains. The user can input a full genome or partial viral sequence and then view either individual comparisons or generate color-coded readouts that superimpose hundreds of one-on-one alignments to identify unique or shared identity SNPs that reveal ancestral relationships between strains. The user can also opt to select a database genome in order to access a library of pre-aligned genomes of either 1,094 Flaviviruses or 460 Filoviruses for rapid comparative analysis with all database entries or a select subset. 
Using EvoPrinter search and alignment programs, we show the following: 1) superimposing alignment data from many related strains identifies lineage identity SNPs, which enable the assessment of sublineage complexity within viral outbreaks; 2) whole-genome SNP profile screens uncover novel Dengue2 and Zika recombinant strains and their parental lineages; 3) differential SNP profiling identifies host cell A-to-I hyper-editing within Ebola and Marburg viruses, and 4) hundreds of superimposed one-on-one Ebola genome alignments highlight ultra-conserved regulatory sequences, invariant amino acid codons and evolutionarily variable protein-encoding domains within a single genome. EvoPrinter allows for the assessment of lineage complexity within Flavivirus or Filovirus outbreaks, identification of recombinant strains, highlights sequences that have undergone host cell A-to-I editing, and identifies unique input and database SNPs within highly conserved sequences. EvoPrinter's ability to superimpose alignment data from hundreds of strains onto a single genome has allowed us to identify unique Zika virus sublineages that are currently spreading in South, Central and North America, the Caribbean, and in China. This new set of integrated alignment programs should serve as a useful addition to existing tools for the comparative analysis of these viruses.
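
    The idea of superimposing many one-on-one alignments onto a single genome can be sketched in a few lines. This is not EvoPrinter's algorithm, which the abstract does not specify; it is a simplified illustration that assumes each strain has already been aligned to the reference without gaps, so that all sequences have equal length. The function name `superimpose` and the upper/lower-case readout are hypothetical.

```python
def superimpose(reference, aligned_strains):
    """Mark reference positions conserved across every pairwise comparison.

    `reference` and each entry of `aligned_strains` are equal-length
    strings (an assumption of this sketch: gap-free alignments).
    Conserved bases are returned in upper case, variable ones in lower case.
    """
    conserved = [True] * len(reference)
    for strain in aligned_strains:
        if len(strain) != len(reference):
            raise ValueError("all sequences must have the reference length")
        for i, (ref_base, strain_base) in enumerate(zip(reference, strain)):
            if ref_base.upper() != strain_base.upper():
                conserved[i] = False  # at least one strain differs here
    return "".join(
        base.upper() if keep else base.lower()
        for base, keep in zip(reference, conserved)
    )


# Toy sequences, invented for illustration only.
readout = superimpose("ACGTAC", ["ACGTTC", "ACCTAC"])
```

    Positions left in upper case are those where no compared strain differs from the reference; a position that is lower case in the superimposed readout but matches in most individual comparisons would correspond to a strain- or lineage-specific SNP.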

  4. Comparative full genome sequence analysis of columbid herpesvirus-1 and falconid herpesvirus-1

    USDA-ARS's Scientific Manuscript database

    Columbid herpesvirus type 1 (CoHV-1) is an alphaherpesvirus in the genus Mardivirus that infects pigeons and causes fatal disseminated infections in birds of prey: owls, falcons and hawks. A common feature of captive raptors that have succumbed to CoHV-1 infection is that they all have been fed pige...

  5. Comparative analysis reveals genomic features of stress-induced transcriptional readthrough

    PubMed Central

    Vilborg, Anna; Sabath, Niv; Wiesel, Yuval; Nathans, Jenny; Levy-Adam, Flonia; Yario, Therese A.; Steitz, Joan A.; Shalgi, Reut

    2017-01-01

    Transcription is a highly regulated process, and stress-induced changes in gene transcription have been shown to play a major role in stress responses and adaptation. Genome-wide studies reveal prevalent transcription beyond known protein-coding gene loci, generating a variety of RNA classes, most of unknown function. One such class, termed downstream of gene-containing transcripts (DoGs), was reported to result from transcriptional readthrough upon osmotic stress in human cells. However, how widespread the readthrough phenomenon is, and what its causes and consequences are, remain elusive. Here we present a genome-wide mapping of transcriptional readthrough, using nuclear RNA-Seq, comparing heat shock, osmotic stress, and oxidative stress in NIH 3T3 mouse fibroblast cells. We observe massive induction of transcriptional readthrough, both in levels and length, under all stress conditions, with significant, yet not complete, overlap of readthrough-induced loci between different conditions. Importantly, our analyses suggest that stress-induced transcriptional readthrough is not a random failure process, but is rather differentially induced across different conditions. We explore potential regulators and find a role for HSF1 in the induction of a subset of heat shock-induced readthrough transcripts. Analysis of public datasets detected increases in polymerase II occupancy in DoG regions after heat shock, supporting our findings. Interestingly, DoGs tend to be produced in the vicinity of neighboring genes, leading to a marked increase in their antisense-generating potential. Finally, we examine genomic features of readthrough transcription and observe a unique chromatin signature typical of DoG-producing regions, suggesting that readthrough transcription is associated with the maintenance of an open chromatin state. PMID:28928151

  6. Context based computational analysis and characterization of ARS consensus sequences (ACS) of Saccharomyces cerevisiae genome.

    PubMed

    Singh, Vinod Kumar; Krishnamachari, Annangarachari

    2016-09-01

    Genome-wide experimental studies in Saccharomyces cerevisiae reveal that an autonomous replicating sequence (ARS) requires an essential consensus sequence (ACS) for replication activity. Computational studies identified thousands of ACS-like patterns in the genome. However, only a few hundred of these sites act as replicating sites and the rest are considered dormant or evolving sites. In a bid to understand the sequence makeup of replication sites, a content and context-based analysis was performed on a set of replicating ACS sequences that bind to the origin-recognition complex (ORC), denoted ORC-ACS, and non-replicating ACS sequences (nrACS) that are not bound by ORC. In this study, DNA properties such as base composition, correlation, sequence-dependent thermodynamic and DNA structural profiles, and their positions have been considered for characterizing ORC-ACS and nrACS. Analysis reveals that ORC-ACS show marked differences in nucleotide composition and context features in their vicinity compared to nrACS. Interestingly, an A-rich motif was also discovered in ORC-ACS sequences within the nucleosome-free region. Profound changes in conformational features, such as DNA helical twist, inclination angle and stacking energy, were observed between ORC-ACS and nrACS. The distribution of ACS motifs in the non-coding segments shows that ORC-ACS are located farther from the adjacent gene start position than nrACS, thereby enabling an accessible environment for ORC proteins. Our attempt is novel in considering the contextual view of ACS and its flanking region along with nucleosome positioning in the S. cerevisiae genome and may be useful for any computational prediction scheme.

  7. Comparative Genomic Analysis of Mannheimia haemolytica from Bovine Sources

    PubMed Central

    Klima, Cassidy L.; Cook, Shaun R.; Zaheer, Rahat; Laing, Chad; Gannon, Vick P.; Xu, Yong; Rasmussen, Jay; Potter, Andrew; Hendrick, Steve; Alexander, Trevor W.; McAllister, Tim A.

    2016-01-01

    Bovine respiratory disease is a common health problem in beef production. The primary bacterial agent involved, Mannheimia haemolytica, is a target for antimicrobial therapy and at risk for associated antimicrobial resistance development. The role of M. haemolytica in pathogenesis is linked to serotype with serotypes 1 (S1) and 6 (S6) isolated from pneumonic lesions and serotype 2 (S2) found in the upper respiratory tract of healthy animals. Here, we sequenced the genomes of 11 strains of M. haemolytica, representing all three serotypes and performed comparative genomics analysis to identify genetic features that may contribute to pathogenesis. Possible virulence associated genes were identified within 14 distinct prophage, including a periplasmic chaperone, a lipoprotein, peptidoglycan glycosyltransferase and a stress response protein. Prophage content ranged from 2–8 per genome, but was higher in S1 and S6 strains. A type I-C CRISPR-Cas system was identified in each strain with spacer diversity and organization conserved among serotypes. The majority of spacers occur in S1 and S6 strains and originate from phage suggesting that serotypes 1 and 6 may be more resistant to phage predation. However, two spacers complementary to the host chromosome targeting a UDP-N-acetylglucosamine 2-epimerase and a glycosyl transferases group 1 gene are present in S1 and S6 strains only indicating these serotypes may employ CRISPR-Cas to regulate gene expression to avoid host immune responses or enhance adhesion during infection. Integrative conjugative elements are present in nine of the eleven genomes. Three of these harbor extensive multi-drug resistance cassettes encoding resistance against the majority of drugs used to combat infection in beef cattle, including macrolides and tetracyclines used in human medicine. 
The findings here identify key features that are likely contributing to serotype related pathogenesis and specific targets for vaccine design intended to reduce the dependency on antibiotics to treat respiratory infection in cattle. PMID:26926339

  8. Comparative Genomic Analysis of Mannheimia haemolytica from Bovine Sources.

    PubMed

    Klima, Cassidy L; Cook, Shaun R; Zaheer, Rahat; Laing, Chad; Gannon, Vick P; Xu, Yong; Rasmussen, Jay; Potter, Andrew; Hendrick, Steve; Alexander, Trevor W; McAllister, Tim A

    2016-01-01

    Bovine respiratory disease is a common health problem in beef production. The primary bacterial agent involved, Mannheimia haemolytica, is a target for antimicrobial therapy and at risk for associated antimicrobial resistance development. The role of M. haemolytica in pathogenesis is linked to serotype with serotypes 1 (S1) and 6 (S6) isolated from pneumonic lesions and serotype 2 (S2) found in the upper respiratory tract of healthy animals. Here, we sequenced the genomes of 11 strains of M. haemolytica, representing all three serotypes and performed comparative genomics analysis to identify genetic features that may contribute to pathogenesis. Possible virulence associated genes were identified within 14 distinct prophage, including a periplasmic chaperone, a lipoprotein, peptidoglycan glycosyltransferase and a stress response protein. Prophage content ranged from 2-8 per genome, but was higher in S1 and S6 strains. A type I-C CRISPR-Cas system was identified in each strain with spacer diversity and organization conserved among serotypes. The majority of spacers occur in S1 and S6 strains and originate from phage suggesting that serotypes 1 and 6 may be more resistant to phage predation. However, two spacers complementary to the host chromosome targeting a UDP-N-acetylglucosamine 2-epimerase and a glycosyl transferases group 1 gene are present in S1 and S6 strains only indicating these serotypes may employ CRISPR-Cas to regulate gene expression to avoid host immune responses or enhance adhesion during infection. Integrative conjugative elements are present in nine of the eleven genomes. Three of these harbor extensive multi-drug resistance cassettes encoding resistance against the majority of drugs used to combat infection in beef cattle, including macrolides and tetracyclines used in human medicine. 
The findings here identify key features that are likely contributing to serotype related pathogenesis and specific targets for vaccine design intended to reduce the dependency on antibiotics to treat respiratory infection in cattle.

  9. The Ditylenchus destructor genome provides new insights into the evolution of plant parasitic nematodes

    PubMed Central

    Zheng, Jinshui; Peng, Donghai; Chen, Ling; Liu, Hualin; Chen, Feng; Xu, Mengci; Ju, Shouyong; Ruan, Lifang

    2016-01-01

    Plant-parasitic nematodes were found in 4 of the 12 clades of phylum Nematoda. These nematodes in different clades may have originated independently from their free-living fungivorous ancestors. However, the exact evolutionary process of these parasites is unclear. Here, we sequenced the genome of a migratory plant nematode, Ditylenchus destructor. We performed comparative genomics among the free-living nematode Caenorhabditis elegans and all the plant nematodes with genome sequences available. We found that, compared with C. elegans, the core developmental control processes underwent heavy reduction, though most signal transduction pathways were conserved. We also found that D. destructor contained more homologs of the key genes in the above processes than the other plant nematodes. We suggest that Ditylenchus spp. may represent an intermediate evolutionary stage between free-living nematodes that feed on fungi and obligate plant-parasitic nematodes. Based on the facts that D. destructor can feed on fungi and has a relatively short life cycle, and that it has similar features to both C. elegans and sedentary plant-parasitic nematodes from clade 12, we propose it as a new model to study the biology, biocontrol of plant nematodes and the interaction between nematodes and plants. PMID:27466450

  10. Molecular profiling reveals frequent gain of MYCN and anaplasia-specific loss of 4q and 14q in Wilms tumor.

    PubMed

    Williams, Richard D; Al-Saadi, Reem; Natrajan, Rachael; Mackay, Alan; Chagtai, Tasnim; Little, Suzanne; Hing, Sandra N; Fenwick, Kerry; Ashworth, Alan; Grundy, Paul; Anderson, James R; Dome, Jeffrey S; Perlman, Elizabeth J; Jones, Chris; Pritchard-Jones, Kathy

    2011-12-01

    Anaplasia in Wilms tumor, a distinctive histology characterized by abnormal mitoses, is associated with poor patient outcome. While anaplastic tumors frequently harbour TP53 mutations, little is otherwise known about their molecular biology. We have used array comparative genomic hybridization (aCGH) and cDNA microarray expression profiling to compare anaplastic and favorable histology Wilms tumors to determine their common and differentiating features. In addition to changes on 17p, consistent with TP53 deletion, recurrent anaplasia-specific genomic loss and under-expression were noted in several other regions, most strikingly 4q and 14q. Further aberrations, including gain of 1q and loss of 16q were common to both histologies. Focal gain of MYCN, initially detected by high resolution aCGH profiling in 6/61 anaplastic samples, was confirmed in a significant proportion of both tumor types by a genomic quantitative PCR survey of over 400 tumors. Overall, these results are consistent with a model where anaplasia, rather than forming an entirely distinct molecular entity, arises from the general continuum of Wilms tumor by the acquisition of additional genomic changes at multiple loci. Copyright © 2011 Wiley Periodicals, Inc.

  11. Mitochondrial genome analysis of the predatory mite Phytoseiulus persimilis and a revisit of the Metaseiulus occidentalis mitochondrial genome.

    PubMed

    Dermauw, Wannes; Vanholme, Bartel; Tirry, Luc; Van Leeuwen, Thomas

    2010-04-01

    In this study we sequenced and analysed the complete mitochondrial (mt) genome of the Chilean predatory mite Phytoseiulus persimilis Athias-Henriot (Chelicerata: Acari: Mesostigmata: Phytoseiidae: Amblyseiinae). The 16 199 bp genome (79.8% AT) contains the standard set of 13 protein-coding and 24 RNA genes. Compared with the ancestral arthropod mtDNA pattern, the gene order is extremely reshuffled (35 genes changed position) and represents a novel arrangement within the arthropods. This is probably related to the presence of several large noncoding regions in the genome. In contrast with the mt genome of the closely related species Metaseiulus occidentalis (Phytoseiidae: Typhlodrominae) - which was reported to be unusually large (24 961 bp), to lack nad6 and nad3 protein-coding genes, and to contain 22 tRNAs without T-arms - the genome of P. persimilis has all the features of a standard metazoan mt genome. Consequently, we performed additional experiments on the M. occidentalis mt genome. Our preliminary restriction digests and Southern hybridization data revealed that this genome is smaller than previously reported. In addition, we cloned nad3 in M. occidentalis and positioned this gene between nad4L and 12S-rRNA on the mt genome. Finally, we report that at least 15 of the 22 tRNAs in the M. occidentalis mt genome can be folded into canonical cloverleaf structures similar to their counterparts in P. persimilis.

  12. Genomic profiling of dedifferentiated liposarcoma compared to matched well-differentiated liposarcoma reveals higher genomic complexity and a common origin

    PubMed Central

    Beird, Hannah C.; Wu, Chia-Chin; Ingram, Davis R.; Wang, Wei-Lien; Alimohamed, Asrar; Gumbs, Curtis; Little, Latasha; Song, Xingzhi; Feig, Barry W.; Roland, Christina L.; Zhang, Jianhua; Benjamin, Robert S.; Hwu, Patrick; Lazar, Alexander J.; Futreal, P. Andrew; Somaiah, Neeta

    2018-01-01

    Well-differentiated (WD) liposarcoma is a low-grade mesenchymal tumor with features of mature adipocytes and high propensity for local recurrence. Often, WD patients present with or later progress to a higher-grade nonlipogenic form known as dedifferentiated (DD) liposarcoma. These DD tumors behave more aggressively and can metastasize. Both WD and DD liposarcomas harbor neochromosomes formed from amplifications and rearrangements of Chr 12q that encode oncogenes (MDM2, CDK4, and YEATS2) and adipocytic differentiation factors (HMGA2 and CPM). However, genomic changes associated with progression from WD to DD have not been well-defined. Therefore, we selected patients with matched WD and DD tumors for extensive genomic profiling in order to understand their clonal relationships and to delineate any defining alterations for each entity. Exome and transcriptomic sequencing were performed for 17 patients with both WD and DD diagnoses. Somatic point and copy-number alterations were integrated with transcriptional analyses to determine subtype-associated genomic features and pathways. On average, only 8.3% of somatic mutations in WD liposarcoma were shared with the cognate DD component. DD tumors had higher numbers of somatic copy-number losses, amplifications involving Chr 12q, and fusion transcripts than WD tumors. HMGA2 and CPM rearrangements occur more frequently in DD components. The shared somatic mutations indicate a clonal origin for matched WD and DD tumors and show early divergence with ongoing genomic instability due to continual generation and selection of neochromosomes. Stochastic generation and subsequent expression of fusion transcripts from the neochromosome that involve adipogenesis genes such as HMGA2 and CPM may influence the differentiation state of the subsequent tumor. PMID:29610390
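
    A per-patient "shared mutation" percentage of the kind reported in this abstract can be computed as a set intersection. The sketch below is an assumption about how such a figure might be derived, not the authors' method: mutations are represented as hashable identifiers (here plain strings), and `mean_shared_fraction` is a hypothetical helper that averages, over patients, the share of WD mutations also found in the matched DD sample.

```python
def mean_shared_fraction(matched_pairs):
    """Average, over patients, the fraction of WD mutations also seen in DD.

    `matched_pairs` is a list of (wd_mutations, dd_mutations) tuples, each
    element a set of mutation identifiers. Patients with no WD mutations
    are skipped, since the fraction is undefined for them.
    """
    fractions = [
        len(wd & dd) / len(wd)
        for wd, dd in matched_pairs
        if wd
    ]
    if not fractions:
        raise ValueError("no patient has any WD mutation")
    return sum(fractions) / len(fractions)


# Invented toy data: two patients with made-up mutation labels.
toy_pairs = [
    ({"m1", "m2", "m3", "m4"}, {"m1", "m9"}),   # 1 of 4 shared
    ({"m5", "m6"}, {"m5", "m6", "m7"}),          # 2 of 2 shared
]
toy_mean = mean_shared_fraction(toy_pairs)
```

    Averaging per-patient fractions weights each patient equally; pooling all mutations before dividing would instead weight patients by their mutation count, and the two choices can give different percentages.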

  13. Genome-wide loss of 5-hmC is a novel epigenetic feature of Huntington's disease.

    PubMed

    Wang, Fengli; Yang, Yeran; Lin, Xiwen; Wang, Jiu-Qiang; Wu, Yong-Sheng; Xie, Wenjuan; Wang, Dandan; Zhu, Shu; Liao, You-Qi; Sun, Qinmiao; Yang, Yun-Gui; Luo, Huai-Rong; Guo, Caixia; Han, Chunsheng; Tang, Tie-Shan

    2013-09-15

    5-Hydroxymethylcytosine (5-hmC) may represent a new epigenetic modification of cytosine. While the dynamics of 5-hmC during neurodevelopment have recently been reported, little is known about its genomic distribution and function(s) in neurodegenerative diseases such as Huntington's disease (HD). We here observed a marked reduction of the 5-hmC signal in YAC128 (yeast artificial chromosome transgene with 128 CAG repeats) HD mouse brain tissues when compared with age-matched wild-type (WT) mice, suggesting a deficiency of 5-hmC reconstruction in HD brains during postnatal development. Genome-wide distribution analysis of 5-hmC further confirmed the diminishment of the 5-hmC signal in striatum and cortex in YAC128 HD mice. General genomic features of 5-hmC are highly conserved, not being affected by either disease or brain regions. Intriguingly, we have identified disease-specific (YAC128 versus WT) differentially hydroxymethylated regions (DhMRs), and found that acquisition of DhMRs in the gene body is a positive epigenetic regulator for gene expression. Ingenuity pathway analysis (IPA) of genotype-specific DhMR-annotated genes revealed that alteration of a number of canonical pathways involving neuronal development/differentiation (Wnt/β-catenin/Sox pathway, axonal guidance signaling pathway) and neuronal function/survival (glutamate receptor/calcium/CREB, GABA receptor signaling, dopamine-DARPP32 feedback pathway, etc.) could be important for the onset of HD. Our results indicate that loss of the 5-hmC marker is a novel epigenetic feature in HD, and that this aberrant epigenetic regulation may impair the neurogenesis, neuronal function and survival in HD brain. Our study also opens a new avenue for HD treatment; re-establishing the native 5-hmC landscape may have the potential to slow/halt the progression of HD.

  14. Genome packaging in EL and Lin68, two giant phiKZ-like bacteriophages of P. aeruginosa

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Sokolova, O.S. (A.V. Shoubnikov Institute of Crystallography RAS, Moscow); Shaburova, O.V.

    A unique feature of the Pseudomonas aeruginosa giant phage phiKZ is its way of genome packaging onto a spool-like protein structure, the inner body. Until recently, no similar structures have been detected in other phages. We have studied DNA packaging in P. aeruginosa phages EL and Lin68 using cryo-electron microscopy and revealed the presence of inner bodies. The shape and positioning of the inner body and the density of the DNA packaging in EL are different from those found in phiKZ and Lin68. This internal organization explains how the shorter EL genome is packed into a large EL capsid, which has the same external dimensions as the capsids of phiKZ and Lin68. The similarity in the structural organization in EL and other phiKZ-like phages indicates that EL is phylogenetically related to other phiKZ-like phages, and, despite the lack of detectable DNA homology, EL, phiKZ, and Lin68 descend from a common ancestor. - Highlights: • We performed a comparative structural study of giant P. aeruginosa phages: EL, Lin68 and phiKZ. • We revealed that the inner body is a common feature in giant phages. • The phage genome size correlates with the overall dimensions of the inner body.

  15. Chromosomal location and gene paucity of the male specific region on papaya Y chromosome.

    PubMed

    Yu, Qingyi; Hou, Shaobin; Hobza, Roman; Feltus, F Alex; Wang, Xiue; Jin, Weiwei; Skelton, Rachel L; Blas, Andrea; Lemke, Cornelia; Saw, Jimmy H; Moore, Paul H; Alam, Maqsudul; Jiang, Jiming; Paterson, Andrew H; Vyskot, Boris; Ming, Ray

    2007-08-01

    Sex chromosomes in flowering plants evolved recently and many of them remain homomorphic, including those in papaya. We investigated the chromosomal location of papaya's small male specific region of the hermaphrodite Y (Yh) chromosome (MSY) and its genomic features. We conducted chromosome fluorescence in situ hybridization mapping of Yh-specific bacterial artificial chromosomes (BACs) and placed the MSY near the centromere of the papaya Y chromosome. Then we sequenced five MSY BACs to examine the genomic features of this specialized region, which resulted in the largest collection of contiguous genomic DNA sequences of a Y chromosome in flowering plants. Extreme gene paucity was observed in the papaya MSY with no functional gene identified in 715 kb MSY sequences. A high density of retroelements and local sequence duplications were detected in the MSY that is suppressed for recombination. Location of the papaya MSY near the centromere might have provided recombination suppression and fostered paucity of genes in the male specific region of the Y chromosome. Our findings provide critical information for deciphering the sex chromosomes in papaya and reference information for comparative studies of other sex chromosomes in animals and plants.

  16. [A family of short retroposons (Squam1) from squamate reptiles (Reptilia: Squamata): structure, evolution and correlation with phylogeny].

    PubMed

    Kosushkin, S A; Borodulina, O R; Solov'eva, E N; Grechko, V V

    2008-01-01

    We have isolated and characterised sequences of a SINE family specific for squamate reptiles from a genome of lacertid lizard that we called Squam1. Copies are 360-390 bp in length and share a significant similarity with tRNA gene sequence on its 5'-end. This family was also detected by us in DNA of representatives of varanids, iguanids (anolis), gekkonids, and snakes. No signs of it were found in DNA of mammals, birds, amphibians, and crocodiles. Detailed analysis of primary structure of the retroposons obtained by us from genomic libraries or GenBank sequences was carried out. Most taxa possess 2-3 subfamilies of the SINE in their genomes with specific diagnostic features in their primary structure. Individual variability of copies in different families is about 85% and is just slightly lower on the genera level. Comparison of consensus sequences on family level reveals a high degree of structural similarity with a number of specific apomorphic features which makes it a useful marker of phylogeny for this group of reptiles. Snakes do not show specific affinity to varanids when compared to other lizards, as it was suggested earlier.

  17. RASTtk: A modular and extensible implementation of the RAST algorithm for building custom annotation pipelines and annotating batches of genomes

    DOE PAGES

    Brettin, Thomas; Davis, James J.; Disz, Terry; ...

    2015-02-10

    The RAST (Rapid Annotation using Subsystem Technology) annotation engine was built in 2008 to annotate bacterial and archaeal genomes. It works by offering a standard software pipeline for identifying genomic features (i.e., protein-encoding genes and RNA) and annotating their functions. Recently, in order to make RAST a more useful research tool and to keep pace with advancements in bioinformatics, it has become desirable to build a version of RAST that is both customizable and extensible. In this paper, we describe the RAST tool kit (RASTtk), a modular version of RAST that enables researchers to build custom annotation pipelines. RASTtk offers a choice of software for identifying and annotating genomic features as well as the ability to add custom features to an annotation job. RASTtk also accommodates the batch submission of genomes and the ability to customize annotation protocols for batch submissions. This is the first major software restructuring of RAST since its inception.

  18. Machine learning for epigenetics and future medical applications.

    PubMed

    Holder, Lawrence B; Haque, M Muksitul; Skinner, Michael K

    2017-07-03

    Understanding epigenetic processes holds immense promise for medical applications. Advances in Machine Learning (ML) are critical to realize this promise. Previous studies used epigenetic data sets associated with the germline transmission of epigenetic transgenerational inheritance of disease and novel ML approaches to predict genome-wide locations of critical epimutations. A combination of Active Learning (ACL) and Imbalanced Class Learning (ICL) was used to address past problems with ML to develop a more efficient feature selection process and address the imbalance problem in all genomic data sets. The power of this novel ML approach and our ability to predict epigenetic phenomena and associated disease is suggested. The current approach requires extensive computation of features over the genome. A promising new approach is to introduce Deep Learning (DL) for the generation and simultaneous computation of novel genomic features tuned to the classification task. This approach can be used with any genomic or biological data set applied to medicine. The application of molecular epigenetic data in advanced machine learning analysis to medicine is the focus of this review.

  19. BrucellaBase: Genome information resource.

    PubMed

    Sankarasubramanian, Jagadesan; Vishnu, Udayakumar S; Khader, L K M Abdul; Sridhar, Jayavel; Gunasekaran, Paramasamy; Rajendhran, Jeyaprakash

    2016-09-01

    Brucella sp. causes a major zoonotic disease, brucellosis. Brucella belongs to the family Brucellaceae under the order Rhizobiales of Alphaproteobacteria. We present BrucellaBase, a web-based platform, providing features of a genome database together with unique analysis tools. We have developed a web version of the multilocus sequence typing (MLST) (Whatmore et al., 2007) and phylogenetic analysis of Brucella spp. BrucellaBase currently contains genome data of 510 Brucella strains along with the user interfaces for BLAST, VFDB, CARD, pairwise genome alignment and MLST typing. Availability of these tools will enable the researchers interested in Brucella to get meaningful information from Brucella genome sequences. BrucellaBase will regularly be updated with new genome sequences, new features along with improvements in genome annotations. BrucellaBase is available online at http://www.dbtbrucellosis.in/brucellabase.html or http://59.99.226.203/brucellabase/homepage.html. Copyright © 2016 Elsevier B.V. All rights reserved.

  20. Identification of 15 candidate structured noncoding RNA motifs in fungi by comparative genomics.

    PubMed

    Li, Sanshu; Breaker, Ronald R

    2017-10-13

    With the development of rapid and inexpensive DNA sequencing, the genome sequences of more than 100 fungal species have been made available. This dataset provides an excellent resource for comparative genomics analyses, which can be used to discover genetic elements, including noncoding RNAs (ncRNAs). Bioinformatics tools similar to those used to uncover novel ncRNAs in bacteria, likewise, should be useful for searching fungal genomic sequences, and the relative ease of genetic experiments with some model fungal species could facilitate experimental validation studies. We have adapted a bioinformatics pipeline for discovering bacterial ncRNAs to systematically analyze many fungal genomes. This comparative genomics pipeline integrates information on conserved RNA sequence and structural features with alternative splicing information to reveal fungal RNA motifs that are candidate regulatory domains, or that might have other possible functions. A total of 15 prominent classes of structured ncRNA candidates were identified, including variant HDV self-cleaving ribozyme representatives, atypical snoRNA candidates, and possible structured antisense RNA motifs. Candidate regulatory motifs were also found associated with genes for ribosomal proteins, S-adenosylmethionine decarboxylase (SDC), amidase, and HexA protein involved in Woronin body formation. We experimentally confirm that the variant HDV ribozymes undergo rapid self-cleavage, and we demonstrate that the SDC RNA motif reduces the expression of SAM decarboxylase by translational repression. Furthermore, we provide evidence that several other motifs discovered in this study are likely to be functional ncRNA elements. Systematic screening of fungal genomes using a computational discovery pipeline has revealed the existence of a variety of novel structured ncRNAs. Genome contexts and similarities to known ncRNA motifs provide strong evidence for the biological and biochemical functions of some newly found ncRNA motifs. Although initial examinations of several motifs provide evidence for their likely functions, other motifs will require more in-depth analysis to reveal their functions.

  1. Evolutionary genomics: is Buchnera a bacterium or an organelle?

    PubMed

    Andersson, J O

    2000-11-30

    The first genome sequence of an intracellular bacterial symbiont of a eukaryotic cell has been determined. The Buchnera genome shares features with the genomes of both intracellular pathogenic bacteria and eukaryotic organelles, and it may represent an intermediate between the two.

  2. Enhanced Regulatory Sequence Prediction Using Gapped k-mer Features

    PubMed Central

    Mohammad-Noori, Morteza; Beer, Michael A.

    2014-01-01

    Abstract Oligomers of length k, or k-mers, are convenient and widely used features for modeling the properties and functions of DNA and protein sequences. However, k-mers suffer from the inherent limitation that if the parameter k is increased to resolve longer features, the probability of observing any specific k-mer becomes very small, and k-mer counts approach a binary variable, with most k-mers absent and a few present once. Thus, any statistical learning approach using k-mers as features becomes susceptible to noisy training set k-mer frequencies once k becomes large. To address this problem, we introduce alternative feature sets using gapped k-mers, a new classifier, gkm-SVM, and a general method for robust estimation of k-mer frequencies. To make the method applicable to large-scale genome wide applications, we develop an efficient tree data structure for computing the kernel matrix. We show that compared to our original kmer-SVM and alternative approaches, our gkm-SVM predicts functional genomic regulatory elements and tissue specific enhancers with significantly improved accuracy, increasing the precision by up to a factor of two. We then show that gkm-SVM consistently outperforms kmer-SVM on human ENCODE ChIP-seq datasets, and further demonstrate the general utility of our method using a Naïve-Bayes classifier. Although developed for regulatory sequence analysis, these methods can be applied to any sequence classification problem. PMID:25033408
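    The gapped k-mer idea described above can be illustrated with a small feature-counting sketch. This is not the gkm-SVM implementation (the abstract does not give one, and the real method uses an efficient tree structure for the kernel matrix); it is a naive enumeration under assumed conventions: a window of length l is taken at every position, k of its positions are kept as informative, and the remaining l - k positions are masked with "-". The function name and the masking character are hypothetical.

```python
from collections import Counter
from itertools import combinations


def gapped_kmer_counts(seq, l, k):
    """Count gapped k-mers in seq (illustrative sketch, not gkm-SVM).

    Each length-l window contributes one pattern per choice of k
    informative positions; the other l - k positions are masked as '-'.
    """
    if not 0 < k <= l:
        raise ValueError("require 0 < k <= l")
    counts = Counter()
    # All ways to choose which k of the l window positions stay informative.
    position_sets = [frozenset(c) for c in combinations(range(l), k)]
    for start in range(len(seq) - l + 1):
        window = seq[start:start + l]
        for keep in position_sets:
            pattern = "".join(
                base if i in keep else "-" for i, base in enumerate(window)
            )
            counts[pattern] += 1
    return counts
```

    Because several distinct full-length k-mers collapse onto the same gapped pattern, the counts per feature tend to be larger than counts of ungapped k-mers of the same length, which is the intuition behind the more robust frequency estimates mentioned in the abstract.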

  3. Enhanced regulatory sequence prediction using gapped k-mer features.

    PubMed

    Ghandi, Mahmoud; Lee, Dongwon; Mohammad-Noori, Morteza; Beer, Michael A

    2014-07-01

    Oligomers of length k, or k-mers, are convenient and widely used features for modeling the properties and functions of DNA and protein sequences. However, k-mers suffer from the inherent limitation that if the parameter k is increased to resolve longer features, the probability of observing any specific k-mer becomes very small, and k-mer counts approach a binary variable, with most k-mers absent and a few present once. Thus, any statistical learning approach using k-mers as features becomes susceptible to noisy training set k-mer frequencies once k becomes large. To address this problem, we introduce alternative feature sets using gapped k-mers, a new classifier, gkm-SVM, and a general method for robust estimation of k-mer frequencies. To make the method applicable to large-scale genome wide applications, we develop an efficient tree data structure for computing the kernel matrix. We show that compared to our original kmer-SVM and alternative approaches, our gkm-SVM predicts functional genomic regulatory elements and tissue specific enhancers with significantly improved accuracy, increasing the precision by up to a factor of two. We then show that gkm-SVM consistently outperforms kmer-SVM on human ENCODE ChIP-seq datasets, and further demonstrate the general utility of our method using a Naïve-Bayes classifier. Although developed for regulatory sequence analysis, these methods can be applied to any sequence classification problem.

  4. Comparative mitochondrial genomics of snakes: extraordinary substitution rate dynamics and functionality of the duplicate control region

    PubMed Central

    Jiang, Zhi J; Castoe, Todd A; Austin, Christopher C; Burbrink, Frank T; Herron, Matthew D; McGuire, Jimmy A; Parkinson, Christopher L; Pollock, David D

    2007-01-01

    Background The mitochondrial genomes of snakes are characterized by an overall evolutionary rate that appears to be one of the most accelerated among vertebrates. They also possess other unusual features, including short tRNAs and other genes, and a duplicated control region that has been stably maintained since it originated more than 70 million years ago. Here, we provide a detailed analysis of evolutionary dynamics in snake mitochondrial genomes to better understand the basis of these extreme characteristics, and to explore the relationship between mitochondrial genome molecular evolution, genome architecture, and molecular function. We sequenced complete mitochondrial genomes from Slowinski's corn snake (Pantherophis slowinskii) and two cottonmouths (Agkistrodon piscivorus) to complement previously existing mitochondrial genomes, and to provide an improved comparative view of how genome architecture affects molecular evolution at contrasting levels of divergence. Results We present a Bayesian genetic approach that suggests that the duplicated control region can function as an additional origin of heavy strand replication. The two control regions also appear to have different intra-specific versus inter-specific evolutionary dynamics that may be associated with complex modes of concerted evolution. We find that different genomic regions have experienced substantial accelerated evolution along early branches in snakes, with different genes having experienced dramatic accelerations along specific branches. Some of these accelerations appear to coincide with, or subsequent to, the shortening of various mitochondrial genes and the duplication of the control region and flanking tRNAs. Conclusion Fluctuations in the strength and pattern of selection during snake evolution have had widely varying gene-specific effects on substitution rates, and these rate accelerations may have been functionally related to unusual changes in genomic architecture. The among-lineage and among-gene variation in rate dynamics observed in snakes is the most extreme thus far observed in animal genomes, and provides an important study system for further evaluating the biochemical and physiological basis of evolutionary pressures in vertebrate mitochondria. PMID:17655768

  5. Toward Integration of Comparative Genetic, Physical, Diversity, and Cytomolecular Maps for Grasses and Grains, Using the Sorghum Genome as a Foundation

    PubMed Central

    Draye, Xavier; Lin, Yann-Rong; Qian, Xiao-yin; Bowers, John E.; Burow, Gloria B.; Morrell, Peter L.; Peterson, Daniel G.; Presting, Gernot G.; Ren, Shu-xin; Wing, Rod A.; Paterson, Andrew H.

    2001-01-01

    The small genome of sorghum (Sorghum bicolor L. Moench.) provides an important template for study of closely related large-genome crops such as maize (Zea mays) and sugarcane (Saccharum spp.), and is a logical complement to distantly related rice (Oryza sativa) as a “grass genome model.” Using a high-density RFLP map as a framework, a robust physical map of sorghum is being assembled by integrating hybridization and fingerprint data with comparative data from related taxa such as rice and using new methods to resolve genomic duplications into locus-specific groups. By taking advantage of allelic variation revealed by heterologous probes, the positions of corresponding loci on the wheat (Triticum aestivum), rice, maize, sugarcane, and Arabidopsis genomes are being interpolated on the sorghum physical map. Bacterial artificial chromosomes for the small genome of rice are shown to close several gaps in the sorghum contigs; the emerging rice physical map and assembled sequence will further accelerate progress. An important motivation for developing genomic tools is to relate molecular level variation to phenotypic diversity. “Diversity maps,” which depict the levels and patterns of variation in different gene pools, shed light on relationships of allelic diversity with chromosome organization, and suggest possible locations of genomic regions that are under selection due to major gene effects (some of which may be revealed by quantitative trait locus mapping). Both physical maps and diversity maps suggest interesting features that may be integrally related to the chromosomal context of DNA—progress in cytology promises to provide a means to elucidate such relationships. We seek to provide a detailed picture of the structure, function, and evolution of the genome of sorghum and its relatives, together with molecular tools such as locus-specific sequence-tagged site DNA markers and bacterial artificial chromosome contigs that will have enduring value for many aspects of genome analysis. PMID:11244113

  6. Comparative chloroplast genomics and phylogenetics of Fagopyrum esculentum ssp. ancestrale – A wild ancestor of cultivated buckwheat

    PubMed Central

    Logacheva, Maria D; Samigullin, Tahir H; Dhingra, Amit; Penin, Aleksey A

    2008-01-01

    Background Chloroplast genome sequences are extremely informative about species-interrelationships owing to its non-meiotic and often uniparental inheritance over generations. The subject of our study, Fagopyrum esculentum, is a member of the family Polygonaceae belonging to the order Caryophyllales. An uncertainty remains regarding the affinity of Caryophyllales and the asterids that could be due to undersampling of the taxa. With that background, having access to the complete chloroplast genome sequence for Fagopyrum becomes quite pertinent. Results We report the complete chloroplast genome sequence of a wild ancestor of cultivated buckwheat, Fagopyrum esculentum ssp. ancestrale. The sequence was rapidly determined using a previously described approach that utilized a PCR-based method and employed universal primers, designed on the scaffold of multiple sequence alignment of chloroplast genomes. The gene content and order in buckwheat chloroplast genome is similar to Spinacia oleracea. However, some unique structural differences exist: the presence of an intron in the rpl2 gene, a frameshift mutation in the rpl23 gene and extension of the inverted repeat region to include the ycf1 gene. Phylogenetic analysis of 61 protein-coding gene sequences from 44 complete plastid genomes provided strong support for the sister relationships of Caryophyllales (including Polygonaceae) to asterids. Further, our analysis also provided support for Amborella as sister to all other angiosperms, but interestingly, in the bayesian phylogeny inference based on first two codon positions Amborella united with Nymphaeales. Conclusion Comparative genomics analyses revealed that the Fagopyrum chloroplast genome harbors the characteristic gene content and organization as has been described for several other chloroplast genomes. However, it has some unique structural features distinct from previously reported complete chloroplast genome sequences. Phylogenetic analysis of the dataset, including this new sequence from non-core Caryophyllales supports the sister relationship between Caryophyllales and asterids. PMID:18492277

  7. Comparison of single-molecule sequencing and hybrid approaches for finishing the genome of Clostridium autoethanogenum and analysis of CRISPR systems in industrial relevant Clostridia

    PubMed Central

    2014-01-01

    Background Clostridium autoethanogenum strain JA1-1 (DSM 10061) is an acetogen capable of fermenting CO, CO2 and H2 (e.g. from syngas or waste gases) into biofuel ethanol and commodity chemicals such as 2,3-butanediol. A draft genome sequence consisting of 100 contigs has been published. Results A closed, high-quality genome sequence for C. autoethanogenum DSM10061 was generated using only the latest single-molecule DNA sequencing technology and without the need for manual finishing. It is assigned to the most complex genome classification based upon genome features such as repeats, prophage, nine copies of the rRNA gene operons. It has a low G + C content of 31.1%. Illumina, 454, Illumina/454 hybrid assemblies were generated and then compared to the draft and PacBio assemblies using summary statistics, CGAL, QUAST and REAPR bioinformatics tools and comparative genomic approaches. Assemblies based upon shorter read DNA technologies were confounded by the large number of repeats and their size, which in the case of the rRNA gene operons were ~5 kb. CRISPR (Clustered Regularly Interspaced Short Palindromic Repeats) systems among biotechnologically relevant Clostridia were classified and related to plasmid content and prophages. Potential associations between plasmid content and CRISPR systems may have implications for historical industrial scale Acetone-Butanol-Ethanol (ABE) fermentation failures and future large scale bacterial fermentations. While C. autoethanogenum contains an active CRISPR system, no such system is present in the closely related Clostridium ljungdahlii DSM 13528. A common prophage inserted into the Arg-tRNA shared between the strains suggests a common ancestor. However, C. ljungdahlii contains several additional putative prophages and it has more than double the amount of prophage DNA compared to C. autoethanogenum. Other differences include important metabolic genes for central metabolism (as an additional hydrogenase and the absence of a phosphoenolpyruvate synthase) and substrate utilization pathway (mannose and aromatics utilization) that might explain phenotypic differences between C. autoethanogenum and C. ljungdahlii. Conclusions Single molecule sequencing will be increasingly used to produce finished microbial genomes. The complete genome will facilitate comparative genomics and functional genomics and support future comparisons between Clostridia and studies that examine the evolution of plasmids, bacteriophage and CRISPR systems. PMID:24655715

  8. Characterization and Complete Genome Sequences of Three N4-Like Roseobacter Phages Isolated from the South China Sea.

    PubMed

    Li, Baolian; Zhang, Si; Long, Lijuan; Huang, Sijun

    2016-09-01

    Three bacteriophages (RD-1410W1-01, RD-1410Ws-07, and DS-1410Ws-06) were isolated from the surface water of Sanya Bay, northern South China Sea, on two marine bacteria type strains of the Roseobacter lineage. These phages have an isometric head and a short tail, morphologically belonging to the Podoviridae family. Two of these phages can infect four of seven marine roseobacter strains tested and the other one can infect three of them, showing relatively broader host ranges compared to known N4-like roseophages. One-step growth curves showed that these phages have similar short latent periods (1-2 h) but highly variable burst sizes (27-341 pfu cell(-1)). Their complete genomes show high level of similarities to known N4-like roseophages in terms of genome size, G + C content, gene content, and arrangement. The morphological and genomic features of these phages indicate that they belong to the N4likevirus genus. Moreover, comparative genomic analysis based on 43 N4-like phages (10 roseobacter phages and 33 phages infecting other lineages of bacteria) revealed a core genome of 18 genes shared by all the 43 phages and 38 genes shared by all the ten roseophages. The 38 core genes of N4-like roseophages nearly make up 70 % of each genome in length. Phylogenetic analysis based on the concatenated core gene products showed that our phage isolates represent two new phyletic branches, suggesting the broad genetic diversity of marine N4-like roseophages remains.
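    The core-genome figures quoted above (genes shared by all phages in a group) amount to a set intersection over per-genome gene sets. The sketch below is only an illustration of that idea, assuming genes have already been assigned to shared ortholog-group identifiers; the abstract does not describe how orthologs were called, and the function name and toy identifiers used with it are hypothetical.

```python
def core_genes(gene_sets):
    """Return the ortholog groups present in every genome.

    gene_sets maps a genome name to the set of ortholog-group
    identifiers found in that genome (assumed to be precomputed).
    """
    genomes = list(gene_sets.values())
    if not genomes:
        return set()
    # The core genome is the intersection of all per-genome sets.
    return set.intersection(*(set(g) for g in genomes))


def accessory_genes(gene_sets):
    """Return groups found in at least one genome but not in all."""
    pan = set().union(*gene_sets.values()) if gene_sets else set()
    return pan - core_genes(gene_sets)
```

    Restricting the input to a subset of genomes (for example, only the roseophages) can only keep or enlarge the core set, which is consistent with the abstract reporting more shared genes among the ten roseophages than among all 43 phages.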

  9. Sequence analysis of the PIP5K locus in Eimeria maxima provides further evidence for eimerian genome plasticity and segmental organization.

    PubMed

    Song, B K; Pan, M Z; Lau, Y L; Wan, K L

    2014-07-29

    Commercial flocks infected by Eimeria species parasites, including Eimeria maxima, have an increased risk of developing clinical or subclinical coccidiosis; an intestinal enteritis associated with increased mortality rates in poultry. Currently, infection control is largely based on chemotherapy or live vaccines; however, drug resistance is common and vaccines are relatively expensive. The development of new cost-effective intervention measures will benefit from unraveling the complex genetic mechanisms that underlie host-parasite interactions, including the identification and characterization of genes encoding proteins such as phosphatidylinositol 4-phosphate 5-kinase (PIP5K). We previously identified a PIP5K coding sequence within the E. maxima genome. In this study, we analyzed two bacterial artificial chromosome clones presenting a ~145-kb E. maxima (Weybridge strain) genomic region spanning the PIP5K gene locus. Sequence analysis revealed that ~95% of the simple sequence repeats detected were located within regions comparable to the previously described feature-rich segments of the Eimeria tenella genome. Comparative sequence analysis with the orthologous E. maxima (Houghton strain) region revealed a moderate level of conserved synteny. Unique segmental organizations and telomere-like repeats were also observed in both genomes. A number of incomplete transposable elements were detected and further scrutiny of these elements in both orthologous segments revealed interesting nesting events, which may play a role in facilitating genome plasticity in E. maxima. The current analysis provides more detailed information about the genome organization of E. maxima and may help to reveal genotypic differences that are important for expression of traits related to pathogenicity and virulence.

  10. Comparative genomic and morphological analyses of Listeria phages isolated from farm environments.

    PubMed

    Denes, Thomas; Vongkamjan, Kitiya; Ackermann, Hans-Wolfgang; Moreno Switt, Andrea I; Wiedmann, Martin; den Bakker, Henk C

    2014-08-01

    The genus Listeria is ubiquitous in the environment and includes the globally important food-borne pathogen Listeria monocytogenes. While the genomic diversity of Listeria has been well studied, considerably less is known about the genomic and morphological diversity of Listeria bacteriophages. In this study, we sequenced and analyzed the genomes of 14 Listeria phages isolated mostly from New York dairy farm environments as well as one related Enterococcus faecalis phage to obtain information on genome characteristics and diversity. We also examined 12 of the phages by electron microscopy to characterize their morphology. These Listeria phages, based on gene orthology and morphology, together with previously sequenced Listeria phages could be classified into five orthoclusters, including one novel orthocluster. One orthocluster (orthocluster I) consists of large genome (~135-kb) myoviruses belonging to the genus “Twort-like viruses,” three orthoclusters (orthoclusters II to IV) contain small-genome (36- to 43-kb) siphoviruses with icosahedral heads, and the novel orthocluster V contains medium-sized-genome (~66-kb) siphoviruses with elongated heads. A novel orthocluster (orthocluster VI) of E. faecalis phages, with medium-sized genomes (~56 kb), was identified, which grouped together and shares morphological features with the novel Listeria phage orthocluster V. This new group of phages (i.e., orthoclusters V and VI) is composed of putative lytic phages that may prove to be useful in phage-based applications for biocontrol, detection, and therapeutic purposes.

  11. A comprehensive overview of computational resources to aid in precision genome editing with engineered nucleases.

    PubMed

    Periwal, Vinita

    2017-07-01

    Genome editing with engineered nucleases (zinc finger nucleases, TAL effector nucleases and clustered regularly interspaced short palindromic repeats/CRISPR-associated) has recently been shown to have great promise in a variety of therapeutic and biotechnological applications. However, their exploitation in genetic analysis and clinical settings largely depends on their specificity for the intended genomic target. Large and complex genomes often contain highly homologous/repetitive sequences, which limits the specificity of genome editing tools and could result in off-target activity. Over the past few years, various computational approaches have been developed to assist the design process and predict/reduce the off-target activity of these nucleases. These tools could be efficiently used to guide the design of constructs for engineered nucleases and evaluate results after genome editing. This review provides a comprehensive overview of various databases, tools, web servers and resources for genome editing and compares their features and functionalities. Additionally, it also describes tools that have been developed to analyse post-genome editing results. The article also discusses important design parameters that could be considered while designing these nucleases. This review is intended to be a quick reference guide for experimentalists as well as computational biologists working in the field of genome editing with engineered nucleases. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.

  12. The mitochondrial genomes of Campodea fragilis and C. lubbocki (Hexapoda: Diplura): high genetic divergence in a morphologically uniform taxon

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Podsiadlowski, L.; Carapelli, A.; Nardi, F.

    2005-12-01

    Mitochondrial genomes from two dipluran hexapods of the genus Campodea have been sequenced. Gene order is the same as in most other hexapods and crustaceans. Secondary structures of tRNAs reveal specific structural changes in tRNA-C, tRNA-R, tRNA-S1 and tRNA-S2. Comparative analyses of nucleotide and amino acid composition, as well as structural features of both ribosomal RNA subunits, reveal substantial differences among the analyzed taxa. Although the two Campodea species are morphologically highly uniform, genetic divergence is larger than expected, suggesting a long evolutionary history under stable ecological conditions.

  13. Synthetic Genome Recoding: New genetic codes for new features

    PubMed Central

    Kuo, James; Stirling, Finn; Lau, Yu Heng; Shulgina, Yekaterina; Way, Jeffrey C.; Silver, Pamela A.

    2018-01-01

    Full genome recoding, or rewriting codon meaning, through chemical synthesis of entire bacterial chromosomes has become feasible in the past several years. Recoding an organism can impart new properties including non-natural amino acid incorporation, virus resistance, and biocontainment. The estimated cost of construction that includes DNA synthesis, assembly by recombination, and troubleshooting, is now comparable to costs of early stage development of drugs or other high-tech products. Here we discuss several recently published assembly methods and provide some thoughts on the future, including how synthetic efforts might benefit from analysis of natural recoding processes and organisms that use alternative genetic codes. PMID:28983660

  14. Comparative genomic and proteomic analyses of two Mycoplasma agalactiae strains: clues to the macro- and micro-events that are shaping mycoplasma diversity.

    PubMed

    Nouvel, Laurent X; Sirand-Pugnet, Pascal; Marenda, Marc S; Sagné, Eveline; Barbe, Valérie; Mangenot, Sophie; Schenowitz, Chantal; Jacob, Daniel; Barré, Aurélien; Claverol, Stéphane; Blanchard, Alain; Citti, Christine

    2010-02-02

    While the genomic era is accumulating a tremendous amount of data, the question of how genomics can describe a bacterial species remains to be fully addressed. The recent sequencing of the genome of the Mycoplasma agalactiae type strain has challenged our general view on mycoplasmas by suggesting that these simple bacteria are able to exchange significant amount of genetic material via horizontal gene transfer. Yet, events that are shaping mycoplasma genomes and that are underlining diversity within this species have to be fully evaluated. For this purpose, we compared two strains that are representative of the genetic spectrum encountered in this species: the type strain PG2 which genome is already available and a field strain, 5632, which was fully sequenced and annotated in this study. The two genomes differ by ca. 130 kbp with that of 5632 being the largest (1006 kbp). The make up of this additional genetic material mainly corresponds (i) to mobile genetic elements and (ii) to expanded repertoire of gene families that encode putative surface proteins and display features of highly-variable systems. More specifically, three entire copies of a previously described integrative conjugative element are found in 5632 that accounts for ca. 80 kbp. Other mobile genetic elements, found in 5632 but not in PG2, are the more classical insertion sequences which are related to those found in two other ruminant pathogens, M. bovis and M. mycoides subsp. mycoides SC. In 5632, repertoires of gene families encoding surface proteins are larger due to gene duplication. Comparative proteomic analyses of the two strains indicate that the additional coding capacity of 5632 affects the overall architecture of the surface and suggests the occurrence of new phase variable systems based on single nucleotide polymorphisms. Overall, comparative analyses of two M. agalactiae strains revealed a very dynamic genome which structure has been shaped by gene flow among ruminant mycoplasmas and expansion-reduction of gene repertoires encoding surface proteins, the expression of which is driven by localized genetic micro-events.

  15. Comparative genomic and proteomic analyses of two Mycoplasma agalactiae strains: clues to the macro- and micro-events that are shaping mycoplasma diversity

    PubMed Central

    2010-01-01

    Background While the genomic era is accumulating a tremendous amount of data, the question of how genomics can describe a bacterial species remains to be fully addressed. The recent sequencing of the genome of the Mycoplasma agalactiae type strain has challenged our general view on mycoplasmas by suggesting that these simple bacteria are able to exchange significant amount of genetic material via horizontal gene transfer. Yet, events that are shaping mycoplasma genomes and that are underlining diversity within this species have to be fully evaluated. For this purpose, we compared two strains that are representative of the genetic spectrum encountered in this species: the type strain PG2 which genome is already available and a field strain, 5632, which was fully sequenced and annotated in this study. Results The two genomes differ by ca. 130 kbp with that of 5632 being the largest (1006 kbp). The make up of this additional genetic material mainly corresponds (i) to mobile genetic elements and (ii) to expanded repertoire of gene families that encode putative surface proteins and display features of highly-variable systems. More specifically, three entire copies of a previously described integrative conjugative element are found in 5632 that accounts for ca. 80 kbp. Other mobile genetic elements, found in 5632 but not in PG2, are the more classical insertion sequences which are related to those found in two other ruminant pathogens, M. bovis and M. mycoides subsp. mycoides SC. In 5632, repertoires of gene families encoding surface proteins are larger due to gene duplication. Comparative proteomic analyses of the two strains indicate that the additional coding capacity of 5632 affects the overall architecture of the surface and suggests the occurrence of new phase variable systems based on single nucleotide polymorphisms. Conclusion Overall, comparative analyses of two M. agalactiae strains revealed a very dynamic genome which structure has been shaped by gene flow among ruminant mycoplasmas and expansion-reduction of gene repertoires encoding surface proteins, the expression of which is driven by localized genetic micro-events. PMID:20122262

  16. Secondary structural entropy in RNA switch (Riboswitch) identification.

    PubMed

    Manzourolajdad, Amirhossein; Arnold, Jonathan

    2015-04-28

    RNA regulatory elements play a significant role in gene regulation. Riboswitches, a widespread group of regulatory RNAs, are vital components of many bacterial genomes. These regulatory elements generally function by forming a ligand-induced alternative fold that controls access to ribosome binding sites or other regulatory sites in RNA. Riboswitch-mediated mechanisms are ubiquitous across bacterial genomes. A typical class of riboswitch has its own unique structural and biological complexity, making de novo riboswitch identification a formidable task. Traditionally, riboswitches have been identified through comparative genomics based on sequence and structural homology. The limitations of structural-homology-based approaches, coupled with the assumption that there is a great diversity of undiscovered riboswitches, suggest the need for alternative methods for riboswitch identification, possibly based on features intrinsic to their structure. As of yet, no such reliable method has been proposed. We used structural entropy of riboswitch sequences as a measure of their secondary structural dynamics. Entropy values of a diverse set of riboswitches were compared to those of their mutants, their dinucleotide shuffles, and their reverse complement sequences under different stochastic context-free grammar folding models. Significance of our results was evaluated by comparison to other approaches, such as the base-pairing entropy and energy landscapes dynamics. Classifiers based on structural entropy optimized via sequence and structural features were devised as riboswitch identifiers and tested on Bacillus subtilis, Escherichia coli, and Synechococcus elongatus as an exploration of structural entropy based approaches. The unusually long untranslated region of the cotH in Bacillus subtilis, as well as upstream regions of certain genes, such as the sucC genes, were associated with significant structural entropy values in genome-wide examinations. Various tests show that there is in fact a relationship between higher structural entropy and the potential for the RNA sequence to have alternative structures, within the limitations of our methodology. This relationship, though modest, is consistent across various tests. Understanding the behavior of structural entropy as a fairly new feature for RNA conformational dynamics, however, may require extensive exploratory investigation both across RNA sequences and folding models.

  17. Complex multi-enhancer contacts captured by genome architecture mapping.

    PubMed

    Beagrie, Robert A; Scialdone, Antonio; Schueler, Markus; Kraemer, Dorothee C A; Chotalia, Mita; Xie, Sheila Q; Barbieri, Mariano; de Santiago, Inês; Lavitas, Liron-Mark; Branco, Miguel R; Fraser, James; Dostie, Josée; Game, Laurence; Dillon, Niall; Edwards, Paul A W; Nicodemi, Mario; Pombo, Ana

    2017-03-23

    The organization of the genome in the nucleus and the interactions of genes with their regulatory elements are key features of transcriptional control and their disruption can cause disease. Here we report a genome-wide method, genome architecture mapping (GAM), for measuring chromatin contacts and other features of three-dimensional chromatin topology on the basis of sequencing DNA from a large collection of thin nuclear sections. We apply GAM to mouse embryonic stem cells and identify enrichment for specific interactions between active genes and enhancers across very large genomic distances using a mathematical model termed SLICE (statistical inference of co-segregation). GAM also reveals an abundance of three-way contacts across the genome, especially between regions that are highly transcribed or contain super-enhancers, providing a level of insight into genome architecture that, owing to the technical limitations of current technologies, has previously remained unattainable. Furthermore, GAM highlights a role for gene-expression-specific contacts in organizing the genome in mammalian nuclei.
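
    The co-segregation idea behind GAM can be illustrated with a small Python sketch. It is not the SLICE model from the abstract: it only counts how often loci are detected together across nuclear sections and subtracts the frequency expected if they segregated independently. The section contents, locus names, and function names are hypothetical.

```python
def detection_frequency(sections, *loci):
    """Fraction of nuclear sections in which all the given loci were detected."""
    if not sections:
        return 0.0
    wanted = set(loci)
    hits = sum(1 for detected in sections if wanted <= detected)
    return hits / len(sections)


def linkage(sections, locus_a, locus_b):
    """Observed pairwise co-segregation minus the value expected under independence."""
    observed = detection_frequency(sections, locus_a, locus_b)
    expected = (detection_frequency(sections, locus_a)
                * detection_frequency(sections, locus_b))
    return observed - expected


# Toy data (assumption): each set lists the loci sequenced from one thin section.
sections = [
    {"A", "B", "C"}, {"A", "B"}, {"C"}, {"A", "B", "C"},
    set(), {"B"}, {"A"}, {"C"},
]

pair_frequency = detection_frequency(sections, "A", "B")
triplet_frequency = detection_frequency(sections, "A", "B", "C")
pair_linkage = linkage(sections, "A", "B")
```

    A positive linkage value means the two loci appear in the same section more often than chance would predict, which is the kind of signal a contact-inference model would then have to separate from distance effects and sampling noise.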

  18. SeeGH--a software tool for visualization of whole genome array comparative genomic hybridization data.

    PubMed

    Chi, Bryan; DeLeeuw, Ronald J; Coe, Bradley P; MacAulay, Calum; Lam, Wan L

    2004-02-09

    Array comparative genomic hybridization (CGH) is a technique which detects copy number differences in DNA segments. Complete sequencing of the human genome and the development of an array representing a tiling set of tens of thousands of DNA segments spanning the entire human genome has made high resolution copy number analysis throughout the genome possible. Since array CGH provides signal ratio for each DNA segment, visualization would require the reassembly of individual data points into chromosome profiles. We have developed a visualization tool for displaying whole genome array CGH data in the context of chromosomal location. SeeGH is an application that translates spot signal ratio data from array CGH experiments to displays of high resolution chromosome profiles. Data is imported from a simple tab delimited text file obtained from standard microarray image analysis software. SeeGH processes the signal ratio data and graphically displays it in a conventional CGH karyotype diagram with the added features of magnification and DNA segment annotation. In this process, SeeGH imports the data into a database, calculates the average ratio and standard deviation for each replicate spot, and links them to chromosome regions for graphical display. Once the data is displayed, users have the option of hiding or flagging DNA segments based on user defined criteria, and retrieve annotation information such as clone name, NCBI sequence accession number, ratio, base pair position on the chromosome, and standard deviation. SeeGH represents a novel software tool used to view and analyze array CGH data. The software gives users the ability to view the data in an overall genomic view as well as magnify specific chromosomal regions facilitating the precise localization of genetic alterations. SeeGH is easily installed and runs on Microsoft Windows 2000 or later environments.
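
    The per-spot summary step described above (average ratio and standard deviation for each replicate spot, linked to a chromosome position) might look like the following Python sketch. The column names and sample rows are assumptions made for illustration; they are not SeeGH's actual import format or code.

```python
import csv
import io
import statistics
from collections import defaultdict

# Hypothetical tab-delimited export from image analysis software.
SAMPLE = (
    "clone\tchromosome\tposition\tratio\n"
    "RP11-1\t1\t1000\t1.0\n"
    "RP11-1\t1\t1000\t1.2\n"
    "RP11-2\t1\t5000\t0.5\n"
    "RP11-2\t1\t5000\t0.7\n"
    "RP11-2\t1\t5000\t0.6\n"
)


def summarize_replicates(handle):
    """Group replicate spots by clone and return mean ratio and SD per clone."""
    groups = defaultdict(list)
    for row in csv.DictReader(handle, delimiter="\t"):
        key = (row["clone"], row["chromosome"], int(row["position"]))
        groups[key].append(float(row["ratio"]))

    summary = []
    for (clone, chromosome, position), ratios in groups.items():
        summary.append({
            "clone": clone,
            "chromosome": chromosome,
            "position": position,
            "n": len(ratios),
            "mean": statistics.mean(ratios),
            # Sample SD is undefined for a single spot; report 0.0 in that case.
            "sd": statistics.stdev(ratios) if len(ratios) > 1 else 0.0,
        })
    # Order by chromosome and position so the rows can be drawn as a profile.
    summary.sort(key=lambda r: (r["chromosome"], r["position"]))
    return summary


rows = summarize_replicates(io.StringIO(SAMPLE))
```

    Each returned row carries enough information to place one averaged data point on a chromosome profile; flagging or hiding segments would then be a filter over these rows.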

  19. The Genome Sequence of the North-European Cucumber (Cucumis sativus L.) Unravels Evolutionary Adaptation Mechanisms in Plants

    PubMed Central

    Wóycicki, Rafał; Witkowicz, Justyna; Gawroński, Piotr; Dąbrowska, Joanna; Lomsadze, Alexandre; Pawełkowicz, Magdalena; Siedlecka, Ewa; Yagi, Kohei; Pląder, Wojciech; Seroczyńska, Anna; Śmiech, Mieczysław; Gutman, Wojciech; Niemirowicz-Szczytt, Katarzyna; Bartoszewski, Grzegorz; Tagashira, Norikazu; Hoshi, Yoshikazu; Borodovsky, Mark; Karpiński, Stanisław; Malepszy, Stefan; Przybecki, Zbigniew

    2011-01-01

    Cucumber (Cucumis sativus L.), a widely cultivated crop, originated in the Eastern Himalayas, and its secondary domestication regions include highly divergent climate conditions, e.g. temperate and subtropical. We wanted to uncover adaptive genome differences between the cucumber cultivars and what sort of evolutionary molecular mechanisms regulate genetic adaptation of plants to different ecosystems and organism biodiversity. Here we present the draft genome sequence of the Cucumis sativus genome of the North-European Borszczagowski cultivar (line B10) and comparative genomics studies with the known genomes of: C. sativus (Chinese cultivar – Chinese Long (line 9930)), Arabidopsis thaliana, Populus trichocarpa and Oryza sativa. Cucumber genomes show extensive chromosomal rearrangements, distinct differences in quantity of the particular genes (e.g. involved in photosynthesis, respiration, sugar metabolism, chlorophyll degradation, regulation of gene expression, photooxidative stress tolerance, higher non-optimal temperatures tolerance and ammonium ion assimilation) as well as in distributions of abscisic acid-, dehydration- and ethylene-responsive cis-regulatory elements (CREs) in promoters of orthologous groups of genes, which lead to the specific adaptation features. Abscisic acid treatment of non-acclimated Arabidopsis and C. sativus seedlings induced moderate freezing tolerance in Arabidopsis but not in C. sativus. This experiment together with analysis of abscisic acid-specific CRE distributions gives a clue as to why C. sativus is much more susceptible to moderate freezing stresses than A. thaliana. Comparative analysis of all the five genomes showed that each species and/or cultivar has a specific profile of CRE content in promoters of orthologous genes. Our results constitute a substantial and original resource for basic and applied research on environmental adaptations of plants, which could facilitate creation of new crops with improved growth and yield in divergent conditions. PMID:21829493

  20. The genome sequence of the North-European cucumber (Cucumis sativus L.) unravels evolutionary adaptation mechanisms in plants.

    PubMed

    Wóycicki, Rafał; Witkowicz, Justyna; Gawroński, Piotr; Dąbrowska, Joanna; Lomsadze, Alexandre; Pawełkowicz, Magdalena; Siedlecka, Ewa; Yagi, Kohei; Pląder, Wojciech; Seroczyńska, Anna; Śmiech, Mieczysław; Gutman, Wojciech; Niemirowicz-Szczytt, Katarzyna; Bartoszewski, Grzegorz; Tagashira, Norikazu; Hoshi, Yoshikazu; Borodovsky, Mark; Karpiński, Stanisław; Malepszy, Stefan; Przybecki, Zbigniew

    2011-01-01

    Cucumber (Cucumis sativus L.), a widely cultivated crop, originated in the Eastern Himalayas, and its secondary domestication regions include highly divergent climate conditions, e.g. temperate and subtropical. We wanted to uncover adaptive genome differences between the cucumber cultivars and what sort of evolutionary molecular mechanisms regulate genetic adaptation of plants to different ecosystems and organism biodiversity. Here we present the draft genome sequence of the Cucumis sativus genome of the North-European Borszczagowski cultivar (line B10) and comparative genomics studies with the known genomes of: C. sativus (Chinese cultivar--Chinese Long (line 9930)), Arabidopsis thaliana, Populus trichocarpa and Oryza sativa. Cucumber genomes show extensive chromosomal rearrangements, distinct differences in quantity of the particular genes (e.g. involved in photosynthesis, respiration, sugar metabolism, chlorophyll degradation, regulation of gene expression, photooxidative stress tolerance, higher non-optimal temperatures tolerance and ammonium ion assimilation) as well as in distributions of abscisic acid-, dehydration- and ethylene-responsive cis-regulatory elements (CREs) in promoters of orthologous groups of genes, which lead to the specific adaptation features. Abscisic acid treatment of non-acclimated Arabidopsis and C. sativus seedlings induced moderate freezing tolerance in Arabidopsis but not in C. sativus. This experiment together with analysis of abscisic acid-specific CRE distributions gives a clue as to why C. sativus is much more susceptible to moderate freezing stresses than A. thaliana. Comparative analysis of all the five genomes showed that each species and/or cultivar has a specific profile of CRE content in promoters of orthologous genes. Our results constitute a substantial and original resource for basic and applied research on environmental adaptations of plants, which could facilitate creation of new crops with improved growth and yield in divergent conditions.

  1. Progressive but Previously Untreated CLL Patients with Greater Array CGH Complexity Exhibit a Less Durable Response to Chemoimmunotherapy

    PubMed Central

    Kay, Neil E.; Eckel-Passow, Jeanette E.; Braggio, Esteban; VanWier, Scott; Shanafelt, Tait D.; Van Dyke, Daniel L.; Jelinek, Diane F.; Tschumper, Renee C.; Kipps, Thomas; Byrd, John C.; Fonseca, Rafael

    2010-01-01

    To better understand the implications of genomic instability and outcome in B-cell CLL, we sought to address genomic complexity as a predictor of chemosensitivity and ultimately clinical outcome in this disease. We employed array-based comparative genomic hybridization (aCGH), using a one-million probe array and identified gains and losses of genetic material in 48 patients treated on a chemoimmunotherapy (CIT) clinical trial. We identified chromosomal gain or loss in ≥6% of the patients on chromosomes 3, 8, 9, 10, 11, 12, 13, 14 and 17. Higher genomic complexity, as a mechanism favoring clonal selection, was associated with shorter progression-free survival and predicted a poor response to treatment. Of interest, CLL cases with loss of p53 surveillance showed more complex genomic features and were found both in patients with a 17p13.1 deletion and in the more favorable genetic subtype characterized by the presence of 13q14.1 deletion. This aCGH study adds information on the association between inferior trial response and increasing genetic complexity as CLL progresses. PMID:21156228

  2. Evolutionary insights from Erwinia amylovora genomics.

    PubMed

    Smits, Theo H M; Rezzonico, Fabio; Duffy, Brion

    2011-08-20

    Evolutionary genomics is coming into focus with the recent availability of complete sequences for many bacterial species. A hypothesis on the evolution of virulence factors in the plant pathogen Erwinia amylovora, the causative agent of fire blight, was generated using comparative genomics with the genomes of E. amylovora, Erwinia pyrifoliae and Erwinia tasmaniensis. Putative virulence factors were mapped to the proposed genealogy of the genus Erwinia that is based on phylogenetic and genomic data. Ancestral origin of several virulence factors was identified, including levan biosynthesis, sorbitol metabolism, three T3SS and two T6SS. Other factors appeared to have been acquired after divergence of pathogenic species, including a second flagellar gene and two glycosyltransferases involved in amylovoran biosynthesis. E. amylovora singletons include 3 unique T3SS effectors that may explain differential virulence/host ranges. E. amylovora also has a unique T1SS export system, and a unique third T6SS gene cluster. Genetic analysis revealed signatures of foreign DNA suggesting that horizontal gene transfer is responsible for some of these differential features between the three species.

  3. Genomic Structure of an Economically Important Cyanobacterium, Arthrospira (Spirulina) platensis NIES-39

    PubMed Central

    Fujisawa, Takatomo; Narikawa, Rei; Okamoto, Shinobu; Ehira, Shigeki; Yoshimura, Hidehisa; Suzuki, Iwane; Masuda, Tatsuru; Mochimaru, Mari; Takaichi, Shinichi; Awai, Koichiro; Sekine, Mitsuo; Horikawa, Hiroshi; Yashiro, Isao; Omata, Seiha; Takarada, Hiromi; Katano, Yoko; Kosugi, Hiroki; Tanikawa, Satoshi; Ohmori, Kazuko; Sato, Naoki; Ikeuchi, Masahiko; Fujita, Nobuyuki; Ohmori, Masayuki

    2010-01-01

    A filamentous non-N2-fixing cyanobacterium, Arthrospira (Spirulina) platensis, is an important organism for industrial applications and as a food supply. Almost the complete genome of A. platensis NIES-39 was determined in this study. The genome structure of A. platensis is estimated to be a single, circular chromosome of 6.8 Mb, based on optical mapping. Annotation of this 6.7 Mb sequence yielded 6630 protein-coding genes as well as two sets of rRNA genes and 40 tRNA genes. Of the protein-coding genes, 78% are similar to those of other organisms; the remaining 22% are currently unknown. A total of 612 kb of the genome comprises group II introns, insertion sequences and some repetitive elements. Group I introns are located in a protein-coding region. Abundant restriction-modification systems were determined. Unique features in the gene composition were noted, particularly in a large number of genes for adenylate cyclase and haemolysin-like Ca2+-binding proteins and in chemotaxis proteins. Filament-specific genes were highlighted by comparative genomic analysis. PMID:20203057

  4. Comparative genomics of Lactobacillus

    PubMed Central

    Kant, Ravi; Blom, Jochen; Palva, Airi; Siezen, Roland J.; de Vos, Willem M.

    2011-01-01

    Summary The genus Lactobacillus includes a diverse group of bacteria consisting of many species that are associated with fermentations of plants, meat or milk. In addition, various lactobacilli are natural inhabitants of the intestinal tract of humans and other animals. Finally, several Lactobacillus strains are marketed as probiotics as their consumption can confer a health benefit to host. Presently, 154 Lactobacillus species are known and a growing fraction of these are subject to draft genome sequencing. However, complete genome sequences are needed to provide a platform for detailed genomic comparisons. Therefore, we selected a total of 20 genomes of various Lactobacillus strains for which complete genomic sequences have been reported. These genomes had sizes varying from 1.8 to 3.3 Mb and other characteristic features, such as G+C content that ranged from 33% to 51%. The Lactobacillus pan genome was found to consist of approximately 14 000 protein‐encoding genes while all 20 genomes shared a total of 383 sets of orthologous genes that defined the Lactobacillus core genome (LCG). Based on advanced phylogeny of the proteins encoded by this LCG, we grouped the 20 strains into three main groups and defined core group genes present in all genomes of a single group, signature group genes shared in all genomes of one group but absent in all other Lactobacillus genomes, and Group‐specific ORFans present in core group genes of one group and absent in all other complete genomes. The latter are of specific value in defining the different groups of genomes. The study provides a platform for present individual comparisons as well as future analysis of new Lactobacillus genomes. PMID:21375712
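
    The pan genome / core genome distinction used above reduces to set operations once every genome is represented by the ortholog groups it contains. The Python sketch below assumes that representation; the strain names and group identifiers are hypothetical, and the ortholog clustering itself (the hard part of the study) is taken as given.

```python
def pan_and_core(genomes):
    """Return (pan, core): groups found in any genome and groups found in all."""
    families = [set(groups) for groups in genomes.values()]
    if not families:
        return set(), set()
    pan = set().union(*families)
    core = set.intersection(*families)
    return pan, core


def strain_specific(genomes):
    """Map each genome to the ortholog groups that occur in no other genome."""
    unique = {}
    for name, groups in genomes.items():
        others = set().union(
            *(set(g) for other, g in genomes.items() if other != name)
        )
        unique[name] = set(groups) - others
    return unique


# Toy input (assumption): ortholog group IDs per strain.
genomes = {
    "strain_A": {"og1", "og2", "og3"},
    "strain_B": {"og1", "og2", "og4"},
    "strain_C": {"og1", "og5"},
}

pan, core = pan_and_core(genomes)
unique = strain_specific(genomes)
```

    Applying the same two functions to the genomes of a single phylogenetic group, rather than to all genomes, would give that group's core genes; subtracting everything seen outside the group would give its signature genes.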

  5. Comparing Mycobacterium tuberculosis genomes using genome topology networks.

    PubMed

    Jiang, Jianping; Gu, Jianlei; Zhang, Liang; Zhang, Chenyi; Deng, Xiao; Dou, Tonghai; Zhao, Guoping; Zhou, Yan

    2015-02-14

    Over the last decade, emerging research methods, such as comparative genomic analysis and phylogenetic study, have yielded new insights into genotypes and phenotypes of closely related bacterial strains. Several findings have revealed that genomic structural variations (SVs), including gene gain/loss, gene duplication and genome rearrangement, can lead to different phenotypes among strains, and an investigation of genes affected by SVs may extend our knowledge of the relationships between SVs and phenotypes in microbes, especially in pathogenic bacteria. In this work, we introduce a 'Genome Topology Network' (GTN) method based on gene homology and gene locations to analyze genomic SVs and perform phylogenetic analysis. Furthermore, the concept of 'unfixed ortholog' has been proposed, whose members are affected by SVs in genome topology among close species. To improve the precision of 'unfixed ortholog' recognition, a strategy to detect annotation differences and complete gene annotation was applied. To assess the GTN method, a set of thirteen complete M. tuberculosis genomes was analyzed as a case study. GTNs with two different gene homology-assigning methods were built, the Clusters of Orthologous Groups (COG) method and the orthoMCL clustering method, and two phylogenetic trees were constructed accordingly, which may provide additional insights into whole genome-based phylogenetic analysis. We obtained 24 unfixable COG groups, of which most members were related to immunogenicity and drug resistance, such as PPE-repeat proteins (COG5651) and transcriptional regulator TetR gene family members (COG1309). The GTN method has been implemented in PERL and released on our website. The tool can be downloaded from http://homepage.fudan.edu.cn/zhouyan/gtn/ , and allows re-annotating the 'lost' genes among closely related genomes, analyzing genes affected by SVs, and performing phylogenetic analysis. With this tool, many immunogenic-related and drug resistance-related genes were found to be affected by SVs in M. tuberculosis genomes. We believe that the GTN method will be suitable for the exploration of genomic SVs in connection with biological features of bacterial strains, and that GTN-based phylogenetic analysis will provide additional insights into whole genome-based phylogenetic analysis.

  6. GeneWiz browser: An Interactive Tool for Visualizing Sequenced Chromosomes.

    PubMed

    Hallin, Peter F; Stærfeldt, Hans-Henrik; Rotenberg, Eva; Binnewies, Tim T; Benham, Craig J; Ussery, David W

    2009-09-25

    We present an interactive web application for visualizing genomic data of prokaryotic chromosomes. The tool (GeneWiz browser) allows users to carry out various analyses such as mapping alignments of homologous genes to other genomes, mapping of short sequencing reads to a reference chromosome, and calculating DNA properties such as curvature or stacking energy along the chromosome. The GeneWiz browser produces an interactive graphic that enables zooming from a global scale down to single nucleotides, without changing the size of the plot. Its ability to disproportionally zoom provides optimal readability and increased functionality compared to other browsers. The tool allows the user to select the display of various genomic features, color setting and data ranges. Custom numerical data can be added to the plot allowing, for example, visualization of gene expression and regulation data. Further, standard atlases are pre-generated for all prokaryotic genomes available in GenBank, providing a fast overview of all available genomes, including recently deposited genome sequences. The tool is available online from http://www.cbs.dtu.dk/services/gwBrowser. Supplemental material including interactive atlases is available online at http://www.cbs.dtu.dk/services/gwBrowser/suppl/.

  7. Genome structure of bdelloid rotifers: shaped by asexuality or desiccation?

    PubMed

    Gladyshev, Eugene A; Arkhipova, Irina R

    2010-01-01

    Bdelloid rotifers are microscopic invertebrate animals best known for their ancient asexuality and the ability to survive desiccation at any life stage. Both factors are expected to have a profound influence on their genome structure. Recent molecular studies demonstrated that, although the gene-rich regions of bdelloid genomes are organized as colinear pairs of closely related sequences and depleted in repetitive DNA, subtelomeric regions harbor diverse transposable elements and horizontally acquired genes of foreign origin. Although asexuality is expected to result in depletion of deleterious transposons, only desiccation appears to have the power to produce all the uncovered genomic peculiarities. Repair of desiccation-induced DNA damage would require the presence of a homologous template, maintaining colinear pairs in gene-rich regions and selecting against insertion of repetitive DNA that might cause chromosomal rearrangements. Desiccation may also induce a transient state of competence in recovering animals, allowing them to acquire environmental DNA. Even if bdelloids engage in rare or obscure forms of sexual reproduction, all these features could still be present. The relative contribution of asexuality and desiccation to genome organization may be clarified by analyzing whole-genome sequences and comparing foreign gene and transposon content in species which lost the ability to survive desiccation.

  8. Genomics and Biochemistry of Saccharomyces cerevisiae Wine Yeast Strains.

    PubMed

    Eldarov, M A; Kishkovskaia, S A; Tanaschuk, T N; Mardanov, A V

    2016-12-01

    Saccharomyces yeasts have been used for millennia for the production of beer, wine, bread, and other fermented products. Long-term "unconscious" selection and domestication led to the selection of hundreds of strains with desired production traits having significant phenotypic and genetic differences from their wild ancestors. This review summarizes the results of recent research in deciphering the genomes of wine Saccharomyces strains, the use of comparative genomics methods to study the mechanisms of yeast genome evolution under conditions of artificial selection, and the use of genomic and postgenomic approaches to identify the molecular nature of the important characteristics of commercial wine strains of Saccharomyces. Succinctly, data concerning metagenomics of microbial communities of grapes and wine and the dynamics of yeast and bacterial flora in the course of winemaking is provided. A separate section is devoted to an overview of the physiological, genetic, and biochemical features of sherry yeast strains used to produce biologically aged wines. The goal of the review is to convince the reader of the efficacy of new genomic and postgenomic technologies as tools for developing strategies for targeted selection and creation of new strains using "classical" and modern techniques for improving winemaking technology.

  9. From Ambiguities to Insights: Query-based Comparisons of High-Dimensional Data

    NASA Astrophysics Data System (ADS)

    Kowalski, Jeanne; Talbot, Conover; Tsai, Hua L.; Prasad, Nijaguna; Umbricht, Christopher; Zeiger, Martha A.

    2007-11-01

    Genomic technologies will revolutionize drug discovery and development; that much is universally agreed upon. The high dimension of data from such technologies has challenged available data analytic methods; that much is apparent. To date, large-scale data repositories have not been utilized in ways that permit their wealth of information to be efficiently processed for knowledge, presumably due in large part to inadequate analytical tools to address numerous comparisons of high-dimensional data. In candidate gene discovery, expression comparisons are often made between two features (e.g., cancerous versus normal), such that the enumeration of outcomes is manageable. With multiple features, the setting becomes more complex, in terms of comparing expression levels of tens of thousands of transcripts across hundreds of features. In this case, the number of outcomes, while enumerable, rapidly becomes large and unmanageable, and scientific inquiries become more abstract, such as "which one of these (compounds, stimuli, etc.) is not like the others?" We develop analytical tools that promote more extensive, efficient, and rigorous utilization of the public data resources generated by the massive support of genomic studies. Our work innovates by enabling access to such metadata with logically formulated scientific inquiries that define, compare and integrate query-comparison pair relations for analysis. We demonstrate our computational tool's potential to address an outstanding biomedical informatics issue of identifying reliable molecular markers in thyroid cancer. Our proposed query-based comparison (QBC) facilitates access to and efficient utilization of metadata through logically formed inquiries expressed as query-based comparisons by organizing and comparing results from biotechnologies to address applications in biomedicine.

  10. Complete genome sequence of the chromate-reducing bacterium Thermoanaerobacter thermohydrosulfuricus strain BSB-33

    DOE PAGES

    Bhattacharya, Pamela; Barnebey, Adam; Zemla, Marcin; ...

    2015-10-05

    Thermoanaerobacter thermohydrosulfuricus BSB-33 is a thermophilic gram positive obligate anaerobe isolated from a hot spring in West Bengal, India. Unlike other T. thermohydrosulfuricus strains, BSB-33 is able to anaerobically reduce Fe(III) and Cr(VI) optimally at 60 °C. BSB-33 is the first Cr(VI) reducing T. thermohydrosulfuricus genome sequenced and of particular interest for bioremediation of environmental chromium contaminations. Here we discuss features of T. thermohydrosulfuricus BSB-33 and the unique genetic elements that may account for the peculiar metal reducing properties of this organism. The T. thermohydrosulfuricus BSB-33 genome comprises 2597606 bp encoding 2581 protein genes, 12 rRNA, 193 pseudogenes and has a G + C content of 34.20 %. Lastly, putative chromate reductases were identified by comparative analyses with other Thermoanaerobacter and chromate-reducing bacteria.

  11. Mild Intellectual Disability Associated with a Progeny of Father-Daughter Incest: Genetic and Environmental Considerations

    ERIC Educational Resources Information Center

    Ansermet, Francois; Lespinasse, James; Gimelli, Stefania; Bena, Frederique; Paoloni-Giacobino, Ariane

    2010-01-01

    We report the case of a 34-year-old female resulting from a father-daughter sexual abuse and presenting a phenotype of mild intellectual disability with minor dysmorphic features. Karyotyping showed a normal 46, XX constitution. Array-based comparative genomic hybridization (array-CGH) revealed a heterozygote 320kb 6p22.3 microdeletion in the…

  12. Know Your Enemy: Successful Bioinformatic Approaches to Predict Functional RNA Structures in Viral RNAs.

    PubMed

    Lim, Chun Shen; Brown, Chris M

    2017-01-01

    Structured RNA elements may control virus replication, transcription and translation, and their distinct features are being exploited by novel antiviral strategies. Viral RNA elements continue to be discovered using combinations of experimental and computational analyses. However, the wealth of sequence data, notably from deep viral RNA sequencing, viromes, and metagenomes, necessitates computational approaches being used as an essential discovery tool. In this review, we describe practical approaches being used to discover functional RNA elements in viral genomes. In addition to success stories in new and emerging viruses, these approaches have revealed some surprising new features of well-studied viruses e.g., human immunodeficiency virus, hepatitis C virus, influenza, and dengue viruses. Some notable discoveries were facilitated by new comparative analyses of diverse viral genome alignments. Importantly, comparative approaches for finding RNA elements embedded in coding and non-coding regions differ. With the exponential growth of computer power we have progressed from stem-loop prediction on single sequences to cutting edge 3D prediction, and from command line to user friendly web interfaces. Despite these advances, many powerful, user friendly prediction tools and resources are underutilized by the virology community.

  13. Know Your Enemy: Successful Bioinformatic Approaches to Predict Functional RNA Structures in Viral RNAs

    PubMed Central

    Lim, Chun Shen; Brown, Chris M.

    2018-01-01

    Structured RNA elements may control virus replication, transcription and translation, and their distinct features are being exploited by novel antiviral strategies. Viral RNA elements continue to be discovered using combinations of experimental and computational analyses. However, the wealth of sequence data, notably from deep viral RNA sequencing, viromes, and metagenomes, necessitates computational approaches being used as an essential discovery tool. In this review, we describe practical approaches being used to discover functional RNA elements in viral genomes. In addition to success stories in new and emerging viruses, these approaches have revealed some surprising new features of well-studied viruses e.g., human immunodeficiency virus, hepatitis C virus, influenza, and dengue viruses. Some notable discoveries were facilitated by new comparative analyses of diverse viral genome alignments. Importantly, comparative approaches for finding RNA elements embedded in coding and non-coding regions differ. With the exponential growth of computer power we have progressed from stem-loop prediction on single sequences to cutting edge 3D prediction, and from command line to user friendly web interfaces. Despite these advances, many powerful, user friendly prediction tools and resources are underutilized by the virology community. PMID:29354101

  14. Comparative genomics reveals adaptation by Alteromonas sp. SN2 to marine tidal-flat conditions: cold tolerance and aromatic hydrocarbon metabolism.

    PubMed

    Math, Renukaradhya K; Jin, Hyun Mi; Kim, Jeong Myeong; Hahn, Yoonsoo; Park, Woojun; Madsen, Eugene L; Jeon, Che Ok

    2012-01-01

    Alteromonas species are globally distributed copiotrophic bacteria in marine habitats. Among these, sea-tidal flats are distinctive: undergoing seasonal temperature and oxygen-tension changes, plus periodic exposure to petroleum hydrocarbons. Strain SN2 of the genus Alteromonas was isolated from hydrocarbon-contaminated sea-tidal flat sediment and has been shown to metabolize aromatic hydrocarbons there. Strain SN2's genomic features were analyzed bioinformatically and compared to those of Alteromonas macleodii ecotypes: AltDE and ATCC 27126. Strain SN2's genome differs from that of the other two strains in: size, average nucleotide identity value, tRNA genes, noncoding RNAs, dioxygenase gene content, signal transduction genes, and the degree to which genes collected during the Global Ocean Sampling project are represented. Patterns in genetic characteristics (e.g., GC content, GC skew, Karlin signature, CRISPR gene homology) indicate that strain SN2's genome architecture has been altered via horizontal gene transfer (HGT). Experiments proved that strain SN2 was far more cold tolerant, especially at 5°C, than the other two strains. Consistent with the HGT hypothesis, a total of 15 genomic islands in strain SN2 likely confer ecological fitness traits (especially membrane transport, aromatic hydrocarbon metabolism, and fatty acid biosynthesis) specific to the adaptation of strain SN2 to its seasonally cold sea-tidal flat habitat.

  15. Comparative Genomics Reveals Adaptation by Alteromonas sp. SN2 to Marine Tidal-Flat Conditions: Cold Tolerance and Aromatic Hydrocarbon Metabolism

    PubMed Central

    Math, Renukaradhya K.; Jin, Hyun Mi; Kim, Jeong Myeong; Hahn, Yoonsoo; Park, Woojun; Madsen, Eugene L.; Jeon, Che Ok

    2012-01-01

    Alteromonas species are globally distributed copiotrophic bacteria in marine habitats. Among these, sea-tidal flats are distinctive: undergoing seasonal temperature and oxygen-tension changes, plus periodic exposure to petroleum hydrocarbons. Strain SN2 of the genus Alteromonas was isolated from hydrocarbon-contaminated sea-tidal flat sediment and has been shown to metabolize aromatic hydrocarbons there. Strain SN2's genomic features were analyzed bioinformatically and compared to those of Alteromonas macleodii ecotypes: AltDE and ATCC 27126. Strain SN2's genome differs from that of the other two strains in: size, average nucleotide identity value, tRNA genes, noncoding RNAs, dioxygenase gene content, signal transduction genes, and the degree to which genes collected during the Global Ocean Sampling project are represented. Patterns in genetic characteristics (e.g., GC content, GC skew, Karlin signature, CRISPR gene homology) indicate that strain SN2's genome architecture has been altered via horizontal gene transfer (HGT). Experiments proved that strain SN2 was far more cold tolerant, especially at 5°C, than the other two strains. Consistent with the HGT hypothesis, a total of 15 genomic islands in strain SN2 likely confer ecological fitness traits (especially membrane transport, aromatic hydrocarbon metabolism, and fatty acid biosynthesis) specific to the adaptation of strain SN2 to its seasonally cold sea-tidal flat habitat. PMID:22563400
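
    The abstract above cites GC content and GC skew among the compositional signals used to infer horizontal gene transfer. As a minimal sketch of how these two quantities are commonly computed over sliding windows (the function names, window size, and toy sequence are illustrative assumptions, not the authors' pipeline):

```python
def gc_content(seq):
    """Fraction of A/C/G/T bases that are G or C (ambiguous bases ignored)."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    total = gc + seq.count("A") + seq.count("T")
    return gc / total if total else 0.0


def gc_skew(seq):
    """GC skew, defined as (G - C) / (G + C); 0.0 if the window has no G or C."""
    seq = seq.upper()
    g, c = seq.count("G"), seq.count("C")
    return (g - c) / (g + c) if (g + c) else 0.0


def sliding_windows(seq, size, step):
    """Yield (start, window) pairs for every full-length window."""
    for start in range(0, len(seq) - size + 1, step):
        yield start, seq[start:start + size]


if __name__ == "__main__":
    # Hypothetical toy sequence; a real analysis would read a genome FASTA file.
    toy = "ATGCGGGCATATATTTGCGCGCCC"
    for start, window in sliding_windows(toy, size=8, step=8):
        print(start, round(gc_content(window), 3), round(gc_skew(window), 3))
```

    Windows whose composition departs sharply from the genome-wide average are the kind of pattern such analyses flag as candidate horizontally acquired regions; the threshold for "sharply" is a design choice left out of this sketch.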

  16. Comparing de novo genome assembly: the long and short of it.

    PubMed

    Narzisi, Giuseppe; Mishra, Bud

    2011-04-29

    Recent advances in DNA sequencing technology and their focal role in Genome Wide Association Studies (GWAS) have rekindled a growing interest in the whole-genome sequence assembly (WGSA) problem, thereby inundating the field with a plethora of new formalizations, algorithms, heuristics and implementations. And yet, scant attention has been paid to comparative assessments of these assemblers' quality and accuracy. No commonly accepted and standardized method for comparison exists yet. Even worse, widely used metrics to compare the assembled sequences emphasize only size, poorly capturing the contig quality and accuracy. This paper addresses these concerns: it highlights common anomalies in assembly accuracy through a rigorous study of several assemblers, compared under both standard metrics (N50, coverage, contig sizes, etc.) and a more comprehensive metric (Feature-Response Curves, FRC) that is introduced here; FRC transparently captures the trade-offs between contigs' quality and their sizes. For this purpose, most of the publicly available major sequence assemblers--both for low-coverage long (Sanger) and high-coverage short (Illumina) reads technologies--are compared. These assemblers are applied to microbial (Escherichia coli, Brucella, Wolbachia, Staphylococcus, Helicobacter) and partial human genome sequences (Chr. Y), using sequence reads of various read-lengths, coverages, accuracies, and with and without mate-pairs. It is hoped that, based on these evaluations, computational biologists will identify innovative sequence assembly paradigms, bioinformaticists will determine promising approaches for developing "next-generation" assemblers, and biotechnologists will formulate more meaningful design desiderata for sequencing technology platforms. A new software tool for computing the FRC metric has been developed and is available through the AMOS open-source consortium.
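
    To make the abstract's criticism of size-only metrics concrete, here is a minimal sketch of the standard N50 calculation (the function name and the example contig lengths are illustrative assumptions; this is not the FRC tool described above). N50 is the length of the contig at which the cumulative length, taken over contigs sorted longest first, reaches at least half of the total assembly length.

```python
def n50(contig_lengths):
    """Return the N50 of an assembly given its contig lengths."""
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("need at least one contig length")
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        # The first contig that pushes the running sum past half the
        # total assembly length defines N50.
        if running >= half_total:
            return length


if __name__ == "__main__":
    # Hypothetical contig lengths, for illustration only.
    print(n50([2, 2, 2, 3, 3, 4, 8, 8]))
```

    Because the function sees only lengths, two assemblies with identical contig sizes receive the same N50 even if one of them contains misjoined contigs, which is the limitation the abstract raises.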

  17. Identification of a Recurrent Microdeletion at 17q23.1q23.2 Flanked by Segmental Duplications Associated with Heart Defects and Limb Abnormalities

    PubMed Central

    Ballif, Blake C.; Theisen, Aaron; Rosenfeld, Jill A.; Traylor, Ryan N.; Gastier-Foster, Julie; Thrush, Devon Lamb; Astbury, Caroline; Bartholomew, Dennis; McBride, Kim L.; Pyatt, Robert E.; Shane, Kate; Smith, Wendy E.; Banks, Valerie; Gallentine, William B.; Brock, Pamela; Rudd, M. Katharine; Adam, Margaret P.; Keene, Julia A.; Phillips, John A.; Pfotenhauer, Jean P.; Gowans, Gordon C.; Stankiewicz, Pawel; Bejjani, Bassem A.; Shaffer, Lisa G.

    2010-01-01

    Segmental duplications, which comprise ∼5%–10% of the human genome, are known to mediate medically relevant deletions, duplications, and inversions through nonallelic homologous recombination (NAHR) and have been suggested to be hot spots in chromosome evolution and human genomic instability. We report seven individuals with microdeletions at 17q23.1q23.2, identified by microarray-based comparative genomic hybridization (aCGH). Six of the seven deletions are ∼2.2 Mb in size and flanked by large segmental duplications of >98% sequence identity and in the same orientation. One of the deletions is ∼2.8 Mb in size and is flanked on the distal side by a segmental duplication, whereas the proximal breakpoint falls between segmental duplications. These characteristics suggest that NAHR mediated six out of seven of these rearrangements. These individuals have common features, including mild to moderate developmental delay (particularly speech delay), microcephaly, postnatal growth retardation, heart defects, and hand, foot, and limb abnormalities. Although all individuals had at least mild dysmorphic facial features, there was no characteristic constellation of features that would elicit clinical suspicion of a specific disorder. The identification of common clinical features suggests that microdeletions at 17q23.1q23.2 constitute a novel syndrome. Furthermore, the inclusion in the minimal deletion region of TBX2 and TBX4, transcription factors belonging to a family of genes implicated in a variety of developmental pathways including those of heart and limb, suggests that these genes may play an important role in the phenotype of this emerging syndrome. PMID:20206336

  18. RNA Editing in Plant Mitochondria

    NASA Astrophysics Data System (ADS)

    Hiesel, Rudolf; Wissinger, Bernd; Schuster, Wolfgang; Brennicke, Axel

    1989-12-01

    Comparative sequence analysis of genomic and complementary DNA clones from several mitochondrial genes in the higher plant Oenothera revealed nucleotide sequence divergences between the genomic and the messenger RNA-derived sequences. These sequence alterations could be most easily explained by specific post-transcriptional nucleotide modifications. Most of the nucleotide exchanges in coding regions lead to altered codons in the mRNA that specify amino acids better conserved in evolution than those encoded by the genomic DNA. Several instances show that the genomic arginine codon CGG is edited in the mRNA to the tryptophan codon TGG in amino acid positions that are highly conserved as tryptophan in the homologous proteins of other species. This editing suggests that the standard genetic code is used in plant mitochondria and resolves the frequent coincidence of CGG codons and tryptophan in different plant species. The apparently frequent and non-species-specific equivalency of CGG and TGG codons in particular suggests that RNA editing is a common feature of all higher plant mitochondria.

  19. Marker chromosome genomic structure and temporal origin implicate a chromoanasynthesis event in a family with pleiotropic psychiatric phenotypes.

    PubMed

    Grochowski, Christopher M; Gu, Shen; Yuan, Bo; Tcw, Julia; Brennand, Kristen J; Sebat, Jonathan; Malhotra, Dheeraj; McCarthy, Shane; Rudolph, Uwe; Lindstrand, Anna; Chong, Zechen; Levy, Deborah L; Lupski, James R; Carvalho, Claudia M B

    2018-04-25

    Small supernumerary marker chromosomes (sSMC) are chromosomal fragments difficult to characterize genomically. Here, we detail a proband with schizoaffective disorder and a mother with bipolar disorder with psychotic features who present with a marker chromosome that segregates with disease. We explored the architecture of this marker and investigated its temporal origin. Array comparative genomic hybridization (aCGH) analysis revealed three duplications and three triplications that spanned the short arm of chromosome 9, suggestive of a chromoanasynthesis-like event. Segregation of marker genotypes, phased using sSMC mosaicism in the mother, provided evidence that it was generated during a germline-level event in the proband's maternal grandmother. Whole-genome sequencing (WGS) was performed to resolve the structure and junctions of the chromosomal fragments, revealing further complexities. While structural variations have been previously associated with neuropsychiatric disorders and marker chromosomes, here we detail the precise architecture, human life-cycle genesis, and propose a DNA replicative/repair mechanism underlying formation. © 2018 Wiley Periodicals, Inc.

  20. Comparative and functional genomics of the Lactococcus lactis taxon; insights into evolution and niche adaptation.

    PubMed

    Kelleher, Philip; Bottacini, Francesca; Mahony, Jennifer; Kilcawley, Kieran N; van Sinderen, Douwe

    2017-03-29

    Lactococcus lactis is among the most widely studied lactic acid bacterial species due to its long history of safe use and economic importance to the dairy industry, where it is exploited as a starter culture in cheese production. In the current study, we report on the complete sequencing of 16 L. lactis subsp. lactis and L. lactis subsp. cremoris genomes. The chromosomal features of these 16 L. lactis strains in conjunction with 14 completely sequenced, publicly available lactococcal chromosomes were assessed with particular emphasis on discerning the L. lactis subspecies division, evolution and niche adaptation. The deduced pan-genome of L. lactis was found to be closed, indicating that the representative data sets employed for this analysis are sufficient to fully describe the genetic diversity of the taxon. Niche adaptation appears to play a significant role in governing the genetic content of each L. lactis subspecies, while (differential) genome decay and redundancy in the dairy niche is also highlighted.

  1. Orpheovirus IHUMI-LCC2: A New Virus among the Giant Viruses

    PubMed Central

    Andreani, Julien; Khalil, Jacques Y. B.; Baptiste, Emeline; Hasni, Issam; Michelle, Caroline; Raoult, Didier; Levasseur, Anthony; La Scola, Bernard

    2018-01-01

    Giant viruses continue to invade the world of virology, in gigantic genome sizes and various particle shapes. Strain discoveries and metagenomic studies make it possible to reveal the complexity of these microorganisms, their origins, ecosystems and putative roles. We isolated from a rat stool sample a new giant virus “Orpheovirus IHUMI-LCC2,” using Vermamoeba vermiformis as host cell. In this paper, we describe the main genomic features and replicative cycle of Orpheovirus IHUMI-LCC2. It possesses a circular genome exceeding 1.4 Megabases with 25% G+C content and ovoidal-shaped particles ranging from 900 to 1300 nm. Particles are closed by at least one thick membrane in a single ostiole-like shape in their apex. Phylogenetic analysis and the reciprocal best hit for Orpheovirus show a connection to the proposed Pithoviridae family. However, some genomic characteristics bear witness to a completely divergent evolution for Orpheovirus IHUMI-LCC2 when compared to Cedratviruses or Pithoviruses. PMID:29403444

  2. Mouse Genome Database: From sequence to phenotypes and disease models

    PubMed Central

    Richardson, Joel E.; Kadin, James A.; Smith, Cynthia L.; Blake, Judith A.; Bult, Carol J.

    2015-01-01

    Summary The Mouse Genome Database (MGD, www.informatics.jax.org) is the international scientific database for genetic, genomic, and biological data on the laboratory mouse to support the research requirements of the biomedical community. To accomplish this goal, MGD provides broad data coverage, serves as the authoritative standard for mouse nomenclature for genes, mutants, and strains, and curates and integrates many types of data from literature and electronic sources. Among the key data sets MGD supports are: the complete catalog of mouse genes and genome features, comparative homology data for mouse and vertebrate genes, the authoritative set of Gene Ontology (GO) annotations for mouse gene functions, a comprehensive catalog of mouse mutations and their phenotypes, and a curated compendium of mouse models of human diseases. Here, we describe the data acquisition process, specifics about MGD's key data areas, methods to access and query MGD data, and outreach and user help facilities. genesis 53:458–473, 2015. © 2015 The Authors. Genesis Published by Wiley Periodicals, Inc. PMID:26150326

  3. Pseudomonas syringae pv. actinidiae Draft Genomes Comparison Reveal Strain-Specific Features Involved in Adaptation and Virulence to Actinidia Species

    PubMed Central

    Marcelletti, Simone; Ferrante, Patrizia; Petriccione, Milena; Firrao, Giuseppe; Scortichini, Marco

    2011-01-01

    A recent re-emerging bacterial canker disease incited by Pseudomonas syringae pv. actinidiae (Psa) is causing severe economic losses to Actinidia chinensis and A. deliciosa cultivations in southern Europe, New Zealand, Chile and South Korea. Little is known about the genetic features of this pathovar. We generated genome-wide Illumina sequence data from two Psa strains causing outbreaks of bacterial canker on the A. deliciosa cv. Hayward in Japan (J-Psa, type-strain of the pathovar) and in Italy (I-Psa) in 1984 and 1992, respectively, as well as from a Psa strain (I2-Psa) isolated at the beginning of the recent epidemic on A. chinensis cv. Hort16A in Italy. All strains were isolated from typical leaf spot symptoms. The phylogenetic relationships revealed that Psa is more closely related to P. s. pv. theae than to P. avellanae within genomospecies 8. Comparative genomic analyses revealed both relevant intrapathovar variations and putative pathovar-specific genomic regions in Psa. The genomic sequences of J-Psa and I-Psa were very similar. Conversely, the I2-Psa genome encodes four additional effector protein genes, lacks a 50 kb plasmid and the phaseolotoxin gene cluster, argK-tox, but has acquired a 160 kb plasmid and putative prophage sequences. Several lines of evidence from the analysis of the genome sequences support the hypothesis that this strain did not evolve from the Psa population that caused the epidemics in 1984–1992 in Japan and Italy but rather is the product of a recent independent evolution of the pathovar actinidiae for infecting Actinidia spp. All Psa strains share the genetic potential for copper resistance, antibiotic detoxification, high affinity iron acquisition and detoxification of nitric oxide of plant origin. Similar to other sequenced phytopathogenic pseudomonads associated with woody plant species, the Psa strains isolated from leaves also display a set of genes involved in the catabolism of plant-derived aromatic compounds. PMID:22132095

  4. Analysis of the Legionella longbeachae Genome and Transcriptome Uncovers Unique Strategies to Cause Legionnaires' Disease

    PubMed Central

    Rusniok, Christophe; Lomma, Mariella; Dervins-Ravault, Delphine; Newton, Hayley J.; Sansom, Fiona M.; Jarraud, Sophie; Zidane, Nora; Ma, Laurence; Bouchier, Christiane; Etienne, Jerôme; Hartland, Elizabeth L.; Buchrieser, Carmen

    2010-01-01

    Legionella pneumophila and L. longbeachae are two species of a large genus of bacteria that are ubiquitous in nature. L. pneumophila is mainly found in natural and artificial water circuits while L. longbeachae is mainly present in soil. Under the appropriate conditions both species are human pathogens, capable of causing a severe form of pneumonia termed Legionnaires' disease. Here we report the sequencing and analysis of four L. longbeachae genomes: one complete genome sequence of L. longbeachae strain NSW150 serogroup (Sg) 1, and three draft genome sequences, one belonging to Sg1 and two to Sg2. The genome organization and gene content of the four L. longbeachae genomes are highly conserved, indicating strong pressure for niche adaptation. Analysis and comparison of L. longbeachae strain NSW150 with L. pneumophila revealed common but also unexpected features specific to this pathogen. The interaction with host cells shows distinct features from L. pneumophila, as L. longbeachae possesses a unique repertoire of putative Dot/Icm type IV secretion system substrates, eukaryotic-like and eukaryotic domain proteins, and encodes additional secretion systems. However, analysis of the ability of a dotA mutant of L. longbeachae NSW150 to replicate in Acanthamoeba castellanii and in a mouse lung infection model showed that the Dot/Icm type IV secretion system is also essential for the virulence of L. longbeachae. In contrast to L. pneumophila, L. longbeachae does not encode flagella, thereby providing a possible explanation for differences in mouse susceptibility to infection between the two pathogens. Furthermore, transcriptome analysis revealed that L. longbeachae has a less pronounced biphasic life cycle as compared to L. pneumophila, and genome analysis and electron microscopy suggested that L. longbeachae is encapsulated. These species-specific differences may account for the different environmental niches and disease epidemiology of these two Legionella species. PMID:20174605

  5. Hidden genetic variation in the germline genome of Tetrahymena thermophila.

    PubMed

    Dimond, K L; Zufall, R A

    2016-06-01

    Genome architecture varies greatly among eukaryotes. This diversity may profoundly affect the origin and maintenance of genetic variation within a population. Ciliates are microbial eukaryotes with unusual genome features, such as the separation of germline and somatic genomes within a single cell and amitotic division. These features have previously been proposed to increase the rate of molecular evolution in these species. Here, we assessed the fitness effects of genetic variation in the two genomes of natural isolates of the ciliate Tetrahymena thermophila. We find more extensive genetic variation in fitness in the transcriptionally silent germline genome than in the expressed somatic genome. Surprisingly, this variation is not primarily deleterious, but has both beneficial and deleterious effects. We conclude that Tetrahymena genome architecture allows for the maintenance of genetic variation that would otherwise be eliminated by selection. We consider the effect of selection on the two genomes and the impacts of reproductive strategies and the mechanism of sex determination on the structure of this variation. © 2016 European Society For Evolutionary Biology. Journal of Evolutionary Biology © 2016 European Society For Evolutionary Biology.

  6. Comparative Genomic and Phenotypic Characterization of Pathogenic and Non-Pathogenic Strains of Xanthomonas arboricola Reveals Insights into the Infection Process of Bacterial Spot Disease of Stone Fruits

    PubMed Central

    Garita-Cambronero, Jerson; Palacio-Bielsa, Ana; López, María M.

    2016-01-01

    Xanthomonas arboricola pv. pruni is the causal agent of bacterial spot disease of stone fruits, a quarantinable pathogen in several areas worldwide, including the European Union. In order to develop efficient control methods for this disease, it is necessary to improve the understanding of the key determinants associated with host restriction, colonization and the development of pathogenesis. After an initial characterization, by multilocus sequence analysis, of 15 strains of X. arboricola isolated from Prunus, one strain did not group into the pathovar pruni or into other pathovars of this species and therefore it was identified and defined as a X. arboricola pv. pruni look-a-like. This non-pathogenic strain and two typical strains of X. arboricola pv. pruni were selected for a whole genome and phenotype comparative analysis in features associated with the pathogenesis process in Xanthomonas. Comparative analysis among these bacterial strains isolated from Prunus spp. and the inclusion of 15 publicly available genome sequences from other pathogenic and non-pathogenic strains of X. arboricola revealed variations in the phenotype associated with variations in the profiles of TonB-dependent transporters, sensors of the two-component regulatory system, methyl accepting chemotaxis proteins, components of the flagella and the type IV pilus, as well as in the repertoire of cell-wall degrading enzymes and the components of the type III secretion system and related effectors. These variations provide a global overview of those mechanisms that could be associated with the development of bacterial spot disease. Additionally, it pointed out some features that might influence the host specificity and the variable virulence observed in X. arboricola. PMID:27571391

  7. Whole-genome sequencing of Staphylococcus haemolyticus uncovers the extreme plasticity of its genome and the evolution of human-colonizing staphylococcal species.

    PubMed

    Takeuchi, Fumihiko; Watanabe, Shinya; Baba, Tadashi; Yuzawa, Harumi; Ito, Teruyo; Morimoto, Yuh; Kuroda, Makoto; Cui, Longzhu; Takahashi, Mikio; Ankai, Akiho; Baba, Shin-ichi; Fukui, Shigehiro; Lee, Jean C; Hiramatsu, Keiichi

    2005-11-01

    Staphylococcus haemolyticus is an opportunistic bacterial pathogen that colonizes human skin and is remarkable for its highly antibiotic-resistant phenotype. We determined the complete genome sequence of S. haemolyticus to better understand its pathogenicity and evolutionary relatedness to the other staphylococcal species. A large proportion of the open reading frames in the genomes of S. haemolyticus, Staphylococcus aureus, and Staphylococcus epidermidis were conserved in their sequence and order on the chromosome. We identified a region of the bacterial chromosome just downstream of the origin of replication that showed little homology among the species but was conserved among strains within a species. This novel region, designated the "oriC environ," likely contributes to the evolution and differentiation of the staphylococcal species, since it was enriched for species-specific nonessential genes that contribute to the biological features of each staphylococcal species. A comparative analysis of the genomes of S. haemolyticus, S. aureus, and S. epidermidis elucidated differences in their biological and genetic characteristics and pathogenic potentials. We identified as many as 82 insertion sequences in the S. haemolyticus chromosome that probably mediated frequent genomic rearrangements, resulting in phenotypic diversification of the strain. Such rearrangements could have brought genomic plasticity to this species and contributed to its acquisition of antibiotic resistance.

  8. Motif mismatches in microsatellites: insights from genome-wide investigation among 20 insect species.

    PubMed

    Behura, Susanta K; Severson, David W

    2015-02-01

    We present a detailed genome-wide comparative study of motif mismatches of microsatellites among 20 insect species representing five taxonomic orders. The results show that varying proportions (∼15-46%) of microsatellites identified in these species are imperfect in motif structure, and that they also vary in chromosomal distribution within genomes. It was observed that the genomic abundance of imperfect repeats is significantly associated with the length and number of motif mismatches of microsatellites. Furthermore, microsatellites with a higher number of mismatches tend to have lower abundance in the genome, suggesting that sequence heterogeneity of repeat motifs is a key determinant of genomic abundance of microsatellites. This relationship seems to be a general feature of microsatellites even in unrelated species such as yeast, roundworm, mouse and human. We provide a mechanistic explanation of the evolutionary link between motif heterogeneity and genomic abundance of microsatellites by examining the patterns of motif mismatches and allele sequences of single-nucleotide polymorphisms identified within microsatellite loci. Using Drosophila Reference Genetic Panel data, we further show that pattern of allelic variation modulates motif heterogeneity of microsatellites, and provide estimates of allele age of specific imperfect microsatellites found within protein-coding genes. © The Author 2014. Published by Oxford University Press on behalf of Kazusa DNA Research Institute.

  9. Bacteriophages of Gordonia spp. Display a Spectrum of Diversity and Genetic Relationships

    PubMed Central

    Pope, Welkin H.; Mavrich, Travis N.; Garlena, Rebecca A.; Guerrero-Bustamante, Carlos A.; Jacobs-Sera, Deborah; Montgomery, Matthew T.; Russell, Daniel A.; Warner, Marcie H.

    2017-01-01

    ABSTRACT The global bacteriophage population is large, dynamic, old, and highly diverse genetically. Many phages are tailed and contain double-stranded DNA, but these remain poorly characterized genomically. A collection of over 1,000 phages infecting Mycobacterium smegmatis reveals the diversity of phages of a common bacterial host, but their relationships to phages of phylogenetically proximal hosts are not known. Comparative sequence analysis of 79 phages isolated on Gordonia shows these also to be diverse and that the phages can be grouped into 14 clusters of related genomes, with an additional 14 phages that are “singletons” with no closely related genomes. One group of six phages is closely related to Cluster A mycobacteriophages, but the other Gordonia phages are distant relatives and share only 10% of their genes with the mycobacteriophages. The Gordonia phage genomes vary in genome length (17.1 to 103.4 kb), percentage of GC content (47 to 68.8%), and genome architecture and contain a variety of features not seen in other phage genomes. Like the mycobacteriophages, the highly mosaic Gordonia phages demonstrate a spectrum of genetic relationships. We show this is a general property of bacteriophages and suggest that any barriers to genetic exchange are soft and readily violable. PMID:28811342

  10. Insights into archaeal evolution and symbiosis from the genomes of a Nanoarchaeon and its crenarchaeal host from Yellowstone National Park

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Podar, Mircea; Graham, David E; Reysenbach, Anna-Louise

    A hyperthermophilic member of the Nanoarchaeota from Obsidian Pool, a thermal feature in Yellowstone National Park, was characterized using single cell isolation and sequencing, together with its putative host, a Sulfolobales archaeon. This first representative of a non-marine Nanoarchaeota (Nst1) resembles Nanoarchaeum equitans by lacking most biosynthetic capabilities, the two forming a deep-branching archaeal lineage. However, the Nst1 genome is over 20% larger, encodes a complete gluconeogenesis pathway and a full complement of archaeal flagellum proteins. Comparison of the two genomes suggests that the marine and terrestrial Nanoarchaeota lineages share a common ancestor that was already a symbiont of another archaeon. With a larger genome, a smaller repertoire of split protein encoding genes and no split non-contiguous tRNAs, Nst1 appears to have experienced less severe genome reduction than N. equitans. The inferred host of Nst1 is potentially autotrophic, with a streamlined genome and simplified central and energetic metabolism as compared to other Sulfolobales. The two distinct Nanoarchaeota-host genomic data sets offer insights into the evolution of archaeal symbiosis and parasitism and will further enable studies of the cellular and molecular mechanisms of these relationships.

  11. Different phylogenomic approaches to resolve the evolutionary relationships among model fish species.

    PubMed

    Negrisolo, Enrico; Kuhl, Heiner; Forcato, Claudio; Vitulo, Nicola; Reinhardt, Richard; Patarnello, Tomaso; Bargelloni, Luca

    2010-12-01

    Comparative genomics holds the promise to magnify the information obtained from individual genome sequencing projects, revealing common features conserved across genomes and identifying lineage-specific characteristics. To implement such a comparative approach, a robust phylogenetic framework is required to accurately reconstruct evolution at the genome level. Among vertebrate taxa, teleosts represent the second best characterized group, with high-quality draft genome sequences for five model species (Danio rerio, Gasterosteus aculeatus, Oryzias latipes, Takifugu rubripes, and Tetraodon nigroviridis), and several others are in the finishing lane. However, the relationships among the acanthomorph teleost model fishes remain an unresolved taxonomic issue. Here, a genomic region spanning over 1.2 million base pairs was sequenced in the teleost fish Dicentrarchus labrax. Together with genomic data available for the above fish models, the new sequence was used to identify unique orthologous genomic regions shared across all target taxa. Different strategies were applied to produce robust multiple gene and genomic alignments spanning from 11,802 to 186,474 amino acid/nucleotide positions. Ten data sets were analyzed according to Bayesian inference, maximum likelihood, maximum parsimony, and neighbor joining methods. Extensive analyses were performed to explore the influence of several factors (e.g., alignment methodology, substitution model, data set partitions, and long-branch attraction) on the tree topology. Although a general consensus was observed for a closer relationship between G. aculeatus (Gasterosteidae) and Di. labrax (Moronidae) with the atherinomorph O. latipes (Beloniformes) sister taxon of this clade, with the tetraodontiform group Ta. rubripes and Te. nigroviridis (Tetraodontiformes) representing a more distantly related taxon among acanthomorph model fish species, conflicting results were obtained between data sets and methods, especially with respect to the choice of alignment methodology applied to noncoding parts of the genomic region under study. This may limit the use of intergenic/noncoding sequences in phylogenomics until more robust alignment algorithms are developed.

  12. Gene editing by CRISPR/Cas9 in the obligatory outcrossing Medicago sativa.

    PubMed

    Gao, Ruimin; Feyissa, Biruk A; Croft, Mana; Hannoufa, Abdelali

    2018-04-01

    The CRISPR/Cas9 technique was successfully used to edit the genome of the obligatory outcrossing plant species Medicago sativa L. (alfalfa). RNA-guided genome engineering using Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR)/Cas9 technology enables a variety of applications in plants. Successful application and validation of the CRISPR technique in a multiplex genome, such as that of M. sativa (alfalfa), will ultimately lead to major advances in the improvement of this crop. We used the CRISPR/Cas9 technique to mutate the squamosa promoter binding protein-like 9 (SPL9) gene in alfalfa. Because of the complex features of the alfalfa genome, we first used droplet digital PCR (ddPCR) for high-throughput screening of large populations of CRISPR-modified plants. Based on the genome editing rates obtained from the ddPCR screening, plants with relatively high rates were subjected to further analysis by restriction enzyme digestion/PCR amplification analyses. PCR products encompassing the respective small guided RNA target locus were then sub-cloned and sequenced to verify genome editing. In summary, we successfully applied the CRISPR/Cas9 technique to edit the SPL9 gene in a multiplex genome, providing some insights into opportunities to apply this technology in future alfalfa breeding. The overall efficiency in the polyploid alfalfa genome was lower than in other, less complex plant genomes. Further refinement of the CRISPR technology system will thus be required for more efficient genome editing in this plant.

  13. Three draft genomes of Vibrio coralliilyticus strains isolated from bivalve hatcheries

    USDA-ARS's Scientific Manuscript database

    Reported here are the draft genomes of three Vibrio coralliilyticus isolates RE87, AIC-7, and 080116A. Each strain was isolated in association with diseased oyster larvae in commercial aquaculture systems. These draft genomes will be useful for further studies in understanding the genomic features...

  14. Systems biology definition of the core proteome of metabolism and expression is consistent with high-throughput data.

    PubMed

    Yang, Laurence; Tan, Justin; O'Brien, Edward J; Monk, Jonathan M; Kim, Donghyuk; Li, Howard J; Charusanti, Pep; Ebrahim, Ali; Lloyd, Colton J; Yurkovich, James T; Du, Bin; Dräger, Andreas; Thomas, Alex; Sun, Yuekai; Saunders, Michael A; Palsson, Bernhard O

    2015-08-25

    Finding the minimal set of gene functions needed to sustain life is of both fundamental and practical importance. Minimal gene lists have been proposed by using comparative genomics-based core proteome definitions. A definition of a core proteome that is supported by empirical data, is understood at the systems-level, and provides a basis for computing essential cell functions is lacking. Here, we use a systems biology-based genome-scale model of metabolism and expression to define a functional core proteome consisting of 356 gene products, accounting for 44% of the Escherichia coli proteome by mass based on proteomics data. This systems biology core proteome includes 212 genes not found in previous comparative genomics-based core proteome definitions, accounts for 65% of known essential genes in E. coli, and has 78% gene function overlap with minimal genomes (Buchnera aphidicola and Mycoplasma genitalium). Based on transcriptomics data across environmental and genetic backgrounds, the systems biology core proteome is significantly enriched in nondifferentially expressed genes and depleted in differentially expressed genes. Compared with the noncore, core gene expression levels are also similar across genetic backgrounds (two times higher Spearman rank correlation) and exhibit significantly more complex transcriptional and posttranscriptional regulatory features (40% more transcription start sites per gene, 22% longer 5'UTR). Thus, genome-scale systems biology approaches rigorously identify a functional core proteome needed to support growth. This framework, validated by using high-throughput datasets, facilitates a mechanistic understanding of systems-level core proteome function through in silico models; it de facto defines a paleome.

  15. Comparative genomics of pathogenic lineages of Vibrio nigripulchritudo identifies virulence-associated traits

    PubMed Central

    Goudenège, David; Labreuche, Yannick; Krin, Evelyne; Ansquer, Dominique; Mangenot, Sophie; Calteau, Alexandra; Médigue, Claudine; Mazel, Didier; Polz, Martin F; Le Roux, Frédérique

    2013-01-01

    Vibrio nigripulchritudo is an emerging pathogen of farmed shrimp in New Caledonia and other regions in the Indo-Pacific. The molecular determinants of V. nigripulchritudo pathogenicity are unknown; however, molecular epidemiological studies have suggested that pathogenicity is linked to particular lineages. Here, we performed high-throughput sequencing-based comparative genome analysis of 16 V. nigripulchritudo strains to explore the genomic diversity and evolutionary history of pathogen-containing lineages and to identify pathogen-specific genetic elements. Our phylogenetic analysis revealed three pathogen-containing V. nigripulchritudo clades, including two clades previously identified from New Caledonia and one novel clade comprising putatively pathogenic isolates from septicemic shrimp in Madagascar. The similar genetic distance between the three clades indicates that they have diverged from an ancestral population roughly at the same time, and recombination analysis indicates that these genomes have, in the past, shared a common gene pool and exchanged genes. As each contemporary lineage is comprised of nearly identical strains, comparative genomics allowed differentiation of genetic elements specific to shrimp pathogenesis of varying severity. Notably, only a large plasmid present in all highly pathogenic (HP) strains encodes a toxin. Although less/non-pathogenic strains contain related plasmids, these are differentiated by a putative toxin locus. Expression of this gene by a non-pathogenic V. nigripulchritudo strain resulted in production of toxic culture supernatant, normally an exclusive feature of HP strains. Thus, this protein, here termed 'nigritoxin', is implicated, to an extent that remains to be precisely determined, in the toxicity of V. nigripulchritudo. PMID:23739050

  16. Complete genome analysis of three Acinetobacter baumannii clinical isolates in China for insight into the diversification of drug resistance elements.

    PubMed

    Zhu, Lingxiang; Yan, Zhongqiang; Zhang, Zhaojun; Zhou, Qiming; Zhou, Jinchun; Wakeland, Edward K; Fang, Xiangdong; Xuan, Zhenyu; Shen, Dingxia; Li, Quan-Zhen

    2013-01-01

    The emergence and rapid spread of multidrug-resistant Acinetobacter baumannii strains has become a major health threat worldwide. To better understand the genetic recombination related to the acquisition of drug-resistant elements during bacterial infection, we performed complete genome analysis on three newly isolated multidrug-resistant A. baumannii strains from Beijing using next-generation sequencing technology. Whole genome comparison revealed that all 3 strains share some common drug-resistance elements, including carbapenem-resistant blaOXA-23 and tetracycline (tet) resistance islands, but the genome structures are diversified among strains. Various genomic islands are interspersed throughout the genome with transposons and insertions, reflecting the recombination flexibility during the acquisition of the resistance elements. The blood-isolated BJAB07104 and ascites-isolated BJAB0868 exhibit high similarity in genome structure to most of the global clone II strains, suggesting these two strains belong to the dominant outbreak strains prevalent worldwide. A large resistance island (RI) of about 121 kb, carrying a cluster of resistance-related genes, was inserted into the ATPase gene on the BJAB07104 and BJAB0868 genomes. A 78-kb insertion element carrying the tra locus and the blaOXA-23 island can be either inserted into one of the tniB genes in the 121-kb RI on the chromosome or transformed into a conjugative plasmid in the two BJAB strains. The third strain in this study, BJAB0715, which was isolated from spinal fluid, is much more divergent than the other two strains. It harbors multiple drug-resistance elements, including a truncated AbaR-22-like RI on its genome. One of the unique features of this strain is that it carries both blaOXA-23 and blaOXA-58 genes on its genome. In addition, an Acinetobacter lwoffii adeABC efflux element was found inserted into the ATPase position in BJAB0715. Our comparative analysis of currently completed Acinetobacter baumannii genomes revealed extensive and dynamic genome organizations, which may help the bacteria acquire drug-resistance elements into their genomes.

  17. The Candidate Phylum Poribacteria by Single-Cell Genomics: New Insights into Phylogeny, Cell-Compartmentation, Eukaryote-Like Repeat Proteins, and Other Genomic Features

    PubMed Central

    Kamke, Janine; Rinke, Christian; Schwientek, Patrick; Mavromatis, Kostas; Ivanova, Natalia; Sczyrba, Alexander; Woyke, Tanja; Hentschel, Ute

    2014-01-01

    The candidate phylum Poribacteria is one of the most dominant and widespread members of the microbial communities residing within marine sponges. Cell compartmentalization had been postulated along with their discovery about a decade ago, and their phylogenetic association with the Planctomycetes, Verrucomicrobia, Chlamydiae (PVC) superphylum was proposed soon thereafter. In the present study we revised these features based on genomic data obtained from six poribacterial single cells. We propose that Poribacteria form a distinct monophyletic phylum contiguous to the PVC superphylum together with other candidate phyla. Our genomic analyses supported the possibility of cell compartmentalization in the form of bacterial microcompartments. Further analyses of eukaryote-like protein domains stressed the importance of such proteins, with features including tetratricopeptide repeats, leucine-rich repeats, and low-density lipoprotein receptor repeats, the latter of which are reported here for the first time from a sponge symbiont. Finally, examining the most abundant protein domain family on poribacterial genomes revealed diverse phyH family proteins, some of which may be related to dissolved organic phosphorus uptake. PMID:24498082

  18. Templated sequence insertion polymorphisms in the human genome

    NASA Astrophysics Data System (ADS)

    Onozawa, Masahiro; Aplan, Peter

    2016-11-01

    Templated Sequence Insertion Polymorphism (TSIP) is a recently described form of polymorphism recognized in the human genome, in which a sequence that is templated from a distant genomic region is inserted into the genome, seemingly at random. TSIPs can be grouped into two classes based on nucleotide sequence features at the insertion junctions; Class 1 TSIPs show features of insertions that are mediated via the LINE-1 ORF2 protein, including 1) target-site duplication (TSD), 2) polyadenylation 10-30 nucleotides downstream of a “cryptic” polyadenylation signal, and 3) preference for insertion at a 5’-TTTT/A-3’ sequence. In contrast, class 2 TSIPs show features consistent with repair of a DNA double-strand break via insertion of a DNA “patch” that is derived from a distant genomic region. Survey of a large number of normal human volunteers demonstrates that most individuals have 25-30 TSIPs, and that these TSIPs track with specific geographic regions. Similar to other forms of human polymorphism, we suspect that these TSIPs may be important for the generation of human diversity and genetic diseases.

  19. Machine learning for epigenetics and future medical applications

    PubMed Central

    Holder, Lawrence B.; Haque, M. Muksitul; Skinner, Michael K.

    2017-01-01

    ABSTRACT Understanding epigenetic processes holds immense promise for medical applications. Advances in Machine Learning (ML) are critical to realize this promise. Previous studies used epigenetic data sets associated with the germline transmission of epigenetic transgenerational inheritance of disease and novel ML approaches to predict genome-wide locations of critical epimutations. A combination of Active Learning (ACL) and Imbalanced Class Learning (ICL) was used to address past problems with ML to develop a more efficient feature selection process and address the imbalance problem in all genomic data sets. The power of this novel ML approach and our ability to predict epigenetic phenomena and associated disease is suggested. The current approach requires extensive computation of features over the genome. A promising new approach is to introduce Deep Learning (DL) for the generation and simultaneous computation of novel genomic features tuned to the classification task. This approach can be used with any genomic or biological data set applied to medicine. The application of molecular epigenetic data in advanced machine learning analysis to medicine is the focus of this review. PMID:28524769

  20. Intrastrand triplex DNA repeats in bacteria: a source of genomic instability

    PubMed Central

    Holder, Isabelle T.; Wagner, Stefanie; Xiong, Peiwen; Sinn, Malte; Frickey, Tancred; Meyer, Axel; Hartig, Jörg S.

    2015-01-01

    Repetitive nucleic acid sequences are often prone to form secondary structures distinct from B-DNA. Prominent examples of such structures are DNA triplexes. We observed that certain intrastrand triplex motifs are highly conserved and abundant in prokaryotic genomes. A systematic search of 5246 different prokaryotic plasmids and genomes for intrastrand triplex motifs was conducted and the results summarized in the ITxF database available online at http://bioinformatics.uni-konstanz.de/utils/ITxF/. Next we investigated biophysical and biochemical properties of a particular G/C-rich triplex motif (TM) that occurs in many copies in more than 260 bacterial genomes by CD and nuclear magnetic resonance spectroscopy as well as in vivo footprinting techniques. A characterization of putative properties and functions of these unusually frequent nucleic acid motifs demonstrated that the occurrence of the TM is associated with a high degree of genomic instability. TM-containing genomic loci are significantly more rearranged among closely related Escherichia coli strains compared to control sites. In addition, we found very high frequencies of TM motifs in certain Enterobacteria and Cyanobacteria that were previously described as genetically highly diverse. In conclusion we link intrastrand triplex motifs with the induction of genomic instability. We speculate that the observed instability might be an adaptive feature of these genomes that creates variation for natural selection to act upon. PMID:26450966

  1. Phaeobacter gallaeciensis genomes from globally opposite locations reveal high similarity of adaptation to surface life

    PubMed Central

    Thole, Sebastian; Kalhoefer, Daniela; Voget, Sonja; Berger, Martine; Engelhardt, Tim; Liesegang, Heiko; Wollherr, Antje; Kjelleberg, Staffan; Daniel, Rolf; Simon, Meinhard; Thomas, Torsten; Brinkhoff, Thorsten

    2012-01-01

    Phaeobacter gallaeciensis, a member of the abundant marine Roseobacter clade, is known to be an effective colonizer of biotic and abiotic marine surfaces. Production of the antibiotic tropodithietic acid (TDA) makes P. gallaeciensis a strong antagonist of many bacteria, including fish and mollusc pathogens. In addition to TDA, several other secondary metabolites are produced, allowing the mutualistic bacterium to also act as an opportunistic pathogen. Here we provide the manually annotated genome sequences of the P. gallaeciensis strains DSM 17395 and 2.10, isolated at the Atlantic coast of north western Spain and near Sydney, Australia, respectively. Despite their isolation sites from the two different hemispheres, the genome comparison demonstrated a surprisingly high level of synteny (only 3% nucleotide dissimilarity and 88% and 93% shared genes). Minor differences in the genomes result from horizontal gene transfer and phage infection. Comparison of the P. gallaeciensis genomes with those of other roseobacters revealed unique genomic traits, including the production of iron-scavenging siderophores. Experiments supported the predicted capacity of both strains to grow on various algal osmolytes. Transposon mutagenesis was used to expand the current knowledge on the TDA biosynthesis pathway in strain DSM 17395. This first comparative genomic analysis of finished genomes of two closely related strains belonging to one species of the Roseobacter clade revealed features that provide competitive advantages and facilitate surface attachment and interaction with eukaryotic hosts. PMID:22717884
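
    As a rough illustration of how a pair of shared-gene percentages such as the 88% and 93% above can arise, the sketch below computes the shared fraction relative to each genome's own gene count. It assumes (this is not stated in the abstract) that orthologous genes have already been mapped to common identifiers; the function name and the toy identifiers are hypothetical.

```python
def shared_gene_percentages(genes_a, genes_b):
    """Return (percent of A's genes shared with B, percent of B's genes shared with A).

    genes_a, genes_b: iterables of ortholog identifiers (assumed to be
    comparable across genomes, e.g. after an ortholog-clustering step).
    """
    a, b = set(genes_a), set(genes_b)
    if not a or not b:
        raise ValueError("both gene sets must be non-empty")
    shared = a & b
    # The same shared set gives two percentages because the denominators differ.
    return 100.0 * len(shared) / len(a), 100.0 * len(shared) / len(b)


if __name__ == "__main__":
    # Toy identifiers, not real P. gallaeciensis genes.
    strain_1 = {"g1", "g2", "g3", "g4"}
    strain_2 = {"g2", "g3", "g4", "g5", "g6"}
    print(shared_gene_percentages(strain_1, strain_2))
```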

  2. Genome of the Actinomycete Plant Pathogen Clavibacter michiganensis subsp. sepedonicus Suggests Recent Niche Adaptation

    PubMed Central

    Bentley, Stephen D.; Corton, Craig; Brown, Susan E.; Barron, Andrew; Clark, Louise; Doggett, Jon; Harris, Barbara; Ormond, Doug; Quail, Michael A.; May, Georgiana; Francis, David; Knudson, Dennis; Parkhill, Julian; Ishimaru, Carol A.

    2008-01-01

    Clavibacter michiganensis subsp. sepedonicus is a plant-pathogenic bacterium and the causative agent of bacterial ring rot, a devastating agricultural disease under strict quarantine control and zero tolerance in the seed potato industry. This organism appears to be largely restricted to an endophytic lifestyle, proliferating within plant tissues and unable to persist in the absence of plant material. Analysis of the genome sequence of C. michiganensis subsp. sepedonicus and comparison with the genome sequences of related plant pathogens revealed a dramatic recent evolutionary history. The genome contains 106 insertion sequence elements, which appear to have been active in extensive rearrangement of the chromosome compared to that of Clavibacter michiganensis subsp. michiganensis. There are 110 pseudogenes with overrepresentation in functions associated with carbohydrate metabolism, transcriptional regulation, and pathogenicity. Genome comparisons also indicated that there is substantial gene content diversity within the species, probably due to differential gene acquisition and loss. These genomic features and evolutionary dating suggest that there was recent adaptation for life in a restricted niche where nutrient diversity and perhaps competition are low, correlated with a reduced ability to exploit previously occupied complex niches outside the plant. Toleration of factors such as multiplication and integration of insertion sequence elements, genome rearrangements, and functional disruption of many genes and operons seems to indicate that there has been general relaxation of selective pressure on a large proportion of the genome. PMID:18192393

  3. New Era of Studying RNA Secondary Structure and Its Influence on Gene Regulation in Plants.

    PubMed

    Yang, Xiaofei; Yang, Minglei; Deng, Hongjing; Ding, Yiliang

    2018-01-01

    The dynamic structure of RNA plays a central role in post-transcriptional regulation of gene expression, such as RNA maturation, degradation, and translation. With the rise of next-generation sequencing, the study of RNA structure has been transformed from in vitro low-throughput RNA structure probing methods to in vivo high-throughput RNA structure profiling. The development of these methods enables incremental studies on the function of RNA structure to be performed, revealing new insights into novel regulatory mechanisms of RNA structure in plants. Genome-wide RNA structure profiling allows us to investigate general RNA structural features across tens of thousands of mRNAs and to compare RNA structuromes between plant species. Here, we provide a comprehensive and up-to-date overview of: (i) RNA structure probing methods; (ii) the biological functions of RNA structure; (iii) genome-wide RNA structural features corresponding to their regulatory mechanisms; and (iv) RNA structurome evolution in plants.

  4. Comparative genomic analysis of the MHC: the evolution of class I duplication blocks, diversity and complexity from shark to man.

    PubMed

    Kulski, Jerzy K; Shiina, Takashi; Anzai, Tatsuya; Kohara, Sakae; Inoko, Hidetoshi

    2002-12-01

    The major histocompatibility complex (MHC) genomic region is composed of a group of linked genes involved functionally with the adaptive and innate immune systems. The class I and class II genes are intrinsic features of the MHC and have been found in all the jawed vertebrates studied so far. The MHC genomic regions of the human and the chicken (B locus) have been fully sequenced and mapped, and the mouse MHC sequence is almost finished. Information on the MHC genomic structures (size, complexity, genic and intergenic composition and organization, gene order and number) of other vertebrates is largely limited or nonexistent. Therefore, we are mapping, sequencing and analyzing the MHC genomic regions of different human haplotypes and at least eight nonhuman species. Here, we review our progress with these sequences and compare the human MHC structure with that of the nonhuman primates (chimpanzee and rhesus macaque), other mammals (pigs, mice and rats) and nonmammalian vertebrates such as birds (chicken and quail), bony fish (medaka, pufferfish and zebrafish) and cartilaginous fish (nurse shark). This comparison reveals a complex MHC structure for mammals and a relatively simpler design for nonmammalian animals with a hypothetical prototypic structure for the shark. In the mammalian MHC, there are two to five different class I duplication blocks embedded within a framework of conserved nonclass I and/or nonclass II genes. With a few exceptions, the class I framework genes are absent from the MHC of birds, bony fish and sharks. Comparative genomics of the MHC reveal a highly plastic region with major structural differences between the mammalian and nonmammalian vertebrates. Additional genomic data are needed on animals of the reptilia, crocodilia and marsupial classes to find the origins of the class I framework genes and examples of structures that may be intermediate between the simple and complex MHC organizations of birds and mammals, respectively.

  5. Comparative map and trait viewer (CMTV): an integrated bioinformatic tool to construct consensus maps and compare QTL and functional genomics data across genomes and experiments.

    PubMed

    Sawkins, M C; Farmer, A D; Hoisington, D; Sullivan, J; Tolopko, A; Jiang, Z; Ribaut, J-M

    2004-10-01

    In the past few decades, a wealth of genomic data has been produced in a wide variety of species using a diverse array of functional and molecular marker approaches. In order to unlock the full potential of the information contained in these independent experiments, researchers need efficient and intuitive means to identify common genomic regions and genes involved in the expression of target phenotypic traits across diverse conditions. To address this need, we have developed a Comparative Map and Trait Viewer (CMTV) tool that can be used to construct dynamic aggregations of a variety of types of genomic datasets. By algorithmically determining correspondences between sets of objects on multiple genomic maps, the CMTV can display syntenic regions across taxa, combine maps from separate experiments into a consensus map, or project data from different maps into a common coordinate framework using dynamic coordinate translations between source and target maps. We present a case study that illustrates the utility of the tool for managing large and varied datasets by integrating data collected by CIMMYT in maize drought tolerance research with data from public sources. This example will focus on one of the visualization features for Quantitative Trait Locus (QTL) data, using likelihood ratio (LR) files produced by generic QTL analysis software and displaying the data in a unique visual manner across different combinations of traits, environments and crosses. Once a genomic region of interest has been identified, the CMTV can search and display additional QTLs meeting a particular threshold for that region, or other functional data such as sets of differentially expressed genes located in the region; it thus provides an easily used means for organizing and manipulating data sets that have been dynamically integrated under the focus of the researcher's specific hypothesis.
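
    The abstract does not say how CMTV computes its coordinate translations between source and target maps. One simple way to project a position into a common coordinate framework, shown here purely as an assumed sketch, is linear interpolation between the flanking markers shared by both maps. The function name and the anchor values are hypothetical.

```python
from bisect import bisect_right


def project_position(pos, anchors):
    """Project a source-map position onto a target map.

    anchors: list of (source_pos, target_pos) pairs for markers present on
    both maps, sorted by source_pos, with distinct source positions.
    Positions outside the anchored range are extrapolated from the
    nearest terminal interval.
    """
    if len(anchors) < 2:
        raise ValueError("need at least two shared markers")
    source_positions = [s for s, _ in anchors]
    # Index of the right-hand flanking anchor, clamped to a valid interval.
    i = bisect_right(source_positions, pos)
    i = min(max(i, 1), len(anchors) - 1)
    (s0, t0), (s1, t1) = anchors[i - 1], anchors[i]
    if s1 == s0:
        raise ValueError("anchors must have distinct source positions")
    return t0 + (pos - s0) * (t1 - t0) / (s1 - s0)


if __name__ == "__main__":
    # Toy marker positions (e.g. in cM), not taken from any real map.
    shared_markers = [(0.0, 0.0), (10.0, 20.0), (30.0, 30.0)]
    print(project_position(5.0, shared_markers))
    print(project_position(20.0, shared_markers))
```

    Interpolating interval by interval keeps marker order intact as long as the shared markers are collinear on both maps. Handling inversions or conflicting marker orders would need additional logic that this sketch does not attempt.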

  6. Prevalent Role of Gene Features in Determining Evolutionary Fates of Whole-Genome Duplication Duplicated Genes in Flowering Plants

    PubMed Central

    Jiang, Wen-kai; Liu, Yun-long; Xia, En-hua; Gao, Li-zhi

    2013-01-01

    The evolution of genes and genomes after polyploidization has been the subject of extensive studies in evolutionary biology and plant sciences. While a significant number of duplicated genes are rapidly removed during a process called fractionation, which operates after the whole-genome duplication (WGD), another considerable number of genes are retained preferentially, leading to the phenomenon of biased gene retention. However, the evolutionary mechanisms underlying gene retention after WGD remain largely unknown. Through genome-wide analyses of sequence and functional data, we comprehensively investigated the relationships between gene features and the retention probability of duplicated genes after WGDs in six plant genomes, Arabidopsis (Arabidopsis thaliana), poplar (Populus trichocarpa), soybean (Glycine max), rice (Oryza sativa), sorghum (Sorghum bicolor), and maize (Zea mays). The results showed that multiple gene features were correlated with the probability of gene retention. Using a logistic regression model based on principal component analysis, we resolved evolutionary rate, structural complexity, and GC3 content as the three major contributors to gene retention. Cluster analysis of these features further classified retained genes into three distinct groups in terms of gene features and evolutionary behaviors. Type I genes are more prone to be selected by dosage balance; type II genes are possibly subject to subfunctionalization; and type III genes may serve as potential targets for neofunctionalization. This study highlights that gene features are able to act jointly as primary forces when determining the retention and evolution of WGD-derived duplicated genes in flowering plants. These findings thus may help to provide a resolution to the debate on different evolutionary models of gene fates after WGDs. PMID:23396833
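
    Of the three features named above, GC3 content is the easiest to make concrete: it is the fraction of G or C bases at third codon positions of a coding sequence. The sketch below is illustrative only and does not reproduce the study's pipeline; the function name is hypothetical, and whether to exclude the stop codon is left to the caller.

```python
def gc3_content(cds):
    """Fraction of G/C at third codon positions of an in-frame coding sequence.

    cds: DNA string starting at the first base of the first codon.
    Any trailing partial codon is ignored. The stop codon, if present,
    is counted; strip it beforehand if it should be excluded.
    """
    seq = cds.upper()
    usable = len(seq) - len(seq) % 3  # drop a trailing partial codon
    if usable < 3:
        raise ValueError("sequence must contain at least one full codon")
    third_positions = seq[2:usable:3]
    gc = sum(1 for base in third_positions if base in "GC")
    return gc / len(third_positions)


if __name__ == "__main__":
    # Toy sequence: codons ATG GCC AAA TAG, third bases G, C, A, G.
    print(gc3_content("ATGGCCAAATAG"))
```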

  7. Knowledge boosting: a graph-based integration approach with multi-omics data and genomic knowledge for cancer clinical outcome prediction

    PubMed Central

    Kim, Dokyoon; Joung, Je-Gun; Sohn, Kyung-Ah; Shin, Hyunjung; Park, Yu Rang; Ritchie, Marylyn D; Kim, Ju Han

    2015-01-01

    Objective: Cancer can involve gene dysregulation via multiple mechanisms, so no single level of genomic data fully elucidates tumor behavior due to the presence of numerous genomic variations within or between levels in a biological system. We have previously proposed a graph-based integration approach that combines multi-omics data including copy number alteration, methylation, miRNA, and gene expression data for predicting clinical outcome in cancer. However, genomic features likely interact with other genomic features in complex signaling or regulatory networks, since cancer is caused by alterations in pathways or complete processes. Methods: Here we propose a new graph-based framework for integrating multi-omics data and genomic knowledge to improve power in predicting clinical outcomes and elucidate interplay between different levels. To highlight the validity of our proposed framework, we used an ovarian cancer dataset from The Cancer Genome Atlas for predicting stage, grade, and survival outcomes. Results: Integrating multi-omics data with genomic knowledge to construct pre-defined features resulted in higher performance in clinical outcome prediction and higher stability. For the grade outcome, the model with gene expression data produced an area under the receiver operating characteristic curve (AUC) of 0.7866. However, models of the integration with pathway, Gene Ontology, chromosomal gene set, and motif gene set consistently outperformed the model with genomic data only, attaining AUCs of 0.7873, 0.8433, 0.8254, and 0.8179, respectively. Conclusions: Integrating multi-omics data and genomic knowledge to improve understanding of molecular pathogenesis and underlying biology in cancer should improve diagnostic and prognostic indicators and the effectiveness of therapies. PMID:25002459
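
    For readers unfamiliar with the AUC figures quoted above, the following minimal sketch computes the area under the ROC curve as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one (ties count as one half). It is not the authors' evaluation code; the function name and the toy data are hypothetical.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores.

    labels: sequence of 0/1 values (1 = positive class).
    scores: sequence of classifier scores; higher means more likely positive.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    # Count positive/negative pairs that are ranked correctly.
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


if __name__ == "__main__":
    # Toy labels and scores, not taken from the study.
    y = [1, 1, 1, 0, 0, 0]
    s = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]
    print(roc_auc(y, s))
```

    An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which is the scale on which the reported values between 0.7866 and 0.8433 should be read.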

  8. Knowledge boosting: a graph-based integration approach with multi-omics data and genomic knowledge for cancer clinical outcome prediction.

    PubMed

    Kim, Dokyoon; Joung, Je-Gun; Sohn, Kyung-Ah; Shin, Hyunjung; Park, Yu Rang; Ritchie, Marylyn D; Kim, Ju Han

    2015-01-01

    Cancer can involve gene dysregulation via multiple mechanisms, so no single level of genomic data fully elucidates tumor behavior due to the presence of numerous genomic variations within or between levels in a biological system. We have previously proposed a graph-based integration approach that combines multi-omics data including copy number alteration, methylation, miRNA, and gene expression data for predicting clinical outcome in cancer. However, genomic features likely interact with other genomic features in complex signaling or regulatory networks, since cancer is caused by alterations in pathways or complete processes. Here we propose a new graph-based framework for integrating multi-omics data and genomic knowledge to improve power in predicting clinical outcomes and elucidate interplay between different levels. To highlight the validity of our proposed framework, we used an ovarian cancer dataset from The Cancer Genome Atlas for predicting stage, grade, and survival outcomes. Integrating multi-omics data with genomic knowledge to construct pre-defined features resulted in higher performance in clinical outcome prediction and higher stability. For the grade outcome, the model with gene expression data produced an area under the receiver operating characteristic curve (AUC) of 0.7866. However, models of the integration with pathway, Gene Ontology, chromosomal gene set, and motif gene set consistently outperformed the model with genomic data only, attaining AUCs of 0.7873, 0.8433, 0.8254, and 0.8179, respectively. Integrating multi-omics data and genomic knowledge to improve understanding of molecular pathogenesis and underlying biology in cancer should improve diagnostic and prognostic indicators and the effectiveness of therapies. © The Author 2014. Published by Oxford University Press on behalf of the American Medical Informatics Association.

  9. Lynch Syndrome: Genomics Update and Imaging Review.

    PubMed

    Cox, Veronica L; Saeed Bamashmos, Anas A; Foo, Wai Chin; Gupta, Shiva; Yedururi, Sireesha; Garg, Naveen; Kang, Hyunseon Christine

    2018-01-01

    Lynch syndrome is the most common hereditary cancer syndrome, the most common cause of heritable colorectal cancer, and the only known heritable cause of endometrial cancer. Other cancers associated with Lynch syndrome include cancers of the ovary, stomach, urothelial tract, and small bowel, and less frequently, cancers of the brain, biliary tract, pancreas, and prostate. The oncogenic tendency of Lynch syndrome stems from a set of genomic alterations of mismatch repair proteins. Defunct mismatch repair proteins cause unusually high instability of regions of the genome called microsatellites. Over time, the accumulation of mutations in microsatellites and elsewhere in the genome can affect the production of important cellular proteins, spurring tumorigenesis. Universal testing of colorectal tumors for microsatellite instability (MSI) is now recommended to (a) prevent cases of Lynch syndrome being missed owing to the use of clinical criteria alone, (b) reduce morbidity and mortality among the relatives of affected individuals, and (c) guide management decisions. Organ-specific cancer risks and associated screening paradigms vary according to the sex of the affected individual and the type of germline DNA alteration causing the MSI. Furthermore, Lynch syndrome-associated cancers have different pathologic, radiologic, and clinical features compared with their sporadic counterparts. Most notably, Lynch syndrome-associated tumors tend to be more indolent than non-Lynch syndrome-associated neoplasms and thus may respond differently to traditional chemotherapy regimens. The high MSI in cases of colorectal cancer reflects a difference in the biologic features of the tumor, possibly with a unique susceptibility to immunotherapy. © RSNA, 2018.

  10. A novel roseobacter phage possesses features of podoviruses, siphoviruses, prophages and gene transfer agents

    PubMed Central

    Zhan, Yuanchao; Huang, Sijun; Voget, Sonja; Simon, Meinhard; Chen, Feng

    2016-01-01

    Bacteria in the Roseobacter lineage have been studied extensively due to their significant biogeochemical roles in the marine ecosystem. However, our knowledge of bacteriophages that infect the Roseobacter clade is still very limited. Here, we report a new bacteriophage, phage DSS3Φ8, which infects marine roseobacter Ruegeria pomeroyi DSS-3. DSS3Φ8 is a lytic siphovirus. Genomic analysis showed that DSS3Φ8 is most closely related to a group of siphoviruses, CbK-like phages, which infect freshwater bacterium Caulobacter crescentus. DSS3Φ8 contains a smaller capsid and has a reduced genome size (146 kb) compared to the CbK-like phages (205–279 kb). DSS3Φ8 contains the DNA polymerase gene which is closely related to T7-like podoviruses. DSS3Φ8 also contains the integrase and repressor genes, indicating its potential to enter a lysogenic cycle. In addition, four GTA (gene transfer agent) genes were identified in the DSS3Φ8 genome. Genomic analysis suggests that DSS3Φ8 is a highly mosaic phage that inherits the genetic features from siphoviruses, podoviruses, prophages and GTAs. This is the first report of CbK-like phages infecting marine bacteria. We believe phage isolation is still a powerful tool that can lead to discovery of new phages and help interpret the overwhelming unknown sequences in the viral metagenomics. PMID:27460944

  11. A novel roseobacter phage possesses features of podoviruses, siphoviruses, prophages and gene transfer agents

    NASA Astrophysics Data System (ADS)

    Zhan, Yuanchao; Huang, Sijun; Voget, Sonja; Simon, Meinhard; Chen, Feng

    2016-07-01

    Bacteria in the Roseobacter lineage have been studied extensively due to their significant biogeochemical roles in the marine ecosystem. However, our knowledge of bacteriophages that infect the Roseobacter clade is still very limited. Here, we report a new bacteriophage, phage DSS3Φ8, which infects marine roseobacter Ruegeria pomeroyi DSS-3. DSS3Φ8 is a lytic siphovirus. Genomic analysis showed that DSS3Φ8 is most closely related to a group of siphoviruses, CbK-like phages, which infect freshwater bacterium Caulobacter crescentus. DSS3Φ8 contains a smaller capsid and has a reduced genome size (146 kb) compared to the CbK-like phages (205-279 kb). DSS3Φ8 contains the DNA polymerase gene which is closely related to T7-like podoviruses. DSS3Φ8 also contains the integrase and repressor genes, indicating its potential to enter a lysogenic cycle. In addition, four GTA (gene transfer agent) genes were identified in the DSS3Φ8 genome. Genomic analysis suggests that DSS3Φ8 is a highly mosaic phage that inherits the genetic features from siphoviruses, podoviruses, prophages and GTAs. This is the first report of CbK-like phages infecting marine bacteria. We believe phage isolation is still a powerful tool that can lead to discovery of new phages and help interpret the overwhelming unknown sequences in the viral metagenomics.

  12. TCGA study identifies genomic features of cervical cancer

    Cancer.gov

    Investigators with The Cancer Genome Atlas (TCGA) Research Network have identified novel genomic and molecular characteristics of cervical cancer that will aid in subclassification of the disease and may help target therapies that are most appropriate for each patient.

  13. Expression Quantitative Trait Locus Mapping across Water Availability Environments Reveals Contrasting Associations with Genomic Features in Arabidopsis

    PubMed Central

    Lowry, David B.; Logan, Tierney L.; Santuari, Luca; Hardtke, Christian S.; Richards, James H.; DeRose-Wilson, Leah J.; McKay, John K.; Sen, Saunak; Juenger, Thomas E.

    2013-01-01

    The regulation of gene expression is crucial for an organism’s development and response to stress, and an understanding of the evolution of gene expression is of fundamental importance to basic and applied biology. To improve this understanding, we conducted expression quantitative trait locus (eQTL) mapping in the Tsu-1 (Tsushima, Japan) × Kas-1 (Kashmir, India) recombinant inbred line population of Arabidopsis thaliana across soil drying treatments. We then used genome resequencing data to evaluate whether genomic features (promoter polymorphism, recombination rate, gene length, and gene density) are associated with genes responding to the environment (E) or with genes with genetic variation (G) in gene expression in the form of eQTLs. We identified thousands of genes that responded to soil drying and hundreds of main-effect eQTLs. However, we identified very few statistically significant eQTLs that interacted with the soil drying treatment (GxE eQTL). Analysis of genome resequencing data revealed associations of several genomic features with G and E genes. In general, E genes had lower promoter diversity and local recombination rates. By contrast, genes with eQTLs (G) had significantly greater promoter diversity and were located in genomic regions with higher recombination. These results suggest that genomic architecture may play an important role in the evolution of gene expression. PMID:24045022

  14. kmer-SVM: a web server for identifying predictive regulatory sequence features in genomic data sets

    PubMed Central

    Fletez-Brant, Christopher; Lee, Dongwon; McCallion, Andrew S.; Beer, Michael A.

    2013-01-01

    Massively parallel sequencing technologies have made the generation of genomic data sets a routine component of many biological investigations. For example, Chromatin immunoprecipitation followed by sequence assays detect genomic regions bound (directly or indirectly) by specific factors, and DNase-seq identifies regions of open chromatin. A major bottleneck in the interpretation of these data is the identification of the underlying DNA sequence code that defines, and ultimately facilitates prediction of, these transcription factor (TF) bound or open chromatin regions. We have recently developed a novel computational methodology, which uses a support vector machine (SVM) with kmer sequence features (kmer-SVM) to identify predictive combinations of short transcription factor-binding sites, which determine the tissue specificity of these genomic assays (Lee, Karchin and Beer, Discriminative prediction of mammalian enhancers from DNA sequence. Genome Res. 2011; 21:2167–80). This regulatory information can (i) give confidence in genomic experiments by recovering previously known binding sites, and (ii) reveal novel sequence features for subsequent experimental testing of cooperative mechanisms. Here, we describe the development and implementation of a web server to allow the broader research community to independently apply our kmer-SVM to analyze and interpret their genomic datasets. We analyze five recently published data sets and demonstrate how this tool identifies accessory factors and repressive sequence elements. kmer-SVM is available at http://kmersvm.beerlab.org. PMID:23771147

  15. kmer-SVM: a web server for identifying predictive regulatory sequence features in genomic data sets.

    PubMed

    Fletez-Brant, Christopher; Lee, Dongwon; McCallion, Andrew S; Beer, Michael A

    2013-07-01

    Massively parallel sequencing technologies have made the generation of genomic data sets a routine component of many biological investigations. For example, Chromatin immunoprecipitation followed by sequence assays detect genomic regions bound (directly or indirectly) by specific factors, and DNase-seq identifies regions of open chromatin. A major bottleneck in the interpretation of these data is the identification of the underlying DNA sequence code that defines, and ultimately facilitates prediction of, these transcription factor (TF) bound or open chromatin regions. We have recently developed a novel computational methodology, which uses a support vector machine (SVM) with kmer sequence features (kmer-SVM) to identify predictive combinations of short transcription factor-binding sites, which determine the tissue specificity of these genomic assays (Lee, Karchin and Beer, Discriminative prediction of mammalian enhancers from DNA sequence. Genome Res. 2011; 21:2167-80). This regulatory information can (i) give confidence in genomic experiments by recovering previously known binding sites, and (ii) reveal novel sequence features for subsequent experimental testing of cooperative mechanisms. Here, we describe the development and implementation of a web server to allow the broader research community to independently apply our kmer-SVM to analyze and interpret their genomic datasets. We analyze five recently published data sets and demonstrate how this tool identifies accessory factors and repressive sequence elements. kmer-SVM is available at http://kmersvm.beerlab.org.
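    The "kmer sequence features" named in the two kmer-SVM records above can be pictured with a small sketch. This is only an illustration of turning a DNA sequence into a fixed-length vector of k-mer frequencies, under the assumption that overlapping k-mers over the alphabet ACGT are counted; it is not the kmer-SVM implementation, and the function names `kmer_counts` and `kmer_feature_vector` are hypothetical.

```python
from collections import Counter
from itertools import product


def kmer_counts(seq, k):
    """Count every overlapping k-mer in an upper-cased DNA sequence."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def kmer_feature_vector(seq, k):
    """Return k-mer frequencies in a fixed order (all 4**k words over ACGT).

    Assumption for illustration: frequencies are normalised by the number
    of k-mer windows; windows containing other letters (e.g. N) are ignored
    in the output vector.
    """
    counts = kmer_counts(seq, k)
    total = max(sum(counts.values()), 1)  # avoid division by zero
    vocabulary = ["".join(p) for p in product("ACGT", repeat=k)]
    return [counts.get(kmer, 0) / total for kmer in vocabulary]


if __name__ == "__main__":
    vec = kmer_feature_vector("ACGTAC", 2)
    print(len(vec), vec[:4])
```

    Vectors of this kind, computed for bound and unbound regions, are the sort of input a classifier such as a support vector machine could be trained on; the choice of k and of normalisation here is arbitrary.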

  16. Recurrence time statistics: versatile tools for genomic DNA sequence analysis.

    PubMed

    Cao, Yinhe; Tung, Wen-Wen; Gao, J B

    2004-01-01

    With the completion of the human and a few model organisms' genomes, and the genomes of many other organisms waiting to be sequenced, it has become increasingly important to develop faster computational tools which are capable of easily identifying the structures and extracting features from DNA sequences. One of the more important structures in a DNA sequence is repeat-related. Often they have to be masked before protein coding regions along a DNA sequence are to be identified or redundant expressed sequence tags (ESTs) are to be sequenced. Here we report a novel recurrence time based method for sequence analysis. The method can conveniently study all kinds of periodicity and exhaustively find all repeat-related features from a genomic DNA sequence. An efficient codon index is also derived from the recurrence time statistics, which has the salient features of being largely species-independent and working well on very short sequences. Efficient codon indices are key elements of successful gene finding algorithms, and are particularly useful for determining whether a suspected EST belongs to a coding or non-coding region. We illustrate the power of the method by studying the genomes of E. coli, the yeast S. cerevisiae, the nematode worm C. elegans, and the human, Homo sapiens. Computationally, our method is very efficient. It allows us to carry out analysis of genomes on the whole genomic scale on a PC.
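    One simplified reading of a recurrence time for a DNA sequence is the distance between consecutive occurrences of the same short word. The sketch below builds a histogram of those distances; a tandem repeat of period p then shows up as a peak at p. This is an assumption made for illustration, not the authors' method, and the function name `recurrence_times` is hypothetical.

```python
from collections import Counter


def recurrence_times(seq, k):
    """Histogram of distances between consecutive occurrences of each k-mer."""
    last_seen = {}          # k-mer -> index of its most recent occurrence
    histogram = Counter()   # recurrence time -> number of times observed
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if word in last_seen:
            histogram[i - last_seen[word]] += 1
        last_seen[word] = i
    return histogram


if __name__ == "__main__":
    # A perfect period-3 repeat: every recurrence time equals 3.
    print(recurrence_times("ACGACGACGACG", 3))
```

    The scan is a single pass with one dictionary lookup per position, which is in keeping with the abstract's point that genome-scale analysis of this kind is feasible on a PC.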

  17. The Transition from a Phytopathogenic Smut Ancestor to an Anamorphic Biocontrol Agent Deciphered by Comparative Whole-Genome Analysis

    PubMed Central

    Lefebvre, François; Joly, David L.; Labbé, Caroline; Teichmann, Beate; Linning, Rob; Belzile, François; Bakkeren, Guus; Bélanger, Richard R.

    2013-01-01

    Pseudozyma flocculosa is related to the model plant pathogen Ustilago maydis yet is not a phytopathogen but rather a biocontrol agent of powdery mildews; this relationship makes it unique for the study of the evolution of plant pathogenicity factors. The P. flocculosa genome of ∼23 Mb includes 6877 predicted protein coding genes. Genome features, including hallmarks of pathogenicity, are very similar in P. flocculosa and U. maydis, Sporisorium reilianum, and Ustilago hordei. Furthermore, P. flocculosa, a strict anamorph, revealed conserved and seemingly intact mating-type and meiosis loci typical of Ustilaginales. By contrast, we observed the loss of a specific subset of candidate secreted effector proteins reported to influence virulence in U. maydis as the singular divergence that could explain its nonpathogenic nature. These results suggest that P. flocculosa could have once been a virulent smut fungus that lost the specific effectors necessary for host compatibility. Interestingly, the biocontrol agent appears to have acquired genes encoding secreted proteins not found in the compared Ustilaginales, including necrosis-inducing-Phytophthora-protein- and Lysin-motif-containing proteins believed to have direct relevance to its lifestyle. The genome sequence should contribute to new insights into the subtle genetic differences that can lead to drastic changes in fungal pathogen lifestyles. PMID:23800965

  18. The Ditylenchus destructor genome provides new insights into the evolution of plant parasitic nematodes.

    PubMed

    Zheng, Jinshui; Peng, Donghai; Chen, Ling; Liu, Hualin; Chen, Feng; Xu, Mengci; Ju, Shouyong; Ruan, Lifang; Sun, Ming

    2016-07-27

    Plant-parasitic nematodes were found in 4 of the 12 clades of phylum Nematoda. These nematodes in different clades may have originated independently from their free-living fungivorous ancestors. However, the exact evolutionary process of these parasites is unclear. Here, we sequenced the genome of a migratory plant nematode, Ditylenchus destructor. We performed comparative genomics among the free-living nematode, Caenorhabditis elegans and all the plant nematodes with genome sequences available. We found that, compared with C. elegans, the core developmental control processes underwent heavy reduction, though most signal transduction pathways were conserved. We also found D. destructor contained more homologies of the key genes in the above processes than the other plant nematodes. We suggest that Ditylenchus spp. may be an intermediate evolutionary history stage from free-living nematodes that feed on fungi to obligate plant-parasitic nematodes. Based on the facts that D. destructor can feed on fungi and has a relatively short life cycle, and that it has similar features to both C. elegans and sedentary plant-parasitic nematodes from clade 12, we propose it as a new model to study the biology, biocontrol of plant nematodes and the interaction between nematodes and plants. © 2016 The Author(s).

  19. Complete Genome Sequence and Comparative Analysis of the Fish Pathogen Lactococcus garvieae

    PubMed Central

    Oshima, Kenshiro; Yoshizaki, Mariko; Kawanishi, Michiko; Nakaya, Kohei; Suzuki, Takehito; Miyauchi, Eiji; Ishii, Yasuo; Tanabe, Soichi; Murakami, Masaru; Hattori, Masahira

    2011-01-01

    Lactococcus garvieae causes fatal haemorrhagic septicaemia in fish such as yellowtail. The comparative analysis of genomes of a virulent strain Lg2 and a non-virulent strain ATCC 49156 of L. garvieae revealed that the two strains shared a high degree of sequence identity, but Lg2 had a 16.5-kb capsule gene cluster that is absent in ATCC 49156. The capsule gene cluster was composed of 15 genes, of which eight genes are highly conserved with those in exopolysaccharide biosynthesis gene cluster often found in Lactococcus lactis strains. Sequence analysis of the capsule gene cluster in the less virulent strain L. garvieae Lg2-S, Lg2-derived strain, showed that two conserved genes were disrupted by a single base pair deletion, respectively. These results strongly suggest that the capsule is crucial for virulence of Lg2. The capsule gene cluster of Lg2 may be a genomic island from several features such as the presence of insertion sequences flanked on both ends, different GC content from the chromosomal average, integration into the locus syntenic to other lactococcal genome sequences, and distribution in human gut microbiomes. The analysis also predicted other potential virulence factors such as haemolysin. The present study provides new insights into understanding of the virulence mechanisms of L. garvieae in fish. PMID:21829716

  20. Characterization of the complete chloroplast genome of the endangered species Carya sinensis (Juglandaceae)

    Treesearch

    Yiheng Hu; Xi Chen; Xiaojia Feng; Keith E. Woeste; Peng Zhao

    2016-01-01

    Carya sinensis (Chinese Hickory, beaked walnut, or beaked hickory) is an endangered species that needs urgent conservation action. Here, we reported the complete chloroplast (cp) genome sequence and the genomic features of the C. sinensis cp, which is the first complete cp genome of any member of Carya. The...

  1. The Genome of Winter Moth (Operophtera brumata) Provides a Genomic Perspective on Sexual Dimorphism and Phenology.

    PubMed

    Derks, Martijn F L; Smit, Sandra; Salis, Lucia; Schijlen, Elio; Bossers, Alex; Mateman, Christa; Pijl, Agata S; de Ridder, Dick; Groenen, Martien A M; Visser, Marcel E; Megens, Hendrik-Jan

    2015-07-29

    The winter moth (Operophtera brumata) belongs to one of the most species-rich families in Lepidoptera, the Geometridae (approximately 23,000 species). This family is of great economic importance as most species are herbivorous and capable of defoliating trees. Genome assembly of the winter moth allows the study of genes and gene families, such as the cytochrome P450 gene family, which is known to be vital in plant secondary metabolite detoxification and host-plant selection. It also enables exploration of the genomic basis for female brachyptery (wing reduction), a feature of sexual dimorphism in winter moth, and for seasonal timing, a trait extensively studied in this species. Here we present a reference genome for the winter moth, the first geometrid and largest sequenced Lepidopteran genome to date (638 Mb) including a set of 16,912 predicted protein-coding genes. This allowed us to assess the dynamics of evolution on a genome-wide scale using the P450 gene family. We also identified an expanded gene family potentially linked to female brachyptery, and annotated the genes involved in the circadian clock mechanism as main candidates for involvement in seasonal timing. The genome will contribute to Lepidopteran genomic resources and comparative genomics. In addition, the genome enhances our ability to understand the genetic and molecular basis of insect seasonal timing and thereby provides a reference for future evolutionary and population studies on the winter moth. © The Author(s) 2015. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.

  2. Dynamic Evolution of the Chloroplast Genome in the Green Algal Classes Pedinophyceae and Trebouxiophyceae.

    PubMed

    Turmel, Monique; Otis, Christian; Lemieux, Claude

    2015-07-01

    Previous studies of trebouxiophycean chloroplast genomes revealed little information regarding the evolutionary dynamics of this genome because taxon sampling was too sparse and the relationships between the sampled taxa were unknown. We recently sequenced the chloroplast genomes of 27 trebouxiophycean and 2 pedinophycean green algae to resolve the relationships among the main lineages recognized for the Trebouxiophyceae. These taxa and the previously sampled members of the Pedinophyceae and Trebouxiophyceae are included in the comparative chloroplast genome analysis we report here. The 38 genomes examined display considerable variability at all levels, except gene content. Our results highlight the high propensity of the rDNA-containing large inverted repeat (IR) to vary in size, gene content and gene order as well as the repeated losses it experienced during trebouxiophycean evolution. Of the seven predicted IR losses, one event demarcates a superclade of 11 taxa representing 5 late-diverging lineages. IR expansions/contractions account not only for changes in gene content in this region but also for changes in gene order and gene duplications. Inversions also led to gene rearrangements within the IR, including the reversal or disruption of the rDNA operon in some lineages. Most of the 20 IR-less genomes are more rearranged compared with their IR-containing homologs and tend to show an accelerated rate of sequence evolution. In the IR-less superclade, several ancestral operons were disrupted, a few genes were fragmented, and a subgroup of taxa features a G+C-biased nucleotide composition. Our analyses also unveiled putative cases of gene acquisitions through horizontal transfer. © The Author(s) 2015. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.

  3. A pan-genomic approach to understand the basis of host adaptation in Achromobacter.

    PubMed

    Jeukens, J; Freschi, L; Vincent, A T; Emond-Rheault, J G; Kukavica-Ibrulj, I; Charette, S J; Levesque, R C

    2017-04-05

    Over the past decade, there has been a rising interest in Achromobacter sp., an emerging opportunistic pathogen responsible for nosocomial and cystic fibrosis (CF) lung infections. Species of this genus are ubiquitous in the environment, can outcompete resident microbiota, and are resistant to commonly used disinfectants as well as antibiotics. Nevertheless, the Achromobacter genus suffers from difficulties in diagnosis, unresolved taxonomy and limited understanding of how it adapts to the CF lung, not to mention other host environments. The goals of this first genus-wide comparative genomics study were to clarify the taxonomy of this genus and identify genomic features associated with pathogenicity and host adaptation. This was done with a widely applicable approach based on pan-genome analysis. First, using all publicly available genomes, a combination of phylogenetic analysis based on 1,780 conserved genes with average nucleotide identity and accessory genome composition allowed the identification of a largely clinical lineage composed of A. xylosoxidans, A. insuavis, A. dolens and A. ruhlandii. Within this lineage, we identified 35 positively selected genes involved in metabolism, regulation and efflux-mediated antibiotic resistance. Second, resistome analysis showed that this clinical lineage carried additional antibiotic resistance genes compared to other isolates. Finally, we identified putative mobile elements that contribute 53% of the genus's resistome and support horizontal gene transfer between Achromobacter and other ecologically similar genera. This study provides strong phylogenetic and pan-genomic bases to motivate further research on Achromobacter, and contributes to the understanding of opportunistic pathogen evolution. © The Author(s) 2017. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.

  4. A Gene Gravity Model for the Evolution of Cancer Genomes: A Study of 3,000 Cancer Genomes across 9 Cancer Types.

    PubMed

    Cheng, Feixiong; Liu, Chuang; Lin, Chen-Ching; Zhao, Junfei; Jia, Peilin; Li, Wen-Hsiung; Zhao, Zhongming

    2015-09-01

    Cancer development and progression result from somatic evolution by an accumulation of genomic alterations. The effects of those alterations on the fitness of somatic cells lead to evolutionary adaptations such as increased cell proliferation, angiogenesis, and altered anticancer drug responses. However, there are few general mathematical models to quantitatively examine how perturbations of a single gene shape subsequent evolution of the cancer genome. In this study, we proposed the gene gravity model to study the evolution of cancer genomes by incorporating the genome-wide transcription and somatic mutation profiles of ~3,000 tumors across 9 cancer types from The Cancer Genome Atlas into a broad gene network. We found that somatic mutations of a cancer driver gene may drive cancer genome evolution by inducing mutations in other genes. This functional consequence is often generated by the combined effect of genetic and epigenetic (e.g., chromatin regulation) alterations. By quantifying cancer genome evolution using the gene gravity model, we identified six putative cancer genes (AHNAK, COL11A1, DDX3X, FAT4, STAG2, and SYNE1). The tumor genomes harboring the nonsynonymous somatic mutations in these genes had a higher mutation density at the genome level compared to the wild-type groups. Furthermore, we provided statistical evidence that hypermutation of cancer driver genes on inactive X chromosomes is a general feature in female cancer genomes. In summary, this study sheds light on the functional consequences and evolutionary characteristics of somatic mutations during tumorigenesis by propelling adaptive cancer genome evolution, which would provide new perspectives for cancer research and therapeutics.

  5. A Gene Gravity Model for the Evolution of Cancer Genomes: A Study of 3,000 Cancer Genomes across 9 Cancer Types

    PubMed Central

    Lin, Chen-Ching; Zhao, Junfei; Jia, Peilin; Li, Wen-Hsiung; Zhao, Zhongming

    2015-01-01

    Cancer development and progression result from somatic evolution by an accumulation of genomic alterations. The effects of those alterations on the fitness of somatic cells lead to evolutionary adaptations such as increased cell proliferation, angiogenesis, and altered anticancer drug responses. However, there are few general mathematical models to quantitatively examine how perturbations of a single gene shape subsequent evolution of the cancer genome. In this study, we proposed the gene gravity model to study the evolution of cancer genomes by incorporating the genome-wide transcription and somatic mutation profiles of ~3,000 tumors across 9 cancer types from The Cancer Genome Atlas into a broad gene network. We found that somatic mutations of a cancer driver gene may drive cancer genome evolution by inducing mutations in other genes. This functional consequence is often generated by the combined effect of genetic and epigenetic (e.g., chromatin regulation) alterations. By quantifying cancer genome evolution using the gene gravity model, we identified six putative cancer genes (AHNAK, COL11A1, DDX3X, FAT4, STAG2, and SYNE1). The tumor genomes harboring the nonsynonymous somatic mutations in these genes had a higher mutation density at the genome level compared to the wild-type groups. Furthermore, we provided statistical evidence that hypermutation of cancer driver genes on inactive X chromosomes is a general feature in female cancer genomes. In summary, this study sheds light on the functional consequences and evolutionary characteristics of somatic mutations during tumorigenesis by propelling adaptive cancer genome evolution, which would provide new perspectives for cancer research and therapeutics. PMID:26352260

  6. Structural and functional analysis of the finished genome of the recently isolated toxic Anabaena sp. WA102.

    PubMed

    Brown, Nathan M; Mueller, Ryan S; Shepardson, Jonathan W; Landry, Zachary C; Morré, Jeffrey T; Maier, Claudia S; Hardy, F Joan; Dreher, Theo W

    2016-06-13

    Very few closed genomes of the cyanobacteria that commonly produce toxic blooms in lakes and reservoirs are available, limiting our understanding of the properties of these organisms. A new anatoxin-a-producing member of the Nostocaceae, Anabaena sp. WA102, was isolated from a freshwater lake in Washington State, USA, in 2013 and maintained in non-axenic culture. The Anabaena sp. WA102 5.7 Mbp genome assembly has been closed with long-read, single-molecule sequencing and separately a draft genome assembly has been produced with short-read sequencing technology. The closed and draft genome assemblies are compared, showing a correlation between long repeats in the genome and the many gaps in the short-read assembly. Anabaena sp. WA102 encodes anatoxin-a biosynthetic genes, as does its close relative Anabaena sp. AL93 (also introduced in this study). These strains are distinguished by differences in the genes for light-harvesting phycobilins, with Anabaena sp. AL93 possessing a phycoerythrocyanin operon. Biologically relevant structural variants in the Anabaena sp. WA102 genome were detected only by long-read sequencing: a tandem triplication of the anaBCD promoter region in the anatoxin-a synthase gene cluster (not triplicated in Anabaena sp. AL93) and a 5-kbp deletion variant present in two-thirds of the population. The genome has a large number of mobile elements (160). Strikingly, there was no synteny with the genome of its nearest fully assembled relative, Anabaena sp. 90. Structural and functional genome analyses indicate that Anabaena sp. WA102 has a flexible genome. Genome closure, which can be readily achieved with long-read sequencing, reveals large scale (e.g., gene order) and local structural features that should be considered in understanding genome evolution and function.

  7. Gene calling and bacterial genome annotation with BG7.

    PubMed

    Tobes, Raquel; Pareja-Tobes, Pablo; Manrique, Marina; Pareja-Tobes, Eduardo; Kovach, Evdokim; Alekhin, Alexey; Pareja, Eduardo

    2015-01-01

    New massive sequencing technologies are providing many bacterial genome sequences from diverse taxa but a refined annotation of these genomes is crucial for obtaining scientific findings and new knowledge. Thus, bacterial genome annotation has emerged as a key point to investigate in bacteria. Any efficient tool designed specifically to annotate bacterial genomes sequenced with massively parallel technologies has to consider the specific features of bacterial genomes (absence of introns and scarcity of nonprotein-coding sequence) and of next-generation sequencing (NGS) technologies (presence of errors and not perfectly assembled genomes). These features make it convenient to focus on coding regions and, hence, on protein sequences that are the elements directly related with biological functions. In this chapter we describe how to annotate bacterial genomes with BG7, an open-source tool based on a protein-centered gene calling/annotation paradigm. BG7 is specifically designed for the annotation of bacterial genomes sequenced with NGS. This tool is tolerant of sequence errors and maintains its capabilities for the annotation of highly fragmented genomes or for annotating mixed sequences coming from several genomes (such as those obtained from metagenomics samples). BG7 has been designed with scalability as a requirement, with a computing infrastructure completely based on cloud computing (Amazon Web Services).

  8. Genome-wide study of correlations between genomic features and their relationship with the regulation of gene expression.

    PubMed

    Kravatsky, Yuri V; Chechetkin, Vladimir R; Tchurikov, Nikolai A; Kravatskaya, Galina I

    2015-02-01

    The broad class of tasks in genetics and epigenetics can be reduced to the study of various features that are distributed over the genome (genome tracks). The rapid and efficient processing of the huge amount of data stored in the genome-scale databases cannot be achieved without the software packages based on the analytical criteria. However, strong inhomogeneity of genome tracks hampers the development of relevant statistics. We developed the criteria for the assessment of genome track inhomogeneity and correlations between two genome tracks. We also developed a software package, Genome Track Analyzer, based on this theory. The theory and software were tested on simulated data and were applied to the study of correlations between CpG islands and transcription start sites in the Homo sapiens genome, between profiles of protein-binding sites in chromosomes of Drosophila melanogaster, and between DNA double-strand breaks and histone marks in the H. sapiens genome. Significant correlations between transcription start sites on the forward and the reverse strands were observed in genomes of D. melanogaster, Caenorhabditis elegans, Mus musculus, H. sapiens, and Danio rerio. The observed correlations may be related to the regulation of gene expression in eukaryotes. Genome Track Analyzer is freely available at http://ancorr.eimb.ru/. © The Author 2015. Published by Oxford University Press on behalf of Kazusa DNA Research Institute.

  9. Weighted Feature Significance: A Simple, Interpretable Model of Compound Toxicity Based on the Statistical Enrichment of Structural Features

    PubMed Central

    Huang, Ruili; Southall, Noel; Xia, Menghang; Cho, Ming-Hsuang; Jadhav, Ajit; Nguyen, Dac-Trung; Inglese, James; Tice, Raymond R.; Austin, Christopher P.

    2009-01-01

    In support of the U.S. Tox21 program, we have developed a simple and chemically intuitive model we call weighted feature significance (WFS) to predict the toxicological activity of compounds, based on the statistical enrichment of structural features in toxic compounds. We trained and tested the model on the following: (1) data from quantitative high-throughput screening cytotoxicity and caspase activation assays conducted at the National Institutes of Health Chemical Genomics Center, (2) data from Salmonella typhimurium reverse mutagenicity assays conducted by the U.S. National Toxicology Program, and (3) hepatotoxicity data published in the Registry of Toxic Effects of Chemical Substances. Enrichments of structural features in toxic compounds are evaluated for their statistical significance and compiled into a simple additive model of toxicity and then used to score new compounds for potential toxicity. The predictive power of the model for cytotoxicity was validated using an independent set of compounds from the U.S. Environmental Protection Agency tested also at the National Institutes of Health Chemical Genomics Center. We compared the performance of our WFS approach with classical classification methods such as Naive Bayesian clustering and support vector machines. In most test cases, WFS showed similar or slightly better predictive power, especially in the prediction of hepatotoxic compounds, where WFS appeared to have the best performance among the three methods. The new algorithm has the important advantages of simplicity, power, interpretability, and ease of implementation. PMID:19805409

  10. A multigene locus containing the Manx and bobcat genes is required for development of chordate features in the ascidian tadpole larva.

    PubMed

    Swalla, B J; Just, M A; Pederson, E L; Jeffery, W R

    1999-04-01

    The Manx gene is required for the development of the tail and other chordate features in the ascidian tadpole larva. To determine the structure of the Manx gene, we isolated and sequenced genomic clones from the tailed ascidian Molgula oculata. The Manx gene contains 9 exons and encodes both major and minor Manx mRNAs, which differ in the length of their 5' untranslated regions. The coding region of the single-copy bobcat gene, which encodes a DEAD-box RNA helicase, is embedded within the first Manx intron. The organization of the bobcat and Manx transcription units was determined by comparing genomic and cDNA clones. The Manx-bobcat gene locus has an unusual organization in which a non-coding first exon is alternatively spliced at the 5' end of two different mRNAs. The bobcat and Manx genes are expressed coordinately during oogenesis and embryogenesis, but not during spermatogenesis, in which bobcat mRNA accumulates independently of Manx mRNA. Similar to Manx, zygotic bobcat transcripts accumulate in the embryonic primordia responsible for generating chordate features, including the dorsal neural tube and notochord, are downregulated during embryogenesis in the tailless species Molgula occulta and are upregulated in M. occulta X M. oculata hybrids, which restore these chordate features. Antisense experiments indicate that zygotic bobcat expression is required for development of the same suite of chordate features as Manx. The results show that the Manx-bobcat gene complex has a role in the development of chordate features in ascidian tadpole larvae.

  11. The mitochondrial genome of the pathogenic yeast Candida subhashii: GC-rich linear DNA with a protein covalently attached to the 5′ termini

    PubMed Central

    Fricova, Dominika; Valach, Matus; Farkas, Zoltan; Pfeiffer, Ilona; Kucsera, Judit; Tomaska, Lubomir; Nosek, Jozef

    2010-01-01

    As a part of our initiative aimed at a large-scale comparative analysis of fungal mitochondrial genomes, we determined the complete DNA sequence of the mitochondrial genome of the yeast Candida subhashii and found that it exhibits a number of peculiar features. First, the mitochondrial genome is represented by linear dsDNA molecules of uniform length (29 795 bp), with an unusually high content of guanine and cytosine residues (52.7 %). Second, the coding sequences lack introns; thus, the genome has a relatively compact organization. Third, the termini of the linear molecules consist of long inverted repeats and seem to contain a protein covalently bound to terminal nucleotides at the 5′ ends. This architecture resembles the telomeres in a number of linear viral and plasmid DNA genomes classified as invertrons, in which the terminal proteins serve as specific primers for the initiation of DNA synthesis. Finally, although the mitochondrial genome of C. subhashii contains essentially the same set of genes as other closely related pathogenic Candida species, we identified additional ORFs encoding two homologues of the family B protein-priming DNA polymerases and an unknown protein. The terminal structures and the genes for DNA polymerases are reminiscent of linear mitochondrial plasmids, indicating that this genome architecture might have emerged from fortuitous recombination between an ancestral, presumably circular, mitochondrial genome and an invertron-like element. PMID:20395267

  12. The Complete Chloroplast Genome of Catha edulis: A Comparative Analysis of Genome Features with Related Species

    PubMed Central

    Tembrock, Luke R.; Zheng, Shaoyu; Wu, Zhiqiang

    2018-01-01

    Qat (Catha edulis, Celastraceae) is a woody evergreen species with great economic and cultural importance. It is cultivated for its stimulant alkaloids cathine and cathinone in East Africa and southwest Arabia. However, genome information, especially DNA sequence resources, for C. edulis is limited, hindering studies regarding interspecific and intraspecific relationships. Herein, the complete chloroplast (cp) genome of Catha edulis is reported. This genome is 157,960 bp in length with 37% GC content and is structurally arranged into two 26,577 bp inverted repeats and two single-copy areas. The sizes of the small single-copy and the large single-copy regions were 18,491 bp and 86,315 bp, respectively. The C. edulis cp genome consists of 129 coding genes including 37 transfer RNA (tRNA) genes, 8 ribosomal RNA (rRNA) genes, and 84 protein coding genes. Of those genes, 112 are single-copy genes and 17 are duplicated in the two inverted repeat regions (seven tRNA, four rRNA, and six protein coding genes). The phylogenetic relationships resolved from the cp genome of qat and 32 other species confirm the monophyly of Celastraceae. The cp genomes of C. edulis, Euonymus japonicus and seven Celastraceae species lack the rps16 intron, which indicates that an intron loss took place in an ancestor of this family. The cp genome of C. edulis provides a highly valuable genetic resource for further phylogenomic research, barcoding and cp transformation in Celastraceae. PMID:29425128
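
The figures reported in this abstract are internally consistent, which can be checked with a few lines of arithmetic. The sketch below simply re-adds the region lengths and gene counts quoted above; the variable names are illustrative and nothing here comes from the paper beyond the numbers in the abstract.

```python
# Region lengths in bp, as quoted in the abstract.
inverted_repeat = 26_577      # present twice
small_single_copy = 18_491
large_single_copy = 86_315

total_length = 2 * inverted_repeat + small_single_copy + large_single_copy

# Gene counts by type, as quoted in the abstract.
trna_genes, rrna_genes, protein_coding_genes = 37, 8, 84
gene_total = trna_genes + rrna_genes + protein_coding_genes

# Breakdown by copy number, as quoted in the abstract.
single_copy_genes = 112
duplicated_genes = 7 + 4 + 6  # tRNA + rRNA + protein coding

print(total_length)                           # quadripartite structure sums to the genome length
print(gene_total)                             # counts by gene type
print(single_copy_genes + duplicated_genes)   # counts by copy number
```

Both ways of counting genes give the stated total of 129, and the four structural regions add up to the stated genome length of 157,960 bp.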

  13. Comparative Genomic and Morphological Analyses of Listeria Phages Isolated from Farm Environments

    PubMed Central

    Denes, Thomas; Ackermann, Hans-Wolfgang; Moreno Switt, Andrea I.; Wiedmann, Martin; den Bakker, Henk C.

    2014-01-01

    The genus Listeria is ubiquitous in the environment and includes the globally important food-borne pathogen Listeria monocytogenes. While the genomic diversity of Listeria has been well studied, considerably less is known about the genomic and morphological diversity of Listeria bacteriophages. In this study, we sequenced and analyzed the genomes of 14 Listeria phages isolated mostly from New York dairy farm environments as well as one related Enterococcus faecalis phage to obtain information on genome characteristics and diversity. We also examined 12 of the phages by electron microscopy to characterize their morphology. These Listeria phages, based on gene orthology and morphology, together with previously sequenced Listeria phages could be classified into five orthoclusters, including one novel orthocluster. One orthocluster (orthocluster I) consists of large-genome (∼135-kb) myoviruses belonging to the genus “Twort-like viruses,” three orthoclusters (orthoclusters II to IV) contain small-genome (36- to 43-kb) siphoviruses with icosahedral heads, and the novel orthocluster V contains medium-sized-genome (∼66-kb) siphoviruses with elongated heads. A novel orthocluster (orthocluster VI) of E. faecalis phages, with medium-sized genomes (∼56 kb), was identified, which grouped together and shares morphological features with the novel Listeria phage orthocluster V. This new group of phages (i.e., orthoclusters V and VI) is composed of putative lytic phages that may prove to be useful in phage-based applications for biocontrol, detection, and therapeutic purposes. PMID:24837381

  14. Genome Sequence of Torulaspora delbrueckii NRRL Y-50541, Isolated from Mezcal Fermentation

    PubMed Central

    Gomez-Angulo, Jorge; Vega-Alvarado, Leticia; Escalante-García, Zazil; Grande, Ricardo; Gschaedler-Mathis, Anne; Amaya-Delgado, Lorena

    2015-01-01

    Torulaspora delbrueckii presents metabolic features interesting for biotechnological applications (in the dairy and wine industries). Recently, the T. delbrueckii CBS 1146 genome, which has been maintained under laboratory conditions since 1970, was published. Thus, a genome of a new mezcal yeast was sequenced and characterized and showed genetic differences and a higher genome assembly quality, offering a better reference genome. PMID:26205871

  15. Methylation guide RNA evolution in archaea: structure, function and genomic organization of 110 C/D box sRNA families across six Pyrobaculum species.

    PubMed

    Lui, Lauren M; Uzilov, Andrew V; Bernick, David L; Corredor, Andrea; Lowe, Todd M; Dennis, Patrick P

    2018-05-16

    Archaeal homologs of eukaryotic C/D box small nucleolar RNAs (C/D box sRNAs) guide precise 2'-O-methyl modification of ribosomal and transfer RNAs. Although C/D box sRNA genes constitute one of the largest RNA gene families in archaeal thermophiles, most genomes have incomplete sRNA gene annotation because reliable, fully automated detection methods are not available. We expanded and curated a comprehensive gene set across six species of the crenarchaeal genus Pyrobaculum, particularly rich in C/D box sRNA genes. Using high-throughput small RNA sequencing, specialized computational searches and comparative genomics, we analyzed 526 Pyrobaculum C/D box sRNAs, organizing them into 110 families based on synteny and conservation of guide sequences which determine methylation targets. We examined gene duplications and rearrangements, including one family that has expanded in a pattern similar to retrotransposed repetitive elements in eukaryotes. New training data and inclusion of kink-turn secondary structural features enabled creation of an improved search model. Our analyses provide the most comprehensive, dynamic view of C/D box sRNA evolutionary history within a genus, in terms of modification function, feature plasticity, and gene mobility.

  16. Comparative and demographic analysis of orangutan genomes

    PubMed Central

    Locke, Devin P.; Hillier, LaDeana W.; Warren, Wesley C.; Worley, Kim C.; Nazareth, Lynne V.; Muzny, Donna M.; Yang, Shiaw-Pyng; Wang, Zhengyuan; Chinwalla, Asif T.; Minx, Pat; Mitreva, Makedonka; Cook, Lisa; Delehaunty, Kim D.; Fronick, Catrina; Schmidt, Heather; Fulton, Lucinda A.; Fulton, Robert S.; Nelson, Joanne O.; Magrini, Vincent; Pohl, Craig; Graves, Tina A.; Markovic, Chris; Cree, Andy; Dinh, Huyen H.; Hume, Jennifer; Kovar, Christie L.; Fowler, Gerald R.; Lunter, Gerton; Meader, Stephen; Heger, Andreas; Ponting, Chris P.; Marques-Bonet, Tomas; Alkan, Can; Chen, Lin; Cheng, Ze; Kidd, Jeffrey M.; Eichler, Evan E.; White, Simon; Searle, Stephen; Vilella, Albert J.; Chen, Yuan; Flicek, Paul; Ma, Jian; Raney, Brian; Suh, Bernard; Burhans, Richard; Herrero, Javier; Haussler, David; Faria, Rui; Fernando, Olga; Darré, Fleur; Farré, Domènec; Gazave, Elodie; Oliva, Meritxell; Navarro, Arcadi; Roberto, Roberta; Capozzi, Oronzo; Archidiacono, Nicoletta; Valle, Giuliano Della; Purgato, Stefania; Rocchi, Mariano; Konkel, Miriam K.; Walker, Jerilyn A.; Ullmer, Brygg; Batzer, Mark A.; Smit, Arian F. A.; Hubley, Robert; Casola, Claudio; Schrider, Daniel R.; Hahn, Matthew W.; Quesada, Victor; Puente, Xose S.; Ordoñez, Gonzalo R.; López-Otín, Carlos; Vinar, Tomas; Brejova, Brona; Ratan, Aakrosh; Harris, Robert S.; Miller, Webb; Kosiol, Carolin; Lawson, Heather A.; Taliwal, Vikas; Martins, André L.; Siepel, Adam; RoyChoudhury, Arindam; Ma, Xin; Degenhardt, Jeremiah; Bustamante, Carlos D.; Gutenkunst, Ryan N.; Mailund, Thomas; Dutheil, Julien Y.; Hobolth, Asger; Schierup, Mikkel H.; Chemnick, Leona; Ryder, Oliver A.; Yoshinaga, Yuko; de Jong, Pieter J.; Weinstock, George M.; Rogers, Jeffrey; Mardis, Elaine R.; Gibbs, Richard A.; Wilson, Richard K.

    2011-01-01

    “Orangutan” is derived from the Malay term “man of the forest” and aptly describes the Southeast Asian great apes native to Sumatra and Borneo. The orangutan species, Pongo abelii (Sumatran) and Pongo pygmaeus (Bornean), are the most phylogenetically distant great apes from humans, thereby providing an informative perspective on hominid evolution. Here we present a Sumatran orangutan draft genome assembly and short read sequence data from five Sumatran and five Bornean orangutan genomes. Our analyses reveal that, compared to other primates, the orangutan genome has many unique features. Structural evolution of the orangutan genome has proceeded much more slowly than other great apes, evidenced by fewer rearrangements, less segmental duplication, a lower rate of gene family turnover and surprisingly quiescent Alu repeats, which have played a major role in restructuring other primate genomes. We also describe the first primate polymorphic neocentromere, found in both Pongo species, emphasizing the gradual evolution of orangutan genome structure. Orangutans have extremely low energy usage for a eutherian mammal, far lower than their hominid relatives. Adding their genome to the repertoire of sequenced primates illuminates new signals of positive selection in several pathways including glycolipid metabolism. From the population perspective, both Pongo species are deeply diverse; however, Sumatran individuals possess greater diversity than their Bornean counterparts, and more species-specific variation. Our estimate of Bornean/Sumatran speciation time, 400k years ago (ya), is more recent than most previous studies and underscores the complexity of the orangutan speciation process. Despite a smaller modern census population size, the Sumatran effective population size (Ne) expanded exponentially relative to the ancestral Ne after the split, while Bornean Ne declined over the same period. Overall, the resources and analyses presented here offer new opportunities in evolutionary genomics, insights into hominid biology, and an extensive database of variation for conservation efforts. PMID:21270892

  17. Genus-Wide Comparative Genomics of Malassezia Delineates Its Phylogeny, Physiology, and Niche Adaptation on Human Skin

    PubMed Central

    Wu, Guangxi; Zhao, He; Li, Chenhao; Rajapakse, Menaka Priyadarsani; Wong, Wing Cheong; Xu, Jun; Saunders, Charles W.; Reeder, Nancy L.; Reilman, Raymond A.; Scheynius, Annika; Sun, Sheng; Billmyre, Blake Robert; Li, Wenjun; Averette, Anna Floyd; Mieczkowski, Piotr; Heitman, Joseph; Theelen, Bart; Schröder, Markus S.; De Sessions, Paola Florez; Butler, Geraldine; Maurer-Stroh, Sebastian; Boekhout, Teun; Nagarajan, Niranjan; Dawson, Thomas L.

    2015-01-01

    Malassezia is a unique lipophilic genus in class Malasseziomycetes in Ustilaginomycotina, (Basidiomycota, fungi) that otherwise consists almost exclusively of plant pathogens. Malassezia are typically isolated from warm-blooded animals, are dominant members of the human skin mycobiome and are associated with common skin disorders. To characterize the genetic basis of the unique phenotypes of Malassezia spp., we sequenced the genomes of all 14 accepted species and used comparative genomics against a broad panel of fungal genomes to comprehensively identify distinct features that define the Malassezia gene repertoire: gene gain and loss; selection signatures; and lineage-specific gene family expansions. Our analysis revealed key gene gain events (64) with a single gene conserved across all Malassezia but absent in all other sequenced Basidiomycota. These likely horizontally transferred genes provide intriguing gain-of-function events and prime candidates to explain the emergence of Malassezia. A larger set of genes (741) were lost, with enrichment for glycosyl hydrolases and carbohydrate metabolism, concordant with adaptation to skin’s carbohydrate-deficient environment. Gene family analysis revealed extensive turnover and underlined the importance of secretory lipases, phospholipases, aspartyl proteases, and other peptidases. Combining genomic analysis with a re-evaluation of culture characteristics, we establish the likely lipid-dependence of all Malassezia. Our phylogenetic analysis sheds new light on the relationship between Malassezia and other members of Ustilaginomycotina, as well as phylogenetic lineages within the genus. Overall, our study provides a unique genomic resource for understanding Malassezia niche-specificity and potential virulence, as well as their abundance and distribution in the environment and on human skin. PMID:26539826

  18. Lifestyle Evolution in Cyanobacterial Symbionts of Sponges

    PubMed Central

    Burgsdorf, Ilia; Slaby, Beate M.; Handley, Kim M.; Haber, Markus; Blom, Jochen; Marshall, Christopher W.; Gilbert, Jack A.; Hentschel, Ute

    2015-01-01

    The “Candidatus Synechococcus spongiarum” group includes different clades of cyanobacteria with high 16S rRNA sequence identity (~99%) and is the most abundant and widespread cyanobacterial symbiont of marine sponges. The first draft genome of a “Ca. Synechococcus spongiarum” group member was recently published, providing evidence of genome reduction by loss of genes involved in several nonessential functions. However, “Ca. Synechococcus spongiarum” includes a variety of clades that may differ widely in genomic repertoire and consequently in physiology and symbiotic function. Here, we present three additional draft genomes of “Ca. Synechococcus spongiarum,” each from a different clade. By comparing all four symbiont genomes to those of free-living cyanobacteria, we revealed general adaptations to life inside sponges and specific adaptations of each phylotype. Symbiont genomes shared about half of their total number of coding genes. Common traits of “Ca. Synechococcus spongiarum” members were a high abundance of DNA modification and recombination genes and a reduction in genes involved in inorganic ion transport and metabolism, cell wall biogenesis, and signal transduction mechanisms. Moreover, these symbionts were characterized by a reduced number of antioxidant enzymes and low-weight peptides of photosystem II compared to their free-living relatives. Variability within the “Ca. Synechococcus spongiarum” group was mostly related to immune system features, potential for siderophore-mediated iron transport, and dependency on methionine from external sources. The common absence of genes involved in synthesis of residues, typical of the O antigen of free-living Synechococcus species, suggests a novel mechanism utilized by these symbionts to avoid sponge predation and phage attack. PMID:26037118

  19. Lifestyle Evolution in Cyanobacterial Symbionts of Sponges

    DOE PAGES

    Burgsdorf, Ilia; Slaby, Beate M.; Handley, Kim M.; ...

    2015-06-02

    The “Candidatus Synechococcus spongiarum” group includes different clades of cyanobacteria with high 16S rRNA sequence identity (~99%) and is the most abundant and widespread cyanobacterial symbiont of marine sponges. The first draft genome of a “Ca. Synechococcus spongiarum” group member was recently published, providing evidence of genome reduction by loss of genes involved in several nonessential functions. However, “Ca. Synechococcus spongiarum” includes a variety of clades that may differ widely in genomic repertoire and consequently in physiology and symbiotic function. Here, we present three additional draft genomes of “Ca. Synechococcus spongiarum,” each from a different clade. By comparing all four symbiont genomes to those of free-living cyanobacteria, we revealed general adaptations to life inside sponges and specific adaptations of each phylotype. Symbiont genomes shared about half of their total number of coding genes. Common traits of “Ca. Synechococcus spongiarum” members were a high abundance of DNA modification and recombination genes and a reduction in genes involved in inorganic ion transport and metabolism, cell wall biogenesis, and signal transduction mechanisms. Moreover, these symbionts were characterized by a reduced number of antioxidant enzymes and low-weight peptides of photosystem II compared to their free-living relatives. Variability within the “Ca. Synechococcus spongiarum” group was mostly related to immune system features, potential for siderophore-mediated iron transport, and dependency on methionine from external sources. The common absence of genes involved in synthesis of residues, typical of the O antigen of free-living Synechococcus species, suggests a novel mechanism utilized by these symbionts to avoid sponge predation and phage attack.

  20. Genus-Wide Comparative Genomics of Malassezia Delineates Its Phylogeny, Physiology, and Niche Adaptation on Human Skin.

    PubMed

    Wu, Guangxi; Zhao, He; Li, Chenhao; Rajapakse, Menaka Priyadarsani; Wong, Wing Cheong; Xu, Jun; Saunders, Charles W; Reeder, Nancy L; Reilman, Raymond A; Scheynius, Annika; Sun, Sheng; Billmyre, Blake Robert; Li, Wenjun; Averette, Anna Floyd; Mieczkowski, Piotr; Heitman, Joseph; Theelen, Bart; Schröder, Markus S; De Sessions, Paola Florez; Butler, Geraldine; Maurer-Stroh, Sebastian; Boekhout, Teun; Nagarajan, Niranjan; Dawson, Thomas L

    2015-11-01

    Malassezia is a unique lipophilic genus in class Malasseziomycetes in Ustilaginomycotina, (Basidiomycota, fungi) that otherwise consists almost exclusively of plant pathogens. Malassezia are typically isolated from warm-blooded animals, are dominant members of the human skin mycobiome and are associated with common skin disorders. To characterize the genetic basis of the unique phenotypes of Malassezia spp., we sequenced the genomes of all 14 accepted species and used comparative genomics against a broad panel of fungal genomes to comprehensively identify distinct features that define the Malassezia gene repertoire: gene gain and loss; selection signatures; and lineage-specific gene family expansions. Our analysis revealed key gene gain events (64) with a single gene conserved across all Malassezia but absent in all other sequenced Basidiomycota. These likely horizontally transferred genes provide intriguing gain-of-function events and prime candidates to explain the emergence of Malassezia. A larger set of genes (741) were lost, with enrichment for glycosyl hydrolases and carbohydrate metabolism, concordant with adaptation to skin's carbohydrate-deficient environment. Gene family analysis revealed extensive turnover and underlined the importance of secretory lipases, phospholipases, aspartyl proteases, and other peptidases. Combining genomic analysis with a re-evaluation of culture characteristics, we establish the likely lipid-dependence of all Malassezia. Our phylogenetic analysis sheds new light on the relationship between Malassezia and other members of Ustilaginomycotina, as well as phylogenetic lineages within the genus. Overall, our study provides a unique genomic resource for understanding Malassezia niche-specificity and potential virulence, as well as their abundance and distribution in the environment and on human skin.

  1. Lifestyle Evolution in Cyanobacterial Symbionts of Sponges

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Burgsdorf, Ilia; Slaby, Beate M.; Handley, Kim M.

    The “Candidatus Synechococcus spongiarum” group includes different clades of cyanobacteria with high 16S rRNA sequence identity (~99%) and is the most abundant and widespread cyanobacterial symbiont of marine sponges. The first draft genome of a “Ca. Synechococcus spongiarum” group member was recently published, providing evidence of genome reduction by loss of genes involved in several nonessential functions. However, “Ca. Synechococcus spongiarum” includes a variety of clades that may differ widely in genomic repertoire and consequently in physiology and symbiotic function. Here, we present three additional draft genomes of “Ca. Synechococcus spongiarum,” each from a different clade. By comparing all four symbiont genomes to those of free-living cyanobacteria, we revealed general adaptations to life inside sponges and specific adaptations of each phylotype. Symbiont genomes shared about half of their total number of coding genes. Common traits of “Ca. Synechococcus spongiarum” members were a high abundance of DNA modification and recombination genes and a reduction in genes involved in inorganic ion transport and metabolism, cell wall biogenesis, and signal transduction mechanisms. Moreover, these symbionts were characterized by a reduced number of antioxidant enzymes and low-weight peptides of photosystem II compared to their free-living relatives. Variability within the “Ca. Synechococcus spongiarum” group was mostly related to immune system features, potential for siderophore-mediated iron transport, and dependency on methionine from external sources. The common absence of genes involved in synthesis of residues, typical of the O antigen of free-living Synechococcus species, suggests a novel mechanism utilized by these symbionts to avoid sponge predation and phage attack.

  2. Proteiniphilum saccharofermentans str. M3/6T isolated from a laboratory biogas reactor is versatile in polysaccharide and oligopeptide utilization as deduced from genome-based metabolic reconstructions.

    PubMed

    Tomazetto, Geizecler; Hahnke, Sarah; Wibberg, Daniel; Pühler, Alfred; Klocke, Michael; Schlüter, Andreas

    2018-06-01

    Proteiniphilum saccharofermentans str. M3/6T is a recently described species within the family Porphyromonadaceae (phylum Bacteroidetes), which was isolated from a mesophilic laboratory-scale biogas reactor. The genome of the strain was completely sequenced and manually annotated to reconstruct its metabolic potential regarding biomass degradation and fermentation pathways. The P. saccharofermentans str. M3/6T genome consists of a 4,414,963 bp chromosome featuring an average GC-content of 43.63%. Genome analyses revealed that the strain possesses 3396 protein-coding sequences. Among them are 158 genes assigned to the carbohydrate-active-enzyme families as defined by the CAZy database, including 116 genes encoding glycosyl hydrolases (GHs) involved in pectin, arabinogalactan, hemicellulose (arabinan, xylan, mannan, β-glucans), starch, fructan and chitin degradation. The strain also features several transporter genes, some of which are located in polysaccharide utilization loci (PUL). PUL gene products are involved in glycan binding, transport and utilization at the cell surface. In the genome of strain M3/6T, 64 PUL are present and most of them in association with genes encoding carbohydrate-active enzymes. Accordingly, the strain was predicted to metabolize several sugars yielding carbon dioxide, hydrogen, acetate, formate, propionate and isovalerate as end-products of the fermentation process. Moreover, P. saccharofermentans str. M3/6T encodes extracellular and intracellular proteases and transporters predicted to be involved in protein and oligopeptide degradation. Comparative analyses between P. saccharofermentans str. M3/6T and its closest described relative P. acetatigenes str. DSM 18083T indicate that both strains share a similar metabolism regarding decomposition of complex carbohydrates and fermentation of sugars.

  3. Camelid genomes reveal evolution and adaptation to desert environments.

    PubMed

    Wu, Huiguang; Guang, Xuanmin; Al-Fageeh, Mohamed B; Cao, Junwei; Pan, Shengkai; Zhou, Huanmin; Zhang, Li; Abutarboush, Mohammed H; Xing, Yanping; Xie, Zhiyuan; Alshanqeeti, Ali S; Zhang, Yanru; Yao, Qiulin; Al-Shomrani, Badr M; Zhang, Dong; Li, Jiang; Manee, Manee M; Yang, Zili; Yang, Linfeng; Liu, Yiyi; Zhang, Jilin; Altammami, Musaad A; Wang, Shenyuan; Yu, Lili; Zhang, Wenbin; Liu, Sanyang; Ba, La; Liu, Chunxia; Yang, Xukui; Meng, Fanhua; Wang, Shaowei; Li, Lu; Li, Erli; Li, Xueqiong; Wu, Kaifeng; Zhang, Shu; Wang, Junyi; Yin, Ye; Yang, Huanming; Al-Swailem, Abdulaziz M; Wang, Jun

    2014-10-21

    Bactrian camel (Camelus bactrianus), dromedary (Camelus dromedarius) and alpaca (Vicugna pacos) are economically important livestock. Although the Bactrian camel and dromedary are large, typically arid-desert-adapted mammals, alpacas are adapted to plateaus. Here we present high-quality genome sequences of these three species. Our analysis reveals the demographic history of these species since the Tortonian Stage of the Miocene and uncovers a striking correlation between large fluctuations in population size and geological time boundaries. Comparative genomic analysis reveals complex features related to desert adaptations, including fat and water metabolism, stress responses to heat, aridity, intense ultraviolet radiation and choking dust. Transcriptomic analysis of Bactrian camels further reveals unique osmoregulation, osmoprotection and compensatory mechanisms for water reservation underpinned by high blood glucose levels. We hypothesize that these physiological mechanisms represent kidney evolutionary adaptations to the desert environment. This study advances our understanding of camelid evolution and the adaptation of camels to arid-desert environments.

  4. Secondary structure of the 3'-noncoding region of flavivirus genomes: comparative analysis of base pairing probabilities.

    PubMed

    Rauscher, S; Flamm, C; Mandl, C W; Heinz, F X; Stadler, P F

    1997-07-01

    The prediction of the complete matrix of base pairing probabilities was applied to the 3' noncoding region (NCR) of flavivirus genomes. This approach identifies not only well-defined secondary structure elements, but also regions of high structural flexibility. Flaviviruses, many of which are important human pathogens, have a common genomic organization, but exhibit a significant degree of RNA sequence diversity in the functionally important 3'-NCR. We demonstrate the presence of secondary structures shared by all flaviviruses, as well as structural features that are characteristic for groups of viruses within the genus reflecting the established classification scheme. The significance of most of the predicted structures is corroborated by compensatory mutations. The availability of infectious clones for several flaviviruses will allow the assessment of these structural elements in processes of the viral life cycle, such as replication and assembly.

  5. Genomic signatures of evolutionary transitions from solitary to group living

    PubMed Central

    Kapheim, Karen M.; Pan, Hailin; Li, Cai; Salzberg, Steven L.; Puiu, Daniela; Magoc, Tanja; Robertson, Hugh M.; Hudson, Matthew E.; Venkat, Aarti; Fischman, Brielle J.; Hernandez, Alvaro; Yandell, Mark; Ence, Daniel; Holt, Carson; Yocum, George D.; Kemp, William P.; Bosch, Jordi; Waterhouse, Robert M.; Zdobnov, Evgeny M.; Stolle, Eckart; Kraus, F. Bernhard; Helbing, Sophie; Moritz, Robin F. A.; Glastad, Karl M.; Hunt, Brendan G.; Goodisman, Michael A. D.; Hauser, Frank; Grimmelikhuijzen, Cornelis J. P.; Pinheiro, Daniel Guariz; Nunes, Francis Morais Franco; Soares, Michelle Prioli Miranda; Tanaka, Érica Donato; Simões, Zilá Luz Paulino; Hartfelder, Klaus; Evans, Jay D.; Barribeau, Seth M.; Johnson, Reed M.; Massey, Jonathan H.; Southey, Bruce R.; Hasselmann, Martin; Hamacher, Daniel; Biewer, Matthias; Kent, Clement F.; Zayed, Amro; Blatti, Charles; Sinha, Saurabh; Johnston, J. Spencer; Hanrahan, Shawn J.; Kocher, Sarah D.; Wang, Jun; Robinson, Gene E.; Zhang, Guojie

    2017-01-01

    The evolution of eusociality is one of the major transitions in evolution, but the underlying genomic changes are unknown. We compared the genomes of 10 bee species that vary in social complexity, representing multiple independent transitions in social evolution, and report three major findings. First, many important genes show evidence of neutral evolution as a consequence of relaxed selection with increasing social complexity. Second, there is no single road map to eusociality; independent evolutionary transitions in sociality have independent genetic underpinnings. Third, though clearly independent in detail, these transitions do have similar general features, including an increase in constrained protein evolution accompanied by increases in the potential for gene regulation and decreases in diversity and abundance of transposable elements. Eusociality may arise through different mechanisms each time, but would likely always involve an increase in the complexity of gene networks. PMID:25977371

  6. Social evolution. Genomic signatures of evolutionary transitions from solitary to group living.

    PubMed

    Kapheim, Karen M; Pan, Hailin; Li, Cai; Salzberg, Steven L; Puiu, Daniela; Magoc, Tanja; Robertson, Hugh M; Hudson, Matthew E; Venkat, Aarti; Fischman, Brielle J; Hernandez, Alvaro; Yandell, Mark; Ence, Daniel; Holt, Carson; Yocum, George D; Kemp, William P; Bosch, Jordi; Waterhouse, Robert M; Zdobnov, Evgeny M; Stolle, Eckart; Kraus, F Bernhard; Helbing, Sophie; Moritz, Robin F A; Glastad, Karl M; Hunt, Brendan G; Goodisman, Michael A D; Hauser, Frank; Grimmelikhuijzen, Cornelis J P; Pinheiro, Daniel Guariz; Nunes, Francis Morais Franco; Soares, Michelle Prioli Miranda; Tanaka, Érica Donato; Simões, Zilá Luz Paulino; Hartfelder, Klaus; Evans, Jay D; Barribeau, Seth M; Johnson, Reed M; Massey, Jonathan H; Southey, Bruce R; Hasselmann, Martin; Hamacher, Daniel; Biewer, Matthias; Kent, Clement F; Zayed, Amro; Blatti, Charles; Sinha, Saurabh; Johnston, J Spencer; Hanrahan, Shawn J; Kocher, Sarah D; Wang, Jun; Robinson, Gene E; Zhang, Guojie

    2015-06-05

    The evolution of eusociality is one of the major transitions in evolution, but the underlying genomic changes are unknown. We compared the genomes of 10 bee species that vary in social complexity, representing multiple independent transitions in social evolution, and report three major findings. First, many important genes show evidence of neutral evolution as a consequence of relaxed selection with increasing social complexity. Second, there is no single road map to eusociality; independent evolutionary transitions in sociality have independent genetic underpinnings. Third, though clearly independent in detail, these transitions do have similar general features, including an increase in constrained protein evolution accompanied by increases in the potential for gene regulation and decreases in diversity and abundance of transposable elements. Eusociality may arise through different mechanisms each time, but would likely always involve an increase in the complexity of gene networks. Copyright © 2015, American Association for the Advancement of Science.

  7. Genomic analyses of primitive, wild and cultivated citrus provide insights into asexual reproduction.

    PubMed

    Wang, Xia; Xu, Yuantao; Zhang, Siqi; Cao, Li; Huang, Yue; Cheng, Junfeng; Wu, Guizhi; Tian, Shilin; Chen, Chunli; Liu, Yan; Yu, Huiwen; Yang, Xiaoming; Lan, Hong; Wang, Nan; Wang, Lun; Xu, Jidi; Jiang, Xiaolin; Xie, Zongzhou; Tan, Meilian; Larkin, Robert M; Chen, Ling-Ling; Ma, Bin-Guang; Ruan, Yijun; Deng, Xiuxin; Xu, Qiang

    2017-05-01

    The emergence of apomixis, the transition from sexual to asexual reproduction, is a prominent feature of modern citrus. Here we de novo sequenced and comprehensively studied the genomes of four representative citrus species. Additionally, we sequenced 100 accessions of primitive, wild and cultivated citrus. Comparative population analysis suggested that genomic regions harboring energy- and reproduction-associated genes are probably under selection in cultivated citrus. We also narrowed the genetic locus responsible for citrus polyembryony, a form of apomixis, to an 80-kb region containing 11 candidate genes. One of these, CitRWP, is expressed at higher levels in ovules of polyembryonic cultivars. We found a miniature inverted-repeat transposable element insertion in the promoter region of CitRWP that cosegregated with polyembryony. This study provides new insights into citrus apomixis and constitutes a promising resource for the mining of agriculturally important genes.

  8. Beyond Linear Sequence Comparisons: The use of genome-level characters for phylogenetic reconstruction

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Boore, Jeffrey L.

    2004-11-27

    Although the phylogenetic relationships of many organisms have been convincingly resolved by the comparisons of nucleotide or amino acid sequences, others have remained equivocal despite great effort. Now that large-scale genome sequencing projects are sampling many lineages, it is becoming feasible to compare large data sets of genome-level features and to develop this as a tool for phylogenetic reconstruction that has advantages over conventional sequence comparisons. Although it is unlikely that these will address a large number of evolutionary branch points across the broad tree of life due to the infeasibility of such sampling, they have great potential for convincingly resolving many critical, contested relationships for which no other data seems promising. However, it is important that we recognize potential pitfalls, establish reasonable standards for acceptance, and employ rigorous methodology to guard against a return to earlier days of scenario-driven evolutionary reconstructions.

  9. MetaQUAST: evaluation of metagenome assemblies.

    PubMed

    Mikheenko, Alla; Saveliev, Vladislav; Gurevich, Alexey

    2016-04-01

    During the past years we have witnessed the rapid development of new metagenome assembly methods. Although there are many benchmark utilities designed for single-genome assemblies, there is no well-recognized evaluation and comparison tool for metagenomic-specific analogues. In this article, we present MetaQUAST, a modification of QUAST, the state-of-the-art tool for genome assembly evaluation based on alignment of contigs to a reference. MetaQUAST addresses such features of metagenome datasets as (i) unknown species content by detecting and downloading reference sequences, (ii) huge diversity by giving comprehensive reports for multiple genomes and (iii) presence of highly related species by detecting chimeric contigs. We demonstrate MetaQUAST performance by comparing several leading assemblers on one simulated and two real datasets. Availability: http://bioinf.spbau.ru/metaquast. Contact: aleksey.gurevich@spbu.ru. Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  10. Rapid sequencing of the bamboo mitochondrial genome using Illumina technology and parallel episodic evolution of organelle genomes in grasses.

    PubMed

    Ma, Peng-Fei; Guo, Zhen-Hua; Li, De-Zhu

    2012-01-01

    Compared to their counterparts in animals, the mitochondrial (mt) genomes of angiosperms exhibit a number of unique features. However, unravelling their evolution is hindered by the few completed genomes, which are essentially Sanger sequenced. While next-generation sequencing technologies have revolutionized chloroplast genome sequencing, they are just beginning to be applied to angiosperm mt genomes. Chloroplast genomes of grasses (Poaceae) have undergone episodic evolution and the evolutionary rate was suggested to be correlated between chloroplast and mt genomes in Poaceae. It is interesting to investigate whether correlated rate change also occurred in grass mt genomes as expected under lineage effects. A time-calibrated phylogenetic tree is needed to examine rate change. We determined a largely completed mt genome from a bamboo, Ferrocalamus rimosivaginus (Poaceae), through Illumina sequencing of total DNA. With a combination of de novo and reference-guided assembly, 39.5-fold coverage Illumina reads were finally assembled into scaffolds totalling 432,839 bp. The assembled genome contains nearly the same genes as the completed mt genomes in Poaceae. For examining evolutionary rate in grass mt genomes, we reconstructed a phylogenetic tree including 22 taxa based on 31 mt genes. The topology of the well-resolved tree was almost identical to that inferred from the chloroplast genome, with only minor differences. The inconsistency possibly derived from long branch attraction in the mtDNA tree. By calculating absolute substitution rates, we found significant rate change (∼4-fold) in the mt genome before and after the diversification of Poaceae in both synonymous and nonsynonymous terms. Furthermore, the rate change was correlated with that of chloroplast genomes in grasses. Our results demonstrate that Illumina sequencing technology is a rapid and efficient approach for obtaining angiosperm mt genome sequences. The parallel episodic evolution of mt and chloroplast genomes in grasses is consistent with lineage effects.

  11. Rapid Sequencing of the Bamboo Mitochondrial Genome Using Illumina Technology and Parallel Episodic Evolution of Organelle Genomes in Grasses

    PubMed Central

    Ma, Peng-Fei; Guo, Zhen-Hua; Li, De-Zhu

    2012-01-01

    Background: Compared to their counterparts in animals, the mitochondrial (mt) genomes of angiosperms exhibit a number of unique features. However, unravelling their evolution is hindered by the few completed genomes, which are essentially Sanger sequenced. While next-generation sequencing technologies have revolutionized chloroplast genome sequencing, they are just beginning to be applied to angiosperm mt genomes. Chloroplast genomes of grasses (Poaceae) have undergone episodic evolution and the evolutionary rate was suggested to be correlated between chloroplast and mt genomes in Poaceae. It is interesting to investigate whether correlated rate change also occurred in grass mt genomes as expected under lineage effects. A time-calibrated phylogenetic tree is needed to examine rate change. Methodology/Principal Findings: We determined a largely completed mt genome from a bamboo, Ferrocalamus rimosivaginus (Poaceae), through Illumina sequencing of total DNA. With a combination of de novo and reference-guided assembly, 39.5-fold coverage Illumina reads were finally assembled into scaffolds totalling 432,839 bp. The assembled genome contains nearly the same genes as the completed mt genomes in Poaceae. For examining evolutionary rate in grass mt genomes, we reconstructed a phylogenetic tree including 22 taxa based on 31 mt genes. The topology of the well-resolved tree was almost identical to that inferred from the chloroplast genome, with only minor differences. The inconsistency possibly derived from long branch attraction in the mtDNA tree. By calculating absolute substitution rates, we found significant rate change (∼4-fold) in the mt genome before and after the diversification of Poaceae in both synonymous and nonsynonymous terms. Furthermore, the rate change was correlated with that of chloroplast genomes in grasses. Conclusions/Significance: Our results demonstrate that Illumina sequencing technology is a rapid and efficient approach for obtaining angiosperm mt genome sequences. The parallel episodic evolution of mt and chloroplast genomes in grasses is consistent with lineage effects. PMID:22272330

  12. Applying a radiomics approach to predict prognosis of lung cancer patients

    NASA Astrophysics Data System (ADS)

    Emaminejad, Nastaran; Yan, Shiju; Wang, Yunzhi; Qian, Wei; Guan, Yubao; Zheng, Bin

    2016-03-01

    Radiomics is an emerging technology to decode tumor phenotype based on quantitative analysis of image features computed from radiographic images. In this study, we applied the radiomics concept to investigate the association among the CT image features of lung tumors, which are either quantitatively computed or subjectively rated by radiologists, and two genomic biomarkers, namely protein expression of the excision repair cross-complementing 1 (ERCC1) genes and a regulatory subunit of ribonucleotide reductase (RRM1), in predicting disease-free survival (DFS) of lung cancer patients after surgery. An image dataset involving 94 patients was used. Among them, 20 had cancer recurrence within 3 years, while 74 patients remained disease-free. After tumor segmentation, 35 image features were computed from CT images. Using the Weka data mining software package, we selected 10 non-redundant image features. Applying a SMOTE algorithm to generate synthetic data to balance case numbers in the two DFS ("yes" and "no") groups and a leave-one-case-out training/testing method, we optimized and compared a number of machine learning classifiers using (1) quantitative image (QI) features, (2) subjective rated (SR) features, and (3) genomic biomarkers (GB). Data analyses showed relatively low correlation among the QI, SR and GB prediction results (with Pearson correlation coefficients < 0.5, including between the ERCC1 and RRM1 biomarkers). Using area under the ROC curve as an assessment index, the QI, SR and GB based classifiers yielded AUC = 0.89 ± 0.04, 0.73 ± 0.06 and 0.76 ± 0.07, respectively, which showed that all three types of features had prediction power (AUC > 0.5). Among them, using QI yielded the highest performance.

  13. Analysis of recombination QTLs, segregation distortion, and epistasis for fitness in maize multiple populations using ultra-high-density markers

    USDA-ARS's Scientific Manuscript database

    Understanding the maize genomic features would be useful for the study of genetic diversity and evolution and for maize breeding. Here, we used two maize nested association mapping (NAM) populations separately derived in China (CN-NAM) and the US (US-NAM) to explore the maize genomic features. The t...

  14. IMG-ABC: new features for bacterial secondary metabolism analysis and targeted biosynthetic gene cluster discovery in thousands of microbial genomes

    DOE PAGES

    Hadjithomas, Michalis; Chen, I-Min A.; Chu, Ken; ...

    2016-11-29

    Secondary metabolites produced by microbes have diverse biological functions, which makes them a great potential source of biotechnologically relevant compounds with antimicrobial, anti-cancer and other activities. The proteins needed to synthesize these natural products are often encoded by clusters of co-located genes called biosynthetic gene clusters (BCs). In order to advance the exploration of microbial secondary metabolism, we developed the largest publicly available database of experimentally verified and predicted BCs, the Integrated Microbial Genomes Atlas of Biosynthetic gene Clusters (IMG-ABC) (https://img.jgi.doe.gov/abc/). Here, we describe an update of IMG-ABC, which includes ClusterScout, a tool for targeted identification of custom biosynthetic gene clusters across 40 000 isolate microbial genomes, and a new search capability to query more than 700 000 BCs from isolate genomes for clusters with similar Pfam composition. Additional features enable fast exploration and analysis of BCs through two new interactive visualization features, a BC function heatmap and a BC similarity network graph. These new tools and features add to the value of IMG-ABC's vast body of BC data, facilitating their in-depth analysis and accelerating secondary metabolite discovery.

  15. IMG-ABC: new features for bacterial secondary metabolism analysis and targeted biosynthetic gene cluster discovery in thousands of microbial genomes

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hadjithomas, Michalis; Chen, I-Min A.; Chu, Ken

    Secondary metabolites produced by microbes have diverse biological functions, which makes them a great potential source of biotechnologically relevant compounds with antimicrobial, anti-cancer and other activities. The proteins needed to synthesize these natural products are often encoded by clusters of co-located genes called biosynthetic gene clusters (BCs). In order to advance the exploration of microbial secondary metabolism, we developed the largest publicly available database of experimentally verified and predicted BCs, the Integrated Microbial Genomes Atlas of Biosynthetic gene Clusters (IMG-ABC) (https://img.jgi.doe.gov/abc/). Here, we describe an update of IMG-ABC, which includes ClusterScout, a tool for targeted identification of custom biosynthetic gene clusters across 40 000 isolate microbial genomes, and a new search capability to query more than 700 000 BCs from isolate genomes for clusters with similar Pfam composition. Additional features enable fast exploration and analysis of BCs through two new interactive visualization features, a BC function heatmap and a BC similarity network graph. These new tools and features add to the value of IMG-ABC's vast body of BC data, facilitating their in-depth analysis and accelerating secondary metabolite discovery.

  16. Industrial Acetogenic Biocatalysts: A Comparative Metabolic and Genomic Analysis

    PubMed Central

    Bengelsdorf, Frank R.; Poehlein, Anja; Linder, Sonja; Erz, Catarina; Hummel, Tim; Hoffmeister, Sabrina; Daniel, Rolf; Dürre, Peter

    2016-01-01

    Synthesis gas (syngas) fermentation by anaerobic acetogenic bacteria employing the Wood–Ljungdahl pathway is a bioprocess for production of biofuels and biocommodities. The major fermentation products of the most relevant biocatalytic strains (Clostridium ljungdahlii, C. autoethanogenum, C. ragsdalei, and C. coskatii) are acetic acid and ethanol. A comparative metabolic and genomic analysis using the mentioned biocatalysts might offer targets for metabolic engineering and thus improve the production of compounds apart from ethanol. Autotrophic growth and product formation of the four wild type (WT) strains were compared in uncontrolled batch experiments. The genomes of C. ragsdalei and C. coskatii were sequenced and the genome sequences of all four biocatalytic strains analyzed in a comparative manner. Growth and product spectra (acetate, ethanol, 2,3-butanediol) of C. autoethanogenum, C. ljungdahlii, and C. ragsdalei were rather similar. In contrast, C. coskatii produced significantly less ethanol and its genome sequence lacks two genes encoding aldehyde:ferredoxin oxidoreductases (AOR). Comparative genome sequence analysis of the four WT strains revealed high average nucleotide identity (ANI) between C. ljungdahlii and C. autoethanogenum (99.3%) and between C. ljungdahlii and C. coskatii (98.3%). In contrast, C. ljungdahlii WT and C. ragsdalei WT showed an ANI-based similarity of only 95.8%. Additionally, recombinant C. ljungdahlii strains were constructed that harbor an artificial acetone synthesis operon (ASO) consisting of the following genes: adc, ctfA, ctfB, and thlA (encoding acetoacetate decarboxylase, acetoacetyl-CoA:acetate/butyrate:CoA-transferase subunits A and B, and thiolase) under the control of the thlA promoter (PthlA) from C. acetobutylicum or the native pta-ack promoter (Ppta-ack) from C. ljungdahlii. Respective recombinant strains produced 2-propanol rather than acetone, due to the presence of a NADPH-dependent primary-secondary alcohol dehydrogenase that converts acetone to 2-propanol. Furthermore, the ClosTron™ system was used to construct an adhE1 integration mutant. These results provide extensive insights into genetic features of industrially relevant bacterial biocatalysts and expand the toolbox for metabolic engineering of acetogenic bacteria able to ferment syngas. PMID:27458439

  17. Well-characterized sequence features of eukaryote genomes and implications for ab initio gene prediction.

    PubMed

    Huang, Ying; Chen, Shi-Yi; Deng, Feilong

    2016-01-01

    In silico analysis of DNA sequences is an important area of computational biology in the post-genomic era. Over the past two decades, computational approaches for ab initio prediction of gene structure from genome sequence alone have greatly advanced our understanding of a variety of biological questions. Although the computational prediction of protein-coding genes is already well established, robustly finding non-coding RNA genes, such as miRNA and lncRNA, remains a challenge. The two main aspects of ab initio gene prediction are the computed values that describe sequence features and the algorithm used to train the discriminant function; different combinations of the two are employed in various bioinformatic tools. Herein, we briefly review these well-characterized sequence features in eukaryote genomes and their applications to ab initio gene prediction. The main purpose of this article is to provide an overview for beginners who aim to develop related bioinformatic tools.

  18. A genome-wide association scan in admixed Latin Americans identifies loci influencing facial and scalp hair features

    PubMed Central

    Adhikari, Kaustubh; Fontanil, Tania; Cal, Santiago; Mendoza-Revilla, Javier; Fuentes-Guajardo, Macarena; Chacón-Duque, Juan-Camilo; Al-Saadi, Farah; Johansson, Jeanette A.; Quinto-Sanchez, Mirsha; Acuña-Alonzo, Victor; Jaramillo, Claudia; Arias, William; Barquera Lozano, Rodrigo; Macín Pérez, Gastón; Gómez-Valdés, Jorge; Villamil-Ramírez, Hugo; Hunemeier, Tábita; Ramallo, Virginia; Silva de Cerqueira, Caio C.; Hurtado, Malena; Villegas, Valeria; Granja, Vanessa; Gallo, Carla; Poletti, Giovanni; Schuler-Faccini, Lavinia; Salzano, Francisco M.; Bortolini, Maria-Cátira; Canizales-Quinteros, Samuel; Rothhammer, Francisco; Bedoya, Gabriel; Gonzalez-José, Rolando; Headon, Denis; López-Otín, Carlos; Tobin, Desmond J.; Balding, David; Ruiz-Linares, Andrés

    2016-01-01

    We report a genome-wide association scan in over 6,000 Latin Americans for features of scalp hair (shape, colour, greying, balding) and facial hair (beard thickness, monobrow, eyebrow thickness). We found 18 signals of association reaching genome-wide significance (P values 5 × 10^−8 to 3 × 10^−119), including 10 novel associations. These include novel loci for scalp hair shape and balding, and the first reported loci for hair greying, monobrow, eyebrow and beard thickness. A newly identified locus influencing hair shape includes a Q30R substitution in the Protease Serine S1 family member 53 (PRSS53). We demonstrate that this enzyme is highly expressed in the hair follicle, especially the inner root sheath, and that the Q30R substitution affects enzyme processing and secretion. The genome regions associated with hair features are enriched for signals of selection, consistent with proposals regarding the evolution of human hair. PMID:26926045

  19. Genome-wide analysis of tandem repeats in plants and green algae

    Treesearch

    Zhixin Zhao; Cheng Guo; Sreeskandarajan Sutharzan; Pei Li; Craig Echt; Jie Zhang; Chun Liang

    2014-01-01

    Tandem repeats (TRs) extensively exist in the genomes of prokaryotes and eukaryotes. Based on the sequenced genomes and gene annotations of 31 plant and algal species in Phytozome version 8.0 (http://www.phytozome.net/), we examined TRs in a genome-wide scale, characterized their distributions and motif features, and explored their putative biological functions. Among...

  20. Genome size and metabolic intensity in tetrapods: a tale of two lines

    PubMed Central

    Vinogradov, Alexander E; Anatskaya, Olga V

    2005-01-01

    We show the negative link between genome size and metabolic intensity in tetrapods, using the heart index (relative heart mass) as a unified indicator of metabolic intensity in poikilothermal and homeothermal animals. We found two separate regression lines of heart index on genome size for reptiles–birds and amphibians–mammals (the slope of regression is steeper in reptiles–birds). We also show a negative correlation between GC content and nucleosome formation potential in vertebrate DNA, and, consistent with this relationship, a positive correlation between genome GC content and nuclear size (independent of genome size). It is known that there are two separate regression lines of genome GC content on genome size for reptiles–birds and amphibians–mammals: reptiles–birds have the relatively higher GC content (for their genome sizes) compared to amphibians–mammals. Our results suggest uniting all these data into one concept. The slope of negative regression between GC content and nucleosome formation potential is steeper in exons than in non-coding DNA (where nucleosome formation potential is generally higher), which indicates a special role of non-coding DNA for orderly chromatin organization. The chromatin condensation and nuclear size are supposed to be key parameters that accommodate the effects of both genome size and GC content and connect them with metabolic intensity. Our data suggest that the reptilian–birds clade evolved special relationships among these parameters, whereas mammals preserved the amphibian-like relationships. Surprisingly, mammals, although acquiring a more complex general organization, seem to retain certain genome-related properties that are similar to amphibians. At the same time, the slope of regression between nucleosome formation potential and GC content is steeper in poikilothermal than in homeothermal genomes, which suggests that mammals and birds acquired certain common features of genomic organization. PMID:16519230

  1. Comparative Genomic and Transcriptomic Characterization of the Toxigenic Marine Dinoflagellate Alexandrium ostenfeldii

    PubMed Central

    Jaeckisch, Nina; Yang, Ines; Wohlrab, Sylke; Glöckner, Gernot; Kroymann, Juergen; Vogel, Heiko; Cembella, Allan; John, Uwe

    2011-01-01

    Many dinoflagellate species are notorious for the toxins they produce and the ecological and human health consequences associated with harmful algal blooms (HABs). Dinoflagellates are particularly refractory to genomic analysis due to the enormous genome size, lack of knowledge about their DNA composition and structure, and peculiarities of gene regulation, such as spliced leader (SL) trans-splicing and mRNA transposition mechanisms. Alexandrium ostenfeldii is known to produce macrocyclic imine toxins, described as spirolides. We characterized the genome of A. ostenfeldii using a combination of transcriptomic data and random genomic clones for comparison with other dinoflagellates, particularly Alexandrium species. Examination of SL sequences revealed features similar to those in other dinoflagellates, including Alexandrium species. SL sequences in decay indicate frequent retro-transposition of mRNA species. This probably contributes to overall genome complexity by generating additional gene copies. Sequencing of several thousand fosmid and bacterial artificial chromosome (BAC) ends yielded a wealth of simple repeats and tandemly repeated longer sequence stretches which we estimated to comprise more than half of the whole genome. Surprisingly, the repeats comprise a very limited set of 79–97 bp sequences; in part the genome is thus a relatively uniform sequence space interrupted by coding sequences. Our genomic sequence survey (GSS) represents the largest genomic data set of a dinoflagellate to date. Alexandrium ostenfeldii is a typical dinoflagellate with respect to its transcriptome and mRNA transposition but demonstrates Alexandrium-like stop codon usage. The large portion of repetitive sequences and the organization within the genome is in agreement with several other studies on dinoflagellates using different approaches. It remains to be determined whether this unusual composition is directly correlated to the exceptional genome organization of dinoflagellates with a low amount of histones and histone-like proteins. PMID:22164224

  2. Pan-genome analysis of the emerging foodborne pathogen Cronobacter spp. suggests a species-level bidirectional divergence driven by niche adaptation

    PubMed Central

    2013-01-01

    Background: Members of the genus Cronobacter are causes of rare but severe illness in neonates and preterm infants following the ingestion of contaminated infant formula. Seven species have been described and two of the species genomes were subsequently published. In this study, we performed comparative genomics on eight strains of Cronobacter, including six that we sequenced (representing six of the seven species) and two previously published, closed genomes. Results: We identified and characterized the features associated with the core and pan genome of the genus Cronobacter in an attempt to understand the evolution of these bacteria and the genetic content of each species. We identified 84 genomic regions that are present in two or more Cronobacter genomes, along with 45 unique genomic regions. Many potentially horizontally transferred genes, such as lysogenic prophages, were also identified. Most notable among these were several type six secretion system gene clusters, transposons that carried tellurium, copper and/or silver resistance genes, and a novel integrative conjugative element. Conclusions: Cronobacter have diverged into two clusters, one consisting of C. dublinensis and C. muytjensii (Cdub-Cmuy) and the other comprised of C. sakazakii, C. malonaticus, C. universalis, and C. turicensis (Csak-Cmal-Cuni-Ctur), from the most recent common ancestral species. While several genetic determinants for plant-association and human virulence could be found in the core genome of Cronobacter, the four Cdub-Cmuy clade genomes contained several accessory genomic regions important for survival in a plant-associated environmental niche, while the Csak-Cmal-Cuni-Ctur clade genomes harbored numerous virulence-related genetic traits. PMID:23724777

  3. Phylogenomics databases for facilitating functional genomics in rice.

    PubMed

    Jung, Ki-Hong; Cao, Peijian; Sharma, Rita; Jain, Rashmi; Ronald, Pamela C

    2015-12-01

    The completion of the whole genome sequence of rice (Oryza sativa) has significantly accelerated functional genomics studies. Prior to the release of the sequence, only a few genes were assigned a function each year. Since sequencing was completed in 2005, the rate has exponentially increased. As of 2014, 1,021 genes have been described and added to the collection at The Overview of functionally characterized Genes in Rice online database (OGRO). Despite this progress, that number is still very low compared with the total number of genes estimated in the rice genome. One limitation to progress is the presence of functional redundancy among members of the same rice gene family, which covers 51.6 % of all non-transposable element-encoding genes. There remains a significant portion of rice genes that are not functionally redundant, as reflected in the recovery of loss-of-function mutants. To more accurately analyze functional redundancy in the rice genome, we have developed phylogenomics databases for six large gene families in rice, including those for glycosyltransferases, glycoside hydrolases, kinases, transcription factors, transporters, and cytochrome P450 monooxygenases. In this review, we introduce key features and applications of these databases. We expect that they will serve as a very useful guide in the post-genomics era of research.

  4. Genome Analysis of the Fruiting Body-Forming Myxobacterium Chondromyces crocatus Reveals High Potential for Natural Product Biosynthesis

    PubMed Central

    Zaburannyi, Nestor; Bunk, Boyke; Maier, Josef; Overmann, Jörg

    2016-01-01

    Here, we report the complete genome sequence of the type strain of the myxobacterial genus Chondromyces, Chondromyces crocatus Cm c5. It presents one of the largest prokaryotic genomes featuring a single circular chromosome and no plasmids. Analysis revealed an enlarged set of tRNA genes, along with reduced pressure on preferred codon usage compared to that of other bacterial genomes. The large coding capacity and the plethora of encoded secondary metabolite biosynthetic gene clusters are in line with the capability of Cm c5 to produce an arsenal of antibacterial, antifungal, and cytotoxic compounds. Known pathways of the ajudazol, chondramide, chondrochloren, crocacin, crocapeptin, and thuggacin compound families are complemented by many more natural compound biosynthetic gene clusters in the chromosome. Whole-genome comparison of the fruiting-body-forming type strain (Cm c5, DSM 14714) to an accustomed laboratory strain which has lost this ability (nonfruiting phenotype, Cm c5 fr−) revealed genetic changes in three loci. In addition to the low synteny found with the closest sequenced representative of the same family, Sorangium cellulosum, extensive genetic information duplication and broad application of eukaryotic-type signal transduction systems are hallmarks of this 11.3-Mbp prokaryotic genome. PMID:26773087

  5. Fox gene loci in Takifugu rubripes and Tetraodon nigroviridis genomes and comparison with those of medaka and zebrafish genomes.

    PubMed

    Shen, Xueyan; Cui, Jianzhou; Gong, Qingli

    2011-12-01

    Members of the Fox gene family of transcriptional regulators are essential for animal development and have been extensively studied in vertebrates. The mouse and human genomes contain at least 40 FOX genes which are divided into 19 subclasses based on the sequence similarity of the highly conserved forkhead domain. Using the genome sequences of Takifugu rubripes and Tetraodon nigroviridis, we examined the genomic complement of fox genes in these organisms to gain insight into the evolutionary relationship of this gene family. We identified 53 fox genes in the Tetraodon nigroviridis and Takifugu rubripes genomes by searching for the forkhead domain. These genes are divided into 18 subclasses as follows: 8 fox genes in subclass O; 6 in subclass P; 4 in subclasses D, J, and N; 3 in subclasses A, B, C, E, F, and I; 2 in subclasses K, L, and Q; and 1 in subclasses G, H, M, and R. Together with the forkhead domain sequences of human, chicken, frog, zebrafish, medaka, and Caenorhabditis elegans, the phylogenetic relationships of the fox genes in Takifugu rubripes and Tetraodon nigroviridis were analyzed and compared. The gene structure, general features, and three-dimensional models of these genes are also discussed.

  6. Host Specific Diversity in Lactobacillus johnsonii as Evidenced by a Major Chromosomal Inversion and Phage Resistance Mechanisms

    PubMed Central

    Guinane, Caitriona M.; Kent, Robert M.; Norberg, Sarah; Hill, Colin; Fitzgerald, Gerald F.; Stanton, Catherine; Ross, R. Paul

    2011-01-01

    Genetic diversity and genomic rearrangements are a driving force in bacterial evolution and niche adaptation. We sequenced and annotated the genome of Lactobacillus johnsonii DPC6026, a strain isolated from the porcine intestinal tract. Although the genome of DPC6026 is similar in size (1.97 Mbp) and GC content (34.8%) to the sequenced human isolate L. johnsonii NCC 533, a large symmetrical inversion of approximately 750 kb differentiated the two strains. Comparative analysis among 12 other strains of L. johnsonii including 8 porcine, 3 human and 1 poultry isolate indicated that the genome architecture found in DPC6026 is more common within the species than that of NCC 533. Furthermore, a number of unique features were annotated in DPC6026, some of which are likely to have been acquired by horizontal gene transfer (HGT) and contribute to protection against phage infection. A putative type III restriction-modification system was identified, as were novel Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) elements. Interestingly, these particular elements are not widely distributed among L. johnsonii strains. Taken together, these data suggest intra-species genomic rearrangements and significant genetic diversity within the L. johnsonii species and point towards a host-specific divergence of L. johnsonii strains with respect to genome inversion and phage exposure. PMID:21533100

  7. Host specific diversity in Lactobacillus johnsonii as evidenced by a major chromosomal inversion and phage resistance mechanisms.

    PubMed

    Guinane, Caitriona M; Kent, Robert M; Norberg, Sarah; Hill, Colin; Fitzgerald, Gerald F; Stanton, Catherine; Ross, R Paul

    2011-04-20

    Genetic diversity and genomic rearrangements are a driving force in bacterial evolution and niche adaptation. We sequenced and annotated the genome of Lactobacillus johnsonii DPC6026, a strain isolated from the porcine intestinal tract. Although the genome of DPC6026 is similar in size (1.97 Mbp) and GC content (34.8%) to the sequenced human isolate L. johnsonii NCC 533, a large symmetrical inversion of approximately 750 kb differentiated the two strains. Comparative analysis among 12 other strains of L. johnsonii including 8 porcine, 3 human and 1 poultry isolate indicated that the genome architecture found in DPC6026 is more common within the species than that of NCC 533. Furthermore, a number of unique features were annotated in DPC6026, some of which are likely to have been acquired by horizontal gene transfer (HGT) and contribute to protection against phage infection. A putative type III restriction-modification system was identified, as were novel Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) elements. Interestingly, these particular elements are not widely distributed among L. johnsonii strains. Taken together, these data suggest intra-species genomic rearrangements and significant genetic diversity within the L. johnsonii species and point towards a host-specific divergence of L. johnsonii strains with respect to genome inversion and phage exposure.

  8. Clinical implications of chromosomal abnormalities in gastric adenocarcinomas

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Wu, Chew-Wun; Chen, Gen-Der; Fann, Cathy S.-J.

    2003-06-23

    Gastric carcinoma (GC) is one of the most common malignancies worldwide and has a very poor prognosis. Genetic imbalances in 62 primary gastric adenocarcinomas of various histopathologic types and pathologic stages and six gastric cancer-derived cell lines were analyzed by comparative genomic hybridization, and the relationship of genomic abnormalities to clinical features in primary GC was evaluated at a genome-wide level. Eighty-four percent of the tumors and all six cell lines showed DNA copy number changes. Recurrent chromosomal abnormalities, including gains at 15 regions and losses at 8 regions, were identified. Statistical analyses revealed that gains at 17q24-qter (53 percent), 20q13-qter (48 percent), 1p32-p36 (42 percent), 22q12-qter (27 percent), 17p13-pter (24 percent), 16p13-pter (21 percent), 6p21-pter (19 percent), 20p12-pter (19 percent), 7p21-pter (18 percent), 3q28-qter (8 percent), and 13q13-q14 (8 percent), and losses at 18q12-qter (11 percent), 3p12 (8 percent), 3p25-pter (8 percent), 5q14-q23 (8 percent), and 9p21-p23 (5 percent), are associated with unique patient or tumor-related features. GCs of differing histopathologic features were shown to be associated with distinct patterns of genetic alterations, supporting the notion that they evolve through distinct genetic pathways. Metastatic tumors were also associated with specific genetic changes. These regions may harbor candidate genes involved in the pathogenesis of this malignancy.

  9. Fast and Accurate Approximation to Significance Tests in Genome-Wide Association Studies

    PubMed Central

    Zhang, Yu; Liu, Jun S.

    2011-01-01

    Genome-wide association studies commonly involve simultaneous tests of millions of single nucleotide polymorphisms (SNPs) for disease association. The SNPs in nearby genomic regions, however, are often highly correlated due to linkage disequilibrium (LD, a genetic term for correlation). Simple Bonferroni correction for multiple comparisons is therefore too conservative. Permutation tests, which are often employed in practice, are both computationally expensive for genome-wide studies and limited in their scope. We present an accurate and computationally efficient method, based on Poisson de-clumping heuristics, for approximating genome-wide significance of SNP associations. Compared with permutation tests and other multiple comparison adjustment approaches, our method computes the most accurate and robust p-value adjustments for millions of correlated comparisons within seconds. We demonstrate analytically that the accuracy and the efficiency of our method are nearly independent of the sample size, the number of SNPs, and the scale of p-values to be adjusted. In addition, our method can be easily adapted to estimate the false discovery rate. When applied to genome-wide SNP datasets, we observed highly variable p-value adjustment results evaluated from different genomic regions. The variation in adjustments along the genome, however, is well conserved between the European and the African populations. The p-value adjustments are significantly correlated with LD among SNPs, recombination rates, and SNP densities. Given the large variability of sequence features in the genome, we further discuss a novel approach of using SNP-specific (local) thresholds to detect genome-wide significant associations. This article has supplementary material online. PMID:22140288
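
    For orientation, the plain Bonferroni baseline that the abstract calls too conservative can be sketched as follows. This is a minimal illustration only, not the authors' Poisson de-clumping method; the function names are hypothetical. Bonferroni divides the significance level by the raw number of tests, so when SNPs are correlated through LD the effective number of independent tests is smaller and the threshold is stricter than necessary.

```python
import math


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test threshold that keeps the family-wise error rate at or below alpha."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


def bonferroni_adjust(p_values):
    """Bonferroni-adjusted p-values: each p is multiplied by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


if __name__ == "__main__":
    # One million SNP tests at alpha = 0.05 give a per-test threshold near 5e-8.
    print(bonferroni_threshold(0.05, 1_000_000))
    print(bonferroni_adjust([0.01, 0.5, 1e-9]))
```

    The threshold depends only on the count of tests, which is why it cannot account for correlation among neighbouring SNPs.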

  10. Genome Reduction for Niche Association in Campylobacter Hepaticus, A Cause of Spotty Liver Disease in Poultry.

    PubMed

    Petrovska, Liljana; Tang, Yue; Jansen van Rensburg, Melissa J; Cawthraw, Shaun; Nunez, Javier; Sheppard, Samuel K; Ellis, Richard J; Whatmore, Adrian M; Crawshaw, Tim R; Irvine, Richard M

    2017-01-01

    The term "spotty liver disease" (SLD) has been used since the late 1990s for a condition seen in the UK and Australia that primarily affects free range laying hens around peak lay, causing acute mortality and a fall in egg production. A novel thermophilic SLD-associated Campylobacter was reported in the United Kingdom (UK) in 2015. Subsequently, similar isolates occurring in Australia were formally described as a new species, Campylobacter hepaticus. We describe the comparative genomics of 10 C. hepaticus isolates recovered from 5 geographically distinct poultry holdings in the UK between 2010 and 2012. Hierarchical gene-by-gene analyses of the study isolates and representatives of 24 known Campylobacter species indicated that C. hepaticus is most closely related to the major pathogens Campylobacter jejuni and Campylobacter coli. We observed low levels of within-farm variation, even between isolates collected over almost 3 years. With respect to C. hepaticus genome features, we noted that the study isolates had a ~140 Kb reduction in genome size, ~144 fewer genes, and a lower GC content compared to C. jejuni. The most notable reduction was in the subsystem containing genes for iron acquisition and metabolism, supported by reduced growth of C. hepaticus in an iron depletion assay. Genome reduction is common among many pathogens and in C. hepaticus has likely been driven at least in part by specialization following the occupation of a new niche, the chicken liver.

  11. Metavir 2: new tools for viral metagenome comparison and assembled virome analysis

    PubMed Central

    2014-01-01

    Background: Metagenomics, based on culture-independent sequencing, is a well-fitted approach to provide insights into the composition, structure and dynamics of environmental viral communities. Following recent advances in sequencing technologies, new challenges arise for existing bioinformatic tools dedicated to viral metagenome (i.e. virome) analysis as (i) the number of viromes is rapidly growing and (ii) large genomic fragments can now be obtained by assembling the huge amount of sequence data generated for each metagenome. Results: To face these challenges, a new version of Metavir was developed. First, all Metavir tools have been adapted to support comparative analysis of viromes in order to improve the analysis of multiple datasets. In addition to the sequence comparison previously provided, viromes can now be compared through their k-mer frequencies, their taxonomic compositions, recruitment plots and phylogenetic trees containing sequences from different datasets. Second, a new section has been specifically designed to handle assembled viromes made of thousands of large genomic fragments (i.e. contigs). This section includes an annotation pipeline for uploaded viral contigs (gene prediction, similarity search against reference viral genomes and protein domains) and an extensive comparison between contigs and reference genomes. Contigs and their annotations can be explored on the website through specifically developed dynamic genomic maps and interactive networks. Conclusions: The new features of Metavir 2 allow users to explore and analyze viromes composed of raw reads or assembled fragments through a set of adapted tools and a user-friendly interface. PMID:24646187
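
    The idea of comparing datasets through their k-mer frequencies can be sketched as below. This is a generic illustration under stated assumptions, not Metavir's implementation: the abstract does not say which k or which distance the tool uses, so the Bray-Curtis dissimilarity and the function names here are hypothetical choices.

```python
from collections import Counter


def kmer_frequencies(seq: str, k: int) -> dict:
    """Relative frequency of every overlapping k-mer in seq (case-insensitive)."""
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    if total == 0:
        return {}  # sequence shorter than k
    return {kmer: n / total for kmer, n in counts.items()}


def bray_curtis(freq_a: dict, freq_b: dict) -> float:
    """Bray-Curtis dissimilarity between two k-mer profiles (0 = identical, 1 = disjoint)."""
    total = sum(freq_a.values()) + sum(freq_b.values())
    if total == 0:
        return 0.0  # two empty profiles are treated as identical
    shared = sum(min(freq_a.get(kmer, 0.0), freq_b.get(kmer, 0.0))
                 for kmer in set(freq_a) | set(freq_b))
    return 1.0 - 2.0 * shared / total


if __name__ == "__main__":
    a = kmer_frequencies("ACGTACGT", 2)
    b = kmer_frequencies("AAAA", 2)
    print(bray_curtis(a, a), bray_curtis(a, b))
```

    In practice each profile would be built from all reads or contigs of a virome rather than a single short string, and the pairwise dissimilarities would feed a clustering or ordination step.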

  12. Whole-genome de novo sequencing reveals unique genes that contributed to the adaptive evolution of the Mikado pheasant.

    PubMed

    Lee, Chien-Yueh; Hsieh, Ping-Han; Chiang, Li-Mei; Chattopadhyay, Amrita; Li, Kuan-Yi; Lee, Yi-Fang; Lu, Tzu-Pin; Lai, Liang-Chuan; Lin, En-Chung; Lee, Hsinyu; Ding, Shih-Torng; Tsai, Mong-Hsun; Chen, Chien-Yu; Chuang, Eric Y

    2018-05-01

    The Mikado pheasant (Syrmaticus mikado) is a nearly endangered species indigenous to high-altitude regions of Taiwan. This pheasant provides an opportunity to investigate evolutionary processes following geographic isolation. Currently, the genetic background and adaptive evolution of the Mikado pheasant remain unclear. We present the draft genome of the Mikado pheasant, which consists of 1.04 Gb of DNA and 15,972 annotated protein-coding genes. The Mikado pheasant displays expansion and positive selection of genes related to features that contribute to its adaptive evolution, such as energy metabolism, oxygen transport, hemoglobin binding, radiation response, immune response, and DNA repair. To investigate the molecular evolution of the major histocompatibility complex (MHC) across several avian species, 39 putative genes spanning 227 kb on a contiguous region were annotated and manually curated. The MHC loci of the pheasant revealed a high level of synteny, several rapidly evolving genes, and inverse regions compared to the same loci in the chicken. The complete mitochondrial genome was also sequenced, assembled, and compared against four long-tailed pheasants. The results from molecular clock analysis suggest that ancestors of the Mikado pheasant migrated from the north to Taiwan about 3.47 million years ago. This study provides a valuable genomic resource for the Mikado pheasant, insights into its adaptation to high altitude, and the evolutionary history of the genus Syrmaticus, which could potentially be useful for future studies that investigate molecular evolution, genomics, ecology, and immunogenetics.

  13. Genome-wide analysis of copper, iron and zinc transporters in the arbuscular mycorrhizal fungus Rhizophagus irregularis.

    PubMed

    Tamayo, Elisabeth; Gómez-Gallego, Tamara; Azcón-Aguilar, Concepción; Ferrol, Nuria

    2014-01-01

    Arbuscular mycorrhizal fungi (AMF), belonging to the Glomeromycota, are soil microorganisms that establish mutualistic symbioses with the majority of higher plants. The efficient uptake of low mobility mineral nutrients by the fungal symbiont and their further transfer to the plant is a major feature of this symbiosis. Besides improving plant mineral nutrition, AMF can alleviate heavy metal toxicity to their host plants and are able to tolerate high metal concentrations in the soil. Nevertheless, we are far from understanding the key molecular determinants of metal homeostasis in these organisms. To get some insights into these mechanisms, a genome-wide analysis of Cu, Fe and Zn transporters was undertaken, making use of the recently published whole genome of the AMF Rhizophagus irregularis. This in silico analysis allowed identification of 30 open reading frames in the R. irregularis genome, which potentially encode metal transporters. Phylogenetic comparisons with the genomes of a set of reference fungi showed an expansion of some metal transporter families. Analysis of the published transcriptomic profiles of R. irregularis revealed that a set of genes were up-regulated in mycorrhizal roots compared to germinated spores and extraradical mycelium, which suggests that metals are important for plant colonization.

  14. Genome Sequence of Torulaspora delbrueckii NRRL Y-50541, Isolated from Mezcal Fermentation.

    PubMed

    Gomez-Angulo, Jorge; Vega-Alvarado, Leticia; Escalante-García, Zazil; Grande, Ricardo; Gschaedler-Mathis, Anne; Amaya-Delgado, Lorena; Arrizon, Javier; Sanchez-Flores, Alejandro

    2015-07-23

    Torulaspora delbrueckii presents metabolic features interesting for biotechnological applications (in the dairy and wine industries). Recently, the T. delbrueckii CBS 1146 genome, which has been maintained under laboratory conditions since 1970, was published. Thus, a genome of a new mezcal yeast was sequenced and characterized and showed genetic differences and a higher genome assembly quality, offering a better reference genome. Copyright © 2015 Gomez-Angulo et al.

  15. Radiogenomic analysis of lower grade glioma: a pilot multi-institutional study shows an association between quantitative image features and tumor genomics

    NASA Astrophysics Data System (ADS)

    Mazurowski, Maciej A.; Clark, Kal; Czarnek, Nicholas M.; Shamsesfandabadi, Parisa; Peters, Katherine B.; Saha, Ashirbani

    2017-03-01

    Recent studies showed that genomic analysis of lower grade gliomas can be very effective for stratification of patients into groups with different prognosis and proposed specific genomic classifications. In this study, we explore the association of one of those genomic classifications with imaging parameters to determine whether imaging could serve a similar role to genomics in cancer patient treatment. Specifically, we analyzed imaging and genomics data for 110 patients from 5 institutions from The Cancer Genome Atlas and The Cancer Imaging Archive datasets. The analyzed imaging data contained preoperative FLAIR sequence for each patient. The images were analyzed using the in-house algorithms which quantify 2D and 3D aspects of the tumor shape. Genomic data consisted of a cluster of clusters classification proposed in a very recent and leading publication in the field of lower grade glioma genomics. Our statistical analysis showed that there is a strong association between the tumor cluster-of-clusters subtype and two imaging features: bounding ellipsoid volume ratio and angular standard deviation. This result shows high promise for the potential use of imaging as a surrogate measure for genomics in the decision process regarding treatment of lower grade glioma patients.

  16. FDA Escherichia coli Identification (FDA-ECID) Microarray: a Pangenome Molecular Toolbox for Serotyping, Virulence Profiling, Molecular Epidemiology, and Phylogeny

    PubMed Central

    Patel, Isha R.; Gangiredla, Jayanthi; Lacher, David W.; Mammel, Mark K.; Jackson, Scott A.; Lampel, Keith A.

    2016-01-01

    ABSTRACT Most Escherichia coli strains are nonpathogenic. However, for clinical diagnosis and food safety analysis, current identification methods for pathogenic E. coli are time-consuming and/or provide limited information. Here, we utilized a custom DNA microarray with informative genetic features extracted from 368 sequence sets for rapid and high-throughput pathogen identification. The FDA Escherichia coli Identification (FDA-ECID) platform contains three sets of molecularly informative features that together stratify strain identification and relatedness. First, 53 known flagellin alleles, 103 alleles of wzx and wzy, and 5 alleles of wzm provide molecular serotyping utility. Second, 41,932 probe sets representing the pan-genome of E. coli provide strain-level gene content information. Third, approximately 125,000 single nucleotide polymorphisms (SNPs) of available whole-genome sequences (WGS) were distilled to 9,984 SNPs capable of recapitulating the E. coli phylogeny. We analyzed 103 diverse E. coli strains with available WGS data, including those associated with past foodborne illnesses, to determine robustness and accuracy. The array was able to accurately identify the molecular O and H serotypes, potentially correcting serological failures and providing better resolution for H-nontypeable/nonmotile phenotypes. In addition, molecular risk assessment was possible with key virulence marker identifications. Epidemiologically, each strain had a unique comparative genomic fingerprint that was extended to an additional 507 food and clinical isolates. Finally, a 99.7% phylogenetic concordance was established between microarray analysis and WGS using SNP-level data for advanced genome typing. Our study demonstrates FDA-ECID as a powerful tool for epidemiology and molecular risk assessment with the capacity to profile the global landscape and diversity of E. coli.
IMPORTANCE This study describes a robust, state-of-the-art platform developed from available whole-genome sequences of E. coli and Shigella spp. by distilling useful signatures for epidemiology and molecular risk assessment into one assay. The FDA-ECID microarray contains features that enable comprehensive molecular serotyping and virulence profiling along with genome-scale genotyping and SNP analysis. Hence, it is a molecular toolbox that stratifies strain identification and pathogenic potential in the contexts of epidemiology and phylogeny. We applied this tool to strains from food, environmental, and clinical sources, resulting in significantly greater phylogenetic and strain-specific resolution than previously reported for available typing methods. PMID:27037122

  17. Systematic genomic identification of colorectal cancer genes delineating advanced from early clinical stage and metastasis

    PubMed Central

    2013-01-01

    Background: Colorectal cancer is the third leading cause of cancer deaths in the United States. The initial assessment of colorectal cancer involves clinical staging that takes into account the extent of primary tumor invasion, determining the number of lymph nodes with metastatic cancer and the identification of metastatic sites in other organs. Advanced clinical stage indicates metastatic cancer, either in regional lymph nodes or in distant organs. While the genomic and genetic basis of colorectal cancer has been elucidated to some degree, less is known about the identity of specific cancer genes that are associated with advanced clinical stage and metastasis. Methods: We compiled multiple genomic data types (mutations, copy number alterations, gene expression and methylation status) as well as clinical meta-data from The Cancer Genome Atlas (TCGA). We used an elastic-net regularized regression method on the combined genomic data to identify genetic aberrations and their associated cancer genes that are indicators of clinical stage. We ranked candidate genes by their regression coefficient and level of support from multiple assay modalities. Results: A fit of the elastic-net regularized regression to 197 samples and integrated analysis of four genomic platforms identified the set of top gene predictors of advanced clinical stage, including: WRN, SYK, DDX5 and ADRA2C. These genetic features were identified robustly in bootstrap resampling analysis. Conclusions: We conducted an analysis integrating multiple genomic features including mutations, copy number alterations, gene expression and methylation. This integrated approach in which one considers all of these genomic features performs better than any individual genomic assay. We identified multiple genes that robustly delineate advanced clinical stage, suggesting their possible role in colorectal cancer metastatic progression. PMID:24308539

  18. Genomic features of intertypic recombinant Sabin poliovirus strains excreted by primary vaccinees.

    PubMed

    Cuervo, N S; Guillot, S; Romanenkova, N; Combiescu, M; Aubert-Combiescu, A; Seghier, M; Caro, V; Crainic, R; Delpeyroux, F

    2001-07-01

    The trivalent oral poliomyelitis vaccine (OPV) contains three different poliovirus serotypes. Its use therefore creates particularly favorable conditions for mixed infection of gut cells, and indeed intertypic vaccine-derived recombinants (VdRec) have been frequently found in patients with vaccine-associated paralytic poliomyelitis. Nevertheless, there have not been extensive searches for VdRec in healthy vaccinees following immunization with OPV. To determine the incidence of VdRec and their excretion kinetics in primary vaccinees, and to establish the general genomic features of the corresponding recombinant genomes, we characterized poliovirus isolates excreted by vaccinees following primary immunization with OPV. Isolates were collected from 67 children 2 to 60 days following vaccination. Recombinant strains were identified by multiple restriction fragment length polymorphism assays. The localization of junction sites in recombinant genomes was also determined. VdRec excreted by vaccinees were first detected 2 to 4 days after vaccination. The highest rate of recombinants was on day 14. The frequency of VdRec depends strongly on the serotype of the analyzed isolates (2, 53, and 79% of recombinant strains in the last-excreted type 1, 2, and 3 isolates, respectively). Particular associations of genomic segments were preferred in the recombinant genomes, and recombination junctions were found in the genomic region encoding the nonstructural proteins. Recombination junctions generally clustered in particular subgenomic regions that were dependent on the serotype of the isolate and/or on the associations of genomic segments in recombinants. Thus, VdRec are frequently excreted by vaccinees, and the poliovirus replication machinery requirements or selection factors appear to act in vivo to shape the features of the recombinant genomes.

  19. From the Cover: Genome analysis of the smallest free-living eukaryote Ostreococcus tauri unveils many unique features

    NASA Astrophysics Data System (ADS)

    Derelle, Evelyne; Ferraz, Conchita; Rombauts, Stephane; Rouzé, Pierre; Worden, Alexandra Z.; Robbens, Steven; Partensky, Frédéric; Degroeve, Sven; Echeynié, Sophie; Cooke, Richard; Saeys, Yvan; Wuyts, Jan; Jabbari, Kamel; Bowler, Chris; Panaud, Olivier; Piégu, Benoît; Ball, Steven G.; Ral, Jean-Philippe; Bouget, François-Yves; Piganeau, Gwenael; de Baets, Bernard; Picard, André; Delseny, Michel; Demaille, Jacques; van de Peer, Yves; Moreau, Hervé

    2006-08-01

    The green lineage is reportedly 1,500 million years old, evolving shortly after the endosymbiosis event that gave rise to early photosynthetic eukaryotes. In this study, we unveil the complete genome sequence of an ancient member of this lineage, the unicellular green alga Ostreococcus tauri (Prasinophyceae). This cosmopolitan marine primary producer is the world's smallest free-living eukaryote known to date. Features likely reflecting optimization of environmentally relevant pathways, including resource acquisition, unusual photosynthesis apparatus, and genes potentially involved in C4 photosynthesis, were observed, as was downsizing of many gene families. Overall, the 12.56-Mb nuclear genome has an extremely high gene density, in part because of extensive reduction of intergenic regions and other forms of compaction such as gene fusion. However, the genome is structurally complex. It exhibits previously unobserved levels of heterogeneity for a eukaryote. Two chromosomes differ structurally from the other eighteen. Both have a significantly biased G+C content, and, remarkably, they contain the majority of transposable elements. Many chromosome 2 genes also have unique codon usage and splicing, but phylogenetic analysis and composition do not support alien gene origin. In contrast, most chromosome 19 genes show no similarity to green lineage genes and a large number of them are specialized in cell surface processes. Taken together, the complete genome sequence, unusual features, and downsized gene families, make O. tauri an ideal model system for research on eukaryotic genome evolution, including chromosome specialization and green lineage ancestry. genome heterogeneity | genome sequence | green alga | Prasinophyceae | gene prediction

  20. Whole-Genome Sequencing of Staphylococcus haemolyticus Uncovers the Extreme Plasticity of Its Genome and the Evolution of Human-Colonizing Staphylococcal Species

    PubMed Central

    Takeuchi, Fumihiko; Watanabe, Shinya; Baba, Tadashi; Yuzawa, Harumi; Ito, Teruyo; Morimoto, Yuh; Kuroda, Makoto; Cui, Longzhu; Takahashi, Mikio; Ankai, Akiho; Baba, Shin-ichi; Fukui, Shigehiro; Lee, Jean C.; Hiramatsu, Keiichi

    2005-01-01

    Staphylococcus haemolyticus is an opportunistic bacterial pathogen that colonizes human skin and is remarkable for its highly antibiotic-resistant phenotype. We determined the complete genome sequence of S. haemolyticus to better understand its pathogenicity and evolutionary relatedness to the other staphylococcal species. A large proportion of the open reading frames in the genomes of S. haemolyticus, Staphylococcus aureus, and Staphylococcus epidermidis were conserved in their sequence and order on the chromosome. We identified a region of the bacterial chromosome just downstream of the origin of replication that showed little homology among the species but was conserved among strains within a species. This novel region, designated the “oriC environ,” likely contributes to the evolution and differentiation of the staphylococcal species, since it was enriched for species-specific nonessential genes that contribute to the biological features of each staphylococcal species. A comparative analysis of the genomes of S. haemolyticus, S. aureus, and S. epidermidis elucidated differences in their biological and genetic characteristics and pathogenic potentials. We identified as many as 82 insertion sequences in the S. haemolyticus chromosome that probably mediated frequent genomic rearrangements, resulting in phenotypic diversification of the strain. Such rearrangements could have brought genomic plasticity to this species and contributed to its acquisition of antibiotic resistance. PMID:16237012
