Sample records for comparing genomic expression

  1. Comparative methods for the analysis of gene-expression evolution: an example using yeast functional genomic data.

    PubMed

    Oakley, Todd H; Gu, Zhenglong; Abouheif, Ehab; Patel, Nipam H; Li, Wen-Hsiung

    2005-01-01

    Understanding the evolution of gene function is a primary challenge of modern evolutionary biology. Despite an expanding database from genomic and developmental studies, we are lacking quantitative methods for analyzing the evolution of some important measures of gene function, such as gene-expression patterns. Here, we introduce phylogenetic comparative methods to compare different models of gene-expression evolution in a maximum-likelihood framework. We find that expression of duplicated genes has evolved according to a nonphylogenetic model, where closely related genes are no more likely than more distantly related genes to share common expression patterns. These results are consistent with previous studies that found rapid evolution of gene expression during the history of yeast. The comparative methods presented here are general enough to test a wide range of evolutionary hypotheses using genomic-scale data from any organism.

  2. Gramene 2016: comparative plant genomics and pathway resources

    PubMed Central

    Tello-Ruiz, Marcela K.; Stein, Joshua; Wei, Sharon; Preece, Justin; Olson, Andrew; Naithani, Sushma; Amarasinghe, Vindhya; Dharmawardhana, Palitha; Jiao, Yinping; Mulvaney, Joseph; Kumari, Sunita; Chougule, Kapeel; Elser, Justin; Wang, Bo; Thomason, James; Bolser, Daniel M.; Kerhornou, Arnaud; Walts, Brandon; Fonseca, Nuno A.; Huerta, Laura; Keays, Maria; Tang, Y. Amy; Parkinson, Helen; Fabregat, Antonio; McKay, Sheldon; Weiser, Joel; D'Eustachio, Peter; Stein, Lincoln; Petryszak, Robert; Kersey, Paul J.; Jaiswal, Pankaj; Ware, Doreen

    2016-01-01

    Gramene (http://www.gramene.org) is an online resource for comparative functional genomics in crops and model plant species. Its two main frameworks are genomes (collaboration with Ensembl Plants) and pathways (The Plant Reactome and archival BioCyc databases). Since our last NAR update, the database website adopted a new Drupal management platform. The genomes section features 39 fully assembled reference genomes that are integrated using ontology-based annotation and comparative analyses, and accessed through both visual and programmatic interfaces. Additional community data, such as genetic variation, expression and methylation, are also mapped for a subset of genomes. The Plant Reactome pathway portal (http://plantreactome.gramene.org) provides a reference resource for analyzing plant metabolic and regulatory pathways. In addition to ∼200 curated rice reference pathways, the portal hosts gene homology-based pathway projections for 33 plant species. Both the genome and pathway browsers interface with the EMBL-EBI's Expression Atlas to enable the projection of baseline and differential expression data from curated expression studies in plants. Gramene's archive website (http://archive.gramene.org) continues to provide previously reported resources on comparative maps, markers and QTL. To further aid our users, we have also introduced a live monthly educational webinar series and a Gramene YouTube channel carrying video tutorials. PMID:26553803

  3. CGI: Java Software for Mapping and Visualizing Data from Array-based Comparative Genomic Hybridization and Expression Profiling

    PubMed Central

    Gu, Joyce Xiuweu-Xu; Wei, Michael Yang; Rao, Pulivarthi H.; Lau, Ching C.; Behl, Sanjiv; Man, Tsz-Kwong

    2007-01-01

    With the increasing application of various genomic technologies in biomedical research, there is a need to integrate these data to correlate candidate genes/regions that are identified by different genomic platforms. Although there are tools that can analyze data from individual platforms, essential software for integration of genomic data is still lacking. Here, we present a novel Java-based program called CGI (Cytogenetics-Genomics Integrator) that matches the BAC clones from array-based comparative genomic hybridization (aCGH) to genes from RNA expression profiling datasets. The matching is computed via a fast, backend MySQL database containing UCSC Genome Browser annotations. This program also provides an easy-to-use graphical user interface for visualizing and summarizing the correlation of DNA copy number changes and RNA expression patterns from a set of experiments. In addition, CGI uses a Java applet to display the copy number values of a specific BAC clone in aCGH experiments side by side with the expression levels of genes that are mapped back to that BAC clone from the microarray experiments. The CGI program is built on top of extensible, reusable graphic components specifically designed for biologists. It is cross-platform compatible and the source code is freely available under the General Public License. PMID:19936083

  4. CGI: Java software for mapping and visualizing data from array-based comparative genomic hybridization and expression profiling.

    PubMed

    Gu, Joyce Xiuweu-Xu; Wei, Michael Yang; Rao, Pulivarthi H; Lau, Ching C; Behl, Sanjiv; Man, Tsz-Kwong

    2007-10-06

    With the increasing application of various genomic technologies in biomedical research, there is a need to integrate these data to correlate candidate genes/regions that are identified by different genomic platforms. Although there are tools that can analyze data from individual platforms, essential software for integration of genomic data is still lacking. Here, we present a novel Java-based program called CGI (Cytogenetics-Genomics Integrator) that matches the BAC clones from array-based comparative genomic hybridization (aCGH) to genes from RNA expression profiling datasets. The matching is computed via a fast, backend MySQL database containing UCSC Genome Browser annotations. This program also provides an easy-to-use graphical user interface for visualizing and summarizing the correlation of DNA copy number changes and RNA expression patterns from a set of experiments. In addition, CGI uses a Java applet to display the copy number values of a specific BAC clone in aCGH experiments side by side with the expression levels of genes that are mapped back to that BAC clone from the microarray experiments. The CGI program is built on top of extensible, reusable graphic components specifically designed for biologists. It is cross-platform compatible and the source code is freely available under the General Public License.

  5. Gramene 2016: comparative plant genomics and pathway resources.

    PubMed

    Tello-Ruiz, Marcela K; Stein, Joshua; Wei, Sharon; Preece, Justin; Olson, Andrew; Naithani, Sushma; Amarasinghe, Vindhya; Dharmawardhana, Palitha; Jiao, Yinping; Mulvaney, Joseph; Kumari, Sunita; Chougule, Kapeel; Elser, Justin; Wang, Bo; Thomason, James; Bolser, Daniel M; Kerhornou, Arnaud; Walts, Brandon; Fonseca, Nuno A; Huerta, Laura; Keays, Maria; Tang, Y Amy; Parkinson, Helen; Fabregat, Antonio; McKay, Sheldon; Weiser, Joel; D'Eustachio, Peter; Stein, Lincoln; Petryszak, Robert; Kersey, Paul J; Jaiswal, Pankaj; Ware, Doreen

    2016-01-04

    Gramene (http://www.gramene.org) is an online resource for comparative functional genomics in crops and model plant species. Its two main frameworks are genomes (collaboration with Ensembl Plants) and pathways (The Plant Reactome and archival BioCyc databases). Since our last NAR update, the database website adopted a new Drupal management platform. The genomes section features 39 fully assembled reference genomes that are integrated using ontology-based annotation and comparative analyses, and accessed through both visual and programmatic interfaces. Additional community data, such as genetic variation, expression and methylation, are also mapped for a subset of genomes. The Plant Reactome pathway portal (http://plantreactome.gramene.org) provides a reference resource for analyzing plant metabolic and regulatory pathways. In addition to ∼ 200 curated rice reference pathways, the portal hosts gene homology-based pathway projections for 33 plant species. Both the genome and pathway browsers interface with the EMBL-EBI's Expression Atlas to enable the projection of baseline and differential expression data from curated expression studies in plants. Gramene's archive website (http://archive.gramene.org) continues to provide previously reported resources on comparative maps, markers and QTL. To further aid our users, we have also introduced a live monthly educational webinar series and a Gramene YouTube channel carrying video tutorials. Published by Oxford University Press on behalf of Nucleic Acids Research 2015. This work is written by (a) US Government employee(s) and is in the public domain in the US.

  6. Comparative Bacterial Proteomics: Analysis of the Core Genome Concept

    PubMed Central

    Callister, Stephen J.; McCue, Lee Ann; Turse, Joshua E.; Monroe, Matthew E.; Auberry, Kenneth J.; Smith, Richard D.; Adkins, Joshua N.; Lipton, Mary S.

    2008-01-01

    While comparative bacterial genomic studies commonly predict a set of genes indicative of common ancestry, experimental validation of the existence of this core genome requires extensive measurement and is typically not undertaken. Enabled by an extensive proteome database developed over six years, we have experimentally verified the expression of proteins predicted from genomic ortholog comparisons among 17 environmental and pathogenic bacteria. More exclusive relationships were observed among the expressed protein content of phenotypically related bacteria, which is indicative of the specific lifestyles associated with these organisms. Although genomic studies can establish relative orthologous relationships among a set of bacteria and propose a set of ancestral genes, our proteomics study establishes expressed lifestyle differences among conserved genes and proposes a set of expressed ancestral traits. PMID:18253490

  7. BμG@Sbase—a microbial gene expression and comparative genomic database

    PubMed Central

    Witney, Adam A.; Waldron, Denise E.; Brooks, Lucy A.; Tyler, Richard H.; Withers, Michael; Stoker, Neil G.; Wren, Brendan W.; Butcher, Philip D.; Hinds, Jason

    2012-01-01

    The reducing cost of high-throughput functional genomic technologies is creating a deluge of high volume, complex data, placing the burden on bioinformatics resources and tool development. The Bacterial Microarray Group at St George's (BμG@S) has been at the forefront of bacterial microarray design and analysis for over a decade and while serving as a hub of a global network of microbial research groups has developed BμG@Sbase, a microbial gene expression and comparative genomic database. BμG@Sbase (http://bugs.sgul.ac.uk/bugsbase/) is a web-browsable, expertly curated, MIAME-compliant database that stores comprehensive experimental annotation and multiple raw and analysed data formats. Consistent annotation is enabled through a structured set of web forms, which guide the user through the process following a set of best practices and controlled vocabulary. The database currently contains 86 expertly curated publicly available data sets (with a further 124 not yet published) and full annotation information for 59 bacterial microarray designs. The data can be browsed and queried using an explorer-like interface; integrating intuitive tree diagrams to present complex experimental details clearly and concisely. Furthermore the modular design of the database will provide a robust platform for integrating other data types beyond microarrays into a more Systems analysis based future. PMID:21948792

  8. BμG@Sbase--a microbial gene expression and comparative genomic database.

    PubMed

    Witney, Adam A; Waldron, Denise E; Brooks, Lucy A; Tyler, Richard H; Withers, Michael; Stoker, Neil G; Wren, Brendan W; Butcher, Philip D; Hinds, Jason

    2012-01-01

    The reducing cost of high-throughput functional genomic technologies is creating a deluge of high volume, complex data, placing the burden on bioinformatics resources and tool development. The Bacterial Microarray Group at St George's (BμG@S) has been at the forefront of bacterial microarray design and analysis for over a decade and while serving as a hub of a global network of microbial research groups has developed BμG@Sbase, a microbial gene expression and comparative genomic database. BμG@Sbase (http://bugs.sgul.ac.uk/bugsbase/) is a web-browsable, expertly curated, MIAME-compliant database that stores comprehensive experimental annotation and multiple raw and analysed data formats. Consistent annotation is enabled through a structured set of web forms, which guide the user through the process following a set of best practices and controlled vocabulary. The database currently contains 86 expertly curated publicly available data sets (with a further 124 not yet published) and full annotation information for 59 bacterial microarray designs. The data can be browsed and queried using an explorer-like interface; integrating intuitive tree diagrams to present complex experimental details clearly and concisely. Furthermore the modular design of the database will provide a robust platform for integrating other data types beyond microarrays into a more Systems analysis based future.

  9. Genomic expression patterns in medication overuse headaches

    PubMed Central

    Hershey, Andrew D; Burdine, Danny; Kabbouche, Marielle A; Powers, Scott W

    2016-01-01

    Background Chronic daily headache (CDH) and chronic migraine (CM) are one of the most frequent problems encountered in neurology, are often difficult to treat, and frequently complicated by medication-overuse headache (MOH). Proper recognition of MOH may alter treatment outcome and prevent long term disability. Objective This study identifies the unique genomic expression pattern MOH that respond to cessation of the overused medication. Methods Baseline occurrence of MOH and typical pattern of response to medication cessation were measured from a large database. Whole blood samples from patients with CM with or without MOH were obtained and their genomic profile was assessed. Affymetrix human U133 plus2 arrays were used to examine the genomic expression patterns prior to treatment and 6–12 weeks later. Headache characterisation and response to treatment based on headache frequency and disability were compared. Results Of 1311 patients reporting daily or continuous headaches, 513 (39.1%) reported overusing analgesic medication. At follow-up, 44.5% had a 50% or greater reduction in headache frequency, while 41.6% had no change. Blood genomic expression patterns were obtained on 33 patients with 19 (57.6%) overusing analgesic medication with a unique genomic expression pattern in MOH that responded to cessation of analgesics. Gene ontology of these samples indicated a significant number were involved with brain and immunological tissues, including multiple signalling pathways and apoptosis. Conclusions Blood genomic patterns can accurately identify MOH patients that respond to medication cessation. These results suggest that MOH involves a unique molecular biology pathway that can be identified with a specific biomarker. PMID:20974594

  10. Dcode.org anthology of comparative genomic tools.

    PubMed

    Loots, Gabriela G; Ovcharenko, Ivan

    2005-07-01

    Comparative genomics provides the means to demarcate functional regions in anonymous DNA sequences. The successful application of this method to identifying novel genes is currently shifting to deciphering the non-coding encryption of gene regulation across genomes. To facilitate the practical application of comparative sequence analysis to genetics and genomics, we have developed several analytical and visualization tools for the analysis of arbitrary sequences and whole genomes. These tools include two alignment tools, zPicture and Mulan; a phylogenetic shadowing tool, eShadow for identifying lineage- and species-specific functional elements; two evolutionary conserved transcription factor analysis tools, rVista and multiTF; a tool for extracting cis-regulatory modules governing the expression of co-regulated genes, Creme 2.0; and a dynamic portal to multiple vertebrate and invertebrate genome alignments, the ECR Browser. Here, we briefly describe each one of these tools and provide specific examples on their practical applications. All the tools are publicly available at the http://www.dcode.org/ website.

  11. Comparative genome sequencing reveals genomic signature of extreme desiccation tolerance in the anhydrobiotic midge

    PubMed Central

    Gusev, Oleg; Suetsugu, Yoshitaka; Cornette, Richard; Kawashima, Takeshi; Logacheva, Maria D.; Kondrashov, Alexey S.; Penin, Aleksey A.; Hatanaka, Rie; Kikuta, Shingo; Shimura, Sachiko; Kanamori, Hiroyuki; Katayose, Yuichi; Matsumoto, Takashi; Shagimardanova, Elena; Alexeev, Dmitry; Govorun, Vadim; Wisecaver, Jennifer; Mikheyev, Alexander; Koyanagi, Ryo; Fujie, Manabu; Nishiyama, Tomoaki; Shigenobu, Shuji; Shibata, Tomoko F.; Golygina, Veronika; Hasebe, Mitsuyasu; Okuda, Takashi; Satoh, Nori; Kikawada, Takahiro

    2014-01-01

    Anhydrobiosis represents an extreme example of tolerance adaptation to water loss, where an organism can survive in an ametabolic state until water returns. Here we report the first comparative analysis examining the genomic background of extreme desiccation tolerance, which is exclusively found in larvae of the only anhydrobiotic insect, Polypedilum vanderplanki. We compare the genomes of P. vanderplanki and a congeneric desiccation-sensitive midge P. nubifer. We determine that the genome of the anhydrobiotic species specifically contains clusters of multi-copy genes with products that act as molecular shields. In addition, the genome possesses several groups of genes with high similarity to known protective proteins. However, these genes are located in distinct paralogous clusters in the genome apart from the classical orthologues of the corresponding genes shared by both chironomids and other insects. The transcripts of these clustered paralogues contribute to a large majority of the mRNA pool in the desiccating larvae and most likely define successful anhydrobiosis. Comparison of expression patterns of orthologues between two chironomid species provides evidence for the existence of desiccation-specific gene expression systems in P. vanderplanki. PMID:25216354

  12. DCODE.ORG Anthology of Comparative Genomic Tools

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Loots, G G; Ovcharenko, I

    2005-01-11

    Comparative genomics provides the means to demarcate functional regions in anonymous DNA sequences. The successful application of this method to identifying novel genes is currently shifting to deciphering the noncoding encryption of gene regulation across genomes. To facilitate the use of comparative genomics to practical applications in genetics and genomics we have developed several analytical and visualization tools for the analysis of arbitrary sequences and whole genomes. These tools include two alignment tools: zPicture and Mulan; a phylogenetic shadowing tool: eShadow for identifying lineage- and species-specific functional elements; two evolutionary conserved transcription factor analysis tools: rVista and multiTF; a toolmore » for extracting cis-regulatory modules governing the expression of co-regulated genes, CREME; and a dynamic portal to multiple vertebrate and invertebrate genome alignments, the ECR Browser. Here we briefly describe each one of these tools and provide specific examples on their practical applications. All the tools are publicly available at the http://www.dcode.org/ web site.« less

  13. Genomic Expression Patterns in Menstrually-Related Migraine in Adolescents

    PubMed Central

    Hershey, Andrew; Horn, Paul; Kabbouche, Marielle; O'Brien, Hope; Powers, Scott

    2011-01-01

    Background Exacerbation of migraine with menses is common in adolescent girls and women with migraine, occurring in up to 60% of females with migraine. These migraines are oftentimes longer and more disabling and may be related to estrogen levels and hormonal fluctuations. Objective This study identifies the unique genomic expression pattern of menstrually-related migraine (MRM) in comparison to migraine occurring outside the menstrual period and headache free controls. Methods Whole blood samples were obtained from female subjects having an acute migraine during their menstrual period (MRM) or outside of their menstrual period (nonMRM) and controls (C) – females having a menstrual period without any history of headache. The mRNA was isolated from these samples and genomic profile was assessed. Affymetrix Human Exon ST 1.0 arrays were used to examine the genomic expression pattern differences between these three groups. Results Blood genomic expression patterns were obtained on 56 subjects (MRM = 18, nonMRM = 18 and C = 20). Unique genomic expression patterns were observed for both MRM and nonMRM. For MRM, 77 genes were identified that were unique to MRM, while 61 genes were commonly expressed for MRM and nonMRM and 127 genes appeared to have a unique expression pattern for nonMRM. In addition, there were 279 genes that differentially expressed for MRM compared to nonMRM that were not differentially expressed for nonMRM. Gene ontology of these samples indicated many of these groups of genes were functionally related and included categories of immunomodulation/inflammation, mitochondrial function and DNA homeostasis. Conclusions Blood genomic patterns can accurately differentiate MRM from nonMRM. These results indicate that MRM involves a unique molecular biology pathway that can be identified with a specific biomarker and suggest that individuals with MRM have a different underlying genetic etiology. PMID:22220971

  14. Comparative genomics approaches to understanding and manipulating plant metabolism.

    PubMed

    Bradbury, Louis M T; Niehaus, Tom D; Hanson, Andrew D

    2013-04-01

    Over 3000 genomes, including numerous plant genomes, are now sequenced. However, their annotation remains problematic as illustrated by the many conserved genes with no assigned function, vague annotations such as 'kinase', or even wrong ones. Around 40% of genes of unknown function that are conserved between plants and microbes are probably metabolic enzymes or transporters; finding functions for these genes is a major challenge. Comparative genomics has correctly predicted functions for many such genes by analyzing genomic context, and gene fusions, distributions and co-expression. Comparative genomics complements genetic and biochemical approaches to dissect metabolism, continues to increase in power and decrease in cost, and has a pivotal role in modeling and engineering by helping identify functions for all metabolic genes. Copyright © 2012 Elsevier Ltd. All rights reserved.

  15. Methods for Genome-Wide Analysis of Gene Expression Changes in Polyploids

    PubMed Central

    Wang, Jianlin; Lee, Jinsuk J.; Tian, Lu; Lee, Hyeon-Se; Chen, Meng; Rao, Sheetal; Wei, Edward N.; Doerge, R. W.; Comai, Luca; Jeffrey Chen, Z.

    2007-01-01

    Polyploidy is an evolutionary innovation, providing extra sets of genetic material for phenotypic variation and adaptation. It is predicted that changes of gene expression by genetic and epigenetic mechanisms are responsible for novel variation in nascent and established polyploids (Liu and Wendel, 2002; Osborn et al., 2003; Pikaard, 2001). Studying gene expression changes in allopolyploids is more complicated than in autopolyploids, because allopolyploids contain more than two sets of genomes originating from divergent, but related, species. Here we describe two methods that are applicable to the genome-wide analysis of gene expression differences resulting from genome duplication in autopolyploids or interactions between homoeologous genomes in allopolyploids. First, we describe an amplified fragment length polymorphism (AFLP)–complementary DNA (cDNA) display method that allows the discrimination of homoeologous loci based on restriction polymorphisms between the progenitors. Second, we describe microarray analyses that can be used to compare gene expression differences between the allopolyploids and respective progenitors using appropriate experimental design and statistical analysis. We demonstrate the utility of these two complementary methods and discuss the pros and cons of using the methods to analyze gene expression changes in autopolyploids and allopolyploids. Furthermore, we describe these methods in general terms to be of wider applicability for comparative gene expression in a variety of evolutionary, genetic, biological, and physiological contexts. PMID:15865985

  16. The Fanconi anemia/BRCA gene network in zebrafish: embryonic expression and comparative genomics.

    PubMed

    Titus, Tom A; Yan, Yi-Lin; Wilson, Catherine; Starks, Amber M; Frohnmayer, Jonathan D; Bremiller, Ruth A; Cañestro, Cristian; Rodriguez-Mari, Adriana; He, Xinjun; Postlethwait, John H

    2009-07-31

    Fanconi anemia (FA) is a genetic disease resulting in bone marrow failure, high cancer risks, and infertility, and developmental anomalies including microphthalmia, microcephaly, hypoplastic radius and thumb. Here we present cDNA sequences, genetic mapping, and genomic analyses for the four previously undescribed zebrafish FA genes (fanci, fancj, fancm, and fancn), and show that they reverted to single copy after the teleost genome duplication. We tested the hypothesis that FA genes are expressed during embryonic development in tissues that are disrupted in human patients by investigating fanc gene expression patterns. We found fanc gene maternal message, which can provide Fanc proteins to repair DNA damage encountered in rapid cleavage divisions. Zygotic expression was broad but especially strong in eyes, central nervous system and hematopoietic tissues. In the pectoral fin bud at hatching, fanc genes were expressed specifically in the apical ectodermal ridge, a signaling center for fin/limb development that may be relevant to the radius/thumb anomaly of FA patients. Hatching embryos expressed fanc genes strongly in the oral epithelium, a site of squamous cell carcinomas in FA patients. Larval and adult zebrafish expressed fanc genes in proliferative regions of the brain, which may be related to microcephaly in FA. Mature ovaries and testes expressed fanc genes in specific stages of oocyte and spermatocyte development, which may be related to DNA repair during homologous recombination in meiosis and to infertility in human patients. The intestine strongly expressed some fanc genes specifically in proliferative zones. Our results show that zebrafish has a complete complement of fanc genes in single copy and that these genes are expressed in zebrafish embryos and adults in proliferative tissues that are often affected in FA patients. These results support the notion that zebrafish offers an attractive experimental system to help unravel mechanisms relevant not only

  17. The Fanconi anemia/BRCA gene network in zebrafish: Embryonic expression and comparative genomics

    PubMed Central

    Titus, Tom A.; Yan, Yi-Lin; Wilson, Catherine; Starks, Amber M.; Frohnmayer, Jonathan D.; Canestro, Cristian; Rodriguez-Mari, Adriana; He, Xinjun; Postlethwait, John H.

    2008-01-01

    Fanconi anemia (FA) is a genic disease resulting in bone marrow failure, high cancer risks, and infertility, and developmental anomalies including microphthalmia, microcephaly, hypoplastic radius and thumb. Here we present cDNA sequences, genetic mapping, and genomic analyses for the four previously undescribed zebrafish FA genes (fanci, fancj, fancm, and fancn, and show that they reverted to single copy after the teleost genome duplication. We tested the hypothesis that FA genes are expressed during embryonic development in tissues that are disrupted in human patients by investigating fanc gene expression patterns. We found fanc gene maternal message, which can provide Fanc proteins to repair DNA damage encountered in rapid cleavage divisions. Zygotic expression was broad but especially strong in eyes, central nervous system and hematopoietic tissues. In the pectoral fin bud at hatching, fanc genes were expressed specifically in the apical ectodermal ridge, a signaling center for fin/limb development that may be relevant to the radius/thumb anomaly of FA patients. Hatching embryos expressed fanc genes strongly in the oral epithelium, a site of squamous cell carcinomas in FA patients. Larval and adult zebrafish expressed fanc genes in proliferative regions of the brain, which may be related to microcephaly in FA. Mature ovaries and testes expressed fanc genes in specific stages of oocyte and spermatocyte development, which may be related to DNA repair during homologous recombination in meiosis and to infertility in human patients. The intestine strongly expressed some fanc genes specifically in proliferative zones. Our results show that zebrafish has a complete complement of fanc genes in single copy and that these genes are expressed in zebrafish embryos and adults in proliferative tissues that are often affected in FA patients. These results support the notion that zebrafish offers an attractive experimental system to help unravel mechanisms relevant not only to

  18. Ensembl comparative genomics resources

    PubMed Central

    Muffato, Matthieu; Beal, Kathryn; Fitzgerald, Stephen; Gordon, Leo; Pignatelli, Miguel; Vilella, Albert J.; Searle, Stephen M. J.; Amode, Ridwan; Brent, Simon; Spooner, William; Kulesha, Eugene; Yates, Andrew; Flicek, Paul

    2016-01-01

    Evolution provides the unifying framework with which to understand biology. The coherent investigation of genic and genomic data often requires comparative genomics analyses based on whole-genome alignments, sets of homologous genes and other relevant datasets in order to evaluate and answer evolutionary-related questions. However, the complexity and computational requirements of producing such data are substantial: this has led to only a small number of reference resources that are used for most comparative analyses. The Ensembl comparative genomics resources are one such reference set that facilitates comprehensive and reproducible analysis of chordate genome data. Ensembl computes pairwise and multiple whole-genome alignments from which large-scale synteny, per-base conservation scores and constrained elements are obtained. Gene alignments are used to define Ensembl Protein Families, GeneTrees and homologies for both protein-coding and non-coding RNA genes. These resources are updated frequently and have a consistent informatics infrastructure and data presentation across all supported species. Specialized web-based visualizations are also available including synteny displays, collapsible gene tree plots, a gene family locator and different alignment views. The Ensembl comparative genomics infrastructure is extensively reused for the analysis of non-vertebrate species by other projects including Ensembl Genomes and Gramene and much of the information here is relevant to these projects. The consistency of the annotation across species and the focus on vertebrates makes Ensembl an ideal system to perform and support vertebrate comparative genomic analyses. We use robust software and pipelines to produce reference comparative data and make it freely available. Database URL: http://www.ensembl.org. PMID:26896847

  19. GenomeFingerprinter: the genome fingerprint and the universal genome fingerprint analysis for systematic comparative genomics.

    PubMed

    Ai, Yuncan; Ai, Hannan; Meng, Fanmei; Zhao, Lei

    2013-01-01

    No attention has been paid on comparing a set of genome sequences crossing genetic components and biological categories with far divergence over large size range. We define it as the systematic comparative genomics and aim to develop the methodology. First, we create a method, GenomeFingerprinter, to unambiguously produce a set of three-dimensional coordinates from a sequence, followed by one three-dimensional plot and six two-dimensional trajectory projections, to illustrate the genome fingerprint of a given genome sequence. Second, we develop a set of concepts and tools, and thereby establish a method called the universal genome fingerprint analysis (UGFA). Particularly, we define the total genetic component configuration (TGCC) (including chromosome, plasmid, and phage) for describing a strain as a systematic unit, the universal genome fingerprint map (UGFM) of TGCC for differentiating strains as a universal system, and the systematic comparative genomics (SCG) for comparing a set of genomes crossing genetic components and biological categories. Third, we construct a method of quantitative analysis to compare two genomes by using the outcome dataset of genome fingerprint analysis. Specifically, we define the geometric center and its geometric mean for a given genome fingerprint map, followed by the Euclidean distance, the differentiate rate, and the weighted differentiate rate to quantitatively describe the difference between two genomes of comparison. Moreover, we demonstrate the applications through case studies on various genome sequences, giving tremendous insights into the critical issues in microbial genomics and taxonomy. We have created a method, GenomeFingerprinter, for rapidly computing, geometrically visualizing, intuitively comparing a set of genomes at genome fingerprint level, and hence established a method called the universal genome fingerprint analysis, as well as developed a method of quantitative analysis of the outcome dataset. These have set

  20. Gramene 2013: comparative plant genomics resources.

    PubMed

    Monaco, Marcela K; Stein, Joshua; Naithani, Sushma; Wei, Sharon; Dharmawardhana, Palitha; Kumari, Sunita; Amarasinghe, Vindhya; Youens-Clark, Ken; Thomason, James; Preece, Justin; Pasternak, Shiran; Olson, Andrew; Jiao, Yinping; Lu, Zhenyuan; Bolser, Dan; Kerhornou, Arnaud; Staines, Dan; Walts, Brandon; Wu, Guanming; D'Eustachio, Peter; Haw, Robin; Croft, David; Kersey, Paul J; Stein, Lincoln; Jaiswal, Pankaj; Ware, Doreen

    2014-01-01

    Gramene (http://www.gramene.org) is a curated online resource for comparative functional genomics in crops and model plant species, currently hosting 27 fully and 10 partially sequenced reference genomes in its build number 38. Its strength derives from the application of a phylogenetic framework for genome comparison and the use of ontologies to integrate structural and functional annotation data. Whole-genome alignments complemented by phylogenetic gene family trees help infer syntenic and orthologous relationships. Genetic variation data, sequences and genome mappings available for 10 species, including Arabidopsis, rice and maize, help infer putative variant effects on genes and transcripts. The pathways section also hosts 10 species-specific metabolic pathways databases developed in-house or by our collaborators using Pathway Tools software, which facilitates searches for pathway, reaction and metabolite annotations, and allows analyses of user-defined expression datasets. Recently, we released a Plant Reactome portal featuring 133 curated rice pathways. This portal will be expanded for Arabidopsis, maize and other plant species. We continue to provide genetic and QTL maps and marker datasets developed by crop researchers. The project provides a unique community platform to support scientific research in plant genomics including studies in evolution, genetics, plant breeding, molecular biology, biochemistry and systems biology.

  1. Ensembl comparative genomics resources.

    PubMed

    Herrero, Javier; Muffato, Matthieu; Beal, Kathryn; Fitzgerald, Stephen; Gordon, Leo; Pignatelli, Miguel; Vilella, Albert J; Searle, Stephen M J; Amode, Ridwan; Brent, Simon; Spooner, William; Kulesha, Eugene; Yates, Andrew; Flicek, Paul

    2016-01-01

    Evolution provides the unifying framework with which to understand biology. The coherent investigation of genic and genomic data often requires comparative genomics analyses based on whole-genome alignments, sets of homologous genes and other relevant datasets in order to evaluate and answer evolutionary-related questions. However, the complexity and computational requirements of producing such data are substantial: this has led to only a small number of reference resources that are used for most comparative analyses. The Ensembl comparative genomics resources are one such reference set that facilitates comprehensive and reproducible analysis of chordate genome data. Ensembl computes pairwise and multiple whole-genome alignments from which large-scale synteny, per-base conservation scores and constrained elements are obtained. Gene alignments are used to define Ensembl Protein Families, GeneTrees and homologies for both protein-coding and non-coding RNA genes. These resources are updated frequently and have a consistent informatics infrastructure and data presentation across all supported species. Specialized web-based visualizations are also available including synteny displays, collapsible gene tree plots, a gene family locator and different alignment views. The Ensembl comparative genomics infrastructure is extensively reused for the analysis of non-vertebrate species by other projects including Ensembl Genomes and Gramene and much of the information here is relevant to these projects. The consistency of the annotation across species and the focus on vertebrates makes Ensembl an ideal system to perform and support vertebrate comparative genomic analyses. We use robust software and pipelines to produce reference comparative data and make it freely available. Database URL: http://www.ensembl.org. © The Author(s) 2016. Published by Oxford University Press.

  2. Comparative genome analysis in the integrated microbial genomes (IMG) system.

    PubMed

    Markowitz, Victor M; Kyrpides, Nikos C

    2007-01-01

    Comparative genome analysis is critical for the effective exploration of a rapidly growing number of complete and draft sequences for microbial genomes. The Integrated Microbial Genomes (IMG) system (img.jgi.doe.gov) has been developed as a community resource that provides support for comparative analysis of microbial genomes in an integrated context. IMG allows users to navigate the multidimensional microbial genome data space and focus their analysis on a subset of genes, genomes, and functions of interest. IMG provides graphical viewers, summaries, and occurrence profile tools for comparing genes, pathways, and functions (terms) across specific genomes. Genes can be further examined using gene neighborhoods and compared with sequence alignment tools.

  3. Comparative Genomics in Drosophila.

    PubMed

    Oti, Martin; Pane, Attilio; Sammeth, Michael

    2018-01-01

    Since the pioneering studies of Thomas Hunt Morgan and coworkers at the dawn of the twentieth century, Drosophila melanogaster and its sister species have tremendously contributed to unveil the rules underlying animal genetics, development, behavior, evolution, and human disease. Recent advances in DNA sequencing technologies launched Drosophila into the post-genomic era and paved the way for unprecedented comparative genomics investigations. The complete sequencing and systematic comparison of the genomes from 12 Drosophila species represents a milestone achievement in modern biology, which allowed a plethora of different studies ranging from the annotation of known and novel genomic features to the evolution of chromosomes and, ultimately, of entire genomes. Despite the efforts of countless laboratories worldwide, the vast amount of data that were produced over the past 15 years is far from being fully explored.In this chapter, we will review some of the bioinformatic approaches that were developed to interrogate the genomes of the 12 Drosophila species. Setting off from alignments of the entire genomic sequences, the degree of conservation can be separately evaluated for every region of the genome, providing already first hints about elements that are under purifying selection and therefore likely functional. Furthermore, the careful analysis of repeated sequences sheds light on the evolutionary dynamics of transposons, an enigmatic and fascinating class of mobile elements housed in the genomes of animals and plants. Comparative genomics also aids in the computational identification of the transcriptionally active part of the genome, first and foremost of protein-coding loci, but also of transcribed nevertheless apparently noncoding regions, which were once considered "junk" DNA. Eventually, the synergy between functional and comparative genomics also facilitates in silico and in vivo studies on cis-acting regulatory elements, like transcription factor binding

  4. Cloud computing for comparative genomics.

    PubMed

    Wall, Dennis P; Kudtarkar, Parul; Fusaro, Vincent A; Pivovarov, Rimma; Patil, Prasad; Tonellato, Peter J

    2010-05-18

    Large comparative genomics studies and tools are becoming increasingly more compute-expensive as the number of available genome sequences continues to rise. The capacity and cost of local computing infrastructures are likely to become prohibitive with the increase, especially as the breadth of questions continues to rise. Alternative computing architectures, in particular cloud computing environments, may help alleviate this increasing pressure and enable fast, large-scale, and cost-effective comparative genomics strategies going forward. To test this, we redesigned a typical comparative genomics algorithm, the reciprocal smallest distance algorithm (RSD), to run within Amazon's Elastic Computing Cloud (EC2). We then employed the RSD-cloud for ortholog calculations across a wide selection of fully sequenced genomes. We ran more than 300,000 RSD-cloud processes within the EC2. These jobs were farmed simultaneously to 100 high capacity compute nodes using the Amazon Web Service Elastic Map Reduce and included a wide mix of large and small genomes. The total computation time took just under 70 hours and cost a total of $6,302 USD. The effort to transform existing comparative genomics algorithms from local compute infrastructures is not trivial. However, the speed and flexibility of cloud computing environments provides a substantial boost with manageable cost. The procedure designed to transform the RSD algorithm into a cloud-ready application is readily adaptable to similar comparative genomics problems.

  5. Cloud computing for comparative genomics

    PubMed Central

    2010-01-01

    Background Large comparative genomics studies and tools are becoming increasingly more compute-expensive as the number of available genome sequences continues to rise. The capacity and cost of local computing infrastructures are likely to become prohibitive with the increase, especially as the breadth of questions continues to rise. Alternative computing architectures, in particular cloud computing environments, may help alleviate this increasing pressure and enable fast, large-scale, and cost-effective comparative genomics strategies going forward. To test this, we redesigned a typical comparative genomics algorithm, the reciprocal smallest distance algorithm (RSD), to run within Amazon's Elastic Computing Cloud (EC2). We then employed the RSD-cloud for ortholog calculations across a wide selection of fully sequenced genomes. Results We ran more than 300,000 RSD-cloud processes within the EC2. These jobs were farmed simultaneously to 100 high capacity compute nodes using the Amazon Web Service Elastic Map Reduce and included a wide mix of large and small genomes. The total computation time took just under 70 hours and cost a total of $6,302 USD. Conclusions The effort to transform existing comparative genomics algorithms from local compute infrastructures is not trivial. However, the speed and flexibility of cloud computing environments provides a substantial boost with manageable cost. The procedure designed to transform the RSD algorithm into a cloud-ready application is readily adaptable to similar comparative genomics problems. PMID:20482786

  6. A Comparative Genomic Study in Schizophrenic and in Bipolar Disorder Patients, Based on Microarray Expression Profiling Meta-Analysis

    PubMed Central

    Logotheti, Marianthi; Papadodima, Olga; Venizelos, Nikolaos; Chatziioannou, Aristotelis; Kolisis, Fragiskos

    2013-01-01

    Schizophrenia affecting almost 1% and bipolar disorder affecting almost 3%–5% of the global population constitute two severe mental disorders. The catecholaminergic and the serotonergic pathways have been proved to play an important role in the development of schizophrenia, bipolar disorder, and other related psychiatric disorders. The aim of the study was to perform and interpret the results of a comparative genomic profiling study in schizophrenic patients as well as in healthy controls and in patients with bipolar disorder and try to relate and integrate our results with an aberrant amino acid transport through cell membranes. In particular we have focused on genes and mechanisms involved in amino acid transport through cell membranes from whole genome expression profiling data. We performed bioinformatic analysis on raw data derived from four different published studies. In two studies postmortem samples from prefrontal cortices, derived from patients with bipolar disorder, schizophrenia, and control subjects, have been used. In another study we used samples from postmortem orbitofrontal cortex of bipolar subjects while the final study was performed based on raw data from a gene expression profiling dataset in the postmortem superior temporal cortex of schizophrenics. The data were downloaded from NCBI's GEO datasets. PMID:23554570

  7. Ebolavirus comparative genomics

    DOE PAGES

    Jun, Se-Ran; Leuze, Michael R.; Nookaew, Intawat; ...

    2015-07-14

    The 2014 Ebola outbreak in West Africa is the largest documented for this virus. We examine the dynamics of this genome, comparing more than one hundred currently available ebolavirus genomes to each other and to other viral genomes. Based on oligomer frequency analysis, the family Filoviridae forms a distinct group from all other sequenced viral genomes. All filovirus genomes sequenced to date encode proteins with similar functions and gene order, although there is considerable divergence in sequences between the three genera Ebolavirus, Cuevavirus, and Marburgvirus within the family Filoviridae. Whereas all ebolavirus genomes are quite similar (multiple sequences of themore » same strain are often identical), variation is most common in the intergenic regions and within specific areas of the genes encoding the glycoprotein (GP), nucleoprotein (NP), and polymerase (L). We predict regions that could contain epitope-binding sites, which might be good vaccine targets. In conclusion, this information, combined with glycosylation sites and experimentally determined epitopes, can identify the most promising regions for the development of therapeutic strategies.« less

  8. An Expressed Sequence Tag (EST)-enriched genetic map of turbot (Scophthalmus maximus): a useful framework for comparative genomics across model and farmed teleosts

    PubMed Central

    2012-01-01

    Background The turbot (Scophthalmus maximus) is a relevant species in European aquaculture. The small turbot genome provides a source for genomics strategies to use in order to understand the genetic basis of productive traits, particularly those related to sex, growth and pathogen resistance. Genetic maps represent essential genomic screening tools allowing to localize quantitative trait loci (QTL) and to identify candidate genes through comparative mapping. This information is the backbone to develop marker-assisted selection (MAS) programs in aquaculture. Expressed sequenced tag (EST) resources have largely increased in turbot, thus supplying numerous type I markers suitable for extending the previous linkage map, which was mostly based on anonymous loci. The aim of this study was to construct a higher-resolution turbot genetic map using EST-linked markers, which will turn out to be useful for comparative mapping studies. Results A consensus gene-enriched genetic map of the turbot was constructed using 463 SNP and microsatellite markers in nine reference families. This map contains 438 markers, 180 EST-linked, clustered at 24 linkage groups. Linkage and comparative genomics evidences suggested additional linkage group fusions toward the consolidation of turbot map according to karyotype information. The linkage map showed a total length of 1402.7 cM with low average intermarker distance (3.7 cM; ~2 Mb). A global 1.6:1 female-to-male recombination frequency (RF) ratio was observed, although largely variable among linkage groups and chromosome regions. Comparative sequence analysis revealed large macrosyntenic patterns against model teleost genomes, significant hits decreasing from stickleback (54%) to zebrafish (20%). Comparative mapping supported particular chromosome rearrangements within Acanthopterygii and aided to assign unallocated markers to specific turbot linkage groups. Conclusions The new gene-enriched high-resolution turbot map represents a

  9. CHESS (CgHExpreSS): a comprehensive analysis tool for the analysis of genomic alterations and their effects on the expression profile of the genome.

    PubMed

    Lee, Mikyung; Kim, Yangseok

    2009-12-16

    Genomic alterations frequently occur in many cancer patients and play important mechanistic roles in the pathogenesis of cancer. Furthermore, they can modify the expression level of genes due to altered copy number in the corresponding region of the chromosome. An accumulating body of evidence supports the possibility that strong genome-wide correlation exists between DNA content and gene expression. Therefore, more comprehensive analysis is needed to quantify the relationship between genomic alteration and gene expression. A well-designed bioinformatics tool is essential to perform this kind of integrative analysis. A few programs have already been introduced for integrative analysis. However, there are many limitations in their performance of comprehensive integrated analysis using published software because of limitations in implemented algorithms and visualization modules. To address this issue, we have implemented the Java-based program CHESS to allow integrative analysis of two experimental data sets: genomic alteration and genome-wide expression profile. CHESS is composed of a genomic alteration analysis module and an integrative analysis module. The genomic alteration analysis module detects genomic alteration by applying a threshold based method or SW-ARRAY algorithm and investigates whether the detected alteration is phenotype specific or not. On the other hand, the integrative analysis module measures the genomic alteration's influence on gene expression. It is divided into two separate parts. The first part calculates overall correlation between comparative genomic hybridization ratio and gene expression level by applying following three statistical methods: simple linear regression, Spearman rank correlation and Pearson's correlation. In the second part, CHESS detects the genes that are differentially expressed according to the genomic alteration pattern with three alternative statistical approaches: Student's t-test, Fisher's exact test and Chi square

  10. Comparative Genomics in Homo sapiens.

    PubMed

    Oti, Martin; Sammeth, Michael

    2018-01-01

    Genomes can be compared at different levels of divergence, either between species or within species. Within species genomes can be compared between different subpopulations, such as human subpopulations from different continents. Investigating the genomic differences between different human subpopulations is important when studying complex diseases that are affected by many genetic variants, as the variants involved can differ between populations. The 1000 Genomes Project collected genome-scale variation data for 2504 human individuals from 26 different populations, enabling a systematic comparison of variation between human subpopulations. In this chapter, we present step-by-step a basic protocol for the identification of population-specific variants employing the 1000 Genomes data. These variants are subsequently further investigated for those that affect the proteome or RNA splice sites, to investigate potentially biologically relevant differences between the populations.

  11. Whole-genome sequencing for comparative genomics and de novo genome assembly.

    PubMed

    Benjak, Andrej; Sala, Claudia; Hartkoorn, Ruben C

    2015-01-01

    Next-generation sequencing technologies for whole-genome sequencing of mycobacteria are rapidly becoming an attractive alternative to more traditional sequencing methods. In particular this technology is proving useful for genome-wide identification of mutations in mycobacteria (comparative genomics) as well as for de novo assembly of whole genomes. Next-generation sequencing however generates a vast quantity of data that can only be transformed into a usable and comprehensible form using bioinformatics. Here we describe the methodology one would use to prepare libraries for whole-genome sequencing, and the basic bioinformatics to identify mutations in a genome following Illumina HiSeq or MiSeq sequencing, as well as de novo genome assembly following sequencing using Pacific Biosciences (PacBio).

  12. Phytozome Comparative Plant Genomics Portal

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Goodstein, David; Batra, Sajeev; Carlson, Joseph

    2014-09-09

    The Dept. of Energy Joint Genome Institute is a genomics user facility supporting DOE mission science in the areas of Bioenergy, Carbon Cycling, and Biogeochemistry. The Plant Program at the JGI applies genomic, analytical, computational and informatics platforms and methods to: 1. Understand and accelerate the improvement (domestication) of bioenergy crops 2. Characterize and moderate plant response to climate change 3. Use comparative genomics to identify constrained elements and infer gene function 4. Build high quality genomic resource platforms of JGI Plant Flagship genomes for functional and experimental work 5. Expand functional genomic resources for Plant Flagship genomes

  13. Genomic expression patterns of cardiac tissues from dogs with dilated cardiomyopathy.

    PubMed

    Oyama, Mark A; Chittur, Sridar

    2005-07-01

    To evaluate global genome expression patterns of left ventricular tissues from dogs with dilated cardiomyopathy (DCM). Tissues obtained from the left ventricle of 2 Doberman Pinschers with end-stage DCM and 5 healthy control dogs. Transcriptional activities of 23,851 canine DNA sequences were determined by use of an oligonucleotide microarray. Genome expression patterns of DCM tissue were evaluated by measuring the relative amount of complementary RNA hybridization to the microarray probes and comparing it with gene expression for tissues from 5 healthy control dogs. 478 transcripts were differentially expressed (> or = 2.5-fold change). In DCM tissue, expression of 173 transcripts was upregulated and expression of 305 transcripts was downregulated, compared with expression for control tissues. Of the 478 transcripts, 167 genes could be specifically identified. These genes were grouped into 1 of 8 categories on the basis of their primary physiologic function. Grouping revealed that pathways involving cellular energy production, signaling and communication, and cell structure were generally downregulated, whereas pathways involving cellular defense and stress responses were upregulated. Many previously unreported genes that may contribute to the pathophysiologic aspects of heart disease were identified. Evaluation of global expression patterns provides a molecular portrait of heart failure, yields insights into the pathophysiologic aspects of DCM, and identifies intriguing genes and pathways for further study.

  14. Orthology for comparative genomics in the mouse genome database.

    PubMed

    Dolan, Mary E; Baldarelli, Richard M; Bello, Susan M; Ni, Li; McAndrews, Monica S; Bult, Carol J; Kadin, James A; Richardson, Joel E; Ringwald, Martin; Eppig, Janan T; Blake, Judith A

    2015-08-01

    The mouse genome database (MGD) is the model organism database component of the mouse genome informatics system at The Jackson Laboratory. MGD is the international data resource for the laboratory mouse and facilitates the use of mice in the study of human health and disease. Since its beginnings, MGD has included comparative genomics data with a particular focus on human-mouse orthology, an essential component of the use of mouse as a model organism. Over the past 25 years, novel algorithms and addition of orthologs from other model organisms have enriched comparative genomics in MGD data, extending the use of orthology data to support the laboratory mouse as a model of human biology. Here, we describe current comparative data in MGD and review the history and refinement of orthology representation in this resource.

  15. Comparative analysis of genomics and proteomics in Bacillus thuringiensis 4.0718.

    PubMed

    Rang, Jie; He, Hao; Wang, Ting; Ding, Xuezhi; Zuo, Mingxing; Quan, Meifang; Sun, Yunjun; Yu, Ziquan; Hu, Shengbiao; Xia, Liqiu

    2015-01-01

    Bacillus thuringiensis is a widely used biopesticide that produced various insecticidal active substances during its life cycle. Separation and purification of numerous insecticide active substances have been difficult because of the relatively short half-life of such substances. On the other hand, substances can be synthetized at different times during development, so samples at different stages have to be studied, further complicating the analysis. A dual genomic and proteomic approach would enhance our ability to identify such substances, and particularily using mass spectrometry-based proteomic methods. The comparative analysis for genomic and proteomic data have showed that not all of the products deduced from the annotated genome could be identified among the proteomic data. For instance, genome annotation results showed that 39 coding sequences in the whole genome were related to insect pathogenicity, including five cry genes. However, Cry2Ab, Cry1Ia, Cytotoxin K, Bacteriocin, Exoenzyme C3 and Alveolysin could not be detected in the proteomic data obtained. The sporulation-related proteins were also compared analysis, results showed that the great majority sporulation-related proteins can be detected by mass spectrometry. This analysis revealed Spo0A~P, SigF, SigE(+), SigK(+) and SigG(+), all known to play an important role in the process of spore formation regulatory network, also were displayed in the proteomic data. Through the comparison of the two data sets, it was possible to infer that some genes were silenced or were expressed at very low levels. For instance, found that cry2Ab seems to lack a functional promoter while cry1Ia may not be expressed due to the presence of transposons. With this comparative study a relatively complete database can be constructed and used to transform hereditary material, thereby prompting the high expression of toxic proteins. A theoretical basis is provided for constructing highly virulent engineered bacteria and for

  16. Comparative genomics of xylose-fermenting fungi for enhanced biofuel production

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Wohlbach, Dana J.; Kuo, Alan; Sato, Trey K.

    Cellulosic biomass is an abundant and underused substrate for biofuel production. The inability of many microbes to metabolize the pentose sugars abundant within hemicellulose creates specific challenges for microbial biofuel production from cellulosic material. Although engineered strains of Saccharomyces cerevisiae can use the pentose xylose, the fermentative capacity pales in comparison with glucose, limiting the economic feasibility of industrial fermentations. To better understand xylose utilization for subsequent microbial engineering, we sequenced the genomes of two xylose-fermenting, beetle-associated fungi, Spathaspora passalidarum and Candida tenuis. To identify genes involved in xylose metabolism, we applied a comparative genomic approach across 14 Ascomycete genomes,more » mapping phenotypes and genotypes onto the fungal phylogeny, and measured genomic expression across five Hemiascomycete species with different xylose-consumption phenotypes. This approach implicated many genes and processes involved in xylose assimilation. Several of these genes significantly improved xylose utilization when engineered into S. cerevisiae, demonstrating the power of comparative methods in rapidly identifying genes for biomass conversion while reflecting on fungal ecology.« less

  17. Haemonchus contortus: Genome Structure, Organization and Comparative Genomics.

    PubMed

    Laing, R; Martinelli, A; Tracey, A; Holroyd, N; Gilleard, J S; Cotton, J A

    2016-01-01

    One of the first genome sequencing projects for a parasitic nematode was that for Haemonchus contortus. The open access data from the Wellcome Trust Sanger Institute provided a valuable early resource for the research community, particularly for the identification of specific genes and genetic markers. Later, a second sequencing project was initiated by the University of Melbourne, and the two draft genome sequences for H. contortus were published back-to-back in 2013. There is a pressing need for long-range genomic information for genetic mapping, population genetics and functional genomic studies, so we are continuing to improve the Wellcome Trust Sanger Institute assembly to provide a finished reference genome for H. contortus. This review describes this process, compares the H. contortus genome assemblies with draft genomes from other members of the strongylid group and discusses future directions for parasite genomics using the H. contortus model. Copyright © 2016 Elsevier Ltd. All rights reserved.

  18. GenColors-based comparative genome databases for small eukaryotic genomes.

    PubMed

    Felder, Marius; Romualdi, Alessandro; Petzold, Andreas; Platzer, Matthias; Sühnel, Jürgen; Glöckner, Gernot

    2013-01-01

    Many sequence data repositories can give a quick and easily accessible overview on genomes and their annotations. Less widespread is the possibility to compare related genomes with each other in a common database environment. We have previously described the GenColors database system (http://gencolors.fli-leibniz.de) and its applications to a number of bacterial genomes such as Borrelia, Legionella, Leptospira and Treponema. This system has an emphasis on genome comparison. It combines data from related genomes and provides the user with an extensive set of visualization and analysis tools. Eukaryote genomes are normally larger than prokaryote genomes and thus pose additional challenges for such a system. We have, therefore, adapted GenColors to also handle larger datasets of small eukaryotic genomes and to display eukaryotic gene structures. Further recent developments include whole genome views, genome list options and, for bacterial genome browsers, the display of horizontal gene transfer predictions. Two new GenColors-based databases for two fungal species (http://fgb.fli-leibniz.de) and for four social amoebas (http://sacgb.fli-leibniz.de) were set up. Both new resources open up a single entry point for related genomes for the amoebozoa and fungal research communities and other interested users. Comparative genomics approaches are greatly facilitated by these resources.

  19. Comparative Reannotation of 21 Aspergillus Genomes

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Salamov, Asaf; Riley, Robert; Kuo, Alan

    2013-03-08

    We used comparative gene modeling to reannotate 21 Aspergillus genomes. Initial automatic annotation of individual genomes may contain some errors of different nature, e.g. missing genes, incorrect exon-intron structures, 'chimeras', which fuse 2 or more real genes or alternatively splitting some real genes into 2 or more models. The main premise behind the comparative modeling approach is that for closely related genomes most orthologous families have the same conserved gene structure. The algorithm maps all gene models predicted in each individual Aspergillus genome to the other genomes and, for each locus, selects from potentially many competing models, the one whichmore » most closely resembles the orthologous genes from other genomes. This procedure is iterated until no further change in gene models is observed. For Aspergillus genomes we predicted in total 4503 new gene models ( ~;;2percent per genome), supported by comparative analysis, additionally correcting ~;;18percent of old gene models. This resulted in a total of 4065 more genes with annotated PFAM domains (~;;3percent increase per genome). Analysis of a few genomes with EST/transcriptomics data shows that the new annotation sets also have a higher number of EST-supported splice sites at exon-intron boundaries.« less

  20. Genome engineering and gene expression control for bacterial strain development.

    PubMed

    Song, Chan Woo; Lee, Joungmin; Lee, Sang Yup

    2015-01-01

    In recent years, a number of techniques and tools have been developed for genome engineering and gene expression control to achieve desired phenotypes of various bacteria. Here we review and discuss the recent advances in bacterial genome manipulation and gene expression control techniques, and their actual uses with accompanying examples. Genome engineering has been commonly performed based on homologous recombination. During such genome manipulation, the counterselection systems employing SacB or nucleases have mainly been used for the efficient selection of desired engineered strains. The recombineering technology enables simple and more rapid manipulation of the bacterial genome. The group II intron-mediated genome engineering technology is another option for some bacteria that are difficult to be engineered by homologous recombination. Due to the increasing demands on high-throughput screening of bacterial strains having the desired phenotypes, several multiplex genome engineering techniques have recently been developed and validated in some bacteria. Another approach to achieve desired bacterial phenotypes is the repression of target gene expression without the modification of genome sequences. This can be performed by expressing antisense RNA, small regulatory RNA, or CRISPR RNA to repress target gene expression at the transcriptional or translational level. All of these techniques allow efficient and rapid development and screening of bacterial strains having desired phenotypes, and more advanced techniques are expected to be seen. Copyright © 2014 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  1. Comparative genomic hybridization.

    PubMed

    Pinkel, Daniel; Albertson, Donna G

    2005-01-01

    Altering DNA copy number is one of the many ways that gene expression and function may be modified. Some variations are found among normal individuals ( 14, 35, 103 ), others occur in the course of normal processes in some species ( 33 ), and still others participate in causing various disease states. For example, many defects in human development are due to gains and losses of chromosomes and chromosomal segments that occur prior to or shortly after fertilization, whereas DNA dosage alterations that occur in somatic cells are frequent contributors to cancer. Detecting these aberrations, and interpreting them within the context of broader knowledge, facilitates identification of critical genes and pathways involved in biological processes and diseases, and provides clinically relevant information. Over the past several years array comparative genomic hybridization (array CGH) has demonstrated its value for analyzing DNA copy number variations. In this review we discuss the state of the art of array CGH and its applications in medical genetics and cancer, emphasizing general concepts rather than specific results.

  2. A universal genomic coordinate translator for comparative genomics

    PubMed Central

    2014-01-01

    expression levels across species. Conclusions Kraken is a computational genome coordinate translator that facilitates cross-species comparisons, distinguishes orthologs from paralogs, and does not require costly all-to-all whole genome mappings. Kraken is freely available under LPGL from http://github.com/nedaz/kraken. PMID:24976580

  3. A universal genomic coordinate translator for comparative genomics.

    PubMed

    Zamani, Neda; Sundström, Görel; Meadows, Jennifer R S; Höppner, Marc P; Dainat, Jacques; Lantz, Henrik; Haas, Brian J; Grabherr, Manfred G

    2014-06-30

    Genomic duplications constitute major events in the evolution of species, allowing paralogous copies of genes to take on fine-tuned biological roles. Unambiguously identifying the orthology relationship between copies across multiple genomes can be resolved by synteny, i.e. the conserved order of genomic sequences. However, a comprehensive analysis of duplication events and their contributions to evolution would require all-to-all genome alignments, which increases at N2 with the number of available genomes, N. Here, we introduce Kraken, software that omits the all-to-all requirement by recursively traversing a graph of pairwise alignments and dynamically re-computing orthology. Kraken scales linearly with the number of targeted genomes, N, which allows for including large numbers of genomes in analyses. We first evaluated the method on the set of 12 Drosophila genomes, finding that orthologous correspondence computed indirectly through a graph of multiple synteny maps comes at minimal cost in terms of sensitivity, but reduces overall computational runtime by an order of magnitude. We then used the method on three well-annotated mammalian genomes, human, mouse, and rat, and show that up to 93% of protein coding transcripts have unambiguous pairwise orthologous relationships across the genomes. On a nucleotide level, 70 to 83% of exons match exactly at both splice junctions, and up to 97% on at least one junction. We last applied Kraken to an RNA-sequencing dataset from multiple vertebrates and diverse tissues, where we confirmed that brain-specific gene family members, i.e. one-to-many or many-to-many homologs, are more highly correlated across species than single-copy (i.e. one-to-one homologous) genes. Not limited to protein coding genes, Kraken also identifies thousands of newly identified transcribed loci, likely non-coding RNAs that are consistently transcribed in human, chimpanzee and gorilla, and maintain significant correlation of expression levels across

  4. Genomic Sequence around Butterfly Wing Development Genes: Annotation and Comparative Analysis

    PubMed Central

    Conceição, Inês C.; Long, Anthony D.; Gruber, Jonathan D.; Beldade, Patrícia

    2011-01-01

    Background Analysis of genomic sequence allows characterization of genome content and organization, and access beyond gene-coding regions for identification of functional elements. BAC libraries, where relatively large genomic regions are made readily available, are especially useful for species without a fully sequenced genome and can increase genomic coverage of phylogenetic and biological diversity. For example, no butterfly genome is yet available despite the unique genetic and biological properties of this group, such as diversified wing color patterns. The evolution and development of these patterns is being studied in a few target species, including Bicyclus anynana, where a whole-genome BAC library allows targeted access to large genomic regions. Methodology/Principal Findings We characterize ∼1.3 Mb of genomic sequence around 11 selected genes expressed in B. anynana developing wings. Extensive manual curation of in silico predictions, also making use of a large dataset of expressed genes for this species, identified repetitive elements and protein coding sequence, and highlighted an expansion of Alcohol dehydrogenase genes. Comparative analysis with orthologous regions of the lepidopteran reference genome allowed assessment of conservation of fine-scale synteny (with detection of new inversions and translocations) and of DNA sequence (with detection of high levels of conservation of non-coding regions around some, but not all, developmental genes). Conclusions The general properties and organization of the available B. anynana genomic sequence are similar to the lepidopteran reference, despite the more than 140 MY divergence. Our results lay the groundwork for further studies of new interesting findings in relation to both coding and non-coding sequence: 1) the Alcohol dehydrogenase expansion with higher similarity between the five tandemly-repeated B. anynana paralogs than with the corresponding B. mori orthologs, and 2) the high conservation of non

  5. GenPlay Multi-Genome, a tool to compare and analyze multiple human genomes in a graphical interface.

    PubMed

    Lajugie, Julien; Fourel, Nicolas; Bouhassira, Eric E

    2015-01-01

    Parallel visualization of multiple individual human genomes is a complex endeavor that is rapidly gaining importance with the increasing number of personal, phased and cancer genomes that are being generated. It requires the display of variants such as SNPs, indels and structural variants that are unique to specific genomes and the introduction of multiple overlapping gaps in the reference sequence. Here, we describe GenPlay Multi-Genome, an application specifically written to visualize and analyze multiple human genomes in parallel. GenPlay Multi-Genome is ideally suited for the comparison of allele-specific expression and functional genomic data obtained from multiple phased genomes in a graphical interface with access to multiple-track operation. It also allows the analysis of data that have been aligned to custom genomes rather than to a standard reference and can be used as a variant calling format file browser and as a tool to compare different genome assembly, such as hg19 and hg38. GenPlay is available under the GNU public license (GPL-3) from http://genplay.einstein.yu.edu. The source code is available at https://github.com/JulienLajugie/GenPlay. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  6. Plant Comparative and Functional Genomics

    DOE PAGES

    Yang, Xiaohan; Leebens-Mack, Jim; Chen, Feng; ...

    2015-01-01

    Plants form the foundation for our global ecosystem and are essential for environmental and human health. An increasing number of available plant genomes and tractable experimental systems, comparative and functional plant genomics research is greatly expanding our knowledge of the molecular basis of economically and nutritionally important traits in crop plants. Inferences drawn from comparative genomics are motivating experimental investigations of gene function and gene interactions. In this special issue aims to highlight recent advances made in comparative and functional genomics research in plants. Nine original research articles in this special issue cover five important topics: (1) transcription factor genemore » families relevant to abiotic stress tolerance; (2) plant secondary metabolism; (3) transcriptomebased markers for quantitative trait locus; (4) epigenetic modifications in plant-microbe interactions; and (5) computational prediction of protein-protein interactions. Finally, we studied the plant species in these articles which include model species as well as nonmodel plant species of economic importance (e.g., food crops and medicinal plants).« less

  7. Plant Comparative and Functional Genomics

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Yang, Xiaohan; Leebens-Mack, Jim; Chen, Feng

    Plants form the foundation for our global ecosystem and are essential for environmental and human health. An increasing number of available plant genomes and tractable experimental systems, comparative and functional plant genomics research is greatly expanding our knowledge of the molecular basis of economically and nutritionally important traits in crop plants. Inferences drawn from comparative genomics are motivating experimental investigations of gene function and gene interactions. In this special issue aims to highlight recent advances made in comparative and functional genomics research in plants. Nine original research articles in this special issue cover five important topics: (1) transcription factor genemore » families relevant to abiotic stress tolerance; (2) plant secondary metabolism; (3) transcriptomebased markers for quantitative trait locus; (4) epigenetic modifications in plant-microbe interactions; and (5) computational prediction of protein-protein interactions. Finally, we studied the plant species in these articles which include model species as well as nonmodel plant species of economic importance (e.g., food crops and medicinal plants).« less

  8. Comparative primate genomics: emerging patterns of genome content and dynamics

    PubMed Central

    Rogers, Jeffrey; Gibbs, Richard A.

    2014-01-01

    Preface Advances in genome sequencing technologies have created new opportunities for comparative primate genomics. Genome assemblies have been published for several primates, with analyses of several others underway. Whole genome assemblies for the great apes provide remarkable new information about the evolutionary origins of the human genome and the processes involved. Genomic data for macaques and other nonhuman primates provide valuable insight into genetic similarities and differences among species used as models for disease-related research. This review summarizes current knowledge regarding primate genome content and dynamics and offers a series of goals for the near future. PMID:24709753

  9. Comparative primate genomics: emerging patterns of genome content and dynamics.

    PubMed

    Rogers, Jeffrey; Gibbs, Richard A

    2014-05-01

    Advances in genome sequencing technologies have created new opportunities for comparative primate genomics. Genome assemblies have been published for various primate species, and analyses of several others are underway. Whole-genome assemblies for the great apes provide remarkable new information about the evolutionary origins of the human genome and the processes involved. Genomic data for macaques and other non-human primates offer valuable insights into genetic similarities and differences among species that are used as models for disease-related research. This Review summarizes current knowledge regarding primate genome content and dynamics, and proposes a series of goals for the near future.

  10. Morphological, Genome and Gene Expression Changes in Newly Induced Autopolyploid Chrysanthemum lavandulifolium (Fisch. ex Trautv.) Makino.

    PubMed

    Gao, Ri; Wang, Haibin; Dong, Bin; Yang, Xiaodong; Chen, Sumei; Jiang, Jiafu; Zhang, Zhaohe; Liu, Chen; Zhao, Nan; Chen, Fadi

    2016-10-09

    Autopolyploidy is widespread in higher plants and plays an important role in the process of evolution. The present study successfully induced autotetraploidys from Chrysanthemum lavandulifolium by colchicine. The plant morphology, genomic, transcriptomic, and epigenetic changes between tetraploid and diploid plants were investigated. Ligulate flower, tubular flower and leaves of tetraploid plants were greater than those of the diploid plants. Compared with diploid plants, the genome changed as a consequence of polyploidization in tetraploid plants, namely, 1.1% lost fragments and 1.6% novel fragments occurred. In addition, DNA methylation increased after genome doubling in tetraploid plants. Among 485 common transcript-derived fragments (TDFs), which existed in tetraploid and diploid progenitors, 62 fragments were detected as differentially expressed TDFs, 6.8% of TDFs exhibited up-regulated gene expression in the tetraploid plants and 6.0% exhibited down-regulation. The present study provides a reference for further studying the autopolyploidization role in the evolution of C. lavandulifolium. In conclusion, the autopolyploid C. lavandulifolium showed a global change in morphology, genome and gene expression compared with corresponding diploid.

  11. The eastern oyster genome: A resource for comparative genomics in shellfish aquaculture species

    USDA-ARS?s Scientific Manuscript database

    Oyster aquaculture is an important sector of world food production. As such, it is imperative to develop a high quality reference genome for the eastern oyster, Crassostrea virginica, to assist in the elucidation of the genomic basis of commercially important traits. All genetic, gene expression and...

  12. Refined annotation and assembly of the Tetrahymena thermophila genome sequence through EST analysis, comparative genomic hybridization, and targeted gap closure

    PubMed Central

    Coyne, Robert S; Thiagarajan, Mathangi; Jones, Kristie M; Wortman, Jennifer R; Tallon, Luke J; Haas, Brian J; Cassidy-Hanley, Donna M; Wiley, Emily A; Smith, Joshua J; Collins, Kathleen; Lee, Suzanne R; Couvillion, Mary T; Liu, Yifan; Garg, Jyoti; Pearlman, Ronald E; Hamilton, Eileen P; Orias, Eduardo; Eisen, Jonathan A; Methé, Barbara A

    2008-01-01

    Background Tetrahymena thermophila, a widely studied model for cellular and molecular biology, is a binucleated single-celled organism with a germline micronucleus (MIC) and somatic macronucleus (MAC). The recent draft MAC genome assembly revealed low sequence repetitiveness, a result of the epigenetic removal of invasive DNA elements found only in the MIC genome. Such low repetitiveness makes complete closure of the MAC genome a feasible goal, which to achieve would require standard closure methods as well as removal of minor MIC contamination of the MAC genome assembly. Highly accurate preliminary annotation of Tetrahymena's coding potential was hindered by the lack of both comparative genomic sequence information from close relatives and significant amounts of cDNA evidence, thus limiting the value of the genomic information and also leaving unanswered certain questions, such as the frequency of alternative splicing. Results We addressed the problem of MIC contamination using comparative genomic hybridization with purified MIC and MAC DNA probes against a whole genome oligonucleotide microarray, allowing the identification of 763 genome scaffolds likely to contain MIC-limited DNA sequences. We also employed standard genome closure methods to essentially finish over 60% of the MAC genome. For the improvement of annotation, we have sequenced and analyzed over 60,000 verified EST reads from a variety of cellular growth and development conditions. Using this EST evidence, a combination of automated and manual reannotation efforts led to updates that affect 16% of the current protein-coding gene models. By comparing EST abundance, many genes showing apparent differential expression between these conditions were identified. Rare instances of alternative splicing and uses of the non-standard amino acid selenocysteine were also identified. Conclusion We report here significant progress in genome closure and reannotation of Tetrahymena thermophila. Our experience to date

  13. CloVR-Comparative: automated, cloud-enabled comparative microbial genome sequence analysis pipeline.

    PubMed

    Agrawal, Sonia; Arze, Cesar; Adkins, Ricky S; Crabtree, Jonathan; Riley, David; Vangala, Mahesh; Galens, Kevin; Fraser, Claire M; Tettelin, Hervé; White, Owen; Angiuoli, Samuel V; Mahurkar, Anup; Fricke, W Florian

    2017-04-27

    The benefit of increasing genomic sequence data to the scientific community depends on easy-to-use, scalable bioinformatics support. CloVR-Comparative combines commonly used bioinformatics tools into an intuitive, automated, and cloud-enabled analysis pipeline for comparative microbial genomics. CloVR-Comparative runs on annotated complete or draft genome sequences that are uploaded by the user or selected via a taxonomic tree-based user interface and downloaded from NCBI. CloVR-Comparative runs reference-free multiple whole-genome alignments to determine unique, shared and core coding sequences (CDSs) and single nucleotide polymorphisms (SNPs). Output includes short summary reports and detailed text-based results files, graphical visualizations (phylogenetic trees, circular figures), and a database file linked to the Sybil comparative genome browser. Data up- and download, pipeline configuration and monitoring, and access to Sybil are managed through CloVR-Comparative web interface. CloVR-Comparative and Sybil are distributed as part of the CloVR virtual appliance, which runs on local computers or the Amazon EC2 cloud. Representative datasets (e.g. 40 draft and complete Escherichia coli genomes) are processed in <36 h on a local desktop or at a cost of <$20 on EC2. CloVR-Comparative allows anybody with Internet access to run comparative genomics projects, while eliminating the need for on-site computational resources and expertise.

  14. Comparative Genomics of the Cucurbitaceae

    USDA-ARS?s Scientific Manuscript database

    The genome size for watermelon, melon, cucumber, and pumpkin is 425, 454, 367, and 502 Mbp, respectively, and considered medium size as compared with most other crops. Whole-genome duplication is common in angiosperm plants. Research has revealed a paleohexaploidy (') event in the common ancestor of...

  15. Ebolavirus comparative genomics

    PubMed Central

    Jun, Se-Ran; Leuze, Michael R.; Nookaew, Intawat; Uberbacher, Edward C.; Land, Miriam; Zhang, Qian; Wanchai, Visanu; Chai, Juanjuan; Nielsen, Morten; Trolle, Thomas; Lund, Ole; Buzard, Gregory S.; Pedersen, Thomas D.; Wassenaar, Trudy M.; Ussery, David W.

    2015-01-01

    The 2014 Ebola outbreak in West Africa is the largest documented for this virus. To examine the dynamics of this genome, we compare more than 100 currently available ebolavirus genomes to each other and to other viral genomes. Based on oligomer frequency analysis, the family Filoviridae forms a distinct group from all other sequenced viral genomes. All filovirus genomes sequenced to date encode proteins with similar functions and gene order, although there is considerable divergence in sequences between the three genera Ebolavirus, Cuevavirus and Marburgvirus within the family Filoviridae. Whereas all ebolavirus genomes are quite similar (multiple sequences of the same strain are often identical), variation is most common in the intergenic regions and within specific areas of the genes encoding the glycoprotein (GP), nucleoprotein (NP) and polymerase (L). We predict regions that could contain epitope-binding sites, which might be good vaccine targets. This information, combined with glycosylation sites and experimentally determined epitopes, can identify the most promising regions for the development of therapeutic strategies. This manuscript has been authored by UT-Battelle, LLC under Contract No. DE-AC05-00OR22725 with the U.S. Department of Energy. The United States Government retains and the publisher, by accepting the article for publication, acknowledges that the United States Government retains a non-exclusive, paid-up, irrevocable, world-wide license to publish or reproduce the published form of this manuscript, or allow others to do so, for United States Government purposes. The Department of Energy will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan (http://energy.gov/downloads/doe-public-access-plan). PMID:26175035

  16. Gene discovery in the hamster: a comparative genomics approach for gene annotation by sequencing of hamster testis cDNAs

    PubMed Central

    Oduru, Sreedhar; Campbell, Janee L; Karri, SriTulasi; Hendry, William J; Khan, Shafiq A; Williams, Simon C

    2003-01-01

    Background Complete genome annotation will likely be achieved through a combination of computer-based analysis of available genome sequences combined with direct experimental characterization of expressed regions of individual genomes. We have utilized a comparative genomics approach involving the sequencing of randomly selected hamster testis cDNAs to begin to identify genes not previously annotated on the human, mouse, rat and Fugu (pufferfish) genomes. Results 735 distinct sequences were analyzed for their relatedness to known sequences in public databases. Eight of these sequences were derived from previously unidentified genes and expression of these genes in testis was confirmed by Northern blotting. The genomic locations of each sequence were mapped in human, mouse, rat and pufferfish, where applicable, and the structure of their cognate genes was derived using computer-based predictions, genomic comparisons and analysis of uncharacterized cDNA sequences from human and macaque. Conclusion The use of a comparative genomics approach resulted in the identification of eight cDNAs that correspond to previously uncharacterized genes in the human genome. The proteins encoded by these genes included a new member of the kinesin superfamily, a SET/MYND-domain protein, and six proteins for which no specific function could be predicted. Each gene was expressed primarily in testis, suggesting that they may play roles in the development and/or function of testicular cells. PMID:12783626

  17. Complete Genome Sequence and Comparative Genomics of a Novel Myxobacterium Myxococcus hansupus

    PubMed Central

    Sharma, Gaurav; Narwani, Tarun; Subramanian, Srikrishna

    2016-01-01

    Myxobacteria, a group of Gram-negative aerobes, belong to the class δ-proteobacteria and order Myxococcales. Unlike anaerobic δ-proteobacteria, they exhibit several unusual physiogenomic properties like gliding motility, desiccation-resistant myxospores and large genomes with high coding density. Here we report a 9.5 Mbp complete genome of Myxococcus hansupus that encodes 7,753 proteins. Phylogenomic and genome-genome distance based analysis suggest that Myxococcus hansupus is a novel member of the genus Myxococcus. Comparative genome analysis with other members of the genus Myxococcus was performed to explore their genome diversity. The variation in number of unique proteins observed across different species is suggestive of diversity at the genus level while the overrepresentation of several Pfam families indicates the extent and mode of genome expansion as compared to non-Myxococcales δ-proteobacteria. PMID:26900859

  18. Ensembl 2002: accommodating comparative genomics.

    PubMed

    Clamp, M; Andrews, D; Barker, D; Bevan, P; Cameron, G; Chen, Y; Clark, L; Cox, T; Cuff, J; Curwen, V; Down, T; Durbin, R; Eyras, E; Gilbert, J; Hammond, M; Hubbard, T; Kasprzyk, A; Keefe, D; Lehvaslaiho, H; Iyer, V; Melsopp, C; Mongin, E; Pettett, R; Potter, S; Rust, A; Schmidt, E; Searle, S; Slater, G; Smith, J; Spooner, W; Stabenau, A; Stalker, J; Stupka, E; Ureta-Vidal, A; Vastrik, I; Birney, E

    2003-01-01

    The Ensembl (http://www.ensembl.org/) database project provides a bioinformatics framework to organise biology around the sequences of large genomes. It is a comprehensive source of stable automatic annotation of human, mouse and other genome sequences, available as either an interactive web site or as flat files. Ensembl also integrates manually annotated gene structures from external sources where available. As well as being one of the leading sources of genome annotation, Ensembl is an open source software engineering project to develop a portable system able to handle very large genomes and associated requirements. These range from sequence analysis to data storage and visualisation and installations exist around the world in both companies and at academic sites. With both human and mouse genome sequences available and more vertebrate sequences to follow, many of the recent developments in Ensembl have focusing on developing automatic comparative genome analysis and visualisation.

  19. Comparative genomic data of the Avian Phylogenomics Project.

    PubMed

    Zhang, Guojie; Li, Bo; Li, Cai; Gilbert, M Thomas P; Jarvis, Erich D; Wang, Jun

    2014-01-01

    The evolutionary relationships of modern birds are among the most challenging to understand in systematic biology and have been debated for centuries. To address this challenge, we assembled or collected the genomes of 48 avian species spanning most orders of birds, including all Neognathae and two of the five Palaeognathae orders, and used the genomes to construct a genome-scale avian phylogenetic tree and perform comparative genomics analyses (Jarvis et al. in press; Zhang et al. in press). Here we release assemblies and datasets associated with the comparative genome analyses, which include 38 newly sequenced avian genomes plus previously released or simultaneously released genomes of Chicken, Zebra finch, Turkey, Pigeon, Peregrine falcon, Duck, Budgerigar, Adelie penguin, Emperor penguin and the Medium Ground Finch. We hope that this resource will serve future efforts in phylogenomics and comparative genomics. The 38 bird genomes were sequenced using the Illumina HiSeq 2000 platform and assembled using a whole genome shotgun strategy. The 48 genomes were categorized into two groups according to the N50 scaffold size of the assemblies: a high depth group comprising 23 species sequenced at high coverage (>50X) with multiple insert size libraries resulting in N50 scaffold sizes greater than 1 Mb (except the White-throated Tinamou and Bald Eagle); and a low depth group comprising 25 species sequenced at a low coverage (~30X) with two insert size libraries resulting in an average N50 scaffold size of about 50 kb. Repetitive elements comprised 4%-22% of the bird genomes. The assembled scaffolds allowed the homology-based annotation of 13,000 ~ 17000 protein coding genes in each avian genome relative to chicken, zebra finch and human, as well as comparative and sequence conservation analyses. Here we release full genome assemblies of 38 newly sequenced avian species, link genome assembly downloads for the 7 of the remaining 10 species, and provide a guideline of

  20. Maintenance and expression of the S. cerevisiae mitochondrial genome--from genetics to evolution and systems biology.

    PubMed

    Lipinski, Kamil A; Kaniak-Golik, Aneta; Golik, Pawel

    2010-01-01

    As a legacy of their endosymbiotic eubacterial origin, mitochondria possess a residual genome, encoding only a few proteins and dependent on a variety of factors encoded by the nuclear genome for its maintenance and expression. As a facultative anaerobe with well understood genetics and molecular biology, Saccharomyces cerevisiae is the model system of choice for studying nucleo-mitochondrial genetic interactions. Maintenance of the mitochondrial genome is controlled by a set of nuclear-coded factors forming intricately interconnected circuits responsible for replication, recombination, repair and transmission to buds. Expression of the yeast mitochondrial genome is regulated mostly at the post-transcriptional level, and involves many general and gene-specific factors regulating splicing, RNA processing and stability and translation. A very interesting aspect of the yeast mitochondrial system is the relationship between genome maintenance and gene expression. Deletions of genes involved in many different aspects of mitochondrial gene expression, notably translation, result in an irreversible loss of functional mtDNA. The mitochondrial genetic system viewed from the systems biology perspective is therefore very fragile and lacks robustness compared to the remaining systems of the cell. This lack of robustness could be a legacy of the reductive evolution of the mitochondrial genome, but explanations involving selective advantages of increased evolvability have also been postulated. Copyright © 2009 Elsevier B.V. All rights reserved.

  1. Schistosoma comparative genomics: integrating genome structure, parasite biology and anthelmintic discovery

    PubMed Central

    Swain, Martin T.; Larkin, Denis M.; Caffrey, Conor R.; Davies, Stephen J.; Loukas, Alex; Skelly, Patrick J.; Hoffmann, Karl F.

    2011-01-01

    Schistosoma genomes provide a comprehensive resource for identifying the molecular processes that shape parasite evolution and for discovering novel chemotherapeutic or immunoprophylactic targets. Here, we demonstrate how intra- and intergenus comparative genomics can be used to drive these investigations forward, illustrate the advantages and limitations of these approaches and review how post genomic technologies offer complementary strategies for genome characterisation. While sequencing and functional characterisation of other schistosome/platyhelminth genomes continues to expedite anthelmintic discovery, we contend that future priorities should equally focus on improving assembly quality, and chromosomal assignment, of existing schistosome/platyhelminth genomes. PMID:22024648

  2. MicroScope: a platform for microbial genome annotation and comparative genomics

    PubMed Central

    Vallenet, D.; Engelen, S.; Mornico, D.; Cruveiller, S.; Fleury, L.; Lajus, A.; Rouy, Z.; Roche, D.; Salvignol, G.; Scarpelli, C.; Médigue, C.

    2009-01-01

    The initial outcome of genome sequencing is the creation of long text strings written in a four letter alphabet. The role of in silico sequence analysis is to assist biologists in the act of associating biological knowledge with these sequences, allowing investigators to make inferences and predictions that can be tested experimentally. A wide variety of software is available to the scientific community, and can be used to identify genomic objects, before predicting their biological functions. However, only a limited number of biologically interesting features can be revealed from an isolated sequence. Comparative genomics tools, on the other hand, by bringing together the information contained in numerous genomes simultaneously, allow annotators to make inferences based on the idea that evolution and natural selection are central to the definition of all biological processes. We have developed the MicroScope platform in order to offer a web-based framework for the systematic and efficient revision of microbial genome annotation and comparative analysis (http://www.genoscope.cns.fr/agc/microscope). Starting with the description of the flow chart of the annotation processes implemented in the MicroScope pipeline, and the development of traditional and novel microbial annotation and comparative analysis tools, this article emphasizes the essential role of expert annotation as a complement of automatic annotation. Several examples illustrate the use of implemented tools for the review and curation of annotations of both new and publicly available microbial genomes within MicroScope’s rich integrated genome framework. The platform is used as a viewer in order to browse updated annotation information of available microbial genomes (more than 440 organisms to date), and in the context of new annotation projects (117 bacterial genomes). The human expertise gathered in the MicroScope database (about 280,000 independent annotations) contributes to improve the quality of

  3. MicroScope: a platform for microbial genome annotation and comparative genomics.

    PubMed

    Vallenet, D; Engelen, S; Mornico, D; Cruveiller, S; Fleury, L; Lajus, A; Rouy, Z; Roche, D; Salvignol, G; Scarpelli, C; Médigue, C

    2009-01-01

    The initial outcome of genome sequencing is the creation of long text strings written in a four letter alphabet. The role of in silico sequence analysis is to assist biologists in the act of associating biological knowledge with these sequences, allowing investigators to make inferences and predictions that can be tested experimentally. A wide variety of software is available to the scientific community, and can be used to identify genomic objects, before predicting their biological functions. However, only a limited number of biologically interesting features can be revealed from an isolated sequence. Comparative genomics tools, on the other hand, by bringing together the information contained in numerous genomes simultaneously, allow annotators to make inferences based on the idea that evolution and natural selection are central to the definition of all biological processes. We have developed the MicroScope platform in order to offer a web-based framework for the systematic and efficient revision of microbial genome annotation and comparative analysis (http://www.genoscope.cns.fr/agc/microscope). Starting with the description of the flow chart of the annotation processes implemented in the MicroScope pipeline, and the development of traditional and novel microbial annotation and comparative analysis tools, this article emphasizes the essential role of expert annotation as a complement of automatic annotation. Several examples illustrate the use of implemented tools for the review and curation of annotations of both new and publicly available microbial genomes within MicroScope's rich integrated genome framework. The platform is used as a viewer in order to browse updated annotation information of available microbial genomes (more than 440 organisms to date), and in the context of new annotation projects (117 bacterial genomes). The human expertise gathered in the MicroScope database (about 280,000 independent annotations) contributes to improve the quality of

  4. Genome-Wide Expression Profiling of Complex Regional Pain Syndrome

    PubMed Central

    Jin, Eun-Heui; Zhang, Enji; Ko, Youngkwon; Sim, Woo Seog; Moon, Dong Eon; Yoon, Keon Jung; Hong, Jang Hee; Lee, Won Hyung

    2013-01-01

    Complex regional pain syndrome (CRPS) is a chronic, progressive, and devastating pain syndrome characterized by spontaneous pain, hyperalgesia, allodynia, altered skin temperature, and motor dysfunction. Although previous gene expression profiling studies have been conducted in animal pain models, there genome-wide expression profiling in the whole blood of CRPS patients has not been reported yet. Here, we successfully identified certain pain-related genes through genome-wide expression profiling in the blood from CRPS patients. We found that 80 genes were differentially expressed between 4 CRPS patients (2 CRPS I and 2 CRPS II) and 5 controls (cut-off value: 1.5-fold change and p<0.05). Most of those genes were associated with signal transduction, developmental processes, cell structure and motility, and immunity and defense. The expression levels of major histocompatibility complex class I A subtype (HLA-A29.1), matrix metalloproteinase 9 (MMP9), alanine aminopeptidase N (ANPEP), l-histidine decarboxylase (HDC), granulocyte colony-stimulating factor 3 receptor (G-CSF3R), and signal transducer and activator of transcription 3 (STAT3) genes selected from the microarray were confirmed in 24 CRPS patients and 18 controls by quantitative reverse transcription-polymerase chain reaction (qRT-PCR). We focused on the MMP9 gene that, by qRT-PCR, showed a statistically significant difference in expression in CRPS patients compared to controls with the highest relative fold change (4.0±1.23 times and p = 1.4×10−4). The up-regulation of MMP9 gene in the blood may be related to the pain progression in CRPS patients. Our findings, which offer a valuable contribution to the understanding of the differential gene expression in CRPS may help in the understanding of the pathophysiology of CRPS pain progression. PMID:24244504

  5. Comparative Genome Analysis of Enterobacter cloacae

    PubMed Central

    Liu, Wing-Yee; Wong, Chi-Fat; Chung, Karl Ming-Kar; Jiang, Jing-Wei; Leung, Frederick Chi-Ching

    2013-01-01

    The Enterobacter cloacae species includes an extremely diverse group of bacteria that are associated with plants, soil and humans. Publication of the complete genome sequence of the plant growth-promoting endophytic E. cloacae subsp. cloacae ENHKU01 provided an opportunity to perform the first comparative genome analysis between strains of this dynamic species. Examination of the pan-genome of E. cloacae showed that the conserved core genome retains the general physiological and survival genes of the species, while genomic factors in plasmids and variable regions determine the virulence of the human pathogenic E. cloacae strain; additionally, the diversity of fimbriae contributes to variation in colonization and host determination of different E. cloacae strains. Comparative genome analysis further illustrated that E. cloacae strains possess multiple mechanisms for antagonistic action against other microorganisms, which involve the production of siderophores and various antimicrobial compounds, such as bacteriocins, chitinases and antibiotic resistance proteins. The presence of Type VI secretion systems is expected to provide further fitness advantages for E. cloacae in microbial competition, thus allowing it to survive in different environments. Competition assays were performed to support our observations in genomic analysis, where E. cloacae subsp. cloacae ENHKU01 demonstrated antagonistic activities against a wide range of plant pathogenic fungal and bacterial species. PMID:24069314

  6. Genome wide identification, phylogeny, and expression of bone morphogenetic protein genes in tetraploidized common carp (Cyprinus carpio).

    PubMed

    Chen, Lin; Dong, Chuanju; Kong, Shengnan; Zhang, Jiangfan; Li, Xuejun; Xu, Peng

    2017-09-05

    Bone morphogenetic proteins (Bmps) are a group of signaling molecules known to play important roles during formation and maintenance of various organs, not only bone, but also muscle, blood and so on. Common carp (Cyprinus carpio) is one of the most intensively studied fish due to its economic and environmental importance. Besides, common carp has encountered an additional round of whole genome duplication (WGD) compared with many closely related diploid teleost, which make it one of the most important models for genome evolutionary studies in teleost. Comprehensive genome resources of common carp have been developed recently, which facilitate the thorough characterization of bmp gene family in the tetraploidized common carp genome. We identified a total of 44 bmps from the common carp genome, which are twice as many as that of zebrafish. Phylogenetic analysis revealed that most of bmps are highly conserved. Comparative analysis was performed across six typical vertebrate genomes. It appeared that all the bmp genes in common carp were duplicated. Obviously, the expansion of the bmp gene family in common carp was due to the latest additional round of whole genome duplication and made it more abundant than other diploid teleosts. Expression signatures were assessed in major tissues, including gill, intestine, liver, spleen, skin, heart, gonad, muscle, kidney, head kidney, brain and blood, which demonstrated the comprehensive expression profiles of bmp genes in the tetraploidized genome. Significant gene expression divergences were observed which revealed substantial functional divergences of those duplicated bmp genes post the latest WGD event. The conserved synteny blocks of bmp5s revealed the genome rearrangement of common carp post the 4R WGD. The whole set of bmp gene family in common carp provides insight into gene fate of tetraploidized common carp genome post recent WGD. Copyright © 2017. Published by Elsevier B.V.

  7. Unclassified renal cell carcinoma: a clinicopathological, comparative genomic hybridization, and whole-genome exon sequencing study.

    PubMed

    Hu, Zhen-Yan; Pang, Li-Juan; Qi, Yan; Kang, Xue-Ling; Hu, Jian-Ming; Wang, Lianghai; Liu, Kun-Peng; Ren, Yuan; Cui, Mei; Song, Li-Li; Li, Hong-An; Zou, Hong; Li, Feng

    2014-01-01

    Unclassified renal cell carcinoma (URCC) is a rare variant of RCC, accounting for only 3-5% of all cases. Studies on the molecular genetics of URCC are limited, and hence, we report on 2 cases of URCC analyzed using comparative genome hybridization (CGH) and the genome-wide human exon GeneChip technique to identify the genomic alterations of URCC. Both URCC patients (mean age, 72 years) presented at an advanced stage and died within 30 months post-surgery. Histologically, the URCCs were composed of undifferentiated, multinucleated, giant cells with eosinophilic cytoplasm. Immunostaining revealed that both URCC cases had strong p53 protein expression and partial expression of cluster of differentiation-10 and cytokeratin. The CGH profiles showed chromosomal imbalances in both URCC cases: gains were observed in chromosomes 1p11-12, 1q12-13, 2q20-23, 3q22-23, 8p12, and 16q11-15, whereas losses were detected on chromosomes 1q22-23, 3p12-22, 5p30-ter, 6p, 11q, 16q18-22, 17p12-14, and 20p. Compared with 18 normal renal tissues, 40 mutated genes were detected in the URCC tissues, including 32 missense and 8 silent mutations. Functional enrichment analysis revealed that the missense mutation genes were involved in 11 different biological processes and pathways, including cell cycle regulation, lipid localization and transport, neuropeptide signaling, organic ether metabolism, and ATP-binding cassette transporter signaling. Our findings indicate that URCC may be a highly aggressive cancer, and the genetic alterations identified herein may provide clues regarding the tumorigenesis of URCC and serve as a basis for the development of targeted therapies against URCC in the future.

  8. Unclassified renal cell carcinoma: a clinicopathological, comparative genomic hybridization, and whole-genome exon sequencing study

    PubMed Central

    Hu, Zhen-Yan; Pang, Li-Juan; Qi, Yan; Kang, Xue-Ling; Hu, Jian-Ming; Wang, Lianghai; Liu, Kun-Peng; Ren, Yuan; Cui, Mei; Song, Li-Li; Li, Hong-An; Zou, Hong; Li, Feng

    2014-01-01

    Unclassified renal cell carcinoma (URCC) is a rare variant of RCC, accounting for only 3-5% of all cases. Studies on the molecular genetics of URCC are limited, and hence, we report on 2 cases of URCC analyzed using comparative genome hybridization (CGH) and the genome-wide human exon GeneChip technique to identify the genomic alterations of URCC. Both URCC patients (mean age, 72 years) presented at an advanced stage and died within 30 months post-surgery. Histologically, the URCCs were composed of undifferentiated, multinucleated, giant cells with eosinophilic cytoplasm. Immunostaining revealed that both URCC cases had strong p53 protein expression and partial expression of cluster of differentiation-10 and cytokeratin. The CGH profiles showed chromosomal imbalances in both URCC cases: gains were observed in chromosomes 1p11-12, 1q12-13, 2q20-23, 3q22-23, 8p12, and 16q11-15, whereas losses were detected on chromosomes 1q22-23, 3p12-22, 5p30-ter, 6p, 11q, 16q18-22, 17p12-14, and 20p. Compared with 18 normal renal tissues, 40 mutated genes were detected in the URCC tissues, including 32 missense and 8 silent mutations. Functional enrichment analysis revealed that the missense mutation genes were involved in 11 different biological processes and pathways, including cell cycle regulation, lipid localization and transport, neuropeptide signaling, organic ether metabolism, and ATP-binding cassette transporter signaling. Our findings indicate that URCC may be a highly aggressive cancer, and the genetic alterations identified herein may provide clues regarding the tumorigenesis of URCC and serve as a basis for the development of targeted therapies against URCC in the future. PMID:25120763

  9. Identification of candidate genes in Populus cell wall biosynthesis using text-mining, co-expression network and comparative genomics

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Yang, Xiaohan; Ye, Chuyu; Bisaria, Anjali

    2011-01-01

    Populus is an important bioenergy crop for bioethanol production. A greater understanding of cell wall biosynthesis processes is critical in reducing biomass recalcitrance, a major hindrance in efficient generation of ethanol from lignocellulosic biomass. Here, we report the identification of candidate cell wall biosynthesis genes through the development and application of a novel bioinformatics pipeline. As a first step, via text-mining of PubMed publications, we obtained 121 Arabidopsis genes that had the experimental evidences supporting their involvement in cell wall biosynthesis or remodeling. The 121 genes were then used as bait genes to query an Arabidopsis co-expression database and additionalmore » genes were identified as neighbors of the bait genes in the network, increasing the number of genes to 548. The 548 Arabidopsis genes were then used to re-query the Arabidopsis co-expression database and re-construct a network that captured additional network neighbors, expanding to a total of 694 genes. The 694 Arabidopsis genes were computationally divided into 22 clusters. Queries of the Populus genome using the Arabidopsis genes revealed 817 Populus orthologs. Functional analysis of gene ontology and tissue-specific gene expression indicated that these Arabidopsis and Populus genes are high likelihood candidates for functional genomics in relation to cell wall biosynthesis.« less

  10. Identification of differentially expressed small non-coding RNAs in the legume endosymbiont Sinorhizobium meliloti by comparative genomics

    PubMed Central

    del Val, Coral; Rivas, Elena; Torres-Quesada, Omar; Toro, Nicolás; Jiménez-Zurdo, José I

    2007-01-01

    Bacterial small non-coding RNAs (sRNAs) are being recognized as novel widespread regulators of gene expression in response to environmental signals. Here, we present the first search for sRNA-encoding genes in the nitrogen-fixing endosymbiont Sinorhizobium meliloti, performed by a genome-wide computational analysis of its intergenic regions. Comparative sequence data from eight related α-proteobacteria were obtained, and the interspecies pairwise alignments were scored with the programs eQRNA and RNAz as complementary predictive tools to identify conserved and stable secondary structures corresponding to putative non-coding RNAs. Northern experiments confirmed that eight of the predicted loci, selected among the original 32 candidates as most probable sRNA genes, expressed small transcripts. This result supports the combined use of eQRNA and RNAz as a robust strategy to identify novel sRNAs in bacteria. Furthermore, seven of the transcripts accumulated differentially in free-living and symbiotic conditions. Experimental mapping of the 5′-ends of the detected transcripts revealed that their encoding genes are organized in autonomous transcription units with recognizable promoter and, in most cases, termination signatures. These findings suggest novel regulatory functions for sRNAs related to the interactions of α-proteobacteria with their eukaryotic hosts. PMID:17971083

  11. Genome-wide comparative analysis of NBS-encoding genes between Brassica species and Arabidopsis thaliana.

    PubMed

    Yu, Jingyin; Tehrim, Sadia; Zhang, Fengqi; Tong, Chaobo; Huang, Junyan; Cheng, Xiaohui; Dong, Caihua; Zhou, Yanqiu; Qin, Rui; Hua, Wei; Liu, Shengyi

    2014-01-03

    Plant disease resistance (R) genes with the nucleotide binding site (NBS) play an important role in offering resistance to pathogens. The availability of complete genome sequences of Brassica oleracea and Brassica rapa provides an important opportunity for researchers to identify and characterize NBS-encoding R genes in Brassica species and to compare with analogues in Arabidopsis thaliana based on a comparative genomics approach. However, little is known about the evolutionary fate of NBS-encoding genes in the Brassica lineage after split from A. thaliana. Here we present genome-wide analysis of NBS-encoding genes in B. oleracea, B. rapa and A. thaliana. Through the employment of HMM search and manual curation, we identified 157, 206 and 167 NBS-encoding genes in B. oleracea, B. rapa and A. thaliana genomes, respectively. Phylogenetic analysis among 3 species classified NBS-encoding genes into 6 subgroups. Tandem duplication and whole genome triplication (WGT) analyses revealed that after WGT of the Brassica ancestor, NBS-encoding homologous gene pairs on triplicated regions in Brassica ancestor were deleted or lost quickly, but NBS-encoding genes in Brassica species experienced species-specific gene amplification by tandem duplication after divergence of B. rapa and B. oleracea. Expression profiling of NBS-encoding orthologous gene pairs indicated the differential expression pattern of retained orthologous gene copies in B. oleracea and B. rapa. Furthermore, evolutionary analysis of CNL type NBS-encoding orthologous gene pairs among 3 species suggested that orthologous genes in B. rapa species have undergone stronger negative selection than those in B .oleracea species. But for TNL type, there are no significant differences in the orthologous gene pairs between the two species. This study is first identification and characterization of NBS-encoding genes in B. rapa and B. oleracea based on whole genome sequences. Through tandem duplication and whole genome

  12. Comparative genome analysis of Pseudomonas genomes including Populus-associated isolates

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Jun, Se Ran; Wassenaar, Trudy; Nookaew, Intawat

    The Pseudomonas genus contains a metabolically versatile group of organisms that are known to occupy numerous ecological niches including the rhizosphere and endosphere of many plants influencing phylogenetic diversity and heterogeneity. In this study, comparative genome analysis was performed on over one thousand Pseudomonas genomes, including 21 Pseudomonas strains isolated from the roots of native Populus deltoides. Based on average amino acid identity, genomic clusters were identified within the Pseudomonas genus, which showed agreements with clades by NCBI and cliques by IMG. The P. fluorescens group was organized into 20 distinct genomic clusters, representing enormous diversity and heterogeneity. The speciesmore » P. aeruginosa showed clear distinction in their genomic relatedness compared to other Pseudomonas species groups based on the pan and core genome analysis. The 19 isolates of our 21 Populus-associated isolates formed three distinct subgroups within the P. fluorescens major group, supported by pathway profiles analysis, while two isolates were more closely related to P. chlororaphis and P. putida. The specific genes to Populus-associated subgroups were identified where genes specific to subgroup 1 include several sensory systems such as proteins which act in two-component signal transduction, a TonB-dependent receptor, and a phosphorelay sensor; specific genes to subgroup 2 contain unique hypothetical genes; and genes specific to subgroup 3 organisms have a different hydrolase activity. IMPORTANCE The comparative genome analyses of the genus Pseudomonas that included Populus-associated isolates resulted in novel insights into high diversity of Pseudomonas. Consistent and robust genomic clusters with phylogenetic homogeneity were identified, which resolved species-clades that are not clearly defined by 16S rRNA gene sequence analysis alone. The genomic clusters may be reflective of distinct ecological niches to which the organisms have adapted, but

  13. Comparative genome analysis of Pseudomonas genomes including Populus-associated isolates

    DOE PAGES

    Jun, Se Ran; Wassenaar, Trudy; Nookaew, Intawat; ...

    2016-01-01

    The Pseudomonas genus contains a metabolically versatile group of organisms that are known to occupy numerous ecological niches including the rhizosphere and endosphere of many plants influencing phylogenetic diversity and heterogeneity. In this study, comparative genome analysis was performed on over one thousand Pseudomonas genomes, including 21 Pseudomonas strains isolated from the roots of native Populus deltoides. Based on average amino acid identity, genomic clusters were identified within the Pseudomonas genus, which showed agreements with clades by NCBI and cliques by IMG. The P. fluorescens group was organized into 20 distinct genomic clusters, representing enormous diversity and heterogeneity. The speciesmore » P. aeruginosa showed clear distinction in their genomic relatedness compared to other Pseudomonas species groups based on the pan and core genome analysis. The 19 isolates of our 21 Populus-associated isolates formed three distinct subgroups within the P. fluorescens major group, supported by pathway profiles analysis, while two isolates were more closely related to P. chlororaphis and P. putida. The specific genes to Populus-associated subgroups were identified where genes specific to subgroup 1 include several sensory systems such as proteins which act in two-component signal transduction, a TonB-dependent receptor, and a phosphorelay sensor; specific genes to subgroup 2 contain unique hypothetical genes; and genes specific to subgroup 3 organisms have a different hydrolase activity. IMPORTANCE The comparative genome analyses of the genus Pseudomonas that included Populus-associated isolates resulted in novel insights into high diversity of Pseudomonas. Consistent and robust genomic clusters with phylogenetic homogeneity were identified, which resolved species-clades that are not clearly defined by 16S rRNA gene sequence analysis alone. The genomic clusters may be reflective of distinct ecological niches to which the organisms have adapted, but

  14. Comparing Mycobacterium tuberculosis genomes using genome topology networks.

    PubMed

    Jiang, Jianping; Gu, Jianlei; Zhang, Liang; Zhang, Chenyi; Deng, Xiao; Dou, Tonghai; Zhao, Guoping; Zhou, Yan

    2015-02-14

    Over the last decade, emerging research methods, such as comparative genomic analysis and phylogenetic study, have yielded new insights into genotypes and phenotypes of closely related bacterial strains. Several findings have revealed that genomic structural variations (SVs), including gene gain/loss, gene duplication and genome rearrangement, can lead to different phenotypes among strains, and an investigation of genes affected by SVs may extend our knowledge of the relationships between SVs and phenotypes in microbes, especially in pathogenic bacteria. In this work, we introduce a 'Genome Topology Network' (GTN) method based on gene homology and gene locations to analyze genomic SVs and perform phylogenetic analysis. Furthermore, the concept of 'unfixed ortholog' has been proposed, whose members are affected by SVs in genome topology among close species. To improve the precision of 'unfixed ortholog' recognition, a strategy to detect annotation differences and complete gene annotation was applied. To assess the GTN method, a set of thirteen complete M. tuberculosis genomes was analyzed as a case study. GTNs with two different gene homology-assigning methods were built, the Clusters of Orthologous Groups (COG) method and the orthoMCL clustering method, and two phylogenetic trees were constructed accordingly, which may provide additional insights into whole genome-based phylogenetic analysis. We obtained 24 unfixable COG groups, of which most members were related to immunogenicity and drug resistance, such as PPE-repeat proteins (COG5651) and transcriptional regulator TetR gene family members (COG1309). The GTN method has been implemented in PERL and released on our website. The tool can be downloaded from http://homepage.fudan.edu.cn/zhouyan/gtn/ , and allows re-annotating the 'lost' genes among closely related genomes, analyzing genes affected by SVs, and performing phylogenetic analysis. With this tool, many immunogenic-related and drug resistance-related genes

  15. Genome-wide gene expression perturbation induced by loss of C2 chromosome in allotetraploid Brassica napus L.

    PubMed Central

    Zhu, Bin; Shao, Yujiao; Pan, Qi; Ge, Xianhong; Li, Zaiyun

    2015-01-01

    Aneuploidy with loss of entire chromosomes from normal complement disrupts the balanced genome and is tolerable only by polyploidy plants. In this study, the monosomic and nullisomic plants losing one or two copies of C2 chromosome from allotetraploid Brassica napus L. (2n = 38, AACC) were produced and compared for their phenotype and transcriptome. The monosomics gave a plant phenotype very similar to the original donor, but the nullisomics had much smaller stature and also shorter growth period. By the comparative analyses on the global transcript profiles with the euploid donor, genome-wide alterations in gene expression were revealed in two aneuploids, and their majority of differentially expressed genes (DEGs) resulted from the trans-acting effects of the zero and one copy of C2 chromosome. The higher number of up-regulated genes than down-regulated genes on other chromosomes suggested that the genome responded to the C2 loss via enhancing the expression of certain genes. Particularly, more DEGs were detected in the monosomics than nullisomics, contrasting with their phenotypes. The gene expression of the other chromosomes was differently affected, and several dysregulated domains in which up- or downregulated genes obviously clustered were identifiable. But the mean gene expression (MGE) for homoeologous chromosome A2 reduced with the C2 loss. Some genes and their expressions on C2 were correlated with the phenotype deviations in the aneuploids. These results provided new insights into the transcriptomic perturbation of the allopolyploid genome elicited by the loss of individual chromosome. PMID:26442076

  16. PSAT: A web tool to compare genomic neighborhoods of multiple prokaryotic genomes

    PubMed Central

    Fong, Christine; Rohmer, Laurence; Radey, Matthew; Wasnick, Michael; Brittnacher, Mitchell J

    2008-01-01

    Background The conservation of gene order among prokaryotic genomes can provide valuable insight into gene function, protein interactions, or events by which genomes have evolved. Although some tools are available for visualizing and comparing the order of genes between genomes of study, few support an efficient and organized analysis between large numbers of genomes. The Prokaryotic Sequence homology Analysis Tool (PSAT) is a web tool for comparing gene neighborhoods among multiple prokaryotic genomes. Results PSAT utilizes a database that is preloaded with gene annotation, BLAST hit results, and gene-clustering scores designed to help identify regions of conserved gene order. Researchers use the PSAT web interface to find a gene of interest in a reference genome and efficiently retrieve the sequence homologs found in other bacterial genomes. The tool generates a graphic of the genomic neighborhood surrounding the selected gene and the corresponding regions for its homologs in each comparison genome. Homologs in each region are color coded to assist users with analyzing gene order among various genomes. In contrast to common comparative analysis methods that filter sequence homolog data based on alignment score cutoffs, PSAT leverages gene context information for homologs, including those with weak alignment scores, enabling a more sensitive analysis. Features for constraining or ordering results are designed to help researchers browse results from large numbers of comparison genomes in an organized manner. PSAT has been demonstrated to be useful for helping to identify gene orthologs and potential functional gene clusters, and detecting genome modifications that may result in loss of function. Conclusion PSAT allows researchers to investigate the order of genes within local genomic neighborhoods of multiple genomes. A PSAT web server for public use is available for performing analyses on a growing set of reference genomes through any web browser with no client

  17. Comparative analysis and visualization of multiple collinear genomes

    PubMed Central

    2012-01-01

    Background Genome browsers are a common tool used by biologists to visualize genomic features including genes, polymorphisms, and many others. However, existing genome browsers and visualization tools are not well-suited to perform meaningful comparative analysis among a large number of genomes. With the increasing quantity and availability of genomic data, there is an increased burden to provide useful visualization and analysis tools for comparison of multiple collinear genomes such as the large panels of model organisms which are the basis for much of the current genetic research. Results We have developed a novel web-based tool for visualizing and analyzing multiple collinear genomes. Our tool illustrates genome-sequence similarity through a mosaic of intervals representing local phylogeny, subspecific origin, and haplotype identity. Comparative analysis is facilitated through reordering and clustering of tracks, which can vary throughout the genome. In addition, we provide local phylogenetic trees as an alternate visualization to assess local variations. Conclusions Unlike previous genome browsers and viewers, ours allows for simultaneous and comparative analysis. Our browser provides intuitive selection and interactive navigation about features of interest. Dynamic visualizations adjust to scale and data content making analysis at variable resolutions and of multiple data sets more informative. We demonstrate our genome browser for an extensive set of genomic data sets composed of almost 200 distinct mouse laboratory strains. PMID:22536897

  18. Reference-free comparative genomics of 174 chloroplasts.

    PubMed

    Kua, Chai-Shian; Ruan, Jue; Harting, John; Ye, Cheng-Xi; Helmus, Matthew R; Yu, Jun; Cannon, Charles H

    2012-01-01

    Direct analysis of unassembled genomic data could greatly increase the power of short read DNA sequencing technologies and allow comparative genomics of organisms without a completed reference available. Here, we compare 174 chloroplasts by analyzing the taxanomic distribution of short kmers across genomes [1]. We then assemble de novo contigs centered on informative variation. The localized de novo contigs can be separated into two major classes: tip = unique to a single genome and group = shared by a subset of genomes. Prior to assembly, we found that ~18% of the chloroplast was duplicated in the inverted repeat (IR) region across a four-fold difference in genome sizes, from a highly reduced parasitic orchid [2] to a massive algal chloroplast [3], including gnetophytes [4] and cycads [5]. The conservation of this ratio between single copy and duplicated sequence was basal among green plants, independent of photosynthesis and mechanism of genome size change, and different in gymnosperms and lower plants. Major lineages in the angiosperm clade differed in the pattern of shared kmers and de novo contigs. For example, parasitic plants demonstrated an expected accelerated overall rate of evolution, while the hemi-parasitic genomes contained a great deal more novel sequence than holo-parasitic plants, suggesting different mechanisms at different stages of genomic contraction. Additionally, the legumes are diverging more quickly and in different ways than other major families. Small duplicated fragments of the rrn23 genes were deeply conserved among seed plants, including among several species without the IR regions, indicating a crucial functional role of this duplication. Localized de novo assembly of informative kmers greatly reduces the complexity of large comparative analyses by confining the analysis to a small partition of data and genomes relevant to the specific question, allowing direct analysis of next-gen sequence data from previously unstudied genomes and

  19. Mosaic Graphs and Comparative Genomics in Phage Communities

    PubMed Central

    Belcaid, Mahdi; Bergeron, Anne

    2010-01-01

    Abstract Comparing the genomes of two closely related viruses often produces mosaics where nearly identical sequences alternate with sequences that are unique to each genome. When several closely related genomes are compared, the unique sequences are likely to be shared with third genomes, leading to virus mosaic communities. Here we present comparative analysis of sets of Staphylococcus aureus phages that share large identical sequences with up to three other genomes, and with different partners along their genomes. We introduce mosaic graphs to represent these complex recombination events, and use them to illustrate the breath and depth of sequence sharing: some genomes are almost completely made up of shared sequences, while genomes that share very large identical sequences can adopt alternate functional modules. Mosaic graphs also allow us to identify breakpoints that could eventually be used for the construction of recombination networks. These findings have several implications on phage metagenomics assembly, on the horizontal gene transfer paradigm, and more generally on the understanding of the composition and evolutionary dynamics of virus communities. PMID:20874413

  20. Comparative Genomics Reveals High Genomic Diversity in the Genus Photobacterium.

    PubMed

    Machado, Henrique; Gram, Lone

    2017-01-01

    Vibrionaceae is a large marine bacterial family, which can constitute up to 50% of the prokaryotic population in marine waters. Photobacterium is the second largest genus in the family and we used comparative genomics on 35 strains representing 16 of the 28 species described so far, to understand the genomic diversity present in the Photobacterium genus. Such understanding is important for ecophysiology studies of the genus. We used whole genome sequences to evaluate phylogenetic relationships using several analyses (16S rRNA, MLSA, fur , amino-acid usage, ANI), which allowed us to identify two misidentified strains. Genome analyses also revealed occurrence of higher and lower GC content clades, correlating with phylogenetic clusters. Pan- and core-genome analysis revealed the conservation of 25% of the genome throughout the genus, with a large and open pan-genome. The major source of genomic diversity could be traced to the smaller chromosome and plasmids. Several of the physiological traits studied in the genus did not correlate with phylogenetic data. Since horizontal gene transfer (HGT) is often suggested as a source of genetic diversity and a potential driver of genomic evolution in bacterial species, we looked into evidence of such in Photobacterium genomes. Genomic islands were the source of genomic differences between strains of the same species. Also, we found transposase genes and CRISPR arrays that suggest multiple encounters with foreign DNA. Presence of genomic exchange traits was widespread and abundant in the genus, suggesting a role in genomic evolution. The high genetic variability and indications of genetic exchange make it difficult to elucidate genome evolutionary paths and raise the awareness of the roles of foreign DNA in the genomic evolution of environmental organisms.

  1. In silico identification and comparative analysis of differentially expressed genes in human and mouse tissues

    PubMed Central

    Pao, Sheng-Ying; Lin, Win-Li; Hwang, Ming-Jing

    2006-01-01

    Background Screening for differentially expressed genes on the genomic scale and comparative analysis of the expression profiles of orthologous genes between species to study gene function and regulation are becoming increasingly feasible. Expressed sequence tags (ESTs) are an excellent source of data for such studies using bioinformatic approaches because of the rich libraries and tremendous amount of data now available in the public domain. However, any large-scale EST-based bioinformatics analysis must deal with the heterogeneous, and often ambiguous, tissue and organ terms used to describe EST libraries. Results To deal with the issue of tissue source, in this work, we carefully screened and organized more than 8 million human and mouse ESTs into 157 human and 108 mouse tissue/organ categories, to which we applied an established statistic test using different thresholds of the p value to identify genes differentially expressed in different tissues. Further analysis of the tissue distribution and level of expression of human and mouse orthologous genes showed that tissue-specific orthologs tended to have more similar expression patterns than those lacking significant tissue specificity. On the other hand, a number of orthologs were found to have significant disparity in their expression profiles, hinting at novel functions, divergent regulation, or new ortholog relationships. Conclusion Comprehensive statistics on the tissue-specific expression of human and mouse genes were obtained in this very large-scale, EST-based analysis. These statistical results have been organized into a database, freely accessible at our website , for easy searching of human and mouse tissue-specific genes and for investigating gene expression profiles in the context of comparative genomics. Comparative analysis showed that, although highly tissue-specific genes tend to exhibit similar expression profiles in human and mouse, there are significant exceptions, indicating that orthologous

  2. Comparative map and trait viewer (CMTV): an integrated bioinformatic tool to construct consensus maps and compare QTL and functional genomics data across genomes and experiments.

    PubMed

    Sawkins, M C; Farmer, A D; Hoisington, D; Sullivan, J; Tolopko, A; Jiang, Z; Ribaut, J-M

    2004-10-01

    In the past few decades, a wealth of genomic data has been produced in a wide variety of species using a diverse array of functional and molecular marker approaches. In order to unlock the full potential of the information contained in these independent experiments, researchers need efficient and intuitive means to identify common genomic regions and genes involved in the expression of target phenotypic traits across diverse conditions. To address this need, we have developed a Comparative Map and Trait Viewer (CMTV) tool that can be used to construct dynamic aggregations of a variety of types of genomic datasets. By algorithmically determining correspondences between sets of objects on multiple genomic maps, the CMTV can display syntenic regions across taxa, combine maps from separate experiments into a consensus map, or project data from different maps into a common coordinate framework using dynamic coordinate translations between source and target maps. We present a case study that illustrates the utility of the tool for managing large and varied datasets by integrating data collected by CIMMYT in maize drought tolerance research with data from public sources. This example will focus on one of the visualization features for Quantitative Trait Locus (QTL) data, using likelihood ratio (LR) files produced by generic QTL analysis software and displaying the data in a unique visual manner across different combinations of traits, environments and crosses. Once a genomic region of interest has been identified, the CMTV can search and display additional QTLs meeting a particular threshold for that region, or other functional data such as sets of differentially expressed genes located in the region; it thus provides an easily used means for organizing and manipulating data sets that have been dynamically integrated under the focus of the researcher's specific hypothesis.

  3. Comparative Genomics Reveals High Genomic Diversity in the Genus Photobacterium

    PubMed Central

    Machado, Henrique; Gram, Lone

    2017-01-01

    Vibrionaceae is a large marine bacterial family, which can constitute up to 50% of the prokaryotic population in marine waters. Photobacterium is the second largest genus in the family and we used comparative genomics on 35 strains representing 16 of the 28 species described so far, to understand the genomic diversity present in the Photobacterium genus. Such understanding is important for ecophysiology studies of the genus. We used whole genome sequences to evaluate phylogenetic relationships using several analyses (16S rRNA, MLSA, fur, amino-acid usage, ANI), which allowed us to identify two misidentified strains. Genome analyses also revealed occurrence of higher and lower GC content clades, correlating with phylogenetic clusters. Pan- and core-genome analysis revealed the conservation of 25% of the genome throughout the genus, with a large and open pan-genome. The major source of genomic diversity could be traced to the smaller chromosome and plasmids. Several of the physiological traits studied in the genus did not correlate with phylogenetic data. Since horizontal gene transfer (HGT) is often suggested as a source of genetic diversity and a potential driver of genomic evolution in bacterial species, we looked into evidence of such in Photobacterium genomes. Genomic islands were the source of genomic differences between strains of the same species. Also, we found transposase genes and CRISPR arrays that suggest multiple encounters with foreign DNA. Presence of genomic exchange traits was widespread and abundant in the genus, suggesting a role in genomic evolution. The high genetic variability and indications of genetic exchange make it difficult to elucidate genome evolutionary paths and raise the awareness of the roles of foreign DNA in the genomic evolution of environmental organisms. PMID:28706512

  4. Comparative genomics of Toll-like receptor signalling in five species

    PubMed Central

    Jann, Oliver C; King, Annemarie; Corrales, Nestor Lopez; Anderson, Susan I; Jensen, Kirsty; Ait-ali, Tahar; Tang, Haizhou; Wu, Chunhua; Cockett, Noelle E; Archibald, Alan L; Glass, Elizabeth J

    2009-01-01

    Background Over the last decade, several studies have identified quantitative trait loci (QTL) affecting variation of immune related traits in mammals. Recent studies in humans and mice suggest that part of this variation may be caused by polymorphisms in genes involved in Toll-like receptor (TLR) signalling. In this project, we used a comparative approach to investigate the importance of TLR-related genes in comparison with other immunologically relevant genes for resistance traits in five species by associating their genomic location with previously published immune-related QTL regions. Results We report the genomic localisation of TLR1-10 and ten associated signalling molecules in sheep and pig using in-silico and/or radiation hybrid (RH) mapping techniques and compare their positions with their annotated homologues in the human, cattle and mouse whole genome sequences. We also report medium-density RH maps for porcine chromosomes 8 and 13. A comparative analysis of the positions of previously published relevant QTLs allowed the identification of homologous regions that are associated with similar health traits in several species and which contain TLR related and other immunologically relevant genes. Additional evidence was gathered by examining relevant gene expression and association studies. Conclusion This comparative genomic approach identified eight genes as potentially causative genes for variations of health related traits. These include susceptibility to clinical mastitis in dairy cattle, general disease resistance in sheep, cattle, humans and mice, and tolerance to protozoan infection in cattle and mice. Four TLR-related genes (TLR1, 6, MyD88, IRF3) appear to be the most likely candidate genes underlying QTL regions which control the resistance to the same or similar pathogens in several species. Further studies are required to investigate the potential role of polymorphisms within these genes. PMID:19432955

  5. Comparative Genomics of Interreplichore Translocations in Bacteria: A Measure of Chromosome Topology?

    PubMed Central

    Khedkar, Supriya; Seshasayee, Aswin Sai Narain

    2016-01-01

    Genomes evolve not only in base sequence but also in terms of their architecture, defined by gene organization and chromosome topology. Whereas genome sequence data inform us about the changes in base sequences for a large variety of organisms, the study of chromosome topology is restricted to a few model organisms studied using microscopy and chromosome conformation capture techniques. Here, we exploit whole genome sequence data to study the link between gene organization and chromosome topology in bacteria. Using comparative genomics across ∼250 pairs of closely related bacteria we show that: (a) many organisms show a high degree of interreplichore translocations throughout the chromosome and not limited to the inversion-prone terminus (ter) or the origin of replication (oriC); (b) translocation maps may reflect chromosome topologies; and (c) symmetric interreplichore translocations do not disrupt the distance of a gene from oriC or affect gene expression states or strand biases in gene densities. In summary, we suggest that translocation maps might be a first line in defining a gross chromosome topology given a pair of closely related genome sequences. PMID:27172194

  6. Gramene 2013: Comparative plant genomics resources

    USDA-ARS?s Scientific Manuscript database

    Gramene (http://www.gramene.org) is a curated online resource for comparative functional genomics in crops and model plant species, currently hosting 27 fully and 10 partially sequenced reference genomes in its build number 38. Its strength derives from the application of a phylogenetic framework fo...

  7. Comparative Omics-Driven Genome Annotation Refinement: Application across Yersiniae

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Rutledge, Alexandra C.; Jones, Marcus B.; Chauhan, Sadhana

    2012-03-27

    Genome sequencing continues to be a rapidly evolving technology, yet most downstream aspects of genome annotation pipelines remain relatively stable or are even being abandoned. To date, the perceived value of manual curation for genome annotations is not offset by the real cost and time associated with the process. In order to balance the large number of sequences generated, the annotation process is now performed almost exclusively in an automated fashion for most genome sequencing projects. One possible way to reduce errors inherent to automated computational annotations is to apply data from 'omics' measurements (i.e. transcriptional and proteomic) to themore » un-annotated genome with a proteogenomic-based approach. This approach does require additional experimental and bioinformatics methods to include omics technologies; however, the approach is readily automatable and can benefit from rapid developments occurring in those research domains as well. The annotation process can be improved by experimental validation of transcription and translation and aid in the discovery of annotation errors. Here the concept of annotation refinement has been extended to include a comparative assessment of genomes across closely related species, as is becoming common in sequencing efforts. Transcriptomic and proteomic data derived from three highly similar pathogenic Yersiniae (Y. pestis CO92, Y. pestis pestoides F, and Y. pseudotuberculosis PB1/+) was used to demonstrate a comprehensive comparative omic-based annotation methodology. Peptide and oligo measurements experimentally validated the expression of nearly 40% of each strain's predicted proteome and revealed the identification of 28 novel and 68 previously incorrect protein-coding sequences (e.g., observed frameshifts, extended start sites, and translated pseudogenes) within the three current Yersinia genome annotations. Gene loss is presumed to play a major role in Y. pestis acquiring its niche as a virulent

  8. Reference-Free Comparative Genomics of 174 Chloroplasts

    PubMed Central

    Kua, Chai-Shian; Ruan, Jue; Harting, John; Ye, Cheng-Xi; Helmus, Matthew R.; Yu, Jun; Cannon, Charles H.

    2012-01-01

    Direct analysis of unassembled genomic data could greatly increase the power of short read DNA sequencing technologies and allow comparative genomics of organisms without a completed reference available. Here, we compare 174 chloroplasts by analyzing the taxanomic distribution of short kmers across genomes [1]. We then assemble de novo contigs centered on informative variation. The localized de novo contigs can be separated into two major classes: tip = unique to a single genome and group = shared by a subset of genomes. Prior to assembly, we found that ∼18% of the chloroplast was duplicated in the inverted repeat (IR) region across a four-fold difference in genome sizes, from a highly reduced parasitic orchid [2] to a massive algal chloroplast [3], including gnetophytes [4] and cycads [5]. The conservation of this ratio between single copy and duplicated sequence was basal among green plants, independent of photosynthesis and mechanism of genome size change, and different in gymnosperms and lower plants. Major lineages in the angiosperm clade differed in the pattern of shared kmers and de novo contigs. For example, parasitic plants demonstrated an expected accelerated overall rate of evolution, while the hemi-parasitic genomes contained a great deal more novel sequence than holo-parasitic plants, suggesting different mechanisms at different stages of genomic contraction. Additionally, the legumes are diverging more quickly and in different ways than other major families. Small duplicated fragments of the rrn23 genes were deeply conserved among seed plants, including among several species without the IR regions, indicating a crucial functional role of this duplication. Localized de novo assembly of informative kmers greatly reduces the complexity of large comparative analyses by confining the analysis to a small partition of data and genomes relevant to the specific question, allowing direct analysis of next-gen sequence data from previously unstudied

  9. Pernicious plans revealed: Plasmodium falciparum genome wide expression analysis.

    PubMed

    Llinás, Manuel; DeRisi, Joseph L

    2004-08-01

    The asexual intraerythrocytic developmental cycle (IDC) of Plasmodium falciparum is responsible for the majority of the clinical manifestations of malaria in humans. Although malaria has been studied for over a century, the elucidation of the full genome sequence of P. falciparum has now allowed for in-depth studies of gene expression throughout the entire intraerythrocytic stage. As the mainstays of anti-malarial chemotherapy become increasingly ineffective, we need a deeper understanding of fundamental plasmodial bioregulatory mechanisms to successfully subvert them. Recent gene expression studies have begun to examine different aspects of the IDC and are providing key insights into the basic mechanisms of Plasmodium gene regulation and are helping to define gene functions. However, to date, no transcription factor has been fully characterized from Plasmodium and the definitive identification of cis-acting regulatory elements along with their corresponding trans-acting partners is still lacking. The characterization of the transcriptome of P. falciparum is the first major step towards the understanding of the genome wide regulation of gene expression in this parasite. IDC expression data for almost every gene in the P. falciparum genome can now be publicly queried at and. The results of these studies suggest promising leads for identifying novel targets for anti-malarial therapeutics and vaccines in addition to providing a solid foundation for the ongoing elucidation of plasmodial gene expression.

  10. Genome-wide expression profiling in pediatric septic shock

    PubMed Central

    Wong, Hector R.

    2013-01-01

    For nearly a decade, our research group has had the privilege of developing and mining a multi-center, microarray-based, genome-wide expression database of critically ill children (≤ 10 years of age) with septic shock. Using bioinformatic and systems biology approaches, the expression data generated through this discovery-oriented, exploratory approach have been leveraged for a variety of objectives, which will be reviewed. Fundamental observations include wide spread repression of gene programs corresponding to the adaptive immune system, and biologically significant differential patterns of gene expression across developmental age groups. The data have also identified gene expression-based subclasses of pediatric septic shock having clinically relevant phenotypic differences. The data have also been leveraged for the discovery of novel therapeutic targets, and for the discovery and development of novel stratification and diagnostic biomarkers. Almost a decade of genome-wide expression profiling in pediatric septic shock is now demonstrating tangible results. The studies have progressed from an initial discovery-oriented and exploratory phase, to a new phase where the data are being translated and applied to address several areas of clinical need. PMID:23329198

  11. Comparative genomics of Eucalyptus and Corymbia reveals low rates of genome structural rearrangement.

    PubMed

    Butler, J B; Vaillancourt, R E; Potts, B M; Lee, D J; King, G J; Baten, A; Shepherd, M; Freeman, J S

    2017-05-22

    Previous studies suggest genome structure is largely conserved between Eucalyptus species. However, it is unknown if this conservation extends to more divergent eucalypt taxa. We performed comparative genomics between the eucalypt genera Eucalyptus and Corymbia. Our results will facilitate transfer of genomic information between these important taxa and provide further insights into the rate of structural change in tree genomes. We constructed three high density linkage maps for two Corymbia species (Corymbia citriodora subsp. variegata and Corymbia torelliana) which were used to compare genome structure between both species and Eucalyptus grandis. Genome structure was highly conserved between the Corymbia species. However, the comparison of Corymbia and E. grandis suggests large (from 1-13 MB) intra-chromosomal rearrangements have occurred on seven of the 11 chromosomes. Most rearrangements were supported through comparisons of the three independent Corymbia maps to the E. grandis genome sequence, and to other independently constructed Eucalyptus linkage maps. These are the first large scale chromosomal rearrangements discovered between eucalypts. Nonetheless, in the general context of plants, the genomic structure of the two genera was remarkably conserved; adding to a growing body of evidence that conservation of genome structure is common amongst woody angiosperms.

  12. PGSB PlantsDB: updates to the database framework for comparative plant genome research.

    PubMed

    Spannagl, Manuel; Nussbaumer, Thomas; Bader, Kai C; Martis, Mihaela M; Seidel, Michael; Kugler, Karl G; Gundlach, Heidrun; Mayer, Klaus F X

    2016-01-04

    PGSB (Plant Genome and Systems Biology: formerly MIPS) PlantsDB (http://pgsb.helmholtz-muenchen.de/plant/index.jsp) is a database framework for the comparative analysis and visualization of plant genome data. The resource has been updated with new data sets and types as well as specialized tools and interfaces to address user demands for intuitive access to complex plant genome data. In its latest incarnation, we have re-worked both the layout and navigation structure and implemented new keyword search options and a new BLAST sequence search functionality. Actively involved in corresponding sequencing consortia, PlantsDB has dedicated special efforts to the integration and visualization of complex triticeae genome data, especially for barley, wheat and rye. We enhanced CrowsNest, a tool to visualize syntenic relationships between genomes, with data from the wheat sub-genome progenitor Aegilops tauschii and added functionality to the PGSB RNASeqExpressionBrowser. GenomeZipper results were integrated for the genomes of barley, rye, wheat and perennial ryegrass and interactive access is granted through PlantsDB interfaces. Data exchange and cross-linking between PlantsDB and other plant genome databases is stimulated by the transPLANT project (http://transplantdb.eu/). © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.

  13. Comparative analysis of prophages in Streptococcus mutans genomes

    PubMed Central

    Fu, Tiwei; Fan, Xiangyu; Long, Quanxin; Deng, Wanyan; Song, Jinlin

    2017-01-01

    Prophages have been considered genetic units that have an intimate association with novel phenotypic properties of bacterial hosts, such as pathogenicity and genomic variation. Little is known about the genetic information of prophages in the genome of Streptococcus mutans, a major pathogen of human dental caries. In this study, we identified 35 prophage-like elements in S. mutans genomes and performed a comparative genomic analysis. Comparative genomic and phylogenetic analyses of prophage sequences revealed that the prophages could be classified into three main large clusters: Cluster A, Cluster B, and Cluster C. The S. mutans prophages in each cluster were compared. The genomic sequences of phismuN66-1, phismuNLML9-1, and phismu24-1 all shared similarities with the previously reported S. mutans phages M102, M102AD, and ϕAPCM01. The genomes were organized into seven major gene clusters according to the putative functions of the predicted open reading frames: packaging and structural modules, integrase, host lysis modules, DNA replication/recombination modules, transcriptional regulatory modules, other protein modules, and hypothetical protein modules. Moreover, an integrase gene was only identified in phismuNLML9-1 prophages. PMID:29158986

  14. Integrated analysis of copy number alteration and RNA expression profiles of cancer using a high-resolution whole-genome oligonucleotide array.

    PubMed

    Jung, Seung-Hyun; Shin, Seung-Hun; Yim, Seon-Hee; Choi, Hye-Sun; Lee, Sug-Hyung; Chung, Yeun-Jun

    2009-07-31

    Recently, microarray-based comparative genomic hybridization (array-CGH) has emerged as a very efficient technology with higher resolution for the genome-wide identification of copy number alterations (CNA). Although CNAs are thought to affect gene expression, there is no platform currently available for the integrated CNA-expression analysis. To achieve high-resolution copy number analysis integrated with expression profiles, we established human 30k oligoarray-based genome-wide copy number analysis system and explored the applicability of this system for integrated genome and transcriptome analysis using MDA-MB-231 cell line. We compared the CNAs detected by the oligoarray with those detected by the 3k BAC array for validation. The oligoarray identified the single copy difference more accurately and sensitively than the BAC array. Seventeen CNAs detected by both platforms in MDA-MB-231 such as gains of 5p15.33-13.1, 8q11.22-8q21.13, 17p11.2, and losses of 1p32.3, 8p23.3-8p11.21, and 9p21 were consistently identified in previous studies on breast cancer. There were 122 other small CNAs (mean size 1.79 mb) that were detected by oligoarray only, not by BAC-array. We performed genomic qPCR targeting 7 CNA regions, detected by oligoarray only, and one non-CNA region to validate the oligoarray CNA detection. All qPCR results were consistent with the oligoarray-CGH results. When we explored the possibility of combined interpretation of both DNA copy number and RNA expression profiles, mean DNA copy number and RNA expression levels showed a significant correlation. In conclusion, this 30k oligoarray-CGH system can be a reasonable choice for analyzing whole genome CNAs and RNA expression profiles at a lower cost.

  15. Sequencing and comparing whole mitochondrial genomes ofanimals

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Boore, Jeffrey L.; Macey, J. Robert; Medina, Monica

    2005-04-22

    Comparing complete animal mitochondrial genome sequences is becoming increasingly common for phylogenetic reconstruction and as a model for genome evolution. Not only are they much more informative than shorter sequences of individual genes for inferring evolutionary relatedness, but these data also provide sets of genome-level characters, such as the relative arrangements of genes, that can be especially powerful. We describe here the protocols commonly used for physically isolating mtDNA, for amplifying these by PCR or RCA, for cloning,sequencing, assembly, validation, and gene annotation, and for comparing both sequences and gene arrangements. On several topics, we offer general observations based onmore » our experiences to date with determining and comparing complete mtDNA sequences.« less

  16. Genomic resources for songbird research and their use in characterizing gene expression during brain development

    PubMed Central

    Li, XiaoChing; Wang, Xiu-Jie; Tannenhauser, Jonathan; Podell, Sheila; Mukherjee, Piali; Hertel, Moritz; Biane, Jeremy; Masuda, Shoko; Nottebohm, Fernando; Gaasterland, Terry

    2007-01-01

    Vocal learning and neuronal replacement have been studied extensively in songbirds, but until recently, few molecular and genomic tools for songbird research existed. Here we describe new molecular/genomic resources developed in our laboratory. We made cDNA libraries from zebra finch (Taeniopygia guttata) brains at different developmental stages. A total of 11,000 cDNA clones from these libraries, representing 5,866 unique gene transcripts, were randomly picked and sequenced from the 3′ ends. A web-based database was established for clone tracking, sequence analysis, and functional annotations. Our cDNA libraries were not normalized. Sequencing ESTs without normalization produced many developmental stage-specific sequences, yielding insights into patterns of gene expression at different stages of brain development. In particular, the cDNA library made from brains at posthatching day 30–50, corresponding to the period of rapid song system development and song learning, has the most diverse and richest set of genes expressed. We also identified five microRNAs whose sequences are highly conserved between zebra finch and other species. We printed cDNA microarrays and profiled gene expression in the high vocal center of both adult male zebra finches and canaries (Serinus canaria). Genes differentially expressed in the high vocal center were identified from the microarray hybridization results. Selected genes were validated by in situ hybridization. Networks among the regulated genes were also identified. These resources provide songbird biologists with tools for genome annotation, comparative genomics, and microarray gene expression analysis. PMID:17426146

  17. The dog genome map and its use in mammalian comparative genomics.

    PubMed

    Switonski, Marek; Szczerbal, Izabela; Nowacka, Joanna

    2004-01-01

    The dog genome organization was extensively studied in the last ten years. The most important achievements are the well-developed marker genome maps, including over 3200 marker loci, and a survey of the DNA genome sequence. This knowledge, along with the most advanced map of the human genome, turned out to be very useful in comparative genomic studies. On the one hand, it has promoted the development of marker genome maps of other species of the family Canidae (red fox, arctic fox, Chinese raccoon dog) as well as studies on the evolution of their karyotype. But the most important approach is the comparative analysis of human and canine hereditary diseases. At present, causative gene mutations are known for 30 canine hereditary diseases. A majority of them have human counterparts with similar clinical and molecular features. Studies on identification of genes having a major impact on some multifactorial diseases (hip dysplasia, epilepsy) and cancers (multifocal renal cystadenocarcinoma and nodular dermatofibrosis) are advanced. Very promising are the results of gene therapy for certain canine monogenic diseases (haemophilia, hereditary retinal dystrophy, mucopolysaccharidosis), which have human equivalents. The above-mentioned examples prove a very important model role of the dog in studies of human genetic diseases. On the other hand, the identification of gene mutations responsible for hereditary diseases has a substantial impact on breeding strategy in the dog.

  18. Comparative functional genomics of salt stress in related model and cultivated plants identifies and overcomes limitations to translational genomics.

    PubMed

    Sanchez, Diego H; Pieckenstain, Fernando L; Szymanski, Jedrzey; Erban, Alexander; Bromke, Mariusz; Hannah, Matthew A; Kraemer, Ute; Kopka, Joachim; Udvardi, Michael K

    2011-02-14

    One of the objectives of plant translational genomics is to use knowledge and genes discovered in model species to improve crops. However, the value of translational genomics to plant breeding, especially for complex traits like abiotic stress tolerance, remains uncertain. Using comparative genomics (ionomics, transcriptomics and metabolomics) we analyzed the responses to salinity of three model and three cultivated species of the legume genus Lotus. At physiological and ionomic levels, models responded to salinity in a similar way to crop species, and changes in the concentration of shoot Cl(-) correlated well with tolerance. Metabolic changes were partially conserved, but divergence was observed amongst the genotypes. Transcriptome analysis showed that about 60% of expressed genes were responsive to salt treatment in one or more species, but less than 1% was responsive in all. Therefore, genotype-specific transcriptional and metabolic changes overshadowed conserved responses to salinity and represent an impediment to simple translational genomics. However, 'triangulation' from multiple genotypes enabled the identification of conserved and tolerant-specific responses that may provide durable tolerance across species.

  19. IMGD: an integrated platform supporting comparative genomics and phylogenetics of insect mitochondrial genomes

    PubMed Central

    Lee, Wonhoon; Park, Jongsun; Choi, Jaeyoung; Jung, Kyongyong; Park, Bongsoo; Kim, Donghan; Lee, Jaeyoung; Ahn, Kyohun; Song, Wonho; Kang, Seogchan; Lee, Yong-Hwan; Lee, Seunghwan

    2009-01-01

    Background Sequences and organization of the mitochondrial genome have been used as markers to investigate evolutionary history and relationships in many taxonomic groups. The rapidly increasing mitochondrial genome sequences from diverse insects provide ample opportunities to explore various global evolutionary questions in the superclass Hexapoda. To adequately support such questions, it is imperative to establish an informatics platform that facilitates the retrieval and utilization of available mitochondrial genome sequence data. Results The Insect Mitochondrial Genome Database (IMGD) is a new integrated platform that archives the mitochondrial genome sequences from 25,747 hexapod species, including 112 completely sequenced and 20 nearly completed genomes and 113,985 partially sequenced mitochondrial genomes. The Species-driven User Interface (SUI) of IMGD supports data retrieval and diverse analyses at multi-taxon levels. The Phyloviewer implemented in IMGD provides three methods for drawing phylogenetic trees and displays the resulting trees on the web. The SNP database incorporated to IMGD presents the distribution of SNPs and INDELs in the mitochondrial genomes of multiple isolates within eight species. A newly developed comparative SNU Genome Browser supports the graphical presentation and interactive interface for the identified SNPs/INDELs. Conclusion The IMGD provides a solid foundation for the comparative mitochondrial genomics and phylogenetics of insects. All data and functions described here are available at the web site . PMID:19351385

  20. Comparative Genomics Analyses Reveal Extensive Chromosome Colinearity and Novel Quantitative Trait Loci in Eucalyptus.

    PubMed

    Li, Fagen; Zhou, Changpin; Weng, Qijie; Li, Mei; Yu, Xiaoli; Guo, Yong; Wang, Yu; Zhang, Xiaohong; Gan, Siming

    2015-01-01

    Dense genetic maps, along with quantitative trait loci (QTLs) detected on such maps, are powerful tools for genomics and molecular breeding studies. In the important woody genus Eucalyptus, the recent release of E. grandis genome sequence allows for sequence-based genomic comparison and searching for positional candidate genes within QTL regions. Here, dense genetic maps were constructed for E. urophylla and E. tereticornis using genomic simple sequence repeats (SSR), expressed sequence tag (EST) derived SSR, EST-derived cleaved amplified polymorphic sequence (EST-CAPS), and diversity arrays technology (DArT) markers. The E. urophylla and E. tereticornis maps comprised 700 and 585 markers across 11 linkage groups, totaling at 1,208.2 and 1,241.4 cM in length, respectively. Extensive synteny and colinearity were observed as compared to three earlier DArT-based eucalypt maps (two maps with E. grandis × E. urophylla and one map of E. globulus) and with the E. grandis genome sequence. Fifty-three QTLs for growth (10-56 months of age) and wood density (56 months) were identified in 22 discrete regions on both maps, in which only one colocalizaiton was found between growth and wood density. Novel QTLs were revealed as compared with those previously detected on DArT-based maps for similar ages in Eucalyptus. Eleven to 585 positional candidate genes were obained for a 56-month-old QTL through aligning QTL confidence interval with the E. grandis genome. These results will assist in comparative genomics studies, targeted gene characterization, and marker-assisted selection in Eucalyptus and the related taxa.

  1. Comparative Genomics Analyses Reveal Extensive Chromosome Colinearity and Novel Quantitative Trait Loci in Eucalyptus

    PubMed Central

    Weng, Qijie; Li, Mei; Yu, Xiaoli; Guo, Yong; Wang, Yu; Zhang, Xiaohong; Gan, Siming

    2015-01-01

    Dense genetic maps, along with quantitative trait loci (QTLs) detected on such maps, are powerful tools for genomics and molecular breeding studies. In the important woody genus Eucalyptus, the recent release of E. grandis genome sequence allows for sequence-based genomic comparison and searching for positional candidate genes within QTL regions. Here, dense genetic maps were constructed for E. urophylla and E. tereticornis using genomic simple sequence repeats (SSR), expressed sequence tag (EST) derived SSR, EST-derived cleaved amplified polymorphic sequence (EST-CAPS), and diversity arrays technology (DArT) markers. The E. urophylla and E. tereticornis maps comprised 700 and 585 markers across 11 linkage groups, totaling at 1,208.2 and 1,241.4 cM in length, respectively. Extensive synteny and colinearity were observed as compared to three earlier DArT-based eucalypt maps (two maps with E. grandis × E. urophylla and one map of E. globulus) and with the E. grandis genome sequence. Fifty-three QTLs for growth (10–56 months of age) and wood density (56 months) were identified in 22 discrete regions on both maps, in which only one colocalizaiton was found between growth and wood density. Novel QTLs were revealed as compared with those previously detected on DArT-based maps for similar ages in Eucalyptus. Eleven to 585 positional candidate genes were obained for a 56-month-old QTL through aligning QTL confidence interval with the E. grandis genome. These results will assist in comparative genomics studies, targeted gene characterization, and marker-assisted selection in Eucalyptus and the related taxa. PMID:26695430

  2. A genome-wide 20 K citrus microarray for gene expression analysis

    PubMed Central

    Martinez-Godoy, M Angeles; Mauri, Nuria; Juarez, Jose; Marques, M Carmen; Santiago, Julia; Forment, Javier; Gadea, Jose

    2008-01-01

    Background Understanding of genetic elements that contribute to key aspects of citrus biology will impact future improvements in this economically important crop. Global gene expression analysis demands microarray platforms with a high genome coverage. In the last years, genome-wide EST collections have been generated in citrus, opening the possibility to create new tools for functional genomics in this crop plant. Results We have designed and constructed a publicly available genome-wide cDNA microarray that include 21,081 putative unigenes of citrus. As a functional companion to the microarray, a web-browsable database [1] was created and populated with information about the unigenes represented in the microarray, including cDNA libraries, isolated clones, raw and processed nucleotide and protein sequences, and results of all the structural and functional annotation of the unigenes, like general description, BLAST hits, putative Arabidopsis orthologs, microsatellites, putative SNPs, GO classification and PFAM domains. We have performed a Gene Ontology comparison with the full set of Arabidopsis proteins to estimate the genome coverage of the microarray. We have also performed microarray hybridizations to check its usability. Conclusion This new cDNA microarray replaces the first 7K microarray generated two years ago and allows gene expression analysis at a more global scale. We have followed a rational design to minimize cross-hybridization while maintaining its utility for different citrus species. Furthermore, we also provide access to a website with full structural and functional annotation of the unigenes represented in the microarray, along with the ability to use this site to directly perform gene expression analysis using standard tools at different publicly available servers. Furthermore, we show how this microarray offers a good representation of the citrus genome and present the usefulness of this genomic tool for global studies in citrus by using it to

  3. Comparative Genomics of Interreplichore Translocations in Bacteria: A Measure of Chromosome Topology?

    PubMed

    Khedkar, Supriya; Seshasayee, Aswin Sai Narain

    2016-06-01

    Genomes evolve not only in base sequence but also in terms of their architecture, defined by gene organization and chromosome topology. Whereas genome sequence data inform us about the changes in base sequences for a large variety of organisms, the study of chromosome topology is restricted to a few model organisms studied using microscopy and chromosome conformation capture techniques. Here, we exploit whole genome sequence data to study the link between gene organization and chromosome topology in bacteria. Using comparative genomics across ∼250 pairs of closely related bacteria we show that: (a) many organisms show a high degree of interreplichore translocations throughout the chromosome and not limited to the inversion-prone terminus (ter) or the origin of replication (oriC); (b) translocation maps may reflect chromosome topologies; and (c) symmetric interreplichore translocations do not disrupt the distance of a gene from oriC or affect gene expression states or strand biases in gene densities. In summary, we suggest that translocation maps might be a first line in defining a gross chromosome topology given a pair of closely related genome sequences. Copyright © 2016 Khedkar and Seshasayee.

  4. Soybean seed extracts preferentially express genomic loci of Bradyrhizobium japonicum in the initial interaction with soybean, Glycine max (L.) Merr.

    PubMed

    Wei, Min; Yokoyama, Tadashi; Minamisawa, Kiwamu; Mitsui, Hisayuki; Itakura, Manabu; Kaneko, Takakazu; Tabata, Satoshi; Saeki, Kazuhiko; Omori, Hirofumi; Tajima, Shigeyuki; Uchiumi, Toshiki; Abe, Mikiko; Ohwada, Takuji

    2008-08-01

    Initial interaction between rhizobia and legumes actually starts via encounters of both partners in the rhizosphere. In this study, the global expression profiles of Bradyrhizobium japonicum USDA 110 in response to soybean (Glycine max) seed extracts (SSE) and genistein, a major soybean-released isoflavone for nod genes induction of B. japonicum, were compared. SSE induced many genomic loci as compared with genistein (5.0 microM), nevertheless SSE-supplemented medium contained 4.7 microM genistein. SSE markedly induced four predominant genomic regions within a large symbiosis island (681 kb), which include tts genes (type III secretion system) and various nod genes. In addition, SSE-treated cells expressed many genomic loci containing genes for polygalacturonase (cell-wall degradation), exopolysaccharide synthesis, 1-aminocyclopropane-1-carboxylate deaminase, ribosome proteins family and energy metabolism even outside symbiosis island. On the other hand, genistein-treated cells exclusively showed one expression cluster including common nod gene operon within symbiosis island and six expression loci including multidrug resistance, which were shared with SSE-treated cells. Twelve putatively regulated genes were indeed validated by quantitative RT-PCR. Several SSE-induced genomic loci likely participate in the initial interaction with legumes. Thus, these results can provide a basic knowledge for screening novel genes relevant to the B. japonicum- soybean symbiosis.

  5. Gramene database: navigating plant comparative genomics resources

    USDA-ARS?s Scientific Manuscript database

    Gramene (http://www.gramene.org) is an online, open source, curated resource for plant comparative genomics and pathway analysis designed to support researchers working in plant genomics, breeding, evolutionary biology, system biology, and metabolic engineering. It exploits phylogenetic relationship...

  6. Leveraging the rice genome sequence for monocot comparative and translational genomics.

    PubMed

    Lohithaswa, H C; Feltus, F A; Singh, H P; Bacon, C D; Bailey, C D; Paterson, A H

    2007-07-01

    Common genome anchor points across many taxa greatly facilitate translational and comparative genomics and will improve our understanding of the Tree of Life. To add to the repertoire of genomic tools applicable to the study of monocotyledonous plants in general, we aligned Allium and Musa ESTs to Oryza BAC sequences and identified candidate Allium-Oryza and Musa-Oryza conserved intron-scanning primers (CISPs). A random sampling of 96 CISP primer pairs, representing loci from 11 of the 12 chromosomes in rice, were tested on seven members of the order Poales and on representatives of the Arecales, Asparagales, and Zingiberales monocot orders. The single-copy amplification success rates of Allium (31.3%), Cynodon (31.4%), Hordeum (30.2%), Musa (37.5%), Oryza (61.5%), Pennisetum (33.3%), Sorghum (47.9%), Zea (33.3%), Triticum (30.2%), and representatives of the palm family (32.3%) suggest that subsets of these primers will provide DNA markers suitable for comparative and translational genomics in orphan crops, as well as for applications in conservation biology, ecology, invasion biology, population biology, systematic biology, and related fields.

  7. Identification of DNA Methyltransferase Genes in Human Pathogenic Bacteria by Comparative Genomics.

    PubMed

    Brambila-Tapia, Aniel Jessica Leticia; Poot-Hernández, Augusto Cesar; Perez-Rueda, Ernesto; Rodríguez-Vázquez, Katya

    2016-06-01

    DNA methylation plays an important role in gene expression and virulence in some pathogenic bacteria. In this report, we describe DNA methyltransferases (MTases) present in human pathogenic bacteria and compared them with related species, which are not pathogenic or less pathogenic, based in comparative genomics. We performed a search in the KEGG database of the KEGG database orthology groups associated with adenine and cytosine DNA MTase activities (EC: 2.1.1.37, EC: 2.1.1.113 and EC: 2.1.1.72) in 37 human pathogenic species and 18 non/less pathogenic relatives and performed comparisons of the number of these MTases sequences according to their genome size, the DNA MTase type and with their non-less pathogenic relatives. We observed that Helicobacter pylori and Neisseria spp. presented the highest number of MTases while ten different species did not present a predicted DNA MTase. We also detected a significant increase of adenine MTases over cytosine MTases (2.19 vs. 1.06, respectively, p < 0.001). Adenine MTases were the only MTases associated with restriction modification systems and DNA MTases associated with type I restriction modification systems were more numerous than those associated with type III restriction modification systems (0.84 vs. 0.17, p < 0.001); additionally, there was no correlation with the genome size and the total number of DNA MTases, indicating that the number of DNA MTases is related to the particular evolution and lifestyle of specific species, regulating the expression of virulence genes in some pathogenic bacteria.

  8. Alignment of Common Wheat and Other Grass Genomes Establishes a Comparative Genomics Research Platform

    PubMed Central

    Sun, Sangrong; Wang, Jinpeng; Yu, Jigao; Meng, Fanbo; Xia, Ruiyan; Wang, Li; Wang, Zhenyi; Ge, Weina; Liu, Xiaojian; Li, Yuxian; Liu, Yinzhe; Yang, Nanshan; Wang, Xiyin

    2017-01-01

    Grass genomes are complicated structures as they share a common tetraploidization, and particular genomes have been further affected by extra polyploidizations. These events and the following genomic re-patternings have resulted in a complex, interweaving gene homology both within a genome, and between genomes. Accurately deciphering the structure of these complicated plant genomes would help us better understand their compositional and functional evolution at multiple scales. Here, we build on our previous research by performing a hierarchical alignment of the common wheat genome vis-à-vis eight other sequenced grass genomes with most up-to-date assemblies, and annotations. With this data, we constructed a list of the homologous genes, and then, in a layer-by-layer process, separated their orthology, and paralogy that were established by speciations and recursive polyploidizations, respectively. Compared with the other grasses, the far fewer collinear outparalogous genes within each of three subgenomes of common wheat suggest that homoeologous recombination, and genomic fractionation should have occurred after its formation. In sum, this work contributes to the establishment of an important and timely comparative genomics platform for researchers in the grass community and possibly beyond. Homologous gene list can be found in Supplemental material. PMID:28912789

  9. Enhanced annotations and features for comparing thousands of Pseudomonas genomes in the Pseudomonas genome database.

    PubMed

    Winsor, Geoffrey L; Griffiths, Emma J; Lo, Raymond; Dhillon, Bhavjinder K; Shay, Julie A; Brinkman, Fiona S L

    2016-01-04

    The Pseudomonas Genome Database (http://www.pseudomonas.com) is well known for the application of community-based annotation approaches for producing a high-quality Pseudomonas aeruginosa PAO1 genome annotation, and facilitating whole-genome comparative analyses with other Pseudomonas strains. To aid analysis of potentially thousands of complete and draft genome assemblies, this database and analysis platform was upgraded to integrate curated genome annotations and isolate metadata with enhanced tools for larger scale comparative analysis and visualization. Manually curated gene annotations are supplemented with improved computational analyses that help identify putative drug targets and vaccine candidates or assist with evolutionary studies by identifying orthologs, pathogen-associated genes and genomic islands. The database schema has been updated to integrate isolate metadata that will facilitate more powerful analysis of genomes across datasets in the future. We continue to place an emphasis on providing high-quality updates to gene annotations through regular review of the scientific literature and using community-based approaches including a major new Pseudomonas community initiative for the assignment of high-quality gene ontology terms to genes. As we further expand from thousands of genomes, we plan to provide enhancements that will aid data visualization and analysis arising from whole-genome comparative studies including more pan-genome and population-based approaches. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.

  10. Genome-wide comparative analysis of four Indian Drosophila species.

    PubMed

    Mohanty, Sujata; Khanna, Radhika

    2017-12-01

    Comparative analysis of multiple genomes of closely or distantly related Drosophila species undoubtedly creates excitement among evolutionary biologists in exploring the genomic changes with an ecology and evolutionary perspective. We present herewith the de novo assembled whole genome sequences of four Drosophila species, D. bipectinata, D. takahashii, D. biarmipes and D. nasuta of Indian origin using Next Generation Sequencing technology on an Illumina platform along with their detailed assembly statistics. The comparative genomics analysis, e.g. gene predictions and annotations, functional and orthogroup analysis of coding sequences and genome wide SNP distribution were performed. The whole genome of Zaprionus indianus of Indian origin published earlier by us and the genome sequences of previously sequenced 12 Drosophila species available in the NCBI database were included in the analysis. The present work is a part of our ongoing genomics project of Indian Drosophila species.

  11. TEcandidates: Prediction of genomic origin of expressed Transposable Elements using RNA-seq data.

    PubMed

    Valdebenito-Maturana, Braulio; Riadi, Gonzalo

    2018-06-01

    In recent years, Transposable Elements (TEs) have been related to gene regulation. However, estimating the origin of expression of TEs through RNA-seq is complicated by multimapping reads coming from their repetitive sequences. Current approaches that address multimapping reads are focused in expression quantification and not in finding the origin of expression. Addressing the genomic origin of expressed TEs could further aid in understanding the role that TEs might have in the cell. We have developed a new pipeline called TEcandidates, based on de novo transcriptome assembly to assess the instances of TEs being expressed, along with their location, to include in downstream DE analysis. TEcandidates takes as input the RNA-seq data, the genome sequence and the TE annotation file, and returns a list of coordinates of candidate TEs being expressed, the TEs that have been removed, and the genome sequence with removed TEs as masked. This masked genome is suited to include TEs in downstream expression analysis, as the ambiguity of reads coming from TEs is significantly reduced in the mapping step of the analysis. The script which runs the pipeline can be downloaded at http://www.mobilomics.org/tecandidates/downloads or http://github.com/TEcandidates/TEcandidates. griadi@utalca.cl. Supplementary data are available at Bioinformatics online.

  12. Whole-genome sequence of the Tibetan frog Nanorana parkeri and the comparative evolution of tetrapod genomes.

    PubMed

    Sun, Yan-Bo; Xiong, Zi-Jun; Xiang, Xue-Yan; Liu, Shi-Ping; Zhou, Wei-Wei; Tu, Xiao-Long; Zhong, Li; Wang, Lu; Wu, Dong-Dong; Zhang, Bao-Lin; Zhu, Chun-Ling; Yang, Min-Min; Chen, Hong-Man; Li, Fang; Zhou, Long; Feng, Shao-Hong; Huang, Chao; Zhang, Guo-Jie; Irwin, David; Hillis, David M; Murphy, Robert W; Yang, Huan-Ming; Che, Jing; Wang, Jun; Zhang, Ya-Ping

    2015-03-17

    The development of efficient sequencing techniques has resulted in large numbers of genomes being available for evolutionary studies. However, only one genome is available for all amphibians, that of Xenopus tropicalis, which is distantly related from the majority of frogs. More than 96% of frogs belong to the Neobatrachia, and no genome exists for this group. This dearth of amphibian genomes greatly restricts genomic studies of amphibians and, more generally, our understanding of tetrapod genome evolution. To fill this gap, we provide the de novo genome of a Tibetan Plateau frog, Nanorana parkeri, and compare it to that of X. tropicalis and other vertebrates. This genome encodes more than 20,000 protein-coding genes, a number similar to that of Xenopus. Although the genome size of Nanorana is considerably larger than that of Xenopus (2.3 vs. 1.5 Gb), most of the difference is due to the respective number of transposable elements in the two genomes. The two frogs exhibit considerable conserved whole-genome synteny despite having diverged approximately 266 Ma, indicating a slow rate of DNA structural evolution in anurans. Multigenome synteny blocks further show that amphibians have fewer interchromosomal rearrangements than mammals but have a comparable rate of intrachromosomal rearrangements. Our analysis also identifies 11 Mb of anuran-specific highly conserved elements that will be useful for comparative genomic analyses of frogs. The Nanorana genome offers an improved understanding of evolution of tetrapod genomes and also provides a genomic reference for other evolutionary studies.

  13. Sockeye: A 3D Environment for Comparative Genomics

    PubMed Central

    Montgomery, Stephen B.; Astakhova, Tamara; Bilenky, Mikhail; Birney, Ewan; Fu, Tony; Hassel, Maik; Melsopp, Craig; Rak, Marcin; Robertson, A. Gordon; Sleumer, Monica; Siddiqui, Asim S.; Jones, Steven J.M.

    2004-01-01

    Comparative genomics techniques are used in bioinformatics analyses to identify the structural and functional properties of DNA sequences. As the amount of available sequence data steadily increases, the ability to perform large-scale comparative analyses has become increasingly relevant. In addition, the growing complexity of genomic feature annotation means that new approaches to genomic visualization need to be explored. We have developed a Java-based application called Sockeye that uses three-dimensional (3D) graphics technology to facilitate the visualization of annotation and conservation across multiple sequences. This software uses the Ensembl database project to import sequence and annotation information from several eukaryotic species. A user can additionally import their own custom sequence and annotation data. Individual annotation objects are displayed in Sockeye by using custom 3D models. Ensembl-derived and imported sequences can be analyzed by using a suite of multiple and pair-wise alignment algorithms. The results of these comparative analyses are also displayed in the 3D environment of Sockeye. By using the Java3D API to visualize genomic data in a 3D environment, we are able to compactly display cross-sequence comparisons. This provides the user with a novel platform for visualizing and comparing genomic feature organization. PMID:15123592

  14. Multi-targeted priming for genome-wide gene expression assays.

    PubMed

    Adomas, Aleksandra B; Lopez-Giraldez, Francesc; Clark, Travis A; Wang, Zheng; Townsend, Jeffrey P

    2010-08-17

    Complementary approaches to assaying global gene expression are needed to assess gene expression in regions that are poorly assayed by current methodologies. A key component of nearly all gene expression assays is the reverse transcription of transcribed sequences that has traditionally been performed by priming the poly-A tails on many of the transcribed genes in eukaryotes with oligo-dT, or by priming RNA indiscriminately with random hexamers. We designed an algorithm to find common sequence motifs that were present within most protein-coding genes of Saccharomyces cerevisiae and of Neurospora crassa, but that were not present within their ribosomal RNA or transfer RNA genes. We then experimentally tested whether degenerately priming these motifs with multi-targeted primers improved the accuracy and completeness of transcriptomic assays. We discovered two multi-targeted primers that would prime a preponderance of genes in the genomes of Saccharomyces cerevisiae and Neurospora crassa while avoiding priming ribosomal RNA or transfer RNA. Examining the response of Saccharomyces cerevisiae to nitrogen deficiency and profiling Neurospora crassa early sexual development, we demonstrated that using multi-targeted primers in reverse transcription led to superior performance of microarray profiling and next-generation RNA tag sequencing. Priming with multi-targeted primers in addition to oligo-dT resulted in higher sensitivity, a larger number of well-measured genes and greater power to detect differences in gene expression. Our results provide the most complete and detailed expression profiles of the yeast nitrogen starvation response and N. crassa early sexual development to date. Furthermore, our multi-targeting priming methodology for genome-wide gene expression assays provides selective targeting of multiple sequences and counter-selection against undesirable sequences, facilitating a more complete and precise assay of the transcribed sequences within the genome.

  15. Comparative genomics of grass EST libraries reveals previously uncharacterized splicing events in crop plants.

    PubMed

    Chuang, Trees-Juen; Yang, Min-Yu; Lin, Chuang-Chieh; Hsieh, Ping-Hung; Hung, Li-Yuan

    2015-02-05

    Crop plants such as rice, maize and sorghum play economically-important roles as main sources of food, fuel, and animal feed. However, current genome annotations of crop plants still suffer false-positive predictions; a more comprehensive registry of alternative splicing (AS) events is also in demand. Comparative genomics of crop plants is largely unexplored. We performed a large-scale comparative analysis (ExonFinder) of the expressed sequence tag (EST) library from nine grass plants against three crop genomes (rice, maize, and sorghum) and identified 2,879 previously-unannotated exons (i.e., novel exons) in the three crops. We validated 81% of the tested exons by RT-PCR-sequencing, supporting the effectiveness of our in silico strategy. Evolutionary analysis reveals that the novel exons, comparing with their flanking annotated ones, are generally under weaker selection pressure at the protein level, but under stronger pressure at the RNA level, suggesting that most of the novel exons also represent novel alternatively spliced variants (ASVs). However, we also observed the consistency of evolutionary rates between certain novel exons and their flanking exons, which provided further evidence of their co-occurrence in the transcripts, suggesting that previously-annotated isoforms might be subject to erroneous predictions. Our validation showed that 54% of the tested genes expressed the newly-identified isoforms that contained the novel exons, rather than the previously-annotated isoforms that excluded them. The consistent results were steadily observed across cultivated (Oryza sativa and O. glaberrima) and wild (O. rufipogon and O. nivara) rice species, asserting the necessity of our curation of the crop genome annotations. Our comparative analyses also inferred the common ancestral transcriptome of grass plants and gain- and loss-of-ASV events. We have reannotated the rice, maize, and sorghum genomes, and showed that evolutionary rates might serve as an indicator

  16. Sinbase: an integrated database to study genomics, genetics and comparative genomics in Sesamum indicum.

    PubMed

    Wang, Linhai; Yu, Jingyin; Li, Donghua; Zhang, Xiurong

    2015-01-01

    Sesame (Sesamum indicum L.) is an ancient and important oilseed crop grown widely in tropical and subtropical areas. It belongs to the gigantic order Lamiales, which includes many well-known or economically important species, such as olive (Olea europaea), leonurus (Leonurus japonicus) and lavender (Lavandula spica), many of which have important pharmacological properties. Despite their importance, genetic and genomic analyses on these species have been insufficient due to a lack of reference genome information. The now available S. indicum genome will provide an unprecedented opportunity for studying both S. indicum genetic traits and comparative genomics. To deliver S. indicum genomic information to the worldwide research community, we designed Sinbase, a web-based database with comprehensive sesame genomic, genetic and comparative genomic information. Sinbase includes sequences of assembled sesame pseudomolecular chromosomes, protein-coding genes (27,148), transposable elements (372,167) and non-coding RNAs (1,748). In particular, Sinbase provides unique and valuable information on colinear regions with various plant genomes, including Arabidopsis thaliana, Glycine max, Vitis vinifera and Solanum lycopersicum. Sinbase also provides a useful search function and data mining tools, including a keyword search and local BLAST service. Sinbase will be updated regularly with new features, improvements to genome annotation and new genomic sequences, and is freely accessible at http://ocri-genomics.org/Sinbase/. © The Author 2014. Published by Oxford University Press on behalf of Japanese Society of Plant Physiologists. All rights reserved. For permissions, please email: journals.permissions@oup.com.

  17. Floral gene resources from basal angiosperms for comparative genomics research

    PubMed Central

    Albert, Victor A; Soltis, Douglas E; Carlson, John E; Farmerie, William G; Wall, P Kerr; Ilut, Daniel C; Solow, Teri M; Mueller, Lukas A; Landherr, Lena L; Hu, Yi; Buzgo, Matyas; Kim, Sangtae; Yoo, Mi-Jeong; Frohlich, Michael W; Perl-Treves, Rafael; Schlarbaum, Scott E; Bliss, Barbara J; Zhang, Xiaohong; Tanksley, Steven D; Oppenheimer, David G; Soltis, Pamela S; Ma, Hong; dePamphilis, Claude W; Leebens-Mack, James H

    2005-01-01

    Background The Floral Genome Project was initiated to bridge the genomic gap between the most broadly studied plant model systems. Arabidopsis and rice, although now completely sequenced and under intensive comparative genomic investigation, are separated by at least 125 million years of evolutionary time, and cannot in isolation provide a comprehensive perspective on structural and functional aspects of flowering plant genome dynamics. Here we discuss new genomic resources available to the scientific community, comprising cDNA libraries and Expressed Sequence Tag (EST) sequences for a suite of phylogenetically basal angiosperms specifically selected to bridge the evolutionary gaps between model plants and provide insights into gene content and genome structure in the earliest flowering plants. Results Random sequencing of cDNAs from representatives of phylogenetically important eudicot, non-grass monocot, and gymnosperm lineages has so far (as of 12/1/04) generated 70,514 ESTs and 48,170 assembled unigenes. Efficient sorting of EST sequences into putative gene families based on whole Arabidopsis/rice proteome comparison has permitted ready identification of cDNA clones for finished sequencing. Preliminarily, (i) proportions of functional categories among sequenced floral genes seem representative of the entire Arabidopsis transcriptome, (ii) many known floral gene homologues have been captured, and (iii) phylogenetic analyses of ESTs are providing new insights into the process of gene family evolution in relation to the origin and diversification of the angiosperms. Conclusion Initial comparisons illustrate the utility of the EST data sets toward discovery of the basic floral transcriptome. These first findings also afford the opportunity to address a number of conspicuous evolutionary genomic questions, including reproductive organ transcriptome overlap between angiosperms and gymnosperms, genome-wide duplication history, lineage-specific gene duplication and

  18. openSputnik--a database to ESTablish comparative plant genomics using unsaturated sequence collections.

    PubMed

    Rudd, Stephen

    2005-01-01

    The public expressed sequence tag collections are continually being enriched with high-quality sequences that represent an ever-expanding range of taxonomically diverse plant species. While these sequence collections provide biased insight into the populations of expressed genes available within individual species and their associated tissues, the information is conceivably of wider relevance in a comparative context. When we consider the available expressed sequence tag (EST) collections of summer 2004, most of the major plant taxonomic clades are at least superficially represented. Investigation of the five million available plant ESTs provides a wealth of information that has applications in modelling the routes of plant genome evolution and the identification of lineage-specific genes and gene families. Over four million ESTs from over 50 distinct plant species have been collated within an EST analysis pipeline called openSputnik. The ESTs were resolved down into approximately one million unigene sequences. These have been annotated using orthology-based annotation transfer from reference plant genomes and using a variety of contemporary bioinformatics methods to assign peptide, structural and functional attributes. The openSputnik database is available at http://sputnik.btk.fi.

  19. Analysis of Epstein-Barr Virus Genomes and Expression Profiles in Gastric Adenocarcinoma.

    PubMed

    Borozan, Ivan; Zapatka, Marc; Frappier, Lori; Ferretti, Vincent

    2018-01-15

    Epstein-Barr virus (EBV) is a causative agent of a variety of lymphomas, nasopharyngeal carcinoma (NPC), and ∼9% of gastric carcinomas (GCs). An important question is whether particular EBV variants are more oncogenic than others, but conclusions are currently hampered by the lack of sequenced EBV genomes. Here, we contribute to this question by mining whole-genome sequences of 201 GCs to identify 13 EBV-positive GCs and by assembling 13 new EBV genome sequences, almost doubling the number of available GC-derived EBV genome sequences and providing the first non-Asian EBV genome sequences from GC. Whole-genome sequence comparisons of all EBV isolates sequenced to date (85 from tumors and 57 from healthy individuals) showed that most GC and NPC EBV isolates were closely related although American Caucasian GC samples were more distant, suggesting a geographical component. However, EBV GC isolates were found to contain some consistent changes in protein sequences regardless of geographical origin. In addition, transcriptome data available for eight of the EBV-positive GCs were analyzed to determine which EBV genes are expressed in GC. In addition to the expected latency proteins (EBNA1, LMP1, and LMP2A), specific subsets of lytic genes were consistently expressed that did not reflect a typical lytic or abortive lytic infection, suggesting a novel mechanism of EBV gene regulation in the context of GC. These results are consistent with a model in which a combination of specific latent and lytic EBV proteins promotes tumorigenesis. IMPORTANCE Epstein-Barr virus (EBV) is a widespread virus that causes cancer, including gastric carcinoma (GC), in a small subset of individuals. An important question is whether particular EBV variants are more cancer associated than others, but more EBV sequences are required to address this question. Here, we have generated 13 new EBV genome sequences from GC, almost doubling the number of EBV sequences from GC isolates and providing the

  20. Comparative genomics of Lactobacillus

    PubMed Central

    Kant, Ravi; Blom, Jochen; Palva, Airi; Siezen, Roland J.; de Vos, Willem M.

    2011-01-01

    Summary The genus Lactobacillus includes a diverse group of bacteria consisting of many species that are associated with fermentations of plants, meat or milk. In addition, various lactobacilli are natural inhabitants of the intestinal tract of humans and other animals. Finally, several Lactobacillus strains are marketed as probiotics as their consumption can confer a health benefit to host. Presently, 154 Lactobacillus species are known and a growing fraction of these are subject to draft genome sequencing. However, complete genome sequences are needed to provide a platform for detailed genomic comparisons. Therefore, we selected a total of 20 genomes of various Lactobacillus strains for which complete genomic sequences have been reported. These genomes had sizes varying from 1.8 to 3.3 Mb and other characteristic features, such as G+C content that ranged from 33% to 51%. The Lactobacillus pan genome was found to consist of approximately 14 000 protein‐encoding genes while all 20 genomes shared a total of 383 sets of orthologous genes that defined the Lactobacillus core genome (LCG). Based on advanced phylogeny of the proteins encoded by this LCG, we grouped the 20 strains into three main groups and defined core group genes present in all genomes of a single group, signature group genes shared in all genomes of one group but absent in all other Lactobacillus genomes, and Group‐specific ORFans present in core group genes of one group and absent in all other complete genomes. The latter are of specific value in defining the different groups of genomes. The study provides a platform for present individual comparisons as well as future analysis of new Lactobacillus genomes. PMID:21375712

  1. Comparative genomics meets topology: a novel view on genome median and halving problems.

    PubMed

    Alexeev, Nikita; Avdeyev, Pavel; Alekseyev, Max A

    2016-11-11

    Genome median and genome halving are combinatorial optimization problems that aim at reconstruction of ancestral genomes by minimizing the number of evolutionary events between them and genomes of the extant species. While these problems have been widely studied in past decades, their solutions are often either not efficient or not biologically adequate. These shortcomings have been recently addressed by restricting the problems solution space. We show that the restricted variants of genome median and halving problems are, in fact, closely related. We demonstrate that these problems have a neat topological interpretation in terms of embedded graphs and polygon gluings. We illustrate how such interpretation can lead to solutions to these problems in particular cases. This study provides an unexpected link between comparative genomics and topology, and demonstrates advantages of solving genome median and halving problems within the topological framework.

  2. Comparative Genomics and Host Resistance against Infectious Diseases

    PubMed Central

    Qureshi, Salman T.; Skamene, Emil

    1999-01-01

    The large size and complexity of the human genome have limited the identification and functional characterization of components of the innate immune system that play a critical role in front-line defense against invading microorganisms. However, advances in genome analysis (including the development of comprehensive sets of informative genetic markers, improved physical mapping methods, and novel techniques for transcript identification) have reduced the obstacles to discovery of novel host resistance genes. Study of the genomic organization and content of widely divergent vertebrate species has shown a remarkable degree of evolutionary conservation and enables meaningful cross-species comparison and analysis of newly discovered genes. Application of comparative genomics to host resistance will rapidly expand our understanding of human immune defense by facilitating the translation of knowledge acquired through the study of model organisms. We review the rationale and resources for comparative genomic analysis and describe three examples of host resistance genes successfully identified by this approach. PMID:10081670

  3. p53 shapes genome-wide and cell type-specific changes in microRNA expression during the human DNA damage response.

    PubMed

    Hattori, Hiroyoshi; Janky, Rekin's; Nietfeld, Wilfried; Aerts, Stein; Madan Babu, M; Venkitaraman, Ashok R

    2014-01-01

    The human DNA damage response (DDR) triggers profound changes in gene expression, whose nature and regulation remain uncertain. Although certain micro-(mi)RNA species including miR34, miR-18, miR-16 and miR-143 have been implicated in the DDR, there is as yet no comprehensive description of genome-wide changes in the expression of miRNAs triggered by DNA breakage in human cells. We have used next-generation sequencing (NGS), combined with rigorous integrative computational analyses, to describe genome-wide changes in the expression of miRNAs during the human DDR. The changes affect 150 of 1523 miRNAs known in miRBase v18 from 4-24 h after the induction of DNA breakage, in cell-type dependent patterns. The regulatory regions of the most-highly regulated miRNA species are enriched in conserved binding sites for p53. Indeed, genome-wide changes in miRNA expression during the DDR are markedly altered in TP53-/- cells compared to otherwise isogenic controls. The expression levels of certain damage-induced, p53-regulated miRNAs in cancer samples correlate with patient survival. Our work reveals genome-wide and cell type-specific alterations in miRNA expression during the human DDR, which are regulated by the tumor suppressor protein p53. These findings provide a genomic resource to identify new molecules and mechanisms involved in the DDR, and to examine their role in tumor suppression and the clinical outcome of cancer patients.

  4. Using comparative genome analysis to identify problems in annotated microbial genomes.

    PubMed

    Poptsova, Maria S; Gogarten, J Peter

    2010-07-01

    Genome annotation is a tedious task that is mostly done by automated methods; however, the accuracy of these approaches has been questioned since the beginning of the sequencing era. Genome annotation is a multilevel process, and errors can emerge at different stages: during sequencing, as a result of gene-calling procedures, and in the process of assigning gene functions. Missed or wrongly annotated genes differentially impact different types of analyses. Here we discuss and demonstrate how the methods of comparative genome analysis can refine annotations by locating missing orthologues. We also discuss possible reasons for errors and show that the second-generation annotation systems, which combine multiple gene-calling programs with similarity-based methods, perform much better than the first annotation tools. Since old errors may propagate to the newly sequenced genomes, we emphasize that the problem of continuously updating popular public databases is an urgent and unresolved one. Due to the progress in genome-sequencing technologies, automated annotation techniques will remain the main approach in the future. Researchers need to be aware of the existing errors in the annotation of even well-studied genomes, such as Escherichia coli, and consider additional quality control for their results.

  5. Digital gene expression analysis of the zebra finch genome

    PubMed Central

    2010-01-01

    Background In order to understand patterns of adaptation and molecular evolution it is important to quantify both variation in gene expression and nucleotide sequence divergence. Gene expression profiling in non-model organisms has recently been facilitated by the advent of massively parallel sequencing technology. Here we investigate tissue specific gene expression patterns in the zebra finch (Taeniopygia guttata) with special emphasis on the genes of the major histocompatibility complex (MHC). Results Almost 2 million 454-sequencing reads from cDNA of six different tissues were assembled and analysed. A total of 11,793 zebra finch transcripts were represented in this EST data, indicating a transcriptome coverage of about 65%. There was a positive correlation between the tissue specificity of gene expression and non-synonymous to synonymous nucleotide substitution ratio of genes, suggesting that genes with a specialised function are evolving at a higher rate (or with less constraint) than genes with a more general function. In line with this, there was also a negative correlation between overall expression levels and expression specificity of contigs. We found evidence for expression of 10 different genes related to the MHC. MHC genes showed relatively tissue specific expression levels and were in general primarily expressed in spleen. Several MHC genes, including MHC class I also showed expression in brain. Furthermore, for all genes with highest levels of expression in spleen there was an overrepresentation of several gene ontology terms related to immune function. Conclusions Our study highlights the usefulness of next-generation sequence data for quantifying gene expression in the genome as a whole as well as in specific candidate genes. Overall, the data show predicted patterns of gene expression profiles and molecular evolution in the zebra finch genome. Expression of MHC genes in particular, corresponds well with expression patterns in other vertebrates

  6. Whole genomic DNA sequencing and comparative genomic analysis of Arthrospira platensis: high genome plasticity and genetic diversity

    PubMed Central

    Xu, Teng; Qin, Song; Hu, Yongwu; Song, Zhijian; Ying, Jianchao; Li, Peizhen; Dong, Wei; Zhao, Fangqing; Yang, Huanming; Bao, Qiyu

    2016-01-01

    Arthrospira platensis is a multi-cellular and filamentous non-N2-fixing cyanobacterium that is capable of performing oxygenic photosynthesis. In this study, we determined the nearly complete genome sequence of A. platensis YZ. A. platensis YZ genome is a single, circular chromosome of 6.62 Mb in size. Phylogenetic and comparative genomic analyses revealed that A. platensis YZ was more closely related to A. platensis NIES-39 than Arthrospira sp. PCC 8005 and A. platensis C1. Broad gene gains were identified between A. platensis YZ and three other Arthrospira speices, some of which have been previously demonstrated that can be laterally transferred among different species, such as restriction-modification systems-coding genes. Moreover, unprecedented extensive chromosomal rearrangements among different strains were observed. The chromosomal rearrangements, particularly the chromosomal inversions, were analysed and estimated to be closely related to palindromes that involved long inverted repeat sequences and the extensively distributed type IIR restriction enzyme in the Arthrospira genome. In addition, species from genus Arthrospira unanimously contained the highest rate of repetitive sequence compared with the other species of order Oscillatoriales, suggested that sequence duplication significantly contributed to Arthrospira genome phylogeny. These results provided in-depth views into the genomic phylogeny and structural variation of A. platensis, as well as provide a valuable resource for functional genomics studies. PMID:27330141

  7. Low-pass sequencing for microbial comparative genomics

    PubMed Central

    Goo, Young Ah; Roach, Jared; Glusman, Gustavo; Baliga, Nitin S; Deutsch, Kerry; Pan, Min; Kennedy, Sean; DasSarma, Shiladitya; Victor Ng, Wailap; Hood, Leroy

    2004-01-01

    Background We studied four extremely halophilic archaea by low-pass shotgun sequencing: (1) the metabolically versatile Haloarcula marismortui; (2) the non-pigmented Natrialba asiatica; (3) the psychrophile Halorubrum lacusprofundi and (4) the Dead Sea isolate Halobaculum gomorrense. Approximately one thousand single pass genomic sequences per genome were obtained. The data were analyzed by comparative genomic analyses using the completed Halobacterium sp. NRC-1 genome as a reference. Low-pass shotgun sequencing is a simple, inexpensive, and rapid approach that can readily be performed on any cultured microbe. Results As expected, the four archaeal halophiles analyzed exhibit both bacterial and eukaryotic characteristics as well as uniquely archaeal traits. All five halophiles exhibit greater than sixty percent GC content and low isoelectric points (pI) for their predicted proteins. Multiple insertion sequence (IS) elements, often involved in genome rearrangements, were identified in H. lacusprofundi and H. marismortui. The core biological functions that govern cellular and genetic mechanisms of H. sp. NRC-1 appear to be conserved in these four other halophiles. Multiple TATA box binding protein (TBP) and transcription factor IIB (TFB) homologs were identified from most of the four shotgunned halophiles. The reconstructed molecular tree of all five halophiles shows a large divergence between these species, but with the closest relationship being between H. sp. NRC-1 and H. lacusprofundi. Conclusion Despite the diverse habitats of these species, all five halophiles share (1) high GC content and (2) low protein isoelectric points, which are characteristics associated with environmental exposure to UV radiation and hypersalinity, respectively. Identification of multiple IS elements in the genome of H. lacusprofundi and H. marismortui suggest that genome structure and dynamic genome reorganization might be similar to that previously observed in the IS-element rich

  8. Cognitive Endophenotypes Inform Genome-Wide Expression Profiling in Schizophrenia

    PubMed Central

    Zheutlin, Amanda B.; Viehman, Rachael W.; Fortgang, Rebecca; Borg, Jacqueline; Smith, Desmond J.; Suvisaari, Jaana; Therman, Sebastian; Hultman, Christina M.; Cannon, Tyrone D.

    2015-01-01

    OBJECTIVE We performed a whole-genome expression study to clarify the nature of the biological processes mediating between inherited genetic variations and cognitive dysfunction in schizophrenia. METHOD Gene expression was assayed from peripheral blood mononuclear cells using Illumina Human WG6 v3.0 chips in twins discordant for schizophrenia or bipolar disorder and control twins. After quality control, expression levels of 18,559 genes were screened for association with California Verbal Learning Test (CVLT) performance, and any memory-related probes were then evaluated for variation by diagnostic status in the discovery sample (N = 190), and in an independent replication sample (N = 73). Heritability of gene expression using the twin design was also assessed. RESULTS After Bonferroni correction (p < 2.69 × 10−6), CVLT performance was significantly related to expression levels for 76 genes, 43 of which were differentially expressed in schizophrenia patients, with comparable effect sizes in the same direction in the replication sample. For 41 of these 43 transcripts, expression levels were heritable. Nearly all identified genes contain common or de novo mutations associated with schizophrenia in prior studies. CONCLUSION Genes increasing risk for schizophrenia appear to do so in part via effects on signaling cascades influencing memory. The genes implicated in these processes are enriched for those related to RNA processing and DNA replication and include genes influencing G-protein coupled signal transduction, cytokine signaling, and oligodendrocyte function. PMID:26710095

  9. Cognitive endophenotypes inform genome-wide expression profiling in schizophrenia.

    PubMed

    Zheutlin, Amanda B; Viehman, Rachael W; Fortgang, Rebecca; Borg, Jacqueline; Smith, Desmond J; Suvisaari, Jaana; Therman, Sebastian; Hultman, Christina M; Cannon, Tyrone D

    2016-01-01

    We performed a whole-genome expression study to clarify the nature of the biological processes mediating between inherited genetic variations and cognitive dysfunction in schizophrenia. Gene expression was assayed from peripheral blood mononuclear cells using Illumina Human WG6 v3.0 chips in twins discordant for schizophrenia or bipolar disorder and control twins. After quality control, expression levels of 18,559 genes were screened for association with the California Verbal Learning Test (CVLT) performance, and any memory-related probes were then evaluated for variation by diagnostic status in the discovery sample (N = 190), and in an independent replication sample (N = 73). Heritability of gene expression using the twin design was also assessed. After Bonferroni correction (p < 2.69 × 10-6), CVLT performance was significantly related to expression levels for 76 genes, 43 of which were differentially expressed in schizophrenia patients, with comparable effect sizes in the same direction in the replication sample. For 41 of these 43 transcripts, expression levels were heritable. Nearly all identified genes contain common or de novo mutations associated with schizophrenia in prior studies. Genes increasing risk for schizophrenia appear to do so in part via effects on signaling cascades influencing memory. The genes implicated in these processes are enriched for those related to RNA processing and DNA replication and include genes influencing G-protein coupled signal transduction, cytokine signaling, and oligodendrocyte function. (c) 2015 APA, all rights reserved).

  10. Inferring the Minimal Genome of Mesoplasma florum by Comparative Genomics and Transposon Mutagenesis.

    PubMed

    Baby, Vincent; Lachance, Jean-Christophe; Gagnon, Jules; Lucier, Jean-François; Matteau, Dominick; Knight, Tom; Rodrigue, Sébastien

    2018-01-01

    The creation and comparison of minimal genomes will help better define the most fundamental mechanisms supporting life. Mesoplasma florum is a near-minimal, fast-growing, nonpathogenic bacterium potentially amenable to genome reduction efforts. In a comparative genomic study of 13 M. florum strains, including 11 newly sequenced genomes, we have identified the core genome and open pangenome of this species. Our results show that all of the strains have approximately 80% of their gene content in common. Of the remaining 20%, 17% of the genes were found in multiple strains and 3% were unique to any given strain. On the basis of random transposon mutagenesis, we also estimated that ~290 out of 720 genes are essential for M. florum L1 in rich medium. We next evaluated different genome reduction scenarios for M. florum L1 by using gene conservation and essentiality data, as well as comparisons with the first working approximation of a minimal organism, Mycoplasma mycoides JCVI-syn3.0. Our results suggest that 409 of the 473 M. mycoides JCVI-syn3.0 genes have orthologs in M. florum L1. Conversely, 57 putatively essential M. florum L1 genes have no homolog in M. mycoides JCVI-syn3.0. This suggests differences in minimal genome compositions, even for these evolutionarily closely related bacteria. IMPORTANCE The last years have witnessed the development of whole-genome cloning and transplantation methods and the complete synthesis of entire chromosomes. Recently, the first minimal cell, Mycoplasma mycoides JCVI-syn3.0, was created. Despite these milestone achievements, several questions remain to be answered. For example, is the composition of minimal genomes virtually identical in phylogenetically related species? On the basis of comparative genomics and transposon mutagenesis, we investigated this question by using an alternative model, Mesoplasma florum, that is also amenable to genome reduction efforts. Our results suggest that the creation of additional minimal

  11. COMPARISON OF COMPARATIVE GENOMIC HYBRIDIZATIONS TECHNOLOGIES ACROSS MICROARRAY PLATFORMS

    EPA Science Inventory

    Comparative Genomic Hybridization (CGH) measures DNA copy number differences between a reference genome and a test genome. The DNA samples are differentially labeled and hybridized to an immobilized substrate. In early CGH experiments, the DNA targets were hybridized to metaphase...

  12. Genome wide transcriptome profiling reveals differential gene expression in secondary metabolite pathway of Cymbopogon winterianus.

    PubMed

    Devi, Kamalakshi; Mishra, Surajit K; Sahu, Jagajjit; Panda, Debashis; Modi, Mahendra K; Sen, Priyabrata

    2016-02-15

    Advances in transcriptome sequencing provide fast, cost-effective and reliable approach to generate large expression datasets especially suitable for non-model species to identify putative genes, key pathway and regulatory mechanism. Citronella (Cymbopogon winterianus) is an aromatic medicinal grass used for anti-tumoral, antibacterial, anti-fungal, antiviral, detoxifying and natural insect repellent properties. Despite of having number of utilities, the genes involved in terpenes biosynthetic pathway is not yet clearly elucidated. The present study is a pioneering attempt to generate an exhaustive molecular information of secondary metabolite pathway and to increase genomic resources in Citronella. Using high-throughput RNA-Seq technology, root and leaf transcriptome was analysed at an unprecedented depth (11.7 Gb). Targeted searches identified majority of the genes associated with metabolic pathway and other natural product pathway viz. antibiotics synthesis along with many novel genes. Terpenoid biosynthesis genes comparative expression results were validated for 15 unigenes by RT-PCR and qRT-PCR. Thus the coverage of these transcriptome is comprehensive enough to discover all known genes of major metabolic pathways. This transcriptome dataset can serve as important public information for gene expression, genomics and function genomics studies in Citronella and shall act as a benchmark for future improvement of the crop.

  13. Comparative genomics of biotechnologically important yeasts

    PubMed Central

    Riley, Robert; Haridas, Sajeet; Wolfe, Kenneth H.; Lopes, Mariana R.; Hittinger, Chris Todd; Göker, Markus; Salamov, Asaf A.; Wisecaver, Jennifer H.; Long, Tanya M.; Aerts, Andrea L.; Barry, Kerrie W.; Choi, Cindy; Clum, Alicia; Coughlan, Aisling Y.; Deshpande, Shweta; Douglass, Alexander P.; Hanson, Sara J.; Klenk, Hans-Peter; LaButti, Kurt M.; Lapidus, Alla; Lindquist, Erika A.; Lipzen, Anna M.; Meier-Kolthoff, Jan P.; Ohm, Robin A.; Otillar, Robert P.; Pangilinan, Jasmyn L.; Peng, Yi; Rosa, Carlos A.; Scheuner, Carmen; Sibirny, Andriy A.; Slot, Jason C.; Stielow, J. Benjamin; Sun, Hui; Kurtzman, Cletus P.; Blackwell, Meredith; Grigoriev, Igor V.

    2016-01-01

    Ascomycete yeasts are metabolically diverse, with great potential for biotechnology. Here, we report the comparative genome analysis of 29 taxonomically and biotechnologically important yeasts, including 16 newly sequenced. We identify a genetic code change, CUG-Ala, in Pachysolen tannophilus in the clade sister to the known CUG-Ser clade. Our well-resolved yeast phylogeny shows that some traits, such as methylotrophy, are restricted to single clades, whereas others, such as l-rhamnose utilization, have patchy phylogenetic distributions. Gene clusters, with variable organization and distribution, encode many pathways of interest. Genomics can predict some biochemical traits precisely, but the genomic basis of others, such as xylose utilization, remains unresolved. Our data also provide insight into early evolution of ascomycetes. We document the loss of H3K9me2/3 heterochromatin, the origin of ascomycete mating-type switching, and panascomycete synteny at the MAT locus. These data and analyses will facilitate the engineering of efficient biosynthetic and degradative pathways and gateways for genomic manipulation. PMID:27535936

  14. Ancient Duplications and Expression Divergence in the Globin Gene Superfamily of Vertebrates: Insights from the Elephant Shark Genome and Transcriptome

    PubMed Central

    Opazo, Juan C.; Toloza-Villalobos, Jessica; Burmester, Thorsten; Venkatesh, Byrappa; Storz, Jay F.

    2015-01-01

    Comparative analyses of vertebrate genomes continue to uncover a surprising diversity of genes in the globin gene superfamily, some of which have very restricted phyletic distributions despite their antiquity. Genomic analysis of the globin gene repertoire of cartilaginous fish (Chondrichthyes) should be especially informative about the duplicative origins and ancestral functions of vertebrate globins, as divergence between Chondrichthyes and bony vertebrates represents the most basal split within the jawed vertebrates. Here, we report a comparative genomic analysis of the vertebrate globin gene family that includes the complete globin gene repertoire of the elephant shark (Callorhinchus milii). Using genomic sequence data from representatives of all major vertebrate classes, integrated analyses of conserved synteny and phylogenetic relationships revealed that the last common ancestor of vertebrates possessed a repertoire of at least seven globin genes: single copies of androglobin and neuroglobin, four paralogous copies of globin X, and the single-copy progenitor of the entire set of vertebrate-specific globins. Combined with expression data, the genomic inventory of elephant shark globins yielded four especially surprising findings: 1) there is no trace of the neuroglobin gene (a highly conserved gene that is present in all other jawed vertebrates that have been examined to date), 2) myoglobin is highly expressed in heart, but not in skeletal muscle (reflecting a possible ancestral condition in vertebrates with single-circuit circulatory systems), 3) elephant shark possesses two highly divergent globin X paralogs, one of which is preferentially expressed in gonads, and 4) elephant shark possesses two structurally distinct α-globin paralogs, one of which is preferentially expressed in the brain. Expression profiles of elephant shark globin genes reveal distinct specializations of function relative to orthologs in bony vertebrates and suggest hypotheses about

  15. Comparative Genomics of Non-TNL Disease Resistance Genes from Six Plant Species.

    PubMed

    Nepal, Madhav P; Andersen, Ethan J; Neupane, Surendra; Benson, Benjamin V

    2017-09-30

    Disease resistance genes (R genes), as part of the plant defense system, have coevolved with corresponding pathogen molecules. The main objectives of this project were to identify non-Toll interleukin receptor, nucleotide-binding site, leucine-rich repeat (nTNL) genes and elucidate their evolutionary divergence across six plant genomes. Using reference sequences from Arabidopsis , we investigated nTNL orthologs in the genomes of common bean, Medicago , soybean, poplar, and rice. We used Hidden Markov Models for sequence identification, performed model-based phylogenetic analyses, visualized chromosomal positioning, inferred gene clustering, and assessed gene expression profiles. We analyzed 908 nTNL R genes in the genomes of the six plant species, and classified them into 12 subgroups based on the presence of coiled-coil (CC), nucleotide binding site (NBS), leucine rich repeat (LRR), resistance to Powdery mildew 8 (RPW8), and BED type zinc finger domains. Traditionally classified CC-NBS-LRR (CNL) genes were nested into four clades (CNL A-D) often with abundant, well-supported homogeneous subclades of Type-II R genes. CNL-D members were absent in rice, indicating a unique R gene retention pattern in the rice genome. Genomes from Arabidopsis , common bean, poplar and soybean had one chromosome without any CNL R genes. Medicago and Arabidopsis had the highest and lowest number of gene clusters, respectively. Gene expression analyses suggested unique patterns of expression for each of the CNL clades. Differential gene expression patterns of the nTNL genes were often found to correlate with number of introns and GC content, suggesting structural and functional divergence.

  16. Comparative Genomics of Non-TNL Disease Resistance Genes from Six Plant Species

    PubMed Central

    Andersen, Ethan J.; Neupane, Surendra; Benson, Benjamin V.

    2017-01-01

    Disease resistance genes (R genes), as part of the plant defense system, have coevolved with corresponding pathogen molecules. The main objectives of this project were to identify non-Toll interleukin receptor, nucleotide-binding site, leucine-rich repeat (nTNL) genes and elucidate their evolutionary divergence across six plant genomes. Using reference sequences from Arabidopsis, we investigated nTNL orthologs in the genomes of common bean, Medicago, soybean, poplar, and rice. We used Hidden Markov Models for sequence identification, performed model-based phylogenetic analyses, visualized chromosomal positioning, inferred gene clustering, and assessed gene expression profiles. We analyzed 908 nTNL R genes in the genomes of the six plant species, and classified them into 12 subgroups based on the presence of coiled-coil (CC), nucleotide binding site (NBS), leucine rich repeat (LRR), resistance to Powdery mildew 8 (RPW8), and BED type zinc finger domains. Traditionally classified CC-NBS-LRR (CNL) genes were nested into four clades (CNL A-D) often with abundant, well-supported homogeneous subclades of Type-II R genes. CNL-D members were absent in rice, indicating a unique R gene retention pattern in the rice genome. Genomes from Arabidopsis, common bean, poplar and soybean had one chromosome without any CNL R genes. Medicago and Arabidopsis had the highest and lowest number of gene clusters, respectively. Gene expression analyses suggested unique patterns of expression for each of the CNL clades. Differential gene expression patterns of the nTNL genes were often found to correlate with number of introns and GC content, suggesting structural and functional divergence. PMID:28973974

  17. Microarray-based genomic profiling reveals novel genomic aberrations in follicular lymphoma which associate with patient survival and gene expression status.

    PubMed

    Schwaenen, Carsten; Viardot, Andreas; Berger, Hilmar; Barth, Thomas F E; Bentink, Stefan; Döhner, Hartmut; Enz, Martina; Feller, Alfred C; Hansmann, Martin-Leo; Hummel, Michael; Kestler, Hans A; Klapper, Wolfram; Kreuz, Markus; Lenze, Dido; Loeffler, Markus; Möller, Peter; Müller-Hermelink, Hans-Konrad; Ott, German; Rosolowski, Maciej; Rosenwald, Andreas; Ruf, Sandra; Siebert, Reiner; Spang, Rainer; Stein, Harald; Truemper, Lorenz; Lichter, Peter; Bentz, Martin; Wessendorf, Swen

    2009-01-01

    Follicular lymphoma (FL) is characterized by a large number of chromosomal aberrations. However, their exact genomic extension and involved target genes remain to be determined. For this purpose, we used array-based intermediate-high resolution genomic profiling in combination with Affymetrix gene expression analysis. Tumor specimens from 128 FL patients were analyzed for the presence of genomic aberrations and the results were correlated to clinical data sets and mRNA expression levels. In 114 (89%) of the 128 analyzed cases, a total of 688 genomic aberrations (384 gains/amplifications and 304 losses) were detected. Frequent genomic aberrations were: -1p36 (18%), +2p15 (24%), -3q (14%), -6q (25%), +7p (19%), +7q (23%), +8q (14%), -9p (16%), -11q (15%), +12q (20%), -13q (11%), -17p (16%), +18p (18%), and +18q (28%). Critical segments of these imbalances were delineated to genomic fragments with a minimum size down to 0.2 Mb. By comparison of these with mRNA gene expression data, putative candidate genes were identified. Moreover, we found that deletions affecting the tumor suppressor gene CDKN2A/B on 9p21 were detected in nontransformed FL grade I-II. For this aberration as well as for -6q25 and -6q26, an association with inferior survival was observed.

  18. Identification of 15 candidate structured noncoding RNA motifs in fungi by comparative genomics.

    PubMed

    Li, Sanshu; Breaker, Ronald R

    2017-10-13

    With the development of rapid and inexpensive DNA sequencing, the genome sequences of more than 100 fungal species have been made available. This dataset provides an excellent resource for comparative genomics analyses, which can be used to discover genetic elements, including noncoding RNAs (ncRNAs). Bioinformatics tools similar to those used to uncover novel ncRNAs in bacteria, likewise, should be useful for searching fungal genomic sequences, and the relative ease of genetic experiments with some model fungal species could facilitate experimental validation studies. We have adapted a bioinformatics pipeline for discovering bacterial ncRNAs to systematically analyze many fungal genomes. This comparative genomics pipeline integrates information on conserved RNA sequence and structural features with alternative splicing information to reveal fungal RNA motifs that are candidate regulatory domains, or that might have other possible functions. A total of 15 prominent classes of structured ncRNA candidates were identified, including variant HDV self-cleaving ribozyme representatives, atypical snoRNA candidates, and possible structured antisense RNA motifs. Candidate regulatory motifs were also found associated with genes for ribosomal proteins, S-adenosylmethionine decarboxylase (SDC), amidase, and HexA protein involved in Woronin body formation. We experimentally confirm that the variant HDV ribozymes undergo rapid self-cleavage, and we demonstrate that the SDC RNA motif reduces the expression of SAM decarboxylase by translational repression. Furthermore, we provide evidence that several other motifs discovered in this study are likely to be functional ncRNA elements. Systematic screening of fungal genomes using a computational discovery pipeline has revealed the existence of a variety of novel structured ncRNAs. Genome contexts and similarities to known ncRNA motifs provide strong evidence for the biological and biochemical functions of some newly found ncRNA motifs

  19. ZSCAN10 expression corrects the genomic instability of iPSCs from aged donors.

    PubMed

    Skamagki, Maria; Correia, Cristina; Yeung, Percy; Baslan, Timour; Beck, Samuel; Zhang, Cheng; Ross, Christian A; Dang, Lam; Liu, Zhong; Giunta, Simona; Chang, Tzu-Pei; Wang, Joye; Ananthanarayanan, Aparna; Bohndorf, Martina; Bosbach, Benedikt; Adjaye, James; Funabiki, Hironori; Kim, Jonghwan; Lowe, Scott; Collins, James J; Lu, Chi-Wei; Li, Hu; Zhao, Rui; Kim, Kitai

    2017-09-01

    Induced pluripotent stem cells (iPSCs), which are used to produce transplantable tissues, may particularly benefit older patients, who are more likely to suffer from degenerative diseases. However, iPSCs generated from aged donors (A-iPSCs) exhibit higher genomic instability, defects in apoptosis and a blunted DNA damage response compared with iPSCs generated from younger donors. We demonstrated that A-iPSCs exhibit excessive glutathione-mediated reactive oxygen species (ROS) scavenging activity, which blocks the DNA damage response and apoptosis and permits survival of cells with genomic instability. We found that the pluripotency factor ZSCAN10 is poorly expressed in A-iPSCs and addition of ZSCAN10 to the four Yamanaka factors (OCT4, SOX2, KLF4 and c-MYC) during A-iPSC reprogramming normalizes ROS-glutathione homeostasis and the DNA damage response, and recovers genomic stability. Correcting the genomic instability of A-iPSCs will ultimately enhance our ability to produce histocompatible functional tissues from older patients' own cells that are safe for transplantation.

  20. ZSCAN10 expression corrects the genomic instability of iPSCs from aged donors

    PubMed Central

    Skamagki, Maria; Correia, Cristina; Yeung, Percy; Baslan, Timour; Beck, Samuel; Zhang, Cheng; Ross, Christian A.; Dang, Lam; Liu, Zhong; Giunta, Simona; Chang, Tzu-Pei; Wang, Joye; Ananthanarayanan, Aparna; Bohndorf, Martina; Bosbach, Benedikt; Adjaye, James; Funabiki, Hironori; Kim, Jonghwan; Lowe, Scott; Collins, James J.; Lu, Chi-Wei; Li, Hu; Zhao, Rui; Kim, Kitai

    2018-01-01

    Induced pluripotent stem cells (iPSCs), which are used to produce transplantable tissues, may particularly benefit older patients, who are more likely to suffer from degenerative diseases. However, iPSCs generated from aged donors (A-iPSCs) exhibit higher genomic instability, defects in apoptosis and a blunted DNA damage response compared with iPSCs generated from younger donors. We demonstrated that A-iPSCs exhibit excessive glutathione-mediated reactive oxygen species (ROS) scavenging activity, which blocks the DNA damage response and apoptosis and permits survival of cells with genomic instability. We found that the pluripotency factor ZSCAN10 is poorly expressed in A-iPSCs and addition of ZSCAN10 to the four Yamanaka factors (OCT4, SOX2, KLF4 and c-MYC) during A-iPSC reprogramming normalizes ROS–glutathione homeostasis and the DNA damage response, and recovers genomic stability. Correcting the genomic instability of A-iPSCs will ultimately enhance our ability to produce histocompatible functional tissues from older patients’ own cells that are safe for transplantation. PMID:28846095

  1. Sputnik: a database platform for comparative plant genomics.

    PubMed

    Rudd, Stephen; Mewes, Hans-Werner; Mayer, Klaus F X

    2003-01-01

    Two million plant ESTs, from 20 different plant species, and totalling more than one 1000 Mbp of DNA sequence, represents a formidable transcriptomic resource. Sputnik uses the potential of this sequence resource to fill some of the information gap in the un-sequenced plant genomes and to serve as the foundation for in silicio comparative plant genomics. The complexity of the individual EST collections has been reduced using optimised EST clustering techniques. Annotation of cluster sequences is performed by exploiting and transferring information from the comprehensive knowledgebase already produced for the completed model plant genome (Arabidopsis thaliana) and by performing additional state of-the-art sequence analyses relevant to today's plant biologist. Functional predictions, comparative analyses and associative annotations for 500 000 plant EST derived peptides make Sputnik (http://mips.gsf.de/proj/sputnik/) a valid platform for contemporary plant genomics.

  2. Sputnik: a database platform for comparative plant genomics

    PubMed Central

    Rudd, Stephen; Mewes, Hans-Werner; Mayer, Klaus F.X.

    2003-01-01

    Two million plant ESTs, from 20 different plant species, and totalling more than one 1000 Mbp of DNA sequence, represents a formidable transcriptomic resource. Sputnik uses the potential of this sequence resource to fill some of the information gap in the un-sequenced plant genomes and to serve as the foundation for in silicio comparative plant genomics. The complexity of the individual EST collections has been reduced using optimised EST clustering techniques. Annotation of cluster sequences is performed by exploiting and transferring information from the comprehensive knowledgebase already produced for the completed model plant genome (Arabidopsis thaliana) and by performing additional state of-the-art sequence analyses relevant to today's plant biologist. Functional predictions, comparative analyses and associative annotations for 500 000 plant EST derived peptides make Sputnik (http://mips.gsf.de/proj/sputnik/) a valid platform for contemporary plant genomics. PMID:12519965

  3. Comparative genomics of wild type yeast strains unveils important genome diversity

    PubMed Central

    Carreto, Laura; Eiriz, Maria F; Gomes, Ana C; Pereira, Patrícia M; Schuller, Dorit; Santos, Manuel AS

    2008-01-01

    Background Genome variability generates phenotypic heterogeneity and is of relevance for adaptation to environmental change, but the extent of such variability in natural populations is still poorly understood. For example, selected Saccharomyces cerevisiae strains are variable at the ploidy level, have gene amplifications, changes in chromosome copy number, and gross chromosomal rearrangements. This suggests that genome plasticity provides important genetic diversity upon which natural selection mechanisms can operate. Results In this study, we have used wild-type S. cerevisiae (yeast) strains to investigate genome variation in natural and artificial environments. We have used comparative genome hybridization on array (aCGH) to characterize the genome variability of 16 yeast strains, of laboratory and commercial origin, isolated from vineyards and wine cellars, and from opportunistic human infections. Interestingly, sub-telomeric instability was associated with the clinical phenotype, while Ty element insertion regions determined genomic differences of natural wine fermentation strains. Copy number depletion of ASP3 and YRF1 genes was found in all wild-type strains. Other gene families involved in transmembrane transport, sugar and alcohol metabolism or drug resistance had copy number changes, which also distinguished wine from clinical isolates. Conclusion We have isolated and genotyped more than 1000 yeast strains from natural environments and carried out an aCGH analysis of 16 strains representative of distinct genotype clusters. Important genomic variability was identified between these strains, in particular in sub-telomeric regions and in Ty-element insertion sites, suggesting that this type of genome variability is the main source of genetic diversity in natural populations of yeast. The data highlights the usefulness of yeast as a model system to unravel intraspecific natural genome diversity and to elucidate how natural selection shapes the yeast genome

  4. CMG-biotools, a free workbench for basic comparative microbial genomics.

    PubMed

    Vesth, Tammi; Lagesen, Karin; Acar, Öncel; Ussery, David

    2013-01-01

    Today, there are more than a hundred times as many sequenced prokaryotic genomes than were present in the year 2000. The economical sequencing of genomic DNA has facilitated a whole new approach to microbial genomics. The real power of genomics is manifested through comparative genomics that can reveal strain specific characteristics, diversity within species and many other aspects. However, comparative genomics is a field not easily entered into by scientists with few computational skills. The CMG-biotools package is designed for microbiologists with limited knowledge of computational analysis and can be used to perform a number of analyses and comparisons of genomic data. The CMG-biotools system presents a stand-alone interface for comparative microbial genomics. The package is a customized operating system, based on Xubuntu 10.10, available through the open source Ubuntu project. The system can be installed on a virtual computer, allowing the user to run the system alongside any other operating system. Source codes for all programs are provided under GNU license, which makes it possible to transfer the programs to other systems if so desired. We here demonstrate the package by comparing and analyzing the diversity within the class Negativicutes, represented by 31 genomes including 10 genera. The analyses include 16S rRNA phylogeny, basic DNA and codon statistics, proteome comparisons using BLAST and graphical analyses of DNA structures. This paper shows the strength and diverse use of the CMG-biotools system. The system can be installed on a vide range of host operating systems and utilizes as much of the host computer as desired. It allows the user to compare multiple genomes, from various sources using standardized data formats and intuitive visualizations of results. The examples presented here clearly shows that users with limited computational experience can perform complicated analysis without much training.

  5. CMG-Biotools, a Free Workbench for Basic Comparative Microbial Genomics

    PubMed Central

    Vesth, Tammi; Lagesen, Karin; Acar, Öncel; Ussery, David

    2013-01-01

    Background Today, there are more than a hundred times as many sequenced prokaryotic genomes than were present in the year 2000. The economical sequencing of genomic DNA has facilitated a whole new approach to microbial genomics. The real power of genomics is manifested through comparative genomics that can reveal strain specific characteristics, diversity within species and many other aspects. However, comparative genomics is a field not easily entered into by scientists with few computational skills. The CMG-biotools package is designed for microbiologists with limited knowledge of computational analysis and can be used to perform a number of analyses and comparisons of genomic data. Results The CMG-biotools system presents a stand-alone interface for comparative microbial genomics. The package is a customized operating system, based on Xubuntu 10.10, available through the open source Ubuntu project. The system can be installed on a virtual computer, allowing the user to run the system alongside any other operating system. Source codes for all programs are provided under GNU license, which makes it possible to transfer the programs to other systems if so desired. We here demonstrate the package by comparing and analyzing the diversity within the class Negativicutes, represented by 31 genomes including 10 genera. The analyses include 16S rRNA phylogeny, basic DNA and codon statistics, proteome comparisons using BLAST and graphical analyses of DNA structures. Conclusion This paper shows the strength and diverse use of the CMG-biotools system. The system can be installed on a vide range of host operating systems and utilizes as much of the host computer as desired. It allows the user to compare multiple genomes, from various sources using standardized data formats and intuitive visualizations of results. The examples presented here clearly shows that users with limited computational experience can perform complicated analysis without much training. PMID:23577086

  6. Whole genome comparative studies between chicken and turkey and their implications for avian genome evolution

    PubMed Central

    Griffin, Darren K; Robertson, Lindsay B; Tempest, Helen G; Vignal, Alain; Fillon, Valérie; Crooijmans, Richard PMA; Groenen, Martien AM; Deryusheva, Svetlana; Gaginskaya, Elena; Carré, Wilfrid; Waddington, David; Talbot, Richard; Völker, Martin; Masabanda, Julio S; Burt, Dave W

    2008-01-01

    Background Comparative genomics is a powerful means of establishing inter-specific relationships between gene function/location and allows insight into genomic rearrangements, conservation and evolutionary phylogeny. The availability of the complete sequence of the chicken genome has initiated the development of detailed genomic information in other birds including turkey, an agriculturally important species where mapping has hitherto focused on linkage with limited physical information. No molecular study has yet examined conservation of avian microchromosomes, nor differences in copy number variants (CNVs) between birds. Results We present a detailed comparative cytogenetic map between chicken and turkey based on reciprocal chromosome painting and mapping of 338 chicken BACs to turkey metaphases. Two inter-chromosomal changes (both involving centromeres) and three pericentric inversions have been identified between chicken and turkey; and array CGH identified 16 inter-specific CNVs. Conclusion This is the first study to combine the modalities of zoo-FISH and array CGH between different avian species. The first insight into the conservation of microchromosomes, the first comparative cytogenetic map of any bird and the first appraisal of CNVs between birds is provided. Results suggest that avian genomes have remained relatively stable during evolution compared to mammalian equivalents. PMID:18410676

  7. Complete nucleotide sequence of the Cryptomeria japonica D. Don. chloroplast genome and comparative chloroplast genomics: diversified genomic structure of coniferous species.

    PubMed

    Hirao, Tomonori; Watanabe, Atsushi; Kurita, Manabu; Kondo, Teiji; Takata, Katsuhiko

    2008-06-23

    The recent determination of complete chloroplast (cp) genomic sequences of various plant species has enabled numerous comparative analyses as well as advances in plant and genome evolutionary studies. In angiosperms, the complete cp genome sequences of about 70 species have been determined, whereas those of only three gymnosperm species, Cycas taitungensis, Pinus thunbergii, and Pinus koraiensis have been established. The lack of information regarding the gene content and genomic structure of gymnosperm cp genomes may severely hamper further progress of plant and cp genome evolutionary studies. To address this need, we report here the complete nucleotide sequence of the cp genome of Cryptomeria japonica, the first in the Cupressaceae sensu lato of gymnosperms, and provide a comparative analysis of their gene content and genomic structure that illustrates the unique genomic features of gymnosperms. The C. japonica cp genome is 131,810 bp in length, with 112 single copy genes and two duplicated (trnI-CAU, trnQ-UUG) genes that give a total of 116 genes. Compared to other land plant cp genomes, the C. japonica cp has lost one of the relevant large inverted repeats (IRs) found in angiosperms, fern, liverwort, and gymnosperms, such as Cycas and Gingko, and additionally has completely lost its trnR-CCG, partially lost its trnT-GGU, and shows diversification of accD. The genomic structure of the C. japonica cp genome also differs significantly from those of other plant species. For example, we estimate that a minimum of 15 inversions would be required to transform the gene organization of the Pinus thunbergii cp genome into that of C. japonica. In the C. japonica cp genome, direct repeat and inverted repeat sequences are observed at the inversion and translocation endpoints, and these sequences may be associated with the genomic rearrangements. The observed differences in genomic structure between C. japonica and other land plants, including pines, strongly support the

  8. Comparative Genomics of a Parthenogenesis-Inducing Wolbachia Symbiont

    PubMed Central

    Lindsey, Amelia R. I.; Werren, John H.; Richards, Stephen; Stouthamer, Richard

    2016-01-01

    Wolbachia is an intracellular symbiont of invertebrates responsible for inducing a wide variety of phenotypes in its host. These host-Wolbachia relationships span the continuum from reproductive parasitism to obligate mutualism, and provide a unique system to study genomic changes associated with the evolution of symbiosis. We present the genome sequence from a parthenogenesis-inducing Wolbachia strain (wTpre) infecting the minute parasitoid wasp Trichogramma pretiosum. The wTpre genome is the most complete parthenogenesis-inducing Wolbachia genome available to date. We used comparative genomics across 16 Wolbachia strains, representing five supergroups, to identify a core Wolbachia genome of 496 sets of orthologous genes. Only 14 of these sets are unique to Wolbachia when compared to other bacteria from the Rickettsiales. We show that the B supergroup of Wolbachia, of which wTpre is a member, contains a significantly higher number of ankyrin repeat-containing genes than other supergroups. In the wTpre genome, there is evidence for truncation of the protein coding sequences in 20% of ORFs, mostly as a result of frameshift mutations. The wTpre strain represents a conversion from cytoplasmic incompatibility to a parthenogenesis-inducing lifestyle, and is required for reproduction in the Trichogramma host it infects. We hypothesize that the large number of coding frame truncations has accompanied the change in reproductive mode of the wTpre strain. PMID:27194801

  9. Comparative Genomics of a Parthenogenesis-Inducing Wolbachia Symbiont.

    PubMed

    Lindsey, Amelia R I; Werren, John H; Richards, Stephen; Stouthamer, Richard

    2016-07-07

    Wolbachia is an intracellular symbiont of invertebrates responsible for inducing a wide variety of phenotypes in its host. These host-Wolbachia relationships span the continuum from reproductive parasitism to obligate mutualism, and provide a unique system to study genomic changes associated with the evolution of symbiosis. We present the genome sequence from a parthenogenesis-inducing Wolbachia strain (wTpre) infecting the minute parasitoid wasp Trichogramma pretiosum The wTpre genome is the most complete parthenogenesis-inducing Wolbachia genome available to date. We used comparative genomics across 16 Wolbachia strains, representing five supergroups, to identify a core Wolbachia genome of 496 sets of orthologous genes. Only 14 of these sets are unique to Wolbachia when compared to other bacteria from the Rickettsiales. We show that the B supergroup of Wolbachia, of which wTpre is a member, contains a significantly higher number of ankyrin repeat-containing genes than other supergroups. In the wTpre genome, there is evidence for truncation of the protein coding sequences in 20% of ORFs, mostly as a result of frameshift mutations. The wTpre strain represents a conversion from cytoplasmic incompatibility to a parthenogenesis-inducing lifestyle, and is required for reproduction in the Trichogramma host it infects. We hypothesize that the large number of coding frame truncations has accompanied the change in reproductive mode of the wTpre strain. Copyright © 2016 Lindsey et al.

  10. Revealing the missing expressed genes beyond the human reference genome by RNA-Seq.

    PubMed

    Chen, Geng; Li, Ruiyuan; Shi, Leming; Qi, Junyi; Hu, Pengzhan; Luo, Jian; Liu, Mingyao; Shi, Tieliu

    2011-12-02

    The complete and accurate human reference genome is important for functional genomics researches. Therefore, the incomplete reference genome and individual specific sequences have significant effects on various studies. we used two RNA-Seq datasets from human brain tissues and 10 mixed cell lines to investigate the completeness of human reference genome. First, we demonstrated that in previously identified ~5 Mb Asian and ~5 Mb African novel sequences that are absent from the human reference genome of NCBI build 36, ~211 kb and ~201 kb of them could be transcribed, respectively. Our results suggest that many of those transcribed regions are not specific to Asian and African, but also present in Caucasian. Then, we found that the expressions of 104 RefSeq genes that are unalignable to NCBI build 37 in brain and cell lines are higher than 0.1 RPKM. 55 of them are conserved across human, chimpanzee and macaque, suggesting that there are still a significant number of functional human genes absent from the human reference genome. Moreover, we identified hundreds of novel transcript contigs that cannot be aligned to NCBI build 37, RefSeq genes and EST sequences. Some of those novel transcript contigs are also conserved among human, chimpanzee and macaque. By positioning those contigs onto the human genome, we identified several large deletions in the reference genome. Several conserved novel transcript contigs were further validated by RT-PCR. Our findings demonstrate that a significant number of genes are still absent from the incomplete human reference genome, highlighting the importance of further refining the human reference genome and curating those missing genes. Our study also shows the importance of de novo transcriptome assembly. The comparative approach between reference genome and other related human genomes based on the transcriptome provides an alternative way to refine the human reference genome.

  11. cisprimertool: software to implement a comparative genomics strategy for the development of conserved intron scanning (CIS) markers.

    PubMed

    Jayashree, B; Jagadeesh, V T; Hoisington, D

    2008-05-01

    The availability of complete, annotated genomic sequence information in model organisms is a rich resource that can be extended to understudied orphan crops through comparative genomic approaches. We report here a software tool (cisprimertool) for the identification of conserved intron scanning regions using expressed sequence tag alignments to a completely sequenced model crop genome. The method used is based on earlier studies reporting the assessment of conserved intron scanning primers (called CISP) within relatively conserved exons located near exon-intron boundaries from onion, banana, sorghum and pearl millet alignments with rice. The tool is freely available to academic users at http://www.icrisat.org/gt-bt/CISPTool.htm. © 2007 ICRISAT.

  12. EDGAR: A software framework for the comparative analysis of prokaryotic genomes

    PubMed Central

    Blom, Jochen; Albaum, Stefan P; Doppmeier, Daniel; Pühler, Alfred; Vorhölter, Frank-Jörg; Zakrzewski, Martha; Goesmann, Alexander

    2009-01-01

    Background The introduction of next generation sequencing approaches has caused a rapid increase in the number of completely sequenced genomes. As one result of this development, it is now feasible to analyze large groups of related genomes in a comparative approach. A main task in comparative genomics is the identification of orthologous genes in different genomes and the classification of genes as core genes or singletons. Results To support these studies EDGAR – "Efficient Database framework for comparative Genome Analyses using BLAST score Ratios" – was developed. EDGAR is designed to automatically perform genome comparisons in a high throughput approach. Comparative analyses for 582 genomes across 75 genus groups taken from the NCBI genomes database were conducted with the software and the results were integrated into an underlying database. To demonstrate a specific application case, we analyzed ten genomes of the bacterial genus Xanthomonas, for which phylogenetic studies were awkward due to divergent taxonomic systems. The resultant phylogeny EDGAR provided was consistent with outcomes from traditional approaches performed recently and moreover, it was possible to root each strain with unprecedented accuracy. Conclusion EDGAR provides novel analysis features and significantly simplifies the comparative analysis of related genomes. The software supports a quick survey of evolutionary relationships and simplifies the process of obtaining new biological insights into the differential gene content of kindred genomes. Visualization features, like synteny plots or Venn diagrams, are offered to the scientific community through a web-based and therefore platform independent user interface , where the precomputed data sets can be browsed. PMID:19457249

  13. Structural RNAs of known and unknown function identified in malaria parasites by comparative genomics and RNA analysis

    PubMed Central

    Chakrabarti, Kausik; Pearson, Michael; Grate, Leslie; Sterne-Weiler, Timothy; Deans, Jonathan; Donohue, John Paul; Ares, Manuel

    2007-01-01

    As the genomes of more eukaryotic pathogens are sequenced, understanding how molecular differences between parasite and host might be exploited to provide new therapies has become a major focus. Central to cell function are RNA-containing complexes involved in gene expression, such as the ribosome, the spliceosome, snoRNAs, RNase P, and telomerase, among others. In this article we identify by comparative genomics and validate by RNA analysis numerous previously unknown structural RNAs encoded by the Plasmodium falciparum genome, including the telomerase RNA, U3, 31 snoRNAs, as well as previously predicted spliceosomal snRNAs, SRP RNA, MRP RNA, and RNAse P RNA. Furthermore, we identify six new RNA coding genes of unknown function. To investigate the relationships of the RNA coding genes to other genomic features in related parasites, we developed a genome browser for P. falciparum (http://areslab.ucsc.edu/cgi-bin/hgGateway). Additional experiments provide evidence supporting the prediction that snoRNAs guide methylation of a specific position on U4 snRNA, as well as predicting an snRNA promoter element particular to Plasmodium sp. These findings should allow detailed structural comparisons between the RNA components of the gene expression machinery of the parasite and its vertebrate hosts. PMID:17901154

  14. Saccharomyces cerevisiae: gene annotation and genome variability, state of the art through comparative genomics.

    PubMed

    Louis, Ed

    2011-01-01

    In the early days of the yeast genome sequencing project, gene annotation was in its infancy and suffered the problem of many false positive annotations as well as missed genes. The lack of other sequences for comparison also prevented the annotation of conserved, functional sequences that were not coding. We are now in an era of comparative genomics where many closely related as well as more distantly related genomes are available for direct sequence and synteny comparisons allowing for more probable predictions of genes and other functional sequences due to conservation. We also have a plethora of functional genomics data which helps inform gene annotation for previously uncharacterised open reading frames (ORFs)/genes. For Saccharomyces cerevisiae this has resulted in a continuous updating of the gene and functional sequence annotations in the reference genome helping it retain its position as the best characterized eukaryotic organism's genome. A single reference genome for a species does not accurately describe the species and this is quite clear in the case of S. cerevisiae where the reference strain is not ideal for brewing or baking due to missing genes. Recent surveys of numerous isolates, from a variety of sources, using a variety of technologies have revealed a great deal of variation amongst isolates with genome sequence surveys providing information on novel genes, undetectable by other means. We now have a better understanding of the extant variation in S. cerevisiae as a species as well as some idea of how much we are missing from this understanding. As with gene annotation, comparative genomics enhances the discovery and description of genome variation and is providing us with the tools for understanding genome evolution, adaptation and selection, and underlying genetics of complex traits.

  15. The tiger genome and comparative analysis with lion and snow leopard genomes.

    PubMed

    Cho, Yun Sung; Hu, Li; Hou, Haolong; Lee, Hang; Xu, Jiaohui; Kwon, Soowhan; Oh, Sukhun; Kim, Hak-Min; Jho, Sungwoong; Kim, Sangsoo; Shin, Young-Ah; Kim, Byung Chul; Kim, Hyunmin; Kim, Chang-Uk; Luo, Shu-Jin; Johnson, Warren E; Koepfli, Klaus-Peter; Schmidt-Küntzel, Anne; Turner, Jason A; Marker, Laurie; Harper, Cindy; Miller, Susan M; Jacobs, Wilhelm; Bertola, Laura D; Kim, Tae Hyung; Lee, Sunghoon; Zhou, Qian; Jung, Hyun-Ju; Xu, Xiao; Gadhvi, Priyvrat; Xu, Pengwei; Xiong, Yingqi; Luo, Yadan; Pan, Shengkai; Gou, Caiyun; Chu, Xiuhui; Zhang, Jilin; Liu, Sanyang; He, Jing; Chen, Ying; Yang, Linfeng; Yang, Yulan; He, Jiaju; Liu, Sha; Wang, Junyi; Kim, Chul Hong; Kwak, Hwanjong; Kim, Jong-Soo; Hwang, Seungwoo; Ko, Junsu; Kim, Chang-Bae; Kim, Sangtae; Bayarlkhagva, Damdin; Paek, Woon Kee; Kim, Seong-Jin; O'Brien, Stephen J; Wang, Jun; Bhak, Jong

    2013-01-01

    Tigers and their close relatives (Panthera) are some of the world's most endangered species. Here we report the de novo assembly of an Amur tiger whole-genome sequence as well as the genomic sequences of a white Bengal tiger, African lion, white African lion and snow leopard. Through comparative genetic analyses of these genomes, we find genetic signatures that may reflect molecular adaptations consistent with the big cats' hypercarnivorous diet and muscle strength. We report a snow leopard-specific genetic determinant in EGLN1 (Met39>Lys39), which is likely to be associated with adaptation to high altitude. We also detect a TYR260G>A mutation likely responsible for the white lion coat colour. Tiger and cat genomes show similar repeat composition and an appreciably conserved synteny. Genomic data from the five big cats provide an invaluable resource for resolving easily identifiable phenotypes evident in very close, but distinct, species.

  16. The tiger genome and comparative analysis with lion and snow leopard genomes

    PubMed Central

    Cho, Yun Sung; Hu, Li; Hou, Haolong; Lee, Hang; Xu, Jiaohui; Kwon, Soowhan; Oh, Sukhun; Kim, Hak-Min; Jho, Sungwoong; Kim, Sangsoo; Shin, Young-Ah; Kim, Byung Chul; Kim, Hyunmin; Kim, Chang-uk; Luo, Shu-Jin; Johnson, Warren E.; Koepfli, Klaus-Peter; Schmidt-Küntzel, Anne; Turner, Jason A.; Marker, Laurie; Harper, Cindy; Miller, Susan M.; Jacobs, Wilhelm; Bertola, Laura D.; Kim, Tae Hyung; Lee, Sunghoon; Zhou, Qian; Jung, Hyun-Ju; Xu, Xiao; Gadhvi, Priyvrat; Xu, Pengwei; Xiong, Yingqi; Luo, Yadan; Pan, Shengkai; Gou, Caiyun; Chu, Xiuhui; Zhang, Jilin; Liu, Sanyang; He, Jing; Chen, Ying; Yang, Linfeng; Yang, Yulan; He, Jiaju; Liu, Sha; Wang, Junyi; Kim, Chul Hong; Kwak, Hwanjong; Kim, Jong-Soo; Hwang, Seungwoo; Ko, Junsu; Kim, Chang-Bae; Kim, Sangtae; Bayarlkhagva, Damdin; Paek, Woon Kee; Kim, Seong-Jin; O’Brien, Stephen J.; Wang, Jun; Bhak, Jong

    2013-01-01

    Tigers and their close relatives (Panthera) are some of the world’s most endangered species. Here we report the de novo assembly of an Amur tiger whole-genome sequence as well as the genomic sequences of a white Bengal tiger, African lion, white African lion and snow leopard. Through comparative genetic analyses of these genomes, we find genetic signatures that may reflect molecular adaptations consistent with the big cats’ hypercarnivorous diet and muscle strength. We report a snow leopard-specific genetic determinant in EGLN1 (Met39>Lys39), which is likely to be associated with adaptation to high altitude. We also detect a TYR260G>A mutation likely responsible for the white lion coat colour. Tiger and cat genomes show similar repeat composition and an appreciably conserved synteny. Genomic data from the five big cats provide an invaluable resource for resolving easily identifiable phenotypes evident in very close, but distinct, species. PMID:24045858

  17. Genome-wide identification and comparative expression analysis reveal a rapid expansion and functional divergence of duplicated genes in the WRKY gene family of cabbage, Brassica oleracea var. capitata.

    PubMed

    Yao, Qiu-Yang; Xia, En-Hua; Liu, Fei-Hu; Gao, Li-Zhi

    2015-02-15

    WRKY transcription factors (TFs), one of the ten largest TF families in higher plants, play important roles in regulating plant development and resistance. To date, little is known about the WRKY TF family in Brassica oleracea. Recently, the completed genome sequence of cabbage (B. oleracea var. capitata) allows us to systematically analyze WRKY genes in this species. A total of 148 WRKY genes were characterized and classified into seven subgroups that belong to three major groups. Phylogenetic and synteny analyses revealed that the repertoire of cabbage WRKY genes was derived from a common ancestor shared with Arabidopsis thaliana. The B. oleracea WRKY genes were found to be preferentially retained after the whole-genome triplication (WGT) event in its recent ancestor, suggesting that the WGT event had largely contributed to a rapid expansion of the WRKY gene family in B. oleracea. The analysis of RNA-Seq data from various tissues (i.e., roots, stems, leaves, buds, flowers and siliques) revealed that most of the identified WRKY genes were positively expressed in cabbage, and a large portion of them exhibited patterns of differential and tissue-specific expression, demonstrating that these gene members might play essential roles in plant developmental processes. Comparative analysis of the expression level among duplicated genes showed that gene expression divergence was evidently presented among cabbage WRKY paralogs, indicating functional divergence of these duplicated WRKY genes. Copyright © 2014 Elsevier B.V. All rights reserved.

  18. Genome-scale gene expression characteristics define the follicular initiation and developmental rules during folliculogenesis.

    PubMed

    Shi, Kerong; He, Feng; Yuan, Xuefeng; Zhao, Yaofeng; Deng, Xuemei; Hu, Xiaoxiang; Li, Ning

    2013-08-01

    The ovarian follicle supplies a unique dynamic system for gametes that ensures the propagation of the species. During folliculogenesis, the vast majority of the germ cells are lost or inactivated because of ovarian follicle atresia, resulting in diminished reproductive potency and potential infertility. Understanding the underlying molecular mechanism of folliculogenesis rules is essential. Primordial (P), preantral (M), and large antral (L) porcine follicles were used to reveal their genome-wide gene expression profiles. Results indicate that primordial follicles (P) process a diverse gene expression pattern compared to growing follicles (M and L). The 5,548 differentially expressed genes display a similar expression mode in M and L, with a correlation coefficient of 0.892. The number of regulated (both up and down) genes in M is more than that in L. Also, their regulation folds in M (2-364-fold) are much more acute than in L (2-75-fold). Differentially expressed gene groups with different regulation patterns in certain follicular stages are identified and presumed to be closely related following follicular developmental rules. Interestingly, functional annotation analysis revealed that these gene groups feature distinct biological processes or molecular functions. Moreover, representative candidate genes from these gene groups have had their RNA or protein expressions within follicles confirmed. Our study emphasized genome-scale gene expression characteristics, which provide novel entry points for understanding the folliculogenesis rules on the molecular level, such as follicular initiation, atresia, and dominance. Transcriptional regulatory circuitries in certain follicular stages are expected to be found among the identified differentially expressed gene groups.

  19. Ortholog Identification and Comparative Analysis of Microbial Genomes Using MBGD and RECOG.

    PubMed

    Uchiyama, Ikuo

    2017-01-01

    Comparative genomics is becoming an essential approach for identification of genes associated with a specific function or phenotype. Here, we introduce the microbial genome database for comparative analysis (MBGD), which is a comprehensive ortholog database among the microbial genomes available so far. MBGD contains several precomputed ortholog tables including the standard ortholog table covering the entire taxonomic range and taxon-specific ortholog tables for various major taxa. In addition, MBGD allows the users to create an ortholog table within any specified set of genomes through dynamic calculations. In particular, MBGD has a "My MBGD" mode where users can upload their original genome sequences and incorporate them into orthology analysis. The created ortholog table can serve as the basis for various comparative analyses. Here, we describe the use of MBGD and briefly explain how to utilize the orthology information during comparative genome analysis in combination with the stand-alone comparative genomics software RECOG, focusing on the application to comparison of closely related microbial genomes.

  20. Comparative scaffolding and gap filling of ancient bacterial genomes applied to two ancient Yersinia pestis genomes

    PubMed Central

    Doerr, Daniel; Chauve, Cedric

    2017-01-01

    Yersinia pestis is the causative agent of the bubonic plague, a disease responsible for several dramatic historical pandemics. Progress in ancient DNA (aDNA) sequencing rendered possible the sequencing of whole genomes of important human pathogens, including the ancient Y. pestis strains responsible for outbreaks of the bubonic plague in London in the 14th century and in Marseille in the 18th century, among others. However, aDNA sequencing data are still characterized by short reads and non-uniform coverage, so assembling ancient pathogen genomes remains challenging and often prevents a detailed study of genome rearrangements. It has recently been shown that comparative scaffolding approaches can improve the assembly of ancient Y. pestis genomes at a chromosome level. In the present work, we address the last step of genome assembly, the gap-filling stage. We describe an optimization-based method AGapEs (ancestral gap estimation) to fill in inter-contig gaps using a combination of a template obtained from related extant genomes and aDNA reads. We show how this approach can be used to refine comparative scaffolding by selecting contig adjacencies supported by a mix of unassembled aDNA reads and comparative signal. We applied our method to two Y. pestis data sets from the London and Marseilles outbreaks, for which we obtained highly improved genome assemblies for both genomes, comprised of, respectively, five and six scaffolds with 95 % of the assemblies supported by ancient reads. We analysed the genome evolution between both ancient genomes in terms of genome rearrangements, and observed a high level of synteny conservation between these strains. PMID:29114402

  1. Comparative genomics of pathogenic lineages of Vibrio nigripulchritudo identifies virulence-associated traits

    PubMed Central

    Goudenège, David; Labreuche, Yannick; Krin, Evelyne; Ansquer, Dominique; Mangenot, Sophie; Calteau, Alexandra; Médigue, Claudine; Mazel, Didier; Polz, Martin F; Le Roux, Frédérique

    2013-01-01

    Vibrio nigripulchritudo is an emerging pathogen of farmed shrimp in New Caledonia and other regions in the Indo-Pacific. The molecular determinants of V. nigripulchritudo pathogenicity are unknown; however, molecular epidemiological studies have suggested that pathogenicity is linked to particular lineages. Here, we performed high-throughput sequencing-based comparative genome analysis of 16 V. nigripulchritudo strains to explore the genomic diversity and evolutionary history of pathogen-containing lineages and to identify pathogen-specific genetic elements. Our phylogenetic analysis revealed three pathogen-containing V. nigripulchritudo clades, including two clades previously identified from New Caledonia and one novel clade comprising putatively pathogenic isolates from septicemic shrimp in Madagascar. The similar genetic distance between the three clades indicates that they have diverged from an ancestral population roughly at the same time and recombination analysis indicates that these genomes have, in the past, shared a common gene pool and exchanged genes. As each contemporary lineage is comprised of nearly identical strains, comparative genomics allowed differentiation of genetic elements specific to shrimp pathogenesis of varying severity. Notably, only a large plasmid present in all highly pathogenic (HP) strains encodes a toxin. Although less/non-pathogenic strains contain related plasmids, these are differentiated by a putative toxin locus. Expression of this gene by a non-pathogenic V. nigripulchritudo strain resulted in production of toxic culture supernatant, normally an exclusive feature of HP strains. Thus, this protein, here termed ‘nigritoxin', is implicated to an extent that remains to be precisely determined in the toxicity of V. nigripulchritudo. PMID:23739050

  2. Bamboo Flowering from the Perspective of Comparative Genomics and Transcriptomics

    PubMed Central

    Biswas, Prasun; Chakraborty, Sukanya; Dutta, Smritikana; Pal, Amita; Das, Malay

    2016-01-01

    Bamboos are an important member of the subfamily Bambusoideae, family Poaceae. The plant group exhibits wide variation with respect to the timing (1–120 years) and nature (sporadic vs. gregarious) of flowering among species. Usually flowering in woody bamboos is synchronous across culms growing over a large area, known as gregarious flowering. In many monocarpic bamboos this is followed by mass death and seed setting. While in sporadic flowering an isolated wild clump may flower, set little or no seed and remain alive. Such wide variation in flowering time and extent means that the plant group serves as repositories for genes and expression patterns that are unique to bamboo. Due to the dearth of available genomic and transcriptomic resources, limited studies have been undertaken to identify the potential molecular players in bamboo flowering. The public release of the first bamboo genome sequence Phyllostachys heterocycla, availability of related genomes Brachypodium distachyon and Oryza sativa provide us the opportunity to study this long-standing biological problem in a comparative and functional genomics framework. We identified bamboo genes homologous to those of Oryza and Brachypodium that are involved in established pathways such as vernalization, photoperiod, autonomous, and hormonal regulation of flowering. Additionally, we investigated triggers like stress (drought), physiological maturity and micro RNAs that may play crucial roles in flowering. We also analyzed available transcriptome datasets of different bamboo species to identify genes and their involvement in bamboo flowering. Finally, we summarize potential research hurdles that need to be addressed in future research. PMID:28018419

  3. Gramene 2016: comparative plant genomics and pathway resources

    USDA-ARS?s Scientific Manuscript database

    Gramene (http://www.gramene.org) is an online resource for comparative functional genomics in crops and model plant species. Its two main frameworks are genomes (collaboration with Ensembl Plants) and pathways (The Plant Reactome and archival BioCyc databases). Since our last NAR update, the data...

  4. Western environment/lifestyle is associated with increased genome methylation and decreased gene expression in Chinese immigrants living in Australia.

    PubMed

    Zhang, Guicheng; Wang, Kui; Schultz, Ennee; Khoo, Siew-Kim; Zhang, Xiaopeng; Annamalay, Alicia; Laing, Ingrid A; Hales, Belinda J; Goldblatt, Jack; Le Souëf, Peter N

    2016-01-01

    Several human diseases and conditions are disproportionally distributed in the world with a significant "Western-developed" vs. "Eastern-developing" gradient. We compared genome-wide DNA methylation of peripheral blood mononuclear cells in 25 newly arrived Chinese immigrants living in a Western environment for less than 6 months ("Newly arrived") with 23 Chinese immigrants living in the Western environment for more than two years ("Long-term") with a mean of 8.7 years, using the Infinium HumanMethylation450 BeadChip. In a sub-group of both subject groups (n = 12 each) we also investigated genome-wide gene expression using a Human HT-12 v4 expression beadChip. There were 62.5% probes among the total number of 382,250 valid CpG sites with greater mean Beta (β) in "Long-term" than in "Newly arrived". In the regions of CpG islands and gene promoters, compared with the CpG sites in all other regions, lower percentages of CpG sites with mean methylation levels in "Long-term" greater than "Newly arrived" were observed, but still >50%. The increase of methylation was associated with a general decrease of gene expression in Chinese immigrants living in the Western environment for a longer period of time. After adjusting for age, gender and other confounding factors the findings remained. Chinese immigrants living in Australia for a longer period of time have increased overall genome methylation and decreased overall gene expression compared with newly arrived immigrants. © 2015 Wiley Periodicals, Inc.

  5. Gene expression levels as endophenotypes in genome-wide association studies of Alzheimer disease

    PubMed Central

    Zou, F.; Carrasquillo, M. M.; Pankratz, V. S.; Belbin, O.; Morgan, K.; Allen, M.; Wilcox, S. L.; Ma, L.; Walker, L. P.; Kouri, N.; Burgess, J. D.; Younkin, L. H.; Younkin, Samuel G.; Younkin, C. S.; Bisceglio, G. D.; Crook, J. E.; Dickson, D. W.; Petersen, R. C.; Graff-Radford, N.; Younkin, Steven G.; Ertekin-Taner, N.

    2010-01-01

    Background: Late-onset Alzheimer disease (LOAD) is a common disorder with a substantial genetic component. We postulate that many disease susceptibility variants act by altering gene expression levels. Methods: We measured messenger RNA (mRNA) expression levels of 12 LOAD candidate genes in the cerebella of 200 subjects with LOAD. Using the genotypes from our LOAD genome-wide association study for the cis-single nucleotide polymorphisms (SNPs) (n = 619) of these 12 LOAD candidate genes, we tested for associations with expression levels as endophenotypes. The strongest expression cis-SNP was tested for AD association in 7 independent case-control series (2,280 AD and 2,396 controls). Results: We identified 3 SNPs that associated significantly with IDE (insulin degrading enzyme) expression levels. A single copy of the minor allele for each significant SNP was associated with ∼twofold higher IDE expression levels. The most significant SNP, rs7910977, is 4.2 kb beyond the 3′ end of IDE. The association observed with this SNP was significant even at the genome-wide level (p = 2.7 × 10−8). Furthermore, the minor allele of rs7910977 associated significantly (p = 0.0046) with reduced LOAD risk (OR = 0.81 with a 95% CI of 0.70-0.94), as expected biologically from its association with elevated IDE expression. Conclusions: These results provide strong evidence that IDE is a late-onset Alzheimer disease (LOAD) gene with variants that modify risk of LOAD by influencing IDE expression. They also suggest that the use of expression levels as endophenotypes in genome-wide association studies may provide a powerful approach for the identification of disease susceptibility alleles. GLOSSARY AD = Alzheimer disease; CI = confidence interval; GWAS = genome-wide association study; LOAD = late-onset Alzheimer disease; mRNA = messenger RNA; OR = odds ratio; SNP = single nucleotide polymorphism. PMID:20142614

  6. Characterization of canine osteosarcoma by array comparative genomic hybridization and RT-qPCR: signatures of genomic imbalance in canine osteosarcoma parallel the human counterpart.

    PubMed

    Angstadt, Andrea Y; Motsinger-Reif, Alison; Thomas, Rachael; Kisseberth, William C; Guillermo Couto, C; Duval, Dawn L; Nielsen, Dahlia M; Modiano, Jaime F; Breen, Matthew

    2011-11-01

    Osteosarcoma (OS) is the most commonly diagnosed malignant bone tumor in humans and dogs, characterized in both species by extremely complex karyotypes exhibiting high frequencies of genomic imbalance. Evaluation of genomic signatures in human OS using array comparative genomic hybridization (aCGH) has assisted in uncovering genetic mechanisms that result in disease phenotype. Previous low-resolution (10-20 Mb) aCGH analysis of canine OS identified a wide range of recurrent DNA copy number aberrations, indicating extensive genomic instability. In this study, we profiled 123 canine OS tumors by 1 Mb-resolution aCGH to generate a dataset for direct comparison with current data for human OS, concluding that several high frequency aberrations in canine and human OS are orthologous. To ensure complete coverage of gene annotation, we identified the human refseq genes that map to these orthologous aberrant dog regions and found several candidate genes warranting evaluation for OS involvement. Specifically, subsequenct FISH and qRT-PCR analysis of RUNX2, TUSC3, and PTEN indicated that expression levels correlated with genomic copy number status, showcasing RUNX2 as an OS associated gene and TUSC3 as a possible tumor suppressor candidate. Together these data demonstrate the ability of genomic comparative oncology to identify genetic abberations which may be important for OS progression. Large scale screening of genomic imbalance in canine OS further validates the use of the dog as a suitable model for human cancers, supporting the idea that dysregulation discovered in canine cancers will provide an avenue for complementary study in human counterparts. Copyright © 2011 Wiley-Liss, Inc.

  7. IMG: the integrated microbial genomes database and comparative analysis system

    PubMed Central

    Markowitz, Victor M.; Chen, I-Min A.; Palaniappan, Krishna; Chu, Ken; Szeto, Ernest; Grechkin, Yuri; Ratner, Anna; Jacob, Biju; Huang, Jinghua; Williams, Peter; Huntemann, Marcel; Anderson, Iain; Mavromatis, Konstantinos; Ivanova, Natalia N.; Kyrpides, Nikos C.

    2012-01-01

    The Integrated Microbial Genomes (IMG) system serves as a community resource for comparative analysis of publicly available genomes in a comprehensive integrated context. IMG integrates publicly available draft and complete genomes from all three domains of life with a large number of plasmids and viruses. IMG provides tools and viewers for analyzing and reviewing the annotations of genes and genomes in a comparative context. IMG's data content and analytical capabilities have been continuously extended through regular updates since its first release in March 2005. IMG is available at http://img.jgi.doe.gov. Companion IMG systems provide support for expert review of genome annotations (IMG/ER: http://img.jgi.doe.gov/er), teaching courses and training in microbial genome analysis (IMG/EDU: http://img.jgi.doe.gov/edu) and analysis of genomes related to the Human Microbiome Project (IMG/HMP: http://www.hmpdacc-resources.org/img_hmp). PMID:22194640

  8. IMG: the Integrated Microbial Genomes database and comparative analysis system.

    PubMed

    Markowitz, Victor M; Chen, I-Min A; Palaniappan, Krishna; Chu, Ken; Szeto, Ernest; Grechkin, Yuri; Ratner, Anna; Jacob, Biju; Huang, Jinghua; Williams, Peter; Huntemann, Marcel; Anderson, Iain; Mavromatis, Konstantinos; Ivanova, Natalia N; Kyrpides, Nikos C

    2012-01-01

    The Integrated Microbial Genomes (IMG) system serves as a community resource for comparative analysis of publicly available genomes in a comprehensive integrated context. IMG integrates publicly available draft and complete genomes from all three domains of life with a large number of plasmids and viruses. IMG provides tools and viewers for analyzing and reviewing the annotations of genes and genomes in a comparative context. IMG's data content and analytical capabilities have been continuously extended through regular updates since its first release in March 2005. IMG is available at http://img.jgi.doe.gov. Companion IMG systems provide support for expert review of genome annotations (IMG/ER: http://img.jgi.doe.gov/er), teaching courses and training in microbial genome analysis (IMG/EDU: http://img.jgi.doe.gov/edu) and analysis of genomes related to the Human Microbiome Project (IMG/HMP: http://www.hmpdacc-resources.org/img_hmp).

  9. Genomic structure and expression of STM2, the chromosome 1 familial Alzheimer disease gene.

    PubMed

    Levy-Lahad, E; Poorkaj, P; Wang, K; Fu, Y H; Oshima, J; Mulligan, J; Schellenberg, G D

    1996-06-01

    Mutations in the gene STM2 result in autosomal dominant familial Alzheimer disease. To screen for mutations and to identify regulatory elements for this gene, the genomic DNA sequence and intron-exon structure were determined. Twelve exons including 10 coding exons were identified in a genomic region spanning 23,737 bp. The first 2 exons encode the 5'-untranslated region. Expression analysis of STM2 indicates that two transcripts of 2.4 and 2.8 kb are found in skeletal muscle, pancreas, and heart. In addition, a splice variant of the 2.4-kb transcript was identified that is the result of the use of an alternative splice acceptor site located in exon 10. The use of this site results in a transcript lacking a single glutamate. The promotor for this gene and the alternatively spliced exons leading to the 2.8-kb form of the gene remain to be identified. Expression of STM2 was high in skeletal muscle and pancreas, with comparatively low levels observed in brain. This expression pattern is intriguing since in Alzheimer disease, pathology and degeneration are observed only in the central nervous system.

  10. MIPS PlantsDB: a database framework for comparative plant genome research.

    PubMed

    Nussbaumer, Thomas; Martis, Mihaela M; Roessner, Stephan K; Pfeifer, Matthias; Bader, Kai C; Sharma, Sapna; Gundlach, Heidrun; Spannagl, Manuel

    2013-01-01

    The rapidly increasing amount of plant genome (sequence) data enables powerful comparative analyses and integrative approaches and also requires structured and comprehensive information resources. Databases are needed for both model and crop plant organisms and both intuitive search/browse views and comparative genomics tools should communicate the data to researchers and help them interpret it. MIPS PlantsDB (http://mips.helmholtz-muenchen.de/plant/genomes.jsp) was initially described in NAR in 2007 [Spannagl,M., Noubibou,O., Haase,D., Yang,L., Gundlach,H., Hindemitt, T., Klee,K., Haberer,G., Schoof,H. and Mayer,K.F. (2007) MIPSPlantsDB-plant database resource for integrative and comparative plant genome research. Nucleic Acids Res., 35, D834-D840] and was set up from the start to provide data and information resources for individual plant species as well as a framework for integrative and comparative plant genome research. PlantsDB comprises database instances for tomato, Medicago, Arabidopsis, Brachypodium, Sorghum, maize, rice, barley and wheat. Building up on that, state-of-the-art comparative genomics tools such as CrowsNest are integrated to visualize and investigate syntenic relationships between monocot genomes. Results from novel genome analysis strategies targeting the complex and repetitive genomes of triticeae species (wheat and barley) are provided and cross-linked with model species. The MIPS Repeat Element Database (mips-REdat) and Catalog (mips-REcat) as well as tight connections to other databases, e.g. via web services, are further important components of PlantsDB.

  11. MIPS PlantsDB: a database framework for comparative plant genome research

    PubMed Central

    Nussbaumer, Thomas; Martis, Mihaela M.; Roessner, Stephan K.; Pfeifer, Matthias; Bader, Kai C.; Sharma, Sapna; Gundlach, Heidrun; Spannagl, Manuel

    2013-01-01

    The rapidly increasing amount of plant genome (sequence) data enables powerful comparative analyses and integrative approaches and also requires structured and comprehensive information resources. Databases are needed for both model and crop plant organisms and both intuitive search/browse views and comparative genomics tools should communicate the data to researchers and help them interpret it. MIPS PlantsDB (http://mips.helmholtz-muenchen.de/plant/genomes.jsp) was initially described in NAR in 2007 [Spannagl,M., Noubibou,O., Haase,D., Yang,L., Gundlach,H., Hindemitt, T., Klee,K., Haberer,G., Schoof,H. and Mayer,K.F. (2007) MIPSPlantsDB–plant database resource for integrative and comparative plant genome research. Nucleic Acids Res., 35, D834–D840] and was set up from the start to provide data and information resources for individual plant species as well as a framework for integrative and comparative plant genome research. PlantsDB comprises database instances for tomato, Medicago, Arabidopsis, Brachypodium, Sorghum, maize, rice, barley and wheat. Building up on that, state-of-the-art comparative genomics tools such as CrowsNest are integrated to visualize and investigate syntenic relationships between monocot genomes. Results from novel genome analysis strategies targeting the complex and repetitive genomes of triticeae species (wheat and barley) are provided and cross-linked with model species. The MIPS Repeat Element Database (mips-REdat) and Catalog (mips-REcat) as well as tight connections to other databases, e.g. via web services, are further important components of PlantsDB. PMID:23203886

  12. Orthogonal control of expression mean and variance by epigenetic features at different genomic loci

    DOE PAGES

    Dey, Siddharth S.; Foley, Jonathan E.; Limsirichai, Prajit; ...

    2015-05-05

    While gene expression noise has been shown to drive dramatic phenotypic variations, the molecular basis for this variability in mammalian systems is not well understood. Gene expression has been shown to be regulated by promoter architecture and the associated chromatin environment. However, the exact contribution of these two factors in regulating expression noise has not been explored. Using a dual-reporter lentiviral model system, we deconvolved the influence of the promoter sequence to systematically study the contribution of the chromatin environment at different genomic locations in regulating expression noise. By integrating a large-scale analysis to quantify mRNA levels by smFISH andmore » protein levels by flow cytometry in single cells, we found that mean expression and noise are uncorrelated across genomic locations. Furthermore, we showed that this independence could be explained by the orthogonal control of mean expression by the transcript burst size and noise by the burst frequency. Finally, we showed that genomic locations displaying higher expression noise are associated with more repressed chromatin, thereby indicating the contribution of the chromatin environment in regulating expression noise.« less

  13. Whole Genome Amplification of Labeled Viable Single Cells Suited for Array-Comparative Genomic Hybridization.

    PubMed

    Kroneis, Thomas; El-Heliebi, Amin

    2015-01-01

    Understanding details of a complex biological system makes it necessary to dismantle it down to its components. Immunostaining techniques allow identification of several distinct cell types thereby giving an inside view of intercellular heterogeneity. Often staining reveals that the most remarkable cells are the rarest. To further characterize the target cells on a molecular level, single cell techniques are necessary. Here, we describe the immunostaining, micromanipulation, and whole genome amplification of single cells for the purpose of genomic characterization. First, we exemplify the preparation of cell suspensions from cultured cells as well as the isolation of peripheral mononucleated cells from blood. The target cell population is then subjected to immunostaining. After cytocentrifugation target cells are isolated by micromanipulation and forwarded to whole genome amplification. For whole genome amplification, we use GenomePlex(®) technology allowing downstream genomic analysis such as array-comparative genomic hybridization.

  14. Initial sequence and comparative analysis of the cat genome

    PubMed Central

    Pontius, Joan U.; Mullikin, James C.; Smith, Douglas R.; Lindblad-Toh, Kerstin; Gnerre, Sante; Clamp, Michele; Chang, Jean; Stephens, Robert; Neelam, Beena; Volfovsky, Natalia; Schäffer, Alejandro A.; Agarwala, Richa; Narfström, Kristina; Murphy, William J.; Giger, Urs; Roca, Alfred L.; Antunes, Agostinho; Menotti-Raymond, Marilyn; Yuhki, Naoya; Pecon-Slattery, Jill; Johnson, Warren E.; Bourque, Guillaume; Tesler, Glenn; O’Brien, Stephen J.

    2007-01-01

    The genome sequence (1.9-fold coverage) of an inbred Abyssinian domestic cat was assembled, mapped, and annotated with a comparative approach that involved cross-reference to annotated genome assemblies of six mammals (human, chimpanzee, mouse, rat, dog, and cow). The results resolved chromosomal positions for 663,480 contigs, 20,285 putative feline gene orthologs, and 133,499 conserved sequence blocks (CSBs). Additional annotated features include repetitive elements, endogenous retroviral sequences, nuclear mitochondrial (numt) sequences, micro-RNAs, and evolutionary breakpoints that suggest historic balancing of translocation and inversion incidences in distinct mammalian lineages. Large numbers of single nucleotide polymorphisms (SNPs), deletion insertion polymorphisms (DIPs), and short tandem repeats (STRs), suitable for linkage or association studies were characterized in the context of long stretches of chromosome homozygosity. In spite of the light coverage capturing ∼65% of euchromatin sequence from the cat genome, these comparative insights shed new light on the tempo and mode of gene/genome evolution in mammals, promise several research applications for the cat, and also illustrate that a comparative approach using more deeply covered mammals provides an informative, preliminary annotation of a light (1.9-fold) coverage mammal genome sequence. PMID:17975172

  15. Comparative genome sequencing of Drosophila pseudoobscura: Chromosomal, gene, and cis-element evolution

    PubMed Central

    Richards, Stephen; Liu, Yue; Bettencourt, Brian R.; Hradecky, Pavel; Letovsky, Stan; Nielsen, Rasmus; Thornton, Kevin; Hubisz, Melissa J.; Chen, Rui; Meisel, Richard P.; Couronne, Olivier; Hua, Sujun; Smith, Mark A.; Zhang, Peili; Liu, Jing; Bussemaker, Harmen J.; van Batenburg, Marinus F.; Howells, Sally L.; Scherer, Steven E.; Sodergren, Erica; Matthews, Beverly B.; Crosby, Madeline A.; Schroeder, Andrew J.; Ortiz-Barrientos, Daniel; Rives, Catharine M.; Metzker, Michael L.; Muzny, Donna M.; Scott, Graham; Steffen, David; Wheeler, David A.; Worley, Kim C.; Havlak, Paul; Durbin, K. James; Egan, Amy; Gill, Rachel; Hume, Jennifer; Morgan, Margaret B.; Miner, George; Hamilton, Cerissa; Huang, Yanmei; Waldron, Lenée; Verduzco, Daniel; Clerc-Blankenburg, Kerstin P.; Dubchak, Inna; Noor, Mohamed A.F.; Anderson, Wyatt; White, Kevin P.; Clark, Andrew G.; Schaeffer, Stephen W.; Gelbart, William; Weinstock, George M.; Gibbs, Richard A.

    2005-01-01

    We have sequenced the genome of a second Drosophila species, Drosophila pseudoobscura, and compared this to the genome sequence of Drosophila melanogaster, a primary model organism. Throughout evolution the vast majority of Drosophila genes have remained on the same chromosome arm, but within each arm gene order has been extensively reshuffled, leading to a minimum of 921 syntenic blocks shared between the species. A repetitive sequence is found in the D. pseudoobscura genome at many junctions between adjacent syntenic blocks. Analysis of this novel repetitive element family suggests that recombination between offset elements may have given rise to many paracentric inversions, thereby contributing to the shuffling of gene order in the D. pseudoobscura lineage. Based on sequence similarity and synteny, 10,516 putative orthologs have been identified as a core gene set conserved over 25–55 million years (Myr) since the pseudoobscura/melanogaster divergence. Genes expressed in the testes had higher amino acid sequence divergence than the genome-wide average, consistent with the rapid evolution of sex-specific proteins. Cis-regulatory sequences are more conserved than random and nearby sequences between the species—but the difference is slight, suggesting that the evolution of cis-regulatory elements is flexible. Overall, a pattern of repeat-mediated chromosomal rearrangement, and high coadaptation of both male genes and cis-regulatory sequences emerges as important themes of genome divergence between these species of Drosophila. PMID:15632085

  16. Complete genome sequence and comparative genomics of the probiotic yeast Saccharomyces boulardii.

    PubMed

    Khatri, Indu; Tomar, Rajul; Ganesan, K; Prasad, G S; Subramanian, Srikrishna

    2017-03-23

    The probiotic yeast, Saccharomyces boulardii (Sb) is known to be effective against many gastrointestinal disorders and antibiotic-associated diarrhea. To understand molecular basis of probiotic-properties ascribed to Sb we determined the complete genomes of two strains of Sb i.e. Biocodex and unique28 and the draft genomes for three other Sb strains that are marketed as probiotics in India. We compared these genomes with 145 strains of S. cerevisiae (Sc) to understand genome-level similarities and differences between these yeasts. A distinctive feature of Sb from other Sc is absence of Ty elements Ty1, Ty3, Ty4 and associated LTR. However, we could identify complete Ty2 and Ty5 elements in Sb. The genes for hexose transporters HXT11 and HXT9, and asparagine-utilization are absent in all Sb strains. We find differences in repeat periods and copy numbers of repeats in flocculin genes that are likely related to the differential adhesion of Sb as compared to Sc. Core-proteome based taxonomy places Sb strains along with wine strains of Sc. We find the introgression of five genes from Z. bailii into the chromosome IV of Sb and wine strains of Sc. Intriguingly, genes involved in conferring known probiotic properties to Sb are conserved in most Sc strains.

  17. Genome-Wide Comparative Gene Family Classification

    PubMed Central

    Frech, Christian; Chen, Nansheng

    2010-01-01

    Correct classification of genes into gene families is important for understanding gene function and evolution. Although gene families of many species have been resolved both computationally and experimentally with high accuracy, gene family classification in most newly sequenced genomes has not been done with the same high standard. This project has been designed to develop a strategy to effectively and accurately classify gene families across genomes. We first examine and compare the performance of computer programs developed for automated gene family classification. We demonstrate that some programs, including the hierarchical average-linkage clustering algorithm MC-UPGMA and the popular Markov clustering algorithm TRIBE-MCL, can reconstruct manual curation of gene families accurately. However, their performance is highly sensitive to parameter setting, i.e. different gene families require different program parameters for correct resolution. To circumvent the problem of parameterization, we have developed a comparative strategy for gene family classification. This strategy takes advantage of existing curated gene families of reference species to find suitable parameters for classifying genes in related genomes. To demonstrate the effectiveness of this novel strategy, we use TRIBE-MCL to classify chemosensory and ABC transporter gene families in C. elegans and its four sister species. We conclude that fully automated programs can establish biologically accurate gene families if parameterized accordingly. Comparative gene family classification finds optimal parameters automatically, thus allowing rapid insights into gene families of newly sequenced species. PMID:20976221

  18. Comparative transgenic analysis of enhancers from the human SHOX and mouse Shox2 genomic regions.

    PubMed

    Rosin, Jessica M; Abassah-Oppong, Samuel; Cobb, John

    2013-08-01

    Disruption of presumptive enhancers downstream of the human SHOX gene (hSHOX) is a frequent cause of the zeugopodal limb defects characteristic of Léri-Weill dyschondrosteosis (LWD). The closely related mouse Shox2 gene (mShox2) is also required for limb development, but in the more proximal stylopodium. In this study, we used transgenic mice in a comparative approach to characterize enhancer sequences in the hSHOX and mShox2 genomic regions. Among conserved noncoding elements (CNEs) that function as enhancers in vertebrate genomes, those that are maintained near paralogous genes are of particular interest given their ancient origins. Therefore, we first analyzed the regulatory potential of a genomic region containing one such duplicated CNE (dCNE) downstream of mShox2 and hSHOX. We identified a strong limb enhancer directly adjacent to the mShox2 dCNE that recapitulates the expression pattern of the endogenous gene. Interestingly, this enhancer requires sequences only conserved in the mammalian lineage in order to drive strong limb expression, whereas the more deeply conserved sequences of the dCNE function as a neural enhancer. Similarly, we found that a conserved element downstream of hSHOX (CNE9) also functions as a neural enhancer in transgenic mice. However, when the CNE9 transgenic construct was enlarged to include adjacent, non-conserved sequences frequently deleted in LWD patients, the transgene drove expression in the zeugopodium of the limbs. Therefore, both hSHOX and mShox2 limb enhancers are coupled to distinct neural enhancers. This is the first report demonstrating the activity of cis-regulatory elements from the hSHOX and mShox2 genomic regions in mammalian embryos.

  19. Genome-wide study of correlations between genomic features and their relationship with the regulation of gene expression.

    PubMed

    Kravatsky, Yuri V; Chechetkin, Vladimir R; Tchurikov, Nikolai A; Kravatskaya, Galina I

    2015-02-01

    The broad class of tasks in genetics and epigenetics can be reduced to the study of various features that are distributed over the genome (genome tracks). The rapid and efficient processing of the huge amount of data stored in the genome-scale databases cannot be achieved without the software packages based on the analytical criteria. However, strong inhomogeneity of genome tracks hampers the development of relevant statistics. We developed the criteria for the assessment of genome track inhomogeneity and correlations between two genome tracks. We also developed a software package, Genome Track Analyzer, based on this theory. The theory and software were tested on simulated data and were applied to the study of correlations between CpG islands and transcription start sites in the Homo sapiens genome, between profiles of protein-binding sites in chromosomes of Drosophila melanogaster, and between DNA double-strand breaks and histone marks in the H. sapiens genome. Significant correlations between transcription start sites on the forward and the reverse strands were observed in genomes of D. melanogaster, Caenorhabditis elegans, Mus musculus, H. sapiens, and Danio rerio. The observed correlations may be related to the regulation of gene expression in eukaryotes. Genome Track Analyzer is freely available at http://ancorr.eimb.ru/. © The Author 2015. Published by Oxford University Press on behalf of Kazusa DNA Research Institute.

  20. Central genomic regulation of the expression of oestrous behaviour in dairy cows: a review.

    PubMed

    Woelders, H; van der Lende, T; Kommadath, A; te Pas, M F W; Smits, M A; Kaal, L M T E

    2014-05-01

    The expression of oestrous behaviour in Holstein Friesian dairy cows has progressively decreased over the past 50 years. Reduced oestrus expression is one of the factors contributing to the current suboptimal reproductive efficiency in dairy farming. Variation between and within cows in the expression of oestrous behaviour is associated with variation in peripheral blood oestradiol concentrations during oestrus. In addition, there is evidence for a priming role of progesterone for the full display of oestrous behaviour. A higher rate of metabolic clearance of ovarian steroids could be one of the factors leading to lower peripheral blood concentrations of oestradiol and progesterone in high-producing dairy cows. Oestradiol acts on the brain by genomic, non-genomic and growth factor-dependent mechanisms. A firm base of understanding of the ovarian steroid-driven central genomic regulation of female sexual behaviour has been obtained from studies on rodents. These studies have resulted in the definition of five modules of oestradiol-activated genes in the brain, referred to as the GAPPS modules. In a recent series of studies, gene expression in the anterior pituitary and four brain areas (amygdala, hippocampus, dorsal hypothalamus and ventral hypothalamus) in oestrous and luteal phase cows, respectively, has been measured, and the relation with oestrous behaviour of these cows was analysed. These studies identified a number of genes of which the expression was associated with the intensity of oestrous behaviour. These genes could be grouped according to the GAPPS modules, suggesting close similarity of the regulation of oestrous behaviour in cows and female sexual behaviour in rodents. A better understanding of the central genomic regulation of the expression of oestrous behaviour in dairy cows may in due time contribute to improved (genomic) selection strategies for appropriate oestrus expression in high-producing dairy cows.

  1. Comparative genomic and proteomic analyses of Clostridium acetobutylicum Rh8 and its parent strain DSM 1731 revealed new understandings on butanol tolerance

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bao, Guanhui; University of Chinese Academy of Sciences, Beijing; Dong, Hongjun

    Highlights: • Genomes of a butanol tolerant strain and its parent strain were deciphered. • Comparative genomic and proteomic was applied to understand butanol tolerance. • None differentially expressed proteins have mutations in its corresponding genes. • Mutations in ribosome might be responsible for the global difference of proteomics. - Abstract: Clostridium acetobutylicum strain Rh8 is a butanol-tolerant mutant which can tolerate up to 19 g/L butanol, 46% higher than that of its parent strain DSM 1731. We previously performed comparative cytoplasm- and membrane-proteomic analyses to understand the mechanism underlying the improved butanol tolerance of strain Rh8. In this work,more » we further extended this comparison to the genomic level. Compared with the genome of the parent strain DSM 1731, two insertion sites, four deletion sites, and 67 single nucleotide variations (SNVs) are distributed throughout the genome of strain Rh8. Among the 67 SNVs, 16 SNVs are located in the predicted promoters and intergenic regions; while 29 SNVs are located in the coding sequence, affecting a total of 21 proteins involved in transport, cell structure, DNA replication, and protein translation. The remaining 22 SNVs are located in the ribosomal genes, affecting a total of 12 rRNA genes in different operons. Analysis of previous comparative proteomic data indicated that none of the differentially expressed proteins have mutations in its corresponding genes. Rchange Algorithms analysis indicated that the mutations occurred in the ribosomal genes might change the ribosome RNA thermodynamic characteristics, thus affect the translation strength of these proteins. Take together, the improved butanol tolerance of C. acetobutylicum strain Rh8 might be acquired through regulating the translational process to achieve different expression strength of genes involved in butanol tolerance.« less

  2. PLAZA 3.0: an access point for plant comparative genomics

    PubMed Central

    Proost, Sebastian; Van Bel, Michiel; Vaneechoutte, Dries; Van de Peer, Yves; Inzé, Dirk; Mueller-Roeber, Bernd; Vandepoele, Klaas

    2015-01-01

    Comparative sequence analysis has significantly altered our view on the complexity of genome organization and gene functions in different kingdoms. PLAZA 3.0 is designed to make comparative genomics data for plants available through a user-friendly web interface. Structural and functional annotation, gene families, protein domains, phylogenetic trees and detailed information about genome organization can easily be queried and visualized. Compared with the first version released in 2009, which featured nine organisms, the number of integrated genomes is more than four times higher, and now covers 37 plant species. The new species provide a wider phylogenetic range as well as a more in-depth sampling of specific clades, and genomes of additional crop species are present. The functional annotation has been expanded and now comprises data from Gene Ontology, MapMan, UniProtKB/Swiss-Prot, PlnTFDB and PlantTFDB. Furthermore, we improved the algorithms to transfer functional annotation from well-characterized plant genomes to other species. The additional data and new features make PLAZA 3.0 (http://bioinformatics.psb.ugent.be/plaza/) a versatile and comprehensible resource for users wanting to explore genome information to study different aspects of plant biology, both in model and non-model organisms. PMID:25324309

  3. Ancient Duplications and Expression Divergence in the Globin Gene Superfamily of Vertebrates: Insights from the Elephant Shark Genome and Transcriptome.

    PubMed

    Opazo, Juan C; Lee, Alison P; Hoffmann, Federico G; Toloza-Villalobos, Jessica; Burmester, Thorsten; Venkatesh, Byrappa; Storz, Jay F

    2015-07-01

    Comparative analyses of vertebrate genomes continue to uncover a surprising diversity of genes in the globin gene superfamily, some of which have very restricted phyletic distributions despite their antiquity. Genomic analysis of the globin gene repertoire of cartilaginous fish (Chondrichthyes) should be especially informative about the duplicative origins and ancestral functions of vertebrate globins, as divergence between Chondrichthyes and bony vertebrates represents the most basal split within the jawed vertebrates. Here, we report a comparative genomic analysis of the vertebrate globin gene family that includes the complete globin gene repertoire of the elephant shark (Callorhinchus milii). Using genomic sequence data from representatives of all major vertebrate classes, integrated analyses of conserved synteny and phylogenetic relationships revealed that the last common ancestor of vertebrates possessed a repertoire of at least seven globin genes: single copies of androglobin and neuroglobin, four paralogous copies of globin X, and the single-copy progenitor of the entire set of vertebrate-specific globins. Combined with expression data, the genomic inventory of elephant shark globins yielded four especially surprising findings: 1) there is no trace of the neuroglobin gene (a highly conserved gene that is present in all other jawed vertebrates that have been examined to date), 2) myoglobin is highly expressed in heart, but not in skeletal muscle (reflecting a possible ancestral condition in vertebrates with single-circuit circulatory systems), 3) elephant shark possesses two highly divergent globin X paralogs, one of which is preferentially expressed in gonads, and 4) elephant shark possesses two structurally distinct α-globin paralogs, one of which is preferentially expressed in the brain. Expression profiles of elephant shark globin genes reveal distinct specializations of function relative to orthologs in bony vertebrates and suggest hypotheses about

  4. Comparative Genomics of Carp Herpesviruses

    PubMed Central

    Kurobe, Tomofumi; Gatherer, Derek; Cunningham, Charles; Korf, Ian; Fukuda, Hideo; Hedrick, Ronald P.; Waltzek, Thomas B.

    2013-01-01

    Three alloherpesviruses are known to cause disease in cyprinid fish: cyprinid herpesviruses 1 and 3 (CyHV1 and CyHV3) in common carp and koi and cyprinid herpesvirus 2 (CyHV2) in goldfish. We have determined the genome sequences of CyHV1 and CyHV2 and compared them with the published CyHV3 sequence. The CyHV1 and CyHV2 genomes are 291,144 and 290,304 bp, respectively, in size, and thus the CyHV3 genome, at 295,146 bp, remains the largest recorded among the herpesviruses. Each of the three genomes consists of a unique region flanked at each terminus by a sizeable direct repeat. The CyHV1, CyHV2, and CyHV3 genomes are predicted to contain 137, 150, and 155 unique, functional protein-coding genes, respectively, of which six, four, and eight, respectively, are duplicated in the terminal repeat. The three viruses share 120 orthologous genes in a largely colinear arrangement, of which up to 55 are also conserved in the other member of the genus Cyprinivirus, anguillid herpesvirus 1. Twelve genes are conserved convincingly in all sequenced alloherpesviruses, and two others are conserved marginally. The reference CyHV3 strain has been reported to contain five fragmented genes that are presumably nonfunctional. The CyHV2 strain has two fragmented genes, and the CyHV1 strain has none. CyHV1, CyHV2, and CyHV3 have five, six, and five families of paralogous genes, respectively. One family unique to CyHV1 is related to cellular JUNB, which encodes a transcription factor involved in oncogenesis. To our knowledge, this is the first time that JUNB-related sequences have been reported in a herpesvirus. PMID:23269803

  5. Comparative genomics of biotechnologically important yeasts

    USDA-ARS?s Scientific Manuscript database

    Ascomycete yeasts are metabolically diverse, with great potential for biotechnology. Here, we report the comparative genome analysis of 29 taxonomically and biotechnologically important yeasts, including 16 newly sequenced. We identify a genetic code change, CUG-Ala, in Pachysolen tannophilus in the...

  6. Comprehensive Genome-Wide Survey, Genomic Constitution and Expression Profiling of the NAC Transcription Factor Family in Foxtail Millet (Setaria italica L.)

    PubMed Central

    Puranik, Swati; Sahu, Pranav Pankaj; Mandal, Sambhu Nath; B., Venkata Suresh; Parida, Swarup Kumar; Prasad, Manoj

    2013-01-01

    The NAC proteins represent a major plant-specific transcription factor family that has established enormously diverse roles in various plant processes. Aided by the availability of complete genomes, several members of this family have been identified in Arabidopsis, rice, soybean and poplar. However, no comprehensive investigation has been presented for the recently sequenced, naturally stress tolerant crop, Setaria italica (foxtail millet) that is famed as a model crop for bioenergy research. In this study, we identified 147 putative NAC domain-encoding genes from foxtail millet by systematic sequence analysis and physically mapped them onto nine chromosomes. Genomic organization suggested that inter-chromosomal duplications may have been responsible for expansion of this gene family in foxtail millet. Phylogenetically, they were arranged into 11 distinct sub-families (I-XI), with duplicated genes fitting into one cluster and possessing conserved motif compositions. Comparative mapping with other grass species revealed some orthologous relationships and chromosomal rearrangements including duplication, inversion and deletion of genes. The evolutionary significance as duplication and divergence of NAC genes based on their amino acid substitution rates was understood. Expression profiling against various stresses and phytohormones provides novel insights into specific and/or overlapping expression patterns of SiNAC genes, which may be responsible for functional divergence among individual members in this crop. Further, we performed structure modeling and molecular simulation of a stress-responsive protein, SiNAC128, proffering an initial framework for understanding its molecular function. Taken together, this genome-wide identification and expression profiling unlocks new avenues for systematic functional analysis of novel NAC gene family candidates which may be applied for improvising stress adaption in plants. PMID:23691254

  7. Comprehensive genome-wide survey, genomic constitution and expression profiling of the NAC transcription factor family in foxtail millet (Setaria italica L.).

    PubMed

    Puranik, Swati; Sahu, Pranav Pankaj; Mandal, Sambhu Nath; B, Venkata Suresh; Parida, Swarup Kumar; Prasad, Manoj

    2013-01-01

    The NAC proteins represent a major plant-specific transcription factor family that has established enormously diverse roles in various plant processes. Aided by the availability of complete genomes, several members of this family have been identified in Arabidopsis, rice, soybean and poplar. However, no comprehensive investigation has been presented for the recently sequenced, naturally stress tolerant crop, Setaria italica (foxtail millet) that is famed as a model crop for bioenergy research. In this study, we identified 147 putative NAC domain-encoding genes from foxtail millet by systematic sequence analysis and physically mapped them onto nine chromosomes. Genomic organization suggested that inter-chromosomal duplications may have been responsible for expansion of this gene family in foxtail millet. Phylogenetically, they were arranged into 11 distinct sub-families (I-XI), with duplicated genes fitting into one cluster and possessing conserved motif compositions. Comparative mapping with other grass species revealed some orthologous relationships and chromosomal rearrangements including duplication, inversion and deletion of genes. The evolutionary significance as duplication and divergence of NAC genes based on their amino acid substitution rates was understood. Expression profiling against various stresses and phytohormones provides novel insights into specific and/or overlapping expression patterns of SiNAC genes, which may be responsible for functional divergence among individual members in this crop. Further, we performed structure modeling and molecular simulation of a stress-responsive protein, SiNAC128, proffering an initial framework for understanding its molecular function. Taken together, this genome-wide identification and expression profiling unlocks new avenues for systematic functional analysis of novel NAC gene family candidates which may be applied for improvising stress adaption in plants.

  8. A De-Novo Genome Analysis Pipeline (DeNoGAP) for large-scale comparative prokaryotic genomics studies.

    PubMed

    Thakur, Shalabh; Guttman, David S

    2016-06-30

    Comparative analysis of whole genome sequence data from closely related prokaryotic species or strains is becoming an increasingly important and accessible approach for addressing both fundamental and applied biological questions. While there are number of excellent tools developed for performing this task, most scale poorly when faced with hundreds of genome sequences, and many require extensive manual curation. We have developed a de-novo genome analysis pipeline (DeNoGAP) for the automated, iterative and high-throughput analysis of data from comparative genomics projects involving hundreds of whole genome sequences. The pipeline is designed to perform reference-assisted and de novo gene prediction, homolog protein family assignment, ortholog prediction, functional annotation, and pan-genome analysis using a range of proven tools and databases. While most existing methods scale quadratically with the number of genomes since they rely on pairwise comparisons among predicted protein sequences, DeNoGAP scales linearly since the homology assignment is based on iteratively refined hidden Markov models. This iterative clustering strategy enables DeNoGAP to handle a very large number of genomes using minimal computational resources. Moreover, the modular structure of the pipeline permits easy updates as new analysis programs become available. DeNoGAP integrates bioinformatics tools and databases for comparative analysis of a large number of genomes. The pipeline offers tools and algorithms for annotation and analysis of completed and draft genome sequences. The pipeline is developed using Perl, BioPerl and SQLite on Ubuntu Linux version 12.04 LTS. Currently, the software package accompanies script for automated installation of necessary external programs on Ubuntu Linux; however, the pipeline should be also compatible with other Linux and Unix systems after necessary external programs are installed. DeNoGAP is freely available at https://sourceforge.net/projects/denogap/ .

  9. CFGP: a web-based, comparative fungal genomics platform.

    PubMed

    Park, Jongsun; Park, Bongsoo; Jung, Kyongyong; Jang, Suwang; Yu, Kwangyul; Choi, Jaeyoung; Kong, Sunghyung; Park, Jaejin; Kim, Seryun; Kim, Hyojeong; Kim, Soonok; Kim, Jihyun F; Blair, Jaime E; Lee, Kwangwon; Kang, Seogchan; Lee, Yong-Hwan

    2008-01-01

    Since the completion of the Saccharomyces cerevisiae genome sequencing project in 1996, the genomes of over 80 fungal species have been sequenced or are currently being sequenced. Resulting data provide opportunities for studying and comparing fungal biology and evolution at the genome level. To support such studies, the Comparative Fungal Genomics Platform (CFGP; http://cfgp.snu.ac.kr), a web-based multifunctional informatics workbench, was developed. The CFGP comprises three layers, including the basal layer, middleware and the user interface. The data warehouse in the basal layer contains standardized genome sequences of 65 fungal species. The middleware processes queries via six analysis tools, including BLAST, ClustalW, InterProScan, SignalP 3.0, PSORT II and a newly developed tool named BLASTMatrix. The BLASTMatrix permits the identification and visualization of genes homologous to a query across multiple species. The Data-driven User Interface (DUI) of the CFGP was built on a new concept of pre-collecting data and post-executing analysis instead of the 'fill-in-the-form-and-press-SUBMIT' user interfaces utilized by most bioinformatics sites. A tool termed Favorite, which supports the management of encapsulated sequence data and provides a personalized data repository to users, is another novel feature in the DUI.

  10. CFGP: a web-based, comparative fungal genomics platform

    PubMed Central

    Park, Jongsun; Park, Bongsoo; Jung, Kyongyong; Jang, Suwang; Yu, Kwangyul; Choi, Jaeyoung; Kong, Sunghyung; Park, Jaejin; Kim, Seryun; Kim, Hyojeong; Kim, Soonok; Kim, Jihyun F.; Blair, Jaime E.; Lee, Kwangwon; Kang, Seogchan; Lee, Yong-Hwan

    2008-01-01

    Since the completion of the Saccharomyces cerevisiae genome sequencing project in 1996, the genomes of over 80 fungal species have been sequenced or are currently being sequenced. Resulting data provide opportunities for studying and comparing fungal biology and evolution at the genome level. To support such studies, the Comparative Fungal Genomics Platform (CFGP; http://cfgp.snu.ac.kr), a web-based multifunctional informatics workbench, was developed. The CFGP comprises three layers, including the basal layer, middleware and the user interface. The data warehouse in the basal layer contains standardized genome sequences of 65 fungal species. The middleware processes queries via six analysis tools, including BLAST, ClustalW, InterProScan, SignalP 3.0, PSORT II and a newly developed tool named BLASTMatrix. The BLASTMatrix permits the identification and visualization of genes homologous to a query across multiple species. The Data-driven User Interface (DUI) of the CFGP was built on a new concept of pre-collecting data and post-executing analysis instead of the ‘fill-in-the-form-and-press-SUBMIT’ user interfaces utilized by most bioinformatics sites. A tool termed Favorite, which supports the management of encapsulated sequence data and provides a personalized data repository to users, is another novel feature in the DUI. PMID:17947331

  11. Mycobacterial species as case-study of comparative genome analysis.

    PubMed

    Zakham, F; Belayachi, L; Ussery, D; Akrim, M; Benjouad, A; El Aouad, R; Ennaji, M M

    2011-02-08

    The genus Mycobacterium represents more than 120 species including important pathogens of human and cause major public health problems and illnesses. Further, with more than 100 genome sequences from this genus, comparative genome analysis can provide new insights for better understanding the evolutionary events of these species and improving drugs, vaccines, and diagnostics tools for controlling Mycobacterial diseases. In this present study we aim to outline a comparative genome analysis of fourteen Mycobacterial genomes: M. avium subsp. paratuberculosis K—10, M. bovis AF2122/97, M. bovis BCG str. Pasteur 1173P2, M. leprae Br4923, M. marinum M, M. sp. KMS, M. sp. MCS, M. tuberculosis CDC1551, M. tuberculosis F11, M. tuberculosis H37Ra, M. tuberculosis H37Rv, M. tuberculosis KZN 1435 , M. ulcerans Agy99,and M. vanbaalenii PYR—1, For this purpose a comparison has been done based on their length of genomes, GC content, number of genes in different data bases (Genbank, Refseq, and Prodigal). The BLAST matrix of these genomes has been figured to give a lot of information about the similarity between species in a simple scheme. As a result of multiple genome analysis, the pan and core genome have been defined for twelve Mycobacterial species. We have also introduced the genome atlas of the reference strain M. tuberculosis H37Rv which can give a good overview of this genome. And for examining the phylogenetic relationships among these bacteria, a phylogenic tree has been constructed from 16S rRNA gene for tuberculosis and non tuberculosis Mycobacteria to understand the evolutionary events of these species.

  12. Comparative genomics of the tardigrades Hypsibius dujardini and Ramazzottius varieornatus

    PubMed Central

    Yoshida, Yuki; Koutsovoulos, Georgios; Laetsch, Dominik R.; Stevens, Lewis; Kumar, Sujai; Horikawa, Daiki D.; Ishino, Kyoko; Komine, Shiori; Kunieda, Takekazu; Tomita, Masaru; Blaxter, Mark

    2017-01-01

    Tardigrada, a phylum of meiofaunal organisms, have been at the center of discussions of the evolution of Metazoa, the biology of survival in extreme environments, and the role of horizontal gene transfer in animal evolution. Tardigrada are placed as sisters to Arthropoda and Onychophora (velvet worms) in the superphylum Panarthropoda by morphological analyses, but many molecular phylogenies fail to recover this relationship. This tension between molecular and morphological understanding may be very revealing of the mode and patterns of evolution of major groups. Limnoterrestrial tardigrades display extreme cryptobiotic abilities, including anhydrobiosis and cryobiosis, as do bdelloid rotifers, nematodes, and other animals of the water film. These extremophile behaviors challenge understanding of normal, aqueous physiology: how does a multicellular organism avoid lethal cellular collapse in the absence of liquid water? Meiofaunal species have been reported to have elevated levels of horizontal gene transfer (HGT) events, but how important this is in evolution, and particularly in the evolution of extremophile physiology, is unclear. To address these questions, we resequenced and reassembled the genome of H. dujardini, a limnoterrestrial tardigrade that can undergo anhydrobiosis only after extensive pre-exposure to drying conditions, and compared it to the genome of R. varieornatus, a related species with tolerance to rapid desiccation. The 2 species had contrasting gene expression responses to anhydrobiosis, with major transcriptional change in H. dujardini but limited regulation in R. varieornatus. We identified few horizontally transferred genes, but some of these were shown to be involved in entry into anhydrobiosis. Whole-genome molecular phylogenies supported a Tardigrada+Nematoda relationship over Tardigrada+Arthropoda, but rare genomic changes tended to support Tardigrada+Arthropoda. PMID:28749982

  13. Comparative genomics of the tardigrades Hypsibius dujardini and Ramazzottius varieornatus.

    PubMed

    Yoshida, Yuki; Koutsovoulos, Georgios; Laetsch, Dominik R; Stevens, Lewis; Kumar, Sujai; Horikawa, Daiki D; Ishino, Kyoko; Komine, Shiori; Kunieda, Takekazu; Tomita, Masaru; Blaxter, Mark; Arakawa, Kazuharu

    2017-07-01

    Tardigrada, a phylum of meiofaunal organisms, have been at the center of discussions of the evolution of Metazoa, the biology of survival in extreme environments, and the role of horizontal gene transfer in animal evolution. Tardigrada are placed as sisters to Arthropoda and Onychophora (velvet worms) in the superphylum Panarthropoda by morphological analyses, but many molecular phylogenies fail to recover this relationship. This tension between molecular and morphological understanding may be very revealing of the mode and patterns of evolution of major groups. Limnoterrestrial tardigrades display extreme cryptobiotic abilities, including anhydrobiosis and cryobiosis, as do bdelloid rotifers, nematodes, and other animals of the water film. These extremophile behaviors challenge understanding of normal, aqueous physiology: how does a multicellular organism avoid lethal cellular collapse in the absence of liquid water? Meiofaunal species have been reported to have elevated levels of horizontal gene transfer (HGT) events, but how important this is in evolution, and particularly in the evolution of extremophile physiology, is unclear. To address these questions, we resequenced and reassembled the genome of H. dujardini, a limnoterrestrial tardigrade that can undergo anhydrobiosis only after extensive pre-exposure to drying conditions, and compared it to the genome of R. varieornatus, a related species with tolerance to rapid desiccation. The 2 species had contrasting gene expression responses to anhydrobiosis, with major transcriptional change in H. dujardini but limited regulation in R. varieornatus. We identified few horizontally transferred genes, but some of these were shown to be involved in entry into anhydrobiosis. Whole-genome molecular phylogenies supported a Tardigrada+Nematoda relationship over Tardigrada+Arthropoda, but rare genomic changes tended to support Tardigrada+Arthropoda.

  14. BactoGeNIE: A large-scale comparative genome visualization for big displays

    DOE PAGES

    Aurisano, Jillian; Reda, Khairi; Johnson, Andrew; ...

    2015-08-13

    The volume of complete bacterial genome sequence data available to comparative genomics researchers is rapidly increasing. However, visualizations in comparative genomics--which aim to enable analysis tasks across collections of genomes--suffer from visual scalability issues. While large, multi-tiled and high-resolution displays have the potential to address scalability issues, new approaches are needed to take advantage of such environments, in order to enable the effective visual analysis of large genomics datasets. In this paper, we present Bacterial Gene Neighborhood Investigation Environment, or BactoGeNIE, a novel and visually scalable design for comparative gene neighborhood analysis on large display environments. We evaluate BactoGeNIE throughmore » a case study on close to 700 draft Escherichia coli genomes, and present lessons learned from our design process. In conclusion, BactoGeNIE accommodates comparative tasks over substantially larger collections of neighborhoods than existing tools and explicitly addresses visual scalability. Given current trends in data generation, scalable designs of this type may inform visualization design for large-scale comparative research problems in genomics.« less

  15. BactoGeNIE: a large-scale comparative genome visualization for big displays

    PubMed Central

    2015-01-01

    Background The volume of complete bacterial genome sequence data available to comparative genomics researchers is rapidly increasing. However, visualizations in comparative genomics--which aim to enable analysis tasks across collections of genomes--suffer from visual scalability issues. While large, multi-tiled and high-resolution displays have the potential to address scalability issues, new approaches are needed to take advantage of such environments, in order to enable the effective visual analysis of large genomics datasets. Results In this paper, we present Bacterial Gene Neighborhood Investigation Environment, or BactoGeNIE, a novel and visually scalable design for comparative gene neighborhood analysis on large display environments. We evaluate BactoGeNIE through a case study on close to 700 draft Escherichia coli genomes, and present lessons learned from our design process. Conclusions BactoGeNIE accommodates comparative tasks over substantially larger collections of neighborhoods than existing tools and explicitly addresses visual scalability. Given current trends in data generation, scalable designs of this type may inform visualization design for large-scale comparative research problems in genomics. PMID:26329021

  16. BactoGeNIE: A large-scale comparative genome visualization for big displays

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Aurisano, Jillian; Reda, Khairi; Johnson, Andrew

    The volume of complete bacterial genome sequence data available to comparative genomics researchers is rapidly increasing. However, visualizations in comparative genomics--which aim to enable analysis tasks across collections of genomes--suffer from visual scalability issues. While large, multi-tiled and high-resolution displays have the potential to address scalability issues, new approaches are needed to take advantage of such environments, in order to enable the effective visual analysis of large genomics datasets. In this paper, we present Bacterial Gene Neighborhood Investigation Environment, or BactoGeNIE, a novel and visually scalable design for comparative gene neighborhood analysis on large display environments. We evaluate BactoGeNIE throughmore » a case study on close to 700 draft Escherichia coli genomes, and present lessons learned from our design process. In conclusion, BactoGeNIE accommodates comparative tasks over substantially larger collections of neighborhoods than existing tools and explicitly addresses visual scalability. Given current trends in data generation, scalable designs of this type may inform visualization design for large-scale comparative research problems in genomics.« less

  17. Inferring causal genomic alterations in breast cancer using gene expression data

    PubMed Central

    2011-01-01

    Background One of the primary objectives in cancer research is to identify causal genomic alterations, such as somatic copy number variation (CNV) and somatic mutations, during tumor development. Many valuable studies lack genomic data to detect CNV; therefore, methods that are able to infer CNVs from gene expression data would help maximize the value of these studies. Results We developed a framework for identifying recurrent regions of CNV and distinguishing the cancer driver genes from the passenger genes in the regions. By inferring CNV regions across many datasets we were able to identify 109 recurrent amplified/deleted CNV regions. Many of these regions are enriched for genes involved in many important processes associated with tumorigenesis and cancer progression. Genes in these recurrent CNV regions were then examined in the context of gene regulatory networks to prioritize putative cancer driver genes. The cancer driver genes uncovered by the framework include not only well-known oncogenes but also a number of novel cancer susceptibility genes validated via siRNA experiments. Conclusions To our knowledge, this is the first effort to systematically identify and validate drivers for expression based CNV regions in breast cancer. The framework where the wavelet analysis of copy number alteration based on expression coupled with the gene regulatory network analysis, provides a blueprint for leveraging genomic data to identify key regulatory components and gene targets. This integrative approach can be applied to many other large-scale gene expression studies and other novel types of cancer data such as next-generation sequencing based expression (RNA-Seq) as well as CNV data. PMID:21806811

  18. Comparative genomics and evolution of eukaryotic phospholipidbiosynthesis

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lykidis, Athanasios

    2006-12-01

    Phospholipid biosynthetic enzymes produce diverse molecular structures and are often present in multiple forms encoded by different genes. This work utilizes comparative genomics and phylogenetics for exploring the distribution, structure and evolution of phospholipid biosynthetic genes and pathways in 26 eukaryotic genomes. Although the basic structure of the pathways was formed early in eukaryotic evolution, the emerging picture indicates that individual enzyme families followed unique evolutionary courses. For example, choline and ethanolamine kinases and cytidylyltransferases emerged in ancestral eukaryotes, whereas, multiple forms of the corresponding phosphatidyltransferases evolved mainly in a lineage specific manner. Furthermore, several unicellular eukaryotes maintain bacterial-type enzymesmore » and reactions for the synthesis of phosphatidylglycerol and cardiolipin. Also, base-exchange phosphatidylserine synthases are widespread and ancestral enzymes. The multiplicity of phospholipid biosynthetic enzymes has been largely generated by gene expansion in a lineage specific manner. Thus, these observations suggest that phospholipid biosynthesis has been an actively evolving system. Finally, comparative genomic analysis indicates the existence of novel phosphatidyltransferases and provides a candidate for the uncharacterized eukaryotic phosphatidylglycerol phosphate phosphatase.« less

  19. Genomicus update 2015: KaryoView and MatrixView provide a genome-wide perspective to multispecies comparative genomics

    PubMed Central

    Louis, Alexandra; Nguyen, Nga Thi Thuy; Muffato, Matthieu; Roest Crollius, Hugues

    2015-01-01

    The Genomicus web server (http://www.genomicus.biologie.ens.fr/genomicus) is a visualization tool allowing comparative genomics in four different phyla (Vertebrate, Fungi, Metazoan and Plants). It provides access to genomic information from extant species, as well as ancestral gene content and gene order for vertebrates and flowering plants. Here we present the new features available for vertebrate genome with a focus on new graphical tools. The interface to enter the database has been improved, two pairwise genome comparison tools are now available (KaryoView and MatrixView) and the multiple genome comparison tools (PhyloView and AlignView) propose three new kinds of representation and a more intuitive menu. These new developments have been implemented for Genomicus portal dedicated to vertebrates. This allows the analysis of 68 extant animal genomes, as well as 58 ancestral reconstructed genomes. The Genomicus server also provides access to ancestral gene orders, to facilitate evolutionary and comparative genomics studies, as well as computationally predicted regulatory interactions, thanks to the representation of conserved non-coding elements with their putative gene targets. PMID:25378326

  20. Initial sequencing and comparative analysis of the mouse genome

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Waterston, Robert H.; Lindblad-Toh, Kerstin; Birney, Ewan

    2002-12-15

    The sequence of the mouse genome is a key informational tool for understanding the contents of the human genome and a key experimental tool for biomedical research. Here, we report the results of an international collaboration to produce a high-quality draft sequence of the mouse genome. We also present an initial comparative analysis of the mouse and human genomes, describing some of the insights that can be gleaned from the two sequences. We discuss topics including the analysis of the evolutionary forces shaping the size, structure and sequence of the genomes; the conservation of large-scale synteny across most of themore » genomes; the much lower extent of sequence orthology covering less than half of the genomes; the proportions of the genomes under selection; the number of protein-coding genes; the expansion of gene families related to reproduction and immunity; the evolution of proteins; and the identification of intraspecies polymorphism.« less

  1. PLAZA 3.0: an access point for plant comparative genomics.

    PubMed

    Proost, Sebastian; Van Bel, Michiel; Vaneechoutte, Dries; Van de Peer, Yves; Inzé, Dirk; Mueller-Roeber, Bernd; Vandepoele, Klaas

    2015-01-01

    Comparative sequence analysis has significantly altered our view on the complexity of genome organization and gene functions in different kingdoms. PLAZA 3.0 is designed to make comparative genomics data for plants available through a user-friendly web interface. Structural and functional annotation, gene families, protein domains, phylogenetic trees and detailed information about genome organization can easily be queried and visualized. Compared with the first version released in 2009, which featured nine organisms, the number of integrated genomes is more than four times higher, and now covers 37 plant species. The new species provide a wider phylogenetic range as well as a more in-depth sampling of specific clades, and genomes of additional crop species are present. The functional annotation has been expanded and now comprises data from Gene Ontology, MapMan, UniProtKB/Swiss-Prot, PlnTFDB and PlantTFDB. Furthermore, we improved the algorithms to transfer functional annotation from well-characterized plant genomes to other species. The additional data and new features make PLAZA 3.0 (http://bioinformatics.psb.ugent.be/plaza/) a versatile and comprehensible resource for users wanting to explore genome information to study different aspects of plant biology, both in model and non-model organisms. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.

  2. Evolutionary and comparative analyses of the soybean genome

    PubMed Central

    Cannon, Steven B.; Shoemaker, Randy C.

    2012-01-01

    The soybean genome assembly has been available since the end of 2008. Significant features of the genome include large, gene-poor, repeat-dense pericentromeric regions, spanning roughly 57% of the genome sequence; a relatively large genome size of ~1.15 billion bases; remnants of a genome duplication that occurred ~13 million years ago (Mya); and fainter remnants of older polyploidies that occurred ~58 Mya and >130 Mya. The genome sequence has been used to identify the genetic basis for numerous traits, including disease resistance, nutritional characteristics, and developmental features. The genome sequence has provided a scaffold for placement of many genomic feature elements, both from within soybean and from related species. These may be accessed at several websites, including http://www.phytozome.net, http://soybase.org, http://comparative-legumes.org, and http://www.legumebase.brc.miyazaki-u.ac.jp. The taxonomic position of soybean in the Phaseoleae tribe of the legumes means that there are approximately two dozen other beans and relatives that have undergone independent domestication, and which may have traits that will be useful for transfer to soybean. Methods of translating information between species in the Phaseoleae range from design of markers for marker assisted selection, to transformation with Agrobacterium or with other experimental transformation methods. PMID:23136483

  3. Cassava (Manihot esculenta Krantz) genome harbors KNOX genes differentially expressed during storage root development.

    PubMed

    Guo, D; Li, H L; Tang, X; Peng, S Q

    2014-12-18

    In plants, homeodomain proteins play a critical role in regulating various aspects of plant growth and development. KNOX proteins are members of the homeodomain protein family. The KNOX transcription factors have been reported from Arabidopsis, rice, and other higher plants. The recent publication of the draft genome sequence of cassava (Manihot esculenta Krantz) has allowed a genome-wide search for M. esculenta KNOX (MeKNOX) transcription factors and the comparison of these positively identified proteins with their homologs in model plants. In the present study, we identified 12 MeKNOX genes in the cassava genome and grouped them into two distinct subfamilies based on their domain composition and phylogenetic analysis. Furthermore, semi-quantitative reverse transcription polymerase chain reaction analysis was performed to elucidate the expression profiles of these genes in different tissues and during various stages of root development. The analysis of MeKNOX expression profiles of indicated that 12 MeKNOX genes display differential expressions either in their transcript abundance or expression patterns.

  4. Genomic analysis of expressed sequence tags in American black bear Ursus americanus

    PubMed Central

    2010-01-01

    Background Species of the bear family (Ursidae) are important organisms for research in molecular evolution, comparative physiology and conservation biology, but relatively little genetic sequence information is available for this group. Here we report the development and analyses of the first large scale Expressed Sequence Tag (EST) resource for the American black bear (Ursus americanus). Results Comprehensive analyses of molecular functions, alternative splicing, and tissue-specific expression of 38,757 black bear EST sequences were conducted using the dog genome as a reference. We identified 18 genes, involved in functions such as lipid catabolism, cell cycle, and vesicle-mediated transport, that are showing rapid evolution in the bear lineage Three genes, Phospholamban (PLN), cysteine glycine-rich protein 3 (CSRP3) and Troponin I type 3 (TNNI3), are related to heart contraction, and defects in these genes in humans lead to heart disease. Two genes, biphenyl hydrolase-like (BPHL) and CSRP3, contain positively selected sites in bear. Global analysis of evolution rates of hibernation-related genes in bear showed that they are largely conserved and slowly evolving genes, rather than novel and fast-evolving genes. Conclusion We provide a genomic resource for an important mammalian organism and our study sheds new light on the possible functions and evolution of bear genes. PMID:20338065

  5. Genomic analysis of expressed sequence tags in American black bear Ursus americanus.

    PubMed

    Zhao, Sen; Shao, Chunxuan; Goropashnaya, Anna V; Stewart, Nathan C; Xu, Yichi; Tøien, Øivind; Barnes, Brian M; Fedorov, Vadim B; Yan, Jun

    2010-03-26

    Species of the bear family (Ursidae) are important organisms for research in molecular evolution, comparative physiology and conservation biology, but relatively little genetic sequence information is available for this group. Here we report the development and analyses of the first large scale Expressed Sequence Tag (EST) resource for the American black bear (Ursus americanus). Comprehensive analyses of molecular functions, alternative splicing, and tissue-specific expression of 38,757 black bear EST sequences were conducted using the dog genome as a reference. We identified 18 genes, involved in functions such as lipid catabolism, cell cycle, and vesicle-mediated transport, that are showing rapid evolution in the bear lineage Three genes, Phospholamban (PLN), cysteine glycine-rich protein 3 (CSRP3) and Troponin I type 3 (TNNI3), are related to heart contraction, and defects in these genes in humans lead to heart disease. Two genes, biphenyl hydrolase-like (BPHL) and CSRP3, contain positively selected sites in bear. Global analysis of evolution rates of hibernation-related genes in bear showed that they are largely conserved and slowly evolving genes, rather than novel and fast-evolving genes. We provide a genomic resource for an important mammalian organism and our study sheds new light on the possible functions and evolution of bear genes.

  6. Genome Information Broker (GIB): data retrieval and comparative analysis system for completed microbial genomes and more

    PubMed Central

    Fumoto, Masaki; Miyazaki, Satoru; Sugawara, Hideaki

    2002-01-01

    Genome Information Broker (GIB) is a powerful tool for the study of comparative genomics. GIB allows users to retrieve and display partial and/or whole genome sequences together with the relevant biological annotation. GIB has accumulated all the completed microbial genome and has recently been expanded to include Arabidopsis thaliana genome data from DDBJ/EMBL/GenBank. In the near future, hundreds of genome sequences will be determined. In order to handle such huge data, we have enhanced the GIB architecture by using XML, CORBA and distributed RDBs. We introduce the new GIB here. GIB is freely accessible at http://gib.genes.nig.ac.jp/. PMID:11752256

  7. A Guide to the PLAZA 3.0 Plant Comparative Genomic Database.

    PubMed

    Vandepoele, Klaas

    2017-01-01

    PLAZA 3.0 is an online resource for comparative genomics and offers a versatile platform to study gene functions and gene families or to analyze genome organization and evolution in the green plant lineage. Starting from genome sequence information for over 35 plant species, precomputed comparative genomic data sets cover homologous gene families, multiple sequence alignments, phylogenetic trees, and genomic colinearity information within and between species. Complementary functional data sets, a Workbench, and interactive visualization tools are available through a user-friendly web interface, making PLAZA an excellent starting point to translate sequence or omics data sets into biological knowledge. PLAZA is available at http://bioinformatics.psb.ugent.be/plaza/ .

  8. Comparative genomics of parasitic silkworm microsporidia reveal an association between genome expansion and host adaptation

    PubMed Central

    2013-01-01

    Background Microsporidian Nosema bombycis has received much attention because the pébrine disease of domesticated silkworms results in great economic losses in the silkworm industry. So far, no effective treatment could be found for pébrine. Compared to other known Nosema parasites, N. bombycis can unusually parasitize a broad range of hosts. To gain some insights into the underlying genetic mechanism of pathological ability and host range expansion in this parasite, a comparative genomic approach is conducted. The genome of two Nosema parasites, N. bombycis and N. antheraeae (an obligatory parasite to undomesticated silkworms Antheraea pernyi), were sequenced and compared with their distantly related species, N. ceranae (an obligatory parasite to honey bees). Results Our comparative genomics analysis show that the N. bombycis genome has greatly expanded due to the following three molecular mechanisms: 1) the proliferation of host-derived transposable elements, 2) the acquisition of many horizontally transferred genes from bacteria, and 3) the production of abundnant gene duplications. To our knowledge, duplicated genes derived not only from small-scale events (e.g., tandem duplications) but also from large-scale events (e.g., segmental duplications) have never been seen so abundant in any reported microsporidia genomes. Our relative dating analysis further indicated that these duplication events have arisen recently over very short evolutionary time. Furthermore, several duplicated genes involving in the cytotoxic metabolic pathway were found to undergo positive selection, suggestive of the role of duplicated genes on the adaptive evolution of pathogenic ability. Conclusions Genome expansion is rarely considered as the evolutionary outcome acting on those highly reduced and compact parasitic microsporidian genomes. This study, for the first time, demonstrates that the parasitic genomes can expand, instead of shrink, through several common molecular mechanisms

  9. Comparative Genomics Identifies Epidermal Proteins Associated with the Evolution of the Turtle Shell

    PubMed Central

    Holthaus, Karin Brigit; Strasser, Bettina; Sipos, Wolfgang; Schmidt, Heiko A.; Mlitz, Veronika; Sukseree, Supawadee; Weissenbacher, Anton; Tschachler, Erwin; Alibardi, Lorenzo; Eckhart, Leopold

    2016-01-01

    The evolution of reptiles, birds, and mammals was associated with the origin of unique integumentary structures. Studies on lizards, chicken, and humans have suggested that the evolution of major structural proteins of the outermost, cornified layers of the epidermis was driven by the diversification of a gene cluster called Epidermal Differentiation Complex (EDC). Turtles have evolved unique defense mechanisms that depend on mechanically resilient modifications of the epidermis. To investigate whether the evolution of the integument in these reptiles was associated with specific adaptations of the sequences and expression patterns of EDC-related genes, we utilized newly available genome sequences to determine the epidermal differentiation gene complement of turtles. The EDC of the western painted turtle (Chrysemys picta bellii) comprises more than 100 genes, including at least 48 genes that encode proteins referred to as beta-keratins or corneous beta-proteins. Several EDC proteins have evolved cysteine/proline contents beyond 50% of total amino acid residues. Comparative genomics suggests that distinct subfamilies of EDC genes have been expanded and partly translocated to loci outside of the EDC in turtles. Gene expression analysis in the European pond turtle (Emys orbicularis) showed that EDC genes are differentially expressed in the skin of the various body sites and that a subset of beta-keratin genes within the EDC as well as those located outside of the EDC are expressed predominantly in the shell. Our findings give strong support to the hypothesis that the evolutionary innovation of the turtle shell involved specific molecular adaptations of epidermal differentiation. PMID:26601937

  10. Genomic structure and expression of STM2, the chromosome 1 familial Alzheimer disease gene

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Levy-Lahad, E.; Wang, Kai; Fu, Ying Hui

    1996-06-01

    Mutations in the gene STM2 result in autosomal dominant familial Alzheimer disease. To screen for mutations and to identify regulatory elements for this gene, the genomic DNA sequence and intron-exon structure were determined. Twelve exons including 10 coding exons were identified in a genomic region spanning 23, 737 bp. The first 2 exons encode the 5{prime}-untranslated region. Expression analysis of STM2 indicates that two transcripts of 2.4 and 2.8 kb are found in skeletal muscle, pancreas, and heart. In addition, a splice variant of the 2.4-kb transcript was identified that is the result of the use of an alternative splicemore » acceptor site located in exon 10. The use of this site results in a transcript lacking a single glutamate. The promotor for this gene and the alternatively spliced exons leading to the 2.8-kb form of the gene remain to be identified. Expression of STM2 was high in skeletal muscle and pancreas, with comparatively low levels observed in brain. This expression pattern is intriguing since in Alzheimer disease, pathology and degeneration are observed only in the central nervous system. 19 refs., 2 figs., 3 tabs.« less

  11. COGNAT: a web server for comparative analysis of genomic neighborhoods.

    PubMed

    Klimchuk, Olesya I; Konovalov, Kirill A; Perekhvatov, Vadim V; Skulachev, Konstantin V; Dibrova, Daria V; Mulkidjanian, Armen Y

    2017-11-22

    In prokaryotic genomes, functionally coupled genes can be organized in conserved gene clusters enabling their coordinated regulation. Such clusters could contain one or several operons, which are groups of co-transcribed genes. Those genes that evolved from a common ancestral gene by speciation (i.e. orthologs) are expected to have similar genomic neighborhoods in different organisms, whereas those copies of the gene that are responsible for dissimilar functions (i.e. paralogs) could be found in dissimilar genomic contexts. Comparative analysis of genomic neighborhoods facilitates the prediction of co-regulated genes and helps to discern different functions in large protein families. We intended, building on the attribution of gene sequences to the clusters of orthologous groups of proteins (COGs), to provide a method for visualization and comparative analysis of genomic neighborhoods of evolutionary related genes, as well as a respective web server. Here we introduce the COmparative Gene Neighborhoods Analysis Tool (COGNAT), a web server for comparative analysis of genomic neighborhoods. The tool is based on the COG database, as well as the Pfam protein families database. As an example, we show the utility of COGNAT in identifying a new type of membrane protein complex that is formed by paralog(s) of one of the membrane subunits of the NADH:quinone oxidoreductase of type 1 (COG1009) and a cytoplasmic protein of unknown function (COG3002). This article was reviewed by Drs. Igor Zhulin, Uri Gophna and Igor Rogozin.

  12. The plant ontology as a tool for comparative plant anatomy and genomic analyses

    USDA-ARS?s Scientific Manuscript database

    Plant science is now a major player in the fields of genomics, gene expression analysis, phenomics and metabolomics. Recent advances in sequencing technologies have led to a windfall of data, with new species being added rapidly to the list of species whose genomes have been decoded. The Plant Ontol...

  13. Comparative genomics and transcriptional profiles of Saccharopolyspora erythraea NRRL 2338 and a classically improved erythromycin over-producing strain

    PubMed Central

    2012-01-01

    Background The molecular mechanisms altered by the traditional mutation and screening approach during the improvement of antibiotic-producing microorganisms are still poorly understood although this information is essential to design rational strategies for industrial strain improvement. In this study, we applied comparative genomics to identify all genetic changes occurring during the development of an erythromycin overproducer obtained using the traditional mutate-and- screen method. Results Compared with the parental Saccharopolyspora erythraea NRRL 2338, the genome of the overproducing strain presents 117 deletion, 78 insertion and 12 transposition sites, with 71 insertion/deletion sites mapping within coding sequences (CDSs) and generating frame-shift mutations. Single nucleotide variations are present in 144 CDSs. Overall, the genomic variations affect 227 proteins of the overproducing strain and a considerable number of mutations alter genes of key enzymes in the central carbon and nitrogen metabolism and in the biosynthesis of secondary metabolites, resulting in the redirection of common precursors toward erythromycin biosynthesis. Interestingly, several mutations inactivate genes coding for proteins that play fundamental roles in basic transcription and translation machineries including the transcription anti-termination factor NusB and the transcription elongation factor Efp. These mutations, along with those affecting genes coding for pleiotropic or pathway-specific regulators, affect global expression profile as demonstrated by a comparative analysis of the parental and overproducer expression profiles. Genomic data, finally, suggest that the mutate-and-screen process might have been accelerated by mutations in DNA repair genes. Conclusions This study helps to clarify the mechanisms underlying antibiotic overproduction providing valuable information about new possible molecular targets for rationale strain improvement. PMID:22401291

  14. Genomicus update 2015: KaryoView and MatrixView provide a genome-wide perspective to multispecies comparative genomics.

    PubMed

    Louis, Alexandra; Nguyen, Nga Thi Thuy; Muffato, Matthieu; Roest Crollius, Hugues

    2015-01-01

    The Genomicus web server (http://www.genomicus.biologie.ens.fr/genomicus) is a visualization tool allowing comparative genomics in four different phyla (Vertebrate, Fungi, Metazoan and Plants). It provides access to genomic information from extant species, as well as ancestral gene content and gene order for vertebrates and flowering plants. Here we present the new features available for vertebrate genome with a focus on new graphical tools. The interface to enter the database has been improved, two pairwise genome comparison tools are now available (KaryoView and MatrixView) and the multiple genome comparison tools (PhyloView and AlignView) propose three new kinds of representation and a more intuitive menu. These new developments have been implemented for Genomicus portal dedicated to vertebrates. This allows the analysis of 68 extant animal genomes, as well as 58 ancestral reconstructed genomes. The Genomicus server also provides access to ancestral gene orders, to facilitate evolutionary and comparative genomics studies, as well as computationally predicted regulatory interactions, thanks to the representation of conserved non-coding elements with their putative gene targets. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.

  15. Regulation of human genome expression and RNA splicing by human papillomavirus 16 E2 protein.

    PubMed

    Gauson, Elaine J; Windle, Brad; Donaldson, Mary M; Caffarel, Maria M; Dornan, Edward S; Coleman, Nicholas; Herzyk, Pawel; Henderson, Scott C; Wang, Xu; Morgan, Iain M

    2014-11-01

    Human papillomavirus 16 (HPV16) is causative in human cancer. The E2 protein regulates transcription from and replication of the viral genome; the role of E2 in regulating the host genome has been less well studied. We have expressed HPV16 E2 (E2) stably in U2OS cells; these cells tolerate E2 expression well and gene expression analysis identified 74 genes showing differential expression specific to E2. Analysis of published gene expression data sets during cervical cancer progression identified 20 of the genes as being altered in a similar direction as the E2 specific genes. In addition, E2 altered the splicing of many genes implicated in cancer and cell motility. The E2 expressing cells showed no alteration in cell growth but were altered in cell motility, consistent with the E2 induced altered splicing predicted to affect this cellular function. The results present a model system for investigating E2 regulation of the host genome. Copyright © 2014 Elsevier Inc. All rights reserved.

  16. Decoherence in yeast cell populations and its implications for genome-wide expression noise.

    PubMed

    Briones, M R S; Bosco, F

    2009-01-20

    Gene expression "noise" is commonly defined as the stochastic variation of gene expression levels in different cells of the same population under identical growth conditions. Here, we tested whether this "noise" is amplified with time, as a consequence of decoherence in global gene expression profiles (genome-wide microarrays) of synchronized cells. The stochastic component of transcription causes fluctuations that tend to be amplified as time progresses, leading to a decay of correlations of expression profiles, in perfect analogy with elementary relaxation processes. Measuring decoherence, defined here as a decay in the auto-correlation function of yeast genome-wide expression profiles, we found a slowdown in the decay of correlations, opposite to what would be expected if, as in mixing systems, correlations decay exponentially as the equilibrium state is reached. Our results indicate that the populational variation in gene expression (noise) is a consequence of temporal decoherence, in which the slow decay of correlations is a signature of strong interdependence of the transcription dynamics of different genes.

  17. The Immunological Genome Project: networks of gene expression in immune cells.

    PubMed

    Heng, Tracy S P; Painter, Michio W

    2008-10-01

    The Immunological Genome Project combines immunology and computational biology laboratories in an effort to establish a complete 'road map' of gene-expression and regulatory networks in all immune cells.

  18. Genome-Wide Identification and Expression Analysis of WRKY Transcription Factors under Multiple Stresses in Brassica napus

    PubMed Central

    He, Yajun; Mao, Shaoshuai; Gao, Yulong; Zhu, Liying; Wu, Daoming; Cui, Yixin; Li, Jiana; Qian, Wei

    2016-01-01

    WRKY transcription factors play important roles in responses to environmental stress stimuli. Using a genome-wide domain analysis, we identified 287 WRKY genes with 343 WRKY domains in the sequenced genome of Brassica napus, 139 in the A sub-genome and 148 in the C sub-genome. These genes were classified into eight groups based on phylogenetic analysis. In the 343 WRKY domains, a total of 26 members showed divergence in the WRKY domain, and 21 belonged to group I. This finding suggested that WRKY genes in group I are more active and variable compared with genes in other groups. Using genome-wide identification and analysis of the WRKY gene family in Brassica napus, we observed genome duplication, chromosomal/segmental duplications and tandem duplication. All of these duplications contributed to the expansion of the WRKY gene family. The duplicate segments that were detected indicated that genome duplication events occurred in the two diploid progenitors B. rapa and B. olearecea before they combined to form B. napus. Analysis of the public microarray database and EST database for B. napus indicated that 74 WRKY genes were induced or preferentially expressed under stress conditions. According to the public QTL data, we identified 77 WRKY genes in 31 QTL regions related to various stress tolerance. We further evaluated the expression of 26 BnaWRKY genes under multiple stresses by qRT-PCR. Most of the genes were induced by low temperature, salinity and drought stress, indicating that the WRKYs play important roles in B. napus stress responses. Further, three BnaWRKY genes were strongly responsive to the three multiple stresses simultaneously, which suggests that these 3 WRKY may have multi-functional roles in stress tolerance and can potentially be used in breeding new rapeseed cultivars. We also found six tandem repeat pairs exhibiting similar expression profiles under the various stress conditions, and three pairs were mapped in the stress related QTL regions

  19. Genome-Wide Identification and Expression Analysis of WRKY Transcription Factors under Multiple Stresses in Brassica napus.

    PubMed

    He, Yajun; Mao, Shaoshuai; Gao, Yulong; Zhu, Liying; Wu, Daoming; Cui, Yixin; Li, Jiana; Qian, Wei

    2016-01-01

    WRKY transcription factors play important roles in responses to environmental stress stimuli. Using a genome-wide domain analysis, we identified 287 WRKY genes with 343 WRKY domains in the sequenced genome of Brassica napus, 139 in the A sub-genome and 148 in the C sub-genome. These genes were classified into eight groups based on phylogenetic analysis. In the 343 WRKY domains, a total of 26 members showed divergence in the WRKY domain, and 21 belonged to group I. This finding suggested that WRKY genes in group I are more active and variable compared with genes in other groups. Using genome-wide identification and analysis of the WRKY gene family in Brassica napus, we observed genome duplication, chromosomal/segmental duplications and tandem duplication. All of these duplications contributed to the expansion of the WRKY gene family. The duplicate segments that were detected indicated that genome duplication events occurred in the two diploid progenitors B. rapa and B. olearecea before they combined to form B. napus. Analysis of the public microarray database and EST database for B. napus indicated that 74 WRKY genes were induced or preferentially expressed under stress conditions. According to the public QTL data, we identified 77 WRKY genes in 31 QTL regions related to various stress tolerance. We further evaluated the expression of 26 BnaWRKY genes under multiple stresses by qRT-PCR. Most of the genes were induced by low temperature, salinity and drought stress, indicating that the WRKYs play important roles in B. napus stress responses. Further, three BnaWRKY genes were strongly responsive to the three multiple stresses simultaneously, which suggests that these 3 WRKY may have multi-functional roles in stress tolerance and can potentially be used in breeding new rapeseed cultivars. We also found six tandem repeat pairs exhibiting similar expression profiles under the various stress conditions, and three pairs were mapped in the stress related QTL regions

  20. Comparative analysis of rosaceous genomes and the reconstruction of a putative ancestral genome for the family

    PubMed Central

    2011-01-01

    Background Comparative genome mapping studies in Rosaceae have been conducted until now by aligning genetic maps within the same genus, or closely related genera and using a limited number of common markers. The growing body of genomics resources and sequence data for both Prunus and Fragaria permits detailed comparisons between these genera and the recently released Malus × domestica genome sequence. Results We generated a comparative analysis using 806 molecular markers that are anchored genetically to the Prunus and/or Fragaria reference maps, and physically to the Malus genome sequence. Markers in common for Malus and Prunus, and Malus and Fragaria, respectively were 784 and 148. The correspondence between marker positions was high and conserved syntenic blocks were identified among the three genera in the Rosaceae. We reconstructed a proposed ancestral genome for the Rosaceae. Conclusions A genome containing nine chromosomes is the most likely candidate for the ancestral Rosaceae progenitor. The number of chromosomal translocations observed between the three genera investigated was low. However, the number of inversions identified among Malus and Prunus was much higher than any reported genome comparisons in plants, suggesting that small inversions have played an important role in the evolution of these two genera or of the Rosaceae. PMID:21226921

  1. Genome-Wide Survey on Genomic Variation, Expression Divergence, and Evolution in Two Contrasting Rice Genotypes under High Salinity Stress

    PubMed Central

    Jiang, Shu-Ye; Ma, Ali; Ramamoorthy, Rengasamy; Ramachandran, Srinivasan

    2013-01-01

    Expression profiling is one of the most important tools for dissecting biological functions of genes and the upregulation or downregulation of gene expression is sufficient for recreating phenotypic differences. Expression divergence of genes significantly contributes to phenotypic variations. However, little is known on the molecular basis of expression divergence and evolution among rice genotypes with contrasting phenotypes. In this study, we have implemented an integrative approach using bioinformatics and experimental analyses to provide insights into genomic variation, expression divergence, and evolution between salinity-sensitive rice variety Nipponbare and tolerant rice line Pokkali under normal and high salinity stress conditions. We have detected thousands of differentially expressed genes between these two genotypes and thousands of up- or downregulated genes under high salinity stress. Many genes were first detected with expression evidence using custom microarray analysis. Some gene families were preferentially regulated by high salinity stress and might play key roles in stress-responsive biological processes. Genomic variations in promoter regions resulted from single nucleotide polymorphisms, indels (1–10 bp of insertion/deletion), and structural variations significantly contributed to the expression divergence and regulation. Our data also showed that tandem and segmental duplication, CACTA and hAT elements played roles in the evolution of gene expression divergence and regulation between these two contrasting genotypes under normal or high salinity stress conditions. PMID:24121498

  2. Comparative Genome Analysis Provides Insights into Both the Lifestyle of Acidithiobacillus ferrivorans Strain CF27 and the Chimeric Nature of the Iron-Oxidizing Acidithiobacilli Genomes.

    PubMed

    Tran, Tam T T; Mangenot, Sophie; Magdelenat, Ghislaine; Payen, Emilie; Rouy, Zoé; Belahbib, Hassiba; Grail, Barry M; Johnson, D Barrie; Bonnefoy, Violaine; Talla, Emmanuel

    2017-01-01

    The iron-oxidizing species Acidithiobacillus ferrivorans is one of few acidophiles able to oxidize ferrous iron and reduced inorganic sulfur compounds at low temperatures (<10°C). To complete the genome of At. ferrivorans strain CF27, new sequences were generated, and an update assembly and functional annotation were undertaken, followed by a comparative analysis with other Acidithiobacillus species whose genomes are publically available. The At. ferrivorans CF27 genome comprises a 3,409,655 bp chromosome and a 46,453 bp plasmid. At. ferrivorans CF27 possesses genes allowing its adaptation to cold, metal(loid)-rich environments, as well as others that enable it to sense environmental changes, allowing At. ferrivorans CF27 to escape hostile conditions and to move toward favorable locations. Interestingly, the genome of At. ferrivorans CF27 exhibits a large number of genomic islands (mostly containing genes of unknown function), suggesting that a large number of genes has been acquired by horizontal gene transfer over time. Furthermore, several genes specific to At. ferrivorans CF27 have been identified that could be responsible for the phenotypic differences of this strain compared to other Acidithiobacillus species. Most genes located inside At. ferrivorans CF27-specific gene clusters which have been analyzed were expressed by both ferrous iron-grown and sulfur-attached cells, indicating that they are not pseudogenes and may play a role in both situations. Analysis of the taxonomic composition of genomes of the Acidithiobacillia infers that they are chimeric in nature, supporting the premise that they belong to a particular taxonomic class, distinct to other proteobacterial subgroups.

  3. Comparative and demographic analysis of orangutan genomes

    PubMed Central

    Locke, Devin P.; Hillier, LaDeana W.; Warren, Wesley C.; Worley, Kim C.; Nazareth, Lynne V.; Muzny, Donna M.; Yang, Shiaw-Pyng; Wang, Zhengyuan; Chinwalla, Asif T.; Minx, Pat; Mitreva, Makedonka; Cook, Lisa; Delehaunty, Kim D.; Fronick, Catrina; Schmidt, Heather; Fulton, Lucinda A.; Fulton, Robert S.; Nelson, Joanne O.; Magrini, Vincent; Pohl, Craig; Graves, Tina A.; Markovic, Chris; Cree, Andy; Dinh, Huyen H.; Hume, Jennifer; Kovar, Christie L.; Fowler, Gerald R.; Lunter, Gerton; Meader, Stephen; Heger, Andreas; Ponting, Chris P.; Marques-Bonet, Tomas; Alkan, Can; Chen, Lin; Cheng, Ze; Kidd, Jeffrey M.; Eichler, Evan E.; White, Simon; Searle, Stephen; Vilella, Albert J.; Chen, Yuan; Flicek, Paul; Ma, Jian; Raney, Brian; Suh, Bernard; Burhans, Richard; Herrero, Javier; Haussler, David; Faria, Rui; Fernando, Olga; Darré, Fleur; Farré, Domènec; Gazave, Elodie; Oliva, Meritxell; Navarro, Arcadi; Roberto, Roberta; Capozzi, Oronzo; Archidiacono, Nicoletta; Valle, Giuliano Della; Purgato, Stefania; Rocchi, Mariano; Konkel, Miriam K.; Walker, Jerilyn A.; Ullmer, Brygg; Batzer, Mark A.; Smit, Arian F. A.; Hubley, Robert; Casola, Claudio; Schrider, Daniel R.; Hahn, Matthew W.; Quesada, Victor; Puente, Xose S.; Ordoñez, Gonzalo R.; López-Otín, Carlos; Vinar, Tomas; Brejova, Brona; Ratan, Aakrosh; Harris, Robert S.; Miller, Webb; Kosiol, Carolin; Lawson, Heather A.; Taliwal, Vikas; Martins, André L.; Siepel, Adam; RoyChoudhury, Arindam; Ma, Xin; Degenhardt, Jeremiah; Bustamante, Carlos D.; Gutenkunst, Ryan N.; Mailund, Thomas; Dutheil, Julien Y.; Hobolth, Asger; Schierup, Mikkel H.; Chemnick, Leona; Ryder, Oliver A.; Yoshinaga, Yuko; de Jong, Pieter J.; Weinstock, George M.; Rogers, Jeffrey; Mardis, Elaine R.; Gibbs, Richard A.; Wilson, Richard K.

    2011-01-01

    “Orangutan” is derived from the Malay term “man of the forest” and aptly describes the Southeast Asian great apes native to Sumatra and Borneo. The orangutan species, Pongo abelii (Sumatran) and Pongo pygmaeus (Bornean), are the most phylogenetically distant great apes from humans, thereby providing an informative perspective on hominid evolution. Here we present a Sumatran orangutan draft genome assembly and short read sequence data from five Sumatran and five Bornean orangutan genomes. Our analyses reveal that, compared to other primates, the orangutan genome has many unique features. Structural evolution of the orangutan genome has proceeded much more slowly than other great apes, evidenced by fewer rearrangements, less segmental duplication, a lower rate of gene family turnover and surprisingly quiescent Alu repeats, which have played a major role in restructuring other primate genomes. We also describe the first primate polymorphic neocentromere, found in both Pongo species, emphasizing the gradual evolution of orangutan genome structure. Orangutans have extremely low energy usage for a eutherian mammal1, far lower than their hominid relatives. Adding their genome to the repertoire of sequenced primates illuminates new signals of positive selection in several pathways including glycolipid metabolism. From the population perspective, both Pongo species are deeply diverse; however, Sumatran individuals possess greater diversity than their Bornean counterparts, and more species-specific variation. Our estimate of Bornean/Sumatran speciation time, 400k years ago (ya), is more recent than most previous studies and underscores the complexity of the orangutan speciation process. Despite a smaller modern census population size, the Sumatran effective population size (Ne) expanded exponentially relative to the ancestral Ne after the split, while Bornean Ne declined over the same period. Overall, the resources and analyses presented here offer new opportunities

  4. Comparative Genomic Analysis of Mannheimia haemolytica from Bovine Sources

    PubMed Central

    Klima, Cassidy L.; Cook, Shaun R.; Zaheer, Rahat; Laing, Chad; Gannon, Vick P.; Xu, Yong; Rasmussen, Jay; Potter, Andrew; Hendrick, Steve; Alexander, Trevor W.; McAllister, Tim A.

    2016-01-01

    Bovine respiratory disease is a common health problem in beef production. The primary bacterial agent involved, Mannheimia haemolytica, is a target for antimicrobial therapy and at risk for associated antimicrobial resistance development. The role of M. haemolytica in pathogenesis is linked to serotype with serotypes 1 (S1) and 6 (S6) isolated from pneumonic lesions and serotype 2 (S2) found in the upper respiratory tract of healthy animals. Here, we sequenced the genomes of 11 strains of M. haemolytica, representing all three serotypes and performed comparative genomics analysis to identify genetic features that may contribute to pathogenesis. Possible virulence associated genes were identified within 14 distinct prophage, including a periplasmic chaperone, a lipoprotein, peptidoglycan glycosyltransferase and a stress response protein. Prophage content ranged from 2–8 per genome, but was higher in S1 and S6 strains. A type I-C CRISPR-Cas system was identified in each strain with spacer diversity and organization conserved among serotypes. The majority of spacers occur in S1 and S6 strains and originate from phage suggesting that serotypes 1 and 6 may be more resistant to phage predation. However, two spacers complementary to the host chromosome targeting a UDP-N-acetylglucosamine 2-epimerase and a glycosyl transferases group 1 gene are present in S1 and S6 strains only indicating these serotypes may employ CRISPR-Cas to regulate gene expression to avoid host immune responses or enhance adhesion during infection. Integrative conjugative elements are present in nine of the eleven genomes. Three of these harbor extensive multi-drug resistance cassettes encoding resistance against the majority of drugs used to combat infection in beef cattle, including macrolides and tetracyclines used in human medicine. The findings here identify key features that are likely contributing to serotype related pathogenesis and specific targets for vaccine design intended to reduce the

  5. Comparative Genomic Analysis of Mannheimia haemolytica from Bovine Sources.

    PubMed

    Klima, Cassidy L; Cook, Shaun R; Zaheer, Rahat; Laing, Chad; Gannon, Vick P; Xu, Yong; Rasmussen, Jay; Potter, Andrew; Hendrick, Steve; Alexander, Trevor W; McAllister, Tim A

    2016-01-01

    Bovine respiratory disease is a common health problem in beef production. The primary bacterial agent involved, Mannheimia haemolytica, is a target for antimicrobial therapy and at risk for associated antimicrobial resistance development. The role of M. haemolytica in pathogenesis is linked to serotype with serotypes 1 (S1) and 6 (S6) isolated from pneumonic lesions and serotype 2 (S2) found in the upper respiratory tract of healthy animals. Here, we sequenced the genomes of 11 strains of M. haemolytica, representing all three serotypes and performed comparative genomics analysis to identify genetic features that may contribute to pathogenesis. Possible virulence associated genes were identified within 14 distinct prophage, including a periplasmic chaperone, a lipoprotein, peptidoglycan glycosyltransferase and a stress response protein. Prophage content ranged from 2-8 per genome, but was higher in S1 and S6 strains. A type I-C CRISPR-Cas system was identified in each strain with spacer diversity and organization conserved among serotypes. The majority of spacers occur in S1 and S6 strains and originate from phage suggesting that serotypes 1 and 6 may be more resistant to phage predation. However, two spacers complementary to the host chromosome targeting a UDP-N-acetylglucosamine 2-epimerase and a glycosyl transferases group 1 gene are present in S1 and S6 strains only indicating these serotypes may employ CRISPR-Cas to regulate gene expression to avoid host immune responses or enhance adhesion during infection. Integrative conjugative elements are present in nine of the eleven genomes. Three of these harbor extensive multi-drug resistance cassettes encoding resistance against the majority of drugs used to combat infection in beef cattle, including macrolides and tetracyclines used in human medicine. The findings here identify key features that are likely contributing to serotype related pathogenesis and specific targets for vaccine design intended to reduce the

  6. Genome Expression Pathway Analysis Tool – Analysis and visualization of microarray gene expression data under genomic, proteomic and metabolic context

    PubMed Central

    Weniger, Markus; Engelmann, Julia C; Schultz, Jörg

    2007-01-01

    Background Regulation of gene expression is relevant to many areas of biology and medicine, in the study of treatments, diseases, and developmental stages. Microarrays can be used to measure the expression level of thousands of mRNAs at the same time, allowing insight into or comparison of different cellular conditions. The data derived out of microarray experiments is highly dimensional and often noisy, and interpretation of the results can get intricate. Although programs for the statistical analysis of microarray data exist, most of them lack an integration of analysis results and biological interpretation. Results We have developed GEPAT, Genome Expression Pathway Analysis Tool, offering an analysis of gene expression data under genomic, proteomic and metabolic context. We provide an integration of statistical methods for data import and data analysis together with a biological interpretation for subsets of probes or single probes on the chip. GEPAT imports various types of oligonucleotide and cDNA array data formats. Different normalization methods can be applied to the data, afterwards data annotation is performed. After import, GEPAT offers various statistical data analysis methods, as hierarchical, k-means and PCA clustering, a linear model based t-test or chromosomal profile comparison. The results of the analysis can be interpreted by enrichment of biological terms, pathway analysis or interaction networks. Different biological databases are included, to give various information for each probe on the chip. GEPAT offers no linear work flow, but allows the usage of any subset of probes and samples as a start for a new data analysis. GEPAT relies on established data analysis packages, offers a modular approach for an easy extension, and can be run on a computer grid to allow a large number of users. It is freely available under the LGPL open source license for academic and commercial users at . Conclusion GEPAT is a modular, scalable and professional

  7. Whole genome annotation and comparative genomic analyses of bio-control fungus Purpureocillium lilacinum.

    PubMed

    Prasad, Pushplata; Varshney, Deepti; Adholeya, Alok

    2015-11-25

    The fungus Purpureocillium lilacinum is widely known as a biological control agent against plant parasitic nematodes. This research article consists of genomic annotation of the first draft of whole genome sequence of P. lilacinum. The study aims to decipher the putative genetic components of the fungus involved in nematode pathogenesis by performing comparative genomic analysis with nine closely related fungal species in Hypocreales. de novo genomic assembly was done and a total of 301 scaffolds were constructed for P. lilacinum genomic DNA. By employing structural genome prediction models, 13, 266 genes coding for proteins were predicted in the genome. Approximately 73% of the predicted genes were functionally annotated using Blastp, InterProScan and Gene Ontology. A 14.7% fraction of the predicted genes shared significant homology with genes in the Pathogen Host Interactions (PHI) database. The phylogenomic analysis carried out using maximum likelihood RAxML algorithm provided insight into the evolutionary relationship of P. lilacinum. In congruence with other closely related species in the Hypocreales namely, Metarhizium spp., Pochonia chlamydosporia, Cordyceps militaris, Trichoderma reesei and Fusarium spp., P. lilacinum has large gene sets coding for G-protein coupled receptors (GPCRs), proteases, glycoside hydrolases and carbohydrate esterases that are required for degradation of nematode-egg shell components. Screening of the genome by Antibiotics & Secondary Metabolite Analysis Shell (AntiSMASH) pipeline indicated that the genome potentially codes for a variety of secondary metabolites, possibly required for adaptation to heterogeneous lifestyles reported for P. lilacinum. Significant up-regulation of subtilisin-like serine protease genes in presence of nematode eggs in quantitative real-time analyses suggested potential role of serine proteases in nematode pathogenesis. The data offer a better understanding of Purpureocillium lilacinum genome and will

  8. Functional Genome Mining for Metabolites Encoded by Large Gene Clusters through Heterologous Expression of a Whole-Genome Bacterial Artificial Chromosome Library in Streptomyces spp.

    PubMed Central

    Xu, Min; Wang, Yemin; Zhao, Zhilong; Gao, Guixi; Huang, Sheng-Xiong; Kang, Qianjin; He, Xinyi; Lin, Shuangjun; Pang, Xiuhua; Deng, Zixin

    2016-01-01

    ABSTRACT Genome sequencing projects in the last decade revealed numerous cryptic biosynthetic pathways for unknown secondary metabolites in microbes, revitalizing drug discovery from microbial metabolites by approaches called genome mining. In this work, we developed a heterologous expression and functional screening approach for genome mining from genomic bacterial artificial chromosome (BAC) libraries in Streptomyces spp. We demonstrate mining from a strain of Streptomyces rochei, which is known to produce streptothricins and borrelidin, by expressing its BAC library in the surrogate host Streptomyces lividans SBT5, and screening for antimicrobial activity. In addition to the successful capture of the streptothricin and borrelidin biosynthetic gene clusters, we discovered two novel linear lipopeptides and their corresponding biosynthetic gene cluster, as well as a novel cryptic gene cluster for an unknown antibiotic from S. rochei. This high-throughput functional genome mining approach can be easily applied to other streptomycetes, and it is very suitable for the large-scale screening of genomic BAC libraries for bioactive natural products and the corresponding biosynthetic pathways. IMPORTANCE Microbial genomes encode numerous cryptic biosynthetic gene clusters for unknown small metabolites with potential biological activities. Several genome mining approaches have been developed to activate and bring these cryptic metabolites to biological tests for future drug discovery. Previous sequence-guided procedures relied on bioinformatic analysis to predict potentially interesting biosynthetic gene clusters. In this study, we describe an efficient approach based on heterologous expression and functional screening of a whole-genome library for the mining of bioactive metabolites from Streptomyces. The usefulness of this function-driven approach was demonstrated by the capture of four large biosynthetic gene clusters for metabolites of various chemical types, including

  9. Different waves of effector genes with contrasted genomic location are expressed by Leptosphaeria maculans during cotyledon and stem colonization of oilseed rape.

    PubMed

    Gervais, Julie; Plissonneau, Clémence; Linglin, Juliette; Meyer, Michel; Labadie, Karine; Cruaud, Corinne; Fudal, Isabelle; Rouxel, Thierry; Balesdent, Marie-Hélène

    2017-10-01

    Leptosphaeria maculans, the causal agent of stem canker disease, colonizes oilseed rape (Brassica napus) in two stages: a short and early colonization stage corresponding to cotyledon or leaf colonization, and a late colonization stage during which the fungus colonizes systemically and symptomlessly the plant during several months before stem canker appears. To date, the determinants of the late colonization stage are poorly understood; L. maculans may either successfully escape plant defences, leading to stem canker development, or the plant may develop an 'adult-stage' resistance reducing canker incidence. To obtain an insight into these determinants, we performed an RNA-sequencing (RNA-seq) pilot project comparing fungal gene expression in infected cotyledons and in symptomless or necrotic stems. Despite the low fraction of fungal material in infected stems, sufficient fungal transcripts were detected and a large number of fungal genes were expressed, thus validating the feasibility of the approach. Our analysis showed that all avirulence genes previously identified are under-expressed during stem colonization compared with cotyledon colonization. A validation RNA-seq experiment was then performed to investigate the expression of candidate effector genes during systemic colonization. Three hundred and seven 'late' effector candidates, under-expressed in the early colonization stage and over-expressed in the infected stems, were identified. Finally, our analysis revealed a link between the regulation of expression of effectors and their genomic location: the 'late' effector candidates, putatively involved in systemic colonization, are located in gene-rich genomic regions, whereas the 'early' effector genes, over-expressed in the early colonization stage, are located in gene-poor regions of the genome. © 2016 BSPP AND JOHN WILEY & SONS LTD.

  10. Comparative genomics of Ceriporiopsis subvermispora and Phanerochaete chrysosporium provide insight into selective ligninolysis

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Fernandez-Fueyo, Elena; Ruiz-Duenas, Francisco J.; Ferreira, Patrica

    Efficient lignin depolymerization is unique to the wood decay basidiomycetes, collectively referred to as white rot fungi. Phanerochaete chrysosporium simultaneously degrades lignin and cellulose, whereas the closely related species, Ceriporiopsis subvermispora, also depolymerizes lignin but may do so with relatively little cellulose degradation. To investigate the basis for selective ligninolysis, we conducted comparative genome analysis of C. subvermispora and P. chrysosporium. Genes encoding manganese peroxidase numbered 13 and five in C. subvermispora and P. chrysosporium, respectively. In addition, the C. subvermispora genome contains at least seven genes predicted to encode laccases, whereas the P. chrysosporium genome contains none. We alsomore » observed expansion of the number of C. subvermispora desaturase-encoding genes putatively involved in lipid metabolism. Microarray-based transcriptome analysis showed substantial up-regulation of several desaturase and MnP genes in wood-containing medium. MS identified MnP proteins in C. subvermispora culture filtrates, but none in P. chrysosporium cultures. These results support the importance of MnP and a lignin degradation mechanism whereby cleavage of the dominant nonphenolic structures is mediated by lipid peroxidation products. Two C. subvermispora genes were predicted to encode peroxidases structurally similar to P. chrysosporium lignin peroxidase and, following heterologous expression in Escherichia coli, the enzymes were shown to oxidize high redox potential substrates, but not Mn2. Apart from oxidative lignin degradation, we also examined cellulolytic and hemicellulolytic systems in both fungi. In summary, the C. subvermispora genetic inventory and expression patterns exhibit increased oxidoreductase potential and diminished cellulolytic capability relative to P. chrysosporium.« less

  11. Comparative genomics of Ceriporiopsis subvermispora and Phanerochaete chrysosporium provide insight into selective ligninolysis

    PubMed Central

    Fernandez-Fueyo, Elena; Ruiz-Dueñas, Francisco J.; Ferreira, Patricia; Floudas, Dimitrios; Hibbett, David S.; Canessa, Paulo; Larrondo, Luis F.; James, Tim Y.; Seelenfreund, Daniela; Lobos, Sergio; Polanco, Rubén; Tello, Mario; Honda, Yoichi; Watanabe, Takahito; Watanabe, Takashi; Ryu, Jae San; Kubicek, Christian P.; Schmoll, Monika; Gaskell, Jill; Hammel, Kenneth E.; St. John, Franz J.; Vanden Wymelenberg, Amber; Sabat, Grzegorz; Splinter BonDurant, Sandra; Syed, Khajamohiddin; Yadav, Jagjit S.; Doddapaneni, Harshavardhan; Subramanian, Venkataramanan; Lavín, José L.; Oguiza, José A.; Perez, Gumer; Pisabarro, Antonio G.; Ramirez, Lucia; Santoyo, Francisco; Master, Emma; Coutinho, Pedro M.; Henrissat, Bernard; Lombard, Vincent; Magnuson, Jon Karl; Kües, Ursula; Hori, Chiaki; Igarashi, Kiyohiko; Samejima, Masahiro; Held, Benjamin W.; Barry, Kerrie W.; LaButti, Kurt M.; Lapidus, Alla; Lindquist, Erika A.; Lucas, Susan M.; Riley, Robert; Salamov, Asaf A.; Hoffmeister, Dirk; Schwenk, Daniel; Hadar, Yitzhak; Yarden, Oded; de Vries, Ronald P.; Wiebenga, Ad; Stenlid, Jan; Eastwood, Daniel; Grigoriev, Igor V.; Berka, Randy M.; Blanchette, Robert A.; Kersten, Phil; Martinez, Angel T.; Vicuna, Rafael; Cullen, Dan

    2012-01-01

    Efficient lignin depolymerization is unique to the wood decay basidiomycetes, collectively referred to as white rot fungi. Phanerochaete chrysosporium simultaneously degrades lignin and cellulose, whereas the closely related species, Ceriporiopsis subvermispora, also depolymerizes lignin but may do so with relatively little cellulose degradation. To investigate the basis for selective ligninolysis, we conducted comparative genome analysis of C. subvermispora and P. chrysosporium. Genes encoding manganese peroxidase numbered 13 and five in C. subvermispora and P. chrysosporium, respectively. In addition, the C. subvermispora genome contains at least seven genes predicted to encode laccases, whereas the P. chrysosporium genome contains none. We also observed expansion of the number of C. subvermispora desaturase-encoding genes putatively involved in lipid metabolism. Microarray-based transcriptome analysis showed substantial up-regulation of several desaturase and MnP genes in wood-containing medium. MS identified MnP proteins in C. subvermispora culture filtrates, but none in P. chrysosporium cultures. These results support the importance of MnP and a lignin degradation mechanism whereby cleavage of the dominant nonphenolic structures is mediated by lipid peroxidation products. Two C. subvermispora genes were predicted to encode peroxidases structurally similar to P. chrysosporium lignin peroxidase and, following heterologous expression in Escherichia coli, the enzymes were shown to oxidize high redox potential substrates, but not Mn2+. Apart from oxidative lignin degradation, we also examined cellulolytic and hemicellulolytic systems in both fungi. In summary, the C. subvermispora genetic inventory and expression patterns exhibit increased oxidoreductase potential and diminished cellulolytic capability relative to P. chrysosporium. PMID:22434909

  12. Substantial genome synteny preservation among woody angiosperm species: comparative genomics of Chinese chestnut (Castanea mollissima) and plant reference genomes.

    PubMed

    Staton, Margaret; Zhebentyayeva, Tetyana; Olukolu, Bode; Fang, Guang Chen; Nelson, Dana; Carlson, John E; Abbott, Albert G

    2015-10-05

    Chinese chestnut (Castanea mollissima) has emerged as a model species for the Fagaceae family with extensive genomic resources including a physical map, a dense genetic map and quantitative trait loci (QTLs) for chestnut blight resistance. These resources enable comparative genomics analyses relative to model plants. We assessed the degree of conservation between the chestnut genome and other well annotated and assembled plant genomic sequences, focusing on the QTL regions of most interest to the chestnut breeding community. The integrated physical and genetic map of Chinese chestnut has been improved to now include 858 shared sequence-based markers. The utility of the integrated map has also been improved through the addition of 42,970 BAC (bacterial artificial chromosome) end sequences spanning over 26 million bases of the estimated 800 Mb chestnut genome. Synteny between chestnut and ten model plant species was conducted on a macro-syntenic scale using sequences from both individual probes and BAC end sequences across the chestnut physical map. Blocks of synteny with chestnut were found in all ten reference species, with the percent of the chestnut physical map that could be aligned ranging from 10 to 39 %. The integrated genetic and physical map was utilized to identify BACs that spanned the three previously identified QTL regions conferring blight resistance. The clones were pooled and sequenced, yielding 396 sequence scaffolds covering 13.9 Mbp. Comparative genomic analysis on a microsytenic scale, using the QTL-associated genomic sequence, identified synteny from chestnut to other plant genomes ranging from 5.4 to 12.9 % of the genome sequences aligning. On both the macro- and micro-synteny levels, the peach, grape and poplar genomes were found to be the most structurally conserved with chestnut. Interestingly, these results did not strictly follow the expectation that decreased phylogenetic distance would correspond to increased levels of genome

  13. Comparative whole genome analysis of six diagnostic brucellaphages.

    PubMed

    Farlow, Jason; Filippov, Andrey A; Sergueev, Kirill V; Hang, Jun; Kotorashvili, Adam; Nikolich, Mikeljon P

    2014-05-15

    Whole genome sequencing of six diagnostic brucellaphages, Tbilisi (Tb), Firenze (Fz), Weybridge (Wb), S708, Berkeley (Bk) and R/C, was followed with genomic comparisons including recently described genomes of the Tb phage from Mexico (TbM) and Pr phage to elucidate genomic diversity and candidate host range determinants. Comparative whole genome analysis revealed high sequence homogeneity among these brucellaphage genomes and resolved three genetic groups consistent with defined host range phenotypes. Group I was composed of Tb and Fz phages that are predominantly lytic for Brucella abortus and Brucella neotomae; Group II included Bk, R/C, and Pr phages that are lytic mainly for B. abortus, Brucella melitensis and Brucella suis; Group III was composed of Wb and S708 phages that are lytic for B. suis, B. abortus and B. neotomae. We found that the putative phage collar protein is a variable locus with features that may be contributing to the host specificities exhibited by different brucellaphage groups. The presence of several candidate host range determinants is illustrated herein for future dissection of the differential host specificity observed among these phages. Published by Elsevier B.V.

  14. Comparative genome analysis of Pediococcus damnosus LMG 28219, a strain well-adapted to the beer environment.

    PubMed

    Snauwaert, Isabel; Stragier, Pieter; De Vuyst, Luc; Vandamme, Peter

    2015-04-03

    Pediococcus damnosus LMG 28219 is a lactic acid bacterium dominating the maturation phase of Flemish acid beer productions. It proved to be capable of growing in beer, thereby resisting this environment, which is unfavorable for microbial growth. The molecular mechanisms underlying its metabolic capabilities and niche adaptations were unknown up to now. In the present study, whole-genome sequencing and comparative genome analysis were used to investigate this strain's mechanisms to reside in the beer niche, with special focus on not only stress and hop resistances but also folate biosynthesis and exopolysaccharide (EPS) production. The draft genome sequence of P. damnosus LMG 28219 harbored 183 contigs, including an intact prophage region and several coding sequences involved in plasmid replication. The annotation of 2178 coding sequences revealed the presence of many transporters and transcriptional regulators and several genes involved in oxidative stress response, hop resistance, de novo folate biosynthesis, and EPS production. Comparative genome analysis of P. damnosus LMG 28219 with Pediococcus claussenii ATCC BAA-344(T) (beer origin) and Pediococcus pentosaceus ATCC 25745 (plant origin) revealed that various hop resistance genes and genes involved in de novo folate biosynthesis were unique to the strains isolated from beer. This contrasted with the genes related to osmotic stress responses, which were shared between the strains compared. Furthermore, transcriptional regulators were enriched in the genomes of bacteria capable of growth in beer, suggesting that those cause rapid up- or down-regulation of gene expression. Genome sequence analysis of P. damnosus LMG 28219 provided insights into the underlying mechanisms of its adaptation to the beer niche. The results presented will enable analysis of the transcriptome and proteome of P. damnosus LMG 28219, which will result in additional knowledge on its metabolic activities.

  15. Transcriptional Activity, Chromosomal Distribution and Expression Effects of Transposable Elements in Coffea Genomes

    PubMed Central

    da Silva, Carlos R. M.; Andrade, Alan C.; Marraccini, Pierre; Teixeira, João B.; Carazzolle, Marcelo F.; Pereira, Gonçalo A. G.; Pereira, Luiz Filipe P.; Vanzela, André L. L.; Wang, Lu; Jordan, I. King; Carareto, Claudia M. A.

    2013-01-01

    Plant genomes are massively invaded by transposable elements (TEs), many of which are located near host genes and can thus impact gene expression. In flowering plants, TE expression can be activated (de-repressed) under certain stressful conditions, both biotic and abiotic, as well as by genome stress caused by hybridization. In this study, we examined the effects of these stress agents on TE expression in two diploid species of coffee, Coffea canephora and C. eugenioides, and their allotetraploid hybrid C. arabica. We also explored the relationship of TE repression mechanisms to host gene regulation via the effects of exonized TE sequences. Similar to what has been seen for other plants, overall TE expression levels are low in Coffea plant cultivars, consistent with the existence of effective TE repression mechanisms. TE expression patterns are highly dynamic across the species and conditions assayed here are unrelated to their classification at the level of TE class or family. In contrast to previous results, cell culture conditions per se do not lead to the de-repression of TE expression in C. arabica. Results obtained here indicate that differing plant drought stress levels relate strongly to TE repression mechanisms. TEs tend to be expressed at significantly higher levels in non-irrigated samples for the drought tolerant cultivars but in drought sensitive cultivars the opposite pattern was shown with irrigated samples showing significantly higher TE expression. Thus, TE genome repression mechanisms may be finely tuned to the ideal growth and/or regulatory conditions of the specific plant cultivars in which they are active. Analysis of TE expression levels in cell culture conditions underscored the importance of nonsense-mediated mRNA decay (NMD) pathways in the repression of Coffea TEs. These same NMD mechanisms can also regulate plant host gene expression via the repression of genes that bear exonized TE sequences. PMID:24244387

  16. Comparative genomics of metabolic capacities of regulons controlled by cis-regulatory RNA motifs in bacteria.

    PubMed

    Sun, Eric I; Leyn, Semen A; Kazanov, Marat D; Saier, Milton H; Novichkov, Pavel S; Rodionov, Dmitry A

    2013-09-02

    In silico comparative genomics approaches have been efficiently used for functional prediction and reconstruction of metabolic and regulatory networks. Riboswitches are metabolite-sensing structures often found in bacterial mRNA leaders controlling gene expression on transcriptional or translational levels.An increasing number of riboswitches and other cis-regulatory RNAs have been recently classified into numerous RNA families in the Rfam database. High conservation of these RNA motifs provides a unique advantage for their genomic identification and comparative analysis. A comparative genomics approach implemented in the RegPredict tool was used for reconstruction and functional annotation of regulons controlled by RNAs from 43 Rfam families in diverse taxonomic groups of Bacteria. The inferred regulons include ~5200 cis-regulatory RNAs and more than 12000 target genes in 255 microbial genomes. All predicted RNA-regulated genes were classified into specific and overall functional categories. Analysis of taxonomic distribution of these categories allowed us to establish major functional preferences for each analyzed cis-regulatory RNA motif family. Overall, most RNA motif regulons showed predictable functional content in accordance with their experimentally established effector ligands. Our results suggest that some RNA motifs (including thiamin pyrophosphate and cobalamin riboswitches that control the cofactor metabolism) are widespread and likely originated from the last common ancestor of all bacteria. However, many more analyzed RNA motifs are restricted to a narrow taxonomic group of bacteria and likely represent more recent evolutionary innovations. The reconstructed regulatory networks for major known RNA motifs substantially expand the existing knowledge of transcriptional regulation in bacteria. The inferred regulons can be used for genetic experiments, functional annotations of genes, metabolic reconstruction and evolutionary analysis. The obtained genome

  17. Mammalian Comparative Genomics Reveals Genetic and Epigenetic Features Associated with Genome Reshuffling in Rodentia

    PubMed Central

    Capilla, Laia; Sánchez-Guillén, Rosa Ana; Farré, Marta; Paytuví-Gallart, Andreu; Malinverni, Roberto; Ventura, Jacint; Larkin, Denis M.

    2016-01-01

    Abstract Understanding how mammalian genomes have been reshuffled through structural changes is fundamental to the dynamics of its composition, evolutionary relationships between species and, in the long run, speciation. In this work, we reveal the evolutionary genomic landscape in Rodentia, the most diverse and speciose mammalian order, by whole-genome comparisons of six rodent species and six representative outgroup mammalian species. The reconstruction of the evolutionary breakpoint regions across rodent phylogeny shows an increased rate of genome reshuffling that is approximately two orders of magnitude greater than in other mammalian species here considered. We identified novel lineage and clade-specific breakpoint regions within Rodentia and analyzed their gene content, recombination rates and their relationship with constitutive lamina genomic associated domains, DNase I hypersensitivity sites and chromatin modifications. We detected an accumulation of protein-coding genes in evolutionary breakpoint regions, especially genes implicated in reproduction and pheromone detection and mating. Moreover, we found an association of the evolutionary breakpoint regions with active chromatin state landscapes, most probably related to gene enrichment. Our results have two important implications for understanding the mechanisms that govern and constrain mammalian genome evolution. The first is that the presence of genes related to species-specific phenotypes in evolutionary breakpoint regions reinforces the adaptive value of genome reshuffling. Second, that chromatin conformation, an aspect that has been often overlooked in comparative genomic studies, might play a role in modeling the genomic distribution of evolutionary breakpoints. PMID:28175287

  18. Mammalian Comparative Genomics Reveals Genetic and Epigenetic Features Associated with Genome Reshuffling in Rodentia.

    PubMed

    Capilla, Laia; Sánchez-Guillén, Rosa Ana; Farré, Marta; Paytuví-Gallart, Andreu; Malinverni, Roberto; Ventura, Jacint; Larkin, Denis M; Ruiz-Herrera, Aurora

    2016-12-01

    Understanding how mammalian genomes have been reshuffled through structural changes is fundamental to the dynamics of its composition, evolutionary relationships between species and, in the long run, speciation. In this work, we reveal the evolutionary genomic landscape in Rodentia, the most diverse and speciose mammalian order, by whole-genome comparisons of six rodent species and six representative outgroup mammalian species. The reconstruction of the evolutionary breakpoint regions across rodent phylogeny shows an increased rate of genome reshuffling that is approximately two orders of magnitude greater than in other mammalian species here considered. We identified novel lineage and clade-specific breakpoint regions within Rodentia and analyzed their gene content, recombination rates and their relationship with constitutive lamina genomic associated domains, DNase I hypersensitivity sites and chromatin modifications. We detected an accumulation of protein-coding genes in evolutionary breakpoint regions, especially genes implicated in reproduction and pheromone detection and mating. Moreover, we found an association of the evolutionary breakpoint regions with active chromatin state landscapes, most probably related to gene enrichment. Our results have two important implications for understanding the mechanisms that govern and constrain mammalian genome evolution. The first is that the presence of genes related to species-specific phenotypes in evolutionary breakpoint regions reinforces the adaptive value of genome reshuffling. Second, that chromatin conformation, an aspect that has been often overlooked in comparative genomic studies, might play a role in modeling the genomic distribution of evolutionary breakpoints.

  19. Comparative Genomic Analyses of the Human NPHP1 Locus Reveal Complex Genomic Architecture and Its Regional Evolution in Primates

    PubMed Central

    Yuan, Bo; Liu, Pengfei; Gupta, Aditya; Beck, Christine R.; Tejomurtula, Anusha; Campbell, Ian M.; Gambin, Tomasz; Simmons, Alexandra D.; Withers, Marjorie A.; Harris, R. Alan; Rogers, Jeffrey; Schwartz, David C.; Lupski, James R.

    2015-01-01

    Many loci in the human genome harbor complex genomic structures that can result in susceptibility to genomic rearrangements leading to various genomic disorders. Nephronophthisis 1 (NPHP1, MIM# 256100) is an autosomal recessive disorder that can be caused by defects of NPHP1; the gene maps within the human 2q13 region where low copy repeats (LCRs) are abundant. Loss of function of NPHP1 is responsible for approximately 85% of the NPHP1 cases—about 80% of such individuals carry a large recurrent homozygous NPHP1 deletion that occurs via nonallelic homologous recombination (NAHR) between two flanking directly oriented ~45 kb LCRs. Published data revealed a non-pathogenic inversion polymorphism involving the NPHP1 gene flanked by two inverted ~358 kb LCRs. Using optical mapping and array-comparative genomic hybridization, we identified three potential novel structural variant (SV) haplotypes at the NPHP1 locus that may protect a haploid genome from the NPHP1 deletion. Inter-species comparative genomic analyses among primate genomes revealed massive genomic changes during evolution. The aggregated data suggest that dynamic genomic rearrangements occurred historically within the NPHP1 locus and generated SV haplotypes observed in the human population today, which may confer differential susceptibility to genomic instability and the NPHP1 deletion within a personal genome. Our study documents diverse SV haplotypes at a complex LCR-laden human genomic region. Comparative analyses provide a model for how this complex region arose during primate evolution, and studies among humans suggest that intra-species polymorphism may potentially modulate an individual’s susceptibility to acquiring disease-associated alleles. PMID:26641089

  20. Identification of rhizome-specific genes by genome-wide differential expression Analysis in Oryza longistaminata

    PubMed Central

    2011-01-01

    Background Rhizomatousness is a key component of perenniality of many grasses that contribute to competitiveness and invasiveness of many noxious grass weeds, but can potentially be used to develop perennial cereal crops for sustainable farmers in hilly areas of tropical Asia. Oryza longistaminata, a perennial wild rice with strong rhizomes, has been used as the model species for genetic and molecular dissection of rhizome development and in breeding efforts to transfer rhizome-related traits into annual rice species. In this study, an effort was taken to get insights into the genes and molecular mechanisms underlying the rhizomatous trait in O. longistaminata by comparative analysis of the genome-wide tissue-specific gene expression patterns of five different tissues of O. longistaminata using the Affymetrix GeneChip Rice Genome Array. Results A total of 2,566 tissue-specific genes were identified in five different tissues of O. longistaminata, including 58 and 61 unique genes that were specifically expressed in the rhizome tips (RT) and internodes (RI), respectively. In addition, 162 genes were up-regulated and 261 genes were down-regulated in RT compared to the shoot tips. Six distinct cis-regulatory elements (CGACG, GCCGCC, GAGAC, AACGG, CATGCA, and TAAAG) were found to be significantly more abundant in the promoter regions of genes differentially expressed in RT than in the promoter regions of genes uniformly expressed in all other tissues. Many of the RT and/or RI specifically or differentially expressed genes were located in the QTL regions associated with rhizome expression, rhizome abundance and rhizome growth-related traits in O. longistaminata and thus are good candidate genes for these QTLs. Conclusion The initiation and development of the rhizomatous trait in O. longistaminata are controlled by very complex gene networks involving several plant hormones and regulatory genes, different members of gene families showing tissue specificity and their

  1. Identification of rhizome-specific genes by genome-wide differential expression analysis in Oryza longistaminata.

    PubMed

    Hu, Fengyi; Wang, Di; Zhao, Xiuqin; Zhang, Ting; Sun, Haixi; Zhu, Linghua; Zhang, Fan; Li, Lijuan; Li, Qiong; Tao, Dayun; Fu, Binying; Li, Zhikang

    2011-01-24

    Rhizomatousness is a key component of perenniality of many grasses that contribute to competitiveness and invasiveness of many noxious grass weeds, but can potentially be used to develop perennial cereal crops for sustainable farmers in hilly areas of tropical Asia. Oryza longistaminata, a perennial wild rice with strong rhizomes, has been used as the model species for genetic and molecular dissection of rhizome development and in breeding efforts to transfer rhizome-related traits into annual rice species. In this study, an effort was taken to get insights into the genes and molecular mechanisms underlying the rhizomatous trait in O. longistaminata by comparative analysis of the genome-wide tissue-specific gene expression patterns of five different tissues of O. longistaminata using the Affymetrix GeneChip Rice Genome Array. A total of 2,566 tissue-specific genes were identified in five different tissues of O. longistaminata, including 58 and 61 unique genes that were specifically expressed in the rhizome tips (RT) and internodes (RI), respectively. In addition, 162 genes were up-regulated and 261 genes were down-regulated in RT compared to the shoot tips. Six distinct cis-regulatory elements (CGACG, GCCGCC, GAGAC, AACGG, CATGCA, and TAAAG) were found to be significantly more abundant in the promoter regions of genes differentially expressed in RT than in the promoter regions of genes uniformly expressed in all other tissues. Many of the RT and/or RI specifically or differentially expressed genes were located in the QTL regions associated with rhizome expression, rhizome abundance and rhizome growth-related traits in O. longistaminata and thus are good candidate genes for these QTLs. The initiation and development of the rhizomatous trait in O. longistaminata are controlled by very complex gene networks involving several plant hormones and regulatory genes, different members of gene families showing tissue specificity and their regulated pathways. Auxin

  2. Comparative Genomics Identifies Epidermal Proteins Associated with the Evolution of the Turtle Shell.

    PubMed

    Holthaus, Karin Brigit; Strasser, Bettina; Sipos, Wolfgang; Schmidt, Heiko A; Mlitz, Veronika; Sukseree, Supawadee; Weissenbacher, Anton; Tschachler, Erwin; Alibardi, Lorenzo; Eckhart, Leopold

    2016-03-01

    The evolution of reptiles, birds, and mammals was associated with the origin of unique integumentary structures. Studies on lizards, chicken, and humans have suggested that the evolution of major structural proteins of the outermost, cornified layers of the epidermis was driven by the diversification of a gene cluster called Epidermal Differentiation Complex (EDC). Turtles have evolved unique defense mechanisms that depend on mechanically resilient modifications of the epidermis. To investigate whether the evolution of the integument in these reptiles was associated with specific adaptations of the sequences and expression patterns of EDC-related genes, we utilized newly available genome sequences to determine the epidermal differentiation gene complement of turtles. The EDC of the western painted turtle (Chrysemys picta bellii) comprises more than 100 genes, including at least 48 genes that encode proteins referred to as beta-keratins or corneous beta-proteins. Several EDC proteins have evolved cysteine/proline contents beyond 50% of total amino acid residues. Comparative genomics suggests that distinct subfamilies of EDC genes have been expanded and partly translocated to loci outside of the EDC in turtles. Gene expression analysis in the European pond turtle (Emys orbicularis) showed that EDC genes are differentially expressed in the skin of the various body sites and that a subset of beta-keratin genes within the EDC as well as those located outside of the EDC are expressed predominantly in the shell. Our findings give strong support to the hypothesis that the evolutionary innovation of the turtle shell involved specific molecular adaptations of epidermal differentiation. © The Author 2015. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.

  3. MIPSPlantsDB—plant database resource for integrative and comparative plant genome research

    PubMed Central

    Spannagl, Manuel; Noubibou, Octave; Haase, Dirk; Yang, Li; Gundlach, Heidrun; Hindemitt, Tobias; Klee, Kathrin; Haberer, Georg; Schoof, Heiko; Mayer, Klaus F. X.

    2007-01-01

    Genome-oriented plant research delivers rapidly increasing amount of plant genome data. Comprehensive and structured information resources are required to structure and communicate genome and associated analytical data for model organisms as well as for crops. The increase in available plant genomic data enables powerful comparative analysis and integrative approaches. PlantsDB aims to provide data and information resources for individual plant species and in addition to build a platform for integrative and comparative plant genome research. PlantsDB is constituted from genome databases for Arabidopsis, Medicago, Lotus, rice, maize and tomato. Complementary data resources for cis elements, repetive elements and extensive cross-species comparisons are implemented. The PlantsDB portal can be reached at . PMID:17202173

  4. Comparative Genomics of Bacillus species and its Relevance in Industrial Microbiology.

    PubMed

    Sharma, Archana; Satyanarayana, T

    2013-01-01

    With the advent of high throughput sequencing platforms and relevant analytical tools, the rate of microbial genome sequencing has accelerated which has in turn led to better understanding of microbial molecular biology and genetics. The complete genome sequences of important industrial organisms provide opportunities for human health, industry, and the environment. Bacillus species are the dominant workhorses in industrial fermentations. Today, genome sequences of several Bacillus species are available, and comparative genomics of this genus helps in understanding their physiology, biochemistry, and genetics. The genomes of these bacterial species are the sources of many industrially important enzymes and antibiotics and, therefore, provide an opportunity to tailor enzymes with desired properties to suit a wide range of applications. A comparative account of strengths and weaknesses of the different sequencing platforms are also highlighted in the review.

  5. Comparative mapping in intraspecific populations uncovers a high degree of macrosynteny between A- and B-genome diploid species of peanut

    PubMed Central

    2012-01-01

    Background Cultivated peanut or groundnut (Arachis hypogaea L.) is an important oilseed crop with an allotetraploid genome (AABB, 2n = 4x = 40). Both the low level of genetic variation within the cultivated gene pool and its polyploid nature limit the utilization of molecular markers to explore genome structure and facilitate genetic improvement. Nevertheless, a wealth of genetic diversity exists in diploid Arachis species (2n = 2x = 20), which represent a valuable gene pool for cultivated peanut improvement. Interspecific populations have been used widely for genetic mapping in diploid species of Arachis. However, an intraspecific mapping strategy was essential to detect chromosomal rearrangements among species that could be obscured by mapping in interspecific populations. To develop intraspecific reference linkage maps and gain insights into karyotypic evolution within the genus, we comparatively mapped the A- and B-genome diploid species using intraspecific F2 populations. Exploring genome organization among diploid peanut species by comparative mapping will enhance our understanding of the cultivated tetraploid peanut genome. Moreover, new sources of molecular markers that are highly transferable between species and developed from expressed genes will be required to construct saturated genetic maps for peanut. Results A total of 2,138 EST-SSR (expressed sequence tag-simple sequence repeat) markers were developed by mining a tetraploid peanut EST assembly including 101,132 unigenes (37,916 contigs and 63,216 singletons) derived from 70,771 long-read (Sanger) and 270,957 short-read (454) sequences. A set of 97 SSR markers were also developed by mining 9,517 genomic survey sequences of Arachis. An SSR-based intraspecific linkage map was constructed using an F2 population derived from a cross between K 9484 (PI 298639) and GKBSPSc 30081 (PI 468327) in the B-genome species A. batizocoi. A high degree of macrosynteny was observed when comparing the

  6. Comparative Genomics as a Foundation for Evo-Devo Studies in Birds.

    PubMed

    Grayson, Phil; Sin, Simon Y W; Sackton, Timothy B; Edwards, Scott V

    2017-01-01

    Developmental genomics is a rapidly growing field, and high-quality genomes are a useful foundation for comparative developmental studies. A high-quality genome forms an essential reference onto which the data from numerous assays and experiments, including ChIP-seq, ATAC-seq, and RNA-seq, can be mapped. A genome also streamlines and simplifies the development of primers used to amplify putative regulatory regions for enhancer screens, cDNA probes for in situ hybridization, microRNAs (miRNAs) or short hairpin RNAs (shRNA) for RNA interference (RNAi) knockdowns, mRNAs for misexpression studies, and even guide RNAs (gRNAs) for CRISPR knockouts. Finally, much can be gleaned from comparative genomics alone, including the identification of highly conserved putative regulatory regions. This chapter provides an overview of laboratory and bioinformatics protocols for DNA extraction, library preparation, library quantification, and genome assembly, from fresh or frozen tissue to a draft avian genome. Generating a high-quality draft genome can provide a developmental research group with excellent resources for their study organism, opening the doors to many additional assays and experiments.

  7. Systems perspectives on erythromycin biosynthesis by comparative genomic and transcriptomic analyses of S. erythraea E3 and NRRL23338 strains

    PubMed Central

    2013-01-01

    Background S. erythraea is a Gram-positive filamentous bacterium used for the industrial-scale production of erythromycin A which is of high clinical importance. In this work, we sequenced the whole genome of a high-producing strain (E3) obtained by random mutagenesis and screening from the wild-type strain NRRL23338, and examined time-series expression profiles of both E3 and NRRL23338. Based on the genomic data and transcriptpmic data of these two strains, we carried out comparative analysis of high-producing strain and wild-type strain at both the genomic level and the transcriptomic level. Results We observed a large number of genetic variants including 60 insertions, 46 deletions and 584 single nucleotide variations (SNV) in E3 in comparison with NRRL23338, and the analysis of time series transcriptomic data indicated that the genes involved in erythromycin biosynthesis and feeder pathways were significantly up-regulated during the 60 hours time-course. According to our data, BldD, a previously identified ery cluster regulator, did not show any positive correlations with the expression of ery cluster, suggesting the existence of alternative regulation mechanisms of erythromycin synthesis in S. erythraea. Several potential regulators were then proposed by integration analysis of genomic and transcriptomic data. Conclusion This is a demonstration of the functional comparative genomics between an industrial S. erythraea strain and the wild-type strain. These findings help to understand the global regulation mechanisms of erythromycin biosynthesis in S. erythraea, providing useful clues for genetic and metabolic engineering in the future. PMID:23902230

  8. What constitutes an Arabian Helicobacter pylori? Lessons from comparative genomics.

    PubMed

    Kumar, Narender; Albert, M John; Al Abkal, Hanan; Siddique, Iqbal; Ahmed, Niyaz

    2017-02-01

    Helicobacter pylori, the human gastric pathogen, causes a variety of gastric diseases ranging from mild gastritis to gastric cancer. While the studies on H. pylori are dominated by those based on either East Asian or Western strains, information regarding H. pylori strains prevalent in the Middle East remains scarce. Therefore, we carried out whole-genome sequencing and comparative analysis of three H. pylori strains isolated from three native Arab, Kuwaiti patients. H. pylori strains were sequenced using Illumina platform. The sequence reads were filtered and draft genomes were assembled and annotated. Various pathogenicity-associated regions and phages present within the genomes were identified. Phylogenetic analysis was carried out to determine the genetic relatedness of Kuwaiti strains to various lineages of H. pylori. The core genome content and virulence-related genes were analyzed to assess the pathogenic potential. The three genomes clustered along with HpEurope strains in the phylogenetic tree comprising various H. pylori lineages. A total of 1187 genes spread among various functional classes were identified in the core genome analysis. The three genomes possessed a complete cagPAI and also retained most of the known outer membrane proteins as well as virulence-related genes. The cagA gene in all three strains consisted of an AB-C type EPIYA motif. The comparative genomic analysis of Kuwaiti H. pylori strains revealed a European ancestry and a high pathogenic potential. © 2016 John Wiley & Sons Ltd.

  9. Cocoa/Cotton Comparative Genomics

    USDA-ARS?s Scientific Manuscript database

    With genome sequence from two members of the Malvaceae family recently made available, we are exploring syntenic relationships, gene content, and evolutionary trajectories between the cacao and cotton genomes. An assembly of cacao (Theobroma cacao) using Illumina and 454 sequence technology yielded ...

  10. Comparative genome analysis of Basidiomycete fungi

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Riley, Robert; Salamov, Asaf; Henrissat, Bernard

    Fungi of the phylum Basidiomycota (basidiomycetes), make up some 37percent of the described fungi, and are important in forestry, agriculture, medicine, and bioenergy. This diverse phylum includes symbionts, pathogens, and saprotrophs including the majority of wood decaying and ectomycorrhizal species. To better understand the genetic diversity of this phylum we compared the genomes of 35 basidiomycetes including 6 newly sequenced genomes. These genomes span extremes of genome size, gene number, and repeat content. Analysis of core genes reveals that some 48percent of basidiomycete proteins are unique to the phylum with nearly half of those (22percent) found in only one organism.more » Correlations between lifestyle and certain gene families are evident. Phylogenetic patterns of plant biomass-degrading genes in Agaricomycotina suggest a continuum rather than a dichotomy between the white rot and brown rot modes of wood decay. Based on phylogenetically-informed PCA analysis of wood decay genes, we predict that that Botryobasidium botryosum and Jaapia argillacea have properties similar to white rot species, although neither has typical ligninolytic class II fungal peroxidases (PODs). This prediction is supported by growth assays in which both fungi exhibit wood decay with white rot-like characteristics. Based on this, we suggest that the white/brown rot dichotomy may be inadequate to describe the full range of wood decaying fungi. Analysis of the rate of discovery of proteins with no or few homologs suggests the value of continued sequencing of basidiomycete fungi.« less

  11. Comparative ruminant genomics highlights segmental duplication and mobile element insertion diversity

    USDA-ARS?s Scientific Manuscript database

    We have expanded upon a previously reported comparative genomics approach using a read-depth (JaRMs) and a hybrid read-pair, split-read (RAPTR-SV) copy number variation (CNV) detection method that uses read alignments to the cattle reference genome in order to identify species-specific genomic rearr...

  12. Genome-wide transcriptome and expression profile analysis of Phalaenopsis during explant browning.

    PubMed

    Xu, Chuanjun; Zeng, Biyu; Huang, Junmei; Huang, Wen; Liu, Yumei

    2015-01-01

    Explant browning presents a major problem for in vitro culture, and can lead to the death of the explant and failure of regeneration. Considerable work has examined the physiological mechanisms underlying Phalaenopsis leaf explant browning, but the molecular mechanisms of browning remain elusive. In this study, we used whole genome RNA sequencing to examine Phalaenopsis leaf explant browning at genome-wide level. We first used Illumina high-throughput technology to sequence the transcriptome of Phalaenopsis and then performed de novo transcriptome assembly. We assembled 79,434,350 clean reads into 31,708 isogenes and generated 26,565 annotated unigenes. We assigned Gene Ontology (GO) terms, Kyoto Encyclopedia of Genes and Genomes (KEGG) annotations, and potential Pfam domains to each transcript. Using the transcriptome data as a reference, we next analyzed the differential gene expression of explants cultured for 0, 3, and 6 d, respectively. We then identified differentially expressed genes (DEGs) before and after Phalaenopsis explant browning. We also performed GO, KEGG functional enrichment and Pfam analysis of all DEGs. Finally, we selected 11 genes for quantitative real-time PCR (qPCR) analysis to confirm the expression profile analysis. Here, we report the first comprehensive analysis of transcriptome and expression profiles during Phalaenopsis explant browning. Our results suggest that Phalaenopsis explant browning may be due in part to gene expression changes that affect the secondary metabolism, such as: phenylpropanoid pathway and flavonoid biosynthesis. Genes involved in photosynthesis and ATPase activity have been found to be changed at transcription level; these changes may perturb energy metabolism and thus lead to the decay of plant cells and tissues. This study provides comprehensive gene expression data for Phalaenopsis browning. Our data constitute an important resource for further functional studies to prevent explant browning.

  13. Genome-Wide Transcriptome and Expression Profile Analysis of Phalaenopsis during Explant Browning

    PubMed Central

    Xu, Chuanjun; Zeng, Biyu; Huang, Junmei; Huang, Wen; Liu, Yumei

    2015-01-01

    Background Explant browning presents a major problem for in vitro culture, and can lead to the death of the explant and failure of regeneration. Considerable work has examined the physiological mechanisms underlying Phalaenopsis leaf explant browning, but the molecular mechanisms of browning remain elusive. In this study, we used whole genome RNA sequencing to examine Phalaenopsis leaf explant browning at genome-wide level. Methodology/Principal Findings We first used Illumina high-throughput technology to sequence the transcriptome of Phalaenopsis and then performed de novo transcriptome assembly. We assembled 79,434,350 clean reads into 31,708 isogenes and generated 26,565 annotated unigenes. We assigned Gene Ontology (GO) terms, Kyoto Encyclopedia of Genes and Genomes (KEGG) annotations, and potential Pfam domains to each transcript. Using the transcriptome data as a reference, we next analyzed the differential gene expression of explants cultured for 0, 3, and 6 d, respectively. We then identified differentially expressed genes (DEGs) before and after Phalaenopsis explant browning. We also performed GO, KEGG functional enrichment and Pfam analysis of all DEGs. Finally, we selected 11 genes for quantitative real-time PCR (qPCR) analysis to confirm the expression profile analysis. Conclusions/Significance Here, we report the first comprehensive analysis of transcriptome and expression profiles during Phalaenopsis explant browning. Our results suggest that Phalaenopsis explant browning may be due in part to gene expression changes that affect the secondary metabolism, such as: phenylpropanoid pathway and flavonoid biosynthesis. Genes involved in photosynthesis and ATPase activity have been found to be changed at transcription level; these changes may perturb energy metabolism and thus lead to the decay of plant cells and tissues. This study provides comprehensive gene expression data for Phalaenopsis browning. Our data constitute an important resource for further

  14. Strategies used for genetically modifying bacterial genome: ite-directed mutagenesis, gene inactivation, and gene over-expression*

    PubMed Central

    Xu, Jian-zhong; Zhang, Wei-guo

    2016-01-01

    With the availability of the whole genome sequence of Escherichia coli or Corynebacterium glutamicum, strategies for directed DNA manipulation have developed rapidly. DNA manipulation plays an important role in understanding the function of genes and in constructing novel engineering bacteria according to requirement. DNA manipulation involves modifying the autologous genes and expressing the heterogenous genes. Two alternative approaches, using electroporation linear DNA or recombinant suicide plasmid, allow a wide variety of DNA manipulation. However, the over-expression of the desired gene is generally executed via plasmid-mediation. The current review summarizes the common strategies used for genetically modifying E. coli and C. glutamicum genomes, and discusses the technical problem of multi-layered DNA manipulation. Strategies for gene over-expression via integrating into genome are proposed. This review is intended to be an accessible introduction to DNA manipulation within the bacterial genome for novices and a source of the latest experimental information for experienced investigators. PMID:26834010

  15. A Genome-Wide Landscape of Retrocopies in Primate Genomes.

    PubMed

    Navarro, Fábio C P; Galante, Pedro A F

    2015-07-29

    Gene duplication is a key factor contributing to phenotype diversity across and within species. Although the availability of complete genomes has led to the extensive study of genomic duplications, the dynamics and variability of gene duplications mediated by retrotransposition are not well understood. Here, we predict mRNA retrotransposition and use comparative genomics to investigate their origin and variability across primates. Analyzing seven anthropoid primate genomes, we found a similar number of mRNA retrotranspositions (∼7,500 retrocopies) in Catarrhini (Old Word Monkeys, including humans), but a surprising large number of retrocopies (∼10,000) in Platyrrhini (New World Monkeys), which may be a by-product of higher long interspersed nuclear element 1 activity in these genomes. By inferring retrocopy orthology, we dated most of the primate retrocopy origins, and estimated a decrease in the fixation rate in recent primate history, implying a smaller number of species-specific retrocopies. Moreover, using RNA-Seq data, we identified approximately 3,600 expressed retrocopies. As expected, most of these retrocopies are located near or within known genes, present tissue-specific and even species-specific expression patterns, and no expression correlation to their parental genes. Taken together, our results provide further evidence that mRNA retrotransposition is an active mechanism in primate evolution and suggest that retrocopies may not only introduce great genetic variability between lineages but also create a large reservoir of potentially functional new genomic loci in primate genomes. © The Author(s) 2015. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.

  16. Genome profiling of ovarian adenocarcinomas using pangenomic BACs microarray comparative genomic hybridization

    PubMed Central

    Caserta, Donatella; Benkhalifa, Moncef; Baldi, Marina; Fiorentino, Francesco; Qumsiyeh, Mazin; Moscarini, Massimo

    2008-01-01

    Background Routine cytogenetic investigations for ovarian cancers are limited by culture failure and poor growth of cancer cells compared to normal cells. Fluorescence in situ Hybridization (FISH) application or classical comparative genome hybridization techniques are also have their own limitations in detecting genome imbalance especially for small changes that are not known ahead of time and for which FISH probes could not be thus designed. Methods We applied microarray comparative genomic hybridization (A-CGH) using one mega base BAC arrays to investigate chromosomal disorders in ovarian adenocarcinoma in patients with familial history. Results Our data on 10 cases of ovarian cancer revealed losses of 6q (4 cases mainly mosaic loss), 9p (4 cases), 10q (3 cases), 21q (3 cases), 22q (4 cases) with association to a monosomy X and gains of 8q and 9q (occurring together in 8 cases) and gain of 12p. There were other abnormalities such as loss of 17p that were noted in two profiles of the studied cases. Total or mosaic segmental gain of 2p, 3q, 4q, 7q and 13q were also observed. Seven of 10 patients were investigated by FISH to control array CGH results. The FISH data showed a concordance between the 2 methods. Conclusion The data suggest that A-CGH detects unique and common abnormalities with certain exceptions such as tetraploidy and balanced translocation, which may lead to understanding progression of genetic changes as well as aid in early diagnosis and have an impact on therapy and prognosis. PMID:18492273

  17. Comparative genomics of the Bifidobacterium breve taxon.

    PubMed

    Bottacini, Francesca; O'Connell Motherway, Mary; Kuczynski, Justin; O'Connell, Kerry Joan; Serafini, Fausta; Duranti, Sabrina; Milani, Christian; Turroni, Francesca; Lugli, Gabriele Andrea; Zomer, Aldert; Zhurina, Daria; Riedel, Christian; Ventura, Marco; van Sinderen, Douwe

    2014-03-01

    Bifidobacteria are commonly found as part of the microbiota of the gastrointestinal tract (GIT) of a broad range of hosts, where their presence is positively correlated with the host's health status. In this study, we assessed the genomes of thirteen representatives of Bifidobacterium breve, which is not only a frequently encountered component of the (adult and infant) human gut microbiota, but can also be isolated from human milk and vagina. In silico analysis of genome sequences from thirteen B. breve strains isolated from different environments (infant and adult faeces, human milk, human vagina) shows that the genetic variability of this species principally consists of hypothetical genes and mobile elements, but, interestingly, also genes correlated with the adaptation to host environment and gut colonization. These latter genes specify the biosynthetic machinery for sortase-dependent pili and exopolysaccharide production, as well as genes that provide protection against invasion of foreign DNA (i.e. CRISPR loci and restriction/modification systems), and genes that encode enzymes responsible for carbohydrate fermentation. Gene-trait matching analysis showed clear correlations between known metabolic capabilities and characterized genes, and it also allowed the identification of a gene cluster involved in the utilization of the alcohol-sugar sorbitol. Genome analysis of thirteen representatives of the B. breve species revealed that the deduced pan-genome exhibits an essentially close trend. For this reason our analyses suggest that this number of B. breve representatives is sufficient to fully describe the pan-genome of this species. Comparative genomics also facilitated the genetic explanation for differential carbon source utilization phenotypes previously observed in different strains of B. breve.

  18. Comparative Life Cycle Transcriptomics Revises Leishmania mexicana Genome Annotation and Links a Chromosome Duplication with Parasitism of Vertebrates

    PubMed Central

    Fiebig, Michael; Kelly, Steven; Gluenz, Eva

    2015-01-01

    Leishmania spp. are protozoan parasites that have two principal life cycle stages: the motile promastigote forms that live in the alimentary tract of the sandfly and the amastigote forms, which are adapted to survive and replicate in the harsh conditions of the phagolysosome of mammalian macrophages. Here, we used Illumina sequencing of poly-A selected RNA to characterise and compare the transcriptomes of L. mexicana promastigotes, axenic amastigotes and intracellular amastigotes. These data allowed the production of the first transcriptome evidence-based annotation of gene models for this species, including genome-wide mapping of trans-splice sites and poly-A addition sites. The revised genome annotation encompassed 9,169 protein-coding genes including 936 novel genes as well as modifications to previously existing gene models. Comparative analysis of gene expression across promastigote and amastigote forms revealed that 3,832 genes are differentially expressed between promastigotes and intracellular amastigotes. A large proportion of genes that were downregulated during differentiation to amastigotes were associated with the function of the motile flagellum. In contrast, those genes that were upregulated included cell surface proteins, transporters, peptidases and many uncharacterized genes, including 293 of the 936 novel genes. Genome-wide distribution analysis of the differentially expressed genes revealed that the tetraploid chromosome 30 is highly enriched for genes that were upregulated in amastigotes, providing the first evidence of a link between this whole chromosome duplication event and adaptation to the vertebrate host in this group. Peptide evidence for 42 proteins encoded by novel transcripts supports the idea of an as yet uncharacterised set of small proteins in Leishmania spp. with possible implications for host-pathogen interactions. PMID:26452044

  19. Optimized paired-sgRNA/Cas9 cloning and expression cassette triggers high-efficiency multiplex genome editing in kiwifruit.

    PubMed

    Wang, Zupeng; Wang, Shuaibin; Li, Dawei; Zhang, Qiong; Li, Li; Zhong, Caihong; Liu, Yifei; Huang, Hongwen

    2018-01-13

    Kiwifruit is an important fruit crop; however, technologies for its functional genomic and molecular improvement are limited. The clustered regulatory interspaced short palindromic repeats (CRISPR)/CRISPR-associated protein (Cas) system has been successfully applied to genetic improvement in many crops, but its editing capability is variable depending on the different combinations of the synthetic guide RNA (sgRNA) and Cas9 protein expression devices. Optimizing conditions for its use within a particular species is therefore needed to achieve highly efficient genome editing. In this study, we developed a new cloning strategy for generating paired-sgRNA/Cas9 vectors containing four sgRNAs targeting the kiwifruit phytoene desaturase gene (AcPDS). Comparing to the previous method of paired-sgRNA cloning, our strategy only requires the synthesis of two gRNA-containing primers which largely reduces the cost. We further compared efficiencies of paired-sgRNA/Cas9 vectors containing different sgRNA expression devices, including both the polycistronic tRNA-sgRNA cassette (PTG) and the traditional CRISPR expression cassette. We found the mutagenesis frequency of the PTG/Cas9 system was 10-fold higher than that of the CRISPR/Cas9 system, coinciding with the relative expressions of sgRNAs in two different expression cassettes. In particular, we identified large chromosomal fragment deletions induced by the paired-sgRNAs of the PTG/Cas9 system. Finally, as expected, we found both systems can successfully induce the albino phenotype of kiwifruit plantlets regenerated from the G418-resistance callus lines. We conclude that the PTG/Cas9 system is a more powerful system than the traditional CRISPR/Cas9 system for kiwifruit genome editing, which provides valuable clues for optimizing CRISPR/Cas9 editing system in other plants. © 2018 The Authors. Plant Biotechnology Journal published by Society for Experimental Biology and The Association of Applied Biologists and John Wiley & Sons

  20. Comparative genomics of the marine bacterial genus Glaciecola reveals the high degree of genomic diversity and genomic characteristic for cold adaptation.

    PubMed

    Qin, Qi-Long; Xie, Bin-Bin; Yu, Yong; Shu, Yan-Li; Rong, Jin-Cheng; Zhang, Yan-Jiao; Zhao, Dian-Li; Chen, Xiu-Lan; Zhang, Xi-Ying; Chen, Bo; Zhou, Bai-Cheng; Zhang, Yu-Zhong

    2014-06-01

    To what extent the genomes of different species belonging to one genus can be diverse and the relationship between genomic differentiation and environmental factor remain unclear for oceanic bacteria. With many new bacterial genera and species being isolated from marine environments, this question warrants attention. In this study, we sequenced all the type strains of the published species of Glaciecola, a recently defined cold-adapted genus with species from diverse marine locations, to study the genomic diversity and cold-adaptation strategy in this genus.The genome size diverged widely from 3.08 to 5.96 Mb, which can be explained by massive gene gain and loss events. Horizontal gene transfer and new gene emergence contributed substantially to the genome size expansion. The genus Glaciecola had an open pan-genome. Comparative genomic research indicated that species of the genus Glaciecola had high diversity in genome size, gene content and genetic relatedness. This may be prevalent in marine bacterial genera considering the dynamic and complex environments of the ocean. Species of Glaciecola had some common genomic features related to cold adaptation, which enable them to thrive and play a role in biogeochemical cycle in the cold marine environments.

  1. The genomes and comparative genomics of Lactobacillus delbrueckii phages.

    PubMed

    Riipinen, Katja-Anneli; Forsman, Päivi; Alatossava, Tapani

    2011-07-01

    Lactobacillus delbrueckii phages are a great source of genetic diversity. Here, the genome sequences of Lb. delbrueckii phages LL-Ku, c5 and JCL1032 were analyzed in detail, and the genetic diversity of Lb. delbrueckii phages belonging to different taxonomic groups was explored. The lytic isometric group b phages LL-Ku (31,080 bp) and c5 (31,841 bp) showed a minimum nucleotide sequence identity of 90% over about three-fourths of their genomes. The genomic locations of their lysis modules were unique, and the genomes featured several putative overlapping transcription units of genes. LL-Ku and c5 virions displayed peptidoglycan hydrolytic activity associated with a ~36-kDa protein similar in size to the endolysin. Unexpectedly, the 49,433-bp genome of the prolate phage JCL1032 (temperate, group c) revealed a conserved gene order within its structural genes. Lb. delbrueckii phages representing groups a (a phage LL-H), b and c possessed only limited protein sequence homology. Genomic comparison of LL-Ku and c5 suggested that diversification of Lb. delbrueckii phages is mainly due to insertions, deletions and recombination. For the first time, the complete genome sequences of group b and c Lb. delbrueckii phages are reported.

  2. Comparative genomics of 9 novel Paenibacillus larvae bacteriophages

    PubMed Central

    Stamereilers, Casey; LeBlanc, Lucy; Yost, Diane; Amy, Penny S.; Tsourkas, Philippos K.

    2016-01-01

    ABSTRACT American Foulbrood Disease, caused by the bacterium Paenibacillus larvae, is one of the most destructive diseases of the honeybee, Apis mellifera. Our group recently published the sequences of 9 new phages with the ability to infect and lyse P. larvae. Here, we characterize the genomes of these P. larvae phages, compare them to each other and to other sequenced P. larvae phages, and putatively identify protein function. The phage genomes are 38–45 kb in size and contain 68–86 genes, most of which appear to be unique to P. larvae phages. We classify P. larvae phages into 2 main clusters and one singleton based on nucleotide sequence identity. Three of the new phages show sequence similarity to other sequenced P. larvae phages, while the remaining 6 do not. We identified functions for roughly half of the P. larvae phage proteins, including structural, assembly, host lysis, DNA replication/metabolism, regulatory, and host-related functions. Structural and assembly proteins are highly conserved among our phages and are located at the start of the genome. DNA replication/metabolism, regulatory, and host-related proteins are located in the middle and end of the genome, and are not conserved, with many of these genes found in some of our phages but not others. All nine phages code for a conserved N-acetylmuramoyl-L-alanine amidase. Comparative analysis showed the phages use the “cohesive ends with 3′ overhang” DNA packaging strategy. This work is the first in-depth study of P. larvae phage genomics, and serves as a marker for future work in this area. PMID:27738559

  3. Increased alignment sensitivity improves the usage of genome alignments for comparative gene annotation.

    PubMed

    Sharma, Virag; Hiller, Michael

    2017-08-21

    Genome alignments provide a powerful basis to transfer gene annotations from a well-annotated reference genome to many other aligned genomes. The completeness of these annotations crucially depends on the sensitivity of the underlying genome alignment. Here, we investigated the impact of the genome alignment parameters and found that parameters with a higher sensitivity allow the detection of thousands of novel alignments between orthologous exons that have been missed before. In particular, comparisons between species separated by an evolutionary distance of >0.75 substitutions per neutral site, like human and other non-placental vertebrates, benefit from increased sensitivity. To systematically test if increased sensitivity improves comparative gene annotations, we built a multiple alignment of 144 vertebrate genomes and used this alignment to map human genes to the other 143 vertebrates with CESAR. We found that higher alignment sensitivity substantially improves the completeness of comparative gene annotations by adding on average 2382 and 7440 novel exons and 117 and 317 novel genes for mammalian and non-mammalian species, respectively. Our results suggest a more sensitive alignment strategy that should generally be used for genome alignments between distantly-related species. Our 144-vertebrate genome alignment and the comparative gene annotations (https://bds.mpi-cbg.de/hillerlab/144VertebrateAlignment_CESAR/) are a valuable resource for comparative genomics. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.

  4. Comparative genomics of Enterococcus faecalis from healthy Norwegian infants

    PubMed Central

    Solheim, Margrete; Aakra, Ågot; Snipen, Lars G; Brede, Dag A; Nes, Ingolf F

    2009-01-01

    Background Enterococcus faecalis, traditionally considered a harmless commensal of the intestinal tract, is now ranked among the leading causes of nosocomial infections. In an attempt to gain insight into the genetic make-up of commensal E. faecalis, we have studied genomic variation in a collection of community-derived E. faecalis isolated from the feces of Norwegian infants. Results The E. faecalis isolates were first sequence typed by multilocus sequence typing (MLST) and characterized with respect to antibiotic resistance and properties associated with virulence. A subset of the isolates was compared to the vancomycin resistant strain E. faecalis V583 (V583) by whole genome microarray comparison (comparative genomic hybridization (CGH)). Several of the putative enterococcal virulence factors were found to be highly prevalent among the commensal baby isolates. The genomic variation as observed by CGH was less between isolates displaying the same MLST sequence type than between isolates belonging to different evolutionary lineages. Conclusion The variations in gene content observed among the investigated commensal E. faecalis is comparable to the genetic variation previously reported among strains of various origins thought to be representative of the major E. faecalis lineages. Previous MLST analysis of E. faecalis have identified so-called high-risk enterococcal clonal complexes (HiRECC), defined as genetically distinct subpopulations, epidemiologically associated with enterococcal infections. The observed correlation between CGH and MLST presented here, may offer a method for the identification of lineage-specific genes, and may therefore add clues on how to distinguish pathogenic from commensal E. faecalis. In this work, information on the core genome of E. faecalis is also substantially extended. PMID:19393078

  5. Arabidopsis transcription factors: genome-wide comparative analysis among eukaryotes.

    PubMed

    Riechmann, J L; Heard, J; Martin, G; Reuber, L; Jiang, C; Keddie, J; Adam, L; Pineda, O; Ratcliffe, O J; Samaha, R R; Creelman, R; Pilgrim, M; Broun, P; Zhang, J Z; Ghandehari, D; Sherman, B K; Yu, G

    2000-12-15

    The completion of the Arabidopsis thaliana genome sequence allows a comparative analysis of transcriptional regulators across the three eukaryotic kingdoms. Arabidopsis dedicates over 5% of its genome to code for more than 1500 transcription factors, about 45% of which are from families specific to plants. Arabidopsis transcription factors that belong to families common to all eukaryotes do not share significant similarity with those of the other kingdoms beyond the conserved DNA binding domains, many of which have been arranged in combinations specific to each lineage. The genome-wide comparison reveals the evolutionary generation of diversity in the regulation of transcription.

  6. Understanding the direction of evolution in Burkholderia glumae through comparative genomics.

    PubMed

    Lee, Hyun-Hee; Park, Jungwook; Kim, Jinnyun; Park, Inmyoung; Seo, Young-Su

    2016-02-01

    Members of the genus Burkholderia occupy remarkably diverse niches, with genome sizes ranging from ~3.75 to 11.29 Mbp. The genome of Burkholderia glumae ranges in size from ~5.81 to 7.89 Mbp. Unlike other plant pathogenic bacteria, B. glumae can infect a wide range of monocot and dicot plants. Comparative genome analysis of B. glumae strains can provide insight into genome variation as well as differential features of whole metabolism or pathways between multiple strains of B. glumae infecting the same host. Comparative analysis of complete genomes among B. glumae BGR1, B. glumae LMG 2196, and B. glumae PG1 revealed the largest departmentalization of genes onto separate replicons in B. glumae BGR1 and considerable downsizing of the genome in B. glumae LMG 2196. In addition, the presence of large-scale evolutionary events such as rearrangement and inversion and the development of highly specialized systems were found to be related to virulence-associated features in the three B. glumae strains. This connection may explain why this bacterium broadens its host range and reinforces its interaction with hosts.

  7. Evolutionary trajectories of snake genes and genomes revealed by comparative analyses of five-pacer viper

    PubMed Central

    Yin, Wei; Wang, Zong-ji; Li, Qi-ye; Lian, Jin-ming; Zhou, Yang; Lu, Bing-zheng; Jin, Li-jun; Qiu, Peng-xin; Zhang, Pei; Zhu, Wen-bo; Wen, Bo; Huang, Yi-jun; Lin, Zhi-long; Qiu, Bi-tao; Su, Xing-wen; Yang, Huan-ming; Zhang, Guo-jie; Yan, Guang-mei; Zhou, Qi

    2016-01-01

    Snakes have numerous features distinctive from other tetrapods and a rich history of genome evolution that is still obscure. Here, we report the high-quality genome of the five-pacer viper, Deinagkistrodon acutus, and comparative analyses with other representative snake and lizard genomes. We map the evolutionary trajectories of transposable elements (TEs), developmental genes and sex chromosomes onto the snake phylogeny. TEs exhibit dynamic lineage-specific expansion, and many viper TEs show brain-specific gene expression along with their nearby genes. We detect signatures of adaptive evolution in olfactory, venom and thermal-sensing genes and also functional degeneration of genes associated with vision and hearing. Lineage-specific relaxation of functional constraints on respective Hox and Tbx limb-patterning genes supports fossil evidence for a successive loss of forelimbs then hindlimbs during snake evolution. Finally, we infer that the ZW sex chromosome pair had undergone at least three recombination suppression events in the ancestor of advanced snakes. These results altogether forge a framework for our deep understanding into snakes' history of molecular evolution. PMID:27708285

  8. Comparative study of four interleukin 17 cytokines of tongue sole Cynoglossus semilaevis: Genomic structure, expression pattern, and promoter activity.

    PubMed

    Chi, Heng; Sun, Li

    2015-11-01

    The interleukin (IL)-17 cytokine family participates in the regulation of many cellular functions. In the present study, we analyzed the genomic structure, expression, and promoter activity of four IL-17 members from the teleost fish tongue sole (Cynoglossus semilaevis), i.e. CsIL-17C CsIL-17D, CsIL-17F, and IL-17F like (IL-17Fl). We found that CsIL-17C, CsIL-17D, CsIL-17F, and CsIL-17Fl share 21.2%-28.6% overall sequence identities among themselves and 31.5%-71.2% overall sequence identities with their counterparts in other teleost. All four CsIL-17 members possess an IL-17 domain and four conserved cysteine residues. Phylogenetic analysis classified the four CsIL-17 members into three clusters. Under normal physiological conditions, the four CsIL-17 expressed in multiple tissues, especially non-immune tissues. Bacterial infection upregulated the expression of all four CsIL-17, while viral infection upregulated the expression of CsIL-17D and CsIL-17Fl but downregulated the expression of CsIL-17C and CsIL-17F. The 1.2 kb 5'-flanking regions of the four CsIL-17 exhibited apparent promoter activity and contain a number of putative transcription factor-binding sites. Furthermore, the promoter activities of CsIL-17C, CsIL-17D, and CsIL-17F, but not CsIL-17Fl, were modulated to significant extents by lipopolysaccharide, PolyI:C, and PMA. This study provides the first evidence that in teleost, different IL-17 members differ in expression pattern and promoter activity. Copyright © 2015 Elsevier Ltd. All rights reserved.

  9. Epigenomics of Total Acute Sleep Deprivation in Relation to Genome-Wide DNA Methylation Profiles and RNA Expression.

    PubMed

    Nilsson, Emil K; Boström, Adrian E; Mwinyi, Jessica; Schiöth, Helgi B

    2016-06-01

    Despite an established link between sleep deprivation and epigenetic processes in humans, it remains unclear to what extent sleep deprivation modulates DNA methylation. We performed a within-subject randomized blinded study with 16 healthy subjects to examine the effect of one night of total sleep deprivation (TSD) on the genome-wide methylation profile in blood compared with that in normal sleep. Genome-wide differences in methylation between both conditions were assessed by applying a paired regression model that corrected for monocyte subpopulations. In addition, the correlations between the methylation of genes detected to be modulated by TSD and gene expression were examined in a separate, publicly available cohort of 10 healthy male donors (E-GEOD-49065). Sleep deprivation significantly affected the DNA methylation profile both independently and in dependency of shifts in monocyte composition. Our study detected differential methylation of 269 probes. Notably, one CpG site was located 69 bp upstream of ING5, which has been shown to be differentially expressed after sleep deprivation. Gene set enrichment analysis detected the Notch and Wnt signaling pathways to be enriched among the differentially methylated genes. These results provide evidence that total acute sleep deprivation alters the methylation profile in healthy human subjects. This is, to our knowledge, the first study that systematically investigated the impact of total acute sleep deprivation on genome-wide DNA methylation profiles in blood and related the epigenomic findings to the expression data.

  10. Aquatic Plant Genomics: Advances, Applications, and Prospects

    PubMed Central

    Li, Gaojie; Yang, Jingjing

    2017-01-01

    Genomics is a discipline in genetics that studies the genome composition of organisms and the precise structure of genes and their expression and regulation. Genomics research has resolved many problems where other biological methods have failed. Here, we summarize advances in aquatic plant genomics with a focus on molecular markers, the genes related to photosynthesis and stress tolerance, comparative study of genomes and genome/transcriptome sequencing technology. PMID:28900619

  11. ChloroMitoCU: Codon patterns across organelle genomes for functional genomics and evolutionary applications.

    PubMed

    Sablok, Gaurav; Chen, Ting-Wen; Lee, Chi-Ching; Yang, Chi; Gan, Ruei-Chi; Wegrzyn, Jill L; Porta, Nicola L; Nayak, Kinshuk C; Huang, Po-Jung; Varotto, Claudio; Tang, Petrus

    2017-06-01

    Organelle genomes are widely thought to have arisen from reduction events involving cyanobacterial and archaeal genomes, in the case of chloroplasts, or α-proteobacterial genomes, in the case of mitochondria. Heterogeneity in base composition and codon preference has long been the subject of investigation of topics ranging from phylogenetic distortion to the design of overexpression cassettes for transgenic expression. From the overexpression point of view, it is critical to systematically analyze the codon usage patterns of the organelle genomes. In light of the importance of codon usage patterns in the development of hyper-expression organelle transgenics, we present ChloroMitoCU, the first-ever curated, web-based reference catalog of the codon usage patterns in organelle genomes. ChloroMitoCU contains the pre-compiled codon usage patterns of 328 chloroplast genomes (29,960 CDS) and 3,502 mitochondrial genomes (49,066 CDS), enabling genome-wide exploration and comparative analysis of codon usage patterns across species. ChloroMitoCU allows the phylogenetic comparison of codon usage patterns across organelle genomes, the prediction of codon usage patterns based on user-submitted transcripts or assembled organelle genes, and comparative analysis with the pre-compiled patterns across species of interest. ChloroMitoCU can increase our understanding of the biased patterns of codon usage in organelle genomes across multiple clades. ChloroMitoCU can be accessed at: http://chloromitocu.cgu.edu.tw/. © The Author 2017. Published by Oxford University Press on behalf of Kazusa DNA Research Institute.

  12. Comparative genomics of transcriptional regulation of methionine metabolism in proteobacteria

    DOE PAGES

    Leyn, Semen A.; Suvorova, Inna A.; Kholina, Tatiana D.; ...

    2014-11-20

    Methionine metabolism and uptake genes in Proteobacteria are controlled by a variety of RNA and DNA regulatory systems. We have applied comparative genomics to reconstruct regulons for three known transcription factors, MetJ, MetR, and SahR, and three known riboswitch motifs, SAH, SAM-SAH, and SAM_alpha, in ~200 genomes from 22 taxonomic groups of Proteobacteria. We also identified two novel regulons: a SahR-like transcription factor SamR controlling various methionine biosynthesis genes in the Xanthomonadales group, and a potential RNA regulatory element with terminator-antiterminator mechanism controlling the metX or metZ genes in beta-proteobacteria. For each analyzed regulator we identified the core, taxon-specific andmore » genome-specific regulon members. By analyzing the distribution of these regulators in bacterial genomes and by comparing their regulon contents we elucidated possible evolutionary scenarios for the regulation of the methionine metabolism genes in Proteobacteria.« less

  13. Comparative Analysis of the First Complete Enterococcus faecium Genome

    PubMed Central

    Lam, Margaret M. C.; Seemann, Torsten; Bulach, Dieter M.; Gladman, Simon L.; Chen, Honglei; Haring, Volker; Moore, Robert J.; Ballard, Susan; Grayson, M. Lindsay; Johnson, Paul D. R.; Howden, Benjamin P.

    2012-01-01

    Vancomycin-resistant enterococci (VRE) are one of the leading causes of nosocomial infections in health care facilities around the globe. In particular, infections caused by vancomycin-resistant Enterococcus faecium are becoming increasingly common. Comparative and functional genomic studies of E. faecium isolates have so far been limited owing to the lack of a fully assembled E. faecium genome sequence. Here we address this issue and report the complete 3.0-Mb genome sequence of the multilocus sequence type 17 vancomycin-resistant Enterococcus faecium strain Aus0004, isolated from the bloodstream of a patient in Melbourne, Australia, in 1998. The genome comprises a 2.9-Mb circular chromosome and three circular plasmids. The chromosome harbors putative E. faecium virulence factors such as enterococcal surface protein, hemolysin, and collagen-binding adhesin. Aus0004 has a very large accessory genome (38%) that includes three prophage and two genomic islands absent among 22 other E. faecium genomes. One of the prophage was present as inverted 50-kb repeats that appear to have facilitated a 683-kb chromosomal inversion across the replication terminus, resulting in a striking replichore imbalance. Other distinctive features include 76 insertion sequence elements and a single chromosomal copy of Tn1549 containing the vanB vancomycin resistance element. A complete E. faecium genome will be a useful resource to assist our understanding of this emerging nosocomial pathogen. PMID:22366422

  14. Comparative genomic analysis of Mycobacterium tuberculosis clinical isolates.

    PubMed

    Liu, Fei; Hu, Yongfei; Wang, Qi; Li, Hong Min; Gao, George F; Liu, Cui Hua; Zhu, Baoli

    2014-06-13

    Due to excessive antibiotic use, drug-resistant Mycobacterium tuberculosis has become a serious public health threat and a major obstacle to disease control in many countries. To better understand the evolution of drug-resistant M. tuberculosis strains, we performed whole genome sequencing for 7 M. tuberculosis clinical isolates with different antibiotic resistance profiles and conducted comparative genomic analysis of gene variations among them. We observed that all 7 M. tuberculosis clinical isolates with different levels of drug resistance harbored similar numbers of SNPs, ranging from 1409-1464. The numbers of insertion/deletions (Indels) identified in the 7 isolates were also similar, ranging from 56 to 101. A total of 39 types of mutations were identified in drug resistance-associated loci, including 14 previously reported ones and 25 newly identified ones. Sixteen of the identified large Indels spanned PE-PPE-PGRS genes, which represents a major source of antigenic variability. Aside from SNPs and Indels, a CRISPR locus with varied spacers was observed in all 7 clinical isolates, suggesting that they might play an important role in plasticity of the M. tuberculosis genome. The nucleotide diversity (Л value) and selection intensity (dN/dS value) of the whole genome sequences of the 7 isolates were similar. The dN/dS values were less than 1 for all 7 isolates (range from 0.608885 to 0.637365), supporting the notion that M. tuberculosis genomes undergo purifying selection. The Л values and dN/dS values were comparable between drug-susceptible and drug-resistant strains. In this study, we show that clinical M. tuberculosis isolates exhibit distinct variations in terms of the distribution of SNP, Indels, CRISPR-cas locus, as well as the nucleotide diversity and selection intensity, but there are no generalizable differences between drug-susceptible and drug-resistant isolates on the genomic scale. Our study provides evidence strengthening the notion that

  15. In silico method for modelling metabolism and gene product expression at genome scale

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lerman, Joshua A.; Hyduke, Daniel R.; Latif, Haythem

    2012-07-03

    Transcription and translation use raw materials and energy generated metabolically to create the macromolecular machinery responsible for all cellular functions, including metabolism. A biochemically accurate model of molecular biology and metabolism will facilitate comprehensive and quantitative computations of an organism's molecular constitution as a function of genetic and environmental parameters. Here we formulate a model of metabolism and macromolecular expression. Prototyping it using the simple microorganism Thermotoga maritima, we show our model accurately simulates variations in cellular composition and gene expression. Moreover, through in silico comparative transcriptomics, the model allows the discovery of new regulons and improving the genome andmore » transcription unit annotations. Our method presents a framework for investigating molecular biology and cellular physiology in silico and may allow quantitative interpretation of multi-omics data sets in the context of an integrated biochemical description of an organism.« less

  16. Whole-genome comparative analysis of three phytopathogenic Xylella fastidiosa strains.

    PubMed

    Bhattacharyya, Anamitra; Stilwagen, Stephanie; Ivanova, Natalia; D'Souza, Mark; Bernal, Axel; Lykidis, Athanasios; Kapatral, Vinayak; Anderson, Iain; Larsen, Niels; Los, Tamara; Reznik, Gary; Selkov, Eugene; Walunas, Theresa L; Feil, Helene; Feil, William S; Purcell, Alexander; Lassez, Jean-Louis; Hawkins, Trevor L; Haselkorn, Robert; Overbeek, Ross; Predki, Paul F; Kyrpides, Nikos C

    2002-09-17

    Xylella fastidiosa (Xf) causes wilt disease in plants and is responsible for major economic and crop losses globally. Owing to the public importance of this phytopathogen we embarked on a comparative analysis of the complete genome of Xf pv citrus and the partial genomes of two recently sequenced strains of this species: Xf pv almond and Xf pv oleander, which cause leaf scorch in almond and oleander plants, respectively. We report a reanalysis of the previously sequenced Xf 9a5c (CVC, citrus) strain and the two "gapped" Xf genomes revealing ORFs encoding critical functions in pathogenicity and conjugative transfer. Second, a detailed whole-genome functional comparison was based on the three sequenced Xf strains, identifying the unique genes present in each strain, in addition to those shared between strains. Third, an "in silico" cellular reconstruction of these organisms was made, based on a comparison of their core functional subsystems that led to a characterization of their conjugative transfer machinery, identification of potential differences in their adhesion mechanisms, and highlighting of the absence of a classical quorum-sensing mechanism. This study demonstrates the effectiveness of comparative analysis strategies in the interpretation of genomes that are closely related.

  17. Genome-wide patterns of promoter sharing and co-expression in bovine skeletal muscle.

    PubMed

    Gu, Quan; Nagaraj, Shivashankar H; Hudson, Nicholas J; Dalrymple, Brian P; Reverter, Antonio

    2011-01-12

    Gene regulation by transcription factors (TF) is species, tissue and time specific. To better understand how the genetic code controls gene expression in bovine muscle we associated gene expression data from developing Longissimus thoracis et lumborum skeletal muscle with bovine promoter sequence information. We created a highly conserved genome-wide promoter landscape comprising 87,408 interactions relating 333 TFs with their 9,242 predicted target genes (TGs). We discovered that the complete set of predicted TGs share an average of 2.75 predicted TF binding sites (TFBSs) and that the average co-expression between a TF and its predicted TGs is higher than the average co-expression between the same TF and all genes. Conversely, pairs of TFs sharing predicted TGs showed a co-expression correlation higher that pairs of TFs not sharing TGs. Finally, we exploited the co-occurrence of predicted TFBS in the context of muscle-derived functionally-coherent modules including cell cycle, mitochondria, immune system, fat metabolism, muscle/glycolysis, and ribosome. Our findings enabled us to reverse engineer a regulatory network of core processes, and correctly identified the involvement of E2F1, GATA2 and NFKB1 in the regulation of cell cycle, fat, and muscle/glycolysis, respectively. The pivotal implication of our research is two-fold: (1) there exists a robust genome-wide expression signal between TFs and their predicted TGs in cattle muscle consistent with the extent of promoter sharing; and (2) this signal can be exploited to recover the cellular mechanisms underpinning transcription regulation of muscle structure and development in bovine. Our study represents the first genome-wide report linking tissue specific co-expression to co-regulation in a non-model vertebrate.

  18. Improving membrane protein expression and function using genomic edits

    DOE PAGES

    Jensen, Heather M.; Eng, Thomas; Chubukov, Victor; ...

    2017-10-12

    Expression of membrane proteins often leads to growth inhibition and perturbs central metabolism and this burden varies with the protein being overexpressed. There are also known strain backgrounds that allow greater expression of membrane proteins but that differ in efficacy across proteins. Here, we hypothesized that for any membrane protein, it may be possible to identify a modified strain background where its expression can be accommodated with less burden. To directly test this hypothesis, we used a bar-coded transposon insertion library in tandem with cell sorting to assess genome-wide impact of gene deletions on membrane protein expression. The expression ofmore » five membrane proteins (CyoB, CydB, MdlB, YidC, and LepI) and one soluble protein (GST), each fused to GFP, was examined. We identified Escherichia coli mutants that demonstrated increased membrane protein expression relative to that in wild type. For two of the proteins (CyoB and CydB), we conducted functional assays to confirm that the increase in protein expression also led to phenotypic improvement in function. This study represents a systematic approach to broadly identify genetic loci that can be used to improve membrane protein expression, and our method can be used to improve expression of any protein that poses a cellular burden.« less

  19. Improving membrane protein expression and function using genomic edits

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Jensen, Heather M.; Eng, Thomas; Chubukov, Victor

    Expression of membrane proteins often leads to growth inhibition and perturbs central metabolism and this burden varies with the protein being overexpressed. There are also known strain backgrounds that allow greater expression of membrane proteins but that differ in efficacy across proteins. Here, we hypothesized that for any membrane protein, it may be possible to identify a modified strain background where its expression can be accommodated with less burden. To directly test this hypothesis, we used a bar-coded transposon insertion library in tandem with cell sorting to assess genome-wide impact of gene deletions on membrane protein expression. The expression ofmore » five membrane proteins (CyoB, CydB, MdlB, YidC, and LepI) and one soluble protein (GST), each fused to GFP, was examined. We identified Escherichia coli mutants that demonstrated increased membrane protein expression relative to that in wild type. For two of the proteins (CyoB and CydB), we conducted functional assays to confirm that the increase in protein expression also led to phenotypic improvement in function. This study represents a systematic approach to broadly identify genetic loci that can be used to improve membrane protein expression, and our method can be used to improve expression of any protein that poses a cellular burden.« less

  20. The Genomic Impact of DNA CpG Methylation on Gene Expression; Relationships in Prostate Cancer.

    PubMed

    Long, Mark D; Smiraglia, Dominic J; Campbell, Moray J

    2017-02-14

    The process of DNA CpG methylation has been extensively investigated for over 50 years and revealed associations between changing methylation status of CpG islands and gene expression. As a result, DNA CpG methylation is implicated in the control of gene expression in developmental and homeostasis processes, as well as being a cancer-driver mechanism. The development of genome-wide technologies and sophisticated statistical analytical approaches has ushered in an era of widespread analyses, for example in the cancer arena, of the relationships between altered DNA CpG methylation, gene expression, and tumor status. The remarkable increase in the volume of such genomic data, for example, through investigators from the Cancer Genome Atlas (TCGA), has allowed dissection of the relationships between DNA CpG methylation density and distribution, gene expression, and tumor outcome. In this manner, it is now possible to test that the genome-wide correlations are measurable between changes in DNA CpG methylation and gene expression. Perhaps surprisingly is that these associations can only be detected for hundreds, but not thousands, of genes, and the direction of the correlations are both positive and negative. This, perhaps, suggests that CpG methylation events in cancer systems can act as disease drivers but the effects are possibly more restricted than suspected. Additionally, the positive and negative correlations suggest direct and indirect events and an incomplete understanding. Within the prostate cancer TCGA cohort, we examined the relationships between expression of genes that control DNA methylation, known targets of DNA methylation and tumor status. This revealed that genes that control the synthesis of S -adenosyl-l-methionine (SAM) associate with altered expression of DNA methylation targets in a subset of aggressive tumors.

  1. Genome-wide comparative transcriptome analysis of CMS-D2 and its maintainer and restorer lines in upland cotton.

    PubMed

    Wu, Jianyong; Zhang, Meng; Zhang, Bingbing; Zhang, Xuexian; Guo, Liping; Qi, Tingxiang; Wang, Hailin; Zhang, Jinfa; Xing, Chaozhu

    2017-06-08

    Cytoplasmic male sterility (CMS) conferred by the cytoplasm from Gossypium harknessii (D2) is an important system for hybrid seed production in Upland cotton (G. hirsutum). The male sterility of CMS-D2 (i.e., A line) can be restored to fertility by a restorer (i.e., R line) carrying the restorer gene Rf1 transferred from the D2 nuclear genome. However, the molecular mechanisms of CMS-D2 and its restoration are poorly understood. In this study, a genome-wide comparative transcriptome analysis was performed to identify differentially expressed genes (DEGs) in flower buds among the isogenic fertile R line and sterile A line derived from a backcross population (BC 8 F 1 ) and the recurrent parent, i.e., the maintainer (B line). A total of 1464 DEGs were identified among the three isogenic lines, and the Rf1-carrying Chr_D05 and its homeologous Chr_A05 had more DEGs than other chromosomes. The results of GO and KEGG enrichment analysis showed differences in circadian rhythm between the fertile and sterile lines. Eleven DEGs were selected for validation using qRT-PCR, confirming the accuracy of the RNA-seq results. Through genome-wide comparative transcriptome analysis, the differential expression profiles of CMS-D2 and its maintainer and restorer lines in Upland cotton were identified. Our results provide an important foundation for further studies into the molecular mechanisms of the interactions between the restorer gene Rf1 and the CMS-D2 cytoplasm.

  2. Xp11 translocation renal cell carcinoma in adults: a clinicopathological and comparative genomic hybridization study.

    PubMed

    Zou, Hong; Kang, Xueling; Pang, Li-Juan; Hu, Wenhao; Zhao, Jin; Qi, Yan; Hu, Jianming; Liu, Chunxia; Li, Hongan; Liang, Weihua; Yuan, Xianglin; Li, Feng

    2014-01-01

    To study the clinicopathological and genomic characteristics of Xp11.2 translocation renal cell carcinoma (Xp11.2 RCC) in adults, we analyzed 9 Xp11.2 RCCs, confirmed by transcription factor E3 (TFE3) immunohistochemistry, in patients aged ≥20 years. TFE3 expression was also determined in 12 cases of alveolar soft part sarcoma (ASPS) served as a positive control. Comparative genomic hybridization (CGH) was used to investigate genomic imbalances in all Xp11.2 RCC cases. Most of our Xp11.2 RCC patients (5/9) presented with TNM stages 3-4, and 6 patients died 10 months to 7 years after their operation. Histologically, Xp11.2 RCC was composed of a mixed papillary nested/alveolar growth pattern (8/9). Immunostaining showed that all Xp11.2 RCC and ASPS cases had strong TFE3 expression and high positive ratios for p53 and vimentin. However, there were significant differences in the expression of AMACR (p<0.001), AE1/AE3 (p=0.002), and CD10 (p=0.024) between the 2 diseases. CGH profiles showed chromosomal imbalances in all 9 Xp11.2 RCC cases; gains were observed in chromosomes Xp11 (6/9), 7q20-25, 12q25-31 (5/9), 7p16-24 (4/9), 8p12-13, 8q20-21, 16q20-22, 17q25-26, 20q22-23 (4/9), and losses occurred frequently on chromosomes 3p12-16, 9q31-32, 14q22-24 (4/9). Our Conclusions show Xp11.2 RCC that occur in adults may be aggressive cancers, the expressions of AMACR, CD10, AE1/AE3 are helpful in the differential diagnosis between Xp11.2 RCC and ASPS, and CGH assay is a useful complementary method for confirming the diagnosis of Xp11.2 RCC.

  3. Xp11 translocation renal cell carcinoma in adults: a clinicopathological and comparative genomic hybridization study

    PubMed Central

    Zou, Hong; Kang, Xueling; Pang, Li-Juan; Hu, Wenhao; Zhao, Jin; Qi, Yan; Hu, Jianming; Liu, Chunxia; Li, Hongan; Liang, Weihua; Yuan, Xianglin; Li, Feng

    2014-01-01

    To study the clinicopathological and genomic characteristics of Xp11.2 translocation renal cell carcinoma (Xp11.2 RCC) in adults, we analyzed 9 Xp11.2 RCCs, confirmed by transcription factor E3 (TFE3) immunohistochemistry, in patients aged ≥20 years. TFE3 expression was also determined in 12 cases of alveolar soft part sarcoma (ASPS) served as a positive control. Comparative genomic hybridization (CGH) was used to investigate genomic imbalances in all Xp11.2 RCC cases. Most of our Xp11.2 RCC patients (5/9) presented with TNM stages 3-4, and 6 patients died 10 months to 7 years after their operation. Histologically, Xp11.2 RCC was composed of a mixed papillary nested/alveolar growth pattern (8/9). Immunostaining showed that all Xp11.2 RCC and ASPS cases had strong TFE3 expression and high positive ratios for p53 and vimentin. However, there were significant differences in the expression of AMACR (p<0.001), AE1/AE3 (p=0.002), and CD10 (p=0.024) between the 2 diseases. CGH profiles showed chromosomal imbalances in all 9 Xp11.2 RCC cases; gains were observed in chromosomes Xp11 (6/9), 7q20-25, 12q25-31 (5/9), 7p16-24 (4/9), 8p12-13, 8q20-21, 16q20-22, 17q25-26, 20q22-23 (4/9), and losses occurred frequently on chromosomes 3p12-16, 9q31-32, 14q22-24 (4/9). Our Conclusions show Xp11.2 RCC that occur in adults may be aggressive cancers, the expressions of AMACR, CD10, AE1/AE3 are helpful in the differential diagnosis between Xp11.2 RCC and ASPS, and CGH assay is a useful complementary method for confirming the diagnosis of Xp11.2 RCC. PMID:24427344

  4. Industrial Acetogenic Biocatalysts: A Comparative Metabolic and Genomic Analysis

    PubMed Central

    Bengelsdorf, Frank R.; Poehlein, Anja; Linder, Sonja; Erz, Catarina; Hummel, Tim; Hoffmeister, Sabrina; Daniel, Rolf; Dürre, Peter

    2016-01-01

    Synthesis gas (syngas) fermentation by anaerobic acetogenic bacteria employing the Wood–Ljungdahl pathway is a bioprocess for production of biofuels and biocommodities. The major fermentation products of the most relevant biocatalytic strains (Clostridium ljungdahlii, C. autoethanogenum, C. ragsdalei, and C. coskatii) are acetic acid and ethanol. A comparative metabolic and genomic analysis using the mentioned biocatalysts might offer targets for metabolic engineering and thus improve the production of compounds apart from ethanol. Autotrophic growth and product formation of the four wild type (WT) strains were compared in uncontrolled batch experiments. The genomes of C. ragsdalei and C. coskatii were sequenced and the genome sequences of all four biocatalytic strains analyzed in comparative manner. Growth and product spectra (acetate, ethanol, 2,3-butanediol) of C. autoethanogenum, C. ljungdahlii, and C. ragsdalei were rather similar. In contrast, C. coskatii produced significantly less ethanol and its genome sequence lacks two genes encoding aldehyde:ferredoxin oxidoreductases (AOR). Comparative genome sequence analysis of the four WT strains revealed high average nucleotide identity (ANI) of C. ljungdahlii and C. autoethanogenum (99.3%) and C. coskatii (98.3%). In contrast, C. ljungdahlii WT and C. ragsdalei WT showed an ANI-based similarity of only 95.8%. Additionally, recombinant C. ljungdahlii strains were constructed that harbor an artificial acetone synthesis operon (ASO) consisting of the following genes: adc, ctfA, ctfB, and thlA (encoding acetoacetate decarboxylase, acetoacetyl-CoA:acetate/butyrate:CoA-transferase subunits A and B, and thiolase) under the control of thlA promoter (PthlA) from C. acetobutylicum or native pta-ack promoter (Ppta-ack) from C. ljungdahlii. Respective recombinant strains produced 2-propanol rather than acetone, due to the presence of a NADPH-dependent primary-secondary alcohol dehydrogenase that converts acetone to 2

  5. Identification, expression, and comparative genomic analysis of the IPT and CKX gene families in Chinese cabbage (Brassica rapa ssp. pekinensis)

    PubMed Central

    2013-01-01

    Background Cytokinins (CKs) have significant roles in various aspects of plant growth and development, and they are also involved in plant stress adaptations. The fine-tuning of the controlled CK levels in individual tissues, cells, and organelles is properly maintained by isopentenyl transferases (IPTs) and cytokinin oxidase/dehydrogenases (CKXs). Chinese cabbage is one of the most economically important vegetable crops worldwide. The whole genome sequencing of Brassica rapa enables us to perform the genome-wide identification and functional analysis of the IPT and CKX gene families. Results In this study, a total of 13 BrIPT genes and 12 BrCKX genes were identified. The gene structures, conserved domains and phylogenetic relationships were analyzed. The isoelectric point, subcellular localization and glycosylation sites of the proteins were predicted. Segmental duplicates were found in both BrIPT and BrCKX gene families. We also analyzed evolutionary patterns and divergence of the IPT and CKX genes in the Cruciferae family. The transcription levels of BrIPT and BrCKX genes were analyzed to obtain an initial picture of the functions of these genes. Abiotic stress elements related to adverse environmental stimuli were found in the promoter regions of BrIPT and BrCKX genes and they were confirmed to respond to drought and high salinity conditions. The effects of 6-BA and ABA on the expressions of BrIPT and BrCKX genes were also investigated. Conclusions The expansion of BrIPT and BrCKX genes after speciation from Arabidopsis thaliana is mainly attributed to segmental duplication events during the whole genome triplication (WGT) and substantial duplicated genes are lost during the long evolutionary history. Genes produced by segmental duplication events have changed their expression patterns or may adopted new functions and thus are obtained. BrIPT and BrCKX genes respond well to drought and high salinity stresses, and their transcripts are affected by exogenous

  6. Expansion of the CRISPR-Cas9 genome targeting space through the use of H1 promoter-expressed guide RNAs.

    PubMed

    Ranganathan, Vinod; Wahlin, Karl; Maruotti, Julien; Zack, Donald J

    2014-08-08

    The repurposed CRISPR-Cas9 system has recently emerged as a revolutionary genome-editing tool. Here we report a modification in the expression of the guide RNA (gRNA) required for targeting that greatly expands the targetable genome. gRNA expression through the commonly used U6 promoter requires a guanosine nucleotide to initiate transcription, thus constraining genomic-targeting sites to GN19NGG. We demonstrate the ability to modify endogenous genes using H1 promoter-expressed gRNAs, which can be used to target both AN19NGG and GN19NGG genomic sites. AN19NGG sites occur ~15% more frequently than GN19NGG sites in the human genome and the increase in targeting space is also enriched at human genes and disease loci. Together, our results enhance the versatility of the CRISPR technology by more than doubling the number of targetable sites within the human genome and other eukaryotic species.

  7. Genome Sequencing and Comparative Genomics of the Broad Host-Range Pathogen Rhizoctonia solani AG8

    PubMed Central

    Hane, James K.; Anderson, Jonathan P.; Williams, Angela H.; Sperschneider, Jana; Singh, Karam B.

    2014-01-01

    Rhizoctonia solani is a soil-borne basidiomycete fungus with a necrotrophic lifestyle which is classified into fourteen reproductively incompatible anastomosis groups (AGs). One of these, AG8, is a devastating pathogen causing bare patch of cereals, brassicas and legumes. R. solani is a multinucleate heterokaryon containing significant heterozygosity within a single cell. This complexity posed significant challenges for the assembly of its genome. We present a high quality genome assembly of R. solani AG8 and a manually curated set of 13,964 genes supported by RNA-seq. The AG8 genome assembly used novel methods to produce a haploid representation of its heterokaryotic state. The whole-genomes of AG8, the rice pathogen AG1-IA and the potato pathogen AG3 were observed to be syntenic and co-linear. Genes and functions putatively relevant to pathogenicity were highlighted by comparing AG8 to known pathogenicity genes, orthology databases spanning 197 phytopathogenic taxa and AG1-IA. We also observed SNP-level “hypermutation” of CpG dinucleotides to TpG between AG8 nuclei, with similarities to repeat-induced point mutation (RIP). Interestingly, gene-coding regions were widely affected along with repetitive DNA, which has not been previously observed for RIP in mononuclear fungi of the Pezizomycotina. The rate of heterozygous SNP mutations within this single isolate of AG8 was observed to be higher than SNP mutation rates observed across populations of most fungal species compared. Comparative analyses were combined to predict biological processes relevant to AG8 and 308 proteins with effector-like characteristics, forming a valuable resource for further study of this pathosystem. Predicted effector-like proteins had elevated levels of non-synonymous point mutations relative to synonymous mutations (dN/dS), suggesting that they may be under diversifying selection pressures. In addition, the distant relationship to sequenced necrotrophs of the Ascomycota suggests the

  8. The perennial ryegrass GenomeZipper: targeted use of genome resources for comparative grass genomics.

    PubMed

    Pfeifer, Matthias; Martis, Mihaela; Asp, Torben; Mayer, Klaus F X; Lübberstedt, Thomas; Byrne, Stephen; Frei, Ursula; Studer, Bruno

    2013-02-01

    Whole-genome sequences established for model and major crop species constitute a key resource for advanced genomic research. For outbreeding forage and turf grass species like ryegrasses (Lolium spp.), such resources have yet to be developed. Here, we present a model of the perennial ryegrass (Lolium perenne) genome on the basis of conserved synteny to barley (Hordeum vulgare) and the model grass genome Brachypodium (Brachypodium distachyon) as well as rice (Oryza sativa) and sorghum (Sorghum bicolor). A transcriptome-based genetic linkage map of perennial ryegrass served as a scaffold to establish the chromosomal arrangement of syntenic genes from model grass species. This scaffold revealed a high degree of synteny and macrocollinearity and was then utilized to anchor a collection of perennial ryegrass genes in silico to their predicted genome positions. This resulted in the unambiguous assignment of 3,315 out of 8,876 previously unmapped genes to the respective chromosomes. In total, the GenomeZipper incorporates 4,035 conserved grass gene loci, which were used for the first genome-wide sequence divergence analysis between perennial ryegrass, barley, Brachypodium, rice, and sorghum. The perennial ryegrass GenomeZipper is an ordered, information-rich genome scaffold, facilitating map-based cloning and genome assembly in perennial ryegrass and closely related Poaceae species. It also represents a milestone in describing synteny between perennial ryegrass and fully sequenced model grass genomes, thereby increasing our understanding of genome organization and evolution in the most important temperate forage and turf grass species.

  9. Comparative genomic analysis by microbial COGs self-attraction rate.

    PubMed

    Santoni, Daniele; Romano-Spica, Vincenzo

    2009-06-21

    Whole genome analysis provides new perspectives to determine phylogenetic relationships among microorganisms. The availability of whole nucleotide sequences allows different levels of comparison among genomes by several approaches. In this work, self-attraction rates were considered for each cluster of orthologous groups of proteins (COGs) class in order to analyse gene aggregation levels in physical maps. Phylogenetic relationships among microorganisms were obtained by comparing self-attraction coefficients. Eighteen-dimensional vectors were computed for a set of 168 completely sequenced microbial genomes (19 archea, 149 bacteria). The components of the vector represent the aggregation rate of the genes belonging to each of 18 COGs classes. Genes involved in nonessential functions or related to environmental conditions showed the highest aggregation rates. On the contrary genes involved in basic cellular tasks showed a more uniform distribution along the genome, except for translation genes. Self-attraction clustering approach allowed classification of Proteobacteria, Bacilli and other species belonging to Firmicutes. Rearrangement and Lateral Gene Transfer events may influence divergences from classical taxonomy. Each set of COG classes' aggregation values represents an intrinsic property of the microbial genome. This novel approach provides a new point of view for whole genome analysis and bacterial characterization.

  10. New genomic resources for switchgrass: a BAC library and comparative analysis of homoeologous genomic regions harboring bioenergy traits

    PubMed Central

    2011-01-01

    Background Switchgrass, a C4 species and a warm-season grass native to the prairies of North America, has been targeted for development into an herbaceous biomass fuel crop. Genetic improvement of switchgrass feedstock traits through marker-assisted breeding and biotechnology approaches calls for genomic tools development. Establishment of integrated physical and genetic maps for switchgrass will accelerate mapping of value added traits useful to breeding programs and to isolate important target genes using map based cloning. The reported polyploidy series in switchgrass ranges from diploid (2X = 18) to duodecaploid (12X = 108). Like in other large, repeat-rich plant genomes, this genomic complexity will hinder whole genome sequencing efforts. An extensive physical map providing enough information to resolve the homoeologous genomes would provide the necessary framework for accurate assembly of the switchgrass genome. Results A switchgrass BAC library constructed by partial digestion of nuclear DNA with EcoRI contains 147,456 clones covering the effective genome approximately 10 times based on a genome size of 3.2 Gigabases (~1.6 Gb effective). Restriction digestion and PFGE analysis of 234 randomly chosen BACs indicated that 95% of the clones contained inserts, ranging from 60 to 180 kb with an average of 120 kb. Comparative sequence analysis of two homoeologous genomic regions harboring orthologs of the rice OsBRI1 locus, a low-copy gene encoding a putative protein kinase and associated with biomass, revealed that orthologous clones from homoeologous chromosomes can be unambiguously distinguished from each other and correctly assembled to respective fingerprint contigs. Thus, the data obtained not only provide genomic resources for further analysis of switchgrass genome, but also improve efforts for an accurate genome sequencing strategy. Conclusions The construction of the first switchgrass BAC library and comparative analysis of homoeologous harboring OsBRI1

  11. New genomic resources for switchgrass: a BAC library and comparative analysis of homoeologous genomic regions harboring bioenergy traits.

    PubMed

    Saski, Christopher A; Li, Zhigang; Feltus, Frank A; Luo, Hong

    2011-07-18

    Switchgrass, a C4 species and a warm-season grass native to the prairies of North America, has been targeted for development into an herbaceous biomass fuel crop. Genetic improvement of switchgrass feedstock traits through marker-assisted breeding and biotechnology approaches calls for genomic tools development. Establishment of integrated physical and genetic maps for switchgrass will accelerate mapping of value added traits useful to breeding programs and to isolate important target genes using map based cloning. The reported polyploidy series in switchgrass ranges from diploid (2X = 18) to duodecaploid (12X = 108). Like in other large, repeat-rich plant genomes, this genomic complexity will hinder whole genome sequencing efforts. An extensive physical map providing enough information to resolve the homoeologous genomes would provide the necessary framework for accurate assembly of the switchgrass genome. A switchgrass BAC library constructed by partial digestion of nuclear DNA with EcoRI contains 147,456 clones covering the effective genome approximately 10 times based on a genome size of 3.2 Gigabases (~1.6 Gb effective). Restriction digestion and PFGE analysis of 234 randomly chosen BACs indicated that 95% of the clones contained inserts, ranging from 60 to 180 kb with an average of 120 kb. Comparative sequence analysis of two homoeologous genomic regions harboring orthologs of the rice OsBRI1 locus, a low-copy gene encoding a putative protein kinase and associated with biomass, revealed that orthologous clones from homoeologous chromosomes can be unambiguously distinguished from each other and correctly assembled to respective fingerprint contigs. Thus, the data obtained not only provide genomic resources for further analysis of switchgrass genome, but also improve efforts for an accurate genome sequencing strategy. The construction of the first switchgrass BAC library and comparative analysis of homoeologous harboring OsBRI1 orthologs present a glimpse into

  12. The Genome of Tolypocladium inflatum: Evolution, Organization, and Expression of the Cyclosporin Biosynthetic Gene Cluster

    PubMed Central

    Bushley, Kathryn E.; Raja, Rajani; Jaiswal, Pankaj; Cumbie, Jason S.; Nonogaki, Mariko; Boyd, Alexander E.; Owensby, C. Alisha; Knaus, Brian J.; Elser, Justin; Miller, Daniel; Di, Yanming; McPhail, Kerry L.; Spatafora, Joseph W.

    2013-01-01

    The ascomycete fungus Tolypocladium inflatum, a pathogen of beetle larvae, is best known as the producer of the immunosuppressant drug cyclosporin. The draft genome of T. inflatum strain NRRL 8044 (ATCC 34921), the isolate from which cyclosporin was first isolated, is presented along with comparative analyses of the biosynthesis of cyclosporin and other secondary metabolites in T. inflatum and related taxa. Phylogenomic analyses reveal previously undetected and complex patterns of homology between the nonribosomal peptide synthetase (NRPS) that encodes for cyclosporin synthetase (simA) and those of other secondary metabolites with activities against insects (e.g., beauvericin, destruxins, etc.), and demonstrate the roles of module duplication and gene fusion in diversification of NRPSs. The secondary metabolite gene cluster responsible for cyclosporin biosynthesis is described. In addition to genes necessary for cyclosporin biosynthesis, it harbors a gene for a cyclophilin, which is a member of a family of immunophilins known to bind cyclosporin. Comparative analyses support a lineage specific origin of the cyclosporin gene cluster rather than horizontal gene transfer from bacteria or other fungi. RNA-Seq transcriptome analyses in a cyclosporin-inducing medium delineate the boundaries of the cyclosporin cluster and reveal high levels of expression of the gene cluster cyclophilin. In medium containing insect hemolymph, weaker but significant upregulation of several genes within the cyclosporin cluster, including the highly expressed cyclophilin gene, was observed. T. inflatum also represents the first reference draft genome of Ophiocordycipitaceae, a third family of insect pathogenic fungi within the fungal order Hypocreales, and supports parallel and qualitatively distinct radiations of insect pathogens. The T. inflatum genome provides additional insight into the evolution and biosynthesis of cyclosporin and lays a foundation for further investigations of the role

  13. A Mitochondrial Genome of Rhyparochromidae (Hemiptera: Heteroptera) and a Comparative Analysis of Related Mitochondrial Genomes.

    PubMed

    Li, Teng; Yang, Jie; Li, Yinwan; Cui, Ying; Xie, Qiang; Bu, Wenjun; Hillis, David M

    2016-10-19

    The Rhyparochromidae, the largest family of Lygaeoidea, encompasses more than 1,850 described species, but no mitochondrial genome has been sequenced to date. Here we describe the first mitochondrial genome for Rhyparochromidae: a complete mitochondrial genome of Panaorus albomaculatus (Scott, 1874). This mitochondrial genome is comprised of 16,345 bp, and contains the expected 37 genes and control region. The majority of the control region is made up of a large tandem-repeat region, which has a novel pattern not previously observed in other insects. The tandem-repeats region of P. albomaculatus consists of 53 tandem duplications (including one partial repeat), which is the largest number of tandem repeats among all the known insect mitochondrial genomes. Slipped-strand mispairing during replication is likely to have generated this novel pattern of tandem repeats. Comparative analysis of tRNA gene families in sequenced Pentatomomorpha and Lygaeoidea species shows that the pattern of nucleotide conservation is markedly higher on the J-strand. Phylogenetic reconstruction based on mitochondrial genomes suggests that Rhyparochromidae is not the sister group to all the remaining Lygaeoidea, and supports the monophyly of Lygaeoidea.

  14. Genome-wide comparative analysis of codon usage bias and codon context patterns among cyanobacterial genomes.

    PubMed

    Prabha, Ratna; Singh, Dhananjaya P; Sinha, Swati; Ahmad, Khurshid; Rai, Anil

    2017-04-01

    With the increasing accumulation of genomic sequence information of prokaryotes, the study of codon usage bias has gained renewed attention. The purpose of this study was to examine codon selection pattern within and across cyanobacterial species belonging to diverse taxonomic orders and habitats. We performed detailed comparative analysis of cyanobacterial genomes with respect to codon bias. Our analysis reflects that in cyanobacterial genomes, A- and/or T-ending codons were used predominantly in the genes whereas G- and/or C-ending codons were largely avoided. Variation in the codon context usage of cyanobacterial genes corresponded to the clustering of cyanobacteria as per their GC content. Analysis of codon adaptation index (CAI) and synonymous codon usage order (SCUO) revealed that majority of genes are associated with low codon bias. Codon selection pattern in cyanobacterial genomes reflected compositional constraints as major influencing factor. It is also identified that although, mutational constraint may play some role in affecting codon usage bias in cyanobacteria, compositional constraint in terms of genomic GC composition coupled with environmental factors affected codon selection pattern in cyanobacterial genomes. Copyright © 2016 Elsevier B.V. All rights reserved.

  15. Genome-Level Longitudinal Expression of Signaling Pathways and Gene Networks in Pediatric Septic Shock

    PubMed Central

    Shanley, Thomas P; Cvijanovich, Natalie; Lin, Richard; Allen, Geoffrey L; Thomas, Neal J; Doctor, Allan; Kalyanaraman, Meena; Tofil, Nancy M; Penfil, Scott; Monaco, Marie; Odoms, Kelli; Barnes, Michael; Sakthivel, Bhuvaneswari; Aronow, Bruce J; Wong, Hector R

    2007-01-01

    We have conducted longitudinal studies focused on the expression profiles of signaling pathways and gene networks in children with septic shock. Genome-level expression profiles were generated from whole blood-derived RNA of children with septic shock (n = 30) corresponding to day one and day three of septic shock, respectively. Based on sequential statistical and expression filters, day one and day three of septic shock were characterized by differential regulation of 2,142 and 2,504 gene probes, respectively, relative to controls (n = 15). Venn analysis demonstrated 239 unique genes in the day one dataset, 598 unique genes in the day three dataset, and 1,906 genes common to both datasets. Functional analyses demonstrated time-dependent, differential regulation of genes involved in multiple signaling pathways and gene networks primarily related to immunity and inflammation. Notably, multiple and distinct gene networks involving T cell- and MHC antigen-related biology were persistently downregulated on both day one and day three. Further analyses demonstrated large scale, persistent downregulation of genes corresponding to functional annotations related to zinc homeostasis. These data represent the largest reported cohort of patients with septic shock subjected to longitudinal genome-level expression profiling. The data further advance our genome-level understanding of pediatric septic shock and support novel hypotheses. PMID:17932561

  16. Sequence Search and Comparative Genomic Analysis of SUMO-Activating Enzymes Using CoGe.

    PubMed

    Carretero-Paulet, Lorenzo; Albert, Victor A

    2016-01-01

    The growing number of genome sequences completed during the last few years has made necessary the development of bioinformatics tools for the easy access and retrieval of sequence data, as well as for downstream comparative genomic analyses. Some of these are implemented as online platforms that integrate genomic data produced by different genome sequencing initiatives with data mining tools as well as various comparative genomic and evolutionary analysis possibilities.Here, we use the online comparative genomics platform CoGe ( http://www.genomevolution.org/coge/ ) (Lyons and Freeling. Plant J 53:661-673, 2008; Tang and Lyons. Front Plant Sci 3:172, 2012) (1) to retrieve the entire complement of orthologous and paralogous genes belonging to the SUMO-Activating Enzymes 1 (SAE1) gene family from a set of species representative of the Brassicaceae plant eudicot family with genomes fully sequenced, and (2) to investigate the history, timing, and molecular mechanisms of the gene duplications driving the evolutionary expansion and functional diversification of the SAE1 family in Brassicaceae.

  17. Datasets for evolutionary comparative genomics

    PubMed Central

    Liberles, David A

    2005-01-01

    Many decisions about genome sequencing projects are directed by perceived gaps in the tree of life, or towards model organisms. With the goal of a better understanding of biology through the lens of evolution, however, there are additional genomes that are worth sequencing. One such rationale for whole-genome sequencing is discussed here, along with other important strategies for understanding the phenotypic divergence of species. PMID:16086856

  18. Genome-Wide Identification and Expression of Xenopus F-Box Family of Proteins.

    PubMed

    Saritas-Yildirim, Banu; Pliner, Hannah A; Ochoa, Angelica; Silva, Elena M

    2015-01-01

    Protein degradation via the multistep ubiquitin/26S proteasome pathway is a rapid way to alter the protein profile and drive cell processes and developmental changes. Many key regulators of embryonic development are targeted for degradation by E3 ubiquitin ligases. The most studied family of E3 ubiquitin ligases is the SCF ubiquitin ligases, which use F-box adaptor proteins to recognize and recruit target proteins. Here, we used a bioinformatics screen and phylogenetic analysis to identify and annotate the family of F-box proteins in the Xenopus tropicalis genome. To shed light on the function of the F-box proteins, we analyzed expression of F-box genes during early stages of Xenopus development. Many F-box genes are broadly expressed with expression domains localized to diverse tissues including brain, spinal cord, eye, neural crest derivatives, somites, kidneys, and heart. All together, our genome-wide identification and expression profiling of the Xenopus F-box family of proteins provide a foundation for future research aimed to identify the precise role of F-box dependent E3 ubiquitin ligases and their targets in the regulatory circuits of development.

  19. Genome-wide Expression Analysis and Metabolite Profiling Elucidate Transcriptional Regulation of Flavonoid Biosynthesis and Modulation under Abiotic Stresses in Banana

    PubMed Central

    Pandey, Ashutosh; Alok, Anshu; Lakhwani, Deepika; Singh, Jagdeep; Asif, Mehar H.; Trivedi, Prabodh K.

    2016-01-01

    Flavonoid biosynthesis is largely regulated at the transcriptional level due to the modulated expression of genes related to the phenylpropanoid pathway in plants. Although accumulation of different flavonoids has been reported in banana, a staple fruit crop, no detailed information is available on regulation of the biosynthesis in this important plant. We carried out genome-wide analysis of banana (Musa acuminata, AAA genome) and identified 28 genes belonging to 9 gene families associated with flavonoid biosynthesis. Expression analysis suggested spatial and temporal regulation of the identified genes in different tissues of banana. Analysis revealed enhanced expression of genes related to flavonol and proanthocyanidin (PA) biosynthesis in peel and pulp at the early developmental stages of fruit. Genes involved in anthocyanin biosynthesis were highly expressed during banana fruit ripening. In general, higher accumulation of metabolites was observed in the peel as compared to pulp tissue. A correlation between expression of genes and metabolite content was observed at the early stage of fruit development. Furthermore, this study also suggests regulation of flavonoid biosynthesis, at transcriptional level, under light and dark exposures as well as methyl jasmonate (MJ) treatment in banana. PMID:27539368

  20. Genome-wide Expression Analysis and Metabolite Profiling Elucidate Transcriptional Regulation of Flavonoid Biosynthesis and Modulation under Abiotic Stresses in Banana.

    PubMed

    Pandey, Ashutosh; Alok, Anshu; Lakhwani, Deepika; Singh, Jagdeep; Asif, Mehar H; Trivedi, Prabodh K

    2016-08-19

    Flavonoid biosynthesis is largely regulated at the transcriptional level due to the modulated expression of genes related to the phenylpropanoid pathway in plants. Although accumulation of different flavonoids has been reported in banana, a staple fruit crop, no detailed information is available on regulation of the biosynthesis in this important plant. We carried out genome-wide analysis of banana (Musa acuminata, AAA genome) and identified 28 genes belonging to 9 gene families associated with flavonoid biosynthesis. Expression analysis suggested spatial and temporal regulation of the identified genes in different tissues of banana. Analysis revealed enhanced expression of genes related to flavonol and proanthocyanidin (PA) biosynthesis in peel and pulp at the early developmental stages of fruit. Genes involved in anthocyanin biosynthesis were highly expressed during banana fruit ripening. In general, higher accumulation of metabolites was observed in the peel as compared to pulp tissue. A correlation between expression of genes and metabolite content was observed at the early stage of fruit development. Furthermore, this study also suggests regulation of flavonoid biosynthesis, at transcriptional level, under light and dark exposures as well as methyl jasmonate (MJ) treatment in banana.

  1. GenColors: annotation and comparative genomics of prokaryotes made easy.

    PubMed

    Romualdi, Alessandro; Felder, Marius; Rose, Dominic; Gausmann, Ulrike; Schilhabel, Markus; Glöckner, Gernot; Platzer, Matthias; Sühnel, Jürgen

    2007-01-01

    GenColors (gencolors.fli-leibniz.de) is a new web-based software/database system aimed at an improved and accelerated annotation of prokaryotic genomes considering information on related genomes and making extensive use of genome comparison. It offers a seamless integration of data from ongoing sequencing projects and annotated genomic sequences obtained from GenBank. A variety of export/import filters manages an effective data flow from sequence assembly and manipulation programs (e.g., GAP4) to GenColors and back as well as to standard GenBank file(s). The genome comparison tools include best bidirectional hits, gene conservation, syntenies, and gene core sets. Precomputed UniProt matches allow annotation and analysis in an effective manner. In addition to these analysis options, base-specific quality data (coverage and confidence) can also be handled if available. The GenColors system can be used both for annotation purposes in ongoing genome projects and as an analysis tool for finished genomes. GenColors comes in two types, as dedicated genome browsers and as the Jena Prokaryotic Genome Viewer (JPGV). Dedicated genome browsers contain genomic information on a set of related genomes and offer a large number of options for genome comparison. The system has been efficiently used in the genomic sequencing of Borrelia garinii and is currently applied to various ongoing genome projects on Borrelia, Legionella, Escherichia, and Pseudomonas genomes. One of these dedicated browsers, the Spirochetes Genome Browser (sgb.fli-leibniz.de) with Borrelia, Leptospira, and Treponema genomes, is freely accessible. The others will be released after finalization of the corresponding genome projects. JPGV (jpgv.fli-leibniz.de) offers information on almost all finished bacterial genomes, as compared to the dedicated browsers with reduced genome comparison functionality, however. As of January 2006, this viewer includes 632 genomic elements (e.g., chromosomes and plasmids) of 293

  2. Performance and Scalability of Discriminative Metrics for Comparative Gene Identification in 12 Drosophila Genomes

    PubMed Central

    Lin, Michael F.; Deoras, Ameya N.; Rasmussen, Matthew D.; Kellis, Manolis

    2008-01-01

    Comparative genomics of multiple related species is a powerful methodology for the discovery of functional genomic elements, and its power should increase with the number of species compared. Here, we use 12 Drosophila genomes to study the power of comparative genomics metrics to distinguish between protein-coding and non-coding regions. First, we study the relative power of different comparative metrics and their relationship to single-species metrics. We find that even relatively simple multi-species metrics robustly outperform advanced single-species metrics, especially for shorter exons (≤240 nt), which are common in animal genomes. Moreover, the two capture largely independent features of protein-coding genes, with different sensitivity/specificity trade-offs, such that their combinations lead to even greater discriminatory power. In addition, we study how discovery power scales with the number and phylogenetic distance of the genomes compared. We find that species at a broad range of distances are comparably effective informants for pairwise comparative gene identification, but that these are surpassed by multi-species comparisons at similar evolutionary divergence. In particular, while pairwise discovery power plateaued at larger distances and never outperformed the most advanced single-species metrics, multi-species comparisons continued to benefit even from the most distant species with no apparent saturation. Last, we find that genes in functional categories typically considered fast-evolving can nonetheless be recovered at very high rates using comparative methods. Our results have implications for comparative genomics analyses in any species, including the human. PMID:18421375

  3. IMG 4 version of the integrated microbial genomes comparative analysis system

    PubMed Central

    Markowitz, Victor M.; Chen, I-Min A.; Palaniappan, Krishna; Chu, Ken; Szeto, Ernest; Pillay, Manoj; Ratner, Anna; Huang, Jinghua; Woyke, Tanja; Huntemann, Marcel; Anderson, Iain; Billis, Konstantinos; Varghese, Neha; Mavromatis, Konstantinos; Pati, Amrita; Ivanova, Natalia N.; Kyrpides, Nikos C.

    2014-01-01

    The Integrated Microbial Genomes (IMG) data warehouse integrates genomes from all three domains of life, as well as plasmids, viruses and genome fragments. IMG provides tools for analyzing and reviewing the structural and functional annotations of genomes in a comparative context. IMG’s data content and analytical capabilities have increased continuously since its first version released in 2005. Since the last report published in the 2012 NAR Database Issue, IMG’s annotation and data integration pipelines have evolved while new tools have been added for recording and analyzing single cell genomes, RNA Seq and biosynthetic cluster data. Different IMG datamarts provide support for the analysis of publicly available genomes (IMG/W: http://img.jgi.doe.gov/w), expert review of genome annotations (IMG/ER: http://img.jgi.doe.gov/er) and teaching and training in the area of microbial genome analysis (IMG/EDU: http://img.jgi.doe.gov/edu). PMID:24165883

  4. IMG 4 version of the integrated microbial genomes comparative analysis system.

    PubMed

    Markowitz, Victor M; Chen, I-Min A; Palaniappan, Krishna; Chu, Ken; Szeto, Ernest; Pillay, Manoj; Ratner, Anna; Huang, Jinghua; Woyke, Tanja; Huntemann, Marcel; Anderson, Iain; Billis, Konstantinos; Varghese, Neha; Mavromatis, Konstantinos; Pati, Amrita; Ivanova, Natalia N; Kyrpides, Nikos C

    2014-01-01

    The Integrated Microbial Genomes (IMG) data warehouse integrates genomes from all three domains of life, as well as plasmids, viruses and genome fragments. IMG provides tools for analyzing and reviewing the structural and functional annotations of genomes in a comparative context. IMG's data content and analytical capabilities have increased continuously since its first version released in 2005. Since the last report published in the 2012 NAR Database Issue, IMG's annotation and data integration pipelines have evolved while new tools have been added for recording and analyzing single cell genomes, RNA Seq and biosynthetic cluster data. Different IMG datamarts provide support for the analysis of publicly available genomes (IMG/W: http://img.jgi.doe.gov/w), expert review of genome annotations (IMG/ER: http://img.jgi.doe.gov/er) and teaching and training in the area of microbial genome analysis (IMG/EDU: http://img.jgi.doe.gov/edu).

  5. Genome-wide gene expression profiling reveals unsuspected molecular alterations in pemphigus foliaceus

    PubMed Central

    Malheiros, Danielle; Panepucci, Rodrigo A; Roselino, Ana M; Araújo, Amélia G; Zago, Marco A; Petzl-Erler, Maria Luiza

    2014-01-01

    Pemphigus foliaceus (PF) is a complex autoimmune disease characterized by bullous skin lesions and the presence of antibodies against desmoglein 1. In this study we sought to contribute to a better understanding of the molecular processes in endemic PF, as the identification of factors that participate in the pathogenesis is a prerequisite for understanding its biological basis and may lead to novel therapeutic interventions. CD4+ T lymphocytes are central to the development of the disease. Therefore, we compared genome-wide gene expression profiles of peripheral CD4+ T cells of various PF patient subgroups with each other and with that of healthy individuals. The patient sample was subdivided into three groups: untreated patients with the generalized form of the disease, patients submitted to immunosuppressive treatment, and patients with the localized form of the disease. Comparisons between different subgroups resulted in 135, 54 and 64 genes differentially expressed. These genes are mainly related to lymphocyte adhesion and migration, apoptosis, cellular proliferation, cytotoxicity and antigen presentation. Several of these genes were differentially expressed when comparing lesional and uninvolved skin from the same patient. The chromosomal regions 19q13 and 12p13 concentrate differentially expressed genes and are candidate regions for PF susceptibility genes and disease markers. Our results reveal genes involved in disease severity, potential therapeutic targets and previously unsuspected processes involved in the pathogenesis. Besides, this study adds original information that will contribute to the understanding of PF's pathogenesis and of the still poorly defined in vivo functions of most of these genes. PMID:24813052

  6. Predicting gene expression levels from codon biases in alpha-proteobacterial genomes.

    PubMed

    Karlin, Samuel; Barnett, Melanie J; Campbell, Allan M; Fisher, Robert F; Mrazek, Jan

    2003-06-10

    Predicted highly expressed (PHX) genes in five currently available high G+C complete alpha-proteobacterial genomes are analyzed. These include: the nitrogen-fixing plant symbionts Sinorhizobium meliloti (SINME) and Mesorhizobium loti (MESLO), the nonpathogenic aquatic bacterium Caulobacter crescentus (CAUCR), the plant pathogen Agrobacterium tumefaciens (AGRTU), and the mammalian pathogen Brucella melitensis (BRUME). Three of these genomes, SINME, AGRTU, and BRUME, contain multiple chromosomes or megaplasmids (>1 Mb length). PHX genes in these genomes are concentrated mainly in the major (largest) chromosome with few PHX genes found in the secondary chromosomes and megaplasmids. Tricarboxylic acid cycle and aerobic respiration genes are strongly PHX in all five genomes, whereas anaerobic pathways of glycolysis and fermentation are mostly not PHX. Only in MESLO (but not SINME) and BRUME are most glycolysis genes PHX. Many flagellar genes are PHX in MESLO and CAUCR, but mostly are not PHX in SINME and AGRTU. The nonmotile BRUME also carries many flagellar genes but these are generally not PHX and all but one are located in the second chromosome. CAUCR stands out among available prokaryotic genomes with 25 PHX TonB-dependent receptors. These are putatively involved in uptake of iron ions and other nonsoluble compounds.

  7. HTLV-1 Integration into Transcriptionally Active Genomic Regions Is Associated with Proviral Expression and with HAM/TSP

    PubMed Central

    Meekings, Kiran N.; Leipzig, Jeremy; Bushman, Frederic D.; Taylor, Graham P.; Bangham, Charles R. M.

    2008-01-01

    Human T-lymphotropic virus type 1 (HTLV-1) causes leukaemia or chronic inflammatory disease in ∼5% of infected hosts. The level of proviral expression of HTLV-1 differs significantly among infected people, even at the same proviral load (proportion of infected mononuclear cells in the circulation). A high level of expression of the HTLV-1 provirus is associated with a high proviral load and a high risk of the inflammatory disease of the central nervous system known as HTLV-1-associated myelopathy/tropical spastic paraparesis (HAM/TSP). But the factors that control the rate of HTLV-1 proviral expression remain unknown. Here we show that proviral integration sites of HTLV-1 in vivo are not randomly distributed within the human genome but are associated with transcriptionally active regions. Comparison of proviral integration sites between individuals with high and low levels of proviral expression, and between provirus-expressing and provirus non-expressing cells from within an individual, demonstrated that frequent integration into transcription units was associated with an increased rate of proviral expression. An increased frequency of integration sites in transcription units in individuals with high proviral expression was also associated with the inflammatory disease HAM/TSP. By comparing the distribution of integration sites in human lymphocytes infected in short-term cell culture with those from persistent infection in vivo, we infer the action of two selective forces that shape the distribution of integration sites in vivo: positive selection for cells containing proviral integration sites in transcriptionally active regions of the genome, and negative selection against cells with proviral integration sites within transcription units. PMID:18369476

  8. Genomic and protein expression profiling identifies CDK6 as novel independent prognostic marker in medulloblastoma.

    PubMed

    Mendrzyk, Frank; Radlwimmer, Bernhard; Joos, Stefan; Kokocinski, Felix; Benner, Axel; Stange, Daniel E; Neben, Kai; Fiegler, Heike; Carter, Nigel P; Reifenberger, Guido; Korshunov, Andrey; Lichter, Peter

    2005-12-01

    Medulloblastoma is the most common malignant brain tumor in children. Despite multimodal aggressive treatment, nearly half of the patients die as a result of this tumor. Identification of molecular markers for prognosis and development of novel pathogenesis-based therapies depends crucially on a better understanding of medulloblastoma pathomechanisms. We performed genome-wide analysis of DNA copy number imbalances in 47 medulloblastomas using comparative genomic hybridization to large insert DNA microarrays (matrix-CGH). The expression of selected candidate genes identified by matrix-CGH was analyzed immunohistochemically on tissue microarrays representing medulloblastomas from 189 clinically well-documented patients. To identify novel prognostic markers, genomic findings and protein expression data were correlated to patient survival. Matrix-CGH analysis revealed frequent DNA copy number alterations of several novel candidate regions. Among these, gains at 17q23.2-qter (P < .01) and losses at 17p13.1 to 17p13.3 (P = .04) were significantly correlated to poor prognosis. Within 17q23.2-qter and 7q21.2, two of the most frequently gained chromosomal regions, confined amplicons were identified that contained the PPM1D and CDK6 genes, respectively. Immunohistochemistry revealed strong expression of PPM1D in 148 (88%) of 168 and CDK6 in 50 (30%) of 169 medulloblastomas. Overexpression of CDK6 correlated significantly with poor prognosis (P < .01) and represented an independent prognostic marker of overall survival on multivariate analysis (P = .02). We identified CDK6 as a novel molecular marker that can be determined by immunohistochemistry on routinely processed tissue specimens and may facilitate the prognostic assessment of medulloblastoma patients. Furthermore, increased protein-levels of PPM1D and CDK6 may link the TP53 and RB1 tumor suppressor pathways to medulloblastoma pathomechanisms.

  9. Comparative genomics of two jute species and insight into fibre biogenesis.

    PubMed

    Islam, Md Shahidul; Saito, Jennifer A; Emdad, Emdadul Mannan; Ahmed, Borhan; Islam, Mohammad Moinul; Halim, Abdul; Hossen, Quazi Md Mosaddeque; Hossain, Md Zakir; Ahmed, Rasel; Hossain, Md Sabbir; Kabir, Shah Md Tamim; Khan, Md Sarwar Alam; Khan, Md Mursalin; Hasan, Rajnee; Aktar, Nasima; Honi, Ummay; Islam, Rahin; Rashid, Md Mamunur; Wan, Xuehua; Hou, Shaobin; Haque, Taslima; Azam, Muhammad Shafiul; Moosa, Mahdi Muhammad; Elias, Sabrina M; Hasan, A M Mahedi; Mahmood, Niaz; Shafiuddin, Md; Shahid, Saima; Shommu, Nusrat Sharmeen; Jahan, Sharmin; Roy, Saroj; Chowdhury, Amlan; Akhand, Ashikul Islam; Nisho, Golam Morshad; Uddin, Khaled Salah; Rabeya, Taposhi; Hoque, S M Ekramul; Snigdha, Afsana Rahman; Mortoza, Sarowar; Matin, Syed Abdul; Islam, Md Kamrul; Lashkar, M Z H; Zaman, Mahboob; Yuryev, Anton; Uddin, Md Kamal; Rahman, Md Sharifur; Haque, Md Samiul; Alam, Md Monjurul; Khan, Haseena; Alam, Maqsudul

    2017-01-30

    Jute (Corchorus sp.) is one of the most important sources of natural fibre, covering ∼80% of global bast fibre production 1 . Only Corchorus olitorius and Corchorus capsularis are commercially cultivated, though there are more than 100 Corchorus species 2 in the Malvaceae family. Here we describe high-quality draft genomes of these two species and their comparisons at the functional genomics level to support tailor-designed breeding. The assemblies cover 91.6% and 82.2% of the estimated genome sizes for C. olitorius and C. capsularis, respectively. In total, 37,031 C. olitorius and 30,096 C. capsularis genes are identified, and most of the genes are validated by cDNA and RNA-seq data. Analyses of clustered gene families and gene collinearity show that jute underwent shared whole-genome duplication ∼18.66 million years (Myr) ago prior to speciation. RNA expression analysis from isolated fibre cells reveals the key regulatory and structural genes involved in fibre formation. This work expands our understanding of the molecular basis of fibre formation laying the foundation for the genetic improvement of jute.

  10. Comparative Analysis of Gene Expression for Convergent Evolution of Camera Eye Between Octopus and Human

    PubMed Central

    Ogura, Atsushi; Ikeo, Kazuho; Gojobori, Takashi

    2004-01-01

    Although the camera eye of the octopus is very similar to that of humans, phylogenetic and embryological analyses have suggested that their camera eyes have been acquired independently. It has been known as a typical example of convergent evolution. To study the molecular basis of convergent evolution of camera eyes, we conducted a comparative analysis of gene expression in octopus and human camera eyes. We sequenced 16,432 ESTs of the octopus eye, leading to 1052 nonredundant genes that have matches in the protein database. Comparing these 1052 genes with 13,303 already-known ESTs of the human eye, 729 (69.3%) genes were commonly expressed between the human and octopus eyes. On the contrary, when we compared octopus eye ESTs with human connective tissue ESTs, the expression similarity was quite low. To trace the evolutionary changes that are potentially responsible for camera eye formation, we also compared octopus-eye ESTs with the completed genome sequences of other organisms. We found that 1019 out of the 1052 genes had already existed at the common ancestor of bilateria, and 875 genes were conserved between humans and octopuses. It suggests that a larger number of conserved genes and their similar gene expression may be responsible for the convergent evolution of the camera eye. PMID:15289475

  11. Novel applications of array comparative genomic hybridization in molecular diagnostics.

    PubMed

    Cheung, Sau W; Bi, Weimin

    2018-05-31

    In 2004, the implementation of array comparative genomic hybridization (array comparative genome hybridization [CGH]) into clinical practice marked a new milestone for genetic diagnosis. Array CGH and single-nucleotide polymorphism (SNP) arrays enable genome-wide detection of copy number changes in a high resolution, and therefore microarray has been recognized as the first-tier test for patients with intellectual disability or multiple congenital anomalies, and has also been applied prenatally for detection of clinically relevant copy number variations in the fetus. Area covered: In this review, the authors summarize the evolution of array CGH technology from their diagnostic laboratory, highlighting exonic SNP arrays developed in the past decade which detect small intragenic copy number changes as well as large DNA segments for the region of heterozygosity. The applications of array CGH to human diseases with different modes of inheritance with the emphasis on autosomal recessive disorders are discussed. Expert commentary: An exonic array is a powerful and most efficient clinical tool in detecting genome wide small copy number variants in both dominant and recessive disorders. However, whole-genome sequencing may become the single integrated platform for detection of copy number changes, single-nucleotide changes as well as balanced chromosomal rearrangements in the near future.

  12. arrayCGHbase: an analysis platform for comparative genomic hybridization microarrays

    PubMed Central

    Menten, Björn; Pattyn, Filip; De Preter, Katleen; Robbrecht, Piet; Michels, Evi; Buysse, Karen; Mortier, Geert; De Paepe, Anne; van Vooren, Steven; Vermeesch, Joris; Moreau, Yves; De Moor, Bart; Vermeulen, Stefan; Speleman, Frank; Vandesompele, Jo

    2005-01-01

    Background The availability of the human genome sequence as well as the large number of physically accessible oligonucleotides, cDNA, and BAC clones across the entire genome has triggered and accelerated the use of several platforms for analysis of DNA copy number changes, amongst others microarray comparative genomic hybridization (arrayCGH). One of the challenges inherent to this new technology is the management and analysis of large numbers of data points generated in each individual experiment. Results We have developed arrayCGHbase, a comprehensive analysis platform for arrayCGH experiments consisting of a MIAME (Minimal Information About a Microarray Experiment) supportive database using MySQL underlying a data mining web tool, to store, analyze, interpret, compare, and visualize arrayCGH results in a uniform and user-friendly format. Following its flexible design, arrayCGHbase is compatible with all existing and forthcoming arrayCGH platforms. Data can be exported in a multitude of formats, including BED files to map copy number information on the genome using the Ensembl or UCSC genome browser. Conclusion ArrayCGHbase is a web based and platform independent arrayCGH data analysis tool, that allows users to access the analysis suite through the internet or a local intranet after installation on a private server. ArrayCGHbase is available at . PMID:15910681

  13. Cost-effective cloud computing: a case study using the comparative genomics tool, roundup.

    PubMed

    Kudtarkar, Parul; Deluca, Todd F; Fusaro, Vincent A; Tonellato, Peter J; Wall, Dennis P

    2010-12-22

    Comparative genomics resources, such as ortholog detection tools and repositories are rapidly increasing in scale and complexity. Cloud computing is an emerging technological paradigm that enables researchers to dynamically build a dedicated virtual cluster and may represent a valuable alternative for large computational tools in bioinformatics. In the present manuscript, we optimize the computation of a large-scale comparative genomics resource-Roundup-using cloud computing, describe the proper operating principles required to achieve computational efficiency on the cloud, and detail important procedures for improving cost-effectiveness to ensure maximal computation at minimal costs. Utilizing the comparative genomics tool, Roundup, as a case study, we computed orthologs among 902 fully sequenced genomes on Amazon's Elastic Compute Cloud. For managing the ortholog processes, we designed a strategy to deploy the web service, Elastic MapReduce, and maximize the use of the cloud while simultaneously minimizing costs. Specifically, we created a model to estimate cloud runtime based on the size and complexity of the genomes being compared that determines in advance the optimal order of the jobs to be submitted. We computed orthologous relationships for 245,323 genome-to-genome comparisons on Amazon's computing cloud, a computation that required just over 200 hours and cost $8,000 USD, at least 40% less than expected under a strategy in which genome comparisons were submitted to the cloud randomly with respect to runtime. Our cost savings projections were based on a model that not only demonstrates the optimal strategy for deploying RSD to the cloud, but also finds the optimal cluster size to minimize waste and maximize usage. Our cost-reduction model is readily adaptable for other comparative genomics tools and potentially of significant benefit to labs seeking to take advantage of the cloud as an alternative to local computing infrastructure.

  14. Cost-Effective Cloud Computing: A Case Study Using the Comparative Genomics Tool, Roundup

    PubMed Central

    Kudtarkar, Parul; DeLuca, Todd F.; Fusaro, Vincent A.; Tonellato, Peter J.; Wall, Dennis P.

    2010-01-01

    Background Comparative genomics resources, such as ortholog detection tools and repositories are rapidly increasing in scale and complexity. Cloud computing is an emerging technological paradigm that enables researchers to dynamically build a dedicated virtual cluster and may represent a valuable alternative for large computational tools in bioinformatics. In the present manuscript, we optimize the computation of a large-scale comparative genomics resource—Roundup—using cloud computing, describe the proper operating principles required to achieve computational efficiency on the cloud, and detail important procedures for improving cost-effectiveness to ensure maximal computation at minimal costs. Methods Utilizing the comparative genomics tool, Roundup, as a case study, we computed orthologs among 902 fully sequenced genomes on Amazon’s Elastic Compute Cloud. For managing the ortholog processes, we designed a strategy to deploy the web service, Elastic MapReduce, and maximize the use of the cloud while simultaneously minimizing costs. Specifically, we created a model to estimate cloud runtime based on the size and complexity of the genomes being compared that determines in advance the optimal order of the jobs to be submitted. Results We computed orthologous relationships for 245,323 genome-to-genome comparisons on Amazon’s computing cloud, a computation that required just over 200 hours and cost $8,000 USD, at least 40% less than expected under a strategy in which genome comparisons were submitted to the cloud randomly with respect to runtime. Our cost savings projections were based on a model that not only demonstrates the optimal strategy for deploying RSD to the cloud, but also finds the optimal cluster size to minimize waste and maximize usage. Our cost-reduction model is readily adaptable for other comparative genomics tools and potentially of significant benefit to labs seeking to take advantage of the cloud as an alternative to local computing

  15. Comparative genomics of 3 farm canids in relation to the dog.

    PubMed

    Switonski, M; Szczerbal, I; Nowacka-Woszuk, J

    2009-01-01

    There are 3 canids besides the dog (Canis familiaris): the red fox (Vulpes vulpes), arctic fox (Alopex lagopus) and Chinese raccoon dog (Nyctereutes procyonoides procyonoides), which have been extensively studied with the use of cytogenetic and molecular genetics techniques. These 3 species are considered as farm fur-bearing animals. In addition, they are also useful models in comparative genomic studies of the canids. In this review genome organization, karyotype evolution, comparative marker maps, DNA polymorphism and similarity of selected gene sequences of the 3 farm species are discussed in relation to the dog. Also the nature and variability of the B chromosomes, present in the red fox and the Chinese raccoon dog, were considered. These comparative analyses showed that among the studied canids the Chinese raccoon dog is phylogenetically the closest species to the dog. On the other hand, the most advanced linkage and cytogenetic marker maps of the red fox genome facilitate genome scanning studies with the aim to search for chromosome locations of QTL regions for behavior and production traits. (c) 2009 S. Karger AG, Basel.

  16. Comparative genomics in chicken and Pekin duck using FISH mapping and microarray analysis

    PubMed Central

    2009-01-01

    Background The availability of the complete chicken (Gallus gallus) genome sequence as well as a large number of chicken probes for fluorescent in-situ hybridization (FISH) and microarray resources facilitate comparative genomic studies between chicken and other bird species. In a previous study, we provided a comprehensive cytogenetic map for the turkey (Meleagris gallopavo) and the first analysis of copy number variants (CNVs) in birds. Here, we extend this approach to the Pekin duck (Anas platyrhynchos), an obvious target for comparative genomic studies due to its agricultural importance and resistance to avian flu. Results We provide a detailed molecular cytogenetic map of the duck genome through FISH assignment of 155 chicken clones. We identified one inter- and six intrachromosomal rearrangements between chicken and duck macrochromosomes and demonstrated conserved synteny among all microchromosomes analysed. Array comparative genomic hybridisation revealed 32 CNVs, of which 5 overlap previously designated "hotspot" regions between chicken and turkey. Conclusion Our results suggest extensive conservation of avian genomes across 90 million years of evolution in both macro- and microchromosomes. The data on CNVs between chicken and duck extends previous analyses in chicken and turkey and supports the hypotheses that avian genomes contain fewer CNVs than mammalian genomes and that genomes of evolutionarily distant species share regions of copy number variation ("CNV hotspots"). Our results will expedite duck genomics, assist marker development and highlight areas of interest for future evolutionary and functional studies. PMID:19656363

  17. Genome-Wide Characterization and Expression Profiling of the AUXIN RESPONSE FACTOR (ARF) Gene Family in Eucalyptus grandis

    PubMed Central

    Yu, Hong; Soler, Marçal; Mila, Isabelle; San Clemente, Hélène; Savelli, Bruno; Dunand, Christophe; Paiva, Jorge A. P.; Myburg, Alexander A.; Bouzayen, Mondher; Grima-Pettenati, Jacqueline; Cassan-Wang, Hua

    2014-01-01

    Auxin is a central hormone involved in a wide range of developmental processes including the specification of vascular stem cells. Auxin Response Factors (ARF) are important actors of the auxin signalling pathway, regulating the transcription of auxin-responsive genes through direct binding to their promoters. The recent availability of the Eucalyptus grandis genome sequence allowed us to examine the characteristics and evolutionary history of this gene family in a woody plant of high economic importance. With 17 members, the E. grandis ARF gene family is slightly contracted, as compared to those of most angiosperms studied hitherto, lacking traces of duplication events. In silico analysis of alternative transcripts and gene truncation suggested that these two mechanisms were preeminent in shaping the functional diversity of the ARF family in Eucalyptus. Comparative phylogenetic analyses with genomes of other taxonomic lineages revealed the presence of a new ARF clade found preferentially in woody and/or perennial plants. High-throughput expression profiling among different organs and tissues and in response to environmental cues highlighted genes expressed in vascular cambium and/or developing xylem, responding dynamically to various environmental stimuli. Finally, this study allowed identification of three ARF candidates potentially involved in the auxin-regulated transcriptional program underlying wood formation. PMID:25269088

  18. Evolution of a Cellular Immune Response in Drosophila: A Phenotypic and Genomic Comparative Analysis

    PubMed Central

    Salazar-Jaramillo, Laura; Paspati, Angeliki; van de Zande, Louis; Vermeulen, Cornelis Joseph; Schwander, Tanja; Wertheim, Bregje

    2014-01-01

    Understanding the genomic basis of evolutionary adaptation requires insight into the molecular basis underlying phenotypic variation. However, even changes in molecular pathways associated with extreme variation, gains and losses of specific phenotypes, remain largely uncharacterized. Here, we investigate the large interspecific differences in the ability to survive infection by parasitoids across 11 Drosophila species and identify genomic changes associated with gains and losses of parasitoid resistance. We show that a cellular immune defense, encapsulation, and the production of a specialized blood cell, lamellocytes, are restricted to a sublineage of Drosophila, but that encapsulation is absent in one species of this sublineage, Drosophila sechellia. Our comparative analyses of hemopoiesis pathway genes and of genes differentially expressed during the encapsulation response revealed that hemopoiesis-associated genes are highly conserved and present in all species independently of their resistance. In contrast, 11 genes that are differentially expressed during the response to parasitoids are novel genes, specific to the Drosophila sublineage capable of lamellocyte-mediated encapsulation. These novel genes, which are predominantly expressed in hemocytes, arose via duplications, whereby five of them also showed signatures of positive selection, as expected if they were recruited for new functions. Three of these novel genes further showed large-scale and presumably loss-of-function sequence changes in D. sechellia, consistent with the loss of resistance in this species. In combination, these convergent lines of evidence suggest that co-option of duplicated genes in existing pathways and subsequent neofunctionalization are likely to have contributed to the evolution of the lamellocyte-mediated encapsulation in Drosophila. PMID:24443439

  19. Evolution of a cellular immune response in Drosophila: a phenotypic and genomic comparative analysis.

    PubMed

    Salazar-Jaramillo, Laura; Paspati, Angeliki; van de Zande, Louis; Vermeulen, Cornelis Joseph; Schwander, Tanja; Wertheim, Bregje

    2014-02-01

    Understanding the genomic basis of evolutionary adaptation requires insight into the molecular basis underlying phenotypic variation. However, even changes in molecular pathways associated with extreme variation, gains and losses of specific phenotypes, remain largely uncharacterized. Here, we investigate the large interspecific differences in the ability to survive infection by parasitoids across 11 Drosophila species and identify genomic changes associated with gains and losses of parasitoid resistance. We show that a cellular immune defense, encapsulation, and the production of a specialized blood cell, lamellocytes, are restricted to a sublineage of Drosophila, but that encapsulation is absent in one species of this sublineage, Drosophila sechellia. Our comparative analyses of hemopoiesis pathway genes and of genes differentially expressed during the encapsulation response revealed that hemopoiesis-associated genes are highly conserved and present in all species independently of their resistance. In contrast, 11 genes that are differentially expressed during the response to parasitoids are novel genes, specific to the Drosophila sublineage capable of lamellocyte-mediated encapsulation. These novel genes, which are predominantly expressed in hemocytes, arose via duplications, whereby five of them also showed signatures of positive selection, as expected if they were recruited for new functions. Three of these novel genes further showed large-scale and presumably loss-of-function sequence changes in D. sechellia, consistent with the loss of resistance in this species. In combination, these convergent lines of evidence suggest that co-option of duplicated genes in existing pathways and subsequent neofunctionalization are likely to have contributed to the evolution of the lamellocyte-mediated encapsulation in Drosophila.

  20. IMG 4 version of the integrated microbial genomes comparative analysis system

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Markowitz, Victor M.; Chen, I-Min A.; Palaniappan, Krishna

    The Integrated Microbial Genomes (IMG) data warehouse integrates genomes from all three domains of life, as well as plasmids, viruses and genome fragments. IMG provides tools for analyzing and reviewing the structural and functional annotations of genomes in a comparative context. IMG’s data content and analytical capabilities have increased continuously since its first version released in 2005. Since the last report published in the 2012 NAR Database Issue, IMG’s annotation and data integration pipelines have evolved while new tools have been added for recording and analyzing single cell genomes, RNA Seq and biosynthetic cluster data. Finally, different IMG datamarts providemore » support for the analysis of publicly available genomes (IMG/W: http://img.jgi.doe.gov/w), expert review of genome annotations (IMG/ER: http://img.jgi.doe.gov/er) and teaching and training in the area of microbial genome analysis (IMG/EDU: http://img.jgi.doe.gov/edu).« less

  1. CyanoClust: comparative genome resources of cyanobacteria and plastids.

    PubMed

    Sasaki, Naobumi V; Sato, Naoki

    2010-01-01

    Cyanobacteria, which perform oxygen-evolving photosynthesis as do chloroplasts of plants and algae, are one of the best-studied prokaryotic phyla and one from which many representative genomes have been sequenced. Lack of a suitable comparative genomic database has been a problem in cyanobacterial genomics because many proteins involved in physiological functions such as photosynthesis and nitrogen fixation are not catalogued in commonly used databases, such as Clusters of Orthologous Proteins (COG). CyanoClust is a database of homolog groups in cyanobacteria and plastids that are produced by the program Gclust. We have developed a web-server system for the protein homology database featuring cyanobacteria and plastids. Database URL: http://cyanoclust.c.u-tokyo.ac.jp/.

  2. Genome-wide organization and expression profiling of the R2R3-MYB transcription factor family in pineapple (Ananas comosus).

    PubMed

    Liu, Chaoyang; Xie, Tao; Chen, Chenjie; Luan, Aiping; Long, Jianmei; Li, Chuhao; Ding, Yaqi; He, Yehua

    2017-07-01

    The MYB proteins comprise one of the largest families of plant transcription factors, which are involved in various plant physiological and biochemical processes. Pineapple (Ananas comosus) is one of three most important tropical fruits worldwide. The completion of pineapple genome sequencing provides a great opportunity to investigate the organization and evolutionary traits of pineapple MYB genes at the genome-wide level. In the present study, a total of 94 pineapple R2R3-MYB genes were identified and further phylogenetically classified into 26 subfamilies, as supported by the conserved gene structures and motif composition. Collinearity analysis indicated that the segmental duplication events played a crucial role in the expansion of pineapple MYB gene family. Further comparative phylogenetic analysis suggested that there have been functional divergences of MYB gene family during plant evolution. RNA-seq data from different tissues and developmental stages revealed distinct temporal and spatial expression profiles of the AcMYB genes. Further quantitative expression analysis showed the specific expression patterns of the selected putative stress-related AcMYB genes in response to distinct abiotic stress and hormonal treatments. The comprehensive expression analysis of the pineapple MYB genes, especially the tissue-preferential and stress-responsive genes, could provide valuable clues for further function characterization. In this work, we systematically identified AcMYB genes by analyzing the pineapple genome sequence using a set of bioinformatics approaches. Our findings provide a global insight into the organization, phylogeny and expression patterns of the pineapple R2R3-MYB genes, and hence contribute to the greater understanding of their biological roles in pineapple.

  3. Genome-wide Gene Expression Profiling of Acute Metal Exposures in Male Zebrafish

    DTIC Science & Technology

    2014-10-23

    Data in Brief Genome-wide gene expression profiling of acute metal exposures in male zebrafish Christine E. Baer a,⁎, Danielle L. Ippolito b, Naissan... Zebrafish Whole organism Nickel Chromium Cobalt Toxicogenomics To capture global responses to metal poisoning and mechanistic insights into metal...toxicity, gene expression changes were evaluated in whole adult male zebrafish following acute 24 h high dose exposure to three metals with known human

  4. Comparative genomic analysis of seven Mycoplasma hyosynoviae strains

    PubMed Central

    Bumgardner, Eric A; Kittichotirat, Weerayuth; Bumgarner, Roger E; Lawrence, Paulraj K

    2015-01-01

    Infection with Mycoplasma hyosynoviae can result in debilitating arthritis in pigs, particularly those aged 10 weeks or older. Strategies for controlling this pathogen are becoming increasingly important due to the rise in the number of cases of arthritis that have been attributed to infection in recent years. In order to begin to develop interventions to prevent arthritis caused by M. hyosynoviae, more information regarding the specific proteins and potential virulence factors that its genome encodes was needed. However, the genome of this emerging swine pathogen had not been sequenced previously. In this report, we present a comparative analysis of the genomes of seven strains of M. hyosynoviae isolated from different locations in North America during the years 2010 to 2013. We identified several putative virulence factors that may contribute to the ability of this pathogen to adhere to host cells. Additionally, we discovered several prophage genes present within the genomes of three strains that show significant similarity to MAV1, a phage isolated from the related species, M. arthritidis. We also identified CRISPR-Cas and type III restriction and modification systems present in two strains that may contribute to their ability to defend against phage infection. PMID:25693846

  5. Genome-wide expressions in autologous eutopic and ectopic endometrium of fertile women with endometriosis.

    PubMed

    Khan, Meraj A; Sengupta, Jayasree; Mittal, Suneeta; Ghosh, Debabrata

    2012-09-24

    In order to obtain a lead of the pathophysiology of endometriosis, genome-wide expressional analyses of eutopic and ectopic endometrium have earlier been reported, however, the effects of stages of severity and phases of menstrual cycle on expressional profiles have not been examined. The effect of genetic heterogeneity and fertility history on transcriptional activity was also not considered. In the present study, a genome-wide expression analysis of autologous, paired eutopic and ectopic endometrial samples obtained from fertile women (n=18) suffering from moderate (stage 3; n=8) or severe (stage 4; n=10) ovarian endometriosis during proliferative (n=13) and secretory (n=5) phases of menstrual cycle was performed. Individual pure RNA samples were subjected to Agilent's Whole Human Genome 44K microarray experiments. Microarray data were validated (P<0.01) by estimating transcript copy numbers by performing real time RT-PCR of seven (7) arbitrarily selected genes in all samples. The data obtained were subjected to differential expression (DE) and differential co-expression (DC) analyses followed by networks and enrichment analysis, and gene set enrichment analysis (GSEA). The reproducibility of prediction based on GSEA implementation of DC results was assessed by examining the relative expressions of twenty eight (28) selected genes in RNA samples obtained from fresh pool of eutopic and ectopic samples from confirmed ovarian endometriosis patients with stages 3 and 4 (n=4/each) during proliferative and secretory (n=4/each) phases. Higher clustering effect of pairing (cluster distance, cd=0.1) in samples from same individuals on expressional arrays among eutopic and ectopic samples was observed as compared to that of clinical stages of severity (cd=0.5) and phases of menstrual cycle (cd=0.6). Post hoc analysis revealed anomaly in the expressional profiles of several genes associated with immunological, neuracrine and endocrine functions and gynecological cancers

  6. In vitro synthesis of oncogenic human papillomaviruses requires episomal genomes for differentiation-dependent late expression.

    PubMed Central

    Frattini, M G; Lim, H B; Laimins, L A

    1996-01-01

    Human papillomavirus (HPV) types 16, 18, 31, and 51 are the etiologic agents of many anogenital cancers including those of the cervix. These "high risk" HPVs specifically target genital squamous epithelia, and their lytic life cycle is closely linked to epithelial differentiation. We have developed a genetic assay for HPV functions during pathogenesis using recircularized cloned HPV 31 genomes that were transfected together with a drug resistance marker into monolayer cultures of normal human foreskin keratinocytes, the natural host cell. After drug selection, cell lines were isolated that stably maintained HPV 31 DNA as episomes and underwent terminal differentiation when grown in organotypic raft cultures. In differentiated rafts, the expression of late viral genes, amplification of viral DNA, and production of viral particles were detected in suprabasal cells. This demonstrated the ability to synthesize HPV 31 virions from transfected DNA templates and allowed an examination of HPV functions during the vegetative viral life cycle. We then used this system to investigate whether an episomal genome was required for the induction of late viral gene expression. When an HPV 31 genome (31E1*) containing a missense mutation in the E1 open reading frame was transfected into normal human keratinocytes, the mutant viral sequences were found to integrate into the host cell chromosomal DNA with both early and late regions intact. While high levels of early viral gene transcription were observed, no late gene expression was detected in rafts of cell lines containing the mutant viral genome despite evidence of terminal differentiation. Therefore, the induction of late viral gene expression required that the viral genomes be maintained as extrachromosomal elements, and terminal differentiation alone was not sufficient. These studies provide the basis for a detailed examination of HPV functions during viral pathogenesis. Images Fig. 1 Fig. 2 Fig. 3 Fig. 4 Fig. 5 Fig. 6 Fig. 7

  7. Assigning protein functions by comparative genome analysis protein phylogenetic profiles

    DOEpatents

    Pellegrini, Matteo; Marcotte, Edward M.; Thompson, Michael J.; Eisenberg, David; Grothe, Robert; Yeates, Todd O.

    2003-05-13

    A computational method system, and computer program are provided for inferring functional links from genome sequences. One method is based on the observation that some pairs of proteins A' and B' have homologs in another organism fused into a single protein chain AB. A trans-genome comparison of sequences can reveal these AB sequences, which are Rosetta Stone sequences because they decipher an interaction between A' and B. Another method compares the genomic sequence of two or more organisms to create a phylogenetic profile for each protein indicating its presence or absence across all the genomes. The profile provides information regarding functional links between different families of proteins. In yet another method a combination of the above two methods is used to predict functional links.

  8. Comparative genome analysis identifies two large deletions in the genome of highly-passaged attenuated Streptococcus agalactiae strain YM001 compared to the parental pathogenic strain HN016.

    PubMed

    Wang, Rui; Li, Liping; Huang, Yan; Luo, Fuguang; Liang, Wanwen; Gan, Xi; Huang, Ting; Lei, Aiying; Chen, Ming; Chen, Lianfu

    2015-11-04

    Streptococcus agalactiae (S. agalactiae), also known as group B Streptococcus (GBS), is an important pathogen for neonatal pneumonia, meningitis, bovine mastitis, and fish meningoencephalitis. The global outbreaks of Streptococcus disease in tilapia cause huge economic losses and threaten human food hygiene safety as well. To investigate the mechanism of S. agalactiae pathogenesis in tilapia and develop attenuated S. agalactiae vaccine, this study sequenced and comparatively analyzed the whole genomes of virulent wild-type S. agalactiae strain HN016 and its highly-passaged attenuated strain YM001 derived from tilapia. We performed Illumina sequencing of DNA prepared from strain HN016 and YM001. Sequencedreads were assembled and nucleotide comparisons, single nucleotide polymorphism (SNP) , indels were analyzed between the draft genomes of HN016 and YM001. Clustered regularly interspaced short palindromic repeats (CRISPRs) and prophage were detected and analyzed in different S. agalactiae strains. The genome of S. agalactiae YM001 was 2,047,957 bp with a GC content of 35.61 %; it contained 2044 genes and 88 RNAs. Meanwhile, the genome of S. agalactiae HN016 was 2,064,722 bp with a GC content of 35.66 %; it had 2063 genes and 101 RNAs. Comparative genome analysis indicated that compared with HN016, YM001 genome had two significant large deletions, at the sizes of 5832 and 11,116 bp respectively, resulting in the deletion of three rRNA and ten tRNA genes, as well as the deletion and functional damage of ten genes related to metabolism, transport, growth, anti-stress, etc. Besides these two large deletions, other ten deletions and 28 single nucleotide variations (SNVs) were also identified, mainly affecting the metabolism- and growth-related genes. The genome of attenuated S. agalactiae YM001 showed significant variations, resulting in the deletion of 10 functional genes, compared to the parental pathogenic strain HN016. The deleted and mutated functional genes all

  9. Genome sequencing and comparative genomics of honey bee microsporidia, Nosema apis reveal novel insights into host-parasite interactions.

    PubMed

    Chen, Yan ping; Pettis, Jeffery S; Zhao, Yan; Liu, Xinyue; Tallon, Luke J; Sadzewicz, Lisa D; Li, Renhua; Zheng, Huoqing; Huang, Shaokang; Zhang, Xuan; Hamilton, Michele C; Pernal, Stephen F; Melathopoulos, Andony P; Yan, Xianghe; Evans, Jay D

    2013-07-05

    The microsporidia parasite Nosema contributes to the steep global decline of honey bees that are critical pollinators of food crops. There are two species of Nosema that have been found to infect honey bees, Nosema apis and N. ceranae. Genome sequencing of N. apis and comparative genome analysis with N. ceranae, a fully sequenced microsporidia species, reveal novel insights into host-parasite interactions underlying the parasite infections. We applied the whole-genome shotgun sequencing approach to sequence and assemble the genome of N. apis which has an estimated size of 8.5 Mbp. We predicted 2,771 protein- coding genes and predicted the function of each putative protein using the Gene Ontology. The comparative genomic analysis led to identification of 1,356 orthologs that are conserved between the two Nosema species and genes that are unique characteristics of the individual species, thereby providing a list of virulence factors and new genetic tools for studying host-parasite interactions. We also identified a highly abundant motif in the upstream promoter regions of N. apis genes. This motif is also conserved in N. ceranae and other microsporidia species and likely plays a role in gene regulation across the microsporidia. The availability of the N. apis genome sequence is a significant addition to the rapidly expanding body of microsprodian genomic data which has been improving our understanding of eukaryotic genome diversity and evolution in a broad sense. The predicted virulent genes and transcriptional regulatory elements are potential targets for innovative therapeutics to break down the life cycle of the parasite.

  10. Genomic and Expression Profiling of Benign and Malignant Nerve Sheath Profiling of Benign and Malignant Nerve Sheath

    DTIC Science & Technology

    2007-05-01

    Benign and Malignant Nerve Sheath Tumors in Neurofibromatosis Patients PRINCIPAL INVESTIGATOR: Matt van de Rijn, M.D., Ph.D. Torsten...Annual 3. DATES COVERED 1 May 2006 –30 Apr 2007 4. TITLE AND SUBTITLE 5a. CONTRACT NUMBER Genomic and Expression Profiling of Benign and Malignant Nerve...Award Number: DAMD17-03-1-0297 Title: Genomic and Expression Profiling of Benign and Malignant Nerve Sheath Tumors in Neurofibromatosis

  11. Complete genome sequence of Enterococcus faecium strain TX16 and comparative genomic analysis of Enterococcus faecium genomes

    PubMed Central

    2012-01-01

    Background Enterococci are among the leading causes of hospital-acquired infections in the United States and Europe, with Enterococcus faecalis and Enterococcus faecium being the two most common species isolated from enterococcal infections. In the last decade, the proportion of enterococcal infections caused by E. faecium has steadily increased compared to other Enterococcus species. Although the underlying mechanism for the gradual replacement of E. faecalis by E. faecium in the hospital environment is not yet understood, many studies using genotyping and phylogenetic analysis have shown the emergence of a globally dispersed polyclonal subcluster of E. faecium strains in clinical environments. Systematic study of the molecular epidemiology and pathogenesis of E. faecium has been hindered by the lack of closed, complete E. faecium genomes that can be used as references. Results In this study, we report the complete genome sequence of the E. faecium strain TX16, also known as DO, which belongs to multilocus sequence type (ST) 18, and was the first E. faecium strain ever sequenced. Whole genome comparison of the TX16 genome with 21 E. faecium draft genomes confirmed that most clinical, outbreak, and hospital-associated (HA) strains (including STs 16, 17, 18, and 78), in addition to strains of non-hospital origin, group in the same clade (referred to as the HA clade) and are evolutionally considerably more closely related to each other by phylogenetic and gene content similarity analyses than to isolates in the community-associated (CA) clade with approximately a 3–4% average nucleotide sequence difference between the two clades at the core genome level. Our study also revealed that many genomic loci in the TX16 genome are unique to the HA clade. 380 ORFs in TX16 are HA-clade specific and antibiotic resistance genes are enriched in HA-clade strains. Mobile elements such as IS16 and transposons were also found almost exclusively in HA strains, as previously reported

  12. Expression Quantitative Trait Locus Mapping across Water Availability Environments Reveals Contrasting Associations with Genomic Features in Arabidopsis[C][W][OPEN

    PubMed Central

    Lowry, David B.; Logan, Tierney L.; Santuari, Luca; Hardtke, Christian S.; Richards, James H.; DeRose-Wilson, Leah J.; McKay, John K.; Sen, Saunak; Juenger, Thomas E.

    2013-01-01

    The regulation of gene expression is crucial for an organism’s development and response to stress, and an understanding of the evolution of gene expression is of fundamental importance to basic and applied biology. To improve this understanding, we conducted expression quantitative trait locus (eQTL) mapping in the Tsu-1 (Tsushima, Japan) × Kas-1 (Kashmir, India) recombinant inbred line population of Arabidopsis thaliana across soil drying treatments. We then used genome resequencing data to evaluate whether genomic features (promoter polymorphism, recombination rate, gene length, and gene density) are associated with genes responding to the environment (E) or with genes with genetic variation (G) in gene expression in the form of eQTLs. We identified thousands of genes that responded to soil drying and hundreds of main-effect eQTLs. However, we identified very few statistically significant eQTLs that interacted with the soil drying treatment (GxE eQTL). Analysis of genome resequencing data revealed associations of several genomic features with G and E genes. In general, E genes had lower promoter diversity and local recombination rates. By contrast, genes with eQTLs (G) had significantly greater promoter diversity and were located in genomic regions with higher recombination. These results suggest that genomic architecture may play an important a role in the evolution of gene expression. PMID:24045022

  13. Comparative Genomics Reveals the Core Gene Toolbox for the Fungus-Insect Symbiosis.

    PubMed

    Wang, Yan; Stata, Matt; Wang, Wei; Stajich, Jason E; White, Merlin M; Moncalvo, Jean-Marc

    2018-05-15

    Modern genomics has shed light on many entomopathogenic fungi and expanded our knowledge widely; however, little is known about the genomic features of the insect-commensal fungi. Harpellales are obligate commensals living in the digestive tracts of disease-bearing insects (black flies, midges, and mosquitoes). In this study, we produced and annotated whole-genome sequences of nine Harpellales taxa and conducted the first comparative analyses to infer the genomic diversity within the members of the Harpellales. The genomes of the insect gut fungi feature low (26% to 37%) GC content and large genome size variations (25 to 102 Mb). Further comparisons with insect-pathogenic fungi (from both Ascomycota and Zoopagomycota), as well as with free-living relatives (as negative controls), helped to identify a gene toolbox that is essential to the fungus-insect symbiosis. The results not only narrow the genomic scope of fungus-insect interactions from several thousands to eight core players but also distinguish host invasion strategies employed by insect pathogens and commensals. The genomic content suggests that insect commensal fungi rely mostly on adhesion protein anchors that target digestive system, while entomopathogenic fungi have higher numbers of transmembrane helices, signal peptides, and pathogen-host interaction (PHI) genes across the whole genome and enrich genes as well as functional domains to inactivate the host inflammation system and suppress the host defense. Phylogenomic analyses have revealed that genome sizes of Harpellales fungi vary among lineages with an integer-multiple pattern, which implies that ancient genome duplications may have occurred within the gut of insects. IMPORTANCE Insect guts harbor various microbes that are important for host digestion, immune response, and disease dispersal in certain cases. Bacteria, which are among the primary endosymbionts, have been studied extensively. However, fungi, which are also frequently encountered

  14. Genomic organization and expression of the human MSH3 gene

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Watanabe, Atsushi; Ikejima, Miyoko; Suzuki, Noriko

    1996-02-01

    We have studied the expression and genomic organization of the human MSH3 gene, which encodes a human homologue of the bacterial DNA mismatch repair protein MutS. This gene is located upstream of the dihydrofolate reductase (DHFR) gene. Northern analysis has demonstrated that the hMSH3 gene is expressed in a variety of human tissues at low levels, like the DHFR gene. Characterization of cosmid clones has shown that the hMSH3 gene consists of 24 exons spanning at least 160 kb. All exon-intron junction sequences match the classical GT/AG rule, except that intron 6 has AT and AA at the ends. Twomore » major transcripts of 5.0 and 3.8 kb have been shown to be derived from the differential use of two polyadenylation sites. Elucidation of the complete genomic organization and the nucleotide sequences of the introns of the hMSH3 gene should be useful for studying the function of this gene and the possible involvement of specific mutations of the hMSH3 gene in some diseases. 34 refs., 5 figs., 1 tab.« less

  15. Quantitative analysis of comparative genomic hybridization

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Manoir, S. du; Bentz, M.; Joos, S.

    1995-01-01

    Comparative genomic hybridization (CGH) is a new molecular cytogenetic method for the detection of chromosomal imbalances. Following cohybridization of DNA prepared from a sample to be studied and control DNA to normal metaphase spreads, probes are detected via different fluorochromes. The ratio of the test and control fluorescence intensities along a chromosome reflects the relative copy number of segments of a chromosome in the test genome. Quantitative evaluation of CGH experiments is required for the determination of low copy changes, e.g., monosomy or trisomy, and for the definition of the breakpoints involved in unbalanced rearrangements. In this study, a programmore » for quantitation of CGH preparations is presented. This program is based on the extraction of the fluorescence ratio profile along each chromosome, followed by averaging of individual profiles from several metaphase spreads. Objective parameters critical for quantitative evaluations were tested, and the criteria for selection of suitable CGH preparations are described. The granularity of the chromosome painting and the regional inhomogeneity of fluorescence intensities in metaphase spreads proved to be crucial parameters. The coefficient of variation of the ratio value for chromosomes in balanced state (CVBS) provides a general quality criterion for CGH experiments. Different cutoff levels (thresholds) of average fluorescence ratio values were compared for their specificity and sensitivity with regard to the detection of chromosomal imbalances. 27 refs., 15 figs., 1 tab.« less

  16. Comparative genomics of multidrug resistance in Acinetobacter baumannii.

    PubMed

    Fournier, Pierre-Edouard; Vallenet, David; Barbe, Valérie; Audic, Stéphane; Ogata, Hiroyuki; Poirel, Laurent; Richet, Hervé; Robert, Catherine; Mangenot, Sophie; Abergel, Chantal; Nordmann, Patrice; Weissenbach, Jean; Raoult, Didier; Claverie, Jean-Michel

    2006-01-01

    Acinetobacter baumannii is a species of nonfermentative gram-negative bacteria commonly found in water and soil. This organism was susceptible to most antibiotics in the 1970s. It has now become a major cause of hospital-acquired infections worldwide due to its remarkable propensity to rapidly acquire resistance determinants to a wide range of antibacterial agents. Here we use a comparative genomic approach to identify the complete repertoire of resistance genes exhibited by the multidrug-resistant A. baumannii strain AYE, which is epidemic in France, as well as to investigate the mechanisms of their acquisition by comparison with the fully susceptible A. baumannii strain SDF, which is associated with human body lice. The assembly of the whole shotgun genome sequences of the strains AYE and SDF gave an estimated size of 3.9 and 3.2 Mb, respectively. A. baumannii strain AYE exhibits an 86-kb genomic region termed a resistance island--the largest identified to date--in which 45 resistance genes are clustered. At the homologous location, the SDF strain exhibits a 20 kb-genomic island flanked by transposases but devoid of resistance markers. Such a switching genomic structure might be a hotspot that could explain the rapid acquisition of resistance markers under antimicrobial pressure. Sequence similarity and phylogenetic analyses confirm that most of the resistance genes found in the A. baumannii strain AYE have been recently acquired from bacteria of the genera Pseudomonas, Salmonella, or Escherichia. This study also resulted in the discovery of 19 new putative resistance genes. Whole-genome sequencing appears to be a fast and efficient approach to the exhaustive identification of resistance genes in epidemic infectious agents of clinical significance.

  17. Genome sequences and comparative genomics of two Lactobacillus ruminis strains from the bovine and human intestinal tracts

    PubMed Central

    2011-01-01

    Background The genus Lactobacillus is characterized by an extraordinary degree of phenotypic and genotypic diversity, which recent genomic analyses have further highlighted. However, the choice of species for sequencing has been non-random and unequal in distribution, with only a single representative genome from the L. salivarius clade available to date. Furthermore, there is no data to facilitate a functional genomic analysis of motility in the lactobacilli, a trait that is restricted to the L. salivarius clade. Results The 2.06 Mb genome of the bovine isolate Lactobacillus ruminis ATCC 27782 comprises a single circular chromosome, and has a G+C content of 44.4%. In silico analysis identified 1901 coding sequences, including genes for a pediocin-like bacteriocin, a single large exopolysaccharide-related cluster, two sortase enzymes, two CRISPR loci and numerous IS elements and pseudogenes. A cluster of genes related to a putative pilin was identified, and shown to be transcribed in vitro. A high quality draft assembly of the genome of a second L. ruminis strain, ATCC 25644 isolated from humans, suggested a slightly larger genome of 2.138 Mb, that exhibited a high degree of synteny with the ATCC 27782 genome. In contrast, comparative analysis of L. ruminis and L. salivarius identified a lack of long-range synteny between these closely related species. Comparison of the L. salivarius clade core proteins with those of nine other Lactobacillus species distributed across 4 major phylogenetic groups identified the set of shared proteins, and proteins unique to each group. Conclusions The genome of L. ruminis provides a comparative tool for directing functional analyses of other members of the L. salivarius clade, and it increases understanding of the divergence of this distinct Lactobacillus lineage from other commensal lactobacilli. The genome sequence provides a definitive resource to facilitate investigation of the genetics, biochemistry and host interactions of

  18. Upgrading short-read animal genome assemblies to chromosome level using comparative genomics and a universal probe set.

    PubMed

    Damas, Joana; O'Connor, Rebecca; Farré, Marta; Lenis, Vasileios Panagiotis E; Martell, Henry J; Mandawala, Anjali; Fowler, Katie; Joseph, Sunitha; Swain, Martin T; Griffin, Darren K; Larkin, Denis M

    2017-05-01

    Most recent initiatives to sequence and assemble new species' genomes de novo fail to achieve the ultimate endpoint to produce contigs, each representing one whole chromosome. Even the best-assembled genomes (using contemporary technologies) consist of subchromosomal-sized scaffolds. To circumvent this problem, we developed a novel approach that combines computational algorithms to merge scaffolds into chromosomal fragments, PCR-based scaffold verification, and physical mapping to chromosomes. Multigenome-alignment-guided probe selection led to the development of a set of universal avian BAC clones that permit rapid anchoring of multiple scaffolds to chromosomes on all avian genomes. As proof of principle, we assembled genomes of the pigeon ( Columbia livia ) and peregrine falcon ( Falco peregrinus ) to chromosome levels comparable, in continuity, to avian reference genomes. Both species are of interest for breeding, cultural, food, and/or environmental reasons. Pigeon has a typical avian karyotype (2n = 80), while falcon (2n = 50) is highly rearranged compared to the avian ancestor. By using chromosome breakpoint data, we established that avian interchromosomal breakpoints appear in the regions of low density of conserved noncoding elements (CNEs) and that the chromosomal fission sites are further limited to long CNE "deserts." This corresponds with fission being the rarest type of rearrangement in avian genome evolution. High-throughput multiple hybridization and rapid capture strategies using the current BAC set provide the basis for assembling numerous avian (and possibly other reptilian) species, while the overall strategy for scaffold assembly and mapping provides the basis for an approach that (provided metaphases can be generated) could be applied to any animal genome. © 2017 Damas et al.; Published by Cold Spring Harbor Laboratory Press.

  19. Upgrading short-read animal genome assemblies to chromosome level using comparative genomics and a universal probe set

    PubMed Central

    O'Connor, Rebecca; Lenis, Vasileios Panagiotis E.; Martell, Henry J.; Mandawala, Anjali; Fowler, Katie; Joseph, Sunitha; Swain, Martin T.; Griffin, Darren K.; Larkin, Denis M.

    2017-01-01

    Most recent initiatives to sequence and assemble new species’ genomes de novo fail to achieve the ultimate endpoint to produce contigs, each representing one whole chromosome. Even the best-assembled genomes (using contemporary technologies) consist of subchromosomal-sized scaffolds. To circumvent this problem, we developed a novel approach that combines computational algorithms to merge scaffolds into chromosomal fragments, PCR-based scaffold verification, and physical mapping to chromosomes. Multigenome-alignment-guided probe selection led to the development of a set of universal avian BAC clones that permit rapid anchoring of multiple scaffolds to chromosomes on all avian genomes. As proof of principle, we assembled genomes of the pigeon (Columbia livia) and peregrine falcon (Falco peregrinus) to chromosome levels comparable, in continuity, to avian reference genomes. Both species are of interest for breeding, cultural, food, and/or environmental reasons. Pigeon has a typical avian karyotype (2n = 80), while falcon (2n = 50) is highly rearranged compared to the avian ancestor. By using chromosome breakpoint data, we established that avian interchromosomal breakpoints appear in the regions of low density of conserved noncoding elements (CNEs) and that the chromosomal fission sites are further limited to long CNE “deserts.” This corresponds with fission being the rarest type of rearrangement in avian genome evolution. High-throughput multiple hybridization and rapid capture strategies using the current BAC set provide the basis for assembling numerous avian (and possibly other reptilian) species, while the overall strategy for scaffold assembly and mapping provides the basis for an approach that (provided metaphases can be generated) could be applied to any animal genome. PMID:27903645

  20. Identification of candidate genes involved in neuroblastoma progression by combining genomic and expression microarrays with survival data.

    PubMed

    Łastowska, M; Viprey, V; Santibanez-Koref, M; Wappler, I; Peters, H; Cullinane, C; Roberts, P; Hall, A G; Tweddle, D A; Pearson, A D J; Lewis, I; Burchill, S A; Jackson, M S

    2007-11-22

    Identifying genes, whose expression is consistently altered by chromosomal gains or losses, is an important step in defining genes of biological relevance in a wide variety of tumour types. However, additional criteria are needed to discriminate further among the large number of candidate genes identified. This is particularly true for neuroblastoma, where multiple genomic copy number changes of proven prognostic value exist. We have used Affymetrix microarrays and a combination of fluorescent in situ hybridization and single nucleotide polymorphism (SNP) microarrays to establish expression profiles and delineate copy number alterations in 30 primary neuroblastomas. Correlation of microarray data with patient survival and analysis of expression within rodent neuroblastoma cell lines were then used to define further genes likely to be involved in the disease process. Using this approach, we identify >1000 genes within eight recurrent genomic alterations (loss of 1p, 3p, 4p, 10q and 11q, 2p gain, 17q gain, and the MYCN amplicon) whose expression is consistently altered by copy number change. Of these, 84 correlate with patient survival, with the minimal regions of 17q gain and 4p loss being enriched significantly for such genes. These include genes involved in RNA and DNA metabolism, and apoptosis. Orthologues of all but one of these genes on 17q are overexpressed in rodent neuroblastoma cell lines. A significant excess of SNPs whose copy number correlates with survival is also observed on proximal 4p in stage 4 tumours, and we find that deletion of 4p is associated with improved outcome in an extended cohort of tumours. These results define the major impact of genomic copy number alterations upon transcription within neuroblastoma, and highlight genes on distal 17q and proximal 4p for downstream analyses. They also suggest that integration of discriminators, such as survival and comparative gene expression, with microarray data may be useful in the identification of

  1. Integrated Analysis of Genome-wide Copy Number Alterations and Gene Expression in MSS, CIMP-negative Colon Cancer

    PubMed Central

    Loo, Lenora WM; Tiirikainen, Maarit; Cheng, Iona; Lum-Jones, Annette; Seifried, Ann; Church, James M; Gryfe, Robert; Weisenberger, Daniel J; Lindor, Noralane M; Gallinger, Steven; Haile, Robert W; Duggan, David J; Thibodeau, Stephen N; Casey, Graham; Le Marchand, Loïc

    2014-01-01

    Microsatellite stable (MSS), CpG island methylator phenotype (CIMP)-negative colorectal tumors, the most prevalent molecular subtype of colorectal cancer, are associated with extensive copy number alteration (CNA) events and aneuploidy. We report on the identification of characteristic recurrent CNA (with frequency >25%) events and associated gene expression profiles for a total of 40 paired tumor and adjacent normal colon tissues using genome-wide microarrays. We observed recurrent CNAs, namely gains at 1q, 7p, 7q, 8p12-11, 8q, 12p13, 13q, 20p, 20q, Xp, and Xq and losses at 1p36, 1p31, 1p21, 4p15-12, 4q12-35, 5q21-22, 6q26, 8p, 14q, 15q11-12, 17p, 18p, 18q, 21q21-22, and 22q. Within these genomic regions we identified 356 genes with significant differential expression (P<0.0001 and ±1.5 fold change) in the tumor compared to adjacent normal tissue. Gene ontology and pathway analyses indicated that many of these genes were involved in functional mechanisms that regulate cell cycle, cell death, and metabolism. An amplicon present in >70% of the tumor samples at 20q11-20q13 contained several cancer-related genes (AHCY, POFUT1, RPN2, TH1L and PRPF6) that were up-regulated and demonstrated a significant linear correlation (P<0.05) for gene dosage and gene expression. Copy number loss at 8p, a CNA associated with adenocarcinoma and poor prognosis, was observed in >50% of the tumor samples and demonstrated a significant linear correlation for gene dosage and gene expression for two potential tumor suppressor genes, MTUS1 (8p22) and PPP2CB (8p12). The results from our integration analysis illustrate the complex relationship between genomic alterations and gene expression in colon cancer. PMID:23341073

  2. Complete genome sequence of the fire blight pathogen Erwinia pyrifoliae DSM 12163T and comparative genomic insights into plant pathogenicity

    PubMed Central

    2010-01-01

    Background Erwinia pyrifoliae is a newly described necrotrophic pathogen, which causes fire blight on Asian (Nashi) pear and is geographically restricted to Eastern Asia. Relatively little is known about its genetics compared to the closely related main fire blight pathogen E. amylovora. Results The genome of the type strain of E. pyrifoliae strain DSM 12163T, was sequenced using both 454 and Solexa pyrosequencing and annotated. The genome contains a circular chromosome of 4.026 Mb and four small plasmids. Based on their respective role in virulence in E. amylovora or related organisms, we identified several putative virulence factors, including type III and type VI secretion systems and their effectors, flagellar genes, sorbitol metabolism, iron uptake determinants, and quorum-sensing components. A deletion in the rpoS gene covering the most conserved region of the protein was identified which may contribute to the difference in virulence/host-range compared to E. amylovora. Comparative genomics with the pome fruit epiphyte Erwinia tasmaniensis Et1/99 showed that both species are overall highly similar, although specific differences were identified, for example the presence of some phage gene-containing regions and a high number of putative genomic islands containing transposases in the E. pyrifoliae DSM 12163T genome. Conclusions The E. pyrifoliae genome is an important addition to the published genome of E. tasmaniensis and the unfinished genome of E. amylovora providing a foundation for re-sequencing additional strains that may shed light on the evolution of the host-range and virulence/pathogenicity of this important group of plant-associated bacteria. PMID:20047678

  3. Proteomics and comparative genomics of Nitrososphaera viennensis reveal the core genome and adaptations of archaeal ammonia oxidizers

    PubMed Central

    Kerou, Melina; Offre, Pierre; Valledor, Luis; Abby, Sophie S.; Melcher, Michael; Nagler, Matthias; Weckwerth, Wolfram; Schleper, Christa

    2016-01-01

    Ammonia-oxidizing archaea (AOA) are among the most abundant microorganisms and key players in the global nitrogen and carbon cycles. They share a common energy metabolism but represent a heterogeneous group with respect to their environmental distribution and adaptions, growth requirements, and genome contents. We report here the genome and proteome of Nitrososphaera viennensis EN76, the type species of the archaeal class Nitrososphaeria of the phylum Thaumarchaeota encompassing all known AOA. N. viennensis is a soil organism with a 2.52-Mb genome and 3,123 predicted protein-coding genes. Proteomic analysis revealed that nearly 50% of the predicted genes were translated under standard laboratory growth conditions. Comparison with genomes of closely related species of the predominantly terrestrial Nitrososphaerales as well as the more streamlined marine Nitrosopumilales [Candidatus (Ca.) order] and the acidophile “Ca. Nitrosotalea devanaterra” revealed a core genome of AOA comprising 860 genes, which allowed for the reconstruction of central metabolic pathways common to all known AOA and expressed in the N. viennensis and “Ca. Nitrosopelagicus brevis” proteomes. Concomitantly, we were able to identify candidate proteins for as yet unidentified crucial steps in central metabolisms. In addition to unraveling aspects of core AOA metabolism, we identified specific metabolic innovations associated with the Nitrososphaerales mediating growth and survival in the soil milieu, including the capacity for biofilm formation, cell surface modifications and cell adhesion, and carbohydrate conversions as well as detoxification of aromatic compounds and drugs. PMID:27864514

  4. Sequencing and comparative analyses of the genomes of zoysiagrasses

    PubMed Central

    Tanaka, Hidenori; Hirakawa, Hideki; Kosugi, Shunichi; Nakayama, Shinobu; Ono, Akiko; Watanabe, Akiko; Hashiguchi, Masatsugu; Gondo, Takahiro; Ishigaki, Genki; Muguerza, Melody; Shimizu, Katsuya; Sawamura, Noriko; Inoue, Takayasu; Shigeki, Yuichi; Ohno, Naoki; Tabata, Satoshi; Akashi, Ryo; Sato, Shusei

    2016-01-01

    Zoysia is a warm-season turfgrass, which comprises 11 allotetraploid species (2n = 4x = 40), each possessing different morphological and physiological traits. To characterize the genetic systems of Zoysia plants and to analyse their structural and functional differences in individual species and accessions, we sequenced the genomes of Zoysia species using HiSeq and MiSeq platforms. As a reference sequence of Zoysia species, we generated a high-quality draft sequence of the genome of Z. japonica accession ‘Nagirizaki’ (334 Mb) in which 59,271 protein-coding genes were predicted. In parallel, draft genome sequences of Z. matrella ‘Wakaba’ and Z. pacifica ‘Zanpa’ were also generated for comparative analyses. To investigate the genetic diversity among the Zoysia species, genome sequence reads of three additional accessions, Z. japonica ‘Kyoto’, Z. japonica ‘Miyagi’ and Z. matrella ‘Chiba Fair Green’, were accumulated, and aligned against the reference genome of ‘Nagirizaki’ along with those from ‘Wakaba’ and ‘Zanpa’. As a result, we detected 7,424,163 single-nucleotide polymorphisms and 852,488 short indels among these species. The information obtained in this study will be valuable for basic studies on zoysiagrass evolution and genetics as well as for the breeding of zoysiagrasses, and is made available in the ‘Zoysia Genome Database’ at http://zoysia.kazusa.or.jp. PMID:26975196

  5. Sequencing and comparative analyses of the genomes of zoysiagrasses.

    PubMed

    Tanaka, Hidenori; Hirakawa, Hideki; Kosugi, Shunichi; Nakayama, Shinobu; Ono, Akiko; Watanabe, Akiko; Hashiguchi, Masatsugu; Gondo, Takahiro; Ishigaki, Genki; Muguerza, Melody; Shimizu, Katsuya; Sawamura, Noriko; Inoue, Takayasu; Shigeki, Yuichi; Ohno, Naoki; Tabata, Satoshi; Akashi, Ryo; Sato, Shusei

    2016-04-01

    Zoysiais a warm-season turfgrass, which comprises 11 allotetraploid species (2n= 4x= 40), each possessing different morphological and physiological traits. To characterize the genetic systems of Zoysia plants and to analyse their structural and functional differences in individual species and accessions, we sequenced the genomes of Zoysia species using HiSeq and MiSeq platforms. As a reference sequence of Zoysia species, we generated a high-quality draft sequence of the genome of Z. japonica accession 'Nagirizaki' (334 Mb) in which 59,271 protein-coding genes were predicted. In parallel, draft genome sequences of Z. matrella 'Wakaba' and Z. pacifica 'Zanpa' were also generated for comparative analyses. To investigate the genetic diversity among the Zoysia species, genome sequence reads of three additional accessions, Z. japonica'Kyoto', Z. japonica'Miyagi' and Z. matrella'Chiba Fair Green', were accumulated, and aligned against the reference genome of 'Nagirizaki' along with those from 'Wakaba' and 'Zanpa'. As a result, we detected 7,424,163 single-nucleotide polymorphisms and 852,488 short indels among these species. The information obtained in this study will be valuable for basic studies on zoysiagrass evolution and genetics as well as for the breeding of zoysiagrasses, and is made available in the 'Zoysia Genome Database' at http://zoysia.kazusa.or.jp. © The Author 2016. Published by Oxford University Press on behalf of Kazusa DNA Research Institute.

  6. Comparative Genomics of Flatworms (Platyhelminthes) Reveals Shared Genomic Features of Ecto- and Endoparastic Neodermata

    PubMed Central

    Hahn, Christoph; Fromm, Bastian; Bachmann, Lutz

    2014-01-01

    The ectoparasitic Monogenea comprise a major part of the obligate parasitic flatworm diversity. Although genomic adaptations to parasitism have been studied in the endoparasitic tapeworms (Cestoda) and flukes (Trematoda), no representative of the Monogenea has been investigated yet. We present the high-quality draft genome of Gyrodactylus salaris, an economically important monogenean ectoparasite of wild Atlantic salmon (Salmo salar). A total of 15,488 gene models were identified, of which 7,102 were functionally annotated. The controversial phylogenetic relationships within the obligate parasitic Neodermata were resolved in a phylogenomic analysis using 1,719 gene models (alignment length of >500,000 amino acids) for a set of 16 metazoan taxa. The Monogenea were found basal to the Cestoda and Trematoda, which implies ectoparasitism being plesiomorphic within the Neodermata and strongly supports a common origin of complex life cycles. Comparative analysis of seven parasitic flatworm genomes identified shared genomic features for the ecto- and endoparasitic lineages, such as a substantial reduction of the core bilaterian gene complement, including the homeodomain-containing genes, and a loss of the piwi and vasa genes, which are considered essential for animal development. Furthermore, the shared loss of functional fatty acid biosynthesis pathways and the absence of peroxisomes, the latter organelles presumed ubiquitous in eukaryotes except for parasitic protozoans, were inferred. The draft genome of G. salaris opens for future in-depth analyses of pathogenicity and host specificity of poorly characterized G. salaris strains, and will enhance studies addressing the genomics of host–parasite interactions and speciation in the highly diverse monogenean flatworms. PMID:24732282

  7. Genome-wide identification, characterization, and expression profile of aquaporin gene family in flax (Linum usitatissimum)

    PubMed Central

    Shivaraj, S. M.; Deshmukh, Rupesh K.; Rai, Rhitu; Bélanger, Richard; Agrawal, Pawan K.; Dash, Prasanta K.

    2017-01-01

    Membrane intrinsic proteins (MIPs) form transmembrane channels and facilitate transport of myriad substrates across the cell membrane in many organisms. Majority of plant MIPs have water transporting ability and are commonly referred as aquaporins (AQPs). In the present study, we identified aquaporin coding genes in flax by genome-wide analysis, their structure, function and expression pattern by pan-genome exploration. Cross-genera phylogenetic analysis with known aquaporins from rice, arabidopsis, and poplar showed five subgroups of flax aquaporins representing 16 plasma membrane intrinsic proteins (PIPs), 17 tonoplast intrinsic proteins (TIPs), 13 NOD26-like intrinsic proteins (NIPs), 2 small basic intrinsic proteins (SIPs), and 3 uncharacterized intrinsic proteins (XIPs). Amongst aquaporins, PIPs contained hydrophilic aromatic arginine (ar/R) selective filter but TIP, NIP, SIP and XIP subfamilies mostly contained hydrophobic ar/R selective filter. Analysis of RNA-seq and microarray data revealed high expression of PIPs in multiple tissues, low expression of NIPs, and seed specific expression of TIP3 in flax. Exploration of aquaporin homologs in three closely related Linum species bienne, grandiflorum and leonii revealed presence of 49, 39 and 19 AQPs, respectively. The genome-wide identification of aquaporins, first in flax, provides insight to elucidate their physiological and developmental roles in flax. PMID:28447607

  8. Genome-wide identification, characterization, and expression profile of aquaporin gene family in flax (Linum usitatissimum).

    PubMed

    Shivaraj, S M; Deshmukh, Rupesh K; Rai, Rhitu; Bélanger, Richard; Agrawal, Pawan K; Dash, Prasanta K

    2017-04-27

    Membrane intrinsic proteins (MIPs) form transmembrane channels and facilitate transport of myriad substrates across the cell membrane in many organisms. Majority of plant MIPs have water transporting ability and are commonly referred as aquaporins (AQPs). In the present study, we identified aquaporin coding genes in flax by genome-wide analysis, their structure, function and expression pattern by pan-genome exploration. Cross-genera phylogenetic analysis with known aquaporins from rice, arabidopsis, and poplar showed five subgroups of flax aquaporins representing 16 plasma membrane intrinsic proteins (PIPs), 17 tonoplast intrinsic proteins (TIPs), 13 NOD26-like intrinsic proteins (NIPs), 2 small basic intrinsic proteins (SIPs), and 3 uncharacterized intrinsic proteins (XIPs). Amongst aquaporins, PIPs contained hydrophilic aromatic arginine (ar/R) selective filter but TIP, NIP, SIP and XIP subfamilies mostly contained hydrophobic ar/R selective filter. Analysis of RNA-seq and microarray data revealed high expression of PIPs in multiple tissues, low expression of NIPs, and seed specific expression of TIP3 in flax. Exploration of aquaporin homologs in three closely related Linum species bienne, grandiflorum and leonii revealed presence of 49, 39 and 19 AQPs, respectively. The genome-wide identification of aquaporins, first in flax, provides insight to elucidate their physiological and developmental roles in flax.

  9. YersiniaBase: a genomic resource and analysis platform for comparative analysis of Yersinia.

    PubMed

    Tan, Shi Yang; Dutta, Avirup; Jakubovics, Nicholas S; Ang, Mia Yang; Siow, Cheuk Chuen; Mutha, Naresh Vr; Heydari, Hamed; Wee, Wei Yee; Wong, Guat Jah; Choo, Siew Woh

    2015-01-16

    Yersinia is a Gram-negative bacteria that includes serious pathogens such as the Yersinia pestis, which causes plague, Yersinia pseudotuberculosis, Yersinia enterocolitica. The remaining species are generally considered non-pathogenic to humans, although there is evidence that at least some of these species can cause occasional infections using distinct mechanisms from the more pathogenic species. With the advances in sequencing technologies, many genomes of Yersinia have been sequenced. However, there is currently no specialized platform to hold the rapidly-growing Yersinia genomic data and to provide analysis tools particularly for comparative analyses, which are required to provide improved insights into their biology, evolution and pathogenicity. To facilitate the ongoing and future research of Yersinia, especially those generally considered non-pathogenic species, a well-defined repository and analysis platform is needed to hold the Yersinia genomic data and analysis tools for the Yersinia research community. Hence, we have developed the YersiniaBase, a robust and user-friendly Yersinia resource and analysis platform for the analysis of Yersinia genomic data. YersiniaBase has a total of twelve species and 232 genome sequences, of which the majority are Yersinia pestis. In order to smooth the process of searching genomic data in a large database, we implemented an Asynchronous JavaScript and XML (AJAX)-based real-time searching system in YersiniaBase. Besides incorporating existing tools, which include JavaScript-based genome browser (JBrowse) and Basic Local Alignment Search Tool (BLAST), YersiniaBase also has in-house developed tools: (1) Pairwise Genome Comparison tool (PGC) for comparing two user-selected genomes; (2) Pathogenomics Profiling Tool (PathoProT) for comparative pathogenomics analysis of Yersinia genomes; (3) YersiniaTree for constructing phylogenetic tree of Yersinia. We ran analyses based on the tools and genomic data in YersiniaBase and the

  10. Non-Gaussian Distributions Affect Identification of Expression Patterns, Functional Annotation, and Prospective Classification in Human Cancer Genomes

    PubMed Central

    Marko, Nicholas F.; Weil, Robert J.

    2012-01-01

    Introduction Gene expression data is often assumed to be normally-distributed, but this assumption has not been tested rigorously. We investigate the distribution of expression data in human cancer genomes and study the implications of deviations from the normal distribution for translational molecular oncology research. Methods We conducted a central moments analysis of five cancer genomes and performed empiric distribution fitting to examine the true distribution of expression data both on the complete-experiment and on the individual-gene levels. We used a variety of parametric and nonparametric methods to test the effects of deviations from normality on gene calling, functional annotation, and prospective molecular classification using a sixth cancer genome. Results Central moments analyses reveal statistically-significant deviations from normality in all of the analyzed cancer genomes. We observe as much as 37% variability in gene calling, 39% variability in functional annotation, and 30% variability in prospective, molecular tumor subclassification associated with this effect. Conclusions Cancer gene expression profiles are not normally-distributed, either on the complete-experiment or on the individual-gene level. Instead, they exhibit complex, heavy-tailed distributions characterized by statistically-significant skewness and kurtosis. The non-Gaussian distribution of this data affects identification of differentially-expressed genes, functional annotation, and prospective molecular classification. These effects may be reduced in some circumstances, although not completely eliminated, by using nonparametric analytics. This analysis highlights two unreliable assumptions of translational cancer gene expression analysis: that “small” departures from normality in the expression data distributions are analytically-insignificant and that “robust” gene-calling algorithms can fully compensate for these effects. PMID:23118863

  11. Genome-wide comparative analysis of DNA methylation between soybean cytoplasmic male-sterile line NJCMS5A and its maintainer NJCMS5B.

    PubMed

    Li, Yanwei; Ding, Xianlong; Wang, Xuan; He, Tingting; Zhang, Hao; Yang, Longshu; Wang, Tanliu; Chen, Linfeng; Gai, Junyi; Yang, Shouping

    2017-08-10

    DNA methylation is an important epigenetic modification. It can regulate the expression of many key genes without changing the primary structure of the genomic DNA, and plays a vital role in the growth and development of the organism. The genome-wide DNA methylation profile of the cytoplasmic male sterile (CMS) line in soybean has not been reported so far. In this study, genome-wide comparative analysis of DNA methylation between soybean CMS line NJCMS5A and its maintainer NJCMS5B was conducted by whole-genome bisulfite sequencing. The results showed 3527 differentially methylated regions (DMRs) and 485 differentially methylated genes (DMGs), including 353 high-credible methylated genes, 56 methylated genes coding unknown protein and 76 novel methylated genes with no known function were identified. Among them, 25 DMRs were further validated that the genome-wide DNA methylation data were reliable through bisulfite treatment, and 9 DMRs were confirmed the relationship between DNA methylation and gene expression by qRT-PCR. Finally, 8 key DMGs possibly associated with soybean CMS were identified. Genome-wide DNA methylation profile of the soybean CMS line NJCMS5A and its maintainer NJCMS5B was obtained for the first time. Several specific DMGs which participated in pollen and flower development were further identified to be probably associated with soybean CMS. This study will contribute to further understanding of the molecular mechanism behind soybean CMS.

  12. Genomic organization, evolution, and expression of photoprotein and opsin genes in Mnemiopsis leidyi: a new view of ctenophore photocytes.

    PubMed

    Schnitzler, Christine E; Pang, Kevin; Powers, Meghan L; Reitzel, Adam M; Ryan, Joseph F; Simmons, David; Tada, Takashi; Park, Morgan; Gupta, Jyoti; Brooks, Shelise Y; Blakesley, Robert W; Yokoyama, Shozo; Haddock, Steven Hd; Martindale, Mark Q; Baxevanis, Andreas D

    2012-12-21

    Calcium-activated photoproteins are luciferase variants found in photocyte cells of bioluminescent jellyfish (Phylum Cnidaria) and comb jellies (Phylum Ctenophora). The complete genomic sequence from the ctenophore Mnemiopsis leidyi, a representative of the earliest branch of animals that emit light, provided an opportunity to examine the genome of an organism that uses this class of luciferase for bioluminescence and to look for genes involved in light reception. To determine when photoprotein genes first arose, we examined the genomic sequence from other early-branching taxa. We combined our genomic survey with gene trees, developmental expression patterns, and functional protein assays of photoproteins and opsins to provide a comprehensive view of light production and light reception in Mnemiopsis. The Mnemiopsis genome has 10 full-length photoprotein genes situated within two genomic clusters with high sequence conservation that are maintained due to strong purifying selection and concerted evolution. Photoprotein-like genes were also identified in the genomes of the non-luminescent sponge Amphimedon queenslandica and the non-luminescent cnidarian Nematostella vectensis, and phylogenomic analysis demonstrated that photoprotein genes arose at the base of all animals. Photoprotein gene expression in Mnemiopsis embryos begins during gastrulation in migrating precursors to photocytes and persists throughout development in the canals where photocytes reside. We identified three putative opsin genes in the Mnemiopsis genome and show that they do not group with well-known bilaterian opsin subfamilies. Interestingly, photoprotein transcripts are co-expressed with two of the putative opsins in developing photocytes. Opsin expression is also seen in the apical sensory organ. We present evidence that one opsin functions as a photopigment in vitro, absorbing light at wavelengths that overlap with peak photoprotein light emission, raising the hypothesis that light

  13. Genomic organization, evolution, and expression of photoprotein and opsin genes in Mnemiopsis leidyi: a new view of ctenophore photocytes

    PubMed Central

    2012-01-01

    Background Calcium-activated photoproteins are luciferase variants found in photocyte cells of bioluminescent jellyfish (Phylum Cnidaria) and comb jellies (Phylum Ctenophora). The complete genomic sequence from the ctenophore Mnemiopsis leidyi, a representative of the earliest branch of animals that emit light, provided an opportunity to examine the genome of an organism that uses this class of luciferase for bioluminescence and to look for genes involved in light reception. To determine when photoprotein genes first arose, we examined the genomic sequence from other early-branching taxa. We combined our genomic survey with gene trees, developmental expression patterns, and functional protein assays of photoproteins and opsins to provide a comprehensive view of light production and light reception in Mnemiopsis. Results The Mnemiopsis genome has 10 full-length photoprotein genes situated within two genomic clusters with high sequence conservation that are maintained due to strong purifying selection and concerted evolution. Photoprotein-like genes were also identified in the genomes of the non-luminescent sponge Amphimedon queenslandica and the non-luminescent cnidarian Nematostella vectensis, and phylogenomic analysis demonstrated that photoprotein genes arose at the base of all animals. Photoprotein gene expression in Mnemiopsis embryos begins during gastrulation in migrating precursors to photocytes and persists throughout development in the canals where photocytes reside. We identified three putative opsin genes in the Mnemiopsis genome and show that they do not group with well-known bilaterian opsin subfamilies. Interestingly, photoprotein transcripts are co-expressed with two of the putative opsins in developing photocytes. Opsin expression is also seen in the apical sensory organ. We present evidence that one opsin functions as a photopigment in vitro, absorbing light at wavelengths that overlap with peak photoprotein light emission, raising the hypothesis

  14. Comparative genomics of closely related Salmonella enterica serovar Typhi strains reveals genome dynamics and the acquisition of novel pathogenic elements.

    PubMed

    Yap, Kien-Pong; Gan, Han Ming; Teh, Cindy Shuan Ju; Chai, Lay Ching; Thong, Kwai Lin

    2014-11-20

    Typhoid fever is an infectious disease of global importance that is caused by Salmonella enterica subsp. enterica serovar Typhi (S. Typhi). This disease causes an estimated 200,000 deaths per year and remains a serious global health threat. S. Typhi is strictly a human pathogen, and some recovered individuals become long-term carriers who continue to shed the bacteria in their faeces, thus becoming main reservoirs of infection. A comparative genomics analysis combined with a phylogenomic analysis revealed that the strains from the outbreak and carrier were closely related with microvariations and possibly derived from a common ancestor. Additionally, the comparative genomics analysis with all of the other completely sequenced S. Typhi genomes revealed that strains BL196 and CR0044 exhibit unusual genomic variations despite S. Typhi being generally regarded as highly clonal. The two genomes shared distinct chromosomal architectures and uncommon genome features; notably, the presence of a ~10 kb novel genomic island containing uncharacterised virulence-related genes, and zot in particular. Variations were also detected in the T6SS system and genes that were related to SPI-10, insertion sequences, CRISPRs and nsSNPs among the studied genomes. Interestingly, the carrier strain CR0044 harboured far more genetic polymorphisms (83% mutant nsSNPs) compared with the closely related BL196 outbreak strain. Notably, the two highly related virulence-determinant genes, rpoS and tviE, were mutated in strains BL196 and CR0044, respectively, which revealed that the mutation in rpoS is stabilising, while that in tviE is destabilising. These microvariations provide novel insight into the optimisation of genes by the pathogens. However, the sporadic strain was found to be far more conserved compared with the others. The uncommon genomic variations in the two closely related BL196 and CR0044 strains suggests that S. Typhi is more diverse than previously thought. Our study has

  15. Characterization of Deletions of the HBA and HBB Loci by Array Comparative Genomic Hybridization

    PubMed Central

    Sabath, Daniel E.; Bender, Michael A.; Sankaran, Vijay G.; Vamos, Esther; Kentsis, Alex; Yi, Hye-Son; Greisman, Harvey A.

    2017-01-01

    Thalassemia is among the most common genetic diseases worldwide. α-Thalassemia is usually caused by deletion of one or more of the duplicated HBA genes on chromosome 16. In contrast, most β-thalassemia results from point mutations that decrease or eliminate expression of the HBB gene on chromosome 11. Deletions within the HBB locus result in thalassemia or hereditary persistence of fetal Hb. Although routine diagnostic testing cannot distinguish thalassemia deletions from point mutations, deletional hereditary persistence of fetal Hb is notable for having an elevated HbF level with a normal mean corpuscular volume. A small number of deletions accounts for most α-thalassemias; in contrast, there are no predominant HBB deletions causing β-thalassemia. To facilitate the identification and characterization of deletions of the HBA and HBB globin loci, we performed array-based comparative genomic hybridization using a custom oligonucleotide microarray. We accurately mapped the breakpoints of known and previously uncharacterized HBB deletions defining previously uncharacterized deletion breakpoints by PCR amplification and sequencing. The array also successfully identified the common HBA deletions --SEA and --FIL. In summary, comparative genomic hybridization can be used to characterize deletions of the HBA and HBB loci, allowing high-resolution characterization of novel deletions that are not readily detected by PCR-based methods. PMID:26612711

  16. Use of whole genome expression analysis in the toxicity screening of nanoparticles

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Fröhlich, Eleonore, E-mail: eleonore.froehlich@medunigraz.at; Meindl, Claudia; Wagner, Karin

    2014-10-15

    The use of nanoparticles (NPs) offers exciting new options in technical and medical applications provided they do not cause adverse cellular effects. Cellular effects of NPs depend on particle parameters and exposure conditions. In this study, whole genome expression arrays were employed to identify the influence of particle size, cytotoxicity, protein coating, and surface functionalization of polystyrene particles as model particles and for short carbon nanotubes (CNTs) as particles with potential interest in medical treatment. Another aim of the study was to find out whether screening by microarray would identify other or additional targets than commonly used cell-based assays formore » NP action. Whole genome expression analysis and assays for cell viability, interleukin secretion, oxidative stress, and apoptosis were employed. Similar to conventional assays, microarray data identified inflammation, oxidative stress, and apoptosis as affected by NP treatment. Application of lower particle doses and presence of protein decreased the total number of regulated genes but did not markedly influence the top regulated genes. Cellular effects of CNTs were small; only carboxyl-functionalized single-walled CNTs caused appreciable regulation of genes. It can be concluded that regulated functions correlated well with results in cell-based assays. Presence of protein mitigated cytotoxicity but did not cause a different pattern of regulated processes. - Highlights: • Regulated functions were screened using whole genome expression assays. • Polystyrene particles regulated more genes than short carbon nanotubes. • Protein coating of polystyrene particles did not change regulation pattern. • Functions regulated by microarray were confirmed by cell-based assay.« less

  17. Complete genome sequences and comparative genome analysis of Lactobacillus plantarum strain 5-2 isolated from fermented soybean.

    PubMed

    Liu, Chen-Jian; Wang, Rui; Gong, Fu-Ming; Liu, Xiao-Feng; Zheng, Hua-Jun; Luo, Yi-Yong; Li, Xiao-Ran

    2015-12-01

    Lactobacillus plantarum is an important probiotic and is mostly isolated from fermented foods. We sequenced the genome of L. plantarum strain 5-2, which was derived from fermented soybean isolated from Yunnan province, China. The strain was determined to contain 3114 genes. Fourteen complete insertion sequence (IS) elements were found in 5-2 chromosome. There were 24 DNA replication proteins and 76 DNA repair proteins in the 5-2 genome. Consistent with the classification of L. plantarum as a facultative heterofermentative lactobacillus, the 5-2 genome encodes key enzymes required for the EMP (Embden-Meyerhof-Parnas) and phosphoketolase (PK) pathways. Several components of the secretion machinery are found in the 5-2 genome, which was compared with L. plantarum ST-III, JDM1 and WCFS1. Most of the specific proteins in the four genomes appeared to be related to their prophage elements. Copyright © 2015 Elsevier Inc. All rights reserved.

  18. Genome-Wide Tuning of Protein Expression Levels to Rapidly Engineer Microbial Traits.

    PubMed

    Freed, Emily F; Winkler, James D; Weiss, Sophie J; Garst, Andrew D; Mutalik, Vivek K; Arkin, Adam P; Knight, Rob; Gill, Ryan T

    2015-11-20

    The reliable engineering of biological systems requires quantitative mapping of predictable and context-independent expression over a broad range of protein expression levels. However, current techniques for modifying expression levels are cumbersome and are not amenable to high-throughput approaches. Here we present major improvements to current techniques through the design and construction of E. coli genome-wide libraries using synthetic DNA cassettes that can tune expression over a ∼10(4) range. The cassettes also contain molecular barcodes that are optimized for next-generation sequencing, enabling rapid and quantitative tracking of alleles that have the highest fitness advantage. We show these libraries can be used to determine which genes and expression levels confer greater fitness to E. coli under different growth conditions.

  19. [Sotos syndrome diagnosed by comparative genomic hybridisation].

    PubMed

    Saldarriaga, Wilmar; Molina-Barrera, Laura Camila; Ramírez-Cheyne, Julián

    2016-01-01

    Sotos Syndrome (SS) is a genetic disease with an autosomal dominant pattern caused by haplo-insufficiency of NSD1 gene secondary to point mutations or microdeletion of the 5q35 locus where the gene is located. It is a rare syndrome, occurring in 7 out of every 100,000 births. The objective of this report is to present the case of a 4 year-old patient with a global developmental delay, as well as specific physical findings suggesting a syndrome of genetic origin. Female patient, 4 years of age, thinning hair, triangular facie, long palpebral fissure, arched palate, prominent jaw, winged scapula and clinodactilia of the fifth finger both hands. The molecular test comparative genomic hybridisation test by microarray was subsequently performed, with the result showing 5q35.2 q35.3 region microdeletion of 2,082 MB, including the NSD1 gene. Finally, this article also proposes the performing of comparative genomic hybridisation as the first diagnostic option in cases where clinical findings are suggestive of SS. Copyright © 2015 Sociedad Chilena de Pediatría. Publicado por Elsevier España, S.L.U. All rights reserved.

  20. Predicting Gene Expression Level from Relative Codon Usage Bias: An Application to Escherichia coli Genome

    PubMed Central

    Roymondal, Uttam; Das, Shibsankar; Sahoo, Satyabrata

    2009-01-01

    We present an expression measure of a gene, devised to predict the level of gene expression from relative codon bias (RCB). There are a number of measures currently in use that quantify codon usage in genes. Based on the hypothesis that gene expressivity and codon composition is strongly correlated, RCB has been defined to provide an intuitively meaningful measure of an extent of the codon preference in a gene. We outline a simple approach to assess the strength of RCB (RCBS) in genes as a guide to their likely expression levels and illustrate this with an analysis of Escherichia coli (E. coli) genome. Our efforts to quantitatively predict gene expression levels in E. coli met with a high level of success. Surprisingly, we observe a strong correlation between RCBS and protein length indicating natural selection in favour of the shorter genes to be expressed at higher level. The agreement of our result with high protein abundances, microarray data and radioactive data demonstrates that the genomic expression profile available in our method can be applied in a meaningful way to the study of cell physiology and also for more detailed studies of particular genes of interest. PMID:19131380

  1. Comparative genomic analysis of bacteriophages specific to the channel catfish pathogen Edwardsiella ictaluri

    PubMed Central

    2011-01-01

    Background The bacterial pathogen Edwardsiella ictaluri is a primary cause of mortality in channel catfish raised commercially in aquaculture farms. Additional treatment and diagnostic regimes are needed for this enteric pathogen, motivating the discovery and characterization of bacteriophages specific to E. ictaluri. Results The genomes of three Edwardsiella ictaluri-specific bacteriophages isolated from geographically distant aquaculture ponds, at different times, were sequenced and analyzed. The genomes for phages eiAU, eiDWF, and eiMSLS are 42.80 kbp, 42.12 kbp, and 42.69 kbp, respectively, and are greater than 95% identical to each other at the nucleotide level. Nucleotide differences were mostly observed in non-coding regions and in structural proteins, with significant variability in the sequences of putative tail fiber proteins. The genome organization of these phages exhibit a pattern shared by other Siphoviridae. Conclusions These E. ictaluri-specific phage genomes reveal considerable conservation of genomic architecture and sequence identity, even with considerable temporal and spatial divergence in their isolation. Their genomic homogeneity is similarly observed among E. ictaluri bacterial isolates. The genomic analysis of these phages supports the conclusion that these are virulent phages, lacking the capacity for lysogeny or expression of virulence genes. This study contributes to our knowledge of phage genomic diversity and facilitates studies on the diagnostic and therapeutic applications of these phages. PMID:21214923

  2. Comparing genomes with rearrangements and segmental duplications.

    PubMed

    Shao, Mingfu; Moret, Bernard M E

    2015-06-15

    Large-scale evolutionary events such as genomic rearrange.ments and segmental duplications form an important part of the evolution of genomes and are widely studied from both biological and computational perspectives. A basic computational problem is to infer these events in the evolutionary history for given modern genomes, a task for which many algorithms have been proposed under various constraints. Algorithms that can handle both rearrangements and content-modifying events such as duplications and losses remain few and limited in their applicability. We study the comparison of two genomes under a model including general rearrangements (through double-cut-and-join) and segmental duplications. We formulate the comparison as an optimization problem and describe an exact algorithm to solve it by using an integer linear program. We also devise a sufficient condition and an efficient algorithm to identify optimal substructures, which can simplify the problem while preserving optimality. Using the optimal substructures with the integer linear program (ILP) formulation yields a practical and exact algorithm to solve the problem. We then apply our algorithm to assign in-paralogs and orthologs (a necessary step in handling duplications) and compare its performance with that of the state-of-the-art method MSOAR, using both simulations and real data. On simulated datasets, our method outperforms MSOAR by a significant margin, and on five well-annotated species, MSOAR achieves high accuracy, yet our method performs slightly better on each of the 10 pairwise comparisons. http://lcbb.epfl.ch/softwares/coser. © The Author 2015. Published by Oxford University Press.

  3. Large clusters of co-expressed genes in the Drosophila genome.

    PubMed

    Boutanaev, Alexander M; Kalmykova, Alla I; Shevelyov, Yuri Y; Nurminsky, Dmitry I

    2002-12-12

    Clustering of co-expressed, non-homologous genes on chromosomes implies their co-regulation. In lower eukaryotes, co-expressed genes are often found in pairs. Clustering of genes that share aspects of transcriptional regulation has also been reported in higher eukaryotes. To advance our understanding of the mode of coordinated gene regulation in multicellular organisms, we performed a genome-wide analysis of the chromosomal distribution of co-expressed genes in Drosophila. We identified a total of 1,661 testes-specific genes, one-third of which are clustered on chromosomes. The number of clusters of three or more genes is much higher than expected by chance. We observed a similar trend for genes upregulated in the embryo and in the adult head, although the expression pattern of individual genes cannot be predicted on the basis of chromosomal position alone. Our data suggest that the prevalent mechanism of transcriptional co-regulation in higher eukaryotes operates with extensive chromatin domains that comprise multiple genes.

  4. Broad genomic and transcriptional analysis reveals a highly derived genome in dinoflagellate mitochondria

    PubMed Central

    Jackson, Christopher J; Norman, John E; Schnare, Murray N; Gray, Michael W; Keeling, Patrick J; Waller, Ross F

    2007-01-01

    Background Dinoflagellates comprise an ecologically significant and diverse eukaryotic phylum that is sister to the phylum containing apicomplexan endoparasites. The mitochondrial genome of apicomplexans is uniquely reduced in gene content and size, encoding only three proteins and two ribosomal RNAs (rRNAs) within a highly compacted 6 kb DNA. Dinoflagellate mitochondrial genomes have been comparatively poorly studied: limited available data suggest some similarities with apicomplexan mitochondrial genomes but an even more radical type of genomic organization. Here, we investigate structure, content and expression of dinoflagellate mitochondrial genomes. Results From two dinoflagellates, Crypthecodinium cohnii and Karlodinium micrum, we generated over 42 kb of mitochondrial genomic data that indicate a reduced gene content paralleling that of mitochondrial genomes in apicomplexans, i.e., only three protein-encoding genes and at least eight conserved components of the highly fragmented large and small subunit rRNAs. Unlike in apicomplexans, dinoflagellate mitochondrial genes occur in multiple copies, often as gene fragments, and in numerous genomic contexts. Analysis of cDNAs suggests several novel aspects of dinoflagellate mitochondrial gene expression. Polycistronic transcripts were found, standard start codons are absent, and oligoadenylation occurs upstream of stop codons, resulting in the absence of termination codons. Transcripts of at least one gene, cox3, are apparently trans-spliced to generate full-length mRNAs. RNA substitutional editing, a process previously identified for mRNAs in dinoflagellate mitochondria, is also implicated in rRNA expression. Conclusion The dinoflagellate mitochondrial genome shares the same gene complement and fragmentation of rRNA genes with its apicomplexan counterpart. However, it also exhibits several unique characteristics. Most notable are the expansion of gene copy numbers and their arrangements within the genome, RNA

  5. Genome-wide Determinants of Proviral Targeting, Clonal Abundance and Expression in Natural HTLV-1 Infection

    PubMed Central

    Melamed, Anat; Laydon, Daniel J.; Gillet, Nicolas A.; Tanaka, Yuetsu; Taylor, Graham P.; Bangham, Charles R. M.

    2013-01-01

    The regulation of proviral latency is a central problem in retrovirology. We postulate that the genomic integration site of human T lymphotropic virus type 1 (HTLV-1) determines the pattern of expression of the provirus, which in turn determines the abundance and pathogenic potential of infected T cell clones in vivo. We recently developed a high-throughput method for the genome-wide amplification, identification and quantification of proviral integration sites. Here, we used this protocol to test two hypotheses. First, that binding sites for transcription factors and chromatin remodelling factors in the genome flanking the proviral integration site of HTLV-1 are associated with integration targeting, spontaneous proviral expression, and in vivo clonal abundance. Second, that the transcriptional orientation of the HTLV-1 provirus relative to that of the nearest host gene determines spontaneous proviral expression and in vivo clonal abundance. Integration targeting was strongly associated with the presence of a binding site for specific host transcription factors, especially STAT1 and p53. The presence of the chromatin remodelling factors BRG1 and INI1 and certain host transcription factors either upstream or downstream of the provirus was associated respectively with silencing or spontaneous expression of the provirus. Cells expressing HTLV-1 Tax protein were significantly more frequent in clones of low abundance in vivo. We conclude that transcriptional interference and chromatin remodelling are critical determinants of proviral latency in natural HTLV-1 infection. PMID:23555266

  6. Draft Genomes, Phylogenetic Reconstruction, and Comparative Genomics of Two Novel Cohabiting Bacterial Symbionts Isolated from Frankliniella occidentalis

    PubMed Central

    Facey, Paul D.; Méric, Guillaume; Hitchings, Matthew D.; Pachebat, Justin A.; Hegarty, Matt J.; Chen, Xiaorui; Morgan, Laura V.A.; Hoeppner, James E.; Whitten, Miranda M.A.; Kirk, William D.J.; Dyson, Paul J.; Sheppard, Sam K.; Sol, Ricardo Del

    2015-01-01

    Obligate bacterial symbionts are widespread in many invertebrates, where they are often confined to specialized host cells and are transmitted directly from mother to progeny. Increasing numbers of these bacteria are being characterized but questions remain about their population structure and evolution. Here we take a comparative genomics approach to investigate two prominent bacterial symbionts (BFo1 and BFo2) isolated from geographically separated populations of western flower thrips, Frankliniella occidentalis. Our multifaceted approach to classifying these symbionts includes concatenated multilocus sequence analysis (MLSA) phylogenies, ribosomal multilocus sequence typing (rMLST), construction of whole-genome phylogenies, and in-depth genomic comparisons. We showed that the BFo1 genome clusters more closely to species in the genus Erwinia, and is a putative close relative to Erwinia aphidicola. BFo1 is also likely to have shared a common ancestor with Erwinia pyrifoliae/Erwinia amylovora and the nonpathogenic Erwinia tasmaniensis and genetic traits similar to Erwinia billingiae. The BFo1 genome contained virulence factors found in the genus Erwinia but represented a divergent lineage. In contrast, we showed that BFo2 belongs within the Enterobacteriales but does not group closely with any currently known bacterial species. Concatenated MLSA phylogenies indicate that it may have shared a common ancestor to the Erwinia and Pantoea genera, and based on the clustering of rMLST genes, it was most closely related to Pantoea ananatis but represented a divergent lineage. We reconstructed a core genome of a putative common ancestor of Erwinia and Pantoea and compared this with the genomes of BFo bacteria. BFo2 possessed none of the virulence determinants that were omnipresent in the Erwinia and Pantoea genera. Taken together, these data are consistent with BFo2 representing a highly novel species that maybe related to known Pantoea. PMID:26185096

  7. DNA microarrays of baculovirus genomes: differential expression of viral genes in two susceptible insect cell lines.

    PubMed

    Yamagishi, J; Isobe, R; Takebuchi, T; Bando, H

    2003-03-01

    We describe, for the first time, the generation of a viral DNA chip for simultaneous expression measurements of nearly all known open reading frames (ORFs) in the best-studied members of the family Baculoviridae, Autographa californica multiple nucleopolyhedrovirus (AcMNPV) and Bombyx mori nucleopolyhedrovirus (BmNPV). In this study, a viral DNA chip (Ac-BmNPV chip) was fabricated and used to characterize the viral gene expression profile for AcMNPV in different cell types. The viral chip is composed of microarrays of viral DNA prepared by robotic deposition of PCR-amplified viral DNA fragments on glass for ORFs in the NPV genome. Viral gene expression was monitored by hybridization to the DNA fragment microarrays with fluorescently labeled cDNAs prepared from infected Spodoptera frugiperda, Sf9 cells and Trichoplusia ni, TnHigh-Five cells, the latter a major producer of baculovirus and recombinant proteins. A comparison of expression profiles of known ORFs in AcMNPV elucidated six genes (ORF150, p10, pk2, and three late gene expression factor genes lef-3, p35 and lef- 6) the expression of each of which was regulated differently in the two cell lines. Most of these genes are known to be closely involved in the viral life cycle such as in DNA replication, late gene expression and the release of polyhedra from infected cells. These results imply that the differential expression of these viral genes accounts for the differences in viral replication between these two cell lines. Thus, these fabricated microarrays of NPV DNA which allow a rapid analysis of gene expression at the viral genome level should greatly speed the functional analysis of large genomes of NPV.

  8. Modulation of yeast genome expression in response to defective RNA polymerase III-dependent transcription.

    PubMed

    Conesa, Christine; Ruotolo, Roberta; Soularue, Pascal; Simms, Tiffany A; Donze, David; Sentenac, André; Dieci, Giorgio

    2005-10-01

    We used genome-wide expression analysis in Saccharomyces cerevisiae to explore whether and how the expression of protein-coding, RNA polymerase (Pol) II-transcribed genes is influenced by a decrease in RNA Pol III-dependent transcription. The Pol II transcriptome was characterized in four thermosensitive, slow-growth mutants affected in different components of the RNA Pol III transcription machinery. Unexpectedly, we found only a modest correlation between altered expression of Pol II-transcribed genes and their proximity to class III genes, a result also confirmed by the analysis of single tRNA gene deletants. Instead, the transcriptome of all of the four mutants was characterized by increased expression of genes known to be under the control of the Gcn4p transcriptional activator. Indeed, GCN4 was found to be translationally induced in the mutants, and deleting the GCN4 gene eliminated the response. The Gcn4p-dependent expression changes did not require the Gcn2 protein kinase and could be specifically counteracted by an increased gene dosage of initiator tRNA(Met). Initiator tRNA(Met) depletion thus triggers a GCN4-dependent reprogramming of genome expression in response to decreased Pol III transcription. Such an effect might represent a key element in the coordinated transcriptional response of yeast cells to environmental changes.

  9. Genome-Wide Identification and Transcriptome-Based Expression Profiling of the Sox Gene Family in the Nile Tilapia (Oreochromis niloticus)

    PubMed Central

    Wei, Ling; Yang, Chao; Tao, Wenjing; Wang, Deshou

    2016-01-01

    The Sox transcription factor family is characterized with the presence of a Sry-related high-mobility group (HMG) box and plays important roles in various biological processes in animals, including sex determination and differentiation, and the development of multiple organs. In this study, 27 Sox genes were identified in the genome of the Nile tilapia (Oreochromis niloticus), and were classified into seven groups. The members of each group of the tilapia Sox genes exhibited a relatively conserved exon-intron structure. Comparative analysis showed that the Sox gene family has undergone an expansion in tilapia and other teleost fishes following their whole genome duplication, and group K only exists in teleosts. Transcriptome-based analysis demonstrated that most of the tilapia Sox genes presented stage-specific and/or sex-dimorphic expressions during gonadal development, and six of the group B Sox genes were specifically expressed in the adult brain. Our results provide a better understanding of gene structure and spatio-temporal expression of the Sox gene family in tilapia, and will be useful for further deciphering the roles of the Sox genes during sex determination and gonadal development in teleosts. PMID:26907269

  10. Genome-Wide Identification and Transcriptome-Based Expression Profiling of the Sox Gene Family in the Nile Tilapia (Oreochromis niloticus).

    PubMed

    Wei, Ling; Yang, Chao; Tao, Wenjing; Wang, Deshou

    2016-02-23

    The Sox transcription factor family is characterized with the presence of a Sry-related high-mobility group (HMG) box and plays important roles in various biological processes in animals, including sex determination and differentiation, and the development of multiple organs. In this study, 27 Sox genes were identified in the genome of the Nile tilapia (Oreochromis niloticus), and were classified into seven groups. The members of each group of the tilapia Sox genes exhibited a relatively conserved exon-intron structure. Comparative analysis showed that the Sox gene family has undergone an expansion in tilapia and other teleost fishes following their whole genome duplication, and group K only exists in teleosts. Transcriptome-based analysis demonstrated that most of the tilapia Sox genes presented stage-specific and/or sex-dimorphic expressions during gonadal development, and six of the group B Sox genes were specifically expressed in the adult brain. Our results provide a better understanding of gene structure and spatio-temporal expression of the Sox gene family in tilapia, and will be useful for further deciphering the roles of the Sox genes during sex determination and gonadal development in teleosts.

  11. The genomes of many yam species contain transcriptionally active endogenous geminiviral sequences that may be functionally expressed

    PubMed Central

    Filloux, Denis; Murrell, Sasha; Koohapitagtam, Maneerat; Golden, Michael; Julian, Charlotte; Galzi, Serge; Uzest, Marilyne; Rodier-Goud, Marguerite; D’Hont, Angélique; Vernerey, Marie Stephanie; Wilkin, Paul; Peterschmitt, Michel; Winter, Stephan; Murrell, Ben; Martin, Darren P.; Roumagnac, Philippe

    2015-01-01

    Endogenous viral sequences are essentially ‘fossil records’ that can sometimes reveal the genomic features of long extinct virus species. Although numerous known instances exist of single-stranded DNA (ssDNA) genomes becoming stably integrated within the genomes of bacteria and animals, there remain very few examples of such integration events in plants. The best studied of these events are those which yielded the geminivirus-related DNA elements found within the nuclear genomes of various Nicotiana species. Although other ssDNA virus-like sequences are included within the draft genomes of various plant species, it is not entirely certain that these are not contaminants. The Nicotiana geminivirus-related DNA elements therefore remain the only definitively proven instances of endogenous plant ssDNA virus sequences. Here, we characterize two new classes of endogenous plant virus sequence that are also apparently derived from ancient geminiviruses in the genus Begomovirus. These two endogenous geminivirus-like elements (EGV1 and EGV2) are present in the Dioscorea spp. of the Enantiophyllum clade. We used fluorescence in situ hybridization to confirm that the EGV1 sequences are integrated in the D. alata genome and showed that one or two ancestral EGV sequences likely became integrated more than 1.4 million years ago during or before the diversification of the Asian and African Enantiophyllum Dioscorea spp. Unexpectedly, we found evidence of natural selection actively favouring the maintenance of EGV-expressed replication-associated protein (Rep) amino acid sequences, which clearly indicates that functional EGV Rep proteins were probably expressed for prolonged periods following endogenization. Further, the detection in D. alata of EGV gene transcripts, small 21–24 nt RNAs that are apparently derived from these transcripts, and expressed Rep proteins, provides evidence that some EGV genes are possibly still functionally expressed in at least some of the

  12. Comparative Genomics of Mycobacteria: Some Answers, Yet More New Questions

    PubMed Central

    Behr, Marcel A.

    2015-01-01

    Comparative genomic studies permit a genus-level perspective on the distinction between environmental mycobacteria and Mycobacterium tuberculosis, as well as a species-level assessment of genetic variability within M. tuberculosis. Both of these strata of evolutionary analysis serve to generate hypotheses regarding the genomic basis of M. tuberculosis virulence. In contrasting lessons from macroevolutionary study and microevolutionary study, one can form predictions about which segments of the genome are likely to be essential for or dispensable for the pathogenesis of tuberculosis. Although some of these predictions have been experimentally verified, notable exceptions challenge the direct link between these virulence factors and the capacity of M. tuberculosis to successfully cause disease and propagate between human hosts. These unexpected findings serve as the stimulus for further studies, using genomic comparisons and other approaches, to better define the remarkable success of this recalcitrant pathogen. PMID:25395374

  13. Controlling Citrate Synthase Expression by CRISPR/Cas9 Genome Editing for n-Butanol Production in Escherichia coli.

    PubMed

    Heo, Min-Ji; Jung, Hwi-Min; Um, Jaeyong; Lee, Sang-Woo; Oh, Min-Kyu

    2017-02-17

    Genome editing using CRISPR/Cas9 was successfully demonstrated in Esherichia coli to effectively produce n-butanol in a defined medium under microaerobic condition. The butanol synthetic pathway genes including those encoding oxygen-tolerant alcohol dehydrogenase were overexpressed in metabolically engineered E. coli, resulting in 0.82 g/L butanol production. To increase butanol production, carbon flux from acetyl-CoA to citric acid cycle should be redirected to acetoacetyl-CoA. For this purpose, the 5'-untranslated region sequence of gltA encoding citrate synthase was designed using an expression prediction program, UTR designer, and modified using the CRISPR/Cas9 genome editing method to reduce its expression level. E. coli strains with decreased citrate synthase expression produced more butanol and the citrate synthase activity was correlated with butanol production. These results demonstrate that redistributing carbon flux using genome editing is an efficient engineering tool for metabolite overproduction.

  14. Comprehensive Genomic Analysis and Expression Profiling of Phospholipase C Gene Family during Abiotic Stresses and Development in Rice

    PubMed Central

    Singh, Amarjeet; Kanwar, Poonam; Pandey, Amita; Tyagi, Akhilesh K.; Sopory, Sudhir K.; Kapoor, Sanjay; Pandey, Girdhar K.

    2013-01-01

    Background Phospholipase C (PLC) is one of the major lipid hydrolysing enzymes, implicated in lipid mediated signaling. PLCs have been found to play a significant role in abiotic stress triggered signaling and developmental processes in various plant species. Genome wide identification and expression analysis have been carried out for this gene family in Arabidopsis, yet not much has been accomplished in crop plant rice. Methodology/Principal Findings An exhaustive in-silico exploration of rice genome using various online databases and tools resulted in the identification of nine PLC encoding genes. Based on sequence, motif and phylogenetic analysis rice PLC gene family could be divided into phosphatidylinositol-specific PLCs (PI-PLCs) and phosphatidylcholine- PLCs (PC-PLC or NPC) classes with four and five members, respectively. A comparative analysis revealed that PLCs are conserved in Arabidopsis (dicots) and rice (monocot) at gene structure and protein level but they might have evolved through a separate evolutionary path. Transcript profiling using gene chip microarray and quantitative RT-PCR showed that most of the PLC members expressed significantly and differentially under abiotic stresses (salt, cold and drought) and during various developmental stages with condition/stage specific and overlapping expression. This finding suggested an important role of different rice PLC members in abiotic stress triggered signaling and plant development, which was also supported by the presence of relevant cis-regulatory elements in their promoters. Sub-cellular localization of few selected PLC members in Nicotiana benthamiana and onion epidermal cells has provided a clue about their site of action and functional behaviour. Conclusion/Significance The genome wide identification, structural and expression analysis and knowledge of sub-cellular localization of PLC gene family envisage the functional characterization of these genes in crop plants in near future. PMID

  15. Comprehensive genomic analysis and expression profiling of phospholipase C gene family during abiotic stresses and development in rice.

    PubMed

    Singh, Amarjeet; Kanwar, Poonam; Pandey, Amita; Tyagi, Akhilesh K; Sopory, Sudhir K; Kapoor, Sanjay; Pandey, Girdhar K

    2013-01-01

    Phospholipase C (PLC) is one of the major lipid hydrolysing enzymes, implicated in lipid mediated signaling. PLCs have been found to play a significant role in abiotic stress triggered signaling and developmental processes in various plant species. Genome wide identification and expression analysis have been carried out for this gene family in Arabidopsis, yet not much has been accomplished in crop plant rice. An exhaustive in-silico exploration of rice genome using various online databases and tools resulted in the identification of nine PLC encoding genes. Based on sequence, motif and phylogenetic analysis rice PLC gene family could be divided into phosphatidylinositol-specific PLCs (PI-PLCs) and phosphatidylcholine- PLCs (PC-PLC or NPC) classes with four and five members, respectively. A comparative analysis revealed that PLCs are conserved in Arabidopsis (dicots) and rice (monocot) at gene structure and protein level but they might have evolved through a separate evolutionary path. Transcript profiling using gene chip microarray and quantitative RT-PCR showed that most of the PLC members expressed significantly and differentially under abiotic stresses (salt, cold and drought) and during various developmental stages with condition/stage specific and overlapping expression. This finding suggested an important role of different rice PLC members in abiotic stress triggered signaling and plant development, which was also supported by the presence of relevant cis-regulatory elements in their promoters. Sub-cellular localization of few selected PLC members in Nicotiana benthamiana and onion epidermal cells has provided a clue about their site of action and functional behaviour. The genome wide identification, structural and expression analysis and knowledge of sub-cellular localization of PLC gene family envisage the functional characterization of these genes in crop plants in near future.

  16. G-Anchor: a novel approach for whole-genome comparative mapping utilizing evolutionary conserved DNA sequences.

    PubMed

    Lenis, Vasileios Panagiotis E; Swain, Martin; Larkin, Denis M

    2018-05-01

    Cross-species whole-genome sequence alignment is a critical first step for genome comparative analyses, ranging from the detection of sequence variants to studies of chromosome evolution. Animal genomes are large and complex, and whole-genome alignment is a computationally intense process, requiring expensive high-performance computing systems due to the need to explore extensive local alignments. With hundreds of sequenced animal genomes available from multiple projects, there is an increasing demand for genome comparative analyses. Here, we introduce G-Anchor, a new, fast, and efficient pipeline that uses a strictly limited but highly effective set of local sequence alignments to anchor (or map) an animal genome to another species' reference genome. G-Anchor makes novel use of a databank of highly conserved DNA sequence elements. We demonstrate how these elements may be aligned to a pair of genomes, creating anchors. These anchors enable the rapid mapping of scaffolds from a de novo assembled genome to chromosome assemblies of a reference species. Our results demonstrate that G-Anchor can successfully anchor a vertebrate genome onto a phylogenetically related reference species genome using a desktop or laptop computer within a few hours and with comparable accuracy to that achieved by a highly accurate whole-genome alignment tool such as LASTZ. G-Anchor thus makes whole-genome comparisons accessible to researchers with limited computational resources. G-Anchor is a ready-to-use tool for anchoring a pair of vertebrate genomes. It may be used with large genomes that contain a significant fraction of evolutionally conserved DNA sequences and that are not highly repetitive, polypoid, or excessively fragmented. G-Anchor is not a substitute for whole-genome aligning software but can be used for fast and accurate initial genome comparisons. G-Anchor is freely available and a ready-to-use tool for the pairwise comparison of two genomes.

  17. Typing and comparative genome analysis of Brucella melitensis isolated from Lebanon.

    PubMed

    Abou Zaki, Natalia; Salloum, Tamara; Osman, Marwan; Rafei, Rayane; Hamze, Monzer; Tokajian, Sima

    2017-10-16

    Brucella melitensis is the main causative agent of the zoonotic disease brucellosis. This study aimed at typing and characterizing genetic variation in 33 Brucella isolates recovered from patients in Lebanon. Bruce-ladder multiplex PCR and PCR-RFLP of omp31, omp2a and omp2b were performed. Sixteen representative isolates were chosen for draft-genome sequencing and analyzed to determine variations in virulence, resistance, genomic islands, prophages and insertion sequences. Comparative whole-genome single nucleotide polymorphism analysis was also performed. The isolates were confirmed to be B. melitensis. Genome analysis revealed multiple virulence determinants and efflux pumps. Genome comparisons and single nucleotide polymorphisms divided the isolates based on geographical distribution but revealed high levels of similarity between the strains. Sequence divergence in B. melitensis was mainly due to lateral gene transfer of mobile elements. This is the first report of an in-depth genomic characterization of B. melitensis in Lebanon. © FEMS 2017. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.

  18. Genome-Wide Analysis of Long Noncoding RNA (lncRNA) Expression in Hepatoblastoma Tissues

    PubMed Central

    Xue, Ping; Cui, Ximao; Li, Kai; Zheng, Shan; He, Xianghuo; Dong, Kuiran

    2014-01-01

    Long noncoding RNAs (lncRNAs) have crucial roles in cancer biology. We performed a genome-wide analysis of lncRNA expression in hepatoblastoma tissues to identify novel targets for further study of hepatoblastoma. Hepatoblastoma and normal liver tissue samples were obtained from hepatoblastoma patients. The genome-wide analysis of lncRNA expression in these tissues was performed using a 4×180 K lncRNA microarray and Sureprint G3 Human lncRNA Chips. Quantitative RT-PCR (qRT-PCR) was performed to confirm these results. The differential expressions of lncRNAs and mRNAs were identified through fold-change filtering. Gene Ontology (GO) and pathway analyses were performed using the standard enrichment computation method. Associations between lncRNAs and adjacent protein-coding genes were determined through complex transcriptional loci analysis. We found that 2736 lncRNAs were differentially expressed in hepatoblastoma tissues. Among these, 1757 lncRNAs were upregulated more than two-fold relative to normal tissues and 979 lncRNAs were downregulated. Moreover, in hepatoblastoma there were 420 matched lncRNA-mRNA pairs for 120 differentially expressed lncRNAs, and 167 differentially expressed mRNAs. The co-expression network analysis predicted 252 network nodes and 420 connections between 120 lncRNAs and 132 coding genes. Within this co-expression network, 369 pairs were positive, and 51 pairs were negative. Lastly, qRT-PCR data verified six upregulated and downregulated lncRNAs in hepatoblastoma, plus endothelial cell-specific molecule 1 (ESM1) mRNA. Our results demonstrated that expression of these aberrant lncRNAs could respond to hepatoblastoma development. Further study of these lncRNAs could provide useful insight into hepatoblastoma biology. PMID:24465615

  19. Genomic and expression analysis of the flax (Linum usitatissimum) family of glycosyl hydrolase 35 genes

    PubMed Central

    2013-01-01

    Background Several β-galactosidases of the Glycosyl Hydrolase 35 (GH35) family have been characterized, and many of these modify cell wall components, including pectins, xyloglucans, and arabinogalactan proteins. The phloem fibres of flax (Linum usitatissimum) have gelatinous-type cell walls that are rich in crystalline cellulose and depend on β-galactosidase activity for their normal development. In this study, we investigate the transcript expression patterns and inferred evolutionary relationships of the complete set of flax GH35 genes, to better understand the functions of these genes in flax and other species. Results Using the recently published flax genome assembly, we identified 43 β-galactosidase-like (BGAL) genes, based on the presence of a GH35 domain. Phylogenetic analyses of their protein sequences clustered them into eight sub-families. Sub-family B, whose members in other species were known to be expressed in developing flowers and pollen, was greatly under represented in flax (p-value < 0.01). Sub-family A5, whose sole member from arabidopsis has been described as its primary xyloglucan BGAL, was greatly expanded in flax (p-value < 0.01). A number of flax BGALs were also observed to contain non-consensus GH35 active sites. Expression patterns of the flax BGALs were investigated using qRT-PCR and publicly available microarray data. All predicted flax BGALs showed evidence of expression in at least one tissue. Conclusion Flax has a large number of BGAL genes, which display a distinct distribution among the BGAL sub-families, in comparison to other closely related species with available whole genome assemblies. Almost every flax BGAL was expressed in fibres, the majority of which expressed predominately in fibres as compared to other tissues, suggesting an important role for the expansion of this gene family in the development of this species as a fibre crop. Variations displayed in the canonical GH35 active site suggest a variety of roles

  20. Genomic and expression analysis of the flax (Linum usitatissimum) family of glycosyl hydrolase 35 genes.

    PubMed

    Hobson, Neil; Deyholos, Michael K

    2013-05-23

    Several β-galactosidases of the Glycosyl Hydrolase 35 (GH35) family have been characterized, and many of these modify cell wall components, including pectins, xyloglucans, and arabinogalactan proteins. The phloem fibres of flax (Linum usitatissimum) have gelatinous-type cell walls that are rich in crystalline cellulose and depend on β-galactosidase activity for their normal development. In this study, we investigate the transcript expression patterns and inferred evolutionary relationships of the complete set of flax GH35 genes, to better understand the functions of these genes in flax and other species. Using the recently published flax genome assembly, we identified 43 β-galactosidase-like (BGAL) genes, based on the presence of a GH35 domain. Phylogenetic analyses of their protein sequences clustered them into eight sub-families. Sub-family B, whose members in other species were known to be expressed in developing flowers and pollen, was greatly under represented in flax (p-value < 0.01). Sub-family A5, whose sole member from arabidopsis has been described as its primary xyloglucan BGAL, was greatly expanded in flax (p-value < 0.01). A number of flax BGALs were also observed to contain non-consensus GH35 active sites. Expression patterns of the flax BGALs were investigated using qRT-PCR and publicly available microarray data. All predicted flax BGALs showed evidence of expression in at least one tissue. Flax has a large number of BGAL genes, which display a distinct distribution among the BGAL sub-families, in comparison to other closely related species with available whole genome assemblies. Almost every flax BGAL was expressed in fibres, the majority of which expressed predominately in fibres as compared to other tissues, suggesting an important role for the expansion of this gene family in the development of this species as a fibre crop. Variations displayed in the canonical GH35 active site suggest a variety of roles unique to flax, which will require

  1. Complete Genome Sequence of Acinetobacter baumannii CIP 70.10, a Susceptible Reference Strain for Comparative Genome Analyses.

    PubMed

    Krahn, Thomas; Wibberg, Daniel; Maus, Irena; Winkler, Anika; Pühler, Alfred; Poirel, Laurent; Schlüter, Andreas

    2015-07-30

    The complete genome sequence for the reference strain Acinetobacter baumannii CIP 70.10 (ATCC 15151) was established. The strain was isolated in France in 1970, is susceptible to most antimicrobial compounds, and is therefore of importance for comparative genome analyses with clinical multidrug-resistant (MDR) A. baumannii strains to study resistance development and acquisition in this emerging human pathogen. Copyright © 2015 Krahn et al.

  2. Comparative genomics reveals insights into avian genome evolution and adaptation

    PubMed Central

    Zhang, Guojie; Li, Cai; Li, Qiye; Li, Bo; Larkin, Denis M.; Lee, Chul; Storz, Jay F.; Antunes, Agostinho; Greenwold, Matthew J.; Meredith, Robert W.; Ödeen, Anders; Cui, Jie; Zhou, Qi; Xu, Luohao; Pan, Hailin; Wang, Zongji; Jin, Lijun; Zhang, Pei; Hu, Haofu; Yang, Wei; Hu, Jiang; Xiao, Jin; Yang, Zhikai; Liu, Yang; Xie, Qiaolin; Yu, Hao; Lian, Jinmin; Wen, Ping; Zhang, Fang; Li, Hui; Zeng, Yongli; Xiong, Zijun; Liu, Shiping; Zhou, Long; Huang, Zhiyong; An, Na; Wang, Jie; Zheng, Qiumei; Xiong, Yingqi; Wang, Guangbiao; Wang, Bo; Wang, Jingjing; Fan, Yu; da Fonseca, Rute R.; Alfaro-Núñez, Alonzo; Schubert, Mikkel; Orlando, Ludovic; Mourier, Tobias; Howard, Jason T.; Ganapathy, Ganeshkumar; Pfenning, Andreas; Whitney, Osceola; Rivas, Miriam V.; Hara, Erina; Smith, Julia; Farré, Marta; Narayan, Jitendra; Slavov, Gancho; Romanov, Michael N; Borges, Rui; Machado, João Paulo; Khan, Imran; Springer, Mark S.; Gatesy, John; Hoffmann, Federico G.; Opazo, Juan C.; Håstad, Olle; Sawyer, Roger H.; Kim, Heebal; Kim, Kyu-Won; Kim, Hyeon Jeong; Cho, Seoae; Li, Ning; Huang, Yinhua; Bruford, Michael W.; Zhan, Xiangjiang; Dixon, Andrew; Bertelsen, Mads F.; Derryberry, Elizabeth; Warren, Wesley; Wilson, Richard K; Li, Shengbin; Ray, David A.; Green, Richard E.; O’Brien, Stephen J.; Griffin, Darren; Johnson, Warren E.; Haussler, David; Ryder, Oliver A.; Willerslev, Eske; Graves, Gary R.; Alström, Per; Fjeldså, Jon; Mindell, David P.; Edwards, Scott V.; Braun, Edward L.; Rahbek, Carsten; Burt, David W.; Houde, Peter; Zhang, Yong; Yang, Huanming; Wang, Jian; Jarvis, Erich D.; Gilbert, M. Thomas P.; Wang, Jun

    2015-01-01

    Birds are the most species-rich class of tetrapod vertebrates and have wide relevance across many research fields. We explored bird macroevolution using full genomes from 48 avian species representing all major extant clades. The avian genome is principally characterized by its constrained size, which predominantly arose because of lineage-specific erosion of repetitive elements, large segmental deletions, and gene loss. Avian genomes furthermore show a remarkably high degree of evolutionary stasis at the levels of nucleotide sequence, gene synteny, and chromosomal structure. Despite this pattern of conservation, we detected many non-neutral evolutionary changes in protein-coding genes and noncoding regions. These analyses reveal that pan-avian genomic diversity covaries with adaptations to different lifestyles and convergent evolution of traits. PMID:25504712

  3. Genome-wide DNA methylation drives human embryonic stem cell erythropoiesis by remodeling gene expression dynamics.

    PubMed

    Liu, Zhijing; Feng, Qiang; Sun, Pengpeng; Lu, Yan; Yang, Minlan; Zhang, Xiaowei; Jin, Xiangshu; Li, Yulin; Lu, Shi-Jiang; Quan, Chengshi

    2017-12-01

    To investigate the role of DNA methylation during erythrocyte production by human embryonic stem cells (hESCs). We employed an erythroid differentiation model from hESCs, and then tracked the genome-wide DNA methylation maps and gene expression patterns through an Infinium HumanMethylation450K BeadChip and an Ilumina Human HT-12 v4 Expression Beadchip, respectively. A negative correlation between DNA methylation and gene expression was substantially enriched during the later differentiation stage and was present in both the promoter and the gene body. Moreover, erythropoietic genes with differentially methylated CpG sites that were primarily enriched in nonisland regions were upregulated, and demethylation of their gene bodies was associated with the presence of enhancers and DNase I hypersensitive sites. Finally, the components of JAK-STAT-NF-κB signaling were DNA hypomethylated and upregulated, which targets the key genes for erythropoiesis. Erythroid lineage commitment by hESCs requires genome-wide DNA methylation modifications to remodel gene expression dynamics.

  4. Complex interplay among DNA modification, noncoding RNA expression and protein-coding RNA expression in Salvia miltiorrhiza chloroplast genome.

    PubMed

    Chen, Haimei; Zhang, Jianhui; Yuan, George; Liu, Chang

    2014-01-01

    Salvia miltiorrhiza is one of the most widely used medicinal plants. As a first step to develop a chloroplast-based genetic engineering method for the over-production of active components from S. miltiorrhiza, we have analyzed the genome, transcriptome, and base modifications of the S. miltiorrhiza chloroplast. Total genomic DNA and RNA were extracted from fresh leaves and then subjected to strand-specific RNA-Seq and Single-Molecule Real-Time (SMRT) sequencing analyses. Mapping the RNA-Seq reads to the genome assembly allowed us to determine the relative expression levels of 80 protein-coding genes. In addition, we identified 19 polycistronic transcription units and 136 putative antisense and intergenic noncoding RNA (ncRNA) genes. Comparison of the abundance of protein-coding transcripts (cRNA) with and without overlapping antisense ncRNAs (asRNA) suggest that the presence of asRNA is associated with increased cRNA abundance (p<0.05). Using the SMRT Portal software (v1.3.2), 2687 potential DNA modification sites and two potential DNA modification motifs were predicted. The two motifs include a TATA box-like motif (CPGDMM1, "TATANNNATNA"), and an unknown motif (CPGDMM2 "WNYANTGAW"). Specifically, 35 of the 97 CPGDMM1 motifs (36.1%) and 91 of the 369 CPGDMM2 motifs (24.7%) were found to be significantly modified (p<0.01). Analysis of genes downstream of the CPGDMM1 motif revealed the significantly increased abundance of ncRNA genes that are less than 400 bp away from the significantly modified CPGDMM1motif (p<0.01). Taking together, the present study revealed a complex interplay among DNA modifications, ncRNA and cRNA expression in chloroplast genome.

  5. Insights into the environmental reservoir of pathogenic Vibrio parahaemolyticus using comparative genomics

    PubMed Central

    Hazen, Tracy H.; Lafon, Patricia C.; Garrett, Nancy M.; Lowe, Tiffany M.; Silberger, Daniel J.; Rowe, Lori A.; Frace, Michael; Parsons, Michele B.; Bopp, Cheryl A.; Rasko, David A.; Sobecky, Patricia A.

    2015-01-01

    Vibrio parahaemolyticus is an aquatic halophilic bacterium that occupies estuarine and coastal marine environments, and is a leading cause of seafood-borne food poisoning cases. To investigate the environmental reservoir and potential gene flow that occurs among V. parahaemolyticus isolates, the virulence-associated gene content and genome diversity of a collection of 133 V. parahaemolyticus isolates were analyzed. Phylogenetic analysis of housekeeping genes, and pulsed-field gel electrophoresis, demonstrated that there is genetic similarity among V. parahaemolyticus clinical and environmental isolates. Whole-genome sequencing and comparative analysis of six representative V. parahaemolyticus isolates was used to identify genes that are unique to the clinical and environmental isolates examined. Comparative genomics demonstrated an O3:K6 environmental isolate, AF91, which was cultured from sediment collected in Florida in 2006, has significant genomic similarity to the post-1995 O3:K6 isolates. However, AF91 lacks the majority of the virulence-associated genes and genomic islands associated with these highly virulent post-1995 O3:K6 genomes. These findings demonstrate that although they do not contain most of the known virulence-associated regions, some V. parahaemolyticus environmental isolates exhibit significant genetic similarity to clinical isolates. This highlights the dynamic nature of the V. parahaemolyticus genome allowing them to transition between aquatic and host-pathogen states. PMID:25852665

  6. Comparative Genomics of the Ubiquitous, Hydrocarbon-degrading Genus Marinobacter

    NASA Astrophysics Data System (ADS)

    Singer, E.; Webb, E.; Edwards, K. J.

    2012-12-01

    The genus Marinobacter is amongst the most ubiquitous in the global oceans and strains have been isolated from a wide variety of marine environments, including offshore oil-well heads, coastal thermal springs, Antarctic sea water, saline soils and associations with diatoms and dinoflagellates. Many strains have been recognized to be important hydrocarbon degraders in various marine habitats presenting sometimes extreme pH or salinity conditions. Analysis of the genome of M. aquaeolei revealed enormous adaptation versatility with an assortment of strategies for carbon and energy acquisition, sensation, and defense. In an effort to elucidate the ecological and biogeochemical significance of the Marinobacters, seven Marinobacter strains from diverse environments were included in a comparative genomics study. Genomes were screened for metabolic and adaptation potential to elucidate the strategies responsible for the omnipresence of the Marinobacter genus and their remedial action potential in hydrocarbon-polluted waters. The core genome predominantly encodes for key genes involved in hydrocarbon degradation, biofilm-relevant processes, including utilization of external DNA, halotolerance, as well as defense mechanisms against heavy metals, antibiotics, and toxins. All Marinobacter strains were observed to degrade a wide spectrum of hydrocarbon species, including aliphatic, polycyclic aromatic as well as acyclic isoprenoid compounds. Various genes predicted to facilitate hydrocarbon degradation, e.g. alkane 1-monooxygenase, appear to have originated from lateral gene transfer as they are located on gene clusters of 10-20% lower GC-content compared to genome averages and are flanked by transposases. Top ortholog hits are found in other hydrocarbon degrading organisms, e.g. Alcanivorax borkumensis. Strategies for hydrocarbon uptake encoded by various Marinobacter strains include cell surface hydrophobicity adaptation via capsular polysaccharide biosynthesis and attachment

  7. Posttranscriptional regulation of retroviral gene expression: primary RNA transcripts play three roles as pre-mRNA, mRNA, and genomic RNA

    PubMed Central

    LeBlanc, Jason; Weil, Jason; Beemon, Karen

    2013-01-01

    After reverse transcription of the retroviral RNA genome and integration of the DNA provirus into the host genome, host machinery is used for viral gene expression along with viral proteins and RNA regulatory elements. Here, we discuss co-transcriptional and posttranscriptional regulation of retroviral gene expression, comparing simple and complex retroviruses. Cellular RNA polymerase II synthesizes full-length viral primary RNA transcripts that are capped and polyadenylated. All retroviruses generate a singly spliced env mRNA from this primary transcript, which encodes the viral glycoproteins. In addition, complex viral RNAs are alternatively spliced to generate accessory proteins, such as Rev, which is involved in posttranscriptional regulation of HIV-1 RNA. Importantly, the splicing of all retroviruses is incomplete; they must maintain and export a fraction of their primary RNA transcripts. This unspliced RNA functions both as the major mRNA for Gag and Pol proteins and as the packaged genomic RNA. Different retroviruses export their unspliced viral RNA from the nucleus to the cytoplasm by either Tap-dependent or Rev/CRM1-dependent routes. Translation of the unspliced mRNA involves frame-shifting or termination codon suppression so that the Gag proteins, which make up the capsid, are expressed more abundantly than the Pol proteins, which are the viral enzymes. After the viral polyproteins assemble into viral particles and bud from the cell membrane, a viral encoded protease cleaves them. Some retroviruses have evolved mechanisms to protect their unspliced RNA from decay by nonsense-mediated RNA decay and to prevent genome editing by the cellular APOBEC deaminases. PMID:23754689

  8. Phylogeny and comparative genome analysis of a Basidiomycete fungi

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Riley, Robert W.; Salamov, Asaf; Grigoriev, Igor

    2011-03-14

    Fungi of the phylum Basidiomycota, make up some 37percent of the described fungi, and are important from the perspectives of forestry, agriculture, medicine, and bioenergy. This diverse phylum includes the mushrooms, wood rots, plant pathogenic rusts and smuts, and some human pathogens. To better understand these important fungi, we have undertaken a comparative genomic analysis of the Basidiomycetes with available sequenced genomes. We report a phylogeny that sheds light on previously unclear evolutionary relationships among the Basidiomycetes. We also define a `core proteome? based on protein families conserved in all Basidiomycetes. We identify key expansions and contractions in protein familiesmore » that may be responsible for the degradation of plant biomass such as cellulose, hemicellulose, and lignin. Finally, we speculate as to the genomic changes that drove such expansions and contractions.« less

  9. Identification, characterization, and comparative genomic distribution of the HERV-K (HML-2) group of human endogenous retroviruses

    PubMed Central

    2011-01-01

    Background Integration of retroviral DNA into a germ cell may lead to a provirus that is transmitted vertically to that host's offspring as an endogenous retrovirus (ERV). In humans, ERVs (HERVs) comprise about 8% of the genome, the vast majority of which are truncated and/or highly mutated and no longer encode functional genes. The most recently active retroviruses that integrated into the human germ line are members of the Betaretrovirus-like HERV-K (HML-2) group, many of which contain intact open reading frames (ORFs) in some or all genes, sometimes encoding functional proteins that are expressed in various tissues. Interestingly, this expression is upregulated in many tumors ranging from breast and ovarian tissues to lymphomas and melanomas, as well as schizophrenia, rheumatoid arthritis, and other disorders. Results No study to date has characterized all HML-2 elements in the genome, an essential step towards determining a possible functional role of HML-2 expression in disease. We present here the most comprehensive and accurate catalog of all full-length and partial HML-2 proviruses, as well as solo LTR elements, within the published human genome to date. Furthermore, we provide evidence for preferential maintenance of proviruses and solo LTR elements on gene-rich chromosomes of the human genome and in proximity to gene regions. Conclusions Our analysis has found and corrected several errors in the annotation of HML-2 elements in the human genome, including mislabeling of a newly identified group called HML-11. HML-elements have been implicated in a wide array of diseases, and characterization of these elements will play a fundamental role to understand the relationship between endogenous retrovirus expression and disease. PMID:22067224

  10. Comparative genomic analysis of Acinetobacter strains isolated from murine colonic crypts.

    PubMed

    Saffarian, Azadeh; Touchon, Marie; Mulet, Céline; Tournebize, Régis; Passet, Virginie; Brisse, Sylvain; Rocha, Eduardo P C; Sansonetti, Philippe J; Pédron, Thierry

    2017-07-11

    A restricted set of aerobic bacteria dominated by the Acinetobacter genus was identified in murine intestinal colonic crypts. The vicinity of such bacteria with intestinal stem cells could indicate that they protect the crypt against cytotoxic and genotoxic signals. Genome analyses of these bacteria were performed to better appreciate their biodegradative capacities. Two taxonomically different clusters of Acinetobacter were isolated from murine proximal colonic crypts, one was identified as A. modestus and the other as A. radioresistens. Their identification was performed through biochemical parameters and housekeeping gene sequencing. After selection of one strain of each cluster (A. modestus CM11G and A. radioresistens CM38.2), comparative genomic analysis was performed on whole-genome sequencing data. The antibiotic resistance pattern of these two strains is different, in line with the many genes involved in resistance to heavy metals identified in both genomes. Moreover whereas the operon benABCDE involved in benzoate metabolism is encoded by the two genomes, the operon antABC encoding the anthranilate dioxygenase, and the phenol hydroxylase gene cluster are absent in the A. modestus genomic sequence, indicating that the two strains have different capacities to metabolize xenobiotics. A common feature of the two strains is the presence of a type IV pili system, and the presence of genes encoding proteins pertaining to secretion systems such as Type I and Type II secretion systems. Our comparative genomic analysis revealed that different Acinetobacter isolated from the same biological niche, even if they share a large majority of genes, possess unique features that could play a specific role in the protection of the intestinal crypt.

  11. Sequencing and Comparative Genome Analysis of Two Pathogenic Streptococcus gallolyticus Subspecies: Genome Plasticity, Adaptation and Virulence

    PubMed Central

    Teng, Yu-Ting; Wu, Hui-Lun; Liu, Yen-Ming; Wu, Keh-Ming; Chang, Chuan-Hsiung; Hsu, Ming-Ta

    2011-01-01

    Streptococcus gallolyticus infections in humans are often associated with bacteremia, infective endocarditis and colon cancers. The disease manifestations are different depending on the subspecies of S. gallolyticus causing the infection. Here, we present the complete genomes of S. gallolyticus ATCC 43143 (biotype I) and S. pasteurianus ATCC 43144 (biotype II.2). The genomic differences between the two biotypes were characterized with comparative genomic analyses. The chromosome of ATCC 43143 and ATCC 43144 are 2,36 and 2,10 Mb in length and encode 2246 and 1869 CDS respectively. The organization and genomic contents of both genomes were most similar to the recently published S. gallolyticus UCN34, where 2073 (92%) and 1607 (86%) of the ATCC 43143 and ATCC 43144 CDS were conserved in UCN34 respectively. There are around 600 CDS conserved in all Streptococcus genomes, indicating the Streptococcus genus has a small core-genome (constitute around 30% of total CDS) and substantial evolutionary plasticity. We identified eight and five regions of genome plasticity in ATCC 43143 and ATCC 43144 respectively. Within these regions, several proteins were recognized to contribute to the fitness and virulence of each of the two subspecies. We have also predicted putative cell-surface associated proteins that could play a role in adherence to host tissues, leading to persistent infections causing sub-acute and chronic diseases in humans. This study showed evidence that the S. gallolyticus still possesses genes making it suitable in a rumen environment, whereas the ability for S. pasteurianus to live in rumen is reduced. The genome heterogeneity and genetic diversity among the two biotypes, especially membrane and lipoproteins, most likely contribute to the differences in the pathogenesis of the two S. gallolyticus biotypes and the type of disease an infected patient eventually develops. PMID:21633709

  12. Comparative genomics reveals insights into avian genome evolution and adaptation.

    PubMed

    Zhang, Guojie; Li, Cai; Li, Qiye; Li, Bo; Larkin, Denis M; Lee, Chul; Storz, Jay F; Antunes, Agostinho; Greenwold, Matthew J; Meredith, Robert W; Ödeen, Anders; Cui, Jie; Zhou, Qi; Xu, Luohao; Pan, Hailin; Wang, Zongji; Jin, Lijun; Zhang, Pei; Hu, Haofu; Yang, Wei; Hu, Jiang; Xiao, Jin; Yang, Zhikai; Liu, Yang; Xie, Qiaolin; Yu, Hao; Lian, Jinmin; Wen, Ping; Zhang, Fang; Li, Hui; Zeng, Yongli; Xiong, Zijun; Liu, Shiping; Zhou, Long; Huang, Zhiyong; An, Na; Wang, Jie; Zheng, Qiumei; Xiong, Yingqi; Wang, Guangbiao; Wang, Bo; Wang, Jingjing; Fan, Yu; da Fonseca, Rute R; Alfaro-Núñez, Alonzo; Schubert, Mikkel; Orlando, Ludovic; Mourier, Tobias; Howard, Jason T; Ganapathy, Ganeshkumar; Pfenning, Andreas; Whitney, Osceola; Rivas, Miriam V; Hara, Erina; Smith, Julia; Farré, Marta; Narayan, Jitendra; Slavov, Gancho; Romanov, Michael N; Borges, Rui; Machado, João Paulo; Khan, Imran; Springer, Mark S; Gatesy, John; Hoffmann, Federico G; Opazo, Juan C; Håstad, Olle; Sawyer, Roger H; Kim, Heebal; Kim, Kyu-Won; Kim, Hyeon Jeong; Cho, Seoae; Li, Ning; Huang, Yinhua; Bruford, Michael W; Zhan, Xiangjiang; Dixon, Andrew; Bertelsen, Mads F; Derryberry, Elizabeth; Warren, Wesley; Wilson, Richard K; Li, Shengbin; Ray, David A; Green, Richard E; O'Brien, Stephen J; Griffin, Darren; Johnson, Warren E; Haussler, David; Ryder, Oliver A; Willerslev, Eske; Graves, Gary R; Alström, Per; Fjeldså, Jon; Mindell, David P; Edwards, Scott V; Braun, Edward L; Rahbek, Carsten; Burt, David W; Houde, Peter; Zhang, Yong; Yang, Huanming; Wang, Jian; Jarvis, Erich D; Gilbert, M Thomas P; Wang, Jun

    2014-12-12

    Birds are the most species-rich class of tetrapod vertebrates and have wide relevance across many research fields. We explored bird macroevolution using full genomes from 48 avian species representing all major extant clades. The avian genome is principally characterized by its constrained size, which predominantly arose because of lineage-specific erosion of repetitive elements, large segmental deletions, and gene loss. Avian genomes furthermore show a remarkably high degree of evolutionary stasis at the levels of nucleotide sequence, gene synteny, and chromosomal structure. Despite this pattern of conservation, we detected many non-neutral evolutionary changes in protein-coding genes and noncoding regions. These analyses reveal that pan-avian genomic diversity covaries with adaptations to different lifestyles and convergent evolution of traits. Copyright © 2014, American Association for the Advancement of Science.

  13. Genome-wide survey and expression analysis of F-box genes in chickpea.

    PubMed

    Gupta, Shefali; Garg, Vanika; Kant, Chandra; Bhatia, Sabhyata

    2015-02-13

    The F-box genes constitute one of the largest gene families in plants involved in degradation of cellular proteins. F-box proteins can recognize a wide array of substrates and regulate many important biological processes such as embryogenesis, floral development, plant growth and development, biotic and abiotic stress, hormonal responses and senescence, among others. However, little is known about the F-box genes in the important legume crop, chickpea. The available draft genome sequence of chickpea allowed us to conduct a genome-wide survey of the F-box gene family in chickpea. A total of 285 F-box genes were identified in chickpea which were classified based on their C-terminal domain structures into 10 subfamilies. Thirteen putative novel motifs were also identified in F-box proteins with no known functional domain at their C-termini. The F-box genes were physically mapped on the 8 chickpea chromosomes and duplication events were investigated which revealed that the F-box gene family expanded largely due to tandem duplications. Phylogenetic analysis classified the chickpea F-box genes into 9 clusters. Also, maximum syntenic relationship was observed with soybean followed by Medicago truncatula, Lotus japonicus and Arabidopsis. Digital expression analysis of F-box genes in various chickpea tissues as well as under abiotic stress conditions utilizing the available chickpea transcriptome data revealed differential expression patterns with several F-box genes specifically expressing in each tissue, few of which were validated by using quantitative real-time PCR. The genome-wide analysis of chickpea F-box genes provides new opportunities for characterization of candidate F-box genes and elucidation of their function in growth, development and stress responses for utilization in chickpea improvement.

  14. ISOL@: an Italian SOLAnaceae genomics resource.

    PubMed

    Chiusano, Maria Luisa; D'Agostino, Nunzio; Traini, Alessandra; Licciardello, Concetta; Raimondo, Enrico; Aversano, Mario; Frusciante, Luigi; Monti, Luigi

    2008-03-26

    Present-day '-omics' technologies produce overwhelming amounts of data which include genome sequences, information on gene expression (transcripts and proteins) and on cell metabolic status. These data represent multiple aspects of a biological system and need to be investigated as a whole to shed light on the mechanisms which underpin the system functionality. The gathering and convergence of data generated by high-throughput technologies, the effective integration of different data-sources and the analysis of the information content based on comparative approaches are key methods for meaningful biological interpretations. In the frame of the International Solanaceae Genome Project, we propose here ISOLA, an Italian SOLAnaceae genomics resource. ISOLA (available at http://biosrv.cab.unina.it/isola) represents a trial platform and it is conceived as a multi-level computational environment.ISOLA currently consists of two main levels: the genome and the expression level. The cornerstone of the genome level is represented by the Solanum lycopersicum genome draft sequences generated by the International Tomato Genome Sequencing Consortium. Instead, the basic element of the expression level is the transcriptome information from different Solanaceae species, mainly in the form of species-specific comprehensive collections of Expressed Sequence Tags (ESTs). The cross-talk between the genome and the expression levels is based on data source sharing and on tools that enhance data quality, that extract information content from the levels' under parts and produce value-added biological knowledge. ISOLA is the result of a bioinformatics effort that addresses the challenges of the post-genomics era. It is designed to exploit '-omics' data based on effective integration to acquire biological knowledge and to approach a systems biology view. Beyond providing experimental biologists with a preliminary annotation of the tomato genome, this effort aims to produce a trial

  15. Low-level laser irradiation alters mRNA expression from genes involved in DNA repair and genomic stabilization in myoblasts

    NASA Astrophysics Data System (ADS)

    Trajano, L. A. S. N.; Sergio, L. P. S.; Silva, C. L.; Carvalho, L.; Mencalha, A. L.; Stumbo, A. C.; Fonseca, A. S.

    2016-07-01

    Low-level lasers are used for the treatment of diseases in soft and bone tissues, but few data are available regarding their effects on genomic stability. In this study, we investigated mRNA expression from genes involved in DNA repair and genomic stabilization in myoblasts exposed to low-level infrared laser. C2C12 myoblast cultures in different fetal bovine serum concentrations were exposed to low-level infrared laser (10, 35 and 70 J cm-2), and collected for the evaluation of DNA repair gene expression. Laser exposure increased gene expression related to base excision repair (8-oxoguanine DNA glycosylase and apurinic/apyrimidinic endonuclease 1), nucleotide excision repair (excision repair cross-complementation group 1 and xeroderma pigmentosum C protein) and genomic stabilization (ATM serine/threonine kinase and tumor protein p53) in normal and low fetal bovine serum concentrations. Results suggest that genomic stability could be part of a biostimulation effect of low-level laser therapy in injured muscles.

  16. IMG/M: integrated genome and metagenome comparative data analysis system

    DOE PAGES

    Chen, I-Min A.; Markowitz, Victor M.; Chu, Ken; ...

    2016-10-13

    The Integrated Microbial Genomes with Microbiome Samples (IMG/M: https://img.jgi.doe.gov/m/) system contains annotated DNA and RNA sequence data of (i) archaeal, bacterial, eukaryotic and viral genomes from cultured organisms, (ii) single cell genomes (SCG) and genomes from metagenomes (GFM) from uncultured archaea, bacteria and viruses and (iii) metagenomes from environmental, host associated and engineered microbiome samples. Sequence data are generated by DOE's Joint Genome Institute (JGI), submitted by individual scientists, or collected from public sequence data archives. Structural and functional annotation is carried out by JGI's genome and metagenome annotation pipelines. A variety of analytical and visualization tools provide support formore » examining and comparing IMG/M's datasets. IMG/M allows open access interactive analysis of publicly available datasets, while manual curation, submission and access to private datasets and computationally intensive workspace-based analysis require login/password access to its expert review(ER) companion system (IMG/M ER: https://img.jgi.doe.gov/ mer/). Since the last report published in the 2014 NAR Database Issue, IMG/M's dataset content has tripled in terms of number of datasets and overall protein coding genes, while its analysis tools have been extended to cope with the rapid growth in the number and size of datasets handled by the system.« less

  17. IMG/M: integrated genome and metagenome comparative data analysis system

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Chen, I-Min A.; Markowitz, Victor M.; Chu, Ken

    The Integrated Microbial Genomes with Microbiome Samples (IMG/M: https://img.jgi.doe.gov/m/) system contains annotated DNA and RNA sequence data of (i) archaeal, bacterial, eukaryotic and viral genomes from cultured organisms, (ii) single cell genomes (SCG) and genomes from metagenomes (GFM) from uncultured archaea, bacteria and viruses and (iii) metagenomes from environmental, host associated and engineered microbiome samples. Sequence data are generated by DOE's Joint Genome Institute (JGI), submitted by individual scientists, or collected from public sequence data archives. Structural and functional annotation is carried out by JGI's genome and metagenome annotation pipelines. A variety of analytical and visualization tools provide support formore » examining and comparing IMG/M's datasets. IMG/M allows open access interactive analysis of publicly available datasets, while manual curation, submission and access to private datasets and computationally intensive workspace-based analysis require login/password access to its expert review(ER) companion system (IMG/M ER: https://img.jgi.doe.gov/ mer/). Since the last report published in the 2014 NAR Database Issue, IMG/M's dataset content has tripled in terms of number of datasets and overall protein coding genes, while its analysis tools have been extended to cope with the rapid growth in the number and size of datasets handled by the system.« less

  18. IMG/M: integrated genome and metagenome comparative data analysis system

    PubMed Central

    Chen, I-Min A.; Markowitz, Victor M.; Chu, Ken; Palaniappan, Krishna; Szeto, Ernest; Pillay, Manoj; Ratner, Anna; Huang, Jinghua; Andersen, Evan; Huntemann, Marcel; Varghese, Neha; Hadjithomas, Michalis; Tennessen, Kristin; Nielsen, Torben; Ivanova, Natalia N.; Kyrpides, Nikos C.

    2017-01-01

    The Integrated Microbial Genomes with Microbiome Samples (IMG/M: https://img.jgi.doe.gov/m/) system contains annotated DNA and RNA sequence data of (i) archaeal, bacterial, eukaryotic and viral genomes from cultured organisms, (ii) single cell genomes (SCG) and genomes from metagenomes (GFM) from uncultured archaea, bacteria and viruses and (iii) metagenomes from environmental, host associated and engineered microbiome samples. Sequence data are generated by DOE's Joint Genome Institute (JGI), submitted by individual scientists, or collected from public sequence data archives. Structural and functional annotation is carried out by JGI's genome and metagenome annotation pipelines. A variety of analytical and visualization tools provide support for examining and comparing IMG/M's datasets. IMG/M allows open access interactive analysis of publicly available datasets, while manual curation, submission and access to private datasets and computationally intensive workspace-based analysis require login/password access to its expert review (ER) companion system (IMG/M ER: https://img.jgi.doe.gov/mer/). Since the last report published in the 2014 NAR Database Issue, IMG/M's dataset content has tripled in terms of number of datasets and overall protein coding genes, while its analysis tools have been extended to cope with the rapid growth in the number and size of datasets handled by the system. PMID:27738135

  19. Comparative analysis of chloroplast genomes of the genus Citrus and its close relatives.

    PubMed

    Liu, Xiaogang; Wu, Hongkun; Luo, Yan; Xi, Wanpeng; Zhou, Zhiqin

    2017-01-01

    The genus Citrus and its close relatives are economically and nutritionally important fruit trees. However, the huge controversy over the phylogeny of key wild species, as well as the genetic relationship between the cultivated species and their putative wild progenitors, remains unresolved. Comparative analyses of chloroplast (cp) genomes have been useful in resolving various phylogenetic issues. Thus far, the cp genomes of only two Citrus species have been sequenced. In this study, we sequenced six complete cp genomes, four belonging to the genus Citrus, and two belonging to the genera Fortunella and Poncirus, respectively. These newly sequenced genomes together with the two publicly available were used for comparative analyses of the genus Citrus and its close relatives. All eight cp genomes share similar basic structure, gene order and gene content. Phylogenetic analyses supported the monophyly of the three genera in the order Sapindales within the major clade Malvidae.

  20. Genome-wide identification and expression analysis of the B-box gene family in the Apple (Malus domestica Borkh.) genome.

    PubMed

    Liu, Xin; Li, Rong; Dai, Yaqing; Chen, Xuesen; Wang, Xiaoyun

    2018-04-01

    The B-box proteins (BBXs) are a family of zinc finger proteins containing one/two B-box domain(s). Compared with intensive studies of animal BBXs, investigations of the plant BBX family are limited, though some specific plant BBXs have been demonstrated to act as transcription factors in the regulation of flowering and photomorphogenesis. In this study, using a global search of the apple (Malus domestica Borkh.) genome, a total of 64 members of BBX (MdBBX) were identified. All the MdBBXs were divided into five groups based on the phylogenetic relationship, numbers of B-boxes contained and whether there was with an additional CCT domain. According to the characteristics of organ-specific expression, MdBBXs were divided into three groups based on the microarray information. An analysis of cis-acting elements showed that elements related to the stress response were prevalent in the promoter sequences of most MdBBXs. Twelve MdBBX members from different groups were randomly selected and exposed to abiotic stresses. Their expressions were up-regulated to some extent in the roots and leaves. Six among 12 MdBBXs were sensitive to osmotic pressure, salt, cold stress and exogenous abscisic acid treatment, with their expressions enhanced more than 20-fold. Our results suggested that MdBBXs may take part in response to abiotic stress.

  1. [Evolution of genomic imprinting in mammals: what a zoo!].

    PubMed

    Proudhon, Charlotte; Bourc'his, Déborah

    2010-05-01

    Genomic imprinting imposes an obligate mode of biparental reproduction in mammals. This phenomenon results from the monoparental expression of a subset of genes. This specific gene regulation mechanism affects viviparous mammals, especially eutherians, but also marsupials to a lesser extent. Oviparous mammals, or monotremes, do not seem to demonstrate monoparental allele expression. This phylogenic confinement suggests that the evolution of the placenta imposed a selective pressure for the emergence of genomic imprinting. This physiological argument is now complemented by recent genomic evidence facilitated by the sequencing of the platypus genome, a rare modern day case of a monotreme. Analysis of the platypus genome in comparison to eutherian genomes shows a chronological and functional coincidence between the appearance of genomic imprinting and transposable element accumulation. The systematic comparative analyses of genomic sequences in different species is essential for the further understanding of genomic imprinting emergence and divergent evolution along mammalian speciation.

  2. A comparative evaluation of genome assembly reconciliation tools.

    PubMed

    Alhakami, Hind; Mirebrahim, Hamid; Lonardi, Stefano

    2017-05-18

    The majority of eukaryotic genomes are unfinished due to the algorithmic challenges of assembling them. A variety of assembly and scaffolding tools are available, but it is not always obvious which tool or parameters to use for a specific genome size and complexity. It is, therefore, common practice to produce multiple assemblies using different assemblers and parameters, then select the best one for public release. A more compelling approach would allow one to merge multiple assemblies with the intent of producing a higher quality consensus assembly, which is the objective of assembly reconciliation. Several assembly reconciliation tools have been proposed in the literature, but their strengths and weaknesses have never been compared on a common dataset. We fill this need with this work, in which we report on an extensive comparative evaluation of several tools. Specifically, we evaluate contiguity, correctness, coverage, and the duplication ratio of the merged assembly compared to the individual assemblies provided as input. None of the tools we tested consistently improved the quality of the input GAGE and synthetic assemblies. Our experiments show an increase in contiguity in the consensus assembly when the original assemblies already have high quality. In terms of correctness, the quality of the results depends on the specific tool, as well as on the quality and the ranking of the input assemblies. In general, the number of misassemblies ranges from being comparable to the best of the input assembly to being comparable to the worst of the input assembly.

  3. A comparative genomics strategy for targeted discovery of single-nucleotide polymorphisms and conserved-noncoding sequences in orphan crops.

    PubMed

    Feltus, F A; Singh, H P; Lohithaswa, H C; Schulze, S R; Silva, T D; Paterson, A H

    2006-04-01

    Completed genome sequences provide templates for the design of genome analysis tools in orphan species lacking sequence information. To demonstrate this principle, we designed 384 PCR primer pairs to conserved exonic regions flanking introns, using Sorghum/Pennisetum expressed sequence tag alignments to the Oryza genome. Conserved-intron scanning primers (CISPs) amplified single-copy loci at 37% to 80% success rates in taxa that sample much of the approximately 50-million years of Poaceae divergence. While the conserved nature of exons fostered cross-taxon amplification, the lesser evolutionary constraints on introns enhanced single-nucleotide polymorphism detection. For example, in eight rice (Oryza sativa) genotypes, polymorphism averaged 12.1 per kb in introns but only 3.6 per kb in exons. Curiously, among 124 CISPs evaluated across Oryza, Sorghum, Pennisetum, Cynodon, Eragrostis, Zea, Triticum, and Hordeum, 23 (18.5%) seemed to be subject to rigid intron size constraints that were independent of per-nucleotide DNA sequence variation. Furthermore, we identified 487 conserved-noncoding sequence motifs in 129 CISP loci. A large CISP set (6,062 primer pairs, amplifying introns from 1,676 genes) designed using an automated pipeline showed generally higher abundance in recombinogenic than in nonrecombinogenic regions of the rice genome, thus providing relatively even distribution along genetic maps. CISPs are an effective means to explore poorly characterized genomes for both DNA polymorphism and noncoding sequence conservation on a genome-wide or candidate gene basis, and also provide anchor points for comparative genomics across a diverse range of species.

  4. The genome sequence of E. coli W (ATCC 9637): comparative genome analysis and an improved genome-scale reconstruction of E. coli

    PubMed Central

    2011-01-01

    Background Escherichia coli is a model prokaryote, an important pathogen, and a key organism for industrial biotechnology. E. coli W (ATCC 9637), one of four strains designated as safe for laboratory purposes, has not been sequenced. E. coli W is a fast-growing strain and is the only safe strain that can utilize sucrose as a carbon source. Lifecycle analysis has demonstrated that sucrose from sugarcane is a preferred carbon source for industrial bioprocesses. Results We have sequenced and annotated the genome of E. coli W. The chromosome is 4,900,968 bp and encodes 4,764 ORFs. Two plasmids, pRK1 (102,536 bp) and pRK2 (5,360 bp), are also present. W has unique features relative to other sequenced laboratory strains (K-12, B and Crooks): it has a larger genome and belongs to phylogroup B1 rather than A. W also grows on a much broader range of carbon sources than does K-12. A genome-scale reconstruction was developed and validated in order to interrogate metabolic properties. Conclusions The genome of W is more similar to commensal and pathogenic B1 strains than phylogroup A strains, and therefore has greater utility for comparative analyses with these strains. W should therefore be the strain of choice, or 'type strain' for group B1 comparative analyses. The genome annotation and tools created here are expected to allow further utilization and development of E. coli W as an industrial organism for sucrose-based bioprocesses. Refinements in our E. coli metabolic reconstruction allow it to more accurately define E. coli metabolism relative to previous models. PMID:21208457

  5. Molecular Toolkit for Gene Expression Control and Genome Modification in Rhodococcus opacus PD630

    DOE PAGES

    DeLorenzo, Drew M.; Rottinghaus, Austin G.; Henson, William R.; ...

    2018-01-24

    Rhodococcus opacus PD630 is a non-model, gram-positive bacterium that possesses desirable traits for lignocellulosic biomass conversion. In particular, it has a relatively rapid growth rate, exhibits genetic tractability, produces high quantities of lipids, and can tolerate and consume toxic, lignin-derived aromatic compounds. Despite these unique, industrially relevant characteristics, R. opacus has been underutilized due to a lack of reliable genetic parts and engineering tools. In this work, we developed a molecular toolbox for reliable gene expression control and genome modification in R. opacus. To facilitate predictable gene expression, a constitutive promoter library spanning ~45-fold in output was constructed. To improvemore » the characterization of available plasmids, the copy numbers of four heterologous and nine endogenous plasmids were determined using quantitative PCR. The molecular toolbox was further expanded by screening a previously unreported antibiotic resistance marker (HygR) and constructing a curable plasmid backbone for temporary gene expression (pB264). Furthermore, a system for genome modification was devised, and three neutral integration sites were identified using a novel combination of transcriptomic data, genomic architecture, and growth rate analysis. Finally, the first reported system for targeted, tunable gene repression in Rhodococcus was developed by utilizing CRISPR interference (CRISPRi). Overall, this work greatly expands the ability to manipulate and engineer R. opacus, making it a viable new chassis for bioproduction from renewable feedstocks.« less

  6. Molecular Toolkit for Gene Expression Control and Genome Modification in Rhodococcus opacus PD630

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    DeLorenzo, Drew M.; Rottinghaus, Austin G.; Henson, William R.

    Rhodococcus opacus PD630 is a non-model, gram-positive bacterium that possesses desirable traits for lignocellulosic biomass conversion. In particular, it has a relatively rapid growth rate, exhibits genetic tractability, produces high quantities of lipids, and can tolerate and consume toxic, lignin-derived aromatic compounds. Despite these unique, industrially relevant characteristics, R. opacus has been underutilized due to a lack of reliable genetic parts and engineering tools. In this work, we developed a molecular toolbox for reliable gene expression control and genome modification in R. opacus. To facilitate predictable gene expression, a constitutive promoter library spanning ~45-fold in output was constructed. To improvemore » the characterization of available plasmids, the copy numbers of four heterologous and nine endogenous plasmids were determined using quantitative PCR. The molecular toolbox was further expanded by screening a previously unreported antibiotic resistance marker (HygR) and constructing a curable plasmid backbone for temporary gene expression (pB264). Furthermore, a system for genome modification was devised, and three neutral integration sites were identified using a novel combination of transcriptomic data, genomic architecture, and growth rate analysis. Finally, the first reported system for targeted, tunable gene repression in Rhodococcus was developed by utilizing CRISPR interference (CRISPRi). Overall, this work greatly expands the ability to manipulate and engineer R. opacus, making it a viable new chassis for bioproduction from renewable feedstocks.« less

  7. Functional analysis and transcriptional output of the Göttingen minipig genome.

    PubMed

    Heckel, Tobias; Schmucki, Roland; Berrera, Marco; Ringshandl, Stephan; Badi, Laura; Steiner, Guido; Ravon, Morgane; Küng, Erich; Kuhn, Bernd; Kratochwil, Nicole A; Schmitt, Georg; Kiialainen, Anna; Nowaczyk, Corinne; Daff, Hamina; Khan, Azinwi Phina; Lekolool, Isaac; Pelle, Roger; Okoth, Edward; Bishop, Richard; Daubenberger, Claudia; Ebeling, Martin; Certa, Ulrich

    2015-11-14

    In the past decade the Göttingen minipig has gained increasing recognition as animal model in pharmaceutical and safety research because it recapitulates many aspects of human physiology and metabolism. Genome-based comparison of drug targets together with quantitative tissue expression analysis allows rational prediction of pharmacology and cross-reactivity of human drugs in animal models thereby improving drug attrition which is an important challenge in the process of drug development. Here we present a new chromosome level based version of the Göttingen minipig genome together with a comparative transcriptional analysis of tissues with pharmaceutical relevance as basis for translational research. We relied on mapping and assembly of WGS (whole-genome-shotgun sequencing) derived reads to the reference genome of the Duroc pig and predict 19,228 human orthologous protein-coding genes. Genome-based prediction of the sequence of human drug targets enables the prediction of drug cross-reactivity based on conservation of binding sites. We further support the finding that the genome of Sus scrofa contains about ten-times less pseudogenized genes compared to other vertebrates. Among the functional human orthologs of these minipig pseudogenes we found HEPN1, a putative tumor suppressor gene. The genomes of Sus scrofa, the Tibetan boar, the African Bushpig, and the Warthog show sequence conservation of all inactivating HEPN1 mutations suggesting disruption before the evolutionary split of these pig species. We identify 133 Sus scrofa specific, conserved long non-coding RNAs (lncRNAs) in the minipig genome and show that these transcripts are highly conserved in the African pigs and the Tibetan boar suggesting functional significance. Using a new minipig specific microarray we show high conservation of gene expression signatures in 13 tissues with biomedical relevance between humans and adult minipigs. We underline this relationship for minipig and human liver where we

  8. Comparative Pan-Genome Analysis of Piscirickettsia salmonis Reveals Genomic Divergences within Genogroups.

    PubMed

    Nourdin-Galindo, Guillermo; Sánchez, Patricio; Molina, Cristian F; Espinoza-Rojas, Daniela A; Oliver, Cristian; Ruiz, Pamela; Vargas-Chacoff, Luis; Cárcamo, Juan G; Figueroa, Jaime E; Mancilla, Marcos; Maracaja-Coutinho, Vinicius; Yañez, Alejandro J

    2017-01-01

    Piscirickettsia salmonis is the etiological agent of salmonid rickettsial septicemia, a disease that seriously affects the salmonid industry. Despite efforts to genomically characterize P. salmonis , functional information on the life cycle, pathogenesis mechanisms, diagnosis, treatment, and control of this fish pathogen remain lacking. To address this knowledge gap, the present study conducted an in silico pan-genome analysis of 19 P. salmonis strains from distinct geographic locations and genogroups. Results revealed an expected open pan-genome of 3,463 genes and a core-genome of 1,732 genes. Two marked genogroups were identified, as confirmed by phylogenetic and phylogenomic relationships to the LF-89 and EM-90 reference strains, as well as by assessments of genomic structures. Different structural configurations were found for the six identified copies of the ribosomal operon in the P. salmonis genome, indicating translocation throughout the genetic material. Chromosomal divergences in genomic localization and quantity of genetic cassettes were also found for the Dot/Icm type IVB secretion system. To determine divergences between core-genomes, additional pan-genome descriptions were compiled for the so-termed LF and EM genogroups. Open pan-genomes composed of 2,924 and 2,778 genes and core-genomes composed of 2,170 and 2,228 genes were respectively found for the LF and EM genogroups. The core-genomes were functionally annotated using the Gene Ontology, KEGG, and Virulence Factor databases, revealing the presence of several shared groups of genes related to basic function of intracellular survival and bacterial pathogenesis. Additionally, the specific pan-genomes for the LF and EM genogroups were defined, resulting in the identification of 148 and 273 exclusive proteins, respectively. Notably, specific virulence factors linked to adherence, colonization, invasion factors, and endotoxins were established. The obtained data suggest that these genes could be

  9. Comparative Pan-Genome Analysis of Piscirickettsia salmonis Reveals Genomic Divergences within Genogroups

    PubMed Central

    Nourdin-Galindo, Guillermo; Sánchez, Patricio; Molina, Cristian F.; Espinoza-Rojas, Daniela A.; Oliver, Cristian; Ruiz, Pamela; Vargas-Chacoff, Luis; Cárcamo, Juan G.; Figueroa, Jaime E.; Mancilla, Marcos; Maracaja-Coutinho, Vinicius; Yañez, Alejandro J.

    2017-01-01

    Piscirickettsia salmonis is the etiological agent of salmonid rickettsial septicemia, a disease that seriously affects the salmonid industry. Despite efforts to genomically characterize P. salmonis, functional information on the life cycle, pathogenesis mechanisms, diagnosis, treatment, and control of this fish pathogen remain lacking. To address this knowledge gap, the present study conducted an in silico pan-genome analysis of 19 P. salmonis strains from distinct geographic locations and genogroups. Results revealed an expected open pan-genome of 3,463 genes and a core-genome of 1,732 genes. Two marked genogroups were identified, as confirmed by phylogenetic and phylogenomic relationships to the LF-89 and EM-90 reference strains, as well as by assessments of genomic structures. Different structural configurations were found for the six identified copies of the ribosomal operon in the P. salmonis genome, indicating translocation throughout the genetic material. Chromosomal divergences in genomic localization and quantity of genetic cassettes were also found for the Dot/Icm type IVB secretion system. To determine divergences between core-genomes, additional pan-genome descriptions were compiled for the so-termed LF and EM genogroups. Open pan-genomes composed of 2,924 and 2,778 genes and core-genomes composed of 2,170 and 2,228 genes were respectively found for the LF and EM genogroups. The core-genomes were functionally annotated using the Gene Ontology, KEGG, and Virulence Factor databases, revealing the presence of several shared groups of genes related to basic function of intracellular survival and bacterial pathogenesis. Additionally, the specific pan-genomes for the LF and EM genogroups were defined, resulting in the identification of 148 and 273 exclusive proteins, respectively. Notably, specific virulence factors linked to adherence, colonization, invasion factors, and endotoxins were established. The obtained data suggest that these genes could be

  10. Comparative Genome Analyses of Serratia marcescens FS14 Reveals Its High Antagonistic Potential

    PubMed Central

    Li, Pengpeng; Kwok, Amy H. Y.; Jiang, Jingwei; Ran, Tingting; Xu, Dongqing; Wang, Weiwu; Leung, Frederick C.

    2015-01-01

    S. marcescens FS14 was isolated from an Atractylodes macrocephala Koidz plant that was infected by Fusarium oxysporum and showed symptoms of root rot. With the completion of the genome sequence of FS14, the first comprehensive comparative-genomic analysis of the Serratia genus was performed. Pan-genome and COG analyses showed that the majority of the conserved core genes are involved in basic cellular functions, while genomic factors such as prophages contribute considerably to genome diversity. Additionally, a Type I restriction-modification system, a Type III secretion system and tellurium resistance genes are found in only some Serratia species. Comparative analysis further identified that S. marcescens FS14 possesses multiple mechanisms for antagonism against other microorganisms, including the production of prodigiosin, bacteriocins, and multi-antibiotic resistant determinants as well as chitinases. The presence of two evolutionarily distinct Type VI secretion systems (T6SSs) in FS14 may provide further competitive advantages for FS14 against other microbes. To our knowledge, this is the first report of comparative analysis on T6SSs in the genus, which identifies four types of T6SSs in Serratia spp.. Competition bioassays of FS14 against the vital plant pathogenic bacterium Ralstonia solanacearum and fungi Fusarium oxysporum and Sclerotinia sclerotiorum were performed to support our genomic analyses, in which FS14 demonstrated high antagonistic activities against both bacterial and fungal phytopathogens. PMID:25856195

  11. Comparative genome map of human and cattle

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Solinas-Toldo, S.; Fries, R.; Lengauer, C.

    Chromosomal homologies between individual human chromosomes and the bovine karyotype have been established by using a new approach termed Zoo-FISH. Labeled DNA libraries from flow-sorted human chromosomes were used as probes for fluorescence in situ hybridization on cattle chromosomes. All human DNA libraries, except the Y chromosome library, hybridized to one or more cattle chromosomes, identifying and delineating 50 segments of homology, most of them corresponding to the regions of homology as identified by the previous mapping of individual conserved loci. However, Zoo-FISH refines the comparative maps constructed by molecular gene mapping of individual loci by providing information on themore » boundaries of conserved regions in the absence of obvious cytogenetic homologies of human and bovine chromosomes. It allows study of karyotypic evolution and opens new avenues for genomic analysis by facilitating the extrapolation of results from the human genome initiative. 50 refs., 3 figs., 1 tab.« less

  12. Comparative genomics of Beauveria bassiana: uncovering signatures of virulence against mosquitoes.

    PubMed

    Valero-Jiménez, Claudio A; Faino, Luigi; Spring In't Veld, Daphne; Smit, Sandra; Zwaan, Bas J; van Kan, Jan A L

    2016-12-01

    Entomopathogenic fungi such as Beauveria bassiana are promising biological agents for control of malaria mosquitoes. Indeed, infection with B. bassiana reduces the lifespan of mosquitoes in the laboratory and in the field. Natural isolates of B. bassiana show up to 10-fold differences in virulence between the most and the least virulent isolate. In this study, we sequenced the genomes of five isolates representing the extremes of low/high virulence and three RNA libraries, and applied a genome comparison approach to uncover genetic mechanisms underpinning virulence. A high-quality, near-complete genome assembly was achieved for the highly virulent isolate Bb8028, which was compared to the assemblies of the four other isolates. Whole genome analysis showed a high level of genetic diversity between the five isolates (2.85-16.8 SNPs/kb), which grouped into two distinct phylogenetic clusters. Mating type gene analysis revealed the presence of either the MAT1-1-1 or the MAT1-2-1 gene. Moreover, a putative new MAT gene (MAT1-2-8) was detected in the MAT1-2 locus. Comparative genome analysis revealed that Bb8028 contains 163 genes exclusive for this isolate. These unique genes have a tendency to cluster in the genome and to be often located near the telomeres. Among the genes unique to Bb8028 are a Non-Ribosomal Peptide Synthetase (NRPS) secondary metabolite gene cluster, a polyketide synthase (PKS) gene, and five genes with homology to bacterial toxins. A survey of candidate virulence genes for B. bassiana is presented. Our results indicate several genes and molecular processes that may underpin virulence towards mosquitoes. Thus, the genome sequences of five isolates of B. bassiana provide a better understanding of the natural variation in virulence and will offer a major resource for future research on this important biological control agent.

  13. Comparative Genomics of Erwinia amylovora and Related Erwinia Species—What do We Learn?

    PubMed Central

    Zhao, Youfu; Qi, Mingsheng

    2011-01-01

    Erwinia amylovora, the causal agent of fire blight disease of apples and pears, is one of the most important plant bacterial pathogens with worldwide economic significance. Recent reports on the complete or draft genome sequences of four species in the genus Erwinia, including E. amylovora, E. pyrifoliae, E. tasmaniensis, and E. billingiae, have provided us near complete genetic information about this pathogen and its closely-related species. This review describes in silico subtractive hybridization-based comparative genomic analyses of eight genomes currently available, and highlights what we have learned from these comparative analyses, as well as genetic and functional genomic studies. Sequence analyses reinforce the assumption that E. amylovora is a relatively homogeneous species and support the current classification scheme of E. amylovora and its related species. The potential evolutionary origin of these Erwinia species is also proposed. The current understanding of the pathogen, its virulence mechanism and host specificity from genome sequencing data is summarized. Future research directions are also suggested. PMID:24710213

  14. Draft Genomes, Phylogenetic Reconstruction, and Comparative Genomics of Two Novel Cohabiting Bacterial Symbionts Isolated from Frankliniella occidentalis.

    PubMed

    Facey, Paul D; Méric, Guillaume; Hitchings, Matthew D; Pachebat, Justin A; Hegarty, Matt J; Chen, Xiaorui; Morgan, Laura V A; Hoeppner, James E; Whitten, Miranda M A; Kirk, William D J; Dyson, Paul J; Sheppard, Sam K; Del Sol, Ricardo

    2015-07-15

    Obligate bacterial symbionts are widespread in many invertebrates, where they are often confined to specialized host cells and are transmitted directly from mother to progeny. Increasing numbers of these bacteria are being characterized but questions remain about their population structure and evolution. Here we take a comparative genomics approach to investigate two prominent bacterial symbionts (BFo1 and BFo2) isolated from geographically separated populations of western flower thrips, Frankliniella occidentalis. Our multifaceted approach to classifying these symbionts includes concatenated multilocus sequence analysis (MLSA) phylogenies, ribosomal multilocus sequence typing (rMLST), construction of whole-genome phylogenies, and in-depth genomic comparisons. We showed that the BFo1 genome clusters more closely to species in the genus Erwinia, and is a putative close relative to Erwinia aphidicola. BFo1 is also likely to have shared a common ancestor with Erwinia pyrifoliae/Erwinia amylovora and the nonpathogenic Erwinia tasmaniensis and genetic traits similar to Erwinia billingiae. The BFo1 genome contained virulence factors found in the genus Erwinia but represented a divergent lineage. In contrast, we showed that BFo2 belongs within the Enterobacteriales but does not group closely with any currently known bacterial species. Concatenated MLSA phylogenies indicate that it may have shared a common ancestor to the Erwinia and Pantoea genera, and based on the clustering of rMLST genes, it was most closely related to Pantoea ananatis but represented a divergent lineage. We reconstructed a core genome of a putative common ancestor of Erwinia and Pantoea and compared this with the genomes of BFo bacteria. BFo2 possessed none of the virulence determinants that were omnipresent in the Erwinia and Pantoea genera. Taken together, these data are consistent with BFo2 representing a highly novel species that maybe related to known Pantoea. © The Author(s) 2015. Published by

  15. Comparative analysis of the complete genome of KPC-2-producing Klebsiella pneumoniae Kp13 reveals remarkable genome plasticity and a wide repertoire of virulence and resistance mechanisms

    PubMed Central

    2014-01-01

    Background Klebsiella pneumoniae is an important opportunistic pathogen associated with nosocomial and community-acquired infections. A wide repertoire of virulence and antimicrobial resistance genes is present in K. pneumoniae genomes, which can constitute extra challenges in the treatment of infections caused by some strains. K. pneumoniae Kp13 is a multidrug-resistant strain responsible for causing a large nosocomial outbreak in a teaching hospital located in Southern Brazil. Kp13 produces K. pneumoniae carbapenemase (KPC-2) but is unrelated to isolates belonging to ST 258 and ST 11, the main clusters associated with the worldwide dissemination of KPC-producing K. pneumoniae. In this report, we perform a genomic comparison between Kp13 and each of the following three K. pneumoniae genomes: MGH 78578, NTUH-K2044 and 342. Results We have completely determined the genome of K. pneumoniae Kp13, which comprises one chromosome (5.3 Mbp) and six plasmids (0.43 Mbp). Several virulence and resistance determinants were identified in strain Kp13. Specifically, we detected genes coding for six beta-lactamases (SHV-12, OXA-9, TEM-1, CTX-M-2, SHV-110 and KPC-2), eight adhesin-related gene clusters, including regions coding for types 1 (fim) and 3 (mrk) fimbrial adhesins. The rmtG plasmidial 16S rRNA methyltransferase gene was also detected, as well as efflux pumps belonging to five different families. Mutations upstream the OmpK35 porin-encoding gene were evidenced, possibly affecting its expression. SNPs analysis relative to the compared strains revealed 141 mutations falling within CDSs related to drug resistance which could also influence the Kp13 lifestyle. Finally, the genetic apparatus for synthesis of the yersiniabactin siderophore was identified within a plasticity region. Chromosomal architectural analysis allowed for the detection of 13 regions of difference in Kp13 relative to the compared strains. Conclusions Our results indicate that the plasticity occurring at

  16. Complex Interplay among DNA Modification, Noncoding RNA Expression and Protein-Coding RNA Expression in Salvia miltiorrhiza Chloroplast Genome

    PubMed Central

    Chen, Haimei; Zhang, Jianhui; Yuan, George; Liu, Chang

    2014-01-01

    Salvia miltiorrhiza is one of the most widely used medicinal plants. As a first step to develop a chloroplast-based genetic engineering method for the over-production of active components from S. miltiorrhiza, we have analyzed the genome, transcriptome, and base modifications of the S. miltiorrhiza chloroplast. Total genomic DNA and RNA were extracted from fresh leaves and then subjected to strand-specific RNA-Seq and Single-Molecule Real-Time (SMRT) sequencing analyses. Mapping the RNA-Seq reads to the genome assembly allowed us to determine the relative expression levels of 80 protein-coding genes. In addition, we identified 19 polycistronic transcription units and 136 putative antisense and intergenic noncoding RNA (ncRNA) genes. Comparison of the abundance of protein-coding transcripts (cRNA) with and without overlapping antisense ncRNAs (asRNA) suggest that the presence of asRNA is associated with increased cRNA abundance (p<0.05). Using the SMRT Portal software (v1.3.2), 2687 potential DNA modification sites and two potential DNA modification motifs were predicted. The two motifs include a TATA box–like motif (CPGDMM1, “TATANNNATNA”), and an unknown motif (CPGDMM2 “WNYANTGAW”). Specifically, 35 of the 97 CPGDMM1 motifs (36.1%) and 91 of the 369 CPGDMM2 motifs (24.7%) were found to be significantly modified (p<0.01). Analysis of genes downstream of the CPGDMM1 motif revealed the significantly increased abundance of ncRNA genes that are less than 400 bp away from the significantly modified CPGDMM1motif (p<0.01). Taking together, the present study revealed a complex interplay among DNA modifications, ncRNA and cRNA expression in chloroplast genome. PMID:24914614

  17. Evaluation of high throughput gene expression platforms using a genomic biomarker signature for prediction of skin sensitization.

    PubMed

    Forreryd, Andy; Johansson, Henrik; Albrekt, Ann-Sofie; Lindstedt, Malin

    2014-05-16

    Allergic contact dermatitis (ACD) develops upon exposure to certain chemical compounds termed skin sensitizers. To reduce the occurrence of skin sensitizers, chemicals are regularly screened for their capacity to induce sensitization. The recently developed Genomic Allergen Rapid Detection (GARD) assay is an in vitro alternative to animal testing for identification of skin sensitizers, classifying chemicals by evaluating transcriptional levels of a genomic biomarker signature. During assay development and biomarker identification, genome-wide expression analysis was applied using microarrays covering approximately 30,000 transcripts. However, the microarray platform suffers from drawbacks in terms of low sample throughput, high cost per sample and time consuming protocols and is a limiting factor for adaption of GARD into a routine assay for screening of potential sensitizers. With the purpose to simplify assay procedures, improve technical parameters and increase sample throughput, we assessed the performance of three high throughput gene expression platforms--nCounter®, BioMark HD™ and OpenArray®--and correlated their performance metrics against our previously generated microarray data. We measured the levels of 30 transcripts from the GARD biomarker signature across 48 samples. Detection sensitivity, reproducibility, correlations and overall structure of gene expression measurements were compared across platforms. Gene expression data from all of the evaluated platforms could be used to classify most of the sensitizers from non-sensitizers in the GARD assay. Results also showed high data quality and acceptable reproducibility for all platforms but only medium to poor correlations of expression measurements across platforms. In addition, evaluated platforms were superior to the microarray platform in terms of cost efficiency, simplicity of protocols and sample throughput. We evaluated the performance of three non-array based platforms using a limited set of

  18. Genome-Wide Identification and Expression Analysis of WRKY Gene Family in Capsicum annuum L.

    PubMed

    Diao, Wei-Ping; Snyder, John C; Wang, Shu-Bin; Liu, Jin-Bing; Pan, Bao-Gui; Guo, Guang-Jun; Wei, Ge

    2016-01-01

    The WRKY family of transcription factors is one of the most important families of plant transcriptional regulators with members regulating multiple biological processes, especially in regulating defense against biotic and abiotic stresses. However, little information is available about WRKYs in pepper (Capsicum annuum L.). The recent release of completely assembled genome sequences of pepper allowed us to perform a genome-wide investigation for pepper WRKY proteins. In the present study, a total of 71 WRKY genes were identified in the pepper genome. According to structural features of their encoded proteins, the pepper WRKY genes (CaWRKY) were classified into three main groups, with the second group further divided into five subgroups. Genome mapping analysis revealed that CaWRKY were enriched on four chromosomes, especially on chromosome 1, and 15.5% of the family members were tandemly duplicated genes. A phylogenetic tree was constructed depending on WRKY domain' sequences derived from pepper and Arabidopsis. The expression of 21 selected CaWRKY genes in response to seven different biotic and abiotic stresses (salt, heat shock, drought, Phytophtora capsici, SA, MeJA, and ABA) was evaluated by quantitative RT-PCR; Some CaWRKYs were highly expressed and up-regulated by stress treatment. Our results will provide a platform for functional identification and molecular breeding studies of WRKY genes in pepper.

  19. Genome-Wide Identification and Expression Analysis of WRKY Gene Family in Capsicum annuum L.

    PubMed Central

    Diao, Wei-Ping; Snyder, John C.; Wang, Shu-Bin; Liu, Jin-Bing; Pan, Bao-Gui; Guo, Guang-Jun; Wei, Ge

    2016-01-01

    The WRKY family of transcription factors is one of the most important families of plant transcriptional regulators with members regulating multiple biological processes, especially in regulating defense against biotic and abiotic stresses. However, little information is available about WRKYs in pepper (Capsicum annuum L.). The recent release of completely assembled genome sequences of pepper allowed us to perform a genome-wide investigation for pepper WRKY proteins. In the present study, a total of 71 WRKY genes were identified in the pepper genome. According to structural features of their encoded proteins, the pepper WRKY genes (CaWRKY) were classified into three main groups, with the second group further divided into five subgroups. Genome mapping analysis revealed that CaWRKY were enriched on four chromosomes, especially on chromosome 1, and 15.5% of the family members were tandemly duplicated genes. A phylogenetic tree was constructed depending on WRKY domain' sequences derived from pepper and Arabidopsis. The expression of 21 selected CaWRKY genes in response to seven different biotic and abiotic stresses (salt, heat shock, drought, Phytophtora capsici, SA, MeJA, and ABA) was evaluated by quantitative RT-PCR; Some CaWRKYs were highly expressed and up-regulated by stress treatment. Our results will provide a platform for functional identification and molecular breeding studies of WRKY genes in pepper. PMID:26941768

  20. Comparison of gene expression in segregating families identifies genes and genomic regions involved in a novel adaptation, zinc hyperaccumulation.

    PubMed

    Filatov, Victor; Dowdle, John; Smirnoff, Nicholas; Ford-Lloyd, Brian; Newbury, H John; Macnair, Mark R

    2006-09-01

    One of the challenges of comparative genomics is to identify specific genetic changes associated with the evolution of a novel adaptation or trait. We need to be able to disassociate the genes involved with a particular character from all the other genetic changes that take place as lineages diverge. Here we show that by comparing the transcriptional profile of segregating families with that of parent species differing in a novel trait, it is possible to narrow down substantially the list of potential target genes. In addition, by assuming synteny with a related model organism for which the complete genome sequence is available, it is possible to use the cosegregation of markers differing in transcription level to identify regions of the genome which probably contain quantitative trait loci (QTLs) for the character. This novel combination of genomics and classical genetics provides a very powerful tool to identify candidate genes. We use this methodology to investigate zinc hyperaccumulation in Arabidopsis halleri, the sister species to the model plant, Arabidopsis thaliana. We compare the transcriptional profile of A. halleri with that of its sister nonaccumulator species, Arabidopsis petraea, and between accumulator and nonaccumulator F(3)s derived from the cross between the two species. We identify eight genes which consistently show greater expression in accumulator phenotypes in both roots and shoots, including two metal transporter genes (NRAMP3 and ZIP6), and cytoplasmic aconitase, a gene involved in iron homeostasis in mammals. We also show that there appear to be two QTLs for zinc accumulation, on chromosomes 3 and 7.

  1. Comparative Genomics of 12 Strains of Erwinia amylovora Identifies a Pan-Genome with a Large Conserved Core

    PubMed Central

    Mann, Rachel A.; Smits, Theo H. M.; Bühlmann, Andreas; Blom, Jochen; Goesmann, Alexander; Frey, Jürg E.; Plummer, Kim M.; Beer, Steven V.; Luck, Joanne; Duffy, Brion; Rodoni, Brendan

    2013-01-01

    The plant pathogen Erwinia amylovora can be divided into two host-specific groupings; strains infecting a broad range of hosts within the Rosaceae subfamily Spiraeoideae (e.g., Malus, Pyrus, Crataegus, Sorbus) and strains infecting Rubus (raspberries and blackberries). Comparative genomic analysis of 12 strains representing distinct populations (e.g., geographic, temporal, host origin) of E. amylovora was used to describe the pan-genome of this major pathogen. The pan-genome contains 5751 coding sequences and is highly conserved relative to other phytopathogenic bacteria comprising on average 89% conserved, core genes. The chromosomes of Spiraeoideae-infecting strains were highly homogeneous, while greater genetic diversity was observed between Spiraeoideae- and Rubus-infecting strains (and among individual Rubus-infecting strains), the majority of which was attributed to variable genomic islands. Based on genomic distance scores and phylogenetic analysis, the Rubus-infecting strain ATCC BAA-2158 was genetically more closely related to the Spiraeoideae-infecting strains of E. amylovora than it was to the other Rubus-infecting strains. Analysis of the accessory genomes of Spiraeoideae- and Rubus-infecting strains has identified putative host-specific determinants including variation in the effector protein HopX1Ea and a putative secondary metabolite pathway only present in Rubus-infecting strains. PMID:23409014

  2. Comparative genomics of 12 strains of Erwinia amylovora identifies a pan-genome with a large conserved core.

    PubMed

    Mann, Rachel A; Smits, Theo H M; Bühlmann, Andreas; Blom, Jochen; Goesmann, Alexander; Frey, Jürg E; Plummer, Kim M; Beer, Steven V; Luck, Joanne; Duffy, Brion; Rodoni, Brendan

    2013-01-01

    The plant pathogen Erwinia amylovora can be divided into two host-specific groupings; strains infecting a broad range of hosts within the Rosaceae subfamily Spiraeoideae (e.g., Malus, Pyrus, Crataegus, Sorbus) and strains infecting Rubus (raspberries and blackberries). Comparative genomic analysis of 12 strains representing distinct populations (e.g., geographic, temporal, host origin) of E. amylovora was used to describe the pan-genome of this major pathogen. The pan-genome contains 5751 coding sequences and is highly conserved relative to other phytopathogenic bacteria comprising on average 89% conserved, core genes. The chromosomes of Spiraeoideae-infecting strains were highly homogeneous, while greater genetic diversity was observed between Spiraeoideae- and Rubus-infecting strains (and among individual Rubus-infecting strains), the majority of which was attributed to variable genomic islands. Based on genomic distance scores and phylogenetic analysis, the Rubus-infecting strain ATCC BAA-2158 was genetically more closely related to the Spiraeoideae-infecting strains of E. amylovora than it was to the other Rubus-infecting strains. Analysis of the accessory genomes of Spiraeoideae- and Rubus-infecting strains has identified putative host-specific determinants including variation in the effector protein HopX1(Ea) and a putative secondary metabolite pathway only present in Rubus-infecting strains.

  3. Analysis of the Complete Mitochondrial Genome Sequence of the Diploid Cotton Gossypium raimondii by Comparative Genomics Approaches

    PubMed Central

    Paterson, Andrew H.; Wang, Xuelin; Xu, Yiqing; Wu, Dongyang; Qu, Yanshu; Jiang, Anna; Ye, Qiaolin

    2016-01-01

    Cotton is one of the most important economic crops and the primary source of natural fiber and is an important protein source for animal feed. The complete nuclear and chloroplast (cp) genome sequences of G. raimondii are already available but not mitochondria. Here, we assembled the complete mitochondrial (mt) DNA sequence of G. raimondii into a circular genome of length of 676,078 bp and performed comparative analyses with other higher plants. The genome contains 39 protein-coding genes, 6 rRNA genes, and 25 tRNA genes. We also identified four larger repeats (63.9 kb, 10.6 kb, 9.1 kb, and 2.5 kb) in this mt genome, which may be active in intramolecular recombination in the evolution of cotton. Strikingly, nearly all of the G. raimondii mt genome has been transferred to nucleus on Chr1, and the transfer event must be very recent. Phylogenetic analysis reveals that G. raimondii, as a member of Malvaceae, is much closer to another cotton (G. barbadense) than other rosids, and the clade formed by two Gossypium species is sister to Brassicales. The G. raimondii mt genome may provide a crucial foundation for evolutionary analysis, molecular biology, and cytoplasmic male sterility in cotton and other higher plants. PMID:27847816

  4. Genomic and Expression Profiling of Benign and Malignant Nerve Sheath Tumors in Neurofibromatosis Patients

    DTIC Science & Technology

    2008-05-01

    DAMD17-03-1-0297 Title: Genomic and Expression Pr ofiling of Benign and Malignant Nerve Sheath Tumors in Neurofibromatosis Patients...have determined the gene expression signature for benign and malignant peripheral nerve sheath tumors and found that the major trend in transformation...However, EGFR data in soft tissue neoplasms is limited. Using a variety of benign and malignant spindle cell neoplasms, we assessed EGFR status by

  5. Comparative genome analysis of entomopathogenic fungi reveals a complex set of secreted proteins.

    PubMed

    Staats, Charley Christian; Junges, Angela; Guedes, Rafael Lucas Muniz; Thompson, Claudia Elizabeth; de Morais, Guilherme Loss; Boldo, Juliano Tomazzoni; de Almeida, Luiz Gonzaga Paula; Andreis, Fábio Carrer; Gerber, Alexandra Lehmkuhl; Sbaraini, Nicolau; da Paixão, Rana Louise de Andrade; Broetto, Leonardo; Landell, Melissa; Santi, Lucélia; Beys-da-Silva, Walter Orlando; Silveira, Carolina Pereira; Serrano, Thaiane Rispoli; de Oliveira, Eder Silva; Kmetzsch, Lívia; Vainstein, Marilene Henning; de Vasconcelos, Ana Tereza Ribeiro; Schrank, Augusto

    2014-09-29

    Metarhizium anisopliae is an entomopathogenic fungus used in the biological control of some agricultural insect pests, and efforts are underway to use this fungus in the control of insect-borne human diseases. A large repertoire of proteins must be secreted by M. anisopliae to cope with the various available nutrients as this fungus switches through different lifestyles, i.e., from a saprophytic, to an infectious, to a plant endophytic stage. To further evaluate the predicted secretome of M. anisopliae, we employed genomic and transcriptomic analyses, coupled with phylogenomic analysis, focusing on the identification and characterization of secreted proteins. We determined the M. anisopliae E6 genome sequence and compared this sequence to other entomopathogenic fungi genomes. A robust pipeline was generated to evaluate the predicted secretomes of M. anisopliae and 15 other filamentous fungi, leading to the identification of a core of secreted proteins. Transcriptomic analysis using the tick Rhipicephalus microplus cuticle as an infection model during two periods of infection (48 and 144 h) allowed the identification of several differentially expressed genes. This analysis concluded that a large proportion of the predicted secretome coding genes contained altered transcript levels in the conditions analyzed in this study. In addition, some specific secreted proteins from Metarhizium have an evolutionary history similar to orthologs found in Beauveria/Cordyceps. This similarity suggests that a set of secreted proteins has evolved to participate in entomopathogenicity. The data presented represents an important step to the characterization of the role of secreted proteins in the virulence and pathogenicity of M. anisopliae.

  6. The bonobo genome compared with the chimpanzee and human genomes

    PubMed Central

    Prüfer, Kay; Munch, Kasper; Hellmann, Ines; Akagi, Keiko; Miller, Jason R.; Walenz, Brian; Koren, Sergey; Sutton, Granger; Kodira, Chinnappa; Winer, Roger; Knight, James R.; Mullikin, James C.; Meader, Stephen J.; Ponting, Chris P.; Lunter, Gerton; Higashino, Saneyuki; Hobolth, Asger; Dutheil, Julien; Karakoç, Emre; Alkan, Can; Sajjadian, Saba; Catacchio, Claudia Rita; Ventura, Mario; Marques-Bonet, Tomas; Eichler, Evan E.; André, Claudine; Atencia, Rebeca; Mugisha, Lawrence; Junhold, Jörg; Patterson, Nick; Siebauer, Michael; Good, Jeffrey M.; Fischer, Anne; Ptak, Susan E.; Lachmann, Michael; Symer, David E.; Mailund, Thomas; Schierup, Mikkel H.; Andrés, Aida M.; Kelso, Janet; Pääbo, Svante

    2012-01-01

    Two African apes are the closest living relatives of humans: the chimpanzee (Pan troglodytes) and the bonobo (Pan paniscus). Although they are similar in many respects, bonobos and chimpanzees differ strikingly in key social and sexual behaviours1–4, and for some of these traits they show more similarity with humans than with each other. Here we report the sequencing and assembly of the bonobo genome to study its evolutionary relationship with the chimpanzee and human genomes. We find that more than three per cent of the human genome is more closely related to either the bonobo or the chimpanzee genome than these are to each other. These regions allow various aspects of the ancestry of the two ape species to be reconstructed. In addition, many of the regions that overlap genes may eventually help us understand the genetic basis of phenotypes that humans share with one of the two apes to the exclusion of the other. PMID:22722832

  7. Comparative genomic analysis of four representative plant growth-promoting rhizobacteria in Pseudomonas.

    PubMed

    Shen, Xuemei; Hu, Hongbo; Peng, Huasong; Wang, Wei; Zhang, Xuehong

    2013-04-22

    Some Pseudomonas strains function as predominant plant growth-promoting rhizobacteria (PGPR). Within this group, Pseudomonas chlororaphis and Pseudomonas fluorescens are non-pathogenic biocontrol agents, and some Pseudomonas aeruginosa and Pseudomonas stutzeri strains are PGPR. P. chlororaphis GP72 is a plant growth-promoting rhizobacterium with a fully sequenced genome. We conducted a genomic analysis comparing GP72 with three other pseudomonad PGPR: P. fluorescens Pf-5, P. aeruginosa M18, and the nitrogen-fixing strain P. stutzeri A1501. Our aim was to identify the similarities and differences among these strains using a comparative genomic approach to clarify the mechanisms of plant growth-promoting activity. The genome sizes of GP72, Pf-5, M18, and A1501 ranged from 4.6 to 7.1 M, and the number of protein-coding genes varied among the four species. Clusters of Orthologous Groups (COGs) analysis assigned functions to predicted proteins. The COGs distributions were similar among the four species. However, the percentage of genes encoding transposases and their inactivated derivatives (COG L) was 1.33% of the total genes with COGs classifications in A1501, 0.21% in GP72, 0.02% in Pf-5, and 0.11% in M18. A phylogenetic analysis indicated that GP72 and Pf-5 were the most closely related strains, consistent with the genome alignment results. Comparisons of predicted coding sequences (CDSs) between GP72 and Pf-5 revealed 3544 conserved genes. There were fewer conserved genes when GP72 CDSs were compared with those of A1501 and M18. Comparisons among the four Pseudomonas species revealed 603 conserved genes in GP72, illustrating common plant growth-promoting traits shared among these PGPR. Conserved genes were related to catabolism, transport of plant-derived compounds, stress resistance, and rhizosphere colonization. Some strain-specific CDSs were related to different kinds of biocontrol activities or plant growth promotion. The GP72 genome contained the cus operon

  8. Comparative Genomic Analysis of Drechmeria coniospora Reveals Core and Specific Genetic Requirements for Fungal Endoparasitism of Nematodes

    PubMed Central

    Thakur, Nishant; Arguel, Marie-Jeanne; Polanowska, Jolanta; Henrissat, Bernard; Record, Eric; Magdelenat, Ghislaine; Barbe, Valérie; Raffaele, Sylvain; Barbry, Pascal

    2016-01-01

    Drechmeria coniospora is an obligate fungal pathogen that infects nematodes via the adhesion of specialized spores to the host cuticle. D. coniospora is frequently found associated with Caenorhabditis elegans in environmental samples. It is used in the study of the nematode’s response to fungal infection. Full understanding of this bi-partite interaction requires knowledge of the pathogen’s genome, analysis of its gene expression program and a capacity for genetic engineering. The acquisition of all three is reported here. A phylogenetic analysis placed D. coniospora close to the truffle parasite Tolypocladium ophioglossoides, and Hirsutella minnesotensis, another nematophagous fungus. Ascomycete nematopathogenicity is polyphyletic; D. coniospora represents a branch that has not been molecularly characterized. A detailed in silico functional analysis, comparing D. coniospora to 11 fungal species, revealed genes and gene families potentially involved in virulence and showed it to be a highly specialized pathogen. A targeted comparison with nematophagous fungi highlighted D. coniospora-specific genes and a core set of genes associated with nematode parasitism. A comparative gene expression analysis of samples from fungal spores and mycelia, and infected C. elegans, gave a molecular view of the different stages of the D. coniospora lifecycle. Transformation of D. coniospora allowed targeted gene knock-out and the production of fungus that expresses fluorescent reporter genes. It also permitted the initial characterisation of a potential fungal counter-defensive strategy, involving interference with a host antimicrobial mechanism. This high-quality annotated genome for D. coniospora gives insights into the evolution and virulence of nematode-destroying fungi. Coupled with genetic transformation, it opens the way for molecular dissection of D. coniospora physiology, and will allow both sides of the interaction between D. coniospora and C. elegans, as well as the

  9. A genomic survey of the fish parasite Spironucleus salmonicida indicates genomic plasticity among diplomonads and significant lateral gene transfer in eukaryote genome evolution

    PubMed Central

    Andersson, Jan O; Sjögren, Åsa M; Horner, David S; Murphy, Colleen A; Dyal, Patricia L; Svärd, Staffan G; Logsdon, John M; Ragan, Mark A; Hirt, Robert P; Roger, Andrew J

    2007-01-01

    Background Comparative genomic studies of the mitochondrion-lacking protist group Diplomonadida (diplomonads) has been lacking, although Giardia lamblia has been intensively studied. We have performed a sequence survey project resulting in 2341 expressed sequence tags (EST) corresponding to 853 unique clones, 5275 genome survey sequences (GSS), and eleven finished contigs from the diplomonad fish parasite Spironucleus salmonicida (previously described as S. barkhanus). Results The analyses revealed a compact genome with few, if any, introns and very short 3' untranslated regions. Strikingly different patterns of codon usage were observed in genes corresponding to frequently sampled ESTs versus genes poorly sampled, indicating that translational selection is influencing the codon usage of highly expressed genes. Rigorous phylogenomic analyses identified 84 genes – mostly encoding metabolic proteins – that have been acquired by diplomonads or their relatively close ancestors via lateral gene transfer (LGT). Although most acquisitions were from prokaryotes, more than a dozen represent likely transfers of genes between eukaryotic lineages. Many genes that provide novel insights into the genetic basis of the biology and pathogenicity of this parasitic protist were identified including 149 that putatively encode variant-surface cysteine-rich proteins which are candidate virulence factors. A number of genomic properties that distinguish S. salmonicida from its human parasitic relative G. lamblia were identified such as nineteen putative lineage-specific gene acquisitions, distinct mutational biases and codon usage and distinct polyadenylation signals. Conclusion Our results highlight the power of comparative genomic studies to yield insights into the biology of parasitic protists and the evolution of their genomes, and suggest that genetic exchange between distantly-related protist lineages may be occurring at an appreciable rate in eukaryote genome evolution. PMID

  10. Complete mitochondrial genome of Concholepas concholepas inferred by 454 pyrosequencing and mtDNA expression in two mollusc populations.

    PubMed

    Núñez-Acuña, Gustavo; Aguilar-Espinoza, Andrea; Gallardo-Escárate, Cristian

    2013-03-01

    Despite the great relevance of mitochondrial genome analysis in evolutionary studies, there is scarce information on how the transcripts associated with the mitogenome are expressed and their role in the genetic structuring of populations. This work reports the complete mitochondrial genome of the marine gastropod Concholepas concholepas, obtained by 454 pryosequencing, and an analysis of mitochondrial transcripts of two populations 1000 km apart along the Chilean coast. The mitochondrion of C. concholepas is 15,495 base pairs (bp) in size and contains the 37 subunits characteristic of metazoans, as well as a non-coding region of 330 bp. In silico analysis of mitochondrial gene variability showed significant differences among populations. In terms of levels of relative abundance of transcripts associated with mitochondrion in the two populations (assessed by qPCR), the genes associated with complexes III and IV of the mitochondrial genome had the highest levels of expression in the northern population while transcripts associated with the ATP synthase complex had the highest levels of expression in the southern population. Moreover, fifteen polymorphic SNPs were identified in silico between the mitogenomes of the two populations. Four of these markers implied different amino acid substitutions (non-synonymous SNPs). This work contributes novel information regarding the mitochondrial genome structure and mRNA expression levels of C. concholepas. Copyright © 2012 Elsevier Inc. All rights reserved.

  11. Reconstruction of a composite comparative map composed of ten legume genomes.

    PubMed

    Lee, Chaeyoung; Yu, Dongwoon; Choi, Hong-Kyu; Kim, Ryan W

    2017-01-01

    The Fabaceae (legume family) is the third largest and the second of agricultural importance among flowering plant groups. In this study, we report the reconstruction of a composite comparative map composed of ten legume genomes, including seven species from the galegoid clade ( Medicago truncatula , Medicago sativa , Lens culinaris, Pisum sativum , Lotus japonicus , Cicer arietinum , Vicia faba ) and three species from the phaseoloid clade ( Vigna radiata , Phaseolus vulgaris , Glycine max ). To accomplish this comparison, a total of 209 cross-species gene-derived markers were employed. The comparative analysis resulted in a single extensive genetic/genomic network composed of 93 chromosomes or linkage groups, from which 110 synteny blocks and other evolutionary events (e.g., 13 inversions) were identified. This comparative map also allowed us to deduce several large scale evolutionary events, such as chromosome fusion/fission, with which might explain differences in chromosome numbers among compared species or between the two clades. As a result, useful properties of cross-species genic markers were re-verified as an efficient tool for cross-species translation of genomic information, and similar approaches, combined with a high throughput bioinformatic marker design program, should be effective for applying the knowledge of trait-associated genes to other important crop species for breeding purposes. Here, we provide a basic comparative framework for the ten legume species, and expect to be usefully applied towards the crop improvement in legume breeding.

  12. The Brassica oleracea genome reveals the asymmetrical evolution of polyploid genomes

    PubMed Central

    Liu, Shengyi; Liu, Yumei; Yang, Xinhua; Tong, Chaobo; Edwards, David; Parkin, Isobel A. P.; Zhao, Meixia; Ma, Jianxin; Yu, Jingyin; Huang, Shunmou; Wang, Xiyin; Wang, Junyi; Lu, Kun; Fang, Zhiyuan; Bancroft, Ian; Yang, Tae-Jin; Hu, Qiong; Wang, Xinfa; Yue, Zhen; Li, Haojie; Yang, Linfeng; Wu, Jian; Zhou, Qing; Wang, Wanxin; King, Graham J; Pires, J. Chris; Lu, Changxin; Wu, Zhangyan; Sampath, Perumal; Wang, Zhuo; Guo, Hui; Pan, Shengkai; Yang, Limei; Min, Jiumeng; Zhang, Dong; Jin, Dianchuan; Li, Wanshun; Belcram, Harry; Tu, Jinxing; Guan, Mei; Qi, Cunkou; Du, Dezhi; Li, Jiana; Jiang, Liangcai; Batley, Jacqueline; Sharpe, Andrew G; Park, Beom-Seok; Ruperao, Pradeep; Cheng, Feng; Waminal, Nomar Espinosa; Huang, Yin; Dong, Caihua; Wang, Li; Li, Jingping; Hu, Zhiyong; Zhuang, Mu; Huang, Yi; Huang, Junyan; Shi, Jiaqin; Mei, Desheng; Liu, Jing; Lee, Tae-Ho; Wang, Jinpeng; Jin, Huizhe; Li, Zaiyun; Li, Xun; Zhang, Jiefu; Xiao, Lu; Zhou, Yongming; Liu, Zhongsong; Liu, Xuequn; Qin, Rui; Tang, Xu; Liu, Wenbin; Wang, Yupeng; Zhang, Yangyong; Lee, Jonghoon; Kim, Hyun Hee; Denoeud, France; Xu, Xun; Liang, Xinming; Hua, Wei; Wang, Xiaowu; Wang, Jun; Chalhoub, Boulos; Paterson, Andrew H

    2014-01-01

    Polyploidization has provided much genetic variation for plant adaptive evolution, but the mechanisms by which the molecular evolution of polyploid genomes establishes genetic architecture underlying species differentiation are unclear. Brassica is an ideal model to increase knowledge of polyploid evolution. Here we describe a draft genome sequence of Brassica oleracea, comparing it with that of its sister species B. rapa to reveal numerous chromosome rearrangements and asymmetrical gene loss in duplicated genomic blocks, asymmetrical amplification of transposable elements, differential gene co-retention for specific pathways and variation in gene expression, including alternative splicing, among a large number of paralogous and orthologous genes. Genes related to the production of anticancer phytochemicals and morphological variations illustrate consequences of genome duplication and gene divergence, imparting biochemical and morphological variation to B. oleracea. This study provides insights into Brassica genome evolution and will underpin research into the many important crops in this genus. PMID:24852848

  13. Sampling Daphnia's expressed genes: preservation, expansion and invention of crustacean genes with reference to insect genomes

    PubMed Central

    Colbourne, John K; Eads, Brian D; Shaw, Joseph; Bohuski, Elizabeth; Bauer, Darren J; Andrews, Justen

    2007-01-01

    Background Functional and comparative studies of insect genomes have shed light on the complement of genes, which in part, account for shared morphologies, developmental programs and life-histories. Contrasting the gene inventories of insects to those of the nematodes provides insight into the genomic changes responsible for their diversification. However, nematodes have weak relationships to insects, as each belongs to separate animal phyla. A better outgroup to distinguish lineage specific novelties would include other members of Arthropoda. For example, crustaceans are close allies to the insects (together forming Pancrustacea) and their fascinating aquatic lifestyle provides an important comparison for understanding the genetic basis of adaptations to life on land versus life in water. Results This study reports on the first characterization of cDNA libraries and sequences for the model crustacean Daphnia pulex. We analyzed 1,546 ESTs of which 1,414 represent approximately 787 nuclear genes, by measuring their sequence similarities with insect and nematode proteomes. The provisional annotation of genes is supported by expression data from microarray studies described in companion papers. Loci expected to be shared between crustaceans and insects because of their mutual biological features are identified, including genes for reproduction, regulation and cellular processes. We identify genes that are likely derived within Pancrustacea or lost within the nematodes. Moreover, lineage specific gene family expansions are identified, which suggest certain biological demands associated with their ecological setting. In particular, up to seven distinct ferritin loci are found in Daphnia compared to three in most insects. Finally, a substantial fraction of the sampled gene transcripts shares no sequence similarity with those from other arthropods. Genes functioning during development and reproduction are comparatively well conserved between crustaceans and insects. By

  14. Genome-wide expression profiling of five mouse models identifies similarities and differences with human psoriasis.

    PubMed

    Swindell, William R; Johnston, Andrew; Carbajal, Steve; Han, Gangwen; Wohn, Christian; Lu, Jun; Xing, Xianying; Nair, Rajan P; Voorhees, John J; Elder, James T; Wang, Xiao-Jing; Sano, Shigetoshi; Prens, Errol P; DiGiovanni, John; Pittelkow, Mark R; Ward, Nicole L; Gudjonsson, Johann E

    2011-04-04

    Development of a suitable mouse model would facilitate the investigation of pathomechanisms underlying human psoriasis and would also assist in development of therapeutic treatments. However, while many psoriasis mouse models have been proposed, no single model recapitulates all features of the human disease, and standardized validation criteria for psoriasis mouse models have not been widely applied. In this study, whole-genome transcriptional profiling is used to compare gene expression patterns manifested by human psoriatic skin lesions with those that occur in five psoriasis mouse models (K5-Tie2, imiquimod, K14-AREG, K5-Stat3C and K5-TGFbeta1). While the cutaneous gene expression profiles associated with each mouse phenotype exhibited statistically significant similarity to the expression profile of psoriasis in humans, each model displayed distinctive sets of similarities and differences in comparison to human psoriasis. For all five models, correspondence to the human disease was strong with respect to genes involved in epidermal development and keratinization. Immune and inflammation-associated gene expression, in contrast, was more variable between models as compared to the human disease. These findings support the value of all five models as research tools, each with identifiable areas of convergence to and divergence from the human disease. Additionally, the approach used in this paper provides an objective and quantitative method for evaluation of proposed mouse models of psoriasis, which can be strategically applied in future studies to score strengths of mouse phenotypes relative to specific aspects of human psoriasis.

  15. Comparative genomics reveals tissue-specific regulation of prolactin receptor gene expression

    USDA-ARS?s Scientific Manuscript database

    Prolactin (PRL), acting via the prolactin receptor, fulfills a diversity of biological functions including the maintenance of solute balance and mineral homeostasis via tissues such as the heart, kidneys and intestine. Expression and activity of the prolactin receptor (PRLR) is regulated by various ...

  16. Comparative Analysis of Genome Sequences Covering the Seven Cronobacter Species

    PubMed Central

    Cummings, Craig A.; Shih, Rita; Degoricija, Lovorka; Rico, Alain; Brzoska, Pius; Hamby, Stephen E.; Masood, Naqash; Hariri, Sumyya; Sonbol, Hana; Chuzhanova, Nadia; McClelland, Michael; Furtado, Manohar R.; Forsythe, Stephen J.

    2012-01-01

    Background Species of Cronobacter are widespread in the environment and are occasional food-borne pathogens associated with serious neonatal diseases, including bacteraemia, meningitis, and necrotising enterocolitis. The genus is composed of seven species: C. sakazakii, C. malonaticus, C. turicensis, C. dublinensis, C. muytjensii, C. universalis, and C. condimenti. Clinical cases are associated with three species, C. malonaticus, C. turicensis and, in particular, with C. sakazakii multilocus sequence type 4. Thus, it is plausible that virulence determinants have evolved in certain lineages. Methodology/Principal Findings We generated high quality sequence drafts for eleven Cronobacter genomes representing the seven Cronobacter species, including an ST4 strain of C. sakazakii. Comparative analysis of these genomes together with the two publicly available genomes revealed Cronobacter has over 6,000 genes in one or more strains and over 2,000 genes shared by all Cronobacter. Considerable variation in the presence of traits such as type six secretion systems, metal resistance (tellurite, copper and silver), and adhesins were found. C. sakazakii is unique in the Cronobacter genus in encoding genes enabling the utilization of exogenous sialic acid which may have clinical significance. The C. sakazakii ST4 strain 701 contained additional genes as compared to other C. sakazakii but none of them were known specific virulence-related genes. Conclusions/Significance Genome comparison revealed that pair-wise DNA sequence identity varies between 89 and 97% in the seven Cronobacter species, and also suggested various degrees of divergence. Sets of universal core genes and accessory genes unique to each strain were identified. These gene sequences can be used for designing genus/species specific detection assays. Genes encoding adhesins, T6SS, and metal resistance genes as well as prophages are found in only subsets of genomes and have contributed considerably to the variation of

  17. Comparative Genome Sequence Analysis of the Bpa/Str Region in Mouse and Man

    PubMed Central

    Mallon, A.-M.; Platzer, M.; Bate, R.; Gloeckner, G.; Botcherby, M.R.M.; Nordsiek, G.; Strivens, M.A.; Kioschis, P.; Dangel, A.; Cunningham, D.; Straw, R.N.A.; Weston, P.; Gilbert, M.; Fernando, S.; Goodall, K.; Hunter, G.; Greystrong, J.S.; Clarke, D.; Kimberley, C.; Goerdes, M.; Blechschmidt, K.; Rump, A.; Hinzmann, B.; Mundy, C.R.; Miller, W.; Poustka, A.; Herman, G.E.; Rhodes, M.; Denny, P.; Rosenthal, A.; Brown, S.D.M.

    2000-01-01

    The progress of human and mouse genome sequencing programs presages the possibility of systematic cross-species comparison of the two genomes as a powerful tool for gene and regulatory element identification. As the opportunities to perform comparative sequence analysis emerge, it is important to develop parameters for such analyses and to examine the outcomes of cross-species comparison. Our analysis used gene prediction and a database search of 430 kb of genomic sequence covering the Bpa/Str region of the mouse X chromosome, and 745 kb of genomic sequence from the homologous human X chromosome region. We identified 11 genes in mouse and 13 genes and two pseudogenes in human. In addition, we compared the mouse and human sequences using pairwise alignment and searches for evolutionary conserved regions (ECRs) exceeding a defined threshold of sequence identity. This approach aided the identification of at least four further putative conserved genes in the region. Comparative sequencing revealed that this region is a mosaic in evolutionary terms, with considerably more rearrangement between the two species than realized previously from comparative mapping studies. Surprisingly, this region showed an extremely high LINE and low SINE content, low G+C content, and yet a relatively high gene density, in contrast to the low gene density usually associated with such regions. [The sequence data described in this paper have been submitted to EMBL under the following accession nos.: Mouse Genomic Sequence: Mouse contig A (AL021127), Mouse contig B (AL049866), BAC41M10 (AL136328), PAC303O11(AL136329). Human Genomic Sequence: Human contig 1 (U82671, U82670), Human contig 2 (U82695).] PMID:10854409

  18. MOSAIC: an online database dedicated to the comparative genomics of bacterial strains at the intra-species level.

    PubMed

    Chiapello, Hélène; Gendrault, Annie; Caron, Christophe; Blum, Jérome; Petit, Marie-Agnès; El Karoui, Meriem

    2008-11-27

    The recent availability of complete sequences for numerous closely related bacterial genomes opens up new challenges in comparative genomics. Several methods have been developed to align complete genomes at the nucleotide level but their use and the biological interpretation of results are not straightforward. It is therefore necessary to develop new resources to access, analyze, and visualize genome comparisons. Here we present recent developments on MOSAIC, a generalist comparative bacterial genome database. This database provides the bacteriologist community with easy access to comparisons of complete bacterial genomes at the intra-species level. The strategy we developed for comparison allows us to define two types of regions in bacterial genomes: backbone segments (i.e., regions conserved in all compared strains) and variable segments (i.e., regions that are either specific to or variable in one of the aligned genomes). Definition of these segments at the nucleotide level allows precise comparative and evolutionary analyses of both coding and non-coding regions of bacterial genomes. Such work is easily performed using the MOSAIC Web interface, which allows browsing and graphical visualization of genome comparisons. The MOSAIC database now includes 493 pairwise comparisons and 35 multiple maximal comparisons representing 78 bacterial species. Genome conserved regions (backbones) and variable segments are presented in various formats for further analysis. A graphical interface allows visualization of aligned genomes and functional annotations. The MOSAIC database is available online at http://genome.jouy.inra.fr/mosaic.

  19. Divergent and convergent modes of interaction between wheat and Puccinia graminis f. sp. tritici isolates revealed by the comparative gene co-expression network and genome analyses.

    PubMed

    Rutter, William B; Salcedo, Andres; Akhunova, Alina; He, Fei; Wang, Shichen; Liang, Hanquan; Bowden, Robert L; Akhunov, Eduard

    2017-04-12

    Two opposing evolutionary constraints exert pressure on plant pathogens: one to diversify virulence factors in order to evade plant defenses, and the other to retain virulence factors critical for maintaining a compatible interaction with the plant host. To better understand how the diversified arsenals of fungal genes promote interaction with the same compatible wheat line, we performed a comparative genomic analysis of two North American isolates of Puccinia graminis f. sp. tritici (Pgt). The patterns of inter-isolate divergence in the secreted candidate effector genes were compared with the levels of conservation and divergence of plant-pathogen gene co-expression networks (GCN) developed for each isolate. Comprative genomic analyses revealed substantial level of interisolate divergence in effector gene complement and sequence divergence. Gene Ontology (GO) analyses of the conserved and unique parts of the isolate-specific GCNs identified a number of conserved host pathways targeted by both isolates. Interestingly, the degree of inter-isolate sub-network conservation varied widely for the different host pathways and was positively associated with the proportion of conserved effector candidates associated with each sub-network. While different Pgt isolates tended to exploit similar wheat pathways for infection, the mode of plant-pathogen interaction varied for different pathways with some pathways being associated with the conserved set of effectors and others being linked with the diverged or isolate-specific effectors. Our data suggest that at the intra-species level pathogen populations likely maintain divergent sets of effectors capable of targeting the same plant host pathways. This functional redundancy may play an important role in the dynamic of the "arms-race" between host and pathogen serving as the basis for diverse virulence strategies and creating conditions where mutations in certain effector groups will not have a major effect on the pathogen

  20. The Perennial Ryegrass GenomeZipper: Targeted Use of Genome Resources for Comparative Grass Genomics1[C][W

    PubMed Central

    Pfeifer, Matthias; Martis, Mihaela; Asp, Torben; Mayer, Klaus F.X.; Lübberstedt, Thomas; Byrne, Stephen; Frei, Ursula; Studer, Bruno

    2013-01-01

    Whole-genome sequences established for model and major crop species constitute a key resource for advanced genomic research. For outbreeding forage and turf grass species like ryegrasses (Lolium spp.), such resources have yet to be developed. Here, we present a model of the perennial ryegrass (Lolium perenne) genome on the basis of conserved synteny to barley (Hordeum vulgare) and the model grass genome Brachypodium (Brachypodium distachyon) as well as rice (Oryza sativa) and sorghum (Sorghum bicolor). A transcriptome-based genetic linkage map of perennial ryegrass served as a scaffold to establish the chromosomal arrangement of syntenic genes from model grass species. This scaffold revealed a high degree of synteny and macrocollinearity and was then utilized to anchor a collection of perennial ryegrass genes in silico to their predicted genome positions. This resulted in the unambiguous assignment of 3,315 out of 8,876 previously unmapped genes to the respective chromosomes. In total, the GenomeZipper incorporates 4,035 conserved grass gene loci, which were used for the first genome-wide sequence divergence analysis between perennial ryegrass, barley, Brachypodium, rice, and sorghum. The perennial ryegrass GenomeZipper is an ordered, information-rich genome scaffold, facilitating map-based cloning and genome assembly in perennial ryegrass and closely related Poaceae species. It also represents a milestone in describing synteny between perennial ryegrass and fully sequenced model grass genomes, thereby increasing our understanding of genome organization and evolution in the most important temperate forage and turf grass species. PMID:23184232

  1. A glimpse of streptococcal toxic shock syndrome from comparative genomics of S. suis 2 Chinese isolates.

    PubMed

    Chen, Chen; Tang, Jiaqi; Dong, Wei; Wang, Changjun; Feng, Youjun; Wang, Jing; Zheng, Feng; Pan, Xiuzhen; Liu, Di; Li, Ming; Song, Yajun; Zhu, Xinxing; Sun, Haibo; Feng, Tao; Guo, Zhaobiao; Ju, Aiping; Ge, Junchao; Dong, Yaqing; Sun, Wen; Jiang, Yongqiang; Wang, Jun; Yan, Jinghua; Yang, Huanming; Wang, Xiaoning; Gao, George F; Yang, Ruifu; Wang, Jian; Yu, Jun

    2007-03-21

    Streptococcus suis serotype 2 (SS2) is an important zoonotic pathogen, causing more than 200 cases of severe human infection worldwide, with the hallmarks of meningitis, septicemia, arthritis, etc. Very recently, SS2 has been recognized as an etiological agent for streptococcal toxic shock syndrome (STSS), which was originally associated with Streptococcus pyogenes (GAS) in Streptococci. However, the molecular mechanisms underlying STSS are poorly understood. To elucidate the genetic determinants of STSS caused by SS2, whole genome sequencing of 3 different Chinese SS2 strains was undertaken. Comparative genomics accompanied by several lines of experiments, including experimental animal infection, PCR assay, and expression analysis, were utilized to further dissect a candidate pathogenicity island (PAI). Here we show, for the first time, a novel molecular insight into Chinese isolates of highly invasive SS2, which caused two large-scale human STSS outbreaks in China. A candidate PAI of approximately 89 kb in length, which is designated 89K and specific for Chinese SS2 virulent isolates, was investigated at the genomic level. It shares the universal properties of PAIs such as distinct GC content, consistent with its pivotal role in STSS and high virulence. To our knowledge, this is the first PAI candidate from S. suis worldwide. Our finding thus sheds light on STSS triggered by SS2 at the genomic level, facilitates further understanding of its pathogenesis and points to directions of development on some effective strategies to combat highly pathogenic SS2 infections.

  2. BloodChIP: a database of comparative genome-wide transcription factor binding profiles in human blood cells.

    PubMed

    Chacon, Diego; Beck, Dominik; Perera, Dilmi; Wong, Jason W H; Pimanda, John E

    2014-01-01

    The BloodChIP database (http://www.med.unsw.edu.au/CRCWeb.nsf/page/BloodChIP) supports exploration and visualization of combinatorial transcription factor (TF) binding at a particular locus in human CD34-positive and other normal and leukaemic cells or retrieval of target gene sets for user-defined combinations of TFs across one or more cell types. Increasing numbers of genome-wide TF binding profiles are being added to public repositories, and this trend is likely to continue. For the power of these data sets to be fully harnessed by experimental scientists, there is a need for these data to be placed in context and easily accessible for downstream applications. To this end, we have built a user-friendly database that has at its core the genome-wide binding profiles of seven key haematopoietic TFs in human stem/progenitor cells. These binding profiles are compared with binding profiles in normal differentiated and leukaemic cells. We have integrated these TF binding profiles with chromatin marks and expression data in normal and leukaemic cell fractions. All queries can be exported into external sites to construct TF-gene and protein-protein networks and to evaluate the association of genes with cellular processes and tissue expression.

  3. flyDIVaS: A Comparative Genomics Resource for Drosophila Divergence and Selection

    PubMed Central

    Stanley, Craig E.; Kulathinal, Rob J.

    2016-01-01

    With arguably the best finished and expertly annotated genome assembly, Drosophila melanogaster is a formidable genetics model to study all aspects of biology. Nearly a decade ago, the 12 Drosophila genomes project expanded D. melanogaster’s breadth as a comparative model through the community-development of an unprecedented genus- and genome-wide comparative resource. However, since its inception, these datasets for evolutionary inference and biological discovery have become increasingly outdated, outmoded, and inaccessible. Here, we provide an updated and upgradable comparative genomics resource of Drosophila divergence and selection, flyDIVaS, based on the latest genomic assemblies, curated FlyBase annotations, and recent OrthoDB orthology calls. flyDIVaS is an online database containing D. melanogaster-centric orthologous gene sets, CDS and protein alignments, divergence statistics (% gaps, dN, dS, dN/dS), and codon-based tests of positive Darwinian selection. Out of 13,920 protein-coding D. melanogaster genes, ∼80% have one aligned ortholog in the closely related species, D. simulans, and ∼50% have 1–1 12-way alignments in the original 12 sequenced species that span over 80 million yr of divergence. Genes and their orthologs can be chosen from four different taxonomic datasets differing in phylogenetic depth and coverage density, and visualized via interactive alignments and phylogenetic trees. Users can also batch download entire comparative datasets. A functional survey finds conserved mitotic and neural genes, highly diverged immune and reproduction-related genes, more conspicuous signals of divergence across tissue-specific genes, and an enrichment of positive selection among highly diverged genes. flyDIVaS will be regularly updated and can be freely accessed at www.flydivas.info. We encourage researchers to regularly use this resource as a tool for biological inference and discovery, and in their classrooms to help train the next generation of

  4. flyDIVaS: A Comparative Genomics Resource for Drosophila Divergence and Selection.

    PubMed

    Stanley, Craig E; Kulathinal, Rob J

    2016-08-09

    With arguably the best finished and expertly annotated genome assembly, Drosophila melanogaster is a formidable genetics model to study all aspects of biology. Nearly a decade ago, the 12 Drosophila genomes project expanded D. melanogaster's breadth as a comparative model through the community-development of an unprecedented genus- and genome-wide comparative resource. However, since its inception, these datasets for evolutionary inference and biological discovery have become increasingly outdated, outmoded, and inaccessible. Here, we provide an updated and upgradable comparative genomics resource of Drosophila divergence and selection, flyDIVaS, based on the latest genomic assemblies, curated FlyBase annotations, and recent OrthoDB orthology calls. flyDIVaS is an online database containing D. melanogaster-centric orthologous gene sets, CDS and protein alignments, divergence statistics (% gaps, dN, dS, dN/dS), and codon-based tests of positive Darwinian selection. Out of 13,920 protein-coding D. melanogaster genes, ∼80% have one aligned ortholog in the closely related species, D. simulans, and ∼50% have 1-1 12-way alignments in the original 12 sequenced species that span over 80 million yr of divergence. Genes and their orthologs can be chosen from four different taxonomic datasets differing in phylogenetic depth and coverage density, and visualized via interactive alignments and phylogenetic trees. Users can also batch download entire comparative datasets. A functional survey finds conserved mitotic and neural genes, highly diverged immune and reproduction-related genes, more conspicuous signals of divergence across tissue-specific genes, and an enrichment of positive selection among highly diverged genes. flyDIVaS will be regularly updated and can be freely accessed at www.flydivas.info We encourage researchers to regularly use this resource as a tool for biological inference and discovery, and in their classrooms to help train the next generation of

  5. An initial comparative map of copy number variations in the goat (Capra hircus) genome

    PubMed Central

    2010-01-01

    Background The goat (Capra hircus) represents one of the most important farm animal species. It is reared in all continents with an estimated world population of about 800 million of animals. Despite its importance, studies on the goat genome are still in their infancy compared to those in other farm animal species. Comparative mapping between cattle and goat showed only a few rearrangements in agreement with the similarity of chromosome banding. We carried out a cross species cattle-goat array comparative genome hybridization (aCGH) experiment in order to identify copy number variations (CNVs) in the goat genome analysing animals of different breeds (Saanen, Camosciata delle Alpi, Girgentana, and Murciano-Granadina) using a tiling oligonucleotide array with ~385,000 probes designed on the bovine genome. Results We identified a total of 161 CNVs (an average of 17.9 CNVs per goat), with the largest number in the Saanen breed and the lowest in the Camosciata delle Alpi goat. By aggregating overlapping CNVs identified in different animals we determined CNV regions (CNVRs): on the whole, we identified 127 CNVRs covering about 11.47 Mb of the virtual goat genome referred to the bovine genome (0.435% of the latter genome). These 127 CNVRs included 86 loss and 41 gain and ranged from about 24 kb to about 1.07 Mb with a mean and median equal to 90,292 bp and 49,530 bp, respectively. To evaluate whether the identified goat CNVRs overlap with those reported in the cattle genome, we compared our results with those obtained in four independent cattle experiments. Overlapping between goat and cattle CNVRs was highly significant (P < 0.0001) suggesting that several chromosome regions might contain recurrent interspecies CNVRs. Genes with environmental functions were over-represented in goat CNVRs as reported in other mammals. Conclusions We describe a first map of goat CNVRs. This provides information on a comparative basis with the cattle genome by identifying putative

  6. Comparative genome analysis of two Streptococcus phocae subspecies provides novel insights into pathogenicity.

    PubMed

    Bethke, J; Avendaño-Herrera, R

    2017-02-01

    Streptococcus phocae is a beta-hemolytic, Gram-positive bacterium that was first isolated in Norway from clinical specimens of harbor seal (Phoca vitulina) affected by pneumonia or respiratory infection, and in 2005, this bacterium was identified from disease outbreaks at an Atlantic salmon farm. A recent comparative polyphasic study reclassified Streptococcus phocae as subsp. phocae and subsp. salmonis, and there are currently two S. phocae NCBI sequencing projects for the type strains ATCC 51973 T and C-4 T . The present study compared these genome sequences to determine shared properties between the pathogenic mammalian and fish S. phocae subspecies. Both subspecies presented genomic islands, prophages, CRISPRs, and multiple gene activator and RofA regulator regions that could play key roles in the pathogenesis of streptococcal species. Likewise, proteins possibly influencing immune system evasion and virulence strategies were identified in both genomes, including Streptokinases, Streptolysin S, IgG endopeptidase, Fibronectin binding proteins, Daunorubicin, and Penicillin resistance proteins. Comparative differences in phage, non-phage, and genomic island sequences may form the genetic basis for the virulence, pathogenicity, and ability of S. phocae subsp. salmonis to infect and cause disease in Atlantic salmon, in contrast to S. phocae subsp. phocae. This comparative genomic study between two S. phocae subsp. provides novel insights into virulence factors and pathogenicity, offering important information that will facilitate the development of preventive and treatment measures against this pathogen. Copyright © 2016 Elsevier B.V. All rights reserved.

  7. GeneCount: genome-wide calculation of absolute tumor DNA copy numbers from array comparative genomic hybridization data

    PubMed Central

    Lyng, Heidi; Lando, Malin; Brøvig, Runar S; Svendsrud, Debbie H; Johansen, Morten; Galteland, Eivind; Brustugun, Odd T; Meza-Zepeda, Leonardo A; Myklebost, Ola; Kristensen, Gunnar B; Hovig, Eivind; Stokke, Trond

    2008-01-01

    Absolute tumor DNA copy numbers can currently be achieved only on a single gene basis by using fluorescence in situ hybridization (FISH). We present GeneCount, a method for genome-wide calculation of absolute copy numbers from clinical array comparative genomic hybridization data. The tumor cell fraction is reliably estimated in the model. Data consistent with FISH results are achieved. We demonstrate significant improvements over existing methods for exploring gene dosages and intratumor copy number heterogeneity in cancers. PMID:18500990

  8. Genome-wide analysis of AR binding and comparison with transcript expression in primary human fetal prostate fibroblasts and cancer associated fibroblasts.

    PubMed

    Nash, Claire; Boufaied, Nadia; Mills, Ian G; Franco, Omar E; Hayward, Simon W; Thomson, Axel A

    2017-05-05

    The androgen receptor (AR) is a transcription factor, and key regulator of prostate development and cancer, which has discrete functions in stromal versus epithelial cells. AR expressed in mesenchyme is necessary and sufficient for prostate development while loss of stromal AR is predictive of prostate cancer progression. Many studies have characterized genome-wide binding of AR in prostate tumour cells but none have used primary mesenchyme or stroma. We applied ChIPseq to identify genomic AR binding sites in primary human fetal prostate fibroblasts and patient derived cancer associated fibroblasts, as well as the WPMY1 cell line overexpressing AR. We identified AR binding sites that were specific to fetal prostate fibroblasts (7534), cancer fibroblasts (629), WPMY1-AR (2561) as well as those common among all (783). Primary fibroblasts had a distinct AR binding profile versus prostate cancer cell lines and tissue, and showed a localisation to gene promoter binding sites 1 kb upstream of the transcriptional start site, as well as non-classical AR binding sequence motifs. We used RNAseq to define transcribed genes associated with AR binding sites and derived cistromes for embryonic and cancer fibroblasts as well as a cistrome common to both. These were compared to several in vivo ChIPseq and transcript expression datasets; which identified subsets of AR targets that were expressed in vivo and regulated by androgens. This analysis enabled us to deconvolute stromal AR targets active in stroma within tumour samples. Taken together, our data suggest that the AR shows significantly different genomic binding site locations in primary prostate fibroblasts compared to that observed in tumour cells. Validation of our AR binding site data with transcript expression in vitro and in vivo suggests that the AR target genes we have identified in primary fibroblasts may contribute to clinically significant and biologically important AR-regulated changes in prostate tissue

  9. Genomic identification of regulatory elements by evolutionary sequence comparison and functional analysis.

    PubMed

    Loots, Gabriela G

    2008-01-01

    Despite remarkable recent advances in genomics that have enabled us to identify most of the genes in the human genome, comparable efforts to define transcriptional cis-regulatory elements that control gene expression are lagging behind. The difficulty of this task stems from two equally important problems: our knowledge of how regulatory elements are encoded in genomes remains elementary, and there is a vast genomic search space for regulatory elements, since most of mammalian genomes are noncoding. Comparative genomic approaches are having a remarkable impact on the study of transcriptional regulation in eukaryotes and currently represent the most efficient and reliable methods of predicting noncoding sequences likely to control the patterns of gene expression. By subjecting eukaryotic genomic sequences to computational comparisons and subsequent experimentation, we are inching our way toward a more comprehensive catalog of common regulatory motifs that lie behind fundamental biological processes. We are still far from comprehending how the transcriptional regulatory code is encrypted in the human genome and providing an initial global view of regulatory gene networks, but collectively, the continued development of comparative and experimental approaches will rapidly expand our knowledge of the transcriptional regulome.

  10. Enabling comparative modeling of closely related genomes: Example genus Brucella

    DOE PAGES

    Faria, José P.; Edirisinghe, Janaka N.; Davis, James J.; ...

    2014-03-08

    For many scientific applications, it is highly desirable to be able to compare metabolic models of closely related genomes. In this study, we attempt to raise awareness to the fact that taking annotated genomes from public repositories and using them for metabolic model reconstructions is far from being trivial due to annotation inconsistencies. We are proposing a protocol for comparative analysis of metabolic models on closely related genomes, using fifteen strains of genus Brucella, which contains pathogens of both humans and livestock. This study lead to the identification and subsequent correction of inconsistent annotations in the SEED database, as wellmore » as the identification of 31 biochemical reactions that are common to Brucella, which are not originally identified by automated metabolic reconstructions. We are currently implementing this protocol for improving automated annotations within the SEED database and these improvements have been propagated into PATRIC, Model-SEED, KBase and RAST. This method is an enabling step for the future creation of consistent annotation systems and high-quality model reconstructions that will support in predicting accurate phenotypes such as pathogenicity, media requirements or type of respiration.« less

  11. Enabling comparative modeling of closely related genomes: Example genus Brucella

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Faria, José P.; Edirisinghe, Janaka N.; Davis, James J.

    For many scientific applications, it is highly desirable to be able to compare metabolic models of closely related genomes. In this study, we attempt to raise awareness to the fact that taking annotated genomes from public repositories and using them for metabolic model reconstructions is far from being trivial due to annotation inconsistencies. We are proposing a protocol for comparative analysis of metabolic models on closely related genomes, using fifteen strains of genus Brucella, which contains pathogens of both humans and livestock. This study lead to the identification and subsequent correction of inconsistent annotations in the SEED database, as wellmore » as the identification of 31 biochemical reactions that are common to Brucella, which are not originally identified by automated metabolic reconstructions. We are currently implementing this protocol for improving automated annotations within the SEED database and these improvements have been propagated into PATRIC, Model-SEED, KBase and RAST. This method is an enabling step for the future creation of consistent annotation systems and high-quality model reconstructions that will support in predicting accurate phenotypes such as pathogenicity, media requirements or type of respiration.« less

  12. Evolutionary biology through the lens of budding yeast comparative genomics.

    PubMed

    Marsit, Souhir; Leducq, Jean-Baptiste; Durand, Éléonore; Marchant, Axelle; Filteau, Marie; Landry, Christian R

    2017-10-01

    The budding yeast Saccharomyces cerevisiae is a highly advanced model system for studying genetics, cell biology and systems biology. Over the past decade, the application of high-throughput sequencing technologies to this species has contributed to this yeast also becoming an important model for evolutionary genomics. Indeed, comparative genomic analyses of laboratory, wild and domesticated yeast populations are providing unprecedented detail about many of the processes that govern evolution, including long-term processes, such as reproductive isolation and speciation, and short-term processes, such as adaptation to natural and domestication-related environments.

  13. K-bZIP Mediated SUMO-2/3 Specific Modification on the KSHV Genome Negatively Regulates Lytic Gene Expression and Viral Reactivation

    PubMed Central

    Yang, Wan-Shan; Hsu, Hung-Wei; Campbell, Mel; Cheng, Chia-Yang; Chang, Pei-Ching

    2015-01-01

    SUMOylation is associated with epigenetic regulation of chromatin structure and transcription. Epigenetic modifications of herpesviral genomes accompany the transcriptional switch of latent and lytic genes during the virus life cycle. Here, we report a genome-wide comparison of SUMO paralog modification on the KSHV genome. Using chromatin immunoprecipitation in conjunction with high-throughput sequencing, our study revealed highly distinct landscape changes of SUMO paralog genomic modifications associated with KSHV reactivation. A rapid and widespread deposition of SUMO-2/3, compared with SUMO-1, modification across the KSHV genome upon reactivation was observed. Interestingly, SUMO-2/3 enrichment was inversely correlated with H3K9me3 mark after reactivation, indicating that SUMO-2/3 may be responsible for regulating the expression of viral genes located in low heterochromatin regions during viral reactivation. RNA-sequencing analysis showed that the SUMO-2/3 enrichment pattern positively correlated with KSHV gene expression profiles. Activation of KSHV lytic genes located in regions with high SUMO-2/3 enrichment was enhanced by SUMO-2/3 knockdown. These findings suggest that SUMO-2/3 viral chromatin modification contributes to the diminution of viral gene expression during reactivation. Our previous study identified a SUMO-2/3-specific viral E3 ligase, K-bZIP, suggesting a potential role of this enzyme in regulating SUMO-2/3 enrichment and viral gene repression. Consistent with this prediction, higher K-bZIP binding on SUMO-2/3 enrichment region during reactivation was observed. Moreover, a K-bZIP SUMO E3 ligase dead mutant, K-bZIP-L75A, in the viral context, showed no SUMO-2/3 enrichment on viral chromatin and higher expression of viral genes located in SUMO-2/3 enriched regions during reactivation. Importantly, virus production significantly increased in both SUMO-2/3 knockdown and KSHV K-bZIP-L75A mutant cells. These results indicate that SUMO-2/3 modification

  14. Comparative transcriptome analysis reveals differentially expressed genes associated with sex expression in garden asparagus (Asparagus officinalis).

    PubMed

    Li, Shu-Fen; Zhang, Guo-Jun; Zhang, Xue-Jin; Yuan, Jin-Hong; Deng, Chuan-Liang; Gao, Wu-Jun

    2017-08-22

    Garden asparagus (Asparagus officinalis) is a highly valuable vegetable crop of commercial and nutritional interest. It is also commonly used to investigate the mechanisms of sex determination and differentiation in plants. However, the sex expression mechanisms in asparagus remain poorly understood. De novo transcriptome sequencing via Illumina paired-end sequencing revealed more than 26 billion bases of high-quality sequence data from male and female asparagus flower buds. A total of 72,626 unigenes with an average length of 979 bp were assembled. In comparative transcriptome analysis, 4876 differentially expressed genes (DEGs) were identified in the possible sex-determining stage of female and male/supermale flower buds. Of these DEGs, 433, including 285 male/supermale-biased and 149 female-biased genes, were annotated as flower related. Of the male/supermale-biased flower-related genes, 102 were probably involved in anther development. In addition, 43 DEGs implicated in hormone response and biosynthesis putatively associated with sex expression and reproduction were discovered. Moreover, 128 transcription factor (TF)-related genes belonging to various families were found to be differentially expressed, and this finding implied the essential roles of TF in sex determination or differentiation in asparagus. Correlation analysis indicated that miRNA-DEG pairs were also implicated in asparagus sexual development. Our study identified a large number of DEGs involved in the sex expression and reproduction of asparagus, including known genes participating in plant reproduction, plant hormone signaling, TF encoding, and genes with unclear functions. We also found that miRNAs might be involved in the sex differentiation process. Our study could provide a valuable basis for further investigations on the regulatory networks of sex determination and differentiation in asparagus and facilitate further genetic and genomic studies on this dioecious species.

  15. Comparative Genomics Analysis of Streptomyces Species Reveals Their Adaptation to the Marine Environment and Their Diversity at the Genomic Level

    PubMed Central

    Tian, Xinpeng; Zhang, Zhewen; Yang, Tingting; Chen, Meili; Li, Jie; Chen, Fei; Yang, Jin; Li, Wenjie; Zhang, Bing; Zhang, Zhang; Wu, Jiayan; Zhang, Changsheng; Long, Lijuan; Xiao, Jingfa

    2016-01-01

    Over 200 genomes of streptomycete strains that were isolated from various environments are available from the NCBI. However, little is known about the characteristics that are linked to marine adaptation in marine-derived streptomycetes. The particularity and complexity of the marine environment suggest that marine streptomycetes are genetically diverse. Here, we sequenced nine strains from the Streptomyces genus that were isolated from different longitudes, latitudes, and depths of the South China Sea. Then we compared these strains to 22 NCBI downloaded streptomycete strains. Thirty-one streptomycete strains are clearly grouped into a marine-derived subgroup and multiple source subgroup-based phylogenetic tree. The phylogenetic analyses have revealed the dynamic process underlying streptomycete genome evolution, and lateral gene transfer is an important driving force during the process. Pan-genomics analyses have revealed that streptomycetes have an open pan-genome, which reflects the diversity of these streptomycetes and guarantees the species a quick and economical response to diverse environments. Functional and comparative genomics analyses indicate that the marine-derived streptomycetes subgroup possesses some common characteristics of marine adaptation. Our findings have expanded our knowledge of how ocean isolates of streptomycete strains adapt to marine environments. The availability of streptomycete genomes from the South China Sea will be beneficial for further analysis on marine streptomycetes and will enrich the South China Sea’s genetic data sources. PMID:27446038

  16. GreenPhylDB v2.0: comparative and functional genomics in plants.

    PubMed

    Rouard, Mathieu; Guignon, Valentin; Aluome, Christelle; Laporte, Marie-Angélique; Droc, Gaëtan; Walde, Christian; Zmasek, Christian M; Périn, Christophe; Conte, Matthieu G

    2011-01-01

    GreenPhylDB is a database designed for comparative and functional genomics based on complete genomes. Version 2 now contains sixteen full genomes of members of the plantae kingdom, ranging from algae to angiosperms, automatically clustered into gene families. Gene families are manually annotated and then analyzed phylogenetically in order to elucidate orthologous and paralogous relationships. The database offers various lists of gene families including plant, phylum and species specific gene families. For each gene cluster or gene family, easy access to gene composition, protein domains, publications, external links and orthologous gene predictions is provided. Web interfaces have been further developed to improve the navigation through information related to gene families. New analysis tools are also available, such as a gene family ontology browser that facilitates exploration. GreenPhylDB is a component of the South Green Bioinformatics Platform (http://southgreen.cirad.fr/) and is accessible at http://greenphyl.cirad.fr. It enables comparative genomics in a broad taxonomy context to enhance the understanding of evolutionary processes and thus tends to speed up gene discovery.

  17. LAMP detection assays for boxwood blight pathogens: A comparative genomics approach

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Malapi-Wight, Martha; Demers, Jill E.; Veltri, Daniel

    Rapid and accurate molecular diagnostic tools are critical to efforts to minimize the impact and spread of emergent pathogens. The identification of diagnostic markers for novel pathogens presents several challenges, especially in the absence of information about population diversity and where genetic resources are limited. The objective of this study was to use comparative genomics datasets to find unique target regions suitable for the diagnosis of two fungal species causing a newly emergent blight disease of boxwood. Candidate marker regions for loop-mediated isothermal amplification (LAMP) assays were identified from draft genomes of Calonectria henricotiae and C. pseudonaviculata, as well asmore » three related species not associated with this disease. To increase the probability of identifying unique targets, we used three approaches to mine genome datasets, based on (i) unique regions, (ii) polymorphisms, and (iii) presence/absence of regions across datasets. From a pool of candidate markers, we demonstrate LAMP assay specificity by testing related fungal species, common boxwood pathogens, and environmental samples containing 445 diverse fungal taxa. In conclusion, this comparative-genomics-based approach to the development of LAMP diagnostic assays is the first of its kind for fungi and could be easily applied to diagnostic marker development for other newly emergent plant pathogens.« less

  18. LAMP detection assays for boxwood blight pathogens: A comparative genomics approach

    DOE PAGES

    Malapi-Wight, Martha; Demers, Jill E.; Veltri, Daniel; ...

    2016-05-20

    Rapid and accurate molecular diagnostic tools are critical to efforts to minimize the impact and spread of emergent pathogens. The identification of diagnostic markers for novel pathogens presents several challenges, especially in the absence of information about population diversity and where genetic resources are limited. The objective of this study was to use comparative genomics datasets to find unique target regions suitable for the diagnosis of two fungal species causing a newly emergent blight disease of boxwood. Candidate marker regions for loop-mediated isothermal amplification (LAMP) assays were identified from draft genomes of Calonectria henricotiae and C. pseudonaviculata, as well asmore » three related species not associated with this disease. To increase the probability of identifying unique targets, we used three approaches to mine genome datasets, based on (i) unique regions, (ii) polymorphisms, and (iii) presence/absence of regions across datasets. From a pool of candidate markers, we demonstrate LAMP assay specificity by testing related fungal species, common boxwood pathogens, and environmental samples containing 445 diverse fungal taxa. In conclusion, this comparative-genomics-based approach to the development of LAMP diagnostic assays is the first of its kind for fungi and could be easily applied to diagnostic marker development for other newly emergent plant pathogens.« less

  19. A 1463 Gene Cattle–Human Comparative Map With Anchor Points Defined by Human Genome Sequence Coordinates

    PubMed Central

    Everts-van der Wind, Annelie; Kata, Srinivas R.; Band, Mark R.; Rebeiz, Mark; Larkin, Denis M.; Everts, Robin E.; Green, Cheryl A.; Liu, Lei; Natarajan, Shreedhar; Goldammer, Tom; Lee, Jun Heon; McKay, Stephanie; Womack, James E.; Lewin, Harris A.

    2004-01-01

    A second-generation 5000 rad radiation hybrid (RH) map of the cattle genome was constructed primarily using cattle ESTs that were targeted to gaps in the existing cattle–human comparative map, as well as to sparsely populated map intervals. A total of 870 targeted markers were added, bringing the number of markers mapped on the RH5000 panel to 1913. Of these, 1463 have significant BLASTN hits (E < e–5) against the human genome sequence. A cattle–human comparative map was created using human genome sequence coordinates of the paired orthologs. One-hundred and ninety-five conserved segments (defined by two or more genes) were identified between the cattle and human genomes, of which 31 are newly discovered and 34 were extended singletons on the first-generation map. The new map represents an improvement of 20% genome-wide comparative coverage compared with the first-generation map. Analysis of gene content within human genome regions where there are gaps in the comparative map revealed gaps with both significantly greater and significantly lower gene content. The new, more detailed cattle–human comparative map provides an improved resource for the analysis of mammalian chromosome evolution, the identification of candidate genes for economically important traits, and for proper alignment of sequence contigs on cattle chromosomes. PMID:15231756

  20. SeeGH--a software tool for visualization of whole genome array comparative genomic hybridization data.

    PubMed

    Chi, Bryan; DeLeeuw, Ronald J; Coe, Bradley P; MacAulay, Calum; Lam, Wan L

    2004-02-09

    Array comparative genomic hybridization (CGH) is a technique which detects copy number differences in DNA segments. Complete sequencing of the human genome and the development of an array representing a tiling set of tens of thousands of DNA segments spanning the entire human genome has made high resolution copy number analysis throughout the genome possible. Since array CGH provides signal ratio for each DNA segment, visualization would require the reassembly of individual data points into chromosome profiles. We have developed a visualization tool for displaying whole genome array CGH data in the context of chromosomal location. SeeGH is an application that translates spot signal ratio data from array CGH experiments to displays of high resolution chromosome profiles. Data is imported from a simple tab delimited text file obtained from standard microarray image analysis software. SeeGH processes the signal ratio data and graphically displays it in a conventional CGH karyotype diagram with the added features of magnification and DNA segment annotation. In this process, SeeGH imports the data into a database, calculates the average ratio and standard deviation for each replicate spot, and links them to chromosome regions for graphical display. Once the data is displayed, users have the option of hiding or flagging DNA segments based on user defined criteria, and retrieve annotation information such as clone name, NCBI sequence accession number, ratio, base pair position on the chromosome, and standard deviation. SeeGH represents a novel software tool used to view and analyze array CGH data. The software gives users the ability to view the data in an overall genomic view as well as magnify specific chromosomal regions facilitating the precise localization of genetic alterations. SeeGH is easily installed and runs on Microsoft Windows 2000 or later environments.

  1. Update on Comparative Genomics of Legumes

    USDA-ARS?s Scientific Manuscript database

    This year marks the essential completion of the genome sequences of Glycine max, Medicago truncatula, and Lotus japonicus (soybean, barrel medic, and birdsfoot trefoil, respectively). The impact of these assembled, annotated genomes will be enormous. L. japonicus and M. truncatula, both forage crop...

  2. Human Disease-Drug Network Based on Genomic Expression Profiles

    PubMed Central

    Hu, Guanghui; Agarwal, Pankaj

    2009-01-01

    Background Drug repositioning offers the possibility of faster development times and reduced risks in drug discovery. With the rapid development of high-throughput technologies and ever-increasing accumulation of whole genome-level datasets, an increasing number of diseases and drugs can be comprehensively characterized by the changes they induce in gene expression, protein, metabolites and phenotypes. Methodology/Principal Findings We performed a systematic, large-scale analysis of genomic expression profiles of human diseases and drugs to create a disease-drug network. A network of 170,027 significant interactions was extracted from the ∼24.5 million comparisons between ∼7,000 publicly available transcriptomic profiles. The network includes 645 disease-disease, 5,008 disease-drug, and 164,374 drug-drug relationships. At least 60% of the disease-disease pairs were in the same disease area as determined by the Medical Subject Headings (MeSH) disease classification tree. The remaining can drive a molecular level nosology by discovering relationships between seemingly unrelated diseases, such as a connection between bipolar disorder and hereditary spastic paraplegia, and a connection between actinic keratosis and cancer. Among the 5,008 disease-drug links, connections with negative scores suggest new indications for existing drugs, such as the use of some antimalaria drugs for Crohn's disease, and a variety of existing drugs for Huntington's disease; while the positive scoring connections can aid in drug side effect identification, such as tamoxifen's undesired carcinogenic property. From the ∼37K drug-drug relationships, we discover relationships that aid in target and pathway deconvolution, such as 1) KCNMA1 as a potential molecular target of lobeline, and 2) both apoptotic DNA fragmentation and G2/M DNA damage checkpoint regulation as potential pathway targets of daunorubicin. Conclusions/Significance We have automatically generated thousands of disease and

  3. Complete mitochondrial genome of the aluminum-tolerant fungus Rhodotorula taiwanensis RS1 and comparative analysis of Basidiomycota mitochondrial genomes

    PubMed Central

    Zhao, Xue Qiang; Aizawa, Tomoko; Schneider, Jessica; Wang, Chao; Shen, Ren Fang; Sunairi, Michio

    2013-01-01

    The complete mitochondrial genome of Rhodotorula taiwanensis RS1, an aluminum-tolerant Basidiomycota fungus, was determined and compared with the known mitochondrial genomes of 12 Basidiomycota species. The mitochondrial genome of R. taiwanensis RS1 is a circular DNA molecule of 40,392 bp and encodes the typical 15 mitochondrial proteins, 23 tRNAs, and small and large rRNAs as well as 10 intronic open reading frames. These genes are apparently transcribed in two directions and do not show syntenies in gene order with other investigated Basidiomycota species. The average G+C content (41%) of the mitochondrial genome of R. taiwanensis RS1 is the highest among the Basidiomycota species. Two introns were detected in the sequence of the atp9 gene of R. taiwanensis RS1, but not in that of other Basidiomycota species. Rhodotorula taiwanensis is the first species of the genus Rhodotorula whose full mitochondrial genome has been sequenced; and the data presented here supply valuable information for understanding the evolution of fungal mitochondrial genomes and researching the mechanism of aluminum tolerance in microorganisms. PMID:23427135

  4. Embryonic stem cell-like features of testicular carcinoma in situ revealed by genome-wide gene expression profiling.

    PubMed

    Almstrup, Kristian; Hoei-Hansen, Christina E; Wirkner, Ute; Blake, Jonathon; Schwager, Christian; Ansorge, Wilhelm; Nielsen, John E; Skakkebaek, Niels E; Rajpert-De Meyts, Ewa; Leffers, Henrik

    2004-07-15

    Carcinoma in situ (CIS) is the common precursor of histologically heterogeneous testicular germ cell tumors (TGCTs), which in recent decades have markedly increased and now are the most common malignancy of young men. Using genome-wide gene expression profiling, we identified >200 genes highly expressed in testicular CIS, including many never reported in testicular neoplasms. Expression was further verified by semiquantitative reverse transcription-PCR and in situ hybridization. Among the highest expressed genes were NANOG and POU5F1, and reverse transcription-PCR revealed possible changes in their stoichiometry on progression into embryonic carcinoma. We compared the CIS expression profile with patterns reported in embryonic stem cells (ESCs), which revealed a substantial overlap that may be as high as 50%. We also demonstrated an over-representation of expressed genes in regions of 17q and 12, reported as unstable in cultured ESCs. The close similarity between CIS and ESCs explains the pluripotency of CIS. Moreover, the findings are consistent with an early prenatal origin of TGCTs and thus suggest that etiologic factors operating in utero are of primary importance for the incidence trends of TGCTs. Finally, some of the highly expressed genes identified in this study are promising candidates for new diagnostic markers for CIS and/or TGCTs.

  5. Comparative genomic analysis of the multispecies probiotic-marketed product VSL#3.

    PubMed

    Douillard, François P; Mora, Diego; Eijlander, Robyn T; Wels, Michiel; de Vos, Willem M

    2018-01-01

    Several probiotic-marketed formulations available for the consumers contain live lactic acid bacteria and/or bifidobacteria. The multispecies product commercialized as VSL#3 has been used for treating various gastro-intestinal disorders. However, like many other products, the bacterial strains present in VSL#3 have only been characterized to a limited extent and their efficacy as well as their predicted mode of action remain unclear, preventing further applications or comparative studies. In this work, the genomes of all eight bacterial strains present in VSL#3 were sequenced and characterized, to advance insights into the possible mode of action of this product and also to serve as a basis for future work and trials. Phylogenetic and genomic data analysis allowed us to identify the 7 species present in the VSL#3 product as specified by the manufacturer. The 8 strains present belong to the species Streptococcus thermophilus, Lactobacillus acidophilus, Lactobacillus paracasei, Lactobacillus plantarum, Lactobacillus helveticus, Bifidobacterium breve and B. animalis subsp. lactis (two distinct strains). Comparative genomics revealed that the draft genomes of the S. thermophilus and L. helveticus strains were predicted to encode most of the defence systems such as restriction modification and CRISPR-Cas systems. Genes associated with a variety of potential probiotic functions were also identified. Thus, in the three Bifidobacterium spp., gene clusters were predicted to encode tight adherence pili, known to promote bacteria-host interaction and intestinal barrier integrity, and to impact host cell development. Various repertoires of putative signalling proteins were predicted to be encoded by the genomes of the Lactobacillus spp., i.e. surface layer proteins, LPXTG-containing proteins, or sortase-dependent pili that may interact with the intestinal mucosa and dendritic cells. Taken altogether, the individual genomic characterization of the strains present in the VSL#3

  6. Comparative genomic analysis of the multispecies probiotic-marketed product VSL#3

    PubMed Central

    Mora, Diego; Eijlander, Robyn T.; Wels, Michiel; de Vos, Willem M.

    2018-01-01

    Several probiotic-marketed formulations available for the consumers contain live lactic acid bacteria and/or bifidobacteria. The multispecies product commercialized as VSL#3 has been used for treating various gastro-intestinal disorders. However, like many other products, the bacterial strains present in VSL#3 have only been characterized to a limited extent and their efficacy as well as their predicted mode of action remain unclear, preventing further applications or comparative studies. In this work, the genomes of all eight bacterial strains present in VSL#3 were sequenced and characterized, to advance insights into the possible mode of action of this product and also to serve as a basis for future work and trials. Phylogenetic and genomic data analysis allowed us to identify the 7 species present in the VSL#3 product as specified by the manufacturer. The 8 strains present belong to the species Streptococcus thermophilus, Lactobacillus acidophilus, Lactobacillus paracasei, Lactobacillus plantarum, Lactobacillus helveticus, Bifidobacterium breve and B. animalis subsp. lactis (two distinct strains). Comparative genomics revealed that the draft genomes of the S. thermophilus and L. helveticus strains were predicted to encode most of the defence systems such as restriction modification and CRISPR-Cas systems. Genes associated with a variety of potential probiotic functions were also identified. Thus, in the three Bifidobacterium spp., gene clusters were predicted to encode tight adherence pili, known to promote bacteria-host interaction and intestinal barrier integrity, and to impact host cell development. Various repertoires of putative signalling proteins were predicted to be encoded by the genomes of the Lactobacillus spp., i.e. surface layer proteins, LPXTG-containing proteins, or sortase-dependent pili that may interact with the intestinal mucosa and dendritic cells. Taken altogether, the individual genomic characterization of the strains present in the VSL#3

  7. Building the Evidence Base for Decision-making in Cancer Genomic Medicine Using Comparative Effectiveness Research

    PubMed Central

    Goddard, Katrina A.B.; Knaus, William A.; Whitlock, Evelyn; Lyman, Gary H.; Feigelson, Heather Spencer; Schully, Sheri D.; Ramsey, Scott; Tunis, Sean; Freedman, Andrew N.; Khoury, Muin J.; Veenstra, David L.

    2013-01-01

    Background The clinical utility is uncertain for many cancer genomic applications. Comparative effectiveness research (CER) can provide evidence to clarify this uncertainty. Objectives To identify approaches to help stakeholders make evidence-based decisions, and to describe potential challenges and opportunities using CER to produce evidence-based guidance. Methods We identified general CER approaches for genomic applications through literature review, the authors’ experiences, and lessons learned from a recent, seven-site CER initiative in cancer genomic medicine. Case studies illustrate the use of CER approaches. Results Evidence generation and synthesis approaches include comparative observational and randomized trials, patient reported outcomes, decision modeling, and economic analysis. We identified significant challenges to conducting CER in cancer genomics: the rapid pace of innovation, the lack of regulation, the limited evidence for clinical utility, and the beliefs that genomic tests could have personal utility without having clinical utility. Opportunities to capitalize on CER methods in cancer genomics include improvements in the conduct of evidence synthesis, stakeholder engagement, increasing the number of comparative studies, and developing approaches to inform clinical guidelines and research prioritization. Conclusions CER offers a variety of methodological approaches to address stakeholders’ needs. Innovative approaches are needed to ensure an effective translation of genomic discoveries. PMID:22516979

  8. Comparative genomic de-convolution of the cotton genome revealed a decaploid ancestor and widespread chromosomal fractionation.

    PubMed

    Wang, Xiyin; Guo, Hui; Wang, Jinpeng; Lei, Tianyu; Liu, Tao; Wang, Zhenyi; Li, Yuxian; Lee, Tae-Ho; Li, Jingping; Tang, Haibao; Jin, Dianchuan; Paterson, Andrew H

    2016-02-01

    The 'apparently' simple genomes of many angiosperms mask complex evolutionary histories. The reference genome sequence for cotton (Gossypium spp.) revealed a ploidy change of a complexity unprecedented to date, indeed that could not be distinguished as to its exact dosage. Herein, by developing several comparative, computational and statistical approaches, we revealed a 5× multiplication in the cotton lineage of an ancestral genome common to cotton and cacao, and proposed evolutionary models to show how such a decaploid ancestor formed. The c. 70% gene loss necessary to bring the ancestral decaploid to its current gene count appears to fit an approximate geometrical model; that is, although many genes may be lost by single-gene deletion events, some may be lost in groups of consecutive genes. Gene loss following cotton decaploidy has largely just reduced gene copy numbers of some homologous groups. We designed a novel approach to deconvolute layers of chromosome homology, providing definitive information on gene orthology and paralogy across broad evolutionary distances, both of fundamental value and serving as an important platform to support further studies in and beyond cotton and genomics communities. No claim to original US government works. New Phytologist © 2015 New Phytologist Trust.

  9. Integrating genome-wide genetic variations and monocyte expression data reveals trans-regulated gene modules in humans.

    PubMed

    Rotival, Maxime; Zeller, Tanja; Wild, Philipp S; Maouche, Seraya; Szymczak, Silke; Schillert, Arne; Castagné, Raphaele; Deiseroth, Arne; Proust, Carole; Brocheton, Jessy; Godefroy, Tiphaine; Perret, Claire; Germain, Marine; Eleftheriadis, Medea; Sinning, Christoph R; Schnabel, Renate B; Lubos, Edith; Lackner, Karl J; Rossmann, Heidi; Münzel, Thomas; Rendon, Augusto; Erdmann, Jeanette; Deloukas, Panos; Hengstenberg, Christian; Diemert, Patrick; Montalescot, Gilles; Ouwehand, Willem H; Samani, Nilesh J; Schunkert, Heribert; Tregouet, David-Alexandre; Ziegler, Andreas; Goodall, Alison H; Cambien, François; Tiret, Laurence; Blankenberg, Stefan

    2011-12-01

    One major expectation from the transcriptome in humans is to characterize the biological basis of associations identified by genome-wide association studies. So far, few cis expression quantitative trait loci (eQTLs) have been reliably related to disease susceptibility. Trans-regulating mechanisms may play a more prominent role in disease susceptibility. We analyzed 12,808 genes detected in at least 5% of circulating monocyte samples from a population-based sample of 1,490 European unrelated subjects. We applied a method of extraction of expression patterns-independent component analysis-to identify sets of co-regulated genes. These patterns were then related to 675,350 SNPs to identify major trans-acting regulators. We detected three genomic regions significantly associated with co-regulated gene modules. Association of these loci with multiple expression traits was replicated in Cardiogenics, an independent study in which expression profiles of monocytes were available in 758 subjects. The locus 12q13 (lead SNP rs11171739), previously identified as a type 1 diabetes locus, was associated with a pattern including two cis eQTLs, RPS26 and SUOX, and 5 trans eQTLs, one of which (MADCAM1) is a potential candidate for mediating T1D susceptibility. The locus 12q24 (lead SNP rs653178), which has demonstrated extensive disease pleiotropy, including type 1 diabetes, hypertension, and celiac disease, was associated to a pattern strongly correlating to blood pressure level. The strongest trans eQTL in this pattern was CRIP1, a known marker of cellular proliferation in cancer. The locus 12q15 (lead SNP rs11177644) was associated with a pattern driven by two cis eQTLs, LYZ and YEATS4, and including 34 trans eQTLs, several of them tumor-related genes. This study shows that a method exploiting the structure of co-expressions among genes can help identify genomic regions involved in trans regulation of sets of genes and can provide clues for understanding the mechanisms linking

  10. [Comparative results of preimplantation genetic screening by array comparative genomic hybridization and new-generation sequencing].

    PubMed

    Aleksandrova, N V; Shubina, E S; Ekimov, A N; Kodyleva, T A; Mukosey, I S; Makarova, N P; Kulakova, E V; Levkov, L A; Barkov, I Yu; Trofimov, D Yu; Sukhikh, G T

    2017-01-01

    Aneuploidies as quantitative chromosome abnormalities are a main cause of failed development of morphologically normal embryos, implantation failures, and early reproductive losses. Preimplantation genetic screening (PGS) allows a preselection of embryos with a normal karyotype, thus increasing the implantation rate and reducing the frequency of early pregnancy loss after IVF. Modern PGS technologies are based on a genome-wide analysis of the embryo. The first pilot study in Russia was performed to assess the possibility of using semiconductor new-generation sequencing (NGS) as a PGS method. NGS data were collected for 38 biopsied embryos and compared with the data from array comparative genomic hybridization (array-CGH). The concordance between the NGS and array-CGH data was 94.8%. Two samples showed the karyotype 47,XXY by array-CGH and a normal karyotype by NGS. The discrepancies may be explained by loss of efficiency of array-CGH amplicon labeling.

  11. Comparative transcriptome analysis reveals insights into the streamlined genomes of haplosclerid demosponges

    PubMed Central

    Guzman, Christine; Conaco, Cecilia

    2016-01-01

    Sponges (Porifera) are one of the most ancestral metazoan groups. They are characterized by a simple body plan lacking the true tissues and organ systems found in other animals. Members of this phylum display a remarkable diversity of form and function and yet little is known about the composition and complexity of their genomes. In this study, we sequenced the transcriptomes of two marine haplosclerid sponges belonging to Demospongiae, the largest and most diverse class within phylum Porifera, and compared their gene content with members of other sponge classes. We recovered 44,693 and 50,067 transcripts expressed in adult tissues of Haliclona amboinensis and Haliclona tubifera, respectively. These transcripts translate into 20,280 peptides in H. amboinensis and 18,000 peptides in H. tubifera. Genes associated with important signaling and metabolic pathways, regulatory networks, as well as genes that may be important in the organismal stress response, were identified in the transcriptomes. Futhermore, lineage-specific innovations were identified that may be correlated with observed sponge characters and ecological adaptations. The core gene complement expressed within the tissues of adult haplosclerid demosponges may represent a streamlined and flexible genetic toolkit that underlies the ecological success and resilience of sponges to environmental stress. PMID:26738846

  12. Comparative transcriptome analysis reveals insights into the streamlined genomes of haplosclerid demosponges

    NASA Astrophysics Data System (ADS)

    Guzman, Christine; Conaco, Cecilia

    2016-01-01

    Sponges (Porifera) are one of the most ancestral metazoan groups. They are characterized by a simple body plan lacking the true tissues and organ systems found in other animals. Members of this phylum display a remarkable diversity of form and function and yet little is known about the composition and complexity of their genomes. In this study, we sequenced the transcriptomes of two marine haplosclerid sponges belonging to Demospongiae, the largest and most diverse class within phylum Porifera, and compared their gene content with members of other sponge classes. We recovered 44,693 and 50,067 transcripts expressed in adult tissues of Haliclona amboinensis and Haliclona tubifera, respectively. These transcripts translate into 20,280 peptides in H. amboinensis and 18,000 peptides in H. tubifera. Genes associated with important signaling and metabolic pathways, regulatory networks, as well as genes that may be important in the organismal stress response, were identified in the transcriptomes. Futhermore, lineage-specific innovations were identified that may be correlated with observed sponge characters and ecological adaptations. The core gene complement expressed within the tissues of adult haplosclerid demosponges may represent a streamlined and flexible genetic toolkit that underlies the ecological success and resilience of sponges to environmental stress.

  13. Unstable genomes elevate transcriptome dynamics

    PubMed Central

    Stevens, Joshua B.; Liu, Guo; Abdallah, Batoul Y.; Horne, Steven D.; Ye, Karen J.; Bremer, Steven W.; Ye, Christine J.; Krawetz, Stephen A.; Heng, Henry H.

    2015-01-01

    The challenge of identifying common expression signatures in cancer is well known, however the reason behind this is largely unclear. Traditionally variation in expression signatures has been attributed to technological problems, however recent evidence suggests that chromosome instability (CIN) and resultant karyotypic heterogeneity may be a large contributing factor. Using a well-defined model of immortalization, we systematically compared the pattern of genome alteration and expression dynamics during somatic evolution. Co-measurement of global gene expression and karyotypic alteration throughout the immortalization process reveals that karyotype changes influence gene expression as major structural and numerical karyotypic alterations result in large gene expression deviation. Replicate samples from stages with stable genomes are more similar to each other than are replicate samples with karyotypic heterogeneity. Karyotypic and gene expression change during immortalization is dynamic as each stage of progression has a unique expression pattern. This was further verified by comparing global expression in two replicates grown in one flask with known karyotypes. Replicates with higher karyotypic instability were found to be less similar than replicates with stable karyotypes. This data illustrates the karyotype, transcriptome, and transcriptome determined pathways are in constant flux during somatic cellular evolution (particularly during the macroevolutionary phase) and this flux is an inextricable feature of CIN and essential for cancer formation. The findings presented here underscore the importance of understanding the evolutionary process of cancer in order to design improved treatment modalities. PMID:24122714

  14. Effects of immunostimulation on social behavior, chemical communication and genome-wide gene expression in honey bee workers (Apis mellifera)

    PubMed Central

    2012-01-01

    Background Social insects, such as honey bees, use molecular, physiological and behavioral responses to combat pathogens and parasites. The honey bee genome contains all of the canonical insect immune response pathways, and several studies have demonstrated that pathogens can activate expression of immune effectors. Honey bees also use behavioral responses, termed social immunity, to collectively defend their hives from pathogens and parasites. These responses include hygienic behavior (where workers remove diseased brood) and allo-grooming (where workers remove ectoparasites from nestmates). We have previously demonstrated that immunostimulation causes changes in the cuticular hydrocarbon profiles of workers, which results in altered worker-worker social interactions. Thus, cuticular hydrocarbons may enable workers to identify sick nestmates, and adjust their behavior in response. Here, we test the specificity of behavioral, chemical and genomic responses to immunostimulation by challenging workers with a panel of different immune stimulants (saline, Sephadex beads and Gram-negative bacteria E. coli). Results While only bacteria-injected bees elicited altered behavioral responses from healthy nestmates compared to controls, all treatments resulted in significant changes in cuticular hydrocarbon profiles. Immunostimulation caused significant changes in expression of hundreds of genes, the majority of which have not been identified as members of the canonical immune response pathways. Furthermore, several new candidate genes that may play a role in cuticular hydrocarbon biosynthesis were identified. Effects of immune challenge expression of several genes involved in immune response, cuticular hydrocarbon biosynthesis, and the Notch signaling pathway were confirmed using quantitative real-time PCR. Finally, we identified common genes regulated by pathogen challenge in honey bees and other insects. Conclusions These results demonstrate that honey bee genomic responses

  15. Effects of immunostimulation on social behavior, chemical communication and genome-wide gene expression in honey bee workers (Apis mellifera).

    PubMed

    Richard, Freddie-Jeanne; Holt, Holly L; Grozinger, Christina M

    2012-10-16

    Social insects, such as honey bees, use molecular, physiological and behavioral responses to combat pathogens and parasites. The honey bee genome contains all of the canonical insect immune response pathways, and several studies have demonstrated that pathogens can activate expression of immune effectors. Honey bees also use behavioral responses, termed social immunity, to collectively defend their hives from pathogens and parasites. These responses include hygienic behavior (where workers remove diseased brood) and allo-grooming (where workers remove ectoparasites from nestmates). We have previously demonstrated that immunostimulation causes changes in the cuticular hydrocarbon profiles of workers, which results in altered worker-worker social interactions. Thus, cuticular hydrocarbons may enable workers to identify sick nestmates, and adjust their behavior in response. Here, we test the specificity of behavioral, chemical and genomic responses to immunostimulation by challenging workers with a panel of different immune stimulants (saline, Sephadex beads and Gram-negative bacteria E. coli). While only bacteria-injected bees elicited altered behavioral responses from healthy nestmates compared to controls, all treatments resulted in significant changes in cuticular hydrocarbon profiles. Immunostimulation caused significant changes in expression of hundreds of genes, the majority of which have not been identified as members of the canonical immune response pathways. Furthermore, several new candidate genes that may play a role in cuticular hydrocarbon biosynthesis were identified. Effects of immune challenge expression of several genes involved in immune response, cuticular hydrocarbon biosynthesis, and the Notch signaling pathway were confirmed using quantitative real-time PCR. Finally, we identified common genes regulated by pathogen challenge in honey bees and other insects. These results demonstrate that honey bee genomic responses to immunostimulation are

  16. Genomic expression catalogue of a global collection of BCG vaccine strains show evidence for highly diverged metabolic and cell-wall adaptations

    PubMed Central

    Abdallah, Abdallah M.; Hill-Cawthorne, Grant A.; Otto, Thomas D.; Coll, Francesc; Guerra-Assunção, José Afonso; Gao, Ge; Naeem, Raeece; Ansari, Hifzur; Malas, Tareq B.; Adroub, Sabir A.; Verboom, Theo; Ummels, Roy; Zhang, Huoming; Panigrahi, Aswini Kumar; McNerney, Ruth; Brosch, Roland; Clark, Taane G.; Behr, Marcel A.; Bitter, Wilbert; Pain, Arnab

    2015-01-01

    Although Bacillus Calmette-Guérin (BCG) vaccines against tuberculosis have been available for more than 90 years, their effectiveness has been hindered by variable protective efficacy and a lack of lasting memory responses. One factor contributing to this variability may be the diversity of the BCG strains that are used around the world, in part from genomic changes accumulated during vaccine production and their resulting differences in gene expression. We have compared the genomes and transcriptomes of a global collection of fourteen of the most widely used BCG strains at single base-pair resolution. We have also used quantitative proteomics to identify key differences in expression of proteins across five representative BCG strains of the four tandem duplication (DU) groups. We provide a comprehensive map of single nucleotide polymorphisms (SNPs), copy number variation and insertions and deletions (indels) across fourteen BCG strains. Genome-wide SNP characterization allowed the construction of a new and robust phylogenic genealogy of BCG strains. Transcriptional and proteomic profiling revealed a metabolic remodeling in BCG strains that may be reflected by altered immunogenicity and possibly vaccine efficacy. Together, these integrated-omic data represent the most comprehensive catalogue of genetic variation across a global collection of BCG strains. PMID:26487098

  17. The Plant Genome Integrative Explorer Resource: PlantGenIE.org.

    PubMed

    Sundell, David; Mannapperuma, Chanaka; Netotea, Sergiu; Delhomme, Nicolas; Lin, Yao-Cheng; Sjödin, Andreas; Van de Peer, Yves; Jansson, Stefan; Hvidsten, Torgeir R; Street, Nathaniel R

    2015-12-01

    Accessing and exploring large-scale genomics data sets remains a significant challenge to researchers without specialist bioinformatics training. We present the integrated PlantGenIE.org platform for exploration of Populus, conifer and Arabidopsis genomics data, which includes expression networks and associated visualization tools. Standard features of a model organism database are provided, including genome browsers, gene list annotation, Blast homology searches and gene information pages. Community annotation updating is supported via integration of WebApollo. We have produced an RNA-sequencing (RNA-Seq) expression atlas for Populus tremula and have integrated these data within the expression tools. An updated version of the ComPlEx resource for performing comparative plant expression analyses of gene coexpression network conservation between species has also been integrated. The PlantGenIE.org platform provides intuitive access to large-scale and genome-wide genomics data from model forest tree species, facilitating both community contributions to annotation improvement and tools supporting use of the included data resources to inform biological insight. © 2015 The Authors. New Phytologist © 2015 New Phytologist Trust.

  18. Comparative genomic analysis of single-molecule sequencing and hybrid approaches for finishing the Clostridium autoethanogenum JA1-1 strain DSM 10061 genome

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Brown, Steven D; Nagaraju, Shilpa; Utturkar, Sagar M

    Background Clostridium autoethanogenum strain JA1-1 (DSM 10061) is an acetogen capable of fermenting CO, CO2 and H2 (e.g. from syngas or waste gases) into biofuel ethanol and commodity chemicals such as 2,3-butanediol. A draft genome sequence consisting of 100 contigs has been published. Results A closed, high-quality genome sequence for C. autoethanogenum DSM10061 was generated using only the latest single-molecule DNA sequencing technology and without the need for manual finishing. It is assigned to the most complex genome classification based upon genome features such as repeats, prophage, nine copies of the rRNA gene operons. It has a low G +more » C content of 31.1%. Illumina, 454, Illumina/454 hybrid assemblies were generated and then compared to the draft and PacBio assemblies using summary statistics, CGAL, QUAST and REAPR bioinformatics tools and comparative genomic approaches. Assemblies based upon shorter read DNA technologies were confounded by the large number repeats and their size, which in the case of the rRNA gene operons were ~5 kb. CRISPR (Clustered Regularly Interspaced Short Paloindromic Repeats) systems among biotechnologically relevant Clostridia were classified and related to plasmid content and prophages. Potential associations between plasmid content and CRISPR systems may have implications for historical industrial scale Acetone-Butanol-Ethanol (ABE) fermentation failures and future large scale bacterial fermentations. While C. autoethanogenum contains an active CRISPR system, no such system is present in the closely related Clostridium ljungdahlii DSM 13528. A common prophage inserted into the Arg-tRNA shared between the strains suggests a common ancestor. However, C. ljungdahlii contains several additional putative prophages and it has more than double the amount of prophage DNA compared to C. autoethanogenum. Other differences include important metabolic genes for central metabolism (as an additional hydrogenase and the absence of a

  19. House spider genome uncovers evolutionary shifts in the diversity and expression of black widow venom proteins associated with extreme toxicity.

    PubMed

    Gendreau, Kerry L; Haney, Robert A; Schwager, Evelyn E; Wierschin, Torsten; Stanke, Mario; Richards, Stephen; Garb, Jessica E

    2017-02-16

    Black widow spiders are infamous for their neurotoxic venom, which can cause extreme and long-lasting pain. This unusual venom is dominated by latrotoxins and latrodectins, two protein families virtually unknown outside of the black widow genus Latrodectus, that are difficult to study given the paucity of spider genomes. Using tissue-, sex- and stage-specific expression data, we analyzed the recently sequenced genome of the house spider (Parasteatoda tepidariorum), a close relative of black widows, to investigate latrotoxin and latrodectin diversity, expression and evolution. We discovered at least 47 latrotoxin genes in the house spider genome, many of which are tandem-arrayed. Latrotoxins vary extensively in predicted structural domains and expression, implying their significant functional diversification. Phylogenetic analyses show latrotoxins have substantially duplicated after the Latrodectus/Parasteatoda split and that they are also related to proteins found in endosymbiotic bacteria. Latrodectin genes are less numerous than latrotoxins, but analyses show their recruitment for venom function from neuropeptide hormone genes following duplication, inversion and domain truncation. While latrodectins and other peptides are highly expressed in house spider and black widow venom glands, latrotoxins account for a far smaller percentage of house spider venom gland expression. The house spider genome sequence provides novel insights into the evolution of venom toxins once considered unique to black widows. Our results greatly expand the size of the latrotoxin gene family, reinforce its narrow phylogenetic distribution, and provide additional evidence for the lateral transfer of latrotoxins between spiders and bacterial endosymbionts. Moreover, we strengthen the evidence for the evolution of latrodectin venom genes from the ecdysozoan Ion Transport Peptide (ITP)/Crustacean Hyperglycemic Hormone (CHH) neuropeptide superfamily. The lower expression of latrotoxins in

  20. Comparative genome and transcriptome analyses of the social amoeba Acytostelium subglobosum that accomplishes multicellular development without germ-soma differentiation.

    PubMed

    Urushihara, Hideko; Kuwayama, Hidekazu; Fukuhara, Kensuke; Itoh, Takehiko; Kagoshima, Hiroshi; Shin-I, Tadasu; Toyoda, Atsushi; Ohishi, Kazuyo; Taniguchi, Tateaki; Noguchi, Hideki; Kuroki, Yoko; Hata, Takashi; Uchi, Kyoko; Mohri, Kurato; King, Jason S; Insall, Robert H; Kohara, Yuji; Fujiyama, Asao

    2015-02-14

    Social amoebae are lower eukaryotes that inhabit the soil. They are characterized by the construction of a starvation-induced multicellular fruiting body with a spore ball and supportive stalk. In most species, the stalk is filled with motile stalk cells, as represented by the model organism Dictyostelium discoideum, whose developmental mechanisms have been well characterized. However, in the genus Acytostelium, the stalk is acellular and all aggregated cells become spores. Phylogenetic analyses have shown that it is not an ancestral genus but has lost the ability to undergo cell differentiation. We performed genome and transcriptome analyses of Acytostelium subglobosum and compared our findings to other available dictyostelid genome data. Although A. subglobosum adopts a qualitatively different developmental program from other dictyostelids, its gene repertoire was largely conserved. Yet, families of polyketide synthase and extracellular matrix proteins have not expanded and a serine protease and ABC transporter B family gene, tagA, and a few other developmental genes are missing in the A. subglobosum lineage. Temporal gene expression patterns are astonishingly dissimilar from those of D. discoideum, and only a limited fraction of the ortholog pairs shared the same expression patterns, so that some signaling cascades for development seem to be disabled in A. subglobosum. The absence of the ability to undergo cell differentiation in Acytostelium is accompanied by a small change in coding potential and extensive alterations in gene expression patterns.

  1. Comparative Genomics Reveals the Core Gene Toolbox for the Fungus-Insect Symbiosis

    PubMed Central

    Stata, Matt; Wang, Wei; White, Merlin M.; Moncalvo, Jean-Marc

    2018-01-01

    ABSTRACT Modern genomics has shed light on many entomopathogenic fungi and expanded our knowledge widely; however, little is known about the genomic features of the insect-commensal fungi. Harpellales are obligate commensals living in the digestive tracts of disease-bearing insects (black flies, midges, and mosquitoes). In this study, we produced and annotated whole-genome sequences of nine Harpellales taxa and conducted the first comparative analyses to infer the genomic diversity within the members of the Harpellales. The genomes of the insect gut fungi feature low (26% to 37%) GC content and large genome size variations (25 to 102 Mb). Further comparisons with insect-pathogenic fungi (from both Ascomycota and Zoopagomycota), as well as with free-living relatives (as negative controls), helped to identify a gene toolbox that is essential to the fungus-insect symbiosis. The results not only narrow the genomic scope of fungus-insect interactions from several thousands to eight core players but also distinguish host invasion strategies employed by insect pathogens and commensals. The genomic content suggests that insect commensal fungi rely mostly on adhesion protein anchors that target digestive system, while entomopathogenic fungi have higher numbers of transmembrane helices, signal peptides, and pathogen-host interaction (PHI) genes across the whole genome and enrich genes as well as functional domains to inactivate the host inflammation system and suppress the host defense. Phylogenomic analyses have revealed that genome sizes of Harpellales fungi vary among lineages with an integer-multiple pattern, which implies that ancient genome duplications may have occurred within the gut of insects. PMID:29764946

  2. Discovering novel subsystems using comparative genomics

    PubMed Central

    Ferrer, Luciana; Shearer, Alexander G.; Karp, Peter D.

    2011-01-01

    Motivation: Key problems for computational genomics include discovering novel pathways in genome data, and discovering functional interaction partners for genes to define new members of partially elucidated pathways. Results: We propose a novel method for the discovery of subsystems from annotated genomes. For each gene pair, a score measuring the likelihood that the two genes belong to a same subsystem is computed using genome context methods. Genes are then grouped based on these scores, and the resulting groups are filtered to keep only high-confidence groups. Since the method is based on genome context analysis, it relies solely on structural annotation of the genomes. The method can be used to discover new pathways, find missing genes from a known pathway, find new protein complexes or other kinds of functional groups and assign function to genes. We tested the accuracy of our method in Escherichia coli K-12. In one configuration of the system, we find that 31.6% of the candidate groups generated by our method match a known pathway or protein complex closely, and that we rediscover 31.2% of all known pathways and protein complexes of at least 4 genes. We believe that a significant proportion of the candidates that do not match any known group in E.coli K-12 corresponds to novel subsystems that may represent promising leads for future laboratory research. We discuss in-depth examples of these findings. Availability: Predicted subsystems are available at http://brg.ai.sri.com/pwy-discovery/journal.html. Contact: lferrer@ai.sri.com Supplementary information: Supplementary data are available at Bioinformatics online. PMID:21775308

  3. Comparative Genomics of Oral Isolates of Streptococcus mutans by in silico Genome Subtraction Does Not Reveal Accessory DNA Associated with Severe Early Childhood Caries

    PubMed Central

    Argimón, Silvia; Konganti, Kranti; Chen, Hao; Alekseyenko, Alexander V.; Brown, Stuart; Caufield, Page W.

    2014-01-01

    Comparative genomics is a popular method for the identification of microbial virulence determinants, especially since the sequencing of a large number of whole bacterial genomes from pathogenic and non-pathogenic strains has become relatively inexpensive. The bioinformatics pipelines for comparative genomics usually include gene prediction and annotation and can require significant computer power. To circumvent this, we developed a rapid method for genome-scale in silico subtractive hybridization, based on blastn and independent of feature identification and annotation. Whole genome comparisons by in silico genome subtraction were performed to identify genetic loci specific to Streptococcus mutans strains associated with severe early childhood caries (S-ECC), compared to strains isolated from caries-free (CF) children. The genome similarity of the 20 S. mutans strains included in this study, calculated by Simrank k-mer sharing, ranged from 79.5 to 90.9%, confirming this is a genetically heterogeneous group of strains. We identified strain-specific genetic elements in 19 strains, with sizes ranging from 200 bp to 39 kb. These elements contained protein-coding regions with functions mostly associated with mobile DNA. We did not, however, identify any genetic loci consistently associated with dental caries, i.e., shared by all the S-ECC strains and absent in the CF strains. Conversely, we did not identify any genetic loci specific with the healthy group. Comparison of previously published genomes from pathogenic and carriage strains of Neisseria meningitidis with our in silico genome subtraction yielded the same set of genes specific to the pathogenic strains, thus validating our method. Our results suggest that S. mutans strains derived from caries active or caries free dentitions cannot be differentiated based on the presence or absence of specific genetic elements. Our in silico genome subtraction method is available as the Microbial Genome Comparison (MGC) tool

  4. Reference genome sequence of the model plant Setaria

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bennetzen, Jeffrey L; Schmutz, Jeremy; Wang, Hao

    We generated a high-quality reference genome sequence for foxtail millet (Setaria italica). The ~400-Mb assembly covers ~80% of the genome and >95% of the gene space. The assembly was anchored to a 992-locus genetic map and was annotated by comparison with >1.3 million expressed sequence tag reads. We produced more than 580 million RNA-Seq reads to facilitate expression analyses. We also sequenced Setaria viridis, the ancestral wild relative of S. italica, and identified regions of differential single-nucleotide polymorphism density, distribution of transposable elements, small RNA content, chromosomal rearrangement and segregation distortion. The genus Setaria includes natural and cultivated species thatmore » demonstrate a wide capacity for adaptation. The genetic basis of this adaptation was investigated by comparing five sequenced grass genomes. We also used the diploid Setaria genome to evaluate the ongoing genome assembly of a related polyploid, switchgrass (Panicum virgatum).« less

  5. Reference genome sequence of the model plant Setaria.

    PubMed

    Bennetzen, Jeffrey L; Schmutz, Jeremy; Wang, Hao; Percifield, Ryan; Hawkins, Jennifer; Pontaroli, Ana C; Estep, Matt; Feng, Liang; Vaughn, Justin N; Grimwood, Jane; Jenkins, Jerry; Barry, Kerrie; Lindquist, Erika; Hellsten, Uffe; Deshpande, Shweta; Wang, Xuewen; Wu, Xiaomei; Mitros, Therese; Triplett, Jimmy; Yang, Xiaohan; Ye, Chu-Yu; Mauro-Herrera, Margarita; Wang, Lin; Li, Pinghua; Sharma, Manoj; Sharma, Rita; Ronald, Pamela C; Panaud, Olivier; Kellogg, Elizabeth A; Brutnell, Thomas P; Doust, Andrew N; Tuskan, Gerald A; Rokhsar, Daniel; Devos, Katrien M

    2012-05-13

    We generated a high-quality reference genome sequence for foxtail millet (Setaria italica). The ∼400-Mb assembly covers ∼80% of the genome and >95% of the gene space. The assembly was anchored to a 992-locus genetic map and was annotated by comparison with >1.3 million expressed sequence tag reads. We produced more than 580 million RNA-Seq reads to facilitate expression analyses. We also sequenced Setaria viridis, the ancestral wild relative of S. italica, and identified regions of differential single-nucleotide polymorphism density, distribution of transposable elements, small RNA content, chromosomal rearrangement and segregation distortion. The genus Setaria includes natural and cultivated species that demonstrate a wide capacity for adaptation. The genetic basis of this adaptation was investigated by comparing five sequenced grass genomes. We also used the diploid Setaria genome to evaluate the ongoing genome assembly of a related polyploid, switchgrass (Panicum virgatum).

  6. Genome-Wide Expression Profiling of Five Mouse Models Identifies Similarities and Differences with Human Psoriasis

    PubMed Central

    Swindell, William R.; Johnston, Andrew; Carbajal, Steve; Han, Gangwen; Wohn, Christian; Lu, Jun; Xing, Xianying; Nair, Rajan P.; Voorhees, John J.; Elder, James T.; Wang, Xiao-Jing; Sano, Shigetoshi; Prens, Errol P.; DiGiovanni, John; Pittelkow, Mark R.; Ward, Nicole L.; Gudjonsson, Johann E.

    2011-01-01

    Development of a suitable mouse model would facilitate the investigation of pathomechanisms underlying human psoriasis and would also assist in development of therapeutic treatments. However, while many psoriasis mouse models have been proposed, no single model recapitulates all features of the human disease, and standardized validation criteria for psoriasis mouse models have not been widely applied. In this study, whole-genome transcriptional profiling is used to compare gene expression patterns manifested by human psoriatic skin lesions with those that occur in five psoriasis mouse models (K5-Tie2, imiquimod, K14-AREG, K5-Stat3C and K5-TGFbeta1). While the cutaneous gene expression profiles associated with each mouse phenotype exhibited statistically significant similarity to the expression profile of psoriasis in humans, each model displayed distinctive sets of similarities and differences in comparison to human psoriasis. For all five models, correspondence to the human disease was strong with respect to genes involved in epidermal development and keratinization. Immune and inflammation-associated gene expression, in contrast, was more variable between models as compared to the human disease. These findings support the value of all five models as research tools, each with identifiable areas of convergence to and divergence from the human disease. Additionally, the approach used in this paper provides an objective and quantitative method for evaluation of proposed mouse models of psoriasis, which can be strategically applied in future studies to score strengths of mouse phenotypes relative to specific aspects of human psoriasis. PMID:21483750

  7. Functional and comparative genomics analyses of pmp22 in medaka fish

    PubMed Central

    Itou, Junji; Suyama, Mikita; Imamura, Yukio; Deguchi, Tomonori; Fujimori, Kazuhiro; Yuba, Shunsuke; Kawarabayasi, Yutaka; Kawasaki, Takashi

    2009-01-01

    Background Pmp22, a member of the junction protein family Claudin/EMP/PMP22, plays an important role in myelin formation. Increase of pmp22 transcription causes peripheral neuropathy, Charcot-Marie-Tooth disease type1A (CMT1A). The pathophysiological phenotype of CMT1A is aberrant axonal myelination which induces a reduction in nerve conduction velocity (NCV). Several CMT1A model rodents have been established by overexpressing pmp22. Thus, it is thought that pmp22 expression must be tightly regulated for correct myelin formation in mammals. Interestingly, the myelin sheath is also present in other jawed vertebrates. The purpose of this study is to analyze the evolutionary conservation of the association between pmp22 transcription level and vertebrate myelin formation, and to find the conserved non-coding sequences for pmp22 regulation by comparative genomics analyses between jawed fishes and mammals. Results A transgenic pmp22 over-expression medaka fish line was established. The transgenic fish had approximately one fifth the peripheral NCV values of controls, and aberrant myelination of transgenic fish in the peripheral nerve system (PNS) was observed. We successfully confirmed that medaka fish pmp22 has the same exon-intron structure as mammals, and identified some known conserved regulatory motifs. Furthermore, we found novel conserved sequences in the first intron and 3'UTR. Conclusion Medaka fish undergo abnormalities in the PNS when pmp22 transcription increases. This result indicates that an adequate pmp22 transcription level is necessary for correct myelination of jawed vertebrates. Comparison of pmp22 orthologs between distantly related species identifies evolutionary conserved sequences that contribute to precise regulation of pmp22 expression. PMID:19534778

  8. Comparative chloroplast genomics: Analyses including new sequencesfrom the angiosperms Nuphar advena and Ranunculus macranthus

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Raubeso, Linda A.; Peery, Rhiannon; Chumley, Timothy W.

    2007-03-01

    The number of completely sequenced plastid genomes available is growing rapidly. This new array of sequences presents new opportunities to perform comparative analyses. In comparative studies, it is most useful to compare across wide phylogenetic spans and, within angiosperms, to include representatives from basally diverging lineages such as the new genomes reported here: Nuphar advena (from a basal-most lineage) and Ranunculus macranthus (from the basal group of eudicots). We report these two new plastid genome sequences and make comparisons (within angiosperms, seed plants, or all photosynthetic lineages) to evaluate features such as the status of ycf15 and ycf68 as proteinmore » coding genes, the distribution of simple sequence repeats (SSRs) and longer dispersed repeats (SDR), and patterns of nucleotide composition.« less

  9. Genomic analyses and expression evaluation of thaumatin-like gene family in the cacao fungal pathogen Moniliophthora perniciosa.

    PubMed

    Franco, Sulamita de Freitas; Baroni, Renata Moro; Carazzolle, Marcelo Falsarella; Teixeira, Paulo José Pereira Lima; Reis, Osvaldo; Pereira, Gonçalo Amarante Guimarães; Mondego, Jorge Maurício Costa

    2015-10-30

    Thaumatin-like proteins (TLPs) are found in diverse eukaryotes. Plant TLPs, known as Pathogenicity Related Protein (PR-5), are considered fungal inhibitors. However, genes encoding TLPs are frequently found in fungal genomes. In this work, we have identified that Moniliophthora perniciosa, a basidiomycete pathogen that causes the Witches' Broom Disease (WBD) of cacao, presents thirteen putative TLPs from which four are expressed during WBD progression. One of them is similar to small TLPs, which are present in phytopathogenic basidiomycete, such as wheat stem rust fungus Puccinia graminis. Fungi genomes annotation and phylogenetic data revealed a larger number of TLPs in basidiomycetes when comparing with ascomycetes, suggesting that these proteins could be involved in specific traits of mushroom-forming species. Based on the present data, we discuss the contribution of TLPs in the combat against fungal competitors and hypothesize a role of these proteins in M. perniciosa pathogenicity. Copyright © 2015 Elsevier Inc. All rights reserved.

  10. Genome-wide identification, phylogeny, and gonadal expression of fox genes in Nile tilapia, Oreochromis niloticus.

    PubMed

    Yuan, Jing; Tao, Wenjing; Cheng, Yunying; Huang, Baofeng; Wang, Deshou

    2014-08-01

    The fox genes play important roles in various biological processes, including sexual development. In the present study, we isolated 65 fox genes, belonging to 18 subfamilies named A-R, from Nile tilapia through genome-wide screening. Twenty-four of them have two or three (foxm1) copies. Furthermore, 16, 25, 68, and 45 fox members were isolated from nematodes, protochordates, teleosts, and tetrapods, respectively. Phylogenetic analyses indicated fox gene family had undergone three expansions parallel to the three rounds of genome duplication during evolution. We also analyzed the clustered fox genes and found that apparent linkage duplication existed in teleosts, which further supported fish-specific genome duplication hypothesis. In addition, species- and lineage-specific duplication is another reason for fox gene family expansion. Based on the four pairs of XX and XY gonadal transcriptome data from four critical developmental stages, we analyzed the expression profile of all fox genes and identified sexually dimorphic fox genes at each stage. All fox genes were detected in gonads, with 15 of them at the background expression level (total read per kb per million reads, RPKM < 10), 29 at moderate expression level (10 < total RPKM < 100), and 21 at high expression level (total RPKM > 100). There are 27, 24, 28, and 9 sexually dimorphic fox genes at 5, 30, 90, and 180 days after hatching (dah), respectively. foxq1a, foxf1, foxr1, and foxr1 were identified as the most differentially expressed genes at each stage. foxl2 was characterized as XX-dominant gene, while foxd5, foxi3, foxn3, foxj1a, foxj3b, and foxo6b were characterized as XY-dominant genes. qPCR and in situ hybridization of foxh1 and foxj1a were performed to confirm the expression profiles and to validate the transcriptome data. Our results suggest that fox genes might play important roles in sex determination and gonadal development in teleosts.

  11. Germline Cas9 expression yields highly efficient genome engineering in a major worldwide disease vector, Aedes aegypti

    PubMed Central

    Li, Ming; Bui, Michelle; Yang, Ting; Bowman, Christian S.; White, Bradley J.; Akbari, Omar S.

    2017-01-01

    The development of CRISPR/Cas9 technologies has dramatically increased the accessibility and efficiency of genome editing in many organisms. In general, in vivo germline expression of Cas9 results in substantially higher activity than embryonic injection. However, no transgenic lines expressing Cas9 have been developed for the major mosquito disease vector Aedes aegypti. Here, we describe the generation of multiple stable, transgenic Ae. aegypti strains expressing Cas9 in the germline, resulting in dramatic improvements in both the consistency and efficiency of genome modifications using CRISPR. Using these strains, we disrupted numerous genes important for normal morphological development, and even generated triple mutants from a single injection. We have also managed to increase the rates of homology-directed repair by more than an order of magnitude. Given the exceptional mutagenic efficiency and specificity of the Cas9 strains we engineered, they can be used for high-throughput reverse genetic screens to help functionally annotate the Ae. aegypti genome. Additionally, these strains represent a step toward the development of novel population control technologies targeting Ae. aegypti that rely on Cas9-based gene drives. PMID:29138316

  12. Germline Cas9 expression yields highly efficient genome engineering in a major worldwide disease vector, Aedes aegypti.

    PubMed

    Li, Ming; Bui, Michelle; Yang, Ting; Bowman, Christian S; White, Bradley J; Akbari, Omar S

    2017-12-05

    The development of CRISPR/Cas9 technologies has dramatically increased the accessibility and efficiency of genome editing in many organisms. In general, in vivo germline expression of Cas9 results in substantially higher activity than embryonic injection. However, no transgenic lines expressing Cas9 have been developed for the major mosquito disease vector Aedes aegypti Here, we describe the generation of multiple stable, transgenic Ae. aegypti strains expressing Cas9 in the germline, resulting in dramatic improvements in both the consistency and efficiency of genome modifications using CRISPR. Using these strains, we disrupted numerous genes important for normal morphological development, and even generated triple mutants from a single injection. We have also managed to increase the rates of homology-directed repair by more than an order of magnitude. Given the exceptional mutagenic efficiency and specificity of the Cas9 strains we engineered, they can be used for high-throughput reverse genetic screens to help functionally annotate the Ae. aegypti genome. Additionally, these strains represent a step toward the development of novel population control technologies targeting Ae. aegypti that rely on Cas9-based gene drives. Copyright © 2017 the Author(s). Published by PNAS.

  13. Comparative genome analysis reveals a conserved family of actin-like proteins in apicomplexan parasites

    PubMed Central

    Gordon, Jennifer L; Sibley, L David

    2005-01-01

    Background The phylum Apicomplexa is an early-branching eukaryotic lineage that contains a number of important human and animal pathogens. Their complex life cycles and unique cytoskeletal features distinguish them from other model eukaryotes. Apicomplexans rely on actin-based motility for cell invasion, yet the regulation of this system remains largely unknown. Consequently, we focused our efforts on identifying actin-related proteins in the recently completed genomes of Toxoplasma gondii, Plasmodium spp., Cryptosporidium spp., and Theileria spp. Results Comparative genomic and phylogenetic studies of apicomplexan genomes reveals that most contain only a single conventional actin and yet they each have 8–10 additional actin-related proteins. Among these are a highly conserved Arp1 protein (likely part of a conserved dynactin complex), and Arp4 and Arp6 homologues (subunits of the chromatin-remodeling machinery). In contrast, apicomplexans lack canonical Arp2 or Arp3 proteins, suggesting they lost the Arp2/3 actin polymerization complex on their evolutionary path towards intracellular parasitism. Seven of these actin-like proteins (ALPs) are novel to apicomplexans. They show no phylogenetic associations to the known Arp groups and likely serve functions specific to this important group of intracellular parasites. Conclusion The large diversity of actin-like proteins in apicomplexans suggests that the actin protein family has diverged to fulfill various roles in the unique biology of intracellular parasites. Conserved Arps likely participate in vesicular transport and gene expression, while apicomplexan-specific ALPs may control unique biological traits such as actin-based gliding motility. PMID:16343347

  14. Comparative Genome Analysis Between Aspergillus oryzae Strains Reveals Close Relationship Between Sites of Mutation Localization and Regions of Highly Divergent Genes among Aspergillus Species

    PubMed Central

    Umemura, Myco; Koike, Hideaki; Yamane, Noriko; Koyama, Yoshinori; Satou, Yuki; Kikuzato, Ikuya; Teruya, Morimi; Tsukahara, Masatoshi; Imada, Yumi; Wachi, Youji; Miwa, Yukino; Yano, Shuichi; Tamano, Koichi; Kawarabayasi, Yutaka; Fujimori, Kazuhiro E.; Machida, Masayuki; Hirano, Takashi

    2012-01-01

    Aspergillus oryzae has been utilized for over 1000 years in Japan for the production of various traditional foods, and a large number of A. oryzae strains have been isolated and/or selected for the effective fermentation of food ingredients. Characteristics of genetic alterations among the strains used are of particular interest in studies of A. oryzae. Here, we have sequenced the whole genome of an industrial fungal isolate, A. oryzae RIB326, by using a next-generation sequencing system and compared the data with those of A. oryzae RIB40, a wild-type strain sequenced in 2005. The aim of this study was to evaluate the mutation pressure on the non-syntenic blocks (NSBs) of the genome, which were previously identified through comparative genomic analysis of A. oryzae, Aspergillus fumigatus, and Aspergillus nidulans. We found that genes within the NSBs of RIB326 accumulate mutations more frequently than those within the SBs, regardless of their distance from the telomeres or of their expression level. Our findings suggest that the high mutation frequency of NSBs might contribute to maintaining the diversity of the A. oryzae genome. PMID:22912434

  15. Comparative genome analysis between Aspergillus oryzae strains reveals close relationship between sites of mutation localization and regions of highly divergent genes among Aspergillus species.

    PubMed

    Umemura, Myco; Koike, Hideaki; Yamane, Noriko; Koyama, Yoshinori; Satou, Yuki; Kikuzato, Ikuya; Teruya, Morimi; Tsukahara, Masatoshi; Imada, Yumi; Wachi, Youji; Miwa, Yukino; Yano, Shuichi; Tamano, Koichi; Kawarabayasi, Yutaka; Fujimori, Kazuhiro E; Machida, Masayuki; Hirano, Takashi

    2012-10-01

    Aspergillus oryzae has been utilized for over 1000 years in Japan for the production of various traditional foods, and a large number of A. oryzae strains have been isolated and/or selected for the effective fermentation of food ingredients. Characteristics of genetic alterations among the strains used are of particular interest in studies of A. oryzae. Here, we have sequenced the whole genome of an industrial fungal isolate, A. oryzae RIB326, by using a next-generation sequencing system and compared the data with those of A. oryzae RIB40, a wild-type strain sequenced in 2005. The aim of this study was to evaluate the mutation pressure on the non-syntenic blocks (NSBs) of the genome, which were previously identified through comparative genomic analysis of A. oryzae, Aspergillus fumigatus, and Aspergillus nidulans. We found that genes within the NSBs of RIB326 accumulate mutations more frequently than those within the SBs, regardless of their distance from the telomeres or of their expression level. Our findings suggest that the high mutation frequency of NSBs might contribute to maintaining the diversity of the A. oryzae genome.

  16. Genome sequencing and comparative genomics reveal a repertoire of putative pathogenicity genes in chilli anthracnose fungus Colletotrichum truncatum.

    PubMed

    Rao, Soumya; Nandineni, Madhusudan R

    2017-01-01

    Colletotrichum truncatum, a major fungal phytopathogen, causes the anthracnose disease on an economically important spice crop chilli (Capsicum annuum), resulting in huge economic losses in tropical and sub-tropical countries. It follows a subcuticular intramural infection strategy on chilli with a short, asymptomatic, endophytic phase, which contrasts with the intracellular hemibiotrophic lifestyle adopted by most of the Colletotrichum species. However, little is known about the molecular determinants and the mechanism of pathogenicity in this fungus. A high quality whole genome sequence and gene annotation based on transcriptome data of an Indian isolate of C. truncatum from chilli has been obtained. Analysis of the genome sequence revealed a rich repertoire of pathogenicity genes in C. truncatum encoding secreted proteins, effectors, plant cell wall degrading enzymes, secondary metabolism associated proteins, with potential roles in the host-specific infection strategy, placing it next only to the Fusarium species. The size of genome assembly, number of predicted genes and some of the functional categories were similar to other sequenced Colletotrichum species. The comparative genomic analyses with other species and related fungi identified some unique genes and certain highly expanded gene families of CAZymes, proteases and secondary metabolism associated genes in the genome of C. truncatum. The draft genome assembly and functional annotation of potential pathogenicity genes of C. truncatum provide an important genomic resource for understanding the biology and lifestyle of this important phytopathogen and will pave the way for designing efficient disease control regimens.

  17. Genome sequencing and comparative genomics reveal a repertoire of putative pathogenicity genes in chilli anthracnose fungus Colletotrichum truncatum

    PubMed Central

    Rao, Soumya

    2017-01-01

    Colletotrichum truncatum, a major fungal phytopathogen, causes the anthracnose disease on an economically important spice crop chilli (Capsicum annuum), resulting in huge economic losses in tropical and sub-tropical countries. It follows a subcuticular intramural infection strategy on chilli with a short, asymptomatic, endophytic phase, which contrasts with the intracellular hemibiotrophic lifestyle adopted by most of the Colletotrichum species. However, little is known about the molecular determinants and the mechanism of pathogenicity in this fungus. A high quality whole genome sequence and gene annotation based on transcriptome data of an Indian isolate of C. truncatum from chilli has been obtained. Analysis of the genome sequence revealed a rich repertoire of pathogenicity genes in C. truncatum encoding secreted proteins, effectors, plant cell wall degrading enzymes, secondary metabolism associated proteins, with potential roles in the host-specific infection strategy, placing it next only to the Fusarium species. The size of genome assembly, number of predicted genes and some of the functional categories were similar to other sequenced Colletotrichum species. The comparative genomic analyses with other species and related fungi identified some unique genes and certain highly expanded gene families of CAZymes, proteases and secondary metabolism associated genes in the genome of C. truncatum. The draft genome assembly and functional annotation of potential pathogenicity genes of C. truncatum provide an important genomic resource for understanding the biology and lifestyle of this important phytopathogen and will pave the way for designing efficient disease control regimens. PMID:28846714

  18. Expression of Genomic Functional Estrogen Receptor 1 in Mouse Sertoli Cells

    PubMed Central

    Lin, Jing; Zhu, Jia; Li, Xian; Li, Shengqiang; Lan, Zijian; Ko, Jay

    2014-01-01

    There is no consensus whether Sertoli cells express estrogen receptor 1 (Esr1). Reverse transcription-polymerase chain reaction, Western blot, and immunofluorescence demonstrated that mouse Sertoli cell lines, TM4, MSC-1, and 15P-1, and purified primary mouse Sertoli cells (PSCs) contained Esr1 messenger RNA and proteins. Incubation of Sertoli cells with 17β-estradiol (E2) or ESR1 agonist stimulated the expression of an estrogen responsive gene Greb1, which was prevented by ESR inhibitor or ESR1 antagonist. Overexpression of Esr1 in MSC-1 enhanced E2-induced Greb1 expression, while knockdown of Esr1 by small interfering RNA in TM4 attenuated the response. Furthermore, E2-induced Greb1 expression was abolished in the PSCs isolated from Amh-Cre/Esr1-floxed mice in which Esr1 in Sertoli cells were selectively deleted. Chromatin immunoprecipitation assays indicated that E2-induced Greb1 expression in Sertoli cells was mediated by binding of ESR1 to estrogen responsive elements. In summary, ligand-dependent nuclear ESR1 was present in mouse Sertoli cells and mediates a classical genomic action of estrogens. PMID:24615934

  19. Comparing sequencing assays and human-machine analyses in actionable genomics for glioblastoma.

    PubMed

    Wrzeszczynski, Kazimierz O; Frank, Mayu O; Koyama, Takahiko; Rhrissorrakrai, Kahn; Robine, Nicolas; Utro, Filippo; Emde, Anne-Katrin; Chen, Bo-Juen; Arora, Kanika; Shah, Minita; Vacic, Vladimir; Norel, Raquel; Bilal, Erhan; Bergmann, Ewa A; Moore Vogel, Julia L; Bruce, Jeffrey N; Lassman, Andrew B; Canoll, Peter; Grommes, Christian; Harvey, Steve; Parida, Laxmi; Michelini, Vanessa V; Zody, Michael C; Jobanputra, Vaidehi; Royyuru, Ajay K; Darnell, Robert B

    2017-08-01

    To analyze a glioblastoma tumor specimen with 3 different platforms and compare potentially actionable calls from each. Tumor DNA was analyzed by a commercial targeted panel. In addition, tumor-normal DNA was analyzed by whole-genome sequencing (WGS) and tumor RNA was analyzed by RNA sequencing (RNA-seq). The WGS and RNA-seq data were analyzed by a team of bioinformaticians and cancer oncologists, and separately by IBM Watson Genomic Analytics (WGA), an automated system for prioritizing somatic variants and identifying drugs. More variants were identified by WGS/RNA analysis than by targeted panels. WGA completed a comparable analysis in a fraction of the time required by the human analysts. The development of an effective human-machine interface in the analysis of deep cancer genomic datasets may provide potentially clinically actionable calls for individual patients in a more timely and efficient manner than currently possible. NCT02725684.

  20. Comparative genomics analyses revealed two virulent Listeria monocytogenes strains isolated from ready-to-eat food.

    PubMed

    Lim, Shu Yong; Yap, Kien-Pong; Thong, Kwai Lin

    2016-01-01

    Listeria monocytogenes is an important foodborne pathogen that causes considerable morbidity in humans with high mortality rates. In this study, we have sequenced the genomes and performed comparative genomics analyses on two strains, LM115 and LM41, isolated from ready-to-eat food in Malaysia. The genome size of LM115 and LM41 was 2,959,041 and 2,963,111 bp, respectively. These two strains shared approximately 90% homologous genes. Comparative genomics and phylogenomic analyses revealed that LM115 and LM41 were more closely related to the reference strains F2365 and EGD-e, respectively. Our virulence profiling indicated a total of 31 virulence genes shared by both analysed strains. These shared genes included those that encode for internalins and L. monocytogenes pathogenicity island 1 (LIPI-1). Both the Malaysian L. monocytogenes strains also harboured several genes associated with stress tolerance to counter the adverse conditions. Seven antibiotic and efflux pump related genes which may confer resistance against lincomycin, erythromycin, fosfomycin, quinolone, tetracycline, and penicillin, and macrolides were identified in the genomes of both strains. Whole genome sequencing and comparative genomics analyses revealed two virulent L. monocytogenes strains isolated from ready-to-eat foods in Malaysia. The identification of strains with pathogenic, persistent, and antibiotic resistant potentials from minimally processed food warrant close attention from both healthcare and food industry.