Science.gov

Sample records for comparing genomic expression

  1. Genome-wide identification and comparative expression analysis of LEA genes in watermelon and melon genomes.

    PubMed

    Celik Altunoglu, Yasemin; Baloglu, Mehmet Cengiz; Baloglu, Pinar; Yer, Esra Nurten; Kara, Sibel

    2017-01-01

    Late embryogenesis abundant (LEA) proteins are a large and diverse group of polypeptides that were first identified during seed dehydration and then in vegetative plant tissues during different stress responses. Gene family members of LEA proteins have since been detected in various organisms. However, there was no report for this protein family in watermelon and melon until this study. A total of 73 LEA genes from watermelon (ClLEA) and 61 LEA genes from melon (CmLEA) were identified in this comprehensive study. They were classified into four and three distinct clusters in watermelon and melon, respectively. There was a correlation between gene structure and motif composition within each LEA group. Segmental duplication played an important role in LEA gene expansion in watermelon. Maximum gene ontology of LEA genes was observed with poplar LEA genes. For evaluation of tissue-specific expression patterns of ClLEA and CmLEA genes, publicly available RNA-seq data were analyzed. The expression of selected LEA genes in root and leaf tissues of drought-stressed watermelon and melon was examined using qRT-PCR. Among them, ClLEA-12-17-46 genes were quickly induced after drought application. Therefore, they might be considered early response genes for water limitation conditions in watermelon. In addition, CmLEA-42-43 genes were found to be up-regulated in both tissues of melon under drought stress. Our results can open up new frontiers in understanding the functions of these important family members under normal developmental stages and stress conditions by bioinformatics and transcriptomic approaches.

  2. Comparative analysis and "expression space" coverage of the production of prokaryotic membrane proteins for structural genomics.

    PubMed

    Surade, Sachin; Klein, Markus; Stolt-Bergner, Peggy C; Muenke, Cornelia; Roy, Ankita; Michel, Hartmut

    2006-09-01

    Membrane proteins comprise up to one-third of prokaryotic and eukaryotic genomes, but only a very small number of membrane protein structures are known. Membrane proteins are challenging targets for structural biology, primarily due to the difficulty in producing and purifying milligram quantities of these proteins. We are evaluating different methods to produce and purify large numbers of prokaryotic membrane proteins for subsequent structural and functional analysis. Here, we present the comparative expression data for 37 target proteins, all of them secondary transporters, from the mesophilic organism Salmonella typhimurium and the two hyperthermophilic organisms Aquifex aeolicus and Pyrococcus furiosus in three different Escherichia coli expression vectors. In addition, we study the use of Lactococcus lactis as a host for integral membrane protein expression. Overall, 78% of the targets were successfully produced under at least one set of conditions. Analysis of these results allows us to assess the role of different variables in increasing "expression space" coverage for our set of targets. This analysis implies that to maximize the number of nonhomologous targets that are expressed, orthologous targets should be chosen and tested in two vectors with different types of promoters, using C-terminal tags. In addition, E. coli is shown to be a robust host for the expression of prokaryotic transporters, and is superior to L. lactis. These results therefore suggest appropriate strategies for high-throughput heterologous overproduction of membrane proteins.
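The coverage figure quoted in this abstract counts a target as produced if it expresses under at least one set of conditions. A minimal sketch of that tally follows; the target names, vector names, and outcomes are invented for illustration and are not data from the study.

```python
from typing import Dict


def coverage(results: Dict[str, Dict[str, bool]]) -> float:
    """Fraction of targets expressed under at least one tested condition."""
    if not results:
        return 0.0
    produced = sum(1 for outcomes in results.values() if any(outcomes.values()))
    return produced / len(results)


# Hypothetical outcomes: target -> {condition: was the protein produced?}
example = {
    "target_A": {"vector_1": True, "vector_2": False},
    "target_B": {"vector_1": False, "vector_2": False},
    "target_C": {"vector_1": False, "vector_2": True},
    "target_D": {"vector_1": True, "vector_2": True},
}

print(f"coverage = {coverage(example):.0%}")
```

Adding a column (another vector or host) can only keep the coverage the same or raise it, which is one way to read the abstract's advice to test targets in more than one vector.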

  3. Comparative evaluation of genome-wide gene expression profiles in ruptured and unruptured human intracranial aneurysms.

    PubMed

    Marchese, Enrico; Vignati, A; Albanese, A; Nucci, C G; Sabatino, G; Tirpakova, B; Lofrese, G; Zelano, G; Maira, G

    2010-01-01

    Few studies have evaluated the over- or underexpression of genes directly in samples of aneurysmal wall and extracranial pericranial vascular tissue to investigate the genetic influence in formation and rupture of intracranial aneurysms. We present the results obtained using the DNA microarray technique analysis on sample tissues collected during surgery. We collected and analyzed 12 aneurysmal and 9 peripheral artery (superficial temporal artery (STA) and middle meningeal artery (MMA)) specimens from ruptured aneurysm group patients (13 cases), 10 aneurysmal and 12 STA and MMA samples from unruptured aneurysm group patients (14 cases), and 5 STA and MMA specimens from control group patients (4 cases). Total RNA was isolated from samples and subjected to cDNA microarray analysis with the use of the human genome U133A GeneChip oligonucleotide microarray (Affymetrix, Santa Clara, CA), which allows the analysis of 14,500 genes at the same time. For genes of interest, real-time RT-PCR was performed to confirm their expression level. Regarding ruptured aneurysms, genes were identified showing differential expression (overexpressed or downregulated) pertaining to specific pathways, particularly those for the structural proteins of the extracellular matrix, members of the matrix metalloproteinase (MMP) family (which were overexpressed) and genes involved in apoptotic phenomena. Particularly, real-time RT-PCR analysis confirmed the upregulation of MMP-2, MMP-9 and pro-apoptotic genes, such as Fas, Bax and Bid, and the downregulation of anti-apoptotic genes, such as Bcl-X(L) and Bcl-2. In a compared analyses of ruptured vs unruptured

  4. Functional analysis and comparative genomics of expressed sequence tags from the lycophyte Selaginella moellendorffii

    PubMed Central

    Weng, Jing-Ke; Tanurdzic, Milos; Chapple, Clint

    2005-01-01

    Background: The lycophyte Selaginella moellendorffii is a member of one of the oldest lineages of vascular plants on Earth. Fossil records show that the lycophyte clade arose 400 million years ago, 150–200 million years earlier than angiosperms, a group of plants that includes the well-studied flowering plant Arabidopsis thaliana. S. moellendorffii has a genome size of approximately 100 Mbp, as small as or smaller than that of A. thaliana. S. moellendorffii has the potential to provide significant comparative information to better understand the evolution of vascular plants. Results: We sequenced 2181 Expressed Sequence Tags (ESTs) from a S. moellendorffii cDNA library. One thousand three hundred and one non-redundant sequences were assembled, containing 291 contigs and 1010 singletons. Approximately 75% of the ESTs matched proteins in the non-redundant protein database. Among 1301 clusters, 343 were categorized according to Gene Ontology (GO) hierarchy and were compared to the GO mapping of A. thaliana tentative consensus sequences. We compared S. moellendorffii ESTs to the A. thaliana and Physcomitrella patens EST databases, using the tBLASTX algorithm. Approximately 60% of the ESTs exhibited similarity with both A. thaliana and P. patens ESTs, whereas 13% and 1% of the ESTs had exclusive similarity with A. thaliana and P. patens ESTs, respectively. A substantial proportion of the ESTs (26%) had no match with A. thaliana or P. patens ESTs. Conclusion: We discovered 1301 putative unigenes in S. moellendorffii. These results give an initial insight into its transcriptome that will aid in the study of the S. moellendorffii genome in the near future. PMID:15938755
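The counts in this abstract can be cross-checked with a few lines of arithmetic. The sketch below uses only numbers stated in the abstract. The average contig depth at the end is a rough estimate that assumes every sequenced EST entered the assembly and that each singleton is exactly one EST; the abstract confirms neither.

```python
# Figures quoted in the abstract.
total_ests = 2181
contigs = 291
singletons = 1010

# Contigs plus singletons should equal the reported non-redundant set.
unigenes = contigs + singletons
print("non-redundant sequences:", unigenes)

# The four similarity classes (approximate percentages) should partition the ESTs.
similarity_percent = {
    "both A. thaliana and P. patens": 60,
    "A. thaliana only": 13,
    "P. patens only": 1,
    "neither": 26,
}
percent_total = sum(similarity_percent.values())
print("similarity classes sum to", percent_total, "percent")

# Assumption (not stated in the abstract): all ESTs were assembled and each
# singleton is a single EST, so the remaining ESTs sit in contigs.
ests_in_contigs = total_ests - singletons
mean_ests_per_contig = ests_in_contigs / contigs
print(f"estimated mean ESTs per contig: {mean_ests_per_contig:.2f}")
```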

  5. Comparing genomic expression patterns across plant species reveals highly diverged transcriptional dynamics in response to salt stress

    USDA-ARS's Scientific Manuscript database

    Rice and barley are both members of Poaceae (grass family) but have a marked difference in salt tolerance. The molecular mechanism underlying this difference was previously unexplored. This study employs a comparative genomics approach to identify analogous and contrasting gene expression patterns b...

  6. Carotenoid biosynthetic genes in Brassica rapa: comparative genomic analysis, phylogenetic analysis, and expression profiling.

    PubMed

    Li, Peirong; Zhang, Shujiang; Zhang, Shifan; Li, Fei; Zhang, Hui; Cheng, Feng; Wu, Jian; Wang, Xiaowu; Sun, Rifei

    2015-07-03

    Carotenoids are isoprenoid compounds synthesized by all photosynthetic organisms. Despite much research on carotenoid biosynthesis in the model plant Arabidopsis thaliana, there is a lack of information on the carotenoid pathway in Brassica rapa. To better understand its carotenoid biosynthetic pathway, we performed a systematic analysis of carotenoid biosynthetic genes at the genome level in B. rapa. We identified 67 carotenoid biosynthetic genes in B. rapa, which were orthologs of the 47 carotenoid genes in A. thaliana. A high level of synteny was observed for carotenoid biosynthetic genes between A. thaliana and B. rapa. Out of 47 carotenoid biosynthetic genes in A. thaliana, 46 were successfully mapped to the 10 B. rapa chromosomes, and most of the genes retained more than one copy in B. rapa. The gene expansion was caused by the whole-genome triplication (WGT) event experienced by Brassica species. An expression analysis of the carotenoid biosynthetic genes suggested that their expression levels differed in root, stem, leaf, flower, callus, and silique tissues. Additionally, the paralogs of each carotenoid biosynthetic gene, which were generated from the WGT in B. rapa, showed significantly different expression levels among tissues, suggesting differentiated functions for these multi-copy genes in the carotenoid pathway. This first systematic study of carotenoid biosynthetic genes in B. rapa provides insights into the carotenoid metabolic mechanisms of Brassica crops. In addition, a better understanding of carotenoid biosynthetic genes in B. rapa will contribute to the development of conventional and transgenic B. rapa cultivars with enriched carotenoid levels in the future.

  7. Expression and comparative genomics of two serum response factor genes in zebrafish.

    PubMed

    Davis, Jody L; Long, Xiaochun; Georger, Mary A; Scott, Ian C; Rich, Adam; Miano, Joseph M

    2008-01-01

    Serum response factor (SRF) is a single copy, highly conserved transcription factor that governs the expression of hundreds of genes involved with actin cytoskeletal organization, cellular growth and signaling, neuronal circuitry and muscle differentiation. Zebrafish have emerged as a facile and inexpensive vertebrate model to delineate gene expression, regulation, and function, and yet the study of SRF in this animal has been virtually unexplored. Here, we report the existence of two srf genes in zebrafish, with partially overlapping patterns of expression in 3 and 7 day old developing animals. The mammalian ortholog (srf1) encodes for a 520 amino acid protein expressed in adult vascular and visceral smooth muscle cells, cardiac and skeletal muscle, as well as neuronal cells. The second zebrafish srf gene (srf2), encoding for a presumptive protein of only 314 amino acids, is transcribed at lower levels and appears to be less widely expressed across adult tissues. Both srf genes are induced by the SRF coactivator myocardin and attenuated with a short hairpin RNA to mammalian SRF. Promoter studies with srf1 reveal conserved CArG boxes that are the targets of SRF-myocardin in embryonic zebrafish cells. These results reveal that SRF was duplicated in the zebrafish genome and that its protein expression in all three muscle cell types is highly conserved across vertebrate animals suggesting an ancient code for transcriptional regulation of genes unique to muscle cell lineages.

  8. Ensembl comparative genomics resources

    PubMed Central

    Muffato, Matthieu; Beal, Kathryn; Fitzgerald, Stephen; Gordon, Leo; Pignatelli, Miguel; Vilella, Albert J.; Searle, Stephen M. J.; Amode, Ridwan; Brent, Simon; Spooner, William; Kulesha, Eugene; Yates, Andrew; Flicek, Paul

    2016-01-01

    Evolution provides the unifying framework with which to understand biology. The coherent investigation of genic and genomic data often requires comparative genomics analyses based on whole-genome alignments, sets of homologous genes and other relevant datasets in order to evaluate and answer evolutionary-related questions. However, the complexity and computational requirements of producing such data are substantial: this has led to only a small number of reference resources that are used for most comparative analyses. The Ensembl comparative genomics resources are one such reference set that facilitates comprehensive and reproducible analysis of chordate genome data. Ensembl computes pairwise and multiple whole-genome alignments from which large-scale synteny, per-base conservation scores and constrained elements are obtained. Gene alignments are used to define Ensembl Protein Families, GeneTrees and homologies for both protein-coding and non-coding RNA genes. These resources are updated frequently and have a consistent informatics infrastructure and data presentation across all supported species. Specialized web-based visualizations are also available including synteny displays, collapsible gene tree plots, a gene family locator and different alignment views. The Ensembl comparative genomics infrastructure is extensively reused for the analysis of non-vertebrate species by other projects including Ensembl Genomes and Gramene and much of the information here is relevant to these projects. The consistency of the annotation across species and the focus on vertebrates makes Ensembl an ideal system to perform and support vertebrate comparative genomic analyses. We use robust software and pipelines to produce reference comparative data and make it freely available. Database URL: http://www.ensembl.org. PMID:26896847

  9. Ensembl comparative genomics resources.

    PubMed

    Herrero, Javier; Muffato, Matthieu; Beal, Kathryn; Fitzgerald, Stephen; Gordon, Leo; Pignatelli, Miguel; Vilella, Albert J; Searle, Stephen M J; Amode, Ridwan; Brent, Simon; Spooner, William; Kulesha, Eugene; Yates, Andrew; Flicek, Paul

    2016-01-01

    Evolution provides the unifying framework with which to understand biology. The coherent investigation of genic and genomic data often requires comparative genomics analyses based on whole-genome alignments, sets of homologous genes and other relevant datasets in order to evaluate and answer evolutionary-related questions. However, the complexity and computational requirements of producing such data are substantial: this has led to only a small number of reference resources that are used for most comparative analyses. The Ensembl comparative genomics resources are one such reference set that facilitates comprehensive and reproducible analysis of chordate genome data. Ensembl computes pairwise and multiple whole-genome alignments from which large-scale synteny, per-base conservation scores and constrained elements are obtained. Gene alignments are used to define Ensembl Protein Families, GeneTrees and homologies for both protein-coding and non-coding RNA genes. These resources are updated frequently and have a consistent informatics infrastructure and data presentation across all supported species. Specialized web-based visualizations are also available including synteny displays, collapsible gene tree plots, a gene family locator and different alignment views. The Ensembl comparative genomics infrastructure is extensively reused for the analysis of non-vertebrate species by other projects including Ensembl Genomes and Gramene and much of the information here is relevant to these projects. The consistency of the annotation across species and the focus on vertebrates makes Ensembl an ideal system to perform and support vertebrate comparative genomic analyses. We use robust software and pipelines to produce reference comparative data and make it freely available. Database URL: http://www.ensembl.org. © The Author(s) 2016. Published by Oxford University Press.

  10. A comparative genomics screen identifies a Sinorhizobium meliloti 1021 sodM-like gene strongly expressed within host plant nodules

    PubMed Central

    2012-01-01

    Background: We have used the genomic data in the Integrated Microbial Genomes system of the Department of Energy’s Joint Genome Institute to make predictions about rhizobial open reading frames that play a role in nodulation of host plants. The genomic data was screened by searching for ORFs conserved in α-proteobacterial rhizobia, but not conserved in closely-related non-nitrogen-fixing α-proteobacteria. Results: Using this approach, we identified many genes known to be involved in nodulation or nitrogen fixation, as well as several new candidate genes. We knocked out selected new genes and assayed for the presence of nodulation phenotypes and/or nodule-specific expression. One of these genes, SMc00911, is strongly expressed by bacterial cells within host plant nodules, but is expressed minimally by free-living bacterial cells. A strain carrying an insertion mutation in SMc00911 is not defective in the symbiosis with host plants, but in contrast to expectations, this mutant strain is able to out-compete the S. meliloti 1021 wild type strain for nodule occupancy in co-inoculation experiments. The SMc00911 ORF is predicted to encode a “SodM-like” (superoxide dismutase-like) protein containing a rhodanese sulfurtransferase domain at the N-terminus and a chromate-resistance superfamily domain at the C-terminus. Several other ORFs (SMb20360, SMc01562, SMc01266, SMc03964, and the SMc01424-22 operon) identified in the screen are expressed at a moderate level by bacteria within nodules, but not by free-living bacteria. Conclusions: Based on the analysis of ORFs identified in this study, we conclude that this comparative genomics approach can identify rhizobial genes involved in the nitrogen-fixing symbiosis with host plants, although none of the newly identified genes were found to be essential for this process. PMID:22587634
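The screen described in this abstract (ORFs conserved in rhizobia but not in closely related non-nitrogen-fixing relatives) amounts to a presence/absence filter. A minimal sketch with set operations follows. The strict "present in every symbiont, absent from every non-symbiont" rule is an assumption made for illustration, and the family and genome names are hypothetical; the abstract does not give the actual thresholds used.

```python
from typing import Dict, Set


def screen_candidates(
    presence: Dict[str, Set[str]],
    symbionts: Set[str],
    non_symbionts: Set[str],
) -> Set[str]:
    """Return families found in every symbiont genome and in no non-symbiont genome."""
    return {
        family
        for family, genomes in presence.items()
        if symbionts <= genomes and not (genomes & non_symbionts)
    }


# Hypothetical presence table: gene family -> genomes carrying a member.
presence = {
    "fam_1": {"rhizobium_1", "rhizobium_2"},                 # passes the screen
    "fam_2": {"rhizobium_1", "rhizobium_2", "relative_1"},   # also in a non-symbiont
    "fam_3": {"rhizobium_1"},                                # missing from one symbiont
}
symbionts = {"rhizobium_1", "rhizobium_2"}
non_symbionts = {"relative_1"}

print(sorted(screen_candidates(presence, symbionts, non_symbionts)))
```

A filter like this only nominates candidates. As the abstract notes, knockouts and expression assays are still needed to see whether a candidate matters for symbiosis.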

  11. Ebolavirus comparative genomics

    DOE PAGES

    Jun, Se-Ran; Leuze, Michael R.; Nookaew, Intawat; ...

    2015-07-14

    The 2014 Ebola outbreak in West Africa is the largest documented for this virus. We examine the dynamics of this genome, comparing more than one hundred currently available ebolavirus genomes to each other and to other viral genomes. Based on oligomer frequency analysis, the family Filoviridae forms a distinct group from all other sequenced viral genomes. All filovirus genomes sequenced to date encode proteins with similar functions and gene order, although there is considerable divergence in sequences between the three genera Ebolavirus, Cuevavirus, and Marburgvirus within the family Filoviridae. Whereas all ebolavirus genomes are quite similar (multiple sequences of the same strain are often identical), variation is most common in the intergenic regions and within specific areas of the genes encoding the glycoprotein (GP), nucleoprotein (NP), and polymerase (L). We predict regions that could contain epitope-binding sites, which might be good vaccine targets. In conclusion, this information, combined with glycosylation sites and experimentally determined epitopes, can identify the most promising regions for the development of therapeutic strategies.
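Oligomer frequency analysis, mentioned in this abstract, compares genomes by how often each short word (k-mer) occurs. The sketch below shows one simple form of the idea. The word length, the Euclidean distance, and the toy sequences are all assumptions for illustration; the abstract does not state which parameters or metric the authors used.

```python
from collections import Counter
from math import sqrt
from typing import Dict


def kmer_frequencies(sequence: str, k: int) -> Dict[str, float]:
    """Relative frequency of each overlapping k-mer in the sequence."""
    seq = sequence.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    return {kmer: n / total for kmer, n in counts.items()} if total else {}


def profile_distance(p: Dict[str, float], q: Dict[str, float]) -> float:
    """Euclidean distance between two k-mer frequency profiles."""
    return sqrt(sum((p.get(m, 0.0) - q.get(m, 0.0)) ** 2 for m in set(p) | set(q)))


# Toy sequences, not real viral genomes.
a = kmer_frequencies("ACGTACGT", k=2)
b = kmer_frequencies("AAAATTTT", k=2)

print("distance a-a:", profile_distance(a, a))
print("distance a-b:", round(profile_distance(a, b), 3))
```

Genomes with similar word usage give small distances, so clustering on a distance matrix built this way can group sequences without requiring an alignment.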

  12. Comparative genomic expression signatures of signal transduction pathways and targets in paediatric Burkitt lymphoma: a Children's Oncology Group report.

    PubMed

    Lee, Sanghoon; Day, Nancy S; Miles, Rodney R; Perkins, Sherrie L; Lim, Megan S; Ayello, Janet; van de Ven, Carmella; Harrison, Lauren; El-Mallawany, Nader K; Goldman, Stanton; Cairo, Mitchell S

    2017-05-01

    Burkitt lymphoma (BL) is the most common histological subtype of non-Hodgkin lymphoma (NHL) in children and adolescents. Through the introduction of short intensive multi-agent chemoimmunotherapy, survival has improved significantly over the past 30 years. However, this successful approach is limited by significant chemotherapy-induced acute toxicity and risk of developing resistant disease, demonstrating the need to identify less toxic and targeted therapies. We analysed the comparative genomic signature and targetable signalling pathways in paediatric BL (PEBL) samples from the Children's Oncology Group study (ANHL01P1) by genomic profiling, and selected genes were confirmed by quantitative real-time polymerase chain reaction. These results were compared to PEBL samples from public databases, namely the Gene Expression Omnibus (GEO) Series (GSE) 10172 and 4475 (n = 16), and 4732 (n = 15). Three hundred and seventy-six genes (approximately 25%) were similarly expressed among the three PEBL sample groups. Several target genes in Toll-like receptor signalling, JAK-STAT signalling and MAPK signalling were significantly overexpressed in PEBL. In addition, several tyrosine kinases, including Bruton tyrosine kinase, protein tyrosine phosphatase and histone deacetylase inhibitor were highly expressed in PEBL. These pre-clinical results suggest that specific signal transduction pathways are overexpressed in PEBL and several pathways could serve as potential future therapeutic targets. © 2017 John Wiley & Sons Ltd.

  13. Molecular cytogenetic analysis of head and neck squamous cell carcinoma: By comparative genomic hybridization, spectral karyotyping, and expression array analysis.

    PubMed

    Squire, Jeremy A; Bayani, Jane; Luk, Catherine; Unwin, Lianne; Tokunaga, Jason; MacMillan, Christina; Irish, Jonathan; Brown, Dale; Gullane, Patrick; Kamel-Reid, Suzanne

    2002-09-01

    A combination of molecular cytogenetic and expression array analysis has been performed on head and neck squamous cell carcinoma (HNSCC) of the oral cavity and supraglottis. These studies were performed to identify consensus regions of chromosomal imbalance and structural rearrangement to determine whether genes located in these genomic regions are subject to alterations in gene expression. Such combinatorial studies may help to identify recurrent patterns of altered gene expression in the context of specific chromosomal changes. Comparative genomic hybridization (CGH) was used to identify net genomic imbalances and spectral karyotyping (SKY) to visualize the numerical and structural chromosomal changes in metaphase preparations. Expression microarray analysis of HNSCC cell lines and primary tongue tumors was also performed to identify genes that were commonly overexpressed or underexpressed compared with adjacent normal tissue. CGH detected gains at 3q (64%), 8q (45%) and 6q22-qter (45%) and losses at 18q22-qter (27%). SKY analysis of seven cell lines identified frequent structural rearrangement of the following chromosomal regions: 3q, 5p13-q11.2, 5q32-q34, 7p12-q11.2, 8p12-q12, 9p, 10p, 13p13-q12, 14q11.1-q11.2, 15p13-q11.2, 16p11.1-q11.1, 18q22-q23, and 22p13-q11.2. Consistent deregulation of interleukin 8, integrin alpha-6, c-MYC, epithelial discoidin domain receptor 1, and sterol regulatory element binding protein were apparent by expression analysis. Interestingly, some of these genes map to regions of genomic imbalance and chromosomal rearrangement as determined by our molecular cytogenetic analysis. In this small study, a combinatorial analysis using SKY, CGH, and microarray provides a model linking the changes in gene expression to changes in chromosomal dosage and structure. This approach has identified a subset of genetic changes that provide new opportunities for investigating the genetic basis of tumorigenesis in HNSCC. Copyright 2002 Wiley

  14. Ebolavirus comparative genomics

    PubMed Central

    Jun, Se-Ran; Leuze, Michael R.; Nookaew, Intawat; Uberbacher, Edward C.; Land, Miriam; Zhang, Qian; Wanchai, Visanu; Chai, Juanjuan; Nielsen, Morten; Trolle, Thomas; Lund, Ole; Buzard, Gregory S.; Pedersen, Thomas D.; Wassenaar, Trudy M.; Ussery, David W.

    2015-01-01

    The 2014 Ebola outbreak in West Africa is the largest documented for this virus. To examine the dynamics of this genome, we compare more than 100 currently available ebolavirus genomes to each other and to other viral genomes. Based on oligomer frequency analysis, the family Filoviridae forms a distinct group from all other sequenced viral genomes. All filovirus genomes sequenced to date encode proteins with similar functions and gene order, although there is considerable divergence in sequences between the three genera Ebolavirus, Cuevavirus and Marburgvirus within the family Filoviridae. Whereas all ebolavirus genomes are quite similar (multiple sequences of the same strain are often identical), variation is most common in the intergenic regions and within specific areas of the genes encoding the glycoprotein (GP), nucleoprotein (NP) and polymerase (L). We predict regions that could contain epitope-binding sites, which might be good vaccine targets. This information, combined with glycosylation sites and experimentally determined epitopes, can identify the most promising regions for the development of therapeutic strategies. This manuscript has been authored by UT-Battelle, LLC under Contract No. DE-AC05-00OR22725 with the U.S. Department of Energy. The United States Government retains and the publisher, by accepting the article for publication, acknowledges that the United States Government retains a non-exclusive, paid-up, irrevocable, world-wide license to publish or reproduce the published form of this manuscript, or allow others to do so, for United States Government purposes. The Department of Energy will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan (http://energy.gov/downloads/doe-public-access-plan). PMID:26175035

  15. Ebolavirus comparative genomics.

    PubMed

    Jun, Se-Ran; Leuze, Michael R; Nookaew, Intawat; Uberbacher, Edward C; Land, Miriam; Zhang, Qian; Wanchai, Visanu; Chai, Juanjuan; Nielsen, Morten; Trolle, Thomas; Lund, Ole; Buzard, Gregory S; Pedersen, Thomas D; Wassenaar, Trudy M; Ussery, David W

    2015-09-01

    The 2014 Ebola outbreak in West Africa is the largest documented for this virus. To examine the dynamics of this genome, we compare more than 100 currently available ebolavirus genomes to each other and to other viral genomes. Based on oligomer frequency analysis, the family Filoviridae forms a distinct group from all other sequenced viral genomes. All filovirus genomes sequenced to date encode proteins with similar functions and gene order, although there is considerable divergence in sequences between the three genera Ebolavirus, Cuevavirus and Marburgvirus within the family Filoviridae. Whereas all ebolavirus genomes are quite similar (multiple sequences of the same strain are often identical), variation is most common in the intergenic regions and within specific areas of the genes encoding the glycoprotein (GP), nucleoprotein (NP) and polymerase (L). We predict regions that could contain epitope-binding sites, which might be good vaccine targets. This information, combined with glycosylation sites and experimentally determined epitopes, can identify the most promising regions for the development of therapeutic strategies.

  16. Comparative genomic analyses in Asparagus.

    PubMed

    Kuhl, Joseph C; Havey, Michael J; Martin, William J; Cheung, Foo; Yuan, Qiaoping; Landherr, Lena; Hu, Yi; Leebens-Mack, James; Town, Christopher D; Sink, Kenneth C

    2005-12-01

    Garden asparagus (Asparagus officinalis L.) belongs to the monocot family Asparagaceae in the order Asparagales. Onion (Allium cepa L.) and Asparagus officinalis are 2 of the most economically important plants of the core Asparagales, a well supported monophyletic group within the Asparagales. Coding regions in onion have lower GC contents than the grasses. We compared the GC content of 3374 unique expressed sequence tags (ESTs) from A. officinalis with Lycoris longituba and onion (both members of the core Asparagales), Acorus americanus (sister to all other monocots), the grasses, and Arabidopsis. Although ESTs in A. officinalis and Acorus had a higher average GC content than Arabidopsis, Lycoris, and onion, all were clearly lower than the grasses. The Asparagaceae have the smallest nuclear genomes among all plants in the core Asparagales, which typically have huge genomes. Within the Asparagaceae, European Asparagus species have approximately twice the nuclear DNA of that of southern African Asparagus species. We cloned and sequenced 20 genomic amplicons from European A. officinalis and the southern African species Asparagus plumosus and observed no clear evidence for a recent genome doubling in A. officinalis relative to A. plumosus. These results indicate that members of the genus Asparagus with smaller genomes may be useful genomic models for plants in the core Asparagales.
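The GC-content comparison in this abstract rests on a simple per-sequence calculation. A minimal sketch follows. The sequences are invented, and averaging per-sequence values (rather than pooling all bases) is an assumption, since the abstract does not say how the average was taken.

```python
from typing import Iterable


def gc_content(sequence: str) -> float:
    """Fraction of G and C among the unambiguous bases (A, C, G, T)."""
    seq = sequence.upper()
    called = sum(seq.count(base) for base in "ACGT")
    if called == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return (seq.count("G") + seq.count("C")) / called


def mean_gc(sequences: Iterable[str]) -> float:
    """Unweighted mean of per-sequence GC content."""
    values = [gc_content(s) for s in sequences]
    if not values:
        raise ValueError("no sequences given")
    return sum(values) / len(values)


# Toy ESTs, not real Asparagus data; N marks an ambiguous base and is ignored.
ests = ["GGCCAATT", "ggccNN", "ATATATGC"]
for est in ests:
    print(est, round(gc_content(est), 2))
print("mean GC:", round(mean_gc(ests), 3))
```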

  17. A Comparative Genomic Study in Schizophrenic and in Bipolar Disorder Patients, Based on Microarray Expression Profiling Meta-Analysis

    PubMed Central

    Logotheti, Marianthi; Papadodima, Olga; Venizelos, Nikolaos; Chatziioannou, Aristotelis; Kolisis, Fragiskos

    2013-01-01

    Schizophrenia affecting almost 1% and bipolar disorder affecting almost 3%–5% of the global population constitute two severe mental disorders. The catecholaminergic and the serotonergic pathways have been proved to play an important role in the development of schizophrenia, bipolar disorder, and other related psychiatric disorders. The aim of the study was to perform and interpret the results of a comparative genomic profiling study in schizophrenic patients as well as in healthy controls and in patients with bipolar disorder and try to relate and integrate our results with an aberrant amino acid transport through cell membranes. In particular we have focused on genes and mechanisms involved in amino acid transport through cell membranes from whole genome expression profiling data. We performed bioinformatic analysis on raw data derived from four different published studies. In two studies postmortem samples from prefrontal cortices, derived from patients with bipolar disorder, schizophrenia, and control subjects, have been used. In another study we used samples from postmortem orbitofrontal cortex of bipolar subjects while the final study was performed based on raw data from a gene expression profiling dataset in the postmortem superior temporal cortex of schizophrenics. The data were downloaded from NCBI's GEO datasets. PMID:23554570

  18. A comparative genomic study in schizophrenic and in bipolar disorder patients, based on microarray expression profiling meta-analysis.

    PubMed

    Logotheti, Marianthi; Papadodima, Olga; Venizelos, Nikolaos; Chatziioannou, Aristotelis; Kolisis, Fragiskos

    2013-01-01

    Schizophrenia affecting almost 1% and bipolar disorder affecting almost 3%-5% of the global population constitute two severe mental disorders. The catecholaminergic and the serotonergic pathways have been proved to play an important role in the development of schizophrenia, bipolar disorder, and other related psychiatric disorders. The aim of the study was to perform and interpret the results of a comparative genomic profiling study in schizophrenic patients as well as in healthy controls and in patients with bipolar disorder and try to relate and integrate our results with an aberrant amino acid transport through cell membranes. In particular we have focused on genes and mechanisms involved in amino acid transport through cell membranes from whole genome expression profiling data. We performed bioinformatic analysis on raw data derived from four different published studies. In two studies postmortem samples from prefrontal cortices, derived from patients with bipolar disorder, schizophrenia, and control subjects, have been used. In another study we used samples from postmortem orbitofrontal cortex of bipolar subjects while the final study was performed based on raw data from a gene expression profiling dataset in the postmortem superior temporal cortex of schizophrenics. The data were downloaded from NCBI's GEO datasets.

  19. Phytozome Comparative Plant Genomics Portal

    SciTech Connect

    Goodstein, David; Batra, Sajeev; Carlson, Joseph; Hayes, Richard; Phillips, Jeremy; Shu, Shengqiang; Schmutz, Jeremy; Rokhsar, Daniel

    2014-09-09

    The Dept. of Energy Joint Genome Institute is a genomics user facility supporting DOE mission science in the areas of Bioenergy, Carbon Cycling, and Biogeochemistry. The Plant Program at the JGI applies genomic, analytical, computational and informatics platforms and methods to: (1) understand and accelerate the improvement (domestication) of bioenergy crops; (2) characterize and moderate plant response to climate change; (3) use comparative genomics to identify constrained elements and infer gene function; (4) build high quality genomic resource platforms of JGI Plant Flagship genomes for functional and experimental work; and (5) expand functional genomic resources for Plant Flagship genomes.

  20. Comparative genomics reveals tissue-specific regulation of prolactin receptor gene expression

    USDA-ARS's Scientific Manuscript database

    Prolactin (PRL), acting via the prolactin receptor, fulfills a diversity of biological functions including the maintenance of solute balance and mineral homeostasis via tissues such as the heart, kidneys and intestine. Expression and activity of the prolactin receptor (PRLR) is regulated by various ...

  1. Comparative genomics of Lactobacillus

    PubMed Central

    Kant, Ravi; Blom, Jochen; Palva, Airi; Siezen, Roland J.; de Vos, Willem M.

    2011-01-01

    Summary The genus Lactobacillus includes a diverse group of bacteria consisting of many species that are associated with fermentations of plants, meat or milk. In addition, various lactobacilli are natural inhabitants of the intestinal tract of humans and other animals. Finally, several Lactobacillus strains are marketed as probiotics as their consumption can confer a health benefit to host. Presently, 154 Lactobacillus species are known and a growing fraction of these are subject to draft genome sequencing. However, complete genome sequences are needed to provide a platform for detailed genomic comparisons. Therefore, we selected a total of 20 genomes of various Lactobacillus strains for which complete genomic sequences have been reported. These genomes had sizes varying from 1.8 to 3.3 Mb and other characteristic features, such as G+C content that ranged from 33% to 51%. The Lactobacillus pan genome was found to consist of approximately 14 000 protein‐encoding genes while all 20 genomes shared a total of 383 sets of orthologous genes that defined the Lactobacillus core genome (LCG). Based on advanced phylogeny of the proteins encoded by this LCG, we grouped the 20 strains into three main groups and defined core group genes present in all genomes of a single group, signature group genes shared in all genomes of one group but absent in all other Lactobacillus genomes, and Group‐specific ORFans present in core group genes of one group and absent in all other complete genomes. The latter are of specific value in defining the different groups of genomes. The study provides a platform for present individual comparisons as well as future analysis of new Lactobacillus genomes. PMID:21375712
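The pan genome, core genome, and signature group genes defined in this abstract map naturally onto set operations. The sketch below assumes each genome has already been reduced to a set of ortholog-group identifiers; the strain names, identifiers, and function names are hypothetical, and the orthology assignment itself (the hard part of the study) is not shown.

```python
def pan_and_core(genomes):
    """Return (pan, core) for a dict mapping strain name -> set of ortholog-group IDs.

    pan  = groups found in at least one genome (set union)
    core = groups found in every genome (set intersection)
    """
    sets = list(genomes.values())
    if not sets:
        return set(), set()
    return set().union(*sets), set.intersection(*sets)


def signature_genes(genomes, group):
    """Groups shared by every genome in `group` and absent from all other genomes.

    `group` must name at least one strain present in `genomes`.
    """
    inside = [genomes[s] for s in group]
    outside = [g for s, g in genomes.items() if s not in group]
    return set.intersection(*inside) - set().union(*outside)


# Hypothetical toy input, not data from the study.
toy_genomes = {
    "strainA": {"og1", "og2", "og3"},
    "strainB": {"og1", "og2", "og4"},
    "strainC": {"og1", "og5"},
}

if __name__ == "__main__":
    pan, core = pan_and_core(toy_genomes)
    print(len(pan), "pan groups;", len(core), "core groups")
    print("signature of A+B:", signature_genes(toy_genomes, ["strainA", "strainB"]))
```

In the study's terms, the 383 orthologous sets shared by all 20 genomes correspond to `core`, and the roughly 14 000 protein-encoding genes correspond to `pan`.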

  2. Identification of differentially expressed small non-coding RNAs in the legume endosymbiont Sinorhizobium meliloti by comparative genomics.

    PubMed

    del Val, Coral; Rivas, Elena; Torres-Quesada, Omar; Toro, Nicolás; Jiménez-Zurdo, José I

    2007-12-01

    Bacterial small non-coding RNAs (sRNAs) are being recognized as novel widespread regulators of gene expression in response to environmental signals. Here, we present the first search for sRNA-encoding genes in the nitrogen-fixing endosymbiont Sinorhizobium meliloti, performed by a genome-wide computational analysis of its intergenic regions. Comparative sequence data from eight related alpha-proteobacteria were obtained, and the interspecies pairwise alignments were scored with the programs eQRNA and RNAz as complementary predictive tools to identify conserved and stable secondary structures corresponding to putative non-coding RNAs. Northern experiments confirmed that eight of the predicted loci, selected among the original 32 candidates as most probable sRNA genes, expressed small transcripts. This result supports the combined use of eQRNA and RNAz as a robust strategy to identify novel sRNAs in bacteria. Furthermore, seven of the transcripts accumulated differentially in free-living and symbiotic conditions. Experimental mapping of the 5'-ends of the detected transcripts revealed that their encoding genes are organized in autonomous transcription units with recognizable promoter and, in most cases, termination signatures. These findings suggest novel regulatory functions for sRNAs related to the interactions of alpha-proteobacteria with their eukaryotic hosts.

  3. Comparative genomics of nematodes.

    PubMed

    Mitreva, Makedonka; Blaxter, Mark L; Bird, David M; McCarter, James P

    2005-10-01

    Recent transcriptome and genome projects have dramatically expanded the biological data available across the phylum Nematoda. Here we summarize analyses of these sequences, which have revealed multiple unexpected results. Despite a uniform body plan, nematodes are more diverse at the molecular level than was previously recognized, with many species- and group-specific novel genes. In the genus Caenorhabditis, changes in chromosome arrangement, particularly local inversions, are also rapid, with breakpoints occurring at 50-fold the rate in vertebrates. Tylenchid plant parasitic nematode genomes contain several genes closely related to genes in bacteria, implicating horizontal gene transfer events in the origins of plant parasitism. Functional genomics techniques are also moving from Caenorhabditis elegans to application throughout the phylum. Soon, eight more draft nematode genome sequences will be available. This unique resource will underpin both molecular understanding of these most abundant metazoan organisms and aid in the examination of the dynamics of genome evolution in animals.

  4. Comparative genomics of Brassicaceae crops

    PubMed Central

    Sharma, Ashutosh; Li, Xiaonan; Lim, Yong Pyo

    2014-01-01

    The family Brassicaceae is one of the major groups of the plant kingdom and comprises diverse species of great economic, agronomic and scientific importance, including the model plant Arabidopsis. The sequencing of the Arabidopsis genome has revolutionized our knowledge in the field of plant biology and provides a foundation in genomics and comparative biology. Genomic resources have been utilized in Brassica for diversity analyses, construction of genetic maps and identification of agronomic traits. In Brassicaceae, comparative sequence analysis across the species has been utilized to understand genome structure, evolution and the detection of conserved genomic segments. In this review, we focus on the progress made in genetic resource development, genome sequencing and comparative mapping in Brassica and related species. The utilization of genomic resources and next-generation sequencing approaches in improvement of Brassica crops is also discussed. PMID:24987286

  5. Comparative mapping, genomic structure, and expression analysis of eight pseudo-response regulator genes in Brassica rapa.

    PubMed

    Kim, Jin A; Kim, Jung Sun; Hong, Joon Ki; Lee, Yeon-Hee; Choi, Beom-Soon; Seol, Young-Joo; Jeon, Chang Hoo

    2012-05-01

    Circadian clocks regulate plant growth and development in response to environmental factors. In this function, clocks influence the adaptation of species to changes in location or climate. Circadian-clock genes have been the subject of intense study in models such as Arabidopsis thaliana but the results may not necessarily reflect clock functions in species with polyploid genomes, such as Brassica species, that include multiple copies of clock-related genes. The triplicate genome of Brassica rapa retains high sequence-level co-linearity with Arabidopsis genomes. In B. rapa we had previously identified five orthologs of the five known Arabidopsis pseudo-response regulator (PRR) genes that are key regulators of the circadian clock in this species. Three of these B. rapa genes, BrPRR1, BrPRR5, and BrPRR7, are present in two copies each in the B. rapa genome, for a total of eight B. rapa PRR (BrPRR) orthologs. We have now determined sequences and expression characteristics of the eight BrPRR genes and mapped their positions in the B. rapa genome. Although both members of each paralogous pair exhibited the same expression pattern, some variation in their gene structures was apparent. The BrPRR genes are tightly linked to several flowering genes. The knowledge about genome location, copy number variation and structural diversity of these B. rapa clock genes will improve our understanding of clock-related functions in this important crop. This will facilitate the development of Brassica crops for optimal growth in new environments and under changing conditions.

  6. Evaluation of one- and two-color gene expression arrays for microbial comparative genome hybridization analyses in routine applications.

    PubMed

    Schwarz, Roland; Joseph, Biju; Gerlach, Gabriele; Schramm-Glück, Anja; Engelhard, Kathrin; Frosch, Matthias; Müller, Tobias; Schoen, Christoph

    2010-09-01

    DNA microarray technology has already revolutionized basic research in infectious diseases, and whole-genome sequencing efforts have allowed for the fabrication of tailor-made spotted microarrays for an increasing number of bacterial pathogens. However, the application of microarrays in diagnostic microbiology is currently hampered by the high costs associated with microarray experiments and the specialized equipment needed. Here, we show that a thorough bioinformatic postprocessing of the microarray design to reduce the amount of unspecific noise also allows the reliable use of spotted gene expression microarrays for gene content analyses. We further demonstrate that the use of only single-color labeling to halve the costs for dye-labeled nucleotides results in only a moderate decrease in overall specificity and sensitivity. Therefore, gene expression microarrays using only single-color labeling can also reliably be used for gene content analyses, thus reducing the costs for potential routine applications such as genome-based pathogen detection or strain typing.

  7. Comparative Microbial Genomics and Forensics.

    PubMed

    Massey, Steven E

    2016-08-01

    Forensic science concerns the application of scientific techniques to questions of a legal nature and may also be used to address questions of historical importance. Forensic techniques are often used in legal cases that involve crimes against persons or property, and they increasingly may involve cases of bioterrorism, crimes against nature, medical negligence, or tracing the origin of food- and crop-borne disease. Given the rapid advance of genome sequencing and comparative genomics techniques, we ask how these might be used to address cases of a forensic nature, focusing on the use of microbial genome sequence analysis. Such analyses rely on the increasingly large numbers of microbial genomes present in public databases, the ability of individual investigators to rapidly sequence whole microbial genomes, and an increasing depth of understanding of their evolution and function. Suggestions are made as to how comparative microbial genomics might be applied forensically and may represent possibilities for the future development of forensic techniques. A particular emphasis is on the nascent field of genomic epidemiology, which utilizes rapid whole-genome sequencing to identify the source and spread of infectious outbreaks. Also discussed is the application of comparative microbial genomics to the study of historical epidemics and deaths and how the approaches developed may also be applicable to more recent and actionable cases.

  8. Cloud computing for comparative genomics.

    PubMed

    Wall, Dennis P; Kudtarkar, Parul; Fusaro, Vincent A; Pivovarov, Rimma; Patil, Prasad; Tonellato, Peter J

    2010-05-18

    Large comparative genomics studies and tools are becoming increasingly more compute-expensive as the number of available genome sequences continues to rise. The capacity and cost of local computing infrastructures are likely to become prohibitive with the increase, especially as the breadth of questions continues to rise. Alternative computing architectures, in particular cloud computing environments, may help alleviate this increasing pressure and enable fast, large-scale, and cost-effective comparative genomics strategies going forward. To test this, we redesigned a typical comparative genomics algorithm, the reciprocal smallest distance algorithm (RSD), to run within Amazon's Elastic Computing Cloud (EC2). We then employed the RSD-cloud for ortholog calculations across a wide selection of fully sequenced genomes. We ran more than 300,000 RSD-cloud processes within the EC2. These jobs were farmed simultaneously to 100 high capacity compute nodes using the Amazon Web Service Elastic Map Reduce and included a wide mix of large and small genomes. The total computation time took just under 70 hours and cost a total of $6,302 USD. The effort to transform existing comparative genomics algorithms from local compute infrastructures is not trivial. However, the speed and flexibility of cloud computing environments provides a substantial boost with manageable cost. The procedure designed to transform the RSD algorithm into a cloud-ready application is readily adaptable to similar comparative genomics problems.
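The run statistics quoted in this abstract allow a rough back-of-the-envelope cost breakdown. The sketch below rounds "more than 300,000" processes to 300,000 and "just under 70 hours" to 70, and assumes all 100 nodes were billed for the full duration; none of these simplifications come from the paper, so the derived figures are only approximate.

```python
# Figures as stated in the abstract (rounded as described above).
processes = 300_000        # "more than 300,000 RSD-cloud processes"
nodes = 100                # "100 high capacity compute nodes"
hours = 70.0               # "just under 70 hours"
total_cost_usd = 6302.0    # "$6,302 USD"

# Assumption: every node ran, and was billed, for the entire wall-clock time.
node_hours = nodes * hours
cost_per_node_hour = total_cost_usd / node_hours
cost_per_process = total_cost_usd / processes
processes_per_node_hour = processes / node_hours

if __name__ == "__main__":
    print(f"node-hours:              {node_hours:.0f}")
    print(f"cost per node-hour:      ${cost_per_node_hour:.2f}")
    print(f"cost per process:        ${cost_per_process:.3f}")
    print(f"processes per node-hour: {processes_per_node_hour:.1f}")
```

Under these assumptions the run works out to roughly $0.90 per node-hour and about two cents per RSD process.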

  9. Cloud computing for comparative genomics

    PubMed Central

    2010-01-01

    Background Large comparative genomics studies and tools are becoming increasingly more compute-expensive as the number of available genome sequences continues to rise. The capacity and cost of local computing infrastructures are likely to become prohibitive with the increase, especially as the breadth of questions continues to rise. Alternative computing architectures, in particular cloud computing environments, may help alleviate this increasing pressure and enable fast, large-scale, and cost-effective comparative genomics strategies going forward. To test this, we redesigned a typical comparative genomics algorithm, the reciprocal smallest distance algorithm (RSD), to run within Amazon's Elastic Computing Cloud (EC2). We then employed the RSD-cloud for ortholog calculations across a wide selection of fully sequenced genomes. Results We ran more than 300,000 RSD-cloud processes within the EC2. These jobs were farmed simultaneously to 100 high capacity compute nodes using the Amazon Web Service Elastic Map Reduce and included a wide mix of large and small genomes. The total computation time took just under 70 hours and cost a total of $6,302 USD. Conclusions The effort to transform existing comparative genomics algorithms from local compute infrastructures is not trivial. However, the speed and flexibility of cloud computing environments provides a substantial boost with manageable cost. The procedure designed to transform the RSD algorithm into a cloud-ready application is readily adaptable to similar comparative genomics problems. PMID:20482786

  10. Identification of candidate genes in Populus cell wall biosynthesis using text-mining, co-expression network and comparative genomics

    SciTech Connect

    Yang, Xiaohan; Ye, Chuyu; Bisaria, Anjali; Tuskan, Gerald A; Kalluri, Udaya C

    2011-01-01

    Populus is an important bioenergy crop for bioethanol production. A greater understanding of cell wall biosynthesis processes is critical in reducing biomass recalcitrance, a major hindrance in efficient generation of ethanol from lignocellulosic biomass. Here, we report the identification of candidate cell wall biosynthesis genes through the development and application of a novel bioinformatics pipeline. As a first step, via text-mining of PubMed publications, we obtained 121 Arabidopsis genes that had experimental evidence supporting their involvement in cell wall biosynthesis or remodeling. The 121 genes were then used as bait genes to query an Arabidopsis co-expression database and additional genes were identified as neighbors of the bait genes in the network, increasing the number of genes to 548. The 548 Arabidopsis genes were then used to re-query the Arabidopsis co-expression database and re-construct a network that captured additional network neighbors, expanding to a total of 694 genes. The 694 Arabidopsis genes were computationally divided into 22 clusters. Queries of the Populus genome using the Arabidopsis genes revealed 817 Populus orthologs. Functional analysis of gene ontology and tissue-specific gene expression indicated that these Arabidopsis and Populus genes are high likelihood candidates for functional genomics in relation to cell wall biosynthesis.
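The two rounds of bait-gene expansion described in this abstract (121 to 548 to 694 genes) can be sketched as repeated neighbor lookup in a co-expression network. The sketch assumes the network is available as an adjacency dictionary; the gene names and function names are hypothetical, and the actual co-expression database and its query interface are not represented.

```python
def expand_once(network, genes):
    """Return `genes` plus every direct neighbor of those genes in the network.

    `network` maps a gene ID to the set of gene IDs it is co-expressed with.
    """
    expanded = set(genes)
    for gene in genes:
        expanded |= network.get(gene, set())
    return expanded


def expand(network, baits, rounds=2):
    """Expand a bait set for a fixed number of rounds; also report the size after each round."""
    current = set(baits)
    sizes = [len(current)]
    for _ in range(rounds):
        current = expand_once(network, current)
        sizes.append(len(current))
    return current, sizes


# Hypothetical toy network, not data from the study.
toy_network = {
    "bait1": {"n1", "n2"},
    "n1": {"bait1", "n3"},
    "n2": {"bait1"},
    "n3": {"n1"},
}

if __name__ == "__main__":
    genes, sizes = expand(toy_network, {"bait1"})
    print(sorted(genes), sizes)
```

Stopping after a fixed number of rounds mirrors the two queries reported in the abstract; expanding until the set stops growing would instead return the whole connected component, which is usually too permissive for candidate-gene selection.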

  11. Comparative Genomics of the Cucurbitaceae

    USDA-ARS's Scientific Manuscript database

    The genome size for watermelon, melon, cucumber, and pumpkin is 425, 454, 367, and 502 Mbp, respectively, and considered medium size as compared with most other crops. Whole-genome duplication is common in angiosperm plants. Research has revealed a paleohexaploidy (γ) event in the common ancestor of...

  12. Culex genome is not just another genome for comparative genomics.

    PubMed

    Reddy, B P Niranjan; Labbé, Pierrick; Corbel, Vincent

    2012-03-30

    Formal publication of the Culex genome sequence has closed the human disease vector triangle by meeting the Anopheles gambiae and Aedes aegypti genome sequences. Compared to these other mosquitoes, Culex quinquefasciatus possesses many specific hallmark characteristics, and may thus provide different angles for research which ultimately leads to a practical solution for controlling the ever increasing burden of insect-vector-borne diseases around the globe. We argue for the special importance of the genome sequence of this cosmopolitan Culex species by raising many interesting questions and considering the potential of the Culex genome to answer them.

  13. Culex genome is not just another genome for comparative genomics

    PubMed Central

    2012-01-01

    Formal publication of the Culex genome sequence has closed the human disease vector triangle by meeting the Anopheles gambiae and Aedes aegypti genome sequences. Compared to these other mosquitoes, Culex quinquefasciatus possesses many specific hallmark characteristics, and may thus provide different angles for research which ultimately leads to a practical solution for controlling the ever increasing burden of insect-vector-borne diseases around the globe. We argue for the special importance of the genome sequence of this cosmopolitan Culex species by raising many interesting questions and considering the potential of the Culex genome to answer them. PMID:22463777

  14. Comparative genomics for biodiversity conservation

    PubMed Central

    Grueber, Catherine E.

    2015-01-01

    Genomic approaches are gathering momentum in biology and emerging opportunities lie in the creative use of comparative molecular methods for revealing the processes that influence diversity of wildlife. However, few comparative genomic studies are performed with explicit and specific objectives to aid conservation of wild populations. Here I provide a brief overview of comparative genomic approaches that offer specific benefits to biodiversity conservation. Because conservation examples are few, I draw on research from other areas to demonstrate how comparing genomic data across taxa may be used to inform the characterisation of conservation units and studies of hybridisation, as well as studies that provide conservation outcomes from a better understanding of the drivers of divergence. A comparative approach can also provide valuable insight into the threatening processes that impact rare species, such as emerging diseases and their management in conservation. In addition to these opportunities, I note areas where additional research is warranted. Overall, comparing and contrasting the genomic composition of threatened and other species provide several useful tools for helping to preserve the molecular biodiversity of the global ecosystem. PMID:26106461

  15. Identification, expression, and comparative genomic analysis of the IPT and CKX gene families in Chinese cabbage (Brassica rapa ssp. pekinensis)

    PubMed Central

    2013-01-01

    Background Cytokinins (CKs) have significant roles in various aspects of plant growth and development, and they are also involved in plant stress adaptations. The fine-tuning of the controlled CK levels in individual tissues, cells, and organelles is properly maintained by isopentenyl transferases (IPTs) and cytokinin oxidase/dehydrogenases (CKXs). Chinese cabbage is one of the most economically important vegetable crops worldwide. The whole genome sequencing of Brassica rapa enables us to perform the genome-wide identification and functional analysis of the IPT and CKX gene families. Results In this study, a total of 13 BrIPT genes and 12 BrCKX genes were identified. The gene structures, conserved domains and phylogenetic relationships were analyzed. The isoelectric point, subcellular localization and glycosylation sites of the proteins were predicted. Segmental duplicates were found in both BrIPT and BrCKX gene families. We also analyzed evolutionary patterns and divergence of the IPT and CKX genes in the Cruciferae family. The transcription levels of BrIPT and BrCKX genes were analyzed to obtain an initial picture of the functions of these genes. Abiotic stress elements related to adverse environmental stimuli were found in the promoter regions of BrIPT and BrCKX genes and they were confirmed to respond to drought and high salinity conditions. The effects of 6-BA and ABA on the expressions of BrIPT and BrCKX genes were also investigated. Conclusions The expansion of BrIPT and BrCKX genes after speciation from Arabidopsis thaliana is mainly attributed to segmental duplication events during the whole genome triplication (WGT) and substantial duplicated genes are lost during the long evolutionary history. Genes produced by segmental duplication events have changed their expression patterns or may have adopted new functions and thus are obtained. BrIPT and BrCKX genes respond well to drought and high salinity stresses, and their transcripts are affected by exogenous

  16. An Expressed Sequence Tag (EST)-enriched genetic map of turbot (Scophthalmus maximus): a useful framework for comparative genomics across model and farmed teleosts

    PubMed Central

    2012-01-01

    Background The turbot (Scophthalmus maximus) is a relevant species in European aquaculture. The small turbot genome provides a source for genomics strategies to use in order to understand the genetic basis of productive traits, particularly those related to sex, growth and pathogen resistance. Genetic maps represent essential genomic screening tools that allow quantitative trait loci (QTL) to be localized and candidate genes to be identified through comparative mapping. This information is the backbone to develop marker-assisted selection (MAS) programs in aquaculture. Expressed sequence tag (EST) resources have largely increased in turbot, thus supplying numerous type I markers suitable for extending the previous linkage map, which was mostly based on anonymous loci. The aim of this study was to construct a higher-resolution turbot genetic map using EST-linked markers, which will turn out to be useful for comparative mapping studies. Results A consensus gene-enriched genetic map of the turbot was constructed using 463 SNP and microsatellite markers in nine reference families. This map contains 438 markers, 180 EST-linked, clustered at 24 linkage groups. Linkage and comparative genomics evidence suggested additional linkage group fusions toward the consolidation of turbot map according to karyotype information. The linkage map showed a total length of 1402.7 cM with low average intermarker distance (3.7 cM; ~2 Mb). A global 1.6:1 female-to-male recombination frequency (RF) ratio was observed, although largely variable among linkage groups and chromosome regions. Comparative sequence analysis revealed large macrosyntenic patterns against model teleost genomes, significant hits decreasing from stickleback (54%) to zebrafish (20%). Comparative mapping supported particular chromosome rearrangements within Acanthopterygii and helped to assign unallocated markers to specific turbot linkage groups. Conclusions The new gene-enriched high-resolution turbot map represents a

  17. Comparative Genome Mapping in Brassica

    PubMed Central

    Lagercrantz, U.; Lydiate, D. J.

    1996-01-01

    A Brassica nigra genetic linkage map was developed from a highly polymorphic cross analyzed with a set of low copy number Brassica RFLP probes. The Brassica genome is extensively duplicated with eight distinct sets of chromosomal segments, each present in three copies, covering virtually the whole genome. Thus, B. nigra could be descended from a hexaploid ancestor. A comparative analysis of B. nigra, B. oleracea and B. rapa genomes, based on maps developed using a common set of RFLP probes, was also performed. The three genomes have distinct chromosomal structures differentiated by a large number of rearrangements, but collinear regions involving virtually the whole of each of the three genomes were identified. The genic contents of B. nigra, B. oleracea and B. rapa were basically equivalent and differences in chromosome number (8, 9 and 10, respectively) are probably the result of chromosome fusions and/or fissions. The strong conservation of overall genic content across the three Brassica genomes mirrors the conservation of genic content observed over a much longer evolutionary span in cereals. However, the rate of chromosomal rearrangement in crucifers is much higher than that observed in cereal genomes. PMID:8978073

  18. Comparative Genomics of Gene Expression in the Parasitic and Free-Living Nematodes Strongyloides stercoralis and Caenorhabditis elegans

    PubMed Central

    Mitreva, Makedonka; McCarter, James P.; Martin, John; Dante, Mike; Wylie, Todd; Chiapelli, Brandi; Pape, Deana; Clifton, Sandra W.; Nutman, Thomas B.; Waterston, Robert H.

    2004-01-01

    Although developmental timing of gene expression is used to infer potential gene function, studies have yet to correlate this information between species. We analyzed 10,921 ESTs in 3311 clusters from first- and infective third-stage larva (L1, L3i) of the parasitic nematode Strongyloides stercoralis and compared the results to Caenorhabditis elegans, a species that has an L3i-like dauer stage. In the comparison of S. stercoralis clusters with stage-specific expression to C. elegans homologs expressed in either dauer or nondauer stages, matches between S. stercoralis L1 and C. elegans nondauer-expressed genes dominated, suggesting conservation in the repertoire of genes expressed during growth in nutrient-rich conditions. For example, S. stercoralis collagen transcripts were abundant in L1 but not L3i, a pattern consistent with C. elegans collagens. Although a greater proportion of S. stercoralis L3i than L1 genes have homologs among the C. elegans dauer-specific transcripts, we did not uncover evidence of a robust conserved L3i/dauer 'expression signature.' Strikingly, in comparisons of S. stercoralis clusters to C. elegans homologs with RNAi knockouts, those with significant L1-specific expression were more than twice as likely as L3i-specific clusters to match genes with phenotypes. We also provide functional classifications of S. stercoralis clusters. PMID:14762059

  19. Comparative Genomic Analysis of Transgenic Poplar Dwarf Mutant Reveals Numerous Differentially Expressed Genes Involved in Energy Flow

    PubMed Central

    Chen, Su; Bai, Shuang; Liu, Guifeng; Li, Huiyu; Jiang, Jing

    2014-01-01

    In our previous research, the Tamarix androssowii LEA gene (Tamarix androssowii late embryogenesis abundant protein mRNA, GenBank ID: DQ663481) was transferred into Populus simonii × Populus nigra. Among the eleven transgenic lines, one exhibited a dwarf phenotype compared to the wild type and other transgenic lines, named dwf1. To uncover the mechanisms underlying this phenotype, digital gene expression libraries were produced from dwf1, wild-type, and other normal transgenic lines, XL-5 and XL-6. Gene expression profile analysis indicated that dwf1 had a unique gene expression pattern in comparison to the other two transgenic lines. Finally, a total of 1246 dwf1-unique differentially expressed genes were identified. These genes were further subjected to gene ontology and pathway analysis. Results indicated that photosynthesis and carbohydrate metabolism related genes were significantly affected. In addition, many transcription factors genes were also differentially expressed in dwf1. These various differentially expressed genes may be critical for dwarf mutant formation; thus, the findings presented here might provide insight for our understanding of the mechanisms of tree growth and development. PMID:25192286

  20. Enhancer Identification through Comparative Genomics

    SciTech Connect

    Visel, Axel; Bristow, James; Pennacchio, Len A.

    2006-10-01

    With the availability of genomic sequence from numerous vertebrates, a paradigm shift has occurred in the identification of distant-acting gene regulatory elements. In contrast to traditional gene-centric studies in which investigators randomly scanned genomic fragments that flank genes of interest in functional assays, the modern approach begins electronically with publicly available comparative sequence datasets that provide investigators with prioritized lists of putative functional sequences based on their evolutionary conservation. However, although a large number of tools and resources are now available, application of comparative genomic approaches remains far from trivial. In particular, it requires users to dynamically consider the species and methods for comparison depending on the specific biological question under investigation. While there is currently no single general rule to this end, it is clear that when applied appropriately, comparative genomic approaches exponentially increase our power in generating biological hypotheses for subsequent experimental testing.

  1. Plant Comparative and Functional Genomics

    DOE PAGES

    Yang, Xiaohan; Leebens-Mack, Jim; Chen, Feng; ...

    2015-01-01

    Plants form the foundation for our global ecosystem and are essential for environmental and human health. With an increasing number of available plant genomes and tractable experimental systems, comparative and functional plant genomics research is greatly expanding our knowledge of the molecular basis of economically and nutritionally important traits in crop plants. Inferences drawn from comparative genomics are motivating experimental investigations of gene function and gene interactions. This special issue aims to highlight recent advances made in comparative and functional genomics research in plants. Nine original research articles in this special issue cover five important topics: (1) transcription factor gene families relevant to abiotic stress tolerance; (2) plant secondary metabolism; (3) transcriptome-based markers for quantitative trait locus; (4) epigenetic modifications in plant-microbe interactions; and (5) computational prediction of protein-protein interactions. Finally, the plant species studied in these articles include model species as well as nonmodel plant species of economic importance (e.g., food crops and medicinal plants).

  2. Comparative genomic analysis and expression of the APETALA2-like genes from barley, wheat, and barley-wheat amphiploids

    PubMed Central

    Gil-Humanes, Javier; Pistón, Fernando; Martín, Antonio; Barro, Francisco

    2009-01-01

    Background The APETALA2-like genes form a large multi-gene family of transcription factors which play an important role during the plant life cycle, being key regulators of many developmental processes. Many studies in Arabidopsis have revealed that the APETALA2 (AP2) gene is implicated in the establishment of floral meristem and floral organ identity as well as temporal and spatial regulation of flower homeotic gene expression. Results In this work, we have cloned and characterised the AP2-like gene from accessions of Hordeum chilense and Hordeum vulgare, wild and domesticated barley, respectively, and compared with other AP2 homoeologous genes, including the Q gene in wheat. The Hordeum AP2-like genes contain two plant-specific DNA binding motifs called AP2 domains, as does the Q gene of wheat. We confirm that the H. chilense AP2-like gene is located on chromosome 5Hch. Patterns of expression of the AP2-like genes were examined in floral organs and other tissues in barley, wheat and in tritordeum amphiploids (barley × wheat hybrids). In tritordeum amphiploids, the level of transcription of the barley AP2-like gene was lower than in its barley parental and the chromosome substitutions 1D/1Hch and 2D/2Hch were seen to modify AP2 gene expression levels. Conclusion The results are of interest in order to understand the role of the AP2-like gene in the spike morphology of barley and wheat, and to understand the regulation of this gene in the amphiploids obtained from barley-wheat crossing. This information may have application in cereal breeding programs to up- or down-regulate the expression of AP2-like genes in order to modify spike characteristics and to obtain free-threshing plants. PMID:19480686

  3. GenomeFingerprinter: the genome fingerprint and the universal genome fingerprint analysis for systematic comparative genomics.

    PubMed

    Ai, Yuncan; Ai, Hannan; Meng, Fanmei; Zhao, Lei

    2013-01-01

    No attention has been paid to comparing a set of genome sequences crossing genetic components and biological categories with far divergence over a large size range. We define it as the systematic comparative genomics and aim to develop the methodology. First, we create a method, GenomeFingerprinter, to unambiguously produce a set of three-dimensional coordinates from a sequence, followed by one three-dimensional plot and six two-dimensional trajectory projections, to illustrate the genome fingerprint of a given genome sequence. Second, we develop a set of concepts and tools, and thereby establish a method called the universal genome fingerprint analysis (UGFA). Particularly, we define the total genetic component configuration (TGCC) (including chromosome, plasmid, and phage) for describing a strain as a systematic unit, the universal genome fingerprint map (UGFM) of TGCC for differentiating strains as a universal system, and the systematic comparative genomics (SCG) for comparing a set of genomes crossing genetic components and biological categories. Third, we construct a method of quantitative analysis to compare two genomes by using the outcome dataset of genome fingerprint analysis. Specifically, we define the geometric center and its geometric mean for a given genome fingerprint map, followed by the Euclidean distance, the differentiate rate, and the weighted differentiate rate to quantitatively describe the difference between two genomes of comparison. Moreover, we demonstrate the applications through case studies on various genome sequences, giving tremendous insights into the critical issues in microbial genomics and taxonomy. We have created a method, GenomeFingerprinter, for rapidly computing, geometrically visualizing, and intuitively comparing a set of genomes at genome fingerprint level, and hence established a method called the universal genome fingerprint analysis, as well as developed a method of quantitative analysis of the outcome dataset. These have set

  4. GenomeFingerprinter: The Genome Fingerprint and the Universal Genome Fingerprint Analysis for Systematic Comparative Genomics

    PubMed Central

    Ai, Yuncan; Ai, Hannan; Meng, Fanmei; Zhao, Lei

    2013-01-01

    Background No attention has been paid to comparing a set of genome sequences crossing genetic components and biological categories with far divergence over a large size range. We define it as the systematic comparative genomics and aim to develop the methodology. Results First, we create a method, GenomeFingerprinter, to unambiguously produce a set of three-dimensional coordinates from a sequence, followed by one three-dimensional plot and six two-dimensional trajectory projections, to illustrate the genome fingerprint of a given genome sequence. Second, we develop a set of concepts and tools, and thereby establish a method called the universal genome fingerprint analysis (UGFA). Particularly, we define the total genetic component configuration (TGCC) (including chromosome, plasmid, and phage) for describing a strain as a systematic unit, the universal genome fingerprint map (UGFM) of TGCC for differentiating strains as a universal system, and the systematic comparative genomics (SCG) for comparing a set of genomes crossing genetic components and biological categories. Third, we construct a method of quantitative analysis to compare two genomes by using the outcome dataset of genome fingerprint analysis. Specifically, we define the geometric center and its geometric mean for a given genome fingerprint map, followed by the Euclidean distance, the differentiate rate, and the weighted differentiate rate to quantitatively describe the difference between two genomes of comparison. Moreover, we demonstrate the applications through case studies on various genome sequences, giving tremendous insights into the critical issues in microbial genomics and taxonomy. Conclusions We have created a method, GenomeFingerprinter, for rapidly computing, geometrically visualizing, and intuitively comparing a set of genomes at genome fingerprint level, and hence established a method called the universal genome fingerprint analysis, as well as developed a method of quantitative analysis of the
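
The quantitative step named in this abstract (a geometric center per fingerprint map, then a Euclidean distance between two genomes) can be sketched as follows. This is only an illustration under assumptions: the abstract does not give the authors' exact definitions, so the center is taken here to be the arithmetic mean of the three-dimensional coordinates, the distance is measured between the two centers, and the coordinate lists `fp_a` and `fp_b` are made-up placeholders, not output of GenomeFingerprinter.

```python
import math

def geometric_center(points):
    """Mean position of a list of (x, y, z) fingerprint coordinates (assumed definition)."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))

def euclidean_distance(a, b):
    """Straight-line distance between two 3-D points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical fingerprint coordinates for two genomes (placeholders only).
fp_a = [(0, 0, 0), (2, 0, 0), (2, 2, 0), (0, 2, 0)]
fp_b = [(3, 4, 0), (5, 4, 0), (5, 6, 0), (3, 6, 0)]

center_a = geometric_center(fp_a)
center_b = geometric_center(fp_b)
print(center_a, center_b, euclidean_distance(center_a, center_b))
```

The differentiate rate and weighted differentiate rate mentioned in the abstract are not sketched, because their definitions are not stated here.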

  5. Comparative Genomics of the Eukaryotes

    PubMed Central

    Rubin, Gerald M.; Yandell, Mark D.; Wortman, Jennifer R.; Gabor Miklos, George L.; Nelson, Catherine R.; Hariharan, Iswar K.; Fortini, Mark E.; Li, Peter W.; Apweiler, Rolf; Fleischmann, Wolfgang; Cherry, J. Michael; Henikoff, Steven; Skupski, Marian P.; Misra, Sima; Ashburner, Michael; Birney, Ewan; Boguski, Mark S.; Brody, Thomas; Brokstein, Peter; Celniker, Susan E.; Chervitz, Stephen A.; Coates, David; Cravchik, Anibal; Gabrielian, Andrei; Galle, Richard F.; Gelbart, William M.; George, Reed A.; Goldstein, Lawrence S. B.; Gong, Fangcheng; Guan, Ping; Harris, Nomi L.; Hay, Bruce A.; Hoskins, Roger A.; Li, Jiayin; Li, Zhenya; Hynes, Richard O.; Jones, S. J. M.; Kuehl, Peter M.; Lemaitre, Bruno; Littleton, J. Troy; Morrison, Deborah K.; Mungall, Chris; O'Farrell, Patrick H.; Pickeral, Oxana K.; Shue, Chris; Vosshall, Leslie B.; Zhang, Jiong; Zhao, Qi; Zheng, Xiangqun H.; Zhong, Fei; Zhong, Wenyan; Gibbs, Richard; Venter, J. Craig; Adams, Mark D.; Lewis, Suzanna

    2009-01-01

    A comparative analysis of the genomes of Drosophila melanogaster, Caenorhabditis elegans, and Saccharomyces cerevisiae—and the proteins they are predicted to encode—was undertaken in the context of cellular, developmental, and evolutionary processes. The nonredundant protein sets of flies and worms are similar in size and are only twice that of yeast, but different gene families are expanded in each genome, and the multidomain proteins and signaling pathways of the fly and worm are far more complex than those of yeast. The fly has orthologs to 177 of the 289 human disease genes examined and provides the foundation for rapid analysis of some of the basic processes involved in human disease. PMID:10731134

  6. Gramene 2013: comparative plant genomics resources

    PubMed Central

    Monaco, Marcela K.; Stein, Joshua; Naithani, Sushma; Wei, Sharon; Dharmawardhana, Palitha; Kumari, Sunita; Amarasinghe, Vindhya; Youens-Clark, Ken; Thomason, James; Preece, Justin; Pasternak, Shiran; Olson, Andrew; Jiao, Yinping; Lu, Zhenyuan; Bolser, Dan; Kerhornou, Arnaud; Staines, Dan; Walts, Brandon; Wu, Guanming; D’Eustachio, Peter; Haw, Robin; Croft, David; Kersey, Paul J.; Stein, Lincoln; Jaiswal, Pankaj; Ware, Doreen

    2014-01-01

    Gramene (http://www.gramene.org) is a curated online resource for comparative functional genomics in crops and model plant species, currently hosting 27 fully and 10 partially sequenced reference genomes in its build number 38. Its strength derives from the application of a phylogenetic framework for genome comparison and the use of ontologies to integrate structural and functional annotation data. Whole-genome alignments complemented by phylogenetic gene family trees help infer syntenic and orthologous relationships. Genetic variation data, sequences and genome mappings available for 10 species, including Arabidopsis, rice and maize, help infer putative variant effects on genes and transcripts. The pathways section also hosts 10 species-specific metabolic pathways databases developed in-house or by our collaborators using Pathway Tools software, which facilitates searches for pathway, reaction and metabolite annotations, and allows analyses of user-defined expression datasets. Recently, we released a Plant Reactome portal featuring 133 curated rice pathways. This portal will be expanded for Arabidopsis, maize and other plant species. We continue to provide genetic and QTL maps and marker datasets developed by crop researchers. The project provides a unique community platform to support scientific research in plant genomics including studies in evolution, genetics, plant breeding, molecular biology, biochemistry and systems biology. PMID:24217918

  7. A Comprehensive Expressed Sequence Tag Linkage Map for Tiger Salamander and Mexican Axolotl: Enabling Gene Mapping and Comparative Genomics in Ambystoma

    PubMed Central

    Smith, J. J.; Kump, D. K.; Walker, J. A.; Parichy, D. M.; Voss, S. R.

    2005-01-01

    Expressed sequence tag (EST) markers were developed for Ambystoma tigrinum tigrinum (Eastern tiger salamander) and for A. mexicanum (Mexican axolotl) to generate the first comprehensive linkage map for these model amphibians. We identified 14 large linkage groups (125.5–836.7 cM) that presumably correspond to the 14 haploid chromosomes in the Ambystoma genome. The extent of genome coverage for these linkage groups is apparently high because the total map size (5251 cM) falls within the range of theoretical estimates and is consistent with independent empirical estimates. Unlike most vertebrate species, linkage map size in Ambystoma is not strongly correlated with chromosome arm number. Presumably, the large physical genome size (∼30 Gbp) is a major determinant of map size in Ambystoma. To demonstrate the utility of this resource, we mapped the position of two historically significant A. mexicanum mutants, white and melanoid, and also met, a quantitative trait locus (QTL) that contributes to variation in metamorphic timing. This new collection of EST-based PCR markers will better enable the Ambystoma system by facilitating development of new molecular probes, and the linkage map will allow comparative studies of this important vertebrate group. PMID:16079226

  8. A comprehensive expressed sequence tag linkage map for tiger salamander and Mexican axolotl: enabling gene mapping and comparative genomics in Ambystoma.

    PubMed

    Smith, J J; Kump, D K; Walker, J A; Parichy, D M; Voss, S R

    2005-11-01

    Expressed sequence tag (EST) markers were developed for Ambystoma tigrinum tigrinum (Eastern tiger salamander) and for A. mexicanum (Mexican axolotl) to generate the first comprehensive linkage map for these model amphibians. We identified 14 large linkage groups (125.5-836.7 cM) that presumably correspond to the 14 haploid chromosomes in the Ambystoma genome. The extent of genome coverage for these linkage groups is apparently high because the total map size (5251 cM) falls within the range of theoretical estimates and is consistent with independent empirical estimates. Unlike most vertebrate species, linkage map size in Ambystoma is not strongly correlated with chromosome arm number. Presumably, the large physical genome size (approximately 30 Gbp) is a major determinant of map size in Ambystoma. To demonstrate the utility of this resource, we mapped the position of two historically significant A. mexicanum mutants, white and melanoid, and also met, a quantitative trait locus (QTL) that contributes to variation in metamorphic timing. This new collection of EST-based PCR markers will better enable the Ambystoma system by facilitating development of new molecular probes, and the linkage map will allow comparative studies of this important vertebrate group.

  9. Comparative primate genomics: emerging patterns of genome content and dynamics.

    PubMed

    Rogers, Jeffrey; Gibbs, Richard A

    2014-05-01

    Advances in genome sequencing technologies have created new opportunities for comparative primate genomics. Genome assemblies have been published for various primate species, and analyses of several others are underway. Whole-genome assemblies for the great apes provide remarkable new information about the evolutionary origins of the human genome and the processes involved. Genomic data for macaques and other non-human primates offer valuable insights into genetic similarities and differences among species that are used as models for disease-related research. This Review summarizes current knowledge regarding primate genome content and dynamics, and proposes a series of goals for the near future.

  10. Comparative primate genomics: emerging patterns of genome content and dynamics

    PubMed Central

    Rogers, Jeffrey; Gibbs, Richard A.

    2014-01-01

    Preface Advances in genome sequencing technologies have created new opportunities for comparative primate genomics. Genome assemblies have been published for several primates, with analyses of several others underway. Whole genome assemblies for the great apes provide remarkable new information about the evolutionary origins of the human genome and the processes involved. Genomic data for macaques and other nonhuman primates provide valuable insight into genetic similarities and differences among species used as models for disease-related research. This review summarizes current knowledge regarding primate genome content and dynamics and offers a series of goals for the near future. PMID:24709753

  11. Comparative genomics of the Hlx homeobox gene and protein: conservation of structure and expression from fish to mammals.

    PubMed

    Bates, Michael D; Wells, James M; Venkatesh, Byrappa

    2005-06-06

    Hlx is a homeobox transcription factor gene that is expressed in intestinal and hepatic mesenchyme of the developing mouse embryo and is essential for normal intestinal and hepatic development. Because of the morphological and molecular similarities in the development of the digestive system across species, we hypothesized that the Hlx gene and protein sequences and expression patterns would be conserved among vertebrates. Comparison of the Hlx gene orthologues of human, chimpanzee, mouse, rat, pufferfish (Fugu) and zebrafish demonstrates that these six genes share an identical organization with four exons and three introns. Comparison of the inferred Hlx protein sequences from these and three additional species (chick, Spanish ribbed newt and rainbow trout) reveals significant sequence identity, with identical homeodomains. The expression of Hlx in the mesenchyme of developing chick embryos is highly similar to that of mouse. Fugu Hlx is expressed in a tissue-specific manner that is similar though not identical to that of mouse, suggesting a conservation of Hlx function between mammals and fish. The mammalian and fish Hlx genes share a putative 5' upstream enhancer as well as an inverted repeat containing CCAAT boxes on opposite strands that we have previously shown to be important for mouse Hlx gene expression. These results suggest that the function of Hlx and the mechanisms regulating its expression are highly conserved in mammals, birds, amphibians and fish.

  12. Comparative genomic sequence and expression analyses of Medicago truncatula and alfalfa subspecies falcata COLD-ACCLIMATION-SPECIFIC genes.

    PubMed

    Pennycooke, Joyce C; Cheng, Hongmei; Stockinger, Eric J

    2008-03-01

    In Arabidopsis (Arabidopsis thaliana) the low-temperature induction of genes encoding the C-REPEAT BINDING FACTOR (CBF) transcriptional activators is a key step in cold acclimation. CBFs in turn activate a battery of downstream genes known as the CBF regulon, which collectively act to increase tolerance to low temperatures. Fundamental questions are: What determines the size and scope of the CBF regulon, and is this a major determinant of the low-temperature tolerance capacity of individual plant species? Here we have begun to address these questions through comparative analyses of Medicago truncatula and Medicago sativa subsp. falcata. M. truncatula survived to -4 degrees C but did not cold acclimate, whereas Medicago falcata cold acclimated and survived -14 degrees C. Both species possessed low-temperature-induced CBFs but differed in the expression of the COLD-ACCLIMATION-SPECIFIC (CAS) genes, which are candidate CBF targets. M. falcata CAS30 was robustly cold-responsive whereas the MtCAS31 homolog was not. M. falcata also possessed additional CAS30 homologs in comparison to the single CAS31 gene in M. truncatula. MfCAS30 possessed multiple pairs of closely spaced C-REPEAT/DEHYDRATION RESPONSIVE ELEMENT (CRT/DRE) motifs, the cognate CBF binding site, in its upstream region, whereas MtCAS31 lacked one CRT/DRE partner of the two proximal partner pairs. CAS genes also shared a promoter structure comprising modules proximal and distal to the coding sequence. CAS15, highly cold-responsive in both species, harbored numerous CRT/DRE motifs, but only in the distal module. However, fusion of the MtCAS15 promoter, including the distal module, to a reporter gene did not result in low-temperature responsiveness in stably transformed Arabidopsis. In contrast, both MtCAS31 and MfCAS30 promoter fusions were low-temperature responsive, although the MtCAS31 fusion was less robust than the MfCAS30 fusion. From these studies we conclude that CAS genes harbor CRT/DRE motifs, their

  13. A Comparative Map of the Zebrafish Genome

    PubMed Central

    Woods, Ian G.; Kelly, Peter D.; Chu, Felicia; Ngo-Hazelett, Phuong; Yan, Yi-Lin; Huang, Hui; Postlethwait, John H.; Talbot, William S.

    2000-01-01

    Zebrafish mutations define the functions of hundreds of essential genes in the vertebrate genome. To accelerate the molecular analysis of zebrafish mutations and to facilitate comparisons among the genomes of zebrafish and other vertebrates, we used a homozygous diploid meiotic mapping panel to localize polymorphisms in 691 previously unmapped genes and expressed sequence tags (ESTs). Together with earlier efforts, this work raises the total number of markers scored in the mapping panel to 2119, including 1503 genes and ESTs and 616 previously characterized simple-sequence length polymorphisms. Sequence analysis of zebrafish genes mapped in this study and in prior work identified putative human orthologs for 804 zebrafish genes and ESTs. Map comparisons revealed 139 new conserved syntenies, in which two or more genes are on the same chromosome in zebrafish and human. Although some conserved syntenies are quite large, there were changes in gene order within conserved groups, apparently reflecting the relatively frequent occurrence of inversions and other intrachromosomal rearrangements since the divergence of teleost and tetrapod ancestors. Comparative mapping also shows that there is not a one-to-one correspondence between zebrafish and human chromosomes. Mapping of duplicate gene pairs identified segments of 20 linkage groups that may have arisen during a genome duplication that occurred early in the evolution of teleosts after the divergence of teleost and mammalian ancestors. This comparative map will accelerate the molecular analysis of zebrafish mutations and enhance the understanding of the evolution of the vertebrate genome. PMID:11116086
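
The definition of a conserved synteny used in this abstract (two or more genes on the same chromosome in both zebrafish and human) lends itself to a small sketch. The gene identifiers, linkage-group assignments and ortholog pairs below are hypothetical placeholders, not data from the mapping panel, and the function name `conserved_syntenies` is an assumption made for illustration.

```python
from collections import defaultdict

def conserved_syntenies(zebrafish_lg, human_chr, orthologs):
    """Group ortholog pairs by (zebrafish linkage group, human chromosome)
    and keep only groups containing two or more pairs."""
    groups = defaultdict(list)
    for zf_gene, hs_gene in orthologs:
        # Skip pairs where either gene has no map position.
        if zf_gene in zebrafish_lg and hs_gene in human_chr:
            key = (zebrafish_lg[zf_gene], human_chr[hs_gene])
            groups[key].append((zf_gene, hs_gene))
    return {key: pairs for key, pairs in groups.items() if len(pairs) >= 2}

# Hypothetical map positions and ortholog assignments.
zebrafish_lg = {"zfA": "LG1", "zfB": "LG1", "zfC": "LG2", "zfD": "LG1"}
human_chr = {"hsA": "4", "hsB": "4", "hsC": "7", "hsD": "9"}
orthologs = [("zfA", "hsA"), ("zfB", "hsB"), ("zfC", "hsC"), ("zfD", "hsD")]

print(conserved_syntenies(zebrafish_lg, human_chr, orthologs))
```

Note that this grouping ignores gene order, which matches the abstract's observation that gene order can change within a conserved group.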

  14. Datasets for evolutionary comparative genomics

    PubMed Central

    Liberles, David A

    2005-01-01

    Many decisions about genome sequencing projects are directed by perceived gaps in the tree of life, or towards model organisms. With the goal of a better understanding of biology through the lens of evolution, however, there are additional genomes that are worth sequencing. One such rationale for whole-genome sequencing is discussed here, along with other important strategies for understanding the phenotypic divergence of species. PMID:16086856

  15. Gramene database: navigating plant comparative genomics resources

    USDA-ARS's Scientific Manuscript database

    Gramene (http://www.gramene.org) is an online, open source, curated resource for plant comparative genomics and pathway analysis designed to support researchers working in plant genomics, breeding, evolutionary biology, system biology, and metabolic engineering. It exploits phylogenetic relationship...

  16. Comparative genomics of the liberibacteral plant pathogens

    USDA-ARS's Scientific Manuscript database

    Comparative analyses of multiple Liberibacter genomes provide significant insights into the evolutionary history, genetic diversity, and phylogenetic and metabolomic capacities among pathogenic bacteria that have caused tremendous economic losses to agricultural crops. In addition, genomic analyses ...

  17. Cocoa/Cotton Comparative Genomics

    USDA-ARS's Scientific Manuscript database

    With genome sequence from two members of the Malvaceae family recently made available, we are exploring syntenic relationships, gene content, and evolutionary trajectories between the cacao and cotton genomes. An assembly of cacao (Theobroma cacao) using Illumina and 454 sequence technology yielded ...

  18. Haemonchus contortus: Genome Structure, Organization and Comparative Genomics.

    PubMed

    Laing, R; Martinelli, A; Tracey, A; Holroyd, N; Gilleard, J S; Cotton, J A

    2016-01-01

    One of the first genome sequencing projects for a parasitic nematode was that for Haemonchus contortus. The open access data from the Wellcome Trust Sanger Institute provided a valuable early resource for the research community, particularly for the identification of specific genes and genetic markers. Later, a second sequencing project was initiated by the University of Melbourne, and the two draft genome sequences for H. contortus were published back-to-back in 2013. There is a pressing need for long-range genomic information for genetic mapping, population genetics and functional genomic studies, so we are continuing to improve the Wellcome Trust Sanger Institute assembly to provide a finished reference genome for H. contortus. This review describes this process, compares the H. contortus genome assemblies with draft genomes from other members of the strongylid group and discusses future directions for parasite genomics using the H. contortus model. Copyright © 2016 Elsevier Ltd. All rights reserved.

  19. Divergent and convergent modes of interaction between wheat and Puccinia graminis f. sp. tritici isolates revealed by the comparative gene co-expression network and genome analyses.

    PubMed

    Rutter, William B; Salcedo, Andres; Akhunova, Alina; He, Fei; Wang, Shichen; Liang, Hanquan; Bowden, Robert L; Akhunov, Eduard

    2017-04-12

    Two opposing evolutionary constraints exert pressure on plant pathogens: one to diversify virulence factors in order to evade plant defenses, and the other to retain virulence factors critical for maintaining a compatible interaction with the plant host. To better understand how the diversified arsenals of fungal genes promote interaction with the same compatible wheat line, we performed a comparative genomic analysis of two North American isolates of Puccinia graminis f. sp. tritici (Pgt). The patterns of inter-isolate divergence in the secreted candidate effector genes were compared with the levels of conservation and divergence of plant-pathogen gene co-expression networks (GCN) developed for each isolate. Comparative genomic analyses revealed a substantial level of inter-isolate divergence in effector gene complement and sequence divergence. Gene Ontology (GO) analyses of the conserved and unique parts of the isolate-specific GCNs identified a number of conserved host pathways targeted by both isolates. Interestingly, the degree of inter-isolate sub-network conservation varied widely for the different host pathways and was positively associated with the proportion of conserved effector candidates associated with each sub-network. While different Pgt isolates tended to exploit similar wheat pathways for infection, the mode of plant-pathogen interaction varied for different pathways with some pathways being associated with the conserved set of effectors and others being linked with the diverged or isolate-specific effectors. Our data suggest that at the intra-species level pathogen populations likely maintain divergent sets of effectors capable of targeting the same plant host pathways. This functional redundancy may play an important role in the dynamic of the "arms-race" between host and pathogen serving as the basis for diverse virulence strategies and creating conditions where mutations in certain effector groups will not have a major effect on the pathogen

  20. Freshwater bacterial lifestyles inferred from comparative genomics.

    PubMed

    Livermore, Joshua A; Emrich, Scott J; Tan, John; Jones, Stuart E

    2014-03-01

    While micro-organisms actively mediate and participate in freshwater ecosystem services, we know little about freshwater microbial genetic diversity. Genome sequences are available for many bacteria from the human microbiome and the ocean (over 800 and 200, respectively), but only two freshwater genomes are currently available: the streamlined genomes of Polynucleobacter necessarius ssp. asymbioticus and the Actinobacterium AcI-B1. Here, we sequenced and analysed draft genomes of eight phylogenetically diverse freshwater bacteria exhibiting a range of lifestyle characteristics. Comparative genomics of these bacteria reveals putative freshwater bacterial lifestyles based on differences in predicted growth rate, capability to respond to environmental stimuli and diversity of useable carbon substrates. Our conceptual model based on these genomic characteristics provides a foundation on which further ecophysiological and genomic studies can be built. In addition, these genomes greatly expand the diversity of existing genomic context for future studies on the ecology and genetics of freshwater bacteria.

  1. Comparative Genomics Reveals High Genomic Diversity in the Genus Photobacterium.

    PubMed

    Machado, Henrique; Gram, Lone

    2017-01-01

    Vibrionaceae is a large marine bacterial family, which can constitute up to 50% of the prokaryotic population in marine waters. Photobacterium is the second largest genus in the family and we used comparative genomics on 35 strains representing 16 of the 28 species described so far, to understand the genomic diversity present in the Photobacterium genus. Such understanding is important for ecophysiology studies of the genus. We used whole genome sequences to evaluate phylogenetic relationships using several analyses (16S rRNA, MLSA, fur, amino-acid usage, ANI), which allowed us to identify two misidentified strains. Genome analyses also revealed occurrence of higher and lower GC content clades, correlating with phylogenetic clusters. Pan- and core-genome analysis revealed the conservation of 25% of the genome throughout the genus, with a large and open pan-genome. The major source of genomic diversity could be traced to the smaller chromosome and plasmids. Several of the physiological traits studied in the genus did not correlate with phylogenetic data. Since horizontal gene transfer (HGT) is often suggested as a source of genetic diversity and a potential driver of genomic evolution in bacterial species, we looked into evidence of such in Photobacterium genomes. Genomic islands were the source of genomic differences between strains of the same species. Also, we found transposase genes and CRISPR arrays that suggest multiple encounters with foreign DNA. Presence of genomic exchange traits was widespread and abundant in the genus, suggesting a role in genomic evolution. The high genetic variability and indications of genetic exchange make it difficult to elucidate genome evolutionary paths and raise the awareness of the roles of foreign DNA in the genomic evolution of environmental organisms.
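
The pan- and core-genome analysis mentioned in this abstract can be illustrated with set operations: the pan-genome is the union of gene families found in any strain, and the core genome is the intersection found in every strain. The sketch below assumes gene families have already been clustered; the strain names and family labels are hypothetical, and the study's actual clustering method and thresholds are not described in the abstract.

```python
def pan_and_core(gene_families_by_strain):
    """Return (pan-genome, core genome) as sets of gene-family labels."""
    family_sets = list(gene_families_by_strain.values())
    if not family_sets:
        return set(), set()
    pan = set().union(*family_sets)          # present in at least one strain
    core = set.intersection(*family_sets)    # present in every strain
    return pan, core

# Hypothetical presence lists for three strains.
strains = {
    "strain_1": {"fam1", "fam2", "fam3"},
    "strain_2": {"fam1", "fam2", "fam4"},
    "strain_3": {"fam1", "fam5"},
}

pan, core = pan_and_core(strains)
print(len(pan), len(core), len(core) / len(pan))
```

Whether a conserved fraction is reported relative to the pan-genome or to an average genome size is a choice of definition; the ratio printed here uses the pan-genome as the denominator purely for illustration.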

  2. Comparative genomics of protoploid Saccharomycetaceae

    PubMed Central

    Souciet, Jean-Luc; Dujon, Bernard; Gaillardin, Claude; Johnston, Mark; Baret, Philippe V.; Cliften, Paul; Sherman, David J.; Weissenbach, Jean; Westhof, Eric; Wincker, Patrick; Jubin, Claire; Poulain, Julie; Barbe, Valérie; Ségurens, Béatrice; Artiguenave, François; Anthouard, Véronique; Vacherie, Benoit; Val, Marie-Eve; Fulton, Robert S.; Minx, Patrick; Wilson, Richard; Durrens, Pascal; Jean, Géraldine; Marck, Christian; Martin, Tiphaine; Nikolski, Macha; Rolland, Thomas; Seret, Marie-Line; Casarégola, Serge; Despons, Laurence; Fairhead, Cécile; Fischer, Gilles; Lafontaine, Ingrid; Leh, Véronique; Lemaire, Marc; de Montigny, Jacky; Neuvéglise, Cécile; Thierry, Agnès; Blanc-Lenfle, Isabelle; Bleykasten, Claudine; Diffels, Julie; Fritsch, Emilie; Frangeul, Lionel; Goëffon, Adrien; Jauniaux, Nicolas; Kachouri-Lafond, Rym; Payen, Célia; Potier, Serge; Pribylova, Lenka; Ozanne, Christophe; Richard, Guy-Franck; Sacerdot, Christine; Straub, Marie-Laure; Talla, Emmanuel

    2009-01-01

    Our knowledge of yeast genomes remains largely dominated by the extensive studies on Saccharomyces cerevisiae and the consequences of its ancestral duplication, leaving the evolution of the entire class of hemiascomycetes only partly explored. We concentrate here on five species of Saccharomycetaceae, a large subdivision of hemiascomycetes, that we call “protoploid” because they diverged from the S. cerevisiae lineage prior to its genome duplication. We determined the complete genome sequences of three of these species: Kluyveromyces (Lachancea) thermotolerans and Saccharomyces (Lachancea) kluyveri (two members of the newly described Lachancea clade), and Zygosaccharomyces rouxii. We included in our comparisons the previously available sequences of Kluyveromyces lactis and Ashbya (Eremothecium) gossypii. Despite their broad evolutionary range and significant individual variations in each lineage, the five protoploid Saccharomycetaceae share a core repertoire of approximately 3300 protein families and a high degree of conserved synteny. Synteny blocks were used to define gene orthology and to infer ancestors. Far from representing minimal genomes without redundancy, the five protoploid yeasts contain numerous copies of paralogous genes, either dispersed or in tandem arrays, that, altogether, constitute a third of each genome. Ancient, conserved paralogs as well as novel, lineage-specific paralogs were identified. PMID:19525356

  3. Comparative Reannotation of 21 Aspergillus Genomes

    SciTech Connect

    Salamov, Asaf; Riley, Robert; Kuo, Alan; Grigoriev, Igor

    2013-03-08

    We used comparative gene modeling to reannotate 21 Aspergillus genomes. Initial automatic annotation of individual genomes may contain errors of various kinds, e.g. missing genes, incorrect exon-intron structures, and 'chimeras' that fuse two or more real genes or, alternatively, models that split a real gene into two or more. The main premise behind the comparative modeling approach is that for closely related genomes most orthologous families have the same conserved gene structure. The algorithm maps all gene models predicted in each individual Aspergillus genome to the other genomes and, for each locus, selects from potentially many competing models the one that most closely resembles the orthologous genes from other genomes. This procedure is iterated until no further change in gene models is observed. For Aspergillus genomes we predicted in total 4503 new gene models (~2% per genome), supported by comparative analysis, additionally correcting ~18% of old gene models. This resulted in a total of 4065 more genes with annotated PFAM domains (~3% increase per genome). Analysis of a few genomes with EST/transcriptomics data shows that the new annotation sets also have a higher number of EST-supported splice sites at exon-intron boundaries.

  4. Glutathione S-Transferase Gene Family in Gossypium raimondii and G. arboreum: Comparative Genomic Study and their Expression under Salt Stress

    PubMed Central

    Dong, Yating; Li, Cong; Zhang, Yi; He, Qiuling; Daud, Muhammad K.; Chen, Jinhong; Zhu, Shuijin

    2016-01-01

    Glutathione S-transferases (GSTs) play versatile roles in multiple aspects of plant growth and development. A comprehensive genome-wide survey of this gene family in the genomes of G. raimondii and G. arboreum was carried out in this study. Based on phylogenetic analyses, the GST gene family of both diploid cotton species could be divided into eight classes, and nearly all the GST genes within the same subfamily shared a similar gene structure. Additionally, the gene structures of the orthologs were highly conserved. The chromosomal localization analyses revealed that GST genes were unevenly distributed across the genome in both G. raimondii and G. arboreum. Tandem duplication could be the major driver for the expansion of GST gene families. Meanwhile, the expression analysis of 40 selected GST genes showed that they exhibited tissue-specific expression patterns and that their expression was induced or repressed by salt stress. These findings shed light on the function and evolution of the GST gene family in Gossypium species. PMID:26904090

  5. Comparative Bacterial Proteomics: Analysis of the Core Genome Concept

    PubMed Central

    Callister, Stephen J.; McCue, Lee Ann; Turse, Joshua E.; Monroe, Matthew E.; Auberry, Kenneth J.; Smith, Richard D.; Adkins, Joshua N.; Lipton, Mary S.

    2008-01-01

    While comparative bacterial genomic studies commonly predict a set of genes indicative of common ancestry, experimental validation of the existence of this core genome requires extensive measurement and is typically not undertaken. Enabled by an extensive proteome database developed over six years, we have experimentally verified the expression of proteins predicted from genomic ortholog comparisons among 17 environmental and pathogenic bacteria. More exclusive relationships were observed among the expressed protein content of phenotypically related bacteria, which is indicative of the specific lifestyles associated with these organisms. Although genomic studies can establish relative orthologous relationships among a set of bacteria and propose a set of ancestral genes, our proteomics study establishes expressed lifestyle differences among conserved genes and proposes a set of expressed ancestral traits. PMID:18253490

  6. Comparative Bacterial Proteomics: Analysis of the Core Genome Concept

    SciTech Connect

    Callister, Stephen J.; McCue, Lee Ann; Turse, Josh E.; Monroe, Matthew E.; Auberry, Kenneth J.; Smith, Richard D.; Adkins, Joshua N.; Lipton, Mary S.

    2008-02-06

    Comparative bacterial genomic studies commonly predict a set of genes indicative of common ancestry. Experimental validation of the existence of this core genome requires extensive measurement and is not typically undertaken. Enabled by an extensive proteome database developed over a six-year period, we experimentally verified the expression of proteins predicted from genomic ortholog comparisons among 17 environmental and pathogenic bacteria. More exclusive relationships were observed among the expressed protein content of phenotypically related bacteria, which is indicative of the specific lifestyles associated with these organisms. While genomic studies establish relative orthologous relationships among a set of bacteria and propose a set of ancestral genes, our proteomics study establishes expressed lifestyle differences among conserved genes and proposes a set of expressed ancestral traits.

  7. Positively Selected Disease Response Orthologous Gene Sets in the Cereals Identified Using Sorghum bicolor L. Moench Expression Profiles and Comparative Genomics

    PubMed Central

    Zamora, Alejandro; Sun, Qi; Hamblin, Martha T.; Aquadro, Charles F.; Kresovich, Stephen

    2009-01-01

    Disease response genes (DRGs) diverge under recurrent positive selection as a result of a molecular arms race between hosts and pathogens. Most of these studies were conducted in animals, and few defense genes have been shown to evolve adaptively in plants. To test for adaptation in the molecules mediating disease resistance in the cereals, we first combined information from the expression pattern of Sorghum bicolor genes and from divergence to the full genome of rice to identify candidate DRGs. We then used evolutionary analyses of orthologous gene sets from several grass species, to determine whether the DRGs show signals of positive selection and the residues targeted. We found 140 divergent genes upregulated under biotic stress in S. bicolor by evaluating the relative abundance of expressed sequence tags in different libraries and comparing them with rice genes. For 10 of these genes, we found sets of orthologs including sequences from rice and three other cereals; six genes showed a pattern of substitution that was consistent with positive selection. Three of these genes, a thaumatin, a peroxidase, and a barley mlo homolog, are known antifungal proteins. The other three genes with evidence of positive selection were a MCM-1 agamous deficiens SRF- (MADS) box transcription factor, an eIF5 translation initiation factor, and a gene of unknown function but with evidence of expression during stress. Permutation analyses, using different ortholog and paralog sequences, consistently identified five positively selected codons in the peroxidase, a member of a cluster of genes and a large gene family. We mapped the positively selected residues onto the structure of the peroxidase and thaumatin and found that all sites are on the surface of these proteins and several are close to biochemically determined active sites. Identifying new positively selected plant disease resistance genes and the critical amino acid sites provides a basis for functional studies that may

  8. Orthology for comparative genomics in the mouse genome database.

    PubMed

    Dolan, Mary E; Baldarelli, Richard M; Bello, Susan M; Ni, Li; McAndrews, Monica S; Bult, Carol J; Kadin, James A; Richardson, Joel E; Ringwald, Martin; Eppig, Janan T; Blake, Judith A

    2015-08-01

    The mouse genome database (MGD) is the model organism database component of the mouse genome informatics system at The Jackson Laboratory. MGD is the international data resource for the laboratory mouse and facilitates the use of mice in the study of human health and disease. Since its beginnings, MGD has included comparative genomics data with a particular focus on human-mouse orthology, an essential component of the use of mouse as a model organism. Over the past 25 years, novel algorithms and addition of orthologs from other model organisms have enriched comparative genomics in MGD data, extending the use of orthology data to support the laboratory mouse as a model of human biology. Here, we describe current comparative data in MGD and review the history and refinement of orthology representation in this resource.

  9. Gramene 2013: Comparative plant genomics resources

    USDA-ARS's Scientific Manuscript database

    Gramene (http://www.gramene.org) is a curated online resource for comparative functional genomics in crops and model plant species, currently hosting 27 fully and 10 partially sequenced reference genomes in its build number 38. Its strength derives from the application of a phylogenetic framework fo...

  10. Gramene: a growing plant comparative genomics resource

    USDA-ARS's Scientific Manuscript database

    Gramene (www.gramene.org) is a curated genetic, genomic and comparative genome analysis resource for the major crop species, such as rice, maize, wheat and many other plant (mainly grass) species. Gramene is an open-source project, with all data and software freely downloadable through the ftp site ...

  11. Comparative genomic analysis of esophageal cancers.

    PubMed

    Caygill, Christine P J; Gatenby, Piers A C; Herceg, Zdenko; Lima, Sheila C S; Pinto, Luis F R; Watson, Anthony; Wu, Ming-Shiang

    2014-09-01

    The following, from the 12th OESO World Conference: Cancers of the Esophagus, includes commentaries on comparative genomic analysis of esophageal cancers: genomic polymorphisms, the genetic and epigenetic drivers in esophageal cancers, and the collection of data in the UK Barrett's Oesophagus Registry.

  12. Gramene 2016: comparative plant genomics and pathway resources.

    PubMed

    Tello-Ruiz, Marcela K; Stein, Joshua; Wei, Sharon; Preece, Justin; Olson, Andrew; Naithani, Sushma; Amarasinghe, Vindhya; Dharmawardhana, Palitha; Jiao, Yinping; Mulvaney, Joseph; Kumari, Sunita; Chougule, Kapeel; Elser, Justin; Wang, Bo; Thomason, James; Bolser, Daniel M; Kerhornou, Arnaud; Walts, Brandon; Fonseca, Nuno A; Huerta, Laura; Keays, Maria; Tang, Y Amy; Parkinson, Helen; Fabregat, Antonio; McKay, Sheldon; Weiser, Joel; D'Eustachio, Peter; Stein, Lincoln; Petryszak, Robert; Kersey, Paul J; Jaiswal, Pankaj; Ware, Doreen

    2016-01-04

    Gramene (http://www.gramene.org) is an online resource for comparative functional genomics in crops and model plant species. Its two main frameworks are genomes (collaboration with Ensembl Plants) and pathways (The Plant Reactome and archival BioCyc databases). Since our last NAR update, the database website adopted a new Drupal management platform. The genomes section features 39 fully assembled reference genomes that are integrated using ontology-based annotation and comparative analyses, and accessed through both visual and programmatic interfaces. Additional community data, such as genetic variation, expression and methylation, are also mapped for a subset of genomes. The Plant Reactome pathway portal (http://plantreactome.gramene.org) provides a reference resource for analyzing plant metabolic and regulatory pathways. In addition to ∼ 200 curated rice reference pathways, the portal hosts gene homology-based pathway projections for 33 plant species. Both the genome and pathway browsers interface with the EMBL-EBI's Expression Atlas to enable the projection of baseline and differential expression data from curated expression studies in plants. Gramene's archive website (http://archive.gramene.org) continues to provide previously reported resources on comparative maps, markers and QTL. To further aid our users, we have also introduced a live monthly educational webinar series and a Gramene YouTube channel carrying video tutorials. Published by Oxford University Press on behalf of Nucleic Acids Research 2015. This work is written by (a) US Government employee(s) and is in the public domain in the US.

  13. Gramene 2016: comparative plant genomics and pathway resources

    PubMed Central

    Tello-Ruiz, Marcela K.; Stein, Joshua; Wei, Sharon; Preece, Justin; Olson, Andrew; Naithani, Sushma; Amarasinghe, Vindhya; Dharmawardhana, Palitha; Jiao, Yinping; Mulvaney, Joseph; Kumari, Sunita; Chougule, Kapeel; Elser, Justin; Wang, Bo; Thomason, James; Bolser, Daniel M.; Kerhornou, Arnaud; Walts, Brandon; Fonseca, Nuno A.; Huerta, Laura; Keays, Maria; Tang, Y. Amy; Parkinson, Helen; Fabregat, Antonio; McKay, Sheldon; Weiser, Joel; D'Eustachio, Peter; Stein, Lincoln; Petryszak, Robert; Kersey, Paul J.; Jaiswal, Pankaj; Ware, Doreen

    2016-01-01

    Gramene (http://www.gramene.org) is an online resource for comparative functional genomics in crops and model plant species. Its two main frameworks are genomes (collaboration with Ensembl Plants) and pathways (The Plant Reactome and archival BioCyc databases). Since our last NAR update, the database website adopted a new Drupal management platform. The genomes section features 39 fully assembled reference genomes that are integrated using ontology-based annotation and comparative analyses, and accessed through both visual and programmatic interfaces. Additional community data, such as genetic variation, expression and methylation, are also mapped for a subset of genomes. The Plant Reactome pathway portal (http://plantreactome.gramene.org) provides a reference resource for analyzing plant metabolic and regulatory pathways. In addition to ∼200 curated rice reference pathways, the portal hosts gene homology-based pathway projections for 33 plant species. Both the genome and pathway browsers interface with the EMBL-EBI's Expression Atlas to enable the projection of baseline and differential expression data from curated expression studies in plants. Gramene's archive website (http://archive.gramene.org) continues to provide previously reported resources on comparative maps, markers and QTL. To further aid our users, we have also introduced a live monthly educational webinar series and a Gramene YouTube channel carrying video tutorials. PMID:26553803

  14. ERCC1: a comparative genomic perspective.

    PubMed

    Wilson, M D; Ruttan, C C; Koop, B F; Glickman, B W

    2001-01-01

    ERCC1 plays an essential role in the nucleotide excision repair (NER) of DNA. We compare 37 kb of sequence from the ERCC1 region on human chromosome 19q13.3 to the orthologous region on mouse chromosome 7. In addition to showing the conserved gene structure between ERCC1, ASE-1, and their murine counterparts, this genomic comparison reveals a highly conserved 497 bp segment found 5 kb upstream of ERCC1 exon 1 that contains a CpG island and previously unidentified "classical" promoter elements. Additional putative regulatory elements are also found within a conserved LINE-1 (long interspersed nuclear element) sequence 800 bp upstream of exon 1 in both human and mouse. Expressed sequence tag (EST) assemblies for human ERCC1 identified numerous splice variants involving exons 1, 2, 3, 7, 8, and 9 that could affect DNA repair efficiencies of ERCC1. A previously undescribed transcript that reads through exon 9 and utilizes the polyadenylation signal of a neighboring Alu element accounts for nearly half of the total splice variants identified in the human EST database. This transcript would theoretically translate to a larger ERCC1 protein product containing a novel C-terminal end. Overall, approximately 18% of publicly available ERCC1 cDNA sequences were determined to be splice variants, while no variants were found in the mouse. The ability to assess novel transcripts and identify candidate regulatory regions demonstrates the potential utility for a catalogue archiving comparative analyses for all genes involved in DNA repair. Our comparative genomic analysis of ERCC1 can be viewed at http://web.uvic.ca/-bioweb/laj.html. Copyright 2001 Wiley-Liss, Inc.

  15. Comparative assembly hubs: Web-accessible browsers for comparative genomics

    PubMed Central

    Nguyen, Ngan; Hickey, Glenn; Raney, Brian J.; Armstrong, Joel; Clawson, Hiram; Zweig, Ann; Karolchik, Donna; Kent, William James; Haussler, David; Paten, Benedict

    2014-01-01

    Motivation: Researchers now have access to large volumes of genome sequences for comparative analysis, some generated by the plethora of public sequencing projects and, increasingly, from individual efforts. It is not possible, or necessarily desirable, that the public genome browsers attempt to curate all these data. Instead, a wealth of powerful tools is emerging to empower users to create their own visualizations and browsers. Results: We introduce a pipeline to easily generate collections of Web-accessible UCSC Genome Browsers interrelated by an alignment. It is intended to democratize our comparative genomic browser resources, serving the broad and growing community of evolutionary genomicists and facilitating easy public sharing via the Internet. Using the alignment, all annotations and the alignment itself can be efficiently viewed with reference to any genome in the collection, symmetrically. A new, intelligently scaled alignment display makes it simple to view all changes between the genomes at all levels of resolution, from substitutions to complex structural rearrangements, including duplications. To demonstrate this work, we create a comparative assembly hub containing 57 Escherichia coli and 9 Shigella genomes and show examples that highlight their unique biology. Availability and implementation: The source code is available as open source at: https://github.com/glennhickey/progressiveCactus The E.coli and Shigella genome hub is now a public hub listed on the UCSC browser public hubs Web page. Contact: benedict@soe.ucsc.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:25138168

  16. Homology-independent metrics for comparative genomics.

    PubMed

    Coutinho, Tarcisio José Domingos; Franco, Glória Regina; Lobo, Francisco Pereira

    2015-01-01

    A mainstream procedure to analyze the wealth of genomic data available nowadays is the detection of homologous regions shared across genomes, followed by the extraction of biological information from the patterns of conservation and variation observed in such regions. Although of pivotal importance, comparative genomic procedures that rely on homology inference are obviously not applicable if no homologous regions are detectable. This fact excludes a considerable portion of "genomic dark matter" with no significant similarity - and, consequently, no inferred homology to any other known sequence - from several downstream comparative genomic methods. In this review we compile several sequence metrics that do not rely on homology inference and can be used to compare nucleotide sequences and extract biologically meaningful information from them. These metrics comprise several compositional parameters calculated from sequence data alone, such as GC content, dinucleotide odds ratio, and several codon bias metrics. They also share other interesting properties, such as pervasiveness (patterns persist on smaller scales) and phylogenetic signal. We also cite examples where these homology-independent metrics have been successfully applied to support several bioinformatics challenges, such as taxonomic classification of biological sequences without homology inference. They were also used to detect higher-order patterns of interactions in biological systems, ranging from detecting coevolutionary trends between the genomes of viruses and their hosts to characterization of gene pools of entire microbial communities. We argue that, if correctly understood and applied, homology-independent metrics can add important layers of biological information in comparative genomic studies without prior homology inference.
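
    Two of the compositional metrics named in the abstract above, GC content and the dinucleotide odds ratio, can be computed from sequence data alone. The following is a minimal sketch, not code from the review: the function names are hypothetical, and the odds ratio is taken in its single-strand form, rho_xy = f_xy / (f_x * f_y), which is an assumption (some studies use a strand-symmetrized variant instead).

```python
from collections import Counter


def gc_content(seq: str) -> float:
    """Fraction of G and C among the unambiguous A/C/G/T bases of seq."""
    counts = Counter(b for b in seq.upper() if b in "ACGT")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return (counts["G"] + counts["C"]) / total


def dinucleotide_odds_ratio(seq: str) -> dict[str, float]:
    """Single-strand odds ratio f_xy / (f_x * f_y) for each observed dinucleotide."""
    seq = seq.upper()
    mono = Counter(b for b in seq if b in "ACGT")
    # Count overlapping dinucleotides, skipping any pair with an ambiguous base.
    di = Counter(
        seq[i:i + 2]
        for i in range(len(seq) - 1)
        if seq[i] in "ACGT" and seq[i + 1] in "ACGT"
    )
    n_mono = sum(mono.values())
    n_di = sum(di.values())
    if n_di == 0:
        raise ValueError("sequence contains no unambiguous dinucleotides")
    return {
        pair: (count / n_di)
        / ((mono[pair[0]] / n_mono) * (mono[pair[1]] / n_mono))
        for pair, count in di.items()
    }
```

    Values of the odds ratio well above or below 1 indicate over- or under-represented dinucleotides relative to what the base composition alone would predict.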

  17. A universal genomic coordinate translator for comparative genomics.

    PubMed

    Zamani, Neda; Sundström, Görel; Meadows, Jennifer R S; Höppner, Marc P; Dainat, Jacques; Lantz, Henrik; Haas, Brian J; Grabherr, Manfred G

    2014-06-30

    Genomic duplications constitute major events in the evolution of species, allowing paralogous copies of genes to take on fine-tuned biological roles. Unambiguously identifying the orthology relationship between copies across multiple genomes can be resolved by synteny, i.e. the conserved order of genomic sequences. However, a comprehensive analysis of duplication events and their contributions to evolution would require all-to-all genome alignments, the number of which grows as N^2 with the number of available genomes, N. Here, we introduce Kraken, software that omits the all-to-all requirement by recursively traversing a graph of pairwise alignments and dynamically re-computing orthology. Kraken scales linearly with the number of targeted genomes, N, which allows for including large numbers of genomes in analyses. We first evaluated the method on the set of 12 Drosophila genomes, finding that orthologous correspondence computed indirectly through a graph of multiple synteny maps comes at minimal cost in terms of sensitivity, but reduces overall computational runtime by an order of magnitude. We then used the method on three well-annotated mammalian genomes, human, mouse, and rat, and show that up to 93% of protein coding transcripts have unambiguous pairwise orthologous relationships across the genomes. On a nucleotide level, 70 to 83% of exons match exactly at both splice junctions, and up to 97% on at least one junction. We last applied Kraken to an RNA-sequencing dataset from multiple vertebrates and diverse tissues, where we confirmed that brain-specific gene family members, i.e. one-to-many or many-to-many homologs, are more highly correlated across species than single-copy (i.e. one-to-one homologous) genes. Not limited to protein coding genes, Kraken also identifies thousands of newly identified transcribed loci, likely non-coding RNAs that are consistently transcribed in human, chimpanzee and gorilla, and maintain significant correlation of expression levels across
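
    The idea in the record above of translating coordinates indirectly through a graph of pairwise alignments, rather than aligning every genome to every other, can be illustrated with a toy breadth-first search. This is only a sketch under strong assumptions and is not Kraken's algorithm: the function name and data layout are hypothetical, each pairwise alignment is reduced to a plain position-to-position dictionary, each genome is visited at most once, and no orthology is re-computed along the way.

```python
from collections import deque


def translate_coordinate(position, source, target, pairwise_maps):
    """Translate a position from source to target by chaining pairwise maps.

    pairwise_maps maps (from_genome, to_genome) -> {position: position}.
    Returns the translated position, or None if no chain of maps reaches target.
    """
    queue = deque([(source, position)])
    visited = {source}
    while queue:
        genome, pos = queue.popleft()
        if genome == target:
            return pos
        for (src, dst), mapping in pairwise_maps.items():
            # Follow an edge only if it leaves this genome and covers this position.
            if src == genome and dst not in visited and pos in mapping:
                visited.add(dst)
                queue.append((dst, mapping[pos]))
    return None
```

    With maps for only human-to-mouse and mouse-to-rat, a human position can still be carried to rat through mouse, so a connected set of N genomes needs as few as N - 1 pairwise alignments instead of one per pair.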

  18. A White Paper on Nematode Comparative Genomics

    PubMed Central

    Bird, David McK.; Blaxter, Mark L.; McCarter, James P.; Mitreva, Makedonka; Sternberg, Paul W.; Thomas, W. Kelley

    2005-01-01

    In response to the new opportunities for genome sequencing and comparative genomics, the Society of Nematology (SON) formed a committee to develop a white paper in support of the broad scientific needs associated with this phylum and interests of SON members. Although genome sequencing is expensive, the data generated are unique in biological systems in that genomes have the potential to be complete (every base of the genome can be accounted for), accurate (the data are digital and not subject to stochastic variation), and permanent (once obtained, the genome of a species does not need to be experimentally re-sampled). The availability of complete, accurate, and permanent genome sequences from diverse nematode species will underpin future studies into the biology and evolution of this phylum and the ecological associations (particularly parasitic) nematodes have with other organisms. We anticipate that upwards of 100 nematode genomes will be solved to varying levels of completion in the coming decade and suggest biological and practical considerations to guide the selection of the most informative taxa for sequencing. PMID:19262884

  19. A white paper on nematode comparative genomics.

    PubMed

    Bird, David McK; Blaxter, Mark L; McCarter, James P; Mitreva, Makedonka; Sternberg, Paul W; Thomas, W Kelley

    2005-12-01

    In response to the new opportunities for genome sequencing and comparative genomics, the Society of Nematology (SON) formed a committee to develop a white paper in support of the broad scientific needs associated with this phylum and interests of SON members. Although genome sequencing is expensive, the data generated are unique in biological systems in that genomes have the potential to be complete (every base of the genome can be accounted for), accurate (the data are digital and not subject to stochastic variation), and permanent (once obtained, the genome of a species does not need to be experimentally re-sampled). The availability of complete, accurate, and permanent genome sequences from diverse nematode species will underpin future studies into the biology and evolution of this phylum and the ecological associations (particularly parasitic) nematodes have with other organisms. We anticipate that upwards of 100 nematode genomes will be solved to varying levels of completion in the coming decade and suggest biological and practical considerations to guide the selection of the most informative taxa for sequencing.

  20. Comparative Genome Analysis of Enterobacter cloacae

    PubMed Central

    Liu, Wing-Yee; Wong, Chi-Fat; Chung, Karl Ming-Kar; Jiang, Jing-Wei; Leung, Frederick Chi-Ching

    2013-01-01

    The Enterobacter cloacae species includes an extremely diverse group of bacteria that are associated with plants, soil and humans. Publication of the complete genome sequence of the plant growth-promoting endophytic E. cloacae subsp. cloacae ENHKU01 provided an opportunity to perform the first comparative genome analysis between strains of this dynamic species. Examination of the pan-genome of E. cloacae showed that the conserved core genome retains the general physiological and survival genes of the species, while genomic factors in plasmids and variable regions determine the virulence of the human pathogenic E. cloacae strain; additionally, the diversity of fimbriae contributes to variation in colonization and host determination of different E. cloacae strains. Comparative genome analysis further illustrated that E. cloacae strains possess multiple mechanisms for antagonistic action against other microorganisms, which involve the production of siderophores and various antimicrobial compounds, such as bacteriocins, chitinases and antibiotic resistance proteins. The presence of Type VI secretion systems is expected to provide further fitness advantages for E. cloacae in microbial competition, thus allowing it to survive in different environments. Competition assays were performed to support our observations in genomic analysis, where E. cloacae subsp. cloacae ENHKU01 demonstrated antagonistic activities against a wide range of plant pathogenic fungal and bacterial species. PMID:24069314

  1. Comparative Genomics Reveals the Core and Accessory Genomes of Streptomyces Species.

    PubMed

    Kim, Ji-Nu; Kim, Yeonbum; Jeong, Yujin; Roe, Jung-Hye; Kim, Byung-Gee; Cho, Byung-Kwan

    2015-10-01

    The development of rapid and efficient genome sequencing methods has enabled us to study the evolutionary background of bacterial genetic information. Here, we present comparative genomic analysis of 17 Streptomyces species, for which the genome has been completely sequenced, using the pan-genome approach. The analysis revealed that 34,592 ortholog clusters constituted the pan-genome of these Streptomyces species, including 2,018 in the core genome, 11,743 in the dispensable genome, and 20,831 in the unique genome. The core genome was converged to a smaller number of genes than reported previously, with 3,096 gene families. Functional enrichment analysis showed that genes involved in transcription were most abundant in the Streptomyces pan-genome. Finally, we investigated core genes for the sigma factors, mycothiol biosynthesis pathway, and secondary metabolism pathways; our data showed that many genes involved in stress response and morphological differentiation were commonly expressed in Streptomyces species. Elucidation of the core genome offers a basis for understanding the functional evolution of Streptomyces species and provides insights into target selection for the construction of industrial strains.
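
    The three pan-genome partitions reported above sum to the stated total (2,018 + 11,743 + 20,831 = 34,592 ortholog clusters). A minimal sketch of how ortholog clusters might be split into such partitions follows. It assumes the common convention that a core cluster is present in every genome, a unique cluster in exactly one, and a dispensable cluster in any number in between; the abstract does not state the authors' exact criteria, and the function and variable names are hypothetical.

```python
def partition_pan_genome(clusters, genomes):
    """Split ortholog clusters into core, dispensable and unique sets.

    clusters maps a cluster ID to the set of genomes containing a member gene;
    genomes is the full set of genomes under comparison.
    """
    parts = {"core": set(), "dispensable": set(), "unique": set()}
    for cluster_id, members in clusters.items():
        present = members & genomes
        if not present:
            continue  # cluster has no member in the genomes being compared
        if present == genomes:
            parts["core"].add(cluster_id)
        elif len(present) == 1:
            parts["unique"].add(cluster_id)
        else:
            parts["dispensable"].add(cluster_id)
    return parts
```

    Because every cluster with at least one member falls into exactly one partition, the three counts always add up to the pan-genome size, as they do in the figures quoted in the abstract.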

  2. Comparative genomic analysis of the genus Enterococcus.

    PubMed

    Zhong, Zhi; Zhang, Wenyi; Song, Yuqin; Liu, Wenjun; Xu, Haiyan; Xi, Xiaoxia; Menghe, Bilige; Zhang, Heping; Sun, Zhihong

    2017-03-01

    As important lactic acid bacteria, Enterococcus species are widely used in the production of fermented food. However, as some strains of Enterococcus are opportunistic pathogens, their safety has not been generally accepted. In recent years, a large number of new species have been described and classified within the genus Enterococcus, so a better understanding of the genetic relationships and evolution of Enterococcus species is needed. In this study, the genomes of 29 type strains of Enterococcus species were sequenced. In combination with eight complete genome sequences from the Genbank database, the whole genomes of 37 strains of Enterococcus were comparatively analyzed. The average length of Enterococcus genomes was 3.20 Mb and the average GC content was 37.99%. The core- and pan-genomes were defined based on the genomes of the 37 strains of Enterococcus. The core-genome contained 605 genes, a large proportion of which were associated with carbohydrate metabolism, protein metabolism, and DNA and RNA metabolism. The phylogenetic tree showed that habitat is very important in the evolution of Enterococcus. The genetic relationships were closer in strains that come from similar habitats. According to the topology of the time tree, we found that humans and mammals may be the original hosts of Enterococcus, and then species from humans and mammals made a host-shift to plants, birds, food and other environments. However, this is only an evolutionary scenario, and more data and effort are needed to test it. The comparative genomic analysis provided a snapshot of the evolution and genetic diversity of the genus Enterococcus, which paves the way for follow-up studies on its taxonomy and functional genomics. Copyright © 2017 Elsevier GmbH. All rights reserved.

  3. Identification of candidate genes in Arabidopsis and Populus cell wall biosynthesis using text-mining, co-expression network analysis and comparative genomics.

    PubMed

    Yang, Xiaohan; Ye, Chu-Yu; Bisaria, Anjali; Tuskan, Gerald A; Kalluri, Udaya C

    2011-12-01

    Populus is an important bioenergy crop for bioethanol production. A greater understanding of cell wall biosynthesis processes is critical in reducing biomass recalcitrance, a major hindrance in efficient generation of biofuels from lignocellulosic biomass. Here, we report the identification of candidate cell wall biosynthesis genes through the development and application of a novel bioinformatics pipeline. As a first step, via text-mining of PubMed publications, we obtained 121 Arabidopsis genes that had experimental evidence supporting their involvement in cell wall biosynthesis or remodeling. The 121 genes were then used as bait genes to query an Arabidopsis co-expression database, and additional genes were identified as neighbors of the bait genes in the network, increasing the number of genes to 548. The 548 Arabidopsis genes were then used to re-query the Arabidopsis co-expression database and re-construct a network that captured additional network neighbors, expanding to a total of 694 genes. The 694 Arabidopsis genes were computationally divided into 22 clusters. Queries of the Populus genome using the Arabidopsis genes revealed 817 Populus orthologs. Functional analysis of gene ontology and tissue-specific gene expression indicated that these Arabidopsis and Populus genes are high-likelihood candidates for functional characterization in relation to cell wall biosynthesis.
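
    The two rounds of network expansion described above (121 bait genes, then 548, then 694) amount to repeatedly adding the co-expression neighbors of the current gene set. The sketch below illustrates that step only. It assumes the co-expression database can be represented as a dictionary from each gene to its set of neighbors; the function name, the data layout and the absence of any correlation threshold are all hypothetical simplifications rather than the authors' pipeline.

```python
def expand_gene_set(bait_genes, neighbors, rounds=2):
    """Grow a gene set by adding co-expression neighbors for a number of rounds.

    neighbors maps a gene ID to the set of genes co-expressed with it.
    """
    selected = set(bait_genes)
    for _ in range(rounds):
        added = set()
        for gene in selected:
            added |= neighbors.get(gene, set())
        selected |= added  # the next round queries with the enlarged set
    return selected
```

    Each round uses the enlarged set as the new query, which mirrors the re-querying step in the abstract; the resulting genes would then be clustered and mapped to orthologs in a second species.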

  4. Comparative genomics of green sulfur bacteria.

    PubMed

    Davenport, Colin; Ussery, David W; Tümmler, Burkhard

    2010-06-01

    Eleven completely sequenced Chlorobi genomes were compared in oligonucleotide usage, gene contents, and synteny. The green sulfur bacteria (GSB) are equipped with a core genome that sustains their anoxygenic phototrophic lifestyle by photosynthesis, sulfur oxidation, and CO2 fixation. Whole-genome gene family and single gene sequence comparisons yielded similar phylogenetic trees of the sequenced chromosomes indicating a concerted vertical evolution of large gene sets. Chromosomal synteny of genes is not preserved in the phylum Chlorobi. The accessory genome is characterized by anomalous oligonucleotide usage and endows the strains with individual features for transport, secretion, cell wall, extracellular constituents, and a few elements of the biosynthetic apparatus. Giant genes are a peculiar feature of the genera Chlorobium and Prosthecochloris. The predicted proteins have a huge molecular weight of 10^6, and are probably instrumental for the bacteria to generate their own intimate (micro)environment.
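
    Oligonucleotide usage, as compared in the abstract above, is commonly summarized as a k-mer frequency profile per sequence. The sketch below is one simple way to do that and is not the method of the study: the choice of k, the sum-of-absolute-differences distance, and the function names kmer_frequencies and usage_distance are all assumptions made for illustration.

```python
from collections import Counter


def kmer_frequencies(seq, k=4):
    """Relative frequency of each k-mer made only of A, C, G, T."""
    seq = seq.upper()
    counts = Counter(
        seq[i:i + k]
        for i in range(len(seq) - k + 1)
        if set(seq[i:i + k]) <= set("ACGT")
    )
    total = sum(counts.values())
    return {kmer: n / total for kmer, n in counts.items()} if total else {}


def usage_distance(freq_a, freq_b):
    """Sum of absolute frequency differences over k-mers seen in either profile."""
    keys = set(freq_a) | set(freq_b)
    return sum(abs(freq_a.get(key, 0.0) - freq_b.get(key, 0.0)) for key in keys)


profile = kmer_frequencies("ACGTACGT", k=2)
print(profile)
print(usage_distance(kmer_frequencies("AAAA", k=2), kmer_frequencies("CCCC", k=2)))
```

    A genomic region whose profile lies far from the genome-wide profile is the kind of "anomalous oligonucleotide usage" the abstract associates with the accessory genome, although the threshold for "far" would have to be chosen separately.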

  5. Comparative genomic hybridization with single cells after whole genome amplification

    SciTech Connect

    Haddad, B.R.; Baldini, A.; Hughes, M.R.

    1994-09-01

    Conventional karyotype analysis is the ideal way to diagnose chromosomal imbalances. However, it requires cell culture and chromosome preparation. There are instances where a very small number of cells are available for cytogenetic evaluation and chromosomes cannot be obtained. Comparative genomic hybridization (CGH) is a novel molecular cytogenetic technique that provides information about genetic imbalances affecting the genome. The power of this technique lies in its ability to detect genetic imbalances using total genomic DNA. We have previously demonstrated the feasibility of whole genome amplification from single cells for subsequent analysis of multiple genetic loci by PCR. In the present work, we combine whole genome amplification with CGH to detect chromosomal imbalances from small numbers of cells. Both cytogenetically normal and abnormal cells were individually picked by micromanipulation and subjected to whole genome amplification using random oligonucleotide primers. Amplified test and control DNA were differentially labeled by incorporation of digoxigenin or biotin, mixed together and hybridized to normal male metaphase spreads. Hybridization was detected with two fluorochromes, rhodamine-anti-digoxigenin and FITC-avidin. The ratio of intensities of the two fluorochromes along the target chromosomes was analyzed using locally developed computer imaging software. Using the combination of whole genome amplification and CGH, we were able to detect different chromosomal aneuploidies from 30, 20, and 10 cells. It can also be applied to the analysis of fetal cells sorted from maternal circulation, or to tumor cells obtained from needle biopsies or from different body fluids and effusions. Finally, its successful application to single cells will have a great impact on preimplantation diagnosis.
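
    The ratio analysis mentioned above can be illustrated with a minimal sketch that turns paired test and control intensities into gain/loss calls. The authors used locally developed imaging software whose details are not given; the log2 scale, the cutoff values, and the function name call_imbalances below are assumptions chosen only to show the idea.

```python
import math


def call_imbalances(test, control, gain=0.25, loss=-0.25):
    """Label each position from log2(test / control).

    `gain` and `loss` are illustrative cutoffs, not values from the study.
    Positions with a non-positive intensity are reported as 'no_call'.
    """
    calls = []
    for t, c in zip(test, control):
        if t <= 0 or c <= 0:
            calls.append("no_call")
            continue
        log_ratio = math.log2(t / c)
        if log_ratio >= gain:
            calls.append("gain")
        elif log_ratio <= loss:
            calls.append("loss")
        else:
            calls.append("balanced")
    return calls


# Made-up intensities at four positions along one chromosome.
print(call_imbalances([150, 100, 60, 0], [100, 100, 100, 100]))
```

    Equal test and control signal gives a log ratio of zero, so a region is flagged only when one fluorochrome is clearly brighter than the other.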

  6. Sequencing and comparing whole mitochondrial genomes of animals

    SciTech Connect

    Boore, Jeffrey L.; Macey, J. Robert; Medina, Monica

    2005-04-22

    Comparing complete animal mitochondrial genome sequences is becoming increasingly common for phylogenetic reconstruction and as a model for genome evolution. Not only are they much more informative than shorter sequences of individual genes for inferring evolutionary relatedness, but these data also provide sets of genome-level characters, such as the relative arrangements of genes, that can be especially powerful. We describe here the protocols commonly used for physically isolating mtDNA, for amplifying these by PCR or RCA, for cloning, sequencing, assembly, validation, and gene annotation, and for comparing both sequences and gene arrangements. On several topics, we offer general observations based on our experiences to date with determining and comparing complete mtDNA sequences.

  7. Comparative Genomics of Cluster O Mycobacteriophages

    PubMed Central

    Cresawn, Steven G.; Pope, Welkin H.; Jacobs-Sera, Deborah; Bowman, Charles A.; Russell, Daniel A.; Dedrick, Rebekah M.; Adair, Tamarah; Anders, Kirk R.; Ball, Sarah; Bollivar, David; Breitenberger, Caroline; Burnett, Sandra H.; Butela, Kristen; Byrnes, Deanna; Carzo, Sarah; Cornely, Kathleen A.; Cross, Trevor; Daniels, Richard L.; Dunbar, David; Findley, Ann M.; Gissendanner, Chris R.; Golebiewska, Urszula P.; Hartzog, Grant A.; Hatherill, J. Robert; Hughes, Lee E.; Jalloh, Chernoh S.; De Los Santos, Carla; Ekanem, Kevin; Khambule, Sphindile L.; King, Rodney A.; King-Smith, Christina; Klyczek, Karen; Krukonis, Greg P.; Laing, Christian; Lapin, Jonathan S.; Lopez, A. Javier; Mkhwanazi, Sipho M.; Molloy, Sally D.; Moran, Deborah; Munsamy, Vanisha; Pacey, Eddie; Plymale, Ruth; Poxleitner, Marianne; Reyna, Nathan; Schildbach, Joel F.; Stukey, Joseph; Taylor, Sarah E.; Ware, Vassie C.; Wellmann, Amanda L.; Westholm, Daniel; Wodarski, Donna; Zajko, Michelle; Zikalala, Thabiso S.; Hendrix, Roger W.; Hatfull, Graham F.

    2015-01-01

    Mycobacteriophages – viruses of mycobacterial hosts – are genetically diverse but morphologically are all classified in the Caudovirales with double-stranded DNA and tails. We describe here a group of five closely related mycobacteriophages – Corndog, Catdawg, Dylan, Firecracker, and YungJamal – designated as Cluster O with long flexible tails but with unusual prolate capsids. Proteomic analysis of phage Corndog particles, Catdawg particles, and Corndog-infected cells confirms expression of half of the predicted gene products and indicates a non-canonical mechanism for translation of the Corndog tape measure protein. Bioinformatic analysis identifies 8–9 strongly predicted SigA promoters and all five Cluster O genomes contain more than 30 copies of a 17 bp repeat sequence with dyad symmetry located throughout the genomes. Comparison of the Cluster O phages provides insights into phage genome evolution including the processes of gene flux by horizontal genetic exchange. PMID:25742016

  8. Comparative Genomic Analysis of N2-Fixing and Non-N2-Fixing Paenibacillus spp.: Organization, Evolution and Expression of the Nitrogen Fixation Genes

    PubMed Central

    Xie, Jian-Bo; Du, Zhenglin; Bai, Lanqing; Tian, Changfu; Zhang, Yunzhi; Xie, Jiu-Yan; Wang, Tianshu; Liu, Xiaomeng; Chen, Xi; Cheng, Qi; Chen, Sanfeng; Li, Jilun

    2014-01-01

    We provide here a comparative genome analysis of 31 strains within the genus Paenibacillus including 11 new genomic sequences of N2-fixing strains. The heterogeneity of the 31 genomes (15 N2-fixing and 16 non-N2-fixing Paenibacillus strains) was reflected in the large size of the shell genome, which makes up approximately 65.2% of the genes in the pan-genome. Large numbers of transposable elements might be related to the heterogeneity. We discovered that a minimal and compact nif cluster comprising nine genes nifB, nifH, nifD, nifK, nifE, nifN, nifX, hesA and nifV encoding Mo-nitrogenase is conserved in the 15 N2-fixing strains. The nif cluster is under control of a σ70-dependent promoter and possesses a GlnR/TnrA-binding site in the promoter. Suf system encoding [Fe–S] cluster is highly conserved in N2-fixing and non-N2-fixing strains. Furthermore, we demonstrate that the nif cluster enabled Escherichia coli JM109 to fix nitrogen. Phylogeny of the concatenated NifHDK sequences indicates that Paenibacillus and Frankia are sister groups. Phylogeny of the concatenated 275 single-copy core genes suggests that the ancestral Paenibacillus did not fix nitrogen. The N2-fixing Paenibacillus strains were generated by acquiring the nif cluster via horizontal gene transfer (HGT) from a source related to Frankia. During the history of evolution, the nif cluster was lost, producing some non-N2-fixing strains, and vnf encoding V-nitrogenase or anf encoding Fe-nitrogenase was acquired, causing further diversification of some strains. In addition, some N2-fixing strains have additional nif and nif-like genes which may result from gene duplications. The evolution of nitrogen fixation in Paenibacillus involves a mix of gain, loss, HGT and duplication of nif/anf/vnf genes. This study not only reveals the organization and distribution of nitrogen fixation genes in Paenibacillus, but also provides insight into the complex evolutionary history of nitrogen fixation. PMID:24651173

  9. VISTA - computational tools for comparative genomics

    SciTech Connect

    Frazer, Kelly A.; Pachter, Lior; Poliakov, Alexander; Rubin, Edward M.; Dubchak, Inna

    2004-01-01

    Comparison of DNA sequences from different species is a fundamental method for identifying functional elements in genomes. Here we describe the VISTA family of tools created to assist biologists in carrying out this task. Our first VISTA server at http://www-gsd.lbl.gov/VISTA/ was launched in the summer of 2000 and was designed to align long genomic sequences and visualize these alignments with associated functional annotations. Currently the VISTA site includes multiple comparative genomics tools and provides users with rich capabilities to browse pre-computed whole-genome alignments of large vertebrate genomes and other groups of organisms with VISTA Browser, submit their own sequences of interest to several VISTA servers for various types of comparative analysis, and obtain detailed comparative analysis results for a set of cardiovascular genes. We illustrate capabilities of the VISTA site by the analysis of a 180 kilobase (kb) interval on human chromosome 5 that encodes the kinesin family member 3A (KIF3A) protein.

  10. Ebolavirus comparative genomics

    SciTech Connect

    Jun, Se-Ran; Leuze, Michael R.; Nookaew, Intawat; Uberbacher, Edward C.; Land, Miriam; Zhang, Qian; Wanchai, Visanu; Chai, Juanjuan; Nielsen, Morten; Trolle, Thomas; Lund, Ole; Buzard, Gregory S.; Pedersen, Thomas D.; Ussery, David W.

    2015-07-14

    The 2014 Ebola outbreak in West Africa is the largest documented for this virus. We examine the dynamics of this genome, comparing more than one hundred currently available ebolavirus genomes to each other and to other viral genomes. Based on oligomer frequency analysis, the family Filoviridae forms a distinct group from all other sequenced viral genomes. All filovirus genomes sequenced to date encode proteins with similar functions and gene order, although there is considerable divergence in sequences between the three genera Ebolavirus, Cuevavirus, and Marburgvirus within the family Filoviridae. Whereas all ebolavirus genomes are quite similar (multiple sequences of the same strain are often identical), variation is most common in the intergenic regions and within specific areas of the genes encoding the glycoprotein (GP), nucleoprotein (NP), and polymerase (L). We predict regions that could contain epitope-binding sites, which might be good vaccine targets. In conclusion, this information, combined with glycosylation sites and experimentally determined epitopes, can identify the most promising regions for the development of therapeutic strategies.

  11. Comparative and demographic analysis of orangutan genomes

    PubMed Central

    Locke, Devin P.; Hillier, LaDeana W.; Warren, Wesley C.; Worley, Kim C.; Nazareth, Lynne V.; Muzny, Donna M.; Yang, Shiaw-Pyng; Wang, Zhengyuan; Chinwalla, Asif T.; Minx, Pat; Mitreva, Makedonka; Cook, Lisa; Delehaunty, Kim D.; Fronick, Catrina; Schmidt, Heather; Fulton, Lucinda A.; Fulton, Robert S.; Nelson, Joanne O.; Magrini, Vincent; Pohl, Craig; Graves, Tina A.; Markovic, Chris; Cree, Andy; Dinh, Huyen H.; Hume, Jennifer; Kovar, Christie L.; Fowler, Gerald R.; Lunter, Gerton; Meader, Stephen; Heger, Andreas; Ponting, Chris P.; Marques-Bonet, Tomas; Alkan, Can; Chen, Lin; Cheng, Ze; Kidd, Jeffrey M.; Eichler, Evan E.; White, Simon; Searle, Stephen; Vilella, Albert J.; Chen, Yuan; Flicek, Paul; Ma, Jian; Raney, Brian; Suh, Bernard; Burhans, Richard; Herrero, Javier; Haussler, David; Faria, Rui; Fernando, Olga; Darré, Fleur; Farré, Domènec; Gazave, Elodie; Oliva, Meritxell; Navarro, Arcadi; Roberto, Roberta; Capozzi, Oronzo; Archidiacono, Nicoletta; Valle, Giuliano Della; Purgato, Stefania; Rocchi, Mariano; Konkel, Miriam K.; Walker, Jerilyn A.; Ullmer, Brygg; Batzer, Mark A.; Smit, Arian F. A.; Hubley, Robert; Casola, Claudio; Schrider, Daniel R.; Hahn, Matthew W.; Quesada, Victor; Puente, Xose S.; Ordoñez, Gonzalo R.; López-Otín, Carlos; Vinar, Tomas; Brejova, Brona; Ratan, Aakrosh; Harris, Robert S.; Miller, Webb; Kosiol, Carolin; Lawson, Heather A.; Taliwal, Vikas; Martins, André L.; Siepel, Adam; RoyChoudhury, Arindam; Ma, Xin; Degenhardt, Jeremiah; Bustamante, Carlos D.; Gutenkunst, Ryan N.; Mailund, Thomas; Dutheil, Julien Y.; Hobolth, Asger; Schierup, Mikkel H.; Chemnick, Leona; Ryder, Oliver A.; Yoshinaga, Yuko; de Jong, Pieter J.; Weinstock, George M.; Rogers, Jeffrey; Mardis, Elaine R.; Gibbs, Richard A.; Wilson, Richard K.

    2011-01-01

    “Orangutan” is derived from the Malay term “man of the forest” and aptly describes the Southeast Asian great apes native to Sumatra and Borneo. The orangutan species, Pongo abelii (Sumatran) and Pongo pygmaeus (Bornean), are the most phylogenetically distant great apes from humans, thereby providing an informative perspective on hominid evolution. Here we present a Sumatran orangutan draft genome assembly and short read sequence data from five Sumatran and five Bornean orangutan genomes. Our analyses reveal that, compared to other primates, the orangutan genome has many unique features. Structural evolution of the orangutan genome has proceeded much more slowly than other great apes, evidenced by fewer rearrangements, less segmental duplication, a lower rate of gene family turnover and surprisingly quiescent Alu repeats, which have played a major role in restructuring other primate genomes. We also describe the first primate polymorphic neocentromere, found in both Pongo species, emphasizing the gradual evolution of orangutan genome structure. Orangutans have extremely low energy usage for a eutherian mammal, far lower than their hominid relatives. Adding their genome to the repertoire of sequenced primates illuminates new signals of positive selection in several pathways including glycolipid metabolism. From the population perspective, both Pongo species are deeply diverse; however, Sumatran individuals possess greater diversity than their Bornean counterparts, and more species-specific variation. Our estimate of Bornean/Sumatran speciation time, 400k years ago (ya), is more recent than most previous studies and underscores the complexity of the orangutan speciation process. Despite a smaller modern census population size, the Sumatran effective population size (Ne) expanded exponentially relative to the ancestral Ne after the split, while Bornean Ne declined over the same period. Overall, the resources and analyses presented here offer new opportunities

  12. Genome-wide identification and comparative expression analysis of NBS-LRR-encoding genes upon Colletotrichum gloeosporioides infection in two ecotypes of Fragaria vesca.

    PubMed

    Li, Jing; Zhang, Qing-Yu; Gao, Zhi-Hong; Wang, Fei; Duan, Ke; Ye, Zheng-Wen; Gao, Qing-Hua

    2013-09-15

    Anthracnose caused by Colletotrichum spp. is one of the most destructive diseases of cultivated strawberry (Fragaria×ananassa Duchesne) worldwide. The correlation between NBS-LRR genes, the largest class of known resistance genes, and strawberry anthracnose resistance has been elusive. BLAST search in NCBI identified 94 FvNBSs in the diploid genome of strawberry Fragaria vesca, with 67 of the TIR-NBS-LRR type. At least 36 FvNBSs were expressed, with 25% being non-coding genes. Two F. vesca ecotypes, HLJ and YW, showed great variations in both morphological and physiological responses upon C. gloeosporioides infection. qRT-PCR revealed that 5 of the 12 leaf-expressed FvNBSs displayed opposite transcription responses to C. gloeosporioides infection in the two ecotypes. These results showed that the transcriptional responses of several FvNBSs were involved in the ecotype-specific responses to C. gloeosporioides in F. vesca. These FvNBSs hold potential in characterizing molecular components and developing novel markers associated with anthracnose resistance in strawberry.

  13. Comparative genomics of Shiga toxin encoding bacteriophages

    PubMed Central

    2012-01-01

    Background: Stx bacteriophages are responsible for driving the dissemination of Stx toxin genes (stx) across their bacterial host range. Lysogens carrying Stx phages can cause severe, life-threatening disease and Stx toxin is an integral virulence factor. The Stx-bacteriophage vB_EcoP-24B, commonly referred to as Ф24B, is capable of multiply infecting a single bacterial host cell at a high frequency, with secondary infection increasing the rate at which subsequent bacteriophage infections can occur. This is biologically unusual, therefore determining the genomic content and context of Ф24B compared to other lambdoid Stx phages is important to understanding the factors controlling this phenomenon and determining whether they occur in other Stx phages. Results: The genome of the Stx2 encoding phage, Ф24B was sequenced and annotated. The genomic organisation and general features are similar to other sequenced Stx bacteriophages induced from Enterohaemorrhagic Escherichia coli (EHEC), however Ф24B possesses significant regions of heterogeneity, with implications for phage biology and behaviour. The Ф24B genome was compared to other sequenced Stx phages and the archetypal lambdoid phage, lambda, using the Circos genome comparison tool and a PCR-based multi-loci comparison system. Conclusions: The data support the hypothesis that Stx phages are mosaic, and recombination events between the host, phages and their remnants within the same infected bacterial cell will continue to drive the evolution of Stx phage variants and the subsequent dissemination of shigatoxigenic potential. PMID:22799768

  14. Comparative genomics of biotechnologically important yeasts

    USDA-ARS's Scientific Manuscript database

    Ascomycete yeasts are metabolically diverse, with great potential for biotechnology. Here, we report the comparative genome analysis of 29 taxonomically and biotechnologically important yeasts, including 16 newly sequenced. We identify a genetic code change, CUG-Ala, in Pachysolen tannophilus in the...

  15. Novel recombinant papillomavirus genomes expressing selectable genes

    PubMed Central

    Van Doorslaer, Koenraad; Porter, Samuel; McKinney, Caleb; Stepp, Wesley H.; McBride, Alison A.

    2016-01-01

    Papillomaviruses infect and replicate in keratinocytes, but viral proteins are initially expressed at low levels and there is no effective and quantitative method to determine the efficiency of infection on a cell-to-cell basis. Here we describe human papillomavirus (HPV) genomes that express marker proteins (antibiotic resistance genes and Green Fluorescent Protein), and can be used to elucidate early stages in HPV infection of primary keratinocytes. To generate these recombinant genomes, the late region of the oncogenic HPV18 genome was replaced by CpG free marker genes. Insertion of these exogenous genes did not affect early replication, and had only minimal effects on early viral transcription. When introduced into primary keratinocytes, the recombinant marker genomes gave rise to drug-resistant keratinocyte colonies and cell lines, which maintained the extrachromosomal recombinant genome long-term. Furthermore, the HPV18 “marker” genomes could be packaged into viral particles (quasivirions) and used to infect primary human keratinocytes in culture. This resulted in the outgrowth of drug-resistant keratinocyte colonies containing replicating HPV18 genomes. In summary, we describe HPV18 marker genomes that can be used to quantitatively investigate many aspects of the viral life cycle. PMID:27892937

  16. Comparative genomics of chondrichthyan Hoxa clusters.

    PubMed

    Mulley, John F; Zhong, Ying-Fu; Holland, Peter Wh

    2009-09-02

    The chondrichthyan or cartilaginous fish (chimeras, sharks, skates and rays) occupy an important phylogenetic position as the sister group to all other jawed vertebrates and as an early lineage to diverge from the vertebrate lineage following two whole genome duplication events in vertebrate evolution. There have been few comparative genomic analyses incorporating data from chondrichthyan fish and none comparing genomic information from within the group. We have sequenced the complete Hoxa cluster of the Little Skate (Leucoraja erinacea) and compared it to the published Hoxa cluster of the Horn Shark (Heterodontus francisci) and to available data from the Elephant Shark (Callorhinchus milii) genome project. A BAC clone containing the full Little Skate Hoxa cluster was fully sequenced and assembled. Analyses of coding sequences and conserved non-coding elements reveal a strikingly high level of conservation across the cartilaginous fish, with twenty ultraconserved elements (100%, 100 bp) found between Skate and Horn Shark, compared to three between human and marsupials. We have also identified novel potential non-coding RNAs in the Skate BAC clone, some of which are conserved in other species. We find that the Little Skate Hoxa cluster is remarkably similar to the previously published Horn Shark Hoxa cluster with respect to sequence identity, gene size and intergenic distance despite over 180 million years of separation between the two lineages. We suggest that the genomes of cartilaginous fish are more highly conserved than those of tetrapods or teleost fish and so are more likely to have retained ancestral non-coding elements. While useful for isolating homologous DNA, this complicates bioinformatic approaches to identify chondrichthyan-specific non-coding DNA elements.
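
    The ultraconserved-element criterion quoted above can be sketched as a scan for long runs of identical columns in a pairwise alignment. This reads the abstract's thresholds as 100% identity over at least 100 bp, which is an assumption; the function name ultraconserved_runs, the gap character "-", and the toy sequences are likewise illustrative and not taken from the paper.

```python
def ultraconserved_runs(aln_a, aln_b, min_len=100):
    """Return (start, end) alignment coordinates of gap-free, fully identical runs.

    Both inputs are rows of the same pairwise alignment (equal length, same case).
    `end` is exclusive; only runs of at least `min_len` columns are reported.
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have the same length")
    runs = []
    start = None
    for i, (a, b) in enumerate(zip(aln_a, aln_b)):
        identical = a == b and a != "-"
        if identical and start is None:
            start = i
        elif not identical and start is not None:
            if i - start >= min_len:
                runs.append((start, i))
            start = None
    if start is not None and len(aln_a) - start >= min_len:
        runs.append((start, len(aln_a)))
    return runs


# Toy alignment with one mismatch at column 4; a short min_len is used for display.
print(ultraconserved_runs("ACGTACGTAA", "ACGTTCGTAA", min_len=4))
```

    With the default min_len of 100 the same scan would report only stretches at least 100 columns long, matching the assumed reading of the threshold.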

  17. Comparative genomics of chondrichthyan Hoxa clusters

    PubMed Central

    Mulley, John F; Zhong, Ying-Fu; Holland, Peter WH

    2009-01-01

    Background: The chondrichthyan or cartilaginous fish (chimeras, sharks, skates and rays) occupy an important phylogenetic position as the sister group to all other jawed vertebrates and as an early lineage to diverge from the vertebrate lineage following two whole genome duplication events in vertebrate evolution. There have been few comparative genomic analyses incorporating data from chondrichthyan fish and none comparing genomic information from within the group. We have sequenced the complete Hoxa cluster of the Little Skate (Leucoraja erinacea) and compared it to the published Hoxa cluster of the Horn Shark (Heterodontus francisci) and to available data from the Elephant Shark (Callorhinchus milii) genome project. Results: A BAC clone containing the full Little Skate Hoxa cluster was fully sequenced and assembled. Analyses of coding sequences and conserved non-coding elements reveal a strikingly high level of conservation across the cartilaginous fish, with twenty ultraconserved elements (100%, 100 bp) found between Skate and Horn Shark, compared to three between human and marsupials. We have also identified novel potential non-coding RNAs in the Skate BAC clone, some of which are conserved in other species. Conclusion: We find that the Little Skate Hoxa cluster is remarkably similar to the previously published Horn Shark Hoxa cluster with respect to sequence identity, gene size and intergenic distance despite over 180 million years of separation between the two lineages. We suggest that the genomes of cartilaginous fish are more highly conserved than those of tetrapods or teleost fish and so are more likely to have retained ancestral non-coding elements. While useful for isolating homologous DNA, this complicates bioinformatic approaches to identify chondrichthyan-specific non-coding DNA elements. PMID:19725973

  18. Decoding the molecular evolution of human cognition using comparative genomics

    PubMed Central

    Usui, Noriyoshi; Co, Marissa; Konopka, Genevieve

    2014-01-01

    Identification of genetic and molecular factors responsible for the specialized cognitive abilities of humans is expected to provide important insights into the mechanisms responsible for disorders of cognition such as autism, schizophrenia, and Alzheimer’s disease. Here, we discuss the use of comparative genomics for identifying salient genes and gene networks that may underlie cognition. We focus on the comparison of human and non-human primate brain gene expression and the utility of building gene co-expression networks for prioritizing hundreds of genes that differ in expression among the species queried. We also discuss the importance and methods for functional studies of individual genes identified. Together, this integration of comparative genomics with cellular and animal models should provide improved systems for developing effective therapeutics for disorders of cognition. PMID:25247723

  19. Comparative genomics tools applied to bioterrorism defence.

    PubMed

    Slezak, Tom; Kuczmarski, Tom; Ott, Linda; Torres, Clinton; Medeiros, Dan; Smith, Jason; Truitt, Brian; Mulakken, Nisha; Lam, Marisa; Vitalis, Elizabeth; Zemla, Adam; Zhou, Carol Ecale; Gardner, Shea

    2003-06-01

    Rapid advances in the genomic sequencing of bacteria and viruses over the past few years have made it possible to consider sequencing the genomes of all pathogens that affect humans and the crops and livestock upon which our lives depend. Recent events make it imperative that full genome sequencing be accomplished as soon as possible for pathogens that could be used as weapons of mass destruction or disruption. This sequence information must be exploited to provide rapid and accurate diagnostics to identify pathogens and distinguish them from harmless near-neighbours and hoaxes. The Chem-Bio Non-Proliferation (CBNP) programme of the US Department of Energy (DOE) began a large-scale effort of pathogen detection in early 2000 when it was announced that the DOE would be providing bio-security at the 2002 Winter Olympic Games in Salt Lake City, Utah. Our team at the Lawrence Livermore National Lab (LLNL) was given the task of developing reliable and validated assays for a number of the most likely bioterrorist agents. The short timeline led us to devise a novel system that utilised whole-genome comparison methods to rapidly focus on parts of the pathogen genomes that had a high probability of being unique. Assays developed with this approach have been validated by the Centers for Disease Control (CDC). They were used at the 2002 Winter Olympics, have entered the public health system, and have been in continual use for non-publicised aspects of homeland defence since autumn 2001. Assays have been developed for all major threat list agents for which adequate genomic sequence is available, as well as for other pathogens requested by various government agencies. Collaborations with comparative genomics algorithm developers have enabled our LLNL team to make major advances in pathogen detection, since many of the existing tools simply did not scale well enough to be of practical use for this application. It is hoped that a discussion of a real-life practical application of

  20. Whole-genome sequencing for comparative genomics and de novo genome assembly.

    PubMed

    Benjak, Andrej; Sala, Claudia; Hartkoorn, Ruben C

    2015-01-01

    Next-generation sequencing technologies for whole-genome sequencing of mycobacteria are rapidly becoming an attractive alternative to more traditional sequencing methods. In particular this technology is proving useful for genome-wide identification of mutations in mycobacteria (comparative genomics) as well as for de novo assembly of whole genomes. Next-generation sequencing however generates a vast quantity of data that can only be transformed into a usable and comprehensible form using bioinformatics. Here we describe the methodology one would use to prepare libraries for whole-genome sequencing, and the basic bioinformatics to identify mutations in a genome following Illumina HiSeq or MiSeq sequencing, as well as de novo genome assembly following sequencing using Pacific Biosciences (PacBio).

  1. DCODE.ORG Anthology of Comparative Genomic Tools

    SciTech Connect

    Loots, G G; Ovcharenko, I

    2005-01-11

    Comparative genomics provides the means to demarcate functional regions in anonymous DNA sequences. The successful application of this method to identifying novel genes is currently shifting to deciphering the noncoding encryption of gene regulation across genomes. To facilitate the use of comparative genomics to practical applications in genetics and genomics we have developed several analytical and visualization tools for the analysis of arbitrary sequences and whole genomes. These tools include two alignment tools: zPicture and Mulan; a phylogenetic shadowing tool: eShadow for identifying lineage- and species-specific functional elements; two evolutionary conserved transcription factor analysis tools: rVista and multiTF; a tool for extracting cis-regulatory modules governing the expression of co-regulated genes, CREME; and a dynamic portal to multiple vertebrate and invertebrate genome alignments, the ECR Browser. Here we briefly describe each one of these tools and provide specific examples on their practical applications. All the tools are publicly available at the http://www.dcode.org/ web site.

  2. Comparative Analysis of Genome Sequences with VISTA

    DOE Data Explorer

    Dubchak, Inna

    VISTA is a comprehensive suite of programs and databases developed by and hosted at the Genomics Division of Lawrence Berkeley National Laboratory. They provide information and tools designed to facilitate comparative analysis of genomic sequences. Users have two ways to interact with the suite of applications at the VISTA portal. They can submit their own sequences and alignments for analysis (VISTA servers) or examine pre-computed whole-genome alignments of different species. A key menu option is the Enhancer Browser and Database at http://enhancer.lbl.gov/. The VISTA Enhancer Browser is a central resource for experimentally validated human noncoding fragments with gene enhancer activity as assessed in transgenic mice. Most of these noncoding elements were selected for testing based on their extreme conservation with other vertebrates. The results of this enhancer screen are provided through this publicly available website. The browser also features relevant results by external contributors and a large collection of additional genome-wide conserved noncoding elements which are candidate enhancer sequences. The LBL developers invite external groups to submit computational predictions of developmental enhancers. As of 10/19/2009 the database contains information on 1109 in vivo tested elements - 508 elements with enhancer activity.

  3. Comparative genome analysis of Basidiomycete fungi

    SciTech Connect

    Riley, Robert; Salamov, Asaf; Henrissat, Bernard; Nagy, Laszlo; Brown, Daren; Held, Benjamin; Baker, Scott; Blanchette, Robert; Boussau, Bastien; Doty, Sharon L.; Fagnan, Kirsten; Floudas, Dimitris; Levasseur, Anthony; Manning, Gerard; Martin, Francis; Morin, Emmanuelle; Otillar, Robert; Pisabarro, Antonio; Walton, Jonathan; Wolfe, Ken; Hibbett, David; Grigoriev, Igor

    2013-08-07

    Fungi of the phylum Basidiomycota (basidiomycetes) make up some 37 percent of the described fungi, and are important in forestry, agriculture, medicine, and bioenergy. This diverse phylum includes symbionts, pathogens, and saprotrophs including the majority of wood decaying and ectomycorrhizal species. To better understand the genetic diversity of this phylum we compared the genomes of 35 basidiomycetes including 6 newly sequenced genomes. These genomes span extremes of genome size, gene number, and repeat content. Analysis of core genes reveals that some 48 percent of basidiomycete proteins are unique to the phylum with nearly half of those (22 percent) found in only one organism. Correlations between lifestyle and certain gene families are evident. Phylogenetic patterns of plant biomass-degrading genes in Agaricomycotina suggest a continuum rather than a dichotomy between the white rot and brown rot modes of wood decay. Based on phylogenetically-informed PCA analysis of wood decay genes, we predict that Botryobasidium botryosum and Jaapia argillacea have properties similar to white rot species, although neither has typical ligninolytic class II fungal peroxidases (PODs). This prediction is supported by growth assays in which both fungi exhibit wood decay with white rot-like characteristics. Based on this, we suggest that the white/brown rot dichotomy may be inadequate to describe the full range of wood decaying fungi. Analysis of the rate of discovery of proteins with no or few homologs suggests the value of continued sequencing of basidiomycete fungi.

  4. Comparative genomics of biotechnologically important yeasts

    PubMed Central

    Riley, Robert; Haridas, Sajeet; Wolfe, Kenneth H.; Lopes, Mariana R.; Hittinger, Chris Todd; Göker, Markus; Salamov, Asaf A.; Wisecaver, Jennifer H.; Long, Tanya M.; Aerts, Andrea L.; Barry, Kerrie W.; Choi, Cindy; Clum, Alicia; Coughlan, Aisling Y.; Deshpande, Shweta; Douglass, Alexander P.; Hanson, Sara J.; Klenk, Hans-Peter; LaButti, Kurt M.; Lapidus, Alla; Lindquist, Erika A.; Lipzen, Anna M.; Meier-Kolthoff, Jan P.; Ohm, Robin A.; Otillar, Robert P.; Pangilinan, Jasmyn L.; Peng, Yi; Rosa, Carlos A.; Scheuner, Carmen; Sibirny, Andriy A.; Slot, Jason C.; Stielow, J. Benjamin; Sun, Hui; Kurtzman, Cletus P.; Blackwell, Meredith; Grigoriev, Igor V.

    2016-01-01

    Ascomycete yeasts are metabolically diverse, with great potential for biotechnology. Here, we report the comparative genome analysis of 29 taxonomically and biotechnologically important yeasts, including 16 newly sequenced. We identify a genetic code change, CUG-Ala, in Pachysolen tannophilus in the clade sister to the known CUG-Ser clade. Our well-resolved yeast phylogeny shows that some traits, such as methylotrophy, are restricted to single clades, whereas others, such as l-rhamnose utilization, have patchy phylogenetic distributions. Gene clusters, with variable organization and distribution, encode many pathways of interest. Genomics can predict some biochemical traits precisely, but the genomic basis of others, such as xylose utilization, remains unresolved. Our data also provide insight into early evolution of ascomycetes. We document the loss of H3K9me2/3 heterochromatin, the origin of ascomycete mating-type switching, and panascomycete synteny at the MAT locus. These data and analyses will facilitate the engineering of efficient biosynthetic and degradative pathways and gateways for genomic manipulation. PMID:27535936

  5. COMPARISON OF COMPARATIVE GENOMIC HYBRIDIZATIONS TECHNOLOGIES ACROSS MICROARRAY PLATFORMS

    EPA Science Inventory

    Comparative Genomic Hybridization (CGH) measures DNA copy number differences between a reference genome and a test genome. The DNA samples are differentially labeled and hybridized to an immobilized substrate. In early CGH experiments, the DNA targets were hybridized to metaphase...

  7. Genome-wide identification and comparative expression analysis reveal a rapid expansion and functional divergence of duplicated genes in the WRKY gene family of cabbage, Brassica oleracea var. capitata.

    PubMed

    Yao, Qiu-Yang; Xia, En-Hua; Liu, Fei-Hu; Gao, Li-Zhi

    2015-02-15

    WRKY transcription factors (TFs), one of the ten largest TF families in higher plants, play important roles in regulating plant development and resistance. To date, little is known about the WRKY TF family in Brassica oleracea. Recently, the completed genome sequence of cabbage (B. oleracea var. capitata) allows us to systematically analyze WRKY genes in this species. A total of 148 WRKY genes were characterized and classified into seven subgroups that belong to three major groups. Phylogenetic and synteny analyses revealed that the repertoire of cabbage WRKY genes was derived from a common ancestor shared with Arabidopsis thaliana. The B. oleracea WRKY genes were found to be preferentially retained after the whole-genome triplication (WGT) event in its recent ancestor, suggesting that the WGT event had largely contributed to a rapid expansion of the WRKY gene family in B. oleracea. The analysis of RNA-Seq data from various tissues (i.e., roots, stems, leaves, buds, flowers and siliques) revealed that most of the identified WRKY genes were positively expressed in cabbage, and a large portion of them exhibited patterns of differential and tissue-specific expression, demonstrating that these gene members might play essential roles in plant developmental processes. Comparative analysis of the expression level among duplicated genes showed that gene expression divergence was evidently presented among cabbage WRKY paralogs, indicating functional divergence of these duplicated WRKY genes.

  8. Image analysis in comparative genomic hybridization

    SciTech Connect

    Lundsteen, C.; Maahr, J.; Christensen, B.

    1995-01-01

    Comparative genomic hybridization (CGH) is a new technique by which genomic imbalances can be detected by combining in situ suppression hybridization of whole genomic DNA and image analysis. We have developed software for rapid, quantitative CGH image analysis by a modification and extension of the standard software used for routine karyotyping of G-banded metaphase spreads in the Magiscan chromosome analysis system. The DAPI-counterstained metaphase spread is karyotyped interactively. Corrections for image shifts between the DAPI, FITC, and TRITC images are done manually by moving the three images relative to each other. The fluorescence background is subtracted. A mean filter is applied to smooth the FITC and TRITC images before the fluorescence ratio between the individual FITC- and TRITC-stained chromosomes is computed pixel by pixel inside the area of the chromosomes determined by the DAPI boundaries. Fluorescence intensity ratio profiles are generated, and peaks and valleys indicating possible gains and losses of test DNA are marked where the ratio rises above 1.25 or falls below 0.75. By combining the analysis of several metaphase spreads, consistent findings of gains and losses in all or almost all spreads indicate chromosomal imbalance. Chromosomal imbalances are detected either by visual inspection of fluorescence ratio (FR) profiles or by a statistical approach that compares FR measurements of the individual case with measurements of normal chromosomes. The complete analysis of one metaphase can be carried out in approximately 10 minutes. 8 refs., 7 figs., 1 tab.
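
    The ratio-thresholding step described in the abstract above can be illustrated with a minimal sketch. This is not the authors' Magiscan software. It assumes a one-dimensional intensity profile along a chromosome rather than a full image, assumes FITC marks the test DNA and TRITC the reference DNA, and uses hypothetical function names (smooth, ratio_profile, classify). Only the 0.75 and 1.25 thresholds come from the abstract.

```python
import math
from statistics import mean

LOSS_THRESHOLD = 0.75  # ratio below this marks a possible loss (from the abstract)
GAIN_THRESHOLD = 1.25  # ratio above this marks a possible gain (from the abstract)


def smooth(values, window=3):
    """Moving-average (mean) filter; the window size is an assumption."""
    half = window // 2
    return [
        mean(values[max(0, i - half): i + half + 1])
        for i in range(len(values))
    ]


def ratio_profile(fitc, tritc, background=0.0):
    """Subtract background, smooth both channels, then take FITC/TRITC per position."""
    test = smooth([max(v - background, 0.0) for v in fitc])
    ref = smooth([max(v - background, 0.0) for v in tritc])
    # A zero reference signal gives an undefined ratio, represented as NaN.
    return [t / r if r > 0 else math.nan for t, r in zip(test, ref)]


def classify(profile):
    """Label each position as 'gain', 'loss', 'balanced', or 'undefined'."""
    labels = []
    for ratio in profile:
        if math.isnan(ratio):
            labels.append("undefined")
        elif ratio > GAIN_THRESHOLD:
            labels.append("gain")
        elif ratio < LOSS_THRESHOLD:
            labels.append("loss")
        else:
            labels.append("balanced")
    return labels
```

    Calling classify(ratio_profile(fitc, tritc)) on two equal-length intensity lists returns one label per position. In practice the abstract combines such profiles across several metaphase spreads before calling an imbalance, which this sketch does not attempt.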

  9. Comparative genomics and evolution of eukaryotic phospholipid biosynthesis

    SciTech Connect

    Lykidis, Athanasios

    2006-12-01

    Phospholipid biosynthetic enzymes produce diverse molecular structures and are often present in multiple forms encoded by different genes. This work utilizes comparative genomics and phylogenetics for exploring the distribution, structure and evolution of phospholipid biosynthetic genes and pathways in 26 eukaryotic genomes. Although the basic structure of the pathways was formed early in eukaryotic evolution, the emerging picture indicates that individual enzyme families followed unique evolutionary courses. For example, choline and ethanolamine kinases and cytidylyltransferases emerged in ancestral eukaryotes, whereas multiple forms of the corresponding phosphatidyltransferases evolved mainly in a lineage-specific manner. Furthermore, several unicellular eukaryotes maintain bacterial-type enzymes and reactions for the synthesis of phosphatidylglycerol and cardiolipin. Also, base-exchange phosphatidylserine synthases are widespread and ancestral enzymes. The multiplicity of phospholipid biosynthetic enzymes has been largely generated by gene expansion in a lineage-specific manner. Thus, these observations suggest that phospholipid biosynthesis has been an actively evolving system. Finally, comparative genomic analysis indicates the existence of novel phosphatidyltransferases and provides a candidate for the uncharacterized eukaryotic phosphatidylglycerol phosphate phosphatase.

  10. Comparative genome map of human and cattle

    SciTech Connect

    Solinas-Toldo, S.; Fries, R.; Lengauer, C.

    1995-06-10

    Chromosomal homologies between individual human chromosomes and the bovine karyotype have been established by using a new approach termed Zoo-FISH. Labeled DNA libraries from flow-sorted human chromosomes were used as probes for fluorescence in situ hybridization on cattle chromosomes. All human DNA libraries, except the Y chromosome library, hybridized to one or more cattle chromosomes, identifying and delineating 50 segments of homology, most of them corresponding to the regions of homology as identified by the previous mapping of individual conserved loci. However, Zoo-FISH refines the comparative maps constructed by molecular gene mapping of individual loci by providing information on the boundaries of conserved regions in the absence of obvious cytogenetic homologies of human and bovine chromosomes. It allows study of karyotypic evolution and opens new avenues for genomic analysis by facilitating the extrapolation of results from the human genome initiative. 50 refs., 3 figs., 1 tab.

  11. Decoding the molecular evolution of human cognition using comparative genomics.

    PubMed

    Usui, Noriyoshi; Co, Marissa; Konopka, Genevieve

    2014-01-01

    Identification of genetic and molecular factors responsible for the specialized cognitive abilities of humans is expected to provide important insights into the mechanisms responsible for disorders of cognition such as autism, schizophrenia and Alzheimer's disease. Here, we discuss the use of comparative genomics for identifying salient genes and gene networks that may underlie cognition. We focus on the comparison of human and non-human primate brain gene expression and the utility of building gene coexpression networks for prioritizing hundreds of genes that differ in expression among the species queried. We also discuss the importance of and methods for functional studies of the individual genes identified. Together, this integration of comparative genomics with cellular and animal models should provide improved systems for developing effective therapeutics for disorders of cognition. © 2014 S. Karger AG, Basel.

  12. Comparative proteogenomics: Combining mass spectrometry and comparative genomics to analyze multiple genomes

    PubMed Central

    Gupta, Nitin; Benhamida, Jamal; Bhargava, Vipul; Goodman, Daniel; Kain, Elisabeth; Kerman, Ian; Nguyen, Ngan; Ollikainen, Noah; Rodriguez, Jesse; Wang, Jian; Lipton, Mary S.; Romine, Margaret; Bafna, Vineet; Smith, Richard D.; Pevzner, Pavel A.

    2008-01-01

    Recent proliferation of low-cost DNA sequencing techniques will soon lead to an explosive growth in the number of sequenced genomes and will turn manual annotations into a luxury. Mass spectrometry recently emerged as a valuable technique for proteogenomic annotations that improves on the state-of-the-art in predicting genes and other features. However, previous proteogenomic approaches were limited to a single genome and did not take advantage of analyzing mass spectrometry data from multiple genomes at once. We show that such a comparative proteogenomics approach (like comparative genomics) allows one to address the problems that remained beyond the reach of the traditional “single proteome” approach in mass spectrometry. In particular, we show how comparative proteogenomics addresses the notoriously difficult problem of “one-hit-wonders” in proteomics, improves on the existing gene prediction tools in genomics, and allows identification of rare post-translational modifications. We therefore argue that complementing DNA sequencing projects by comparative proteogenomics projects can be a viable approach to improve both genomic and proteomic annotations. PMID:18426904

  13. Comparative proteogenomics: combining mass spectrometry and comparative genomics to analyze multiple genomes.

    PubMed

    Gupta, Nitin; Benhamida, Jamal; Bhargava, Vipul; Goodman, Daniel; Kain, Elisabeth; Kerman, Ian; Nguyen, Ngan; Ollikainen, Noah; Rodriguez, Jesse; Wang, Jian; Lipton, Mary S; Romine, Margaret; Bafna, Vineet; Smith, Richard D; Pevzner, Pavel A

    2008-07-01

    Recent proliferation of low-cost DNA sequencing techniques will soon lead to an explosive growth in the number of sequenced genomes and will turn manual annotations into a luxury. Mass spectrometry recently emerged as a valuable technique for proteogenomic annotations that improves on the state-of-the-art in predicting genes and other features. However, previous proteogenomic approaches were limited to a single genome and did not take advantage of analyzing mass spectrometry data from multiple genomes at once. We show that such a comparative proteogenomics approach (like comparative genomics) allows one to address the problems that remained beyond the reach of the traditional "single proteome" approach in mass spectrometry. In particular, we show how comparative proteogenomics addresses the notoriously difficult problem of "one-hit-wonders" in proteomics, improves on the existing gene prediction tools in genomics, and allows identification of rare post-translational modifications. We therefore argue that complementing DNA sequencing projects by comparative proteogenomics projects can be a viable approach to improve both genomic and proteomic annotations.

  14. Reduction and expansion in microsporidian genome evolution: new insights from comparative genomics.

    PubMed

    Nakjang, Sirintra; Williams, Tom A; Heinz, Eva; Watson, Andrew K; Foster, Peter G; Sendra, Kacper M; Heaps, Sarah E; Hirt, Robert P; Martin Embley, T

    2013-01-01

    Microsporidia are an abundant group of obligate intracellular parasites of other eukaryotes, including immunocompromised humans, but the molecular basis of their intracellular lifestyle and pathobiology are poorly understood. New genomes from a taxonomically broad range of microsporidians, complemented by published expression data, provide an opportunity for comparative analyses to identify conserved and lineage-specific patterns of microsporidian genome evolution that have underpinned this success. In this study, we infer that a dramatic bottleneck in the last common microsporidian ancestor (LCMA) left a small conserved core of genes that was subsequently embellished by gene family expansion driven by gene acquisition in different lineages. Novel expressed protein families represent a substantial fraction of sequenced microsporidian genomes and are significantly enriched for signals consistent with secretion or membrane location. Further evidence of selection is inferred from the gain and reciprocal loss of functional domains between paralogous genes, for example, affecting transport proteins. Gene expansions among transporter families preferentially affect those that are located on the plasma membrane of model organisms, consistent with recruitment to plug conserved gaps in microsporidian biosynthesis and metabolism. Core microsporidian genes shared with other eukaryotes are enriched in orthologs that, in yeast, are highly expressed, highly connected, and often essential, consistent with strong negative selection against further reduction of the conserved gene set since the LCMA. Our study reveals that microsporidian genome evolution is a highly dynamic process that has balanced constraint, reductive evolution, and genome expansion during adaptation to an extraordinarily successful obligate intracellular lifestyle.

  15. Comparative genome sequencing reveals genomic signature of extreme desiccation tolerance in the anhydrobiotic midge

    PubMed Central

    Gusev, Oleg; Suetsugu, Yoshitaka; Cornette, Richard; Kawashima, Takeshi; Logacheva, Maria D.; Kondrashov, Alexey S.; Penin, Aleksey A.; Hatanaka, Rie; Kikuta, Shingo; Shimura, Sachiko; Kanamori, Hiroyuki; Katayose, Yuichi; Matsumoto, Takashi; Shagimardanova, Elena; Alexeev, Dmitry; Govorun, Vadim; Wisecaver, Jennifer; Mikheyev, Alexander; Koyanagi, Ryo; Fujie, Manabu; Nishiyama, Tomoaki; Shigenobu, Shuji; Shibata, Tomoko F.; Golygina, Veronika; Hasebe, Mitsuyasu; Okuda, Takashi; Satoh, Nori; Kikawada, Takahiro

    2014-01-01

    Anhydrobiosis represents an extreme example of tolerance adaptation to water loss, where an organism can survive in an ametabolic state until water returns. Here we report the first comparative analysis examining the genomic background of extreme desiccation tolerance, which is exclusively found in larvae of the only anhydrobiotic insect, Polypedilum vanderplanki. We compare the genomes of P. vanderplanki and a congeneric desiccation-sensitive midge P. nubifer. We determine that the genome of the anhydrobiotic species specifically contains clusters of multi-copy genes with products that act as molecular shields. In addition, the genome possesses several groups of genes with high similarity to known protective proteins. However, these genes are located in distinct paralogous clusters in the genome apart from the classical orthologues of the corresponding genes shared by both chironomids and other insects. The transcripts of these clustered paralogues contribute to a large majority of the mRNA pool in the desiccating larvae and most likely define successful anhydrobiosis. Comparison of expression patterns of orthologues between two chironomid species provides evidence for the existence of desiccation-specific gene expression systems in P. vanderplanki. PMID:25216354

  16. Comparative omics-driven genome annotation refinement: application across Yersiniae.

    PubMed

    Schrimpe-Rutledge, Alexandra C; Jones, Marcus B; Chauhan, Sadhana; Purvine, Samuel O; Sanford, James A; Monroe, Matthew E; Brewer, Heather M; Payne, Samuel H; Ansong, Charles; Frank, Bryan C; Smith, Richard D; Peterson, Scott N; Motin, Vladimir L; Adkins, Joshua N

    2012-01-01

    Genome sequencing continues to be a rapidly evolving technology, yet most downstream aspects of genome annotation pipelines remain relatively stable or are even being abandoned. The annotation process is now performed almost exclusively in an automated fashion to balance the large number of sequences generated. One possible way of reducing errors inherent to automated computational annotations is to apply data from omics measurements (i.e. transcriptional and proteomic) to the un-annotated genome with a proteogenomic-based approach. Here, the concept of annotation refinement has been extended to include a comparative assessment of genomes across closely related species. Transcriptomic and proteomic data derived from highly similar pathogenic Yersiniae (Y. pestis CO92, Y. pestis Pestoides F, and Y. pseudotuberculosis PB1/+) was used to demonstrate a comprehensive comparative omic-based annotation methodology. Peptide and oligo measurements experimentally validated the expression of nearly 40% of each strain's predicted proteome and revealed the identification of 28 novel and 68 incorrect (i.e., observed frameshifts, extended start sites, and translated pseudogenes) protein-coding sequences within the three current genome annotations. Gene loss is presumed to play a major role in Y. pestis acquiring its niche as a virulent pathogen, thus the discovery of many translated pseudogenes, including the insertion-ablated argD, underscores a need for functional analyses to investigate hypotheses related to divergence. Refinements included the discovery of a seemingly essential ribosomal protein, several virulence-associated factors, a transcriptional regulator, and many hypothetical proteins that were missed during annotation.

  17. Floral gene resources from basal angiosperms for comparative genomics research.

    PubMed

    Albert, Victor A; Soltis, Douglas E; Carlson, John E; Farmerie, William G; Wall, P Kerr; Ilut, Daniel C; Solow, Teri M; Mueller, Lukas A; Landherr, Lena L; Hu, Yi; Buzgo, Matyas; Kim, Sangtae; Yoo, Mi-Jeong; Frohlich, Michael W; Perl-Treves, Rafael; Schlarbaum, Scott E; Bliss, Barbara J; Zhang, Xiaohong; Tanksley, Steven D; Oppenheimer, David G; Soltis, Pamela S; Ma, Hong; DePamphilis, Claude W; Leebens-Mack, James H

    2005-03-30

    The Floral Genome Project was initiated to bridge the genomic gap between the most broadly studied plant model systems. Arabidopsis and rice, although now completely sequenced and under intensive comparative genomic investigation, are separated by at least 125 million years of evolutionary time, and cannot in isolation provide a comprehensive perspective on structural and functional aspects of flowering plant genome dynamics. Here we discuss new genomic resources available to the scientific community, comprising cDNA libraries and Expressed Sequence Tag (EST) sequences for a suite of phylogenetically basal angiosperms specifically selected to bridge the evolutionary gaps between model plants and provide insights into gene content and genome structure in the earliest flowering plants. Random sequencing of cDNAs from representatives of phylogenetically important eudicot, non-grass monocot, and gymnosperm lineages has so far (as of 12/1/04) generated 70,514 ESTs and 48,170 assembled unigenes. Efficient sorting of EST sequences into putative gene families based on whole Arabidopsis/rice proteome comparison has permitted ready identification of cDNA clones for finished sequencing. Preliminarily, (i) proportions of functional categories among sequenced floral genes seem representative of the entire Arabidopsis transcriptome, (ii) many known floral gene homologues have been captured, and (iii) phylogenetic analyses of ESTs are providing new insights into the process of gene family evolution in relation to the origin and diversification of the angiosperms. Initial comparisons illustrate the utility of the EST data sets toward discovery of the basic floral transcriptome. These first findings also afford the opportunity to address a number of conspicuous evolutionary genomic questions, including reproductive organ transcriptome overlap between angiosperms and gymnosperms, genome-wide duplication history, lineage-specific gene duplication and functional divergence, and

  18. Comparative genomics approaches to study organism similarities and differences

    SciTech Connect

    Wei, Liping; Liu, Yueyi; Dubchak, Inna; Shon, John; Park, John

    2002-06-01

    Comparative genomics is a large-scale, holistic approach that compares two or more genomes to discover the similarities and differences between the genomes and to study the biology of the individual genomes. Comparative studies can be performed at different levels of the genomes to obtain multiple perspectives about the organisms. We discuss in detail the type of analyses that offer significant biological insights in the comparisons of (1) genome structure including overall genome statistics, repeats, genome rearrangement at both DNA and gene level, synteny, and breakpoints; (2) coding regions including gene content, protein content, orthologs, and paralogs; and (3) noncoding regions including the prediction of regulatory elements. We also briefly review the currently available computational tools in comparative genomics such as algorithms for genome-scale sequence alignment, gene identification, and nonhomology-based function prediction.

  19. [Sotos syndrome diagnosed by comparative genomic hybridisation].

    PubMed

    Saldarriaga, Wilmar; Molina-Barrera, Laura Camila; Ramírez-Cheyne, Julián

    2016-01-01

    Sotos Syndrome (SS) is a genetic disease with an autosomal dominant pattern caused by haploinsufficiency of the NSD1 gene secondary to point mutations or microdeletion of the 5q35 locus where the gene is located. It is a rare syndrome, occurring in 7 out of every 100,000 births. The objective of this report is to present the case of a 4-year-old patient with a global developmental delay, as well as specific physical findings suggesting a syndrome of genetic origin. The patient was a 4-year-old girl with thinning hair, triangular facies, long palpebral fissure, arched palate, prominent jaw, winged scapula and clinodactyly of the fifth finger of both hands. A microarray-based comparative genomic hybridisation test was subsequently performed, which showed a microdeletion of 2,082 MB in the 5q35.2 q35.3 region, including the NSD1 gene. Finally, this article also proposes performing comparative genomic hybridisation as the first diagnostic option in cases where clinical findings are suggestive of SS. Copyright © 2015 Sociedad Chilena de Pediatría. Publicado por Elsevier España, S.L.U. All rights reserved.

  20. Dissecting Plant Genomes with the PLAZA Comparative Genomics Platform

    PubMed Central

    Van Bel, Michiel; Proost, Sebastian; Wischnitzki, Elisabeth; Movahedi, Sara; Scheerlinck, Christopher; Van de Peer, Yves; Vandepoele, Klaas

    2012-01-01

    With the arrival of low-cost, next-generation sequencing, a multitude of new plant genomes are being publicly released, providing unseen opportunities and challenges for comparative genomics studies. Here, we present PLAZA 2.5, a user-friendly online research environment to explore genomic information from different plants. This new release features updates to previous genome annotations and a substantial number of newly available plant genomes as well as various new interactive tools and visualizations. Currently, PLAZA hosts 25 organisms covering a broad taxonomic range, including 13 eudicots, five monocots, one lycopod, one moss, and five algae. The available data consist of structural and functional gene annotations, homologous gene families, multiple sequence alignments, phylogenetic trees, and colinear regions within and between species. A new Integrative Orthology Viewer, combining information from different orthology prediction methodologies, was developed to efficiently investigate complex orthology relationships. Cross-species expression analysis revealed that the integration of complementary data types extended the scope of complex orthology relationships, especially between more distantly related species. Finally, based on phylogenetic profiling, we propose a set of core gene families within the green plant lineage that will be instrumental to assess the gene space of draft or newly sequenced plant genomes during the assembly or annotation phase. PMID:22198273

  1. Regulation of cytochrome P450 expression in Drosophila: Genomic insights

    PubMed Central

    Giraudo, Maeva; Unnithan, G. Chandran; Le Goff, Gaëlle; Feyereisen, René

    2009-01-01

    Genomic tools such as the availability of the Drosophila genome sequence, the relative ease of stable transformation, and DNA microarrays have made the fruit fly a powerful model in insecticide toxicology research. We have used transgenic promoter-GFP constructs to document the detailed pattern of induced Cyp6a2 gene expression in larval and adult Drosophila tissues. We also compared various insecticides and xenobiotics for their ability to induce this cytochrome P450 gene, and show that the pattern of Cyp6a2 inducibility is comparable to that of vertebrate CYP2B genes, and different from that of vertebrate CYP1A genes, suggesting a degree of evolutionary conservation for the “phenobarbital-type” induction mechanism. Our results are compared to the increasingly diverse reports on P450 induction that can be gleaned from whole genome or from “detox” microarray experiments in Drosophila. These suggest that only a third of the genomic repertoire of CYP genes is inducible by xenobiotics, and that there are distinct subsets of inducers / induced genes, suggesting multiple xenobiotic transduction mechanisms. A relationship between induction and resistance is not supported by expression data from the literature. The relative abundance of expression data now available is in contrast to the paucity of studies on functional expression of P450 enzymes, and this remains a challenge for our understanding of the toxicokinetic aspects of insecticide action. PMID:20582327

  2. Comparative genomic analysis of prion genes

    PubMed Central

    Premzl, Marko; Gamulin, Vera

    2007-01-01

    Background: The homologues of human disease genes are expected to contribute to better understanding of physiological and pathogenic processes. We made use of the present availability of vertebrate genomic sequences, and we have conducted the most comprehensive comparative genomic analysis so far of the prion protein gene PRNP and its homologues, shadow of prion protein gene SPRN and doppel gene PRND, and prion testis-specific gene PRNT. Results: While the SPRN and PRNP homologues are present in all vertebrates, PRND is known in tetrapods, and PRNT is present in primates. PRNT could be viewed as a TE-associated gene. Using human as the base sequence for genomic sequence comparisons (VISTA), we annotated numerous potential cis-elements. The conserved regions in SPRNs harbour the potential Sp1 sites in promoters (mammals, birds), C-rich intron splicing enhancers and PTB intron splicing silencers in introns (mammals, birds), and hsa-miR-34a sites in 3'-UTRs (eutherians). We showed the conserved PRNP upstream regions, which may be potential enhancers or silencers (primates, dog). In the PRNP 3'-UTRs, there are conserved cytoplasmic polyadenylation element sites (mammals, birds). The PRND core promoters include highly conserved CCAAT, CArG and TATA boxes (mammals). We deduced 42 new protein primary structures, and performed the first phylogenetic analysis of all vertebrate prion genes. Using the protein alignment which included 122 sequences, we constructed the neighbour-joining tree which showed four major clusters, including shadoos, shadoo2s and prion protein-likes (cluster 1), fish prion proteins (cluster 2), tetrapod prion proteins (cluster 3) and doppels (cluster 4). We showed that the entire prion protein conformationally plastic region is well conserved between eutherian prion proteins and shadoos (18–25% identity and 28–34% similarity), and there could be a potential structural compatibility between shadoos and the left-handed parallel beta-helical fold

  3. Comparative Genome Analysis in the Integrated Microbial Genomes (IMG) System

    SciTech Connect

    Kyrpides, Nikos C.; Markowitz, Victor M.

    2006-03-01

    Comparative genome analysis is critical for the effective exploration of a rapidly growing number of complete and draft sequences for microbial genomes. The Integrated Microbial Genomes (IMG) system (img.jgi.doe.gov) has been developed as a community resource that provides support for comparative analysis of microbial genomes in an integrated context. IMG allows users to navigate the multidimensional microbial genome data space and focus their analysis on a subset of genes, genomes, and functions of interest. IMG provides graphical viewers, summaries and occurrence profile tools for comparing genes, pathways and functions (terms) across specific genomes. Genes can be further examined using gene neighborhoods and compared with sequence alignment tools.

  4. Comparative genomic analysis of sixty mycobacteriophage genomes: Genome clustering, gene acquisition and gene size

    PubMed Central

    Hatfull, Graham F.; Jacobs-Sera, Deborah; Lawrence, Jeffrey G.; Pope, Welkin H.; Russell, Daniel A.; Ko, Ching-Chung; Weber, Rebecca J.; Patel, Manisha C.; Germane, Katherine L.; Edgar, Robert H.; Hoyte, Natasha N.; Bowman, Charles A.; Tantoco, Anthony T.; Paladin, Elizabeth C.; Myers, Marlana S.; Smith, Alexis L.; Grace, Molly S.; Pham, Thuy T.; O'Brien, Matthew B.; Vogelsberger, Amy M.; Hryckowian, Andrew J.; Wynalek, Jessica L.; Donis-Keller, Helen; Bogel, Matt W.; Peebles, Craig L.; Cresawn, Steve G.; Hendrix, Roger W.

    2010-01-01

    Mycobacteriophages are viruses that infect mycobacterial hosts. Expansion of a collection of sequenced phage genomes to a total of sixty – all infecting a common bacterial host – provides further insight into their diversity and evolution. Of the sixty phage genomes, 55 can be grouped into nine clusters according to their nucleotide sequence similarities, five of which can be further divided into subclusters; five genomes do not cluster with other phages. The sequence diversity between genomes within a cluster varies greatly; for example, the six genomes in cluster D share more than 97.5% average nucleotide similarity with each other. In contrast, similarity between the two genomes in Cluster I is barely detectable by diagonal plot analysis. The total of 6,858 predicted ORFs have been grouped into 1523 phamilies (phams) of related sequences, 46% of which possess only a single member. Only 18.8% of the phams have sequence similarity to non-mycobacteriophage database entries and fewer than 10% of all phams can be assigned functions based on database searching or synteny. Genome clustering facilitates the identification of genes that are in greatest genetic flux and are more likely to have been exchanged horizontally in relatively recent evolutionary time. Although mycobacteriophage genes exhibit smaller average size than genes of their host (205 residues compared to 315), phage genes in higher flux average only ∼100 amino acids, suggesting that the primary units of genetic exchange correspond to single protein domains. PMID:20064525

  5. Genomic signatures of germline gene expression.

    PubMed

    McVicker, Graham; Green, Phil

    2010-11-01

    Transcribed regions in the human genome differ from adjacent intergenic regions in transposable element density, crossover rates, and asymmetric substitution and sequence composition patterns. We tested whether these differences reflect selection or are instead a byproduct of germline transcription, using publicly available gene expression data from a variety of germline and somatic tissues. Crossover rate shows a strong negative correlation with gene expression in meiotic tissues, suggesting that crossover is inhibited by transcription. Strand-biased composition (G+T content) and A → G versus T → C substitution asymmetry are both positively correlated with germline gene expression. We find no evidence for a strand bias in allele frequency data, implying that the substitution asymmetry reflects a mutation rather than a fixation bias. The density of transposable elements is positively correlated with germline expression, suggesting that such elements preferentially insert into regions that are actively transcribed. For each of the features examined, our analyses favor a nonselective explanation for the observed trends and point to the role of germline gene expression in shaping the mammalian genome.

  6. An evaluation of Comparative Genome Sequencing (CGS) by comparing two previously-sequenced bacterial genomes

    PubMed Central

    Herring, Christopher D; Palsson, Bernhard Ø

    2007-01-01

    Background: With the development of new technology, it has recently become practical to resequence the genome of a bacterium after experimental manipulation. It is critical though to know the accuracy of the technique used, and to establish confidence that all of the mutations were detected. Results: In order to evaluate the accuracy of genome resequencing using the microarray-based Comparative Genome Sequencing service provided by Nimblegen Systems Inc., we resequenced the E. coli strain W3110 Kohara using MG1655 as a reference, both of which have been completely sequenced using traditional sequencing methods. CGS detected 7 of 8 small sequence differences, one large deletion, and 9 of 12 IS element insertions present in W3110, but did not detect a large chromosomal inversion. In addition, we confirmed that CGS also detected 2 SNPs, one deletion and 7 IS element insertions that are not present in the genome sequence, which we attribute to changes that occurred after the creation of the W3110 lambda clone library. The false positive rate for SNPs was one per 244 Kb of genome sequence. Conclusion: CGS is an effective way to detect multiple mutations present in one bacterium relative to another, and while highly cost-effective, is prone to certain errors. Mutations occurring in repeated sequences or in sequences with a high degree of secondary structure may go undetected. It is also critical to follow up on regions of interest in which SNPs were not called because they often indicate deletions or IS element insertions. PMID:17697331

  7. Comparative genomics reveals insights into avian genome evolution and adaptation.

    PubMed

    Zhang, Guojie; Li, Cai; Li, Qiye; Li, Bo; Larkin, Denis M; Lee, Chul; Storz, Jay F; Antunes, Agostinho; Greenwold, Matthew J; Meredith, Robert W; Ödeen, Anders; Cui, Jie; Zhou, Qi; Xu, Luohao; Pan, Hailin; Wang, Zongji; Jin, Lijun; Zhang, Pei; Hu, Haofu; Yang, Wei; Hu, Jiang; Xiao, Jin; Yang, Zhikai; Liu, Yang; Xie, Qiaolin; Yu, Hao; Lian, Jinmin; Wen, Ping; Zhang, Fang; Li, Hui; Zeng, Yongli; Xiong, Zijun; Liu, Shiping; Zhou, Long; Huang, Zhiyong; An, Na; Wang, Jie; Zheng, Qiumei; Xiong, Yingqi; Wang, Guangbiao; Wang, Bo; Wang, Jingjing; Fan, Yu; da Fonseca, Rute R; Alfaro-Núñez, Alonzo; Schubert, Mikkel; Orlando, Ludovic; Mourier, Tobias; Howard, Jason T; Ganapathy, Ganeshkumar; Pfenning, Andreas; Whitney, Osceola; Rivas, Miriam V; Hara, Erina; Smith, Julia; Farré, Marta; Narayan, Jitendra; Slavov, Gancho; Romanov, Michael N; Borges, Rui; Machado, João Paulo; Khan, Imran; Springer, Mark S; Gatesy, John; Hoffmann, Federico G; Opazo, Juan C; Håstad, Olle; Sawyer, Roger H; Kim, Heebal; Kim, Kyu-Won; Kim, Hyeon Jeong; Cho, Seoae; Li, Ning; Huang, Yinhua; Bruford, Michael W; Zhan, Xiangjiang; Dixon, Andrew; Bertelsen, Mads F; Derryberry, Elizabeth; Warren, Wesley; Wilson, Richard K; Li, Shengbin; Ray, David A; Green, Richard E; O'Brien, Stephen J; Griffin, Darren; Johnson, Warren E; Haussler, David; Ryder, Oliver A; Willerslev, Eske; Graves, Gary R; Alström, Per; Fjeldså, Jon; Mindell, David P; Edwards, Scott V; Braun, Edward L; Rahbek, Carsten; Burt, David W; Houde, Peter; Zhang, Yong; Yang, Huanming; Wang, Jian; Jarvis, Erich D; Gilbert, M Thomas P; Wang, Jun

    2014-12-12

    Birds are the most species-rich class of tetrapod vertebrates and have wide relevance across many research fields. We explored bird macroevolution using full genomes from 48 avian species representing all major extant clades. The avian genome is principally characterized by its constrained size, which predominantly arose because of lineage-specific erosion of repetitive elements, large segmental deletions, and gene loss. Avian genomes furthermore show a remarkably high degree of evolutionary stasis at the levels of nucleotide sequence, gene synteny, and chromosomal structure. Despite this pattern of conservation, we detected many non-neutral evolutionary changes in protein-coding genes and noncoding regions. These analyses reveal that pan-avian genomic diversity covaries with adaptations to different lifestyles and convergent evolution of traits. Copyright © 2014, American Association for the Advancement of Science.

  8. Comparative genomics reveals insights into avian genome evolution and adaptation

    PubMed Central

    Zhang, Guojie; Li, Cai; Li, Qiye; Li, Bo; Larkin, Denis M.; Lee, Chul; Storz, Jay F.; Antunes, Agostinho; Greenwold, Matthew J.; Meredith, Robert W.; Ödeen, Anders; Cui, Jie; Zhou, Qi; Xu, Luohao; Pan, Hailin; Wang, Zongji; Jin, Lijun; Zhang, Pei; Hu, Haofu; Yang, Wei; Hu, Jiang; Xiao, Jin; Yang, Zhikai; Liu, Yang; Xie, Qiaolin; Yu, Hao; Lian, Jinmin; Wen, Ping; Zhang, Fang; Li, Hui; Zeng, Yongli; Xiong, Zijun; Liu, Shiping; Zhou, Long; Huang, Zhiyong; An, Na; Wang, Jie; Zheng, Qiumei; Xiong, Yingqi; Wang, Guangbiao; Wang, Bo; Wang, Jingjing; Fan, Yu; da Fonseca, Rute R.; Alfaro-Núñez, Alonzo; Schubert, Mikkel; Orlando, Ludovic; Mourier, Tobias; Howard, Jason T.; Ganapathy, Ganeshkumar; Pfenning, Andreas; Whitney, Osceola; Rivas, Miriam V.; Hara, Erina; Smith, Julia; Farré, Marta; Narayan, Jitendra; Slavov, Gancho; Romanov, Michael N; Borges, Rui; Machado, João Paulo; Khan, Imran; Springer, Mark S.; Gatesy, John; Hoffmann, Federico G.; Opazo, Juan C.; Håstad, Olle; Sawyer, Roger H.; Kim, Heebal; Kim, Kyu-Won; Kim, Hyeon Jeong; Cho, Seoae; Li, Ning; Huang, Yinhua; Bruford, Michael W.; Zhan, Xiangjiang; Dixon, Andrew; Bertelsen, Mads F.; Derryberry, Elizabeth; Warren, Wesley; Wilson, Richard K; Li, Shengbin; Ray, David A.; Green, Richard E.; O’Brien, Stephen J.; Griffin, Darren; Johnson, Warren E.; Haussler, David; Ryder, Oliver A.; Willerslev, Eske; Graves, Gary R.; Alström, Per; Fjeldså, Jon; Mindell, David P.; Edwards, Scott V.; Braun, Edward L.; Rahbek, Carsten; Burt, David W.; Houde, Peter; Zhang, Yong; Yang, Huanming; Wang, Jian; Jarvis, Erich D.; Gilbert, M. Thomas P.; Wang, Jun

    2015-01-01

    Birds are the most species-rich class of tetrapod vertebrates and have wide relevance across many research fields. We explored bird macroevolution using full genomes from 48 avian species representing all major extant clades. The avian genome is principally characterized by its constrained size, which predominantly arose because of lineage-specific erosion of repetitive elements, large segmental deletions, and gene loss. Avian genomes furthermore show a remarkably high degree of evolutionary stasis at the levels of nucleotide sequence, gene synteny, and chromosomal structure. Despite this pattern of conservation, we detected many non-neutral evolutionary changes in protein-coding genes and noncoding regions. These analyses reveal that pan-avian genomic diversity covaries with adaptations to different lifestyles and convergent evolution of traits. PMID:25504712

  9. The bonobo genome compared with the chimpanzee and human genomes

    PubMed Central

    Prüfer, Kay; Munch, Kasper; Hellmann, Ines; Akagi, Keiko; Miller, Jason R.; Walenz, Brian; Koren, Sergey; Sutton, Granger; Kodira, Chinnappa; Winer, Roger; Knight, James R.; Mullikin, James C.; Meader, Stephen J.; Ponting, Chris P.; Lunter, Gerton; Higashino, Saneyuki; Hobolth, Asger; Dutheil, Julien; Karakoç, Emre; Alkan, Can; Sajjadian, Saba; Catacchio, Claudia Rita; Ventura, Mario; Marques-Bonet, Tomas; Eichler, Evan E.; André, Claudine; Atencia, Rebeca; Mugisha, Lawrence; Junhold, Jörg; Patterson, Nick; Siebauer, Michael; Good, Jeffrey M.; Fischer, Anne; Ptak, Susan E.; Lachmann, Michael; Symer, David E.; Mailund, Thomas; Schierup, Mikkel H.; Andrés, Aida M.; Kelso, Janet; Pääbo, Svante

    2012-01-01

    Two African apes are the closest living relatives of humans: the chimpanzee (Pan troglodytes) and the bonobo (Pan paniscus). Although they are similar in many respects, bonobos and chimpanzees differ strikingly in key social and sexual behaviours, and for some of these traits they show more similarity with humans than with each other. Here we report the sequencing and assembly of the bonobo genome to study its evolutionary relationship with the chimpanzee and human genomes. We find that more than three per cent of the human genome is more closely related to either the bonobo or the chimpanzee genome than these are to each other. These regions allow various aspects of the ancestry of the two ape species to be reconstructed. In addition, many of the regions that overlap genes may eventually help us understand the genetic basis of phenotypes that humans share with one of the two apes to the exclusion of the other. PMID:22722832

  10. The bonobo genome compared with the chimpanzee and human genomes.

    PubMed

    Prüfer, Kay; Munch, Kasper; Hellmann, Ines; Akagi, Keiko; Miller, Jason R; Walenz, Brian; Koren, Sergey; Sutton, Granger; Kodira, Chinnappa; Winer, Roger; Knight, James R; Mullikin, James C; Meader, Stephen J; Ponting, Chris P; Lunter, Gerton; Higashino, Saneyuki; Hobolth, Asger; Dutheil, Julien; Karakoç, Emre; Alkan, Can; Sajjadian, Saba; Catacchio, Claudia Rita; Ventura, Mario; Marques-Bonet, Tomas; Eichler, Evan E; André, Claudine; Atencia, Rebeca; Mugisha, Lawrence; Junhold, Jörg; Patterson, Nick; Siebauer, Michael; Good, Jeffrey M; Fischer, Anne; Ptak, Susan E; Lachmann, Michael; Symer, David E; Mailund, Thomas; Schierup, Mikkel H; Andrés, Aida M; Kelso, Janet; Pääbo, Svante

    2012-06-28

    Two African apes are the closest living relatives of humans: the chimpanzee (Pan troglodytes) and the bonobo (Pan paniscus). Although they are similar in many respects, bonobos and chimpanzees differ strikingly in key social and sexual behaviours, and for some of these traits they show more similarity with humans than with each other. Here we report the sequencing and assembly of the bonobo genome to study its evolutionary relationship with the chimpanzee and human genomes. We find that more than three per cent of the human genome is more closely related to either the bonobo or the chimpanzee genome than these are to each other. These regions allow various aspects of the ancestry of the two ape species to be reconstructed. In addition, many of the regions that overlap genes may eventually help us understand the genetic basis of phenotypes that humans share with one of the two apes to the exclusion of the other.

  11. Bamboo Flowering from the Perspective of Comparative Genomics and Transcriptomics

    PubMed Central

    Biswas, Prasun; Chakraborty, Sukanya; Dutta, Smritikana; Pal, Amita; Das, Malay

    2016-01-01

    Bamboos are an important member of the subfamily Bambusoideae, family Poaceae. The plant group exhibits wide variation with respect to the timing (1–120 years) and nature (sporadic vs. gregarious) of flowering among species. Usually flowering in woody bamboos is synchronous across culms growing over a large area, known as gregarious flowering. In many monocarpic bamboos this is followed by mass death and seed setting, whereas in sporadic flowering an isolated wild clump may flower, set little or no seed, and remain alive. Such wide variation in flowering time and extent means that the plant group serves as repositories for genes and expression patterns that are unique to bamboo. Due to the dearth of available genomic and transcriptomic resources, limited studies have been undertaken to identify the potential molecular players in bamboo flowering. The public release of the first bamboo genome sequence, Phyllostachys heterocycla, and the availability of the related genomes Brachypodium distachyon and Oryza sativa provide us the opportunity to study this long-standing biological problem in a comparative and functional genomics framework. We identified bamboo genes homologous to those of Oryza and Brachypodium that are involved in established pathways such as vernalization, photoperiod, autonomous, and hormonal regulation of flowering. Additionally, we investigated triggers like stress (drought), physiological maturity and micro RNAs that may play crucial roles in flowering. We also analyzed available transcriptome datasets of different bamboo species to identify genes and their involvement in bamboo flowering. Finally, we summarize potential research hurdles that need to be addressed in future research. PMID:28018419

  12. Bamboo Flowering from the Perspective of Comparative Genomics and Transcriptomics.

    PubMed

    Biswas, Prasun; Chakraborty, Sukanya; Dutta, Smritikana; Pal, Amita; Das, Malay

    2016-01-01

    Bamboos are an important member of the subfamily Bambusoideae, family Poaceae. The plant group exhibits wide variation with respect to the timing (1-120 years) and nature (sporadic vs. gregarious) of flowering among species. Usually flowering in woody bamboos is synchronous across culms growing over a large area, known as gregarious flowering. In many monocarpic bamboos this is followed by mass death and seed setting, whereas in sporadic flowering an isolated wild clump may flower, set little or no seed, and remain alive. Such wide variation in flowering time and extent means that the plant group serves as repositories for genes and expression patterns that are unique to bamboo. Due to the dearth of available genomic and transcriptomic resources, limited studies have been undertaken to identify the potential molecular players in bamboo flowering. The public release of the first bamboo genome sequence, Phyllostachys heterocycla, and the availability of the related genomes Brachypodium distachyon and Oryza sativa provide us the opportunity to study this long-standing biological problem in a comparative and functional genomics framework. We identified bamboo genes homologous to those of Oryza and Brachypodium that are involved in established pathways such as vernalization, photoperiod, autonomous, and hormonal regulation of flowering. Additionally, we investigated triggers like stress (drought), physiological maturity and micro RNAs that may play crucial roles in flowering. We also analyzed available transcriptome datasets of different bamboo species to identify genes and their involvement in bamboo flowering. Finally, we summarize potential research hurdles that need to be addressed in future research.

  13. A New System for Comparative Functional Genomics of Saccharomyces Yeasts

    PubMed Central

    Caudy, Amy A.; Guan, Yuanfang; Jia, Yue; Hansen, Christina; DeSevo, Chris; Hayes, Alicia P.; Agee, Joy; Alvarez-Dominguez, Juan R.; Arellano, Hugo; Barrett, Daniel; Bauerle, Cynthia; Bisaria, Namita; Bradley, Patrick H.; Breunig, J. Scott; Bush, Erin; Cappel, David; Capra, Emily; Chen, Walter; Clore, John; Combs, Peter A.; Doucette, Christopher; Demuren, Olukunle; Fellowes, Peter; Freeman, Sam; Frenkel, Evgeni; Gadala-Maria, Daniel; Gawande, Richa; Glass, David; Grossberg, Samuel; Gupta, Anita; Hammonds-Odie, Latanya; Hoisos, Aaron; Hsi, Jenny; Hsu, Yu-Han Huang; Inukai, Sachi; Karczewski, Konrad J.; Ke, Xiaobo; Kojima, Mina; Leachman, Samuel; Lieber, Danny; Liebowitz, Anna; Liu, Julia; Liu, Yufei; Martin, Trevor; Mena, Jose; Mendoza, Rosa; Myhrvold, Cameron; Millian, Christian; Pfau, Sarah; Raj, Sandeep; Rich, Matt; Rokicki, Joe; Rounds, William; Salazar, Michael; Salesi, Matthew; Sharma, Rajani; Silverman, Sanford; Singer, Cara; Sinha, Sandhya; Staller, Max; Stern, Philip; Tang, Hanlin; Weeks, Sharon; Weidmann, Maxwell; Wolf, Ashley; Young, Carmen; Yuan, Jie; Crutchfield, Christopher; McClean, Megan; Murphy, Coleen T.; Llinás, Manuel; Botstein, David; Troyanskaya, Olga G.; Dunham, Maitreya J.

    2013-01-01

    Whole-genome sequencing, particularly in fungi, has progressed at a tremendous rate. More difficult, however, is experimental testing of the inferences about gene function that can be drawn from comparative sequence analysis alone. We present a genome-wide functional characterization of a sequenced but experimentally understudied budding yeast, Saccharomyces bayanus var. uvarum (henceforth referred to as S. bayanus), allowing us to map changes over the 20 million years that separate this organism from S. cerevisiae. We first created a suite of genetic tools to facilitate work in S. bayanus. Next, we measured the gene-expression response of S. bayanus to a diverse set of perturbations optimized using a computational approach to cover a diverse array of functionally relevant biological responses. The resulting data set reveals that gene-expression patterns are largely conserved, but significant changes may exist in regulatory networks such as carbohydrate utilization and meiosis. In addition to regulatory changes, our approach identified gene functions that have diverged. The functions of genes in core pathways are highly conserved, but we observed many changes in which genes are involved in osmotic stress, peroxisome biogenesis, and autophagy. A surprising number of genes specific to S. bayanus respond to oxidative stress, suggesting the organism may have evolved under different selection pressures than S. cerevisiae. This work expands the scope of genome-scale evolutionary studies from sequence-based analysis to rapid experimental characterization and could be adopted for functional mapping in any lineage of interest. Furthermore, our detailed characterization of S. bayanus provides a valuable resource for comparative functional genomics studies in yeast. PMID:23852385

  14. A new system for comparative functional genomics of Saccharomyces yeasts.

    PubMed

    Caudy, Amy A; Guan, Yuanfang; Jia, Yue; Hansen, Christina; DeSevo, Chris; Hayes, Alicia P; Agee, Joy; Alvarez-Dominguez, Juan R; Arellano, Hugo; Barrett, Daniel; Bauerle, Cynthia; Bisaria, Namita; Bradley, Patrick H; Breunig, J Scott; Bush, Erin; Cappel, David; Capra, Emily; Chen, Walter; Clore, John; Combs, Peter A; Doucette, Christopher; Demuren, Olukunle; Fellowes, Peter; Freeman, Sam; Frenkel, Evgeni; Gadala-Maria, Daniel; Gawande, Richa; Glass, David; Grossberg, Samuel; Gupta, Anita; Hammonds-Odie, Latanya; Hoisos, Aaron; Hsi, Jenny; Hsu, Yu-Han Huang; Inukai, Sachi; Karczewski, Konrad J; Ke, Xiaobo; Kojima, Mina; Leachman, Samuel; Lieber, Danny; Liebowitz, Anna; Liu, Julia; Liu, Yufei; Martin, Trevor; Mena, Jose; Mendoza, Rosa; Myhrvold, Cameron; Millian, Christian; Pfau, Sarah; Raj, Sandeep; Rich, Matt; Rokicki, Joe; Rounds, William; Salazar, Michael; Salesi, Matthew; Sharma, Rajani; Silverman, Sanford; Singer, Cara; Sinha, Sandhya; Staller, Max; Stern, Philip; Tang, Hanlin; Weeks, Sharon; Weidmann, Maxwell; Wolf, Ashley; Young, Carmen; Yuan, Jie; Crutchfield, Christopher; McClean, Megan; Murphy, Coleen T; Llinás, Manuel; Botstein, David; Troyanskaya, Olga G; Dunham, Maitreya J

    2013-09-01

    Whole-genome sequencing, particularly in fungi, has progressed at a tremendous rate. More difficult, however, is experimental testing of the inferences about gene function that can be drawn from comparative sequence analysis alone. We present a genome-wide functional characterization of a sequenced but experimentally understudied budding yeast, Saccharomyces bayanus var. uvarum (henceforth referred to as S. bayanus), allowing us to map changes over the 20 million years that separate this organism from S. cerevisiae. We first created a suite of genetic tools to facilitate work in S. bayanus. Next, we measured the gene-expression response of S. bayanus to a diverse set of perturbations optimized using a computational approach to cover a diverse array of functionally relevant biological responses. The resulting data set reveals that gene-expression patterns are largely conserved, but significant changes may exist in regulatory networks such as carbohydrate utilization and meiosis. In addition to regulatory changes, our approach identified gene functions that have diverged. The functions of genes in core pathways are highly conserved, but we observed many changes in which genes are involved in osmotic stress, peroxisome biogenesis, and autophagy. A surprising number of genes specific to S. bayanus respond to oxidative stress, suggesting the organism may have evolved under different selection pressures than S. cerevisiae. This work expands the scope of genome-scale evolutionary studies from sequence-based analysis to rapid experimental characterization and could be adopted for functional mapping in any lineage of interest. Furthermore, our detailed characterization of S. bayanus provides a valuable resource for comparative functional genomics studies in yeast.

  15. Comparative Analysis of Acinetobacters: Three Genomes for Three Lifestyles

    PubMed Central

    Vallenet, David; Nordmann, Patrice; Barbe, Valérie; Poirel, Laurent; Mangenot, Sophie; Bataille, Elodie; Dossat, Carole; Gas, Shahinaz; Kreimeyer, Annett; Lenoble, Patricia; Oztas, Sophie; Poulain, Julie; Segurens, Béatrice; Robert, Catherine; Abergel, Chantal; Claverie, Jean-Michel; Raoult, Didier; Médigue, Claudine; Weissenbach, Jean; Cruveiller, Stéphane

    2008-01-01

    Acinetobacter baumannii is the source of numerous nosocomial infections in humans and therefore deserves close attention as multidrug or even pandrug resistant strains are increasingly being identified worldwide. Here we report the comparison of two newly sequenced genomes of A. baumannii. The human isolate A. baumannii AYE is multidrug resistant whereas strain SDF, which was isolated from body lice, is antibiotic susceptible. As reference for comparison in this analysis, the genome of the soil-living bacterium A. baylyi strain ADP1 was used. The most interesting dissimilarities we observed were that i) whereas strain AYE and A. baylyi genomes harbored very few Insertion Sequence elements which could promote expression of downstream genes, strain SDF sequence contains several hundred of them that have played a crucial role in its genome reduction (gene disruptions and simple DNA loss); ii) strain SDF has low catabolic capacities compared to strain AYE. Interestingly, the latter has even higher catabolic capacities than A. baylyi which has already been reported as a very nutritionally versatile organism. This metabolic performance could explain the persistence of A. baumannii nosocomial strains in environments where nutrients are scarce; iii) several processes known to play a key role during host infection (biofilm formation, iron uptake, quorum sensing, virulence factors) were either different or absent, the best example of which is iron uptake. Indeed, strain AYE and A. baylyi use siderophore-based systems to scavenge iron from the environment whereas strain SDF uses an alternate system similar to the Haem Acquisition System (HAS). Taken together, all these observations suggest that the genome contents of the 3 Acinetobacters compared are partly shaped by life in distinct ecological niches: human (and more largely hospital environment), louse, soil. PMID:18350144

  16. Update on Comparative Genomics of Legumes

    USDA-ARS's Scientific Manuscript database

    This year marks the essential completion of the genome sequences of Glycine max, Medicago truncatula, and Lotus japonicus (soybean, barrel medic, and birdsfoot trefoil, respectively). The impact of these assembled, annotated genomes will be enormous. L. japonicus and M. truncatula, both forage crop...

  17. Comparative genomic hybridization in clinical cytogenetics

    SciTech Connect

    Bryndorf, T.; Kirchhoff, M.; Rose, H.

    1995-11-01

    We report the results of applying comparative genomic hybridization (CGH) in a cytogenetic service laboratory for (1) determination of the origin of extra and missing chromosomal material in intricate cases of unbalanced aberrations and (2) detection of common prenatal numerical chromosome aberrations. A total of 11 fetal samples were analyzed. Seven cases of complex unbalanced aberrations that could not be identified reliably by conventional cytogenetics were successfully resolved by CGH analysis. CGH results were validated by using FISH with chromosome-specific probes. Four cases representing common prenatal numerical aberrations (trisomy 21, 18, and 13 and monosomy X) were also successfully diagnosed by CGH. We conclude that CGH is a powerful adjunct to traditional cytogenetic techniques that makes it possible to solve clinical cases of intricate unbalanced aberrations in a single hybridization. CGH may also be a useful adjunct to screen for euchromatic involvement in marker chromosomes. Further technical development may render CGH applicable for routine aberration screening. 16 refs., 4 figs., 2 tabs.

  18. Floral gene resources from basal angiosperms for comparative genomics research

    PubMed Central

    Albert, Victor A; Soltis, Douglas E; Carlson, John E; Farmerie, William G; Wall, P Kerr; Ilut, Daniel C; Solow, Teri M; Mueller, Lukas A; Landherr, Lena L; Hu, Yi; Buzgo, Matyas; Kim, Sangtae; Yoo, Mi-Jeong; Frohlich, Michael W; Perl-Treves, Rafael; Schlarbaum, Scott E; Bliss, Barbara J; Zhang, Xiaohong; Tanksley, Steven D; Oppenheimer, David G; Soltis, Pamela S; Ma, Hong; dePamphilis, Claude W; Leebens-Mack, James H

    2005-01-01

    Background: The Floral Genome Project was initiated to bridge the genomic gap between the most broadly studied plant model systems. Arabidopsis and rice, although now completely sequenced and under intensive comparative genomic investigation, are separated by at least 125 million years of evolutionary time, and cannot in isolation provide a comprehensive perspective on structural and functional aspects of flowering plant genome dynamics. Here we discuss new genomic resources available to the scientific community, comprising cDNA libraries and Expressed Sequence Tag (EST) sequences for a suite of phylogenetically basal angiosperms specifically selected to bridge the evolutionary gaps between model plants and provide insights into gene content and genome structure in the earliest flowering plants. Results: Random sequencing of cDNAs from representatives of phylogenetically important eudicot, non-grass monocot, and gymnosperm lineages has so far (as of 12/1/04) generated 70,514 ESTs and 48,170 assembled unigenes. Efficient sorting of EST sequences into putative gene families based on whole Arabidopsis/rice proteome comparison has permitted ready identification of cDNA clones for finished sequencing. Preliminarily, (i) proportions of functional categories among sequenced floral genes seem representative of the entire Arabidopsis transcriptome, (ii) many known floral gene homologues have been captured, and (iii) phylogenetic analyses of ESTs are providing new insights into the process of gene family evolution in relation to the origin and diversification of the angiosperms. Conclusion: Initial comparisons illustrate the utility of the EST data sets toward discovery of the basic floral transcriptome. These first findings also afford the opportunity to address a number of conspicuous evolutionary genomic questions, including reproductive organ transcriptome overlap between angiosperms and gymnosperms, genome-wide duplication history, lineage-specific gene duplication and

  19. Comparative Omics-Driven Genome Annotation Refinement: Application across Yersiniae

    SciTech Connect

    Rutledge, Alexandra C.; Jones, Marcus B.; Chauhan, Sadhana; Purvine, Samuel O.; Sanford, James; Monroe, Matthew E.; Brewer, Heather M.; Payne, Samuel H.; Ansong, Charles; Frank, Bryan C.; Smith, Richard D.; Peterson, Scott; Motin, Vladimir L.; Adkins, Joshua N.

    2012-03-27

    Genome sequencing continues to be a rapidly evolving technology, yet most downstream aspects of genome annotation pipelines remain relatively stable or are even being abandoned. To date, the perceived value of manual curation for genome annotations is not offset by the real cost and time associated with the process. In order to balance the large number of sequences generated, the annotation process is now performed almost exclusively in an automated fashion for most genome sequencing projects. One possible way to reduce errors inherent to automated computational annotations is to apply data from 'omics' measurements (i.e. transcriptional and proteomic) to the un-annotated genome with a proteogenomic-based approach. This approach does require additional experimental and bioinformatics methods to include omics technologies; however, the approach is readily automatable and can benefit from rapid developments occurring in those research domains as well. The annotation process can be improved by experimental validation of transcription and translation and aid in the discovery of annotation errors. Here the concept of annotation refinement has been extended to include a comparative assessment of genomes across closely related species, as is becoming common in sequencing efforts. Transcriptomic and proteomic data derived from three highly similar pathogenic Yersiniae (Y. pestis CO92, Y. pestis pestoides F, and Y. pseudotuberculosis PB1/+) was used to demonstrate a comprehensive comparative omic-based annotation methodology. Peptide and oligo measurements experimentally validated the expression of nearly 40% of each strain's predicted proteome and revealed the identification of 28 novel and 68 previously incorrect protein-coding sequences (e.g., observed frameshifts, extended start sites, and translated pseudogenes) within the three current Yersinia genome annotations. Gene loss is presumed to play a major role in Y. pestis acquiring its niche as a virulent pathogen, thus

  20. Comparative Genomics of the Campylobacter lari Group

    PubMed Central

    Miller, William G.; Yee, Emma; Chapman, Mary H.; Smith, Timothy P.L.; Bono, James L.; Huynh, Steven; Parker, Craig T.; Vandamme, Peter; Luong, Khai; Korlach, Jonas

    2014-01-01

    The Campylobacter lari group is a phylogenetic clade within the epsilon subdivision of the Proteobacteria and is part of the thermotolerant Campylobacter spp., a division within the genus that includes the human pathogen Campylobacter jejuni. The C. lari group is currently composed of five species (C. lari, Campylobacter insulaenigrae, Campylobacter volucris, Campylobacter subantarcticus, and Campylobacter peloridis), as well as a group of strains termed the urease-positive thermophilic Campylobacter (UPTC) and other C. lari-like strains. Here we present the complete genome sequences of 11 C. lari group strains, including the five C. lari group species, four UPTC strains, and a lari-like strain isolated in this study. The genome of C. lari subsp. lari strain RM2100 was described previously. Analysis of the C. lari group genomes indicates that this group is highly related at the genome level. Furthermore, these genomes are strongly syntenic with minor rearrangements occurring only in 4 of the 12 genomes studied. The C. lari group can be bifurcated, based on the flagella and flagellar modification genes. Genomic analysis of the UPTC strains indicated that these organisms are variable but highly similar, closely related to but distinct from C. lari. Additionally, the C. lari group contains multiple genes encoding hemagglutination domain proteins, which are either contingency genes or linked to conserved contingency genes. Many of the features identified in strain RM2100, such as major deficiencies in amino acid biosynthesis and energy metabolism, are conserved across all 12 genomes, suggesting that these common features may play a role in the association of the C. lari group with coastal environments and watersheds. PMID:25381664

  1. Comparative Genomics of Large Mitochondria in Placozoans

    PubMed Central

    Signorovitch, Ana Y; Buss, Leo W; Dellaporta, Stephen L

    2007-01-01

    The first sequenced mitochondrial genome of a placozoan, Trichoplax adhaerens, challenged the conventional wisdom that a compact mitochondrial genome is a common feature among all animals. Three additional placozoan mitochondrial genomes representing highly divergent clades have been sequenced to determine whether the large Trichoplax mtDNA is a shared feature among members of the phylum Placozoa or a uniquely derived condition. All three mitochondrial genomes were found to be very large, 32- to 37-kb, circular molecules, having the typical 12 respiratory chain genes, 24 tRNAs, rnS, and rnL. They share with the Trichoplax mitochondrial genome the absence of atp8, atp9, and all ribosomal protein genes, the presence of several cox1 introns, and a large open reading frame containing an intron group I LAGLIDADG endonuclease domain. The differences in mtDNA size within Placozoa are due to variation in intergenic spacer regions and the presence or absence of long open reading frames of unknown function. Phylogenetic analyses of the 12 respiratory chain genes support the monophyly of Placozoa. The similarities in composition and structure between the three mitochondrial genomes reported here and that of Trichoplax's mtDNA suggest that their uncompacted state is a shared ancestral feature to other nonmetazoans while their gene content is a derived feature shared only among the Metazoa. PMID:17222063

  2. Comparative genomic identification and validation of β-defensin genes in the Ovis aries genome.

    PubMed

    Hall, T J; McQuillan, C; Finlay, E K; O'Farrelly, C; Fair, S; Meade, K G

    2017-04-04

    β-defensins are small, cationic, antimicrobial peptides found in species across the plant and animal kingdoms. In addition to microbiocidal activity, roles in immunity as well as reproduction have more recently been documented. β-defensin genes in Ovis aries (domestic sheep) have been poorly annotated, having been identified only by automatic gene prediction algorithms. The objective of this study was to use a comparative genomics approach to identify and characterise the β-defensin gene repertoire in sheep using the bovine genome as the primary reference. All 57 currently predicted bovine β-defensin genes were used to find orthologous sequences in the most recent version of the sheep genome (OAR v4.0). Forty-three genes were found to have close genomic matches (>70% similarity) between sheep and cattle. The orthologous genes were located in four clusters across the genome, with 4 genes on chromosome 2, 19 genes on chromosome 13, 5 genes on chromosome 20 and 15 genes on chromosome 26. Conserved gene order for the β-defensin genes was apparent in the two smaller clusters, although gene order was reversed on chromosome 2, suggesting an inversion between sheep and cattle. Complete conservation of gene order was also observed for chromosome 13 β-defensin orthologs. More structural differences were apparent between chromosome 26 genes and the orthologous region in the bovine reference genome, which is known to be copy-number variable. In this cluster, the Defensin-beta 1 (DEFB1) gene matched to eleven Bovine Neutrophil beta-Defensin (BNBD) genes on chromosome 27 with almost uniform similarity, as well as to tracheal, enteric and lingual anti-microbial peptides (TAP, EAP and LAP), suggesting that annotation of the bovine reference sequence is still incomplete. qPCR was used to profile the expression of 34 β-defensin genes, representing each of the four clusters, in the ram reproductive tract. Distinct site-specific and differential expression profiles were

  3. Comparing Genomic Profiles of Women With and Without Fibromyalgia

    PubMed Central

    Lukkahatai, Nada; Walitt, Brian; Espina, Alexandra; Wang, Dan; Saligan, Leorey N.

    2016-01-01

    Background Fibromyalgia syndrome (FMS), a chronic musculoskeletal condition characterized by diffuse pain, fatigue, sleep impairment, and cognitive dysfunction, is associated with significant functional disability. Its underlying biological mechanisms are unknown. This study investigated differentially expressed genes between women with FMS and healthy volunteers. Methods Women who met the 1990 or 2010 American College of Rheumatology fibromyalgia criteria were compared to age- and race-matched pain-free healthy women. Peripheral blood samples were collected, and a full genome microarray gene expression analysis was performed. One-way analysis of variance was used to identify differentially expressed genes using the filtering criterion of 1% false discovery rate. Analysis of canonical pathways associated with these genes was performed. Confirmatory quantitative real-time polymerase chain reaction and enzyme-linked immunosorbent assay verified microarray results. Independent t-tests compared gene and protein expression between groups. Result Participants were 54 women with FMS and 25 controls. Expression arrays from a subset of women with FMS (n = 29) and controls (n = 20) showed upregulation of 12 genes (>1.8-fold change, p < .05) in the FMS sample. Differentially expressed genes were related to B-cell development, primary immunodeficiency signaling, and mitotic roles of polo-like kinase. CENPK and HSP90AA1 were the most differentially expressed genes (p < .01). Conclusion Activity of interrelated pathways related to immune response, and homeostasis appears to be relevant to the experience of FMS. Replication and exploration of the relationship between gene expression and symptom severity will help determine clinical relevance of these findings. PMID:26015072
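
    The abstract above filters differentially expressed genes at a 1% false discovery rate but does not name the procedure used. As a minimal sketch, assuming the widely used Benjamini-Hochberg step-up procedure (an assumption, not something stated in the record), the filtering step could look like the following; the function name and the toy p-values are hypothetical.

```python
from typing import List


def benjamini_hochberg(pvalues: List[float], fdr: float = 0.01) -> List[bool]:
    """Return one flag per p-value: True if it passes the FDR threshold.

    Assumption: Benjamini-Hochberg step-up procedure; the abstract only
    states a 1% false discovery rate, not which method was applied.
    """
    m = len(pvalues)
    # Indices of the p-values, sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])

    # Find the largest rank k such that p_(k) <= (k / m) * fdr.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank / m * fdr:
            k_max = rank

    # Every p-value ranked at or below k_max is declared significant.
    significant = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= k_max:
            significant[idx] = True
    return significant


# Toy p-values for four hypothetical genes (not data from the study).
flags = benjamini_hochberg([0.0001, 0.5, 0.004, 0.03], fdr=0.01)
print(flags)
```

    In practice a fold-change cutoff such as the >1.8-fold threshold mentioned in the abstract would be applied alongside this filter.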

  4. Comparative genomic hybridization: Detection of segmental aneusomies

    SciTech Connect

    Cronin, J.E.; Magrane, G.G.; Gray, J.W.

    1994-09-01

    Comparative genomic hybridization (CGH) has been used successfully to detect whole chromosome and segmental aneusomies. However, its sensitivity for detection of segmental aneusomies is still not well known. We present here an analysis of CGH sensitivity with emphasis on detection of abnormalities commonly found during prenatal and neonatal diagnosis. CGH is performed by hybridizing green and red fluorescing test and normal DNA samples, respectively, to normal metaphase spreads and measuring green:red fluorescence ratios along all chromosomes. The ratios are normalized such that 2 copies of a normal chromosome region in the test sample give a ratio of 1.0. Alterations in test vs. control gene copy number range from 1.5 [trisomy] to 0.5 [monosomy]. Clinical samples analyzed included Wolf Hirschhorn (4p-), Cri du Chat (5p-) and DiGeorge (22q-). In addition, 7 cell lines with chromosome 21 segmental aneusomies were analyzed. These included 3 with terminal duplications, 1 with a terminal deletion, 1 with an interstitial deletion and 2 with interstitial amplifications. The DiGeorge deletion was the only deletion not detected by CGH. This is not surprising as standard G banding does not routinely detect this 1-2 megabase deletion. The 4p- and 5p- monosomies were detected and breakpoints correctly assigned prospectively. Proximal alterations involving 21q22.11 are unambiguously defined. Specifically, two interstitial aneusomies involving this region are detected. Studies involving late prophase chromosome normal spreads gave identical breakpoints. Thus, analysis of extended chromosomes did not improve the sensitivity of the technique. Taken together, these data suggest that CGH can detect segmental aneusomies greater than 8 megabases in extent. Smaller aneusomies can, at times, be detected. Work is now underway to modify the analysis software to increase sensitivity and to decrease the amount of material needed for analysis.
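
    The ratio normalization described in this abstract (2 test copies give 1.0, trisomy 1.5, monosomy 0.5) can be illustrated with a short sketch. The function names and the nearest-state rule below are hypothetical and only illustrate the arithmetic; they are not the analysis software the authors mention.

```python
def expected_cgh_ratio(test_copies: int, normal_copies: int = 2) -> float:
    """Expected normalized green:red ratio for a region.

    With the normal (reference) sample carrying 2 copies, 2 test copies
    map to 1.0, 3 copies (trisomy) to 1.5 and 1 copy (monosomy) to 0.5.
    """
    if normal_copies <= 0:
        raise ValueError("normal_copies must be positive")
    return test_copies / normal_copies


def nearest_copy_number(ratio: float) -> int:
    """Pick the copy number (1, 2 or 3) whose expected ratio is closest.

    Hypothetical decision rule for illustration only; real CGH analysis
    has to account for measurement noise and the size of the segment.
    """
    candidates = (1, 2, 3)
    return min(candidates, key=lambda c: abs(expected_cgh_ratio(c) - ratio))


print(expected_cgh_ratio(3))      # trisomy
print(expected_cgh_ratio(1))      # monosomy
print(nearest_copy_number(1.4))   # measured ratio close to the trisomy value
```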

  5. Comparative analysis and visualization of multiple collinear genomes

    PubMed Central

    2012-01-01

    Background Genome browsers are a common tool used by biologists to visualize genomic features including genes, polymorphisms, and many others. However, existing genome browsers and visualization tools are not well-suited to perform meaningful comparative analysis among a large number of genomes. With the increasing quantity and availability of genomic data, there is an increased burden to provide useful visualization and analysis tools for comparison of multiple collinear genomes such as the large panels of model organisms which are the basis for much of the current genetic research. Results We have developed a novel web-based tool for visualizing and analyzing multiple collinear genomes. Our tool illustrates genome-sequence similarity through a mosaic of intervals representing local phylogeny, subspecific origin, and haplotype identity. Comparative analysis is facilitated through reordering and clustering of tracks, which can vary throughout the genome. In addition, we provide local phylogenetic trees as an alternate visualization to assess local variations. Conclusions Unlike previous genome browsers and viewers, ours allows for simultaneous and comparative analysis. Our browser provides intuitive selection and interactive navigation about features of interest. Dynamic visualizations adjust to scale and data content making analysis at variable resolutions and of multiple data sets more informative. We demonstrate our genome browser for an extensive set of genomic data sets composed of almost 200 distinct mouse laboratory strains. PMID:22536897

  6. Comparative analysis of genomic signal processing for microarray data clustering.

    PubMed

    Istepanian, Robert S H; Sungoor, Ala; Nebel, Jean-Christophe

    2011-12-01

    Genomic signal processing is a new area of research that combines advanced digital signal processing methodologies for enhanced genetic data analysis. It has many promising applications in bioinformatics and next generation of healthcare systems, in particular, in the field of microarray data clustering. In this paper we present a comparative performance analysis of enhanced digital spectral analysis methods for robust clustering of gene expression across multiple microarray data samples. Three digital signal processing methods: linear predictive coding, wavelet decomposition, and fractal dimension are studied to provide a comparative evaluation of the clustering performance of these methods on several microarray datasets. The results of this study show that the fractal approach provides the best clustering accuracy compared to other digital signal processing and well known statistical methods.

  7. Comparative genomics and genome biology of invasive Campylobacter jejuni.

    PubMed

    Skarp, C P A; Akinrinade, O; Nilsson, A J E; Ellström, P; Myllykangas, S; Rautelin, H

    2015-11-25

    Campylobacter jejuni is a major pathogen in bacterial gastroenteritis worldwide and can cause bacteremia in severe cases. C. jejuni is highly structured into clonal lineages of which the ST677CC lineage has been overrepresented among C. jejuni isolates derived from blood. In this study, we characterized the genomes of 31 C. jejuni blood isolates and 24 faecal isolates belonging to ST677CC in order to study the genome biology related to C. jejuni invasiveness. We combined the genome analyses with phenotypical evidence on serum resistance which was associated with phase variation of wcbK; a GDP-mannose 4,6-dehydratase involved in capsular biosynthesis. We also describe the finding of a Type III restriction-modification system unique to the ST-794 sublineage. However, features previously considered to be related to pathogenesis of C. jejuni were either absent or disrupted among our strains. Our results refine the role of capsule features associated with invasive disease and accentuate the possibility of methylation and restriction enzymes in the potential of C. jejuni to establish invasive infections. Our findings underline the importance of studying clinically relevant well-characterized bacterial strains in order to understand pathogenesis mechanisms important in human infections.

  8. Comparative genomics of Mortierella elongata and its bacterial endosymbiont Mycoavidus cysteinexigens: Comparative genomics of Mortierella elongata

    DOE PAGES

    Uehling, J.; Gryganskyi, A.; Hameed, K.; ...

    2017-01-01

    Endosymbiosis of bacteria by eukaryotes is a defining feature of cellular evolution. In addition to well-known bacterial origins for mitochondria and chloroplasts, multiple origins of bacterial endosymbiosis are known within the cells of diverse animals, plants and fungi. Early-diverging lineages of terrestrial fungi harbor endosymbiotic bacteria belonging to the Burkholderiaceae. Furthermore, we sequenced the metagenome of the soil-inhabiting fungus Mortierella elongata and assembled the complete circular chromosome of its endosymbiont, Mycoavidus cysteinexigens, which we place within a lineage of endofungal symbionts that are sister clade to Burkholderia. The genome of M. elongata strain AG77 features a core set of primary metabolic pathways for degradation of simple carbohydrates and lipid biosynthesis, while the M. cysteinexigens (AG77) genome is reduced in size and function. Experiments using antibiotics to cure the endobacterium from the host demonstrate that the fungal host metabolism is highly modulated by presence/absence of M. cysteinexigens. In independent comparative phylogenomic analyses of fungal and bacterial genomes we find that they are consistent with an ancient origin for M. elongata–M. cysteinexigens symbiosis, most likely over 350 million years ago and concomitant with the terrestrialization of Earth and diversification of land fungi and plants.

  9. Initial sequencing and comparative analysis of the mouse genome

    SciTech Connect

    Waterston, Robert H.; Lindblad-Toh, Kerstin; Birney, Ewan; Rogers, Jane; Abril, Josep F.; Agarwal, Pankaj; Agarwala, Richa; Ainscough, Rachel; Alexandersson, Marina; An, Peter; Antonarakis, Stylianos E.; Attwood, John; Baertsch, Robert; Bailey, Jonathon; Barlow, Karen; Beck, Stephan; Berry, Eric; Birren, Bruce; Bloom, Toby; Bork, Peer; Botcherby, Marc; Bray, Nicolas; Brent, Michael R.; Brown, Daniel G.; Brown, Stephen D.; Bult, Carol; Burton, John; Butler, Jonathan; Campbell, Robert D.; Carninci, Piero; Cawley, Simon; Chiaromonte, Francesca; Chinwalla, Asif T.; Church, Deanna M.; Clamp, Michele; Clee, Christopher; Collins, Francis S.; Cook, Lisa L.; Copley, Richard R.; Coulson, Alan; Couronne, Olivier; Cuff, James; Curwen, Val; Cutts, Tim; Daly, Mark; David, Robert; Davies, Joy; Delehaunty, Kimberly D.; Deri, Justin; Dermitzakis, Emmanouil T.; Dewey, Colin; Dickens, Nicholas J.; Diekhans, Mark; Dodge, Sheila; Dubchak, Inna; Dunn, Diane M.; Eddy, Sean R.; Elnitski, Laura; Emes, Richard D.; Eswara, Pallavi; Eyras, Eduardo; Felsenfeld, Adam; Fewell, Ginger A.; Flicek, Paul; Foley, Karen; Frankel, Wayne N.; Fulton, Lucinda A.; Fulton, Robert S.; Furey, Terrence S.; Gage, Diane; Gibbs, Richard A.; Glusman, Gustavo; Gnerre, Sante; Goldman, Nick; Goodstadt, Leo; Grafham, Darren; Graves, Tina A.; Green, Eric D.; Gregory, Simon; Guigo, Roderic; Guyer, Mark; Hardison, Ross C.; Haussler, David; Hayashizaki, Yoshihide; Hillier, LaDeana W.; Hinrichs, Angela; Hlavina, Wratko; Holzer, Timothy; Hsu, Fan; Hua, Axin; Hubbard, Tim; Hunt, Adrienne; Jackson, Ian; Jaffe, David B.; Johnson, L. Steven; Jones, Matthew; Jones, Thomas A.; Joy, Ann; Kamal, Michael; Karlsson, Elinor K.; Karolchik, Donna; Kasprzyk, Arkadiusz; Kawai, Jun; Keibler, Evan; Kells, Cristyn; Kent, W. James; Kirby, Andrew; Kolbe, Diana L.; Korf, Ian; Kucherlapati, Raju S.; Kulbokas III, Edward J.; Kulp, David; Landers, Tom; Leger, J.P.; Leonard, Steven; Letunic, Ivica; Levine, Rosie; et al.

    2002-12-15

    The sequence of the mouse genome is a key informational tool for understanding the contents of the human genome and a key experimental tool for biomedical research. Here, we report the results of an international collaboration to produce a high-quality draft sequence of the mouse genome. We also present an initial comparative analysis of the mouse and human genomes, describing some of the insights that can be gleaned from the two sequences. We discuss topics including the analysis of the evolutionary forces shaping the size, structure and sequence of the genomes; the conservation of large-scale synteny across most of the genomes; the much lower extent of sequence orthology covering less than half of the genomes; the proportions of the genomes under selection; the number of protein-coding genes; the expansion of gene families related to reproduction and immunity; the evolution of proteins; and the identification of intraspecies polymorphism.

  10. Initial sequencing and comparative analysis of the mouse genome.

    PubMed

    Waterston, Robert H; Lindblad-Toh, Kerstin; Birney, Ewan; Rogers, Jane; Abril, Josep F; Agarwal, Pankaj; Agarwala, Richa; Ainscough, Rachel; Alexandersson, Marina; An, Peter; Antonarakis, Stylianos E; Attwood, John; Baertsch, Robert; Bailey, Jonathon; Barlow, Karen; Beck, Stephan; Berry, Eric; Birren, Bruce; Bloom, Toby; Bork, Peer; Botcherby, Marc; Bray, Nicolas; Brent, Michael R; Brown, Daniel G; Brown, Stephen D; Bult, Carol; Burton, John; Butler, Jonathan; Campbell, Robert D; Carninci, Piero; Cawley, Simon; Chiaromonte, Francesca; Chinwalla, Asif T; Church, Deanna M; Clamp, Michele; Clee, Christopher; Collins, Francis S; Cook, Lisa L; Copley, Richard R; Coulson, Alan; Couronne, Olivier; Cuff, James; Curwen, Val; Cutts, Tim; Daly, Mark; David, Robert; Davies, Joy; Delehaunty, Kimberly D; Deri, Justin; Dermitzakis, Emmanouil T; Dewey, Colin; Dickens, Nicholas J; Diekhans, Mark; Dodge, Sheila; Dubchak, Inna; Dunn, Diane M; Eddy, Sean R; Elnitski, Laura; Emes, Richard D; Eswara, Pallavi; Eyras, Eduardo; Felsenfeld, Adam; Fewell, Ginger A; Flicek, Paul; Foley, Karen; Frankel, Wayne N; Fulton, Lucinda A; Fulton, Robert S; Furey, Terrence S; Gage, Diane; Gibbs, Richard A; Glusman, Gustavo; Gnerre, Sante; Goldman, Nick; Goodstadt, Leo; Grafham, Darren; Graves, Tina A; Green, Eric D; Gregory, Simon; Guigó, Roderic; Guyer, Mark; Hardison, Ross C; Haussler, David; Hayashizaki, Yoshihide; Hillier, LaDeana W; Hinrichs, Angela; Hlavina, Wratko; Holzer, Timothy; Hsu, Fan; Hua, Axin; Hubbard, Tim; Hunt, Adrienne; Jackson, Ian; Jaffe, David B; Johnson, L Steven; Jones, Matthew; Jones, Thomas A; Joy, Ann; Kamal, Michael; Karlsson, Elinor K; Karolchik, Donna; Kasprzyk, Arkadiusz; Kawai, Jun; Keibler, Evan; Kells, Cristyn; Kent, W James; Kirby, Andrew; Kolbe, Diana L; Korf, Ian; Kucherlapati, Raju S; Kulbokas, Edward J; Kulp, David; Landers, Tom; Leger, J P; Leonard, Steven; Letunic, Ivica; Levine, Rosie; Li, Jia; Li, Ming; Lloyd, Christine; Lucas, Susan; Ma, Bin; Maglott, Donna R; Mardis, Elaine R; Matthews, Lucy; Mauceli, Evan; Mayer, John H; McCarthy, Megan; McCombie, W Richard; McLaren, Stuart; McLay, Kirsten; McPherson, John D; Meldrim, Jim; Meredith, Beverley; Mesirov, Jill P; Miller, Webb; Miner, Tracie L; Mongin, Emmanuel; Montgomery, Kate T; Morgan, Michael; Mott, Richard; Mullikin, James C; Muzny, Donna M; Nash, William E; Nelson, Joanne O; Nhan, Michael N; Nicol, Robert; Ning, Zemin; Nusbaum, Chad; O'Connor, Michael J; Okazaki, Yasushi; Oliver, Karen; Overton-Larty, Emma; Pachter, Lior; Parra, Genís; Pepin, Kymberlie H; Peterson, Jane; Pevzner, Pavel; Plumb, Robert; Pohl, Craig S; Poliakov, Alex; Ponce, Tracy C; Ponting, Chris P; Potter, Simon; Quail, Michael; Reymond, Alexandre; Roe, Bruce A; Roskin, Krishna M; Rubin, Edward M; Rust, Alistair G; Santos, Ralph; Sapojnikov, Victor; Schultz, Brian; Schultz, Jörg; Schwartz, Matthias S; Schwartz, Scott; Scott, Carol; Seaman, Steven; Searle, Steve; Sharpe, Ted; Sheridan, Andrew; Shownkeen, Ratna; Sims, Sarah; Singer, Jonathan B; Slater, Guy; Smit, Arian; Smith, Douglas R; Spencer, Brian; Stabenau, Arne; Stange-Thomann, Nicole; Sugnet, Charles; Suyama, Mikita; Tesler, Glenn; Thompson, Johanna; Torrents, David; Trevaskis, Evanne; Tromp, John; Ucla, Catherine; Ureta-Vidal, Abel; Vinson, Jade P; Von Niederhausern, Andrew C; Wade, Claire M; Wall, Melanie; Weber, Ryan J; Weiss, Robert B; Wendl, Michael C; West, Anthony P; Wetterstrand, Kris; Wheeler, Raymond; Whelan, Simon; Wierzbowski, Jamey; Willey, David; Williams, Sophie; Wilson, Richard K; Winter, Eitan; Worley, Kim C; Wyman, Dudley; Yang, Shan; Yang, Shiaw-Pyng; Zdobnov, Evgeny M; Zody, Michael C; Lander, Eric S

    2002-12-05

    The sequence of the mouse genome is a key informational tool for understanding the contents of the human genome and a key experimental tool for biomedical research. Here, we report the results of an international collaboration to produce a high-quality draft sequence of the mouse genome. We also present an initial comparative analysis of the mouse and human genomes, describing some of the insights that can be gleaned from the two sequences. We discuss topics including the analysis of the evolutionary forces shaping the size, structure and sequence of the genomes; the conservation of large-scale synteny across most of the genomes; the much lower extent of sequence orthology covering less than half of the genomes; the proportions of the genomes under selection; the number of protein-coding genes; the expansion of gene families related to reproduction and immunity; the evolution of proteins; and the identification of intraspecies polymorphism.

  11. e-Fungi: a data resource for comparative analysis of fungal genomes

    PubMed Central

    Hedeler, Cornelia; Wong, Han Min; Cornell, Michael J; Alam, Intikhab; Soanes, Darren M; Rattray, Magnus; Hubbard, Simon J; Talbot, Nicholas J; Oliver, Stephen G; Paton, Norman W

    2007-01-01

    Background The number of sequenced fungal genomes is ever increasing, with about 200 genomes already fully sequenced or in progress. Only a small percentage of those genomes have been comprehensively studied, for example using techniques from functional genomics. Comparative analysis has proven to be a useful strategy for enhancing our understanding of evolutionary biology and of the less well understood genomes. However, the data required for these analyses tends to be distributed in various heterogeneous data sources, making systematic comparative studies a cumbersome task. Furthermore, comparative analyses benefit from close integration of derived data sets that cluster genes or organisms in a way that eases the expression of requests that clarify points of similarity or difference between species. Description To support systematic comparative analyses of fungal genomes we have developed the e-Fungi database, which integrates a variety of data for more than 30 fungal genomes. Publicly available genome data, functional annotations, and pathway information has been integrated into a single data repository and complemented with results of comparative analyses, such as MCL and OrthoMCL cluster analysis, and predictions of signaling proteins and the sub-cellular localisation of proteins. To access the data, a library of analysis tasks is available through a web interface. The analysis tasks are motivated by recent comparative genomics studies, and aim to support the study of evolutionary biology as well as community efforts for improving the annotation of genomes. Web services for each query are also available, enabling the tasks to be incorporated into workflows. Conclusion The e-Fungi database provides fungal biologists with a resource for comparative studies of a large range of fungal genomes. Its analysis library supports the comparative study of genome data, functional annotation, and results of large scale analyses over all the genomes stored in the database

  12. Comparative genomics of the tardigrades Hypsibius dujardini and Ramazzottius varieornatus.

    PubMed

    Yoshida, Yuki; Koutsovoulos, Georgios; Laetsch, Dominik R; Stevens, Lewis; Kumar, Sujai; Horikawa, Daiki D; Ishino, Kyoko; Komine, Shiori; Kunieda, Takekazu; Tomita, Masaru; Blaxter, Mark; Arakawa, Kazuharu

    2017-07-01

    Tardigrada, a phylum of meiofaunal organisms, have been at the center of discussions of the evolution of Metazoa, the biology of survival in extreme environments, and the role of horizontal gene transfer in animal evolution. Tardigrada are placed as sisters to Arthropoda and Onychophora (velvet worms) in the superphylum Panarthropoda by morphological analyses, but many molecular phylogenies fail to recover this relationship. This tension between molecular and morphological understanding may be very revealing of the mode and patterns of evolution of major groups. Limnoterrestrial tardigrades display extreme cryptobiotic abilities, including anhydrobiosis and cryobiosis, as do bdelloid rotifers, nematodes, and other animals of the water film. These extremophile behaviors challenge understanding of normal, aqueous physiology: how does a multicellular organism avoid lethal cellular collapse in the absence of liquid water? Meiofaunal species have been reported to have elevated levels of horizontal gene transfer (HGT) events, but how important this is in evolution, and particularly in the evolution of extremophile physiology, is unclear. To address these questions, we resequenced and reassembled the genome of H. dujardini, a limnoterrestrial tardigrade that can undergo anhydrobiosis only after extensive pre-exposure to drying conditions, and compared it to the genome of R. varieornatus, a related species with tolerance to rapid desiccation. The 2 species had contrasting gene expression responses to anhydrobiosis, with major transcriptional change in H. dujardini but limited regulation in R. varieornatus. We identified few horizontally transferred genes, but some of these were shown to be involved in entry into anhydrobiosis. Whole-genome molecular phylogenies supported a Tardigrada+Nematoda relationship over Tardigrada+Arthropoda, but rare genomic changes tended to support Tardigrada+Arthropoda.

  13. Comparative genomics of the tardigrades Hypsibius dujardini and Ramazzottius varieornatus

    PubMed Central

    Yoshida, Yuki; Koutsovoulos, Georgios; Laetsch, Dominik R.; Stevens, Lewis; Kumar, Sujai; Horikawa, Daiki D.; Ishino, Kyoko; Komine, Shiori; Kunieda, Takekazu; Tomita, Masaru; Blaxter, Mark

    2017-01-01

    Tardigrada, a phylum of meiofaunal organisms, have been at the center of discussions of the evolution of Metazoa, the biology of survival in extreme environments, and the role of horizontal gene transfer in animal evolution. Tardigrada are placed as sisters to Arthropoda and Onychophora (velvet worms) in the superphylum Panarthropoda by morphological analyses, but many molecular phylogenies fail to recover this relationship. This tension between molecular and morphological understanding may be very revealing of the mode and patterns of evolution of major groups. Limnoterrestrial tardigrades display extreme cryptobiotic abilities, including anhydrobiosis and cryobiosis, as do bdelloid rotifers, nematodes, and other animals of the water film. These extremophile behaviors challenge understanding of normal, aqueous physiology: how does a multicellular organism avoid lethal cellular collapse in the absence of liquid water? Meiofaunal species have been reported to have elevated levels of horizontal gene transfer (HGT) events, but how important this is in evolution, and particularly in the evolution of extremophile physiology, is unclear. To address these questions, we resequenced and reassembled the genome of H. dujardini, a limnoterrestrial tardigrade that can undergo anhydrobiosis only after extensive pre-exposure to drying conditions, and compared it to the genome of R. varieornatus, a related species with tolerance to rapid desiccation. The 2 species had contrasting gene expression responses to anhydrobiosis, with major transcriptional change in H. dujardini but limited regulation in R. varieornatus. We identified few horizontally transferred genes, but some of these were shown to be involved in entry into anhydrobiosis. Whole-genome molecular phylogenies supported a Tardigrada+Nematoda relationship over Tardigrada+Arthropoda, but rare genomic changes tended to support Tardigrada+Arthropoda. PMID:28749982

  14. Comparative Genomic Analysis of Mannheimia haemolytica from Bovine Sources

    PubMed Central

    Klima, Cassidy L.; Cook, Shaun R.; Zaheer, Rahat; Laing, Chad; Gannon, Vick P.; Xu, Yong; Rasmussen, Jay; Potter, Andrew; Hendrick, Steve; Alexander, Trevor W.; McAllister, Tim A.

    2016-01-01

    Bovine respiratory disease is a common health problem in beef production. The primary bacterial agent involved, Mannheimia haemolytica, is a target for antimicrobial therapy and at risk for associated antimicrobial resistance development. The role of M. haemolytica in pathogenesis is linked to serotype with serotypes 1 (S1) and 6 (S6) isolated from pneumonic lesions and serotype 2 (S2) found in the upper respiratory tract of healthy animals. Here, we sequenced the genomes of 11 strains of M. haemolytica, representing all three serotypes and performed comparative genomics analysis to identify genetic features that may contribute to pathogenesis. Possible virulence associated genes were identified within 14 distinct prophage, including a periplasmic chaperone, a lipoprotein, peptidoglycan glycosyltransferase and a stress response protein. Prophage content ranged from 2–8 per genome, but was higher in S1 and S6 strains. A type I-C CRISPR-Cas system was identified in each strain with spacer diversity and organization conserved among serotypes. The majority of spacers occur in S1 and S6 strains and originate from phage suggesting that serotypes 1 and 6 may be more resistant to phage predation. However, two spacers complementary to the host chromosome targeting a UDP-N-acetylglucosamine 2-epimerase and a glycosyl transferases group 1 gene are present in S1 and S6 strains only indicating these serotypes may employ CRISPR-Cas to regulate gene expression to avoid host immune responses or enhance adhesion during infection. Integrative conjugative elements are present in nine of the eleven genomes. Three of these harbor extensive multi-drug resistance cassettes encoding resistance against the majority of drugs used to combat infection in beef cattle, including macrolides and tetracyclines used in human medicine. The findings here identify key features that are likely contributing to serotype related pathogenesis and specific targets for vaccine design intended to reduce the

  15. Detection of chromosomal abnormalities by comparative genomic hybridization.

    PubMed

    Lapierre, Jean-Michel; Tachdjian, Gérard

    2005-04-01

    Comparative genomic hybridization (CGH) is a modified in-situ hybridization technique. In this type of analysis, two differentially labeled genomic DNAs (study and reference) are cohybridized to normal metaphase spreads or to microarray. Chromosomal locations of copy number changes in the DNA segments of the study genome are revealed by a variable fluorescence intensity ratio along each target chromosome. Thus, CGH allows detection and mapping of DNA sequence copy differences between two genomes in a single experiment. Since its development, comparative genomic hybridization has been applied mostly as a research tool in the field of cancer cytogenetics to identify genetic changes in many previously unknown regions. It is also a powerful tool for detection and identification of unbalanced chromosomal abnormalities in prenatal, postnatal and preimplantation diagnostics. The development of comparative genomic hybridization and increase in resolution analysis by using the microarray-based technique offer new information on chromosomal pathologies and thus better management of patients.

  16. Analysis of the allohexaploid bread wheat genome (Triticum aestivum) using comparative whole genome shotgun sequencing

    USDA-ARS's Scientific Manuscript database

    The large 17 Gb allopolyploid genome of bread wheat is a major challenge for genome analysis because it is composed of three closely related and independently maintained genomes, with genes dispersed as small “islands” separated by vast tracts of repetitive DNA. We used a novel comparative genomi...

  17. Human-mouse comparative genomics: successes and failures to reveal functional regions of the human genome

    SciTech Connect

    Pennacchio, Len A.; Baroukh, Nadine; Rubin, Edward M.

    2003-05-15

    Deciphering the genetic code embedded within the human genome remains a significant challenge despite the human genome consortium's recent success at defining its linear sequence (Lander et al. 2001; Venter et al. 2001). While useful strategies exist to identify a large percentage of protein encoding regions, efforts to accurately define functional sequences in the remaining ~97 percent of the genome lag. Our primary interest has been to utilize the evolutionary relationship and the universal nature of genomic sequence information in vertebrates to reveal functional elements in the human genome. This has been achieved through the combined use of vertebrate comparative genomics to pinpoint highly conserved sequences as candidates for biological activity and transgenic mouse studies to address the functionality of defined human DNA fragments. Accordingly, we describe strategies and insights into functional sequences in the human genome through the use of comparative genomics coupled with functional studies in the mouse.

  18. AcCNET (Accessory Genome Constellation Network): comparative genomics software for accessory genome analysis using bipartite networks.

    PubMed

    Lanza, Val F; Baquero, Fernando; de la Cruz, Fernando; Coque, Teresa M

    2017-01-15

    AcCNET (Accessory genome Constellation Network) is a Perl application that aims to compare accessory genomes of a large number of genomic units, both at qualitative and quantitative levels. Using the proteomes extracted from the analysed genomes, AcCNET creates a bipartite network compatible with standard network analysis platforms. AcCNET allows merging phylogenetic and functional information about the concerned genomes, thus improving the capability of current methods of network analysis. The AcCNET bipartite network opens a new perspective to explore the pangenome of bacterial species, focusing on the accessory genome behind the idiosyncrasy of a particular strain and/or population.

  19. Comparative genomics of the lactic acid bacteria

    SciTech Connect

    Makarova, K.; Slesarev, A.; Wolf, Y.; Sorokin, A.; Mirkin, B.; Koonin, E.; Pavlov, A.; Pavlova, N.; Karamychev, V.; Polouchine, N.; Shakhova, V.; Grigoriev, I.; Lou, Y.; Rokhsar, D.; Lucas, S.; Huang, K.; Goodstein, D. M.; Hawkins, T.; Plengvidhya, V.; Welker, D.; Hughes, J.; Goh, Y.; Benson, A.; Baldwin, K.; Lee, J. -H.; Diaz-Muniz, I.; Dosti, B.; Smeianov, V; Wechter, W.; Barabote, R.; Lorca, G.; Altermann, E.; Barrangou, R.; Ganesan, B.; Xie, Y.; Rawsthorne, H.; Tamir, D.; Parker, C.; Breidt, F.; Broadbent, J.; Hutkins, R.; O'Sullivan, D.; Steele, J.; Unlu, G.; Saier, M.; Klaenhammer, T.; Richardson, P.; Kozyavkin, S.; Weimer, B.; Mills, D.

    2006-06-01

    Lactic acid-producing bacteria are associated with various plant and animal niches and play a key role in the production of fermented foods and beverages. We report nine genome sequences representing the phylogenetic and functional diversity of these bacteria. The small genomes of lactic acid bacteria encode a broad repertoire of transporters for efficient carbon and nitrogen acquisition from the nutritionally rich environments they inhabit and reflect a limited range of biosynthetic capabilities that indicate both prototrophic and auxotrophic strains. Phylogenetic analyses, comparison of gene content across the group, and reconstruction of ancestral gene sets indicate a combination of extensive gene loss and key gene acquisitions via horizontal gene transfer during the coevolution of lactic acid bacteria with their habitats.

  20. Comparative genomics of trypanosomatid parasitic protozoa.

    PubMed

    El-Sayed, Najib M; Myler, Peter J; Blandin, Gaëlle; Berriman, Matthew; Crabtree, Jonathan; Aggarwal, Gautam; Caler, Elisabet; Renauld, Hubert; Worthey, Elizabeth A; Hertz-Fowler, Christiane; Ghedin, Elodie; Peacock, Christopher; Bartholomeu, Daniella C; Haas, Brian J; Tran, Anh-Nhi; Wortman, Jennifer R; Alsmark, U Cecilia M; Angiuoli, Samuel; Anupama, Atashi; Badger, Jonathan; Bringaud, Frederic; Cadag, Eithon; Carlton, Jane M; Cerqueira, Gustavo C; Creasy, Todd; Delcher, Arthur L; Djikeng, Appolinaire; Embley, T Martin; Hauser, Christopher; Ivens, Alasdair C; Kummerfeld, Sarah K; Pereira-Leal, Jose B; Nilsson, Daniel; Peterson, Jeremy; Salzberg, Steven L; Shallom, Joshua; Silva, Joana C; Sundaram, Jaideep; Westenberger, Scott; White, Owen; Melville, Sara E; Donelson, John E; Andersson, Björn; Stuart, Kenneth D; Hall, Neil

    2005-07-15

    A comparison of gene content and genome architecture of Trypanosoma brucei, Trypanosoma cruzi, and Leishmania major, three related pathogens with different life cycles and disease pathology, revealed a conserved core proteome of about 6200 genes in large syntenic polycistronic gene clusters. Many species-specific genes, especially large surface antigen families, occur at nonsyntenic chromosome-internal and subtelomeric regions. Retroelements, structural RNAs, and gene family expansion are often associated with syntenic discontinuities that, along with gene divergence, acquisition and loss, and rearrangement within the syntenic regions, have shaped the genomes of each parasite. Contrary to recent reports, our analyses reveal no evidence that these species are descended from an ancestor that contained a photosynthetic endosymbiont.

  1. An Efficient and Robust Statistical Modeling Approach to Discover Differentially Expressed Genes Using Genomic Expression Profiles

    PubMed Central

    Thomas, Jeffrey G.; Olson, James M.; Tapscott, Stephen J.; Zhao, Lue Ping

    2001-01-01

    We have developed a statistical regression modeling approach to discover genes that are differentially expressed between two predefined sample groups in DNA microarray experiments. Our model is based on well-defined assumptions, uses rigorous and well-characterized statistical measures, and accounts for the heterogeneity and genomic complexity of the data. In contrast to cluster analysis, which attempts to define groups of genes and/or samples that share common overall expression profiles, our modeling approach uses known sample group membership to focus on expression profiles of individual genes in a sensitive and robust manner. Further, this approach can be used to test statistical hypotheses about gene expression. To demonstrate this methodology, we compared the expression profiles of 11 acute myeloid leukemia (AML) and 27 acute lymphoblastic leukemia (ALL) samples from a previous study (Golub et al. 1999) and found 141 genes differentially expressed between AML and ALL with a 1% significance at the genomic level. Using this modeling approach to compare different sample groups within the AML samples, we identified a group of genes whose expression profiles correlated with that of thrombopoietin and found that genes whose expression associated with AML treatment outcome lie in recurrent chromosomal locations. Our results are compared with those obtained using t-tests or Wilcoxon rank sum statistics. PMID:11435405
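    The abstract compares its regression model against per-gene t-tests. As a rough illustration of that baseline only (not of the authors' regression approach), the sketch below scores each gene with Welch's t-statistic between two predefined sample groups. The function names and the input layout (a dict mapping each gene to one value per sample, plus a parallel list of group labels) are assumptions made for this example.

```python
import math
from statistics import mean, variance

def welch_t(a, b):
    """Welch's two-sample t-statistic; assumes each group has at least two
    values and that the groups are not both constant."""
    standard_error = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / standard_error

def rank_genes(expression, labels, group_a, group_b):
    """Return (gene, t) pairs sorted by decreasing |t|.

    expression: dict of gene name -> list of expression values, one per sample
    labels:     list of group labels, in the same sample order
    """
    scores = {}
    for gene, values in expression.items():
        a = [v for v, lab in zip(values, labels) if lab == group_a]
        b = [v for v, lab in zip(values, labels) if lab == group_b]
        scores[gene] = welch_t(a, b)
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)
```

    A genome-wide screen would still need a multiple-testing correction on top of these per-gene scores, which this sketch leaves out.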

  2. GenColors-based comparative genome databases for small eukaryotic genomes.

    PubMed

    Felder, Marius; Romualdi, Alessandro; Petzold, Andreas; Platzer, Matthias; Sühnel, Jürgen; Glöckner, Gernot

    2013-01-01

    Many sequence data repositories can give a quick and easily accessible overview on genomes and their annotations. Less widespread is the possibility to compare related genomes with each other in a common database environment. We have previously described the GenColors database system (http://gencolors.fli-leibniz.de) and its applications to a number of bacterial genomes such as Borrelia, Legionella, Leptospira and Treponema. This system has an emphasis on genome comparison. It combines data from related genomes and provides the user with an extensive set of visualization and analysis tools. Eukaryote genomes are normally larger than prokaryote genomes and thus pose additional challenges for such a system. We have, therefore, adapted GenColors to also handle larger datasets of small eukaryotic genomes and to display eukaryotic gene structures. Further recent developments include whole genome views, genome list options and, for bacterial genome browsers, the display of horizontal gene transfer predictions. Two new GenColors-based databases for two fungal species (http://fgb.fli-leibniz.de) and for four social amoebas (http://sacgb.fli-leibniz.de) were set up. Both new resources open up a single entry point for related genomes for the amoebozoa and fungal research communities and other interested users. Comparative genomics approaches are greatly facilitated by these resources.

  3. GenColors-based comparative genome databases for small eukaryotic genomes

    PubMed Central

    Felder, Marius; Romualdi, Alessandro; Petzold, Andreas; Platzer, Matthias; Sühnel, Jürgen; Glöckner, Gernot

    2013-01-01

    Many sequence data repositories can give a quick and easily accessible overview on genomes and their annotations. Less widespread is the possibility to compare related genomes with each other in a common database environment. We have previously described the GenColors database system (http://gencolors.fli-leibniz.de) and its applications to a number of bacterial genomes such as Borrelia, Legionella, Leptospira and Treponema. This system has an emphasis on genome comparison. It combines data from related genomes and provides the user with an extensive set of visualization and analysis tools. Eukaryote genomes are normally larger than prokaryote genomes and thus pose additional challenges for such a system. We have, therefore, adapted GenColors to also handle larger datasets of small eukaryotic genomes and to display eukaryotic gene structures. Further recent developments include whole genome views, genome list options and, for bacterial genome browsers, the display of horizontal gene transfer predictions. Two new GenColors-based databases for two fungal species (http://fgb.fli-leibniz.de) and for four social amoebas (http://sacgb.fli-leibniz.de) were set up. Both new resources open up a single entry point for related genomes for the amoebozoa and fungal research communities and other interested users. Comparative genomics approaches are greatly facilitated by these resources. PMID:23193285

  4. Comparative genetics and genomics of nematodes: genome structure, development, and lifestyle.

    PubMed

    Sommer, Ralf J; Streit, Adrian

    2011-01-01

    Nematodes are found in virtually all habitats on earth. Many of them are parasites of plants and animals, including humans. The free-living nematode, Caenorhabditis elegans, is one of the genetically best-studied model organisms and was the first metazoan whose genome was fully sequenced. In recent years, the draft genome sequences of another six nematodes representing four of the five major clades of nematodes were published. Compared to mammalian genomes, all these genomes are very small. Nevertheless, they contain almost the same number of genes as the human genome. Nematodes are therefore a very attractive system for comparative genetic and genomic studies, with C. elegans as an excellent baseline. Here, we review the efforts that were made to extend genetic analysis to nematodes other than C. elegans, and we compare the seven available nematode genomes. One of the most striking findings is the unexpectedly high incidence of gene acquisition through horizontal gene transfer (HGT).

  5. Ten years of bacterial genome sequencing: comparative-genomics-based discoveries.

    PubMed

    Binnewies, Tim T; Motro, Yair; Hallin, Peter F; Lund, Ole; Dunn, David; La, Tom; Hampson, David J; Bellgard, Matthew; Wassenaar, Trudy M; Ussery, David W

    2006-07-01

    It has been more than 10 years since the first bacterial genome sequence was published. Hundreds of bacterial genome sequences are now available for comparative genomics, and searching a given protein against more than a thousand genomes will soon be possible. This review addresses a relatively straightforward question: "What have we learned from this vast amount of new genomic data?" Perhaps one of the most important lessons has been that genetic diversity, at the level of large-scale variation amongst even genomes of the same species, is far greater than was thought. The classical textbook view of evolution relying on the relatively slow accumulation of mutational events at the level of individual bases scattered throughout the genome has changed. One of the most obvious conclusions from examining the sequences from several hundred bacterial genomes is the enormous amount of diversity--even in different genomes from the same bacterial species. This diversity is generated by a variety of mechanisms, including mobile genetic elements and bacteriophages. An examination of the 20 Escherichia coli genomes sequenced so far dramatically illustrates this, with the genome size ranging from 4.6 to 5.5 Mbp; much of the variation appears to be of phage origin. This review also addresses mobile genetic elements, including pathogenicity islands and the structure of transposable elements. There are at least 20 different methods available to compare bacterial genomes. Metagenomics offers the chance to study genomic sequences found in ecosystems, including genomes of species that are difficult to culture. It has become clear that a genome sequence represents more than just a collection of gene sequences for an organism and that information concerning the environment and growth conditions for the organism is important for interpretation of the genomic data. 
The newly proposed Minimal Information about a Genome Sequence standard has been developed to obtain this

  6. Computational Methods for the Analysis of Array Comparative Genomic Hybridization

    PubMed Central

    Chari, Raj; Lockwood, William W.; Lam, Wan L.

    2006-01-01

    Array comparative genomic hybridization (array CGH) is a technique for assaying the copy number status of cancer genomes. The widespread use of this technology has led to a rapid accumulation of high-throughput data, which in turn has prompted the development of computational strategies for the analysis of array CGH data. Here we explain the principles behind array image processing, data visualization and genomic profile analysis, review currently available software packages, and raise considerations for future software development. PMID:17992253

  7. Unclassified renal cell carcinoma: a clinicopathological, comparative genomic hybridization, and whole-genome exon sequencing study

    PubMed Central

    Hu, Zhen-Yan; Pang, Li-Juan; Qi, Yan; Kang, Xue-Ling; Hu, Jian-Ming; Wang, Lianghai; Liu, Kun-Peng; Ren, Yuan; Cui, Mei; Song, Li-Li; Li, Hong-An; Zou, Hong; Li, Feng

    2014-01-01

    Unclassified renal cell carcinoma (URCC) is a rare variant of RCC, accounting for only 3-5% of all cases. Studies on the molecular genetics of URCC are limited, and hence, we report on 2 cases of URCC analyzed using comparative genome hybridization (CGH) and the genome-wide human exon GeneChip technique to identify the genomic alterations of URCC. Both URCC patients (mean age, 72 years) presented at an advanced stage and died within 30 months post-surgery. Histologically, the URCCs were composed of undifferentiated, multinucleated, giant cells with eosinophilic cytoplasm. Immunostaining revealed that both URCC cases had strong p53 protein expression and partial expression of cluster of differentiation-10 and cytokeratin. The CGH profiles showed chromosomal imbalances in both URCC cases: gains were observed in chromosomes 1p11-12, 1q12-13, 2q20-23, 3q22-23, 8p12, and 16q11-15, whereas losses were detected on chromosomes 1q22-23, 3p12-22, 5p30-ter, 6p, 11q, 16q18-22, 17p12-14, and 20p. Compared with 18 normal renal tissues, 40 mutated genes were detected in the URCC tissues, including 32 missense and 8 silent mutations. Functional enrichment analysis revealed that the missense mutation genes were involved in 11 different biological processes and pathways, including cell cycle regulation, lipid localization and transport, neuropeptide signaling, organic ether metabolism, and ATP-binding cassette transporter signaling. Our findings indicate that URCC may be a highly aggressive cancer, and the genetic alterations identified herein may provide clues regarding the tumorigenesis of URCC and serve as a basis for the development of targeted therapies against URCC in the future. PMID:25120763

  8. Comparative Genomics of an Emerging Amphibian Virus

    PubMed Central

    Epstein, Brendan; Storfer, Andrew

    2015-01-01

    Ranaviruses, a genus of the Iridoviridae, are large double-stranded DNA viruses that infect cold-blooded vertebrates worldwide. Ranaviruses have caused severe epizootics in commercial frog and fish populations, and are currently classified as notifiable pathogens in international trade. Previous work shows that a ranavirus that infects tiger salamanders throughout Western North America (Ambystoma tigrinum virus, or ATV) is in high prevalence among salamanders in the fishing bait trade. Bait ATV strains have elevated virulence and are transported long distances by humans, providing widespread opportunities for pathogen pollution. We sequenced the genomes of 15 strains of ATV collected from tiger salamanders across western North America and performed phylogenetic and population genomic analyses and tests for recombination. We find that ATV forms a monophyletic clade within the rest of the Ranaviruses and that it likely emerged within the last several thousand years, before human activities influenced its spread. We also identify several genes under strong positive selection, some of which appear to be involved in viral virulence and/or host immune evasion. In addition, we provide support for the pathogen pollution hypothesis with evidence of recombination among ATV strains, and potential bait-endemic strain recombination. PMID:26530419

  9. Comparative Genomics of an Emerging Amphibian Virus.

    PubMed

    Epstein, Brendan; Storfer, Andrew

    2015-11-03

    Ranaviruses, a genus of the Iridoviridae, are large double-stranded DNA viruses that infect cold-blooded vertebrates worldwide. Ranaviruses have caused severe epizootics in commercial frog and fish populations, and are currently classified as notifiable pathogens in international trade. Previous work shows that a ranavirus that infects tiger salamanders throughout Western North America (Ambystoma tigrinum virus, or ATV) is in high prevalence among salamanders in the fishing bait trade. Bait ATV strains have elevated virulence and are transported long distances by humans, providing widespread opportunities for pathogen pollution. We sequenced the genomes of 15 strains of ATV collected from tiger salamanders across western North America and performed phylogenetic and population genomic analyses and tests for recombination. We find that ATV forms a monophyletic clade within the rest of the Ranaviruses and that it likely emerged within the last several thousand years, before human activities influenced its spread. We also identify several genes under strong positive selection, some of which appear to be involved in viral virulence and/or host immune evasion. In addition, we provide support for the pathogen pollution hypothesis with evidence of recombination among ATV strains, and potential bait-endemic strain recombination.

  10. Development and characterization of genomic and expressed SSRs in citrus by genome-wide analysis.

    PubMed

    Liu, Sheng-Rui; Li, Wen-Yang; Long, Dang; Hu, Chun-Gen; Zhang, Jin-Zhi

    2013-01-01

    Microsatellites or simple sequence repeats (SSRs) are one of the most popular sources of genetic markers and play a significant role in plant genetics and breeding. In this study, we identified citrus SSRs in the genome of Clementine mandarin and analyzed their frequency and distribution in different genomic regions. A total of 80,708 SSRs were detected in the genome with an overall density of 268 SSRs/Mb. While di-nucleotide repeats were the most frequent microsatellites in genomic DNA sequence, tetra-nucleotides, which had more repeat units than any other SSR types, had the highest cumulative sequence length. Among 33,929 Clementine mandarin transcripts, we identified 6,834 transcripts containing 8,989 SSRs, among which tri-nucleotide motifs (36.0%) were the most common, followed by di-nucleotide (26.9%) and hexa-nucleotide motifs (15.1%). The motif AG (16.7%) was most abundant among these SSRs, while motifs AAG (6.6%), AAT (5.0%), and TAG (2.2%) were most common among tri-nucleotides. Functional categorization of transcripts containing SSRs revealed that 5,879 (86.0%) of such transcripts had homology with known proteins. GO and KEGG annotation revealed that transcripts containing SSRs were implicated in diverse biological processes in plants, including binding, development, transcription, and protein degradation. When 27 genomic and 78 randomly selected SSRs were tested on Clementine mandarin, 95 SSRs revealed polymorphism. These 95 SSRs were further deployed on 18 genotypes of the three genera of Rutaceae for genetic diversity assessment; genomic SSRs generally show low transferability in comparison to SSRs developed from expressed sequences. These transcript-markers identified in our study may provide a valuable genetic and genomic tool for further genetic research and varietal development in citrus, such as diversity study, QTL mapping, molecular breeding, comparative mapping and other genetic analyses.
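    The kind of genome-wide SSR scan summarized above can be sketched with a regular-expression search for perfect tandem repeats. This is an illustration only: the abstract does not say which tool or thresholds the authors used, so the minimum repeat counts in `MIN_REPEATS` and all function names here are hypothetical.

```python
import re

# Hypothetical minimum repeat counts per motif length (di- to hexa-nucleotide).
MIN_REPEATS = {2: 6, 3: 5, 4: 4, 5: 4, 6: 4}

def is_primitive(motif):
    """True if the motif is not itself a repeat of a shorter unit (e.g. 'AGAG')."""
    n = len(motif)
    return not any(n % k == 0 and motif[:k] * (n // k) == motif
                   for k in range(1, n))

def find_ssrs(seq):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs."""
    seq = seq.upper()
    hits = []
    for k, min_rep in MIN_REPEATS.items():
        # A k-mer followed by at least (min_rep - 1) further copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_rep - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            if is_primitive(motif):
                hits.append((match.start(), motif, len(match.group(0)) // k))
    return sorted(hits)

def density_per_mb(n_ssrs, sequence_bp):
    """SSR density expressed as SSRs per megabase of sequence."""
    return n_ssrs / (sequence_bp / 1_000_000)
```

    The `is_primitive` filter keeps a run such as AGAGAGAG... from also being reported as a tetra-nucleotide repeat of AGAG, so each run is counted once under its shortest motif.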

  11. Comparative genomics of pectinacetylesterases: Insight on function and biology

    PubMed Central

    de Souza, Amancio José; Pauly, Markus

    2015-01-01

    Pectin acetylation influences the gelling ability of this important plant polysaccharide for the food industry. Plant apoplastic pectinacetylesterases (PAEs) play a key role in regulating the degree of pectin acetylation and modifying their expression thus represents one way to engineer plant polysaccharides for food applications. Identifying the major active enzymes within the PAE gene family will aid in our understanding of this biological phenomenon as well as provide the tools for direct trait manipulation. Using comparative genomics we propose that there is a minimal set of 4 distinct PAEs in plants. Possible functional diversification of the PAE family in the grasses is also explored with the identification of 3 groups of PAE genes specific to grasses. PMID:26237162

  12. Complete Genome Sequence and Comparative Genomics of a Novel Myxobacterium Myxococcus hansupus.

    PubMed

    Sharma, Gaurav; Narwani, Tarun; Subramanian, Srikrishna

    2016-01-01

    Myxobacteria, a group of Gram-negative aerobes, belong to the class δ-proteobacteria and order Myxococcales. Unlike anaerobic δ-proteobacteria, they exhibit several unusual physiogenomic properties like gliding motility, desiccation-resistant myxospores and large genomes with high coding density. Here we report a 9.5 Mbp complete genome of Myxococcus hansupus that encodes 7,753 proteins. Phylogenomic and genome-genome distance based analysis suggest that Myxococcus hansupus is a novel member of the genus Myxococcus. Comparative genome analysis with other members of the genus Myxococcus was performed to explore their genome diversity. The variation in number of unique proteins observed across different species is suggestive of diversity at the genus level while the overrepresentation of several Pfam families indicates the extent and mode of genome expansion as compared to non-Myxococcales δ-proteobacteria.

  13. Complete Genome Sequence and Comparative Genomics of a Novel Myxobacterium Myxococcus hansupus

    PubMed Central

    Sharma, Gaurav; Narwani, Tarun; Subramanian, Srikrishna

    2016-01-01

    Myxobacteria, a group of Gram-negative aerobes, belong to the class δ-proteobacteria and order Myxococcales. Unlike anaerobic δ-proteobacteria, they exhibit several unusual physiogenomic properties like gliding motility, desiccation-resistant myxospores and large genomes with high coding density. Here we report a 9.5 Mbp complete genome of Myxococcus hansupus that encodes 7,753 proteins. Phylogenomic and genome-genome distance based analysis suggest that Myxococcus hansupus is a novel member of the genus Myxococcus. Comparative genome analysis with other members of the genus Myxococcus was performed to explore their genome diversity. The variation in number of unique proteins observed across different species is suggestive of diversity at the genus level while the overrepresentation of several Pfam families indicates the extent and mode of genome expansion as compared to non-Myxococcales δ-proteobacteria. PMID:26900859

  14. A phylogenetic foundation for comparative mammalian genomics.

    PubMed

    Waddell, P J; Kishino, H; Ota, R

    2001-01-01

    A major effort is being undertaken to sequence an array of mammalian genomes. Coincidentally, the evolutionary relationships of the 18 presently recognized orders of placental mammals are only just being resolved. In this work we construct and analyse the largest alignments of amino acid sequence data to date. Our findings allow us to set up a series of superordinal groups (clades) to act as prior hypotheses for further testing. Important findings include strong evidence for a clade of Euarchonta+Glires (=Supraprimates) comprised of primates, flying lemurs, tree shrews, lagomorphs and rodents. In addition, there is good evidence for a clade of all placental mammals except Xenarthra and Afrotheria (=Boreotheria) and for the previously recognised clades Laurasiatheria, Scrotifera, Fereuungulata, Ferae, Afrotheria, Euarchonta, Glires, and Eulipotyphla. Accordingly, a revised classification of the placental mammals is put forward. Using this and molecular divergence-time methods, the ages of the superordinal splits are estimated. While results are strongly consistent with the earliest superordinal divergences all being >65 mybp (Cretaceous period), they suffer from greater uncertainty than presently appreciated. The early primate split of tarsiers from the anthropoid lineage at ~55 mybp is seen to be an especially informative fossil calibration point. A statistical framework for testing clades using SINE data is presented and reveals significant support for the tarsier/anthropoid clade, as well as the clades Cetruminantia and Whippomorpha. Results also underline our thesis that while sequence analysis can help set up hypothesised clades, SINEs obtainable from sequencing 1-2 MB regions of placental genomes are essential to testing them. In contrast, derivations suggest that empirical Bayesian methods for sequence data may not be robust estimators of clades. 
Our findings, including the study of genes such as TP53, make a good case for the tree shrew as a closer relative

  15. Hemipteran genomics and psyllid gene expression

    USDA-ARS's Scientific Manuscript database

    One of the best tools currently available is the application of genomics to insect pest problems. Genomics provides rapid elucidation of the genetic basis of insect biology. Research on psyllid genomics, while still in its infancy, is providing information which will aid strategies to suppress...

  16. Comparative genomics of autism and schizophrenia

    PubMed Central

    Crespi, Bernard; Stead, Philip; Elliot, Michael

    2010-01-01

    We used data from studies of copy-number variants (CNVs), single-gene associations, growth-signaling pathways, and intermediate phenotypes associated with brain growth to evaluate four alternative hypotheses for the genomic and developmental relationships between autism and schizophrenia: (i) autism subsumed in schizophrenia, (ii) independence, (iii) diametric, and (iv) partial overlap. Data from CNVs provide statistical support for the hypothesis that autism and schizophrenia are associated with reciprocal variants, such that at four loci, deletions predispose to one disorder, whereas duplications predispose to the other. Data from single-gene studies are inconsistent with a hypothesis based on independence, in that autism and schizophrenia share associated genes more often than expected by chance. However, differentiation between the partial overlap and diametric hypotheses using these data is precluded by limited overlap in the specific genetic markers analyzed in both autism and schizophrenia. Evidence from the effects of risk variants on growth-signaling pathways shows that autism-spectrum conditions tend to be associated with up-regulation of pathways due to loss of function mutations in negative regulators, whereas schizophrenia is associated with reduced pathway activation. Finally, data from studies of head and brain size phenotypes indicate that autism is commonly associated with developmentally-enhanced brain growth, whereas schizophrenia is characterized, on average, by reduced brain growth. These convergent lines of evidence appear most compatible with the hypothesis that autism and schizophrenia represent diametric conditions with regard to their genomic underpinnings, neurodevelopmental bases, and phenotypic manifestations as reflecting under-development versus dysregulated over-development of the human social brain. PMID:19955444

  17. Regulation of methane genes and genome expression

    SciTech Connect

    John N. Reeve

    2009-09-09

    At the start of this project, it was known that methanogens were Archaebacteria (now Archaea) and were therefore predicted to have gene expression and regulatory systems different from Bacteria, but few of the molecular biology details were established. The goals were then to establish the structures and organizations of genes in methanogens, and to develop the genetic technologies needed to investigate and dissect methanogen gene expression and regulation in vivo. By cloning and sequencing, we established the gene and operon structures of all of the “methane” genes that encode the enzymes that catalyze methane biosynthesis from carbon dioxide and hydrogen. This work identified unique sequences in the methane gene that we designated mcrA, which encodes the largest subunit of methyl-coenzyme M reductase, that could be used to identify methanogen DNA and establish methanogen phylogenetic relationships. McrA sequences are now the accepted standard and used extensively as hybridization probes to identify and quantify methanogens in environmental research. With the methane genes in hand, we used northern blot and then later whole-genome microarray hybridization analyses to establish how growth phase and substrate availability regulated methane gene expression in Methanobacterium thermautotrophicus ΔH (now Methanothermobacter thermautotrophicus). Isoenzymes or pairs of functionally equivalent enzymes catalyze several steps in the hydrogen-dependent reduction of carbon dioxide to methane. We established that hydrogen availability determines which of these pairs of methane genes is expressed and therefore which of the alternative enzymes is employed to catalyze methane biosynthesis under different environmental conditions. As we were unable to establish a reliable genetic system for M. thermautotrophicus, we developed in vitro transcription as an alternative system to investigate methanogen gene expression and regulation. This led to the discovery that an archaeal protein

  18. Gramene 2016: comparative plant genomics and pathway resources

    USDA-ARS's Scientific Manuscript database

    Gramene (http://www.gramene.org) is an online resource for comparative functional genomics in crops and model plant species. Its two main frameworks are genomes (collaboration with Ensembl Plants) and pathways (The Plant Reactome and archival BioCyc databases). Since our last NAR update, the data...

  19. Phytozome: a comparative platform for green plant genomics.

    PubMed

    Goodstein, David M; Shu, Shengqiang; Howson, Russell; Neupane, Rochak; Hayes, Richard D; Fazo, Joni; Mitros, Therese; Dirks, William; Hellsten, Uffe; Putnam, Nicholas; Rokhsar, Daniel S

    2012-01-01

    The number of sequenced plant genomes and associated genomic resources is growing rapidly with the advent of both an increased focus on plant genomics from funding agencies, and the application of inexpensive next generation sequencing. To interact with this increasing body of data, we have developed Phytozome (http://www.phytozome.net), a comparative hub for plant genome and gene family data and analysis. Phytozome provides a view of the evolutionary history of every plant gene at the level of sequence, gene structure, gene family and genome organization, while at the same time providing access to the sequences and functional annotations of a growing number (currently 25) of complete plant genomes, including all the land plants and selected algae sequenced at the Joint Genome Institute, as well as selected species sequenced elsewhere. Through a comprehensive plant genome database and web portal, these data and analyses are available to the broader plant science research community, providing powerful comparative genomics tools that help to link model systems with other plants of economic and ecological importance.

  20. Comparative genomic hybridizations reveal absence of large Streptomyces coelicolor genomic islands in Streptomyces lividans

    PubMed Central

    Jayapal, Karthik P; Lian, Wei; Glod, Frank; Sherman, David H; Hu, Wei-Shou

    2007-01-01

    Background The genomes of Streptomyces coelicolor and Streptomyces lividans bear a considerable degree of synteny. While S. coelicolor is the model streptomycete for studying antibiotic synthesis and differentiation, S. lividans is almost exclusively considered as the preferred host, among actinomycetes, for cloning and expression of exogenous DNA. We used whole genome microarrays as a comparative genomics tool for identifying the subtle differences between these two chromosomes. Results We identified five large S. coelicolor genomic islands (larger than 25 kb) and 18 smaller islets absent from the S. lividans chromosome. Many of these regions show anomalous GC bias and codon usage patterns. Six of them are in close vicinity of tRNA genes while nine are flanked with near perfect repeat sequences, indicating that these are probable recent evolutionary acquisitions into S. coelicolor. Embedded within these segments are at least four DNA methylases and two probable methyl-sensing restriction endonucleases. Comparison with S. coelicolor transcriptome and proteome data revealed that some of the missing genes are active during the course of growth and differentiation in S. coelicolor. In particular, a pair of methylmalonyl CoA mutase (mcm) genes involved in polyketide precursor biosynthesis, an acyl-CoA dehydrogenase implicated in timing of actinorhodin synthesis, and bldB, a developmentally significant regulator whose mutation causes complete abrogation of antibiotic synthesis, belong to this category. Conclusion Our findings provide tangible hints for elucidating the genetic basis of important phenotypic differences between these two streptomycetes. Importantly, absence of certain genes in S. lividans identified here could potentially explain the relative ease of DNA transformations and the conditional lack of actinorhodin synthesis in S. lividans. PMID:17623098

  1. Functional and Comparative Genomics of Lignocellulose Degradation by Schizophyllum commune

    SciTech Connect

    Ohm, Robin A.; Lee, Hanbyul; Park, Hongjae; Brewer, Heather M.; Carver, Akiko; Copeland, Alex; Grimwood, Jane; Lindquist, Erika; Lipzen, Anna; Martin, Joel; Purvine, Samuel O.; Schackwitz, Wendy; Tegelaar, Martin; Tritt, Andrew; Baker, Scott; Choi, In-Geol; Lugones, Luis G.; Wosten, Han A. B.; Grigoriev, Igor V.

    2014-03-14

    The Basidiomycete fungus Schizophyllum commune is a wood-decaying fungus and is used as a model system to study lignocellulose degradation. Version 3.0 of the genome assembly filled 269 of 316 sequence gaps and added 680 kb of sequence. This new assembly was reannotated using RNAseq transcriptomics data, and this resulted in 3110 (24%) more genes. Two additional S. commune strains with different wood-decaying properties were sequenced, from Tattone (France) and Loenen (The Netherlands). Sequence comparison shows remarkably high sequence diversity between the strains. The overall SNP rate of > 100 SNPs/kb is among the highest rates of within-species polymorphisms in Basidiomycetes. Some well-described proteins like hydrophobins and transcription factors have less than 70% sequence identity among the strains. Some chromosomes are better conserved than others and in some cases large parts of chromosomes are missing from one or more strains. Gene expression on glucose, cellulose and wood was analyzed in two S. commune strains. Overall, gene expression correlated between the two strains, but there were some notable exceptions. Of particular interest are CAZymes (carbohydrate-active enzymes) that are regulated in different ways in the different strains. In both strains the transcription factor Fsp1 was strongly up-regulated during growth on cellulose and wood, when compared to glucose. Over-expression of Fsp1 using a constitutive promoter resulted in higher cellulose and xylose-degrading enzyme activity, which suggests that Fsp1 is involved in regulating CAZyme gene expression. Two CAZyme genes (of family GH61 and GH11) were shown to be strongly up-regulated during growth on cellulose, compared to glucose. Proteomics on the secreted proteins in the growth medium confirmed this. A promoter analysis revealed the shortest active promoters for these two genes, as well as putative transcription factor binding sites.

  2. In silico comparative genomic analysis of GABAA receptor transcriptional regulation.

    PubMed

    Joyce, Christopher J

    2007-06-30

    Subtypes of the GABAA receptor subunit exhibit diverse temporal and spatial expression patterns. In silico comparative analysis was used to predict transcriptional regulatory features in individual mammalian GABAA receptor subunit genes, and to identify potential transcriptional regulatory components involved in the coordinate regulation of the GABAA receptor gene clusters. Previously unreported putative promoters were identified for the beta2, gamma1, gamma3, epsilon, theta and pi subunit genes. Putative core elements and proximal transcriptional factors were identified within these predicted promoters, and within the experimentally determined promoters of other subunit genes. Conserved intergenic regions of sequence in the mammalian GABAA receptor gene cluster comprising the alpha1, beta2, gamma2 and alpha6 subunits were identified as potential long range transcriptional regulatory components involved in the coordinate regulation of these genes. A region of predicted DNase I hypersensitive sites within the cluster may contain transcriptional regulatory features coordinating gene expression. A novel model is proposed for the coordinate control of the gene cluster and parallel expression of the alpha1 and beta2 subunits, based upon the selective action of putative Scaffold/Matrix Attachment Regions (S/MARs). The putative regulatory features identified by genomic analysis of GABAA receptor genes were substantiated by cross-species comparative analysis and now require experimental verification. The proposed model for the coordinate regulation of genes in the cluster accounts for the head-to-head orientation and parallel expression of the alpha1 and beta2 subunit genes, and for the disruption of transcription caused by insertion of a neomycin gene in the close vicinity of the alpha6 gene, which is proximal to a putative critical S/MAR.

  3. Sinbase: an integrated database to study genomics, genetics and comparative genomics in Sesamum indicum.

    PubMed

    Wang, Linhai; Yu, Jingyin; Li, Donghua; Zhang, Xiurong

    2015-01-01

    Sesame (Sesamum indicum L.) is an ancient and important oilseed crop grown widely in tropical and subtropical areas. It belongs to the gigantic order Lamiales, which includes many well-known or economically important species, such as olive (Olea europaea), leonurus (Leonurus japonicus) and lavender (Lavandula spica), many of which have important pharmacological properties. Despite their importance, genetic and genomic analyses on these species have been insufficient due to a lack of reference genome information. The now available S. indicum genome will provide an unprecedented opportunity for studying both S. indicum genetic traits and comparative genomics. To deliver S. indicum genomic information to the worldwide research community, we designed Sinbase, a web-based database with comprehensive sesame genomic, genetic and comparative genomic information. Sinbase includes sequences of assembled sesame pseudomolecular chromosomes, protein-coding genes (27,148), transposable elements (372,167) and non-coding RNAs (1,748). In particular, Sinbase provides unique and valuable information on colinear regions with various plant genomes, including Arabidopsis thaliana, Glycine max, Vitis vinifera and Solanum lycopersicum. Sinbase also provides a useful search function and data mining tools, including a keyword search and local BLAST service. Sinbase will be updated regularly with new features, improvements to genome annotation and new genomic sequences, and is freely accessible at http://ocri-genomics.org/Sinbase/.

  4. Comparative genomic analysis of Acinetobacter oleivorans DR1 to determine strain-specific genomic regions and gentisate biodegradation.

    PubMed

    Jung, Jaejoon; Madsen, Eugene L; Jeon, Che Ok; Park, Woojun

    2011-10-01

    The comparative genomics of Acinetobacter oleivorans DR1 assayed with A. baylyi ADP1, A. calcoaceticus PHEA-2, and A. baumannii ATCC 17978 revealed that the incorporation of phage-related genomic regions and the absence of transposable elements have contributed to the large size (4.15 Mb) of the DR1 genome. A horizontally transferred genomic region and a higher proportion of transcriptional regulator- and signal peptide-coding genes were identified as characteristics of the DR1 genome. Incomplete glucose metabolism, metabolic pathways of aromatic compounds, biofilm formation, antibiotics and metal resistance, and natural competence genes were conserved in four compared genomes. Interestingly, only strain DR1 possesses gentisate 1,2-dioxygenase (nagI) and grows on gentisate, whereas other species cannot. Expression of the nagI gene was upregulated during gentisate utilization, and four downstream open reading frames (ORFs) were cotranscribed, supporting the notion that gentisate metabolism is a unique characteristic of strain DR1. The genomic analysis of strain DR1 provides additional insights into the function, ecology, and evolution of Acinetobacter species.

  5. Reference-Free Comparative Genomics of 174 Chloroplasts

    PubMed Central

    Kua, Chai-Shian; Ruan, Jue; Harting, John; Ye, Cheng-Xi; Helmus, Matthew R.; Yu, Jun; Cannon, Charles H.

    2012-01-01

    Direct analysis of unassembled genomic data could greatly increase the power of short read DNA sequencing technologies and allow comparative genomics of organisms without a completed reference available. Here, we compare 174 chloroplasts by analyzing the taxonomic distribution of short kmers across genomes [1]. We then assemble de novo contigs centered on informative variation. The localized de novo contigs can be separated into two major classes: tip = unique to a single genome and group = shared by a subset of genomes. Prior to assembly, we found that ∼18% of the chloroplast was duplicated in the inverted repeat (IR) region across a four-fold difference in genome sizes, from a highly reduced parasitic orchid [2] to a massive algal chloroplast [3], including gnetophytes [4] and cycads [5]. The conservation of this ratio between single copy and duplicated sequence was basal among green plants, independent of photosynthesis and mechanism of genome size change, and different in gymnosperms and lower plants. Major lineages in the angiosperm clade differed in the pattern of shared kmers and de novo contigs. For example, parasitic plants demonstrated an expected accelerated overall rate of evolution, while the hemi-parasitic genomes contained a great deal more novel sequence than holo-parasitic plants, suggesting different mechanisms at different stages of genomic contraction. Additionally, the legumes are diverging more quickly and in different ways than other major families. Small duplicated fragments of the rrn23 genes were deeply conserved among seed plants, including among several species without the IR regions, indicating a crucial functional role of this duplication. Localized de novo assembly of informative kmers greatly reduces the complexity of large comparative analyses by confining the analysis to a small partition of data and genomes relevant to the specific question, allowing direct analysis of next-gen sequence data from previously unstudied
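
    The tip/group distinction described in this record can be illustrated with a minimal sketch. It is not the authors' pipeline: it assumes short assembled sequences rather than unassembled reads, ignores reverse complements, and adds a hypothetical third label, "core", for k-mers found in every genome. The function names and the toy sequences are invented for illustration.

```python
from collections import defaultdict


def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def classify_kmers(genomes, k):
    """Label each k-mer by how many genomes carry it.

    'tip'   = present in exactly one genome
    'group' = shared by some, but not all, genomes
    'core'  = present in every genome (label assumed here, not from the record)
    """
    carriers = defaultdict(set)
    for name, seq in genomes.items():
        for kmer in kmers(seq, k):
            carriers[kmer].add(name)

    classes = {}
    for kmer, names in carriers.items():
        if len(names) == 1:
            classes[kmer] = "tip"
        elif len(names) < len(genomes):
            classes[kmer] = "group"
        else:
            classes[kmer] = "core"
    return classes


# Toy input: three made-up sequences standing in for genomes.
toy_genomes = {"a": "ACGTAC", "b": "ACGTTT", "c": "ACGGGG"}
toy_classes = classify_kmers(toy_genomes, k=3)
```

    In this toy input, "ACG" occurs in all three sequences, "CGT" in two, and "GTA" in only one, so they fall into the three different classes.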

  6. Comparative Genomics and Extensive Recombinations in Phage Communities

    NASA Astrophysics Data System (ADS)

    Poisson, Guylaine; Belcaid, Mahdi; Bergeron, Anne

    Comparing the genomes of two closely related viruses often produces mosaics where nearly identical sequences alternate with sequences that are unique to each genome. When several closely related genomes are compared, the unique sequences are likely to be shared with third genomes, leading to virus mosaic communities. Here we present comparative analysis of sets of Staphylococcus aureus phages that share large identical sequences with up to three other genomes, and with different partners along their genomes. We introduce mosaic graphs to represent these complex recombination events, and use them to illustrate the breadth and depth of sequence sharing: some genomes are almost completely made up of shared sequences, while genomes that share very large identical sequences can adopt alternate functional modules. Mosaic graphs also allow us to identify breakpoints that could eventually be used for the construction of recombination networks. These findings have several implications on phage metagenomics assembly, on the horizontal gene transfer paradigm, and more generally on the understanding of the composition and evolutionary dynamics of virus communities.

  7. Enhanced annotations and features for comparing thousands of Pseudomonas genomes in the Pseudomonas genome database.

    PubMed

    Winsor, Geoffrey L; Griffiths, Emma J; Lo, Raymond; Dhillon, Bhavjinder K; Shay, Julie A; Brinkman, Fiona S L

    2016-01-04

    The Pseudomonas Genome Database (http://www.pseudomonas.com) is well known for the application of community-based annotation approaches for producing a high-quality Pseudomonas aeruginosa PAO1 genome annotation, and facilitating whole-genome comparative analyses with other Pseudomonas strains. To aid analysis of potentially thousands of complete and draft genome assemblies, this database and analysis platform was upgraded to integrate curated genome annotations and isolate metadata with enhanced tools for larger scale comparative analysis and visualization. Manually curated gene annotations are supplemented with improved computational analyses that help identify putative drug targets and vaccine candidates or assist with evolutionary studies by identifying orthologs, pathogen-associated genes and genomic islands. The database schema has been updated to integrate isolate metadata that will facilitate more powerful analysis of genomes across datasets in the future. We continue to place an emphasis on providing high-quality updates to gene annotations through regular review of the scientific literature and using community-based approaches including a major new Pseudomonas community initiative for the assignment of high-quality gene ontology terms to genes. As we further expand from thousands of genomes, we plan to provide enhancements that will aid data visualization and analysis arising from whole-genome comparative studies including more pan-genome and population-based approaches.

  8. Dyneins Across Eukaryotes: A Comparative Genomic Analysis

    PubMed Central

    Wickstead, Bill; Gull, Keith

    2007-01-01

    Dyneins are large minus-end-directed microtubule motors. Each dynein contains at least one dynein heavy chain (DHC) and a variable number of intermediate chains (IC), light intermediate chains (LIC) and light chains (LC). Here, we used genome sequence data from 24 diverse eukaryotes to assess the distribution of DHCs, ICs, LICs and LCs across Eukaryota. Phylogenetic inference identified nine DHC families (two cytoplasmic and seven axonemal) and six IC families (one cytoplasmic). We confirm that dyneins have been lost from higher plants and show that this is most likely because of a single loss of cytoplasmic dynein 1 from the ancestor of Rhodophyta and Viridiplantae, followed by lineage-specific losses of other families. Independent losses in Entamoeba mean that at least three extant eukaryotic lineages are entirely devoid of dyneins. Cytoplasmic dynein 2 is associated with intraflagellar transport (IFT), but in two chromalveolate organisms, we find an IFT footprint without the retrograde motor. The distribution of one family of outer-arm dyneins accounts for 2-headed or 3-headed outer-arm ultrastructures observed in different organisms. One diatom species builds motile axonemes without any inner-arm dyneins (IAD), and the unexpected conservation of IAD I1 in non-flagellate algae and LC8 (DYNLL1/2) in all lineages reveals a surprising fluidity to dynein function. PMID:17897317

  9. Genome-Wide Expression Profiling of Complex Regional Pain Syndrome

    PubMed Central

    Jin, Eun-Heui; Zhang, Enji; Ko, Youngkwon; Sim, Woo Seog; Moon, Dong Eon; Yoon, Keon Jung; Hong, Jang Hee; Lee, Won Hyung

    2013-01-01

    Complex regional pain syndrome (CRPS) is a chronic, progressive, and devastating pain syndrome characterized by spontaneous pain, hyperalgesia, allodynia, altered skin temperature, and motor dysfunction. Although previous gene expression profiling studies have been conducted in animal pain models, genome-wide expression profiling in the whole blood of CRPS patients has not been reported yet. Here, we successfully identified certain pain-related genes through genome-wide expression profiling in the blood from CRPS patients. We found that 80 genes were differentially expressed between 4 CRPS patients (2 CRPS I and 2 CRPS II) and 5 controls (cut-off value: 1.5-fold change and p<0.05). Most of those genes were associated with signal transduction, developmental processes, cell structure and motility, and immunity and defense. The expression levels of major histocompatibility complex class I A subtype (HLA-A29.1), matrix metalloproteinase 9 (MMP9), alanine aminopeptidase N (ANPEP), l-histidine decarboxylase (HDC), granulocyte colony-stimulating factor 3 receptor (G-CSF3R), and signal transducer and activator of transcription 3 (STAT3) genes selected from the microarray were confirmed in 24 CRPS patients and 18 controls by quantitative reverse transcription-polymerase chain reaction (qRT-PCR). We focused on the MMP9 gene that, by qRT-PCR, showed a statistically significant difference in expression in CRPS patients compared to controls with the highest relative fold change (4.0±1.23 times and p = 1.4×10⁻⁴). The up-regulation of MMP9 gene in the blood may be related to the pain progression in CRPS patients. Our findings, which offer a valuable contribution to the understanding of the differential gene expression in CRPS, may help in the understanding of the pathophysiology of CRPS pain progression. PMID:24244504
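
    The selection rule quoted in this record (1.5-fold change and p<0.05) can be sketched as a simple filter. This is an illustrative assumption, not the authors' analysis code: it assumes fold change is a linear patient/control ratio, that down-regulation counts symmetrically (a ratio of 0.5 is treated as a 2-fold change), and that the fold threshold is inclusive. The gene names and values below are hypothetical placeholders, not data from the study.

```python
def is_differential(fold_change, p_value, min_fold=1.5, alpha=0.05):
    """Apply a fold-change and p-value cut-off to one gene.

    fold_change is assumed to be a linear patient/control ratio (> 0).
    Ratios below 1 are inverted so down-regulation is judged on the same scale.
    """
    magnitude = max(fold_change, 1.0 / fold_change)
    return magnitude >= min_fold and p_value < alpha


def select_genes(results, min_fold=1.5, alpha=0.05):
    """Return the genes passing both cut-offs, in input order."""
    return [
        gene
        for gene, (fold_change, p_value) in results.items()
        if is_differential(fold_change, p_value, min_fold, alpha)
    ]


# Hypothetical values for illustration only: gene -> (fold change, p-value).
example_results = {
    "geneA": (4.0, 0.0001),  # strongly up, significant
    "geneB": (1.2, 0.01),    # significant but below the fold cut-off
    "geneC": (0.5, 0.03),    # 2-fold down, significant
    "geneD": (2.0, 0.2),     # large change but not significant
}
selected = select_genes(example_results)
```

    With these placeholder values only geneA and geneC pass, because each of the other two fails one of the two cut-offs.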

  10. Comparative genomics meets topology: a novel view on genome median and halving problems.

    PubMed

    Alexeev, Nikita; Avdeyev, Pavel; Alekseyev, Max A

    2016-11-11

    Genome median and genome halving are combinatorial optimization problems that aim at reconstruction of ancestral genomes by minimizing the number of evolutionary events between them and genomes of the extant species. While these problems have been widely studied in past decades, their solutions are often either not efficient or not biologically adequate. These shortcomings have been recently addressed by restricting the problems' solution space. We show that the restricted variants of genome median and halving problems are, in fact, closely related. We demonstrate that these problems have a neat topological interpretation in terms of embedded graphs and polygon gluings. We illustrate how such interpretation can lead to solutions to these problems in particular cases. This study provides an unexpected link between comparative genomics and topology, and demonstrates advantages of solving genome median and halving problems within the topological framework.

  11. Whole Genome Amplification of Labeled Viable Single Cells Suited for Array-Comparative Genomic Hybridization.

    PubMed

    Kroneis, Thomas; El-Heliebi, Amin

    2015-01-01

    Understanding details of a complex biological system makes it necessary to dismantle it down to its components. Immunostaining techniques allow identification of several distinct cell types thereby giving an inside view of intercellular heterogeneity. Often staining reveals that the most remarkable cells are the rarest. To further characterize the target cells on a molecular level, single cell techniques are necessary. Here, we describe the immunostaining, micromanipulation, and whole genome amplification of single cells for the purpose of genomic characterization. First, we exemplify the preparation of cell suspensions from cultured cells as well as the isolation of peripheral mononucleated cells from blood. The target cell population is then subjected to immunostaining. After cytocentrifugation target cells are isolated by micromanipulation and forwarded to whole genome amplification. For whole genome amplification, we use GenomePlex® technology allowing downstream genomic analysis such as array-comparative genomic hybridization.

  12. The perennial ryegrass GenomeZipper: targeted use of genome resources for comparative grass genomics.

    PubMed

    Pfeifer, Matthias; Martis, Mihaela; Asp, Torben; Mayer, Klaus F X; Lübberstedt, Thomas; Byrne, Stephen; Frei, Ursula; Studer, Bruno

    2013-02-01

    Whole-genome sequences established for model and major crop species constitute a key resource for advanced genomic research. For outbreeding forage and turf grass species like ryegrasses (Lolium spp.), such resources have yet to be developed. Here, we present a model of the perennial ryegrass (Lolium perenne) genome on the basis of conserved synteny to barley (Hordeum vulgare) and the model grass genome Brachypodium (Brachypodium distachyon) as well as rice (Oryza sativa) and sorghum (Sorghum bicolor). A transcriptome-based genetic linkage map of perennial ryegrass served as a scaffold to establish the chromosomal arrangement of syntenic genes from model grass species. This scaffold revealed a high degree of synteny and macrocollinearity and was then utilized to anchor a collection of perennial ryegrass genes in silico to their predicted genome positions. This resulted in the unambiguous assignment of 3,315 out of 8,876 previously unmapped genes to the respective chromosomes. In total, the GenomeZipper incorporates 4,035 conserved grass gene loci, which were used for the first genome-wide sequence divergence analysis between perennial ryegrass, barley, Brachypodium, rice, and sorghum. The perennial ryegrass GenomeZipper is an ordered, information-rich genome scaffold, facilitating map-based cloning and genome assembly in perennial ryegrass and closely related Poaceae species. It also represents a milestone in describing synteny between perennial ryegrass and fully sequenced model grass genomes, thereby increasing our understanding of genome organization and evolution in the most important temperate forage and turf grass species.

  13. Comparative Genomics of Ethanolamine Utilization

    PubMed Central

    Tsoy, Olga; Ravcheev, Dmitry; Mushegian, Arcady

    2009-01-01

    Ethanolamine can be used as a source of carbon and nitrogen by phylogenetically diverse bacteria. Ethanolamine-ammonia lyase, the enzyme that breaks ethanolamine into acetaldehyde and ammonia, is encoded by the gene tandem eutBC. Despite extensive studies of ethanolamine utilization in Salmonella enterica serovar Typhimurium, much remains to be learned about EutBC structure and catalytic mechanism, about the evolutionary origin of ethanolamine utilization, and about regulatory links between the metabolism of ethanolamine itself and the ethanolamine-ammonia lyase cofactor adenosylcobalamin. We used computational analysis of sequences, structures, genome contexts, and phylogenies of ethanolamine-ammonia lyases to address these questions and to evaluate recent data-mining studies that have suggested an association between bacterial food poisoning and the diol utilization pathways. We found that EutBC evolution included recruitment of a TIM barrel and a Rossmann fold domain and their fusion to N-terminal α-helical domains to give EutB and EutC, respectively. This fusion was followed by recruitment and occasional loss of auxiliary ethanolamine utilization genes in Firmicutes and by several horizontal transfers, most notably from the firmicute stem to the Enterobacteriaceae and from Alphaproteobacteria to Actinobacteria. We identified a conserved DNA motif that likely represents the EutR-binding site and is shared by the ethanolamine and cobalamin operons in several enterobacterial species, suggesting a mechanism for coupling the biosyntheses of apoenzyme and cofactor in these species. Finally, we found that the food poisoning phenotype is associated with the structural components of metabolosome more strongly than with ethanolamine utilization genes or with paralogous propanediol utilization genes per se. PMID:19783625

  14. The dog genome: survey sequencing and comparative analysis.

    PubMed

    Kirkness, Ewen F; Bafna, Vineet; Halpern, Aaron L; Levy, Samuel; Remington, Karin; Rusch, Douglas B; Delcher, Arthur L; Pop, Mihai; Wang, Wei; Fraser, Claire M; Venter, J Craig

    2003-09-26

    A survey of the dog genome sequence (6.22 million sequence reads; 1.5x coverage) demonstrates the power of sample sequencing for comparative analysis of mammalian genomes and the generation of species-specific resources. More than 650 million base pairs (>25%) of dog sequence align uniquely to the human genome, including fragments of putative orthologs for 18,473 of 24,567 annotated human genes. Mutation rates, conserved synteny, repeat content, and phylogeny can be compared among human, mouse, and dog. A variety of polymorphic elements are identified that will be valuable for mapping the genetic basis of diseases and traits in the dog.

  15. What constitutes an Arabian Helicobacter pylori? Lessons from comparative genomics.

    PubMed

    Kumar, Narender; Albert, M John; Al Abkal, Hanan; Siddique, Iqbal; Ahmed, Niyaz

    2017-02-01

    Helicobacter pylori, the human gastric pathogen, causes a variety of gastric diseases ranging from mild gastritis to gastric cancer. While the studies on H. pylori are dominated by those based on either East Asian or Western strains, information regarding H. pylori strains prevalent in the Middle East remains scarce. Therefore, we carried out whole-genome sequencing and comparative analysis of three H. pylori strains isolated from three native Arab, Kuwaiti patients. H. pylori strains were sequenced using Illumina platform. The sequence reads were filtered and draft genomes were assembled and annotated. Various pathogenicity-associated regions and phages present within the genomes were identified. Phylogenetic analysis was carried out to determine the genetic relatedness of Kuwaiti strains to various lineages of H. pylori. The core genome content and virulence-related genes were analyzed to assess the pathogenic potential. The three genomes clustered along with HpEurope strains in the phylogenetic tree comprising various H. pylori lineages. A total of 1187 genes spread among various functional classes were identified in the core genome analysis. The three genomes possessed a complete cagPAI and also retained most of the known outer membrane proteins as well as virulence-related genes. The cagA gene in all three strains consisted of an AB-C type EPIYA motif. The comparative genomic analysis of Kuwaiti H. pylori strains revealed a European ancestry and a high pathogenic potential. © 2016 John Wiley & Sons Ltd.

  16. Comparative genomic paleontology across plant kingdom reveals the dynamics of TE-driven genome evolution.

    PubMed

    El Baidouri, Moaine; Panaud, Olivier

    2013-01-01

    Long terminal repeat-retrotransposons (LTR-RTs) are the most abundant class of transposable elements (TEs) in plants. They strongly impact the structure, function, and evolution of their host genome, and, in particular, their role in genome size variation has been clearly established. However, the dynamics of the process through which LTR-RTs have differentially shaped plant genomes is still poorly understood because of a lack of comparative studies. Using a new robust and automated family classification procedure, we exhaustively characterized the LTR-RTs in eight plant genomes for which a high-quality sequence is available (i.e., Arabidopsis thaliana, A. lyrata, grapevine, soybean, rice, Brachypodium distachyon, sorghum, and maize). This allowed us to perform a comparative genome-wide study of the retrotranspositional landscape in these eight plant lineages from both monocots and dicots. We show that retrotransposition has recurrently occurred in all plant genomes investigated, regardless of their size, and through bursts, rather than a continuous process. Moreover, in each genome, only one or few LTR-RT families have been active in the recent past, and the difference in genome size among the species studied could thus mostly be accounted for by the extent of the latest transpositional burst(s). Following these bursts, LTR-RTs are efficiently eliminated from their host genomes through recombination and deletion, but we show that the removal rate is not lineage specific. These new findings lead us to propose a new model of TE-driven genome evolution in plants.

  17. IMGD: an integrated platform supporting comparative genomics and phylogenetics of insect mitochondrial genomes

    PubMed Central

    Lee, Wonhoon; Park, Jongsun; Choi, Jaeyoung; Jung, Kyongyong; Park, Bongsoo; Kim, Donghan; Lee, Jaeyoung; Ahn, Kyohun; Song, Wonho; Kang, Seogchan; Lee, Yong-Hwan; Lee, Seunghwan

    2009-01-01

    Background Sequences and organization of the mitochondrial genome have been used as markers to investigate evolutionary history and relationships in many taxonomic groups. The rapidly increasing mitochondrial genome sequences from diverse insects provide ample opportunities to explore various global evolutionary questions in the superclass Hexapoda. To adequately support such questions, it is imperative to establish an informatics platform that facilitates the retrieval and utilization of available mitochondrial genome sequence data. Results The Insect Mitochondrial Genome Database (IMGD) is a new integrated platform that archives the mitochondrial genome sequences from 25,747 hexapod species, including 112 completely sequenced and 20 nearly completed genomes and 113,985 partially sequenced mitochondrial genomes. The Species-driven User Interface (SUI) of IMGD supports data retrieval and diverse analyses at multi-taxon levels. The Phyloviewer implemented in IMGD provides three methods for drawing phylogenetic trees and displays the resulting trees on the web. The SNP database incorporated into IMGD presents the distribution of SNPs and INDELs in the mitochondrial genomes of multiple isolates within eight species. A newly developed comparative SNU Genome Browser supports the graphical presentation and interactive interface for the identified SNPs/INDELs. Conclusion The IMGD provides a solid foundation for the comparative mitochondrial genomics and phylogenetics of insects. All data and functions described here are available at the web site . PMID:19351385

  18. Evolutionary insights into scleractinian corals using comparative genomic hybridizations

    PubMed Central

    2012-01-01

    Background Coral reefs belong to the most ecologically and economically important ecosystems on our planet. Yet, they are under steady decline worldwide due to rising sea surface temperatures, disease, and pollution. Understanding the molecular impact of these stressors on different coral species is imperative in order to predict how coral populations will respond to this continued disturbance. The use of molecular tools such as microarrays has provided deep insight into the molecular stress response of corals. Here, we have performed comparative genomic hybridizations (CGH) with different coral species to an Acropora palmata microarray platform containing 13,546 cDNA clones in order to identify potentially rapidly evolving genes and to determine the suitability of existing microarray platforms for use in gene expression studies (via heterologous hybridization). Results Our results showed that the current microarray platform for A. palmata is able to provide biologically relevant information for a wide variety of coral species covering both the complex clade as well as the robust clade. Analysis of the fraction of highly diverged genes showed a significantly higher number of genes without annotation, corroborating previous findings that point towards a higher rate of divergence for taxonomically restricted genes. Among the genes with annotation, we found many mitochondrial genes to be highly diverged in M. faveolata when compared to A. palmata, while the majority of nuclear encoded genes maintained an average divergence rate. Conclusions The use of present microarray platforms for transcriptional analyses in different coral species will greatly enhance the understanding of the molecular basis of stress and health and highlight evolutionary differences between scleractinian coral species. On a genomic basis, we show that cDNA arrays can be used to identify patterns of divergence. Mitochondrion-encoded genes seem to have diverged faster than nuclear encoded genes in robust

  19. Genome engineering and gene expression control for bacterial strain development.

    PubMed

    Song, Chan Woo; Lee, Joungmin; Lee, Sang Yup

    2015-01-01

    In recent years, a number of techniques and tools have been developed for genome engineering and gene expression control to achieve desired phenotypes of various bacteria. Here we review and discuss the recent advances in bacterial genome manipulation and gene expression control techniques, and their actual uses with accompanying examples. Genome engineering has been commonly performed based on homologous recombination. During such genome manipulation, the counterselection systems employing SacB or nucleases have mainly been used for the efficient selection of desired engineered strains. The recombineering technology enables simple and more rapid manipulation of the bacterial genome. The group II intron-mediated genome engineering technology is another option for some bacteria that are difficult to be engineered by homologous recombination. Due to the increasing demands on high-throughput screening of bacterial strains having the desired phenotypes, several multiplex genome engineering techniques have recently been developed and validated in some bacteria. Another approach to achieve desired bacterial phenotypes is the repression of target gene expression without the modification of genome sequences. This can be performed by expressing antisense RNA, small regulatory RNA, or CRISPR RNA to repress target gene expression at the transcriptional or translational level. All of these techniques allow efficient and rapid development and screening of bacterial strains having desired phenotypes, and more advanced techniques are expected to be seen. Copyright © 2014 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  20. Mycobacterial species as case-study of comparative genome analysis.

    PubMed

    Zakham, F; Belayachi, L; Ussery, D; Akrim, M; Benjouad, A; El Aouad, R; Ennaji, M M

    2011-02-08

    The genus Mycobacterium comprises more than 120 species, including important human pathogens that cause major public health problems and illnesses. Further, with more than 100 genome sequences from this genus, comparative genome analysis can provide new insights for better understanding the evolutionary events of these species and improving drugs, vaccines, and diagnostic tools for controlling mycobacterial diseases. In the present study we outline a comparative genome analysis of fourteen mycobacterial genomes: M. avium subsp. paratuberculosis K-10, M. bovis AF2122/97, M. bovis BCG str. Pasteur 1173P2, M. leprae Br4923, M. marinum M, M. sp. KMS, M. sp. MCS, M. tuberculosis CDC1551, M. tuberculosis F11, M. tuberculosis H37Ra, M. tuberculosis H37Rv, M. tuberculosis KZN 1435, M. ulcerans Agy99, and M. vanbaalenii PYR-1. For this purpose, the genomes were compared by genome length, GC content, and number of genes in different databases (GenBank, RefSeq, and Prodigal). The BLAST matrix of these genomes was computed to summarize the similarity between species in a simple scheme. As a result of multiple genome analysis, the pan and core genomes were defined for twelve mycobacterial species. We also introduce the genome atlas of the reference strain M. tuberculosis H37Rv, which gives a good overview of this genome. To examine the phylogenetic relationships among these bacteria, a phylogenetic tree was constructed from the 16S rRNA gene for tuberculosis and non-tuberculosis mycobacteria to understand the evolutionary events of these species.
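
    The length and GC-content comparison mentioned in this record can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names (gc_content, summarize) and the toy sequences are hypothetical, and ambiguous bases such as N are simply left out of the GC denominator, which is an assumption rather than something stated in the abstract.

```python
def gc_content(sequence: str) -> float:
    """Return the fraction of G and C among unambiguous bases (A, C, G, T)."""
    seq = sequence.upper()
    gc = sum(seq.count(base) for base in "GC")
    total = gc + sum(seq.count(base) for base in "AT")
    # Sequences with no unambiguous bases (or empty input) give 0.0.
    return gc / total if total else 0.0


def summarize(genomes: dict[str, str]) -> list[tuple[str, int, float]]:
    """Return (name, length, GC fraction) rows sorted by descending GC content."""
    rows = [(name, len(seq), gc_content(seq)) for name, seq in genomes.items()]
    return sorted(rows, key=lambda row: row[2], reverse=True)


# Toy sequences invented for illustration only (not real genome data).
toy_genomes = {
    "toy_genome_A": "GGCCGGATCCGG",
    "toy_genome_B": "ATATGCATATTA",
}

for name, length, gc in summarize(toy_genomes):
    print(f"{name}\t{length}\t{gc:.3f}")
```

    In practice the sequences would be read from FASTA files; that step is omitted here to keep the sketch self-contained.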

  1. Evolutionary and comparative analyses of the soybean genome

    PubMed Central

    Cannon, Steven B.; Shoemaker, Randy C.

    2012-01-01

    The soybean genome assembly has been available since the end of 2008. Significant features of the genome include large, gene-poor, repeat-dense pericentromeric regions, spanning roughly 57% of the genome sequence; a relatively large genome size of ~1.15 billion bases; remnants of a genome duplication that occurred ~13 million years ago (Mya); and fainter remnants of older polyploidies that occurred ~58 Mya and >130 Mya. The genome sequence has been used to identify the genetic basis for numerous traits, including disease resistance, nutritional characteristics, and developmental features. The genome sequence has provided a scaffold for placement of many genomic feature elements, both from within soybean and from related species. These may be accessed at several websites, including http://www.phytozome.net, http://soybase.org, http://comparative-legumes.org, and http://www.legumebase.brc.miyazaki-u.ac.jp. The taxonomic position of soybean in the Phaseoleae tribe of the legumes means that there are approximately two dozen other beans and relatives that have undergone independent domestication, and which may have traits that will be useful for transfer to soybean. Methods of translating information between species in the Phaseoleae range from design of markers for marker assisted selection, to transformation with Agrobacterium or with other experimental transformation methods. PMID:23136483

  2. Comparative genomics of insect juvenile hormone biosynthesis

    PubMed Central

    Noriega, F.G.; Ribeiro, J.M.C.; Koener, J.F.; Valenzuela, J.G.; Hernandez-Martinez, S.; Pham, V.M.; Feyereisen, R.

    2009-01-01

    The biosynthesis of insect juvenile hormone (JH) and its neuroendocrine control are attractive targets for chemical control of insect pests and vectors of disease. To facilitate the molecular study of JH biosynthesis, we analyzed ESTs from the glands producing JH, the corpora allata (CA) in the cockroach Diploptera punctata, an insect long used as a physiological model species and compared them with ESTs from the CA of the mosquitoes Aedes aegypti and Anopheles albimanus. The predicted genes were analyzed according to their probable functions with the Gene Ontology classification, and compared to Drosophila and Anopheles gambiae genes. A large number of reciprocal matches in the cDNA libraries of cockroach and mosquito CA were found. These matches defined known and suspected enzymes of the JH biosynthetic pathway, but also several proteins associated with signal transduction that might play a role in the modulation of JH synthesis by neuropeptides. The identification in both cockroach and mosquito CA of homologs of the small ligand binding proteins from insects, Takeout/JH binding protein and retinol-binding protein highlights a hitherto unsuspected complexity of metabolite trafficking, perhaps JH precursor trafficking, in these endocrine glands. Furthermore, many reciprocal matches for genes of unknown function may provide a fertile ground for an in-depth study of allatal-specific cell physiology. PMID:16551550

  3. Comparative genomics of Eucalyptus and Corymbia reveals low rates of genome structural rearrangement.

    PubMed

    Butler, J B; Vaillancourt, R E; Potts, B M; Lee, D J; King, G J; Baten, A; Shepherd, M; Freeman, J S

    2017-05-22

    Previous studies suggest genome structure is largely conserved between Eucalyptus species. However, it is unknown if this conservation extends to more divergent eucalypt taxa. We performed comparative genomics between the eucalypt genera Eucalyptus and Corymbia. Our results will facilitate transfer of genomic information between these important taxa and provide further insights into the rate of structural change in tree genomes. We constructed three high density linkage maps for two Corymbia species (Corymbia citriodora subsp. variegata and Corymbia torelliana) which were used to compare genome structure between both species and Eucalyptus grandis. Genome structure was highly conserved between the Corymbia species. However, the comparison of Corymbia and E. grandis suggests large (from 1-13 MB) intra-chromosomal rearrangements have occurred on seven of the 11 chromosomes. Most rearrangements were supported through comparisons of the three independent Corymbia maps to the E. grandis genome sequence, and to other independently constructed Eucalyptus linkage maps. These are the first large scale chromosomal rearrangements discovered between eucalypts. Nonetheless, in the general context of plants, the genomic structure of the two genera was remarkably conserved; adding to a growing body of evidence that conservation of genome structure is common amongst woody angiosperms.

  4. Comparing Vertebrate Whole-Genome Shotgun Reads to the Human Genome

    PubMed Central

    Chen, Rui; Bouck, John B.; Weinstock, George M.; Gibbs, Richard A.

    2001-01-01

    Multi-species sequence comparisons are a very efficient way to reveal conserved genes. Because sequence finishing is expensive and time consuming, many genome sequences are likely to stay incomplete. A challenge is to use these fragmented data for understanding the human genome. Methods for using cross-species whole-genome shotgun sequence (WGS) for genome annotation are described in this paper. About one-half million high-quality rat WGS reads (covering 7.5% of the rat genome) generated at the Baylor College of Medicine Human Genome Sequencing Center were compared with the human genome. Using computer-generated random reads as a negative control, a set of parameters was determined for reliable interpretation of BLAST search results. About 10% of the rat reads contain regions that are conserved in the human genomic sequence and about one-third of these include known gene-coding regions. Mapping the conserved regions to human chromosomes showed a 23-fold enrichment for coding regions compared with noncoding regions. This approach can also be applied to other mammalian genomes for gene finding. These data predicted ∼42,500 genes in the human, slightly more than reported previously. PMID:11691844

  5. Comparative genomics of vesicomyid clam (Bivalvia: Mollusca) chemosynthetic symbionts

    PubMed Central

    Newton, Irene LG; Girguis, Peter R; Cavanaugh, Colleen M

    2008-01-01

    Background The Vesicomyidae (Bivalvia: Mollusca) are a family of clams that form symbioses with chemosynthetic gamma-proteobacteria. They exist in environments such as hydrothermal vents and cold seeps and have a reduced gut and feeding groove, indicating a large dependence on their endosymbionts for nutrition. Recently, two vesicomyid symbiont genomes were sequenced, illuminating the possible nutritional contributions of the symbiont to the host and making genome-wide evolutionary analyses possible. Results To examine the genomic evolution of the vesicomyid symbionts, a comparative genomics framework, including the existing genomic data combined with heterologous microarray hybridization results, was used to analyze conserved gene content in four vesicomyid symbiont genomes. These four symbionts were chosen to include a broad phylogenetic sampling of the vesicomyid symbionts and represent distinct chemosynthetic environments: cold seeps and hydrothermal vents. Conclusion The results of this comparative genomics analysis emphasize the importance of the symbionts' chemoautotrophic metabolism within their hosts. The fact that these symbionts appear to be metabolically capable autotrophs underscores the extent to which the host depends on them for nutrition and reveals the key to invertebrate colonization of these challenging environments. PMID:19055818

  6. Comparative Genomics of a Parthenogenesis-Inducing Wolbachia Symbiont

    PubMed Central

    Lindsey, Amelia R. I.; Werren, John H.; Richards, Stephen; Stouthamer, Richard

    2016-01-01

    Wolbachia is an intracellular symbiont of invertebrates responsible for inducing a wide variety of phenotypes in its host. These host-Wolbachia relationships span the continuum from reproductive parasitism to obligate mutualism, and provide a unique system to study genomic changes associated with the evolution of symbiosis. We present the genome sequence from a parthenogenesis-inducing Wolbachia strain (wTpre) infecting the minute parasitoid wasp Trichogramma pretiosum. The wTpre genome is the most complete parthenogenesis-inducing Wolbachia genome available to date. We used comparative genomics across 16 Wolbachia strains, representing five supergroups, to identify a core Wolbachia genome of 496 sets of orthologous genes. Only 14 of these sets are unique to Wolbachia when compared to other bacteria from the Rickettsiales. We show that the B supergroup of Wolbachia, of which wTpre is a member, contains a significantly higher number of ankyrin repeat-containing genes than other supergroups. In the wTpre genome, there is evidence for truncation of the protein coding sequences in 20% of ORFs, mostly as a result of frameshift mutations. The wTpre strain represents a conversion from cytoplasmic incompatibility to a parthenogenesis-inducing lifestyle, and is required for reproduction in the Trichogramma host it infects. We hypothesize that the large number of coding frame truncations has accompanied the change in reproductive mode of the wTpre strain. PMID:27194801

  7. Arabidopsis transcription factors: genome-wide comparative analysis among eukaryotes.

    PubMed

    Riechmann, J L; Heard, J; Martin, G; Reuber, L; Jiang, C; Keddie, J; Adam, L; Pineda, O; Ratcliffe, O J; Samaha, R R; Creelman, R; Pilgrim, M; Broun, P; Zhang, J Z; Ghandehari, D; Sherman, B K; Yu, G

    2000-12-15

    The completion of the Arabidopsis thaliana genome sequence allows a comparative analysis of transcriptional regulators across the three eukaryotic kingdoms. Arabidopsis dedicates over 5% of its genome to code for more than 1500 transcription factors, about 45% of which are from families specific to plants. Arabidopsis transcription factors that belong to families common to all eukaryotes do not share significant similarity with those of the other kingdoms beyond the conserved DNA binding domains, many of which have been arranged in combinations specific to each lineage. The genome-wide comparison reveals the evolutionary generation of diversity in the regulation of transcription.

  8. Comparative genomic analysis of eutherian interferon-γ-inducible GTPases.

    PubMed

    Premzl, Marko

    2012-11-01

    The interferon-γ-inducible GTPases, IFGGs, are intracellular proteins involved in immune response against pathogens. A comprehensive comparative genomic review and analysis of eutherian IFGGs was carried out using public genomic sequences. The 64 eutherian IFGG genes were examined in detail and annotated. The eutherian IFGG promoter types were first catalogued followed by a phylogenetic analysis of eutherian IFGGs, which described five major IFGG clusters. The patterns of differential gene expansions and protein regions that may regulate IFGG catalytic features suggested a new classification of eutherian IFGGs. This mini-review has also provided new tests of reliability of public genomic sequences as well as tests of protein molecular evolution.

  9. Comparative Genome Analysis of Basidiomycete Fungi

    SciTech Connect

    Riley, Robert; Salamov, Asaf; Morin, Emmanuelle; Nagy, Laszlo; Manning, Gerard; Baker, Scott; Brown, Daren; Henrissat, Bernard; Levasseur, Anthony; Hibbett, David; Martin, Francis; Grigoriev, Igor

    2012-03-19

    Fungi of the phylum Basidiomycota (basidiomycetes) make up some 37 percent of the described fungi, and are important in forestry, agriculture, medicine, and bioenergy. This diverse phylum includes the mushrooms, wood rots, symbionts, and plant and animal pathogens. To better understand the diversity of phenotypes in basidiomycetes, we performed a comparative analysis of 35 basidiomycete fungi spanning the diversity of the phylum. Phylogenetic patterns of lignocellulose degrading genes suggest a continuum rather than a sharp dichotomy between the white rot and brown rot modes of wood decay. Patterns of secondary metabolic enzymes give additional insight into the broad array of phenotypes found in the basidiomycetes. We suggest that the profile of an organism in lignocellulose-targeting genes can be used to predict its nutritional mode, and predict Dacryopinax sp. as a brown rot; Botryobasidium botryosum and Jaapia argillacea as white rots.

  10. Comparative genomics of first available bovine Anaplasma phagocytophilum genome obtained with targeted sequence capture.

    PubMed

    Dugat, Thibaud; Loux, Valentin; Marthey, Sylvain; Moroldo, Marco; Lagrée, Anne-Claire; Boulouis, Henri-Jean; Haddad, Nadia; Maillard, Renaud

    2014-11-17

    Anaplasma phagocytophilum is a zoonotic and obligate intracellular bacterium transmitted by ticks. In domestic ruminants, it is the causative agent of tick-borne fever, which causes significant economic losses in Europe. As A. phagocytophilum is difficult to isolate and cultivate, only nine genome sequences have been published to date, none of which originate from a bovine strain. Our goals were to: 1/ develop a sequencing methodology which efficiently circumvents the difficulties associated with A. phagocytophilum isolation and culture; 2/ describe the first genome of a bovine strain; and 3/ compare it with available genomes, in order to both explore key genomic features at the species level, and to identify candidate genes that could be specific to bovine strains. DNA was extracted from a bovine blood sample infected by A. phagocytophilum. Following a whole genome capture approach, A. phagocytophilum DNA was enriched 197-fold in the sample and then sequenced using Illumina technology. In total, 58.9% of obtained reads corresponded to the A. phagocytophilum genome, covering 85.3% of the HZ genome. Then by performing comparisons with nine previously-sequenced A. phagocytophilum genomes, we determined the core genome of these ten strains. Following analysis, 1281 coding DNA sequences, including 1001 complete sequences, were detected in the A. phagocytophilum bovine genome, of which four appeared to be unique to the bovine isolate. These four coding DNA sequences coded for "hypothetical proteins of unknown function" and require further analysis. We also identified nine proteins common to both European domestic ruminants tested. Using a whole genome capture approach, we have sequenced the first A. phagocytophilum genome isolated from a cow. To the best of our knowledge, this is the first time that this method has been used to selectively enrich pathogenic bacterial DNA from samples also containing host DNA. The four proteins unique to the A. phagocytophilum bovine

  11. Genomic and comparative genomic analyses of Rickettsia heilongjiangensis provide insight into its evolution and pathogenesis.

    PubMed

    Duan, Changsong; Xiong, Xiaolu; Qi, Yong; Gong, Wenping; Jiao, Jun; Wen, Bohai

    2014-08-01

    Rickettsia heilongjiangensis, the causative agent of far eastern spotted fever, is an obligate intracellular gram-negative bacterium that belongs to the spotted fever group rickettsiae. To understand the evolution and pathogenesis of R. heilongjiangensis, we analyzed its genome and compared it with other rickettsial genomes available in GenBank. The R. heilongjiangensis chromosome contains 1333 genes, including 1297 protein coding genes and 36 RNA coding genes. The genome also contains 121 pseudogenes, 54 insertion sequences, and 39 tandem repeats. Sixteen genes encoding the major components of the type IV secretion systems were identified in the R. heilongjiangensis genome. In total, 37 β-barrel outer membrane proteins were predicted in the genome, eight of which have been previously confirmed to be outer membrane proteins. In addition, 266 potential virulence factor genes, seven partially deleted antibiotic resistance genes, and a genomic island were identified in the genome. The codon usage in the genome is compatible with its low GC content, and the amino acid usage shows apparent bias. A comparative genomic analysis showed that R. heilongjiangensis and R. japonica share one unique fragment that may be a target sequence for a diagnostic assay. The orthologs of 37 genes of R. heilongjiangensis were found in pathogenic R. rickettsii str. Sheila Smith but not in non-pathogenic R. rickettsii str. Iowa, which may explain why R. heilongjiangensis is pathogenic. Pan-genome analysis showed that R. heilongjiangensis and 42 other rickettsiae strains share 693 core genes with a pan-genome size of 4837 genes. The pan-genome-based phylogeny showed that R. heilongjiangensis was closely related to R. japonica.
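
    The pan-genome and core-genome counts reported in this record come down to set operations once each strain has been reduced to a set of ortholog-family identifiers. The Python sketch below shows only that final step, under stated assumptions: the function name pan_and_core, the strain labels, and the family IDs are hypothetical, and the ortholog clustering that produces such sets is not shown.

```python
def pan_and_core(gene_families: dict[str, set[str]]) -> tuple[set[str], set[str]]:
    """Return (pan_genome, core_genome) from per-strain sets of ortholog family IDs."""
    if not gene_families:
        return set(), set()
    families = list(gene_families.values())
    pan = set().union(*families)        # families present in at least one strain
    core = set.intersection(*families)  # families present in every strain
    return pan, core


# Toy input invented for illustration only (not data from the study).
toy_strains = {
    "strain_1": {"famA", "famB", "famC"},
    "strain_2": {"famA", "famB", "famD"},
    "strain_3": {"famA", "famB", "famE"},
}

pan, core = pan_and_core(toy_strains)
print(f"pan-genome size: {len(pan)}, core-genome size: {len(core)}")
```

    The result depends entirely on how orthologs are clustered upstream, so counts from different studies are comparable only when that step is done the same way.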

  12. Lactobacillus paracasei comparative genomics: towards species pan-genome definition and exploitation of diversity.

    PubMed

    Smokvina, Tamara; Wels, Michiel; Polka, Justyna; Chervaux, Christian; Brisse, Sylvain; Boekhorst, Jos; van Hylckama Vlieg, Johan E T; Siezen, Roland J

    2013-01-01

    Lactobacillus paracasei is a member of the normal human and animal gut microbiota and is used extensively in the food industry in starter cultures for dairy products or as probiotics. With the development of low-cost, high-throughput sequencing techniques it has become feasible to sequence many different strains of one species and to determine its "pan-genome". We have sequenced the genomes of 34 different L. paracasei strains, and performed a comparative genomics analysis. We analysed genome synteny and content, focussing on the pan-genome, core genome and variable genome. Each genome was shown to contain around 2800-3100 protein-coding genes, and comparative analysis identified over 4200 ortholog groups that comprise the pan-genome of this species, of which about 1800 ortholog groups make up the conserved core. Several factors previously associated with host-microbe interactions such as pili, cell-envelope proteinase, hydrolases p40 and p75 or the capacity to produce short branched-chain fatty acids (bkd operon) are part of the L. paracasei core genome present in all analysed strains. The variome consists mainly of hypothetical proteins, phages, plasmids, transposon/conjugative elements, and known functions such as sugar metabolism, cell-surface proteins, transporters, CRISPR-associated proteins, and EPS biosynthesis proteins. An enormous variety and variability of sugar utilization gene cassettes were identified, with each strain harbouring between 25 and 53 cassettes, reflecting the high adaptability of L. paracasei to different niches. A phylogenomic tree was constructed based on total genome contents, and together with an analysis of horizontal gene transfer events we conclude that evolution of these L. paracasei strains is complex and not always related to niche adaptation. The results of this genome content comparison were used, together with high-throughput growth experiments on various carbohydrates, to perform gene-trait matching analysis, in order to link

  13. Sputnik: a database platform for comparative plant genomics.

    PubMed

    Rudd, Stephen; Mewes, Hans-Werner; Mayer, Klaus F X

    2003-01-01

    Two million plant ESTs, from 20 different plant species, and totalling more than 1000 Mbp of DNA sequence, represent a formidable transcriptomic resource. Sputnik uses the potential of this sequence resource to fill some of the information gap in the un-sequenced plant genomes and to serve as the foundation for in silico comparative plant genomics. The complexity of the individual EST collections has been reduced using optimised EST clustering techniques. Annotation of cluster sequences is performed by exploiting and transferring information from the comprehensive knowledgebase already produced for the completed model plant genome (Arabidopsis thaliana) and by performing additional state-of-the-art sequence analyses relevant to today's plant biologist. Functional predictions, comparative analyses and associative annotations for 500 000 plant EST derived peptides make Sputnik (http://mips.gsf.de/proj/sputnik/) a valid platform for contemporary plant genomics.

  14. The MicrobesOnline Web site for comparative genomics

    SciTech Connect

    Alm, Eric J.; Huang, Katherine H.; Price, Morgan N.; Koche,Richard P.; Keller, Keith; Dubchak, Inna L.; Arkin, Adam P.

    2004-11-05

    At present, hundreds of microbial genomes have been sequenced, and hundreds more are currently in the pipeline. The Virtual Institute for Microbial Stress and Survival has developed a publicly available suite of Web-based comparative genomic tools (http://www.microbesonline.org) designed to facilitate multispecies comparison among prokaryotes. Highlights of the Microbes Online Web site include operon and regulon predictions, a multispecies genome browser, a multispecies Gene Ontology browser, a comparative KEGG metabolic pathway viewer, a Bioinformatics Workbench for in-depth sequence analysis, and Gene Carts that allow users to save genes of interest for further study while they browse. In addition, we provide an interface for genome annotation, which like all of the tools reported here, is freely available to the scientific community.

  15. Comparative Genomics via Wavelet Analysis for Closely Related Bacteria

    NASA Astrophysics Data System (ADS)

    Song, Jiuzhou; Ware, Tony; Liu, Shu-Lin; Surette, M.

    2004-12-01

    Comparative genomics has been a valuable method for extracting and extrapolating genome information among closely related bacteria. The efficiency of the traditional methods is strongly influenced by the software method used. To overcome this problem, we propose using wavelet analysis to perform comparative genomics. First, global comparison using wavelet analysis gives the difference at a quantitative level. Then local comparison using keto-excess or purine-excess plots shows precise positions of inversions, translocations, and horizontally transferred DNA fragments. We first found that the level of energy spectra difference is related to the similarity of bacterial strains; it could be a quantitative index to describe the similarities of genomes. The strategy is described in detail by comparisons of closely related strains: S. typhi CT18, S. typhi Ty2, S. typhimurium LT2, H. pylori 26695, and H. pylori J99.
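
    The purine-excess and keto-excess plots used for the local comparison are cumulative walks along the sequence. The Python sketch below assumes the usual definitions (purines A/G minus pyrimidines C/T; keto bases G/T minus amino bases A/C) and covers only that step, not the wavelet analysis. The function name excess_walk is hypothetical, and treating ambiguous bases as zero steps is an assumption.

```python
from itertools import accumulate

PURINES, PYRIMIDINES = set("AG"), set("CT")
KETO, AMINO = set("GT"), set("AC")


def excess_walk(sequence: str, plus: set[str], minus: set[str]) -> list[int]:
    """Cumulative count of `plus` bases minus `minus` bases along the sequence.

    Bases in neither set (for example N) contribute a step of 0.
    """
    steps = [(base in plus) - (base in minus) for base in sequence.upper()]
    return list(accumulate(steps))


# Toy sequence invented for illustration only.
toy = "AGCTA"
print("purine excess:", excess_walk(toy, PURINES, PYRIMIDINES))  # [1, 2, 1, 0, 1]
print("keto excess:  ", excess_walk(toy, KETO, AMINO))           # [-1, 0, -1, 0, -1]
```

    Plotting the walk against position for two related genomes and looking for abrupt changes in slope is the kind of visual comparison the abstract describes; the plotting itself is omitted.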

  16. Comparative genomics of transcriptional regulation of methionine metabolism in Proteobacteria.

    PubMed

    Leyn, Semen A; Suvorova, Inna A; Kholina, Tatiana D; Sherstneva, Sofia S; Novichkov, Pavel S; Gelfand, Mikhail S; Rodionov, Dmitry A

    2014-01-01

    Methionine metabolism and uptake genes in Proteobacteria are controlled by a variety of RNA and DNA regulatory systems. We have applied comparative genomics to reconstruct regulons for three known transcription factors, MetJ, MetR, and SahR, and three known riboswitch motifs, SAH, SAM-SAH, and SAM_alpha, in ∼ 200 genomes from 22 taxonomic groups of Proteobacteria. We also identified two novel regulons: a SahR-like transcription factor SamR controlling various methionine biosynthesis genes in the Xanthomonadales group, and a potential RNA regulatory element with terminator-antiterminator mechanism controlling the metX or metZ genes in beta-proteobacteria. For each analyzed regulator we identified the core, taxon-specific and genome-specific regulon members. By analyzing the distribution of these regulators in bacterial genomes and by comparing their regulon contents we elucidated possible evolutionary scenarios for the regulation of the methionine metabolism genes in Proteobacteria.

  17. Sputnik: a database platform for comparative plant genomics

    PubMed Central

    Rudd, Stephen; Mewes, Hans-Werner; Mayer, Klaus F.X.

    2003-01-01

    Two million plant ESTs, from 20 different plant species, and totalling more than 1000 Mbp of DNA sequence, represent a formidable transcriptomic resource. Sputnik uses the potential of this sequence resource to fill some of the information gap in the un-sequenced plant genomes and to serve as the foundation for in silico comparative plant genomics. The complexity of the individual EST collections has been reduced using optimised EST clustering techniques. Annotation of cluster sequences is performed by exploiting and transferring information from the comprehensive knowledgebase already produced for the completed model plant genome (Arabidopsis thaliana) and by performing additional state-of-the-art sequence analyses relevant to today's plant biologist. Functional predictions, comparative analyses and associative annotations for 500 000 plant EST derived peptides make Sputnik (http://mips.gsf.de/proj/sputnik/) a valid platform for contemporary plant genomics. PMID:12519965

  18. PSAT: A web tool to compare genomic neighborhoods of multiple prokaryotic genomes

    PubMed Central

    Fong, Christine; Rohmer, Laurence; Radey, Matthew; Wasnick, Michael; Brittnacher, Mitchell J

    2008-01-01

    Background The conservation of gene order among prokaryotic genomes can provide valuable insight into gene function, protein interactions, or events by which genomes have evolved. Although some tools are available for visualizing and comparing the order of genes between genomes of study, few support an efficient and organized analysis between large numbers of genomes. The Prokaryotic Sequence homology Analysis Tool (PSAT) is a web tool for comparing gene neighborhoods among multiple prokaryotic genomes. Results PSAT utilizes a database that is preloaded with gene annotation, BLAST hit results, and gene-clustering scores designed to help identify regions of conserved gene order. Researchers use the PSAT web interface to find a gene of interest in a reference genome and efficiently retrieve the sequence homologs found in other bacterial genomes. The tool generates a graphic of the genomic neighborhood surrounding the selected gene and the corresponding regions for its homologs in each comparison genome. Homologs in each region are color coded to assist users with analyzing gene order among various genomes. In contrast to common comparative analysis methods that filter sequence homolog data based on alignment score cutoffs, PSAT leverages gene context information for homologs, including those with weak alignment scores, enabling a more sensitive analysis. Features for constraining or ordering results are designed to help researchers browse results from large numbers of comparison genomes in an organized manner. PSAT has been demonstrated to be useful for helping to identify gene orthologs and potential functional gene clusters, and detecting genome modifications that may result in loss of function. Conclusion PSAT allows researchers to investigate the order of genes within local genomic neighborhoods of multiple genomes. A PSAT web server for public use is available for performing analyses on a growing set of reference genomes through any web browser with no client

  19. Alfresco—A Workbench for Comparative Genomic Sequence Analysis

    PubMed Central

    Jareborg, Niclas; Durbin, Richard

    2000-01-01

    Comparative analysis of genomic sequences provides a powerful tool for identifying regions of potential biologic function; by comparing corresponding regions of genomes from suitable species, protein coding or regulatory regions can be identified by their homology. This requires the use of several specific types of computational analysis tools. Many programs exist for these types of analysis; not many exist for overall view/control of the results, which is necessary for large-scale genomic sequence analysis. Using Java, we have developed a new visualization tool that allows effective comparative genome sequence analysis. The program handles a pair of sequences from putatively homologous regions in different species. Results from various different existing external analysis programs, such as database searching, gene prediction, repeat masking, and alignment programs, are visualized and used to find corresponding functional sequence domains in the two sequences. The user interacts with the program through a graphic display of the genome regions, in which an independently scrollable and zoomable symbolic representation of the sequences is shown. As an example, the analysis of two unannotated orthologous genomic sequences from human and mouse containing parts of the UTY locus is presented. PMID:10958633

  20. Kiwifruit Information Resource (KIR): a comparative platform for kiwifruit genomics

    PubMed Central

    Yue, Junyang; Liu, Jian; Ban, Rongjun; Tang, Wei; Deng, Lin; Fei, Zhangjun; Liu, Yongsheng

    2015-01-01

    The Kiwifruit Information Resource (KIR) is dedicated to maintaining and integrating comprehensive datasets on genomics, functional genomics and transcriptomics of kiwifruit (Actinidiaceae). KIR serves as a central access point for existing/new genomic and genetic data. KIR also provides researchers with a variety of visualization and analysis tools. Current developments include the updated genome structure of Actinidia chinensis cv. Hongyang and its newest genome annotation, putative transcripts, gene expression, physical markers of genetic traits as well as relevant publications based on the latest genome assembly. Nine thousand five hundred and forty-seven new transcripts are detected and 21 132 old transcripts are changed. At the present release, the next-generation transcriptome sequencing data has been incorporated into gene models and splice variants. Protein–protein interactions are also identified based on experimentally determined orthologous interactions. Furthermore, the experimental results reported in peer-reviewed literature are manually extracted and integrated within a well-developed query page. In total, 122 identifications are currently associated, including commonly used gene names and symbols. All KIR datasets facilitate a broad range of kiwifruit research topics and are freely available to the research community. Database URL: http://bdg.hfut.edu.cn/kir/index.html. PMID:26656885

  1. Kiwifruit Information Resource (KIR): a comparative platform for kiwifruit genomics.

    PubMed

    Yue, Junyang; Liu, Jian; Ban, Rongjun; Tang, Wei; Deng, Lin; Fei, Zhangjun; Liu, Yongsheng

    2015-01-01

    The Kiwifruit Information Resource (KIR) is dedicated to maintaining and integrating comprehensive datasets on genomics, functional genomics and transcriptomics of kiwifruit (Actinidiaceae). KIR serves as a central access point for existing/new genomic and genetic data. KIR also provides researchers with a variety of visualization and analysis tools. Current developments include the updated genome structure of Actinidia chinensis cv. Hongyang and its newest genome annotation, putative transcripts, gene expression, physical markers of genetic traits as well as relevant publications based on the latest genome assembly. Nine thousand five hundred and forty-seven new transcripts are detected and 21 132 old transcripts are changed. At the present release, the next-generation transcriptome sequencing data has been incorporated into gene models and splice variants. Protein-protein interactions are also identified based on experimentally determined orthologous interactions. Furthermore, the experimental results reported in peer-reviewed literature are manually extracted and integrated within a well-developed query page. In total, 122 identifications are currently associated, including commonly used gene names and symbols. All KIR datasets facilitate a broad range of kiwifruit research topics and are freely available to the research community. Database URL: http://bdg.hfut.edu.cn/kir/index.html.

  2. Comparative proteogenomics: combining mass spectrometry and comparative genomics to analyze multiple genomes

    SciTech Connect

    Gupta, Nitin; Benhamida, Jamal; Bhargava, Vipul; Goodman, Daniel; Kain, Elisabeth; Kerman, Ian; Nguyen, Ngan; Ollikainen, Noah; Rodriguez, Jesse; Wang, J.; Lipton, Mary S.; Romine, Margaret F.; Bafna, Vineet; Smith, Richard D.; Pevzner, Pavel A.

    2008-07-30

    While bacterial genome annotations have significantly improved in recent years, techniques for bacterial proteome annotation (including post-translational chemical modifications, signal peptides, proteolytic events, etc.) are still in their infancy. At the same time, the number of sequenced bacterial genomes is rising sharply, far outpacing our ability to validate the predicted genes, let alone annotate bacterial proteomes. In this study, we use tandem mass spectrometry (MS/MS) to annotate the proteome of Shewanella oneidensis MR-1, an important microbe for bioremediation. In particular, we provide the first comprehensive map of post-translational modifications in a bacterial genome, including a large number of chemical modifications, signal peptide cleavages and cleavage of N-terminal methionine residues. We also detect multiple genes that were missed or assigned incorrect start positions by gene prediction programs and suggest corrections to improve the gene annotation. This study demonstrates that complementing every genome sequencing project by an MS/MS project would significantly improve both genome and proteome annotations for a reasonable cost.

  3. Mammalian Comparative Genomics Reveals Genetic and Epigenetic Features Associated with Genome Reshuffling in Rodentia

    PubMed Central

    Capilla, Laia; Sánchez-Guillén, Rosa Ana; Farré, Marta; Paytuví-Gallart, Andreu; Malinverni, Roberto; Ventura, Jacint; Larkin, Denis M.

    2016-01-01

    Understanding how mammalian genomes have been reshuffled through structural changes is fundamental to the dynamics of its composition, evolutionary relationships between species and, in the long run, speciation. In this work, we reveal the evolutionary genomic landscape in Rodentia, the most diverse and speciose mammalian order, by whole-genome comparisons of six rodent species and six representative outgroup mammalian species. The reconstruction of the evolutionary breakpoint regions across rodent phylogeny shows an increased rate of genome reshuffling that is approximately two orders of magnitude greater than in other mammalian species here considered. We identified novel lineage and clade-specific breakpoint regions within Rodentia and analyzed their gene content, recombination rates and their relationship with constitutive lamina genomic associated domains, DNase I hypersensitivity sites and chromatin modifications. We detected an accumulation of protein-coding genes in evolutionary breakpoint regions, especially genes implicated in reproduction and pheromone detection and mating. Moreover, we found an association of the evolutionary breakpoint regions with active chromatin state landscapes, most probably related to gene enrichment. Our results have two important implications for understanding the mechanisms that govern and constrain mammalian genome evolution. The first is that the presence of genes related to species-specific phenotypes in evolutionary breakpoint regions reinforces the adaptive value of genome reshuffling. Second, that chromatin conformation, an aspect that has been often overlooked in comparative genomic studies, might play a role in modeling the genomic distribution of evolutionary breakpoints. PMID:28175287

  4. Comparative genomics of phages and prophages in lactic acid bacteria.

    PubMed

    Desiere, Frank; Lucchini, Sacha; Canchaya, Carlos; Ventura, Marco; Brüssow, Harald

    2002-08-01

    Comparative phage genomics has become possible due to the availability of more than 100 complete phage genome sequences and the development of powerful bioinformatics tools. This technology, profiting from classical molecular-biology knowledge, has opened avenues of research for topics, which were difficult to address in the past. Now, it is possible to retrace part of the evolutionary history of phage modules by comparative genomics. The diagnosis of relatedness is hereby not uniquely based on sequence similarity alone, but includes topological considerations of genome organization. Detailed transcription maps have allowed in silico predictions of genome organization to be verified and refined. This comparative knowledge is providing the basis for a new taxonomic classification concept for bacteriophages infecting low G + C-content Gram-positive bacteria based on the genetic organization of the structural gene module. An Sfi21-like and an Sfi11-like genus of Siphoviridae is proposed. The gene maps of many phages show remarkable synteny in their structural genes defining a lambda super-group within Siphoviridae. A hierarchy of relatedness within the lambda super-group suggests elements of vertical evolution in Siphoviridae. Tailed phages are the result of both vertical and horizontal evolution and are thus fascinating objects for the study of molecular evolution. Prophage sequences integrated into the genomes of their bacterial host present theoretical challenges for evolutionary biologists. Prophages represent up to 10% of the genome in some LAB. In pathogenic streptococci prophages confer genes of selective value for the lysogenic cell. The lysogenic conversion genes are located between the lysin gene and the right phage attachment site. Non-attributed genes were found at the same genome position of prophages from lactic streptococci. These genes belong to the few prophage genes transcribed in the lysogen. Prophages from dairy bacteria might therefore also

  5. SUPERFAMILY--sophisticated comparative genomics, data mining, visualization and phylogeny.

    PubMed

    Wilson, Derek; Pethica, Ralph; Zhou, Yiduo; Talbot, Charles; Vogel, Christine; Madera, Martin; Chothia, Cyrus; Gough, Julian

    2009-01-01

    SUPERFAMILY provides structural, functional and evolutionary information for proteins from all completely sequenced genomes, and large sequence collections such as UniProt. Protein domain assignments for over 900 genomes are included in the database, which can be accessed at http://supfam.org/. Hidden Markov models based on Structural Classification of Proteins (SCOP) domain definitions at the superfamily level are used to provide structural annotation. We recently produced a new model library based on SCOP 1.73. Family level assignments are also available. From the web site users can submit sequences for SCOP domain classification; search for keywords such as superfamilies, families, organism names, models and sequence identifiers; find over- and underrepresented families or superfamilies within a genome relative to other genomes or groups of genomes; compare domain architectures across selections of genomes and finally build multiple sequence alignments between Protein Data Bank (PDB), genomic and custom sequences. Recent extensions to the database include InterPro abstracts and Gene Ontology terms for superfamilies, taxonomic visualization of the distribution of families across the tree of life, searches for functionally similar domain architectures and phylogenetic trees. The database, models and associated scripts are available for download from the ftp site.

  6. SUPERFAMILY—sophisticated comparative genomics, data mining, visualization and phylogeny

    PubMed Central

    Wilson, Derek; Pethica, Ralph; Zhou, Yiduo; Talbot, Charles; Vogel, Christine; Madera, Martin; Chothia, Cyrus; Gough, Julian

    2009-01-01

    SUPERFAMILY provides structural, functional and evolutionary information for proteins from all completely sequenced genomes, and large sequence collections such as UniProt. Protein domain assignments for over 900 genomes are included in the database, which can be accessed at http://supfam.org/. Hidden Markov models based on Structural Classification of Proteins (SCOP) domain definitions at the superfamily level are used to provide structural annotation. We recently produced a new model library based on SCOP 1.73. Family level assignments are also available. From the web site users can submit sequences for SCOP domain classification; search for keywords such as superfamilies, families, organism names, models and sequence identifiers; find over- and underrepresented families or superfamilies within a genome relative to other genomes or groups of genomes; compare domain architectures across selections of genomes and finally build multiple sequence alignments between Protein Data Bank (PDB), genomic and custom sequences. Recent extensions to the database include InterPro abstracts and Gene Ontology terms for superfamilies, taxonomic visualization of the distribution of families across the tree of life, searches for functionally similar domain architectures and phylogenetic trees. The database, models and associated scripts are available for download from the ftp site. PMID:19036790

  7. Comparative Genomic and Phylogenomic Analyses Reveal a Conserved Core Genome Shared by Estuarine and Oceanic Cyanopodoviruses.

    PubMed

    Huang, Sijun; Zhang, Si; Jiao, Nianzhi; Chen, Feng

    2015-01-01

    Podoviruses are among the major viral groups that infect marine picocyanobacteria Prochlorococcus and Synechococcus. Here, we reported the genome sequences of five Synechococcus podoviruses isolated from the estuarine environment, and performed comparative genomic and phylogenomic analyses based on a total of 20 cyanopodovirus genomes. The genomes of all the known marine cyanopodoviruses are highly syntenic. A pan-genome of 349 clustered orthologous groups was determined, among which 15 were core genes. These core genes make up nearly half of each genome in length, reflecting the high level of genome conservation among this cyanophage type. The whole genome phylogenies based on concatenated core genes and gene content were highly consistent and confirmed the separation of two discrete marine cyanopodovirus clusters MPP-A and MPP-B. The genomes within cluster MPP-B grouped into subclusters mainly corresponding to Prochlorococcus or Synechococcus host types. Auxiliary metabolic genes tend to occur in a specific phylogenetic group of these cyanopodoviruses. All the MPP-B phages analyzed here encode the photosynthesis gene psbA, which is absent in all the MPP-A genomes thus far. Interestingly, all the MPP-B and two MPP-A Synechococcus podoviruses encode the thymidylate synthase gene thyX, while at the same genome locus all the MPP-B Prochlorococcus podoviruses encode the transaldolase gene talC. Both genes are hypothesized to have the potential to facilitate the biosynthesis of deoxynucleotide for phage replication. Inheritance of specific functional genes could be important to the evolution and ecological fitness of certain cyanophage genotypes. Our analyses demonstrate that cyanopodoviruses of estuarine and oceanic origins share a conserved core genome and suggest that accessory genes may be related to environmental adaptation.
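
    The pan-genome and core-genome notions used in this abstract can be illustrated with a minimal Python sketch. It assumes each genome has already been reduced to a set of orthologous-group identifiers; the genome names and group IDs below are hypothetical toy values, not data from the study, and the clustering step that produces the groups is not shown.

```python
def pan_and_core(genome_to_groups):
    """Return (pan, core) sets of orthologous-group IDs.

    pan  = groups found in at least one genome (union)
    core = groups found in every genome (intersection)
    """
    group_sets = [set(groups) for groups in genome_to_groups.values()]
    if not group_sets:
        return set(), set()
    pan = set().union(*group_sets)
    core = set.intersection(*group_sets)
    return pan, core


# Hypothetical toy input: genome name -> orthologous groups it encodes.
toy_genomes = {
    "phageA": {"og1", "og2", "og3"},
    "phageB": {"og1", "og2", "og4"},
    "phageC": {"og1", "og2", "og5"},
}

pan, core = pan_and_core(toy_genomes)
accessory = pan - core  # groups missing from at least one genome
print(len(pan), sorted(core), sorted(accessory))
```

    In this toy input the pan-genome has five groups, two of which are core; the remaining three are accessory groups of the kind the abstract relates to environmental adaptation.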

  8. Metagenome Skimming of Insect Specimen Pools: Potential for Comparative Genomics

    PubMed Central

    Linard, Benjamin; Crampton-Platt, Alex; Gillett, Conrad P.D.T.; Timmermans, Martijn J.T.N.; Vogler, Alfried P.

    2015-01-01

    Metagenomic analyses are challenging in metazoans, but high-copy number and repeat regions can be assembled from low-coverage sequencing by “genome skimming,” which is applied here as a new way of characterizing metagenomes obtained in an ecological or taxonomic context. Illumina shotgun sequencing reads from two pools of Coleoptera (beetles) of approximately 200 species each were assembled into tens of thousands of scaffolds. Repeated low-coverage sequencing recovered similar scaffold sets consistently, although approximately 70% of scaffolds could not be identified against existing genome databases. Identifiable scaffolds included mitochondrial DNA, conserved sequences with hits to expressed sequence tag and protein databases, and known repeat elements of high and low complexity, including numerous copies of rRNA and histone genes. Assemblies of histones captured a diversity of gene order and primary sequence in Coleoptera. Scaffolds with similarity to multiple sites in available coleopteran genome sequences for Dendroctonus and Tribolium revealed high specificity of scaffolds to either of these genomes, in particular for high-copy number repeats. Numerous “clusters” of scaffolds mapped to the same genomic site revealed intra- and/or intergenomic variation within a metagenome pool. In addition to the effect of taxonomic composition of the metagenomes, the number of mapped scaffolds also revealed structural differences between the two reference genomes, although the significance of this striking finding remains unclear. Finally, apparently exogenous sequences were recovered, including potential food plants, fungal pathogens, and bacterial symbionts. The “metagenome skimming” approach is useful for capturing the genomic diversity of poorly studied, species-rich lineages and opens new prospects in environmental genomics. PMID:25979752

  9. PGSB PlantsDB: updates to the database framework for comparative plant genome research

    PubMed Central

    Spannagl, Manuel; Nussbaumer, Thomas; Bader, Kai C.; Martis, Mihaela M.; Seidel, Michael; Kugler, Karl G.; Gundlach, Heidrun; Mayer, Klaus F.X.

    2016-01-01

    PGSB (Plant Genome and Systems Biology: formerly MIPS) PlantsDB (http://pgsb.helmholtz-muenchen.de/plant/index.jsp) is a database framework for the comparative analysis and visualization of plant genome data. The resource has been updated with new data sets and types as well as specialized tools and interfaces to address user demands for intuitive access to complex plant genome data. In its latest incarnation, we have re-worked both the layout and navigation structure and implemented new keyword search options and a new BLAST sequence search functionality. Actively involved in corresponding sequencing consortia, PlantsDB has dedicated special efforts to the integration and visualization of complex triticeae genome data, especially for barley, wheat and rye. We enhanced CrowsNest, a tool to visualize syntenic relationships between genomes, with data from the wheat sub-genome progenitor Aegilops tauschii and added functionality to the PGSB RNASeqExpressionBrowser. GenomeZipper results were integrated for the genomes of barley, rye, wheat and perennial ryegrass and interactive access is granted through PlantsDB interfaces. Data exchange and cross-linking between PlantsDB and other plant genome databases is stimulated by the transPLANT project (http://transplantdb.eu/). PMID:26527721

  10. Piggy-BACing the human genome I: constructing a porcine BAC physical map through comparative genomics.

    PubMed

    Rogatcheva, Margarita B; Chen, Kefei; Larkin, Denis M; Meyers, Stacey N; Marron, Brandy M; He, Weisong; Schook, Lawrence B; Beever, Jonathan E

    2008-01-01

    Availability of the human genome sequence and high similarity between humans and pigs at the molecular level provides an opportunity to use a comparative mapping approach to piggy-BAC the human genome. In order to advance the pig genome sequencing initiative, sequence similarity between large-scale porcine BAC-end sequences (BESs) and human genome sequence was used to construct a comparatively-anchored porcine physical map that is a first step towards sequencing the pig genome. A total of 50,300 porcine BAC clones were end-sequenced, yielding 76,906 BESs after trimming with an average read length of 538 bp. To anchor the porcine BACs on the human genome, these BESs were subjected to BLAST analysis using the human draft sequence, revealing 31.5% significant hits (E < e(-5)). Both genic and non-genic regions of homology contributed to the alignments between the human and porcine genomes. Porcine BESs with unique homology matches within the human genome provided a source of markers spaced approximately 70 to 300 kb along each human chromosome. In order to evaluate the utility of piggy-BACing human genome sequences, and confirm predictions of orthology, 193 evenly spaced BESs with similarity to HSA3 and HSA21 were selected and then utilized for developing a high-resolution (1.22 Mb) comparative radiation hybrid map of SSC13 that represents a fusion of HSA3 and HSA21. Resulting RH mapping of SSC13 covers 99% and 97% of HSA3 and HSA21, respectively. Seven evolutionary conserved blocks were identified including six on HSA3 and a single syntenic block corresponding to HSA21. The strategy of piggy-BACing the human genome described in this study demonstrates that through a directed, targeted comparative genomics approach construction of a high-resolution anchored physical map of the pig genome can be achieved. This map supports the selection of BACs to construct a minimal tiling path for genome sequencing and targeted gap filling. Moreover, this approach is highly relevant

  11. The Absence of the Transcription Factor Yrr1p, Identified from Comparative Genome Profiling, Increased Vanillin Tolerance Due to Enhancements of ABC Transporters Expressing, rRNA Processing and Ribosome Biogenesis in Saccharomyces cerevisiae

    PubMed Central

    Wang, Xinning; Liang, Zhenzhen; Hou, Jin; Shen, Yu; Bao, Xiaoming

    2017-01-01

    Enhancing the tolerance of Saccharomyces cerevisiae to inhibitors derived from lignocellulose is conducive to producing biofuel and chemicals using abundant lignocellulosic materials. Vanillin is a major type of phenolic inhibitor in lignocellulose hydrolysates for S. cerevisiae. In the present work, the factors beneficial to vanillin resistance in yeast were identified from the vanillin-resistant strain EMV-8, which was derived from strain NAN-27 by adaptive evolution. We found 450 SNPs and 44 genes with InDels in the vanillin-tolerant strain EMV-8 by comparing the genome sequences of EMV-8 and NAN-27. To investigate the effects of InDels, InDels were deleted in BY4741, respectively. We demonstrated that the deletion of YRR1 improved vanillin tolerance of the strain. In the presence of 6 mM vanillin, deleting YRR1 increased the maximum specific growth rate and the vanillin consumption rate by 142 and 51%, respectively. The subsequent transcriptome analysis revealed that deleting YRR1 resulted in changed expression of over 200 genes in the presence of 5 mM vanillin. The most marked changes were the significant up-regulation of the dehydrogenase ADH7, several ATP-binding cassette (ABC) transporters, and dozens of genes involved in ribosome biogenesis and rRNA processing. Coincidently, the crude enzyme solution of BY4741(yrr1Δ) exhibited higher NADPH-dependent vanillin reduction activity than the control. In addition, overexpressing the ABC transporter genes PDR5, YOR1, and SNQ2, as well as the RNA helicase gene DBP2, increased the vanillin tolerance of the strain. Interestingly, unlike the marked changes we mentioned above, under vanillin-free conditions, there are only limited transcriptional differences between wildtype and yrr1Δ. This indicated that vanillin might act as an effector in Yrr1p-related regulatory processes. The new findings of the relationship between YRR1 and vanillin tolerance, as well as the contribution of rRNA processing and ribosome biogenesis to

  12. OGRe: a relational database for comparative analysis of mitochondrial genomes

    PubMed Central

    Jameson, Daniel; Gibson, Andrew P.; Hudelot, Cendrine; Higgs, Paul G.

    2003-01-01

    Organellar Genome Retrieval (OGRe) is a relational database of complete mitochondrial genome sequences for over 250 Metazoan species. OGRe provides a resource for the comparative analysis of mitochondrial genomes at several levels. At the sequence level, OGRe allows the retrieval of any selected set of mitochondrial genes from any selected set of species. Species are classified using a taxonomic system that allows easy selection of related groups of species. Sequence alignments are also available for some species. At the level of individual nucleotides, the system contains information on base frequencies and codon usage frequencies that can be compared between organisms. At the level of whole genomes, OGRe provides several ways of visualizing information on gene order. Diagrams illustrating the genome arrangement can be generated for any selected set of species automatically from the information in the database. Searches can be done based on gene arrangement to find sets of species that have the same order as one another. Diagrams for pairwise comparison of species can be produced that show the positions of break-points in the gene order and use colour to highlight the sections of the genome that have moved. OGRe is available from http://www.bioinf.man.ac.uk/ogre. PMID:12519982
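
    The idea of break-points in gene order mentioned in this abstract can be sketched in Python as follows. This is a simplified illustration and not OGRe's own implementation: it assumes both genomes carry the same genes, treats the genome as circular, ignores strand, and uses a hypothetical toy gene order.

```python
def adjacent_pairs(order, circular=True):
    """List neighbouring gene pairs in the order given (wrapping if circular)."""
    pairs = list(zip(order, order[1:]))
    if circular and len(order) > 1:
        pairs.append((order[-1], order[0]))
    return pairs


def breakpoints(order_a, order_b, circular=True):
    """Return adjacencies of order_a that are not adjacencies of order_b.

    Strand is ignored, so (x, y) and (y, x) count as the same adjacency.
    """
    adj_b = {frozenset(p) for p in adjacent_pairs(order_b, circular)}
    return [p for p in adjacent_pairs(order_a, circular)
            if frozenset(p) not in adj_b]


# Hypothetical toy gene orders: cox3 has moved in species B.
species_a = ["cox1", "cox2", "atp8", "atp6", "cox3", "nad3"]
species_b = ["cox1", "cox2", "cox3", "atp8", "atp6", "nad3"]

for left, right in breakpoints(species_a, species_b):
    print(f"break-point between {left} and {right}")
```

    Moving a single gene produces three break-points in this toy case, which is why pairwise diagrams of the kind described above can highlight a moved segment by its flanking break-points.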

  13. Assigning protein functions by comparative genome analysis protein phylogenetic profiles

    DOEpatents

    Pellegrini, Matteo; Marcotte, Edward M.; Thompson, Michael J.; Eisenberg, David; Grothe, Robert; Yeates, Todd O.

    2003-05-13

    A computational method, system, and computer program are provided for inferring functional links from genome sequences. One method is based on the observation that some pairs of proteins A' and B' have homologs in another organism fused into a single protein chain AB. A trans-genome comparison of sequences can reveal these AB sequences, which are Rosetta Stone sequences because they decipher an interaction between A' and B'. Another method compares the genomic sequence of two or more organisms to create a phylogenetic profile for each protein indicating its presence or absence across all the genomes. The profile provides information regarding functional links between different families of proteins. In yet another method a combination of the above two methods is used to predict functional links.
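
    The phylogenetic-profile idea described here can be illustrated with a minimal Python sketch. It assumes homolog detection has already been done, so each genome is given as a set of protein families present in it; the genome names, protein labels and the mismatch threshold are hypothetical and are not taken from the patent.

```python
from itertools import combinations


def phylogenetic_profiles(presence):
    """Map each protein to a tuple of 0/1 flags, one per genome (sorted by name)."""
    genomes = sorted(presence)
    proteins = sorted({p for members in presence.values() for p in members})
    return {p: tuple(int(p in presence[g]) for g in genomes) for p in proteins}


def linked_pairs(profiles, max_mismatches=0):
    """Return protein pairs whose profiles differ in at most max_mismatches genomes."""
    pairs = []
    for a, b in combinations(sorted(profiles), 2):
        mismatches = sum(x != y for x, y in zip(profiles[a], profiles[b]))
        if mismatches <= max_mismatches:
            pairs.append((a, b))
    return pairs


# Hypothetical toy input: genome name -> protein families with a homolog there.
presence = {
    "genome1": {"P1", "P2", "P3"},
    "genome2": {"P1", "P2"},
    "genome3": {"P3"},
    "genome4": {"P1", "P2", "P3"},
}

profiles = phylogenetic_profiles(presence)
links = linked_pairs(profiles)
print(profiles)
print(links)
```

    Proteins P1 and P2 share the same presence/absence pattern in this toy input and are therefore reported as a candidate functional link, while P3 is not. In practice a profile that is all ones or all zeros carries no signal, so such proteins would usually be filtered out first.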

  14. Phytozome: a Tool for Green Plant Comparative Genomics

    DOE Data Explorer

    Phytozome is a joint project of the Department of Energy's Joint Genome Institute and the Center for Integrative Genomics to facilitate comparative genomic studies amongst green plants. Clusters of orthologous and paralogous genes that represent the modern descendants of ancestral gene sets are constructed at key phylogenetic nodes. These clusters allow easy access to clade specific orthology/paralogy relationships as well as clade specific genes and gene expansions. As of release v4.0, Phytozome provides access to nine sequenced and annotated green plant genomes, eight of which have been clustered into gene families at six evolutionarily significant nodes. Where possible, each gene has been annotated with PFAM, KOG, KEGG, and PANTHER assignments, and publicly available annotations from RefSeq, UniProt, TAIR, JGI are hyper-linked and searchable. [Copied from the Overview at http://www.phytozome.net/Phytozome_info.php]

  15. FLAGdb(++): A Bioinformatic Environment to Study and Compare Plant Genomes.

    PubMed

    Tamby, Jean Philippe; Brunaud, Véronique

    2017-01-01

    Today, the growing knowledge and data accumulation on plant genomes do not solve in a simple way the task of gene function inference. Because data of different types are coming from various sources, we need to integrate and analyze them to help biologists in this task. We created FLAGdb(++) ( http://tools.ips2.u-psud.fr/FLAGdb ) to take up this challenge for a selection of plant genomes. In order to enrich gene function predictions, structural and functional annotations of the genomes are explored to generate meta-data and to compare them. Since data are numerous and complex, we focused on accessibility and visualization with an original and user-friendly interface. In this chapter we present the main tools of FLAGdb(++) and a use-case to explore a gene family: structural and functional properties of this family and research of orthologous genes in the other plant genomes.

  16. Comparative Genomics of Mycobacteria: Some Answers, Yet More New Questions

    PubMed Central

    Behr, Marcel A.

    2015-01-01

    Comparative genomic studies permit a genus-level perspective on the distinction between environmental mycobacteria and Mycobacterium tuberculosis, as well as a species-level assessment of genetic variability within M. tuberculosis. Both of these strata of evolutionary analysis serve to generate hypotheses regarding the genomic basis of M. tuberculosis virulence. In contrasting lessons from macroevolutionary study and microevolutionary study, one can form predictions about which segments of the genome are likely to be essential for or dispensable for the pathogenesis of tuberculosis. Although some of these predictions have been experimentally verified, notable exceptions challenge the direct link between these virulence factors and the capacity of M. tuberculosis to successfully cause disease and propagate between human hosts. These unexpected findings serve as the stimulus for further studies, using genomic comparisons and other approaches, to better define the remarkable success of this recalcitrant pathogen. PMID:25395374

  17. Comparative genomics and transcriptomics of lineages I, II, and III strains of Listeria monocytogenes

    PubMed Central

    2012-01-01

    Background Listeria monocytogenes is a food-borne pathogen that causes infections with a high-mortality rate and has served as an invaluable model for intracellular parasitism. Here, we report complete genome sequences for two L. monocytogenes strains belonging to serotype 4a (L99) and 4b (CLIP80459), and transcriptomes of representative strains from lineages I, II, and III, thereby permitting in-depth comparison of genome- and transcriptome-based data from three lineages of L. monocytogenes. Lineage III, represented by the 4a L99 genome, is known to contain strains less virulent for humans. Results The genome analysis of the weakly pathogenic L99 serotype 4a provides extensive evidence of virulence gene decay, including loss of several important surface proteins. The 4b CLIP80459 genome, unlike the previously sequenced 4b F2365 genome, harbours an intact inlB invasion gene. These lineage I strains are characterized by the lack of prophage genes, as they share only a single prophage locus with other L. monocytogenes genomes 1/2a EGD-e and 4a L99. Comparative transcriptome analysis during intracellular growth uncovered adaptive expression level differences in lineages I, II and III of Listeria, notable amongst which was a strong intracellular induction of flagellar genes in strain 4a L99 compared to the other lineages. Furthermore, extensive differences between strains are manifest at levels of metabolic flux control and phosphorylated sugar uptake. Intriguingly, prophage gene expression was found to be a hallmark of intracellular gene expression. Deletion mutants in the single shared prophage locus of lineage II strain EGD-e 1/2a, the lma operon, revealed severe attenuation of virulence in a murine infection model. Conclusion Comparative genomics and transcriptome analysis of L. monocytogenes strains from three lineages implicate prophage genes in intracellular adaptation and indicate that gene loss and decay may have led to the emergence of attenuated lineages

  18. Malignant canine mammary tumours: Preliminary genomic insights using oligonucleotide array comparative genomic hybridisation analysis.

    PubMed

    Santos, Marta; Dias-Pereira, Patrícia; Williams, Christina; Lopes, Carlos; Breen, Matthew

    2017-03-28

    Neoplastic mammary disease in female dogs represents a major health concern for dog owners and veterinarians, but the genomic basis of the disease is poorly understood. In this study, we performed high resolution oligonucleotide array comparative genomic hybridisation (oaCGH) to assess genome wide DNA copy number changes in 10 malignant canine mammary tumours from seven female dogs, including multiple tumours collected at one time from each of three female dogs. In all but two tumours, genomic imbalances were detected, with losses being more common than gains. Canine chromosomes 9, 22, 26, 27, 34 and X were most frequently affected. Dissimilar oaCGH ratio profiles were observed in multiple tumours from the same dogs, providing preliminary evidence for probable independent pathogenesis. Analysis of adjacent samples of one tumour revealed regional differences in the number of genomic imbalances, suggesting heterogeneity within tumours.

  19. The tiger genome and comparative analysis with lion and snow leopard genomes.

    PubMed

    Cho, Yun Sung; Hu, Li; Hou, Haolong; Lee, Hang; Xu, Jiaohui; Kwon, Soowhan; Oh, Sukhun; Kim, Hak-Min; Jho, Sungwoong; Kim, Sangsoo; Shin, Young-Ah; Kim, Byung Chul; Kim, Hyunmin; Kim, Chang-Uk; Luo, Shu-Jin; Johnson, Warren E; Koepfli, Klaus-Peter; Schmidt-Küntzel, Anne; Turner, Jason A; Marker, Laurie; Harper, Cindy; Miller, Susan M; Jacobs, Wilhelm; Bertola, Laura D; Kim, Tae Hyung; Lee, Sunghoon; Zhou, Qian; Jung, Hyun-Ju; Xu, Xiao; Gadhvi, Priyvrat; Xu, Pengwei; Xiong, Yingqi; Luo, Yadan; Pan, Shengkai; Gou, Caiyun; Chu, Xiuhui; Zhang, Jilin; Liu, Sanyang; He, Jing; Chen, Ying; Yang, Linfeng; Yang, Yulan; He, Jiaju; Liu, Sha; Wang, Junyi; Kim, Chul Hong; Kwak, Hwanjong; Kim, Jong-Soo; Hwang, Seungwoo; Ko, Junsu; Kim, Chang-Bae; Kim, Sangtae; Bayarlkhagva, Damdin; Paek, Woon Kee; Kim, Seong-Jin; O'Brien, Stephen J; Wang, Jun; Bhak, Jong

    2013-01-01

    Tigers and their close relatives (Panthera) are some of the world's most endangered species. Here we report the de novo assembly of an Amur tiger whole-genome sequence as well as the genomic sequences of a white Bengal tiger, African lion, white African lion and snow leopard. Through comparative genetic analyses of these genomes, we find genetic signatures that may reflect molecular adaptations consistent with the big cats' hypercarnivorous diet and muscle strength. We report a snow leopard-specific genetic determinant in EGLN1 (Met39>Lys39), which is likely to be associated with adaptation to high altitude. We also detect a TYR260G>A mutation likely responsible for the white lion coat colour. Tiger and cat genomes show similar repeat composition and an appreciably conserved synteny. Genomic data from the five big cats provide an invaluable resource for resolving easily identifiable phenotypes evident in very close, but distinct, species.

  20. The tiger genome and comparative analysis with lion and snow leopard genomes

    PubMed Central

    Cho, Yun Sung; Hu, Li; Hou, Haolong; Lee, Hang; Xu, Jiaohui; Kwon, Soowhan; Oh, Sukhun; Kim, Hak-Min; Jho, Sungwoong; Kim, Sangsoo; Shin, Young-Ah; Kim, Byung Chul; Kim, Hyunmin; Kim, Chang-uk; Luo, Shu-Jin; Johnson, Warren E.; Koepfli, Klaus-Peter; Schmidt-Küntzel, Anne; Turner, Jason A.; Marker, Laurie; Harper, Cindy; Miller, Susan M.; Jacobs, Wilhelm; Bertola, Laura D.; Kim, Tae Hyung; Lee, Sunghoon; Zhou, Qian; Jung, Hyun-Ju; Xu, Xiao; Gadhvi, Priyvrat; Xu, Pengwei; Xiong, Yingqi; Luo, Yadan; Pan, Shengkai; Gou, Caiyun; Chu, Xiuhui; Zhang, Jilin; Liu, Sanyang; He, Jing; Chen, Ying; Yang, Linfeng; Yang, Yulan; He, Jiaju; Liu, Sha; Wang, Junyi; Kim, Chul Hong; Kwak, Hwanjong; Kim, Jong-Soo; Hwang, Seungwoo; Ko, Junsu; Kim, Chang-Bae; Kim, Sangtae; Bayarlkhagva, Damdin; Paek, Woon Kee; Kim, Seong-Jin; O’Brien, Stephen J.; Wang, Jun; Bhak, Jong

    2013-01-01

    Tigers and their close relatives (Panthera) are some of the world’s most endangered species. Here we report the de novo assembly of an Amur tiger whole-genome sequence as well as the genomic sequences of a white Bengal tiger, African lion, white African lion and snow leopard. Through comparative genetic analyses of these genomes, we find genetic signatures that may reflect molecular adaptations consistent with the big cats’ hypercarnivorous diet and muscle strength. We report a snow leopard-specific genetic determinant in EGLN1 (Met39>Lys39), which is likely to be associated with adaptation to high altitude. We also detect a TYR260G>A mutation likely responsible for the white lion coat colour. Tiger and cat genomes show similar repeat composition and an appreciably conserved synteny. Genomic data from the five big cats provide an invaluable resource for resolving easily identifiable phenotypes evident in very close, but distinct, species. PMID:24045858

  1. Comparative genomics of drug resistance in Trypanosoma brucei rhodesiense.

    PubMed

    Graf, Fabrice E; Ludin, Philipp; Arquint, Christian; Schmidt, Remo S; Schaub, Nadia; Kunz Renggli, Christina; Munday, Jane C; Krezdorn, Jessica; Baker, Nicola; Horn, David; Balmer, Oliver; Caccone, Adalgisa; de Koning, Harry P; Mäser, Pascal

    2016-09-01

    Trypanosoma brucei rhodesiense is one of the causative agents of human sleeping sickness, a fatal disease that is transmitted by tsetse flies and restricted to Sub-Saharan Africa. Here we investigate two independent lines of T. b. rhodesiense that have been selected with the drugs melarsoprol and pentamidine over the course of 2 years, until they exhibited stable cross-resistance to an unprecedented degree. We apply comparative genomics and transcriptomics to identify the underlying mutations. Only a few mutations have become fixed during selection. Three genes were affected by mutations in both lines: the aminopurine transporter AT1, the aquaporin AQP2, and the RNA-binding protein UBP1. The melarsoprol-selected line carried a large deletion including the adenosine transporter gene AT1, whereas the pentamidine-selected line carried a heterozygous point mutation in AT1, G430R, which rendered the transporter non-functional. Both resistant lines had lost AQP2, and both lines carried the same point mutation, R131L, in the RNA-binding motif of UBP1. The finding that concomitant deletion of the known resistance genes AT1 and AQP2 in T. b. brucei failed to phenocopy the high levels of resistance of the T. b. rhodesiense mutants indicated a possible role of UBP1 in melarsoprol-pentamidine cross-resistance. However, homozygous in situ expression of UBP1-Leu(131) in T. b. brucei did not affect the sensitivity to melarsoprol or pentamidine.

  2. Soybean (Glycine max) SWEET gene family: insights through comparative genomics, transcriptome profiling and whole genome re-sequence analysis.

    PubMed

    Patil, Gunvant; Valliyodan, Babu; Deshmukh, Rupesh; Prince, Silvas; Nicander, Bjorn; Zhao, Mingzhe; Sonah, Humira; Song, Li; Lin, Li; Chaudhary, Juhi; Liu, Yang; Joshi, Trupti; Xu, Dong; Nguyen, Henry T

    2015-07-11

    SWEET (MtN3_saliva) domain proteins, a recently identified group of efflux transporters, play an indispensable role in sugar efflux, phloem loading, plant-pathogen interaction and reproductive tissue development. The SWEET gene family is predominantly studied in Arabidopsis and members of the family are being investigated in rice. To date, no transcriptome or genomics analysis of soybean SWEET genes has been reported. In the present investigation, we explored the evolutionary aspect of the SWEET gene family in diverse plant species, ranging from primitive single-cell algae to angiosperms, with a major emphasis on Glycine max. Evolutionary features showed expansion and duplication of the SWEET gene family in land plants. Homology searches with BLAST tools and Hidden Markov Model-directed sequence alignments identified 52 SWEET genes that were mapped to 15 chromosomes in the soybean genome as tandem duplication events. Soybean SWEET (GmSWEET) genes showed a wide range of expression profiles in different tissues and developmental stages. Analysis of public transcriptome data and expression profiling using quantitative real time PCR (qRT-PCR) showed that a majority of the GmSWEET genes were confined to reproductive tissue development. Several natural genetic variants (non-synonymous SNPs, premature stop codons and haplotype) were identified in the GmSWEET genes using whole genome re-sequencing data analysis of 106 soybean genotypes. A significant association was observed between SNP-haplogroup and seed sucrose content in three gene clusters on chromosome 6. The present investigation utilized comparative genomics, transcriptome profiling and whole genome re-sequencing approaches, provided a systematic description of soybean SWEET genes, and identified putative candidates with probable roles in reproductive tissue development. Gene expression profiling at different developmental stages and genomic variation data will serve as an important resource for the soybean research

  3. Low-pass sequencing for microbial comparative genomics

    PubMed Central

    Goo, Young Ah; Roach, Jared; Glusman, Gustavo; Baliga, Nitin S; Deutsch, Kerry; Pan, Min; Kennedy, Sean; DasSarma, Shiladitya; Victor Ng, Wailap; Hood, Leroy

    2004-01-01

    Background We studied four extremely halophilic archaea by low-pass shotgun sequencing: (1) the metabolically versatile Haloarcula marismortui; (2) the non-pigmented Natrialba asiatica; (3) the psychrophile Halorubrum lacusprofundi and (4) the Dead Sea isolate Halobaculum gomorrense. Approximately one thousand single pass genomic sequences per genome were obtained. The data were analyzed by comparative genomic analyses using the completed Halobacterium sp. NRC-1 genome as a reference. Low-pass shotgun sequencing is a simple, inexpensive, and rapid approach that can readily be performed on any cultured microbe. Results As expected, the four archaeal halophiles analyzed exhibit both bacterial and eukaryotic characteristics as well as uniquely archaeal traits. All five halophiles exhibit greater than sixty percent GC content and low isoelectric points (pI) for their predicted proteins. Multiple insertion sequence (IS) elements, often involved in genome rearrangements, were identified in H. lacusprofundi and H. marismortui. The core biological functions that govern cellular and genetic mechanisms of H. sp. NRC-1 appear to be conserved in these four other halophiles. Multiple TATA box binding protein (TBP) and transcription factor IIB (TFB) homologs were identified from most of the four shotgunned halophiles. The reconstructed molecular tree of all five halophiles shows a large divergence between these species, but with the closest relationship being between H. sp. NRC-1 and H. lacusprofundi. Conclusion Despite the diverse habitats of these species, all five halophiles share (1) high GC content and (2) low protein isoelectric points, which are characteristics associated with environmental exposure to UV radiation and hypersalinity, respectively. Identification of multiple IS elements in the genome of H. lacusprofundi and H. marismortui suggests that genome structure and dynamic genome reorganization might be similar to that previously observed in the IS-element rich

  4. Low-pass sequencing for microbial comparative genomics.

    PubMed

    Goo, Young Ah; Roach, Jared; Glusman, Gustavo; Baliga, Nitin S; Deutsch, Kerry; Pan, Min; Kennedy, Sean; DasSarma, Shiladitya; Ng, Wailap Victor; Hood, Leroy

    2004-01-12

    We studied four extremely halophilic archaea by low-pass shotgun sequencing: (1) the metabolically versatile Haloarcula marismortui; (2) the non-pigmented Natrialba asiatica; (3) the psychrophile Halorubrum lacusprofundi and (4) the Dead Sea isolate Halobaculum gomorrense. Approximately one thousand single pass genomic sequences per genome were obtained. The data were analyzed by comparative genomic analyses using the completed Halobacterium sp. NRC-1 genome as a reference. Low-pass shotgun sequencing is a simple, inexpensive, and rapid approach that can readily be performed on any cultured microbe. As expected, the four archaeal halophiles analyzed exhibit both bacterial and eukaryotic characteristics as well as uniquely archaeal traits. All five halophiles exhibit greater than sixty percent GC content and low isoelectric points (pI) for their predicted proteins. Multiple insertion sequence (IS) elements, often involved in genome rearrangements, were identified in H. lacusprofundi and H. marismortui. The core biological functions that govern cellular and genetic mechanisms of H. sp. NRC-1 appear to be conserved in these four other halophiles. Multiple TATA box binding protein (TBP) and transcription factor IIB (TFB) homologs were identified from most of the four shotgunned halophiles. The reconstructed molecular tree of all five halophiles shows a large divergence between these species, but with the closest relationship being between H. sp. NRC-1 and H. lacusprofundi. Despite the diverse habitats of these species, all five halophiles share (1) high GC content and (2) low protein isoelectric points, which are characteristics associated with environmental exposure to UV radiation and hypersalinity, respectively. Identification of multiple IS elements in the genome of H. lacusprofundi and H. marismortui suggests that genome structure and dynamic genome reorganization might be similar to that previously observed in the IS-element rich genome of H. sp. NRC-1
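
    The GC content figure quoted in the abstract above is a simple base-composition ratio. As a minimal sketch (not the authors' pipeline), it could be computed from a set of single-pass reads as follows; the function name `gc_content` and the example reads are hypothetical, and ambiguous bases such as N are assumed to be excluded from the denominator.

```python
def gc_content(reads):
    """Return the pooled GC fraction (0..1) over an iterable of DNA reads.

    Assumption: only unambiguous bases (A, C, G, T) are counted, so N and
    other IUPAC ambiguity codes do not dilute the ratio.
    """
    gc = 0
    total = 0
    for read in reads:
        for base in read.upper():
            if base in "GC":
                gc += 1
                total += 1
            elif base in "AT":
                total += 1
    if total == 0:
        raise ValueError("no unambiguous bases found")
    return gc / total


# Hypothetical reads, used only to illustrate the call.
example_reads = ["GGCCGCAT", "gcgcNNat"]
fraction = gc_content(example_reads)
```

    Pooling counts across reads, rather than averaging per-read ratios, weights each base equally, which seems the more natural choice when reads differ in length.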

  5. Comparative genomics boosts target prediction for bacterial small RNAs.

    PubMed

    Wright, Patrick R; Richter, Andreas S; Papenfort, Kai; Mann, Martin; Vogel, Jörg; Hess, Wolfgang R; Backofen, Rolf; Georg, Jens

    2013-09-10

    Small RNAs (sRNAs) constitute a large and heterogeneous class of bacterial gene expression regulators. Much like eukaryotic microRNAs, these sRNAs typically target multiple mRNAs through short seed pairing, thereby acting as global posttranscriptional regulators. In some bacteria, evidence for hundreds to possibly more than 1,000 different sRNAs has been obtained by transcriptome sequencing. However, the experimental identification of possible targets and, therefore, their confirmation as functional regulators of gene expression has remained laborious. Here, we present a strategy that integrates phylogenetic information to predict sRNA targets at the genomic scale and reconstructs regulatory networks upon functional enrichment and network analysis (CopraRNA, for Comparative Prediction Algorithm for sRNA Targets). Furthermore, CopraRNA precisely predicts the sRNA domains for target recognition and interaction. When applied to several model sRNAs, CopraRNA revealed additional targets and functions for the sRNAs CyaR, FnrS, RybB, RyhB, SgrS, and Spot42. Moreover, the mRNAs gdhA, lrp, marA, nagZ, ptsI, sdhA, and yobF-cspC were suggested as regulatory hubs targeted by up to seven different sRNAs. The verification of many previously undetected targets by CopraRNA, even for extensively investigated sRNAs, demonstrates its advantages and shows that CopraRNA-based analyses can compete with experimental target prediction approaches. A Web interface allows high-confidence target prediction and efficient classification of bacterial sRNAs.

  6. Comparative genomics boosts target prediction for bacterial small RNAs

    PubMed Central

    Wright, Patrick R.; Richter, Andreas S.; Papenfort, Kai; Mann, Martin; Vogel, Jörg; Hess, Wolfgang R.; Backofen, Rolf; Georg, Jens

    2013-01-01

    Small RNAs (sRNAs) constitute a large and heterogeneous class of bacterial gene expression regulators. Much like eukaryotic microRNAs, these sRNAs typically target multiple mRNAs through short seed pairing, thereby acting as global posttranscriptional regulators. In some bacteria, evidence for hundreds to possibly more than 1,000 different sRNAs has been obtained by transcriptome sequencing. However, the experimental identification of possible targets and, therefore, their confirmation as functional regulators of gene expression has remained laborious. Here, we present a strategy that integrates phylogenetic information to predict sRNA targets at the genomic scale and reconstructs regulatory networks upon functional enrichment and network analysis (CopraRNA, for Comparative Prediction Algorithm for sRNA Targets). Furthermore, CopraRNA precisely predicts the sRNA domains for target recognition and interaction. When applied to several model sRNAs, CopraRNA revealed additional targets and functions for the sRNAs CyaR, FnrS, RybB, RyhB, SgrS, and Spot42. Moreover, the mRNAs gdhA, lrp, marA, nagZ, ptsI, sdhA, and yobF-cspC were suggested as regulatory hubs targeted by up to seven different sRNAs. The verification of many previously undetected targets by CopraRNA, even for extensively investigated sRNAs, demonstrates its advantages and shows that CopraRNA-based analyses can compete with experimental target prediction approaches. A Web interface allows high-confidence target prediction and efficient classification of bacterial sRNAs. PMID:23980183

  7. Initial sequence and comparative analysis of the cat genome

    PubMed Central

    Pontius, Joan U.; Mullikin, James C.; Smith, Douglas R.; Lindblad-Toh, Kerstin; Gnerre, Sante; Clamp, Michele; Chang, Jean; Stephens, Robert; Neelam, Beena; Volfovsky, Natalia; Schäffer, Alejandro A.; Agarwala, Richa; Narfström, Kristina; Murphy, William J.; Giger, Urs; Roca, Alfred L.; Antunes, Agostinho; Menotti-Raymond, Marilyn; Yuhki, Naoya; Pecon-Slattery, Jill; Johnson, Warren E.; Bourque, Guillaume; Tesler, Glenn; O’Brien, Stephen J.

    2007-01-01

    The genome sequence (1.9-fold coverage) of an inbred Abyssinian domestic cat was assembled, mapped, and annotated with a comparative approach that involved cross-reference to annotated genome assemblies of six mammals (human, chimpanzee, mouse, rat, dog, and cow). The results resolved chromosomal positions for 663,480 contigs, 20,285 putative feline gene orthologs, and 133,499 conserved sequence blocks (CSBs). Additional annotated features include repetitive elements, endogenous retroviral sequences, nuclear mitochondrial (numt) sequences, micro-RNAs, and evolutionary breakpoints that suggest historic balancing of translocation and inversion incidences in distinct mammalian lineages. Large numbers of single nucleotide polymorphisms (SNPs), deletion insertion polymorphisms (DIPs), and short tandem repeats (STRs), suitable for linkage or association studies were characterized in the context of long stretches of chromosome homozygosity. In spite of the light coverage capturing ∼65% of euchromatin sequence from the cat genome, these comparative insights shed new light on the tempo and mode of gene/genome evolution in mammals, promise several research applications for the cat, and also illustrate that a comparative approach using more deeply covered mammals provides an informative, preliminary annotation of a light (1.9-fold) coverage mammal genome sequence. PMID:17975172

  8. Comparative genomic analysis of two brucellaphages of distant origins.

    PubMed

    Flores, Victor; López-Merino, Ahidé; Mendoza-Hernandez, Guillermo; Guarneros, Gabriel

    2012-04-01

    Here, we present the first complete genome sequence of brucellaphage Tbilisi (Tb) and compare it with that of Pr, a broad host-range brucellaphage recently isolated in Mexico. The genomes consist of 41,148 bp (Tb) and 38,253 bp (Pr); they differ mainly in the region encoding structural proteins, in which the genome of Tb shows two major insertions. Both genomes share 99.87% nucleotide identity, a high percentage of identity among phages isolated at such globally distant locations and on temporally different occasions. Sequence analysis revealed 57 conserved ORFs, three transcriptional terminators and four putative transcriptional promoters. The co-occurrence of an ORF encoding a putative DnaA-like protein and a putative oriC-like origin of replication was found in both brucellaphage genomes, a feature not described in any other phage genome. These elements suggest that DNA replication in brucellaphages differs from other phages, and might resemble that of bacterial chromosomes. Copyright © 2012 Elsevier Inc. All rights reserved.
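
    The abstract above does not say how the nucleotide identity was calculated. One common convention, shown here only as an illustrative sketch, is to count matching positions over the aligned columns of a pairwise alignment and to skip columns where either sequence has a gap (so large insertions do not count as mismatches). The function name `percent_identity`, the gap character and the toy sequences are assumptions made for the example.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over gap-free columns of two pre-aligned sequences.

    Assumption: both inputs come from the same pairwise alignment and
    therefore have equal length; columns containing a gap are ignored.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == gap or y == gap:
            continue  # insertion/deletion column: not scored
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return 100.0 * matches / compared


# Toy alignment: one mismatch in four gap-free columns.
identity = percent_identity("ACGT--", "ACGATT")
```

    Other conventions divide by the full alignment length or by the shorter sequence, which would give lower values for genomes that differ by large insertions.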

  9. Sequencing and comparative analyses of the genomes of zoysiagrasses

    PubMed Central

    Tanaka, Hidenori; Hirakawa, Hideki; Kosugi, Shunichi; Nakayama, Shinobu; Ono, Akiko; Watanabe, Akiko; Hashiguchi, Masatsugu; Gondo, Takahiro; Ishigaki, Genki; Muguerza, Melody; Shimizu, Katsuya; Sawamura, Noriko; Inoue, Takayasu; Shigeki, Yuichi; Ohno, Naoki; Tabata, Satoshi; Akashi, Ryo; Sato, Shusei

    2016-01-01

    Zoysia is a warm-season turfgrass, which comprises 11 allotetraploid species (2n = 4x = 40), each possessing different morphological and physiological traits. To characterize the genetic systems of Zoysia plants and to analyse their structural and functional differences in individual species and accessions, we sequenced the genomes of Zoysia species using HiSeq and MiSeq platforms. As a reference sequence of Zoysia species, we generated a high-quality draft sequence of the genome of Z. japonica accession ‘Nagirizaki’ (334 Mb) in which 59,271 protein-coding genes were predicted. In parallel, draft genome sequences of Z. matrella ‘Wakaba’ and Z. pacifica ‘Zanpa’ were also generated for comparative analyses. To investigate the genetic diversity among the Zoysia species, genome sequence reads of three additional accessions, Z. japonica ‘Kyoto’, Z. japonica ‘Miyagi’ and Z. matrella ‘Chiba Fair Green’, were accumulated, and aligned against the reference genome of ‘Nagirizaki’ along with those from ‘Wakaba’ and ‘Zanpa’. As a result, we detected 7,424,163 single-nucleotide polymorphisms and 852,488 short indels among these species. The information obtained in this study will be valuable for basic studies on zoysiagrass evolution and genetics as well as for the breeding of zoysiagrasses, and is made available in the ‘Zoysia Genome Database’ at http://zoysia.kazusa.or.jp. PMID:26975196

  10. Comparative Genomics and stx Phage Characterization of LEE-Negative Shiga Toxin-Producing Escherichia coli

    PubMed Central

    Steyert, Susan R.; Sahl, Jason W.; Fraser, Claire M.; Teel, Louise D.; Scheutz, Flemming; Rasko, David A.

    2012-01-01

    Infections by Escherichia coli and Shigella species are among the leading causes of death due to diarrheal disease in the world. Shiga toxin-producing E. coli (STEC) that do not encode the locus of enterocyte effacement (LEE-negative STEC) often possess Shiga toxin gene variants and have been isolated from humans and a variety of animal sources. In this study, we compare the genomes of nine LEE-negative STEC harboring various stx alleles with four complete reference LEE-positive STEC isolates. Compared to a representative collection of prototype E. coli and Shigella isolates representing each of the pathotypes, the whole genome phylogeny demonstrated that these isolates are diverse. Whole genome comparative analysis of the 13 genomes revealed that in addition to the absence of the LEE pathogenicity island, phage-encoded genes, including non-LEE encoded effectors, were absent from all nine LEE-negative STEC genomes. Several plasmid-encoded virulence factors reportedly identified in LEE-negative STEC isolates were identified in only a subset of the nine LEE-negative isolates, further confirming the diversity of this group. In combination with whole genome analysis, we characterized the lambdoid phages harboring the various stx alleles and determined their genomic insertion sites. Although the integrase gene sequence corresponded with genomic location, it was not correlated with stx variant, further highlighting the mosaic nature of these phages. The transcription of these phages in different genomic backgrounds was examined. Expression of the Shiga toxin genes, stx1 and/or stx2, as well as the Q genes, was examined with quantitative reverse transcriptase polymerase chain reaction assays. A wide range of basal and induced toxin expression was observed. Overall, this is a first significant foray into the genome space of this unexplored group of emerging and divergent pathogens. PMID:23162798

  11. Comparative analysis of rosaceous genomes and the reconstruction of a putative ancestral genome for the family

    PubMed Central

    2011-01-01

    Background Comparative genome mapping studies in Rosaceae have been conducted until now by aligning genetic maps within the same genus, or closely related genera and using a limited number of common markers. The growing body of genomics resources and sequence data for both Prunus and Fragaria permits detailed comparisons between these genera and the recently released Malus × domestica genome sequence. Results We generated a comparative analysis using 806 molecular markers that are anchored genetically to the Prunus and/or Fragaria reference maps, and physically to the Malus genome sequence. Markers in common for Malus and Prunus, and Malus and Fragaria, respectively were 784 and 148. The correspondence between marker positions was high and conserved syntenic blocks were identified among the three genera in the Rosaceae. We reconstructed a proposed ancestral genome for the Rosaceae. Conclusions A genome containing nine chromosomes is the most likely candidate for the ancestral Rosaceae progenitor. The number of chromosomal translocations observed between the three genera investigated was low. However, the number of inversions identified among Malus and Prunus was much higher than any reported genome comparisons in plants, suggesting that small inversions have played an important role in the evolution of these two genera or of the Rosaceae. PMID:21226921

  12. Comparative and demographic analysis of orang-utan genomes.

    PubMed

    Locke, Devin P; Hillier, LaDeana W; Warren, Wesley C; Worley, Kim C; Nazareth, Lynne V; Muzny, Donna M; Yang, Shiaw-Pyng; Wang, Zhengyuan; Chinwalla, Asif T; Minx, Pat; Mitreva, Makedonka; Cook, Lisa; Delehaunty, Kim D; Fronick, Catrina; Schmidt, Heather; Fulton, Lucinda A; Fulton, Robert S; Nelson, Joanne O; Magrini, Vincent; Pohl, Craig; Graves, Tina A; Markovic, Chris; Cree, Andy; Dinh, Huyen H; Hume, Jennifer; Kovar, Christie L; Fowler, Gerald R; Lunter, Gerton; Meader, Stephen; Heger, Andreas; Ponting, Chris P; Marques-Bonet, Tomas; Alkan, Can; Chen, Lin; Cheng, Ze; Kidd, Jeffrey M; Eichler, Evan E; White, Simon; Searle, Stephen; Vilella, Albert J; Chen, Yuan; Flicek, Paul; Ma, Jian; Raney, Brian; Suh, Bernard; Burhans, Richard; Herrero, Javier; Haussler, David; Faria, Rui; Fernando, Olga; Darré, Fleur; Farré, Domènec; Gazave, Elodie; Oliva, Meritxell; Navarro, Arcadi; Roberto, Roberta; Capozzi, Oronzo; Archidiacono, Nicoletta; Della Valle, Giuliano; Purgato, Stefania; Rocchi, Mariano; Konkel, Miriam K; Walker, Jerilyn A; Ullmer, Brygg; Batzer, Mark A; Smit, Arian F A; Hubley, Robert; Casola, Claudio; Schrider, Daniel R; Hahn, Matthew W; Quesada, Victor; Puente, Xose S; Ordoñez, Gonzalo R; López-Otín, Carlos; Vinar, Tomas; Brejova, Brona; Ratan, Aakrosh; Harris, Robert S; Miller, Webb; Kosiol, Carolin; Lawson, Heather A; Taliwal, Vikas; Martins, André L; Siepel, Adam; Roychoudhury, Arindam; Ma, Xin; Degenhardt, Jeremiah; Bustamante, Carlos D; Gutenkunst, Ryan N; Mailund, Thomas; Dutheil, Julien Y; Hobolth, Asger; Schierup, Mikkel H; Ryder, Oliver A; Yoshinaga, Yuko; de Jong, Pieter J; Weinstock, George M; Rogers, Jeffrey; Mardis, Elaine R; Gibbs, Richard A; Wilson, Richard K

    2011-01-27

    'Orang-utan' is derived from a Malay term meaning 'man of the forest' and aptly describes the southeast Asian great apes native to Sumatra and Borneo. The orang-utan species, Pongo abelii (Sumatran) and Pongo pygmaeus (Bornean), are the most phylogenetically distant great apes from humans, thereby providing an informative perspective on hominid evolution. Here we present a Sumatran orang-utan draft genome assembly and short read sequence data from five Sumatran and five Bornean orang-utan genomes. Our analyses reveal that, compared to other primates, the orang-utan genome has many unique features. Structural evolution of the orang-utan genome has proceeded much more slowly than other great apes, evidenced by fewer rearrangements, less segmental duplication, a lower rate of gene family turnover and surprisingly quiescent Alu repeats, which have played a major role in restructuring other primate genomes. We also describe a primate polymorphic neocentromere, found in both Pongo species, emphasizing the gradual evolution of orang-utan genome structure. Orang-utans have extremely low energy usage for a eutherian mammal, far lower than their hominid relatives. Adding their genome to the repertoire of sequenced primates illuminates new signals of positive selection in several pathways including glycolipid metabolism. From the population perspective, both Pongo species are deeply diverse; however, Sumatran individuals possess greater diversity than their Bornean counterparts, and more species-specific variation. Our estimate of Bornean/Sumatran speciation time, 400,000 years ago, is more recent than most previous studies and underscores the complexity of the orang-utan speciation process. Despite a smaller modern census population size, the Sumatran effective population size (N(e)) expanded exponentially relative to the ancestral N(e) after the split, while Bornean N(e) declined over the same period. Overall, the resources and analyses presented here offer new

  13. Comparative Analysis of Six Lagerstroemia Complete Chloroplast Genomes

    PubMed Central

    Xu, Chao; Dong, Wenpan; Li, Wenqing; Lu, Yizeng; Xie, Xiaoman; Jin, Xiaobai; Shi, Jipu; He, Kaihong; Suo, Zhili

    2017-01-01

    Crape myrtles are economically important ornamental trees of the genus Lagerstroemia L. (Lythraceae), with a distribution from tropical to northern temperate zones. They are positioned phylogenetically in a large subclade of rosids (in the eudicots) which contains more than 25% of all the angiosperms. They commonly bloom from summer till fall and are of significant value in city landscape and environmental protection. Morphological traits are shared inter-specifically among plants of Lagerstroemia to a certain extent and are also influenced by environmental conditions and different developmental stages. Thus, classification of plants in Lagerstroemia at species and cultivar levels is still a challenging task. Chloroplast (cp) genome sequences have been proven to be an informative and valuable source of cp DNA markers for genetic diversity evaluation. In this study, the complete cp genomes of three Lagerstroemia species were newly sequenced, and three other published cp genome sequences of Lagerstroemia were retrieved for comparative analyses in order to obtain an upgraded understanding of the application value of genetic information from the cp genomes. The six cp genomes ranged from 152,049 bp (L. subcostata) to 152,526 bp (L. speciosa) in length. We analyzed nucleotide substitutions, insertions/deletions, and simple sequence repeats in the cp genomes, and discovered 12 relatively highly variable regions that will potentially provide plastid markers for further taxonomic, phylogenetic, and population genetics studies in Lagerstroemia. The phylogenetic relationships of the Lagerstroemia taxa inferred from the datasets from the cp genomes obtained high support, indicating that cp genome data may be useful in resolving relationships in this genus. PMID:28154574

  14. Evolution of cancer suppression as revealed by mammalian comparative genomics.

    PubMed

    Tollis, Marc; Schiffman, Joshua D; Boddy, Amy M

    2017-02-02

    Cancer suppression is an important feature in the evolution of large and long-lived animals. While some tumor suppression pathways are conserved among all multicellular organisms, other mechanisms of cancer resistance are uniquely lineage specific. Comparative genomics has become a powerful tool to discover these unique and shared molecular adaptations with respect to cancer suppression. These findings may one day be translated to human patients through evolutionary medicine. Here, we will review theory and methods of comparative cancer genomics and highlight major findings of cancer suppression across mammals. Our current knowledge of cancer genomics suggests that more efficient DNA repair and higher sensitivity to DNA damage may be the key to tumor suppression in large or long-lived mammals.

  15. Genome Properties: a system for the investigation of prokaryotic genetic content for microbiology, genome annotation and comparative genomics.

    PubMed

    Haft, Daniel H; Selengut, Jeremy D; Brinkac, Lauren M; Zafar, Nikhat; White, Owen

    2005-02-01

    The presence or absence of metabolic pathways and structures provides a context that makes protein annotation far more reliable. Compiling such information across microbial genomes improves the functional classification of proteins and provides a valuable resource for comparative genomics. We have created a Genome Properties system to present key aspects of prokaryotic biology using standardized computational methods and controlled vocabularies. Properties reflect gene content, phenotype, phylogeny and computational analyses. The results of searches using hidden Markov models allow many properties to be deduced automatically, especially for families of proteins (equivalogs) conserved in function since their last common ancestor. Additional properties are derived from curation, published reports and other forms of evidence. The Genome Properties system was applied to 156 complete prokaryotic genomes, and is easily mined to find differences between species, correlations between metabolic features and families of uncharacterized proteins, or relationships among properties. Genome Properties can be found at http://www.tigr.org/Genome_Properties and http://www.tigr.org/tigr-scripts/CMR2/genome_properties_references.spl.

  16. Genome informatics and vaccine targets in Corynebacterium urealyticum using two whole genomes, comparative genomics, and reverse vaccinology.

    PubMed

    Guimarães, Luis; Soares, Siomar; Trost, Eva; Blom, Jochen; Ramos, Rommel; Silva, Artur; Barh, Debmalya; Azevedo, Vasco

    2015-01-01

    Corynebacterium urealyticum is an opportunistic pathogen that normally lives on skin and mucous membranes in humans. This high Gram-positive bacterium can cause acute or encrusted cystitis, encrusted pyelitis, and pyelonephritis in immunocompromised patients. The bacterium is multi-drug resistant, and knowledge about the genes that contribute to its virulence is very limited. Two complete genome sequences were used in this comparative genomic study: C. urealyticum DSM 7109 and C. urealyticum DSM 7111. We used comparative genomics strategies to compare the two strains, to analyze their metabolic pathways and genome plasticity, and to predict putative antigenic targets. The genomes of these two strains together encode 2,115 non-redundant coding sequences, 1,823 of which are common to both genomes. We identified 188 strain-specific genes in DSM 7109 and 104 strain-specific genes in DSM 7111. The high number of strain-specific genes may be a result of horizontal gene transfer triggered by the large number of transposons in the genomes of these two strains. Screening for virulence factors revealed the presence of the spaDEF operon that encodes pili-forming proteins. Therefore, spaDEF may play a pivotal role in facilitating the adhesion of the pathogen to the host tissue. Application of the reverse vaccinology method revealed 19 putative antigenic proteins that may be used in future studies as candidate drug or vaccine targets. The genome features and the presence of virulence factors in genomic islands in the two strains of C. urealyticum provide insights into the lifestyle of this opportunistic pathogen and may be useful in developing future therapeutic strategies.
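
    The gene counts reported in this abstract are internally consistent: 1,823 shared coding sequences plus 188 and 104 strain-specific ones give the 2,115 non-redundant total. A minimal Python sketch of that bookkeeping follows. The function names `partition_genes` and `pan_genome_size` and the toy gene identifiers are illustrative assumptions, not part of the study's pipeline, and real analyses decide whether two genes are "the same" through orthology detection rather than exact identifier matching.

```python
def partition_genes(genes_a, genes_b):
    """Split two gene collections into shared (core) and strain-specific sets.

    Assumption: genes are already mapped to comparable identifiers
    (for example, ortholog group names), so plain set operations apply.
    """
    a, b = set(genes_a), set(genes_b)
    return {
        "core": a & b,      # present in both strains
        "only_a": a - b,    # specific to the first strain
        "only_b": b - a,    # specific to the second strain
        "pan": a | b,       # non-redundant union of both
    }


def pan_genome_size(core, only_a, only_b):
    """Non-redundant gene count implied by core and strain-specific counts."""
    return core + only_a + only_b


# Counts taken from the abstract: core = 1,823; DSM 7109 = 188; DSM 7111 = 104.
print(pan_genome_size(1823, 188, 104))  # 2115

# Toy identifiers (hypothetical) to show the set-based partition.
parts = partition_genes(["g1", "g2", "g3"], ["g2", "g3", "g4"])
print(sorted(parts["core"]), sorted(parts["only_a"]), sorted(parts["only_b"]))
```

    The same identity (pan = core + specific to A + specific to B) is a quick sanity check for any two-genome comparison.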

  17. Comparative genomic analysis of the compound Brassica napus Rf locus.

    PubMed

    Gaborieau, Lydiane; Brown, Gregory G

    2016-10-26

    The plant trait of cytoplasmically-inherited male sterility (CMS) and its suppression by nuclear restorer-of-fertility (Rf) genes can be viewed as a genetic arms race between the mitochondrial and nuclear genomes. Most nuclear Rf genes have been shown to encode P-type pentatricopeptide repeat proteins (PPRs). Phylogenetic analysis of P-class PPRs from sequenced plant genomes has shown that Rf-proteins cluster in a distinct clade of P-class PPRs, RFL-PPRs, that display hallmarks of positive evolutionary selection. Genes encoding RFL-PPRs (RFLs) within a given plant genome tend to be closely related both in sequence and position, but a detailed understanding of how such species-specific expansion occurs is lacking. In the canola (oilseed rape) species Brassica napus, previous work has indicated that the nuclear restorer genes for the two native forms of CMS, Rfn (for nap CMS) and Rfp (for pol CMS), represent alternate haplotypes, or alleles, of a single nuclear locus. Fine genetic mapping indicates that Rfn does indeed localize to the same genomic region as Rfp. We find this region is enriched in RFL genes, three of which, based on their position and expression, represent potential candidates for Rfn; one of these genes, designated PPR4, is a preferred candidate in that it is not expressed in the nap CMS line. Comparison of the corresponding regions of the genomes of B. rapa, B. oleracea, Arabidopsis thaliana and A. lyrata provides insight into the expansion of this group of RFL genes in different lines of evolutionary descent. Unlike other nuclear restorer loci containing multiple RFL genes, the RFL genes in the Rf region of B. napus are not present in tandem arrays but rather are dispersed in genomic location. The genes do not share similar flanking non-coding regions and do not contain introns, indicating that they have duplicated primarily through a retrotransposition-mediated process. In contrast, segmental duplication has been responsible for the distribution of the

  18. Whole genomic DNA sequencing and comparative genomic analysis of Arthrospira platensis: high genome plasticity and genetic diversity

    PubMed Central

    Xu, Teng; Qin, Song; Hu, Yongwu; Song, Zhijian; Ying, Jianchao; Li, Peizhen; Dong, Wei; Zhao, Fangqing; Yang, Huanming; Bao, Qiyu

    2016-01-01

    Arthrospira platensis is a multi-cellular and filamentous non-N2-fixing cyanobacterium that is capable of performing oxygenic photosynthesis. In this study, we determined the nearly complete genome sequence of A. platensis YZ. The A. platensis YZ genome is a single, circular chromosome of 6.62 Mb in size. Phylogenetic and comparative genomic analyses revealed that A. platensis YZ was more closely related to A. platensis NIES-39 than to Arthrospira sp. PCC 8005 and A. platensis C1. Broad gene gains were identified between A. platensis YZ and three other Arthrospira species, some of which have previously been shown to be laterally transferable among different species, such as genes coding for restriction-modification systems. Moreover, unprecedented extensive chromosomal rearrangements among different strains were observed. The chromosomal rearrangements, particularly the chromosomal inversions, were analysed and estimated to be closely related to palindromes that involved long inverted repeat sequences and the extensively distributed type IIR restriction enzyme in the Arthrospira genome. In addition, species from the genus Arthrospira unanimously contained the highest rate of repetitive sequence compared with the other species of the order Oscillatoriales, suggesting that sequence duplication significantly contributed to Arthrospira genome phylogeny. These results provide in-depth views into the genomic phylogeny and structural variation of A. platensis, as well as a valuable resource for functional genomics studies. PMID:27330141

  19. Comparative genomics of wild type yeast strains unveils important genome diversity

    PubMed Central

    Carreto, Laura; Eiriz, Maria F; Gomes, Ana C; Pereira, Patrícia M; Schuller, Dorit; Santos, Manuel AS

    2008-01-01

    Background Genome variability generates phenotypic heterogeneity and is of relevance for adaptation to environmental change, but the extent of such variability in natural populations is still poorly understood. For example, selected Saccharomyces cerevisiae strains are variable at the ploidy level, have gene amplifications, changes in chromosome copy number, and gross chromosomal rearrangements. This suggests that genome plasticity provides important genetic diversity upon which natural selection mechanisms can operate. Results In this study, we have used wild-type S. cerevisiae (yeast) strains to investigate genome variation in natural and artificial environments. We have used comparative genome hybridization on array (aCGH) to characterize the genome variability of 16 yeast strains, of laboratory and commercial origin, isolated from vineyards and wine cellars, and from opportunistic human infections. Interestingly, sub-telomeric instability was associated with the clinical phenotype, while Ty element insertion regions determined genomic differences of natural wine fermentation strains. Copy number depletion of ASP3 and YRF1 genes was found in all wild-type strains. Other gene families involved in transmembrane transport, sugar and alcohol metabolism or drug resistance had copy number changes, which also distinguished wine from clinical isolates. Conclusion We have isolated and genotyped more than 1000 yeast strains from natural environments and carried out an aCGH analysis of 16 strains representative of distinct genotype clusters. Important genomic variability was identified between these strains, in particular in sub-telomeric regions and in Ty-element insertion sites, suggesting that this type of genome variability is the main source of genetic diversity in natural populations of yeast. 
The data highlights the usefulness of yeast as a model system to unravel intraspecific natural genome diversity and to elucidate how natural selection shapes the yeast genome.

  20. Whole genome annotation and comparative genomic analyses of bio-control fungus Purpureocillium lilacinum.

    PubMed

    Prasad, Pushplata; Varshney, Deepti; Adholeya, Alok

    2015-11-25

    The fungus Purpureocillium lilacinum is widely known as a biological control agent against plant parasitic nematodes. This research article consists of genomic annotation of the first draft of the whole genome sequence of P. lilacinum. The study aims to decipher the putative genetic components of the fungus involved in nematode pathogenesis by performing comparative genomic analysis with nine closely related fungal species in Hypocreales. De novo genomic assembly was done and a total of 301 scaffolds were constructed for P. lilacinum genomic DNA. By employing structural genome prediction models, 13,266 genes coding for proteins were predicted in the genome. Approximately 73% of the predicted genes were functionally annotated using Blastp, InterProScan and Gene Ontology. A 14.7% fraction of the predicted genes shared significant homology with genes in the Pathogen Host Interactions (PHI) database. The phylogenomic analysis carried out using the maximum likelihood RAxML algorithm provided insight into the evolutionary relationship of P. lilacinum. In congruence with other closely related species in the Hypocreales, namely Metarhizium spp., Pochonia chlamydosporia, Cordyceps militaris, Trichoderma reesei and Fusarium spp., P. lilacinum has large gene sets coding for G-protein coupled receptors (GPCRs), proteases, glycoside hydrolases and carbohydrate esterases that are required for degradation of nematode-egg shell components. Screening of the genome by the Antibiotics & Secondary Metabolite Analysis Shell (AntiSMASH) pipeline indicated that the genome potentially codes for a variety of secondary metabolites, possibly required for adaptation to the heterogeneous lifestyles reported for P. lilacinum. Significant up-regulation of subtilisin-like serine protease genes in the presence of nematode eggs in quantitative real-time analyses suggested a potential role of serine proteases in nematode pathogenesis. The data offer a better understanding of Purpureocillium lilacinum genome and will

  1. Using comparative genomics to drive new discoveries in microbiology.

    PubMed

    Haft, Daniel H

    2015-02-01

    Bioinformatics looks to many microbiologists like a service industry. In this view, annotation starts with what is known from experiments in the lab, makes reasonable inferences of which genes match other genes in function, builds databases to make all that we know accessible, but creates nothing truly new. Experiments lead, then biocuration and computational biology follow. But the astounding success of genome sequencing is changing the annotation paradigm. Every genome sequenced is an intercepted coded message from the microbial world, and as all cryptographers know, it is easier to decode a thousand messages than a single message. Some biology is best discovered not by phenomenology, but by decoding genome content, forming hypotheses, and doing the first few rounds of validation computationally. Through such reasoning, a role and function may be assigned to a protein with no sequence similarity to any protein yet studied. Experimentation can follow after the discovery to cement and to extend the findings. Unfortunately, this approach remains so unfamiliar to most bench scientists that lab work and comparative genomics typically segregate to different teams working on unconnected projects. This review will discuss several themes in comparative genomics as a discovery method, including highly derived data, use of patterns of design to reason by analogy, and in silico testing of computationally generated hypotheses.

  2. CFGP: a web-based, comparative fungal genomics platform.

    PubMed

    Park, Jongsun; Park, Bongsoo; Jung, Kyongyong; Jang, Suwang; Yu, Kwangyul; Choi, Jaeyoung; Kong, Sunghyung; Park, Jaejin; Kim, Seryun; Kim, Hyojeong; Kim, Soonok; Kim, Jihyun F; Blair, Jaime E; Lee, Kwangwon; Kang, Seogchan; Lee, Yong-Hwan

    2008-01-01

    Since the completion of the Saccharomyces cerevisiae genome sequencing project in 1996, the genomes of over 80 fungal species have been sequenced or are currently being sequenced. Resulting data provide opportunities for studying and comparing fungal biology and evolution at the genome level. To support such studies, the Comparative Fungal Genomics Platform (CFGP; http://cfgp.snu.ac.kr), a web-based multifunctional informatics workbench, was developed. The CFGP comprises three layers, including the basal layer, middleware and the user interface. The data warehouse in the basal layer contains standardized genome sequences of 65 fungal species. The middleware processes queries via six analysis tools, including BLAST, ClustalW, InterProScan, SignalP 3.0, PSORT II and a newly developed tool named BLASTMatrix. The BLASTMatrix permits the identification and visualization of genes homologous to a query across multiple species. The Data-driven User Interface (DUI) of the CFGP was built on a new concept of pre-collecting data and post-executing analysis instead of the 'fill-in-the-form-and-press-SUBMIT' user interfaces utilized by most bioinformatics sites. A tool termed Favorite, which supports the management of encapsulated sequence data and provides a personalized data repository to users, is another novel feature in the DUI.

  3. Comparative genomics and evolution of transcriptional regulons in Proteobacteria

    PubMed Central

    Kazakov, Alexey E.; Ravcheev, Dmitry A.; Stepanova, Vita V.; Novichkov, Pavel S.

    2016-01-01

    Comparative genomics approaches are broadly used for analysis of transcriptional regulation in bacterial genomes. In this work, we identified binding sites and reconstructed regulons for 33 orthologous groups of transcription factors (TFs) in 196 reference genomes from 21 taxonomic groups of Proteobacteria. Overall, we predicted over 10 600 TF binding sites and identified more than 15 600 target genes for 1896 TFs constituting the studied orthologous groups of regulators. These include a set of orthologues for 21 metabolism-associated TFs from Escherichia coli and/or Shewanella that are conserved in five or more taxonomic groups and several additional TFs that represent non-orthologous substitutions of the metabolic regulators in some lineages of Proteobacteria. By comparing gene contents of the reconstructed regulons, we identified the core, taxonomy-specific and genome-specific TF regulon members and classified them by their metabolic functions. Detailed analysis of ArgR, TyrR, TrpR, HutC, HypR and other amino-acid-specific regulons demonstrated remarkable differences in regulatory strategies used by various lineages of Proteobacteria. The obtained genomic collection of in silico reconstructed TF regulons contains a large number of new regulatory interactions that await future experimental validation. The collection provides a framework for future evolutionary studies of transcriptional regulatory networks in Bacteria. It can also be used for functional annotation of putative metabolic transporters and enzymes that are abundant in the reconstructed regulons. PMID:28348857

  4. Comparative genomics and evolution of transcriptional regulons in Proteobacteria.

    PubMed

    Leyn, Semen A; Suvorova, Inna A; Kazakov, Alexey E; Ravcheev, Dmitry A; Stepanova, Vita V; Novichkov, Pavel S; Rodionov, Dmitry A

    2016-07-01

    Comparative genomics approaches are broadly used for analysis of transcriptional regulation in bacterial genomes. In this work, we identified binding sites and reconstructed regulons for 33 orthologous groups of transcription factors (TFs) in 196 reference genomes from 21 taxonomic groups of Proteobacteria. Overall, we predicted over 10 600 TF binding sites and identified more than 15 600 target genes for 1896 TFs constituting the studied orthologous groups of regulators. These include a set of orthologues for 21 metabolism-associated TFs from Escherichia coli and/or Shewanella that are conserved in five or more taxonomic groups and several additional TFs that represent non-orthologous substitutions of the metabolic regulators in some lineages of Proteobacteria. By comparing gene contents of the reconstructed regulons, we identified the core, taxonomy-specific and genome-specific TF regulon members and classified them by their metabolic functions. Detailed analysis of ArgR, TyrR, TrpR, HutC, HypR and other amino-acid-specific regulons demonstrated remarkable differences in regulatory strategies used by various lineages of Proteobacteria. The obtained genomic collection of in silico reconstructed TF regulons contains a large number of new regulatory interactions that await future experimental validation. The collection provides a framework for future evolutionary studies of transcriptional regulatory networks in Bacteria. It can also be used for functional annotation of putative metabolic transporters and enzymes that are abundant in the reconstructed regulons.

  5. Comparative Whole-Genome Mapping To Determine Staphylococcus aureus Genome Size, Virulence Motifs, and Clonality

    PubMed Central

    Shukla, Sanjay K.; Pantrangi, Madhulatha; Stahl, Buffy; Briska, Adam M.; Stemper, Mary E.; Wagner, Trevor K.; Zentz, Emily B.; Callister, Steven M.; Lovrich, Steven D.; Henkhaus, John K.; Dykes, Colin W.

    2012-01-01

    Despite being a clonal pathogen, Staphylococcus aureus continues to acquire virulence and antibiotic-resistant genes located on mobile genetic elements such as genomic islands, prophages, pathogenicity islands, and the staphylococcal chromosomal cassette mec (SCCmec) by horizontal gene transfer from other staphylococci. The potential virulence of an S. aureus strain is often determined by comparing its pulsed-field gel electrophoresis (PFGE) or multilocus sequence typing profiles to those of known epidemic or virulent clones and by PCR of the toxin genes. Whole-genome mapping (formerly optical mapping), which is a high-resolution ordered restriction mapping of a bacterial genome, is a relatively new genomic tool that allows comparative analysis across entire bacterial genomes to identify regions of genomic similarities and dissimilarities, including small and large insertions and deletions. We explored whether whole-genome maps (WGMs) of methicillin-resistant S. aureus (MRSA) could be used to predict the presence of methicillin resistance, SCCmec type, and Panton-Valentine leukocidin (PVL)-producing genes on an S. aureus genome. We determined the WGMs of 47 diverse clinical isolates of S. aureus, including well-characterized reference MRSA strains, and annotated the signature restriction pattern in SCCmec types, arginine catabolic mobile element (ACME), and PVL-carrying prophage, PhiSa2 or PhiSa2-like regions on the genome. WGMs of these isolates accurately characterized them as MRSA or methicillin-sensitive S. aureus based on the presence or absence of the SCCmec motif, ACME and the unique signature pattern for the prophage insertion that harbored the PVL genes. Susceptibility to methicillin resistance and the presence of mecA, SCCmec types, and PVL genes were confirmed by PCR. A WGM clustering approach was further able to discriminate isolates within the same PFGE clonal group. These results showed that WGMs could be used not only to genotype S. aureus but also to

  6. Comparative whole-genome mapping to determine Staphylococcus aureus genome size, virulence motifs, and clonality.

    PubMed

    Shukla, Sanjay K; Pantrangi, Madhulatha; Stahl, Buffy; Briska, Adam M; Stemper, Mary E; Wagner, Trevor K; Zentz, Emily B; Callister, Steven M; Lovrich, Steven D; Henkhaus, John K; Dykes, Colin W

    2012-11-01

    Despite being a clonal pathogen, Staphylococcus aureus continues to acquire virulence and antibiotic-resistant genes located on mobile genetic elements such as genomic islands, prophages, pathogenicity islands, and the staphylococcal chromosomal cassette mec (SCCmec) by horizontal gene transfer from other staphylococci. The potential virulence of an S. aureus strain is often determined by comparing its pulsed-field gel electrophoresis (PFGE) or multilocus sequence typing profiles to those of known epidemic or virulent clones and by PCR of the toxin genes. Whole-genome mapping (formerly optical mapping), which is a high-resolution ordered restriction mapping of a bacterial genome, is a relatively new genomic tool that allows comparative analysis across entire bacterial genomes to identify regions of genomic similarities and dissimilarities, including small and large insertions and deletions. We explored whether whole-genome maps (WGMs) of methicillin-resistant S. aureus (MRSA) could be used to predict the presence of methicillin resistance, SCCmec type, and Panton-Valentine leukocidin (PVL)-producing genes on an S. aureus genome. We determined the WGMs of 47 diverse clinical isolates of S. aureus, including well-characterized reference MRSA strains, and annotated the signature restriction pattern in SCCmec types, arginine catabolic mobile element (ACME), and PVL-carrying prophage, PhiSa2 or PhiSa2-like regions on the genome. WGMs of these isolates accurately characterized them as MRSA or methicillin-sensitive S. aureus based on the presence or absence of the SCCmec motif, ACME and the unique signature pattern for the prophage insertion that harbored the PVL genes. Susceptibility to methicillin resistance and the presence of mecA, SCCmec types, and PVL genes were confirmed by PCR. A WGM clustering approach was further able to discriminate isolates within the same PFGE clonal group. These results showed that WGMs could be used not only to genotype S. aureus but also to

  7. Comparing thousands of circular genomes using the CGView Comparison Tool

    PubMed Central

    2012-01-01

    Background Continued sequencing efforts coupled with advances in sequencing technology will lead to the completion of a vast number of small genomes. Whole-genome comparisons represent an important part of the analysis of any new genome sequence, as they can provide a better understanding of the biology and evolution of the source organism. Visualization of the results is important, as it allows information from a variety of sources to be integrated and interpreted. However, existing graphical comparison tools lack features needed for efficiently comparing a new genome to hundreds or thousands of existing sequences. Moreover, existing tools are limited in terms of the types of comparisons that can be performed, the extent to which the output can be customized, and the ease with which the entire process can be automated. Results The CGView Comparison Tool (CCT) is a package for visually comparing bacterial, plasmid, chloroplast, or mitochondrial sequences of interest to existing genomes or sequence collections. The comparisons are conducted using BLAST, and the BLAST results are presented in the form of graphical maps that can also show sequence features, gene and protein names, COG (Clusters of Orthologous Groups of proteins) category assignments, and sequence composition characteristics. CCT can generate maps in a variety of sizes, including 400 Megapixel maps suitable for posters. Comparisons can be conducted within a particular species or genus, or all available genomes can be used. The entire map creation process, from downloading sequences to redrawing zoomed maps, can be completed easily using scripts included with the CCT. User-defined features or analysis results can be included on maps, and maps can be extensively customized. To simplify program setup, a CCT virtual machine that includes all dependencies preinstalled is available. Detailed tutorials illustrating the use of CCT are included with the CCT documentation. 
Conclusion CCT can be used to visually

  8. Inferring divergence of context-dependent substitution rates in Drosophila genomes with applications to comparative genomics.

    PubMed

    Chachick, Ran; Tanay, Amos

    2012-07-01

    Nucleotide substitution is a major evolutionary driving force that can incrementally and stochastically give rise to broad divergence patterns among species. The substitution process at each genomic position is frequently modeled independently of the other positions, although complex interactions between nearby bases are known to significantly affect mutation rates. Here, we study the evolution of 12 fly genomes using new algorithms for accurate inference of parameter-rich substitution models. By comparing models between lineages, we reveal the evolutionary histories of substitution rates at different flanking nucleotide contexts. We demonstrate these driving forces of molecular evolution to be constantly changing, suggesting that neutral drift of mutation rates is an important factor in the evolution of genomes and their sequence composition. This observation is used to develop a scalable approach for parameter-rich comparative genomics. By screening short DNA sequences, we demonstrate how homeoboxes and other transcription factor binding motifs are highly conserved based on our parameter-rich models but not according to standard conservation assays. With the increasing availability of genome sequences, rich substitution models become an attractive and practical approach for evolutionary analysis in general and comparative genomics in particular.

  9. Genome-Based Comparative Analyses of Antarctic and Temperate Species of Paenibacillus

    PubMed Central

    Dsouza, Melissa; Taylor, Michael W.; Turner, Susan J.; Aislabie, Jackie

    2014-01-01

    Antarctic soils represent a unique environment characterised by extremes of temperature, salinity, elevated UV radiation, low nutrient and low water content. Despite the harshness of this environment, members of 15 bacterial phyla have been identified in soils of the Ross Sea Region (RSR). However, the survival mechanisms and ecological roles of these phyla are largely unknown. The aim of this study was to investigate whether strains of Paenibacillus darwinianus owe their resilience to substantial genomic changes. For this, genome-based comparative analyses were performed on three P. darwinianus strains, isolated from gamma-irradiated RSR soils, together with nine temperate, soil-dwelling Paenibacillus spp. The genome of each strain was sequenced to over 1,000-fold coverage, then assembled into contigs totalling approximately 3 Mbp per genome. Based on the occurrence of essential, single-copy genes, genome completeness was estimated at approximately 88%. Genome analysis revealed between 3,043–3,091 protein-coding sequences (CDSs), primarily associated with two-component systems, sigma factors, transporters, sporulation and genes induced by cold-shock, oxidative and osmotic stresses. These comparative analyses provide an insight into the metabolic potential of P. darwinianus, revealing potential adaptive mechanisms for survival in Antarctic soils. However, a large proportion of these mechanisms were also identified in temperate Paenibacillus spp., suggesting that these mechanisms are beneficial for growth and survival in a range of soil environments. These analyses have also revealed that the P. darwinianus genomes contain significantly fewer CDSs and have a lower paralogous content. Notwithstanding the incompleteness of the assemblies, the large differences in genome sizes, determined by the number of genes in paralogous clusters and the CDS content, are indicative of genome content scaling. Finally, these sequences are a resource for further investigations into

  10. Genome-based comparative analyses of Antarctic and temperate species of Paenibacillus.

    PubMed

    Dsouza, Melissa; Taylor, Michael W; Turner, Susan J; Aislabie, Jackie

    2014-01-01

    Antarctic soils represent a unique environment characterised by extremes of temperature, salinity, elevated UV radiation, low nutrient and low water content. Despite the harshness of this environment, members of 15 bacterial phyla have been identified in soils of the Ross Sea Region (RSR). However, the survival mechanisms and ecological roles of these phyla are largely unknown. The aim of this study was to investigate whether strains of Paenibacillus darwinianus owe their resilience to substantial genomic changes. For this, genome-based comparative analyses were performed on three P. darwinianus strains, isolated from gamma-irradiated RSR soils, together with nine temperate, soil-dwelling Paenibacillus spp. The genome of each strain was sequenced to over 1,000-fold coverage, then assembled into contigs totalling approximately 3 Mbp per genome. Based on the occurrence of essential, single-copy genes, genome completeness was estimated at approximately 88%. Genome analysis revealed between 3,043-3,091 protein-coding sequences (CDSs), primarily associated with two-component systems, sigma factors, transporters, sporulation and genes induced by cold-shock, oxidative and osmotic stresses. These comparative analyses provide an insight into the metabolic potential of P. darwinianus, revealing potential adaptive mechanisms for survival in Antarctic soils. However, a large proportion of these mechanisms were also identified in temperate Paenibacillus spp., suggesting that these mechanisms are beneficial for growth and survival in a range of soil environments. These analyses have also revealed that the P. darwinianus genomes contain significantly fewer CDSs and have a lower paralogous content. Notwithstanding the incompleteness of the assemblies, the large differences in genome sizes, determined by the number of genes in paralogous clusters and the CDS content, are indicative of genome content scaling. Finally, these sequences are a resource for further investigations into

  11. Statistical methods for detecting genomic alterations through array-based comparative genomic hybridization (CGH).

    PubMed

    Wang, Yuedong; Guo, Sun-Wei

    2004-01-01

    Array-based comparative genomic hybridization (ABCGH) is an emerging high-resolution and high-throughput molecular genetic technique that allows genome-wide screening for chromosome alterations associated with tumorigenesis. Like the cDNA microarrays, ABCGH uses two differentially labeled test and reference DNAs which are cohybridized to cloned genomic fragments immobilized on glass slides. The hybridized DNAs are then detected in two different fluorochromes, and the significant deviation from unity in the ratios of the digitized intensity values is indicative of copy-number differences between the test and reference genomes. Proper statistical analyses need to account for many sources of variation besides genuine differences between the two genomes. In particular, spatial correlations, the variable nature of the ratio variance and non-Normal distribution call for careful statistical modeling. We propose two new statistics, the standard t-statistic and its modification with variances smoothed along the genome, and two tests for each statistic, the standard t-test and a test based on the hybrid adaptive spline (HAS). Simulations indicate that the smoothed t-statistic always improves the performance over the standard t-statistic. The t-tests are more powerful in detecting isolated alterations while those based on HAS are more powerful in detecting a cluster of alterations. We apply the proposed methods to the identification of genomic alterations in endometrium in women with endometriosis.
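
    The idea of a t-statistic whose variance is smoothed along the genome can be illustrated with a minimal sketch. It assumes replicate log-ratios per clone, ordered by genomic position, and uses a simple moving average of neighbouring variances; the abstract does not say which smoother the authors use, so the `window` parameter and the function name `t_statistics` are illustrative assumptions, not the published method.

```python
import math
from statistics import mean, variance


def t_statistics(log_ratios, window=0):
    """One-sample t-statistic per clone, testing mean log-ratio == 0.

    log_ratios: list of replicate lists, ordered by genomic position
                (each clone needs at least two replicates).
    window:     number of neighbouring clones on each side whose variances
                are averaged (assumed moving-average smoother);
                window=0 gives the standard t-statistic.
    """
    variances = [variance(reps) for reps in log_ratios]  # sample variance per clone
    stats = []
    for i, reps in enumerate(log_ratios):
        lo = max(0, i - window)
        hi = min(len(variances), i + window + 1)
        smoothed_var = mean(variances[lo:hi])
        # Note: raises ZeroDivisionError if the (smoothed) variance is exactly 0.
        stats.append(mean(reps) / math.sqrt(smoothed_var / len(reps)))
    return stats


# Toy data (hypothetical): the middle clone has an unusually small variance.
clones = [[0.0, 2.0], [0.9, 1.1], [0.0, 2.0]]
standard = t_statistics(clones)            # per-clone variance
smoothed = t_statistics(clones, window=1)  # variance averaged over neighbours
```

    In this toy case the smoothed statistic for the middle clone is much smaller than the standard one, because its low variance is pooled with the noisier neighbours.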

  12. What can whole genome expression data tell us about the ecology and evolution of personality?

    PubMed

    Bell, Alison M; Aubin-Horth, Nadia

    2010-12-27

    Consistent individual differences in behaviour, aka personality, pose several evolutionary questions. For example, it is difficult to explain within-individual consistency in behaviour because behavioural plasticity is often advantageous. In addition, selection erodes heritable behavioural variation that is related to fitness, therefore we wish to know the mechanisms that can maintain between-individual variation in behaviour. In this paper, we argue that whole genome expression data can reveal new insights into the proximate mechanisms underlying personality, as well as its evolutionary consequences. After introducing the basics of whole genome expression analysis, we show how whole genome expression data can be used to understand whether behaviours in different contexts are affected by the same molecular mechanisms. We suggest strategies for using the power of genomics to understand what maintains behavioural variation, to study the evolution of behavioural correlations and to compare personality traits across diverse organisms.

  13. What can whole genome expression data tell us about the ecology and evolution of personality?

    PubMed Central

    Bell, Alison M.; Aubin-Horth, Nadia

    2010-01-01

    Consistent individual differences in behaviour, aka personality, pose several evolutionary questions. For example, it is difficult to explain within-individual consistency in behaviour because behavioural plasticity is often advantageous. In addition, selection erodes heritable behavioural variation that is related to fitness, therefore we wish to know the mechanisms that can maintain between-individual variation in behaviour. In this paper, we argue that whole genome expression data can reveal new insights into the proximate mechanisms underlying personality, as well as its evolutionary consequences. After introducing the basics of whole genome expression analysis, we show how whole genome expression data can be used to understand whether behaviours in different contexts are affected by the same molecular mechanisms. We suggest strategies for using the power of genomics to understand what maintains behavioural variation, to study the evolution of behavioural correlations and to compare personality traits across diverse organisms. PMID:21078652

  14. fPoxDB: fungal peroxidase database for comparative genomics.

    PubMed

    Choi, Jaeyoung; Détry, Nicolas; Kim, Ki-Tae; Asiegbu, Fred O; Valkonen, Jari P T; Lee, Yong-Hwan

    2014-05-08

    analysis toolkits with easy-to-follow web interface offer a useful workbench to study comparative and evolutionary genomics of peroxidases in fungi.

  15. African relapsing Fever borreliae genomospecies revealed by comparative genomics.

    PubMed

    Elbir, Haitham; Abi-Rached, Laurent; Pontarotti, Pierre; Yoosuf, Niyaz; Drancourt, Michel

    2014-01-01

    Relapsing fever borreliae are vector-borne bacteria responsible for febrile infection in humans in North America, Africa, Asia, and in the Iberian Peninsula in Europe. Relapsing fever borreliae are phylogenetically closely related, yet they differ in pathogenicity and vectors. Their long-term taxonomy, based on geography and vector grouping, needs to be reappraised in a genomic context. We therefore embarked on genomic analyses of relapsing fever borreliae, focusing on species found in Africa. Genome-wide phylogenetic analyses group Old World Borrelia crocidurae, Borrelia hispanica, B. duttonii, and B. recurrentis in one clade, and New World Borrelia turicatae and Borrelia hermsii in a second clade. Accordingly, average nucleotide identity is 99% among B. duttonii, B. recurrentis, and B. crocidurae and 96% between the latter borreliae and B. hispanica, while the similarity is 86% between Old World and New World borreliae. Comparative genomics indicates that the Old World relapsing fever B. duttonii, B. recurrentis, B. crocidurae, and B. hispanica have a 2,514-gene pan genome and a 933-gene core genome that includes 788 chromosomal and 145 plasmidic genes. Analyzing the role that natural selection has played in the evolution of Old World borreliae species revealed that 55 loci were under positive diversifying selection, including loci coding for membrane, flagellar, and chemotaxis proteins, three categories associated with adaptation to specific niches. Genomic analyses led to a reappraisal of the taxonomy of relapsing fever borreliae in Africa. These analyses suggest that B. crocidurae, B. duttonii, and B. recurrentis are ecotypes of a unique genomospecies, while B. hispanica is a distinct species.

  16. African Relapsing Fever Borreliae Genomospecies Revealed by Comparative Genomics

    PubMed Central

    Elbir, Haitham; Abi-Rached, Laurent; Pontarotti, Pierre; Yoosuf, Niyaz; Drancourt, Michel

    2014-01-01

    Background: Relapsing fever borreliae are vector-borne bacteria responsible for febrile infection in humans in North America, Africa, Asia, and in the Iberian Peninsula in Europe. Relapsing fever borreliae are phylogenetically closely related, yet they differ in pathogenicity and vectors. Their long-term taxonomy, based on geography and vector grouping, needs to be reappraised in a genomic context. We therefore embarked on genomic analyses of relapsing fever borreliae, focusing on species found in Africa. Results: Genome-wide phylogenetic analyses group Old World Borrelia crocidurae, Borrelia hispanica, B. duttonii, and B. recurrentis in one clade, and New World Borrelia turicatae and Borrelia hermsii in a second clade. Accordingly, average nucleotide identity is 99% among B. duttonii, B. recurrentis, and B. crocidurae and 96% between the latter borreliae and B. hispanica, while the similarity is 86% between Old World and New World borreliae. Comparative genomics indicates that the Old World relapsing fever B. duttonii, B. recurrentis, B. crocidurae, and B. hispanica have a 2,514-gene pan genome and a 933-gene core genome that includes 788 chromosomal and 145 plasmidic genes. Analyzing the role that natural selection has played in the evolution of Old World borreliae species revealed that 55 loci were under positive diversifying selection, including loci coding for membrane, flagellar, and chemotaxis proteins, three categories associated with adaptation to specific niches. Conclusion: Genomic analyses led to a reappraisal of the taxonomy of relapsing fever borreliae in Africa. These analyses suggest that B. crocidurae, B. duttonii, and B. recurrentis are ecotypes of a unique genomospecies, while B. hispanica is a distinct species. PMID:25229054
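
    The pan-genome and core-genome counts quoted above follow from simple set operations once genes have been grouped into families. The sketch below assumes each genome is already represented as a set of gene-family identifiers; the genome names and family IDs are hypothetical toy values, and the orthology clustering that would produce them is not shown.

```python
def pan_and_core(genomes):
    """Return (pan, core) gene-family sets.

    genomes: dict mapping genome name -> set of gene-family identifiers.
    pan  = families present in at least one genome (union).
    core = families present in every genome (intersection).
    """
    gene_sets = list(genomes.values())
    if not gene_sets:
        return set(), set()
    pan = set().union(*gene_sets)
    core = set.intersection(*gene_sets)
    return pan, core


# Hypothetical toy input: three genomes, five gene families.
toy = {
    "genome_A": {"fam1", "fam2", "fam3"},
    "genome_B": {"fam1", "fam2", "fam4"},
    "genome_C": {"fam1", "fam5"},
}
pan, core = pan_and_core(toy)
```

    Here the pan genome contains all five families and the core genome only `fam1`; in a real analysis the result depends heavily on how families are defined and on assembly completeness.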

  17. Bidirectional promoters of insects: genome-wide comparison, evolutionary implication and influence on gene expression.

    PubMed

    Behura, Susanta K; Severson, David W

    2015-01-30

    Bidirectional promoters are widespread in insect genomes. By analyzing 23 insect genomes we show that the frequency of bidirectional gene pairs varies according to genome compactness and density of genes among the species. The density of bidirectional genes expected based on number of genes per megabase of genome explains the observed density, suggesting that bidirectional pairing of genes may be due to a random event. We identified specific transcription factor binding motifs that are enriched in bidirectional promoters across insect species. Furthermore, we observed that bidirectional promoters may act as transcriptional hotspots in insect genomes where protein coding genes tend to aggregate in a significantly biased (p < 0.001) manner compared to unidirectional promoters. Natural selection seems to have an association with the extent of bidirectionality of genes among the species. The rate of non-synonymous-to-synonymous changes (dN/dS) shows a second-order polynomial distribution with bidirectionality between species, indicating that bidirectionality is dependent upon evolutionary pressure acting on the genomes. Analysis of genome-wide microarray expression data of multiple insect species suggested that bidirectionality has a similar association with transcriptome variation across species. Furthermore, bidirectional promoters show significant association with correlated expression of the divergent gene pairs depending upon their motif composition. Analysis of gene ontology showed that bidirectional genes tend to have a common association with functions related to "binding" (including ion binding, nucleotide binding and protein binding) across genomes. Such functional constraint of bidirectional genes may explain their widespread persistence in the genomes of diverse insect species.

  18. Comparative Analysis of Whole-Genome Gene Expression Changes in Cultured Human Embryonic Stem Cells in Response to Low, Clinical Diagnostic Relevant, and High Doses of Ionizing Radiation Exposure.

    PubMed

    Sokolov, Mykyta; Nguyen, Van; Neumann, Ronald

    2015-06-30

    The biological effects of low-dose ionizing radiation (LDIR) exposure in humans are not comprehensively understood, generating a high degree of controversy in published literature. The earliest stages of human development are known to be among the most sensitive to stress exposures, especially genotoxic stresses. However, the risks stemming from exposure to LDIR, particularly within the clinical diagnostic relevant dose range, have not been directly evaluated in human embryonic stem cells (hESCs). Here, we describe the dynamics of the whole genome transcriptional responses of different hESC lines to both LDIR and, as a reference, high-dose IR (HDIR). We found that even doses as low as 0.05 Gy could trigger statistically significant transient changes in a rather limited subset of genes in all hESC lines examined. Gene expression signatures of hESCs exposed to IR appear to be highly dose-, time-, and cell line-dependent. We identified 50 genes constituting a consensus gene expression signature as an early response to HDIR across all hESC lines examined. We observed substantial differences in biological pathways affected by either LDIR or HDIR in hESCs, suggesting that the molecular mechanisms underpinning the responses of hESCs may fundamentally differ depending on radiation doses.

  19. Phylogeny and comparative genome analysis of a Basidiomycete fungi

    SciTech Connect

    Riley, Robert W.; Salamov, Asaf; Grigoriev, Igor; Hibbett, David

    2011-03-14

    Fungi of the phylum Basidiomycota make up some 37 percent of the described fungi, and are important from the perspectives of forestry, agriculture, medicine, and bioenergy. This diverse phylum includes the mushrooms, wood rots, plant pathogenic rusts and smuts, and some human pathogens. To better understand these important fungi, we have undertaken a comparative genomic analysis of the Basidiomycetes with available sequenced genomes. We report a phylogeny that sheds light on previously unclear evolutionary relationships among the Basidiomycetes. We also define a 'core proteome' based on protein families conserved in all Basidiomycetes. We identify key expansions and contractions in protein families that may be responsible for the degradation of plant biomass such as cellulose, hemicellulose, and lignin. Finally, we speculate as to the genomic changes that drove such expansions and contractions.

  20. A web server for mining Comparative Genomic Hybridization (CGH) data

    NASA Astrophysics Data System (ADS)

    Liu, Jun; Ranka, Sanjay; Kahveci, Tamer

    2007-11-01

    Advances in cytogenetics and molecular biology have established that chromosomal alterations are critical in the pathogenesis of human cancer. Recurrent chromosomal alterations provide cytological and molecular markers for the diagnosis and prognosis of disease. They also facilitate the identification of genes that are important in carcinogenesis, which in the future may help in the development of targeted therapy. A large and growing amount of cancer genetic data is now publicly available. There is a need for public domain tools that allow users to analyze their data and visualize the results. This chapter describes a web-based software tool that will allow researchers to analyze and visualize Comparative Genomic Hybridization (CGH) datasets. It employs novel data mining methodologies for clustering and classification of CGH datasets as well as algorithms for identifying important markers (small sets of genomic intervals with aberrations) that are potentially cancer signatures. The developed software will help in understanding the relationships between genomic aberrations and cancer types.

  1. CyanoClust: comparative genome resources of cyanobacteria and plastids.

    PubMed

    Sasaki, Naobumi V; Sato, Naoki

    2010-01-01

    Cyanobacteria, which perform oxygen-evolving photosynthesis as do chloroplasts of plants and algae, are one of the best-studied prokaryotic phyla and one from which many representative genomes have been sequenced. Lack of a suitable comparative genomic database has been a problem in cyanobacterial genomics because many proteins involved in physiological functions such as photosynthesis and nitrogen fixation are not catalogued in commonly used databases, such as Clusters of Orthologous Proteins (COG). CyanoClust is a database of homolog groups in cyanobacteria and plastids that are produced by the program Gclust. We have developed a web-server system for the protein homology database featuring cyanobacteria and plastids. Database URL: http://cyanoclust.c.u-tokyo.ac.jp/.

  2. Comparative genomics of Neisseria meningitidis: core genome, islands of horizontal transfer and pathogen-specific genes.

    PubMed

    Dunning Hotopp, Julie C; Grifantini, Renata; Kumar, Nikhil; Tzeng, Yih Ling; Fouts, Derrick; Frigimelica, Elisabetta; Draghi, Monia; Giuliani, Marzia Monica; Rappuoli, Rino; Stephens, David S; Grandi, Guido; Tettelin, Hervé

    2006-12-01

    To better understand Neisseria meningitidis genomes and virulence, microarray comparative genome hybridization (mCGH) data were collected from one Neisseria cinerea, two Neisseria lactamica, two Neisseria gonorrhoeae and 48 Neisseria meningitidis isolates. For N. meningitidis, these isolates are from diverse clonal complexes, invasive and carriage strains, and all major serogroups. The microarray platform represented N. meningitidis strains MC58, Z2491 and FAM18, and N. gonorrhoeae FA1090. By comparing hybridization data to genome sequences, the core N. meningitidis genome and insertions/deletions (e.g. capsule locus, type I secretion system) related to pathogenicity were identified, including further characterization of the capsule locus, bioinformatics analysis of a type I secretion system, and identification of some metabolic pathways associated with intracellular survival in pathogens. Hybridization data clustered meningococcal isolates from similar clonal complexes that were distinguished by the differential presence of six distinct islands of horizontal transfer. Several of these islands contained prophage or other mobile elements, including a novel prophage and a transposon carrying portions of a type I secretion system. Acquisition of some genetic islands appears to have occurred in multiple lineages, including transfer between N. lactamica and N. meningitidis. However, island acquisition occurs infrequently, such that the genomic-level relationship is not obscured within clonal complexes. The N. meningitidis genome is characterized by the horizontal acquisition of multiple genetic islands; the study of these islands reveals important sets of genes varying between isolates and likely to be related to pathogenicity.

  3. Allelic genome structural variations in maize detected by array comparative genome hybridization.

    PubMed

    Beló, André; Beatty, Mary K; Hondred, David; Fengler, Kevin A; Li, Bailin; Rafalski, Antoni

    2010-01-01

    DNA polymorphisms such as insertion/deletions and duplications affecting genome segments larger than 1 kb are known as copy-number variations (CNVs) or structural variations (SVs). They have been recently studied in animals and humans by using array-comparative genome hybridization (aCGH), and have been associated with several human diseases. Their presence and phenotypic effects in plants have not been investigated on a genomic scale, although individual structural variations affecting traits have been described. We used aCGH to investigate the presence of CNVs in maize by comparing the genome of 13 maize inbred lines to B73. Analysis of hybridization signal ratios of 60,472 60-mer oligonucleotide probes between inbreds in relation to their location in the reference genome (B73) allowed us to identify clusters of probes that deviated from the ratio expected for equal copy-numbers. We found CNVs distributed along the maize genome in all chromosome arms. They occur with appreciable frequency in different germplasm subgroups, suggesting ancient origin. Validation of several CNV regions showed both insertion/deletions and copy-number differences. The nature of CNVs detected suggests CNVs might have a considerable impact on plant phenotypes, including disease response and heterosis.
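
    Identifying clusters of probes whose signal ratio deviates from the equal-copy-number expectation can be sketched as a run-finding pass over per-probe log2 ratios. The threshold, the minimum run length, and the function name `find_cnv_runs` are assumptions chosen for illustration; the abstract does not state the calling criteria used in the study.

```python
import math


def find_cnv_runs(test, ref, threshold=0.5, min_probes=3):
    """Return (start, end, direction) for runs of consecutive probes whose
    log2(test/ref) lies beyond +/- threshold; `end` is exclusive.

    test, ref: positive signal intensities, ordered by reference position.
    """
    calls = []
    for t, r in zip(test, ref):
        log_ratio = math.log2(t / r)
        calls.append(1 if log_ratio > threshold else -1 if log_ratio < -threshold else 0)

    runs, start = [], 0
    for i in range(1, len(calls) + 1):
        # Close the current run at the end of the data or when the call changes.
        if i == len(calls) or calls[i] != calls[start]:
            if calls[start] != 0 and i - start >= min_probes:
                runs.append((start, i, "gain" if calls[start] > 0 else "loss"))
            start = i
    return runs


# Hypothetical intensities for ten probes against a flat reference.
ref_signal = [100] * 10
test_signal = [100, 100, 210, 220, 205, 100, 40, 45, 42, 100]
cnv_runs = find_cnv_runs(test_signal, ref_signal)
```

    Requiring several adjacent probes to agree is one simple way to reduce calls driven by single noisy probes; real pipelines typically use dedicated segmentation methods instead.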

  4. Comparative analysis of methods for genome-wide nucleosome cartography.

    PubMed

    Quintales, Luis; Vázquez, Enrique; Antequera, Francisco

    2015-07-01

    Nucleosomes contribute to compacting the genome into the nucleus and regulate the physical access of regulatory proteins to DNA either directly or through the epigenetic modifications of the histone tails. Precise mapping of nucleosome positioning across the genome is, therefore, essential to understanding the genome regulation. In recent years, several experimental protocols have been developed for this purpose that include the enzymatic digestion, chemical cleavage or immunoprecipitation of chromatin followed by next-generation sequencing of the resulting DNA fragments. Here, we compare the performance and resolution of these methods from the initial biochemical steps through the alignment of the millions of short-sequence reads to a reference genome to the final computational analysis to generate genome-wide maps of nucleosome occupancy. Because of the lack of a unified protocol to process data sets obtained through the different approaches, we have developed a new computational tool (NUCwave), which facilitates their analysis, comparison and assessment and will enable researchers to choose the most suitable method for any particular purpose. NUCwave is freely available at http://nucleosome.usal.es/nucwave along with a step-by-step protocol for its use. © The Author 2014. Published by Oxford University Press.

  5. Comparative Genomics and the Gene Complement of a Minimal Cell

    NASA Astrophysics Data System (ADS)

    Islas, Sara; Becerra, Arturo; Luisi, P. Luigi; Lazcano, Antonio

    2004-02-01

    The concept of a minimal cell is discussed from the viewpoint of comparative genomics. Analysis of published DNA content values determined for 641 different archaeal and bacterial species by pulsed field gel electrophoresis has led to a more precise definition of the genome size ranges of free-living and host-associated organisms. DNA content is not an indicator of phylogenetic position. However, the smallest genomes in our sample do not have a random distribution in rRNA-based evolutionary trees, and are found mostly in (a) the basal branches of the tree where thermophiles are located; and (b) in late clades, such as those of Gram positive bacteria. While the smallest-known genome size for an endosymbiont is only 450 kb, no free-living prokaryote has been described with a genome <1450 kb. Estimates of the size of the minimal gene complement can provide important insights into the primary biological functions required for a sustainable, reproducing cell nowadays and throughout evolutionary times, but definitions of the minimal cell are dependent on specific environments.

  6. Sequencing and comparative genome analysis of two pathogenic Streptococcus gallolyticus subspecies: genome plasticity, adaptation and virulence.

    PubMed

    Lin, I-Hsuan; Liu, Tze-Tze; Teng, Yu-Ting; Wu, Hui-Lun; Liu, Yen-Ming; Wu, Keh-Ming; Chang, Chuan-Hsiung; Hsu, Ming-Ta

    2011-01-01

    Streptococcus gallolyticus infections in humans are often associated with bacteremia, infective endocarditis and colon cancers. The disease manifestations are different depending on the subspecies of S. gallolyticus causing the infection. Here, we present the complete genomes of S. gallolyticus ATCC 43143 (biotype I) and S. pasteurianus ATCC 43144 (biotype II.2). The genomic differences between the two biotypes were characterized with comparative genomic analyses. The chromosomes of ATCC 43143 and ATCC 43144 are 2.36 and 2.10 Mb in length and encode 2246 and 1869 CDS respectively. The organization and genomic contents of both genomes were most similar to the recently published S. gallolyticus UCN34, where 2073 (92%) and 1607 (86%) of the ATCC 43143 and ATCC 43144 CDS were conserved in UCN34 respectively. There are around 600 CDS conserved in all Streptococcus genomes, indicating the Streptococcus genus has a small core-genome (constituting around 30% of total CDS) and substantial evolutionary plasticity. We identified eight and five regions of genome plasticity in ATCC 43143 and ATCC 43144 respectively. Within these regions, several proteins were recognized to contribute to the fitness and virulence of each of the two subspecies. We have also predicted putative cell-surface associated proteins that could play a role in adherence to host tissues, leading to persistent infections causing sub-acute and chronic diseases in humans. This study showed evidence that S. gallolyticus still possesses genes making it suitable in a rumen environment, whereas the ability of S. pasteurianus to live in the rumen is reduced. The genome heterogeneity and genetic diversity among the two biotypes, especially membrane and lipoproteins, most likely contribute to the differences in the pathogenesis of the two S. gallolyticus biotypes and the type of disease an infected patient eventually develops.

  7. Sequencing and Comparative Genome Analysis of Two Pathogenic Streptococcus gallolyticus Subspecies: Genome Plasticity, Adaptation and Virulence

    PubMed Central

    Teng, Yu-Ting; Wu, Hui-Lun; Liu, Yen-Ming; Wu, Keh-Ming; Chang, Chuan-Hsiung; Hsu, Ming-Ta

    2011-01-01

    Streptococcus gallolyticus infections in humans are often associated with bacteremia, infective endocarditis and colon cancers. The disease manifestations are different depending on the subspecies of S. gallolyticus causing the infection. Here, we present the complete genomes of S. gallolyticus ATCC 43143 (biotype I) and S. pasteurianus ATCC 43144 (biotype II.2). The genomic differences between the two biotypes were characterized with comparative genomic analyses. The chromosomes of ATCC 43143 and ATCC 43144 are 2.36 and 2.10 Mb in length and encode 2246 and 1869 CDS respectively. The organization and genomic contents of both genomes were most similar to the recently published S. gallolyticus UCN34, where 2073 (92%) and 1607 (86%) of the ATCC 43143 and ATCC 43144 CDS were conserved in UCN34 respectively. There are around 600 CDS conserved in all Streptococcus genomes, indicating the Streptococcus genus has a small core-genome (constituting around 30% of total CDS) and substantial evolutionary plasticity. We identified eight and five regions of genome plasticity in ATCC 43143 and ATCC 43144 respectively. Within these regions, several proteins were recognized to contribute to the fitness and virulence of each of the two subspecies. We have also predicted putative cell-surface associated proteins that could play a role in adherence to host tissues, leading to persistent infections causing sub-acute and chronic diseases in humans. This study showed evidence that S. gallolyticus still possesses genes making it suitable in a rumen environment, whereas the ability of S. pasteurianus to live in the rumen is reduced. The genome heterogeneity and genetic diversity among the two biotypes, especially membrane and lipoproteins, most likely contribute to the differences in the pathogenesis of the two S. gallolyticus biotypes and the type of disease an infected patient eventually develops. PMID:21633709

  8. Comparative Analysis of Genome Sequences Covering the Seven Cronobacter Species

    PubMed Central

    Cummings, Craig A.; Shih, Rita; Degoricija, Lovorka; Rico, Alain; Brzoska, Pius; Hamby, Stephen E.; Masood, Naqash; Hariri, Sumyya; Sonbol, Hana; Chuzhanova, Nadia; McClelland, Michael; Furtado, Manohar R.; Forsythe, Stephen J.

    2012-01-01

    Background: Species of Cronobacter are widespread in the environment and are occasional food-borne pathogens associated with serious neonatal diseases, including bacteraemia, meningitis, and necrotising enterocolitis. The genus is composed of seven species: C. sakazakii, C. malonaticus, C. turicensis, C. dublinensis, C. muytjensii, C. universalis, and C. condimenti. Clinical cases are associated with three species, C. malonaticus, C. turicensis and, in particular, with C. sakazakii multilocus sequence type 4. Thus, it is plausible that virulence determinants have evolved in certain lineages. Methodology/Principal Findings: We generated high quality sequence drafts for eleven Cronobacter genomes representing the seven Cronobacter species, including an ST4 strain of C. sakazakii. Comparative analysis of these genomes together with the two publicly available genomes revealed Cronobacter has over 6,000 genes in one or more strains and over 2,000 genes shared by all Cronobacter. Considerable variation in the presence of traits such as type six secretion systems, metal resistance (tellurite, copper and silver), and adhesins were found. C. sakazakii is unique in the Cronobacter genus in encoding genes enabling the utilization of exogenous sialic acid which may have clinical significance. The C. sakazakii ST4 strain 701 contained additional genes as compared to other C. sakazakii but none of them were known specific virulence-related genes. Conclusions/Significance: Genome comparison revealed that pair-wise DNA sequence identity varies between 89 and 97% in the seven Cronobacter species, and also suggested various degrees of divergence. Sets of universal core genes and accessory genes unique to each strain were identified. These gene sequences can be used for designing genus/species specific detection assays. Genes encoding adhesins, T6SS, and metal resistance genes as well as prophages are found in only subsets of genomes and have contributed considerably to the variation of

  9. Lactobacillus paracasei Comparative Genomics: Towards Species Pan-Genome Definition and Exploitation of Diversity

    PubMed Central

    Smokvina, Tamara; Wels, Michiel; Polka, Justyna; Chervaux, Christian; Brisse, Sylvain; Boekhorst, Jos; Vlieg, Johan E. T. van Hylckama; Siezen, Roland J.

    2013-01-01

    Lactobacillus paracasei is a member of the normal human and animal gut microbiota and is used extensively in the food industry in starter cultures for dairy products or as probiotics. With the development of low-cost, high-throughput sequencing techniques it has become feasible to sequence many different strains of one species and to determine its “pan-genome”. We have sequenced the genomes of 34 different L. paracasei strains, and performed a comparative genomics analysis. We analysed genome synteny and content, focussing on the pan-genome, core genome and variable genome. Each genome was shown to contain around 2800–3100 protein-coding genes, and comparative analysis identified over 4200 ortholog groups that comprise the pan-genome of this species, of which about 1800 ortholog groups make up the conserved core. Several factors previously associated with host-microbe interactions such as pili, cell-envelope proteinase, hydrolases p40 and p75 or the capacity to produce short branched-chain fatty acids (bkd operon) are part of the L. paracasei core genome present in all analysed strains. The variome consists mainly of hypothetical proteins, phages, plasmids, transposon/conjugative elements, and known functions such as sugar metabolism, cell-surface proteins, transporters, CRISPR-associated proteins, and EPS biosynthesis proteins. An enormous variety and variability of sugar utilization gene cassettes were identified, with each strain harbouring between 25 and 53 cassettes, reflecting the high adaptability of L. paracasei to different niches. A phylogenomic tree was constructed based on total genome contents, and together with an analysis of horizontal gene transfer events we conclude that evolution of these L. paracasei strains is complex and not always related to niche adaptation. The results of this genome content comparison were used, together with high-throughput growth experiments on various carbohydrates, to perform gene-trait matching analysis, in order to

  10. Comparative genomics reveals diversity among xanthomonads infecting tomato and pepper

    PubMed Central

    2011-01-01

    Background: Bacterial spot of tomato and pepper is caused by four Xanthomonas species and is a major plant disease in warm humid climates. The four species are distinct from each other based on physiological and molecular characteristics. The genome sequence of strain 85-10, a member of one of the species, Xanthomonas euvesicatoria (Xcv) has been previously reported. To determine the relationship of the four species at the genome level and to investigate the molecular basis of their virulence and differing host ranges, draft genomic sequences of members of the other three species were determined and compared to strain 85-10. Results: We sequenced the genomes of X. vesicatoria (Xv) strain 1111 (ATCC 35937), X. perforans (Xp) strain 91-118 and X. gardneri (Xg) strain 101 (ATCC 19865). The genomes were compared with each other and with the previously sequenced Xcv strain 85-10. In addition, the molecular features were predicted that may be required for pathogenicity including the type III secretion apparatus, type III effectors, other secretion systems, quorum sensing systems, adhesins, extracellular polysaccharide, and lipopolysaccharide determinants. Several novel type III effectors from Xg strain 101 and Xv strain 1111 genomes were computationally identified and their translocation was validated using a reporter gene assay. A homolog to Ax21, the elicitor of XA21-mediated resistance in rice, and a functional Ax21 sulfation system were identified in Xcv. Genes encoding proteins with functions mediated by type II and type IV secretion systems have also been compared, including enzymes involved in cell wall deconstruction, as contributors to pathogenicity. Conclusions: Comparative genomic analyses revealed considerable diversity among bacterial spot pathogens, providing new insights into differences and similarities that may explain the diverse nature of these strains. Genes specific to pepper pathogens, such as the O-antigen of the lipopolysaccharide cluster, and genes

  11. Inference of self-regulated transcriptional networks by comparative genomics.

    PubMed

    Cornish, Joseph P; Matthews, Fialelei; Thomas, Julien R; Erill, Ivan

    2012-01-01

    The assumption of basic properties, like self-regulation, in simple transcriptional regulatory networks can be exploited to infer regulatory motifs from the growing amounts of genomic and meta-genomic data. These motifs can in principle be used to elucidate the nature and scope of transcriptional networks through comparative genomics. Here we assess the feasibility of this approach using the SOS regulatory network of Gram-positive bacteria as a test case. Using experimentally validated data, we show that the known regulatory motif can be inferred through the assumption of self-regulation. Furthermore, the inferred motif provides a more robust search pattern for comparative genomics than the experimental motifs defined in reference organisms. We take advantage of this robustness to generate a functional map of the SOS response in Gram-positive bacteria. Our results reveal definite differences in the composition of the LexA regulon between Firmicutes and Actinobacteria, and confirm that regulation of cell-division inhibition is a widespread characteristic of this network among Gram-positive bacteria.

  12. Comparative genomics of transcriptional regulation of methionine metabolism in proteobacteria

    DOE PAGES

    Leyn, Semen A.; Suvorova, Inna A.; Kholina, Tatiana D.; ...

    2014-11-20

    Methionine metabolism and uptake genes in Proteobacteria are controlled by a variety of RNA and DNA regulatory systems. We have applied comparative genomics to reconstruct regulons for three known transcription factors, MetJ, MetR, and SahR, and three known riboswitch motifs, SAH, SAM-SAH, and SAM_alpha, in ~200 genomes from 22 taxonomic groups of Proteobacteria. We also identified two novel regulons: a SahR-like transcription factor SamR controlling various methionine biosynthesis genes in the Xanthomonadales group, and a potential RNA regulatory element with terminator-antiterminator mechanism controlling the metX or metZ genes in beta-proteobacteria. For each analyzed regulator we identified the core, taxon-specific and genome-specific regulon members. By analyzing the distribution of these regulators in bacterial genomes and by comparing their regulon contents we elucidated possible evolutionary scenarios for the regulation of the methionine metabolism genes in Proteobacteria.

  13. Comparative genomics of transcriptional regulation of methionine metabolism in proteobacteria

    SciTech Connect

    Leyn, Semen A.; Suvorova, Inna A.; Kholina, Tatiana D.; Sherstneva, Sofia S.; Novichkov, Pavel S.; Gelfand, Mikhail S.; Rodionov, Dmitry A.; Kuipers, Oscar P.

    2014-11-20

    Methionine metabolism and uptake genes in Proteobacteria are controlled by a variety of RNA and DNA regulatory systems. We have applied comparative genomics to reconstruct regulons for three known transcription factors, MetJ, MetR, and SahR, and three known riboswitch motifs, SAH, SAM-SAH, and SAM_alpha, in ~200 genomes from 22 taxonomic groups of Proteobacteria. We also identified two novel regulons: a SahR-like transcription factor SamR controlling various methionine biosynthesis genes in the Xanthomonadales group, and a potential RNA regulatory element with terminator-antiterminator mechanism controlling the metX or metZ genes in beta-proteobacteria. For each analyzed regulator we identified the core, taxon-specific and genome-specific regulon members. By analyzing the distribution of these regulators in bacterial genomes and by comparing their regulon contents we elucidated possible evolutionary scenarios for the regulation of the methionine metabolism genes in Proteobacteria.

  14. Comparative Genomics of Transcriptional Regulation of Methionine Metabolism in Proteobacteria

    PubMed Central

    Leyn, Semen A.; Suvorova, Inna A.; Kholina, Tatiana D.; Sherstneva, Sofia S.; Novichkov, Pavel S.; Gelfand, Mikhail S.; Rodionov, Dmitry A.

    2014-01-01

    Methionine metabolism and uptake genes in Proteobacteria are controlled by a variety of RNA and DNA regulatory systems. We have applied comparative genomics to reconstruct regulons for three known transcription factors, MetJ, MetR, and SahR, and three known riboswitch motifs, SAH, SAM-SAH, and SAM_alpha, in ∼200 genomes from 22 taxonomic groups of Proteobacteria. We also identified two novel regulons: a SahR-like transcription factor SamR controlling various methionine biosynthesis genes in the Xanthomonadales group, and a potential RNA regulatory element with terminator-antiterminator mechanism controlling the metX or metZ genes in beta-proteobacteria. For each analyzed regulator we identified the core, taxon-specific and genome-specific regulon members. By analyzing the distribution of these regulators in bacterial genomes and by comparing their regulon contents we elucidated possible evolutionary scenarios for the regulation of the methionine metabolism genes in Proteobacteria. PMID:25411846

  15. Comparative genomics and transcriptomics of trait-gene association

    PubMed Central

    2012-01-01

    Background The Order Rickettsiales includes important tick-borne pathogens, from Rickettsia rickettsii, which causes Rocky Mountain spotted fever, to Anaplasma marginale, the most prevalent vector-borne pathogen of cattle. Although most pathogens in this Order are transmitted by arthropod vectors, little is known about the microbial determinants of transmission. A. marginale provides unique tools for studying the determinants of transmission, with multiple strain sequences available that display distinct and reproducible transmission phenotypes. The closed core A. marginale genome suggests that any phenotypic differences are due to single nucleotide polymorphisms (SNPs). We combined DNA/RNA comparative genomic approaches using strains with different tick transmission phenotypes and identified genes that segregate with transmissibility. Results Comparison of seven strains with different transmission phenotypes generated a list of SNPs affecting 18 genes and nine promoters. Transcriptional analysis found two candidate genes downstream from promoter SNPs that were differentially transcribed. To corroborate the comparative genomics approach we used three RNA-seq platforms to analyze the transcriptomes from two A. marginale strains with different transmission phenotypes. RNA-seq analysis confirmed the comparative genomics data and found 10 additional genes whose transcription between strains with distinct transmission efficiencies was significantly different. Six regions of the genome that contained no annotation were found to be transcriptionally active, and two of these newly identified transcripts were differentially transcribed. Conclusions This approach identified 30 genes and two novel transcripts potentially involved in tick transmission. We describe the transcriptome of an obligate intracellular bacterium in depth, while employing massive parallel sequencing to dissect an important trait in bacterial pathogenesis. PMID:23181781

  16. Substantial genome synteny preservation among woody angiosperm species: comparative genomics of Chinese chestnut (Castanea mollissima) and plant reference genomes.

    PubMed

    Staton, Margaret; Zhebentyayeva, Tetyana; Olukolu, Bode; Fang, Guang Chen; Nelson, Dana; Carlson, John E; Abbott, Albert G

    2015-10-05

    Chinese chestnut (Castanea mollissima) has emerged as a model species for the Fagaceae family with extensive genomic resources including a physical map, a dense genetic map and quantitative trait loci (QTLs) for chestnut blight resistance. These resources enable comparative genomics analyses relative to model plants. We assessed the degree of conservation between the chestnut genome and other well annotated and assembled plant genomic sequences, focusing on the QTL regions of most interest to the chestnut breeding community. The integrated physical and genetic map of Chinese chestnut has been improved to now include 858 shared sequence-based markers. The utility of the integrated map has also been improved through the addition of 42,970 BAC (bacterial artificial chromosome) end sequences spanning over 26 million bases of the estimated 800 Mb chestnut genome. Synteny analysis between chestnut and ten model plant species was conducted on a macro-syntenic scale using sequences from both individual probes and BAC end sequences across the chestnut physical map. Blocks of synteny with chestnut were found in all ten reference species, with the percent of the chestnut physical map that could be aligned ranging from 10 to 39 %. The integrated genetic and physical map was utilized to identify BACs that spanned the three previously identified QTL regions conferring blight resistance. The clones were pooled and sequenced, yielding 396 sequence scaffolds covering 13.9 Mbp. Comparative genomic analysis on a microsyntenic scale, using the QTL-associated genomic sequence, identified synteny from chestnut to other plant genomes ranging from 5.4 to 12.9 % of the genome sequences aligning. On both the macro- and micro-synteny levels, the peach, grape and poplar genomes were found to be the most structurally conserved with chestnut. Interestingly, these results did not strictly follow the expectation that decreased phylogenetic distance would correspond to increased levels of genome

  17. Comparing genomes with duplications: a computational complexity point of view.

    PubMed

    Blin, Guillaume; Chauve, Cedric; Fertin, Guillaume; Rizzi, Romeo; Vialette, Stéphane

    2007-01-01

    In this paper, we are interested in the computational complexity of computing (dis)similarity measures between two genomes when they contain duplicated genes or genomic markers, a problem that happens frequently when comparing whole nuclear genomes. Recently, several methods ([1], [2]) have been proposed that are based on two steps to compute a given (dis)similarity measure M between two genomes G_1 and G_2: first, one establishes a one-to-one correspondence between genes of G_1 and genes of G_2; second, once this correspondence is established, it defines explicitly a permutation and it is then possible to quantify their similarity using classical measures defined for permutations, like the number of breakpoints. Hence these methods rely on two elements: a way to establish a one-to-one correspondence between genes of a pair of genomes, and a (dis)similarity measure for permutations. The problem is then, given a (dis)similarity measure for permutations, to compute a correspondence that defines an optimal permutation for this measure. We are interested here in two models to compute a one-to-one correspondence: the exemplar model, where all but one copy are deleted in both genomes for each gene family, and the matching model, that computes a maximal correspondence for each gene family. We show that for these two models, and for three (dis)similarity measures on permutations, namely the number of common intervals, the maximum adjacency disruption (MAD) number and the summed adjacency disruption (SAD) number, the problem of computing an optimal correspondence is NP-complete, and even APX-hard for the MAD number and SAD number.
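
    The two-step scheme in this abstract (choose a one-to-one correspondence, then score the resulting permutation) can be sketched in Python as follows. This is a minimal illustration, not the authors' method: the function names, the unsigned-gene simplification, and the toy genomes are assumptions, and the breakpoint count is used only because the abstract names it as a classical permutation measure (the hardness results above concern common intervals, MAD and SAD). The exemplar search is exhaustive, so its cost grows exponentially with the number of duplicated genes.

```python
from itertools import product


def breakpoints(g1, g2):
    """Count adjacent pairs of g1 that are not adjacent (in either order) in g2.

    Assumes unsigned genes and ignores genome ends (a simplification).
    """
    adjacent_in_g2 = {frozenset(pair) for pair in zip(g2, g2[1:])}
    return sum(frozenset(pair) not in adjacent_in_g2 for pair in zip(g1, g1[1:]))


def exemplars(genome):
    """Yield every way of keeping exactly one copy of each gene family."""
    positions = {}
    for index, gene in enumerate(genome):
        positions.setdefault(gene, []).append(index)
    for choice in product(*positions.values()):
        yield [genome[i] for i in sorted(choice)]


def exemplar_breakpoint_distance(g1, g2):
    """Brute-force minimum breakpoint count over all exemplar choices.

    Assumes both genomes contain the same gene families.
    """
    return min(breakpoints(a, b) for a in exemplars(g1) for b in exemplars(g2))


if __name__ == "__main__":
    print(breakpoints([1, 2, 3, 4], [1, 3, 2, 4]))
    print(exemplar_breakpoint_distance(list("abac"), list("abc")))
```

    In the toy call, the first genome carries two copies of gene "a"; keeping the first copy yields the same order as the second genome, so no adjacency is broken.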

  18. Comparative Physical Mapping Between Oryza sativa (AA Genome Type) and O. punctata (BB Genome Type)

    PubMed Central

    Kim, HyeRan; Miguel, Phillip San; Nelson, William; Collura, Kristi; Wissotski, Marina; Walling, Jason G.; Kim, Jun Pyo; Jackson, Scott A.; Soderlund, Carol; Wing, Rod A.

    2007-01-01

    A comparative physical map of the AA genome (Oryza sativa) and the BB genome (O. punctata) was constructed by aligning a physical map of O. punctata, deduced from 63,942 BAC end sequences (BESs) and 34,224 fingerprints, onto the O. sativa genome sequence. The level of conservation of each chromosome between the two species was determined by calculating a ratio of BES alignments. The alignment result suggests more divergence of intergenic and repeat regions in comparison to gene-rich regions. Further, this characteristic enabled localization of heterochromatic and euchromatic regions for each chromosome of both species. The alignment identified 16 locations containing expansions, contractions, inversions, and transpositions. By aligning 40% of the punctata BES on the map, 87% of the punctata FPC map covered 98% of the O. sativa genome sequence. The genome size of O. punctata was estimated to be 8% larger than that of O. sativa with individual chromosome differences of 1.5–16.5%. The sum of expansions and contractions observed in regions >500 kb were similar, suggesting that most of the contractions/expansions contributing to the genome size difference between the two species are small, thus preserving the macro-collinearity between these species, which diverged ∼2 million years ago. PMID:17339227

  19. The Whole Genome Assembly and Comparative Genomic Research of Thellungiella parvula (Extremophile Crucifer) Mitochondrion

    PubMed Central

    Wang, Xuelin; Bi, Changwei; Xu, Yiqing; Wei, Suyun; Dai, Xiaogang; Yin, Tongming; Ye, Ning

    2016-01-01

    The complete nucleotide sequence of the mitochondrial (mt) genome of the extremophile species Thellungiella parvula (T. parvula) has been determined, with a length of 255,773 bp. The T. parvula mt genome is a circular sequence and contains 32 protein-coding genes, 19 tRNA genes, and three ribosomal RNA genes, with an 11.5% coding sequence. The base composition of 27.5% A, 27.5% T, 22.7% C, and 22.3% G in descending order shows a slight bias of 55% AT. Fifty-three repeats were identified in the mitochondrial genome of T. parvula, including 24 direct repeats, 28 tandem repeats (TRs), and one palindromic repeat. Furthermore, a total of 199 perfect microsatellites with a high A/T content (83.1%) were mined through simple sequence repeat (SSR) analysis, and they were distributed unevenly within this mitochondrial genome. We also analyzed the evolution of other plant mitochondrial genomes in general, providing clues for understanding the evolution of organelle genomes in plants. Compared with other Brassicaceae species, T. parvula is related to Arabidopsis thaliana, whose low-temperature resistance has been well documented. This study will provide important genetic tools for research on other Brassicaceae species and for improving yields of economically important plants. PMID:27148547
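
    The AT figure quoted in this abstract follows directly from the reported base composition (27.5% A + 27.5% T = 55.0%). A minimal Python sketch of how such percentages could be computed from a sequence is given below; the function names and the toy sequence are illustrative assumptions, and ambiguous bases such as N are simply ignored here.

```python
from collections import Counter


def base_composition(seq):
    """Return the percentage of A, C, G and T among unambiguous bases."""
    counts = Counter(seq.upper())
    total = sum(counts[base] for base in "ACGT")
    if total == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return {base: 100.0 * counts[base] / total for base in "ACGT"}


def at_content(seq):
    """Return the combined percentage of A and T."""
    composition = base_composition(seq)
    return composition["A"] + composition["T"]


if __name__ == "__main__":
    toy = "AAATTTCCGG"  # hypothetical sequence, not from the T. parvula genome
    print(base_composition(toy))
    print(at_content(toy))
```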

  20. Using comparative genome analysis to identify problems in annotated microbial genomes.

    PubMed

    Poptsova, Maria S; Gogarten, J Peter

    2010-07-01

    Genome annotation is a tedious task that is mostly done by automated methods; however, the accuracy of these approaches has been questioned since the beginning of the sequencing era. Genome annotation is a multilevel process, and errors can emerge at different stages: during sequencing, as a result of gene-calling procedures, and in the process of assigning gene functions. Missed or wrongly annotated genes differentially impact different types of analyses. Here we discuss and demonstrate how the methods of comparative genome analysis can refine annotations by locating missing orthologues. We also discuss possible reasons for errors and show that the second-generation annotation systems, which combine multiple gene-calling programs with similarity-based methods, perform much better than the first annotation tools. Since old errors may propagate to the newly sequenced genomes, we emphasize that the problem of continuously updating popular public databases is an urgent and unresolved one. Due to the progress in genome-sequencing technologies, automated annotation techniques will remain the main approach in the future. Researchers need to be aware of the existing errors in the annotation of even well-studied genomes, such as Escherichia coli, and consider additional quality control for their results.

  1. Assessing the impact of comparative genomic sequence data on the functional annotation of the Drosophila genome

    PubMed Central

    Bergman, Casey M; Pfeiffer, Barret D; Rincón-Limas, Diego E; Hoskins, Roger A; Gnirke, Andreas; Mungall, Chris J; Wang, Adrienne M; Kronmiller, Brent; Pacleb, Joanne; Park, Soo; Stapleton, Mark; Wan, Kenneth; George, Reed A; de Jong, Pieter J; Botas, Juan; Rubin, Gerald M; Celniker, Susan E

    2002-01-01

    Background It is widely accepted that comparative sequence data can aid the functional annotation of genome sequences; however, the most informative species and features of genome evolution for comparison remain to be determined. Results We analyzed conservation in eight genomic regions (apterous, even-skipped, fushi tarazu, twist, and Rhodopsins 1, 2, 3 and 4) from four Drosophila species (D. erecta, D. pseudoobscura, D. willistoni, and D. littoralis) covering more than 500 kb of the D. melanogaster genome. All D. melanogaster genes (and 78-82% of coding exons) identified in divergent species such as D. pseudoobscura show evidence of functional constraint. Addition of a third species can reveal functional constraint in otherwise non-significant pairwise exon comparisons. Microsynteny is largely conserved, with rearrangement breakpoints, novel transposable element insertions, and gene transpositions occurring in similar numbers. Rates of amino-acid substitution are higher in uncharacterized genes relative to genes that have previously been studied. Conserved non-coding sequences (CNCSs) tend to be spatially clustered with conserved spacing between CNCSs, and clusters of CNCSs can be used to predict enhancer sequences. Conclusions Our results provide the basis for choosing species whose genome sequences would be most useful in aiding the functional annotation of coding and cis-regulatory sequences in Drosophila. Furthermore, this work shows how decoding the spatial organization of conserved sequences, such as the clustering of CNCSs, can complement efforts to annotate eukaryotic genomes on the basis of sequence conservation alone. PMID:12537575

  2. A Web-Based Comparative Genomics Tutorial for Investigating Microbial Genomes

    PubMed Central

    Strong, Michael; Cascio, Duilio; Eisenberg, David

    2004-01-01

    As the number of completely sequenced microbial genomes continues to rise at an impressive rate, it is important to prepare students with the skills necessary to investigate microorganisms at the genomic level. As a part of the core curriculum for first-year graduate students in the biological sciences, we have implemented a web-based tutorial to introduce students to the fields of comparative and functional genomics. The tutorial focuses on recent computational methods for identifying functionally linked genes and proteins on a genome-wide scale and was used to introduce students to the Rosetta Stone, Phylogenetic Profile, conserved Gene Neighbor, and Operon computational methods. Students learned to use a number of publicly available web servers and databases to identify functionally linked genes in the Escherichia coli genome, with emphasis on genome organization and operon structure. The overall effectiveness of the tutorial was assessed based on student evaluations and homework assignments. The tutorial is available to other educators at http://www.doe-mbi.ucla.edu/~strong/m253.php. PMID:23653555
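
    Of the methods named in this abstract, the Phylogenetic Profile approach is the easiest to illustrate: genes whose presence/absence patterns across a set of genomes are identical or nearly identical are proposed as functionally linked. The Python sketch below is a simplified illustration under that reading; the gene names, the profiles, and the Hamming-distance threshold are assumptions for the example and do not come from the tutorial.

```python
from itertools import combinations


def hamming(p, q):
    """Number of genomes in which two presence/absence profiles differ."""
    return sum(a != b for a, b in zip(p, q))


def linked_pairs(profiles, max_distance=0):
    """Return gene pairs whose profiles differ in at most max_distance genomes.

    profiles maps a gene name to a tuple of 0/1 flags, one per genome.
    """
    pairs = []
    for gene_a, gene_b in combinations(sorted(profiles), 2):
        if hamming(profiles[gene_a], profiles[gene_b]) <= max_distance:
            pairs.append((gene_a, gene_b))
    return pairs


if __name__ == "__main__":
    # Hypothetical profiles across five genomes (1 = homolog present).
    profiles = {
        "geneA": (1, 0, 1, 1, 0),
        "geneB": (1, 0, 1, 1, 0),
        "geneC": (0, 1, 0, 0, 1),
        "geneD": (1, 0, 1, 0, 0),
    }
    print(linked_pairs(profiles))
    print(linked_pairs(profiles, max_distance=1))
```

    A practical analysis would likely also discard genes present in all or in none of the genomes, since such flat profiles carry no signal.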

  3. A Mitochondrial Genome of Rhyparochromidae (Hemiptera: Heteroptera) and a Comparative Analysis of Related Mitochondrial Genomes

    PubMed Central

    Li, Teng; Yang, Jie; Li, Yinwan; Cui, Ying; Xie, Qiang; Bu, Wenjun; Hillis, David M.

    2016-01-01

    The Rhyparochromidae, the largest family of Lygaeoidea, encompasses more than 1,850 described species, but no mitochondrial genome has been sequenced to date. Here we describe the first mitochondrial genome for Rhyparochromidae: a complete mitochondrial genome of Panaorus albomaculatus (Scott, 1874). This mitochondrial genome is comprised of 16,345 bp, and contains the expected 37 genes and control region. The majority of the control region is made up of a large tandem-repeat region, which has a novel pattern not previously observed in other insects. The tandem-repeats region of P. albomaculatus consists of 53 tandem duplications (including one partial repeat), which is the largest number of tandem repeats among all the known insect mitochondrial genomes. Slipped-strand mispairing during replication is likely to have generated this novel pattern of tandem repeats. Comparative analysis of tRNA gene families in sequenced Pentatomomorpha and Lygaeoidea species shows that the pattern of nucleotide conservation is markedly higher on the J-strand. Phylogenetic reconstruction based on mitochondrial genomes suggests that Rhyparochromidae is not the sister group to all the remaining Lygaeoidea, and supports the monophyly of Lygaeoidea. PMID:27756915

  4. Detection of genomic imbalances by array based comparative genomic hybridisation in fetuses with multiple malformations

    PubMed Central

    Le Caignec, C; Boceno, M; Saugier-Veber, P; Jacquemont, S; Joubert, M; David, A; Frebourg, T; Rival, J

    2005-01-01

    Background: Malformations are a major cause of morbidity and mortality in full term infants and genomic imbalances are a significant component of their aetiology. However, the causes of defects in many patients with multiple congenital malformations remain unexplained despite thorough clinical examination and laboratory investigations. Methods: We used a commercially available array based comparative genomic hybridisation method (array CGH), able to screen all subtelomeric regions, main microdeletion syndromes, and 201 other regions covering the genome, to detect submicroscopic chromosomal imbalances in 49 fetuses with three or more significant anomalies and normal karyotype. Results: Array CGH identified eight genomic rearrangements (16.3%), all confirmed by quantitative multiplex PCR of short fluorescent fragments. Subtelomeric and interstitial deletions, submicroscopic duplications, and a complex genomic imbalance were identified. In four de novo cases (15qtel deletion, 16q23.1–q23.3 deletion, 22q11.2 deletion, and mosaicism for a rearranged chromosome 18), the genomic imbalance identified clearly underlay the pathological phenotype. In one case, the relationship between the genotype and phenotype was unclear, since a subtelomeric 6q deletion was detected in a mother and her two fetuses bearing multiple malformations. In three cases, a subtelomeric 10q duplication, probably a genomic polymorphism, was identified. Conclusions: The detection of 5/49 causative chromosomal imbalances (or 4/49 if the 6qtel deletion is not considered as causative) suggests wide genome screening when standard chromosome analysis is normal and confirms that array CGH will have a major impact on pre and postnatal diagnosis as well as providing information for more accurate genetic counselling. PMID:15689449

  5. Genomic characteristics and comparative genomics analysis of Penicillium chrysogenum KF-25

    PubMed Central

    2014-01-01

    Background Penicillium chrysogenum has been used in producing penicillin and derived β-lactam antibiotics for many years. Although the genome of the mutant strain P. chrysogenum Wisconsin 54-1255 has already been sequenced, the versatility and genetic diversity of this species still needs to be intensively studied. In this study, the genome of the wild-type P. chrysogenum strain KF-25, which has high activity against Ustilaginoidea virens, was sequenced and characterized. Results The genome of KF-25 was about 29.9 Mb in size and contained 9,804 putative open reading frames (orfs). Thirteen genes were predicted to encode two-component system proteins, of which six were putatively involved in osmolarity adaption. There were 33 putative secondary metabolism pathways and numerous genes that were essential in metabolite biosynthesis. Several P. chrysogenum virus untranslated region sequences were found in the KF-25 genome, suggesting that there might be a relationship between the virus and P. chrysogenum in evolution. Comparative genome analysis showed that the genomes of KF-25 and Wisconsin 54-1255 were highly similar, except that KF-25 was 2.3 Mb smaller. Three hundred and fifty-five KF-25 specific genes were found and the biological functions of the proteins encoded by these genes were mainly unknown (232, representing 65%), except for some orfs encoding proteins with predicted functions in transport, metabolism, and signal transduction. Numerous KF-25-specific genes were found to be associated with the pathogenicity and virulence of the strains, which were identical to those of wild-type P. chrysogenum NRRL 1951. Conclusion Genome sequencing and comparative analysis are helpful in further understanding the biology, evolution, and environment adaption of P. chrysogenum, and provide a new tool for identifying further functional metabolites. PMID:24555742

  6. Genomic characteristics and comparative genomics analysis of Penicillium chrysogenum KF-25.

    PubMed

    Peng, Qin; Yuan, Yihui; Gao, Meiying; Chen, Xupeng; Liu, Biao; Liu, Pengming; Wu, Yan; Wu, Dandan

    2014-02-21

    Penicillium chrysogenum has been used in producing penicillin and derived β-lactam antibiotics for many years. Although the genome of the mutant strain P. chrysogenum Wisconsin 54-1255 has already been sequenced, the versatility and genetic diversity of this species still needs to be intensively studied. In this study, the genome of the wild-type P. chrysogenum strain KF-25, which has high activity against Ustilaginoidea virens, was sequenced and characterized. The genome of KF-25 was about 29.9 Mb in size and contained 9,804 putative open reading frames (orfs). Thirteen genes were predicted to encode two-component system proteins, of which six were putatively involved in osmolarity adaption. There were 33 putative secondary metabolism pathways and numerous genes that were essential in metabolite biosynthesis. Several P. chrysogenum virus untranslated region sequences were found in the KF-25 genome, suggesting that there might be a relationship between the virus and P. chrysogenum in evolution. Comparative genome analysis showed that the genomes of KF-25 and Wisconsin 54-1255 were highly similar, except that KF-25 was 2.3 Mb smaller. Three hundred and fifty-five KF-25 specific genes were found and the biological functions of the proteins encoded by these genes were mainly unknown (232, representing 65%), except for some orfs encoding proteins with predicted functions in transport, metabolism, and signal transduction. Numerous KF-25-specific genes were found to be associated with the pathogenicity and virulence of the strains, which were identical to those of wild-type P. chrysogenum NRRL 1951. Genome sequencing and comparative analysis are helpful in further understanding the biology, evolution, and environment adaption of P. chrysogenum, and provide a new tool for identifying further functional metabolites.

  7. Genome Sequence and Comparative Genome Analysis of Lactobacillus casei: Insights into Their Niche-Associated Evolution

    PubMed Central

    Cai, Hui; Thompson, Rebecca; Budinich, Mateo F.; Broadbent, Jeff R.

    2009-01-01

    Lactobacillus casei is remarkably adaptable to diverse habitats and widely used in the food industry. To reveal the genomic features that contribute to its broad ecological adaptability and examine the evolution of the species, the genome sequence of L. casei ATCC 334 is analyzed and compared with other sequenced lactobacilli. This analysis reveals that ATCC 334 contains a high number of coding sequences involved in carbohydrate utilization and transcriptional regulation, reflecting its requirement for dealing with diverse environmental conditions. A comparison of the genome sequences of ATCC 334 to L. casei BL23 reveals 12 and 19 genomic islands, respectively. For a broader assessment of the genetic variability within L. casei, gene content of 21 L. casei strains isolated from various habitats (cheeses, n = 7; plant materials, n = 8; and human sources, n = 6) was examined by comparative genome hybridization with an ATCC 334-based microarray. This analysis resulted in identification of 25 hypervariable regions. One of these regions contains an overrepresentation of genes involved in carbohydrate utilization and transcriptional regulation and was thus proposed as a lifestyle adaptation island. Differences in L. casei genome inventory reveal both gene gain and gene decay. Gene gain, via acquisition of genomic islands, likely confers a fitness benefit in specific habitats. Gene decay, that is, loss of unnecessary ancestral traits, is observed in the cheese isolates and likely results in enhanced fitness in the dairy niche. This study gives the first picture of the stable versus variable regions in L. casei and provides valuable insights into evolution, lifestyle adaptation, and metabolic diversity of L. casei. PMID:20333194

  8. Comparative genomics of 9 novel Paenibacillus larvae bacteriophages

    PubMed Central

    Stamereilers, Casey; LeBlanc, Lucy; Yost, Diane; Amy, Penny S.; Tsourkas, Philippos K.

    2016-01-01

    American Foulbrood Disease, caused by the bacterium Paenibacillus larvae, is one of the most destructive diseases of the honeybee, Apis mellifera. Our group recently published the sequences of 9 new phages with the ability to infect and lyse P. larvae. Here, we characterize the genomes of these P. larvae phages, compare them to each other and to other sequenced P. larvae phages, and putatively identify protein function. The phage genomes are 38–45 kb in size and contain 68–86 genes, most of which appear to be unique to P. larvae phages. We classify P. larvae phages into 2 main clusters and one singleton based on nucleotide sequence identity. Three of the new phages show sequence similarity to other sequenced P. larvae phages, while the remaining 6 do not. We identified functions for roughly half of the P. larvae phage proteins, including structural, assembly, host lysis, DNA replication/metabolism, regulatory, and host-related functions. Structural and assembly proteins are highly conserved among our phages and are located at the start of the genome. DNA replication/metabolism, regulatory, and host-related proteins are located in the middle and end of the genome, and are not conserved, with many of these genes found in some of our phages but not others. All nine phages code for a conserved N-acetylmuramoyl-L-alanine amidase. Comparative analysis showed the phages use the “cohesive ends with 3′ overhang” DNA packaging strategy. This work is the first in-depth study of P. larvae phage genomics, and serves as a marker for future work in this area. PMID:27738559

  9. Gene discovery in the hamster: a comparative genomics approach for gene annotation by sequencing of hamster testis cDNAs

    PubMed Central

    Oduru, Sreedhar; Campbell, Janee L; Karri, SriTulasi; Hendry, William J; Khan, Shafiq A; Williams, Simon C

    2003-01-01

    Background Complete genome annotation will likely be achieved through a combination of computer-based analysis of available genome sequences combined with direct experimental characterization of expressed regions of individual genomes. We have utilized a comparative genomics approach involving the sequencing of randomly selected hamster testis cDNAs to begin to identify genes not previously annotated on the human, mouse, rat and Fugu (pufferfish) genomes. Results 735 distinct sequences were analyzed for their relatedness to known sequences in public databases. Eight of these sequences were derived from previously unidentified genes and expression of these genes in testis was confirmed by Northern blotting. The genomic locations of each sequence were mapped in human, mouse, rat and pufferfish, where applicable, and the structure of their cognate genes was derived using computer-based predictions, genomic comparisons and analysis of uncharacterized cDNA sequences from human and macaque. Conclusion The use of a comparative genomics approach resulted in the identification of eight cDNAs that correspond to previously uncharacterized genes in the human genome. The proteins encoded by these genes included a new member of the kinesin superfamily, a SET/MYND-domain protein, and six proteins for which no specific function could be predicted. Each gene was expressed primarily in testis, suggesting that they may play roles in the development and/or function of testicular cells. PMID:12783626

  10. G-compass: a web-based comparative genome browser between human and other vertebrate genomes.

    PubMed

    Kawahara, Yoshihiro; Sakate, Ryuichi; Matsuya, Akihiro; Murakami, Katsuhiko; Sato, Yoshiharu; Zhang, Hao; Gojobori, Takashi; Itoh, Takeshi; Imanishi, Tadashi

    2009-12-15

    G-compass is designed for efficient comparative genome analysis between human and other vertebrate genomes. The current version of G-compass allows us to browse two corresponding genomic regions between human and another species in parallel. One-to-one evolutionarily conserved regions (i.e. orthologous regions) between species are highlighted along the genomes. Information such as locations of duplicated regions, copy number variations and mammalian ultra-conserved elements is also provided. These features of G-compass enable us to easily determine patterns of genomic rearrangements and changes in gene orders through evolutionary time. Since G-compass is a satellite database of H-InvDB, which is a comprehensive annotation resource for human genes and transcripts, users can easily refer to manually curated functional annotations and other abundant biological information for each human transcript. G-compass is expected to be a valuable tool for comparing human and model organisms and promoting the exchange of functional information. G-compass is freely available at http://www.h-invitational.jp/g-compass/.

  11. Streptococcus thermophilus core genome: comparative genome hybridization study of 47 strains.

    PubMed

    Rasmussen, Thomas Bovbjerg; Danielsen, Morten; Valina, Ondrej; Garrigues, Christel; Johansen, Eric; Pedersen, Martin Bastian

    2008-08-01

    A DNA microarray platform based on 2,200 genes from publicly available sequences was designed for Streptococcus thermophilus. We determined how single-nucleotide polymorphisms in the 65- to 75-mer oligonucleotide probe sequences affect the hybridization signals. The microarrays were then used for comparative genome hybridization (CGH) of 47 dairy S. thermophilus strains. An analysis of the exopolysaccharide genes in each strain confirmed previous findings that this class of genes is indeed highly variable. A phylogenetic tree based on the CGH data showed similar distances for most strains, indicating frequent recombination or gene transfer within S. thermophilus. By comparing genome sizes estimated from the microarrays and pulsed-field gel electrophoresis, the amount of unknown DNA in each strain was estimated. A core genome comprised of 1,271 genes detected in all 47 strains was identified. Likewise, a set of noncore genes detected in only some strains was identified. The concept of an industrial core genome is proposed. This is comprised of the genes in the core genome plus genes that are necessary in an applied industrial context.
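
    The core/noncore split described in this abstract amounts to a set intersection over per-strain gene detection calls. A minimal Python sketch, assuming a hypothetical mapping from strain name to the set of genes detected by CGH (the function name, strain names and gene names are illustrative and do not come from the study):

```python
def split_core_noncore(detected):
    """Split genes into core (detected in every strain) and noncore (detected in only some).

    `detected` is a hypothetical dict: strain name -> set of detected gene IDs.
    """
    gene_sets = list(detected.values())
    if not gene_sets:
        return set(), set()
    core = set.intersection(*gene_sets)   # genes present in all strains
    pan = set().union(*gene_sets)         # genes present in at least one strain
    return core, pan - core


# Toy data, for illustration only.
detected = {
    "strain_A": {"g1", "g2", "g3"},
    "strain_B": {"g1", "g2", "g4"},
    "strain_C": {"g1", "g2"},
}
core, noncore = split_core_noncore(detected)
```

    With real array data, the detection call for each gene would itself depend on a hybridization-signal threshold, which this sketch does not model.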

  12. Genome analysis and comparative genomics of a Giardia intestinalis assemblage E isolate

    PubMed Central

    2010-01-01

    Background Giardia intestinalis is a protozoan parasite that causes diarrhea in a wide range of mammalian species. To further understand the genetic diversity between the Giardia intestinalis species, we have performed genome sequencing and analysis of a wild-type Giardia intestinalis sample from the assemblage E group, isolated from a pig. Results We identified 5012 protein coding genes, the majority of which are conserved compared to the previously sequenced genomes of the WB and GS strains in terms of microsynteny and sequence identity. Despite this, there is an unexpectedly large number of chromosomal rearrangements and several smaller structural changes that are present in all chromosomes. Novel members of the VSP, NEK Kinase and HCMP gene families were identified, which may reveal possible mechanisms for host specificity and new avenues for antigenic variation. We used comparative genomics of the three diverse Giardia intestinalis isolates P15, GS and WB to define a core proteome for this species complex and to identify lineage-specific genes. Extensive analyses of polymorphisms in the core proteome of Giardia revealed differential rates of divergence among cellular processes. Conclusions Our results indicate that despite a well conserved core of genes there is significant genome variation between Giardia isolates, both in terms of gene content, gene polymorphisms, structural chromosomal variations and surface molecule repertoires. This study improves the annotation of the Giardia genomes and enables the identification of functionally important variation. PMID:20929575

  13. Assigning biological functions to rice genes by genome annotation, expression analysis and mutagenesis.

    PubMed

    Jiang, Shu-Ye; Ramachandran, Srinivasan

    2010-12-01

    Rice is the first cereal genome to be completely sequenced. Since the completion of its genome sequencing, considerable progress has been made in multiple areas including the whole genome annotation, gene expression profiling, mutant collection, etc. Here, we summarize the current status of rice genome annotation and review the methodology of assigning biological functions to hundreds of thousands of rice genes as well as discuss the major limitations and the future perspective in rice functional genomics. Available data analysis shows that the rice genome encodes around 32,000 protein-coding genes. Expression analysis revealed at least 31,000 genes with expression evidence from full-length cDNA/EST collection or other transcript profiling. In addition, we have summarized various strategies to generate mutant populations including natural, physical, chemical, T-DNA, transposon/retrotransposon or gene silencing based mutagenesis. Currently, more than 1 million mutants have been generated and 27,551 of them have their flanking sequence tags. To assign biological functions to hundreds of thousands of rice genes, global co-operations are required, various genetic resources should be more easily accessible and diverse data from transcriptomics, proteomics, epigenetics, comparative genomics and bioinformatics should be integrated to better understand the functions of these genes and their regulatory mechanisms.

  14. Whole-genome sequence of the Tibetan frog Nanorana parkeri and the comparative evolution of tetrapod genomes.

    PubMed

    Sun, Yan-Bo; Xiong, Zi-Jun; Xiang, Xue-Yan; Liu, Shi-Ping; Zhou, Wei-Wei; Tu, Xiao-Long; Zhong, Li; Wang, Lu; Wu, Dong-Dong; Zhang, Bao-Lin; Zhu, Chun-Ling; Yang, Min-Min; Chen, Hong-Man; Li, Fang; Zhou, Long; Feng, Shao-Hong; Huang, Chao; Zhang, Guo-Jie; Irwin, David; Hillis, David M; Murphy, Robert W; Yang, Huan-Ming; Che, Jing; Wang, Jun; Zhang, Ya-Ping

    2015-03-17

    The development of efficient sequencing techniques has resulted in large numbers of genomes being available for evolutionary studies. However, only one genome is available for all amphibians, that of Xenopus tropicalis, which is distantly related to the majority of frogs. More than 96% of frogs belong to the Neobatrachia, and no genome exists for this group. This dearth of amphibian genomes greatly restricts genomic studies of amphibians and, more generally, our understanding of tetrapod genome evolution. To fill this gap, we provide the de novo genome of a Tibetan Plateau frog, Nanorana parkeri, and compare it to that of X. tropicalis and other vertebrates. This genome encodes more than 20,000 protein-coding genes, a number similar to that of Xenopus. Although the genome size of Nanorana is considerably larger than that of Xenopus (2.3 vs. 1.5 Gb), most of the difference is due to the respective number of transposable elements in the two genomes. The two frogs exhibit considerable conserved whole-genome synteny despite having diverged approximately 266 Ma, indicating a slow rate of DNA structural evolution in anurans. Multigenome synteny blocks further show that amphibians have fewer interchromosomal rearrangements than mammals but have a comparable rate of intrachromosomal rearrangements. Our analysis also identifies 11 Mb of anuran-specific highly conserved elements that will be useful for comparative genomic analyses of frogs. The Nanorana genome offers an improved understanding of evolution of tetrapod genomes and also provides a genomic reference for other evolutionary studies.

  15. Unlocking Holocentric Chromosomes: New Perspectives from Comparative and Functional Genomics?

    PubMed Central

    Mandrioli, Mauro; Manicardi, Gian Carlo

    2012-01-01

    The presence of chromosomes with diffuse centromeres (holocentric chromosomes) has been reported in several taxa for more than fifty years, but a full understanding of their origin is still lacking. Comparative and functional genomics are now furnishing new data to better understand holocentric chromosome evolution, opening new perspectives for analysing karyotype rearrangements in species with holocentric chromosomes and, in particular, revealing unusual common features such as uniform GC content and gene distribution along chromosomes. PMID:23372420

  16. MicrobesOnline: an integrated portal for comparative and functional genomics

    SciTech Connect

    Dehal, Paramvir; Joachimiak, Marcin; Price, Morgan; Bates, John; Baumohl, Jason; Chivian, Dylan; Friedland, Greg; Huang, Kathleen; Keller, Keith; Novichkov, Pavel; Dubchak, Inna; Alm, Eric; Arkin, Adam

    2011-07-14

    Since 2003, MicrobesOnline (http://www.microbesonline.org) has been providing a community resource for comparative and functional genome analysis. The portal includes over 1000 complete genomes of bacteria, archaea and fungi and thousands of expression microarrays from diverse organisms ranging from model organisms such as Escherichia coli and Saccharomyces cerevisiae to environmental microbes such as Desulfovibrio vulgaris and Shewanella oneidensis. To assist in annotating genes and in reconstructing their evolutionary history, MicrobesOnline includes a comparative genome browser based on phylogenetic trees for every gene family as well as a species tree. To identify co-regulated genes, MicrobesOnline can search for genes based on their expression profile, and provides tools for identifying regulatory motifs and seeing if they are conserved. MicrobesOnline also includes fast phylogenetic profile searches, comparative views of metabolic pathways, operon predictions, a workbench for sequence analysis and integration with RegTransBase and other microbial genome resources. The next update of MicrobesOnline will contain significant new functionality, including comparative analysis of metagenomic sequence data. Programmatic access to the database, along with source code and documentation, is available at http://microbesonline.org/programmers.html.

  17. MicrobesOnline: an integrated portal for comparative and functional genomics

    SciTech Connect

    Dehal, Paramvir S.; Joachimiak, Marcin P.; Price, Morgan N.; Bates, John T.; Baumohl, Jason K.; Chivian, Dylan; Friedland, Greg D.; Huang, Katherine H.; Keller, Keith; Novichkov, Pavel S.; Dubchak, Inna L.; Alm, Eric J.; Arkin, Adam P.

    2009-09-17

    Since 2003, MicrobesOnline (http://www.microbesonline.org) has been providing a community resource for comparative and functional genome analysis. The portal includes over 1000 complete genomes of bacteria, archaea and fungi and thousands of expression microarrays from diverse organisms ranging from model organisms such as Escherichia coli and Saccharomyces cerevisiae to environmental microbes such as Desulfovibrio vulgaris and Shewanella oneidensis. To assist in annotating genes and in reconstructing their evolutionary history, MicrobesOnline includes a comparative genome browser based on phylogenetic trees for every gene family as well as a species tree. To identify co-regulated genes, MicrobesOnline can search for genes based on their expression profile, and provides tools for identifying regulatory motifs and seeing if they are conserved. MicrobesOnline also includes fast phylogenetic profile searches, comparative views of metabolic pathways, operon predictions, a workbench for sequence analysis and integration with RegTransBase and other microbial genome resources. The next update of MicrobesOnline will contain significant new functionality, including comparative analysis of metagenomic sequence data. Programmatic access to the database, along with source code and documentation, is available at http://microbesonline.org/programmers.html.

  18. Comparative genome analysis of Pseudomonas genomes including Populus-associated isolates

    DOE PAGES

    Jun, Se Ran; Wassenaar, Trudy; Nookaew, Intawat; ...

    2016-01-01

    The Pseudomonas genus contains a metabolically versatile group of organisms that are known to occupy numerous ecological niches including the rhizosphere and endosphere of many plants influencing phylogenetic diversity and heterogeneity. In this study, comparative genome analysis was performed on over one thousand Pseudomonas genomes, including 21 Pseudomonas strains isolated from the roots of native Populus deltoides. Based on average amino acid identity, genomic clusters were identified within the Pseudomonas genus, which showed agreements with clades by NCBI and cliques by IMG. The P. fluorescens group was organized into 20 distinct genomic clusters, representing enormous diversity and heterogeneity. The species P. aeruginosa showed clear distinction in their genomic relatedness compared to other Pseudomonas species groups based on the pan and core genome analysis. The 19 isolates of our 21 Populus-associated isolates formed three distinct subgroups within the P. fluorescens major group, supported by pathway profiles analysis, while two isolates were more closely related to P. chlororaphis and P. putida. The specific genes to Populus-associated subgroups were identified where genes specific to subgroup 1 include several sensory systems such as proteins which act in two-component signal transduction, a TonB-dependent receptor, and a phosphorelay sensor; specific genes to subgroup 2 contain unique hypothetical genes; and genes specific to subgroup 3 organisms have a different hydrolase activity. IMPORTANCE The comparative genome analyses of the genus Pseudomonas that included Populus-associated isolates resulted in novel insights into high diversity of Pseudomonas. Consistent and robust genomic clusters with phylogenetic homogeneity were identified, which resolved species-clades that are not clearly defined by 16S rRNA gene sequence analysis alone. The genomic clusters may be reflective of distinct ecological niches to which the organisms have adapted, but

  19. Comparative genome analysis of Pseudomonas genomes including Populus-associated isolates

    SciTech Connect

    Jun, Se Ran; Wassenaar, Trudy; Nookaew, Intawat; Hauser, Loren John; Wanchai, Visanu; Land, Miriam L.; Timm, Collin M.; Lu, Tse-Yuan S.; Schadt, Christopher Warren; Doktycz, Mitchel John; Pelletier, Dale A; Ussery, David W

    2016-01-01

    The Pseudomonas genus contains a metabolically versatile group of organisms that are known to occupy numerous ecological niches including the rhizosphere and endosphere of many plants influencing phylogenetic diversity and heterogeneity. In this study, comparative genome analysis was performed on over one thousand Pseudomonas genomes, including 21 Pseudomonas strains isolated from the roots of native Populus deltoides. Based on average amino acid identity, genomic clusters were identified within the Pseudomonas genus, which showed agreements with clades by NCBI and cliques by IMG. The P. fluorescens group was organized into 20 distinct genomic clusters, representing enormous diversity and heterogeneity. The species P. aeruginosa showed clear distinction in their genomic relatedness compared to other Pseudomonas species groups based on the pan and core genome analysis. The 19 isolates of our 21 Populus-associated isolates formed three distinct subgroups within the P. fluorescens major group, supported by pathway profiles analysis, while two isolates were more closely related to P. chlororaphis and P. putida. The specific genes to Populus-associated subgroups were identified where genes specific to subgroup 1 include several sensory systems such as proteins which act in two-component signal transduction, a TonB-dependent receptor, and a phosphorelay sensor; specific genes to subgroup 2 contain unique hypothetical genes; and genes specific to subgroup 3 organisms have a different hydrolase activity. IMPORTANCE The comparative genome analyses of the genus Pseudomonas that included Populus-associated isolates resulted in novel insights into high diversity of Pseudomonas. Consistent and robust genomic clusters with phylogenetic homogeneity were identified, which resolved species-clades that are not clearly defined by 16S rRNA gene sequence analysis alone. The genomic clusters may be reflective of distinct ecological niches to which the organisms have adapted, but this

  20. Sequencing and comparative analysis of the gorilla MHC genomic sequence.

    PubMed

    Wilming, Laurens G; Hart, Elizabeth A; Coggill, Penny C; Horton, Roger; Gilbert, James G R; Clee, Chris; Jones, Matt; Lloyd, Christine; Palmer, Sophie; Sims, Sarah; Whitehead, Siobhan; Wiley, David; Beck, Stephan; Harrow, Jennifer L

    2013-01-01

    Major histocompatibility complex (MHC) genes play a critical role in vertebrate immune response and because the MHC is linked to a significant number of auto-immune and other diseases it is of great medical interest. Here we describe the clone-based sequencing and subsequent annotation of the MHC region of the gorilla genome. Because the MHC is subject to extensive variation, both structural and sequence-wise, it is not readily amenable to study in whole genome shotgun sequence such as the recently published gorilla genome. The variation of the MHC also makes it of evolutionary interest and therefore we analyse the sequence in the context of human and chimpanzee. In our comparisons with human and re-annotated chimpanzee MHC sequence we find that gorilla has a trimodular RCCX cluster, versus the reference human bimodular cluster, and additional copies of Class I (pseudo)genes between Gogo-K and Gogo-A (the orthologues of HLA-K and -A). We also find that Gogo-H (and Patr-H) is coding versus the HLA-H pseudogene and, conversely, there is a Gogo-DQB2 pseudogene versus the HLA-DQB2 coding gene. Our analysis, which is freely available through the VEGA genome browser, provides the research community with a comprehensive dataset for comparative and evolutionary research of the MHC.

  1. Sequencing and comparative analysis of the gorilla MHC genomic sequence

    PubMed Central

    Wilming, Laurens G.; Hart, Elizabeth A.; Coggill, Penny C.; Horton, Roger; Gilbert, James G. R.; Clee, Chris; Jones, Matt; Lloyd, Christine; Palmer, Sophie; Sims, Sarah; Whitehead, Siobhan; Wiley, David; Beck, Stephan; Harrow, Jennifer L.

    2013-01-01

    Major histocompatibility complex (MHC) genes play a critical role in vertebrate immune response and because the MHC is linked to a significant number of auto-immune and other diseases it is of great medical interest. Here we describe the clone-based sequencing and subsequent annotation of the MHC region of the gorilla genome. Because the MHC is subject to extensive variation, both structural and sequence-wise, it is not readily amenable to study in whole genome shotgun sequence such as the recently published gorilla genome. The variation of the MHC also makes it of evolutionary interest and therefore we analyse the sequence in the context of human and chimpanzee. In our comparisons with human and re-annotated chimpanzee MHC sequence we find that gorilla has a trimodular RCCX cluster, versus the reference human bimodular cluster, and additional copies of Class I (pseudo)genes between Gogo-K and Gogo-A (the orthologues of HLA-K and -A). We also find that Gogo-H (and Patr-H) is coding versus the HLA-H pseudogene and, conversely, there is a Gogo-DQB2 pseudogene versus the HLA-DQB2 coding gene. Our analysis, which is freely available through the VEGA genome browser, provides the research community with a comprehensive dataset for comparative and evolutionary research of the MHC. PMID:23589541

  2. Ecology of marine Bacteroidetes: a comparative genomics approach

    PubMed Central

    Fernández-Gómez, Beatriz; Richter, Michael; Schüler, Margarete; Pinhassi, Jarone; Acinas, Silvia G; González, José M; Pedrós-Alió, Carlos

    2013-01-01

    Bacteroidetes are commonly assumed to be specialized in degrading high molecular weight (HMW) compounds and to have a preference for growth attached to particles, surfaces or algal cells. The first sequenced genomes of marine Bacteroidetes seemed to confirm this assumption. Many more genomes have been sequenced recently. Here, a comparative analysis of marine Bacteroidetes genomes revealed a life strategy different from those of other important phyla of marine bacterioplankton such as Cyanobacteria and Proteobacteria. Bacteroidetes have many adaptations to grow attached to particles, have the capacity to degrade polymers, including a large number of peptidases, glycoside hydrolases (GHs), glycosyl transferases, adhesion proteins, as well as the genes for gliding motility. Several of the polymer degradation genes are located in close association with genes for TonB-dependent receptors and transducers, suggesting an integrated regulation of adhesion and degradation of polymers. This confirmed the role of this abundant group of marine bacteria as degraders of particulate matter. Marine Bacteroidetes had a significantly larger number of proteases than GHs, while non-marine Bacteroidetes had equal numbers of both. Proteorhodopsin containing Bacteroidetes shared two characteristics: small genome size and a higher number of genes involved in CO2 fixation per Mb. The latter may be important in order to survive when floating freely in the illuminated, but nutrient-poor, ocean surface. PMID:23303374

  3. Use of comparative genomics to develop EST-SSRs for red drum (Sciaenops ocellatus).

    PubMed

    Hollenbeck, Christopher M; Portnoy, David S; Gold, John R

    2012-12-01

    Microsatellites physically linked to expressed sequence tags (EST-SSRs) are an important resource for linkage mapping and comparative genomics, and data mining in publicly available EST databases is a common strategy for EST-SSR discovery. At present, many species lack species-specific EST sequence data needed for the efficient characterization of EST-SSRs. This paper describes the discovery and development of EST-SSRs for red drum (Sciaenops ocellatus), an estuarine-dependent sciaenid species of economic importance in the USA and elsewhere, using a phylogenetically informed, comparative genomics approach to primer design. The approach entailed comparing existing genomic resources from species closely allied phylogenetically to red drum, with resources from more distantly related outgroup species. By taking into account the degree to which flanking regions are conserved across taxa, the efficiency of PCR primer design was increased greatly. The amplification success rate for primers designed for red drum was 100 % when using EST libraries from confamilial species and 92 % when using an EST library from a species in the same suborder. The primers developed also amplified EST-SSRs in a wide range of perciform fishes, suggesting potential use in comparative genomics. This study demonstrates that EST-SSRs can be efficiently developed for an organism when limited species-specific data are available by exploiting genomic resources from well-studied species, even those at extended phylogenetic distances.

  4. Revealing the missing expressed genes beyond the human reference genome by RNA-Seq.

    PubMed

    Chen, Geng; Li, Ruiyuan; Shi, Leming; Qi, Junyi; Hu, Pengzhan; Luo, Jian; Liu, Mingyao; Shi, Tieliu

    2011-12-02

    A complete and accurate human reference genome is important for functional genomics research. Therefore, the incomplete reference genome and individual specific sequences have significant effects on various studies. We used two RNA-Seq datasets from human brain tissues and 10 mixed cell lines to investigate the completeness of the human reference genome. First, we demonstrated that in previously identified ~5 Mb Asian and ~5 Mb African novel sequences that are absent from the human reference genome of NCBI build 36, ~211 kb and ~201 kb of them could be transcribed, respectively. Our results suggest that many of those transcribed regions are not specific to Asian and African, but also present in Caucasian. Then, we found that the expression levels of 104 RefSeq genes that are unalignable to NCBI build 37 are higher than 0.1 RPKM in brain and cell lines. 55 of them are conserved across human, chimpanzee and macaque, suggesting that there are still a significant number of functional human genes absent from the human reference genome. Moreover, we identified hundreds of novel transcript contigs that cannot be aligned to NCBI build 37, RefSeq genes and EST sequences. Some of those novel transcript contigs are also conserved among human, chimpanzee and macaque. By positioning those contigs onto the human genome, we identified several large deletions in the reference genome. Several conserved novel transcript contigs were further validated by RT-PCR. Our findings demonstrate that a significant number of genes are still absent from the incomplete human reference genome, highlighting the importance of further refining the human reference genome and curating those missing genes. Our study also shows the importance of de novo transcriptome assembly. The comparative approach between reference genome and other related human genomes based on the transcriptome provides an alternative way to refine the human reference genome.
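
    The 0.1 RPKM expression threshold mentioned in this abstract uses the standard RPKM normalization (reads per kilobase of transcript per million mapped reads). A minimal Python sketch of that formula follows; the function name and the example numbers are illustrative assumptions, not values from the study:

```python
def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads.

    RPKM = read_count * 1e9 / (gene_length_bp * total_mapped_reads)
    """
    if gene_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("gene length and total mapped reads must be positive")
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


# Illustrative numbers only: a 2,000 bp gene in a library of 10 million mapped reads.
expressed = rpkm(50, 2000, 10_000_000)      # 50 reads mapped to the gene
borderline = rpkm(1, 2000, 10_000_000)      # a single read mapped to the gene
is_expressed = expressed > 0.1              # threshold quoted in the abstract
```

    A single read on a gene of this length falls below the 0.1 threshold here, which shows why such cutoffs depend on both gene length and sequencing depth.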

  5. Complete chloroplast genome sequences of Solanum bulbocastanum, Solanum lycopersicum and comparative analyses with other Solanaceae genomes.

    PubMed

    Daniell, Henry; Lee, Seung-Bum; Grevich, Justin; Saski, Christopher; Quesada-Vargas, Tania; Guda, Chittibabu; Tomkins, Jeffrey; Jansen, Robert K

    2006-05-01

    Despite the agricultural importance of both potato and tomato, very little is known about their chloroplast genomes. Analysis of the complete sequences of tomato, potato, tobacco, and Atropa chloroplast genomes reveals significant insertions and deletions within certain coding regions or regulatory sequences (e.g., deletion of repeated sequences within 16S rRNA, ycf2 or ribosomal binding sites in ycf2). RNA, photosynthesis, and ATP synthase genes are the least divergent and the most divergent genes are clpP, cemA, ccsA, and matK. Repeat analyses identified 33-45 direct and inverted repeats ≥30 bp with a sequence identity of at least 90%; all but five of the repeats shared by all four Solanaceae genomes are located in the same genes or intergenic regions, suggesting a functional role. A comprehensive genome-wide analysis of all coding sequences and intergenic spacer regions was done for the first time in chloroplast genomes. Only four spacer regions are fully conserved (100% sequence identity) among all genomes; deletions or insertions within some intergenic spacer regions result in less than 25% sequence identity, underscoring the importance of choosing appropriate intergenic spacers for plastid transformation and providing valuable new information for phylogenetic utility of the chloroplast intergenic spacer regions. Comparison of coding sequences with expressed sequence tags showed considerable amount of variation, resulting in amino acid changes; none of the C-to-U conversions observed in potato and tomato were conserved in tobacco and Atropa. It is possible that there has been a loss of conserved editing sites in potato and tomato.

  6. High variability of genomic instability and gene expression profiling in different HeLa clones

    PubMed Central

    Frattini, Annalisa; Fabbri, Marco; Valli, Roberto; De Paoli, Elena; Montalbano, Giuseppe; Gribaldo, Laura; Pasquali, Francesco; Maserati, Emanuela

    2015-01-01

    The HeLa cell line is one of the most popular cell lines in biomedical research, despite its well-known chromosomal instability. We compared the genomic and transcriptomic profiles of 4 different HeLa batches and showed that the gain and loss of genomic material varies widely between batches, drastically affecting basal gene expression. Moreover, different pathways were activated in response to a hypoxic stimulus. Our study emphasizes the large genomic and transcriptomic variability among different batches, to the point that the same experiment performed with different batches can lead to distinct conclusions and irreproducible results. The HeLa cell line is thought to be a unique cell line but it is clear that substantial differences between the primary tumour and the human genome exist and that an indeterminate number of HeLa cell lines may exist, each with a unique genomic profile. PMID:26483214

  7. Capturing the heterogeneity in systemic sclerosis with genome-wide expression profiling

    PubMed Central

    Sargent, Jennifer L; Whitfield, Michael L

    2011-01-01

    Heterogeneity in the clinical presentation and basic science findings of systemic sclerosis (SSc) has hindered the understanding of pathogenesis and development of effective treatments. Genome-wide profiling of SSc has measured this heterogeneity. Gene expression studies of diffuse SSc skin have shown reproducible, disease-specific gene expression signatures when compared with healthy controls and, surprisingly, disease-specific gene expression was found in both lesional and non-lesional skin. SSc-specific gene expression in peripheral blood cells and the lungs has also been demonstrated. Hypothesis-driven approaches that assess the contribution of individual pathways provide insight into the etiology of gene expression subsets. PMID:21790289

  8. Comparative genomics of parasitic silkworm microsporidia reveal an association between genome expansion and host adaptation

    PubMed Central

    2013-01-01

    Background Microsporidian Nosema bombycis has received much attention because the pébrine disease of domesticated silkworms results in great economic losses in the silkworm industry. So far, no effective treatment has been found for pébrine. Compared to other known Nosema parasites, N. bombycis can unusually parasitize a broad range of hosts. To gain some insights into the underlying genetic mechanism of pathological ability and host range expansion in this parasite, a comparative genomic approach was conducted. The genomes of two Nosema parasites, N. bombycis and N. antheraeae (an obligatory parasite to undomesticated silkworms Antheraea pernyi), were sequenced and compared with their distantly related species, N. ceranae (an obligatory parasite to honey bees). Results Our comparative genomics analysis shows that the N. bombycis genome has greatly expanded due to the following three molecular mechanisms: 1) the proliferation of host-derived transposable elements, 2) the acquisition of many horizontally transferred genes from bacteria, and 3) the production of abundant gene duplications. To our knowledge, duplicated genes derived not only from small-scale events (e.g., tandem duplications) but also from large-scale events (e.g., segmental duplications) have never been seen in such abundance in any reported microsporidia genomes. Our relative dating analysis further indicated that these duplication events have arisen recently over very short evolutionary time. Furthermore, several duplicated genes involved in the cytotoxic metabolic pathway were found to undergo positive selection, suggestive of the role of duplicated genes in the adaptive evolution of pathogenic ability. Conclusions Genome expansion is rarely considered as the evolutionary outcome acting on those highly reduced and compact parasitic microsporidian genomes. This study, for the first time, demonstrates that the parasitic genomes can expand, instead of shrink, through several common molecular mechanisms

  9. Digital gene expression analysis of the zebra finch genome

    PubMed Central

    2010-01-01

    Background In order to understand patterns of adaptation and molecular evolution it is important to quantify both variation in gene expression and nucleotide sequence divergence. Gene expression profiling in non-model organisms has recently been facilitated by the advent of massively parallel sequencing technology. Here we investigate tissue specific gene expression patterns in the zebra finch (Taeniopygia guttata) with special emphasis on the genes of the major histocompatibility complex (MHC). Results Almost 2 million 454-sequencing reads from cDNA of six different tissues were assembled and analysed. A total of 11,793 zebra finch transcripts were represented in this EST data, indicating a transcriptome coverage of about 65%. There was a positive correlation between the tissue specificity of gene expression and non-synonymous to synonymous nucleotide substitution ratio of genes, suggesting that genes with a specialised function are evolving at a higher rate (or with less constraint) than genes with a more general function. In line with this, there was also a negative correlation between overall expression levels and expression specificity of contigs. We found evidence for expression of 10 different genes related to the MHC. MHC genes showed relatively tissue specific expression levels and were in general primarily expressed in spleen. Several MHC genes, including MHC class I also showed expression in brain. Furthermore, for all genes with highest levels of expression in spleen there was an overrepresentation of several gene ontology terms related to immune function. Conclusions Our study highlights the usefulness of next-generation sequence data for quantifying gene expression in the genome as a whole as well as in specific candidate genes. Overall, the data show predicted patterns of gene expression profiles and molecular evolution in the zebra finch genome. Expression of MHC genes in particular, corresponds well with expression patterns in other vertebrates

  10. The Genome Sequence of Caenorhabditis briggsae: A Platform for Comparative Genomics

    PubMed Central

    Bao, Zhirong; Blasiar, Darin; Blumenthal, Thomas; Brent, Michael R; Chen, Nansheng; Chinwalla, Asif; Clarke, Laura; Clee, Chris; Coghlan, Avril; Coulson, Alan; D'Eustachio, Peter; Fitch, David H. A; Fulton, Lucinda A; Fulton, Robert E; Griffiths-Jones, Sam; Harris, Todd W; Hillier, LaDeana W; Kamath, Ravi; Kuwabara, Patricia E; Mardis, Elaine R; Marra, Marco A; Miner, Tracie L; Minx, Patrick; Mullikin, James C; Plumb, Robert W; Rogers, Jane; Schein, Jacqueline E; Sohrmann, Marc; Spieth, John; Stajich, Jason E; Wei, Chaochun; Willey, David; Wilson, Richard K; Durbin, Richard; Waterston, Robert H

    2003-01-01

    The soil nematodes Caenorhabditis briggsae and Caenorhabditis elegans diverged from a common ancestor roughly 100 million years ago and yet are almost indistinguishable by eye. They have the same chromosome number and genome sizes, and they occupy the same ecological niche. To explore the basis for this striking conservation of structure and function, we have sequenced the C. briggsae genome to a high-quality draft stage and compared it to the finished C. elegans sequence. We predict approximately 19,500 protein-coding genes in the C. briggsae genome, roughly the same as in C. elegans. Of these, 12,200 have clear C. elegans orthologs, a further 6,500 have one or more clearly detectable C. elegans homologs, and approximately 800 C. briggsae genes have no detectable matches in C. elegans. Almost all of the noncoding RNAs (ncRNAs) known are shared between the two species. The two genomes exhibit extensive colinearity, and the rate of divergence appears to be higher in the chromosomal arms than in the centers. Operons, a distinctive feature of C. elegans, are highly conserved in C. briggsae, with the arrangement of genes being preserved in 96% of cases. The difference in size between the C. briggsae (estimated at approximately 104 Mbp) and C. elegans (100.3 Mbp) genomes is almost entirely due to repetitive sequence, which accounts for 22.4% of the C. briggsae genome in contrast to 16.5% of the C. elegans genome. Few, if any, repeat families are shared, suggesting that most were acquired after the two species diverged or are undergoing rapid evolution. Coclustering the C. elegans and C. briggsae proteins reveals 2,169 protein families of two or more members. Most of these are shared between the two species, but some appear to be expanding or contracting, and there seem to be as many as several hundred novel C. briggsae gene families. The C. briggsae draft sequence will greatly improve the annotation of the C. elegans genome. Based on similarity to C. briggsae, we found

  11. Genome Wide Identification, Phylogeny, and Expression of Aquaporin Genes in Common Carp (Cyprinus carpio).

    PubMed

    Dong, Chuanju; Chen, Lin; Feng, Jingyan; Xu, Jian; Mahboob, Shahid; Al-Ghanim, Khalid; Li, Xuejun; Xu, Peng

    2016-01-01

    Aquaporins (Aqps) are integral membrane proteins that facilitate the transport of water and small solutes across cell membranes. Among vertebrate species, Aqps are highly conserved in both gene structure and amino acid sequence. These proteins are vital for maintaining water homeostasis in living organisms, especially for aquatic animals such as teleost fish. Studies on teleost Aqps are mainly limited to several model species with diploid genomes. Common carp, which has a tetraploidized genome, is one of the most common aquaculture species and is adapted to a wide range of aquatic environments. The complete common carp genome has recently been released, making it possible to study the evolution of the aqp gene family after whole genome duplication. In this study, we identified a total of 37 aqp genes from the common carp genome. Phylogenetic analysis revealed that most aqps are highly conserved. Comparative analysis was performed across five typical vertebrate genomes. We found that almost all of the aqp genes in common carp were duplicated in the evolution of the gene family. We postulated that the expansion of the aqp gene family in common carp was the result of an additional whole genome duplication event and that members of the aqp gene family in other teleosts have been lost during their evolutionary history because their functions are redundant and conserved. Expression patterns were assessed in various tissues, including brain, heart, spleen, liver, intestine, gill, muscle, and skin, which demonstrated the comprehensive expression profiles of aqp genes in the tetraploidized genome. Significant gene expression divergences have been observed, revealing substantial expression divergences or functional divergences in those duplicated aqp genes after the latest WGD event. To some extent, the gene families are also considered as a unique source for evolutionary studies. Moreover, the whole set of common carp aqp gene family provides an essential genomic

  12. Genome Wide Identification, Phylogeny, and Expression of Aquaporin Genes in Common Carp (Cyprinus carpio)

    PubMed Central

    Feng, Jingyan; Xu, Jian; Mahboob, Shahid; Al-Ghanim, Khalid; Li, Xuejun

    2016-01-01

    Background Aquaporins (Aqps) are integral membrane proteins that facilitate the transport of water and small solutes across cell membranes. Among vertebrate species, Aqps are highly conserved in both gene structure and amino acid sequence. These proteins are vital for maintaining water homeostasis in living organisms, especially for aquatic animals such as teleost fish. Studies on teleost Aqps are mainly limited to several model species with diploid genomes. Common carp, which has a tetraploidized genome, is one of the most common aquaculture species and is adapted to a wide range of aquatic environments. The complete common carp genome has recently been released, making it possible to study the evolution of the aqp gene family after whole genome duplication. Results In this study, we identified a total of 37 aqp genes from the common carp genome. Phylogenetic analysis revealed that most aqps are highly conserved. Comparative analysis was performed across five typical vertebrate genomes. We found that almost all of the aqp genes in common carp were duplicated in the evolution of the gene family. We postulated that the expansion of the aqp gene family in common carp was the result of an additional whole genome duplication event and that members of the aqp gene family in other teleosts have been lost during their evolutionary history because their functions are redundant and conserved. Expression patterns were assessed in various tissues, including brain, heart, spleen, liver, intestine, gill, muscle, and skin, which demonstrated the comprehensive expression profiles of aqp genes in the tetraploidized genome. Significant gene expression divergences have been observed, revealing substantial expression divergences or functional divergences in those duplicated aqp genes after the latest WGD event. Conclusions To some extent, the gene families are also considered as a unique source for evolutionary studies. Moreover, the whole set of common carp aqp gene family

  13. Diversity of Pseudomonas Genomes, Including Populus-Associated Isolates, as Revealed by Comparative Genome Analysis.

    PubMed

    Jun, Se-Ran; Wassenaar, Trudy M; Nookaew, Intawat; Hauser, Loren; Wanchai, Visanu; Land, Miriam; Timm, Collin M; Lu, Tse-Yuan S; Schadt, Christopher W; Doktycz, Mitchel J; Pelletier, Dale A; Ussery, David W

    2015-10-30

    The Pseudomonas genus contains a metabolically versatile group of organisms that are known to occupy numerous ecological niches, including the rhizosphere and endosphere of many plants. Their diversity influences the phylogenetic diversity and heterogeneity of these communities. On the basis of average amino acid identity, comparative genome analysis of >1,000 Pseudomonas genomes, including 21 Pseudomonas strains isolated from the roots of native Populus deltoides (eastern cottonwood) trees resulted in consistent and robust genomic clusters with phylogenetic homogeneity. All Pseudomonas aeruginosa genomes clustered together, and these were clearly distinct from other Pseudomonas species groups on the basis of pangenome and core genome analyses. In contrast, the genomes of Pseudomonas fluorescens were organized into 20 distinct genomic clusters, representing enormous diversity and heterogeneity. Most of our 21 Populus-associated isolates formed three distinct subgroups within the major P. fluorescens group, supported by pathway profile analysis, while two isolates were more closely related to Pseudomonas chlororaphis and Pseudomonas putida. Genes specific to Populus-associated subgroups were identified. Genes specific to subgroup 1 include several sensory systems that act in two-component signal transduction, a TonB-dependent receptor, and a phosphorelay sensor. Genes specific to subgroup 2 contain hypothetical genes, and genes specific to subgroup 3 were annotated with hydrolase activity. This study justifies the need to sequence multiple isolates, especially from P. fluorescens, which displays the most genetic variation, in order to study functional capabilities from a pangenomic perspective. This information will prove useful when choosing Pseudomonas strains for use to promote growth and increase disease resistance in plants.

  14. A Comparative Analysis of Mitochondrial Genomes in Eustigmatophyte Algae

    PubMed Central

    Ševčíková, Tereza; Klimeš, Vladimír; Zbránková, Veronika; Strnad, Hynek; Hroudová, Miluše; Vlček, Čestmír; Eliáš, Marek

    2016-01-01

    Eustigmatophyceae (Ochrophyta, Stramenopiles) is a small algal group with species of the genus Nannochloropsis being its best studied representatives. Nuclear and organellar genomes have been recently sequenced for several Nannochloropsis spp., but phylogenetically wider genomic studies are missing for eustigmatophytes. We sequenced mitochondrial genomes (mitogenomes) of three species representing most major eustigmatophyte lineages, Monodopsis sp. MarTras21, Vischeria sp. CAUP Q 202 and Trachydiscus minutus, and carried out their comparative analysis in the context of available data from Nannochloropsis and other stramenopiles, revealing a number of noticeable findings. First, mitogenomes of most eustigmatophytes are highly collinear and similar in the gene content, but extensive rearrangements and loss of three otherwise ubiquitous genes happened in the Vischeria lineage; this correlates with an accelerated evolution of mitochondrial gene sequences in this lineage. Second, eustigmatophytes appear to be the only ochrophyte group with the Atp1 protein encoded by the mitogenome. Third, eustigmatophyte mitogenomes uniquely share a truncated nad11 gene encoding only the C-terminal part of the Nad11 protein, while the N-terminal part is encoded by a separate gene in the nuclear genome. Fourth, UGA as a termination codon and the cognate release factor mRF2 were lost from mitochondria independently by the Nannochloropsis and T. minutus lineages. Finally, the rps3 gene in the mitogenome of Vischeria sp. is interrupted by the UAG codon, but the genome includes a gene for an unusual tRNA with an extended anticodon loop that we speculate may serve as a suppressor tRNA to properly decode the rps3 gene. PMID:26872774

  15. Comparative genomics of Wolbachia and the bacterial species concept.

    PubMed

    Ellegaard, Kirsten Maren; Klasson, Lisa; Näslund, Kristina; Bourtzis, Kostas; Andersson, Siv G E

    2013-04-01

    The importance of host-specialization to speciation processes in obligate host-associated bacteria is well known, as is also the ability of recombination to generate cohesion in bacterial populations. However, whether divergent strains of highly recombining intracellular bacteria, such as Wolbachia, can maintain their genetic distinctness when infecting the same host is not known. We first developed a protocol for the genome sequencing of uncultivable endosymbionts. Using this method, we have sequenced the complete genomes of the Wolbachia strains wHa and wNo, which occur as natural double infections in Drosophila simulans populations on the Seychelles and in New Caledonia. Taxonomically, wHa belongs to supergroup A and wNo to supergroup B. A comparative genomics study including additional strains supported the supergroup classification scheme and revealed 24 and 33 group-specific genes, putatively involved in host-adaptation processes. Recombination frequencies were high for strains of the same supergroup despite different host-preference patterns, leading to genomic cohesion. The inferred recombination fragments for strains of different supergroups were of short sizes, and the genomes of the co-infecting Wolbachia strains wHa and wNo were not more similar to each other and did not share more genes than other A- and B-group strains that infect different hosts. We conclude that Wolbachia strains of supergroup A and B represent genetically distinct clades, and that strains of different supergroups can co-exist in the same arthropod host without converging into the same species. This suggests that the supergroups are irreversibly separated and that barriers other than host-specialization are able to maintain distinct clades in recombining endosymbiont populations. Acquiring a good knowledge of the barriers to genetic exchange in Wolbachia will advance our understanding of how endosymbiont communities are constructed from vertically and horizontally transmitted genes.

  16. Comparative Genomics of Wolbachia and the Bacterial Species Concept

    PubMed Central

    Näslund, Kristina; Bourtzis, Kostas; Andersson, Siv G. E.

    2013-01-01

    The importance of host-specialization to speciation processes in obligate host-associated bacteria is well known, as is also the ability of recombination to generate cohesion in bacterial populations. However, whether divergent strains of highly recombining intracellular bacteria, such as Wolbachia, can maintain their genetic distinctness when infecting the same host is not known. We first developed a protocol for the genome sequencing of uncultivable endosymbionts. Using this method, we have sequenced the complete genomes of the Wolbachia strains wHa and wNo, which occur as natural double infections in Drosophila simulans populations on the Seychelles and in New Caledonia. Taxonomically, wHa belongs to supergroup A and wNo to supergroup B. A comparative genomics study including additional strains supported the supergroup classification scheme and revealed 24 and 33 group-specific genes, putatively involved in host-adaptation processes. Recombination frequencies were high for strains of the same supergroup despite different host-preference patterns, leading to genomic cohesion. The inferred recombination fragments for strains of different supergroups were of short sizes, and the genomes of the co-infecting Wolbachia strains wHa and wNo were not more similar to each other and did not share more genes than other A- and B-group strains that infect different hosts. We conclude that Wolbachia strains of supergroup A and B represent genetically distinct clades, and that strains of different supergroups can co-exist in the same arthropod host without converging into the same species. This suggests that the supergroups are irreversibly separated and that barriers other than host-specialization are able to maintain distinct clades in recombining endosymbiont populations. Acquiring a good knowledge of the barriers to genetic exchange in Wolbachia will advance our understanding of how endosymbiont communities are constructed from vertically and horizontally transmitted genes

  17. The genome sequence of Blochmannia floridanus: Comparative analysis of reduced genomes

    PubMed Central

    Gil, Rosario; Silva, Francisco J.; Zientz, Evelyn; Delmotte, François; González-Candelas, Fernando; Latorre, Amparo; Rausell, Carolina; Kamerbeek, Judith; Gadau, Jürgen; Hölldobler, Bert; van Ham, Roeland C. H. J.; Gross, Roy; Moya, Andrés

    2003-01-01

    Bacterial symbioses are widespread among insects, probably being one of the key factors of their evolutionary success. We present the complete genome sequence of Blochmannia floridanus, the primary endosymbiont of carpenter ants. Although these ants feed on a complex diet, this symbiosis very likely has a nutritional basis: Blochmannia is able to supply nitrogen and sulfur compounds to the host while it takes advantage of the host metabolic machinery. Remarkably, these bacteria lack all known genes involved in replication initiation (dnaA, priA, and recA). The phylogenetic analysis of a set of conserved protein-coding genes shows that Bl. floridanus is phylogenetically related to Buchnera aphidicola and Wigglesworthia glossinidia, the other endosymbiotic bacteria whose complete genomes have been sequenced so far. Comparative analysis of the five known genomes from insect endosymbiotic bacteria reveals they share only 313 genes, a number that may be close to the minimum gene set necessary to sustain endosymbiotic life. PMID:12886019

  18. fPoxDB: fungal peroxidase database for comparative genomics

    PubMed Central

    2014-01-01

    -based prediction and diverse analysis toolkits with easy-to-follow web interface offer a useful workbench to study comparative and evolutionary genomics of peroxidases in fungi. PMID:24885079

  19. Comparative genome analysis of Bacillus cereus group genomes with Bacillus subtilis

    SciTech Connect

    Anderson, Iain; Sorokin, Alexei; Kapatral, Vinayak; Reznik, Gary; Bhattacharya, Anamitra; Mikhailova, Natalia; Burd, Henry; Joukov, Victor; Kaznadzey, Denis; Walunas, Theresa; D'Souza, Mark; Larsen, Niels; Pusch, Gordon; Liolios, Konstantinos; Grechkin, Yuri; Lapidus, Alla; Goltsman, Eugene; Chu, Lien; Fonstein, Michael; Ehrlich, S. Dusko; Overbeek, Ross; Kyrpides, Nikos; Ivanova, Natalia

    2005-09-14

    Genome features of the Bacillus cereus group genomes (representative strains of Bacillus cereus, Bacillus anthracis and Bacillus thuringiensis subsp. israelensis) were analyzed and compared with the Bacillus subtilis genome. A core set of 1,381 protein families among the four Bacillus genomes, with an additional set of 933 families common to the B. cereus group, was identified. Differences in signal transduction pathways, membrane transporters, cell surface structures, cell wall, and S-layer proteins, suggesting differences in their phenotype, were identified. The B. cereus group has signal transduction systems including a tyrosine kinase related to two-component system histidine kinases from B. subtilis. A model for regulation of the stress-responsive sigma factor sigmaB in the B. cereus group, different from the well-studied regulation in B. subtilis, has been proposed. Despite a high degree of chromosomal synteny among these genomes, significant differences in cell wall and spore coat proteins that contribute to survival and adaptation in specific hosts have been identified.

  20. Exploring the Zoonotic Potential of Mycobacterium avium Subspecies paratuberculosis through Comparative Genomics

    PubMed Central

    Seemann, Torsten; Bulach, Dieter M.; Wagner, Josef; Kirkwood, Carl D.; Michalski, Wojtek P.

    2011-01-01

    A comparative genomics approach was utilised to compare the genomes of Mycobacterium avium subspecies paratuberculosis (MAP) isolated from early onset paediatric Crohn's disease (CD) patients as well as Johne's diseased animals. Draft genome sequences were produced for MAP isolates derived from four CD patients, one ulcerative colitis (UC) patient, and two non-inflammatory bowel disease (IBD) control individuals using Illumina sequencing, complemented by comparative genome hybridisation (CGH). MAP isolates derived from two bovine and one ovine host were also subjected to whole genome sequencing and CGH. All seven human derived MAP isolates were highly genetically similar and clustered together with one bovine type isolate following phylogenetic analysis. Three other sequenced isolates (including the reference bovine derived isolate K10) were genetically distinct. The human isolates contained two large tandem duplications, the organisations of which were confirmed by PCR. Designated vGI-17 and vGI-18 these duplications spanned 63 and 109 open reading frames, respectively. PCR screening of over 30 additional MAP isolates (3 human derived, 27 animal derived and one environmental isolate) confirmed that vGI-17 and vGI-18 are common across many isolates. Quantitative real-time PCR of vGI-17 demonstrated that the proportion of cells containing the vGI-17 duplication varied between 0.01 to 15% amongst isolates with human isolates containing a higher proportion of vGI-17 compared to most animal isolates. These findings suggest these duplications are transient genomic rearrangements. We hypothesise that the over-representation of vGI-17 in human derived MAP strains may enhance their ability to infect or persist within a human host by increasing genome redundancy and conferring crude regulation of protein expression across biologically important regions. PMID:21799786

  1. A process for analysis of microarray comparative genomics hybridisation studies for bacterial genomes

    PubMed Central

    Carter, Ben; Wu, Guanghui; Woodward, Martin J; Anjum, Muna F

    2008-01-01

    Background Microarray based comparative genomic hybridisation (CGH) experiments have been used to study numerous biological problems including understanding genome plasticity in pathogenic bacteria. Typically such experiments produce large data sets that are difficult for biologists to handle. Although there are some programmes available for interpretation of bacterial transcriptomics data and CGH microarray data for looking at genetic stability in oncogenes, there are none specifically to understand the mosaic nature of bacterial genomes. Consequently a bottleneck still persists in accurate processing and mathematical analysis of these data. To address this shortfall we have produced a simple and robust CGH microarray data analysis process that may be automated in the future to understand bacterial genomic diversity. Results The process involves five steps: cleaning, normalisation, estimating gene presence and absence or divergence, validation, and analysis of data from test against three reference strains simultaneously. Each stage of the process is described and we have compared a number of methods available for characterising bacterial genomic diversity, for calculating the cut-off between gene presence and absence or divergence, and shown that a simple dynamic approach using a kernel density estimator performed better than both established methods and a more sophisticated mixture modelling technique. We have also shown that current methods commonly used for CGH microarray analysis in tumour and cancer cell lines are not appropriate for analysing our data. Conclusion After carrying out the analysis and validation for three sequenced Escherichia coli strains, CGH microarray data from 19 E. coli O157 pathogenic test strains were used to demonstrate the benefits of applying this simple and robust process to CGH microarray studies using bacterial genomes. PMID:18230148
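
    As an illustration of the kernel-density idea mentioned in this abstract, the following is a minimal sketch, not the authors' published procedure. It assumes that each gene has a log2(test/reference) signal ratio, that the ratios form two density peaks (present versus absent or divergent), and that the cut-off can be placed at the density minimum between those peaks. The function names, the bandwidth of 0.3, and the sample ratios are all hypothetical.

```python
import math

def gaussian_kde(values, bandwidth):
    """Return a function that estimates the density of `values` at a point x."""
    norm = 1.0 / (len(values) * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - v) / bandwidth) ** 2) for v in values)

    return density

def presence_cutoff(log_ratios, bandwidth=0.3, grid_points=400):
    """Place the cut-off at the density minimum between the two tallest peaks."""
    lo, hi = min(log_ratios), max(log_ratios)
    step = (hi - lo) / (grid_points - 1)
    grid = [lo + i * step for i in range(grid_points)]
    density = gaussian_kde(log_ratios, bandwidth)
    dens = [density(x) for x in grid]
    # Interior local maxima of the estimated density.
    peaks = [i for i in range(1, grid_points - 1)
             if dens[i] > dens[i - 1] and dens[i] >= dens[i + 1]]
    if len(peaks) < 2:
        raise ValueError("density is not bimodal at this bandwidth")
    # Keep the two tallest peaks, ordered left to right on the ratio axis.
    left, right = sorted(sorted(peaks, key=lambda i: dens[i], reverse=True)[:2])
    valley = min(range(left, right + 1), key=lambda i: dens[i])
    return grid[valley]

# Hypothetical log2 ratios: seven genes near 0 (present), five near -2 (absent or divergent).
log_ratios = [-0.2, -0.1, 0.0, 0.1, 0.2, 0.05, -0.05,
              -2.2, -2.0, -1.9, -2.1, -1.8]
cutoff = presence_cutoff(log_ratios)
calls = ["present" if r > cutoff else "absent_or_divergent" for r in log_ratios]
```

    Because the cut-off is recomputed from each array's own distribution, it adapts to per-array differences in signal, which is one plausible reading of "dynamic" here; a fixed threshold or a fitted mixture model would be the alternatives the abstract compares against.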

  2. Genome sequence of the β-rhizobium Cupriavidus taiwanensis and comparative genomics of rhizobia

    PubMed Central

    Amadou, Claire; Pascal, Géraldine; Mangenot, Sophie; Glew, Michelle; Bontemps, Cyril; Capela, Delphine; Carrère, Sébastien; Cruveiller, Stéphane; Dossat, Carole; Lajus, Aurélie; Marchetti, Marta; Poinsot, Véréna; Rouy, Zoé; Servin, Bertrand; Saad, Maged; Schenowitz, Chantal; Barbe, Valérie; Batut, Jacques; Médigue, Claudine; Masson-Boivin, Catherine

    2008-01-01

    We report the first complete genome sequence of a β-proteobacterial nitrogen-fixing symbiont of legumes, Cupriavidus taiwanensis LMG19424. The genome consists of two chromosomes of size 3.42 Mb and 2.50 Mb, and a large symbiotic plasmid of 0.56 Mb. The C. taiwanensis genome displays an unexpectedly high similarity with the genome of the saprophytic bacterium C. eutrophus H16, despite being 0.94 Mb smaller. Both organisms harbor two chromosomes with large regions of synteny interspersed by specific regions. In contrast, the two species host highly divergent plasmids, with the consequence that C. taiwanensis is symbiotically proficient and less metabolically versatile. Altogether, specific regions in C. taiwanensis compared with C. eutrophus cover 1.02 Mb and are enriched in genes associated with symbiosis or virulence in other bacteria. C. taiwanensis reveals characteristics of a minimal rhizobium, including the most compact (35-kb) symbiotic island (nod and nif) identified so far in any rhizobium. The atypical phylogenetic position of C. taiwanensis allowed insightful comparative genomics of all available rhizobium genomes. We did not find any gene that was both common and specific to all rhizobia, thus suggesting that a unique shared genetic strategy does not support symbiosis of rhizobia with legumes. Instead, phylodistribution analysis of more than 200 Sinorhizobium meliloti known symbiotic genes indicated large and complex variations of their occurrence in rhizobia and non-rhizobia. This led us to devise an in silico method to extract genes preferentially associated with rhizobia. We discuss how the novel genes we have identified may contribute to symbiotic adaptation. PMID:18490699

  3. SPANDx: a genomics pipeline for comparative analysis of large haploid whole genome re-sequencing datasets.

    PubMed

    Sarovich, Derek S; Price, Erin P

    2014-09-08

    Next-generation sequencing (NGS) is now a commonplace tool for molecular characterisation of virtually any species of interest. Despite the ever-increasing use of NGS in laboratories worldwide, analysis of whole genome re-sequencing (WGS) datasets from start to finish remains nontrivial due to the fragmented nature of NGS software and the lack of experienced bioinformaticists in many research teams. We describe SPANDx (Synergised Pipeline for Analysis of NGS Data in Linux), a new tool for high-throughput comparative analysis of haploid WGS datasets comprising one through thousands of genomes. SPANDx consolidates several well-validated, open-source packages into a single tool, mitigating the need to learn and manipulate individual NGS programs. SPANDx incorporates BWA for alignment of raw NGS reads against a reference genome or pan-genome, followed by data filtering, variant calling and annotation using Picard, GATK, SAMtools and SnpEff. BEDTools has also been included for genetic locus presence/absence (P/A) determination to easily visualise the core and accessory genomes. Additional SPANDx features include construction of error-corrected single-nucleotide polymorphism (SNP) and insertion-deletion matrices, and P/A matrices, to enable user-friendly visualisation of genetic variants. The SNP matrices generated using VCFtools and GATK are directly importable into PAUP*, PHYLIP or RAxML for downstream phylogenetic analysis. SPANDx has been developed to handle NGS data from Illumina, Ion Personal Genome Machine (PGM) and 454 platforms, and we demonstrate that it has comparable performance across Illumina MiSeq/HiSeq2000 and Ion PGM data. SPANDx is an all-in-one tool for comprehensive haploid WGS analysis. SPANDx is open source and is freely available at: http://sourceforge.net/projects/spandx/.
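
    To make the presence/absence (P/A) idea in this abstract concrete, here is a minimal sketch of how a P/A matrix can be split into core and accessory loci. It does not reproduce SPANDx's actual output format or code; the dictionary layout, the function name, and the locus and genome names are hypothetical, and the 0/1 calls are assumed to have been produced upstream (for example from read coverage).

```python
def split_core_accessory(pa_matrix):
    """Split loci into core (present in every genome) and accessory (present in some).

    pa_matrix maps locus name -> {genome name: 1 if present, 0 if absent}.
    Loci absent from every genome are placed in neither list.
    """
    core, accessory = [], []
    for locus, calls in pa_matrix.items():
        n_present = sum(calls.values())
        if n_present == len(calls):
            core.append(locus)
        elif n_present > 0:
            accessory.append(locus)
    return core, accessory

# Hypothetical P/A calls for three loci across three genomes.
pa_matrix = {
    "locus_A": {"genome1": 1, "genome2": 1, "genome3": 1},
    "locus_B": {"genome1": 1, "genome2": 0, "genome3": 1},
    "locus_C": {"genome1": 0, "genome2": 0, "genome3": 0},
}
core, accessory = split_core_accessory(pa_matrix)
```

    A strict "present in all genomes" rule is used for the core set here for simplicity; with draft-quality data a looser threshold is sometimes preferred, and that choice is left to the reader.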

  4. Enabling comparative modeling of closely related genomes: Example genus Brucella

    DOE PAGES

    Faria, José P.; Edirisinghe, Janaka N.; Davis, James J.; ...

    2014-03-08

    For many scientific applications, it is highly desirable to be able to compare metabolic models of closely related genomes. In this study, we attempt to raise awareness of the fact that taking annotated genomes from public repositories and using them for metabolic model reconstructions is far from being trivial due to annotation inconsistencies. We are proposing a protocol for comparative analysis of metabolic models on closely related genomes, using fifteen strains of genus Brucella, which contains pathogens of both humans and livestock. This study led to the identification and subsequent correction of inconsistent annotations in the SEED database, as well as the identification of 31 biochemical reactions that are common to Brucella, which are not originally identified by automated metabolic reconstructions. We are currently implementing this protocol for improving automated annotations within the SEED database and these improvements have been propagated into PATRIC, Model-SEED, KBase and RAST. This method is an enabling step for the future creation of consistent annotation systems and high-quality model reconstructions that will support the prediction of accurate phenotypes such as pathogenicity, media requirements or type of respiration.

  5. New Target Regions for Human Hypertension via Comparative Genomics

    PubMed Central

    Stoll, Monika; Kwitek-Black, Anne E.; Cowley, Allen W.; Harris, Eugenie L.; Harrap, Stephen B.; Krieger, José E.; Printz, Morton P.; Provoost, Abraham P.; Sassard, Jean; Jacob, Howard J.

    2000-01-01

    Models of human disease have long been used to understand the basic pathophysiology of disease and to facilitate the discovery of new therapeutics. However, as long as models have been used there have been debates about the utility of these models and their ability to mimic clinical disease at the phenotypic level. The application of genetic studies to both humans and model systems allows for a new paradigm, whereby a novel comparative genomics strategy combined with phenotypic correlates can be used to bridge between clinical relevance and model utility. This study presents a comparative genomic map for “candidate hypertension loci in humans” based on translating QTLs between rat and human, predicting 26 chromosomal regions in the human genome that are very likely to harbor hypertension genes. The predictive power appears robust, as several of these regions have also been implicated in mouse, suggesting that these regions represent primary targets for the development of SNPs for linkage disequilibrium testing in humans and/or provide a means to select specific models for additional functional studies and the development of new therapeutics. PMID:10779487

  6. Comparative genomic reconstruction of transcriptional networks controlling central metabolism in the Shewanella genus

    PubMed Central

    2011-01-01

    Background Genome-scale prediction of gene regulation and reconstruction of transcriptional regulatory networks in bacteria is one of the critical tasks of modern genomics. The Shewanella genus is comprised of metabolically versatile gamma-proteobacteria, whose lifestyles and natural environments are substantially different from Escherichia coli and other model bacterial species. The comparative genomics approaches and computational identification of regulatory sites are useful for the in silico reconstruction of transcriptional regulatory networks in bacteria. Results To explore conservation and variations in the Shewanella transcriptional networks we analyzed the repertoire of transcription factors and performed genomics-based reconstruction and comparative analysis of regulons in 16 Shewanella genomes. The inferred regulatory network includes 82 transcription factors and their DNA binding sites, 8 riboswitches and 6 translational attenuators. Forty-five regulons were newly inferred from the genome context analysis, whereas others were propagated from previously characterized regulons in the Enterobacteria and Pseudomonas spp. Multiple variations in regulatory strategies between the Shewanella spp. and E. coli include regulon contraction and expansion (as in the case of PdhR, HexR, FadR), numerous cases of recruiting non-orthologous regulators to control equivalent pathways (e.g. PsrA for fatty acid degradation) and, conversely, orthologous regulators to control distinct pathways (e.g. TyrR, ArgR, Crp). Conclusions We tentatively defined the first reference collection of ~100 transcriptional regulons in 16 Shewanella genomes. The resulting regulatory network contains ~600 regulated genes per genome that are mostly involved in metabolism of carbohydrates, amino acids, fatty acids, vitamins, metals, and stress responses. Several reconstructed regulons including NagR for N-acetylglucosamine catabolism were experimentally validated in S. oneidensis MR-1.
Analysis of

  7. MGcV: the microbial genomic context viewer for comparative genome analysis

    PubMed Central

    2013-01-01

    Background Conserved gene context is used in many types of comparative genome analyses. It is used to provide leads on gene function, to guide the discovery of regulatory sequences, but also to aid in the reconstruction of metabolic networks. We present the Microbial Genomic context Viewer (MGcV), an interactive, web-based application tailored to strengthen the practice of manual comparative genome context analysis for bacteria. Results MGcV is a versatile, easy-to-use tool that renders a visualization of the genomic context of any set of selected genes, genes within a phylogenetic tree, genomic segments, or regulatory elements. It is tailored to facilitate laborious tasks such as the interactive annotation of gene function, the discovery of regulatory elements, or the sequence-based reconstruction of gene regulatory networks. We illustrate that MGcV can be used in gene function annotation by visually integrating information on prokaryotic genes, like their annotation as available from NCBI with other annotation data such as Pfam domains, sub-cellular location predictions and gene-sequence characteristics such as GC content. We also illustrate the usefulness of the interactive features that allow the graphical selection of genes to facilitate data gathering (e.g. upstream regions, ID’s or annotation), in the analysis and reconstruction of transcription regulation. Moreover, putative regulatory elements and their corresponding scores or data from RNA-seq and microarray experiments can be uploaded, visualized and interpreted in (ranked-) comparative context maps. The ranked maps allow the interpretation of predicted regulatory elements and experimental data in light of each other. Conclusion MGcV advances the manual comparative analysis of genes and regulatory elements by providing fast and flexible integration of gene related data combined with straightforward data retrieval. MGcV is available at http://mgcv.cmbi.ru.nl. PMID:23547764

  8. Ancient signals: comparative genomics of green plant CDPKs.

    PubMed

    Hamel, Louis-Philippe; Sheen, Jen; Séguin, Armand

    2014-02-01

    Calcium-dependent protein kinases (CDPKs) are multifunctional proteins that combine calcium-binding and signaling capabilities within a single gene product. This unique versatility enables multiple plant biological processes to be controlled, including developmental programs and stress responses. The genome of flowering plants typically encodes around 30 CDPK homologs that cluster in four conserved clades. In this review, we take advantage of the recent availability of genome sequences from green algae and early land plants to examine how well the previously described CDPK family from angiosperms compares to the broader evolutionary states associated with early diverging green plant lineages. Our analysis suggests that the current architecture of the CDPK family was shaped during the colonization of the land by plants, whereas CDPKs from ancestor green algae have continued to evolve independently.

  9. A Comparative Encyclopedia of DNA Elements in the Mouse Genome

    PubMed Central

    Yue, Feng; Cheng, Yong; Breschi, Alessandra; Vierstra, Jeff; Wu, Weisheng; Ryba, Tyrone; Sandstrom, Richard; Ma, Zhihai; Davis, Carrie; Pope, Benjamin D.; Shen, Yin; Pervouchine, Dmitri D.; Djebali, Sarah; Thurman, Bob; Kaul, Rajinder; Rynes, Eric; Kirilusha, Anthony; Marinov, Georgi K.; Williams, Brian A.; Trout, Diane; Amrhein, Henry; Fisher-Aylor, Katherine; Antoshechkin, Igor; DeSalvo, Gilberto; See, Lei-Hoon; Fastuca, Meagan; Drenkow, Jorg; Zaleski, Chris; Dobin, Alex; Prieto, Pablo; Lagarde, Julien; Bussotti, Giovanni; Tanzer, Andrea; Denas, Olgert; Li, Kanwei; Bender, M. A.; Zhang, Miaohua; Byron, Rachel; Groudine, Mark T.; McCleary, David; Pham, Long; Ye, Zhen; Kuan, Samantha; Edsall, Lee; Wu, Yi-Chieh; Rasmussen, Matthew D.; Bansal, Mukul S.; Keller, Cheryl A.; Morrissey, Christapher S.; Mishra, Tejaswini; Jain, Deepti; Dogan, Nergiz; Harris, Robert S.; Cayting, Philip; Kawli, Trupti; Boyle, Alan P.; Euskirchen, Ghia; Kundaje, Anshul; Lin, Shin; Lin, Yiing; Jansen, Camden; Malladi, Venkat S.; Cline, Melissa S.; Erickson, Drew T.; Kirkup, Vanessa M.; Learned, Katrina; Sloan, Cricket A.; Rosenbloom, Kate R.; de Sousa, Beatriz Lacerda; Beal, Kathryn; Pignatelli, Miguel; Flicek, Paul; Lian, Jin; Kahveci, Tamer; Lee, Dongwon; Kent, W. James; Santos, Miguel Ramalho; Herrero, Javier; Notredame, Cedric; Johnson, Audra; Vong, Shinny; Lee, Kristen; Bates, Daniel; Neri, Fidencio; Diegel, Morgan; Canfield, Theresa; Sabo, Peter J.; Wilken, Matthew S.; Reh, Thomas A.; Giste, Erika; Shafer, Anthony; Kutyavin, Tanya; Haugen, Eric; Dunn, Douglas; Reynolds, Alex P.; Neph, Shane; Humbert, Richard; Hansen, R. Scott; De Bruijn, Marella; Selleri, Licia; Rudensky, Alexander; Josefowicz, Steven; Samstein, Robert; Eichler, Evan E.; Orkin, Stuart H.; Levasseur, Dana; Papayannopoulou, Thalia; Chang, Kai-Hsin; Skoultchi, Arthur; Gosh, Srikanta; Disteche, Christine; Treuting, Piper; Wang, Yanli; Weiss, Mitchell J.; Blobel, Gerd A.; Good, Peter J.; Lowdon, Rebecca F.; Adams, Leslie B.; Zhou, Xiao-Qiao; Pazin, Michael J.; Feingold, Elise A.; Wold, Barbara; Taylor, James; Kellis, Manolis; Mortazavi, Ali; Weissman, Sherman M.; Stamatoyannopoulos, John; Snyder, Michael P.; Guigo, Roderic; Gingeras, Thomas R.; Gilbert, David M.; Hardison, Ross C.; Beer, Michael A.; Ren, Bing

    2014-01-01

    Summary As the premier model organism in biomedical research, the laboratory mouse shares the majority of protein-coding genes with humans, yet the two mammals differ in significant ways. To gain greater insights into both shared and species-specific transcriptional and cellular regulatory programs in the mouse, the Mouse ENCODE Consortium has mapped transcription, DNase I hypersensitivity, transcription factor binding, chromatin modifications, and replication domains throughout the mouse genome in diverse cell and tissue types. By comparing with the human genome, we not only confirm substantial conservation in the newly annotated potential functional sequences, but also find a large degree of divergence of other sequences involved in transcriptional regulation, chromatin state and higher order chromatin organization. Our results illuminate the wide range of evolutionary forces acting on genes and their regulatory regions, and provide a general resource for research into mammalian biology and mechanisms of human diseases. PMID:25409824

  10. Beyond the thale: comparative genomics and genetics of Arabidopsis relatives.

    PubMed

    Koenig, Daniel; Weigel, Detlef

    2015-05-01

    For decades a small number of model species have rightly occupied a privileged position in laboratory experiments, but it is becoming increasingly clear that our knowledge of biology is greatly improved when informed by a broader diversity of species and evolutionary context. Arabidopsis thaliana has been the primary model organism for plants, benefiting from a high-quality reference genome sequence and resources for reverse genetics. However, recent studies have made a group of species also in the Brassicaceae family and closely related to A. thaliana a focal point for comparative molecular, genomic, phenotypic and evolutionary studies. In this Review, we emphasize how such studies complement continued study of the model plant itself, provide an evolutionary perspective and summarize our current understanding of genetic and phenotypic diversity in plants.

  11. A comparative encyclopedia of DNA elements in the mouse genome.

    PubMed

    Yue, Feng; Cheng, Yong; Breschi, Alessandra; Vierstra, Jeff; Wu, Weisheng; Ryba, Tyrone; Sandstrom, Richard; Ma, Zhihai; Davis, Carrie; Pope, Benjamin D; Shen, Yin; Pervouchine, Dmitri D; Djebali, Sarah; Thurman, Robert E; Kaul, Rajinder; Rynes, Eric; Kirilusha, Anthony; Marinov, Georgi K; Williams, Brian A; Trout, Diane; Amrhein, Henry; Fisher-Aylor, Katherine; Antoshechkin, Igor; DeSalvo, Gilberto; See, Lei-Hoon; Fastuca, Meagan; Drenkow, Jorg; Zaleski, Chris; Dobin, Alex; Prieto, Pablo; Lagarde, Julien; Bussotti, Giovanni; Tanzer, Andrea; Denas, Olgert; Li, Kanwei; Bender, M A; Zhang, Miaohua; Byron, Rachel; Groudine, Mark T; McCleary, David; Pham, Long; Ye, Zhen; Kuan, Samantha; Edsall, Lee; Wu, Yi-Chieh; Rasmussen, Matthew D; Bansal, Mukul S; Kellis, Manolis; Keller, Cheryl A; Morrissey, Christapher S; Mishra, Tejaswini; Jain, Deepti; Dogan, Nergiz; Harris, Robert S; Cayting, Philip; Kawli, Trupti; Boyle, Alan P; Euskirchen, Ghia; Kundaje, Anshul; Lin, Shin; Lin, Yiing; Jansen, Camden; Malladi, Venkat S; Cline, Melissa S; Erickson, Drew T; Kirkup, Vanessa M; Learned, Katrina; Sloan, Cricket A; Rosenbloom, Kate R; Lacerda de Sousa, Beatriz; Beal, Kathryn; Pignatelli, Miguel; Flicek, Paul; Lian, Jin; Kahveci, Tamer; Lee, Dongwon; Kent, W James; Ramalho Santos, Miguel; Herrero, Javier; Notredame, Cedric; Johnson, Audra; Vong, Shinny; Lee, Kristen; Bates, Daniel; Neri, Fidencio; Diegel, Morgan; Canfield, Theresa; Sabo, Peter J; Wilken, Matthew S; Reh, Thomas A; Giste, Erika; Shafer, Anthony; Kutyavin, Tanya; Haugen, Eric; Dunn, Douglas; Reynolds, Alex P; Neph, Shane; Humbert, Richard; Hansen, R Scott; De Bruijn, Marella; Selleri, Licia; Rudensky, Alexander; Josefowicz, Steven; Samstein, Robert; Eichler, Evan E; Orkin, Stuart H; Levasseur, Dana; Papayannopoulou, Thalia; Chang, Kai-Hsin; Skoultchi, Arthur; Gosh, Srikanta; Disteche, Christine; Treuting, Piper; Wang, Yanli; Weiss, Mitchell J; Blobel, Gerd A; Cao, Xiaoyi; Zhong, Sheng; Wang, Ting; Good, Peter J; 
Lowdon, Rebecca F; Adams, Leslie B; Zhou, Xiao-Qiao; Pazin, Michael J; Feingold, Elise A; Wold, Barbara; Taylor, James; Mortazavi, Ali; Weissman, Sherman M; Stamatoyannopoulos, John A; Snyder, Michael P; Guigo, Roderic; Gingeras, Thomas R; Gilbert, David M; Hardison, Ross C; Beer, Michael A; Ren, Bing

    2014-11-20

    The laboratory mouse shares the majority of its protein-coding genes with humans, making it the premier model organism in biomedical research, yet the two mammals differ in significant ways. To gain greater insights into both shared and species-specific transcriptional and cellular regulatory programs in the mouse, the Mouse ENCODE Consortium has mapped transcription, DNase I hypersensitivity, transcription factor binding, chromatin modifications and replication domains throughout the mouse genome in diverse cell and tissue types. By comparing with the human genome, we not only confirm substantial conservation in the newly annotated potential functional sequences, but also find a large degree of divergence of sequences involved in transcriptional regulation, chromatin state and higher order chromatin organization. Our results illuminate the wide range of evolutionary forces acting on genes and their regulatory regions, and provide a general resource for research into mammalian biology and mechanisms of human diseases.

  12. Industrial Acetogenic Biocatalysts: A Comparative Metabolic and Genomic Analysis

    PubMed Central

    Bengelsdorf, Frank R.; Poehlein, Anja; Linder, Sonja; Erz, Catarina; Hummel, Tim; Hoffmeister, Sabrina; Daniel, Rolf; Dürre, Peter

    2016-01-01

    Synthesis gas (syngas) fermentation by anaerobic acetogenic bacteria employing the Wood–Ljungdahl pathway is a bioprocess for production of biofuels and biocommodities. The major fermentation products of the most relevant biocatalytic strains (Clostridium ljungdahlii, C. autoethanogenum, C. ragsdalei, and C. coskatii) are acetic acid and ethanol. A comparative metabolic and genomic analysis using the mentioned biocatalysts might offer targets for metabolic engineering and thus improve the production of compounds apart from ethanol. Autotrophic growth and product formation of the four wild type (WT) strains were compared in uncontrolled batch experiments. The genomes of C. ragsdalei and C. coskatii were sequenced and the genome sequences of all four biocatalytic strains analyzed in comparative manner. Growth and product spectra (acetate, ethanol, 2,3-butanediol) of C. autoethanogenum, C. ljungdahlii, and C. ragsdalei were rather similar. In contrast, C. coskatii produced significantly less ethanol and its genome sequence lacks two genes encoding aldehyde:ferredoxin oxidoreductases (AOR). Comparative genome sequence analysis of the four WT strains revealed high average nucleotide identity (ANI) of C. ljungdahlii and C. autoethanogenum (99.3%) and C. coskatii (98.3%). In contrast, C. ljungdahlii WT and C. ragsdalei WT showed an ANI-based similarity of only 95.8%. Additionally, recombinant C. ljungdahlii strains were constructed that harbor an artificial acetone synthesis operon (ASO) consisting of the following genes: adc, ctfA, ctfB, and thlA (encoding acetoacetate decarboxylase, acetoacetyl-CoA:acetate/butyrate:CoA-transferase subunits A and B, and thiolase) under the control of thlA promoter (PthlA) from C. acetobutylicum or native pta-ack promoter (Ppta-ack) from C. ljungdahlii. Respective recombinant strains produced 2-propanol rather than acetone, due to the presence of a NADPH-dependent primary-secondary alcohol dehydrogenase that converts acetone to 2

  13. Comparative Genomics of the Ubiquitous, Hydrocarbon-degrading Genus Marinobacter

    NASA Astrophysics Data System (ADS)

    Singer, E.; Webb, E.; Edwards, K. J.

    2012-12-01

    The genus Marinobacter is amongst the most ubiquitous in the global oceans and strains have been isolated from a wide variety of marine environments, including offshore oil-well heads, coastal thermal springs, Antarctic sea water, saline soils and associations with diatoms and dinoflagellates. Many strains have been recognized to be important hydrocarbon degraders in various marine habitats presenting sometimes extreme pH or salinity conditions. Analysis of the genome of M. aquaeolei revealed enormous adaptation versatility with an assortment of strategies for carbon and energy acquisition, sensation, and defense. In an effort to elucidate the ecological and biogeochemical significance of the Marinobacters, seven Marinobacter strains from diverse environments were included in a comparative genomics study. Genomes were screened for metabolic and adaptation potential to elucidate the strategies responsible for the omnipresence of the Marinobacter genus and their remedial action potential in hydrocarbon-polluted waters. The core genome predominantly encodes for key genes involved in hydrocarbon degradation, biofilm-relevant processes, including utilization of external DNA, halotolerance, as well as defense mechanisms against heavy metals, antibiotics, and toxins. All Marinobacter strains were observed to degrade a wide spectrum of hydrocarbon species, including aliphatic, polycyclic aromatic as well as acyclic isoprenoid compounds. Various genes predicted to facilitate hydrocarbon degradation, e.g. alkane 1-monooxygenase, appear to have originated from lateral gene transfer as they are located on gene clusters of 10-20% lower GC-content compared to genome averages and are flanked by transposases. Top ortholog hits are found in other hydrocarbon degrading organisms, e.g. Alcanivorax borkumensis. 
Strategies for hydrocarbon uptake encoded by various Marinobacter strains include cell surface hydrophobicity adaptation via capsular polysaccharide biosynthesis and attachment

  14. Array comparative genomic hybridization in retinoma and retinoblastoma tissues.

    PubMed

    Sampieri, Katia; Amenduni, Mariangela; Papa, Filomena Tiziana; Katzaki, Eleni; Mencarelli, Maria Antonietta; Marozza, Annabella; Epistolato, Maria Carmela; Toti, Paolo; Lazzi, Stefano; Bruttini, Mirella; De Filippis, Roberta; De Francesco, Sonia; Longo, Ilaria; Meloni, Ilaria; Mari, Francesca; Acquaviva, Antonio; Hadjistilianou, Theodora; Renieri, Alessandra; Ariani, Francesca

    2009-03-01

    In retinoblastoma, two RB1 mutations are necessary for tumor development. Recurrent genomic rearrangements may represent subsequent events required for retinoblastoma progression. Array-comparative genomic hybridization was carried out in 18 eye samples, 10 from bilateral and eight from unilateral retinoblastoma patients. Two unilateral cases also showed areas of retinoma. The most frequent imbalance in retinoblastomas was 6p gain (40%), followed by gains at 1q12-q25.3, 2p24.3-p24.2, 9q22.2, and 9q33.1 and losses at 11q24.3, 13q13.2-q22.3, and 16q12.1-q21. Bilateral cases showed a lower number of imbalances than unilateral cases (P = 0.002). Unilateral cases were divided into low-level (≤ 4) and high-level (≥ 7) chromosomal instability groups. The first group presented with younger age at diagnosis (mean 511 days) compared with the second group (mean 1606 days). In one retinoma case ophthalmoscopically diagnosed as a benign lesion no rearrangements were detected, whereas the adjacent retinoblastoma displayed seven aberrations. The other retinoma case identified by retrospective histopathological examination shared three rearrangements with the adjacent retinoblastoma. Two other gene-free rearrangements were retinoma specific. One rearrangement, dup5p, was retinoblastoma specific and included the SKP2 gene. Genomic profiling indicated that the first retinoma was a pretumoral lesion, whereas the other represents a subclone of cells bearing 'benign' rearrangements overwhelmed by another subclone presenting aberrations with higher 'oncogenic' potential. In summary, the present study shows that bilateral and unilateral retinoblastoma have different chromosomal instability that correlates with the age of tumor onset in unilateral cases. This is the first report of genomic profiling in retinoma tissue, shedding light on the different nature of lesions named 'retinoma'.

  15. Comparative Genomics of Serratia spp.: Two Paths towards Endosymbiotic Life

    PubMed Central

    Manzano-Marín, Alejandro; Lamelas, Araceli; Moya, Andrés; Latorre, Amparo

    2012-01-01

    Symbiosis is a widespread phenomenon in nature, and insects show a great number of these associations. Buchnera aphidicola, the obligate endosymbiont of aphids, coexists in some species with another intracellular bacterium, Serratia symbiotica. Of particular interest is the case of the cedar aphid Cinara cedri, where B. aphidicola BCc and S. symbiotica SCc need each other to fulfil their symbiotic role with the insect. Moreover, various features seem to indicate that S. symbiotica SCc is closer to an obligate endosymbiont than to other facultative S. symbiotica, such as the one described for the aphid Acyrthosiphon pisum (S. symbiotica SAp). This work is based on the comparative genomics of five strains of Serratia, three free-living and two endosymbiotic ones (one facultative and one obligate), which should allow us to dissect the genome reduction taking place in the adaptive process to an intracellular life-style. Using a pan-genome approach, we have identified shared and strain-specific genes from both endosymbiotic strains and gained insight into the different genetic reduction both S. symbiotica have undergone. We have identified both retained and reduced functional categories in S. symbiotica compared to the Free-Living Serratia (FLS) that seem to be related to its endosymbiotic role in their specific host-symbiont systems. By means of a phylogenomic reconstruction we have resolved the position of both endosymbionts with confidence, and established the probable insect-pathogen origin of the symbiotic clade as well as the high amino-acid substitution rate in S. symbiotica SCc. Finally, we were able to quantify the minimal number of rearrangements undergone in the endosymbiotic lineages and reconstruct a minimal rearrangement phylogeny. All these findings provide important evidence for the existence of at least two distinctive S. symbiotica lineages that are characterized by different rearrangements, gene content, genome size and branch lengths. PMID:23077583

  16. Comparative and functional genomics of lipases in holometabolous insects.

    PubMed

    Horne, Irene; Haritos, Victoria S; Oakeshott, John G

    2009-08-01

    Lipases have key roles in insect lipid acquisition, storage and mobilisation and are also fundamental to many physiological processes underpinning insect reproduction, development, defence from pathogens and oxidative stress, and pheromone signalling. We have screened the recently sequenced genomes of five species from four orders of holometabolous insects, the dipterans Drosophila melanogaster and Anopheles gambiae, the hymenopteran Apis mellifera, the moth Bombyx mori and the beetle Tribolium castaneum, for the six major lipase families that are also found in other organisms. The two most numerous families in the insects, the neutral and acid lipases, are also the main families in mammals, albeit not in Caenorhabditis elegans, plants or microbes. Total numbers of the lipases vary two-fold across the five insect species, from numbers similar to those in mammals up to numbers comparable to those seen in C. elegans. Whilst there is a high degree of orthology with mammalian lipases in the other four families, the great majority of the insect neutral and acid lipases have arisen since the insect orders themselves diverged. Intriguingly, about 10% of the insect neutral and acid lipases have lost motifs critical for catalytic function. Examination of the length of lid and loop regions of the neutral lipase sequences suggest that most of the insect lipases lack triacylglycerol (TAG) hydrolysis activity, although the acid lipases all have intact cap domains required for TAG hydrolysis. We have also reviewed the sequence databases and scientific literature for insights into the expression profiles and functions of the insect neutral and acid lipases and the orthologues of the mammalian adipose triglyceride lipase which has a pivotal role in lipid mobilisation. These data suggest that some of the acid and neutral lipase diversity may be due to a requirement for rapid accumulation of dietary lipids. 
The different roles required of lipases at the four discrete life stages of

  17. Comparative genomic analysis reveals a distant liver enhancer upstream of the COUP-TFII gene

    SciTech Connect

    Baroukh, Nadine; Ahituv, Nadav; Chang, Jessie; Shoukry, Malak; Afzal, Veena; Rubin, Edward M.; Pennacchio, Len A.

    2004-08-20

    COUP-TFII is a central nuclear hormone receptor that tightly regulates the expression of numerous target lipid metabolism genes in vertebrates. However, it remains unclear how COUP-TFII itself is transcriptionally controlled, since studies with its promoter and upstream region fail to recapitulate the gene's liver expression. In an attempt to identify liver enhancers in the vicinity of COUP-TFII, we employed a comparative genomic approach. Initial comparisons between humans and mice of the 3,470 kb gene-poor region surrounding COUP-TFII revealed 2,023 conserved non-coding elements. To prioritize a subset of these elements for functional studies, we performed further genomic comparisons with the orthologous pufferfish (Fugu rubripes) locus and uncovered two anciently conserved non-coding sequences (CNS) upstream of COUP-TFII (CNS-62kb and CNS-66kb). Testing these two elements using reporter constructs in liver (HepG2) cells revealed that CNS-66kb, but not CNS-62kb, yielded robust in vitro enhancer activity. In addition, an in vivo reporter assay using naked DNA transfer with CNS-66kb linked to luciferase displayed strong reproducible liver expression in adult mice, further supporting its role as a liver enhancer. Together, these studies further support the utility of comparative genomics to uncover gene regulatory sequences based on evolutionary conservation and provide the substrates to better understand the regulation and expression of COUP-TFII.

  18. CoGemiR: A comparative genomics microRNA database

    PubMed Central

    Maselli, Vincenza; Di Bernardo, Diego; Banfi, Sandro

    2008-01-01

    Background MicroRNAs are small highly conserved non-coding RNAs which play an important role in regulating gene expression by binding the 3'UTR of target mRNAs. The majority of microRNAs are localized within other transcriptional units (host genes) and are co-expressed with them, which strongly suggests that microRNAs and corresponding host genes use the same promoter and other expression control elements. The remaining fraction of microRNAs is intergenic and is endowed with an independent regulatory region. A number of databases have already been developed to collect information about microRNAs but none of them allow an easy exploration of microRNA genomic organization across evolution. Results CoGemiR is a publicly available microRNA-centered database whose aim is to offer an overview of the genomic organization of microRNAs and of its extent of conservation during evolution in different metazoan species. The database collects information on genomic location, conservation and expression data of both known and newly predicted microRNAs and displays the data by privileging a comparative point of view. The database also includes a microRNA prediction pipeline to annotate microRNAs in recently sequenced genomes. This information is easily accessible via web through a user-friendly query page. The CoGemiR database is available at Conclusion The knowledge of the genomic organization of microRNAs can provide useful information to understand their biology. In order to have a comparative genomics overview of microRNAs genomic organization, we developed CoGemiR. To achieve this goal, we both collected and integrated data from pre-existing databases and generated new ones, such as the identification in several species of a number of previously unannotated microRNAs. For a more effective use of this data, we developed a user-friendly web interface that simply shows how a microRNA genomic context is related in different species. PMID:18837977

  19. Comparative Analysis of Genomics and Proteomics in Bacillus thuringiensis 4.0718

    PubMed Central

    Rang, Jie; He, Hao; Wang, Ting; Ding, Xuezhi; Zuo, Mingxing; Quan, Meifang; Sun, Yunjun; Yu, Ziquan; Hu, Shengbiao; Xia, Liqiu

    2015-01-01

    Bacillus thuringiensis is a widely used biopesticide that produces various insecticidal active substances during its life cycle. Separation and purification of numerous insecticidal active substances have been difficult because of the relatively short half-life of such substances. On the other hand, substances can be synthesized at different times during development, so samples at different stages have to be studied, further complicating the analysis. A dual genomic and proteomic approach would enhance our ability to identify such substances, particularly using mass spectrometry-based proteomic methods. The comparative analysis of genomic and proteomic data showed that not all of the products deduced from the annotated genome could be identified among the proteomic data. For instance, genome annotation results showed that 39 coding sequences in the whole genome were related to insect pathogenicity, including five cry genes. However, Cry2Ab, Cry1Ia, Cytotoxin K, Bacteriocin, Exoenzyme C3 and Alveolysin could not be detected in the proteomic data obtained. The sporulation-related proteins were also compared, and the results showed that the great majority of sporulation-related proteins can be detected by mass spectrometry. This analysis revealed that Spo0A~P, SigF, SigE(+), SigK(+) and SigG(+), all known to play an important role in the regulatory network of spore formation, were also present in the proteomic data. Through the comparison of the two data sets, it was possible to infer that some genes were silenced or were expressed at very low levels. For instance, cry2Ab seems to lack a functional promoter, while cry1Ia may not be expressed due to the presence of transposons. With this comparative study a relatively complete database can be constructed and used to transform hereditary material, thereby prompting the high expression of toxic proteins. A theoretical basis is provided for constructing highly virulent engineered bacteria and for

  20. An Ordered Comparative Map of the Cattle and Human Genomes

    PubMed Central

    Band, Mark R.; Larson, Joshua H.; Rebeiz, Mark; Green, Cheryl A.; Heyen, D. Wayne; Donovan, Jena; Windish, Ryan; Steining, Chad; Mahyuddin, Prapti; Womack, James E.; Lewin, Harris A.

    2000-01-01

    A cattle–human whole-genome comparative map was constructed using parallel radiation hybrid (RH) mapping in conjunction with EST sequencing, database mining for unmapped cattle genes, and a predictive bioinformatics approach (COMPASS) for targeting specific homologous regions. A total of 768 genes were placed on the RH map in addition to 319 microsatellites used as anchor markers. Of these, 638 had human orthologs with mapping data, thus permitting construction of an ordered comparative map. The large number of ordered loci revealed ≥ 105 conserved segments between the two genomes. The comparative map suggests that 41 translocation events, a minimum of 54 internal rearrangements, and repositioning of all but one centromere can account for the observed organizations of the cattle and human genomes. In addition, the COMPASS in silico mapping tool was shown to be 95% accurate in its ability to predict cattle chromosome location from random sequence data, demonstrating this tool to be valuable for efficient targeting of specific regions for detailed mapping. The comparative map generated will be a cornerstone for elucidating mammalian chromosome phylogeny and the identification of genes of agricultural importance. “Ought we, for instance, to begin by discussing each separate species—in virtue of some common element of their nature, and proceed from this as a basis for the consideration of them separately?” from Aristotle, On the Parts of Animals, 350 B.C.E. [The sequence data described in this paper have been submitted to the GenBank data library under accession nos. AW244888-AW244897, AW261132-AW261195, AW266849-AW267161, AW289175-AW289430, AW428566-AW428607, AW621146, AW621147.] PMID:10984454

  1. Comparative Genomics Analyses Reveal Extensive Chromosome Colinearity and Novel Quantitative Trait Loci in Eucalyptus.

    PubMed

    Li, Fagen; Zhou, Changpin; Weng, Qijie; Li, Mei; Yu, Xiaoli; Guo, Yong; Wang, Yu; Zhang, Xiaohong; Gan, Siming

    2015-01-01

    Dense genetic maps, along with quantitative trait loci (QTLs) detected on such maps, are powerful tools for genomics and molecular breeding studies. In the important woody genus Eucalyptus, the recent release of the E. grandis genome sequence allows for sequence-based genomic comparison and searching for positional candidate genes within QTL regions. Here, dense genetic maps were constructed for E. urophylla and E. tereticornis using genomic simple sequence repeats (SSR), expressed sequence tag (EST) derived SSR, EST-derived cleaved amplified polymorphic sequence (EST-CAPS), and diversity arrays technology (DArT) markers. The E. urophylla and E. tereticornis maps comprised 700 and 585 markers across 11 linkage groups, totaling 1,208.2 and 1,241.4 cM in length, respectively. Extensive synteny and colinearity were observed as compared to three earlier DArT-based eucalypt maps (two maps with E. grandis × E. urophylla and one map of E. globulus) and with the E. grandis genome sequence. Fifty-three QTLs for growth (10-56 months of age) and wood density (56 months) were identified in 22 discrete regions on both maps, in which only one colocalization was found between growth and wood density. Novel QTLs were revealed as compared with those previously detected on DArT-based maps for similar ages in Eucalyptus. Eleven to 585 positional candidate genes were obtained for a 56-month-old QTL through aligning QTL confidence interval with the E. grandis genome. These results will assist in comparative genomics studies, targeted gene characterization, and marker-assisted selection in Eucalyptus and the related taxa.

  2. Comparative Genomics Analyses Reveal Extensive Chromosome Colinearity and Novel Quantitative Trait Loci in Eucalyptus

    PubMed Central

    Weng, Qijie; Li, Mei; Yu, Xiaoli; Guo, Yong; Wang, Yu; Zhang, Xiaohong; Gan, Siming

    2015-01-01

    Dense genetic maps, along with quantitative trait loci (QTLs) detected on such maps, are powerful tools for genomics and molecular breeding studies. In the important woody genus Eucalyptus, the recent release of the E. grandis genome sequence allows for sequence-based genomic comparison and searching for positional candidate genes within QTL regions. Here, dense genetic maps were constructed for E. urophylla and E. tereticornis using genomic simple sequence repeats (SSR), expressed sequence tag (EST) derived SSR, EST-derived cleaved amplified polymorphic sequence (EST-CAPS), and diversity arrays technology (DArT) markers. The E. urophylla and E. tereticornis maps comprised 700 and 585 markers across 11 linkage groups, totaling 1,208.2 and 1,241.4 cM in length, respectively. Extensive synteny and colinearity were observed as compared to three earlier DArT-based eucalypt maps (two maps with E. grandis × E. urophylla and one map of E. globulus) and with the E. grandis genome sequence. Fifty-three QTLs for growth (10–56 months of age) and wood density (56 months) were identified in 22 discrete regions on both maps, in which only one colocalization was found between growth and wood density. Novel QTLs were revealed as compared with those previously detected on DArT-based maps for similar ages in Eucalyptus. Eleven to 585 positional candidate genes were obtained for a 56-month-old QTL through aligning QTL confidence interval with the E. grandis genome. These results will assist in comparative genomics studies, targeted gene characterization, and marker-assisted selection in Eucalyptus and the related taxa. PMID:26695430

  3. Genomic and proteomic expression analysis of Leishmania promastigote and amastigote life stages: the Leishmania genome is constitutively expressed.

    PubMed

    Leifso, Kirk; Cohen-Freue, Gabriela; Dogra, Nisha; Murray, Angus; McMaster, W Robert

    2007-03-01

    Leishmania are protozoan parasites that cause a wide spectrum of clinical diseases in humans and are a major public health risk in several countries. The Leishmania life cycle consists of an extracellular flagellated promastigote stage within the midgut of a sandfly vector, and a morphologically distinct intracellular amastigote stage within macrophages of a mammalian host. This study reports the use of DNA oligonucleotide genome microarrays representing 8160 genes to analyze the mRNA expression profiles of L. major promastigotes and lesion-derived amastigotes. Over 94% of the genes were expressed in both life stages. Advanced statistical analysis identified a surprisingly low degree of differential mRNA expression: 1.4% of the total genes in amastigotes and 1.5% in promastigotes. These microarray results demonstrate that the L. major genome is essentially constitutively expressed in both life stages and suggest that Leishmania is constitutively adapted for survival and replication in either the sandfly vector or macrophage host utilizing an appropriate set of genes for each vastly different environment. Quantitative proteomics, using the isotope coded affinity tag (ICAT) technology and mass spectrometry, was used to identify L. infantum promastigote and axenic amastigote differentially expressed proteins. Of the 91 distinct proteins identified, 8% were differentially expressed in the amastigote stage, 20% were differentially expressed in the promastigote stage, and the remaining 72% were considered constitutively expressed. The differential expression was validated by the identification of previously reported stage-specific proteins and identified several novel amastigote and promastigote stage-specific proteins.

  4. Comparative Genomics of Flatworms (Platyhelminthes) Reveals Shared Genomic Features of Ecto- and Endoparastic Neodermata

    PubMed Central

    Hahn, Christoph; Fromm, Bastian; Bachmann, Lutz

    2014-01-01

    The ectoparasitic Monogenea comprise a major part of the obligate parasitic flatworm diversity. Although genomic adaptations to parasitism have been studied in the endoparasitic tapeworms (Cestoda) and flukes (Trematoda), no representative of the Monogenea has been investigated yet. We present the high-quality draft genome of Gyrodactylus salaris, an economically important monogenean ectoparasite of wild Atlantic salmon (Salmo salar). A total of 15,488 gene models were identified, of which 7,102 were functionally annotated. The controversial phylogenetic relationships within the obligate parasitic Neodermata were resolved in a phylogenomic analysis using 1,719 gene models (alignment length of >500,000 amino acids) for a set of 16 metazoan taxa. The Monogenea were found basal to the Cestoda and Trematoda, which implies ectoparasitism being plesiomorphic within the Neodermata and strongly supports a common origin of complex life cycles. Comparative analysis of seven parasitic flatworm genomes identified shared genomic features for the ecto- and endoparasitic lineages, such as a substantial reduction of the core bilaterian gene complement, including the homeodomain-containing genes, and a loss of the piwi and vasa genes, which are considered essential for animal development. Furthermore, the shared loss of functional fatty acid biosynthesis pathways and the absence of peroxisomes, the latter organelles presumed ubiquitous in eukaryotes except for parasitic protozoans, were inferred. The draft genome of G. salaris opens for future in-depth analyses of pathogenicity and host specificity of poorly characterized G. salaris strains, and will enhance studies addressing the genomics of host–parasite interactions and speciation in the highly diverse monogenean flatworms. PMID:24732282

  5. Comparative genomics of flatworms (platyhelminthes) reveals shared genomic features of ecto- and endoparastic neodermata.

    PubMed

    Hahn, Christoph; Fromm, Bastian; Bachmann, Lutz

    2014-05-01

    The ectoparasitic Monogenea comprise a major part of the obligate parasitic flatworm diversity. Although genomic adaptations to parasitism have been studied in the endoparasitic tapeworms (Cestoda) and flukes (Trematoda), no representative of the Monogenea has been investigated yet. We present the high-quality draft genome of Gyrodactylus salaris, an economically important monogenean ectoparasite of wild Atlantic salmon (Salmo salar). A total of 15,488 gene models were identified, of which 7,102 were functionally annotated. The controversial phylogenetic relationships within the obligate parasitic Neodermata were resolved in a phylogenomic analysis using 1,719 gene models (alignment length of >500,000 amino acids) for a set of 16 metazoan taxa. The Monogenea were found basal to the Cestoda and Trematoda, which implies ectoparasitism being plesiomorphic within the Neodermata and strongly supports a common origin of complex life cycles. Comparative analysis of seven parasitic flatworm genomes identified shared genomic features for the ecto- and endoparasitic lineages, such as a substantial reduction of the core bilaterian gene complement, including the homeodomain-containing genes, and a loss of the piwi and vasa genes, which are considered essential for animal development. Furthermore, the shared loss of functional fatty acid biosynthesis pathways and the absence of peroxisomes, the latter organelles presumed ubiquitous in eukaryotes except for parasitic protozoans, were inferred. The draft genome of G. salaris opens for future in-depth analyses of pathogenicity and host specificity of poorly characterized G. salaris strains, and will enhance studies addressing the genomics of host-parasite interactions and speciation in the highly diverse monogenean flatworms.

  6. Genome Sequence of Desulfurella amilsii Strain TR1 and Comparative Genomics of Desulfurellaceae Family

    PubMed Central

    Florentino, Anna P.; Stams, Alfons J. M.; Sánchez-Andrea, Irene

    2017-01-01

    The acidotolerant sulfur reducer Desulfurella amilsii was isolated from sediments of Tinto River, an extremely acidic environment. Its ability to grow in a broad range of pH and to tolerate certain heavy metals offers potential for metal recovery processes. Here we report its high-quality draft genome sequence and compare it to the available genome sequences of other members of Desulfurellaceae family: D. acetivorans, D. multipotens, Hippea maritima, H. alviniae, H. medeae, and H. jasoniae. For most species, pairwise comparisons for average nucleotide identity (ANI) and in silico DNA–DNA hybridization (DDH) revealed ANI values from 67.5 to 80% and DDH values from 12.9 to 24.2%. D. acetivorans and D. multipotens, however, surpassed the estimated thresholds of species definition for both DDH (98.6%) and ANI (88.1%). Therefore, they should be merged to a single species. Comparative analysis of Desulfurellaceae genomes revealed different gene content for sulfur respiration between Desulfurella and Hippea species. Sulfur reductase is only encoded in D. amilsii, in which it is suggested to play a role in sulfur respiration, especially at low pH. Polysulfide reductase is only encoded in Hippea species; it is likely that this genus uses polysulfide as electron acceptor. Genes encoding thiosulfate reductase are present in all the genomes, but dissimilatory sulfite reductase is only present in Desulfurella species. Thus, thiosulfate respiration via sulfite is only likely in this genus. Although sulfur disproportionation occurs in Desulfurella species, the molecular mechanism behind this process is not yet understood, hampering a genome prediction. The metabolism of acetate in Desulfurella species can occur via the acetyl-CoA synthetase or via acetate kinase in combination with phosphate acetyltransferase, while in Hippea species, it might occur via the acetate kinase. Large differences in gene sets involved in resistance to acidic conditions were not detected among the

  7. Comparative Genomics of the Listeria monocytogenes ST204 Subgroup.

    PubMed

    Fox, Edward M; Allnutt, Theodore; Bradbury, Mark I; Fanning, Séamus; Chandry, P Scott

    2016-01-01

    The ST204 subgroup of Listeria monocytogenes is among the most frequently isolated in Australia from a range of environmental niches. In this study we provide a comparative genomics analysis of food and food environment isolates from geographically diverse sources. Analysis of the ST204 genomes showed a highly conserved core genome with the majority of variation seen in mobile genetic elements such as plasmids, transposons and phage insertions. Most strains (13/15) harbored plasmids, which, although varying in size, contained highly conserved sequences. Interestingly, 4 isolates contained a conserved plasmid of 91,396 bp. The strains examined were isolated over a period of 12 years and from different geographic locations, suggesting plasmids are an important component of the genetic repertoire of this subgroup and may provide a range of stress tolerance mechanisms. In addition, 4 phage insertion sites and 2 transposons were identified among isolates, including a novel transposon. These genetic elements were highly conserved across isolates that harbored them, and also contained a range of genetic markers linked to stress tolerance and virulence. The maintenance of conserved mobile genetic elements in the ST204 population suggests these elements may contribute to the diverse range of niches colonized by ST204 isolates. Environmental stress selection may contribute to maintaining these genetic features, which in turn may be co-selecting for virulence markers relevant to clinical infection with ST204 isolates.

  8. Comparative genomics of the mimicry switch in Papilio dardanus

    PubMed Central

    Timmermans, Martijn J. T. N.; Baxter, Simon W.; Clark, Rebecca; Heckel, David G.; Vogel, Heiko; Collins, Steve; Papanicolaou, Alexie; Fukova, Iva; Joron, Mathieu; Thompson, Martin J.; Jiggins, Chris D.; ffrench-Constant, Richard H.; Vogler, Alfried P.

    2014-01-01

    The African Mocker Swallowtail, Papilio dardanus, is a textbook example in evolutionary genetics. Classical breeding experiments have shown that wing pattern variation in this polymorphic Batesian mimic is determined by the polyallelic H locus that controls a set of distinct mimetic phenotypes. Using bacterial artificial chromosome (BAC) sequencing, recombination analyses and comparative genomics, we show that H co-segregates with an interval of less than 500 kb that is collinear with two other Lepidoptera genomes and contains 24 genes, including the transcription factor genes engrailed (en) and invected (inv). H is located in a region of conserved gene order, which argues against any role for genomic translocations in the evolution of a hypothesized multi-gene mimicry locus. Natural populations of P. dardanus show significant associations of specific morphs with single nucleotide polymorphisms (SNPs), centred on en. In addition, SNP variation in the H region reveals evidence of non-neutral molecular evolution in the en gene alone. We find evidence for a duplication potentially driving physical constraints on recombination in the lamborni morph. Absence of perfect linkage disequilibrium between different genes in the other morphs suggests that H is limited to nucleotide positions in the regulatory and coding regions of en. Our results therefore support the hypothesis that a single gene underlies wing pattern variation in P. dardanus. PMID:24920480

  9. Comparative genomics of the mimicry switch in Papilio dardanus.

    PubMed

    Timmermans, Martijn J T N; Baxter, Simon W; Clark, Rebecca; Heckel, David G; Vogel, Heiko; Collins, Steve; Papanicolaou, Alexie; Fukova, Iva; Joron, Mathieu; Thompson, Martin J; Jiggins, Chris D; ffrench-Constant, Richard H; Vogler, Alfried P

    2014-07-22

    The African Mocker Swallowtail, Papilio dardanus, is a textbook example in evolutionary genetics. Classical breeding experiments have shown that wing pattern variation in this polymorphic Batesian mimic is determined by the polyallelic H locus that controls a set of distinct mimetic phenotypes. Using bacterial artificial chromosome (BAC) sequencing, recombination analyses and comparative genomics, we show that H co-segregates with an interval of less than 500 kb that is collinear with two other Lepidoptera genomes and contains 24 genes, including the transcription factor genes engrailed (en) and invected (inv). H is located in a region of conserved gene order, which argues against any role for genomic translocations in the evolution of a hypothesized multi-gene mimicry locus. Natural populations of P. dardanus show significant associations of specific morphs with single nucleotide polymorphisms (SNPs), centred on en. In addition, SNP variation in the H region reveals evidence of non-neutral molecular evolution in the en gene alone. We find evidence for a duplication potentially driving physical constraints on recombination in the lamborni morph. Absence of perfect linkage disequilibrium between different genes in the other morphs suggests that H is limited to nucleotide positions in the regulatory and coding regions of en. Our results therefore support the hypothesis that a single gene underlies wing pattern variation in P. dardanus.

  10. Comparative analysis of essential genes in prokaryotic genomic islands

    PubMed Central

    Zhang, Xi; Peng, Chong; Zhang, Ge; Gao, Feng

    2015-01-01

    Essential genes are thought to encode proteins that carry out the basic functions to sustain a cellular life, and genomic islands (GIs) usually contain clusters of horizontally transferred genes. It has been assumed that essential genes are not likely to be located in GIs, but systematic analysis of essential genes in GIs has not been explored before. Here, we have analyzed the essential genes in 28 prokaryotes by a statistical method and reached the conclusion that essential genes in GIs are significantly fewer than those outside GIs. The function of 362 essential genes found in GIs has been explored further by BLAST against the Virulence Factor Database (VFDB) and the phage/prophage sequence database of PHAge Search Tool (PHAST). Consequently, 64 and 60 eligible essential genes are found to share sequence similarity with the virulence factors and phage/prophage-related genes, respectively. Meanwhile, we find several toxin-related proteins and repressors encoded by these essential genes in GIs. The comparative analysis of essential genes in genomic islands will not only shed new light on the development of the prediction algorithm of essential genes, but also give a clue to detect the functionality of essential genes in genomic islands. PMID:26223387

  11. Comparative genomics reveals mobile pathogenicity chromosomes in Fusarium

    SciTech Connect

    Ma, Li Jun; van der Does, H. C.; Borkovich, Katherine A.; Coleman, Jeffrey J.; Daboussi, Marie-Jose; Di Pietro, Antonio; Dufresne, Marie; Freitag, Michael; Grabherr, Manfred; Henrissat, Bernard; Houterman, Petra M.; Kang, Seogchan; Shim, Won-Bo; Wolochuk, Charles; Xie, Xiaohui; Xu, Jin Rong; Antoniw, John; Baker, Scott E.; Bluhm, Burton H.; Breakspear, Andrew; Brown, Daren W.; Butchko, Robert A.; Chapman, Sinead; Coulson, Richard; Coutinho, Pedro M.; Danchin, Etienne G.; Diener, Andrew; Gale, Liane R.; Gardiner, Donald; Goff, Steven; Hammond-Kossack, Kim; Hilburn, Karen; Hua-Van, Aurelie; Jonkers, Wilfried; Kazan, Kemal; Kodira, Chinnappa D.; Koehrsen, Michael; Kumar, Lokesh; Lee, Yong Hwan; Li, Liande; Manners, John M.; Miranda-Saavedra, Diego; Mukherjee, Mala; Park, Gyungsoon; Park, Jongsun; Park, Sook Young; Proctor, Robert H.; Regev, Aviv; Ruiz-Roldan, M. C.; Sain, Divya; Sakthikumar, Sharadha; Sykes, Sean; Schwartz, David C.; Turgeon, Barbara G.; Wapinski, Ilan; Yoder, Olen; Young, Sarah; Zeng, Qiandong; Zhou, Shiguo; Galagan, James; Cuomo, Christina A.; Kistler, H. Corby; Rep, Martijn

    2010-03-18

    Fusarium species are among the most important phytopathogenic and toxigenic fungi, having significant impact on crop production and animal health. Distinctively, members of the F. oxysporum species complex exhibit wide host range but discontinuously distributed host specificity, reflecting remarkable genetic adaptability. To understand the molecular underpinnings of diverse phenotypic traits and their evolution in Fusarium, we compared the genomes of three economically important and phylogenetically related, yet phenotypically diverse plant-pathogenic species, F. graminearum, F. verticillioides and F. oxysporum f. sp. lycopersici. Our analysis revealed greatly expanded lineage-specific (LS) genomic regions in F. oxysporum that include four entire chromosomes, accounting for more than one-quarter of the genome. LS regions are rich in transposons and genes with distinct evolutionary profiles but related to pathogenicity. Experimentally, we demonstrate for the first time the transfer of two LS chromosomes between strains of F. oxysporum, resulting in the conversion of a non-pathogenic strain into a pathogen. Transfer of LS chromosomes between otherwise genetically isolated strains explains the polyphyletic origin of host specificity and the emergence of new pathogenic lineages in the F. oxysporum species complex, putting the evolution of fungal pathogenicity into a new perspective.

  12. Comparative Genomics of the Listeria monocytogenes ST204 Subgroup

    PubMed Central

    Fox, Edward M.; Allnutt, Theodore; Bradbury, Mark I.; Fanning, Séamus; Chandry, P. Scott

    2016-01-01

    The ST204 subgroup of Listeria monocytogenes is among the most frequently isolated in Australia from a range of environmental niches. In this study we provide a comparative genomics analysis of food and food environment isolates from geographically diverse sources. Analysis of the ST204 genomes showed a highly conserved core genome with the majority of variation seen in mobile genetic elements such as plasmids, transposons and phage insertions. Most strains (13/15) harbored plasmids, which, although varying in size, contained highly conserved sequences. Interestingly, 4 isolates contained a conserved plasmid of 91,396 bp. The strains examined were isolated over a period of 12 years and from different geographic locations, suggesting plasmids are an important component of the genetic repertoire of this subgroup and may provide a range of stress tolerance mechanisms. In addition, 4 phage insertion sites and 2 transposons were identified among isolates, including a novel transposon. These genetic elements were highly conserved across isolates that harbored them, and also contained a range of genetic markers linked to stress tolerance and virulence. The maintenance of conserved mobile genetic elements in the ST204 population suggests these elements may contribute to the diverse range of niches colonized by ST204 isolates. Environmental stress selection may contribute to maintaining these genetic features, which in turn may be co-selecting for virulence markers relevant to clinical infection with ST204 isolates. PMID:28066377

  13. Detection of aneuploidy in single cells using comparative genomic hybridization.

    PubMed

    Voullaire, L; Wilton, L; Slater, H; Williamson, R

    1999-09-01

    The ability of comparative genomic hybridization (CGH) to detect aneuploidy following universal amplification of DNA from a single cell, or a small number of cells, was investigated with a view to preimplantation diagnosis following in vitro fertilization, and prenatal diagnosis using fetal erythroblasts obtained from maternal blood. The DNA obtained from lysed single cells was amplified using degenerate oligonucleotide-primed PCR (DOP-PCR). This product was labelled using nick translation and hybridized together with normal reference genomic DNA. The CGH fluorescent ratio profiles obtained could be used to determine aneuploidy with cut-off thresholds of 0.75 and 1.25. Deviation in the profiles in the heterochromatic regions was reduced by using, as a reference sample, normal genomic DNA that had also undergone DOP-PCR. Single cells known to be trisomic for chromosomes 13, 18 or 21 were analysed using this technique. The resolution of CGH with amplified DNA from a single cell is of the order of 40 Mb, sufficient for the diagnosis of trisomy 21, and possibly segmental aneuploidy of equivalent size. These results, and those of others, demonstrate that diagnosis of chromosomal aneuploidy in single cells is possible using CGH with DOP-PCR amplified DNA.
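
    The cut-off rule stated in this abstract can be illustrated with a minimal sketch. Only the two thresholds (0.75 and 1.25) come from the abstract; the function names, the labels, the treatment of values exactly at a cut-off, and the example ratios are hypothetical choices made for illustration and are not the authors' analysis software.

```python
# Illustrative sketch only: names, labels and example values are hypothetical.
# The two cut-offs are the ones quoted in the abstract.
LOSS_CUTOFF = 0.75
GAIN_CUTOFF = 1.25


def classify_cgh_ratio(ratio: float) -> str:
    """Label a test/reference fluorescence ratio as 'loss', 'gain' or 'balanced'.

    Assumption: values exactly equal to a cut-off are treated as balanced.
    """
    if ratio < LOSS_CUTOFF:
        return "loss"
    if ratio > GAIN_CUTOFF:
        return "gain"
    return "balanced"


def classify_profile(ratios: dict) -> dict:
    """Apply the cut-off rule to a mapping of chromosome name -> mean ratio."""
    return {chrom: classify_cgh_ratio(value) for chrom, value in ratios.items()}


if __name__ == "__main__":
    # Made-up ratios for demonstration; not data from the study.
    example = {"13": 1.02, "18": 0.98, "21": 1.45}
    print(classify_profile(example))
```

    In this sketch a chromosome whose mean ratio lies above the upper cut-off would be flagged as a gain, which is how a trisomy would appear under the rule the abstract describes.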

  14. The infectious BAC genomic DNA expression library: a high capacity vector system for functional genomics

    PubMed Central

    Lufino, Michele M. P.; Edser, Pauline A. H.; Quail, Michael A.; Rice, Stephen; Adams, David J.; Wade-Martins, Richard

    2016-01-01

    Gene dosage plays a critical role in a range of cellular phenotypes, yet most cellular expression systems use heterologous cDNA-based vectors which express proteins well above physiological levels. In contrast, genomic DNA expression vectors generate physiologically-relevant levels of gene expression by carrying the whole genomic DNA locus of a gene including its regulatory elements. Here we describe the first genomic DNA expression library generated using the high-capacity herpes simplex virus-1 amplicon technology to deliver bacterial artificial chromosomes (BACs) into cells by viral transduction. The infectious BAC (iBAC) library contains 184,320 clones with an average insert size of 134.5 kb. We show in a Chinese hamster ovary (CHO) disease model cell line and mouse embryonic stem (ES) cells that this library can be used for genetic rescue studies in a range of contexts including the physiological restoration of Ldlr deficiency, and viral receptor expression. The iBAC library represents an important new genetic analysis tool openly available to the research community. PMID:27353647

  15. Comparative genomic analysis of Chlamydia trachomatis oculotropic and genitotropic strains.

    PubMed

    Carlson, John H; Porcella, Stephen F; McClarty, Grant; Caldwell, Harlan D

    2005-10-01

    Chlamydia trachomatis infection is an important cause of preventable blindness and sexually transmitted disease (STD) in humans. C. trachomatis exists as multiple serovariants that exhibit distinct organotropism for the eye or urogenital tract. We previously reported tissue-tropic correlations with the presence or absence of a functional tryptophan synthase and a putative GTPase-inactivating domain of the chlamydial toxin gene. This suggested that these genes may be the primary factors responsible for chlamydial disease organotropism. To test this hypothesis, the genome of an oculotropic trachoma isolate (A/HAR-13) was sequenced and compared to the genome of a genitotropic (D/UW-3) isolate. Remarkably, the genomes share 99.6% identity, supporting the conclusion that a functional tryptophan synthase enzyme and toxin might be the principal virulence factors underlying disease organotropism. Tarp (translocated actin-recruiting phosphoprotein) was identified to have variable numbers of repeat units within the N and C portions of the protein. A correlation exists between lymphogranuloma venereum serovars and the number of N-terminal repeats. Single-nucleotide polymorphism (SNP) analysis between the two genomes highlighted the minimal genetic variation. A disproportionate number of SNPs were observed within some members of the polymorphic membrane protein (pmp) autotransporter gene family that corresponded to predicted T-cell epitopes that bind HLA class I and II alleles. These results implicate Pmps as novel immune targets, which could advance future chlamydial vaccine strategies. Lastly, a novel target for PCR diagnostics was discovered that can discriminate between ocular and genital strains. This discovery will enhance epidemiological investigations in nations where both trachoma and chlamydial STD are endemic.

  16. Reconstructing the Evolution of Brachypodium Genomes Using Comparative Chromosome Painting

    PubMed Central

    Betekhtin, Alexander; Jenkins, Glyn; Hasterok, Robert

    2014-01-01

    Brachypodium distachyon is a model for the temperate cereals and grasses and has a biology, genomics infrastructure and cytogenetic platform fit for purpose. It is a member of a genus with fewer than 20 species, which have different genome sizes, basic chromosome numbers and ploidy levels. The phylogeny and interspecific relationships of this group have not to date been resolved by sequence comparisons and karyotypical studies. The aims of this study are not only to reconstruct the evolution of Brachypodium karyotypes to resolve the phylogeny, but also to highlight the mechanisms that shape the evolution of grass genomes. This was achieved through the use of comparative chromosome painting (CCP) which hybridises fluorescent, chromosome-specific probes derived from B. distachyon to homoeologous meiotic chromosomes of its close relatives. The study included five diploids (B. distachyon 2n = 10, B. sylvaticum 2n = 18, B. pinnatum 2n = 16; 2n = 18, B. arbuscula 2n = 18 and B. stacei 2n = 20) three allotetraploids (B. pinnatum 2n = 28, B. phoenicoides 2n = 28 and B. hybridum 2n = 30), and two species of unknown ploidy (B. retusum 2n = 38 and B. mexicanum 2n = 40). On the basis of the patterns of hybridisation and incorporating published data, we propose two alternative, but similar, models of karyotype evolution in the genus Brachypodium. According to the first model, the extant genome of B. distachyon derives from B. mexicanum or B. stacei by several rounds of descending dysploidy, and the other diploids evolve from B. distachyon via ascending dysploidy. The allotetraploids arise by interspecific hybridisation and chromosome doubling between B. distachyon and other diploids. The second model differs from the first insofar as it incorporates an intermediate 2n = 18 species between the B. mexicanum or B. stacei progenitors and the dysploidic B. distachyon. PMID:25493646

  17. Comparative 3D genome structure analysis of the fission and the budding yeast.

    PubMed

    Gong, Ke; Tjong, Harianto; Zhou, Xianghong Jasmine; Alber, Frank

    2015-01-01

    We studied the 3D structural organization of the fission yeast genome, which emerges from the tethering of heterochromatic regions in otherwise randomly configured chromosomes represented as flexible polymer chains in a nuclear environment. This model is sufficient to explain in a statistical manner many experimentally determined distinctive features of the fission yeast genome, including chromatin interaction patterns from Hi-C experiments and the co-locations of functionally related and co-expressed genes, such as genes expressed by Pol-III. Our findings demonstrate that some previously described structure-function correlations can be explained as a consequence of random chromatin collisions driven by a few geometric constraints (mainly due to centromere-SPB and telomere-NE tethering) combined with the specific gene locations in the chromosome sequence. We also performed a comparative analysis between the fission and budding yeast genome structures, for which we previously detected a similar organizing principle. However, due to the different chromosome sizes and numbers, substantial differences are observed in the 3D structural genome organization between the two species, most notably in the nuclear locations of orthologous genes, and the extent of nuclear territories for genes and chromosomes. However, despite those differences, remarkably, functional similarities are maintained, which is evident when comparing spatial clustering of functionally related genes in both yeasts. Functionally related genes show a similar spatial clustering behavior in both yeasts, even though their nuclear locations are largely different between the yeast species.

  18. Comparative 3D Genome Structure Analysis of the Fission and the Budding Yeast

    PubMed Central

    Gong, Ke; Tjong, Harianto; Zhou, Xianghong Jasmine; Alber, Frank

    2015-01-01

    We studied the 3D structural organization of the fission yeast genome, which emerges from the tethering of heterochromatic regions in otherwise randomly configured chromosomes represented as flexible polymer chains in a nuclear environment. This model is sufficient to explain in a statistical manner many experimentally determined distinctive features of the fission yeast genome, including chromatin interaction patterns from Hi-C experiments and the co-locations of functionally related and co-expressed genes, such as genes expressed by Pol-III. Our findings demonstrate that some previously described structure-function correlations can be explained as a consequence of random chromatin collisions driven by a few geometric constraints (mainly due to centromere-SPB and telomere-NE tethering) combined with the specific gene locations in the chromosome sequence. We also performed a comparative analysis between the fission and budding yeast genome structures, for which we previously detected a similar organizing principle. However, due to the different chromosome sizes and numbers, substantial differences are observed in the 3D structural genome organization between the two species, most notably in the nuclear locations of orthologous genes, and the extent of nuclear territories for genes and chromosomes. Despite those differences, functional similarities are remarkably maintained, which is evident when comparing spatial clustering of functionally related genes in both yeasts. Functionally related genes show a similar spatial clustering behavior in both yeasts, even though their nuclear locations are largely different between the yeast species. PMID:25799503

  19. Whole genome comparative analysis of four Georgian grape cultivars.

    PubMed

    Tabidze, V; Pipia, I; Gogniashvili, M; Kunelauri, N; Ujmajuridze, L; Pirtskhalava, M; Vishnepolsky, B; Hernandez, A G; Fields, C J; Beridze, Tengiz

    2017-08-07

    Grapevine is one of the most important fruit species in the world. Comparative genome sequencing of grape cultivars is very important for the interpretation of the grape genome and understanding its evolution. The genomes of four Georgian grape cultivars (Chkhaveri, Saperavi, Meskhetian green, and Rkatsiteli), belonging to different haplogroups, were resequenced. The shotgun genomic libraries of grape cultivars were sequenced on an Illumina HiSeq. Pinot noir nuclear, mitochondrial, and chloroplast DNA were used as reference. Mitochondrial DNA of Chkhaveri closely matches that of the reference Pinot noir mitochondrial DNA, with the exception of 16 SNPs found in the Chkhaveri mitochondrial DNA. The number of SNPs in mitochondrial DNA from Saperavi, Meskhetian green, and Rkatsiteli was 764, 702, and 822, respectively. Nuclear DNA differs from the reference by 1,800,675 nt in Chkhaveri, 1,063,063 nt in Meskhetian green, 2,174,995 nt in Saperavi, and 5,011,513 nt in Rkatsiteli. Unlike its mtDNA, the chromosomal DNA of Pinot noir is closer to that of Meskhetian green than to that of the other cultivars. Substantial differences in the number of SNPs in mitochondrial and nuclear DNA of Chkhaveri and Pinot noir cultivars are explained by backcrossing or introgression of their wild predecessors before or during the process of domestication. Annotation of chromosomal DNA of Georgian grape cultivars by MEGANTE, a web-based annotation system, shows 66,745 predicted genes (Chkhaveri: 17,409; Saperavi: 17,021; Meskhetian green: 18,355; and Rkatsiteli: 13,960). Among them, 106 predicted genes and 43 pseudogenes of terpene synthase genes were found in chromosomes 12, 18 random (18R), and 19. Four novel TPS genes not present in reference Pinot noir DNA were detected. Two of them, germacrene A synthase (Chromosome 18R) and (-) germacrene D synthase (Chromosome 19), can be identified as putatively full-length proteins. This work performs the first attempt of the comparative whole genome analysis of different haplogroups

  20. Genomic Sequencing of Orientia tsutsugamushi Strain Karp, an Assembly Comparable to the Genome Size of the Strain Ikeda

    PubMed Central

    Liao, Hsiao-Mei; Chao, Chien-Chung; Lei, Haiyan; Li, Bingjie; Tsai, Shien; Hung, Guo-Chiuan

    2016-01-01

    Orientia tsutsugamushi, an intracellular bacterium, belongs to the family Rickettsiaceae. This study presents the draft genome sequence of strain Karp, with 2.0 Mb as the size of the completed genome. This nearly finished draft genome sequence was annotated with the RAST server and the contents compared to those of the other strains. PMID:27540052

  1. Complete chloroplast genome sequence of Elodea canadensis and comparative analyses with other monocot plastid genomes.

    PubMed

    Huotari, Tea; Korpelainen, Helena

    2012-10-15

    Elodea canadensis is an aquatic angiosperm native to North America. It has attracted great attention due to its invasive nature when transported to new areas in its non-native range. We have determined the complete nucleotide sequence of the chloroplast (cp) genome of Elodea. Taxonomically Elodea is a basal monocot, and only a few monocot cp genomes representing early lineages of monocots have been sequenced so far. The genome is a circular double-stranded DNA molecule 156,700 bp in length, and has a typical structure with large (LSC 86,194 bp) and small (SSC 17,810 bp) single-copy regions separated by a pair of inverted repeats (IRs 26,348 bp each). The Elodea cp genome contains 113 unique genes and 16 duplicated genes in the IR regions. A comparative analysis showed that the gene order and organization of the Elodea cp genome is almost identical to that of Amborella trichopoda, a basal angiosperm. The structure of IRs in Elodea is unique among monocot species with the whole cp genome sequenced. In Elodea and another monocot, Lemna minor, the borders between IRs and LSC are located upstream of the rps19 gene and downstream of the trnH-GUG gene, while in most monocots, the IR has extended to include both the trnH and rps19 genes. A phylogenetic analysis conducted using a Bayesian method, based on the DNA sequences of 81 chloroplast genes from 17 monocot taxa, provided support for the placement of Elodea together with Lemna as a basal monocot and the next diverging lineage of monocots after Acorales. In comparison with other monocots, the Elodea cp genome has gone through only a few rearrangements or gene losses. The IR of Elodea has a unique structure among the monocot species studied so far, as its structure is similar to that of a basal angiosperm, Amborella. This result, together with phylogenetic analyses, supports the placement of Elodea as a basal monocot and the next diverging lineage of monocots after Acorales. So far, only a few cp genomes representing early lineages of monocots have been

  2. Characterization of copy number variation in genomic regions containing STR loci using array comparative genomic hybridization.

    PubMed

    Repnikova, Elena A; Rosenfeld, Jill A; Bailes, Andrea; Weber, Cecilia; Erdman, Linda; McKinney, Aimee; Ramsey, Sarah; Hashimoto, Sayaka; Lamb Thrush, Devon; Astbury, Caroline; Reshmi, Shalini C; Shaffer, Lisa G; Gastier-Foster, Julie M; Pyatt, Robert E

    2013-09-01

    Short tandem repeat (STR) loci are commonly used in forensic casework, familial analysis for human identification, and for monitoring hematopoietic cell engraftment after bone marrow transplant. Unexpected genetic variation leading to sequence and length differences in STR loci can complicate STR typing, and presents challenges in casework interpretation. Copy number variation (CNV) is a relatively recently identified form of genetic variation consisting of genomic regions present at variable copy numbers within an individual compared to a reference genome. Large scale population studies have demonstrated that likely all individuals carry multiple regions with CNV of 1kb in size or greater in their genome. To date, no study correlating genomic regions containing STR loci with CNV has been conducted. In this study, we analyzed results from 32,850 samples sent for clinical array comparative genomic hybridization (CGH) analysis for the presence of CNV at regions containing the 13 CODIS (Combined DNA Index System) STR, and the Amelogenin X (AMELX) and Amelogenin Y (AMELY) loci. Thirty-two individuals with CNV involving STR loci on chromosomes 2, 4, 7, 11, 12, 13, 16, and 21, and twelve with CNV involving the AMELX/AMELY loci were identified. These results were correlated with data from publicly available databases housing information on CNV identified in normal populations and additional clinical cases. These collective results demonstrate the presence of CNV in regions containing 9 of the 13 CODIS STR and AMELX/Y loci. Further characterization of STR profiles within regions of CNV, additional cataloging of these variants in multiple populations, and contributing such examples to the public domain will provide valuable information for reliable use of these loci.

  3. Comparative analysis of period genes in teleost fish genomes.

    PubMed

    Wang, Han

    2008-07-01

    Period (Per) is a canonical circadian clock gene. The fruit fly, an invertebrate, has one per gene, while the human, a tetrapod vertebrate, has three Per genes. Per1, Per2, and Per3 of the tetrapods were generated from two rounds of ancient genome duplications from the ancestral chordate Per gene. Searching five teleost fish genomes with a combination of phylogenetic, splicing site, and syntenic analyses revealed that zebrafish have two per1 genes, per1a and per1b, one per2, and one per3; medaka, fugu, and tetraodon each have two per2 genes, per2a and per2b, one per1, and one per3; sticklebacks also have per2a, per2b, and one per1 but lack per3; and per1a/per1b in zebrafish and per2a/per2b in medaka, fugu, tetraodon, and stickleback are ancient duplicates. While the dN/dS ratios of the five fish per duplicates are all <1, suggesting that they likely have been subject to purifying selection, the Tajima relative rate test showed that zebrafish per1a/per1b and fugu and medaka per2a/per2b have asymmetric evolutionary rates, implying that one of these duplicates might have been under positive selection or relaxed functional constraint. Further, in situ hybridization showed that zebrafish per1a and per1b clearly have distinct patterns of temporal and spatial expression. These results support the notion that extra copies of teleost per genes were generated from the fish-specific genome duplication, and divergent resolution after the duplication resulted in retention of different per duplicates in different fish, most of which have diverged significantly.

  4. Comparative genomic hybridisation identifies two variants of smoldering multiple myeloma.

    PubMed

    Rosiñol, Laura; Carrió, Ana; Bladé, Joan; Queralt, Rosa; Aymerich, Marta; Cibeira, Ma Teresa; Esteve, Jordi; Rozman, Maria; Campo, Elías; Montserrat, Emili

    2005-09-01

    Two variants of smoldering multiple myeloma (SMM) have been recognised: (i) an evolving type, characterised by a progressive increase in the M-protein size and short time to progression to overt multiple myeloma (MM) and (ii) a non-evolving type, with a long-lasting, stable M-protein and longer time to progression. Comparative genomic hybridisation (CGH) analyses in both subtypes of SMM (seven evolving and eight non-evolving SMM) were performed. Evolving SMM showed cytogenetic changes consistent with those found in de novo symptomatic MM (1q gains, chromosome 13 deletions) while the non-evolving variant showed no 1q gains and deletions were uncommon.

  5. Comparative genomics of Blattabacterium cuenoti: the frozen legacy of an ancient endosymbiont genome.

    PubMed

    Patiño-Navarrete, Rafael; Moya, Andrés; Latorre, Amparo; Peretó, Juli

    2013-01-01

    Many insect species have established long-term symbiotic relationships with intracellular bacteria. Symbiosis with bacteria has provided insects with novel ecological capabilities, which have allowed them to colonize previously unexplored niches. Despite its importance to the understanding of the emergence of biological complexity, the evolution of symbiotic relationships remains hitherto a mystery in evolutionary biology. In this study, we contribute to the investigation of the evolutionary leaps enabled by mutualistic symbioses by sequencing the genome of Blattabacterium cuenoti, primary endosymbiont of the omnivorous cockroach Blatta orientalis, and one of the most ancient symbiotic associations. We perform comparative analyses between the Blattabacterium cuenoti genome and that of previously sequenced endosymbionts, namely those from the omnivorous hosts Blattella germanica (Blattelidae) and Periplaneta americana (Blattidae), and the endosymbionts harbored by two wood-feeding hosts, the subsocial cockroach Cryptocercus punctulatus (Cryptocercidae) and the termite Mastotermes darwiniensis (Termitidae). Our study shows a remarkable evolutionary stasis of this symbiotic system throughout the evolutionary history of cockroaches and the deepest branching termite M. darwiniensis, in terms of not only chromosome architecture but also gene content, as revealed by the striking conservation of the Blattabacterium core genome. Importantly, the architecture of the central metabolic network inferred from the endosymbiont genomes was established very early in Blattabacterium evolutionary history and could be an outcome of the essential role played by this endosymbiont in the host's nitrogen economy.

  6. Comparative Genomics of Blattabacterium cuenoti: The Frozen Legacy of an Ancient Endosymbiont Genome

    PubMed Central

    Patiño-Navarrete, Rafael; Moya, Andrés; Latorre, Amparo; Peretó, Juli

    2013-01-01

    Many insect species have established long-term symbiotic relationships with intracellular bacteria. Symbiosis with bacteria has provided insects with novel ecological capabilities, which have allowed them to colonize previously unexplored niches. Despite its importance to the understanding of the emergence of biological complexity, the evolution of symbiotic relationships remains hitherto a mystery in evolutionary biology. In this study, we contribute to the investigation of the evolutionary leaps enabled by mutualistic symbioses by sequencing the genome of Blattabacterium cuenoti, primary endosymbiont of the omnivorous cockroach Blatta orientalis, and one of the most ancient symbiotic associations. We perform comparative analyses between the Blattabacterium cuenoti genome and that of previously sequenced endosymbionts, namely those from the omnivorous hosts Blattella germanica (Blattelidae) and Periplaneta americana (Blattidae), and the endosymbionts harbored by two wood-feeding hosts, the subsocial cockroach Cryptocercus punctulatus (Cryptocercidae) and the termite Mastotermes darwiniensis (Termitidae). Our study shows a remarkable evolutionary stasis of this symbiotic system throughout the evolutionary history of cockroaches and the deepest branching termite M. darwiniensis, in terms of not only chromosome architecture but also gene content, as revealed by the striking conservation of the Blattabacterium core genome. Importantly, the architecture of the central metabolic network inferred from the endosymbiont genomes was established very early in Blattabacterium evolutionary history and could be an outcome of the essential role played by this endosymbiont in the host’s nitrogen economy. PMID:23355305

  7. Complete Genome Sequence and Comparative Genomics of Shigella flexneri Serotype 2a Strain 2457T

    PubMed Central

    Wei, J.; Goldberg, M. B.; Burland, V.; Venkatesan, M. M.; Deng, W.; Fournier, G.; Mayhew, G. F.; Plunkett, G.; Rose, D. J.; Darling, A.; Mau, B.; Perna, N. T.; Payne, S. M.; Runyen-Janecky, L. J.; Zhou, S.; Schwartz, D. C.; Blattner, F. R.

    2003-01-01

    We determined the complete genome sequence of Shigella flexneri serotype 2a strain 2457T (4,599,354 bp). Shigella species cause >1 million deaths per year from dysentery and diarrhea and have a lifestyle that is markedly different from those of closely related bacteria, including Escherichia coli. The genome exhibits the backbone and island mosaic structure of E. coli pathogens, albeit with much less horizontally transferred DNA and lacking 357 genes present in E. coli. The strain is distinctive in its large complement of insertion sequences, with several genomic rearrangements mediated by insertion sequences, 12 cryptic prophages, 372 pseudogenes, and 195 S. flexneri-specific genes. The 2457T genome was also compared with that of a recently sequenced S. flexneri 2a strain, 301. Our data are consistent with Shigella being phylogenetically indistinguishable from E. coli. The S. flexneri-specific regions contain many genes that could encode proteins with roles in virulence. Analysis of these will reveal the genetic basis for aspects of this pathogenic organism's distinctive lifestyle that have yet to be explained. PMID:12704152

  8. Mapping Drosophila genomic aberration breakpoints with comparative genome hybridization on microarrays.

    PubMed

    Erickson, Jeremy N; Spana, Eric P

    2006-01-01

    Chromosomal aberrations are genetic "reagents" that are commonly used in Drosophila research. Stocks containing chromosomes carrying large deletions of DNA (deficiency stocks, designated Df) as well as stocks carrying an extra copy of a chromosomal region (duplication stocks, designated Dp) are essential for a variety of genetic analyses. The extent of what is deleted or duplicated has typically been determined cytologically by salivary gland polytene chromosome squashes, which identify the edges of the aberration (so-called breakpoints) of each Df or Dp at low resolution. The margin of error for this technique can be quite high, however, because it is dependent on the quality of the squash and the experience of the scientist interpreting the data. Comparative genome hybridization on microarrays provides a precise molecular method to identify which regions of the genome are deleted or duplicated in these stocks by examining a change in chromosomal ploidy across the whole genome. Furthermore, this technique allows genetic data obtained with these strains to be placed in a molecular genomic context.

  9. A comparative genomics approach to identifying the plasticity transcriptome

    PubMed Central

    Pfenning, Andreas R; Schwartz, Russell; Barth, Alison L

    2007-01-01

    Background: Neuronal activity regulates gene expression to control learning and memory, homeostasis of neuronal function, and pathological disease states such as epilepsy. A great deal of experimental evidence supports the involvement of two particular transcription factors in shaping the genomic response to neuronal activity and mediating plasticity: CREB and zif268 (egr-1, krox24, NGFI-A). The gene targets of these two transcription factors are of considerable interest, since they may help develop hypotheses about how neural activity is coupled to changes in neural function. Results: We have developed a computational approach for identifying binding sites for these transcription factors within the promoter regions of annotated genes in the mouse, rat, and human genomes. By combining a robust search algorithm to identify discrete binding sites, a comparison of targets across species, and an analysis of binding site locations within promoter regions, we have defined a group of candidate genes that are strong CREB- or zif268 targets and are thus regulated by neural activity. Our analysis revealed that CREB and zif268 share a disproportionate number of targets in common and that these common targets are dominated by transcription factors. Conclusion: These observations may enable a more detailed understanding of the regulatory networks that are induced by neural activity and contribute to the plasticity transcriptome. The target genes identified in this study will be a valuable resource for investigators who hope to define the functions of specific genes that underlie activity-dependent changes in neuronal properties. PMID:17355637

  10. Marine invertebrate lipases: Comparative and functional genomic analysis.

    PubMed

    Rivera-Perez, Crisalejandra

    2015-09-01

    Lipases are key enzymes involved in lipid digestion, storage and mobilization of reserves during fasting or heightened metabolic demand. This is a highly conserved process, essential for survival. The genomes of five marine invertebrate species with distinctive digestive systems were screened for the six major lipase families. The two most common families in marine invertebrates, the neutral and acid lipases, are also the main families in mammals and insects. The number of lipases varies two-fold across analyzed genomes. A high degree of orthology with mammalian lipases was observed. Interestingly, 19% of the marine invertebrate lipases have lost motifs required for catalysis. Analysis of the lid and loop regions of the neutral lipases suggests that many marine invertebrates have a functional triacylglycerol hydrolytic activity as well as some acid lipases. A revision of the expression profiles and functional activity on sequences in databases and scientific literature provided information regarding the function of these families of enzymes in marine invertebrates.

  11. High-Resolution Comparative Genomic Hybridization of Inflammatory Breast Cancer and Identification of Candidate Genes

    PubMed Central

    Adelaïde, José; Ferrari, Anthony; Tarpin, Carole; Charafe-Jauffret, Emmanuelle; Charpin, Colette; Houvenaeghel, Gilles; Jacquemier, Jocelyne; Bidaut, Ghislain; Birnbaum, Daniel; Viens, Patrice; Chaffanet, Max; Bertucci, François

    2011-01-01

    Background: Inflammatory breast cancer (IBC) is an aggressive form of BC poorly defined at the molecular level. We compared the molecular portraits of 63 IBC and 134 non-IBC (nIBC) clinical samples. Methodology/Findings: Genomic imbalances of 49 IBCs and 124 nIBCs were determined using high-resolution array-comparative genomic hybridization, and mRNA expression profiles of 197 samples using whole-genome microarrays. Genomic profiles of IBCs were as heterogeneous as those of nIBCs, and globally relatively close. However, IBCs showed more frequent “complex” patterns and a higher percentage of genes with CNAs per sample. The number of altered regions was similar in both types, although some regions were altered more frequently and/or with higher amplitude in IBCs. Many genes were similarly altered in both types; however, more genes displayed recurrent amplifications in IBCs. The percentage of genes whose mRNA expression correlated with CNAs was similar in both types for the gained genes, but ∼7-fold lower in IBCs for the lost genes. Integrated analysis identified 24 potential candidate IBC-specific genes. Their combined expression accurately distinguished IBCs and nIBCs in an independent validation set, and retained an independent prognostic value in a series of 1,781 nIBCs, reinforcing the hypothesis for a link with IBC aggressiveness. Consistent with the hyperproliferative and invasive phenotype of IBC, these genes are notably involved in protein translation, cell cycle, RNA processing and transcription, metabolism, and cell migration. Conclusions: Our results suggest a higher genomic instability of IBC. We established the first repertory of DNA copy number alterations in this tumor, and provided a list of genes that may contribute to its aggressiveness and represent novel therapeutic targets. PMID:21339811

  12. Comparative genomic reconstruction of transcriptional networks controlling central metabolism in the Shewanella genus

    SciTech Connect

    Rodionov, Dmitry A.; Novichkov, Pavel; Stavrovskaya, Elena D.; Rodionova, Irina A.; Li, Xiaoqing; Kazanov, Marat D.; Ravcheev, Dmitry A.; Gerasimova, Anna V.; Kazakov, Alexey E.; Kovaleva, Galina Y.; Permina, Elizabeth A.; Laikova, Olga N.; Overbeek, Ross; Romine, Margaret F.; Fredrickson, Jim K.; Arkin, Adam P.; Dubchak, Inna; Osterman, Andrei L.; Gelfand, Mikhail S.

    2011-06-15

    Genome-scale prediction of gene regulation and reconstruction of transcriptional regulatory networks in bacteria is one of the critical tasks of modern genomics. Despite the growing number of genome-scale gene expression studies, our abilities to convert the results of these studies into accurate regulatory annotations and to project them from model to other organisms are extremely limited. The comparative genomics approaches and computational identification of regulatory sites are useful for the in silico reconstruction of transcriptional regulatory networks in bacteria. The Shewanella genus is comprised of metabolically versatile gamma-proteobacteria, whose lifestyles and natural environments are substantially different from Escherichia coli and other model bacterial species. To explore conservation and variations in the Shewanella transcriptional networks we analyzed the repertoire of transcription factors and performed genomics-based reconstruction and comparative analysis of regulons in 16 Shewanella genomes. The inferred regulatory network includes 82 transcription factors and their DNA binding sites, 8 riboswitches and 6 translational attenuators. Forty-five regulons were newly inferred from the genome context analysis, whereas others were propagated from previously characterized regulons in the Enterobacteria and Pseudomonas spp. However, even orthologous regulators with conserved DNA-binding motifs may control substantially different gene sets, revealing striking differences in regulatory strategies between the Shewanella spp. and E. coli. Multiple examples of regulatory network rewiring include regulon contraction and expansion (as in the case of PdhR, HexR, FadR), and numerous cases of recruiting non-orthologous regulators to control equivalent pathways (e.g. NagR for N-acetylglucosamine catabolism and PsrA for fatty acid degradation) and, conversely, orthologous regulators to control distinct pathways (e.g. TyrR, ArgR, Crp).

  13. High resolution comparative genomic hybridisation in clinical cytogenetics

    PubMed Central

    Kirchhoff, M.; Rose, H.; Lundsteen, C.

    2001-01-01

    High resolution comparative genomic hybridisation (HR-CGH) is a diagnostic tool in our clinical cytogenetics laboratory. The present survey reports the results of 253 clinical cases in which 47 abnormalities were detected. Among 144 dysmorphic and mentally retarded subjects with a normal conventional karyotype, 15 (10%) had small deletions or duplications, of which 11 were interstitial. In addition, a case of mosaic trisomy 9 was detected. Among 25 dysmorphic and mentally retarded subjects carrying apparently balanced de novo translocations, four had deletions at translocation breakpoints and two had deletions elsewhere in the genome. Seventeen of 19 complex rearrangements were clarified by HR-CGH. A small supernumerary marker chromosome occurring with low frequency and the breakpoint of a mosaic r(18) case could not be clarified. Three of 19 other abnormalities could not be confirmed by HR-CGH. One was a Williams syndrome deletion and two were DiGeorge syndrome deletions, which were apparently below the resolution of HR-CGH. However, we were able to confirm Angelman and Prader-Willi syndrome deletions, which are about 3-5 Mb. We conclude that HR-CGH should be used for the evaluation of (1) dysmorphic and mentally retarded subjects where normal karyotyping has failed to show abnormalities, (2) dysmorphic and mentally retarded subjects carrying apparently balanced de novo translocations, (3) apparently balanced de novo translocations detected prenatally, and (4) for clarification of complex structural rearrangements.


    Keywords: comparative genomic hybridisation; chromosome analysis; chromosome aberrations; dysmorphism PMID:11694545

  14. Canine urothelial carcinoma: genomically aberrant and comparatively relevant

    PubMed Central

    Shapiro, S. G.; Raghunath, S.; Williams, C.; Motsinger-Reif, A. A.; Cullen, J. M.; Liu, T.; Albertson, D.; Ruvolo, M.; Lucas, A. Bergstrom; Jin, J.; Knapp, D. W.; Schiffman, J. D.

    2015-01-01

    Urothelial carcinoma (UC), also referred to as transitional cell carcinoma (TCC), is the most common bladder malignancy in both human and canine populations. In human UC, numerous studies have demonstrated the prevalence of chromosomal imbalances. Although the histopathology of the disease is similar in both species, studies evaluating the genomic profile of canine UC are lacking, limiting the discovery of key comparative molecular markers associated with driving UC pathogenesis. In the present study, we evaluated 31 primary canine UC biopsies by oligonucleotide array comparative genomic hybridization (oaCGH). Results highlighted the presence of three highly recurrent numerical aberrations: gain of dog chromosome (CFA) 13 and 36 and loss of CFA 19. Regional gains of CFA 13 and 36 were present in 97% and 84% of cases, respectively, and losses on CFA 19 were present in 77% of cases. Fluorescence in situ hybridization (FISH), using targeted bacterial artificial chromosome (BAC) clones and custom Agilent SureFISH probes, was performed to detect and quantify these regions in paraffin-embedded biopsy sections and urine-derived urothelial cells. The data indicate that these three aberrations are potentially diagnostic of UC. Comparison of our canine oaCGH data with that of 285 human cases identified a series of shared copy number aberrations. Using an informatics approach to interrogate the frequency of copy number aberrations across both species, we identified those that had the highest joint probability of association with UC. The most significant joint region contained the gene PABPC1, which should be considered further for its role in UC progression. In addition, cross-species filtering of genome-wide copy number data highlighted several genes as high-profile candidates for further analysis, including CDKN2A, S100A8/9, and LRP1B. We propose that these common aberrations are indicative of an evolutionarily conserved mechanism of pathogenesis and harbor genes key to

  15. Comparative genomic characterization of citrus-associated Xylella fastidiosa strains

    PubMed Central

    da Silva, Vivian S; Shida, Cláudio S; Rodrigues, Fabiana B; Ribeiro, Diógenes CD; de Souza, Alessandra A; Coletta-Filho, Helvécio D; Machado, Marcos A; Nunes, Luiz R; de Oliveira, Regina Costa

    2007-01-01

    Background: The xylem-inhabiting bacterium Xylella fastidiosa (Xf) is the causal agent of Pierce's disease (PD) in vineyards and citrus variegated chlorosis (CVC) in orange trees. Both of these economically-devastating diseases are caused by distinct strains of this complex group of microorganisms, which has motivated researchers to conduct extensive genomic sequencing projects with Xf strains. This sequence information, along with other molecular tools, has been used to estimate the evolutionary history of the group and provide clues to understand the capacity of Xf to infect different hosts, causing a variety of symptoms. Nonetheless, although significant amounts of information have been generated from Xf strains, a large proportion of these efforts has concentrated on the study of North American strains, limiting our understanding about the genomic composition of South American strains, which is particularly important for CVC-associated strains. Results: This paper describes the first genome-wide comparison among South American Xf strains, involving 6 distinct citrus-associated bacteria. Comparative analyses performed through a microarray-based approach allowed identification and characterization of large mobile genetic elements that seem to be exclusive to South American strains. Moreover, a large-scale sequencing effort, based on Suppressive Subtraction Hybridization (SSH), identified 290 new ORFs, distributed in 135 Groups of Orthologous Elements, throughout the genomes of these bacteria. Conclusion: Results from microarray-based comparisons provide further evidence concerning activity of horizontally transferred elements, reinforcing their importance as major mediators in the evolution of Xf. Moreover, the microarray-based genomic profiles showed similarity between Xf strains 9a5c and Fb7, which is unexpected, given the geographical and chronological differences associated with the isolation of these microorganisms. The newly identified ORFs, obtained by

  16. Genome-wide patterns of Arabidopsis gene expression in nature.

    PubMed

    Richards, Christina L; Rosas, Ulises; Banta, Joshua; Bhambhra, Naeha; Purugganan, Michael D

    2012-01-01

    Organisms in the wild are subject to multiple, fluctuating environmental factors, and it is in complex natural environments that genetic regulatory networks actually function and evolve. We assessed genome-wide gene expression patterns in the wild in two natural accessions of the model plant Arabidopsis thaliana and examined the nature of transcriptional variation throughout its life cycle and gene expression correlations with natural environmental fluctuations. We grew plants in a natural field environment and measured genome-wide time-series gene expression from the plant shoot every three days, spanning the seedling to reproductive stages. We find that 15,352 genes were expressed in the A. thaliana shoot in the field, and accession and flowering status (vegetative versus flowering) were strong components of transcriptional variation in this plant. We identified between ∼110 and 190 time-varying gene expression clusters in the field, many of which were significantly overrepresented by genes regulated by abiotic and biotic environmental stresses. The two main principal components of vegetative shoot gene expression (PC(veg)) correlate to temperature and precipitation occurrence in the field. The largest PC(veg) axes included thermoregulatory genes while the second major PC(veg) was associated with precipitation and contained drought-responsive genes. By exposing A. thaliana to natural environments in an open field, we provide a framework for further understanding the genetic networks that are deployed in natural environments, and we connect plant molecular genetics in the laboratory to plant organismal ecology in the wild.

  17. Comparative genomic analysis of the swine pathogen Bordetella bronchiseptica strain KM22.

    PubMed

    Nicholson, Tracy L; Shore, Sarah M; Register, Karen B; Bayles, Darrell O; Kingsley, Robert A; Brunelle, Brain W

    2016-01-01

    The well-characterized Bordetella bronchiseptica strain KM22, originally isolated from a pig with atrophic rhinitis, has been used to develop a reproducible swine respiratory disease model. The goal of this study was to identify genetic features unique to KM22 by comparing the genome sequence of KM22 to the laboratory reference strain RB50. To gain a broader perspective of the genetic relationship of KM22 among other B. bronchiseptica strains, selected genes of KM22 were then compared to five other B. bronchiseptica strains isolated from different hosts. Overall, the KM22 genome sequence is more similar to the genome sequences of the strains isolated from animals than the strains isolated from humans. The majority of virulence gene expression in Bordetella is positively regulated by the two-component sensory transduction system BvgAS. bopN, bvgA, fimB, and fimC were the most highly conserved BvgAS-regulated genes present in all seven strains analyzed. In contrast, the BvgAS-regulated genes present in all seven strains with the highest sequence divergence were fimN, fim2, fhaL, and fhaS. A total of eight major fimbrial subunit genes were identified in KM22. Quantitative real-time PCR data demonstrated that seven of the eight fimbrial subunit genes identified in KM22 are expressed and regulated by BvgAS. The annotation of the KM22 genome sequence, coupled with the comparative genomic analyses reported in this study, can be used to facilitate the development of vaccines with improved efficacy towards B. bronchiseptica in swine to decrease the prevalence and disease burden caused by this pathogen.

  18. Comparative genomics of mitochondria in chlorarachniophyte algae: endosymbiotic gene transfer and organellar genome dynamics

    NASA Astrophysics Data System (ADS)

    Tanifuji, Goro; Archibald, John M.; Hashimoto, Tetsuo

    2016-02-01

    Chlorarachniophyte algae possess four DNA-containing compartments per cell, the nucleus, mitochondrion, plastid and nucleomorph, the latter being a relic nucleus derived from a secondary endosymbiont. While the evolutionary dynamics of plastid and nucleomorph genomes have been investigated, a comparative investigation of mitochondrial genomes (mtDNAs) has not been carried out. We have sequenced the complete mtDNA of Lotharella oceanica and compared it to that of another chlorarachniophyte, Bigelowiella natans. The linear mtDNA of L. oceanica is 36.7 kbp in size and contains 35 protein genes, three rRNAs and 24 tRNAs. The codons GUG and UUG appear to be capable of acting as initiation codons in the chlorarachniophyte mtDNAs, in addition to AUG. Rpl16, rps4 and atp8 genes are missing in L. oceanica mtDNA, despite being present in B. natans mtDNA. We searched for, and found, mitochondrial rpl16 and rps4 genes with spliceosomal introns in the L. oceanica nuclear genome, indicating that mitochondrion-to-host-nucleus gene transfer occurred after the divergence of these two genera. Despite being of similar size and coding capacity, the level of synteny between L. oceanica and B. natans mtDNA is low, suggesting frequent rearrangements. Overall, our results suggest that chlorarachniophyte mtDNAs are more evolutionarily dynamic than their plastid counterparts.

  19. Comparative genomics of mitochondria in chlorarachniophyte algae: endosymbiotic gene transfer and organellar genome dynamics.

    PubMed

    Tanifuji, Goro; Archibald, John M; Hashimoto, Tetsuo

    2016-02-18

    Chlorarachniophyte algae possess four DNA-containing compartments per cell, the nucleus, mitochondrion, plastid and nucleomorph, the latter being a relic nucleus derived from a secondary endosymbiont. While the evolutionary dynamics of plastid and nucleomorph genomes have been investigated, a comparative investigation of mitochondrial genomes (mtDNAs) has not been carried out. We have sequenced the complete mtDNA of Lotharella oceanica and compared it to that of another chlorarachniophyte, Bigelowiella natans. The linear mtDNA of L. oceanica is 36.7 kbp in size and contains 35 protein genes, three rRNAs and 24 tRNAs. The codons GUG and UUG appear to be capable of acting as initiation codons in the chlorarachniophyte mtDNAs, in addition to AUG. Rpl16, rps4 and atp8 genes are missing in L. oceanica mtDNA, despite being present in B. natans mtDNA. We searched for, and found, mitochondrial rpl16 and rps4 genes with spliceosomal introns in the L. oceanica nuclear genome, indicating that mitochondrion-to-host-nucleus gene transfer occurred after the divergence of these two genera. Despite being of similar size and coding capacity, the level of synteny between L. oceanica and B. natans mtDNA is low, suggesting frequent rearrangements. Overall, our results suggest that chlorarachniophyte mtDNAs are more evolutionarily dynamic than their plastid counterparts.

  20. Comparative Analysis of Chloroplast Genomes: Functional Annotation, Genome-Based Phylogeny, and Deduced Evolutionary Patterns

    PubMed Central

    Rivas, Javier De Las; Lozano, Juan Jose; Ortiz, Angel R.

    2002-01-01

    All protein sequences from 19 complete chloroplast genomes (cpDNA) have been studied using a new computational method able to analyze functional correlations among series of protein sequences contained in complete proteomes. First, all open reading frames (ORFs) from the cpDNAs, comprising a total of 2266 protein sequences, were compared against the 3168 proteins from Synechocystis PCC6803 complete genome to find functionally related orthologous proteins. Additionally, all cpDNA genomes were pairwise compared to find orthologous groups not present in cyanobacteria. Annotations in the cluster of orthologous proteins database and CyanoBase were used as reference for the functional assignments. Following this protocol, new functional assignments were made for ORFs of unknown function and for ycfs (hypothetical chloroplast frames), which still lack a functional assignment. Using this information, a matrix of functional relationships was derived from profiles of the presence and/or absence of orthologous proteins; the matrix included 1837 proteins in 277 orthologous clusters. A factor analysis study of this matrix, followed by cluster analysis, allowed us to obtain accurate phylogenetic reconstructions and the detection of genes probably involved in speciation as phylogenetic correlates. Finally, by grouping common evolutionary patterns, we show that it is possible to determine functionally linked protein networks. This has allowed us to suggest putative associations for some unknown ORFs. PMID:11932241

  1. The SOL Genomics Network. A Comparative Resource for Solanaceae Biology and Beyond

    PubMed Central

    Mueller, Lukas A.; Solow, Teri H.; Taylor, Nicolas; Skwarecki, Beth; Buels, Robert; Binns, John; Lin, Chenwei; Wright, Mark H.; Ahrens, Robert; Wang, Ying; Herbst, Evan V.; Keyder, Emil R.; Menda, Naama; Zamir, Dani; Tanksley, Steven D.

    2005-01-01

    The SOL Genomics Network (SGN; http://sgn.cornell.edu) is a rapidly evolving comparative resource for the plants of the Solanaceae family, which includes important crop and model plants such as potato (Solanum tuberosum), eggplant (Solanum melongena), pepper (Capsicum annuum), and tomato (Solanum lycopersicum). The aim of SGN is to relate these species to one another using a comparative genomics approach and to tie them to the other dicots through the fully sequenced genome of Arabidopsis (Arabidopsis thaliana). SGN currently houses map and marker data for Solanaceae species, a large expressed sequence tag collection with computationally derived unigene sets, an extensive database of phenotypic information for a mutagenized tomato population, and associated tools such as real-time quantitative trait loci. Recently, the International Solanaceae Project (SOL) was formed as an umbrella organization for Solanaceae research in over 30 countries to address important questions in plant biology. The first cornerstone of the SOL project is the sequencing of the entire euchromatic portion of the tomato genome. SGN is collaborating with other bioinformatics centers in building the bioinformatics infrastructure for the tomato sequencing project and implementing the bioinformatics strategy of the larger SOL project. The overarching goal of SGN is to make information available in an intuitive comparative format, thereby facilitating a systems approach to investigations into the basis of adaptation and phenotypic diversity in the Solanaceae family, other species in the Asterid clade such as coffee (Coffea arabica), Rubiaceae, and beyond. PMID:16010005

  2. Sources for Comparative Studies of Placentation. II. Genomic Resources

    PubMed Central

    Wildman, Derek E.

    2008-01-01

    The genomes of dozens of placental mammal species are now publicly available. These genome sequences have the potential to provide insight into the development and evolution of the placenta. In particular, the variable anatomy of the placenta has likely been affected by natural selection on the genomes of living and extinct mammals. In this note the current availability of mammal genome sequences is reviewed, and strengths and limitations of these data are discussed. Additionally, museums, zoos, and commercial entities are available to provide genomic resources to the placental research community. Recommendations for tissue storage conditions of placentas in genomic research are given. PMID:18155141

  3. Genome Sequencing and Comparative Genomics of the Broad Host-Range Pathogen Rhizoctonia solani AG8

    PubMed Central

    Hane, James K.; Anderson, Jonathan P.; Williams, Angela H.; Sperschneider, Jana; Singh, Karam B.

    2014-01-01

    Rhizoctonia solani is a soil-borne basidiomycete fungus with a necrotrophic lifestyle which is classified into fourteen reproductively incompatible anastomosis groups (AGs). One of these, AG8, is a devastating pathogen causing bare patch of cereals, brassicas and legumes. R. solani is a multinucleate heterokaryon containing significant heterozygosity within a single cell. This complexity posed significant challenges for the assembly of its genome. We present a high quality genome assembly of R. solani AG8 and a manually curated set of 13,964 genes supported by RNA-seq. The AG8 genome assembly used novel methods to produce a haploid representation of its heterokaryotic state. The whole-genomes of AG8, the rice pathogen AG1-IA and the potato pathogen AG3 were observed to be syntenic and co-linear. Genes and functions putatively relevant to pathogenicity were highlighted by comparing AG8 to known pathogenicity genes, orthology databases spanning 197 phytopathogenic taxa and AG1-IA. We also observed SNP-level “hypermutation” of CpG dinucleotides to TpG between AG8 nuclei, with similarities to repeat-induced point mutation (RIP). Interestingly, gene-coding regions were widely affected along with repetitive DNA, which has not been previously observed for RIP in mononuclear fungi of the Pezizomycotina. The rate of heterozygous SNP mutations within this single isolate of AG8 was observed to be higher than SNP mutation rates observed across populations of most fungal species compared. Comparative analyses were combined to predict biological processes relevant to AG8 and 308 proteins with effector-like characteristics, forming a valuable resource for further study of this pathosystem. Predicted effector-like proteins had elevated levels of non-synonymous point mutations relative to synonymous mutations (dN/dS), suggesting that they may be under diversifying selection pressures. In addition, the distant relationship to sequenced necrotrophs of the Ascomycota suggests the

  4. Genome sequencing and comparative genomics of the broad host-range pathogen Rhizoctonia solani AG8.

    PubMed

    Hane, James K; Anderson, Jonathan P; Williams, Angela H; Sperschneider, Jana; Singh, Karam B

    2014-05-01

    Rhizoctonia solani is a soil-borne basidiomycete fungus with a necrotrophic lifestyle which is classified into fourteen reproductively incompatible anastomosis groups (AGs). One of these, AG8, is a devastating pathogen causing bare patch of cereals, brassicas and legumes. R. solani is a multinucleate heterokaryon containing significant heterozygosity within a single cell. This complexity posed significant challenges for the assembly of its genome. We present a high quality genome assembly of R. solani AG8 and a manually curated set of 13,964 genes supported by RNA-seq. The AG8 genome assembly used novel methods to produce a haploid representation of its heterokaryotic state. The whole-genomes of AG8, the rice pathogen AG1-IA and the potato pathogen AG3 were observed to be syntenic and co-linear. Genes and functions putatively relevant to pathogenicity were highlighted by comparing AG8 to known pathogenicity genes, orthology databases spanning 197 phytopathogenic taxa and AG1-IA. We also observed SNP-level "hypermutation" of CpG dinucleotides to TpG between AG8 nuclei, with similarities to repeat-induced point mutation (RIP). Interestingly, gene-coding regions were widely affected along with repetitive DNA, which has not been previously observed for RIP in mononuclear fungi of the Pezizomycotina. The rate of heterozygous SNP mutations within this single isolate of AG8 was observed to be higher than SNP mutation rates observed across populations of most fungal species compared. Comparative analyses were combined to predict biological processes relevant to AG8 and 308 proteins with effector-like characteristics, forming a valuable resource for further study of this pathosystem. Predicted effector-like proteins had elevated levels of non-synonymous point mutations relative to synonymous mutations (dN/dS), suggesting that they may be under diversifying selection pressures. In addition, the distant relationship to sequenced necrotrophs of the Ascomycota suggests the R

  5. LegumeIP: an integrative database for comparative genomics and transcriptomics of model legumes.

    PubMed

    Li, Jun; Dai, Xinbin; Liu, Tingsong; Zhao, Patrick Xuechun

    2012-01-01

    Legumes play a vital role in maintaining the nitrogen cycle of the biosphere. They conduct symbiotic nitrogen fixation through endosymbiotic relationships with bacteria in root nodules. However, this and other characteristics of legumes, including mycorrhization, compound leaf development and profuse secondary metabolism, are absent in the typical model plant Arabidopsis thaliana. We present LegumeIP (http://plantgrn.noble.org/LegumeIP/), an integrative database for comparative genomics and transcriptomics of model legumes, for studying gene function and genome evolution in legumes. LegumeIP compiles gene and gene family information, syntenic and phylogenetic context and tissue-specific transcriptomic profiles. The database holds the genomic sequences of three model legumes, Medicago truncatula, Glycine max and Lotus japonicus plus two reference plant species, A. thaliana and Populus trichocarpa, with annotations based on UniProt, InterProScan, Gene Ontology and the Kyoto Encyclopedia of Genes and Genomes databases. LegumeIP also contains large-scale microarray and RNA-Seq-based gene expression data. Our new database is capable of systematic synteny analysis across M. truncatula, G. max, L. japonicus and A. thaliana, as well as construction and phylogenetic analysis of gene families across the five hosted species. Finally, LegumeIP provides comprehensive search and visualization tools that enable flexible queries based on gene annotation, gene family, synteny and relative gene expression.

  6. Comparative Genomics of Interreplichore Translocations in Bacteria: A Measure of Chromosome Topology?

    PubMed Central

    Khedkar, Supriya; Seshasayee, Aswin Sai Narain

    2016-01-01

    Genomes evolve not only in base sequence but also in terms of their architecture, defined by gene organization and chromosome topology. Whereas genome sequence data inform us about the changes in base sequences for a large variety of organisms, the study of chromosome topology is restricted to a few model organisms studied using microscopy and chromosome conformation capture techniques. Here, we exploit whole genome sequence data to study the link between gene organization and chromosome topology in bacteria. Using comparative genomics across ∼250 pairs of closely related bacteria we show that: (a) many organisms show a high degree of interreplichore translocations throughout the chromosome and not limited to the inversion-prone terminus (ter) or the origin of replication (oriC); (b) translocation maps may reflect chromosome topologies; and (c) symmetric interreplichore translocations do not disrupt the distance of a gene from oriC or affect gene expression states or strand biases in gene densities. In summary, we suggest that translocation maps might be a first line in defining a gross chromosome topology given a pair of closely related genome sequences. PMID:27172194

  7. Comparative Genomics of Interreplichore Translocations in Bacteria: A Measure of Chromosome Topology?

    PubMed

    Khedkar, Supriya; Seshasayee, Aswin Sai Narain

    2016-06-01

    Genomes evolve not only in base sequence but also in terms of their architecture, defined by gene organization and chromosome topology. Whereas genome sequence data inform us about the changes in base sequences for a large variety of organisms, the study of chromosome topology is restricted to a few model organisms studied using microscopy and chromosome conformation capture techniques. Here, we exploit whole genome sequence data to study the link between gene organization and chromosome topology in bacteria. Using comparative genomics across ∼250 pairs of closely related bacteria we show that: (a) many organisms show a high degree of interreplichore translocations throughout the chromosome and not limited to the inversion-prone terminus (ter) or the origin of replication (oriC); (b) translocation maps may reflect chromosome topologies; and (c) symmetric interreplichore translocations do not disrupt the distance of a gene from oriC or affect gene expression states or strand biases in gene densities. In summary, we suggest that translocation maps might be a first line in defining a gross chromosome topology given a pair of closely related genome sequences.

  8. Use of methylation filtration and C(0)t fractionation for analysis of genome composition and comparative genomics in bread wheat.

    PubMed

    Bandopadhyay, Rajib; Rustgi, Sachin; Chaudhuri, Rajat Kanti; Khurana, Paramjit; Khurana, Jitendra Paul; Tyagi, Akhilesh Kumar; Balyan, Harindra Singh; Houben, Andreas; Gupta, Pushpendra Kumar

    2011-07-20

    We investigated the compositional and structural differences in sequences derived from different fractions of wheat genomic DNA obtained using methylation filtration and C(0)t fractionation. Comparative analysis of these sequences revealed large compositional and structural variations in terms of GC content, different structural elements including repeat sequences (e.g., transposable elements and simple sequence repeats), protein coding genes, and non-coding RNA genes. A correlation between methylation status [determined on the basis of selective inclusion/exclusion in methylation-filtered (MF) library] of different repeat elements and expression level was observed. The expression levels were determined by comparing MF sequences with expressed sequence tags (ESTs) available in the public domain. Only a limited overlap among MF, high C(0)t (HC), and ESTs was observed, suggesting that these sequences may largely either represent the low-copy non-transcribed sequences or include genes with low expression levels. Thus, these results indicated a need to study MF and HC sequences along with ESTs to fully appreciate complexity of wheat gene space.

  9. Mitochondrial and Nuclear Genomic Responses to Loss of LRPPRC Expression

    PubMed Central

    Gohil, Vishal M.; Nilsson, Roland; Belcher-Timme, Casey A.; Luo, Biao; Root, David E.; Mootha, Vamsi K.

    2010-01-01

    Rapid advances in genotyping and sequencing technology have dramatically accelerated the discovery of genes underlying human disease. Elucidating the function of such genes and understanding their role in pathogenesis, however, remain challenging. Here, we introduce a genomic strategy to characterize such genes functionally, and we apply it to LRPPRC, a poorly studied gene that is mutated in Leigh syndrome, French-Canadian type (LSFC). We utilize RNA interference to engineer an allelic series of cellular models in which LRPPRC has been stably silenced to different levels of knockdown efficiency. We then combine genome-wide expression profiling with gene set enrichment analysis to identify cellular responses that correlate with the loss of LRPPRC. Using this strategy, we discovered a specific role for LRPPRC in the expression of all mitochondrial DNA-encoded mRNAs, but not the rRNAs, providing mechanistic insights into the enzymatic defects observed in the disease. Our analysis shows that nuclear genes encoding mitochondrial proteins are not collectively affected by the loss of LRPPRC. We do observe altered expression of genes related to hexose metabolism, prostaglandin synthesis, and glycosphingolipid biology that may either play an adaptive role in cell survival or contribute to pathogenesis. The combination of genetic perturbation, genomic profiling, and pathway analysis represents a generic strategy for understanding disease pathogenesis. PMID:20220140

  10. phyloXML: XML for evolutionary biology and comparative genomics.

    PubMed

    Han, Mira V; Zmasek, Christian M

    2009-10-27

    Evolutionary trees are central to a wide range of biolo