Sample records for conserved genomic elements

  1. Genome-wide identification of conserved intronic non-coding sequences using a Bayesian segmentation approach.

    PubMed

    Algama, Manjula; Tasker, Edward; Williams, Caitlin; Parslow, Adam C; Bryson-Richardson, Robert J; Keith, Jonathan M

    2017-03-27

    Computational identification of non-coding RNAs (ncRNAs) is a challenging problem. We describe a genome-wide analysis using Bayesian segmentation to identify intronic elements highly conserved between three evolutionarily distant vertebrate species: human, mouse and zebrafish. We investigate the extent to which these elements include ncRNAs (or conserved domains of ncRNAs) and regulatory sequences. We identified 655 deeply conserved intronic sequences in a genome-wide analysis. We also performed a pathway-focussed analysis on genes involved in muscle development, detecting 27 intronic elements, of which 22 were not detected in the genome-wide analysis. At least 87% of the genome-wide and 70% of the pathway-focussed elements have existing annotations indicative of conserved RNA secondary structure. The expression of 26 of the pathway-focused elements was examined using RT-PCR, providing confirmation that they include expressed ncRNAs. Consistent with previous studies, these elements are significantly over-represented in the introns of transcription factors. This study demonstrates a novel, highly effective, Bayesian approach to identifying conserved non-coding sequences. Our results complement previous findings that these sequences are enriched in transcription factors. However, in contrast to previous studies which suggest the majority of conserved sequences are regulatory factor binding sites, the majority of conserved sequences identified using our approach contain evidence of conserved RNA secondary structures, and our laboratory results suggest most are expressed. Functional roles at DNA and RNA levels are not mutually exclusive, and many of our elements possess evidence of both. Moreover, ncRNAs play roles in transcriptional and post-transcriptional regulation, and this may contribute to the over-representation of these elements in introns of transcription factors. We attribute the higher sensitivity of the pathway-focussed analysis compared to the genome-wide analysis to improved alignment quality, suggesting that enhanced genomic alignments may reveal many more conserved intronic sequences.

  2. Conserved Noncoding Elements in the Most Distant Genera of Cephalochordates: The Goldilocks Principle

    PubMed Central

    Yue, Jia-Xing; Kozmikova, Iryna; Ono, Hiroki; Nossa, Carlos W.; Kozmik, Zbynek; Putnam, Nicholas H.; Yu, Jr-Kai; Holland, Linda Z.

    2016-01-01

    Cephalochordates, the sister group of vertebrates + tunicates, are evolving particularly slowly. Therefore, genome comparisons between two congeners of Branchiostoma revealed so many conserved noncoding elements (CNEs), that it was not clear how many are functional regulatory elements. To more effectively identify CNEs with potential regulatory functions, we compared noncoding sequences of genomes of the most phylogenetically distant cephalochordate genera, Asymmetron and Branchiostoma, which diverged approximately 120–160 million years ago. We found 113,070 noncoding elements conserved between the two species, amounting to 3.3% of the genome. The genomic distribution, target gene ontology, and enriched motifs of these CNEs all suggest that many of them are probably cis-regulatory elements. More than 90% of previously verified amphioxus regulatory elements were re-captured in this study. A search of the cephalochordate CNEs around 50 developmental genes in several vertebrate genomes revealed eight CNEs conserved between cephalochordates and vertebrates, indicating sequence conservation over >500 million years of divergence. The function of five CNEs was tested in reporter assays in zebrafish, and one was also tested in amphioxus. All five CNEs proved to be tissue-specific enhancers. Taken together, these findings indicate that even though Branchiostoma and Asymmetron are distantly related, as they are evolving slowly, comparisons between them are likely optimal for identifying most of their tissue-specific cis-regulatory elements laying the foundation for functional characterizations and a better understanding of the evolution of developmental regulation in cephalochordates. PMID:27412606

  3. Variation in the genomic locations and sequence conservation of STAR elements among staphylococcal species provides insight into DNA repeat evolution

    PubMed Central

    2012-01-01

    Background Staphylococcus aureus Repeat (STAR) elements are a type of interspersed intergenic direct repeat. In this study the conservation and variation in these elements was explored by bioinformatic analyses of published staphylococcal genome sequences and through sequencing of specific STAR element loci from a large set of S. aureus isolates. Results Using bioinformatic analyses, we found that the STAR elements were located in different genomic loci within each staphylococcal species. There was no correlation between the number of STAR elements in each genome and the evolutionary relatedness of staphylococcal species, however higher levels of repeats were observed in both S. aureus and S. lugdunensis compared to other staphylococcal species. Unexpectedly, sequencing of the internal spacer sequences of individual repeat elements from multiple isolates showed conservation at the sequence level within deep evolutionary lineages of S. aureus. Whilst individual STAR element loci were demonstrated to expand and contract, the sequences associated with each locus were stable and distinct from one another. Conclusions The high degree of lineage and locus-specific conservation of these intergenic repeat regions suggests that STAR elements are maintained due to selective or molecular forces with some of these elements having an important role in cell physiology. The high prevalence in two of the more virulent staphylococcal species is indicative of a potential role for STAR elements in pathogenesis. PMID:23020678

  4. Evolutionary conservation of regulatory elements in vertebrate HOX gene clusters

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Santini, Simona; Boore, Jeffrey L.; Meyer, Axel

    2003-12-31

    Due to their high degree of conservation, comparisons of DNA sequences among evolutionarily distantly-related genomes permit to identify functional regions in noncoding DNA. Hox genes are optimal candidate sequences for comparative genome analyses, because they are extremely conserved in vertebrates and occur in clusters. We aligned (Pipmaker) the nucleotide sequences of HoxA clusters of tilapia, pufferfish, striped bass, zebrafish, horn shark, human and mouse (over 500 million years of evolutionary distance). We identified several highly conserved intergenic sequences, likely to be important in gene regulation. Only a few of these putative regulatory elements have been previously described as being involvedmore » in the regulation of Hox genes, while several others are new elements that might have regulatory functions. The majority of these newly identified putative regulatory elements contain short fragments that are almost completely conserved and are identical to known binding sites for regulatory proteins (Transfac). The conserved intergenic regions located between the most rostrally expressed genes in the developing embryo are longer and better retained through evolution. We document that presumed regulatory sequences are retained differentially in either A or A clusters resulting from a genome duplication in the fish lineage. This observation supports both the hypothesis that the conserved elements are involved in gene regulation and the Duplication-Deletion-Complementation model.« less

  5. Comparative Genomics of the Listeria monocytogenes ST204 Subgroup

    PubMed Central

    Fox, Edward M.; Allnutt, Theodore; Bradbury, Mark I.; Fanning, Séamus; Chandry, P. Scott

    2016-01-01

    The ST204 subgroup of Listeria monocytogenes is among the most frequently isolated in Australia from a range of environmental niches. In this study we provide a comparative genomics analysis of food and food environment isolates from geographically diverse sources. Analysis of the ST204 genomes showed a highly conserved core genome with the majority of variation seen in mobile genetic elements such as plasmids, transposons and phage insertions. Most strains (13/15) harbored plasmids, which although varying in size contained highly conserved sequences. Interestingly 4 isolates contained a conserved plasmid of 91,396 bp. The strains examined were isolated over a period of 12 years and from different geographic locations suggesting plasmids are an important component of the genetic repertoire of this subgroup and may provide a range of stress tolerance mechanisms. In addition to this 4 phage insertion sites and 2 transposons were identified among isolates, including a novel transposon. These genetic elements were highly conserved across isolates that harbored them, and also contained a range of genetic markers linked to stress tolerance and virulence. The maintenance of conserved mobile genetic elements in the ST204 population suggests these elements may contribute to the diverse range of niches colonized by ST204 isolates. Environmental stress selection may contribute to maintaining these genetic features, which in turn may be co-selecting for virulence markers relevant to clinical infection with ST204 isolates. PMID:28066377

  6. Comparative Genomics of the Listeria monocytogenes ST204 Subgroup.

    PubMed

    Fox, Edward M; Allnutt, Theodore; Bradbury, Mark I; Fanning, Séamus; Chandry, P Scott

    2016-01-01

    The ST204 subgroup of Listeria monocytogenes is among the most frequently isolated in Australia from a range of environmental niches. In this study we provide a comparative genomics analysis of food and food environment isolates from geographically diverse sources. Analysis of the ST204 genomes showed a highly conserved core genome with the majority of variation seen in mobile genetic elements such as plasmids, transposons and phage insertions. Most strains (13/15) harbored plasmids, which although varying in size contained highly conserved sequences. Interestingly 4 isolates contained a conserved plasmid of 91,396 bp. The strains examined were isolated over a period of 12 years and from different geographic locations suggesting plasmids are an important component of the genetic repertoire of this subgroup and may provide a range of stress tolerance mechanisms. In addition to this 4 phage insertion sites and 2 transposons were identified among isolates, including a novel transposon. These genetic elements were highly conserved across isolates that harbored them, and also contained a range of genetic markers linked to stress tolerance and virulence. The maintenance of conserved mobile genetic elements in the ST204 population suggests these elements may contribute to the diverse range of niches colonized by ST204 isolates. Environmental stress selection may contribute to maintaining these genetic features, which in turn may be co-selecting for virulence markers relevant to clinical infection with ST204 isolates.

  7. The SIDER2 elements, interspersed repeated sequences that populate the Leishmania genomes, constitute subfamilies showing chromosomal proximity relationship.

    PubMed

    Requena, Jose M; Folgueira, Cristina; López, Manuel C; Thomas, M Carmen

    2008-06-02

    Protozoan parasites of the genus Leishmania are causative agents of a diverse spectrum of human diseases collectively known as leishmaniasis. These eukaryotic pathogens that diverged early from the main eukaryotic lineage possess a number of unusual genomic, molecular and biochemical features. The completion of the genome projects for three Leishmania species has generated invaluable information enabling a direct analysis of genome structure and organization. By using DNA macroarrays, made with Leishmania infantum genomic clones and hybridized with total DNA from the parasite, we identified a clone containing a repeated sequence. An analysis of the recently completed genome sequence of L. infantum, using this repeated sequence as bait, led to the identification of a new class of repeated elements that are interspersed along the different L. infantum chromosomes. These elements turned out to be homologues of SIDER2 sequences, which were recently identified in the Leishmania major genome; thus, we adopted this nomenclature for the Leishmania elements described herein. Since SIDER2 elements are very heterogeneous in sequence, their precise identification is rather laborious. We have characterized 54 LiSIDER2 elements in chromosome 32 and 27 ones in chromosome 20. The mean size for these elements is 550 bp and their sequence is G+C rich (mean value of 66.5%). On the basis of sequence similarity, these elements can be grouped in subfamilies that show a remarkable relationship of proximity, i.e. SIDER2s of a given subfamily locate close in a chromosomal region without intercalating elements. For comparative purposes, we have identified the SIDER2 elements existing in L. major and Leishmania braziliensis chromosomes 32. While SIDER2 elements are highly conserved both in number and location between L. infantum and L. major, no such conservation exists when comparing with SIDER2s in L. braziliensis chromosome 32. SIDER2 elements constitute a relevant piece in the Leishmania genome organization. Sequence characteristics, genomic distribution and evolutionarily conservation of SIDER2s are suggestive of relevant functions for these elements in Leishmania. Apart from a proved involvement in post-transcriptional mechanisms of gene regulation, SIDER2 elements could be involved in DNA amplification processes and, perhaps, in chromosome segregation as centromeric sequences.

  8. Early Evolution of Conserved Regulatory Sequences Associated with Development in Vertebrates

    PubMed Central

    McEwen, Gayle K.; Goode, Debbie K.; Parker, Hugo J.; Woolfe, Adam; Callaway, Heather; Elgar, Greg

    2009-01-01

    Comparisons between diverse vertebrate genomes have uncovered thousands of highly conserved non-coding sequences, an increasing number of which have been shown to function as enhancers during early development. Despite their extreme conservation over 500 million years from humans to cartilaginous fish, these elements appear to be largely absent in invertebrates, and, to date, there has been little understanding of their mode of action or the evolutionary processes that have modelled them. We have now exploited emerging genomic sequence data for the sea lamprey, Petromyzon marinus, to explore the depth of conservation of this type of element in the earliest diverging extant vertebrate lineage, the jawless fish (agnathans). We searched for conserved non-coding elements (CNEs) at 13 human gene loci and identified lamprey elements associated with all but two of these gene regions. Although markedly shorter and less well conserved than within jawed vertebrates, identified lamprey CNEs are able to drive specific patterns of expression in zebrafish embryos, which are almost identical to those driven by the equivalent human elements. These CNEs are therefore a unique and defining characteristic of all vertebrates. Furthermore, alignment of lamprey and other vertebrate CNEs should permit the identification of persistent sequence signatures that are responsible for common patterns of expression and contribute to the elucidation of the regulatory language in CNEs. Identifying the core regulatory code for development, common to all vertebrates, provides a foundation upon which regulatory networks can be constructed and might also illuminate how large conserved regulatory sequence blocks evolve and become fixed in genomic DNA. PMID:20011110

  9. Repetitive sequences: the hidden diversity of heterochromatin in prochilodontid fish

    PubMed Central

    Terencio, Maria L.; Schneider, Carlos H.; Gross, Maria C.; do Carmo, Edson Junior; Nogaroto, Viviane; de Almeida, Mara Cristina; Artoni, Roberto Ferreira; Vicari, Marcelo R.; Feldberg, Eliana

    2015-01-01

    Abstract The structure and organization of repetitive elements in fish genomes are still relatively poorly understood, although most of these elements are believed to be located in heterochromatic regions. Repetitive elements are considered essential in evolutionary processes as hotspots for mutations and chromosomal rearrangements, among other functions – thus providing new genomic alternatives and regulatory sites for gene expression. The present study sought to characterize repetitive DNA sequences in the genomes of Semaprochilodus insignis (Jardine & Schomburgk, 1841) and Semaprochilodus taeniurus (Valenciennes, 1817) and identify regions of conserved syntenic blocks in this genome fraction of three species of Prochilodontidae (Semaprochilodus insignis, Semaprochilodus taeniurus, and Prochilodus lineatus (Valenciennes, 1836) by cross-FISH using Cot-1 DNA (renaturation kinetics) probes. We found that the repetitive fractions of the genomes of Semaprochilodus insignis and Semaprochilodus taeniurus have significant amounts of conserved syntenic blocks in hybridization sites, but with low degrees of similarity between them and the genome of Prochilodus lineatus, especially in relation to B chromosomes. The cloning and sequencing of the repetitive genomic elements of Semaprochilodus insignis and Semaprochilodus taeniurus using Cot-1 DNA identified 48 fragments that displayed high similarity with repetitive sequences deposited in public DNA databases and classified as microsatellites, transposons, and retrotransposons. The repetitive fractions of the Semaprochilodus insignis and Semaprochilodus taeniurus genomes exhibited high degrees of conserved syntenic blocks in terms of both the structures and locations of hybridization sites, but a low degree of similarity with the syntenic blocks of the Prochilodus lineatus genome. Future comparative analyses of other prochilodontidae species will be needed to advance our understanding of the organization and evolution of the genomes in this group of fish. PMID:26752156

  10. Highly conserved elements discovered in vertebrates are present in non-syntenic loci of tunicates, act as enhancers and can be transcribed during development

    PubMed Central

    Sanges, Remo; Hadzhiev, Yavor; Gueroult-Bellone, Marion; Roure, Agnes; Ferg, Marco; Meola, Nicola; Amore, Gabriele; Basu, Swaraj; Brown, Euan R.; De Simone, Marco; Petrera, Francesca; Licastro, Danilo; Strähle, Uwe; Banfi, Sandro; Lemaire, Patrick; Birney, Ewan; Müller, Ferenc; Stupka, Elia

    2013-01-01

    Co-option of cis-regulatory modules has been suggested as a mechanism for the evolution of expression sites during development. However, the extent and mechanisms involved in mobilization of cis-regulatory modules remains elusive. To trace the history of non-coding elements, which may represent candidate ancestral cis-regulatory modules affirmed during chordate evolution, we have searched for conserved elements in tunicate and vertebrate (Olfactores) genomes. We identified, for the first time, 183 non-coding sequences that are highly conserved between the two groups. Our results show that all but one element are conserved in non-syntenic regions between vertebrate and tunicate genomes, while being syntenic among vertebrates. Nevertheless, in all the groups, they are significantly associated with transcription factors showing specific functions fundamental to animal development, such as multicellular organism development and sequence-specific DNA binding. The majority of these regions map onto ultraconserved elements and we demonstrate that they can act as functional enhancers within the organism of origin, as well as in cross-transgenesis experiments, and that they are transcribed in extant species of Olfactores. We refer to the elements as ‘Olfactores conserved non-coding elements’. PMID:23393190

  11. Endangered Species Hold Clues to Human Evolution

    PubMed Central

    Bejerano, Gill; Salama, Sofie R.; Haussler, David

    2010-01-01

    We report that 18 conserved, and by extension functional, elements in the human genome are the result of retroposon insertions that are evolving under purifying selection in mammals. We show evidence that 1 of the 18 elements regulates the expression of ASXL3 during development by encoding an alternatively spliced exon that causes nonsense-mediated decay of the transcript. The retroposon that gave rise to these functional elements was quickly inactivated in the mammalian ancestor, and all traces of it have been lost due to neutral decay. However, the tuatara has maintained a near-ancestral version of this retroposon in its extant genome, which allows us to connect the 18 human elements to the evolutionary events that created them. We propose that conservation efforts over more than 100 years may not have only prevented the tuatara from going extinct but could have preserved our ability to understand the evolutionary history of functional elements in the human genome. Through simulations, we argue that species with historically low population sizes are more likely to harbor ancient mobile elements for long periods of time and in near-ancestral states, making these species indispensable in understanding the evolutionary origin of functional elements in the human genome. PMID:20332163

  12. Interpreting Mammalian Evolution using Fugu Genome Comparisons

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Stubbs, L; Ovcharenko, I; Loots, G G

    2004-04-02

    Comparative sequence analysis of the human and the pufferfish Fugu rubripes (fugu) genomes has revealed several novel functional coding and noncoding regions in the human genome. In particular, the fugu genome has been extremely valuable for identifying transcriptional regulatory elements in human loci harboring unusually high levels of evolutionary conservation to rodent genomes. In such regions, the large evolutionary distance between human and fishes provides an additional filter through which functional noncoding elements can be detected with high efficiency.

  13. Use of a Drosophila Genome-Wide Conserved Sequence Database to Identify Functionally Related cis-Regulatory Enhancers

    PubMed Central

    Brody, Thomas; Yavatkar, Amarendra S; Kuzin, Alexander; Kundu, Mukta; Tyson, Leonard J; Ross, Jermaine; Lin, Tzu-Yang; Lee, Chi-Hon; Awasaki, Takeshi; Lee, Tzumin; Odenwald, Ward F

    2012-01-01

    Background: Phylogenetic footprinting has revealed that cis-regulatory enhancers consist of conserved DNA sequence clusters (CSCs). Currently, there is no systematic approach for enhancer discovery and analysis that takes full-advantage of the sequence information within enhancer CSCs. Results: We have generated a Drosophila genome-wide database of conserved DNA consisting of >100,000 CSCs derived from EvoPrints spanning over 90% of the genome. cis-Decoder database search and alignment algorithms enable the discovery of functionally related enhancers. The program first identifies conserved repeat elements within an input enhancer and then searches the database for CSCs that score highly against the input CSC. Scoring is based on shared repeats as well as uniquely shared matches, and includes measures of the balance of shared elements, a diagnostic that has proven to be useful in predicting cis-regulatory function. To demonstrate the utility of these tools, a temporally-restricted CNS neuroblast enhancer was used to identify other functionally related enhancers and analyze their structural organization. Conclusions: cis-Decoder reveals that co-regulating enhancers consist of combinations of overlapping shared sequence elements, providing insights into the mode of integration of multiple regulating transcription factors. The database and accompanying algorithms should prove useful in the discovery and analysis of enhancers involved in any developmental process. Developmental Dynamics 241:169–189, 2012. © 2011 Wiley Periodicals, Inc. Key findings A genome-wide catalog of Drosophila conserved DNA sequence clusters. cis-Decoder discovers functionally related enhancers. Functionally related enhancers share balanced sequence element copy numbers. Many enhancers function during multiple phases of development. PMID:22174086

  14. Identification of a Conserved Non-Protein-Coding Genomic Element that Plays an Essential Role in Alphabaculovirus Pathogenesis

    PubMed Central

    Kikhno, Irina

    2014-01-01

    Highly homologous sequences 154–157 bp in length grouped under the name of “conserved non-protein-coding element” (CNE) were revealed in all of the sequenced genomes of baculoviruses belonging to the genus Alphabaculovirus. A CNE alignment led to the detection of a set of highly conserved nucleotide clusters that occupy strictly conserved positions in the CNE sequence. The significant length of the CNE and conservation of both its length and cluster architecture were identified as a combination of characteristics that make this CNE different from known viral non-coding functional sequences. The essential role of the CNE in the Alphabaculovirus life cycle was demonstrated through the use of a CNE-knockout Autographa californica multiple nucleopolyhedrovirus (AcMNPV) bacmid. It was shown that the essential function of the CNE was not mediated by the presumed expression activities of the protein- and non-protein-coding genes that overlap the AcMNPV CNE. On the basis of the presented data, the AcMNPV CNE was categorized as a complex-structured, polyfunctional genomic element involved in an essential DNA transaction that is associated with an undefined function of the baculovirus genome. PMID:24740153

  15. Defining functional DNA elements in the human genome

    PubMed Central

    Kellis, Manolis; Wold, Barbara; Snyder, Michael P.; Bernstein, Bradley E.; Kundaje, Anshul; Marinov, Georgi K.; Ward, Lucas D.; Birney, Ewan; Crawford, Gregory E.; Dekker, Job; Dunham, Ian; Elnitski, Laura L.; Farnham, Peggy J.; Feingold, Elise A.; Gerstein, Mark; Giddings, Morgan C.; Gilbert, David M.; Gingeras, Thomas R.; Green, Eric D.; Guigo, Roderic; Hubbard, Tim; Kent, Jim; Lieb, Jason D.; Myers, Richard M.; Pazin, Michael J.; Ren, Bing; Stamatoyannopoulos, John A.; Weng, Zhiping; White, Kevin P.; Hardison, Ross C.

    2014-01-01

    With the completion of the human genome sequence, attention turned to identifying and annotating its functional DNA elements. As a complement to genetic and comparative genomics approaches, the Encyclopedia of DNA Elements Project was launched to contribute maps of RNA transcripts, transcriptional regulator binding sites, and chromatin states in many cell types. The resulting genome-wide data reveal sites of biochemical activity with high positional resolution and cell type specificity that facilitate studies of gene regulation and interpretation of noncoding variants associated with human disease. However, the biochemically active regions cover a much larger fraction of the genome than do evolutionarily conserved regions, raising the question of whether nonconserved but biochemically active regions are truly functional. Here, we review the strengths and limitations of biochemical, evolutionary, and genetic approaches for defining functional DNA segments, potential sources for the observed differences in estimated genomic coverage, and the biological implications of these discrepancies. We also analyze the relationship between signal intensity, genomic coverage, and evolutionary conservation. Our results reinforce the principle that each approach provides complementary information and that we need to use combinations of all three to elucidate genome function in human biology and disease. PMID:24753594

  16. Mapping cis-Regulatory Domains in the Human Genome UsingMulti-Species Conservation of Synteny

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ahituv, Nadav; Prabhakar, Shyam; Poulin, Francis

    2005-06-13

    Our inability to associate distant regulatory elements with the genes that they regulate has largely precluded their examination for sequence alterations contributing to human disease. One major obstacle is the large genomic space surrounding targeted genes in which such elements could potentially reside. In order to delineate gene regulatory boundaries we used whole-genome human-mouse-chicken (HMC) and human-mouse-frog (HMF) multiple alignments to compile conserved blocks of synteny (CBS), under the hypothesis that these blocks have been kept intact throughout evolution at least in part by the requirement of regulatory elements to stay linked to the genes that they regulate. A totalmore » of 2,116 and 1,942 CBS>200 kb were assembled for HMC and HMF respectively, encompassing 1.53 and 0.86 Gb of human sequence. To support the existence of complex long-range regulatory domains within these CBS we analyzed the prevalence and distribution of chromosomal aberrations leading to position effects (disruption of a genes regulatory environment), observing a clear bias not only for mapping onto CBS but also for longer CBS size. Our results provide a genome wide data set characterizing the regulatory domains of genes and the conserved regulatory elements within them.« less

  17. RUDI, a short interspersed element of the V-SINE superfamily widespread in molluscan genomes.

    PubMed

    Luchetti, Andrea; Šatović, Eva; Mantovani, Barbara; Plohl, Miroslav

    2016-06-01

    Short interspersed elements (SINEs) are non-autonomous retrotransposons that are widespread in eukaryotic genomes. They exhibit a chimeric sequence structure consisting of a small RNA-related head, an anonymous body and an AT-rich tail. Although their turnover and de novo emergence is rapid, some SINE elements found in distantly related species retain similarity in certain core segments (or highly conserved domains, HCD). We have characterized a new SINE element named RUDI in the bivalve molluscs Ruditapes decussatus and R. philippinarum and found this element to be widely distributed in the genomes of a number of mollusc species. An unexpected structural feature of RUDI is the HCD domain type V, which was first found in non-amniote vertebrate SINEs and in the SINE from one cnidarian species. In addition to the V domain, the overall sequence conservation pattern of RUDI elements resembles that found in ancient AmnSINE (~310 Myr old) and Au SINE (~320 Myr old) families, suggesting that RUDI might be among the most ancient SINE families. Sequence conservation suggests a monophyletic origin of RUDI. Nucleotide variability and phylogenetic analyses suggest long-term vertical inheritance combined with at least one horizontal transfer event as the most parsimonious explanation for the observed taxonomic distribution.

  18. Comparative transgenic analysis of enhancers from the human SHOX and mouse Shox2 genomic regions.

    PubMed

    Rosin, Jessica M; Abassah-Oppong, Samuel; Cobb, John

    2013-08-01

    Disruption of presumptive enhancers downstream of the human SHOX gene (hSHOX) is a frequent cause of the zeugopodal limb defects characteristic of Léri-Weill dyschondrosteosis (LWD). The closely related mouse Shox2 gene (mShox2) is also required for limb development, but in the more proximal stylopodium. In this study, we used transgenic mice in a comparative approach to characterize enhancer sequences in the hSHOX and mShox2 genomic regions. Among conserved noncoding elements (CNEs) that function as enhancers in vertebrate genomes, those that are maintained near paralogous genes are of particular interest given their ancient origins. Therefore, we first analyzed the regulatory potential of a genomic region containing one such duplicated CNE (dCNE) downstream of mShox2 and hSHOX. We identified a strong limb enhancer directly adjacent to the mShox2 dCNE that recapitulates the expression pattern of the endogenous gene. Interestingly, this enhancer requires sequences only conserved in the mammalian lineage in order to drive strong limb expression, whereas the more deeply conserved sequences of the dCNE function as a neural enhancer. Similarly, we found that a conserved element downstream of hSHOX (CNE9) also functions as a neural enhancer in transgenic mice. However, when the CNE9 transgenic construct was enlarged to include adjacent, non-conserved sequences frequently deleted in LWD patients, the transgene drove expression in the zeugopodium of the limbs. Therefore, both hSHOX and mShox2 limb enhancers are coupled to distinct neural enhancers. This is the first report demonstrating the activity of cis-regulatory elements from the hSHOX and mShox2 genomic regions in mammalian embryos.

  19. Primary analysis of repeat elements of the Asian seabass (Lates calcarifer) transcriptome and genome

    PubMed Central

    Kuznetsova, Inna S.; Thevasagayam, Natascha M.; Sridatta, Prakki S. R.; Komissarov, Aleksey S.; Saju, Jolly M.; Ngoh, Si Y.; Jiang, Junhui; Shen, Xueyan; Orbán, László

    2014-01-01

    As part of our Asian seabass genome project, we are generating an inventory of repeat elements in the genome and transcriptome. The karyotype showed a diploid number of 2n = 24 chromosomes with a variable number of B-chromosomes. The transcriptome and genome of Asian seabass were searched for repetitive elements with experimental and bioinformatics tools. Six different types of repeats constituting 8–14% of the genome were characterized. Repetitive elements were clustered in the pericentromeric heterochromatin of all chromosomes, but some of them were preferentially accumulated in pretelomeric and pericentromeric regions of several chromosomes pairs and have chromosomes specific arrangement. From the dispersed class of fish-specific non-LTR retrotransposon elements Rex1 and MAUI-like repeats were analyzed. They were wide-spread both in the genome and transcriptome, accumulated on the pericentromeric and peritelomeric areas of all chromosomes. Every analyzed repeat was represented in the Asian seabass transcriptome, some showed differential expression between the gonads. The other group of repeats analyzed belongs to the rRNA multigene family. FISH signal for 5S rDNA was located on a single pair of chromosomes, whereas that for 18S rDNA was found on two pairs. A BAC-derived contig containing rDNA was sequenced and assembled into a scaffold containing incomplete fragments of 18S rDNA. Their assembly and chromosomal position revealed that this part of Asian seabass genome is extremely rich in repeats containing evolutionarily conserved and novel sequences. In summary, transcriptome assemblies and cDNA data are suitable for the identification of repetitive DNA from unknown genomes and for comparative investigation of conserved elements between teleosts and other vertebrates. PMID:25120555

  20. Complete genomic sequence of Powassan virus: evaluation of genetic elements in tick-borne versus mosquito-borne flaviviruses.

    PubMed

    Mandl, C W; Holzmann, H; Kunz, C; Heinz, F X

    1993-05-01

    The complete nucleotide sequence of the positive-stranded RNA genome of the tick-borne flavivirus Powassan (10,839 nucleotides) was elucidated and the amino acid sequence of all viral proteins was derived. Based on this sequence as well as serological data, Powassan virus represents the most divergent member of the tick-borne serocomplex within the genus flaviviruses, family Flaviviridae. The primary nucleotide sequence and potential RNA secondary structures of the Powassan virus genome as well as the protein sequences and the reactivities of the virion with a panel of monoclonal antibodies were compared to other tick-borne and mosquito-borne flaviviruses. These analyses corroborated significant differences between tick-borne and mosquito-borne flaviviruses, but also emphasized structural elements that are conserved among both vector groups. The comparisons among tick-borne flaviviruses revealed conserved sequence elements that might represent important determinants of the tick-borne flavivirus phenotype.

  1. A Rickettsia Genome Overrun by Mobile Genetic Elements Provides Insight into the Acquisition of Genes Characteristic of an Obligate Intracellular Lifestyle

    PubMed Central

    Joardar, Vinita; Williams, Kelly P.; Driscoll, Timothy; Hostetler, Jessica B.; Nordberg, Eric; Shukla, Maulik; Walenz, Brian; Hill, Catherine A.; Nene, Vishvanath M.; Azad, Abdu F.; Sobral, Bruno W.; Caler, Elisabet

    2012-01-01

    We present the draft genome for the Rickettsia endosymbiont of Ixodes scapularis (REIS), a symbiont of the deer tick vector of Lyme disease in North America. Among Rickettsia species (Alphaproteobacteria: Rickettsiales), REIS has the largest genome sequenced to date (>2 Mb) and contains 2,309 genes across the chromosome and four plasmids (pREIS1 to pREIS4). The most remarkable finding within the REIS genome is the extraordinary proliferation of mobile genetic elements (MGEs), which contributes to a limited synteny with other Rickettsia genomes. In particular, an integrative conjugative element named RAGE (for Rickettsiales amplified genetic element), previously identified in scrub typhus rickettsiae (Orientia tsutsugamushi) genomes, is present on both the REIS chromosome and plasmids. Unlike the pseudogene-laden RAGEs of O. tsutsugamushi, REIS encodes nine conserved RAGEs that include F-like type IV secretion systems similar to that of the tra genes encoded in the Rickettsia bellii and R. massiliae genomes. An unparalleled abundance of encoded transposases (>650) relative to genome size, together with the RAGEs and other MGEs, comprise ∼35% of the total genome, making REIS one of the most plastic and repetitive bacterial genomes sequenced to date. We present evidence that conserved rickettsial genes associated with an intracellular lifestyle were acquired via MGEs, especially the RAGE, through a continuum of genomic invasions. Robust phylogeny estimation suggests REIS is ancestral to the virulent spotted fever group of rickettsiae. As REIS is not known to invade vertebrate cells and has no known pathogenic effects on I. scapularis, its genome sequence provides insight on the origin of mechanisms of rickettsial pathogenicity. PMID:22056929

  2. Comparative genome sequencing of Drosophila pseudoobscura: Chromosomal, gene, and cis-element evolution

    PubMed Central

    Richards, Stephen; Liu, Yue; Bettencourt, Brian R.; Hradecky, Pavel; Letovsky, Stan; Nielsen, Rasmus; Thornton, Kevin; Hubisz, Melissa J.; Chen, Rui; Meisel, Richard P.; Couronne, Olivier; Hua, Sujun; Smith, Mark A.; Zhang, Peili; Liu, Jing; Bussemaker, Harmen J.; van Batenburg, Marinus F.; Howells, Sally L.; Scherer, Steven E.; Sodergren, Erica; Matthews, Beverly B.; Crosby, Madeline A.; Schroeder, Andrew J.; Ortiz-Barrientos, Daniel; Rives, Catharine M.; Metzker, Michael L.; Muzny, Donna M.; Scott, Graham; Steffen, David; Wheeler, David A.; Worley, Kim C.; Havlak, Paul; Durbin, K. James; Egan, Amy; Gill, Rachel; Hume, Jennifer; Morgan, Margaret B.; Miner, George; Hamilton, Cerissa; Huang, Yanmei; Waldron, Lenée; Verduzco, Daniel; Clerc-Blankenburg, Kerstin P.; Dubchak, Inna; Noor, Mohamed A.F.; Anderson, Wyatt; White, Kevin P.; Clark, Andrew G.; Schaeffer, Stephen W.; Gelbart, William; Weinstock, George M.; Gibbs, Richard A.

    2005-01-01

    We have sequenced the genome of a second Drosophila species, Drosophila pseudoobscura, and compared this to the genome sequence of Drosophila melanogaster, a primary model organism. Throughout evolution the vast majority of Drosophila genes have remained on the same chromosome arm, but within each arm gene order has been extensively reshuffled, leading to a minimum of 921 syntenic blocks shared between the species. A repetitive sequence is found in the D. pseudoobscura genome at many junctions between adjacent syntenic blocks. Analysis of this novel repetitive element family suggests that recombination between offset elements may have given rise to many paracentric inversions, thereby contributing to the shuffling of gene order in the D. pseudoobscura lineage. Based on sequence similarity and synteny, 10,516 putative orthologs have been identified as a core gene set conserved over 25–55 million years (Myr) since the pseudoobscura/melanogaster divergence. Genes expressed in the testes had higher amino acid sequence divergence than the genome-wide average, consistent with the rapid evolution of sex-specific proteins. Cis-regulatory sequences are more conserved than random and nearby sequences between the species—but the difference is slight, suggesting that the evolution of cis-regulatory elements is flexible. Overall, a pattern of repeat-mediated chromosomal rearrangement, and high coadaptation of both male genes and cis-regulatory sequences emerges as important themes of genome divergence between these species of Drosophila. PMID:15632085

  3. DcSto: carrot Stowaway-like elements are abundant, diverse, and polymorphic

    USDA-ARS?s Scientific Manuscript database

    We investigated nine families of Stowaway-like MITEs in the carrot genome, named DcSto1 to DcSto9. All of them were AT-rich and shared a highly conserved 6 bp-long TIR typical for Stowaways. The copy number of DcSto1 elements was estimated as ca. 5,000 per diploid genome. We observed preference for ...

  4. The Tolypocladium inflatum CPA element encodes a RecQ helicase-like gene.

    PubMed

    Kempken, Frank

    2008-12-01

    Previously, a repetitive CPA element was discovered in the genome of the filamentous fungus Tolypocladium inflatum; however, no further characterization was technically possible at that time. In this study, PCR amplification was used to detect a 4 kb conserved portion of the CPA element that appeared to be present in most, if not all, genomic CPA elements. The amplicons included a large open reading frame that was most similar to a RecQ helicase-like gene from Metarhizium anisopliae. The repetitive nature of the CPA element suggests that it is related to the eukaryotic Helitron class of transposable elements.

  5. The spotted gar genome illuminates vertebrate evolution and facilitates human-teleost comparisons.

    PubMed

    Braasch, Ingo; Gehrke, Andrew R; Smith, Jeramiah J; Kawasaki, Kazuhiko; Manousaki, Tereza; Pasquier, Jeremy; Amores, Angel; Desvignes, Thomas; Batzel, Peter; Catchen, Julian; Berlin, Aaron M; Campbell, Michael S; Barrell, Daniel; Martin, Kyle J; Mulley, John F; Ravi, Vydianathan; Lee, Alison P; Nakamura, Tetsuya; Chalopin, Domitille; Fan, Shaohua; Wcisel, Dustin; Cañestro, Cristian; Sydes, Jason; Beaudry, Felix E G; Sun, Yi; Hertel, Jana; Beam, Michael J; Fasold, Mario; Ishiyama, Mikio; Johnson, Jeremy; Kehr, Steffi; Lara, Marcia; Letaw, John H; Litman, Gary W; Litman, Ronda T; Mikami, Masato; Ota, Tatsuya; Saha, Nil Ratan; Williams, Louise; Stadler, Peter F; Wang, Han; Taylor, John S; Fontenot, Quenton; Ferrara, Allyse; Searle, Stephen M J; Aken, Bronwen; Yandell, Mark; Schneider, Igor; Yoder, Jeffrey A; Volff, Jean-Nicolas; Meyer, Axel; Amemiya, Chris T; Venkatesh, Byrappa; Holland, Peter W H; Guiguen, Yann; Bobe, Julien; Shubin, Neil H; Di Palma, Federica; Alföldi, Jessica; Lindblad-Toh, Kerstin; Postlethwait, John H

    2016-04-01

    To connect human biology to fish biomedical models, we sequenced the genome of spotted gar (Lepisosteus oculatus), whose lineage diverged from teleosts before teleost genome duplication (TGD). The slowly evolving gar genome has conserved in content and size many entire chromosomes from bony vertebrate ancestors. Gar bridges teleosts to tetrapods by illuminating the evolution of immunity, mineralization and development (mediated, for example, by Hox, ParaHox and microRNA genes). Numerous conserved noncoding elements (CNEs; often cis regulatory) undetectable in direct human-teleost comparisons become apparent using gar: functional studies uncovered conserved roles for such cryptic CNEs, facilitating annotation of sequences identified in human genome-wide association studies. Transcriptomic analyses showed that the sums of expression domains and expression levels for duplicated teleost genes often approximate the patterns and levels of expression for gar genes, consistent with subfunctionalization. The gar genome provides a resource for understanding evolution after genome duplication, the origin of vertebrate genomes and the function of human regulatory sequences.

  6. The spotted gar genome illuminates vertebrate evolution and facilitates human-to-teleost comparisons

    PubMed Central

    Braasch, Ingo; Gehrke, Andrew R.; Smith, Jeramiah J.; Kawasaki, Kazuhiko; Manousaki, Tereza; Pasquier, Jeremy; Amores, Angel; Desvignes, Thomas; Batzel, Peter; Catchen, Julian; Berlin, Aaron M.; Campbell, Michael S.; Barrell, Daniel; Martin, Kyle J.; Mulley, John F.; Ravi, Vydianathan; Lee, Alison P.; Nakamura, Tetsuya; Chalopin, Domitille; Fan, Shaohua; Wcisel, Dustin; Cañestro, Cristian; Sydes, Jason; Beaudry, Felix E. G.; Sun, Yi; Hertel, Jana; Beam, Michael J.; Fasold, Mario; Ishiyama, Mikio; Johnson, Jeremy; Kehr, Steffi; Lara, Marcia; Letaw, John H.; Litman, Gary W.; Litman, Ronda T.; Mikami, Masato; Ota, Tatsuya; Saha, Nil Ratan; Williams, Louise; Stadler, Peter F.; Wang, Han; Taylor, John S.; Fontenot, Quenton; Ferrara, Allyse; Searle, Stephen M. J.; Aken, Bronwen; Yandell, Mark; Schneider, Igor; Yoder, Jeffrey A.; Volff, Jean-Nicolas; Meyer, Axel; Amemiya, Chris T.; Venkatesh, Byrappa; Holland, Peter W. H.; Guiguen, Yann; Bobe, Julien; Shubin, Neil H.; Di Palma, Federica; Alföldi, Jessica; Lindblad-Toh, Kerstin; Postlethwait, John H.

    2016-01-01

    To connect human biology to fish biomedical models, we sequenced the genome of spotted gar (Lepisosteus oculatus), whose lineage diverged from teleosts before the teleost genome duplication (TGD). The slowly evolving gar genome conserved in content and size many entire chromosomes from bony vertebrate ancestors. Gar bridges teleosts to tetrapods by illuminating the evolution of immunity, mineralization, and development (e.g., Hox, ParaHox, and miRNA genes). Numerous conserved non-coding elements (CNEs, often cis-regulatory) undetectable in direct human-teleost comparisons become apparent using gar: functional studies uncovered conserved roles of such cryptic CNEs, facilitating annotation of sequences identified in human genome-wide association studies. Transcriptomic analyses revealed that the sum of expression domains and levels from duplicated teleost genes often approximate patterns and levels of gar genes, consistent with subfunctionalization. The gar genome provides a resource for understanding evolution after genome duplication, the origin of vertebrate genomes, and the function of human regulatory sequences. PMID:26950095

  7. Conserved domains and SINE diversity during animal evolution.

    PubMed

    Luchetti, Andrea; Mantovani, Barbara

    2013-10-01

    Eukaryotic genomes harbour a number of mobile genetic elements (MGEs); moving from one genomic location to another, they are known to impact on the host genome. Short interspersed elements (SINEs) are well-represented, non-autonomous retroelements and they are likely the most diversified MGEs. In some instances, sequence domains conserved across unrelated SINEs have been identified; remarkably, one of these, called Nin, has been conserved since the Radiata-Bilateria splitting. Here we report on two new domains: Inv, derived from Nin, identified in insects and in deuterostomes, and Pln, restricted to polyneopteran insects. The identification of Inv and Pln sequences allowed us to retrieve new SINEs, two in insects and one in a hemichordate. The diverse structural combination of the different domains in different SINE families, during metazoan evolution, offers a clearer view of SINE diversity and their frequent de novo emergence through module exchange, possibly underlying the high evolutionary success of SINEs. © 2013 Elsevier Inc. All rights reserved.

  8. Sub-kb Hi-C in D. melanogaster reveals conserved characteristics of TADs between insect and mammalian cells.

    PubMed

    Wang, Qi; Sun, Qiu; Czajkowsky, Daniel M; Shao, Zhifeng

    2018-01-15

    Topologically associating domains (TADs) are fundamental elements of the eukaryotic genomic structure. However, recent studies suggest that the insulating complexes, CTCF/cohesin, present at TAD borders in mammals are absent from those in Drosophila melanogaster, raising the possibility that border elements are not conserved among metazoans. Using in situ Hi-C with sub-kb resolution, here we show that the D. melanogaster genome is almost completely partitioned into >4000 TADs, nearly sevenfold more than previously identified. The overwhelming majority of these TADs are demarcated by the insulator complexes, BEAF-32/CP190, or BEAF-32/Chromator, indicating that these proteins may play an analogous role in flies as that of CTCF/cohesin in mammals. Moreover, extended regions previously thought to be unstructured are shown to consist of small contiguous TADs, a property also observed in mammals upon re-examination. Altogether, our work demonstrates that fundamental features associated with the higher-order folding of the genome are conserved from insects to mammals.

  9. A comprehensive analysis of Helicobacter pylori plasticity zones reveals that they are integrating conjugative elements with intermediate integration specificity.

    PubMed

    Fischer, Wolfgang; Breithaupt, Ute; Kern, Beate; Smith, Stella I; Spicher, Carolin; Haas, Rainer

    2014-04-27

    The human gastric pathogen Helicobacter pylori is a paradigm for chronic bacterial infections. Its persistence in the stomach mucosa is facilitated by several mechanisms of immune evasion and immune modulation, but also by an unusual genetic variability which might account for the capability to adapt to changing environmental conditions during long-term colonization. This variability is reflected by the fact that almost each infected individual is colonized by a genetically unique strain. Strain-specific genes are dispersed throughout the genome, but clusters of genes organized as genomic islands may also collectively be present or absent. We have comparatively analysed such clusters, which are commonly termed plasticity zones, in a high number of H. pylori strains of varying geographical origin. We show that these regions contain fixed gene sets, rather than being true regions of genome plasticity, but two different types and several subtypes with partly diverging gene content can be distinguished. Their genetic diversity is incongruent with variations in the rest of the genome, suggesting that they are subject to horizontal gene transfer within H. pylori populations. We identified 40 distinct integration sites in 45 genome sequences, with a conserved heptanucleotide motif that seems to be the minimal requirement for integration. The significant number of possible integration sites, together with the requirement for a short conserved integration motif and the high level of gene conservation, indicates that these elements are best described as integrating conjugative elements (ICEs) with an intermediate integration site specificity.

  10. Evolutionary conservation, diversity and specificity of LTR retrotransposons in flowering plants: insights from genome-wide analysis and multi-specific comparison

    USDA-ARS?s Scientific Manuscript database

    The availability of complete or nearly complete genome sequences from several plant species permits detailed discovery and cross-species comparison of transposable elements (TEs) at the whole genome level. We initially investigated 510 LTR-retrotransposon (LTR-RT) families that are comprised of 32,...

  11. Comprehensive analysis of single molecule sequencing-derived complete genome and whole transcriptome of Hyposidra talaca nuclear polyhedrosis virus.

    PubMed

    Nguyen, Thong T; Suryamohan, Kushal; Kuriakose, Boney; Janakiraman, Vasantharajan; Reichelt, Mike; Chaudhuri, Subhra; Guillory, Joseph; Divakaran, Neethu; Rabins, P E; Goel, Ridhi; Deka, Bhabesh; Sarkar, Suman; Ekka, Preety; Tsai, Yu-Chih; Vargas, Derek; Santhosh, Sam; Mohan, Sangeetha; Chin, Chen-Shan; Korlach, Jonas; Thomas, George; Babu, Azariah; Seshagiri, Somasekar

    2018-06-12

    We sequenced the Hyposidra talaca NPV (HytaNPV) double stranded circular DNA genome using PacBio single molecule sequencing technology. We found that the HytaNPV genome is 139,089 bp long with a GC content of 39.6%. It encodes 141 open reading frames (ORFs) including the 37 baculovirus core genes, 25 genes conserved among lepidopteran baculoviruses, 72 genes known in baculovirus, and 7 genes unique to the HytaNPV genome. It is a group II alphabaculovirus that codes for the F protein and lacks the gp64 gene found in group I alphabaculovirus viruses. Using RNA-seq, we confirmed the expression of the ORFs identified in the HytaNPV genome. Phylogenetic analysis showed HytaNPV to be closest to BusuNPV, SujuNPV and EcobNPV that infect other tea pests, Buzura suppressaria, Sucra jujuba, and Ectropis oblique, respectively. We identified repeat elements and a conserved non-coding baculovirus element in the genome. Analysis of the putative promoter sequences identified motif consistent with the temporal expression of the genes observed in the RNA-seq data.

  12. G-Anchor: a novel approach for whole-genome comparative mapping utilizing evolutionary conserved DNA sequences.

    PubMed

    Lenis, Vasileios Panagiotis E; Swain, Martin; Larkin, Denis M

    2018-05-01

    Cross-species whole-genome sequence alignment is a critical first step for genome comparative analyses, ranging from the detection of sequence variants to studies of chromosome evolution. Animal genomes are large and complex, and whole-genome alignment is a computationally intense process, requiring expensive high-performance computing systems due to the need to explore extensive local alignments. With hundreds of sequenced animal genomes available from multiple projects, there is an increasing demand for genome comparative analyses. Here, we introduce G-Anchor, a new, fast, and efficient pipeline that uses a strictly limited but highly effective set of local sequence alignments to anchor (or map) an animal genome to another species' reference genome. G-Anchor makes novel use of a databank of highly conserved DNA sequence elements. We demonstrate how these elements may be aligned to a pair of genomes, creating anchors. These anchors enable the rapid mapping of scaffolds from a de novo assembled genome to chromosome assemblies of a reference species. Our results demonstrate that G-Anchor can successfully anchor a vertebrate genome onto a phylogenetically related reference species genome using a desktop or laptop computer within a few hours and with comparable accuracy to that achieved by a highly accurate whole-genome alignment tool such as LASTZ. G-Anchor thus makes whole-genome comparisons accessible to researchers with limited computational resources. G-Anchor is a ready-to-use tool for anchoring a pair of vertebrate genomes. It may be used with large genomes that contain a significant fraction of evolutionally conserved DNA sequences and that are not highly repetitive, polypoid, or excessively fragmented. G-Anchor is not a substitute for whole-genome aligning software but can be used for fast and accurate initial genome comparisons. G-Anchor is freely available and a ready-to-use tool for the pairwise comparison of two genomes.

  13. Principles of regulatory information conservation between mouse and human.

    PubMed

    Cheng, Yong; Ma, Zhihai; Kim, Bong-Hyun; Wu, Weisheng; Cayting, Philip; Boyle, Alan P; Sundaram, Vasavi; Xing, Xiaoyun; Dogan, Nergiz; Li, Jingjing; Euskirchen, Ghia; Lin, Shin; Lin, Yiing; Visel, Axel; Kawli, Trupti; Yang, Xinqiong; Patacsil, Dorrelyn; Keller, Cheryl A; Giardine, Belinda; Kundaje, Anshul; Wang, Ting; Pennacchio, Len A; Weng, Zhiping; Hardison, Ross C; Snyder, Michael P

    2014-11-20

    To broaden our understanding of the evolution of gene regulation mechanisms, we generated occupancy profiles for 34 orthologous transcription factors (TFs) in human-mouse erythroid progenitor, lymphoblast and embryonic stem-cell lines. By combining the genome-wide transcription factor occupancy repertoires, associated epigenetic signals, and co-association patterns, here we deduce several evolutionary principles of gene regulatory features operating since the mouse and human lineages diverged. The genomic distribution profiles, primary binding motifs, chromatin states, and DNA methylation preferences are well conserved for TF-occupied sequences. However, the extent to which orthologous DNA segments are bound by orthologous TFs varies both among TFs and with genomic location: binding at promoters is more highly conserved than binding at distal elements. Notably, occupancy-conserved TF-occupied sequences tend to be pleiotropic; they function in several tissues and also co-associate with many TFs. Single nucleotide variants at sites with potential regulatory functions are enriched in occupancy-conserved TF-occupied sequences.

  14. Genomic regulatory blocks encompass multiple neighboring genes and maintain conserved synteny in vertebrates

    PubMed Central

    Kikuta, Hiroshi; Laplante, Mary; Navratilova, Pavla; Komisarczuk, Anna Z.; Engström, Pär G.; Fredman, David; Akalin, Altuna; Caccamo, Mario; Sealy, Ian; Howe, Kerstin; Ghislain, Julien; Pezeron, Guillaume; Mourrain, Philippe; Ellingsen, Staale; Oates, Andrew C.; Thisse, Christine; Thisse, Bernard; Foucher, Isabelle; Adolf, Birgit; Geling, Andrea; Lenhard, Boris; Becker, Thomas S.

    2007-01-01

    We report evidence for a mechanism for the maintenance of long-range conserved synteny across vertebrate genomes. We found the largest mammal-teleost conserved chromosomal segments to be spanned by highly conserved noncoding elements (HCNEs), their developmental regulatory target genes, and phylogenetically and functionally unrelated “bystander” genes. Bystander genes are not specifically under the control of the regulatory elements that drive the target genes and are expressed in patterns that are different from those of the target genes. Reporter insertions distal to zebrafish developmental regulatory genes pax6.1/2, rx3, id1, and fgf8 and miRNA genes mirn9-1 and mirn9-5 recapitulate the expression patterns of these genes even if located inside or beyond bystander genes, suggesting that the regulatory domain of a developmental regulatory gene can extend into and beyond adjacent transcriptional units. We termed these chromosomal segments genomic regulatory blocks (GRBs). After whole genome duplication in teleosts, GRBs, including HCNEs and target genes, were often maintained in both copies, while bystander genes were typically lost from one GRB, strongly suggesting that evolutionary pressure acts to keep the single-copy GRBs of higher vertebrates intact. We show that loss of bystander genes and other mutational events suffered by duplicated GRBs in teleost genomes permits target gene identification and HCNE/target gene assignment. These findings explain the absence of evolutionary breakpoints from large vertebrate chromosomal segments and will aid in the recognition of position effect mutations within human GRBs. PMID:17387144

  15. Whole-genome sequence of the Tibetan frog Nanorana parkeri and the comparative evolution of tetrapod genomes.

    PubMed

    Sun, Yan-Bo; Xiong, Zi-Jun; Xiang, Xue-Yan; Liu, Shi-Ping; Zhou, Wei-Wei; Tu, Xiao-Long; Zhong, Li; Wang, Lu; Wu, Dong-Dong; Zhang, Bao-Lin; Zhu, Chun-Ling; Yang, Min-Min; Chen, Hong-Man; Li, Fang; Zhou, Long; Feng, Shao-Hong; Huang, Chao; Zhang, Guo-Jie; Irwin, David; Hillis, David M; Murphy, Robert W; Yang, Huan-Ming; Che, Jing; Wang, Jun; Zhang, Ya-Ping

    2015-03-17

    The development of efficient sequencing techniques has resulted in large numbers of genomes being available for evolutionary studies. However, only one genome is available for all amphibians, that of Xenopus tropicalis, which is distantly related from the majority of frogs. More than 96% of frogs belong to the Neobatrachia, and no genome exists for this group. This dearth of amphibian genomes greatly restricts genomic studies of amphibians and, more generally, our understanding of tetrapod genome evolution. To fill this gap, we provide the de novo genome of a Tibetan Plateau frog, Nanorana parkeri, and compare it to that of X. tropicalis and other vertebrates. This genome encodes more than 20,000 protein-coding genes, a number similar to that of Xenopus. Although the genome size of Nanorana is considerably larger than that of Xenopus (2.3 vs. 1.5 Gb), most of the difference is due to the respective number of transposable elements in the two genomes. The two frogs exhibit considerable conserved whole-genome synteny despite having diverged approximately 266 Ma, indicating a slow rate of DNA structural evolution in anurans. Multigenome synteny blocks further show that amphibians have fewer interchromosomal rearrangements than mammals but have a comparable rate of intrachromosomal rearrangements. Our analysis also identifies 11 Mb of anuran-specific highly conserved elements that will be useful for comparative genomic analyses of frogs. The Nanorana genome offers an improved understanding of evolution of tetrapod genomes and also provides a genomic reference for other evolutionary studies.

  16. Functional noncoding sequences derived from SINEs in the mammalian genome.

    PubMed

    Nishihara, Hidenori; Smit, Arian F A; Okada, Norihiro

    2006-07-01

    Recent comparative analyses of mammalian sequences have revealed that a large number of nonprotein-coding genomic regions are under strong selective constraint. Here, we report that some of these loci have been derived from a newly defined family of ancient SINEs (short interspersed repetitive elements). This is a surprising result, as SINEs and other transposable elements are commonly thought to be genomic parasites. We named the ancient SINE family AmnSINE1, for Amniota SINE1, because we found it to be present in mammals as well as in birds, and some copies predate the mammalian-bird split 310 million years ago (Mya). AmnSINE1 has a chimeric structure of a 5S rRNA and a tRNA-derived SINE, and is related to five tRNA-derived SINE families that we characterized here in the coelacanth, dogfish shark, hagfish, and amphioxus genomes. All of the newly described SINE families have a common central domain that is also shared by zebrafish SINE3, and we collectively name them the DeuSINE (Deuterostomia SINE) superfamily. Notably, of the approximately 1000 still identifiable copies of AmnSINE1 in the human genome, 105 correspond to loci phylogenetically highly conserved among mammalian orthologs. The conservation is strongest over the central domain. Thus, AmnSINE1 appears to be the best example of a transposable element of which a significant fraction of the copies have acquired genomic functionality.

  17. Genetic evidence for conserved non-coding element function across species–the ears have it

    PubMed Central

    Turner, Eric E.; Cox, Timothy C.

    2014-01-01

    Comparison of genomic sequences from diverse vertebrate species has revealed numerous highly conserved regions that do not appear to encode proteins or functional RNAs. Often these “conserved non-coding elements,” or CNEs, can direct gene expression to specific tissues in transgenic models, demonstrating they have regulatory function. CNEs are frequently found near “developmental” genes, particularly transcription factors, implying that these elements have essential regulatory roles in development. However, actual examples demonstrating CNE regulatory functions across species have been few, and recent loss-of-function studies of several CNEs in mice have shown relatively minor effects. In this Perspectives article, we discuss new findings in “fancy” rats and Highland cattle demonstrating that function of a CNE near the Hmx1 gene is crucial for normal external ear development and when disrupted can mimic loss-of function Hmx1 coding mutations in mice and humans. These findings provide important support for conserved developmental roles of CNEs in divergent species, and reinforce the concept that CNEs should be examined systematically in the ongoing search for genetic causes of human developmental disorders in the era of genome-scale sequencing. PMID:24478720

  18. Allele frequencies of variants in ultra conserved elements identify selective pressure on transcription factor binding.

    PubMed

    Silla, Toomas; Kepp, Katrin; Tai, E Shyong; Goh, Liang; Davila, Sonia; Catela Ivkovic, Tina; Calin, George A; Voorhoeve, P Mathijs

    2014-01-01

    Ultra-conserved genes or elements (UCGs/UCEs) in the human genome are extreme examples of conservation. We characterized natural variations in 2884 UCEs and UCGs in two distinct populations; Singaporean Chinese (n = 280) and Italian (n = 501) by using a pooled sample, targeted capture, sequencing approach. We identify, with high confidence, in these regions the abundance of rare SNVs (MAF<0.5%) of which 75% is not present in dbSNP137. UCEs association studies for complex human traits can use this information to model expected background variation and thus necessary power for association studies. By combining our data with 1000 Genome Project data, we show in three independent datasets that prevalent UCE variants (MAF>5%) are more often found in relatively less-conserved nucleotides within UCEs, compared to rare variants. Moreover, prevalent variants are less likely to overlap transcription factor binding site. Using SNPfold we found no significant influence of RNA secondary structure on UCE conservation. All together, these results suggest UCEs are not under selective pressure as a stretch of DNA but are under differential evolutionary pressure on the single nucleotide level.

  19. Transposable elements in Drosophila.

    PubMed

    McCullers, Tabitha J; Steiniger, Mindy

    2017-01-01

    Transposable elements (TEs) are mobile genetic elements that can mobilize within host genomes. As TEs comprise more than 40% of the human genome and are linked to numerous diseases, understanding their mechanisms of mobilization and regulation is important. Drosophila melanogaster is an ideal model organism for the study of eukaryotic TEs as its genome contains a diverse array of active TEs. TEs universally impact host genome size via transposition and deletion events, but may also adopt unique functional roles in host organisms. There are 2 main classes of TEs: DNA transposons and retrotransposons. These classes are further divided into subgroups of TEs with unique structural and functional characteristics, demonstrating the significant variability among these elements. Despite this variability, D. melanogaster and other eukaryotic organisms utilize conserved mechanisms to regulate TEs. This review focuses on the transposition mechanisms and regulatory pathways of TEs, and their functional roles in D. melanogaster .

  20. Transposable elements in Drosophila

    PubMed Central

    McCullers, Tabitha J.; Steiniger, Mindy

    2017-01-01

    ABSTRACT Transposable elements (TEs) are mobile genetic elements that can mobilize within host genomes. As TEs comprise more than 40% of the human genome and are linked to numerous diseases, understanding their mechanisms of mobilization and regulation is important. Drosophila melanogaster is an ideal model organism for the study of eukaryotic TEs as its genome contains a diverse array of active TEs. TEs universally impact host genome size via transposition and deletion events, but may also adopt unique functional roles in host organisms. There are 2 main classes of TEs: DNA transposons and retrotransposons. These classes are further divided into subgroups of TEs with unique structural and functional characteristics, demonstrating the significant variability among these elements. Despite this variability, D. melanogaster and other eukaryotic organisms utilize conserved mechanisms to regulate TEs. This review focuses on the transposition mechanisms and regulatory pathways of TEs, and their functional roles in D. melanogaster. PMID:28580197

  1. Deep conservation of cis-regulatory elements in metazoans

    PubMed Central

    Maeso, Ignacio; Irimia, Manuel; Tena, Juan J.; Casares, Fernando; Gómez-Skarmeta, José Luis

    2013-01-01

    Despite the vast morphological variation observed across phyla, animals share multiple basic developmental processes orchestrated by a common ancestral gene toolkit. These genes interact with each other building complex gene regulatory networks (GRNs), which are encoded in the genome by cis-regulatory elements (CREs) that serve as computational units of the network. Although GRN subcircuits involved in ancient developmental processes are expected to be at least partially conserved, identification of CREs that are conserved across phyla has remained elusive. Here, we review recent studies that revealed such deeply conserved CREs do exist, discuss the difficulties associated with their identification and describe new approaches that will facilitate this search. PMID:24218633

  2. Natural Selection and Functional Potentials of Human Noncoding Elements Revealed by Analysis of Next Generation Sequencing Data

    PubMed Central

    Xu, Shuhua

    2015-01-01

    Noncoding DNA sequences (NCS) have attracted much attention recently due to their functional potentials. Here we attempted to reveal the functional roles of noncoding sequences from the point of view of natural selection that typically indicates the functional potentials of certain genomic elements. We analyzed nearly 37 million single nucleotide polymorphisms (SNPs) of Phase I data of the 1000 Genomes Project. We estimated a series of key parameters of population genetics and molecular evolution to characterize sequence variations of the noncoding genome within and between populations, and identified the natural selection footprints in NCS in worldwide human populations. Our results showed that purifying selection is prevalent and there is substantial constraint of variations in NCS, while positive selectionis more likely to be specific to some particular genomic regions and regional populations. Intriguingly, we observed larger fraction of non-conserved NCS variants with lower derived allele frequency in the genome, indicating possible functional gain of non-conserved NCS. Notably, NCS elements are enriched for potentially functional markers such as eQTLs, TF motif, and DNase I footprints in the genome. More interestingly, some NCS variants associated with diseases such as Alzheimer's disease, Type 1 diabetes, and immune-related bowel disorder (IBD) showed signatures of positive selection, although the majority of NCS variants, reported as risk alleles by genome-wide association studies, showed signatures of negative selection. Our analyses provided compelling evidence of natural selection forces on noncoding sequences in the human genome and advanced our understanding of their functional potentials that play important roles in disease etiology and human evolution. PMID:26053627

  3. Long-Range Control of Gene Expression: Emerging Mechanisms and Disruption in Disease

    PubMed Central

    Kleinjan, Dirk A.; van Heyningen, Veronica

    2005-01-01

    Transcriptional control is a major mechanism for regulating gene expression. The complex machinery required to effect this control is still emerging from functional and evolutionary analysis of genomic architecture. In addition to the promoter, many other regulatory elements are required for spatiotemporally and quantitatively correct gene expression. Enhancer and repressor elements may reside in introns or up- and downstream of the transcription unit. For some genes with highly complex expression patterns—often those that function as key developmental control genes—the cis-regulatory domain can extend long distances outside the transcription unit. Some of the earliest hints of this came from disease-associated chromosomal breaks positioned well outside the relevant gene. With the availability of wide-ranging genome sequence comparisons, strong conservation of many noncoding regions became obvious. Functional studies have shown many of these conserved sites to be transcriptional regulatory elements that sometimes reside inside unrelated neighboring genes. Such sequence-conserved elements generally harbor sites for tissue-specific DNA-binding proteins. Developmentally variable chromatin conformation can control protein access to these sites and can regulate transcription. Disruption of these finely tuned mechanisms can cause disease. Some regulatory element mutations will be associated with phenotypes distinct from any identified for coding-region mutations. PMID:15549674

  4. The Autographa californica Multiple Nucleopolyhedrovirus ac83 Gene Contains a cis-Acting Element That Is Essential for Nucleocapsid Assembly.

    PubMed

    Huang, Zhihong; Pan, Mengjia; Zhu, Silei; Zhang, Hao; Wu, Wenbi; Yuan, Meijin; Yang, Kai

    2017-03-01

    Baculoviridae is a family of insect-specific viruses that have a circular double-stranded DNA genome packaged within a rod-shaped capsid. The mechanism of baculovirus nucleocapsid assembly remains unclear. Previous studies have shown that deletion of the ac83 gene of Autographa californica multiple nucleopolyhedrovirus (AcMNPV) blocks viral nucleocapsid assembly. Interestingly, the ac83 -encoded protein Ac83 is not a component of the nucleocapsid, implying a particular role for ac83 in nucleocapsid assembly that may be independent of its protein product. To examine this possibility, Ac83 synthesis was disrupted by insertion of a chloramphenicol resistance gene into its coding sequence or by deleting its promoter and translation start codon. Both mutants produced progeny viruses normally, indicating that the Ac83 protein is not required for nucleocapsid assembly. Subsequently, complementation assays showed that the production of progeny viruses required the presence of ac83 in the AcMNPV genome instead of its presence in trans Therefore, we reasoned that ac83 is involved in nucleocapsid assembly via an internal cis -acting element, which we named the nucleocapsid assembly-essential element (NAE). The NAE was identified to lie within nucleotides 1651 to 1850 of ac83 and had 8 conserved A/T-rich regions. Sequences homologous to the NAE were found only in alphabaculoviruses and have a conserved positional relationship with another essential cis -acting element that was recently identified. The identification of the NAE may help to connect the data of viral cis -acting elements and related proteins in the baculovirus nucleocapsid assembly, which is important for elucidating DNA-protein interaction events during this process. IMPORTANCE Virus nucleocapsid assembly usually requires specific cis -acting elements in the viral genome for various processes, such as the selection of the viral genome from the cellular nucleic acids, the cleavage of concatemeric viral genome replication intermediates, and the encapsidation of the viral genome into procapsids. In linear DNA viruses, such elements generally locate at the ends of the viral genome; however, most of these elements remain unidentified in circular DNA viruses (including baculovirus) due to their circular genomic conformation. Here, we identified a nucleocapsid assembly-essential element in the AcMNPV (the archetype of baculovirus) genome. This finding provides an important reference for studies of nucleocapsid assembly-related elements in baculoviruses and other circular DNA viruses. Moreover, as most of the previous studies of baculovirus nucleocapsid assembly have been focused on viral proteins, our study provides a novel entry point to investigate this mechanism via cis -acting elements in the viral genome. Copyright © 2017 American Society for Microbiology.

  5. Transposable element distribution, abundance and role in genome size variation in the genus Oryza.

    PubMed

    Zuccolo, Andrea; Sebastian, Aswathy; Talag, Jayson; Yu, Yeisoo; Kim, HyeRan; Collura, Kristi; Kudrna, Dave; Wing, Rod A

    2007-08-29

    The genus Oryza is composed of 10 distinct genome types, 6 diploid and 4 polyploid, and includes the world's most important food crop - rice (Oryza sativa [AA]). Genome size variation in the Oryza is more than 3-fold and ranges from 357 Mbp in Oryza glaberrima [AA] to 1283 Mbp in the polyploid Oryza ridleyi [HHJJ]. Because repetitive elements are known to play a significant role in genome size variation, we constructed random sheared small insert genomic libraries from 12 representative Oryza species and conducted a comprehensive study of the repetitive element composition, distribution and phylogeny in this genus. Particular attention was paid to the role played by the most important classes of transposable elements (Long Terminal Repeats Retrotransposons, Long interspersed Nuclear Elements, helitrons, DNA transposable elements) in shaping these genomes and in their contributing to genome size variation. We identified the elements primarily responsible for the most strikingly genome size variation in Oryza. We demonstrated how Long Terminal Repeat retrotransposons belonging to the same families have proliferated to very different extents in various species. We also showed that the pool of Long Terminal Repeat Retrotransposons is substantially conserved and ubiquitous throughout the Oryza and so its origin is ancient and its existence predates the speciation events that originated the genus. Finally we described the peculiar behavior of repeats in the species Oryza coarctata [HHKK] whose placement in the Oryza genus is controversial. Long Terminal Repeat retrotransposons are the major component of the Oryza genomes analyzed and, along with polyploidization, are the most important contributors to the genome size variation across the Oryza genus. Two families of Ty3-gypsy elements (RIRE2 and Atlantys) account for a significant portion of the genome size variations present in the Oryza genus.

  6. Human Variation in Short Regions Predisposed to Deep Evolutionary Conservation

    PubMed Central

    Loots, Gabriela G.; Ovcharenko, Ivan

    2010-01-01

    The landscape of the human genome consists of millions of short islands of conservation that are 100% conserved across multiple vertebrate genomes (termed “bricks”), the majority of which are located in noncoding regions. Several hundred thousand bricks are deeply conserved reaching the genomes of amphibians and fish. Deep phylogenetic conservation of noncoding DNA has been reported to be strongly associated with the presence of gene regulatory elements, introducing bricks as a proxy to the functional noncoding landscape of the human genome. Here, we report a significant overrepresentation of bricks in the promoters of transcription factors and developmental genes, where the high level of phylogenetic conservation correlates with an increase in brick overrepresentation. We also found that the presence of a brick dictates a predisposition to evolutionary constraint, with only 0.7% of the amniota brick central nucleotides being diverged within the primate lineage—an 11-fold reduction in the divergence rate compared with random expectation. Human single-nucleotide polymorphism (SNP) data explains only 3% of primate-specific variation in amniota bricks, thus arguing for a widespread fixation of brick mutations within the primate lineage and prior to human radiation. This variation, in turn, might have been utilized as a driving force for primate- and hominoid-specific adaptation. We also discovered a pronounced deviation from the evolutionary predisposition in the human lineage, with over 20-fold increase in the substitution rate at brick SNP sites over expected values. In addition, contrary to typical brick mutations, brick variation commonly encountered in the human population displays limited, if any, signatures of negative selection as measured by the minor allele frequency and population differentiation (F-statistical measure) measures. These observations argue for the plasticity of gene regulatory mechanisms in vertebrates—with evidence of strong purifying selection acting on the gene regulatory landscape of the human genome, where widespread advantageous mutations in putative regulatory elements are likely utilized in functional diversification and adaptation of species. PMID:20093432

  7. Characterization of the repetitive DNA elements in the genome of fish lymphocystis disease viruses.

    PubMed

    Schnitzler, P; Darai, G

    1989-09-01

    The complete DNA nucleotide sequence of the repetitive DNA elements in the genome of fish lymphocystis disease virus (FLDV) isolated from two different species (flounder and dab) was determined. The size of these repetitive DNA elements was found to be 1413 bp which corresponds to the DNA sequences of the 5' terminus of the EcoRI DNA fragment B (0.034 to 0.052 m.u.) and to the EcoRI DNA fragment M (0.718 to 0.736 m.u.) of the FLDV genome causing lymphocystis disease in flounder and plaice. The degree of DNA nucleotide homology between both regions was found to be 99%. The repetitive DNA element in the genome of FLDV isolated from other fish species (dab) was identified and is located within the EcoRI DNA fragment B and J of the viral genome. The DNA nucleotide sequence of one duplicate of this repetition (EcoRI DNA fragment J) was determined (1410 bp) and compared to the DNA nucleotide sequences of the repetitive DNA elements of the genome of FLDV isolated from flounder. It was found that the repetitive DNA elements of the genome of FLDV derived from two different fish species are highly conserved and possess a degree of DNA sequence homology of 94%. The DNA sequences of each strand of the individual repetitive element possess one open reading frame.

  8. Evolutionary Genomics of an Ancient Prophage of the Order Sphingomonadales

    PubMed Central

    Viswanathan, Vandana; Narjala, Anushree; Ravichandran, Aravind; Jayaprasad, Suvratha

    2017-01-01

    The order Sphingomonadales, containing the families Erythrobacteraceae and Sphingomonadaceae, is a relatively less well-studied phylogenetic branch within the class Alphaproteobacteria. Prophage elements are present in most bacterial genomes and are important determinants of adaptive evolution. An “intact” prophage was predicted within the genome of Sphingomonas hengshuiensis strain WHSC-8 and was designated Prophage IWHSC-8. Loci homologous to the region containing the first 22 open reading frames (ORFs) of Prophage IWHSC-8 were discovered among the genomes of numerous Sphingomonadales. In 17 genomes, the homologous loci were co-located with an ORF encoding a putative superoxide dismutase. Several other lines of molecular evidence implied that these homologous loci represent an ancient temperate bacteriophage integration, and this horizontal transfer event pre-dated niche-based speciation within the order Sphingomonadales. The “stabilization” of prophages in the genomes of their hosts is an indicator of “fitness” conferred by these elements and natural selection. Among the various ORFs predicted within the conserved prophages, an ORF encoding a putative proline-rich outer membrane protein A was consistently present among the genomes of many Sphingomonadales. Furthermore, the conserved prophages in six Sphingomonas sp. contained an ORF encoding a putative spermidine synthase. It is possible that one or more of these ORFs bestow selective fitness, and thus the prophages continue to be vertically transferred within the host strains. Although conserved prophages have been identified previously among closely related genera and species, this is the first systematic and detailed description of orthologous prophages at the level of an order that contains two diverse families and many pigmented species. PMID:28201618

  9. Principles of regulatory information conservation between mouse and human

    DOE PAGES

    Cheng, Yong; Ma, Zhihai; Kim, Bong-Hyun; ...

    2014-11-19

    To broaden our understanding of the evolution of gene regulation mechanisms, we generated occupancy profiles for 34 orthologous transcription factors (TFs) in human–mouse erythroid progenitor, lymphoblast and embryonic stem-cell lines. By combining the genome-wide transcription factor occupancy repertoires, associated epigenetic signals, and co-association patterns, here we deduce several evolutionary principles of gene regulatory features operating since the mouse and human lineages diverged. The genomic distribution profiles, primary binding motifs, chromatin states, and DNA methylation preferences are well conserved for TF-occupied sequences. However, the extent to which orthologous DNA segments are bound by orthologous TFs varies both among TFs and withmore » genomic location: binding at promoters is more highly conserved than binding at distal elements. Notably, occupancy-conserved TF-occupied sequences tend to be pleiotropic; they function in several tissues and also co-associate with many TFs. Lastly, single nucleotide variants at sites with potential regulatory functions are enriched in occupancy-conserved TF-occupied sequences.« less

  10. The Genome of the Western Clawed Frog Xenopus tropicalis

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hellsten, Uffe; Harland, Richard M.; Gilchrist, Michael J.

    2009-10-01

    The western clawed frog Xenopus tropicalis is an important model for vertebrate development that combines experimental advantages of the African clawed frog Xenopus laevis with more tractable genetics. Here we present a draft genome sequence assembly of X. tropicalis. This genome encodes over 20,000 protein-coding genes, including orthologs of at least 1,700 human disease genes. Over a million expressed sequence tags validated the annotation. More than one-third of the genome consists of transposable elements, with unusually prevalent DNA transposons. Like other tetrapods, the genome contains gene deserts enriched for conserved non-coding elements. The genome exhibits remarkable shared synteny with humanmore » and chicken over major parts of large chromosomes, broken by lineage-specific chromosome fusions and fissions, mainly in the mammalian lineage.« less

  11. Novel cis-acting element within the capsid-coding region enhances flavivirus viral-RNA replication by regulating genome cyclization.

    PubMed

    Liu, Zhong-Yu; Li, Xiao-Feng; Jiang, Tao; Deng, Yong-Qiang; Zhao, Hui; Wang, Hong-Jiang; Ye, Qing; Zhu, Shun-Ya; Qiu, Yang; Zhou, Xi; Qin, E-De; Qin, Cheng-Feng

    2013-06-01

    cis-Acting elements in the viral genome RNA (vRNA) are essential for the translation, replication, and/or encapsidation of RNA viruses. In this study, a novel conserved cis-acting element was identified in the capsid-coding region of mosquito-borne flavivirus. The downstream of 5' cyclization sequence (5'CS) pseudoknot (DCS-PK) element has a three-stem pseudoknot structure, as demonstrated by structure prediction and biochemical analysis. Using dengue virus as a model, we show that DCS-PK enhances vRNA replication and that its function depends on its secondary structure and specific primary sequence. Mutagenesis revealed that the highly conserved stem 1 and loop 2, which are involved in potential loop-helix interactions, are crucial for DCS-PK function. A predicted loop 1-stem 3 base triple interaction is important for the structural stability and function of DCS-PK. Moreover, the function of DCS-PK depends on its position relative to the 5'CS, and the presence of DCS-PK facilitates the formation of 5'-3' RNA complexes. Taken together, our results reveal that the cis-acting element DCS-PK enhances vRNA replication by regulating genome cyclization, and DCS-PK might interplay with other cis-acting elements to form a functional vRNA cyclization domain, thus playing critical roles during the flavivirus life cycle and evolution.

  12. Novel cis-Acting Element within the Capsid-Coding Region Enhances Flavivirus Viral-RNA Replication by Regulating Genome Cyclization

    PubMed Central

    Liu, Zhong-Yu; Li, Xiao-Feng; Jiang, Tao; Deng, Yong-Qiang; Zhao, Hui; Wang, Hong-Jiang; Ye, Qing; Zhu, Shun-Ya; Qiu, Yang; Zhou, Xi; Qin, E-De

    2013-01-01

    cis-Acting elements in the viral genome RNA (vRNA) are essential for the translation, replication, and/or encapsidation of RNA viruses. In this study, a novel conserved cis-acting element was identified in the capsid-coding region of mosquito-borne flavivirus. The downstream of 5′ cyclization sequence (5′CS) pseudoknot (DCS-PK) element has a three-stem pseudoknot structure, as demonstrated by structure prediction and biochemical analysis. Using dengue virus as a model, we show that DCS-PK enhances vRNA replication and that its function depends on its secondary structure and specific primary sequence. Mutagenesis revealed that the highly conserved stem 1 and loop 2, which are involved in potential loop-helix interactions, are crucial for DCS-PK function. A predicted loop 1-stem 3 base triple interaction is important for the structural stability and function of DCS-PK. Moreover, the function of DCS-PK depends on its position relative to the 5′CS, and the presence of DCS-PK facilitates the formation of 5′-3′ RNA complexes. Taken together, our results reveal that the cis-acting element DCS-PK enhances vRNA replication by regulating genome cyclization, and DCS-PK might interplay with other cis-acting elements to form a functional vRNA cyclization domain, thus playing critical roles during the flavivirus life cycle and evolution. PMID:23576500

  13. Genomic Sequence around Butterfly Wing Development Genes: Annotation and Comparative Analysis

    PubMed Central

    Conceição, Inês C.; Long, Anthony D.; Gruber, Jonathan D.; Beldade, Patrícia

    2011-01-01

    Background Analysis of genomic sequence allows characterization of genome content and organization, and access beyond gene-coding regions for identification of functional elements. BAC libraries, where relatively large genomic regions are made readily available, are especially useful for species without a fully sequenced genome and can increase genomic coverage of phylogenetic and biological diversity. For example, no butterfly genome is yet available despite the unique genetic and biological properties of this group, such as diversified wing color patterns. The evolution and development of these patterns is being studied in a few target species, including Bicyclus anynana, where a whole-genome BAC library allows targeted access to large genomic regions. Methodology/Principal Findings We characterize ∼1.3 Mb of genomic sequence around 11 selected genes expressed in B. anynana developing wings. Extensive manual curation of in silico predictions, also making use of a large dataset of expressed genes for this species, identified repetitive elements and protein coding sequence, and highlighted an expansion of Alcohol dehydrogenase genes. Comparative analysis with orthologous regions of the lepidopteran reference genome allowed assessment of conservation of fine-scale synteny (with detection of new inversions and translocations) and of DNA sequence (with detection of high levels of conservation of non-coding regions around some, but not all, developmental genes). Conclusions The general properties and organization of the available B. anynana genomic sequence are similar to the lepidopteran reference, despite the more than 140 MY divergence. Our results lay the groundwork for further studies of new interesting findings in relation to both coding and non-coding sequence: 1) the Alcohol dehydrogenase expansion with higher similarity between the five tandemly-repeated B. anynana paralogs than with the corresponding B. mori orthologs, and 2) the high conservation of non-coding sequence around the genes wingless and Ecdysone receptor, both involved in multiple developmental processes including wing pattern formation. PMID:21909358

  14. RNA structural constraints in the evolution of the influenza A virus genome NP segment

    PubMed Central

    Gultyaev, Alexander P; Tsyganov-Bodounov, Anton; Spronken, Monique IJ; van der Kooij, Sander; Fouchier, Ron AM; Olsthoorn, René CL

    2014-01-01

    Conserved RNA secondary structures were predicted in the nucleoprotein (NP) segment of the influenza A virus genome using comparative sequence and structure analysis. A number of structural elements exhibiting nucleotide covariations were identified over the whole segment length, including protein-coding regions. Calculations of mutual information values at the paired nucleotide positions demonstrate that these structures impose considerable constraints on the virus genome evolution. Functional importance of a pseudoknot structure, predicted in the NP packaging signal region, was confirmed by plaque assays of the mutant viruses with disrupted structure and those with restored folding using compensatory substitutions. Possible functions of the conserved RNA folding patterns in the influenza A virus genome are discussed. PMID:25180940

  15. pTC Plasmids from Sulfolobus Species in the Geothermal Area of Tengchong, China: Genomic Conservation and Naturally-Occurring Variations as a Result of Transposition by Mobile Genetic Elements

    PubMed Central

    Xiang, Xiaoyu; Huang, Xiaoxing; Wang, Haina; Huang, Li

    2015-01-01

    Plasmids occur frequently in Archaea. A novel plasmid (denoted pTC1) containing typical conjugation functions has been isolated from Sulfolobus tengchongensis RT8-4, a strain obtained from a hot spring in Tengchong, China, and characterized. The plasmid is a circular double-stranded DNA molecule of 20,417 bp. Among a total of 26 predicted pTC1 ORFs, 23 have homologues in other known Sulfolobus conjugative plasmids (CPs). pTC1 resembles other Sulfolobus CPs in genome architecture, and is most highly conserved in the genomic region encoding conjugation functions. However, attempts to demonstrate experimentally the capacity of the plasmid for conjugational transfer were unsuccessful. A survey revealed that pTC1 and its closely related plasmid variants were widespread in the geothermal area of Tengchong. Variations of the plasmids at the target sites for transposition by an insertion sequence (IS) and a miniature inverted-repeat transposable element (MITE) were readily detected. The IS was efficiently inserted into the pTC1 genome, and the inserted sequence was inactivated and degraded more frequently in an imprecise manner than in a precise manner. These results suggest that the host organism has evolved a strategy to maintain a balance between the insertion and elimination of mobile genetic elements to permit genomic plasticity while inhibiting their fast spreading. PMID:25686154

  16. pTC Plasmids from Sulfolobus Species in the Geothermal Area of Tengchong, China: Genomic Conservation and Naturally-Occurring Variations as a Result of Transposition by Mobile Genetic Elements.

    PubMed

    Xiang, Xiaoyu; Huang, Xiaoxing; Wang, Haina; Huang, Li

    2015-02-12

    Plasmids occur frequently in Archaea. A novel plasmid (denoted pTC1) containing typical conjugation functions has been isolated from Sulfolobus tengchongensis RT8-4, a strain obtained from a hot spring in Tengchong, China, and characterized. The plasmid is a circular double-stranded DNA molecule of 20,417 bp. Among a total of 26 predicted pTC1 ORFs, 23 have homologues in other known Sulfolobus conjugative plasmids (CPs). pTC1 resembles other Sulfolobus CPs in genome architecture, and is most highly conserved in the genomic region encoding conjugation functions. However, attempts to demonstrate experimentally the capacity of the plasmid for conjugational transfer were unsuccessful. A survey revealed that pTC1 and its closely related plasmid variants were widespread in the geothermal area of Tengchong. Variations of the plasmids at the target sites for transposition by an insertion sequence (IS) and a miniature inverted-repeat transposable element (MITE) were readily detected. The IS was efficiently inserted into the pTC1 genome, and the inserted sequence was inactivated and degraded more frequently in an imprecise manner than in a precise manner. These results suggest that the host organism has evolved a strategy to maintain a balance between the insertion and elimination of mobile genetic elements to permit genomic plasticity while inhibiting their fast spreading.

  17. Functional noncoding sequences derived from SINEs in the mammalian genome

    PubMed Central

    Nishihara, Hidenori; Smit, Arian F.A.; Okada, Norihiro

    2006-01-01

    Recent comparative analyses of mammalian sequences have revealed that a large number of nonprotein-coding genomic regions are under strong selective constraint. Here, we report that some of these loci have been derived from a newly defined family of ancient SINEs (short interspersed repetitive elements). This is a surprising result, as SINEs and other transposable elements are commonly thought to be genomic parasites. We named the ancient SINE family AmnSINE1, for Amniota SINE1, because we found it to be present in mammals as well as in birds, and some copies predate the mammalian-bird split 310 million years ago (Mya). AmnSINE1 has a chimeric structure of a 5S rRNA and a tRNA-derived SINE, and is related to five tRNA-derived SINE families that we characterized here in the coelacanth, dogfish shark, hagfish, and amphioxus genomes. All of the newly described SINE families have a common central domain that is also shared by zebrafish SINE3, and we collectively name them the DeuSINE (Deuterostomia SINE) superfamily. Notably, of the ∼1000 still identifiable copies of AmnSINE1 in the human genome, 105 correspond to loci phylogenetically highly conserved among mammalian orthologs. The conservation is strongest over the central domain. Thus, AmnSINE1 appears to be the best example of a transposable element of which a significant fraction of the copies have acquired genomic functionality. PMID:16717141

  18. Insights into Structural and Mechanistic Features of Viral IRES Elements

    PubMed Central

    Martinez-Salas, Encarnacion; Francisco-Velilla, Rosario; Fernandez-Chamorro, Javier; Embarek, Azman M.

    2018-01-01

    Internal ribosome entry site (IRES) elements are cis-acting RNA regions that promote internal initiation of protein synthesis using cap-independent mechanisms. However, distinct types of IRES elements present in the genome of various RNA viruses perform the same function despite lacking conservation of sequence and secondary RNA structure. Likewise, IRES elements differ in host factor requirement to recruit the ribosomal subunits. In spite of this diversity, evolutionarily conserved motifs in each family of RNA viruses preserve sequences impacting on RNA structure and RNA–protein interactions important for IRES activity. Indeed, IRES elements adopting remarkable different structural organizations contain RNA structural motifs that play an essential role in recruiting ribosomes, initiation factors and/or RNA-binding proteins using different mechanisms. Therefore, given that a universal IRES motif remains elusive, it is critical to understand how diverse structural motifs deliver functions relevant for IRES activity. This will be useful for understanding the molecular mechanisms beyond cap-independent translation, as well as the evolutionary history of these regulatory elements. Moreover, it could improve the accuracy to predict IRES-like motifs hidden in genome sequences. This review summarizes recent advances on the diversity and biological relevance of RNA structural motifs for viral IRES elements. PMID:29354113

  19. Identification of a Recently Active Mammalian SINE Derived from Ribosomal RNA

    PubMed Central

    Longo, Mark S.; Brown, Judy D.; Zhang, Chu; O’Neill, Michael J.; O’Neill, Rachel J.

    2015-01-01

    Complex eukaryotic genomes are riddled with repeated sequences whose derivation does not coincide with phylogenetic history and thus is often unknown. Among such sequences, the capacity for transcriptional activity coupled with the adaptive use of reverse transcription can lead to a diverse group of genomic elements across taxa, otherwise known as selfish elements or mobile elements. Short interspersed nuclear elements (SINEs) are nonautonomous mobile elements found in eukaryotic genomes, typically derived from cellular RNAs such as tRNAs, 7SL or 5S rRNA. Here, we identify and characterize a previously unknown SINE derived from the 3′-end of the large ribosomal subunit (LSU or 28S rDNA) and transcribed via RNA polymerase III. This new element, SINE28, is represented in low-copy numbers in the human reference genome assembly, wherein we have identified 27 discrete loci. Phylogenetic analysis indicates these elements have been transpositionally active within primate lineages as recently as 6 MYA while modern humans still carry transcriptionally active copies. Moreover, we have identified SINE28s in all currently available assembled mammalian genome sequences. Phylogenetic comparisons indicate that these elements are frequently rederived from the highly conserved LSU rRNA sequences in a lineage-specific manner. We propose that this element has not been previously recognized as a SINE given its high identity to the canonical LSU, and that SINE28 likely represents one of possibly many unidentified, active transposable elements within mammalian genomes. PMID:25637222

  20. Identification of the Intragenomic Promoter Controlling Hepatitis E Virus Subgenomic RNA Transcription.

    PubMed

    Ding, Qiang; Nimgaonkar, Ila; Archer, Nicholas F; Bram, Yaron; Heller, Brigitte; Schwartz, Robert E; Ploss, Alexander

    2018-05-08

    Approximately 20 million hepatitis E virus (HEV) infections occur annually in both developing and industrialized countries. Most infections are self-limiting, but they can lead to chronic infections and cirrhosis in immunocompromised patients, and death in pregnant women. The mechanisms of HEV replication remain incompletely understood due to scarcity of adequate experimental platforms. HEV undergoes asymmetric genome replication, but it produces an additional subgenomic (SG) RNA encoding the viral capsid and a viroporin in partially overlapping open reading frames. Using a novel transcomplementation system, we mapped the intragenomic subgenomic promoter regulating SG RNA synthesis. This cis -acting element is highly conserved across all eight HEV genotypes, and when the element is mutated, it abrogates particle assembly and release. Our work defines previously unappreciated viral regulatory elements and provides the first in-depth view of the intracellular genome dynamics of this emerging human pathogen. IMPORTANCE HEV is an emerging pathogen causing severe liver disease. The genetic information of HEV is encoded in RNA. The genomic RNA is initially copied into a complementary, antigenomic RNA that is a template for synthesis of more genomic RNA and for so-called subgenomic RNA. In this study, we identified the precise region within the HEV genome at which the synthesis of the subgenomic RNA is initiated. The nucleotides within this region are conserved across genetically distinct variants of HEV, highlighting the general importance of this segment for the virus. To identify this regulatory element, we developed a new experimental system that is a powerful tool with broad utility to mechanistically dissect many other poorly understood functional elements of HEV. Copyright © 2018 Ding et al.

  1. Characteristics of the nuclear (18S, 5.8S, 28S and 5S) and mitochondrial (12S and 16S) rRNA genes of Apis mellifera (Insecta: Hymenoptera): structure, organization, and retrotransposable elements

    PubMed Central

    Gillespie, J J; Johnston, J S; Cannone, J J; Gutell, R R

    2006-01-01

    As an accompanying manuscript to the release of the honey bee genome, we report the entire sequence of the nuclear (18S, 5.8S, 28S and 5S) and mitochondrial (12S and 16S) ribosomal RNA (rRNA)-encoding gene sequences (rDNA) and related internally and externally transcribed spacer regions of Apis mellifera (Insecta: Hymenoptera: Apocrita). Additionally, we predict secondary structures for the mature rRNA molecules based on comparative sequence analyses with other arthropod taxa and reference to recently published crystal structures of the ribosome. In general, the structures of honey bee rRNAs are in agreement with previously predicted rRNA models from other arthropods in core regions of the rRNA, with little additional expansion in non-conserved regions. Our multiple sequence alignments are made available on several public databases and provide a preliminary establishment of a global structural model of all rRNAs from the insects. Additionally, we provide conserved stretches of sequences flanking the rDNA cistrons that comprise the externally transcribed spacer regions (ETS) and part of the intergenic spacer region (IGS), including several repetitive motifs. Finally, we report the occurrence of retrotransposition in the nuclear large subunit rDNA, as R2 elements are present in the usual insertion points found in other arthropods. Interestingly, functional R1 elements usually present in the genomes of insects were not detected in the honey bee rRNA genes. The reverse transcriptase products of the R2 elements are deduced from their putative open reading frames and structurally aligned with those from another hymenopteran insect, the jewel wasp Nasonia (Pteromalidae). Stretches of conserved amino acids shared between Apis and Nasonia are illustrated and serve as potential sites for primer design, as target amplicons within these R2 elements may serve as novel phylogenetic markers for Hymenoptera. Given the impending completion of the sequencing of the Nasonia genome, we expect our report eventually to shed light on the evolution of the hymenopteran genome within higher insects, particularly regarding the relative maintenance of conserved rDNA genes, related variable spacer regions and retrotransposable elements. PMID:17069639

  2. Functional Information Stored in the Conserved Structural RNA Domains of Flavivirus Genomes

    PubMed Central

    Fernández-Sanlés, Alba; Ríos-Marco, Pablo; Romero-López, Cristina; Berzal-Herranz, Alfredo

    2017-01-01

    The genus Flavivirus comprises a large number of small, positive-sense single-stranded, RNA viruses able to replicate in the cytoplasm of certain arthropod and/or vertebrate host cells. The genus, which has some 70 member species, includes a number of emerging and re-emerging pathogens responsible for outbreaks of human disease around the world, such as the West Nile, dengue, Zika, yellow fever, Japanese encephalitis, St. Louis encephalitis, and tick-borne encephalitis viruses. Like other RNA viruses, flaviviruses have a compact RNA genome that efficiently stores all the information required for the completion of the infectious cycle. The efficiency of this storage system is attributable to supracoding elements, i.e., discrete, structural units with essential functions. This information storage system overlaps and complements the protein coding sequence and is highly conserved across the genus. It therefore offers interesting potential targets for novel therapeutic strategies. This review summarizes our knowledge of the features of flavivirus genome functional RNA domains. It also provides a brief overview of the main achievements reported in the design of antiviral nucleic acid-based drugs targeting functional genomic RNA elements. PMID:28421048

  3. Distribution, Diversity, and Long-Term Retention of Grass Short Interspersed Nuclear Elements (SINEs).

    PubMed

    Mao, Hongliang; Wang, Hao

    2017-08-01

    Instances of highly conserved plant short interspersed nuclear element (SINE) families and their enrichment near genes have been well documented, but little is known about the general patterns of such conservation and enrichment and underlying mechanisms. Here, we perform a comprehensive investigation of the structure, distribution, and evolution of SINEs in the grass family by analyzing 14 grass and 5 other flowering plant genomes using comparative genomics methods. We identify 61 SINE families composed of 29,572 copies, in which 46 families are first described. We find that comparing with other grass TEs, grass SINEs show much higher level of conservation in terms of genomic retention: The origin of at least 26% families can be traced to early grass diversification and these families are among most abundant SINE families in 86% species. We find that these families show much higher level of enrichment near protein coding genes than families of relatively recent origin (51%:28%), and that 40% of all grass SINEs are near gene and the percentage is higher than other types of grass TEs. The pattern of enrichment suggests that differential removal of SINE copies in gene-poor regions plays an important role in shaping the genomic distribution of these elements. We also identify a sequence motif located at 3' SINE end which is shared in 17 families. In short, this study provides insights into structure and evolution of SINEs in the grass family. © The Author 2017. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.

  4. Ancient Exaptation of a CORE-SINE Retroposon into a Highly Conserved Mammalian Neuronal Enhancer of the Proopiomelanocortin Gene

    PubMed Central

    Bumaschny, Viviana F; Low, Malcolm J; Rubinstein, Marcelo

    2007-01-01

    The proopiomelanocortin gene (POMC) is expressed in the pituitary gland and the ventral hypothalamus of all jawed vertebrates, producing several bioactive peptides that function as peripheral hormones or central neuropeptides, respectively. We have recently determined that mouse and human POMC expression in the hypothalamus is conferred by the action of two 5′ distal and unrelated enhancers, nPE1 and nPE2. To investigate the evolutionary origin of the neuronal enhancer nPE2, we searched available vertebrate genome databases and determined that nPE2 is a highly conserved element in placentals, marsupials, and monotremes, whereas it is absent in nonmammalian vertebrates. Following an in silico paleogenomic strategy based on genome-wide searches for paralog sequences, we discovered that opossum and wallaby nPE2 sequences are highly similar to members of the superfamily of CORE-short interspersed nucleotide element (SINE) retroposons, in particular to MAR1 retroposons that are widely present in marsupial genomes. Thus, the neuronal enhancer nPE2 originated from the exaptation of a CORE-SINE retroposon in the lineage leading to mammals and remained under purifying selection in all mammalian orders for the last 170 million years. Expression studies performed in transgenic mice showed that two nonadjacent nPE2 subregions are essential to drive reporter gene expression into POMC hypothalamic neurons, providing the first functional example of an exapted enhancer derived from an ancient CORE-SINE retroposon. In addition, we found that this CORE-SINE family of retroposons is likely to still be active in American and Australian marsupial genomes and that several highly conserved exonic, intronic and intergenic sequences in the human genome originated from the exaptation of CORE-SINE retroposons. Together, our results provide clear evidence of the functional novelties that transposed elements contributed to their host genomes throughout evolution. PMID:17922573

  5. The Most Deeply Conserved Noncoding Sequences in Plants Serve Similar Functions to Those in Vertebrates Despite Large Differences in Evolutionary Rates[W

    PubMed Central

    Burgess, Diane; Freeling, Michael

    2014-01-01

    In vertebrates, conserved noncoding elements (CNEs) are functionally constrained sequences that can show striking conservation over >400 million years of evolutionary distance and frequently are located megabases away from target developmental genes. Conserved noncoding sequences (CNSs) in plants are much shorter, and it has been difficult to detect conservation among distantly related genomes. In this article, we show not only that CNS sequences can be detected throughout the eudicot clade of flowering plants, but also that a subset of 37 CNSs can be found in all flowering plants (diverging ∼170 million years ago). These CNSs are functionally similar to vertebrate CNEs, being highly associated with transcription factor and development genes and enriched in transcription factor binding sites. Some of the most highly conserved sequences occur in genes encoding RNA binding proteins, particularly the RNA splicing–associated SR genes. Differences in sequence conservation between plants and animals are likely to reflect differences in the biology of the organisms, with plants being much more able to tolerate genomic deletions and whole-genome duplication events due, in part, to their far greater fecundity compared with vertebrates. PMID:24681619

  6. Evolutionary growth process of highly conserved sequences in vertebrate genomes.

    PubMed

    Ishibashi, Minaka; Noda, Akiko Ogura; Sakate, Ryuichi; Imanishi, Tadashi

    2012-08-01

    Genome sequence comparison between evolutionarily distant species revealed ultraconserved elements (UCEs) among mammals under strong purifying selection. Most of them were also conserved among vertebrates. Because they tend to be located in the flanking regions of developmental genes, they would have fundamental roles in creating vertebrate body plans. However, the evolutionary origin and selection mechanism of these UCEs remain unclear. Here we report that UCEs arose in primitive vertebrates, and gradually grew in vertebrate evolution. We searched for UCEs in two teleost fishes, Tetraodon nigroviridis and Oryzias latipes, and found 554 UCEs with 100% identity over 100 bps. Comparison of teleost and mammalian UCEs revealed 43 pairs of common, jawed-vertebrate UCEs (jUCE) with high sequence identities, ranging from 83.1% to 99.2%. Ten of them retain lower similarities to the Petromyzon marinus genome, and the substitution rates of four non-exonic jUCEs were reduced after the teleost-mammal divergence, suggesting that robust conservation had been acquired in the jawed vertebrate lineage. Our results indicate that prototypical UCEs originated before the divergence of jawed and jawless vertebrates and have been frozen as perfect conserved sequences in the jawed vertebrate lineage. In addition, our comparative sequence analyses of UCEs and neighboring regions resulted in a discovery of lineage-specific conserved sequences. They were added progressively to prototypical UCEs, suggesting step-wise acquisition of novel regulatory roles. Our results indicate that conserved non-coding elements (CNEs) consist of blocks with distinct evolutionary history, each having been frozen since different evolutionary era along the vertebrate lineage. Copyright © 2012 Elsevier B.V. All rights reserved.

  7. G-quadruplex prediction in E. coli genome reveals a conserved putative G-quadruplex-Hairpin-Duplex switch.

    PubMed

    Kaplan, Oktay I; Berber, Burak; Hekim, Nezih; Doluca, Osman

    2016-11-02

    Many studies show that short non-coding sequences are widely conserved among regulatory elements. More and more conserved sequences are being discovered since the development of next generation sequencing technology. A common approach to identify conserved sequences with regulatory roles relies on topological changes such as hairpin formation at the DNA or RNA level. G-quadruplexes, non-canonical nucleic acid topologies with little established biological roles, are increasingly considered for conserved regulatory element discovery. Since the tertiary structure of G-quadruplexes is strongly dependent on the loop sequence which is disregarded by the generally accepted algorithm, we hypothesized that G-quadruplexes with similar topology and, indirectly, similar interaction patterns, can be determined using phylogenetic clustering based on differences in the loop sequences. Phylogenetic analysis of 52 G-quadruplex forming sequences in the Escherichia coli genome revealed two conserved G-quadruplex motifs with a potential regulatory role. Further analysis revealed that both motifs tend to form hairpins and G quadruplexes, as supported by circular dichroism studies. The phylogenetic analysis as described in this work can greatly improve the discovery of functional G-quadruplex structures and may explain unknown regulatory patterns. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.

  8. Disruption of long-distance highly conserved noncoding elements in neurocristopathies.

    PubMed

    Amiel, Jeanne; Benko, Sabina; Gordon, Christopher T; Lyonnet, Stanislas

    2010-12-01

    One of the key discoveries of vertebrate genome sequencing projects has been the identification of highly conserved noncoding elements (CNEs). Some characteristics of CNEs include their high frequency in mammalian genomes, their potential regulatory role in gene expression, and their enrichment in gene deserts nearby master developmental genes. The abnormal development of neural crest cells (NCCs) leads to a broad spectrum of congenital malformation(s), termed neurocristopathies, and/or tumor predisposition. Here we review recent findings that disruptions of CNEs, within or at long distance from the coding sequences of key genes involved in NCC development, result in neurocristopathies via the alteration of tissue- or stage-specific long-distance regulation of gene expression. While most studies on human genetic disorders have focused on protein-coding sequences, these examples suggest that investigation of genomic alterations of CNEs will provide a broader understanding of the molecular etiology of both rare and common human congenital malformations. © 2010 New York Academy of Sciences.

  9. Molecular characterization, genomic distribution and evolutionary dynamics of Short INterspersed Elements in the termite genome.

    PubMed

    Luchetti, Andrea; Mantovani, Barbara

    2011-02-01

    Short INterspersed Elements (SINEs) in invertebrates, and especially in animal inbred genomes such that of termites, are poorly known; in this paper we characterize three new SINE families (Talub, Taluc and Talud) through the analyses of 341 sequences, either isolated from the Reticulitermes lucifugus genome or drawn from EST Genbank collection. We further add new data to the only isopteran element known so far, Talua. These SINEs are tRNA-derived elements, with an average length ranging from 258 to 372 bp. The tails are made up by poly(A) or microsatellite motifs. Their copy number varies from 7.9 × 10(3) to 10(5) copies, well within the range observed for other metazoan genomes. Species distribution, age and target site duplication analysis indicate Talud as the oldest, possibly inactive SINE originated before the onset of Isoptera (~150 Myr ago). Taluc underwent to substantial sequence changes throughout the evolution of termites and data suggest it was silenced and then re-activated in the R. lucifugus lineage. Moreover, Taluc shares a conserved sequence block with other unrelated SINEs, as observed for some vertebrate and cephalopod elements. The study of genomic environment showed that insertions are mainly surrounded by microsatellites and other SINEs, indicating a biased accumulation within non-coding regions. The evolutionary dynamics of Talu~ elements is explained through selective mechanisms acting in an inbred genome; in this respect, the study of termites' SINEs activity may provide an interesting framework to address the (co)evolution of mobile elements and the host genome.

  10. ESPERR: learning strong and weak signals in genomic sequence alignments to identify functional elements.

    PubMed

    Taylor, James; Tyekucheva, Svitlana; King, David C; Hardison, Ross C; Miller, Webb; Chiaromonte, Francesca

    2006-12-01

    Genomic sequence signals - such as base composition, presence of particular motifs, or evolutionary constraint - have been used effectively to identify functional elements. However, approaches based only on specific signals known to correlate with function can be quite limiting. When training data are available, application of computational learning algorithms to multispecies alignments has the potential to capture broader and more informative sequence and evolutionary patterns that better characterize a class of elements. However, effective exploitation of patterns in multispecies alignments is impeded by the vast number of possible alignment columns and by a limited understanding of which particular strings of columns may characterize a given class. We have developed a computational method, called ESPERR (evolutionary and sequence pattern extraction through reduced representations), which uses training examples to learn encodings of multispecies alignments into reduced forms tailored for the prediction of chosen classes of functional elements. ESPERR produces a greatly improved Regulatory Potential score, which can discriminate regulatory regions from neutral sites with excellent accuracy ( approximately 94%). This score captures strong signals (GC content and conservation), as well as subtler signals (with small contributions from many different alignment patterns) that characterize the regulatory elements in our training set. ESPERR is also effective for predicting other classes of functional elements, as we show for DNaseI hypersensitive sites and highly conserved regions with developmental enhancer activity. Our software, training data, and genome-wide predictions are available from our Web site (http://www.bx.psu.edu/projects/esperr).

  11. The chaperone-like activity of the hepatitis C virus IRES and CRE elements regulates genome dimerization.

    PubMed

    Romero-López, Cristina; Barroso-delJesus, Alicia; Berzal-Herranz, Alfredo

    2017-02-24

    The RNA genome of the hepatitis C virus (HCV) establishes a network of long-distance RNA-RNA interactions that direct the progression of the infective cycle. This work shows that the dimerization of the viral genome, which is initiated at the dimer linkage sequence (DLS) within the 3'UTR, is promoted by the CRE region, while the IRES is a negative regulatory partner. Using differential 2'-acylation probing (SHAPE-dif) and molecular interference (HMX) technologies, the CRE activity was found to mainly lie in the critical 5BSL3.2 domain, while the IRES-mediated effect is dependent upon conserved residues within the essential structural elements JIIIabc, JIIIef and PK2. These findings support the idea that, along with the DLS motif, the IRES and CRE are needed to control HCV genome dimerization. They also provide evidences of a novel function for these elements as chaperone-like partners that fine-tune the architecture of distant RNA domains within the HCV genome.

  12. The chaperone-like activity of the hepatitis C virus IRES and CRE elements regulates genome dimerization

    PubMed Central

    Romero-López, Cristina; Barroso-delJesus, Alicia; Berzal-Herranz, Alfredo

    2017-01-01

    The RNA genome of the hepatitis C virus (HCV) establishes a network of long-distance RNA-RNA interactions that direct the progression of the infective cycle. This work shows that the dimerization of the viral genome, which is initiated at the dimer linkage sequence (DLS) within the 3′UTR, is promoted by the CRE region, while the IRES is a negative regulatory partner. Using differential 2′-acylation probing (SHAPE-dif) and molecular interference (HMX) technologies, the CRE activity was found to mainly lie in the critical 5BSL3.2 domain, while the IRES-mediated effect is dependent upon conserved residues within the essential structural elements JIIIabc, JIIIef and PK2. These findings support the idea that, along with the DLS motif, the IRES and CRE are needed to control HCV genome dimerization. They also provide evidences of a novel function for these elements as chaperone-like partners that fine-tune the architecture of distant RNA domains within the HCV genome. PMID:28233845

  13. RNA connectivity requirements between conserved elements in the core of the yeast telomerase RNP

    PubMed Central

    Mefford, Melissa A; Rafiq, Qundeel; Zappulla, David C

    2013-01-01

    Telomerase is a specialized chromosome end-replicating enzyme required for genome duplication in many eukaryotes. An RNA and reverse transcriptase protein subunit comprise its enzymatic core. Telomerase is evolving rapidly, particularly its RNA component. Nevertheless, nearly all telomerase RNAs, including those of H. sapiens and S. cerevisiae, share four conserved structural elements: a core-enclosing helix (CEH), template-boundary element, template, and pseudoknot, in this order along the RNA. It is not clear how these elements coordinate telomerase activity. We find that although rearranging the order of the four conserved elements in the yeast telomerase RNA subunit, TLC1, disrupts activity, the RNA ends can be moved between the template and pseudoknot in vitro and in vivo. However, the ends disrupt activity when inserted between the other structured elements, defining an Area of Required Connectivity (ARC). Within the ARC, we find that only the junction nucleotides between the pseudoknot and CEH are essential. Integrating all of our findings provides a basic map of functional connections in the core of the yeast telomerase RNP and a framework to understand conserved element coordination in telomerase mechanism. PMID:24129512

  14. Divergent genome evolution caused by regional variation in DNA gain and loss between human and mouse

    PubMed Central

    Kortschak, R. Daniel

    2018-01-01

    The forces driving the accumulation and removal of non-coding DNA and ultimately the evolution of genome size in complex organisms are intimately linked to genome structure and organisation. Our analysis provides a novel method for capturing the regional variation of lineage-specific DNA gain and loss events in their respective genomic contexts. To further understand this connection we used comparative genomics to identify genome-wide individual DNA gain and loss events in the human and mouse genomes. Focusing on the distribution of DNA gains and losses, relationships to important structural features and potential impact on biological processes, we found that in autosomes, DNA gains and losses both followed separate lineage-specific accumulation patterns. However, in both species chromosome X was particularly enriched for DNA gain, consistent with its high L1 retrotransposon content required for X inactivation. We found that DNA loss was associated with gene-rich open chromatin regions and DNA gain events with gene-poor closed chromatin regions. Additionally, we found that DNA loss events tended to be smaller than DNA gain events suggesting that they were able to accumulate in gene-rich open chromatin regions due to their reduced capacity to interrupt gene regulatory architecture. GO term enrichment showed that mouse loss hotspots were strongly enriched for terms related to developmental processes. However, these genes were also located in regions with a high density of conserved elements, suggesting that despite high levels of DNA loss, gene regulatory architecture remained conserved. This is consistent with a model in which DNA gain and loss results in turnover or “churning” in regulatory element dense regions of open chromatin, where interruption of regulatory elements is selected against. PMID:29677183

  15. Conserved gene clusters in bacterial genomes provide further support for the primacy of RNA

    NASA Technical Reports Server (NTRS)

    Siefert, J. L.; Martin, K. A.; Abdi, F.; Widger, W. R.; Fox, G. E.

    1997-01-01

    Five complete bacterial genome sequences have been released to the scientific community. These include four (eu)Bacteria, Haemophilus influenzae, Mycoplasma genitalium, M. pneumoniae, and Synechocystis PCC 6803, as well as one Archaeon, Methanococcus jannaschii. Features of organization shared by these genomes are likely to have arisen very early in the history of the bacteria and thus can be expected to provide further insight into the nature of early ancestors. Results of a genome comparison of these five organisms confirm earlier observations that gene order is remarkably unpreserved. There are, nevertheless, at least 16 clusters of two or more genes whose order remains the same among the four (eu)Bacteria and these are presumed to reflect conserved elements of coordinated gene expression that require gene proximity. Eight of these gene orders are essentially conserved in the Archaea as well. Many of these clusters are known to be regulated by RNA-level mechanisms in Escherichia coli, which supports the earlier suggestion that this type of regulation of gene expression may have arisen very early. We conclude that although the last common ancestor may have had a DNA genome, it likely was preceded by progenotes with an RNA genome.

  16. Decelerated genome evolution in modern vertebrates revealed by analysis of multiple lancelet genomes

    PubMed Central

    Huang, Shengfeng; Chen, Zelin; Yan, Xinyu; Yu, Ting; Huang, Guangrui; Yan, Qingyu; Pontarotti, Pierre Antoine; Zhao, Hongchen; Li, Jie; Yang, Ping; Wang, Ruihua; Li, Rui; Tao, Xin; Deng, Ting; Wang, Yiquan; Li, Guang; Zhang, Qiujin; Zhou, Sisi; You, Leiming; Yuan, Shaochun; Fu, Yonggui; Wu, Fenfang; Dong, Meiling; Chen, Shangwu; Xu, Anlong

    2014-01-01

    Vertebrates diverged from other chordates ~500 Myr ago and experienced successful innovations and adaptations, but the genomic basis underlying vertebrate origins are not fully understood. Here we suggest, through comparison with multiple lancelet (amphioxus) genomes, that ancient vertebrates experienced high rates of protein evolution, genome rearrangement and domain shuffling and that these rates greatly slowed down after the divergence of jawed and jawless vertebrates. Compared with lancelets, modern vertebrates retain, at least relatively, less protein diversity, fewer nucleotide polymorphisms, domain combinations and conserved non-coding elements (CNE). Modern vertebrates also lost substantial transposable element (TE) diversity, whereas lancelets preserve high TE diversity that includes even the long-sought RAG transposon. Lancelets also exhibit rapid gene turnover, pervasive transcription, fastest exon shuffling in metazoans and substantial TE methylation not observed in other invertebrates. These new lancelet genome sequences provide new insights into the chordate ancestral state and the vertebrate evolution. PMID:25523484

  17. Decelerated genome evolution in modern vertebrates revealed by analysis of multiple lancelet genomes.

    PubMed

    Huang, Shengfeng; Chen, Zelin; Yan, Xinyu; Yu, Ting; Huang, Guangrui; Yan, Qingyu; Pontarotti, Pierre Antoine; Zhao, Hongchen; Li, Jie; Yang, Ping; Wang, Ruihua; Li, Rui; Tao, Xin; Deng, Ting; Wang, Yiquan; Li, Guang; Zhang, Qiujin; Zhou, Sisi; You, Leiming; Yuan, Shaochun; Fu, Yonggui; Wu, Fenfang; Dong, Meiling; Chen, Shangwu; Xu, Anlong

    2014-12-19

    Vertebrates diverged from other chordates ~500 Myr ago and experienced successful innovations and adaptations, but the genomic basis underlying vertebrate origins are not fully understood. Here we suggest, through comparison with multiple lancelet (amphioxus) genomes, that ancient vertebrates experienced high rates of protein evolution, genome rearrangement and domain shuffling and that these rates greatly slowed down after the divergence of jawed and jawless vertebrates. Compared with lancelets, modern vertebrates retain, at least relatively, less protein diversity, fewer nucleotide polymorphisms, domain combinations and conserved non-coding elements (CNE). Modern vertebrates also lost substantial transposable element (TE) diversity, whereas lancelets preserve high TE diversity that includes even the long-sought RAG transposon. Lancelets also exhibit rapid gene turnover, pervasive transcription, fastest exon shuffling in metazoans and substantial TE methylation not observed in other invertebrates. These new lancelet genome sequences provide new insights into the chordate ancestral state and the vertebrate evolution.

  18. Reexamining the P-Element Invasion of Drosophila melanogaster Through the Lens of piRNA Silencing

    PubMed Central

    Kelleher, Erin S.

    2016-01-01

    Transposable elements (TEs) are both important drivers of genome evolution and genetic parasites with potentially dramatic consequences for host fitness. The recent explosion of research on regulatory RNAs reveals that small RNA-mediated silencing is a conserved genetic mechanism through which hosts repress TE activity. The invasion of the Drosophila melanogaster genome by P elements, which happened on a historical timescale, represents an incomparable opportunity to understand how small RNA-mediated silencing of TEs evolves. Repression of P-element transposition emerged almost concurrently with its invasion. Recent studies suggest that this repression is implemented in part, and perhaps predominantly, by the Piwi-interacting RNA (piRNA) pathway, a small RNA-mediated silencing pathway that regulates TE activity in many metazoan germlines. In this review, I consider the P-element invasion from both a molecular and evolutionary genetic perspective, reconciling classic studies of P-element regulation with the new mechanistic framework provided by the piRNA pathway. I further explore the utility of the P-element invasion as an exemplar of the evolution of piRNA-mediated silencing. In light of the highly-conserved role for piRNAs in regulating TEs, discoveries from this system have taxonomically broad implications for the evolution of repression. PMID:27516614

  19. Mechanistically Distinct Pathways of Divergent Regulatory DNA Creation Contribute to Evolution of Human-Specific Genomic Regulatory Networks Driving Phenotypic Divergence of Homo sapiens

    PubMed Central

    Glinsky, Gennadi V.

    2016-01-01

    Abstract Thousands of candidate human-specific regulatory sequences (HSRS) have been identified, supporting the hypothesis that unique to human phenotypes result from human-specific alterations of genomic regulatory networks. Collectively, a compendium of multiple diverse families of HSRS that are functionally and structurally divergent from Great Apes could be defined as the backbone of human-specific genomic regulatory networks. Here, the conservation patterns analysis of 18,364 candidate HSRS was carried out requiring that 100% of bases must remap during the alignments of human, chimpanzee, and bonobo sequences. A total of 5,535 candidate HSRS were identified that are: (i) highly conserved in Great Apes; (ii) evolved by the exaptation of highly conserved ancestral DNA; (iii) defined by either the acceleration of mutation rates on the human lineage or the functional divergence from non-human primates. The exaptation of highly conserved ancestral DNA pathway seems mechanistically distinct from the evolution of regulatory DNA segments driven by the species-specific expansion of transposable elements. Genome-wide proximity placement analysis of HSRS revealed that a small fraction of topologically associating domains (TADs) contain more than half of HSRS from four distinct families. TADs that are enriched for HSRS and termed rapidly evolving in humans TADs (revTADs) comprise 0.8–10.3% of 3,127 TADs in the hESC genome. RevTADs manifest distinct correlation patterns between placements of human accelerated regions, human-specific transcription factor-binding sites, and recombination rates. There is a significant enrichment within revTAD boundaries of hESC-enhancers, primate-specific CTCF-binding sites, human-specific RNAPII-binding sites, hCONDELs, and H3K4me3 peaks with human-specific enrichment at TSS in prefrontal cortex neurons (P < 0.0001 in all instances). Present analysis supports the idea that phenotypic divergence of Homo sapiens is driven by the evolution of human-specific genomic regulatory networks via at least two mechanistically distinct pathways of creation of divergent sequences of regulatory DNA: (i) recombination-associated exaptation of the highly conserved ancestral regulatory DNA segments; (ii) human-specific insertions of transposable elements. PMID:27503290

  20. Cryptic glucocorticoid receptor-binding sites pervade genomic NF-κB response elements.

    PubMed

    Hudson, William H; Vera, Ian Mitchelle S de; Nwachukwu, Jerome C; Weikum, Emily R; Herbst, Austin G; Yang, Qin; Bain, David L; Nettles, Kendall W; Kojetin, Douglas J; Ortlund, Eric A

    2018-04-06

    Glucocorticoids (GCs) are potent repressors of NF-κB activity, making them a preferred choice for treatment of inflammation-driven conditions. Despite the widespread use of GCs in the clinic, current models are inadequate to explain the role of the glucocorticoid receptor (GR) within this critical signaling pathway. GR binding directly to NF-κB itself-tethering in a DNA binding-independent manner-represents the standing model of how GCs inhibit NF-κB-driven transcription. We demonstrate that direct binding of GR to genomic NF-κB response elements (κBREs) mediates GR-driven repression of inflammatory gene expression. We report five crystal structures and solution NMR data of GR DBD-κBRE complexes, which reveal that GR recognizes a cryptic response element between the binding footprints of NF-κB subunits within κBREs. These cryptic sequences exhibit high sequence and functional conservation, suggesting that GR binding to κBREs is an evolutionarily conserved mechanism of controlling the inflammatory response.

  1. Distinct retroelement classes define evolutionary breakpoints demarcating sites of evolutionary novelty

    PubMed Central

    Longo, Mark S; Carone, Dawn M; Green, Eric D; O'Neill, Michael J; O'Neill, Rachel J

    2009-01-01

    Background Large-scale genome rearrangements brought about by chromosome breaks underlie numerous inherited diseases, initiate or promote many cancers and are also associated with karyotype diversification during species evolution. Recent research has shown that these breakpoints are nonrandomly distributed throughout the mammalian genome and many, termed "evolutionary breakpoints" (EB), are specific genomic locations that are "reused" during karyotypic evolution. When the phylogenetic trajectory of orthologous chromosome segments is considered, many of these EB are coincident with ancient centromere activity as well as new centromere formation. While EB have been characterized as repeat-rich regions, it has not been determined whether specific sequences have been retained during evolution that would indicate previous centromere activity or a propensity for new centromere formation. Likewise, the conservation of specific sequence motifs or classes at EBs among divergent mammalian taxa has not been determined. Results To define conserved sequence features of EBs associated with centromere evolution, we performed comparative sequence analysis of more than 4.8 Mb within the tammar wallaby, Macropus eugenii, derived from centromeric regions (CEN), euchromatic regions (EU), and an evolutionary breakpoint (EB) that has undergone convergent breakpoint reuse and past centromere activity in marsupials. We found a dramatic enrichment for long interspersed nucleotide elements (LINE1s) and endogenous retroviruses (ERVs) and a depletion of short interspersed nucleotide elements (SINEs) shared between CEN and EBs. We analyzed the orthologous human EB (14q32.33), known to be associated with translocations in many cancers including multiple myelomas and plasma cell leukemias, and found a conserved distribution of similar repetitive elements. Conclusion Our data indicate that EBs tracked within the class Mammalia harbor sequence features retained since the divergence of marsupials and eutherians that may have predisposed these genomic regions to large-scale chromosomal instability. PMID:19630942

  2. Comparative genomics using microarrays reveals divergence and loss of virulence-associated genes in host-specific strains of the insect pathogen Metarhizium anisopliae.

    PubMed

    Wang, Sibao; Leclerque, Andreas; Pava-Ripoll, Monica; Fang, Weiguo; St Leger, Raymond J

    2009-06-01

    Many strains of Metarhizium anisopliae have broad host ranges, but others are specialists and adapted to particular hosts. Patterns of gene duplication, divergence, and deletion in three generalist and three specialist strains were investigated by heterologous hybridization of genomic DNA to genes from the generalist strain Ma2575. As expected, major life processes are highly conserved, presumably due to purifying selection. However, up to 7% of Ma2575 genes were highly divergent or absent in specialist strains. Many of these sequences are conserved in other fungal species, suggesting that there has been rapid evolution and loss in specialist Metarhizium genomes. Some poorly hybridizing genes in specialists were functionally coordinated, indicative of reductive evolution. These included several involved in toxin biosynthesis and sugar metabolism in root exudates, suggesting that specialists are losing genes required to live in alternative hosts or as saprophytes. Several components of mobile genetic elements were also highly divergent or lost in specialists. Exceptionally, the genome of the specialist cricket pathogen Ma443 contained extra insertion elements that might play a role in generating evolutionary novelty. This study throws light on the abundance of orphans in genomes, as 15% of orphan sequences were found to be rapidly evolving in the Ma2575 lineage.

  3. The LINEs and SINEs of Entamoeba histolytica: comparative analysis and genomic distribution.

    PubMed

    Bakre, Abhijeet A; Rawal, Kamal; Ramaswamy, Ram; Bhattacharya, Alok; Bhattacharya, Sudha

    2005-07-01

    Autonomous non-long terminal repeat retrotransposons are commonly referred to as long interspersed elements (LINEs). Short non-autonomous elements that borrow the LINE machinery are called SINES. The Entamoeba histolytica genome contains three classes of LINEs and SINEs. Together the EhLINEs/SINEs account for about 6% of the genome. The recognizable functional domains in all three EhLINEs included reverse transcriptase and endonuclease. A novel feature was the presence of two types of members-some with a single long ORF (less frequent) and some with two ORFs (more frequent) in both EhLINE1 and 2. The two ORFs were generated by conserved changes leading to stop codon. Computational analysis of the immediate flanking sequences for each element showed that they inserted in AT-rich sequences, with a preponderance of Ts in the upstream site. The elements were very frequently located close to protein-coding genes and other EhLINEs/SINEs. The possible influence of these elements on expression of neighboring genes needs to be determined.

  4. Molecular and phylogenetic characterization of the sieve element occlusion gene family in Fabaceae and non-Fabaceae plants.

    PubMed

    Rüping, Boris; Ernst, Antonia M; Jekat, Stephan B; Nordzieke, Steffen; Reineke, Anna R; Müller, Boje; Bornberg-Bauer, Erich; Prüfer, Dirk; Noll, Gundula A

    2010-10-08

    The phloem of dicotyledonous plants contains specialized P-proteins (phloem proteins) that accumulate during sieve element differentiation and remain parietally associated with the cisternae of the endoplasmic reticulum in mature sieve elements. Wounding causes P-protein filaments to accumulate at the sieve plates and block the translocation of photosynthate. Specialized, spindle-shaped P-proteins known as forisomes that undergo reversible calcium-dependent conformational changes have evolved exclusively in the Fabaceae. Recently, the molecular characterization of three genes encoding forisome components in the model legume Medicago truncatula (MtSEO1, MtSEO2 and MtSEO3; SEO = sieve element occlusion) was reported, but little is known about the molecular characteristics of P-proteins in non-Fabaceae. We performed a comprehensive genome-wide comparative analysis by screening the M. truncatula, Glycine max, Arabidopsis thaliana, Vitis vinifera and Solanum phureja genomes, and a Malus domestica EST library for homologs of MtSEO1, MtSEO2 and MtSEO3 and identified numerous novel SEO genes in Fabaceae and even non-Fabaceae plants, which do not possess forisomes. Even in Fabaceae some SEO genes appear to not encode forisome components. All SEO genes have a similar exon-intron structure and are expressed predominantly in the phloem. Phylogenetic analysis revealed the presence of several subgroups with Fabaceae-specific subgroups containing all of the known as well as newly identified forisome component proteins. We constructed Hidden Markov Models that identified three conserved protein domains, which characterize SEO proteins when present in combination. In addition, one common and three subgroup specific protein motifs were found in the amino acid sequences of SEO proteins. SEO genes are organized in genomic clusters and the conserved synteny allowed us to identify several M. truncatula vs G. max orthologs as well as paralogs within the G. max genome. The unexpected occurrence of forisome-like genes in non-Fabaceae plants may indicate that these proteins encode species-specific P-proteins, which is backed up by the phloem-specific expression profiles. The conservation of gene structure, the presence of specific motifs and domains and the genomic synteny argue for a common phylogenetic origin of forisomes and other P-proteins.

  5. Comparative Mitogenomics of the Assassin Bug Genus Peirates (Hemiptera: Reduviidae: Peiratinae) Reveal Conserved Mitochondrial Genome Organization of P. atromaculatus, P. fulvescens and P. turpis

    PubMed Central

    Zhao, Guangyu; Li, Hu; Zhao, Ping; Cai, Wanzhi

    2015-01-01

    In this study, we sequenced four new mitochondrial genomes and presented comparative mitogenomic analyses of five species in the genus Peirates (Hemiptera: Reduviidae). Mitochondrial genomes of these five assassin bugs had a typical set of 37 genes and retained the ancestral gene arrangement of insects. The A+T content, AT- and GC-skews were similar to the common base composition biases of insect mtDNA. Genomic size ranges from 15,702 bp to 16,314 bp and most of the size variation was due to length and copy number of the repeat unit in the putative control region. All of the control region sequences included large tandem repeats present in two or more copies. Our result revealed similarity in mitochondrial genomes of P. atromaculatus, P. fulvescens and P. turpis, as well as the highly conserved genomic-level characteristics of these three species, e.g., the same start and stop codons of protein-coding genes, conserved secondary structure of tRNAs, identical location and length of non-coding and overlapping regions, and conservation of structural elements and tandem repeat unit in control region. Phylogenetic analyses also supported a close relationship between P. atromaculatus, P. fulvescens and P. turpis, which might be recently diverged species. The present study indicates that mitochondrial genome has important implications on phylogenetics, population genetics and speciation in the genus Peirates. PMID:25689825

  6. Sequence and comparative analysis of the chicken genome provide unique perspectives on vertebrate evolution.

    PubMed

    2004-12-09

    We present here a draft genome sequence of the red jungle fowl, Gallus gallus. Because the chicken is a modern descendant of the dinosaurs and the first non-mammalian amniote to have its genome sequenced, the draft sequence of its genome--composed of approximately one billion base pairs of sequence and an estimated 20,000-23,000 genes--provides a new perspective on vertebrate genome evolution, while also improving the annotation of mammalian genomes. For example, the evolutionary distance between chicken and human provides high specificity in detecting functional elements, both non-coding and coding. Notably, many conserved non-coding sequences are far from genes and cannot be assigned to defined functional classes. In coding regions the evolutionary dynamics of protein domains and orthologous groups illustrate processes that distinguish the lineages leading to birds and mammals. The distinctive properties of avian microchromosomes, together with the inferred patterns of conserved synteny, provide additional insights into vertebrate chromosome architecture.

  7. FA-SAT Is an Old Satellite DNA Frozen in Several Bilateria Genomes

    PubMed Central

    Chaves, Raquel; Ferreira, Daniela; Mendes-da-Silva, Ana; Meles, Susana; Adega, Filomena

    2017-01-01

    Abstract In recent years, a growing body of evidence has recognized the tandem repeat sequences, and specifically satellite DNA, as a functional class of sequences in the genomic “dark matter.” Using an original, complementary, and thus an eclectic experimental design, we show that the cat archetypal satellite DNA sequence, FA-SAT, is “frozen” conservatively in several Bilateria genomes. We found different genomic FA-SAT architectures, and the interspersion pattern was conserved. In Carnivora genomes, the FA-SAT-related sequences are also amplified, with the predominance of a specific FA-SAT variant, at the heterochromatic regions. We inspected the cat genome project to locate FA-SAT array flanking regions and revealed an intensive intermingling with transposable elements. Our results also show that FA-SAT-related sequences are transcribed and that the most abundant FA-SAT variant is not always the most transcribed. We thus conclude that the DNA sequences of FA-SAT and their transcripts are “frozen” in these genomes. Future work is needed to disclose any putative function that these sequences may play in these genomes. PMID:29608678

  8. Whole-genome sequencing approaches for conservation biology: Advantages, limitations and practical recommendations.

    PubMed

    Fuentes-Pardo, Angela P; Ruzzante, Daniel E

    2017-10-01

    Whole-genome resequencing (WGR) is a powerful method for addressing fundamental evolutionary biology questions that have not been fully resolved using traditional methods. WGR includes four approaches: the sequencing of individuals to a high depth of coverage with either unresolved or resolved haplotypes, the sequencing of population genomes to a high depth by mixing equimolar amounts of unlabelled-individual DNA (Pool-seq) and the sequencing of multiple individuals from a population to a low depth (lcWGR). These techniques require the availability of a reference genome. This, along with the still high cost of shotgun sequencing and the large demand for computing resources and storage, has limited their implementation in nonmodel species with scarce genomic resources and in fields such as conservation biology. Our goal here is to describe the various WGR methods, their pros and cons and potential applications in conservation biology. WGR offers an unprecedented marker density and surveys a wide diversity of genetic variations not limited to single nucleotide polymorphisms (e.g., structural variants and mutations in regulatory elements), increasing their power for the detection of signatures of selection and local adaptation as well as for the identification of the genetic basis of phenotypic traits and diseases. Currently, though, no single WGR approach fulfils all requirements of conservation genetics, and each method has its own limitations and sources of potential bias. We discuss proposed ways to minimize such biases. We envision a not distant future where the analysis of whole genomes becomes a routine task in many nonmodel species and fields including conservation biology. © 2017 John Wiley & Sons Ltd.

  9. Disrupted auto-regulation of the spliceosomal gene SNRPB causes cerebro–costo–mandibular syndrome

    PubMed Central

    Lynch, Danielle C.; Revil, Timothée; Schwartzentruber, Jeremy; Bhoj, Elizabeth J.; Innes, A. Micheil; Lamont, Ryan E.; Lemire, Edmond G.; Chodirker, Bernard N.; Taylor, Juliet P.; Zackai, Elaine H.; McLeod, D. Ross; Kirk, Edwin P.; Hoover-Fong, Julie; Fleming, Leah; Savarirayan, Ravi; Boycott, Kym; MacKenzie, Alex; Brudno, Michael; Bulman, Dennis; Dyment, David; Majewski, Jacek; Jerome-Majewska, Loydie A.; Parboosingh, Jillian S.; Bernier, Francois P.

    2014-01-01

    Elucidating the function of highly conserved regulatory sequences is a significant challenge in genomics today. Certain intragenic highly conserved elements have been associated with regulating levels of core components of the spliceosome and alternative splicing of downstream genes. Here we identify mutations in one such element, a regulatory alternative exon of SNRPB as the cause of cerebro–costo–mandibular syndrome. This exon contains a premature termination codon that triggers nonsense-mediated mRNA decay when included in the transcript. These mutations cause increased inclusion of the alternative exon and decreased overall expression of SNRPB. We provide evidence for the functional importance of this conserved intragenic element in the regulation of alternative splicing and development, and suggest that the evolution of such a regulatory mechanism has contributed to the complexity of mammalian development. PMID:25047197

  10. Disrupted auto-regulation of the spliceosomal gene SNRPB causes cerebro-costo-mandibular syndrome.

    PubMed

    Lynch, Danielle C; Revil, Timothée; Schwartzentruber, Jeremy; Bhoj, Elizabeth J; Innes, A Micheil; Lamont, Ryan E; Lemire, Edmond G; Chodirker, Bernard N; Taylor, Juliet P; Zackai, Elaine H; McLeod, D Ross; Kirk, Edwin P; Hoover-Fong, Julie; Fleming, Leah; Savarirayan, Ravi; Majewski, Jacek; Jerome-Majewska, Loydie A; Parboosingh, Jillian S; Bernier, Francois P

    2014-07-22

    Elucidating the function of highly conserved regulatory sequences is a significant challenge in genomics today. Certain intragenic highly conserved elements have been associated with regulating levels of core components of the spliceosome and alternative splicing of downstream genes. Here we identify mutations in one such element, a regulatory alternative exon of SNRPB as the cause of cerebro-costo-mandibular syndrome. This exon contains a premature termination codon that triggers nonsense-mediated mRNA decay when included in the transcript. These mutations cause increased inclusion of the alternative exon and decreased overall expression of SNRPB. We provide evidence for the functional importance of this conserved intragenic element in the regulation of alternative splicing and development, and suggest that the evolution of such a regulatory mechanism has contributed to the complexity of mammalian development.

  11. The clc Element of Pseudomonas sp. Strain B13, a Genomic Island with Various Catabolic Properties

    PubMed Central

    Gaillard, Muriel; Vallaeys, Tatiana; Vorhölter, Frank Jörg; Minoia, Marco; Werlen, Christoph; Sentchilo, Vladimir; Pühler, Alfred; van der Meer, Jan Roelof

    2006-01-01

    Pseudomonas sp. strain B13 is a bacterium known to degrade chloroaromatic compounds. The properties to use 3- and 4-chlorocatechol are determined by a self-transferable DNA element, the clc element, which normally resides at two locations in the cell's chromosome. Here we report the complete nucleotide sequence of the clc element, demonstrating the unique catabolic properties while showing its relatedness to genomic islands and integrative and conjugative elements rather than to other known catabolic plasmids. As far as catabolic functions, the clc element harbored, in addition to the genes for chlorocatechol degradation, a complete functional operon for 2-aminophenol degradation and genes for a putative aromatic compound transport protein and for a multicomponent aromatic ring dioxygenase similar to anthranilate hydroxylase. The genes for catabolic functions were inducible under various conditions, suggesting a network of catabolic pathway induction. For about half of the open reading frames (ORFs) on the clc element, no clear functional prediction could be given, although some indications were found for functions that were similar to plasmid conjugation. The region in which these ORFs were situated displayed a high overall conservation of nucleotide sequence and gene order to genomic regions in other recently completed bacterial genomes or to other genomic islands. Most notably, except for two discrete regions, the clc element was almost 100% identical over the whole length to a chromosomal region in Burkholderia xenovorans LB400. This indicates the dynamic evolution of this type of element and the continued transition between elements with a more pathogenic character and those with catabolic properties. PMID:16484212

  12. Evidence of pervasive biologically functional secondary structures within the genomes of eukaryotic single-stranded DNA viruses.

    PubMed

    Muhire, Brejnev Muhizi; Golden, Michael; Murrell, Ben; Lefeuvre, Pierre; Lett, Jean-Michel; Gray, Alistair; Poon, Art Y F; Ngandu, Nobubelo Kwanele; Semegni, Yves; Tanov, Emil Pavlov; Monjane, Adérito Luis; Harkins, Gordon William; Varsani, Arvind; Shepherd, Dionne Natalie; Martin, Darren Patrick

    2014-02-01

    Single-stranded DNA (ssDNA) viruses have genomes that are potentially capable of forming complex secondary structures through Watson-Crick base pairing between their constituent nucleotides. A few of the structural elements formed by such base pairings are, in fact, known to have important functions during the replication of many ssDNA viruses. Unknown, however, are (i) whether numerous additional ssDNA virus genomic structural elements predicted to exist by computational DNA folding methods actually exist and (ii) whether those structures that do exist have any biological relevance. We therefore computationally inferred lists of the most evolutionarily conserved structures within a diverse selection of animal- and plant-infecting ssDNA viruses drawn from the families Circoviridae, Anelloviridae, Parvoviridae, Nanoviridae, and Geminiviridae and analyzed these for evidence of natural selection favoring the maintenance of these structures. While we find evidence that is consistent with purifying selection being stronger at nucleotide sites that are predicted to be base paired than at sites predicted to be unpaired, we also find strong associations between sites that are predicted to pair with one another and site pairs that are apparently coevolving in a complementary fashion. Collectively, these results indicate that natural selection actively preserves much of the pervasive secondary structure that is evident within eukaryote-infecting ssDNA virus genomes and, therefore, that much of this structure is biologically functional. Lastly, we provide examples of various highly conserved but completely uncharacterized structural elements that likely have important functions within some of the ssDNA virus genomes analyzed here.

  13. Evidence of Pervasive Biologically Functional Secondary Structures within the Genomes of Eukaryotic Single-Stranded DNA Viruses

    PubMed Central

    Muhire, Brejnev Muhizi; Golden, Michael; Murrell, Ben; Lefeuvre, Pierre; Lett, Jean-Michel; Gray, Alistair; Poon, Art Y. F.; Ngandu, Nobubelo Kwanele; Semegni, Yves; Tanov, Emil Pavlov; Monjane, Adérito Luis; Harkins, Gordon William; Varsani, Arvind; Shepherd, Dionne Natalie

    2014-01-01

    Single-stranded DNA (ssDNA) viruses have genomes that are potentially capable of forming complex secondary structures through Watson-Crick base pairing between their constituent nucleotides. A few of the structural elements formed by such base pairings are, in fact, known to have important functions during the replication of many ssDNA viruses. Unknown, however, are (i) whether numerous additional ssDNA virus genomic structural elements predicted to exist by computational DNA folding methods actually exist and (ii) whether those structures that do exist have any biological relevance. We therefore computationally inferred lists of the most evolutionarily conserved structures within a diverse selection of animal- and plant-infecting ssDNA viruses drawn from the families Circoviridae, Anelloviridae, Parvoviridae, Nanoviridae, and Geminiviridae and analyzed these for evidence of natural selection favoring the maintenance of these structures. While we find evidence that is consistent with purifying selection being stronger at nucleotide sites that are predicted to be base paired than at sites predicted to be unpaired, we also find strong associations between sites that are predicted to pair with one another and site pairs that are apparently coevolving in a complementary fashion. Collectively, these results indicate that natural selection actively preserves much of the pervasive secondary structure that is evident within eukaryote-infecting ssDNA virus genomes and, therefore, that much of this structure is biologically functional. Lastly, we provide examples of various highly conserved but completely uncharacterized structural elements that likely have important functions within some of the ssDNA virus genomes analyzed here. PMID:24284329

  14. Transposable Elements versus the Fungal Genome: Impact on Whole-Genome Architecture and Transcriptional Profiles

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Castanera, Raul; Lopez-Varas, Leticia; Borgognone, Alessandra

    Transposable elements (TEs) are exceptional contributors to eukaryotic genome diversity. Their ubiquitous presence impacts the genomes of nearly all species and mediates genome evolution by causing mutations and chromosomal rearrangements and by modulating gene expression. We performed an exhaustive analysis of the TE content in 18 fungal genomes, including strains of the same species and species of the same genera. Our results depicted a scenario of exceptional variability, with species having 0.02 to 29.8% of their genome consisting of transposable elements. A detailed analysis performed on two strains of Pleurotus ostreatus uncovered a genome that is populated mainly by Classmore » I elements, especially LTR-retrotransposons amplified in recent bursts from 0 to 2 million years (My) ago. The preferential accumulation of TEs in clusters led to the presence of genomic regions that lacked intra- and inter-specific conservation. In addition, we investigated the effect of TE insertions on the expression of their nearby upstream and downstream genes. Our results showed that an important number of genes under TE influence are significantly repressed, with stronger repression when genes are localized within transposon clusters. Our transcriptional analysis performed in four additional fungal models revealed that this TE-mediated silencing was present only in species with active cytosine methylation machinery. We hypothesize that this phenomenon is related to epigenetic defense mechanisms that are aimed to suppress TE expression and control their proliferation.« less

  15. Transposable Elements versus the Fungal Genome: Impact on Whole-Genome Architecture and Transcriptional Profiles

    DOE PAGES

    Castanera, Raul; Lopez-Varas, Leticia; Borgognone, Alessandra; ...

    2016-06-13

    Transposable elements (TEs) are exceptional contributors to eukaryotic genome diversity. Their ubiquitous presence impacts the genomes of nearly all species and mediates genome evolution by causing mutations and chromosomal rearrangements and by modulating gene expression. We performed an exhaustive analysis of the TE content in 18 fungal genomes, including strains of the same species and species of the same genera. Our results depicted a scenario of exceptional variability, with species having 0.02 to 29.8% of their genome consisting of transposable elements. A detailed analysis performed on two strains of Pleurotus ostreatus uncovered a genome that is populated mainly by Classmore » I elements, especially LTR-retrotransposons amplified in recent bursts from 0 to 2 million years (My) ago. The preferential accumulation of TEs in clusters led to the presence of genomic regions that lacked intra- and inter-specific conservation. In addition, we investigated the effect of TE insertions on the expression of their nearby upstream and downstream genes. Our results showed that an important number of genes under TE influence are significantly repressed, with stronger repression when genes are localized within transposon clusters. Our transcriptional analysis performed in four additional fungal models revealed that this TE-mediated silencing was present only in species with active cytosine methylation machinery. We hypothesize that this phenomenon is related to epigenetic defense mechanisms that are aimed to suppress TE expression and control their proliferation.« less

  16. [Comparative analysis of clustered regularly interspaced short palindromic repeats (CRISPRs) loci in the genomes of halophilic archaea].

    PubMed

    Zhang, Fan; Zhang, Bing; Xiang, Hua; Hu, Songnian

    2009-11-01

    Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) is a widespread system that provides acquired resistance against phages in bacteria and archaea. Here we aim to genome-widely analyze the CRISPR in extreme halophilic archaea, of which the whole genome sequences are available at present time. We used bioinformatics methods including alignment, conservation analysis, GC content and RNA structure prediction to analyze the CRISPR structures of 7 haloarchaeal genomes. We identified the CRISPR structures in 5 halophilic archaea and revealed a conserved palindromic motif in the flanking regions of these CRISPR structures. In addition, we found that the repeat sequences of large CRISPR structures in halophilic archaea were greatly conserved, and two types of predicted RNA secondary structures derived from the repeat sequences were likely determined by the fourth base of the repeat sequence. Our results support the proposal that the leader sequence may function as recognition site by having palindromic structures in flanking regions, and the stem-loop secondary structure formed by repeat sequences may function in mediating the interaction between foreign genetic elements and CAS-encoded proteins.

  17. A Locus Encoding Variable Defense Systems against Invading DNA Identified in Streptococcus suis

    PubMed Central

    Okura, Masatoshi; Nozawa, Takashi; Watanabe, Takayasu; Murase, Kazunori; Nakagawa, Ichiro; Takamatsu, Daisuke; Osaki, Makoto; Sekizaki, Tsutomu; Gottschalk, Marcelo; Hamada, Shigeyuki

    2017-01-01

    Streptococcus suis, an important zoonotic pathogen, is known to have an open pan-genome and to develop a competent state. In S. suis, limited genetic lineages are suggested to be associated with zoonosis. However, little is known about the evolution of diversified lineages and their respective phenotypic or ecological characteristics. In this study, we performed comparative genome analyses of S. suis, with a focus on the competence genes, mobile genetic elements, and genetic elements related to various defense systems against exogenous DNAs (defense elements) that are associated with gene gain/loss/exchange mediated by horizontal DNA movements and their restrictions. Our genome analyses revealed a conserved competence-inducing peptide type (pherotype) of the competence system and large-scale genome rearrangements in certain clusters based on the genome phylogeny of 58 S. suis strains. Moreover, the profiles of the defense elements were similar or identical to each other among the strains belonging to the same genomic clusters. Our findings suggest that these genetic characteristics of each cluster might exert specific effects on the phenotypic or ecological differences between the clusters. We also found certain loci that shift several types of defense elements in S. suis. Of note, one of these loci is a previously unrecognized variable region in bacteria, at which strains of distinct clusters code for different and various defense elements. This locus might represent a novel defense mechanism that has evolved through an arms race between bacteria and invading DNAs, mediated by mobile genetic elements and genetic competence. PMID:28379509

  18. Genome analysis of the platypus reveals unique signatures of evolution.

    PubMed

    Warren, Wesley C; Hillier, LaDeana W; Marshall Graves, Jennifer A; Birney, Ewan; Ponting, Chris P; Grützner, Frank; Belov, Katherine; Miller, Webb; Clarke, Laura; Chinwalla, Asif T; Yang, Shiaw-Pyng; Heger, Andreas; Locke, Devin P; Miethke, Pat; Waters, Paul D; Veyrunes, Frédéric; Fulton, Lucinda; Fulton, Bob; Graves, Tina; Wallis, John; Puente, Xose S; López-Otín, Carlos; Ordóñez, Gonzalo R; Eichler, Evan E; Chen, Lin; Cheng, Ze; Deakin, Janine E; Alsop, Amber; Thompson, Katherine; Kirby, Patrick; Papenfuss, Anthony T; Wakefield, Matthew J; Olender, Tsviya; Lancet, Doron; Huttley, Gavin A; Smit, Arian F A; Pask, Andrew; Temple-Smith, Peter; Batzer, Mark A; Walker, Jerilyn A; Konkel, Miriam K; Harris, Robert S; Whittington, Camilla M; Wong, Emily S W; Gemmell, Neil J; Buschiazzo, Emmanuel; Vargas Jentzsch, Iris M; Merkel, Angelika; Schmitz, Juergen; Zemann, Anja; Churakov, Gennady; Kriegs, Jan Ole; Brosius, Juergen; Murchison, Elizabeth P; Sachidanandam, Ravi; Smith, Carly; Hannon, Gregory J; Tsend-Ayush, Enkhjargal; McMillan, Daniel; Attenborough, Rosalind; Rens, Willem; Ferguson-Smith, Malcolm; Lefèvre, Christophe M; Sharp, Julie A; Nicholas, Kevin R; Ray, David A; Kube, Michael; Reinhardt, Richard; Pringle, Thomas H; Taylor, James; Jones, Russell C; Nixon, Brett; Dacheux, Jean-Louis; Niwa, Hitoshi; Sekita, Yoko; Huang, Xiaoqiu; Stark, Alexander; Kheradpour, Pouya; Kellis, Manolis; Flicek, Paul; Chen, Yuan; Webber, Caleb; Hardison, Ross; Nelson, Joanne; Hallsworth-Pepin, Kym; Delehaunty, Kim; Markovic, Chris; Minx, Pat; Feng, Yucheng; Kremitzki, Colin; Mitreva, Makedonka; Glasscock, Jarret; Wylie, Todd; Wohldmann, Patricia; Thiru, Prathapan; Nhan, Michael N; Pohl, Craig S; Smith, Scott M; Hou, Shunfeng; Nefedov, Mikhail; de Jong, Pieter J; Renfree, Marilyn B; Mardis, Elaine R; Wilson, Richard K

    2008-05-08

    We present a draft genome sequence of the platypus, Ornithorhynchus anatinus. This monotreme exhibits a fascinating combination of reptilian and mammalian characters. For example, platypuses have a coat of fur adapted to an aquatic lifestyle; platypus females lactate, yet lay eggs; and males are equipped with venom similar to that of reptiles. Analysis of the first monotreme genome aligned these features with genetic innovations. We find that reptile and platypus venom proteins have been co-opted independently from the same gene families; milk protein genes are conserved despite platypuses laying eggs; and immune gene family expansions are directly related to platypus biology. Expansions of protein, non-protein-coding RNA and microRNA families, as well as repeat elements, are identified. Sequencing of this genome now provides a valuable resource for deep mammalian comparative analyses, as well as for monotreme biology and conservation.

  19. Genome analysis of the platypus reveals unique signatures of evolution

    PubMed Central

    Warren, Wesley C.; Hillier, LaDeana W.; Marshall Graves, Jennifer A.; Birney, Ewan; Ponting, Chris P.; Grützner, Frank; Belov, Katherine; Miller, Webb; Clarke, Laura; Chinwalla, Asif T.; Yang, Shiaw-Pyng; Heger, Andreas; Locke, Devin P.; Miethke, Pat; Waters, Paul D.; Veyrunes, Frédéric; Fulton, Lucinda; Fulton, Bob; Graves, Tina; Wallis, John; Puente, Xose S.; López-Otín, Carlos; Ordóñez, Gonzalo R.; Eichler, Evan E.; Chen, Lin; Cheng, Ze; Deakin, Janine E.; Alsop, Amber; Thompson, Katherine; Kirby, Patrick; Papenfuss, Anthony T.; Wakefield, Matthew J.; Olender, Tsviya; Lancet, Doron; Huttley, Gavin A.; Smit, Arian F. A.; Pask, Andrew; Temple-Smith, Peter; Batzer, Mark A.; Walker, Jerilyn A.; Konkel, Miriam K.; Harris, Robert S.; Whittington, Camilla M.; Wong, Emily S. W.; Gemmell, Neil J.; Buschiazzo, Emmanuel; Vargas Jentzsch, Iris M.; Merkel, Angelika; Schmitz, Juergen; Zemann, Anja; Churakov, Gennady; Kriegs, Jan Ole; Brosius, Juergen; Murchison, Elizabeth P.; Sachidanandam, Ravi; Smith, Carly; Hannon, Gregory J.; Tsend-Ayush, Enkhjargal; McMillan, Daniel; Attenborough, Rosalind; Rens, Willem; Ferguson-Smith, Malcolm; Lefèvre, Christophe M.; Sharp, Julie A.; Nicholas, Kevin R.; Ray, David A.; Kube, Michael; Reinhardt, Richard; Pringle, Thomas H.; Taylor, James; Jones, Russell C.; Nixon, Brett; Dacheux, Jean-Louis; Niwa, Hitoshi; Sekita, Yoko; Huang, Xiaoqiu; Stark, Alexander; Kheradpour, Pouya; Kellis, Manolis; Flicek, Paul; Chen, Yuan; Webber, Caleb; Hardison, Ross; Nelson, Joanne; Hallsworth-Pepin, Kym; Delehaunty, Kim; Markovic, Chris; Minx, Pat; Feng, Yucheng; Kremitzki, Colin; Mitreva, Makedonka; Glasscock, Jarret; Wylie, Todd; Wohldmann, Patricia; Thiru, Prathapan; Nhan, Michael N.; Pohl, Craig S.; Smith, Scott M.; Hou, Shunfeng; Renfree, Marilyn B.; Mardis, Elaine R.; Wilson, Richard K.

    2009-01-01

    We present a draft genome sequence of the platypus, Ornithorhynchus anatinus. This monotreme exhibits a fascinating combination of reptilian and mammalian characters. For example, platypuses have a coat of fur adapted to an aquatic lifestyle; platypus females lactate, yet lay eggs; and males are equipped with venom similar to that of reptiles. Analysis of the first monotreme genome aligned these features with genetic innovations. We find that reptile and platypus venom proteins have been co-opted independently from the same gene families; milk protein genes are conserved despite platypuses laying eggs; and immune gene family expansions are directly related to platypus biology. Expansions of protein, non-protein-coding RNA and microRNA families, as well as repeat elements, are identified. Sequencing of this genome now provides a valuable resource for deep mammalian comparative analyses, as well as for monotreme biology and conservation. PMID:18464734

  20. Genomic Identification and Analysis of Shared Cis-regulator Elements in a Developmentally Critical homeobox Cluster

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Chris Amemiya

    2003-04-01

    The goals of this project were to isolate, characterize, and sequence the Dlx3/Dlx7 bigene cluster from twelve different species of mammals. The Dlx3 and Dlx7 genes are known to encode homeobox transcription factors involved in patterning of structures in the vertebrate jaw as well as vertebrate limbs. Genomic sequences from the respective taxa will subsequently be compared in order to identify conserved non-coding sequences that are potential cis-regulatory elements. Based on the comparisons they will fashion transgenic mouse experiments to functionally test the strength of the potential cis-regulatory elements. A goal of the project is to attempt to identify thosemore » elements that may function in coordinately regulating both Dlx3 and Dlx7 functions.« less

  1. Silencing Effect of Hominoid Highly Conserved Noncoding Sequences on Embryonic Brain Development

    PubMed Central

    Mahmoudi Saber, Morteza

    2017-01-01

    Abstract Superfamily Hominoidea, which consists of Hominidae (humans and great apes) and Hylobatidae (gibbons), is well-known for sharing human-like characteristics, however, the genomic origins of these shared unique phenotypes have mainly remained elusive. To decipher the underlying genomic basis of Hominoidea-restricted phenotypes, we identified and characterized Hominoidea-restricted highly conserved noncoding sequences (HCNSs) that are a class of potential regulatory elements which may be involved in evolution of lineage-specific phenotypes. We discovered 679 such HCNSs from human, chimpanzee, gorilla, orangutan and gibbon genomes. These HCNSs were demonstrated to be under purifying selection but with lineage-restricted characteristics different from old CNSs. A significant proportion of their ancestral sequences had accelerated rates of nucleotide substitutions, insertions and deletions during the evolution of common ancestor of Hominoidea, suggesting the intervention of positive Darwinian selection for creating those HCNSs. In contrary to enhancer elements and similar to silencer sequences, these Hominoidea-restricted HCNSs are located in close proximity of transcription start sites. Their target genes are enriched in the nervous system, development and transcription, and they tend to be remotely located from the nearest coding gene. Chip-seq signals and gene expression patterns suggest that Hominoidea-restricted HCNSs are likely to be functional regulatory elements by imposing silencing effects on their target genes in a tissue-restricted manner during fetal brain development. These HCNSs, emerged through adaptive evolution and conserved through purifying selection, represent a set of promising targets for future functional studies of the evolution of Hominoidea-restricted phenotypes. PMID:28633494

  2. Cis-regulatory element based targeted gene finding: genome-wide identification of abscisic acid- and abiotic stress-responsive genes in Arabidopsis thaliana.

    PubMed

    Zhang, Weixiong; Ruan, Jianhua; Ho, Tuan-Hua David; You, Youngsook; Yu, Taotao; Quatrano, Ralph S

    2005-07-15

    A fundamental problem of computational genomics is identifying the genes that respond to certain endogenous cues and environmental stimuli. This problem can be referred to as targeted gene finding. Since gene regulation is mainly determined by the binding of transcription factors and cis-regulatory DNA sequences, most existing gene annotation methods, which exploit the conservation of open reading frames, are not effective in finding target genes. A viable approach to targeted gene finding is to exploit the cis-regulatory elements that are known to be responsible for the transcription of target genes. Given such cis-elements, putative target genes whose promoters contain the elements can be identified. As a case study, we apply the above approach to predict the genes in model plant Arabidopsis thaliana which are inducible by a phytohormone, abscisic acid (ABA), and abiotic stress, such as drought, cold and salinity. We first construct and analyze two ABA specific cis-elements, ABA-responsive element (ABRE) and its coupling element (CE), in A.thaliana, based on their conservation in rice and other cereal plants. We then use the ABRE-CE module to identify putative ABA-responsive genes in A.thaliana. Based on RT-PCR verification and the results from literature, this method has an accuracy rate of 67.5% for the top 40 predictions. The cis-element based targeted gene finding approach is expected to be widely applicable since a large number of cis-elements in many species are available.

  3. Accelerated Evolution in Distinctive Species Reveals Candidate Elements for Clinically Relevant Traits, Including Mutation and Cancer Resistance.

    PubMed

    Ferris, Elliott; Abegglen, Lisa M; Schiffman, Joshua D; Gregg, Christopher

    2018-03-06

    The identity of most functional elements in the mammalian genome and the phenotypes they impact are unclear. Here, we perform a genome-wide comparative analysis of patterns of accelerated evolution in species with highly distinctive traits to discover candidate functional elements for clinically important phenotypes. We identify accelerated regions (ARs) in the elephant, hibernating bat, orca, dolphin, naked mole rat, and thirteen-lined ground squirrel lineages in mammalian conserved regions, uncovering ∼33,000 elements that bind hundreds of different regulatory proteins in humans and mice. ARs in the elephant, the largest land mammal, are uniquely enriched near elephant DNA damage response genes. The genomic hotspot for elephant ARs is the E3 ligase subunit of the Fanconi anemia complex, a master regulator of DNA repair. Additionally, ARs in the six species are associated with specific human clinical phenotypes that have apparent concordance with overt traits in each species. Copyright © 2018 The Author(s). Published by Elsevier Inc. All rights reserved.

  4. RNA expression in a cartilaginous fish cell line reveals ancient 3′ noncoding regions highly conserved in vertebrates

    PubMed Central

    Forest, David; Nishikawa, Ryuhei; Kobayashi, Hiroshi; Parton, Angela; Bayne, Christopher J.; Barnes, David W.

    2007-01-01

    We have established a cartilaginous fish cell line [Squalus acanthias embryo cell line (SAE)], a mesenchymal stem cell line derived from the embryo of an elasmobranch, the spiny dogfish shark S. acanthias. Elasmobranchs (sharks and rays) first appeared >400 million years ago, and existing species provide useful models for comparative vertebrate cell biology, physiology, and genomics. Comparative vertebrate genomics among evolutionarily distant organisms can provide sequence conservation information that facilitates identification of critical coding and noncoding regions. Although these genomic analyses are informative, experimental verification of functions of genomic sequences depends heavily on cell culture approaches. Using ESTs defining mRNAs derived from the SAE cell line, we identified lengthy and highly conserved gene-specific nucleotide sequences in the noncoding 3′ UTRs of eight genes involved in the regulation of cell growth and proliferation. Conserved noncoding 3′ mRNA regions detected by using the shark nucleotide sequences as a starting point were found in a range of other vertebrate orders, including bony fish, birds, amphibians, and mammals. Nucleotide identity of shark and human in these regions was remarkably well conserved. Our results indicate that highly conserved gene sequences dating from the appearance of jawed vertebrates and representing potential cis-regulatory elements can be identified through the use of cartilaginous fish as a baseline. Because the expression of genes in the SAE cell line was prerequisite for their identification, this cartilaginous fish culture system also provides a physiologically valid tool to test functional hypotheses on the role of these ancient conserved sequences in comparative cell biology. PMID:17227856

  5. Mechanistically Distinct Pathways of Divergent Regulatory DNA Creation Contribute to Evolution of Human-Specific Genomic Regulatory Networks Driving Phenotypic Divergence of Homo sapiens.

    PubMed

    Glinsky, Gennadi V

    2016-09-19

    Thousands of candidate human-specific regulatory sequences (HSRS) have been identified, supporting the hypothesis that unique to human phenotypes result from human-specific alterations of genomic regulatory networks. Collectively, a compendium of multiple diverse families of HSRS that are functionally and structurally divergent from Great Apes could be defined as the backbone of human-specific genomic regulatory networks. Here, the conservation patterns analysis of 18,364 candidate HSRS was carried out requiring that 100% of bases must remap during the alignments of human, chimpanzee, and bonobo sequences. A total of 5,535 candidate HSRS were identified that are: (i) highly conserved in Great Apes; (ii) evolved by the exaptation of highly conserved ancestral DNA; (iii) defined by either the acceleration of mutation rates on the human lineage or the functional divergence from non-human primates. The exaptation of highly conserved ancestral DNA pathway seems mechanistically distinct from the evolution of regulatory DNA segments driven by the species-specific expansion of transposable elements. Genome-wide proximity placement analysis of HSRS revealed that a small fraction of topologically associating domains (TADs) contain more than half of HSRS from four distinct families. TADs that are enriched for HSRS and termed rapidly evolving in humans TADs (revTADs) comprise 0.8-10.3% of 3,127 TADs in the hESC genome. RevTADs manifest distinct correlation patterns between placements of human accelerated regions, human-specific transcription factor-binding sites, and recombination rates. There is a significant enrichment within revTAD boundaries of hESC-enhancers, primate-specific CTCF-binding sites, human-specific RNAPII-binding sites, hCONDELs, and H3K4me3 peaks with human-specific enrichment at TSS in prefrontal cortex neurons (P < 0.0001 in all instances). Present analysis supports the idea that phenotypic divergence of Homo sapiens is driven by the evolution of human-specific genomic regulatory networks via at least two mechanistically distinct pathways of creation of divergent sequences of regulatory DNA: (i) recombination-associated exaptation of the highly conserved ancestral regulatory DNA segments; (ii) human-specific insertions of transposable elements. © The Author 2016. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.

  6. Hundreds of conserved non-coding genomic regions are independently lost in mammals

    PubMed Central

    Hiller, Michael; Schaar, Bruce T.; Bejerano, Gill

    2012-01-01

    Conserved non-protein-coding DNA elements (CNEs) often encode cis-regulatory elements and are rarely lost during evolution. However, CNE losses that do occur can be associated with phenotypic changes, exemplified by pelvic spine loss in sticklebacks. Using a computational strategy to detect complete loss of CNEs in mammalian genomes while strictly controlling for artifacts, we find >600 CNEs that are independently lost in at least two mammalian lineages, including a spinal cord enhancer near GDF11. We observed several genomic regions where multiple independent CNE loss events happened; the most extreme is the DIAPH2 locus. We show that CNE losses often involve deletions and that CNE loss frequencies are non-uniform. Similar to less pleiotropic enhancers, we find that independently lost CNEs are shorter, slightly less constrained and evolutionarily younger than CNEs without detected losses. This suggests that independently lost CNEs are less pleiotropic and that pleiotropic constraints contribute to non-uniform CNE loss frequencies. We also detected 35 CNEs that are independently lost in the human lineage and in other mammals. Our study uncovers an interesting aspect of the evolution of functional DNA in mammalian genomes. Experiments are necessary to test if these independently lost CNEs are associated with parallel phenotype changes in mammals. PMID:23042682

  7. Molecular Evolution of the Non-Coding Eosinophil Granule Ontogeny Transcript

    PubMed Central

    Rose, Dominic; Stadler, Peter F.

    2011-01-01

    Eukaryotic genomes are pervasively transcribed. A large fraction of the transcriptional output consists of long, mRNA-like, non-protein-coding transcripts (mlncRNAs). The evolutionary history of mlncRNAs is still largely uncharted territory. In this contribution, we explore in detail the evolutionary traces of the eosinophil granule ontogeny transcript (EGOT), an experimentally confirmed representative of an abundant class of totally intronic non-coding transcripts (TINs). EGOT is located antisense to an intron of the ITPR1 gene. We computationally identify putative EGOT orthologs in the genomes of 32 different amniotes, including orthologs from primates, rodents, ungulates, carnivores, afrotherians, and xenarthrans, as well as putative candidates from basal amniotes, such as opossum or platypus. We investigate the EGOT gene phylogeny, analyze patterns of sequence conservation, and the evolutionary conservation of the EGOT gene structure. We show that EGO-B, the spliced isoform, may be present throughout the placental mammals, but most likely dates back even further. We demonstrate here for the first time that the whole EGOT locus is highly structured, containing several evolutionary conserved, and thermodynamic stable secondary structures. Our analyses allow us to postulate novel functional roles of a hitherto poorly understood region at the intron of EGO-B which is highly conserved at the sequence level. The region contains a novel ITPR1 exon and also conserved RNA secondary structures together with a conserved TATA-like element, which putatively acts as a promoter of an independent regulatory element. PMID:22303364

  8. Discovery of functional non-coding conserved regions in the α-synuclein gene locus

    PubMed Central

    Sterling, Lori; Walter, Michael; Ting, Dennis; Schüle, Birgitt

    2014-01-01

    Several single nucleotide polymorphisms (SNPs) and the Rep-1 microsatellite marker of the α-synuclein ( SNCA) gene have consistently been shown to be associated with Parkinson’s disease, but the functional relevance is unclear. Based on these findings we hypothesized that conserved cis-regulatory elements in the SNCA genomic region regulate expression of SNCA, and that SNPs in these regions could be functionally modulating the expression of SNCA, thus contributing to neuronal demise and predisposing to Parkinson’s disease. In a pair-wise comparison of a 206kb genomic region encompassing the SNCA gene, we revealed 34 evolutionary conserved DNA sequences between human and mouse. All elements were cloned into reporter vectors and assessed for expression modulation in dual luciferase reporter assays.  We found that 12 out of 34 elements exhibited either an enhancement or reduction of the expression of the reporter gene. Three elements upstream of the SNCA gene displayed an approximately 1.5 fold (p<0.009) increase in expression. Of the intronic regions, three showed a 1.5 fold increase and two others indicated a 2 and 2.5 fold increase in expression (p<0.002). Three elements downstream of the SNCA gene showed 1.5 fold and 2.5 fold increase (p<0.0009). One element downstream of SNCA had a reduced expression of the reporter gene of 0.35 fold (p<0.0009) of normal activity. Our results demonstrate that the SNCA gene contains cis-regulatory regions that might regulate the transcription and expression of SNCA. Further studies in disease-relevant tissue types will be important to understand the functional impact of regulatory regions and specific Parkinson’s disease-associated SNPs and its function in the disease process. PMID:25566351

  9. Social insect genomes exhibit dramatic evolution in gene composition and regulation while preserving regulatory features linked to sociality

    PubMed Central

    Simola, Daniel F.; Wissler, Lothar; Donahue, Greg; Waterhouse, Robert M.; Helmkampf, Martin; Roux, Julien; Nygaard, Sanne; Glastad, Karl M.; Hagen, Darren E.; Viljakainen, Lumi; Reese, Justin T.; Hunt, Brendan G.; Graur, Dan; Elhaik, Eran; Kriventseva, Evgenia V.; Wen, Jiayu; Parker, Brian J.; Cash, Elizabeth; Privman, Eyal; Childers, Christopher P.; Muñoz-Torres, Monica C.; Boomsma, Jacobus J.; Bornberg-Bauer, Erich; Currie, Cameron R.; Elsik, Christine G.; Suen, Garret; Goodisman, Michael A.D.; Keller, Laurent; Liebig, Jürgen; Rawls, Alan; Reinberg, Danny; Smith, Chris D.; Smith, Chris R.; Tsutsui, Neil; Wurm, Yannick; Zdobnov, Evgeny M.; Berger, Shelley L.; Gadau, Jürgen

    2013-01-01

    Genomes of eusocial insects code for dramatic examples of phenotypic plasticity and social organization. We compared the genomes of seven ants, the honeybee, and various solitary insects to examine whether eusocial lineages share distinct features of genomic organization. Each ant lineage contains ∼4000 novel genes, but only 64 of these genes are conserved among all seven ants. Many gene families have been expanded in ants, notably those involved in chemical communication (e.g., desaturases and odorant receptors). Alignment of the ant genomes revealed reduced purifying selection compared with Drosophila without significantly reduced synteny. Correspondingly, ant genomes exhibit dramatic divergence of noncoding regulatory elements; however, extant conserved regions are enriched for novel noncoding RNAs and transcription factor–binding sites. Comparison of orthologous gene promoters between eusocial and solitary species revealed significant regulatory evolution in both cis (e.g., Creb) and trans (e.g., fork head) for nearly 2000 genes, many of which exhibit phenotypic plasticity. Our results emphasize that genomic changes can occur remarkably fast in ants, because two recently diverged leaf-cutter ant species exhibit faster accumulation of species-specific genes and greater divergence in regulatory elements compared with other ants or Drosophila. Thus, while the “socio-genomes” of ants and the honeybee are broadly characterized by a pervasive pattern of divergence in gene composition and regulation, they preserve lineage-specific regulatory features linked to eusociality. We propose that changes in gene regulation played a key role in the origins of insect eusociality, whereas changes in gene composition were more relevant for lineage-specific eusocial adaptations. PMID:23636946

  10. Chromosome size in diploid eukaryotic species centers on the average length with a conserved boundary

    USDA-ARS?s Scientific Manuscript database

    Understanding genome and chromosome evolution is important for understanding genetic inheritance and evolution. Universal events comprising DNA replication, transcription, repair, mobile genetic element transposition, chromosome rearrangements, mitosis, and meiosis underlie inheritance and variation...

  11. Functions of the 3′ and 5′ genome RNA regions of members of the genus Flavivirus

    PubMed Central

    Brinton, Margo A.; Basu, Mausumi

    2015-01-01

    The positive sense genomes of members of the genus Flavivirus in the family Flaviviridae are ~11 kb nts in length and have a 5′ type I cap but no 3′ poly A. The 5′ and 3′ terminal regions contain short conserved sequences that are proposed to be repeated remnants of an ancient sequence. However, the functions of most of these conserved sequences have not yet been determined. The terminal regions of the genome also contain multiple conserved RNA structures. Functional data for many of these structures has been obtained. Three sets of complementary 3′ and 5′ terminal region sequences, some of which are located in conserved RNA structures, interact to form a panhandle structure that is required for initiation of minus strand RNA synthesis with the 5′ terminal structure functioning as the promoter. How the switch from the terminal RNA structure base pairing to the long distance RNA-RNA interaction is triggered and regulated is not well understood but evidence suggests involvement of a cell protein binding to three sites on the 3′ terminal RNA structures and a cis-acting metastable 3′ RNA element in the 3′ terminal structure. Cell proteins may also be involved in facilitating exponential replication of nascent genomic RNA within replication vesicles at later times of infection cycle. Other conserved RNA structures and/or sequences in the 5′ and 3′ terminal regions have been proposed to regulate genome translation. Additional functions of the 5′ and 3′ terminal sequences have also been reported. PMID:25683510

  12. Conserved structure and inferred evolutionary history of long terminal repeats (LTRs)

    PubMed Central

    2013-01-01

    Background Long terminal repeats (LTRs, consisting of U3-R-U5 portions) are important elements of retroviruses and related retrotransposons. They are difficult to analyse due to their variability. The aim was to obtain a more comprehensive view of structure, diversity and phylogeny of LTRs than hitherto possible. Results Hidden Markov models (HMM) were created for 11 clades of LTRs belonging to Retroviridae (class III retroviruses), animal Metaviridae (Gypsy/Ty3) elements and plant Pseudoviridae (Copia/Ty1) elements, complementing our work with Orthoretrovirus HMMs. The great variation in LTR length of plant Metaviridae and the few divergent animal Pseudoviridae prevented building HMMs from both of these groups. Animal Metaviridae LTRs had the same conserved motifs as retroviral LTRs, confirming that the two groups are closely related. The conserved motifs were the short inverted repeats (SIRs), integrase recognition signals (5´TGTTRNR…YNYAACA 3´); the polyadenylation signal or AATAAA motif; a GT-rich stretch downstream of the polyadenylation signal; and a less conserved AT-rich stretch corresponding to the core promoter element, the TATA box. Plant Pseudoviridae LTRs differed slightly in having a conserved TATA-box, TATATA, but no conserved polyadenylation signal, plus a much shorter R region. The sensitivity of the HMMs for detection in genomic sequences was around 50% for most models, at a relatively high specificity, suitable for genome screening. The HMMs yielded consensus sequences, which were aligned by creating an HMM model (a ‘Superviterbi’ alignment). This yielded a phylogenetic tree that was compared with a Pol-based tree. Both LTR and Pol trees supported monophyly of retroviruses. In both, Pseudoviridae was ancestral to all other LTR retrotransposons. However, the LTR trees showed the chromovirus portion of Metaviridae clustering together with Pseudoviridae, dividing Metaviridae into two portions with distinct phylogeny. Conclusion The HMMs clearly demonstrated a unitary conserved structure of LTRs, supporting that they arose once during evolution. We attempted to follow the evolution of LTRs by tracing their functional foundations, that is, acquisition of RNAse H, a combined promoter/ polyadenylation site, integrase, hairpin priming and the primer binding site (PBS). Available information did not support a simple evolutionary chain of events. PMID:23369192

  13. V-SINEs: A New Superfamily of Vertebrate SINEs That Are Widespread in Vertebrate Genomes and Retain a Strongly Conserved Segment within Each Repetitive Unit

    PubMed Central

    Ogiwara, Ikuo; Miya, Masaki; Ohshima, Kazuhiko; Okada, Norihiro

    2002-01-01

    We have identified a new superfamily of vertebrate short interspersed repetitive elements (SINEs), designated V-SINEs, that are widespread in fishes and frogs. Each V-SINE includes a central conserved domain preceded by a 5′-end tRNA-related region and followed by a potentially recombinogenic (TG)n tract, with a 3′ tail derived from the 3′ untranslated region (UTR) of the corresponding partner long interspersed repetitive element (LINE) that encodes a functional reverse transcriptase. The central domain is strongly conserved and is even found in SINEs in the lamprey genome, suggesting that V-SINEs might be ∼550 Myr old or older in view of the timing of divergence of the lamprey lineage from the bony fish lineage. The central conserved domain might have been subject to some form of positive selection. Although the contemporary 3′ tails of V-SINEs differ from one another, it is possible that the original 3′ tail might have been replaced, via recombination, by the 3′ tails of more active partner LINEs, thereby retaining retropositional activity and the ability to survive for long periods on the evolutionary time scale. It seems plausible that V-SINEs may have some function(s) that have been maintained by the coevolution of SINEs and LINEs during the evolution of vertebrates. [The sequences reported in this paper have been deposited in the DDBJ/GenBank database under accession nos. AB072981–AB073004. Supplemental figures are available online at http://www.genome.org.] PMID:11827951

  14. Evolution of Two Short Interspersed Elements in Callorhinchus milii (Chondrichthyes, Holocephali) and Related Elements in Sharks and the Coelacanth

    PubMed Central

    Plazzi, Federico; Mantovani, Barbara

    2017-01-01

    Abstract Short interspersed elements (SINEs) are non-autonomous retrotransposons. Although they usually show fast evolutionary rates, in some instances highly conserved domains (HCDs) have been observed in elements with otherwise divergent sequences and from distantly related species. Here, we document the life history of two HCD-SINE families in the elephant shark Callorhinchus milii, one specific to the holocephalan lineage (CmiSINEs) and another one (SacSINE1-CM) with homologous elements in sharks and the coelacanth (SacSINE1s, LmeSINE1s). The analyses of their relationships indicated that these elements share the same 3′-tail, which would have allowed both elements to rise to high copy number by exploiting the C. milii L2-2_CM long interspersed element (LINE) enzymes. Molecular clock analysis on SINE activity in C. milii genome evidenced two replication bursts occurring right after two major events in the holocephalan evolution: the end-Permian mass extinction and the radiation of modern Holocephali. Accordingly, the same analysis on the coelacanth homologous elements, LmeSINE1, identified a replication wave close to the split age of the two extant Latimeria species. The genomic distribution of the studied SINEs pointed out contrasting results: some elements were preferentially sorted out from gene regions, but accumulated in flanking regions, while others appear more conserved within genes. Moreover, data from the C. milii transcriptome suggest that these SINEs could be involved in miRNA biogenesis and may be targets for miRNA-based regulation. PMID:28505260

  15. Molecular and phylogenetic characterization of the sieve element occlusion gene family in Fabaceae and non-Fabaceae plants

    PubMed Central

    2010-01-01

    Background The phloem of dicotyledonous plants contains specialized P-proteins (phloem proteins) that accumulate during sieve element differentiation and remain parietally associated with the cisternae of the endoplasmic reticulum in mature sieve elements. Wounding causes P-protein filaments to accumulate at the sieve plates and block the translocation of photosynthate. Specialized, spindle-shaped P-proteins known as forisomes that undergo reversible calcium-dependent conformational changes have evolved exclusively in the Fabaceae. Recently, the molecular characterization of three genes encoding forisome components in the model legume Medicago truncatula (MtSEO1, MtSEO2 and MtSEO3; SEO = sieve element occlusion) was reported, but little is known about the molecular characteristics of P-proteins in non-Fabaceae. Results We performed a comprehensive genome-wide comparative analysis by screening the M. truncatula, Glycine max, Arabidopsis thaliana, Vitis vinifera and Solanum phureja genomes, and a Malus domestica EST library for homologs of MtSEO1, MtSEO2 and MtSEO3 and identified numerous novel SEO genes in Fabaceae and even non-Fabaceae plants, which do not possess forisomes. Even in Fabaceae some SEO genes appear to not encode forisome components. All SEO genes have a similar exon-intron structure and are expressed predominantly in the phloem. Phylogenetic analysis revealed the presence of several subgroups with Fabaceae-specific subgroups containing all of the known as well as newly identified forisome component proteins. We constructed Hidden Markov Models that identified three conserved protein domains, which characterize SEO proteins when present in combination. In addition, one common and three subgroup specific protein motifs were found in the amino acid sequences of SEO proteins. SEO genes are organized in genomic clusters and the conserved synteny allowed us to identify several M. truncatula vs G. max orthologs as well as paralogs within the G. max genome. Conclusions The unexpected occurrence of forisome-like genes in non-Fabaceae plants may indicate that these proteins encode species-specific P-proteins, which is backed up by the phloem-specific expression profiles. The conservation of gene structure, the presence of specific motifs and domains and the genomic synteny argue for a common phylogenetic origin of forisomes and other P-proteins. PMID:20932300

  16. The History of Bordetella pertussis Genome Evolution Includes Structural Rearrangement

    PubMed Central

    Peng, Yanhui; Loparev, Vladimir; Batra, Dhwani; Bowden, Katherine E.; Burroughs, Mark; Cassiday, Pamela K.; Davis, Jamie K.; Johnson, Taccara; Juieng, Phalasy; Knipe, Kristen; Mathis, Marsenia H.; Pruitt, Andrea M.; Rowe, Lori; Sheth, Mili; Tondella, M. Lucia; Williams, Margaret M.

    2017-01-01

    ABSTRACT Despite high pertussis vaccine coverage, reported cases of whooping cough (pertussis) have increased over the last decade in the United States and other developed countries. Although Bordetella pertussis is well known for its limited gene sequence variation, recent advances in long-read sequencing technology have begun to reveal genomic structural heterogeneity among otherwise indistinguishable isolates, even within geographically or temporally defined epidemics. We have compared rearrangements among complete genome assemblies from 257 B. pertussis isolates to examine the potential evolution of the chromosomal structure in a pathogen with minimal gene nucleotide sequence diversity. Discrete changes in gene order were identified that differentiated genomes from vaccine reference strains and clinical isolates of various genotypes, frequently along phylogenetic boundaries defined by single nucleotide polymorphisms. The observed rearrangements were primarily large inversions centered on the replication origin or terminus and flanked by IS481, a mobile genetic element with >240 copies per genome and previously suspected to mediate rearrangements and deletions by homologous recombination. These data illustrate that structural genome evolution in B. pertussis is not limited to reduction but also includes rearrangement. Therefore, although genomes of clinical isolates are structurally diverse, specific changes in gene order are conserved, perhaps due to positive selection, providing novel information for investigating disease resurgence and molecular epidemiology. IMPORTANCE Whooping cough, primarily caused by Bordetella pertussis, has resurged in the United States even though the coverage with pertussis-containing vaccines remains high. The rise in reported cases has included increased disease rates among all vaccinated age groups, provoking questions about the pathogen's evolution. The chromosome of B. pertussis includes a large number of repetitive mobile genetic elements that obstruct genome analysis. However, these mobile elements facilitate large rearrangements that alter the order and orientation of essential protein-encoding genes, which otherwise exhibit little nucleotide sequence diversity. By comparing the complete genome assemblies from 257 isolates, we show that specific rearrangements have been conserved throughout recent evolutionary history, perhaps by eliciting changes in gene expression, which may also provide useful information for molecular epidemiology. PMID:28167525

  17. The genome biology of phytoplasma: modulators of plants and insects.

    PubMed

    Sugio, Akiko; Hogenhout, Saskia A

    2012-06-01

    Phytoplasmas are bacterial pathogens of plants that are transmitted by insects. These bacteria uniquely multiply intracellularly in both plants (Plantae) and insects (Animalia). Similarly to bacterial endosymbionts, phytoplasmas have reduced genomes with limited metabolic capabilities. Nonetheless, the chromosomes of many phytoplasmas are rich in repeated DNA consisting of mobile elements. Phytoplasmas produce an arsenal of effectors most of which are encoded on these mobile elements and on plasmids. These effectors target conserved plant transcription factors resulting in witches' broom and leafy flower symptoms and suppression of plant defense to insect vectors that transmit the phytoplasmas. Future studies of these fascinating microbes will generate a wealth of new knowledge about forces that shape genomes and microbial interactions with multicellular hosts. Copyright © 2012 Elsevier Ltd. All rights reserved.

  18. Mobilomics in Saccharomyces cerevisiae strains

    PubMed Central

    2013-01-01

    Background Mobile Genetic Elements (MGEs) are selfish DNA integrated in the genomes. Their detection is mainly based on consensus–like searches by scanning the investigated genome against the sequence of an already identified MGE. Mobilomics aims at discovering all the MGEs in a genome and understanding their dynamic behavior: The data for this kind of investigation can be provided by comparative genomics of closely related organisms. The amount of data thus involved requires a strong computational effort, which should be alleviated. Results Our approach proposes to exploit the high similarity among homologous chromosomes of different strains of the same species, following a progressive comparative genomics philosophy. We introduce a software tool based on our new fast algorithm, called regender, which is able to identify the conserved regions between chromosomes. Our case study is represented by a unique recently available dataset of 39 different strains of S.cerevisiae, which regender is able to compare in few minutes. By exploring the non–conserved regions, where MGEs are mainly retrotransposons called Tys, and marking the candidate Tys based on their length, we are able to locate a priori and automatically all the already known Tys and map all the putative Tys in all the strains. The remaining putative mobile elements (PMEs) emerging from this intra–specific comparison are sharp markers of inter–specific evolution: indeed, many events of non–conservation among different yeast strains correspond to PMEs. A clustering based on the presence/absence of the candidate Tys in the strains suggests an evolutionary interconnection that is very similar to classic phylogenetic trees based on SNPs analysis, even though it is computed without using phylogenetic information. Conclusions The case study indicates that the proposed methodology brings two major advantages: (a) it does not require any template sequence for the wanted MGEs and (b) it can be applied to infer MGEs also for low coverage genomes with unresolved bases, where traditional approaches are largely ineffective. PMID:23514613

  19. Genome dynamics and its impact on evolution of Escherichia coli.

    PubMed

    Dobrindt, Ulrich; Chowdary, M Geddam; Krumbholz, G; Hacker, J

    2010-08-01

    The Escherichia coli genome consists of a conserved part, the so-called core genome, which encodes essential cellular functions and of a flexible, strain-specific part. Genes that belong to the flexible genome code for factors involved in bacterial fitness and adaptation to different environments. Adaptation includes increase in fitness and colonization capacity. Pathogenic as well as non-pathogenic bacteria carry mobile and accessory genetic elements such as plasmids, bacteriophages, genomic islands and others, which code for functions required for proper adaptation. Escherichia coli is a very good example to study the interdependency of genome architecture and lifestyle of bacteria. Thus, these species include pathogenic variants as well as commensal bacteria adapted to different host organisms. In Escherichia coli, various genetic elements encode for pathogenicity factors as well as factors, which increase the fitness of non-pathogenic bacteria. The processes of genome dynamics, such as gene transfer, genome reduction, rearrangements as well as point mutations contribute to the adaptation of the bacteria into particular environments. Using Escherichia coli model organisms, such as uropathogenic strain 536 or commensal strain Nissle 1917, we studied mechanisms of genome dynamics and discuss these processes in the light of the evolution of microbes.

  20. Contribution of Mobile Group II Introns to Sinorhizobium meliloti Genome Evolution.

    PubMed

    Toro, Nicolás; Martínez-Abarca, Francisco; Molina-Sánchez, María D; García-Rodríguez, Fernando M; Nisa-Martínez, Rafael

    2018-01-01

    Mobile group II introns are ribozymes and retroelements that probably originate from bacteria. Sinorhizobium meliloti , the nitrogen-fixing endosymbiont of legumes of genus Medicago , harbors a large number of these retroelements. One of these elements, RmInt1, has been particularly successful at colonizing this multipartite genome. Many studies have improved our understanding of RmInt1 and phylogenetically related group II introns, their mobility mechanisms, spread and dynamics within S. meliloti and closely related species. Although RmInt1 conserves the ancient retroelement behavior, its evolutionary history suggests that this group II intron has played a role in the short- and long-term evolution of the S. meliloti genome. We will discuss its proposed role in genome evolution by controlling the spread and coexistence of potentially harmful mobile genetic elements, by ectopic transposition to different genetic loci as a source of early genomic variation and by generating sequence variation after a very slow degradation process, through intron remnants that may have continued to evolve, contributing to bacterial speciation.

  1. Contribution of Mobile Group II Introns to Sinorhizobium meliloti Genome Evolution

    PubMed Central

    Toro, Nicolás; Martínez-Abarca, Francisco; Molina-Sánchez, María D.; García-Rodríguez, Fernando M.; Nisa-Martínez, Rafael

    2018-01-01

    Mobile group II introns are ribozymes and retroelements that probably originate from bacteria. Sinorhizobium meliloti, the nitrogen-fixing endosymbiont of legumes of genus Medicago, harbors a large number of these retroelements. One of these elements, RmInt1, has been particularly successful at colonizing this multipartite genome. Many studies have improved our understanding of RmInt1 and phylogenetically related group II introns, their mobility mechanisms, spread and dynamics within S. meliloti and closely related species. Although RmInt1 conserves the ancient retroelement behavior, its evolutionary history suggests that this group II intron has played a role in the short- and long-term evolution of the S. meliloti genome. We will discuss its proposed role in genome evolution by controlling the spread and coexistence of potentially harmful mobile genetic elements, by ectopic transposition to different genetic loci as a source of early genomic variation and by generating sequence variation after a very slow degradation process, through intron remnants that may have continued to evolve, contributing to bacterial speciation. PMID:29670598

  2. Comparative genomics reveals insights into avian genome evolution and adaptation

    PubMed Central

    Zhang, Guojie; Li, Cai; Li, Qiye; Li, Bo; Larkin, Denis M.; Lee, Chul; Storz, Jay F.; Antunes, Agostinho; Greenwold, Matthew J.; Meredith, Robert W.; Ödeen, Anders; Cui, Jie; Zhou, Qi; Xu, Luohao; Pan, Hailin; Wang, Zongji; Jin, Lijun; Zhang, Pei; Hu, Haofu; Yang, Wei; Hu, Jiang; Xiao, Jin; Yang, Zhikai; Liu, Yang; Xie, Qiaolin; Yu, Hao; Lian, Jinmin; Wen, Ping; Zhang, Fang; Li, Hui; Zeng, Yongli; Xiong, Zijun; Liu, Shiping; Zhou, Long; Huang, Zhiyong; An, Na; Wang, Jie; Zheng, Qiumei; Xiong, Yingqi; Wang, Guangbiao; Wang, Bo; Wang, Jingjing; Fan, Yu; da Fonseca, Rute R.; Alfaro-Núñez, Alonzo; Schubert, Mikkel; Orlando, Ludovic; Mourier, Tobias; Howard, Jason T.; Ganapathy, Ganeshkumar; Pfenning, Andreas; Whitney, Osceola; Rivas, Miriam V.; Hara, Erina; Smith, Julia; Farré, Marta; Narayan, Jitendra; Slavov, Gancho; Romanov, Michael N; Borges, Rui; Machado, João Paulo; Khan, Imran; Springer, Mark S.; Gatesy, John; Hoffmann, Federico G.; Opazo, Juan C.; Håstad, Olle; Sawyer, Roger H.; Kim, Heebal; Kim, Kyu-Won; Kim, Hyeon Jeong; Cho, Seoae; Li, Ning; Huang, Yinhua; Bruford, Michael W.; Zhan, Xiangjiang; Dixon, Andrew; Bertelsen, Mads F.; Derryberry, Elizabeth; Warren, Wesley; Wilson, Richard K; Li, Shengbin; Ray, David A.; Green, Richard E.; O’Brien, Stephen J.; Griffin, Darren; Johnson, Warren E.; Haussler, David; Ryder, Oliver A.; Willerslev, Eske; Graves, Gary R.; Alström, Per; Fjeldså, Jon; Mindell, David P.; Edwards, Scott V.; Braun, Edward L.; Rahbek, Carsten; Burt, David W.; Houde, Peter; Zhang, Yong; Yang, Huanming; Wang, Jian; Jarvis, Erich D.; Gilbert, M. Thomas P.; Wang, Jun

    2015-01-01

    Birds are the most species-rich class of tetrapod vertebrates and have wide relevance across many research fields. We explored bird macroevolution using full genomes from 48 avian species representing all major extant clades. The avian genome is principally characterized by its constrained size, which predominantly arose because of lineage-specific erosion of repetitive elements, large segmental deletions, and gene loss. Avian genomes furthermore show a remarkably high degree of evolutionary stasis at the levels of nucleotide sequence, gene synteny, and chromosomal structure. Despite this pattern of conservation, we detected many non-neutral evolutionary changes in protein-coding genes and noncoding regions. These analyses reveal that pan-avian genomic diversity covaries with adaptations to different lifestyles and convergent evolution of traits. PMID:25504712

  3. [Analysis of cis-regulatory element distribution in gene promoters of Gossypium raimondii and Arabidopsis thaliana].

    PubMed

    Sun, Gao-Fei; He, Shou-Pu; Du, Xiong-Ming

    2013-10-01

    Cotton genomic studies have boomed since the release of Gossypium raimondii draft genome. In this study, cis-regulatory element (CRE) in 1 kb length sequence upstream 5' UTR of annotated genes were selected and scanned in the Arabidopsis thaliana (At) and Gossypium raimondii (Gr) genomes, based on the database of PLACE (Plant cis-acting Regulatory DNA Elements). According to the definition of this study, 44 (12.3%) and 57 (15.5%) CREs presented "peak-like" distribution in the 1 kb selected sequences of both genomes, respectively. Thirty-four of them were peak-like distributed in both genomes, which could be further categorized into 4 types based on their core sequences. The coincidence of TATABOX peak position and their actual position ((-) -30 bp) indicated that the position of a common CRE was conservative in different genes, which suggested that the peak position of these CREs was their possible actual position of transcription factors. The position of a common CRE was also different between the two genomes due to stronger length variation of 5' UTR in Gr than At. Furthermore, most of the peak-like CREs were located in the region of -110 bp-0 bp, which suggested that concentrated distribution might be conductive to the interaction of transcription factors, and then regulate the gene expression in downstream.

  4. Comparative sequence analysis of Sordaria macrospora and Neurospora crassa as a means to improve genome annotation.

    PubMed

    Nowrousian, Minou; Würtz, Christian; Pöggeler, Stefanie; Kück, Ulrich

    2004-03-01

    One of the most challenging parts of large scale sequencing projects is the identification of functional elements encoded in a genome. Recently, studies of genomes of up to six different Saccharomyces species have demonstrated that a comparative analysis of genome sequences from closely related species is a powerful approach to identify open reading frames and other functional regions within genomes [Science 301 (2003) 71, Nature 423 (2003) 241]. Here, we present a comparison of selected sequences from Sordaria macrospora to their corresponding Neurospora crassa orthologous regions. Our analysis indicates that due to the high degree of sequence similarity and conservation of overall genomic organization, S. macrospora sequence information can be used to simplify the annotation of the N. crassa genome.

  5. Insertion and deletion polymorphisms of the ancient AluS family in the human genome.

    PubMed

    Kryatova, Maria S; Steranka, Jared P; Burns, Kathleen H; Payer, Lindsay M

    2017-01-01

    Polymorphic Alu elements account for 17% of structural variants in the human genome. The majority of these belong to the youngest AluY subfamilies, and most structural variant discovery efforts have focused on identifying Alu polymorphisms from these currently retrotranspositionally active subfamilies. In this report we analyze polymorphisms from the evolutionarily older AluS subfamily, whose peak activity was tens of millions of years ago. We annotate the AluS polymorphisms, assess their likely mechanism of origin, and evaluate their contribution to structural variation in the human genome. Of 52 previously reported polymorphic AluS elements ascertained for this study, 48 were confirmed to belong to the AluS subfamily using high stringency subfamily classification criteria. Of these, the majority (77%, 37/48) appear to be deletion polymorphisms. Two polymorphic AluS elements (4%) have features of non-classical Alu insertions and one polymorphic AluS element (2%) likely inserted by a mechanism involving internal priming. Seven AluS polymorphisms (15%) appear to have arisen by the classical target-primed reverse transcription (TPRT) retrotransposition mechanism. These seven TPRT products are 3' intact with 3' poly-A tails, and are flanked by target site duplications; L1 ORF2p endonuclease cleavage sites were also observed, providing additional evidence that these are L1 ORF2p endonuclease-mediated TPRT insertions. Further sequence analysis showed strong conservation of both the RNA polymerase III promoter and SRP9/14 binding sites, important for mediating transcription and interaction with retrotransposition machinery, respectively. This conservation of functional features implies that some of these are fairly recent insertions since they have not diverged significantly from their respective retrotranspositionally competent source elements. Of the polymorphic AluS elements evaluated in this report, 15% (7/48) have features consistent with TPRT-mediated insertion, thus suggesting that some AluS elements have been more active recently than previously thought, or that fixation of AluS insertion alleles remains incomplete. These data expand the potential significance of polymorphic AluS elements in contributing to structural variation in the human genome. Future discovery efforts focusing on polymorphic AluS elements are likely to identify more such polymorphisms, and approaches tailored to identify deletion alleles may be warranted.

  6. Small RNAs, big impact: small RNA pathways in transposon control and their effect on the host stress response.

    PubMed

    Wheeler, Bayly S

    2013-12-01

    Transposons are mobile genetic elements that are a major constituent of most genomes. Organisms regulate transposable element expression, transposition, and insertion site preference, mitigating the genome instability caused by uncontrolled transposition. A recent burst of research has demonstrated the critical role of small non-coding RNAs in regulating transposition in fungi, plants, and animals. While mechanistically distinct, these pathways work through a conserved paradigm. The presence of a transposon is communicated by the presence of its RNA or by its integration into specific genomic loci. These signals are then translated into small non-coding RNAs that guide epigenetic modifications and gene silencing back to the transposon. In addition to being regulated by the host, transposable elements are themselves capable of influencing host gene expression. Transposon expression is responsive to environmental signals, and many transposons are activated by various cellular stresses. TEs can confer local gene regulation by acting as enhancers and can also confer global gene regulation through their non-coding RNAs. Thus, transposable elements can act as stress-responsive regulators that control host gene expression in cis and trans.

  7. Targeted Capture Sequencing in Whitebark Pine Reveals Range-Wide Demographic and Adaptive Patterns Despite Challenges of a Large, Repetitive Genome.

    PubMed

    Syring, John V; Tennessen, Jacob A; Jennings, Tara N; Wegrzyn, Jill; Scelfo-Dalbey, Camille; Cronn, Richard

    2016-01-01

    Whitebark pine (Pinus albicaulis) inhabits an expansive range in western North America, and it is a keystone species of subalpine environments. Whitebark is susceptible to multiple threats - climate change, white pine blister rust, mountain pine beetle, and fire exclusion - and it is suffering significant mortality range-wide, prompting the tree to be listed as 'globally endangered' by the International Union for Conservation of Nature and 'endangered' by the Canadian government. Conservation collections (in situ and ex situ) are being initiated to preserve the genetic legacy of the species. Reliable, transferrable, and highly variable genetic markers are essential for quantifying the genetic profiles of seed collections relative to natural stands, and ensuring the completeness of conservation collections. We evaluated the use of hybridization-based target capture to enrich specific genomic regions from the 27 GB genome of whitebark pine, and to evaluate genetic variation across loci, trees, and geography. Probes were designed to capture 7,849 distinct genes, and screening was performed on 48 trees. Despite the inclusion of repetitive elements in the probe pool, the resulting dataset provided information on 4,452 genes and 32% of targeted positions (528,873 bp), and we were able to identify 12,390 segregating sites from 47 trees. Variations reveal strong geographic trends in heterozygosity and allelic richness, with trees from the southern Cascade and Sierra Range showing the greatest distinctiveness and differentiation. Our results show that even under non-optimal conditions (low enrichment efficiency; inclusion of repetitive elements in baits), targeted enrichment produces high quality, codominant genotypes from large genomes. The resulting data can be readily integrated into management and gene conservation activities for whitebark pine, and have the potential to be applied to other members of 5-needle pine group (Pinus subsect. Quinquefolia) due to their limited genetic divergence.

  8. Virophages, polintons, and transpovirons: a complex evolutionary network of diverse selfish genetic elements with different reproduction strategies.

    PubMed

    Yutin, Natalya; Raoult, Didier; Koonin, Eugene V

    2013-05-23

    Recent advances of genomics and metagenomics reveal remarkable diversity of viruses and other selfish genetic elements. In particular, giant viruses have been shown to possess their own mobilomes that include virophages, small viruses that parasitize on giant viruses of the Mimiviridae family, and transpovirons, distinct linear plasmids. One of the virophages known as the Mavirus, a parasite of the giant Cafeteria roenbergensis virus, shares several genes with large eukaryotic self-replicating transposon of the Polinton (Maverick) family, and it has been proposed that the polintons evolved from a Mavirus-like ancestor. We performed a comprehensive phylogenomic analysis of the available genomes of virophages and traced the evolutionary connections between the virophages and other selfish genetic elements. The comparison of the gene composition and genome organization of the virophages reveals 6 conserved, core genes that are organized in partially conserved arrays. Phylogenetic analysis of those core virophage genes, for which a sufficient diversity of homologs outside the virophages was detected, including the maturation protease and the packaging ATPase, supports the monophyly of the virophages. The results of this analysis appear incompatible with the origin of polintons from a Mavirus-like agent but rather suggest that Mavirus evolved through recombination between a polinton and an unknown virus. Altogether, virophages, polintons, a distinct Tetrahymena transposable element Tlr1, transpovirons, adenoviruses, and some bacteriophages form a network of evolutionary relationships that is held together by overlapping sets of shared genes and appears to represent a distinct module in the vast total network of viruses and mobile elements. The results of the phylogenomic analysis of the virophages and related genetic elements are compatible with the concept of network-like evolution of the virus world and emphasize multiple evolutionary connections between bona fide viruses and other classes of capsid-less mobile elements.

  9. Virophages, polintons, and transpovirons: a complex evolutionary network of diverse selfish genetic elements with different reproduction strategies

    PubMed Central

    2013-01-01

    Background Recent advances of genomics and metagenomics reveal remarkable diversity of viruses and other selfish genetic elements. In particular, giant viruses have been shown to possess their own mobilomes that include virophages, small viruses that parasitize on giant viruses of the Mimiviridae family, and transpovirons, distinct linear plasmids. One of the virophages known as the Mavirus, a parasite of the giant Cafeteria roenbergensis virus, shares several genes with large eukaryotic self-replicating transposon of the Polinton (Maverick) family, and it has been proposed that the polintons evolved from a Mavirus-like ancestor. Results We performed a comprehensive phylogenomic analysis of the available genomes of virophages and traced the evolutionary connections between the virophages and other selfish genetic elements. The comparison of the gene composition and genome organization of the virophages reveals 6 conserved, core genes that are organized in partially conserved arrays. Phylogenetic analysis of those core virophage genes, for which a sufficient diversity of homologs outside the virophages was detected, including the maturation protease and the packaging ATPase, supports the monophyly of the virophages. The results of this analysis appear incompatible with the origin of polintons from a Mavirus-like agent but rather suggest that Mavirus evolved through recombination between a polinton and an unknownvirus. Altogether, virophages, polintons, a distinct Tetrahymena transposable element Tlr1, transpovirons, adenoviruses, and some bacteriophages form a network of evolutionary relationships that is held together by overlapping sets of shared genes and appears to represent a distinct module in the vast total network of viruses and mobile elements. Conclusions The results of the phylogenomic analysis of the virophages and related genetic elements are compatible with the concept of network-like evolution of the virus world and emphasize multiple evolutionary connections between bona fide viruses and other classes of capsid-less mobile elements. PMID:23701946

  10. Feather Development Genes and Associated Regulatory Innovation Predate the Origin of Dinosauria

    PubMed Central

    Lowe, Craig B.; Clarke, Julia A.; Baker, Allan J.; Haussler, David; Edwards, Scott V.

    2015-01-01

    The evolution of avian feathers has recently been illuminated by fossils and the identification of genes involved in feather patterning and morphogenesis. However, molecular studies have focused mainly on protein-coding genes. Using comparative genomics and more than 600,000 conserved regulatory elements, we show that patterns of genome evolution in the vicinity of feather genes are consistent with a major role for regulatory innovation in the evolution of feathers. Rates of innovation at feather regulatory elements exhibit an extended period of innovation with peaks in the ancestors of amniotes and archosaurs. We estimate that 86% of such regulatory elements and 100% of the nonkeratin feather gene set were present prior to the origin of Dinosauria. On the branch leading to modern birds, we detect a strong signal of regulatory innovation near insulin-like growth factor binding protein (IGFBP) 2 and IGFBP5, which have roles in body size reduction, and may represent a genomic signature for the miniaturization of dinosaurian body size preceding the origin of flight. PMID:25415961

  11. An Enhancer Near ISL1 and an Ultraconserved Exon of PCBP2 areDerived from a Retroposon

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bejerano, Gill; Lowe, Craig; Ahituv, Nadav

    2005-11-27

    Hundreds of highly conserved distal cis-regulatory elementshave been characterized to date in vertebrate genomes1. Many thousandsmore are predicted based on comparative genomics2,3. Yet, in starkcontrast to the genes they regulate, virtually none of these regions canbe traced using sequence similarity in invertebrates, leaving theirevolutionary origin obscure. Here we show that a class of conserved,primarily non-coding regions in tetrapods originated from a novel shortinterspersed repetitive element (SINE) retroposon family that was activein Sarcopterygii (lobe-finned fishes and terrestrial vertebrates) in theSilurian at least 410 Mya4, and, remarkably, appears to be recentlyactive in the "living fossil" Indonesian coelacanth, Latimeriamenadoensis. We show that onemore » copy is a distal enhancer, located 500kbfrom the neuro-developmental gene ISL1. Several others represent new,possibly regulatory, alternatively spliced exons in the middle ofpre-existing Sarcopterygian genes. One of these is the>200bpultraconserved region5, 100 percent identical in mammals, and 80 percentidentical to the coelacanth SINE, that contains a 31aa alternativelyspliced exon of the mRNA processing gene PCBP26. These add to a growinglist of examples7 in which relics of transposable elements have acquireda function that serves their host, a process termed "exaptation"8, andprovide an origin for at least some of the highly-conservedvertebrate-specific genomic sequences recently discovered usingcomparative genomics.« less

  12. Tibrogargan and Coastal Plains rhabdoviruses: genomic characterization, evolution of novel genes and seroprevalence in Australian livestock.

    PubMed

    Gubala, Aneta; Davis, Steven; Weir, Richard; Melville, Lorna; Cowled, Chris; Boyle, David

    2011-09-01

    Tibrogargan virus (TIBV) and Coastal Plains virus (CPV) were isolated from cattle in Australia and TIBV has also been isolated from the biting midge Culicoides brevitarsis. Complete genomic sequencing revealed that the viruses share a novel genome structure within the family Rhabdoviridae, each virus containing two additional putative genes between the matrix protein (M) and glycoprotein (G) genes and one between the G and viral RNA polymerase (L) genes. The predicted novel protein products are highly diverged at the sequence level but demonstrate clear conservation of secondary structure elements, suggesting conservation of biological functions. Phylogenetic analyses showed that TIBV and CPV form an independent group within the 'dimarhabdovirus supergroup'. Although no disease has been observed in association with these viruses, antibodies were detected at high prevalence in cattle and buffalo in northern Australia, indicating the need for disease monitoring and further study of this distinctive group of viruses.

  13. Mutations that alter a conserved element upstream of the potato virus X triple block and coat protein genes affect subgenomic RNA accumulation.

    PubMed

    Kim, K H; Hemenway, C

    1997-05-26

    The putative subgenomic RNA (sgRNA) promoter regions upstream of the potato virus X (PVX) triple block and coat protein (CP) genes contain sequences common to other potexviruses. The importance of these sequences to PVX sgRNA accumulation was determined by inoculation of Nicotiana tabacum NT1 cell suspension protoplasts with transcripts derived from wild-type and modified PVX cDNA clones. Analyses of RNA accumulation by S1 nuclease digestion and primer extension indicated that a conserved octanucleotide sequence element and the spacing between this element and the start-site for sgRNA synthesis are critical for accumulation of the two major sgRNA species. The impact of mutations on CP sgRNA levels was also reflected in the accumulation of CP. In contrast, genomic minus- and plus-strand RNA accumulation were not significantly affected by mutations in these regions. Studies involving inoculation of tobacco plants with the modified transcripts suggested that the conserved octanucleotide element functions in sgRNA accumulation and some other aspect of the infection process.

  14. Rose spring dwarf-associated virus has RNA structural and gene-expression features like those of Barley yellow dwarf virus

    PubMed Central

    Salem, Nida’ M.; Miller, W. Allen; Rowhani, Adib; Golino, Deborah A.; Moyne, Anne-Laure; Falk, Bryce W.

    2015-01-01

    We determined the complete nucleotide sequence of the Rose spring dwarf-associated virus (RSDaV) genomic RNA (GenBank accession no. EU024678) and compared its predicted RNA structural characteristics affecting gene expression. A cDNA library was derived from RSDaV double-stranded RNAs (dsRNAs) purified from infected tissue. Nucleotide sequence analysis of the cloned cDNAs, plus for clones generated by 5′- and 3′-RACE showed the RSDaV genomic RNA to be 5,808 nucleotides. The genomic RNA contains five major open reading frames (ORFs), and three small ORFs in the 3′-terminal 800 nucleotides, typical for viruses of genus Luteovirus in the family Luteoviridae. Northern blot hybridization analysis revealed the genomic RNA and two prominent subgenomic RNAs of approximately 3 kb and 1 kb. Putative 5′ ends of the sgRNAs were predicted by identification of conserved sequences and secondary structures which resembled the Barley yellow dwarf virus (BYDV) genomic RNA 5′ end and subgenomic RNA promoter sequences. Secondary structures of the BYDV-like ribosomal frameshift elements and cap-independent translation elements, including long-distance base pairing spanning four kb were identified. These contain similarities but also informative differences with the BYDV structures, including a strikingly different structure predicted for the 3′ cap-independent translation element. These analyses of the RSDaV genomic RNA show more complexity for the RNA structural elements for members of the Luteoviridae. PMID:18329064

  15. Rose spring dwarf-associated virus has RNA structural and gene-expression features like those of Barley yellow dwarf virus.

    PubMed

    Salem, Nida' M; Miller, W Allen; Rowhani, Adib; Golino, Deborah A; Moyne, Anne-Laure; Falk, Bryce W

    2008-06-05

    We determined the complete nucleotide sequence of the Rose spring dwarf-associated virus (RSDaV) genomic RNA (GenBank accession no. EU024678) and compared its predicted RNA structural characteristics affecting gene expression. A cDNA library was derived from RSDaV double-stranded RNAs (dsRNAs) purified from infected tissue. Nucleotide sequence analysis of the cloned cDNAs, plus for clones generated by 5'- and 3'-RACE showed the RSDaV genomic RNA to be 5808 nucleotides. The genomic RNA contains five major open reading frames (ORFs), and three small ORFs in the 3'-terminal 800 nucleotides, typical for viruses of genus Luteovirus in the family Luteoviridae. Northern blot hybridization analysis revealed the genomic RNA and two prominent subgenomic RNAs of approximately 3 kb and 1 kb. Putative 5' ends of the sgRNAs were predicted by identification of conserved sequences and secondary structures which resembled the Barley yellow dwarf virus (BYDV) genomic RNA 5' end and subgenomic RNA promoter sequences. Secondary structures of the BYDV-like ribosomal frameshift elements and cap-independent translation elements, including long-distance base pairing spanning four kb were identified. These contain similarities but also informative differences with the BYDV structures, including a strikingly different structure predicted for the 3' cap-independent translation element. These analyses of the RSDaV genomic RNA show more complexity for the RNA structural elements for members of the Luteoviridae.

  16. Conserved Non-Coding Regulatory Signatures in Arabidopsis Co-Expressed Gene Modules

    PubMed Central

    Spangler, Jacob B.; Ficklin, Stephen P.; Luo, Feng; Freeling, Michael; Feltus, F. Alex

    2012-01-01

    Complex traits and other polygenic processes require coordinated gene expression. Co-expression networks model mRNA co-expression: the product of gene regulatory networks. To identify regulatory mechanisms underlying coordinated gene expression in a tissue-enriched context, ten Arabidopsis thaliana co-expression networks were constructed after manually sorting 4,566 RNA profiling datasets into aerial, flower, leaf, root, rosette, seedling, seed, shoot, whole plant, and global (all samples combined) groups. Collectively, the ten networks contained 30% of the measurable genes of Arabidopsis and were circumscribed into 5,491 modules. Modules were scrutinized for cis regulatory mechanisms putatively encoded in conserved non-coding sequences (CNSs) previously identified as remnants of a whole genome duplication event. We determined the non-random association of 1,361 unique CNSs to 1,904 co-expression network gene modules. Furthermore, the CNS elements were placed in the context of known gene regulatory networks (GRNs) by connecting 250 CNS motifs with known GRN cis elements. Our results provide support for a regulatory role of some CNS elements and suggest the functional consequences of CNS activation of co-expression in specific gene sets dispersed throughout the genome. PMID:23024789

  17. Conserved non-coding regulatory signatures in Arabidopsis co-expressed gene modules.

    PubMed

    Spangler, Jacob B; Ficklin, Stephen P; Luo, Feng; Freeling, Michael; Feltus, F Alex

    2012-01-01

    Complex traits and other polygenic processes require coordinated gene expression. Co-expression networks model mRNA co-expression: the product of gene regulatory networks. To identify regulatory mechanisms underlying coordinated gene expression in a tissue-enriched context, ten Arabidopsis thaliana co-expression networks were constructed after manually sorting 4,566 RNA profiling datasets into aerial, flower, leaf, root, rosette, seedling, seed, shoot, whole plant, and global (all samples combined) groups. Collectively, the ten networks contained 30% of the measurable genes of Arabidopsis and were circumscribed into 5,491 modules. Modules were scrutinized for cis regulatory mechanisms putatively encoded in conserved non-coding sequences (CNSs) previously identified as remnants of a whole genome duplication event. We determined the non-random association of 1,361 unique CNSs to 1,904 co-expression network gene modules. Furthermore, the CNS elements were placed in the context of known gene regulatory networks (GRNs) by connecting 250 CNS motifs with known GRN cis elements. Our results provide support for a regulatory role of some CNS elements and suggest the functional consequences of CNS activation of co-expression in specific gene sets dispersed throughout the genome.

  18. Evolution of Two Short Interspersed Elements in Callorhinchus milii (Chondrichthyes, Holocephali) and Related Elements in Sharks and the Coelacanth.

    PubMed

    Luchetti, Andrea; Plazzi, Federico; Mantovani, Barbara

    2017-06-01

    Short interspersed elements (SINEs) are non-autonomous retrotransposons. Although they usually show fast evolutionary rates, in some instances highly conserved domains (HCDs) have been observed in elements with otherwise divergent sequences and from distantly related species. Here, we document the life history of two HCD-SINE families in the elephant shark Callorhinchus milii, one specific to the holocephalan lineage (CmiSINEs) and another one (SacSINE1-CM) with homologous elements in sharks and the coelacanth (SacSINE1s, LmeSINE1s). The analyses of their relationships indicated that these elements share the same 3'-tail, which would have allowed both elements to rise to high copy number by exploiting the C. milii L2-2_CM long interspersed element (LINE) enzymes. Molecular clock analysis on SINE activity in C. milii genome evidenced two replication bursts occurring right after two major events in the holocephalan evolution: the end-Permian mass extinction and the radiation of modern Holocephali. Accordingly, the same analysis on the coelacanth homologous elements, LmeSINE1, identified a replication wave close to the split age of the two extant Latimeria species. The genomic distribution of the studied SINEs pointed out contrasting results: some elements were preferentially sorted out from gene regions, but accumulated in flanking regions, while others appear more conserved within genes. Moreover, data from the C. milii transcriptome suggest that these SINEs could be involved in miRNA biogenesis and may be targets for miRNA-based regulation. © The Author 2017. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.

  19. Comparative Genomics in Drosophila.

    PubMed

    Oti, Martin; Pane, Attilio; Sammeth, Michael

    2018-01-01

    Since the pioneering studies of Thomas Hunt Morgan and coworkers at the dawn of the twentieth century, Drosophila melanogaster and its sister species have tremendously contributed to unveil the rules underlying animal genetics, development, behavior, evolution, and human disease. Recent advances in DNA sequencing technologies launched Drosophila into the post-genomic era and paved the way for unprecedented comparative genomics investigations. The complete sequencing and systematic comparison of the genomes from 12 Drosophila species represents a milestone achievement in modern biology, which allowed a plethora of different studies ranging from the annotation of known and novel genomic features to the evolution of chromosomes and, ultimately, of entire genomes. Despite the efforts of countless laboratories worldwide, the vast amount of data that were produced over the past 15 years is far from being fully explored.In this chapter, we will review some of the bioinformatic approaches that were developed to interrogate the genomes of the 12 Drosophila species. Setting off from alignments of the entire genomic sequences, the degree of conservation can be separately evaluated for every region of the genome, providing already first hints about elements that are under purifying selection and therefore likely functional. Furthermore, the careful analysis of repeated sequences sheds light on the evolutionary dynamics of transposons, an enigmatic and fascinating class of mobile elements housed in the genomes of animals and plants. Comparative genomics also aids in the computational identification of the transcriptionally active part of the genome, first and foremost of protein-coding loci, but also of transcribed nevertheless apparently noncoding regions, which were once considered "junk" DNA. Eventually, the synergy between functional and comparative genomics also facilitates in silico and in vivo studies on cis-acting regulatory elements, like transcription factor binding sites, that due to the high degree of sequence variability usually impose increased challenges for bioinformatics approaches.

  20. De novo mutations in regulatory elements in neurodevelopmental disorders

    PubMed Central

    Short, Patrick J.; McRae, Jeremy F.; Gallone, Giuseppe; Sifrim, Alejandro; Won, Hyejung; Geschwind, Daniel H.; Wright, Caroline F.; Firth, Helen V; FitzPatrick, David R.; Barrett, Jeffrey C.; Hurles, Matthew E.

    2018-01-01

    We previously estimated that 42% of patients with severe developmental disorders carry pathogenic de novo mutations in coding sequences. The role of de novo mutations in regulatory elements affecting genes associated with developmental disorders, or other genes, has been essentially unexplored. We identified de novo mutations in three classes of putative regulatory elements in almost 8,000 patients with developmental disorders. Here we show that de novo mutations in highly evolutionarily conserved fetal brain-active elements are significantly and specifically enriched in neurodevelopmental disorders. We identified a significant twofold enrichment of recurrently mutated elements. We estimate that, genome-wide, 1-3% of patients without a diagnostic coding variant carry pathogenic de novo mutations in fetal brain-active regulatory elements and that only 0.15% of all possible mutations within highly conserved fetal brain-active elements cause neurodevelopmental disorders with a dominant mechanism. Our findings represent a robust estimate of the contribution of de novo mutations in regulatory elements to this genetically heterogeneous set of disorders, and emphasize the importance of combining functional and evolutionary evidence to identify regulatory causes of genetic disorders. PMID:29562236

  1. Orangutan Alu quiescence reveals possible source element: support for ancient backseat drivers

    PubMed Central

    2012-01-01

    Background Sequence analysis of the orangutan genome revealed that recent proliferative activity of Alu elements has been uncharacteristically quiescent in the Pongo (orangutan) lineage, compared with all previously studied primate genomes. With relatively few young polymorphic insertions, the genomic landscape of the orangutan seemed like the ideal place to search for a driver, or source element, of Alu retrotransposition. Results Here we report the identification of a nearly pristine insertion possessing all the known putative hallmarks of a retrotranspositionally competent Alu element. It is located in an intronic sequence of the DGKB gene on chromosome 7 and is highly conserved in Hominidae (the great apes), but absent from Hylobatidae (gibbon and siamang). We provide evidence for the evolution of a lineage-specific subfamily of this shared Alu insertion in orangutans and possibly the lineage leading to humans. In the orangutan genome, this insertion contains three orangutan-specific diagnostic mutations which are characteristic of the youngest polymorphic Alu subfamily, AluYe5b5_Pongo. In the Homininae lineage (human, chimpanzee and gorilla), this insertion has acquired three different mutations which are also found in a single human-specific Alu insertion. Conclusions This seemingly stealth-like amplification, ongoing at a very low rate over millions of years of evolution, suggests that this shared insertion may represent an ancient backseat driver of Alu element expansion. PMID:22541534

  2. Orangutan Alu quiescence reveals possible source element: support for ancient backseat drivers.

    PubMed

    Walker, Jerilyn A; Konkel, Miriam K; Ullmer, Brygg; Monceaux, Christopher P; Ryder, Oliver A; Hubley, Robert; Smit, Arian Fa; Batzer, Mark A

    2012-04-30

    Sequence analysis of the orangutan genome revealed that recent proliferative activity of Alu elements has been uncharacteristically quiescent in the Pongo (orangutan) lineage, compared with all previously studied primate genomes. With relatively few young polymorphic insertions, the genomic landscape of the orangutan seemed like the ideal place to search for a driver, or source element, of Alu retrotransposition. Here we report the identification of a nearly pristine insertion possessing all the known putative hallmarks of a retrotranspositionally competent Alu element. It is located in an intronic sequence of the DGKB gene on chromosome 7 and is highly conserved in Hominidae (the great apes), but absent from Hylobatidae (gibbon and siamang). We provide evidence for the evolution of a lineage-specific subfamily of this shared Alu insertion in orangutans and possibly the lineage leading to humans. In the orangutan genome, this insertion contains three orangutan-specific diagnostic mutations which are characteristic of the youngest polymorphic Alu subfamily, AluYe5b5_Pongo. In the Homininae lineage (human, chimpanzee and gorilla), this insertion has acquired three different mutations which are also found in a single human-specific Alu insertion. This seemingly stealth-like amplification, ongoing at a very low rate over millions of years of evolution, suggests that this shared insertion may represent an ancient backseat driver of Alu element expansion.

  3. Comparative genomics reveals insights into avian genome evolution and adaptation.

    PubMed

    Zhang, Guojie; Li, Cai; Li, Qiye; Li, Bo; Larkin, Denis M; Lee, Chul; Storz, Jay F; Antunes, Agostinho; Greenwold, Matthew J; Meredith, Robert W; Ödeen, Anders; Cui, Jie; Zhou, Qi; Xu, Luohao; Pan, Hailin; Wang, Zongji; Jin, Lijun; Zhang, Pei; Hu, Haofu; Yang, Wei; Hu, Jiang; Xiao, Jin; Yang, Zhikai; Liu, Yang; Xie, Qiaolin; Yu, Hao; Lian, Jinmin; Wen, Ping; Zhang, Fang; Li, Hui; Zeng, Yongli; Xiong, Zijun; Liu, Shiping; Zhou, Long; Huang, Zhiyong; An, Na; Wang, Jie; Zheng, Qiumei; Xiong, Yingqi; Wang, Guangbiao; Wang, Bo; Wang, Jingjing; Fan, Yu; da Fonseca, Rute R; Alfaro-Núñez, Alonzo; Schubert, Mikkel; Orlando, Ludovic; Mourier, Tobias; Howard, Jason T; Ganapathy, Ganeshkumar; Pfenning, Andreas; Whitney, Osceola; Rivas, Miriam V; Hara, Erina; Smith, Julia; Farré, Marta; Narayan, Jitendra; Slavov, Gancho; Romanov, Michael N; Borges, Rui; Machado, João Paulo; Khan, Imran; Springer, Mark S; Gatesy, John; Hoffmann, Federico G; Opazo, Juan C; Håstad, Olle; Sawyer, Roger H; Kim, Heebal; Kim, Kyu-Won; Kim, Hyeon Jeong; Cho, Seoae; Li, Ning; Huang, Yinhua; Bruford, Michael W; Zhan, Xiangjiang; Dixon, Andrew; Bertelsen, Mads F; Derryberry, Elizabeth; Warren, Wesley; Wilson, Richard K; Li, Shengbin; Ray, David A; Green, Richard E; O'Brien, Stephen J; Griffin, Darren; Johnson, Warren E; Haussler, David; Ryder, Oliver A; Willerslev, Eske; Graves, Gary R; Alström, Per; Fjeldså, Jon; Mindell, David P; Edwards, Scott V; Braun, Edward L; Rahbek, Carsten; Burt, David W; Houde, Peter; Zhang, Yong; Yang, Huanming; Wang, Jian; Jarvis, Erich D; Gilbert, M Thomas P; Wang, Jun

    2014-12-12

    Birds are the most species-rich class of tetrapod vertebrates and have wide relevance across many research fields. We explored bird macroevolution using full genomes from 48 avian species representing all major extant clades. The avian genome is principally characterized by its constrained size, which predominantly arose because of lineage-specific erosion of repetitive elements, large segmental deletions, and gene loss. Avian genomes furthermore show a remarkably high degree of evolutionary stasis at the levels of nucleotide sequence, gene synteny, and chromosomal structure. Despite this pattern of conservation, we detected many non-neutral evolutionary changes in protein-coding genes and noncoding regions. These analyses reveal that pan-avian genomic diversity covaries with adaptations to different lifestyles and convergent evolution of traits. Copyright © 2014, American Association for the Advancement of Science.

  4. A genomic view of 500 million years of cnidarian evolution

    PubMed Central

    Steele, Robert E.; David, Charles N.; Technau, Ulrich

    2010-01-01

    Cnidarians (corals, anemones, jellyfish, and hydras) are a diverse group of animals of interest to evolutionary biologists, ecologists, and developmental biologists. With the publication of the genome sequences of Hydra and Nematostella, whose last common ancestor was the stem cnidarian, we are beginning to see the genomic underpinnings of cnidarian biology. Cnidarians are known for the remarkable plasticity of their morphology and life cycles. This plasticity is reflected in the Hydra and Nematostella genomes, which differ to an exceptional degree in size, base composition, transposable element content, and gene conservation. We now know what cnidarian genomes are capable of doing given 500 million years; the next challenge is to understand how this genomic history has led to the striking diversity we see in cnidarians. PMID:21047698

  5. Comparative Analysis of the Genomes of Two Field Isolates of the Rice Blast Fungus Magnaporthe oryzae

    PubMed Central

    Li, Zhigang; Hu, Songnian; Yao, Nan; Dean, Ralph A.; Zhao, Wensheng; Shen, Mi; Zhang, Haiwang; Li, Chao; Liu, Liyuan; Cao, Lei; Xu, Xiaowen; Xing, Yunfei; Hsiang, Tom; Zhang, Ziding; Xu, Jin-Rong; Peng, You-Liang

    2012-01-01

    Rice blast caused by Magnaporthe oryzae is one of the most destructive diseases of rice worldwide. The fungal pathogen is notorious for its ability to overcome host resistance. To better understand its genetic variation in nature, we sequenced the genomes of two field isolates, Y34 and P131. In comparison with the previously sequenced laboratory strain 70-15, both field isolates had a similar genome size but slightly more genes. Sequences from the field isolates were used to improve genome assembly and gene prediction of 70-15. Although the overall genome structure is similar, a number of gene families that are likely involved in plant-fungal interactions are expanded in the field isolates. Genome-wide analysis on asynonymous to synonymous nucleotide substitution rates revealed that many infection-related genes underwent diversifying selection. The field isolates also have hundreds of isolate-specific genes and a number of isolate-specific gene duplication events. Functional characterization of randomly selected isolate-specific genes revealed that they play diverse roles, some of which affect virulence. Furthermore, each genome contains thousands of loci of transposon-like elements, but less than 30% of them are conserved among different isolates, suggesting active transposition events in M. oryzae. A total of approximately 200 genes were disrupted in these three strains by transposable elements. Interestingly, transposon-like elements tend to be associated with isolate-specific or duplicated sequences. Overall, our results indicate that gain or loss of unique genes, DNA duplication, gene family expansion, and frequent translocation of transposon-like elements are important factors in genome variation of the rice blast fungus. PMID:22876203

  6. Structure, replication efficiency and fragility of yeast ARS elements.

    PubMed

    Dhar, Manoj K; Sehgal, Shelly; Kaul, Sanjana

    2012-05-01

    DNA replication in eukaryotes initiates at specific sites known as origins of replication, or replicators. These replication origins occur throughout the genome, though the propensity of their occurrence depends on the type of organism. In eukaryotes, zones of initiation of replication spanning from about 100 to 50,000 base pairs have been reported. The characteristics of eukaryotic replication origins are best understood in the budding yeast Saccharomyces cerevisiae, where some autonomously replicating sequences, or ARS elements, confer origin activity. ARS elements are short DNA sequences of a few hundred base pairs, identified by their efficiency at initiating a replication event when cloned in a plasmid. ARS elements, although structurally diverse, maintain a basic structure composed of three domains, A, B and C. Domain A is comprised of a consensus sequence designated ACS (ARS consensus sequence), while the B domain has the DNA unwinding element and the C domain is important for DNA-protein interactions. Although there are ∼400 ARS elements in the yeast genome, not all of them are active origins of replication. Different groups within the genus Saccharomyces have ARS elements as components of replication origin. The present paper provides a comprehensive review of various aspects of ARSs, starting from their structural conservation to sequence thermodynamics. All significant and conserved functional sequence motifs within different types of ARS elements have been extensively described. Issues like silencing at ARSs, their inherent fragility and factors governing their replication efficiency have also been addressed. Progress in understanding crucial components associated with the replication machinery and timing at these ARS elements is discussed in the section entitled "The replicon revisited". Copyright © 2012 Institut Pasteur. Published by Elsevier Masson SAS. All rights reserved.

  7. GREAM: A Web Server to Short-List Potentially Important Genomic Repeat Elements Based on Over-/Under-Representation in Specific Chromosomal Locations, Such as the Gene Neighborhoods, within or across 17 Mammalian Species

    PubMed Central

    Chandrashekar, Darshan Shimoga; Dey, Poulami; Acharya, Kshitish K.

    2015-01-01

    Background Genome-wide repeat sequences, such as LINEs, SINEs and LTRs share a considerable part of the mammalian nuclear genomes. These repeat elements seem to be important for multiple functions including the regulation of transcription initiation, alternative splicing and DNA methylation. But it is not possible to study all repeats and, hence, it would help to short-list before exploring their potential functional significance via experimental studies and/or detailed in silico analyses. Result We developed the ‘Genomic Repeat Element Analyzer for Mammals’ (GREAM) for analysis, screening and selection of potentially important mammalian genomic repeats. This web-server offers many novel utilities. For example, this is the only tool that can reveal a categorized list of specific types of transposons, retro-transposons and other genome-wide repetitive elements that are statistically over-/under-represented in regions around a set of genes, such as those expressed differentially in a disease condition. The output displays the position and frequency of identified elements within the specified regions. In addition, GREAM offers two other types of analyses of genomic repeat sequences: a) enrichment within chromosomal region(s) of interest, and b) comparative distribution across the neighborhood of orthologous genes. GREAM successfully short-listed a repeat element (MER20) known to contain functional motifs. In other case studies, we could use GREAM to short-list repetitive elements in the azoospermia factor a (AZFa) region of the human Y chromosome and those around the genes associated with rat liver injury. GREAM could also identify five over-represented repeats around some of the human and mouse transcription factor coding genes that had conserved expression patterns across the two species. Conclusion GREAM has been developed to provide an impetus to research on the role of repetitive sequences in mammalian genomes by offering easy selection of more interesting repeats in various contexts/regions. GREAM is freely available at http://resource.ibab.ac.in/GREAM/. PMID:26208093

  8. Small Deletion Variants Have Stable Breakpoints Commonly Associated with Alu Elements

    PubMed Central

    Coin, Lachlan J. M.; Steinfeld, Israel; Yakhini, Zohar; Sladek, Rob; Froguel, Philippe; Blakemore, Alexandra I. F.

    2008-01-01

    Copy number variants (CNVs) contribute significantly to human genomic variation, with over 5000 loci reported, covering more than 18% of the euchromatic human genome. Little is known, however, about the origin and stability of variants of different size and complexity. We investigated the breakpoints of 20 small, common deletions, representing a subset of those originally identified by array CGH, using Agilent microarrays, in 50 healthy French Caucasian subjects. By sequencing PCR products amplified using primers designed to span the deleted regions, we determined the exact size and genomic position of the deletions in all affected samples. For each deletion studied, all individuals carrying the deletion share identical upstream and downstream breakpoints at the sequence level, suggesting that the deletion event occurred just once and later became common in the population. This is supported by linkage disequilibrium (LD) analysis, which has revealed that most of the deletions studied are in moderate to strong LD with surrounding SNPs, and have conserved long-range haplotypes. Analysis of the sequences flanking the deletion breakpoints revealed an enrichment of microhomology at the breakpoint junctions. More significantly, we found an enrichment of Alu repeat elements, the overwhelming majority of which intersected deletion breakpoints at their poly-A tails. We found no enrichment of LINE elements or segmental duplications, in contrast to other reports. Sequence analysis revealed enrichment of a conserved motif in the sequences surrounding the deletion breakpoints, although whether this motif has any mechanistic role in the formation of some deletions has yet to be determined. Considered together with existing information on more complex inherited variant regions, and reports of de novo variants associated with autism, these data support the presence of different subgroups of CNV in the genome which may have originated through different mechanisms. PMID:18769679

  9. Partitioning heritability by functional annotation using genome-wide association summary statistics.

    PubMed

    Finucane, Hilary K; Bulik-Sullivan, Brendan; Gusev, Alexander; Trynka, Gosia; Reshef, Yakir; Loh, Po-Ru; Anttila, Verneri; Xu, Han; Zang, Chongzhi; Farh, Kyle; Ripke, Stephan; Day, Felix R; Purcell, Shaun; Stahl, Eli; Lindstrom, Sara; Perry, John R B; Okada, Yukinori; Raychaudhuri, Soumya; Daly, Mark J; Patterson, Nick; Neale, Benjamin M; Price, Alkes L

    2015-11-01

    Recent work has demonstrated that some functional categories of the genome contribute disproportionately to the heritability of complex diseases. Here we analyze a broad set of functional elements, including cell type-specific elements, to estimate their polygenic contributions to heritability in genome-wide association studies (GWAS) of 17 complex diseases and traits with an average sample size of 73,599. To enable this analysis, we introduce a new method, stratified LD score regression, for partitioning heritability from GWAS summary statistics while accounting for linked markers. This new method is computationally tractable at very large sample sizes and leverages genome-wide information. Our findings include a large enrichment of heritability in conserved regions across many traits, a very large immunological disease-specific enrichment of heritability in FANTOM5 enhancers and many cell type-specific enrichments, including significant enrichment of central nervous system cell types in the heritability of body mass index, age at menarche, educational attainment and smoking behavior.

  10. Placental Hypomethylation Is More Pronounced in Genomic Loci Devoid of Retroelements

    PubMed Central

    Chatterjee, Aniruddha; Macaulay, Erin C.; Rodger, Euan J.; Stockwell, Peter A.; Parry, Matthew F.; Roberts, Hester E.; Slatter, Tania L.; Hung, Noelyn A.; Devenish, Celia J.; Morison, Ian M.

    2016-01-01

    The human placenta is hypomethylated compared to somatic tissues. However, the degree and specificity of placental hypomethylation across the genome is unclear. We assessed genome-wide methylation of the human placenta and compared it to that of the neutrophil, a representative homogeneous somatic cell. We observed global hypomethylation in placenta (relative reduction of 22%) compared to neutrophils. Placental hypomethylation was pronounced in intergenic regions and gene bodies, while the unmethylated state of the promoter remained conserved in both tissues. For every class of repeat elements, the placenta showed lower methylation but the degree of hypomethylation differed substantially between these classes. However, some retroelements, especially the evolutionarily younger Alu elements, retained high levels of placental methylation. Surprisingly, nonretrotransposon-containing sequences showed a greater degree of placental hypomethylation than retrotransposons in every genomic element (intergenic, introns, and exons) except promoters. The differentially methylated fragments (DMFs) in placenta and neutrophils were enriched in gene-poor and CpG-poor regions. The placentally hypomethylated DMFs were enriched in genomic regions that are usually inactive, whereas hypermethylated DMFs were enriched in active regions. Hypomethylation of the human placenta is not specific to retroelements, indicating that the evolutionary advantages of placental hypomethylation go beyond those provided by expression of retrotransposons and retrogenes. PMID:27172225

  11. Identification, characterization and distribution of transposable elements in the flax (Linum usitatissimum L.) genome.

    PubMed

    González, Leonardo Galindo; Deyholos, Michael K

    2012-11-21

    Flax (Linum usitatissimum L.) is an important crop for the production of bioproducts derived from its seed and stem fiber. Transposable elements (TEs) are widespread in plant genomes and are a key component of their evolution. The availability of a genome assembly of flax (Linum usitatissimum) affords new opportunities to explore the diversity of TEs and their relationship to genes and gene expression. Four de novo repeat identification algorithms (PILER, RepeatScout, LTR_finder and LTR_STRUC) were applied to the flax genome assembly. The resulting library of flax repeats was combined with the RepBase Viridiplantae division and used with RepeatMasker to identify TEs coverage in the genome. LTR retrotransposons were the most abundant TEs (17.2% genome coverage), followed by Long Interspersed Nuclear Element (LINE) retrotransposons (2.10%) and Mutator DNA transposons (1.99%). Comparison of putative flax TEs to flax transcript databases indicated that TEs are not highly expressed in flax. However, the presence of recent insertions, defined by 100% intra-element LTR similarity, provided evidence for recent TE activity. Spatial analysis showed TE-rich regions, gene-rich regions as well as regions with similar genes and TE density. Monte Carlo simulations for the 71 largest scaffolds (≥ 1 Mb each) did not show any regional differences in the frequency of TE overlap with gene coding sequences. However, differences between TE superfamilies were found in their proximity to genes. Genes within TE-rich regions also appeared to have lower transcript expression, based on EST abundance. When LTR elements were compared, Copia showed more diversity, recent insertions and conserved domains than the Gypsy, demonstrating their importance in genome evolution. The calculated 23.06% TE coverage of the flax WGS assembly is at the low end of the range of TE coverages reported in other eudicots, although this estimate does not include TEs likely found in unassembled repetitive regions of the genome. Since enrichment for TEs in genomic regions was associated with reduced expression of neighbouring genes, and many members of the Copia LTR superfamily are inserted close to coding regions, we suggest Copia elements have a greater influence on recent flax genome evolution while Gypsy elements have become residual and highly mutated.

  12. Identification, characterization and distribution of transposable elements in the flax (Linum usitatissimum L.) genome

    PubMed Central

    2012-01-01

    Background Flax (Linum usitatissimum L.) is an important crop for the production of bioproducts derived from its seed and stem fiber. Transposable elements (TEs) are widespread in plant genomes and are a key component of their evolution. The availability of a genome assembly of flax (Linum usitatissimum) affords new opportunities to explore the diversity of TEs and their relationship to genes and gene expression. Results Four de novo repeat identification algorithms (PILER, RepeatScout, LTR_finder and LTR_STRUC) were applied to the flax genome assembly. The resulting library of flax repeats was combined with the RepBase Viridiplantae division and used with RepeatMasker to identify TEs coverage in the genome. LTR retrotransposons were the most abundant TEs (17.2% genome coverage), followed by Long Interspersed Nuclear Element (LINE) retrotransposons (2.10%) and Mutator DNA transposons (1.99%). Comparison of putative flax TEs to flax transcript databases indicated that TEs are not highly expressed in flax. However, the presence of recent insertions, defined by 100% intra-element LTR similarity, provided evidence for recent TE activity. Spatial analysis showed TE-rich regions, gene-rich regions as well as regions with similar genes and TE density. Monte Carlo simulations for the 71 largest scaffolds (≥ 1 Mb each) did not show any regional differences in the frequency of TE overlap with gene coding sequences. However, differences between TE superfamilies were found in their proximity to genes. Genes within TE-rich regions also appeared to have lower transcript expression, based on EST abundance. When LTR elements were compared, Copia showed more diversity, recent insertions and conserved domains than the Gypsy, demonstrating their importance in genome evolution. Conclusions The calculated 23.06% TE coverage of the flax WGS assembly is at the low end of the range of TE coverages reported in other eudicots, although this estimate does not include TEs likely found in unassembled repetitive regions of the genome. Since enrichment for TEs in genomic regions was associated with reduced expression of neighbouring genes, and many members of the Copia LTR superfamily are inserted close to coding regions, we suggest Copia elements have a greater influence on recent flax genome evolution while Gypsy elements have become residual and highly mutated. PMID:23171245

  13. Lineage-specific genomics: Frequent birth and death in the human genome: The human genome contains many lineage-specific elements created by both sequence and functional turnover.

    PubMed

    Young, Robert S

    2016-07-01

    Frequent evolutionary birth and death events have created a large quantity of biologically important, lineage-specific DNA within mammalian genomes. The birth and death of DNA sequences is so frequent that the total number of these insertions and deletions in the human population remains unknown, although there are differences between these groups, e.g. transposable elements contribute predominantly to sequence insertion. Functional turnover - where the activity of a locus is specific to one lineage, but the underlying DNA remains conserved - can also drive birth and death. However, this does not appear to be a major driver of divergent transcriptional regulation. Both sequence and functional turnover have contributed to the birth and death of thousands of functional promoters in the human and mouse genomes. These findings reveal the pervasive nature of evolutionary birth and death and suggest that lineage-specific regions may play an important but previously underappreciated role in human biology and disease. © 2016 The Authors BioEssays Published by WILEY Periodicals, Inc.

  14. Non-functional plastid ndh gene fragments are present in the nuclear genome of Norway spruce (Picea abies L. Karsch): insights from in silico analysis of nuclear and organellar genomes.

    PubMed

    Ranade, Sonali Sachin; García-Gil, María Rosario; Rosselló, Josep A

    2016-04-01

    Many genes have been lost from the prokaryote plastidial genome during the early events of endosymbiosis in eukaryotes. Some of them were definitively lost, but others were relocated and functionally integrated to the host nuclear genomes through serial events of gene transfer during plant evolution. In gymnosperms, plastid genome sequencing has revealed the loss of ndh genes from several species of Gnetales and Pinaceae, including Norway spruce (Picea abies). This study aims to trace the ndh genes in the nuclear and organellar Norway spruce genomes. The plastid genomes of higher plants contain 11 ndh genes which are homologues of mitochondrial genes encoding subunits of the proton-pumping NADH-dehydrogenase (nicotinamide adenine dinucleotide dehydrogenase) or complex I (electron transport chain). Ndh genes encode 11 NDH polypeptides forming the Ndh complex (analogous to complex I) which seems to be primarily involved in chloro-respiration processes. We considered ndh genes from the plastidial genome of four gymnosperms (Cryptomeria japonica, Cycas revoluta, Ginkgo biloba, Podocarpus totara) and a single angiosperm species (Arabidopsis thaliana) to trace putative homologs in the nuclear and organellar Norway spruce genomes using tBLASTn to assess the evolutionary fate of ndh genes in Norway spruce and to address their genomic location(s), structure, integrity and functionality. The results obtained from tBLASTn were subsequently analyzed by performing homology search for finding ndh specific conserved domains using conserved domain search. We report the presence of non-functional plastid ndh gene fragments, excepting ndhE and ndhG genes, in the nuclear genome of Norway spruce. Regulatory transcriptional elements like promoters, TATA boxes and enhancers were detected in the upstream regions of some ndh fragments. We also found transposable elements in the flanking regions of few ndh fragments suggesting nuclear rearrangements in those regions. These evidences support the hypothesis that, at least in Picea, ndh translocations from the plastid to the nuclear genome have occurred, and that there might have been a functional machinery at some time during evolution to accommodate them within a nuclear-encoded environment, or attempts to form it.

  15. RegPrecise 3.0--a resource for genome-scale exploration of transcriptional regulation in bacteria.

    PubMed

    Novichkov, Pavel S; Kazakov, Alexey E; Ravcheev, Dmitry A; Leyn, Semen A; Kovaleva, Galina Y; Sutormin, Roman A; Kazanov, Marat D; Riehl, William; Arkin, Adam P; Dubchak, Inna; Rodionov, Dmitry A

    2013-11-01

    Genome-scale prediction of gene regulation and reconstruction of transcriptional regulatory networks in prokaryotes is one of the critical tasks of modern genomics. Bacteria from different taxonomic groups, whose lifestyles and natural environments are substantially different, possess highly diverged transcriptional regulatory networks. The comparative genomics approaches are useful for in silico reconstruction of bacterial regulons and networks operated by both transcription factors (TFs) and RNA regulatory elements (riboswitches). RegPrecise (http://regprecise.lbl.gov) is a web resource for collection, visualization and analysis of transcriptional regulons reconstructed by comparative genomics. We significantly expanded a reference collection of manually curated regulons we introduced earlier. RegPrecise 3.0 provides access to inferred regulatory interactions organized by phylogenetic, structural and functional properties. Taxonomy-specific collections include 781 TF regulogs inferred in more than 160 genomes representing 14 taxonomic groups of Bacteria. TF-specific collections include regulogs for a selected subset of 40 TFs reconstructed across more than 30 taxonomic lineages. Novel collections of regulons operated by RNA regulatory elements (riboswitches) include near 400 regulogs inferred in 24 bacterial lineages. RegPrecise 3.0 provides four classifications of the reference regulons implemented as controlled vocabularies: 55 TF protein families; 43 RNA motif families; ~150 biological processes or metabolic pathways; and ~200 effectors or environmental signals. Genome-wide visualization of regulatory networks and metabolic pathways covered by the reference regulons are available for all studied genomes. A separate section of RegPrecise 3.0 contains draft regulatory networks in 640 genomes obtained by an conservative propagation of the reference regulons to closely related genomes. RegPrecise 3.0 gives access to the transcriptional regulons reconstructed in bacterial genomes. Analytical capabilities include exploration of: regulon content, structure and function; TF binding site motifs; conservation and variations in genome-wide regulatory networks across all taxonomic groups of Bacteria. RegPrecise 3.0 was selected as a core resource on transcriptional regulation of the Department of Energy Systems Biology Knowledgebase, an emerging software and data environment designed to enable researchers to collaboratively generate, test and share new hypotheses about gene and protein functions, perform large-scale analyses, and model interactions in microbes, plants, and their communities.

  16. Comparative mitochondrial genome analysis of Daphnis nerii and other lepidopteran insects reveals conserved mitochondrial genome organization and phylogenetic relationships

    PubMed Central

    Sun, Yu; Chen, Chen; Gao, Jin; Abbas, Muhammad Nadeem; Kausar, Saima; Qian, Cen; Wang, Lei; Wei, Guoqing; Zhu, Bao-Jian

    2017-01-01

    In the present study, the complete sequence of the mitochondrial genome (mitogenome) of Daphnis nerii (Lepidoptera: Sphingidae) is described. The mitogenome (15,247 bp) of D.nerii encodes13 protein-coding genes (PCGs), 22 transfer RNA genes (tRNAs), two ribosomal RNA genes (rRNAs) and an adenine (A) + thymine (T)-rich region. Its gene complement and order is similar to that of other sequenced lepidopterans. The 12 PCGs initiated by ATN codons except for cytochrome c oxidase subunit 1 (cox1) gene that is seemingly initiated by the CGA codon as documented in other insect mitogenomes. Four of the 13 PCGs have the incomplete termination codon T, while the remainder terminated with the canonical stop codon. This mitogenome has six major intergenic spacers, with the exception of A+T-rich region, spanning at least 10 bp. The A+T-rich region is 351 bp long, and contains some conserved regions, including ‘ATAGA’ motif followed by a 17 bp poly-T stretch, a microsatellite-like element (AT)9 and also a poly-A element. Phylogenetic analyses based on 13 PCGs using maximum likelihood (ML) and Bayesian inference (BI) revealed that D. nerii resides in the Sphingidae family. PMID:28598968

  17. Conservative site-specific and single-copy transgenesis in human LINE-1 elements

    PubMed Central

    Vijaya Chandra, Shree Harsha; Makhija, Harshyaa; Peter, Sabrina; Myint Wai, Cho Mar; Li, Jinming; Zhu, Jindong; Ren, Zhonglu; D'Alcontres, Martina Stagno; Siau, Jia Wei; Chee, Sharon; Ghadessy, Farid John; Dröge, Peter

    2016-01-01

    Genome engineering of human cells plays an important role in biotechnology and molecular medicine. In particular, insertions of functional multi-transgene cassettes into suitable endogenous sequences will lead to novel applications. Although several tools have been exploited in this context, safety issues such as cytotoxicity, insertional mutagenesis and off-target cleavage together with limitations in cargo size/expression often compromise utility. Phage λ integrase (Int) is a transgenesis tool that mediates conservative site-specific integration of 48 kb DNA into a safe harbor site of the bacterial genome. Here, we show that an Int variant precisely recombines large episomes into a sequence, termed attH4X, found in 1000 human Long INterspersed Elements-1 (LINE-1). We demonstrate single-copy transgenesis through attH4X-targeting in various cell lines including hESCs, with the flexibility of selecting clones according to transgene performance and downstream applications. This is exemplified with pluripotency reporter cassettes and constitutively expressed payloads that remain functional in LINE1-targeted hESCs and differentiated progenies. Furthermore, LINE-1 targeting does not induce DNA damage-response or chromosomal aberrations, and neither global nor localized endogenous gene expression is substantially affected. Hence, this simple transgene addition tool should become particularly useful for applications that require engineering of the human genome with multi-transgenes. PMID:26673710

  18. Physical Mapping and Refinement of the Painted Turtle Genome (Chrysemys picta) Inform Amniote Genome Evolution and Challenge Turtle-Bird Chromosomal Conservation

    PubMed Central

    Badenhorst, Daleen; Hillier, LaDeana W.; Literman, Robert; Montiel, Eugenia Elisabet; Radhakrishnan, Srihari; Shen, Yingjia; Minx, Patrick; Janes, Daniel E.; Warren, Wesley C.; Edwards, Scott V.; Valenzuela, Nicole

    2015-01-01

    Comparative genomics continues illuminating amniote genome evolution, but for many lineages our understanding remains incomplete. Here, we refine the assembly (CPI 3.0.3 NCBI AHGY00000000.2) and develop a cytogenetic map of the painted turtle (Chrysemys picta—CPI) genome, the first in turtles and in vertebrates with temperature-dependent sex determination. A comparison of turtle genomes with those of chicken, selected nonavian reptiles, and human revealed shared and novel genomic features, such as numerous chromosomal rearrangements. The largest conserved syntenic blocks between birds and turtles exist in four macrochromosomes, whereas rearrangements were evident in these and other chromosomes, disproving that turtles and birds retain fully conserved macrochromosomes for greater than 300 Myr. C-banding revealed large heterochromatic blocks in the centromeric region of only few chromosomes. The nucleolar-organizing region (NOR) mapped to a single CPI microchromosome, whereas in some turtles and lizards the NOR maps to nonhomologous sex-chromosomes, thus revealing independent translocations of the NOR in various reptilian lineages. There was no evidence for recent chromosomal fusions as interstitial telomeric-DNA was absent. Some repeat elements (CR1-like, Gypsy) were enriched in the centromeres of five chromosomes, whereas others were widespread in the CPI genome. Bacterial artificial chromosome (BAC) clones were hybridized to 18 of the 25 CPI chromosomes and anchored to a G-banded ideogram. Several CPI sex-determining genes mapped to five chromosomes, and homology was detected between yet other CPI autosomes and the globally nonhomologous sex chromosomes of chicken, other turtles, and squamates, underscoring the independent evolution of vertebrate sex-determining mechanisms. PMID:26108489

  19. Physical Mapping and Refinement of the Painted Turtle Genome (Chrysemys picta) Inform Amniote Genome Evolution and Challenge Turtle-Bird Chromosomal Conservation.

    PubMed

    Badenhorst, Daleen; Hillier, LaDeana W; Literman, Robert; Montiel, Eugenia Elisabet; Radhakrishnan, Srihari; Shen, Yingjia; Minx, Patrick; Janes, Daniel E; Warren, Wesley C; Edwards, Scott V; Valenzuela, Nicole

    2015-06-24

    Comparative genomics continues illuminating amniote genome evolution, but for many lineages our understanding remains incomplete. Here, we refine the assembly (CPI 3.0.3 NCBI AHGY00000000.2) and develop a cytogenetic map of the painted turtle (Chrysemys picta-CPI) genome, the first in turtles and in vertebrates with temperature-dependent sex determination. A comparison of turtle genomes with those of chicken, selected nonavian reptiles, and human revealed shared and novel genomic features, such as numerous chromosomal rearrangements. The largest conserved syntenic blocks between birds and turtles exist in four macrochromosomes, whereas rearrangements were evident in these and other chromosomes, disproving that turtles and birds retain fully conserved macrochromosomes for greater than 300 Myr. C-banding revealed large heterochromatic blocks in the centromeric region of only few chromosomes. The nucleolar-organizing region (NOR) mapped to a single CPI microchromosome, whereas in some turtles and lizards the NOR maps to nonhomologous sex-chromosomes, thus revealing independent translocations of the NOR in various reptilian lineages. There was no evidence for recent chromosomal fusions as interstitial telomeric-DNA was absent. Some repeat elements (CR1-like, Gypsy) were enriched in the centromeres of five chromosomes, whereas others were widespread in the CPI genome. Bacterial artificial chromosome (BAC) clones were hybridized to 18 of the 25 CPI chromosomes and anchored to a G-banded ideogram. Several CPI sex-determining genes mapped to five chromosomes, and homology was detected between yet other CPI autosomes and the globally nonhomologous sex chromosomes of chicken, other turtles, and squamates, underscoring the independent evolution of vertebrate sex-determining mechanisms. © The Author(s) 2015. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.

  20. The end of the LINE?: lack of recent L1 activity in a group of South American rodents.

    PubMed Central

    Casavant, N C; Scott, L; Cantrell, M A; Wiggins, L E; Baker, R J; Wichman, H A

    2000-01-01

    L1s (LINE-1: Long Interspersed Nuclear Element 1) are present in all mammals examined to date. They occur in both placental mammals and marsupials and thus are thought to have been present in the genome prior to the mammalian radiation. This unusual conservation of a transposable element family for over 100 million years has led to speculation that these elements provide an advantage to the genomes they inhabit. We have recently identified a group of South American rodents, including rice rats (Oryzomys), in which L1s appear to be quiescent or extinct. Several observations support this conclusion. First, genomic Southern blot analysis fails to reveal genus-specific bands in Oryzomys. Second, we were unable to find recently inserted elements. Procedures to enrich for young elements did not yield any with an intact open reading frame for reverse transcriptase; all elements isolated had numerous insertions, deletions, and stop codons. Phylogenetic analysis failed to yield species-specific clusters among the L1 elements isolated, and all Oryzomys sequences had numerous private mutations. Finally, in situ hybridization of L1 to Oryzomys chromosomes failed to reveal the characteristic L1 distribution in Oryzomys with either a homologous or heterologous probe. Thus, Oryzomys is a viable candidate for L1 extinction from a mammalian host. PMID:10747071

  1. Phylogenetic shadowing of primate sequences to find functional regions of the human genome.

    PubMed

    Boffelli, Dario; McAuliffe, Jon; Ovcharenko, Dmitriy; Lewis, Keith D; Ovcharenko, Ivan; Pachter, Lior; Rubin, Edward M

    2003-02-28

    Nonhuman primates represent the most relevant model organisms to understand the biology of Homo sapiens. The recent divergence and associated overall sequence conservation between individual members of this taxon have nonetheless largely precluded the use of primates in comparative sequence studies. We used sequence comparisons of an extensive set of Old World and New World monkeys and hominoids to identify functional regions in the human genome. Analysis of these data enabled the discovery of primate-specific gene regulatory elements and the demarcation of the exons of multiple genes. Much of the information content of the comprehensive primate sequence comparisons could be captured with a small subset of phylogenetically close primates. These results demonstrate the utility of intraprimate sequence comparisons to discover common mammalian as well as primate-specific functional elements in the human genome, which are unattainable through the evaluation of more evolutionarily distant species.

  2. Phylum-Level Conservation of Regulatory Information in Nematodes despite Extensive Non-coding Sequence Divergence

    PubMed Central

    Gordon, Kacy L.; Arthur, Robert K.; Ruvinsky, Ilya

    2015-01-01

    Gene regulatory information guides development and shapes the course of evolution. To test conservation of gene regulation within the phylum Nematoda, we compared the functions of putative cis-regulatory sequences of four sets of orthologs (unc-47, unc-25, mec-3 and elt-2) from distantly-related nematode species. These species, Caenorhabditis elegans, its congeneric C. briggsae, and three parasitic species Meloidogyne hapla, Brugia malayi, and Trichinella spiralis, represent four of the five major clades in the phylum Nematoda. Despite the great phylogenetic distances sampled and the extensive sequence divergence of nematode genomes, all but one of the regulatory elements we tested are able to drive at least a subset of the expected gene expression patterns. We show that functionally conserved cis-regulatory elements have no more extended sequence similarity to their C. elegans orthologs than would be expected by chance, but they do harbor motifs that are important for proper expression of the C. elegans genes. These motifs are too short to be distinguished from the background level of sequence similarity, and while identical in sequence they are not conserved in orientation or position. Functional tests reveal that some of these motifs contribute to proper expression. Our results suggest that conserved regulatory circuitry can persist despite considerable turnover within cis elements. PMID:26020930

  3. A site specific model and analysis of the neutral somatic mutation rate in whole-genome cancer data.

    PubMed

    Bertl, Johanna; Guo, Qianyun; Juul, Malene; Besenbacher, Søren; Nielsen, Morten Muhlig; Hornshøj, Henrik; Pedersen, Jakob Skou; Hobolth, Asger

    2018-04-19

    Detailed modelling of the neutral mutational process in cancer cells is crucial for identifying driver mutations and understanding the mutational mechanisms that act during cancer development. The neutral mutational process is very complex: whole-genome analyses have revealed that the mutation rate differs between cancer types, between patients and along the genome depending on the genetic and epigenetic context. Therefore, methods that predict the number of different types of mutations in regions or specific genomic elements must consider local genomic explanatory variables. A major drawback of most methods is the need to average the explanatory variables across the entire region or genomic element. This procedure is particularly problematic if the explanatory variable varies dramatically in the element under consideration. To take into account the fine scale of the explanatory variables, we model the probabilities of different types of mutations for each position in the genome by multinomial logistic regression. We analyse 505 cancer genomes from 14 different cancer types and compare the performance in predicting mutation rate for both regional based models and site-specific models. We show that for 1000 randomly selected genomic positions, the site-specific model predicts the mutation rate much better than regional based models. We use a forward selection procedure to identify the most important explanatory variables. The procedure identifies site-specific conservation (phyloP), replication timing, and expression level as the best predictors for the mutation rate. Finally, our model confirms and quantifies certain well-known mutational signatures. We find that our site-specific multinomial regression model outperforms the regional based models. The possibility of including genomic variables on different scales and patient specific variables makes it a versatile framework for studying different mutational mechanisms. Our model can serve as the neutral null model for the mutational process; regions that deviate from the null model are candidates for elements that drive cancer development.

  4. The (r)evolution of SINE versus LINE distributions in primate genomes: Sex chromosomes are important

    PubMed Central

    Kvikstad, Erika M.; Makova, Kateryna D.

    2010-01-01

    The densities of transposable elements (TEs) in the human genome display substantial variation both within individual chromosomes and among chromosome types (autosomes and the two sex chromosomes). Finding an explanation for this variability has been challenging, especially in light of genome landscapes unique to the sex chromosomes. Here, using a multiple regression framework, we investigate primate Alu and L1 densities shaped by regional genome features and location on a particular chromosome type. As a result of our analysis, first, we build statistical models explaining up to 79% and 44% of variation in Alu and L1 element density, respectively. Second, we analyze sex chromosome versus autosome TE densities corrected for regional genomic effects. We discover that sex-chromosome bias in Alu and L1 distributions not only persists after accounting for these effects, but even presents differences in patterns, confirming preferential Alu integration in the male germline, yet likely integration of L1s in both male and female germlines or in early embryogenesis. Additionally, our models reveal that local base composition (measured by GC content and density of L1 target sites) and natural selection (inferred via density of most conserved elements) are significant to predicting densities of L1s. Interestingly, measurements of local double-stranded breaks (a 13-mer associated with genome instability) strongly correlate with densities of Alu elements; little evidence was found for the role of recombination-driven deletion in driving TE distributions over evolutionary time. Thus, Alu and L1 densities have been influenced by the combination of distinct local genome landscapes and the unique evolutionary dynamics of sex chromosomes. PMID:20219940

  5. A genomic view of 500 million years of cnidarian evolution.

    PubMed

    Steele, Robert E; David, Charles N; Technau, Ulrich

    2011-01-01

    Cnidarians (corals, anemones, jellyfish and hydras) are a diverse group of animals of interest to evolutionary biologists, ecologists and developmental biologists. With the publication of the genome sequences of Hydra and Nematostella, whose last common ancestor was the stem cnidarian, researchers are beginning to see the genomic underpinnings of cnidarian biology. Cnidarians are known for the remarkable plasticity of their morphology and life cycles. This plasticity is reflected in the Hydra and Nematostella genomes, which differ to an exceptional degree in size, base composition, transposable element content and gene conservation. It is now known what cnidarian genomes, given 500 million years, are capable of; as we discuss here, the next challenge is to understand how this genomic history has led to the striking diversity seen in this group. Copyright © 2010 Elsevier Ltd. All rights reserved.

  6. The genome sequence of taurine cattle: a window to ruminant biology and evolution.

    PubMed

    Elsik, Christine G; Tellam, Ross L; Worley, Kim C; Gibbs, Richard A; Muzny, Donna M; Weinstock, George M; Adelson, David L; Eichler, Evan E; Elnitski, Laura; Guigó, Roderic; Hamernik, Debora L; Kappes, Steve M; Lewin, Harris A; Lynn, David J; Nicholas, Frank W; Reymond, Alexandre; Rijnkels, Monique; Skow, Loren C; Zdobnov, Evgeny M; Schook, Lawrence; Womack, James; Alioto, Tyler; Antonarakis, Stylianos E; Astashyn, Alex; Chapple, Charles E; Chen, Hsiu-Chuan; Chrast, Jacqueline; Câmara, Francisco; Ermolaeva, Olga; Henrichsen, Charlotte N; Hlavina, Wratko; Kapustin, Yuri; Kiryutin, Boris; Kitts, Paul; Kokocinski, Felix; Landrum, Melissa; Maglott, Donna; Pruitt, Kim; Sapojnikov, Victor; Searle, Stephen M; Solovyev, Victor; Souvorov, Alexandre; Ucla, Catherine; Wyss, Carine; Anzola, Juan M; Gerlach, Daniel; Elhaik, Eran; Graur, Dan; Reese, Justin T; Edgar, Robert C; McEwan, John C; Payne, Gemma M; Raison, Joy M; Junier, Thomas; Kriventseva, Evgenia V; Eyras, Eduardo; Plass, Mireya; Donthu, Ravikiran; Larkin, Denis M; Reecy, James; Yang, Mary Q; Chen, Lin; Cheng, Ze; Chitko-McKown, Carol G; Liu, George E; Matukumalli, Lakshmi K; Song, Jiuzhou; Zhu, Bin; Bradley, Daniel G; Brinkman, Fiona S L; Lau, Lilian P L; Whiteside, Matthew D; Walker, Angela; Wheeler, Thomas T; Casey, Theresa; German, J Bruce; Lemay, Danielle G; Maqbool, Nauman J; Molenaar, Adrian J; Seo, Seongwon; Stothard, Paul; Baldwin, Cynthia L; Baxter, Rebecca; Brinkmeyer-Langford, Candice L; Brown, Wendy C; Childers, Christopher P; Connelley, Timothy; Ellis, Shirley A; Fritz, Krista; Glass, Elizabeth J; Herzig, Carolyn T A; Iivanainen, Antti; Lahmers, Kevin K; Bennett, Anna K; Dickens, C Michael; Gilbert, James G R; Hagen, Darren E; Salih, Hanni; Aerts, Jan; Caetano, Alexandre R; Dalrymple, Brian; Garcia, Jose Fernando; Gill, Clare A; Hiendleder, Stefan G; Memili, Erdogan; Spurlock, Diane; Williams, John L; Alexander, Lee; Brownstein, Michael J; Guan, Leluo; Holt, Robert A; Jones, Steven J M; Marra, Marco A; Moore, Richard; Moore, Stephen S; Roberts, Andy; Taniguchi, Masaaki; Waterman, Richard C; Chacko, Joseph; Chandrabose, Mimi M; Cree, Andy; Dao, Marvin Diep; Dinh, Huyen H; Gabisi, Ramatu Ayiesha; Hines, Sandra; Hume, Jennifer; Jhangiani, Shalini N; Joshi, Vandita; Kovar, Christie L; Lewis, Lora R; Liu, Yih-Shin; Lopez, John; Morgan, Margaret B; Nguyen, Ngoc Bich; Okwuonu, Geoffrey O; Ruiz, San Juana; Santibanez, Jireh; Wright, Rita A; Buhay, Christian; Ding, Yan; Dugan-Rocha, Shannon; Herdandez, Judith; Holder, Michael; Sabo, Aniko; Egan, Amy; Goodell, Jason; Wilczek-Boney, Katarzyna; Fowler, Gerald R; Hitchens, Matthew Edward; Lozado, Ryan J; Moen, Charles; Steffen, David; Warren, James T; Zhang, Jingkun; Chiu, Readman; Schein, Jacqueline E; Durbin, K James; Havlak, Paul; Jiang, Huaiyang; Liu, Yue; Qin, Xiang; Ren, Yanru; Shen, Yufeng; Song, Henry; Bell, Stephanie Nicole; Davis, Clay; Johnson, Angela Jolivet; Lee, Sandra; Nazareth, Lynne V; Patel, Bella Mayurkumar; Pu, Ling-Ling; Vattathil, Selina; Williams, Rex Lee; Curry, Stacey; Hamilton, Cerissa; Sodergren, Erica; Wheeler, David A; Barris, Wes; Bennett, Gary L; Eggen, André; Green, Ronnie D; Harhay, Gregory P; Hobbs, Matthew; Jann, Oliver; Keele, John W; Kent, Matthew P; Lien, Sigbjørn; McKay, Stephanie D; McWilliam, Sean; Ratnakumar, Abhirami; Schnabel, Robert D; Smith, Timothy; Snelling, Warren M; Sonstegard, Tad S; Stone, Roger T; Sugimoto, Yoshikazu; Takasuga, Akiko; Taylor, Jeremy F; Van Tassell, Curtis P; Macneil, Michael D; Abatepaulo, Antonio R R; Abbey, Colette A; Ahola, Virpi; Almeida, Iassudara G; Amadio, Ariel F; Anatriello, Elen; Bahadue, Suria M; Biase, Fernando H; Boldt, Clayton R; Carroll, Jeffery A; Carvalho, Wanessa A; Cervelatti, Eliane P; Chacko, Elsa; Chapin, Jennifer E; Cheng, Ye; Choi, Jungwoo; Colley, Adam J; de Campos, Tatiana A; De Donato, Marcos; Santos, Isabel K F de Miranda; de Oliveira, Carlo J F; Deobald, Heather; Devinoy, Eve; Donohue, Kaitlin E; Dovc, Peter; Eberlein, Annett; Fitzsimmons, Carolyn J; Franzin, Alessandra M; Garcia, Gustavo R; Genini, Sem; Gladney, Cody J; Grant, Jason R; Greaser, Marion L; Green, Jonathan A; Hadsell, Darryl L; Hakimov, Hatam A; Halgren, Rob; Harrow, Jennifer L; Hart, Elizabeth A; Hastings, Nicola; Hernandez, Marta; Hu, Zhi-Liang; Ingham, Aaron; Iso-Touru, Terhi; Jamis, Catherine; Jensen, Kirsty; Kapetis, Dimos; Kerr, Tovah; Khalil, Sari S; Khatib, Hasan; Kolbehdari, Davood; Kumar, Charu G; Kumar, Dinesh; Leach, Richard; Lee, Justin C-M; Li, Changxi; Logan, Krystin M; Malinverni, Roberto; Marques, Elisa; Martin, William F; Martins, Natalia F; Maruyama, Sandra R; Mazza, Raffaele; McLean, Kim L; Medrano, Juan F; Moreno, Barbara T; Moré, Daniela D; Muntean, Carl T; Nandakumar, Hari P; Nogueira, Marcelo F G; Olsaker, Ingrid; Pant, Sameer D; Panzitta, Francesca; Pastor, Rosemeire C P; Poli, Mario A; Poslusny, Nathan; Rachagani, Satyanarayana; Ranganathan, Shoba; Razpet, Andrej; Riggs, Penny K; Rincon, Gonzalo; Rodriguez-Osorio, Nelida; Rodriguez-Zas, Sandra L; Romero, Natasha E; Rosenwald, Anne; Sando, Lillian; Schmutz, Sheila M; Shen, Libing; Sherman, Laura; Southey, Bruce R; Lutzow, Ylva Strandberg; Sweedler, Jonathan V; Tammen, Imke; Telugu, Bhanu Prakash V L; Urbanski, Jennifer M; Utsunomiya, Yuri T; Verschoor, Chris P; Waardenberg, Ashley J; Wang, Zhiquan; Ward, Robert; Weikard, Rosemarie; Welsh, Thomas H; White, Stephen N; Wilming, Laurens G; Wunderlich, Kris R; Yang, Jianqi; Zhao, Feng-Qi

    2009-04-24

    To understand the biology and evolution of ruminants, the cattle genome was sequenced to about sevenfold coverage. The cattle genome contains a minimum of 22,000 genes, with a core set of 14,345 orthologs shared among seven mammalian species of which 1217 are absent or undetected in noneutherian (marsupial or monotreme) genomes. Cattle-specific evolutionary breakpoint regions in chromosomes have a higher density of segmental duplications, enrichment of repetitive elements, and species-specific variations in genes associated with lactation and immune responsiveness. Genes involved in metabolism are generally highly conserved, although five metabolic genes are deleted or extensively diverged from their human orthologs. The cattle genome sequence thus provides a resource for understanding mammalian evolution and accelerating livestock genetic improvement for milk and meat production.

  7. The Genome of the Obligate Intracellular Parasite Trachipleistophora hominis: New Insights into Microsporidian Genome Dynamics and Reductive Evolution

    PubMed Central

    Heinz, Eva; Williams, Tom A.; Nakjang, Sirintra; Noël, Christophe J.; Swan, Daniel C.; Goldberg, Alina V.; Harris, Simon R.; Weinmaier, Thomas; Markert, Stephanie; Becher, Dörte; Bernhardt, Jörg; Dagan, Tal; Hacker, Christian; Lucocq, John M.; Schweder, Thomas; Rattei, Thomas; Hall, Neil; Hirt, Robert P.; Embley, T. Martin

    2012-01-01

    The dynamics of reductive genome evolution for eukaryotes living inside other eukaryotic cells are poorly understood compared to well-studied model systems involving obligate intracellular bacteria. Here we present 8.5 Mb of sequence from the genome of the microsporidian Trachipleistophora hominis, isolated from an HIV/AIDS patient, which is an outgroup to the smaller compacted-genome species that primarily inform ideas of evolutionary mode for these enormously successful obligate intracellular parasites. Our data provide detailed information on the gene content, genome architecture and intergenic regions of a larger microsporidian genome, while comparative analyses allowed us to infer genomic features and metabolism of the common ancestor of the species investigated. Gene length reduction and massive loss of metabolic capacity in the common ancestor was accompanied by the evolution of novel microsporidian-specific protein families, whose conservation among microsporidians, against a background of reductive evolution, suggests they may have important functions in their parasitic lifestyle. The ancestor had already lost many metabolic pathways but retained glycolysis and the pentose phosphate pathway to provide cytosolic ATP and reduced coenzymes, and it had a minimal mitochondrion (mitosome) making Fe-S clusters but not ATP. It possessed bacterial-like nucleotide transport proteins as a key innovation for stealing host-generated ATP, the machinery for RNAi, key elements of the early secretory pathway, canonical eukaryotic as well as microsporidian-specific regulatory elements, a diversity of repetitive and transposable elements, and relatively low average gene density. Microsporidian genome evolution thus appears to have proceeded in at least two major steps: an ancestral remodelling of the proteome upon transition to intracellular parasitism that involved reduction but also selective expansion, followed by a secondary compaction of genome architecture in some, but not all, lineages. PMID:23133373

  8. Finding functional features in Saccharomyces genomes by phylogenetic footprinting.

    PubMed

    Cliften, Paul; Sudarsanam, Priya; Desikan, Ashwin; Fulton, Lucinda; Fulton, Bob; Majors, John; Waterston, Robert; Cohen, Barak A; Johnston, Mark

    2003-07-04

    The sifting and winnowing of DNA sequence that occur during evolution cause nonfunctional sequences to diverge, leaving phylogenetic footprints of functional sequence elements in comparisons of genome sequences. We searched for such footprints among the genome sequences of six Saccharomyces species and identified potentially functional sequences. Comparison of these sequences allowed us to revise the catalog of yeast genes and identify sequence motifs that may be targets of transcriptional regulatory proteins. Some of these conserved sequence motifs reside upstream of genes with similar functional annotations or similar expression patterns or those bound by the same transcription factor and are thus good candidates for functional regulatory sequences.

  9. A conserved role for the ESCRT membrane budding complex in LINE retrotransposition

    PubMed Central

    Dong, Chun; Han, Jeffrey S.

    2017-01-01

    Long interspersed nuclear element-1s (LINE-1s, or L1s) are an active family of retrotransposable elements that continue to mutate mammalian genomes. Despite the large contribution of L1 to mammalian genome evolution, we do not know where active L1 particles (particles in the process of retrotransposition) are located in the cell, or how they move towards the nucleus, the site of L1 reverse transcription. Using a yeast model of LINE retrotransposition, we identified ESCRT (endosomal sorting complex required for transport) as a critical complex for LINE retrotransposition, and verified that this interaction is conserved for human L1. ESCRT interacts with L1 via a late domain motif, and this interaction facilitates L1 replication. Loss of the L1/ESCRT interaction does not impair RNP formation or enzymatic activity, but leads to loss of retrotransposition and reduced L1 endonuclease activity in the nucleus. This study highlights the importance of the ESCRT complex in the L1 life cycle and suggests an unusual mode for L1 RNP trafficking. PMID:28586350

  10. Feather development genes and associated regulatory innovation predate the origin of Dinosauria.

    PubMed

    Lowe, Craig B; Clarke, Julia A; Baker, Allan J; Haussler, David; Edwards, Scott V

    2015-01-01

    The evolution of avian feathers has recently been illuminated by fossils and the identification of genes involved in feather patterning and morphogenesis. However, molecular studies have focused mainly on protein-coding genes. Using comparative genomics and more than 600,000 conserved regulatory elements, we show that patterns of genome evolution in the vicinity of feather genes are consistent with a major role for regulatory innovation in the evolution of feathers. Rates of innovation at feather regulatory elements exhibit an extended period of innovation with peaks in the ancestors of amniotes and archosaurs. We estimate that 86% of such regulatory elements and 100% of the nonkeratin feather gene set were present prior to the origin of Dinosauria. On the branch leading to modern birds, we detect a strong signal of regulatory innovation near insulin-like growth factor binding protein (IGFBP) 2 and IGFBP5, which have roles in body size reduction, and may represent a genomic signature for the miniaturization of dinosaurian body size preceding the origin of flight. © The Author 2014. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.

  11. Seedling lethality in Nicotiana plumbaginifolia conferred by Ds transposable element insertion into a plant-specific gene.

    PubMed

    Majira, Amel; Domin, Monique; Grandjean, Olivier; Gofron, Krystyna; Houba-Hérin, Nicole

    2002-10-01

    A seedling lethal mutant of Nicotiana plumbaginifolia (sdl-1) was isolated by transposon tagging using a maize Dissociation (Ds) element. The insertion mutation was produced by direct co-transformation of protoplasts with two plasmids: one containing Ds and a second with an Ac transposase gene. sdl-1 seedlings exhibit several phenotypes: swollen organs, short hypocotyls in light and dark conditions, and enlarged and multinucleated cells, that altogether suggest cell growth defects. Mutant cells are able to proliferate under in vitro culture conditions. Genomic DNA sequences bordering the transposon were used to recover cDNA from the normal allele. Complementation of the mutant phenotype with the cDNA confirmed that the transposon had caused the mutation. The Ds element was inserted into the first exon of the open reading frame and the homozygous mutant lacked detectable transcript. Phenocopies of the mutant were obtained by an antisense approach. SDL-1 encodes a novel protein found in several plant genomes but apparently missingfrom animal and fungal genomes; the protein is highly conserved and has a potential plastid targeting motif.

  12. Identification and Classification of Conserved RNA Secondary Structures in the Human Genome

    PubMed Central

    Pedersen, Jakob Skou; Bejerano, Gill; Siepel, Adam; Rosenbloom, Kate; Lindblad-Toh, Kerstin; Lander, Eric S; Kent, Jim; Miller, Webb; Haussler, David

    2006-01-01

    The discoveries of microRNAs and riboswitches, among others, have shown functional RNAs to be biologically more important and genomically more prevalent than previously anticipated. We have developed a general comparative genomics method based on phylogenetic stochastic context-free grammars for identifying functional RNAs encoded in the human genome and used it to survey an eight-way genome-wide alignment of the human, chimpanzee, mouse, rat, dog, chicken, zebra-fish, and puffer-fish genomes for deeply conserved functional RNAs. At a loose threshold for acceptance, this search resulted in a set of 48,479 candidate RNA structures. This screen finds a large number of known functional RNAs, including 195 miRNAs, 62 histone 3′UTR stem loops, and various types of known genetic recoding elements. Among the highest-scoring new predictions are 169 new miRNA candidates, as well as new candidate selenocysteine insertion sites, RNA editing hairpins, RNAs involved in transcript auto regulation, and many folds that form singletons or small functional RNA families of completely unknown function. While the rate of false positives in the overall set is difficult to estimate and is likely to be substantial, the results nevertheless provide evidence for many new human functional RNAs and present specific predictions to facilitate their further characterization. PMID:16628248

  13. Genomic organization, complete sequence, and chromosomal location of the gene for human eotaxin (SCYA11), an eosinophil-specific CC chemokine

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Garcia-Zepeda, E.A.; Sarafi, M.N.; Luster, A.D.

    1997-05-01

    Eotaxin is a CC chemokine that is a specific chemoattractant for eosinophils and is implicated in the pathogenesis of eosinophilic inflammatory diseases, such as asthma. We describe the genomic organization, complete sequence, including 1354 bp 5{prime} of the RNA initiation site, and chromosomal localization of the human eotaxin gene. Fluorescence in situ hybridization analysis localized eotaxin to human chromosome 17, in the region q21.1-q21.2, and the human gene name SCYA11 was assigned. We also present the 5{prime} flanking sequence of the mouse eotaxin gene and have identified several regulatory elements that are conserved between the murine and the human promoters.more » In particular, the presence of elements such as NF-{Kappa}B, interferon-{gamma} response element, and glucocorticoid response element may explain the observed regulation of the eotaxin gene by cytokines and glucocorticoids. 17 refs., 4 figs., 1 tab.« less

  14. Conserved structure and expression of hsp70 paralogs in teleost fishes.

    PubMed

    Metzger, David C H; Hemmer-Hansen, Jakob; Schulte, Patricia M

    2016-06-01

    The cytosolic 70KDa heat shock proteins (Hsp70s) are widely used as biomarkers of environmental stress in ecological and toxicological studies in fish. Here we analyze teleost genome sequences to show that two genes encoding inducible hsp70s (hsp70-1 and hsp70-2) are likely present in all teleost fish. Phylogenetic and synteny analyses indicate that hsp70-1 and hsp70-2 are distinct paralogs that originated prior to the diversification of the teleosts. The promoters of both genes contain a TATA box and conserved heat shock elements (HSEs), but unlike mammalian HSP70s, both genes contain an intron in the 5' UTR. The hsp70-2 gene has undergone tandem duplication in several species. In addition, many other teleost genome assemblies have multiple copies of hsp70-2 present on separate, small, genomic scaffolds. To verify that these represent poorly assembled tandem duplicates, we cloned the genomic region surrounding hsp70-2 in Fundulus heteroclitus and showed that the hsp70-2 gene copies that are on separate scaffolds in the genome assembly are arranged as tandem duplicates. Real-time quantitative PCR of F. heteroclitus genomic DNA indicates that four copies of the hsp70-2 gene are likely present in the F. heteroclitus genome. Comparison of expression patterns in F. heteroclitus and Gasterosteus aculeatus demonstrates that hsp70-2 has a higher fold increase than hsp70-1 following heat shock in gill but not in muscle tissue, revealing a conserved difference in expression patterns between isoforms and tissues. These data indicate that ecological and toxicological studies using hsp70 as a biomarker in teleosts should take this complexity into account. Copyright © 2016 Elsevier Inc. All rights reserved.

  15. Protection of CpG islands from DNA methylation is DNA-encoded and evolutionarily conserved

    PubMed Central

    Long, Hannah K.; King, Hamish W.; Patient, Roger K.; Odom, Duncan T.; Klose, Robert J.

    2016-01-01

    DNA methylation is a repressive epigenetic modification that covers vertebrate genomes. Regions known as CpG islands (CGIs), which are refractory to DNA methylation, are often associated with gene promoters and play central roles in gene regulation. Yet how CGIs in their normal genomic context evade the DNA methylation machinery and whether these mechanisms are evolutionarily conserved remains enigmatic. To address these fundamental questions we exploited a transchromosomic animal model and genomic approaches to understand how the hypomethylated state is formed in vivo and to discover whether mechanisms governing CGI formation are evolutionarily conserved. Strikingly, insertion of a human chromosome into mouse revealed that promoter-associated CGIs are refractory to DNA methylation regardless of host species, demonstrating that DNA sequence plays a central role in specifying the hypomethylated state through evolutionarily conserved mechanisms. In contrast, elements distal to gene promoters exhibited more variable methylation between host species, uncovering a widespread dependence on nucleotide frequency and occupancy of DNA-binding transcription factors in shaping the DNA methylation landscape away from gene promoters. This was exemplified by young CpG rich lineage-restricted repeat sequences that evaded DNA methylation in the absence of co-evolved mechanisms targeting methylation to these sequences, and species specific DNA binding events that protected against DNA methylation in CpG poor regions. Finally, transplantation of mouse chromosomal fragments into the evolutionarily distant zebrafish uncovered the existence of a mechanistically conserved and DNA-encoded logic which shapes CGI formation across vertebrate species. PMID:27084945

  16. PwRn1, a novel Ty3/gypsy-like retrotransposon of Paragonimus westermani: molecular characters and its differentially preserved mobile potential according to host chromosomal polyploidy.

    PubMed

    Bae, Young-An; Ahn, Jong-Sook; Kim, Seon-Hee; Rhyu, Mun-Gan; Kong, Yoon; Cho, Seung-Yull

    2008-10-14

    Retrotransposons have been known to involve in the remodeling and evolution of host genome. These reverse transcribing elements, which show a complex evolutionary pathway with diverse intermediate forms, have been comprehensively analyzed from a wide range of host genomes, while the information remains limited to only a few species in the phylum Platyhelminthes. A LTR retrotransposon and its homologs with a strong phylogenetic affinity toward CsRn1 of Clonorchis sinensis were isolated from a trematode parasite Paragonimus westermani via a degenerate PCR method and from an insect species Anopheles gambiae by in silico analysis of the whole mosquito genome, respectively. These elements, designated PwRn1 and AgCR-1 - AgCR-14 conserved unique features including a t-RNATrp primer binding site and the unusual CHCC signature of Gag proteins. Their flanking LTRs displayed >97% nucleotide identities and thus, these elements were likely to have expanded recently in the trematode and insect genomes. They evolved heterogeneous expression strategies: a single fused ORF, two separate ORFs with an identical reading frame and two ORFs overlapped by -1 frameshifting. Phylogenetic analyses suggested that the elements with the separate ORFs had evolved from an ancestral form(s) with the overlapped ORFs. The mobile potential of PwRn1 was likely to be maintained differentially in association with the karyotype of host genomes, as was examined by the presence/absence of intergenomic polymorphism and mRNA transcripts. Our results on the structural diversity of CsRn1-like elements can provide a molecular tool to dissect a more detailed evolutionary episode of LTR retrotransposons. The PwRn1-associated genomic polymorphism, which is substantial in diploids, will also be informative in addressing genomic diversification following inter-/intra-specific hybridization in P. westermani populations.

  17. Structured populations of Sulfolobus acidocaldarius with susceptibility to mobile genetic elements

    USGS Publications Warehouse

    Anderson, Rika E.; Kouris, Angela; Seward, Christopher H.; Campbell, Kate M.; Whitaker, Rachel J.

    2017-01-01

    The impact of a structured environment on genome evolution can be determined through comparative population genomics of species that live in the same habitat. Recent work comparing three genome sequences of Sulfolobus acidocaldarius suggested that highly structured, extreme, hot spring environments do not limit dispersal of this thermoacidophile, in contrast to other co-occurring Sulfolobus species. Instead, a high level of conservation among these three S. acidocaldarius genomes was hypothesized to result from rapid, global-scale dispersal promoted by low susceptibility to viruses that sets S. acidocaldarius apart from its sister Sulfolobus species. To test this hypothesis, we conducted a comparative analysis of 47 genomes of S. acidocaldarius from spatial and temporal sampling of two hot springs in Yellowstone National Park. While we confirm the low diversity in the core genome, we observe differentiation among S. acidocaldarius populations, likely resulting from low migration among hot spring “islands” in Yellowstone National Park. Patterns of genomic variation indicate that differing geological contexts result in the elimination or preservation of diversity among differentiated populations. We observe multiple deletions associated with a large genomic island rich in glycosyltransferases, differential integrations of the Sulfolobus turreted icosahedral virus, as well as two different plasmid elements. These data demonstrate that neither rapid dispersal nor lack of mobile genetic elements result in low diversity in the S. acidocaldariusgenomes. We suggest instead that significant differences in the recent evolutionary history, or the intrinsic evolutionary rates, of sister Sulfolobusspecies result in the relatively low diversity of the S. acidocaldarius genome.

  18. Long-read whole genome sequencing and comparative analysis of six strains of the human pathogen Orientia tsutsugamushi.

    PubMed

    Batty, Elizabeth M; Chaemchuen, Suwittra; Blacksell, Stuart; Richards, Allen L; Paris, Daniel; Bowden, Rory; Chan, Caroline; Lachumanan, Ramkumar; Day, Nicholas; Donnelly, Peter; Chen, Swaine; Salje, Jeanne

    2018-06-01

    Orientia tsutsugamushi is a clinically important but neglected obligate intracellular bacterial pathogen of the Rickettsiaceae family that causes the potentially life-threatening human disease scrub typhus. In contrast to the genome reduction seen in many obligate intracellular bacteria, early genetic studies of Orientia have revealed one of the most repetitive bacterial genomes sequenced to date. The dramatic expansion of mobile elements has hampered efforts to generate complete genome sequences using short read sequencing methodologies, and consequently there have been few studies of the comparative genomics of this neglected species. We report new high-quality genomes of O. tsutsugamushi, generated using PacBio single molecule long read sequencing, for six strains: Karp, Kato, Gilliam, TA686, UT76 and UT176. In comparative genomics analyses of these strains together with existing reference genomes from Ikeda and Boryong strains, we identify a relatively small core genome of 657 genes, grouped into core gene islands and separated by repeat regions, and use the core genes to infer the first whole-genome phylogeny of Orientia. Complete assemblies of multiple Orientia genomes verify initial suggestions that these are remarkable organisms. They have larger genomes compared with most other Rickettsiaceae, with widespread amplification of repeat elements and massive chromosomal rearrangements between strains. At the gene level, Orientia has a relatively small set of universally conserved genes, similar to other obligate intracellular bacteria, and the relative expansion in genome size can be accounted for by gene duplication and repeat amplification. Our study demonstrates the utility of long read sequencing to investigate complex bacterial genomes and characterise genomic variation.

  19. Cross-cohort analysis identifies a TEAD4 ↔ MYCN positive-feedback loop as the core regulatory element of high-risk neuroblastoma. | Office of Cancer Genomics

    Cancer.gov

    High-risk neuroblastomas show a paucity of recurrent somatic mutations at diagnosis. As a result, the molecular basis for this aggressive phenotype remains elusive. Recent progress in regulatory network analysis helped us elucidate disease-driving mechanisms downstream of genomic alterations, including recurrent chromosomal alterations. Our analysis identified three molecular subtypes of high-risk neuroblastomas, consistent with chromosomal alterations, and identified subtype-specific master regulator (MR) proteins that were conserved across independent cohorts.

  20. Differential DNA methylation at conserved non-genic elements and evidence for transgenerational inheritance following developmental exposure to mono(2-ethylhexyl) phthalate and 5-azacytidine in zebrafish.

    PubMed

    Kamstra, Jorke H; Sales, Liana Bastos; Aleström, Peter; Legler, Juliette

    2017-01-01

    Exposure to environmental stressors during development may lead to latent and transgenerational adverse health effects. To understand the role of DNA methylation in these effects, we used zebrafish as a vertebrate model to investigate heritable changes in DNA methylation following chemical-induced stress during early development. We exposed zebrafish embryos to non-embryotoxic concentrations of the biologically active phthalate metabolite mono(2-ethylhexyl) phthalate (MEHP, 30 µM) and the DNA methyltransferase 1 inhibitor 5-azacytidine (5AC, 10 µM). Direct, latent and transgenerational effects on DNA methylation were assessed using global, genome-wide and locus-specific DNA methylation analyses. Following direct exposure in zebrafish embryos from 0 to 6 days post-fertilization, genome-wide analysis revealed a multitude of differentially methylated regions, strongly enriched at conserved non-genic elements for both compounds. Pathways involved in adipogenesis were enriched with the putative obesogenic compound MEHP. Exposure to 5AC resulted in enrichment of pathways involved in embryonic development and transgenerational effects on larval body length. Locus-specific methylation analysis of 10 differentially methylated sites revealed six of these loci differentially methylated in sperm sampled from adult zebrafish exposed during development to 5AC, and in first and second generation larvae. With MEHP, consistent changes were found at 2 specific loci in first and second generation larvae. Our results suggest a functional role for DNA methylation on cis-regulatory conserved elements following developmental exposure to compounds. Effects on these regions are potentially transferred to subsequent generations.

  1. BLSSpeller: exhaustive comparative discovery of conserved cis-regulatory elements.

    PubMed

    De Witte, Dieter; Van de Velde, Jan; Decap, Dries; Van Bel, Michiel; Audenaert, Pieter; Demeester, Piet; Dhoedt, Bart; Vandepoele, Klaas; Fostier, Jan

    2015-12-01

    The accurate discovery and annotation of regulatory elements remains a challenging problem. The growing number of sequenced genomes creates new opportunities for comparative approaches to motif discovery. Putative binding sites are then considered to be functional if they are conserved in orthologous promoter sequences of multiple related species. Existing methods for comparative motif discovery usually rely on pregenerated multiple sequence alignments, which are difficult to obtain for more diverged species such as plants. As a consequence, misaligned regulatory elements often remain undetected. We present a novel algorithm that supports both alignment-free and alignment-based motif discovery in the promoter sequences of related species. Putative motifs are exhaustively enumerated as words over the IUPAC alphabet and screened for conservation using the branch length score. Additionally, a confidence score is established in a genome-wide fashion. In order to take advantage of a cloud computing infrastructure, the MapReduce programming model is adopted. The method is applied to four monocotyledon plant species and it is shown that high-scoring motifs are significantly enriched for open chromatin regions in Oryza sativa and for transcription factor binding sites inferred through protein-binding microarrays in O.sativa and Zea mays. Furthermore, the method is shown to recover experimentally profiled ga2ox1-like KN1 binding sites in Z.mays. BLSSpeller was written in Java. Source code and manual are available at http://bioinformatics.intec.ugent.be/blsspeller Klaas.Vandepoele@psb.vib-ugent.be or jan.fostier@intec.ugent.be. Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press.

  2. BLSSpeller: exhaustive comparative discovery of conserved cis-regulatory elements

    PubMed Central

    De Witte, Dieter; Van de Velde, Jan; Decap, Dries; Van Bel, Michiel; Audenaert, Pieter; Demeester, Piet; Dhoedt, Bart; Vandepoele, Klaas; Fostier, Jan

    2015-01-01

    Motivation: The accurate discovery and annotation of regulatory elements remains a challenging problem. The growing number of sequenced genomes creates new opportunities for comparative approaches to motif discovery. Putative binding sites are then considered to be functional if they are conserved in orthologous promoter sequences of multiple related species. Existing methods for comparative motif discovery usually rely on pregenerated multiple sequence alignments, which are difficult to obtain for more diverged species such as plants. As a consequence, misaligned regulatory elements often remain undetected. Results: We present a novel algorithm that supports both alignment-free and alignment-based motif discovery in the promoter sequences of related species. Putative motifs are exhaustively enumerated as words over the IUPAC alphabet and screened for conservation using the branch length score. Additionally, a confidence score is established in a genome-wide fashion. In order to take advantage of a cloud computing infrastructure, the MapReduce programming model is adopted. The method is applied to four monocotyledon plant species and it is shown that high-scoring motifs are significantly enriched for open chromatin regions in Oryza sativa and for transcription factor binding sites inferred through protein-binding microarrays in O.sativa and Zea mays. Furthermore, the method is shown to recover experimentally profiled ga2ox1-like KN1 binding sites in Z.mays. Availability and implementation: BLSSpeller was written in Java. Source code and manual are available at http://bioinformatics.intec.ugent.be/blsspeller Contact: Klaas.Vandepoele@psb.vib-ugent.be or jan.fostier@intec.ugent.be Supplementary information: Supplementary data are available at Bioinformatics online. PMID:26254488

  3. Feature co-localization landscape of the human genome

    PubMed Central

    Ng, Siu-Kin; Hu, Taobo; Long, Xi; Chan, Cheuk-Hin; Tsang, Shui-Ying; Xue, Hong

    2016-01-01

    Although feature co-localizations could serve as useful guide-posts to genome architecture, a comprehensive and quantitative feature co-localization map of the human genome has been lacking. Herein we show that, in contrast to the conventional bipartite division of genomic sequences into genic and inter-genic regions, pairwise co-localizations of forty-two genomic features in the twenty-two autosomes based on 50-kb to 2,000-kb sequence windows indicate a tripartite zonal architecture comprising Genic zones enriched with gene-related features and Alu-elements; Proximal zones enriched with MIR- and L2-elements, transcription-factor-binding-sites (TFBSs), and conserved-indels (CIDs); and Distal zones enriched with L1-elements. Co-localizations between single-nucleotide-polymorphisms (SNPs) and copy-number-variations (CNVs) reveal a fraction of sequence windows displaying steeply enhanced levels of SNPs, CNVs and recombination rates that point to active adaptive evolution in such pathways as immune response, sensory perceptions, and cognition. The strongest positive co-localization observed between TFBSs and CIDs suggests a regulatory role of CIDs in cooperation with TFBSs. The positive co-localizations of cancer somatic CNVs (CNVT) with all Proximal zone and most Genic zone features, in contrast to the distinctly more restricted co-localizations exhibited by germline CNVs (CNVG), reveal disparate distributions of CNVTs and CNVGs indicative of dissimilarity in their underlying mechanisms. PMID:26854351

  4. PCR and magnetic bead-mediated target capture for the isolation of short interspersed nucleotide elements in fishes.

    PubMed

    Liu, Dong; Zhu, Guoli; Tang, Wenqiao; Yang, Jinquan; Guo, Hongyi

    2012-01-01

    Short interspersed nucleotide elements (SINEs), a type of retrotransposon, are widely distributed in various genomes with multiple copies arranged in different orientations, and cause changes to genes and genomes during evolutionary history. This can provide the basis for determining genome diversity, genetic variation and molecular phylogeny, etc. SINE DNA is transcribed into RNA by polymerase III from an internal promoter, which is composed of two conserved boxes, box A and box B. Here we present an approach to isolate novel SINEs based on these promoter elements. Box A of a SINE is obtained via PCR with only one primer identical to box B (B-PCR). Box B and its downstream sequence are acquired by PCR with one primer corresponding to box A (A-PCR). The SINE clone produced by A-PCR is selected as a template to label a probe with biotin. The full-length SINEs are isolated from the genomic pool through complex capture using the biotinylated probe bound to magnetic particles. Using this approach, a novel SINE family, Cn-SINE, from the genomes of Coilia nasus, was isolated. The members are 180-360 bp long. Sequence homology suggests that Cn-SINEs evolved from a leucine tRNA gene. This is the first report of a tRNA(Leu)-related SINE obtained without the use of a genomic library or inverse PCR. These results provide new insights into the origin of SINEs.

  5. PCR and Magnetic Bead-Mediated Target Capture for the Isolation of Short Interspersed Nucleotide Elements in Fishes

    PubMed Central

    Liu, Dong; Zhu, Guoli; Tang, Wenqiao; Yang, Jinquan; Guo, Hongyi

    2012-01-01

    Short interspersed nucleotide elements (SINEs), a type of retrotransposon, are widely distributed in various genomes with multiple copies arranged in different orientations, and cause changes to genes and genomes during evolutionary history. This can provide the basis for determining genome diversity, genetic variation and molecular phylogeny, etc. SINE DNA is transcribed into RNA by polymerase III from an internal promoter, which is composed of two conserved boxes, box A and box B. Here we present an approach to isolate novel SINEs based on these promoter elements. Box A of a SINE is obtained via PCR with only one primer identical to box B (B-PCR). Box B and its downstream sequence are acquired by PCR with one primer corresponding to box A (A-PCR). The SINE clone produced by A-PCR is selected as a template to label a probe with biotin. The full-length SINEs are isolated from the genomic pool through complex capture using the biotinylated probe bound to magnetic particles. Using this approach, a novel SINE family, Cn-SINE, from the genomes of Coilia nasus, was isolated. The members are 180–360 bp long. Sequence homology suggests that Cn-SINEs evolved from a leucine tRNA gene. This is the first report of a tRNALeu-related SINE obtained without the use of a genomic library or inverse PCR. These results provide new insights into the origin of SINEs. PMID:22408437

  6. WordCluster: detecting clusters of DNA words and genomic elements

    PubMed Central

    2011-01-01

    Background Many k-mers (or DNA words) and genomic elements are known to be spatially clustered in the genome. Well established examples are the genes, TFBSs, CpG dinucleotides, microRNA genes and ultra-conserved non-coding regions. Currently, no algorithm exists to find these clusters in a statistically comprehensible way. The detection of clustering often relies on densities and sliding-window approaches or arbitrarily chosen distance thresholds. Results We introduce here an algorithm to detect clusters of DNA words (k-mers), or any other genomic element, based on the distance between consecutive copies and an assigned statistical significance. We implemented the method into a web server connected to a MySQL backend, which also determines the co-localization with gene annotations. We demonstrate the usefulness of this approach by detecting the clusters of CAG/CTG (cytosine contexts that can be methylated in undifferentiated cells), showing that the degree of methylation vary drastically between inside and outside of the clusters. As another example, we used WordCluster to search for statistically significant clusters of olfactory receptor (OR) genes in the human genome. Conclusions WordCluster seems to predict biological meaningful clusters of DNA words (k-mers) and genomic entities. The implementation of the method into a web server is available at http://bioinfo2.ugr.es/wordCluster/wordCluster.php including additional features like the detection of co-localization with gene regions or the annotation enrichment tool for functional analysis of overlapped genes. PMID:21261981

  7. Conservation of transcription factor binding events predicts gene expression across species

    PubMed Central

    Hemberg, Martin; Kreiman, Gabriel

    2011-01-01

    Recent technological advances have made it possible to determine the genome-wide binding sites of transcription factors (TFs). Comparisons across species have suggested a relatively low degree of evolutionary conservation of experimentally defined TF binding events (TFBEs). Using binding data for six different TFs in hepatocytes and embryonic stem cells from human and mouse, we demonstrate that evolutionary conservation of TFBEs within orthologous proximal promoters is closely linked to function, defined as expression of the target genes. We show that (i) there is a significantly higher degree of conservation of TFBEs when the target gene is expressed in both species; (ii) there is increased conservation of binding events for groups of TFs compared to individual TFs; and (iii) conserved TFBEs have a greater impact on the expression of their target genes than non-conserved ones. These results link conservation of structural elements (TFBEs) to conservation of function (gene expression) and suggest a higher degree of functional conservation than implied by previous studies. PMID:21622661

  8. The elephant shark methylome reveals conservation of epigenetic regulation across jawed vertebrates

    PubMed Central

    Peat, Julian R.; Ortega-Recalde, Oscar; Kardailsky, Olga; Hore, Timothy A.

    2017-01-01

    Background: Methylation of CG dinucleotides constitutes a critical system of epigenetic memory in bony vertebrates, where it modulates gene expression and suppresses transposon activity. The genomes of studied vertebrates are pervasively hypermethylated, with the exception of regulatory elements such as transcription start sites (TSSs), where the presence of methylation is associated with gene silencing. This system is not found in the sparsely methylated genomes of invertebrates, and establishing how it arose during early vertebrate evolution is impeded by a paucity of epigenetic data from basal vertebrates. Methods: We perform whole-genome bisulfite sequencing to generate the first genome-wide methylation profiles of a cartilaginous fish, the elephant shark Callorhinchus milii. Employing these to determine the elephant shark methylome structure and its relationship with expression, we compare this with higher vertebrates and an invertebrate chordate using published methylation and transcriptome data.  Results: Like higher vertebrates, the majority of elephant shark CG sites are highly methylated, and methylation is abundant across the genome rather than patterned in the mosaic configuration of invertebrates. This global hypermethylation includes transposable elements and the bodies of genes at all expression levels. Significantly, we document an inverse relationship between TSS methylation and expression in the elephant shark, supporting the presence of the repressive regulatory architecture shared by higher vertebrates. Conclusions: Our demonstration that methylation patterns in a cartilaginous fish are characteristic of higher vertebrates imply the conservation of this epigenetic modification system across jawed vertebrates separated by 465 million years of evolution. In addition, these findings position the elephant shark as a valuable model to explore the evolutionary history and function of vertebrate methylation. PMID:28580133

  9. The elephant shark methylome reveals conservation of epigenetic regulation across jawed vertebrates.

    PubMed

    Peat, Julian R; Ortega-Recalde, Oscar; Kardailsky, Olga; Hore, Timothy A

    2017-01-01

    Methylation of CG dinucleotides constitutes a critical system of epigenetic memory in bony vertebrates, where it modulates gene expression and suppresses transposon activity. The genomes of studied vertebrates are pervasively hypermethylated, with the exception of regulatory elements such as transcription start sites (TSSs), where the presence of methylation is associated with gene silencing. This system is not found in the sparsely methylated genomes of invertebrates, and establishing how it arose during early vertebrate evolution is impeded by a paucity of epigenetic data from basal vertebrates.  We perform whole-genome bisulfite sequencing to generate the first genome-wide methylation profiles of a cartilaginous fish, the elephant shark Callorhinchus milii . Employing these to determine the elephant shark methylome structure and its relationship with expression, we compare this with higher vertebrates and an invertebrate chordate using published methylation and transcriptome data.  Results: Like higher vertebrates, the majority of elephant shark CG sites are highly methylated, and methylation is abundant across the genome rather than patterned in the mosaic configuration of invertebrates. This global hypermethylation includes transposable elements and the bodies of genes at all expression levels. Significantly, we document an inverse relationship between TSS methylation and expression in the elephant shark, supporting the presence of the repressive regulatory architecture shared by higher vertebrates.  Our demonstration that methylation patterns in a cartilaginous fish are characteristic of higher vertebrates imply the conservation of this epigenetic modification system across jawed vertebrates separated by 465 million years of evolution. In addition, these findings position the elephant shark as a valuable model to explore the evolutionary history and function of vertebrate methylation.

  10. Microarray-based comparative genomic profiling of reference strains and selected Canadian field isolates of Actinobacillus pleuropneumoniae

    PubMed Central

    Gouré, Julien; Findlay, Wendy A; Deslandes, Vincent; Bouevitch, Anne; Foote, Simon J; MacInnes, Janet I; Coulton, James W; Nash, John HE; Jacques, Mario

    2009-01-01

    Background Actinobacillus pleuropneumoniae, the causative agent of porcine pleuropneumonia, is a highly contagious respiratory pathogen that causes severe losses to the swine industry worldwide. Current commercially-available vaccines are of limited value because they do not induce cross-serovar immunity and do not prevent development of the carrier state. Microarray-based comparative genomic hybridizations (M-CGH) were used to estimate whole genomic diversity of representative Actinobacillus pleuropneumoniae strains. Our goal was to identify conserved genes, especially those predicted to encode outer membrane proteins and lipoproteins because of their potential for the development of more effective vaccines. Results Using hierarchical clustering, our M-CGH results showed that the majority of the genes in the genome of the serovar 5 A. pleuropneumoniae L20 strain were conserved in the reference strains of all 15 serovars and in representative field isolates. Fifty-eight conserved genes predicted to encode for outer membrane proteins or lipoproteins were identified. As well, there were several clusters of diverged or absent genes including those associated with capsule biosynthesis, toxin production as well as genes typically associated with mobile elements. Conclusion Although A. pleuropneumoniae strains are essentially clonal, M-CGH analysis of the reference strains of the fifteen serovars and representative field isolates revealed several classes of genes that were divergent or absent. Not surprisingly, these included genes associated with capsule biosynthesis as the capsule is associated with sero-specificity. Several of the conserved genes were identified as candidates for vaccine development, and we conclude that M-CGH is a valuable tool for reverse vaccinology. PMID:19239696

  11. Conserved loci of leaf and stem rust fungi of wheat share synteny interrupted by lineage-specific influx of repeat elements

    USDA-ARS?s Scientific Manuscript database

    Background: Wheat leaf rust (Puccinia triticina Eriks; Pt) and stem rust (P. graminis f.sp. tritici; Pgt) are significant economic pathogens having similar host ranges and life cycles, but different alternate hosts. The Pt genome, currently estimated at 135 Mb, is significantly larger than Pgt, at ...

  12. End-to-end crosstalk within the hepatitis C virus genome mediates the conformational switch of the 3′X-tail region

    PubMed Central

    Romero-López, Cristina; Barroso-delJesus, Alicia; García-Sacristán, Ana; Briones, Carlos; Berzal-Herranz, Alfredo

    2014-01-01

    The hepatitis C virus (HCV) RNA genome contains multiple structurally conserved domains that make long-distance RNA–RNA contacts important in the establishment of viral infection. Microarray antisense oligonucelotide assays, improved dimethyl sulfate probing methods and 2′ acylation chemistry (selective 2’-hydroxyl acylation and primer extension, SHAPE) showed the folding of the genomic RNA 3′ end to be regulated by the internal ribosome entry site (IRES) element via direct RNA–RNA interactions. The essential cis-acting replicating element (CRE) and the 3′X-tail region adopted different 3D conformations in the presence and absence of the genomic RNA 5′ terminus. Further, the structural transition in the 3′X-tail from the replication-competent conformer (consisting of three stem-loops) to the dimerizable form (with two stem-loops), was found to depend on the presence of both the IRES and the CRE elements. Complex interplay between the IRES, the CRE and the 3′X-tail region would therefore appear to occur. The preservation of this RNA–RNA interacting network, and the maintenance of the proper balance between different contacts, may play a crucial role in the switch between different steps of the HCV cycle. PMID:24049069

  13. Multidrug-resistant enterococci lack CRISPR-cas.

    PubMed

    Palmer, Kelli L; Gilmore, Michael S

    2010-10-12

    Clustered, regularly interspaced short palindromic repeats (CRISPR) provide bacteria and archaea with sequence-specific, acquired defense against plasmids and phage. Because mobile elements constitute up to 25% of the genome of multidrug-resistant (MDR) enterococci, it was of interest to examine the codistribution of CRISPR and acquired antibiotic resistance in enterococcal lineages. A database was built from 16 Enterococcus faecalis draft genome sequences to identify commonalities and polymorphisms in the location and content of CRISPR loci. With this data set, we were able to detect identities between CRISPR spacers and sequences from mobile elements, including pheromone-responsive plasmids and phage, suggesting that CRISPR regulates the flux of these elements through the E. faecalis species. Based on conserved locations of CRISPR and CRISPR-cas loci and the discovery of a new CRISPR locus with associated functional genes, CRISPR3-cas, we screened additional E. faecalis strains for CRISPR content, including isolates predating the use of antibiotics. We found a highly significant inverse correlation between the presence of a CRISPR-cas locus and acquired antibiotic resistance in E. faecalis, and examination of an additional eight E. faecium genomes yielded similar results for that species. A mechanism for CRISPR-cas loss in E. faecalis was identified. The inverse relationship between CRISPR-cas and antibiotic resistance suggests that antibiotic use inadvertently selects for enterococcal strains with compromised genome defense.

  14. Whole genome and transcriptome analyses of environmental antibiotic sensitive and multi-resistant Pseudomonas aeruginosa isolates exposed to waste water and tap water

    PubMed Central

    Schwartz, Thomas; Armant, Olivier; Bretschneider, Nancy; Hahn, Alexander; Kirchen, Silke; Seifert, Martin; Dötsch, Andreas

    2015-01-01

    The fitness of sensitive and resistant Pseudomonas aeruginosa in different aquatic environments depends on genetic capacities and transcriptional regulation. Therefore, an antibiotic-sensitive isolate PA30 and a multi-resistant isolate PA49 originating from waste waters were compared via whole genome and transcriptome Illumina sequencing after exposure to municipal waste water and tap water. A number of different genomic islands (e.g. PAGIs, PAPIs) were identified in the two environmental isolates beside the highly conserved core genome. Exposure to tap water and waste water exhibited similar transcriptional impacts on several gene clusters (antibiotic and metal resistance, genetic mobile elements, efflux pumps) in both environmental P. aeruginosa isolates. The MexCD-OprJ efflux pump was overexpressed in PA49 in response to waste water. The expression of resistance genes, genetic mobile elements in PA49 was independent from the water matrix. Consistently, the antibiotic sensitive strain PA30 did not show any difference in expression of the intrinsic resistance determinants and genetic mobile elements. Thus, the exposure of both isolates to polluted waste water and oligotrophic tap water resulted in similar expression profiles of mentioned genes. However, changes in environmental milieus resulted in rather unspecific transcriptional responses than selected and stimuli-specific gene regulation. PMID:25186059

  15. The Genome Sequence of Taurine Cattle: A window to ruminant biology and evolution

    PubMed Central

    Elsik, Christine G.; Tellam, Ross L.; Worley, Kim C.

    2010-01-01

    To understand the biology and evolution of ruminants, the cattle genome was sequenced to ∼7× coverage. The cattle genome contains a minimum of 22,000 genes, with a core set of 14,345 orthologs shared among seven mammalian species of which 1,217 are absent or undetected in non-eutherian (marsupial or monotreme) genomes. Cattle-specific evolutionary breakpoint regions in chromosomes have a higher density of segmental duplications, enrichment of repetitive elements, and species-specific variations in genes associated with lactation and immune responsiveness. Genes involved in metabolism are generally highly conserved, although five metabolic genes are deleted or extensively diverged from their human orthologs. The cattle genome sequence thus provides an enabling resource for understanding mammalian evolution and accelerating livestock genetic improvement for milk and meat production. PMID:19390049

  16. A Conserved C-terminal Element in the Yeast Doa10 and Human MARCH6 Ubiquitin Ligases Required for Selective Substrate Degradation*

    PubMed Central

    Zattas, Dimitrios; Berk, Jason M.; Kreft, Stefan G.; Hochstrasser, Mark

    2016-01-01

    Specific proteins are modified by ubiquitin at the endoplasmic reticulum (ER) and are degraded by the proteasome, a process referred to as ER-associated protein degradation. In Saccharomyces cerevisiae, two principal ER-associated protein degradation ubiquitin ligases (E3s) reside in the ER membrane, Doa10 and Hrd1. The membrane-embedded Doa10 functions in the degradation of substrates in the ER membrane, nuclear envelope, cytoplasm, and nucleoplasm. How most E3 ligases, including Doa10, recognize their protein substrates remains poorly understood. Here we describe a previously unappreciated but highly conserved C-terminal element (CTE) in Doa10; this cytosolically disposed 16-residue motif follows the final transmembrane helix. A conserved CTE asparagine residue is required for ubiquitylation and degradation of a subset of Doa10 substrates. Such selectivity suggests that the Doa10 CTE is involved in substrate discrimination and not general ligase function. Functional conservation of the CTE was investigated in the human ortholog of Doa10, MARCH6 (TEB4), by analyzing MARCH6 autoregulation of its own degradation. Mutation of the conserved Asn residue (N890A) in the MARCH6 CTE stabilized the normally short lived enzyme to the same degree as a catalytically inactivating mutation (C9A). We also report the localization of endogenous MARCH6 to the ER using epitope tagging of the genomic MARCH6 locus by clustered regularly interspaced short palindromic repeats (CRISPR)/Cas9-mediated genome editing. These localization and CTE analyses support the inference that MARCH6 and Doa10 are functionally similar. Moreover, our results with the yeast enzyme suggest that the CTE is involved in the recognition and/or ubiquitylation of specific protein substrates. PMID:27068744

  17. The Physcomitrella patens chromosome-scale assembly reveals moss genome structure and evolution

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lang, Daniel; Ullrich, Kristian K.; Murat, Florent

    Here, the draft genome of the moss model, Physcomitrella patens, comprised approximately 2000 unordered scaffolds. In order to enable analyses of genome structure and evolution we generated a chromosome–scale genome assembly using genetic linkage as well as (end) sequencing of long DNA fragments. We find that 57% of the genome comprises transposable elements (TEs), some of which may be actively transposing during the life cycle. Unlike in flowering plant genomes, gene– and TE–rich regions show an overall even distribution along the chromosomes. However, the chromosomes are mono–centric with peaks of a class of Copia elements potentially coinciding with centromeres. Genemore » body methylation is evident in 5.7% of the protein–coding genes, typically coinciding with low GC and low expression. Some giant virus insertions are transcriptionally active and might protect gametes from viral infection via siRNA mediated silencing. Structure–based detection methods show that the genome evolved via two rounds of whole genome duplications (WGDs), apparently common in mosses but not in liverworts and hornworts. Several hundred genes are present in colinear regions conserved since the last common ancestor of plants. These syntenic regions are enriched for functions related to plant–specific cell growth and tissue organization. The P. patens genome lacks the TE–rich pericentromeric and gene–rich distal regions typical for most flowering plant genomes. More non–seed plant genomes are needed to unravel how plant genomes evolve, and to understand whether the P. patens genome structure is typical for mosses or bryophytes.« less

  18. The Physcomitrella patens chromosome-scale assembly reveals moss genome structure and evolution

    DOE PAGES

    Lang, Daniel; Ullrich, Kristian K.; Murat, Florent; ...

    2017-12-13

    Here, the draft genome of the moss model, Physcomitrella patens, comprised approximately 2000 unordered scaffolds. In order to enable analyses of genome structure and evolution we generated a chromosome–scale genome assembly using genetic linkage as well as (end) sequencing of long DNA fragments. We find that 57% of the genome comprises transposable elements (TEs), some of which may be actively transposing during the life cycle. Unlike in flowering plant genomes, gene– and TE–rich regions show an overall even distribution along the chromosomes. However, the chromosomes are mono–centric with peaks of a class of Copia elements potentially coinciding with centromeres. Genemore » body methylation is evident in 5.7% of the protein–coding genes, typically coinciding with low GC and low expression. Some giant virus insertions are transcriptionally active and might protect gametes from viral infection via siRNA mediated silencing. Structure–based detection methods show that the genome evolved via two rounds of whole genome duplications (WGDs), apparently common in mosses but not in liverworts and hornworts. Several hundred genes are present in colinear regions conserved since the last common ancestor of plants. These syntenic regions are enriched for functions related to plant–specific cell growth and tissue organization. The P. patens genome lacks the TE–rich pericentromeric and gene–rich distal regions typical for most flowering plant genomes. More non–seed plant genomes are needed to unravel how plant genomes evolve, and to understand whether the P. patens genome structure is typical for mosses or bryophytes.« less

  19. In silico Analysis of 3′-End-Processing Signals in Aspergillus oryzae Using Expressed Sequence Tags and Genomic Sequencing Data

    PubMed Central

    Tanaka, Mizuki; Sakai, Yoshifumi; Yamada, Osamu; Shintani, Takahiro; Gomi, Katsuya

    2011-01-01

    To investigate 3′-end-processing signals in Aspergillus oryzae, we created a nucleotide sequence data set of the 3′-untranslated region (3′ UTR) plus 100 nucleotides (nt) sequence downstream of the poly(A) site using A. oryzae expressed sequence tags and genomic sequencing data. This data set comprised 1065 sequences derived from 1042 unique genes. The average 3′ UTR length in A. oryzae was 241 nt, which is greater than that in yeast but similar to that in plants. The 3′ UTR and 100 nt sequence downstream of the poly(A) site is notably U-rich, while the region located 15–30 nt upstream of the poly(A) site is markedly A-rich. The most frequently found hexanucleotide in this A-rich region is AAUGAA, although this sequence accounts for only 6% of all transcripts. These data suggested that A. oryzae has no highly conserved sequence element equivalent to AAUAAA, a mammalian polyadenylation signal. We identified that putative 3′-end-processing signals in A. oryzae, while less well conserved than those in mammals, comprised four sequence elements: the furthest upstream U-rich element, A-rich sequence, cleavage site, and downstream U-rich element flanking the cleavage site. Although these putative 3′-end-processing signals are similar to those in yeast and plants, some notable differences exist between them. PMID:21586533

  20. Parallel evolution of chordate cis-regulatory code for development.

    PubMed

    Doglio, Laura; Goode, Debbie K; Pelleri, Maria C; Pauls, Stefan; Frabetti, Flavia; Shimeld, Sebastian M; Vavouri, Tanya; Elgar, Greg

    2013-11-01

    Urochordates are the closest relatives of vertebrates and at the larval stage, possess a characteristic bilateral chordate body plan. In vertebrates, the genes that orchestrate embryonic patterning are in part regulated by highly conserved non-coding elements (CNEs), yet these elements have not been identified in urochordate genomes. Consequently the evolution of the cis-regulatory code for urochordate development remains largely uncharacterised. Here, we use genome-wide comparisons between C. intestinalis and C. savignyi to identify putative urochordate cis-regulatory sequences. Ciona conserved non-coding elements (ciCNEs) are associated with largely the same key regulatory genes as vertebrate CNEs. Furthermore, some of the tested ciCNEs are able to activate reporter gene expression in both zebrafish and Ciona embryos, in a pattern that at least partially overlaps that of the gene they associate with, despite the absence of sequence identity. We also show that the ability of a ciCNE to up-regulate gene expression in vertebrate embryos can in some cases be localised to short sub-sequences, suggesting that functional cross-talk may be defined by small regions of ancestral regulatory logic, although functional sub-sequences may also be dispersed across the whole element. We conclude that the structure and organisation of cis-regulatory modules is very different between vertebrates and urochordates, reflecting their separate evolutionary histories. However, functional cross-talk still exists because the same repertoire of transcription factors has likely guided their parallel evolution, exploiting similar sets of binding sites but in different combinations.

  1. Genome sequencing and comparative genomics of honey bee microsporidia, Nosema apis reveal novel insights into host-parasite interactions.

    PubMed

    Chen, Yan ping; Pettis, Jeffery S; Zhao, Yan; Liu, Xinyue; Tallon, Luke J; Sadzewicz, Lisa D; Li, Renhua; Zheng, Huoqing; Huang, Shaokang; Zhang, Xuan; Hamilton, Michele C; Pernal, Stephen F; Melathopoulos, Andony P; Yan, Xianghe; Evans, Jay D

    2013-07-05

    The microsporidia parasite Nosema contributes to the steep global decline of honey bees that are critical pollinators of food crops. There are two species of Nosema that have been found to infect honey bees, Nosema apis and N. ceranae. Genome sequencing of N. apis and comparative genome analysis with N. ceranae, a fully sequenced microsporidia species, reveal novel insights into host-parasite interactions underlying the parasite infections. We applied the whole-genome shotgun sequencing approach to sequence and assemble the genome of N. apis which has an estimated size of 8.5 Mbp. We predicted 2,771 protein- coding genes and predicted the function of each putative protein using the Gene Ontology. The comparative genomic analysis led to identification of 1,356 orthologs that are conserved between the two Nosema species and genes that are unique characteristics of the individual species, thereby providing a list of virulence factors and new genetic tools for studying host-parasite interactions. We also identified a highly abundant motif in the upstream promoter regions of N. apis genes. This motif is also conserved in N. ceranae and other microsporidia species and likely plays a role in gene regulation across the microsporidia. The availability of the N. apis genome sequence is a significant addition to the rapidly expanding body of microsprodian genomic data which has been improving our understanding of eukaryotic genome diversity and evolution in a broad sense. The predicted virulent genes and transcriptional regulatory elements are potential targets for innovative therapeutics to break down the life cycle of the parasite.

  2. High quality de novo sequencing and assembly of the Saccharomyces arboricolus genome

    PubMed Central

    2013-01-01

    Background Comparative genomics is a formidable tool to identify functional elements throughout a genome. In the past ten years, studies in the budding yeast Saccharomyces cerevisiae and a set of closely related species have been instrumental in showing the benefit of analyzing patterns of sequence conservation. Increasing the number of closely related genome sequences makes the comparative genomics approach more powerful and accurate. Results Here, we report the genome sequence and analysis of Saccharomyces arboricolus, a yeast species recently isolated in China, that is closely related to S. cerevisiae. We obtained high quality de novo sequence and assemblies using a combination of next generation sequencing technologies, established the phylogenetic position of this species and considered its phenotypic profile under multiple environmental conditions in the light of its gene content and phylogeny. Conclusions We suggest that the genome of S. arboricolus will be useful in future comparative genomics analysis of the Saccharomyces sensu stricto yeasts. PMID:23368932

  3. How and why should we implement genomics into conservation?

    PubMed Central

    McMahon, Barry J; Teeling, Emma C; Höglund, Jacob

    2014-01-01

    Conservation genetics has provided important information into the dynamics of endangered populations. The rapid development of genomic methods has posed an important question, namely where do genetics and genomics sit in relation to their application in the conservation of species? Although genetics can answer a number of relevant questions related to conservation, the argument for the application of genomics is not yet fully exploited. Here, we explore the transition and rationale for the move from genetic to genomic research in conservation biology and the utility of such research. We explore the idea of a ‘conservation prior’ and how this can be determined by genomic data and used in the management of populations. We depict three different conservation scenarios and describe how genomic data can drive management action in each situation. We conclude that the most effective applications of genomics will be to inform stakeholders with the aim of avoiding ‘emergency room conservation’. PMID:25553063

  4. Mobile Element Evolution Playing Jigsaw—SINEs in Gastropod and Bivalve Mollusks

    PubMed Central

    Matetovici, Irina; Sajgo, Szilard; Ianc, Bianca; Ochis, Cornelia; Bulzu, Paul; Popescu, Octavian; Damert, Annette

    2016-01-01

    SINEs (Short INterspersed Elements) are widely distributed among eukaryotes. Some SINE families are organized in superfamilies characterized by a shared central domain. These central domains are conserved across species, classes, and even phyla. Here we report the identification of two novel such superfamilies in the genomes of gastropod and bivalve mollusks. The central conserved domain of the first superfamily is present in SINEs in Caenogastropoda and Vetigastropoda as well as in all four subclasses of Bivalvia. We designated the domain MESC (Romanian for MElc—snail and SCoica—mussel) because it appears to be restricted to snails and mussels. The second superfamily is restricted to Caenogastropoda. Its central conserved domain—Snail—is related to the Nin-DC domain. Furthermore, we provide evidence that a 40-bp subdomain of the SINE V-domain is conserved in SINEs in mollusks and arthropods. It is predicted to form a stable stem-loop structure that is preserved in the context of the overall SINE RNA secondary structure in invertebrates. Our analysis also recovered short retrotransposons with a Long INterspersed Element (LINE)-derived 5′ end. These share the body and/or the tail with transfer RNA (tRNA)-derived SINEs within and across species. Finally, we identified CORE SINEs in gastropods and bivalves—extending the distribution range of this superfamily. PMID:26739168

  5. Complete genome sequence and comparative genomics of the probiotic yeast Saccharomyces boulardii.

    PubMed

    Khatri, Indu; Tomar, Rajul; Ganesan, K; Prasad, G S; Subramanian, Srikrishna

    2017-03-23

    The probiotic yeast, Saccharomyces boulardii (Sb) is known to be effective against many gastrointestinal disorders and antibiotic-associated diarrhea. To understand molecular basis of probiotic-properties ascribed to Sb we determined the complete genomes of two strains of Sb i.e. Biocodex and unique28 and the draft genomes for three other Sb strains that are marketed as probiotics in India. We compared these genomes with 145 strains of S. cerevisiae (Sc) to understand genome-level similarities and differences between these yeasts. A distinctive feature of Sb from other Sc is absence of Ty elements Ty1, Ty3, Ty4 and associated LTR. However, we could identify complete Ty2 and Ty5 elements in Sb. The genes for hexose transporters HXT11 and HXT9, and asparagine-utilization are absent in all Sb strains. We find differences in repeat periods and copy numbers of repeats in flocculin genes that are likely related to the differential adhesion of Sb as compared to Sc. Core-proteome based taxonomy places Sb strains along with wine strains of Sc. We find the introgression of five genes from Z. bailii into the chromosome IV of Sb and wine strains of Sc. Intriguingly, genes involved in conferring known probiotic properties to Sb are conserved in most Sc strains.

  6. Comparative genomic analysis of three Leishmania species that cause diverse human disease

    PubMed Central

    Peacock, Christopher S; Seeger, Kathy; Harris, David; Murphy, Lee; Ruiz, Jeronimo C; Quail, Michael A; Peters, Nick; Adlem, Ellen; Tivey, Adrian; Aslett, Martin; Kerhornou, Arnaud; Ivens, Alasdair; Fraser, Audrey; Rajandream, Marie-Adele; Carver, Tim; Norbertczak, Halina; Chillingworth, Tracey; Hance, Zahra; Jagels, Kay; Moule, Sharon; Ormond, Doug; Rutter, Simon; Squares, Rob; Whitehead, Sally; Rabbinowitsch, Ester; Arrowsmith, Claire; White, Brian; Thurston, Scott; Bringaud, Frédéric; Baldauf, Sandra L; Faulconbridge, Adam; Jeffares, Daniel; Depledge, Daniel P; Oyola, Samuel O; Hilley, James D; Brito, Loislene O; Tosi, Luiz R O; Barrell, Barclay; Cruz, Angela K; Mottram, Jeremy C; Smith, Deborah F; Berriman, Matthew

    2008-01-01

    Leishmania parasites cause a broad spectrum of clinical disease. Here we report the sequencing of the genomes of two species of Leishmania: Leishmania infantum and Leishmania braziliensis. The comparison of these sequences with the published genome of Leishmania major reveals marked conservation of synteny and identifies only ∼200 genes with a differential distribution between the three species. L. braziliensis, contrary to Leishmania species examined so far, possesses components of a putative RNA-mediated interference pathway, telomere-associated transposable elements and spliced leader–associated SLACS retrotransposons. We show that pseudogene formation and gene loss are the principal forces shaping the different genomes. Genes that are differentially distributed between the species encode proteins implicated in host-pathogen interactions and parasite survival in the macrophage. PMID:17572675

  7. D-MATRIX: A web tool for constructing weight matrix of conserved DNA motifs

    PubMed Central

    Sen, Naresh; Mishra, Manoj; Khan, Feroz; Meena, Abha; Sharma, Ashok

    2009-01-01

    Despite considerable efforts to date, DNA motif prediction in whole genome remains a challenge for researchers. Currently the genome wide motif prediction tools required either direct pattern sequence (for single motif) or weight matrix (for multiple motifs). Although there are known motif pattern databases and tools for genome level prediction but no tool for weight matrix construction. Considering this, we developed a D-MATRIX tool which predicts the different types of weight matrix based on user defined aligned motif sequence set and motif width. For retrieval of known motif sequences user can access the commonly used databases such as TFD, RegulonDB, DBTBS, Transfac. D­MATRIX program uses a simple statistical approach for weight matrix construction, which can be converted into different file formats according to user requirement. It provides the possibility to identify the conserved motifs in the co­regulated genes or whole genome. As example, we successfully constructed the weight matrix of LexA transcription factor binding site with the help of known sos­box cis­regulatory elements in Deinococcus radiodurans genome. The algorithm is implemented in C-Sharp and wrapped in ASP.Net to maintain a user friendly web interface. D­MATRIX tool is accessible through the CIMAP domain network. Availability http://203.190.147.116/dmatrix/ PMID:19759861

  8. D-MATRIX: a web tool for constructing weight matrix of conserved DNA motifs.

    PubMed

    Sen, Naresh; Mishra, Manoj; Khan, Feroz; Meena, Abha; Sharma, Ashok

    2009-07-27

    Despite considerable efforts to date, DNA motif prediction in whole genome remains a challenge for researchers. Currently the genome wide motif prediction tools required either direct pattern sequence (for single motif) or weight matrix (for multiple motifs). Although there are known motif pattern databases and tools for genome level prediction but no tool for weight matrix construction. Considering this, we developed a D-MATRIX tool which predicts the different types of weight matrix based on user defined aligned motif sequence set and motif width. For retrieval of known motif sequences user can access the commonly used databases such as TFD, RegulonDB, DBTBS, Transfac. D-MATRIX program uses a simple statistical approach for weight matrix construction, which can be converted into different file formats according to user requirement. It provides the possibility to identify the conserved motifs in the co-regulated genes or whole genome. As example, we successfully constructed the weight matrix of LexA transcription factor binding site with the help of known sos-box cis-regulatory elements in Deinococcus radiodurans genome. The algorithm is implemented in C-Sharp and wrapped in ASP.Net to maintain a user friendly web interface. D-MATRIX tool is accessible through the CIMAP domain network. http://203.190.147.116/dmatrix/

  9. Divergent evolutionary rates in vertebrate and mammalian specific conserved non-coding elements (CNEs) in echolocating mammals.

    PubMed

    Davies, Kalina T J; Tsagkogeorga, Georgia; Rossiter, Stephen J

    2014-12-19

    The majority of DNA contained within vertebrate genomes is non-coding, with a certain proportion of this thought to play regulatory roles during development. Conserved Non-coding Elements (CNEs) are an abundant group of putative regulatory sequences that are highly conserved across divergent groups and thus assumed to be under strong selective constraint. Many CNEs may contain regulatory factor binding sites, and their frequent spatial association with key developmental genes - such as those regulating sensory system development - suggests crucial roles in regulating gene expression and cellular patterning. Yet surprisingly little is known about the molecular evolution of CNEs across diverse mammalian taxa or their role in specific phenotypic adaptations. We examined 3,110 vertebrate-specific and ~82,000 mammalian-specific CNEs across 19 and 9 mammalian orders respectively, and tested for changes in the rate of evolution of CNEs located in the proximity of genes underlying the development or functioning of auditory systems. As we focused on CNEs putatively associated with genes underlying the development/functioning of auditory systems, we incorporated echolocating taxa in our dataset because of their highly specialised and derived auditory systems. Phylogenetic reconstructions of concatenated CNEs broadly recovered accepted mammal relationships despite high levels of sequence conservation. We found that CNE substitution rates were highest in rodents and lowest in primates, consistent with previous findings. Comparisons of CNE substitution rates from several genomic regions containing genes linked to auditory system development and hearing revealed differences between echolocating and non-echolocating taxa. Wider taxonomic sampling of four CNEs associated with the homeobox genes Hmx2 and Hmx3 - which are required for inner ear development - revealed family-wise variation across diverse bat species. Specifically within one family of echolocating bats that utilise frequency-modulated echolocation calls varying widely in frequency and intensity high levels of sequence divergence were found. Levels of selective constraint acting on CNEs differed both across genomic locations and taxa, with observed variation in substitution rates of CNEs among bat species. More work is needed to determine whether this variation can be linked to echolocation, and wider taxonomic sampling is necessary to fully document levels of conservation in CNEs across diverse taxa.

  10. Inferring Selective Constraint from Population Genomic Data Suggests Recent Regulatory Turnover in the Human Brain

    PubMed Central

    Schrider, Daniel R.; Kern, Andrew D.

    2015-01-01

    The comparative genomics revolution of the past decade has enabled the discovery of functional elements in the human genome via sequence comparison. While that is so, an important class of elements, those specific to humans, is entirely missed by searching for sequence conservation across species. Here we present an analysis based on variation data among human genomes that utilizes a supervised machine learning approach for the identification of human-specific purifying selection in the genome. Using only allele frequency information from the complete low-coverage 1000 Genomes Project data set in conjunction with a support vector machine trained from known functional and nonfunctional portions of the genome, we are able to accurately identify portions of the genome constrained by purifying selection. Our method identifies previously known human-specific gains or losses of function and uncovers many novel candidates. Candidate targets for gain and loss of function along the human lineage include numerous putative regulatory regions of genes essential for normal development of the central nervous system, including a significant enrichment of gain of function events near neurotransmitter receptor genes. These results are consistent with regulatory turnover being a key mechanism in the evolution of human-specific characteristics of brain development. Finally, we show that the majority of the genome is unconstrained by natural selection currently, in agreement with what has been estimated from phylogenetic methods but in sharp contrast to estimates based on transcriptomics or other high-throughput functional methods. PMID:26590212

  11. Protection of CpG islands from DNA methylation is DNA-encoded and evolutionarily conserved.

    PubMed

    Long, Hannah K; King, Hamish W; Patient, Roger K; Odom, Duncan T; Klose, Robert J

    2016-08-19

    DNA methylation is a repressive epigenetic modification that covers vertebrate genomes. Regions known as CpG islands (CGIs), which are refractory to DNA methylation, are often associated with gene promoters and play central roles in gene regulation. Yet how CGIs in their normal genomic context evade the DNA methylation machinery and whether these mechanisms are evolutionarily conserved remains enigmatic. To address these fundamental questions we exploited a transchromosomic animal model and genomic approaches to understand how the hypomethylated state is formed in vivo and to discover whether mechanisms governing CGI formation are evolutionarily conserved. Strikingly, insertion of a human chromosome into mouse revealed that promoter-associated CGIs are refractory to DNA methylation regardless of host species, demonstrating that DNA sequence plays a central role in specifying the hypomethylated state through evolutionarily conserved mechanisms. In contrast, elements distal to gene promoters exhibited more variable methylation between host species, uncovering a widespread dependence on nucleotide frequency and occupancy of DNA-binding transcription factors in shaping the DNA methylation landscape away from gene promoters. This was exemplified by young CpG rich lineage-restricted repeat sequences that evaded DNA methylation in the absence of co-evolved mechanisms targeting methylation to these sequences, and species specific DNA binding events that protected against DNA methylation in CpG poor regions. Finally, transplantation of mouse chromosomal fragments into the evolutionarily distant zebrafish uncovered the existence of a mechanistically conserved and DNA-encoded logic which shapes CGI formation across vertebrate species. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.

  12. Ultraconserved elements are associated with homeostatic control of splicing regulators by alternative splicing and nonsense-mediated decay

    PubMed Central

    Ni, Julie Z.; Grate, Leslie; Donohue, John Paul; Preston, Christine; Nobida, Naomi; O’Brien, Georgeann; Shiue, Lily; Clark, Tyson A.; Blume, John E.; Ares, Manuel

    2007-01-01

    Many alternative splicing events create RNAs with premature stop codons, suggesting that alternative splicing coupled with nonsense-mediated decay (AS-NMD) may regulate gene expression post-transcriptionally. We tested this idea in mice by blocking NMD and measuring changes in isoform representation using splicing-sensitive microarrays. We found a striking class of highly conserved stop codon-containing exons whose inclusion renders the transcript sensitive to NMD. A genomic search for additional examples identified >50 such exons in genes with a variety of functions. These exons are unusually frequent in genes that encode splicing activators and are unexpectedly enriched in the so-called “ultraconserved” elements in the mammalian lineage. Further analysis show that NMD of mRNAs for splicing activators such as SR proteins is triggered by splicing activation events, whereas NMD of the mRNAs for negatively acting hnRNP proteins is triggered by splicing repression, a polarity consistent with widespread homeostatic control of splicing regulator gene expression. We suggest that the extreme genomic conservation surrounding these regulatory splicing events within splicing factor genes demonstrates the evolutionary importance of maintaining tightly tuned homeostasis of RNA-binding protein levels in the vertebrate cell. PMID:17369403

  13. Genomicus update 2015: KaryoView and MatrixView provide a genome-wide perspective to multispecies comparative genomics

    PubMed Central

    Louis, Alexandra; Nguyen, Nga Thi Thuy; Muffato, Matthieu; Roest Crollius, Hugues

    2015-01-01

    The Genomicus web server (http://www.genomicus.biologie.ens.fr/genomicus) is a visualization tool allowing comparative genomics in four different phyla (Vertebrate, Fungi, Metazoan and Plants). It provides access to genomic information from extant species, as well as ancestral gene content and gene order for vertebrates and flowering plants. Here we present the new features available for vertebrate genome with a focus on new graphical tools. The interface to enter the database has been improved, two pairwise genome comparison tools are now available (KaryoView and MatrixView) and the multiple genome comparison tools (PhyloView and AlignView) propose three new kinds of representation and a more intuitive menu. These new developments have been implemented for Genomicus portal dedicated to vertebrates. This allows the analysis of 68 extant animal genomes, as well as 58 ancestral reconstructed genomes. The Genomicus server also provides access to ancestral gene orders, to facilitate evolutionary and comparative genomics studies, as well as computationally predicted regulatory interactions, thanks to the representation of conserved non-coding elements with their putative gene targets. PMID:25378326

  14. Genomic assessment of the evolution of the prion protein gene family in vertebrates.

    PubMed

    Harrison, Paul M; Khachane, Amit; Kumar, Manish

    2010-05-01

    Prion diseases are devastating neurological disorders caused by the propagation of particles containing an alternative beta-sheet-rich form of the prion protein (PrP). Genes paralogous to PrP, called Doppel and Shadoo, have been identified, that also have neuropathological relevance. To aid in the further functional characterization of PrP and its relatives, we annotated completely the PrP gene family (PrP-GF), in the genomes of 42 vertebrates, through combined strategic application of gene prediction programs and advanced remote homology detection techniques (such as HMMs, PSI-TBLASTN and pGenThreader). We have uncovered several previously undescribed paralogous genes and pseudogenes. We find that current high-quality genomic evidence indicates that the PrP relative Doppel, was likely present in the last common ancestor of present-day Tetrapoda, but was lost in the bird lineage, since its divergence from reptiles. Using the new gene annotations, we have defined the consensus of structural features that are characteristic of the PrP and Doppel structures, across diverse Tetrapoda clades. Furthermore, we describe in detail a transcribed pseudogene derived from Shadoo that is conserved across primates, and that overlaps the meiosis gene, SYCE1, thus possibly regulating its expression. In addition, we analysed the locus of PRNP/PRND for significant conservation across the genomic DNA of eleven mammals, and determined the phylogenetic penetration of non-coding exons. The genomic evidence indicates that the second PRNP non-coding exon found in even-toed ungulates and rodents, is conserved in all high-coverage genome assemblies of primates (human, chimp, orang utan and macaque), and is, at least, likely to have fallen out of use during primate speciation. Furthermore, we have demonstrated that the PRNT gene (at the PRNP human locus) is conserved across at least sixteen mammals, and evolves like a long non-coding RNA, fashioned from fragments of ancient, long, interspersed elements. These annotations and evolutionary analyses will be of further use for functional characterisation of the PrP-GF, and will be updatable in a semi-automated fashion as more genomes accumulate. Copyright 2010 Elsevier Inc. All rights reserved.

  15. Compactness of viral genomes: effect of disperse and localized random mutations

    NASA Astrophysics Data System (ADS)

    Lošdorfer Božič, Anže; Micheletti, Cristian; Podgornik, Rudolf; Tubiana, Luca

    2018-02-01

    Genomes of single-stranded RNA viruses have evolved to optimize several concurrent properties. One of them is the architecture of their genomic folds, which must not only feature precise structural elements at specific positions, but also allow for overall spatial compactness. The latter was shown to be disrupted by random synonymous mutations, a disruption which can consequently negatively affect genome encapsidation. In this study, we use three mutation schemes with different degrees of locality to mutate the genomes of phage MS2 and Brome Mosaic virus in order to understand the observed sensitivity of the global compactness of their folds. We find that mutating local stretches of their genomes’ sequence or structure is less disruptive to their compactness compared to inducing randomly-distributed mutations. Our findings are indicative of a mechanism for the conservation of compactness acting on a global scale of the genomes, and have several implications for understanding the interplay between local and global architecture of viral RNA genomes.

  16. DCODE.ORG Anthology of Comparative Genomic Tools

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Loots, G G; Ovcharenko, I

    2005-01-11

    Comparative genomics provides the means to demarcate functional regions in anonymous DNA sequences. The successful application of this method to identifying novel genes is currently shifting to deciphering the noncoding encryption of gene regulation across genomes. To facilitate the use of comparative genomics to practical applications in genetics and genomics we have developed several analytical and visualization tools for the analysis of arbitrary sequences and whole genomes. These tools include two alignment tools: zPicture and Mulan; a phylogenetic shadowing tool: eShadow for identifying lineage- and species-specific functional elements; two evolutionary conserved transcription factor analysis tools: rVista and multiTF; a toolmore » for extracting cis-regulatory modules governing the expression of co-regulated genes, CREME; and a dynamic portal to multiple vertebrate and invertebrate genome alignments, the ECR Browser. Here we briefly describe each one of these tools and provide specific examples on their practical applications. All the tools are publicly available at the http://www.dcode.org/ web site.« less

  17. A map of human microRNA variation uncovers unexpectedly high levels of variability

    PubMed Central

    2012-01-01

    Background MicroRNAs (miRNAs) are key components of the gene regulatory network in many species. During the past few years, these regulatory elements have been shown to be involved in an increasing number and range of diseases. Consequently, the compilation of a comprehensive map of natural variability in a healthy population seems an obvious requirement for future research on miRNA-related pathologies. Methods Data on 14 populations from the 1000 Genomes Project were analyzed, along with new data extracted from 60 exomes of healthy individuals from a population from southern Spain, sequenced in the context of the Medical Genome Project, to derive an accurate map of miRNA variability. Results Despite the common belief that miRNAs are highly conserved elements, analysis of the sequences of the 1,152 individuals indicated that the observed level of variability is double what was expected. A total of 527 variants were found. Among these, 45 variants affected the recognition region of the corresponding miRNA and were found in 43 different miRNAs, 26 of which are known to be involved in 57 diseases. Different parts of the mature structure of the miRNA were affected to different degrees by variants, which suggests the existence of a selective pressure related to the relative functional impact of the change. Moreover, 41 variants showed a significant deviation from the Hardy-Weinberg equilibrium, which supports the existence of a selective process against some alleles. The average number of variants per individual in miRNAs was 28. Conclusions Despite an expectation that miRNAs would be highly conserved genomic elements, our study reports a level of variability comparable to that observed for coding genes. PMID:22906193

  18. Repeated divergent selection on pigmentation genes in a rapid finch radiation

    PubMed Central

    Campagna, Leonardo; Repenning, Márcio; Silveira, Luís Fábio; Fontana, Carla Suertegaray; Tubaro, Pablo L.; Lovette, Irby J.

    2017-01-01

    Instances of recent and rapid speciation are suitable for associating phenotypes with their causal genotypes, especially if gene flow homogenizes areas of the genome that are not under divergent selection. We study a rapid radiation of nine sympatric bird species known as capuchino seedeaters, which are differentiated in sexually selected characters of male plumage and song. We sequenced the genomes of a phenotypically diverse set of species to search for differentiated genomic regions. Capuchinos show differences in a small proportion of their genomes, yet selection has acted independently on the same targets in different members of this radiation. Many divergent regions contain genes involved in the melanogenesis pathway, with the strongest signal originating from putative regulatory regions. Selection has acted on these same genomic regions in different lineages, likely shaping the evolution of cis-regulatory elements, which control how more conserved genes are expressed and thereby generate diversity in classically sexually selected traits. PMID:28560331

  19. Comparative Sequence Analysis of the X-Inactivation Center Region in Mouse, Human, and Bovine

    PubMed Central

    Chureau, Corinne; Prissette, Marine; Bourdet, Agnès; Barbe, Valérie; Cattolico, Laurence; Jones, Louis; Eggen, André; Avner, Philip; Duret, Laurent

    2002-01-01

    We have sequenced to high levels of accuracy 714-kb and 233-kb regions of the mouse and bovine X-inactivation centers (Xic), respectively, centered on the Xist gene. This has provided the basis for a fully annotated comparative analysis of the mouse Xic with the 2.3-Mb orthologous region in human and has allowed a three-way species comparison of the core central region, including the Xist gene. These comparisons have revealed conserved genes, both coding and noncoding, conserved CpG islands and, more surprisingly, conserved pseudogenes. The distribution of repeated elements, especially LINE repeats, in the mouse Xic region when compared to the rest of the genome does not support the hypothesis of a role for these repeat elements in the spreading of X inactivation. Interestingly, an asymmetric distribution of LINE elements on the two DNA strands was observed in the three species, not only within introns but also in intergenic regions. This feature is suggestive of important transcriptional activity within these intergenic regions. In silico prediction followed by experimental analysis has allowed four new genes, Cnbp2, Ftx, Jpx, and Ppnx, to be identified and novel, widespread, complex, and apparently noncoding transcriptional activity to be characterized in a region 5′ of Xist that was recently shown to attract histone modification early after the onset of X inactivation. [The sequence data described in this paper have been submitted to the EMBL data library under accession nos. AJ421478, AJ421479, AJ421480, and AJ421481. Online supplemental data are available at http://pbil.univ-lyon1.fr/datasets/Xic2002/data.html and www.genome.org.] PMID:12045143

  20. Gapless genome assembly of Colletotrichum higginsianum reveals chromosome structure and association of transposable elements with secondary metabolite gene clusters.

    PubMed

    Dallery, Jean-Félix; Lapalu, Nicolas; Zampounis, Antonios; Pigné, Sandrine; Luyten, Isabelle; Amselem, Joëlle; Wittenberg, Alexander H J; Zhou, Shiguo; de Queiroz, Marisa V; Robin, Guillaume P; Auger, Annie; Hainaut, Matthieu; Henrissat, Bernard; Kim, Ki-Tae; Lee, Yong-Hwan; Lespinet, Olivier; Schwartz, David C; Thon, Michael R; O'Connell, Richard J

    2017-08-29

    The ascomycete fungus Colletotrichum higginsianum causes anthracnose disease of brassica crops and the model plant Arabidopsis thaliana. Previous versions of the genome sequence were highly fragmented, causing errors in the prediction of protein-coding genes and preventing the analysis of repetitive sequences and genome architecture. Here, we re-sequenced the genome using single-molecule real-time (SMRT) sequencing technology and, in combination with optical map data, this provided a gapless assembly of all twelve chromosomes except for the ribosomal DNA repeat cluster on chromosome 7. The more accurate gene annotation made possible by this new assembly revealed a large repertoire of secondary metabolism (SM) key genes (89) and putative biosynthetic pathways (77 SM gene clusters). The two mini-chromosomes differed from the ten core chromosomes in being repeat- and AT-rich and gene-poor but were significantly enriched with genes encoding putative secreted effector proteins. Transposable elements (TEs) were found to occupy 7% of the genome by length. Certain TE families showed a statistically significant association with effector genes and SM cluster genes and were transcriptionally active at particular stages of fungal development. All 24 subtelomeres were found to contain one of three highly-conserved repeat elements which, by providing sites for homologous recombination, were probably instrumental in four segmental duplications. The gapless genome of C. higginsianum provides access to repeat-rich regions that were previously poorly assembled, notably the mini-chromosomes and subtelomeres, and allowed prediction of the complete SM gene repertoire. It also provides insights into the potential role of TEs in gene and genome evolution and host adaptation in this asexual pathogen.

  1. Mutation in a primate-conserved retrotransposon reveals a noncoding RNA as a mediator of infantile encephalopathy

    PubMed Central

    Cartault, François; Munier, Patrick; Benko, Edgar; Desguerre, Isabelle; Hanein, Sylvain; Boddaert, Nathalie; Bandiera, Simonetta; Vellayoudom, Jeanine; Krejbich-Trotot, Pascale; Bintner, Marc; Hoarau, Jean-Jacques; Girard, Muriel; Génin, Emmanuelle; de Lonlay, Pascale; Fourmaintraux, Alain; Naville, Magali; Rodriguez, Diana; Feingold, Josué; Renouil, Michel; Munnich, Arnold; Westhof, Eric; Fähling, Michael; Lyonnet, Stanislas; Henrion-Caude, Alexandra

    2012-01-01

    The human genome is densely populated with transposons and transposon-like repetitive elements. Although the impact of these transposons and elements on human genome evolution is recognized, the significance of subtle variations in their sequence remains mostly unexplored. Here we report homozygosity mapping of an infantile neurodegenerative disease locus in a genetic isolate. Complete DNA sequencing of the 400-kb linkage locus revealed a point mutation in a primate-specific retrotransposon that was transcribed as part of a unique noncoding RNA, which was expressed in the brain. In vitro knockdown of this RNA increased neuronal apoptosis, consistent with the inappropriate dosage of this RNA in vivo and with the phenotype. Moreover, structural analysis of the sequence revealed a small RNA-like hairpin that was consistent with the putative gain of a functional site when mutated. We show here that a mutation in a unique transposable element-containing RNA is associated with lethal encephalopathy, and we suggest that RNAs that harbor evolutionarily recent repetitive elements may play important roles in human brain development. PMID:22411793

  2. Staphylococcal SCCmec elements encode an active MCM-like helicase and thus may be replicative

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Mir-Sanchis, Ignacio; Roman, Christina A.; Misiura, Agnieszka

    2016-08-29

    Methicillin-resistant Staphylococcus aureus (MRSA) is a public-health threat worldwide. Although the mobile genomic island responsible for this phenotype, staphylococcal cassette chromosome (SCC), has been thought to be nonreplicative, we predicted DNA-replication-related functions for some of the conserved proteins encoded by SCC. We show that one of these, Cch, is homologous to the self-loading initiator helicases of an unrelated family of genomic islands, that it is an active 3'-to-5' helicase and that the adjacent ORF encodes a single-stranded DNA–binding protein. Our 2.9-Å crystal structure of intact Cch shows that it forms a hexameric ring. Cch, like the archaeal and eukaryotic MCM-familymore » replicative helicases, belongs to the pre–sensor II insert clade of AAA+ ATPases. Additionally, we found that SCC elements are part of a broader family of mobile elements, all of which encode a replication initiator upstream of their recombinases. Replication after excision would enhance the efficiency of horizontal gene transfer.« less

  3. Complete sequence determination of a novel reptile iridovirus isolated from soft-shelled turtle and evolutionary analysis of Iridoviridae

    PubMed Central

    Huang, Youhua; Huang, Xiaohong; Liu, Hong; Gong, Jie; Ouyang, Zhengliang; Cui, Huachun; Cao, Jianhao; Zhao, Yingtao; Wang, Xiujie; Jiang, Yulin; Qin, Qiwei

    2009-01-01

    Background Soft-shelled turtle iridovirus (STIV) is the causative agent of severe systemic diseases in cultured soft-shelled turtles (Trionyx sinensis). To our knowledge, the only molecular information available on STIV mainly concerns the highly conserved STIV major capsid protein. The complete sequence of the STIV genome is not yet available. Therefore, determining the genome sequence of STIV and providing a detailed bioinformatic analysis of its genome content and evolution status will facilitate further understanding of the taxonomic elements of STIV and the molecular mechanisms of reptile iridovirus pathogenesis. Results We determined the complete nucleotide sequence of the STIV genome using 454 Life Science sequencing technology. The STIV genome is 105 890 bp in length with a base composition of 55.1% G+C. Computer assisted analysis revealed that the STIV genome contains 105 potential open reading frames (ORFs), which encode polypeptides ranging from 40 to 1,294 amino acids and 20 microRNA candidates. Among the putative proteins, 20 share homology with the ancestral proteins of the nuclear and cytoplasmic large DNA viruses (NCLDVs). Comparative genomic analysis showed that STIV has the highest degree of sequence conservation and a colinear arrangement of genes with frog virus 3 (FV3), followed by Tiger frog virus (TFV), Ambystoma tigrinum virus (ATV), Singapore grouper iridovirus (SGIV), Grouper iridovirus (GIV) and other iridovirus isolates. Phylogenetic analysis based on conserved core genes and complete genome sequence of STIV with other virus genomes was performed. Moreover, analysis of the gene gain-and-loss events in the family Iridoviridae suggested that the genes encoded by iridoviruses have evolved for favoring adaptation to different natural host species. Conclusion This study has provided the complete genome sequence of STIV. Phylogenetic analysis suggested that STIV and FV3 are strains of the same viral species belonging to the Ranavirus genus in the Iridoviridae family. Given virus-host co-evolution and the phylogenetic relationship among vertebrates from fish to reptiles, we propose that iridovirus might transmit between reptiles and amphibians and that STIV and FV3 are strains of the same viral species in the Ranavirus genus. PMID:19439104

  4. Complete Genome Analysis of Thermus parvatiensis and Comparative Genomics of Thermus spp. Provide Insights into Genetic Variability and Evolution of Natural Competence as Strategic Survival Attributes

    PubMed Central

    Tripathi, Charu; Mishra, Harshita; Khurana, Himani; Dwivedi, Vatsala; Kamra, Komal; Negi, Ram K.; Lal, Rup

    2017-01-01

    Thermophilic environments represent an interesting niche. Among thermophiles, the genus Thermus is among the most studied genera. In this study, we have sequenced the genome of Thermus parvatiensis strain RL, a thermophile isolated from Himalayan hot water springs (temperature >96°C) using PacBio RSII SMRT technique. The small genome (2.01 Mbp) comprises a chromosome (1.87 Mbp) and a plasmid (143 Kbp), designated in this study as pTP143. Annotation revealed a high number of repair genes, a squeezed genome but containing highly plastic plasmid with transposases, integrases, mobile elements and hypothetical proteins (44%). We performed a comparative genomic study of the group Thermus with an aim of analysing the phylogenetic relatedness as well as niche specific attributes prevalent among the group. We compared the reference genome RL with 16 Thermus genomes to assess their phylogenetic relationships based on 16S rRNA gene sequences, average nucleotide identity (ANI), conserved marker genes (31 and 400), pan genome and tetranucleotide frequency. The core genome of the analyzed genomes contained 1,177 core genes and many singleton genes were detected in individual genomes, reflecting a conserved core but adaptive pan repertoire. We demonstrated the presence of metagenomic islands (chromosome:5, plasmid:5) by recruiting raw metagenomic data (from the same niche) against the genomic replicons of T. parvatiensis. We also dissected the CRISPR loci wide all genomes and found widespread presence of this system across Thermus genomes. Additionally, we performed a comparative analysis of competence loci wide Thermus genomes and found evidence for recent horizontal acquisition of the locus and continued dispersal among members reflecting that natural competence is a beneficial survival trait among Thermus members and its acquisition depicts unending evolution in order to accomplish optimal fitness. PMID:28798737

  5. Comparison of Ultra-Conserved Elements in Drosophilids and Vertebrates

    PubMed Central

    Makunin, Igor V.; Shloma, Viktor V.; Stephen, Stuart J.; Pheasant, Michael; Belyakin, Stepan N.

    2013-01-01

    Metazoan genomes contain many ultra-conserved elements (UCEs), long sequences identical between distant species. In this study we identified UCEs in drosophilid and vertebrate species with a similar level of phylogenetic divergence measured at protein-coding regions, and demonstrated that both the length and number of UCEs are larger in vertebrates. The proportion of non-exonic UCEs declines in distant drosophilids whilst an opposite trend was observed in vertebrates. We generated a set of 2,126 Sophophora UCEs by merging elements identified in several drosophila species and compared these to the eutherian UCEs identified in placental mammals. In contrast to vertebrates, the Sophophora UCEs are depleted around transcription start sites. Analysis of 52,954 P-element, piggyBac and Minos insertions in the D. melanogaster genome revealed depletion of the P-element and piggyBac insertions in and around the Sophophora UCEs. We examined eleven fly strains with transposon insertions into the intergenic UCEs and identified associated phenotypes in five strains. Four insertions behave as recessive lethals, and in one case we observed a suppression of the marker gene within the transgene, presumably by silenced chromatin around the integration site. To confirm the lethality is caused by integration of transposons we performed a phenotype rescue experiment for two stocks and demonstrated that the excision of the transposons from the intergenic UCEs restores viability. Sequencing of DNA after the transposon excision in one fly strain with the restored viability revealed a 47 bp insertion at the original transposon integration site suggesting that the nature of the mutation is important for the appearance of the phenotype. Our results suggest that the UCEs in flies and vertebrates have both common and distinct features, and demonstrate that a significant proportion of intergenic drosophila UCEs are sensitive to disruption. PMID:24349264

  6. Identification of a high frequency transposon induced by tissue culture, nDaiZ, a member of the hAT family in rice.

    PubMed

    Huang, Jian; Zhang, Kewei; Shen, Yi; Huang, Zejun; Li, Ming; Tang, Ding; Gu, Minghong; Cheng, Zhukuan

    2009-03-01

    Recent completion of rice genome sequencing has revealed that more than 40% of its genome consists of repetitive sequences, and most of them are related to inactive transposable elements. In the present study, a transposable element, nDaiZ0, which is induced by tissue culture with high frequency, was identified by sequence analysis of an allelic line of the golden hull and internode 2 (gh2) mutant, which was integrated into the forth exon of GH2. The 528-bp nDaiZ0 has 14-bp terminal inverted repeats (TIRs), and generates an 8-bp duplication of its target sites (TSD) during its mobilization. nDaiZs are non-autonomous transposons and have no coding capacity. Bioinformatics analysis and southern blot hybridization showed that at least 16 copies of nDaiZ elements exist in the japonica cultivar Nipponbare genome and 11 copies in the indica cultivar 93-11 genome. During tissue culture, only one copy, nDaiZ9, located on chromosome 5 in the genome of Nipponbare can be activated with its transposable frequency reaching 30%. However, nDaiZ9 was not present in the 93-11 genome. The larger elements, DaiZs, were further identified by database searching using nDaiZ0 as a query because they share similar TIRs and subterminal sequences. DaiZ can also generate an 8-bp TSD. DaiZ elements contain a conserved region with a high similarity to the hAT dimerization motif, suggesting that the nDaiZ-DaiZ transposon system probably belongs to the hAT superfamily of class II transposons. Phylogenetic analysis indicated that it is a new type of plant hAT-like transposon. Although nDaiZ is activated by tissue culture, the high transposable frequency indicates that it could become a useful gene tagging system for rice functional genomic studies. In addition, the mechanism of the high transposable ability of nDaiZ9 is discussed.

  7. De Novo Regulatory Motif Discovery Identifies Significant Motifs in Promoters of Five Classes of Plant Dehydrin Genes.

    PubMed

    Zolotarov, Yevgen; Strömvik, Martina

    2015-01-01

    Plants accumulate dehydrins in response to osmotic stresses. Dehydrins are divided into five different classes, which are thought to be regulated in different manners. To better understand differences in transcriptional regulation of the five dehydrin classes, de novo motif discovery was performed on 350 dehydrin promoter sequences from a total of 51 plant genomes. Overrepresented motifs were identified in the promoters of five dehydrin classes. The Kn dehydrin promoters contain motifs linked with meristem specific expression, as well as motifs linked with cold/dehydration and abscisic acid response. KS dehydrin promoters contain a motif with a GATA core. SKn and YnSKn dehydrin promoters contain motifs that match elements connected with cold/dehydration, abscisic acid and light response. YnKn dehydrin promoters contain motifs that match abscisic acid and light response elements, but not cold/dehydration response elements. Conserved promoter motifs are present in the dehydrin classes and across different plant lineages, indicating that dehydrin gene regulation is likely also conserved.

  8. Whole genome and transcriptome analyses of environmental antibiotic sensitive and multi-resistant Pseudomonas aeruginosa isolates exposed to waste water and tap water.

    PubMed

    Schwartz, Thomas; Armant, Olivier; Bretschneider, Nancy; Hahn, Alexander; Kirchen, Silke; Seifert, Martin; Dötsch, Andreas

    2015-01-01

    The fitness of sensitive and resistant Pseudomonas aeruginosa in different aquatic environments depends on genetic capacities and transcriptional regulation. Therefore, an antibiotic-sensitive isolate PA30 and a multi-resistant isolate PA49 originating from waste waters were compared via whole genome and transcriptome Illumina sequencing after exposure to municipal waste water and tap water. A number of different genomic islands (e.g. PAGIs, PAPIs) were identified in the two environmental isolates beside the highly conserved core genome. Exposure to tap water and waste water exhibited similar transcriptional impacts on several gene clusters (antibiotic and metal resistance, genetic mobile elements, efflux pumps) in both environmental P. aeruginosa isolates. The MexCD-OprJ efflux pump was overexpressed in PA49 in response to waste water. The expression of resistance genes, genetic mobile elements in PA49 was independent from the water matrix. Consistently, the antibiotic sensitive strain PA30 did not show any difference in expression of the intrinsic resistance determinants and genetic mobile elements. Thus, the exposure of both isolates to polluted waste water and oligotrophic tap water resulted in similar expression profiles of mentioned genes. However, changes in environmental milieus resulted in rather unspecific transcriptional responses than selected and stimuli-specific gene regulation. © 2014 The Authors. Microbial Biotechnology published by John Wiley & Sons Ltd and Society for Applied Microbiology.

  9. Primate-specific evolution of noncoding element insertion into PLA2G4C and human preterm birth

    PubMed Central

    2010-01-01

    Background The onset of birth in humans, like other apes, differs from non-primate mammals in its endocrine physiology. We hypothesize that higher primate-specific gene evolution may lead to these differences and target genes involved in human preterm birth, an area of global health significance. Methods We performed a comparative genomics screen of highly conserved noncoding elements and identified PLA2G4C, a phospholipase A isoform involved in prostaglandin biosynthesis as human accelerated. To examine whether this gene demonstrating primate-specific evolution was associated with birth timing, we genotyped and analyzed 8 common single nucleotide polymorphisms (SNPs) in PLA2G4C in US Hispanic (n = 73 preterm, 292 control), US White (n = 147 preterm, 157 control) and US Black (n = 79 preterm, 166 control) mothers. Results Detailed structural and phylogenic analysis of PLA2G4C suggested a short genomic element within the gene duplicated from a paralogous highly conserved element on chromosome 1 specifically in primates. SNPs rs8110925 and rs2307276 in US Hispanics and rs11564620 in US Whites were significant after correcting for multiple tests (p < 0.006). Additionally, rs11564620 (Thr360Pro) was associated with increased metabolite levels of the prostaglandin thromboxane in healthy individuals (p = 0.02), suggesting this variant may affect PLA2G4C activity. Conclusions Our findings suggest that variation in PLA2G4C may influence preterm birth risk by increasing levels of prostaglandins, which are known to regulate labor. PMID:21184677

  10. An Insulator Element Located at the Cyclin B1 Interacting Protein 1 Gene Locus Is Highly Conserved among Mammalian Species

    PubMed Central

    Yoshida, Wataru; Tomikawa, Junko; Inaki, Makoto; Kimura, Hiroshi; Onodera, Masafumi; Hata, Kenichiro; Nakabayashi, Kazuhiko

    2015-01-01

    Insulators are cis-elements that control the direction of enhancer and silencer activities (enhancer-blocking) and protect genes from silencing by heterochromatinization (barrier activity). Understanding insulators is critical to elucidate gene regulatory mechanisms at chromosomal domain levels. Here, we focused on a genomic region upstream of the mouse Ccnb1ip1 (cyclin B1 interacting protein 1) gene that was methylated in E9.5 embryos of the C57BL/6 strain, but unmethylated in those of the 129X1/SvJ and JF1/Ms strains. We hypothesized the existence of an insulator-type element that prevents the spread of DNA methylation within the 1.8 kbp segment, and actually identified a 242-bp and a 185-bp fragments that were located adjacent to each other and showed insulator and enhancer activities, respectively, in reporter assays. We designated these genomic regions as the Ccnb1ip1 insulator and the Ccnb1ip1 enhancer. The Ccnb1ip1 insulator showed enhancer-blocking activity in the luciferase assays and barrier activity in the colony formation assays. Further examination of the Ccnb1ip1 locus in other mammalian species revealed that the insulator and enhancer are highly conserved among a wide variety of species, and are located immediately upstream of the transcriptional start site of Ccnb1ip1. These newly identified cis-elements may be involved in transcriptional regulation of Ccnb1ip1, which is important in meiotic crossing-over and G2/M transition of the mitotic cell cycle. PMID:26110280

  11. ICAM-1-related long non-coding RNA: promoter analysis and expression in human retinal endothelial cells.

    PubMed

    Lumsden, Amanda L; Ma, Yuefang; Ashander, Liam M; Stempel, Andrew J; Keating, Damien J; Smith, Justine R; Appukuttan, Binoy

    2018-05-09

    Regulation of intercellular adhesion molecule (ICAM)-1 in retinal endothelial cells is a promising druggable target for retinal vascular diseases. The ICAM-1-related (ICR) long non-coding RNA stabilizes ICAM-1 transcript, increasing protein expression. However, studies of ICR involvement in disease have been limited as the promoter is uncharacterized. To address this issue, we undertook a comprehensive in silico analysis of the human ICR gene promoter region. We used genomic evolutionary rate profiling to identify a 115 base pair (bp) sequence within 500 bp upstream of the transcription start site of the annotated human ICR gene that was conserved across 25 eutherian genomes. A second constrained sequence upstream of the orthologous mouse gene (68 bp; conserved across 27 Eutherian genomes including human) was also discovered. Searching these elements identified 33 matrices predictive of binding sites for transcription factors known to be responsive to a broad range of pathological stimuli, including hypoxia, and metabolic and inflammatory proteins. Five phenotype-associated single nucleotide polymorphisms (SNPs) in the immediate vicinity of these elements included four SNPs (i.e. rs2569693, rs281439, rs281440 and rs11575074) predicted to impact binding motifs of transcription factors, and thus the expression of ICR and ICAM-1 genes, with potential to influence disease susceptibility. We verified that human retinal endothelial cells expressed ICR, and observed induction of expression by tumor necrosis factor-α.

  12. Reconstructing the complex evolutionary history of mobile plasmids in red algal genomes

    PubMed Central

    Lee, JunMo; Kim, Kyeong Mi; Yang, Eun Chan; Miller, Kathy Ann; Boo, Sung Min; Bhattacharya, Debashish; Yoon, Hwan Su

    2016-01-01

    The integration of foreign DNA into algal and plant plastid genomes is a rare event, with only a few known examples of horizontal gene transfer (HGT). Plasmids, which are well-studied drivers of HGT in prokaryotes, have been reported previously in red algae (Rhodophyta). However, the distribution of these mobile DNA elements and their sites of integration into the plastid (ptDNA), mitochondrial (mtDNA), and nuclear genomes of Rhodophyta remain unknown. Here we reconstructed the complex evolutionary history of plasmid-derived DNAs in red algae. Comparative analysis of 21 rhodophyte ptDNAs, including new genome data for 5 species, turned up 22 plasmid-derived open reading frames (ORFs) that showed syntenic and copy number variation among species, but were conserved within different individuals in three lineages. Several plasmid-derived homologs were found not only in ptDNA but also in mtDNA and in the nuclear genome of green plants, stramenopiles, and rhizarians. Phylogenetic and plasmid-derived ORF analyses showed that the majority of plasmid DNAs originated within red algae, whereas others were derived from cyanobacteria, other bacteria, and viruses. Our results elucidate the evolution of plasmid DNAs in red algae and suggest that they spread as parasitic genetic elements. This hypothesis is consistent with their sporadic distribution within Rhodophyta. PMID:27030297

  13. Evidence for a lineage of virulent bacteriophages that target Campylobacter.

    PubMed

    Timms, Andrew R; Cambray-Young, Joanna; Scott, Andrew E; Petty, Nicola K; Connerton, Phillippa L; Clarke, Louise; Seeger, Kathy; Quail, Mike; Cummings, Nicola; Maskell, Duncan J; Thomson, Nicholas R; Connerton, Ian F

    2010-03-30

    Our understanding of the dynamics of genome stability versus gene flux within bacteriophage lineages is limited. Recently, there has been a renewed interest in the use of bacteriophages as 'therapeutic' agents; a prerequisite for their use in such therapies is a thorough understanding of their genetic complement, genome stability and their ecology to avoid the dissemination or mobilisation of phage or bacterial virulence and toxin genes. Campylobacter, a food-borne pathogen, is one of the organisms for which the use of bacteriophage is being considered to reduce human exposure to this organism. Sequencing and genome analysis was performed for two Campylobacter bacteriophages. The genomes were extremely similar at the nucleotide level (> or = 96%) with most differences accounted for by novel insertion sequences, DNA methylases and an approximately 10 kb contiguous region of metabolic genes that were dissimilar at the sequence level but similar in gene function between the two phages. Both bacteriophages contained a large number of radical S-adenosylmethionine (SAM) genes, presumably involved in boosting host metabolism during infection, as well as evidence that many genes had been acquired from a wide range of bacterial species. Further bacteriophages, from the UK Campylobacter typing set, were screened for the presence of bacteriophage structural genes, DNA methylases, mobile genetic elements and regulatory genes identified from the genome sequences. The results indicate that many of these bacteriophages are related, with 10 out of 15 showing some relationship to the sequenced genomes. Two large virulent Campylobacter bacteriophages were found to show very high levels of sequence conservation despite separation in time and place of isolation. The bacteriophages show adaptations to their host and possess genes that may enhance Campylobacter metabolism, potentially advantaging both the bacteriophage and its host. Genetic conservation has been shown to extend to other Campylobacter bacteriophages, forming a highly conserved lineage of bacteriophages that predate upon campylobacters and indicating that highly adapted bacteriophage genomes can be stable over prolonged periods of time.

  14. Identification of small ORFs in vertebrates using ribosome footprinting and evolutionary conservation

    PubMed Central

    Bazzini, Ariel A; Johnstone, Timothy G; Christiano, Romain; Mackowiak, Sebastian D; Obermayer, Benedikt; Fleming, Elizabeth S; Vejnar, Charles E; Lee, Miler T; Rajewsky, Nikolaus; Walther, Tobias C; Giraldez, Antonio J

    2014-01-01

    Identification of the coding elements in the genome is a fundamental step to understanding the building blocks of living systems. Short peptides (< 100 aa) have emerged as important regulators of development and physiology, but their identification has been limited by their size. We have leveraged the periodicity of ribosome movement on the mRNA to define actively translated ORFs by ribosome footprinting. This approach identifies several hundred translated small ORFs in zebrafish and human. Computational prediction of small ORFs from codon conservation patterns corroborates and extends these findings and identifies conserved sequences in zebrafish and human, suggesting functional peptide products (micropeptides). These results identify micropeptide-encoding genes in vertebrates, providing an entry point to define their function in vivo. PMID:24705786

  15. Dcode.org anthology of comparative genomic tools.

    PubMed

    Loots, Gabriela G; Ovcharenko, Ivan

    2005-07-01

    Comparative genomics provides the means to demarcate functional regions in anonymous DNA sequences. The successful application of this method to identifying novel genes is currently shifting to deciphering the non-coding encryption of gene regulation across genomes. To facilitate the practical application of comparative sequence analysis to genetics and genomics, we have developed several analytical and visualization tools for the analysis of arbitrary sequences and whole genomes. These tools include two alignment tools, zPicture and Mulan; a phylogenetic shadowing tool, eShadow for identifying lineage- and species-specific functional elements; two evolutionary conserved transcription factor analysis tools, rVista and multiTF; a tool for extracting cis-regulatory modules governing the expression of co-regulated genes, Creme 2.0; and a dynamic portal to multiple vertebrate and invertebrate genome alignments, the ECR Browser. Here, we briefly describe each one of these tools and provide specific examples on their practical applications. All the tools are publicly available at the http://www.dcode.org/ website.

  16. The pineapple genome and the evolution of CAM photosynthesis.

    PubMed

    Ming, Ray; VanBuren, Robert; Wai, Ching Man; Tang, Haibao; Schatz, Michael C; Bowers, John E; Lyons, Eric; Wang, Ming-Li; Chen, Jung; Biggers, Eric; Zhang, Jisen; Huang, Lixian; Zhang, Lingmao; Miao, Wenjing; Zhang, Jian; Ye, Zhangyao; Miao, Chenyong; Lin, Zhicong; Wang, Hao; Zhou, Hongye; Yim, Won C; Priest, Henry D; Zheng, Chunfang; Woodhouse, Margaret; Edger, Patrick P; Guyot, Romain; Guo, Hao-Bo; Guo, Hong; Zheng, Guangyong; Singh, Ratnesh; Sharma, Anupma; Min, Xiangjia; Zheng, Yun; Lee, Hayan; Gurtowski, James; Sedlazeck, Fritz J; Harkess, Alex; McKain, Michael R; Liao, Zhenyang; Fang, Jingping; Liu, Juan; Zhang, Xiaodan; Zhang, Qing; Hu, Weichang; Qin, Yuan; Wang, Kai; Chen, Li-Yu; Shirley, Neil; Lin, Yann-Rong; Liu, Li-Yu; Hernandez, Alvaro G; Wright, Chris L; Bulone, Vincent; Tuskan, Gerald A; Heath, Katy; Zee, Francis; Moore, Paul H; Sunkar, Ramanjulu; Leebens-Mack, James H; Mockler, Todd; Bennetzen, Jeffrey L; Freeling, Michael; Sankoff, David; Paterson, Andrew H; Zhu, Xinguang; Yang, Xiaohan; Smith, J Andrew C; Cushman, John C; Paull, Robert E; Yu, Qingyi

    2015-12-01

    Pineapple (Ananas comosus (L.) Merr.) is the most economically valuable crop possessing crassulacean acid metabolism (CAM), a photosynthetic carbon assimilation pathway with high water-use efficiency, and the second most important tropical fruit. We sequenced the genomes of pineapple varieties F153 and MD2 and a wild pineapple relative, Ananas bracteatus accession CB5. The pineapple genome has one fewer ancient whole-genome duplication event than sequenced grass genomes and a conserved karyotype with seven chromosomes from before the ρ duplication event. The pineapple lineage has transitioned from C3 photosynthesis to CAM, with CAM-related genes exhibiting a diel expression pattern in photosynthetic tissues. CAM pathway genes were enriched with cis-regulatory elements associated with the regulation of circadian clock genes, providing the first cis-regulatory link between CAM and circadian clock regulation. Pineapple CAM photosynthesis evolved by the reconfiguration of pathways in C3 plants, through the regulatory neofunctionalization of preexisting genes and not through the acquisition of neofunctionalized genes via whole-genome or tandem gene duplication.

  17. Mobile Element Evolution Playing Jigsaw - SINEs in Gastropod and Bivalve Mollusks.

    PubMed

    Matetovici, Irina; Sajgo, Szilard; Ianc, Bianca; Ochis, Cornelia; Bulzu, Paul; Popescu, Octavian; Damert, Annette

    2016-01-06

    SINEs (Short INterspersed Elements) are widely distributed among eukaryotes. Some SINE families are organized in superfamilies characterized by a shared central domain. These central domains are conserved across species, classes, and even phyla. Here we report the identification of two novel such superfamilies in the genomes of gastropod and bivalve mollusks. The central conserved domain of the first superfamily is present in SINEs in Caenogastropoda and Vetigastropoda as well as in all four subclasses of Bivalvia. We designated the domain MESC (Romanian for MElc-snail and SCoica-mussel) because it appears to be restricted to snails and mussels. The second superfamily is restricted to Caenogastropoda. Its central conserved domain-Snail-is related to the Nin-DC domain. Furthermore, we provide evidence that a 40-bp subdomain of the SINE V-domain is conserved in SINEs in mollusks and arthropods. It is predicted to form a stable stem-loop structure that is preserved in the context of the overall SINE RNA secondary structure in invertebrates. Our analysis also recovered short retrotransposons with a Long INterspersed Element (LINE)-derived 5' end. These share the body and/or the tail with transfer RNA (tRNA)-derived SINEs within and across species. Finally, we identified CORE SINEs in gastropods and bivalves-extending the distribution range of this superfamily. © The Author(s) 2016. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.

  18. Low Frequency Variants, Collapsed Based on Biological Knowledge, Uncover Complexity of Population Stratification in 1000 Genomes Project Data

    PubMed Central

    Moore, Carrie B.; Wallace, John R.; Wolfe, Daniel J.; Frase, Alex T.; Pendergrass, Sarah A.; Weiss, Kenneth M.; Ritchie, Marylyn D.

    2013-01-01

    Analyses investigating low frequency variants have the potential for explaining additional genetic heritability of many complex human traits. However, the natural frequencies of rare variation between human populations strongly confound genetic analyses. We have applied a novel collapsing method to identify biological features with low frequency variant burden differences in thirteen populations sequenced by the 1000 Genomes Project. Our flexible collapsing tool utilizes expert biological knowledge from multiple publicly available database sources to direct feature selection. Variants were collapsed according to genetically driven features, such as evolutionary conserved regions, regulatory regions genes, and pathways. We have conducted an extensive comparison of low frequency variant burden differences (MAF<0.03) between populations from 1000 Genomes Project Phase I data. We found that on average 26.87% of gene bins, 35.47% of intergenic bins, 42.85% of pathway bins, 14.86% of ORegAnno regulatory bins, and 5.97% of evolutionary conserved regions show statistically significant differences in low frequency variant burden across populations from the 1000 Genomes Project. The proportion of bins with significant differences in low frequency burden depends on the ancestral similarity of the two populations compared and types of features tested. Even closely related populations had notable differences in low frequency burden, but fewer differences than populations from different continents. Furthermore, conserved or functionally relevant regions had fewer significant differences in low frequency burden than regions under less evolutionary constraint. This degree of low frequency variant differentiation across diverse populations and feature elements highlights the critical importance of considering population stratification in the new era of DNA sequencing and low frequency variant genomic analyses. PMID:24385916

  19. Unit-length line-1 transcripts in human teratocarcinoma cells.

    PubMed Central

    Skowronski, J; Fanning, T G; Singer, M F

    1988-01-01

    We have characterized the approximately 6.5-kilobase cytoplasmic poly(A)+ Line-1 (L1) RNA present in a human teratocarcinoma cell line, NTera2D1, by primer extension and by analysis of cloned cDNAs. The bulk of the RNA begins (5' end) at the residue previously identified as the 5' terminus of the longest known primate genomic L1 elements, presumed to represent "unit" length. Several of the cDNA clones are close to 6 kilobase pairs, that is, close to full length. The partial sequences of 18 cDNA clones and full sequence of one (5,975 base pairs) indicate that many different genomic L1 elements contribute transcripts to the 6.5-kilobase cytoplasmic poly(A)+ RNA in NTera2D1 cells because no 2 of the 19 cDNAs analyzed had identical sequences. The transcribed elements appear to represent a subset of the total genomic L1s, a subset that has a characteristic consensus sequence in the 3' noncoding region and a high degree of sequence conservation throughout. Two open reading frames (ORFs) of 1,122 (ORF1) and 3,852 (ORF2) bases, flanked by about 800 and 200 bases of sequence at the 5' and 3' ends, respectively, can be identified in the cDNAs. Both ORFs are in the same frame, and they are separated by 33 bases bracketed by two conserved in-frame stop codons. ORF 2 is interrupted by at least one randomly positioned stop codon in the majority of the cDNAs. The data support proposals suggesting that the human L1 family includes one or more functional genes as well as an extraordinarily large number of pseudogenes whose ORFs are broken by stop codons. The cDNA structures suggest that both genes and pseudogenes are transcribed. At least one of the cDNAs (cD11), which was sequenced in its entirety, could, in principle, represent an mRNA for production of the ORF1 polypeptide. The similarity of mammalian L1s to several recently described invertebrate movable elements defines a new widely distributed class of elements which we term class II retrotransposons. Images PMID:2454389

  20. Small Open Reading Frames, Non-Coding RNAs and Repetitive Elements in Bradyrhizobium japonicum USDA 110

    PubMed Central

    Hahn, Julia; Tsoy, Olga V.; Thalmann, Sebastian; Čuklina, Jelena; Gelfand, Mikhail S.

    2016-01-01

    Small open reading frames (sORFs) and genes for non-coding RNAs are poorly investigated components of most genomes. Our analysis of 1391 ORFs recently annotated in the soybean symbiont Bradyrhizobium japonicum USDA 110 revealed that 78% of them contain less than 80 codons. Twenty-one of these sORFs are conserved in or outside Alphaproteobacteria and most of them are similar to genes found in transposable elements, in line with their broad distribution. Stabilizing selection was demonstrated for sORFs with proteomic evidence and bll1319_ISGA which is conserved at the nucleotide level in 16 alphaproteobacterial species, 79 species from other taxa and 49 other Proteobacteria. Further we used Northern blot hybridization to validate ten small RNAs (BjsR1 to BjsR10) belonging to new RNA families. We found that BjsR1 and BjsR3 have homologs outside the genus Bradyrhizobium, and BjsR5, BjsR6, BjsR7, and BjsR10 have up to four imperfect copies in Bradyrhizobium genomes. BjsR8, BjsR9, and BjsR10 are present exclusively in nodules, while the other sRNAs are also expressed in liquid cultures. We also found that the level of BjsR4 decreases after exposure to tellurite and iron, and this down-regulation contributes to survival under high iron conditions. Analysis of additional small RNAs overlapping with 3’-UTRs revealed two new repetitive elements named Br-REP1 and Br-REP2. These REP elements may play roles in the genomic plasticity and gene regulation and could be useful for strain identification by PCR-fingerprinting. Furthermore, we studied two potential toxin genes in the symbiotic island and confirmed toxicity of the yhaV homolog bll1687 but not of the newly annotated higB homolog blr0229_ISGA in E. coli. Finally, we revealed transcription interference resulting in an antisense RNA complementary to blr1853, a gene induced in symbiosis. The presented results expand our knowledge on sORFs, non-coding RNAs and repetitive elements in B. japonicum and related bacteria. PMID:27788207

  1. Antisense transcription is pervasive but rarely conserved in enteric bacteria.

    PubMed

    Raghavan, Rahul; Sloan, Daniel B; Ochman, Howard

    2012-01-01

    Noncoding RNAs, including antisense RNAs (asRNAs) that originate from the complementary strand of protein-coding genes, are involved in the regulation of gene expression in all domains of life. Recent application of deep-sequencing technologies has revealed that the transcription of asRNAs occurs genome-wide in bacteria. Although the role of the vast majority of asRNAs remains unknown, it is often assumed that their presence implies important regulatory functions, similar to those of other noncoding RNAs. Alternatively, many antisense transcripts may be produced by chance transcription events from promoter-like sequences that result from the degenerate nature of bacterial transcription factor binding sites. To investigate the biological relevance of antisense transcripts, we compared genome-wide patterns of asRNA expression in closely related enteric bacteria, Escherichia coli and Salmonella enterica serovar Typhimurium, by performing strand-specific transcriptome sequencing. Although antisense transcripts are abundant in both species, less than 3% of asRNAs are expressed at high levels in both species, and only about 14% appear to be conserved among species. And unlike the promoters of protein-coding genes, asRNA promoters show no evidence of sequence conservation between, or even within, species. Our findings suggest that many or even most bacterial asRNAs are nonadaptive by-products of the cell's transcription machinery. IMPORTANCE Application of high-throughput methods has revealed the expression throughout bacterial genomes of transcripts encoded on the strand complementary to protein-coding genes. Because transcription is costly, it is usually assumed that these transcripts, termed antisense RNAs (asRNAs), serve some function; however, the role of most asRNAs is unclear, raising questions about their relevance in cellular processes. Because natural selection conserves functional elements, comparisons between related species provide a method for assessing functionality genome-wide. Applying such an approach, we assayed all transcripts in two closely related bacteria, Escherichia coli and Salmonella enterica serovar Typhimurium, and demonstrate that, although the levels of genome-wide antisense transcription are similarly high in both bacteria, only a small fraction of asRNAs are shared across species. Moreover, the promoters associated with asRNAs show no evidence of sequence conservation between, or even within, species. These findings indicate that despite the genome-wide transcription of asRNAs, many of these transcripts are likely nonfunctional.

  2. Rapid Evolution of Virulence and Drug Resistance in the Emerging Zoonotic Pathogen Streptococcus suis

    PubMed Central

    Holden, Matthew T. G.; Hauser, Heidi; Sanders, Mandy; Ngo, Thi Hoa; Cherevach, Inna; Cronin, Ann; Goodhead, Ian; Mungall, Karen; Quail, Michael A.; Price, Claire; Rabbinowitsch, Ester; Sharp, Sarah; Croucher, Nicholas J.; Chieu, Tran Bich; Thi Hoang Mai, Nguyen; Diep, To Song; Chinh, Nguyen Tran; Kehoe, Michael; Leigh, James A.; Ward, Philip N.; Dowson, Christopher G.; Whatmore, Adrian M.; Chanter, Neil; Iversen, Pernille; Gottschalk, Marcelo; Slater, Josh D.; Smith, Hilde E.; Spratt, Brian G.; Xu, Jianguo; Ye, Changyun; Bentley, Stephen; Barrell, Barclay G.; Schultsz, Constance; Maskell, Duncan J.; Parkhill, Julian

    2009-01-01

    Background Streptococcus suis is a zoonotic pathogen that infects pigs and can occasionally cause serious infections in humans. S. suis infections occur sporadically in human Europe and North America, but a recent major outbreak has been described in China with high levels of mortality. The mechanisms of S. suis pathogenesis in humans and pigs are poorly understood. Methodology/Principal Findings The sequencing of whole genomes of S. suis isolates provides opportunities to investigate the genetic basis of infection. Here we describe whole genome sequences of three S. suis strains from the same lineage: one from European pigs, and two from human cases from China and Vietnam. Comparative genomic analysis was used to investigate the variability of these strains. S. suis is phylogenetically distinct from other Streptococcus species for which genome sequences are currently available. Accordingly, ∼40% of the ∼2 Mb genome is unique in comparison to other Streptococcus species. Finer genomic comparisons within the species showed a high level of sequence conservation; virtually all of the genome is common to the S. suis strains. The only exceptions are three ∼90 kb regions, present in the two isolates from humans, composed of integrative conjugative elements and transposons. Carried in these regions are coding sequences associated with drug resistance. In addition, small-scale sequence variation has generated pseudogenes in putative virulence and colonization factors. Conclusions/Significance The genomic inventories of genetically related S. suis strains, isolated from distinct hosts and diseases, exhibit high levels of conservation. However, the genomes provide evidence that horizontal gene transfer has contributed to the evolution of drug resistance. PMID:19603075

  3. A decade of human genome project conclusion: Scientific diffusion about our genome knowledge.

    PubMed

    Moraes, Fernanda; Góes, Andréa

    2016-05-06

    The Human Genome Project (HGP) was initiated in 1990 and completed in 2003. It aimed to sequence the whole human genome. Although it represented an advance in understanding the human genome and its complexity, many questions remained unanswered. Other projects were launched in order to unravel the mysteries of our genome, including the ENCyclopedia of DNA Elements (ENCODE). This review aims to analyze the evolution of scientific knowledge related to both the HGP and ENCODE projects. Data were retrieved from scientific articles published in 1990-2014, a period comprising the development and the 10 years following the HGP completion. The fact that only 20,000 genes are protein and RNA-coding is one of the most striking HGP results. A new concept about the organization of genome arose. The ENCODE project was initiated in 2003 and targeted to map the functional elements of the human genome. This project revealed that the human genome is pervasively transcribed. Therefore, it was determined that a large part of the non-protein coding regions are functional. Finally, a more sophisticated view of chromatin structure emerged. The mechanistic functioning of the genome has been redrafted, revealing a much more complex picture. Besides, a gene-centric conception of the organism has to be reviewed. A number of criticisms have emerged against the ENCODE project approaches, raising the question of whether non-conserved but biochemically active regions are truly functional. Thus, HGP and ENCODE projects accomplished a great map of the human genome, but the data generated still requires further in depth analysis. © 2016 by The International Union of Biochemistry and Molecular Biology, 44:215-223, 2016. © 2016 The International Union of Biochemistry and Molecular Biology.

  4. Inhibition of flavivirus infections by antisense oligomers specifically suppressing viral translation and RNA replication.

    PubMed

    Deas, Tia S; Binduga-Gajewska, Iwona; Tilgner, Mark; Ren, Ping; Stein, David A; Moulton, Hong M; Iversen, Patrick L; Kauffman, Elizabeth B; Kramer, Laura D; Shi, Pei-Yong

    2005-04-01

    RNA elements within flavivirus genomes are potential targets for antiviral therapy. A panel of phosphorodiamidate morpholino oligomers (PMOs), whose sequences are complementary to RNA elements located in the 5'- and 3'-termini of the West Nile (WN) virus genome, were designed to anneal to important cis-acting elements and potentially to inhibit WN infection. A novel Arg-rich peptide was conjugated to each PMO for efficient cellular delivery. These PMOs exhibited various degrees of antiviral activity upon incubation with a WN virus luciferase-replicon-containing cell line. Among them, PMOs targeting the 5'-terminal 20 nucleotides (5'End) or targeting the 3'-terminal element involved in a potential genome cyclizing interaction (3'CSI) exhibited the greatest potency. When cells infected with an epidemic strain of WN virus were treated with the 5'End or 3'CSI PMO, virus titers were reduced by approximately 5 to 6 logs at a 5 muM concentration without apparent cytotoxicity. The 3'CSI PMO also inhibited mosquito-borne flaviviruses other than WN virus, and the antiviral potency correlated with the conservation of the targeted 3'CSI sequences of specific viruses. Mode-of-action analyses showed that the 5'End and 3'CSI PMOs suppressed viral infection through two distinct mechanisms. The 5'End PMO inhibited viral translation, whereas the 3'CSI PMO did not significantly affect viral translation but suppressed RNA replication. The results suggest that antisense PMO-mediated blocking of cis-acting elements of flavivirus genomes can potentially be developed into an anti-flavivirus therapy. In addition, we report that although a full-length WN virus containing a luciferase reporter (engineered at the 3' untranslated region of the genome) is not stable, an early passage of this reporting virus can be used to screen for inhibitors against any step of the virus life cycle.

  5. Low-pass sequencing for microbial comparative genomics

    PubMed Central

    Goo, Young Ah; Roach, Jared; Glusman, Gustavo; Baliga, Nitin S; Deutsch, Kerry; Pan, Min; Kennedy, Sean; DasSarma, Shiladitya; Victor Ng, Wailap; Hood, Leroy

    2004-01-01

    Background We studied four extremely halophilic archaea by low-pass shotgun sequencing: (1) the metabolically versatile Haloarcula marismortui; (2) the non-pigmented Natrialba asiatica; (3) the psychrophile Halorubrum lacusprofundi and (4) the Dead Sea isolate Halobaculum gomorrense. Approximately one thousand single pass genomic sequences per genome were obtained. The data were analyzed by comparative genomic analyses using the completed Halobacterium sp. NRC-1 genome as a reference. Low-pass shotgun sequencing is a simple, inexpensive, and rapid approach that can readily be performed on any cultured microbe. Results As expected, the four archaeal halophiles analyzed exhibit both bacterial and eukaryotic characteristics as well as uniquely archaeal traits. All five halophiles exhibit greater than sixty percent GC content and low isoelectric points (pI) for their predicted proteins. Multiple insertion sequence (IS) elements, often involved in genome rearrangements, were identified in H. lacusprofundi and H. marismortui. The core biological functions that govern cellular and genetic mechanisms of H. sp. NRC-1 appear to be conserved in these four other halophiles. Multiple TATA box binding protein (TBP) and transcription factor IIB (TFB) homologs were identified from most of the four shotgunned halophiles. The reconstructed molecular tree of all five halophiles shows a large divergence between these species, but with the closest relationship being between H. sp. NRC-1 and H. lacusprofundi. Conclusion Despite the diverse habitats of these species, all five halophiles share (1) high GC content and (2) low protein isoelectric points, which are characteristics associated with environmental exposure to UV radiation and hypersalinity, respectively. Identification of multiple IS elements in the genome of H. lacusprofundi and H. marismortui suggest that genome structure and dynamic genome reorganization might be similar to that previously observed in the IS-element rich genome of H. sp. NRC-1. Identification of multiple TBP and TFB homologs in these four halophiles are consistent with the hypothesis that different types of complex transcriptional regulation may occur through multiple TBP-TFB combinations in response to rapidly changing environmental conditions. Low-pass shotgun sequence analyses of genomes permit extensive and diverse analyses, and should be generally useful for comparative microbial genomics. PMID:14718067

  6. Transposon fingerprinting using low coverage whole genome shotgun sequencing in cacao (Theobroma cacao L.) and related species.

    PubMed

    Sveinsson, Saemundur; Gill, Navdeep; Kane, Nolan C; Cronk, Quentin

    2013-07-24

    Transposable elements (TEs) and other repetitive elements are a large and dynamically evolving part of eukaryotic genomes, especially in plants where they can account for a significant proportion of genome size. Their dynamic nature gives them the potential for use in identifying and characterizing crop germplasm. However, their repetitive nature makes them challenging to study using conventional methods of molecular biology. Next generation sequencing and new computational tools have greatly facilitated the investigation of TE variation within species and among closely related species. (i) We generated low-coverage Illumina whole genome shotgun sequencing reads for multiple individuals of cacao (Theobroma cacao) and related species. These reads were analysed using both an alignment/mapping approach and a de novo (graph based clustering) approach. (ii) A standard set of ultra-conserved orthologous sequences (UCOS) standardized TE data between samples and provided phylogenetic information on the relatedness of samples. (iii) The mapping approach proved highly effective within the reference species but underestimated TE abundance in interspecific comparisons relative to the de novo methods. (iv) Individual T. cacao accessions have unique patterns of TE abundance indicating that the TE composition of the genome is evolving actively within this species. (v) LTR/Gypsy elements are the most abundant, comprising c.10% of the genome. (vi) Within T. cacao the retroelement families show an order of magnitude greater sequence variability than the DNA transposon families. (vii) Theobroma grandiflorum has a similar TE composition to T. cacao, but the related genus Herrania is rather different, with LTRs making up a lower proportion of the genome, perhaps because of a massive presence (c. 20%) of distinctive low complexity satellite-like repeats in this genome. (i) Short read alignment/mapping to reference TE contigs provides a simple and effective method of investigating intraspecific differences in TE composition. It is not appropriate for comparing repetitive elements across the species boundaries, for which de novo methods are more appropriate. (ii) Individual T. cacao accessions have unique spectra of TE composition indicating active evolution of TE abundance within this species. TE patterns could potentially be used as a "fingerprint" to identify and characterize cacao accessions.

  7. [Genome-scale sequence data processing and epigenetic analysis of DNA methylation].

    PubMed

    Wang, Ting-Zhang; Shan, Gao; Xu, Jian-Hong; Xue, Qing-Zhong

    2013-06-01

    A new approach recently developed for detecting cytosine DNA methylation (mC) and analyzing the genome-scale DNA methylation profiling, is called BS-Seq which is based on bisulfite conversion of genomic DNA combined with next-generation sequencing. The method can not only provide an insight into the difference of genome-scale DNA methylation among different organisms, but also reveal the conservation of DNA methylation in all contexts and nucleotide preference for different genomic regions, including genes, exons, and repetitive DNA sequences. It will be helpful to under-stand the epigenetic impacts of cytosine DNA methylation on the regulation of gene expression and maintaining silence of repetitive sequences, such as transposable elements. In this paper, we introduce the preprocessing steps of DNA methylation data, by which cytosine (C) and guanine (G) in the reference sequence are transferred to thymine (T) and adenine (A), and cytosine in reads is transferred to thymine, respectively. We also comprehensively review the main content of the DNA methylation analysis on the genomic scale: (1) the cytosine methylation under the context of different sequences; (2) the distribution of genomic methylcytosine; (3) DNA methylation context and the preference for the nucleotides; (4) DNA- protein interaction sites of DNA methylation; (5) degree of methylation of cytosine in the different structural elements of genes. DNA methylation analysis technique provides a powerful tool for the epigenome study in human and other species, and genes and environment interaction, and founds the theoretical basis for further development of disease diagnostics and therapeutics in human.

  8. How Malleable is the Eukaryotic Genome? Extreme Rate of Chromosomal Rearrangement in the Genus Drosophila

    PubMed Central

    Ranz, José María; Casals, Ferran; Ruiz, Alfredo

    2001-01-01

    During the evolution of the genus Drosophila, the molecular organization of the major chromosomal elements has been repeatedly rearranged via the fixation of paracentric inversions. Little detailed information is available, however, on the extent and effect of these changes at the molecular level. In principle, a full description of the rate and pattern of change could reveal the limits, if any, to which the eukaryotic genome can accommodate reorganizations. We have constructed a high-density physical map of the largest chromosomal element in Drosophila repleta (chromosome 2) and compared the order and distances between the markers with those on the homologous chromosomal element (3R) in Drosophila melanogaster. The two species belong to different subgenera (Drosophila and Sophophora, respectively), which diverged 40–62 million years (Myr) ago and represent, thus, the farthest lineages within the Drosophila genus. The comparison reveals extensive reshuffling of gene order from centromere to telomere. Using a maximum likelihood method, we estimate that 114 ± 14 paracentric inversions have been fixed in this chromosomal element since the divergence of the two species, that is, 0.9–1.4 inversions fixed per Myr. Comparison with available rates of chromosomal evolution, taking into account genome size, indicates that the Drosophila genome shows the highest rate found so far in any eukaryote. Twenty-one small segments (23–599 kb) comprising at least two independent (nonoverlapping) markers appear to be conserved between D. melanogaster and D. repleta. These results are consistent with the random breakage model and do not provide significant evidence of functional constraint of any kind. They support the notion that the Drosophila genome is extraordinarily malleable and has a modular organization. The high rate of chromosomal change also suggests a very limited transferability of the positional information from the Drosophila genome to other insects. [The sequence data described in this paper have been submitted to the GenBank data library under accession no, AF319441.] PMID:11157786

  9. Identification and characterisation of Short Interspersed Nuclear Elements in the olive tree (Olea europaea L.) genome.

    PubMed

    Barghini, Elena; Mascagni, Flavia; Natali, Lucia; Giordani, Tommaso; Cavallini, Andrea

    2017-02-01

    Short Interspersed Nuclear Elements (SINEs) are nonautonomous retrotransposons in the genome of most eukaryotic species. While SINEs have been intensively investigated in humans and other animal systems, SINE identification has been carried out only in a limited number of plant species. This lack of information is apparent especially in non-model plants whose genome has not been sequenced yet. The aim of this work was to produce a specific bioinformatics pipeline for analysing second generation sequence reads of a non-model species and identifying SINEs. We have identified, for the first time, 227 putative SINEs of the olive tree (Olea europaea), that constitute one of the few sets of such sequences in dicotyledonous species. The identified SINEs ranged from 140 to 362 bp in length and were characterised with regard to the occurrence of the tRNA domain in their sequence. The majority of identified elements resulted in single copy or very lowly repeated, often in association with genic sequences. Analysis of sequence similarity allowed us to identify two major groups of SINEs showing different abundances in the olive tree genome, the former with sequence similarity to SINEs of Scrophulariaceae and Solanaceae and the latter to SINEs of Salicaceae. A comparison of sequence conservation between olive SINEs and LTR retrotransposon families suggested that SINE expansion in the genome occurred especially in very ancient times, before LTR retrotransposon expansion, and presumably before the separation of the rosids (to which Oleaceae belong) from the Asterids. Besides providing data on olive SINEs, our results demonstrate the suitability of the pipeline employed for SINE identification. Applying this pipeline will favour further structural and functional analyses on these relatively unknown elements to be performed also in other plant species, even in the absence of a reference genome, and will allow establishing general evolutionary patterns for this kind of repeats in plants.

  10. Concerted copy number variation balances ribosomal DNA dosage in human and mouse genomes

    PubMed Central

    Gibbons, John G.; Branco, Alan T.; Godinho, Susana A.; Yu, Shoukai; Lemos, Bernardo

    2015-01-01

    Tandemly repeated ribosomal DNA (rDNA) arrays are among the most evolutionary dynamic loci of eukaryotic genomes. The loci code for essential cellular components, yet exhibit extensive copy number (CN) variation within and between species. CN might be partly determined by the requirement of dosage balance between the 5S and 45S rDNA arrays. The arrays are nonhomologous, physically unlinked in mammals, and encode functionally interdependent RNA components of the ribosome. Here we show that the 5S and 45S rDNA arrays exhibit concerted CN variation (cCNV). Despite 5S and 45S rDNA elements residing on different chromosomes and lacking sequence similarity, cCNV between these loci is strong, evolutionarily conserved in humans and mice, and manifested across individual genotypes in natural populations and pedigrees. Finally, we observe that bisphenol A induces rapid and parallel modulation of 5S and 45S rDNA CN. Our observations reveal a novel mode of genome variation, indicate that natural selection contributed to the evolution and conservation of cCNV, and support the hypothesis that 5S CN is partly determined by the requirement of dosage balance with the 45S rDNA array. We suggest that human disease variation might be traced to disrupted rDNA dosage balance in the genome. PMID:25583482

  11. Comparative Analysis of the Mitochondrial Genomes of Callitettixini Spittlebugs (Hemiptera: Cercopidae) Confirms the Overall High Evolutionary Speed of the AT-Rich Region but Reveals the Presence of Short Conservative Elements at the Tribal Level

    PubMed Central

    Liu, Jie; Bu, Cuiping; Wipfler, Benjamin; Liang, Aiping

    2014-01-01

    The present study compares the mitochondrial genomes of five species of the spittlebug tribe Callitettixini (Hemiptera: Cercopoidea: Cercopidae) from eastern Asia. All genomes of the five species sequenced are circular double-stranded DNA molecules and range from 15,222 to 15,637 bp in length. They contain 22 tRNA genes, 13 protein coding genes (PCGs) and 2 rRNA genes and share the putative ancestral gene arrangement of insects. The PCGs show an extreme bias of nucleotide and amino acid composition. Significant differences of the substitution rates among the different genes as well as the different codon position of each PCG are revealed by the comparative evolutionary analyses. The substitution speeds of the first and second codon position of different PCGs are negatively correlated with their GC content. Among the five species, the AT-rich region features great differences in length and pattern and generally shows a 2–5 times higher substitution rate than the fastest PCG in the mitochondrial genome, atp8. Despite the significant variability in length, short conservative segments were identified in the AT-rich region within Callitettixini, although absent from the other groups of the spittlebug superfamily Cercopoidea. PMID:25285442

  12. Genome-wide localization of exosome components to active promoters and chromatin insulators in Drosophila

    PubMed Central

    Lim, Su Jun; Boyle, Patrick J.; Chinen, Madoka; Dale, Ryan K.; Lei, Elissa P.

    2013-01-01

    Chromatin insulators are functionally conserved DNA–protein complexes situated throughout the genome that organize independent transcriptional domains. Previous work implicated RNA as an important cofactor in chromatin insulator activity, although the precise mechanisms are not yet understood. Here we identify the exosome, the highly conserved major cellular 3′ to 5′ RNA degradation machinery, as a physical interactor of CP190-dependent chromatin insulator complexes in Drosophila. Genome-wide profiling of exosome by ChIP-seq in two different embryonic cell lines reveals extensive and specific overlap with the CP190, BEAF-32 and CTCF insulator proteins. Colocalization occurs mainly at promoters but also boundary elements such as Mcp, Fab-8, scs and scs′, which overlaps with a promoter. Surprisingly, exosome associates primarily with promoters but not gene bodies of active genes, arguing against simple cotranscriptional recruitment to RNA substrates. Similar to insulator proteins, exosome is also significantly enriched at divergently transcribed promoters. Directed ChIP of exosome in cell lines depleted of insulator proteins shows that CTCF is required specifically for exosome association at Mcp and Fab-8 but not other sites, suggesting that alternate mechanisms must also contribute to exosome chromatin recruitment. Taken together, our results reveal a novel positive relationship between exosome and chromatin insulators throughout the genome. PMID:23358822

  13. The pineapple genome and the evolution of CAM photosynthesis

    PubMed Central

    Ming, Ray; VanBuren, Robert; Wai, Ching Man; Tang, Haibao; Schatz, Michael C.; Bowers, John E.; Lyons, Eric; Wang, Ming-Li; Chen, Jung; Biggers, Eric; Zhang, Jisen; Huang, Lixian; Zhang, Lingmao; Miao, Wenjing; Zhang, Jian; Ye, Zhangyao; Miao, Chenyong; Lin, Zhicong; Wang, Hao; Zhou, Hongye; Yim, Won C.; Priest, Henry D.; Zheng, Chunfang; Woodhouse, Margaret; Edger, Patrick P.; Guyot, Romain; Guo, Hao-Bo; Guo, Hong; Zheng, Guangyong; Singh, Ratnesh; Sharma, Anupma; Min, Xiangjia; Zheng, Yun; Lee, Hayan; Gurtowski, James; Sedlazeck, Fritz J.; Harkess, Alex; McKain, Michael R.; Liao, Zhenyang; Fang, Jingping; Liu, Juan; Zhang, Xiaodan; Zhang, Qing; Hu, Weichang; Qin, Yuan; Wang, Kai; Chen, Li-Yu; Shirley, Neil; Lin, Yann-Rong; Liu, Li-Yu; Hernandez, Alvaro G.; Wright, Chris L.; Bulone, Vincent; Tuskan, Gerald A.; Heath, Katy; Zee, Francis; Moore, Paul H.; Sunkar, Ramanjulu; Leebens-Mack, James H.; Mockler, Todd; Bennetzen, Jeffrey L.; Freeling, Michael; Sankoff, David; Paterson, Andrew H.; Zhu, Xinguang; Yang, Xiaohan; Smith, J. Andrew C.; Cushman, John C.; Paull, Robert E.; Yu, Qingyi

    2016-01-01

    Pineapple (Ananas comosus (L.) Merr.) is the most economically valuable crop possessing crassulacean acid metabolism (CAM), a photosynthetic carbon assimilation pathway with high water use efficiency, and the second most important tropical fruit after banana in terms of international trade. We sequenced the genomes of pineapple varieties ‘F153’ and ‘MD2’, and a wild pineapple relative A. bracteatus accession CB5. The pineapple genome has one fewer ancient whole genome duplications than sequenced grass genomes and, therefore, provides an important reference for elucidating gene content and structure in the last common ancestor of extant members of the grass family (Poaceae). Pineapple has a conserved karyotype with seven pre rho duplication chromosomes that are ancestral to extant grass karyotypes. The pineapple lineage has transitioned from C3 photosynthesis to CAM with CAM-related genes exhibiting a diel expression pattern in photosynthetic tissues using beta-carbonic anhydrase (βCA) for initial capture of CO2. Promoter regions of all three βCA genes contain a CCA1 binding site that can bind circadian core oscillators. CAM pathway genes were enriched with cis-regulatory elements including the morning (CCACAC) and evening (AAAATATC) elements associated with regulation of circadian-clock genes, providing the first link between CAM and the circadian clock regulation. Gene-interaction network analysis revealed both activation and repression of regulatory elements that control key enzymes in CAM photosynthesis, indicating that CAM evolved by reconfiguration of pathways preexisting in C3 plants. Pineapple CAM photosynthesis is the result of regulatory neofunctionalization of preexisting gene copies and not acquisition of neofunctionalized genes via whole genome or tandem gene duplication. PMID:26523774

  14. NAC transcription factor genes: genome-wide identification, phylogenetic, motif and cis-regulatory element analysis in pigeonpea (Cajanus cajan (L.) Millsp.).

    PubMed

    Satheesh, Viswanathan; Jagannadham, P Tej Kumar; Chidambaranathan, Parameswaran; Jain, P K; Srinivasan, R

    2014-12-01

    The NAC (NAM, ATAF and CUC) proteins are plant-specific transcription factors implicated in development and stress responses. In the present study 88 pigeonpea NAC genes were identified from the recently published draft genome of pigeonpea by using homology based and de novo prediction programmes. These sequences were further subjected to phylogenetic, motif and promoter analyses. In motif analysis, highly conserved motifs were identified in the NAC domain and also in the C-terminal region of the NAC proteins. A phylogenetic reconstruction using pigeonpea, Arabidopsis and soybean NAC genes revealed 33 putative stress-responsive pigeonpea NAC genes. Several stress-responsive cis-elements were identified through in silico analysis of the promoters of these putative stress-responsive genes. This analysis is the first report of NAC gene family in pigeonpea and will be useful for the identification and selection of candidate genes associated with stress tolerance.

  15. A diverse intrinsic antibiotic resistome from a cave bacterium.

    PubMed

    Pawlowski, Andrew C; Wang, Wenliang; Koteva, Kalinka; Barton, Hazel A; McArthur, Andrew G; Wright, Gerard D

    2016-12-08

    Antibiotic resistance is ancient and widespread in environmental bacteria. These are therefore reservoirs of resistance elements and reflective of the natural history of antibiotics and resistance. In a previous study, we discovered that multi-drug resistance is common in bacteria isolated from Lechuguilla Cave, an underground ecosystem that has been isolated from the surface for over 4 Myr. Here we use whole-genome sequencing, functional genomics and biochemical assays to reveal the intrinsic resistome of Paenibacillus sp. LC231, a cave bacterial isolate that is resistant to most clinically used antibiotics. We systematically link resistance phenotype to genotype and in doing so, identify 18 chromosomal resistance elements, including five determinants without characterized homologues and three mechanisms not previously shown to be involved in antibiotic resistance. A resistome comparison across related surface Paenibacillus affirms the conservation of resistance over millions of years and establishes the longevity of these genes in this genus.

  16. Genomicus update 2015: KaryoView and MatrixView provide a genome-wide perspective to multispecies comparative genomics.

    PubMed

    Louis, Alexandra; Nguyen, Nga Thi Thuy; Muffato, Matthieu; Roest Crollius, Hugues

    2015-01-01

    The Genomicus web server (http://www.genomicus.biologie.ens.fr/genomicus) is a visualization tool allowing comparative genomics in four different phyla (Vertebrate, Fungi, Metazoan and Plants). It provides access to genomic information from extant species, as well as ancestral gene content and gene order for vertebrates and flowering plants. Here we present the new features available for vertebrate genome with a focus on new graphical tools. The interface to enter the database has been improved, two pairwise genome comparison tools are now available (KaryoView and MatrixView) and the multiple genome comparison tools (PhyloView and AlignView) propose three new kinds of representation and a more intuitive menu. These new developments have been implemented for Genomicus portal dedicated to vertebrates. This allows the analysis of 68 extant animal genomes, as well as 58 ancestral reconstructed genomes. The Genomicus server also provides access to ancestral gene orders, to facilitate evolutionary and comparative genomics studies, as well as computationally predicted regulatory interactions, thanks to the representation of conserved non-coding elements with their putative gene targets. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.

  17. Genomics of bacteria and archaea: the emerging dynamic view of the prokaryotic world

    PubMed Central

    Koonin, Eugene V.; Wolf, Yuri I.

    2008-01-01

    The first bacterial genome was sequenced in 1995, and the first archaeal genome in 1996. Soon after these breakthroughs, an exponential rate of genome sequencing was established, with a doubling time of approximately 20 months for bacteria and approximately 34 months for archaea. Comparative analysis of the hundreds of sequenced bacterial and dozens of archaeal genomes leads to several generalizations on the principles of genome organization and evolution. A crucial finding that enables functional characterization of the sequenced genomes and evolutionary reconstruction is that the majority of archaeal and bacterial genes have conserved orthologs in other, often, distant organisms. However, comparative genomics also shows that horizontal gene transfer (HGT) is a dominant force of prokaryotic evolution, along with the loss of genetic material resulting in genome contraction. A crucial component of the prokaryotic world is the mobilome, the enormous collection of viruses, plasmids and other selfish elements, which are in constant exchange with more stable chromosomes and serve as HGT vehicles. Thus, the prokaryotic genome space is a tightly connected, although compartmentalized, network, a novel notion that undermines the ‘Tree of Life’ model of evolution and requires a new conceptual framework and tools for the study of prokaryotic evolution. PMID:18948295

  18. Guinea Pig ID-Like Families of SINEs

    PubMed Central

    Kass, David H.; Schaetz, Brian A.; Beitler, Lindsey; Bonney, Kevin M.; Jamison, Nicole; Wiesner, Cathy

    2009-01-01

    Previous studies have indicated a paucity of SINEs within the genomes of the guinea pig and nutria, representatives of the Hystricognathi suborder of rodents. More recent work has shown that the guinea pig genome contains a large number of B1 elements, expanding to various levels among different rodents. In this work we utilized A–B PCR and screened GenBank with sequences from isolated clones to identify potentially uncharacterized SINEs within the guinea pig genome, and identified numerous sequences with a high degree of similarity (>92%) specific to the guinea pig. The presence of A-tails and flanking direct repeats associated with these sequences supported the identification of a full-length SINE, with a consensus sequence notably distinct from other rodent SINEs. Although most similar to the ID SINE, it clearly was not derived from the known ID master gene (BC1), hence we refer to this element as guinea pig ID-like (GPIDL). Using the consensus to screen the guinea pig genomic database (Assembly CavPor2) with Ensembl BlastView, we estimated at least 100,000 copies, which contrasts markedly to just over 100 copies of ID elements. Additionally we provided evidence of recent integrations of GPIDL as two of seven analyzed conserved GPIDL-containing loci demonstrated presence/absence variants in Cavia porcellus and C. aperea. Using intra-IDL PCR and sequence analyses we also provide evidence that GPIDL is derived from a hystricognath-specific SINE family. These results demonstrate that this SINE family continues to contribute to the dynamics of genomes of hystricognath rodents. PMID:19232383

  19. Guinea pig ID-like families of SINEs.

    PubMed

    Kass, David H; Schaetz, Brian A; Beitler, Lindsey; Bonney, Kevin M; Jamison, Nicole; Wiesner, Cathy

    2009-05-01

    Previous studies have indicated a paucity of SINEs within the genomes of the guinea pig and nutria, representatives of the Hystricognathi suborder of rodents. More recent work has shown that the guinea pig genome contains a large number of B1 elements, expanding to various levels among different rodents. In this work we utilized A-B PCR and screened GenBank with sequences from isolated clones to identify potentially uncharacterized SINEs within the guinea pig genome, and identified numerous sequences with a high degree of similarity (>92%) specific to the guinea pig. The presence of A-tails and flanking direct repeats associated with these sequences supported the identification of a full-length SINE, with a consensus sequence notably distinct from other rodent SINEs. Although most similar to the ID SINE, it clearly was not derived from the known ID master gene (BC1), hence we refer to this element as guinea pig ID-like (GPIDL). Using the consensus to screen the guinea pig genomic database (Assembly CavPor2) with Ensembl BlastView, we estimated at least 100,000 copies, which contrasts markedly to just over 100 copies of ID elements. Additionally we provided evidence of recent integrations of GPIDL as two of seven analyzed conserved GPIDL-containing loci demonstrated presence/absence variants in Cavia porcellus and C. aperea. Using intra-IDL PCR and sequence analyses we also provide evidence that GPIDL is derived from a hystricognath-specific SINE family. These results demonstrate that this SINE family continues to contribute to the dynamics of genomes of hystricognath rodents.

  20. Enhancer Evolution across 20 Mammalian Species

    PubMed Central

    Villar, Diego; Berthelot, Camille; Aldridge, Sarah; Rayner, Tim F.; Lukk, Margus; Pignatelli, Miguel; Park, Thomas J.; Deaville, Robert; Erichsen, Jonathan T.; Jasinska, Anna J.; Turner, James M.A.; Bertelsen, Mads F.; Murchison, Elizabeth P.; Flicek, Paul; Odom, Duncan T.

    2015-01-01

    Summary The mammalian radiation has corresponded with rapid changes in noncoding regions of the genome, but we lack a comprehensive understanding of regulatory evolution in mammals. Here, we track the evolution of promoters and enhancers active in liver across 20 mammalian species from six diverse orders by profiling genomic enrichment of H3K27 acetylation and H3K4 trimethylation. We report that rapid evolution of enhancers is a universal feature of mammalian genomes. Most of the recently evolved enhancers arise from ancestral DNA exaptation, rather than lineage-specific expansions of repeat elements. In contrast, almost all liver promoters are partially or fully conserved across these species. Our data further reveal that recently evolved enhancers can be associated with genes under positive selection, demonstrating the power of this approach for annotating regulatory adaptations in genomic sequences. These results provide important insight into the functional genetics underpinning mammalian regulatory evolution. PMID:25635462

  1. Transcriptional activation is a conserved feature of the early embryonic factor Zelda that requires a cluster of four zinc fingers for DNA binding and a low-complexity activation domain.

    PubMed

    Hamm, Danielle C; Bondra, Eliana R; Harrison, Melissa M

    2015-02-06

    Delayed transcriptional activation of the zygotic genome is a nearly universal phenomenon in metazoans. Immediately following fertilization, development is controlled by maternally deposited products, and it is not until later stages that widespread activation of the zygotic genome occurs. Although the mechanisms driving this genome activation are currently unknown, the transcriptional activator Zelda (ZLD) has been shown to be instrumental in driving this process in Drosophila melanogaster. Here we define functional domains of ZLD required for both DNA binding and transcriptional activation. We show that the C-terminal cluster of four zinc fingers mediates binding to TAGteam DNA elements in the promoters of early expressed genes. All four zinc fingers are required for this activity, and splice isoforms lacking three of the four zinc fingers fail to activate transcription. These truncated splice isoforms dominantly suppress activation by the full-length, embryonically expressed isoform. We map the transcriptional activation domain of ZLD to a central region characterized by low complexity. Despite relatively little sequence conservation within this domain, ZLD orthologs from Drosophila virilis, Anopheles gambiae, and Nasonia vitripennis activate transcription in D. melanogaster cells. Transcriptional activation by these ZLD orthologs suggests that ZLD functions through conserved interactions with a protein cofactor(s). We have identified distinct DNA-binding and activation domains within the critical transcription factor ZLD that controls the initial activation of the zygotic genome. © 2015 by The American Society for Biochemistry and Molecular Biology, Inc.

  2. Trichodesmium genome maintains abundant, widespread noncoding DNA in situ, despite oligotrophic lifestyle

    DOE PAGES

    Walworth, Nathan; Pfreundt, Ulrike; Nelson, William C.; ...

    2015-03-23

    Understanding the evolution of the free-living, cyanobacterial, diazotroph Trichodesmium is of great importance because of its critical role in oceanic biogeochemistry and primary production. Unlike the other >150 available genomes of free-living cyanobacteria, only 63.8% of the Trichodesmium erythraeum (strain IMS101) genome is predicted to encode protein, which is 20–25% less than the average for other cyanobacteria and nonpathogenic, free-living bacteria. In this paper, we use distinctive isolates and metagenomic data to show that low coding density observed in IMS101 is a common feature of the Trichodesmium genus, both in culture and in situ. Transcriptome analysis indicates that 86% ofmore » the noncoding space is expressed, although the function of these transcripts is unclear. The density of noncoding, possible regulatory elements predicted in Trichodesmium, when normalized per intergenic kilobase, was comparable and twofold higher than that found in the gene-dense genomes of the sympatric cyanobacterial genera Synechococcus and Prochlorococcus, respectively. Conserved Trichodesmium noncoding RNA secondary structures were predicted between most culture and metagenomic sequences, lending support to the structural conservation. Conservation of these intergenic regions in spatiotemporally separated Trichodesmium populations suggests possible genus-wide selection for their maintenance. These large intergenic spacers may have developed during intervals of strong genetic drift caused by periodic blooms of a subset of genotypes, which may have reduced effective population size. Finally, our data suggest that transposition of selfish DNA, low effective population size, and high-fidelity replication allowed the unusual “inflation” of noncoding sequence observed in Trichodesmium despite its oligotrophic lifestyle.« less

  3. Trichodesmium genome maintains abundant, widespread noncoding DNA in situ, despite oligotrophic lifestyle

    DOE PAGES

    Walworth, Nathan G.; Pfreundt, Ulrike; Nelson, William C.; ...

    2015-04-07

    Understanding the evolution of the free-living, cyanobacterial, diazotroph Trichodesmium is of great importance due to its critical role in oceanic biogeochemistry and primary production. Unlike the other >150 available genomes of free-living cyanobacteria, only 63.8% of the Trichodesmium erythraeum (strain IMS101) genome is predicted to encode protein, which is 20-25% less than the average for other cyanobacteria and non-pathogenic, free-living bacteria. We use distinctive isolates and metagenomic data to show that low coding density observed in IMS101 is a common feature of the Trichodesmium genus both in culture and in situ. Transcriptome analysis indicates that 86% of the non-coding spacemore » is expressed, although the function of these transcripts is unclear. The density of noncoding, possible regulatory elements predicted in Trichodesmium, when normalized per intergenic kilobase, was comparable and two fold higher than that found in the gene dense genomes of the sympatric cyanobacterial genera Synechococcus and Prochlorococcus, respectively. Conserved Trichodesmium ncRNA secondary structures were predicted between most culture and metagenomic sequences lending support to the structural conservation. Conservation of these intergenic regions in spatiotemporally separated Trichodesmium populations suggests possible genus-wide selection for their maintenance. These large intergenic spacers may have developed during intervals of strong genetic drift caused by periodic blooms of a subset of genotypes, which may have reduced effective population size. Our data suggest that transposition of selfish DNA, low effective population size, and high fidelity replication allowed the unusual ‘inflation’ of noncoding sequence observed in Trichodesmium despite its oligotrophic lifestyle.« less

  4. Short interspersed element (SINE) depletion and long interspersed element (LINE) abundance are not features universally required for imprinting.

    PubMed

    Cowley, Michael; de Burca, Anna; McCole, Ruth B; Chahal, Mandeep; Saadat, Ghazal; Oakey, Rebecca J; Schulz, Reiner

    2011-04-20

    Genomic imprinting is a form of gene dosage regulation in which a gene is expressed from only one of the alleles, in a manner dependent on the parent of origin. The mechanisms governing imprinted gene expression have been investigated in detail and have greatly contributed to our understanding of genome regulation in general. Both DNA sequence features, such as CpG islands, and epigenetic features, such as DNA methylation and non-coding RNAs, play important roles in achieving imprinted expression. However, the relative importance of these factors varies depending on the locus in question. Defining the minimal features that are absolutely required for imprinting would help us to understand how imprinting has evolved mechanistically. Imprinted retrogenes are a subset of imprinted loci that are relatively simple in their genomic organisation, being distinct from large imprinting clusters, and have the potential to be used as tools to address this question. Here, we compare the repeat element content of imprinted retrogene loci with non-imprinted controls that have a similar locus organisation. We observe no significant differences that are conserved between mouse and human, suggesting that the paucity of SINEs and relative abundance of LINEs at imprinted loci reported by others is not a sequence feature universally required for imprinting.

  5. Genomics and the challenging translation into conservation practice

    Treesearch

    Aaron B. A. Shafer; Jochen B. W. Wolf; Paulo C. Alves; Linnea Bergstrom; Michael W. Bruford; Ioana Brannstrom; Guy Colling; Love Dalen; Luc De Meester; Robert Ekblom; Katie D. Fawcett; Simone Fior; Mehrdad Hajibabaei; Jason A. Hill; A. Rus Hoezel; Jacob Hoglund; Evelyn L. Jensen; Johannes Krause; Torsten N. Kristensen; Michael Krutzen; John K. McKay; Anita J. Norman; Rob Ogden; E. Martin Osterling; N. Joop Ouborg; John Piccolo; Danijela Popovic; Craig R. Primmer; Floyd A. Reed; Marie Roumet; Jordi Salmona; Tamara Schenekar; Michael K. Schwartz; Gernot Segelbacher; Helen Senn; Jens Thaulow; Mia Valtonen; Andrew Veale; Philippine Vergeer; Nagarjun Vijay; Carles Vila; Matthias Weissensteiner; Lovisa Wennerstrom; Christopher W. Wheat; Piotr Zielinski

    2015-01-01

    The global loss of biodiversity continues at an alarming rate. Genomic approaches have been suggested as a promising tool for conservation practice as scaling up to genome-wide data can improve traditional conservation genetic inferences and provide qualitatively novel insights. However, the generation of genomic data and subsequent analyses and interpretations remain...

  6. A Conserved C-terminal Element in the Yeast Doa10 and Human MARCH6 Ubiquitin Ligases Required for Selective Substrate Degradation.

    PubMed

    Zattas, Dimitrios; Berk, Jason M; Kreft, Stefan G; Hochstrasser, Mark

    2016-06-03

    Specific proteins are modified by ubiquitin at the endoplasmic reticulum (ER) and are degraded by the proteasome, a process referred to as ER-associated protein degradation. In Saccharomyces cerevisiae, two principal ER-associated protein degradation ubiquitin ligases (E3s) reside in the ER membrane, Doa10 and Hrd1. The membrane-embedded Doa10 functions in the degradation of substrates in the ER membrane, nuclear envelope, cytoplasm, and nucleoplasm. How most E3 ligases, including Doa10, recognize their protein substrates remains poorly understood. Here we describe a previously unappreciated but highly conserved C-terminal element (CTE) in Doa10; this cytosolically disposed 16-residue motif follows the final transmembrane helix. A conserved CTE asparagine residue is required for ubiquitylation and degradation of a subset of Doa10 substrates. Such selectivity suggests that the Doa10 CTE is involved in substrate discrimination and not general ligase function. Functional conservation of the CTE was investigated in the human ortholog of Doa10, MARCH6 (TEB4), by analyzing MARCH6 autoregulation of its own degradation. Mutation of the conserved Asn residue (N890A) in the MARCH6 CTE stabilized the normally short lived enzyme to the same degree as a catalytically inactivating mutation (C9A). We also report the localization of endogenous MARCH6 to the ER using epitope tagging of the genomic MARCH6 locus by clustered regularly interspaced short palindromic repeats (CRISPR)/Cas9-mediated genome editing. These localization and CTE analyses support the inference that MARCH6 and Doa10 are functionally similar. Moreover, our results with the yeast enzyme suggest that the CTE is involved in the recognition and/or ubiquitylation of specific protein substrates. © 2016 by The American Society for Biochemistry and Molecular Biology, Inc.

  7. Repetitive DNA in the pea (Pisum sativum L.) genome: comprehensive characterization using 454 sequencing and comparison to soybean and Medicago truncatula

    PubMed Central

    Macas, Jiří; Neumann, Pavel; Navrátilová, Alice

    2007-01-01

    Background Extraordinary size variation of higher plant nuclear genomes is in large part caused by differences in accumulation of repetitive DNA. This makes repetitive DNA of great interest for studying the molecular mechanisms shaping architecture and function of complex plant genomes. However, due to methodological constraints of conventional cloning and sequencing, a global description of repeat composition is available for only a very limited number of higher plants. In order to provide further data required for investigating evolutionary patterns of repeated DNA within and between species, we used a novel approach based on massive parallel sequencing which allowed a comprehensive repeat characterization in our model species, garden pea (Pisum sativum). Results Analysis of 33.3 Mb sequence data resulted in quantification and partial sequence reconstruction of major repeat families occurring in the pea genome with at least thousands of copies. Our results showed that the pea genome is dominated by LTR-retrotransposons, estimated at 140,000 copies/1C. Ty3/gypsy elements are less diverse and accumulated to higher copy numbers than Ty1/copia. This is in part due to a large population of Ogre-like retrotransposons which alone make up over 20% of the genome. In addition to numerous types of mobile elements, we have discovered a set of novel satellite repeats and two additional variants of telomeric sequences. Comparative genome analysis revealed that there are only a few repeat sequences conserved between pea and soybean genomes. On the other hand, all major families of pea mobile elements are well represented in M. truncatula. Conclusion We have demonstrated that even in a species with a relatively large genome like pea, where a single 454-sequencing run provided only 0.77% coverage, the generated sequences were sufficient to reconstruct and analyze major repeat families corresponding to a total of 35–48% of the genome. These data provide a starting point for further investigations of legume plant genomes based on their global comparative analysis and for the development of more sophisticated approaches for data mining. PMID:18031571

  8. GenomeVista

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Poliakov, Alexander; Couronne, Olivier

    2002-11-04

    Aligning large vertebrate genomes that are structurally complex poses a variety of problems not encountered on smaller scales. Such genomes are rich in repetitive elements and contain multiple segmental duplications, which increases the difficulty of identifying true orthologous SNA segments in alignments. The sizes of the sequences make many alignment algorithms designed for comparing single proteins extremely inefficient when processing large genomic intervals. We integrated both local and global alignment tools and developed a suite of programs for automatically aligning large vertebrate genomes and identifying conserved non-coding regions in the alignments. Our method uses the BLAT local alignment program tomore » find anchors on the base genome to identify regions of possible homology for a query sequence. These regions are postprocessed to find the best candidates which are then globally aligned using the AVID global alignment program. In the last step conserved non-coding segments are identified using VISTA. Our methods are fast and the resulting alignments exhibit a high degree of sensitivity, covering more than 90% of known coding exons in the human genome. The GenomeVISTA software is a suite of Perl programs that is built on a MySQL database platform. The scheduler gets control data from the database, builds a queve of jobs, and dispatches them to a PC cluster for execution. The main program, running on each node of the cluster, processes individual sequences. A Perl library acts as an interface between the database and the above programs. The use of a separate library allows the programs to function independently of the database schema. The library also improves on the standard Perl MySQL database interfere package by providing auto-reconnect functionality and improved error handling.« less

  9. DNaseI Hypersensitivity and Ultraconservation Reveal Novel, Interdependent Long-Range Enhancers at the Complex Pax6 Cis-Regulatory Region

    PubMed Central

    McBride, David J.; Buckle, Adam; van Heyningen, Veronica; Kleinjan, Dirk A.

    2011-01-01

    The PAX6 gene plays a crucial role in development of the eye, brain, olfactory system and endocrine pancreas. Consistent with its pleiotropic role the gene exhibits a complex developmental expression pattern which is subject to strict spatial, temporal and quantitative regulation. Control of expression depends on a large array of cis-elements residing in an extended genomic domain around the coding region of the gene. The minimal essential region required for proper regulation of this complex locus has been defined through analysis of human aniridia-associated breakpoints and YAC transgenic rescue studies of the mouse smalleye mutant. We have carried out a systematic DNase I hypersensitive site (HS) analysis across 200 kb of this critical region of mouse chromosome 2E3 to identify putative regulatory elements. Mapping the identified HSs onto a percent identity plot (PIP) shows many HSs correspond to recognisable genomic features such as evolutionarily conserved sequences, CpG islands and retrotransposon derived repeats. We then focussed on a region previously shown to contain essential long range cis-regulatory information, the Pax6 downstream regulatory region (DRR), allowing comparison of mouse HS data with previous human HS data for this region. Reporter transgenic mice for two of the HS sites, HS5 and HS6, show that they function as tissue specific regulatory elements. In addition we have characterised enhancer activity of an ultra-conserved cis-regulatory region located near Pax6, termed E60. All three cis-elements exhibit multiple spatio-temporal activities in the embryo that overlap between themselves and other elements in the locus. Using a deletion set of YAC reporter transgenic mice we demonstrate functional interdependence of the elements. Finally, we use the HS6 enhancer as a marker for the migration of precerebellar neuro-epithelium cells to the hindbrain precerebellar nuclei along the posterior and anterior extramural streams allowing visualisation of migratory defects in both pathways in Pax6Sey/Sey mice. PMID:22220192

  10. Construction and sequence sampling of deep-coverage, large-insert BAC libraries for three model lepidopteran species

    PubMed Central

    Wu, Chengcang; Proestou, Dina; Carter, Dorothy; Nicholson, Erica; Santos, Filippe; Zhao, Shaying; Zhang, Hong-Bin; Goldsmith, Marian R

    2009-01-01

    Background Manduca sexta, Heliothis virescens, and Heliconius erato represent three widely-used insect model species for genomic and fundamental studies in Lepidoptera. Large-insert BAC libraries of these insects are critical resources for many molecular studies, including physical mapping and genome sequencing, but not available to date. Results We report the construction and characterization of six large-insert BAC libraries for the three species and sampling sequence analysis of the genomes. The six BAC libraries were constructed with two restriction enzymes, two libraries for each species, and each has an average clone insert size ranging from 152–175 kb. We estimated that the genome coverage of each library ranged from 6–9 ×, with the two combined libraries of each species being equivalent to 13.0–16.3 × haploid genomes. The genome coverage, quality and utility of the libraries were further confirmed by library screening using 6~8 putative single-copy probes. To provide a first glimpse into these genomes, we sequenced and analyzed the BAC ends of ~200 clones randomly selected from the libraries of each species. The data revealed that the genomes are AT-rich, contain relatively small fractions of repeat elements with a majority belonging to the category of low complexity repeats, and are more abundant in retro-elements than DNA transposons. Among the species, the H. erato genome is somewhat more abundant in repeat elements and simple repeats than those of M. sexta and H. virescens. The BLAST analysis of the BAC end sequences suggested that the evolution of the three genomes is widely varied, with the genome of H. virescens being the most conserved as a typical lepidopteran, whereas both genomes of H. erato and M. sexta appear to have evolved significantly, resulting in a higher level of species- or evolutionary lineage-specific sequences. Conclusion The high-quality and large-insert BAC libraries of the insects, together with the identified BACs containing genes of interest, provide valuable information, resources and tools for comprehensive understanding and studies of the insect genomes and for addressing many fundamental questions in Lepidoptera. The sample of the genomic sequences provides the first insight into the constitution and evolution of the insect genomes. PMID:19558662

  11. Complete chloroplast genome of Trachelium caeruleum: extensiverearrangements are associated with repeats and tRNAs

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Haberle, Rosemarie C.; Fourcade, Matthew L.; Boore, Jeffrey L.

    2006-01-09

    Chloroplast genome structure, gene order and content arehighly conserved in land plants. We sequenced the complete chloroplastgenome sequence of Trachelium caeruleum (Campanulaceae) a member of anangiosperm family known for highly rearranged chloroplast genomes. Thetotal genome size is 162,321 bp with an IR of 27,273 bp, LSC of 100,113bp and SSC of 7,661 bp. The genome encodes 115 unique genes, with 19duplicated in the IR, a tRNA (trnI-CAU) duplicated once in the LSC and aprotein coding gene (psbJ) duplicated twice, for a total of 137 genes.Four genes (ycf15, rpl23, infA and accD) are truncated and likelynonfunctional; three others (clpP, ycf1 andmore » ycf2) are so highly divergedthat they may now be pseudogenes. The most conspicuous feature of theTrachelium genome is the presence of eighteen internally unrearrangedblocks of genes that have been inverted or relocated within the genome,relative to the typical gene order of most angiosperm chloroplastgenomes. Recombination between repeats or tRNAs has been suggested as twomeans of chloroplast genome rearrangements. We compared the relativenumber of repeats in Trachelium to eight other angiosperm chloroplastgenomes, and evaluated the location of repeats and tRNAs in relation torearrangements. Trachelium has the highest number and largest repeats,which are concentrated near inversion endpoints or other rearrangements.tRNAs occur at many but not all inversion endpoints. There is likely nosingle mechanism responsible for the remarkable number of alterations inthis genome, but both repeats and tRNAs are clearly associated with theserearrangements. Land plant chloroplast genomes are highly conserved instructure, gene order and content. The chloroplast genomes of ferns, thegymnosperm Ginkgo, and most angiosperms are nearly collinear, reflectingthe gene order in lineages that diverged from lycopsids and the ancestralchloroplast gene order over 350 million years ago (Raubeson and Jansen,1992). Although earlier mapping studies identified a number of taxa inwhich several rearrangements have occurred (reviewed in Raubeson andJansen, 2005), an extraordinary number of chloroplast genome alterationsare concentrated in several families in the angiosperm order Asterales(sensu APGII, Bremer et al., 2003). Gene mapping studies ofrepresentatives of the Campanulaceae (Cosner, 1993; Cosner et al.,1997,2004) and Lobeliaceae (Knox et al., 1993; Knox and Palmer, 1999)identified large inversions, contraction and expansion of the invertedrepeat regions, and several insertions and deletions in the cpDNAs ofthese closely related taxa. Detailed restriction site and gene mapping ofthe chloroplast genome of Trachelium caeruleum (Campanulaceae) identifiedseven to ten large inversions, families of repeats associated withrearrangements, possible transpositions, and even the disruption ofoperons (Cosner et al., 1997). Seventeen other members of theCampanulaceae were mapped and exhibit many additional rearrangements(Cosner et al., 2004). What happened in this lineage that made itsusceptible to so many chloroplast genome rearrangements? How do normallyvery conserved chloroplast genomes change? The cause of rearrangements inthis group is unclear based on the limited resolution available withmapping techniques. Several mechanisms have been proposed to explain howrearrangements occur: recombination between repeats, transposition, ortemporary instability due to loss of the inverted repeat (Raubeson andJansen, 2005). Sequencing whole chloroplast genomes within theCampanulaceae offers a unique opportunity to examine both the extent andmechanisms of rearrangements within a phylogenetic framework.We reporthere the first complete chloroplast genome sequence of a member of theCampanulaceae, Trachelium caeruleum. This work will serve as a benchmarkfor subsequent, comparative sequencing and analysis of other members ofthis family and close relatives, with the goal of further understandingchloroplast genome evolution. We confirmed features previously identifiedthrough mapping, and discovered many additional structural changes,including several partial to entire gene duplications, deterioration ofat least four normally conserved chloroplast genes into gene fragments,and the nature and position of numerous repeat elements at or nearinversion endpoints. The focus of this paper is on analyses of sequencesat or near these rearrangements in Trachelium caeruleum. Inversions arebelieved to occur due to the presence of repeat elements subject tohomologous recombination (Palmer, 1991; Knox et al., 1993). Repeats mayfacilitate inversions or other genome rearrangements (Achaz et al.,2003), and higher incidences of repeats have been correlated with greaternumbers of rearrangements (Rocha, 2003). Alternatively, repeats mayproliferate within a genome asa result of DNA strand repair mechanismsfollowing a rearrangement event such as an inversion. Gene« less

  12. MpSaci is a widespread gypsy-Ty3 retrotransposon highly represented by non-autonomous copies in the Moniliophthora perniciosa genome.

    PubMed

    Pereira, Jorge F; Araújo, Elza F; Brommonschenkel, Sérgio H; Queiroz, Casley B; Costa, Gustavo G L; Carazzolle, Marcelo F; Pereira, Gonçalo A G; Queiroz, Marisa V

    2015-05-01

    Transposons are an important source of genetic variation. The phytopathogen Moniliophthora perniciosa shows high level of variability but little is known about the role of class I elements in shaping its genome. In this work, we aimed the characterization of a new gypsy/Ty3 retrotransposon species, named MpSaci, in the M. perniciosa genome. These elements are largely variable in size, ranging from 4 to 15 kb, and harbor direct long terminal repeats (LTRs) with varying degrees of similarity. Approximately, all of the copies are non-autonomous as shifts in the reading frame and stop codons were detected. Only two elements (MpSaci6 and MpSaci9) code for GAG and POL proteins that possess functional domains. Conserved domains that are typically not found in retrotransposons were detected and could potentially impact the expression of neighbor genes. Solo LTRs and several LARDs (large retrotransposon derivative) were detected. Unusual elements containing small sequences with or without interruptions that are similar to gag or different pol domains and presenting LTRs with different levels of similarities were identified. Methylation was observed in MpSaci reverse transcriptase sequences. Distribution analysis indicates that MpSaci elements are present in high copy number in the genomes of C-, S- and L-biotypes of M. perniciosa. In addition, C-biotype isolates originating from the state of Bahia have fragments in common with isolates from the Amazon region and two hybridization profiles related to two chromosomal groups. RT-PCR analysis reveals that the gag gene is constitutively expressed and that the expression is increased at least three-fold with nutrient depravation even though no new insertion were observed. These findings point out that MpSaci collaborated and, even though is primarily represented by non-autonomous elements, still might contribute to the generation of genetic variability in the most important cacao pathogen in Brazil.

  13. Molecular epidemiology of Epizootic haematopoietic necrosis virus (EHNV).

    PubMed

    Hick, Paul M; Subramaniam, Kuttichantran; Thompson, Patrick M; Waltzek, Thomas B; Becker, Joy A; Whittington, Richard J

    2017-11-01

    Low genetic diversity of Epizootic haematopoietic necrosis virus (EHNV) was determined for the complete genome of 16 isolates spanning the natural range of hosts, geography and time since the first outbreaks of disease. Genomes ranged from 125,591-127,487 nucleotides with 97.47% pairwise identity and 106-109 genes. All isolates shared 101 core genes with 121 potential genes predicted within the pan-genome of this collection. There was high conservation within 90,181 nucleotides of the core genes with isolates separated by average genetic distance of 3.43 × 10 -4 substitutions per site. Evolutionary analysis of the core genome strongly supported historical epidemiological evidence of iatrogenic spread of EHNV to naïve hosts and establishment of endemic status in discrete ecological niches. There was no evidence of structural genome reorganization, however, the complement of non-core genes and variation in repeat elements enabled fine scale molecular epidemiological investigation of this unpredictable pathogen of fish. Copyright © 2017 Elsevier Inc. All rights reserved.

  14. The Chthonomonas calidirosea Genome Is Highly Conserved across Geographic Locations and Distinct Chemical and Microbial Environments in New Zealand's Taupō Volcanic Zone.

    PubMed

    Lee, Kevin C; Stott, Matthew B; Dunfield, Peter F; Huttenhower, Curtis; McDonald, Ian R; Morgan, Xochitl C

    2016-06-15

    Chthonomonas calidirosea T49(T) is a low-abundance, carbohydrate-scavenging, and thermophilic soil bacterium with a seemingly disorganized genome. We hypothesized that the C. calidirosea genome would be highly responsive to local selection pressure, resulting in the divergence of its genomic content, genome organization, and carbohydrate utilization phenotype across environments. We tested this hypothesis by sequencing the genomes of four C. calidirosea isolates obtained from four separate geothermal fields in the Taupō Volcanic Zone, New Zealand. For each isolation site, we measured physicochemical attributes and defined the associated microbial community by 16S rRNA gene sequencing. Despite their ecological and geographical isolation, the genome sequences showed low divergence (maximum, 1.17%). Isolate-specific variations included single-nucleotide polymorphisms (SNPs), restriction-modification systems, and mobile elements but few major deletions and no major rearrangements. The 50-fold variation in C. calidirosea relative abundance among the four sites correlated with site environmental characteristics but not with differences in genomic content. Conversely, the carbohydrate utilization profiles of the C. calidirosea isolates corresponded to the inferred isolate phylogenies, which only partially paralleled the geographical relationships among the sample sites. Genomic sequence conservation does not entirely parallel geographic distance, suggesting that stochastic dispersal and localized extinction, which allow for rapid population homogenization with little restriction by geographical barriers, are possible mechanisms of C. calidirosea distribution. This dispersal and extinction mechanism is likely not limited to C. calidirosea but may shape the populations and genomes of many other low-abundance free-living taxa. This study compares the genomic sequence variations and metabolisms of four strains of Chthonomonas calidirosea, a rare thermophilic bacterium from the phylum Armatimonadetes It additionally compares the microbial communities and chemistry of each of the geographically distinct sites from which the four C. calidirosea strains were isolated. C. calidirosea was previously reported to possess a highly disorganized genome, but it was unclear whether this reflected rapid evolution. Here, we show that each isolation site has a distinct chemistry and microbial community, but despite this, the C. calidirosea genome is highly conserved across all isolation sites. Furthermore, genomic sequence differences only partially paralleled geographic distance, suggesting that C. calidirosea genotypes are not primarily determined by adaptive evolution. Instead, the presence of C. calidirosea may be driven by stochastic dispersal and localized extinction. This ecological mechanism may apply to many other low-abundance taxa. Copyright © 2016 Lee et al.

  15. The Chthonomonas calidirosea Genome Is Highly Conserved across Geographic Locations and Distinct Chemical and Microbial Environments in New Zealand's Taupō Volcanic Zone

    PubMed Central

    Lee, Kevin C.; Stott, Matthew B.; Dunfield, Peter F.; Huttenhower, Curtis; McDonald, Ian R.

    2016-01-01

    ABSTRACT Chthonomonas calidirosea T49T is a low-abundance, carbohydrate-scavenging, and thermophilic soil bacterium with a seemingly disorganized genome. We hypothesized that the C. calidirosea genome would be highly responsive to local selection pressure, resulting in the divergence of its genomic content, genome organization, and carbohydrate utilization phenotype across environments. We tested this hypothesis by sequencing the genomes of four C. calidirosea isolates obtained from four separate geothermal fields in the Taupō Volcanic Zone, New Zealand. For each isolation site, we measured physicochemical attributes and defined the associated microbial community by 16S rRNA gene sequencing. Despite their ecological and geographical isolation, the genome sequences showed low divergence (maximum, 1.17%). Isolate-specific variations included single-nucleotide polymorphisms (SNPs), restriction-modification systems, and mobile elements but few major deletions and no major rearrangements. The 50-fold variation in C. calidirosea relative abundance among the four sites correlated with site environmental characteristics but not with differences in genomic content. Conversely, the carbohydrate utilization profiles of the C. calidirosea isolates corresponded to the inferred isolate phylogenies, which only partially paralleled the geographical relationships among the sample sites. Genomic sequence conservation does not entirely parallel geographic distance, suggesting that stochastic dispersal and localized extinction, which allow for rapid population homogenization with little restriction by geographical barriers, are possible mechanisms of C. calidirosea distribution. This dispersal and extinction mechanism is likely not limited to C. calidirosea but may shape the populations and genomes of many other low-abundance free-living taxa. IMPORTANCE This study compares the genomic sequence variations and metabolisms of four strains of Chthonomonas calidirosea, a rare thermophilic bacterium from the phylum Armatimonadetes. It additionally compares the microbial communities and chemistry of each of the geographically distinct sites from which the four C. calidirosea strains were isolated. C. calidirosea was previously reported to possess a highly disorganized genome, but it was unclear whether this reflected rapid evolution. Here, we show that each isolation site has a distinct chemistry and microbial community, but despite this, the C. calidirosea genome is highly conserved across all isolation sites. Furthermore, genomic sequence differences only partially paralleled geographic distance, suggesting that C. calidirosea genotypes are not primarily determined by adaptive evolution. Instead, the presence of C. calidirosea may be driven by stochastic dispersal and localized extinction. This ecological mechanism may apply to many other low-abundance taxa. PMID:27060125

  16. Comparative genomics of Eucalyptus and Corymbia reveals low rates of genome structural rearrangement.

    PubMed

    Butler, J B; Vaillancourt, R E; Potts, B M; Lee, D J; King, G J; Baten, A; Shepherd, M; Freeman, J S

    2017-05-22

    Previous studies suggest genome structure is largely conserved between Eucalyptus species. However, it is unknown if this conservation extends to more divergent eucalypt taxa. We performed comparative genomics between the eucalypt genera Eucalyptus and Corymbia. Our results will facilitate transfer of genomic information between these important taxa and provide further insights into the rate of structural change in tree genomes. We constructed three high density linkage maps for two Corymbia species (Corymbia citriodora subsp. variegata and Corymbia torelliana) which were used to compare genome structure between both species and Eucalyptus grandis. Genome structure was highly conserved between the Corymbia species. However, the comparison of Corymbia and E. grandis suggests large (from 1-13 MB) intra-chromosomal rearrangements have occurred on seven of the 11 chromosomes. Most rearrangements were supported through comparisons of the three independent Corymbia maps to the E. grandis genome sequence, and to other independently constructed Eucalyptus linkage maps. These are the first large scale chromosomal rearrangements discovered between eucalypts. Nonetheless, in the general context of plants, the genomic structure of the two genera was remarkably conserved; adding to a growing body of evidence that conservation of genome structure is common amongst woody angiosperms.

  17. Comparative population genetics of a mimicry locus among hybridizing Heliconius butterfly species.

    PubMed

    Chamberlain, N L; Hill, R I; Baxter, S W; Jiggins, C D; Kronforst, M R

    2011-09-01

    The comimetic Heliconius butterfly species pair, H. erato and H. melpomene, appear to use a conserved Mendelian switch locus to generate their matching red wing patterns. Here we investigate whether H. cydno and H. pachinus, species closely related to H. melpomene, use this same switch locus to generate their highly divergent red and brown color pattern elements. Using an F2 intercross between H. cydno and H. pachinus, we first map the genomic positions of two novel red/brown wing pattern elements; the G locus, which controls the presence of red vs brown at the base of the ventral wings, and the Br locus, which controls the presence vs absence of a brown oval pattern on the ventral hind wing. The results reveal that the G locus is tightly linked to markers in the genomic interval that controls red wing pattern elements of H. erato and H. melpomene. Br is on the same linkage group but approximately 26 cM away. Next, we analyze fine-scale patterns of genetic differentiation and linkage disequilibrium throughout the G locus candidate interval in H. cydno, H. pachinus and H. melpomene, and find evidence for elevated differentiation between H. cydno and H. pachinus, but no localized signature of association. Overall, these results indicate that the G locus maps to the same interval as the locus controlling red patterning in H. melpomene and H. erato. This, in turn, suggests that the genes controlling red pattern elements may be homologous across Heliconius, supporting the hypothesis that Heliconius butterflies use a limited suite of conserved genetic switch loci to generate both convergent and divergent wing patterns.

  18. Constraints on genes shape long-term conservation of macro-synteny in metazoan genomes.

    PubMed

    Lv, Jie; Havlak, Paul; Putnam, Nicholas H

    2011-10-05

    Many metazoan genomes conserve chromosome-scale gene linkage relationships ("macro-synteny") from the common ancestor of multicellular animal life 1234, but the biological explanation for this conservation is still unknown. Double cut and join (DCJ) is a simple, well-studied model of neutral genome evolution amenable to both simulation and mathematical analysis 5, but as we show here, it is not sufficent to explain long-term macro-synteny conservation. We examine a family of simple (one-parameter) extensions of DCJ to identify models and choices of parameters consistent with the levels of macro- and micro-synteny conservation observed among animal genomes. Our software implements a flexible strategy for incorporating genomic context into the DCJ model to incorporate various types of genomic context ("DCJ-[C]"), and is available as open source software from http://github.com/putnamlab/dcj-c. A simple model of genome evolution, in which DCJ moves are allowed only if they maintain chromosomal linkage among a set of constrained genes, can simultaneously account for the level of macro-synteny conservation and for correlated conservation among multiple pairs of species. Simulations under this model indicate that a constraint on approximately 7% of metazoan genes is sufficient to constrain genome rearrangement to an average rate of 25 inversions and 1.7 translocations per million years.

  19. Mechanisms of haplotype divergence at the RGA08 nucleotide-binding leucine-rich repeat gene locus in wild banana (Musa balbisiana).

    PubMed

    Baurens, Franc-Christophe; Bocs, Stéphanie; Rouard, Mathieu; Matsumoto, Takashi; Miller, Robert N G; Rodier-Goud, Marguerite; MBéguié-A-MBéguié, Didier; Yahiaoui, Nabila

    2010-07-16

    Comparative sequence analysis of complex loci such as resistance gene analog clusters allows estimating the degree of sequence conservation and mechanisms of divergence at the intraspecies level. In banana (Musa sp.), two diploid wild species Musa acuminata (A genome) and Musa balbisiana (B genome) contribute to the polyploid genome of many cultivars. The M. balbisiana species is associated with vigour and tolerance to pests and disease and little is known on the genome structure and haplotype diversity within this species. Here, we compare two genomic sequences of 253 and 223 kb corresponding to two haplotypes of the RGA08 resistance gene analog locus in M. balbisiana "Pisang Klutuk Wulung" (PKW). Sequence comparison revealed two regions of contrasting features. The first is a highly colinear gene-rich region where the two haplotypes diverge only by single nucleotide polymorphisms and two repetitive element insertions. The second corresponds to a large cluster of RGA08 genes, with 13 and 18 predicted RGA genes and pseudogenes spread over 131 and 152 kb respectively on each haplotype. The RGA08 cluster is enriched in repetitive element insertions, in duplicated non-coding intergenic sequences including low complexity regions and shows structural variations between haplotypes. Although some allelic relationships are retained, a large diversity of RGA08 genes occurs in this single M. balbisiana genotype, with several RGA08 paralogs specific to each haplotype. The RGA08 gene family has evolved by mechanisms of unequal recombination, intragenic sequence exchange and diversifying selection. An unequal recombination event taking place between duplicated non-coding intergenic sequences resulted in a different RGA08 gene content between haplotypes pointing out the role of such duplicated regions in the evolution of RGA clusters. Based on the synonymous substitution rate in coding sequences, we estimated a 1 million year divergence time for these M. balbisiana haplotypes. A large RGA08 gene cluster identified in wild banana corresponds to a highly variable genomic region between haplotypes surrounded by conserved flanking regions. High level of sequence identity (70 to 99%) of the genic and intergenic regions suggests a recent and rapid evolution of this cluster in M. balbisiana.

  20. The Helicase Aquarius/EMB-4 Is Required to Overcome Intronic Barriers to Allow Nuclear RNAi Pathways to Heritably Silence Transcription.

    PubMed

    Akay, Alper; Di Domenico, Tomas; Suen, Kin M; Nabih, Amena; Parada, Guillermo E; Larance, Mark; Medhi, Ragini; Berkyurek, Ahmet C; Zhang, Xinlian; Wedeles, Christopher J; Rudolph, Konrad L M; Engelhardt, Jan; Hemberg, Martin; Ma, Ping; Lamond, Angus I; Claycomb, Julie M; Miska, Eric A

    2017-08-07

    Small RNAs play a crucial role in genome defense against transposable elements and guide Argonaute proteins to nascent RNA transcripts to induce co-transcriptional gene silencing. However, the molecular basis of this process remains unknown. Here, we identify the conserved RNA helicase Aquarius/EMB-4 as a direct and essential link between small RNA pathways and the transcriptional machinery in Caenorhabditis elegans. Aquarius physically interacts with the germline Argonaute HRDE-1. Aquarius is required to initiate small-RNA-induced heritable gene silencing. HRDE-1 and Aquarius silence overlapping sets of genes and transposable elements. Surprisingly, removal of introns from a target gene abolishes the requirement for Aquarius, but not HRDE-1, for small RNA-dependent gene silencing. We conclude that Aquarius allows small RNA pathways to compete for access to nascent transcripts undergoing co-transcriptional splicing in order to detect and silence transposable elements. Thus, Aquarius and HRDE-1 act as gatekeepers coordinating gene expression and genome defense. Copyright © 2017 The Authors. Published by Elsevier Inc. All rights reserved.

  1. The identification and functional annotation of RNA structures conserved in vertebrates

    PubMed Central

    Seemann, Stefan E.; Mirza, Aashiq H.; Hansen, Claus; Bang-Berthelsen, Claus H.; Garde, Christian; Christensen-Dalsgaard, Mikkel; Torarinsson, Elfar; Yao, Zizhen; Workman, Christopher T.; Pociot, Flemming; Nielsen, Henrik; Tommerup, Niels; Ruzzo, Walter L.; Gorodkin, Jan

    2017-01-01

    Structured elements of RNA molecules are essential in, e.g., RNA stabilization, localization, and protein interaction, and their conservation across species suggests a common functional role. We computationally screened vertebrate genomes for conserved RNA structures (CRSs), leveraging structure-based, rather than sequence-based, alignments. After careful correction for sequence identity and GC content, we predict ∼516,000 human genomic regions containing CRSs. We find that a substantial fraction of human–mouse CRS regions (1) colocalize consistently with binding sites of the same RNA binding proteins (RBPs) or (2) are transcribed in corresponding tissues. Additionally, a CaptureSeq experiment revealed expression of many of our CRS regions in human fetal brain, including 662 novel ones. For selected human and mouse candidate pairs, qRT-PCR and in vitro RNA structure probing supported both shared expression and shared structure despite low abundance and low sequence identity. About 30,000 CRS regions are located near coding or long noncoding RNA genes or within enhancers. Structured (CRS overlapping) enhancer RNAs and extended 3′ ends have significantly increased expression levels over their nonstructured counterparts. Our findings of transcribed uncharacterized regulatory regions that contain CRSs support their RNA-mediated functionality. PMID:28487280

  2. Organisation of the plant genome in chromosomes.

    PubMed

    Heslop-Harrison, J S Pat; Schwarzacher, Trude

    2011-04-01

    The plant genome is organized into chromosomes that provide the structure for the genetic linkage groups and allow faithful replication, transcription and transmission of the hereditary information. Genome sizes in plants are remarkably diverse, with a 2350-fold range from 63 to 149,000 Mb, divided into n=2 to n= approximately 600 chromosomes. Despite this huge range, structural features of chromosomes like centromeres, telomeres and chromatin packaging are well-conserved. The smallest genomes consist of mostly coding and regulatory DNA sequences present in low copy, along with highly repeated rDNA (rRNA genes and intergenic spacers), centromeric and telomeric repetitive DNA and some transposable elements. The larger genomes have similar numbers of genes, with abundant tandemly repeated sequence motifs, and transposable elements alone represent more than half the DNA present. Chromosomes evolve by fission, fusion, duplication and insertion events, allowing evolution of chromosome size and chromosome number. A combination of sequence analysis, genetic mapping and molecular cytogenetic methods with comparative analysis, all only becoming widely available in the 21st century, is elucidating the exact nature of the chromosome evolution events at all timescales, from the base of the plant kingdom, to intraspecific or hybridization events associated with recent plant breeding. As well as being of fundamental interest, understanding and exploiting evolutionary mechanisms in plant genomes is likely to be a key to crop development for food production. © 2011 The Authors. The Plant Journal © 2011 Blackwell Publishing Ltd.

  3. Comparative promoter analysis allows de novo identification of specialized cell junction-associated proteins.

    PubMed

    Cohen, Clemens D; Klingenhoff, Andreas; Boucherot, Anissa; Nitsche, Almut; Henger, Anna; Brunner, Bodo; Schmid, Holger; Merkle, Monika; Saleem, Moin A; Koller, Klaus-Peter; Werner, Thomas; Gröne, Hermann-Josef; Nelson, Peter J; Kretzler, Matthias

    2006-04-11

    Shared transcription factor binding sites that are conserved in distance and orientation help control the expression of gene products that act together in the same biological context. New bioinformatics approaches allow the rapid characterization of shared promoter structures and can be used to find novel interacting molecules. Here, these principles are demonstrated by using molecules linked to the unique functional unit of the glomerular slit diaphragm. An evolutionarily conserved promoter model was generated by comparative genomics in the proximal promoter regions of the slit diaphragm-associated molecule nephrin. Phylogenetic promoter fingerprints of known elements of the slit diaphragm complex identified the nephrin model in the promoter region of zonula occludens-1 (ZO-1). Genome-wide scans using this promoter model effectively predicted a previously unrecognized slit diaphragm molecule, cadherin-5. Nephrin, ZO-1, and cadherin-5 mRNA showed stringent coexpression across a diverse set of human glomerular diseases. Comparative promoter analysis can identify regulatory pathways at work in tissue homeostasis and disease processes.

  4. The short interspersed repetitive element of Trypanosoma cruzi, SIRE, is part of VIPER, an unusual retroelement related to long terminal repeat retrotransposons

    PubMed Central

    Vázquez, Martín; Ben-Dov, Claudia; Lorenzi, Hernan; Moore, Troy; Schijman, Alejandro; Levin, Mariano J.

    2000-01-01

    The short interspersed repetitive element (SIRE) of Trypanosoma cruzi was first detected when comparing the sequences of loci that encode the TcP2β genes. It is present in about 1,500–3,000 copies per genome, depending on the strain, and it is distributed in all chromosomes. An initial analysis of SIRE sequences from 21 genomic fragments allowed us to derive a consensus nucleotide sequence and structure for the element, consisting of three regions (I, II, and III) each harboring distinctive features. Analysis of 158 transcribed SIREs demonstrates that the consensus is highly conserved. The sequences of 51 cDNAs show that SIRE is included in the 3′ end of several mRNAs, always transcribed from the sense strand, contributing the polyadenylation site in 63% of the cases. This study led to the characterization of VIPER (vestigial interposed retroelement), a 2,326-bp-long unusual retroelement. VIPER's 5′ end is formed by the first 182 bp of SIRE, whereas its 3′ end is formed by the last 220 bp of the element. Both SIRE moieties are connected by a 1,924-bp-long fragment that carries a unique ORF encoding a complete reverse transcriptase-RNase H gene whose 15 C-terminal amino acids derive from codons specified by SIRE's region II. The amino acid sequence of VIPER's reverse transcriptase-RNase H shares significant homology to that of long terminal repeat retrotransposons. The fact that SIRE and VIPER sequences are found only in the T. cruzi genome may be of relevance for studies concerning the evolution and the genome flexibility of this protozoan parasite. PMID:10688909

  5. Structural and sequence diversity of the transposon Galileo in the Drosophila willistoni genome.

    PubMed

    Gonçalves, Juliana W; Valiati, Victor Hugo; Delprat, Alejandra; Valente, Vera L S; Ruiz, Alfredo

    2014-09-13

    Galileo is one of three members of the P superfamily of DNA transposons. It was originally discovered in Drosophila buzzatii, in which three segregating chromosomal inversions were shown to have been generated by ectopic recombination between Galileo copies. Subsequently, Galileo was identified in six of 12 sequenced Drosophila genomes, indicating its widespread distribution within this genus. Galileo is strikingly abundant in Drosophila willistoni, a neotropical species that is highly polymorphic for chromosomal inversions, suggesting a role for this transposon in the evolution of its genome. We carried out a detailed characterization of all Galileo copies present in the D. willistoni genome. A total of 191 copies, including 133 with two terminal inverted repeats (TIRs), were classified according to structure in six groups. The TIRs exhibited remarkable variation in their length and structure compared to the most complete copy. Three copies showed extended TIRs due to internal tandem repeats, the insertion of other transposable elements (TEs), or the incorporation of non-TIR sequences into the TIRs. Phylogenetic analyses of the transposase (TPase)-encoding and TIR segments yielded two divergent clades, which we termed Galileo subfamilies V and W. Target-site duplications (TSDs) in D. willistoni Galileo copies were 7- or 8-bp in length, with the consensus sequence GTATTAC. Analysis of the region around the TSDs revealed a target site motif (TSM) with a 15-bp palindrome that may give rise to a stem-loop secondary structure. There is a remarkable abundance and diversity of Galileo copies in the D. willistoni genome, although no functional copies were found. The TIRs in particular have a dynamic structure and extend in different ways, but their ends (required for transposition) are more conserved than the rest of the element. The D. willistoni genome harbors two Galileo subfamilies (V and W) that diverged ~9 million years ago and may have descended from an ancestral element in the genome. Galileo shows a significant insertion preference for a 15-bp palindromic TSM.

  6. Multiple genome alignment for identifying the core structure among moderately related microbial genomes.

    PubMed

    Uchiyama, Ikuo

    2008-10-31

    Identifying the set of intrinsically conserved genes, or the genomic core, among related genomes is crucial for understanding prokaryotic genomes where horizontal gene transfers are common. Although core genome identification appears to be obvious among very closely related genomes, it becomes more difficult when more distantly related genomes are compared. Here, we consider the core structure as a set of sufficiently long segments in which gene orders are conserved so that they are likely to have been inherited mainly through vertical transfer, and developed a method for identifying the core structure by finding the order of pre-identified orthologous groups (OGs) that maximally retains the conserved gene orders. The method was applied to genome comparisons of two well-characterized families, Bacillaceae and Enterobacteriaceae, and identified their core structures comprising 1438 and 2125 OGs, respectively. The core sets contained most of the essential genes and their related genes, which were primarily included in the intersection of the two core sets comprising around 700 OGs. The definition of the genomic core based on gene order conservation was demonstrated to be more robust than the simpler approach based only on gene conservation. We also investigated the core structures in terms of G+C content homogeneity and phylogenetic congruence, and found that the core genes primarily exhibited the expected characteristic, i.e., being indigenous and sharing the same history, more than the non-core genes. The results demonstrate that our strategy of genome alignment based on gene order conservation can provide an effective approach to identify the genomic core among moderately related microbial genomes.

  7. Genome sequence of an Australian kangaroo, Macropus eugenii, provides insight into the evolution of mammalian reproduction and development

    PubMed Central

    2011-01-01

    Background We present the genome sequence of the tammar wallaby, Macropus eugenii, which is a member of the kangaroo family and the first representative of the iconic hopping mammals that symbolize Australia to be sequenced. The tammar has many unusual biological characteristics, including the longest period of embryonic diapause of any mammal, extremely synchronized seasonal breeding and prolonged and sophisticated lactation within a well-defined pouch. Like other marsupials, it gives birth to highly altricial young, and has a small number of very large chromosomes, making it a valuable model for genomics, reproduction and development. Results The genome has been sequenced to 2 × coverage using Sanger sequencing, enhanced with additional next generation sequencing and the integration of extensive physical and linkage maps to build the genome assembly. We also sequenced the tammar transcriptome across many tissues and developmental time points. Our analyses of these data shed light on mammalian reproduction, development and genome evolution: there is innovation in reproductive and lactational genes, rapid evolution of germ cell genes, and incomplete, locus-specific X inactivation. We also observe novel retrotransposons and a highly rearranged major histocompatibility complex, with many class I genes located outside the complex. Novel microRNAs in the tammar HOX clusters uncover new potential mammalian HOX regulatory elements. Conclusions Analyses of these resources enhance our understanding of marsupial gene evolution, identify marsupial-specific conserved non-coding elements and critical genes across a range of biological systems, including reproduction, development and immunity, and provide new insight into marsupial and mammalian biology and genome evolution. PMID:21854559

  8. Sequence analysis of the PIP5K locus in Eimeria maxima provides further evidence for eimerian genome plasticity and segmental organization.

    PubMed

    Song, B K; Pan, M Z; Lau, Y L; Wan, K L

    2014-07-29

    Commercial flocks infected by Eimeria species parasites, including Eimeria maxima, have an increased risk of developing clinical or subclinical coccidiosis; an intestinal enteritis associated with increased mortality rates in poultry. Currently, infection control is largely based on chemotherapy or live vaccines; however, drug resistance is common and vaccines are relatively expensive. The development of new cost-effective intervention measures will benefit from unraveling the complex genetic mechanisms that underlie host-parasite interactions, including the identification and characterization of genes encoding proteins such as phosphatidylinositol 4-phosphate 5-kinase (PIP5K). We previously identified a PIP5K coding sequence within the E. maxima genome. In this study, we analyzed two bacterial artificial chromosome clones presenting a ~145-kb E. maxima (Weybridge strain) genomic region spanning the PIP5K gene locus. Sequence analysis revealed that ~95% of the simple sequence repeats detected were located within regions comparable to the previously described feature-rich segments of the Eimeria tenella genome. Comparative sequence analysis with the orthologous E. maxima (Houghton strain) region revealed a moderate level of conserved synteny. Unique segmental organizations and telomere-like repeats were also observed in both genomes. A number of incomplete transposable elements were detected and further scrutiny of these elements in both orthologous segments revealed interesting nesting events, which may play a role in facilitating genome plasticity in E. maxima. The current analysis provides more detailed information about the genome organization of E. maxima and may help to reveal genotypic differences that are important for expression of traits related to pathogenicity and virulence.

  9. Genome sequence of an Australian kangaroo, Macropus eugenii, provides insight into the evolution of mammalian reproduction and development.

    PubMed

    Renfree, Marilyn B; Papenfuss, Anthony T; Deakin, Janine E; Lindsay, James; Heider, Thomas; Belov, Katherine; Rens, Willem; Waters, Paul D; Pharo, Elizabeth A; Shaw, Geoff; Wong, Emily S W; Lefèvre, Christophe M; Nicholas, Kevin R; Kuroki, Yoko; Wakefield, Matthew J; Zenger, Kyall R; Wang, Chenwei; Ferguson-Smith, Malcolm; Nicholas, Frank W; Hickford, Danielle; Yu, Hongshi; Short, Kirsty R; Siddle, Hannah V; Frankenberg, Stephen R; Chew, Keng Yih; Menzies, Brandon R; Stringer, Jessica M; Suzuki, Shunsuke; Hore, Timothy A; Delbridge, Margaret L; Patel, Hardip R; Mohammadi, Amir; Schneider, Nanette Y; Hu, Yanqiu; O'Hara, William; Al Nadaf, Shafagh; Wu, Chen; Feng, Zhi-Ping; Cocks, Benjamin G; Wang, Jianghui; Flicek, Paul; Searle, Stephen M J; Fairley, Susan; Beal, Kathryn; Herrero, Javier; Carone, Dawn M; Suzuki, Yutaka; Sugano, Sumio; Toyoda, Atsushi; Sakaki, Yoshiyuki; Kondo, Shinji; Nishida, Yuichiro; Tatsumoto, Shoji; Mandiou, Ion; Hsu, Arthur; McColl, Kaighin A; Lansdell, Benjamin; Weinstock, George; Kuczek, Elizabeth; McGrath, Annette; Wilson, Peter; Men, Artem; Hazar-Rethinam, Mehlika; Hall, Allison; Davis, John; Wood, David; Williams, Sarah; Sundaravadanam, Yogi; Muzny, Donna M; Jhangiani, Shalini N; Lewis, Lora R; Morgan, Margaret B; Okwuonu, Geoffrey O; Ruiz, San Juana; Santibanez, Jireh; Nazareth, Lynne; Cree, Andrew; Fowler, Gerald; Kovar, Christie L; Dinh, Huyen H; Joshi, Vandita; Jing, Chyn; Lara, Fremiet; Thornton, Rebecca; Chen, Lei; Deng, Jixin; Liu, Yue; Shen, Joshua Y; Song, Xing-Zhi; Edson, Janette; Troon, Carmen; Thomas, Daniel; Stephens, Amber; Yapa, Lankesha; Levchenko, Tanya; Gibbs, Richard A; Cooper, Desmond W; Speed, Terence P; Fujiyama, Asao; Graves, Jennifer A M; O'Neill, Rachel J; Pask, Andrew J; Forrest, Susan M; Worley, Kim C

    2011-08-29

    We present the genome sequence of the tammar wallaby, Macropus eugenii, which is a member of the kangaroo family and the first representative of the iconic hopping mammals that symbolize Australia to be sequenced. The tammar has many unusual biological characteristics, including the longest period of embryonic diapause of any mammal, extremely synchronized seasonal breeding and prolonged and sophisticated lactation within a well-defined pouch. Like other marsupials, it gives birth to highly altricial young, and has a small number of very large chromosomes, making it a valuable model for genomics, reproduction and development. The genome has been sequenced to 2 × coverage using Sanger sequencing, enhanced with additional next generation sequencing and the integration of extensive physical and linkage maps to build the genome assembly. We also sequenced the tammar transcriptome across many tissues and developmental time points. Our analyses of these data shed light on mammalian reproduction, development and genome evolution: there is innovation in reproductive and lactational genes, rapid evolution of germ cell genes, and incomplete, locus-specific X inactivation. We also observe novel retrotransposons and a highly rearranged major histocompatibility complex, with many class I genes located outside the complex. Novel microRNAs in the tammar HOX clusters uncover new potential mammalian HOX regulatory elements. Analyses of these resources enhance our understanding of marsupial gene evolution, identify marsupial-specific conserved non-coding elements and critical genes across a range of biological systems, including reproduction, development and immunity, and provide new insight into marsupial and mammalian biology and genome evolution.

  10. The pineapple genome and the evolution of CAM photosynthesis

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ming, Ray; VanBuren, Robert; Wai, Ching Man

    Pineapple (Ananas comosus (L.) Merr.) is the most economically valuable crop possessing crassulacean acid metabolism (CAM), a photosynthetic carbon assimilation pathway with high water-use efficiency, and the second most important tropical fruit. We sequenced the genomes of pineapple varieties F153 and MD2 and a wild pineapple relative, Ananas bracteatus accession CB5. The pineapple genome has one fewer ancient whole-genome duplication event than sequenced grass genomes and a conserved karyotype with seven chromosomes from before the ρ duplication event. The pineapple lineage has transitioned from C 3 photosynthesis to CAM, with CAM-related genes exhibiting a diel expression pattern in photosynthetic tissues.more » CAM pathway genes were enriched with cis-regulatory elements associated with the regulation of circadian clock genes, providing the first cis-regulatory link between CAM and circadian clock regulation. Lastly, we found pineapple CAM photosynthesis evolved by the reconfiguration of pathways in C 3 plants, through the regulatory neofunctionalization of preexisting genes and not through the acquisition of neofunctionalized genes via whole-genome or tandem gene duplication.« less

  11. The pineapple genome and the evolution of CAM photosynthesis

    DOE PAGES

    Ming, Ray; VanBuren, Robert; Wai, Ching Man; ...

    2015-11-02

    Pineapple (Ananas comosus (L.) Merr.) is the most economically valuable crop possessing crassulacean acid metabolism (CAM), a photosynthetic carbon assimilation pathway with high water-use efficiency, and the second most important tropical fruit. We sequenced the genomes of pineapple varieties F153 and MD2 and a wild pineapple relative, Ananas bracteatus accession CB5. The pineapple genome has one fewer ancient whole-genome duplication event than sequenced grass genomes and a conserved karyotype with seven chromosomes from before the ρ duplication event. The pineapple lineage has transitioned from C 3 photosynthesis to CAM, with CAM-related genes exhibiting a diel expression pattern in photosynthetic tissues.more » CAM pathway genes were enriched with cis-regulatory elements associated with the regulation of circadian clock genes, providing the first cis-regulatory link between CAM and circadian clock regulation. Lastly, we found pineapple CAM photosynthesis evolved by the reconfiguration of pathways in C 3 plants, through the regulatory neofunctionalization of preexisting genes and not through the acquisition of neofunctionalized genes via whole-genome or tandem gene duplication.« less

  12. Comparative Genome Sequence Analysis of the Bpa/Str Region in Mouse and Man

    PubMed Central

    Mallon, A.-M.; Platzer, M.; Bate, R.; Gloeckner, G.; Botcherby, M.R.M.; Nordsiek, G.; Strivens, M.A.; Kioschis, P.; Dangel, A.; Cunningham, D.; Straw, R.N.A.; Weston, P.; Gilbert, M.; Fernando, S.; Goodall, K.; Hunter, G.; Greystrong, J.S.; Clarke, D.; Kimberley, C.; Goerdes, M.; Blechschmidt, K.; Rump, A.; Hinzmann, B.; Mundy, C.R.; Miller, W.; Poustka, A.; Herman, G.E.; Rhodes, M.; Denny, P.; Rosenthal, A.; Brown, S.D.M.

    2000-01-01

    The progress of human and mouse genome sequencing programs presages the possibility of systematic cross-species comparison of the two genomes as a powerful tool for gene and regulatory element identification. As the opportunities to perform comparative sequence analysis emerge, it is important to develop parameters for such analyses and to examine the outcomes of cross-species comparison. Our analysis used gene prediction and a database search of 430 kb of genomic sequence covering the Bpa/Str region of the mouse X chromosome, and 745 kb of genomic sequence from the homologous human X chromosome region. We identified 11 genes in mouse and 13 genes and two pseudogenes in human. In addition, we compared the mouse and human sequences using pairwise alignment and searches for evolutionary conserved regions (ECRs) exceeding a defined threshold of sequence identity. This approach aided the identification of at least four further putative conserved genes in the region. Comparative sequencing revealed that this region is a mosaic in evolutionary terms, with considerably more rearrangement between the two species than realized previously from comparative mapping studies. Surprisingly, this region showed an extremely high LINE and low SINE content, low G+C content, and yet a relatively high gene density, in contrast to the low gene density usually associated with such regions. [The sequence data described in this paper have been submitted to EMBL under the following accession nos.: Mouse Genomic Sequence: Mouse contig A (AL021127), Mouse contig B (AL049866), BAC41M10 (AL136328), PAC303O11(AL136329). Human Genomic Sequence: Human contig 1 (U82671, U82670), Human contig 2 (U82695).] PMID:10854409

  13. Sequence and functional characterization of MIRNA164 promoters from Brassica shows copy number dependent regulatory diversification among homeologs.

    PubMed

    Jain, Aditi; Anand, Saurabh; Singh, Neer K; Das, Sandip

    2018-03-12

    The impact of polyploidy on functional diversification of cis-regulatory elements is poorly understood. This is primarily on account of lack of well-defined structure of cis-elements and a universal regulatory code. To the best of our knowledge, this is the first report on characterization of sequence and functional diversification of paralogous and homeologous promoter elements associated with MIR164 from Brassica. The availability of whole genome sequence allowed us to identify and isolate a total of 42 homologous copies of MIR164 from diploid species-Brassica rapa (A-genome), Brassica nigra (B-genome), Brassica oleracea (C-genome), and allopolyploids-Brassica juncea (AB-genome), Brassica carinata (BC-genome) and Brassica napus (AC-genome). Additionally, we retrieved homologous sequences based on comparative genomics from Arabidopsis lyrata, Capsella rubella, and Thellungiella halophila, spanning ca. 45 million years of evolutionary history of Brassicaceae. Sequence comparison across Brassicaceae revealed lineage-, karyotype, species-, and sub-genome specific changes providing a snapshot of evolutionary dynamics of miRNA promoters in polyploids. Tree topology of cis-elements associated with MIR164 was found to re-capitulate the species and family evolutionary history. Phylogenetic shadowing identified transcription factor binding sites (TFBS) conserved across Brassicaceae, of which, some are already known as regulators of MIR164 expression. Some of the TFBS were found to be distributed in a sub-genome specific (e.g., SOX specific to promoter of MIR164c from MF2 sub-genome), lineage-specific (YABBY binding motif, specific to C. rubella in MIR164b), or species-specific (e.g., VOZ in A. thaliana MIR164a) manner which might contribute towards genetic and adaptive variation. Reporter activity driven by promoters associated with MIR164 paralogs and homeologs was majorly in agreement with known role of miR164 in leaf shaping, regulation of lateral root development and senescence, and one previously un-described novel role in trichome. The impact of polyploidy was most profound when reporter activity across three MIR164c homeologs were compared that revealed negligible overlap, whereas reporter activity among two homeologs of MIR164a displays significant overlap. A copy number dependent cis-regulatory divergence thus exists in MIR164 genes in Brassica juncea. The full extent of regulatory diversification towards adaptive strategies will only be known when future endeavors analyze the promoter function under duress of stress and hormonal regimes.

  14. An unusual internal ribosomal entry site of inverted symmetry directs expression of a potato leafroll polerovirus replication-associated protein

    PubMed Central

    Jaag, Hannah Miriam; Kawchuk, Lawrence; Rohde, Wolfgang; Fischer, Rainer; Emans, Neil; Prüfer, Dirk

    2003-01-01

    Potato leafroll polerovirus (PLRV) genomic RNA acts as a polycistronic mRNA for the production of proteins P0, P1, and P2 translated from the 5′-proximal half of the genome. Within the P1 coding region we identified a 5-kDa replication-associated protein 1 (Rap1) essential for viral multiplication. An internal ribosome entry site (IRES) with unusual structure and location was identified that regulates Rap1 translation. Core structural elements for internal ribosome entry include a conserved AUG codon and a downstream GGAGAGAGAGG motif with inverted symmetry. Reporter gene expression in potato protoplasts confirmed the internal ribosome entry function. Unlike known IRES motifs, the PLRV IRES is located completely within the coding region of Rap1 at the center of the PLRV genome. PMID:12835413

  15. TAD disruption as oncogenic driver

    PubMed Central

    Valton, Anne-Laure; Dekker, Job

    2016-01-01

    Topologically Associating Domains (TADs) are conserved during evolution and play roles in guiding and constraining long-range regulation of gene expression. Disruption of TAD boundaries results in aberrant gene expression by exposing genes to inappropriate regulatory elements. Recent studies have shown that TAD disruption is often found in cancer cells and contributes to oncogenesis through two mechanisms. One mechanism locally disrupts domains by deleting or mutating a TAD boundary leading to fusion of the two adjacent TADs. The other mechanism involves genomic rearrangements that break up TADs and creates new ones without directly affecting TAD boundaries. Understanding the mechanisms by which TADs form and control long-range chromatin interactions will therefore not only provide insights into the mechanism of gene regulation in general, but will also reveal how genomic rearrangements and mutations in cancer genomes can lead to misregulation of oncogenes and tumor suppressors. PMID:27111891

  16. The Ty1-copia LTR retroelement family PARTC is highly conserved in conifers over 200 MY of evolution.

    PubMed

    Zuccolo, Andrea; Scofield, Douglas G; De Paoli, Emanuele; Morgante, Michele

    2015-08-15

    Long Terminal Repeat retroelements (LTR-RTs) are a major component of many plant genomes. Although well studied and described in angiosperms, their features and dynamics are poorly understood in gymnosperms. Representative complete copies of a Ty1-copia element isolate in Picea abies and named PARTC were identified in six other conifer species (Picea glauca, Pinus sylvestris, Pinus taeda, Abies sibirica, Taxus baccata and Juniperus communis) covering more than 200 million years of evolution. Here we characterized the structure of this element, assessed its abundance across conifers, studied the modes and timing of its amplification, and evaluated the degree of conservation of its extant copies at nucleotide level over distant species. We demonstrated that the element is ancient, abundant, widespread and its paralogous copies are present in the genera Picea, Pinus and Abies as an LTR-RT family. The amplification leading to the extant copies of PARTC occurred over long evolutionary times spanning 10s of MY and mostly took place after the speciation of the conifers analyzed. The level of conservation of PARTC is striking and may be explained by low substitution rates and limited removal mechanisms for LTR-RTs. These PARTC features and dynamics are representative of a more general scenario for LTR-RTs in gymnosperms quite different from that characterizing the vast majority of LTR-RT elements in angiosperms. Copyright © 2015 Elsevier B.V. All rights reserved.

  17. Evolution of Nova-Dependent Splicing Regulation in the Brain

    PubMed Central

    Živin, Marko; Darnell, Robert B

    2007-01-01

    A large number of alternative exons are spliced with tissue-specific patterns, but little is known about how such patterns have evolved. Here, we study the conservation of the neuron-specific splicing factors Nova1 and Nova2 and of the alternatively spliced exons they regulate in mouse brain. Whereas Nova RNA binding domains are 94% identical across vertebrate species, Nova-dependent splicing silencer and enhancer elements (YCAY clusters) show much greater divergence, as less than 50% of mouse YCAY clusters are conserved at orthologous positions in the zebrafish genome. To study the relation between the evolution of tissue-specific splicing and YCAY clusters, we compared the brain-specific splicing of Nova-regulated exons in zebrafish, chicken, and mouse. The presence of YCAY clusters in lower vertebrates invariably predicted conservation of brain-specific splicing across species, whereas their absence in lower vertebrates correlated with a loss of alternative splicing. We hypothesize that evolution of Nova-regulated splicing in higher vertebrates proceeds mainly through changes in cis-acting elements, that tissue-specific splicing might in some cases evolve in a single step corresponding to evolution of a YCAY cluster, and that the conservation level of YCAY clusters relates to the functions encoded by the regulated RNAs. PMID:17937501

  18. Genomic profiling of rice sperm cell transcripts reveals conserved and distinct elements in the flowering plant male germ lineage.

    PubMed

    Russell, Scott D; Gou, Xiaoping; Wong, Chui E; Wang, Xinkun; Yuan, Tong; Wei, Xiaoping; Bhalla, Prem L; Singh, Mohan B

    2012-08-01

    Genomic assay of sperm cell RNA provides insight into functional control, modes of regulation, and contributions of male gametes to double fertilization. Sperm cells of rice (Oryza sativa) were isolated from field-grown, disease-free plants and RNA was processed for use with the full-genome Affymetrix microarray. Comparison with Gene Expression Omnibus (GEO) reference arrays confirmed expressionally distinct gene profiles. A total of 10,732 distinct gene sequences were detected in sperm cells, of which 1668 were not expressed in pollen or seedlings. Pathways enriched in male germ cells included ubiquitin-mediated pathways, pathways involved in chromatin modeling including histones, histone modification and nonhistone epigenetic modification, and pathways related to RNAi and gene silencing. Genome-wide expression patterns in angiosperm sperm cells indicate common and divergent themes in the male germline that appear to be largely self-regulating through highly up-regulated chromatin modification pathways. A core of highly conserved genes appear common to all sperm cells, but evidence is still emerging that another class of genes have diverged in expression between monocots and dicots since their divergence. Sperm cell transcripts present at fusion may be transmitted through plasmogamy during double fertilization to effect immediate post-fertilization expression of early embryo and (or) endosperm development. © 2012 The Authors. New Phytologist © 2012 New Phytologist Trust.

  19. Genome-Wide Identification of the Alba Gene Family in Plants and Stress-Responsive Expression of the Rice Alba Genes.

    PubMed

    Verma, Jitendra Kumar; Wardhan, Vijay; Singh, Deepali; Chakraborty, Subhra; Chakraborty, Niranjan

    2018-03-28

    Architectural proteins play key roles in genome construction and regulate the expression of many genes, albeit the modulation of genome plasticity by these proteins is largely unknown. A critical screening of the architectural proteins in five crop species, viz., Oryza sativa , Zea mays , Sorghum bicolor , Cicer arietinum , and Vitis vinifera , and in the model plant Arabidopsis thaliana along with evolutionary relevant species such as Chlamydomonas reinhardtii , Physcomitrella patens , and Amborella trichopoda , revealed 9, 20, 10, 7, 7, 6, 1, 4, and 4 Alba (acetylation lowers binding affinity) genes, respectively. A phylogenetic analysis of the genes and of their counterparts in other plant species indicated evolutionary conservation and diversification. In each group, the structural components of the genes and motifs showed significant conservation. The chromosomal location of the Alba genes of rice ( OsAlba ), showed an unequal distribution on 8 of its 12 chromosomes. The expression profiles of the OsAlba genes indicated a distinct tissue-specific expression in the seedling, vegetative, and reproductive stages. The quantitative real-time PCR (qRT-PCR) analysis of the OsAlba genes confirmed their stress-inducible expression under multivariate environmental conditions and phytohormone treatments. The evaluation of the regulatory elements in 68 Alba genes from the 9 species studied led to the identification of conserved motifs and overlapping microRNA (miRNA) target sites, suggesting the conservation of their function in related proteins and a divergence in their biological roles across species. The 3D structure and the prediction of putative ligands and their binding sites for OsAlba proteins offered a key insight into the structure-function relationship. These results provide a comprehensive overview of the subtle genetic diversification of the OsAlba genes, which will help in elucidating their functional role in plants.

  20. Heterogeneous conservation of Dlx paralog co-expression in jawed vertebrates.

    PubMed

    Debiais-Thibaud, Mélanie; Metcalfe, Cushla J; Pollack, Jacob; Germon, Isabelle; Ekker, Marc; Depew, Michael; Laurenti, Patrick; Borday-Birraux, Véronique; Casane, Didier

    2013-01-01

    The Dlx gene family encodes transcription factors involved in the development of a wide variety of morphological innovations that first evolved at the origins of vertebrates or of the jawed vertebrates. This gene family expanded with the two rounds of genome duplications that occurred before jawed vertebrates diversified. It includes at least three bigene pairs sharing conserved regulatory sequences in tetrapods and teleost fish, but has been only partially characterized in chondrichthyans, the third major group of jawed vertebrates. Here we take advantage of developmental and molecular tools applied to the shark Scyliorhinus canicula to fill in the gap and provide an overview of the evolution of the Dlx family in the jawed vertebrates. These results are analyzed in the theoretical framework of the DDC (Duplication-Degeneration-Complementation) model. The genomic organisation of the catshark Dlx genes is similar to that previously described for tetrapods. Conserved non-coding elements identified in bony fish were also identified in catshark Dlx clusters and showed regulatory activity in transgenic zebrafish. Gene expression patterns in the catshark showed that there are some expression sites with high conservation of the expressed paralog(s) and other expression sites with events of paralog sub-functionalization during jawed vertebrate diversification, resulting in a wide variety of evolutionary scenarios within this gene family. Dlx gene expression patterns in the catshark show that there has been little neo-functionalization in Dlx genes over gnathostome evolution. In most cases, one tandem duplication and two rounds of vertebrate genome duplication have led to at least six Dlx coding sequences with redundant expression patterns followed by some instances of paralog sub-functionalization. Regulatory constraints such as shared enhancers, and functional constraints including gene pleiotropy, may have contributed to the evolutionary inertia leading to high redundancy between gene expression patterns.

  1. Mavericks, a novel class of giant transposable elements widespread in eukaryotes and related to DNA viruses.

    PubMed

    Pritham, Ellen J; Putliwala, Tasneem; Feschotte, Cédric

    2007-04-01

    We previously identified a group of atypical mobile elements designated Mavericks from the nematodes Caenorhabditis elegans and C. briggsae and the zebrafish Danio rerio. Here we present the results of comprehensive database searches of the genome sequences available, which reveal that Mavericks are widespread in invertebrates and non-mammalian vertebrates but show a patchy distribution in non-animal species, being present in the fungi Glomus intraradices and Phakopsora pachyrhizi and in several single-celled eukaryotes such as the ciliate Tetrahymena thermophila, the stramenopile Phytophthora infestans and the trichomonad Trichomonas vaginalis, but not detectable in plants. This distribution, together with comparative and phylogenetic analyses of Maverick-encoded proteins, is suggestive of an ancient origin of these elements in eukaryotes followed by lineage-specific losses and/or recurrent episodes of horizontal transmission. In addition, we report that Maverick elements have amplified recently to high copy numbers in T. vaginalis where they now occupy as much as 30% of the genome. Sequence analysis confirms that most Mavericks encode a retroviral-like integrase, but lack other open reading frames typically found in retroelements. Nevertheless, the length and conservation of the target site duplication created upon Maverick insertion (5- or 6-bp) is consistent with a role of the integrase-like protein in the integration of a double-stranded DNA transposition intermediate. Mavericks also display long terminal-inverted repeats but do not contain ORFs similar to proteins encoded by DNA transposons. Instead, Mavericks encode a conserved set of 5 to 9 genes (in addition to the integrase) that are predicted to encode proteins with homology to replication and packaging proteins of some bacteriophages and diverse eukaryotic double-stranded DNA viruses, including a DNA polymerase B homolog and putative capsid proteins. Based on these and other structural similarities, we speculate that Mavericks represent an evolutionary missing link between seemingly disparate invasive DNA elements that include bacteriophages, adenoviruses and eukaryotic linear plasmids.

  2. Genome Sequencing of Sulfolobus sp. A20 from Costa Rica and Comparative Analyses of the Putative Pathways of Carbon, Nitrogen, and Sulfur Metabolism in Various Sulfolobus Strains.

    PubMed

    Dai, Xin; Wang, Haina; Zhang, Zhenfeng; Li, Kuan; Zhang, Xiaoling; Mora-López, Marielos; Jiang, Chengying; Liu, Chang; Wang, Li; Zhu, Yaxin; Hernández-Ascencio, Walter; Dong, Zhiyang; Huang, Li

    2016-01-01

    The genome of Sulfolobus sp. A20 isolated from a hot spring in Costa Rica was sequenced. This circular genome of the strain is 2,688,317 bp in size and 34.8% in G+C content, and contains 2591 open reading frames (ORFs). Strain A20 shares ~95.6% identity at the 16S rRNA gene sequence level and <30% DNA-DNA hybridization (DDH) values with the most closely related known Sulfolobus species (i.e., Sulfolobus islandicus and Sulfolobus solfataricus ), suggesting that it represents a novel Sulfolobus species. Comparison of the genome of strain A20 with those of the type strains of S. solfataricus, Sulfolobus acidocaldarius, S. islandicus , and Sulfolobus tokodaii , which were isolated from geographically separated areas, identified 1801 genes conserved among all Sulfolobus species analyzed (core genes). Comparative genome analyses show that central carbon metabolism in Sulfolobus is highly conserved, and enzymes involved in the Entner-Doudoroff pathway, the tricarboxylic acid cycle and the CO 2 fixation pathways are predominantly encoded by the core genes. All Sulfolobus species encode genes required for the conversion of ammonium into glutamate/glutamine. Some Sulfolobus strains have gained the ability to utilize additional nitrogen source such as nitrate (i.e., S. islandicus strain REY15A, LAL14/1, M14.25, and M16.27) or urea (i.e., S. islandicus HEV10/4, S. tokodaii strain7, and S. metallicus DSM 6482). The strategies for sulfur metabolism are most diverse and least understood. S. tokodaii encodes sulfur oxygenase/reductase (SOR), whereas both S. islandicus and S. solfataricus contain genes for sulfur reductase (SRE). However, neither SOR nor SRE genes exist in the genome of strain A20, raising the possibility that an unknown pathway for the utilization of elemental sulfur may be present in the strain. The ability of Sulfolobus to utilize nitrate or sulfur is encoded by a gene cluster flanked by IS elements or their remnants. These clusters appear to have become fixed at a specific genomic site in some strains and lost in other strains during the course of evolution. The versatility in nitrogen and sulfur metabolism may represent adaptation of Sulfolobus to thriving in different habitats.

  3. Genome Sequencing of Sulfolobus sp. A20 from Costa Rica and Comparative Analyses of the Putative Pathways of Carbon, Nitrogen, and Sulfur Metabolism in Various Sulfolobus Strains

    PubMed Central

    Dai, Xin; Wang, Haina; Zhang, Zhenfeng; Li, Kuan; Zhang, Xiaoling; Mora-López, Marielos; Jiang, Chengying; Liu, Chang; Wang, Li; Zhu, Yaxin; Hernández-Ascencio, Walter; Dong, Zhiyang; Huang, Li

    2016-01-01

    The genome of Sulfolobus sp. A20 isolated from a hot spring in Costa Rica was sequenced. This circular genome of the strain is 2,688,317 bp in size and 34.8% in G+C content, and contains 2591 open reading frames (ORFs). Strain A20 shares ~95.6% identity at the 16S rRNA gene sequence level and <30% DNA-DNA hybridization (DDH) values with the most closely related known Sulfolobus species (i.e., Sulfolobus islandicus and Sulfolobus solfataricus), suggesting that it represents a novel Sulfolobus species. Comparison of the genome of strain A20 with those of the type strains of S. solfataricus, Sulfolobus acidocaldarius, S. islandicus, and Sulfolobus tokodaii, which were isolated from geographically separated areas, identified 1801 genes conserved among all Sulfolobus species analyzed (core genes). Comparative genome analyses show that central carbon metabolism in Sulfolobus is highly conserved, and enzymes involved in the Entner-Doudoroff pathway, the tricarboxylic acid cycle and the CO2 fixation pathways are predominantly encoded by the core genes. All Sulfolobus species encode genes required for the conversion of ammonium into glutamate/glutamine. Some Sulfolobus strains have gained the ability to utilize additional nitrogen source such as nitrate (i.e., S. islandicus strain REY15A, LAL14/1, M14.25, and M16.27) or urea (i.e., S. islandicus HEV10/4, S. tokodaii strain7, and S. metallicus DSM 6482). The strategies for sulfur metabolism are most diverse and least understood. S. tokodaii encodes sulfur oxygenase/reductase (SOR), whereas both S. islandicus and S. solfataricus contain genes for sulfur reductase (SRE). However, neither SOR nor SRE genes exist in the genome of strain A20, raising the possibility that an unknown pathway for the utilization of elemental sulfur may be present in the strain. The ability of Sulfolobus to utilize nitrate or sulfur is encoded by a gene cluster flanked by IS elements or their remnants. These clusters appear to have become fixed at a specific genomic site in some strains and lost in other strains during the course of evolution. The versatility in nitrogen and sulfur metabolism may represent adaptation of Sulfolobus to thriving in different habitats. PMID:27965637

  4. Transposon fingerprinting using low coverage whole genome shotgun sequencing in Cacao (Theobroma cacao L.) and related species

    PubMed Central

    2013-01-01

    Background Transposable elements (TEs) and other repetitive elements are a large and dynamically evolving part of eukaryotic genomes, especially in plants where they can account for a significant proportion of genome size. Their dynamic nature gives them the potential for use in identifying and characterizing crop germplasm. However, their repetitive nature makes them challenging to study using conventional methods of molecular biology. Next generation sequencing and new computational tools have greatly facilitated the investigation of TE variation within species and among closely related species. Results (i) We generated low-coverage Illumina whole genome shotgun sequencing reads for multiple individuals of cacao (Theobroma cacao) and related species. These reads were analysed using both an alignment/mapping approach and a de novo (graph based clustering) approach. (ii) A standard set of ultra-conserved orthologous sequences (UCOS) standardized TE data between samples and provided phylogenetic information on the relatedness of samples. (iii) The mapping approach proved highly effective within the reference species but underestimated TE abundance in interspecific comparisons relative to the de novo methods. (iv) Individual T. cacao accessions have unique patterns of TE abundance indicating that the TE composition of the genome is evolving actively within this species. (v) LTR/Gypsy elements are the most abundant, comprising c.10% of the genome. (vi) Within T. cacao the retroelement families show an order of magnitude greater sequence variability than the DNA transposon families. (vii) Theobroma grandiflorum has a similar TE composition to T. cacao, but the related genus Herrania is rather different, with LTRs making up a lower proportion of the genome, perhaps because of a massive presence (c. 20%) of distinctive low complexity satellite-like repeats in this genome. Conclusions (i) Short read alignment/mapping to reference TE contigs provides a simple and effective method of investigating intraspecific differences in TE composition. It is not appropriate for comparing repetitive elements across the species boundaries, for which de novo methods are more appropriate. (ii) Individual T. cacao accessions have unique spectra of TE composition indicating active evolution of TE abundance within this species. TE patterns could potentially be used as a “fingerprint” to identify and characterize cacao accessions. PMID:23883295

  5. Tissue-specific DNA methylation is conserved across human, mouse, and rat, and driven by primary sequence conservation.

    PubMed

    Zhou, Jia; Sears, Renee L; Xing, Xiaoyun; Zhang, Bo; Li, Daofeng; Rockweiler, Nicole B; Jang, Hyo Sik; Choudhary, Mayank N K; Lee, Hyung Joo; Lowdon, Rebecca F; Arand, Jason; Tabers, Brianne; Gu, C Charles; Cicero, Theodore J; Wang, Ting

    2017-09-12

    Uncovering mechanisms of epigenome evolution is an essential step towards understanding the evolution of different cellular phenotypes. While studies have confirmed DNA methylation as a conserved epigenetic mechanism in mammalian development, little is known about the conservation of tissue-specific genome-wide DNA methylation patterns. Using a comparative epigenomics approach, we identified and compared the tissue-specific DNA methylation patterns of rat against those of mouse and human across three shared tissue types. We confirmed that tissue-specific differentially methylated regions are strongly associated with tissue-specific regulatory elements. Comparisons between species revealed that at a minimum 11-37% of tissue-specific DNA methylation patterns are conserved, a phenomenon that we define as epigenetic conservation. Conserved DNA methylation is accompanied by conservation of other epigenetic marks including histone modifications. Although a significant amount of locus-specific methylation is epigenetically conserved, the majority of tissue-specific DNA methylation is not conserved across the species and tissue types that we investigated. Examination of the genetic underpinning of epigenetic conservation suggests that primary sequence conservation is a driving force behind epigenetic conservation. In contrast, evolutionary dynamics of tissue-specific DNA methylation are best explained by the maintenance or turnover of binding sites for important transcription factors. Our study extends the limited literature of comparative epigenomics and suggests a new paradigm for epigenetic conservation without genetic conservation through analysis of transcription factor binding sites.

  6. Synteny conservation between the Prunus genome and both the present and ancestral Arabidopsis genomes

    PubMed Central

    Jung, Sook; Main, Dorrie; Staton, Margaret; Cho, Ilhyung; Zhebentyayeva, Tatyana; Arús, Pere; Abbott, Albert

    2006-01-01

    Background Due to the lack of availability of large genomic sequences for peach or other Prunus species, the degree of synteny conservation between the Prunus species and Arabidopsis has not been systematically assessed. Using the recently available peach EST sequences that are anchored to Prunus genetic maps and to peach physical map, we analyzed the extent of conserved synteny between the Prunus and the Arabidopsis genomes. The reconstructed pseudo-ancestral Arabidopsis genome, existed prior to the proposed recent polyploidy event, was also utilized in our analysis to further elucidate the evolutionary relationship. Results We analyzed the synteny conservation between the Prunus and the Arabidopsis genomes by comparing 475 peach ESTs that are anchored to Prunus genetic maps and their Arabidopsis homologs detected by sequence similarity. Microsyntenic regions were detected between all five Arabidopsis chromosomes and seven of the eight linkage groups of the Prunus reference map. An additional 1097 peach ESTs that are anchored to 431 BAC contigs of the peach physical map and their Arabidopsis homologs were also analyzed. Microsyntenic regions were detected in 77 BAC contigs. The syntenic regions from both data sets were short and contained only a couple of conserved gene pairs. The synteny between peach and Arabidopsis was fragmentary; all the Prunus linkage groups containing syntenic regions matched to more than two different Arabidopsis chromosomes, and most BAC contigs with multiple conserved syntenic regions corresponded to multiple Arabidopsis chromosomes. Using the same peach EST datasets and their Arabidopsis homologs, we also detected conserved syntenic regions in the pseudo-ancestral Arabidopsis genome. In many cases, the gene order and content of peach regions was more conserved in the ancestral genome than in the present Arabidopsis region. Statistical significance of each syntenic group was calculated using simulated Arabidopsis genome. Conclusion We report here the result of the first extensive analysis of the conserved microsynteny using DNA sequences across the Prunus genome and their Arabidopsis homologs. Our study also illustrates that both the ancestral and present Arabidopsis genomes can provide a useful resource for marker saturation and candidate gene search, as well as elucidating evolutionary relationships between species. PMID:16615871

  7. VRprofile: gene-cluster-detection-based profiling of virulence and antibiotic resistance traits encoded within genome sequences of pathogenic bacteria.

    PubMed

    Li, Jun; Tai, Cui; Deng, Zixin; Zhong, Weihong; He, Yongqun; Ou, Hong-Yu

    2017-01-10

    VRprofile is a Web server that facilitates rapid investigation of virulence and antibiotic resistance genes, as well as extends these trait transfer-related genetic contexts, in newly sequenced pathogenic bacterial genomes. The used backend database MobilomeDB was firstly built on sets of known gene cluster loci of bacterial type III/IV/VI/VII secretion systems and mobile genetic elements, including integrative and conjugative elements, prophages, class I integrons, IS elements and pathogenicity/antibiotic resistance islands. VRprofile is thus able to co-localize the homologs of these conserved gene clusters using HMMer or BLASTp searches. With the integration of the homologous gene cluster search module with a sequence composition module, VRprofile has exhibited better performance for island-like region predictions than the other widely used methods. In addition, VRprofile also provides an integrated Web interface for aligning and visualizing identified gene clusters with MobilomeDB-archived gene clusters, or a variety set of bacterial genomes. VRprofile might contribute to meet the increasing demands of re-annotations of bacterial variable regions, and aid in the real-time definitions of disease-relevant gene clusters in pathogenic bacteria of interest. VRprofile is freely available at http://bioinfo-mml.sjtu.edu.cn/VRprofile. © The Author 2017. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.

  8. Comparative analysis of the Hrp pathogenicity island of Rubus- and Spiraeoideae-infecting Erwinia amylovora strains identifies the IT region as a remnant of an integrative conjugative element.

    PubMed

    Mann, Rachel A; Blom, Jochen; Bühlmann, Andreas; Plummer, Kim M; Beer, Steven V; Luck, Joanne E; Goesmann, Alexander; Frey, Jürg E; Rodoni, Brendan C; Duffy, Brion; Smits, Theo H M

    2012-08-01

    The Hrp pathogenicity island (hrpPAI) of Erwinia amylovora not only encodes a type III secretion system (T3SS) and other genes required for pathogenesis on host plants, but also includes the so-called island transfer (IT) region, a region that originates from an integrative conjugative element (ICE). Comparative genomic analysis of the IT regions of two Spiraeoideae- and three Rubus-infecting strains revealed that the regions in Spiraeoideae-infecting strains were syntenic and highly conserved in length and genetic information, but that the IT regions of the Rubus-infecting strains varied in gene content and length, showing a mosaic structure. None of the ICEs in E. amylovora strains were complete, as conserved ICE genes and the left border were missing, probably due to reductive genome evolution. Comparison of the hrpPAI region of E. amylovora strains to syntenic regions from other Erwinia spp. indicates that the hrpPAI and the IT regions are the result of several insertion and deletion events that have occurred within the ICE. It also suggests that the T3SS was present in a common ancestor of the pathoadapted Erwinia spp. and that insertion and deletion events in the IT region occurred during speciation. Copyright © 2012 Elsevier B.V. All rights reserved.

  9. Comparative genomics of ParaHox clusters of teleost fishes: gene cluster breakup and the retention of gene sets following whole genome duplications

    PubMed Central

    Siegel, Nicol; Hoegg, Simone; Salzburger, Walter; Braasch, Ingo; Meyer, Axel

    2007-01-01

    Background The evolutionary lineage leading to the teleost fish underwent a whole genome duplication termed FSGD or 3R in addition to two prior genome duplications that took place earlier during vertebrate evolution (termed 1R and 2R). Resulting from the FSGD, additional copies of genes are present in fish, compared to tetrapods whose lineage did not experience the 3R genome duplication. Interestingly, we find that ParaHox genes do not differ in number in extant teleost fishes despite their additional genome duplication from the genomic situation in mammals, but they are distributed over twice as many paralogous regions in fish genomes. Results We determined the DNA sequence of the entire ParaHox C1 paralogon in the East African cichlid fish Astatotilapia burtoni, and compared it to orthologous regions in other vertebrate genomes as well as to the paralogous vertebrate ParaHox D paralogons. Evolutionary relationships among genes from these four chromosomal regions were studied with several phylogenetic algorithms. We provide evidence that the genes of the ParaHox C paralogous cluster are duplicated in teleosts, just as it had been shown previously for the D paralogon genes. Overall, however, synteny and cluster integrity seems to be less conserved in ParaHox gene clusters than in Hox gene clusters. Comparative analyses of non-coding sequences uncovered conserved, possibly co-regulatory elements, which are likely to contain promoter motives of the genes belonging to the ParaHox paralogons. Conclusion There seems to be strong stabilizing selection for gene order as well as gene orientation in the ParaHox C paralogon, since with a few exceptions, only the lengths of the introns and intergenic regions differ between the distantly related species examined. The high degree of evolutionary conservation of this gene cluster's architecture in particular – but possibly clusters of genes more generally – might be linked to the presence of promoter, enhancer or inhibitor motifs that serve to regulate more than just one gene. Therefore, deletions, inversions or relocations of individual genes could destroy the regulation of the clustered genes in this region. The existence of such a regulation network might explain the evolutionary conservation of gene order and orientation over the course of hundreds of millions of years of vertebrate evolution. Another possible explanation for the highly conserved gene order might be the existence of a regulator not located immediately next to its corresponding gene but further away since a relocation or inversion would possibly interrupt this interaction. Different ParaHox clusters were found to have experienced differential gene loss in teleosts. Yet the complete set of these homeobox genes was maintained, albeit distributed over almost twice the number of chromosomes. Selection due to dosage effects and/or stoichiometric disturbance might act more strongly to maintain a modal number of homeobox genes (and possibly transcription factors more generally) per genome, yet permit the accumulation of other (non regulatory) genes associated with these homeobox gene clusters. PMID:17822543

  10. Comparative sequence analysis of the X-inactivation center region in mouse, human, and bovine.

    PubMed

    Chureau, Corinne; Prissette, Marine; Bourdet, Agnès; Barbe, Valérie; Cattolico, Laurence; Jones, Louis; Eggen, André; Avner, Philip; Duret, Laurent

    2002-06-01

    We have sequenced to high levels of accuracy 714-kb and 233-kb regions of the mouse and bovine X-inactivation centers (Xic), respectively, centered on the Xist gene. This has provided the basis for a fully annotated comparative analysis of the mouse Xic with the 2.3-Mb orthologous region in human and has allowed a three-way species comparison of the core central region, including the Xist gene. These comparisons have revealed conserved genes, both coding and noncoding, conserved CpG islands and, more surprisingly, conserved pseudogenes. The distribution of repeated elements, especially LINE repeats, in the mouse Xic region when compared to the rest of the genome does not support the hypothesis of a role for these repeat elements in the spreading of X inactivation. Interestingly, an asymmetric distribution of LINE elements on the two DNA strands was observed in the three species, not only within introns but also in intergenic regions. This feature is suggestive of important transcriptional activity within these intergenic regions. In silico prediction followed by experimental analysis has allowed four new genes, Cnbp2, Ftx, Jpx, and Ppnx, to be identified and novel, widespread, complex, and apparently noncoding transcriptional activity to be characterized in a region 5' of Xist that was recently shown to attract histone modification early after the onset of X inactivation.

  11. Identification of Cis-Acting Promoter Elements in Cold- and Dehydration-Induced Transcriptional Pathways in Arabidopsis, Rice, and Soybean

    PubMed Central

    Maruyama, Kyonoshin; Todaka, Daisuke; Mizoi, Junya; Yoshida, Takuya; Kidokoro, Satoshi; Matsukura, Satoko; Takasaki, Hironori; Sakurai, Tetsuya; Yamamoto, Yoshiharu Y.; Yoshiwara, Kyouko; Kojima, Mikiko; Sakakibara, Hitoshi; Shinozaki, Kazuo; Yamaguchi-Shinozaki, Kazuko

    2012-01-01

    The genomes of three plants, Arabidopsis (Arabidopsis thaliana), rice (Oryza sativa), and soybean (Glycine max), have been sequenced, and their many genes and promoters have been predicted. In Arabidopsis, cis-acting promoter elements involved in cold- and dehydration-responsive gene expression have been extensively analysed; however, the characteristics of such cis-acting promoter sequences in cold- and dehydration-inducible genes of rice and soybean remain to be clarified. In this study, we performed microarray analyses using the three species, and compared characteristics of identified cold- and dehydration-inducible genes. Transcription profiles of the cold- and dehydration-responsive genes were similar among these three species, showing representative upregulated (dehydrin/LEA) and downregulated (photosynthesis-related) genes. All (46 = 4096) hexamer sequences in the promoters of the three species were investigated, revealing the frequency of conserved sequences in cold- and dehydration-inducible promoters. A core sequence of the abscisic acid-responsive element (ABRE) was the most conserved in dehydration-inducible promoters of all three species, suggesting that transcriptional regulation for dehydration-inducible genes is similar among these three species, with the ABRE-dependent transcriptional pathway. In contrast, for cold-inducible promoters, the conserved hexamer sequences were diversified among these three species, suggesting the existence of diverse transcriptional regulatory pathways for cold-inducible genes among the species. PMID:22184637

  12. Phylogenetic Distribution of CRISPR-Cas Systems in Antibiotic-Resistant Pseudomonas aeruginosa.

    PubMed

    van Belkum, Alex; Soriaga, Leah B; LaFave, Matthew C; Akella, Srividya; Veyrieras, Jean-Baptiste; Barbu, E Magda; Shortridge, Dee; Blanc, Bernadette; Hannum, Gregory; Zambardi, Gilles; Miller, Kristofer; Enright, Mark C; Mugnier, Nathalie; Brami, Daniel; Schicklin, Stéphane; Felderman, Martina; Schwartz, Ariel S; Richardson, Toby H; Peterson, Todd C; Hubby, Bolyn; Cady, Kyle C

    2015-11-24

    Pseudomonas aeruginosa is an antibiotic-refractory pathogen with a large genome and extensive genotypic diversity. Historically, P. aeruginosa has been a major model system for understanding the molecular mechanisms underlying type I clustered regularly interspaced short palindromic repeat (CRISPR) and CRISPR-associated protein (CRISPR-Cas)-based bacterial immune system function. However, little information on the phylogenetic distribution and potential role of these CRISPR-Cas systems in molding the P. aeruginosa accessory genome and antibiotic resistance elements is known. Computational approaches were used to identify and characterize CRISPR-Cas systems within 672 genomes, and in the process, we identified a previously unreported and putatively mobile type I-C P. aeruginosa CRISPR-Cas system. Furthermore, genomes harboring noninhibited type I-F and I-E CRISPR-Cas systems were on average ~300 kb smaller than those without a CRISPR-Cas system. In silico analysis demonstrated that the accessory genome (n = 22,036 genes) harbored the majority of identified CRISPR-Cas targets. We also assembled a global spacer library that aided the identification of difficult-to-characterize mobile genetic elements within next-generation sequencing (NGS) data and allowed CRISPR typing of a majority of P. aeruginosa strains. In summary, our analysis demonstrated that CRISPR-Cas systems play an important role in shaping the accessory genomes of globally distributed P. aeruginosa isolates. P. aeruginosa is both an antibiotic-refractory pathogen and an important model system for type I CRISPR-Cas bacterial immune systems. By combining the genome sequences of 672 newly and previously sequenced genomes, we were able to provide a global view of the phylogenetic distribution, conservation, and potential targets of these systems. This analysis identified a new and putatively mobile P. aeruginosa CRISPR-Cas subtype, characterized the diverse distribution of known CRISPR-inhibiting genes, and provided a potential new use for CRISPR spacer libraries in accessory genome analysis. Our data demonstrated the importance of CRISPR-Cas systems in modulating the accessory genomes of globally distributed strains while also providing substantial data for subsequent genomic and experimental studies in multiple fields. Understanding why certain genotypes of P. aeruginosa are clinically prevalent and adept at horizontally acquiring virulence and antibiotic resistance elements is of major clinical and economic importance. Copyright © 2015 van Belkum et al.

  13. Genome-wide analysis of the DNA-binding with one zinc finger (Dof) transcription factor family in bananas.

    PubMed

    Dong, Chen; Hu, Huigang; Xie, Jianghui

    2016-12-01

    DNA-binding with one finger (Dof) domain proteins are a multigene family of plant-specific transcription factors involved in numerous aspects of plant growth and development. In this study, we report a genome-wide search for Musa acuminata Dof (MaDof) genes and their expression profiles at different developmental stages and in response to various abiotic stresses. In addition, a complete overview of the Dof gene family in bananas is presented, including the gene structures, chromosomal locations, cis-regulatory elements, conserved protein domains, and phylogenetic inferences. Based on the genome-wide analysis, we identified 74 full-length protein-coding MaDof genes unevenly distributed on 11 chromosomes. Phylogenetic analysis with Dof members from diverse plant species showed that MaDof genes can be classified into four subgroups (StDof I, II, III, and IV). The detailed genomic information of the MaDof gene homologs in the present study provides opportunities for functional analyses to unravel the exact role of the genes in plant growth and development.

  14. An integrated map of the genome of the tubercle bacillus, Mycobacterium tuberculosis H37Rv, and comparison with Mycobacterium leprae.

    PubMed Central

    Philipp, W J; Poulet, S; Eiglmeier, K; Pascopella, L; Balasubramanian, V; Heym, B; Bergh, S; Bloom, B R; Jacobs, W R; Cole, S T

    1996-01-01

    An integrated map of the genome of the tubercle bacillus, Mycobacterium tuberculosis, was constructed by using a twin-pronged approach. Pulsed-field gel electrophoretic analysis enabled cleavage sites for Asn I and Dra I to be positioned on the 4.4-Mb circular chromosome, while, in parallel, clones from two cosmid libraries were ordered into contigs by means of fingerprinting and hybridization mapping. The resultant contig map was readily correlated with the physical map of the genome via the landmarked restriction sites. Over 165 genes and markers were localized on the integrated map, thus enabling comparisons with the leprosy bacillus, Mycobacterium leprae, to be undertaken. Mycobacterial genomes appear to have evolved as mosaic structures since extended segments with conserved gene order and organization are interspersed with different flanking regions. Repetitive sequences and insertion elements are highly abundant in M. tuberculosis, but the distribution of IS6110 is apparently nonrandom. Images Fig. 1 Fig. 2 PMID:8610181

  15. Using genomics to characterize evolutionary potential for conservation of wild populations

    PubMed Central

    Harrisson, Katherine A; Pavlova, Alexandra; Telonis-Scott, Marina; Sunnucks, Paul

    2014-01-01

    Genomics promises exciting advances towards the important conservation goal of maximizing evolutionary potential, notwithstanding associated challenges. Here, we explore some of the complexity of adaptation genetics and discuss the strengths and limitations of genomics as a tool for characterizing evolutionary potential in the context of conservation management. Many traits are polygenic and can be strongly influenced by minor differences in regulatory networks and by epigenetic variation not visible in DNA sequence. Much of this critical complexity is difficult to detect using methods commonly used to identify adaptive variation, and this needs appropriate consideration when planning genomic screens, and when basing management decisions on genomic data. When the genomic basis of adaptation and future threats are well understood, it may be appropriate to focus management on particular adaptive traits. For more typical conservations scenarios, we argue that screening genome-wide variation should be a sensible approach that may provide a generalized measure of evolutionary potential that accounts for the contributions of small-effect loci and cryptic variation and is robust to uncertainty about future change and required adaptive response(s). The best conservation outcomes should be achieved when genomic estimates of evolutionary potential are used within an adaptive management framework. PMID:25553064

  16. The nuclear OXPHOS genes in insecta: a common evolutionary origin, a common cis-regulatory motif, a common destiny for gene duplicates

    PubMed Central

    Porcelli, Damiano; Barsanti, Paolo; Pesole, Graziano; Caggese, Corrado

    2007-01-01

    Background When orthologous sequences from species distributed throughout an optimal range of divergence times are available, comparative genomics is a powerful tool to address problems such as the identification of the forces that shape gene structure during evolution, although the functional constraints involved may vary in different genes and lineages. Results We identified and annotated in the MitoComp2 dataset the orthologs of 68 nuclear genes controlling oxidative phosphorylation in 11 Drosophilidae species and in five non-Drosophilidae insects, and compared them with each other and with their counterparts in three vertebrates (Fugu rubripes, Danio rerio and Homo sapiens) and in the cnidarian Nematostella vectensis, taking into account conservation of gene structure and regulatory motifs, and preservation of gene paralogs in the genome. Comparative analysis indicates that the ancestral insect OXPHOS genes were intron rich and that extensive intron loss and lineage-specific intron gain occurred during evolution. Comparison with vertebrates and cnidarians also shows that many OXPHOS gene introns predate the cnidarian/Bilateria evolutionary split. The nuclear respiratory gene element (NRG) has played a key role in the evolution of the insect OXPHOS genes; it is constantly conserved in the OXPHOS orthologs of all the insect species examined, while their duplicates either completely lack the element or possess only relics of the motif. Conclusion Our observations reinforce the notion that the common ancestor of most animal phyla had intron-rich gene, and suggest that changes in the pattern of expression of the gene facilitate the fixation of duplications in the genome and the development of novel genetic functions. PMID:18315839

  17. Analysis of a new homozygous deletion in the tumor suppressor region at 3p12.3 reveals two novel intronic noncoding RNA genes.

    PubMed

    Angeloni, Debora; ter Elst, Arja; Wei, Ming Hui; van der Veen, Anneke Y; Braga, Eleonora A; Klimov, Eugene A; Timmer, Tineke; Korobeinikova, Luba; Lerman, Michael I; Buys, Charles H C M

    2006-07-01

    Homozygous deletions or loss of heterozygosity (LOH) at human chromosome band 3p12 are consistent features of lung and other malignancies, suggesting the presence of a tumor suppressor gene(s) (TSG) at this location. Only one gene has been cloned thus far from the overlapping region deleted in lung and breast cancer cell lines U2020, NCI H2198, and HCC38. It is DUTT1 (Deleted in U Twenty Twenty), also known as ROBO1, FLJ21882, and SAX3, according to HUGO. DUTT1, the human ortholog of the fly gene ROBO, has homology with NCAM proteins. Extensive analyses of DUTT1 in lung cancer have not revealed any mutations, suggesting that another gene(s) at this location could be of importance in lung cancer initiation and progression. Here, we report the discovery of a new, small, homozygous deletion in the small cell lung cancer (SCLC) cell line GLC20, nested in the overlapping, critical region. The deletion was delineated using several polymorphic markers and three overlapping P1 phage clones. Fiber-FISH experiments revealed the deletion was approximately 130 kb. Comparative genomic sequence analysis uncovered short sequence elements highly conserved among mammalian genomes and the chicken genome. The discovery of two EST clusters within the deleted region led to the isolation of two noncoding RNA (ncRNA) genes. These were subsequently found differentially expressed in various tumors when compared to their normal tissues. The ncRNA and other highly conserved sequence elements in the deleted region may represent miRNA targets of importance in cancer initiation or progression. Published 2006 Wiley-Liss, Inc.

  18. The mariner transposons belonging to the irritans subfamily were maintained in chordate genomes by vertical transmission.

    PubMed

    Sinzelle, Ludivine; Chesneau, Albert; Bigot, Yves; Mazabraud, André; Pollet, Nicolas

    2006-01-01

    Mariner-like elements (MLEs) belong to the Tc1-mariner superfamily of DNA transposons, which is very widespread in animal genomes. We report here the first complete description of a MLE, Xtmar1, within the genome of a poikilotherm vertebrate, the amphibian Xenopus tropicalis. A close relative, XlMLE, is also characterized within the genome of a sibling species, Xenopus laevis. The phylogenetic analysis of the relationships between MLE transposases reveals that Xtmar1 is closely related to Hsmar2 and Bytmar1 and that together they form a second distinct lineage of the irritans subfamily. All members of this lineage are also characterized by the 36- to 43-bp size of their imperfectly conserved inverted terminal repeats and by the -8-bp motif located at their outer extremity. Since XlMLE, Xlmar1, and Hsmar2 are present in species located at both extremities of the vertebrate evolutionary tree, we looked for MLE relatives belonging to the same subfamily in the available sequencing projects using the amino acid consensus sequence of the Hsmar2 transposase as an in silico probe. We found that irritans MLEs are present in chordate genomes including most craniates. This therefore suggests that these elements have been present within chordate genomes for 750 Myr and that the main way they have been maintained in these species has been via vertical transmission. The very small number of stochastic losses observed in the data available suggests that their inactivation during evolution has been very slow.

  19. Three-Dimensional RNA Structure of the Major HIV-1 Packaging Signal Region

    PubMed Central

    Stephenson, James D.; Li, Haitao; Kenyon, Julia C.; Symmons, Martyn; Klenerman, Dave; Lever, Andrew M.L.

    2013-01-01

    Summary HIV-1 genomic RNA has a noncoding 5′ region containing sequential conserved structural motifs that control many parts of the life cycle. Very limited data exist on their three-dimensional (3D) conformation and, hence, how they work structurally. To assemble a working model, we experimentally reassessed secondary structure elements of a 240-nt region and used single-molecule distances, derived from fluorescence resonance energy transfer, between defined locations in these elements as restraints to drive folding of the secondary structure into a 3D model with an estimated resolution below 10 Å. The folded 3D model satisfying the data is consensual with short nuclear-magnetic-resonance-solved regions and reveals previously unpredicted motifs, offering insight into earlier functional assays. It is a 3D representation of this entire region, with implications for RNA dimerization and protein binding during regulatory steps. The structural information of this highly conserved region of the virus has the potential to reveal promising therapeutic targets. PMID:23685210

  20. Adenovirus Delivered Short Hairpin RNA Targeting a Conserved Site in the 5′ Non-Translated Region Inhibits All Four Serotypes of Dengue Viruses

    PubMed Central

    Korrapati, Anil Babu; Swaminathan, Gokul; Singh, Aarti; Khanna, Navin; Swaminathan, Sathyamangalam

    2012-01-01

    Background Dengue is a mosquito-borne viral disease caused by four closely related serotypes of Dengue viruses (DENVs). This disease whose symptoms range from mild fever to potentially fatal haemorrhagic fever and hypovolemic shock, threatens nearly half the global population. There is neither a preventive vaccine nor an effective antiviral therapy against dengue disease. The difference between severe and mild disease appears to be dependent on the viral load. Early diagnosis may enable timely therapeutic intervention to blunt disease severity by reducing the viral load. Harnessing the therapeutic potential of RNA interference (RNAi) to attenuate DENV replication may offer one approach to dengue therapy. Methodology/Principal Findings We screened the non-translated regions (NTRs) of the RNA genomes of representative members of the four DENV serotypes for putative siRNA targets mapping to known transcription/translation regulatory elements. We identified a target site in the 5′ NTR that maps to the 5′ upstream AUG region, a highly conserved cis-acting element essential for viral replication. We used a replication-defective human adenovirus type 5 (AdV5) vector to deliver a short-hairpin RNA (shRNA) targeting this site into cells. We show that this shRNA matures to the cognate siRNA and is able to inhibit effectively antigen secretion, viral RNA replication and infectious virus production by all four DENV serotypes. Conclusion/Significance The data demonstrate the feasibility of using AdV5-mediated delivery of shRNAs targeting conserved sites in the viral genome to achieve inhibition of all four DENV serotypes. This paves the way towards exploration of RNAi as a possible therapeutic strategy to curtail DENV infection. PMID:22848770

  1. Global mapping of DNA conformational flexibility on Saccharomyces cerevisiae.

    PubMed

    Menconi, Giulia; Bedini, Andrea; Barale, Roberto; Sbrana, Isabella

    2015-04-01

    In this study we provide the first comprehensive map of DNA conformational flexibility in Saccharomyces cerevisiae complete genome. Flexibility plays a key role in DNA supercoiling and DNA/protein binding, regulating DNA transcription, replication or repair. Specific interest in flexibility analysis concerns its relationship with human genome instability. Enrichment in flexible sequences has been detected in unstable regions of human genome defined fragile sites, where genes map and carry frequent deletions and rearrangements in cancer. Flexible sequences have been suggested to be the determinants of fragile gene proneness to breakage; however, their actual role and properties remain elusive. Our in silico analysis carried out genome-wide via the StabFlex algorithm, shows the conserved presence of highly flexible regions in budding yeast genome as well as in genomes of other Saccharomyces sensu stricto species. Flexibile peaks in S. cerevisiae identify 175 ORFs mapping on their 3'UTR, a region affecting mRNA translation, localization and stability. (TA)n repeats of different extension shape the central structure of peaks and co-localize with polyadenylation efficiency element (EE) signals. ORFs with flexible peaks share common features. Transcripts are characterized by decreased half-life: this is considered peculiar of genes involved in regulatory systems with high turnover; consistently, their function affects biological processes such as cell cycle regulation or stress response. Our findings support the functional importance of flexibility peaks, suggesting that the flexible sequence may be derived by an expansion of canonical TAYRTA polyadenylation efficiency element. The flexible (TA)n repeat amplification could be the outcome of an evolutionary neofunctionalization leading to a differential 3'-end processing and expression regulation in genes with peculiar function. Our study provides a new support to the functional role of flexibility in genomes and a strategy for its characterization inside human fragile sites.

  2. Global Mapping of DNA Conformational Flexibility on Saccharomyces cerevisiae

    PubMed Central

    Menconi, Giulia; Bedini, Andrea; Barale, Roberto; Sbrana, Isabella

    2015-01-01

    In this study we provide the first comprehensive map of DNA conformational flexibility in Saccharomyces cerevisiae complete genome. Flexibility plays a key role in DNA supercoiling and DNA/protein binding, regulating DNA transcription, replication or repair. Specific interest in flexibility analysis concerns its relationship with human genome instability. Enrichment in flexible sequences has been detected in unstable regions of human genome defined fragile sites, where genes map and carry frequent deletions and rearrangements in cancer. Flexible sequences have been suggested to be the determinants of fragile gene proneness to breakage; however, their actual role and properties remain elusive. Our in silico analysis carried out genome-wide via the StabFlex algorithm, shows the conserved presence of highly flexible regions in budding yeast genome as well as in genomes of other Saccharomyces sensu stricto species. Flexibile peaks in S. cerevisiae identify 175 ORFs mapping on their 3’UTR, a region affecting mRNA translation, localization and stability. (TA)n repeats of different extension shape the central structure of peaks and co-localize with polyadenylation efficiency element (EE) signals. ORFs with flexible peaks share common features. Transcripts are characterized by decreased half-life: this is considered peculiar of genes involved in regulatory systems with high turnover; consistently, their function affects biological processes such as cell cycle regulation or stress response. Our findings support the functional importance of flexibility peaks, suggesting that the flexible sequence may be derived by an expansion of canonical TAYRTA polyadenylation efficiency element. The flexible (TA)n repeat amplification could be the outcome of an evolutionary neofunctionalization leading to a differential 3’-end processing and expression regulation in genes with peculiar function. Our study provides a new support to the functional role of flexibility in genomes and a strategy for its characterization inside human fragile sites. PMID:25860149

  3. A Comparative Encyclopedia of DNA Elements in the Mouse Genome

    PubMed Central

    Yue, Feng; Cheng, Yong; Breschi, Alessandra; Vierstra, Jeff; Wu, Weisheng; Ryba, Tyrone; Sandstrom, Richard; Ma, Zhihai; Davis, Carrie; Pope, Benjamin D.; Shen, Yin; Pervouchine, Dmitri D.; Djebali, Sarah; Thurman, Bob; Kaul, Rajinder; Rynes, Eric; Kirilusha, Anthony; Marinov, Georgi K.; Williams, Brian A.; Trout, Diane; Amrhein, Henry; Fisher-Aylor, Katherine; Antoshechkin, Igor; DeSalvo, Gilberto; See, Lei-Hoon; Fastuca, Meagan; Drenkow, Jorg; Zaleski, Chris; Dobin, Alex; Prieto, Pablo; Lagarde, Julien; Bussotti, Giovanni; Tanzer, Andrea; Denas, Olgert; Li, Kanwei; Bender, M. A.; Zhang, Miaohua; Byron, Rachel; Groudine, Mark T.; McCleary, David; Pham, Long; Ye, Zhen; Kuan, Samantha; Edsall, Lee; Wu, Yi-Chieh; Rasmussen, Matthew D.; Bansal, Mukul S.; Keller, Cheryl A.; Morrissey, Christapher S.; Mishra, Tejaswini; Jain, Deepti; Dogan, Nergiz; Harris, Robert S.; Cayting, Philip; Kawli, Trupti; Boyle, Alan P.; Euskirchen, Ghia; Kundaje, Anshul; Lin, Shin; Lin, Yiing; Jansen, Camden; Malladi, Venkat S.; Cline, Melissa S.; Erickson, Drew T.; Kirkup, Vanessa M; Learned, Katrina; Sloan, Cricket A.; Rosenbloom, Kate R.; de Sousa, Beatriz Lacerda; Beal, Kathryn; Pignatelli, Miguel; Flicek, Paul; Lian, Jin; Kahveci, Tamer; Lee, Dongwon; Kent, W. James; Santos, Miguel Ramalho; Herrero, Javier; Notredame, Cedric; Johnson, Audra; Vong, Shinny; Lee, Kristen; Bates, Daniel; Neri, Fidencio; Diegel, Morgan; Canfield, Theresa; Sabo, Peter J.; Wilken, Matthew S.; Reh, Thomas A.; Giste, Erika; Shafer, Anthony; Kutyavin, Tanya; Haugen, Eric; Dunn, Douglas; Reynolds, Alex P.; Neph, Shane; Humbert, Richard; Hansen, R. Scott; De Bruijn, Marella; Selleri, Licia; Rudensky, Alexander; Josefowicz, Steven; Samstein, Robert; Eichler, Evan E.; Orkin, Stuart H.; Levasseur, Dana; Papayannopoulou, Thalia; Chang, Kai-Hsin; Skoultchi, Arthur; Gosh, Srikanta; Disteche, Christine; Treuting, Piper; Wang, Yanli; Weiss, Mitchell J.; Blobel, Gerd A.; Good, Peter J.; Lowdon, Rebecca F.; Adams, Leslie B.; Zhou, Xiao-Qiao; Pazin, Michael J.; Feingold, Elise A.; Wold, Barbara; Taylor, James; Kellis, Manolis; Mortazavi, Ali; Weissman, Sherman M.; Stamatoyannopoulos, John; Snyder, Michael P.; Guigo, Roderic; Gingeras, Thomas R.; Gilbert, David M.; Hardison, Ross C.; Beer, Michael A.; Ren, Bing

    2014-01-01

    Summary As the premier model organism in biomedical research, the laboratory mouse shares the majority of protein-coding genes with humans, yet the two mammals differ in significant ways. To gain greater insights into both shared and species-specific transcriptional and cellular regulatory programs in the mouse, the Mouse ENCODE Consortium has mapped transcription, DNase I hypersensitivity, transcription factor binding, chromatin modifications, and replication domains throughout the mouse genome in diverse cell and tissue types. By comparing with the human genome, we not only confirm substantial conservation in the newly annotated potential functional sequences, but also find a large degree of divergence of other sequences involved in transcriptional regulation, chromatin state and higher order chromatin organization. Our results illuminate the wide range of evolutionary forces acting on genes and their regulatory regions, and provide a general resource for research into mammalian biology and mechanisms of human diseases. PMID:25409824

  4. A comparative encyclopedia of DNA elements in the mouse genome.

    PubMed

    Yue, Feng; Cheng, Yong; Breschi, Alessandra; Vierstra, Jeff; Wu, Weisheng; Ryba, Tyrone; Sandstrom, Richard; Ma, Zhihai; Davis, Carrie; Pope, Benjamin D; Shen, Yin; Pervouchine, Dmitri D; Djebali, Sarah; Thurman, Robert E; Kaul, Rajinder; Rynes, Eric; Kirilusha, Anthony; Marinov, Georgi K; Williams, Brian A; Trout, Diane; Amrhein, Henry; Fisher-Aylor, Katherine; Antoshechkin, Igor; DeSalvo, Gilberto; See, Lei-Hoon; Fastuca, Meagan; Drenkow, Jorg; Zaleski, Chris; Dobin, Alex; Prieto, Pablo; Lagarde, Julien; Bussotti, Giovanni; Tanzer, Andrea; Denas, Olgert; Li, Kanwei; Bender, M A; Zhang, Miaohua; Byron, Rachel; Groudine, Mark T; McCleary, David; Pham, Long; Ye, Zhen; Kuan, Samantha; Edsall, Lee; Wu, Yi-Chieh; Rasmussen, Matthew D; Bansal, Mukul S; Kellis, Manolis; Keller, Cheryl A; Morrissey, Christapher S; Mishra, Tejaswini; Jain, Deepti; Dogan, Nergiz; Harris, Robert S; Cayting, Philip; Kawli, Trupti; Boyle, Alan P; Euskirchen, Ghia; Kundaje, Anshul; Lin, Shin; Lin, Yiing; Jansen, Camden; Malladi, Venkat S; Cline, Melissa S; Erickson, Drew T; Kirkup, Vanessa M; Learned, Katrina; Sloan, Cricket A; Rosenbloom, Kate R; Lacerda de Sousa, Beatriz; Beal, Kathryn; Pignatelli, Miguel; Flicek, Paul; Lian, Jin; Kahveci, Tamer; Lee, Dongwon; Kent, W James; Ramalho Santos, Miguel; Herrero, Javier; Notredame, Cedric; Johnson, Audra; Vong, Shinny; Lee, Kristen; Bates, Daniel; Neri, Fidencio; Diegel, Morgan; Canfield, Theresa; Sabo, Peter J; Wilken, Matthew S; Reh, Thomas A; Giste, Erika; Shafer, Anthony; Kutyavin, Tanya; Haugen, Eric; Dunn, Douglas; Reynolds, Alex P; Neph, Shane; Humbert, Richard; Hansen, R Scott; De Bruijn, Marella; Selleri, Licia; Rudensky, Alexander; Josefowicz, Steven; Samstein, Robert; Eichler, Evan E; Orkin, Stuart H; Levasseur, Dana; Papayannopoulou, Thalia; Chang, Kai-Hsin; Skoultchi, Arthur; Gosh, Srikanta; Disteche, Christine; Treuting, Piper; Wang, Yanli; Weiss, Mitchell J; Blobel, Gerd A; Cao, Xiaoyi; Zhong, Sheng; Wang, Ting; Good, Peter J; Lowdon, Rebecca F; Adams, Leslie B; Zhou, Xiao-Qiao; Pazin, Michael J; Feingold, Elise A; Wold, Barbara; Taylor, James; Mortazavi, Ali; Weissman, Sherman M; Stamatoyannopoulos, John A; Snyder, Michael P; Guigo, Roderic; Gingeras, Thomas R; Gilbert, David M; Hardison, Ross C; Beer, Michael A; Ren, Bing

    2014-11-20

    The laboratory mouse shares the majority of its protein-coding genes with humans, making it the premier model organism in biomedical research, yet the two mammals differ in significant ways. To gain greater insights into both shared and species-specific transcriptional and cellular regulatory programs in the mouse, the Mouse ENCODE Consortium has mapped transcription, DNase I hypersensitivity, transcription factor binding, chromatin modifications and replication domains throughout the mouse genome in diverse cell and tissue types. By comparing with the human genome, we not only confirm substantial conservation in the newly annotated potential functional sequences, but also find a large degree of divergence of sequences involved in transcriptional regulation, chromatin state and higher order chromatin organization. Our results illuminate the wide range of evolutionary forces acting on genes and their regulatory regions, and provide a general resource for research into mammalian biology and mechanisms of human diseases.

  5. Genome Dynamics in Legionella: The Basis of Versatility and Adaptation to Intracellular Replication

    PubMed Central

    Gomez-Valero, Laura; Buchrieser, Carmen

    2013-01-01

    Legionella pneumophila is a bacterial pathogen present in aquatic environments that can cause a severe pneumonia called Legionnaires’ disease. Soon after its recognition, it was shown that Legionella replicates inside amoeba, suggesting that bacteria replicating in environmental protozoa are able to exploit conserved signaling pathways in human phagocytic cells. Comparative, evolutionary, and functional genomics suggests that the Legionella–amoeba interaction has shaped this pathogen more than previously thought. A complex evolutionary scenario involving mobile genetic elements, type IV secretion systems, and horizontal gene transfer among Legionella, amoeba, and other organisms seems to take place. This long-lasting coevolution led to the development of very sophisticated virulence strategies and a high level of temporal and spatial fine-tuning of bacteria host–cell interactions. We will discuss current knowledge of the evolution of virulence of Legionella from a genomics perspective and propose our vision of the emergence of this human pathogen from the environment. PMID:23732852

  6. Genome dynamics in Legionella: the basis of versatility and adaptation to intracellular replication.

    PubMed

    Gomez-Valero, Laura; Buchrieser, Carmen

    2013-06-01

    Legionella pneumophila is a bacterial pathogen present in aquatic environments that can cause a severe pneumonia called Legionnaires' disease. Soon after its recognition, it was shown that Legionella replicates inside amoeba, suggesting that bacteria replicating in environmental protozoa are able to exploit conserved signaling pathways in human phagocytic cells. Comparative, evolutionary, and functional genomics suggests that the Legionella-amoeba interaction has shaped this pathogen more than previously thought. A complex evolutionary scenario involving mobile genetic elements, type IV secretion systems, and horizontal gene transfer among Legionella, amoeba, and other organisms seems to take place. This long-lasting coevolution led to the development of very sophisticated virulence strategies and a high level of temporal and spatial fine-tuning of bacteria host-cell interactions. We will discuss current knowledge of the evolution of virulence of Legionella from a genomics perspective and propose our vision of the emergence of this human pathogen from the environment.

  7. TAD disruption as oncogenic driver.

    PubMed

    Valton, Anne-Laure; Dekker, Job

    2016-02-01

    Topologically Associating Domains (TADs) are conserved during evolution and play roles in guiding and constraining long-range regulation of gene expression. Disruption of TAD boundaries results in aberrant gene expression by exposing genes to inappropriate regulatory elements. Recent studies have shown that TAD disruption is often found in cancer cells and contributes to oncogenesis through two mechanisms. One mechanism locally disrupts domains by deleting or mutating a TAD boundary leading to fusion of the two adjacent TADs. The other mechanism involves genomic rearrangements that break up TADs and creates new ones without directly affecting TAD boundaries. Understanding the mechanisms by which TADs form and control long-range chromatin interactions will therefore not only provide insights into the mechanism of gene regulation in general, but will also reveal how genomic rearrangements and mutations in cancer genomes can lead to misregulation of oncogenes and tumor suppressors. Copyright © 2016 Elsevier Ltd. All rights reserved.

  8. Newly discovered young CORE-SINEs in marsupial genomes.

    PubMed

    Munemasa, Maruo; Nikaido, Masato; Nishihara, Hidenori; Donnellan, Stephen; Austin, Christopher C; Okada, Norihiro

    2008-01-15

    Although recent mammalian genome projects have uncovered a large part of genomic component of various groups, several repetitive sequences still remain to be characterized and classified for particular groups. The short interspersed repetitive elements (SINEs) distributed among marsupial genomes are one example. We have identified and characterized two new SINEs from marsupial genomes that belong to the CORE-SINE family, characterized by a highly conserved "CORE" domain. PCR and genomic dot blot analyses revealed that the distribution of each SINE shows distinct patterns among the marsupial genomes, implying different timing of their retroposition during the evolution of marsupials. The members of Mar3 (Marsupialia 3) SINE are distributed throughout the genomes of all marsupials, whereas the Mac1 (Macropodoidea 1) SINE is distributed specifically in the genomes of kangaroos. Sequence alignment of the Mar3 SINEs revealed that they can be further divided into four subgroups, each of which has diagnostic nucleotides. The insertion patterns of each SINE at particular genomic loci, together with the distribution patterns of each SINE, suggest that the Mar3 SINEs have intensively amplified after the radiation of diprotodontians, whereas the Mac1 SINE has amplified only slightly after the divergence of hypsiprimnodons from other macropods. By compiling the information of CORE-SINEs characterized to date, we propose a comprehensive picture of how SINE evolution occurred in the genomes of marsupials.

  9. Independent evolution of genomic characters during major metazoan transitions.

    PubMed

    Simakov, Oleg; Kawashima, Takeshi

    2017-07-15

    Metazoan evolution encompasses a vast evolutionary time scale spanning over 600 million years. Our ability to infer ancestral metazoan characters, both morphological and functional, is limited by our understanding of the nature and evolutionary dynamics of the underlying regulatory networks. Increasing coverage of metazoan genomes enables us to identify the evolutionary changes of the relevant genomic characters such as the loss or gain of coding sequences, gene duplications, micro- and macro-synteny, and non-coding element evolution in different lineages. In this review we describe recent advances in our understanding of ancestral metazoan coding and non-coding features, as deduced from genomic comparisons. Some genomic changes such as innovations in gene and linkage content occur at different rates across metazoan clades, suggesting some level of independence among genomic characters. While their contribution to biological innovation remains largely unclear, we review recent literature about certain genomic changes that do correlate with changes to specific developmental pathways and metazoan innovations. In particular, we discuss the origins of the recently described pharyngeal cluster which is conserved across deuterostome genomes, and highlight different genomic features that have contributed to the evolution of this group. We also assess our current capacity to infer ancestral metazoan states from gene models and comparative genomics tools and elaborate on the future directions of metazoan comparative genomics relevant to evo-devo studies. Copyright © 2016 The Authors. Published by Elsevier Inc. All rights reserved.

  10. Saving the spandrels? Adaptive genomic variation in conservation and fisheries management.

    PubMed

    Pearse, D E

    2016-12-01

    As highlighted by many of the papers in this issue, research on the genomic basis of adaptive phenotypic variation in natural populations has made spectacular progress in the past few years, largely due to the advances in sequencing technology and analysis. Without question, the resulting genomic data will improve the understanding of regions of the genome under selection and extend knowledge of the genetic basis of adaptive evolution. What is far less clear, but has been the focus of active discussion, is how such information can or should transfer into conservation practice to complement more typical conservation applications of genetic data. Before such applications can be realized, the evolutionary importance of specific targets of selection relative to the genome-wide diversity of the species as a whole must be evaluated. The key issues for the incorporation of adaptive genomic variation in conservation and management are discussed here, using published examples of adaptive genomic variation associated with specific phenotypes in salmonids and other taxa to highlight practical considerations for incorporating such information into conservation programmes. Scenarios are described in which adaptive genomic data could be used in conservation or restoration, constraints on its utility and the importance of validating inferences drawn from new genomic data before applying them in conservation practice. Finally, it is argued that an excessive focus on preserving the adaptive variation that can be measured, while ignoring the vast unknown majority that cannot, is a modern twist on the adaptationist programme that Gould and Lewontin critiqued almost 40 years ago. Published 2016. This article is a U.S. Government work and is in the public domain in the USA.

  11. Comparative Analysis of Vertebrate Dystrophin Loci Indicate Intron Gigantism as a Common Feature

    PubMed Central

    Pozzoli, Uberto; Elgar, Greg; Cagliani, Rachele; Riva, Laura; Comi, Giacomo P.; Bresolin, Nereo; Bardoni, Alessandra; Sironi, Manuela

    2003-01-01

    The human DMD gene is the largest known to date, spanning > 2000 kb on the X chromosome. The gene size is mainly accounted for by huge intronic regions. We sequenced 190 kb of Fugu rubripes (pufferfish) genomic DNA corresponding to the complete dystrophin gene (FrDMD) and provide the first report of gene structure and sequence comparison among dystrophin genomic sequences from different vertebrate organisms. Almost all intron positions and phases are conserved between FrDMD and its mammalian counterparts, and the predicted protein product of the Fugu gene displays 55% identity and 71% similarity to human dystrophin. In analogy to the human gene, FrDMD presents several-fold longer than average intronic regions. Analysis of intron sequences of the human and murine genes revealed that they are extremely conserved in size and that a similar fraction of total intron length is represented by repetitive elements; moreover, our data indicate that intron expansion through repeat accumulation in the two orthologs is the result of independent insertional events. The hypothesis that intron length might be functionally relevant to the DMD gene regulation is proposed and substantiated by the finding that dystrophin intron gigantism is common to the three vertebrate genes. [Supplemental material is available online at www.genome.org.] PMID:12727896

  12. Alternative splicing of anciently exonized 5S rRNA regulates plant transcription factor TFIIIA

    PubMed Central

    Fu, Yan; Bannach, Oliver; Chen, Hao; Teune, Jan-Hendrik; Schmitz, Axel; Steger, Gerhard; Xiong, Liming; Barbazuk, W. Brad

    2009-01-01

    Identifying conserved alternative splicing (AS) events among evolutionarily distant species can prioritize AS events for functional characterization and help uncover relevant cis- and trans-regulatory factors. A genome-wide search for conserved cassette exon AS events in higher plants revealed the exonization of 5S ribosomal RNA (5S rRNA) within the gene of its own transcription regulator, TFIIIA (transcription factor for polymerase III A). The 5S rRNA-derived exon in TFIIIA gene exists in all representative land plant species but not in green algae and nonplant species, suggesting it is specific to land plants. TFIIIA is essential for RNA polymerase III-based transcription of 5S rRNA in eukaryotes. Integrating comparative genomics and molecular biology revealed that the conserved cassette exon derived from 5S rRNA is coupled with nonsense-mediated mRNA decay. Utilizing multiple independent Arabidopsis overexpressing TFIIIA transgenic lines under osmotic and salt stress, strong accordance between phenotypic and molecular evidence reveals the biological relevance of AS of the exonized 5S rRNA in quantitative autoregulation of TFIIIA homeostasis. Most significantly, this study provides the first evidence of ancient exaptation of 5S rRNA in plants, suggesting a novel gene regulation model mediated by the AS of an anciently exonized noncoding element. PMID:19211543

  13. Evolutionary dynamics of a conserved sequence motif in the ribosomal genes of the ciliate Paramecium.

    PubMed

    Catania, Francesco; Lynch, Michael

    2010-05-04

    In protozoa, the identification of preserved motifs by comparative genomics is often impeded by difficulties to generate reliable alignments for non-coding sequences. Moreover, the evolutionary dynamics of regulatory elements in 3' untranslated regions (both in protozoa and metazoa) remains a virtually unexplored issue. By screening Paramecium tetraurelia's 3' untranslated regions for 8-mers that were previously found to be preserved in mammalian 3' UTRs, we detect and characterize a motif that is distinctly conserved in the ribosomal genes of this ciliate. The motif appears to be conserved across Paramecium aurelia species but is absent from the ribosomal genes of four additional non-Paramecium species surveyed, including another ciliate, Tetrahymena thermophila. Motif-free ribosomal genes retain fewer paralogs in the genome and appear to be lost more rapidly relative to motif-containing genes. Features associated with the discovered preserved motif are consistent with this 8-mer playing a role in post-transcriptional regulation. Our observations 1) shed light on the evolution of a putative regulatory motif across large phylogenetic distances; 2) are expected to facilitate the understanding of the modulation of ribosomal genes expression in Paramecium; and 3) reveal a largely unexplored--and presumably not restricted to Paramecium--association between the presence/absence of a DNA motif and the evolutionary fate of its host genes.

  14. The identification and functional annotation of RNA structures conserved in vertebrates.

    PubMed

    Seemann, Stefan E; Mirza, Aashiq H; Hansen, Claus; Bang-Berthelsen, Claus H; Garde, Christian; Christensen-Dalsgaard, Mikkel; Torarinsson, Elfar; Yao, Zizhen; Workman, Christopher T; Pociot, Flemming; Nielsen, Henrik; Tommerup, Niels; Ruzzo, Walter L; Gorodkin, Jan

    2017-08-01

    Structured elements of RNA molecules are essential in, e.g., RNA stabilization, localization, and protein interaction, and their conservation across species suggests a common functional role. We computationally screened vertebrate genomes for conserved RNA structures (CRSs), leveraging structure-based, rather than sequence-based, alignments. After careful correction for sequence identity and GC content, we predict ∼516,000 human genomic regions containing CRSs. We find that a substantial fraction of human-mouse CRS regions (1) colocalize consistently with binding sites of the same RNA binding proteins (RBPs) or (2) are transcribed in corresponding tissues. Additionally, a CaptureSeq experiment revealed expression of many of our CRS regions in human fetal brain, including 662 novel ones. For selected human and mouse candidate pairs, qRT-PCR and in vitro RNA structure probing supported both shared expression and shared structure despite low abundance and low sequence identity. About 30,000 CRS regions are located near coding or long noncoding RNA genes or within enhancers. Structured (CRS overlapping) enhancer RNAs and extended 3' ends have significantly increased expression levels over their nonstructured counterparts. Our findings of transcribed uncharacterized regulatory regions that contain CRSs support their RNA-mediated functionality. © 2017 Seemann et al.; Published by Cold Spring Harbor Laboratory Press.

  15. Comparative Genome Structure, Secondary Metabolite, and Effector Coding Capacity across Cochliobolus Pathogens

    PubMed Central

    Bushley, Kathryn E.; Ohm, Robin A.; Otillar, Robert; Martin, Joel; Schackwitz, Wendy; Grimwood, Jane; MohdZainudin, NurAinIzzati; Xue, Chunsheng; Wang, Rui; Manning, Viola A.; Dhillon, Braham; Tu, Zheng Jin; Steffenson, Brian J.; Salamov, Asaf; Sun, Hui; Lowry, Steve; LaButti, Kurt; Han, James; Copeland, Alex; Lindquist, Erika; Barry, Kerrie; Schmutz, Jeremy; Baker, Scott E.; Ciuffetti, Lynda M.; Grigoriev, Igor V.; Zhong, Shaobin; Turgeon, B. Gillian

    2013-01-01

    The genomes of five Cochliobolus heterostrophus strains, two Cochliobolus sativus strains, three additional Cochliobolus species (Cochliobolus victoriae, Cochliobolus carbonum, Cochliobolus miyabeanus), and closely related Setosphaeria turcica were sequenced at the Joint Genome Institute (JGI). The datasets were used to identify SNPs between strains and species, unique genomic regions, core secondary metabolism genes, and small secreted protein (SSP) candidate effector encoding genes with a view towards pinpointing structural elements and gene content associated with specificity of these closely related fungi to different cereal hosts. Whole-genome alignment shows that three to five percent of each genome differs between strains of the same species, while a quarter of each genome differs between species. On average, SNP counts among field isolates of the same C. heterostrophus species are more than 25× higher than those between inbred lines and 50× lower than SNPs between Cochliobolus species. The suites of nonribosomal peptide synthetase (NRPS), polyketide synthase (PKS), and SSP–encoding genes are astoundingly diverse among species but remarkably conserved among isolates of the same species, whether inbred or field strains, except for defining examples that map to unique genomic regions. Functional analysis of several strain-unique PKSs and NRPSs reveal a strong correlation with a role in virulence. PMID:23357949

  16. Mammalian genomic regulatory regions predicted by utilizing human genomics, transcriptomics, and epigenetics data

    PubMed Central

    Nguyen, Quan H; Tellam, Ross L; Naval-Sanchez, Marina; Porto-Neto, Laercio R; Barendse, William; Reverter, Antonio; Hayes, Benjamin; Kijas, James; Dalrymple, Brian P

    2018-01-01

    Abstract Genome sequences for hundreds of mammalian species are available, but an understanding of their genomic regulatory regions, which control gene expression, is only beginning. A comprehensive prediction of potential active regulatory regions is necessary to functionally study the roles of the majority of genomic variants in evolution, domestication, and animal production. We developed a computational method to predict regulatory DNA sequences (promoters, enhancers, and transcription factor binding sites) in production animals (cows and pigs) and extended its broad applicability to other mammals. The method utilizes human regulatory features identified from thousands of tissues, cell lines, and experimental assays to find homologous regions that are conserved in sequences and genome organization and are enriched for regulatory elements in the genome sequences of other mammalian species. Importantly, we developed a filtering strategy, including a machine learning classification method, to utilize a very small number of species-specific experimental datasets available to select for the likely active regulatory regions. The method finds the optimal combination of sensitivity and accuracy to unbiasedly predict regulatory regions in mammalian species. Furthermore, we demonstrated the utility of the predicted regulatory datasets in cattle for prioritizing variants associated with multiple production and climate change adaptation traits and identifying potential genome editing targets. PMID:29618048

  17. Comparative Genome Structure, Secondary Metabolite, and Effector Coding Capacity across Cochliobolus Pathogens

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Condon, Bradford J.; Leng, Yueqiang; Wu, Dongliang

    The genomes of five Cochliobolus heterostrophus strains, two Cochliobolus sativus strains, three additional Cochliobolus species (Cochliobolus victoriae, Cochliobolus carbonum, Cochliobolus miyabeanus), and closely related Setosphaeria turcica were sequenced at the Joint Genome Institute (JGI). The datasets were used to identify SNPs between strains and species, unique genomic regions, core secondary metabolism genes, and small secreted protein (SSP) candidate effector encoding genes with a view towards pinpointing structural elements and gene content associated with specificity of these closely related fungi to different cereal hosts. Whole-genome alignment shows that three to five of each genome differs between strains of the same species,more » while a quarter of each genome differs between species. On average, SNP counts among field isolates of the same C. heterostrophus species are more than 25 higher than those between inbred lines and 50 lower than SNPs between Cochliobolus species. The suites of nonribosomal peptide synthetase (NRPS), polyketide synthase (PKS), and SSP encoding genes are astoundingly diverse among species but remarkably conserved among isolates of the same species, whether inbred or field strains, except for defining examples that map to unique genomic regions. Functional analysis of several strain-unique PKSs and NRPSs reveal a strong correlation with a role in virulence.« less

  18. Mammalian genomic regulatory regions predicted by utilizing human genomics, transcriptomics, and epigenetics data.

    PubMed

    Nguyen, Quan H; Tellam, Ross L; Naval-Sanchez, Marina; Porto-Neto, Laercio R; Barendse, William; Reverter, Antonio; Hayes, Benjamin; Kijas, James; Dalrymple, Brian P

    2018-03-01

    Genome sequences for hundreds of mammalian species are available, but an understanding of their genomic regulatory regions, which control gene expression, is only beginning. A comprehensive prediction of potential active regulatory regions is necessary to functionally study the roles of the majority of genomic variants in evolution, domestication, and animal production. We developed a computational method to predict regulatory DNA sequences (promoters, enhancers, and transcription factor binding sites) in production animals (cows and pigs) and extended its broad applicability to other mammals. The method utilizes human regulatory features identified from thousands of tissues, cell lines, and experimental assays to find homologous regions that are conserved in sequences and genome organization and are enriched for regulatory elements in the genome sequences of other mammalian species. Importantly, we developed a filtering strategy, including a machine learning classification method, to utilize a very small number of species-specific experimental datasets available to select for the likely active regulatory regions. The method finds the optimal combination of sensitivity and accuracy to unbiasedly predict regulatory regions in mammalian species. Furthermore, we demonstrated the utility of the predicted regulatory datasets in cattle for prioritizing variants associated with multiple production and climate change adaptation traits and identifying potential genome editing targets.

  19. eShadow: A tool for comparing closely related sequences

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ovcharenko, Ivan; Boffelli, Dario; Loots, Gabriela G.

    2004-01-15

    Primate sequence comparisons are difficult to interpret due to the high degree of sequence similarity shared between such closely related species. Recently, a novel method, phylogenetic shadowing, has been pioneered for predicting functional elements in the human genome through the analysis of multiple primate sequence alignments. We have expanded this theoretical approach to create a computational tool, eShadow, for the identification of elements under selective pressure in multiple sequence alignments of closely related genomes, such as in comparisons of human to primate or mouse to rat DNA. This tool integrates two different statistical methods and allows for the dynamic visualizationmore » of the resulting conservation profile. eShadow also includes a versatile optimization module capable of training the underlying Hidden Markov Model to differentially predict functional sequences. This module grants the tool high flexibility in the analysis of multiple sequence alignments and in comparing sequences with different divergence rates. Here, we describe the eShadow comparative tool and its potential uses for analyzing both multiple nucleotide and protein alignments to predict putative functional elements. The eShadow tool is publicly available at http://eshadow.dcode.org/« less

  20. Enhancer elements upstream of the SHOX gene are active in the developing limb.

    PubMed

    Durand, Claudia; Bangs, Fiona; Signolet, Jason; Decker, Eva; Tickle, Cheryll; Rappold, Gudrun

    2010-05-01

    Léri-Weill Dyschondrosteosis (LWD) is a dominant skeletal disorder characterized by short stature and distinct bone anomalies. SHOX gene mutations and deletions of regulatory elements downstream of SHOX resulting in haploinsufficiency have been found in patients with LWD. SHOX encodes a homeodomain transcription factor and is known to be expressed in the developing limb. We have now analyzed the regulatory significance of the region upstream of the SHOX gene. By comparative genomic analyses, we identified several conserved non-coding elements, which subsequently were tested in an in ovo enhancer assay in both chicken limb bud and cornea, where SHOX is also expressed. In this assay, we found three enhancers to be active in the developing chicken limb, but none were functional in the developing cornea. A screening of 60 LWD patients with an intact SHOX coding and downstream region did not yield any deletion of the upstream enhancer region. Thus, we speculate that SHOX upstream deletions occur at a lower frequency because of the structural organization of this genomic region and/or that SHOX upstream deletions may cause a phenotype that differs from the one observed in LWD.

  1. Enhancer elements upstream of the SHOX gene are active in the developing limb

    PubMed Central

    Durand, Claudia; Bangs, Fiona; Signolet, Jason; Decker, Eva; Tickle, Cheryll; Rappold, Gudrun

    2010-01-01

    Léri-Weill Dyschondrosteosis (LWD) is a dominant skeletal disorder characterized by short stature and distinct bone anomalies. SHOX gene mutations and deletions of regulatory elements downstream of SHOX resulting in haploinsufficiency have been found in patients with LWD. SHOX encodes a homeodomain transcription factor and is known to be expressed in the developing limb. We have now analyzed the regulatory significance of the region upstream of the SHOX gene. By comparative genomic analyses, we identified several conserved non-coding elements, which subsequently were tested in an in ovo enhancer assay in both chicken limb bud and cornea, where SHOX is also expressed. In this assay, we found three enhancers to be active in the developing chicken limb, but none were functional in the developing cornea. A screening of 60 LWD patients with an intact SHOX coding and downstream region did not yield any deletion of the upstream enhancer region. Thus, we speculate that SHOX upstream deletions occur at a lower frequency because of the structural organization of this genomic region and/or that SHOX upstream deletions may cause a phenotype that differs from the one observed in LWD. PMID:19997128

  2. Flavivirus and Filovirus EvoPrinters: New alignment tools for the comparative analysis of viral evolution.

    PubMed

    Brody, Thomas; Yavatkar, Amarendra S; Park, Dong Sun; Kuzin, Alexander; Ross, Jermaine; Odenwald, Ward F

    2017-06-01

    Flavivirus and Filovirus infections are serious epidemic threats to human populations. Multi-genome comparative analysis of these evolving pathogens affords a view of their essential, conserved sequence elements as well as progressive evolutionary changes. While phylogenetic analysis has yielded important insights, the growing number of available genomic sequences makes comparisons between hundreds of viral strains challenging. We report here a new approach for the comparative analysis of these hemorrhagic fever viruses that can superimpose an unlimited number of one-on-one alignments to identify important features within genomes of interest. We have adapted EvoPrinter alignment algorithms for the rapid comparative analysis of Flavivirus or Filovirus sequences including Zika and Ebola strains. The user can input a full genome or partial viral sequence and then view either individual comparisons or generate color-coded readouts that superimpose hundreds of one-on-one alignments to identify unique or shared identity SNPs that reveal ancestral relationships between strains. The user can also opt to select a database genome in order to access a library of pre-aligned genomes of either 1,094 Flaviviruses or 460 Filoviruses for rapid comparative analysis with all database entries or a select subset. Using EvoPrinter search and alignment programs, we show the following: 1) superimposing alignment data from many related strains identifies lineage identity SNPs, which enable the assessment of sublineage complexity within viral outbreaks; 2) whole-genome SNP profile screens uncover novel Dengue2 and Zika recombinant strains and their parental lineages; 3) differential SNP profiling identifies host cell A-to-I hyper-editing within Ebola and Marburg viruses, and 4) hundreds of superimposed one-on-one Ebola genome alignments highlight ultra-conserved regulatory sequences, invariant amino acid codons and evolutionarily variable protein-encoding domains within a single genome. EvoPrinter allows for the assessment of lineage complexity within Flavivirus or Filovirus outbreaks, identification of recombinant strains, highlights sequences that have undergone host cell A-to-I editing, and identifies unique input and database SNPs within highly conserved sequences. EvoPrinter's ability to superimpose alignment data from hundreds of strains onto a single genome has allowed us to identify unique Zika virus sublineages that are currently spreading in South, Central and North America, the Caribbean, and in China. This new set of integrated alignment programs should serve as a useful addition to existing tools for the comparative analysis of these viruses.

  3. Archaeal Haloarcula californiae Icosahedral Virus 1 Highlights Conserved Elements in Icosahedral Membrane-Containing DNA Viruses from Extreme Environments.

    PubMed

    Demina, Tatiana A; Pietilä, Maija K; Svirskaitė, Julija; Ravantti, Janne J; Atanasova, Nina S; Bamford, Dennis H; Oksanen, Hanna M

    2016-07-19

    Despite their high genomic diversity, all known viruses are structurally constrained to a limited number of virion morphotypes. One morphotype of viruses infecting bacteria, archaea, and eukaryotes is the tailless icosahedral morphotype with an internal membrane. Although it is considered an abundant morphotype in extreme environments, only seven such archaeal viruses are known. Here, we introduce Haloarcula californiae icosahedral virus 1 (HCIV-1), a halophilic euryarchaeal virus originating from salt crystals. HCIV-1 also retains its infectivity under low-salinity conditions, showing that it is able to adapt to environmental changes. The release of progeny virions resulting from cell lysis was evidenced by reduced cellular oxygen consumption, leakage of intracellular ATP, and binding of an indicator ion to ruptured cell membranes. The virion contains at least 12 different protein species, lipids selectively acquired from the host cell membrane, and a 31,314-bp-long linear double-stranded DNA (dsDNA). The overall genome organization and sequence show high similarity to the genomes of archaeal viruses in the Sphaerolipoviridae family. Phylogenetic analysis based on the major conserved components needed for virion assembly-the major capsid proteins and the packaging ATPase-placed HCIV-1 along with the alphasphaerolipoviruses in a distinct, well-supported clade. On the basis of its virion morphology and sequence similarities, most notably, those of its core virion components, we propose that HCIV-1 is a member of the PRD1-adenovirus structure-based lineage together with other sphaerolipoviruses. This addition to the lineage reinforces the notion of the ancient evolutionary links observed between the viruses and further highlights the limits of the choices found in nature for formation of a virion. Under conditions of extreme salinity, the majority of the organisms present are archaea, which encounter substantial selective pressure, being constantly attacked by viruses. Regardless of the enormous viral sequence diversity, all known viruses can be clustered into a few structure-based viral lineages based on their core virion components. Our description of a new halophilic virus-host system adds significant insights into the largely unstudied field of archaeal viruses and, in general, of life under extreme conditions. Comprehensive molecular characterization of HCIV-1 shows that this icosahedral internal membrane-containing virus exhibits conserved elements responsible for virion organization. This places the virus neatly in the PRD1-adenovirus structure-based lineage. HCIV-1 further highlights the limited diversity of virus morphotypes despite the astronomical number of viruses in the biosphere. The observed high conservation in the core virion elements should be considered in addressing such fundamental issues as the origin and evolution of viruses and their interplay with their hosts. Copyright © 2016 Demina et al.

  4. Initial sequence and comparative analysis of the cat genome

    PubMed Central

    Pontius, Joan U.; Mullikin, James C.; Smith, Douglas R.; Lindblad-Toh, Kerstin; Gnerre, Sante; Clamp, Michele; Chang, Jean; Stephens, Robert; Neelam, Beena; Volfovsky, Natalia; Schäffer, Alejandro A.; Agarwala, Richa; Narfström, Kristina; Murphy, William J.; Giger, Urs; Roca, Alfred L.; Antunes, Agostinho; Menotti-Raymond, Marilyn; Yuhki, Naoya; Pecon-Slattery, Jill; Johnson, Warren E.; Bourque, Guillaume; Tesler, Glenn; O’Brien, Stephen J.

    2007-01-01

    The genome sequence (1.9-fold coverage) of an inbred Abyssinian domestic cat was assembled, mapped, and annotated with a comparative approach that involved cross-reference to annotated genome assemblies of six mammals (human, chimpanzee, mouse, rat, dog, and cow). The results resolved chromosomal positions for 663,480 contigs, 20,285 putative feline gene orthologs, and 133,499 conserved sequence blocks (CSBs). Additional annotated features include repetitive elements, endogenous retroviral sequences, nuclear mitochondrial (numt) sequences, micro-RNAs, and evolutionary breakpoints that suggest historic balancing of translocation and inversion incidences in distinct mammalian lineages. Large numbers of single nucleotide polymorphisms (SNPs), deletion insertion polymorphisms (DIPs), and short tandem repeats (STRs), suitable for linkage or association studies were characterized in the context of long stretches of chromosome homozygosity. In spite of the light coverage capturing ∼65% of euchromatin sequence from the cat genome, these comparative insights shed new light on the tempo and mode of gene/genome evolution in mammals, promise several research applications for the cat, and also illustrate that a comparative approach using more deeply covered mammals provides an informative, preliminary annotation of a light (1.9-fold) coverage mammal genome sequence. PMID:17975172

  5. The Serum Response Factor and a Putative Novel Transcription Factor Regulate Expression of the Immediate-Early Gene Arc/Arg3.1 in Cultured Cortical Neurons

    PubMed Central

    Pintchovski, Sean A.; Peebles, Carol L.; Kim, Hong Joo; Verdin, Eric; Finkbeiner, Steven

    2010-01-01

    The immediate-early effector gene Arc/Arg3.1 is robustly upregulated by synaptic activity associated with learning and memory. Here we show in primary cortical neuron culture that diverse stimuli induce Arc expression through new transcription. Searching for regulatory regions important for Arc transcription, we found nine DNaseI-sensitive nucleosome-depleted sites at this genomic locus. A reporter gene encompassing these sites responded to synaptic activity in an NMDA receptor–dependent manner, consistent with endogenous Arc mRNA. Responsiveness mapped to two enhancer regions ∼6.5 kb and ∼1.4 kb upstream of Arc. We dissected these regions further and found that the proximal enhancer contains a functional and conserved “Zeste-like” response element that binds a putative novel nuclear protein in neurons. Therefore, activity regulates Arc transcription partly by a novel signaling pathway. We also found that the distal enhancer has a functional and highly conserved serum response element. This element binds serum response factor, which is recruited by synaptic activity to regulate Arc. Thus, Arc is the first target of serum response factor that functions at synapses to mediate plasticity. PMID:19193899

  6. CSTminer: a web tool for the identification of coding and noncoding conserved sequence tags through cross-species genome comparison

    PubMed Central

    Castrignanò, Tiziana; Canali, Alessandro; Grillo, Giorgio; Liuni, Sabino; Mignone, Flavio; Pesole, Graziano

    2004-01-01

    The identification and characterization of genome tracts that are highly conserved across species during evolution may contribute significantly to the functional annotation of whole-genome sequences. Indeed, such sequences are likely to correspond to known or unknown coding exons or regulatory motifs. Here, we present a web server implementing a previously developed algorithm that, by comparing user-submitted genome sequences, is able to identify statistically significant conserved blocks and assess their coding or noncoding nature through the measure of a coding potential score. The web tool, available at http://www.caspur.it/CSTminer/, is dynamically interconnected with the Ensembl genome resources and produces a graphical output showing a map of detected conserved sequences and annotated gene features. PMID:15215464

  7. Overview on the Role of Advance Genomics in Conservation Biology of Endangered Species.

    PubMed

    Khan, Suliman; Nabi, Ghulam; Ullah, Muhammad Wajid; Yousaf, Muhammad; Manan, Sehrish; Siddique, Rabeea; Hou, Hongwei

    2016-01-01

    In the recent era, due to tremendous advancement in industrialization, pollution and other anthropogenic activities have created a serious scenario for biota survival. It has been reported that present biota is entering a "sixth" mass extinction, because of chronic exposure to anthropogenic activities. Various ex situ and in situ measures have been adopted for conservation of threatened and endangered plants and animal species; however, these have been limited due to various discrepancies associated with them. Current advancement in molecular technologies, especially, genomics, is playing a very crucial role in biodiversity conservation. Advance genomics helps in identifying the segments of genome responsible for adaptation. It can also improve our understanding about microevolution through a better understanding of selection, mutation, assertive matting, and recombination. Advance genomics helps in identifying genes that are essential for fitness and ultimately for developing modern and fast monitoring tools for endangered biodiversity. This review article focuses on the applications of advanced genomics mainly demographic, adaptive genetic variations, inbreeding, hybridization and introgression, and disease susceptibilities, in the conservation of threatened biota. In short, it provides the fundamentals for novice readers and advancement in genomics for the experts working for the conservation of endangered plant and animal species.

  8. Overview on the Role of Advance Genomics in Conservation Biology of Endangered Species

    PubMed Central

    Khan, Suliman; Nabi, Ghulam; Ullah, Muhammad Wajid; Yousaf, Muhammad; Manan, Sehrish; Siddique, Rabeea

    2016-01-01

    In the recent era, due to tremendous advancement in industrialization, pollution and other anthropogenic activities have created a serious scenario for biota survival. It has been reported that present biota is entering a “sixth” mass extinction, because of chronic exposure to anthropogenic activities. Various ex situ and in situ measures have been adopted for conservation of threatened and endangered plants and animal species; however, these have been limited due to various discrepancies associated with them. Current advancement in molecular technologies, especially, genomics, is playing a very crucial role in biodiversity conservation. Advance genomics helps in identifying the segments of genome responsible for adaptation. It can also improve our understanding about microevolution through a better understanding of selection, mutation, assertive matting, and recombination. Advance genomics helps in identifying genes that are essential for fitness and ultimately for developing modern and fast monitoring tools for endangered biodiversity. This review article focuses on the applications of advanced genomics mainly demographic, adaptive genetic variations, inbreeding, hybridization and introgression, and disease susceptibilities, in the conservation of threatened biota. In short, it provides the fundamentals for novice readers and advancement in genomics for the experts working for the conservation of endangered plant and animal species. PMID:28025636

  9. Principles of long noncoding RNA evolution derived from direct comparison of transcriptomes in 17 species.

    PubMed

    Hezroni, Hadas; Koppstein, David; Schwartz, Matthew G; Avrutin, Alexandra; Bartel, David P; Ulitsky, Igor

    2015-05-19

    The inability to predict long noncoding RNAs from genomic sequence has impeded the use of comparative genomics for studying their biology. Here, we develop methods that use RNA sequencing (RNA-seq) data to annotate the transcriptomes of 16 vertebrates and the echinoid sea urchin, uncovering thousands of previously unannotated genes, most of which produce long intervening noncoding RNAs (lincRNAs). Although in each species, >70% of lincRNAs cannot be traced to homologs in species that diverged >50 million years ago, thousands of human lincRNAs have homologs with similar expression patterns in other species. These homologs share short, 5'-biased patches of sequence conservation nested in exonic architectures that have been extensively rewired, in part by transposable element exonization. Thus, over a thousand human lincRNAs are likely to have conserved functions in mammals, and hundreds beyond mammals, but those functions require only short patches of specific sequences and can tolerate major changes in gene architecture. Copyright © 2015 The Authors. Published by Elsevier Inc. All rights reserved.

  10. Mechanisms of haplotype divergence at the RGA08 nucleotide-binding leucine-rich repeat gene locus in wild banana (Musa balbisiana)

    PubMed Central

    2010-01-01

    Background Comparative sequence analysis of complex loci such as resistance gene analog clusters allows estimating the degree of sequence conservation and mechanisms of divergence at the intraspecies level. In banana (Musa sp.), two diploid wild species Musa acuminata (A genome) and Musa balbisiana (B genome) contribute to the polyploid genome of many cultivars. The M. balbisiana species is associated with vigour and tolerance to pests and disease and little is known on the genome structure and haplotype diversity within this species. Here, we compare two genomic sequences of 253 and 223 kb corresponding to two haplotypes of the RGA08 resistance gene analog locus in M. balbisiana "Pisang Klutuk Wulung" (PKW). Results Sequence comparison revealed two regions of contrasting features. The first is a highly colinear gene-rich region where the two haplotypes diverge only by single nucleotide polymorphisms and two repetitive element insertions. The second corresponds to a large cluster of RGA08 genes, with 13 and 18 predicted RGA genes and pseudogenes spread over 131 and 152 kb respectively on each haplotype. The RGA08 cluster is enriched in repetitive element insertions, in duplicated non-coding intergenic sequences including low complexity regions and shows structural variations between haplotypes. Although some allelic relationships are retained, a large diversity of RGA08 genes occurs in this single M. balbisiana genotype, with several RGA08 paralogs specific to each haplotype. The RGA08 gene family has evolved by mechanisms of unequal recombination, intragenic sequence exchange and diversifying selection. An unequal recombination event taking place between duplicated non-coding intergenic sequences resulted in a different RGA08 gene content between haplotypes pointing out the role of such duplicated regions in the evolution of RGA clusters. Based on the synonymous substitution rate in coding sequences, we estimated a 1 million year divergence time for these M. balbisiana haplotypes. Conclusions A large RGA08 gene cluster identified in wild banana corresponds to a highly variable genomic region between haplotypes surrounded by conserved flanking regions. High level of sequence identity (70 to 99%) of the genic and intergenic regions suggests a recent and rapid evolution of this cluster in M. balbisiana. PMID:20637079

  11. Sequencing Needs for Viral Diagnostics

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Gardner, S N; Lam, M; Mulakken, N J

    2004-01-26

    We built a system to guide decisions regarding the amount of genomic sequencing required to develop diagnostic DNA signatures, which are short sequences that are sufficient to uniquely identify a viral species. We used our existing DNA diagnostic signature prediction pipeline, which selects regions of a target species genome that are conserved among strains of the target (for reliability, to prevent false negatives) and unique relative to other species (for specificity, to avoid false positives). We performed simulations, based on existing sequence data, to assess the number of genome sequences of a target species and of close phylogenetic relatives (''nearmore » neighbors'') that are required to predict diagnostic signature regions that are conserved among strains of the target species and unique relative to other bacterial and viral species. For DNA viruses such as variola (smallpox), three target genomes provide sufficient guidance for selecting species-wide signatures. Three near neighbor genomes are critical for species specificity. In contrast, most RNA viruses require four target genomes and no near neighbor genomes, since lack of conservation among strains is more limiting than uniqueness. SARS and Ebola Zaire are exceptional, as additional target genomes currently do not improve predictions, but near neighbor sequences are urgently needed. Our results also indicate that double stranded DNA viruses are more conserved among strains than are RNA viruses, since in most cases there was at least one conserved signature candidate for the DNA viruses and zero conserved signature candidates for the RNA viruses.« less

  12. A physical map, including a BAC/PAC clone contig, of the Williams-Beuren syndrome--deletion region at 7q11.23.

    PubMed

    Peoples, R; Franke, Y; Wang, Y K; Pérez-Jurado, L; Paperna, T; Cisco, M; Francke, U

    2000-01-01

    Williams-Beuren syndrome (WBS) is a developmental disorder caused by haploinsufficiency for genes in a 2-cM region of chromosome band 7q11.23. With the exception of vascular stenoses due to deletion of the elastin gene, the various features of WBS have not yet been attributed to specific genes. Although >/=16 genes have been identified within the WBS deletion, completion of a physical map of the region has been difficult because of the large duplicated regions flanking the deletion. We present a physical map of the WBS deletion and flanking regions, based on assembly of a bacterial artificial chromosome/P1-derived artificial chromosome contig, analysis of high-throughput genome-sequence data, and long-range restriction mapping of genomic and cloned DNA by pulsed-field gel electrophoresis. Our map encompasses 3 Mb, including 1.6 Mb within the deletion. Two large duplicons, flanking the deletion, of >/=320 kb contain unique sequence elements from the internal border regions of the deletion, such as sequences from GTF2I (telomeric) and FKBP6 (centromeric). A third copy of this duplicon exists in inverted orientation distal to the telomeric flanking one. These duplicons show stronger sequence conservation with regard to each other than to the presumptive ancestral loci within the common deletion region. Sequence elements originating from beyond 7q11.23 are also present in these duplicons. Although the duplicons are not present in mice, the order of the single-copy genes in the conserved syntenic region of mouse chromosome 5 is inverted relative to the human map. A model is presented for a mechanism of WBS-deletion formation, based on the orientation of duplicons' components relative to each other and to the ancestral elements within the deletion region.

  13. Structural and Functional Characterization of Ribosomal Protein Gene Introns in Sponges

    PubMed Central

    Perina, Drago; Korolija, Marina; Mikoč, Andreja; Roller, Maša; Pleše, Bruna; Imešek, Mirna; Morrow, Christine; Batel, Renato; Ćetković, Helena

    2012-01-01

    Ribosomal protein genes (RPGs) are a powerful tool for studying intron evolution. They exist in all three domains of life and are much conserved. Accumulating genomic data suggest that RPG introns in many organisms abound with non-protein-coding-RNAs (ncRNAs). These ancient ncRNAs are small nucleolar RNAs (snoRNAs) essential for ribosome assembly. They are also mobile genetic elements and therefore probably important in diversification and enrichment of transcriptomes through various mechanisms such as intron/exon gain/loss. snoRNAs in basal metazoans are poorly characterized. We examined 449 RPG introns, in total, from four demosponges: Amphimedon queenslandica, Suberites domuncula, Suberites ficus and Suberites pagurorum and showed that RPG introns from A. queenslandica share position conservancy and some structural similarity with “higher” metazoans. Moreover, our study indicates that mobile element insertions play an important role in the evolution of their size. In four sponges 51 snoRNAs were identified. The analysis showed discrepancies between the snoRNA pools of orthologous RPG introns between S. domuncula and A. queenslandica. Furthermore, these two sponges show as much conservancy of RPG intron positions between each other as between themselves and human. Sponges from the Suberites genus show consistency in RPG intron position conservation. However, significant differences in some of the orthologous RPG introns of closely related sponges were observed. This indicates that RPG introns are dynamic even on these shorter evolutionary time scales. PMID:22880015

  14. Structural and functional characterization of ribosomal protein gene introns in sponges.

    PubMed

    Perina, Drago; Korolija, Marina; Mikoč, Andreja; Roller, Maša; Pleše, Bruna; Imešek, Mirna; Morrow, Christine; Batel, Renato; Ćetković, Helena

    2012-01-01

    Ribosomal protein genes (RPGs) are a powerful tool for studying intron evolution. They exist in all three domains of life and are much conserved. Accumulating genomic data suggest that RPG introns in many organisms abound with non-protein-coding-RNAs (ncRNAs). These ancient ncRNAs are small nucleolar RNAs (snoRNAs) essential for ribosome assembly. They are also mobile genetic elements and therefore probably important in diversification and enrichment of transcriptomes through various mechanisms such as intron/exon gain/loss. snoRNAs in basal metazoans are poorly characterized. We examined 449 RPG introns, in total, from four demosponges: Amphimedon queenslandica, Suberites domuncula, Suberites ficus and Suberites pagurorum and showed that RPG introns from A. queenslandica share position conservancy and some structural similarity with "higher" metazoans. Moreover, our study indicates that mobile element insertions play an important role in the evolution of their size. In four sponges 51 snoRNAs were identified. The analysis showed discrepancies between the snoRNA pools of orthologous RPG introns between S. domuncula and A. queenslandica. Furthermore, these two sponges show as much conservancy of RPG intron positions between each other as between themselves and human. Sponges from the Suberites genus show consistency in RPG intron position conservation. However, significant differences in some of the orthologous RPG introns of closely related sponges were observed. This indicates that RPG introns are dynamic even on these shorter evolutionary time scales.

  15. Conservation genomics of threatened animal species.

    PubMed

    Steiner, Cynthia C; Putnam, Andrea S; Hoeck, Paquita E A; Ryder, Oliver A

    2013-01-01

    The genomics era has opened up exciting possibilities in the field of conservation biology by enabling genomic analyses of threatened species that previously were limited to model organisms. Next-generation sequencing (NGS) and the collection of genome-wide data allow for more robust studies of the demographic history of populations and adaptive variation associated with fitness and local adaptation. Genomic analyses can also advance management efforts for threatened wild and captive populations by identifying loci contributing to inbreeding depression and disease susceptibility, and predicting fitness consequences of introgression. However, the development of genomic tools in wild species still carries multiple challenges, particularly those associated with computational and sampling constraints. This review provides an overview of the most significant applications of NGS and the implications and limitations of genomic studies in conservation.

  16. A versatile palindromic amphipathic repeat coding sequence horizontally distributed among diverse bacterial and eucaryotic microbes

    PubMed Central

    2010-01-01

    Background Intragenic tandem repeats occur throughout all domains of life and impart functional and structural variability to diverse translation products. Repeat proteins confer distinctive surface phenotypes to many unicellular organisms, including those with minimal genomes such as the wall-less bacterial monoderms, Mollicutes. One such repeat pattern in this clade is distributed in a manner suggesting its exchange by horizontal gene transfer (HGT). Expanding genome sequence databases reveal the pattern in a widening range of bacteria, and recently among eucaryotic microbes. We examined the genomic flux and consequences of the motif by determining its distribution, predicted structural features and association with membrane-targeted proteins. Results Using a refined hidden Markov model, we document a 25-residue protein sequence motif tandemly arrayed in variable-number repeats in ORFs lacking assigned functions. It appears sporadically in unicellular microbes from disparate bacterial and eucaryotic clades, representing diverse lifestyles and ecological niches that include host parasitic, marine and extreme environments. Tracts of the repeats predict a malleable configuration of recurring domains, with conserved hydrophobic residues forming an amphipathic secondary structure in which hydrophilic residues endow extensive sequence variation. Many ORFs with these domains also have membrane-targeting sequences that predict assorted topologies; others may comprise reservoirs of sequence variants. We demonstrate expressed variants among surface lipoproteins that distinguish closely related animal pathogens belonging to a subgroup of the Mollicutes. DNA sequences encoding the tandem domains display dyad symmetry. Moreover, in some taxa the domains occur in ORFs selectively associated with mobile elements. These features, a punctate phylogenetic distribution, and different patterns of dispersal in genomes of related taxa, suggest that the repeat may be disseminated by HGT and intra-genomic shuffling. Conclusions We describe novel features of PARCELs (Palindromic Amphipathic Repeat Coding ELements), a set of widely distributed repeat protein domains and coding sequences that were likely acquired through HGT by diverse unicellular microbes, further mobilized and diversified within genomes, and co-opted for expression in the membrane proteome of some taxa. Disseminated by multiple gene-centric vehicles, ORFs harboring these elements enhance accessory gene pools as part of the "mobilome" connecting genomes of various clades, in taxa sharing common niches. PMID:20626840

  17. Coordinately Co-opted Multiple Transposable Elements Constitute an Enhancer for wnt5a Expression in the Mammalian Secondary Palate

    PubMed Central

    Kimura-Yoshida, Chiharu; Yan, Kuo; Bormuth, Olga; Ding, Qiong; Nakanishi, Akiko; Sasaki, Takeshi; Hirakawa, Mika; Sumiyama, Kenta; Furuta, Yasuhide; Tarabykin, Victor; Matsuo, Isao; Okada, Norihiro

    2016-01-01

    Acquisition of cis-regulatory elements is a major driving force of evolution, and there are several examples of developmental enhancers derived from transposable elements (TEs). However, it remains unclear whether one enhancer element could have been produced via cooperation among multiple, yet distinct, TEs during evolution. Here we show that an evolutionarily conserved genomic region named AS3_9 comprises three TEs (AmnSINE1, X6b_DNA and MER117), inserted side-by-side, and functions as a distal enhancer for wnt5a expression during morphogenesis of the mammalian secondary palate. Functional analysis of each TE revealed step-by-step retroposition/transposition and co-option together with acquisition of a binding site for Msx1 for its full enhancer function during mammalian evolution. The present study provides a new perspective suggesting that a huge variety of TEs, in combination, could have accelerated the diversity of cis-regulatory elements involved in morphological evolution. PMID:27741242

  18. High copy number of highly similar mariner-like transposons in planarian (Platyhelminthe): evidence for a trans-phyla horizontal transfer.

    PubMed

    Garcia-Fernàndez, J; Bayascas-Ramírez, J R; Marfany, G; Muñoz-Mármol, A M; Casali, A; Baguñà, J; Saló, E

    1995-05-01

    Several DNA sequences similar to the mariner element were isolated and characterized in the platyhelminthe Dugesia (Girardia) tigrina. They were 1,288 bp long, flanked by two 32 bp-inverted repeats, and contained a single 339 amino acid open-reading frame (ORF) encoding the transposase. The number of copies of this element is approximately 8,000 per haploid genome, constituting a member of the middle-repetitive DNA of Dugesia tigrina. Sequence analysis of several elements showed a high percentage of conservation between the different copies. Most of them presented an intact ORF and the standard signals of actively expressed genes, which suggests that some of them are or have recently been functional transposons. The high degree of similarity shared with other mariner elements from some arthropods, together with the fact that this element is undetectable in other planarian species, strongly suggests a case of horizontal transfer between these two distant phyla.

  19. Possible involvement of SINEs in mammalian-specific brain formation

    PubMed Central

    Sasaki, Takeshi; Nishihara, Hidenori; Hirakawa, Mika; Fujimura, Koji; Tanaka, Mikiko; Kokubo, Nobuhiro; Kimura-Yoshida, Chiharu; Matsuo, Isao; Sumiyama, Kenta; Saitou, Naruya; Shimogori, Tomomi; Okada, Norihiro

    2008-01-01

    Retroposons, such as short interspersed elements (SINEs) and long interspersed elements (LINEs), are the major constituents of higher vertebrate genomes. Although there are many examples of retroposons' acquiring function, none has been implicated in the morphological innovations specific to a certain taxonomic group. We previously characterized a SINE family, AmnSINE1, members of which constitute a part of conserved noncoding elements (CNEs) in mammalian genomes. We proposed that this family acquired genomic functionality or was exapted after retropositioning in a mammalian ancestor. Here we identified 53 new AmnSINE1 loci and refined 124 total loci, two of which were further analyzed. Using a mouse enhancer assay, we demonstrate that one SINE locus, AS071, 178 kbp from the gene FGF8 (fibroblast growth factor 8), is an enhancer that recapitulates FGF8 expression in two regions of the developing forebrain, namely the diencephalon and the hypothalamus. Our gain-of-function analysis revealed that FGF8 expression in the diencephalon controls patterning of thalamic nuclei, which act as a relay center of the neocortex, suggesting a role for FGF8 in mammalian-specific forebrain patterning. Furthermore, we demonstrated that the locus, AS021, 392 kbp from the gene SATB2, controls gene expression in the lateral telencephalon, which is thought to be a signaling center during development. These results suggest important roles for SINEs in the development of the mammalian neuronal network, a part of which was initiated with the exaptation of AmnSINE1 in a common mammalian ancestor. PMID:18334644

  20. Possible involvement of SINEs in mammalian-specific brain formation.

    PubMed

    Sasaki, Takeshi; Nishihara, Hidenori; Hirakawa, Mika; Fujimura, Koji; Tanaka, Mikiko; Kokubo, Nobuhiro; Kimura-Yoshida, Chiharu; Matsuo, Isao; Sumiyama, Kenta; Saitou, Naruya; Shimogori, Tomomi; Okada, Norihiro

    2008-03-18

    Retroposons, such as short interspersed elements (SINEs) and long interspersed elements (LINEs), are the major constituents of higher vertebrate genomes. Although there are many examples of retroposons' acquiring function, none has been implicated in the morphological innovations specific to a certain taxonomic group. We previously characterized a SINE family, AmnSINE1, members of which constitute a part of conserved noncoding elements (CNEs) in mammalian genomes. We proposed that this family acquired genomic functionality or was exapted after retropositioning in a mammalian ancestor. Here we identified 53 new AmnSINE1 loci and refined 124 total loci, two of which were further analyzed. Using a mouse enhancer assay, we demonstrate that one SINE locus, AS071, 178 kbp from the gene FGF8 (fibroblast growth factor 8), is an enhancer that recapitulates FGF8 expression in two regions of the developing forebrain, namely the diencephalon and the hypothalamus. Our gain-of-function analysis revealed that FGF8 expression in the diencephalon controls patterning of thalamic nuclei, which act as a relay center of the neocortex, suggesting a role for FGF8 in mammalian-specific forebrain patterning. Furthermore, we demonstrated that the locus, AS021, 392 kbp from the gene SATB2, controls gene expression in the lateral telencephalon, which is thought to be a signaling center during development. These results suggest important roles for SINEs in the development of the mammalian neuronal network, a part of which was initiated with the exaptation of AmnSINE1 in a common mammalian ancestor.

  1. Mobile Bacterial Group II Introns at the Crux of Eukaryotic Evolution

    PubMed Central

    Lambowitz, Alan M.; Belfort, Marlene

    2015-01-01

    SUMMARY This review focuses on recent developments in our understanding of group II intron function, the relationships of these introns to retrotransposons and spliceosomes, and how their common features have informed thinking about bacterial group II introns as key elements in eukaryotic evolution. Reverse transcriptase-mediated and host factor-aided intron retrohoming pathways are considered along with retrotransposition mechanisms to novel sites in bacteria, where group II introns are thought to have originated. DNA target recognition and movement by target-primed reverse transcription infer an evolutionary relationship among group II introns, non-LTR retrotransposons, such as LINE elements, and telomerase. Additionally, group II introns are almost certainly the progenitors of spliceosomal introns. Their profound similarities include splicing chemistry extending to RNA catalysis, reaction stereochemistry, and the position of two divalent metals that perform catalysis at the RNA active site. There are also sequence and structural similarities between group II introns and the spliceosome’s small nuclear RNAs (snRNAs) and between a highly conserved core spliceosomal protein Prp8 and a group II intron-like reverse transcriptase. It has been proposed that group II introns entered eukaryotes during bacterial endosymbiosis or bacterial-archaeal fusion, proliferated within the nuclear genome, necessitating evolution of the nuclear envelope, and fragmented giving rise to spliceosomal introns. Thus, these bacterial self-splicing mobile elements have fundamentally impacted the composition of extant eukaryotic genomes, including the human genome, most of which is derived from close relatives of mobile group II introns. PMID:25878921

  2. The complete mitochondrial genome of the pink stem borer, Sesamia inferens, in comparison with four other Noctuid moths.

    PubMed

    Chai, Huan-Na; Du, Yu-Zhou

    2012-01-01

    The complete 15,413-bp mitochondrial genome (mitogenome) of Sesamia inferens (Walker) (Lepidoptera: Noctuidae) was sequenced and compared with those of four other noctuid moths. All of the mitogenomes analyzed displayed similar characteristics with respect to gene content, genome organization, nucleotide comparison, and codon usages. Twelve-one protein-coding genes (PCGs) utilized the standard ATN, but the cox1 gene used CGA as the initiation codon; cox1, cox2, and nad4 genes had the truncated termination codon T in the S. inferens mitogenome. All of the tRNA genes had typical cloverleaf secondary structures except for trnS1(AGN), in which the dihydrouridine (DHU) arm did not form a stable stem-loop structure. Both the secondary structures of rrnL and rrnS genes inferred from the S. inferens mitogenome closely resembled those of other noctuid moths. In the A+T-rich region, the conserved motif "ATAGA" followed by a long T-stretch was observed in all noctuid moths, but other specific tandem-repeat elements were more variable. Additionally, the S. inferens mitogenome contained a potential stem-loop structure, a duplicated 17-bp repeat element, a decuplicated segment, and a microsatellite "(AT)(7)", without a poly-A element upstream of the trnM in the A+T-rich region. Finally, the phylogenetic relationships were reconstructed based on amino acid sequences of mitochondrial 13 PCGs, which support the traditional morphologically based view of relationships within the Noctuidae.

  3. The Complete Mitochondrial Genome of the Pink Stem Borer, Sesamia inferens, in Comparison with Four Other Noctuid Moths

    PubMed Central

    Chai, Huan-Na; Du, Yu-Zhou

    2012-01-01

    The complete 15,413-bp mitochondrial genome (mitogenome) of Sesamia inferens (Walker) (Lepidoptera: Noctuidae) was sequenced and compared with those of four other noctuid moths. All of the mitogenomes analyzed displayed similar characteristics with respect to gene content, genome organization, nucleotide comparison, and codon usages. Twelve-one protein-coding genes (PCGs) utilized the standard ATN, but the cox1 gene used CGA as the initiation codon; cox1, cox2, and nad4 genes had the truncated termination codon T in the S. inferens mitogenome. All of the tRNA genes had typical cloverleaf secondary structures except for trnS1(AGN), in which the dihydrouridine (DHU) arm did not form a stable stem-loop structure. Both the secondary structures of rrnL and rrnS genes inferred from the S. inferens mitogenome closely resembled those of other noctuid moths. In the A+T-rich region, the conserved motif “ATAGA” followed by a long T-stretch was observed in all noctuid moths, but other specific tandem-repeat elements were more variable. Additionally, the S. inferens mitogenome contained a potential stem-loop structure, a duplicated 17-bp repeat element, a decuplicated segment, and a microsatellite “(AT)7”, without a poly-A element upstream of the trnM in the A+T-rich region. Finally, the phylogenetic relationships were reconstructed based on amino acid sequences of mitochondrial 13 PCGs, which support the traditional morphologically based view of relationships within the Noctuidae. PMID:22949858

  4. Evolutionary diversity and potential recombinogenic role of integration targets of non-LTR retrotransposons

    PubMed Central

    Gentles, Andrew J.; Kohany, Oleksiy; Jurka, Jerzy

    2005-01-01

    Short interspersed elements (SINEs) make up a significant fraction of total DNA in mammalian genomes, providing a rich substrate for chromosomal rearrangements by SINE-SINE recombinations. Proliferation of mammalian SINEs is mediated primarily by LINE1 (L1) non-LTR retrotransposons that preferentially integrate at DNA sequence targets with average length ~15 bp and containing conserved endonucleolytic nicking signals at both ends. We report that sequence variations in the first of the two nicking signals, represented by a 5′TT-AAAA consensus sequence, affect the position of the second signal thus leading to target site duplications (TSDs) of different lengths. The length distribution of TSDs appears to be affected also by L1-encoded enzyme variants, since targets with the same 5′ nicking site can be of different average length in different mammalian species. Taking this into account, we re-analyzed the second nicking site and found that it is larger and includes more conserved sites than previously appreciated, with a consensus of 5′ANTNTN-AA. We also studied potential involvement of the nicking sites in stimulating recombinations between SINE elements. We determined that SINE elements retaining TSDs with perfect 5′TT-AAAA nicking sites appear to be lost relatively rapidly from the human and rat genomes, and less rapidly from dog. We speculate that the introduction of single-strand DNA breaks induced by recurring endonucleolytic attacks at these sites, combined with the ubiquitousness of SINEs, may significantly promote recombination between repetitive elements, leading to the observed losses. At the same time new L1 subfamilies may be selected for “incompatibility” with pre-existing targets. This provides a possible driving force for the continual emergence of new L1 subfamilies which, in turn, may affect selection of L1-dependent SINE subfamilies. PMID:15944437

  5. The Drosophila Translational Control Element (TCE) Is Required for High-Level Transcription of Many Genes That Are Specifically Expressed in Testes

    PubMed Central

    Anderson, Ashley K.; Ohler, Uwe; Wassarman, David A.

    2012-01-01

    To investigate the importance of core promoter elements for tissue-specific transcription of RNA polymerase II genes, we examined testis-specific transcription in Drosophila melanogaster. Bioinformatic analyses of core promoter sequences from 190 genes that are specifically expressed in testes identified a 10 bp A/T-rich motif that is identical to the translational control element (TCE). The TCE functions in the 5′ untranslated region of Mst(3)CGP mRNAs to repress translation, and it also functions in a heterologous gene to regulate transcription. We found that among genes with focused initiation patterns, the TCE is significantly enriched in core promoters of genes that are specifically expressed in testes but not in core promoters of genes that are specifically expressed in other tissues. The TCE is variably located in core promoters and is conserved in melanogaster subgroup species, but conservation dramatically drops in more distant species. In transgenic flies, short (300–400 bp) genomic regions containing a TCE directed testis-specific transcription of a reporter gene. Mutation of the TCE significantly reduced but did not abolish reporter gene transcription indicating that the TCE is important but not essential for transcription activation. Finally, mutation of testis-specific TFIID (tTFIID) subunits significantly reduced the transcription of a subset of endogenous TCE-containing but not TCE-lacking genes, suggesting that tTFIID activity is limited to TCE-containing genes but that tTFIID is not an obligatory regulator of TCE-containing genes. Thus, the TCE is a core promoter element in a subset of genes that are specifically expressed in testes. Furthermore, the TCE regulates transcription in the context of short genomic regions, from variable locations in the core promoter, and both dependently and independently of tTFIID. These findings set the stage for determining the mechanism by which the TCE regulates testis-specific transcription and understanding the dual role of the TCE in translational and transcriptional regulation. PMID:22984601

  6. The Drosophila Translational Control Element (TCE) is required for high-level transcription of many genes that are specifically expressed in testes.

    PubMed

    Katzenberger, Rebeccah J; Rach, Elizabeth A; Anderson, Ashley K; Ohler, Uwe; Wassarman, David A

    2012-01-01

    To investigate the importance of core promoter elements for tissue-specific transcription of RNA polymerase II genes, we examined testis-specific transcription in Drosophila melanogaster. Bioinformatic analyses of core promoter sequences from 190 genes that are specifically expressed in testes identified a 10 bp A/T-rich motif that is identical to the translational control element (TCE). The TCE functions in the 5' untranslated region of Mst(3)CGP mRNAs to repress translation, and it also functions in a heterologous gene to regulate transcription. We found that among genes with focused initiation patterns, the TCE is significantly enriched in core promoters of genes that are specifically expressed in testes but not in core promoters of genes that are specifically expressed in other tissues. The TCE is variably located in core promoters and is conserved in melanogaster subgroup species, but conservation dramatically drops in more distant species. In transgenic flies, short (300-400 bp) genomic regions containing a TCE directed testis-specific transcription of a reporter gene. Mutation of the TCE significantly reduced but did not abolish reporter gene transcription indicating that the TCE is important but not essential for transcription activation. Finally, mutation of testis-specific TFIID (tTFIID) subunits significantly reduced the transcription of a subset of endogenous TCE-containing but not TCE-lacking genes, suggesting that tTFIID activity is limited to TCE-containing genes but that tTFIID is not an obligatory regulator of TCE-containing genes. Thus, the TCE is a core promoter element in a subset of genes that are specifically expressed in testes. Furthermore, the TCE regulates transcription in the context of short genomic regions, from variable locations in the core promoter, and both dependently and independently of tTFIID. These findings set the stage for determining the mechanism by which the TCE regulates testis-specific transcription and understanding the dual role of the TCE in translational and transcriptional regulation.

  7. Traffic at the tmRNA Gene

    PubMed Central

    Williams, Kelly P.

    2003-01-01

    A partial screen for genetic elements integrated into completely sequenced bacterial genomes shows more significant bias in specificity for the tmRNA gene (ssrA) than for any type of tRNA gene. Horizontal gene transfer, a major avenue of bacterial evolution, was assessed by focusing on elements using this single attachment locus. Diverse elements use ssrA; among enterobacteria alone, at least four different integrase subfamilies have independently evolved specificity for ssrA, and almost every strain analyzed presents a unique set of integrated elements. Even elements using essentially the same integrase can be very diverse, as is a group with an ssrA-specific integrase of the P4 subfamily. This same integrase appears to promote damage routinely at attachment sites, which may be adaptive. Elements in arrays can recombine; one such event mediated by invertible DNA segments within neighboring elements likely explains the monophasic nature of Salmonella enterica serovar Typhi. One of a limited set of conserved sequences occurs at the attachment site of each enterobacterial element, apparently serving as a transcriptional terminator for ssrA. Elements were usually found integrated into tRNA-like sequence at the 3′ end of ssrA, at subsites corresponding to those used in tRNA genes; an exception was found at the non-tRNA-like 3′ end produced by ssrA gene permutation in cyanobacteria, suggesting that, during the evolution of new site specificity by integrases, tropism toward a conserved 3′ end of an RNA gene may be as strong as toward a tRNA-like sequence. The proximity of ssrA and smpB, which act in concert, was also surveyed. PMID:12533482

  8. Comparative genomics of Lupinus angustifolius gene-rich regions: BAC library exploration, genetic mapping and cytogenetics

    PubMed Central

    2013-01-01

    Background The narrow-leafed lupin, Lupinus angustifolius L., is a grain legume species with a relatively compact genome. The species has 2n = 40 chromosomes and its genome size is 960 Mbp/1C. During the last decade, L. angustifolius genomic studies have achieved several milestones, such as molecular-marker development, linkage maps, and bacterial artificial chromosome (BAC) libraries. Here, these resources were integratively used to identify and sequence two gene-rich regions (GRRs) of the genome. Results The genome was screened with a probe representing the sequence of a microsatellite fragment length polymorphism (MFLP) marker linked to Phomopsis stem blight resistance. BAC clones selected by hybridization were subjected to restriction fingerprinting and contig assembly, and 232 BAC-ends were sequenced and annotated. BAC fluorescence in situ hybridization (BAC-FISH) identified eight single-locus clones. Based on physical mapping, cytogenetic localization, and BAC-end annotation, five clones were chosen for sequencing. Within the sequences of clones that hybridized in FISH to a single-locus, two large GRRs were identified. The GRRs showed strong and conserved synteny to Glycine max duplicated genome regions, illustrated by both identical gene order and parallel orientation. In contrast, in the clones with dispersed FISH signals, more than one-third of sequences were transposable elements. Sequenced, single-locus clones were used to develop 12 genetic markers, increasing the number of L. angustifolius chromosomes linked to appropriate linkage groups by five pairs. Conclusions In general, probes originating from MFLP sequences can assist genome screening and gene discovery. However, such probes are not useful for positional cloning, because they tend to hybridize to numerous loci. GRRs identified in L. angustifolius contained a low number of interspersed repeats and had a high level of synteny to the genome of the model legume G. max. Our results showed that not only was the gene nucleotide sequence conserved between soybean and lupin GRRs, but the order and orientation of particular genes in syntenic blocks was homologous, as well. These findings will be valuable to the forthcoming sequencing of the lupin genome. PMID:23379841

  9. Toward Understanding Phage:Host Interactions in the Rumen; Complete Genome Sequences of Lytic Phages Infecting Rumen Bacteria

    PubMed Central

    Gilbert, Rosalind A.; Kelly, William J.; Altermann, Eric; Leahy, Sinead C.; Minchin, Catherine; Ouwerkerk, Diane; Klieve, Athol V.

    2017-01-01

    The rumen is known to harbor dense populations of bacteriophages (phages) predicted to be capable of infecting a diverse range of rumen bacteria. While bacterial genome sequencing projects are revealing the presence of phages which can integrate their DNA into the genome of their host to form stable, lysogenic associations, little is known of the genetics of phages which utilize lytic replication. These phages infect and replicate within the host, culminating in host lysis, and the release of progeny phage particles. While lytic phages for rumen bacteria have been previously isolated, their genomes have remained largely uncharacterized. Here we report the first complete genome sequences of lytic phage isolates specifically infecting three genera of rumen bacteria: Bacteroides, Ruminococcus, and Streptococcus. All phages were classified within the viral order Caudovirales and include two phage morphotypes, representative of the Siphoviridae and Podoviridae families. The phage genomes displayed modular organization and conserved viral genes were identified which enabled further classification and determination of closest phage relatives. Co-examination of bacterial host genomes led to the identification of several genes responsible for modulating phage:host interactions, including CRISPR/Cas elements and restriction-modification phage defense systems. These findings provide new genetic information and insights into how lytic phages may interact with bacteria of the rumen microbiome. PMID:29259581

  10. Draft genome sequence of bitter gourd (Momordica charantia), a vegetable and medicinal plant in tropical and subtropical regions.

    PubMed

    Urasaki, Naoya; Takagi, Hiroki; Natsume, Satoshi; Uemura, Aiko; Taniai, Naoki; Miyagi, Norimichi; Fukushima, Mai; Suzuki, Shouta; Tarora, Kazuhiko; Tamaki, Moritoshi; Sakamoto, Moriaki; Terauchi, Ryohei; Matsumura, Hideo

    2017-02-01

    Bitter gourd (Momordica charantia) is an important vegetable and medicinal plant in tropical and subtropical regions globally. In this study, the draft genome sequence of a monoecious bitter gourd inbred line, OHB3-1, was analyzed. Through Illumina sequencing and de novo assembly, scaffolds of 285.5 Mb in length were generated, corresponding to ∼84% of the estimated genome size of bitter gourd (339 Mb). In this draft genome sequence, 45,859 protein-coding gene loci were identified, and transposable elements accounted for 15.3% of the whole genome. According to synteny mapping and phylogenetic analysis of conserved genes, bitter gourd was more related to watermelon (Citrullus lanatus) than to cucumber (Cucumis sativus) or melon (C. melo). Using RAD-seq analysis, 1507 marker loci were genotyped in an F2 progeny of two bitter gourd lines, resulting in an improved linkage map, comprising 11 linkage groups. By anchoring RAD tag markers, 255 scaffolds were assigned to the linkage map. Comparative analysis of genome sequences and predicted genes determined that putative trypsin-inhibitor and ribosome-inactivating genes were distinctive in the bitter gourd genome. These genes could characterize the bitter gourd as a medicinal plant. © The Author 2016. Published by Oxford University Press on behalf of Kazusa DNA Research Institute.

  11. Insights into origin and evolution of α-proteobacterial gene transfer agents

    PubMed Central

    Shakya, Migun; Soucy, Shannon M

    2017-01-01

    Abstract Several bacterial and archaeal lineages produce nanostructures that morphologically resemble small tailed viruses, but, unlike most viruses, contain apparently random pieces of the host genome. Since these elements can deliver the packaged DNA to other cells, they were dubbed gene transfer agents (GTAs). Because many genes involved in GTA production have viral homologs, it has been hypothesized that the GTA ancestor was a virus. Whether GTAs represent an atypical virus, a defective virus, or a virus co-opted by the prokaryotes for some function, remains to be elucidated. To evaluate these possibilities, we examined the distribution and evolutionary histories of genes that encode a GTA in the α-proteobacterium Rhodobacter capsulatus (RcGTA). We report that although homologs of many individual RcGTA genes are abundant across bacteria and their viruses, RcGTA-like genomes are mainly found in one subclade of α-proteobacteria. When compared with the viral homologs, genes of the RcGTA-like genomes evolve significantly slower, and do not have higher %A+T nucleotides than their host chromosomes. Moreover, they appear to reside in stable regions of the bacterial chromosomes that are generally conserved across taxonomic orders. These findings argue against RcGTA being an atypical or a defective virus. Our phylogenetic analyses suggest that RcGTA ancestor likely originated in the lineage that gave rise to contemporary α-proteobacterial orders Rhizobiales, Rhodobacterales, Caulobacterales, Parvularculales, and Sphingomonadales, and since that time the RcGTA-like element has co-evolved with its host chromosomes. Such evolutionary history is compatible with maintenance of these elements by bacteria due to some selective advantage. As for many other prokaryotic traits, horizontal gene transfer played a substantial role in the evolution of RcGTA-like elements, not only in shaping its genome components within the orders, but also in occasional dissemination of RcGTA-like regions across the orders and even to different bacterial phyla. PMID:29250433

  12. Insights into origin and evolution of α-proteobacterial gene transfer agents.

    PubMed

    Shakya, Migun; Soucy, Shannon M; Zhaxybayeva, Olga

    2017-07-01

    Several bacterial and archaeal lineages produce nanostructures that morphologically resemble small tailed viruses, but, unlike most viruses, contain apparently random pieces of the host genome. Since these elements can deliver the packaged DNA to other cells, they were dubbed gene transfer agents (GTAs). Because many genes involved in GTA production have viral homologs, it has been hypothesized that the GTA ancestor was a virus. Whether GTAs represent an atypical virus, a defective virus, or a virus co-opted by the prokaryotes for some function, remains to be elucidated. To evaluate these possibilities, we examined the distribution and evolutionary histories of genes that encode a GTA in the α-proteobacterium Rhodobacter capsulatus (RcGTA). We report that although homologs of many individual RcGTA genes are abundant across bacteria and their viruses, RcGTA-like genomes are mainly found in one subclade of α-proteobacteria. When compared with the viral homologs, genes of the RcGTA-like genomes evolve significantly slower, and do not have higher %A+T nucleotides than their host chromosomes. Moreover, they appear to reside in stable regions of the bacterial chromosomes that are generally conserved across taxonomic orders. These findings argue against RcGTA being an atypical or a defective virus. Our phylogenetic analyses suggest that RcGTA ancestor likely originated in the lineage that gave rise to contemporary α-proteobacterial orders Rhizobiales , Rhodobacterales , Caulobacterales , Parvularculales , and Sphingomonadales , and since that time the RcGTA-like element has co-evolved with its host chromosomes. Such evolutionary history is compatible with maintenance of these elements by bacteria due to some selective advantage. As for many other prokaryotic traits, horizontal gene transfer played a substantial role in the evolution of RcGTA-like elements, not only in shaping its genome components within the orders, but also in occasional dissemination of RcGTA-like regions across the orders and even to different bacterial phyla.

  13. Variability among the Most Rapidly Evolving Plastid Genomic Regions is Lineage-Specific: Implications of Pairwise Genome Comparisons in Pyrus (Rosaceae) and Other Angiosperms for Marker Choice

    PubMed Central

    Ter-Voskanyan, Hasmik; Allgaier, Martin; Borsch, Thomas

    2014-01-01

    Plastid genomes exhibit different levels of variability in their sequences, depending on the respective kinds of genomic regions. Genes are usually more conserved while noncoding introns and spacers evolve at a faster pace. While a set of about thirty maximum variable noncoding genomic regions has been suggested to provide universally promising phylogenetic markers throughout angiosperms, applications often require several regions to be sequenced for many individuals. Our project aims to illuminate evolutionary relationships and species-limits in the genus Pyrus (Rosaceae)—a typical case with very low genetic distances between taxa. In this study, we have sequenced the plastid genome of Pyrus spinosa and aligned it to the already available P. pyrifolia sequence. The overall p-distance of the two Pyrus genomes was 0.00145. The intergenic spacers between ndhC–trnV, trnR–atpA, ndhF–rpl32, psbM–trnD, and trnQ–rps16 were the most variable regions, also comprising the highest total numbers of substitutions, indels and inversions (potentially informative characters). Our comparative analysis of further plastid genome pairs with similar low p-distances from Oenothera (representing another rosid), Olea (asterids) and Cymbidium (monocots) showed in each case a different ranking of genomic regions in terms of variability and potentially informative characters. Only two intergenic spacers (ndhF–rpl32 and trnK–rps16) were consistently found among the 30 top-ranked regions. We have mapped the occurrence of substitutions and microstructural mutations in the four genome pairs. High AT content in specific sequence elements seems to foster frequent mutations. We conclude that the variability among the fastest evolving plastid genomic regions is lineage-specific and thus cannot be precisely predicted across angiosperms. The often lineage-specific occurrence of stem-loop elements in the sequences of introns and spacers also governs lineage-specific mutations. Sequencing whole plastid genomes to find markers for evolutionary analyses is therefore particularly useful when overall genetic distances are low. PMID:25405773

  14. A Chromatin Insulator-Like Element in the Herpes Simplex Virus Type 1 Latency-Associated Transcript Region Binds CCCTC-Binding Factor and Displays Enhancer-Blocking and Silencing Activities

    PubMed Central

    Amelio, Antonio L.; McAnany, Peterjon K.; Bloom, David C.

    2006-01-01

    A previous study demonstrated that the latency-associated transcript (LAT) promoter and the LAT enhancer/reactivation critical region (rcr) are enriched in acetyl histone H3 (K9, K14) during herpes simplex virus type 1 (HSV-1) latency, whereas all lytic genes analyzed (ICP0, UL54, ICP4, and DNA polymerase) are not (N. J. Kubat, R. K. Tran, P. McAnany, and D. C. Bloom, J. Virol. 78:1139-1149, 2004). This suggests that the HSV-1 latent genome is organized into histone H3 (K9, K14) hyperacetylated and hypoacetylated regions corresponding to transcriptionally permissive and transcriptionally repressed chromatin domains, respectively. Such an organization implies that chromatin insulators, similar to those of cellular chromosomes, may separate distinct transcriptional domains of the HSV-1 latent genome. In the present study, we sought to identify cis elements that could partition the HSV-1 genome into distinct chromatin domains. Sequence analysis coupled with chromatin immunoprecipitation and luciferase reporter assays revealed that (i) the long and short repeats and the unique-short region of the HSV-1 genome contain clustered CTCF (CCCTC-binding factor) motifs, (ii) CTCF motif clusters similar to those in HSV-1 are conserved in other alphaherpesviruses, (iii) CTCF binds to these motifs on latent HSV-1 genomes in vivo, and (iv) a 1.5-kb region containing the CTCF motif cluster in the LAT region possesses insulator activities, specifically, enhancer blocking and silencing. The finding that CTCF, a cellular protein associated with chromatin insulators, binds to motifs on the latent genome and insulates the LAT enhancer suggests that CTCF may facilitate the formation of distinct chromatin boundaries during herpesvirus latency. PMID:16474142

  15. The F-box family genes as key elements in response to salt, heavy mental, and drought stresses in Medicago truncatula.

    PubMed

    Song, Jian Bo; Wang, Yan Xiang; Li, Hai Bo; Li, Bo Wen; Zhou, Zhao Sheng; Gao, Shuai; Yang, Zhi Min

    2015-07-01

    F-box protein is a subunit of Skp1-Rbx1-Cul1-F-box protein (SCF) complex with typically conserved F-box motifs of approximately 40 amino acids and is one of the largest protein families in eukaryotes. F-box proteins play critical roles in selective and specific protein degradation through the 26S proteasome. In this study, we bioinformatically identified 972 putative F-box proteins from Medicago truncatula genome. Our analysis showed that in addition to the conserved motif, the F-box proteins have several other functional domains in their C-terminal regions (e.g., LRRs, Kelch, FBA, and PP2), some of which were found to be M. truncatula species-specific. By phylogenetic analysis of the F-box motifs, these proteins can be classified into three major families, and each family can be further grouped into more subgroups. Analysis of the genomic distribution of F-box genes on M. truncatula chromosomes revealed that the evolutional expansion of F-box genes in M. truncatula was probably due to localized gene duplications. To investigate the possible response of the F-box genes to abiotic stresses, both publicly available and customer-prepared microarrays were analyzed. Most of the F-box protein genes can be responding to salt and heavy metal stresses. Real-time PCR analysis confirmed that some of the F-box protein genes containing heat, drought, salicylic acid, and abscisic acid responsive cis-elements were able to respond to the abiotic stresses.

  16. Pangenome Analysis of Burkholderia pseudomallei: Genome Evolution Preserves Gene Order despite High Recombination Rates.

    PubMed

    Spring-Pearson, Senanu M; Stone, Joshua K; Doyle, Adina; Allender, Christopher J; Okinaka, Richard T; Mayo, Mark; Broomall, Stacey M; Hill, Jessica M; Karavis, Mark A; Hubbard, Kyle S; Insalaco, Joseph M; McNew, Lauren A; Rosenzweig, C Nicole; Gibbons, Henry S; Currie, Bart J; Wagner, David M; Keim, Paul; Tuanyok, Apichai

    2015-01-01

    The pangenomic diversity in Burkholderia pseudomallei is high, with approximately 5.8% of the genome consisting of genomic islands. Genomic islands are known hotspots for recombination driven primarily by site-specific recombination associated with tRNAs. However, recombination rates in other portions of the genome are also high, a feature we expected to disrupt gene order. We analyzed the pangenome of 37 isolates of B. pseudomallei and demonstrate that the pangenome is 'open', with approximately 136 new genes identified with each new genome sequenced, and that the global core genome consists of 4568±16 homologs. Genes associated with metabolism were statistically overrepresented in the core genome, and genes associated with mobile elements, disease, and motility were primarily associated with accessory portions of the pangenome. The frequency distribution of genes present in between 1 and 37 of the genomes analyzed matches well with a model of genome evolution in which 96% of the genome has very low recombination rates but 4% of the genome recombines readily. Using homologous genes among pairs of genomes, we found that gene order was highly conserved among strains, despite the high recombination rates previously observed. High rates of gene transfer and recombination are incompatible with retaining gene order unless these processes are either highly localized to specific sites within the genome, or are characterized by symmetrical gene gain and loss. Our results demonstrate that both processes occur: localized recombination introduces many new genes at relatively few sites, and recombination throughout the genome generates the novel multi-locus sequence types previously observed while preserving gene order.

  17. Sequencing, de novo assembling, and annotating the genome of the endangered Chinese crocodile lizard Shinisaurus crocodilurus.

    PubMed

    Gao, Jian; Li, Qiye; Wang, Zongji; Zhou, Yang; Martelli, Paolo; Li, Fang; Xiong, Zijun; Wang, Jian; Yang, Huanming; Zhang, Guojie

    2017-07-01

    The Chinese crocodile lizard, Shinisaurus crocodilurus, is the only living representative of the monotypic family Shinisauridae under the order Squamata. It is an obligate semi-aquatic, viviparous, diurnal species restricted to specific portions of mountainous locations in southwestern China and northeastern Vietnam. However, in the past several decades, this species has undergone a rapid decrease in population size due to illegal poaching and habitat disruption, making this unique reptile species endangered and listed in the Convention on International Trade in Endangered Species of Wild Fauna and Flora Appendix II since 1990. A proposal to uplist it to Appendix I was passed at the Convention on International Trade in Endangered Species of Wild Fauna and Flora Seventeenth meeting of the Conference of the Parties in 2016. To promote the conservation of this species, we sequenced the genome of a male Chinese crocodile lizard using a whole-genome shotgun strategy on the Illumina HiSeq 2000 platform. In total, we generated ∼291 Gb of raw sequencing data (×149 depth) from 13 libraries with insert sizes ranging from 250 bp to 40 kb. After filtering for polymerase chain reaction-duplicated and low-quality reads, ∼137 Gb of clean data (×70 depth) were obtained for genome assembly. We yielded a draft genome assembly with a total length of 2.24 Gb and an N50 scaffold size of 1.47 Mb. The assembled genome was predicted to contain 20 150 protein-coding genes and up to 1114 Mb (49.6%) of repetitive elements. The genomic resource of the Chinese crocodile lizard will contribute to deciphering the biology of this organism and provides an essential tool for conservation efforts. It also provides a valuable resource for future study of squamate evolution. © The Authors 2017. Published by Oxford University Press.

  18. A comparative genomics strategy for targeted discovery of single-nucleotide polymorphisms and conserved-noncoding sequences in orphan crops.

    PubMed

    Feltus, F A; Singh, H P; Lohithaswa, H C; Schulze, S R; Silva, T D; Paterson, A H

    2006-04-01

    Completed genome sequences provide templates for the design of genome analysis tools in orphan species lacking sequence information. To demonstrate this principle, we designed 384 PCR primer pairs to conserved exonic regions flanking introns, using Sorghum/Pennisetum expressed sequence tag alignments to the Oryza genome. Conserved-intron scanning primers (CISPs) amplified single-copy loci at 37% to 80% success rates in taxa that sample much of the approximately 50-million years of Poaceae divergence. While the conserved nature of exons fostered cross-taxon amplification, the lesser evolutionary constraints on introns enhanced single-nucleotide polymorphism detection. For example, in eight rice (Oryza sativa) genotypes, polymorphism averaged 12.1 per kb in introns but only 3.6 per kb in exons. Curiously, among 124 CISPs evaluated across Oryza, Sorghum, Pennisetum, Cynodon, Eragrostis, Zea, Triticum, and Hordeum, 23 (18.5%) seemed to be subject to rigid intron size constraints that were independent of per-nucleotide DNA sequence variation. Furthermore, we identified 487 conserved-noncoding sequence motifs in 129 CISP loci. A large CISP set (6,062 primer pairs, amplifying introns from 1,676 genes) designed using an automated pipeline showed generally higher abundance in recombinogenic than in nonrecombinogenic regions of the rice genome, thus providing relatively even distribution along genetic maps. CISPs are an effective means to explore poorly characterized genomes for both DNA polymorphism and noncoding sequence conservation on a genome-wide or candidate gene basis, and also provide anchor points for comparative genomics across a diverse range of species.

  19. A Comparative Genomics Strategy for Targeted Discovery of Single-Nucleotide Polymorphisms and Conserved-Noncoding Sequences in Orphan Crops1[W

    PubMed Central

    Feltus, F.A.; Singh, H.P.; Lohithaswa, H.C.; Schulze, S.R.; Silva, T.D.; Paterson, A.H.

    2006-01-01

    Completed genome sequences provide templates for the design of genome analysis tools in orphan species lacking sequence information. To demonstrate this principle, we designed 384 PCR primer pairs to conserved exonic regions flanking introns, using Sorghum/Pennisetum expressed sequence tag alignments to the Oryza genome. Conserved-intron scanning primers (CISPs) amplified single-copy loci at 37% to 80% success rates in taxa that sample much of the approximately 50-million years of Poaceae divergence. While the conserved nature of exons fostered cross-taxon amplification, the lesser evolutionary constraints on introns enhanced single-nucleotide polymorphism detection. For example, in eight rice (Oryza sativa) genotypes, polymorphism averaged 12.1 per kb in introns but only 3.6 per kb in exons. Curiously, among 124 CISPs evaluated across Oryza, Sorghum, Pennisetum, Cynodon, Eragrostis, Zea, Triticum, and Hordeum, 23 (18.5%) seemed to be subject to rigid intron size constraints that were independent of per-nucleotide DNA sequence variation. Furthermore, we identified 487 conserved-noncoding sequence motifs in 129 CISP loci. A large CISP set (6,062 primer pairs, amplifying introns from 1,676 genes) designed using an automated pipeline showed generally higher abundance in recombinogenic than in nonrecombinogenic regions of the rice genome, thus providing relatively even distribution along genetic maps. CISPs are an effective means to explore poorly characterized genomes for both DNA polymorphism and noncoding sequence conservation on a genome-wide or candidate gene basis, and also provide anchor points for comparative genomics across a diverse range of species. PMID:16607031

  20. Biology of Three ICE Families: SXT/R391, ICEBs1, and ICESt1/ICESt3.

    PubMed

    Carraro, Nicolas; Burrus, Vincent

    2014-12-01

    Integrative and Conjugative Elements (ICEs) are bacterial mobile genetic elements that play a key role in bacterial genomes dynamics and evolution. ICEs are widely distributed among virtually all bacterial genera. Recent extensive studies have unraveled their high diversity and complexity. The present review depicts the general conserved features of ICEs and describes more precisely three major families of ICEs that have been extensively studied in the past decade for their biology, their evolution and their impact on genomes dynamics. First, the large SXT/R391 family of ICEs disseminates antibiotic resistance genes and drives the exchange of mobilizable genomic islands (MGIs) between many enteric pathogens such as Vibrio cholerae. Second, ICEBs1 of Bacillus subtilis is the most well understood ICE of Gram-positive bacteria, notably regarding the regulation of its dissemination and its initially unforeseen extrachromosomal replication, which could be a common feature of ICEs of both Gram-positive and Gram-negative bacteria. Finally, ICESt1 and ICESt3 of Streptococcus thermophilus are the prototypes of a large family of ICEs widely distributed among various streptococci. These ICEs carry an original regulation module that associates regulators related to those of both SXT/R391 and ICEBs1. Study of ICESt1 and ICESt3 uncovered the cis-mobilization of related genomic islands (CIMEs) by a mechanism called accretion-mobilization, which likely represents a paradigm for the evolution of many ICEs and genomic islands. These three major families of ICEs give a glimpse about ICEs dynamics and their high impact on bacterial adaptation.

  1. Diverse Lifestyles and Strategies of Plant Pathogenesis Encoded in the Genomes of Eighteen Dothideomycetes Fungi

    PubMed Central

    Ohm, Robin A.; Feau, Nicolas; Henrissat, Bernard; Schoch, Conrad L.; Horwitz, Benjamin A.; Barry, Kerrie W.; Condon, Bradford J.; Copeland, Alex C.; Dhillon, Braham; Glaser, Fabian; Hesse, Cedar N.; Kosti, Idit; LaButti, Kurt; Lindquist, Erika A.; Lucas, Susan; Salamov, Asaf A.; Bradshaw, Rosie E.; Ciuffetti, Lynda; Hamelin, Richard C.; Kema, Gert H. J.; Lawrence, Christopher; Scott, James A.; Spatafora, Joseph W.; Turgeon, B. Gillian; de Wit, Pierre J. G. M.; Zhong, Shaobin; Goodwin, Stephen B.; Grigoriev, Igor V.

    2012-01-01

    The class Dothideomycetes is one of the largest groups of fungi with a high level of ecological diversity including many plant pathogens infecting a broad range of hosts. Here, we compare genome features of 18 members of this class, including 6 necrotrophs, 9 (hemi)biotrophs and 3 saprotrophs, to analyze genome structure, evolution, and the diverse strategies of pathogenesis. The Dothideomycetes most likely evolved from a common ancestor more than 280 million years ago. The 18 genome sequences differ dramatically in size due to variation in repetitive content, but show much less variation in number of (core) genes. Gene order appears to have been rearranged mostly within chromosomal boundaries by multiple inversions, in extant genomes frequently demarcated by adjacent simple repeats. Several Dothideomycetes contain one or more gene-poor, transposable element (TE)-rich putatively dispensable chromosomes of unknown function. The 18 Dothideomycetes offer an extensive catalogue of genes involved in cellulose degradation, proteolysis, secondary metabolism, and cysteine-rich small secreted proteins. Ancestors of the two major orders of plant pathogens in the Dothideomycetes, the Capnodiales and Pleosporales, may have had different modes of pathogenesis, with the former having fewer of these genes than the latter. Many of these genes are enriched in proximity to transposable elements, suggesting faster evolution because of the effects of repeat induced point (RIP) mutations. A syntenic block of genes, including oxidoreductases, is conserved in most Dothideomycetes and upregulated during infection in L. maculans, suggesting a possible function in response to oxidative stress. PMID:23236275

  2. Diverse Lifestyles and Strategies of Plant Pathogenesis Encoded in the Genomes of Eighteen Dothideomycetes Fungi

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ohm, Robin A.; Feau, Nicolas; Henrissat, Bernard

    The class Dothideomycetes is one of the largest groups of fungi with a high level of ecological diversity including many plant pathogens infecting a broad range of hosts. Here, we compare genome features of 18 members of this class, including 6 necrotrophs, 9 (hemi)biotrophs and 3 saprotrophs, to analyze genome structure, evolution, and the diverse strategies of pathogenesis. The Dothideomycetes most likely evolved from a common ancestor more than 280 million years ago. The 18 genome sequences differ dramatically in size due to variation in repetitive content, but show much less variation in number of (core) genes. Gene order appearsmore » to have been rearranged mostly within chromosomal boundaries by multiple inversions, in extant genomes frequently demarcated by adjacent simple repeats. Several Dothideomycetes contain one or more gene-poor, transposable element (TE)-rich putatively dispensable chromosomes of unknown function. The 18 Dothideomycetes offer an extensive catalogue of genes involved in cellulose degradation, proteolysis, secondary metabolism, and cysteine-rich small secreted proteins. Ancestors of the two major orders of plant pathogens in the Dothideomycetes, the Capnodiales and Pleosporales, may have had different modes of pathogenesis, with the former having fewer of these genes than the latter. Many of these genes are enriched in proximity to transposable elements, suggesting faster evolution because of the effects of repeat induced point (RIP) mutations. A syntenic block of genes, including oxidoreductases, is conserved in most Dothideomycetes and upregulated during infection in L. maculans, suggesting a possible function in response to oxidative stress.« less

  3. Chompy: an infestation of MITE-like repetitive elements in the crocodilian genome.

    PubMed

    Ray, David A; Hedges, Dale J; Herke, Scott W; Fowlkes, Justin D; Barnes, Erin W; LaVie, Daniel K; Goodwin, Lindsey M; Densmore, Llewellyn D; Batzer, Mark A

    2005-12-05

    Interspersed repeats are a major component of most eukaryotic genomes and have an impact on genome size and stability, but the repetitive element landscape of crocodilian genomes has not yet been fully investigated. In this report, we provide the first detailed characterization of an interspersed repeat element in any crocodilian genome. Chompy is a putative miniature inverted-repeat transposable element (MITE) family initially recovered from the genome of Alligator mississippiensis (American alligator) but also present in the genomes of Crocodylus moreletii (Morelet's crocodile) and Gavialis gangeticus (Indian gharial). The element has all of the hallmarks of MITEs including terminal inverted repeats, possible target site duplications, and a tendency to form secondary structures. We estimate the copy number in the alligator genome to be approximately 46,000 copies. As a result of their size and unique properties, Chompy elements may provide a useful source of genomic variation for crocodilian comparative genomics.

  4. Functional conservation between rodents and chicken of regulatory sequences driving skeletal muscle gene expression in transgenic chickens

    PubMed Central

    2010-01-01

    Background Regulatory elements that control expression of specific genes during development have been shown in many cases to contain functionally-conserved modules that can be transferred between species and direct gene expression in a comparable developmental pattern. An example of such a module has been identified at the rat myosin light chain (MLC) 1/3 locus, which has been well characterised in transgenic mouse studies. This locus contains two promoters encoding two alternatively spliced isoforms of alkali myosin light chain. These promoters are differentially regulated during development through the activity of two enhancer elements. The MLC3 promoter alone has been shown to confer expression of a reporter gene in skeletal and cardiac muscle in transgenic mice and the addition of the downstream MLC enhancer increased expression levels in skeletal muscle. We asked whether this regulatory module, sufficient for striated muscle gene expression in the mouse, would drive expression in similar domains in the chicken. Results We have observed that a conserved downstream MLC enhancer is present in the chicken MLC locus. We found that the rat MLC1/3 regulatory elements were transcriptionally active in chick skeletal muscle primary cultures. We observed that a single copy lentiviral insert containing this regulatory cassette was able to drive expression of a lacZ reporter gene in the fast-fibres of skeletal muscle in chicken in three independent transgenic chicken lines in a pattern similar to the endogenous MLC locus. Reporter gene expression in cardiac muscle tissues was not observed for any of these lines. Conclusions From these results we conclude that skeletal expression from this regulatory module is conserved in a genomic context between rodents and chickens. This transgenic module will be useful in future investigations of muscle development in avian species. PMID:20184756

  5. Conserved intergenic sequences revealed by CTAG-profiling in Salmonella: thermodynamic modeling for function prediction

    NASA Astrophysics Data System (ADS)

    Tang, Le; Zhu, Songling; Mastriani, Emilio; Fang, Xin; Zhou, Yu-Jie; Li, Yong-Guo; Johnston, Randal N.; Guo, Zheng; Liu, Gui-Rong; Liu, Shu-Lin

    2017-03-01

    Highly conserved short sequences help identify functional genomic regions and facilitate genomic annotation. We used Salmonella as the model to search the genome for evolutionarily conserved regions and focused on the tetranucleotide sequence CTAG for its potentially important functions. In Salmonella, CTAG is highly conserved across the lineages and large numbers of CTAG-containing short sequences fall in intergenic regions, strongly indicating their biological importance. Computer modeling demonstrated stable stem-loop structures in some of the CTAG-containing intergenic regions, and substitution of a nucleotide of the CTAG sequence would radically rearrange the free energy and disrupt the structure. The postulated degeneration of CTAG takes distinct patterns among Salmonella lineages and provides novel information about genomic divergence and evolution of these bacterial pathogens. Comparison of the vertically and horizontally transmitted genomic segments showed different CTAG distribution landscapes, with the genome amelioration process to remove CTAG taking place inward from both terminals of the horizontally acquired segment.

  6. A role for palindromic structures in the cis-region of maize Sirevirus LTRs in transposable element evolution and host epigenetic response.

    PubMed

    Bousios, Alexandros; Diez, Concepcion M; Takuno, Shohei; Bystry, Vojtech; Darzentas, Nikos; Gaut, Brandon S

    2016-02-01

    Transposable elements (TEs) proliferate within the genome of their host, which responds by silencing them epigenetically. Much is known about the mechanisms of silencing in plants, particularly the role of siRNAs in guiding DNA methylation. In contrast, little is known about siRNA targeting patterns along the length of TEs, yet this information may provide crucial insights into the dynamics between hosts and TEs. By focusing on 6456 carefully annotated, full-length Sirevirus LTR retrotransposons in maize, we show that their silencing associates with underlying characteristics of the TE sequence and also uncover three features of the host-TE interaction. First, siRNA mapping varies among families and among elements, but particularly along the length of elements. Within the cis-regulatory portion of the LTRs, a complex palindrome-rich region acts as a hotspot of both siRNA matching and sequence evolution. These patterns are consistent across leaf, tassel, and immature ear libraries, but particularly emphasized for floral tissues and 21- to 22-nt siRNAs. Second, this region has the ability to form hairpins, making it a potential template for the production of miRNA-like, hairpin-derived small RNAs. Third, Sireviruses are targeted by siRNAs as a decreasing function of their age, but the oldest elements remain highly targeted, partially by siRNAs that cross-map to the youngest elements. We show that the targeting of older Sireviruses reflects their conserved palindromes. Altogether, we hypothesize that the palindromes aid the silencing of active elements and influence transposition potential, siRNA targeting levels, and ultimately the fate of an element within the genome. © 2016 Bousios et al.; Published by Cold Spring Harbor Laboratory Press.

  7. cisprimertool: software to implement a comparative genomics strategy for the development of conserved intron scanning (CIS) markers.

    PubMed

    Jayashree, B; Jagadeesh, V T; Hoisington, D

    2008-05-01

    The availability of complete, annotated genomic sequence information in model organisms is a rich resource that can be extended to understudied orphan crops through comparative genomic approaches. We report here a software tool (cisprimertool) for the identification of conserved intron scanning regions using expressed sequence tag alignments to a completely sequenced model crop genome. The method used is based on earlier studies reporting the assessment of conserved intron scanning primers (called CISP) within relatively conserved exons located near exon-intron boundaries from onion, banana, sorghum and pearl millet alignments with rice. The tool is freely available to academic users at http://www.icrisat.org/gt-bt/CISPTool.htm. © 2007 ICRISAT.

  8. The Genome of a Pathogenic Rhodococcus: Cooptive Virulence Underpinned by Key Gene Acquisitions

    PubMed Central

    Letek, Michal; González, Patricia; MacArthur, Iain; Rodríguez, Héctor; Freeman, Tom C.; Valero-Rello, Ana; Blanco, Mónica; Buckley, Tom; Cherevach, Inna; Fahey, Ruth; Hapeshi, Alexia; Holdstock, Jolyon; Leadon, Desmond; Navas, Jesús; Ocampo, Alain; Quail, Michael A.; Sanders, Mandy; Scortti, Mariela M.; Prescott, John F.; Fogarty, Ursula; Meijer, Wim G.; Parkhill, Julian; Bentley, Stephen D.; Vázquez-Boland, José A.

    2010-01-01

    We report the genome of the facultative intracellular parasite Rhodococcus equi, the only animal pathogen within the biotechnologically important actinobacterial genus Rhodococcus. The 5.0-Mb R. equi 103S genome is significantly smaller than those of environmental rhodococci. This is due to genome expansion in nonpathogenic species, via a linear gain of paralogous genes and an accelerated genetic flux, rather than reductive evolution in R. equi. The 103S genome lacks the extensive catabolic and secondary metabolic complement of environmental rhodococci, and it displays unique adaptations for host colonization and competition in the short-chain fatty acid–rich intestine and manure of herbivores—two main R. equi reservoirs. Except for a few horizontally acquired (HGT) pathogenicity loci, including a cytoadhesive pilus determinant (rpl) and the virulence plasmid vap pathogenicity island (PAI) required for intramacrophage survival, most of the potential virulence-associated genes identified in R. equi are conserved in environmental rhodococci or have homologs in nonpathogenic Actinobacteria. This suggests a mechanism of virulence evolution based on the cooption of existing core actinobacterial traits, triggered by key host niche–adaptive HGT events. We tested this hypothesis by investigating R. equi virulence plasmid-chromosome crosstalk, by global transcription profiling and expression network analysis. Two chromosomal genes conserved in environmental rhodococci, encoding putative chorismate mutase and anthranilate synthase enzymes involved in aromatic amino acid biosynthesis, were strongly coregulated with vap PAI virulence genes and required for optimal proliferation in macrophages. The regulatory integration of chromosomal metabolic genes under the control of the HGT–acquired plasmid PAI is thus an important element in the cooptive virulence of R. equi. PMID:20941392

  9. In-cell SHAPE uncovers dynamic interactions between the untranslated regions of the foot-and-mouth disease virus RNA.

    PubMed

    Diaz-Toledano, Rosa; Lozano, Gloria; Martinez-Salas, Encarnacion

    2017-02-17

    The genome of RNA viruses folds into 3D structures that include long-range RNA–RNA interactions relevant to control critical steps of the viral cycle. In particular, initiation of translation driven by the IRES element of foot-and-mouth disease virus is stimulated by the 3΄UTR. Here we sought to investigate the RNA local flexibility of the IRES element and the 3΄UTR in living cells. The SHAPE reactivity observed in vivo showed statistically significant differences compared to the free RNA, revealing protected or exposed positions within the IRES and the 3΄UTR. Importantly, the IRES local flexibility was modified in the presence of the 3΄UTR, showing significant protections at residues upstream from the functional start codon. Conversely, presence of the IRES element in cis altered the 3΄UTR local flexibility leading to an overall enhanced reactivity. Unlike the reactivity changes observed in the IRES element, the SHAPE differences of the 3΄UTR were large but not statistically significant, suggesting multiple dynamic RNA interactions. These results were supported by covariation analysis, which predicted IRES-3΄UTR conserved helices in agreement with the protections observed by SHAPE probing. Mutational analysis suggested that disruption of one of these interactions could be compensated by alternative base pairings, providing direct evidences for dynamic long-range interactions between these distant elements of the viral genome.

  10. Identification of 15 candidate structured noncoding RNA motifs in fungi by comparative genomics.

    PubMed

    Li, Sanshu; Breaker, Ronald R

    2017-10-13

    With the development of rapid and inexpensive DNA sequencing, the genome sequences of more than 100 fungal species have been made available. This dataset provides an excellent resource for comparative genomics analyses, which can be used to discover genetic elements, including noncoding RNAs (ncRNAs). Bioinformatics tools similar to those used to uncover novel ncRNAs in bacteria, likewise, should be useful for searching fungal genomic sequences, and the relative ease of genetic experiments with some model fungal species could facilitate experimental validation studies. We have adapted a bioinformatics pipeline for discovering bacterial ncRNAs to systematically analyze many fungal genomes. This comparative genomics pipeline integrates information on conserved RNA sequence and structural features with alternative splicing information to reveal fungal RNA motifs that are candidate regulatory domains, or that might have other possible functions. A total of 15 prominent classes of structured ncRNA candidates were identified, including variant HDV self-cleaving ribozyme representatives, atypical snoRNA candidates, and possible structured antisense RNA motifs. Candidate regulatory motifs were also found associated with genes for ribosomal proteins, S-adenosylmethionine decarboxylase (SDC), amidase, and HexA protein involved in Woronin body formation. We experimentally confirm that the variant HDV ribozymes undergo rapid self-cleavage, and we demonstrate that the SDC RNA motif reduces the expression of SAM decarboxylase by translational repression. Furthermore, we provide evidence that several other motifs discovered in this study are likely to be functional ncRNA elements. Systematic screening of fungal genomes using a computational discovery pipeline has revealed the existence of a variety of novel structured ncRNAs. Genome contexts and similarities to known ncRNA motifs provide strong evidence for the biological and biochemical functions of some newly found ncRNA motifs. Although initial examinations of several motifs provide evidence for their likely functions, other motifs will require more in-depth analysis to reveal their functions.

  11. Analysis of Genome Plasticity in Pathogenic and Commensal Escherichia coli Isolates by Use of DNA Arrays

    PubMed Central

    Dobrindt, Ulrich; Agerer, Franziska; Michaelis, Kai; Janka, Andreas; Buchrieser, Carmen; Samuelson, Martin; Svanborg, Catharina; Gottschalk, Gerhard; Karch, Helge; Hacker, Jörg

    2003-01-01

    Genomes of prokaryotes differ significantly in size and DNA composition. Escherichia coli is considered a model organism to analyze the processes involved in bacterial genome evolution, as the species comprises numerous pathogenic and commensal variants. Pathogenic and nonpathogenic E. coli strains differ in the presence and absence of additional DNA elements contributing to specific virulence traits and also in the presence and absence of additional genetic information. To analyze the genetic diversity of pathogenic and commensal E. coli isolates, a whole-genome approach was applied. Using DNA arrays, the presence of all translatable open reading frames (ORFs) of nonpathogenic E. coli K-12 strain MG1655 was investigated in 26 E. coli isolates, including various extraintestinal and intestinal pathogenic E. coli isolates, 3 pathogenicity island deletion mutants, and commensal and laboratory strains. Additionally, the presence of virulence-associated genes of E. coli was determined using a DNA “pathoarray” developed in our laboratory. The frequency and distributional pattern of genomic variations vary widely in different E. coli strains. Up to 10% of the E. coli K-12-specific ORFs were not detectable in the genomes of the different strains. DNA sequences described for extraintestinal or intestinal pathogenic E. coli are more frequently detectable in isolates of the same origin than in other pathotypes. Several genes coding for virulence or fitness factors are also present in commensal E. coli isolates. Based on these results, the conserved E. coli core genome is estimated to consist of at least 3,100 translatable ORFs. The absence of K-12-specific ORFs was detectable in all chromosomal regions. These data demonstrate the great genome heterogeneity and genetic diversity among E. coli strains and underline the fact that both the acquisition and deletion of DNA elements are important processes involved in the evolution of prokaryotes. PMID:12618447

  12. Predicting Protein Function by Genomic Context: Quantitative Evaluation and Qualitative Inferences

    PubMed Central

    Huynen, Martijn; Snel, Berend; Lathe, Warren; Bork, Peer

    2000-01-01

    Various new methods have been proposed to predict functional interactions between proteins based on the genomic context of their genes. The types of genomic context that they use are Type I: the fusion of genes; Type II: the conservation of gene-order or co-occurrence of genes in potential operons; and Type III: the co-occurrence of genes across genomes (phylogenetic profiles). Here we compare these types for their coverage, their correlations with various types of functional interaction, and their overlap with homology-based function assignment. We apply the methods to Mycoplasma genitalium, the standard benchmarking genome in computational and experimental genomics. Quantitatively, conservation of gene order is the technique with the highest coverage, applying to 37% of the genes. By combining gene order conservation with gene fusion (6%), the co-occurrence of genes in operons in absence of gene order conservation (8%), and the co-occurrence of genes across genomes (11%), significant context information can be obtained for 50% of the genes (the categories overlap). Qualitatively, we observe that the functional interactions between genes are stronger as the requirements for physical neighborhood on the genome are more stringent, while the fraction of potential false positives decreases. Moreover, only in cases in which gene order is conserved in a substantial fraction of the genomes, in this case six out of twenty-five, does a single type of functional interaction (physical interaction) clearly dominate (>80%). In other cases, complementary function information from homology searches, which is available for most of the genes with significant genomic context, is essential to predict the type of interaction. Using a combination of genomic context and homology searches, new functional features can be predicted for 10% of M. genitalium genes. PMID:10958638

  13. Endogenous Retrovirus 3 – History, Physiology, and Pathology

    PubMed Central

    Bustamante Rivera, Yomara Y.; Brütting, Christine; Schmidt, Caroline; Volkmer, Ines; Staege, Martin S.

    2018-01-01

    Endogenous viral elements (EVE) seem to be present in all eukaryotic genomes. The composition of EVE varies between different species. The endogenous retrovirus 3 (ERV3) is one of these elements that is present only in humans and other Catarrhini. Conservation of ERV3 in most of the investigated Catarrhini and the expression pattern in normal tissues suggest a putative physiological role of ERV3. On the other hand, ERV3 has been implicated in the pathogenesis of auto-immunity and cancer. In the present review we summarize knowledge about this interesting EVE. We propose the model that expression of ERV3 (and probably other EVE loci) under pathological conditions might be part of a metazoan SOS response. PMID:29379485

  14. Conserved noncoding sequences conserve biological networks and influence genome evolution.

    PubMed

    Xie, Jianbo; Qian, Kecheng; Si, Jingna; Xiao, Liang; Ci, Dong; Zhang, Deqiang

    2018-05-01

    Comparative genomics approaches have identified numerous conserved cis-regulatory sequences near genes in plant genomes. Despite the identification of these conserved noncoding sequences (CNSs), our knowledge of their functional importance and selection remains limited. Here, we used a combination of DNA methylome analysis, microarray expression analyses, and functional annotation to study these sequences in the model tree Populus trichocarpa. Methylation in CG contexts and non-CG contexts was lower in CNSs, particularly CNSs in the 5'-upstream regions of genes, compared with other sites in the genome. We observed that CNSs are enriched in genes with transcription and binding functions, and this also associated with syntenic genes and those from whole-genome duplications, suggesting that cis-regulatory sequences play a key role in genome evolution. We detected a significant positive correlation between CNS number and protein interactions, suggesting that CNSs may have roles in the evolution and maintenance of biological networks. The divergence of CNSs indicates that duplication-degeneration-complementation drives the subfunctionalization of a proportion of duplicated genes from whole-genome duplication. Furthermore, population genomics confirmed that most CNSs are under strong purifying selection and only a small subset of CNSs shows evidence of adaptive evolution. These findings provide a foundation for future studies exploring these key genomic features in the maintenance of biological networks, local adaptation, and transcription.

  15. QTL Mapping of Sex Determination Loci Supports an Ancient Pathway in Ants and Honey Bees.

    PubMed

    Miyakawa, Misato O; Mikheyev, Alexander S

    2015-11-01

    Sex determination mechanisms play a central role in life-history characteristics, affecting mating systems, sex ratios, inbreeding tolerance, etc. Downstream components of sex determination pathways are highly conserved, but upstream components evolve rapidly. Evolutionary dynamics of sex determination remain poorly understood, particularly because mechanisms appear so diverse. Here we investigate the origins and evolution of complementary sex determination (CSD) in ants and bees. The honey bee has a well-characterized CSD locus, containing tandemly arranged homologs of the transformer gene [complementary sex determiner (csd) and feminizer (fem)]. Such tandem paralogs appear frequently in aculeate hymenopteran genomes. However, only comparative genomic, but not functional, data support a broader role for csd/fem in sex determination, and whether species other than the honey bee use this pathway remains controversial. Here we used a backcross to test whether csd/fem acts as a CSD locus in an ant (Vollenhovia emeryi). After sequencing and assembling the genome, we computed a linkage map, and conducted a quantitative trait locus (QTL) analysis of diploid male production using 68 diploid males and 171 workers. We found two QTLs on separate linkage groups (CsdQTL1 and CsdQTL2) that jointly explained 98.0% of the phenotypic variance. CsdQTL1 included two tandem transformer homologs. These data support the prediction that the same CSD mechanism has indeed been conserved for over 100 million years. CsdQTL2 had no similarity to CsdQTL1 and included a 236-kb region with no obvious CSD gene candidates, making it impossible to conclusively characterize it using our data. The sequence of this locus was conserved in at least one other ant genome that diverged >75 million years ago. By applying QTL analysis to ants for the first time, we support the hypothesis that elements of hymenopteran CSD are ancient, but also show that more remains to be learned about the diversity of CSD mechanisms.

  16. Component identification of electron transport chains in curdlan-producing Agrobacterium sp. ATCC 31749 and its genome-specific prediction using comparative genome and phylogenetic trees analysis.

    PubMed

    Zhang, Hongtao; Setubal, Joao Carlos; Zhan, Xiaobei; Zheng, Zhiyong; Yu, Lijun; Wu, Jianrong; Chen, Dingqiang

    2011-06-01

    Agrobacterium sp. ATCC 31749 (formerly named Alcaligenes faecalis var. myxogenes) is a non-pathogenic aerobic soil bacterium used in large scale biotechnological production of curdlan. However, little is known about its genomic information. DNA partial sequence of electron transport chains (ETCs) protein genes were obtained in order to understand the components of ETC and genomic-specificity in Agrobacterium sp. ATCC 31749. Degenerate primers were designed according to ETC conserved sequences in other reported species. DNA partial sequences of ETC genes in Agrobacterium sp. ATCC 31749 were cloned by the PCR method using degenerate primers. Based on comparative genomic analysis, nine electron transport elements were ascertained, including NADH ubiquinone oxidoreductase, succinate dehydrogenase complex II, complex III, cytochrome c, ubiquinone biosynthesis protein ubiB, cytochrome d terminal oxidase, cytochrome bo terminal oxidase, cytochrome cbb (3)-type terminal oxidase and cytochrome caa (3)-type terminal oxidase. Similarity and phylogenetic analyses of these genes revealed that among fully sequenced Agrobacterium species, Agrobacterium sp. ATCC 31749 is closest to Agrobacterium tumefaciens C58. Based on these results a comprehensive ETC model for Agrobacterium sp. ATCC 31749 is proposed.

  17. Genome-Wide Identification of the Alba Gene Family in Plants and Stress-Responsive Expression of the Rice Alba Genes

    PubMed Central

    Verma, Jitendra Kumar; Wardhan, Vijay; Singh, Deepali; Chakraborty, Subhra; Chakraborty, Niranjan

    2018-01-01

    Architectural proteins play key roles in genome construction and regulate the expression of many genes, albeit the modulation of genome plasticity by these proteins is largely unknown. A critical screening of the architectural proteins in five crop species, viz., Oryza sativa, Zea mays, Sorghum bicolor, Cicer arietinum, and Vitis vinifera, and in the model plant Arabidopsis thaliana along with evolutionary relevant species such as Chlamydomonas reinhardtii, Physcomitrella patens, and Amborella trichopoda, revealed 9, 20, 10, 7, 7, 6, 1, 4, and 4 Alba (acetylation lowers binding affinity) genes, respectively. A phylogenetic analysis of the genes and of their counterparts in other plant species indicated evolutionary conservation and diversification. In each group, the structural components of the genes and motifs showed significant conservation. The chromosomal location of the Alba genes of rice (OsAlba), showed an unequal distribution on 8 of its 12 chromosomes. The expression profiles of the OsAlba genes indicated a distinct tissue-specific expression in the seedling, vegetative, and reproductive stages. The quantitative real-time PCR (qRT-PCR) analysis of the OsAlba genes confirmed their stress-inducible expression under multivariate environmental conditions and phytohormone treatments. The evaluation of the regulatory elements in 68 Alba genes from the 9 species studied led to the identification of conserved motifs and overlapping microRNA (miRNA) target sites, suggesting the conservation of their function in related proteins and a divergence in their biological roles across species. The 3D structure and the prediction of putative ligands and their binding sites for OsAlba proteins offered a key insight into the structure–function relationship. These results provide a comprehensive overview of the subtle genetic diversification of the OsAlba genes, which will help in elucidating their functional role in plants. PMID:29597290

  18. Waiting in the wings: what can we learn about gene co-option from the diversification of butterfly wing patterns?

    PubMed

    Jiggins, Chris D; Wallbank, Richard W R; Hanly, Joseph J

    2017-02-05

    A major challenge is to understand how conserved gene regulatory networks control the wonderful diversity of form that we see among animals and plants. Butterfly wing patterns are an excellent example of this diversity. Butterfly wings form as imaginal discs in the caterpillar and are constructed by a gene regulatory network, much of which is conserved across the holometabolous insects. Recent work in Heliconius butterflies takes advantage of genomic approaches and offers insights into how the diversification of wing patterns is overlaid onto this conserved network. WntA is a patterning morphogen that alters spatial information in the wing. Optix is a transcription factor that acts later in development to paint specific wing regions red. Both of these loci fit the paradigm of conserved protein-coding loci with diverse regulatory elements and developmental roles that have taken on novel derived functions in patterning wings. These discoveries offer insights into the 'Nymphalid Ground Plan', which offers a unifying hypothesis for pattern formation across nymphalid butterflies. These loci also represent 'hotspots' for morphological change that have been targeted repeatedly during evolution. Both convergent and divergent evolution of a great diversity of patterns is controlled by complex alleles at just a few genes. We suggest that evolutionary change has become focused on one or a few genetic loci for two reasons. First, pre-existing complex cis-regulatory loci that already interact with potentially relevant transcription factors are more likely to acquire novel functions in wing patterning. Second, the shape of wing regulatory networks may constrain evolutionary change to one or a few loci. Overall, genomic approaches that have identified wing patterning loci in these butterflies offer broad insight into how gene regulatory networks evolve to produce diversity.This article is part of the themed issue 'Evo-devo in the genomics era, and the origins of morphological diversity'. © 2016 The Author(s).

  19. Waiting in the wings: what can we learn about gene co-option from the diversification of butterfly wing patterns?

    PubMed Central

    Wallbank, Richard W. R.; Hanly, Joseph J.

    2017-01-01

    A major challenge is to understand how conserved gene regulatory networks control the wonderful diversity of form that we see among animals and plants. Butterfly wing patterns are an excellent example of this diversity. Butterfly wings form as imaginal discs in the caterpillar and are constructed by a gene regulatory network, much of which is conserved across the holometabolous insects. Recent work in Heliconius butterflies takes advantage of genomic approaches and offers insights into how the diversification of wing patterns is overlaid onto this conserved network. WntA is a patterning morphogen that alters spatial information in the wing. Optix is a transcription factor that acts later in development to paint specific wing regions red. Both of these loci fit the paradigm of conserved protein-coding loci with diverse regulatory elements and developmental roles that have taken on novel derived functions in patterning wings. These discoveries offer insights into the ‘Nymphalid Ground Plan’, which offers a unifying hypothesis for pattern formation across nymphalid butterflies. These loci also represent ‘hotspots’ for morphological change that have been targeted repeatedly during evolution. Both convergent and divergent evolution of a great diversity of patterns is controlled by complex alleles at just a few genes. We suggest that evolutionary change has become focused on one or a few genetic loci for two reasons. First, pre-existing complex cis-regulatory loci that already interact with potentially relevant transcription factors are more likely to acquire novel functions in wing patterning. Second, the shape of wing regulatory networks may constrain evolutionary change to one or a few loci. Overall, genomic approaches that have identified wing patterning loci in these butterflies offer broad insight into how gene regulatory networks evolve to produce diversity. This article is part of the themed issue ‘Evo-devo in the genomics era, and the origins of morphological diversity’. PMID:27994126

  20. The genome sequence and effector complement of the flax rust pathogen Melampsora lini.

    PubMed

    Nemri, Adnane; Saunders, Diane G O; Anderson, Claire; Upadhyaya, Narayana M; Win, Joe; Lawrence, Gregory J; Jones, David A; Kamoun, Sophien; Ellis, Jeffrey G; Dodds, Peter N

    2014-01-01

    Rust fungi cause serious yield reductions on crops, including wheat, barley, soybean, coffee, and represent real threats to global food security. Of these fungi, the flax rust pathogen Melampsora lini has been developed most extensively over the past 80 years as a model to understand the molecular mechanisms that underpin pathogenesis. During infection, M. lini secretes virulence effectors to promote disease. The number of these effectors, their function and their degree of conservation across rust fungal species is unknown. To assess this, we sequenced and assembled de novo the genome of M. lini isolate CH5 into 21,130 scaffolds spanning 189 Mbp (scaffold N50 of 31 kbp). Global analysis of the DNA sequence revealed that repetitive elements, primarily retrotransposons, make up at least 45% of the genome. Using ab initio predictions, transcriptome data and homology searches, we identified 16,271 putative protein-coding genes. An analysis pipeline was then implemented to predict the effector complement of M. lini and compare it to that of the poplar rust, wheat stem rust and wheat stripe rust pathogens to identify conserved and species-specific effector candidates. Previous knowledge of four cloned M. lini avirulence effector proteins and two basidiomycete effectors was used to optimize parameters of the effector prediction pipeline. Markov clustering based on sequence similarity was performed to group effector candidates from all four rust pathogens. Clusters containing at least one member from M. lini were further analyzed and prioritized based on features including expression in isolated haustoria and infected leaf tissue and conservation across rust species. Herein, we describe 200 of 940 clusters that ranked highest on our priority list, representing 725 flax rust candidate effectors. Our findings on this important model rust species provide insight into how effectors of rust fungi are conserved across species and how they may act to promote infection on their hosts.

  1. Acquisition and evolution of plant pathogenesis-associated gene clusters and candidate determinants of tissue-specificity in xanthomonas.

    PubMed

    Lu, Hong; Patil, Prabhu; Van Sluys, Marie-Anne; White, Frank F; Ryan, Robert P; Dow, J Maxwell; Rabinowicz, Pablo; Salzberg, Steven L; Leach, Jan E; Sonti, Ramesh; Brendel, Volker; Bogdanove, Adam J

    2008-01-01

    Xanthomonas is a large genus of plant-associated and plant-pathogenic bacteria. Collectively, members cause diseases on over 392 plant species. Individually, they exhibit marked host- and tissue-specificity. The determinants of this specificity are unknown. To assess potential contributions to host- and tissue-specificity, pathogenesis-associated gene clusters were compared across genomes of eight Xanthomonas strains representing vascular or non-vascular pathogens of rice, brassicas, pepper and tomato, and citrus. The gum cluster for extracellular polysaccharide is conserved except for gumN and sequences downstream. The xcs and xps clusters for type II secretion are conserved, except in the rice pathogens, in which xcs is missing. In the otherwise conserved hrp cluster, sequences flanking the core genes for type III secretion vary with respect to insertion sequence element and putative effector gene content. Variation at the rpf (regulation of pathogenicity factors) cluster is more pronounced, though genes with established functional relevance are conserved. A cluster for synthesis of lipopolysaccharide varies highly, suggesting multiple horizontal gene transfers and reassortments, but this variation does not correlate with host- or tissue-specificity. Phylogenetic trees based on amino acid alignments of gum, xps, xcs, hrp, and rpf cluster products generally reflect strain phylogeny. However, amino acid residues at four positions correlate with tissue specificity, revealing hpaA and xpsD as candidate determinants. Examination of genome sequences of xanthomonads Xylella fastidiosa and Stenotrophomonas maltophilia revealed that the hrp, gum, and xcs clusters are recent acquisitions in the Xanthomonas lineage. Our results provide insight into the ancestral Xanthomonas genome and indicate that differentiation with respect to host- and tissue-specificity involved not major modifications or wholesale exchange of clusters, but subtle changes in a small number of genes or in non-coding sequences, and/or differences outside the clusters, potentially among regulatory targets or secretory substrates.

  2. Cloning of a CACTA transposon-like insertion in intron I of tomato invertase Lin5 gene and identification of transposase-like sequences of Solanaceae species.

    PubMed

    Proels, Reinhard K; Roitsch, Thomas

    2006-03-01

    Very few CACTA transposon-like sequences have been described in Solanaceae species. Sequence information has been restricted to partial transposase (TPase)-like fragments, and no target gene of CACTA-like transposon insertion has been described in tomato to date. In this manuscript, we report on a CACTA transposon-like insertion in intron I of tomato (Lycopersicon esculentum) invertase gene Lin5 and TPase-like sequences of several Solanaceae species. Consensus primers deduced from the TPase region of the tomato CACTA transposon-like element allowed the amplification of similar sequences from various Solanaceae species of different subfamilies including Solaneae (Solanum tuberosum), Cestreae (Nicotiana tabacum) and Datureae (Datura stramonium). This demonstrates the ubiquitous presence of CACTA-like elements in Solanaceae genomes. The obtained partial sequences are highly conserved, and allow further detection and detailed analysis of CACTA-like transposons throughout Solanaceae species. CACTA-like transposon sequences make possible the evaluation of their use for genome analysis, functional studies of genes and the evolutionary relationships between plant species.

  3. Identification of an evolutionarily conserved regulatory element of the zebrafish col2a1a gene.

    PubMed

    Dale, Rodney M; Topczewski, Jacek

    2011-09-15

    Zebrafish (Danio rerio) is an excellent model organism for the study of vertebrate development including skeletogenesis. Studies of mammalian cartilage formation were greatly advanced through the use of a cartilage specific regulatory element of the Collagen type II alpha 1 (Col2a1) gene. In an effort to isolate such an element in zebrafish, we compared the expression of two col2a1 homologues and found that expression of col2a1b, a previously uncharacterized zebrafish homologue, only partially overlaps with col2a1a. We focused our analysis on col2a1a, as it is expressed in both the stacked chondrocytes and the perichondrium. By comparing the genomic sequence surrounding the predicted transcriptional start site of col2a1a among several species of teleosts we identified a small highly conserved sequence (R2) located 1.7 kb upstream of the presumptive transcriptional initiation site. Interestingly, neither the sequence nor location of this element is conserved between teleost and mammalian Col2a1. We generated transient and stable transgenic lines with just the R2 element or the entire 1.7 kb fragment 5' of the transcriptional initiation site. The identified regulatory elements enable the tracking of cellular development in various tissues by driving robust reporter expression in craniofacial cartilage, ear, notochord, floor plate, hypochord and fins in a pattern similar to the expression of endogenous col2a1a. Using a reporter gene driven by the R2 regulatory element, we analyzed the morphogenesis of the notochord sheath cells as they withdraw from the stack of initially uniform cells and encase the inflating vacuolated notochord cells. Finally, we show that like endogenous col2a1a, craniofacial expression of these reporter constructs depends on Sox9a transcription factor activity. At the same time, notochord expression is maintained after Sox9a knockdown, suggesting that other factors can activate expression through the identified regulatory element in this tissue. Copyright © 2011 Elsevier Inc. All rights reserved.

  4. Identification of an evolutionarily conserved regulatory element of the zebrafish col2a1a gene

    PubMed Central

    Dale, Rodney M.; Topczewski, Jacek

    2011-01-01

    Zebrafish (Danio rerio) is an excellent model organism for the study of vertebrate development including skeletogenesis. Studies of mammalian cartilage formation were greatly advanced through the use of a cartilage specific regulatory element of the Collagen type II alpha 1 (Col2a1) gene. In an effort to isolate such an element in zebrafish, we compared the expression of two col2a1 homologues and found that expression of col2a1b, a previously uncharacterized zebrafish homologue, only partially overlaps with col2a1a. We focused our analysis on col2a1a, as it is expressed in both the stacked chondrocytes and the perichondrium. By comparing the genomic sequence surrounding the predicted transcriptional start site of col2a1a among several species of teleosts we identified a small highly conserved sequence (R2) located 1.7 kb upstream of the presumptive transcriptional initiation site. Interestingly, neither the sequence nor location of this element is conserved between teleost and mammalian Col2a1. We generated transient and stable transgenic lines with just the R2 element or the entire 1.7 kb fragment 5’ of the transcriptional initiation site. The identified regulatory elements enable the tracking of cellular development in various tissues by driving robust reporter expression in craniofacial cartilage, ear, notochord, floor plate, hypochord and fins in a pattern similar to the expression of endogenous col2a1a. Using a reporter gene driven by the R2 regulatory element, we analyzed the morphogenesis of the notochord sheath cells as they withdraw from the stack of initially uniform cells and encase the inflating vacuolated notochord cells. Finally, we show that like endogenous col2a1a, craniofacial expression of these reporter constructs depends on Sox9a transcription factor activity. At the same time, notochord expression is maintained after Sox9a knockdown, suggesting that other factors can activate expression through the identified regulatory element in this tissue. PMID:21723274

  5. Linking the potato genome to the Conserved Ortholog Set (COS) markers

    USDA-ARS?s Scientific Manuscript database

    Conserved ortholog set (COS) markers are an important functional genomics resource that has greatly improved orthology detection in Asterid species. A comprehensive list of these markers is available at Sol Genomics Network (http://www.sgn.cornell.edu) and many of these have been placed in the genet...

  6. Evolutionary dynamics of a conserved sequence motif in the ribosomal genes of the ciliate Paramecium

    PubMed Central

    2010-01-01

    Background In protozoa, the identification of preserved motifs by comparative genomics is often impeded by difficulties to generate reliable alignments for non-coding sequences. Moreover, the evolutionary dynamics of regulatory elements in 3' untranslated regions (both in protozoa and metazoa) remains a virtually unexplored issue. Results By screening Paramecium tetraurelia's 3' untranslated regions for 8-mers that were previously found to be preserved in mammalian 3' UTRs, we detect and characterize a motif that is distinctly conserved in the ribosomal genes of this ciliate. The motif appears to be conserved across Paramecium aurelia species but is absent from the ribosomal genes of four additional non-Paramecium species surveyed, including another ciliate, Tetrahymena thermophila. Motif-free ribosomal genes retain fewer paralogs in the genome and appear to be lost more rapidly relative to motif-containing genes. Features associated with the discovered preserved motif are consistent with this 8-mer playing a role in post-transcriptional regulation. Conclusions Our observations 1) shed light on the evolution of a putative regulatory motif across large phylogenetic distances; 2) are expected to facilitate the understanding of the modulation of ribosomal genes expression in Paramecium; and 3) reveal a largely unexplored--and presumably not restricted to Paramecium--association between the presence/absence of a DNA motif and the evolutionary fate of its host genes. PMID:20441586

  7. Comparative genomic analysis of isoproturon-mineralizing sphingomonads reveals the isoproturon catabolic mechanism.

    PubMed

    Yan, Xin; Gu, Tao; Yi, Zhongquan; Huang, Junwei; Liu, Xiaowei; Zhang, Ji; Xu, Xihui; Xin, Zhihong; Hong, Qing; He, Jian; Spain, Jim C; Li, Shunpeng; Jiang, Jiandong

    2016-12-01

    The worldwide use of the phenylurea herbicide, isoproturon (IPU), has resulted in considerable concern about its environmental fate. Although many microbial metabolites of IPU are known and IPU-mineralizing bacteria have been isolated, the molecular mechanism of IPU catabolism has not been elucidated yet. In this study, complete genes that encode the conserved IPU catabolic pathway were revealed, based on comparative analysis of the genomes of three IPU-mineralizing sphingomonads and subsequent experimental validation. The complete genes included a novel hydrolase gene ddhA, which is responsible for the cleavage of the urea side chain of the IPU demethylated products; a distinct aniline dioxygenase gene cluster adoQTA1A2BR, which has a broad substrate range; and an inducible catechol meta-cleavage pathway gene cluster adoXEGKLIJC. Furthermore, the initial mono-N-demethylation genes pdmAB were further confirmed to be involved in the successive N-demethylation of the IPU mono-N-demethylated product. These IPU-catabolic genes were organized into four transcription units and distributed on three plasmids. They were flanked by multiple mobile genetic elements and highly conserved among IPU-mineralizing sphingomonads. The elucidation of the molecular mechanism of IPU catabolism will enhance our understanding of the microbial mineralization of IPU and provide insights into the evolutionary scenario of the conserved IPU-catabolic pathway. © 2016 The Authors. Environmental Microbiology published by Society for Applied Microbiology and John Wiley & Sons Ltd.

  8. Network Discovery Pipeline Elucidates Conserved Time-of-Day–Specific cis-Regulatory Modules

    PubMed Central

    McEntee, Connor; Byer, Amanda; Trout, Jonathan D; Hazen, Samuel P; Shen, Rongkun; Priest, Henry D; Sullivan, Christopher M; Givan, Scott A; Yanovsky, Marcelo; Hong, Fangxin; Kay, Steve A; Chory, Joanne

    2008-01-01

    Correct daily phasing of transcription confers an adaptive advantage to almost all organisms, including higher plants. In this study, we describe a hypothesis-driven network discovery pipeline that identifies biologically relevant patterns in genome-scale data. To demonstrate its utility, we analyzed a comprehensive matrix of time courses interrogating the nuclear transcriptome of Arabidopsis thaliana plants grown under different thermocycles, photocycles, and circadian conditions. We show that 89% of Arabidopsis transcripts cycle in at least one condition and that most genes have peak expression at a particular time of day, which shifts depending on the environment. Thermocycles alone can drive at least half of all transcripts critical for synchronizing internal processes such as cell cycle and protein synthesis. We identified at least three distinct transcription modules controlling phase-specific expression, including a new midnight specific module, PBX/TBX/SBX. We validated the network discovery pipeline, as well as the midnight specific module, by demonstrating that the PBX element was sufficient to drive diurnal and circadian condition-dependent expression. Moreover, we show that the three transcription modules are conserved across Arabidopsis, poplar, and rice. These results confirm the complex interplay between thermocycles, photocycles, and the circadian clock on the daily transcription program, and provide a comprehensive view of the conserved genomic targets for a transcriptional network key to successful adaptation. PMID:18248097

  9. Conserved Daily Transcriptional Programs in Carica papaya

    PubMed Central

    Zdepski, Anna; Wang, Wenqin; Priest, Henry D.; Ali, Faraz; Alam, Maqsudul; Mockler, Todd C.

    2008-01-01

    Most organisms have internal circadian clocks that mediate responses to daily environmental changes in order to synchronize biological functions to the correct times of the day. Previous studies have focused on plants found in temperate and sub-tropical climates, and little is known about the circadian transcriptional networks of plants that typically grow under conditions with relatively constant day lengths and temperatures over the year. In this study we conducted a genomic and computational analysis of the circadian biology of Carica papaya, a tropical tree. We found that predicted papaya circadian clock genes cycle with the same phase as Arabidopsis genes. The patterns of time-of-day overrepresentation of circadian-associated promoter elements were nearly identical across papaya, Arabidopsis, rice, and poplar. Evolution of promoter structure predicts the observed morning- and evening-specific expression profiles of the papaya PRR5 paralogs. The strong conservation of previously identified circadian transcriptional networks in papaya, despite its tropical habitat and distinct life-style, suggest that circadian timing has played a major role in the evolution of plant genomes, consistent with the selective pressure of anticipating daily environmental changes. Further studies could exploit this conservation to elucidate general design principles that will facilitate engineering plant growth pathways for specific environments. Electronic supplementary material The online version of this article (doi:10.1007/s12042-008-9020-3) contains supplementary material, which is available to authorized users. PMID:20671772

  10. Putative bovine topological association domains and CTCF binding motifs can reduce the search space for causative regulatory variants of complex traits.

    PubMed

    Wang, Min; Hancock, Timothy P; Chamberlain, Amanda J; Vander Jagt, Christy J; Pryce, Jennie E; Cocks, Benjamin G; Goddard, Mike E; Hayes, Benjamin J

    2018-05-24

    Topological association domains (TADs) are chromosomal domains characterised by frequent internal DNA-DNA interactions. The transcription factor CTCF binds to conserved DNA sequence patterns called CTCF binding motifs to either prohibit or facilitate chromosomal interactions. TADs and CTCF binding motifs control gene expression, but they are not yet well defined in the bovine genome. In this paper, we sought to improve the annotation of bovine TADs and CTCF binding motifs, and assess whether the new annotation can reduce the search space for cis-regulatory variants. We used genomic synteny to map TADs and CTCF binding motifs from humans, mice, dogs and macaques to the bovine genome. We found that our mapped TADs exhibited the same hallmark properties of those sourced from experimental data, such as housekeeping genes, transfer RNA genes, CTCF binding motifs, short interspersed elements, H3K4me3 and H3K27ac. We showed that runs of genes with the same pattern of allele-specific expression (ASE) (either favouring paternal or maternal allele) were often located in the same TAD or between the same conserved CTCF binding motifs. Analyses of variance showed that when averaged across all bovine tissues tested, TADs explained 14% of ASE variation (standard deviation, SD: 0.056), while CTCF explained 27% (SD: 0.078). Furthermore, we showed that the quantitative trait loci (QTLs) associated with gene expression variation (eQTLs) or ASE variation (aseQTLs), which were identified from mRNA transcripts from 141 lactating cows' white blood and milk cells, were highly enriched at putative bovine CTCF binding motifs. The linearly-furthermost, and most-significant aseQTL and eQTL for each genic target were located within the same TAD as the gene more often than expected (Chi-Squared test P-value < 0.001). Our results suggest that genomic synteny can be used to functionally annotate conserved transcriptional components, and provides a tool to reduce the search space for causative regulatory variants in the bovine genome.

  11. Hepatitis B virus nuclear export elements: RNA stem-loop α and β, key parts of the HBV post-transcriptional regulatory element.

    PubMed

    Lim, Chun Shen; Brown, Chris M

    2016-09-01

    Many viruses contain RNA elements that modulate splicing and/or promote nuclear export of their RNAs. The RNAs of the major human pathogen, hepatitis B virus (HBV) contain a large (~600 bases) composite cis-acting 'post-transcriptional regulatory element' (PRE). This element promotes expression from these naturally intronless transcripts. Indeed, the related woodchuck hepadnavirus PRE (WPRE) is used to enhance expression in gene therapy and other expression vectors. These PRE are likely to act through a combination of mechanisms, including promotion of RNA nuclear export. Functional components of both the HBV PRE and WPRE are 2 conserved RNA cis-acting stem-loop (SL) structures, SLα and SLβ. They are within the coding regions of polymerase (P) gene, and both P and X genes, respectively. Based on previous studies using mutagenesis and/or nuclear magnetic resonance (NMR), here we propose 2 covariance models for SLα and SLβ. The model for the 30-nucleotide SLα contains a G-bulge and a CNGG(U) apical loop of which the first and the fourth loop residues form a CG pair and the fifth loop residue is bulged out, as observed in the NMR structure. The model for the 23-nucleotide SLβ contains a 7-base-pair stem and a 9-nucleotide loop. Comparison of the models with other RNA structural elements, as well as similarity searches of human transcriptome and viral genomes demonstrate that SLα and SLβ are specific to HBV transcripts. However, they are well conserved among the hepadnaviruses of non-human primates, the woodchuck and ground squirrel.

  12. Hepatitis B virus nuclear export elements: RNA stem-loop α and β, key parts of the HBV post-transcriptional regulatory element

    PubMed Central

    Lim, Chun Shen; Brown, Chris M.

    2016-01-01

    ABSTRACT Many viruses contain RNA elements that modulate splicing and/or promote nuclear export of their RNAs. The RNAs of the major human pathogen, hepatitis B virus (HBV) contain a large (~600 bases) composite cis-acting 'post-transcriptional regulatory element' (PRE). This element promotes expression from these naturally intronless transcripts. Indeed, the related woodchuck hepadnavirus PRE (WPRE) is used to enhance expression in gene therapy and other expression vectors. These PRE are likely to act through a combination of mechanisms, including promotion of RNA nuclear export. Functional components of both the HBV PRE and WPRE are 2 conserved RNA cis-acting stem-loop (SL) structures, SLα and SLβ. They are within the coding regions of polymerase (P) gene, and both P and X genes, respectively. Based on previous studies using mutagenesis and/or nuclear magnetic resonance (NMR), here we propose 2 covariance models for SLα and SLβ. The model for the 30-nucleotide SLα contains a G-bulge and a CNGG(U) apical loop of which the first and the fourth loop residues form a CG pair and the fifth loop residue is bulged out, as observed in the NMR structure. The model for the 23-nucleotide SLβ contains a 7-base-pair stem and a 9-nucleotide loop. Comparison of the models with other RNA structural elements, as well as similarity searches of human transcriptome and viral genomes demonstrate that SLα and SLβ are specific to HBV transcripts. However, they are well conserved among the hepadnaviruses of non-human primates, the woodchuck and ground squirrel. PMID:27031749

  13. A bacterial genome in transition - an exceptional enrichment of IS elements but lack of evidence for recent transposition in the symbiont Amoebophilus asiaticus

    PubMed Central

    2011-01-01

    Background Insertion sequence (IS) elements are important mediators of genome plasticity and are widespread among bacterial and archaeal genomes. The 1.88 Mbp genome of the obligate intracellular amoeba symbiont Amoebophilus asiaticus contains an unusually large number of transposase genes (n = 354; 23% of all genes). Results The transposase genes in the A. asiaticus genome can be assigned to 16 different IS elements termed ISCaa1 to ISCaa16, which are represented by 2 to 24 full-length copies, respectively. Despite this high IS element load, the A. asiaticus genome displays a GC skew pattern typical for most bacterial genomes, indicating that no major rearrangements have occurred recently. Additionally, the high sequence divergence of some IS elements, the high number of truncated IS element copies (n = 143), as well as the absence of direct repeats in most IS elements suggest that the IS elements of A. asiaticus are transpositionally inactive. Although we could show transcription of 13 IS elements, we did not find experimental evidence for transpositional activity, corroborating our results from sequence analyses. However, we detected contiguous transcripts between IS elements and their downstream genes at nine loci in the A. asiaticus genome, indicating that some IS elements influence the transcription of downstream genes, some of which might be important for host cell interaction. Conclusions Taken together, the IS elements in the A. asiaticus genome are currently in the process of degradation and largely represent reflections of the evolutionary past of A. asiaticus in which its genome was shaped by their activity. PMID:21943072

  14. Genomic positional conservation identifies topological anchor point RNAs linked to developmental loci.

    PubMed

    Amaral, Paulo P; Leonardi, Tommaso; Han, Namshik; Viré, Emmanuelle; Gascoigne, Dennis K; Arias-Carrasco, Raúl; Büscher, Magdalena; Pandolfini, Luca; Zhang, Anda; Pluchino, Stefano; Maracaja-Coutinho, Vinicius; Nakaya, Helder I; Hemberg, Martin; Shiekhattar, Ramin; Enright, Anton J; Kouzarides, Tony

    2018-03-15

    The mammalian genome is transcribed into large numbers of long noncoding RNAs (lncRNAs), but the definition of functional lncRNA groups has proven difficult, partly due to their low sequence conservation and lack of identified shared properties. Here we consider promoter conservation and positional conservation as indicators of functional commonality. We identify 665 conserved lncRNA promoters in mouse and human that are preserved in genomic position relative to orthologous coding genes. These positionally conserved lncRNA genes are primarily associated with developmental transcription factor loci with which they are coexpressed in a tissue-specific manner. Over half of positionally conserved RNAs in this set are linked to chromatin organization structures, overlapping binding sites for the CTCF chromatin organiser and located at chromatin loop anchor points and borders of topologically associating domains (TADs). We define these RNAs as topological anchor point RNAs (tapRNAs). Characterization of these noncoding RNAs and their associated coding genes shows that they are functionally connected: they regulate each other's expression and influence the metastatic phenotype of cancer cells in vitro in a similar fashion. Furthermore, we find that tapRNAs contain conserved sequence domains that are enriched in motifs for zinc finger domain-containing RNA-binding proteins and transcription factors, whose binding sites are found mutated in cancers. This work leverages positional conservation to identify lncRNAs with potential importance in genome organization, development and disease. The evidence that many developmental transcription factors are physically and functionally connected to lncRNAs represents an exciting stepping-stone to further our understanding of genome regulation.

  15. Conservation genomics: coming to a salmonid near you.

    PubMed

    Piccolo, J J

    2016-12-01

    Using the examples on hereditary and environmental factors affecting salmonid populations, this paper demonstrates that ecologists have long appreciated the importance of local adaptation and intraspecific diversity for salmonid conservation. Conservationists, however, need to embrace the genomics revolution and use new insights to improve salmonid management. At the same time, researchers must be forthcoming with the uses and limitations of genomics, and conservation must move forward in the face of scientific uncertainty. © 2016 The Fisheries Society of the British Isles.

  16. Delineating slowly and rapidly evolving fractions of the Drosophila genome.

    PubMed

    Keith, Jonathan M; Adams, Peter; Stephen, Stuart; Mattick, John S

    2008-05-01

    Evolutionary conservation is an important indicator of function and a major component of bioinformatic methods to identify non-protein-coding genes. We present a new Bayesian method for segmenting pairwise alignments of eukaryotic genomes while simultaneously classifying segments into slowly and rapidly evolving fractions. We also describe an information criterion similar to the Akaike Information Criterion (AIC) for determining the number of classes. Working with pairwise alignments enables detection of differences in conservation patterns among closely related species. We analyzed three whole-genome and three partial-genome pairwise alignments among eight Drosophila species. Three distinct classes of conservation level were detected. Sequences comprising the most slowly evolving component were consistent across a range of species pairs, and constituted approximately 62-66% of the D. melanogaster genome. Almost all (>90%) of the aligned protein-coding sequence is in this fraction, suggesting much of it (comprising the majority of the Drosophila genome, including approximately 56% of non-protein-coding sequences) is functional. The size and content of the most rapidly evolving component was species dependent, and varied from 1.6% to 4.8%. This fraction is also enriched for protein-coding sequence (while containing significant amounts of non-protein-coding sequence), suggesting it is under positive selection. We also classified segments according to conservation and GC content simultaneously. This analysis identified numerous sub-classes of those identified on the basis of conservation alone, but was nevertheless consistent with that classification. Software, data, and results available at www.maths.qut.edu.au/-keithj/. Genomic segments comprising the conservation classes available in BED format.

  17. Filling gaps in PPAR-alpha signaling through comparative nutrigenomics analysis.

    PubMed

    Cavalieri, Duccio; Calura, Enrica; Romualdi, Chiara; Marchi, Emmanuela; Radonjic, Marijana; Van Ommen, Ben; Müller, Michael

    2009-12-11

    The application of high-throughput genomic tools in nutrition research is a widespread practice. However, it is becoming increasingly clear that the outcome of individual expression studies is insufficient for the comprehensive understanding of such a complex field. Currently, the availability of the large amounts of expression data in public repositories has opened up new challenges on microarray data analyses. We have focused on PPARalpha, a ligand-activated transcription factor functioning as fatty acid sensor controlling the gene expression regulation of a large set of genes in various metabolic organs such as liver, small intestine or heart. The function of PPARalpha is strictly connected to the function of its target genes and, although many of these have already been identified, major elements of its physiological function remain to be uncovered. To further investigate the function of PPARalpha, we have applied a cross-species meta-analysis approach to integrate sixteen microarray datasets studying high fat diet and PPARalpha signal perturbations in different organisms. We identified 164 genes (MDEGs) that were differentially expressed in a constant way in response to a high fat diet or to perturbations in PPARs signalling. In particular, we found five genes in yeast which were highly conserved and homologous of PPARalpha targets in mammals, potential candidates to be used as models for the equivalent mammalian genes. Moreover, a screening of the MDEGs for all known transcription factor binding sites and the comparison with a human genome-wide screening of Peroxisome Proliferating Response Elements (PPRE), enabled us to identify, 20 new potential candidate genes that show, both binding site, both change in expression in the condition studied. Lastly, we found a non random localization of the differentially expressed genes in the genome. The results presented are potentially of great interest to resume the currently available expression data, exploiting the power of in silico analysis filtered by evolutionary conservation. The analysis enabled us to indicate potential gene candidates that could fill in the gaps with regards to the signalling of PPARalpha and, moreover, the non-random localization of the differentially expressed genes in the genome, suggest that epigenetic mechanisms are of importance in the regulation of the transcription operated by PPARalpha.

  18. Comparative Analysis of WRKY Genes Potentially Involved in Salt Stress Responses in Triticum turgidum L. ssp. durum.

    PubMed

    Yousfi, Fatma-Ezzahra; Makhloufi, Emna; Marande, William; Ghorbel, Abdel W; Bouzayen, Mondher; Bergès, Hélène

    2016-01-01

    WRKY transcription factors are involved in multiple aspects of plant growth, development and responses to biotic stresses. Although they have been found to play roles in regulating plant responses to environmental stresses, these roles still need to be explored, especially those pertaining to crops. Durum wheat is the second most widely produced cereal in the world. Complex, large and unsequenced genomes, in addition to a lack of genomic resources, hinder the molecular characterization of tolerance mechanisms. This paper describes the isolation and characterization of five TdWRKY genes from durum wheat ( Triticum turgidum L . ssp. durum ). A PCR-based screening of a T. turgidum BAC genomic library using primers within the conserved region of WRKY genes resulted in the isolation of five BAC clones. Following sequencing fully the five BACs, fine annotation through Triannot pipeline revealed 74.6% of the entire sequences as transposable elements and a 3.2% gene content with genes organized as islands within oceans of TEs. Each BAC clone harbored a TdWRKY gene. The study showed a very extensive conservation of genomic structure between TdWRKYs and their orthologs from Brachypodium, barley, and T. aestivum . The structural features of TdWRKY proteins suggested that they are novel members of the WRKY family in durum wheat. TdWRKY1/2/4, TdWRKY3, and TdWRKY5 belong to the group Ia, IIa, and IIc, respectively. Enrichment of cis -regulatory elements related to stress responses in the promoters of some TdWRKY genes indicated their potential roles in mediating plant responses to a wide variety of environmental stresses. TdWRKY genes displayed different expression patterns in response to salt stress that distinguishes two durum wheat genotypes with contrasting salt stress tolerance phenotypes. TdWRKY genes tended to react earlier with a down-regulation in sensitive genotype leaves and with an up-regulation in tolerant genotype leaves. The TdWRKY transcripts levels in roots increased in tolerant genotype compared to sensitive genotype. The present results indicate that these genes might play some functional role in the salt tolerance in durum wheat.

  19. Genome-wide identification and characterization of Notch transcription complex-binding sequence paired sites in leukemia cells

    PubMed Central

    Severson, Eric; Arnett, Kelly L.; Wang, Hongfang; Zang, Chongzhi; Taing, Len; Liu, Hudan; Pear, Warren S.; Liu, X. Shirley; Blacklow, Stephen C.; Aster, Jon C.

    2018-01-01

    Notch transcription complexes (NTCs) drive target gene expression by binding to two distinct types of genomic response elements, NTC monomer-binding sites and sequence-paired sites (SPSs) that bind NTC dimers. SPSs are conserved and are linked to the Notch-responsiveness of a few genes, but their overall contribution to Notch-dependent gene regulation is unknown. To address this issue, we determined the DNA sequence requirements for NTC dimerization using a fluorescence resonance energy transfer (FRET) assay, and applied insights from these in vitro studies to Notch-“addicted” leukemia cells. We find that SPSs contribute to the regulation of approximately a third of direct Notch target genes. While originally described in promoters, SPSs are present mainly in long-range enhancers, including an enhancer containing a newly described SPS that regulates HES5. Our work provides a general method for identifying sequence-paired sites in genome-wide data sets and highlights the widespread role of NTC dimerization in Notch-transformed leukemia cells. PMID:28465412

  20. Multi-Omics Driven Assembly and Annotation of the Sandalwood (Santalum album) Genome.

    PubMed

    Mahesh, Hirehally Basavarajegowda; Subba, Pratigya; Advani, Jayshree; Shirke, Meghana Deepak; Loganathan, Ramya Malarini; Chandana, Shankara Lingu; Shilpa, Siddappa; Chatterjee, Oishi; Pinto, Sneha Maria; Prasad, Thottethodi Subrahmanya Keshava; Gowda, Malali

    2018-04-01

    Indian sandalwood ( Santalum album ) is an important tropical evergreen tree known for its fragrant heartwood-derived essential oil and its valuable carving wood. Here, we applied an integrated genomic, transcriptomic, and proteomic approach to assemble and annotate the Indian sandalwood genome. Our genome sequencing resulted in the establishment of a draft map of the smallest genome for any woody tree species to date (221 Mb). The genome annotation predicted 38,119 protein-coding genes and 27.42% repetitive DNA elements. In-depth proteome analysis revealed the identities of 72,325 unique peptides, which confirmed 10,076 of the predicted genes. The addition of transcriptomic and proteogenomic approaches resulted in the identification of 53 novel proteins and 34 gene-correction events that were missed by genomic approaches. Proteogenomic analysis also helped in reassigning 1,348 potential noncoding RNAs as bona fide protein-coding messenger RNAs. Gene expression patterns at the RNA and protein levels indicated that peptide sequencing was useful in capturing proteins encoded by nuclear and organellar genomes alike. Mass spectrometry-based proteomic evidence provided an unbiased approach toward the identification of proteins encoded by organellar genomes. Such proteins are often missed in transcriptome data sets due to the enrichment of only messenger RNAs that contain poly(A) tails. Overall, the use of integrated omic approaches enhanced the quality of the assembly and annotation of this nonmodel plant genome. The availability of genomic, transcriptomic, and proteomic data will enhance genomics-assisted breeding, germplasm characterization, and conservation of sandalwood trees. © 2018 American Society of Plant Biologists. All Rights Reserved.

  1. LINE-1 Elements in Structural Variation and Disease

    PubMed Central

    Beck, Christine R.; Garcia-Perez, José Luis; Badge, Richard M.; Moran, John V.

    2014-01-01

    The completion of the human genome reference sequence ushered in a new era for the study and discovery of human transposable elements. It now is undeniable that transposable elements, historically dismissed as junk DNA, have had an instrumental role in sculpting the structure and function of our genomes. In particular, long interspersed element-1 (LINE-1 or L1) and short interspersed elements (SINEs) continue to affect our genome, and their movement can lead to sporadic cases of disease. Here, we briefly review the types of transposable elements present in the human genome and their mechanisms of mobility. We next highlight how advances in DNA sequencing and genomic technologies have enabled the discovery of novel retrotransposons in individual genomes. Finally, we discuss how L1-mediated retrotransposition events impact human genomes. PMID:21801021

  2. Isolation and characterization of an Arabidopsis biotin carboxylase gene and its promoter.

    PubMed

    Bao, X; Shorrosh, B S; Ohlrogge, J B

    1997-11-01

    In the plastids of most plants, acetyl-CoA carboxylase (ACCase; EC 6.4.1.2) is a multisubunit complex consisting of biotin carboxylase (BC), biotin-carboxyl carrier protien (BCCP), and carboxytransferase (alpha-CT, beta-CT) subunits. To better understand the regulation of this enzyme, we have isolated and sequenced a BC genomic clone from Arabidopsis and partially characterized its promoter. Fifteen introns were identified. The deduced amino acid sequence of the mature BC protein is highly conserved between Arabidopsis and tobacco (92.6% identity). BC expression was evaluated using northern blots and BC/GUS fusion constructs in transgenic Arabidopsis. GUS activity in the BC/GUS transgenics as well as transcript level of the native gene were both found to be higher in silique and flower than in root and leaf. Analysis of tobacco suspension cells transformed with truncated BC promoter/GUS gene fusions indicated the region from -140 to +147 contained necessary promoter elements which supported basal gene expression. A positive regulatory region was found to be located between -2100 and -140, whereas a negative element was possibly located in the first intron. In addition, several conserved regulatory elements were identified in the BC promoter. Surprisingly, although BC is a low-abundance protein, the expression of BC/GUS fusion constructs was similar to 35S/GUS constructs.

  3. Heterochromatin and molecular characterization of DsmarMITE transposable element in the beetle Dichotomius schiffleri (Coleoptera: Scarabaeidae).

    PubMed

    Xavier, Crislaine; Cabral-de-Mello, Diogo Cavalcanti; de Moura, Rita Cássia

    2014-12-01

    Cytogenetic studies of the Neotropical beetle genus Dichotomius (Scarabaeinae, Coleoptera) have shown dynamism for centromeric constitutive heterochromatin sequences. In the present work we studied the chromosomes and isolated repetitive sequences of Dichotomius schiffleri aiming to contribute to the understanding of coleopteran genome/chromosomal organization. Dichotomius schiffleri presented a conserved karyotype and heterochromatin distribution in comparison to other species of the genus with 2n = 18, biarmed chromosomes, and pericentromeric C-positive blocks. Similarly to heterochromatin distributional patterns, the highly and moderately repetitive DNA fraction (C 0 t-1 DNA) was detected in pericentromeric areas, contrasting with the euchromatic mapping of an isolated TE (named DsmarMITE). After structural analyses, the DsmarMITE was classified as a non-autonomous element of the type miniature inverted-repeat transposable element (MITE) with terminal inverted repeats similar to Mariner elements of insects from different orders. The euchromatic distribution for DsmarMITE indicates that it does not play a part in the dynamics of constitutive heterochromatin sequences.

  4. Community standards for genomic resources, genetic conservation, and data integration

    Treesearch

    Jill Wegrzyn; Meg Staton; Emily Grau; Richard Cronn; C. Dana Nelson

    2017-01-01

    Genetics and genomics are increasingly important in forestry management and conservation. Next generation sequencing can increase analytical power, but still relies on building on the structure of previously acquired data. Data standards and data sharing allow the community to maximize the analytical power of high throughput genomics data. The landscape of incomplete...

  5. Computational Approaches to Identify Promoters and cis-Regulatory Elements in Plant Genomes1

    PubMed Central

    Rombauts, Stephane; Florquin, Kobe; Lescot, Magali; Marchal, Kathleen; Rouzé, Pierre; Van de Peer, Yves

    2003-01-01

    The identification of promoters and their regulatory elements is one of the major challenges in bioinformatics and integrates comparative, structural, and functional genomics. Many different approaches have been developed to detect conserved motifs in a set of genes that are either coregulated or orthologous. However, although recent approaches seem promising, in general, unambiguous identification of regulatory elements is not straightforward. The delineation of promoters is even harder, due to its complex nature, and in silico promoter prediction is still in its infancy. Here, we review the different approaches that have been developed for identifying promoters and their regulatory elements. We discuss the detection of cis-acting regulatory elements using word-counting or probabilistic methods (so-called “search by signal” methods) and the delineation of promoters by considering both sequence content and structural features (“search by content” methods). As an example of search by content, we explored in greater detail the association of promoters with CpG islands. However, due to differences in sequence content, the parameters used to detect CpG islands in humans and other vertebrates cannot be used for plants. Therefore, a preliminary attempt was made to define parameters that could possibly define CpG and CpNpG islands in Arabidopsis, by exploring the compositional landscape around the transcriptional start site. To this end, a data set of more than 5,000 gene sequences was built, including the promoter region, the 5′-untranslated region, and the first introns and coding exons. Preliminary analysis shows that promoter location based on the detection of potential CpG/CpNpG islands in the Arabidopsis genome is not straightforward. Nevertheless, because the landscape of CpG/CpNpG islands differs considerably between promoters and introns on the one side and exons (whether coding or not) on the other, more sophisticated approaches can probably be developed for the successful detection of “putative” CpG and CpNpG islands in plants. PMID:12857799

  6. Mitochondrial genome sequences illuminate maternal lineages of conservation concern in a rare carnivore

    Treesearch

    Brian J. Knaus; Richard Cronn; Aaron Liston; Kristine Pilgrim; Michael K. Schwartz

    2011-01-01

    Science-based wildlife management relies on genetic information to infer population connectivity and identify conservation units. The most commonly used genetic marker for characterizing animal biodiversity and identifying maternal lineages is the mitochondrial genome. Mitochondrial genotyping figures prominently in conservation and management plans, with much of the...

  7. Significant variance in genetic diversity among populations of Schistosoma haematobium detected using microsatellite DNA loci from a genome-wide database.

    PubMed

    Glenn, Travis C; Lance, Stacey L; McKee, Anna M; Webster, Bonnie L; Emery, Aidan M; Zerlotini, Adhemar; Oliveira, Guilherme; Rollinson, David; Faircloth, Brant C

    2013-10-17

    Urogenital schistosomiasis caused by Schistosoma haematobium is widely distributed across Africa and is increasingly being targeted for control. Genome sequences and population genetic parameters can give insight into the potential for population- or species-level drug resistance. Microsatellite DNA loci are genetic markers in wide use by Schistosoma researchers, but there are few primers available for S. haematobium. We sequenced 1,058,114 random DNA fragments from clonal cercariae collected from a snail infected with a single Schistosoma haematobium miracidium. We assembled and aligned the S. haematobium sequences to the genomes of S. mansoni and S. japonicum, identifying microsatellite DNA loci across all three species and designing primers to amplify the loci in S. haematobium. To validate our primers, we screened 32 randomly selected primer pairs with population samples of S. haematobium. We designed >13,790 primer pairs to amplify unique microsatellite loci in S. haematobium, (available at http://www.cebio.org/projetos/schistosoma-haematobium-genome). The three Schistosoma genomes contained similar overall frequencies of microsatellites, but the frequency and length distributions of specific motifs differed among species. We identified 15 primer pairs that amplified consistently and were easily scored. We genotyped these 15 loci in S. haematobium individuals from six locations: Zanzibar had the highest levels of diversity; Malawi, Mauritius, Nigeria, and Senegal were nearly as diverse; but the sample from South Africa was much less diverse. About half of the primers in the database of Schistosoma haematobium microsatellite DNA loci should yield amplifiable and easily scored polymorphic markers, thus providing thousands of potential markers. Sequence conservation among S. haematobium, S. japonicum, and S. mansoni is relatively high, thus it should now be possible to identify markers that are universal among Schistosoma species (i.e., using DNA sequences conserved among species), as well as other markers that are specific to species or species-groups (i.e., using DNA sequences that differ among species). Full genome-sequencing of additional species and specimens of S. haematobium, S. japonicum, and S. mansoni is desirable to better characterize differences within and among these species, to develop additional genetic markers, and to examine genes as well as conserved non-coding elements associated with drug resistance.

  8. Evolutionary modes of emergence of short interspersed nuclear element (SINE) families in grasses.

    PubMed

    Kögler, Anja; Schmidt, Thomas; Wenke, Torsten

    2017-11-01

    Short interspersed nuclear elements (SINEs) are non-autonomous transposable elements which are propagated by retrotransposition and constitute an inherent part of the genome of most eukaryotic species. Knowledge of heterogeneous and highly abundant SINEs is crucial for de novo (or improvement of) annotation of whole genome sequences. We scanned Poaceae genome sequences of six important cereals (Oryza sativa, Triticum aestivum, Hordeum vulgare, Panicum virgatum, Sorghum bicolor, Zea mays) and Brachypodium distachyon to examine the diversity and evolution of SINE populations. We comparatively analyzed the structural features, distribution, evolutionary relation and abundance of 32 SINE families and subfamilies within grasses, comprising 11 052 individual copies. The investigation of activity profiles within the Poaceae provides insights into their species-specific diversification and amplification. We found that Poaceae SINEs (PoaS) fall into two length categories: simple SINEs of up to 180 bp and dimeric SINEs larger than 240 bp. Detailed analysis at the nucleotide level revealed that multimerization of related and unrelated SINE copies is an important evolutionary mechanism of SINE formation. We conclude that PoaS families diversify by massive reshuffling between SINE families, likely caused by insertion of truncated copies, and provide a model for this evolutionary scenario. Twenty-eight of 32 PoaS families and subfamilies show significant conservation, in particular either in the 5' or 3' regions, across Poaceae species and share large sequence stretches with one or more other PoaS families. © 2017 The Authors The Plant Journal © 2017 John Wiley & Sons Ltd.

  9. An RNA Element That Facilitates Programmed Ribosomal Readthrough in Turnip Crinkle Virus Adopts Multiple Conformations

    PubMed Central

    Kuhlmann, Micki M.; Chattopadhyay, Maitreyi; Stupina, Vera A.; Gao, Feng

    2016-01-01

    ABSTRACT Ribosome recoding is used by RNA viruses for translational readthrough or frameshifting past termination codons for the synthesis of extension products. Recoding sites, along with downstream recoding stimulatory elements (RSEs), have long been studied in reporter constructs, because these fragments alone mediate customary levels of recoding and are thus assumed to contain complete instructions for establishment of the proper ratio of termination to recoding. RSEs from the Tombusviridae and Luteoviridae are thought to be exceptions, since they contain a long-distance RNA-RNA connection with the 3′ end. This interaction has been suggested to substitute for pseudoknots, thought to be missing in tombusvirid RSEs. We provide evidence that the phylogenetically conserved RSE of the carmovirus Turnip crinkle virus (TCV) adopts an alternative, smaller structure that extends an upstream conserved hairpin and that this alternative structure is the predominant form of the RSE within nascent viral RNA in plant cells and when RNA is synthesized in vitro. The TCV RSE also contains an internal pseudoknot along with the long-distance interaction, and the pseudoknot is not compatible with the phylogenetically conserved structure. Conserved residues just past the recoding site are important for recoding, and these residues are also conserved in the RSEs of gammaretroviruses. Our data demonstrate the dynamic nature of the TCV RSE and suggest that studies using reporter constructs may not be effectively recapitulating RSE-mediated recoding within viral genomes. IMPORTANCE Ribosome recoding is used by RNA viruses to enable ribosomes to extend translation past termination codons for the synthesis of longer products. Recoding sites and a downstream recoding stimulatory element (RSE) mediate expected levels of recoding when excised and placed in reporter constructs and thus are assumed to contain complete instructions for the establishment of the proper ratio of termination to recoding. We provide evidence that most of the TCV RSE adopts an alternative structure that extends an upstream conserved hairpin and that this alternative structure, and not the phylogenetically conserved structure, is the predominant form of the RSE in RNA synthesized in vitro and in plant cells. The TCV RSE also contains an internal pseudoknot that is not compatible with the phylogenetically conserved structure and an RNA bridge to the 3′ end. These data suggest that the TCV RSE is structurally dynamic and that multiple conformations are likely required to regulate ribosomal readthrough. PMID:27440887

  10. Whole Genome Analysis of Leptospira licerasiae Provides Insight into Leptospiral Evolution and Pathogenicity

    PubMed Central

    Selengut, Jeremy D.; Harkins, Derek M.; Patra, Kailash P.; Moreno, Angelo; Lehmann, Jason S.; Purushe, Janaki; Sanka, Ravi; Torres, Michael; Webster, Nicholas J.; Vinetz, Joseph M.; Matthias, Michael A.

    2012-01-01

    The whole genome analysis of two strains of the first intermediately pathogenic leptospiral species to be sequenced (Leptospira licerasiae strains VAR010 and MMD0835) provides insight into their pathogenic potential and deepens our understanding of leptospiral evolution. Comparative analysis of eight leptospiral genomes shows the existence of a core leptospiral genome comprising 1547 genes and 452 conserved genes restricted to infectious species (including L. licerasiae) that are likely to be pathogenicity-related. Comparisons of the functional content of the genomes suggests that L. licerasiae retains several proteins related to nitrogen, amino acid and carbohydrate metabolism which might help to explain why these Leptospira grow well in artificial media compared with pathogenic species. L. licerasiae strains VAR010T and MMD0835 possess two prophage elements. While one element is circular and shares homology with LE1 of L. biflexa, the second is cryptic and homologous to a previously identified but unnamed region in L. interrogans serovars Copenhageni and Lai. We also report a unique O-antigen locus in L. licerasiae comprised of a 6-gene cluster that is unexpectedly short compared with L. interrogans in which analogous regions may include >90 such genes. Sequence homology searches suggest that these genes were acquired by lateral gene transfer (LGT). Furthermore, seven putative genomic islands ranging in size from 5 to 36 kb are present also suggestive of antecedent LGT. How Leptospira become naturally competent remains to be determined, but considering the phylogenetic origins of the genes comprising the O-antigen cluster and other putative laterally transferred genes, L. licerasiae must be able to exchange genetic material with non-invasive environmental bacteria. The data presented here demonstrate that L. licerasiae is genetically more closely related to pathogenic than to saprophytic Leptospira and provide insight into the genomic bases for its infectiousness and its unique antigenic characteristics. PMID:23145189

  11. Epigenetic conservation at gene regulatory elements revealed by non-methylated DNA profiling in seven vertebrates

    PubMed Central

    Long, Hannah K; Sims, David; Heger, Andreas; Blackledge, Neil P; Kutter, Claudia; Wright, Megan L; Grützner, Frank; Odom, Duncan T; Patient, Roger; Ponting, Chris P; Klose, Robert J

    2013-01-01

    Two-thirds of gene promoters in mammals are associated with regions of non-methylated DNA, called CpG islands (CGIs), which counteract the repressive effects of DNA methylation on chromatin. In cold-blooded vertebrates, computational CGI predictions often reside away from gene promoters, suggesting a major divergence in gene promoter architecture across vertebrates. By experimentally identifying non-methylated DNA in the genomes of seven diverse vertebrates, we instead reveal that non-methylated islands (NMIs) of DNA are a central feature of vertebrate gene promoters. Furthermore, NMIs are present at orthologous genes across vast evolutionary distances, revealing a surprising level of conservation in this epigenetic feature. By profiling NMIs in different tissues and developmental stages we uncover a unifying set of features that are central to the function of NMIs in vertebrates. Together these findings demonstrate an ancient logic for NMI usage at gene promoters and reveal an unprecedented level of epigenetic conservation across vertebrate evolution. DOI: http://dx.doi.org/10.7554/eLife.00348.001 PMID:23467541

  12. Mobilization of retrotransposons as a cause of chromosomal diversification and rapid speciation: the case for the Antarctic teleost genus Trematomus.

    PubMed

    Auvinet, J; Graça, P; Belkadi, L; Petit, L; Bonnivard, E; Dettaï, A; Detrich, W H; Ozouf-Costaz, C; Higuet, D

    2018-05-09

    The importance of transposable elements (TEs) in the genomic remodeling and chromosomal rearrangements that accompany lineage diversification in vertebrates remains the subject of debate. The major impediment to understanding the roles of TEs in genome evolution is the lack of comparative and integrative analyses on complete taxonomic groups. To help overcome this problem, we have focused on the Antarctic teleost genus Trematomus (Notothenioidei: Nototheniidae), as they experienced rapid speciation accompanied by dramatic chromosomal diversity. Here we apply a multi-strategy approach to determine the role of large-scale TE mobilization in chromosomal diversification within Trematomus species. Despite the extensive chromosomal rearrangements observed in Trematomus species, our measurements revealed strong interspecific genome size conservation. After identifying the DIRS1, Gypsy and Copia retrotransposon superfamilies in genomes of 13 nototheniid species, we evaluated their diversity, abundance (copy numbers) and chromosomal distribution. Four families of DIRS1, nine of Gypsy, and two of Copia were highly conserved in these genomes; DIRS1 being the most represented within Trematomus genomes. Fluorescence in situ hybridization mapping showed preferential accumulation of DIRS1 in centromeric and pericentromeric regions, both in Trematomus and other nototheniid species, but not in outgroups: species of the Sub-Antarctic notothenioid families Bovichtidae and Eleginopsidae, and the non-notothenioid family Percidae. In contrast to the outgroups, High-Antarctic notothenioid species, including the genus Trematomus, were subjected to strong environmental stresses involving repeated bouts of warming above the freezing point of seawater and cooling to sub-zero temperatures on the Antarctic continental shelf during the past 40 millions of years (My). As a consequence of these repetitive environmental changes, including thermal shocks; a breakdown of epigenetic regulation that normally represses TE activity may have led to sequential waves of TE activation within their genomes. The predominance of DIRS1 in Trematomus species, their transposition mechanism, and their strategic location in "hot spots" of insertion on chromosomes are likely to have facilitated nonhomologous recombination, thereby increasing genomic rearrangements. The resulting centric and tandem fusions and fissions would favor the rapid lineage diversification, characteristic of the nototheniid adaptive radiation.

  13. Greig syndrome: Analysis of the GL13 gene

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Grzeschik, K.H.; Gessler, M.; Heid, C.

    1994-09-01

    Disruption of the zinc finger gene GL13 by translocation events has been implicated as the cause for cephalopolysyndactyly syndrome (GCPS) in several patients. To characterize this genomic region on human chromosome 7p13, we have isolated a YAC contig of more than 1000 kb including the GL13 gene. About 550 kb from this area were subdivided into a cosmid contig with a two- to ten-fold clone coverage. In this region the cloned GL13 cDNA appears to correspond to at least 14 exons spread over a distance of 280 kb. A CpG island defined by two NotI sites and several BssHII andmore » KspI sites is located in a genomic fragment covering the most proximal exon of the cloned GL13 cDNA. Further upstream, five segments conserved between man and mouse were found. In the mouse this region has been characterized as the transgene integration site resulting in the add phenotype. Both the CpG islands and the conserved regions are likely candidates to search for GL13 promoter and control elements. Intron-exon boundaries and breakpoints of the translocation events within the gene region of patients were identified and characterized.« less

  14. Conservation of mRNA secondary structures may filter out mutations in Escherichia coli evolution

    PubMed Central

    Chursov, Andrey; Frishman, Dmitrij; Shneider, Alexander

    2013-01-01

    Recent reports indicate that mutations in viral genomes tend to preserve RNA secondary structure, and those mutations that disrupt secondary structural elements may reduce gene expression levels, thereby serving as a functional knockout. In this article, we explore the conservation of secondary structures of mRNA coding regions, a previously unknown factor in bacterial evolution, by comparing the structural consequences of mutations in essential and nonessential Escherichia coli genes accumulated over 40 000 generations in the course of the ‘long-term evolution experiment’. We monitored the extent to which mutations influence minimum free energy (MFE) values, assuming that a substantial change in MFE is indicative of structural perturbation. Our principal finding is that purifying selection tends to eliminate those mutations in essential genes that lead to greater changes of MFE values and, therefore, may be more disruptive for the corresponding mRNA secondary structures. This effect implies that synonymous mutations disrupting mRNA secondary structures may directly affect the fitness of the organism. These results demonstrate that the need to maintain intact mRNA structures imposes additional evolutionary constraints on bacterial genomes, which go beyond preservation of structure and function of the encoded proteins. PMID:23783573

  15. Genome-Wide Microsatellite Characterization and Marker Development in the Sequenced Brassica Crop Species

    PubMed Central

    Shi, Jiaqin; Huang, Shunmou; Zhan, Jiepeng; Yu, Jingyin; Wang, Xinfa; Hua, Wei; Liu, Shengyi; Liu, Guihua; Wang, Hanzhong

    2014-01-01

    Although much research has been conducted, the pattern of microsatellite distribution has remained ambiguous, and the development/utilization of microsatellite markers has still been limited/inefficient in Brassica, due to the lack of genome sequences. In view of this, we conducted genome-wide microsatellite characterization and marker development in three recently sequenced Brassica crops: Brassica rapa, Brassica oleracea and Brassica napus. The analysed microsatellite characteristics of these Brassica species were highly similar or almost identical, which suggests that the pattern of microsatellite distribution is likely conservative in Brassica. The genomic distribution of microsatellites was highly non-uniform and positively or negatively correlated with genes or transposable elements, respectively. Of the total of 115 869, 185 662 and 356 522 simple sequence repeat (SSR) markers developed with high frequencies (408.2, 343.8 and 356.2 per Mb or one every 2.45, 2.91 and 2.81 kb, respectively), most represented new SSR markers, the majority had determined physical positions, and a large number were genic or putative single-locus SSR markers. We also constructed a comprehensive database for the newly developed SSR markers, which was integrated with public Brassica SSR markers and annotated genome components. The genome-wide SSR markers developed in this study provide a useful tool to extend the annotated genome resources of sequenced Brassica species to genetic study/breeding in different Brassica species. PMID:24130371

  16. Genome-wide microsatellite characterization and marker development in the sequenced Brassica crop species.

    PubMed

    Shi, Jiaqin; Huang, Shunmou; Zhan, Jiepeng; Yu, Jingyin; Wang, Xinfa; Hua, Wei; Liu, Shengyi; Liu, Guihua; Wang, Hanzhong

    2014-02-01

    Although much research has been conducted, the pattern of microsatellite distribution has remained ambiguous, and the development/utilization of microsatellite markers has still been limited/inefficient in Brassica, due to the lack of genome sequences. In view of this, we conducted genome-wide microsatellite characterization and marker development in three recently sequenced Brassica crops: Brassica rapa, Brassica oleracea and Brassica napus. The analysed microsatellite characteristics of these Brassica species were highly similar or almost identical, which suggests that the pattern of microsatellite distribution is likely conservative in Brassica. The genomic distribution of microsatellites was highly non-uniform and positively or negatively correlated with genes or transposable elements, respectively. Of the total of 115 869, 185 662 and 356 522 simple sequence repeat (SSR) markers developed with high frequencies (408.2, 343.8 and 356.2 per Mb or one every 2.45, 2.91 and 2.81 kb, respectively), most represented new SSR markers, the majority had determined physical positions, and a large number were genic or putative single-locus SSR markers. We also constructed a comprehensive database for the newly developed SSR markers, which was integrated with public Brassica SSR markers and annotated genome components. The genome-wide SSR markers developed in this study provide a useful tool to extend the annotated genome resources of sequenced Brassica species to genetic study/breeding in different Brassica species.

  17. Genome organization of epidemic Acinetobacter baumannii strains.

    PubMed

    Di Nocera, Pier Paolo; Rocco, Francesco; Giannouli, Maria; Triassi, Maria; Zarrilli, Raffaele

    2011-10-10

    Acinetobacter baumannii is an opportunistic pathogen responsible for hospital-acquired infections. A. baumannii epidemics described world-wide were caused by few genotypic clusters of strains. The occurrence of epidemics caused by multi-drug resistant strains assigned to novel genotypes have been reported over the last few years. In the present study, we compared whole genome sequences of three A. baumannii strains assigned to genotypes ST2, ST25 and ST78, representative of the most frequent genotypes responsible for epidemics in several Mediterranean hospitals, and four complete genome sequences of A. baumannii strains assigned to genotypes ST1, ST2 and ST77. Comparative genome analysis showed extensive synteny and identified 3068 coding regions which are conserved, at the same chromosomal position, in all A. baumannii genomes. Genome alignments also identified 63 DNA regions, ranging in size from 4 o 126 kb, all defined as genomic islands, which were present in some genomes, but were either missing or replaced by non-homologous DNA sequences in others. Some islands are involved in resistance to drugs and metals, others carry genes encoding surface proteins or enzymes involved in specific metabolic pathways, and others correspond to prophage-like elements. Accessory DNA regions encode 12 to 19% of the potential gene products of the analyzed strains. The analysis of a collection of epidemic A. baumannii strains showed that some islands were restricted to specific genotypes. The definition of the genome components of A. baumannii provides a scaffold to rapidly evaluate the genomic organization of novel clinical A. baumannii isolates. Changes in island profiling will be useful in genomic epidemiology of A. baumannii population.

  18. Genome-Wide Identification and Expression Analysis of the Tubby-Like Protein Family in the Malus domestica Genome

    PubMed Central

    Xu, Jia-Ning; Xing, Shan-Shan; Zhang, Zheng-Rong; Chen, Xue-Sen; Wang, Xiao-Yun

    2016-01-01

    Tubby-like proteins (TLPs), which have a highly conserved β barrel tubby domain, have been found to be associated with some animal-specific characteristics. In the plant kingdom, more than 10 TLP family members were identified in Arabidopsis, rice and maize, and they were found to be involved in responses to stress. The publication of the apple genome makes it feasible to systematically study the TLP family in apple. In this investigation, nine TLP encoding genes (TLPs for short) were identified. When combined with the TLPs from other plant species, the TLPs were divided into three groups (group A, B, and C). Most plant TLP members in group A contained an additional F-box domain at the N-terminus. However, no common domain was identified other than tubby domain either in group B or in group C. An analysis of the tubby domains of MdTLPs identified three types of conserved motifs. Motif 1 and 2, the signature motifs in the confirmed TLPs, were always present in MdTLPs, while motif 3 was absent from group B. Homology modeling indicated that the tubby domain of most MdTLPs had a closed β barrel, as in animal tubby domains. Expression profiling revealed that the MdTLP genes were expressed in multiple organs and were abundant in roots, stems, and leaves but low in flowers. An analysis of cis-acting elements showed that elements related to the stress response were prevalent in the promoter sequences of MdTLPs. Expression profiling by qRT-PCR indicated that almost all MdTLPs were up-regulated at some extent under abiotic stress, exogenous ABA and H2O2 treatments in leaves and roots, though different MdTLP members exhibited differently in leaves and roots. The results and information above may provide a basis for further investigation of TLP function in plants. PMID:27895653

  19. Stenotrophomonas maltophilia responds to exogenous AHL signals through the LuxR solo SmoR (Smlt1839).

    PubMed

    Martínez, Paula; Huedo, Pol; Martinez-Servat, Sònia; Planell, Raquel; Ferrer-Navarro, Mario; Daura, Xavier; Yero, Daniel; Gibert, Isidre

    2015-01-01

    Quorum Sensing (QS) mediated by Acyl Homoserine Lactone (AHL) molecules are probably the most widespread and studied among Gram-negative bacteria. Canonical AHL systems are composed by a synthase (LuxI family) and a regulator element (LuxR family), whose genes are usually adjacent in the genome. However, incomplete AHL-QS machinery lacking the synthase LuxI is frequently observed in Proteobacteria, and the regulator element is then referred as LuxR solo. It has been shown that certain LuxR solos participate in interspecific communication by detecting signals produced by different organisms. In the case of Stenotrophomonas maltophilia, a preliminary genome sequence analysis revealed numerous putative luxR genes, none of them associated to a luxI gene. From these, the hypothetical LuxR solo Smlt1839, here designated SmoR, presents a conserved AHL binding domain and a helix-turn-helix DNA binding motif. Its genomic organization-adjacent to hchA gene-indicate that SmoR belongs to the new family "LuxR regulator chaperone HchA-associated." AHL-binding assays revealed that SmoR binds to AHLs in-vitro, at least to oxo-C8-homoserine lactone, and it regulates operon transcription, likely by recognizing a conserved palindromic regulatory box in the hchA upstream region. Supplementation with concentrated supernatants from Pseudomonas aeruginosa, which contain significant amounts of AHLs, promoted swarming motility in S. maltophilia. Contrarily, no swarming stimulation was observed when the P. aeruginosa supernatant was treated with the lactonase AiiA from Bacillus subtilis, confirming that AHL contributes to enhance the swarming ability of S. maltophilia. Finally, mutation of smoR resulted in a swarming alteration and an apparent insensitivity to the exogenous AHLs provided by P. aeruginosa. In conclusion, our results demonstrate that S. maltophilia senses AHLs produced by neighboring bacteria through the LuxR solo SmoR, regulating population behaviors such as swarming motility.

  20. Highly conserved proximal promoter element harbouring paired Sox9-binding sites contributes to the tissue- and developmental stage-specific activity of the matrilin-1 gene.

    PubMed

    Rentsendorj, Otgonchimeg; Nagy, Andrea; Sinkó, Ildikó; Daraba, Andreea; Barta, Endre; Kiss, Ibolya

    2005-08-01

    The matrilin-1 gene has the unique feature that it is expressed in chondrocytes in a developmental stage-specific manner. Previously, we found that the chicken matrilin-1 long promoter with or without the intronic enhancer and the short promoter with the intronic enhancer restricted the transgene expression to the columnar proliferative chondroblasts and prehypertrophic chondrocytes of growth-plate cartilage in transgenic mice. To study whether the short promoter shared by these transgenes harbours cartilage-specific control elements, we generated transgenic mice expressing the LacZ reporter gene under the control of the matrilin-1 promoter between -338 and +67. Histological analysis of the founder embryos demonstrated relatively weak transgene activity in the developing chondrocranium, axial and appendicular skeleton with highest level of expression in the columnar proliferating chondroblasts and prehypertrophic chondrocytes. Computer analysis of the matrilin-1 genes of amniotes revealed a highly conserved Pe1 (proximal promoter element 1) and two less-conserved sequence blocks in the distal promoter region. The inverted Sox motifs of the Pe1 element interacted with chondrogenic transcription factors Sox9, L-Sox5 and Sox6 in vitro and another factor bound to the spacer region. Point mutations in the Sox motifs or in the spacer region interfered with or altered the formation of nucleoprotein complexes in vitro and significantly decreased the reporter gene activity in transient expression assays in chondrocytes. In vivo occupancy of the Sox motifs in genomic footprinting in the expressing cell type, but not in fibroblasts, also supported the involvement of Pe1 in the tissue-specific regulation of the gene. Our results indicate that interaction of Pe1 with distal DNA elements is required for the high level, cartilage- and developmental stage-specific transgene expression.

  1. Comparative and genetic analysis of the four sequenced Paenibacillus polymyxa genomes reveals a diverse metabolism and conservation of genes relevant to plant-growth promotion and competitiveness.

    PubMed

    Eastman, Alexander W; Heinrichs, David E; Yuan, Ze-Chun

    2014-10-03

    Members of the genus Paenibacillus are important plant growth-promoting rhizobacteria that can serve as bio-reactors. Paenibacillus polymyxa promotes the growth of a variety of economically important crops. Our lab recently completed the genome sequence of Paenibacillus polymyxa CR1. As of January 2014, four P. polymyxa genomes have been completely sequenced but no comparative genomic analyses have been reported. Here we report the comparative and genetic analyses of four sequenced P. polymyxa genomes, which revealed a significantly conserved core genome. Complex metabolic pathways and regulatory networks were highly conserved and allow P. polymyxa to rapidly respond to dynamic environmental cues. Genes responsible for phytohormone synthesis, phosphate solubilization, iron acquisition, transcriptional regulation, σ-factors, stress responses, transporters and biomass degradation were well conserved, indicating an intimate association with plant hosts and the rhizosphere niche. In addition, genes responsible for antimicrobial resistance and non-ribosomal peptide/polyketide synthesis are present in both the core and accessory genome of each strain. Comparative analyses also reveal variations in the accessory genome, including large plasmids present in strains M1 and SC2. Furthermore, a considerable number of strain-specific genes and genomic islands are irregularly distributed throughout each genome. Although a variety of plant-growth promoting traits are encoded by all strains, only P. polymyxa CR1 encodes the unique nitrogen fixation cluster found in other Paenibacillus sp. Our study revealed that genomic loci relevant to host interaction and ecological fitness are highly conserved within the P. polymyxa genomes analysed, despite variations in the accessory genome. This work suggets that plant-growth promotion by P. polymyxa is mediated largely through phytohormone production, increased nutrient availability and bio-control mechanisms. This study provides an in-depth understanding of the genome architecture of this species, thus facilitating future genetic engineering and applications in agriculture, industry and medicine. Furthermore, this study highlights the current gap in our understanding of complex plant biomass metabolism in Gram-positive bacteria.

  2. Deletions involving long-range conserved nongenic sequences upstream and downstream of FOXL2 as a novel disease-causing mechanism in blepharophimosis syndrome.

    PubMed

    Beysen, D; Raes, J; Leroy, B P; Lucassen, A; Yates, J R W; Clayton-Smith, J; Ilyina, H; Brooks, S Sklower; Christin-Maitre, S; Fellous, M; Fryns, J P; Kim, J R; Lapunzina, P; Lemyre, E; Meire, F; Messiaen, L M; Oley, C; Splitt, M; Thomson, J; Van de Peer, Y; Veitia, R A; De Paepe, A; De Baere, E

    2005-08-01

    The expression of a gene requires not only a normal coding sequence but also intact regulatory regions, which can be located at large distances from the target genes, as demonstrated for an increasing number of developmental genes. In previous mutation studies of the role of FOXL2 in blepharophimosis syndrome (BPES), we identified intragenic mutations in 70% of our patients. Three translocation breakpoints upstream of FOXL2 in patients with BPES suggested a position effect. Here, we identified novel microdeletions outside of FOXL2 in cases of sporadic and familial BPES. Specifically, four rearrangements, with an overlap of 126 kb, are located 230 kb upstream of FOXL2, telomeric to the reported translocation breakpoints. Moreover, the shortest region of deletion overlap (SRO) contains several conserved nongenic sequences (CNGs) harboring putative transcription-factor binding sites and representing potential long-range cis-regulatory elements. Interestingly, the human region orthologous to the 12-kb sequence deleted in the polled intersex syndrome in goat, which is an animal model for BPES, is contained in this SRO, providing evidence of human-goat conservation of FOXL2 expression and of the mutational mechanism. Surprisingly, in a fifth family with BPES, one rearrangement was found downstream of FOXL2. In addition, we report nine novel rearrangements encompassing FOXL2 that range from partial gene deletions to submicroscopic deletions. Overall, genomic rearrangements encompassing or outside of FOXL2 account for 16% of all molecular defects found in our families with BPES. In summary, this is the first report of extragenic deletions in BPES, providing further evidence of potential long-range cis-regulatory elements regulating FOXL2 expression. It contributes to the enlarging group of developmental diseases caused by defective distant regulation of gene expression. Finally, we demonstrate that CNGs are candidate regions for genomic rearrangements in developmental genes.

  3. Deletions Involving Long-Range Conserved Nongenic Sequences Upstream and Downstream of FOXL2 as a Novel Disease-Causing Mechanism in Blepharophimosis Syndrome

    PubMed Central

    Beysen, D.; Raes, J.; Leroy, B. P.; Lucassen, A.; Yates, J. R. W.; Clayton-Smith, J.; Ilyina, H.; Brooks, S. Sklower; Christin-Maitre, S.; Fellous, M.; Fryns, J. P.; Kim, J. R.; Lapunzina, P.; Lemyre, E.; Meire, F.; Messiaen, L. M.; Oley, C.; Splitt, M.; Thomson, J.; Peer, Y. Van de; Veitia, R. A.; De Paepe, A.; De Baere, E.

    2005-01-01

    The expression of a gene requires not only a normal coding sequence but also intact regulatory regions, which can be located at large distances from the target genes, as demonstrated for an increasing number of developmental genes. In previous mutation studies of the role of FOXL2 in blepharophimosis syndrome (BPES), we identified intragenic mutations in 70% of our patients. Three translocation breakpoints upstream of FOXL2 in patients with BPES suggested a position effect. Here, we identified novel microdeletions outside of FOXL2 in cases of sporadic and familial BPES. Specifically, four rearrangements, with an overlap of 126 kb, are located 230 kb upstream of FOXL2, telomeric to the reported translocation breakpoints. Moreover, the shortest region of deletion overlap (SRO) contains several conserved nongenic sequences (CNGs) harboring putative transcription-factor binding sites and representing potential long-range cis-regulatory elements. Interestingly, the human region orthologous to the 12-kb sequence deleted in the polled intersex syndrome in goat, which is an animal model for BPES, is contained in this SRO, providing evidence of human-goat conservation of FOXL2 expression and of the mutational mechanism. Surprisingly, in a fifth family with BPES, one rearrangement was found downstream of FOXL2. In addition, we report nine novel rearrangements encompassing FOXL2 that range from partial gene deletions to submicroscopic deletions. Overall, genomic rearrangements encompassing or outside of FOXL2 account for 16% of all molecular defects found in our families with BPES. In summary, this is the first report of extragenic deletions in BPES, providing further evidence of potential long-range cis-regulatory elements regulating FOXL2 expression. It contributes to the enlarging group of developmental diseases caused by defective distant regulation of gene expression. Finally, we demonstrate that CNGs are candidate regions for genomic rearrangements in developmental genes. PMID:15962237

  4. The promise and peril of CRISPR gene drives: Genetic variation and inbreeding may impede the propagation of gene drives based on the CRISPR genome editing technology.

    PubMed

    Zentner, Gabriel E; Wade, Michael J

    2017-10-01

    Gene drives are selfish genetic elements that use a variety of mechanisms to ensure they are transmitted to subsequent generations at greater than expected frequencies. Synthetic gene drives based on the clustered regularly interspersed palindromic repeats (CRISPR) genome editing system have been proposed as a way to alter the genetic characteristics of natural populations of organisms relevant to the goals of public health, conservation, and agriculture. Here, we review the principles and potential applications of CRISPR drives, as well as means proposed to prevent their uncontrolled spread. We also focus on recent work suggesting that factors such as natural genetic variation and inbreeding may represent substantial impediments to the propagation of CRISPR drives. © 2017 WILEY Periodicals, Inc.

  5. Transcriptional Activation Signals Found in the Epstein-Barr Virus (EBV) Latency C Promoter Are Conserved in the Latency C Promoter Sequences from Baboon and Rhesus Monkey EBV-Like Lymphocryptoviruses (Cercopithicine Herpesviruses 12 and 15)

    PubMed Central

    Fuentes-Pananá, Ezequiel M.; Swaminathan, Sankar; Ling, Paul D.

    1999-01-01

    The Epstein-Barr virus (EBV) EBNA2 protein is a transcriptional activator that controls viral latent gene expression and is essential for EBV-driven B-cell immortalization. EBNA2 is expressed from the viral C promoter (Cp) and regulates its own expression by activating Cp through interaction with the cellular DNA binding protein CBF1. Through regulation of Cp and EBNA2 expression, EBV controls the pattern of latent protein expression and the type of latency established. To gain further insight into the important regulatory elements that modulate Cp usage, we isolated and sequenced the Cp regions corresponding to nucleotides 10251 to 11479 of the EBV genome (−1079 to +144 relative to the transcription initiation site) from the EBV-like lymphocryptoviruses found in baboons (herpesvirus papio; HVP) and Rhesus macaques (RhEBV). Sequence comparison of the approximately 1,230-bp Cp regions from these primate viruses revealed that EBV and HVP Cp sequences are 64% conserved, EBV and RhEBV Cp sequences are 66% conserved, and HVP and RhEBV Cp sequences are 65% conserved relative to each other. Approximately 50% of the residues are conserved among all three sequences, yet all three viruses have retained response elements for glucocorticoids, two positionally conserved CCAAT boxes, and positionally conserved TATA boxes. The putative EBNA2 100-bp enhancers within these promoters contain 54 conserved residues, and the binding sites for CBF1 and CBF2 are well conserved. Cp usage in the HVP- and RhEBV-transformed cell lines was detected by S1 nuclease protection analysis. Transient-transfection analysis showed that promoters of both HVP and RhEBV are responsive to EBNA2 and that they bind CBF1 and CBF2 in gel mobility shift assays. These results suggest that similar mechanisms for regulation of latent gene expression are conserved among the EBV-related lymphocryptoviruses found in nonhuman primates. PMID:9847397

  6. Transcriptional activation signals found in the Epstein-Barr virus (EBV) latency C promoter are conserved in the latency C promoter sequences from baboon and Rhesus monkey EBV-like lymphocryptoviruses (cercopithicine herpesviruses 12 and 15).

    PubMed

    Fuentes-Pananá, E M; Swaminathan, S; Ling, P D

    1999-01-01

    The Epstein-Barr virus (EBV) EBNA2 protein is a transcriptional activator that controls viral latent gene expression and is essential for EBV-driven B-cell immortalization. EBNA2 is expressed from the viral C promoter (Cp) and regulates its own expression by activating Cp through interaction with the cellular DNA binding protein CBF1. Through regulation of Cp and EBNA2 expression, EBV controls the pattern of latent protein expression and the type of latency established. To gain further insight into the important regulatory elements that modulate Cp usage, we isolated and sequenced the Cp regions corresponding to nucleotides 10251 to 11479 of the EBV genome (-1079 to +144 relative to the transcription initiation site) from the EBV-like lymphocryptoviruses found in baboons (herpesvirus papio; HVP) and Rhesus macaques (RhEBV). Sequence comparison of the approximately 1,230-bp Cp regions from these primate viruses revealed that EBV and HVP Cp sequences are 64% conserved, EBV and RhEBV Cp sequences are 66% conserved, and HVP and RhEBV Cp sequences are 65% conserved relative to each other. Approximately 50% of the residues are conserved among all three sequences, yet all three viruses have retained response elements for glucocorticoids, two positionally conserved CCAAT boxes, and positionally conserved TATA boxes. The putative EBNA2 100-bp enhancers within these promoters contain 54 conserved residues, and the binding sites for CBF1 and CBF2 are well conserved. Cp usage in the HVP- and RhEBV-transformed cell lines was detected by S1 nuclease protection analysis. Transient-transfection analysis showed that promoters of both HVP and RhEBV are responsive to EBNA2 and that they bind CBF1 and CBF2 in gel mobility shift assays. These results suggest that similar mechanisms for regulation of latent gene expression are conserved among the EBV-related lymphocryptoviruses found in nonhuman primates.

  7. Evolution of developmental regulation in the vertebrate FgfD subfamily.

    PubMed

    Jovelin, Richard; Yan, Yi-Lin; He, Xinjun; Catchen, Julian; Amores, Angel; Canestro, Cristian; Yokoi, Hayato; Postlethwait, John H

    2010-01-15

    Fibroblast growth factors (Fgfs) encode small signaling proteins that help regulate embryo patterning. Fgfs fall into seven families, including FgfD. Nonvertebrate chordates have a single FgfD gene; mammals have three (Fgf8, Fgf17, and Fgf18); and teleosts have six (fgf8a, fgf8b, fgf17, fgf18a, fgf18b, and fgf24). What are the evolutionary processes that led to the structural duplication and functional diversification of FgfD genes during vertebrate phylogeny? To study this question, we investigated conserved syntenies, patterns of gene expression, and the distribution of conserved noncoding elements (CNEs) in FgfD genes of stickleback and zebrafish, and compared them with data from cephalochordates, urochordates, and mammals. Genomic analysis suggests that Fgf8, Fgf17, Fgf18, and Fgf24 arose in two rounds of whole genome duplication at the base of the vertebrate radiation; that fgf8 and fgf18 duplications occurred at the base of the teleost radiation; and that Fgf24 is an ohnolog that was lost in the mammalian lineage. Expression analysis suggests that ancestral subfunctions partitioned between gene duplicates and points to the evolution of novel expression domains. Analysis of CNEs, at least some of which are candidate regulatory elements, suggests that ancestral CNEs partitioned between gene duplicates. These results help explain the evolutionary pathways by which the developmentally important family of FgfD molecules arose and the deduced principles that guided FgfD evolution are likely applicable to the evolution of developmental regulation in many vertebrate multigene families. (c) 2009 Wiley-Liss, Inc.

  8. Genome-wide identification and characterization of the apple (Malus domestica) HECT ubiquitin-protein ligase family and expression analysis of their responsiveness to abiotic stresses.

    PubMed

    Xu, Jianing; Xing, Shanshan; Cui, Haoran; Chen, Xuesen; Wang, Xiaoyun

    2016-04-01

    The ubiquitin-protein ligases (E3s) directly participate in ubiquitin (Ub) transferring to the target proteins in the ubiquitination pathway. The HECT ubiquitin-protein ligase (UPL), one type of E3s, is characterized as containing a conserved HECT domain of approximately 350 amino acids in the C terminus. Some UPLs were found to be involved in trichome development and leaf senescence in Arabidopsis. However, studies on plant UPLs, such as characteristics of the protein structure, predicted functional motifs of the HECT domain, and the regulatory expression of UPLs have all been limited. Here, we present genome-wide identification of the genes encoding UPLs (HECT gene) in apple. The 13 genes (named as MdUPL1-MdUPL13) from ten different chromosomes were divided into four groups by phylogenetic analysis. Among these groups, the encoding genes in the intron-exon structure and the included additional functional domains were quite different. Notably, the F-box domain was first found in MdUPL7 in plant UPLs. The HECT domain in different MdUPL groups also presented different spatial features and three types of conservative motifs were identified. The promoters of each MdUPL member carried multiple stress-response related elements by cis-acting element analysis. Experimental results demonstrated that the expressions of several MdUPLs were quite sensitive to cold-, drought-, and salt-stresses by qRT-PCR assay. The results of this study helped to elucidate the functions of HECT proteins, especially in Rosaceae plants.

  9. Comparative analysis of CRISPR-Cas systems in Klebsiella genomes.

    PubMed

    Shen, Juntao; Lv, Li; Wang, Xudong; Xiu, Zhilong; Chen, Guoqiang

    2017-04-01

    Prokaryotic CRISPR-Cas system provides adaptive immunity against invasive genetic elements. Bacteria of the genus Klebsiella are important nosocomial opportunistic pathogens. However, information of CRISPR-Cas system in Klebsiella remains largely unknown. Here, we analyzed the CRISPR-Cas systems of 68 complete genomes of Klebsiella representing four species. All the elements for CRISPR-Cas system (cas genes, repeats, leader sequences, and PAMs) were characterized. Besides the typical Type I-E and I-F CRISPR-Cas systems, a new Subtype I system located in the ABC transport system-glyoxalase region was found. The conservation of the new subtype CRISPR system between different species showed new evidence for CRISPR horizontal transfer. CRISPR polymorphism was strongly correlated both with species and multilocus sequence types. Some results indicated the function of adaptive immunity: most spacers (112 of 124) matched to prophages and plasmids and no matching housekeeping genes; new spacer acquisition was observed within the same sequence type (ST) and same clonal complex; the identical spacers were observed only in the ancient position (far from the leader) between different STs and clonal complexes. Interestingly, a high ratio of self-targeting spacers (7.5%, 31 of 416) was found in CRISPR-bearing Klebsiella pneumoniae (61%, 11 of 18). In some strains, there even were multiple full matching self-targeting spacers. Some self-targeting spacers were conserved even between different STs. These results indicated that some unknown mechanisms existed to compromise the function of self-targets of CRISPR-Cas systems in K. pneumoniae. © 2017 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  10. Prospects and challenges for the conservation of farm animal genomic resources, 2015-2025

    PubMed Central

    Bruford, Michael W.; Ginja, Catarina; Hoffmann, Irene; Joost, Stéphane; Orozco-terWengel, Pablo; Alberto, Florian J.; Amaral, Andreia J.; Barbato, Mario; Biscarini, Filippo; Colli, Licia; Costa, Mafalda; Curik, Ino; Duruz, Solange; Ferenčaković, Maja; Fischer, Daniel; Fitak, Robert; Groeneveld, Linn F.; Hall, Stephen J. G.; Hanotte, Olivier; Hassan, Faiz-ul; Helsen, Philippe; Iacolina, Laura; Kantanen, Juha; Leempoel, Kevin; Lenstra, Johannes A.; Ajmone-Marsan, Paolo; Masembe, Charles; Megens, Hendrik-Jan; Miele, Mara; Neuditschko, Markus; Nicolazzi, Ezequiel L.; Pompanon, François; Roosen, Jutta; Sevane, Natalia; Smetko, Anamarija; Štambuk, Anamaria; Streeter, Ian; Stucki, Sylvie; Supakorn, China; Telo Da Gama, Luis; Tixier-Boichard, Michèle; Wegmann, Daniel; Zhan, Xiangjiang

    2015-01-01

    Livestock conservation practice is changing rapidly in light of policy developments, climate change and diversifying market demands. The last decade has seen a step change in technology and analytical approaches available to define, manage and conserve Farm Animal Genomic Resources (FAnGR). However, these rapid changes pose challenges for FAnGR conservation in terms of technological continuity, analytical capacity and integrative methodologies needed to fully exploit new, multidimensional data. The final conference of the ESF Genomic Resources program aimed to address these interdisciplinary problems in an attempt to contribute to the agenda for research and policy development directions during the coming decade. By 2020, according to the Convention on Biodiversity's Aichi Target 13, signatories should ensure that “…the genetic diversity of …farmed and domesticated animals and of wild relatives …is maintained, and strategies have been developed and implemented for minimizing genetic erosion and safeguarding their genetic diversity.” However, the real extent of genetic erosion is very difficult to measure using current data. Therefore, this challenging target demands better coverage, understanding and utilization of genomic and environmental data, the development of optimized ways to integrate these data with social and other sciences and policy analysis to enable more flexible, evidence-based models to underpin FAnGR conservation. At the conference, we attempted to identify the most important problems for effective livestock genomic resource conservation during the next decade. Twenty priority questions were identified that could be broadly categorized into challenges related to methodology, analytical approaches, data management and conservation. It should be acknowledged here that while the focus of our meeting was predominantly around genetics, genomics and animal science, many of the practical challenges facing conservation of genomic resources are societal in origin and are predicated on the value (e.g., socio-economic and cultural) of these resources to farmers, rural communities and society as a whole. The overall conclusion is that despite the fact that the livestock sector has been relatively well-organized in the application of genetic methodologies to date, there is still a large gap between the current state-of-the-art in the use of tools to characterize genomic resources and its application to many non-commercial and local breeds, hampering the consistent utilization of genetic and genomic data as indicators of genetic erosion and diversity. The livestock genomic sector therefore needs to make a concerted effort in the coming decade to enable to the democratization of the powerful tools that are now at its disposal, and to ensure that they are applied in the context of breed conservation as well as development. PMID:26539210

  11. Prospects and challenges for the conservation of farm animal genomic resources, 2015-2025.

    PubMed

    Bruford, Michael W; Ginja, Catarina; Hoffmann, Irene; Joost, Stéphane; Orozco-terWengel, Pablo; Alberto, Florian J; Amaral, Andreia J; Barbato, Mario; Biscarini, Filippo; Colli, Licia; Costa, Mafalda; Curik, Ino; Duruz, Solange; Ferenčaković, Maja; Fischer, Daniel; Fitak, Robert; Groeneveld, Linn F; Hall, Stephen J G; Hanotte, Olivier; Hassan, Faiz-Ul; Helsen, Philippe; Iacolina, Laura; Kantanen, Juha; Leempoel, Kevin; Lenstra, Johannes A; Ajmone-Marsan, Paolo; Masembe, Charles; Megens, Hendrik-Jan; Miele, Mara; Neuditschko, Markus; Nicolazzi, Ezequiel L; Pompanon, François; Roosen, Jutta; Sevane, Natalia; Smetko, Anamarija; Štambuk, Anamaria; Streeter, Ian; Stucki, Sylvie; Supakorn, China; Telo Da Gama, Luis; Tixier-Boichard, Michèle; Wegmann, Daniel; Zhan, Xiangjiang

    2015-01-01

    Livestock conservation practice is changing rapidly in light of policy developments, climate change and diversifying market demands. The last decade has seen a step change in technology and analytical approaches available to define, manage and conserve Farm Animal Genomic Resources (FAnGR). However, these rapid changes pose challenges for FAnGR conservation in terms of technological continuity, analytical capacity and integrative methodologies needed to fully exploit new, multidimensional data. The final conference of the ESF Genomic Resources program aimed to address these interdisciplinary problems in an attempt to contribute to the agenda for research and policy development directions during the coming decade. By 2020, according to the Convention on Biodiversity's Aichi Target 13, signatories should ensure that "…the genetic diversity of …farmed and domesticated animals and of wild relatives …is maintained, and strategies have been developed and implemented for minimizing genetic erosion and safeguarding their genetic diversity." However, the real extent of genetic erosion is very difficult to measure using current data. Therefore, this challenging target demands better coverage, understanding and utilization of genomic and environmental data, the development of optimized ways to integrate these data with social and other sciences and policy analysis to enable more flexible, evidence-based models to underpin FAnGR conservation. At the conference, we attempted to identify the most important problems for effective livestock genomic resource conservation during the next decade. Twenty priority questions were identified that could be broadly categorized into challenges related to methodology, analytical approaches, data management and conservation. It should be acknowledged here that while the focus of our meeting was predominantly around genetics, genomics and animal science, many of the practical challenges facing conservation of genomic resources are societal in origin and are predicated on the value (e.g., socio-economic and cultural) of these resources to farmers, rural communities and society as a whole. The overall conclusion is that despite the fact that the livestock sector has been relatively well-organized in the application of genetic methodologies to date, there is still a large gap between the current state-of-the-art in the use of tools to characterize genomic resources and its application to many non-commercial and local breeds, hampering the consistent utilization of genetic and genomic data as indicators of genetic erosion and diversity. The livestock genomic sector therefore needs to make a concerted effort in the coming decade to enable to the democratization of the powerful tools that are now at its disposal, and to ensure that they are applied in the context of breed conservation as well as development.

  12. The Ecological Genomics of Fungi: Repeated Elements in Filamentous Fungi with a Focus on Wood-Decay Fungi

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Murat, Claude; Payen, Thibaut; Petitpierre, Denis

    2013-01-01

    In the last decade, the genome of several dozen filamentous fungi have been sequenced. Interestingly, vast diversity in genome size was observed (Fig. 2.1) with 14-fold differences between the 9 Mb of the human pathogenic dandruff fungus (Malassezia globosa; Xu, Saunders, et al., 2007) and the 125 Mb of the ectomycorrhizal black truffle of P rigord (Tuber melanosporum; Martin, Kohler, et al., 2010). Recently, Raffaele and Kamoun (2012) highlighted that the genomes of several lineages of filamentous plant pathogens have been shaped by repeat-driven expansion. Indeed, repeated elements are ubiquitous in all prokaryote and eukaryote genomes; however, their frequencies canmore » vary from just a minor percentage of the genome to more that 60 percent of the genome. Repeated elements can be classified in two major types: satellites DNA and transposable elements. In this chapter, the different types of repeated elements and how these elements can impact genome and gene repertoire will be described. Also, an intriguing link between the transposable elements richness and diversity and the ecological niche will be highlighted.« less

  13. Role of genomics and transcriptomics in selection of reintroduction source populations.

    PubMed

    He, Xiaoping; Johansson, Mattias L; Heath, Daniel D

    2016-10-01

    The use and importance of reintroduction as a conservation tool to return a species to its historical range from which it has been extirpated will increase as climate change and human development accelerate habitat loss and population extinctions. Although the number of reintroduction attempts has increased rapidly over the past 2 decades, the success rate is generally low. As a result of population differences in fitness-related traits and divergent responses to environmental stresses, population performance upon reintroduction is highly variable, and it is generally agreed that selecting an appropriate source population is a critical component of a successful reintroduction. Conservation genomics is an emerging field that addresses long-standing challenges in conservation, and the potential for using novel molecular genetic approaches to inform and improve conservation efforts is high. Because the successful establishment and persistence of reintroduced populations is highly dependent on the functional genetic variation and environmental stress tolerance of the source population, we propose the application of conservation genomics and transcriptomics to guide reintroduction practices. Specifically, we propose using genome-wide functional loci to estimate genetic variation of source populations. This estimate can then be used to predict the potential for adaptation. We also propose using transcriptional profiling to measure the expression response of fitness-related genes to environmental stresses as a proxy for acclimation (tolerance) capacity. Appropriate application of conservation genomics and transcriptomics has the potential to dramatically enhance reintroduction success in a time of rapidly declining biodiversity and accelerating environmental change. © 2016 Society for Conservation Biology.

  14. Characterization of noncoding regulatory DNA in the human genome.

    PubMed

    Elkon, Ran; Agami, Reuven

    2017-08-08

    Genetic variants associated with common diseases are usually located in noncoding parts of the human genome. Delineation of the full repertoire of functional noncoding elements, together with efficient methods for probing their biological roles, is therefore of crucial importance. Over the past decade, DNA accessibility and various epigenetic modifications have been associated with regulatory functions. Mapping these features across the genome has enabled researchers to begin to document the full complement of putative regulatory elements. High-throughput reporter assays to probe the functions of regulatory regions have also been developed but these methods separate putative regulatory elements from the chromosome so that any effects of chromatin context and long-range regulatory interactions are lost. Definitive assignment of function(s) to putative cis-regulatory elements requires perturbation of these elements. Genome-editing technologies are now transforming our ability to perturb regulatory elements across entire genomes. Interpretation of high-throughput genetic screens that incorporate genome editors might enable the construction of an unbiased map of functional noncoding elements in the human genome.

  15. Limb-Enhancer Genie: An accessible resource of accurate enhancer predictions in the developing limb

    DOE PAGES

    Monti, Remo; Barozzi, Iros; Osterwalder, Marco; ...

    2017-08-21

    Epigenomic mapping of enhancer-associated chromatin modifications facilitates the genome-wide discovery of tissue-specific enhancers in vivo. However, reliance on single chromatin marks leads to high rates of false-positive predictions. More sophisticated, integrative methods have been described, but commonly suffer from limited accessibility to the resulting predictions and reduced biological interpretability. Here we present the Limb-Enhancer Genie (LEG), a collection of highly accurate, genome-wide predictions of enhancers in the developing limb, available through a user-friendly online interface. We predict limb enhancers using a combination of > 50 published limb-specific datasets and clusters of evolutionarily conserved transcription factor binding sites, taking advantage ofmore » the patterns observed at previously in vivo validated elements. By combining different statistical models, our approach outperforms current state-of-the-art methods and provides interpretable measures of feature importance. Our results indicate that including a previously unappreciated score that quantifies tissue-specific nuclease accessibility significantly improves prediction performance. We demonstrate the utility of our approach through in vivo validation of newly predicted elements. Moreover, we describe general features that can guide the type of datasets to include when predicting tissue-specific enhancers genome-wide, while providing an accessible resource to the general biological community and facilitating the functional interpretation of genetic studies of limb malformations.« less

  16. Conservation genetics and genomics of amphibians and reptiles.

    PubMed

    Shaffer, H Bradley; Gidiş, Müge; McCartney-Melstad, Evan; Neal, Kevin M; Oyamaguchi, Hilton M; Tellez, Marisa; Toffelmier, Erin M

    2015-01-01

    Amphibians and reptiles as a group are often secretive, reach their greatest diversity often in remote tropical regions, and contain some of the most endangered groups of organisms on earth. Particularly in the past decade, genetics and genomics have been instrumental in the conservation biology of these cryptic vertebrates, enabling work ranging from the identification of populations subject to trade and exploitation, to the identification of cryptic lineages harboring critical genetic variation, to the analysis of genes controlling key life history traits. In this review, we highlight some of the most important ways that genetic analyses have brought new insights to the conservation of amphibians and reptiles. Although genomics has only recently emerged as part of this conservation tool kit, several large-scale data sources, including full genomes, expressed sequence tags, and transcriptomes, are providing new opportunities to identify key genes, quantify landscape effects, and manage captive breeding stocks of at-risk species.

  17. Ensembl comparative genomics resources.

    PubMed

    Herrero, Javier; Muffato, Matthieu; Beal, Kathryn; Fitzgerald, Stephen; Gordon, Leo; Pignatelli, Miguel; Vilella, Albert J; Searle, Stephen M J; Amode, Ridwan; Brent, Simon; Spooner, William; Kulesha, Eugene; Yates, Andrew; Flicek, Paul

    2016-01-01

    Evolution provides the unifying framework with which to understand biology. The coherent investigation of genic and genomic data often requires comparative genomics analyses based on whole-genome alignments, sets of homologous genes and other relevant datasets in order to evaluate and answer evolutionary-related questions. However, the complexity and computational requirements of producing such data are substantial: this has led to only a small number of reference resources that are used for most comparative analyses. The Ensembl comparative genomics resources are one such reference set that facilitates comprehensive and reproducible analysis of chordate genome data. Ensembl computes pairwise and multiple whole-genome alignments from which large-scale synteny, per-base conservation scores and constrained elements are obtained. Gene alignments are used to define Ensembl Protein Families, GeneTrees and homologies for both protein-coding and non-coding RNA genes. These resources are updated frequently and have a consistent informatics infrastructure and data presentation across all supported species. Specialized web-based visualizations are also available including synteny displays, collapsible gene tree plots, a gene family locator and different alignment views. The Ensembl comparative genomics infrastructure is extensively reused for the analysis of non-vertebrate species by other projects including Ensembl Genomes and Gramene and much of the information here is relevant to these projects. The consistency of the annotation across species and the focus on vertebrates makes Ensembl an ideal system to perform and support vertebrate comparative genomic analyses. We use robust software and pipelines to produce reference comparative data and make it freely available. Database URL: http://www.ensembl.org. © The Author(s) 2016. Published by Oxford University Press.

  18. Draft genome sequence of ramie, Boehmeria nivea (L.) Gaudich.

    PubMed

    Luan, Ming-Bao; Jian, Jian-Bo; Chen, Ping; Chen, Jun-Hui; Chen, Jian-Hua; Gao, Qiang; Gao, Gang; Zhou, Ju-Hong; Chen, Kun-Mei; Guang, Xuan-Min; Chen, Ji-Kang; Zhang, Qian-Qian; Wang, Xiao-Fei; Fang, Long; Sun, Zhi-Min; Bai, Ming-Zhou; Fang, Xiao-Dong; Zhao, Shan-Cen; Xiong, He-Ping; Yu, Chun-Ming; Zhu, Ai-Guo

    2018-05-01

    Ramie, Boehmeria nivea (L.) Gaudich, family Urticaceae, is a plant native to eastern Asia, and one of the world's oldest fibre crops. It is also used as animal feed and for the phytoremediation of heavy metal-contaminated farmlands. Thus, the genome sequence of ramie was determined to explore the molecular basis of its fibre quality, protein content and phytoremediation. For further understanding ramie genome, different paired-end and mate-pair libraries were combined to generate 134.31 Gb of raw DNA sequences using the Illumina whole-genome shotgun sequencing approach. The highly heterozygous B. nivea genome was assembled using the Platanus Genome Assembler, which is an effective tool for the assembly of highly heterozygous genome sequences. The final length of the draft genome of this species was approximately 341.9 Mb (contig N50 = 22.62 kb, scaffold N50 = 1,126.36 kb). Based on ramie genome annotations, 30,237 protein-coding genes were predicted, and the repetitive element content was 46.3%. The completeness of the final assembly was evaluated by benchmarking universal single-copy orthologous genes (BUSCO); 90.5% of the 1,440 expected embryophytic genes were identified as complete, and 4.9% were identified as fragmented. Phylogenetic analysis based on single-copy gene families and one-to-one orthologous genes placed ramie with mulberry and cannabis, within the clade of urticalean rosids. Genome information of ramie will be a valuable resource for the conservation of endangered Boehmeria species and for future studies on the biogeography and characteristic evolution of members of Urticaceae. © 2018 John Wiley & Sons Ltd.

  19. Ensembl comparative genomics resources

    PubMed Central

    Muffato, Matthieu; Beal, Kathryn; Fitzgerald, Stephen; Gordon, Leo; Pignatelli, Miguel; Vilella, Albert J.; Searle, Stephen M. J.; Amode, Ridwan; Brent, Simon; Spooner, William; Kulesha, Eugene; Yates, Andrew; Flicek, Paul

    2016-01-01

    Evolution provides the unifying framework with which to understand biology. The coherent investigation of genic and genomic data often requires comparative genomics analyses based on whole-genome alignments, sets of homologous genes and other relevant datasets in order to evaluate and answer evolutionary-related questions. However, the complexity and computational requirements of producing such data are substantial: this has led to only a small number of reference resources that are used for most comparative analyses. The Ensembl comparative genomics resources are one such reference set that facilitates comprehensive and reproducible analysis of chordate genome data. Ensembl computes pairwise and multiple whole-genome alignments from which large-scale synteny, per-base conservation scores and constrained elements are obtained. Gene alignments are used to define Ensembl Protein Families, GeneTrees and homologies for both protein-coding and non-coding RNA genes. These resources are updated frequently and have a consistent informatics infrastructure and data presentation across all supported species. Specialized web-based visualizations are also available including synteny displays, collapsible gene tree plots, a gene family locator and different alignment views. The Ensembl comparative genomics infrastructure is extensively reused for the analysis of non-vertebrate species by other projects including Ensembl Genomes and Gramene and much of the information here is relevant to these projects. The consistency of the annotation across species and the focus on vertebrates makes Ensembl an ideal system to perform and support vertebrate comparative genomic analyses. We use robust software and pipelines to produce reference comparative data and make it freely available. Database URL: http://www.ensembl.org. PMID:26896847

  20. Genome-Wide Identification and Transferability of Microsatellite Markers between Palmae Species

    PubMed Central

    Xiao, Yong; Xia, Wei; Ma, Jianwei; Mason, Annaliese S.; Fan, Haikuo; Shi, Peng; Lei, Xintao; Ma, Zilong; Peng, Ming

    2016-01-01

    The Palmae family contains 202 genera and approximately 2800 species. Except for Elaeis guineensis and Phoenix dactylifera, almost no genetic and genomic information is available for Palmae species. Therefore, this is an obstacle to the conservation and genetic assessment of Palmae species, especially those that are currently endangered. The study was performed to develop a large number of microsatellite markers which can be used for genetic analysis in different Palmae species. Based on the assembled genome of E. guineensis and P. dactylifera, a total of 814 383 and 371 629 microsatellites were identified. Among these microsatellites identified in E. guineensis, 734 509 primer pairs could be designed from the flanking sequences of these microsatellites. The majority (618 762) of these designed primer pairs had in silico products in the genome of E. guineensis. These 618 762 primer pairs were subsequently used to in silico amplify the genome of P. dactylifera. A total of 7 265 conserved microsatellites were identified between E. guineensis and P. dactylifera. One hundred and thirty-five primer pairs flanking the conserved SSRs were stochastically selected and validated to have high cross-genera transferability, varying from 16.7 to 93.3% with an average of 73.7%. These genome-wide conserved microsatellite markers will provide a useful tool for genetic assessment and conservation of different Palmae species in the future. PMID:27826307

  1. Phylogenetic and Genomic Analyses Resolve the Origin of Important Plant Genes Derived from Transposable Elements.

    PubMed

    Joly-Lopez, Zoé; Hoen, Douglas R; Blanchette, Mathieu; Bureau, Thomas E

    2016-08-01

    Once perceived as merely selfish, transposable elements (TEs) are now recognized as potent agents of adaptation. One way TEs contribute to evolution is through TE exaptation, a process whereby TEs, which persist by replicating in the genome, transform into novel host genes, which persist by conferring phenotypic benefits. Known exapted TEs (ETEs) contribute diverse and vital functions, and may facilitate punctuated equilibrium, yet little is known about this process. To better understand TE exaptation, we designed an approach to resolve the phylogenetic context and timing of exaptation events and subsequent patterns of ETE diversification. Starting with known ETEs, we search in diverse genomes for basal ETEs and closely related TEs, carefully curate the numerous candidate sequences, and infer detailed phylogenies. To distinguish TEs from ETEs, we also weigh several key genomic characteristics including repetitiveness, terminal repeats, pseudogenic features, and conserved domains. Applying this approach to the well-characterized plant ETEs MUG and FHY3, we show that each group is paraphyletic and we argue that this pattern demonstrates that each originated in not one but multiple exaptation events. These exaptations and subsequent ETE diversification occurred throughout angiosperm evolution including the crown group expansion, the angiosperm radiation, and the primitive evolution of angiosperms. In addition, we detect evidence of several putative novel ETE families. Our findings support the hypothesis that TE exaptation generates novel genes more frequently than is currently thought, often coinciding with key periods of evolution. © The Author 2016. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.

  2. Insights into bilaterian evolution from three spiralian genomes

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Simakov, Oleg; Marletaz, Ferdinand; Cho, Sung-Jin

    2012-01-07

    Current genomic perspectives on animal diversity neglect two prominent phyla, the molluscs and annelids, that together account for nearly one-third of known marine species and are important both ecologically and as experimental systems in classical embryology1, 2, 3. Here we describe the draft genomes of the owl limpet (Lottia gigantea), a marine polychaete (Capitella teleta) and a freshwater leech (Helobdella robusta), and compare them with other animal genomes to investigate the origin and diversification of bilaterians from a genomic perspective. We find that the genome organization, gene structure and functional content of these species are more similar to those ofmore » some invertebrate deuterostome genomes (for example, amphioxus and sea urchin) than those of other protostomes that have been sequenced to date (flies, nematodes and flatworms). The conservation of these genomic features enables us to expand the inventory of genes present in the last common bilaterian ancestor, establish the tripartite diversification of bilaterians using multiple genomic characteristics and identify ancient conserved long- and short-range genetic linkages across metazoans. Superimposed on this broadly conserved pan-bilaterian background we find examples of lineage-specific genome evolution, including varying rates of rearrangement, intron gain and loss, expansions and contractions of gene families, and the evolution of clade-specific genes that produce the unique content of each genome.« less

  3. Genome-wide identification and expression analysis of YTH domain-containing RNA-binding protein family in cucumber (Cucumis sativus).

    PubMed

    Zhou, Yong; Hu, Lifang; Jiang, Lunwei; Liu, Shiqiang

    2018-06-01

    YTH domain-containing RNA-binding proteins are involved in post-transcriptional regulation and play important roles in the growth and development as well as abiotic stress responses of plants. However, YTH genes have not been previously studied in cucumber (Cucumis sativus). In this study, a total of five YTH genes (CsYTH1-CsYTH5) were identified in cucumber, which could be mapped on three out of the seven cucumber chromosomes. All CsYTH proteins had highly conserved C-terminal YTH domains, and two of them (CsYTH1 and CsYTH4) harbored extra CCCH and P/Q/N-rich domains. The phylogenesis, conserved motifs and exon-intron structure of YTH genes from cucumber, Arabidopsis and rice were also analyzed. The phylogenetically closely clustered YTHs shared similar gene structures and conserved motifs. An analysis of the cis-acting regulatory elements in the upstream region of these genes resulted in the identification of many cis-elements related to stress, hormone and development. Expression analysis based on the transcriptome data showed that some CsYTHs had development- or tissue-specific expression. In addition, their expression levels were altered under various stresses such as salt, drought, cold, and abscisic acid (ABA) treatments. These findings lay the foundation for the functional analysis of CsYTHs in the future.

  4. Patterns and architecture of genomic islands in marine bacteria

    PubMed Central

    2012-01-01

    Background Genomic Islands (GIs) have key roles since they modulate the structure and size of bacterial genomes displaying a diverse set of laterally transferred genes. Despite their importance, GIs in marine bacterial genomes have not been explored systematically to uncover possible trends and to analyze their putative ecological significance. Results We carried out a comprehensive analysis of GIs in 70 selected marine bacterial genomes detected with IslandViewer to explore the distribution, patterns and functional gene content in these genomic regions. We detected 438 GIs containing a total of 8152 genes. GI number per genome was strongly and positively correlated with the total GI size. In 50% of the genomes analyzed the GIs accounted for approximately 3% of the genome length, with a maximum of 12%. Interestingly, we found transposases particularly enriched within Alphaproteobacteria GIs, and site-specific recombinases in Gammaproteobacteria GIs. We described specific Homologous Recombination GIs (HR-GIs) in several genera of marine Bacteroidetes and in Shewanella strains among others. In these HR-GIs, we recurrently found conserved genes such as the β-subunit of DNA-directed RNA polymerase, regulatory sigma factors, the elongation factor Tu and ribosomal protein genes typically associated with the core genome. Conclusions Our results indicate that horizontal gene transfer mediated by phages, plasmids and other mobile genetic elements, and HR by site-specific recombinases play important roles in the mobility of clusters of genes between taxa and within closely related genomes, modulating the flexible pool of the genome. Our findings suggest that GIs may increase bacterial fitness under environmental changing conditions by acquiring novel foreign genes and/or modifying gene transcription and/or transduction. PMID:22839777

  5. Drivers of bacterial genomes plasticity and roles they play in pathogen virulence, persistence and drug resistance.

    PubMed

    Patel, Seema

    2016-11-01

    Despite the advent of next-generation sequencing (NGS) technologies, sophisticated data analysis and drug development efforts, bacterial drug resistance persists and is escalating in magnitude. To better control the pathogens, a thorough understanding of their genomic architecture and dynamics is vital. Bacterial genome is extremely complex, a mosaic of numerous co-operating and antagonizing components, altruistic and self-interested entities, behavior of which are predictable and conserved to some extent, yet largely dictated by an array of variables. In this regard, mobile genetic elements (MGE), DNA repair systems, post-segregation killing systems, toxin-antitoxin (TA) systems, restriction-modification (RM) systems etc. are dominant agents and horizontal gene transfer (HGT), gene redundancy, epigenetics, phase and antigenic variation etc. processes shape the genome. By illegitimate recombinations, deletions, insertions, duplications, amplifications, inversions, conversions, translocations, modification of intergenic regions and other alterations, bacterial genome is modified to tackle stressors like drugs, and host immune effectors. Over the years, thousands of studies have investigated this aspect and mammoth amount of insights have been accumulated. This review strives to distillate the existing information, formulate hypotheses and to suggest directions, that might contribute towards improved mitigation of the vicious pathogens. Copyright © 2016 Elsevier B.V. All rights reserved.

  6. Genomic Insights into the Biomineralization and Environmental Function of Magnetotactic Bacteria

    NASA Astrophysics Data System (ADS)

    Lin, W.; Pan, Y.

    2015-12-01

    Microorganisms have populated the Earth for billions of years and their activities are important biologic forces shaping our planetary environments. Microbial biomineralization that selectively take up environmental elements (e.g., C, S, P, Fe) and synthesize minerals either intracellularly or extracellularly is of great interest. One of the most interesting examples of these types of organisms are magnetotactic bacteria (MTB), a polyphyletic group of prokaryotes that uptake iron from aquatic habitats and biomineralize intracellular nano-sized iron minerals of magnetite (Fe3O4) and/or greigite (Fe3S4), known as magnetosomes, and orientate and swim along the Earth's magnetic field. However, our knowledge on the biomineralization mechanisms of MTB and their environmental function remains very limited because the genomic information of most MTB is still not fully understood. By using metagenomic approaches, we have acquired genomic sequences of environmental MTB communities and discovered several conserved genomic fragments containing gene operons for magnetite or greigite biomineralization from Proteobacteria and Nitrospirae MTB. The comparison of these gene clusters has provided valuable insights into the origin and evolution of magnetosome biomineralization. We further obtained several draft genomes of uncultivated MTB belonging to the phylum Nitrospirae, which reveals a metabolic flexibility of this poorly understood magnetotactic group and indicates their considerable roles in the biogeochemical cycles of iron and sulfur.

  7. Comparative Genomics Reveals Accelerated Evolution in Conserved Pathways during the Diversification of Anole Lizards

    PubMed Central

    Tollis, Marc; Hutchins, Elizabeth D; Stapley, Jessica; Rupp, Shawn M; Eckalbar, Walter L; Maayan, Inbar; Lasku, Eris; Infante, Carlos R; Dennis, Stuart R; Robertson, Joel A; May, Catherine M; Bermingham, Eldredge; DeNardo, Dale F; Hsieh, Shi-Tong Tonia; Kulathinal, Rob J; McMillan, William Owen; Menke, Douglas B; Pratt, Stephen C; Rawls, Jeffery Alan; Sanjur, Oris; Wilson-Rawls, Jeanne; Wilson Sayres, Melissa A; Fisher, Rebecca E

    2018-01-01

    Abstract Squamates include all lizards and snakes, and display some of the most diverse and extreme morphological adaptations among vertebrates. However, compared with birds and mammals, relatively few resources exist for comparative genomic analyses of squamates, hampering efforts to understand the molecular bases of phenotypic diversification in such a speciose clade. In particular, the ∼400 species of anole lizard represent an extensive squamate radiation. Here, we sequence and assemble the draft genomes of three anole species—Anolis frenatus, Anolis auratus, and Anolis apletophallus—for comparison with the available reference genome of Anolis carolinensis. Comparative analyses reveal a rapid background rate of molecular evolution consistent with a model of punctuated equilibrium, and strong purifying selection on functional genomic elements in anoles. We find evidence for accelerated evolution in genes involved in behavior, sensory perception, and reproduction, as well as in genes regulating limb bud development and hindlimb specification. Morphometric analyses of anole fore and hindlimbs corroborated these findings. We detect signatures of positive selection across several genes related to the development and regulation of the forebrain, hormones, and the iguanian lizard dewlap, suggesting molecular changes underlying behavioral adaptations known to reinforce species boundaries were a key component in the diversification of anole lizards. PMID:29360978

  8. Genome-Wide Stochastic Adaptive DNA Amplification at Direct and Inverted DNA Repeats in the Parasite Leishmania

    PubMed Central

    Plourde, Marie; Gingras, Hélène; Roy, Gaétan; Lapointe, Andréanne; Leprohon, Philippe; Papadopoulou, Barbara; Corbeil, Jacques; Ouellette, Marc

    2014-01-01

    Gene amplification of specific loci has been described in all kingdoms of life. In the protozoan parasite Leishmania, the product of amplification is usually part of extrachromosomal circular or linear amplicons that are formed at the level of direct or inverted repeated sequences. A bioinformatics screen revealed that repeated sequences are widely distributed in the Leishmania genome and the repeats are chromosome-specific, conserved among species, and generally present in low copy number. Using sensitive PCR assays, we provide evidence that the Leishmania genome is continuously being rearranged at the level of these repeated sequences, which serve as a functional platform for constitutive and stochastic amplification (and deletion) of genomic segments in the population. This process is adaptive as the copy number of advantageous extrachromosomal circular or linear elements increases upon selective pressure and is reversible when selection is removed. We also provide mechanistic insights on the formation of circular and linear amplicons through RAD51 recombinase-dependent and -independent mechanisms, respectively. The whole genome of Leishmania is thus stochastically rearranged at the level of repeated sequences, and the selection of parasite subpopulations with changes in the copy number of specific loci is used as a strategy to respond to a changing environment. PMID:24844805

  9. Draft whole genome sequence of groundnut stem rot fungus Athelia rolfsii revealing genetic architect of its pathogenicity and virulence.

    PubMed

    Iquebal, M A; Tomar, Rukam S; Parakhia, M V; Singla, Deepak; Jaiswal, Sarika; Rathod, V M; Padhiyar, S M; Kumar, Neeraj; Rai, Anil; Kumar, Dinesh

    2017-07-13

    Groundnut (Arachis hypogaea L.) is an important oil seed crop having major biotic constraint in production due to stem rot disease caused by fungus, Athelia rolfsii causing 25-80% loss in productivity. As chemical and biological combating strategies of this fungus are not very effective, thus genome sequencing can reveal virulence and pathogenicity related genes for better understanding of the host-parasite interaction. We report draft assembly of Athelia rolfsii genome of ~73 Mb having 8919 contigs. Annotation analysis revealed 16830 genes which are involved in fungicide resistance, virulence and pathogenicity along with putative effector and lethal genes. Secretome analysis revealed CAZY genes representing 1085 enzymatic genes, glycoside hydrolases, carbohydrate esterases, carbohydrate-binding modules, auxillary activities, glycosyl transferases and polysaccharide lyases. Repeat analysis revealed 11171 SSRs, LTR, GYPSY and COPIA elements. Comparative analysis with other existing ascomycotina genome predicted conserved domain family of WD40, CYP450, Pkinase and ABC transporter revealing insight of evolution of pathogenicity and virulence. This study would help in understanding pathogenicity and virulence at molecular level and development of new combating strategies. Such approach is imperative in endeavour of genome based solution in stem rot disease management leading to better productivity of groundnut crop in tropical region of world.

  10. Pan-genome analysis of human gastric pathogen H. pylori: comparative genomics and pathogenomics approaches to identify regions associated with pathogenicity and prediction of potential core therapeutic targets.

    PubMed

    Ali, Amjad; Naz, Anam; Soares, Siomar C; Bakhtiar, Marriam; Tiwari, Sandeep; Hassan, Syed S; Hanan, Fazal; Ramos, Rommel; Pereira, Ulisses; Barh, Debmalya; Figueiredo, Henrique César Pereira; Ussery, David W; Miyoshi, Anderson; Silva, Artur; Azevedo, Vasco

    2015-01-01

    Helicobacter pylori is a human gastric pathogen implicated as the major cause of peptic ulcer and second leading cause of gastric cancer (~70%) around the world. Conversely, an increased resistance to antibiotics and hindrances in the development of vaccines against H. pylori are observed. Pan-genome analyses of the global representative H. pylori isolates consisting of 39 complete genomes are presented in this paper. Phylogenetic analyses have revealed close relationships among geographically diverse strains of H. pylori. The conservation among these genomes was further analyzed by pan-genome approach; the predicted conserved gene families (1,193) constitute ~77% of the average H. pylori genome and 45% of the global gene repertoire of the species. Reverse vaccinology strategies have been adopted to identify and narrow down the potential core-immunogenic candidates. Total of 28 nonhost homolog proteins were characterized as universal therapeutic targets against H. pylori based on their functional annotation and protein-protein interaction. Finally, pathogenomics and genome plasticity analysis revealed 3 highly conserved and 2 highly variable putative pathogenicity islands in all of the H. pylori genomes been analyzed.

  11. A Physical Map, Including a BAC/PAC Clone Contig, of the Williams-Beuren Syndrome–Deletion Region at 7q11.23

    PubMed Central

    Peoples, Risa; Franke, Yvonne; Wang, Yu-Ker; Pérez-Jurado, Luis; Paperna, Tamar; Cisco, Michael; Francke, Uta

    2000-01-01

    Summary Williams-Beuren syndrome (WBS) is a developmental disorder caused by haploinsufficiency for genes in a 2-cM region of chromosome band 7q11.23. With the exception of vascular stenoses due to deletion of the elastin gene, the various features of WBS have not yet been attributed to specific genes. Although ⩾16 genes have been identified within the WBS deletion, completion of a physical map of the region has been difficult because of the large duplicated regions flanking the deletion. We present a physical map of the WBS deletion and flanking regions, based on assembly of a bacterial artificial chromosome/P1-derived artificial chromosome contig, analysis of high-throughput genome-sequence data, and long-range restriction mapping of genomic and cloned DNA by pulsed-field gel electrophoresis. Our map encompasses 3 Mb, including 1.6 Mb within the deletion. Two large duplicons, flanking the deletion, of ⩾320 kb contain unique sequence elements from the internal border regions of the deletion, such as sequences from GTF2I (telomeric) and FKBP6 (centromeric). A third copy of this duplicon exists in inverted orientation distal to the telomeric flanking one. These duplicons show stronger sequence conservation with regard to each other than to the presumptive ancestral loci within the common deletion region. Sequence elements originating from beyond 7q11.23 are also present in these duplicons. Although the duplicons are not present in mice, the order of the single-copy genes in the conserved syntenic region of mouse chromosome 5 is inverted relative to the human map. A model is presented for a mechanism of WBS-deletion formation, based on the orientation of duplicons' components relative to each other and to the ancestral elements within the deletion region. PMID:10631136

  12. Annotation of the Clostridium Acetobutylicum Genome

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Daly, M. J.

    The genome sequence of the solvent producing bacterium Clostridium acetobutylicum ATCC824, has been determined by the shotgun approach. The genome consists of a 3.94 Mb chromosome and a 192 kb megaplasmid that contains the majority of genes responsible for solvent production. Comparison of C. acetobutylicum to Bacillus subtilis reveals significant local conservation of gene order, which has not been seen in comparisons of other genomes with similar, or, in some cases, closer, phylogenetic proximity. This conservation allows the prediction of many previously undetected operons in both bacteria.

  13. Effector profiles distinguish formae speciales of Fusarium oxysporum.

    PubMed

    van Dam, Peter; Fokkens, Like; Schmidt, Sarah M; Linmans, Jasper H J; Kistler, H Corby; Ma, Li-Jun; Rep, Martijn

    2016-11-01

    Formae speciales (ff.spp.) of the fungus Fusarium oxysporum are often polyphyletic within the species complex, making it impossible to identify them on the basis of conserved genes. However, sequences that determine host-specific pathogenicity may be expected to be similar between strains within the same forma specialis. Whole genome sequencing was performed on strains from five different ff.spp. (cucumerinum, niveum, melonis, radicis-cucumerinum and lycopersici). In each genome, genes for putative effectors were identified based on small size, secretion signal, and vicinity to a "miniature impala" transposable element. The candidate effector genes of all genomes were collected and the presence/absence patterns in each individual genome were clustered. Members of the same forma specialis turned out to group together, with cucurbit-infecting strains forming a supercluster separate from other ff.spp. Moreover, strains from different clonal lineages within the same forma specialis harbour identical effector gene sequences, supporting horizontal transfer of genetic material. These data offer new insight into the genetic basis of host specificity in the F. oxysporum species complex and show that (putative) effectors can be used to predict host specificity in F. oxysporum. © 2016 Society for Applied Microbiology and John Wiley & Sons Ltd.

  14. rep-PCR-Mediated Genomic Fingerprinting: A Rapid and Effective Method to Identify Clavibacter michiganensis.

    PubMed

    Louws, F J; Bell, J; Medina-Mora, C M; Smart, C D; Opgenorth, D; Ishimaru, C A; Hausbeck, M K; de Bruijn, F J; Fulbright, D W

    1998-08-01

    ABSTRACT The genomic DNA fingerprinting technique known as repetitive-sequence-based polymerase chain reaction (rep-PCR) was evaluated as a tool to differentiate subspecies of Clavibacter michiganensis, with special emphasis on C. michiganensis subsp. michiganensis, the pathogen responsible for bacterial canker of tomato. DNA primers (REP, ERIC, and BOX), corresponding to conserved repetitive element motifs in the genomes of diverse bacterial species, were used to generate genomic fingerprints of C. michiganensis subsp. michiganensis, C. michiganensis subsp. sepedonicus, C. michiganensis subsp. nebraskensis, C. michiganensis subsp. tessellarius, and C. michiganensis subsp. insidiosum. The rep-PCR-generated patterns of DNA fragments observed after agarose gel electrophoresis support the current division of C. michiganensis into five subspecies. In addition, the rep-PCR fingerprints identified at least four types (A, B, C, and D) within C. michiganensis subsp. michiganensis based on limited DNA polymorphisms; the ability to differentiate individual strains may be of potential use in studies on the epidemiology and host-pathogen interactions of this organism. In addition, we have recovered from diseased tomato plants a relatively large number of naturally occurring avirulent C. michiganensis subsp. michiganensis strains with rep-PCR fingerprints identical to those of virulent C. michiganensis subsp. michiganensis strains.

  15. Global Identification and Characterization of Transcriptionally Active Regions in the Rice Genome

    PubMed Central

    Stolc, Viktor; Deng, Wei; He, Hang; Korbel, Jan; Chen, Xuewei; Tongprasit, Waraporn; Ronald, Pamela; Chen, Runsheng; Gerstein, Mark; Wang Deng, Xing

    2007-01-01

    Genome tiling microarray studies have consistently documented rich transcriptional activity beyond the annotated genes. However, systematic characterization and transcriptional profiling of the putative novel transcripts on the genome scale are still lacking. We report here the identification of 25,352 and 27,744 transcriptionally active regions (TARs) not encoded by annotated exons in the rice (Oryza. sativa) subspecies japonica and indica, respectively. The non-exonic TARs account for approximately two thirds of the total TARs detected by tiling arrays and represent transcripts likely conserved between japonica and indica. Transcription of 21,018 (83%) japonica non-exonic TARs was verified through expression profiling in 10 tissue types using a re-array in which annotated genes and TARs were each represented by five independent probes. Subsequent analyses indicate that about 80% of the japonica TARs that were not assigned to annotated exons can be assigned to various putatively functional or structural elements of the rice genome, including splice variants, uncharacterized portions of incompletely annotated genes, antisense transcripts, duplicated gene fragments, and potential non-coding RNAs. These results provide a systematic characterization of non-exonic transcripts in rice and thus expand the current view of the complexity and dynamics of the rice transcriptome. PMID:17372628

  16. PHYLUCE is a software package for the analysis of conserved genomic loci.

    PubMed

    Faircloth, Brant C

    2016-03-01

    Targeted enrichment of conserved and ultraconserved genomic elements allows universal collection of phylogenomic data from hundreds of species at multiple time scales (<5 Ma to > 300 Ma). Prior to downstream inference, data from these types of targeted enrichment studies must undergo preprocessing to assemble contigs from sequence data; identify targeted, enriched loci from the off-target background data; align enriched contigs representing conserved loci to one another; and prepare and manipulate these alignments for subsequent phylogenomic inference. PHYLUCE is an efficient and easy-to-install software package that accomplishes these tasks across hundreds of taxa and thousands of enriched loci. PHYLUCE is written for Python 2.7. PHYLUCE is supported on OSX and Linux (RedHat/CentOS) operating systems. PHYLUCE source code is distributed under a BSD-style license from https://www.github.com/faircloth-lab/phyluce/ PHYLUCE is also available as a package (https://binstar.org/faircloth-lab/phyluce) for the Anaconda Python distribution that installs all dependencies, and users can request a PHYLUCE instance on iPlant Atmosphere (tag: phyluce). The software manual and a tutorial are available from http://phyluce.readthedocs.org/en/latest/ and test data are available from doi: 10.6084/m9.figshare.1284521. brant@faircloth-lab.org Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  17. The complete mitochondrial genomes of the Fenton′s wood white, Leptidea morsei, and the lemon emigrant, Catopsilia pomona

    PubMed Central

    Hao, Juan-Juan; Hao, Jia-Sheng; Sun, Xiao-Yan; Zhang, Lan-Lan; Yang, Qun

    2014-01-01

    Abstract The complete mitochondrial genomes of Leptidea morsei Fenton (Lepidoptera: Pieridae: Dis-morphiinae) and Catopsilia pomona (F.) (Lepidoptera: Pieridae: Coliadinae) were determined to be 15,122 and 15,142 bp in length, respectively, with that of L . morsei being the smallest among all known butterflies. Both mitogenomes contained 37 genes and an A+T-rich region, with the gene order identical to those of other butterflies, except for the presence of a tRNA-like insertion, tRNA Leu (UUR), in C . pomona . The nucleotide compositions of both genomes were higher in A and T (80.2% for L . morsei and 81.3% for C . pomona ) than C and G; the A+T bias had a significant effect on the codon usage and the amino acid composition. The protein-coding genes utilized the standard mitochondrial start codon ATN, except the COI gene using CGA as the initiation codon, as reported in other butterflies. The intergenic spacer sequence between the tRNA Ser (UCN) and ND1 genes contained the ATACTAA motif. The A+T-rich region harbored a poly-T stretch and a conserved ATAGA motif located at the end of the region. In addition, there was a triplicated 23 bp repeat and a microsatellite-like (TA) 9 (AT) 3 element in the A+T-rich region of the L. morsei mitogenome , while in C . pomona, there was a duplicated 24 bp repeat element and a microsatellite-like (TA) 9 element. The phylogenetic trees of the main butterfly lineages (Hesperiidae, Papilionidae, Pieridae, Nymphalidae, Lycaenidae, and Riodinidae) were reconstructed with maximum likelihood and Bayesian inference methods based on the 13 concatenated nucleotide sequences of protein-coding genes, and both trees showed that the Pieridae family is sister to Lycaenidae. Although this result contradicts the traditional morphologically based views, it agrees with other recent studies based on mitochondrial genomic data. PMID:25368074

  18. The abundant extrachromosomal DNA content of the Spiroplasma citri GII3-3X genome

    PubMed Central

    Saillard, Colette; Carle, Patricia; Duret-Nurbel, Sybille; Henri, Raphaël; Killiny, Nabil; Carrère, Sébastien; Gouzy, Jérome; Bové, Joseph-Marie; Renaudin, Joël; Foissac, Xavier

    2008-01-01

    Background Spiroplama citri, the causal agent of citrus stubborn disease, is a bacterium of the class Mollicutes and is transmitted by phloem-feeding leafhopper vectors. In order to characterize candidate genes potentially involved in spiroplasma transmission and pathogenicity, the genome of S. citri strain GII3-3X is currently being deciphered. Results Assembling 20,000 sequencing reads generated seven circular contigs, none of which fit the 1.8 Mb chromosome map or carried chromosomal markers. These contigs correspond to seven plasmids: pSci1 to pSci6, with sizes ranging from 12.9 to 35.3 kbp and pSciA of 7.8 kbp. Plasmids pSci were detected as multiple copies in strain GII3-3X. Plasmid copy numbers of pSci1-6, as deduced from sequencing coverage, were estimated at 10 to 14 copies per spiroplasma cell, representing 1.6 Mb of extrachromosomal DNA. Genes encoding proteins of the TrsE-TraE, Mob, TraD-TraG, and Soj-ParA protein families were predicted in most of the pSci sequences, in addition to members of 14 protein families of unknown function. Plasmid pSci6 encodes protein P32, a marker of insect transmissibility. Plasmids pSci1-5 code for eight different S. citri adhesion-related proteins (ScARPs) that are homologous to the previously described protein P89 and the S. kunkelii SkARP1. Conserved signal peptides and C-terminal transmembrane alpha helices were predicted in all ScARPs. The predicted surface-exposed N-terminal region possesses the following elements: (i) 6 to 8 repeats of 39 to 42 amino acids each (sarpin repeats), (ii) a central conserved region of 330 amino acids followed by (iii) a more variable domain of about 110 amino acids. The C-terminus, predicted to be cytoplasmic, consists of a 27 amino acid stretch enriched in arginine and lysine (KR) and an optional 23 amino acid stretch enriched in lysine, aspartate and glutamate (KDE). Plasmids pSci mainly present a linear increase of cumulative GC skew except in regions presenting conserved hairpin structures. Conclusion The genome of S. citri GII3-3X is characterized by abundant extrachromosomal elements. The pSci plasmids could not only be vertically inherited but also horizontally transmitted, as they encode proteins usually involved in DNA element partitioning and cell to cell DNA transfer. Because plasmids pSci1-5 encode surface proteins of the ScARP family and pSci6 was recently shown to confer insect transmissibility, diversity and abundance of S. citri plasmids may essentially aid the rapid adaptation of S. citri to more efficient transmission by different insect vectors and to various plant hosts. PMID:18442384

  19. Treasure Your Exceptions: An Interview with 2017 George Beadle Award Recipient Susan A. Gerbi.

    PubMed

    Gerbi, Susan A

    2017-12-01

    THE Genetics Society of America's (GSA) George W. Beadle Award honors individuals who have made outstanding contributions to the community of genetics researchers and who exemplify the qualities of its namesake. The 2017 recipient is Susan A. Gerbi, who has been a prominent leader and advocate for the scientific community. In the course of her research on DNA replication, Gerbi helped develop the method of Replication Initiation Point (RIP) mapping to map replication origins at the nucleotide level, improving resolution by two orders of magnitude. RIP mapping also provides the basis for the now popular use of λ-exonuclease to enrich nascent DNA to map replication origins genome-wide. Gerbi's second area of research on ribosomal RNA revealed a conserved core secondary structure, as well as conserved nucleotide elements (CNEs). Some CNEs are universally conserved, while other CNEs are conserved in all eukaryotes but not in archaea or bacteria, suggesting a eukaryotic function. Intriguingly, the majority of the eukaryotic-specific CNEs line the tunnel of the large ribosomal subunit through which the nascent polypeptide exits. Gerbi has promoted the fly Sciara coprophila as a model organism ever since she used its enormous polytene chromosomes to help develop the method of in situ hybridization during her Ph.D. research in Joe Gall's laboratory. The Gerbi laboratory maintains the Sciara International Stock Center and manages its future, actively spreading Sciara stocks to other laboratories. Gerbi has also served in many leadership roles, working on issues of science policy, women in science, scientific training, and career preparation. This is an abridged version of the interview. The full interview is available on the Genes to Genomes blog, at genestogenomes.org/gerbi. Copyright © 2017 by the Genetics Society of America.

  20. The Variable Regions of Lactobacillus rhamnosus Genomes Reveal the Dynamic Evolution of Metabolic and Host-Adaptation Repertoires

    PubMed Central

    Ceapa, Corina; Davids, Mark; Ritari, Jarmo; Lambert, Jolanda; Wels, Michiel; Douillard, François P.; Smokvina, Tamara; de Vos, Willem M.; Knol, Jan; Kleerebezem, Michiel

    2016-01-01

    Lactobacillus rhamnosus is a diverse Gram-positive species with strains isolated from different ecological niches. Here, we report the genome sequence analysis of 40 diverse strains of L. rhamnosus and their genomic comparison, with a focus on the variable genome. Genomic comparison of 40 L. rhamnosus strains discriminated the conserved genes (core genome) and regions of plasticity involving frequent rearrangements and horizontal transfer (variome). The L. rhamnosus core genome encompasses 2,164 genes, out of 4,711 genes in total (the pan-genome). The accessory genome is dominated by genes encoding carbohydrate transport and metabolism, extracellular polysaccharides (EPS) biosynthesis, bacteriocin production, pili production, the cas system, and the associated clustered regularly interspaced short palindromic repeat (CRISPR) loci, and more than 100 transporter functions and mobile genetic elements like phages, plasmid genes, and transposons. A clade distribution based on amino acid differences between core (shared) proteins matched with the clade distribution obtained from the presence–absence of variable genes. The phylogenetic and variome tree overlap indicated that frequent events of gene acquisition and loss dominated the evolutionary segregation of the strains within this species, which is paralleled by evolutionary diversification of core gene functions. The CRISPR-Cas system could have contributed to this evolutionary segregation. Lactobacillus rhamnosus strains contain the genetic and metabolic machinery with strain-specific gene functions required to adapt to a large range of environments. A remarkable congruency of the evolutionary relatedness of the strains’ core and variome functions, possibly favoring interspecies genetic exchanges, underlines the importance of gene-acquisition and loss within the L. rhamnosus strain diversification. PMID:27358423

  1. Characterization of Three Novel SINE Families with Unusual Features in Helicoverpa armigera

    PubMed Central

    Wang, Jianjun; Wang, Aina; Han, Zhaojun; Zhang, Zan; Li, Fei; Li, Xianchun

    2012-01-01

    Although more than 120 families of short interspersed nuclear elements (SINEs) have been isolated from the eukaryotic genomes, little is known about SINEs in insects. Here, we characterize three novel SINEs from the cotton bollworm, Helicoverpa armigera. Two of them, HaSE1 and HaSE2, share similar 5′ -structure including a tRNA-related region immediately followed by conserved central domain. The 3′ -tail of HaSE1 is significantly similar to that of one LINE retrotransposon element, HaRTE1.1, in H. armigera genome. The 3′ -region of HaSE2 showed high identity with one mariner-like element in H. armigera. The third family, termed HaSE3, is a 5S rRNA-derived SINE and shares both body part and 3′-tail with HaSE1, thus may represent the first example of a chimera generated by recombination between 5S rRNA and tRNA-derived SINE in insect species. Further database searches revealed the presence of these SINEs in several other related insect species, but not in the silkworm, Bombyx mori, indicating a relatively narrow distribution of these SINEs in Lepidopterans. Apart from above, we found a copy of HaSE2 in the GenBank EST entry for the cotton aphid, Aphis gossypii, suggesting the occurrence of horizontal transfer. PMID:22319625

  2. Novel promoters and coding first exons in DLG2 linked to developmental disorders and intellectual disability.

    PubMed

    Reggiani, Claudio; Coppens, Sandra; Sekhara, Tayeb; Dimov, Ivan; Pichon, Bruno; Lufin, Nicolas; Addor, Marie-Claude; Belligni, Elga Fabia; Digilio, Maria Cristina; Faletra, Flavio; Ferrero, Giovanni Battista; Gerard, Marion; Isidor, Bertrand; Joss, Shelagh; Niel-Bütschi, Florence; Perrone, Maria Dolores; Petit, Florence; Renieri, Alessandra; Romana, Serge; Topa, Alexandra; Vermeesch, Joris Robert; Lenaerts, Tom; Casimir, Georges; Abramowicz, Marc; Bontempi, Gianluca; Vilain, Catheline; Deconinck, Nicolas; Smits, Guillaume

    2017-07-19

    Tissue-specific integrative omics has the potential to reveal new genic elements important for developmental disorders. Two pediatric patients with global developmental delay and intellectual disability phenotype underwent array-CGH genetic testing, both showing a partial deletion of the DLG2 gene. From independent human and murine omics datasets, we combined copy number variations, histone modifications, developmental tissue-specific regulation, and protein data to explore the molecular mechanism at play. Integrating genomics, transcriptomics, and epigenomics data, we describe two novel DLG2 promoters and coding first exons expressed in human fetal brain. Their murine conservation and protein-level evidence allowed us to produce new DLG2 gene models for human and mouse. These new genic elements are deleted in 90% of 29 patients (public and in-house) showing partial deletion of the DLG2 gene. The patients' clinical characteristics expand the neurodevelopmental phenotypic spectrum linked to DLG2 gene disruption to cognitive and behavioral categories. While protein-coding genes are regarded as well known, our work shows that integration of multiple omics datasets can unveil novel coding elements. From a clinical perspective, our work demonstrates that two new DLG2 promoters and exons are crucial for the neurodevelopmental phenotypes associated with this gene. In addition, our work brings evidence for the lack of cross-annotation in human versus mouse reference genomes and nucleotide versus protein databases.

  3. Highly tissue specific expression of Sphinx supports its male courtship related role in Drosophila melanogaster.

    PubMed

    Chen, Ying; Dai, Hongzheng; Chen, Sidi; Zhang, Luoying; Long, Manyuan

    2011-04-26

    Sphinx is a lineage-specific non-coding RNA gene involved in regulating courtship behavior in Drosophila melanogaster. The 5' flanking region of the gene is conserved across Drosophila species, with the proximal 300 bp being conserved out to D. virilis and a further 600 bp region being conserved amongst the melanogaster subgroup (D. melanogaster, D. simulans, D. sechellia, D. yakuba, and D. erecta). Using a green fluorescence protein transformation system, we demonstrated that a 253 bp region of the highly conserved segment was sufficient to drive sphinx expression in male accessory gland. GFP signals were also observed in brain, wing hairs and leg bristles. An additional ∼800 bp upstream region was able to enhance expression specifically in proboscis, suggesting the existence of enhancer elements. Using anti-GFP staining, we identified putative sphinx expression signal in the brain antennal lobe and inner antennocerebral tract, suggesting that sphinx might be involved in olfactory neuron mediated regulation of male courtship behavior. Whole genome expression profiling of the sphinx knockout mutation identified significant up-regulated gene categories related to accessory gland protein function and odor perception, suggesting sphinx might be a negative regulator of its target genes.

  4. Highly Tissue Specific Expression of Sphinx Supports Its Male Courtship Related Role in Drosophila melanogaster

    PubMed Central

    Chen, Sidi; Zhang, Luoying; Long, Manyuan

    2011-01-01

    Sphinx is a lineage-specific non-coding RNA gene involved in regulating courtship behavior in Drosophila melanogaster. The 5′ flanking region of the gene is conserved across Drosophila species, with the proximal 300 bp being conserved out to D. virilis and a further 600 bp region being conserved amongst the melanogaster subgroup (D. melanogaster, D. simulans, D. sechellia, D. yakuba, and D. erecta). Using a green fluorescence protein transformation system, we demonstrated that a 253 bp region of the highly conserved segment was sufficient to drive sphinx expression in male accessory gland. GFP signals were also observed in brain, wing hairs and leg bristles. An additional ∼800 bp upstream region was able to enhance expression specifically in proboscis, suggesting the existence of enhancer elements. Using anti-GFP staining, we identified putative sphinx expression signal in the brain antennal lobe and inner antennocerebral tract, suggesting that sphinx might be involved in olfactory neuron mediated regulation of male courtship behavior. Whole genome expression profiling of the sphinx knockout mutation identified significant up-regulated gene categories related to accessory gland protein function and odor perception, suggesting sphinx might be a negative regulator of its target genes. PMID:21541324

  5. Cloning and bacterial expression of adenosine-5'-triphosphate sulfurylase from the enteric protozoan parasite Entamoeba histolytica.

    PubMed

    Nozaki, T; Arase, T; Shigeta, Y; Asai, T; Leustek, T; Takeuchi, T

    1998-12-08

    A gene encoding adenosine-5'-triphosphate sulfurylase (AS) was cloned from the enteric protozoan parasite Entamoeba histolytica by polymerase chain reaction using degenerate oligonucleotide primers corresponding to conserved regions of the protein from a variety of organisms. The deduced amino acid sequence of E. histolytica AS revealed a calculated molecular mass of 47925 Da and an unusual basic pI of 9.38. The amebic protein sequence showed 23-48% identities with AS from bacteria, yeasts, fungi, plants, and animals with the highest identities being to Synechocystis sp. and Bacillus subtilis (48 and 44%, respectively). Four conserved blocks including putative sulfate-binding and phosphate-binding regions were highly conserved in the E. histolytica AS. The upstream region of the AS gene contained three conserved elements reported for other E. histolytica genes. A recombinant E. histolytica AS revealed enzymatic activity, measured in both the forward and reverse directions. Expression of the E. histolytica AS complemented cysteine auxotrophy of the AS-deficient Escherichia coli strains. Genomic hybridization revealed that the AS gene exists as a single copy gene. In the literature, this is the first description of an AS gene in Protozoa.

  6. Ancient genomic architecture for mammalian olfactory receptor clusters

    PubMed Central

    Aloni, Ronny; Olender, Tsviya; Lancet, Doron

    2006-01-01

    Background Mammalian olfactory receptor (OR) genes reside in numerous genomic clusters of up to several dozen genes. Whole-genome sequence alignment nets of five mammals allow their comprehensive comparison, aimed at reconstructing the ancestral olfactory subgenome. Results We developed a new and general tool for genome-wide definition of genomic gene clusters conserved in multiple species. Syntenic orthologs, defined as gene pairs showing conservation of both genomic location and coding sequence, were subjected to a graph theory algorithm for discovering CLICs (clusters in conservation). When applied to ORs in five mammals, including the marsupial opossum, more than 90% of the OR genes were found within a framework of 48 multi-species CLICs, invoking a general conservation of gene order and composition. A detailed analysis of individual CLICs revealed multiple differences among species, interpretable through species-specific genomic rearrangements and reflecting complex mammalian evolutionary dynamics. One significant instance involves CLIC #1, which lacks a human member, implying the human-specific deletion of an OR cluster, whose mouse counterpart has been tentatively associated with isovaleric acid odorant detection. Conclusion The identified multi-species CLICs demonstrate that most of the mammalian OR clusters have a common ancestry, preceding the split between marsupials and placental mammals. However, only two of these CLICs were capable of incorporating chicken OR genes, parsimoniously implying that all other CLICs emerged subsequent to the avian-mammalian divergence. PMID:17010214

  7. The evolution of meiotic sex and its alternatives.

    PubMed

    Mirzaghaderi, Ghader; Hörandl, Elvira

    2016-09-14

    Meiosis is an ancestral, highly conserved process in eukaryotic life cycles, and for all eukaryotes the shared component of sexual reproduction. The benefits and functions of meiosis, however, are still under discussion, especially considering the costs of meiotic sex. To get a novel view on this old problem, we filter out the most conserved elements of meiosis itself by reviewing the various modifications and alterations of modes of reproduction. Our rationale is that the indispensable steps of meiosis for viability of offspring would be maintained by strong selection, while dispensable steps would be variable. We review evolutionary origin and processes in normal meiosis, restitutional meiosis, polyploidization and the alterations of meiosis in forms of uniparental reproduction (apomixis, apomictic parthenogenesis, automixis, selfing) with a focus on plants and animals. This overview suggests that homologue pairing, double-strand break formation and homologous recombinational repair at prophase I are the least dispensable elements, and they are more likely optimized for repair of oxidative DNA damage rather than for recombination. Segregation, ploidy reduction and also a biparental genome contribution can be skipped for many generations. The evidence supports the theory that the primary function of meiosis is DNA restoration rather than recombination. © 2016 The Authors.

  8. The evolution of meiotic sex and its alternatives

    PubMed Central

    Mirzaghaderi, Ghader

    2016-01-01

    Meiosis is an ancestral, highly conserved process in eukaryotic life cycles, and for all eukaryotes the shared component of sexual reproduction. The benefits and functions of meiosis, however, are still under discussion, especially considering the costs of meiotic sex. To get a novel view on this old problem, we filter out the most conserved elements of meiosis itself by reviewing the various modifications and alterations of modes of reproduction. Our rationale is that the indispensable steps of meiosis for viability of offspring would be maintained by strong selection, while dispensable steps would be variable. We review evolutionary origin and processes in normal meiosis, restitutional meiosis, polyploidization and the alterations of meiosis in forms of uniparental reproduction (apomixis, apomictic parthenogenesis, automixis, selfing) with a focus on plants and animals. This overview suggests that homologue pairing, double-strand break formation and homologous recombinational repair at prophase I are the least dispensable elements, and they are more likely optimized for repair of oxidative DNA damage rather than for recombination. Segregation, ploidy reduction and also a biparental genome contribution can be skipped for many generations. The evidence supports the theory that the primary function of meiosis is DNA restoration rather than recombination. PMID:27605505

  9. A SINE in the genome of the cephalochordate amphioxus is an Alu element

    PubMed Central

    Holland, Linda Z.

    2006-01-01

    Transposable elements of about 300 bp, termed “short interspersed nucleotide elements or SINEs are common in eukaryotes. However, Alu elements, SINEs containing restriction sites for the AluI enzyme, have been known only from primates. Here I report the first SINE found in the genome of the cephalochordate, amphioxus. It is an Alu element of 375 bp that does not share substantial identity with any genomic sequences in vertebrates. It was identified because it was located in the FoxD regulatory region in a cosmid derived from one individual, but absent from the two FoxD alleles of BACs from a second individual. However, searches of sequences of BACs and genomic traces from this second individual gave an estimate of 50-100 copies in the amphioxus genome. The finding of an Alu element in amphioxus raises the question of whether Alu elements in amphioxus and primates arose by convergent evolution or by inheritance from a common ancestor. Genome-wide analyses of transposable elements in amphioxus and other chordates such as tunicates, agnathans and cartilaginous fishes could well provide the answer. PMID:16733535

  10. Targeting RNA–Protein Interactions within the Human Immunodeficiency Virus Type 1 Lifecycle

    PubMed Central

    2013-01-01

    RNA–protein interactions are vital throughout the HIV-1 life cycle for the successful production of infectious virus particles. One such essential RNA–protein interaction occurs between the full-length genomic viral RNA and the major structural protein of the virus. The initial interaction is between the Gag polyprotein and the viral RNA packaging signal (psi or Ψ), a highly conserved RNA structural element within the 5′-UTR of the HIV-1 genome, which has gained attention as a potential therapeutic target. Here, we report the application of a target-based assay to identify small molecules, which modulate the interaction between Gag and Ψ. We then demonstrate that one such molecule exhibits potent inhibitory activity in a viral replication assay. The mode of binding of the lead molecules to the RNA target was characterized by 1H NMR spectroscopy. PMID:24358934

  11. Long-range evolutionary constraints reveal cis-regulatory interactions on the human X chromosome

    PubMed Central

    Naville, Magali; Ishibashi, Minaka; Ferg, Marco; Bengani, Hemant; Rinkwitz, Silke; Krecsmarik, Monika; Hawkins, Thomas A.; Wilson, Stephen W.; Manning, Elizabeth; Chilamakuri, Chandra S. R.; Wilson, David I.; Louis, Alexandra; Lucy Raymond, F.; Rastegar, Sepand; Strähle, Uwe; Lenhard, Boris; Bally-Cuif, Laure; van Heyningen, Veronica; FitzPatrick, David R.; Becker, Thomas S.; Roest Crollius, Hugues

    2015-01-01

    Enhancers can regulate the transcription of genes over long genomic distances. This is thought to lead to selection against genomic rearrangements within such regions that may disrupt this functional linkage. Here we test this concept experimentally using the human X chromosome. We describe a scoring method to identify evolutionary maintenance of linkage between conserved noncoding elements and neighbouring genes. Chromatin marks associated with enhancer function are strongly correlated with this linkage score. We test >1,000 putative enhancers by transgenesis assays in zebrafish to ascertain the identity of the target gene. The majority of active enhancers drive a transgenic expression in a pattern consistent with the known expression of a linked gene. These results show that evolutionary maintenance of linkage is a reliable predictor of an enhancer's function, and provide new information to discover the genetic basis of diseases caused by the mis-regulation of gene expression. PMID:25908307

  12. Localized Plasticity in the Streamlined Genomes of Vinyl Chloride Respiring Dehalococcoides

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    McMurdie, Paul J.; Behrens, Sebastien F.; Muller, Jochen A.

    2009-06-30

    Vinyl chloride (VC) is a human carcinogen and widespread priority pollutant. Here we report the first, to our knowledge, complete genome sequences of microorganisms able to respire VC, Dehalococcoides sp. strains VS and BAV1. Notably, the respective VC reductase encoding genes, vcrAB and bvcAB, were found embedded in distinct genomic islands (GEIs) with different predicted integration sites, suggesting that these genes were acquired horizontally and independently by distinct mechanisms. A comparative analysis that included two previously sequenced Dehalococcoides genomes revealed a contextually conserved core that is interrupted by two high plasticity regions (HPRs) near the Ori. These HPRs contain themore » majority of GEIs and strain-specific genes identified in the four Dehalococcoides genomes, an elevated number of repeated elements including insertion sequences (IS), as well as 91 of 96 rdhAB, genes that putatively encode terminal reductases in organohalide respiration. Only three core rdhA orthologous groups were identified, and only one of these groups is supported by synteny. The low number of core rdhAB, contrasted with the high rdhAB numbers per genome (up to 36 in strain VS), as well as their colocalization with GEIs and other signatures for horizontal transfer, suggests that niche adaptation via organohalide respiration is a fundamental ecological strategy in Dehalococccoides. This adaptation has been exacted through multiple mechanisms of recombination that are mainly confined within HPRs of an otherwise remarkably stable, syntenic, streamlined genome among the smallest of any free-living microorganism.« less

  13. A Pan-Genomic Approach to Understand the Basis of Host Adaptation in Achromobacter

    PubMed Central

    Jeukens, Julie; Freschi, Luca; Vincent, Antony T.; Emond-Rheault, Jean-Guillaume; Kukavica-Ibrulj, Irena; Charette, Steve J.

    2017-01-01

    Over the past decade, there has been a rising interest in Achromobacter sp., an emerging opportunistic pathogen responsible for nosocomial and cystic fibrosis lung infections. Species of this genus are ubiquitous in the environment, can outcompete resident microbiota, and are resistant to commonly used disinfectants as well as antibiotics. Nevertheless, the Achromobacter genus suffers from difficulties in diagnosis, unresolved taxonomy and limited understanding of how it adapts to the cystic fibrosis lung, not to mention other host environments. The goals of this first genus-wide comparative genomics study were to clarify the taxonomy of this genus and identify genomic features associated with pathogenicity and host adaptation. This was done with a widely applicable approach based on pan-genome analysis. First, using all publicly available genomes, a combination of phylogenetic analysis based on 1,780 conserved genes with average nucleotide identity and accessory genome composition allowed the identification of a largely clinical lineage composed of Achromobacter xylosoxidans, Achromobacter insuavis, Achromobacter dolens, and Achromobacter ruhlandii. Within this lineage, we identified 35 positively selected genes involved in metabolism, regulation and efflux-mediated antibiotic resistance. Second, resistome analysis showed that this clinical lineage carried additional antibiotic resistance genes compared with other isolates. Finally, we identified putative mobile elements that contribute 53% of the genus’s resistome and support horizontal gene transfer between Achromobacter and other ecologically similar genera. This study provides strong phylogenetic and pan-genomic bases to motivate further research on Achromobacter, and contributes to the understanding of opportunistic pathogen evolution. PMID:28383665

  14. Regulatory elements of Caenorhabditis elegans ribosomal protein genes

    PubMed Central

    2012-01-01

    Background Ribosomal protein genes (RPGs) are essential, tightly regulated, and highly expressed during embryonic development and cell growth. Even though their protein sequences are strongly conserved, their mechanism of regulation is not conserved across yeast, Drosophila, and vertebrates. A recent investigation of genomic sequences conserved across both nematode species and associated with different gene groups indicated the existence of several elements in the upstream regions of C. elegans RPGs, providing a new insight regarding the regulation of these genes in C. elegans. Results In this study, we performed an in-depth examination of C. elegans RPG regulation and found nine highly conserved motifs in the upstream regions of C. elegans RPGs using the motif discovery algorithm DME. Four motifs were partially similar to transcription factor binding sites from C. elegans, Drosophila, yeast, and human. One pair of these motifs was found to co-occur in the upstream regions of 250 transcripts including 22 RPGs. The distance between the two motifs displayed a complex frequency pattern that was related to their relative orientation. We tested the impact of three of these motifs on the expression of rpl-2 using a series of reporter gene constructs and showed that all three motifs are necessary to maintain the high natural expression level of this gene. One of the motifs was similar to the binding site of an orthologue of POP-1, and we showed that RNAi knockdown of pop-1 impacts the expression of rpl-2. We further determined the transcription start site of rpl-2 by 5’ RACE and found that the motifs lie 40–90 bases upstream of the start site. We also found evidence that a noncoding RNA, contained within the outron of rpl-2, is co-transcribed with rpl-2 and cleaved during trans-splicing. Conclusions Our results indicate that C. elegans RPGs are regulated by a complex novel series of regulatory elements that is evolutionarily distinct from those of all other species examined up until now. PMID:22928635

  15. Metagenomic Analysis of Ammonia-Oxidizing Archaea Affiliated with the Soil Group

    PubMed Central

    Bartossek, Rita; Spang, Anja; Weidler, Gerhard; Lanzen, Anders; Schleper, Christa

    2012-01-01

    Ammonia-oxidizing archaea (AOA) have recently been recognized as a significant component of many microbial communities and represent one of the most abundant prokaryotic groups in the biosphere. However, only few AOA have been successfully cultivated so far and information on the physiology and genomic content remains scarce. We have performed a metagenomic analysis to extend the knowledge of the AOA affiliated with group I.1b that is widespread in terrestrial habitats and of which no genome sequences has been described yet. A fosmid library was generated from samples of a radioactive thermal cave (46°C) in the Austrian Central Alps in which AOA had been found as a major part of the microbial community. Out of 16 fosmids that possessed either an amoA or 16S rRNA gene affiliating with AOA, 5 were fully sequenced, 4 of which grouped with the soil/I.1b (Nitrososphaera-) lineage, and 1 with marine/I.1a (Nitrosopumilus-) lineage. Phylogenetic analyses of amoBC and an associated conserved gene were congruent with earlier analyses based on amoA and 16S rRNA genes and supported the separation of the soil and marine group. Several putative genes that did not have homologs in currently available marine Thaumarchaeota genomes indicated that AOA of the soil group contain specific genes that are distinct from their marine relatives. Potential cis-regulatory elements around conserved promoter motifs found upstream of the amo genes in sequenced (meta-) genomes differed in marine and soil group AOA. On one fosmid, a group of genes including amoA and amoB were flanked by identical transposable insertion sequences, indicating that amoAB could potentially be co-mobilized in the form of a composite transposon. This might be one of the mechanisms that caused the greater variation in gene order compared to genomes in the marine counterparts. Our findings highlight the genetic diversity within the two major and widespread lineages of Thaumarchaeota. PMID:22723795

  16. Identification of novel MITEs (miniature inverted-repeat transposable elements) in Coxiella burnetii: implications for protein and small RNA evolution.

    PubMed

    Wachter, Shaun; Raghavan, Rahul; Wachter, Jenny; Minnick, Michael F

    2018-04-11

    Coxiella burnetii is a Gram-negative gammaproteobacterium and zoonotic agent of Q fever. C. burnetii's genome contains an abundance of pseudogenes and numerous selfish genetic elements. MITEs (miniature inverted-repeat transposable elements) are non-autonomous transposons that occur in all domains of life and are thought to be insertion sequences (ISs) that have lost their transposase function. Like most transposable elements (TEs), MITEs are thought to play an active role in evolution by altering gene function and expression through insertion and deletion activities. However, information regarding bacterial MITEs is limited. We describe two MITE families discovered during research on small non-coding RNAs (sRNAs) of C. burnetii. Two sRNAs, Cbsr3 and Cbsr13, were found to originate from a novel MITE family, termed QMITE1. Another sRNA, CbsR16, was found to originate from a separate and novel MITE family, termed QMITE2. Members of each family occur ~ 50 times within the strains evaluated. QMITE1 is a typical MITE of 300-400 bp with short (2-3 nt) direct repeats (DRs) of variable sequence and is often found overlapping annotated open reading frames (ORFs). Additionally, QMITE1 elements possess sigma-70 promoters and are transcriptionally active at several loci, potentially influencing expression of nearby genes. QMITE2 is smaller (150-190 bps), but has longer (7-11 nt) DRs of variable sequences and is mainly found in the 3' untranslated region of annotated ORFs and intergenic regions. QMITE2 contains a GTAG repetitive extragenic palindrome (REP) that serves as a target for IS1111 TE insertion. Both QMITE1 and QMITE2 display inter-strain linkage and sequence conservation, suggesting that they are adaptive and existed before divergence of C. burnetii strains. We have discovered two novel MITE families of C. burnetii. Our finding that MITEs serve as a source for sRNAs is novel. QMITE2 has a unique structure and occurs in large or small versions with unique DRs that display linkage and sequence conservation between strains, allowing for tracking of genomic rearrangements. QMITE1 and QMITE2 copies are hypothesized to influence expression of neighboring genes involved in DNA repair and virulence through transcriptional interference and ribonuclease processing.

  17. Acquisition and evolution of SXT-R391 integrative conjugative elements in the seventh-pandemic Vibrio cholerae lineage.

    PubMed

    Spagnoletti, Matteo; Ceccarelli, Daniela; Rieux, Adrien; Fondi, Marco; Taviani, Elisa; Fani, Renato; Colombo, Mauro M; Colwell, Rita R; Balloux, François

    2014-08-19

    SXT-R391 Integrative conjugative elements (ICEs) are self-transmissible mobile genetic elements able to confer multidrug resistance and other adaptive features to bacterial hosts, including Vibrio cholerae, the causative agent of cholera. ICEs are arranged in a mosaic genetic structure composed of a conserved backbone interspersed with variable DNA clusters located in conserved hot spots. In this study, we investigated ICE acquisition and subsequent microevolution in pandemic V. cholerae. Ninety-six ICEs were retrieved from publicly available sequence databases from V. cholerae clinical strains and were compared to a set of reference ICEs. Comparative genomics highlighted the existence of five main ICE groups with a distinct genetic makeup, exemplified by ICEVchInd5, ICEVchMoz10, SXT, ICEVchInd6, and ICEVchBan11. ICEVchInd5 (the most frequent element, represented by 70 of 96 elements analyzed) displayed no sequence rearrangements and was characterized by 46 single nucleotide polymorphisms (SNPs). SNP analysis revealed that recent inter-ICE homologous recombination between ICEVchInd5 and other ICEs circulating in gammaproteobacteria generated ICEVchMoz10, ICEVchInd6, and ICEVchBan11. Bayesian phylogenetic analyses indicated that ICEVchInd5 and SXT were independently acquired by the current pandemic V. cholerae O1 and O139 lineages, respectively, within a period of only a few years. SXT-R391 ICEs have been recognized as key vectors of antibiotic resistance in the seventh-pandemic lineage of V. cholerae, which remains a major cause of mortality and morbidity on a global scale. ICEs were acquired only recently in this clade and are acknowledged to be major contributors to horizontal gene transfer and the acquisition of new traits in bacterial species. We have reconstructed the temporal dynamics of SXT-R391 ICE acquisition and spread and have identified subsequent recombination events generating significant diversity in ICEs currently circulating among V. cholerae clinical strains. Our results showed that acquisition of SXT-R391 ICEs provided the V. cholerae seventh-pandemic lineage not only with a multidrug resistance phenotype but also with a powerful molecular tool for rapidly accessing the pan-genome of a large number of gammaproteobacteria. Copyright © 2014 Spagnoletti et al.

  18. Genomic Imprinting Was Evolutionarily Conserved during Wheat Polyploidization[OPEN

    PubMed Central

    Yang, Guanghui; Liu, Zhenshan; Gao, Lulu; Yu, Kuohai; Feng, Man; Peng, Huiru; Sun, Qixin; Ni, Zhongfu

    2018-01-01

    Genomic imprinting is an epigenetic phenomenon that causes genes to be differentially expressed depending on their parent of origin. To evaluate the evolutionary conservation of genomic imprinting and the effects of ploidy on this process, we investigated parent-of-origin-specific gene expression patterns in the endosperm of diploid (Aegilops spp), tetraploid, and hexaploid wheat (Triticum spp) at various stages of development via high-throughput transcriptome sequencing. We identified 91, 135, and 146 maternally or paternally expressed genes (MEGs or PEGs, respectively) in diploid, tetraploid, and hexaploid wheat, respectively, 52.7% of which exhibited dynamic expression patterns at different developmental stages. Gene Ontology enrichment analysis suggested that MEGs and PEGs were involved in metabolic processes and DNA-dependent transcription, respectively. Nearly half of the imprinted genes exhibited conserved expression patterns during wheat hexaploidization. In addition, 40% of the homoeolog pairs originating from whole-genome duplication were consistently maternally or paternally biased in the different subgenomes of hexaploid wheat. Furthermore, imprinted expression was found for 41.2% and 50.0% of homolog pairs that evolved by tandem duplication after genome duplication in tetraploid and hexaploid wheat, respectively. These results suggest that genomic imprinting was evolutionarily conserved between closely related Triticum and Aegilops species and in the face of polyploid hybridization between species in these genera. PMID:29298834

  19. Identification and Characterization of Small Noncoding RNAs in Genome Sequences of the Edible Fungus Pleurotus ostreatus

    PubMed Central

    Zhao, Mengran; Hsiang, Tom; Feng, Xiaoxing

    2016-01-01

    Noncoding RNAs (ncRNAs) have been identified in many fungi. However, no genome-scale identification of ncRNAs has been inventoried for basidiomycetes. In this research, we detected 254 small noncoding RNAs (sncRNAs) in a genome assembly of an isolate (CCEF00389) of Pleurotus ostreatus, which is a widely cultivated edible basidiomycetous fungus worldwide. The identified sncRNAs include snRNAs, snoRNAs, tRNAs, and miRNAs. SnRNA U1 was not found in CCEF00389 genome assembly and some other basidiomycetous genomes by BLASTn. This implies that if snRNA U1 of basidiomycetes exists, it has a sequence that varies significantly from other organisms. By analyzing the distribution of sncRNA loci, we found that snRNAs and most tRNAs (88.6%) were located in pseudo-UTR regions, while miRNAs are commonly found in introns. To analyze the evolutionary conservation of the sncRNAs in P. ostreatus, we aligned all 254 sncRNAs to the genome assemblies of some other Agaricomycotina fungi. The results suggest that most sncRNAs (77.56%) were highly conserved in P. ostreatus, and 20% were conserved in Agaricomycotina fungi. These findings indicate that most sncRNAs of P. ostreatus were not conserved across Agaricomycotina fungi. PMID:27703969

  20. The Large Mitochondrial Genome of Symbiodinium minutum Reveals Conserved Noncoding Sequences between Dinoflagellates and Apicomplexans

    PubMed Central

    Shoguchi, Eiichi; Shinzato, Chuya; Hisata, Kanako; Satoh, Nori; Mungpakdee, Sutada

    2015-01-01

    Even though mitochondrial genomes, which characterize eukaryotic cells, were first discovered more than 50 years ago, mitochondrial genomics remains an important topic in molecular biology and genome sciences. The Phylum Alveolata comprises three major groups (ciliates, apicomplexans, and dinoflagellates), the mitochondrial genomes of which have diverged widely. Even though the gene content of dinoflagellate mitochondrial genomes is reportedly comparable to that of apicomplexans, the highly fragmented and rearranged genome structures of dinoflagellates have frustrated whole genomic analysis. Consequently, noncoding sequences and gene arrangements of dinoflagellate mitochondrial genomes have not been well characterized. Here we report that the continuous assembled genome (∼326 kb) of the dinoflagellate, Symbiodinium minutum, is AT-rich (∼64.3%) and that it contains three protein-coding genes. Based upon in silico analysis, the remaining 99% of the genome comprises transcriptomic noncoding sequences. RNA edited sites and unique, possible start and stop codons clarify conserved regions among dinoflagellates. Our massive transcriptome analysis shows that almost all regions of the genome are transcribed, including 27 possible fragmented ribosomal RNA genes and 12 uncharacterized small RNAs that are similar to mitochondrial RNA genes of the malarial parasite, Plasmodium falciparum. Gene map comparisons show that gene order is only slightly conserved between S. minutum and P. falciparum. However, small RNAs and intergenic sequences share sequence similarities with P. falciparum, suggesting that the function of noncoding sequences has been preserved despite development of very different genome structures. PMID:26199191

Top